Initial Commit

2025-12-03 16:38:10 +01:00
parent c5e26bf594
commit b732d8d4b5
17680 changed files with 5977495 additions and 2 deletions
--- a/database/perl/vendor/lib/HTML/Tree/AboutObjects.pod
+++ b/database/perl/vendor/lib/HTML/Tree/AboutObjects.pod
@@ -0,0 +1,686 @@
+
+#Time-stamp: "2001-02-23 20:07:25 MST" -*-Text-*-
+# This document contains text in Perl "POD" format.
+# Use a POD viewer like perldoc or perlman to render it.
+
+=head1 NAME
+
+HTML::Tree::AboutObjects -- article: "User's View of Object-Oriented Modules"
+
+=head1 SYNOPSIS
+
+  # This is an article, not a module.
+
+=head1 DESCRIPTION
+
+The following article by Sean M. Burke first appeared in I<The Perl
+Journal> #17 and is copyright 2000 The Perl Journal. It appears
+courtesy of Jon Orwant and The Perl Journal.  This document may be
+distributed under the same terms as Perl itself.
+
+=head1 A User's View of Object-Oriented Modules
+
+-- Sean M. Burke
+
+The first time that most Perl programmers run into object-oriented
+programming is when they need to use a module whose interface is
+object-oriented.  This is often a mystifying experience, since talk of
+"methods" and "constructors" is unintelligible to programmers who
+thought that functions and variables were all there was to worry about.
+
+Articles and books that explain object-oriented programming (OOP) do so
+in terms of how to program that way.  That's understandable, and if you
+learn to write object-oriented code of your own, you'll find it easy to
+use object-oriented code that others write.  But this approach is the
+I<long> way around for people whose immediate goal is just to use
+existing object-oriented modules, but who don't yet want to know all the
+gory details of having to write such modules for themselves.
+
+This article is for those programmers -- programmers who want to know
+about objects from the perspective of using object-oriented modules. 
+
+=head2 Modules and Their Functional Interfaces
+
+Modules are the main way that Perl provides for bundling up code for
+later use by yourself or others.  As I'm sure you can't help noticing
+from reading
+I<The Perl Journal>, CPAN (the Comprehensive Perl Archive
+Network) is the repository for modules (or groups of modules) that
+others have written, to do anything from composing music to accessing
+Web pages.  A good many of those modules even come with every
+installation of Perl.
+
+One module that you may have used before, and which is fairly typical in
+its interface, is Text::Wrap.  It comes with Perl, so you don't even
+need to install it from CPAN.  You use it in a program of yours, by
+having your program code say early on:
+
+  use Text::Wrap;
+
+and after that, you can access a function called C<wrap>, which inserts
+line-breaks in text that you feed it, so that the text will be wrapped to
+seventy-two (or however many) columns.
+
+The way this C<use Text::Wrap> business works is that the module
+Text::Wrap exists as a file "Text/Wrap.pm" somewhere in one of your
+library directories.  That file contains Perl code...
+
+=over
+
+Footnote: And mixed in with the Perl code, there's documentation, which
+is what you read with "perldoc Text::Wrap".  The perldoc program simply
+ignores the code and formats the documentation text, whereas "use
+Text::Wrap" loads and runs the code while ignoring the documentation.
+
+=back
+
+...which, among other things, defines a function called C<Text::Wrap::wrap>,
+and then C<exports> that function, which means that when you say C<wrap>
+after having said "use Text::Wrap", you'll be actually calling the
+C<Text::Wrap::wrap> function.  Some modules don't export their
+functions, so you have to call them by their full name, like
+C<Text::Wrap::wrap(...parameters...)>.
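+
+As a minimal sketch (the sample text and the column setting are my own,
+chosen only for illustration), here are both ways of calling that
+function:
+
+  use strict;
+  use warnings;
+  use Text::Wrap;    # exports "wrap" into your program
+
+  # Every output line will be shorter than this many columns:
+  $Text::Wrap::columns = 30;
+  my $text = "The quick brown fox jumps over the lazy dog. " x 3;
+
+  # Called by its exported short name:
+  print wrap("", "", $text), "\n";
+
+  # Called by its full name; this is the very same function:
+  print Text::Wrap::wrap("", "", $text), "\n";
+
+The two empty strings are the indentation for the first line and for
+the following lines, respectively.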
+
+Regardless of whether the typical module exports the functions it
+provides, a module is basically just a container for chunks of code that
+do useful things.  The way a module lets you interact with it is its
+I<interface>.  And when, like with Text::Wrap, its interface
+consists of functions, the module is said to have a B<functional
+interface>.
+
+=over
+
+Footnote: the term "function" (and therefore "functionI<al>") has
+various senses.  I'm using the term here in its broadest sense, to
+refer to routines -- bits of code that are called by some name and
+which take parameters and return some value.
+
+=back
+
+Using modules with functional interfaces is straightforward -- instead
+of defining your own "wrap" function with C<sub wrap { ... }>, you
+entrust "use Text::Wrap" to do that for you, along with whatever other
+functions it defines and exports, according to the module's
+documentation.  Without too much bother, you can even write your own
+modules to contain your frequently used functions; I suggest having a look at
+the C<perlmod> man page for more leads on doing this.
+
+=head2 Modules with Object-Oriented Interfaces
+
+So suppose that one day you want to write a program that will automate
+the process of C<ftp>ing a bunch of files from one server down to your
+local machine, and then off to another server.
+
+A quick browse through search.cpan.org turns up the module "Net::FTP",
+which you can download and install using normal installation
+instructions (unless your sysadmin has already installed it, as many
+have).
+
+Like Text::Wrap or any other module with a familiarly functional
+interface, you start off using Net::FTP in your program by saying:
+
+  use Net::FTP;
+
+However, that's where the similarity ends.  The first hint of
+difference is that the documentation for Net::FTP refers to it as a
+B<class>.  A class is a kind of module, but one that has an
+object-oriented interface.
+
+Whereas modules like Text::Wrap
+provide bits of useful code as I<functions>, to be called like
+C<function(...parameters...)> or like
+C<PackageName::function(...parameters...)>, Net::FTP and other modules
+with object-oriented interfaces provide B<methods>.  Methods are sort of
+like functions in that they have a name and parameters; but methods
+look different, and are different, because you have to call them with a
+syntax that has a class name or an object as a special argument.  I'll
+explain the syntax for method calls, and then later explain what they
+all mean.
+
+Some methods are meant to be called as B<class methods>, with the class
+name (same as the module name) as a special argument.  Class methods
+look like this:
+
+  ClassName->methodname(parameter1, parameter2, ...)
+  ClassName->methodname()   # if no parameters
+  ClassName->methodname     # same as above
+
+which you will sometimes see written:
+
+  methodname ClassName (parameter1, parameter2, ...)
+  methodname ClassName      # if no parameters
+
+Basically all class methods are for making new objects, and methods that
+make objects are called "B<constructors>" (and the process of making them
+is called "constructing" or "instantiating").  Constructor methods
+typically have the name "new", or something including "new"
+("new_from_file", etc.); but they can conceivably be named
+anything -- DBI's constructor method is named "connect", for example.
+
+The object that a constructor method returns is
+typically captured in a scalar variable:
+
+  $object = ClassName->new(param1, param2...);
+
+Once you have an object (more later on exactly what that is), you can
+use the other kind of method call syntax, the syntax for B<object method>
+calls.  Calling object methods is just like calling class methods, except
+that instead of the ClassName as the special argument,
+you use an expression that yields an "object".  Usually this is
+just a scalar variable that you earlier captured the
+output of the constructor in.  Object method calls look like this:
+
+  $object->methodname(parameter1, parameter2, ...);
+  $object->methodname()   # if no parameters
+  $object->methodname     # same as above
+  
+which is occasionally written as:
+
+  methodname $object (parameter1, parameter2, ...)
+  methodname $object      # if no parameters
+
+Examples of method calls are:
+
+  my $session1 = Net::FTP->new("ftp.myhost.com");
+    # Calls a class method "new", from class Net::FTP,
+    #  with the single parameter "ftp.myhost.com",
+    #  and saves the return value (which is, as usual,
+    #  an object), in $session1.
+    # Could also be written:
+    #  new Net::FTP('ftp.myhost.com')
+  $session1->login("sburke","aoeuaoeu")
+    || die "failed to login!\n";
+     # calling the object method "login"
+  print "Dir:\n", $session1->dir(), "\n";
+  $session1->quit;
+    # same as $session1->quit()
+  print "Done\n";
+  exit;
+
+Incidentally, I suggest always using the syntaxes with parentheses and
+"->" in them,
+
+=over
+
+Footnote: the character-pair "->" is supposed to look like an
+arrow, not "negative greater-than"!
+
+=back
+
+and avoiding the syntaxes that start out "methodname $object" or
+"methodname ModuleName".  When everything's going right, they all mean
+the same thing as the "->" variants, but the syntax with "->" is more
+visually distinct from function calls, as well as being immune to some
+kinds of rare but puzzling ambiguities that can arise when you're trying
+to call methods that have the same name as subroutines you've defined.
+
+But, syntactic alternatives aside, all this talk of constructing objects
+and object methods raises the question -- what I<is> an object?  There are
+several angles to this question that the rest of this article will
+answer in turn: what can you do with objects?  what's in an object?
+what's an object value?  and why do some modules use objects at all? 
+
+=head2 What Can You Do with Objects?
+
+You've seen that you can make objects, and call object methods with
+them.  But what are object methods for?  The answer depends on the class:
+
+A Net::FTP object represents a session between your computer and an FTP
+server.  So the methods you call on a Net::FTP object are for doing
+whatever you'd need to do across an FTP connection.  You make the
+session and log in:
+
+  my $session = Net::FTP->new('ftp.aol.com');
+  die "Couldn't connect!" unless defined $session;
+    # The class method call to "new" will return
+    # the new object if it goes OK, otherwise it
+    # will return undef.
+    
+  $session->login('sburke', 'p@ssw3rD')
+   || die "Did I change my password again?";
+    # The object method "login" will give a true
+    # return value if it actually logs in, otherwise
+    # it'll return false.
+    
+You can use the session object to change directory on that session:
+
+  $session->cwd("/home/sburke/public_html")
+     || die "Hey, that was REALLY supposed to work!";
+   # if the cwd fails, it'll return false
+
+...get files from the machine at the other end of the session...
+
+  foreach my $f ('log_report_ua.txt', 'log_report_dom.txt',
+                 'log_report_browsers.txt')
+  {
+    $session->get($f) || warn "Getting $f failed!"
+  };
+
+...and plenty else, ending finally with closing the connection:
+
+  $session->quit();
+
+In short, object methods are for doing things related to (or with)
+whatever the object represents.  For FTP sessions, it's about sending
+commands to the server at the other end of the connection, and that's
+about it -- there, methods are for doing something to the world outside
+the object, and the object is just something that specifies what bit
+of the world (well, what FTP session) to act upon.
+
+With most other classes, however, the object itself stores some kind of
+information, and it typically makes no sense to do things with such an
+object without considering the data that's in the object.
+
+=head2 What's I<in> an Object?
+
+An object is (with rare exceptions) a data structure containing a
+bunch of attributes, each of which has a value, as well as a name
+that you use when you
+read or set the attribute's value.  Some of the object's attributes are
+private, meaning you'll never see them documented because they're not
+for you to read or write; but most of the object's documented attributes
+are at least readable, and usually writeable, by you.  Net::FTP objects
+are a bit thin on attributes, so we'll use objects from the class
+Business::US_Amort for this example.  Business::US_Amort is a very
+simple class (available from CPAN) that I wrote for making calculations
+to do with loans (specifically, amortization, using US-style
+algorithms).
+
+An object of the class Business::US_Amort represents a loan with
+particular parameters, i.e., attributes.  The most basic attributes of a
+"loan object" are its interest rate, its principal (how much money it's
+for), and its term (how long it'll take to repay).  You need to set
+these attributes before anything else can be done with the object.  The
+way to get at those attributes for loan objects is just like the
+way to get at attributes for any class's objects: through accessors.
+An B<accessor> is simply any method that accesses (whether reading or
+writing, AKA getting or putting) some attribute in the given object.
+Moreover, accessors are the B<only> way that you can change
+an object's attributes.  (If a module's documentation wants you to
+know about any other way, it'll tell you.)
+
+Usually, for simplicity's sake, an accessor is named after the attribute
+it reads or writes.  With Business::US_Amort objects, the accessors you
+need to use first are C<principal>, C<interest_rate>, and C<term>.
+Then, with at least those attributes set, you can call the C<run> method
+to figure out several things about the loan.  Then you can call various
+accessors, like C<total_paid_interest>, to read the results:
+
+  use Business::US_Amort;
+  my $loan = Business::US_Amort->new;
+  # Set the necessary attributes:
+  $loan->principal(123654);
+  $loan->interest_rate(9.25);
+  $loan->term(20); # twenty years
+
+  # NOW we know enough to calculate:
+  $loan->run;
+  
+  # And see what came of that:
+  print
+    "Total paid toward interest: A WHOPPING ",
+    $loan->total_paid_interest, "!!\n";
+
+This illustrates a convention that's common with accessors: calling the
+accessor with no arguments (as with $loan->total_paid_interest) usually
+means to read the value of that attribute, but providing a value (as
+with $loan->term(20)) means you want that attribute to be set to that
+value.  This stands to reason: why would you be providing a value, if
+not to set the attribute to that value?
+
+Although a loan's term, principal, and interest rate are all single
+numeric values, an object's attribute values can be any kind of scalar, or an array,
+or even a hash.  Moreover, an attribute's value(s) can be objects
+themselves.  For example, consider MIDI files (as I wrote about in
+TPJ#13): a MIDI file usually consists of several tracks.  A MIDI file is
+complex enough to merit being an object with attributes like its overall
+tempo, the file-format variant it's in, and the list of instrument
+tracks in the file.  But tracks themselves are complex enough to be
+objects too, with attributes like their track-type, a list of MIDI
+commands if they're a MIDI track, or raw data if they're not.  So I
+ended up writing the MIDI modules so that the "tracks" attribute of a
+MIDI::Opus object is an array of objects from the class MIDI::Track.
+This may seem like a runaround -- you ask what's in one object, and get
+I<another> object, or several!  But in this case, it exactly reflects
+what the module is for -- MIDI files contain MIDI tracks, which then
+contain data.
+
+=head2 What is an Object Value?
+
+When you call a constructor like Net::FTP->new(I<hostname>), you get
+back an object value, a value you can later use, in combination with a
+method name, to call object methods. 
+
+Now, so far we've been pretending, in the above examples, that the
+variables $session or $loan I<are> the objects you're dealing with.
+This idea is innocuous up to a point, but it's really a misconception
+that will, at best, limit you in what you know how to do.  The reality
+is not that the variables $session or $loan are objects; it's a little
+more indirect -- they I<hold> values that symbolize objects.  The kind of
+value that $session or $loan hold is what I'm calling an object value.
+
+To understand what kind of value this is, first think about the other
+kinds of scalar values you know about: The first two scalar values you
+probably ever ran into in Perl are B<numbers> and B<strings>, which you
+learned (or just assumed) will usually turn into each other on demand;
+that is, the three-character string "2.5" can become the quantity two
+and a half, and vice versa.  Then, especially if you started using
+C<perl -w> early on, you learned about the B<undefined value>, which can
+turn into 0 if you treat it as a number, or the empty-string if you
+treat it as a string.
+
+=over
+
+Footnote: You may I<also> have been learning about references, in which
+case you're ready to hear that object values are just a kind of
+reference, except that they reflect the class that created the thing they
+point to, instead of merely being a plain old array reference, hash reference,
+etc.  I<If> this makes sense to you, and you want to know more
+about how objects are implemented in Perl, have a look at the
+C<perltoot> man page. 
+
+=back
+
+And now you're learning about B<object values>.  An object value is a
+value that points to a data structure somewhere in memory, which is
+where all the attributes for this object are stored.  That data
+structure as a whole belongs to a class (probably the one you named in
+the constructor method, like ClassName->new), so that the object value
+can be used as part of object method calls. 
+
+If you want to actually I<see> what an object value is, you might try
+just saying "print $object".  That'll get you something like this:
+
+  Net::FTP=GLOB(0x20154240)
+
+or
+
+  Business::US_Amort=HASH(0x15424020)
+
+That's not very helpful if you wanted to really get at the object's
+insides, but that's because the object value is only a symbol for the
+object.  This may all sound very abstruse and metaphysical, so a
+real-world analogy might be very helpful:
+
+=over
+
+You get an advertisement in the mail saying that you have been
+(im)personally selected to have the rare privilege of applying for a
+credit card.  For whatever reason, I<this> offer sounds good to you, so you
+fill out the form and mail it back to the credit card company.  They
+gleefully approve the application and create your account, and send you
+a card with a number on it.
+
+Now, you can do things with the number on that card -- clerks at stores
+can ring up things you want to buy, and charge your account by keying in
+the number on the card.  You can pay for things you order online by
+punching in the card number as part of your online order.  You can pay
+off part of the account by sending the credit card people some of your
+money (well, a check) with some note (usually the pre-printed slip)
+that has the card number for the account you want to pay toward.  And you
+should be able to call the credit card company's computer and ask it
+things about the card, like its balance, its credit limit, its APR, and
+maybe an itemization of recent purchases and payments.
+
+Now, what you're I<really> doing is manipulating a credit card
+I<account>, a completely abstract entity with some data attached to it
+(balance, APR, etc).  But for ease of access, you have a credit card
+I<number> that is a symbol for that account.  Now, that symbol is just a
+bunch of digits, and the number is effectively meaningless and useless
+in and of itself -- but in the appropriate context, it's understood to
+I<mean> the credit card account you're accessing.
+
+=back
+
+This is exactly the relationship between objects and object values, and
+from this analogy, several facts about object values are a bit more
+explicable:
+
+* An object value does nothing in and of itself, but it's useful when
+you use it in the context of an $object->method call, the same way that
+a card number is useful in the context of some operation dealing with a
+card account.
+
+Moreover, several copies of the same object value all refer to the same
+object, the same way that making several copies of your card number
+won't change the fact that they all still refer to the same single
+account (this is true whether you're "copying" the number by just
+writing it down on different slips of paper, or whether you go to the
+trouble of forging exact replicas of your own plastic credit card).  That's
+why this:
+
+  $x = Net::FTP->new("ftp.aol.com");
+  $x->login("sburke", "aoeuaoeu");
+
+does the same thing as this:
+
+  $x = Net::FTP->new("ftp.aol.com");
+  $y = $x;
+  $z = $y;
+  $z->login("sburke", "aoeuaoeu");
+
+That is, $z and $y and $x are three different I<slots> for values,
+but what's in those slots are all object values pointing to the same
+object -- you don't have three different FTP connections, just three
+variables with values pointing to the same single FTP connection.
+
+* You can't tell much of anything about the object just by looking at
+the object value, any more than you can see your credit account balance
+by holding the plastic card up to the light, or by adding up the digits
+in your credit card number.
+
+* You can't just make up your own object values and have them work --
+they can come only from constructor methods of the appropriate class.
+Similarly, you get a credit card number I<only> by having a bank approve
+your application for a credit card account -- at which point I<they>
+let I<you> know what the number of your new card is.
+
+Now, there's even more to the fact that you can't just make up your own
+object value: even though you can print an object value and get a string
+like "Net::FTP=GLOB(0x20154240)", that string is just a
+I<representation> of an object value.
+
+Internally, an object value has a basically different type from a
+string, or a number, or the undefined value -- if $x holds a real
+string, then that value's slot in memory says "this is a value of type
+I<string>, and its characters are...", whereas if it's an object value,
+the value's slot in memory says, "this is a value of type I<reference>,
+and the location in memory that it points to is..." (and by looking at
+what's at that location, Perl can tell the class of what's there). 
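+
+To see this for yourself, here is a minimal sketch.  The class
+My::Account is a made-up stand-in (it is not a CPAN module), used only
+so that the example runs without a network connection; C<ref>, and
+C<blessed> and C<reftype> from the standard Scalar::Util module, are
+real:
+
+  use strict;
+  use warnings;
+  use Scalar::Util qw(blessed reftype);
+
+  package My::Account;   # hypothetical class, for illustration only
+  sub new { my ($class, %attr) = @_; return bless { %attr }, $class }
+
+  package main;
+
+  my $object = My::Account->new(balance => 100);
+  my $copy   = $object;          # copies the object value, not the object
+
+  print "$object\n";             # something like My::Account=HASH(0x...)
+  print ref($object), "\n";      # the class: My::Account
+  print blessed($object), "\n";  # also the class: My::Account
+  print reftype($object), "\n";  # the underlying structure: HASH
+  print "Same object\n" if $copy == $object;   # both point to one place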
+
+Perl programmers typically don't have to think about all these details
+of Perl's internals.  Many other languages force you to be more
+conscious of the differences between all of these (and also between
+types of numbers, which are stored differently depending on their size
+and whether they have fractional parts).  But Perl does its best to
+hide the different types of scalars from you -- it turns numbers into
+strings and back as needed, and takes the string or number
+representation of undef or of object values as needed.  However, you
+can't go from a string representation of an object value, back to an
+object value.  And that's why this doesn't work:
+
+   $x = Net::FTP->new('ftp.aol.com');
+   $y = Net::FTP->new('ftp.netcom.com');
+   $z = Net::FTP->new('ftp.qualcomm.com');
+   $all = join(' ', $x,$y,$z);           # !!!
+  ...later...
+   ($aol, $netcom, $qualcomm) = split(' ', $all);  # !!!
+   $aol->login("sburke", "aoeuaoeu");
+   $netcom->login("sburke", "qjkxqjkx");
+   $qualcomm->login("smb", "dhtndhtn");
+
+This fails because $aol ends up holding merely the B<string representation>
+of the object value from $x, not the object value itself -- when
+C<join> tried to join the characters of the "strings" $x, $y, and $z,
+Perl saw that they weren't strings at all, so it gave C<join> their
+string representations.
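+
+What does work is keeping the object values themselves, for example as
+the elements of an array or as the I<values> (not the keys) of a hash.
+A minimal sketch, using a made-up stand-in class My::Session in place
+of Net::FTP so that it runs without a network connection:
+
+  use strict;
+  use warnings;
+
+  package My::Session;   # hypothetical stand-in for a class like Net::FTP
+  sub new   { my ($class, $host) = @_; return bless { host => $host }, $class }
+  sub login { my ($self, $user) = @_; return "$user\@$self->{host}" }
+
+  package main;
+
+  my @sessions = map { My::Session->new($_) }
+                 ('ftp.example.com', 'ftp.example.org');
+  my %session_for = (com => $sessions[0], org => $sessions[1]);
+
+  # Array elements and hash values are still real object values:
+  print $_->login('sburke'), "\n" for @sessions;
+  print $session_for{org}->login('smb'), "\n";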
+
+Unfortunately, this distinction between object values and their string
+representations doesn't really fit into the analogy of credit card
+numbers, because credit card numbers really I<are> numbers -- even
+though they don't express any meaningful quantity, if you stored them
+in a database as a quantity (as opposed to just an ASCII string),
+that wouldn't stop them from being valid as credit card numbers.
+
+This may seem rather academic, but there are two common mistakes
+programmers new to objects often make, which make sense only in terms of
+the distinction between object values and their string representations:
+
+The first common error involves forgetting (or never having known in the
+first place) that when you go to use a value as a hash key, Perl uses
+the string representation of that value.  When you want to use the
+numeric value two and a half as a key, Perl turns it into the
+three-character string "2.5".  But if you then want to use that string
+as a number, Perl will treat it as meaning two and a half, so you're
+usually none the wiser that Perl converted the number to a string and
+back.  But recall that Perl can't turn strings back into objects -- so
+if you tried to use a Net::FTP object value as a hash key, Perl actually
+used its string representation, like "Net::FTP=GLOB(0x20154240)", but
+that string is unusable as an object value.  (Incidentally, there's
+a module Tie::RefHash that implements hashes that I<do> let you use
+real object-values as keys.)
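+
+Here is a minimal sketch of the difference.  Tie::RefHash is a real
+module that comes with Perl; the class My::Session is a made-up
+stand-in, there only so that the example has an object to use as a key:
+
+  use strict;
+  use warnings;
+  use Tie::RefHash;
+
+  package My::Session;   # hypothetical class, for illustration only
+  sub new  { my ($class, $host) = @_; return bless { host => $host }, $class }
+  sub host { my $self = shift; return $self->{host} }
+
+  package main;
+
+  my $session = My::Session->new('ftp.example.com');
+
+  my %plain;                          # an ordinary hash
+  tie my %by_object, 'Tie::RefHash';  # a hash that keeps real references
+
+  $plain{$session}     = 'some note';
+  $by_object{$session} = 'some note';
+
+  for my $key (keys %plain) {
+    print ref($key) ? "an object\n" : "just a string\n";  # just a string
+  }
+  for my $key (keys %by_object) {
+    print $key->host, "\n";           # still an object: ftp.example.com
+  }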
+
+The second common error with object values is in
+trying to save an object value to disk (whether printing it to a
+file, or storing it in a conventional database file).  All you'll get is the
+string, which will be useless.
+
+When you want to save an object and restore it later, you may find that
+the object's class already provides a method specifically for this.  For
+example, MIDI::Opus provides methods for writing an object to disk as a
+standard MIDI file.  The file can later be read back into memory by
+a MIDI::Opus constructor method, which will return a new MIDI::Opus
+object representing whatever file you tell it to read into memory.
+Similar methods are available with, for example, classes that
+manipulate graphic images and can save them to files, which can be read
+back later.
+
+But some classes, like Business::US_Amort, provide no such methods for
+storing an object in a file.  When this is the case, you can try
+using any of the Data::Dumper, Storable, or FreezeThaw modules.  Using
+these will be unproblematic for objects of most classes, but it may run
+into limitations with others.  For example, a Business::US_Amort
+object can be turned into a string with Data::Dumper, and that string
+written to a file.  When it's restored later, its attributes will be
+accessible as normal.  But in the unlikely case that the loan object was
+saved in mid-calculation, the calculation may not be resumable.  This is
+because of the way that that I<particular> class does its calculations,
+but similar limitations may occur with objects from other classes.
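+
+As a minimal sketch of that approach, here is the Storable module
+turning an object into a string of bytes and back.  The class My::Loan
+is a made-up stand-in with a single C<term> accessor (it is not
+Business::US_Amort), used only to keep the example self-contained:
+
+  use strict;
+  use warnings;
+  use Storable qw(freeze thaw);
+
+  package My::Loan;   # hypothetical class, for illustration only
+  sub new  { my ($class, %attr) = @_; return bless { %attr }, $class }
+  sub term {
+    my $self = shift;
+    $self->{term} = shift if @_;   # set, if given a value
+    return $self->{term};          # and always return the value
+  }
+
+  package main;
+
+  my $loan     = My::Loan->new(term => 20);
+  my $frozen   = freeze($loan);   # bytes you could write to a file
+  my $restored = thaw($frozen);   # a new object, same class and data
+
+  print ref($restored), " with term ", $restored->term, "\n";
+
+Note that C<$restored> is a separate object from C<$loan>: changing one
+afterward does not change the other.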
+
+But often, even I<wanting> to save an object is basically wrong -- what would
+saving an ftp I<session> even mean?  Saving the hostname, username, and
+password?  current directory on both machines?  the local TCP/IP port
+number?  In the case of "saving" a Net::FTP object, you're better off
+just saving whatever details you actually need for your own purposes,
+so that you can make a new object later and just set those values for it.
+
+=head2 So Why Do Some Modules Use Objects?
+
+All these details of using objects are definitely enough to make you
+wonder -- is it worth the bother?  If you're a module author, writing
+your module with an object-oriented interface restricts the audience of
+potential users to those who understand the basic concepts of objects
+and object values, as well as Perl's syntax for calling methods.  Why
+complicate things by having an object-oriented interface?
+
+A somewhat esoteric answer is that a module has an object-oriented
+interface because the module's insides are written in an
+object-oriented style.  This article is about the basics of
+object-oriented I<interfaces>, and it'd be going far afield to explain
+what object-oriented I<design> is.  But the short story is that
+object-oriented design is just one way of attacking messy problems.
+It's a way that many programmers find very helpful (and which others
+happen to find to be far more of a hassle than it's worth,
+incidentally), and it just happens to show up for you, the module user,
+as merely the style of interface. 
+
+But a simpler answer is that a functional interface is sometimes a
+hindrance, because it limits the number of things you can do at once --
+limiting it, in fact, to one.  For many problems that some modules are
+meant to solve, doing without an object-oriented interface would be like
+wishing that Perl didn't use filehandles.  The ideas are rather simpler
+-- just imagine that Perl let you access files, but I<only> one at a
+time, with code like:
+
+  open("foo.txt") || die "Can't open foo.txt: $!";
+  while(readline) {
+    print $_ if /bar/;
+  }
+  close;
+
+That hypothetical kind of Perl would be simpler, by doing without
+filehandles.  But you'd be out of luck if you wanted to read from
+one file while reading from another, or read from two and print to a
+third.
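+
+For contrast, here is a minimal sketch of what filehandles make
+possible in real Perl: reading from two sources in the same loop.  To
+keep the example self-contained, it opens two in-memory strings (whose
+contents I made up) instead of files on disk:
+
+  use strict;
+  use warnings;
+
+  my ($text_a, $text_b) = ("a1\na2\n", "b1\nb2\n");
+  open(my $in_a, '<', \$text_a) || die "Can't open first source: $!";
+  open(my $in_b, '<', \$text_b) || die "Can't open second source: $!";
+
+  # Take one line from each source in turn:
+  my @merged;
+  while (defined(my $line_a = <$in_a>)) {
+    my $line_b = <$in_b>;
+    last unless defined $line_b;
+    push @merged, $line_a, $line_b;
+  }
+  close($in_a);
+  close($in_b);
+
+  print @merged;
+
+Each filehandle remembers its own position, which is exactly what the
+one-file-at-a-time design above could not do.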
+
+In the same way, a functional FTP module would be fine for just
+uploading files to one server at a time, but it wouldn't allow you to
+easily write programs that need to use I<several> simultaneous
+sessions (like "look at server A and server B, and if A has a file
+called X.dat, then download it locally and then upload it to server B --
+except if B has a file called Y.dat, in which case do it the other way
+around").
+
+Some kinds of problems that modules solve just lend themselves to an
+object-oriented interface.  For those kinds of tasks, a functional
+interface would be more familiar, but less powerful.  Learning to use
+object-oriented modules' interfaces does require becoming comfortable
+with the concepts from this article.  But in the end it will allow you
+to use a broader range of modules and, with them, to write programs
+that can do more.
+
+B<[end body of article]>
+
+=head2 [Author Credit]
+
+Sean M. Burke has contributed several modules to CPAN, about half of
+them object-oriented.
+
+[The next section should be in a greybox:]
+
+=head2 The Gory Details
+
+For sake of clarity of explanation, I had to oversimplify some of the
+facts about objects.  Here's a few of the gorier details:
+
+* Every example I gave of a constructor was a class method.  But object
+methods can be constructors, too, if the class was written to work that
+way: $new = $old->copy, $node_y = $node_x->new_subnode, or the like.
+
+* I've given the impression that there's two kinds of methods: object
+methods and class methods.  In fact, the same method can be both,
+because it's not the kind of method it is, but the kind of calls it's
+written to accept -- calls that pass an object, or calls that pass a
+class-name.
+
+* The term "object value" isn't something you'll find used much anywhere
+else.  It's just my shorthand for what would properly be called an
+"object reference" or "reference to a blessed item".  In fact, people
+usually say "object" when they properly mean a reference to that object.
+
+* I mentioned creating objects with I<con>structors, but I didn't
+mention destroying them with I<de>structors -- a destructor is a kind of
+method that you call to tidy up the object once you're done with it, and
+want it to neatly go away (close connections, delete temporary files,
+free up memory, etc).  But because of the way Perl handles memory,
+most modules won't require the user to know about destructors.
+
+* I said that class method syntax has to have the class name, as in
+$session = B<Net::FTP>->new($host).  Actually, you can instead use any
+expression that returns a class name: $ftp_class = 'Net::FTP'; $session
+= B<$ftp_class>->new($host).  Moreover, instead of the method name for
+object- or class-method calls, you can use a scalar holding the method
+name: $foo->B<$method>($host).  But, in practice, these syntaxes are
+rarely useful.
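+
+As a minimal sketch of the first and last points above, here is a tiny
+class invented purely for illustration (C<Counter>, C<new>, and C<bump>
+are hypothetical names, not part of any real module).  It is called with
+the class name and the method name held in scalars, and its constructor
+is written to accept being called on an object as well as on a class:
+
+  use strict;
+  use warnings;
+
+  # Hypothetical class, defined inline only so the example runs.
+  package Counter;
+  sub new {
+    my $invocant = shift;
+    # Accept either a class name or an existing object.
+    my $class = ref($invocant) || $invocant;
+    return bless { n => 0 }, $class;
+  }
+  sub bump { my $self = shift; return ++$self->{n} }
+
+  package main;
+
+  my $class_name = 'Counter';         # any expression yielding a class name
+  my $counter    = $class_name->new;  # same as Counter->new
+
+  my $method = 'bump';                # a scalar holding a method name
+  print $counter->$method(), "\n";    # same as $counter->bump; prints 1
+
+  my $another = $counter->new;        # "new" called as an object method
+  print ref($another), "\n";          # prints Counter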
+
+And finally, to learn about objects from the perspective of writing
+your own classes, see the C<perltoot> documentation,
+or Damian Conway's exhaustive and clear book I<Object Oriented Perl>
+(Manning Publications 1999, ISBN 1-884777-79-1).
+
+=head1 BACK
+
+Return to the L<HTML::Tree|HTML::Tree> docs.
+
+=cut
+
--- a/database/perl/vendor/lib/HTML/Tree/AboutTrees.pod
+++ b/database/perl/vendor/lib/HTML/Tree/AboutTrees.pod
--- a/database/perl/vendor/lib/HTML/Tree/Scanning.pod
+++ b/database/perl/vendor/lib/HTML/Tree/Scanning.pod
@@ -0,0 +1,714 @@
+
+#Time-stamp: "2001-03-10 23:19:11 MST" -*-Text-*-
+# This document contains text in Perl "POD" format.
+# Use a POD viewer like perldoc or perlman to render it.
+
+=head1 NAME
+
+HTML::Tree::Scanning -- article: "Scanning HTML"
+
+=head1 SYNOPSIS
+
+  # This is an article, not a module.
+
+=head1 DESCRIPTION
+
+The following article by Sean M. Burke first appeared in I<The Perl
+Journal> #19 and is copyright 2000 The Perl Journal. It appears
+courtesy of Jon Orwant and The Perl Journal.  This document may be
+distributed under the same terms as Perl itself.
+
+(Note that this is discussed in chapters 6 through 10 of the
+book I<Perl and LWP> L<http://lwp.interglacial.com/> which
+was written after the following documentation, and which is
+available free online.)
+
+=head1 Scanning HTML
+
+-- Sean M. Burke
+
+In I<The Perl Journal> issue 17, Ken MacFarlane's article "Parsing
+HTML with HTML::Parser" describes how the HTML::Parser module scans
+HTML source as a stream of start-tags, end-tags, text, comments, etc.
+In TPJ #18, my "Trees" article kicked around the idea of tree-shaped
+data structures.  Now I'll try to tie it together, in a discussion of
+HTML trees.
+
+The CPAN module HTML::TreeBuilder takes the
+tags that HTML::Parser picks out, and builds a parse tree -- a
+tree-shaped network of objects...
+
+=over
+
+Footnote:
+And if you need a quick explanation of objects, see my TPJ17 article "A
+User's View of Object-Oriented Modules"; or go whole hog and get Damian
+Conway's excellent book I<Object-Oriented Perl>, from Manning
+Publications.
+
+=back
+
+...representing the structured content of the HTML document.  And once
+the document is parsed as a tree, you'll find the common tasks
+of extracting data from that HTML document/tree to be quite
+straightforward.
+
+=head2 HTML::Parser, HTML::TreeBuilder, and HTML::Element
+
+You use HTML::TreeBuilder to make a parse tree out of an HTML source
+file, by simply saying:
+
+  use HTML::TreeBuilder;
+  my $tree = HTML::TreeBuilder->new();
+  $tree->parse_file('foo.html');
+
+and then C<$tree> contains a parse tree built from the HTML source from
+the file "foo.html".  The way this parse tree is represented is with a
+network of objects -- C<$tree> is the root, an element with tag-name
+"html", and its children typically include a "head" and "body" element,
+and so on.  Elements in the tree are objects of the class
+HTML::Element.
+
+So, if you take this source:
+
+  <html><head><title>Doc 1</title></head>
+  <body>
+  Stuff <hr> 2000-08-17
+  </body></html>
+ 
+and feed it to HTML::TreeBuilder, it'll return a tree of objects that
+looks like this:
+
+               html
+             /      \
+         head        body
+        /          /   |  \
+     title    "Stuff"  hr  "2000-08-17"
+       |
+    "Doc 1"
+
+This is a pretty simple document, but if it were any more complex,
+it'd be a bit hard to draw in that style, since it'd sprawl left and
+right.  The same tree can be represented a bit more easily sideways,
+with indenting:
+
+  . html
+     . head
+        . title
+           . "Doc 1"
+     . body
+        . "Stuff"
+        . hr
+        . "2000-08-17"
+
+Either way expresses the same structure.  In that structure, the root
+node is an object of the class HTML::Element
+
+=over
+
+Footnote:
+Well actually, the root is of the class HTML::TreeBuilder, but that's
+just a subclass of HTML::Element, plus the few extra methods like
+C<parse_file> that elaborate the tree.
+
+=back
+
+, with the tag name "html", and with two children: HTML::Element
+objects whose tag names are "head" and "body".  And each of those
+elements has children, and so on down.  Not all elements (as we'll
+call the objects of class HTML::Element) have children -- the "hr"
+element doesn't.  And not all nodes in the tree are elements -- the
+text nodes ("Doc 1", "Stuff", and "2000-08-17") are just strings.
+
+Objects of the class HTML::Element each have three noteworthy attributes:
+
+=over
+
+=item C<_tag> -- (best accessed as C<$e-E<gt>tag>)
+this element's tag-name, lowercased (e.g., "em" for an "em" element).
+
+=over
+
+Footnote: Yes, this is misnamed.  In proper SGML terminology, this is
+instead called a "GI", short for "generic identifier"; and the term
+"tag" is used for a token of SGML source that represents either
+the start of an element (a start-tag like "<em lang='fr'>") or the end
+of an element (an end-tag like "</em>").  However, since more people
+claim to have been abducted by aliens than to have ever seen the
+SGML standard, and since both encounters typically involve a feeling of
+"missing time", it's not surprising that the terminology of the SGML
+standard is not closely followed.
+
+=back
+
+=item C<_parent> -- (best accessed as C<$e-E<gt>parent>)
+the element that is C<$e>'s parent, or undef if this element is the
+root of its tree.
+
+=item C<_content> -- (best accessed as C<$e-E<gt>content_list>)
+the list of nodes (i.e., elements or text segments) that are C<$e>'s
+children.
+
+=back
+
+Moreover, if an element object has any attributes in the SGML sense of
+the word, then those are readable as C<$e-E<gt>attr('name')> -- for
+example, with the object built from having parsed "E<lt>a
+B<id='foo'>E<gt>barE<lt>/aE<gt>", C<$e-E<gt>attr('id')> will return
+the string "foo".  Moreover, C<$e-E<gt>tag> on that object returns the
+string "a", C<$e-E<gt>content_list> returns a list consisting of just
+the single scalar "bar", and C<$e-E<gt>parent> returns the object
+that's this node's parent -- which may be, for example, a "p" element.
+
+And that's all that there is to it -- you throw HTML
+source at TreeBuilder, and it returns a tree built of HTML::Element
+objects and some text strings.
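+
+To make those accessors concrete, here is a minimal sketch that parses
+the one-line fragment discussed above, wrapped in a "p" element.  The
+wrapping and the variable names are assumptions made for the example,
+and it borrows the C<look_down> method, explained later in this article,
+just to find the "a" element:
+
+  use strict;
+  use warnings;
+  use HTML::TreeBuilder;
+
+  my $tree = HTML::TreeBuilder->new;
+  $tree->parse(q{<p><a id='foo'>bar</a></p>});
+  $tree->eof;    # tell the parser there's no more source coming
+
+  my $e = $tree->look_down('_tag', 'a');
+  print $e->tag, "\n";                      # a
+  print $e->attr('id'), "\n";               # foo
+  print join('|', $e->content_list), "\n";  # bar
+  print $e->parent->tag, "\n";              # p
+  $tree->delete;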
+
+However, what do you I<do> with a tree of objects?  People code
+information into HTML trees not for the fun of arranging elements, but
+to represent the structure of specific text and images -- some text is
+in this "li" element, some other text is in that heading, some
+images are in that other table cell that has those attributes, and so on.
+
+Now, it may happen that you're rendering that whole HTML tree into some
+layout format.  Or you could be trying to make some systematic change to
+the HTML tree before dumping it out as HTML source again.  But, in my
+experience, by far the most common programming task that Perl
+programmers face with HTML is in trying to extract some piece
+of information from a larger document.  Since that's so common (and
+also since it involves concepts that are basic to more complex tasks),
+that is what the rest of this article will be about.
+
+=head2 Scanning HTML trees
+
+Suppose you have a thousand HTML documents, each of them a press
+release.  They all start out:
+
+  [...lots of leading images and junk...]
+  <h1>ConGlomCo to Open New Corporate Office in Ouagadougou</h1>
+  BAKERSFIELD, CA, 2000-04-24 -- ConGlomCo's vice president in charge
+  of world conquest, Rock Feldspar, announced today the opening of a
+  new office in Ouagadougou, the capital city of Burkina Faso, gateway
+  to the bustling "Silicon Sahara" of Africa...
+  [...etc...]
+
+...and what you've got to do is, for each document, copy whatever text
+is in the "h1" element, so that you can, for example, make a table of
+contents of it.  Now, there are three ways to do this:
+
+=over
+
+=item * You can just use a regexp to scan the file for a text pattern.
+
+For many very simple tasks, this will do fine.  Many HTML documents are,
+in practice, very consistently formatted as far as placement of
+linebreaks and whitespace, so you could just get away with scanning the
+file like so:
+
+  sub get_heading {
+    my $filename = $_[0];
+    local *HTML;
+    open(HTML, $filename)
+      or die "Couldn't open $filename: $!";
+    my $heading;
+   Line:
+    while(<HTML>) {
+      if( m{<h1>(.*?)</h1>}i ) {  # match it!
+        $heading = $1;
+        last Line;
+      }
+    }
+    close(HTML);
+    warn "No heading in $filename?"
+     unless defined $heading;
+    return $heading;
+  }
+
+This is quick to write and fast to run, but awfully fragile -- if
+there's a newline in the middle of a heading's text, it won't match the
+above regexp, and you'll get a warning and no heading.  The regexp will
+also fail if the "h1" element's start-tag has any attributes.  If you
+have to adapt your code to fit more kinds of start-tags, you'll end up
+basically reinventing part of HTML::Parser, at which point you should
+probably just stop, and use HTML::Parser itself:
+
+=item * You can use HTML::Parser to scan the file for an "h1" start-tag
+token, then capture all the text tokens until the "h1" close-tag.  This
+approach is extensively covered in Ken MacFarlane's TPJ17 article
+"Parsing HTML with HTML::Parser".  (A variant of this approach is to use
+HTML::TokeParser, which presents a different and rather handier
+interface to the tokens that HTML::Parser picks out.)
+
+Using HTML::Parser is less fragile than our first approach, since it's
+not sensitive to the exact internal formatting of the start-tag (much
+less whether it's split across two lines).  However, when you need more
+information about the context of the "h1" element, or if you're having
+to deal with any of the tricky bits of HTML, such as parsing of tables,
+you'll find that the flat list of tokens that HTML::Parser returns
+isn't immediately useful.  To get something useful out of those tokens,
+you'll need to write code that knows some things about what elements
+take no content (as with "hr" elements), and that "</p>" end-tags
+are omissible, so a "<p>" will end any currently
+open paragraph -- and you're well on your way to pointlessly
+reinventing much of the code in HTML::TreeBuilder
+
+=over
+
+Footnote:
+And, as the person who last rewrote that module, I can attest that it
+wasn't terribly easy to get right!  Never underestimate the perversity
+of people coding HTML.
+
+=back
+
+, at which point you should probably just stop, and use
+HTML::TreeBuilder itself:
+
+=item * You can use HTML::TreeBuilder, and scan the tree of element
+objects that you get back.
+
+=back
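+
+To see how fragile the regexp approach in the first item is, this sketch
+runs the same pattern, line by line, over three made-up headings (the
+sample strings are invented for the example):
+
+  use strict;
+  use warnings;
+
+  my @samples = (
+    qq{<h1>Plain heading</h1>},
+    qq{<h1 class="top">Heading with an attribute</h1>},
+    qq{<h1>Heading split\nacross lines</h1>},
+  );
+
+  foreach my $html (@samples) {
+    my $heading;
+    # Mimic reading a file one line at a time.
+    foreach my $line (split /^/, $html) {
+      if( $line =~ m{<h1>(.*?)</h1>}i ) {
+        $heading = $1;
+        last;
+      }
+    }
+    print defined($heading) ? "matched: $heading\n" : "no match\n";
+  }
+
+Only the first sample matches; the other two print "no match", for
+exactly the reasons given above.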
+
+The last approach, using HTML::TreeBuilder, is the diametric opposite of
+the first approach:  The first approach involves just elementary Perl and one
+regexp, whereas the TreeBuilder approach involves being at home with
+the concept of tree-shaped data structures and modules with
+object-oriented interfaces, as well as with the particular interfaces
+that HTML::TreeBuilder and HTML::Element provide.
+
+However, what the TreeBuilder approach has going for it is that it's
+the most robust, because it involves dealing with HTML in its "native"
+format -- it deals with the tree structure that HTML code represents,
+without any consideration of how the source is coded and with what
+tags omitted.
+
+So, to extract the text from the "h1" elements of an HTML document:
+
+  sub get_heading {
+    my $tree = HTML::TreeBuilder->new;
+    $tree->parse_file($_[0]);   # !
+    my $heading;
+    my $h1 = $tree->look_down('_tag', 'h1');  # !
+    if($h1) {
+      $heading = $h1->as_text;   # !
+    } else {
+      warn "No heading in $_[0]?";
+    }
+    $tree->delete; # clear memory!
+    return $heading;
+  }
+
+This uses some unfamiliar methods that need explaining.  The
+C<parse_file> method, which we've seen before, builds a tree based on
+source from the file given.  The C<delete> method is for marking a
+tree's contents as available for garbage collection, when you're done
+with the tree.  The C<as_text> method returns a string that contains
+all the text bits that are children (or otherwise descendants) of the
+given node -- to get the text content of the C<$h1> object, we could
+just say:
+
+  $heading = join '', $h1->content_list;
+
+but that will work only if we're sure that the "h1" element's children
+will be only text bits -- if the document contained:
+
+  <h1>Local Man Sees <cite>Blade</cite> Again</h1>
+
+then the sub-tree would be:
+
+  . h1
+    . "Local Man Sees "
+    . cite
+      . "Blade"
+    . " Again"
+
+so C<join '', $h1-E<gt>content_list> will be something like:
+
+  Local Man Sees HTML::Element=HASH(0x15424040) Again
+
+whereas C<$h1-E<gt>as_text> would yield:
+
+  Local Man Sees Blade Again
+
+and depending on what you're doing with the heading text, you might
+want the C<as_HTML> method instead.  It returns the (sub)tree
+represented as HTML source.  C<$h1-E<gt>as_HTML> would yield:
+
+  <h1>Local Man Sees <cite>Blade</cite> Again</h1>
+
+However, if you wanted the contents of C<$h1> as HTML, but not the
+C<$h1> itself, you could say:
+
+  join '',
+    map(
+      ref($_) ? $_->as_HTML : $_,
+      $h1->content_list
+    )
+
+This C<map> iterates over the nodes in C<$h1>'s list of children; and
+for each node that's just a text bit (as "Local Man Sees " is), it just
+passes through that string value, and for each node that's an actual
+object (causing C<ref> to be true), C<as_HTML> will be used instead of the
+string value of the object itself (which would be something quite
+useless, as most object values are).  So that C<as_HTML> for the "cite"
+element will be the string "<cite>BladeE<lt>/cite>".  And then,
+finally, C<join> just puts into one string all the strings that the
+C<map> returns.
+
+Last but not least, the most important method in our C<get_heading> sub
+is the C<look_down> method.  This method looks down at the subtree
+starting at the given object (C<$tree>), looking for elements that meet
+criteria you provide.
+
+The criteria are specified in the method's argument list.  Each
+criterion can consist of two scalars, a key and a value, which express
+that you want elements that have that attribute (like "_tag", or
+"src") with the given value ("h1"); or the criterion can be a
+reference to a subroutine that, when called on the given element,
+returns true if that is a node you're looking for.  If you specify
+several criteria, then that's taken to mean that you want all the
+elements that each satisfy I<all> the criteria.  (In other words,
+there's an "implicit AND".)
+
+And finally, there's a bit of an optimization -- if you call the
+C<look_down> method in a scalar context, you get just the I<first> node
+(or undef if none) -- and, in fact, once C<look_down> finds that first
+matching element, it doesn't bother looking any further.
+
+So the example:
+
+  $h1 = $tree->look_down('_tag', 'h1');
+
+returns the first element at-or-under C<$tree> whose C<"_tag">
+attribute has the value C<"h1">.
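+
+The different kinds of criteria, and the difference between list and
+scalar context, can be sketched on a small made-up fragment (the HTML
+and the variable names here are assumptions for illustration only):
+
+  use strict;
+  use warnings;
+  use HTML::TreeBuilder;
+
+  my $tree = HTML::TreeBuilder->new;
+  $tree->parse(
+    q{<p><a href="/a" class="x">one</a> }
+    . q{<a href="/b">two</a> }
+    . q{<a href="/c" class="x">three</a></p>}
+  );
+  $tree->eof;
+
+  # List context: every element that satisfies all the criteria.
+  my @all_links = $tree->look_down('_tag', 'a');
+  my @x_links   = $tree->look_down('_tag', 'a', 'class', 'x');
+
+  # Scalar context: just the first match, or undef if there is none.
+  my $to_b = $tree->look_down(
+    '_tag', 'a',
+    sub { ($_[0]->attr('href') || '') =~ m{/b} },
+  );
+
+  print scalar(@all_links), "\n";   # 3
+  print scalar(@x_links), "\n";     # 2
+  print $to_b->as_text, "\n";       # two
+  $tree->delete;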
+
+=head2 Complex Criteria in Tree Scanning
+
+Now, the above C<look_down> code looks like a lot of bother, with
+barely more benefit than just grepping the file!  But consider if your
+criteria were more complicated -- suppose you found that some of the
+press releases that you were scanning had several "h1" elements,
+possibly before or after the one you actually want.  For example:
+
+  <h1><center>Visit Our Corporate Partner
+   <br><a href="/dyna/clickthru"
+     ><img src="/dyna/vend_ad"></a>
+  </center></h1>
+  <h1><center>ConGlomCo President Schreck to Visit Regional HQ
+   <br><a href="/photos/Schreck_visit_large.jpg"
+     ><img src="/photos/Schreck_visit.jpg"></a>
+  </center></h1>
+
+Here, you want to ignore the first "h1" element because it contains an
+ad, and you want the text from the second "h1".  The problem is in
+formalizing the way you know that it's an ad.  Since ad banners are
+always entreating you to "visit" the sponsoring site, you could exclude
+"h1" elements that contain the word "visit" under them:
+
+  my $real_h1 = $tree->look_down(
+    '_tag', 'h1',
+    sub {
+      $_[0]->as_text !~ m/\bvisit/i
+    }
+  );
+
+The first criterion looks for "h1" elements, and the second criterion
+limits those to only the ones whose text content doesn't match
+C<m/\bvisit/>.  But unfortunately, that won't work for our example,
+since the second "h1" mentions "ConGlomCo President Schreck to
+I<Visit> Regional HQ".
+
+Instead you could try looking for the first "h1" element that
+doesn't contain an image:
+
+  my $real_h1 = $tree->look_down(
+    '_tag', 'h1',
+    sub {
+      not $_[0]->look_down('_tag', 'img')
+    }
+  );
+
+This criterion sub might seem a bit odd, since it calls C<look_down>
+as part of a larger C<look_down> operation, but that's fine.  Note that
+when considered as a boolean value, a C<look_down> in a scalar context
+returns a false value (specifically, undef) if there's no matching element
+at or under the given element; and it returns the first matching
+element (which, being a reference and object, is always a true value),
+if any matches.  So, here, 
+
+  sub {
+    not $_[0]->look_down('_tag', 'img')
+  }
+
+means "return true only if this element has no 'img' element as
+descendants (and isn't an 'img' element itself)."
+
+This correctly filters out the first "h1" that contains the ad, but it
+also incorrectly filters out the second "h1" that contains a
+non-advertisement photo besides the headline text you want.
+
+There clearly are detectable differences between the first and second
+"h1" elements -- only the second one contains the string "Schreck", and
+we could just test for that:
+
+  my $real_h1 = $tree->look_down(
+    '_tag', 'h1',
+    sub {
+      $_[0]->as_text =~ m{Schreck}
+    }
+  );
+
+And that works fine for this one example, but unless all thousand of
+your press releases have "Schreck" in the headline, that's just not a
+general solution.  However, if all the ads-in-"h1"s that you want to
+exclude involve a link whose URL involves "/dyna/", then you can use
+that:
+
+  my $real_h1 = $tree->look_down(
+    '_tag', 'h1',
+    sub {
+      my $link = $_[0]->look_down('_tag','a');
+      return 1 unless $link;
+        # no link means it's fine
+      return 0 if $link->attr('href') =~ m{/dyna/};
+        # a link to there is bad
+      return 1; # otherwise okay
+    }
+  );
+
+Or you can look at it another way and say that you want the first "h1"
+element that either contains no images, or else whose image has a "src"
+attribute whose value contains "/photos/":
+
+  my $real_h1 = $tree->look_down(
+    '_tag', 'h1',
+    sub {
+      my $img = $_[0]->look_down('_tag','img');
+      return 1 unless $img;
+        # no image means it's fine
+      return 1 if $img->attr('src') =~ m{/photos/};
+        # good if a photo
+      return 0; # otherwise bad
+    }
+  );
+
+Recall that this use of C<look_down> in a scalar context means to return
+the first element at or under C<$tree> that matches all the criteria.
+But if you notice that you can formulate criteria that'll match several
+possible "h1" elements, some of which may be bogus but the I<last> one
+of which is always the one you want, then you can use C<look_down> in a
+list context, and just use the last element of that list:
+
+  my @h1s = $tree->look_down(
+    '_tag', 'h1',
+    ...maybe more criteria...
+  );
+  die "What, no h1s here?" unless @h1s;
+  my $real_h1 = $h1s[-1]; # last or only
+
+=head2 A Case Study: Scanning Yahoo News's HTML
+
+The above (somewhat contrived) case involves extracting data from a
+bunch of pre-existing HTML files.  In that sort of situation, if your
+code works for all the files, then you know that the code I<works> --
+since the data it's meant to handle won't go changing or growing; and,
+typically, once you've used the program, you'll never need to use it
+again.
+
+The other kind of situation faced in many data extraction tasks is
+where the program is used recurringly to handle new data -- such as
+from ever-changing Web pages.  As a real-world example of this,
+consider a program that you could use (suppose it's crontabbed) to
+extract headline-links from subsections of Yahoo News
+(C<http://dailynews.yahoo.com/>).
+
+Yahoo News has several subsections:
+
+=over
+
+=item http://dailynews.yahoo.com/h/tc/ for technology news
+
+=item http://dailynews.yahoo.com/h/sc/ for science news
+
+=item http://dailynews.yahoo.com/h/hl/ for health news
+
+=item http://dailynews.yahoo.com/h/wl/ for world news
+
+=item http://dailynews.yahoo.com/h/en/ for entertainment news
+
+=back
+
+and others.  All of them are built on the same basic HTML template --
+and a scarily complicated template it is, especially when you look at
+it with an eye toward making up rules that will select where the real
+headline-links are, while screening out all the links to other parts of
+Yahoo, other news services, etc.  You will need to puzzle
+over the HTML source, and scrutinize the output of
+C<$tree-E<gt>dump> on the parse tree of that HTML.
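+
+For instance, here is a minimal sketch of dumping a parse tree, using
+the small document from earlier in this article as a stand-in for a
+real Yahoo News page:
+
+  use strict;
+  use warnings;
+  use HTML::TreeBuilder;
+
+  my $tree = HTML::TreeBuilder->new;
+  $tree->parse(
+    q{<html><head><title>Doc 1</title></head>}
+    . q{<body>Stuff <hr> 2000-08-17</body></html>}
+  );
+  $tree->eof;
+  $tree->dump;     # prints an indented outline of the tree to STDOUT
+  $tree->delete;
+
+On a real page the outline runs to hundreds of lines, so it usually
+helps to send it to a file and search that for a known headline.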
+
+Sometimes the only way to pin down what you're after is by position in
+the tree. For example, headlines of interest may be in the third
+column of the second row of the second table element in a page:
+
+  my $table = ( $tree->look_down('_tag','table') )[1];
+  my $row2  = ( $table->look_down('_tag', 'tr' ) )[1];
+  my $col3  = ( $row2->look_down('_tag', 'td')   )[2];
+  # ...then do things with $col3...
+
+Or they may be all the links in a "p" element that has at least three
+"br" elements as children:
+
+  my $p = $tree->look_down(
+    '_tag', 'p',
+    sub {
+      2 < grep { ref($_) and $_->tag eq 'br' }
+               $_[0]->content_list
+    }
+  );
+  @links = $p->look_down('_tag', 'a');
+
+But almost always, you can get away with looking for properties of the
+thing itself, rather than just looking for contexts.  Now, if
+you're lucky, the document you're looking through has clear semantic
+tagging, such as is useful in CSS -- note the
+class="headlinelink" bit here:
+
+  <a href="...long_news_url..." class="headlinelink">Elvis
+  seen in tortilla</a>
+
+If you find anything like that, you could leap right in and select
+links with:
+
+  @links = $tree->look_down('class','headlinelink');
+
+Regrettably, your chances of seeing any sort of semantic markup
+principles really being followed with actual HTML are pretty thin.
+
+=over
+
+Footnote:
+In fact, your chances of finding a page that is simply free of HTML
+errors are even thinner.  And surprisingly, sites like Amazon or Yahoo
+are typically worse as far as quality of code than personal sites
+whose entire production cycle involves simply being saved and uploaded
+from Netscape Composer.
+
+=back
+
+The code may be sort of "accidentally semantic", however -- for example,
+in a set of pages I was scanning recently, I found that looking for
+"td" elements with a "width" attribute value of "375" got me exactly
+what I wanted.  No-one designing that page ever conceived of
+"width=375" as I<meaning> "this is a headline", but if you impute it
+to mean that, it works.
+
+An approach like this happens to work for the Yahoo News code, because
+the headline-links are distinguished by the fact that they (and they
+alone) contain a "b" element:
+
+  <a href="...long_news_url..."><b>Elvis seen in tortilla</b></a>
+
+or, diagrammed as a part of the parse tree:
+
+  . a  [href="...long_news_url..."]
+    . b
+      . "Elvis seen in tortilla"
+
+A rule that matches these can be formalized as "look for any 'a'
+element that has only one daughter node, which must be a 'b' element".
+And this is what it looks like when cooked up as a C<look_down>
+expression and prefaced with a bit of code that retrieves the text of
+the given Yahoo News page and feeds it to TreeBuilder:
+
+  use strict;
+  use HTML::TreeBuilder 2.97;
+  use LWP::UserAgent;
+  sub get_headlines {
+    my $url = $_[0] || die "What URL?";
+    
+    my $response = LWP::UserAgent->new->request(
+      HTTP::Request->new( GET => $url )
+    );
+    unless($response->is_success) {
+      warn "Couldn't get $url: ", $response->status_line, "\n";
+      return;
+    }
+    
+    my $tree = HTML::TreeBuilder->new();
+    $tree->parse($response->content);
+    $tree->eof;
+    
+    my @out;
+    foreach my $link (
+      $tree->look_down(   # !
+        '_tag', 'a',
+        sub {
+          return unless $_[0]->attr('href');
+          my @c = $_[0]->content_list;
+          @c == 1 and ref $c[0] and $c[0]->tag eq 'b';
+        }
+      )
+    ) {
+      push @out, [ $link->attr('href'), $link->as_text ];
+    }
+    
+    warn "Odd, fewer than 6 stories in $url!" if @out < 6;
+    $tree->delete;
+    return @out;
+  }
+
+...and add a bit of code to actually call that routine and display the
+results...
+
+  foreach my $section (qw[tc sc hl wl en]) {
+    my @links = get_headlines(
+      "http://dailynews.yahoo.com/h/$section/"
+    );
+    print
+      $section, ": ", scalar(@links), " stories\n",
+      map(("  ", $_->[0], " : ", $_->[1], "\n"), @links),
+      "\n";
+  }
+
+And we've got our own headline-extractor service!  This in and of
+itself isn't so amazingly useful (since if you want to see the
+headlines, you I<can> just look at the Yahoo News pages), but it could
+easily be the basis for quite useful features like filtering the
+headlines for those matching certain keywords of interest to you.
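+
+As a rough sketch of such a filter, assume a list of [URL, headline
+text] pairs shaped like the ones C<get_headlines> returns, and a keyword
+list of your own.  The sample data and the name C<filter_headlines> are
+made up for illustration:
+
+  use strict;
+  use warnings;
+
+  # Keep only the [url, text] pairs whose text contains any keyword
+  # as a whole word, ignoring case.
+  sub filter_headlines {
+    my($keywords, @links) = @_;
+    return unless @$keywords;
+    my $pattern = join '|', map quotemeta, @$keywords;
+    return grep { $_->[1] =~ m/\b(?:$pattern)\b/i } @links;
+  }
+
+  my @links = (
+    [ 'http://example.com/1', 'Elvis seen in tortilla' ],
+    [ 'http://example.com/2', 'ConGlomCo opens new office' ],
+  );
+  foreach my $hit ( filter_headlines(['elvis', 'perl'], @links) ) {
+    print $hit->[0], " : ", $hit->[1], "\n";
+  }
+
+With this sample data, only the first pair is printed.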
+
+Now, one of these days, Yahoo News will decide to change its HTML
+template.  When this happens, this will appear to the above program as
+there being no links that meet the given criteria; or, less likely,
+dozens of erroneous links will meet the criteria.  In either case, the
+criteria will have to be changed for the new template; they may just
+need adjustment, or you may need to scrap them and start over.
+
+=head2 I<Regardez, duvet!>
+
+It's often quite a challenge to write criteria to match the desired
+parts of an HTML parse tree.  Very often you I<can> pull it off with a
+simple C<$tree-E<gt>look_down('_tag', 'h1')>, but sometimes you do
+have to keep adding and refining criteria, until you might end up with
+complex filters like what I've shown in this article.  The
+benefit to learning how to deal with HTML parse trees is that one main
+search tool, the C<look_down> method, can do most of the work, making
+simple things easy, while still making hard things possible.
+
+B<[end body of article]>
+
+=head2 [Author Credit]
+
+Sean M. Burke (C<sburke@cpan.org>) is the current maintainer of
+C<HTML::TreeBuilder> and C<HTML::Element>, both originally by
+Gisle Aas.
+
+Sean adds: "I'd like to thank the folks who listened to me ramble
+incessantly about HTML::TreeBuilder and HTML::Element at this year's Yet
+Another Perl Conference and O'Reilly Open Source Software Convention."
+
+=head1 BACK
+
+Return to the L<HTML::Tree|HTML::Tree> docs.
+
+=cut
+