bionic (3) WWW.3.gz

Provided by: swish++_6.1.5-5_amd64 bug

WWW(3)                                      Library Functions Manual                                      WWW(3)

NAME

       WWW - World Wide Web Package

SYNOPSIS

       extract_description( FILE )
       extract_meta( FILE, NAME )
       hyperlink( LIST )

DESCRIPTION

       This  package  provides  a  utility  functions  for the World Wide Web to extract descriptions of or meta
       information from files, and hyperlink text.

SUBROUTINES

       The following Perl subroutines are defined and available:

       extract_description( FILE )
              Extracts a description from an HTML or plain text file given by the FILE name; FILE should  be  an
              absolute  path.   The  first $description::chars (default: 2048) characters are read.  If the file
              ends in one of the extensions htm, html, or shtml, it is presumed to be an HTML file; if the  file
              ends  in  txt, it is presumed to be a plain text file.  Other extensions are not recognized and no
              description is returned for them.

              For  HTML  files,   first,   if   a   <META   NAME="description"   CONTENT="...">   or   a   <META
              NAME="DC.description"  CONTENT="...">  (Dublin Core) element is found, then the words specified as
              the value of the CONTENT attribute is returned as the description.

              Otherwise, all HTML comments, text between <SCRIPT>, <STYLE>, and <TITLE> tags, and all other HTML
              tags  are  stripped.   If <AREA ... ALT="..."> or <IMG ... ALT="..."> elements are found, then the
              words specified as the value of the ALT attributes are extracted.

              Finally, for either HTML or plain text  files,  at  most  $description::words  (default:  50)  are
              returned.

       extract_meta( FILE, NAME )
              Extracts  the  value  of the CONTENT attribute from a META element having the given NAME attribute
              from an HTML file given by the FILE name; FILE should be an absolute path.  The file must  end  in
              one  of  the  extensions  htm,  html,  or  shtml  to  be  considered  an  HTML  file.   The  first
              $description::chars (default: 2048) characters  are  read.   The  characters  are  cached  between
              consecutive calls using the same filename.

       hyperlink( LIST )
              Adds hyperlinks to strings: that is strings that contain substrings that are valid URLs (according
              to RFC 1630) have the appropriate  HTML  tags  ``wrapped''  around  them  so  that  they  will  be
              selectable  when  displayed in a browser.  The ftp, gopher, http, https, mailto, news, telnet, and
              wais URLs are recognized.  Example:

                 Read all about it at
                 http://www.usatoday.com/

            becomes:

                 Read all about it at
                 <A HREF="http://www.usatoday.com/">http://www.usatoday.com/</A>

SEE ALSO

       perl(1)

       Tim Berners-Lee.  ``Universal Resource Identifiers in WWW,'' Request for Comments 1630,  Network  Working
       Group of the Internet Engineering Task Force, June 1994.

       Tim  Berners-Lee,  Larry  Masinter,  and Mark McCahill.  ``Uniform Resource Locators (URL),'' Request for
       Comments 1738, Network Working Group, 1994.

       Dave Raggett, Arnaud Le Hors, and Ian Jacobs.  ``Notes on helping search engines index your  Web  site,''
       HTML  4.0  Specification,  Appendix  B:  Performance,  Implementation,  and  Design Notes, World Wide Web
       Consortium, April 1998.

       --.  ``Objects, Images, and Applets: How to specify alternate  text,''  HTML  4.0  Specification,  §13.8,
       World Wide Web Consortium, April 1998.

       Dublin  Core  Directorate.   ``The  Dublin  Core:  A  Simple  Content  Description  Model  for Electronic
       Resources.''

       Larry Wall, et al.  Programming Perl, 3rd ed., O'Reilly & Associates, Inc., Sebastopol, CA, 2000.

AUTHOR

       Paul J. Lucas <pauljlucas@mac.com>

WWW                                             February 12, 2000                                         WWW(3)