Provided by: swish++_6.1.5-2.2_amd64

WWW(3)                               Library Functions Manual                              WWW(3)

NAME

       WWW - World Wide Web Package

SYNOPSIS

       extract_description( FILE )
       extract_meta( FILE, NAME )
       hyperlink( LIST )

DESCRIPTION

       This package provides utility functions for the World Wide Web: extracting descriptions
       or meta information from files, and hyperlinking text.

SUBROUTINES

       The following Perl subroutines are defined and available:

       extract_description( FILE )
              Extracts a description from an HTML or plain text file given by the FILE name; FILE
              should  be  an  absolute  path.   The  first  $description::chars  (default:  2048)
              characters are read.  If the file ends in one  of  the  extensions  htm,  html,  or
              shtml,  it  is presumed to be an HTML file; if the file ends in txt, it is presumed
              to be a plain text file.  Other extensions are not recognized and no description is
              returned for them.

              For  HTML  files,  first,  if  a <META NAME="description" CONTENT="..."> or a <META
              NAME="DC.description" CONTENT="..."> (Dublin Core) element is found, then the words
              specified as the value of the CONTENT attribute are returned as the description.

              Otherwise, all HTML comments, text between <SCRIPT>, <STYLE>, and <TITLE> tags, and
              all other HTML tags are stripped.  If <AREA ... ALT="..."> or <IMG  ...  ALT="...">
              elements are found, then the words specified as the value of the ALT attributes are
              extracted.

              Finally, for either HTML or plain text files, at most $description::words (default:
              50) words are returned.
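
              The extension check and the final word limit described above can be sketched as
              follows.  This is an illustration only, not the package's own code: file_kind and
              first_words are hypothetical helper names, and matching the extensions
              case-sensitively is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the WWW package): classify a file
# name by its extension, following the rules in the text above.
# Case-sensitive matching is an assumption.
sub file_kind {
    my ($file) = @_;
    return 'html' if $file =~ /\.(?:htm|html|shtml)$/;
    return 'text' if $file =~ /\.txt$/;
    return;    # unrecognized extension: no description
}

# Hypothetical helper: keep at most $max_words whitespace-separated words.
sub first_words {
    my ($text, $max_words) = @_;
    my @words = split ' ', $text;    # also skips leading whitespace
    $#words = $max_words - 1 if @words > $max_words;
    return join ' ', @words;
}

print first_words("one two three four", 3), "\n";    # prints "one two three"
```

              With the default limit, such a helper would be called with 50 as its second
              argument.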

       extract_meta( FILE, NAME )
              Extracts  the  value  of the CONTENT attribute from a META element having the given
              NAME attribute from an HTML file given by the FILE name; FILE should be an absolute
              path.   The  file  must  end  in  one  of  the extensions htm, html, or shtml to be
              considered an HTML file.  The first $description::chars (default: 2048)  characters
              are  read.   The  characters  are  cached  between consecutive calls using the same
              filename.
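
              The lookup described above can be approximated with a pair of regular expressions.
              The sketch below works on a string rather than a file; meta_content is a
              hypothetical name, and the sketch assumes double-quoted attribute values that do
              not contain ">".  It is not the package's own implementation.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the WWW package): return the CONTENT
# value of the first META element whose NAME matches $name, or nothing
# if no such element exists.  Assumes double-quoted attribute values.
sub meta_content {
    my ($html, $name) = @_;
    while ($html =~ /<meta\b([^>]*)>/gi) {
        my $attrs = $1;
        next unless $attrs =~ /\bname\s*=\s*"\Q$name\E"/i;
        return $1 if $attrs =~ /\bcontent\s*=\s*"([^"]*)"/i;
    }
    return;
}

my $html = '<HEAD><META NAME="author" CONTENT="Jane Doe"></HEAD>';
print meta_content($html, 'author'), "\n";    # prints "Jane Doe"
```

              A robust version would use a real HTML parser; the regular expressions here only
              cover the simple attribute layout shown.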

       hyperlink( LIST )
              Adds hyperlinks to strings: that is, strings that contain substrings that are valid
              URLs (according to RFC 1630) have the appropriate HTML tags ``wrapped'' around them
              so that they will be selectable when displayed in  a  browser.   The  ftp,  gopher,
              http, https, mailto, news, telnet, and wais URLs are recognized.  Example:

                 Read all about it at
                 http://www.usatoday.com/

              becomes:

                 Read all about it at
                 <A HREF="http://www.usatoday.com/">http://www.usatoday.com/</A>
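
              A transformation of this kind can be sketched with a single substitution.  The
              code below is an illustration, not the package's implementation: wrap_urls is a
              hypothetical name, the URL pattern is far looser than the RFC 1630 grammar, and
              returning modified copies (rather than changing the arguments in place) is an
              assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the WWW package): wrap anything that
# looks like a URL in an HTML anchor.  The pattern is a rough
# approximation; punctuation directly after a URL is included in the link.
sub wrap_urls {
    my @strings = @_;    # work on copies, leave the caller's data alone
    for (@strings) {
        s{\b((?:(?:ftp|gopher|https?|telnet|wais)://|(?:mailto|news):)[^\s<>"]+)}
         {<A HREF="$1">$1</A>}g;
    }
    return @strings;
}

print join("\n", wrap_urls("Read all about it at", "http://www.usatoday.com/")), "\n";
```

              Run as is, this prints the two lines of the "becomes" example above.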

SEE ALSO

       perl(1)

       Tim  Berners-Lee.   ``Universal  Resource Identifiers in WWW,'' Request for Comments 1630,
       Network Working Group of the Internet Engineering Task Force, June 1994.

       Tim Berners-Lee, Larry Masinter, and Mark McCahill.  ``Uniform Resource Locators  (URL),''
       Request for Comments 1738, Network Working Group, 1994.

       Dave  Raggett,  Arnaud  Le  Hors, and Ian Jacobs.  ``Notes on helping search engines index
       your Web site,'' HTML 4.0 Specification,  Appendix  B:  Performance,  Implementation,  and
       Design Notes, World Wide Web Consortium, April 1998.

       --.    ``Objects,  Images,  and  Applets:  How  to  specify  alternate  text,''  HTML  4.0
       Specification, §13.8, World Wide Web Consortium, April 1998.

       Dublin Core Directorate.  ``The Dublin  Core:  A  Simple  Content  Description  Model  for
       Electronic Resources.''

       Larry  Wall,  et  al.  Programming Perl, 3rd ed., O'Reilly & Associates, Inc., Sebastopol,
       CA, 2000.

AUTHOR

       Paul J. Lucas <pauljlucas@mac.com>

WWW                                     February 12, 2000                                  WWW(3)