Provided by: libgo-perl_0.15-10_all

NAME

         GO::Parser - parses all GO file formats and types

SYNOPSIS

       Fetch GO::Model::Graph objects using a parser:

         # Scenario 1: Getting objects from a file
         use GO::Parser;
         my $parser = GO::Parser->new({handler=>'obj',use_cache=>1});
         $parser->parse("function.ontology");     # ontology
         $parser->parse("GO.defs");               # definitions
         $parser->parse("ec2go");                 # external refs
         $parser->parse("gene-association.sgd");  # gene assocs
         # get GO::Model::Graph object
         my $graph = $parser->handler->graph;

         # Scenario 2: Getting OBO XML from a file
         use GO::Parser;
         my $parser = GO::Parser->new({handler=>'xml'});
         $parser->handler->file("output.xml");
         $parser->parse("gene_ontology.obo");

         # Scenario 3: Using an XSL stylesheet to convert the OBO XML
         use GO::Parser;
         my $parser = GO::Parser->new({handler=>'xml'});
         # xslt files are kept in $ENV{GO_ROOT}/xml/xsl
         # (if $GO_ROOT is not set, defaults to install directory)
         $parser->xslt("oboxml_to_owl");
         $parser->handler->file("output.owl-xml");
         $parser->parse("gene_ontology.obo");

         # Scenario 4: via scripts
         use FileHandle;
         my $cmd = "go2xml gene_ontology.obo | xsltproc my-transform.xsl -";
         my $fh = FileHandle->new("$cmd |") || die("problem initiating $cmd");
         while (<$fh>) { print $_ }
         $fh->close || die("problem running $cmd");

DESCRIPTION

       Module for parsing GO flat files; for examples of GO/OBO flatfile formats see:

       <ftp://ftp.geneontology.org/pub/go/ontology>

       <ftp://ftp.geneontology.org/pub/go/gene-associations>

       For a description of the various file formats, see:

       <http://www.geneontology.org/GO.format.html>

       <http://www.geneontology.org/GO.annotation.html#file>

       This module generates XML events from a correctly formatted GO/OBO file.

SEE ALSO

       This module is part of go-dev; see:

       <http://www.godatabase.org/dev>

       for more details.

PUBLIC METHODS

   new
        Title   : new
        Usage   : my $p = GO::Parser->new({format=>'obo_xml',handler=>'obj'});
                  $p->parse("go.obo-xml");
                  my $g = $p->handler->graph;
        Synonyms:
        Function: creates a parser object
        Example :
        Returns : GO::Parser
        Args    : a hashref of arguments:
                   format: a format for which a parser exists
                   handler: a format for which a perl handler exists
                   use_cache: (boolean) see caching below

   parse
        Title   : parse
        Usage   : $p->parse($file);
        Synonyms:
        Function: parses a file
        Example :
        Returns :
        Args    : str filename

   handler
        Title   : handler
        Usage   : my $handler = $p->handler;
        Synonyms:
        Function: gets/sets the handler object
        Example :
        Returns : GO::Handlers::base
        Args    : GO::Handlers::base

FORMATS

       This module is a front-end wrapper for a number of different GO/OBO formats; see the
       relevant module documentation below for details.

       The full list of parsers can be found in the go-perl/GO/Parsers/ directory.

       obo_text
           Files with suffix ".obo"

           This newer format replaces the existing GO flat file formats. It handles
           ontologies, definitions and xrefs (but not associations).

       go_ont
           Files with suffix ".ontology"

           These store the ontology DAGs

       go_def
           Files with suffix ".defs"

       go_xref
           External database references for GO terms

           Files with suffix "2go" (e.g. ec2go, metacyc2go)

       go_assoc
           Annotations of genes or gene products using GO

           Files with prefix "gene-association."

       obo_xml
           Files with suffix ".obo.xml" or ".obo-xml"

           This is the XML version of the OBO flat file format above

           See <http://www.godatabase.org/dev/xml/doc/xml-doc.html>

       obj_yaml
           A YAML dump of the perl GO::Model::Graph object. You need YAML from CPAN for this to
           work

       obj_storable
           A dump of the perl GO::Model::Graph object. You need the Storable module (part of the
           Perl core since 5.8) for this to work. This is intended to cache objects on the
           filesystem, for fast access. The obj_storable representation may not be portable.
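       The suffix and prefix conventions listed above are regular enough to express as a
       small lookup. The sketch below encodes only those rules; guess_format is a
       hypothetical helper, not GO::Parser's own format detection, and files that match no
       rule are left for the caller to name explicitly via the format argument to new.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: map a filename to a format name using only the
# suffix/prefix rules from the FORMATS list. Not GO::Parser's real code.
sub guess_format {
    my ($path) = @_;
    my $file = basename($path);
    return 'obo_xml'  if $file =~ /\.obo[.-]xml$/;        # .obo.xml or .obo-xml
    return 'obo_text' if $file =~ /\.obo$/;               # .obo
    return 'go_ont'   if $file =~ /\.ontology$/;          # .ontology
    return 'go_def'   if $file =~ /\.defs$/;              # .defs
    return 'go_assoc' if $file =~ /^gene-association\./;  # gene-association.*
    return 'go_xref'  if $file =~ /2go$/;                 # ec2go, metacyc2go
    return;                                               # unknown: name it explicitly
}

for my $f (qw(gene_ontology.obo function.ontology GO.defs ec2go
              gene-association.fb go.obo-xml)) {
    printf "%-22s %s\n", $f, guess_format($f) // 'unknown';
}
```

       The obo_xml rule is tested first so that XML files are never mistaken for plain OBO
       text.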

   PARSING ARCHITECTURE
       Each parser fires XML events. The XML events are known as Obo-XML.

       These XML events can be caught by a handler written in perl, or they can be caught by an
       XML parser written in some other language, or by using XSL stylesheets.

       go-dev comes with a number of stylesheets in the go-dev/xml/xsl directory.

       Anything that catches these XML events is known as a handler.

       go-perl comes with some standard perl XML handlers, in addition to some standard XSL
       stylesheets. These can be found in the go-dev/go-perl/GO/Handlers directory.

       If you are interested in getting perl objects from files, then you will want the obj
       handler, which gives back GO::Model::Graph objects.

       The parsing architecture gives you the option of using the go-perl object model, or just
       parsing the XML events directly.

       If you are using the go-db-perl library, the load-go-into-db.pl script performs the
       following steps when loading files into the database:

         1. Obo-XML events are fired using the GO::Parser::* classes
         2. The Obo-XML is transformed into godb XML using oboxml_to_godb_prestore.xsl
         3. The resulting godb_prestore.xml is stored in the database using a generic loader

   Obo-XML
       The Obo-XML format DTD is stored in the go-dev/xml/dtd directory

   HOW IT WORKS
       Currently the various parsers and perl event handlers use the stag module for this; see
       Data::Stag for more details, or <http://stag.sourceforge.net>.

   NESTED EVENTS
       Nested events can be thought of as XML without attributes; nested events can easily be
       turned into XML.

       Events have a start, a body and an end.

       Event handlers can *catch* these events and do something with them.

       An object handler can turn the events into objects, centred around the GO::Model::Graph
       object; see GO::Handlers::obj.

       Other handlers can catch the events and convert them into other formats, e.g. OWL or OBO.

       Or you can bypass the handler and get output as an XML stream; to do this, just run the
       go2xml script.

       A database loading event handler can catch the events and turn them into SQL statements,
       loading a MySQL or PostgreSQL database (see the go-db-perl library).

       The advantage of an event-based parsing architecture is that it is easy to build
       lightweight parsers, and heavyweight object models can be bypassed if preferred.
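       To make the start/body/end idea concrete, here is a minimal handler that catches
       nested events and writes them out as attribute-free XML. XmlEventWriter and its
       start_event/event/end_event methods are hypothetical names chosen for this sketch;
       they are not the Data::Stag or GO::Handlers API, which should be consulted for real
       handlers.

```perl
use strict;
use warnings;

# Hypothetical handler: turns nested start/body/end events into XML.
package XmlEventWriter;

sub new { my ($class) = @_; return bless { out => '', depth => 0 }, $class }

# Escape the characters that are special in XML text content.
sub _escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Start of a nested element: open the tag and indent what follows.
sub start_event {
    my ($self, $name) = @_;
    $self->{out} .= ('  ' x $self->{depth}) . "<$name>\n";
    $self->{depth}++;
}

# A leaf element: start, body and end delivered in one call.
sub event {
    my ($self, $name, $body) = @_;
    $self->{out} .= ('  ' x $self->{depth}) . "<$name>" . _escape($body) . "</$name>\n";
}

# End of a nested element: outdent and close the tag.
sub end_event {
    my ($self, $name) = @_;
    $self->{depth}--;
    $self->{out} .= ('  ' x $self->{depth}) . "</$name>\n";
}

package main;

# Fire the events for one term taken from the go_ont example in this page.
my $h = XmlEventWriter->new;
$h->start_event('term');
$h->event(id   => 'GO:0005576');
$h->event(name => 'extracellular');
$h->event(is_a => 'GO:0005575');
$h->end_event('term');
print $h->{out};
```

       An object handler would implement the same three callbacks but build objects instead
       of appending text, which is why the object model can be swapped out freely.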

   EXAMPLES
       To see examples of the events generated by the GO::Parser class, run the script go2xml
       on any GO-formatted flatfile; for example:

         go2xml function.ontology

       This also works on OBO-formatted files:

         go2xml gene_ontology.obo

       You can also use the script "stag-parse.pl", which comes with the Data::Stag
       distribution; for example:

         stag-parse.pl -p GO::Parsers::go_assoc_parser gene-association.fb

   XSLT HANDLERS
       The full list can be found in the go-dev/xml/xsl directory

   PERL HANDLERS
       See GO::Handlers::* for all the different handlers available; more can be added
       dynamically.

       You can either create the handler object yourself and pass it as an argument, e.g.

         my $apph    = GO::AppHandle->new(-db=>"go");
         my $handler = GO::Handlers::godb->new({apph=>$apph});
         my $parser  = GO::Parser->new({handler=>$handler});
         $parser->parse(@files);

       or you can use one of the registered handlers:

         my $parser = GO::Parser->new({handler=>'db',
                                       handler_args=>{apph=>$apph}});

       or you can work from the command line:

         go2fmt.pl -w oboxml function.ontology
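       The two ways of supplying a handler (a ready-made object, or a registered name plus
       handler_args) can be resolved with a few lines of Perl. The sketch below is
       illustrative only: resolve_handler, the registry and the Demo::Handler::* classes are
       hypothetical stand-ins, not GO::Parser internals or the real GO::Handlers classes.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Stand-in handler classes (hypothetical, not the real GO::Handlers::*).
package Demo::Handler::Obj;
sub new { my ($class, $args) = @_; my %self = %{ $args || {} }; return bless \%self, $class }

package Demo::Handler::Xml;
sub new { my ($class, $args) = @_; my %self = %{ $args || {} }; return bless \%self, $class }

package main;

# Hypothetical registry: handler name => class to instantiate.
my %REGISTRY = (
    obj => 'Demo::Handler::Obj',
    xml => 'Demo::Handler::Xml',
);

# Accept either a handler object or a registered name plus a hashref of args.
sub resolve_handler {
    my ($handler, $handler_args) = @_;
    return $handler if blessed($handler);          # object passed in directly
    my $class = $REGISTRY{$handler}
        or die "no registered handler called '$handler'\n";
    return $class->new($handler_args);             # build it from the name
}

my $by_name   = resolve_handler('xml', { file => 'output.xml' });
my $by_object = resolve_handler(Demo::Handler::Obj->new);
print ref($by_name), ' ', $by_name->{file}, "\n";   # Demo::Handler::Xml output.xml
print ref($by_object), "\n";                        # Demo::Handler::Obj
```

       Keeping a name-to-class table is one simple way to let new handlers be registered
       without changing the code that drives the parser.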

       The registered perl handlers are as follows:

       obo_xml
           writes out OBO-XML (which is basically a straightforward conversion of the event
           stream into XML)

       obo_text
       go_ont
           legacy GO-ontology file format

       go_xref
           GO xref file, for linking GO terms to terms and dbxrefs in other ontologies

       go_defs
           legacy GO-definitions file format

       go_assoc
           GO association file format

       rdf GO XML-RDF file format

       owl OWL format (default: OWL-DL)

           OWL is a W3C standard format for ontologies

           You will need the XSL files from the full go-dev distribution to run this; see the XML
           section in <http://www.godatabase.org/dev>

       prolog
           Prolog facts; you will need a Prolog compiler/interpreter to use these. You can
           reason over these facts using Obol or the forthcoming Bio-LP project.

       sxpr
           Lisp-style S-expressions, conforming to the OBO-XML schema; you will need Lisp to
           make full use of these. You can also do useful things just within Emacs (use
           lisp-mode and load an sxpr file into your buffer).

       godb
           This is actually part of the go-db-perl library, not the go-perl library.

           It catches events and loads them into a database conforming to the GO database
           schema; see the directory go-dev/sql, as part of the whole go-dev distribution, or
           www.godatabase.org/dev/database

       obj_yaml
           A YAML dump of the perl GO::Model::Graph object. You need YAML from CPAN for this to
           work

       obj_storable
           A dump of the perl GO::Model::Graph object. You need the Storable module (part of the
           Perl core since 5.8) for this to work. This is intended to cache objects on the
           filesystem, for fast access. The obj_storable representation may not be portable.
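       The caching idea behind obj_storable can be tried with Storable alone. In this sketch a
       plain hash stands in for a GO::Model::Graph object and the cache filename is made up;
       it shows the store-once, reload-fast pattern, not how go-perl's use_cache option names
       or locates its cache files.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Spec;
use File::Temp qw(tempdir);

# A plain hash standing in for a parsed GO::Model::Graph object.
my %graph = (
    'GO:0005576' => { name => 'extracellular', is_a    => ['GO:0005575'] },
    'GO:0005577' => { name => 'fibrinogen',    part_of => ['GO:0005576'] },
);

# Hypothetical cache location in a temporary directory (removed at exit).
my $dir   = tempdir(CLEANUP => 1);
my $cache = File::Spec->catfile($dir, 'function.ontology.cache');

nstore(\%graph, $cache);        # write in network byte order
my $copy = retrieve($cache);    # read back a deep copy of the structure

print $copy->{'GO:0005577'}{name}, "\n";   # fibrinogen
```

       nstore avoids byte-order differences between machines, but Storable files can still
       depend on the Perl and Storable versions that wrote them, which is consistent with the
       warning that the obj_storable representation may not be portable.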

EXAMPLES OF DATATYPE TEXT FORMATS

   go_ont format
       The go_ont format stores graphs and metadata; for example:

         !version: $Revision: 1.15 $
         !date: $Date: 2006/04/20 22:48:23 $
         !editors: Michael Ashburner (FlyBase), Midori Harris (SGD), Judy Blake (MGD)
         $Gene_Ontology ; GO:0003673
          $cellular_component ; GO:0005575
           %extracellular ; GO:0005576
            <fibrinogen ; GO:0005577
             <fibrinogen alpha chain ; GO:0005972
             <fibrinogen beta chain ; GO:0005973

       See GO::Parsers::go_ont_parser for more details

       This is the file above parsed, with events turned directly into OBO-XML:

         <?xml version="1.0" encoding="UTF-8"?>
         <obo>
           <source>
             <source_type>file</source_type>
             <source_path>z.ontology</source_path>
             <source_mtime>1075164285</source_mtime>
           </source>
           <term>
             <id>GO:0003673</id>
             <name>Gene_Ontology</name>
             <ontology>root</ontology>
           </term>
           <term>
             <id>GO:0005575</id>
             <name>cellular_component</name>
             <ontology>root</ontology>
             <is_a>GO:0003673</is_a>
           </term>
           <term>
             <id>GO:0005576</id>
             <name>extracellular</name>
             <ontology>root</ontology>
             <is_a>GO:0005575</is_a>
           </term>
           <term>
             <id>GO:0005577</id>
             <name>fibrinogen</name>
             <ontology>root</ontology>
             <relationship>
               <type>part_of</type>
               <to>GO:0005576</to>
             </relationship>
           </term>
           <term>
             <id>GO:0005972</id>
             <name>fibrinogen alpha chain</name>
             <ontology>root</ontology>
             <relationship>
               <type>part_of</type>
               <to>GO:0005577</to>
             </relationship>
           </term>
           <term>
             <id>GO:0005973</id>
             <name>fibrinogen beta chain</name>
             <ontology>root</ontology>
             <relationship>
               <type>part_of</type>
               <to>GO:0005577</to>
             </relationship>
           </term>
         </obo>
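       Reading the two listings side by side, indentation gives the parent and the leading
       symbol gives the relationship: a nested '$' or '%' becomes is_a and '<' becomes
       part_of. The sketch below assumes exactly what the example shows (one space per level,
       one parent per line, "name ; GO:id"); real go_ont files carry more syntax, such as
       extra parents, synonyms and dbxrefs, that it does not handle. See
       GO::Parsers::go_ont_parser for the real parser.

```perl
use strict;
use warnings;

# Relationship implied by each leading symbol, as in the OBO-XML output above.
my %REL = ('$' => 'is_a', '%' => 'is_a', '<' => 'part_of');

# Simplified go_ont reader: returns a list of {id, name, is_a|part_of} hashes.
sub parse_go_ont {
    my (@lines) = @_;
    my (@terms, @stack);    # $stack[$d] = id of the most recent term at depth $d
    for my $line (@lines) {
        next if $line =~ /^\s*!/ or $line !~ /\S/;    # comments and blank lines
        my ($indent, $sym, $name, $id) =
            $line =~ /^(\s*)([\$%<])\s*(.*?)\s*;\s*(GO:\d+)/
            or die "cannot parse: $line\n";
        my $depth = length $indent;
        $stack[$depth] = $id;
        $#stack = $depth;                             # forget deeper levels
        my %term = (id => $id, name => $name);
        $term{ $REL{$sym} } = $stack[ $depth - 1 ] if $depth > 0;
        push @terms, \%term;
    }
    return @terms;
}

my @lines = (
    '!version: $Revision: 1.15 $',
    '$Gene_Ontology ; GO:0003673',
    ' $cellular_component ; GO:0005575',
    '  %extracellular ; GO:0005576',
    '   <fibrinogen ; GO:0005577',
    '    <fibrinogen alpha chain ; GO:0005972',
);
for my $t (parse_go_ont(@lines)) {
    my ($rel) = grep { exists $t->{$_} } qw(is_a part_of);
    printf "%s %s%s\n", $t->{id}, $t->{name}, $rel ? " [$rel $t->{$rel}]" : '';
}
```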

   go_def format
       The go_def format stores definitions; for example:

         !Gene Ontology definitions
         !
         term: 'de novo' protein folding
         goid: GO:0006458
         definition: Processes that assist the folding of a nascent peptide chain into its correct tertiary structure.
         definition_reference: Sanger:mb

       See GO::Parsers::go_def_parser for more details
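       The definitions file is a sequence of "tag: value" stanzas. As a rough sketch, assuming
       that '!' lines are comments and that each "term:" line starts a new stanza, it can be
       read as follows; every tag is collected into a list in case it repeats. This is an
       illustration, not the behaviour of GO::Parsers::go_def_parser.

```perl
use strict;
use warnings;

# Sketch: parse go_def text into a list of stanzas (tag => [values]).
sub parse_go_defs {
    my ($text) = @_;
    my @defs;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*!/ or $line !~ /\S/;     # comments and blank lines
        my ($tag, $value) = $line =~ /^(\w+):\s*(.*?)\s*$/ or next;
        push @defs, {} if $tag eq 'term' or !@defs;    # new stanza
        push @{ $defs[-1]{$tag} }, $value;
    }
    return @defs;
}

my $sample = <<'END';
!Gene Ontology definitions
!
term: 'de novo' protein folding
goid: GO:0006458
definition: Processes that assist the folding of a nascent peptide chain into its correct tertiary structure.
definition_reference: Sanger:mb
END

for my $d (parse_go_defs($sample)) {
    print "$d->{goid}[0]\t$d->{term}[0]\n";
}
```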

   go_xref format
       The go_xref format stores links between GO IDs and IDs for terms in other databases:

         EC:1.-.-.- > GO:oxidoreductase ; GO:0016491
         EC:1.1.-.- > GO:1-phenylethanol dehydrogenase ; GO:0018449

       See GO::Parsers::go_xref_parser for more details
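       Each xref line has the shape "EXTERNAL_ID > GO:term name ; GO:id". The sketch below
       splits one such line with a regular expression, based only on the two example lines;
       it skips '!' comment lines, rejects anything else that does not match, and is not the
       logic of GO::Parsers::go_xref_parser.

```perl
use strict;
use warnings;

# Sketch: split one go_xref line into external id, GO term name and GO id.
sub parse_go_xref_line {
    my ($line) = @_;
    return if $line =~ /^\s*!/ or $line !~ /\S/;    # comments and blank lines
    my ($xref, $name, $go_id) =
        $line =~ /^\s*(\S+)\s*>\s*GO:(.*?)\s*;\s*(GO:\d{7})\s*$/
        or return;                                  # not an xref line
    return { xref => $xref, go_name => $name, go_id => $go_id };
}

for my $line ('EC:1.-.-.- > GO:oxidoreductase ; GO:0016491',
              'EC:1.1.-.- > GO:1-phenylethanol dehydrogenase ; GO:0018449') {
    my $x = parse_go_xref_line($line) or next;
    print "$x->{xref}\t$x->{go_id}\t$x->{go_name}\n";
}
```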

   go_assoc format
       The go_assoc format stores gene associations; for example:

         SGD     S0004660        AAC1            GO:0005743      SGD:12031|PMID:2167309 TAS             C       ADP/ATP translocator    YMR056C gene    taxon:4932 20010118
         SGD     S0004660        AAC1            GO:0006854      SGD:12031|PMID:2167309 IDA             P       ADP/ATP translocator    YMR056C gene    taxon:4932 20010118

       See GO::Parsers::go_assoc_parser for more details
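       Gene association files are tab-delimited; in the example above the tabs have been
       rendered as spaces, and the empty qualifier and with/from columns are invisible. The
       sketch below splits one line into named columns. The column names follow the GO
       annotation file format (GAF 1.0) and are an assumption of this sketch rather than
       anything taken from GO::Parsers::go_assoc_parser; older files may omit the last
       column.

```perl
use strict;
use warnings;

# Column order assumed from the GAF 1.0 description.
my @COLUMNS = qw(db db_object_id db_object_symbol qualifier go_id
                 db_reference evidence with aspect db_object_name
                 db_object_synonym db_object_type taxon date assigned_by);

# Sketch: turn one tab-delimited association line into a hashref.
sub parse_assoc_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^!/ or $line !~ /\S/;        # comment/header lines
    my @fields = split /\t/, $line, -1;              # -1 keeps empty trailing fields
    splice(@fields, scalar @COLUMNS) if @fields > @COLUMNS;
    my %row;
    @row{ @COLUMNS[ 0 .. $#fields ] } = @fields;     # missing columns stay absent
    $row{db_reference} = [ split /\|/, $row{db_reference} // '' ];
    return \%row;
}

# The first example line, rebuilt with tabs and its two empty columns.
my $line = join "\t", 'SGD', 'S0004660', 'AAC1', '', 'GO:0005743',
    'SGD:12031|PMID:2167309', 'TAS', '', 'C', 'ADP/ATP translocator',
    'YMR056C', 'gene', 'taxon:4932', '20010118';
my $row = parse_assoc_line($line);
print "$row->{db_object_symbol} $row->{go_id} $row->{evidence} $row->{aspect}\n";
```

       Splitting on tabs rather than on whitespace matters here: the object name
       "ADP/ATP translocator" contains a space, and empty columns would otherwise vanish.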

   obo_text format
       <http://www.geneontology.org/GO.format.html>

   new
         Usage   - my $parser = GO::Parser->new()
         Returns - GO::Parser

       Creates a new parser.

   create_handler
         Usage   - my $handler = GO::Parser->create_handler('obj');
         Returns - GO::Handlers::base
         Args    - handler type [str]