Ubuntu Manpage: sfetch - get a sequence from a flatfile database.

Provided by: biosquid_1.9g+cvs20050121-5_amd64

NAME

       sfetch - get a sequence from a flatfile database.

SYNOPSIS

       sfetch [options] seqname

DESCRIPTION

       sfetch retrieves the sequence named seqname from a sequence database.

       The database is selected with the -d option, for "little databases" (a single sequence file), or the -D
       option, for "big databases" (the main sequence databases). The directory location of a "big database" is
       given by an environment variable, such as $SWDIR for Swissprot and $GBDIR for Genbank (see -D for the
       complete list). A "little database" must be specified by its complete file path. By default, if neither
       option is given and the name looks like a Swissprot identifier (for example, it contains a _ character),
       sfetch attempts to retrieve seqname from Swissprot, using the directory given by $SWDIR.
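       The selection rules above can be modeled in a few lines. The following Python sketch is illustrative
       only: choose_database is a hypothetical helper, not part of sfetch or biosquid. The code-to-variable table
       is taken from the -D entry below. Giving -d precedence over -D, and returning nothing when no rule
       applies, are assumptions, since this page does not say what happens in those cases.

```python
import os

# Database code -> environment variable, as listed under -D on this page.
DB_ENV_VARS = {
    "sw": "SWDIR",     # Swissprot
    "pir": "PIRDIR",   # PIR
    "em": "EMBLDIR",   # EMBL
    "gb": "GBDIR",     # Genbank
    "wp": "WORMDIR",   # Wormpep
    "owl": "OWLDIR",   # OWL
}


def choose_database(seqname, seqfile=None, dbcode=None, environ=os.environ):
    """Return (kind, location) describing where a lookup would go.

    Hypothetical helper: it models the selection rules described in this
    manual page and is not sfetch's actual source code.
    """
    if seqfile is not None:
        # -d: a "little database", given as a complete file path.
        # (Assumption: -d is checked before -D.)
        return ("file", seqfile)
    if dbcode is not None:
        # -D: a "big database"; its directory comes from an environment variable.
        var = DB_ENV_VARS.get(dbcode)
        if var is None:
            raise ValueError(f"unrecognized database code: {dbcode}")
        return ("directory", environ.get(var))
    if "_" in seqname:
        # Default: a name containing "_" looks like a Swissprot identifier.
        return ("directory", environ.get("SWDIR"))
    # Assumption: the page does not describe this case.
    return (None, None)
```

       For example, choose_database("ROA1_HUMAN", environ={"SWDIR": "/db/swissprot"}) returns
       ("directory", "/db/swissprot"); the path is a made-up placeholder.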

       Other options allow retrieval of subsequences (-f, -t), retrieval by accession number instead of by name
       (-a), and reformatting of the extracted sequence into a variety of other formats (-F), among others.

       If the database has been SSI indexed, sequence retrieval is extremely efficient; otherwise, retrieval may
       be painfully slow, because the entire database may have to be read into memory to find seqname. SSI
       indexing is recommended for all large or permanent databases. The program sindex creates SSI indexes for
       any sequence file.
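       The speedup from indexing comes from recording where each record starts, so that a lookup can seek
       directly to it instead of scanning the whole file. The Python sketch below illustrates that general idea
       with an in-memory name-to-byte-offset table for a FASTA file. It is an illustration under stated
       assumptions (FASTA input, record name is the first word of the header); build_offset_index and
       fetch_record are hypothetical names, and the sketch does not reproduce the SSI file format or sindex.

```python
import io


def build_offset_index(handle):
    """Map each FASTA record name to the byte offset of its header line.

    Generic offset index for illustration; not the SSI format.
    """
    index = {}
    offset = 0
    for line in handle:
        if line.startswith(b">"):
            # Assumption: the record name is the first word after ">".
            name = line[1:].split()[0].decode()
            index[name] = offset
        offset += len(line)
    return index


def fetch_record(handle, index, name):
    """Seek straight to one record instead of scanning the whole file."""
    handle.seek(index[name])
    lines = [handle.readline()]  # the header line
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            break  # start of the next record
        lines.append(line)
    return b"".join(lines).decode()


if __name__ == "__main__":
    demo = io.BytesIO(b">seq1 desc\nACGT\nAC\n>seq2\nGGGG\n")
    idx = build_offset_index(demo)
    print(fetch_record(demo, idx, "seq2"), end="")
```

       Building the table costs one full pass over the file, but every later lookup reads only the requested
       record, which is why indexing pays off for large or permanent databases.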

       sfetch was originally named getseq, and was renamed because it clashed with a GCG  program  of  the  same
       name.

OPTIONS

       -a     Interpret seqname as an accession number, not an identifier.

       -d <seqfile>
              Retrieve the sequence from a sequence file named <seqfile>. If an SSI index <seqfile>.ssi exists
              (see sindex), it is used to speed up the retrieval.

       -f <from>
              Extract a subsequence starting from position <from>, rather than from 1. See -t. If <from> is
              greater than <to> (as specified by the -t option), then the sequence is extracted as its reverse
              complement (the sequence is assumed to be nucleic acid).

       -h     Print brief help, including the version number and a summary of all options (including expert options).

       -o <outfile>
              Direct the output to a file named <outfile>. By default, output goes to stdout.

       -r <newname>
              Rename the sequence to <newname> in the output after extraction. By default, the original sequence
              identifier is retained. This is useful, for instance, when retrieving a sequence fragment: the
              coordinates of the fragment can be added to the name (this is what Pfam does).

       -t <to>
              Extract a subsequence that ends at position <to>, rather than at the end of the sequence. See -f.
              If <to> is less than <from> (as specified by the -f option), then the sequence is extracted as its
              reverse complement (the sequence is assumed to be nucleic acid).

       -D <database>
              Retrieve the sequence from the main sequence database identified by the code <database>. For each
              code, an environment variable specifies the directory path to that database. Recognized codes and
              their corresponding environment variables are -Dsw (Swissprot, $SWDIR); -Dpir (PIR, $PIRDIR); -Dem
              (EMBL, $EMBLDIR); -Dgb (Genbank, $GBDIR); -Dwp (Wormpep, $WORMDIR); and -Dowl (OWL, $OWLDIR).
              Each database is read in its native flatfile format.

       -F <format>
              Reformat  the  extracted sequence into a different format.  (By default, the sequence is extracted
              from the database in the same format as the database.) Available formats are embl, fasta, genbank,
              gcg, strider, zuker, ig, pir, squid, and raw.
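       The coordinate rules for -f and -t, and the renaming that -r makes possible, can be sketched in Python as
       below. This is a hypothetical illustration, not sfetch's code: extract and fragment_name are made-up
       names, and the sketch assumes 1-based inclusive coordinates, a DNA-only complement table (other
       characters pass through unchanged), and a Pfam-style name/from-to label for the renamed fragment.

```python
# DNA complement table (assumption: other characters are left unchanged).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract(seq, start=None, end=None):
    """Return the 1-based, inclusive subsequence start..end.

    Defaults mirror -f and -t: start at 1, end at the last position.
    If start > end, return the reverse complement of end..start, as this
    page describes for -f/-t. Hypothetical helper, not sfetch's code.
    """
    start = 1 if start is None else start
    end = len(seq) if end is None else end
    if start <= end:
        return seq[start - 1:end]
    # Reverse-complement case: take positions end..start, reverse, complement.
    return seq[end - 1:start][::-1].translate(COMPLEMENT)


def fragment_name(name, start, end):
    """Pfam-style 'name/start-end' label, of the kind one might pass to -r."""
    return f"{name}/{start}-{end}"
```

       For example, extract("AACCGGTTA", 4, 1) returns "GGTT", the reverse complement of positions 1 to 4, and
       fragment_name("seq1", 4, 1) returns "seq1/4-1".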

EXPERT OPTIONS

       --informat <s>
              Specify that the sequence file is in format <s>, rather than the default FASTA format. Common
              examples include Genbank, EMBL, GCG, PIR, Stockholm, Clustal, MSF, and PHYLIP; see the printed
              documentation for a complete list of accepted format names. This option overrides the default
              format (FASTA) and the -B Babelfish autodetection option.

AUTHOR

       Biosquid and its documentation are Copyright (C) 1992-2003 HHMI/Washington University School of Medicine.
       Freely distributed under the GNU General Public License (GPL). See COPYING in the source code distribution
       for more details, or contact me.

       Sean Eddy
       HHMI/Department of Genetics
       Washington University School of Medicine
       4444 Forest Park Blvd., Box 8510
       St Louis, MO 63108 USA
       Phone: 1-314-362-7666
       FAX  : 1-314-362-2157
       Email: eddy@genetics.wustl.edu

Biosquid 1.9g                                     January 2003                                         sfetch(1)
