Provided by: biosquid_1.9g+cvs20050121-11_amd64

NAME

       sfetch - get a sequence from a flatfile database.

SYNOPSIS

       sfetch [options] seqname

DESCRIPTION

       sfetch retrieves the sequence named seqname from a sequence database.

       The database is selected with the -d option ("little databases") or the -D option ("big databases").
       The directory location of "big databases" can be specified by environment variables, such as  $SWDIR  for
       Swissprot, and $GBDIR for Genbank (see -D for complete list).  A complete file path must be specified for
       "little databases".  By default, if neither option is specified and  the  name  looks  like  a  Swissprot
       identifier  (e.g.  it  has a _ character), the $SWDIR environment variable is used to attempt to retrieve
       the sequence seqname from Swissprot.
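
       The selection rules above can be sketched in a few lines of Python. This is only an illustration of
       the documented behavior, not the sfetch source: the function name choose_source and its parameters are
       hypothetical, and giving -d precedence over -D when both are supplied is an assumption. The codes and
       environment variables are the ones listed under -D.

```python
import os

# Database codes and their environment variables, as listed under the -D option.
BIG_DATABASES = {
    "sw": "SWDIR",
    "pir": "PIRDIR",
    "em": "EMBLDIR",
    "gb": "GBDIR",
    "wp": "WORMDIR",
    "owl": "OWLDIR",
}


def choose_source(seqname, seqfile=None, dbcode=None, env=None):
    """Return (kind, location) describing where seqname would be looked up.

    Hypothetical helper that mirrors the documented rules only.
    """
    env = os.environ if env is None else env
    if seqfile is not None:
        # -d <seqfile>: a "little database" given as a complete file path.
        # Assumption: -d wins if both -d and -D are supplied.
        return ("file", seqfile)
    if dbcode is not None:
        # -D <database>: a "big database" whose directory comes from the environment.
        return ("directory", env.get(BIG_DATABASES[dbcode]))
    if "_" in seqname:
        # Default: the name looks like a Swissprot identifier, so try $SWDIR.
        return ("directory", env.get("SWDIR"))
    # Neither option given and the name does not look like a Swissprot identifier.
    return (None, None)
```

       For example, choose_source("CALM_HUMAN", env={"SWDIR": "/db/sw"}) selects the Swissprot directory
       because the name contains a _ character.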

       A variety of other options are available which allow retrieval  of  subsequences  (-f,-t);  retrieval  by
       accession  number  instead  of  by name (-a); reformatting the extracted sequence into a variety of other
       formats (-F); etc.

       If the database has been SSI indexed, sequence retrieval will be extremely efficient; else, retrieval may
       be painfully slow (the entire database may have to be read into memory to find seqname).  SSI indexing is
       recommended for all large or permanent databases. The program sindex creates SSI indexes for any sequence
       file.
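
       To see why an index makes retrieval fast, consider the following Python sketch. It does not read or
       write the SSI format; it builds a simple in-memory map from sequence name to byte offset for a FASTA
       file, which is the general idea behind indexed retrieval. The functions build_index and fetch are
       hypothetical names used for illustration.

```python
import io


def build_index(handle):
    """Map each sequence name to the byte offset of its '>' header line."""
    index = {}
    handle.seek(0)
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            fields = line[1:].split()
            if fields:
                # The name is the first word after '>'.
                index[fields[0].decode()] = offset
    return index


def fetch(handle, index, name):
    """Seek straight to the record and read it, without scanning the file."""
    handle.seek(index[name])
    record = [handle.readline()]          # header line
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):         # next record starts here
            break
        record.append(line)
    return b"".join(record).decode()


# A tiny in-memory "database" standing in for a binary-mode file handle.
db = io.BytesIO(b">seq1 first\nACGT\nAC\n>seq2\nGGCC\n")
idx = build_index(db)
print(fetch(db, idx, "seq2"))
```

       Building the index costs one pass over the file; each later lookup is a single seek. Without an
       index, every lookup has to scan from the start of the file until the name is found.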

       sfetch  was  originally  named  getseq, and was renamed because it clashed with a GCG program of the same
       name.

OPTIONS

       -a     Interpret seqname as an accession number, not an identifier.

       -d <seqfile>
              Retrieve the sequence from a sequence file named <seqfile>.  If an SSI index <seqfile>.ssi exists,
              it is used to speed up the retrieval.

       -f <from>
              Extract  a  subsequence  starting  from position <from>, rather than from 1. See -t.  If <from> is
              greater than <to> (as specified by the -t option), then the sequence is extracted as  its  reverse
              complement (it is assumed to be a nucleic acid sequence).

       -h     Print brief help; includes version number and summary of all options, including expert options.

       -o <outfile>
              Direct the output to a file named <outfile>.  By default, output goes to stdout.

       -r <newname>
              Rename the sequence to <newname> in the output after extraction. By default, the original sequence
              identifier is retained. Useful, for instance, when retrieving a sequence fragment: the
              coordinates of the fragment can be added to the name (this is what Pfam does).

       -t <to>
              Extract  a subsequence that ends at position <to>, rather than at the end of the sequence. See -f.
              If <to> is less than <from> (as specified by the -f option), then the sequence is extracted as its
              reverse complement (it is assumed to be a nucleic acid sequence).

       -D <database>
              Retrieve the sequence from the main sequence database coded <database>. For each code, there is an
              environment variable that specifies the directory path to that  database.   Recognized  codes  and
              their corresponding environment variables are -Dsw (Swissprot, $SWDIR); -Dpir (PIR, $PIRDIR); -Dem
              (EMBL, $EMBLDIR); -Dgb (Genbank, $GBDIR); -Dwp (Wormpep,  $WORMDIR);  and  -Dowl  (OWL,  $OWLDIR).
              Each database is read in its native flatfile format.

       -F <format>
              Reformat  the  extracted sequence into a different format.  (By default, the sequence is extracted
              from the database in the same format as the database.) Available formats are embl, fasta, genbank,
              gcg, strider, zuker, ig, pir, squid, and raw.
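
       The coordinate behavior of the -f and -t options can be illustrated with a short Python sketch. It
       assumes 1-based coordinates that include both end positions, and it complements only A, C, G and T
       (ambiguity codes are left unchanged); both are assumptions made for the example and are not taken
       from the sfetch source. The function name subsequence is hypothetical.

```python
# Complement table for unambiguous DNA bases; other characters pass through.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def subsequence(seq, start=None, end=None):
    """Return seq[start..end] using 1-based, inclusive coordinates.

    If start > end, the region end..start is returned as its reverse
    complement, mirroring the documented behavior of -f and -t.
    """
    start = 1 if start is None else start          # -f defaults to 1
    end = len(seq) if end is None else end         # -t defaults to the last residue
    if start <= end:
        return seq[start - 1:end]
    forward = seq[end - 1:start]                   # region on the forward strand
    return forward.translate(COMPLEMENT)[::-1]     # complement, then reverse


seq = "ACGGTA"
print(subsequence(seq, 2, 4))   # forward strand, positions 2 to 4
print(subsequence(seq, 4, 2))   # same region, reverse complemented
```

       Swapping the two coordinates therefore selects the same region on the opposite strand.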

EXPERT OPTIONS

       --informat <s>
              Specify  that  the  sequence  file is in format <s>, rather than the default FASTA format.  Common
              examples include Genbank, EMBL, GCG, PIR, Stockholm, Clustal, MSF,  or  PHYLIP;  see  the  printed
              documentation  for  a  complete  list of accepted format names.  This option overrides the default
              format (FASTA) and the -B Babelfish autodetection option.

SEE ALSO

       afetch(1), alistat(1), compalign(1),  compstruct(1),  revcomp(1),  seqsplit(1),  seqstat(1),  shuffle(1),
       sindex(1), sreformat(1), stranslate(1), weight(1).

AUTHOR

       Biosquid and its documentation are Copyright (C) 1992-2003 HHMI/Washington University School of Medicine.
       Freely distributed under the GNU General Public License (GPL). See COPYING in the source code distribution
       for more details, or contact me.

       Sean Eddy
       HHMI/Department of Genetics
       Washington University School of Medicine
       4444 Forest Park Blvd., Box 8510
       St Louis, MO 63108 USA
       Phone: 1-314-362-7666
       FAX  : 1-314-362-2157
       Email: eddy@genetics.wustl.edu