lunar (1) sfetch.1.gz

Provided by: biosquid_1.9g+cvs20050121-12_amd64

NAME

       sfetch - get a sequence from a flatfile database.

SYNOPSIS

       sfetch [options] seqname

DESCRIPTION

       sfetch retrieves the sequence named seqname from a sequence database.

       The database to search is selected with the -d option (a "little database") or the -D
       option (a "big database").  For a "little database", the complete path to the sequence
       file must be given.  The directory of each "big database" is given by an environment
       variable, such as $SWDIR for Swissprot and $GBDIR for Genbank (see -D for the complete
       list).  By default, if neither option is given and seqname looks like a Swissprot
       identifier (for example, it contains an _ character), sfetch uses the $SWDIR environment
       variable to try to retrieve seqname from Swissprot.
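       The selection rules above can be sketched in Python as follows.  This is a hypothetical
       helper, not sfetch's own code: the name choose_source, the rule that -d takes precedence
       over -D, and the error raised when no database can be chosen are all assumptions.  The
       code-to-variable table is copied from the -D entry under OPTIONS.

```python
import os

# -D codes and the environment variable holding each database's directory,
# as listed under -D in this man page.
DB_ENV_VARS = {
    "sw": "SWDIR",    # Swissprot
    "pir": "PIRDIR",  # PIR
    "em": "EMBLDIR",  # EMBL
    "gb": "GBDIR",    # Genbank
    "wp": "WORMDIR",  # Wormpep
    "owl": "OWLDIR",  # OWL
}


def choose_source(seqname, seqfile=None, dbcode=None, env=None):
    """Return ("file", path) or ("dir", directory) saying where to look.

    Hypothetical sketch: giving -d precedence over -D is an assumption.
    """
    env = os.environ if env is None else env
    if seqfile is not None:         # -d: "little database", full path given
        return ("file", seqfile)
    if dbcode is not None:          # -D: "big database", located via env var
        return ("dir", env[DB_ENV_VARS[dbcode]])
    if "_" in seqname:              # default: name looks like a Swissprot ID
        return ("dir", env["SWDIR"])
    # Assumed behaviour; the man page does not say what happens here.
    raise ValueError("no database given and seqname is not Swissprot-like")


print(choose_source("CYC_HUMAN", env={"SWDIR": "/db/sw"}))
print(choose_source("M12345", dbcode="gb", env={"GBDIR": "/db/gb"}))
print(choose_source("anything", seqfile="my.fa"))
```

       Passing env explicitly keeps the sketch testable without touching the real environment.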

       Other options allow retrieval of subsequences (-f, -t), retrieval by accession number
       instead of by name (-a), reformatting of the extracted sequence into other formats (-F),
       and so on.

       If the database has been SSI indexed, retrieval is very fast; otherwise it may be
       painfully slow, because the entire database may have to be read into memory to find
       seqname.  SSI indexing is recommended for all large or permanent databases; the sindex
       program creates an SSI index for any sequence file.
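       A toy illustration of why an index helps: record once where each entry starts, then
       seek straight to it instead of rereading the file.  This is not the actual SSI file
       format that sindex writes; build_offset_index and fetch_record are hypothetical names,
       and the sketch assumes an uncompressed FASTA file whose header lines begin with '>'.

```python
import io


def build_offset_index(handle):
    """Scan a FASTA file once, mapping each record name to its header's byte offset."""
    index = {}
    offset = 0
    for line in handle:
        if line.startswith(b">"):
            name = line[1:].split()[0].decode()
            index[name] = offset
        offset += len(line)
    return index


def fetch_record(handle, index, name):
    """Seek straight to one record instead of scanning from the start of the file."""
    handle.seek(index[name])
    lines = [handle.readline()]          # the '>' header line
    for line in handle:
        if line.startswith(b">"):        # next record begins; stop here
            break
        lines.append(line)
    return b"".join(lines).decode()


data = io.BytesIO(b">seq1 first\nACGT\nGG\n>seq2\nTTTT\n")
idx = build_offset_index(data)           # one linear pass, done ahead of time
print(idx)                               # {'seq1': 0, 'seq2': 20}
print(fetch_record(data, idx, "seq2"), end="")
```

       Real files should be opened in binary mode ("rb") so that the stored offsets are byte
       positions that seek() can use directly.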

       sfetch was originally named getseq, and was renamed because it clashed with a GCG  program
       of the same name.

OPTIONS

       -a     Interpret seqname as an accession number, not an identifier.

       -d <seqfile>
              Retrieve  the  sequence  from  a  sequence  file  named  <seqfile>.  If a GSI index
              <seqfile>.gsi exists, it is used to speed up the retrieval.

       -f <from>
              Extract a subsequence starting at position <from> rather than at position 1.  See
              -t.  If <from> is greater than <to> (as given by the -t option), the subsequence
              is extracted as its reverse complement; the sequence is assumed to be nucleic acid.

       -h     Print brief help; includes version number and summary  of  all  options,  including
              expert options.

       -o <outfile>
              Write the output to a file named <outfile>.  By default, output goes to stdout.

       -r <newname>
              After extraction, rename the sequence to <newname> in the output.  By default, the
              original sequence identifier is retained.  This is useful, for instance, when
              retrieving a sequence fragment: the fragment's coordinates can be added to the name
              (as Pfam does).

       -t <to>
              Extract a subsequence ending at position <to> rather than at the end of the
              sequence.  See -f.  If <to> is less than <from> (as given by the -f option), the
              subsequence is extracted as its reverse complement; the sequence is assumed to be
              nucleic acid.

       -D <database>
              Retrieve the sequence from the main sequence database with code <database>.  For
              each code, an environment variable gives the directory path to that database.
              Recognized codes and their environment variables are:

                  -Dsw    Swissprot   $SWDIR
                  -Dpir   PIR         $PIRDIR
                  -Dem    EMBL        $EMBLDIR
                  -Dgb    Genbank     $GBDIR
                  -Dwp    Wormpep     $WORMDIR
                  -Dowl   OWL         $OWLDIR

              Each database is read in its native flatfile format.

       -F <format>
              Reformat the extracted sequence into a different format.  (By default, the sequence
              is extracted from the database in the  same  format  as  the  database.)  Available
              formats are embl, fasta, genbank, gcg, strider, zuker, ig, pir, squid, and raw.
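       The coordinate rules for -f and -t, plus a -r style rename, can be sketched as below.
       Assumptions: coordinates are 1-based and inclusive, a reversed range returns the reverse
       complement of positions <to>..<from> of the original, only A, C, G, T and N are
       complemented (other IUPAC codes and RNA are not handled), and the name/from-to rename
       format is modeled on Pfam-style naming rather than taken from sfetch.  extract, revcomp
       and fragment_name are hypothetical names.

```python
# Complement table for plain DNA; other ambiguity codes are left unchanged
# (a simplifying assumption).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract(seq, start=None, end=None):
    """Mimic -f/-t: 1-based inclusive coordinates; start > end gives the reverse complement."""
    start = 1 if start is None else start        # -f defaults to position 1
    end = len(seq) if end is None else end       # -t defaults to the last position
    if start <= end:
        return seq[start - 1:end]
    # Reverse-strand request: take positions end..start, then reverse-complement them.
    return revcomp(seq[end - 1:start])


def fragment_name(name, start, end):
    """Hypothetical -r style name that records the fragment coordinates."""
    return f"{name}/{start}-{end}"


print(extract("AACCGGTT", 2, 5))    # ACCG
print(extract("AACCGGTT", 5, 2))    # CGGT
print(fragment_name("seq1", 5, 2))  # seq1/5-2
```

       Keeping the coordinates in the new name preserves the strand information, since a
       reversed range (5-2) signals that the reverse complement was taken.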

EXPERT OPTIONS

       --informat <s>
              Specify  that  the  sequence  file  is in format <s>, rather than the default FASTA
              format.  Common examples include Genbank, EMBL, GCG, PIR, Stockholm, Clustal,  MSF,
              or  PHYLIP;  see  the  printed documentation for a complete list of accepted format
              names.  This option overrides the default  format  (FASTA)  and  the  -B  Babelfish
              autodetection option.

SEE ALSO

       afetch(1),  alistat(1),  compalign(1), compstruct(1), revcomp(1), seqsplit(1), seqstat(1),
       shuffle(1), sindex(1), sreformat(1), stranslate(1), weight(1).

AUTHOR

       Biosquid and its documentation are Copyright (C) 1992-2003 HHMI/Washington University
       School of Medicine.  Freely distributed under the GNU General Public License (GPL).  See
       COPYING in the source code distribution for more details, or contact me.

       Sean Eddy
       HHMI/Department of Genetics
       Washington University School of Medicine
       4444 Forest Park Blvd., Box 8510
       St Louis, MO 63108 USA
       Phone: 1-314-362-7666
       FAX  : 1-314-362-2157
       Email: eddy@genetics.wustl.edu