Provided by: biosquid_1.9g+cvs20050121-12_amd64

NAME

       sindex - index a sequence database for sfetch

SYNOPSIS

       sindex [options] seqfile1 [seqfile2...]

DESCRIPTION

       sindex indexes one or more seqfiles for future sequence retrievals by sfetch. An SSI
       ("squid sequence index") file is created in the same directory as the sequence files. By
       default, this file is called <seqfile>.ssi.

       If  there  is  more  than  one sequence file on the command line, the SSI filename will be
       constructed from the last sequence file name. This may not be what you want;  see  the  -o
       option to specify your own name for the SSI file.
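
       The naming rule above can be sketched in shell. This is only an illustration of the
       documented behaviour, not code taken from sindex: the helper default_ssi_name and the
       file names genes.fa, proteins.fa and combined.ssi are hypothetical.

```shell
# Hypothetical helper: predict the default SSI name described above,
# i.e. the last seqfile on the command line with ".ssi" appended.
default_ssi_name() {
    # Walk the arguments; the last one wins.
    for last in "$@"; do :; done
    printf '%s.ssi\n' "$last"
}

default_ssi_name genes.fa proteins.fa   # prints "proteins.fa.ssi"

# Naming the index explicitly with -o avoids the surprise.
# Guarded so the sketch does nothing where sindex or the files are absent.
if command -v sindex >/dev/null 2>&1 && [ -f genes.fa ] && [ -f proteins.fa ]; then
    sindex -o combined.ssi genes.fa proteins.fa
fi
```

       With two or more seqfiles, passing -o is probably the safer habit, since the default
       name reflects only the last file.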

       sindex can index large files (>2 GB) if optional LFS (large file support) has been
       enabled at compile time. See the INSTALL instructions that came with Biosquid.

OPTIONS

       -h     Print brief help; includes version number and summary  of  all  options,  including
              expert options.

       -o <ssi outfile>
              Write the SSI index to the file named <ssi outfile>. By default, the SSI file is
              written to <seqfile>.ssi.

EXPERT OPTIONS

       --64   Force the SSI file into 64-bit (large seqfile) mode, even if the seqfile is  small.
              You don't want to do this unless you're debugging.

       --external
              Force  sindex  to do its record sorting by external (on-disk) sorting. This is only
              useful for debugging, too.

       --informat <s>
              Specify that the sequence file is definitely in format <s>; this disables sequence
              file format autodetection. This is useful in automated pipelines, because it
              improves robustness (autodetection can occasionally go wrong on a perversely
              malformed file). Common examples include genbank, embl, gcg, pir, stockholm,
              clustal, msf, and phylip; see the printed documentation for a complete list of
              accepted format names.
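
              For pipeline use, a small wrapper along the following lines might be convenient.
              It is a sketch only: the function name index_with_format and the file name
              entries.gb are hypothetical, and the dry-run fallback is an addition for
              illustration, not a feature of sindex.

```shell
# Hypothetical wrapper: index one file with an explicit format name,
# so that format autodetection is never consulted.
# Usage: index_with_format <format> <seqfile>
index_with_format() {
    fmt=$1
    seqfile=$2
    if command -v sindex >/dev/null 2>&1; then
        sindex --informat "$fmt" "$seqfile"
    else
        # Dry run: show the command that would have been executed.
        echo "sindex --informat $fmt $seqfile"
    fi
}

# Example call (commented out so the sketch has no side effects):
# index_with_format genbank entries.gb
```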

       --pfamseq
              A hack for Pfam; indexes a FASTA file that is known to  have  identifier  lines  in
              format  ">[name]  [accession]  [optional  description]". Normally only the sequence
              name would be indexed as a primary key  in  a  FASTA  SSI  file,  but  this  allows
              indexing both the name (as a primary key) and accession (as a secondary key).
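
              To make the identifier layout concrete, the following sketch lists the two
              fields that would serve as keys. It only illustrates the header format
              described above and is not how sindex itself parses files; the function name
              list_pfamseq_keys and the sample records are made up.

```shell
# Hypothetical helper: read FASTA on stdin and print, for each header line,
# the name (primary key) and accession (secondary key), tab-separated.
list_pfamseq_keys() {
    awk '/^>/ { print substr($1, 2) "\t" $2 }'
}

# Made-up records in the ">[name] [accession] [optional description]" layout.
printf '%s\n' \
    '>SEQ1_EXAMPLE AC00001 first made-up record' \
    'MKVLAAGIVGL' \
    '>SEQ2_EXAMPLE AC00002' \
    'MSTNPKPQRKT' |
list_pfamseq_keys
```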

SEE ALSO

       afetch(1),  alistat(1),  compalign(1), compstruct(1), revcomp(1), seqsplit(1), seqstat(1),
       sfetch(1), shuffle(1), sreformat(1), stranslate(1), weight(1).

AUTHOR

       Biosquid and its documentation are Copyright (C) 1992-2003 HHMI/Washington University
       School of Medicine. Freely distributed under the GNU General Public License (GPL). See
       COPYING in the source code distribution for more details, or contact me.

       Sean Eddy
       HHMI/Department of Genetics
       Washington University School of Medicine
       4444 Forest Park Blvd., Box 8510
       St Louis, MO 63108 USA
       Phone: 1-314-362-7666
       FAX  : 1-314-362-2157
       Email: eddy@genetics.wustl.edu