Provided by: biosquid_1.9g+cvs20050121-5_amd64

NAME

       sindex - index a sequence database for sfetch

SYNOPSIS

       sindex [options] seqfile1 [seqfile2...]

DESCRIPTION

       sindex indexes one or more seqfiles for future sequence retrievals by sfetch. An SSI ("squid sequence
       index") file is created in the same directory as the sequence files. By default, this file is called
       <seqfile>.ssi.

       If  there  is  more than one sequence file on the command line, the SSI filename will be constructed from
       the last sequence file name. This may not be what you want; see the -o option to specify  your  own  name
       for the SSI file.
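
       As a minimal sketch of the naming rule above: default_ssi_name and index_all are hypothetical shell
       helpers, not part of sindex, and the file names are placeholders. The first helper only mirrors the
       documented default (last seqfile plus ".ssi"); the second passes an explicit name with -o. With
       DRY_RUN set, the command is printed instead of executed.

```shell
# Hypothetical helper: mirrors the default-name rule described above
# (last seqfile on the command line, plus ".ssi"). Not part of sindex.
default_ssi_name() {
    for last in "$@"; do :; done      # after the loop, $last holds the final argument
    printf '%s.ssi\n' "$last"
}

default_ssi_name part1.fa part2.fa    # prints part2.fa.ssi

# Hypothetical wrapper: name the index explicitly with -o.
# With DRY_RUN set, print the command instead of running it.
index_all() {
    out=$1; shift
    if [ -n "${DRY_RUN:-}" ]; then
        echo "sindex -o $out $*"
    else
        sindex -o "$out" "$@"
    fi
}

DRY_RUN=1 index_all all_parts.ssi part1.fa part2.fa
```

       Choosing the name explicitly keeps the index from being tied to whichever file happens to come last
       on the command line.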

       sindex is capable of indexing large files (>2 GB) if optional LFS support has been enabled at
       compile-time. See the INSTALL instructions that came with the source distribution.

OPTIONS

       -h     Print brief help; includes version number and summary of all options, including expert options.

       -o <ssi outfile>
              Direct the SSI index to a file named <ssi outfile>. By default, the SSI file goes to
              <seqfile>.ssi.

EXPERT OPTIONS

       --64   Force  the SSI file into 64-bit (large seqfile) mode, even if the seqfile is small. You don't want
              to do this unless you're debugging.

       --external
              Force sindex to do its record sorting by external (on-disk)  sorting.  This  is  only  useful  for
              debugging, too.

       --informat <s>
              Specify that the sequence file is definitely in format <s>; this disables sequence file format
              autodetection. This is useful in automated pipelines, because it improves robustness
              (autodetection can occasionally go wrong on a perversely malformed file). Common examples include
              genbank, embl, gcg, pir, stockholm, clustal, msf, and phylip; see the printed documentation for a
              complete list of accepted format names.
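
              A sketch of how a pipeline step might pin the format: index_as is a hypothetical helper, and
              the file names and the "embl" format are placeholders. It assumes sindex exits non-zero on
              failure, which this page does not state. With DRY_RUN set, each command is printed instead of
              executed.

```shell
# Hypothetical pipeline step: index each file with an explicit format,
# so that format autodetection is never relied on.
index_as() {
    fmt=$1; shift
    for f in "$@"; do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "sindex --informat $fmt $f"
        elif ! sindex --informat "$fmt" "$f"; then
            # Assumption: a non-zero exit status signals a failed index build.
            echo "indexing failed: $f" >&2
            return 1
        fi
    done
}

DRY_RUN=1 index_as embl release1.dat release2.dat
```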

       --pfamseq
              A  hack  for  Pfam; indexes a FASTA file that is known to have identifier lines in format ">[name]
              [accession] [optional description]". Normally only the sequence name would be indexed as a primary
              key  in  a FASTA SSI file, but this allows indexing both the name (as a primary key) and accession
              (as a secondary key).
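
              To illustrate the identifier-line layout this option expects, the sketch below splits a made-up
              header into the two fields that would serve as keys. The header text is hypothetical, and the
              awk command only illustrates the layout; it is not how sindex itself parses the file.

```shell
# Hypothetical FASTA record in ">[name] [accession] [optional description]" layout.
sample='>seq1_example ACC00001.1 hypothetical protein
MKTAYIAKQR'

# Print name (primary key) and accession (secondary key), tab-separated.
pfam_keys() {
    awk '/^>/ { print substr($1, 2) "\t" $2 }'
}

printf '%s\n' "$sample" | pfam_keys
```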

SEE ALSO

       afetch(1), alistat(1),  compalign(1),  compstruct(1),  revcomp(1),  seqsplit(1),  seqstat(1),  sfetch(1),
       shuffle(1), sreformat(1), stranslate(1), weight(1).

AUTHOR

       Biosquid and its documentation are Copyright (C) 1992-2003 HHMI/Washington University School of Medicine.
       Freely distributed under the GNU General Public License (GPL). See COPYING in the source code
       distribution for more details, or contact me.

       Sean Eddy
       HHMI/Department of Genetics
       Washington University School of Medicine
       4444 Forest Park Blvd., Box 8510
       St Louis, MO 63108 USA
       Phone: 1-314-362-7666
       FAX  : 1-314-362-2157
       Email: eddy@genetics.wustl.edu