Provided by: anfo_0.98-9_amd64

NAME

       dnaindex - index dna file for use with ANFO

SYNOPSIS

       dnaindex [ option ... ]

DESCRIPTION

       dnaindex builds an index for a dna file.  Dna files must be indexed to be usable with
       anfo(1), and it is possible to have multiple indices for the same dna file.

OPTIONS

       -V, --version
              Print version number and exit.

       -o file, --output file
              Write output to file.  file customarily ends in .idx.  The default is
              genomename_wordsize.idx.

       -g file, --genome file
              Read the genome from file.  This file name is also stored in the resulting index so
              it can be found automatically whenever the index is used.  It is therefore best to
              give file as a bare file name without a directory path.

       -G dir, --genome-dir dir
              Add  dir  to the genome search path.  This is useful if the genome to be indexed is
              not yet in the place where it will later be used.

       -d text, --description text
              Add text as description to the index.  This is purely informative.

       -s size, --wordsize size
              Set the wordsize to size.  A smaller wordsize increases precision at the expense of
              higher computational cost.  The default is 12, which with a stride of 8 yields a
              good compromise.

       -S num, --stride num
              Set the stride to num.  Only one out of num possible words of dna is actually
              indexed.  A smaller stride increases precision at the expense of a bigger index.
              The default is 8, which in conjunction with a wordsize of 12 yields a good
              compromise.

       -l lim, --limit lim
              Prevents the indexing of words that occur more than lim times.  This can be used
              to ignore repetitive seeds and save the space needed to store them.  A good value
              depends on the size of the genome being indexed; something like 500 works for the
              human genome with wordsize 12 and stride 8.

       -h, --histogram
              Produce a histogram of word frequencies.  This can be used to get an idea of the
              frequency distribution and to select an appropriate value for --limit.

       -v, --verbose
              Print a progress indicator during operation.
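       The -s, -S, -l and -h options interact, and a small model makes the trade-offs easier
       to see.  The Python sketch below is illustrative only: it assumes that sampled words
       start at positions 0, stride, 2*stride and so on, and that --limit counts only sampled
       occurrences.  Neither detail is specified here, and all function names are
       hypothetical, not part of ANFO.  With a stride of 8 roughly one word per 8 bases is
       kept, and with a wordsize of 12 there are 4^12 = 16,777,216 possible distinct words.

```python
from collections import Counter, defaultdict

# Hypothetical sketch of the --wordsize, --stride, --limit and --histogram ideas.
# ASSUMPTION: sampled words start at positions 0, stride, 2*stride, ...;
# the real dnaindex may sample and store words differently.


def sampled_words(seq, wordsize=12, stride=8):
    """Yield (position, word) for every stride-th word of length wordsize."""
    for pos in range(0, len(seq) - wordsize + 1, stride):
        yield pos, seq[pos:pos + wordsize]


def build_index(seq, wordsize=12, stride=8, limit=None):
    """Map each sampled word to its positions, dropping words seen > limit times."""
    positions = defaultdict(list)
    for pos, word in sampled_words(seq, wordsize, stride):
        positions[word].append(pos)
    if limit is None:
        return dict(positions)
    # Repetitive seeds (more than `limit` sampled occurrences) are not stored.
    return {w: p for w, p in positions.items() if len(p) <= limit}


def frequency_histogram(seq, wordsize=12, stride=8):
    """Return {occurrence count: number of distinct words with that count}."""
    counts = Counter(word for _, word in sampled_words(seq, wordsize, stride))
    return dict(sorted(Counter(counts.values()).items()))


def positions_dropped(histogram, limit):
    """Number of sampled positions that a given limit would discard."""
    return sum(count * n for count, n in histogram.items() if count > limit)
```

       For example, frequency_histogram("AAAAC", wordsize=2, stride=1) returns {1: 1, 3: 1}
       (one word seen once, one seen three times), and positions_dropped on that histogram
       with limit 2 reports 3 discarded positions.  Reading such totals off a real
       --histogram run is one way to weigh index size against lost seeds.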

NOTES

       dnaindex is limited to genomes no longer than 4 gigabases due to its use of 32-bit
       indices.  The index is quite large, so depending on parameters, a 64-bit platform is
       needed for genomes in the gigabase range.
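       The 4-gigabase ceiling is the number of distinct values a 32-bit offset can take,
       2^32 = 4,294,967,296.  The Python sketch below also gives a rough size estimate under a
       stated assumption: each indexed word costs one 4-byte position entry, while the word
       directory, ambiguity expansion and --limit filtering are ignored.  Both function names
       are hypothetical.

```python
MAX_32BIT_OFFSETS = 2 ** 32  # 4,294,967,296 distinct unsigned 32-bit values


def fits_32bit_offsets(genome_length: int) -> bool:
    """True if every base offset 0 .. genome_length - 1 fits in 32 bits."""
    return genome_length <= MAX_32BIT_OFFSETS


def rough_index_bytes(genome_length: int, stride: int = 8,
                      bytes_per_entry: int = 4) -> int:
    """Rough size of the position entries alone.

    ASSUMPTION: one bytes_per_entry-sized entry per indexed word; the word
    directory, IUPAC expansion and --limit filtering are all ignored.
    """
    return (genome_length // stride) * bytes_per_entry


# A genome of about 3 gigabases at stride 8 gives about 1.5 GB of entries.
print(fits_32bit_offsets(3_000_000_000), rough_index_bytes(3_000_000_000))
```

       Under these assumptions a genome of about 3 gigabases already needs roughly 1.5 GB
       for position entries alone, a large share of the at most 4 GiB a 32-bit process can
       address.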

       If a genome contains IUPAC ambiguity codes, the affected seeds need to be expanded into
       every word they can stand for.  Since each ambiguous position multiplies the number of
       variants, many ambiguity codes in a small region result in an unacceptably large index.
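       The blow-up is multiplicative: a 12-base word containing k N's expands into 4^k
       concrete words.  The Python sketch below uses the standard IUPAC nucleotide codes to
       show the expansion; the function names are hypothetical, and it makes no claim about
       how dnaindex performs the expansion internally.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_word(word):
    """Return every unambiguous word that an IUPAC-coded word can stand for."""
    choices = [IUPAC[base] for base in word.upper()]
    return ["".join(bases) for bases in product(*choices)]


def expansion_size(word):
    """Number of concrete words, computed without building them."""
    n = 1
    for base in word.upper():
        n *= len(IUPAC[base])
    return n
```

       expand_word("ACR") yields the two words ACA and ACG, while a word of twelve N's
       stands for all 16,777,216 words of that length.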

ENVIRONMENT

       ANFO_PATH
              Colon-separated list of directories searched for genome files.
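       A colon-separated search path is resolved by trying each directory in turn.  The
       Python sketch below is one plausible reading, not ANFO's actual code: find_genome is a
       hypothetical name, and it assumes that directories given with -G/--genome-dir are
       tried before those in ANFO_PATH and that absolute names bypass the search.

```python
import os


def find_genome(name, extra_dirs=(), env=None):
    """Return the first existing path for `name`, or None.

    ASSUMPTIONS: -G/--genome-dir directories are tried before ANFO_PATH,
    and absolute names are used as-is; the real search order may differ.
    """
    if env is None:
        env = os.environ
    if os.path.isabs(name):
        return name if os.path.isfile(name) else None
    # Empty components (e.g. from a trailing colon) are skipped.
    path_dirs = [d for d in env.get("ANFO_PATH", "").split(":") if d]
    for directory in list(extra_dirs) + path_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```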

FILES

       /etc/popt
              The system-wide configuration file for popt(3).  dnaindex identifies itself as
              "dnaindex" to popt.

       ~/.popt
              Per-user configuration file for popt(3).

BUGS

       None known.

AUTHOR

       Udo Stenzel <udo_stenzel@eva.mpg.de>

SEE ALSO

       anfo(1), fa2dna(1), popt(3), fasta(5)