Provided by: anfo_0.98-4_amd64

NAME

       dnaindex - index dna file for use with ANFO

SYNOPSIS

       dnaindex [ option ... ]

DESCRIPTION

       dnaindex builds an index for a dna file.  Dna files must be indexed to be usable with anfo(1).  It is
       possible to have multiple indices for the same dna file.

OPTIONS

       -V, --version
              Print version number and exit.

       -o file, --output file
              Write output to file.  The name customarily ends in .idx.  The default is genomename_wordsize.idx.

       -g file, --genome file
              Read the genome from file.  This file name is also stored in the resulting index so that it can
              be found automatically whenever the index is used.  It is therefore best to give file as a plain
              file name without a directory path.

       -G dir, --genome-dir dir
              Add dir to the genome search path.  This is useful if the genome to be indexed is not yet  in  the
              place where it will later be used.

       -d text, --description text
              Add text as description to the index.  This is purely informative.

       -s size, --wordsize size
              Set the wordsize to size.  A smaller wordsize increases precision at the expense of higher
              computational cost.  The default is 12, which together with the default stride of 8 yields a good
              compromise.

       -S num, --stride num
              Set the stride to num.  Only one out of num possible words of dna is actually indexed.  A smaller
              stride increases precision at the expense of a bigger index.  The default is 8, which in
              conjunction with a wordsize of 12 yields a good compromise.
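       The interplay of wordsize and stride can be illustrated with a small sketch.  It is a hypothetical
       model, not dnaindex's implementation: it assumes words start at positions 0, stride, 2*stride, ...
       and that each base is packed into two bits (A=0, C=1, G=2, T=3).  The names CODE, encode and
       sampled_words are invented for illustration.

```python
# Hypothetical sketch of stride sampling; dnaindex's real sampling may differ.
# Assumption: words start at positions 0, stride, 2*stride, ...
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit encoding


def encode(word):
    """Pack an unambiguous DNA word into an integer, two bits per base."""
    value = 0
    for base in word:
        value = (value << 2) | CODE[base]
    return value


def sampled_words(seq, wordsize=12, stride=8):
    """Return (position, encoded word) for every stride-th word of seq."""
    return [(pos, encode(seq[pos:pos + wordsize]))
            for pos in range(0, len(seq) - wordsize + 1, stride)]


# 4**12 = 16,777,216 distinct 12-mers exist; stride 8 stores about len(seq)/8 words.
print(len(sampled_words("ACGT" * 8)))  # 3: words start at 0, 8 and 16
```

       Under this model, halving the stride roughly doubles the number of stored positions, which is
       the index-size cost described above.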

       -l lim, --limit lim
              Prevent the indexing of words that occur more than lim times.  This can be used to ignore
              repetitive seeds and save the space needed to store them.  A suitable value depends on the size
              of the genome being indexed; something like 500 works for the human genome with wordsize 12 and
              stride 8.
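       The effect of --limit can be modelled as dropping every word whose position list grows beyond lim.
       This is a hypothetical in-memory sketch, not the on-disk index format; it assumes occurrences are
       counted over the sampled positions only, and the name build_index is invented for illustration.

```python
from collections import defaultdict


def build_index(seq, wordsize=12, stride=8, limit=500):
    """Map each sampled word to its start positions, omitting words that
    occur more than `limit` times (hypothetical in-memory model)."""
    positions = defaultdict(list)
    for pos in range(0, len(seq) - wordsize + 1, stride):
        positions[seq[pos:pos + wordsize]].append(pos)
    # Repetitive words are dropped entirely, so their positions cost no space.
    return {word: hits for word, hits in positions.items() if len(hits) <= limit}


print(build_index("AAAAAAAACCCC", wordsize=4, stride=4, limit=1))
# {'CCCC': [8]}  ('AAAA' occurs twice and exceeds the limit)
```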

       -h, --histogram
              Produce a histogram of word frequencies.  This can be used to get an idea of the frequency
              distribution and to select an appropriate value for --limit.
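       One way to read such a histogram is to count how many distinct words occur exactly n times, and
       then ask how many stored positions a candidate limit would discard.  The sketch below is
       hypothetical: it does not reproduce the output format of --histogram, and word_histogram and
       positions_dropped are invented names.

```python
from collections import Counter


def word_histogram(seq, wordsize=12, stride=8):
    """Return {occurrence count: number of distinct words with that count}."""
    counts = Counter(seq[pos:pos + wordsize]
                     for pos in range(0, len(seq) - wordsize + 1, stride))
    return dict(sorted(Counter(counts.values()).items()))


def positions_dropped(histogram, limit):
    """Number of stored positions that a given --limit would discard."""
    return sum(freq * words for freq, words in histogram.items() if freq > limit)


hist = word_histogram("AAAAAAAACCCC", wordsize=4, stride=4)
print(hist)                        # {1: 1, 2: 1}
print(positions_dropped(hist, 1))  # 2
```

       Scanning positions_dropped over a range of limits shows where the long tail of repetitive words
       begins, which is a reasonable place to set --limit.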

       -v, --verbose
              Print a progress indicator during operation.

NOTES

       dnaindex is limited to genomes no longer than 4 gigabases because it uses 32-bit indices.  The index is
       quite large, so depending on the parameters, a 64-bit platform is needed for genomes in the gigabase
       range.
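       The arithmetic behind both limits can be sketched as follows.  A 32-bit index distinguishes 2^32
       (about 4.3 billion) positions.  The size estimate rests on an assumed layout, one 32-bit position
       per sampled word plus one 32-bit offset per possible word, and is not dnaindex's documented file
       format; the function names are invented for illustration.

```python
MAX_POSITIONS = 2 ** 32  # number of distinct values a 32-bit index can hold


def fits_32bit_indices(genome_length):
    """True if every base position can be addressed by a 32-bit index."""
    return genome_length <= MAX_POSITIONS


def estimated_index_bytes(genome_length, wordsize=12, stride=8, entry_bytes=4):
    """Rough size under an assumed layout: one 32-bit position per sampled
    word plus one 32-bit offset per possible word (4**wordsize buckets).
    This is not dnaindex's documented file format."""
    sampled = genome_length // stride
    buckets = 4 ** wordsize
    return (sampled + buckets) * entry_bytes


human = 3_100_000_000  # roughly the length of the human genome
print(fits_32bit_indices(human))               # True
print(estimated_index_bytes(human))            # 1617108864 (about 1.5 GiB)
print(estimated_index_bytes(human, stride=1))  # 12467108864, beyond a 32-bit address space
```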

       If a genome contains IUPAC ambiguity codes, the affected seeds need to be expanded into every
       unambiguous word they can stand for.  If there are many ambiguity codes in a small region, the
       expansion results in an unacceptably large index.
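       The growth is multiplicative: each ambiguity code multiplies the number of words by the number of
       bases it represents, so a single seed of wordsize 12 containing only N expands to 4^12 words.  The
       sketch below uses the standard IUPAC nucleotide table; expand_seed and expansion_count are invented
       names, and how dnaindex stores the expanded words is not shown.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_seed(seed):
    """All unambiguous words a seed containing IUPAC codes can stand for."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in seed))]


def expansion_count(seed):
    """Number of words expand_seed would produce, without building them."""
    count = 1
    for c in seed:
        count *= len(IUPAC[c])
    return count


print(expand_seed("ACR"))               # ['ACA', 'ACG']
print(expansion_count("ACGTNNNNACGT"))  # 256
print(expansion_count("N" * 12))        # 16777216
```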

ENVIRONMENT

       ANFO_PATH
              Colon-separated list of directories searched for genome files.
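       A lookup along this path might work as in the sketch below, which also covers directories added
       with --genome-dir.  It is hypothetical: the search order (--genome-dir entries before ANFO_PATH),
       the skipping of empty entries and the function names are assumptions, not documented behaviour.
       The directory names in the usage line are placeholders.

```python
import os


def genome_search_path(extra_dirs=(), env=None):
    """Directories to search: -G/--genome-dir entries first, then ANFO_PATH.
    The ordering and the skipping of empty entries are assumptions."""
    env = os.environ if env is None else env
    from_env = [d for d in env.get("ANFO_PATH", "").split(":") if d]
    return list(extra_dirs) + from_env


def find_genome(name, dirs, is_file=os.path.isfile):
    """Return the first dirs/name that exists, or None if none does."""
    for d in dirs:
        candidate = os.path.join(d, name)
        if is_file(candidate):
            return candidate
    return None


dirs = genome_search_path(["/staging"], env={"ANFO_PATH": "/data/genomes:/srv/anfo"})
print(dirs)  # ['/staging', '/data/genomes', '/srv/anfo']
```

       Because the stored genome name is joined onto each search directory, a plain file name given
       with --genome stays valid after the genome file is moved to any directory on the path.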

FILES

       /etc/popt
              The system wide configuration file for popt(3).  dnaindex identifies itself as "dnaindex" to popt.

       ~/.popt
              Per-user configuration file for popt(3).

BUGS

       None known.

AUTHOR

       Udo Stenzel <udo_stenzel@eva.mpg.de>

SEE ALSO

       anfo(1), fa2dna(1), popt(3), fasta(5)