Ubuntu Manpage: bbnorm.sh - Kmer-based error-correction and normalization tool

NAME

       bbnorm.sh - Kmer-based error-correction and normalization tool

SYNOPSIS

       bbnorm.sh in=<input> out=<reads to keep> outt=<reads to toss> hist=<histogram output>

DESCRIPTION

       Normalizes  read  depth  based  on  kmer  counts.   Can  also error-correct, bin reads by kmer depth, and
       generate a kmer depth histogram.  However, Tadpole has superior error-correction to BBNorm.  Please  read
       bbmap/docs/guides/BBNormGuide.txt for more information.

OPTIONS

   Input parameters
       in=null
              Primary input.  Use in2 for paired reads in a second file

       in2=null
              Second input file for paired reads in two files

       extra=null
              Additional files to use for input (generating hash table) but not for output

       fastareadlen=2^31
              Break up FASTA reads longer than this.  Can be useful when processing scaffolded genomes

       tablereads=-1
              Use at most this many reads when building the hashtable (-1 means all)

       kmersample=1
              Process every nth kmer, and skip the rest

       readsample=1
              Process every nth read, and skip the rest

       interleaved=auto
              May  be  set  to true or false to force the input read file to override autodetection of the input
              file as paired interleaved.

       qin=auto
              ASCII offset for input quality.  May be 33 (Sanger), 64 (Illumina), or auto.

   Output parameters
       out=<file>
              File for normalized or corrected reads.  Use out2 for paired reads in a second file

       outt=<file>
              (outtoss) File for reads that were excluded from primary output

       reads=-1
              Only process this number of reads, then quit (-1 means all)

       sampleoutput=t
              Use sampling on output as well as input (not used if sample rates are 1)

       keepall=f
              Set to true to keep all reads (e.g. if you just want error correction).

       zerobin=f
              Set to true if you want kmers with a count of 0 to go in the  0  bin  instead  of  the  1  bin  in
              histograms.

              Default  is  false, to prevent confusion about how there can be 0-count kmers.  The reason is that
              based on the 'minq' and 'minprob' settings, some kmers may be excluded from the bloom filter.

       tmpdir=
              This will specify a directory for temp files (only needed for multipass runs).  If null, they will
              be written to the output directory.

       usetempdir=t
              Allows enabling/disabling of temporary directory; if disabled, temp files will be written  to  the
              output directory.

       qout=auto
              ASCII offset for output quality.  May be 33 (Sanger), 64 (Illumina), or auto (same as input).

       rename=f
              Rename reads based on their kmer depth.

   Hashing parameters
       k=31   Kmer length (values under 32 are most efficient, but arbitrarily high values are supported)

       bits=32
              Bits  per  cell  in  bloom  filter;  must  be  2, 4, 8, 16, or 32.  Maximum kmer depth recorded is
              2^cbits.  Automatically reduced to 16 in 2-pass.

              Large values decrease accuracy for a fixed amount of memory, so use the lowest number you can that
              will still capture highest-depth kmers.

       hashes=3
              Number of times each kmer is hashed and stored.  Higher is slower.

              Higher is MORE accurate if there is enough memory, and  LESS  accurate  if  there  is  not  enough
              memory.

       prefilter=f
              True  is slower, but generally more accurate; filters out low-depth kmers from the main hashtable.
              The prefilter is more memory-efficient because it uses 2-bit cells.

       prehashes=2
              Number of hashes for prefilter.

       prefilterbits=2
              (pbits) Bits per cell in prefilter.

       prefiltersize=0.35
              Fraction of memory to allocate to prefilter.

       buildpasses=1
              More passes can sometimes increase accuracy by iteratively removing low-depth kmers

       minq=6 Ignore kmers containing bases with quality below this

       minprob=0.5
              Ignore kmers with overall probability of correctness below this

       threads=auto
              (t) Spawn exactly X hashing threads (default is  number  of  logical  processors).   Total  active
              threads may exceed X due to I/O threads.

       rdk=t  (removeduplicatekmers) When true, a kmer's count will only be incremented once per read pair, even
              if that kmer occurs more than once.

   Normalization parameters
       fixspikes=f
              (fs)  Do  a  slower, high-precision bloom filter lookup of kmers that appear to have an abnormally
              high depth due to collisions.

       target=100
              (tgt) Target normalization depth.  NOTE:  All depth parameters control kmer depth, not read depth.

              For kmer depth Dk, read depth Dr, read length R, and kmer size K: Dr=Dk*(R/(R-K+1))

       maxdepth=-1
              (max) Reads will not be downsampled when below this depth, even  if  they  are  above  the  target
              depth.

       mindepth=5
              (min)  Kmers  with  depth  below  this number will not be included when calculating the depth of a
              read.

       minkmers=15
              (mgkpr) Reads  must  have  at  least  this  many  kmers  over  min  depth  to  be  retained.   Aka
              'mingoodkmersperread'.

       percentile=54.0
              (dp)  Read  depth  is  by default inferred from the 54th percentile of kmer depth, but this may be
              changed to any number 1-100.

       uselowerdepth=t
              (uld) For pairs, use the depth of the lower read as the depth proxy.

       deterministic=t
              (dr) Generate random numbers deterministically to ensure identical output between  multiple  runs.
              May decrease speed with a huge number of threads.

       passes=2
              (p) 1 pass is the basic mode.  2 passes (default) allows greater accuracy, error detection, better
              contol of output depth.

   Error detection parameters
       hdp=90.0
              (highdepthpercentile)  Position  in  sorted  kmer  depth array used as proxy of a read's high kmer
              depth.

       ldp=25.0
              (lowdepthpercentile) Position in sorted kmer depth array used as proxy of a read's low kmer depth.

       tossbadreads=f
              (tbr) Throw away reads detected as containing errors.

       requirebothbad=f
              (rbb) Only toss bad pairs if both reads are bad.

       errordetectratio=125
              (edr) Reads with a ratio of at least this much between their high and  low  depth  kmers  will  be
              classified as error reads.

       highthresh=12
              (ht) Threshold for high kmer.  A high kmer at this or above are considered non-error.

       lowthresh=3
              (lt) Threshold for low kmer.  Kmers at this and below are always considered errors.

   Error correction parameters
       ecc=f  Set to true to correct errors.  NOTE: Tadpole is now preferred for ecc as it does a better job.

       ecclimit=3
              Correct up to this many errors per read.  If more are detected, the read will remain unchanged.

       errorcorrectratio=140
              (ecr)  Adjacent  kmers  with  a depth ratio of at least this much between will be classified as an
              error.

       echighthresh=22
              (echt) Threshold for high kmer.  A kmer at this or above may be considered non-error.

       eclowthresh=2
              (eclt) Threshold for low kmer.  Kmers at this and below are considered errors.

       eccmaxqual=127
              Do not correct bases with quality above this value.

       aec=f  (aggressiveErrorCorrection) Sets more aggressive values of ecr=100, ecclimit=7, echt=16, eclt=3.

       cec=f  (conservativeErrorCorrection) Sets more  conservative  values  of  ecr=180,  ecclimit=2,  echt=30,
              eclt=1, sl=4, pl=4.

       meo=f  (markErrorsOnly)  Marks  errors  by  reducing  quality value of suspected errors; does not correct
              anything.

       mue=t  (markUncorrectableErrors) Marks errors only on uncorrectable reads; requires 'ecc=t'.

       overlap=f
              (ecco) Error correct by read overlap.

   Depth binning parameters
       lowbindepth=10
              (lbd) Cutoff for low depth bin.

       highbindepth=80
              (hbd) Cutoff for high depth bin.

       outlow=<file>
              Pairs in which both reads have a median below lbd go into this file.

       outhigh=<file>
              Pairs in which both reads have a median above hbd go into this file.

       outmid=<file>
              All other pairs go into this file.

   Histogram parameters
       hist=<file>
              Specify a file to write the input kmer depth histogram.

       histout=<file>
              Specify a file to write the output kmer depth histogram.

       histcol=3
              (histogramcolumns) Number of histogram columns, 2 or 3.

       pzc=f  (printzerocoverage) Print lines in the histogram with zero coverage.

       histlen=1048576
              Max kmer depth displayed in histogram.  Also affects statistics displayed,  but  does  not  affect
              normalization.

   Peak calling parameters
       peaks=<file>
              Write the peaks to this file.  Default is stdout.

       minHeight=2
              (h) Ignore peaks shorter than this.

       minVolume=5
              (v) Ignore peaks with less area than this.

       minWidth=3
              (w) Ignore peaks narrower than this.

       minPeak=2
              (minp) Ignore peaks with an X-value below this.

       maxPeak=BIG
              (maxp) Ignore peaks with an X-value above this.

       maxPeakCount=8
              (maxpc) Print up to this many peaks (prioritizing height).

   Java Parameters:
       -Xmx   This will set Java's memory usage, overriding autodetection.

              -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs.  The max is typically 85%
              of physical memory.

       -eoom  This  flag  will  cause  the  process to exit if an out-of-memory exception occurs.  Requires Java
              8u92+.

       -da    Disable assertions.

AUTHOR

       Written by Brian Bushnell (Last modified October 19, 2017)

       Please contact  Brian  Bushnell  at  bbushnell@lbl.gov  if  you  encounter  any  problems,  or  post  at:
       http://seqanswers.com/forums/showthread.php?t=41057

       This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage
       of the program.

bbnorm.sh 38.43                                    April 2019                                       BBNORM.SH(1)