Provided by: biobambam2_2.0.185+ds-1_amd64

NAME

       bamadapterfind - find adapter contamination in sequencing reads

SYNOPSIS

       bamadapterfind [options]

DESCRIPTION

       bamadapterfind scans a BAM file for contamination by sequencing adapters. It uses two
       separate methods for this detection:

       list:  each read is matched against a predefined list of adapter sequences. A sequence is
              considered a match if there is an overlap of at least adpmatchminscore bases,
              the overlap covers at least a fraction adpmatchminfrac of the adapter's length, and
              the indel-free local alignment between the adapter and the read covers at least a
              fraction adpmatchminpfrac of the length of the possible overlap between the two.
              If such a match is found, the auxiliary field as is filled with the length of
              the match, af with the fraction of the adapter sequence matched, and aa with
              the name of the matched adapter sequence.
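
       The three thresholds of the list method can be read as a single predicate. The Python
       sketch below is illustrative only and is not taken from the biobambam2 sources. The
       function name and its first four arguments are hypothetical, and it assumes that the
       overlap length and the length of the indel-free alignment have already been computed
       by some aligner.

```python
def passes_list_thresholds(overlap_len, aligned_len, adapter_len, possible_overlap,
                           adpmatchminscore=16, adpmatchminfrac=0.75,
                           adpmatchminpfrac=0.8):
    """Return True if a candidate adapter hit satisfies all three thresholds.

    overlap_len      -- bases shared by read and adapter (hypothetical input)
    aligned_len      -- length of the indel-free local alignment (hypothetical input)
    adapter_len      -- full length of the adapter sequence
    possible_overlap -- longest overlap the read/adapter placement would allow
    """
    if adapter_len <= 0 or possible_overlap <= 0:
        return False
    # 1. The overlap must span at least adpmatchminscore bases.
    long_enough = overlap_len >= adpmatchminscore
    # 2. The overlap must cover a sufficient fraction of the adapter.
    covers_adapter = overlap_len / adapter_len >= adpmatchminfrac
    # 3. The alignment must cover a sufficient fraction of the possible overlap.
    covers_possible = aligned_len / possible_overlap >= adpmatchminpfrac
    return long_enough and covers_adapter and covers_possible
```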

       overlap:
              the two mates need to have a match similar to the following two lines

                  s0s1s2s3s4s5s6s7s8s9s10s11s12s13s14s15s16t0t1t2t3
          x3x2x1x0s0s1s2s3s4s5s6s7s8s9s10s11s12s13s14s15s16

              where  an  infix  s0s1s2...  of  the  first  read  matches  a suffix of the reverse
              complement of the second read. In this case it is likely that the  first  read  has
              been  sequenced  beyond  the  end  of  the  payload  sequence and into the attached
              adapter. This overlap needs to be at least MIN_OVERLAP bases long to be considered.
              If  such  an overlap is found, then the adjacent sequences are checked for a match,
              where in the example x3x2x1x0 needs to be the reverse complement of  t0t1t2t3.  The
              adjacent sequences are checked up to a limit of ADAPTER_MATCH base pairs. If such a
              match is found then the auxiliary field ah is set to 1 and a3 is used to store  the
              length of the suspected adapter sequence.
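
       The overlap method can be sketched in Python as follows. This is a simplified
       illustration under stated assumptions, not the program's implementation: it tries
       every overlap length by brute force instead of using a seed (SEED_LENGTH and
       MAX_SEED_MISMATCHES are ignored), it requires the adjacent sequences to match
       exactly, and it reports the number of bases of the first read beyond the overlap as
       the suspected adapter length. All function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_overlap_adapter(read1, read2, min_overlap=32, pct_mismatch=10,
                         adapter_match=12):
    """Return the suspected adapter length in read1, or None if nothing is found."""
    rc2 = revcomp(read2)
    # Try each overlap length that leaves at least one adjacent base on both sides.
    for overlap in range(min_overlap, min(len(read1), len(rc2))):
        s1 = read1[:overlap]                 # s0 s1 s2 ... in the first read
        s2 = rc2[len(rc2) - overlap:]        # the same bases in revcomp(read2)
        if mismatches(s1, s2) * 100 > pct_mismatch * overlap:
            continue
        # Compare the adjacent sequences, limited to adapter_match bases.
        k = min(len(read1) - overlap, len(rc2) - overlap, adapter_match)
        t = read1[overlap:overlap + k]                        # t0 t1 t2 ...
        x = rc2[len(rc2) - overlap - k:len(rc2) - overlap]    # ... x2 x1 x0
        if x == revcomp(t):
            return len(read1) - overlap
    return None


# Example: a 40 base insert followed by 10 adapter bases on both mates.
insert = "ACGTTGCAAGCTTAGGCATCCGATAGCTAGGATCCATGCA"
adapter = "AGATCGGAAG"
print(find_overlap_adapter(insert + adapter, revcomp(insert) + adapter))  # prints 10
```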

       The following key=value pairs can be given at the program start:

       level=<-1|0|1|9|11>: set compression level of the output BAM file. Valid values are

       -1:    zlib/gzip default compression level

       0:     uncompressed

       1:     zlib/gzip level 1 (fast) compression

       9:     zlib/gzip level 9 (best) compression

       If  libmaus  has  been compiled with support for igzip (see https://software.intel.com/en-
       us/articles/igzip-a-high-performance-deflate-compressor-with-optimizations-for-genomic-
       data) then an additional valid value is

       11:    igzip compression

       verbose=<1>: Valid values are

       1:     print progress report on standard error

       0:     do not print progress report

       mod=<1048576>:  if  verbose=1  then  this  sets  the frequency of progress reports, i.e. a
       report is given for each mod'th input read/alignment

       adaptersbam=<>: file name of the BAM file containing the list of adapters used for the
       adapter matching described above under list. The program contains an internal list which
       is used if this key is not given.

       SEED_LENGTH=<12>: length of the seed used for detecting overlaps in overlap based matching
       (see overlap above, default value is 12 base pairs).

       PCT_MISMATCH=<10>:  percentage  of  mismatches  allowed  for  overlap  matching. This only
       includes the overlap, not the suspected attached adapter sequence. The  default  value  is
       10.

       MAX_SEED_MISMATCHES=<SEED_LENGTH*PCT_MISMATCH>:  maximum  number  of mismatches allowed in
       the seed. By default this value is computed as SEED_LENGTH*PCT_MISMATCH.
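
       The default above is stated as a plain product. A short worked calculation in Python
       shows what that product yields for the default values. Since PCT_MISMATCH is a
       percentage, the second value treats it as a fraction of 100. That reading is an
       assumption, as this page does not say how the product is scaled or rounded. The
       function name is hypothetical.

```python
def seed_mismatch_budget(seed_length=12, pct_mismatch=10):
    """Return (literal product, product with the percentage taken as a fraction)."""
    literal = seed_length * pct_mismatch            # as written in this page
    as_fraction = seed_length * pct_mismatch / 100  # assumption: percent -> fraction
    return literal, as_fraction


print(seed_mismatch_budget())  # prints (120, 1.2)
```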

       MIN_OVERLAP=<32>: minimum length of overlap  for  overlap  matching  in  base  pairs  (see
       above). The default value is 32.

       ADAPTER_MATCH=<12>: maximum number of base pairs to check for matching adapters in overlap
       based matching. The default value is 12.

       adpmatchminscore=<16>: minimum score for list based adapter matching (see above, default
       value is 16).

       adpmatchminfrac=<0.75>: minimum fraction of the adapter sequence which needs to match (see
       above, default value is 0.75=75%).

       adpmatchminpfrac=<0.8>: minimum fraction of overlap for adapter list matching (see above,
       default value is 0.8=80%).

       clip=<0>: clip the adapters off and move the corresponding sequence part to the qs
       auxiliary field and the corresponding quality string part to the qq auxiliary field.
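
       The effect of clipping on a single read can be sketched in Python as follows. The
       sketch assumes that the adapter occupies the last adapter_len bases of the read and
       that this length is already known from one of the detection methods. The function
       name and the dictionary used to hold the auxiliary fields are hypothetical.

```python
def clip_adapter(seq, qual, adapter_len):
    """Split a read into its kept part and the clipped adapter part.

    Returns (kept sequence, kept qualities, auxiliary fields), where the
    auxiliary fields hold the clipped sequence (qs) and qualities (qq).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality string differ in length")
    adapter_len = max(0, min(adapter_len, len(seq)))
    keep = len(seq) - adapter_len
    aux = {}
    if adapter_len > 0:
        aux = {"qs": seq[keep:], "qq": qual[keep:]}
    return seq[:keep], qual[:keep], aux
```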

       reflen=<3000000000>: length of reference sequence/genome

       pA=<0.25>: relative frequency of base A in reference sequence/genome

       pC=<0.25>: relative frequency of base C in reference sequence/genome

       pG=<0.25>: relative frequency of base G in reference sequence/genome

       pT=<0.25>: relative frequency of base T in reference sequence/genome
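
       Because every option is a key=value pair, a command line can be assembled from a
       mapping of keys to values. The Python sketch below only builds the argument list and
       does not run the program, since this page does not describe how the input and output
       BAM files are supplied. The function name is hypothetical.

```python
def build_command(options, program="bamadapterfind"):
    """Turn a mapping of option keys to values into an argument list."""
    return [program] + [f"{key}={value}" for key, value in options.items()]


# Keys are taken from the option list above.
print(build_command({"level": 1, "verbose": 0, "MIN_OVERLAP": 32, "clip": 1}))
```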

AUTHOR

       Written by German Tischler.

REPORTING BUGS

       Report bugs to <germant@miltenyibiotec.de>

COPYRIGHT

       Copyright © 2009-2013 German Tischler, © 2011-2013 Genome Research Limited. License
       GPLv3+: GNU GPL version 3 <http://gnu.org/licenses/gpl.html>
       This  is free software: you are free to change and redistribute it.  There is NO WARRANTY,
       to the extent permitted by law.