Ubuntu Manpage: AdapterRemoval - Fast short-read adapter trimming and processing

Provided by: adapterremoval_2.3.1-3build1_amd64

NAME

       AdapterRemoval - Fast short-read adapter trimming and processing

SYNOPSIS

       AdapterRemoval [options...] --file1 <filenames> [--file2 <filenames>]

DESCRIPTION

       AdapterRemoval removes residual adapter sequences from single-end (SE) or paired-end (PE)
       FASTQ reads, optionally trimming Ns and low-quality bases and/or collapsing overlapping
       paired-end mates into one read. Low-quality reads are filtered based on the resulting
       length and the number of ambiguous nucleotides (‘N’) present following trimming. These
       operations may be combined with simultaneous demultiplexing using 5’ barcode sequences.
       Alternatively, AdapterRemoval may attempt to reconstruct consensus adapter sequences
       from paired-end data, in order to allow the identification of the adapter sequences
       originally used.

       If you use this program, please cite the paper:
          Schubert, Lindgreen, and Orlando (2016). AdapterRemoval  v2:  rapid  adapter  trimming,
          identification, and read merging. BMC Research Notes, 12;9(1):88

          http://bmcresnotes.biomedcentral.com/articles/10.1186/s13104-016-1900-2

       For detailed documentation, please see
          http://adapterremoval.readthedocs.io/en/v2.2.3/

OPTIONS

       --help Display summary of command-line options.

       --version
              Print the version string.

       --file1 filename [filenames...]
              Read  FASTQ reads from one or more files, either uncompressed, bzip2 compressed, or
              gzip compressed. This contains either the single-end (SE) reads or, if  paired-end,
              the  mate  1 reads. If running in paired-end mode, both --file1 and --file2 must be
              set. See the primary documentation for a list of supported formats.

       --file2 filename [filenames...]
              Read one or more FASTQ files containing mate 2  reads  for  a  paired-end  run.  If
              specified, --file1 must also be set.
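
       The relationship between --file1 and --file2 can be sketched as a small helper that
       assembles an argument list. The function name build_command and the file names are
       hypothetical; only options described on this page are used.

```python
def build_command(file1, file2=None, basename=None, threads=1):
    """Assemble an AdapterRemoval argument list (hypothetical helper)."""
    if not file1:
        # --file1 is always required, for single-end and paired-end runs.
        raise ValueError("--file1 must be set")
    argv = ["AdapterRemoval", "--file1", *file1]
    if file2:
        # Mate 2 files switch the run to paired-end mode.
        argv += ["--file2", *file2]
    if basename is not None:
        argv += ["--basename", basename]
    argv += ["--threads", str(threads)]
    return argv


cmd = build_command(["s_1.fq.gz"], ["s_2.fq.gz"], basename="sample")
```

       The resulting list can be passed to subprocess.run(cmd, check=True), which raises
       CalledProcessError if the program returns a non-zero exit status.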

       --identify-adapters
              Attempt  to  build  a  consensus  adapter  sequence from fully overlapping pairs of
              paired-end reads. The minimum overlap is controlled  by  --minalignmentlength.  The
              result  will  be  compared  with the values set using --adapter1 and --adapter2. No
              trimming is performed in this mode. Default is off.

       --threads n
              Maximum number of threads. Defaults to 1.

   FASTQ options
       --qualitybase base
              The quality score encoding used in input reads - either ‘64’ for Phred+64
              (Illumina  1.3+  and  1.5+)  or ‘33’ for Phred+33 (Illumina 1.8+). In addition, the
              value ‘solexa’ may be used to specify reads with Solexa encoded scores. Default  is
              33.

       --qualitybase-output base
              The base of the quality score for reads written by AdapterRemoval - either ‘64’ for
              Phred+64 (i.e., Illumina 1.3+ and 1.5+) or ‘33’ for Phred+33  (Illumina  1.8+).  In
              addition,  the  value  ‘solexa’  may  be  used to specify reads with Solexa encoded
              scores. However, note that  quality  scores  are  represented  using  Phred  scores
              internally, and conversion to and from Solexa scores therefore results in a loss of
              information. The default corresponds to the value given for --qualitybase.

       --qualitymax base
              Specifies the maximum Phred score expected in input files, and  used  when  writing
              output  files.  Possible values are 0 to 93 for Phred+33 encoded files, and 0 to 62
              for Phred+64 encoded files. Defaults to 41.
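
       As an illustration of the two Phred offsets, the following minimal sketch converts
       between quality strings and numeric scores. The function names are hypothetical and
       Solexa-encoded scores are not handled.

```python
def decode_qualities(quality_string, qualitybase=33):
    """Convert a FASTQ quality string into a list of Phred scores."""
    scores = [ord(ch) - qualitybase for ch in quality_string]
    if any(score < 0 for score in scores):
        raise ValueError("character below the chosen quality base")
    return scores


def encode_qualities(scores, qualitybase=33):
    """Convert Phred scores back into a FASTQ quality string."""
    return "".join(chr(score + qualitybase) for score in scores)


scores = decode_qualities("II5!", qualitybase=33)
as_phred64 = encode_qualities(scores, qualitybase=64)
```

       The printable ASCII range ends at character 126, which is why the largest score is
       93 for Phred+33 (126 - 33) and 62 for Phred+64 (126 - 64).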

       --mate-separator separator
              Character separating the mate number (1 or 2) from the read name in FASTQ  records.
              Defaults to ‘/’.

       --interleaved
              Enables --interleaved-input and --interleaved-output.

       --interleaved-input
              If set, input is expected to be an interleaved FASTQ file specified using --file1,
              in which pairs of reads are written one after the other (e.g. read1/1, read1/2,
              read2/1, read2/2, etc.).
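
       The interleaved layout can be illustrated with a short reader that yields mate 1 and
       mate 2 records in pairs. This is a sketch only: it assumes four-line FASTQ records,
       and the function names are hypothetical.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, separator, qualities) tuples from four-line records."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def read_interleaved_pairs(lines):
    """Yield (mate1, mate2) tuples from interleaved FASTQ lines."""
    records = read_fastq(lines)
    for mate1 in records:
        # The record following each mate 1 read is expected to be its mate 2 read.
        mate2 = next(records, None)
        if mate2 is None:
            raise ValueError("odd number of records in interleaved input")
        yield mate1, mate2


example = ["@read1/1", "ACGT", "+", "IIII", "@read1/2", "TTTT", "+", "IIII"]
pairs = list(read_interleaved_pairs(example))
```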

       --interleaved-output
              Write  paired-end  reads to a single file, interleaving mate 1 and mate 2 reads. By
              default, this file is named basename.paired.truncated,  but  this  may  be  changed
              using the --output1 option.

       --combined-output
              Write  all reads into the files specified by --output1 and --output2. The sequences
              of reads discarded due to quality filters or  read  merging  are  replaced  with  a
              single   ‘N’   with   Phred   score   0.   This   option   can   be  combined  with
              --interleaved-output to write PE reads to  a  single  output  file  specified  with
              --output1.

   Output file options
       --basename filename
              Prefix used for naming output files, unless these names have been overridden
              using the corresponding command-line option (see below).

       --settings file
              Output file containing information on the parameters used in the  run  as  well  as
              overall   statistics   on   the   reads   after   trimming.   Default  filename  is
              ‘basename.settings’.

       --output1 file
              Output   file   containing   trimmed   mate1    reads.    Default    filename    is
              ‘basename.pair1.truncated’   for   paired-end   reads,   ‘basename.truncated’   for
              single-end reads, and ‘basename.paired.truncated’ for interleaved paired-end reads.

       --output2 file
              Output file containing trimmed  mate  2  reads  when  --interleaved-output  is  not
              enabled. Default filename is ‘basename.pair2.truncated’ in paired-end mode.

       --singleton file
              Output file containing paired reads for which the mate has been discarded.
              Default filename is ‘basename.singleton.truncated’.

       --outputcollapsed file
              If --collapse is set, contains overlapping mate-pairs which have been merged into a
              single read (PE mode) or reads for which the adapter was identified by a minimum
              overlap, indicating that the entire template molecule is present. This does not
              include reads which have subsequently been trimmed due to low-quality or ambiguous
              nucleotides. Default filename is ‘basename.collapsed’.

       --outputcollapsedtruncated file
              Collapsed reads (see --outputcollapsed) which were trimmed due to the presence of
              low-quality      or      ambiguous     nucleotides.     Default     filename     is
              ‘basename.collapsed.truncated’.

       --discarded file
              Contains reads discarded due to the --minlength, --maxlength or --maxns options.
              Default filename is ‘basename.discarded’.
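
       The default filenames listed above can be summarised in a small lookup. The function
       default_output_names is hypothetical, and which files are written in which mode is
       partly inferred from the option descriptions rather than stated explicitly.

```python
def default_output_names(basename, paired=False, interleaved_output=False,
                         collapse=False):
    """Map output options to the default filenames documented above."""
    names = {
        "--settings": f"{basename}.settings",
        "--discarded": f"{basename}.discarded",
    }
    if not paired:
        names["--output1"] = f"{basename}.truncated"
    elif interleaved_output:
        # Both mates share a single interleaved file.
        names["--output1"] = f"{basename}.paired.truncated"
    else:
        names["--output1"] = f"{basename}.pair1.truncated"
        names["--output2"] = f"{basename}.pair2.truncated"
    if paired:
        # Assumption: singletons only arise when reads have mates.
        names["--singleton"] = f"{basename}.singleton.truncated"
    if collapse:
        names["--outputcollapsed"] = f"{basename}.collapsed"
        names["--outputcollapsedtruncated"] = f"{basename}.collapsed.truncated"
    return names


names = default_output_names("sample", paired=True, collapse=True)
```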

   Output compression options
       --gzip If set, all FASTQ files written by AdapterRemoval will be gzip compressed using the
              compression level specified using --gzip-level. The extension  “.gz”  is  added  to
              files for which no filename was given on the command-line. Defaults to off.

       --gzip-level level
              Determines the compression level used when gzip’ing FASTQ files. Must be a value in
              the range 0 to 9, with 0 disabling compression and 9 being  the  best  compression.
              Defaults to 6.

       --bzip2
              If  set,  all  FASTQ files written by AdapterRemoval will be bzip2 compressed using
              the compression level specified using --bzip2-level. The extension “.bz2” is  added
              to files for which no filename was given on the command-line. Defaults to off.

       --bzip2-level level
              Determines  the  compression level used when bzip2’ing FASTQ files. Must be a value
              in the range 1 to 9, with 9 being the best compression. Defaults to 9.

   FASTQ trimming options
       --adapter1 adapter
              Adapter sequence expected to be found in mate 1 reads, specified in read direction.
              For a detailed description of how to provide the appropriate adapter sequences, see
              the   “Adapters”   section   of    the    online    documentation.    Default    is
              AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG.

       --adapter2 adapter
              Adapter sequence expected to be found in mate 2 reads, specified in read direction.
              For a detailed description of how to provide the appropriate adapter sequences, see
              the    “Adapters”    section    of    the    online   documentation.   Default   is
              AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT.

       --adapter-list filename
              Read one or more adapter sequences from a table. The first two  columns  (separated
              by whitespace) of each line in the file are expected to correspond to values passed
              to --adapter1 and --adapter2. In single-end mode, only column one is required. Lines
              starting  with  ‘#’  are  ignored.  When  multiple  rows  are  found  in the table,
              AdapterRemoval will try each adapter (pair), and select the best aligning  adapters
              for each FASTQ read processed.
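
       The table layout described for --adapter-list can be illustrated with a minimal
       parser. The function name is hypothetical, and skipping blank lines is an assumption
       not stated above.

```python
def parse_adapter_list(text, paired=True):
    """Return (adapter1, adapter2) tuples from a whitespace-separated table."""
    required = 2 if paired else 1
    adapters = []
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment line
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are skipped
        if len(fields) < required:
            raise ValueError(f"expected {required} column(s): {line!r}")
        adapter2 = fields[1] if paired else None
        adapters.append((fields[0], adapter2))
    return adapters


table = "# adapter1 adapter2\nAGATCGGAAGAGC\tAGATCGGAAGAGG\nCTGTCTCTTATA CTGTCTCTTATA\n"
adapter_pairs = parse_adapter_list(table)
```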

       --minadapteroverlap length
              In  single-end  mode,  reads  are  only trimmed if the overlap between read and the
              adapter is at least length bases long, not counting ambiguous nucleotides (N); this is
              independent   of   the  --minalignmentlength  when  using  --collapse,  allowing  a
              conservative selection of putative  complete  inserts  in  single-end  mode,  while
              ensuring that all possible adapter contamination is trimmed. The default is 0.

       --mm mismatchrate
              The maximum fraction of mismatches allowed in the aligned region. If the value is
              less than 1, then the value is used directly. If --mm is greater than 1, the rate
              is set to 1 / --mm. The default setting is 3 when trimming adapters, corresponding
              to a maximum mismatch rate of 1/3, and 10 when using --identify-adapters.
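
       The interpretation of the --mm value can be written out as follows; the function
       name is hypothetical.

```python
def effective_mismatch_rate(mm):
    """Translate a --mm value into the mismatch rate it represents."""
    if mm < 0:
        raise ValueError("--mm must not be negative")
    # Values above 1 are read as the denominator of the rate.
    return 1.0 / mm if mm > 1 else float(mm)


trimming_default = effective_mismatch_rate(3)
identify_default = effective_mismatch_rate(10)
```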

       --shift n
              To  allow  for  missing  bases  in  the 5’ end of the read, the program can let the
              alignment slip --shift bases in the  5’  end.  This  corresponds  to  starting  the
              alignment at most --shift nucleotides into read2 (for paired-end) or the adapter
              (for single-end). The default is 2.

       --trim5p n [n]
              Trim the 5’ end of reads by a fixed amount after removing adapters, but before
              carrying out quality based trimming. Specify one value to trim mate 1 and mate 2
              reads by the same amount, or two values separated by a space to trim each mate by
              a different amount. Off by default.

       --trim3p n [n]
              Trim the 3’ end of reads by a fixed amount. See --trim5p.

       --trimns
              Trim consecutive Ns from the 5’ and 3’ termini. If quality trimming is also enabled
              (--trimqualities), then stretches of mixed low-quality bases and/or Ns are trimmed.

       --maxns n
              Discard reads containing more than --maxns ambiguous bases (‘N’) after trimming.
              Default is 1000.

       --trimqualities
              Trim  consecutive  stretches  of  low quality bases (threshold set by --minquality)
              from the 5’ and 3’ termini. If trimming of Ns  is  also  enabled  (--trimns),  then
              stretches of mixed low-quality bases and Ns are trimmed.

       --trimwindows window_size
              Trim  low  quality  bases  using a sliding window based approach inspired by sickle
              with the given window size. See the “Window based quality trimming” section of  the
              manual page for a description of this algorithm.

       --minquality minimum
              Set  the  threshold  for  trimming  low  quality  bases  using  --trimqualities and
              --trimwindows. Default is 2.

       --preserve5p
              If set, bases at the 5’ termini will not be trimmed by --trimns, --trimqualities, and
              --trimwindows.  Collapsed  reads  will  not  be quality trimmed when this option is
              enabled.
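
       The terminal trimming performed by --trimns and --trimqualities, and the effect of
       --preserve5p, can be sketched as below. This is not the program's implementation:
       the function name is hypothetical, and treating a base as low quality when its score
       is at or below --minquality is an assumption.

```python
def trim_termini(seq, quals, minquality=2, trimns=False, trimqualities=False,
                 preserve5p=False):
    """Strip Ns and/or low-quality bases from the ends of a read."""
    def is_trimmable(i):
        # Assumption: "low quality" means a score at or below the threshold.
        return ((trimns and seq[i] == "N")
                or (trimqualities and quals[i] <= minquality))

    start, end = 0, len(seq)
    while end > start and is_trimmable(end - 1):
        end -= 1
    if not preserve5p:
        while start < end and is_trimmable(start):
            start += 1
    return seq[start:end], quals[start:end]


seq = "NNACGTAN"
quals = [0, 0, 30, 30, 1, 30, 2, 0]
both = trim_termini(seq, quals, trimns=True, trimqualities=True)
only_ns = trim_termini(seq, quals, trimns=True)
```

       Only the termini are affected: the low-quality base in the middle of the example
       read is kept.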

       --minlength length
              Reads shorter than this length are discarded following trimming. Defaults to 15.

       --maxlength length
              Reads longer than  this  length  are  discarded  following  trimming.  Defaults  to
              4294967295.
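
       Taken together, --minlength, --maxlength and --maxns amount to the following check
       on each trimmed read. The function name is hypothetical; the defaults are those
       listed above.

```python
def is_discarded(seq, minlength=15, maxlength=4294967295, maxns=1000):
    """Return True if a trimmed read fails the length or N-count filters."""
    too_short = len(seq) < minlength
    too_long = len(seq) > maxlength
    too_many_ns = seq.count("N") > maxns
    return too_short or too_long or too_many_ns


short_read = is_discarded("ACGTACGT")
ok_read = is_discarded("ACGTACGTACGTACGT")
```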

   FASTQ merging options
       --collapse
              In paired-end mode, merge overlapping mates into a single read and recalculate the
              quality scores. In single-end mode, attempt to identify  templates  for  which  the
              entire sequence is available. In both cases, complete “collapsed” reads are written
              with a ‘M_’ name prefix, and “collapsed” reads which are  trimmed  due  to  quality
              settings  are  written  with  a ‘MT_’ name prefix. The overlap needs to be at least
              --minalignmentlength nucleotides, with a maximum number of mismatches determined by
              --mm.

       --minalignmentlength length
              The  minimum  overlap between mate 1 and mate 2 before the reads are collapsed into
              one, when collapsing paired-end reads, or  when  attempting  to  identify  complete
              template sequences in single-end mode. Default is 11.

       --seed seed
              When collapsing reads at positions where the two reads differ, and the qualities of
              the bases are identical, AdapterRemoval will select a random base. This option
              specifies  the  seed  used  for the random number generator used by AdapterRemoval.
              This value is also written to the settings file. Note that setting the seed is  not
              reliable in multithreaded mode, since the order of operations is non-deterministic.

       --deterministic
              Enable deterministic mode; currently only affects --collapse: differing overlapping
              bases with equal quality are set to N with quality 0, instead of being randomly
              sampled.

   FASTQ demultiplexing options
       --barcode-list filename
              Perform demultiplexing using a table of one or two fixed-length barcodes for SE or PE
              reads. The table is expected to contain 2 or 3 columns, the first of which
              represents the name of a given sample, and the second and third of which represent
              the  mate  1  and  (optionally)  the  mate  2  barcode  sequence.  For  a  detailed
              description, see the “Demultiplexing” section of the online documentation.

       --barcode-mm n
              Maximum number of mismatches allowed when counting mismatches in both the mate 1
              and the mate 2 barcode for paired reads.

       --barcode-mm-r1 n
              Maximum number of mismatches allowed for the mate 1 barcode; if not set, this value
              is equal to the --barcode-mm value; cannot be higher than the --barcode-mm value.

       --barcode-mm-r2 n
              Maximum number of mismatches allowed for the mate 2 barcode; if not set, this value
              is equal to the --barcode-mm value; cannot be higher than the --barcode-mm value.
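
       The interplay of the three mismatch limits can be sketched as below. The function
       names are hypothetical, and counting mismatches as the Hamming distance between the
       barcode and the 5’ end of the read is an assumption.

```python
def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def barcode_matches(read1, barcode1, read2=None, barcode2=None,
                    mm=0, mm_r1=None, mm_r2=None):
    """Check whether a read (pair) matches a barcode (pair) within the limits."""
    mm_r1 = mm if mm_r1 is None else mm_r1
    mm_r2 = mm if mm_r2 is None else mm_r2
    if mm_r1 > mm or mm_r2 > mm:
        raise ValueError("per-mate limits cannot exceed --barcode-mm")
    if len(read1) < len(barcode1):
        return False
    d1 = hamming(read1[:len(barcode1)], barcode1)
    d2 = 0
    if barcode2 is not None:
        if read2 is None or len(read2) < len(barcode2):
            return False
        d2 = hamming(read2[:len(barcode2)], barcode2)
    # Each mate must satisfy its own limit, and the total the overall limit.
    return d1 <= mm_r1 and d2 <= mm_r2 and d1 + d2 <= mm


one_each_mm1 = barcode_matches("ACGAGG", "ACGT", "TTTAGG", "TTTT", mm=1)
one_each_mm2 = barcode_matches("ACGAGG", "ACGT", "TTTAGG", "TTTT", mm=2)
```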

       --demultiplex-only
              Only   carry   out   demultiplexing  using  the  list  of  barcodes  supplied  with
              --barcode-list. No other processing is done.

WINDOW BASED QUALITY TRIMMING

       As of v2.2.2, AdapterRemoval implements a sliding window based approach to quality based
       base-trimming  inspired  by  sickle.  If  window_size  is greater than or equal to 1, that
       number is used as the window size for all reads. If window_size is a number  greater  than
       or  equal to 0 and less than 1, then that number is multiplied by the length of individual
       reads to determine the window size. If the window length is zero or is  greater  than  the
       current read length, then the read length is used instead.
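
       The rules for deriving the window length can be written out as follows. The function
       name is hypothetical, and truncating fractional results to an integer is an
       assumption.

```python
def window_length(window_size, read_length):
    """Derive the window length used for a read of the given length."""
    if window_size < 0:
        raise ValueError("window_size must not be negative")
    if window_size >= 1:
        length = int(window_size)
    else:
        # Fractions are relative to the read length (assumption: truncated).
        length = int(window_size * read_length)
    if length == 0 or length > read_length:
        length = read_length
    return length


fixed = window_length(10, 100)
relative = window_length(0.25, 100)
```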

       Reads are trimmed as follows for a given window size:

          1. The new 5’ is determined by locating the first window where both the average quality
             and the quality of the first base in the window are greater than --minquality.

          2. The new 3’ is located by sliding the first window right, until the  average  quality
             becomes less than or equal to --minquality. The new 3’ is placed at the last base in
             that window where the quality is greater than or equal to --minquality.

          3. If no 5’ position could be determined, the read is discarded.
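
       The three steps above can be sketched as follows. This follows the wording of this
       page literally and is not the program's implementation: the function name is
       hypothetical, only full-length windows are considered, and keeping the read through
       its last base when no window falls to the threshold is an assumption.

```python
def trim_windowed(quals, window, minquality=2):
    """Return the kept (start, end) interval, end exclusive, or None if discarded."""
    n = len(quals)
    if n == 0:
        return None
    if window <= 0 or window > n:
        window = n

    def average(offset):
        return sum(quals[offset:offset + window]) / window

    # Step 1: first window whose average and first base both exceed the threshold.
    start = next((i for i in range(n - window + 1)
                  if quals[i] > minquality and average(i) > minquality), None)
    if start is None:
        return None  # Step 3: no 5' position found

    # Step 2: slide right until the window average drops to the threshold or below.
    end = n
    for i in range(start + 1, n - window + 1):
        if average(i) <= minquality:
            good = [j for j in range(i, i + window) if quals[j] >= minquality]
            # Assumption: if no base in the window qualifies, cut at the window start.
            end = good[-1] + 1 if good else i
            break
    return start, end


interval = trim_windowed([1, 1, 30, 30, 30, 30, 2, 1, 1, 1], window=2)
```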

EXIT STATUS

       AdapterRemoval exits with status 0 if the program ran successfully, and with a non-zero
       exit code if any errors were encountered. Do not use the output from AdapterRemoval if the
       program returned a non-zero exit code!

REPORTING BUGS

       Please report any bugs using the AdapterRemoval issue-tracker:

       https://github.com/MikkelSchubert/adapterremoval/issues

LICENSE

       This program is free software; you can redistribute it and/or modify it under the terms of
       the  GNU  General  Public  License  as  published  by the Free Software Foundation; either
       version 3 of the License, or at your option any later version.

       This program is distributed in the hope that it will be useful, but WITHOUT ANY  WARRANTY;
       without  even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
       See the GNU General Public License for more details.

       You should have received a copy of the GNU General Public License along with this program.
       If not, see <http://www.gnu.org/licenses/>.

AUTHOR

       Mikkel Schubert; Stinus Lindgreen

COPYRIGHT

       2017, Mikkel Schubert; Stinus Lindgreen