Ubuntu Manpage: AdapterRemoval - Remove adapters from sequences in either single end or paired end experiments

Provided by: adapterremoval_2.2.2-1_amd64

NAME

       AdapterRemoval - Remove adapters from sequences in either single end or paired end experiments

SYNOPSIS

       AdapterRemoval --file1 filenames [--file2 filenames] [--interleaved] [--interleaved-input]
       [--interleaved-output] [--combined-output] [--basename filename] [--identify-adapters] [--trimns]
       [--maxns max] [--trimqualities] [--trimwindows length] [--minquality minimum] [--collapse]
       [--mm mismatchrate] [--minlength len] [--minalignmentlength len] [--qualitybase base]
       [--qualitybase-output base] [--shift num] [--adapter1 sequence] [--adapter2 sequence] [--adapter-list
       filename] [--barcode-list filename] [--barcode-mm num] [--barcode-mm-r1 num] [--barcode-mm-r2 num]
       [--demultiplex-only] [--output1 filename] [--output2 filename] [--singleton filename] [--outputcollapsed
       filename] [--outputcollapsedtruncated filename] [--discarded filename] [--settings filename] [--seed
       seed] [--gzip] [--gzip-level level] [--bzip2] [--bzip2-level level] [--threads num] [--version] [--help]

DESCRIPTION

AdapterRemoval reads either one FASTQ file (single-end mode) or two FASTQ files (paired-end mode). It
removes residual adapter sequences from the reads and optionally trims Ns and low-quality bases (using the
quality string) and collapses overlapping paired-end mates into a single read. Reads are discarded if the
remaining genomic part is too short, or if the read contains more than a user-specified number of
ambiguous nucleotides ('N'). These operations may be combined with simultaneous demultiplexing.
Alternatively, AdapterRemoval may attempt to reconstruct consensus adapter sequences from paired-end data,
in order to identify the adapter sequences originally used and thereby ensure proper trimming of these
reads.

The reads and adapters are converted to upper case for comparison. The letter 'N' is assumed to denote an
unknown nucleotide, but any '.' encountered in a sequence is treated as (and translated into) an N. The
program tries to check for invalid input and/or nonsensical combinations of parameters, but please report
strange behaviour, bugs and such to MikkelSch@gmail.com.

If you use this program, please cite the paper:

Schubert, Lindgreen, and Orlando (2016). AdapterRemoval v2: rapid adapter trimming, identification, and
read merging. BMC Research Notes, 12;9(1):88
http://bmcresnotes.biomedcentral.com/articles/10.1186/s13104-016-1900-2

OPTIONS

       --file1 filename [...]
                 Read FASTQ reads from one or more files. These contain either the single-end (SE) reads or, if
                 paired-end, the mate 1 reads. If running in paired-end mode, both --file1 and --file2 must be
                 set. The files may optionally be gzip or bzip2 compressed.

       --file2 filename [...]
                Read one or more FASTQ files containing mate 2 reads for a paired end run. If specified, --file1
                must also be set. The files may optionally be gzip or bzip2 compressed.

       --interleaved
                Enables --interleaved-input and --interleaved-output.

       --interleaved-input
                 If set, input is expected to be a single FASTQ file specified using --file1, in which pairs of
                 paired-end reads are listed one after the other (read1/1, read1/2, read2/1, read2/2, etc.).

       --interleaved-output
                 If set, and AdapterRemoval is processing paired-end reads, retained pairs of reads are written
                 to a single FASTQ file, one pair after the other (read1/1, read1/2, read2/1, read2/2, etc.). By
                 default, this file is named basename.paired.truncated, but this may be changed using the
                 --output1 option.

       --combined-output
                 If set, all reads are written to the same file(s), specified by --output1 and --output2. Each
                 read is further marked by either a "PASSED" or a "FAILED" flag, and any read that has FAILED
                 (including the mate for collapsed reads) is replaced with a single 'N' with Phred score 0. This
                 option can be combined with --interleaved / --interleaved-output to write all reads to a single
                 output file specified with --output1.

       --basename filename
                 Determines the default filename for output files, unless overridden using the specific output
                 file settings. For single-end mode, the following filenames are used: basename.truncated,
                 basename.discarded, and basename.settings. In paired-end mode, the following filenames are used:
                 basename.pair1.truncated, basename.pair2.truncated, basename.singleton.truncated,
                 basename.discarded, and basename.settings. If collapsing of reads is enabled in paired-end mode,
                 the following filenames are also used: basename.collapsed and basename.collapsed.truncated. The
                 default basename is your_output. If gzip compression is enabled, the extension ".gz" is added to
                 all files except basename.settings, while the extension ".bz2" is used if bzip2 compression is
                 enabled.
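                 The naming scheme above can be summarised as a short Python sketch. It is not part of
                 AdapterRemoval. The function name default_output_names and its parameters are hypothetical. Only
                 the default names listed here are modelled, plus the ${basename}.${sample_name} prefix used when
                 demultiplexing (described under --barcode-list). Interleaved output and --combined-output are
                 ignored.

```python
def default_output_names(basename="your_output", paired=False, collapse=False,
                         compression=None, sample=None):
    """Sketch of the default output filenames described for --basename.

    compression: None, "gzip" or "bzip2" (hypothetical parameter values).
    sample: optional demultiplexing sample name, giving ${basename}.${sample_name}.
    """
    prefix = f"{basename}.{sample}" if sample else basename
    if paired:
        suffixes = ["pair1.truncated", "pair2.truncated", "singleton.truncated", "discarded"]
        if collapse:
            suffixes += ["collapsed", "collapsed.truncated"]
    else:
        suffixes = ["truncated", "discarded"]
    extension = {None: "", "gzip": ".gz", "bzip2": ".bz2"}[compression]
    names = [f"{prefix}.{suffix}{extension}" for suffix in suffixes]
    # The .settings file never receives a compression extension.
    names.append(f"{prefix}.settings")
    return names
```

                 For example, default_output_names("output_single", compression="gzip") yields the three files
                 named in the single-end example below.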

       --identify-adapters
                 For paired-end reads only. In this mode, AdapterRemoval will attempt to reconstruct the adapter
                 sequences used for a set of paired-end reads, by locating fully overlapping read pairs and
                 generating a consensus sequence from the bases identified as adapter sequence. The minimum
                 overlap is controlled by --minalignmentlength. The values passed to the --adapter1 and
                 --adapter2 command-line options are used for visual comparison with the consensus sequence, but
                 are otherwise not used in the consensus building.

       --trimns Remove stretches of Ns from the output reads in both the 5' and 3' end. If quality  trimming  is
                also enabled, stretches of mixed low-quality bases and/or Ns are trimmed.

       --maxns max
                 If a read has more than max Ns after trimming, it is discarded (disabled by default).

       --trimqualities
                Remove consecutive stretches of low quality bases (threshold set by minquality) from both the 5'
                and  3'  end  of the reads. All bases with minquality or lower are trimmed. If trimming of Ns is
                also enabled, stretches of mixed low-quality bases and/or Ns are trimmed.
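                 A minimal Python sketch of how --trimns and --trimqualities act on the two ends of a read,
                 followed by the --minlength and --maxns filters applied to what remains. The helper names are
                 hypothetical. Quality scores are assumed to be already decoded to Phred values. The order in
                 which AdapterRemoval applies these steps internally is not described here.

```python
def trim_read_ends(seq, quals, minquality=2, trim_ns=True, trim_qualities=True):
    """Trim stretches of Ns and/or low-quality bases (<= minquality) from both ends."""
    def trimmable(i):
        return (trim_ns and seq[i].upper() == "N") or (
            trim_qualities and quals[i] <= minquality
        )

    start, end = 0, len(seq)
    while start < end and trimmable(start):      # 5' end
        start += 1
    while end > start and trimmable(end - 1):    # 3' end
        end -= 1
    return seq[start:end], quals[start:end]


def keep_read(seq, minlength=15, maxns=None):
    """Apply --minlength and the optional --maxns filter to a trimmed read."""
    if len(seq) < minlength:
        return False
    if maxns is not None and seq.upper().count("N") > maxns:
        return False
    return True
```

                 Because both conditions are tested in one predicate, a mixed run such as "NN" followed by a
                 base of quality 1 is removed in a single pass when both options are enabled.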

       --trimwindows length
                 Remove low quality bases using a sliding-window approach inspired by sickle:

                 1. The new 5' end is determined by locating the first window where both the average quality and
                 the quality of the first base in the window are greater than minquality.

                 2. The new 3' end is located by sliding the first window to the right until the average quality
                 becomes less than or equal to minquality. The new 3' end is placed at the last base in that
                 window where the quality is greater than or equal to minquality.

                3. If no 5' position could be determined, the read is discarded.

                 The value of length may be a number greater than or equal to 1, in which case that number
                 (rounded down to the nearest whole number) is used as the window length, or it may be a value
                 greater than or equal to zero and less than 1. In the latter case, that number is multiplied by
                 the length of each read to determine the window length. For example, a --trimwindows value of
                 0.1 and a read length of 100 would result in 10 bp windows. If the resulting window length is
                 zero or is greater than the current read length, then the read length is used instead.
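                 The three steps and the window-length rule can be sketched in Python as follows. This is an
                 illustration under stated assumptions, not AdapterRemoval's implementation. Step 2 is read as
                 "keep bases from the start of the failing window for as long as each has a quality of at least
                 minquality". Reads whose windows never fail are kept to their 3' end. The names window_length
                 and trim_windows are hypothetical.

```python
def window_length(value, read_length):
    """Window size for --trimwindows: absolute if value >= 1, else a fraction of the read."""
    if value >= 1:
        length = int(value)  # rounded down to the nearest whole number
    else:
        length = int(value * read_length)
    if length == 0 or length > read_length:
        length = read_length
    return length


def trim_windows(quals, value, minquality=2):
    """Return the retained (start, end) slice of a read, or None if it is discarded."""
    n = len(quals)
    if n == 0:
        return None
    w = window_length(value, n)
    start = None
    for offset in range(n - w + 1):
        average = sum(quals[offset:offset + w]) / w
        if start is None:
            # Step 1: first window whose average and first base both exceed minquality.
            if average > minquality and quals[offset] > minquality:
                start = offset
            continue
        if average <= minquality:
            # Step 2 (as interpreted here): walk from the window start while bases pass.
            end = offset
            while end < offset + w and quals[end] >= minquality:
                end += 1
            return start, end
    if start is None:
        return None  # Step 3: no 5' position could be determined
    return start, n
```

                 With window length 3 and minquality 2, the qualities [0, 0, 30, 30, 30, 30, 30, 0, 0, 0] are
                 trimmed to positions 2 to 6 (the slice (2, 7)).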

       --minquality minimum
                Set  the  threshold for trimming low quality bases. Default is 2. The minimum can be set with or
                without the Phred quality base.

       --collapse
                In paired-end mode, if the two mates overlap, collapse the two reads into one  read  by  merging
                the  two  and  recalculating  the  quality  scores. In single-end mode, this instead attempts to
                identify templates for  which  the  entire  sequence  is  available.  In  both  cases,  complete
                "collapsed"  reads  are written with a 'M_' name prefix, and "collapsed" reads which are trimmed
                due to quality settings are written with a 'MT_' name prefix. The overlap needs to be  at  least
                minalignmentlength nucleotides, with a maximum number of mismatches determined by mm.

       --mm mismatchrate
                 The fraction of mismatches allowed in the aligned region. If 0 < mismatchrate < 1, the rate is
                 used directly. If mismatchrate > 1, the rate is set to 1/mismatchrate. The default setting is 3,
                 corresponding to a maximum mismatch rate of 1/3.
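                 A small Python sketch of how an --mm value maps to a mismatch fraction, and how that fraction
                 might bound the mismatches in an alignment. Treating the bound as rate times aligned length is a
                 simplifying assumption. The function names are hypothetical. Values of exactly 1 (or <= 0) are
                 rejected because their behaviour is not described above.

```python
def mismatch_rate(mm):
    """Translate an --mm value into a maximum mismatch fraction."""
    if 0 < mm < 1:
        return mm            # used directly
    if mm > 1:
        return 1.0 / mm      # e.g. the default 3 -> 1/3
    raise ValueError("behaviour for --mm values <= 0 or == 1 is not described here")


def within_mismatch_limit(n_mismatches, aligned_length, mm=3):
    """Simplified check: mismatches may not exceed rate * aligned length."""
    return n_mismatches <= mismatch_rate(mm) * aligned_length
```

                 Under this simplification, an 11 bp overlap (the --minalignmentlength default) tolerates up to
                 3 mismatches with the default --mm 3.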

       --minlength len
                The  minimum  length  required  after  trimming  and adapter removal. Reads shorter than len are
                discarded. Default is 15 nucleotides.

       --minalignmentlength len
                The minimum overlap between mate 1 and mate 2 before the reads  are  collapsed  into  one,  when
                collapsing  paired  end  reads,  or  when  attempting to identify complete template sequences in
                single-end mode. Default is 11 nucleotides.

       --qualitybase base
                 The base of the quality score: either '64' for Phred+64 (i.e., Illumina 1.3+ and 1.5+) or '33'
                 for Phred+33 (Illumina 1.8+). In addition, the value 'solexa' may be used to specify reads with
                 Solexa encoded scores. Default is 33.

       --qualitybase-output base
                 The base of the quality score for reads written by AdapterRemoval: either '64' for Phred+64
                 (i.e., Illumina 1.3+ and 1.5+) or '33' for Phred+33 (Illumina 1.8+). In addition, the value
                 'solexa' may be used to specify reads with Solexa encoded scores. However, note that quality
                 scores are represented using Phred scores internally, and conversion to and from Solexa scores
                 therefore results in a loss of information. The default corresponds to the value given for
                 --qualitybase.

       --shift num
                 To allow for missing bases in the 5' end of the read, the program can let the alignment slip num
                 bases in the 5' end. This corresponds to starting the alignment up to num nucleotides into read 2
                 (for paired end) or the adapter (for single end). The default shift value is 2.

       --adapter1 sequence
       --adapter2 sequence
                Specify  the  adapter sequences that you wish to trim. The Adapter #2 sequence is only used when
                trimming paired-ended data.

                The Adapter #1 and Adapter #2 sequences are expected to be found in the mate 1 and  the  mate  2
                reads  respectively,  while  ignoring  any  difference in case and treating Ns as wildcards. The
                default sequences are

                Adapter #1: AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG

                Adapter #2: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

                Assuming these were the adapters used to generate our data, we should therefore see these in the
                FASTQ files:

                  $ grep -i "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC......ATCTCGTATGCCGTCTTCTGCTTG" file1.fq
                  AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAACAAGAAT
                  CTGGAGTTCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGAATCTCGTATGCCGTCTTCTGCTTGAAAAAAA
                  GGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGAATCTCGTATGCCGTCTTCTGCTTGCAAATTGAAAACAC
                  ...

                  $ grep -i "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT" file2.fq
                  CAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTCAAAAAAAGAAAAACATCTTG
                  GAACTCCAGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTCAAAAAAAATAGA
                  GAACTAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTCAAAAACATAAGACCTA
                  ...

                 Note that --adapter1 and --adapter2 replace the --pcr[12] options of AdapterRemoval v1.x, for
                 which the --pcr2 sequence was expected to be reverse complemented compared to --adapter2. Using
                 the --pcr[12] options is not recommended!
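                 The following simplified Python sketch finds where a single-end read runs into the adapter. It
                 illustrates the comparison rules above (case-insensitive, N as a wildcard) together with the
                 --mm mismatch rate and the --minadapteroverlap setting mentioned in the single-end example. It
                 relies on several assumptions. It takes the leftmost acceptable position instead of scoring
                 alignments, it ignores --shift, and the function names are hypothetical.

```python
def compare(a, b):
    """Return (mismatches, aligned) ignoring case; N in either sequence is a wildcard."""
    mismatches = aligned = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "N" or y == "N":
            continue
        aligned += 1
        if x != y:
            mismatches += 1
    return mismatches, aligned


def find_adapter_start(read, adapter, mm_rate=1 / 3, min_overlap=1):
    """Leftmost position where the read's suffix matches the adapter's prefix.

    Returns len(read) if no acceptable position is found (nothing trimmed).
    """
    for start in range(len(read)):
        overlap = min(len(read) - start, len(adapter))
        mismatches, aligned = compare(read[start:start + overlap], adapter[:overlap])
        if aligned < min_overlap:
            continue  # too few non-N bases aligned (cf. --minadapteroverlap)
        if mismatches <= mm_rate * aligned:
            return start
    return len(read)
```

                 With the default minimum overlap of 1, a read ending in a single 'A' is trimmed by one base
                 because it matches the first base of the adapter. This is the false-positive effect described in
                 the single-end example, and requiring an overlap of 3 avoids it.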

       --adapter-list filename
                Read one or more PCR sequences from a table. The first two columns (separated by whitespace)  of
                each  line in the file are expected to correspond to values passed to --adapter1 and --adapter2.
                In single ended mode, only column one is required. Lines starting with  '#'  are  ignored.  When
                multiple  PCR  sequences  or  sequence pairs are specified, AdapterRemoval will try each adapter
                (pair) listed in the table, and select the best aligning adapters for each read processed.

       --barcode-list filename
                Read a table of one or two fixed-length barcodes and perform demultiplexing of single or  double
                indexed reads. The table is expected to contain 2 or 3 columns, the first of which represent the
                name  of a given sample, and the second and third of which represent the mate 1 and (optionally)
                the mate 2 barcode sequence:

                    $ cat barcodes.txt
                    sample_1 ATGCGGA TGAATCT
                    sample_2 ATGGATT ATAGTGA
                    sample_7 CAAAACT TCGCTGC

                Results are written to ${basename}.${sample_name}.*, using the default names  for  other  output
                files.    A    setting    file    with    statistics    is    written   for   each   sample   at
                ${basename}.${sample_name}.settings,  as  is  a  setting  file  containing  the   demultiplexing
                statistics, at ${basename}.settings.

                When  demultiplexing  is used, the barcode identified for a given read is automatically added to
                the adapter sequence, in order to ensure that  overlapping  reads  are  correctly  trimmed.  The
                 .settings file represents this by showing the (reverse complemented) barcode sequence added to
                the --adapter1 and --adapter2 sequences, followed by an underscore (shown here for barcodes pair
                ATGCGGA / TGAATCT):

                    [Adapter sequences]
                    Adapter1[0]: AGATTCA_AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
                    Adapter2[0]: TCCGCAT_AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

                Note that the sequence added to each adapter is the reverse complement of the  barcode  sequence
                of  the  other  mate,  as  this  sequence is expected to be found immediately before the adapter
                sequence.
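                 The relationship between barcodes and the adapter prefixes shown in the .settings file can be
                 checked with a few lines of Python. The sketch reproduces the AGATTCA / TCCGCAT prefixes for the
                 barcode pair ATGCGGA / TGAATCT. The function names are hypothetical. The underscore in the
                 .settings output is treated here as a display separator only, which is an assumption.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def adapters_with_barcodes(adapter1, adapter2, barcode1, barcode2):
    """Prefix each adapter with the reverse complement of the *other* mate's barcode."""
    return (reverse_complement(barcode2) + adapter1,
            reverse_complement(barcode1) + adapter2)
```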

       --barcode-mm num
                The maximum number of mismatches allowed for barcodes, when counting mismatches in both the mate
                1 and mate 2 barcodes. In conjunction with the --barcode-mm-r1 and --barcode-mm-r2, this  allows
                fine-grained  control  over the barcode comparisons. If not set, this value is set to the sum of
                --barcode-mm-r1 and --barcode-mm-r2.

                For example, to allow one mismatch in either the mate 1 or the mate 2 barcode, one might specify
                --barcode-mm 1; to allow a mismatch in the mate 1 and / or the mate 2 barcode, one might specify
                --barcode-mm 2 --barcode-mm-r1 1 --barcode-mm-r2 1, and so on.

       --barcode-mm-r1 num
                The maximum number of mismatches allowed in the mate 1 barcode; if not set, this number is equal
                to the value of --barcode-mm. This number cannot exceed the value specified for --barcode-mm.

       --barcode-mm-r2 num
                 The maximum number of mismatches allowed in the mate 2 barcode; if not set, this number is equal
                to the value of --barcode-mm. This number cannot exceed the value specified for --barcode-mm.

       --demultiplex-only
                Only carry out demultiplexing, using the list of barcodes supplied  using  --barcode-list.  Note
                that trimming and filtering options do not apply to this mode of operation.

       --output1 file
       --output2 file
       --singleton file
       --outputcollapsed file
       --outputcollapsedtruncated file
       --discarded file
       --settings file
                 Instead of using the default behaviour where the program automatically generates the files
                 needed, you can specify where each type of output is directed. These can be files, pipes, etc.,
                 making it possible to easily compress the output on the fly. Default files are still generated
                 if nothing else is specified.

                The types of output in single end mode are:

                output1 contains the trimmed reads.

                The types of output in paired end mode are:

                output1 contains trimmed mate1 reads.

                output2 contains trimmed mate2 reads.

                singleton contains all reads where the other mate in a pair is discarded.

                outputcollapsed Contains pairs that overlap and are collapsed into a single read (if  --collapse
                is used). The reads are renamed with an @M_ prefix.

                outputcollapsedtruncated  Contains  pairs  that overlap and are collapsed into a single read (if
                --collapse is used) and have further been trimmed due to Ns and/or low  quality  nucleotides  in
                the 5' or 3' end. The reads are renamed with an @MT_ prefix.

                The types of output in both single end and paired end mode are:

                discarded contains all reads that are discarded by the program.

                settings contains information on the parameters used in the run as well as overall statistics on
                the reads after trimming such as average length.

       --seed seed
                 When collapsing reads at positions where the two reads differ and the qualities of the bases are
                 identical, AdapterRemoval will select a random base. This option specifies the seed used for the
                 random number generator used by AdapterRemoval. This value is also written to the settings file.
                 Note that setting the seed is not reliable in multithreaded mode, since the order of operations
                 is non-deterministic.
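                 The following Python sketch illustrates how a seeded random number generator makes tie-breaking
                 reproducible when collapsing an overlap. The quality rules used here are assumptions made for
                 illustration and are not specified in this manual:

                 * when bases agree, the qualities are summed and capped at a maximum;
                 * when they disagree, the higher-quality base is kept, with the difference of the qualities;
                 * on ties, a random base is chosen with quality 0.

                 The sketch also assumes that mate 2 has already been reverse complemented and placed at a known
                 offset within mate 1. merge_overlap is a hypothetical name.

```python
import random


def merge_overlap(seq1, qual1, seq2, qual2, offset, rng, qualitymax=41):
    """Collapse mate 1 with an already reverse-complemented mate 2 (sketch).

    Mate 2 is assumed to start `offset` bases into mate 1.
    """
    overlap = min(len(seq1) - offset, len(seq2))
    seq, qual = list(seq1[:offset]), list(qual1[:offset])
    for i in range(overlap):
        b1, q1 = seq1[offset + i], qual1[offset + i]
        b2, q2 = seq2[i], qual2[i]
        if b1 == b2:
            seq.append(b1)
            qual.append(min(q1 + q2, qualitymax))
        elif q1 == q2:
            seq.append(rng.choice((b1, b2)))  # tie: the seeded RNG decides
            qual.append(0)
        elif q1 > q2:
            seq.append(b1)
            qual.append(q1 - q2)
        else:
            seq.append(b2)
            qual.append(q2 - q1)
    # At most one of these tails is non-empty.
    seq.extend(seq1[offset + overlap:])
    qual.extend(qual1[offset + overlap:])
    seq.extend(seq2[overlap:])
    qual.extend(qual2[overlap:])
    return "".join(seq), qual
```

                 Calling merge_overlap twice with random.Random(7) gives identical results. In a multithreaded
                 run, however, reads reach the generator in an unpredictable order, which is why the seed is not
                 reliable there.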

       --gzip   If  set, all FASTQ files written by AdapterRemoval will be gzip compressed using the compression
                level specified using --gzip-level. The extension ".gz" is added to files for which no  filename
                was given on the commandline.

       --gzip-level level
                Determines  the compression level used when gzip'ing FASTQ files. Must be a value in the range 0
                to 9, with 0 disabling compression and 9 being the best compression. Defaults to 6.

       --bzip2  If set, all FASTQ files written by AdapterRemoval will be bzip2 compressed using the compression
                level specified using --bzip2-level. The extension  ".bz2"  is  added  to  files  for  which  no
                filename was given on the commandline.

       --bzip2-level level
                Determines the compression level used when bzip2'ing FASTQ files. Must be a value in the range 1
                to 9, with 9 being the best compression. Defaults to 9.

       --threads num
                Maximum  number  of  threads  to  use  for  current  run;  note that file IO is single-threaded,
                regardless of the number of threads specified.

       --version
                Output the version of the program.

       --help   Output the summary of available command-line options, including  default  values  and/or  values
                specified on the command-line.

EXAMPLE: Single end experiment

       The following command removes adapters from the file reads_1.fq, trims both Ns and low quality bases from
       the reads, and gzip compresses the resulting files. The --basename option is used to specify the prefix
       for output files.

           $ AdapterRemoval --file1 reads_1.fq --basename output_single --trimns --trimqualities --gzip

       Since --gzip and --basename are specified, the trimmed FASTQ reads are written to
       output_single.truncated.gz, the discarded FASTQ reads are written to output_single.discarded.gz, and
       settings and summary statistics are written to output_single.settings.

       Note  that  by  default,  AdapterRemoval  does not require a minimum number of bases overlapping with the
       adapter sequence, before reads are trimmed. This may result in an excess of very short  (1  -  3  bp)  3'
       fragments  being falsely identified as adapter sequences, and trimmed. This behavior may be changed using
       the --minadapteroverlap option, which allows the specification of a minimum number of bases (excluding
       Ns) that must be aligned to carry out trimming. For example, use --minadapteroverlap 3 to require an
       overlap of at least 3 bp.

EXAMPLE: Paired end experiment.

       The following command removes adapters from paired-end reads, where the mate 1 and mate 2 reads are
       kept in the files reads_1.fq and reads_2.fq, respectively. The reads are trimmed for both Ns and low
       quality bases, and overlapping reads (at least 11 nucleotides, by default) are merged (collapsed):

           $ AdapterRemoval --file1 reads_1.fq --file2 reads_2.fq --basename output_paired --trimns --trimqualities --collapse

       This command generates the files output_paired.pair1.truncated and  output_paired.pair2.truncated,  which
       contain  trimmed  pairs  of  reads which were not collapsed, output_paired.singleton.truncated containing
       reads  where  one  mate   was   discarded,   output_paired.collapsed   containing   merged   reads,   and
       output_paired.collapsed.truncated  containing  merged reads that have been trimmed due to the --trimns or
       --trimqualities options. Finally, the output_paired.discarded and output_paired.settings files correspond
       to those of the single-end run.

EXAMPLE: Interleaved FASTQ reads.

       AdapterRemoval is able to read and write paired-end reads stored in a single, so-called interleaved FASTQ
       file (one pair at a time, first mate 1, then mate 2). This is accomplished by specifying the location  of
       the file using --file1 and *also* setting the --interleaved command-line option:

           $ AdapterRemoval --interleaved --file1 interleaved.fq --basename output_interleaved

       Other than taking just a single input file, this mode operates almost exactly like paired-end trimming
       (as described above); the only difference is that paired reads are not written to a 'pair1' and a
       'pair2' file, but instead to a single, interleaved file named 'paired'. The location of this file is
       controlled using the --output1 option. Enabling either reading or writing of interleaved FASTQ files,
       but not both, can be accomplished by specifying either of the --interleaved-input and
       --interleaved-output options, both of which are enabled by the --interleaved option.
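       Conceptually, an interleaved file is just a stream of FASTQ records taken two at a time. A minimal
       Python sketch of that pairing follows. It assumes plain four-line FASTQ records, and the helper names
       are hypothetical.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (name, sequence, qualities) from an iterable of four-line FASTQ records."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError("truncated or malformed FASTQ record")
        yield record[0][1:].rstrip("\n"), record[1].rstrip("\n"), record[3].rstrip("\n")


def read_interleaved(lines):
    """Pair consecutive records: (read1/1, read1/2), (read2/1, read2/2), ..."""
    records = read_fastq(lines)
    for mate1 in records:
        mate2 = next(records, None)
        if mate2 is None:
            raise ValueError("odd number of records in interleaved FASTQ")
        yield mate1, mate2
```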

EXAMPLE: Different quality score encodings.

       By default, AdapterRemoval expects the quality scores in FASTQ reads to be Phred+33 encoded, meaning that
       the  error  probabilities  are  encoded  as  (char)('!' - 10 * log10(p)). Most data will be encoded using
       Phred+33, but Phred+64 and 'Solexa' encoded quality scores are also supported. These are selected by
       specifying the --qualitybase command-line option (specifying either '33', '64', or 'solexa'):

           $ AdapterRemoval --qualitybase 64 --file1 reads_q64.fq --basename phred_64_encoded

       By default, reads are written using the *same* encoding as the input. If a different encoding is desired,
       this may be accomplished using the --qualitybase-output option:

           $ AdapterRemoval --qualitybase 64 --qualitybase-output 33 --file1 reads_q64.fq --basename phred_33_encoded

       Note furthermore that AdapterRemoval by default only expects quality scores in the range 0 - 41 (or -5 to
       41  in the case of Solexa encoded scores). If input data using a different maximum quality score is to be
       processed, or if the desired maximum quality score of collapsed reads is greater than 41, then this limit
       may be increased using the --qualitymax option:

           $ AdapterRemoval --qualitymax 50 --file1 reads_1.fq --file2 reads_2.fq --collapse --basename collapsed_q50

       For a detailed overview of Phred encoding schemes currently and previously in use, see e.g. the Wikipedia
       article on the subject: https://en.wikipedia.org/wiki/FASTQ_format#Encoding
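       The encodings above can be illustrated in Python. The sketch:

       * converts an error probability to a Phred score;
       * re-encodes quality strings between Phred+64 and Phred+33 (ord('!') is 33, so the formula above is
         the Phred+33 case);
       * converts Solexa scores, defined as -10 * log10(p / (1 - p)), to the Phred scale.

       Rounding two neighbouring Solexa scores to the same Phred value shows the loss of information mentioned
       under --qualitybase-output. All function names are hypothetical.

```python
import math


def phred_from_error(p):
    """Phred score for error probability p: Q = -10 * log10(p)."""
    return -10.0 * math.log10(p)


def encode(scores, offset=33):
    return "".join(chr(q + offset) for q in scores)


def decode(qual_string, offset=33):
    return [ord(c) - offset for c in qual_string]


def recode(qual_string, in_offset=64, out_offset=33):
    """Phred+64 -> Phred+33 (or the reverse); lossless for Phred scores."""
    return encode(decode(qual_string, in_offset), out_offset)


def solexa_to_phred(q_solexa):
    """Convert a Solexa score to the (unrounded) Phred scale."""
    return 10.0 * math.log10(10.0 ** (q_solexa / 10.0) + 1.0)
```

       For example, recode("hhhh") returns "IIII" (Phred 40). By contrast, Solexa scores -5 and -4 both round
       to Phred 1.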

EXAMPLE: Paired end reads containing multiple, distinct adapter pairs.

       It is possible to trim data that contains multiple adapter pairs, by providing a one or two-column  table
       containing  possible adapter combinations (for single-end and paired-end trimming, respectively; see e.g.
       examples/adapters.txt):

           $ cat adapters.txt
           AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCACCTAATCTCGTATGCCGTCTTCTGCTTG    AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
           AAACTTGCTCTGTGCCCGCTCCGTATGTCACAACAGTGCGTGTATCACCTCAATGCAGGACTCA    GATCGGGAGTAATTTGGAGGCAGTAGTTCGTCGAAACTCGGAGCGTCTTTAGCAGGAG
           CTAATTTGCCGTAGCGACGTACTTCAGCCTCCAGGAATTGGACCCTTACGCACACGCATTCATG    TACCGTGAAAGGTGCGCTTAGTGGCATATGCGTTAAGAGCTAGGTAACGGTCTGGAGG
           GTTCATACGACGACGACCAATGGCACACTTATCCGGTACTTGCGTTTCAATGCGCATGCCCCAT    TAAGAAACTCGGAGTTTGGCCTGCGAGGTAGCTTGGGTGTTATGAAGAACGGCATGCG
           CCATGCCCCGAAGATTCCTATACCCTTAAGGTCGCAATTGTTCGAGTAAGCTGTACGCGCCCAT    GTTGCATTGACCCGAAGGGCTCGATGTTTAGGGAGGTCAGAAGTTGAGCGGGTTCAAA

       This table is then specified using the --adapter-list option:

           $ AdapterRemoval --file1 reads_1.fq --file2 reads_2.fq --basename output_multi --trimns --trimqualities --collapse --adapter-list adapters.txt

       The resulting .settings file contains an overview of how frequently each adapter (pair) was used.

       Note that in the case of paired-end adapters, AdapterRemoval considers only the combinations of  adapters
       specified  in  the  table, one combination per row. For single-end trimming, only the first column of the
       table file is required, and the list may therefore take the form of a file containing  one  sequence  per
       line.
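       A Python sketch of reading such a table follows. It applies the rules given for --adapter-list:
       whitespace-separated columns, '#' lines ignored, and only the first column needed for single-end data.
       Skipping blank lines and upper-casing sequences are assumptions of this sketch, and parse_adapter_list
       is a hypothetical name.

```python
def parse_adapter_list(lines, paired=True):
    """Return a list of adapter tuples from an --adapter-list style table."""
    adapters = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # any whitespace separates the columns
        if paired:
            if len(fields) < 2:
                raise ValueError(f"expected two adapter columns: {line!r}")
            adapters.append((fields[0].upper(), fields[1].upper()))
        else:
            adapters.append((fields[0].upper(),))
    return adapters
```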

EXAMPLE: Identifying adapter sequences from paired-ended reads

       If we do not know the adapter sequences for paired-end reads, AdapterRemoval may be used to generate a
       consensus adapter sequence based on fragments identified as belonging to the adapters through pairwise
       alignments of the reads, provided that the data set contains only a single adapter sequence (not counting
       differences in index sequences).

       In the following example, the identified adapters correspond to the default adapter sequences with a
       poly-A tail resulting from sequencing past the end of the insert + templates. It is not necessary to
       specify this tail when using the --adapter1 or --adapter2 command-line options. The characters shown
       under each of the consensus sequences represent the Phred-encoded fraction of bases identical to the
       consensus base, with adapter 1 containing the index CACCTA:

           $ AdapterRemoval --identify-adapters --file1 reads_1.fq --file2 reads_2.fq

           Attemping to identify adapter sequences ...
           Processed a total of 1,000 reads in 0.0s; 129,000 reads per second on average ...
              Found 394 overlapping pairs ...
              Of which 119 contained adapter sequence(s) ...

           Printing adapter sequences, including poly-A tails:
             --adapter1:  AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
                          ||||||||||||||||||||||||||||||||||******||||||||||||||||||||||||
              Consensus:  AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCACCTAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAAAAAAAAAAA
                Quality:  55200522544444/4411330333330222222/1.1.1.1111100-00000///..+....--*-)),,+++++++**(('%%%$

               Top 5 most common 9-bp 5'-kmers:
                       1: AGATCGGAA = 96.00% (96)
                       2: AGATGGGAA =  1.00% (1)
                       3: AGCTCGGAA =  1.00% (1)
                       4: AGAGCGAAA =  1.00% (1)
                       5: AGATCGGGA =  1.00% (1)

             --adapter2:  AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
                          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
              Consensus:  AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                Quality:  525555555144141441430333303.2/22-2/-1..11111110--00000///..+....--*-),,,+++++++**(%'%%%$

               Top 5 most common 9-bp 5'-kmers:
                       1: AGATCGGAA = 100.00% (100)

       No files are generated from running the adapter identification step.

       The  consensus  sequences  inferred  are  compared to those specified using the --adapter1 and --adapter2
       command-line options, or with the default values for these if no values  have  been  given  (as  in  this
       case).  Pipes  (|)  indicate  matches  between the provided sequences and the consensus sequence, and "*"
       indicate the presence of unspecified bases (Ns).
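       The idea behind the consensus and the k-mer table can be sketched in Python. The sketch assumes that the
       adapter fragments have already been extracted from overlapping pairs and are aligned at their 5' ends.
       The per-position majority vote, the use of ' ' to mark mismatches in the comparison line, and all
       function names are assumptions of this sketch, not a description of AdapterRemoval's internals. Only '|'
       and '*' are documented above.

```python
from collections import Counter


def consensus(fragments):
    """Majority-vote consensus of adapter fragments aligned at their 5' ends.

    Returns the consensus string and, per position, the fraction of covering
    fragments that agree with the consensus base.
    """
    length = max(len(f) for f in fragments)
    bases, fractions = [], []
    for i in range(length):
        column = Counter(f[i] for f in fragments if i < len(f))
        base, count = column.most_common(1)[0]
        bases.append(base)
        fractions.append(count / sum(column.values()))
    return "".join(bases), fractions


def top_kmers(fragments, k=9, n=5):
    """Most common 5' k-mers as (kmer, count, percentage) tuples."""
    counts = Counter(f[:k] for f in fragments if len(f) >= k)
    total = sum(counts.values())
    return [(kmer, c, 100.0 * c / total) for kmer, c in counts.most_common(n)]


def comparison_line(provided, consensus_seq):
    """'|' for a match, '*' for an N in the provided adapter, ' ' otherwise (assumed)."""
    marks = []
    for p, c in zip(provided.upper(), consensus_seq.upper()):
        if p == "N":
            marks.append("*")
        elif p == c:
            marks.append("|")
        else:
            marks.append(" ")
    return "".join(marks)
```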

EXAMPLE: Demultiplexing of paired end reads

       As of version 2.1, AdapterRemoval supports simultaneous demultiplexing and adapter trimming;
       demultiplexing is carried out using a simple comparison between the specified barcode sequences and the
       first N bases of the reads, corresponding to the length of the barcodes. Reads identified as containing a
       specific barcode or pair of barcodes are then trimmed using adapter sequences including these barcodes.

       Demultiplexing is enabled by creating a table of barcodes, the first column of which specifies the sample
       name (using characters [a-zA-Z0-9_]) and the second and (optional) third columns of which specify the
       mate 1 and mate 2 barcode sequences.

       For   example,   a   table   of   barcodes   from   a   double-indexed  run  might  be  as  follows  (see
       examples/barcodes.txt):

           $ cat barcodes.txt
           sample_1 ATGCGGA TGAATCT
           sample_2 ATGGATT ATAGTGA
           sample_7 CAAAACT TCGCTGC

       In the case of single-end reads, only the first two columns are required. AdapterRemoval is invoked with
       the --barcode-list option, specifying the path to this table:

           $ AdapterRemoval --file1 demux_1.fq --file2 demux_2.fq --basename output_demux --barcode-list barcodes.txt

       This generates a set of output files for each sample specified in the barcode table, using  the  basename
       (--basename) as the prefix, followed by a dot and the sample name, followed by a dot and the default name
       for a given file type. For example, the output files for sample_2 would be

           output_demux.sample_2.discarded
           output_demux.sample_2.pair1.truncated
           output_demux.sample_2.pair2.truncated
           output_demux.sample_2.settings
           output_demux.sample_2.singleton.truncated

       The settings files generated for each sample summarize the reads for that sample only; in addition, a
       basename.settings file is generated which summarizes the number and proportion  of  reads  identified  as
       belonging to each sample.

       The maximum number of mismatches allowed when comparing barcodes is controlled using the options
       --barcode-mm, --barcode-mm-r1, and --barcode-mm-r2, which specify the maximum number of mismatches in
       total, and the maximum number of mismatches for the mate 1 and mate 2 barcodes, respectively. Thus, if
       mm_1(i) and mm_2(i) represent the number of mismatches observed for barcode pair i for a given pair of
       reads, these options require that

          1. mm_1(i) <= --barcode-mm-r1
          2. mm_2(i) <= --barcode-mm-r2
          3. mm_1(i) + mm_2(i) <= --barcode-mm
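       The three conditions above can be sketched in Python, together with the prefix comparison described at
       the start of this example. Barcode pairs are compared to the first bases of each read using a simple
       mismatch count. Treating a read pair that matches more than one sample (or none) as unassigned is an
       assumption of this sketch, and the function names are hypothetical.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def barcode_mismatches(read1, read2, barcode1, barcode2):
    """(mm_1, mm_2) for one barcode pair, or None if a read is shorter than its barcode."""
    if len(read1) < len(barcode1) or (barcode2 and len(read2) < len(barcode2)):
        return None
    mm1 = hamming(read1[:len(barcode1)], barcode1)
    mm2 = hamming(read2[:len(barcode2)], barcode2) if barcode2 else 0
    return mm1, mm2


def assign_sample(read1, read2, barcodes, max_mm, max_mm_r1, max_mm_r2):
    """Return the single matching sample name, or None if zero or several match."""
    hits = []
    for sample, bc1, bc2 in barcodes:
        counts = barcode_mismatches(read1, read2, bc1, bc2)
        if counts is None:
            continue
        mm1, mm2 = counts
        if mm1 <= max_mm_r1 and mm2 <= max_mm_r2 and mm1 + mm2 <= max_mm:
            hits.append(sample)
    return hits[0] if len(hits) == 1 else None
```

       With the table from examples/barcodes.txt, --barcode-mm 1 --barcode-mm-r1 1 --barcode-mm-r2 1 accepts one
       mismatch in either barcode. With those limits, a pair with one mismatch in each barcode needs
       --barcode-mm 2.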

       As of version 2.2, AdapterRemoval can furthermore be used to demultiplex reads without carrying out other
       forms of read trimming. This is accomplished by specifying the --demultiplex-only option:

           $ AdapterRemoval --file1 demux_1.fq --file2 demux_2.fq --basename output_only_demux --barcode-list barcodes.txt --demultiplex-only

       Trimming and filtering related options do not apply to this mode ("TRIMMING SETTINGS" when viewing
       'AdapterRemoval --help'), but compression (--gzip, --bzip2),  multi-threading  (--threads),  interleaving
       (--interleaved, etc.) and other such options may be used in conjunction with --demultiplex-only.

EXIT STATUS

       0 if everything worked as planned, a non-zero value otherwise.

REPORTING BUGS

       Report bugs to Mikkel Schubert <MikkelSch@gmail.com>.

       Your bugreport should always include:

       • The  output  of AdapterRemoval --version. If you are not running the latest released version you should
         specify why you believe the problem is not fixed in that version.

       • A complete example that others can run that shows the problem.

AUTHOR

       Copyright (C) 2011 Stinus Lindgreen <stinus@binf.ku.dk>.

       Parts of the manual were written by Ole Tange <tange@binf.ku.dk>.

       Parts of the manual were written by Mikkel Schubert <MikkelSch@gmail.com>.

LICENSE

       Copyright (C) 2011 Stinus Lindgreen <stinus@binf.ku.dk>.

       Copyright (C) 2014 Mikkel Schubert <MikkelSch@gmail.com>.

       This program is free software; you can redistribute it and/or modify  it  under  the  terms  of  the  GNU
       General  Public License as published by the Free Software Foundation; either version 3 of the License, or
       at your option any later version.

       This program is distributed in the hope that it will be useful, but WITHOUT ANY  WARRANTY;  without  even
       the  implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
       License for more details.

       You should have received a copy of the GNU General Public License along with this program.  If  not,  see
       <http://www.gnu.org/licenses/>.