Ubuntu Manpage: AdapterRemoval - Remove adapters from sequences in either single end or paired end

Provided by: adapterremoval_2.2.2-1_amd64

NAME

       AdapterRemoval - Remove adapters from sequences in either single end or paired end
       experiments

SYNOPSIS

       AdapterRemoval --file1 filenames [--file2 filenames] [--interleaved] [--interleaved-input]
       [--interleaved-output] [--combined-output] [--basename filename] [--identify-adapters]
       [--trimns] [--maxns max] [--trimqualities] [--trimwindows length] [--minquality minimum]
       [--collapse] [--mm mismatchrate] [--minlength len] [--minalignmentlength len]
       [--qualitybase base] [--qualitybase-output base] [--shift num] [--adapter1 sequence]
       [--adapter2 sequence] [--adapter-list filename] [--barcode-list filename] [--barcode-mm
       num] [--barcode-mm-r1 num] [--barcode-mm-r2 num] [--demultiplex-only] [--output1 filename]
       [--output2 filename] [--singleton filename] [--outputcollapsed filename]
       [--outputcollapsedtruncated filename] [--discarded filename] [--settings filename] [--seed
       seed] [--gzip] [--gzip-level level] [--threads num] [--version] [--help]

DESCRIPTION

AdapterRemoval reads either one FASTQ file (single-end mode) or two FASTQ files (paired-end
mode). It removes residual adapter sequences from the reads, optionally trims Ns and
low-quality bases (using the quality string), and optionally collapses overlapping
paired-end mates into a single read. Reads are discarded if the remaining genomic part is
too short, or if the read contains more than a user-specified number of ambiguous
nucleotides ('N'). These operations may be combined with simultaneous demultiplexing.
Alternatively, AdapterRemoval may attempt to reconstruct consensus adapter sequences from
paired-end data, in order to allow the identification of the adapter sequences originally
used, and thereby ensure proper trimming of these reads.

The reads and adapters are converted to upper case for comparison. The letter 'N' is
assumed to represent an unknown nucleotide; any '.' encountered in a sequence is treated
as (and translated into) an N. The program tries to check for invalid input and
nonsensical combinations of parameters, but please report strange behaviour, bugs and
such to MikkelSch@gmail.com

If you use this program, please cite the paper:

Schubert, Lindgreen, and Orlando (2016). AdapterRemoval v2: rapid adapter trimming,
identification, and read merging. BMC Research Notes, 12;9(1):88
http://bmcresnotes.biomedcentral.com/articles/10.1186/s13104-016-1900-2

OPTIONS

       --file1 filename [...]
                Read FASTQ reads from one or more files. This contains either the single ended
                (SE) reads or, if paired ended, the mate 1 reads. If running in paired end mode,
                both --file1 and --file2 must be set. The files may optionally be gzip or bzip2
                compressed.

       --file2 filename [...]
                Read one or more FASTQ files containing mate 2 reads for a paired end run. If
                specified, --file1 must also be set. The files may optionally be gzip or bzip2
                compressed.

       --interleaved
                Enables --interleaved-input and --interleaved-output.

       --interleaved-input
                If set, input is expected to be a single FASTQ file specified using --file1, in
                which pairs of paired-end reads are listed one after each other (read1/1,
                read1/2, read2/1, read2/2, etc.).
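
                A minimal Python sketch of this layout, assuming plain four-line FASTQ records;
                the function name iter_interleaved_pairs is hypothetical, and this is not
                AdapterRemoval's own parser:

```python
from itertools import islice


def iter_interleaved_pairs(lines):
    """Yield (mate1, mate2) records from the lines of an interleaved FASTQ file.

    Each record is a (name, sequence, qualities) tuple. Illustrative only.
    """
    stripped = (line.rstrip("\n") for line in lines)
    pending = []
    while True:
        chunk = list(islice(stripped, 4))  # one FASTQ record = four lines
        if not chunk:
            break
        if len(chunk) != 4 or not chunk[0].startswith("@") or not chunk[2].startswith("+"):
            raise ValueError("malformed FASTQ record")
        pending.append((chunk[0][1:], chunk[1], chunk[3]))
        if len(pending) == 2:  # mate 1 followed directly by mate 2
            yield pending[0], pending[1]
            pending = []
    if pending:
        raise ValueError("odd number of records in interleaved input")
```

                Passing an open file handle (or any iterable of lines) yields one pair of mates
                at a time, in the order read1/1, read1/2, read2/1, read2/2, and so on.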

       --interleaved-output
                If set, and AdapterRemoval is processing paired-end reads, retained pairs of
                reads are written to a single FASTQ file, one pair after each other (read1/1,
                read1/2, read2/1, read2/2, etc.). By default, this file is named
                basename.paired.truncated, but this may be changed using the --output1 option.

       --combined-output
                If set, all reads are written to the same file(s), specified by --output1 and
                --output2. Each read is further marked by either a "PASSED" or a "FAILED" flag,
                and any read that has been FAILED (including the mate for collapsed reads) is
                replaced with a single 'N' with Phred score 0. This option can be combined with
                --interleaved / --interleaved-output to write all reads to a single output file
                specified with --output1.

       --basename filename
                Determines the default filename for output files, unless overridden using the
                specific output file settings. For single-ended mode, the following filenames are
                used: basename.truncated, basename.discarded, and basename.settings. In paired
                end mode, the following filenames are used: basename.pair1.truncated,
                basename.pair2.truncated, basename.singleton.truncated, basename.discarded, and
                basename.settings. If collapsing of reads is enabled for paired ended mode, the
                following filenames are also used: basename.collapsed, and
                basename.collapsed.truncated. The default basename is your_output. If gzip
                compression is enabled, the extension ".gz" is added to all files but the
                basename.settings file, while the extension ".bz2" is used if bzip2 compression
                is enabled.
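
                The naming scheme above can be sketched in Python as follows. The function
                default_output_names and its parameters are hypothetical, and the sketch assumes
                that the collapsed files are only produced in paired-end mode, which is all the
                text above states:

```python
def default_output_names(basename="your_output", paired=False, collapse=False,
                         compression=None):
    """List the default output filenames described above (illustrative only)."""
    if paired:
        kinds = ["pair1.truncated", "pair2.truncated", "singleton.truncated", "discarded"]
        if collapse:
            kinds += ["collapsed", "collapsed.truncated"]
    else:
        kinds = ["truncated", "discarded"]
    # The compression extension is added to every file except the settings file.
    suffix = {None: "", "gzip": ".gz", "bzip2": ".bz2"}[compression]
    names = [f"{basename}.{kind}{suffix}" for kind in kinds]
    names.append(f"{basename}.settings")
    return names
```

                For instance, default_output_names("output_single", compression="gzip") lists
                output_single.truncated.gz, output_single.discarded.gz and
                output_single.settings.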

       --identify-adapters
                For paired ended reads only. In this mode, AdapterRemoval will attempt to
                reconstruct the adapter sequences used for a set of paired ended reads, by
                locating fully overlapping read-pairs, and generating a consensus sequence from
                the bases identified as adapter sequence. The minimum overlap is controlled by
                --minalignmentlength. The values passed to the --adapter1 and --adapter2
                command-line options are used for visual comparison with the consensus
                sequence, but are otherwise not used in the consensus building.

       --trimns Remove stretches of Ns from the output reads in both the 5' and 3' end. If
                quality trimming is also enabled, stretches of mixed low-quality bases and/or Ns
                are trimmed.

       --maxns max
                If a read has more than max Ns after trimming, it is discarded (default is not to
                use).

       --trimqualities
                Remove consecutive stretches of low quality bases (threshold set by minquality)
                from both the 5' and 3' end of the reads. All bases with minquality or lower are
                trimmed. If trimming of Ns is also enabled, stretches of mixed low-quality bases
                and/or Ns are trimmed.
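
                A minimal Python sketch of this end trimming, covering --trimns and
                --trimqualities together. The function trim_ends is hypothetical, qualities are
                assumed to be already decoded to integer Phred scores, and this is not the
                program's actual implementation:

```python
def trim_ends(sequence, qualities, minquality=2, trim_ns=True, trim_qualities=True):
    """Trim stretches of Ns and/or low-quality bases from both ends of a read."""

    def is_bad(i):
        # A base is trimmed if it is an N (with --trimns) or if its quality is
        # minquality or lower (with --trimqualities).
        return (trim_ns and sequence[i] == "N") or (
            trim_qualities and qualities[i] <= minquality
        )

    start, end = 0, len(sequence)
    while start < end and is_bad(start):  # 5' end
        start += 1
    while end > start and is_bad(end - 1):  # 3' end
        end -= 1
    return sequence[start:end], qualities[start:end]
```

                Bases inside the read are never removed; trimming stops at the first retained
                base from each end.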

       --trimwindows length
                Remove low quality bases using a sliding window based approach inspired by
                sickle:

                1. The new 5' is determined by locating the first window where both the average
                quality and the quality of the first base in the window are greater than
                minquality.

                2. The new 3' is located by sliding the first window right, until the average
                quality becomes less than or equal to minquality. The new 3' is placed at the
                last base in that window where the quality is greater than or equal to
                minquality.

                3. If no 5' position could be determined, the read is discarded.

                The value of length may be a number greater than or equal to 1, in which case
                that number (rounded down to the nearest whole number) is used as the window
                length, or it may be a value greater than or equal to zero and less than 1. In
                the latter case, that number is multiplied by the length of each read to
                determine the window length. For example, a --trimwindows value of 0.1 and a
                read length of 100 would result in 10 bp windows. If the resulting window length
                is zero or is greater than the current read length, then the read length is used
                instead.
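
                The window length rule can be sketched in Python as follows. The function
                window_length is hypothetical, and rounding the fractional case down is an
                assumption, as the text above only specifies rounding for values of 1 or more:

```python
def window_length(length, read_length):
    """Resolve a --trimwindows value to a window size in bases (illustrative only)."""
    if length < 0:
        raise ValueError("window length must be non-negative")
    if length >= 1:
        window = int(length)  # whole number of bases, rounded down
    else:
        window = int(length * read_length)  # fraction of the read length (assumed floor)
    if window == 0 or window > read_length:
        window = read_length  # fall back to the full read
    return window
```

                With a 100 bp read, a value of 0.5 gives a 50 bp window and a value of 10.7
                gives a 10 bp window, while a value of 150 falls back to the read length.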

       --minquality minimum
                Set the threshold for trimming low quality bases. Default is 2. The minimum can
                be set with or without the Phred quality base.

       --collapse
                In paired-end mode, if the two mates overlap, collapse the two reads into one
                read by merging the two and recalculating the quality scores. In single-end mode,
                this instead attempts to identify templates for which the entire sequence is
                available. In both cases, complete "collapsed" reads are written with a 'M_' name
                prefix, and "collapsed" reads which are trimmed due to quality settings are
                written with a 'MT_' name prefix. The overlap needs to be at least
                minalignmentlength nucleotides, with a maximum number of mismatches determined by
                mm.

       --mm mismatchrate
                The maximum fraction of mismatches allowed in the aligned region. If 0 <
                mismatchrate < 1, the rate is used directly. If mismatchrate > 1, the rate is set
                to 1/mismatchrate. The default setting is 3, corresponding to a maximum mismatch
                rate of 1/3.
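
                As a sketch, the interpretation of the value can be written in Python as
                follows. The function effective_mismatch_rate is hypothetical, and using values
                of exactly 0 or 1 directly is an assumption, since the text above only covers
                values strictly between 0 and 1 and values greater than 1:

```python
def effective_mismatch_rate(mm=3.0):
    """Convert a --mm argument into a mismatch rate between 0 and 1 (illustrative)."""
    if mm < 0:
        raise ValueError("mismatch rate must be non-negative")
    if mm > 1:
        return 1.0 / mm  # e.g. the default of 3 means a rate of 1/3
    return float(mm)  # rates of at most 1 are used directly
```

                Thus --mm 10 and --mm 0.1 both describe a maximum mismatch rate of 10%.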

       --minlength len
                The minimum length required after trimming and adapter removal. Reads shorter
                than len are discarded. Default is 15 nucleotides.

       --minalignmentlength len
                The minimum overlap between mate 1 and mate 2 before the reads are collapsed into
                one, when collapsing paired end reads, or when attempting to identify complete
                template sequences in single-end mode. Default is 11 nucleotides.

       --qualitybase base
                The base of the quality score - either '64' for Phred+64 (i.e., Illumina 1.3+
                and 1.5+) or '33' for Phred+33 (Illumina 1.8+). In addition, the value 'solexa'
                may be used to specify reads with Solexa encoded scores. Default is 33.

       --qualitybase-output base
                The base of the quality score for reads written by AdapterRemoval - either '64'
                for Phred+64 (i.e., Illumina 1.3+ and 1.5+) or '33' for Phred+33 (Illumina
                1.8+). In addition, the value 'solexa' may be used to specify reads with Solexa
                encoded scores. However, note that quality scores are represented using Phred
                scores internally, and conversion to and from Solexa scores therefore results in a
                loss of information. The default corresponds to the value given for
                --qualitybase.

       --shift num
                To allow for missing bases in the 5' end of the read, the program can let the
                alignment slip num bases in the 5' end. This corresponds to starting the
                alignment at most num nucleotides into read2 (for paired end) or the adapter (for
                single end). The default shift value is 2.

       --adapter1 sequence
       --adapter2 sequence
                Specify the adapter sequences that you wish to trim. The Adapter #2 sequence is
                only used when trimming paired-ended data.

                The Adapter #1 and Adapter #2 sequences are expected to be found in the mate 1
                and the mate 2 reads respectively, while ignoring any difference in case and
                treating Ns as wildcards. The default sequences are

                Adapter #1: AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG

                Adapter #2: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

                Assuming these were the adapters used to generate our data, we should therefore
                see these in the FASTQ files:

                  $ grep -i "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC......ATCTCGTATGCCGTCTTCTGCTTG" file1.fq
                  AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAACAAGAAT
                  CTGGAGTTCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGAATCTCGTATGCCGTCTTCTGCTTGAAAAAAA
                  GGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGAATCTCGTATGCCGTCTTCTGCTTGCAAATTGAAAACAC
                  ...

                  $ grep -i "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT" file2.fq
                  CAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTCAAAAAAAGAAAAACATCTTG
                  GAACTCCAGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTCAAAAAAAATAGA
                  GAACTAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTCAAAAACATAAGACCTA
                  ...

                Note that --adapter1 and --adapter2 replace the --pcr[12] options of
                AdapterRemoval v1.x, for which the --pcr2 sequence was expected to be reverse
                complemented compared to --adapter2. Using the --pcr[12] options is not recommended!

       --adapter-list filename
                Read one or more PCR sequences from a table. The first two columns (separated by
                whitespace) of each line in the file are expected to correspond to values passed
                to --adapter1 and --adapter2. In single ended mode, only column one is required.
                Lines starting with '#' are ignored. When multiple PCR sequences or sequence
                pairs are specified, AdapterRemoval will try each adapter (pair) listed in the
                table, and select the best aligning adapters for each read processed.
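
                A minimal Python sketch of reading such a table. The function
                parse_adapter_list is hypothetical; skipping blank lines and converting the
                sequences to upper case are assumptions made for illustration:

```python
def parse_adapter_list(lines, paired=True):
    """Parse an adapter table into a list of (adapter1, adapter2) tuples.

    In single-end mode only the first column is used and adapter2 is None.
    """
    required = 2 if paired else 1
    adapters = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comment lines (and, by assumption, blank lines) are skipped
        fields = line.split()  # columns are separated by whitespace
        if len(fields) < required:
            raise ValueError(f"line {number}: expected at least {required} column(s)")
        adapter1 = fields[0].upper()
        adapter2 = fields[1].upper() if paired else None
        adapters.append((adapter1, adapter2))
    return adapters
```

                Each tuple corresponds to one row of the table, that is, one adapter (pair) to
                be tried against every read.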

       --barcode-list filename
                Read a table of one or two fixed-length barcodes and perform demultiplexing of
                single or double indexed reads. The table is expected to contain 2 or 3 columns,
                the first of which represents the name of a given sample, and the second and third
                of which represent the mate 1 and (optionally) the mate 2 barcode sequence:

                    $ cat barcodes.txt
                    sample_1 ATGCGGA TGAATCT
                    sample_2 ATGGATT ATAGTGA
                    sample_7 CAAAACT TCGCTGC

                Results are written to ${basename}.${sample_name}.*, using the default names for
                other output files. A setting file with statistics is written for each sample at
                ${basename}.${sample_name}.settings, as is a setting file containing the
                demultiplexing statistics, at ${basename}.settings.

                When demultiplexing is used, the barcode identified for a given read is
                automatically added to the adapter sequence, in order to ensure that overlapping
                reads are correctly trimmed. The .settings file represents this by showing the
                (reverse complemented) barcode sequence added to the --adapter1 and --adapter2
                sequences, followed by an underscore (shown here for the barcode pair ATGCGGA /
                TGAATCT):

                    [Adapter sequences]
                    Adapter1[0]: AGATTCA_AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
                    Adapter2[0]: TCCGCAT_AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

                Note that the sequence added to each adapter is the reverse complement of the
                barcode sequence of the other mate, as this sequence is expected to be found
                immediately before the adapter sequence.
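
                The construction of these combined sequences can be sketched in Python as
                follows. The helper names reverse_complement and adapters_with_barcodes are
                hypothetical, and the underscore separator is included only to mirror the
                .settings display shown above:

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (A, C, G, T and N only)."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


def adapters_with_barcodes(adapter1, adapter2, barcode1, barcode2, separator="_"):
    """Prefix each adapter with the reverse complement of the other mate's barcode."""
    return (
        reverse_complement(barcode2) + separator + adapter1,
        reverse_complement(barcode1) + separator + adapter2,
    )
```

                For the barcode pair ATGCGGA / TGAATCT, this gives the prefixes AGATTCA_ for
                adapter 1 and TCCGCAT_ for adapter 2, matching the listing above.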

       --barcode-mm num
                The maximum number of mismatches allowed for barcodes, when counting mismatches
                in both the mate 1 and mate 2 barcodes. In conjunction with the --barcode-mm-r1
                and --barcode-mm-r2, this allows fine-grained control over the barcode
                comparisons. If not set, this value is set to the sum of --barcode-mm-r1 and
                --barcode-mm-r2.

                For example, to allow one mismatch in either the mate 1 or the mate 2 barcode,
                one might specify --barcode-mm 1; to allow a mismatch in the mate 1 and / or the
                mate 2 barcode, one might specify --barcode-mm 2 --barcode-mm-r1 1
                --barcode-mm-r2 1, and so on.

       --barcode-mm-r1 num
                The maximum number of mismatches allowed in the mate 1 barcode; if not set, this
                number is equal to the value of --barcode-mm. This number cannot exceed the value
                specified for --barcode-mm.

       --barcode-mm-r2 num
                The maximum number of mismatches allowed in the mate 2 barcode; if not set, this
                number is equal to the value of --barcode-mm. This number cannot exceed the value
                specified for --barcode-mm.

       --demultiplex-only
                Only carry out demultiplexing, using the list of barcodes supplied using
                --barcode-list. Note that trimming and filtering options do not apply to this
                mode of operation.

       --output1 file
       --output2 file
       --singleton file
       --outputcollapsed file
       --outputcollapsedtruncated file
       --discarded file
       --settings file
                Instead of using the default behaviour where the program automatically generates
                the files needed, you can specify where each type of output is directed. This can
                be files, pipes etc. thus making it possible to easily zip the output on the fly.
                Default files are still generated if nothing else is specified.

                The types of output in single end mode are:

                output1 contains the trimmed reads.

                The types of output in paired end mode are:

                output1 contains trimmed mate1 reads.

                output2 contains trimmed mate2 reads.

                singleton contains all reads where the other mate in a pair is discarded.

                outputcollapsed contains pairs that overlap and are collapsed into a single read
                (if --collapse is used). The reads are renamed with an @M_ prefix.

                outputcollapsedtruncated contains pairs that overlap and are collapsed into a
                single read (if --collapse is used) and have further been trimmed due to Ns
                and/or low quality nucleotides in the 5' or 3' end. The reads are renamed with an
                @MT_ prefix.

                The types of output in both single end and paired end mode are:

                discarded contains all reads that are discarded by the program.

                settings contains information on the parameters used in the run as well as
                overall statistics on the reads after trimming such as average length.

       --seed seed
                When collapsing reads at positions where the two reads differ, and the quality of
                the bases is identical, AdapterRemoval will select a random base. This option
                specifies the seed used for the random number generator used by AdapterRemoval.
                This value is also written to the settings file. Note that setting the seed is
                not reliable in multithreaded mode, since the order of operations is non-
                deterministic.

       --gzip   If set, all FASTQ files written by AdapterRemoval will be gzip compressed using
                the compression level specified using --gzip-level. The extension ".gz" is added
                to files for which no filename was given on the commandline.

       --gzip-level level
                Determines the compression level used when gzip'ing FASTQ files. Must be a value
                in the range 0 to 9, with 0 disabling compression and 9 being the best
                compression. Defaults to 6.

       --bzip2  If set, all FASTQ files written by AdapterRemoval will be bzip2 compressed using
                the compression level specified using --bzip2-level. The extension ".bz2" is
                added to files for which no filename was given on the commandline.

       --bzip2-level level
                Determines the compression level used when bzip2'ing FASTQ files. Must be a value
                in the range 1 to 9, with 9 being the best compression. Defaults to 9.

       --threads num
                Maximum number of threads to use for current run; note that file IO is single-
                threaded, regardless of the number of threads specified.

       --version
                Output the version of the program.

       --help   Output the summary of available command-line options, including default values
                and/or values specified on the command-line.

EXAMPLE: Single end experiment

       The following command removes adapters from the file reads_1.fq, trims both Ns and low
       quality bases from the reads, and gzip compresses the resulting files. The --basename
       option is used to specify the prefix for output files.

           $ AdapterRemoval --file1 reads_1.fq --basename output_single --trimns --trimqualities --gzip

       Since --gzip and --basename are specified, the trimmed FASTQ reads are written to
       output_single.truncated.gz, the discarded FASTQ reads are written to
       output_single.discarded.gz, and settings and summary statistics are written to
       output_single.settings.

       Note that by default, AdapterRemoval does not require a minimum number of bases
       overlapping with the adapter sequence, before reads are trimmed. This may result in an
       excess of very short (1 - 3 bp) 3' fragments being falsely identified as adapter
       sequences, and trimmed. This behavior may be changed using the --minadapteroverlap option,
       which allows the specification of a minimum number of bases (excluding Ns) that must be
       aligned to carry out trimming. For example, use --minadapteroverlap 3 to require an overlap of
       at least 3 bp.

EXAMPLE: Paired end experiment.

       The following command removes adapters from paired-end reads, where the mate 1 and mate
       2 reads are kept in files reads_1.fq and reads_2.fq, respectively. The reads are trimmed
       for both Ns and low quality bases, and overlapping reads (at least 11 nucleotides, per
       default) are merged (collapsed):

           $ AdapterRemoval --file1 reads_1.fq --file2 reads_2.fq --basename output_paired --trimns --trimqualities --collapse

       This command generates the files output_paired.pair1.truncated and
       output_paired.pair2.truncated, which contain trimmed pairs of reads which were not
       collapsed, output_paired.singleton.truncated containing reads where one mate was
       discarded, output_paired.collapsed containing merged reads, and
       output_paired.collapsed.truncated containing merged reads that have been trimmed due to
       the --trimns or --trimqualities options. Finally, the output_paired.discarded and
       output_paired.settings files correspond to those of the single-end run.

EXAMPLE: Interleaved FASTQ reads.

       AdapterRemoval is able to read and write paired-end reads stored in a single, so-called
       interleaved FASTQ file (one pair at a time, first mate 1, then mate 2). This is
       accomplished by specifying the location of the file using --file1 and *also* setting the
       --interleaved command-line option:

           $ AdapterRemoval --interleaved --file1 interleaved.fq --basename output_interleaved

       Other than taking just a single input file, this mode operates almost exactly like paired
       end trimming (as described above); the mode differs only in that paired reads are not
       written to a 'pair1' and a 'pair2' file, but are instead written to a single,
       interleaved file, named 'paired'. The location of this file is controlled using the
       --output1 option. Enabling either reading or writing of interleaved FASTQ files, but
       not both, can be accomplished by specifying either of the --interleaved-input and
       --interleaved-output options, both of which are enabled by the --interleaved option.

EXAMPLE: Different quality score encodings.

       By default, AdapterRemoval expects the quality scores in FASTQ reads to be Phred+33
       encoded, meaning that the error probabilities are encoded as (char)('!' - 10 * log10(p)).
       Most data will be encoded using Phred+33, but Phred+64 and 'Solexa' encoded quality scores
       are also supported. These are selected by specifying the --qualitybase command-line option
       (specifying either '33', '64', or 'solexa'):

           $ AdapterRemoval --qualitybase 64 --file1 reads_q64.fq --basename phred_64_encoded

       By default, reads are written using the *same* encoding as the input. If a different
       encoding is desired, this may be accomplished using the --qualitybase-output option:

           $ AdapterRemoval --qualitybase 64 --qualitybase-output 33 --file1 reads_q64.fq --basename phred_33_encoded
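
       The Phred encodings mentioned above can be sketched in Python as follows. The function
       names are hypothetical, rounding to the nearest whole score is an assumption, and Solexa
       scores (which use a different formula) are deliberately left out:

```python
import math


def encode_quality(p, base=33):
    """Encode an error probability p as a single Phred+base quality character."""
    score = round(-10 * math.log10(p))  # Phred score, e.g. p = 0.001 gives 30
    return chr(base + score)  # '!' is ASCII 33, the Phred+33 offset


def reencode(qualities, base_in=64, base_out=33):
    """Convert a quality string from one Phred offset to another."""
    return "".join(chr(ord(char) - base_in + base_out) for char in qualities)
```

       For example, a Phred+64 quality string of "hhhh" (score 40) becomes "IIII" when written
       as Phred+33, which is the kind of conversion requested by --qualitybase-output 33.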

       Note furthermore that AdapterRemoval by default only expects quality scores in the range 0
       - 41 (or -5 to 41 in the case of Solexa encoded scores). If input data using a different
       maximum quality score is to be processed, or if the desired maximum quality score of
       collapsed reads is greater than 41, then this limit may be increased using the
       --qualitymax option:

           $ AdapterRemoval --qualitymax 50 --file1 reads_1.fq --file2 reads_2.fq --collapse --basename collapsed_q50

       For a detailed overview of Phred encoding schemes currently and previously in use, see
       e.g. the Wikipedia article on the subject:
       https://en.wikipedia.org/wiki/FASTQ_format#Encoding

EXAMPLE: Paired end reads containing multiple, distinct adapter pairs.

       It is possible to trim data that contains multiple adapter pairs, by providing a one or
       two-column table containing possible adapter combinations (for single-end and paired-end
       trimming, respectively; see e.g. examples/adapters.txt):

           $ cat adapters.txt
           AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCACCTAATCTCGTATGCCGTCTTCTGCTTG    AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
           AAACTTGCTCTGTGCCCGCTCCGTATGTCACAACAGTGCGTGTATCACCTCAATGCAGGACTCA    GATCGGGAGTAATTTGGAGGCAGTAGTTCGTCGAAACTCGGAGCGTCTTTAGCAGGAG
           CTAATTTGCCGTAGCGACGTACTTCAGCCTCCAGGAATTGGACCCTTACGCACACGCATTCATG    TACCGTGAAAGGTGCGCTTAGTGGCATATGCGTTAAGAGCTAGGTAACGGTCTGGAGG
           GTTCATACGACGACGACCAATGGCACACTTATCCGGTACTTGCGTTTCAATGCGCATGCCCCAT    TAAGAAACTCGGAGTTTGGCCTGCGAGGTAGCTTGGGTGTTATGAAGAACGGCATGCG
           CCATGCCCCGAAGATTCCTATACCCTTAAGGTCGCAATTGTTCGAGTAAGCTGTACGCGCCCAT    GTTGCATTGACCCGAAGGGCTCGATGTTTAGGGAGGTCAGAAGTTGAGCGGGTTCAAA

       This table is then specified using the --adapter-list option:

           $ AdapterRemoval --file1 reads_1.fq --file2 reads_2.fq --basename output_multi --trimns --trimqualities --collapse --adapter-list adapters.txt

       The resulting .settings file contains an overview of how frequently each adapter (pair) was
       used.

       Note that in the case of paired-end adapters, AdapterRemoval considers only the
       combinations of adapters specified in the table, one combination per row. For single-end
       trimming, only the first column of the table file is required, and the list may therefore
       take the form of a file containing one sequence per line.

EXAMPLE: Identifying adapter sequences from paired-ended reads

       If we did not know the adapter sequences for paired-end reads, AdapterRemoval may be used
       to generate a consensus adapter sequence based on fragments identified as belonging to the
       adapters through pairwise alignments of the reads, provided that the data set contains
       only a single adapter sequence (not counting differences in index sequences).

       In the following example, the identified adapters correspond to the default adapter
       sequences with a poly-A tail resulting from sequencing past the end of the insert +
       templates. It is not necessary to specify this tail when using the --adapter1 or
       --adapter2 command-line options. The characters shown under each of the consensus
       sequences represent the phred-encoded fraction of bases identical to the consensus base,
       with adapter 1 containing the index CACCTA:

           $ AdapterRemoval --identify-adapters --file1 reads_1.fq --file2 reads_2.fq

           Attemping to identify adapter sequences ...
           Processed a total of 1,000 reads in 0.0s; 129,000 reads per second on average ...
              Found 394 overlapping pairs ...
              Of which 119 contained adapter sequence(s) ...

           Printing adapter sequences, including poly-A tails:
             --adapter1:  AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
                          ||||||||||||||||||||||||||||||||||******||||||||||||||||||||||||
              Consensus:  AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCACCTAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAAAAAAAAAAA
                Quality:  55200522544444/4411330333330222222/1.1.1.1111100-00000///..+....--*-)),,+++++++**(('%%%$

               Top 5 most common 9-bp 5'-kmers:
                       1: AGATCGGAA = 96.00% (96)
                       2: AGATGGGAA =  1.00% (1)
                       3: AGCTCGGAA =  1.00% (1)
                       4: AGAGCGAAA =  1.00% (1)
                       5: AGATCGGGA =  1.00% (1)

             --adapter2:  AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
                          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
              Consensus:  AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                Quality:  525555555144141441430333303.2/22-2/-1..11111110--00000///..+....--*-),,,+++++++**(%'%%%$

               Top 5 most common 9-bp 5'-kmers:
                       1: AGATCGGAA = 100.00% (100)

       No files are generated from running the adapter identification step.

       The consensus sequences inferred are compared to those specified using the --adapter1 and
       --adapter2 command-line options, or with the default values for these if no values have
       been given (as in this case). Pipes (|) indicate matches between the provided sequences
       and the consensus sequence, and "*" indicates the presence of unspecified bases (Ns).

EXAMPLE: Demultiplexing of paired end reads

       As of version 2.1, AdapterRemoval supports simultaneous demultiplexing and adapter
       trimming; demultiplexing is carried out using a simple comparison between the specified
       barcode sequences and the first N bases of the reads, corresponding to the length of the
       barcodes. Reads identified as containing a specific barcode or pair of barcodes are then
       trimmed using adapter sequences including these barcodes.

       Demultiplexing is enabled by creating a table of barcodes, the first column of which
       specifies the sample name (using characters [a-zA-Z0-9_]) and the second and (optional)
       third columns of which specify the mate 1 and mate 2 barcode sequences.

       For example, a table of barcodes from a double-indexed run might be as follows (see
       examples/barcodes.txt):

           $ cat barcodes.txt
           sample_1 ATGCGGA TGAATCT
           sample_2 ATGGATT ATAGTGA
           sample_7 CAAAACT TCGCTGC

       In the case of single-end reads, only the first two columns are required. AdapterRemoval
       is invoked with the --barcode-list option, specifying the path to this table:

           $ AdapterRemoval --file1 demux_1.fq --file2 demux_2.fq --basename output_demux --barcode-list barcodes.txt

       This generates a set of output files for each sample specified in the barcode table, using
       the basename (--basename) as the prefix, followed by a dot and the sample name, followed
       by a dot and the default name for a given file type. For example, the output files for
       sample_2 would be

           output_demux.sample_2.discarded
           output_demux.sample_2.pair1.truncated
           output_demux.sample_2.pair2.truncated
           output_demux.sample_2.settings
           output_demux.sample_2.singleton.truncated

       The settings files generated for each sample summarize the reads for that sample only; in
       addition, a basename.settings file is generated which summarizes the number and proportion
       of reads identified as belonging to each sample.

       The maximum number of mismatches allowed when comparing barcodes is controlled using the
       options --barcode-mm, --barcode-mm-r1, and --barcode-mm-r2, which specify the maximum
       number of mismatches total, and the maximum number of mismatches for the mate 1 and mate 2
       barcodes respectively. Thus, if mm_1(i) and mm_2(i) represent the number of mismatches
       observed for barcode-pair i for a given pair of reads, these options require that

          1. mm_1(i) <= --barcode-mm-r1
          2. mm_2(i) <= --barcode-mm-r2
          3. mm_1(i) + mm_2(i) <= --barcode-mm
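
       The three conditions above can be sketched in Python as follows. The function
       barcode_pair_matches is hypothetical. The default total of 0 is an assumption, the case
       where only the per-mate limits are given (total taken as their sum) is not modelled, and
       Ns are compared like any other character:

```python
def count_mismatches(prefix, barcode):
    """Count positions at which a read prefix differs from a barcode of equal length."""
    return sum(a != b for a, b in zip(prefix, barcode))


def barcode_pair_matches(read1, read2, barcode1, barcode2, mm=0, mm_r1=None, mm_r2=None):
    """Check a read pair against one barcode pair using the three limits above."""
    # Unset per-mate limits default to the total, as described for --barcode-mm-r1/-r2.
    mm_r1 = mm if mm_r1 is None else mm_r1
    mm_r2 = mm if mm_r2 is None else mm_r2
    if len(read1) < len(barcode1) or len(read2) < len(barcode2):
        return False  # read too short to contain the barcode
    mm_1 = count_mismatches(read1[:len(barcode1)], barcode1)
    mm_2 = count_mismatches(read2[:len(barcode2)], barcode2)
    return mm_1 <= mm_r1 and mm_2 <= mm_r2 and mm_1 + mm_2 <= mm
```

       With mm=2, mm_r1=1 and mm_r2=1, a pair is accepted with up to one mismatch in each
       barcode, whereas mm=1 alone accepts a single mismatch in either barcode but not both.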

       As of version 2.2, AdapterRemoval can furthermore be used to demultiplex reads without
       carrying out other forms of read trimming. This is accomplished by specifying the
       --demultiplex-only option:

           $ AdapterRemoval --file1 demux_1.fq --file2 demux_2.fq --basename output_only_demux --barcode-list barcodes.txt --demultiplex-only

       Trimming and filtering related options do not apply to this mode ("TRIMMING SETTINGS" when
       viewing 'AdapterRemoval --help'), but compression (--gzip, --bzip2), multi-threading
       (--threads), interleaving (--interleaved, etc.) and other such options may be used in
       conjunction with --demultiplex-only.

EXIT STATUS

       0 if everything worked as planned, a non-zero value otherwise.

REPORTING BUGS

       Report bugs to Mikkel Schubert <MikkelSch@gmail.com>.

       Your bug report should always include:

       • The output of AdapterRemoval --version. If you are not running the latest released
         version you should specify why you believe the problem is not fixed in that version.

       • A complete example that others can run that shows the problem.

AUTHOR

       Copyright (C) 2011 Stinus Lindgreen <stinus@binf.ku.dk>.

       Parts of the manual were written by Ole Tange <tange@binf.ku.dk>.

       Parts of the manual were written by Mikkel Schubert <MikkelSch@gmail.com>.

LICENSE

       Copyright (C) 2011 Stinus Lindgreen <stinus@binf.ku.dk>.

       Copyright (C) 2014 Mikkel Schubert <MikkelSch@gmail.com>.

       This program is free software; you can redistribute it and/or modify it under the terms of
       the GNU General Public License as published by the Free Software Foundation; either
       version 3 of the License, or at your option any later version.

       This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
       without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
       See the GNU General Public License for more details.

       You should have received a copy of the GNU General Public License along with this program.
       If not, see <http://www.gnu.org/licenses/>.