Provided by: adapterremoval_2.2.2-1_amd64

NAME

       AdapterRemoval - Remove adapters from sequences in either single end or paired end experiments

SYNOPSIS

       AdapterRemoval --file1 filenames [--file2 filenames] [--interleaved] [--interleaved-input]
       [--interleaved-output] [--combined-output] [--basename filename] [--identify-adapters] [--trimns]
       [--maxns max] [--trimqualities] [--trimwindows length] [--minquality minimum] [--collapse]
       [--mm mismatchrate] [--minlength len] [--minalignmentlength len] [--qualitybase base]
       [--qualitybase-output base] [--shift num] [--adapter1 sequence] [--adapter2 sequence] [--adapter-list
       filename] [--barcode-list filename] [--barcode-mm num] [--barcode-mm-r1 num] [--barcode-mm-r2 num]
       [--demultiplex-only] [--output1 filename] [--output2 filename] [--singleton filename] [--outputcollapsed
       filename] [--outputcollapsedtruncated filename] [--discarded filename] [--settings filename] [--seed
       seed] [--gzip] [--gzip-level level] [--threads num] [--version] [--help]

DESCRIPTION

       AdapterRemoval reads either one FASTQ file (single-ended mode) or two FASTQ files (paired-ended mode). It
       removes residual adapter sequence from the reads, optionally trims Ns and low-quality bases (using the
       quality string), and optionally collapses overlapping paired-ended mates into one read. Reads are
       discarded if the remaining genomic part is too short, or if the read contains more than a user-specified
       number of ambiguous nucleotides ('N'). These operations may be combined with simultaneous
       demultiplexing. Alternatively, AdapterRemoval may attempt to reconstruct consensus adapter sequences
       from paired-ended data, in order to allow the identification of the adapter sequences originally used,
       and thereby ensure proper trimming of these reads.

       The reads and adapters are transformed to upper case for comparison. It is assumed that the letter 'N' is
       used for an unknown nucleotide; if the program encounters a '.' in a sequence, it is treated as (and
       translated into) an N. The program tries to check for invalid input and nonsensical combinations of
       parameters, but please report strange behaviour, bugs and such to MikkelSch@gmail.com.

       If you use this program, please cite the paper:

       Schubert, Lindgreen, and Orlando (2016). AdapterRemoval v2: rapid adapter trimming, identification, and
       read merging. BMC Research Notes, 12;9(1):88
       http://bmcresnotes.biomedcentral.com/articles/10.1186/s13104-016-1900-2

OPTIONS

       --file1 filename [...]
                Read FASTQ reads from one or more files. This contains either the single ended (SE) reads or, if
                paired ended, the mate 1 reads. If running in paired end mode, both --file1 and --file2 must be
                set. The files may optionally be gzip or bzip2 compressed.

       --file2 filename [...]
                Read one or more FASTQ files containing mate 2 reads for a paired end run. If specified, --file1
                must also be set. The files may optionally be gzip or bzip2 compressed.

       --interleaved
                Enables --interleaved-input and --interleaved-output.

       --interleaved-input
                If set, input is expected to be a single FASTQ file specified using --file1, in which pairs of
                paired-end reads are listed one after each other (read1/1, read1/2, read2/1, read2/2, etc.).

       --interleaved-output
                If set, and AdapterRemoval is processing paired-end reads, retained pairs of reads are written
                to a single FASTQ file, one pair after each other (read1/1, read1/2, read2/1, read2/2, etc.). By
                default, this file is named basename.paired.truncated, but this may be changed using the
                --output1 option.

       --combined-output
                If set, all reads are written to the same file(s), specified by --output1 and --output2. Each
                read is further marked by either a "PASSED" or a "FAILED" flag, and any read that has been
                FAILED (including the mate for collapsed reads) is replaced with a single 'N' with Phred score
                0. This option can be combined with --interleaved / --interleaved-output to write all reads to a
                single output file specified with --output1.

       --basename filename
                Determines the default filename for output files, unless overridden using the specific output
                file settings. For single-ended mode, the following filenames are used: basename.truncated,
                basename.discarded, and basename.settings. In paired end mode, the following filenames are used:
                basename.pair1.truncated, basename.pair2.truncated, basename.singleton.truncated,
                basename.discarded, and basename.settings. If collapsing of reads is enabled for paired ended
                mode, the following filenames are also used: basename.collapsed, and
                basename.collapsed.truncated. The default basename is your_output. If gzip compression is
                enabled, the extension ".gz" is added to all files but the basename.settings file, while the
                extension ".bz2" is used if bzip2 compression is enabled.
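
                The naming scheme above can be summarised in a few lines of Python. This is only an
                illustrative sketch of the rules as described here, not code taken from AdapterRemoval;
                the function name default_output_files and its parameters are hypothetical.

```python
def default_output_files(basename="your_output", paired=False, collapse=False,
                         compression=None):
    """List the default output filenames described for --basename.

    `compression` is None, "gzip" or "bzip2" (hypothetical parameter values).
    """
    if paired:
        names = ["pair1.truncated", "pair2.truncated",
                 "singleton.truncated", "discarded"]
        if collapse:
            # Only documented for paired-ended mode.
            names += ["collapsed", "collapsed.truncated"]
    else:
        names = ["truncated", "discarded"]

    extension = {None: "", "gzip": ".gz", "bzip2": ".bz2"}[compression]
    files = [f"{basename}.{name}{extension}" for name in names]
    # The settings file never receives a compression extension.
    files.append(f"{basename}.settings")
    return files
```

                For a single-ended run with gzip compression and the basename output_single, this yields
                output_single.truncated.gz, output_single.discarded.gz and output_single.settings.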

       --identify-adapters
                For paired ended reads only. In this mode, AdapterRemoval will attempt to reconstruct the
                adapter sequences used for a set of paired ended reads, by locating fully overlapping read-
                pairs, and generating a consensus sequence from the bases identified as adapter sequence. The
                minimum overlap is controlled by --minalignmentlength. The values passed to the --adapter1 and
                --adapter2 command-line options are used for visual comparison with the consensus sequence, but
                otherwise not used in the consensus building.

       --trimns Remove stretches of Ns from the output reads in both the 5' and 3' end. If quality trimming is
                also enabled, stretches of mixed low-quality bases and/or Ns are trimmed.

       --maxns max
                If a read has more than max Ns after trimming, it is discarded (default is not to use).

       --trimqualities
                Remove consecutive stretches of low quality bases (threshold set by minquality) from both the 5'
                and 3' end of the reads. All bases with minquality or lower are trimmed. If trimming of Ns is
                also enabled, stretches of mixed low-quality bases and/or Ns are trimmed.
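
                The end trimming described for --trimns and --trimqualities can be sketched as follows.
                This is a simplified illustration written from the description above, not the program's
                actual implementation; the function trim_ends and its arguments are hypothetical, and
                qualities are assumed to be already decoded into integer Phred scores.

```python
def trim_ends(sequence, qualities, minquality=2, trim_ns=True, trim_qualities=True):
    """Trim stretches of Ns and/or low-quality bases from both ends of a read.

    `qualities` holds one integer Phred score per base in `sequence`.
    """
    def is_trimmable(base, quality):
        if trim_ns and base.upper() == "N":
            return True
        # Bases with a quality of `minquality` or lower are trimmed.
        return trim_qualities and quality <= minquality

    start, end = 0, len(sequence)
    # Walk inwards from the 5' end ...
    while start < end and is_trimmable(sequence[start], qualities[start]):
        start += 1
    # ... and from the 3' end, stopping at the first base that is kept.
    while end > start and is_trimmable(sequence[end - 1], qualities[end - 1]):
        end -= 1
    return sequence[start:end], qualities[start:end]
```

                With both kinds of trimming enabled, mixed stretches of Ns and low-quality bases at
                either end are removed in a single pass, as described above.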

       --trimwindows length
                Remove low quality bases using a sliding-window based approach inspired by sickle:

                1. The new 5' is determined by locating the first window where both the average quality and the
                quality of the first base in the window are greater than minquality.

                2. The new 3' is located by sliding the first window right, until the average quality becomes
                less than or equal to minquality. The new 3' is placed at the last base in that window where the
                quality is greater than or equal to minquality.

                3. If no 5' position could be determined, the read is discarded.

                The value of length may be a number greater than or equal to 1, in which case that number
                (rounded down to the nearest whole number) is used as the window length, or it may be a value
                greater than or equal to zero and less than 1. In the latter case, that number is multiplied by
                the length of each read to determine the window length. For example, a trimwindows value of 0.1
                and a read length of 100 would result in 10 bp windows. If the resulting window length is zero or
                is greater than the current read length, then the read length is used instead.
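
                How the window length is derived from the --trimwindows value can be sketched as below.
                This is an illustration based only on the paragraph above; the function name
                window_length is hypothetical, and truncating the fractional product to a whole number
                is an assumption.

```python
def window_length(trimwindows, read_length):
    """Return the window length in bases for a given read length."""
    if trimwindows < 0:
        raise ValueError("window length must not be negative")

    if trimwindows >= 1:
        # Absolute window size, rounded down to a whole number.
        length = int(trimwindows)
    else:
        # Fraction of the read length (truncation is an assumption).
        length = int(trimwindows * read_length)

    # Fall back to the read length if the window is empty or too long.
    if length == 0 or length > read_length:
        length = read_length
    return length
```

                For example, window_length(0.1, 100) gives 10, while window_length(200, 100) falls back
                to the read length of 100.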

       --minquality minimum
                Set the threshold for trimming low quality bases. Default is 2. The minimum can be set with or
                without the Phred quality base.

       --collapse
                In paired-end mode, if the two mates overlap, collapse the two reads into one read by merging
                the two and recalculating the quality scores. In single-end mode, this instead attempts to
                identify templates for which the entire sequence is available. In both cases, complete
                "collapsed" reads are written with a 'M_' name prefix, and "collapsed" reads which are trimmed
                due to quality settings are written with a 'MT_' name prefix. The overlap needs to be at least
                minalignmentlength nucleotides, with a maximum number of mismatches determined by mm.

       --mm mismatchrate
                The maximum fraction of mismatches allowed in the aligned region. If 0 < mismatchrate < 1, the
                rate is used directly. If mismatchrate > 1, the rate is set to 1/mismatchrate. The default
                setting is 3, corresponding to a maximum mismatch rate of 1/3.
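
                The interpretation of the --mm value can be written out as a small function. This is a
                sketch of the rule as stated above; the name effective_mismatch_rate is hypothetical,
                and rejecting values that the description does not cover (zero, negative values and
                exactly 1) is a choice made for this example only.

```python
def effective_mismatch_rate(mismatchrate=3.0):
    """Convert a --mm style value into a maximum mismatch rate."""
    if 0 < mismatchrate < 1:
        # Values between 0 and 1 are used directly as the rate.
        return float(mismatchrate)
    if mismatchrate > 1:
        # Values above 1 are interpreted as 1/mismatchrate.
        return 1.0 / mismatchrate
    # Not covered by the description; rejected in this sketch.
    raise ValueError("mismatch rate not covered by this sketch")
```

                The default value of 3 therefore corresponds to a rate of 1/3, and a value of 0.25 is
                used unchanged.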

       --minlength len
                The minimum length required after trimming and adapter removal. Reads shorter than len are
                discarded. Default is 15 nucleotides.

       --minalignmentlength len
                The minimum overlap between mate 1 and mate 2 before the reads are collapsed into one, when
                collapsing paired end reads, or when attempting to identify complete template sequences in
                single-end mode. Default is 11 nucleotides.

       --qualitybase base
                The base of the quality score - either '64' for Phred+64 (i.e., Illumina 1.3+ and 1.5+) or
                '33' for Phred+33 (Illumina 1.8+). In addition, the value 'solexa' may be used to specify reads
                with Solexa encoded scores. Default is 33.

       --qualitybase-output base
                The base of the quality score for reads written by AdapterRemoval - either '64' for Phred+64
                (i.e., Illumina 1.3+ and 1.5+) or '33' for Phred+33 (Illumina 1.8+). In addition, the value
                'solexa' may be used to specify reads with Solexa encoded scores. However, note that quality
                scores are represented using Phred scores internally, and conversion to and from Solexa scores
                therefore results in a loss of information. The default corresponds to the value given for
                --qualitybase.

       --shift num
                To allow for missing bases in the 5' end of the read, the program can let the alignment slip num
                bases in the 5' end. This corresponds to starting the alignment at most num nucleotides into
                read2 (for paired end) or the adapter (for single end). The default shift value is 2.

       --adapter1 sequence
       --adapter2 sequence
                Specify the adapter sequences that you wish to trim. The Adapter #2 sequence is only used when
                trimming paired-ended data.

                The Adapter #1 and Adapter #2 sequences are expected to be found in the mate 1 and the mate 2
                reads respectively, while ignoring any difference in case and treating Ns as wildcards. The
                default sequences are

                Adapter #1: AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG

                Adapter #2: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

                Assuming these were the adapters used to generate our data, we should therefore see these in the
                FASTQ files:

                  $ grep -i "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC......ATCTCGTATGCCGTCTTCTGCTTG" file1.fq
                  AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAACAAGAAT
                  CTGGAGTTCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGAATCTCGTATGCCGTCTTCTGCTTGAAAAAAA
                  GGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGAATCTCGTATGCCGTCTTCTGCTTGCAAATTGAAAACAC
                  ...

                  $ grep -i "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT" file2.fq
                  CAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTCAAAAAAAGAAAAACATCTTG
                  GAACTCCAGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTCAAAAAAAATAGA
                  GAACTAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTCAAAAACATAAGACCTA
                  ...

                Note that --adapter1 and --adapter2 replace the --pcr[12] options of AdapterRemoval v1.x, for
                which the --pcr2 sequence was expected to be reverse complemented compared to --adapter2. Using
                the --pcr[12] options is not recommended!

       --adapter-list filename
                Read one or more PCR sequences from a table. The first two columns (separated by whitespace) of
                each line in the file are expected to correspond to values passed to --adapter1 and --adapter2.
                In single ended mode, only column one is required. Lines starting with '#' are ignored. When
                multiple PCR sequences or sequence pairs are specified, AdapterRemoval will try each adapter
                (pair) listed in the table, and select the best aligning adapters for each read processed.

       --barcode-list filename
                Read a table of one or two fixed-length barcodes and perform demultiplexing of single or double
                indexed reads. The table is expected to contain 2 or 3 columns, the first of which represents
                the name of a given sample, and the second and third of which represent the mate 1 and
                (optionally) the mate 2 barcode sequences:

                    $ cat barcodes.txt
                    sample_1 ATGCGGA TGAATCT
                    sample_2 ATGGATT ATAGTGA
                    sample_7 CAAAACT TCGCTGC

                Results are written to ${basename}.${sample_name}.*, using the default names for other output
                files. A setting file with statistics is written for each sample at
                ${basename}.${sample_name}.settings, as is a setting file containing the demultiplexing
                statistics, at ${basename}.settings.

                When demultiplexing is used, the barcode identified for a given read is automatically added to
                the adapter sequence, in order to ensure that overlapping reads are correctly trimmed. The
                .settings file represents this by showing the (reverse complemented) barcode sequence added to
                the --adapter1 and --adapter2 sequences, followed by an underscore (shown here for barcode pair
                ATGCGGA / TGAATCT):

                    [Adapter sequences]
                    Adapter1[0]: AGATTCA_AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
                    Adapter2[0]: TCCGCAT_AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

                Note that the sequence added to each adapter is the reverse complement of the barcode sequence
                of the other mate, as this sequence is expected to be found immediately before the adapter
                sequence.
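
                The relationship between barcodes and the adapter sequences shown in the .settings file
                can be reproduced with a short sketch. The helper names reverse_complement and
                adapters_with_barcodes are hypothetical and exist only for this illustration; the
                underscore merely mirrors how the .settings file displays the combined sequence.

```python
# Translation table for complementing nucleotides; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Return the reverse complement of a nucleotide sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def adapters_with_barcodes(adapter1, adapter2, barcode1, barcode2):
    """Prefix each adapter with the reverse complement of the other mate's barcode."""
    return (
        reverse_complement(barcode2) + "_" + adapter1,
        reverse_complement(barcode1) + "_" + adapter2,
    )
```

                For the barcode pair ATGCGGA / TGAATCT, the prefixes are AGATTCA (for adapter 1) and
                TCCGCAT (for adapter 2), matching the listing above.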

       --barcode-mm num
                The maximum number of mismatches allowed for barcodes, when counting mismatches in both the mate
                1 and mate 2 barcodes. In conjunction with the --barcode-mm-r1 and --barcode-mm-r2, this allows
                fine-grained control over the barcode comparisons. If not set, this value is set to the sum of
                --barcode-mm-r1 and --barcode-mm-r2.

                For example, to allow one mismatch in either the mate 1 or the mate 2 barcode, one might specify
                --barcode-mm 1; to allow a mismatch in the mate 1 and / or the mate 2 barcode, one might specify
                --barcode-mm 2 --barcode-mm-r1 1 --barcode-mm-r2 1, and so on.

       --barcode-mm-r1 num
                The maximum number of mismatches allowed in the mate 1 barcode; if not set, this number is equal
                to the value of --barcode-mm. This number cannot exceed the value specified for --barcode-mm.

       --barcode-mm-r2 num
                The maximum number of mismatches allowed in the mate 2 barcode; if not set, this number is equal
                to the value of --barcode-mm. This number cannot exceed the value specified for --barcode-mm.

       --demultiplex-only
                Only carry out demultiplexing, using the list of barcodes supplied using --barcode-list. Note
                that trimming and filtering options do not apply to this mode of operation.

       --output1 file
       --output2 file
       --singleton file
       --outputcollapsed file
       --outputcollapsedtruncated file
       --discarded file
       --settings file
                Instead of using the default behaviour where the program automatically generates the files
                needed, you can specify where each type of output is directed. This can be files, pipes etc.
                thus making it possible to easily zip the output on the fly. Default files are still generated
                if nothing else is specified.

                The types of output in single end mode are:

                output1 contains the trimmed reads.

                The types of output in paired end mode are:

                output1 contains trimmed mate1 reads.

                output2 contains trimmed mate2 reads.

                singleton contains all reads where the other mate in a pair is discarded.

                outputcollapsed Contains pairs that overlap and are collapsed into a single read (if --collapse
                is used). The reads are renamed with an @M_ prefix.

                outputcollapsedtruncated Contains pairs that overlap and are collapsed into a single read (if
                --collapse is used) and have further been trimmed due to Ns and/or low quality nucleotides in
                the 5' or 3' end. The reads are renamed with an @MT_ prefix.

                The types of output in both single end and paired end mode are:

                discarded contains all reads that are discarded by the program.

                settings contains information on the parameters used in the run as well as overall statistics on
                the reads after trimming such as average length.

       --seed seed
                When collapsing reads at positions where the two reads differ, and the qualities of the bases are
                identical, AdapterRemoval will select a random base. This option specifies the seed used for the
                random number generator used by AdapterRemoval. This value is also written to the settings file.
                Note that setting the seed is not reliable in multithreaded mode, since the order of operations
                is non-deterministic.

       --gzip   If set, all FASTQ files written by AdapterRemoval will be gzip compressed using the compression
                level specified using --gzip-level. The extension ".gz" is added to files for which no filename
                was given on the commandline.

       --gzip-level
                Determines the compression level used when gzip'ing FASTQ files. Must be a value in the range 0
                to 9, with 0 disabling compression and 9 being the best compression. Defaults to 6.

       --bzip2  If set, all FASTQ files written by AdapterRemoval will be bzip2 compressed using the compression
                level specified using --bzip2-level. The extension ".bz2" is added to files for which no
                filename was given on the commandline.

       --bzip2-level
                Determines the compression level used when bzip2'ing FASTQ files. Must be a value in the range 1
                to 9, with 9 being the best compression. Defaults to 9.

       --threads
                Maximum number of threads to use for current run; note that file IO is single-threaded,
                regardless of the number of threads specified.

       --version
                Output the version of the program.

       --help   Output the summary of available command-line options, including default values and/or values
                specified on the command-line.

EXAMPLE: Single end experiment

       The following command removes adapters from the file reads_1.fq, trims both Ns and low quality bases from
       the reads, and gzip compresses the resulting files. The --basename option is used to specify the prefix
       for output files.

           $ AdapterRemoval --file1 reads_1.fq --basename output_single --trimns --trimqualities --gzip

       Since --gzip and --basename are specified, the trimmed FASTQ reads are written to
       output_single.truncated.gz, the discarded FASTQ reads are written to output_single.discarded.gz, and
       settings and summary statistics are written to output_single.settings.

       Note that by default, AdapterRemoval does not require a minimum number of bases overlapping with the
       adapter sequence, before reads are trimmed. This may result in an excess of very short (1 - 3 bp) 3'
       fragments being falsely identified as adapter sequences, and trimmed. This behavior may be changed using
       the --minadapteroverlap option, which allows the specification of a minimum number of bases (excluding
       Ns) that must be aligned to carry out trimming. For example, use --minadapteroverlap 3 to require an overlap
       of at least 3 bp.

EXAMPLE: Paired end experiment.

       The following command removes adapters from paired-end reads, where the mate 1 and mate 2 reads are
       kept in files reads_1.fq and reads_2.fq, respectively. The reads are trimmed for both Ns and low quality
       bases, and overlapping reads (at least 11 nucleotides, per default) are merged (collapsed):

           $ AdapterRemoval --file1 reads_1.fq --file2 reads_2.fq --basename output_paired --trimns --trimqualities --collapse

       This command generates the files output_paired.pair1.truncated and output_paired.pair2.truncated, which
       contain trimmed pairs of reads which were not collapsed, output_paired.singleton.truncated containing
       reads where one mate was discarded, output_paired.collapsed containing merged reads, and
       output_paired.collapsed.truncated containing merged reads that have been trimmed due to the --trimns or
       --trimqualities options. Finally, the output_paired.discarded and output_paired.settings files correspond
       to those of the single-end run.

EXAMPLE: Interleaved FASTQ reads.

       AdapterRemoval is able to read and write paired-end reads stored in a single, so-called interleaved FASTQ
       file (one pair at a time, first mate 1, then mate 2). This is accomplished by specifying the location of
       the file using --file1 and *also* setting the --interleaved command-line option:

           $ AdapterRemoval --interleaved --file1 interleaved.fq --basename output_interleaved

       Other than taking just a single input file, this mode operates almost exactly like paired end trimming
       (as described above); the mode differs only in that paired reads are not written to a 'pair1' and a
       'pair2' file, but are instead written to a single, interleaved file, named 'paired'. The location of
       this file is controlled using the --output1 option. Enabling either reading or writing of interleaved
       FASTQ files, but not both, can be accomplished by specifying either the --interleaved-input or the
       --interleaved-output option, both of which are enabled by the --interleaved option.
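
       To make the interleaved layout concrete (mate 1 followed by mate 2 for each pair), the following sketch
       reads such a file back into pairs. It is an illustration only, assumes plain four-line FASTQ records,
       and the function names read_fastq and read_interleaved_pairs are hypothetical.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, qualities) from an iterable of FASTQ lines.

    Assumes every record occupies exactly four lines.
    """
    iterator = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(iterator, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        header, sequence, _separator, qualities = record
        yield header, sequence, qualities


def read_interleaved_pairs(lines):
    """Yield (mate1, mate2) tuples from interleaved FASTQ lines."""
    records = read_fastq(lines)
    for mate1 in records:
        # The record following each mate 1 read is its mate 2 read.
        mate2 = next(records, None)
        if mate2 is None:
            raise ValueError("interleaved input contains an odd number of reads")
        yield mate1, mate2
```

       Passing an open file object for interleaved.fq to read_interleaved_pairs would yield one tuple per read
       pair, in the order in which the pairs appear in the file.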

EXAMPLE: Different quality score encodings.

       By default, AdapterRemoval expects the quality scores in FASTQ reads to be Phred+33 encoded, meaning that
       the error probabilities are encoded as (char)('!' - 10 * log10(p)). Most data will be encoded using
       Phred+33, but Phred+64 and 'Solexa' encoded quality scores are also supported. These are selected by
       specifying the --qualitybase command-line option (specifying either '33', '64', or 'solexa'):

           $ AdapterRemoval --qualitybase 64 --file1 reads_q64.fq --basename phred_64_encoded

       By default, reads are written using the *same* encoding as the input. If a different encoding is desired,
       this may be accomplished using the --qualitybase-output option:

           $ AdapterRemoval --qualitybase 64 --qualitybase-output 33 --file1 reads_q64.fq --basename phred_33_encoded
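
       The Phred+33 and Phred+64 encodings differ only in the offset added to the quality score, which the
       following sketch illustrates. It is not AdapterRemoval's own code: the function names are hypothetical,
       rounding the score to the nearest integer and capping it at 41 are assumptions, and Solexa scores
       (which use a different formula) are not handled.

```python
import math


def encode_quality(error_probability, base=33, max_quality=41):
    """Encode an error probability (0 < p <= 1) as a single FASTQ quality character."""
    quality = int(round(-10 * math.log10(error_probability)))
    # Clamp to the range of scores expected by default.
    quality = max(0, min(quality, max_quality))
    return chr(base + quality)


def recode_qualities(qualities, base_in, base_out):
    """Convert a quality string between Phred offsets, e.g. from 64 to 33."""
    return "".join(chr(ord(char) - base_in + base_out) for char in qualities)
```

       An error probability of 0.001 (Phred score 30) is written as '?' in Phred+33 and as '^' in Phred+64;
       converting the Phred+64 string "hhh" (score 40) to Phred+33 gives "III".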

       Note furthermore that AdapterRemoval by default only expects quality scores in the range 0 - 41 (or -5 to
       41 in the case of Solexa encoded scores). If input data using a different maximum quality score is to be
       processed, or if the desired maximum quality score of collapsed reads is greater than 41, then this limit
       may be increased using the --qualitymax option:

           $ AdapterRemoval --qualitymax 50 --file1 reads_1.fq --file2 reads_2.fq --collapse --basename collapsed_q50

       For a detailed overview of Phred encoding schemes currently and previously in use, see e.g. the Wikipedia
       article on the subject: https://en.wikipedia.org/wiki/FASTQ_format#Encoding

EXAMPLE: Paired end reads containing multiple, distinct adapter pairs.

       It is possible to trim data that contains multiple adapter pairs, by providing a one or two-column table
       containing possible adapter combinations (for single-end and paired-end trimming, respectively; see e.g.
       examples/adapters.txt):

           $ cat adapters.txt
           AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCACCTAATCTCGTATGCCGTCTTCTGCTTG    AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
           AAACTTGCTCTGTGCCCGCTCCGTATGTCACAACAGTGCGTGTATCACCTCAATGCAGGACTCA    GATCGGGAGTAATTTGGAGGCAGTAGTTCGTCGAAACTCGGAGCGTCTTTAGCAGGAG
           CTAATTTGCCGTAGCGACGTACTTCAGCCTCCAGGAATTGGACCCTTACGCACACGCATTCATG    TACCGTGAAAGGTGCGCTTAGTGGCATATGCGTTAAGAGCTAGGTAACGGTCTGGAGG
           GTTCATACGACGACGACCAATGGCACACTTATCCGGTACTTGCGTTTCAATGCGCATGCCCCAT    TAAGAAACTCGGAGTTTGGCCTGCGAGGTAGCTTGGGTGTTATGAAGAACGGCATGCG
           CCATGCCCCGAAGATTCCTATACCCTTAAGGTCGCAATTGTTCGAGTAAGCTGTACGCGCCCAT    GTTGCATTGACCCGAAGGGCTCGATGTTTAGGGAGGTCAGAAGTTGAGCGGGTTCAAA

       This table is then specified using the --adapter-list option:

           $ AdapterRemoval --file1 reads_1.fq --file2 reads_2.fq --basename output_multi --trimns --trimqualities --collapse --adapter-list adapters.txt

       The resulting .settings file contains an overview of how frequently each adapter (pair) was used.

       Note that in the case of paired-end adapters, AdapterRemoval considers only the combinations of adapters
       specified in the table, one combination per row. For single-end trimming, only the first column of the
       table file is required, and the list may therefore take the form of a file containing one sequence per
       line.

EXAMPLE: Identifying adapter sequences from paired-ended reads

       If we did not know the adapter sequences for paired-end reads, AdapterRemoval may be used to generate a
       consensus adapter sequence based on fragments identified as belonging to the adapters through pairwise
       alignments of the reads, provided that the data set contains only a single adapter sequence (not counting
       differences in index sequences).

       In the following example, the identified adapters correspond to the default adapter sequences with a
       poly-A tail resulting from sequencing past the end of the insert + templates. It is not necessary to
       specify this tail when using the --adapter1 or --adapter2 command-line options. The characters shown
       under each of the consensus sequences represent the phred-encoded fraction of bases identical to the
       consensus base, with adapter 1 containing the index CACCTA:

           $ AdapterRemoval --identify-adapters --file1 reads_1.fq --file2 reads_2.fq

           Attemping to identify adapter sequences ...
           Processed a total of 1,000 reads in 0.0s; 129,000 reads per second on average ...
              Found 394 overlapping pairs ...
              Of which 119 contained adapter sequence(s) ...

           Printing adapter sequences, including poly-A tails:
             --adapter1:  AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
                          ||||||||||||||||||||||||||||||||||******||||||||||||||||||||||||
              Consensus:  AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCACCTAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAAAAAAAAAAA
                Quality:  55200522544444/4411330333330222222/1.1.1.1111100-00000///..+....--*-)),,+++++++**(('%%%$

               Top 5 most common 9-bp 5'-kmers:
                       1: AGATCGGAA = 96.00% (96)
                       2: AGATGGGAA =  1.00% (1)
                       3: AGCTCGGAA =  1.00% (1)
                       4: AGAGCGAAA =  1.00% (1)
                       5: AGATCGGGA =  1.00% (1)

             --adapter2:  AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
                          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
              Consensus:  AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                Quality:  525555555144141441430333303.2/22-2/-1..11111110--00000///..+....--*-),,,+++++++**(%'%%%$

               Top 5 most common 9-bp 5'-kmers:
                       1: AGATCGGAA = 100.00% (100)

       No files are generated from running the adapter identification step.

       The consensus sequences inferred are compared to those specified using the --adapter1 and --adapter2
       command-line options, or with the default values for these if no values have been given (as in this
       case). Pipes (|) indicate matches between the provided sequences and the consensus sequence, and "*"
       indicate the presence of unspecified bases (Ns).

EXAMPLE: Demultiplexing of paired end reads

       As of version 2.1, AdapterRemoval supports simultaneous demultiplexing and adapter trimming;
       demultiplexing is carried out using a simple comparison between the specified barcode sequences and the
       first N bases of the reads, corresponding to the length of the barcodes. Reads identified as containing a
       specific barcode or pair of barcodes are then trimmed using adapter sequences including these barcodes.

       Demultiplexing is enabled by creating a table of barcodes, the first column of which specifies the sample
       name (using characters [a-zA-Z0-9_]) and the second and (optional) third columns of which specify the
       mate 1 and mate 2 barcode sequences.

       For example, a table of barcodes from a double-indexed run might be as follows (see
       examples/barcodes.txt):

           $ cat barcodes.txt
           sample_1 ATGCGGA TGAATCT
           sample_2 ATGGATT ATAGTGA
           sample_7 CAAAACT TCGCTGC

       In the case of single-end reads, only the first two columns are required. AdapterRemoval is invoked with
       the --barcode-list option, specifying the path to this table:

           $ AdapterRemoval --file1 demux_1.fq --file2 demux_2.fq --basename output_demux --barcode-list barcodes.txt

       This generates a set of output files for each sample specified in the barcode table, using the basename
       (--basename) as the prefix, followed by a dot and the sample name, followed by a dot and the default name
       for a given file type. For example, the output files for sample_2 would be

           output_demux.sample_2.discarded
           output_demux.sample_2.pair1.truncated
           output_demux.sample_2.pair2.truncated
           output_demux.sample_2.settings
           output_demux.sample_2.singleton.truncated

       The settings files generated for each sample summarize the reads for that sample only; in addition, a
       basename.settings file is generated which summarizes the number and proportion of reads identified as
       belonging to each sample.

       The maximum number of mismatches allowed when comparing barcodes is controlled using the options
       --barcode-mm, --barcode-mm-r1, and --barcode-mm-r2, which specify the maximum number of mismatches
       in total, and the maximum number of mismatches for the mate 1 and mate 2 barcodes respectively. Thus, if
       mm_1(i) and mm_2(i) represent the number of mismatches observed for barcode-pair i for a given pair of
       reads, these options require that

          1. mm_1(i) <= --barcode-mm-r1
          2. mm_2(i) <= --barcode-mm-r2
          3. mm_1(i) + mm_2(i) <= --barcode-mm
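
       The three conditions above can be expressed directly in code. The sketch below is illustrative only;
       the function names are hypothetical, comparing case-insensitively is an assumption, and counting bases
       missing from a read that is shorter than the barcode as mismatches is likewise an assumption.

```python
def count_mismatches(barcode, read):
    """Count mismatches between a barcode and the first bases of a read."""
    prefix = read[:len(barcode)].upper()
    mismatches = sum(1 for a, b in zip(barcode.upper(), prefix) if a != b)
    # Assumption: bases missing from a too-short read count as mismatches.
    return mismatches + (len(barcode) - len(prefix))


def barcode_pair_matches(mm_1, mm_2, barcode_mm, barcode_mm_r1, barcode_mm_r2):
    """Check the per-mate and total mismatch limits for one barcode pair."""
    return (
        mm_1 <= barcode_mm_r1
        and mm_2 <= barcode_mm_r2
        and mm_1 + mm_2 <= barcode_mm
    )
```

       With all three limits set to 1, a single mismatch in either barcode is accepted but one mismatch in
       each is not; with a total limit of 2 and per-mate limits of 1, one mismatch in each barcode is
       accepted, while two mismatches in the same barcode are rejected.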

       As of version 2.2, AdapterRemoval can furthermore be used to demultiplex reads without carrying out other
       forms of read trimming. This is accomplished by specifying the --demultiplex-only option:

           $ AdapterRemoval --file1 demux_1.fq --file2 demux_2.fq --basename output_only_demux --barcode-list barcodes.txt --demultiplex-only

       Trimming and filtering related options do not apply to this mode ("TRIMMING SETTINGS" when viewing
       'AdapterRemoval --help'), but compression (--gzip, --bzip2), multi-threading (--threads), interleaving
       (--interleaved, etc.) and other such options may be used in conjunction with --demultiplex-only.

EXIT STATUS

       0 if everything worked as planned, a non-zero value otherwise.

REPORTING BUGS

       Report bugs to Mikkel Schubert <MikkelSch@gmail.com>.

       Your bugreport should always include:

       • The output of AdapterRemoval --version. If you are not running the latest released version you should
         specify why you believe the problem is not fixed in that version.

       • A complete example that others can run that shows the problem.

AUTHOR

       Copyright (C) 2011 Stinus Lindgreen <stinus@binf.ku.dk>.

       Parts of the manual were written by Ole Tange <tange@binf.ku.dk>.

       Parts of the manual were written by Mikkel Schubert <MikkelSch@gmail.com>.

LICENSE

       Copyright (C) 2011 Stinus Lindgreen <stinus@binf.ku.dk>.

       Copyright (C) 2014 Mikkel Schubert <MikkelSch@gmail.com>.

       This program is free software; you can redistribute it and/or modify it under the terms of the GNU
       General Public License as published by the Free Software Foundation; either version 3 of the License, or
       (at your option) any later version.

       This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even
       the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
       License for more details.

       You should have received a copy of the GNU General Public License along with this program.  If not, see
       <http://www.gnu.org/licenses/>.

SEE ALSO