Provided by: adapterremoval_2.2.2-1_amd64 

NAME
AdapterRemoval - Remove adapters from sequences in either single end or paired end experiments
SYNOPSIS
AdapterRemoval --file1 filenames [--file2 filenames] [--interleaved] [--interleaved-input]
[--interleaved-output] [--combined-output] [--basename filename] [--identify-adapters] [--trimns]
[--maxns max] [--trimqualities] [--trimwindows length] [--minquality minimum] [--collapse] [--version]
[--mm mismatchrate] [--minlength len] [--minalignmentlength len] [--qualitybase base]
[--qualitybase-output base] [--shift num] [--adapter1 sequence] [--adapter2 sequence] [--adapter-list
filename] [--barcode-list filename] [--barcode-mm num] [--barcode-mm-r1 num] [--barcode-mm-r2 num]
[--demultiplex-only] [--output1 filename] [--output2 filename] [--singleton filename] [--outputcollapsed
filename] [--outputcollapsedtruncated filename] [--discarded filename] [--settings filename] [--seed
seed] [--gzip] [--gzip-level level] [--threads num] [--version] [--help]
DESCRIPTION
AdapterRemoval reads either one FASTQ file (single ended mode) or two FASTQ files (paired ended mode). It
removes the residual adapter sequence from the reads and optionally trims Ns from the reads, and low
qualities bases using the quality string, and collapses overlapping paired ended mates into one read.
Reads are discarded if the remaining genomic part is too short, or if the read contains more than an
(user specified) amount of amigious nucleotides ('N'). These operations may be combined with simultaneous
demultiplexing. Alternatively, AdapterRemoval may attempt to reconstruct a consensus adapter sequences
from paired-ended data, in order to allow the identification of the adapter sequences originally used,
and thereby ensure proper trimming of these reads.
The reads and adapters are transformed to upper case for comparison. It is assumed that the letter 'N' is
used for an unknown nucleotide, but in case the program encounters a '.' in the sequence, they will be
treated as (and translated into) Ns. The program tries to check for invalid input and / or nonsensical
combinations of parameters but please report strange behaviour, bugs and such to MikkelSch@gmail.com
If you use this program, please cite the paper:
Schubert, Lindgreen, and Orlando (2016). AdapterRemoval v2: rapid adapter trimming, identification, and
read merging. BMC Research Notes, 12;9(1):88
http://bmcresnotes.biomedcentral.com/articles/10.1186/s13104-016-1900-2
OPTIONS
--file1 filename [...]
Read FASTQ reads from one or more files. This contains either the single ended (SE) reads or, if
paired ended, the mate 1 reads. If running in paired end mode, both --file1 and --file2 must be
set. The files may optionally be gzip or bzip2 compressed.
--file2 filename [...]
Read one or more FASTQ files containing mate 2 reads for a paired end run. If specified, --file1
must also be set. The files may optionally be gzip or bzip2 compressed.
--interleaved
Enables --interleaved-input and --interleaved-output.
--interleaved-input
If set, input is expected to be a single FASTQ file specified using --file1, in which pairs of
paired-end reads are listed one after each other (read1/1, read1/2, read2/1, read2/2, etc.).
--interleaved-ouput
If set, and AdapterRemoval is processing paired-end reads, retained pairs of reads are written
to a single FASTQ file, one pair after each other (read1/1, read1/2, read2/1, read2/2, etc.). By
default, this file is named basename.paired.truncated, but this may be changed using the
--output1 option.
--combined-output
If set, all reads are written to the same file(s), specified by --output1 and --output2. Each
read is further marked by either a "PASSED" or a "FAILED" flag, and any read that has been
FAILED (including the mate for collapsed reads) are replaced with a single 'N' with Phred score
0. This option can be combined with --interleaved / --interleaved-output to write all reads to a
single output file specified with --output1.
--basename filename
Determines the default filename for output files, unless overridden using the specific output
file settings. For single-ended mode, the following filenames are used: basename.truncated,
basename.discarded, and basename.settings. In paired end mode, the following filenames are used:
basename.pair1.truncated, basename.pair2.truncated, basename.singleton.truncated,
basename.discarded, and basename.settings. If collapsing of reads is enabled for paired ended
mode, the following filenames are also used: basename.collapsed, and
basename.collapsed.truncated. The default basename is your_output. If gzip compression is
enabled, the extension ".gz" is added to all files but the filename.settings file, while the
extension ".bz2" is used if bzip2 compression is enabled.
--identify-adapters
For paired ended reads only. In this mode, AdapterRemoval will attempt to reconstruct the
adapter sequences used for a set of paired ended reads, by locating fully overlapping read-
pairs, and generating a consensus sequence from the bases identified as adapter sequence. The
minimum overlap is controlled by minalignmentlength. The values passed to the --adapter1 and
--adapter2 command-line options are used for visual comparison with the consensus sequence, but
otherwise not used in the consensus building.
--trimns Remove stretches of Ns from the output reads in both the 5' and 3' end. If quality trimming is
also enabled, stretches of mixed low-quality bases and/or Ns are trimmed.
--maxns max
If a read has more than max Ns after trimming, it is discarded (default is not to use).
--trimqualities
Remove consecutive stretches of low quality bases (threshold set by minquality) from both the 5'
and 3' end of the reads. All bases with minquality or lower are trimmed. If trimming of Ns is
also enabled, stretches of mixed low-quality bases and/or Ns are trimmed.
--trimwindows length
Remove low quality bases using a sliding window bases approach inspired by sickle:
1. The new 5' is determined by locating the first window where both the average quality and the
quality of the first base in the window is greater than minquality.
2. The new 3' is located by sliding the first window right, until the average quality becomes
less than or equal to minquality. The new 3' is placed at the last base in that window where the
quality is greater than or equal to minquality.
3. If no 5' position could be determined, the read is discarded.
The value of length may be a number greater than or equal to 1, in which case that number
(rounded down to the nearest whole number) is used as the window length, or it may be a value
greater than or equal to zero. In the latter case, that number is multipled by the lenght of
each read, to determine the window length. For example, a trimwindow value of 0.1 and a read
length of 100 would result in 10 bp windows. If the resulting window length is zero or is
greater than the current read length, then the read length is used instead.
--minquality minimum
Set the threshold for trimming low quality bases. Default is 2. The minimum can be set with or
without the Phred quality base.
--collapse
In paired-end mode, if the two mates overlap, collapse the two reads into one read by merging
the two and recalculating the quality scores. In single-end mode, this instead attempts to
identify templates for which the entire sequence is available. In both cases, complete
"collapsed" reads are written with a 'M_' name prefix, and "collapsed" reads which are trimmed
due to quality settings are written with a 'MT_' name prefix. The overlap needs to be at least
minalignmentlength nucleotides, with a maximum number of mismatches determined by mm.
--mm mismatchrate
The allowed fraction of mismatches allowed in the aligned region. If 0 < mismatchrate < 1, the
rate is used directly. If mismatchrate > 1, the rate is set to 1/mismatchrate. The default
setting is 3, corresponding to a maximum mismatch rate of 1/3.
--minlength len
The minimum length required after trimming and adapter removal. Reads shorter than len are
discarded. Default is 15 nucleotides.
--minalignmentlength len
The minimum overlap between mate 1 and mate 2 before the reads are collapsed into one, when
collapsing paired end reads, or when attempting to identify complete template sequences in
single-end mode. Default is 11 nucleotides.
--qualitybase base
The base of the quality score - either '64' for Phred+Phred (i.e., Illumina 1.3+ and 1.5+) or
'33' for Phred+33 (Illumina 1.8+). In addition, the value 'solexa' may be used to specify reads
with Solexa encoded scores. Default is 33.
--qualitybase-output base
The base of the quality score for reads written by AdapterRemoval - either '64' for Phred+Phred
(i.e., Illumina 1.3+ and 1.5+) or '33' for Phred+33 (Illumina 1.8+). In addition, the value
'solexa' may be used to specify reads with Solexa encoded scores. However, note that quality
scores are represented using PHRED scores internally, and conversion to and from Solexa scores
therefore result in a loss of information. The default corresponds to the value given for
--qualitybase.
--shift num
To allow for missing bases in the 5' end of the read, the program can let the alignment slip num
bases in the 5' end. This corresponds to starting the alignment maximum num nucleotides in read2
(for paired end) or the adapter (for single end). The default shift valule is 2.
--adapter1 sequence
--adapter2 sequence
Specify the adapter sequences that you wish to trim. The Adapter #2 sequence is only used when
trimming paired-ended data.
The Adapter #1 and Adapter #2 sequences are expected to be found in the mate 1 and the mate 2
reads respectively, while ignoring any difference in case and treating Ns as wildcards. The
default sequences are
Adapter #1: AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
Adapter #2: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
Assuming these were the adapters used to generate our data, we should therefore see these in the
FASTQ files:
$ grep -i "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC......ATCTCGTATGCCGTCTTCTGCTTG" file1.fq
B<AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGAATCTCGTATGCCGTCTTCTGCTTG>AAAAAAAAACAAGAAT
CTGGAGTTCB<AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGAATCTCGTATGCCGTCTTCTGCTTG>AAAAAAA
GGB<AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGAATCTCGTATGCCGTCTTCTGCTTG>CAAATTGAAAACAC
...
$ grep -i "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT" file2.fq
CB<AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT>CAAAAAAAGAAAAACATCTTG
GAACTCCAGB<AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT>CAAAAAAAATAGA
GAACTB<AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT>CAAAAACATAAGACCTA
...
Note that --adapter1 and --adapter2 replaces the --pcr[12] options of AdapterRemoval v1.x, for
which the --pcr2 sequence was expected to be reverse complemented compared --adaper2. Using the
--pcr[12] options is not recommended!
--adapter-list filename
Read one or more PCR sequences from a table. The first two columns (separated by whitespace) of
each line in the file are expected to correspond to values passed to --adapter1 and --adapter2.
In single ended mode, only column one is required. Lines starting with '#' are ignored. When
multiple PCR sequences or sequence pairs are specified, AdapterRemoval will try each adapter
(pair) listed in the table, and select the best aligning adapters for each read processed.
--barcode-list filename
Read a table of one or two fixed-length barcodes and perform demultiplexing of single or double
indexed reads. The table is expected to contain 2 or 3 columns, the first of which represent the
name of a given sample, and the second and third of which represent the mate 1 and (optionally)
the mate 2 barcode sequence:
$ cat barcodes.txt
sample_1 ATGCGGA TGAATCT
sample_2 ATGGATT ATAGTGA
sample_7 CAAAACT TCGCTGC
Results are written to ${basename}.${sample_name}.*, using the default names for other output
files. A setting file with statistics is written for each sample at
${basename}.${sample_name}.settings, as is a setting file containing the demultiplexing
statistics, at ${basename}.settings.
When demultiplexing is used, the barcode identified for a given read is automatically added to
the adapter sequence, in order to ensure that overlapping reads are correctly trimmed. The
.settings file represents this by showing the reverse complemented) barcode sequence added to
the --adapter1 and --adapter2 sequences, followed by an underscore (shown here for barcodes pair
ATGCGGA / TGAATCT):
[Adapter sequences]
Adapter1[0]: AGATTCA_AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
Adapter2[0]: TCCGCAT_AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
Note that the sequence added to each adapter is the reverse complement of the barcode sequence
of the other mate, as this sequence is expected to be found immediately before the adapter
sequence.
--barcode-mm num
The maximum number of mismatches allowed for barcodes, when counting mismatches in both the mate
1 and mate 2 barcodes. In conjunction with the --barcode-mm-r1 and --barcode-mm-r2, this allows
fine-grained control over the barcode comparisons. If not set, this value is set to the sum of
--barcode-mm-r1 and --barcode-mm-r2.
For example, to allow one mismatch in either the mate 1 or the mate 2 barcode, one might specify
--barcode-mm 1; to allow a mismatch in the mate 1 and / or the mate 2 barcode, one might specify
--barcode-mm 2 --barcode-mm-r1 1 --barcode-mm-r2 1, and so on.
--barcode-mm-r1 num
The maximum number of mismatches allowed in the mate 1 barcode; if not set, this number is equal
to the value of --barcode-mm. This number cannot exceed the value specified for --barcode-mm.
--barcode-mm-r2 num
The maximum number of mismatches allowed in the mate 1 barcode; if not set, this number is equal
to the value of --barcode-mm. This number cannot exceed the value specified for --barcode-mm.
--demultiplex-only num
Only carry out demultiplexing, using the list of barcodes supplied using --barcode-list. Note
that trimming and filtering options do not apply to this mode of operation.
--output1 file
--output2 file
--singleton file
--outputcollapsed file
--outputcollapsedtruncated file
--discarded file
--settings file
Instead of using the default behaviour where the program automatically generates the files
needed, you can specify where each type of output is directed. This can be files, pipes etc.
thus making it possible to easily zip the output on the fly. Default files are still generated
if nothing else is specified.
The types of output in single end mode are:
output1 contains the trimmed reads.
The types of output in paired end mode are:
output1 contains trimmed mate1 reads.
output2 contains trimmed mate2 reads.
singleton contains all reads where the other mate in a pair is discarded.
outputcollapsed Contains pairs that overlap and are collapsed into a single read (if --collapse
is used). The reads are renamed with an @M_ prefix.
outputcollapsedtruncated Contains pairs that overlap and are collapsed into a single read (if
--collapse is used) and have further been trimmed due to Ns and/or low quality nucleotides in
the 5' or 3' end. The reads are renamed with an @MT_ prefix.
The types of output in both single end and paired end mode are:
discarded contains all reads that are discarded by the program.
settings contains information on the parameters used in the run as well as overall statistics on
the reads after trimming such as average length.
--seed seed
When collaping reads at positions where the two reads differ, and the quality of the bases are
identical, AdapterRemoval will select a random base. This option specifies the seed used for the
random number generator used by AdapterRemoval. This value is also written to the settings file.
Note that setting the seed is not reliable in multithreaded mode, since the order of operations
is non-deterministic.
--gzip If set, all FASTQ files written by AdapterRemoval will be gzip compressed using the compression
level specified using --gzip-level. The extension ".gz" is added to files for which no filename
was given on the commandline.
--gzip-level
Determines the compression level used when gzip'ing FASTQ files. Must be a value in the range 0
to 9, with 0 disabling compression and 9 being the best compression. Defaults to 6.
--bzip2 If set, all FASTQ files written by AdapterRemoval will be bzip2 compressed using the compression
level specified using --bzip2-level. The extension ".bz2" is added to files for which no
filename was given on the commandline.
--bzip2-level
Determines the compression level used when bzip2'ing FASTQ files. Must be a value in the range 1
to 9, with 9 being the best compression. Defaults to 9.
--threads
Maximum number of threads to use for current run; note that file IO is single-threaded,
regardless of the number of threads specified.
--version
Output the version of the program.
--help Output the summary of available command-line options, including default values and/or values
specified on the command-line.
EXAMPLE: Single end experiment
The following command removes adapters from the file reads_1.fq trims both Ns and low quality bases from
the reads, and gzip compresses the resulting files. The --basename option is used to specify the prefix
for output files.
$ AdapterRemoval --file1 reads_1.fq --basename output_single --trimns --trimqualities --gzip
Since --gzip and --basename is specified, the trimmed FASTQ reads are written to
output_single.truncated.gz, the dicarded FASTQ reads are written to output_single.discarded.gz, and
settings and summary statistics are written to output_single.settings.
Note that by default, AdapterRemoval does not require a minimum number of bases overlapping with the
adapter sequence, before reads are trimmed. This may result in an excess of very short (1 - 3 bp) 3'
fragments being falsely identified as adapter sequences, and trimmed. This behavior may be changed using
the --minadapteroverlap option, which allows the specification of a minimum number of bases (excluding
Ns) that must be aligned to carry trimming. For example, use --minadapteroverlap 3 to require an overlap
of at least 3 bp.
EXAMPLE: Paired end experiment.
The following command removes adapters from a paired-end reads, where the mate 1 and mate 2 reads are
kept in files reads_1.fq and reads_2.fq, respectively. The reads are trimmed for both Ns and low quality
bases, and overlapping reads (at least 11 nucleotides, per default) are merged (collapsed):
$ AdapterRemoval --file1 reads_1.fq --file2 reads_2.fq --basename output_paired --trimns --trimqualities --collapse
This command generates the files output_paired.pair1.truncated and output_paired.pair2.truncated, which
contain trimmed pairs of reads which were not collapsed, output_paired.singleton.truncated containing
reads where one mate was discarded, output_paired.collapsed containing merged reads, and
output_paired.collapsed.truncated containing merged reads that have been trimmed due to the --trimns or
--trimqualities options. Finally, the output_paired.discarded and output_paired.settings files correspond
to those of the single-end run.
EXAMPLE: Interleaved FASTQ reads.
AdapterRemoval is able to read and write paired-end reads stored in a single, so-called interleaved FASTQ
file (one pair at a time, first mate 1, then mate 2). This is accomplished by specifying the location of
the file using --file1 and *also* setting the --interleaved command-line option:
$ AdapterRemoval --interleaved --file1 interleaved.fq --basename output_interleaved
Other than taking just a single input file, this mode operates almost exactly like paired end trimming
(as described above); the mode differs only in that paired reads are not written to a 'pair1' and a
'pair2' file, but instead these are instead written to a single, interleaved file, named 'paired'. The
location of this file is controlled using the --output1 option. Enabling either reading or writing of
interleaved FASTQ files, both not both, can be accomplished by specifying the either of the
--interleaved-input and --interleaved-output options, both of which are enabled by the --interleaved
option.
EXAMPLE: Different quality score encodings.
By default, AdapterRemoval expects the quality scores in FASTQ reads to be Phred+33 encoded, meaning that
the error probabilities are encoded as (char)('!' - 10 * log10(p)). Most data will be encoded using
Phred+33, but Phred+64 and 'Solexa' encoded quality scores are also supported. These are selected by
specifying the --qualitybase command-line option (specifying either '33', '64', or 'solexa')::
$ AdapterRemoval --qualitybase 64 --file1 reads_q64.fq --basename phred_64_encoded
By default, reads are written using the *same* encoding as the input. If a different encoding is desired,
this may be accomplished using the --qualitybase-output option:
$ AdapterRemoval --qualitybase 64 --qualitybase-output 33 --file1 reads_q64.fq --basename phred_33_encoded
Note furthermore that AdapterRemoval by default only expects quality scores in the range 0 - 41 (or -5 to
41 in the case of Solexa encoded scores). If input data using a different maximum quality score is to be
processed, or if the desired maximum quality score of collapsed reads is greater than 41, then this limit
may be increased using the --qualitymax option:
$ AdapterRemoval --qualitymax 50 --file1 reads_1.fq --file2 reads_2.fq --collapsed --basename collapsed_q50
For a detailed overview of Phred encoding schemes currently and previously in use, see e.g. the Wikipedia
article on the subject: https://en.wikipedia.org/wiki/FASTQ_format#Encoding
EXAMPLE: Paired end reads containing multiple, distinct adapter pairs.
It is possible to trim data that contains multiple adapter pairs, by providing a one or two-column table
containing possible adapter combinations (for single-end and paired-end trimming, respectively; see e.g.
examples/adapters.txt):
$ cat adapters.txt
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCACCTAATCTCGTATGCCGTCTTCTGCTTG AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
AAACTTGCTCTGTGCCCGCTCCGTATGTCACAACAGTGCGTGTATCACCTCAATGCAGGACTCA GATCGGGAGTAATTTGGAGGCAGTAGTTCGTCGAAACTCGGAGCGTCTTTAGCAGGAG
CTAATTTGCCGTAGCGACGTACTTCAGCCTCCAGGAATTGGACCCTTACGCACACGCATTCATG TACCGTGAAAGGTGCGCTTAGTGGCATATGCGTTAAGAGCTAGGTAACGGTCTGGAGG
GTTCATACGACGACGACCAATGGCACACTTATCCGGTACTTGCGTTTCAATGCGCATGCCCCAT TAAGAAACTCGGAGTTTGGCCTGCGAGGTAGCTTGGGTGTTATGAAGAACGGCATGCG
CCATGCCCCGAAGATTCCTATACCCTTAAGGTCGCAATTGTTCGAGTAAGCTGTACGCGCCCAT GTTGCATTGACCCGAAGGGCTCGATGTTTAGGGAGGTCAGAAGTTGAGCGGGTTCAAA
This table is then specified using the --adapter-list option:
$ AdapterRemoval --file1 reads_1.fq --file2 reads_2.fq --basename output_multi --trimns --trimqualities --collapse --adapter-list adapters.txt
The resulting .summary file contains an overview of how frequently each adapter (pair) was used.
Note that in the case of paired-end adapters, AdapterRemoval considers only the combinations of adapters
specified in the table, one combination per row. For single-end trimming, only the first column of the
table file is required, and the list may therefore take the form of a file containing one sequence per
line.
EXAMPLE: Identifying adapter sequences from paired-ended reads
If we did not know the adapter sequences for paired-end reads, AdapterRemoval may be used to generate a
consensus adapter sequence based on fragments identified as belonging to the adapters through pairwise
alignments of the reads, provided that the data set contains only a single adpater sequence (not counting
differences in index sequences).
In the following example, the identified adapters corresponds to the default adapter sequences with a
poly-A tail resulting from sequencing past the end of the insert + templates. It is not necessary to
specify this tail when using the --adapter1 or --adapter2 command-line options. The characters shown
under each of the consensus sequences represented the phred-encoded fraction of bases identical to the
consensus base, with adapter 1 containing the index CACCTA:
$ AdapterRemoval --identify-adapters --file1 reads_1.fq --file2 reads_2.fq
Attemping to identify adapter sequences ...
Processed a total of 1,000 reads in 0.0s; 129,000 reads per second on average ...
Found 394 overlapping pairs ...
Of which 119 contained adapter sequence(s) ...
Printing adapter sequences, including poly-A tails:
--adapter1: AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
||||||||||||||||||||||||||||||||||******||||||||||||||||||||||||
Consensus: AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCACCTAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAAAAAAAAAAA
Quality: 55200522544444/4411330333330222222/1.1.1.1111100-00000///..+....--*-)),,+++++++**(('%%%$
Top 5 most common 9-bp 5'-kmers:
1: AGATCGGAA = 96.00% (96)
2: AGATGGGAA = 1.00% (1)
3: AGCTCGGAA = 1.00% (1)
4: AGAGCGAAA = 1.00% (1)
5: AGATCGGGA = 1.00% (1)
--adapter2: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Consensus: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Quality: 525555555144141441430333303.2/22-2/-1..11111110--00000///..+....--*-),,,+++++++**(%'%%%$
Top 5 most common 9-bp 5'-kmers:
1: AGATCGGAA = 100.00% (100)
No files are generated from running the adapter identification step.
The consensus sequences inferred are compared to those specified using the --adapter1 and --adapter2
command-line options, or with the default values for these if no values have been given (as in this
case). Pipes (|) indicate matches between the provided sequences and the consensus sequence, and "*"
indicate the presence of unspecified bases (Ns).
EXAMPLE: Demultiplexing of paired end reads
As of version 2.1, AdapterRemoval supports simultanious demultiplexing and adapter trimming;
demultiplexing is carried out using a simple comparison between the specified barcode sequences and the
first N bases of the reads, corresponding to the length of the barcodes. Reads identified as containing a
specific barcode or pair of barcodes are then trimmed using adapter sequences including these barcodes.
Demultiplexing is enabled by creating a table of barcodes, the first column of which species the sample
name (using characters [a-zA-Z0-9_]) and the second and (optional) third columns specifies the mate 1 and
mate 2 barcode sequences.
For example, a table of barcodes from a double-indexed run might be as follows (see
examples/barcodes.txt):
$ cat barcodes.txt
sample_1 ATGCGGA TGAATCT
sample_2 ATGGATT ATAGTGA
sample_7 CAAAACT TCGCTGC
In the case of single-read reads, only the first two columns are required. AdapterRemoval is invoked with
the --barcode-list option, specifying the path to this table:
$ AdapterRemoval --file1 demux_1.fq --file2 demux_2.fq --basename output_dumux --barcode-list barcodes.txt
This generates a set of output files for each sample specified in the barcode table, using the basename
(--basename) as the prefix, followed by a dot and the sample name, followed by a dot and the default name
for a given file type. For example, the output files for sample_2 would be
output_demux.sample_2.discarded
output_demux.sample_2.pair1.truncated
output_demux.sample_2.pair2.truncated
output_demux.sample_2.settings
output_demux.sample_2.singleton.truncated
The settings files generated for each sample summarizes the reads for that sample only; in addition, a
basename.settings file is generated which summarizes the number and proportion of reads identified as
belonging to each sample.
The maximum number of mismatches allowed when comparing barocdes is controlled using the options
--barcode-mmI, --barcode-mm-r1, and --barcode-mm-r2, which specify the maximum number of mismatches
total, and the maximum number of mismatches for the mate 1 and mate 2 barcodes respectively. Thus, if
mm_1(i) and mm_2(i) represents the number of mismatches observed for barcode-pair i for a given pair of
reads, these options require that
1. mm_1(i) <= --barcode-mm-r1
2. mm_2(i) <= --barcode-mm-r2
3. mm_1(i) + mm_2(i) <= --barcode-mm
As of version 2.2, AdapterRemoval can furthermore be used to demultiplex reads without carrying out other
forms of read trimming. This is accomplished by specifying the --demultiplex-only option:
$ AdapterRemoval --file1 demux_1.fq --file2 demux_2.fq --basename output_only_demux --barcode-list barcodes.txt --demultiplex-only
Trimming and filtering related options to not apply to this mode ("TRIMMING SETTINGS" when viewing
'AdapterRemoval --help'), but compression (--gzip, --bzip2), multi-threading (--threads), interleaving
(--interleaved, etc.) and other such options may be used in conjunction with --demultiplex-only.
EXIT STATUS
0 if everything worked as planned, a non-zero value otherwise.
REPORTING BUGS
Report bugs to Mikkel Schubert <MikkelSch@gmail.com>.
Your bugreport should always include:
• The output of AdapterRemoval --version. If you are not running the latest released version you should
specify why you believe the problem is not fixed in that version.
• A complete example that others can run that shows the problem.
AUTHOR
Copyright (C) 2011 Stinus Lindgreen <stinus@binf.ku.dk>.
Parts of the manual was written by Ole Tange <tange@binf.ku.dk>.
Parts of the manual was written by Mikkel Schubert <MikkelSch@gmail.com>.
LICENSE
Copyright (C) 2011 Stinus Lindgreen <stinus@binf.ku.dk>.
Copyright (C) 2014 Mikkel Schubert <MikkelSch@gmail.com>.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU
General Public License as published by the Free Software Foundation; either version 3 of the License, or
at your option any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even
the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see
<http://www.gnu.org/licenses/>.
SEE ALSO
perl v5.24.1 2017-07-17 ADAPTERREMOVAL(1)