Ubuntu Manpage: cutadapt - remove adapter sequences from high-throughput sequencing reads

NAME

       cutadapt - remove adapter sequences from high-throughput sequencing reads

SYNOPSIS

              cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq

       For paired-end reads:

              cutadapt  -a  ADAPT1  -A  ADAPT2  [options]  -o  out1.fastq -p out2.fastq in1.fastq
              in2.fastq

DESCRIPTION

       Replace "ADAPTER" with the actual sequence of your 3' adapter. IUPAC  wildcard  characters
       are  supported.  The  reverse  complement  is *not* automatically searched. All reads from
       input.fastq will be written to output.fastq with the  adapter  sequence  removed.  Adapter
       matching  is  error-tolerant.  Multiple  adapter  sequences  can  be given (use further -a
       options), but only the best-matching adapter will be removed.

       Input may also be in FASTA format. Compressed input and output are supported and
       auto-detected  from  the  file  name  (.gz, .xz, .bz2). Use the file name '-' for standard
       input/output. Without the -o option, output is sent to standard output.

OPTIONS

       Use "cutadapt --help" to show all command-line options.

       --version
              show program's version number and exit

       -h, --help
              show this help message and exit

       --debug
              Print debugging information.

       -f FORMAT, --format=FORMAT
              Input file format; can be either 'fasta',  'fastq'  or  'sra-fastq'.  Ignored  when
              reading csfasta/qual files.  Default: auto-detect from file name extension.

              Finding adapters:

              Parameters  -a,  -g,  -b specify adapters to be removed from each read (or from the
              first read in a pair if data is paired). If specified multiple times, only the best
              matching adapter is trimmed (but see the --times option). When the special notation
              'file:FILE' is used, adapter sequences are read from the given FASTA file.

       -a ADAPTER, --adapter=ADAPTER
              Sequence of an adapter ligated to the 3' end (paired data: of the first read).  The
              adapter  and  subsequent  bases  are  trimmed.  If  a  '$'  character  is  appended
              ('anchoring'), the adapter is only found if it is a suffix of the read.

       -g ADAPTER, --front=ADAPTER
              Sequence of an adapter ligated to the 5' end (paired data: of the first read).  The
              adapter  and  any  preceding  bases  are trimmed. Partial matches at the 5' end are
              allowed. If a '^' character is prepended ('anchoring'), the adapter is  only  found
              if it is a prefix of the read.

       -b ADAPTER, --anywhere=ADAPTER
              Sequence of an adapter that may be ligated to the 5' or 3' end (paired data: of the
              first read). Both types of matches as described under -a and -g are allowed. If
              the  first  base  of  the  read  is  part of the match, the behavior is as with -g,
              otherwise  as  with  -a.  This  option  is  mostly  for  rescuing  failed   library
              preparations - do not use if you know which end your adapter was ligated to!

       -e ERROR_RATE, --error-rate=ERROR_RATE
              Maximum  allowed  error  rate  (no. of errors divided by the length of the matching
              region). Default: 0.1
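
              The arithmetic behind the error rate can be sketched as follows. This is an
              illustration only, not cutadapt's code; the function name is hypothetical, and
              rounding the product down to a whole number of errors is an assumption that
              this page does not state.

```python
def max_allowed_errors(match_length, error_rate=0.1):
    """Errors tolerated for a matching region of the given length.

    Assumption: the product is rounded down to a whole number of errors.
    """
    return int(match_length * error_rate)


# With the default rate of 0.1, short matches tolerate no errors at all.
for length in (9, 10, 20):
    print(length, max_allowed_errors(length))
```

              Under this assumption, a short partial match at the end of a read must be
              exact, while a full-length adapter match tolerates more errors.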

       --no-indels
              Allow only mismatches in alignments. Default: allow both mismatches and indels

       -n COUNT, --times=COUNT
              Remove up to COUNT adapters from each read. Default: 1

       -O MINLENGTH, --overlap=MINLENGTH
              If the overlap between the read and the adapter is shorter than MINLENGTH, the read
              is  not  modified.  Reduces the no. of bases trimmed due to random adapter matches.
              Default: 3
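
              A minimal sketch of how a minimum overlap suppresses short random matches
              when trimming a 3' adapter. It uses exact string matching only (no
              mismatches, indels or wildcards), so it is a simplification and not
              cutadapt's alignment algorithm; the function name is hypothetical.

```python
def trim_3prime(read, adapter, min_overlap=3):
    """Remove a 3' adapter (exact matches only) and everything after it."""
    # Case 1: the full adapter occurs somewhere in the read.
    pos = read.find(adapter)
    if pos != -1 and len(adapter) >= min_overlap:
        return read[:pos]
    # Case 2: the read ends with a prefix of the adapter (partial match).
    # Try the longest prefix first and stop at the minimum overlap.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    return read


adapter = "AAGGCTT"
print(trim_3prime("ACGTACGTAAGGC", adapter))     # 5-base overlap: trimmed
print(trim_3prime("ACGTACGTAA", adapter))        # 2-base overlap: kept as is
print(trim_3prime("ACGTACGTAA", adapter, 2))     # allowed when MINLENGTH is 2
```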

       --match-read-wildcards
              Interpret IUPAC wildcards in reads. Default: False

       -N, --no-match-adapter-wildcards
              Do not interpret IUPAC wildcards in adapters.

       --no-trim
              Match and redirect reads to output/untrimmed-output as usual,  but  do  not  remove
              adapters.

       --mask-adapter
              Mask adapters with 'N' characters instead of trimming them.

              Additional read modifications:

       -u LENGTH, --cut=LENGTH
              Remove  bases  from  each  read (first read only if paired). If LENGTH is positive,
              remove bases from the beginning. If LENGTH is negative, remove bases from the  end.
              Can be used twice if LENGTHs have different signs.
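
              The sign convention for LENGTH can be sketched with plain string slicing.
              The function name is hypothetical and the code only illustrates the rule
              described above.

```python
def cut(sequence, length):
    """Remove bases from the start (length > 0) or the end (length < 0)."""
    if length > 0:
        return sequence[length:]
    if length < 0:
        return sequence[:length]
    return sequence


read = "ACGTACGT"
# Using the option twice with different signs trims both ends.
print(cut(cut(read, 3), -2))
```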

       -q [5'CUTOFF,]3'CUTOFF, --quality-cutoff=[5'CUTOFF,]3'CUTOFF
              Trim  low-quality bases from 5' and/or 3' ends of each read before adapter removal.
              Applied to both reads if data is paired. If one value is given, only the 3' end  is
              trimmed.  If  two comma-separated cutoffs are given, the 5' end is trimmed with the
              first cutoff, the 3' end with the second.
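
              This page does not describe the trimming algorithm itself. One common
              approach, shown here as a sketch under that caveat, is the running-sum
              method known from BWA: subtract the cutoff from every quality, sum the
              results from the 3' end, and cut where the sum is smallest. The function
              name is hypothetical.

```python
def quality_trim_index(qualities, cutoff):
    """Return the index at which to cut the 3' end (keep qualities[:index])."""
    running = 0
    lowest = 0
    cut_at = len(qualities)
    # Walk from the 3' end toward the 5' end.
    for i in range(len(qualities) - 1, -1, -1):
        running += qualities[i] - cutoff
        if running > 0:
            # The sum turned positive: good bases outweigh the bad ones from here on.
            break
        if running < lowest:
            lowest = running
            cut_at = i
    return cut_at


quals = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
keep = quality_trim_index(quals, 10)
print(quals[:keep])
```

              Because the sum is taken over a run of bases, a single acceptable base
              surrounded by poor ones (the 11 above) does not stop the trimming.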

       --nextseq-trim=3'CUTOFF
              NextSeq-specific quality trimming (each read). Also trims dark cycles appearing as
              high-quality G bases (EXPERIMENTAL).

       --quality-base=QUALITY_BASE
              Assume  that  quality values in FASTQ are encoded as ascii(quality + QUALITY_BASE).
              This needs to be set to 64 for some old Illumina FASTQ files. Default: 33
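
              The encoding formula above can be inverted to decode a FASTQ quality
              string. The function name is hypothetical; the sketch only restates
              ascii(quality + QUALITY_BASE).

```python
def decode_qualities(quality_string, quality_base=33):
    """Convert a FASTQ quality string to a list of integer quality values."""
    return [ord(char) - quality_base for char in quality_string]


# The same quality value is written with different characters per encoding.
print(decode_qualities("I", quality_base=33))
print(decode_qualities("h", quality_base=64))
```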

       --trim-n
              Trim N's on ends of reads.

       -x PREFIX, --prefix=PREFIX
              Add this prefix to read names. Use {name}  to  insert  the  name  of  the  matching
              adapter.

       -y SUFFIX, --suffix=SUFFIX
              Add this suffix to read names; can also include {name}

       --strip-suffix=STRIP_SUFFIX
              Remove this suffix from read names if present. Can be given multiple times.

       --length-tag=TAG
              Search  for  TAG followed by a decimal number in the description field of the read.
              Replace the decimal number with the  correct  length  of  the  trimmed  read.   For
              example, use --length-tag 'length=' to correct fields like 'length=123'.
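
              The replacement described above can be sketched with a regular
              expression. The function name and the example header are hypothetical,
              and this is not cutadapt's own code.

```python
import re


def fix_length_tag(header, tag, new_length):
    """Replace the decimal number that follows `tag` with `new_length`."""
    pattern = re.compile("(" + re.escape(tag) + r")\d+")
    # A function is used as the replacement so the digits of new_length
    # cannot be misread as a group reference.
    return pattern.sub(lambda m: m.group(1) + str(new_length), header, count=1)


print(fix_length_tag("read1 length=123", "length=", 57))
```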

              Filtering of processed reads:

       --discard-trimmed, --discard
              Discard  reads  that  contain  an adapter. Also use -O to avoid discarding too many
              randomly matching reads!

       --discard-untrimmed, --trimmed-only
              Discard reads that do not contain the adapter.

       -m LENGTH, --minimum-length=LENGTH
              Discard trimmed reads that are shorter than LENGTH.  Reads that are too short  even
              before  adapter removal are also discarded. In colorspace, an initial primer is not
              counted. Default: 0

       -M LENGTH, --maximum-length=LENGTH
              Discard trimmed reads that are longer than LENGTH.  Reads that are  too  long  even
              before  adapter removal are also discarded. In colorspace, an initial primer is not
              counted. Default: no limit

       --max-n=COUNT
              Discard reads with too many N bases. If COUNT is an integer, it is treated  as  the
              absolute  number  of  N  bases.  If  it  is  between  0 and 1, it is treated as the
              proportion of N's allowed in a read.
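
              A sketch of the two interpretations of COUNT. The function name is
              hypothetical, and treating exactly 1 as an absolute count rather than a
              proportion is an assumption, since the text above leaves that boundary
              open.

```python
def has_too_many_n(sequence, count):
    """Return True if the read would be discarded under the given COUNT."""
    n_bases = sequence.upper().count("N")
    if 0 <= count < 1:
        # Values below 1 are read as the allowed proportion of N bases.
        if not sequence:
            return False
        return n_bases / len(sequence) > count
    # Otherwise COUNT is the allowed absolute number of N bases.
    return n_bases > count


print(has_too_many_n("ACGNN", 1))     # absolute: two N bases, one allowed
print(has_too_many_n("ACGNN", 0.5))   # proportion: 2 of 5 bases are N
```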

              Output:

       --quiet
              Print only error messages.

       -o FILE, --output=FILE
              Write trimmed reads to FILE. FASTQ or FASTA format is chosen  depending  on  input.
              The  summary report is sent to standard output. Use '{name}' in FILE to demultiplex
              reads into multiple files. Default: write to standard output

       --info-file=FILE
              Write information about each read and  its  adapter  matches  into  FILE.  See  the
              documentation for the file format.

       -r FILE, --rest-file=FILE
              When  the  adapter  matches  in  the  middle  of  a read, write the rest (after the
              adapter) into FILE.

       --wildcard-file=FILE
              When the adapter has N bases (wildcards), write  adapter  bases  matching  wildcard
              positions  to FILE.  When there are indels in the alignment, this will often not be
              accurate.

       --too-short-output=FILE
              Write reads that are too short (according to  length  specified  by  -m)  to  FILE.
              Default: discard reads

       --too-long-output=FILE
              Write  reads  that  are  too  long  (according  to length specified by -M) to FILE.
              Default: discard reads

       --untrimmed-output=FILE
              Write reads that do not contain the adapter to FILE.  Default: output to same  file
              as trimmed reads

              Colorspace options:

       -c, --colorspace
              Enable colorspace mode: Also trim the color that is adjacent to the found adapter.

       -d, --double-encode
              Double-encode colors (map 0,1,2,3,4 to A,C,G,T,N).
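
              The mapping can be sketched with a translation table. The function name
              is hypothetical; characters other than the five colors are left
              unchanged in this sketch.

```python
# Map the colors 0,1,2,3,4 to the letters A,C,G,T,N.
COLOR_TO_BASE = str.maketrans("01234", "ACGTN")


def double_encode(colors):
    """Rewrite a colorspace string using nucleotide letters."""
    return colors.translate(COLOR_TO_BASE)


print(double_encode("0123412"))
```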

       -t, --trim-primer
              Trim  primer  base  and  the  first  color  (which  is  the transition to the first
              nucleotide)

       --strip-f3
              Strip the _F3 suffix of read names

       --maq, --bwa
              MAQ- and BWA-compatible colorspace output. This enables -c, -d, -t, --strip-f3  and
              -y '/1'.

       --no-zero-cap
              Do not change negative quality values to zero in colorspace data. By default, they
              are changed to zero, since many tools have problems with negative qualities.

       -z, --zero-cap
              Change negative quality values to zero. This is enabled by default when
              -c/--colorspace is also enabled. Use --no-zero-cap to disable it.

              Paired-end options:

              The -A/-G/-B/-U options work like their -a/-g/-b/-u counterparts, but are applied
              to the second read in each pair.

       -A ADAPTER
              3' adapter to be removed from second read in a pair.

       -G ADAPTER
              5' adapter to be removed from second read in a pair.

       -B ADAPTER
              5'/3' adapter to be removed from second read in a pair.

       -U LENGTH
              Remove LENGTH bases from second read in a pair (see --cut).

       -p FILE, --paired-output=FILE
              Write second read in a pair to FILE.

       --pair-filter=(any|both)
              Which of the reads in a pair have to match the filtering criterion in order for
              the pair to be filtered. Default: any
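
              The two modes can be sketched as a simple boolean rule. The function and
              parameter names are hypothetical; each flag says whether one read of the
              pair matches the filtering criterion (for example, is too short).

```python
def pair_is_filtered(read1_matches, read2_matches, pair_filter="any"):
    """Decide whether a read pair is filtered under the given mode."""
    if pair_filter == "any":
        # One matching read is enough to filter the whole pair.
        return read1_matches or read2_matches
    if pair_filter == "both":
        # Both reads must match for the pair to be filtered.
        return read1_matches and read2_matches
    raise ValueError("pair_filter must be 'any' or 'both'")


print(pair_is_filtered(True, False, "any"))
print(pair_is_filtered(True, False, "both"))
```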

       --interleaved
              Read and write interleaved paired-end reads.

       --untrimmed-paired-output=FILE
              Write  second  read  in  a pair to this FILE when no adapter was found in the first
              read. Use this option together with --untrimmed-output when trimming paired-end
              reads. Default: output to same file as trimmed reads

       --too-short-paired-output=FILE
              Write  second  read  in a pair to this file if pair is too short. Use together with
              --too-short-output.

       --too-long-paired-output=FILE
              Write second read in a pair to this file if pair is too  long.  Use  together  with
              --too-long-output.

AUTHOR

       This manpage was written by Andreas Tille for the Debian distribution and can be used  for
       any other usage of the program.
