Ubuntu Manpage: cutadapt - manual page for cutadapt 1.8.3

Provided by: python-cutadapt_1.9.1-1build1_amd64

NAME

       cutadapt - manual page for cutadapt 1.8.3

DESCRIPTION

cutadapt removes adapter sequences from high-throughput sequencing reads.

Usage:
cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq

For paired-end reads:
cutadapt -a ADAPT1 -A ADAPT2 [options] -o out1.fastq -p out2.fastq in1.fastq in2.fastq

Replace "ADAPTER" with the actual sequence of your 3' adapter. IUPAC wildcard characters are supported.
The reverse complement is *not* automatically searched. All reads from input.fastq will be written to
output.fastq with the adapter sequence removed. Adapter matching is error-tolerant. Multiple adapter
sequences can be given (use further -a options), but only the best-matching adapter will be removed.

Input may also be in FASTA format. Compressed input and output are supported and auto-detected from the
file name (.gz, .xz, .bz2). Use the file name '-' for standard input/output. Without the -o option,
output is sent to standard output.
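
The paired-end usage line above can be assembled programmatically. The following is a minimal sketch, assuming cutadapt is on the PATH; the helper name build_paired_command and the .gz file names are hypothetical placeholders, and "ADAPT1"/"ADAPT2" stand in for real adapter sequences.

```python
import shlex


def build_paired_command(adapter1, adapter2, in1, in2, out1, out2):
    """Assemble a cutadapt argument list for paired-end trimming.

    Only options listed in this manual page are used (-a, -A, -o, -p).
    """
    return ["cutadapt", "-a", adapter1, "-A", adapter2,
            "-o", out1, "-p", out2, in1, in2]


# Placeholder adapters and hypothetical file names; the .gz extensions
# rely on the file-name-based compression detection described above.
cmd = build_paired_command("ADAPT1", "ADAPT2",
                           "in1.fastq.gz", "in2.fastq.gz",
                           "out1.fastq.gz", "out2.fastq.gz")
print(shlex.join(cmd))
# To run it for real: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if an adapter specification contains characters such as '^' or '$'.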

Some other available features are:
  * Various other adapter types (5' adapters, "mixed" 5'/3' adapters etc.)
  * Trimming a fixed number of bases
  * Quality trimming
  * Trimming colorspace reads
  * Filtering reads by various criteria

Use "cutadapt --help" to see all command-line options. See http://cutadapt.readthedocs.org/ for full
documentation.

OPTIONS

       --version
              show program's version number and exit

       -h, --help
              show this help message and exit

       -f FORMAT, --format=FORMAT
              Input  file  format;  can  be  either  'fasta',  'fastq'  or  'sra-fastq'.  Ignored  when  reading
              csfasta/qual files (default: auto-detect from file name extension).

              Options that influence how the adapters are found:

              Each of the following three parameters (-a, -b,  -g)  can  be  used  multiple  times  and  in  any
              combination  to  search  for  an entire set of adapters of possibly different types. Only the best
              matching adapter is trimmed from each read (but see the --times  option).  Instead  of  giving  an
              adapter  directly,  you  can  also write file:FILE and the adapter sequences will be read from the
              given FILE (which must be in FASTA format).

       -a ADAPTER, --adapter=ADAPTER
              Sequence of an adapter that was ligated to the 3' end.   The  adapter  itself  and  anything  that
              follows  is  trimmed. If the adapter sequence ends with the '$' character, the adapter is anchored
              to the end of the read and only found if it is a suffix of the read.
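
The effect of a 3' adapter can be illustrated with a simplified model. This is a sketch only: it uses exact string matching (no error tolerance, no wildcards), and the function name trim_3prime is hypothetical, not part of cutadapt.

```python
def trim_3prime(read, adapter, min_overlap=3):
    """Simplified 3' adapter removal using exact matching only."""
    if adapter.endswith("$"):
        # Anchored: the adapter must be a suffix of the read.
        adapter = adapter[:-1]
        if adapter and read.endswith(adapter):
            return read[:-len(adapter)]
        return read
    pos = read.find(adapter)
    if pos != -1:
        # The adapter and everything that follows is removed.
        return read[:pos]
    # Otherwise look for a partial adapter overlapping the 3' end,
    # longest candidate first, down to the minimum overlap.
    for k in range(min(len(adapter) - 1, len(read)), max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


print(trim_3prime("ACGTACGTTTAGGCAA", "TTAGGC"))   # ACGTACGT
print(trim_3prime("ACGTACGTTTAG", "TTAGGC"))       # ACGTACGT
print(trim_3prime("ACGTTTAGGCAA", "TTAGGC$"))      # unchanged
```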

       -g ADAPTER, --front=ADAPTER
              Sequence of an adapter that was ligated to the 5' end.  If the adapter sequence  starts  with  the
              character  '^',  the adapter is 'anchored'. An anchored adapter must appear in its entirety at the
              5' end of the read (it is a prefix of the read). A non-anchored adapter may  appear  partially  at
              the  5' end, or it may occur within the read. If it is found within a read, the sequence preceding
              the adapter is also trimmed. In all cases, the adapter itself is trimmed.
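
The difference between anchored and non-anchored 5' adapters can be sketched in the same simplified way. Again this assumes exact matching only, and trim_5prime is a hypothetical name used for illustration.

```python
def trim_5prime(read, adapter, min_overlap=3):
    """Simplified 5' adapter removal using exact matching only."""
    if adapter.startswith("^"):
        # Anchored: the full adapter must be a prefix of the read.
        adapter = adapter[1:]
        return read[len(adapter):] if read.startswith(adapter) else read
    pos = read.find(adapter)
    if pos != -1:
        # Found within the read: the preceding sequence and the
        # adapter itself are both removed.
        return read[pos + len(adapter):]
    # Otherwise look for the end of the adapter at the start of the read.
    for k in range(min(len(adapter) - 1, len(read)), max(min_overlap, 1) - 1, -1):
        if read.startswith(adapter[-k:]):
            return read[k:]
    return read


print(trim_5prime("TTAGGCACGT", "^TTAGGC"))    # ACGT
print(trim_5prime("AATTAGGCACGT", "^TTAGGC"))  # unchanged: not a prefix
print(trim_5prime("AATTAGGCACGT", "TTAGGC"))   # ACGT
print(trim_5prime("AGGCACGT", "TTAGGC"))       # ACGT (partial adapter)
```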

       -b ADAPTER, --anywhere=ADAPTER
              Sequence of an adapter that was ligated to the 5' or 3' end. If the adapter is  found  within  the
              read  or overlapping the 3' end of the read, the behavior is the same as for the -a option. If the
              adapter overlaps the 5' end (beginning of the read), the initial portion of the read matching  the
              adapter is trimmed, but anything that follows is kept.

       -e ERROR_RATE, --error-rate=ERROR_RATE
              Maximum  allowed error rate (no. of errors divided by the length of the matching region) (default:
              0.1)
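
As a worked illustration of the error-rate definition, the check below divides the number of errors by the length of the matching region. It is a sketch of the stated formula, not cutadapt's implementation; within_error_rate is a hypothetical name.

```python
def within_error_rate(errors, match_length, error_rate=0.1):
    """Return True if errors / match_length does not exceed error_rate."""
    if match_length <= 0:
        return False
    return errors / match_length <= error_rate


print(within_error_rate(1, 10))  # True:  1/10 = 0.1
print(within_error_rate(1, 9))   # False: 1/9 is about 0.111
print(within_error_rate(2, 20))  # True:  2/20 = 0.1
```

With the default rate of 0.1, a match therefore needs at least 10 matching-region bases before a single error is tolerated.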

       --no-indels
              Do not allow indels in the alignments  (allow  only  mismatches).  Currently  only  supported  for
              anchored adapters. (default: allow both mismatches and indels)

       -n COUNT, --times=COUNT
              Try  to  remove  adapters at most COUNT times. Useful when an adapter gets appended multiple times
              (default: 1).

       -O LENGTH, --overlap=LENGTH
              Minimum overlap length. If the overlap between the read and the adapter is  shorter  than  LENGTH,
              the read is not modified. This reduces the no. of bases trimmed purely due to short random adapter
              matches (default: 3).
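
To see why a minimum overlap matters, consider how often a short stretch of read matches the adapter purely by chance. The estimate below assumes the four bases are equally likely and independent and that an exact match is required; these are simplifying assumptions, and random_match_probability is a hypothetical helper.

```python
def random_match_probability(overlap):
    """Chance that `overlap` random bases equal a given adapter prefix."""
    return 0.25 ** overlap


for k in (1, 3, 5, 10):
    print(k, random_match_probability(k))
```

Under these assumptions a 3-base overlap matches by chance in 1 of 64 reads, which is why a larger -O value is advisable together with --discard-trimmed.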

       --match-read-wildcards
              Allow IUPAC wildcards in reads (default: False).

       -N, --no-match-adapter-wildcards
              Do not interpret IUPAC wildcards in adapters.

              Options for filtering of processed reads:

       --discard-trimmed, --discard
              Discard reads that contain the adapter instead of trimming them. Also use -O  in  order  to  avoid
              throwing away too many randomly matching reads!

       --discard-untrimmed, --trimmed-only
              Discard reads that do not contain the adapter.

       -m LENGTH, --minimum-length=LENGTH
              Discard  trimmed reads that are shorter than LENGTH.  Reads that are too short even before adapter
              removal are also discarded. In colorspace, an initial primer is not counted (default: 0).

       -M LENGTH, --maximum-length=LENGTH
              Discard trimmed reads that are longer than LENGTH.  Reads that are too long  even  before  adapter
              removal are also discarded. In colorspace, an initial primer is not counted (default: no limit).

       --no-trim
              Match and redirect reads to output/untrimmed-output as usual, but do not remove adapters.

       --max-n=LENGTH
              The maximum number or proportion of N's allowed in a read. A value < 1 is treated as the
              maximum proportion of N's in the read, while a value > 1 is treated as the maximum number of
              N's the read may contain.
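
The two interpretations of the --max-n value can be sketched as follows. The treatment of a value of exactly 1 (here: a count) is an assumption, since the text above does not specify it; exceeds_max_n is a hypothetical name.

```python
def exceeds_max_n(read, max_n):
    """Return True if the read has more N's than max_n allows."""
    n_count = read.upper().count("N")
    if max_n < 1:
        # Interpreted as a proportion of the read length.
        return bool(read) and n_count / len(read) > max_n
    # Interpreted as an absolute count (assumption for max_n == 1).
    return n_count > max_n


print(exceeds_max_n("ACGTNNNNNN", 0.5))  # True:  6 of 10 bases are N
print(exceeds_max_n("ACGTNACGTA", 0.5))  # False: 1 of 10 bases is N
print(exceeds_max_n("ANNNT", 2))         # True:  3 N's > 2
```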

       --mask-adapter
              Mask adapters with 'N' characters instead of trimming them.

              Options that influence what gets output to where:

       --quiet
              Do not print a report at the end.

       -o FILE, --output=FILE
              Write modified reads to FILE. FASTQ or FASTA format is chosen  depending  on  input.  The  summary
              report  is sent to standard output. Use '{name}' in FILE to demultiplex reads into multiple files.
              (default: trimmed reads are written to standard output)

       --info-file=FILE
              Write information about each read and its adapter matches into FILE. See the documentation for the
              file format.

       -r FILE, --rest-file=FILE
              When the adapter matches in the middle of a read, write the rest (after the adapter) into FILE.

       --wildcard-file=FILE
              When  the  adapter  has  wildcard bases ('N's), write adapter bases matching wildcard positions to
              FILE.  When there are indels in the alignment, this will often not be accurate.

       --too-short-output=FILE
              Write reads that are too short (according to length specified by -m) to  FILE.  (default:  discard
              reads)

       --too-long-output=FILE
              Write  reads  that  are  too long (according to length specified by -M) to FILE. (default: discard
              reads)

       --untrimmed-output=FILE
              Write reads that do not contain the adapter to FILE.  (default: output to  same  file  as  trimmed
              reads)

              Additional modifications to the reads:

       -u LENGTH, --cut=LENGTH
              Remove  LENGTH  bases from the beginning or end of each read. If LENGTH is positive, the bases are
              removed from the beginning of each read. If LENGTH is negative, the bases are removed from the end
              of each read. This option can be specified twice if the LENGTHs have different signs.
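
The sign convention of -u maps directly onto sequence slicing. A minimal sketch (the function name cut is chosen for illustration):

```python
def cut(read, length):
    """Remove bases from the start (length > 0) or end (length < 0)."""
    if length > 0:
        return read[length:]
    if length < 0:
        return read[:length]
    return read


print(cut("ACGTACGT", 3))           # TACGT
print(cut("ACGTACGT", -3))          # ACGTA
print(cut(cut("ACGTACGT", 2), -3))  # GTA, like giving -u 2 -u -3
```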

       -q [5'CUTOFF,]3'CUTOFF, --quality-cutoff=[5'CUTOFF,]3'CUTOFF
              Trim  low-quality  bases  from  5' and/or 3' ends of reads before adapter removal. If one value is
              given, only the 3' end is trimmed. If two comma-separated cutoffs are given, the 5' end is trimmed
              with  the  first  cutoff, the 3' end with the second. The algorithm is the same as the one used by
              BWA (see documentation).  (default: no trimming)
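
The BWA-style 3' trimming referred to here is usually described like this: subtract the cutoff from every quality value, accumulate the differences from the 3' end, and cut the read where that running sum is smallest. The following sketch follows that description; it is not cutadapt's own code, and quality_trim_3prime is a hypothetical name.

```python
def quality_trim_3prime(qualities, cutoff):
    """Return the index at which to cut the read (keep qualities[:index])."""
    running_sum = 0
    min_sum = 0
    cut_index = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running_sum += qualities[i] - cutoff
        if running_sum > 0:
            # Stop early once the sum becomes positive.
            break
        if running_sum < min_sum:
            min_sum = running_sum
            cut_index = i
    return cut_index


quals = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
print(quality_trim_3prime(quals, 10))  # 4: keep the first four bases
```

Note that the base with quality 11 is removed even though it is above the cutoff, because it is surrounded by low-quality bases.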

       --quality-base=QUALITY_BASE
              Assume that quality values are encoded as ascii(quality  +  QUALITY_BASE).  The  default  (33)  is
              usually  correct,  except for reads produced by some versions of the Illumina pipeline, where this
              should be set to 64. (Default: 33)
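
The encoding ascii(quality + QUALITY_BASE) can be reversed by subtracting the base from each character code. A minimal sketch (decode_qualities is a hypothetical helper):

```python
def decode_qualities(quality_string, quality_base=33):
    """Convert a FASTQ quality string into integer quality values."""
    return [ord(ch) - quality_base for ch in quality_string]


print(decode_qualities("!I"))      # [0, 40] with the default base 33
print(decode_qualities("@h", 64))  # [0, 40] with base 64
```

Decoding base-64 data with the default base of 33 would inflate every quality value by 31, which is why the correct --quality-base matters for quality trimming.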

       --trim-n
              Trim N's on ends of reads.

       -x PREFIX, --prefix=PREFIX
              Add this prefix to read names

       -y SUFFIX, --suffix=SUFFIX
              Add this suffix to read names

       --strip-suffix=STRIP_SUFFIX
              Remove this suffix from read names if present. Can be given multiple times.

       -c, --colorspace
              Colorspace mode: Also trim the color that is adjacent to the found adapter.

       -d, --double-encode
              When in colorspace, double-encode colors (map 0,1,2,3,4 to A,C,G,T,N).
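
Double-encoding is a character-for-character substitution. The sketch below applies the mapping stated above and leaves any other character unchanged, which is an assumption; double_encode is a hypothetical name.

```python
# Mapping from color digits to nucleotide letters, as given above.
DOUBLE_ENCODE = str.maketrans("01234", "ACGTN")


def double_encode(colors):
    """Replace the colors 0,1,2,3,4 with A,C,G,T,N."""
    return colors.translate(DOUBLE_ENCODE)


print(double_encode("0123401"))  # ACGTNAC
```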

       -t, --trim-primer
              When in colorspace, trim primer base and the first color (which is the  transition  to  the  first
              nucleotide)

       --strip-f3
              For colorspace: Strip the _F3 suffix of read names

       --maq, --bwa
              MAQ- and BWA-compatible colorspace output. This enables -c, -d, -t, --strip-f3 and -y '/1'.

       --length-tag=TAG
              Search  for  TAG  followed  by  a decimal number in the description field of the read. Replace the
              decimal number with the correct length  of  the  trimmed  read.   For  example,  use  --length-tag
              'length=' to correct fields like 'length=123'.
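
The replacement performed by --length-tag can be pictured as a regular-expression substitution on the read description. This is an illustrative sketch, not cutadapt's code; fix_length_tag and the sample description are hypothetical.

```python
import re


def fix_length_tag(description, tag, new_length):
    """Replace the decimal number following `tag` with new_length."""
    pattern = re.escape(tag) + r"\d+"
    # A function is used as the replacement so that backslashes in
    # `tag` are never interpreted as regex escapes.
    return re.sub(pattern, lambda match: tag + str(new_length), description)


print(fix_length_tag("read1 length=123 sample=x", "length=", 87))
# read1 length=87 sample=x
```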

       --no-zero-cap
              Do  not  change  negative quality values to zero.  Colorspace quality values of -1 would appear as
              spaces in the output FASTQ file. Since many tools have problems with that, negative qualities  are
              converted to zero when trimming colorspace data. Use this option to keep negative qualities.

       -z, --zero-cap
              Change  negative  quality  values to zero. This is enabled by default when -c/--colorspace is also
              enabled. Use --no-zero-cap to disable it.

              Paired-end options:

              The -A/-G/-B/-U options work like their -a/-g/-b/-u counterparts, but are applied to the
              second read in each pair.

       -A ADAPTER
              3' adapter to be removed from the second read in a pair.

       -G ADAPTER
              5' adapter to be removed from the second read in a pair.

       -B ADAPTER
              5'/3' adapter to be removed from the second read in a pair.

       -U LENGTH
              Remove LENGTH bases from the beginning or end of the second read in each pair (see --cut).

       -p FILE, --paired-output=FILE
              Write second read in a pair to FILE.

       --untrimmed-paired-output=FILE
              Write the second read in a pair to this FILE when no adapter was found in the first read. Use this
              option together with --untrimmed-output when trimming paired-end reads. (Default: output to same
              file as trimmed reads.)
