Ubuntu Manpage: fastq-mcf - ea-utils: detect levels of adapter presence, compute likelihoods and locations of the adapters

Provided by: ea-utils_1.1.2+dfsg-9build1_amd64

NAME

       fastq-mcf  -  ea-utils:  detect  levels  of  adapter  presence,  compute likelihoods and locations of the
       adapters

SYNOPSIS

       fastq-mcf [options] <adapters.fa> <reads.fq> [mates1.fq ...]

DESCRIPTION

       Version: 1.04.676

       Detects levels of adapter presence, computes likelihoods and locations  (start,  end)  of  the  adapters.
       Removes the adapter sequences from the fastq file(s).

       Stats go to stderr, unless -o is specified.

       Specify -0 to turn off all default settings.

       If you specify multiple 'paired-end' inputs, then a -o option is required for each, e.g. -o read1.clip.fq
       -o read2.clip.fq
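       To illustrate the one -o per input rule, the Python sketch below assembles a fastq-mcf argument list
       for a paired-end run. The helper name build_fastq_mcf_cmd and all file names are hypothetical, and the
       sketch assumes (following the example above) that the Nth -o names the output for the Nth reads file.

```python
def build_fastq_mcf_cmd(adapters, reads, outputs=None, extra_opts=()):
    """Build an argv list for fastq-mcf with one -o per reads file (hypothetical helper)."""
    reads = list(reads)
    if outputs is None:
        # A single input may omit -o; multiple paired-end inputs may not.
        if len(reads) > 1:
            raise ValueError("paired-end input needs one -o per reads file")
        outputs = []
    elif len(outputs) != len(reads):
        raise ValueError("give exactly one output name per reads file")
    # Options first, then <adapters.fa> <reads.fq> [mates1.fq ...], as in SYNOPSIS.
    cmd = ["fastq-mcf", *extra_opts]
    for out in outputs:
        cmd += ["-o", out]
    cmd += [adapters, *reads]
    return cmd


cmd = build_fastq_mcf_cmd(
    "adapters.fa",
    ["read1.fq", "read2.fq"],
    ["read1.clip.fq", "read2.clip.fq"],
)
print(" ".join(cmd))
# fastq-mcf -o read1.clip.fq -o read2.clip.fq adapters.fa read1.fq read2.fq
```

       The resulting list could be handed to Python's subprocess.run() to launch the tool; the sketch itself
       only builds the command line and does not run it.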

OPTIONS

       -h     This help

       -o FIL Output file (stats to stdout)

       -s N.N Log scale for adapter minimum-length-match (2.2)

       -t N   % occurrence threshold before adapter clipping (0.25)

       -m N   Minimum clip length; overrides the scaled automatic value (1)

       -p N   Maximum adapter difference percentage (10)

       -l N   Minimum remaining sequence length (19)

       -L N   Maximum remaining sequence length (none)

       -D N   Remove duplicate reads: Read_1 has N identical bases (0)

       -k N   sKew percentage below which a cycle is removed (2)

       -x N   'N' (Bad read) percentage causing cycle removal (20)

       -q N   quality threshold causing base removal (10)

       -w N   window-size for quality trimming (1)

       -H     remove >95% homopolymer reads (no)

       -X     remove low complexity reads (no)

       -0     Set all default parameters to zero/do nothing

       -U|u   Force disable/enable Illumina PF filtering (auto)

       -P N   Phred-scale (auto)

       -R     Don't remove N's from the fronts/ends of reads

       -n     Don't clip, just output what would be done

       -C N   Number of reads to use for subsampling (300k)

       -S     Save all discarded reads to '.skip' files

       -d     Output lots of random debugging stuff
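       The -q and -w options are only summarized above. As a rough picture of windowed quality trimming, the
       Python sketch below trims bases from the 3' end while the mean Phred score of the last w bases is below
       q. This is an illustrative assumption, not fastq-mcf's code: it assumes Phred+33 encoding (fastq-mcf
       auto-detects the scale, see -P), trims only the 3' end, and uses hypothetical function names.

```python
def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]


def trim_3p(seq, qual, q=10, w=1, offset=33):
    """Drop 3' bases while the mean score of the last w bases is below q.

    q and w mirror the -q (10) and -w (1) defaults; w must be >= 1.
    The exact trimming rule is an assumption made for illustration.
    """
    scores = phred_scores(qual, offset)
    end = len(scores)
    # Slide the window toward the 5' end one base at a time.
    while end >= w and sum(scores[end - w:end]) / w < q:
        end -= 1
    return seq[:end], qual[:end]


print(trim_3p("ACGTACGT", "IIIIII##"))  # ('ACGTAC', 'IIIIII')
```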

   Quality adjustment options:
       --cycle-adjust
              CYC,AMT   Adjust cycle CYC (negative = offset from end) by amount AMT

       --phred-adjust
              SCORE,AMT Adjust score SCORE by amount AMT

       --phred-adjust-max
              SCORE     Adjust scores > SCORE to SCORE
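       To make the three adjustment options concrete, the Python sketch below applies them to the list of
       integer Phred scores for one read. Assumptions: cycles are numbered from 1, a negative CYC counts back
       from the last cycle (-1 is the last base), and --phred-adjust-max caps every score above SCORE at
       SCORE. The function names are hypothetical.

```python
def adjust_cycle(scores, cyc, amt):
    """--cycle-adjust CYC,AMT: add AMT to the score at cycle CYC (assumed 1-based)."""
    out = list(scores)
    # Negative CYC is an offset from the end: -1 is the last cycle.
    idx = cyc - 1 if cyc > 0 else len(out) + cyc
    if 0 <= idx < len(out):
        out[idx] += amt
    return out


def adjust_phred(scores, score, amt):
    """--phred-adjust SCORE,AMT: add AMT to every score equal to SCORE."""
    return [s + amt if s == score else s for s in scores]


def cap_phred(scores, max_score):
    """--phred-adjust-max SCORE: lower any score above SCORE to SCORE."""
    return [min(s, max_score) for s in scores]


print(adjust_cycle([30, 30, 30], -1, -5))  # [30, 30, 25]
print(cap_phred([38, 41, 42], 40))          # [38, 40, 40]
```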

   Filtering options*:
       --[mate-]qual-mean
              NUM       Minimum mean quality score

       --[mate-]qual-gt
              NUM,THR   At least NUM quals > THR

       --[mate-]max-ns
              NUM       Maximum N-calls in a read (can be a %)

       --[mate-]min-len
              NUM       Minimum remaining length (same as -l)

       --homopolymer-pct
              PCT       Homopolymer filter percent (95)

       --lowcomplex-pct
              PCT       Complexity filter percent (95)

       If the mate- prefix is used, the filter applies only to the second non-barcode read.
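       The Python sketch below shows how the per-read filters listed above could be checked on a read that has
       already been clipped and trimmed (quality filters are evaluated after clipping/trimming). The function
       name is hypothetical, and reading a % value of --max-ns as a percentage of the read length is an
       assumption. The mate- variants would apply the same checks to the second non-barcode read.

```python
def passes_filters(seq, scores, qual_mean=None, qual_gt=None,
                   max_ns=None, min_len=None):
    """Return True if a clipped/trimmed read passes every enabled filter."""
    # --min-len NUM: minimum remaining length (same as -l)
    if min_len is not None and len(seq) < min_len:
        return False
    # --qual-mean NUM: minimum mean quality score
    if qual_mean is not None:
        if not scores or sum(scores) / len(scores) < qual_mean:
            return False
    # --qual-gt NUM,THR: at least NUM scores strictly greater than THR
    if qual_gt is not None:
        num, thr = qual_gt
        if sum(1 for s in scores if s > thr) < num:
            return False
    # --max-ns NUM or "NUM%": maximum N-calls (assumed % of read length)
    if max_ns is not None:
        n_calls = seq.upper().count("N")
        if isinstance(max_ns, str) and max_ns.endswith("%"):
            limit = len(seq) * float(max_ns[:-1]) / 100.0
        else:
            limit = float(max_ns)
        if n_calls > limit:
            return False
    return True


read, quals = "ACGTNACGTA", [30] * 10
print(passes_filters(read, quals, max_ns=1))     # True
print(passes_filters(read, quals, max_ns="5%"))  # False
```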

       Adapter files are FASTA formatted.

       Specify n/a in place of the adapter file to turn off adapter clipping and use only the filters.

       Increasing the scale makes recognition-lengths longer, a scale of 100 will force full-length  recognition
       of adapters.

       Adapter  sequences  with _5p in their label will match 'end's, and sequences with _3p in their label will
       match 'start's, otherwise the 'end' is auto-determined.
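       The adapter file format and the _5p/_3p label convention can be illustrated with the Python sketch
       below, which parses FASTA text and reports which side each adapter is matched against according to the
       rule above. The labels and sequences are placeholder examples, not a recommended adapter set; "auto"
       marks the case where fastq-mcf determines the end itself. Function names are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (label, sequence) tuples."""
    records = []
    label, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                records.append((label, "".join(chunks)))
            label, chunks = line[1:].strip(), []
        else:
            chunks.append(line)  # sequences may span several lines
    if label is not None:
        records.append((label, "".join(chunks)))
    return records


def matched_side(label):
    """Apply this page's label rule: _5p matches 'end', _3p matches 'start'."""
    if "_5p" in label:
        return "end"
    if "_3p" in label:
        return "start"
    return "auto"  # fastq-mcf determines the end itself


ADAPTERS = """\
>example_adapter_5p
AGATCGGAAGAGC
>example_adapter_3p
GATCGGAAGAGCAC
>example_adapter
CTGTCTCTTATACACATCT
"""

for label, seq in read_fasta(ADAPTERS):
    print(label, seq, matched_side(label))
```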

       Skew is when one cycle is poor, 'skewed' toward a particular base.  If the share of any nucleotide at
       that cycle is less than the skew percentage, then the whole cycle is removed.  Disable for methyl-seq, etc.

       Set  the  skew  (-k)  or  N-pct  (-x)  to  0 to turn it off (should be done for miRNA, amplicon and other
       low-complexity situations!)
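       The skew rule can be sketched in Python as a per-cycle base-composition check: for each cycle (column),
       count A, C, G and T across the sampled reads and flag the cycle if any of them falls below the skew
       percentage. The function name is hypothetical, and counting every character (including N) in the
       per-cycle total is an assumption; fastq-mcf's exact bookkeeping is not described here.

```python
from collections import Counter


def skewed_cycles(seqs, skew_pct=2.0):
    """Return 0-based cycles where some base of A/C/G/T is below skew_pct percent."""
    if skew_pct <= 0:  # -k 0 turns the check off
        return []
    max_len = max((len(s) for s in seqs), default=0)
    bad = []
    for i in range(max_len):
        # Base composition of cycle i over all reads long enough to have it.
        counts = Counter(s[i] for s in seqs if i < len(s))
        total = sum(counts.values())
        if any(100.0 * counts[b] / total < skew_pct for b in "ACGT"):
            bad.append(i)
    return bad


reads = ["ACGA", "CAGA", "GTCA", "TGTA"]
print(skewed_cycles(reads))  # [2, 3]: no A at cycle 2, only A at cycle 3
```

       On low-complexity libraries such as amplicons, almost every cycle would be flagged this way, which is
       why the page recommends setting -k to 0 there.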

       Duplicate read filtering is appropriate for assembly tasks, but should never be used when read length <
       expected coverage.  -D 50 will use 4.5GB RAM on 100 million DNA reads, so be careful. Great for RNA assembly.
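       The Python sketch below shows the general idea of prefix-based duplicate removal and why memory grows
       with the input: every distinct key has to be remembered. It assumes, as one reading of -D N, that two
       pairs are duplicates when the first N bases of Read_1 are identical and that the first occurrence is
       kept; the function name is hypothetical and fastq-mcf's actual data structure is not described here.

```python
def dedup_by_read1_prefix(pairs, n):
    """Keep the first pair for each distinct n-base Read_1 prefix; n <= 0 keeps all."""
    if n <= 0:  # -D 0 (the default) disables duplicate removal
        return list(pairs)
    seen = set()  # one entry per distinct prefix: this is the memory cost
    kept = []
    for read1, read2 in pairs:
        key = read1[:n]
        if key in seen:
            continue
        seen.add(key)
        kept.append((read1, read2))
    return kept


pairs = [("ACGTAA", "TT"), ("ACGTCC", "GG"), ("TTTTAA", "CC")]
print(dedup_by_read1_prefix(pairs, 4))  # [('ACGTAA', 'TT'), ('TTTTAA', 'CC')]
```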

       *Quality filters are evaluated after clipping/trimming

       Homopolymer  filtering  is a subset of low-complexity, but will not be separately tracked unless both are
       turned on.
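       As a sketch of the homopolymer filter (-H, --homopolymer-pct), the Python function below flags a read
       when a single base makes up more than PCT percent of it. Measuring "homopolymer" as the share of the
       most common base is an assumption, and the function name is hypothetical; the low-complexity filter
       (-X) uses a measure not described on this page and is not modeled.

```python
from collections import Counter


def is_homopolymer(seq, pct=95.0):
    """True if one base makes up more than pct percent of seq (default mirrors 95)."""
    if not seq:
        return False
    top_count = Counter(seq.upper()).most_common(1)[0][1]
    return 100.0 * top_count / len(seq) > pct


print(is_homopolymer("A" * 39 + "C"))  # True: 97.5% A
print(is_homopolymer("A" * 19 + "C"))  # False: exactly 95% is not > 95%
```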

fastq-mcf 1.1.2                                     July 2015                                       FASTQ-MCF(1)