Ubuntu Manpage: fastq-mcf - ea-utils: detect levels of adapter presence, compute likelihoods and locations

Provided by: ea-utils_1.1.2+dfsg-4build1_amd64

NAME

       fastq-mcf - ea-utils: detect levels of adapter presence, compute likelihoods and locations
       of the adapters

SYNOPSIS

       fastq-mcf [options] <adapters.fa> <reads.fq> [mates1.fq ...]

DESCRIPTION

       Version: 1.04.676

       Detects levels of adapter presence, computes likelihoods and locations (start, end) of the
       adapters.   Removes the adapter sequences from the fastq file(s).
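       The Python sketch below is a rough illustration of what removing an adapter from the 3'
       end of a read involves. It scans for the leftmost position where the rest of the read
       matches the start of the adapter, then clips the read there. The allowed mismatch
       percentage is comparable to -p and the minimum overlap to -m. This is a simplified,
       hypothetical model, and the function name clip_3p_adapter is invented. fastq-mcf's own
       detection also relies on sampled statistics, likelihoods and the _5p/_3p adapter labels,
       and none of that is reproduced here.

```python
def clip_3p_adapter(read: str, adapter: str, min_len: int = 1,
                    max_diff_pct: float = 10.0) -> str:
    """Hypothetical sketch: clip `read` at the leftmost position where the
    remaining bases match the beginning of `adapter` over at least `min_len`
    bases with at most `max_diff_pct` percent mismatches."""
    for start in range(len(read)):
        # Overlap between the read tail and the adapter head.
        overlap = min(len(read) - start, len(adapter))
        if overlap < min_len:
            break  # overlaps only get shorter from here on
        mismatches = sum(
            1 for a, b in zip(read[start:start + overlap], adapter[:overlap])
            if a != b
        )
        if mismatches * 100.0 <= max_diff_pct * overlap:
            return read[:start]  # drop the adapter and everything after it
    return read  # no acceptable match: leave the read unchanged


print(clip_3p_adapter("ACGTACGTAGATCGGAAG", "AGATCGGAAGAGC"))  # ACGTACGT
print(clip_3p_adapter("CCCCCCCA", "AGATCGGAAGAGC"))            # CCCCCCC
```

       The second call shows a weakness of this model. With a minimum overlap of 1, a single
       trailing base that happens to equal the adapter's first base is clipped. This is
       presumably the kind of spurious clip that the scaled minimum match length (-s) is meant
       to limit.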

       Stats go to stderr unless -o is specified, in which case they go to stdout.

       Specify -0 to turn off all default settings.

       If you specify multiple paired-end inputs, a separate -o option is required for each.
       For example: -o read1.clip.fq -o read2.clip.fq
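       The one-output-per-input rule is easy to get wrong when scripting runs. The minimal
       Python helper below (build_fastq_mcf_cmd is a hypothetical name) adds one -o for each
       input, in the same order as the inputs. It follows the SYNOPSIS layout: options first,
       then the adapter file, then the reads. The sketch assumes that the N-th -o is paired
       with the N-th input file.

```python
import shlex
from typing import List, Optional


def build_fastq_mcf_cmd(adapters: str, inputs: List[str], outputs: List[str],
                        extra: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: assemble a fastq-mcf argument list with one -o
    per input file, in the same order as the inputs."""
    if len(inputs) != len(outputs):
        raise ValueError("need exactly one -o output per input file")
    cmd = ["fastq-mcf"]
    for out in outputs:
        cmd += ["-o", out]
    cmd += list(extra or [])          # any other options, e.g. ["-q", "20"]
    cmd += [adapters, *inputs]        # positional arguments come last
    return cmd


cmd = build_fastq_mcf_cmd("adapters.fa", ["read1.fq", "read2.fq"],
                          ["read1.clip.fq", "read2.clip.fq"])
print(shlex.join(cmd))
# fastq-mcf -o read1.clip.fq -o read2.clip.fq adapters.fa read1.fq read2.fq
```

       The resulting list can be passed directly to subprocess.run(cmd, check=True). Raising
       early on a count mismatch avoids starting a run that fastq-mcf would reject anyway.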

OPTIONS

       -h     This help

       -o FIL Output file (stats to stdout)

       -s N.N Log scale for adapter minimum-length-match (2.2)

       -t N   % occurrence threshold before adapter clipping (0.25)

       -m N   Minimum clip length, overrides scaled auto (1)

       -p N   Maximum adapter difference percentage (10)

       -l N   Minimum remaining sequence length (19)

       -L N   Maximum remaining sequence length (none)

       -D N   Remove duplicate reads: Read_1 has N identical bases (0)

       -k N   sKew percentage-less-than causing cycle removal (2)

       -x N   'N' (Bad read) percentage causing cycle removal (20)

       -q N   quality threshold causing base removal (10)

       -w N   window-size for quality trimming (1)

       -H     remove >95% homopolymer reads (no)

       -X     remove low complexity reads (no)

       -0     Set all default parameters to zero/do nothing

       -U|u   Force disable/enable Illumina PF filtering (auto)

       -P N   Phred-scale (auto)

       -R     Don't remove N's from the fronts/ends of reads

       -n     Don't clip, just output what would be done

       -C N   Number of reads to use for subsampling (300k)

       -S     Save all discarded reads to '.skip' files

       -d     Output lots of random debugging stuff
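       The -q and -w options work together during quality trimming. The Python sketch below
       shows one plausible reading of them; this reading is an assumption, not something taken
       from the fastq-mcf source. Bases are trimmed from the 3' end while the mean quality of
       the last w bases is below q. Qualities are decoded as Phred+33 here, although -P would
       otherwise auto-detect the scale. The function name quality_trim_3p is hypothetical, and
       only the 3' end is trimmed.

```python
from typing import Tuple


def quality_trim_3p(seq: str, qual: str, threshold: int = 10, window: int = 1,
                    offset: int = 33) -> Tuple[str, str]:
    """Hypothetical sketch of -q/-w: drop bases from the 3' end while the mean
    quality of the trailing `window` bases is below `threshold`."""
    scores = [ord(c) - offset for c in qual]  # decode Phred scores
    end = len(scores)
    while end > 0:
        win = scores[max(0, end - window):end]
        if sum(win) / len(win) >= threshold:
            break  # window is good enough: stop trimming
        end -= 1
    return seq[:end], qual[:end]


print(quality_trim_3p("ACGTACGT", "IIIIII##"))            # ('ACGTAC', 'IIIIII')
print(quality_trim_3p("ACGTACGT", "IIIIII##", window=3))  # ('ACGTACGT', 'IIIIII##')
```

       With a window of 3, the strong base before the two poor trailing bases keeps the window
       mean above the threshold, so nothing is trimmed. A read left shorter than -l (19 by
       default) would then be discarded.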

   Quality adjustment options:
       --cycle-adjust
              CYC,AMT   Adjust cycle CYC (negative = offset from end) by amount AMT

       --phred-adjust
              SCORE,AMT Adjust score SCORE by amount AMT

       --phred-adjust-max
              SCORE     Adjust scores > SCORE to SCORE
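       The three quality adjustment options can be modelled on a list of integer Phred scores,
       as in the Python sketch below. The sketch is hypothetical and rests on several
       assumptions:

         cycles are counted from 1, and cycle -1 means the last cycle;
         scores are already decoded;
         the adjustments are applied in the order listed.

       fastq-mcf may apply them differently. The function name adjust_quals is invented.

```python
from typing import Iterable, List, Optional, Tuple


def adjust_quals(scores: List[int],
                 cycle_adjust: Iterable[Tuple[int, int]] = (),
                 phred_adjust: Iterable[Tuple[int, int]] = (),
                 phred_max: Optional[int] = None) -> List[int]:
    """Hypothetical model of --cycle-adjust, --phred-adjust and --phred-adjust-max."""
    out = list(scores)
    n = len(out)
    for cyc, amt in cycle_adjust:               # --cycle-adjust CYC,AMT
        idx = cyc - 1 if cyc > 0 else n + cyc   # negative CYC counts from the end
        if 0 <= idx < n:
            out[idx] += amt
    # --phred-adjust SCORE,AMT: each original score is looked up once,
    # so two adjustments never chain into each other.
    delta = dict(phred_adjust)
    out = [q + delta.get(q, 0) for q in out]
    if phred_max is not None:                   # --phred-adjust-max SCORE
        out = [min(q, phred_max) for q in out]
    return out


print(adjust_quals([30, 35, 40, 41], cycle_adjust=[(-1, -5)],
                   phred_adjust=[(35, 2)], phred_max=38))  # [30, 37, 38, 36]
```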

   Filtering options*:
       --[mate-]qual-mean
              NUM       Minimum mean quality score

       --[mate-]qual-gt
              NUM,THR   At least NUM quals > THR

       --[mate-]max-ns
              NUM       Maximum N-calls in a read (can be a %)

       --[mate-]min-len
              NUM       Minimum remaining length (same as -l)

       --homopolymer-pct
              PCT       Homopolymer filter percent (95)

       --lowcomplex-pct
              PCT       Complexity filter percent (95)

       If the mate- prefix is used, the filter applies only to the second non-barcode read.
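       The quality filters are evaluated after clipping and trimming, so they can be treated as
       a final pass/fail check on each trimmed read. The Python sketch below models
       --qual-mean, --qual-gt, --max-ns and --min-len in that way. It is hypothetical: the
       function name passes_filters is invented, and so is the convention of giving a
       percentage as a string such as "10%". The mate- variants would apply the same checks to
       the second non-barcode read.

```python
from typing import List, Optional, Tuple, Union


def passes_filters(seq: str, scores: List[int],
                   qual_mean: Optional[float] = None,
                   qual_gt: Optional[Tuple[int, int]] = None,
                   max_ns: Optional[Union[int, str]] = None,
                   min_len: int = 19) -> bool:
    """Hypothetical post-trimming read filter."""
    if len(seq) < min_len:                                   # --min-len / -l
        return False
    if qual_mean is not None:                                # --qual-mean
        if not scores or sum(scores) / len(scores) < qual_mean:
            return False
    if qual_gt is not None:                                  # --qual-gt NUM,THR
        num, thr = qual_gt
        if sum(1 for q in scores if q > thr) < num:
            return False
    if max_ns is not None:                                   # --max-ns
        if isinstance(max_ns, str) and max_ns.endswith("%"):
            limit = float(max_ns[:-1]) * len(seq) / 100.0    # percent of read length
        else:
            limit = float(max_ns)
        if seq.upper().count("N") > limit:
            return False
    return True


read, quals = "ACGTN" * 4, [30] * 20   # 20 bases, 4 of them N
print(passes_filters(read, quals, qual_mean=20, max_ns="10%"))  # False
print(passes_filters(read, quals, qual_mean=20, max_ns="25%"))  # True
```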

       Adapter files are FASTA formatted.

       Specify n/a in place of the adapter file to turn off adapter clipping and use only the
       filters.

       Increasing the scale (-s) makes recognition lengths longer; a scale of 100 forces
       full-length recognition of adapters.

       Adapter sequences with _5p in their label will match 'end's, and sequences with _3p in
       their label will match 'start's; otherwise, the 'end' is auto-determined.
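       The matching end is read from each adapter's label. The Python sketch below parses a
       small FASTA adapter file and records which end each adapter would match under that
       rule. The names load_adapters and classify_adapter are hypothetical, and so are the
       adapter labels and sequences in the example. For unlabelled adapters the sketch only
       records "auto"; it does not reproduce fastq-mcf's auto-determination.

```python
from typing import List, Tuple


def classify_adapter(label: str, seq: str) -> Tuple[str, str, str]:
    """Apply the label rule: _5p matches 'end', _3p matches 'start'."""
    if "_5p" in label:
        end = "end"
    elif "_3p" in label:
        end = "start"
    else:
        end = "auto"   # fastq-mcf determines this itself; not modelled here
    return label, seq, end


def load_adapters(fasta_text: str) -> List[Tuple[str, str, str]]:
    """Minimal FASTA reader returning (label, sequence, matched end) tuples."""
    adapters = []
    label, parts = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                adapters.append(classify_adapter(label, "".join(parts)))
            label, parts = line[1:].strip(), []
        else:
            parts.append(line.upper())
    if label is not None:
        adapters.append(classify_adapter(label, "".join(parts)))
    return adapters


example = """>adapterA_5p
ACGTTGCAACGT
>adapterB_3p
TTGACCGGTAAC
>adapterC
GGCATTCAGC
"""
for label, seq, end in load_adapters(example):
    print(label, end)
# adapterA_5p end
# adapterB_3p start
# adapterC auto
```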

       Skew occurs when one cycle is poor, 'skewed' toward a particular base. If any nucleotide
       is less than the skew percentage at a cycle, the whole cycle is removed. Disable this for
       methyl-seq, etc.

       Set the skew (-k) or N-pct (-x) to 0 to turn it off (this should be done for miRNA,
       amplicon and other low-complexity situations!)
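       The per-cycle checks behind -k and -x can be sketched in Python as follows. The sketch
       makes two assumptions. First, the skew percentage is measured against the called
       (non-N) bases at each cycle. Second, -x flags a cycle when more than the given
       percentage of reads have an N there. The function name cycles_to_remove is
       hypothetical. fastq-mcf computes such statistics on a subsample of reads (-C), not on a
       short list like the one in the example.

```python
from collections import Counter
from typing import List


def cycles_to_remove(reads: List[str], skew_pct: float = 2.0,
                     n_pct: float = 20.0) -> List[int]:
    """Hypothetical sketch: return 0-based cycle indices to drop.
    A value of 0 for skew_pct or n_pct disables that check."""
    flagged = []
    max_len = max((len(r) for r in reads), default=0)
    for cyc in range(max_len):
        counts = Counter(r[cyc].upper() for r in reads if cyc < len(r))
        total = sum(counts.values())
        called = total - counts["N"]
        # -k: some nucleotide is rarer than skew_pct percent of called bases
        skewed = skew_pct > 0 and called > 0 and any(
            counts[b] * 100.0 < skew_pct * called for b in "ACGT")
        # -x: too many reads have an N at this cycle
        too_many_n = n_pct > 0 and counts["N"] * 100.0 > n_pct * total
        if skewed or too_many_n:
            flagged.append(cyc)
    return flagged


print(cycles_to_remove(["ACGT", "CGTA", "GTAC", "TACG"]))  # []
print(cycles_to_remove(["ACGT", "CGTA", "GTAC", "TACN"]))  # [3]
```

       Calling it with skew_pct=0 and n_pct=0 disables both checks, the equivalent of -k 0 -x 0
       for miRNA or amplicon data. In that case the function always returns an empty list.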

       Duplicate read filtering is appropriate for assembly tasks, but never when read length <
       expected coverage. -D 50 will use 4.5GB of RAM on 100M DNA reads, so be careful. Great
       for RNA assembly.
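       For scale, 4.5 GB for 100 million reads at -D 50 works out to about 45 bytes of memory
       per read. The hypothetical Python sketch below shows the idea of keying duplicates on
       Read_1. It assumes that the N identical bases are the first N bases of Read_1; the
       manpage does not say which bases are compared. The function name dedup_by_prefix is
       invented. The set of seen keys grows with every distinct read, which is why memory use
       climbs with the number of reads.

```python
from typing import Iterable, Iterator, Tuple


def dedup_by_prefix(pairs: Iterable[Tuple[str, ...]], n: int) -> Iterator[Tuple[str, ...]]:
    """Hypothetical sketch of -D N: yield a read (or read pair) only if the
    first n bases of Read_1 have not been seen before. n <= 0 disables it."""
    seen = set()                 # grows by one entry per distinct key
    for pair in pairs:
        if n <= 0:
            yield pair
            continue
        key = pair[0][:n]
        if key in seen:
            continue             # duplicate: drop it
        seen.add(key)
        yield pair


reads = [("AAAAT", "mate1"), ("AAAAG", "mate2"), ("CCCCC", "mate3")]
print(list(dedup_by_prefix(reads, 4)))  # [('AAAAT', 'mate1'), ('CCCCC', 'mate3')]
```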

       *Quality filters are evaluated after clipping/trimming.

       Homopolymer  filtering  is  a subset of low-complexity, but will not be separately tracked
       unless both are turned on.
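       Below is a minimal Python sketch of the homopolymer test. It assumes that a read counts
       as a homopolymer when a single base makes up more than the --homopolymer-pct share of
       it (95 by default). The function name is_homopolymer is hypothetical. The separate
       low-complexity measure (-X, --lowcomplex-pct) is not modelled, because this page does
       not define it.

```python
from collections import Counter


def is_homopolymer(seq: str, pct: float = 95.0) -> bool:
    """Hypothetical -H check: True if one base exceeds pct percent of the read."""
    if not seq:
        return False
    top_count = Counter(seq.upper()).most_common(1)[0][1]
    return top_count * 100.0 > pct * len(seq)


print(is_homopolymer("A" * 39 + "C"))  # True  (97.5% A)
print(is_homopolymer("A" * 19 + "C"))  # False (exactly 95% A)
```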