Ubuntu Manpage: fastp - Ultra-fast all-in-one FASTQ preprocessor

NAME

       fastp - Ultra-fast all-in-one FASTQ preprocessor

DESCRIPTION

       fastp: an ultra-fast all-in-one FASTQ preprocessor, version 0.23.0

       usage: fastp [options] ...

       options:

       -i, --in1
              read1 input file name (string [=])

       -o, --out1
              read1 output file name (string [=])

       -I, --in2
              read2 input file name (string [=])

       -O, --out2
              read2 output file name (string [=])

       --unpaired1
              for PE input, if read1 passed QC but read2 not, it will be  written  to  unpaired1.
              Default is to discard it. (string [=])

       --unpaired2
              for PE input, if read2 passed QC but read1 not, it will be written to unpaired2. If
              --unpaired2 is same as --unpaired1 (default mode),  both  unpaired  reads  will  be
              written to this same file. (string [=])

       --overlapped_out
              for each read pair, output the overlapped region if it contains no mismatched
              bases. (string [=])

       --failed_out
              specify the file to store reads that cannot pass the filters. (string [=])
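
              The routing of paired-end reads implied by --out1/--out2, --unpaired1/--unpaired2
              and --failed_out can be pictured with the Python sketch below. It is illustrative
              only and is not fastp's code: the function name route_pair and the destination
              labels are hypothetical, and it assumes that a mate failing QC is sent to
              --failed_out. Reads routed to an option that was not given are simply not written.

```python
def route_pair(pass1, pass2):
    """Return (destination of read1, destination of read2) for one pair.

    Each label names the option whose file would receive the read.
    Hypothetical helper for illustration; not part of fastp.
    """
    if pass1 and pass2:
        return ("out1", "out2")
    if pass1:
        # read1 passed alone; read2 failed the filters (assumed: failed_out)
        return ("unpaired1", "failed_out")
    if pass2:
        # read2 passed alone; read1 failed the filters (assumed: failed_out)
        return ("failed_out", "unpaired2")
    return ("failed_out", "failed_out")
```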

       -m, --merge
              for paired-end input, merge each pair of reads into  a  single  read  if  they  are
              overlapped. The merged reads will be written to the file given by --merged_out, the
              unmerged reads will be written to the files specified by  --out1  and  --out2.  The
              merging mode is disabled by default.

       --merged_out
              in  the  merging  mode,  specify  the  file name to store merged output, or specify
              --stdout to stream the merged output (string [=])

       --include_unmerged
              in the merging mode, write the unmerged or unpaired reads to the file specified  by
              --merged_out. Disabled by default.

       -6, --phred64
              indicate  the input is using phred64 scoring (it'll be converted to phred33, so the
              output will still be phred33)
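
              The conversion amounts to shifting every quality character by the difference between
              the two ASCII offsets (64 - 33 = 31). A minimal Python sketch, with a hypothetical
              function name, assuming the input string really is phred64-encoded:

```python
def phred64_to_phred33(qual):
    """Re-encode a phred64 quality string as phred33.

    Hypothetical helper for illustration; not part of fastp.
    """
    # Each character encodes Q + 64 on input and Q + 33 on output.
    return "".join(chr(ord(c) - 31) for c in qual)
```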

       -z, --compression
              compression level for gzip output (1 ~ 9). 1 is fastest, 9 is smallest, default  is
              4. (int [=4])

       --stdin
              input  from  STDIN.  If  the STDIN is interleaved paired-end FASTQ, please also add
              --interleaved_in.

       --stdout
              stream passing-filters reads to STDOUT. This  option  will  result  in  interleaved
              FASTQ output for paired-end output. Disabled by default.

       --interleaved_in
              indicate  that  <in1>  is an interleaved FASTQ which contains both read1 and read2.
              Disabled by default.

       --reads_to_process
              specify how many reads/pairs to be processed. Default 0 means  process  all  reads.
              (int [=0])

       --dont_overwrite
              don't overwrite existing files. Overwriting is allowed by default.

       --fix_mgi_id
              the  MGI  FASTQ  ID  format is not compatible with many BAM operation tools, enable
              this option to fix it.

       -V, --verbose
              output verbose log information (e.g. each time 1M reads are processed).

       -A, --disable_adapter_trimming
              adapter trimming is enabled by  default.  If  this  option  is  specified,  adapter
              trimming is disabled

       -a, --adapter_sequence
              the  adapter  for  read1.  For  SE  data,  if  not  specified,  the adapter will be
              auto-detected. For PE data, this is used if R1/R2 are found not overlapped. (string
              [=auto])

       --adapter_sequence_r2
              the  adapter  for  read2  (PE  data  only).  This  is  used  if R1/R2 are found not
              overlapped. If not specified, it will be the  same  as  <adapter_sequence>  (string
              [=auto])

       --adapter_fasta
              specify  a  FASTA file to trim both read1 and read2 (if PE) by all the sequences in
              this FASTA file (string [=])

       --detect_adapter_for_pe
              by default, the auto-detection for adapter is for SE data input only, turn on  this
              option to enable it for PE data.

       -f, --trim_front1
              number of bases to trim from the front of read1, default is 0 (int [=0])

       -t, --trim_tail1
              number of bases to trim from the tail of read1, default is 0 (int [=0])

       -b, --max_len1
              if read1 is longer than max_len1, then trim read1 at its tail to make it as long as
              max_len1. Default 0 means no limitation (int [=0])
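
              The combined effect of --trim_front1, --trim_tail1 and --max_len1 can be sketched in
              Python as below. The function name trim_fixed is hypothetical, and the order of
              operations (front and tail trimming first, then the length cap) is an assumption of
              this sketch rather than something this page states.

```python
def trim_fixed(seq, front=0, tail=0, max_len=0):
    """Apply fixed front/tail trimming, then cap the length at max_len.

    Hypothetical helper for illustration; max_len == 0 means no limit.
    """
    end = len(seq) - tail
    trimmed = seq[front:end] if end > front else ""
    if max_len > 0:
        # Excess bases are removed from the tail, as described for --max_len1.
        trimmed = trimmed[:max_len]
    return trimmed
```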

       -F, --trim_front2
              number of bases to trim from the front of read2. If not specified, it follows
              read1's setting (int [=0])

       -T, --trim_tail2
              number of bases to trim from the tail of read2. If not specified, it follows
              read1's setting (int [=0])

       -B, --max_len2
              if read2 is longer than max_len2, then trim read2 at its tail to make it as long as
              max_len2.  Default  0  means  no  limitation. If it's not specified, it will follow
              read1's settings (int [=0])

       -D, --dedup
              enable deduplication to drop the duplicated reads/pairs

       --dup_calc_accuracy
              accuracy level to calculate duplication (1~6), higher level uses more  memory  (1G,
              2G,  4G,  8G,  16G,  24G).  Default 1 for no-dedup mode, and 3 for dedup mode. (int
              [=0])

       --dont_eval_duplication
              don't evaluate duplication rate to save time and use less memory.

       -g, --trim_poly_g
              force polyG tail  trimming,  by  default  trimming  is  automatically  enabled  for
              Illumina NextSeq/NovaSeq data

       --poly_g_min_len
              the minimum length to detect polyG in the read tail. 10 by default. (int [=10])

       -G, --disable_trim_poly_g
              disable  polyG  tail  trimming,  by  default  trimming is automatically enabled for
              Illumina NextSeq/NovaSeq data

       -x, --trim_poly_x
              enable polyX trimming in 3' ends.

       --poly_x_min_len
              the minimum length to detect polyX in the read tail. 10 by default. (int [=10])
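
              As a rough illustration of what --poly_g_min_len and --poly_x_min_len control, the
              Python sketch below removes a homopolymer run from the read tail when it is at least
              min_len bases long. It is a simplification with hypothetical function names: it
              requires an exact run with no mismatches, which may be stricter than fastp's own
              detection.

```python
def trim_poly_tail(seq, base="G", min_len=10):
    """Remove a trailing run of `base` if it is at least min_len long."""
    run = len(seq) - len(seq.rstrip(base))
    return seq[:len(seq) - run] if run >= min_len else seq


def trim_poly_x(seq, min_len=10):
    """Try each base in turn and trim the first qualifying tail run."""
    for base in "ACGT":
        trimmed = trim_poly_tail(seq, base, min_len)
        if trimmed != seq:
            return trimmed
    return seq
```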

       -5, --cut_front
              move a sliding window from front (5') to tail, drop the bases in the window if  its
              mean quality < threshold, stop otherwise.

       -3, --cut_tail
              move  a sliding window from tail (3') to front, drop the bases in the window if its
              mean quality < threshold, stop otherwise.

       -r, --cut_right
              move a sliding window from front to tail; on meeting a window with mean quality <
              threshold, drop the bases in that window and everything to its right, then stop.
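
              The three sliding-window modes can be compared with the Python sketch below, which
              works on a list of per-base quality scores. It is a sketch under stated assumptions,
              not fastp's implementation: the function names are hypothetical, the window is
              assumed to advance one base at a time, and a final partial window is not evaluated.

```python
def window_mean(quals, start, size):
    """Mean quality of the window quals[start:start + size]."""
    return sum(quals[start:start + size]) / size


def cut_front(quals, size=4, threshold=20):
    """Index of the first base kept after 5' trimming."""
    start = 0
    while start + size <= len(quals) and window_mean(quals, start, size) < threshold:
        start += 1  # assumed step of one base
    return start


def cut_tail(quals, size=4, threshold=20):
    """Index one past the last base kept after 3' trimming."""
    end = len(quals)
    while end - size >= 0 and window_mean(quals, end - size, size) < threshold:
        end -= 1  # assumed step of one base
    return end


def cut_right(quals, size=4, threshold=20):
    """Index one past the last base kept: cut at the first failing window."""
    for start in range(len(quals) - size + 1):
        if window_mean(quals, start, size) < threshold:
            return start
    return len(quals)
```

              With the default window size of 4 and threshold of 20, a read whose qualities are
              [2, 2, 2, 2, 30, 30, 30, 30, 30, 30] gives cut_front(...) == 3 in this sketch: the
              window starting at index 3 is the first whose mean reaches the threshold.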

       -W, --cut_window_size
              the window size option shared by cut_front, cut_tail and cut_right. Range: 1~1000,
              default: 4 (int [=4])

       -M, --cut_mean_quality
              the mean quality requirement option shared by cut_front, cut_tail and cut_right.
              Range: 1~36, default: 20 (Q20) (int [=20])

       --cut_front_window_size
              the  window  size  option of cut_front, default to cut_window_size if not specified
              (int [=4])

       --cut_front_mean_quality
              the mean quality requirement option for cut_front, default to  cut_mean_quality  if
              not specified (int [=20])

       --cut_tail_window_size
              the  window  size  option  of cut_tail, default to cut_window_size if not specified
              (int [=4])

       --cut_tail_mean_quality
              the mean quality requirement option for cut_tail, default  to  cut_mean_quality  if
              not specified (int [=20])

       --cut_right_window_size
              the  window  size  option of cut_right, default to cut_window_size if not specified
              (int [=4])

       --cut_right_mean_quality
              the mean quality requirement option for cut_right, default to  cut_mean_quality  if
              not specified (int [=20])

       -Q, --disable_quality_filtering
              quality  filtering  is  enabled  by  default.  If this option is specified, quality
              filtering is disabled

       -q, --qualified_quality_phred
              the quality value at or above which a base counts as qualified. Default 15 means
              phred quality >=Q15 is qualified. (int [=15])

       -u, --unqualified_percent_limit
              the percentage of bases allowed to be unqualified (0~100). Default 40 means
              40% (int [=40])

       -n, --n_base_limit
              if the number of N bases in a read is >n_base_limit, the read/pair is discarded.
              Default is 5 (int [=5])

       -e, --average_qual
              if  one  read's  average quality score <avg_qual, then this read/pair is discarded.
              Default 0 means no requirement (int [=0])
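
              Taken together, -q, -u, -n and -e describe a per-read test that can be sketched in
              Python as below. The function name and parameter names are hypothetical, and the
              handling of the exact boundary (a read with exactly the allowed percentage of
              unqualified bases is kept) is an assumption of this sketch.

```python
def passes_quality_filter(seq, quals, qualified=15, unqualified_pct=40,
                          n_limit=5, avg_qual=0):
    """Return True if a read survives the quality filters described above.

    seq is the base string, quals the per-base phred scores (same length).
    Hypothetical helper for illustration; not part of fastp.
    """
    if not seq:
        return False
    unqualified = sum(1 for q in quals if q < qualified)
    if 100.0 * unqualified / len(quals) > unqualified_pct:
        return False
    if seq.count("N") > n_limit:
        return False
    # avg_qual == 0 means no average-quality requirement.
    if avg_qual > 0 and sum(quals) / len(quals) < avg_qual:
        return False
    return True
```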

       -L, --disable_length_filtering
              length filtering is enabled  by  default.  If  this  option  is  specified,  length
              filtering is disabled

       -l, --length_required
              reads shorter than length_required will be discarded, default is 15. (int [=15])

       --length_limit
              reads  longer  than  length_limit will be discarded, default 0 means no limitation.
              (int [=0])

       -y, --low_complexity_filter
              enable low complexity filter. The complexity is defined as the percentage of bases
              that differ from their next base (base[i] != base[i+1]).

       -Y, --complexity_threshold
              the  threshold  for  low  complexity filter (0~100). Default is 30, which means 30%
              complexity is required. (int [=30])
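
              The complexity definition above translates directly into the Python sketch below.
              The function name is hypothetical, and the choice of denominator (the number of
              adjacent base pairs, i.e. length - 1) is an assumption; this page does not state it.

```python
def complexity(seq):
    """Percentage of bases that differ from the base following them."""
    if len(seq) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    # Assumed denominator: number of adjacent pairs.
    return 100.0 * changes / (len(seq) - 1)
```

              Under this sketch a homopolymer such as "AAAA" scores 0 and would be rejected by the
              default threshold of 30, while "ACGT" scores 100.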

       --filter_by_index1
              specify a file containing a list of index1 barcodes to be filtered out, one
              barcode per line (string [=])

       --filter_by_index2
              specify a file containing a list of index2 barcodes to be filtered out, one
              barcode per line (string [=])

       --filter_by_index_threshold
              the allowed difference of index  barcode  for  index  filtering,  default  0  means
              completely identical. (int [=0])
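
              One way to read --filter_by_index_threshold is as a maximum number of mismatched
              positions between a read's index and a listed barcode. The Python sketch below
              follows that reading; the function names are hypothetical, and counting a length
              difference as extra mismatches is an assumption of the sketch.

```python
def mismatches(a, b):
    """Number of differing positions, plus any difference in length."""
    return sum(1 for x, y in zip(a, b) if x != y) + abs(len(a) - len(b))


def index_is_filtered(index, barcodes, threshold=0):
    """True if the index is within `threshold` mismatches of any barcode."""
    return any(mismatches(index, bc) <= threshold for bc in barcodes)
```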

       -c, --correction
              enable  base  correction  in  overlapped  regions  (only  for  PE data), default is
              disabled

       --overlap_len_require
              the minimum length to detect overlapped  region  of  PE  reads.  This  will  affect
              overlap  analysis  based  PE merge, adapter trimming and correction. 30 by default.
              (int [=30])

       --overlap_diff_limit
              the maximum number of mismatched bases to detect overlapped  region  of  PE  reads.
              This  will affect overlap analysis based PE merge, adapter trimming and correction.
              5 by default. (int [=5])

       --overlap_diff_percent_limit
              the maximum percentage of mismatched bases to detect overlapped region of PE reads.
              This  will affect overlap analysis based PE merge, adapter trimming and correction.
              Default 20 means 20%. (int [=20])
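
              The three overlap options can be read as an acceptance test on a candidate overlapped
              region. The Python sketch below shows one such test. It is not fastp's overlap
              analysis: the function name is hypothetical, the two regions are assumed to be
              already aligned and in the same orientation, and it assumes that both the absolute
              and the percentage mismatch limits must hold.

```python
def overlap_accepted(region1, region2, len_require=30, diff_limit=5,
                     diff_percent_limit=20):
    """Decide whether two aligned regions qualify as an overlap."""
    length = min(len(region1), len(region2))
    if length < len_require:
        return False
    diff = sum(1 for a, b in zip(region1, region2) if a != b)
    # Assumed: both limits apply at the same time.
    return diff <= diff_limit and 100.0 * diff / length <= diff_percent_limit
```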

       -U, --umi
              enable unique molecular identifier (UMI) preprocessing

       --umi_loc
              specify the location of UMI, can be one of
              index1/index2/read1/read2/per_index/per_read, default is none (string [=])

       --umi_len
              if the UMI is in read1/read2, its length should be provided (int [=0])

       --umi_prefix
              if specified, an underscore will be used to connect prefix and UMI (e.g. prefix=UMI,
              UMI=AATTCG, final=UMI_AATTCG). No prefix by default (string [=])

       --umi_skip
              if the UMI is in read1/read2, fastp can skip several bases following  UMI,  default
              is 0 (int [=0])
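
              How --umi_len, --umi_skip and --umi_prefix interact for a UMI located in a read can
              be sketched in Python as below. The function name is hypothetical, and the sketch
              assumes the UMI sits at the start of the read; how the resulting tag is attached to
              the read name is not covered here.

```python
def extract_umi(seq, umi_len, umi_skip=0, prefix=""):
    """Split a read into (UMI tag, remaining sequence).

    Hypothetical helper for illustration; not part of fastp.
    """
    umi = seq[:umi_len]
    # Skip the requested number of bases that follow the UMI.
    rest = seq[umi_len + umi_skip:]
    tag = f"{prefix}_{umi}" if prefix else umi
    return tag, rest
```

              For example, extract_umi("AATTCGGGACGT", 6, umi_skip=2, prefix="UMI") yields the tag
              "UMI_AATTCG" and the remaining sequence "ACGT".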

       -p, --overrepresentation_analysis
              enable overrepresented sequence analysis.

       -P, --overrepresentation_sampling
              one    in    (--overrepresentation_sampling)    reads    will   be   computed   for
              overrepresentation analysis (1~10000), smaller  is  slower,  default  is  20.  (int
              [=20])

       -j, --json
              the json format report file name (string [=fastp.json])

       -h, --html
              the html format report file name (string [=fastp.html])

       -R, --report_title
              the report title; should be quoted with ' or ", default is "fastp report" (string
              [=fastp report])

       -w, --thread
              worker thread number, default is 3 (int [=3])

       -s, --split
              split output into at most this many files (2~999). A sequential number prefix
              will be added to each output name (0001.out.fq, 0002.out.fq, ...). Disabled by
              default (int [=0])

       -S, --split_by_lines
              split output by limiting each file to this many lines (>=1000). A sequential
              number prefix will be added to each output name (0001.out.fq, 0002.out.fq, ...).
              Disabled by default (long [=0])

       -d, --split_prefix_digits
              the  digits for the sequential number padding (1~10), default is 4, so the filename
              will be padded as 0001.xxx, 0 to disable padding (int [=4])
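
              The naming scheme for split output can be sketched in Python as below. The function
              name is hypothetical, and the sketch only reproduces the pattern shown in the
              examples above (a zero-padded number, a dot, then the output name).

```python
def split_name(number, out_name, digits=4):
    """Build a split output file name such as 0001.out.fq.

    digits == 0 disables padding. Hypothetical helper for illustration.
    """
    return f"{str(number).zfill(digits)}.{out_name}"
```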

       --cut_by_quality5
              DEPRECATED, use --cut_front instead.

       --cut_by_quality3
              DEPRECATED, use --cut_tail instead.

       --cut_by_quality_aggressive
              DEPRECATED, use --cut_right instead.

       --discard_unmerged
              DEPRECATED, no effect now, see the introduction for merging.

       -?, --help
              print this message

AUTHOR

        This manpage was written by Nilesh Patra for the Debian distribution and
        can be used for any other usage of the program.