Provided by: flash_1.2.11-2_amd64

NAME

       flash - Fast Length Adjustment of SHort reads

SYNOPSIS

       flash [OPTIONS] MATES_1.FASTQ MATES_2.FASTQ

       flash [OPTIONS] --interleaved-input (MATES.FASTQ | -)

       flash [OPTIONS] --tab-delimited-input (MATES.TAB | -)

DESCRIPTION

       FLASH (Fast Length Adjustment of SHort reads) is an accurate and fast tool to merge paired-end reads that
       were generated from DNA fragments whose lengths are shorter than twice the length of reads.  Each merged
       read pair yields a single, longer unpaired read, which is generally more desirable for genome assembly
       and genome analysis.

       Briefly, the FLASH algorithm considers all possible overlaps at or above a  minimum  length  between  the
       reads  in  a  pair  and  chooses  the  overlap that results in the lowest mismatch density (proportion of
       mismatched bases in the overlapped region).  Ties between multiple overlaps  are  broken  by  considering
       quality scores at mismatch sites.  When building the merged sequence, FLASH computes a consensus sequence
       in   the   overlapped   region.    More   details   can   be   found   in   the   original    publication
       (http://bioinformatics.oxfordjournals.org/content/27/21/2957.full).
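
       The overlap search described above can be sketched roughly as follows.  This is a simplified
       illustration in Python, not FLASH's actual implementation: the function names are hypothetical, read 2
       is assumed to be already reverse-complemented, ties are resolved in favor of the longer overlap rather
       than by quality scores, and 'N' bases and the maximum overlap parameter are not handled.

```python
def mismatch_density(a, b):
    """Proportion of mismatched positions between two equal-length strings."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def best_overlap(read1, read2_rc, min_overlap=10):
    """Return (overlap_length, density) for the lowest-density overlap.

    read2_rc is read 2 after reverse-complementing (an assumption of this
    sketch), so the end of read1 is compared with the start of read2_rc.
    Returns None if the reads are shorter than min_overlap.
    """
    best = None
    longest = min(len(read1), len(read2_rc))
    # Try every overlap length at or above the minimum, longest first.
    for length in range(longest, min_overlap - 1, -1):
        density = mismatch_density(read1[-length:], read2_rc[:length])
        if best is None or density < best[1]:
            best = (length, density)
    return best


read1 = "ACGTACGTAAGGCCTTAGGA"
read2_rc = "AAGGCCTTAGGATTTTCCCC"
length, density = best_overlap(read1, read2_rc)
merged = read1 + read2_rc[length:]
print(length, density, merged)
```

       In this toy example the last 12 bases of read 1 match the first 12 bases of the reverse-complemented
       read 2 exactly, so the merged read is simply read 1 followed by the non-overlapping remainder of read 2.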

   Limitations of FLASH include:
              - FLASH cannot merge paired-end reads that do not overlap.

              -  FLASH  is  not  designed for data that has a significant amount of indel errors (such as Sanger
              sequencing data).  It is best suited for Illumina data.

MANDATORY INPUT

       The most common input to FLASH is two FASTQ files containing read  1  and  read  2  of  each  mate  pair,
       respectively, in the same order.

       Alternatively, you may provide a single input file, which may be standard input, containing paired-end reads
       in  either  interleaved  FASTQ  (see  the  --interleaved-input  option)   or   tab-delimited   (see   the
       --tab-delimited-input option) format.  In all cases, gzip compressed input is autodetected.  Also, in all
       cases, the PHRED offset is, by default, assumed to be 33; use the --phred-offset option to change it.

OUTPUT

       The default output of FLASH consists of the following files:

       - out.extendedFrags.fastq
              The merged reads.

       - out.notCombined_1.fastq
              Read 1 of mate pairs that were not merged.

       - out.notCombined_2.fastq
              Read 2 of mate pairs that were not merged.

       - out.hist
              Numeric histogram of merged read lengths.

       - out.histogram
              Visual histogram of merged read lengths.

       FLASH also logs informational messages to standard output.  These can also be redirected to a file, as in
       the following example:

              $ flash reads_1.fq reads_2.fq 2>&1 | tee flash.log

       In addition, FLASH supports several features affecting the output:

              - Writing the merged reads directly to standard output (--to-stdout)

              - Writing gzip compressed output files (-z) or using an external
                compression program (--compress-prog)

              - Writing the uncombined read pairs in interleaved FASTQ format
                (--interleaved-output)

              - Writing all output reads to a single file in tab-delimited format
                (--tab-delimited-output)

OPTIONS

       -m, --min-overlap=NUM
              The  minimum  required  overlap length between two reads to provide a confident overlap.  Default:
              10bp.

       -M, --max-overlap=NUM
              Maximum overlap length expected in approximately 90% of read pairs.  It is by default set to 65bp,
              which works well for 100bp reads generated from a 180bp library, assuming a normal distribution of
              fragment lengths.  Overlaps longer than the maximum overlap parameter are still considered as good
              overlaps,  but  the  mismatch  density  (explained below) is calculated over the first max_overlap
              bases in the overlapped region rather than the entire overlap.  Default: 65bp, or calculated  from
              the specified read length, fragment length, and fragment length standard deviation.

       -x, --max-mismatch-density=NUM
              Maximum  allowed  ratio  between  the number of mismatched base pairs and the overlap length.  Two
              reads will not be combined with a given overlap if that  overlap  results  in  a  mismatched  base
              density  higher  than this value.  Note: Any occurrence of an 'N' in either read is ignored and not
              counted towards the mismatches or overlap length.  Our experimental results  suggest  that  higher
              values  of the maximum mismatch density yield larger numbers of correctly merged read pairs but at
              the expense of higher numbers of incorrectly merged read pairs.  Default: 0.25.
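
              As a rough illustration of how 'N' bases affect the ratio, the following Python sketch skips any
              position where either read has an 'N', so that it counts toward neither the mismatches nor the
              overlap length.  The function names, and the choice to return 0.0 when every position is
              skipped, are assumptions made for this example and are not taken from FLASH itself.

```python
def mismatch_density_ignoring_n(a, b):
    """Mismatch density over positions where neither base is 'N'."""
    counted = 0
    mismatches = 0
    for x, y in zip(a, b):
        if x == "N" or y == "N":
            continue  # ignored: not a mismatch, not part of the length
        counted += 1
        if x != y:
            mismatches += 1
    # Assumption of this sketch: an all-'N' overlap has density 0.0.
    return mismatches / counted if counted else 0.0


def overlap_is_acceptable(a, b, max_mismatch_density=0.25):
    """True if the density does not exceed the allowed maximum."""
    return mismatch_density_ignoring_n(a, b) <= max_mismatch_density


# 10 aligned positions, 2 contain an 'N', 1 of the remaining 8 mismatches.
print(mismatch_density_ignoring_n("ACGTNACGTA", "ACGAAACGTN"))
```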

       -O, --allow-outies
              Also try combining read pairs in the "outie" orientation, e.g.

                   Read 1: <-----------
                   Read 2:       ------------>

              as opposed to only the "innie" orientation, e.g.

                   Read 1:       <------------
                   Read 2: ----------->

              FLASH uses the same parameters when trying each orientation.  If a read pair can be combined in
              both "innie" and "outie" orientations, the better-fitting one will be chosen using the same
              scoring algorithm that FLASH normally uses.

              This option also causes extra .innie and .outie histogram files to be produced.

       -p, --phred-offset=OFFSET
              The  smallest  ASCII  value  of  the characters used to represent quality values of bases in FASTQ
              files.  It should be set to either 33, which corresponds  to  the  later  Illumina  platforms  and
              Sanger platforms, or 64, which corresponds to the earlier Illumina platforms.  Default: 33.
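
              For illustration, a quality character is converted to a numeric quality value by subtracting
              the offset from its ASCII code.  The helper below is a hypothetical Python sketch of that
              conversion, not part of FLASH.

```python
def decode_qualities(qual_string, phred_offset=33):
    """Convert a FASTQ quality string to a list of numeric quality values."""
    return [ord(ch) - phred_offset for ch in qual_string]


print(decode_qualities("I5!"))                 # offset 33
print(decode_qualities("h", phred_offset=64))  # offset 64
```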

       -r, --read-len=LEN

       -f, --fragment-len=LEN

       -s, --fragment-len-stddev=LEN
              Average read length, fragment length, and fragment length standard deviation.  These are convenience
              parameters only, as they are  only  used  for  calculating  the  maximum  overlap  (--max-overlap)
              parameter.   The  maximum  overlap  is  calculated  as the overlap of average-length reads from an
              average-size fragment plus 2.5 times the fragment length standard deviation.  The  default  values
              are  -r 100, -f 180, and -s 18, so this works out to a maximum overlap of 65 bp.  If --max-overlap
              is specified, then the specified value overrides the calculated value.

              If you do not know the standard deviation of the fragment library, you can probably assume that
              the standard deviation is 10% of the average fragment length.
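
              The calculation described above can be written out as a short Python sketch.  The function
              name is hypothetical, and truncating a fractional result to an integer is an assumption of this
              example rather than something stated here.

```python
def calculated_max_overlap(read_len=100, fragment_len=180, fragment_len_stddev=18):
    """Overlap of two average-length reads from an average-size fragment,
    plus 2.5 times the fragment length standard deviation."""
    base_overlap = 2 * read_len - fragment_len
    # Assumption: fractional results are truncated toward zero.
    return int(base_overlap + 2.5 * fragment_len_stddev)


print(calculated_max_overlap())              # defaults: -r 100 -f 180 -s 18
print(calculated_max_overlap(150, 250, 25))
```

              With the defaults, the reads overlap by 2 * 100 - 180 = 20 bp on average, and adding
              2.5 * 18 = 45 bp gives the documented 65 bp.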

       --cap-mismatch-quals
              Cap  quality  scores  assigned at mismatch locations to 2.  This was the default behavior in FLASH
              v1.2.7 and earlier.  Later versions will instead calculate such scores as max(|q1 - q2|, 2);  that
              is,  the absolute value of the difference in quality scores, but at least 2.  Essentially, the new
              behavior prevents a low quality base call that is likely a  sequencing  error  from  significantly
              bringing down the quality of a high quality, likely correct base call.
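
              The newer formula can be illustrated with a few values.  This Python sketch covers only the
              max(|q1 - q2|, 2) rule stated above; the function name is hypothetical.

```python
def mismatch_quality(q1, q2):
    """Quality assigned at a mismatch site: |q1 - q2|, but at least 2."""
    return max(abs(q1 - q2), 2)


# A confident call (40) against a poor one (5) keeps most of its quality,
# while two equally confident, conflicting calls fall to the floor of 2.
print(mismatch_quality(40, 5))
print(mismatch_quality(30, 30))
```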

       --interleaved-input
              Instead  of  requiring files MATES_1.FASTQ and MATES_2.FASTQ, allow a single file MATES.FASTQ that
              has the paired-end reads interleaved.  Specify "-" to read from standard input.
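
              As an illustration of the interleaved layout, the Python sketch below groups the lines of an
              interleaved file into read pairs.  It assumes the common four-line FASTQ record layout, with
              each read 1 record immediately followed by its read 2 record; the function name is
              hypothetical.

```python
from itertools import islice


def read_interleaved_pairs(lines):
    """Yield (read1_record, read2_record), each a list of 4 FASTQ lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, 8))  # 4 lines for read 1, then 4 for read 2
        if not chunk:
            return
        if len(chunk) != 8:
            raise ValueError("incomplete read pair at end of input")
        yield chunk[:4], chunk[4:]


example = [
    "@pair1/1", "ACGT", "+", "IIII",
    "@pair1/2", "TTGA", "+", "IIII",
]
for read1, read2 in read_interleaved_pairs(example):
    print(read1[0], read2[0])
```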

       --interleaved-output
              Write the uncombined pairs in interleaved FASTQ format.

       -I, --interleaved
              Equivalent to specifying both --interleaved-input and --interleaved-output.

       -Ti, --tab-delimited-input
              Assume the input is in tab-delimited format rather than FASTQ, in the format  described  below  in
              '--tab-delimited-output'.  In this mode you should provide a single input file, each line of which
              must contain either a read pair (5 fields) or a single read (3 fields).  FLASH will try to combine
              the  read  pairs.   Single  reads  will  be  written  to  the  output  file  as-is  if  also using
              --tab-delimited-output; otherwise they will be ignored.  Note that you  may  specify  "-"  as  the
              input file to read the tab-delimited data from standard input.

       -To, --tab-delimited-output
              Write  output  in tab-delimited format (not FASTQ).  Each line will contain either a combined pair
              in the format 'tag <tab> seq <tab> qual' or an uncombined pair in  the  format  'tag  <tab>  seq_1
              <tab> qual_1 <tab> seq_2 <tab> qual_2'.
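
              A line in this format can be parsed by splitting on tabs and checking the field count.  The
              following Python sketch is only an illustration; the function name and the dictionary keys are
              choices made for this example.

```python
def parse_tab_delimited_line(line):
    """Parse one line of tab-delimited output into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 3:
        # Combined pair (or a single read): tag, seq, qual
        tag, seq, qual = fields
        return {"tag": tag, "seq": seq, "qual": qual}
    if len(fields) == 5:
        # Uncombined pair: tag, seq_1, qual_1, seq_2, qual_2
        tag, seq_1, qual_1, seq_2, qual_2 = fields
        return {"tag": tag, "seq_1": seq_1, "qual_1": qual_1,
                "seq_2": seq_2, "qual_2": qual_2}
    raise ValueError(f"expected 3 or 5 fields, got {len(fields)}")


print(parse_tab_delimited_line("read7\tACGTACGT\tIIIIIIII\n"))
print(parse_tab_delimited_line("read8\tACGT\tIIII\tTTGA\tHHHH\n"))
```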

       -o, --output-prefix=PREFIX
              Prefix of output files.  Default: "out".

       -d, --output-directory=DIR
              Path to directory for output files.  Default: current working directory.

       -c, --to-stdout
              Write  the  combined  reads to standard output.  In this mode, with FASTQ output (the default) the
              uncombined reads are discarded.  With tab-delimited output, uncombined reads are included  in  the
              tab-delimited  data  written  to standard output.  In both cases, histogram files are not written,
              and informational messages are sent to standard error rather than to standard output.

       -z, --compress
              Compress the output files directly with  zlib,  using  the  gzip  container  format.   Similar  to
              specifying --compress-prog=gzip and --suffix=gz, but may be slightly faster.

       --compress-prog=PROG
              Pipe  the  output  through the compression program PROG, which will be called as `PROG -c -', plus
              any arguments specified by --compress-prog-args.  PROG must read uncompressed data  from  standard
              input  and  write compressed data to standard output when invoked as noted above.  Examples: gzip,
              bzip2, xz, pigz.

       --compress-prog-args=ARGS
              A string of additional arguments that will  be  passed  to  the  compression  program  if  one  is
              specified  with  --compress-prog=PROG.   (The  arguments  '-c  -'  are still passed in addition to
              explicitly specified arguments.)

       --suffix=SUFFIX, --output-suffix=SUFFIX
              Use SUFFIX as the suffix of the output files after ".fastq".  A dot before the suffix is  assumed,
              unless  an  empty  suffix  is  provided.  Default: nothing; or 'gz' if -z is specified; or PROG if
              --compress-prog=PROG is specified.

       -t, --threads=NTHREADS
              Set the number of worker threads.  This is in addition to the I/O  threads.   Default:  number  of
              processors.   Note: if you need FLASH's output to appear deterministically or in the same order as
              the original reads, you must specify -t 1 (--threads=1).

       -q, --quiet
              Do not print informational messages.

       -h, --help
              Display this help and exit.

       -v, --version
              Display version.

AUTHOR

        This manpage was written by Andreas Tille for the Debian distribution and
        can be used for any other usage of the program.