
Provided by: flash_1.2.11-2_amd64

NAME

       flash - Fast Length Adjustment of SHort reads

SYNOPSIS

       flash [OPTIONS] MATES_1.FASTQ MATES_2.FASTQ

       flash [OPTIONS] --interleaved-input (MATES.FASTQ | -)

       flash [OPTIONS] --tab-delimited-input (MATES.TAB | -)

DESCRIPTION

       FLASH (Fast Length Adjustment of SHort reads) is an accurate and fast tool for merging
       paired-end reads generated from DNA fragments shorter than twice the read length.  Merging
       a read pair yields a single, longer unpaired read.  Such longer reads are generally
       preferable in genome assembly and genome analysis.

       Briefly, the FLASH algorithm considers all possible overlaps at or above a minimum  length
       between  the  reads  in a pair and chooses the overlap that results in the lowest mismatch
       density (proportion of mismatched bases in the overlapped region).  Ties between  multiple
       overlaps  are  broken  by considering quality scores at mismatch sites.  When building the
       merged sequence, FLASH computes a consensus sequence in the overlapped region.  More
       details can be found in the original publication
       (http://bioinformatics.oxfordjournals.org/content/27/21/2957.full).
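       The overlap-selection rule described above can be sketched in Python.  This is a
       simplified illustration, not FLASH's implementation.  It makes the following assumptions:
       read 2 has already been reverse-complemented; only overlaps where the end of read 1 meets
       the start of read 2 are tried; 'N' handling and the --max-overlap window are omitted; and
       density ties go to the overlap whose mismatches fall on lower-quality bases, as a
       stand-in for FLASH's quality-based tie-breaking.  The names best_overlap and merge are
       hypothetical.

```python
# Hypothetical sketch of FLASH-style overlap selection; not FLASH's own code.
# Assumptions: read2_rc is read 2 already reverse-complemented, only overlaps
# where the end of read 1 meets the start of read 2 are tried, 'N' handling and
# the --max-overlap window are omitted, and density ties are broken by the
# lower sum of mismatch qualities (a stand-in for FLASH's tie-breaking).


def best_overlap(read1, qual1, read2_rc, qual2_rc, min_overlap=10, max_density=0.25):
    """Return the overlap length with the lowest mismatch density, or None."""
    best_key = None  # (density, mismatch_quality_sum) of the best overlap so far
    best_len = None
    for olen in range(min_overlap, min(len(read1), len(read2_rc)) + 1):
        start = len(read1) - olen  # first overlapped position in read 1
        mismatches = 0
        mismatch_qual = 0
        for i in range(olen):
            if read1[start + i] != read2_rc[i]:
                mismatches += 1
                mismatch_qual += min(qual1[start + i], qual2_rc[i])
        density = mismatches / olen
        if density > max_density:
            continue  # too many mismatches, as with -x/--max-mismatch-density
        key = (density, mismatch_qual)
        if best_key is None or key < best_key:
            best_key, best_len = key, olen
    return best_len


def merge(read1, read2_rc, olen):
    """Join the reads on the chosen overlap (consensus calling omitted)."""
    return read1 + read2_rc[olen:]


r1, r2_rc = "ACGTACGTTTGCAGGC", "TTGCAGGCAAAACCCC"
quals = [30] * 16
olen = best_overlap(r1, quals, r2_rc, quals, min_overlap=5)
print(olen, merge(r1, r2_rc, olen))  # prints: 8 ACGTACGTTTGCAGGCAAAACCCC
```

       A real merger would also build a consensus over the overlapped bases.  The merge helper
       simply keeps read 1's bases there.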

   Limitations of FLASH include:
              - FLASH cannot merge paired-end reads that do not overlap.

              - FLASH is not designed for data with a significant number of indel errors (such
              as Sanger sequencing data).  It is best suited for Illumina data.

MANDATORY INPUT

       The  most  common  input  to FLASH is two FASTQ files containing read 1 and read 2 of each
       mate pair, respectively, in the same order.

       Alternatively, you may provide a single file (or "-" for standard input) containing
       paired-end reads in either interleaved FASTQ format (see the --interleaved-input option)
       or tab-delimited format (see the --tab-delimited-input option).  In all cases, gzip
       compressed input is detected automatically, and the PHRED offset is assumed to be 33 by
       default; use the --phred-offset option to change it.
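       The following Python sketch shows one way to read interleaved FASTQ input with gzip
       autodetection.  It makes three assumptions, none of which necessarily match FLASH's
       internals:
              - Compression is detected from the two-byte gzip magic number (0x1f 0x8b).
              - Only named files are handled, not "-" for standard input.
              - Every record is exactly four lines.
       The function names are hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_maybe_gzip(path):
    """Open path for text reading, decompressing if it starts with the gzip magic."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fastq(handle):
    """Yield (tag, seq, qual) from strict four-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of file (or trailing blank line)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:].rstrip("\n"), seq, qual


def read_interleaved_pairs(handle):
    """Pair consecutive records: read 1 of a pair, then its read 2."""
    records = read_fastq(handle)
    for read1 in records:
        read2 = next(records, None)
        if read2 is None:
            raise ValueError("odd number of records in interleaved input")
        yield read1, read2
```

       Because detection looks at the file contents rather than the name, a gzip file without a
       .gz extension is still decompressed.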

OUTPUT

       The default output of FLASH consists of the following files:

       - out.extendedFrags.fastq
              The merged reads.

       - out.notCombined_1.fastq
              Read 1 of mate pairs that were not merged.

       - out.notCombined_2.fastq
              Read 2 of mate pairs that were not merged.

       - out.hist
              Numeric histogram of merged read lengths.

       - out.histogram
              Visual histogram of merged read lengths.
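       For illustration, a numeric histogram can be recomputed from out.extendedFrags.fastq.
       This Python sketch assumes four-line FASTQ records.  It writes one 'length<TAB>count' line
       per observed length; that layout is an assumption about out.hist, not a specification of
       it.

```python
from collections import Counter


def length_histogram(fastq_lines):
    """Count merged-read lengths from FASTQ lines (four lines per record)."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the sequence line of each record
            counts[len(line.rstrip("\n"))] += 1
    return counts


def format_hist(counts):
    """Render counts as 'length<TAB>count' lines sorted by length (assumed layout)."""
    return "".join(f"{length}\t{n}\n" for length, n in sorted(counts.items()))
```

       Example: with open("out.extendedFrags.fastq") as f: print(format_hist(length_histogram(f)), end="").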

       FLASH also logs informational messages to standard output.  These can also be saved to a
       file, for example with tee(1), which keeps them visible on the terminal:

              $ flash reads_1.fq reads_2.fq 2>&1 | tee flash.log

       In addition, FLASH supports several features affecting the output:

              - Writing the merged reads directly to standard output (--to-stdout)

              - Writing gzip compressed output files (-z) or using an external
                compression program (--compress-prog)

              - Writing the uncombined read pairs in interleaved FASTQ format
                (--interleaved-output)

              - Writing all output reads to a single file in tab-delimited format
                (--tab-delimited-output)

OPTIONS

       -m, --min-overlap=NUM
              The minimum overlap length between two reads required for the overlap to be
              considered confident.  Default: 10bp.

       -M, --max-overlap=NUM
              Maximum overlap length expected in approximately 90% of read pairs.  The default of
              65bp works well for 100bp reads generated from a 180bp library, assuming a normal
              distribution of fragment lengths.  Overlaps longer than the maximum overlap are
              still considered good overlaps.  However, the mismatch density (explained below) is
              then calculated over only the first max_overlap bases of the overlapped region
              rather than over the entire overlap.  Default: 65bp, or calculated from the
              specified read length, fragment length, and fragment length standard deviation.

       -x, --max-mismatch-density=NUM
              Maximum allowed ratio of the number of mismatched base pairs to the overlap
              length.  Two reads will not be combined with a given overlap if that overlap
              results in a mismatched base density higher than this value.  Note: any occurrence
              of an 'N' in either read is ignored and counted towards neither the mismatches nor
              the overlap length.  Our experimental results suggest that higher values of the
              maximum mismatch density yield more correctly merged read pairs, but at the
              expense of more incorrectly merged read pairs.  Default: 0.25.
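              The density rule, combined with the --max-overlap window, might look like this in
              Python.  The function takes the two already-aligned overlap strings.  Skipping 'N'
              positions follows the note above.  Two details are assumptions, not documented
              FLASH behavior: the max_overlap window is applied before 'N' positions are
              skipped, and an overlap with nothing left to score gets an infinite density.  The
              function names are hypothetical.

```python
def mismatch_density(overlap1, overlap2, max_overlap=65):
    """Mismatches divided by scored overlap length (sketch)."""
    if len(overlap1) != len(overlap2):
        raise ValueError("overlap strings must have equal length")
    mismatches = 0
    scored = 0
    # Assumption: the window is the first max_overlap positions, counted
    # before 'N' positions are skipped.
    for a, b in zip(overlap1[:max_overlap], overlap2[:max_overlap]):
        if a == "N" or b == "N":
            continue  # 'N' counts toward neither mismatches nor length
        scored += 1
        if a != b:
            mismatches += 1
    # Assumption: an overlap with nothing left to score is unusable.
    return mismatches / scored if scored else float("inf")


def overlap_allowed(density, max_mismatch_density=0.25):
    """Reject an overlap only when its density exceeds -x (default 0.25)."""
    return density <= max_mismatch_density


d = mismatch_density("ACGTNACGTA", "ACCTNACGTT")
print(round(d, 3), overlap_allowed(d))  # prints: 0.222 True
```

              In the example, the 'N' column is dropped, leaving 2 mismatches over 9 scored
              bases.  That density is below the default threshold of 0.25.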

       -O, --allow-outies
              Also try combining read pairs in the "outie" orientation, e.g.

                     Read 1: <-----------
                     Read 2:       ------------>

              as opposed to only the "innie" orientation, e.g.

                     Read 1:       <------------
                     Read 2: ----------->

              FLASH uses the same parameters when trying each orientation.  If a read pair can
              be combined in both "innie" and "outie" orientations, the better-fitting one will
              be chosen using the same scoring algorithm that FLASH normally uses.

              This option also causes extra .innie and .outie histogram files to be produced.

       -p, --phred-offset=OFFSET
              The  smallest  ASCII  value  of  the characters used to represent quality values of
              bases in FASTQ files.  It should be set to either  33,  which  corresponds  to  the
              later  Illumina  platforms  and  Sanger  platforms, or 64, which corresponds to the
              earlier Illumina platforms.  Default: 33.
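              As a small illustration of what the offset means, this Python sketch turns a
              FASTQ quality string into integer scores by subtracting the offset from each
              character's ASCII value.  The function name and the error on negative scores are
              choices made for this sketch, not FLASH behavior.

```python
def decode_quals(qual_string, phred_offset=33):
    """Convert FASTQ quality characters to integer PHRED scores."""
    scores = [ord(ch) - phred_offset for ch in qual_string]
    if any(score < 0 for score in scores):
        # A negative score usually means the wrong offset was chosen.
        raise ValueError(f"quality character below offset {phred_offset}")
    return scores


print(decode_quals("I#5"))    # prints: [40, 2, 20]
print(decode_quals("h", 64))  # prints: [40]
```

              The same score of 40 is written as 'I' with offset 33 but as 'h' with offset 64.
              Reading data with the wrong offset therefore shifts every score by 31.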

       -r, --read-len=LEN

       -f, --fragment-len=LEN

       -s, --fragment-len-stddev=LEN
              Average read length, fragment length, and fragment length standard deviation.
              These are convenience parameters, used only to calculate the maximum overlap
              (--max-overlap) parameter.  The maximum overlap is calculated as the overlap of
              average-length reads from an average-size fragment plus 2.5 times the fragment
              length standard deviation.  The default values are -r 100, -f 180, and -s 18, so
              this works out to a maximum overlap of 65bp.  If --max-overlap is specified, the
              specified value overrides the calculated value.

              If you do not know the standard deviation of the fragment library, you can
              probably assume that it is 10% of the average fragment length.
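              The calculation above can be written out as a short Python function.  The overlap
              of two average-length reads from an average-size fragment is 2 * read_len -
              fragment_len, so the defaults give (200 - 180) + 2.5 * 18 = 65.  Rounding to the
              nearest integer is an assumption.  Defaulting a missing standard deviation to 10%
              of the fragment length applies the rule of thumb above.  The function name is
              hypothetical.

```python
def max_overlap(read_len=100, fragment_len=180, fragment_len_stddev=None):
    """Max overlap implied by -r, -f and -s (sketch; rounding is assumed)."""
    if fragment_len_stddev is None:
        # Rule of thumb from this page: 10% of the average fragment length.
        fragment_len_stddev = 0.1 * fragment_len
    # Overlap of average-length reads from an average-size fragment,
    # plus 2.5 fragment-length standard deviations.
    return round(2 * read_len - fragment_len + 2.5 * fragment_len_stddev)


print(max_overlap())          # prints: 65   (the defaults -r 100 -f 180 -s 18)
print(max_overlap(150, 300))  # prints: 75   (-s assumed to be 30)
```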

       --cap-mismatch-quals
              Cap  quality  scores  assigned  at  mismatch  locations to 2.  This was the default
              behavior in FLASH v1.2.7 and earlier.  Later versions will instead  calculate  such
              scores  as  max(|q1  -  q2|,  2);  that is, the absolute value of the difference in
              quality scores, but at least 2.  Essentially,  the  new  behavior  prevents  a  low
              quality  base  call  that  is likely a sequencing error from significantly bringing
              down the quality of a high quality, likely correct base call.
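              The two behaviors for a single mismatched position are contrasted in this Python
              sketch.  The quality rule follows the description above.  Keeping the base from
              the higher-quality read, with read 1 winning ties, is an assumption.  The
              function name is hypothetical.

```python
def mismatch_consensus(base1, qual1, base2, qual2, cap_mismatch_quals=False):
    """Consensus base and quality where the two reads disagree (sketch)."""
    # Assumption: keep the base from the read with the higher quality score,
    # preferring read 1 on a tie.
    base = base1 if qual1 >= qual2 else base2
    # Behavior after v1.2.7 as described above: |q1 - q2|, but at least 2.
    qual = max(abs(qual1 - qual2), 2)
    if cap_mismatch_quals:
        # --cap-mismatch-quals restores the old behavior: cap the score at 2.
        qual = min(qual, 2)
    return base, qual


print(mismatch_consensus("A", 40, "C", 10))  # prints: ('A', 30)
print(mismatch_consensus("A", 5, "T", 38, cap_mismatch_quals=True))  # prints: ('T', 2)
```

              With the newer rule, a confident call disagreeing with a likely error keeps a
              useful score (30 above), instead of dropping to 2.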

       --interleaved-input
              Instead of requiring files MATES_1.FASTQ and MATES_2.FASTQ,  allow  a  single  file
              MATES.FASTQ  that  has  the paired-end reads interleaved.  Specify "-" to read from
              standard input.

       --interleaved-output
              Write the uncombined pairs in interleaved FASTQ format.

       -I, --interleaved
              Equivalent to specifying both --interleaved-input and --interleaved-output.

       -Ti, --tab-delimited-input
              Assume the input is in tab-delimited  format  rather  than  FASTQ,  in  the  format
              described  below  in  '--tab-delimited-output'.   In this mode you should provide a
              single input file, each line of which must contain either a read pair (5 fields) or
              a  single read (3 fields).  FLASH will try to combine the read pairs.  Single reads
              will be written to the output file  as-is  if  also  using  --tab-delimited-output;
              otherwise they will be ignored.  Note that you may specify "-" as the input file to
              read the tab-delimited data from standard input.

       -To, --tab-delimited-output
              Write output in tab-delimited format (not FASTQ).  Each line will contain either  a
              combined pair in the format 'tag <tab> seq <tab> qual' or an uncombined pair in the
              format 'tag <tab> seq_1 <tab> qual_1 <tab> seq_2 <tab> qual_2'.
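              The tab-delimited layout (3 fields for one read, 5 for a pair) can be parsed and
              written with this Python sketch.  The functions are hypothetical helpers for
              working with these files.  The check that each sequence and quality string have
              equal lengths is an added safety measure, not documented FLASH behavior.

```python
def parse_tab_line(line):
    """Split one tab-delimited line into (tag, [(seq, qual), ...])."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (3, 5):
        raise ValueError(f"expected 3 or 5 tab-separated fields, got {len(fields)}")
    tag, rest = fields[0], fields[1:]
    reads = [(rest[i], rest[i + 1]) for i in range(0, len(rest), 2)]
    for seq, qual in reads:
        if len(seq) != len(qual):
            raise ValueError(f"{tag}: sequence and quality lengths differ")
    return tag, reads


def format_tab_line(tag, reads):
    """Inverse of parse_tab_line: one read (3 fields) or a pair (5 fields)."""
    return "\t".join([tag] + [field for read in reads for field in read]) + "\n"


pair = parse_tab_line("r1\tACGT\tIIII\tTTGA\tHHHH\n")
print(pair)  # prints: ('r1', [('ACGT', 'IIII'), ('TTGA', 'HHHH')])
```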

       -o, --output-prefix=PREFIX
              Prefix of output files.  Default: "out".

       -d, --output-directory=DIR
              Path to directory for output files.  Default: current working directory.

       -c, --to-stdout
              Write the combined reads to standard output.  In this mode, with FASTQ output  (the
              default) the uncombined reads are discarded.  With tab-delimited output, uncombined
              reads are included in the tab-delimited data written to standard output.   In  both
              cases,  histogram  files  are  not  written, and informational messages are sent to
              standard error rather than to standard output.

       -z, --compress
              Compress the output files directly with zlib,  using  the  gzip  container  format.
              Similar  to  specifying  --compress-prog=gzip  and --suffix=gz, but may be slightly
              faster.

       --compress-prog=PROG
              Pipe the output through the compression program PROG, which will be called as
              'PROG -c -', plus any arguments specified by --compress-prog-args.  PROG must read
              uncompressed data from standard input and write compressed data to standard output
              when invoked as noted above.  Examples: gzip, bzip2, xz, pigz.

       --compress-prog-args=ARGS
              A  string of additional arguments that will be passed to the compression program if
              one is specified with --compress-prog=PROG.  (The arguments '-c -' are still passed
              in addition to explicitly specified arguments.)
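              The external-compressor contract above is sketched in Python below.  It builds an
              argument vector and pipes data through it with subprocess.  The ordering is an
              assumption: this sketch places the --compress-prog-args arguments before the
              mandatory '-c -', which FLASH may not do.  The function names are hypothetical.

```python
import shlex
import subprocess


def compress_command(prog, prog_args=""):
    """Build the argv for --compress-prog=PROG [--compress-prog-args=ARGS]."""
    # Assumption: extra arguments are placed before the mandatory '-c -'.
    return [prog, *shlex.split(prog_args), "-c", "-"]


def write_compressed(data, path, prog="gzip", prog_args=""):
    """Pipe bytes through the compressor into path (illustration only)."""
    with open(path, "wb") as out:
        subprocess.run(compress_command(prog, prog_args), input=data, stdout=out, check=True)


print(compress_command("pigz", "-p 4 -9"))  # prints: ['pigz', '-p', '4', '-9', '-c', '-']
```

              Any program used this way must accept '-c -' as "read standard input, write to
              standard output".  gzip, bzip2, xz, and pigz all do.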

       --suffix=SUFFIX, --output-suffix=SUFFIX
              Use  SUFFIX  as  the  suffix  of the output files after ".fastq".  A dot before the
              suffix is assumed, unless an empty suffix is provided.  Default: nothing;  or  'gz'
              if -z is specified; or PROG if --compress-prog=PROG is specified.
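              Combining -d, -o, and the suffix defaults gives the output file names.  This
              Python sketch illustrates the combination.  Two details are assumptions: an
              explicit --suffix overrides the defaults, and -z is checked before
              --compress-prog when both are given.  The function and parameter names are
              hypothetical.

```python
import os


def output_path(kind, prefix="out", directory=".", suffix=None,
                compress=False, compress_prog=None):
    """Path of one FASTQ output file, e.g. kind="extendedFrags" (sketch)."""
    if suffix is None:
        # Defaults from this page: nothing; 'gz' with -z; PROG with --compress-prog.
        if compress:
            suffix = "gz"
        elif compress_prog:
            suffix = compress_prog
        else:
            suffix = ""
    name = f"{prefix}.{kind}.fastq"
    if suffix:
        name += "." + suffix  # the dot before the suffix is implied
    return os.path.join(directory, name)


print(output_path("notCombined_1", prefix="s1", directory="res", compress=True))
# on POSIX systems prints: res/s1.notCombined_1.fastq.gz
```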

       -t, --threads=NTHREADS
              Set  the  number  of  worker  threads.   This  is  in  addition to the I/O threads.
              Default: number of  processors.   Note:  if  you  need  FLASH's  output  to  appear
              deterministically or in the same order as the original reads, you must specify -t 1
              (--threads=1).

       -q, --quiet
              Do not print informational messages.

       -h, --help
              Display this help and exit.

       -v, --version
              Display version.

AUTHOR

       This manpage was written by Andreas Tille for the Debian distribution, but may be used by
       others.