Ubuntu Manpage: mason_frag_sequencing - Fragment Sequencing Simulation

Provided by: seqan-apps_2.4.0+dfsg-12ubuntu2_amd64

NAME

       mason_frag_sequencing - Fragment Sequencing Simulation

SYNOPSIS

       mason_frag_sequencing [OPTIONS] -i IN.fa -o OUT.{fa,fq} [-or OUT2.{fa,fq}]

DESCRIPTION

       Given a FASTA file with fragments, simulate sequencing thereof.

       This program is a more lightweight version of mason_simulator, without support for
       applying VCF variants or sampling fragments. SAM output is also not available. However,
       it uses the same read simulation code as the more powerful mason_simulator.

       You can use mason_frag_sequencing if you want to implement your own fragmentation
       behaviour, e.g. if you have implemented your own bias models.
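
       The synopsis above can be turned into a small wrapper when the tool is driven from a
       script. The following is a minimal sketch, not part of Mason: build_command and the file
       names are hypothetical, and only flags documented on this page (-i, -o, -or, --seed) are
       used.

```python
def build_command(fragments, out_left, out_right=None, seed=None):
    """Assemble an argument list for mason_frag_sequencing.

    Hypothetical helper for illustration; it only strings together
    flags that this man page documents.
    """
    cmd = ["mason_frag_sequencing"]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    cmd += ["-i", fragments, "-o", out_left]
    if out_right is not None:
        # Giving a second output file enables paired-end simulation.
        cmd += ["-or", out_right]
    return cmd


cmd = build_command("fragments.fa", "left.fq", out_right="right.fq", seed=42)
print(" ".join(cmd))
# To actually run it (requires mason_frag_sequencing on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

       Building the argument list separately from running it makes the command easy to log
       and to check before any simulation is started.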

OPTIONS

       -h, --help
              Display the help message.

       --version
              Display version information.

       -q, --quiet
              Low verbosity.

       -v, --verbose
              Higher verbosity.

       -vv, --very-verbose
              Highest verbosity.

       --seed INTEGER
              Seed to use for random number generator. Default: 0.

       -i, --in INPUT_FILE
              Path  to  input  file. Valid filetypes are: .sam[.*], .raw[.*], .gbk[.*], .frn[.*],
              .fq[.*], .fna[.*], .ffn[.*], .fastq[.*], .fasta[.*], .faa[.*], .fa[.*],  .embl[.*],
              and  .bam,  where  *  is  any  of  the  following extensions: gz, bz2, and bgzf for
              transparent (de)compression.

       -o, --out OUTPUT_FILE
              Output of single-end/left end  reads.  Valid  filetypes  are:  .sam[.*],  .raw[.*],
              .frn[.*],  .fq[.*],  .fna[.*], .ffn[.*], .fastq[.*], .fasta[.*], .faa[.*], .fa[.*],
              and .bam, where * is any of  the  following  extensions:  gz,  bz2,  and  bgzf  for
              transparent (de)compression.

       -or, --out-right OUTPUT_FILE
              Output  of  right  reads.  Giving this option enables paired-end simulation. Valid
              filetypes  are:  .sam[.*],  .raw[.*],  .frn[.*],   .fq[.*],   .fna[.*],   .ffn[.*],
              .fastq[.*],  .fasta[.*],  .faa[.*],  .fa[.*],  and  .bam,  where  *  is  any of the
              following extensions: gz, bz2, and bgzf for transparent (de)compression.

       --force-single-end
              Force single-end simulation although --out-right is given.

   Global Read Simulation Options:
       --seq-technology STRING
              Set sequencing technology to simulate. One of illumina, 454, and  sanger.  Default:
              illumina.

       --seq-mate-orientation STRING
              Orientation  for  paired reads.  See section Read Orientation below. One of FR, RF,
              FF, and FF2. Default: FR.

       --seq-strands STRING
              Strands to simulate from, only applicable to paired sequencing simulation.  One  of
              forward, reverse, and both. Default: both.

       --embed-read-info
              Whether or not to embed read information.

       --read-name-prefix STRING
              Read names will have this prefix. Default: simulated..

   BS-Seq Options:
       --enable-bs-seq
              Enable BS-seq simulation.

       --bs-seq-protocol STRING
              Protocol  to  use  for  BS-Seq  simulation.  One  of directional and undirectional.
              Default: directional.

       --bs-seq-conversion-rate DOUBLE
              Conversion rate for unmethylated Cs to become Ts. In range [0..1]. Default: 0.99.
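
       What a conversion rate means for a single read can be pictured with a short sketch.
       This is a simplified illustration and not Mason's implementation: bisulfite_convert is a
       hypothetical function, every C is assumed to be unmethylated, and strand and protocol
       handling are ignored.

```python
import random


def bisulfite_convert(seq, conversion_rate=0.99, rng=None):
    """Turn each C into T with probability conversion_rate.

    Simplifying assumption: all Cs are unmethylated and only the
    given strand is considered.
    """
    if not 0.0 <= conversion_rate <= 1.0:
        raise ValueError("conversion_rate must be in [0, 1]")
    rng = rng or random.Random(0)
    converted = []
    for base in seq:
        if base == "C" and rng.random() < conversion_rate:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


print(bisulfite_convert("ACGTCCGA"))
```

       With a rate of 1.0 every C is converted, and with 0.0 the sequence is returned
       unchanged.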

   Illumina Options:
       --illumina-read-length INTEGER
              Read length for Illumina simulation. In range [1..inf]. Default: 100.

       --illumina-error-profile-file INPUT_FILE
              Path to file with Illumina error profile.  The  file  must  be  a  text  file  with
              floating  point  numbers  separated  by space, each giving a positional error rate.
              Valid filetype is: .txt.

       --illumina-prob-insert DOUBLE
              Per-base probability for insertion in Illumina sequencing. In range [0..1].
              Default: 0.00005.

       --illumina-prob-deletion DOUBLE
              Per-base probability for deletion in Illumina sequencing. In range [0..1].
              Default: 0.00005.

       --illumina-prob-mismatch-scale DOUBLE
              Scaling factor for Illumina mismatch probability. In range [0..inf]. Default: 1.0.

       --illumina-prob-mismatch DOUBLE
              Average per-base mismatch probability in Illumina sequencing. In range  [0.0..1.0].
              Default: 0.004.

       --illumina-prob-mismatch-begin DOUBLE
              Per-base  mismatch  probability  of  first  base  in  Illumina sequencing. In range
              [0.0..1.0]. Default: 0.002.

       --illumina-prob-mismatch-end DOUBLE
              Per-base mismatch probability  of  last  base  in  Illumina  sequencing.  In  range
              [0.0..1.0]. Default: 0.012.

       --illumina-position-raise DOUBLE
              Point where the error curve rises in relation to read length. In range [0.0..1.0].
              Default: 0.66.

       --illumina-quality-mean-begin DOUBLE
              Mean PHRED quality for non-mismatch bases of first  base  in  Illumina  sequencing.
              Default: 40.0.

       --illumina-quality-mean-end DOUBLE
              Mean  PHRED  quality  for  non-mismatch  bases of last base in Illumina sequencing.
              Default: 39.5.

       --illumina-quality-stddev-begin DOUBLE
              Standard deviation of PHRED  quality  for  non-mismatch  bases  of  first  base  in
              Illumina sequencing. Default: 0.05.

       --illumina-quality-stddev-end DOUBLE
              Standard deviation of PHRED quality for non-mismatch bases of last base in Illumina
              sequencing. Default: 10.0.

       --illumina-mismatch-quality-mean-begin DOUBLE
              Mean PHRED quality for  mismatch  bases  of  first  base  in  Illumina  sequencing.
              Default: 40.0.

       --illumina-mismatch-quality-mean-end DOUBLE
              Mean PHRED quality for mismatch bases of last base in Illumina sequencing. Default:
              30.0.

       --illumina-mismatch-quality-stddev-begin DOUBLE
              Standard deviation of PHRED quality for mismatch bases of first  base  in  Illumina
              sequencing. Default: 3.0.

       --illumina-mismatch-quality-stddev-end DOUBLE
              Standard  deviation  of  PHRED  quality for mismatch bases of last base in Illumina
              sequencing. Default: 15.0.

       --illumina-left-template-fastq INPUT_FILE
              FASTQ file to use for a template for left-end reads. Valid filetypes are: .sam[.*],
              .raw[.*],  .gbk[.*], .frn[.*], .fq[.*], .fna[.*], .ffn[.*], .fastq[.*], .fasta[.*],
              .faa[.*], .fa[.*], .embl[.*], and .bam, where * is any of the following extensions:
              gz, bz2, and bgzf for transparent (de)compression.

       --illumina-right-template-fastq INPUT_FILE
              FASTQ  file  to  use  for  a  template  for  right-end  reads. Valid filetypes are:
              .sam[.*], .raw[.*], .gbk[.*], .frn[.*], .fq[.*],  .fna[.*],  .ffn[.*],  .fastq[.*],
              .fasta[.*], .faa[.*], .fa[.*], .embl[.*], and .bam, where * is any of the following
              extensions: gz, bz2, and bgzf for transparent (de)compression.
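
       The file expected by --illumina-error-profile-file is described above as a text file of
       space-separated floating point numbers, one per position. A minimal sketch for writing
       and re-reading such a file follows. The helper names, the file name, the choice of one
       value per base of a 100 bp read, and the straight-line profile are assumptions made for
       illustration only.

```python
import os
import tempfile


def write_error_profile(path, rates):
    """Write positional error rates as space-separated floats on one line."""
    for rate in rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"error rate out of range: {rate}")
    with open(path, "w") as handle:
        handle.write(" ".join(f"{rate:.6f}" for rate in rates) + "\n")


def read_error_profile(path):
    """Parse the file back into a list of floats."""
    with open(path) as handle:
        return [float(token) for token in handle.read().split()]


# Assumed example profile: a straight line between the documented defaults
# of --illumina-prob-mismatch-begin and --illumina-prob-mismatch-end.
READ_LENGTH = 100
BEGIN, END = 0.002, 0.012
rates = [BEGIN + (END - BEGIN) * i / (READ_LENGTH - 1) for i in range(READ_LENGTH)]

with tempfile.TemporaryDirectory() as tmp:
    profile_path = os.path.join(tmp, "illumina_profile.txt")
    write_error_profile(profile_path, rates)
    loaded = read_error_profile(profile_path)

print(len(loaded), loaded[0], loaded[-1])
```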

   Sanger Sequencing Options:
       --sanger-read-length-model STRING
              The model to use for sampling the Sanger read length. One of  normal  and  uniform.
              Default: normal.

       --sanger-read-length-min INTEGER
              The  minimal  read  length  when  the  read  length  is sampled uniformly. In range
              [0..inf]. Default: 400.

       --sanger-read-length-max INTEGER
              The maximal read length when  the  read  length  is  sampled  uniformly.  In  range
              [0..inf]. Default: 600.

       --sanger-read-length-mean DOUBLE
              The  mean  read length when the read length is sampled with normal distribution. In
              range [0..inf]. Default: 400.

       --sanger-read-length-error DOUBLE
              The read length standard deviation when the read length is sampled with normal
              distribution. In range [0..inf]. Default: 40.

       --sanger-prob-mismatch-scale DOUBLE
              Scaling factor for Sanger mismatch probability. In range [0..inf]. Default: 1.0.

       --sanger-prob-mismatch-begin DOUBLE
              Per-base  mismatch  probability  of  first  base  in  Sanger  sequencing.  In range
              [0.0..1.0]. Default: 0.005.

       --sanger-prob-mismatch-end DOUBLE
              Per-base  mismatch  probability  of  last  base  in  Sanger  sequencing.  In  range
              [0.0..1.0]. Default: 0.001.

       --sanger-prob-insertion-begin DOUBLE
              Per-base  insertion  probability  of  first  base  in  Sanger  sequencing. In range
              [0.0..1.0]. Default: 0.0025.

       --sanger-prob-insertion-end DOUBLE
              Per-base insertion  probability  of  last  base  in  Sanger  sequencing.  In  range
              [0.0..1.0]. Default: 0.005.

       --sanger-prob-deletion-begin DOUBLE
              Per-base  deletion  probability  of  first  base  in  Sanger  sequencing.  In range
              [0.0..1.0]. Default: 0.0025.

       --sanger-prob-deletion-end DOUBLE
              Per-base  deletion  probability  of  last  base  in  Sanger  sequencing.  In  range
              [0.0..1.0]. Default: 0.005.

       --sanger-quality-match-start-mean DOUBLE
              Mean  PHRED  quality  for  non-mismatch  bases  of first base in Sanger sequencing.
              Default: 40.0.

       --sanger-quality-match-end-mean DOUBLE
              Mean PHRED quality for non-mismatch  bases  of  last  base  in  Sanger  sequencing.
              Default: 39.5.

       --sanger-quality-match-start-stddev DOUBLE
              Standard deviation of PHRED quality for non-mismatch bases of first base in Sanger
              sequencing. Default: 0.1.

       --sanger-quality-match-end-stddev DOUBLE
              Standard deviation of PHRED quality for non-mismatch bases of last base in Sanger
              sequencing. Default: 2.

       --sanger-quality-error-start-mean DOUBLE
              Mean PHRED quality for erroneous bases of first base in Sanger sequencing. Default:
              30.

       --sanger-quality-error-end-mean DOUBLE
              Mean PHRED quality for erroneous bases of last base in Sanger sequencing.  Default:
              20.

       --sanger-quality-error-start-stddev DOUBLE
              Standard deviation of PHRED quality for erroneous bases of first base in Sanger
              sequencing. Default: 2.

       --sanger-quality-error-end-stddev DOUBLE
              Standard deviation of PHRED quality for erroneous bases of last base in Sanger
              sequencing. Default: 5.
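
       The quality options above are given on the PHRED scale, where a quality Q corresponds
       to an error probability of 10^(-Q/10). The sketch below shows the conversion in both
       directions. The function names are hypothetical, and the snippet only illustrates the
       scale, not how Mason samples qualities.

```python
import math


def phred_to_error_prob(quality):
    """Error probability implied by a PHRED quality score."""
    return 10 ** (-quality / 10)


def error_prob_to_phred(prob):
    """PHRED quality score implied by an error probability in (0, 1]."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    return -10 * math.log10(prob)


# Default mean qualities mentioned in the options above.
for quality in (40.0, 30.0, 20.0):
    print(quality, phred_to_error_prob(quality))
```

       A default mean of 40 for matching bases therefore stands for roughly one expected error
       in 10,000 base calls, and a mean of 20 for one in 100.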

   454 Sequencing Options:
       --454-read-length-model STRING
              The  model  to  use  for  sampling  the 454 read length. One of normal and uniform.
              Default: normal.

       --454-read-length-min INTEGER
              The minimal read length when  the  read  length  is  sampled  uniformly.  In  range
              [0..inf]. Default: 10.

       --454-read-length-max INTEGER
              The  maximal  read  length  when  the  read  length  is sampled uniformly. In range
              [0..inf]. Default: 600.

       --454-read-length-mean DOUBLE
              The mean read length when the read length is sampled with normal  distribution.  In
              range [0..inf]. Default: 400.

       --454-read-length-stddev DOUBLE
              The  read  length  standard  deviation  when the read length is sampled with normal
              distribution. In range [0..inf]. Default: 40.

       --454-no-sqrt-in-std-dev
              For error model, if set then (sigma = k * r) is used, otherwise (sigma = k *
              sqrt(r)).

       --454-proportionality-factor DOUBLE
              Proportionality  factor  for calculating the standard deviation proportional to the
              read length. In range [0..inf]. Default: 0.15.

       --454-background-noise-mean DOUBLE
              Mean of lognormal distribution to use for the noise. In  range  [0..inf].  Default:
              0.23.

       --454-background-noise-stddev DOUBLE
              Standard  deviation  of  lognormal  distribution  to  use  for  the noise. In range
              [0..inf]. Default: 0.15.
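
       The two standard deviation formulas selected by --454-no-sqrt-in-std-dev can be
       compared with a short sketch. It assumes that k is the value of
       --454-proportionality-factor; this page does not define r further, so the sketch treats
       it as a plain non-negative number. sigma_454 is a hypothetical name.

```python
import math


def sigma_454(r, k=0.15, no_sqrt=False):
    """Standard deviation under the two documented formulas.

    no_sqrt=True  -> sigma = k * r
    no_sqrt=False -> sigma = k * sqrt(r)
    """
    if r < 0 or k < 0:
        raise ValueError("r and k must be non-negative")
    return k * r if no_sqrt else k * math.sqrt(r)


for r in (1, 4, 9):
    print(r, sigma_454(r), sigma_454(r, no_sqrt=True))
```

       Both formulas agree at r = 1; for larger r the variant without the square root grows
       faster.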

SEQUENCING SIMULATION

       Simulation of base qualities is disabled when writing  out  FASTA  files.   Simulation  of
       paired-end sequencing is enabled when specifying two output files.
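
       The two rules above, together with --force-single-end, can be summarised in a small
       sketch. Deciding on FASTA output by file extension is an assumption made here for
       illustration; simulation_mode and the extension lists are hypothetical and not taken
       from Mason's source.

```python
# Assumed FASTA-style extensions, taken from the valid output file types above.
FASTA_EXTENSIONS = (".fa", ".fasta", ".fna", ".ffn", ".faa", ".frn")
COMPRESSION_SUFFIXES = (".gz", ".bz2", ".bgzf")


def strip_compression(path):
    """Drop one trailing compression suffix, if present."""
    for suffix in COMPRESSION_SUFFIXES:
        if path.endswith(suffix):
            return path[: -len(suffix)]
    return path


def is_fasta(path):
    return strip_compression(path).lower().endswith(FASTA_EXTENSIONS)


def simulation_mode(out_left, out_right=None, force_single_end=False):
    """Summarise which parts of the simulation would be active."""
    return {
        # Paired-end needs a second output file and no override.
        "paired": out_right is not None and not force_single_end,
        # Base qualities are not simulated for FASTA output.
        "qualities": not is_fasta(out_left),
    }


print(simulation_mode("left.fq", "right.fq"))
print(simulation_mode("reads.fa.gz"))
```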

READ ORIENTATION

       You can use the --seq-mate-orientation option to set the relative orientation of
       paired-end reads. The valid values are listed below.

       FR     Reads are inward-facing, the same as Illumina paired-end reads: R1 --> <-- R2.

       RF     Reads are outward-facing, the same as Illumina mate-pair reads: R1 <-- --> R2.

       FF     Reads are on the same strand: R1 --> --> R2.

       FF2    Reads are on the same strand but the "right" reads are sequenced to the left of the
              "left" reads, same as 454 paired: R2 --> --> R1.
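
       The four orientations can be illustrated on a single error-free fragment. The sketch
       below follows one reading of the arrow diagrams above and is not Mason's code:
       mate_pair and revcomp are hypothetical helpers, the fragment is taken from the forward
       strand only, and a read pointing left is modelled as the reverse complement of the
       corresponding fragment end.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence over A, C, G, T, N."""
    return seq.translate(_COMPLEMENT)[::-1]


def mate_pair(fragment, read_length, orientation="FR"):
    """Return (R1, R2) for one fragment, without sequencing errors."""
    if not 1 <= read_length <= len(fragment):
        raise ValueError("read_length must be between 1 and the fragment length")
    left = fragment[:read_length]
    right = fragment[-read_length:]
    if orientation == "FR":   # R1 --> <-- R2
        return left, revcomp(right)
    if orientation == "RF":   # R1 <-- --> R2
        return revcomp(left), right
    if orientation == "FF":   # R1 --> --> R2
        return left, right
    if orientation == "FF2":  # R2 --> --> R1
        return right, left
    raise ValueError(f"unknown orientation: {orientation}")


for orientation in ("FR", "RF", "FF", "FF2"):
    print(orientation, mate_pair("AAACCCGGGT", 3, orientation))
```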