Ubuntu Manpage: mason_frag_sequencing - Fragment Sequencing Simulation

Provided by: seqan-apps_2.4.0+dfsg-12ubuntu2_amd64

NAME

       mason_frag_sequencing - Fragment Sequencing Simulation

SYNOPSIS

       mason_frag_sequencing [OPTIONS] -i IN.fa -o OUT.{fa,fq} [-or OUT2.{fa,fq}]

DESCRIPTION

       Given a FASTA file with fragments, simulate sequencing thereof.

       This program is a more lightweight version of mason_simulator, without support for applying VCF variants
       or for fragment sampling. SAM output is also not available. However, it uses the same code for
       simulating the reads as the more powerful mason_simulator.

       You can use mason_frag_sequencing if you want to implement your own fragmentation behaviour, e.g. if you
       have implemented your own bias models.
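
       As an illustration of custom fragmentation, the following is a minimal sketch that samples fragments
       from a reference string and formats them as FASTA records suitable for -i. The helper names
       (sample_fragments, write_fasta), the normal length distribution, and the toy reference are all
       assumptions made for the example; they are not part of Mason.

```python
import io
import random


def sample_fragments(reference, count, mean_len=300, stddev_len=30, seed=0):
    """Sample `count` fragments with normally distributed lengths (hypothetical helper)."""
    rng = random.Random(seed)
    fragments = []
    for _ in range(count):
        # Clamp the sampled length so every fragment fits inside the reference.
        length = int(round(rng.gauss(mean_len, stddev_len)))
        length = max(1, min(length, len(reference)))
        start = rng.randint(0, len(reference) - length)
        fragments.append(reference[start:start + length])
    return fragments


def write_fasta(fragments, handle, prefix="frag"):
    """Write one FASTA record per fragment to an open text handle (hypothetical helper)."""
    for i, seq in enumerate(fragments):
        handle.write(f">{prefix}{i}\n{seq}\n")


# Toy reference; a real run would load a genome and apply a bias model here.
reference = "ACGT" * 500
fragments = sample_fragments(reference, count=5)
buffer = io.StringIO()
write_fasta(fragments, buffer)
fasta_text = buffer.getvalue()
```

       Writing fasta_text to a file such as frags.fa would give an input for mason_frag_sequencing -i.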

OPTIONS

       -h, --help
              Display the help message.

       --version
              Display version information.

       -q, --quiet
              Low verbosity.

       -v, --verbose
              Higher verbosity.

       -vv, --very-verbose
              Highest verbosity.

       --seed INTEGER
              Seed to use for random number generator. Default: 0.

       -i, --in INPUT_FILE
              Path to input  file.  Valid  filetypes  are:  .sam[.*],  .raw[.*],  .gbk[.*],  .frn[.*],  .fq[.*],
              .fna[.*], .ffn[.*], .fastq[.*], .fasta[.*], .faa[.*], .fa[.*], .embl[.*], and .bam, where * is any
              of the following extensions: gz, bz2, and bgzf for transparent (de)compression.

       -o, --out OUTPUT_FILE
              Output of single-end/left end reads. Valid filetypes are: .sam[.*], .raw[.*],  .frn[.*],  .fq[.*],
              .fna[.*],  .ffn[.*],  .fastq[.*],  .fasta[.*],  .faa[.*], .fa[.*], and .bam, where * is any of the
              following extensions: gz, bz2, and bgzf for transparent (de)compression.

       -or, --out-right OUTPUT_FILE
              Output of right reads.  Giving this option enables paired-end simulation.  Valid  filetypes  are:
              .sam[.*],  .raw[.*],  .frn[.*],  .fq[.*],  .fna[.*],  .ffn[.*],  .fastq[.*], .fasta[.*], .faa[.*],
              .fa[.*], and .bam, where * is any of the following extensions: gz, bz2, and bgzf  for  transparent
              (de)compression.

       --force-single-end
              Force single-end simulation although --out-right is given.
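
       The options above can be combined as in the following sketch, which only assembles an argument list
       and does not run the program. The function name build_command and the file names are hypothetical;
       the flags are the ones documented on this page.

```python
import shlex


def build_command(fragments_fa, out_left, out_right=None, seed=0, force_single_end=False):
    """Assemble a mason_frag_sequencing argument list (hypothetical helper)."""
    cmd = ["mason_frag_sequencing", "--seed", str(seed), "-i", fragments_fa, "-o", out_left]
    if out_right is not None:
        # Giving -or enables paired-end simulation.
        cmd += ["-or", out_right]
    if force_single_end:
        cmd.append("--force-single-end")
    return cmd


single = build_command("frags.fa", "reads.fq")
paired = build_command("frags.fa", "left.fq", out_right="right.fq", seed=42)
paired_line = shlex.join(paired)
```

       The resulting list could be passed to subprocess.run on a system where seqan-apps is installed.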

   Global Read Simulation Options:
       --seq-technology STRING
              Set sequencing technology to simulate. One of illumina, 454, and sanger. Default: illumina.

       --seq-mate-orientation STRING
              Orientation  for  paired  reads.   See section Read Orientation below. One of FR, RF, FF, and FF2.
              Default: FR.

       --seq-strands STRING
              Strands to simulate from, only  applicable  to  paired  sequencing  simulation.  One  of  forward,
              reverse, and both. Default: both.

       --embed-read-info
              Whether or not to embed read information.

       --read-name-prefix STRING
              Read names will have this prefix. Default: simulated..

   BS-Seq Options:
       --enable-bs-seq
              Enable BS-seq simulation.

       --bs-seq-protocol STRING
              Protocol to use for BS-Seq simulation. One of directional and undirectional. Default: directional.

       --bs-seq-conversion-rate DOUBLE
              Conversion rate for unmethylated Cs to become Ts. In range [0..1]. Default: 0.99.
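
       To show what the conversion rate means, here is a minimal sketch of bisulfite conversion on a single
       strand. It assumes that every C is unmethylated and ignores the protocol choice, so it illustrates
       the parameter rather than Mason's actual BS-Seq implementation. The function name is hypothetical.

```python
import random


def bisulfite_convert(seq, conversion_rate=0.99, seed=0):
    """Convert each C to T with probability `conversion_rate` (all Cs assumed unmethylated)."""
    rng = random.Random(seed)
    out = []
    for base in seq:
        # rng.random() is in [0, 1), so a rate of 1.0 converts every C and 0.0 converts none.
        if base == "C" and rng.random() < conversion_rate:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


fully_converted = bisulfite_convert("ACCGTC", conversion_rate=1.0)
unconverted = bisulfite_convert("ACCGTC", conversion_rate=0.0)
```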

   Illumina Options:
       --illumina-read-length INTEGER
              Read length for Illumina simulation. In range [1..inf]. Default: 100.

       --illumina-error-profile-file INPUT_FILE
              Path  to  file  with  Illumina  error  profile.   The file must be a text file with floating point
              numbers separated by space, each giving a positional error rate. Valid filetype is: .txt.

       --illumina-prob-insert DOUBLE
              Per-base probability for insertion in Illumina sequencing. In range [0..1]. Default:
              0.00005.

       --illumina-prob-deletion DOUBLE
              Per-base probability for deletion in Illumina sequencing. In range [0..1]. Default:
              0.00005.

       --illumina-prob-mismatch-scale DOUBLE
              Scaling factor for Illumina mismatch probability. In range [0..inf]. Default: 1.0.

       --illumina-prob-mismatch DOUBLE
              Average per-base mismatch probability in Illumina sequencing. In range [0.0..1.0]. Default: 0.004.

       --illumina-prob-mismatch-begin DOUBLE
              Per-base mismatch probability of first base in Illumina sequencing. In range [0.0..1.0].  Default:
              0.002.

       --illumina-prob-mismatch-end DOUBLE
              Per-base  mismatch  probability of last base in Illumina sequencing. In range [0.0..1.0]. Default:
              0.012.

       --illumina-position-raise DOUBLE
              Point where the error curve rises in relation to read length. In range [0.0..1.0]. Default: 0.66.

       --illumina-quality-mean-begin DOUBLE
              Mean PHRED quality for non-mismatch bases of first base in Illumina sequencing. Default: 40.0.

       --illumina-quality-mean-end DOUBLE
              Mean PHRED quality for non-mismatch bases of last base in Illumina sequencing. Default: 39.5.

       --illumina-quality-stddev-begin DOUBLE
              Standard deviation of PHRED quality for non-mismatch bases of first base in  Illumina  sequencing.
              Default: 0.05.

       --illumina-quality-stddev-end DOUBLE
              Standard  deviation  of  PHRED quality for non-mismatch bases of last base in Illumina sequencing.
              Default: 10.0.

       --illumina-mismatch-quality-mean-begin DOUBLE
              Mean PHRED quality for mismatch bases of first base in Illumina sequencing. Default: 40.0.

       --illumina-mismatch-quality-mean-end DOUBLE
              Mean PHRED quality for mismatch bases of last base in Illumina sequencing. Default: 30.0.

       --illumina-mismatch-quality-stddev-begin DOUBLE
              Standard deviation of PHRED quality for mismatch bases  of  first  base  in  Illumina  sequencing.
              Default: 3.0.

       --illumina-mismatch-quality-stddev-end DOUBLE
              Standard  deviation  of  PHRED  quality  for  mismatch  bases of last base in Illumina sequencing.
              Default: 15.0.

       --illumina-left-template-fastq INPUT_FILE
              FASTQ file to use for a template for left-end reads.  Valid  filetypes  are:  .sam[.*],  .raw[.*],
              .gbk[.*],  .frn[.*],  .fq[.*],  .fna[.*],  .ffn[.*],  .fastq[.*],  .fasta[.*],  .faa[.*], .fa[.*],
              .embl[.*], and .bam, where * is any of the following extensions: gz, bz2, and bgzf for transparent
              (de)compression.

       --illumina-right-template-fastq INPUT_FILE
              FASTQ  file  to  use  for a template for right-end reads. Valid filetypes are: .sam[.*], .raw[.*],
              .gbk[.*], .frn[.*],  .fq[.*],  .fna[.*],  .ffn[.*],  .fastq[.*],  .fasta[.*],  .faa[.*],  .fa[.*],
              .embl[.*], and .bam, where * is any of the following extensions: gz, bz2, and bgzf for transparent
              (de)compression.
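
       One way to picture how the mismatch options fit together is a piecewise-linear error curve: it starts
       at the begin probability, bends at the raise position, and ends at the end probability, with the
       height at the bend chosen so that the curve averages to --illumina-prob-mismatch. This page does not
       spell out the curve, so the shape below is an assumption and the function name is hypothetical.

```python
def mismatch_curve(read_length=100, mean=0.004, begin=0.002, end=0.012, raise_pos=0.66):
    """Per-position mismatch probabilities under an assumed piecewise-linear model."""
    if not 0.0 < raise_pos < 1.0:
        raise ValueError("this sketch assumes 0 < raise_pos < 1")
    # Height at the bend, solved from: area under both linear pieces == mean.
    at_raise = 2 * mean - raise_pos * begin - (1 - raise_pos) * end
    probs = []
    for i in range(read_length):
        # Relative position of base i within the read, from 0.0 to 1.0.
        x = i / (read_length - 1) if read_length > 1 else 0.0
        if x < raise_pos:
            p = begin + (at_raise - begin) * x / raise_pos
        else:
            p = at_raise + (end - at_raise) * (x - raise_pos) / (1 - raise_pos)
        probs.append(p)
    return probs


probs = mismatch_curve()
average = sum(probs) / len(probs)
```

       With the defaults on this page the bend sits at a probability of 0.0026, and the sampled average
       stays close to the configured mean of 0.004.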

   Sanger Sequencing Options:
       --sanger-read-length-model STRING
              The model to use for sampling the Sanger read length. One of normal and uniform. Default: normal.

       --sanger-read-length-min INTEGER
              The minimal read length when the read length is sampled uniformly.  In  range  [0..inf].  Default:
              400.

       --sanger-read-length-max INTEGER
              The  maximal  read  length  when the read length is sampled uniformly. In range [0..inf]. Default:
              600.

       --sanger-read-length-mean DOUBLE
              The mean read length when the read length is sampled with normal distribution. In range  [0..inf].
              Default: 400.

       --sanger-read-length-error DOUBLE
              The read length standard deviation when the read length is sampled with normal distribution. In
              range [0..inf]. Default: 40.

       --sanger-prob-mismatch-scale DOUBLE
              Scaling factor for Sanger mismatch probability. In range [0..inf]. Default: 1.0.

       --sanger-prob-mismatch-begin DOUBLE
              Per-base mismatch probability of first base in Sanger sequencing. In  range  [0.0..1.0].  Default:
              0.005.

       --sanger-prob-mismatch-end DOUBLE
              Per-base  mismatch  probability  of  last base in Sanger sequencing. In range [0.0..1.0]. Default:
              0.001.

       --sanger-prob-insertion-begin DOUBLE
              Per-base insertion probability of first base in Sanger sequencing. In range  [0.0..1.0].  Default:
              0.0025.

       --sanger-prob-insertion-end DOUBLE
              Per-base  insertion  probability  of last base in Sanger sequencing. In range [0.0..1.0]. Default:
              0.005.

       --sanger-prob-deletion-begin DOUBLE
              Per-base deletion probability of first base in Sanger sequencing. In  range  [0.0..1.0].  Default:
              0.0025.

       --sanger-prob-deletion-end DOUBLE
              Per-base  deletion  probability  of  last base in Sanger sequencing. In range [0.0..1.0]. Default:
              0.005.

       --sanger-quality-match-start-mean DOUBLE
              Mean PHRED quality for non-mismatch bases of first base in Sanger sequencing. Default: 40.0.

       --sanger-quality-match-end-mean DOUBLE
              Mean PHRED quality for non-mismatch bases of last base in Sanger sequencing. Default: 39.5.

       --sanger-quality-match-start-stddev DOUBLE
              Standard deviation of PHRED quality for non-mismatch bases of first base in Sanger sequencing. Default: 0.1.

       --sanger-quality-match-end-stddev DOUBLE
              Standard deviation of PHRED quality for non-mismatch bases of last base in Sanger sequencing. Default: 2.

       --sanger-quality-error-start-mean DOUBLE
              Mean PHRED quality for erroneous bases of first base in Sanger sequencing. Default: 30.

       --sanger-quality-error-end-mean DOUBLE
              Mean PHRED quality for erroneous bases of last base in Sanger sequencing. Default: 20.

       --sanger-quality-error-start-stddev DOUBLE
              Standard deviation of PHRED quality for erroneous bases of first base in Sanger sequencing. Default: 2.

       --sanger-quality-error-end-stddev DOUBLE
              Standard deviation of PHRED quality for erroneous bases of last base in Sanger sequencing. Default: 5.
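
       The two read length models can be sketched as follows, using the Sanger defaults listed above. The
       rounding and the clamping to a minimum length of 1 are assumptions made for the example, and the
       function name is hypothetical.

```python
import random


def sample_read_length(model="normal", min_len=400, max_len=600, mean=400.0, stddev=40.0, rng=None):
    """Sample one read length under the 'normal' or 'uniform' model (illustrative only)."""
    rng = rng or random.Random(0)
    if model == "uniform":
        # Uniform model: only the min/max bounds matter.
        return rng.randint(min_len, max_len)
    if model == "normal":
        # Normal model: only mean and standard deviation matter.
        return max(1, int(round(rng.gauss(mean, stddev))))
    raise ValueError(f"unknown read length model: {model}")


rng = random.Random(1)
uniform_lengths = [sample_read_length("uniform", rng=rng) for _ in range(200)]
normal_lengths = [sample_read_length("normal", rng=rng) for _ in range(200)]
```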

   454 Sequencing Options:
       --454-read-length-model STRING
              The model to use for sampling the 454 read length. One of normal and uniform. Default: normal.

       --454-read-length-min INTEGER
              The minimal read length when the read length is sampled uniformly. In range [0..inf]. Default: 10.

       --454-read-length-max INTEGER
              The maximal read length when the read length is sampled uniformly.  In  range  [0..inf].  Default:
              600.

       --454-read-length-mean DOUBLE
              The  mean read length when the read length is sampled with normal distribution. In range [0..inf].
              Default: 400.

       --454-read-length-stddev DOUBLE
              The read length standard deviation when the read length is sampled with  normal  distribution.  In
              range [0..inf]. Default: 40.

       --454-no-sqrt-in-std-dev
              For error model, if set then (sigma = k * r) is used, otherwise (sigma = k * sqrt(r)).

       --454-proportionality-factor DOUBLE
              Proportionality  factor for calculating the standard deviation proportional to the read length. In
              range [0..inf]. Default: 0.15.

       --454-background-noise-mean DOUBLE
              Mean of lognormal distribution to use for the noise. In range [0..inf]. Default: 0.23.

       --454-background-noise-stddev DOUBLE
              Standard deviation of lognormal distribution to use for the noise.  In  range  [0..inf].  Default:
              0.15.
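
       The two standard deviation formulas selected by --454-no-sqrt-in-std-dev can be written out as below,
       with k taken to be the proportionality factor. This page does not define r; treating it as a
       homopolymer run length is an assumption, and the function name is hypothetical.

```python
import math


def sigma_454(r, k=0.15, no_sqrt=False):
    """Standard deviation for the 454 error model: k * r if no_sqrt, else k * sqrt(r)."""
    return k * r if no_sqrt else k * math.sqrt(r)


with_sqrt = sigma_454(4)                    # 0.15 * sqrt(4)
without_sqrt = sigma_454(4, no_sqrt=True)   # 0.15 * 4
```

       For r greater than 1 the square-root form grows more slowly, so setting the flag widens the spread
       for longer runs.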

SEQUENCING SIMULATION

       Simulation  of  base  qualities  is  disabled  when  writing  out  FASTA files.  Simulation of paired-end
       sequencing is enabled when specifying two output files.
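
       The base qualities written to FASTQ output are PHRED scores. As a reminder of what such a score
       means, the sketch below converts a score to an error probability and to a FASTQ quality character.
       The offset of 33 is the common Phred+33 convention and is assumed here rather than taken from this
       page; the function names are hypothetical.

```python
def phred_to_error_prob(q):
    """Error probability for PHRED score q, from the definition Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


def phred_to_fastq_char(q, offset=33):
    """FASTQ quality character for PHRED score q (Phred+33 assumed)."""
    return chr(int(round(q)) + offset)


p40 = phred_to_error_prob(40)
c40 = phred_to_fastq_char(40)
c30 = phred_to_fastq_char(30)
```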

READ ORIENTATION

       You can use the --seq-mate-orientation option to set the relative orientation when doing paired-end
       sequencing. The valid values are listed below.

       FR     Reads are inward-facing, the same as Illumina paired-end reads: R1 --> <-- R2.

       RF     Reads are outward-facing, the same as Illumina mate-pair reads: R1 <-- --> R2.

       FF     Reads are on the same strand: R1 --> --> R2.

       FF2    Reads  are on the same strand but the "right" reads are sequenced to the left of the "left" reads,
              same as 454 paired: R2 --> --> R1.
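
       The four orientations can be sketched by cutting both reads from the ends of one fragment. This is a
       reading of the arrow diagrams above, not Mason's implementation: reads are error-free, the function
       names are hypothetical, and read_length is assumed to be at least 1 and at most the fragment length.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def mate_pair(fragment, read_length, orientation="FR"):
    """Return (R1, R2) for a fragment under the given orientation (illustrative only)."""
    left, right = fragment[:read_length], fragment[-read_length:]
    if orientation == "FR":   # R1 --> <-- R2
        return left, revcomp(right)
    if orientation == "RF":   # R1 <-- --> R2
        return revcomp(left), right
    if orientation == "FF":   # R1 --> --> R2
        return left, right
    if orientation == "FF2":  # R2 --> --> R1
        return right, left
    raise ValueError(f"unknown orientation: {orientation}")


fragment = "AAAACCCCGG"
pairs = {o: mate_pair(fragment, 3, o) for o in ("FR", "RF", "FF", "FF2")}
```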