Provided by: pbsim_1.0.3-1_amd64

NAME

       pbsim - simulator for PacBio sequencing reads

SYNOPSIS

       pbsim [options] <reference.fasta>

DESCRIPTION

       The pbsim command produces simulated PacBio reads from a reference sequence in FASTA format
       (<reference.fasta>).

       Model files (arguments for the --model_qc option) can be found in the /usr/share/pbsim/models directory.

OPTIONS

       The options for pbsim can be divided into general, sampling-based and model-based simulation options.
       Default values are given in parentheses.

   General options
       --prefix
              prefix of output files (sd).

       --data-type
              data type. CLR or CCS (CLR).

       --depth
              depth of coverage (CLR: 20.0, CCS: 50.0).

       --length-min
              minimum length (100).

       --length-max
              maximum length (CLR: 25000, CCS: 2500).

       --accuracy-min
              minimum accuracy (CLR: 0.75, CCS: fixed as 0.75). This option can be used only with CLR.

       --accuracy-max
              maximum accuracy (CLR: 1.00, CCS: fixed as 1.00). This option can be used only with CLR.

       --difference-ratio
              ratio of differences, given as substitution:insertion:deletion. Each value can be up to 1000
              (CLR: 10:60:30, CCS: 6:21:73).

       --seed
              seed for a pseudorandom number generator (Unix time).
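
       The --difference-ratio value is a three-part ratio, so each part can be read as a share of all
       simulated differences. The following sketch converts a ratio into percentages. The helper name
       ratio_to_percent is hypothetical and not part of pbsim; reading the ratio as shares of the total
       is an assumption based on the option description above.

```shell
#!/bin/sh
# ratio_to_percent SUB:INS:DEL
# Print each component as a percentage of the sum of the three values.
# Hypothetical helper for illustration only; it does not call pbsim.
ratio_to_percent() {
    echo "$1" | awk -F: '{
        total = $1 + $2 + $3
        if (total == 0) exit 1      # avoid division by zero
        printf "substitution=%.1f%% insertion=%.1f%% deletion=%.1f%%\n",
               100 * $1 / total, 100 * $2 / total, 100 * $3 / total
    }'
}

ratio_to_percent 10:60:30   # CLR default ratio
ratio_to_percent 6:21:73    # CCS default ratio
```

       Because only the proportions matter in this reading, 1:6:3 and 10:60:30 give the same percentages.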

   Options for sampling-based simulation
       --sample-fastq
              FASTQ format file to sample.

       --sample-profile-id
              ID of the (filtered) sample-fastq profile. When --sample-fastq is given, the profile is stored
              and the files sample_profile_<ID>.fastq and sample_profile_<ID>.stats are created. When
              --sample-fastq is not given, the stored profile is re-used. Note that when a profile is used,
              --length-min, --length-max, --accuracy-min and --accuracy-max take the values stored in the
              profile.

   Options for model-based simulation
       --model_qc
              model of quality code.

       --length-mean
              mean of length model (CLR: 3000.0, CCS: 450.0).

       --length-sd
              standard deviation of length model (CLR: 2300.0, CCS: 170.0).

       --accuracy-mean
              mean of accuracy model (CLR: 0.78, CCS: fixed as 0.98). This option can be used only with CLR.

       --accuracy-sd
              standard deviation of accuracy model (CLR: 0.02, CCS: fixed as 0.02). This option can be used
              only with CLR.

EXAMPLES

       To run model-based simulation:

           pbsim --data-type CLR \
                 --depth 20 \
                 --model_qc /usr/share/pbsim/models/model_qc_clr \
                 reference.fasta

       In the example above, simulated read sequences are randomly sampled from the reference sequence
       ("reference.fasta"), and differences (errors) are introduced into the sampled reads. The data type is
       CLR and the coverage depth is 20. If the reference is a multi-FASTA file, simulated data is created
       for each sequence in it, with three output files per sequence: "sd_0001.ref" is a single-FASTA file
       copied from the reference sequence, "sd_0001.fastq" is the simulated read dataset in FASTQ format,
       and "sd_0001.maf" lists the alignments between the reference sequence and the simulated reads in MAF
       format. The length and accuracy of the reads are simulated based on our model of PacBio reads.
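
       The per-sequence output files can be inspected with a small shell sketch. It assumes the default
       prefix "sd" and four-line FASTQ records (header, sequence, separator, quality). The helper name
       count_fastq_reads is hypothetical and not part of pbsim.

```shell
#!/bin/sh
# count_fastq_reads FILE
# Print the number of records in FILE, assuming exactly four lines per
# FASTQ record (an assumption about the output layout, not a pbsim guarantee).
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Report every simulated dataset written with the default prefix "sd".
for fq in sd_*.fastq; do
    [ -e "$fq" ] || continue        # skip when no output files exist yet
    printf '%s\t%s reads\n' "$fq" "$(count_fastq_reads "$fq")"
done
```

       Run from the directory containing the output files, this prints one line per simulated dataset.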

       To run sampling-based simulation:

           pbsim --data-type CLR \
                 --depth 20 \
                 --sample-fastq sample.fastq \
                 reference.fasta

       In the sampling-based simulation, the length and quality scores of each simulated read are the same
       as those of a read taken at random from the sample PacBio dataset ("sample.fastq").
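
       A stored profile can be re-used across runs, as described for --sample-profile-id. The sketch below
       is a dry run: the run helper (hypothetical, not part of pbsim) only prints each command unless RUN=1
       is set in the environment. The profile ID "pf1" and the file names are placeholders.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical helper): print the command unless RUN=1.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# First run: sample from sample.fastq and store the profile under ID "pf1".
store_profile() {
    run pbsim --data-type CLR --depth 20 \
        --sample-fastq sample.fastq --sample-profile-id pf1 \
        reference.fasta
}

# Later run: omit --sample-fastq so that the stored profile "pf1" is re-used.
reuse_profile() {
    run pbsim --data-type CLR --depth 20 \
        --sample-profile-id pf1 \
        reference.fasta
}

store_profile
reuse_profile
```

       Re-using the profile avoids re-reading the sample FASTQ file on every run.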

LICENSE

       pbsim is available under the terms of the GNU General Public License, version 2 (GPL-2).

AUTHORS

       Michiaki Hamada (mhamada@k.u-tokyo.ac.jp), Yukiteru Ono

                                                  January 2016                                          PBSIM(1)