Provided by: pbsim_1.0.3+git20180330.e014b1d+dfsg-3_amd64

NAME

       pbsim - simulator for PacBio sequencing reads

SYNOPSIS

       pbsim [options] <reference.fasta>

DESCRIPTION

       The pbsim command produces simulated PacBio reads from the reference FASTA sequence
       <reference.fasta>.

       Model files (parameters for the --model_qc option) can be found in the
       /usr/share/pbsim/models directory.

OPTIONS

       The options for pbsim are divided into general, sampling-based and model-based
       simulation options. Default values are given in parentheses.

   General options
       --prefix
           prefix of output files (sd).

       --data-type
           data type. CLR or CCS (CLR).

       --depth
           depth of coverage (CLR: 20.0, CCS: 50.0).

       --length-min
           minimum length (100).

       --length-max
           maximum length (CLR: 25000, CCS: 2500).

       --accuracy-min
           minimum accuracy (CLR: 0.75, CCS: fixed as 0.75). This option applies only to CLR.

       --accuracy-max
           maximum accuracy (CLR: 1.00, CCS: fixed as 1.00). This option applies only to CLR.

       --difference-ratio
           ratio of differences, given as substitution:insertion:deletion. Each value may be up
           to 1000 (CLR: 10:60:30, CCS: 6:21:73).

       --seed
           seed for the pseudorandom number generator (Unix time).
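
       The --difference-ratio value can be read as relative weights for the three error types.
       The following Python sketch (not part of pbsim; the function name is hypothetical)
       converts such a string into fractions that sum to 1. Rejecting negative values and an
       all-zero ratio is an assumption of this sketch, not documented pbsim behaviour.

```python
def parse_difference_ratio(ratio: str) -> dict:
    """Turn 'substitution:insertion:deletion' into fractions that sum to 1."""
    parts = ratio.split(":")
    if len(parts) != 3:
        raise ValueError("expected substitution:insertion:deletion")
    values = [int(p) for p in parts]
    # The man page states an upper bound of 1000 per value;
    # the lower bound of 0 is an assumption of this sketch.
    if any(v < 0 or v > 1000 for v in values):
        raise ValueError("each value must be between 0 and 1000")
    total = sum(values)
    if total == 0:
        raise ValueError("at least one value must be non-zero")
    names = ("substitution", "insertion", "deletion")
    return {name: v / total for name, v in zip(names, values)}


clr = parse_difference_ratio("10:60:30")  # CLR default from the option list
ccs = parse_difference_ratio("6:21:73")   # CCS default from the option list
print(clr)
print(ccs)
```

       Under this reading, the CLR default assigns most differences to insertions and the CCS
       default assigns most to deletions.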

   Options for sampling-based simulation
       --sample-fastq
           FASTQ format file to sample.

       --sample-profile-id
           ID of the sample-fastq (filtered) profile. When --sample-fastq is given, the profile
           is stored: sample_profile_<ID>.fastq and sample_profile_<ID>.stats are created. When
           --sample-fastq is not given, the stored profile is re-used. Note that when a profile
           is re-used, --length-min, --length-max, --accuracy-min and --accuracy-max are the
           same as those of the profile.
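
       The "(filtered)" profile suggests that sample reads are screened against the length and
       accuracy limits before being stored. The Python sketch below illustrates one way such a
       filter could work. It assumes Phred+33 quality encoding and takes a read's accuracy to
       be one minus its mean per-base error probability; both are assumptions, and pbsim's
       actual computation may differ. The function names are hypothetical.

```python
def read_accuracy(quality: str, offset: int = 33) -> float:
    """Mean per-base accuracy implied by a Phred quality string."""
    if not quality:
        return 0.0
    # Phred score Q corresponds to an error probability of 10^(-Q/10).
    error_probs = [10 ** (-(ord(c) - offset) / 10) for c in quality]
    return 1.0 - sum(error_probs) / len(error_probs)


def passes_filter(sequence: str, quality: str,
                  length_min: int = 100, length_max: int = 25000,
                  accuracy_min: float = 0.75, accuracy_max: float = 1.0) -> bool:
    """Keep a read only if its length and accuracy fall inside the limits."""
    if not length_min <= len(sequence) <= length_max:
        return False
    return accuracy_min <= read_accuracy(quality) <= accuracy_max


# A 200-base read whose bases all have Phred score 10 ('+' in Phred+33).
print(passes_filter("A" * 200, "+" * 200))
```

       The default limits in the sketch are the CLR defaults listed under the general options.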

   Options for model-based simulation
       --model_qc
           model of quality code.

       --length-mean
           mean of length model (CLR: 3000.0, CCS: 450.0).

       --length-sd
           standard deviation of length model (CLR: 2300.0, CCS: 170.0).

       --accuracy-mean
           mean of accuracy model (CLR: 0.78, CCS: fixed as 0.98). This option applies only to
           CLR.

       --accuracy-sd
           standard deviation of accuracy model (CLR: 0.02, CCS: fixed as 0.02). This option
           applies only to CLR.
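
       To get a feel for how --length-mean, --length-sd, --length-min and --length-max
       interact, the Python sketch below draws read lengths from a distribution with the given
       mean and standard deviation and discards values outside the limits. The choice of a
       log-normal distribution is an assumption of this sketch; this page does not state which
       distribution the length model uses. The function names are hypothetical.

```python
import math
import random


def lognormal_params(mean: float, sd: float):
    """Return (mu, sigma) of a log-normal with the given mean and sd."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


def sample_lengths(n: int, mean: float = 3000.0, sd: float = 2300.0,
                   length_min: int = 100, length_max: int = 25000,
                   seed=None):
    """Draw n read lengths, rejecting draws outside [length_min, length_max]."""
    rng = random.Random(seed)  # a fixed seed gives reproducible draws
    mu, sigma = lognormal_params(mean, sd)
    lengths = []
    while len(lengths) < n:
        length = int(round(rng.lognormvariate(mu, sigma)))
        if length_min <= length <= length_max:
            lengths.append(length)
    return lengths


lengths = sample_lengths(1000, seed=1)
print(min(lengths), max(lengths), sum(lengths) / len(lengths))
```

       Because out-of-range draws are rejected, the mean of the retained lengths can differ
       somewhat from the requested mean.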

EXAMPLES

       To run model-based simulation:

           pbsim --data-type CLR \
                 --depth 20 \
                 --model_qc /usr/share/pbsim/models/model_qc_clr \
                 reference.fasta

       In the example above, simulated read sequences are randomly sampled from the reference
       sequence ("reference.fasta"), and differences (errors) are then introduced into the
       sampled reads. The data type is CLR and the coverage depth is 20. If the reference is a
       multi-FASTA file, simulated data is created for each sequence in it, and three output
       files are created per sequence. "sd_0001.ref" is a single-FASTA file copied from the
       reference sequence. "sd_0001.fastq" is the simulated read dataset in FASTQ format.
       "sd_0001.maf" is a list of alignments between the reference sequence and the simulated
       reads in MAF format. The length and accuracy of the reads are simulated based on our
       model of PacBio reads.
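
       The output naming and the effect of --depth can be sketched in Python as follows. The
       zero-padded numbering beyond "sd_0001" is inferred from the file names above, and the
       read-count estimate uses the usual definition of coverage (total read bases divided by
       reference length); both are assumptions of this sketch, and the function names are
       hypothetical.

```python
def output_files(prefix: str = "sd", n_sequences: int = 1):
    """List the (.ref, .fastq, .maf) names expected for each reference sequence."""
    names = []
    for i in range(1, n_sequences + 1):
        stem = f"{prefix}_{i:04d}"
        names.append((f"{stem}.ref", f"{stem}.fastq", f"{stem}.maf"))
    return names


def estimated_reads(depth: float, reference_length: int, mean_length: float) -> int:
    """Rough number of reads needed to reach the requested coverage depth."""
    return round(depth * reference_length / mean_length)


# A multi-FASTA reference with two sequences and the default prefix.
for triple in output_files("sd", 2):
    print(triple)

# Depth 20 over a 1 Mb reference with the default CLR mean length of 3000.
print(estimated_reads(20, 1_000_000, 3000.0))
```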

       To run sampling-based simulation:

           pbsim --data-type CLR \
                 --depth 20 \
                 --sample-fastq sample.fastq \
                 reference.fasta

       In the sampling-based simulation, the length and quality scores of each simulated read
       are the same as those of a read taken at random from the sample PacBio dataset
       ("sample.fastq").
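
       The idea of borrowing length and quality scores from a randomly chosen sample read can
       be illustrated with the Python sketch below. It assumes four-line FASTQ records (no
       wrapped sequence lines) and uses a tiny in-memory sample; the function names are
       hypothetical and this is not pbsim's own code.

```python
import io
import random


def read_fastq(handle):
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    for header in iter(handle.readline, ""):
        if not header.strip():
            continue  # skip blank lines between or after records
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header.strip()[1:], sequence, quality


def draw_template(records, rng):
    """Pick one sample read at random and return its length and quality string."""
    _, sequence, quality = rng.choice(records)
    return len(sequence), quality


sample = io.StringIO("@r1\nACGT\n+\n!!++\n@r2\nACGTACGT\n+\n55555555\n")
records = list(read_fastq(sample))
rng = random.Random(42)  # fixed seed, analogous to passing --seed
length, quality = draw_template(records, rng)
print(length, quality)
```

       A simulated read built from this template would have the drawn length and carry the
       drawn quality string unchanged.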

LICENSE

       pbsim is available under the terms of the GNU General Public License, version 2 (GPL-2).

AUTHORS

       Michiaki Hamada (mhamada@k.u-tokyo.ac.jp), Yukiteru Ono

                                                                                         PBSIM(1)