Ubuntu Manpage: dsimulator - generate synthetic reads for a random genome

NAME

       dsimulator - generate synthetic reads for a random genome

SYNOPSIS

       dsimulator genlen:double [-cdouble(20.)] [-bdouble(.5)] [-rint] [-mint(10000)] [-sint(2000)] [-xint(4000)]
       [-edouble(.15)] [-Mfile]

DESCRIPTION

       dsimulator first generates a synthetic genome genlen*1Mb long with an AT-bias of -b. It then
       generates sample reads of mean length -m from a log-normal length distribution with standard
       deviation -s, but ignores reads of length less than -x. It collects enough reads to cover the
       genome -c times and introduces a fraction -e of errors into each read. The ratio of insertions,
       deletions, and substitutions is set by the defined constants INS_RATE (default 73%) and DEL_RATE
       (default 20%) within generate.c; substitutions account for the remainder. One can also control
       the rate at which reads are picked from the forward and reverse strands by setting the defined
       constant FLIP_RATE (default 50/50).

       The -r option seeds the random number generator used to generate the genome, so that one can
       reproducibly generate the same underlying genome to sample from. If this option is missing, the
       job id of the invocation seeds the random number generator.

       The output is written to the standard output, so it can be piped directly into another command.
       It is in PacBio .fasta format, suitable as input to fasta2DB(1).

       Finally, the -M option requests that the coordinates from which each read has been sampled be
       written to the indicated file, one line per read, ASCII encoded. This "map" file tells one where
       every read belongs in an assembly and is very useful for debugging and testing purposes. If the
       coordinate pair for a read is, say, b,e, then if b < e the read was sampled from [b,e] in the
       forward direction, and if b > e it was sampled from [e,b] in the reverse direction.
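
The rule for reading a coordinate pair from the map file can be sketched in Python as follows. This is only an illustration, not part of dsimulator: the names Sample, interpret_pair and parse_map_line are hypothetical, and the sketch assumes that the first two integer fields on a map line are b and e, separated by whitespace or a comma. The text above does not specify the exact line layout, so check it against a real map file before relying on this.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    """Genome interval a read was sampled from (hypothetical helper type)."""
    start: int
    end: int
    strand: str  # "+" for forward, "-" for reverse


def interpret_pair(b: int, e: int) -> Sample:
    """Apply the b,e rule: b < e is forward on [b,e], b > e is reverse on [e,b]."""
    if b < e:
        return Sample(b, e, "+")
    if b > e:
        return Sample(e, b, "-")
    # The description does not define the b == e case, so refuse to guess.
    raise ValueError("b == e: strand is undefined")


def parse_map_line(line: str) -> Sample:
    """Parse one map line, ASSUMING its first two integer fields are b and e."""
    fields = line.replace(",", " ").split()
    return interpret_pair(int(fields[0]), int(fields[1]))
```

For instance, under the assumed layout, parse_map_line("900 250") describes a read taken from [250,900] on the reverse strand.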
