Provided by: art-nextgen-simulation-tools_20160605+dfsg-4build2_amd64
NAME
art_illumina - Simulation of Illumina sequencers
DESCRIPTION
ART is a set of simulation tools that generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles summarized from large recalibrated sequencing data. art_illumina simulates reads from Illumina sequencers.
USAGE
art_illumina [options] -sam -i <seq_ref_file> -l <read_length> -f <fold_coverage> -ss <sequencing_system> -o <outfile_prefix>
art_illumina [options] -sam -i <seq_ref_file> -l <read_length> -f <fold_coverage> -o <outfile_prefix>
art_illumina [options] -sam -i <seq_ref_file> -l <read_length> -c <total_num_reads> -o <outfile_prefix>
art_illumina [options] -sam -i <seq_ref_file> -l <read_length> -f <fold_coverage> -m <mean_fragsize> -s <std_fragsize> -o <outfile_prefix>
art_illumina [options] -sam -i <seq_ref_file> -l <read_length> -c <total_num_reads> -m <mean_fragsize> -s <std_fragsize> -o <outfile_prefix>
OPTIONS
-1    --qprof1     the first-read quality profile
-2    --qprof2     the second-read quality profile
-amp  --amplicon   amplicon sequencing simulation
-c    --rcount     total number of reads/read pairs to be generated [per amplicon if for amplicon simulation] (not to be used together with -f/--fcov)
-d    --id         the prefix identification tag for read IDs
-ef   --errfree    generate a zero-sequencing-error SAM file as well as the regular one
                   NOTE: the reads in the zero-error SAM file have the same alignment positions as those in the regular SAM file, but have no sequencing errors
-f    --fcov       the fold of read coverage to be simulated, or the number of reads/read pairs generated for each amplicon
-h    --help       print usage information
-i    --in         the filename of the input DNA/RNA reference
-ir   --insRate    the first-read insertion rate (default: 0.00009)
-ir2  --insRate2   the second-read insertion rate (default: 0.00015)
-dr   --delRate    the first-read deletion rate (default: 0.00011)
-dr2  --delRate2   the second-read deletion rate (default: 0.00023)
-l    --len        the length of reads to be simulated
-m    --mflen      the mean size of DNA/RNA fragments for paired-end simulations
-mp   --matepair   indicate a mate-pair read simulation
-nf   --maskN      the cutoff frequency of 'N' in a window the size of the read length for masking genomic regions
                   NOTE: the default is '-nf 1', which masks all regions with 'N'. Use '-nf 0' to turn off masking
-na   --noALN      do not output the ALN alignment file
-o    --out        the prefix of the output filename
-p    --paired     indicate a paired-end read simulation, or generate reads from both ends of amplicons
                   NOTE: art will automatically switch to a mate-pair simulation if the given mean fragment size >= 2000
-q    --quiet      turn off the end-of-run summary
-qs   --qShift     the amount to shift every first-read quality score by
-qs2  --qShift2    the amount to shift every second-read quality score by
                   NOTE: for the -qs/-qs2 options, a positive number shifts quality scores up (the maximum is 93), which reduces substitution sequencing errors, and a negative number shifts quality scores down, which increases sequencing errors. If scores are shifted by x, the error rate will be 1/(10^(x/10)) of the default profile.
-rs   --rndSeed    the seed for the random number generator (default: system time in seconds)
                   NOTE: use a fixed seed to generate identical datasets from different runs
-s    --sdev       the standard deviation of DNA/RNA fragment size for paired-end simulations
-sam  --samout     generate a SAM alignment file
-sp   --sepProf    use separate quality profiles for different bases (ATGC)
-ss   --seqSys     the name of the Illumina sequencing system of the built-in profile used for simulation
                   NOTE: sequencing system ID names are:
                   GA1 - Genome Analyzer I, GA2 - Genome Analyzer II,
                   HS10 - HiSeq 1000, HS20 - HiSeq 2000, HS25 - HiSeq 2500, MS - MiSeq
-M    --cigarM     use CIGAR 'M' instead of '=/X' for alignment match/mismatch
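The formula in the -qs/-qs2 note can be checked with a little arithmetic. The following is a minimal sketch that evaluates 1/(10^(x/10)) for a given shift x; the function name qshift_factor is hypothetical and is not part of ART.

```shell
#!/bin/sh
# Hypothetical helper (not part of ART): relative substitution error rate
# after shifting quality scores by x, using the formula from the
# -qs/-qs2 note: 1 / (10^(x/10)).
qshift_factor() {
    awk -v x="$1" 'BEGIN { printf "%.4f\n", 1 / (10 ^ (x / 10)) }'
}

qshift_factor 10     # shift up by 10: one tenth of the default error rate
qshift_factor -10    # shift down by 10: ten times the default error rate
qshift_factor 0      # no shift: error rate unchanged
```

The three calls print 0.1000, 10.0000 and 1.0000, which matches the note: a positive shift lowers the error rate and a negative shift raises it.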
NOTES
* ART by default selects a built-in quality score profile according to the read length specified for the run.
* For single-end simulation, ART requires an input sequence file, an output file prefix, a read length, and a read count or fold coverage.
* For paired-end simulation (except for amplicon sequencing), ART also requires the mean and standard deviation of DNA/RNA fragment lengths.
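Because a run takes either a read count (-c) or a fold coverage (-f), it can help to estimate how the two relate. The sketch below assumes the usual definition, coverage = read count * read length / reference length; ART's own per-sequence accounting may differ. Both helper names are hypothetical and are not part of ART.

```shell
#!/bin/sh
# Hypothetical helper (not part of ART): estimate the number of reads that
# corresponds to a given fold coverage, assuming
#   coverage = read_count * read_length / reference_length
# The result is truncated to a whole number of reads.
reads_for_coverage() {
    # $1 = reference length in bases, $2 = read length, $3 = fold coverage
    awk -v g="$1" -v l="$2" -v f="$3" 'BEGIN { printf "%d\n", g * f / l }'
}

# Hypothetical helper: total number of bases in a FASTA file,
# ignoring header lines (those starting with '>') and whitespace.
fasta_length() {
    awk '!/^>/ { gsub(/[ \t\r]/, ""); n += length($0) } END { print n + 0 }' "$1"
}

# A 1,000,000-base reference at 10x coverage with 150-base reads.
reads_for_coverage 1000000 150 10
```

For a real reference, the first argument could be supplied as "$(fasta_length reference.fa)".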
EXAMPLES
1) single-end read simulation
   art_illumina -sam -i reference.fa -l 150 -ss HS25 -f 10 -o single_dat
2) paired-end read simulation
   art_illumina -sam -i reference.fa -p -l 150 -ss HS25 -f 20 -m 200 -s 10 -o paired_dat
3) mate-pair read simulation
   art_illumina -sam -i reference.fa -mp -l 50 -f 20 -m 2500 -s 50 -o matepair_dat
4) amplicon sequencing simulation with 5' end single-end reads
   art_illumina -amp -sam -na -i amp_reference.fa -l 50 -f 10 -o amplicon_5end_dat
5) amplicon sequencing simulation with paired-end reads
   art_illumina -amp -p -sam -na -i amp_reference.fa -l 50 -f 10 -o amplicon_pair_dat
6) amplicon sequencing simulation with matepair reads
   art_illumina -amp -mp -sam -na -i amp_reference.fa -l 50 -f 10 -o amplicon_mate_dat
7) generate an extra SAM file with zero-sequencing errors for a paired-end read simulation
   art_illumina -ef -i reference.fa -p -l 50 -f 20 -m 200 -s 10 -o paired_twosam_dat
8) reduce the substitution error rate to one 10th of the default profile
   art_illumina -i reference.fa -qs 10 -qs2 10 -l 50 -f 10 -p -m 500 -s 10 -sam -o reduce_error
9) turn off the masking of genomic regions with unknown nucleotides 'N'
   art_illumina -nf 0 -sam -i reference.fa -p -l 50 -f 20 -m 200 -s 10 -o paired_nomask
10) masking genomic regions with >=5 'N's within the read length 50
   art_illumina -nf 5 -sam -i reference.fa -p -l 50 -f 20 -m 200 -s 10 -o paired_maskN5
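The -rs option is not shown in the examples above, so here is a minimal sketch of a reproducible paired-end run. It only prints the command lines rather than executing them, so it works without ART installed. The helper name art_paired_cmd, the seed value 42, and the prefixes run_a and run_b are illustrative assumptions. The flags themselves are taken from the OPTIONS section.

```shell
#!/bin/sh
# Hypothetical helper (not part of ART): print a paired-end art_illumina
# command line that uses a fixed random seed (-rs), so that repeated runs
# generate identical datasets.
art_paired_cmd() {
    # $1 = reference FASTA, $2 = output prefix, $3 = seed
    printf 'art_illumina -sam -rs %s -i %s -p -l 150 -ss HS25 -f 20 -m 200 -s 10 -o %s\n' \
        "$3" "$1" "$2"
}

# Print two command lines that differ only in the output prefix.
art_paired_cmd reference.fa run_a 42
art_paired_cmd reference.fa run_b 42
```

Piping the printed lines to sh would run them. The read files written under the two prefixes can then be compared with a tool such as cmp to confirm that the fixed seed gives the same reads.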
AUTHOR
This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.