Provided by: artfastqgenerator_0.0.20150519-3_all

NAME

       artfastqgenerator - outputs artificial FASTQ files derived from a reference genome

SYNOPSIS

       artfastqgenerator   -O   <outputPath>   -R   <referenceGenomePath>   -S   <startSequenceIdentifier>   -F1
       <fastq1ForQualityScores>   -F2   <fastq2ForQualityScores>   -CMGCS   <coverageMeanGCcontentSpread>   -CMP
       <coverageMeanPeak>  -CMPGC  <coverageMeanPeakGCcontent> -CSD <coverageSD> -E <endSequenceIdentifier> -GCC
       <GCcontentBasedCoverage> -GCR <GCcontentRegionSize> -L  <logRegionStats>  -N  <nucleobaseBufferSize>  -OF
       <outputFormat>   -RCNF   <readsContainingNfilter>   -RL   <readLength>   -SE  <simulateErrorInRead>  -TLM
       <templateLengthMean> -TLSD <templateLengthSD> -URQS <useRealQualityScores> -X <xStart> -Y <yStart>

DESCRIPTION

       ArtificialFastqGenerator takes a reference genome (in FASTA format) as input and outputs artificial
       FASTQ files in the Sanger format. It can accept Phred base quality scores from existing FASTQ files and
       use them to simulate sequencing errors. Because the artificial FASTQs are derived from the reference
       genome, the reference genome provides a gold standard for calling variants (Single Nucleotide
       Polymorphisms (SNPs) and insertions and deletions (indels)). This enables evaluation of a Next
       Generation Sequencing (NGS) analysis pipeline that aligns reads to the reference genome and then calls
       variants.
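
       The link between Phred quality scores and simulated sequencing errors can be illustrated with a
       short Python sketch. It relies only on the standard Sanger convention (quality character =
       chr(Q + 33), error probability = 10^(-Q/10)). The function names and the uniform substitution
       model are assumptions made for illustration; they are not taken from the tool, whose actual
       error model may differ.

```python
import random

SANGER_OFFSET = 33  # Sanger FASTQ encodes Phred score Q as chr(Q + 33)


def phred_error_probability(quality_char):
    """Convert one Sanger-encoded quality character to an error probability."""
    q = ord(quality_char) - SANGER_OFFSET
    return 10 ** (-q / 10)


def simulate_errors(bases, qualities, rng):
    """Return a copy of `bases` with substitutions driven by the quality string.

    Assumption: an erroneous base is replaced by one of the other three
    nucleobases chosen uniformly; N bases are left untouched.
    """
    out = []
    for base, qual in zip(bases, qualities):
        if base in "ACGT" and rng.random() < phred_error_probability(qual):
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    # 'I' is Q40 (error probability 1e-4), '#' is Q2 (about 0.63).
    print(simulate_errors("ACGTACGT", "IIII####", rng))
```

       Low-quality positions (such as '#') are far more likely to be altered than high-quality ones
       (such as 'I'), which is the behaviour the -SE and -URQS options are described as combining.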

OPTIONS

       -h     Print usage help.

       -O, <outputPath>
              Path for the artificial FASTQ and log files, including their base name (must be specified).

       -R, <referenceGenomePath>
              Reference genome sequence file in FASTA format (must be specified).

       -S, <startSequenceIdentifier>
              Prefix of the sequence identifier in the reference after which read generation should begin  (must
              be specified).

       -F1, <fastq1ForQualityScores>
              First FASTQ file to use for real quality scores (must be specified if useRealQualityScores =
              true).

       -F2, <fastq2ForQualityScores>
              Second FASTQ file to use for real quality scores (must be specified if useRealQualityScores =
              true).

       -CMGCS, <coverageMeanGCcontentSpread>
              The spread of coverage mean given GC content (default = 0.22).

       -CMP, <coverageMeanPeak>
              The peak coverage mean for a region (default = 37.7).

       -CMPGC, <coverageMeanPeakGCcontent>
              The GC content for regions with peak coverage mean (default = 0.45).

       -CSD, <coverageSD>
              The coverage standard deviation divided by the mean (default = 0.2).

       -E, <endSequenceIdentifier>
              Prefix of the sequence identifier in the reference where read generation should stop (default =
              end of file).

       -GCC, <GCcontentBasedCoverage>
              Whether nucleobase coverage is biased by GC content (default = true).

       -GCR, <GCcontentRegionSize>
              Region size in nucleobases for which to calculate GC content (default = 150).

       -L, <logRegionStats>
              The region size, as a multiple of -N (nucleobaseBufferSize), for which summary coverage
              statistics are recorded (default = 2).

       -N, <nucleobaseBufferSize>
              The number of reference sequence nucleobases to buffer in memory (default = 5000).

       -OF, <outputFormat>
              'default': standard FASTQ output; 'debug_nucleobases(_nuc|read_ids)': debugging.

       -RCNF, <readsContainingNfilter>
              Which N-containing reads to filter out: none (0), reads consisting entirely of N (1), or reads
              containing at least one N (2) (default = 0).

       -RL, <readLength>
              The length of each read (default = 76).

       -SE, <simulateErrorInRead>
              Whether to simulate errors in the read based on the quality scores (default = false).

       -TLM, <templateLengthMean>
              The mean DNA template length (default = 210).

       -TLSD, <templateLengthSD>
              The standard deviation of the DNA template length (default = 60).

       -URQS, <useRealQualityScores>
              Whether to use real quality scores from existing FASTQ files or set all to the maximum (default =
              false).

       -X, <xStart>
              The first read's X coordinate (default = 1000).

       -Y, <yStart>
              The first read's Y coordinate (default = 1000).
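
       As a rough illustration of how the options above combine, the following Python sketch assembles
       an argument list for the command. The flag names come from this page. The helper
       build_command, the file paths, the ">chr1" identifier, and the lowercase "true"/"false"
       spelling of boolean values are assumptions for illustration only.

```python
import shlex


def build_command(output_path, reference_path, start_id, **options):
    """Assemble an artfastqgenerator argument list (hypothetical helper).

    -O, -R and -S are the options this page marks as "must be specified";
    any other flag is passed as a keyword, e.g. RL=100 becomes "-RL 100".
    """
    # Per the option descriptions, -F1 and -F2 are required when -URQS is true.
    if options.get("URQS") is True and not {"F1", "F2"}.issubset(options):
        raise ValueError("-URQS true requires both -F1 and -F2")
    cmd = ["artfastqgenerator", "-O", output_path, "-R", reference_path, "-S", start_id]
    for flag, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # assumed boolean spelling
        cmd += ["-" + flag, str(value)]
    return cmd


if __name__ == "__main__":
    # All paths and the sequence identifier below are made-up placeholders.
    cmd = build_command(
        "out/sim", "ref.fa", ">chr1",
        RL=100, URQS=True, F1="real_1.fastq", F2="real_2.fastq", SE=True,
    )
    print(shlex.join(cmd))  # shell-quoted form; pass `cmd` to subprocess.run to execute
```

       Building the command as a list rather than a single string avoids shell quoting problems with
       identifiers that begin with ">".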

BUGS

       Any bugs should be reported to Matthew.Frampton@icr.ac.uk

AUTHOR

       This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage
       of the program.