Ubuntu Manpage: artfastqgenerator - outputs artificial FASTQ files derived from a reference genome

Provided by: artfastqgenerator_0.0.20150519-2_all

NAME

       artfastqgenerator - outputs artificial FASTQ files derived from a reference genome

SYNOPSIS

       artfastqgenerator   -O   <outputPath>   -R   <referenceGenomePath>   -S   <startSequenceIdentifier>   -F1
       <fastq1ForQualityScores>   -F2   <fastq2ForQualityScores>   -CMGCS   <coverageMeanGCcontentSpread>   -CMP
       <coverageMeanPeak>  -CMPGC  <coverageMeanPeakGCcontent> -CSD <coverageSD> -E <endSequenceIdentifier> -GCC
       <GCcontentBasedCoverage> -GCR <GCcontentRegionSize> -L  <logRegionStats>  -N  <nucleobaseBufferSize>  -OF
       <outputFormat>   -RCNF   <readsContainingNfilter>   -RL   <readLength>   -SE  <simulateErrorInRead>  -TLM
       <templateLengthMean> -TLSD <templateLengthSD> -URQS <useRealQualityScores> -X <xStart> -Y <yStart>
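
       Only -O, -R and -S are required, so a minimal invocation can be short. The sketch below assembles such
       a command line in Python without running it. The paths, the sequence identifier prefix, and the helper
       name build_command are hypothetical, and including the leading ">" of the FASTA header in the prefix
       is an assumption that this page does not confirm.

```python
import shlex


def build_command(output_path, reference_path, start_id, extra=None):
    """Assemble an artfastqgenerator argument list (hypothetical helper).

    Only the three options this page marks as required are mandatory here;
    any other documented flag can be passed through `extra`.
    """
    argv = ["artfastqgenerator",
            "-O", output_path,
            "-R", reference_path,
            "-S", start_id]
    for flag, value in (extra or {}).items():
        argv += [flag, str(value)]
    return argv


# Hypothetical paths and identifier prefix, for illustration only.
argv = build_command("out/sim", "ref/genome.fasta", ">chr1",
                     {"-RL": 100, "-TLM": 300})

# Print a shell-quoted form that could be pasted into a terminal.
print(shlex.join(argv))
```

       Building the argument list first keeps quoting explicit, which matters if the prefix contains
       characters such as ">" that a shell would otherwise interpret.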

DESCRIPTION

       ArtificialFastqGenerator takes the reference genome (in FASTA format) as  input  and  outputs  artificial
       FASTQ  files in the Sanger format. It can accept Phred base quality scores from existing FASTQ files, and
       use them to simulate sequencing errors. Since the artificial FASTQs are derived from the reference
       genome, the reference genome provides a gold standard for calling variants (Single Nucleotide
       Polymorphisms (SNPs) and insertions and deletions (indels)). This enables evaluation of a Next Generation
       Sequencing (NGS) analysis pipeline that aligns reads to the reference genome and then calls
       variants.
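
       To illustrate how Phred scores relate to sequencing errors, the sketch below decodes a Sanger-format
       quality string (ASCII offset 33) into per-base error probabilities using p = 10^(-Q/10), then
       substitutes bases with those probabilities. The substitution step is an assumption about how such a
       simulation could work, not a description of the program's internals, and the function names are
       hypothetical.

```python
import random


def phred_to_error_probs(quality_string, offset=33):
    """Convert a Sanger-encoded quality string into error probabilities."""
    return [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]


def simulate_errors(read, quality_string, rng):
    """Replace each base with a different one with probability p (assumed model)."""
    out = []
    for base, p in zip(read, phred_to_error_probs(quality_string)):
        if rng.random() < p:
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)


# 'I' is Q40, '5' is Q20, '+' is Q10 and '!' is Q0 in the Sanger encoding.
probs = phred_to_error_probs("I5+!")

# A seeded generator keeps the illustration reproducible.
noisy = simulate_errors("ACGT", "I5+!", random.Random(0))
```

       The last base in this example always changes, because Q0 corresponds to an error probability of 1.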

OPTIONS

       -h     Print usage help.

       -O, <outputPath>
              Path for the artificial fastq and log files, including their base name (must be specified).

       -R, <referenceGenomePath>
              Reference genome sequence file (must be specified).

       -S, <startSequenceIdentifier>
              Prefix  of the sequence identifier in the reference after which read generation should begin (must
              be specified).

       -F1, <fastq1ForQualityScores>
              First fastq file to use for real quality scores (must be specified if useRealQualityScores =
              true).

       -F2, <fastq2ForQualityScores>
              Second fastq file to use for real quality scores (must be specified if useRealQualityScores =
              true).

       -CMGCS, <coverageMeanGCcontentSpread>
              The spread of coverage mean given GC content (default = 0.22).

       -CMP, <coverageMeanPeak>
              The peak coverage mean for a region (default = 37.7).

       -CMPGC, <coverageMeanPeakGCcontent>
              The GC content for regions with peak coverage mean (default = 0.45).

       -CSD, <coverageSD>
              The coverage standard deviation divided by the mean (default = 0.2).

       -E, <endSequenceIdentifier>
              Prefix of the sequence identifier in the reference where read generation should stop (default =
              end of file).

       -GCC, <GCcontentBasedCoverage>
              Whether nucleobase coverage is biased by GC content (default = true).

       -GCR, <GCcontentRegionSize>
              Region size in nucleobases for which to calculate GC content (default = 150).

       -L, <logRegionStats>
              The region size, as a multiple of -N (nucleobaseBufferSize), for which summary coverage statistics
              are recorded (default = 2).

       -N, <nucleobaseBufferSize>
              The number of reference sequence nucleobases to buffer in memory (default = 5000).

       -OF, <outputFormat>
              'default': standard fastq output; 'debug_nucleobases(_nuc|read_ids)': debugging.

       -RCNF, <readsContainingNfilter>
              Which N-containing reads to filter out: none (0), "all-N" reads (1), or "at-least-1-N" reads (2)
              (default = 0).

       -RL, <readLength>
              The length of each read (default = 76).

       -SE, <simulateErrorInRead>
              Whether to simulate error in the read based on the quality scores (default = false).

       -TLM, <templateLengthMean>
              The mean DNA template length (default = 210).

       -TLSD, <templateLengthSD>
              The standard deviation of the DNA template length (default = 60).

       -URQS, <useRealQualityScores>
              Whether to use real quality scores from existing fastq files or set all to the maximum (default =
              false).

       -X, <xStart>
              The first read's X coordinate (default = 1000).

       -Y, <yStart>
              The first read's Y coordinate (default = 1000).
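
       The GC-related options (-GCC, -GCR, -CMP, -CMPGC, -CMGCS, -CSD) can be read together as a model of
       coverage per region. The sketch below computes GC content per region and maps it to a coverage mean
       and standard deviation using the documented defaults. The Gaussian-shaped curve is an assumption made
       for illustration; this page gives only the peak, the peak's GC content, and the spread, not the exact
       formula. All function names are hypothetical.

```python
import math


def gc_content(region):
    """Fraction of G and C bases in a region (0.0 for an empty region)."""
    region = region.upper()
    if not region:
        return 0.0
    return (region.count("G") + region.count("C")) / len(region)


def gc_by_region(sequence, region_size=150):
    """GC content of consecutive regions; default size mirrors -GCR."""
    return [gc_content(sequence[i:i + region_size])
            for i in range(0, len(sequence), region_size)]


def coverage_mean(gc, peak=37.7, peak_gc=0.45, spread=0.22):
    """Coverage mean for a GC content, ASSUMING a Gaussian-shaped curve.

    Defaults mirror -CMP, -CMPGC and -CMGCS.
    """
    return peak * math.exp(-((gc - peak_gc) ** 2) / (2 * spread ** 2))


def coverage_sd(mean, sd_ratio=0.2):
    """Coverage standard deviation; -CSD is the SD divided by the mean."""
    return sd_ratio * mean


# Toy sequence with a small region size so the regions are easy to inspect.
for gc in gc_by_region("GGCCATATGCGC", region_size=4):
    mean = coverage_mean(gc)
    print(f"GC={gc:.2f} mean={mean:.1f} sd={coverage_sd(mean):.1f}")
```

       Under this assumed curve, regions whose GC content is far from 0.45 receive a lower coverage mean,
       which is the kind of bias that -GCC switches on or off.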

BUGS

       Any bugs should be reported to Matthew.Frampton@icr.ac.uk

AUTHOR

       This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage
       of the program.

artfastqgenerator 0.0.20150519                    February 2016                             ARTFASTQGENERATOR(1)