Provided by: seqan-apps_2.4.0+dfsg-15ubuntu1_amd64

NAME

       rabema_build_gold_standard - RABEMA Gold Standard Builder

SYNOPSIS

       rabema_build_gold_standard   [OPTIONS]   --out-gsi  OUT.gsi  --reference  REF.fa  --in-bam
       PERFECT.{sam,bam}

DESCRIPTION

       This program builds a RABEMA gold standard.  The input is a reference FASTA file and a
       perfect SAM/BAM mapping (e.g. created using RazerS 3 in full-sensitivity mode).

       The  input  SAM/BAM  file  must  be sorted by coordinate.  The program will create a FASTA
       index file REF.fa.fai for fast random access to the reference.
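
       The sorting requirement can be checked up front. The following is a minimal sketch,
       not part of RABEMA: the helper name is_coordinate_sorted is hypothetical, and it only
       inspects the SO tag that the SAM format allows on the @HD header line. It assumes a
       plain-text SAM file (a BAM header would first have to be dumped to text, e.g. with
       samtools view -H), and a header declaration does not prove that the records really
       are in order.

```shell
# Hypothetical helper: succeed if the SAM header declares coordinate sorting.
# Per the SAM format, @HD (if present) is the first line and may carry SO:coordinate.
is_coordinate_sorted() {
    head -n 1 "$1" | grep '^@HD' | grep -q 'SO:coordinate'
}
```

       For example, "is_coordinate_sorted PERFECT.sam || echo 'sort the file first'" prints
       the warning when the header does not declare coordinate order.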

OPTIONS

       -h, --help
              Display the help message.

       --version
              Display version information.

       -v, --verbose
              Enable verbose output.

       -vv, --very-verbose
              Enable even more verbose output.

   Input / Output:
       -o, --out-gsi OUTPUT_FILE
              Path to write the resulting GSI file to. Valid file type is .gsi[.*], where * is
              gz for transparent (de)compression.

       -r, --reference INPUT_FILE
              Path  to  load  reference  FASTA  from.  Valid  filetypes  are: .sam[.*], .raw[.*],
              .gbk[.*], .frn[.*], .fq[.*], .fna[.*], .ffn[.*], .fastq[.*], .fasta[.*],  .faa[.*],
              .fa[.*],  .embl[.*], and .bam, where * is any of the following extensions: gz, bz2,
              and bgzf for transparent (de)compression.

       -b, --in-bam INPUT_FILE
              Path to load the "perfect" SAM/BAM file from. Valid  filetypes  are:  .sam[.*]  and
              .bam, where * is any of the following extensions: gz, bz2, and bgzf for transparent
              (de)compression.

   Gold Standard Parameters:
       --oracle-mode
              Enable oracle mode.  This is used for simulated data when the  input  SAM/BAM  file
              gives exactly one position that is considered as the true sample position.

       --match-N
              When set, N matches all characters without penalty.

       --distance-metric STRING
              Set distance metric.  Valid values: hamming, edit.  Default: edit.

       -e, --max-error INTEGER
              Maximal error rate to build the gold standard for, in percent.  This parameter is
              an integer and relative to the read length.  In oracle mode, the error rate of
              the read at its sampling position is used, and the value given here serves as a
              cutoff threshold. Default: 0.
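
              Because the rate is relative to the read length, the absolute number of errors
              allowed depends on the read. The sketch below illustrates the arithmetic only;
              the function name max_errors is hypothetical, and rounding down is an
              assumption, since this page does not state how RABEMA rounds.

```shell
# Hypothetical helper: absolute error budget for a read.
#   $1 = error rate in percent (the value passed to -e)
#   $2 = read length in bases
# ASSUMPTION: the result is rounded down (integer division).
max_errors() {
    echo $(( $1 * $2 / 100 ))
}

max_errors 4 100   # prints 4
```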

RETURN VALUES

       A return value of 0 indicates success, any other value indicates an error.
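
       In a script, the return value can be used to stop a pipeline early. A minimal sketch
       follows; the wrapper name run_checked is hypothetical and works with any command.

```shell
# Hypothetical wrapper: run a command and report a non-zero return value.
run_checked() {
    if "$@"; then
        echo "ok"
    else
        status=$?   # return value of the command that just failed
        echo "failed with return value $status" >&2
        return "$status"
    fi
}
```

       For example: run_checked rabema_build_gold_standard -e 4 -o OUT.gsi -b IN.bam -r REF.fa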

EXAMPLES

       rabema_build_gold_standard -e 4 -o OUT.gsi -b IN.sam -r REF.fa
              Build gold standard from a SAM file IN.sam with all mapping locations and a FASTA
              reference REF.fa to GSI file OUT.gsi with a maximal error rate of 4%.

       rabema_build_gold_standard --distance-metric hamming -e 4 -o OUT.gsi -b IN.bam -r REF.fa
              Same as above, but using Hamming instead of edit distance and BAM as the input.

       rabema_build_gold_standard --oracle-mode -o OUT.gsi -b IN.sam -r REF.fa
              Build gold standard from a SAM file IN.sam with the original sample position,  e.g.
              as exported by read simulator Mason.

MEMORY REQUIREMENTS

       From version 1.1, great care has been taken to keep the memory requirements as low as
       possible. The memory required is two times the size of the largest chromosome plus some
       constant memory for each match.

       For example, the memory usage for 100bp human genome reads at 5% error rate was 1.7GB. Of
       this, roughly 400MB came from the chromosome and 1.3GB from the matches.
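
       The rule above can be turned into a rough estimate. The following is a sketch only:
       the function name estimate_memory_bytes is hypothetical, and the per-match cost is a
       placeholder that has to be supplied by the user, since this page does not state the
       actual constant.

```shell
# Hypothetical helper: rough memory estimate in bytes.
#   $1 = size of the largest chromosome in bytes
#   $2 = number of matches
#   $3 = ASSUMED constant memory per match in bytes (not given on this page)
estimate_memory_bytes() {
    echo $(( 2 * $1 + $2 * $3 ))
}

estimate_memory_bytes 200000000 1000000 16   # prints 416000000
```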

REFERENCES

       M. Holtgrewe, A.-K. Emde, D. Weese and K. Reinert.  A Novel And Well-Defined  Benchmarking
       Method For Second Generation Read Mapping, BMC Bioinformatics 2011, 12:210.

       http://www.seqan.de/rabema
              RABEMA Homepage

       http://www.seqan.de/mason
              Mason Homepage