lunar (1) rabema_evaluate.1.gz

Provided by: seqan-apps_2.4.0+dfsg-15ubuntu1_amd64

NAME

       rabema_evaluate - RABEMA Evaluation

SYNOPSIS

       rabema_evaluate [OPTIONS] --reference REF.fa --in-gsi IN.gsi --in-bam MAPPING.{sam,bam}

DESCRIPTION

       Compare the SAM/BAM output MAPPING.sam/MAPPING.bam of any read mapper against the RABEMA
       gold standard previously built with rabema_build_gold_standard. The input is a reference
       FASTA file, a gold standard interval (GSI) file, and the SAM/BAM file to evaluate.

       The input SAM/BAM file must be sorted by query name (QNAME). The program creates the FASTA
       index file REF.fa.fai for fast random access to the reference.
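       A sketch of the random access that such an index enables, assuming REF.fa.fai uses the
       samtools faidx layout (name, length, byte offset of the first base, bases per line, bytes
       per line). FaiEntry, parse_fai_line, and byte_offset are hypothetical helpers, not part of
       RABEMA or SeqAn.

```python
from dataclasses import dataclass

# Sketch only: these names are hypothetical, not RABEMA/SeqAn API.


@dataclass
class FaiEntry:
    """One line of a samtools-style .fai file (assumed layout)."""
    name: str
    length: int      # number of bases in the sequence
    offset: int      # byte offset of the first base in the FASTA file
    line_bases: int  # bases per full sequence line
    line_width: int  # bytes per full sequence line, including the newline


def parse_fai_line(line: str) -> FaiEntry:
    # Only the first five tab-separated columns are used.
    name, length, offset, line_bases, line_width = line.rstrip("\n").split("\t")[:5]
    return FaiEntry(name, int(length), int(offset), int(line_bases), int(line_width))


def byte_offset(entry: FaiEntry, pos: int) -> int:
    """Byte offset in the FASTA file of 0-based position pos of this sequence."""
    if not 0 <= pos < entry.length:
        raise IndexError("position outside sequence")
    full_lines, column = divmod(pos, entry.line_bases)
    return entry.offset + full_lines * entry.line_width + column
```

       With 4 bases per line, position 5 lies on the second sequence line, so the offset skips
       one full line (4 bases plus the newline) and then one more byte; no scan of the file is
       needed.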

OPTIONS

       -h, --help
              Display the help message.

       --version
              Display version information.

       -v, --verbose
              Enable verbose output.

       -vv, --very-verbose
              Enable even more verbose output.

   Input / Output:
       -r, --reference INPUT_FILE
              Path to load  reference  FASTA  from.  Valid  filetypes  are:  .sam[.*],  .raw[.*],
              .gbk[.*],  .frn[.*], .fq[.*], .fna[.*], .ffn[.*], .fastq[.*], .fasta[.*], .faa[.*],
              .fa[.*], .embl[.*], and .bam, where * is any of the following extensions: gz,  bz2,
              and bgzf for transparent (de)compression.
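              The [.*] notation means that a compression extension may follow the format
              extension, e.g. hg19.fa.gz. A minimal sketch of splitting such a file name using
              the lists above; split_extensions is a hypothetical helper, not RABEMA's own code.

```python
from typing import Optional, Tuple

# Sketch only: split_extensions is a hypothetical helper, not RABEMA code.
# Format extensions accepted by --reference, as listed above.
REFERENCE_FORMATS = {
    "sam", "raw", "gbk", "frn", "fq", "fna", "ffn",
    "fastq", "fasta", "faa", "fa", "embl", "bam",
}
# Compression extensions allowed in place of [.*].
COMPRESSION_EXTS = {"gz", "bz2", "bgzf"}


def split_extensions(path: str) -> Tuple[str, Optional[str]]:
    """Return (format extension, compression extension or None) for path."""
    parts = path.rsplit("/", 1)[-1].split(".")
    compression = None
    # Need at least "name.fmt.comp" before treating the last part as compression.
    if len(parts) >= 3 and parts[-1] in COMPRESSION_EXTS:
        compression = parts.pop()
    if len(parts) < 2:
        raise ValueError(f"no format extension in {path!r}")
    fmt = parts[-1]
    if fmt not in REFERENCE_FORMATS:
        raise ValueError(f"unsupported reference format .{fmt}")
    if fmt == "bam" and compression is not None:
        # .bam is listed without [.*]; BAM is already BGZF-compressed.
        raise ValueError(".bam takes no extra compression extension")
    return fmt, compression
```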

       -g, --in-gsi INPUT_FILE
              Path to load gold standard intervals from. If the file is compressed with gzip, it
              is decompressed on the fly. Valid filetype is: .gsi[.*], where * may be gz for
              transparent (de)compression.
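              One way to decompress "on the fly" is to sniff the two gzip magic bytes. This is an
              illustrative alternative, not necessarily how rabema_evaluate detects compression
              (the filetype list above suggests it goes by the .gz extension). The function name
              is hypothetical, and nothing is assumed about the GSI record layout.

```python
import gzip
from typing import TextIO

# Sketch only: open_text_maybe_gzip is a hypothetical helper, not RABEMA code.
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def open_text_maybe_gzip(path: str) -> TextIO:
    """Open path for reading text, decompressing transparently if it is gzip data."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt")
    return open(path, "r")
```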

       -b, --in-bam INPUT_FILE
              Path  to load the read mapper SAM or BAM output from. Valid filetypes are: .sam[.*]
              and .bam, where * is any of  the  following  extensions:  gz,  bz2,  and  bgzf  for
              transparent (de)compression.

       --out-tsv OUTPUT_FILE
              Path to write the statistics to as TSV. Valid filetype is: .rabema_report_tsv.

       --dont-check-sorting
              Do not check that the input SAM/BAM file is sorted by read name. This is required
              if the reads in the original FASTQ files are not sorted by name. Files from the SRA
              and ENA are generally sorted.
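              A sketch of the kind of check this option disables: walk the SAM records and report
              the first one whose read name (QNAME) is smaller than its predecessor's. It assumes
              plain lexicographic comparison; tools that sort by name may use a numeric-aware
              ordering, so the real check can differ. The function name is hypothetical.

```python
from typing import Iterable, Optional

# Sketch only: first_unsorted_record is hypothetical, not RABEMA code.


def first_unsorted_record(sam_lines: Iterable[str]) -> Optional[int]:
    """Return the 0-based index of the first record whose QNAME is smaller than
    the previous record's QNAME, or None if the records are name-sorted.

    Header lines (starting with '@') are skipped and not counted.
    """
    previous = None
    index = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        qname = line.split("\t", 1)[0]
        # Lexicographic comparison is an assumption of this sketch.
        if previous is not None and qname < previous:
            return index
        previous = qname
        index += 1
    return None
```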

   Benchmark Parameters:
       --oracle-mode
              Enable oracle mode. This is used for simulated data, where the input GSI file gives
              exactly one position that is considered the true sample position.

       --only-unique-reads
              Consider only reads that have a single alignment in the mapping result file. Useful
              for precision computation.
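              A sketch of selecting such reads from SAM text: count alignment records per read
              name and keep the names that occur exactly once. Skipping unmapped records (FLAG
              bit 0x4) and not distinguishing the mates of a pair are assumptions of this sketch,
              and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Set

# Sketch only: uniquely_mapped_reads is hypothetical, not RABEMA code.


def uniquely_mapped_reads(sam_lines: Iterable[str]) -> Set[str]:
    """Return the names of reads with exactly one mapped alignment record."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:  # segment unmapped
            continue
        counts[qname] += 1
    return {name for name, n in counts.items() if n == 1}
```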

       --match-N
              When set, N matches all characters without penalty.

       --distance-metric STRING
              Set the distance metric. One of hamming and edit. Default: edit.
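              The two metrics and the effect of --match-N can be sketched with a Hamming distance
              (equal-length strings) and a Levenshtein edit distance. Comparing two whole strings
              is a simplification of how RABEMA aligns reads against reference intervals, and
              the function names are hypothetical.

```python
# Sketch only: these functions are hypothetical, not RABEMA code.


def chars_match(a: str, b: str, match_n: bool) -> bool:
    """Compare two bases; with match_n, an N on either side matches anything."""
    if match_n and (a == "N" or b == "N"):
        return True
    return a == b


def hamming(read: str, ref: str, match_n: bool = False) -> int:
    """Number of mismatching positions between equal-length strings."""
    if len(read) != len(ref):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(not chars_match(a, b, match_n) for a, b in zip(read, ref))


def edit_distance(read: str, ref: str, match_n: bool = False) -> int:
    """Levenshtein distance with unit costs, computed row by row."""
    prev = list(range(len(ref) + 1))  # distances for the empty read prefix
    for i, a in enumerate(read, start=1):
        cur = [i] + [0] * len(ref)
        for j, b in enumerate(ref, start=1):
            cost = 0 if chars_match(a, b, match_n) else 1
            cur[j] = min(prev[j] + 1,         # read base aligned to a gap
                         cur[j - 1] + 1,      # reference base aligned to a gap
                         prev[j - 1] + cost)  # match or substitution
        prev = cur
    return prev[-1]
```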

       -e, --max-error INTEGER
              Maximal error rate, in percent, to build the gold standard for. This parameter is
              an integer and is relative to the read length. The error rate is ignored in oracle
              mode; there, the distance of each read at its sample position is used individually.
              Default: 0.
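              As a worked example, -e 5 with 100 bp reads allows 5 errors per read. With 36 bp
              reads, 4 percent gives 1.44 errors, which the sketch below rounds down to 1; the
              rounding direction and the function name are assumptions.

```python
# Sketch only: max_errors is hypothetical; rounding down is an assumption.


def max_errors(error_rate_percent: int, read_length: int) -> int:
    """Absolute number of errors allowed for one read."""
    if not 0 <= error_rate_percent <= 100:
        raise ValueError("error rate must be a percentage between 0 and 100")
    # Integer floor division avoids floating-point rounding surprises.
    return error_rate_percent * read_length // 100
```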

       -c, --benchmark-category STRING
              Set benchmark category. One of all, all-best, and any-best. Default: all.
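              The RABEMA paper listed under REFERENCES defines the categories roughly as follows:
              all asks for every alignment within the error rate, all-best for every alignment
              with the best distance, and any-best for at least one alignment with the best
              distance. A simplified sketch of per-read scoring under that reading; the Interval
              type, its fields, and the helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

# Sketch only: Interval and the helpers below are hypothetical, not RABEMA code.
CATEGORIES = ("all", "all-best", "any-best")


@dataclass(frozen=True)
class Interval:
    """A gold-standard interval for one read (assumed fields)."""
    contig: str
    start: int
    end: int
    distance: int  # distance of the alignment this interval stands for


def relevant_intervals(intervals: Iterable[Interval], max_errors: int,
                       category: str) -> Set[Interval]:
    """Intervals a mapper is expected to hit under the given category."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown benchmark category: {category}")
    within = {iv for iv in intervals if iv.distance <= max_errors}
    if category == "all" or not within:
        return within
    best = min(iv.distance for iv in within)
    return {iv for iv in within if iv.distance == best}


def read_score(intervals: Iterable[Interval], max_errors: int, category: str,
               found: Set[Interval]) -> Optional[float]:
    """Score in [0, 1] for one read, or None if it has no relevant interval."""
    relevant = relevant_intervals(intervals, max_errors, category)
    if not relevant:
        return None
    hit = len(relevant & found)
    if category == "any-best":
        return 1.0 if hit else 0.0  # one best interval is enough
    return hit / len(relevant)
```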

       --trust-NM
              When set, we trust the alignment and distance from the SAM/BAM file and no
              realignment is performed. Off by default.
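              The distance in question is normally stored in the standard NM tag (edit distance
              to the reference). A sketch of reading it from one SAM line; the function name is
              hypothetical.

```python
from typing import Optional

# Sketch only: nm_tag is a hypothetical helper, not RABEMA code.


def nm_tag(sam_line: str) -> Optional[int]:
    """Return the integer value of the NM tag of a SAM record, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional fields follow the 11 mandatory columns
        if tag.startswith("NM:i:"):
            return int(tag[5:])
    return None
```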

       --extra-pos-tag STRING
              If the CIGAR string is absent, the missing alignment end position can  be  provided
              by this BAM tag.
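              When a CIGAR string is present, the end position follows from POS by adding the
              lengths of all operations that consume the reference (M, D, N, =, X). A sketch of
              that computation with a fallback to a tag; the tag name XE in the tests is
              hypothetical, and because the tag's exact convention (0- or 1-based) is not stated
              above, its value is returned unchanged.

```python
import re
from typing import Dict, Optional

# Sketch only: alignment_end is hypothetical, not RABEMA code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def alignment_end(pos: int, cigar: str, tags: Dict[str, int],
                  extra_pos_tag: Optional[str] = None) -> int:
    """Return the 1-based inclusive end of an alignment starting at POS.

    If CIGAR is '*', return the value of the integer tag named extra_pos_tag
    unchanged (its convention is an assumption left to the caller).
    """
    if cigar == "*":
        if extra_pos_tag is None or extra_pos_tag not in tags:
            raise ValueError("no CIGAR string and no end-position tag")
        return tags[extra_pos_tag]
    span = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos + span - 1
```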

       --ignore-paired-flags
              When  set,  we ignore all SAM/BAM flags related to pairing.  This is necessary when
              analyzing SAM from SOAP's soap2sam.pl script.
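              A sketch of ignoring pairing flags by masking them out of FLAG. Which bits RABEMA
              clears exactly is not stated here; the sketch assumes the pairing-related bits of
              the SAM specification (0x1, 0x2, 0x8, 0x20, 0x40, 0x80), and the names are
              hypothetical.

```python
# Sketch only: PAIRING_BITS and strip_pairing_flags are hypothetical names.
# Pairing-related FLAG bits as defined in the SAM specification.
PAIRING_BITS = (
    0x1     # template has multiple segments (paired)
    | 0x2   # each segment properly aligned
    | 0x8   # next segment unmapped
    | 0x20  # SEQ of next segment reverse complemented
    | 0x40  # first segment in the template
    | 0x80  # last segment in the template
)


def strip_pairing_flags(flag: int) -> int:
    """Clear the pairing-related bits, keeping e.g. 0x4, 0x10 and 0x100."""
    return flag & ~PAIRING_BITS
```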

       --DONT-PANIC
              Do not stop program execution if an additional hit was found  that  indicates  that
              the gold standard is incorrect.

   Logging:
       --show-missed-intervals
              Show details for each missed interval from the GSI.

       --show-invalid-hits
              Show details for invalid hits (hits whose error rate is too high).

       --show-additional-hits
              Show details for additional hits (low enough error rate but not in the gold standard).

       --show-hits
              Show details for hit intervals.

       --show-try-hit
              Show details for each alignment in SAM/BAM input.

              The occurrence of "invalid" hits in the read mapper's output is not an error.
              Additional hits, however, indicate an error in the gold standard.
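              The three kinds of alignment reported by the logging options can be sketched as a
              classifier. Matching an alignment's end position against inclusive gold-standard
              intervals, and checking the error count first, are assumptions of this sketch; the
              names are hypothetical.

```python
from enum import Enum
from typing import Iterable, Tuple

# Sketch only: HitKind and classify are hypothetical, not RABEMA code.


class HitKind(Enum):
    HIT = "hit"                # falls into a gold-standard interval
    INVALID = "invalid"        # error count above the allowed maximum
    ADDITIONAL = "additional"  # allowed error count but no matching interval


def classify(contig: str, end_pos: int, errors: int, max_errors: int,
             intervals: Iterable[Tuple[str, int, int]]) -> HitKind:
    """Classify one alignment; intervals are (contig, first, last), inclusive."""
    if errors > max_errors:
        return HitKind.INVALID
    for iv_contig, first, last in intervals:
        if iv_contig == contig and first <= end_pos <= last:
            return HitKind.HIT
    return HitKind.ADDITIONAL
```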

RETURN VALUES

       A return value of 0 indicates success, any other value indicates an error.

MEMORY REQUIREMENTS

       Since version 1.1, great care has been taken to keep the memory requirements as low as
       possible.

       The evaluation step needs to store the whole reference sequence in memory, but little else.
       So, for the human genome, the memory requirements are below 4 GB, regardless of the size
       of the GSI or SAM/BAM file.
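       A back-of-the-envelope check of the 4 GB figure, assuming one byte per base and a human
       reference of roughly 3.1 billion bases: the sequence alone then needs about 2.9 GiB. Both
       numbers are assumptions of this sketch, not measured values, and the names are
       hypothetical.

```python
# Sketch only: the cost per base and the genome size are assumptions.
HUMAN_GENOME_BASES = 3_100_000_000  # roughly 3.1 billion bases


def reference_memory_gib(genome_length: int, bytes_per_base: float = 1.0) -> float:
    """Approximate memory for the reference sequence alone, in GiB (2**30 bytes)."""
    return genome_length * bytes_per_base / 2**30


if __name__ == "__main__":
    print(f"{reference_memory_gib(HUMAN_GENOME_BASES):.1f} GiB")
```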

REFERENCES

       M.  Holtgrewe, A.-K. Emde, D. Weese and K. Reinert.  A Novel And Well-Defined Benchmarking
       Method For Second Generation Read Mapping, BMC Bioinformatics 2011, 12:210.

       http://www.seqan.de/rabema
              RABEMA Homepage

       http://www.seqan.de/mason
              Mason Homepage