Provided by: crac_2.5.2+dfsg-2build2_amd64

NAME

       crac v. 2.5.2 - Crac is a tool to analyse RNA-Seq data provided by NGS.

SYNOPSIS

       crac [ options ] -i <index_file> -r <reads_file1> [reads_file2] -k <int> -o <output_file>

       crac -h|--help
       crac -f|--full-help
       crac -v|--version

DESCRIPTION

       CRAC: an integrated approach to the analysis of RNA-seq reads

       Whatever the biological questions it addresses, each RNA-seq analysis requires a computational prediction
       of either small scale mutations, indels, splice junctions or fusion RNAs. This  prediction  is  currently
       performed  using  complex  pipelines  involving  multiple  tools  for  mapping, coverage computation, and
       prediction at distinct steps.  We propose  a  novel  way  of  analyzing  reads  that  integrates  genomic
       locations and local coverage, and delivers all above mentioned predictions in a single step. Our program,
       CRAC, uses a double k-mer profiling approach to detect candidate  mutations,  indels,  splice  or  fusion
       junctions  in  each  single read.  Compared to existing tools, CRAC provides state of the art sensitivity
       and improved precision for all types of predictions, yielding high  rates  of  true  positive  candidates
       (99.5%  for  splice  junctions).  When  applied  to  four  breast cancer libraries, CRAC recovered 74% of
       validated fusion RNAs and predicted recurrent fusion junctions that were overlooked in previous studies.
       Importantly, CRAC improves its predictive performance when supplied with e.g. 200 nt reads and should fit
       future needs of read analyses.

SPECIAL OPTIONS

       Like much software, CRAC has many optional parameters, but only a few are mandatory (see Mandatory
       options). This document is intended to help users of CRAC choose the most appropriate parameters for
       their needs:

       -h, --help
              to print the principal help page of CRAC

       -f, --full-help
              to print the complete help page of CRAC

       -v, --version
              to print the version of CRAC

USUAL ARGUMENTS

   Mandatory options
       All these flags must be set.

       -i <index_file>
              is the name of the index previously built with the crac-index binary. Note that crac-index
              constructs the structure <index_file.ssa> with its configuration <index_file.conf>, so only the
              prefix <index_file> must be specified (without extension) for CRAC to find both the structure
              and the configuration files

       -r <reads_file1> [reads_file2]
              is the source(s) of the FASTA or FASTQ file(s) containing the reads. Note that the number of files
              depends on whether the reads are single or paired. The input files may also be compressed with
              gzip

       -k <int>
              is the length of the k-mer to be used to map the reads on the reference <index_file>. Note that
              the condition (k < m), where m is the read length, is necessary: reads (or both paired reads) are
              ignored if m < k. The value of k must be chosen to ensure (as much as possible) that a k-mer has
              a very high probability of occurring only once in the genome

       -o, --sam <output_file>
              is the output file in SAM format (see the Documentation of SAM format in CRAC for more details);
              use "-o -" to print to STDOUT
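
       The advice given for -k (a k-mer should be very unlikely to occur more than once in the genome) can be
       illustrated with a rough estimate. The sketch below is not part of CRAC: it assumes an idealised
       genome in which every base is uniform and independent, and the genome size and tolerance are
       hypothetical figures chosen for illustration. Real genomes contain repeats, so the result is only a
       lower bound on a sensible k.

```python
def expected_random_hits(genome_length: int, k: int) -> float:
    """Expected number of chance occurrences of one k-mer, counting both
    strands, in a uniform random genome (an idealised model, not CRAC's)."""
    return 2 * genome_length / 4 ** k


def smallest_k(genome_length: int, max_expected_hits: float) -> int:
    """Smallest k whose expected number of chance occurrences is below the bound."""
    k = 1
    while expected_random_hits(genome_length, k) >= max_expected_hits:
        k += 1
    return k


if __name__ == "__main__":
    # Assumed figures: a genome of about 3.1 billion bases and a tolerance
    # of one chance occurrence per thousand k-mers.
    print(smallest_k(3_100_000_000, 0.001))
```

       Whatever value is retained, it must remain smaller than the read length m, as required above.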

   Optional parameters
       --stranded
              must be specified if reads are produced by a stranded RNA-Seq protocol (not stranded by
              default)

       --fr/--rf/--ff
              set the alignment orientation of the mates (--rf by default)

       -m, --reads-length <int>
              must be specified for reads of fixed length. If the read length is fixed, we strongly recommend
              specifying it with the -m parameter, as CRAC will then be much faster. --reads-length <int> can
              also be specified for variable-length or longer reads: shorter reads are ignored and longer
              reads are trimmed

       --treat-multiple <int>
              display alignments with multiple locations (with a fixed limit) rather than a single alignment per
              read in the SAM file

       --nb-threads <int>
              is the number of threads used to run crac; computation time is divided by almost the number of
              threads (one thread by default)

       --max-locs <int>
              corresponds to the maximum number of occurrences retrieved in the index for a given k-mer:
              smaller is faster, but with a small value you may miss some locations that would help CRAC
              detect the right cause

       --no-ambiguity <none>
              discard biological events (splice, SNV, indel, chimera) that have several matches on the
              reference index. Indeed, if crac has identified a biological cause in the read that can match in
              different places of the genome, this cause is classified as an undetermined biological event.
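
       The effect of --reads-length on reads of varying length, as described above, can be sketched as
       follows. This is an illustration only, not CRAC's code: the function name is hypothetical, and the
       choice to trim at the end of the read (keeping the first m bases) is an assumption, since the text
       does not say which end is trimmed.

```python
from typing import Iterable, List


def apply_reads_length(reads: Iterable[str], m: int, k: int) -> List[str]:
    """Return the reads normalised to length m.

    Reads shorter than m are ignored; longer reads are trimmed to m bases
    (assumption: the first m bases are kept).
    """
    if k >= m:
        # The man page requires k < m for reads to be processed.
        raise ValueError("k must be smaller than the read length m")
    kept = []
    for read in reads:
        if len(read) < m:
            continue            # shorter reads are ignored
        kept.append(read[:m])   # longer reads are trimmed
    return kept


if __name__ == "__main__":
    print(apply_reads_length(["ACGTACGT", "ACG", "ACGTA"], m=5, k=3))
```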

   Optional output arguments
       --gz <none>
              all output files specified after this argument are gzipped (including the SAM file if the
              -o/--sam argument is specified after it)

       --bam <none>
              SAM output is encoded in binary format (BAM)

       --summary <output_file>
              save some statistics about mapping and classification

       --show-progressbar <none>
              show a progress bar for the process times on STDERR

       --use-x-in-cigar <none>
              use X cigar operator when CRAC identifies a mismatch
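
       Because --gz only affects the output files named after it, the order of arguments matters. The helper
       below is a hypothetical sketch (it is not shipped with CRAC) that assembles an argument list from the
       options documented above without running anything; all file names are placeholders.

```python
import shlex
from typing import List, Optional, Sequence


def build_crac_command(index: str, reads: Sequence[str], k: int, sam: str,
                       threads: int = 1, summary: Optional[str] = None,
                       gzip_sam: bool = False) -> List[str]:
    """Assemble a crac argument list using only flags from this man page."""
    if not 1 <= len(reads) <= 2:
        raise ValueError("expected one reads file (single) or two (paired)")
    cmd = ["crac", "-i", index, "-r", *reads, "-k", str(k)]
    if threads > 1:
        cmd += ["--nb-threads", str(threads)]
    if summary is not None:
        # Listed before --gz, so it is not among the outputs named after it.
        cmd += ["--summary", summary]
    if gzip_sam:
        cmd.append("--gz")      # must precede the outputs it should compress
    cmd += ["-o", sam]
    return cmd


if __name__ == "__main__":
    # Placeholder names; nothing is executed, the command is only printed.
    args = build_crac_command("my_index", ["r1.fastq.gz", "r2.fastq.gz"],
                              22, "out.sam", threads=4)
    print(shlex.join(args))
```

       The resulting list can be passed to a process launcher such as subprocess.run once crac and an index
       are available.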

   Optional output homemade file formats
       --all <base_filename>
              set the output base filename for all of the following causes. Note that only a base_filename
              must be specified; the appropriate file extension is then added for each cause (SNP, chimera,
              splice, etc.)

       --normal <output_file>
              save reads that do not contain any break

       --almost-normal <output_file>
              save reads that do not contain any break but with a variable support

       --single <output_file>
              save reads which are located in this way: at least --min-percent-single-loc <float> of k-mers are
              located exactly once on the reference index

       --duplicate <output_file>
              save reads which are located in this way: at least --min-percent-duplication-loc <float> of k-mers
              are located a few times on the reference index (i.e. between --min-duplication <int> and
              --max-duplication <int> locations)

       --multiple <output_file>
              save reads which are located in this way: at least --min-percent-multiple-loc <float> of k-mers
              are located many times on the reference index (i.e. more than --max-duplication <int> locations)

       --none <output_file>
              save reads which are not located on the reference index

       --snv <output_file>
              save reads that contain at least a snv

       --indel <output_file>
              save reads that contain at least a biological indel

       --splice <output_file>
              save reads that contain at least a splicing junction

       --weak-splice <output_file>
              save reads that contain at least a low coverage splicing junction

       --chimera <output_file>
              save reads that contain at least a chimera junction (junction on different chromosomes, strands or
              genes)

       --paired-end-chimera <output_file>
              save paired-end reads that contain a chimera in the non-sequenced part of the original
              fragment.

       --biological <output_file>
              save reads that contain a biological cause but for which there is not enough information to be
              more specific. Note that the biological cause is described for each read

       --errors <output_file>
              save reads that contain at least a sequence error

       --repeat <output_file>
              save reads that contain a repeated sequence: at least --min-percent-repetition-loc <float>
              percent of k-mers of a given read are located at least --min-repetition <int> occurrences on the
              reference index

       --undetermined <output_file>
              save reads that contain an undetermined error: some k-mers are not located on the genome, but  the
              reason for that could not be determined. Note that the error is described for each read

       --nothing <output_file>
              save reads that are unclassified
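
       The location-based categories (--single, --duplicate, --multiple, --none) are defined by the
       proportion of k-mers of a read falling in each range of location counts. The sketch below restates
       those definitions with the default thresholds documented under "Additional settings for users". It is
       an illustration, not CRAC's implementation: the function name is hypothetical, and since this page
       does not say how a read meeting several thresholds is resolved, the sketch simply returns every
       category that applies.

```python
from typing import Sequence, Set


def location_categories(loc_counts: Sequence[int],
                        min_percent_single_loc: float = 0.15,
                        min_duplication: int = 2,
                        max_duplication: int = 9,
                        min_percent_duplication_loc: float = 0.15,
                        min_percent_multiple_loc: float = 0.50) -> Set[str]:
    """loc_counts[i] is the number of index locations of the i-th k-mer."""
    n = len(loc_counts)
    if n == 0:
        return set()
    if not any(loc_counts):
        return {"none"}         # no k-mer is located on the index
    single = sum(c == 1 for c in loc_counts) / n
    duplicate = sum(min_duplication <= c <= max_duplication
                    for c in loc_counts) / n
    multiple = sum(c > max_duplication for c in loc_counts) / n
    categories = set()
    if single >= min_percent_single_loc:
        categories.add("single")
    if duplicate >= min_percent_duplication_loc:
        categories.add("duplicate")
    if multiple >= min_percent_multiple_loc:
        categories.add("multiple")
    return categories


if __name__ == "__main__":
    print(location_categories([1, 1, 1, 1]))
```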

   Optional process for specific research
       --deep-snv
              must be specified to increase sensitivity to SNVs at the cost of more computation (substitutions
              only, no indels yet). This process searches for SNVs in borderline reads, which would otherwise
              be classified as bioundetermined

       --stringent-chimera
              must be specified to increase accuracy when finding chimera junctions, at the cost of sensitivity
              and computation time

   Optional process launcher (one must be selected)
       --emt  launch an exact matching processing of reads on the index. Either the argument specified with -k
              is equal to 0, which means that the entire read must be perfectly mapped on the genome, or only
              a factor of length k per read is mapped (the first one with a location) and the rest is
              soft-clipped. With this process, reads are not indexed, which keeps memory consumption low. Note
              that this kind of method is very useful for mapping DGE reads.

       --server
              launch a server to query a given read more precisely. This process is useful for debugging. Note
              that the output arguments will not be taken into account. Use --input-name-server <string> to
              set the input fifo name (classify.fifo by default) and --output-name-server <string> to set the
              output fifo name (classify.out.fifo by default). The server can then be used through the
              crac-client client

   Additional settings for users
       --detailed-sam
              more information is added to the SAM output file. See the Documentation of SAM format in CRAC
              for more details

       --min-percent-single-loc <float>
              is the minimum proportion of k-mers that must be uniquely mapped on the index for a read to be
              considered uniquely mapped (0.15 by default)

       --min-duplication <int>
              is the minimum number of locations to consider a k-mer duplicated (2 by default)

       --max-duplication <int>
              is the maximum number of locations to consider a k-mer duplicated (9 by default)

       --min-percent-duplication-loc <float>
              is the minimum proportion of k-mers that must be duplicated on the index for a read to be
              considered duplicated (0.15 by default)

       --min-percent-multiple-loc <float>
              is the minimum proportion of k-mers that must be mapped multiple times on the index for a read
              to be considered “multiple” (0.50 by default)

       --min-repetition <int>
              is the minimum number of locations to consider a repeated k-mer (20 by default)

       --max-percent-repetition-loc <float>
              is the minimum proportion of k-mers of a given read that must be repeated on the index for the
              read to be considered a repetition (0.20 by default)

       --max-splice-length <int>
              is the threshold to consider a splice, i.e. a splice is reported if the junction length is below
              --max-splice-length <int>; otherwise a chimera is considered (300Kb by default)

       --max-bio-indel <int>
              is the threshold to consider a biological indel, i.e. an indel is reported if the gap length is
              below --max-bio-indel <int>; otherwise a splice is considered (15 by default)

       --max-bases-retrieved <int>
              is the number of nucleotides to display in the output file in case of insertion (15 by default)

       --min-support-no-cover <float>
              is  the  minimum  coverage  to  be  able  to report a biological cause. Note that if a single read
              contains a given substitution, it is difficult (if not impossible) to distinguish a sequence error
              and a biological cause (1.30 by default)
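
       The two length thresholds --max-bio-indel and --max-splice-length together partition gaps into indels,
       splices and chimeras. The sketch below is an illustration under stated assumptions, not CRAC's code:
       the function name is hypothetical, "300Kb" is taken to mean 300,000 bases, "below" is read as a
       strict comparison, and only the distance criterion is modelled (a chimera can also be a junction
       between different chromosomes, strands or genes, which is ignored here).

```python
def classify_gap(gap_length: int,
                 max_bio_indel: int = 15,
                 max_splice_length: int = 300_000) -> str:
    """Classify a gap on one chromosome and strand by its length alone."""
    if gap_length < 0:
        raise ValueError("gap_length must be non-negative")
    if gap_length < max_bio_indel:
        return "indel"          # short gap: biological indel
    if gap_length < max_splice_length:
        return "splice"         # longer gap: splice junction
    return "chimera"            # beyond the splice threshold


if __name__ == "__main__":
    for length in (3, 2_000, 1_000_000):
        print(length, classify_gap(length))
```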

   Additional settings for advanced users
       --min-break-length <int>
              is  the  minimal  break  length  (as the percentage of k, the k-mer length) so that a cause can be
              reported. Theoretically, for a given cause, the break length  is  always  >=  (kmer_length  -  1).
              Otherwise,  the  break may be merged with a close enough break, or the break will be considered as
              undetermined. (0.5 by default)

       --max-bases-randomly-matched <int>
              A k-mer overlapping an exon-exon junction, for example, may still  match  on  the  genome  if  the
              overlap  is  at the end of the read (without loss of generality). This is due to the fact that the
              nucleotides starting the second exon may be the same  as  the  nucleotides  starting  the  intron.
              Theoretically,  there is a 0.25 probability that we have the same nucleotide at the first position
              of the intron and the exon. This option specifies how many nucleotides may be matched randomly  at
              most

       --max-extension-length <int>
              is the maximum number of k-mers extended at each side of a read break. For a given break,
              k-mers with false locations can generate false biological causes, so the consistency is checked
              on each side of the break to discard false k-mers and adjust the boundaries of the break
              (10 by default)

       --nb-tags-info-stored <int>
              is the size of the buffer that stores information for each thread during the computing phase
              (1000 by default). This value must be increased if threads work below their real capabilities.
              With --nb-threads 15, CPU usage should be about 1400%

       --reads-index <string>
              the reads index data structure used by CRAC. Available reads indexes are: JELLYFISH and
              GKARRAYS (JELLYFISH by default).

       --nb-nucleotides-snp-comparison <int>
              is the minimum k-mer length tolerated for the deep SNVs search (8 by default)

       --max-number-of-merges <int>
              is the maximum number of merges tolerated during the break merge process for the chimera
              detection (4 by default)

       --min-score-chimera-stringent <float>
              is the minimal score to consider a chimera event; otherwise it is classified as a
              bioundetermined event (0.6 by default)
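
       The reasoning behind --max-bases-randomly-matched can be made concrete with a small calculation. The
       text gives a probability of 0.25 that the first nucleotide of the intron equals the first nucleotide
       of the exon. Assuming, as a simplification not stated in this page, that successive positions are
       independent and uniform, the chance that n consecutive nucleotides match at random is 0.25 to the
       power n. The function name below is hypothetical.

```python
def prob_random_match(n: int) -> float:
    """Probability that n consecutive bases match by chance, assuming
    independent, uniformly distributed nucleotides (a simplification)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return 0.25 ** n


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, prob_random_match(n))
```

       The probability drops quickly with n, which is why a small bound on randomly matched bases is
       usually enough.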

SEE ALSO

       The full documentation for crac is maintained as an org manual. If the info and crac programs are
       properly installed at your site, the command

              info crac

       should give you access to the complete manual.

AUTHOR

   About the crac package.
       You can contact Nicolas PHILIPPE, Mikael SALSON, Jerome AUDOUX and Alban MANCHERON by sending  an  e-mail
       to <crac-bugs@lists.gforge.inria.fr>.

       Programming:
               Nicolas PHILIPPE <nphilippe.research@gmail.com>
               Mikaël SALSON    <mikael.salson@lifl.fr>
               Jérome AUDOUX    <jerome.audoux@gmail.com>
       with additional contribution for the packaging of:
               Alban MANCHERON  <alban.mancheron@lirmm.fr>

   About the crac publication.
       You may cite the following paper if you use our tool:

       Gk-arrays: Querying large read collections in main memory: a versatile
       data structure
       Philippe N., Salson M., Lecroq T., Leonard M., Commes T., Rivals E.
       BMC Bioinformatics 2011, 12:242.

       Crac: An integrated RNA-Seq read analysis
       Philippe N., Salson M., Commes T., Rivals E.
       Genome Biology 2013; 14:R30.

                                                   2020-03-22                                            crac(1)