Provided by: gmap_2015-12-31.v7-1_amd64

NAME

       gsnap - Genomic Short-read Nucleotide Alignment Program

SYNOPSIS

       gsnap [OPTIONS...] <FASTA file>, or cat <FASTA file> | gsnap [OPTIONS...]

OPTIONS

   Input options (must include -d)
       -D, --dir=directory
              Genome directory.  Default (as specified by --with-gmapdb to the configure program)
              is /var/cache/gmap

       -d, --db=STRING
              Genome database

       --use-sarray=INT
              Whether to use a suffix array, which will give increased speed.  Allowed values:  0
              (no),  1  (yes, plus GSNAP/GMAP algorithm, default), or 2 (yes, and use only suffix
              array algorithm).  Note that  suffix  arrays  will  bias  against  SNP  alleles  in
              SNP-tolerant alignment.

       -k, --kmer=INT
              kmer size to use in genome database (allowed values: 16 or less).  If not specified,
              the program will find the highest available kmer size in the genome database

       --sampling=INT
              Sampling to use in genome database.  If not specified, the program  will  find  the
              smallest available sampling value in the genome database within selected k-mer size

       -q, --part=INT/INT
              Process only the i-th out of every n sequences, e.g., 0/100 or 99/100 (useful for
              distributing jobs to a computer farm).
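
              The partitioning can be pictured with a minimal Python sketch.  It assumes that
              sequences are numbered from 0 and that a sequence belongs to part i when its index
              modulo n equals i; the function name select_part is hypothetical and is not part
              of GSNAP.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def select_part(sequences: Iterable[T], i: int, n: int) -> Iterator[T]:
    """Yield the sequences that a job run with --part=i/n would handle.

    Assumption: sequences are numbered from 0, and a sequence belongs to
    part i when (index modulo n) equals i.
    """
    if not 0 <= i < n:
        raise ValueError("expected 0 <= i < n")
    for index, seq in enumerate(sequences):
        if index % n == i:
            yield seq


# Ten reads split across four jobs; every read lands in exactly one part.
reads = [f"read{k}" for k in range(10)]
parts = [list(select_part(reads, i, 4)) for i in range(4)]
```

              Under this assumption, running n jobs with parts 0/n through (n-1)/n covers every
              input sequence exactly once.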

       --input-buffer-size=INT
              Size of input buffer (program reads this many sequences at a time  for  efficiency)
              (default 1000)

       --barcode-length=INT
              Amount of barcode to remove from start of read (default 0)

       --orientation=STRING
              Orientation of paired-end reads.  Allowed values: FR (fwd-rev, or typical Illumina;
              default), RF (rev-fwd, for circularized inserts), or FF (fwd-fwd, same strand)

       --fastq-id-start=INT
              Starting position of identifier in FASTQ header, space-delimited (>= 1)

       --fastq-id-end=INT
              Ending position of identifier in FASTQ header, space-delimited (>= 1)

       Examples:

       @HWUSI-EAS100R:6:73:941:1973#0/1
              start=1, end=1 (default) => identifier is HWUSI-EAS100R:6:73:941:1973#0

       @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
              start=1, end=1 => identifier is SRR001666.1
              start=2, end=2 => identifier is 071112_SLXA-EAS1_s_7:5:1:817:345
              start=1, end=2 => identifier is SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345
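
       The field selection in these examples can be reproduced with a short Python sketch.  The
       function name fastq_identifier is hypothetical, and dropping a trailing /1 or /2 end
       marker is an assumption inferred from the first example rather than a documented rule.

```python
def fastq_identifier(header: str, start: int = 1, end: int = 1) -> str:
    """Return space-delimited fields start..end (1-based, inclusive) of a FASTQ header."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    if header.startswith("@"):
        header = header[1:]
    fields = header.split()
    identifier = " ".join(fields[start - 1:end])
    # Assumption: a trailing /1 or /2 end marker is not part of the identifier.
    if identifier.endswith(("/1", "/2")):
        identifier = identifier[:-2]
    return identifier


header = "@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36"
first = fastq_identifier(header, 1, 1)
second = fastq_identifier(header, 2, 2)
both = fastq_identifier(header, 1, 2)
```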

       --force-single-end
              When multiple FASTQ files are provided on the command line, GSNAP assumes they  are
              matching paired-end files.  This flag treats each file as single-end.

       --filter-chastity=STRING
              Skips reads marked by the Illumina chastity program.  Expecting a string after the
              accession having a 'Y' after the first colon, like this:

                  @accession 1:Y:0:CTTGTA

              where the 'Y' signifies filtering by chastity.  Values: off (default), either,
              both.  For 'either', a paired-end read is skipped if either end has a 'Y'.  For
              'both', a read is skipped only if both ends of a paired-end read (or the only end
              of a single-end read) have a 'Y'.
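
              A minimal Python sketch of this decision follows.  The helper names chastity_flag
              and skip_read are hypothetical, and the header parsing assumes the layout shown in
              the example above (accession, a space, then colon-separated fields).

```python
from typing import Sequence


def chastity_flag(header: str) -> bool:
    """True if the string after the accession has 'Y' after its first colon."""
    fields = header.split()
    if len(fields) < 2:
        return False
    parts = fields[1].split(":")
    return len(parts) > 1 and parts[1] == "Y"


def skip_read(headers: Sequence[str], mode: str = "off") -> bool:
    """Decide whether a read (one header per end) is skipped for a --filter-chastity mode."""
    flags = [chastity_flag(h) for h in headers]
    if mode == "off":
        return False
    if mode == "either":
        return any(flags)
    if mode == "both":
        # For a single-end read this reduces to the flag on its only end.
        return all(flags)
    raise ValueError(f"unknown mode: {mode}")


flagged = "@accession 1:Y:0:CTTGTA"
clean = "@accession 2:N:0:CTTGTA"
```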

       --allow-pe-name-mismatch
              Allows accession names of reads to mismatch in paired-end files

       --gunzip
              Uncompress gzipped input files

       --bunzip2
              Uncompress bzip2-compressed input files

   Computation options

       Note: GSNAP has an ultrafast algorithm for calculating mismatches up to and including
       ((readlength+2)/kmer - 2) ("ultrafast mismatches").  The program will run fastest if
       max-mismatches (plus suboptimal-levels) is within that value.  Also, indels, especially
       end indels, take longer to compute, although the algorithm is still designed to be fast.
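
       The ultrafast level can be computed ahead of a run with a small Python sketch.  It
       assumes integer (floor) division and clamps negative results to 0; both are assumptions,
       as are the function names.  The index_interval parameter generalizes the "+2" in the
       formula, which corresponds to the default genome index interval of 3 described under
       --max-mismatches.

```python
def ultrafast_mismatches(readlength: int, kmer: int, index_interval: int = 3) -> int:
    """Ultrafast mismatch level, ((readlength + index_interval - 1) / kmer - 2).

    Assumptions: the division is integer (floor) division, and a negative
    result is clamped to 0.
    """
    return max(0, (readlength + index_interval - 1) // kmer - 2)


def within_ultrafast(max_mismatches: int, suboptimal_levels: int,
                     readlength: int, kmer: int, index_interval: int = 3) -> bool:
    """True if max-mismatches plus suboptimal-levels stays within the ultrafast level."""
    limit = ultrafast_mismatches(readlength, kmer, index_interval)
    return max_mismatches + suboptimal_levels <= limit


# 100-bp reads with a 15-mer index: (100 + 2) // 15 - 2
level = ultrafast_mismatches(100, 15)
```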

       -B, --batch=INT
              Batch mode (default = 2)

                       Mode     Offsets       Positions       Genome          Suffix array
                         0      see note      mmap            mmap            mmap
                         1      see note      mmap & preload  mmap            mmap
                         2      see note      mmap & preload  mmap & preload  mmap & preload
                         3      see note      allocate        mmap & preload  mmap & preload
               (default) 4      see note      allocate        allocate        mmap & preload
                         5      see note      allocate        allocate        allocate

              Note: For a single sequence, all data structures use mmap.  If mmap is not
              available and allocate is not chosen, then fileio will be used (very slow).

              Note about offsets: Expansion of offsets can be controlled independently by the
              --expand-offsets flag.  However, offsets are accessed relatively fast in this
              version of GSNAP.

       --use-shared-memory=INT
              If  1  (default), then allocated memory is shared among all processes on this node.
              If 0, then each process has private allocated memory

       --expand-offsets=INT
              Whether to expand the genomic offsets index.  Values: 0 (no, default), or 1 (yes).
              Expansion gives faster alignment, but requires more memory

       -m, --max-mismatches=FLOAT
              Maximum number of mismatches allowed (if not specified, then defaults to the
              ultrafast level of ((readlength+index_interval-1)/kmer - 2)).  (By default, the
              genome index interval is 3, but this can be changed by providing a different value
              for -q to gmap_build when processing the genome.)

              If specified between 0.0 and 1.0, then treated as a fraction of each read length.
              Otherwise, treated as an integral number of mismatches (including indel and
              splicing penalties).  For RNA-Seq, you may need to increase this value slightly to
              align reads extending past the ends of an exon.
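
              The two interpretations of the value can be sketched in Python as follows.  The
              function name mismatch_limit is hypothetical.  The sketch assumes that only values
              strictly between 0.0 and 1.0 count as fractions and that a fractional limit is
              rounded down; the text above does not specify either detail.

```python
def mismatch_limit(value: float, readlength: int) -> int:
    """Turn a --max-mismatches value into a per-read mismatch count.

    Assumptions: values strictly between 0.0 and 1.0 are fractions of the
    read length, and fractional results are rounded down.
    """
    if 0.0 < value < 1.0:
        return int(value * readlength)
    return int(value)


fraction_limit = mismatch_limit(0.25, 100)   # a quarter of a 100-bp read
absolute_limit = mismatch_limit(3, 100)      # an integral count, used as given
```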

       --min-coverage=FLOAT
              Minimum coverage required for an alignment.  If specified between 0.0 and 1.0, then
              treated  as  a  fraction  of  each  read length.  Otherwise, treated as an integral
              number of base pairs.  Default value is 0.0.

       --query-unk-mismatch=INT
              Whether to count unknown (N) characters in the query as a mismatch (0=no (default),
              1=yes)

       --genome-unk-mismatch=INT
              Whether  to  count  unknown (N) characters in the genome as a mismatch (0=no, 1=yes
              (default))

       --maxsearch=INT
              Maximum number of alignments to find (default 1000).  Must be larger than --npaths,
              which  is  the  number  to report.  Keeping this number large will allow for random
              selection among multiple  alignments.   Reducing  this  number  can  speed  up  the
              program.

       -i, --indel-penalty=INT
              Penalty  for  an  indel  (default  2).  Counts against mismatches allowed.  To find
              indels, make indel-penalty less than or equal to max-mismatches.  A value <  2  can
              lead to false positives at read ends

       --indel-endlength=INT
              Minimum length at end required for indel alignments (default 4)

       -y, --max-middle-insertions=INT
              Maximum number of middle insertions allowed (default 9)

       -z, --max-middle-deletions=INT
              Maximum number of middle deletions allowed (default 30)

       -Y, --max-end-insertions=INT
              Maximum number of end insertions allowed (default 3)

       -Z, --max-end-deletions=INT
              Maximum number of end deletions allowed (default 6)

       -M, --suboptimal-levels=INT
              Report suboptimal hits beyond best hit (default 0).  All hits with best score plus
              suboptimal-levels are reported
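
              As a rough Python illustration of this rule, the sketch below keeps every hit whose
              score is within suboptimal-levels of the best one.  It assumes that a score is a
              penalty count in which lower is better (mismatches plus indel and splicing
              penalties); the function name hits_to_report is hypothetical.

```python
from typing import List


def hits_to_report(hit_scores: List[int], suboptimal_levels: int = 0) -> List[int]:
    """Keep hits scoring within suboptimal_levels of the best hit.

    Assumption: a score is a penalty count, so the best hit has the lowest score.
    """
    if not hit_scores:
        return []
    best = min(hit_scores)
    return [score for score in hit_scores if score <= best + suboptimal_levels]


scores = [2, 0, 1, 3]
best_only = hits_to_report(scores)          # default: only hits tied for best
one_level = hits_to_report(scores, 1)       # best score plus one suboptimal level
```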

       -a, --adapter-strip=STRING
              Method for removing adapters from reads.  Currently allowed  values:  off,  paired.
              Default  is  "off".   To  turn  on,  specify  "paired", which removes adapters from
              paired-end reads if they appear to be present.

       --trim-mismatch-score=INT
              Score to use for mismatches when trimming at ends  (default  is  -3;  to  turn  off
              trimming,  specify  0).   Warning:  turning  trimming  off will give false positive
              mismatches at the ends of reads

       --trim-indel-score=INT
              Score to use for indels when trimming at ends (default is -2; to turn off trimming,
              specify  0).   Warning: turning trimming off will give false positive indels at the
              ends of reads

       -V, --snpsdir=STRING
              Directory for SNPs index files (created using snpindex)  (default  is  location  of
              genome index files specified using -D and -d)

       -v, --use-snps=STRING
              Use  database  containing  known  SNPs  (in  <STRING>.iit,  built  previously using
              snpindex) for tolerance to SNPs

       --cmetdir=STRING
              Directory for methylcytosine index files  (created  using  cmetindex)  (default  is
              location of genome index files specified using -D, -V, and -d)

       --atoidir=STRING
              Directory  for A-to-I RNA editing index files (created using atoiindex) (default is
              location of genome index files specified using -D, -V, and -d)

       --mode=STRING
              Alignment mode: standard (default), cmet-stranded, cmet-nonstranded, atoi-stranded,
              atoi-nonstranded, ttoc-stranded, or ttoc-nonstranded.  Non-standard modes require
              you to have previously run the cmetindex or atoiindex programs (which also cover
              the ttoc modes) on the genome

       -t, --nthreads=INT
              Number of worker threads

   Options for GMAP alignment within GSNAP

       --gmap-mode=STRING
              Cases to use GMAP for complex alignments containing multiple splices or indels.
              Allowed values: none, all, pairsearch, indel_knownsplice, terminal, improve (or
              multiple values, separated by commas).  Default: all, i.e.,
              pairsearch,indel_knownsplice,terminal,improve

       --trigger-score-for-gmap=INT
              Try GMAP pairsearch on nearby genomic regions if best score (the total of both ends
              if paired-end) exceeds this value (default 5)

       --gmap-min-match-length=INT
              Keep GMAP hit only if it has this many consecutive matches (default 20)

       --gmap-allowance=INT
              Extra mismatch/indel score allowed for GMAP alignments (default 3)

       --max-gmap-pairsearch=INT
              Perform GMAP pairsearch on nearby genomic regions up to this many candidate ends
              (default 50).  Requires pairsearch in --gmap-mode

       --max-gmap-terminal=INT
              Perform GMAP terminal on nearby genomic regions up  to  this  many  candidate  ends
              (default 50).  Requires terminal in --gmap-mode

       --max-gmap-improvement=INT
              Perform  GMAP  improvement on nearby genomic regions up to this many candidate ends
              (default 5).  Requires improve in --gmap-mode

       --microexon-spliceprob=FLOAT
              Allow microexons only if one of the splice site probabilities is greater than  this
              value (default 0.95)

   Splicing options for DNA-Seq

       --find-dna-chimeras=INT
              Look for distant splicing in DNA-Seq data (0=no (default), 1=yes).  Automatically
              inactivated for RNA-Seq data if -N or -s is specified.

   Splicing options for RNA-Seq

       -N, --novelsplicing=INT
              Look for novel splicing (0=no (default), 1=yes)

       --splicingdir=STRING
              Directory for splicing involving known sites or known introns, as specified by  the
              -s  or  --use-splicing  flag  (default is directory computed from -D and -d flags).
              Note: can just give full pathname to the -s flag instead.

       -s, --use-splicing=STRING
              Look for splicing involving known sites or  known  introns  (in  <STRING>.iit),  at
              short or long distances.  See README instructions for the distinction between known
              sites and known introns

       --ambig-splice-noclip
              For ambiguous known splicing at ends of the read, do not clip at the  splice  site,
              but  extend instead into the intron.  This flag makes sense only if you provide the
              --use-splicing flag, and you  are  trying  to  eliminate  all  soft  clipping  with
              --trim-mismatch-score=0

       -w, --localsplicedist=INT
              Definition of local novel splicing event (default 200000)

       --novelend-splicedist=INT
              Distance to look for novel splices at the ends of reads (default 50000)

       -e, --local-splice-penalty=INT
              Penalty for a local splice (default 0).  Counts against mismatches allowed

       -E, --distant-splice-penalty=INT
              Penalty for a distant splice (default 1).  A distant splice is one where the intron
              length exceeds the value of -w, or --localsplicedist, or is an inversion, scramble,
              or translocation between two different chromosomes.  Counts against mismatches
              allowed
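
              The local/distant distinction and the resulting penalty can be sketched in Python
              as follows.  The Splice record and its field names are hypothetical stand-ins for
              whatever GSNAP tracks internally; only the rule itself (intron length greater than
              --localsplicedist, or an inversion, scramble, or translocation) comes from the
              description above.

```python
from dataclasses import dataclass


@dataclass
class Splice:
    """Hypothetical description of one splice; not a GSNAP data structure."""
    donor_chr: str
    acceptor_chr: str
    intron_length: int
    is_inversion: bool = False
    is_scramble: bool = False


def is_distant(splice: Splice, localsplicedist: int = 200000) -> bool:
    """Apply the definition of a distant splice given for -E."""
    if splice.donor_chr != splice.acceptor_chr:   # translocation
        return True
    if splice.is_inversion or splice.is_scramble:
        return True
    return splice.intron_length > localsplicedist


def splice_penalty(splice: Splice, local_penalty: int = 0,
                   distant_penalty: int = 1, localsplicedist: int = 200000) -> int:
    """Penalty counted against the mismatches allowed (defaults of -e, -E, and -w)."""
    return distant_penalty if is_distant(splice, localsplicedist) else local_penalty


local = Splice("chr1", "chr1", 5000)
long_intron = Splice("chr1", "chr1", 350000)
translocation = Splice("chr1", "chr7", 0)
```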

       -K, --distant-splice-endlength=INT
              Minimum length at end required for distant  spliced  alignments  (default  20,  min
              allowed is the value of -k, or kmer size)

       -l, --shortend-splice-endlength=INT
              Minimum length at end required for short-end spliced alignments (default 2, but
              unless known splice sites are provided with the -s flag, GSNAP may still need the
              end length to be the value of -k, or kmer size, to find a given splice)

       --distant-splice-identity=FLOAT
              Minimum identity at end required for distant spliced alignments (default 0.95)

       --antistranded-penalty=INT
              (Not   currently   implemented,  since  it  leads  to  poor  results)  Penalty  for
              antistranded splicing when using stranded RNA-Seq  protocols.   A  positive  value,
              such  as  1,  expects  antisense  on  the  first read and sense on the second read.
              Default is 0, which treats sense and antisense equally well

       --merge-distant-samechr
              Report distant splices on the same chromosome as  a  single  splice,  if  possible.
              Will  produce  a  single  SAM line instead of two SAM lines, which is also done for
              translocations, inversions, and scramble events

   Options for paired-end reads

       --pairmax-dna=INT
              Max total genomic length for DNA-Seq paired reads, or other reads without  splicing
              (default 1000).  Used if -N or -s is not specified.

       --pairmax-rna=INT
              Max total genomic length for RNA-Seq paired reads, or other reads that could have a
              splice (default 200000).  Used if -N or -s is specified.  Should probably match the
              value for -w, --localsplicedist.

       --pairexpect=INT
              Expected  paired-end  length, used for calling splices in medial part of paired-end
              reads (default 200).  Was turned off in previous versions, but reinstated.

       --pairdev=INT
              Allowable deviation from expected paired-end length, used for  calling  splices  in
              medial  part  of  paired-end  reads  (default  100).   Was  turned  off in previous
              versions, but reinstated.

   Options for quality scores

       --quality-protocol=STRING
              Protocol for input quality scores.  Allowed values:
                illumina (ASCII 64-126) (equivalent to -J 64 -j -31)
                sanger   (ASCII 33-126) (equivalent to -J 33 -j 0)
              Default is sanger (no quality print shift).  SAM output files should have quality
              scores in sanger protocol.

              Or you can customize this behavior with these flags:

       -J, --quality-zero-score=INT
              FASTQ  quality  scores  are  zero  at  this  ASCII  value (default is 33 for sanger
              protocol; for Illumina, select 64)

       -j, --quality-print-shift=INT
              Shift FASTQ quality scores by this amount  in  output  (default  is  0  for  sanger
              protocol; to change Illumina input to Sanger output, select -31)
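
              The effect of the zero score and the print shift can be checked with a short Python
              sketch.  The function names are hypothetical; the arithmetic simply applies the
              ASCII offsets given above (64 for Illumina, 33 for Sanger, and a shift of -31
              between them).

```python
from typing import List


def phred_scores(quality: str, zero_score: int) -> List[int]:
    """Decode a FASTQ quality string, given the ASCII value that means quality 0 (-J)."""
    return [ord(ch) - zero_score for ch in quality]


def shift_quality(quality: str, print_shift: int) -> str:
    """Shift every quality character by print_shift ASCII positions (-j)."""
    return "".join(chr(ord(ch) + print_shift) for ch in quality)


illumina = "hhdT"                         # Illumina encoding, zero score 64
sanger = shift_quality(illumina, -31)     # same qualities in Sanger encoding
```

              Decoding the input with -J 64 and the shifted output with -J 33 gives the same
              numeric qualities, which is why -J 64 -j -31 converts Illumina input to Sanger
              output.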

   Output options

       -n, --npaths=INT
              Maximum number of paths to print (default 100).

       -Q, --quiet-if-excessive
              If more than maximum number of paths are found, then nothing is printed.

       -O, --ordered
              Print output in same order as input (relevant only if there is more than one worker
              thread)

       --show-refdiff
              For GSNAP output in SNP-tolerant alignment, shows all differences relative  to  the
              reference  genome  as  lower  case (otherwise, it shows all differences relative to
              both the reference and alternate genome)

       --clip-overlap
              For paired-end reads whose alignments overlap, clip the overlapping region.

       --merge-overlap
              For paired-end reads whose alignments overlap, merge the two ends into a single end
              (beta implementation)

       --print-snps
              Print  detailed  information  about  SNPs in reads (works only if -v also selected)
              (not fully implemented yet)

       --failsonly
              Print only failed alignments, those with no results

       --nofails
              Exclude printing of failed alignments

       -A, --format=STRING
              Another format type, other than default.  Currently  implemented:  sam,  m8  (BLAST
              tabular format)

       --split-output=STRING
              Basename  for  multiple-file  output,  separately  for nomapping, halfmapping_uniq,
              halfmapping_mult,   unpaired_uniq,   unpaired_mult,    paired_uniq,    paired_mult,
              concordant_uniq, and concordant_mult results
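
              For scripting around this option, the expected file names can be listed with a
              small Python sketch.  It assumes each file is named by appending a dot and the
              result category to the basename, which matches the ".nomapping file" wording under
              --failed-input but is otherwise an assumption; the helper name is hypothetical.

```python
from typing import Dict

# Result categories listed for --split-output.
CATEGORIES = (
    "nomapping",
    "halfmapping_uniq",
    "halfmapping_mult",
    "unpaired_uniq",
    "unpaired_mult",
    "paired_uniq",
    "paired_mult",
    "concordant_uniq",
    "concordant_mult",
)


def split_output_paths(basename: str) -> Dict[str, str]:
    """Map each category to its assumed output path, <basename>.<category>."""
    return {category: f"{basename}.{category}" for category in CATEGORIES}


paths = split_output_paths("sample1")
```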

       -o, --output-file=STRING
              File name for a single stream of output results.

       --failed-input=STRING
              Print  completely  failed  alignments  as input FASTA or FASTQ format, to the given
              file, appending .1 or .2, for paired-end data.  If the --split-output flag is  also
              given, this file is generated in addition to the output in the .nomapping file.

       --append-output
              When --split-output or --failed-input is given, this flag will append output to the
              existing files.  Otherwise, the default is to create new files.

       --order-among-best=STRING
              Among alignments tied with the best score, order those alignments  in  this  order.
              Allowed values: genomic, random (default)

       --output-buffer-size=INT
              Buffer  size,  in  queries,  for  output thread (default 1000).  When the number of
              results to be printed exceeds this size, the worker threads are  halted  until  the
              backlog is cleared

   Options for SAM output

       --no-sam-headers
              Do not print headers beginning with '@'

       --add-paired-nomappers
              Add nomapper lines as needed to make all paired-end results alternate between first
              end and second end

       --paired-flag-means-concordant=INT
              Whether the paired bit in the SAM flags means concordant only (1)  or  paired  plus
              concordant (0, default)

       --sam-headers-batch=INT
              Print headers only for this batch, as specified by -q

       --sam-use-0M
              Insert 0M in CIGAR between adjacent insertions and deletions.  Required by Picard,
              but can cause errors in other tools

       --sam-multiple-primaries
              Allows multiple alignments to be marked  as  primary  if  they  have  equally  good
              mapping scores

       --force-xs-dir
              For  RNA-Seq  alignments, disallows XS:A:? when the sense direction is unclear, and
              replaces this value arbitrarily with XS:A:+.  May be useful for some programs, such
              as  Cufflinks,  that  cannot  handle  XS:A:?.   However,  if you use this flag, the
              reported value of XS:A:+ in these cases will not be meaningful.

       --md-lowercase-snp
              In MD string, when known SNPs are given by the -v flag, prints difference
              nucleotides as lower-case when they differ from reference but match a known
              alternate allele

       --extend-soft-clips
              Extends alignments through soft clipped regions

       --action-if-cigar-error
              Action to take if there is a disagreement between CIGAR length and sequence length.
              Allowed values: ignore, warning, noprint (default), abort
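
              The kind of disagreement meant here can be reproduced outside GSNAP with a Python
              sketch.  It follows the SAM convention that the M, I, S, =, and X operations
              consume query bases; the function names are hypothetical and this is not GSNAP's
              own check.

```python
import re

_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
_QUERY_CONSUMING = set("MIS=X")   # operations that consume query bases in SAM


def cigar_query_length(cigar: str) -> int:
    """Number of query bases implied by a CIGAR string."""
    if not re.fullmatch(r"(\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return sum(int(count) for count, op in _CIGAR_RE.findall(cigar)
               if op in _QUERY_CONSUMING)


def cigar_matches_sequence(cigar: str, sequence: str) -> bool:
    """True if the CIGAR length agrees with the sequence length."""
    return cigar_query_length(cigar) == len(sequence)


length = cigar_query_length("10M2I5M3S")   # 10 + 2 + 5 + 3 query bases
```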

       --read-group-id=STRING
              Value to put into read-group id (RG-ID) field

       --read-group-name=STRING
              Value to put into read-group name (RG-SM) field

       --read-group-library=STRING
              Value to put into read-group library (RG-LB) field

       --read-group-platform=STRING
              Value to put into read-group platform (RG-PL) field

   Help options

       --check
              Check compiler assumptions

       --version
              Show version

       --help Show this help message

       Other tools of GMAP suite are located in /usr/lib/gmap