Ubuntu Manpage: gsnap - Genomic Short-read Nucleotide Alignment Program

Provided by: gmap_2017-11-15-1_amd64

NAME

       gsnap - Genomic Short-read Nucleotide Alignment Program

SYNOPSIS

       gsnap [OPTIONS...] <FASTA file>, or cat <FASTA file> | gsnap [OPTIONS...]
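
       A minimal sketch of assembling a gsnap command line from Python, using only flags documented in
       this page (-d, -D, -t, -A, --gunzip).  The database name and read file names are hypothetical
       placeholders, and the helper build_gsnap_command is an illustration, not part of GSNAP.  The
       sketch only builds and prints the command; it does not run it.

```python
import shlex


def build_gsnap_command(db, reads, genome_dir=None, threads=None, sam=True, gzipped=False):
    """Build an argument list for gsnap from a few documented options."""
    cmd = ["gsnap", "-d", db]           # -d (genome database) must be included
    if genome_dir is not None:
        cmd += ["-D", genome_dir]       # genome directory
    if threads is not None:
        cmd += ["-t", str(threads)]     # number of worker threads
    if sam:
        cmd += ["-A", "sam"]            # SAM output format
    if gzipped:
        cmd.append("--gunzip")          # input files are gzipped
    cmd += list(reads)
    return cmd


# Hypothetical database and file names, for illustration only.
cmd = build_gsnap_command(
    "mygenome", ["reads_1.fastq.gz", "reads_2.fastq.gz"], threads=4, gzipped=True
)
print(shlex.join(cmd))
```

       The resulting list could be passed to subprocess.run on a system where gsnap and the genome
       database are installed.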

OPTIONS

   Input options (must include -d)
       -D, --dir=directory
              Genome   directory.   Default  (as  specified  by  --with-gmapdb  to  the  configure  program)  is
              /var/cache/gmap

       -d, --db=STRING
              Genome database

       --use-sarray=INT
              Whether to use a suffix array, which will give increased speed.  Allowed values: 0 (no),  1  (yes,
              plus  GSNAP/GMAP  algorithm, default), or 2 (yes, and use only suffix array algorithm).  Note that
              suffix arrays will bias against SNP alleles in SNP-tolerant alignment.  If  there  is  a  conflict
              between this flag and the flag --speed, the --speed flag takes precedence

       -k, --kmer=INT
              kmer size to use in genome database (allowed values: 16 or less).  If not specified, the program
              will find the highest available kmer size in the genome database

       --sampling=INT
              Sampling to use in genome database.   If  not  specified,  the  program  will  find  the  smallest
              available sampling value in the genome database within selected k-mer size

       -q, --part=INT/INT
              Process only the i-th out of every n sequences e.g., 0/100 or 99/100 (useful for distributing jobs
              to a computer farm).
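
       A rough sketch of how an i/n specification could partition the input across jobs.  It assumes
       that sequences are counted from 0 and that sequence number k belongs to part k mod n, which is
       consistent with the examples 0/100 and 99/100 but is not confirmed by this page.  The function
       names are hypothetical.

```python
def parse_part(spec):
    """Parse an 'i/n' string such as '0/100' into the integers (i, n)."""
    i_text, n_text = spec.split("/")
    i, n = int(i_text), int(n_text)
    if n <= 0 or not 0 <= i < n:
        raise ValueError(f"invalid part specification: {spec!r}")
    return i, n


def select_part(sequences, spec):
    """Keep the sequences whose 0-based index is congruent to i modulo n (assumed rule)."""
    i, n = parse_part(spec)
    return [seq for index, seq in enumerate(sequences) if index % n == i]


reads = [f"read{k}" for k in range(10)]
print(select_part(reads, "1/4"))  # ['read1', 'read5', 'read9']
```

       Under this assumption, running n jobs with parts 0/n through (n-1)/n covers every sequence
       exactly once.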

       --input-buffer-size=INT
              Size of input buffer (program reads this many sequences at a time for efficiency) (default 1000)

       --barcode-length=INT
              Amount of barcode to remove from start of read (default 0)

       --orientation=STRING
              Orientation of paired-end reads.  Allowed values: FR (fwd-rev, or typical Illumina; default), RF
              (rev-fwd, for circularized inserts), or FF (fwd-fwd, same strand)

       --fastq-id-start=INT
              Starting position of identifier in FASTQ header, space-delimited (>= 1)

       --fastq-id-end=INT
              Ending position of identifier in FASTQ header, space-delimited (>= 1)

       Examples:

       @HWUSI-EAS100R:6:73:941:1973#0/1
              start=1, end=1 (default) => identifier is HWUSI-EAS100R:6:73:941:1973#0

       @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
              start=1, end=1 => identifier is SRR001666.1
              start=2, end=2 => identifier is 071112_SLXA-EAS1_s_7:5:1:817:345
              start=1, end=2 => identifier is SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345
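
       The examples above can be reproduced with a short sketch.  It treats the header as
       space-delimited fields numbered from 1 and joins fields start through end.  Dropping a trailing
       /1 or /2 is an assumption inferred from the first example; the function name is hypothetical.

```python
def fastq_identifier(header, start=1, end=1):
    """Extract the identifier from a FASTQ header line using 1-based field positions."""
    if header.startswith("@"):
        header = header[1:]
    fields = header.split()
    identifier = " ".join(fields[start - 1:end])
    # Assumption: a trailing end marker (/1 or /2) is not part of the identifier.
    if identifier.endswith(("/1", "/2")):
        identifier = identifier[:-2]
    return identifier


h1 = "@HWUSI-EAS100R:6:73:941:1973#0/1"
h2 = "@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36"
print(fastq_identifier(h1))
print(fastq_identifier(h2, start=2, end=2))
```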

       --force-single-end
              When multiple FASTQ files are provided on the  command  line,  GSNAP  assumes  they  are  matching
              paired-end files.  This flag treats each file as single-end.

       --filter-chastity=STRING
              Skips  reads  marked  by  the  Illumina  chastity program.  Expecting a string after the accession
              having a 'Y' after the first colon, like this:

       @accession 1:Y:0:CTTGTA
              where the 'Y' signifies filtering by chastity.  Values: off (default), either, both.  For
              'either', a paired-end read is skipped if either end has a 'Y'.  For 'both', a paired-end read
              is skipped only if both ends have a 'Y' (a single-end read is skipped if its only end has a 'Y').
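
       A minimal sketch of the filtering decision described above.  It assumes the chastity field is
       the second space-delimited token of the header and that the character after its first colon is
       the filter mark; the function names are hypothetical.

```python
def is_chastity_flagged(header):
    """Return True if the string after the accession has 'Y' after its first colon."""
    fields = header.split()
    if len(fields) < 2:
        return False
    parts = fields[1].split(":")
    return len(parts) > 1 and parts[1] == "Y"


def skip_read(headers, mode="off"):
    """Decide whether to skip a read, given one header (single-end) or two (paired-end)."""
    flags = [is_chastity_flagged(h) for h in headers]
    if mode == "off":
        return False
    if mode == "either":
        return any(flags)
    if mode == "both":
        return all(flags)
    raise ValueError(f"unknown mode: {mode!r}")


pair = ["@accession 1:Y:0:CTTGTA", "@accession 2:N:0:CTTGTA"]
print(skip_read(pair, "either"), skip_read(pair, "both"))  # True False
```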

       --allow-pe-name-mismatch
              Allows accession names of reads to mismatch in paired-end files

       --gunzip
              Uncompress gzipped input files

       --bunzip2
              Uncompress bzip2-compressed input files

   Computation options

       --speed=INT
              Speed mode (default = 3)

                       Mode     Suffix array   Hash table
                         4      On             Off
                         3      On             Only if suffix array yields incomplete answers
                         2      On             In addition to suffix array
                         1      Off            Yes

              Note: There is a tradeoff between speed and accuracy, so slower speed can give better answers.
              Levels 1 and 2 are about the same, while level 3 is about 4 times faster, and level 4 is 5 times
              faster than level 3.  However, accuracy of level 3 is better than level 4 and almost the same as
              level 2, so mode 3 is generally recommended and is the default.  If there is a conflict between
              this value and the flag --use-sarray, this takes precedence

       -B, --batch=INT
              Batch mode (default = 2)

                       Mode     Offsets       Positions       Genome          Suffix array
                         0      see note      mmap            mmap            mmap
                         1      see note      mmap & preload  mmap            mmap
                         2      see note      mmap & preload  mmap & preload  mmap & preload
                         3      see note      allocate        mmap & preload  mmap & preload
               (default) 4      see note      allocate        allocate        mmap & preload
                         5      see note      allocate        allocate        allocate

              Note: For a single sequence, all data structures use mmap.  If mmap is not available and
              allocate is not chosen, then fileio is used (very slow).

              Note about offsets: Expansion of offsets can be controlled independently by the
              --expand-offsets flag.  However, offsets are accessed relatively fast in this version of GSNAP.

       --use-shared-memory=INT
              If 1 (default), then allocated memory is shared among all processes on this node.  If 0, then each
              process has private allocated memory

       --preload-shared-memory
              Load  files  indicated by --batch mode into shared memory for use by other GMAP/GSNAP processes on
              this node, and then exit.  Ignore any input files.

       --unload-shared-memory
              Unload files indicated by --batch mode from shared memory, or allow them to be unloaded when
              existing GMAP/GSNAP processes on this node are finished with them.  Ignore any input files.

       --expand-offsets=INT
              Whether to expand the genomic offsets index.  Values: 0 (no, default), or 1 (yes).  Expansion
              gives faster alignment, but requires more memory

       -m, --max-mismatches=FLOAT
              Maximum number of mismatches allowed.  If not specified, defaults to the ultrafast level of
              ((readlength+index_interval-1)/kmer - 2).  (By default, the genome index interval is 3, but this
              can be changed by providing a different value for -q to gmap_build when processing the genome.)

              If specified between 0.0 and 1.0, then treated as a fraction of each read length.  Otherwise,
              treated as an integral number of mismatches (including indel and splicing penalties).  For
              RNA-Seq, you may need to increase this value slightly to align reads extending past the ends of
              an exon.
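
       A worked sketch of the default formula and the fraction rule above.  Integer division in the
       formula, truncation when converting a fraction to a count, and treating only values strictly
       between 0.0 and 1.0 as fractions are all assumptions; the function names are hypothetical.

```python
def default_max_mismatches(readlength, kmer, index_interval=3):
    """Ultrafast default: (readlength + index_interval - 1) / kmer - 2, with integer division assumed."""
    return (readlength + index_interval - 1) // kmer - 2


def effective_max_mismatches(value, readlength):
    """Interpret a -m value as a fraction of the read length or as an integral count."""
    if 0.0 < value < 1.0:
        # Assumption: the fractional result is truncated to an integer.
        return int(value * readlength)
    return int(value)


print(default_max_mismatches(readlength=100, kmer=15))  # (100 + 3 - 1) // 15 - 2 = 4
print(effective_max_mismatches(0.25, 100))              # 25
print(effective_max_mismatches(5, 100))                 # 5
```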

       --min-coverage=FLOAT
              Minimum coverage required for an alignment.  If specified between 0.0 and 1.0, then treated  as  a
              fraction  of  each  read length.  Otherwise, treated as an integral number of base pairs.  Default
              value is 0.0.

       --query-unk-mismatch=INT
              Whether to count unknown (N) characters in the query as a mismatch (0=no (default), 1=yes)

       --genome-unk-mismatch=INT
              Whether to count unknown (N) characters in the genome as a mismatch (0=no, 1=yes (default))

       -i, --indel-penalty=INT
              Penalty for an indel (default 2).  Counts  against  mismatches  allowed.   To  find  indels,  make
              indel-penalty  less  than  or equal to max-mismatches.  A value < 2 can lead to false positives at
              read ends

       --indel-endlength=INT
              Minimum length at end required for indel alignments (default 4)

       -y, --max-middle-insertions=INT
              Maximum number of middle insertions allowed (default is readlength - indel-endlength)

       -z, --max-middle-deletions=INT
              Maximum number of middle deletions allowed (default 30)

       -Y, --max-end-insertions=INT
              Maximum number of end insertions allowed (default 3)

       -Z, --max-end-deletions=INT
              Maximum number of end deletions allowed (default 6)

       -M, --suboptimal-levels=INT
              Report suboptimal hits beyond best hit (default 0).  All hits with best score plus
              suboptimal-levels are reported
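
       A small sketch of the reporting rule above.  It assumes that scores behave like mismatch counts,
       so lower is better and a hit is reported when its score is at most the best score plus
       suboptimal-levels.  The hit names and scores are made up for illustration.

```python
def hits_to_report(hits, suboptimal_levels=0):
    """Keep (name, score) hits whose score is within suboptimal_levels of the best (lowest) score."""
    if not hits:
        return []
    best = min(score for _, score in hits)
    return [(name, score) for name, score in hits if score <= best + suboptimal_levels]


hits = [("chr1:100", 1), ("chr2:500", 2), ("chr7:900", 4)]
print(hits_to_report(hits))                       # [('chr1:100', 1)]
print(hits_to_report(hits, suboptimal_levels=1))  # [('chr1:100', 1), ('chr2:500', 2)]
```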

       -a, --adapter-strip=STRING
              Method for removing adapters from reads.  Currently  allowed  values:  off,  paired.   Default  is
              "off".   To turn on, specify "paired", which removes adapters from paired-end reads if they appear
              to be present.

       --trim-mismatch-score=INT
              Score to use for mismatches when trimming at ends.  To turn off trimming, specify 0.   Default  is
              -3 for both RNA-Seq and DNA-Seq.  Warning: Turning trimming off in RNA-Seq can give false positive
              mismatches at the ends of reads

       --trim-indel-score=INT
              Score  to  use  for indels when trimming at ends.  To turn off trimming, specify 0.  Default is -2
              for both RNA-Seq and DNA-Seq.  Warning: Turning trimming off in RNA-Seq can  give  false  positive
              indels at the ends of reads

       --end-detail=STRING
              Amount of alignment detail at ends of read: high, medium, or low (default).  Note: medium detail
              could increase speed by 20% or so, but will miss some splices at the ends of reads

       -V, --snpsdir=STRING
              Directory for SNPs index files (created using snpindex) (default is location of genome index files
              specified using -D and -d)

       -v, --use-snps=STRING
              Use database containing  known  SNPs  (in  <STRING>.iit,  built  previously  using  snpindex)  for
              tolerance to SNPs

       --cmetdir=STRING
              Directory  for methylcytosine index files (created using cmetindex) (default is location of genome
              index files specified using -D, -V, and -d)

       --atoidir=STRING
              Directory for A-to-I RNA editing index files (created using atoiindex)  (default  is  location  of
              genome index files specified using -D, -V, and -d)

       --mode=STRING
              Alignment    mode:    standard    (default),   cmet-stranded,   cmet-nonstranded,   atoi-stranded,
              atoi-nonstranded, ttoc-stranded, or ttoc-nonstranded.  Non-standard modes require you to have
              previously run the cmetindex or atoiindex programs (which also cover the ttoc modes) on the genome

       -t, --nthreads=INT
              Number of worker threads

       --max-anchors=INT
              Controls number of candidate segments returned by the complete set algorithm.  Default is 10.  Can
              be increased to higher  values  to  solve  alignments  with  evenly  spaced  mismatches  at  close
              distances.   However,  higher  values  will  cause GSNAP to run more slowly.  A value of 1000, for
              example, slows down the program by a factor of 10 or so.  Therefore, change  this  value  only  if
              absolutely necessary.

   Options for GMAP alignment within GSNAP

       --gmap-mode=STRING
              Cases to use GMAP for complex alignments containing multiple splices or indels.  Allowed values:
              none, all, pairsearch, terminal, improve (or multiple values, separated by commas).  Default:
              all, i.e., pairsearch,terminal,improve

       --trigger-score-for-gmap=INT
              Try GMAP pairsearch on nearby genomic regions if best score (the total of both ends if paired-end)
              exceeds this value (default 5)

       --gmap-min-match-length=INT
              Keep GMAP hit only if it has this many consecutive matches (default 20)

       --gmap-allowance=INT
              Extra mismatch/indel score allowed for GMAP alignments (default 3)

       --max-gmap-pairsearch=INT
              Perform GMAP pairsearch on nearby genomic regions up to this many candidate ends (default 50).
              Requires pairsearch in --gmap-mode

       --max-gmap-terminal=INT
              Perform  GMAP  terminal  on  nearby  genomic  regions up to this many candidate ends (default 50).
              Requires terminal in --gmap-mode

       --max-gmap-improvement=INT
              Perform GMAP improvement on nearby genomic regions up to this many  candidate  ends  (default  5).
              Requires improve in --gmap-mode

   Splicing options for DNA-Seq

       --find-dna-chimeras=INT
              Look for distant splicing in DNA-Seq data (0=no (default), 1=yes).  Automatically inactivated for
              RNA-Seq data if -N or -s is specified

   Splicing options for RNA-Seq

       -N, --novelsplicing=INT
              Look for novel splicing (0=no (default), 1=yes)

       --splicingdir=STRING
              Directory for splicing involving known  sites  or  known  introns,  as  specified  by  the  -s  or
              --use-splicing  flag  (default  is  directory computed from -D and -d flags).  Note: can just give
              full pathname to the -s flag instead.

       -s, --use-splicing=STRING
              Look for splicing involving known sites or known introns (in <STRING>.iit), at short or long
              distances.  See README instructions for the distinction between known sites and known introns

       --ambig-splice-noclip
              For  ambiguous  known  splicing  at  ends  of the read, do not clip at the splice site, but extend
              instead into the intron.  This flag makes sense only if you provide the --use-splicing  flag,  and
              you are trying to eliminate all soft clipping with --trim-mismatch-score=0

       -w, --localsplicedist=INT
              Definition of local novel splicing event (default 200000)

       --novelend-splicedist=INT
              Distance to look for novel splices at the ends of reads (default 50000)

       -e, --local-splice-penalty=INT
              Penalty for a local splice (default 0).  Counts against mismatches allowed

       -E, --distant-splice-penalty=INT
              Penalty for a distant splice (default 1).  A distant splice is one where the intron length
              exceeds the value of -w, or --localsplicedist, or is an inversion, scramble, or translocation
              between two different chromosomes.  Counts against mismatches allowed

       -K, --distant-splice-endlength=INT
              Minimum length at end required for distant spliced alignments (default  20,  min  allowed  is  the
              value of -k, or kmer size)

       -l, --shortend-splice-endlength=INT
              Minimum length at end required for short-end spliced alignments (default 2, but unless known
              splice sites are provided with the -s flag, GSNAP may still need the end length to be the value of
              -k, or kmer size, to find a given splice)

       --distant-splice-identity=FLOAT
              Minimum identity at end required for distant spliced alignments (default 0.95)

       --antistranded-penalty=INT
              (Not currently implemented, since it leads to poor results) Penalty for antistranded splicing when
              using stranded RNA-Seq protocols.  A positive value, such as 1, expects  antisense  on  the  first
              read and sense on the second read.  Default is 0, which treats sense and antisense equally well

       --merge-distant-samechr
              Report  distant  splices  on  the same chromosome as a single splice, if possible.  Will produce a
              single SAM line instead of two SAM lines, which is also done for translocations,  inversions,  and
              scramble events

   Options for paired-end reads

       --pairmax-dna=INT
              Max total genomic length for DNA-Seq paired reads, or other reads without splicing (default 1000).
              Used if -N or -s is not specified.  This value is also used for circular chromosomes when splicing
              in linear chromosomes is allowed

       --pairmax-rna=INT
              Max  total  genomic  length  for  RNA-Seq  paired  reads,  or other reads that could have a splice
              (default 200000).  Used if -N or -s is  specified.   Should  probably  match  the  value  for  -w,
              --localsplicedist.

       --pairexpect=INT
              Expected  paired-end  length, used for calling splices in medial part of paired-end reads (default
              500).  Was turned off in previous versions, but reinstated.

       --pairdev=INT
              Allowable deviation from expected paired-end length, used for calling splices in  medial  part  of
              paired-end reads (default 100).  Was turned off in previous versions, but reinstated.

   Options for quality scores

       --quality-protocol=STRING
              Protocol for input quality scores.  Allowed values:

                       illumina (ASCII 64-126) (equivalent to -J 64 -j -31)
                       sanger   (ASCII 33-126) (equivalent to -J 33 -j 0)

              Default is sanger (no quality print shift).  SAM output files should have quality scores in
              sanger protocol.

              Or you can customize this behavior with these flags:

       -J, --quality-zero-score=INT
              FASTQ quality scores are zero at this  ASCII  value  (default  is  33  for  sanger  protocol;  for
              Illumina, select 64)

       -j, --quality-print-shift=INT
              Shift  FASTQ  quality scores by this amount in output (default is 0 for sanger protocol; to change
              Illumina input to Sanger output, select -31)
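
       A short sketch of what the zero score and print shift mean for a quality string.  The function
       names and the sample quality string are hypothetical; the arithmetic follows the -J and -j
       values given above.

```python
def quality_scores(quality_string, zero_score=33):
    """Decode a FASTQ quality string: a character at ASCII zero_score has quality 0."""
    return [ord(ch) - zero_score for ch in quality_string]


def shift_quality(quality_string, print_shift=0):
    """Shift every quality character by print_shift ASCII positions for output."""
    return "".join(chr(ord(ch) + print_shift) for ch in quality_string)


illumina = "hhhgfB"                                # Illumina encoding (zero score at ASCII 64)
sanger = shift_quality(illumina, print_shift=-31)  # equivalent to -J 64 -j -31
print(sanger)                                      # IIIHG#
print(quality_scores(illumina, zero_score=64))     # [40, 40, 40, 39, 38, 2]
print(quality_scores(sanger, zero_score=33))       # [40, 40, 40, 39, 38, 2]
```

       The decoded scores are identical before and after the shift, which is the point of the -31
       offset: it moves the zero point from ASCII 64 to ASCII 33 (64 - 31 = 33).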

   Output options

       -n, --npaths=INT
              Maximum number of paths to print (default 100).

       -Q, --quiet-if-excessive
              If more than maximum number of paths are found, then nothing is printed.
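
       A sketch of how -n and -Q could interact.  Truncating the output to the first npaths entries
       when -Q is not given is an assumption based on the wording "maximum number of paths to print";
       the function name is hypothetical.

```python
def paths_to_print(paths, npaths=100, quiet_if_excessive=False):
    """Apply the -n limit, printing nothing for excessive results when -Q is set."""
    if len(paths) > npaths:
        if quiet_if_excessive:
            return []              # -Q: more than the maximum were found, print nothing
        return list(paths[:npaths])  # assumed: print only up to the maximum
    return list(paths)


found = [f"path{k}" for k in range(5)]
print(paths_to_print(found, npaths=3))                           # ['path0', 'path1', 'path2']
print(paths_to_print(found, npaths=3, quiet_if_excessive=True))  # []
```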

       -O, --ordered
              Print output in same order as input (relevant only if there is more than one worker thread)

       --show-refdiff
              For GSNAP output in SNP-tolerant alignment, shows all differences relative to the reference genome
              as lower case (otherwise, it shows all differences relative to both the  reference  and  alternate
              genome)

       --clip-overlap
              For paired-end reads whose alignments overlap, clip the overlapping region.

       --merge-overlap
              For  paired-end  reads  whose  alignments  overlap,  merge  the  two  ends into a single end (beta
              implementation)

       --print-snps
              Print detailed information about SNPs in reads  (works  only  if  -v  also  selected)  (not  fully
              implemented yet)

       --failsonly
              Print only failed alignments, those with no results

       --nofails
              Exclude printing of failed alignments

       -A, --format=STRING
              Another format type, other than default.  Currently implemented: sam, m8 (BLAST tabular format)

       --split-output=STRING
              Basename  for  multiple-file output, separately for nomapping, halfmapping_uniq, halfmapping_mult,
              unpaired_uniq,  unpaired_mult,  paired_uniq,  paired_mult,  concordant_uniq,  and  concordant_mult
              results
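
       A sketch that lists the output files one might expect for a given basename.  The categories are
       the ones named above; joining them to the basename with a dot is an assumption suggested by the
       ".nomapping file" mentioned under --failed-input.  The basename is a made-up example.

```python
SPLIT_OUTPUT_CATEGORIES = [
    "nomapping",
    "halfmapping_uniq", "halfmapping_mult",
    "unpaired_uniq", "unpaired_mult",
    "paired_uniq", "paired_mult",
    "concordant_uniq", "concordant_mult",
]


def split_output_filenames(basename):
    """Map each result category to its assumed file name, <basename>.<category>."""
    return {category: f"{basename}.{category}" for category in SPLIT_OUTPUT_CATEGORIES}


for name in split_output_filenames("sample1").values():
    print(name)
```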

       -o, --output-file=STRING
              File name for a single stream of output results.

       --failed-input=STRING
              Print completely failed alignments as input FASTA or FASTQ format, to the given file, appending .1
              or  .2,  for paired-end data.  If the --split-output flag is also given, this file is generated in
              addition to the output in the .nomapping file.

       --append-output
              When --split-output or --failed-input is given, this flag  will  append  output  to  the  existing
              files.  Otherwise, the default is to create new files.

       --order-among-best=STRING
              Among  alignments tied with the best score, order those alignments in this order.  Allowed values:
              genomic, random (default)

       --output-buffer-size=INT
              Buffer size, in queries, for output thread (default 1000).  When  the  number  of  results  to  be
              printed exceeds this size, the worker threads are halted until the backlog is cleared

   Options for SAM output

       --no-sam-headers
              Do not print headers beginning with '@'

       --add-paired-nomappers
              Add nomapper lines as needed to make all paired-end results alternate between first end and second
              end

       --paired-flag-means-concordant=INT
              Whether  the  paired  bit in the SAM flags means concordant only (1) or paired plus concordant (0,
              default)

       --sam-headers-batch=INT
              Print headers only for this batch, as specified by -q

       --sam-use-0M
              Insert 0M in CIGAR between adjacent insertions and deletions.  Required by Picard, but can cause
              errors in other tools
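
       A minimal sketch of the CIGAR transformation described above, written as a standalone
       post-processing function.  It illustrates the effect of the flag and is not GSNAP's own
       implementation; the function name is hypothetical.

```python
import re


def insert_0m(cigar):
    """Insert 0M between an insertion and a deletion that are directly adjacent."""
    operations = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    pieces = []
    previous_op = None
    for length, op in operations:
        # Adjacent I followed by D, or D followed by I, gets a 0M separator.
        if previous_op is not None and {previous_op, op} == {"I", "D"}:
            pieces.append("0M")
        pieces.append(f"{length}{op}")
        previous_op = op
    return "".join(pieces)


print(insert_0m("10M2I3D20M"))  # 10M2I0M3D20M
print(insert_0m("50M"))         # 50M
```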

       --sam-multiple-primaries
              Allows multiple alignments to be marked as primary if they have equally good mapping scores

       --force-xs-dir
              For  RNA-Seq  alignments,  disallows XS:A:? when the sense direction is unclear, and replaces this
              value arbitrarily with XS:A:+.  May be useful for some programs, such as  Cufflinks,  that  cannot
              handle  XS:A:?.   However,  if you use this flag, the reported value of XS:A:+ in these cases will
              not be meaningful.

       --md-lowercase-snp
              In MD string, when known SNPs are given by the -v flag, prints difference nucleotides as
              lower-case when they differ from reference but match a known alternate allele

       --extend-soft-clips
              Extends alignments through soft clipped regions

       --action-if-cigar-error
              Action to take if there is a disagreement between CIGAR length and sequence length.  Allowed
              values: ignore, warning, noprint (default), abort

       --read-group-id=STRING
              Value to put into read-group id (RG-ID) field

       --read-group-name=STRING
              Value to put into read-group name (RG-SM) field

       --read-group-library=STRING
              Value to put into read-group library (RG-LB) field

       --read-group-platform=STRING
              Value to put into read-group platform (RG-PL) field

   Help options

       --check
              Check compiler assumptions

       --version
              Show version

       --help
              Show this help message

       Other tools of GMAP suite are located in /usr/lib/gmap

gsnap 2017-11-15-1                                December 2017                                         GSNAP(1)