Provided by: blasr_5.3+0-1build1_amd64 bug

NAME

       blasr - Map SMRT Sequences to a reference genome.

SYNOPSIS

       blasr reads.bam genome.fasta -bam -out out.bam

       blasr reads.fasta genome.fasta

       blasr reads.fasta genome.fasta -sa genome.fasta.sa

       blasr reads.bax.h5 genome.fasta [-sa genome.fasta.sa]

       blasr reads.bax.h5 genome.fasta -sa genome.fasta.sa -maxScore -100 -minMatch 15 ...

       blasr reads.bax.h5 genome.fasta -sa genome.fasta.sa -nproc 24 -out alignment.out ...

DESCRIPTION

       blasr  is  a  read  mapping  program  that  maps reads to positions in a genome by clustering short exact
       matches between the read and the genome, and scoring clusters using alignment. The matches are  generated
       by  searching all suffixes of a read against the genome using a suffix array. Global chaining methods are
       used to score clusters of matches.

       The only required inputs to blasr are a file of reads and a reference genome.  It is exremely  useful  to
       have read filtering information, and mapping runtime may decrease substantially when a precomputed suffix
       array index on the reference sequence is specified.

       Although  reads  may  be  input  in FASTA format, the recommended input is PacBio BAM files because these
       contain qualtiy value information that is used in the  alignment  and  produces  higher  quality  variant
       detection.  Although alignments can be output in various formats, the recommended output format is PacBio
       BAM.   Support  for  bax.h5  and plx.h5 files will be DEPRECATED.  Support for region tables for h5 files
       will be DEPRECATED.

       When suffix array index of a genome is  not  specified,  the  suffix  array  is  built  before  producing
       alignment. This may be prohibitively slow when the genome is large (e.g. Human). It is best to precompute
       the  suffix  array  of  a  genome using the program sawriter(1), and then specify the suffix array on the
       command line using -sa genome.fa.sa.

       The optional parameters are roughly divided into three  categories:  control  over  anchoring,  alignment
       scoring, and output.

       The  default anchoring parameters are optimal for small genomes and samples with up to 5% divergence from
       the reference genome.  The main parameter governing speed and sensitivity  is  the  -minMatch  parameter.
       For  human  genome  alignments,  a  value of 11 or higher is recommended.  Several methods may be used to
       speed up alignments, at the expense of possibly decreasing sensitivity.

       Regions that are too repetitive may be ignored during mapping by limiting the number of positions a  read
       maps  to  with  the -maxAnchorsPerPosition option. Values between 500 and 1000 are effective in the human
       genome.

       For small genomes such as bacterial genomes or BACs, the default parameters are  sufficient  for  maximal
       sensitivity and good speed.

OPTIONS

       Input Files

              Reads

                     reads.bam
                            A  PacBio  BAM  file  of  reads.   This is the preferred input to blasr because rich
                            quality value (insertion,deletion, and substitution quality values)  information  is
                            maintained.   The  extra  quality information improves variant detection and mapping
                            speed.

                     reads.fasta
                            A multi-fasta file of reads, though any fasta file is valid input

                     reads.bax.h5|reads.plx.h5
                            the old DEPRECATED output format of SMRT reads.

                     input.fofn
                            File of file names

              -sa suffixArrayFile
                     Use the suffix array 'sa' for detecting matches between the reads and the  reference.   The
                     suffix array has been prepared by the sawriter(1) program.

              -ctab tab
                     A  table  of  tuple  counts  used  to  estimate match significance.  This is by the program
                     'printTupleCountTable'.  While it is quick to generate  on  the  fly,  if  there  are  many
                     invocations of blasr, it is useful to precompute the ctab.

              -regionTable table (DEPRECATED)
                     Read  in  a  read-region  table in HDF format for masking portions of reads.  This may be a
                     single table if there is just one input file, or a fofn.  When a region table is specified,
                     any region table inside the reads.plx.h5 or reads.bax.h5 files are ignored.
       (DEPRECATED) Options for modifying reads.

              There is ancilliary information about substrings of reads that is stored in a 'region  table'  for
              each read file.  Because HDF is used, the region table may be part of the .bax.h5 or .plx.h5 file,
              or  a  separate  file.  A contiguously read substring from the template is a subread, and any read
              may contain multiple subreads. The boundaries of the subreads may  be  inferred  from  the  region
              table  either  directly  or  by  definition  of  adapter boundaries.  Typically region tables also
              contain information for the location of the high and low quality regions of reads.  Reads produced
              by spurious reads from empty ZMWs have a high quality start coordinate equal to high quality  end,
              making no usable read.

              -useccs
                     Align  the circular consensus sequence (ccs), then report alignments of the ccs subreads to
                     the window that the ccs was mapped to.  Only alignments of the subreads are reported.

              -useccsall
                     Similar to -useccs, except all subreads are aligned, rather than just the subreads used  to
                     call the ccs.  This will include reads that only cover part of the template.

              -useccsdenovo
                     Align the circular consensus, and report only the alignment of the ccs sequence.

              -noSplitSubreads (false)
                     Do  not  split  subreads  at  adapters. This is typically only useful when the genome in an
                     unrolled version  of  a  known  template,  and  contains  template-adapter-reverse_template
                     sequence.

              -ignoreRegions (false)
                     Ignore any information in the region table.

              -ignoreHQRegions (false)
                     Ignore any hq regions in the region table.
       Alignments To Report

              -bestn n (10)
                     Report the top n alignments.

              -hitPolicy (all)
                     Specify a policy to treat multiple hits from [all, allbest, random, randombest, leftmost]

                     all    report all alignments.

                     allbest
                            report all equally top scoring alignments.

                     random report a random alignment.

                     randombest
                            report a random alignment from multiple equally top scoring alignments.

                     leftmost
                            report  an  alignment which has the best alignmentscore and has the smallest mapping
                            coordinate in any reference.

              -placeRepeatsRandomly (false)
                     DEPRECATED! If true, equivalent to -hitPolicy randombest.

              -randomSeed (0)
                     Seed for random number generator. By default (0), use current time as seed.

              -noSortRefinedAlignments (false)
                     Once candidate alignments are generated and scored via sparse dynamic programming, they are
                     rescored using local alignment that accounts for different error profiles.  Resorting based
                     on the local alignment may change the order the hits are returned.

              -allowAdjacentIndels
                     When specified, adjacent insertion or deletions are allowed. Otherwise, adjacent  insertion
                     and  deletions  are  merged  into  one  operation.   Using quality values to guide pairwise
                     alignments may dictate that the higher probability alignment contains  adjacent  insertions
                     or  deletions.   Current tools such as GATK do not permit this and so they are not reported
                     by default.
       Output Formats and Files

              -out out (terminal)
                     Write output to out.

              -sam   Write output in SAM format.

              -m t   If not printing SAM, modify the output of the alignment.

              When t is:

                     0      Print blast like output with |'s connecting matched nucleotides.

                     1      Print only a summary: score and pos.

                     2      Print in Compare.xml format.

                     3      Print in vulgar format (DEPRECATED).

                     4      Print a longer tabular version of the alignment.

                     5      Print in a machine-parsable format that is read by compareSequences.py.

              -header
                     Print a header as the first line of the output file describing the contents of each column.

              -titleTable tab (NULL)
                     Construct a table of reference sequence titles.  The reference sequences are enumerated  by
                     row,  0,1,...   The  reference  index  is printed in alignment results rather than the full
                     reference name.  This makes output concise, particularly whenvery verbose titles  exist  in
                     reference names.

              -unaligned file
                     Output reads that are not aligned to file

              -clipping [none|hard|subread|soft] (none)

                     Use no/hard/subread/soft clipping, ONLY for SAM/BAM output.

              -printSAMQV (false)
                     Print quality values to SAM output.

              -cigarUseSeqMatch (false)
                     CIGAR  strings  in  SAM/BAM output use '=' and 'X' to represent sequence match and mismatch
                     instead of 'M'.
       Options for anchoring alignment regions.

              This will have the greatest effect on speed and sensitivity.

              -minMatch m (12)
                     Minimum seed length.  Higher minMatch will speed up alignment, but decrease sensitivity.

              -maxMatch l (inf)
                     Stop mapping a read to the genome when the lcp length reaches l.  This is useful  when  the
                     query  is  part  of the reference, for example when constructing pairwise alignments for de
                     novo assembly.

              -maxLCPLength l (inf)
                     The same as -maxMatch.

              -maxAnchorsPerPosition m (10000)
                     Do not add anchors from a position if it matches to more than m locations in the target.

              -advanceExactMatches E (0)
                     Another trick for speeding up alignments with match - E fewer anchors.  Rather than finding
                     anchors between the read and the genome at every position in the read, when  an  anchor  is
                     found at position i in a read of length L, the next position in a read to find an anchor is
                     at i+L-E.  Use this when alignining already assembled contigs.

              -nCandidates n (10)
                     Keep  up  to  n  candidates  for  the best alignment.  A large value of n will slow mapping
                     because the slower dynamic programming steps are applied to more clusters of anchors  which
                     can be a rate limiting step when reads are very long.

              -concordant (false)
                     Map  all subreads of a zmw (hole) to where the longest full pass subread of the zmw aligned
                     to. This requires to use the region table and hq regions.   This  option  only  works  when
                     reads are in base or pulse h5 format.

              -concordantTemplate (mediansubread)
                     Select  a  full pass subread of a zmw as template for concordant mapping.  longestsubread -
                     use the longest full pass subread mediansubread  - use the median length full pass  subread
                     typicalsubread  -  use  the  second longest full pass subread if length of the longest full
                     pass subread is an outlier

              -fastMaxInterval (false)
                     Fast search maximum increasing intervals as alignment candidates.  The  search  is  not  as
                     exhaustive as the default, but is much faster.

              -aggressiveIntervalCut (false)
                     Agreesively  filter  out  non-promising  alignment candidates, if there exists at least one
                     promising candidate. If this  option  is  turned  on,  blasr  is  likely  to  ignore  short
                     alignments of ALU elements.

              -fastSDP (false)
                     Use a fast heuristic algorithm to speed up sparse dynamic programming.
       Options for Refining Hits

              -sdpTupleSize K (11)
                     Use matches of length K to speed dynamic programming alignments.  This controls accuracy of
                     assigning  gaps  in  pairwise alignments once a mapping has been found, rather than mapping
                     sensitivity itself.

              -scoreMatrix score matrix string
                     Specify an alternative score matrix for scoring fasta reads.  The matrix is in the format
                       A C G T N
                     A a b c d e
                     C f g h i j
                     G k l m n o
                     T p q r s t
                     N u v w x y

                     The values a...y should be input as a quoted space separated string: "a b c ... y".  Lowerf
                     scores  are  better,  so  matches should be less than mismatches e.g. a,g,m,s = -5 (match),
                     mismatch = 6.

              -affineOpen value (10)
                     Set the penalty for opening an affine alignment.

              -affineExtend a (0)
                     Change affine (extension) gap penalty. Lower value allows more gaps.
       Options for overlap/dynamic programming alignments and pairwise overlap for de novo assembly.

              -useQuality (false)
                     Use  substitution/insertion/deletion/merge  quality  values  to  score  gap  and   mismatch
                     penalties  in pairwise alignments. Because the insertion and deletion rates are much higher
                     than substitution, this will make  many  alignments  favor  an  insertion/deletion  over  a
                     substitution.nNaive   consensus   calling   methods   will  then  often  miss  substitution
                     polymorphisms. This option should be used when calling consensus using the  Quiver  method.
                     Furthermore,  when  not  using  quality  values  to score alignments, there will be a lower
                     consensus accuracy in homolymer regions.

              -affineAlign (false)
                     Refine alignment using affine guided align.
       Options for filtering reads and alignments

              -minReadLength l (50)
                     Skip reads that have a full length less than l. Subreads may be shorter.

              -minSubreadLength l (0)
                     Do not align subreads of length less than l.

              -minRawSubreadScore m (0)
                     Do not align subreads whose quality score in region table is less than  m  (quality  scores
                     should be in range [0, 1000]).

              -maxScore m (-200)
                     Maximum score to output (high is bad, negative good).

              -minAlnLength
                     (0) Report alignments only if their lengths are greater than minAlnLength.

              -minPctSimilarity  (0)  Report  alignments  only  if  their  percentage similairty is greater than
                     minPctSimilarity.

              -minPctAccuracy
                     (0) Report alignments only if their percentage accuray is greater than minAccuracy.
       Options for parallel alignment

              -nproc N (1)
                     Align using N processes. All large data structures such as the suffix array and tuple count
                     table are shared.

              -start S (0)
                     Index of the first read to begin aligning.  This is  useful  when  multiple  instances  are
                     running on the same data, for example when on a multi-rack cluster.

              -stride S (1)
                     Align one read every S reads.
       Options for subsampling reads.

              -subsample (0)
                     Proportion of reads to randomly subsample (expressed as a decimal) and align.

              -holeNumbers LIST
                     When  specified,  only  align  reads  whose ZMW hole numbers are in LIST.  LIST is a comma-
                     delimited string of ranges, such as '1,2,3,10-13'.  This option only works when  reads  are
                     in bam, bax.h5 or plx.h5 format.

       -h     Print help information.

CITATION

       To  cite  BLASR, please use: Chaisson M.J., and Tesler G., Mapping single molecule sequencing reads using
       Basic Local Alignment with Successive Refinement (BLASR):  Theory  and  Application,  BMC  Bioinformatics
       2012, 13:238.

BUGS

       Please report any bugs to https://github.com/PacificBiosciences/blasr/issues.

SEE ALSO

       loadPulses(1) pls2fasta(1) samFilter(1) samtoh5(1) samtom4(1) sawriter(1) sdpMatcher(1) toAfg(1)

blasr 3ca7fe8                                       July 2015                                           BLASR(1)