Provided by: blasr_5.3.5+dfsg-6_amd64

NAME

       blasr - Map SMRT Sequences to a reference genome

SYNOPSIS

       blasr reads.bam genome.fasta --bam --out out.bam

       blasr reads.fasta genome.fasta

       blasr reads.fasta genome.fasta --sa genome.fasta.sa

       blasr reads.bax.h5 genome.fasta [--sa genome.fasta.sa]

       blasr reads.bax.h5 genome.fasta --sa genome.fasta.sa --maxScore 100 --minMatch 15 ...

       blasr reads.bax.h5 genome.fasta --sa genome.fasta.sa --nproc 24 --out alignment.out ...

DESCRIPTION

       blasr  is  a  read  mapping program that maps reads to positions in a genome by clustering
       short exact matches between the read and the genome, and scoring clusters using alignment.
       The  matches  are generated by searching all suffixes of a read against the genome using a
       suffix array. Global chaining methods are used to score clusters of matches.

       The only required inputs to blasr are a file of reads and a reference genome. It is
       extremely useful to have read filtering information, and mapping runtime may decrease
       substantially when a precomputed suffix array index on the reference sequence is
       specified.

       Although reads may be input in FASTA format, the recommended input is PacBio BAM files
       because these contain quality value information that is used in the alignment and leads
       to higher-quality variant detection. Although alignments can be output in various
       formats, the recommended output format is PacBio BAM. Support for bax.h5 and plx.h5
       files will be DEPRECATED. Support for region tables for h5 files will be DEPRECATED.

       When a suffix array index of the genome is not specified, the suffix array is built
       before producing alignments. This may be prohibitively slow when the genome is large
       (e.g. human). It is best to precompute the suffix array of a genome using the program
       sawriter, and then specify it on the command line using --sa genome.fasta.sa.
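
       The precompute-then-reuse workflow described above can be sketched as a small shell
       function. The file names are placeholders, and the sawriter argument order (output
       index first, then the input FASTA) is an assumption to check against sawriter's own
       help output. By default the function only prints the commands; setting RUN to an empty
       value executes them.

```shell
# Sketch only: build the suffix array once, then reuse it for alignment.
# Assumption: sawriter takes the output index first, then the input FASTA.
align_with_index() {
    reads=$1
    genome=$2
    sa="$genome.sa"
    run=${RUN-echo}    # RUN unset: print commands; RUN="": execute them

    # Build the index only if it does not already exist.
    if [ ! -f "$sa" ]; then
        $run sawriter "$sa" "$genome" || return 1
    fi

    # Align against the precomputed index, writing PacBio BAM output.
    $run blasr "$reads" "$genome" --sa "$sa" --bam --out out.bam
}
```

       A call such as "align_with_index reads.bam genome.fasta" prints the sawriter command
       only when genome.fasta.sa is missing, followed by the blasr command.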

       The optional parameters are roughly divided into three categories: control over anchoring,
       alignment scoring, and output.

       The default anchoring parameters are optimal for small genomes and samples with up to 5%
       divergence from the reference genome. The main parameter governing speed and sensitivity
       is the --minMatch parameter. For human genome alignments, a value of 11 or higher is
       recommended. Several methods may be used to speed up alignments, at the expense of
       possibly decreasing sensitivity.

       Regions that are too repetitive may be ignored during mapping by limiting the number of
       positions a read maps to with the --maxAnchorsPerPosition option. Values between 500 and
       1000 are effective in the human genome.
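
       For a large genome, the anchoring options discussed above might be combined as in the
       following sketch. The option spellings follow the double-dash form used in the
       SYNOPSIS; the specific values (12, 500, 24 threads) and the file names are illustrative
       assumptions, not tuned recommendations. The function prints the command line rather
       than running it.

```shell
# Sketch only: print a blasr command line for a human-sized genome.
# Values are illustrative: --minMatch of 11 or higher and
# --maxAnchorsPerPosition in the 500-1000 range, as the text suggests.
human_blasr_cmd() {
    reads=$1
    genome=$2
    min_match=${3:-12}      # third argument, defaults to 12
    max_anchors=${4:-500}   # fourth argument, defaults to 500

    echo blasr "$reads" "$genome" --sa "$genome.sa" \
        --minMatch "$min_match" \
        --maxAnchorsPerPosition "$max_anchors" \
        --nproc 24 --bam --out out.bam
}
```

       Raising the third argument trades sensitivity for speed; raising the fourth keeps more
       repetitive anchors at the cost of runtime.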

       For small genomes such as bacterial genomes or BACs, the default parameters are sufficient
       for maximal sensitivity and good speed.

AUTHOR

       This manpage was written by Andreas Tille for the Debian distribution and can be used  for
       any other usage of the program.