Provided by: blasr_5.3.3+dfsg-4build1_amd64

NAME

       blasr - Map SMRT Sequences to a reference genome

SYNOPSIS

       blasr reads.bam genome.fasta --bam --out out.bam

       blasr reads.fasta genome.fasta

       blasr reads.fasta genome.fasta --sa genome.fasta.sa

       blasr reads.bax.h5 genome.fasta [--sa genome.fasta.sa]

       blasr reads.bax.h5 genome.fasta --sa genome.fasta.sa --maxScore 100 --minMatch 15 ...

       blasr reads.bax.h5 genome.fasta --sa genome.fasta.sa --nproc 24 --out alignment.out ...

DESCRIPTION

       blasr  is  a  read  mapping  program  that  maps reads to positions in a genome by clustering short exact
       matches between the read and the genome, and scoring clusters using alignment. The matches are  generated
       by  searching all suffixes of a read against the genome using a suffix array. Global chaining methods are
       used to score clusters of matches.

       The only required inputs to blasr are a file of reads and a reference genome. It is extremely useful to
       have read filtering information, and mapping runtime may decrease substantially when a precomputed suffix
       array index on the reference sequence is specified.

       Although reads may be input in FASTA format, the recommended input is  PacBio  BAM  files  because  these
       contain  quality  value  information  that  is  used in the alignment and produces higher quality variant
       detection.  Although alignments can be output in various formats, the recommended output format is PacBio
       BAM. Support for bax.h5 and plx.h5 files, and for region tables for h5 files, will be DEPRECATED.

       When the suffix array index of a genome is not specified, the suffix array is built before alignments are
       produced. This may be prohibitively slow when the genome is large (e.g. human). It is best to precompute
       the suffix array of a genome using the program sawriter, and then specify the suffix array on the command
       line using --sa genome.fasta.sa.
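
       The workflow described above can be sketched as a short script. This is only an illustration: the file
       names are placeholders, and the sawriter argument order (output index first, then the FASTA file) is an
       assumption that should be checked against the sawriter help text on your installation. The script prints
       the two commands and runs them only if both programs and both input files are present.

```shell
#!/bin/sh
# Sketch: build the suffix array once, then reuse it for mapping runs.
# All file names below are placeholders.
GENOME=genome.fasta
READS=reads.bam
SA="$GENOME.sa"

# Assumption: sawriter takes the output index first and the FASTA second.
BUILD_CMD="sawriter $SA $GENOME"
# Flags taken from the SYNOPSIS: --sa, --bam, --out.
MAP_CMD="blasr $READS $GENOME --sa $SA --bam --out out.bam"

# Show what would be executed.
echo "$BUILD_CMD"
echo "$MAP_CMD"

# Execute only when the tools and the input files actually exist.
if command -v sawriter >/dev/null 2>&1 && command -v blasr >/dev/null 2>&1 \
   && [ -f "$GENOME" ] && [ -f "$READS" ]; then
    # Skip the (slow) index build if the index is already there.
    [ -f "$SA" ] || $BUILD_CMD
    $MAP_CMD
fi
```

       Keeping the index next to the reference means the expensive build step is paid once per genome rather
       than once per mapping run.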

       The  optional  parameters  are  roughly  divided into three categories: control over anchoring, alignment
       scoring, and output.

       The default anchoring parameters are optimal for small genomes and samples with up to 5% divergence  from
       the reference genome. The main parameter governing speed and sensitivity is the --minMatch parameter.
       For human genome alignments, a value of 11 or higher is recommended.  Several  methods  may  be  used  to
       speed up alignments, at the expense of possibly decreasing sensitivity.

       Regions  that are too repetitive may be ignored during mapping by limiting the number of positions a read
       maps to with the --maxAnchorsPerPosition option. Values between 500 and 1000 are effective in the human
       genome.
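
       As an illustration of the tuning advice above for a large genome, the following sketch assembles one
       possible command line. The values are examples chosen from the ranges given in the text (a minimum
       match length of 11 or higher, and 500 to 1000 anchors per position), not tested recommendations. The
       file names and the thread count are placeholders, and writing maxAnchorsPerPosition with two dashes,
       as the options in the SYNOPSIS are written, is an assumption.

```shell
#!/bin/sh
# Sketch: compose a blasr command line for a large, repetitive genome.
# All file names and values below are illustrative placeholders.
GENOME=human.fasta
READS=reads.bam
SA="$GENOME.sa"

MIN_MATCH=12        # text recommends 11 or higher for human
MAX_ANCHORS=500     # text suggests 500 to 1000 for human
NPROC=8             # placeholder thread count

# Assumption: --maxAnchorsPerPosition uses the same double-dash
# spelling as the options shown in the SYNOPSIS.
CMD="blasr $READS $GENOME --sa $SA --minMatch $MIN_MATCH"
CMD="$CMD --maxAnchorsPerPosition $MAX_ANCHORS --nproc $NPROC"
CMD="$CMD --bam --out human.bam"

# Show what would be executed.
echo "$CMD"

# Execute only when blasr and all input files actually exist.
if command -v blasr >/dev/null 2>&1 \
   && [ -f "$GENOME" ] && [ -f "$READS" ] && [ -f "$SA" ]; then
    $CMD
fi
```

       Raising the minimum match length or lowering the anchor limit trades sensitivity for speed, so it is
       worth comparing results on a small subset of reads before committing to a full run.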

       For  small  genomes  such as bacterial genomes or BACs, the default parameters are sufficient for maximal
       sensitivity and good speed.

AUTHOR

       This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage
       of the program.