Provided by: minimap_0.2-3_amd64

NAME

       minimap - fast mapping between long DNA sequences

SYNOPSIS

       minimap  [-lSOV]  [-k  kmer]  [-w  winSize] [-I batchSize] [-d dumpFile] [-f occThres] [-r bandWidth] [-m
       minShared] [-c minCount] [-L minMatch] [-g maxGap] [-T dustThres] [-t  nThreads]  [-x  preset]  target.fa
       query.fa > output.paf

DESCRIPTION

       Minimap  is  a  tool  to efficiently find multiple approximate mapping positions between two sets of long
       sequences, such as between reads and reference genomes, between genomes and  between  long  noisy  reads.
       Minimap has an indexing and a mapping phase. In the indexing phase, it collects all minimizers of a large
       batch of target sequences in a hash table; in the mapping phase, it identifies good clusters of  colinear
       minimizer hits. Minimap does not generate detailed alignments between the target and the query sequences.
       It only outputs the approximate start and the end coordinates of these clusters.

OPTIONS

   Indexing options
       -k INT    Minimizer k-mer length [15]

       -w INT    Minimizer window size [2/3 of k-mer length]. A minimizer is the smallest k-mer in a window of w
                 consecutive k-mers.

       -I NUM    Load  at  most NUM target bases into RAM for indexing [4G]. If there are more than NUM bases in
                 target.fa, minimap needs to read query.fa multiple times to map it against each batch of target
                 sequences.  NUM may end with a k/K/m/M/g/G suffix.

       -d FILE   Dump minimizer index to FILE [no dump]

       -l        Indicate  that  target.fa  is  in fact a minimizer index generated by option -d, not a FASTA or
                 FASTQ file.
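
       The minimizer definition given for -w can be illustrated with a short Python sketch. It is not minimap's
       implementation: it assumes k-mers are ranked lexicographically on the forward strand only, and the
       function name minimizers is hypothetical. With the default k of 15, the default window of 2/3 of the
       k-mer length is 10.

```python
def minimizers(seq, k=15, w=10):
    """Return sorted (position, k-mer) pairs chosen as window minimizers.

    Assumption: k-mers are ranked lexicographically on the forward strand
    only. This is a simplification for illustration.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    # Slide a window of w consecutive k-mers along the sequence.
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # Smallest k-mer in the window; ties go to the leftmost position.
        offset = min(range(w), key=lambda j: window[j])
        picked.add((start + offset, window[offset]))
    return sorted(picked)


# Tiny toy sequence with small k and w so the result is easy to inspect.
print(minimizers("GATTACAGATTACA", k=3, w=4))
```

       Adjacent windows often share the same minimizer, so far fewer positions are kept than there are k-mers.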

   Mapping options
       -f FLOAT  Ignore top FLOAT fraction of most occurring minimizers [0.001]

       -r INT    Approximate bandwidth for initial minimizer  hits  clustering  [500].  A  minimizer  hit  is  a
                 minimizer present in both the target and query sequences. A minimizer hit cluster is a group of
                 potentially colinear minimizer hits between a target and a query sequence.

       -m FLOAT  Merge initial minimizer hit clusters if FLOAT or  higher  fraction  of  minimizers  are  shared
                 between the clusters [0.5]

       -c INT    Retain a minimizer hit cluster if it contains INT or more minimizer hits [4]

       -L INT    Discard a minimizer hit cluster if, after colinearization, the number of matching bases is below
                 INT [40]. This option mainly reduces the size of the output. It has little effect on speed or
                 peak memory.

       -g INT    Split  a  minimizer  hit  cluster at a gap INT-bp or longer that does not contain any minimizer
                 hits [10000]

       -T INT    Mask regions on query sequences with SDUST score threshold INT; 0 to disable [0]. SDUST  is  an
                 algorithm  to  identify  low-complexity subsequences. It is not enabled by default. If SDUST is
                 preferred, a value between 20 and 25 is recommended. A higher threshold masks less sequence.

       -S        Perform all-vs-all mapping. In this mode, if  the  query  sequence  name  is  lexicographically
                 larger  than  the  target sequence name, the hits between them will be suppressed; if the query
                 sequence name is the same as the target name, diagonal minimizer hits will also be suppressed.

       -O        Drop a minimizer hit if it is far away from other hits (EXPERIMENTAL). This  option  is  useful
                 for mapping long chromosomes from two diverged species.

       -x STR    Change multiple settings based on preset STR [not set]. It is recommended to apply this option
                 before other options, so that later options can override the settings changed by the preset.

                 ava10k  for PacBio or Oxford Nanopore all-vs-all read mapping (-Sw5 -L100 -m0).
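
       The suppression rule described for -S can be written out as a small Python sketch. It rests on two
       assumptions that the text above does not confirm: Python string comparison stands in for minimap's
       lexicographic name comparison, and a "diagonal" hit is taken to mean the same position on the same
       strand of a sequence mapped against itself. The function name report_hit is hypothetical.

```python
def report_hit(query_name, query_pos, target_name, target_pos, same_strand=True):
    """Return True if a minimizer hit would be kept in all-vs-all mode."""
    # Query name sorts after the target name: the pair is suppressed, so
    # each unordered pair of sequences is reported only once.
    if query_name > target_name:
        return False
    # Self-mapping: drop the trivial hit of a position against itself
    # (assumed meaning of "diagonal").
    if query_name == target_name and same_strand and query_pos == target_pos:
        return False
    return True


print(report_hit("read1", 5, "read2", 9))
print(report_hit("read2", 5, "read1", 9))
```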

   Input/output options
       -t INT    Number  of threads [3]. Minimap uses at most three threads when collecting minimizers on target
                 sequences, and uses up to INT+1 threads when mapping (the extra thread is  for  I/O,  which  is
                 frequently idle and takes little CPU time).

       -V        Print version number to stdout

OUTPUT FORMAT

       Minimap  outputs  mapping  positions  in  the  Pairwise mApping Format (PAF). PAF is a TAB-delimited text
       format with each line consisting of at least 12 fields, as described in the following table:

                     ┌────┬────────┬─────────────────────────────────────────────────────────────┐
                     │Col │  Type  │ Description                                                 │
                     ├────┼────────┼─────────────────────────────────────────────────────────────┤
                     │  1 │ string │ Query sequence name                                         │
                     │  2 │  int   │ Query sequence length                                       │
                     │  3 │  int   │ Query start coordinate (0-based)                            │
                     │  4 │  int   │ Query end coordinate (0-based)                              │
                     │  5 │  char  │ `+' if query and target on the same strand; `-' if opposite │
                     │  6 │ string │ Target sequence name                                        │
                     │  7 │  int   │ Target sequence length                                      │
                     │  8 │  int   │ Target start coordinate on the original strand              │
                     │  9 │  int   │ Target end coordinate on the original strand                │
                     │ 10 │  int   │ Number of matching bases in the mapping                     │
                     │ 11 │  int   │ Number of bases, including gaps, in the mapping             │
                     │ 12 │  int   │ Mapping quality (0-255 with 255 for missing)                │
                     └────┴────────┴─────────────────────────────────────────────────────────────┘

       When the alignment is available, column 11 gives the total number of  sequence  matches,  mismatches  and
       gaps  in  the alignment; column 10 divided by column 11 gives the alignment identity. As minimap does not
       generate detailed alignments, these two columns are approximate. PAF may optionally have additional fields
       in  the  SAM-like typed key-value format. Minimap writes the number of minimizer hits in a cluster to the
       cm tag.
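
       As a minimal sketch of reading this format, the Python below splits one PAF line into the 12 mandatory
       fields, collects optional TAG:TYPE:VALUE fields, and computes the approximate identity as column 10
       divided by column 11. The dictionary keys, the function names and the example line are invented for
       illustration and are not real minimap output.

```python
def parse_paf_line(line):
    """Parse one TAB-delimited PAF line into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("a PAF line needs at least 12 fields")
    record = {
        "query_name": fields[0],
        "query_len": int(fields[1]),
        "query_start": int(fields[2]),
        "query_end": int(fields[3]),
        "strand": fields[4],
        "target_name": fields[5],
        "target_len": int(fields[6]),
        "target_start": int(fields[7]),
        "target_end": int(fields[8]),
        "n_match": int(fields[9]),      # column 10
        "block_len": int(fields[10]),   # column 11
        "mapq": int(fields[11]),
        "tags": {},
    }
    # Optional SAM-like typed fields, e.g. "cm:i:30".
    for tag in fields[12:]:
        key, type_code, value = tag.split(":", 2)
        record["tags"][key] = int(value) if type_code == "i" else value
    return record


def approx_identity(record):
    """Column 10 divided by column 11; approximate for minimap output."""
    if record["block_len"] == 0:
        return 0.0
    return record["n_match"] / record["block_len"]


# Made-up example line, not produced by minimap.
example = "read1\t1000\t10\t910\t+\tchr1\t50000\t2000\t2900\t450\t900\t255\tcm:i:30"
rec = parse_paf_line(example)
print(rec["target_name"], approx_identity(rec), rec["tags"]["cm"])
```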

SEE ALSO

       miniasm(1)