lunar (1) minimap.1.gz

Provided by: minimap_0.2-7_amd64

NAME

       minimap - fast mapping between long DNA sequences

SYNOPSIS

       minimap [-lSOV] [-k kmer] [-w winSize] [-I batchSize] [-d dumpFile] [-f occThres]
       [-r bandWidth] [-m minShared] [-c minCount] [-L minMatch] [-g maxGap] [-T dustThres]
       [-t nThreads] [-x preset] target.fa query.fa > output.paf

DESCRIPTION

       Minimap is a tool to efficiently find multiple approximate mapping positions between two
       sets of long sequences, such as between reads and reference genomes, between genomes, and
       between long noisy reads. Minimap has an indexing and a mapping phase. In the indexing
       phase, it collects all minimizers of a large batch of target sequences in a hash table; in
       the mapping phase, it identifies good clusters of colinear minimizer hits. Minimap does
       not generate detailed alignments between the target and the query sequences; it only
       outputs the approximate start and end coordinates of these clusters.
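       The two phases can be illustrated with a toy Python sketch. It is not minimap's
       implementation. Lexicographic order stands in for minimap's k-mer hash, and only the
       forward strand is indexed. The names minimizers, build_index and find_hits are
       hypothetical. Each hit is recorded as (target name, target position, query position).

```python
from collections import defaultdict


def minimizers(seq, k, w):
    """Yield (kmer, pos) for the smallest k-mer in each window of w consecutive k-mers.

    Simplifications (assumptions, not minimap's code): lexicographic order
    replaces minimap's k-mer hash and only the forward strand is considered.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    seen = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        best = min(range(w), key=lambda j: window[j])  # leftmost smallest k-mer
        pos = start + best
        if pos not in seen:  # adjacent windows often share a minimizer
            seen.add(pos)
            yield window[best], pos


def build_index(targets, k, w):
    """Indexing phase: hash table from minimizer to (target name, position) pairs."""
    index = defaultdict(list)
    for name, seq in targets.items():
        for kmer, pos in minimizers(seq, k, w):
            index[kmer].append((name, pos))
    return index


def find_hits(index, query, k, w):
    """First step of mapping: query minimizers that are also in the index become hits."""
    hits = []
    for kmer, qpos in minimizers(query, k, w):
        for tname, tpos in index.get(kmer, []):
            hits.append((tname, tpos, qpos))
    return hits


targets = {"t1": "ACGTACGTTTGACCA"}
index = build_index(targets, k=3, w=2)
print(find_hits(index, "GTTTGAC", k=3, w=2))  # every hit lies on diagonal tpos - qpos = 6
```

       Because the query here is an exact substring of the target, all hits share one
       diagonal. This is the colinear pattern that the mapping phase groups into a cluster.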

OPTIONS

   Indexing options
       -k INT    Minimizer k-mer length [15]

       -w INT    Minimizer  window  size [2/3 of k-mer length]. A minimizer is the smallest k-mer
                 in a window of w consecutive k-mers.
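                 To make the definition concrete, the Python sketch below packs each k-mer
                 into 2 bits per base. It then picks, from each window of w consecutive
                 k-mers, the one with the smallest value under a stand-in ordering. The
                 multiplicative hash toy_hash is a hypothetical placeholder, not minimap's hash
                 function. Rounding the 2/3 default to the nearest integer is also an
                 assumption.

```python
def default_window(k):
    """Default -w value: about 2/3 of k (rounding to nearest is an assumption)."""
    return round(2 * k / 3)


def encode(kmer):
    """Pack a DNA k-mer into an integer, 2 bits per base (A=0, C=1, G=2, T=3)."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    value = 0
    for base in kmer:
        value = (value << 2) | code[base]
    return value


def toy_hash(value, k):
    """Hypothetical stand-in ordering: an odd multiplier modulo 4**k is a bijection."""
    mask = (1 << (2 * k)) - 1
    return (value * 0x9E3779B97F4A7C15) & mask


def window_minimizers(seq, k, w):
    """Positions of k-mers that are the smallest (by toy_hash) in some window of w k-mers."""
    hashes = [toy_hash(encode(seq[i:i + k]), k) for i in range(len(seq) - k + 1)]
    chosen = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        chosen.add(start + window.index(min(window)))  # leftmost minimum on ties
    return sorted(chosen)


k = 15
w = default_window(k)  # 10 for the default k of 15
positions = window_minimizers("ACGTTGCATGCATTGACCGTAGCTAGGCTTACGATCGATCGGATC" * 2, k, w)
print(w, positions)
```

                 Every window of w k-mers contains at least one selected position, so two
                 sequences sharing a stretch of w + k - 1 identical bases share a minimizer.
                 With w = 1 every k-mer is selected.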

       -I NUM    Load at most NUM target bases into RAM for indexing [4G]. If there are more than
                 NUM bases in target.fa, minimap needs to read query.fa multiple times to map it
                 against each batch of target sequences. NUM may end with k/K, m/M or g/G.
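                 The following Python sketch shows how a NUM value with a suffix maps to a
                 base count. It also gives a lower bound on how many times query.fa is read
                 when target.fa exceeds one batch. It assumes decimal multipliers (k = 10^3,
                 m = 10^6, g = 10^9) and ignores that batches consist of whole sequences. The
                 names parse_num and query_passes are hypothetical.

```python
import math


def parse_num(text):
    """Parse '4G', '500m', '1.5k' or '100' into a number of bases (decimal multipliers assumed)."""
    multipliers = {"k": 10**3, "m": 10**6, "g": 10**9}
    text = text.strip()
    suffix = text[-1:].lower()
    if suffix in multipliers:
        return round(float(text[:-1]) * multipliers[suffix])
    return round(float(text))


def query_passes(total_target_bases, batch_size):
    """Lower bound on the number of target batches, i.e. passes over query.fa."""
    return max(1, math.ceil(total_target_bases / batch_size))


print(parse_num("4G"), query_passes(10_000_000_000, parse_num("4G")))
```

                 With the default 4G batch, a 10 Gbp target set needs at least three batches,
                 so query.fa is read at least three times.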

       -d FILE   Dump minimizer index to FILE [no dump]

       -l        Indicate that target.fa is in fact a minimizer index generated by option -d, not
                 a FASTA or FASTQ file.

   Mapping options
       -f FLOAT  Ignore the top FLOAT fraction of the most frequently occurring minimizers
                 [0.001]
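                 A Python sketch of the -f filter follows. It ranks distinct minimizers by
                 how often they occur and ignores roughly the top FLOAT fraction of them. How
                 minimap rounds the fraction and breaks ties is not documented here. Targeting
                 int(FLOAT * n) minimizers and keeping ties at the cutoff are assumptions, as is
                 the name drop_frequent.

```python
from collections import Counter


def drop_frequent(occurrences, frac):
    """Return the minimizers kept after ignoring roughly the top `frac` most frequent ones.

    occurrences: Counter mapping minimizer -> number of occurrences in the targets.
    Assumed rounding/tie rules: int(frac * n) distinct minimizers are targeted,
    and minimizers whose count equals the cutoff are kept.
    """
    counts = sorted(occurrences.values())
    n_drop = min(int(frac * len(counts)), len(counts) - 1)
    if n_drop <= 0:
        return set(occurrences)
    cutoff = counts[len(counts) - n_drop - 1]  # largest count still kept
    return {m for m, c in occurrences.items() if c <= cutoff}


occ = Counter({"AAAA": 50, "ACGT": 3, "GATC": 2, "TTGA": 1})
print(sorted(drop_frequent(occ, 0.25)))
```

                 Highly repetitive minimizers produce many hits but little information about
                 where a query maps. Dropping them keeps the later clustering steps cheap.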

       -r INT    Approximate bandwidth for the initial clustering of minimizer hits [500]. A
                 minimizer hit is a minimizer present in both the target and query sequences. A
                 minimizer hit cluster is a group of potentially colinear minimizer hits between a
                 target and a query sequence.
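                 A toy Python sketch of diagonal-band clustering follows. Hits are
                 (target name, target position, query position) tuples on one strand. They are
                 sorted by diagonal, meaning target position minus query position, and a new
                 cluster starts when the diagonal jumps by more than the bandwidth. Reading -r
                 as a limit on the jump between neighboring diagonals is our assumption about
                 what "approximate bandwidth" means. The name cluster_by_diagonal is
                 hypothetical.

```python
from itertools import groupby


def cluster_by_diagonal(hits, bandwidth=500):
    """Group (target, tpos, qpos) hits into clusters of nearby diagonals."""
    ordered = sorted(hits, key=lambda h: (h[0], h[1] - h[2]))  # by target, then diagonal
    clusters = []
    for _, group in groupby(ordered, key=lambda h: h[0]):
        current, last_diag = [], None
        for hit in group:
            diag = hit[1] - hit[2]
            if current and diag - last_diag > bandwidth:
                clusters.append(current)  # diagonal jumped too far: close the cluster
                current = []
            current.append(hit)
            last_diag = diag
        clusters.append(current)
    return clusters


hits = [("t", 100, 0), ("t", 5000, 0), ("t", 210, 100), ("u", 10, 0)]
print(cluster_by_diagonal(hits))
```

                 Small indels shift the diagonal only slightly, so hits from one noisy
                 alignment stay in the same band. Unrelated hits on distant diagonals are
                 separated.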

       -m FLOAT  Merge initial minimizer hit clusters if a fraction of FLOAT or more of their
                 minimizers is shared between them [0.5]

       -c INT    Retain a minimizer hit cluster if it contains INT or more minimizer hits [4]
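                 The -m and -c rules can be sketched in Python on clusters of (target name,
                 target position, query position) hits. The manual does not say what the
                 shared fraction is measured against. In this sketch it is the number of query
                 positions two clusters have in common, divided by the size of the smaller
                 cluster, and merging is a single greedy pass. Both choices are assumptions, as
                 are the names shared_fraction and merge_and_filter.

```python
def shared_fraction(a, b):
    """Fraction of query minimizer positions shared by clusters a and b (relative to the smaller)."""
    qa = {h[2] for h in a}
    qb = {h[2] for h in b}
    return len(qa & qb) / min(len(qa), len(qb))


def merge_and_filter(clusters, min_shared=0.5, min_count=4):
    """Greedily merge non-empty clusters on the same target (-m), then keep large ones (-c)."""
    merged = []
    for cluster in clusters:
        for existing in merged:
            same_target = existing[0][0] == cluster[0][0]
            if same_target and shared_fraction(existing, cluster) >= min_shared:
                existing.extend([h for h in cluster if h not in existing])
                break
        else:  # no merge partner found: start a new cluster
            merged.append(list(cluster))
    return [c for c in merged if len(c) >= min_count]


a = [("t", 100, 0), ("t", 110, 10), ("t", 120, 20)]
b = [("t", 900, 10), ("t", 910, 20), ("t", 920, 30)]
c = [("u", 5, 0), ("u", 6, 1)]
print(shared_fraction(a, b), [len(x) for x in merge_and_filter([a, b, c])])
```

                 Clusters a and b share two of three query positions, so they merge into one
                 six-hit cluster. Cluster c has only two hits and is dropped under the default
                 -c 4.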

       -L INT    Discard a minimizer hit cluster if, after colinearization, the number of matching
                 bases is below INT [40]. This option mainly reduces the size of the output; it has
                 little effect on speed and peak memory.
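                 Colinearization can be pictured as a longest increasing subsequence (LIS)
                 problem. Sort the hits by target position, then keep the longest run whose
                 query positions also increase. The Python sketch below uses that formulation.
                 It approximates "matching bases" as the query bases covered by the k-mers of
                 the kept hits. That approximation and the names colinearize, matching_bases and
                 keep_cluster are assumptions rather than minimap's actual code.

```python
from bisect import bisect_left


def colinearize(hits):
    """Longest chain of (target, tpos, qpos) hits increasing in both tpos and qpos (O(n log n) LIS)."""
    # Sort by target position; among equal tpos put larger qpos first, so a
    # strictly increasing qpos subsequence never uses two hits with the same tpos.
    hits = sorted(hits, key=lambda h: (h[1], -h[2]))
    tails, tail_idx, prev = [], [], [None] * len(hits)
    for i, (_, _, q) in enumerate(hits):
        j = bisect_left(tails, q)  # first chain tail >= q
        if j == len(tails):
            tails.append(q)
            tail_idx.append(i)
        else:
            tails[j] = q
            tail_idx[j] = i
        prev[i] = tail_idx[j - 1] if j > 0 else None
    chain, i = [], (tail_idx[-1] if tail_idx else None)
    while i is not None:  # walk back through predecessors
        chain.append(hits[i])
        i = prev[i]
    return chain[::-1]


def matching_bases(chain, k):
    """Approximate matching bases as query bases covered by the chain's k-mers."""
    covered, end = 0, -1
    for _, _, q in sorted(chain, key=lambda h: h[2]):
        start = max(q, end)
        covered += max(0, q + k - start)
        end = max(end, q + k)
    return covered


def keep_cluster(hits, k=15, min_match=40):
    """Apply the -L filter to the colinearized cluster."""
    return matching_bases(colinearize(hits), k) >= min_match


cluster = [("t", 100, 0), ("t", 110, 10), ("t", 105, 50), ("t", 140, 40)]
print(colinearize(cluster), matching_bases(colinearize(cluster), 15))
```

                 The hit at query position 50 breaks colinearity and is discarded. The
                 remaining three k-mers cover 40 query bases, exactly the default -L.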

       -g INT    Split a minimizer hit cluster at a gap of INT bp or longer that contains no
                 minimizer hits [10000]
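                 A Python sketch of the -g rule follows. It applies to a colinear chain of
                 (target name, target position, query position) hits and splits the chain
                 wherever consecutive hits are INT or more bases apart. Checking the gap on both
                 the target and the query is our assumption. The name split_at_gaps is
                 hypothetical.

```python
def split_at_gaps(chain, max_gap=10000):
    """Split a colinear chain of (target, tpos, qpos) hits at hit-free gaps >= max_gap."""
    parts, current = [], []
    for hit in sorted(chain, key=lambda h: (h[1], h[2])):
        if current:
            _, prev_t, prev_q = current[-1]
            if hit[1] - prev_t >= max_gap or hit[2] - prev_q >= max_gap:
                parts.append(current)  # gap too long: close the current piece
                current = []
        current.append(hit)
    if current:
        parts.append(current)
    return parts


chain = [("t", 0, 0), ("t", 100, 100), ("t", 20100, 20100), ("t", 20200, 20200)]
print([len(p) for p in split_at_gaps(chain)])
```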

       -T INT    Mask regions on query sequences with SDUST score threshold INT; 0 to disable
                 [0]. SDUST is an algorithm to identify low-complexity subsequences; masking is
                 disabled by default. If SDUST masking is wanted, a threshold between 20 and 25 is
                 recommended. A higher threshold masks less sequence.

       -S        Perform  all-vs-all  mapping.  In  this  mode,  if  the  query  sequence name is
                 lexicographically larger than the target sequence name, the  hits  between  them
                 will  be  suppressed; if the query sequence name is the same as the target name,
                 diagonal minimizer hits will also be suppressed.
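                 The suppression rule above can be written as a Python function over
                 same-strand (target position, query position) hits between one query and one
                 target. Python compares str values by code point, which matches byte-wise
                 ordering for ASCII names. Treating a diagonal hit as one with equal target and
                 query positions is an assumption, and filter_ava_hits is a hypothetical name.

```python
def filter_ava_hits(query_name, target_name, hits):
    """Apply the -S suppression rule to same-strand (tpos, qpos) hits between two sequences."""
    if query_name > target_name:  # report each pair in one direction only
        return []
    if query_name == target_name:  # self-mapping: drop the trivial diagonal
        return [(t, q) for t, q in hits if t != q]
    return list(hits)


print(filter_ava_hits("read2", "read1", [(5, 9)]))
print(filter_ava_hits("read1", "read1", [(5, 5), (5, 9)]))
```

                 Mapping a read set against itself would otherwise report every overlap twice,
                 once in each direction, plus each read's trivial match to itself.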

       -O        Drop a minimizer hit if it is far away  from  other  hits  (EXPERIMENTAL).  This
                 option is useful for mapping long chromosomes from two diverged species.

       -x STR    Change multiple settings at once according to the preset STR [not set]. It is
                 recommended to apply this option before other options, so that the following
                 options may override the settings modified by this option.

                 ava10k  for PacBio or Oxford Nanopore all-vs-all read mapping (-Sw5 -L100 -m0).
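                 The Python sketch below models why -x should come first. Options are
                 applied left to right, so a flag given after the preset overrides the preset's
                 value, while one given before it is overwritten. Only the ava10k expansion
                 listed above is modeled. The helpers effective_settings and build_command and
                 the file name reads.fa are hypothetical.

```python
# Settings implied by the ava10k preset, as listed in this manual: -S -w5 -L100 -m0.
AVA10K = {"S": True, "w": "5", "L": "100", "m": "0"}


def effective_settings(options):
    """Apply (flag, value) pairs left to right; a later flag overrides an earlier one.

    options: list of (flag, value) tuples in command-line order; a value of True
    marks a switch without an argument. Only the ava10k preset is modeled.
    """
    settings = {}
    for flag, value in options:
        if flag == "x" and value == "ava10k":
            settings.update(AVA10K)
        else:
            settings[flag] = value
    return settings


def build_command(options, target, query):
    """Turn the same (flag, value) pairs into an argv list for minimap."""
    argv = ["minimap"]
    for flag, value in options:
        argv.append("-" + flag)
        if value is not True:
            argv.append(str(value))
    return argv + [target, query]


opts = [("x", "ava10k"), ("L", "500")]  # preset first, then an override
print(effective_settings(opts))
print(build_command(opts, "reads.fa", "reads.fa"))
```

                 The resulting list could be passed to subprocess.run with stdout redirected to
                 a .paf file.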

   Input/output options
       -t INT    Number  of  threads  [3].  Minimap  uses  at  most three threads when collecting
                 minimizers on target sequences, and uses up to INT+1 threads when  mapping  (the
                 extra thread is for I/O, which is frequently idle and takes little CPU time).

       -V        Print version number to stdout

OUTPUT FORMAT

       Minimap outputs mapping positions in the Pairwise mApping Format (PAF). PAF is a
       TAB-delimited text format in which each line consists of at least 12 fields, described in
       the following table:

              ┌────┬────────┬─────────────────────────────────────────────────────────────┐
              │Col │  Type  │ Description                                                 │
              ├────┼────────┼─────────────────────────────────────────────────────────────┤
              │  1 │ string │ Query sequence name                                         │
              │  2 │  int   │ Query sequence length                                       │
              │  3 │  int   │ Query start coordinate (0-based)                            │
              │  4 │  int   │ Query end coordinate (0-based)                              │
              │  5 │  char  │ `+' if query and target on the same strand; `-' if opposite │
              │  6 │ string │ Target sequence name                                        │
              │  7 │  int   │ Target sequence length                                      │
              │  8 │  int   │ Target start coordinate on the original strand              │
              │  9 │  int   │ Target end coordinate on the original strand                │
              │ 10 │  int   │ Number of matching bases in the mapping                     │
              │ 11 │  int   │ Number of bases, including gaps, in the mapping             │
              │ 12 │  int   │ Mapping quality (0-255 with 255 for missing)                │
              └────┴────────┴─────────────────────────────────────────────────────────────┘
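       A minimal Python reader for the 12 mandatory columns follows. Any further fields are
       kept as raw strings. The PafRecord field names are our own labels for the table rows.
       Treating end coordinates as exclusive (BED-like half-open intervals) is an assumption
       used by query_span. The example line is made up.

```python
from typing import NamedTuple


class PafRecord(NamedTuple):
    qname: str
    qlen: int
    qstart: int
    qend: int
    strand: str
    tname: str
    tlen: int
    tstart: int
    tend: int
    matches: int
    block_len: int
    mapq: int
    extra: tuple  # optional SAM-like key:type:value fields, kept as raw strings

    @property
    def query_span(self):
        # Assumes half-open coordinates, i.e. the end coordinate is exclusive.
        return self.qend - self.qstart


_TYPES = (str, int, int, int, str, str, int, int, int, int, int, int)


def parse_paf_line(line):
    """Split one TAB-delimited PAF line into its 12 mandatory columns plus extras."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 fields, got {len(fields)}")
    mandatory = [conv(f) for conv, f in zip(_TYPES, fields[:12])]
    return PafRecord(*mandatory, extra=tuple(fields[12:]))


rec = parse_paf_line("read1\t5000\t100\t4900\t+\tchr1\t1000000\t2000\t6800\t3000\t4800\t255\tcm:i:42")
print(rec.qname, rec.strand, rec.query_span, rec.mapq, rec.extra)
```

       A mapping quality of 255 in column 12 means the value is missing, so downstream filters
       may want to treat it separately.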

       When the alignment is available, column 11 gives the total number of sequence matches,
       mismatches and gaps in the alignment; column 10 divided by column 11 gives the alignment
       identity. As minimap does not generate detailed alignments, these two columns are only
       approximate. PAF may optionally have additional fields in the SAM-like typed key-value
       format. Minimap writes the number of minimizer hits in a cluster to the cm tag.
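       The identity estimate and the optional tags can be read with a short Python sketch.
       The example line is made up, and its cm:i type code (integer) is an assumption. The
       helper names paf_identity and paf_tags are hypothetical.

```python
def paf_identity(fields):
    """Approximate identity: column 10 (matching bases) / column 11 (bases incl. gaps)."""
    matches, block_len = int(fields[9]), int(fields[10])
    return matches / block_len if block_len else 0.0


def paf_tags(fields):
    """Parse optional key:type:value fields (columns 13 and beyond) into a dict."""
    converters = {"i": int, "f": float}  # other type codes are kept as strings
    tags = {}
    for item in fields[12:]:
        key, typ, value = item.split(":", 2)
        tags[key] = converters.get(typ, str)(value)
    return tags


fields = "read1\t5000\t100\t4900\t+\tchr1\t1000000\t2000\t6800\t3000\t4800\t255\tcm:i:42".split("\t")
print(paf_identity(fields), paf_tags(fields))
```

       For minimap output this ratio is only a rough indicator, since both columns are
       approximate.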

SEE ALSO

       miniasm(1)