Provided by: miniasm_0.3+dfsg-4_amd64

NAME

       miniasm - de novo assembler for long read sequences

SYNOPSIS

       miniasm  [-b12VR]  [-m  minMatch]  [-i  minIden] [-s minSpan] [-c minCov] [-o minOvlp] [-h
       maxHang] [-I intThres] [-g maxGapDiff] [-d maxBubDist] [-e minUtgSize] [-f  readFile]  [-n
       nRounds] [-r dropRatio] [-F finalDropRatio] [-p outputInfo] mapping.paf > output.gfa

DESCRIPTION

       Miniasm is a very fast OLC-based (overlap-layout-consensus) de novo assembler for noisy
       long reads. It takes all-vs-all read self-mappings in the PAF format as input and outputs
       an assembly graph in the GFA format. Unlike mainstream assemblers, miniasm does not have
       a consensus step: it simply concatenates pieces of read sequences to generate the final
       unitig sequences. Thus the per-base error rate of the unitigs is similar to that of the
       raw input reads.

OPTIONS

   Preselection options
       -R        Pre-filter clearly contained short reads. In this mode, mapping.paf is read
                 twice. The first pass identifies contained reads without loading hits into RAM;
                 the second pass skips contained reads and loads the rest into RAM. Due to the
                 two-pass behavior, peak RAM is greatly reduced, but mapping.paf has to be a
                 regular file, not a stream. When this option is in use, it is recommended to
                 reduce -c to 2, as there are fewer reads after pre-filtering. Applying -Rc2
                 sometimes improves assembly.

       -m INT    Drop mappings having fewer than INT matching bases (col10 in PAF) [100]. This
                 option has the same role as -L of minimap.

       -s INT    Drop mappings shorter than INT bp [1000]. This option also affects the second
                 round of read filtering and the default minimal overlap length (-o).

       -i FLOAT  During read filtering, ignore mappings whose col10/col11 ratio in PAF is below
                 FLOAT [0.05]. Ignored mappings are still used for read overlaps.

       -c INT    Minimal coverage by other reads [3]. In the first round of filtering, miniasm
                 finds the longest region of each read that is covered by INT or more other
                 reads. In the second round, it additionally requires each remaining base to be
                 covered by at least INT other reads, counting only bases at least minSpan/2
                 from the ends of those reads.
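       The preselection filters above can be sketched in Python as follows. This is a minimal
       illustration, not miniasm's code: the function names are hypothetical, -s is assumed to
       apply to both the query and the target span of a mapping, and coverage is counted per
       mapping (assuming one mapping per read pair). Only the first round of -c filtering, the
       longest region covered by at least INT mappings, is shown.

```python
from typing import List, Tuple

# Defaults copied from the option descriptions above.
MIN_MATCH = 100   # -m: minimum matching bases (PAF col10)
MIN_SPAN = 1000   # -s: minimum mapping span
MIN_IDEN = 0.05   # -i: minimum col10/col11 during read filtering
MIN_COV = 3       # -c: minimum coverage


def preselect(n_match: int, block_len: int, q_span: int, t_span: int,
              min_match: int = MIN_MATCH, min_span: int = MIN_SPAN,
              min_iden: float = MIN_IDEN) -> Tuple[bool, bool]:
    """Return (keep, use_for_filtering) for one mapping.

    Assumption: -s is applied to both the query span (col4 - col3) and the
    target span (col9 - col8).
    """
    if n_match < min_match or q_span < min_span or t_span < min_span:
        return False, False
    # Low-identity mappings are kept for overlaps but ignored during filtering.
    return True, block_len > 0 and n_match / block_len >= min_iden


def longest_covered_region(read_len: int, intervals: List[Tuple[int, int]],
                           min_cov: int = MIN_COV) -> Tuple[int, int]:
    """Longest half-open region [start, end) covered by >= min_cov intervals.

    Returns (0, 0) if no base reaches min_cov.
    """
    # Difference array: +1 where an interval starts, -1 where it ends.
    diff = [0] * (read_len + 1)
    for start, end in intervals:
        diff[start] += 1
        diff[end] -= 1
    best = (0, 0)
    cov = 0
    run_start = -1  # start of the current run with enough coverage, -1 if none
    for i in range(read_len):
        cov += diff[i]
        if cov >= min_cov:
            if run_start < 0:
                run_start = i
            if i + 1 - run_start > best[1] - best[0]:
                best = (run_start, i + 1)
        else:
            run_start = -1
    return best
```

       With the default -c 3, a read covered by mappings over [0,5000), [1000,6000),
       [2000,8000) and [2500,9000) would keep only [2000,6000).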

   Overlapping options
       -o INT    Minimal overlap length [same as minSpan]

       -h INT    Maximum overhang length [1000]. An overhang is an unmapped region that should be
                 mapped given a true overlap or true containment. If the overhang  is  too  long,
                 the mapping is considered an internal match and will be ignored.

       -I FLOAT  Minimal ratio of mapping length to mapping+overhang length for a mapping to be
                 considered a containment or an overlap [0.8]. This option has a similar role to
                 -h, except that it controls the ratio, not the length.
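       The -h and -I tests can be illustrated with the following simplified Python sketch. It
       assumes both reads lie on the same strand (`+' in PAF column 5) and measures the overhang
       on each side as the shorter of the two unmapped read ends. These choices, the HitType
       names, and the use of -o as min_ovlp are assumptions for illustration, not miniasm's
       exact rules.

```python
from enum import Enum


class HitType(Enum):
    INTERNAL = "internal match"
    QUERY_CONTAINED = "query contained"
    TARGET_CONTAINED = "target contained"
    SHORT_OVERLAP = "short overlap"
    OVERLAP = "overlap"


def classify(ql: int, qs: int, qe: int, tl: int, ts: int, te: int,
             max_hang: int = 1000, int_frac: float = 0.8,
             min_ovlp: int = 1000) -> HitType:
    """Classify a same-strand mapping of query [qs,qe) onto target [ts,te)."""
    # Overhangs: unmapped sequence present on both reads at each side, which
    # a true overlap or containment would have aligned.
    hang5 = min(qs, ts)
    hang3 = min(ql - qe, tl - te)
    span = qe - qs
    # -h limits each overhang; -I limits mapping / (mapping + overhangs).
    if hang5 > max_hang or hang3 > max_hang or span < (span + hang5 + hang3) * int_frac:
        return HitType.INTERNAL
    if qs <= ts and ql - qe <= tl - te:
        return HitType.QUERY_CONTAINED
    if qs >= ts and ql - qe >= tl - te:
        return HitType.TARGET_CONTAINED
    # Remaining case: a dovetail overlap; drop it if shorter than -o.
    if span + hang5 + hang3 < min_ovlp or (te - ts) + hang5 + hang3 < min_ovlp:
        return HitType.SHORT_OVERLAP
    return HitType.OVERLAP
```

       For example, a query whose last 5000 bp map to the first 5000 bp of a target is an
       overlap, while a mapping leaving 2000 bp unmapped on both reads at one side exceeds -h
       and is treated as an internal match.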

   Graph layout options
       -g INT    Maximal gap difference between two reads in a mapping [1000]. This parameter is
                 only used for transitive reduction.

       -d INT    Maximal probing distance for bubble popping [50000].  Bubbles  longer  than  INT
                 will not be popped.

       -e INT    A unitig is considered small if it is composed of fewer than INT reads [4].
                 Miniasm may try to remove small unitigs at various steps.

       -f FILE   Read sequence file in FASTA or FASTQ  format  for  generating  unitig  sequences
                 [null].  If  this  option  is  absent,  miniasm  produces  a  GFA output without
                 sequences.

       -r FLOAT1,[FLOAT2]
                 Maximum and minimum overlap drop ratios [0.7,0.5]. Let overlap(v->w) be the
                 overlap length of edge v->w and maxovlp(v)=max_w{overlap(v->w)} be the length of
                 the longest overlap from v. Miniasm drops overlap v->w if overlap(v->w)/maxovlp(v)
                 is below a threshold controlled by this option. Miniasm applies nRounds rounds
                 of short overlap removal, with the threshold increasing from FLOAT2 to FLOAT1.

       -n INT    Rounds of short overlap removal [3].

       -F FLOAT  Overlap drop ratio threshold after short unitig removal [0.8]
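       The short-overlap removal controlled by -r and -n can be sketched in Python as below.
       Assumptions: arcs are a plain dictionary from a read to its outgoing overlaps (the
       bidirected graph and reverse-complement arcs that miniasm maintains are ignored), the
       nRounds thresholds are evenly spaced from FLOAT2 to FLOAT1, and the function names are
       hypothetical.

```python
from typing import Dict, List, Tuple

# v -> [(w, overlap(v->w)), ...]
Arcs = Dict[str, List[Tuple[str, int]]]


def drop_thresholds(max_ratio: float = 0.7, min_ratio: float = 0.5,
                    n_rounds: int = 3) -> List[float]:
    """Evenly spaced thresholds rising from min_ratio to max_ratio (assumed)."""
    if n_rounds <= 1:
        return [max_ratio]
    step = (max_ratio - min_ratio) / (n_rounds - 1)
    return [min_ratio + step * i for i in range(n_rounds)]


def drop_short_overlaps(arcs: Arcs, ratio: float) -> Arcs:
    """Keep v->w only if overlap(v->w) / maxovlp(v) is not below ratio."""
    kept: Arcs = {}
    for v, out in arcs.items():
        if not out:
            kept[v] = []
            continue
        max_ovlp = max(ovlp for _, ovlp in out)  # maxovlp(v)
        kept[v] = [(w, ovlp) for w, ovlp in out if ovlp / max_ovlp >= ratio]
    return kept


def remove_short_overlaps(arcs: Arcs, max_ratio: float = 0.7,
                          min_ratio: float = 0.5, n_rounds: int = 3) -> Arcs:
    """Apply one round of removal per threshold, in increasing order."""
    for r in drop_thresholds(max_ratio, min_ratio, n_rounds):
        arcs = drop_short_overlaps(arcs, r)
    return arcs
```

       An arc whose overlap is 60% of the longest overlap from the same read survives a
       threshold of 0.5 but is removed at 0.7.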

   Miscellaneous options
       -b        Indicate that the same mapping is likely to be given twice in the input

       -1        Skip the first round of pre-assembly read selection

       -2        Skip the second round of pre-assembly read selection

       -p STR    Output information and format [ug]. Possible STR values:
                   bed   post-filtered read regions in the BED format
                   paf   mappings between post-filtered reads
                   sg    read overlap graph in the GFA format
                   ug    unitig graph in the GFA format

       -V        Print version number to stdout
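       When -f is given, the unitig sequences can be pulled out of the default ug output with a
       few lines of Python. This is a sketch that assumes segments without a sequence are
       written as `*', the GFA placeholder for an absent sequence; the function name and the
       60-column line width are arbitrary choices.

```python
from typing import Iterable, Iterator


def segments_to_fasta(gfa_lines: Iterable[str], width: int = 60) -> Iterator[str]:
    """Yield FASTA lines for every `S' line that carries a sequence."""
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue  # skip H, L, a and any other line types
        fields = line.rstrip("\n").split("\t")
        name, seq = fields[1], fields[2]
        if seq == "*":  # GFA placeholder for an absent sequence
            continue
        yield ">" + name
        for i in range(0, len(seq), width):
            yield seq[i:i + width]
```

       Usage: with open("output.gfa") as f: print("\n".join(segments_to_fasta(f)))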

INPUT FORMAT

       Miniasm reads mapping positions in the Pairwise mApping Format (PAF), a TAB-delimited
       text format in which each line consists of at least 12 fields, described in the
       following table:

              ┌────┬────────┬─────────────────────────────────────────────────────────────┐
              │Col │  Type  │ Description                                                 │
              ├────┼────────┼─────────────────────────────────────────────────────────────┤
              │  1 │ string │ Query sequence name                                         │
              │  2 │  int   │ Query sequence length                                       │
              │  3 │  int   │ Query start coordinate (0-based)                            │
              │  4 │  int   │ Query end coordinate (0-based)                              │
              │  5 │  char  │ `+' if query and target on the same strand; `-' if opposite │
              │  6 │ string │ Target sequence name                                        │
              │  7 │  int   │ Target sequence length                                      │
              │  8 │  int   │ Target start coordinate on the original strand              │
              │  9 │  int   │ Target end coordinate on the original strand                │
              │ 10 │  int   │ Number of matching bases in the mapping                     │
              │ 11 │  int   │ Number of bases, including gaps, in the mapping             │
              │ 12 │  int   │ Mapping quality (0-255 with 255 for missing)                │
              └────┴────────┴─────────────────────────────────────────────────────────────┘

       Please see minimap(1) for the detailed description of each field.
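       A PAF line can be read into the twelve fields of the table with a short Python sketch.
       The class and function names are hypothetical; optional columns after the twelfth are
       ignored, and the identity property is the col10/col11 ratio that -i is compared against.

```python
from dataclasses import dataclass


@dataclass
class PafRecord:
    qname: str      # col1
    qlen: int       # col2
    qstart: int     # col3
    qend: int       # col4
    strand: str     # col5: '+' or '-'
    tname: str      # col6
    tlen: int       # col7
    tstart: int     # col8
    tend: int       # col9
    n_match: int    # col10
    block_len: int  # col11
    mapq: int       # col12

    @property
    def identity(self) -> float:
        # col10 / col11, the ratio used by -i
        return self.n_match / self.block_len if self.block_len else 0.0


def parse_paf_line(line: str) -> PafRecord:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 fields, got {len(fields)}")
    f = fields[:12]  # optional tag columns beyond col12 are ignored
    return PafRecord(f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
                     f[5], int(f[6]), int(f[7]), int(f[8]),
                     int(f[9]), int(f[10]), int(f[11]))
```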

OUTPUT FORMAT

       Miniasm outputs the assembly in the Graphical Fragment Assembly (GFA) format. It is a
       line-based TAB-delimited format in which the leading letter indicates the type of the
       line. The following table gives the line types used by miniasm:

              ┌─────┬─────────────┬──────────────────────────────────────────────────────┐
              │Line │ Comment     │ Fixed fields                                         │
              ├─────┼─────────────┼──────────────────────────────────────────────────────┤
              │ H   │ Header      │ N/A                                                  │
              │ S   │ Segment     │ segName segSeq                                       │
              │ L   │ Overlap     │ segName1 segOri1 segName2 segOri2 ovlpCIGAR          │
              │ a   │ Golden path │ utgName utgStart readName:rStart-rEnd readOri incLen │
              └─────┴─────────────┴──────────────────────────────────────────────────────┘

       An `a' line indicates that the unitig subsequence in [utgStart,utgStart+incLen) is taken
       from read readName in region [rStart-1,rStart-1+incLen). It is not a standard GFA line.
       An `x' line, also non-standard and not listed in the table above, gives a brief summary
       of each unitig; its content can be inferred from the `S' and `a' lines.
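       The coordinate arithmetic of an `a' line can be made concrete with the Python sketch
       below. It parses one line into the fields named in the table and converts them into
       half-open intervals exactly as stated above. The names are hypothetical, and the sketch
       does not attempt to describe how a `-' orientation maps onto the read's reverse
       complement.

```python
from typing import NamedTuple, Tuple


class GoldenPathStep(NamedTuple):
    utg_name: str
    utg_start: int
    read_name: str
    r_start: int    # rStart as printed
    r_end: int      # rEnd as printed
    read_ori: str   # '+' or '-'
    inc_len: int

    def unitig_interval(self) -> Tuple[int, int]:
        # [utgStart, utgStart+incLen) on the unitig
        return self.utg_start, self.utg_start + self.inc_len

    def read_interval(self) -> Tuple[int, int]:
        # [rStart-1, rStart-1+incLen) on the read, as documented
        return self.r_start - 1, self.r_start - 1 + self.inc_len


def parse_a_line(line: str) -> GoldenPathStep:
    tag, utg, ustart, region, ori, inc = line.rstrip("\n").split("\t")[:6]
    if tag != "a":
        raise ValueError("not an `a' line")
    # Split on the last ':' in case the read name itself contains ':'.
    read_name, _, span = region.rpartition(":")
    r_start, r_end = span.split("-")
    return GoldenPathStep(utg, int(ustart), read_name, int(r_start),
                          int(r_end), ori, int(inc))
```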

SEE ALSO

       minimap(1)