Provided by: miniasm_0.2+dfsg-1_amd64

NAME

       miniasm - de novo assembler for long read sequences

SYNOPSIS

       miniasm  [-b12V]  [-m  minMatch]  [-i  minIden]  [-s  minSpan]  [-c minCov] [-o minOvlp] [-h maxHang] [-I
       intThres] [-g maxGapDiff] [-d maxBubDist] [-e minUtgSize] [-f readFile] [-n nRounds] [-r  dropRatio]  [-F
       finalDropRatio] [-p outputInfo] mapping.paf > output.gfa

DESCRIPTION

       Miniasm is a very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read
       self-mappings in the PAF format as input and outputs an assembly graph in the GFA format. Unlike
       mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read
       sequences to generate the final unitig sequences. Thus the per-base error rate is similar to that of
       the raw input reads.

OPTIONS

   Preselection options
       -m INT    Drop mappings having fewer than INT matching bases (col10 in PAF) [100]. This option has the
                 same role as -L of minimap.

       -s INT    Drop mappings shorter than INT-bp [1000]. This option also affects the  second  round  of  read
                 filtering and minimal overlap length.

       -i FLOAT  During  read  filtering,  ignore mappings with col10/col11 below FLOAT [0.05]. Ignored mappings
                 are still used for read overlaps.

       -c INT    Minimal coverage by other reads [3]. In the first round of filtering, miniasm finds the longest
                 region  covered  by  INT  or  more  reads.  In  the  second round, it in addition requires each
                 remaining base to be covered by INT bases at least minSpan/2 from the ends of other reads.
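
       The -m, -s and -i thresholds above can be pictured with the following minimal Python sketch. It is
       not miniasm's code: the Mapping type and both function names are hypothetical, and measuring the
       mapping length as the shorter of the query and target spans is an assumption.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    """Hypothetical container holding the PAF columns the filters need."""
    q_start: int    # col3
    q_end: int      # col4
    t_start: int    # col8
    t_end: int      # col9
    n_match: int    # col10
    block_len: int  # col11


def keep_mapping(m: Mapping, min_match: int = 100, min_span: int = 1000) -> bool:
    """Apply the -m and -s thresholds; True means the mapping is kept."""
    # Assumption: "shorter than INT-bp" is checked on the shorter of the two spans.
    span = min(m.q_end - m.q_start, m.t_end - m.t_start)
    return m.n_match >= min_match and span >= min_span


def used_for_read_filtering(m: Mapping, min_iden: float = 0.05) -> bool:
    """Apply the -i threshold on col10/col11 (read filtering only)."""
    return m.block_len > 0 and m.n_match / m.block_len >= min_iden
```

       A mapping rejected by used_for_read_filtering() would still be passed on to the overlap step, as
       described for -i.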

   Overlapping options
       -o INT    Minimal overlap length [same as minSpan]

       -h INT    Maximum overhang length [1000]. An overhang is an unmapped region that should be mapped given a
                 true  overlap  or  true  containment. If the overhang is too long, the mapping is considered an
                 internal match and will be ignored.

       -I FLOAT  Minimal ratio of  mapping  length  to  mapping+overhang  length  for  a  mapping  considered  a
                 containment  or an overlap [0.8]. This option has a similar role to -h, except that it controls
                 the ratio, not length.
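
       As an illustration of how -h and -I interact, the sketch below classifies a single same-strand
       mapping. It is a hedged reading of the option text, not miniasm's implementation: the function name
       is hypothetical, and computing the overhang as the shorter unaligned flank on each side is an
       assumption.

```python
def classify_mapping(q_len, q_start, q_end, t_len, t_start, t_end,
                     max_hang=1000, int_thres=0.8):
    """Return 'internal', 'containment' or 'overlap' for a '+' strand mapping.

    For a '-' strand mapping the caller would first flip the target
    coordinates: t_start, t_end = t_len - t_end, t_len - t_start.
    """
    # Assumption: on each side, the overhang is the shorter unaligned flank,
    # i.e. the part that a true overlap or containment would have aligned.
    left = min(q_start, t_start)
    right = min(q_len - q_end, t_len - t_end)
    overhang = left + right
    map_len = max(q_end - q_start, t_end - t_start)

    # -h limits the absolute overhang; -I limits map_len / (map_len + overhang).
    if overhang > max_hang or map_len < int_thres * (map_len + overhang):
        return "internal"
    # One read lies within the other on both sides.
    if q_start <= t_start and q_len - q_end <= t_len - t_end:
        return "containment"  # query contained in target
    if q_start >= t_start and q_len - q_end >= t_len - t_end:
        return "containment"  # target contained in query
    return "overlap"
```

       For example, a mapping from the last 3000 bases of one 10000-bp read to the first 3000 bases of
       another has no overhang and is classified as an overlap.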

   Graph layout options
       -g INT    Maximal gap differences between two reads in a mapping [1000]. This parameter is only used  for
                 transitive reduction.

       -d INT    Maximal  probing  distance  for  bubble  popping  [50000].  Bubbles longer than INT will not be
                 popped.

       -e INT    A unitig is considered small if it is composed of fewer than INT reads [4]. Miniasm may try to
                 remove small unitigs at various steps.

       -f FILE   Read  sequence  file  in  FASTA or FASTQ format for generating unitig sequences [null]. If this
                 option is absent, miniasm produces a GFA output without sequences.

       -r FLOAT1,[FLOAT2]
                 Max and min overlap drop ratio [0.7,0.5]. Let overlap(v->w) be the overlap length of edge v->w
                 and maxovlp(v)=max_w{overlap(v->w)} be the length of the largest overlap. Miniasm drops overlap
                 v->w if overlap(v->w)/maxovlp(v) is below a threshold controlled by this option. Miniasm applies
                 nRounds rounds of short overlap removal with an increasing threshold between FLOAT1 and FLOAT2.

       -n INT    Rounds of short overlap removal [3].

       -F FLOAT  Overlap drop ratio threshold after short unitig removal [0.8]
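
       The drop-ratio rule of -r and -n can be sketched in Python as follows. This only illustrates the
       formula overlap(v->w)/maxovlp(v); the function names and the edge dictionary are hypothetical, and
       spacing the per-round thresholds linearly from the min to the max ratio is an assumption, since the
       text above only says that the threshold increases.

```python
def drop_thresholds(n_rounds=3, max_ratio=0.7, min_ratio=0.5):
    """Increasing thresholds, one per round (linear spacing is an assumption)."""
    if n_rounds == 1:
        return [max_ratio]
    step = (max_ratio - min_ratio) / (n_rounds - 1)
    return [min_ratio + i * step for i in range(n_rounds)]


def drop_short_overlaps(edges, ratio):
    """Keep edge v->w only if overlap(v->w)/maxovlp(v) is not below ratio.

    edges maps a (v, w) pair to the overlap length of edge v->w.
    """
    max_ovlp = {}
    for (v, _w), length in edges.items():
        max_ovlp[v] = max(max_ovlp.get(v, 0), length)
    return {(v, w): length for (v, w), length in edges.items()
            if length >= ratio * max_ovlp[v]}


# Hypothetical overlap graph: vertex "a" has three outgoing overlaps.
edges = {("a", "b"): 5000, ("a", "c"): 3000, ("a", "d"): 2000, ("b", "c"): 1000}
for ratio in drop_thresholds():
    edges = drop_short_overlaps(edges, ratio)
```

       The sketch applies the rounds back to back and does not model any other graph cleaning that miniasm
       may perform between rounds.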

   Miscellaneous options
       -b        Indicate that in the input, the same mapping is likely to be given twice

       -1        Skip the first round of pre-assembly read selection

       -2        Skip the second round of pre-assembly read selection

       -p STR    Output information and format [ug]. Possible STR values include:
                 bed: post-filtered read regions in the BED format;
                 paf: mappings between post-filtered reads;
                 sg: read overlap graph in the GFA format;
                 ug: unitig graph in the GFA format.

       -V        Print version number to stdout

INPUT FORMAT

       Miniasm reads mapping positions in the Pairwise mApping Format (PAF), which is a TAB-delimited text
       format with each line consisting of at least 12 fields, as described in the following table:

                     ┌────┬────────┬─────────────────────────────────────────────────────────────┐
                     │Col │  Type  │ Description                                                 │
                     ├────┼────────┼─────────────────────────────────────────────────────────────┤
                     │  1 │ string │ Query sequence name                                         │
                     │  2 │  int   │ Query sequence length                                       │
                     │  3 │  int   │ Query start coordinate (0-based)                            │
                     │  4 │  int   │ Query end coordinate (0-based)                              │
                     │  5 │  char  │ `+' if query and target on the same strand; `-' if opposite │
                     │  6 │ string │ Target sequence name                                        │
                     │  7 │  int   │ Target sequence length                                      │
                     │  8 │  int   │ Target start coordinate on the original strand              │
                     │  9 │  int   │ Target end coordinate on the original strand                │
                     │ 10 │  int   │ Number of matching bases in the mapping                     │
                     │ 11 │  int   │ Number of bases, including gaps, in the mapping             │
                     │ 12 │  int   │ Mapping quality (0-255 with 255 for missing)                │
                     └────┴────────┴─────────────────────────────────────────────────────────────┘

       Please see minimap(1) for the detailed description of each field.
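
       The twelve mandatory fields in the table can be read with a few lines of Python. This is a minimal
       sketch: the PafRecord type, its field names and parse_paf_line() are hypothetical names chosen for
       illustration, and any columns after the twelfth are simply ignored.

```python
from typing import NamedTuple


class PafRecord(NamedTuple):
    """The 12 mandatory PAF columns, in table order (names are illustrative)."""
    q_name: str
    q_len: int
    q_start: int
    q_end: int
    strand: str
    t_name: str
    t_len: int
    t_start: int
    t_end: int
    n_match: int
    block_len: int
    mapq: int


def parse_paf_line(line: str) -> PafRecord:
    """Split one TAB-delimited PAF line and convert the integer columns."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 12:
        raise ValueError("a PAF line needs at least 12 fields, got %d" % len(f))
    if f[4] not in ("+", "-"):
        raise ValueError("column 5 must be '+' or '-'")
    return PafRecord(f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
                     f[5], int(f[6]), int(f[7]), int(f[8]),
                     int(f[9]), int(f[10]), int(f[11]))
```

       Given such a record, the ratio used by -i is n_match / block_len (col10/col11).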

OUTPUT FORMAT

       Miniasm outputs the assembly in the Graphical Fragment Assembly format (GFA). It is a line-based,
       TAB-delimited format in which the leading letter indicates the type of the line. The following table
       gives the line types used by miniasm:

                       ┌─────┬─────────────┬────────────────────────────────────────────────────┐
                       │Line │ Comment     │ Fixed fields                                       │
                       ├─────┼─────────────┼────────────────────────────────────────────────────┤
                       │ H   │ Header      │ N/A                                                │
                       │ S   │ Segment     │ segName segSeq                                     │
                       │ L   │ Overlap     │ segName1 segOri1 segName2 segOri2 ovlpCIGAR        │
                       │ a   │ Golden path │ utgName utgStart readName:start-end readOri length │
                       └─────┴─────────────┴────────────────────────────────────────────────────┘
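
       The line types in the table can be separated with a short Python sketch. The FIXED_FIELDS mapping
       copies the field names from the table; parse_gfa_line() is a hypothetical helper, and returning
       anything after the fixed fields as an untouched list is an assumption about how a caller might want
       to handle extra columns.

```python
# Fixed fields per line type, as listed in the table above.
FIXED_FIELDS = {
    "H": (),
    "S": ("segName", "segSeq"),
    "L": ("segName1", "segOri1", "segName2", "segOri2", "ovlpCIGAR"),
    "a": ("utgName", "utgStart", "readName:start-end", "readOri", "length"),
}


def parse_gfa_line(line):
    """Return (line_type, fixed_fields_dict, extra_columns) for one GFA line."""
    parts = line.rstrip("\n").split("\t")
    line_type, values = parts[0], parts[1:]
    names = FIXED_FIELDS.get(line_type)
    if names is None:
        raise ValueError("unexpected line type: %r" % line_type)
    if len(values) < len(names):
        raise ValueError("line type %s needs %d fixed fields"
                         % (line_type, len(names)))
    fixed = dict(zip(names, values))
    extra = values[len(names):]  # columns beyond the fixed ones, kept as-is
    return line_type, fixed, extra
```

       Collecting the S lines of an output file this way gives the unitig names and, when -f was supplied,
       their sequences.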

SEE ALSO

       minimap(1)