Provided by: miniasm_0.3+dfsg-1_amd64

NAME

       miniasm - de novo assembler for long read sequences

SYNOPSIS

       miniasm  [-b12VR]  [-m  minMatch]  [-i  minIden]  [-s  minSpan] [-c minCov] [-o minOvlp] [-h maxHang] [-I
       intThres] [-g maxGapDiff] [-d maxBubDist] [-e minUtgSize] [-f readFile] [-n nRounds] [-r  dropRatio]  [-F
       finalDropRatio] [-p outputInfo] mapping.paf > output.gfa

DESCRIPTION

       Miniasm is a very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read self-
       mappings in the PAF format as input and outputs an assembly graph in the GFA format. Unlike mainstream
       assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to
       generate the final unitig sequences. Thus the per-base error rate is similar to that of the raw input
       reads.

OPTIONS

   Preselection options
       -R        Pre-filter  clearly  contained  short reads. In this mode, mapping.paf is read twice. The first
                 pass identifies contained reads without loading hits to RAM; the second  pass  skips  contained
                 reads  and loads the rest into RAM. Due to the 2-pass behavior, the peak RAM is greatly reduced,
                 but mapping.paf has to be a normal file, not a stream. When  this  option  is  in  use,  it  is
                 recommended  to  reduce  -c  to  2, as there are fewer reads after pre-filtering. Applying -Rc2
                 sometimes improves assembly.

       -m INT    Drop mappings having fewer than INT matching bases (col10 in PAF) [100]. This option has the
                 same role as -L of minimap.

       -s INT    Drop  mappings  shorter  than  INT-bp [1000]. This option also affects the second round of read
                 filtering and minimal overlap length.

       -i FLOAT  During read filtering, ignore mappings with col10/col11 below FLOAT  [0.05].  Ignored  mappings
                 are still used for read overlaps.

       -c INT    Minimal coverage by other reads [3]. In the first round of filtering, miniasm finds the longest
                 region covered by INT or more reads. In the second round, it additionally requires each
                 remaining base to be covered by INT bases that lie at least minSpan/2 from the ends of other
                 reads.
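
       The -m and -s thresholds can be pictured as a per-line filter over PAF records. The Python sketch below
       is illustrative only and is not miniasm's implementation. The names keep_mapping and identity_ratio are
       hypothetical, and measuring the span on both the query and the target is an assumption.

```python
def keep_mapping(paf_line, min_match=100, min_span=1000):
    """Return True if a PAF line passes -m and -s style thresholds.

    Assumption: the mapping span is checked on both query and target.
    """
    f = paf_line.rstrip("\n").split("\t")
    q_start, q_end = int(f[2]), int(f[3])
    t_start, t_end = int(f[7]), int(f[8])
    n_match = int(f[9])  # col10: number of matching bases
    if n_match < min_match:
        return False
    if q_end - q_start < min_span or t_end - t_start < min_span:
        return False
    return True


def identity_ratio(paf_line):
    """Return col10/col11, the quantity that -i is compared against."""
    f = paf_line.rstrip("\n").split("\t")
    return int(f[9]) / int(f[10])
```

       A mapping spanning 2000 bp with 400 matching bases passes the default thresholds and has a col10/col11
       ratio of 0.2, so it would not be ignored under the default -i of 0.05.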

   Overlapping options
       -o INT    Minimal overlap length [same as minSpan]

       -h INT    Maximum overhang length [1000]. An overhang is an unmapped region that should be mapped given a
                 true overlap or true containment. If the overhang is too long, the  mapping  is  considered  an
                 internal match and will be ignored.

       -I FLOAT  Minimal ratio of mapping length to mapping+overhang length for a mapping to be considered a
                 containment or an overlap [0.8]. This option has a similar role to -h, except that it controls
                 the ratio, not the length.
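
       The interplay of -h and -I can be sketched as a small classifier. This is a simplified illustration
       under stated assumptions, not miniasm's code: the function name classify_mapping and its return labels
       are hypothetical, and the overhang is assumed to be the sum of the unmapped flanks on both sides.

```python
def classify_mapping(q_len, q_start, q_end, strand,
                     t_len, t_start, t_end,
                     max_hang=1000, min_ratio=0.8):
    """Classify one mapping as internal match, containment or overlap."""
    # Put the target interval on the same strand as the query.
    if strand == "-":
        t_start, t_end = t_len - t_end, t_len - t_start
    # Unmapped flanks that a true overlap or containment would cover.
    left_hang = min(q_start, t_start)
    right_hang = min(q_len - q_end, t_len - t_end)
    overhang = left_hang + right_hang  # assumption: both sides summed
    map_len = max(q_end - q_start, t_end - t_start)
    if overhang > max_hang or map_len < (map_len + overhang) * min_ratio:
        return "internal"
    # One read extends no further than the other on both sides.
    if q_start <= t_start and q_len - q_end <= t_len - t_end:
        return "query_contained"
    if q_start >= t_start and q_len - q_end >= t_len - t_end:
        return "target_contained"
    return "overlap"
```

       For example, a mapping from the last 4000 bp of a 10000-bp query to the first 4000 bp of an 8000-bp
       target has no overhang and is classified as an overlap, whereas a 2000-bp match in the middle of two
       10000-bp reads leaves long overhangs on both sides and is treated as an internal match.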

   Graph layout options
       -g INT    Maximal  gap differences between two reads in a mapping [1000]. This parameter is only used for
                 transitive reduction.

       -d INT    Maximal probing distance for bubble popping [50000].  Bubbles  longer  than  INT  will  not  be
                 popped.

       -e INT    A unitig is considered small if it is composed of fewer than INT reads [4]. Miniasm may try to
                 remove small unitigs at various steps.

       -f FILE   Read sequence file in FASTA or FASTQ format for generating unitig  sequences  [null].  If  this
                 option is absent, miniasm produces a GFA output without sequences.

       -r FLOAT1,[FLOAT2]
                 Max  and min overlap drop ratio [0.7,0.5]. Let overlap(v->w) be the overlap length of edge v->w
                 and maxovlp(v)=max_w{overlap(v->w)} be the length of largest  overlap.  Miniasm  drops  overlap
                 v->w  if  overlap(v->w)/maxovlp(v)  is  below  a  threshold  controlled by this option. Miniasm
                 applies nRounds rounds of short overlap removal with an increasing threshold between FLOAT1 and
                 FLOAT2.

       -n INT    Rounds of short overlap removal [3].

       -F FLOAT  Overlap drop ratio threshold after short unitig removal [0.8]
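
       The drop rule described under -r can be sketched as follows. The code is a minimal illustration, not
       miniasm's implementation: the function names are hypothetical, edges are assumed to have positive
       overlap lengths, and the evenly spaced schedule of thresholds is an assumption about how the rounds
       set by -n step between the two ratios.

```python
def drop_short_overlaps(edges, threshold):
    """Keep edge v->w only if overlap(v->w)/maxovlp(v) >= threshold.

    edges maps (v, w) to the overlap length of edge v->w.
    """
    max_ovlp = {}
    for (v, _w), length in edges.items():
        max_ovlp[v] = max(max_ovlp.get(v, 0), length)
    return {(v, w): length for (v, w), length in edges.items()
            if length / max_ovlp[v] >= threshold}


def ratio_schedule(lo=0.5, hi=0.7, n_rounds=3):
    """Assumed schedule: evenly spaced thresholds rising from lo to hi."""
    if n_rounds < 2:
        return [hi]
    step = (hi - lo) / (n_rounds - 1)
    return [lo + step * i for i in range(n_rounds)]


def remove_short_overlaps(edges, lo=0.5, hi=0.7, n_rounds=3):
    """Apply the drop rule once per round with a rising threshold."""
    for threshold in ratio_schedule(lo, hi, n_rounds):
        edges = drop_short_overlaps(edges, threshold)
    return edges
```

       With edges a->b of length 1000 and a->c of length 400, the ratio for a->c is 0.4, so that edge is
       dropped already at a threshold of 0.5.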

   Miscellaneous options
       -b        Indicate that in the input, the same mapping is likely to be given twice

       -1        Skip the first round of pre-assembly read selection

       -2        Skip the second round of pre-assembly read selection

       -p STR    Output information and format [ug]. Possible STR values are bed (post-filtered read regions
                 in the BED format), paf (mappings between post-filtered reads), sg (read overlap graph in the
                 GFA format) and ug (unitig graph in the GFA format).

       -V        Print version number to stdout

INPUT FORMAT

       Miniasm reads mapping positions in the Pairwise mApping Format (PAF), which is a TAB-delimited text
       format with each line consisting of at least 12 fields, as described in the following table:

                     ┌────┬────────┬─────────────────────────────────────────────────────────────┐
                     │Col │  Type  │ Description                                                 │
                     ├────┼────────┼─────────────────────────────────────────────────────────────┤
                     │  1 │ string │ Query sequence name                                         │
                     │  2 │  int   │ Query sequence length                                       │
                     │  3 │  int   │ Query start coordinate (0-based)                            │
                     │  4 │  int   │ Query end coordinate (0-based)                              │
                     │  5 │  char  │ `+' if query and target on the same strand; `-' if opposite │
                     │  6 │ string │ Target sequence name                                        │
                     │  7 │  int   │ Target sequence length                                      │
                     │  8 │  int   │ Target start coordinate on the original strand              │
                     │  9 │  int   │ Target end coordinate on the original strand                │
                     │ 10 │  int   │ Number of matching bases in the mapping                     │
                     │ 11 │  int   │ Number of bases, including gaps, in the mapping             │
                     │ 12 │  int   │ Mapping quality (0-255 with 255 for missing)                │
                     └────┴────────┴─────────────────────────────────────────────────────────────┘

       Please see minimap(1) for the detailed description of each field.
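
       The 12 mandatory columns in the table above map naturally onto a small record type. The following
       Python sketch shows one way to parse a PAF line. The names PafRecord and parse_paf_line are
       hypothetical and are not part of miniasm or minimap. Any fields after column 12 are ignored.

```python
from collections import namedtuple

# Field order follows the PAF column table (columns 1 to 12).
PafRecord = namedtuple("PafRecord", [
    "q_name", "q_len", "q_start", "q_end", "strand",
    "t_name", "t_len", "t_start", "t_end",
    "n_match", "aln_len", "mapq",
])

# 0-based indices of the integer-typed columns.
_INT_COLUMNS = (1, 2, 3, 6, 7, 8, 9, 10, 11)


def parse_paf_line(line):
    """Parse the 12 mandatory fields of one TAB-delimited PAF line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("PAF line has fewer than 12 fields")
    core = fields[:12]
    for i in _INT_COLUMNS:
        core[i] = int(core[i])
    return PafRecord(*core)
```

       A caller could then compute, for instance, rec.n_match / rec.aln_len to obtain the col10/col11 ratio
       used by -i.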

OUTPUT FORMAT

       Miniasm outputs the assembly in the Graphical Fragment Assembly format (GFA). It is a line-based,
       TAB-delimited format in which the leading letter indicates the type of the line. The following table
       gives the line types used by miniasm:

                      ┌─────┬─────────────┬──────────────────────────────────────────────────────┐
                      │Line │ Comment     │ Fixed fields                                         │
                      ├─────┼─────────────┼──────────────────────────────────────────────────────┤
                      │ H   │ Header      │ N/A                                                  │
                      │ S   │ Segment     │ segName segSeq                                       │
                      │ L   │ Overlap     │ segName1 segOri1 segName2 segOri2 ovlpCIGAR          │
                      │ a   │ Golden path │ utgName utgStart readName:rStart-rEnd readOri incLen │
                      └─────┴─────────────┴──────────────────────────────────────────────────────┘

       An `a' line indicates that the unitig  subsequence  in  [utgStart,utgStart+incLen)  is  taken  from  read
       readName  in region [rStart-1,rStart-1+incLen).  It is not a standard GFA line. An `x' line gives a brief
       summary of each unitig, which can be inferred from `S' and `a' lines.
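
       The coordinate convention of an `a' line can be made concrete with a short parser. This Python sketch
       follows the intervals as stated above and nothing more. The function name parse_a_line and the keys of
       the returned dictionary are hypothetical. Splitting the read field at its last colon is an assumption
       made so that read names containing colons are still handled.

```python
def parse_a_line(line):
    """Parse one `a' line into unitig and read intervals (half-open)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6 or fields[0] != "a":
        raise ValueError("not an 'a' line")
    _tag, utg_name, utg_start, read_region, read_ori, inc_len = fields[:6]
    # readName:rStart-rEnd, split at the last colon (assumption).
    read_name, coords = read_region.rsplit(":", 1)
    r_start, r_end = (int(x) for x in coords.split("-"))
    utg_start, inc_len = int(utg_start), int(inc_len)
    return {
        "unitig": utg_name,
        # [utgStart, utgStart+incLen) on the unitig
        "unitig_interval": (utg_start, utg_start + inc_len),
        "read": read_name,
        # [rStart-1, rStart-1+incLen) on the read
        "read_interval": (r_start - 1, r_start - 1 + inc_len),
        "read_end": r_end,
        "orientation": read_ori,
    }
```

       For a line with utgStart 0, region read7:101-5000 and incLen 1200, the unitig interval is [0,1200) and
       the read interval is [100,1300).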

SEE ALSO

       minimap(1)