Provided by: mmseqs2_15-6f452+ds-2_amd64

NAME

       MMseqs2  -  MMseqs2  (Many  against  Many  sequence searching): fast, parallelized protein
       sequence searches and clustering of huge protein sequence data sets.

SYNOPSIS

       mmseqs <module> args

DESCRIPTION

       MMseqs2 (Many-against-Many sequence searching) is a software suite to search and cluster
       huge protein/nucleotide sequence sets. MMseqs2 is open-source, GPL-licensed software
       implemented in C++ for Linux, macOS, and (as a beta version, via Cygwin) Windows. The
       software is designed to run on multiple cores and servers and exhibits very good
       scalability. MMseqs2 can run 10000 times faster than BLAST. At 100 times the speed of
       BLAST it achieves almost the same sensitivity. It can perform profile searches with the
       same sensitivity as PSI-BLAST at over 400 times its speed.

       The following sections list the <module> values that can be used.

       Easy workflows (for non-experts)

       An example of running a command using the easy-* modules would be mmseqs easy-search
       <queryFasta> <targetDB> <resultFile> <tmpDir>

       easy-search
              Search  with  a  query  fasta  against  target  fasta  (or  database)  and return a
              BLAST-compatible result in a single step

       easy-linsearch
              Linear time search with a query fasta against target fasta (or database) and return
              a BLAST-compatible result in a single step

       easy-linclust
              Compute  clustering  of a fasta/fastq database in linear time. The workflow outputs
              the representative sequences, a cluster tsv and a fasta-like format containing  all
              sequences.

       easy-cluster
              Compute  clustering  of  a  fasta database. The workflow outputs the representative
              sequences, a cluster tsv and a fasta-like format containing all sequences.

       easy-taxonomy
              Compute taxonomy and lowest common ancestor for each sequence. The workflow outputs
              a taxonomic classification for sequences and a hierarchical summary report.
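
       The BLAST-compatible result written by easy-search and easy-linsearch can be read with a
       few lines of Python. The sketch below assumes the conventional 12-column BLAST-tab layout
       (query, target, identity, alignment length, mismatches, gap openings, query start/end,
       target start/end, E-value, bit score). The Hit type and the parse_blast_tab function are
       hypothetical names chosen for illustration and are not part of MMseqs2.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One row of a 12-column BLAST-tab file (assumed column order)."""
    query: str
    target: str
    identity: float
    aln_len: int
    mismatches: int
    gap_opens: int
    q_start: int
    q_end: int
    t_start: int
    t_end: int
    evalue: float
    bit_score: float


def parse_blast_tab(text: str) -> List[Hit]:
    """Parse tab-separated hits, skipping blank lines and '#' comment lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        if len(f) < 12:
            raise ValueError(f"expected 12 columns, got {len(f)}: {line!r}")
        hits.append(Hit(
            f[0], f[1], float(f[2]),
            int(f[3]), int(f[4]), int(f[5]),
            int(f[6]), int(f[7]), int(f[8]), int(f[9]),
            float(f[10]), float(f[11]),
        ))
    return hits
```

       If a custom column set was requested when the result was written, the field list above
       would need to be adjusted to match.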

       Main tools (for non-experts)

       createdb
              Convert protein sequence set in a FASTA file to MMseqs sequence DB format

       search
              Search with query sequence or profile DB (iteratively) through target sequence DB

       linsearch
              Search with query sequence DB through target sequence DB

       map
              Fast ungapped mapping of query sequences to target sequences.

       cluster
              Compute clustering of a sequence DB (quadratic time)

       linclust
              Cluster sequences of >30% sequence identity in linear time

       createindex
              Precompute index table of sequence DB for faster searches

       createlinindex
              Precompute index for linsearch

       enrich
              Enrich a query set by searching iteratively through a profile sequence set.

       rbh
              Find reciprocal best hits between query and target

       clusterupdate
              Update clustering of old sequence DB to clustering of new sequence DB
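
       The idea behind reciprocal best hits (the rbh module) can be sketched in Python: a pair
       is kept only when each sequence is the other's top-scoring hit. This is a conceptual
       illustration under the assumption that hits are available as (query, target, score)
       tuples; the function names are hypothetical and the code does not reproduce how MMseqs2
       implements the module.

```python
from typing import Dict, Iterable, List, Tuple

HitTuple = Tuple[str, str, float]  # (query, target, score); assumed input shape


def best_hits(hits: Iterable[HitTuple]) -> Dict[str, str]:
    """Map each query to its highest-scoring target (first one wins on ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, target, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[HitTuple],
                         b_vs_a: Iterable[HitTuple]) -> List[Tuple[str, str]]:
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```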

       Utility tools for format conversions

       createtsv
              Create tab-separated flat file from prefilter DB, alignment DB, cluster DB, or taxa
              DB

       convertalis
              Convert alignment DB to BLAST-tab format or specified custom-column output format

       convertprofiledb
              Convert ffindex DB of HMM files to profile DB

       convert2fasta
              Convert sequence DB to FASTA format

       result2flat
              Create a FASTA-like flat file from prefilter DB, alignment DB, or cluster DB

       createseqfiledb
              Create DB of unaligned FASTA files (1 per cluster) from sequence DB and cluster DB

       Taxonomy tools

       taxonomy
              Compute taxonomy and lowest common ancestor for each sequence.

       createtaxdb
              Annotates a sequence database with NCBI taxonomy information

       addtaxonomy
              Add taxonomy information to result database.

       lca
              Compute the lowest common ancestor from a set of taxa.

       taxonomyreport
              Create Kraken-style taxonomy report.

       filtertaxdb
              Filter taxonomy database.
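
       The lowest common ancestor computed by the taxonomy tools can be illustrated with a small
       Python sketch. It assumes the taxonomy is given as a child-to-parent dictionary in which
       the root points to itself; that representation and the function names are assumptions
       made for this example, not the MMseqs2 data structures.

```python
from typing import Dict, Iterable, List


def lineage(taxon: str, parent: Dict[str, str]) -> List[str]:
    """Return the path from a taxon up to the root (the root is its own parent)."""
    path = [taxon]
    while parent[path[-1]] != path[-1]:
        path.append(parent[path[-1]])
    return path


def lowest_common_ancestor(taxa: Iterable[str], parent: Dict[str, str]) -> str:
    """Return the deepest node that lies on the lineage of every given taxon."""
    taxa = list(taxa)
    if not taxa:
        raise ValueError("at least one taxon is required")
    # Ordered from the first taxon towards the root; filtering keeps that order.
    common = lineage(taxa[0], parent)
    for taxon in taxa[1:]:
        ancestors = set(lineage(taxon, parent))
        common = [node for node in common if node in ancestors]
    return common[0]


# Toy tree: root -> {A, B}, A -> {A1, A2}
toy_parent = {"root": "root", "A": "root", "B": "root", "A1": "A", "A2": "A"}
```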

       Multi-hit search tools

       multihitdb
              Create sequence database and associated metadata for multi hit searches

       multihitsearch
              Search with a grouped set of sequences against another grouped set

       besthitperset
              For each set of sequences compute the best element and update its p-value

       combinepvalperset
              For each set compute the combined p-value

       summerizeresultsbyset
              For each set compute summary statistics, such as spread-pvalue etc.

       resultsbyset
              For each set compute the combined p-value

       mergeresultsbyset
              Merge results from multiple ORFs back to their respective contigs

       Utility tools for clustering

       mergeclusters
              Merge multiple cluster DBs into single cluster DB
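
       Merging clusterings from successive steps amounts to composing two mappings. The Python
       sketch below assumes each clustering is a dictionary from a representative to its list of
       members, and that the second clustering was computed on the representatives of the first.
       It illustrates the concept only; the function name is hypothetical and the code does not
       read or write MMseqs2 cluster DBs.

```python
from typing import Dict, List


def merge_clusterings(first: Dict[str, List[str]],
                      second: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Expand each second-step cluster into the members of its first-step clusters."""
    merged: Dict[str, List[str]] = {}
    for rep, sub_reps in second.items():
        members: List[str] = []
        for sub_rep in sub_reps:
            # A representative missing from the first step stands for itself.
            members.extend(first.get(sub_rep, [sub_rep]))
        merged[rep] = members
    return merged
```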

       Core tools (for advanced users)

       prefilter
              Search with query sequence /  profile  DB  through  target  DB  (k-mer  matching  +
              ungapped alignment)

       ungappedprefilter
              Search  with  query  sequence  /  profile  DB through target DB and compute optimal
              ungapped alignment score

       align
              Compute Smith-Waterman alignments for previous results (e.g. prefilter DB,  cluster
              DB)

       alignall
              Compute all-against-all Smith-Waterman alignments for a result DB (e.g. prefilter
              DB, cluster DB)

       transitivealign
              Transfers alignments by transitivity via a center star alignment

       clust
              Cluster sequence DB from alignment DB (e.g. created by searching DB against itself)

       kmermatcher
              Find exact k-mer matches between sequences

       kmersearch
              Search with query sequence through target DB (k-mer matching)

       kmerindexdb
              Find exact k-mer matches between sequences and store them as an index

       clusthash
              Cluster sequences of same length and >90% sequence identity in linear time
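
       Exact k-mer matching between two sequences, the idea named in the kmermatcher
       description, can be sketched in Python by indexing the k-mers of one sequence and looking
       up the k-mers of the other. The function names are hypothetical, and this toy version
       makes no claim about how the MMseqs2 modules select, encode or store k-mers.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def kmer_index(seq: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of seq to the list of positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def exact_kmer_matches(query: str, target: str, k: int) -> List[Tuple[int, int, int]]:
    """Return (query_pos, target_pos, diagonal) for every shared k-mer.

    The diagonal is query_pos - target_pos; consecutive matches on the same
    diagonal line up without gaps.
    """
    index = kmer_index(target, k)
    matches = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            matches.append((i, j, i - j))
    return matches
```

       Several matches that share one diagonal suggest an ungapped region of similarity.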

       Utility tools to manipulate DBs

       compress
              Compresses a database.

       decompress
              Decompresses a database.

       apply
              Passes each input database entry to stdin of the specified program, executes it and
              writes its stdout to the output database.

       extractorfs
              Extract open reading frames from all six frames of a nucleotide sequence DB

       extractframes
              Extract reading frames from a nucleotide sequence DB

       orftocontig
              Obtain location information of extracted ORFs with respect to their contigs in
              alignment format

       reverseseq
              Reverse each sequence in a DB

       touchdb
              Memory map database

       translatenucs
              Translate nucleotide sequence DB into protein sequence DB

       translateaa
              Translate protein sequence DB into nucleotide sequence DB

       swapresults
              Reformat prefilter or alignment DB as if target DB had been searched through  query
              DB

       swapdb
              Create a DB where the key is from the first column of the input result DB

       mergedbs
              Merge multiple DBs into a single DB, based on IDs (names) of entries

       splitdb
              Split an MMseqs DB into multiple DBs

       splitsequence
              Split sequences by length

       subtractdbs
              Generate a DB with entries of first DB not occurring in second DB

       filterdb
              Filter   a   DB   by   conditioning   (regex,   numerical,   ...)  on  one  of  its
              whitespace-separated columns

       createsubdb
              Create a subset of a DB from a file of IDs of entries

       view
              Prints entries to console

       rmdb
              Removes the database

       mvdb
              Move the database

       result2profile
              Compute profile and consensus DB from a prefilter, alignment or cluster DB

       result2pp
              Merge the query profiles with target profiles according to search results and
              output an enriched profile DB

       result2rbh
              Filter a merged result DB to retain only reciprocal best hits

       result2msa
              Generate   MSAs   for   queries  by  locally  aligning  their  matched  targets  in
              prefilter/alignment/cluster DB

       convertmsa
              Turns an MSA file into an MSA database.

       msa2profile
              Turns an MSA database into an MMseqs profile database.

       profile2pssm
              Converts a profile database into a human readable tab-separated PSSM file.

       profile2cs
              Converts a profile database into a column state sequence.

       result2stats
              Compute statistics for each entry in a sequence, prefilter, alignment or cluster DB

       proteinaln2nucl
              Map protein alignment to nucleotide alignment

       tsv2db
              Turns a TSV file into an MMseqs database

       result2repseq
              Get representative sequences for a result database

       Special-purpose utilities

       rescorediagonal
              Compute sequence identity for diagonal

       alignbykmer
              Predict sequence identity, score, alignment start and end by k-mer alignment

       diffseqdbs
              Find IDs of sequences kept, added and removed between two versions of sequence DB

       concatdbs
              Concatenate two DBs, giving new IDs to entries from second input DB

       sortresult
              Sort a result database in the same order as prefilter or align would.

       summarizealis
              Summarize alignment results into a single row showing unique coverage, coverage
              and average sequence identity

       summarizeresult
              Extract annotations from alignment DB

       summarizetabs
              Extract annotations from HHblits BLAST-tab-formatted results

       gff2db
              Turn a gff3 (generic feature format) file into a gff3 DB

       masksequence
              Soft-mask sequences using tantan: low-complexity regions in lower case, the rest
              in upper case

       maskbygff
              X out sequence regions in a sequence DB by features in a gff3 file

       prefixid
              For each entry in a DB prepend the entry ID to the entry itself

       suffixid
              For each entry in a DB append the entry ID to the entry itself

       convertkb
              Convert  UniProt knowledge base files into MMseqs2 database format for the selected
              column types

       summarizeheaders
              Return a new summarized header DB from the UniProt headers of a cluster DB

       extractalignedregion
              Extract aligned sequence region from query

       extractdomains
              Extract highest scoring alignment region for each sequence from BLAST-tab file

       convertca3m
              Converts a cA3M database into an MMseqs2 result database.

       expandaln
              Expands an alignment result based on another.

       countkmer
              Simple k-mer counter; it prints the numeric and alphanumeric representation of
              each k-mer and its count

MMseqs2 (Many against Many sequence searching)      July 2019                      MMSEQS2(1)