Provided by: mmseqs2_15-6f452+ds-2_amd64 

NAME
MMseqs2 - MMseqs2 (Many against Many sequence searching): fast, parallelized protein sequence searches
and clustering of huge protein sequence data sets.
SYNOPSIS
mmseqs <module> args
DESCRIPTION
MMseqs2 (Many-against-Many sequence searching) is a software suite to search and cluster huge
protein/nucleotide sequence sets. MMseqs2 is open source GPL-licensed software implemented in C++ for
Linux, macOS, and (as a beta version, via Cygwin) Windows. The software is designed to run on multiple
cores and servers and exhibits very good scalability. MMseqs2 can run 10000 times faster than BLAST. At
100 times the speed of BLAST it achieves almost the same sensitivity. It can perform profile searches with
the same sensitivity as PSI-BLAST at over 400 times its speed.
The following lists the modules that can be used as <module>.
Easy workflows (for non-experts)
An example of running a command using easy-* modules would be
mmseqs easy-search <queryFasta> <targetFasta|targetDB> <resultFile> <tmpDir>
easy-search
Search with a query fasta against target fasta (or database) and return a BLAST-compatible result
in a single step
easy-linsearch
Linear time search with a query fasta against target fasta (or database) and return a
BLAST-compatible result in a single step
easy-linclust
Compute clustering of a fasta/fastq database in linear time. The workflow outputs the
representative sequences, a cluster tsv and a fasta-like format containing all sequences.
easy-cluster
Compute clustering of a fasta database. The workflow outputs the representative sequences, a
cluster tsv and a fasta-like format containing all sequences.
easy-taxonomy
Compute taxonomy and lowest common ancestor for each sequence. The workflow outputs a taxonomic
classification for sequences and a hierarchical summary report.
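As an illustration of the easy workflows above, the following is a minimal sketch of an easy-search call. The file names are hypothetical, and DRY_RUN is a helper variable of this sketch rather than an mmseqs option. It assumes the usual four positional arguments (query, target, result file, temporary directory).

```shell
# Hypothetical file names; replace them with your own data.
QUERY=query.fasta
TARGET=target.fasta
RESULT=result.m8
TMP=tmp

# Positional arguments: query FASTA, target FASTA (or DB), result file, temporary directory.
EASY_SEARCH_CMD="mmseqs easy-search $QUERY $TARGET $RESULT $TMP"

# DRY_RUN is a helper variable of this sketch, not an mmseqs option.
# By default the command is only printed; set DRY_RUN=0 to execute it.
if [ "${DRY_RUN:-1}" = "0" ]; then
    $EASY_SEARCH_CMD
else
    echo "would run: $EASY_SEARCH_CMD"
fi
```

The other easy-* modules are expected to follow the same pattern of input file(s), output name and temporary directory; check mmseqs <module> -h for the exact arguments.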
Main tools (for non-experts)
createdb
Convert protein sequence set in a FASTA file to MMseqs sequence DB format
search
Search with query sequence or profile DB (iteratively) through target sequence DB
linsearch
Search with query sequence DB through target sequence DB
map
Fast ungapped mapping of query sequences to target sequences.
cluster
Compute clustering of a sequence DB (quadratic time)
linclust
Cluster sequences of >30% sequence identity in linear time
createindex
Precompute index table of sequence DB for faster searches
createlinindex
Precompute index for linsearch
enrich
Enrich a query set by searching iteratively through a profile sequence set.
rbh
Find reciprocal best hits between query and target
clusterupdate
Update clustering of old sequence DB to clustering of new sequence DB
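The main tools above can be chained into the steps that easy-search performs in one go. The sketch below is one possible sequence, not the only one: it converts two FASTA files with createdb, runs search, and converts the result with convertalis. All file and database names are hypothetical, and the run helper and DRY_RUN variable belong to this sketch, not to mmseqs.

```shell
# Hypothetical input files; replace them with your own data.
QUERY_FASTA=query.fasta
TARGET_FASTA=target.fasta

# run: print a command and, only when DRY_RUN=0, execute it.
# DRY_RUN is a helper of this sketch, not an mmseqs option.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

search_pipeline() {
    # FASTA files -> MMseqs sequence DBs
    run mmseqs createdb "$QUERY_FASTA" queryDB
    run mmseqs createdb "$TARGET_FASTA" targetDB
    # Search queryDB through targetDB; tmp is a scratch directory
    run mmseqs search queryDB targetDB resultDB tmp
    # Alignment DB -> BLAST-tab text file
    run mmseqs convertalis queryDB targetDB resultDB result.m8
}

search_pipeline
```

By default the sketch only prints the four commands; running it with DRY_RUN=0 executes them, which requires mmseqs on the PATH and the input files to exist.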
Utility tools for format conversions
createtsv
Create tab-separated flat file from prefilter DB, alignment DB, cluster DB, or taxa DB
convertalis
Convert alignment DB to BLAST-tab format or specified custom-column output format
convertprofiledb
Convert ffindex DB of HMM files to profile DB
convert2fasta
Convert sequence DB to FASTA format
result2flat
Create a FASTA-like flat file from prefilter DB, alignment DB, or cluster DB
createseqfiledb
Create DB of unaligned FASTA files (1 per cluster) from sequence DB and cluster DB
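Once convertalis has produced a BLAST-tab file, ordinary text tools can post-process it. The sketch below assumes the conventional 12-column BLAST tabular layout, with the e-value in column 11 and the bit score in column 12. The sample rows are made up for illustration, and the function names are hypothetical.

```shell
# Made-up sample rows in a 12-column, tab-separated BLAST-tab layout.
m8_sample() {
    printf 'q1\tt1\t0.95\t100\t5\t0\t1\t100\t1\t100\t1e-50\t200\n'
    printf 'q1\tt2\t0.40\t80\t48\t2\t5\t84\t10\t89\t0.5\t30\n'
    printf 'q2\tt3\t0.70\t120\t36\t1\t1\t120\t3\t122\t1e-20\t110\n'
}

# Keep hits whose e-value (column 11) is at most 1e-3 and
# print query, target and bit score (columns 1, 2 and 12).
filter_hits() {
    awk -F'\t' -v OFS='\t' '$11 + 0 <= 1e-3 { print $1, $2, $12 }'
}

m8_sample | filter_hits
```

With a real result file, the same filter would be applied as filter_hits < result.m8, where result.m8 stands for whatever output name was passed to convertalis.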
Taxonomy tools
taxonomy
Compute taxonomy and lowest common ancestor for each sequence.
createtaxdb
Annotates a sequence database with NCBI taxonomy information
addtaxonomy
Add taxonomy information to result database.
lca
Compute the lowest common ancestor from a set of taxa.
taxonomyreport
Create Kraken-style taxonomy report.
filtertaxdb
Filter taxonomy database.
Multi-hit search tools
multihitdb
Create sequence database and associated metadata for multi hit searches
multihitsearch
Search with a grouped set of sequences against another grouped set
besthitperset
For each set of sequences compute the best element and update the p-value
combinepvalperset
For each set compute the combined p-value
summerizeresultsbyset
For each set compute summary statistics, such as spread-pvalue etc.
resultsbyset
For each set compute the combined p-value
mergeresultsbyset
Merge results from multiple ORFs back to their respective contigs
Utility tools for clustering
mergeclusters
Merge multiple cluster DBs into single cluster DB
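Cluster results exported as a TSV file (for example the cluster tsv written by easy-cluster and easy-linclust) can be summarized with standard tools. The sketch below assumes a two-column, tab-separated layout with the representative in column 1 and a cluster member in column 2. The sample rows and function names are hypothetical.

```shell
# Made-up cluster TSV: column 1 = representative, column 2 = member.
cluster_tsv_sample() {
    printf 'repA\trepA\n'
    printf 'repA\tseq2\n'
    printf 'repA\tseq3\n'
    printf 'repB\trepB\n'
}

# Count the members of each cluster and print "representative<TAB>size",
# sorted by representative name.
cluster_sizes() {
    awk -F'\t' '{ n[$1]++ } END { for (r in n) print r "\t" n[r] }' | sort
}

cluster_tsv_sample | cluster_sizes
```

Sorting the output numerically on the second column instead (sort -k2,2nr) would list the largest clusters first.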
Core tools (for advanced users)
prefilter
Search with query sequence / profile DB through target DB (k-mer matching + ungapped alignment)
ungappedprefilter
Search with query sequence / profile DB through target DB and compute optimal ungapped alignment
score
align
Compute Smith-Waterman alignments for previous results (e.g. prefilter DB, cluster DB)
alignall
Compute all-against-all Smith-Waterman alignments for a result DB (e.g. prefilter DB, cluster DB)
transitivealign
Transfers alignments by transitivity via a center star alignment
clust
Cluster sequence DB from alignment DB (e.g. created by searching DB against itself)
kmermatcher
Finds exact k-mer matches between sequences
kmersearch
Search with query sequence through target DB. (k-mer matching)
kmerindexdb
Finds exact k-mer matches between sequences and stores them as index
clusthash
Cluster sequences of same length and >90% sequence identity in linear time
Utility tools to manipulate DBs
compress
Compresses a database.
decompress
Decompresses a database.
apply
Passes each input database entry to stdin of the specified program, executes it and writes its
stdout to the output database.
extractorfs
Extract open reading frames from all six frames from nucleotide sequence DB
extractframes
Extract reading frames from a nucleotide sequence DB
orftocontig
Obtain location information of extracted ORFs with respect to their contigs in alignment format
reverseseq
Reverse each sequence in a DB
touchdb
Memory map database
translatenucs
Translate nucleotide sequence DB into protein sequence DB
translateaa
Translate protein sequence DB into nucleotide sequence DB
swapresults
Reformat prefilter or alignment DB as if target DB had been searched through query DB
swapdb
Create a DB where the key is from the first column of the input result DB
mergedbs
Merge multiple DBs into a single DB, based on IDs (names) of entries
splitdb
Split an MMseqs DB into multiple DBs
splitsequence
Split sequences by length
subtractdbs
Generate a DB with entries of first DB not occurring in second DB
filterdb
Filter a DB by conditioning (regex, numerical, ...) on one of its whitespace-separated columns
createsubdb
Create a subset of a DB from a file of IDs of entries
view
Prints entries to console
rmdb
Removes the database
mvdb
Move the database
result2profile
Compute profile and consensus DB from a prefilter, alignment or cluster DB
result2pp
Merge the query profiles with target profiles according to search results and outputs an enriched
profile DB
result2rbh
Filter a merged result DB to retain only reciprocal best hits
result2msa
Generate MSAs for queries by locally aligning their matched targets in prefilter/alignment/cluster
DB
convertmsa
Turns an MSA file into an MSA database.
msa2profile
Turns an MSA database into an MMseqs profile database.
profile2pssm
Converts a profile database into a human readable tab-separated PSSM file.
profile2cs
Converts a profile database into a column state sequence.
result2stats
Compute statistics for each entry in a sequence, prefilter, alignment or cluster DB
proteinaln2nucl
Map protein alignment to nucleotide alignment
tsv2db
Turns a TSV file into an MMseqs database
result2repseq
Get representative sequences for a result database
Special-purpose utilities
rescorediagonal
Compute sequence identity for diagonal
alignbykmer
Predict sequence identity, score, alignment start and end by kmer alignment
diffseqdbs
Find IDs of sequences kept, added and removed between two versions of sequence DB
concatdbs
Concatenate two DBs, giving new IDs to entries from second input DB
sortresult
Sort a result database in the same order as prefilter or align would.
summarizealis
Summarize alignment results into a single line showing unique coverage, coverage and average sequence identity
summarizeresult
Extract annotations from alignment DB
summarizetabs
Extract annotations from HHblits BLAST-tab-formatted results
gff2db
Turn a gff3 (generic feature format) file into a gff3 DB
masksequence
Soft-mask sequences using tantan: low-complexity regions in lower case, the rest in upper case
maskbygff
X out sequence regions in a sequence DB by features in a gff3 file
prefixid
For each entry in a DB prepend the entry ID to the entry itself
suffixid
For each entry in a DB append the entry ID to the entry itself
convertkb
Convert UniProt knowledge base files into MMseqs2 database format for the selected column types
summarizeheaders
Return a new summarized header DB from the UniProt headers of a cluster DB
extractalignedregion
Extract aligned sequence region from query
extractdomains
Extract highest scoring alignment region for each sequence from BLAST-tab file
convertca3m
Converts a cA3M database into an MMseqs2 result database.
expandaln
Expands an alignment result based on another.
countkmer
Simple k-mer counter; prints the numeric and alphanumeric representation and the k-mer count
MMseqs2 (Many against Many sequence searching). July 2019 MMSEQS2(1)