Ubuntu Manpage: MMseqs2 - MMseqs2 (Many against Many sequence searching): fast, parallelized protein sequence searches

NAME

       MMseqs2  -  MMseqs2  (Many against Many sequence searching): fast, parallelized protein sequence searches
       and clustering of huge protein sequence data sets.

SYNOPSIS

       mmseqs <module> args

DESCRIPTION

       MMseqs2 (Many-against-Many sequence searching) is a software suite to search and cluster huge protein
       and nucleotide sequence sets. MMseqs2 is open-source, GPL-licensed software implemented in C++ for
       Linux, macOS, and (as a beta version, via Cygwin) Windows. The software is designed to run on multiple
       cores and servers and exhibits very good scalability. MMseqs2 can run up to 10,000 times faster than
       BLAST; at 100 times the speed of BLAST it achieves almost the same sensitivity. It can perform profile
       searches with the same sensitivity as PSI-BLAST at over 400 times the speed of PSI-BLAST.

       The following lists the modules that can be given as <module>, grouped by purpose.

       Easy workflows (for non-experts)

       An example of running an easy-* module is:
              mmseqs easy-search <queryFasta> <targetFasta|targetDB> <alnResult> <tmpDir>

       easy-search
              Search with a query FASTA file against a target FASTA file (or database) and return a
              BLAST-compatible result in a single step

       easy-linsearch
              Linear-time search with a query FASTA file against a target FASTA file (or database) and return
              a BLAST-compatible result in a single step

       easy-linclust
              Compute a clustering of a FASTA/FASTQ database in linear time. The workflow outputs the
              representative sequences, a cluster TSV file and a FASTA-like file containing all sequences.

       easy-cluster
              Compute a clustering of a FASTA database. The workflow outputs the representative sequences, a
              cluster TSV file and a FASTA-like file containing all sequences.

       easy-taxonomy
              Compute taxonomy and lowest common ancestor for each sequence. The workflow outputs a taxonomic
              classification for sequences and a hierarchical summary report.
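       The cluster TSV written by easy-cluster and easy-linclust is expected to hold one
       representative<TAB>member pair per line. The Python sketch below groups such pairs into clusters
       keyed by their representative; it assumes that two-column layout, and the sequence names are
       hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def read_cluster_tsv(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group (representative, member) pairs into clusters keyed by representative."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        rep, member = line.split("\t")[:2]  # assumed: column 1 = representative, column 2 = member
        clusters[rep].append(member)
    return dict(clusters)


# Hypothetical contents of a cluster TSV file
example = [
    "seqA\tseqA\n",
    "seqA\tseqB\n",
    "seqC\tseqC\n",
]
clusters = read_cluster_tsv(example)
print(clusters)  # {'seqA': ['seqA', 'seqB'], 'seqC': ['seqC']}
```

       With a real result, pass an open file handle for the cluster TSV produced under your chosen output
       prefix instead of the example list.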

       Main tools (for non-experts)

       createdb
              Convert protein sequence set in a FASTA file to MMseqs sequence DB format

       search
              Search with query sequence or profile DB (iteratively) through target sequence DB

       linsearch
              Search with query sequence DB through target sequence DB

       map
              Fast ungapped mapping of query sequences to target sequences.

       cluster
              Compute clustering of a sequence DB (quadratic time)

       linclust
              Cluster sequences of >30% sequence identity *in linear time*

       createindex
              Precompute index table of sequence DB for faster searches

       createlinindex
              Precompute index for linsearch

       enrich
              Enrich a query set by searching iteratively through a profile sequence set.

       rbh
              Find reciprocal best hits between query and target

       clusterupdate
              Update clustering of old sequence DB to clustering of new sequence DB
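       rbh reports pairs in which each sequence is the other's best hit. The criterion itself can be
       sketched in Python on in-memory hit lists. This is an illustration of the reciprocal-best-hit rule,
       assuming that a higher bit score means a better hit; it is not MMseqs2's implementation, which works
       on its own result DBs, and the IDs and scores are hypothetical.

```python
from typing import Dict, List, Tuple

# A hit is (query_id, target_id, bit_score); higher bit score is better.
Hit = Tuple[str, str, float]


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring target (first one wins on ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for q, t, score in hits:
        if q not in best or score > best[q][1]:
            best[q] = (t, score)
    return {q: t for q, (t, _) in best.items()}


def reciprocal_best_hits(fwd: List[Hit], rev: List[Hit]) -> List[Tuple[str, str]]:
    """Keep (q, t) where t is q's best hit and q is t's best hit."""
    q_best = best_hits(fwd)  # query -> target search
    t_best = best_hits(rev)  # target -> query search
    return [(q, t) for q, t in q_best.items() if t_best.get(t) == q]


fwd = [("q1", "t1", 120.0), ("q1", "t2", 80.0), ("q2", "t2", 60.0)]
rev = [("t1", "q1", 118.0), ("t2", "q1", 90.0), ("t2", "q2", 55.0)]
print(reciprocal_best_hits(fwd, rev))  # [('q1', 't1')]
```

       Here q2's best hit is t2, but t2's best hit is q1, so (q2, t2) is not reciprocal and is dropped.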

       Utility tools for format conversions

       createtsv
              Create tab-separated flat file from prefilter DB, alignment DB, cluster DB, or taxa DB

       convertalis
              Convert alignment DB to BLAST-tab format or specified custom-column output format

       convertprofiledb
              Convert ffindex DB of HMM files to profile DB

       convert2fasta
              Convert sequence DB to FASTA format

       result2flat
              Create a FASTA-like flat file from prefilter DB, alignment DB, or cluster DB

       createseqfiledb
              Create DB of unaligned FASTA files (1 per cluster) from sequence DB and cluster DB
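       convertalis (and easy-search) write BLAST-tab text, by default 12 tab-separated columns: query,
       target, sequence identity, alignment length, mismatches, gap openings, query start and end, target
       start and end, E-value and bit score. The Python sketch below reads that layout and keeps hits under
       an E-value cutoff. It assumes the default 12-column order and the rows are hypothetical; whether
       column 3 is a fraction or a percentage depends on the MMseqs2 version and format options, so check
       your own output before relying on it.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class BlastTabHit:
    query: str
    target: str
    identity: float  # column 3; fraction vs. percentage depends on version/options
    aln_len: int
    mismatches: int
    gap_opens: int
    q_start: int
    q_end: int
    t_start: int
    t_end: int
    evalue: float
    bits: float


def parse_blast_tab(lines: Iterable[str]) -> Iterator[BlastTabHit]:
    """Yield one hit per line of 12-column BLAST-tab text."""
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # skip blank or malformed lines
        yield BlastTabHit(
            cols[0], cols[1], float(cols[2]),
            int(cols[3]), int(cols[4]), int(cols[5]),
            int(cols[6]), int(cols[7]), int(cols[8]), int(cols[9]),
            float(cols[10]), float(cols[11]),
        )


# Hypothetical output lines
rows = [
    "q1\tt1\t0.950\t100\t5\t0\t1\t100\t3\t102\t1.2E-50\t190\n",
    "q1\tt2\t0.400\t80\t48\t2\t10\t89\t5\t84\t3.0E-02\t35\n",
]
significant = [h for h in parse_blast_tab(rows) if h.evalue < 1e-3]
print([(h.query, h.target) for h in significant])  # [('q1', 't1')]
```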

       Taxonomy tools

       taxonomy
              Compute taxonomy and lowest common ancestor for each sequence.

       createtaxdb
              Annotates a sequence database with NCBI taxonomy information

       addtaxonomy
              Add taxonomy information to result database.

       lca
              Compute the lowest common ancestor from a set of taxa.

       taxonomyreport
              Create Kraken-style taxonomy report.

       filtertaxdb
              Filter taxonomy database.
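       lca and taxonomy rely on the lowest common ancestor: the deepest taxon that is an ancestor of every
       taxon in a set. Below is a minimal Python sketch over a child-to-parent map, assuming (as in NCBI's
       nodes.dmp) that the root is its own parent. The toy tree is hypothetical, and MMseqs2's taxonomy
       module may apply weighting or voting that this sketch does not model.

```python
from typing import Dict, List


def lineage(taxon: int, parent: Dict[int, int]) -> List[int]:
    """Path from taxon up to the root (the root's parent is itself)."""
    path = [taxon]
    while parent[taxon] != taxon:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lca(taxa: List[int], parent: Dict[int, int]) -> int:
    """Lowest common ancestor: the deepest node shared by all lineages."""
    first = lineage(taxa[0], parent)  # ordered from leaf to root
    shared = set(first)
    for t in taxa[1:]:
        shared &= set(lineage(t, parent))
    # The first shared node when walking up from the leaf is the lowest one.
    return next(node for node in first if node in shared)


# Hypothetical toy tree: 1 is the root
parent = {1: 1, 2: 1, 3: 2, 4: 2, 5: 1}
print(lca([3, 4], parent))  # 2
print(lca([3, 5], parent))  # 1
```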

       Multi-hit search tools

       multihitdb
              Create sequence database and associated metadata for multi hit searches

       multihitsearch
              Search with a grouped set of sequences against another grouped set

       besthitperset
              For each set of sequences compute the best element and update the p-value

       combinepvalperset
              For each set compute the combined p-value

       summerizeresultsbyset
              For each set compute summary statistics, such as spread-pvalue etc.

       resultsbyset
              For each set compute the combined p-value

       mergeresultsbyset
              Merge results from multiple ORFs back to their respective contigs
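       The multi-hit tools combine per-sequence p-values into one value per set. This page does not name
       the rule combinepvalperset uses, so the Python sketch below substitutes Fisher's method purely for
       illustration: X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees of freedom
       under the null hypothesis, assuming k independent p-values in (0, 1].

```python
import math
from typing import Sequence


def fisher_combined_pvalue(pvalues: Sequence[float]) -> float:
    """Combine independent p-values with Fisher's method.

    For even degrees of freedom (2k) the chi-square survival function has the
    closed form exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # (X/2)^i / i!
        total += term
    return math.exp(-half_x) * total


print(round(fisher_combined_pvalue([0.05, 0.05]), 4))  # 0.0175
```

       With a single p-value the formula returns that p-value unchanged, which is a quick sanity check.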

       Utility tools for clustering

       mergeclusters
              Merge multiple cluster DBs into single cluster DB

       Core tools (for advanced users)

       prefilter
              Search with query sequence / profile DB through target DB (k-mer matching + ungapped alignment)

       ungappedprefilter
              Search with query sequence / profile DB through target DB and compute optimal  ungapped  alignment
              score

       align
              Compute Smith-Waterman alignments for previous results (e.g. prefilter DB, cluster DB)

       alignall
              Compute all-against-all Smith-Waterman alignments for a result DB (e.g. prefilter DB, cluster DB)

       transitivealign
              Transfers alignments by transitivity via a center star alignment

       clust
              Cluster sequence DB from alignment DB (e.g. created by searching DB against itself)

       kmermatcher
              Finds exact k-mer matches between sequences

       kmersearch
              Search with query sequence through target DB (k-mer matching)

       kmerindexdb
              Finds exact k-mer matches between sequences and stores them as an index

       clusthash
              Cluster sequences of same length and >90% sequence identity *in linear time*
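       The prefilter description above mentions k-mer matching followed by ungapped alignment. The Python
       sketch below shows only the matching step in its simplest form: it indexes the target's exact
       k-mers and reports every shared k-mer with its diagonal (query position minus target position).
       Matches on the same diagonal are the ones an ungapped alignment can extend. MMseqs2 also scores
       similar, not only identical, k-mers, so this is a simplified illustration with hypothetical
       sequences.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def kmer_index(seq: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of seq to the positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def exact_kmer_matches(query: str, target: str, k: int) -> List[Tuple[int, int, int]]:
    """Return (query_pos, target_pos, diagonal) for each shared exact k-mer."""
    index = kmer_index(target, k)
    matches = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            matches.append((i, j, i - j))  # diagonal = query_pos - target_pos
    return matches


print(exact_kmer_matches("MKVLAT", "AAMKVLA", 3))
# [(0, 2, -2), (1, 3, -2), (2, 4, -2)]
```

       All three matches share diagonal -2, which is the kind of signal that justifies an ungapped
       extension along that diagonal.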

       Utility tools to manipulate DBs

       compress
              Compresses a database.

       decompress
              Decompresses a database.

       apply
              Passes  each  input  database  entry to stdin of the specified program, executes it and writes its
              stdout to the output database.

       extractorfs
              Extract open reading frames from all six frames of a nucleotide sequence DB

       extractframes
              Extract reading frames from a nucleotide sequence DB

       orftocontig
              Obtain location information of extracted ORFs with respect to their contigs in alignment format

       reverseseq
              Reverse each sequence in a DB

       touchdb
              Memory-map a database

       translatenucs
              Translate nucleotide sequence DB into protein sequence DB

       translateaa
              Translate a protein sequence DB into a nucleotide sequence DB

       swapresults
              Reformat prefilter or alignment DB as if target DB had been searched through query DB

       swapdb
              Create a DB where the key is from the first column of the input result DB

       mergedbs
              Merge multiple DBs into a single DB, based on IDs (names) of entries

       splitdb
              Split a mmseqs DB into multiple DBs

       splitsequence
              Split sequences by length

       subtractdbs
              Generate a DB with entries of first DB not occurring in second DB

       filterdb
              Filter a DB by conditioning (regex, numerical, ...) on one of its whitespace-separated columns

       createsubdb
              Create a subset of a DB from a file of IDs of entries

       view
              Prints entries to console

       rmdb
              Removes the database

       mvdb
              Move the database

       result2profile
              Compute profile and consensus DB from a prefilter, alignment or cluster DB

       result2pp
              Merge the query profiles with target profiles according to search results and output an enriched
              profile DB

       result2rbh
              Filter a merged result DB to retain only reciprocal best hits

       result2msa
              Generate MSAs for queries by locally aligning their matched targets in prefilter/alignment/cluster
              DB

       convertmsa
              Turns an MSA file into an MSA database.

       msa2profile
              Turns an MSA database into an MMseqs profile database.

       profile2pssm
              Converts a profile database into a human readable tab-separated PSSM file.

       profile2cs
              Converts a profile database into a column state sequence.

       result2stats
              Compute statistics for each entry in a sequence, prefilter, alignment or cluster DB

       proteinaln2nucl
              Map protein alignment to nucleotide alignment

       tsv2db
              Turns a TSV file into an MMseqs database

       result2repseq
              Get representative sequences for a result database
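       translatenucs, extractframes and extractorfs all start from reading frames of nucleotide sequences.
       The Python sketch below translates the three forward and three reverse-complement frames. It
       hard-codes the standard genetic code (NCBI table 1) as an assumption, writes X for codons that
       contain non-ACGT bases, and does not search for ORF start/stop boundaries the way extractorfs does.

```python
from typing import List

# NCBI standard genetic code (table 1): codon index = 16*b1 + 4*b2 + b3 with T, C, A, G = 0..3
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(nuc: str) -> str:
    """Translate codon by codon; '*' marks stop codons, 'X' marks unknown bases."""
    protein = []
    for i in range(0, len(nuc) - 2, 3):  # ignore a trailing partial codon
        codon = nuc[i:i + 3]
        if all(b in BASES for b in codon):
            idx = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
            protein.append(AMINO[idx])
        else:
            protein.append("X")
    return "".join(protein)


def six_frames(nuc: str) -> List[str]:
    """Translations of the three forward and three reverse-complement frames."""
    nuc = nuc.upper()
    rev = nuc.translate(COMPLEMENT)[::-1]  # reverse complement
    return [translate(s[f:]) for s in (nuc, rev) for f in range(3)]


print(six_frames("ATGGCCTAA"))  # ['MA*', 'WP', 'GL', 'LGH', '*A', 'RP']
```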

       Special-purpose utilities

       rescorediagonal
              Compute sequence identity along the diagonal

       alignbykmer
              Predict sequence identity, score, alignment start and end by kmer alignment

       diffseqdbs
              Find IDs of sequences kept, added and removed between two versions of sequence DB

       concatdbs
              Concatenate two DBs, giving new IDs to entries from second input DB

       sortresult
              Sort a result database in the same order as prefilter or align would.

       summarizealis
              Summarize alignment results into one row showing unique coverage, coverage and average sequence
              identity

       summarizeresult
              Extract annotations from alignment DB

       summarizetabs
              Extract annotations from HHblits BLAST-tab-formatted results

       gff2db
              Turn a gff3 (generic feature format) file into a gff3 DB

       masksequence
              Soft-mask sequences using tantan: low-complexity regions in lower case, the rest in upper case

       maskbygff
              X out sequence regions in a sequence DB by features in a gff3 file

       prefixid
              For each entry in a DB prepend the entry ID to the entry itself

       suffixid
              For each entry in a DB append the entry ID to the entry itself

       convertkb
              Convert UniProt knowledge base files into MMseqs2 database format for the selected column types

       summarizeheaders
              Return a new summarized header DB from the UniProt headers of a cluster DB

       extractalignedregion
              Extract aligned sequence region from query

       extractdomains
              Extract highest scoring alignment region for each sequence from BLAST-tab file

       convertca3m
              Converts a cA3M database into a MMseqs2 result database.

       expandaln
              Expands an alignment result based on another.

       countkmer
              Simple k-mer counter; prints the numeric and alphanumeric representation of each k-mer and its
              count
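       countkmer prints, for each k-mer, a numeric code, the k-mer itself and its count. The Python sketch
       below produces that kind of output under a hypothetical encoding (the 20 standard amino acids in
       alphabetical one-letter order, read as a base-20 number); MMseqs2's internal alphabet and numbering
       may differ, so the numbers will not match the tool's own output.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical encoding: a k-mer is read as a base-len(ALPHABET) number
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
CODE = {aa: i for i, aa in enumerate(ALPHABET)}


def kmer_to_number(kmer: str) -> int:
    value = 0
    for aa in kmer:
        value = value * len(ALPHABET) + CODE[aa]
    return value


def count_kmers(seqs: List[str], k: int) -> List[Tuple[int, str, int]]:
    """Return (numeric code, k-mer, count), skipping k-mers with letters outside ALPHABET."""
    counts: Counter = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if all(aa in CODE for aa in kmer):
                counts[kmer] += 1
    return sorted((kmer_to_number(km), km, n) for km, n in counts.items())


for number, kmer, n in count_kmers(["ACDAC", "ACX"], 2):
    print(number, kmer, n)
# 1 AC 3
# 22 CD 1
# 40 DA 1
```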

MMseqs2 (Many against Many sequence searching).     July 2019                                         MMSEQS2(1)