
Provided by: vmatch_2.3.1+dfsg-8_amd64

NAME

       vmatch - solve matching tasks

SYNOPSIS

       vmatch [options] indexname

DESCRIPTION

       The program vmatch allows one to solve a multitude of different matching tasks over an
       index constructed by mkvtree. Each matching task is solved by a combination of options
       specifying

       •   the input,

       •   the kind of matches sought,

       •   additional constraints on the matches,

       •   the direction of the matches (in case of DNA),

       •   the kind of postprocessing to be done,

       •   the output mode and output format.

       Additionally, if there is more than one algorithm to solve a certain matching task, vmatch
       allows one to specify which algorithm is to be used. vmatch supports computing the
       following kinds of matches:

        1. match all substrings of the database sequences against themselves. The matches can be
           one of the following kinds:

            1. branching tandem repeats, i.e. repeats where the two instances of the repeat occur
               at consecutive positions

            2. maximal repeats, i.e. pairs of maximal substrings occurring more than once in the
               database sequences

            3. supermaximal repeats, i.e. maximal repeats that do not occur as a substring of any
               other maximal repeat in the database sequences

        2. match a set of query sequences (given in an extra query file) against the index. The
           matches can be one of the following kinds:

            1. maximal substring matches, i.e. the substrings of the query sequences matching
               substrings of the database sequences. All matches exceeding some minimum
               length, extended maximally to the left and to the right, are reported.

            2. maximal unique matches, i.e. the substrings of the query sequences matching
               substrings of the database sequences. A match is reported if it is unique in the
               database sequences as well as in the query sequences.

            3. complete matches, i.e. a query sequence must completely match (i.e. from the first
               character to the last character) a substring of the database sequences.
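
       The repeat kinds above can be made concrete with a brute-force sketch. It is not how
       vmatch computes them (vmatch works over the index built by mkvtree); it only restates the
       definitions on a single string, assuming a maximal repeat is reported as a triple
       (i, j, l) with 0-based positions i < j and length l, and a supermaximal repeat as the
       repeated string itself. The function names are hypothetical.

```python
def maximal_repeats(s: str, min_len: int = 1) -> list[tuple[int, int, int]]:
    """Return all maximal repeated pairs (i, j, l) with i < j and l >= min_len.

    Brute force, roughly O(n^3): only meant to illustrate the definition.
    """
    n = len(s)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            # Left-maximal: the two instances cannot be extended to the left.
            if i > 0 and s[i - 1] == s[j - 1]:
                continue
            # Extend to the right as far as possible (right-maximal).
            l = 0
            while j + l < n and s[i + l] == s[j + l]:
                l += 1
            if l >= min_len:
                pairs.append((i, j, l))
    return pairs


def supermaximal_repeats(s: str, min_len: int = 1) -> list[str]:
    """Maximal repeats that are not a substring of any other maximal repeat."""
    repeats = {s[i:i + l] for i, _, l in maximal_repeats(s, min_len)}
    return sorted(r for r in repeats
                  if not any(r != other and r in other for other in repeats))


print(maximal_repeats("abcxabcyabz", 2))       # [(0, 4, 3), (0, 8, 2), (4, 8, 2)]
print(supermaximal_repeats("abcxabcyabz", 2))  # ['abc']
```

       Here "ab" is a maximal repeat (it occurs at positions 0, 4 and 8 with differing
       neighbours), but it is not supermaximal because it is contained in the maximal repeat "abc".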

       For all these match kinds, the matches themselves can be direct or palindromic (i.e. on
       the reverse strand, in case of DNA sequences). If required, DNA sequences are translated
       into six reading frames and the matches are computed on the protein level, and reported on
       the DNA level. In addition to exact matches, degenerate matches with a maximal number of
       errors (insertions, deletions, and mismatches) are supported. Moreover, degenerate matches
       can be derived from exact matches by extending them with a greedy extension strategy;
       this extension is not applied to complete matches. For all different match kinds, the matches
       delivered by vmatch can be selected according to their E-value, their identity value, or
       their match score.

       In the default case, a match is reported as a formatted row of numbers, containing its
       lengths, the positions where it occurs, the E-value, the number of errors it contains, the
       match score, and the identity value. Optionally, an alignment of the sequences that are
       involved in the match can be reported. An important feature of vmatch is the capability of
       directly postprocessing the matches found in the following ways:

        1. inverse output, i.e. report substrings of the database sequences or the query
           sequences not covered by a match

        2. masking substrings of the database sequences or the query sequences covered by a match

        3. clustering of a set of database sequences according to the matches found between these
           sequences. The output of this option can be a representation of the clusters, or a set
           of sequences each being representative for a cluster.

        4. chaining of a set of matches, i.e. finding optimal subsets of all matches which do not
           cross

        5. clustering of matches according to the pairwise similarities on the sequences involved
           in the match

        6. clustering of matches according to the positions where they occur
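
       Chaining can be illustrated with the textbook quadratic dynamic program for colinear
       chains. This is a sketch under simplifying assumptions, not vmatch's chaining algorithm:
       each match is a hypothetical (pos1, pos2, length) triple describing an exact match between
       two sequences, two matches may be chained only if the first ends before the second starts
       in both sequences, and a chain scores the sum of its match lengths (no gap costs).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    pos1: int    # start in the first sequence
    pos2: int    # start in the second sequence
    length: int  # length of the exact match (assumed >= 1)


def precedes(a: Match, b: Match) -> bool:
    """a can come before b in a chain: a ends before b starts in both sequences."""
    return a.pos1 + a.length <= b.pos1 and a.pos2 + a.length <= b.pos2


def best_chain(matches: list[Match]) -> tuple[int, list[Match]]:
    """Return (score, chain) of a highest-scoring chain of non-crossing matches."""
    if not matches:
        return 0, []
    # Any predecessor has a smaller pos1, so sorting by pos1 makes a single pass sufficient.
    ms = sorted(matches, key=lambda m: (m.pos1, m.pos2))
    score = [m.length for m in ms]  # best chain score ending in ms[j]
    prev = [-1] * len(ms)           # predecessor of ms[j] in that chain
    for j in range(len(ms)):
        for i in range(j):
            if precedes(ms[i], ms[j]) and score[i] + ms[j].length > score[j]:
                score[j] = score[i] + ms[j].length
                prev[j] = i
    k = max(range(len(ms)), key=score.__getitem__)
    best = score[k]
    chain = []
    while k != -1:
        chain.append(ms[k])
        k = prev[k]
    return best, chain[::-1]


score, chain = best_chain([Match(0, 0, 5), Match(10, 3, 4), Match(6, 6, 3), Match(12, 12, 2)])
print(score)  # 10
```

       The match at (10, 3) is dropped because it would cross the chain of the other three.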

       Finally, to accommodate many more kinds of user-defined postprocessing tasks, vmatch
       provides the concept of selection functions. These provide an open interface that allows
       arbitrary on-the-fly postprocessing of the matches without having to output and parse the
       matches. For more details on this concept, see the manual.

OPTIONS

       -q <file>
           Specify files containing queries to be matched.

       -dnavsprot <table>
           Perform six frame translation. Specify codon translation table by a number in the
           range [1,23] except for 7, 8, 17, 18, 19 and 20 (default is 1):

           1    Standard
           2    Vertebrate Mitochondrial
           3    Yeast Mitochondrial
           4    Mold Mitochondrial; Protozoan Mitochondrial; Coelenterate Mitochondrial;
                Mycoplasma; Spiroplasma
           5    Invertebrate Mitochondrial
           6    Ciliate Nuclear; Dasycladacean Nuclear; Hexamita Nuclear
           9    Echinoderm Mitochondrial
           10   Euplotid Nuclear
           11   Bacterial
           12   Alternative Yeast Nuclear
           13   Ascidian Mitochondrial
           14   Flatworm Mitochondrial
           15   Blepharisma Macronuclear
           16   Chlorophycean Mitochondrial
           21   Trematode Mitochondrial
           22   Scenedesmus Obliquus Mitochondrial
           23   Thraustochytrium Mitochondrial
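
           A minimal sketch of six-frame translation, assuming codon table 1 (Standard) only; it
           is not vmatch's code. The table string is the standard NCBI encoding of table 1 with
           codons enumerated in TCAG order. Translating codons that contain other symbols (e.g. N)
           to X is an assumption of this sketch, and the function names are hypothetical.

```python
# NCBI translation table 1 (Standard), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str, frame: int) -> str:
    """Translate dna starting at offset frame (0, 1 or 2); '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[k:k + 3], "X")  # 'X' for codons with non-ACGT symbols
        for k in range(frame, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> list[str]:
    """Three forward frames followed by three frames of the reverse complement."""
    rc = reverse_complement(dna)
    return [translate(dna, f) for f in range(3)] + [translate(rc, f) for f in range(3)]


print(six_frames("ATGGCC"))  # ['MA', 'W', 'G', 'GH', 'A', 'P']
```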

       -tandem
           Compute right branching tandem repeats.
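
           A brute-force sketch of right branching tandem repeats, assuming the common definition
           from suffix-tree based tandem repeat detection: a tandem repeat ww at position i with
           |w| = l is right branching when s[i+l] != s[i+2l], i.e. it cannot be shifted one
           position to the right. Treating the end of the string as a differing character is an
           assumption here; vmatch's algorithm and output format are not shown, and the function
           name is hypothetical.

```python
def right_branching_tandem_repeats(s: str) -> list[tuple[int, int]]:
    """Return (i, l) for each tandem repeat s[i:i+l] == s[i+l:i+2l] that is right branching."""
    n = len(s)
    result = []
    for l in range(1, n // 2 + 1):
        for i in range(n - 2 * l + 1):
            if s[i:i + l] != s[i + l:i + 2 * l]:
                continue
            # Right branching: the character after the repeat differs from the first
            # character of the second copy (the end of the string counts as different).
            if i + 2 * l == n or s[i + l] != s[i + 2 * l]:
                result.append((i, l))
    return result


print(right_branching_tandem_repeats("ababa"))  # [(1, 2)]
```

           In "ababa" the repeat "abab" at position 0 is not reported because it can be shifted
           right to "baba" at position 1, which is.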

       -supermax
           Compute supermaximal matches.

       -mum
           Compute maximal unique matches.
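
           A brute-force sketch of maximal unique matches between one database string and one
           query string: a maximal exact match (one that cannot be extended to the left or right)
           is kept only if its string occurs exactly once in each. vmatch computes MUMs over its
           index; this sketch only restates the definition from the DESCRIPTION, with hypothetical
           function names and 0-based positions.

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Number of (possibly overlapping) occurrences of pattern in text."""
    return sum(text.startswith(pattern, k) for k in range(len(text) - len(pattern) + 1))


def maximal_unique_matches(db: str, query: str, min_len: int = 1) -> list[tuple[int, int, int]]:
    """Return (db_pos, query_pos, length) for every MUM of length >= min_len."""
    mums = []
    for i in range(len(db)):
        for j in range(len(query)):
            # Left-maximal: the match cannot be extended to the left.
            if i > 0 and j > 0 and db[i - 1] == query[j - 1]:
                continue
            # Extend to the right as far as possible (right-maximal).
            l = 0
            while i + l < len(db) and j + l < len(query) and db[i + l] == query[j + l]:
                l += 1
            if l < min_len:
                continue
            u = db[i:i + l]
            # Unique: exactly one occurrence in the database and in the query.
            if count_occurrences(db, u) == 1 and count_occurrences(query, u) == 1:
                mums.append((i, j, l))
    return mums


print(maximal_unique_matches("xabcy", "zabcw"))  # [(1, 1, 3)]
print(maximal_unique_matches("abab", "ab"))      # [] ("ab" occurs twice in db)
```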

       -complete
           Specify that query sequences must match completely.

       -dbnomatch <arg>
           Show all database substrings not containing a match; optional argument:

           •   keepleft means to not mask the left instance of a match

           •   keepright means to not mask the right instance of a match

           •   keepleftifsamesequence means to not mask the left instance of the match if the
               right instance occurs in the same sequence

           •   keeprightifsamesequence means to not mask the right instance of the match if the
               left instance occurs in the same sequence

       -qnomatch
           Show all query substrings not containing a match.

       -dbmaskmatch <arg>
           Mask all database substrings containing a match; optional argument:

           •   keepleft means to not mask the left instance of a match

           •   keepright means to not mask the right instance of a match

           •   keepleftifsamesequence means to not mask the left instance of the match if the
               right instance occurs in the same sequence

           •   keeprightifsamesequence means to not mask the right instance of the match if the
               left instance occurs in the same sequence

       -qmaskmatch
           Mask all query substrings containing a match.
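
           The masking options (-dbmaskmatch, -qmaskmatch) and the inverse output of -qnomatch
           can be pictured as operations on the intervals that matches cover in one sequence. A
           minimal sketch, assuming matches are given as hypothetical (start, length) pairs with
           0-based positions and that masked positions are replaced by a caller-chosen symbol;
           vmatch's actual mask symbol and the keepleft/keepright variants are not modeled.

```python
def covered_positions(n: int, intervals: list[tuple[int, int]]) -> list[bool]:
    """covered[k] is True if position k lies inside some (start, length) interval."""
    covered = [False] * n
    for start, length in intervals:
        for k in range(max(start, 0), min(start + length, n)):
            covered[k] = True
    return covered


def mask_matches(seq: str, intervals: list[tuple[int, int]], symbol: str = "X") -> str:
    """Replace every position covered by a match with symbol."""
    covered = covered_positions(len(seq), intervals)
    return "".join(symbol if c else ch for ch, c in zip(seq, covered))


def unmatched_substrings(seq: str, intervals: list[tuple[int, int]]) -> list[tuple[int, str]]:
    """Maximal substrings (start, text) not covered by any match."""
    covered = covered_positions(len(seq), intervals)
    result, start = [], None
    for k, c in enumerate(covered + [True]):  # sentinel closes a trailing run
        if not c and start is None:
            start = k
        elif c and start is not None:
            result.append((start, seq[start:k]))
            start = None
    return result


print(mask_matches("ACGTACGTAC", [(2, 3), (7, 2)]))          # ACXXXCGXXC
print(unmatched_substrings("ACGTACGTAC", [(2, 3), (7, 2)]))  # [(0, 'AC'), (5, 'CG'), (9, 'C')]
```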

       -pp
           Generic postprocessing of matches.

       -online
           Run algorithms online without using the index.

       -qspeedup <level>
           Specify speedup level when matching queries (0: fast, 2: faster; default is 2); beware
           of the time/space tradeoff.

       -d
           Compute direct matches (default).

       -p
           Compute palindromic matches (i.e. reverse complemented matches).

       -h <dist>
           Specify the allowed Hamming distance (> 0). In combination with option -complete, one
           can switch on the percentage search mode or the best search mode. For the percentage
           search mode, use an argument of the form ip (where i is a positive integer). This means
           that up to i*m/100 mismatches (i.e. i percent of the query length) are allowed in a
           match of a query of length m. For the best search mode, use an argument of the form ib
           (where i is a positive integer). This means that in a first phase the minimum threshold
           q is determined such that there is still a match with q mismatches, where q is in the
           range 0 to i*m/100.

       -e <dist>
           Specify the allowed edit distance (> 0). In combination with option -complete, one can
           switch on the percentage search mode or the best search mode. For the percentage
           search mode, use an argument of the form ip (where i is a positive integer). This means
           that up to i*m/100 differences (i.e. i percent of the query length) are allowed in a
           match of a query of length m. For the best search mode, use an argument of the form ib
           (where i is a positive integer). This means that in a first phase the minimum threshold
           q is determined such that there is still a match with q differences, where q is in the
           range 0 to i*m/100.
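
           The best search mode for complete matches can be sketched as two phases over all
           gapless placements of the query in the database (Hamming distance, as with -h). This is
           a brute-force illustration, not vmatch's algorithm: the function names are
           hypothetical, k_max stands for the maximal number of mismatches that the ib argument
           allows for the given query, and reporting all matches with exactly q mismatches in a
           second phase is an assumption.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions of two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_search(db: str, query: str, k_max: int):
    """Return (q, positions) where q <= k_max is the smallest number of mismatches
    with which query occurs completely in db, or None if there is no such match."""
    m = len(query)
    dist = {p: hamming(db[p:p + m], query) for p in range(len(db) - m + 1)}
    # Phase 1: determine the minimum threshold q that still yields a match.
    within = [d for d in dist.values() if d <= k_max]
    if not within:
        return None
    q = min(within)
    # Phase 2 (assumed): report all complete matches with exactly q mismatches.
    return q, [p for p, d in dist.items() if d == q]


print(best_search("ACGTTCGTACGA", "TCGA", 2))  # (1, [4, 8])
```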

       -allmax
           Show all maximal matches in the order of their computation.

       -seedlength <length>
           Specify the seed length.

       -hxdrop <value>
           Specify the xdrop value for Hamming distance extension.

       -exdrop <value>
           Specify the xdrop value for edit distance extension.
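
           Greedy X-drop extension, as controlled by -hxdrop for Hamming distance, can be sketched
           for the ungapped case: a seed is extended to the right while tracking the best score
           seen so far, and extension stops once the running score falls more than xdrop below
           that best. The +1/-1 match/mismatch scores are assumptions of this sketch (vmatch's
           scoring may differ), the function name is hypothetical, and the gapped (edit distance)
           variant behind -exdrop is not shown.

```python
def xdrop_extend_right(s: str, t: str, i: int, j: int, xdrop: int,
                       match: int = 1, mismatch: int = -1) -> tuple[int, int]:
    """Extend the alignment of s[i:] and t[j:] without gaps.

    Returns (length, score) of the best-scoring extension found before the
    running score dropped more than xdrop below the best score.
    """
    score = best_score = best_len = 0
    k = 0
    while i + k < len(s) and j + k < len(t):
        score += match if s[i + k] == t[j + k] else mismatch
        k += 1
        if score > best_score:
            best_score, best_len = score, k
        elif best_score - score > xdrop:
            break  # X-drop criterion: stop extending in this direction
    return best_len, best_score


print(xdrop_extend_right("ACGTACGTTTTT", "ACGAACGTGGGG", 0, 0, 2))  # (8, 6)
print(xdrop_extend_right("ACGTACGTTTTT", "ACGAACGTGGGG", 0, 0, 0))  # (3, 3)
```

           A larger xdrop value lets the extension bridge the single mismatch at offset 3; with
           xdrop 0 the extension stops there.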

       -i
           Give information about the number of different matches.

       -dbcluster <args>
           Cluster the database sequences.

           •   first argument is the percentage of the shorter string to be included in a match,

           •   second argument is the percentage of the longer string to be included in a match,

           •   third optional argument is a filename prefix,

           •   fourth optional argument is (minclustersize, maxclustersize)
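
           One way to read the two percentage arguments is as a linking test on a pair of
           sequences, after which the clusters are the connected components of the linked pairs
           (single linkage). The sketch below assumes exactly that, which is an interpretation
           rather than vmatch's documented algorithm. Matches are simplified to hypothetical
           (seq_a, seq_b, match_length) triples, and the optional filename prefix and cluster size
           bounds are ignored.

```python
def cluster_sequences(lengths: list[int], matches: list[tuple[int, int, int]],
                      pct_shorter: float, pct_longer: float) -> list[list[int]]:
    """Single-linkage clusters of sequence indices, using a union-find structure."""
    parent = list(range(len(lengths)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, match_len in matches:
        shorter, longer = sorted((lengths[a], lengths[b]))
        # Assumed test: the match covers enough of both the shorter and the longer sequence.
        if 100 * match_len >= pct_shorter * shorter and 100 * match_len >= pct_longer * longer:
            parent[find(a)] = find(b)

    clusters: dict[int, list[int]] = {}
    for x in range(len(lengths)):
        clusters.setdefault(find(x), []).append(x)
    return sorted(clusters.values())


print(cluster_sequences([100, 110, 50, 55], [(0, 1, 95), (2, 3, 50), (1, 2, 40)], 90, 80))
# [[0, 1], [2, 3]]
```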

       -nonredundant
           Generate file with non-redundant set of sequences; only works together with option
           -dbcluster.

       -selfun <file>
           Specify shared object file containing selection function.

       -l <length>
           Specify that a match must have at least the given length; optionally, specify the
           minimum and maximum size of gaps between repeat instances.

       -leastscore <score>
           Specify the minimum score of a match.

       -evalue <value>
           Specify the maximum E-value of a match.

       -identity <value>
           Specify minimum identity of match in range [1..100%].

       -sort <mode>
           Sort the matches; the additional argument is the mode:

           la    ascending order of length
           ld    descending order of length
           ia    ascending order of first position
           id    descending order of first position
           ja    ascending order of second position
           jd    descending order of second position
           ea    ascending order of E-value
           ed    descending order of E-value
           sa    ascending order of score
           sd    descending order of score
           ida   ascending order of identity
           idd   descending order of identity
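
           The sort modes map naturally onto a key and a direction. A minimal sketch, assuming a
           hypothetical Match record whose fields stand for the values vmatch reports; it is not
           vmatch's output format or implementation.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Match:
    length: int
    pos1: int        # first position
    pos2: int        # second position
    evalue: float
    score: int
    identity: float


# Hypothetical mapping of the -sort modes to (attribute, descending?).
SORT_MODES = {
    "la": ("length", False), "ld": ("length", True),
    "ia": ("pos1", False), "id": ("pos1", True),
    "ja": ("pos2", False), "jd": ("pos2", True),
    "ea": ("evalue", False), "ed": ("evalue", True),
    "sa": ("score", False), "sd": ("score", True),
    "ida": ("identity", False), "idd": ("identity", True),
}


def sort_matches(matches: list[Match], mode: str) -> list[Match]:
    attr, descending = SORT_MODES[mode]
    return sorted(matches, key=attrgetter(attr), reverse=descending)


ms = [Match(30, 5, 100, 1e-5, 30, 95.0), Match(50, 0, 200, 1e-12, 48, 98.0),
      Match(20, 9, 50, 0.01, 18, 90.0)]
print([m.length for m in sort_matches(ms, "ld")])  # [50, 30, 20]
```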

       -best <n>
           Show the n best matches (those with the smallest E-values); the default is 50.
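
           The filters -leastscore, -evalue and -identity and the -best selection can be combined
           as in this sketch. It assumes each bound is inclusive, that filtering happens before the
           best matches are chosen, and that ranking is by ascending E-value as the option text
           states; the Match fields and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Match:
    evalue: float
    score: int
    identity: float  # percent, 0-100


def select_matches(matches: list[Match], leastscore: Optional[int] = None,
                   max_evalue: Optional[float] = None,
                   min_identity: Optional[float] = None, best: int = 50) -> list[Match]:
    """Apply the optional bounds, then keep the `best` matches with the smallest E-values."""
    kept = [
        m for m in matches
        if (leastscore is None or m.score >= leastscore)
        and (max_evalue is None or m.evalue <= max_evalue)
        and (min_identity is None or m.identity >= min_identity)
    ]
    return sorted(kept, key=lambda m: m.evalue)[:best]


ms = [Match(1e-3, 20, 85.0), Match(1e-10, 40, 97.0), Match(1e-6, 30, 92.0), Match(0.5, 10, 70.0)]
print([m.score for m in select_matches(ms, leastscore=15, best=2)])  # [40, 30]
```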

       -s
           Show the alignment of matching sequences.

       -showdesc
           Show sequence description of match.

       -f
           Show filename where match occurs.

       -absolute
           Show absolute positions.

       -nodist
           Do not show distance of match.

       -noevalue
           Do not show E-value of match.

       -noscore
           Do not show score of match.

       -noidentity
           Do not show identity of match.

       -v
           Verbose mode.

       -version
           Show the version of the Vmatch package.

       -help
           Show basic options.

       -help+
           Show all options.

SEE ALSO

       vmatchselect(1)

                                                                                        VMATCH(1)