Provided by: vsearch_1.1.3+dfsg-1_amd64

NAME

       vsearch — chimera detection, clustering, dereplication, masking, pairwise alignment, searching, shuffling
       and sorting of amplicons from metagenomic projects.

SYNOPSIS

       Chimera detection:
              vsearch --uchime_denovo fastafile (--chimeras | --nonchimeras | --uchimealns | --uchimeout)
              outputfile [options]

              vsearch --uchime_ref fastafile (--chimeras | --nonchimeras | --uchimealns | --uchimeout)
              outputfile --db fastafile [options]

       Clustering:
              vsearch (--cluster_fast | --cluster_size | --cluster_smallmem) fastafile (--alnout | --blast6out |
              --centroids | --clusters | --msaout | --samout | --uc | --userout) outputfile --id real [options]

       Dereplication:
              vsearch --derep_fulllength fastafile (--output | --uc) outputfile [options]

       Masking:
              vsearch --maskfasta fastafile --output outputfile [options]

       Pairwise alignment:
              vsearch --allpairs_global fastafile (--alnout | --blast6out | --matched | --notmatched | --samout
              | --uc | --userout) outputfile (--acceptall | --id real) [options]

       Searching:
              vsearch --usearch_global fastafile --db fastafile (--alnout | --blast6out | --samout | --uc |
              --userout) outputfile --id real [options]

       Shuffling:
              vsearch --shuffle fastafile --output outputfile [options]

       Sorting:
              vsearch (--sortbylength | --sortbysize) fastafile --output outputfile [options]

DESCRIPTION

       Environmental  or clinical molecular diversity studies generate large volumes of amplicons (e.g. SSU-rRNA
       sequences) that need to be checked for chimeras, dereplicated, masked,  sorted,  searched,  clustered  or
       compared to reference sequences. The aim of vsearch is to offer an all-in-one open source tool to perform
       these tasks, using optimized algorithm implementations  and  harvesting  the  full  potential  of  modern
       computers, thus providing fast and accurate data processing.

       Comparing nucleotide sequences is at the core of vsearch. To speed up comparisons, vsearch uses an
       extremely fast implementation of the Needleman-Wunsch algorithm, making use of the Streaming SIMD
       Extensions  (SSE2)  of  modern x86-64 CPUs. If SSE2 instructions are not available, vsearch exits with an
       error message. For comparisons involving sequences longer than 5,000 nucleotides, vsearch uses  a  slower
       alignment method with smaller memory requirements.

   Input
       vsearch input is a fasta file containing one or several nucleotide sequences. For each sequence, the
       sequence identifier is defined as the string between the ">" symbol and the first space or the end of
       the line, whichever comes first. Additionally, if the header line starts with ">[;]size=integer;label",
       contains ">label;size=integer;label" or ends with ">label;size=integer[;]", vsearch removes the
       pattern [;]size=integer[;] from the header and interprets integer as the number of occurrences (or
       abundance) of the sequence in the study. That abundance information is used or created during chimera
       detection, clustering, dereplication, sorting and searching.

       The nucleotide sequence is defined as a string of IUPAC symbols (ACGTURYSWKMDBHVN), starting after the
       end of the identifier line and ending before the next identifier line, or the end of the file. vsearch
       silently ignores ASCII characters 9 to 13, and exits with an error message if ASCII characters 0 to 8,
       14 to 31, "." or "-" are present. All other ASCII or non-ASCII characters are stripped, and a non-fatal
       warning message is issued.
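
       A minimal sketch of these character rules, using a hypothetical clean_sequence helper that returns the
       kept symbols together with the stripped characters (which vsearch would report in its warning). Letter
       case is preserved because it matters when soft masking is used.

```python
from typing import List, Tuple

# IUPAC nucleotide symbols, both cases (case is kept for soft masking)
IUPAC = set("ACGTURYSWKMDBHVNacgturyswkmdbhvn")


def clean_sequence(raw: str) -> Tuple[str, List[str]]:
    """Return (kept symbols, stripped characters); raise ValueError on fatal characters."""
    kept: List[str] = []
    stripped: List[str] = []
    for ch in raw:
        code = ord(ch)
        if ch in IUPAC:
            kept.append(ch)
        elif 9 <= code <= 13:
            continue  # tab, line feed, vertical tab, form feed, carriage return: silently ignored
        elif code <= 8 or 14 <= code <= 31 or ch in ".-":
            raise ValueError(f"illegal character {ch!r} in sequence")
        else:
            stripped.append(ch)  # removed, with a non-fatal warning
    return "".join(kept), stripped
```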

       vsearch operations are case insensitive, except when soft masking is activated.  When  using  clustering,
       masking  or  searching commands, the case is important if soft masking is used. Soft masking is specified
       with the options "--dbmask soft" (for  searching)  or  "--qmask  soft"  (for  searching,  clustering  and
       masking).  When  using soft masking, lower case letters indicate masked symbols, while upper case letters
       indicate regular symbols. Masked symbols are never included in the unique k-mers used in searching.  When
       soft  masking  is  not  activated,  all letters are converted to upper case internally and used in result
       files.

       When comparing sequences during chimera detection, dereplication, searching and clustering, T and  U  are
       considered  identical,  regardless  of their case. If two symbols are not identical, their alignment will
       result in the negative mismatch score (default -4), except if one or both of the  symbols  are  ambiguous
       (RYSWKMDBHVN) in which case the score is zero. Alignment of two identical ambiguous symbols (e.g. R vs R)
       also receives a score of zero.
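
       The scoring rule can be sketched as a small function. The match reward is not given in this section;
       the value of 2 used below is an assumption, and pair_score is a hypothetical name.

```python
AMBIGUOUS = set("RYSWKMDBHVN")


def pair_score(a: str, b: str, match: int = 2, mismatch: int = -4) -> int:
    """Score the alignment of two nucleotide symbols (match=2 is an assumed default)."""
    a, b = a.upper(), b.upper()
    # T and U are considered identical, regardless of case
    a = "T" if a == "U" else a
    b = "T" if b == "U" else b
    if a in AMBIGUOUS or b in AMBIGUOUS:
        return 0  # any ambiguous symbol scores zero, even R vs R
    return match if a == b else mismatch
```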

       vsearch can be compiled to accept compressed fasta files as input (gz and bzip2 formats). However,
       special files like pipes, named pipes, or sockets cannot be used as input. To present a progress
       indicator, vsearch needs to seek to the end of filename to find its length. Consequently, filename must
       be a regular file, not a stream.

   Options
       vsearch recognizes a large number of command-line options. For easier  navigation,  options  are  grouped
       below by theme (chimera detection, clustering, dereplication, masking, pairwise alignment, shuffling,
       sorting, and searching). We start with general options that apply to all themes.

       General options:

              --fasta_width positive integer
                       Fasta files produced by vsearch are wrapped (sequences are written on  lines  of  integer
                       nucleotides, 80 by default). Set that value to 0 to eliminate the wrapping.
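
                        A sketch of the wrapping behavior, with a hypothetical format_fasta helper:

```python
def format_fasta(label: str, sequence: str, width: int = 80) -> str:
    """Format one fasta record, wrapping the sequence every `width` symbols (0 = no wrapping)."""
    lines = [">" + label]
    if width == 0:
        lines.append(sequence)
    else:
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"
```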

              --help   Display a short help and exit.

              --log filename
                       Write  messages  to the specified log file. Information written includes program version,
                       amount of memory available, number of cores and  command  line  options.  The  start  and
                       finish  times are also recorded as well as the elapsed time. The maximum amount of memory
                        consumed is included. The different commands will usually also write some information
                        about their results. Fatal error, warning and informational messages are all written.

              --maxseqlength positive integer
                        All vsearch operations will discard sequences of length equal to or greater than integer
                        (50,000 nucleotides by default).

              --minseqlength positive integer
                       All vsearch  operations  will  discard  sequences  of  length  smaller  than  integer  (1
                       nucleotide   by  default  for  sorting  or  shuffling,  32  nucleotides  for  clustering,
                       dereplication or searching).
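
                        Combined, the two options keep a sequence only if minseqlength <= length < maxseqlength.
                        A sketch with a hypothetical helper; the command-specific minimum must be passed in:

```python
def passes_length_filter(sequence: str, minseqlength: int, maxseqlength: int = 50000) -> bool:
    """Keep a sequence only if minseqlength <= len(sequence) < maxseqlength."""
    return minseqlength <= len(sequence) < maxseqlength
```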

              --notrunclabels
                       Do not truncate sequence labels at first space, use the full header in output files.

              --quiet  Suppress all output to stdout and stderr except for warnings and fatal error messages.

              --version
                       Output version information and exit.

       Chimera detection options:

              Chimera detection is based on a scoring function controlled by  five  options  (--dn,  --mindiffs,
              --mindiv,  --minh,  --xn).  Sequences are first sorted by decreasing abundance (if available), and
              compared on their plus strand only (case insensitive).

              In de novo mode, the input fasta file should contain abundance annotations (pattern [;]size=integer[;]
              in the fasta header). The input order influences the chimera detection, so we recommend sorting
              sequences by decreasing abundance (the default order of the --derep_fulllength output). If your sequence set
              needs to be sorted, please see the --sortbysize command in the sorting section.

              --abskew real
                       When using --uchime_denovo, the  abundance  skew  is  used  to  distinguish  in  a  3-way
                       alignment which sequence is the chimera and which are the parents. The assumption is that
                       chimeras  appear  later  in the PCR amplification process and are therefore less abundant
                       than their parents. The default value is 2.0, which means that the parents should  be  at
                       least  2  times more abundant than their chimera. Any positive value greater than 1.0 can
                       be used.
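
                        A minimal sketch of the skew test, assuming that "at least 2 times more abundant" means
                        parent abundance >= abskew * query abundance; the function names are hypothetical.

```python
from typing import Dict, List


def can_be_parent(parent_abundance: int, query_abundance: int, abskew: float = 2.0) -> bool:
    """A candidate parent must be at least abskew times as abundant as the query."""
    return parent_abundance >= abskew * query_abundance


def candidate_parents(query_abundance: int, abundances: Dict[str, int],
                      abskew: float = 2.0) -> List[str]:
    """Labels of the sequences abundant enough to be parents of the query."""
    return [label for label, size in abundances.items()
            if can_be_parent(size, query_abundance, abskew)]
```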

              --alignwidth positive integer
                       Width of the 3-way alignments in --uchimealns output. The default value is 80. Set  to  0
                       to eliminate wrapping.

              --chimeras filename
                       Output  chimeric sequences to filename, in fasta format. Output order may vary when using
                       multiple threads.

              --db filename
                       When using --uchime_ref, detect chimeras using the  fasta-formatted  reference  sequences
                       contained  in filename. Reference sequences are assumed to be chimera-free. Chimeras will
                       not be detected if their parents (or sufficiently close relatives) are not present in the
                       database.

              --dn real
                       No vote pseudo-count (parameter n in the chimera  scoring  function)  (default  value  is
                       1.4).

              --mindiffs positive integer
                       Minimum number of differences per segment (default value is 3).

              --mindiv real
                       Minimum divergence from closest parent (default value is 0.8).

              --minh real
                       Minimum  score  (h).  Increasing this value tends to reduce the number of false positives
                       and to decrease sensitivity. Default value is 0.28, and values ranging from  0.0  to  1.0
                       included are accepted.

              --nonchimeras filename
                       Output  non-chimeric  sequences  to filename, in fasta format. Output order may vary when
                       using multiple threads.

              --self   When using --uchime_ref, ignore a reference sequence when its label matches the label  of
                       the query sequence (useful to estimate false-positive rate in reference sequences).

              --selfid When  using  --uchime_ref,  ignore  a  reference sequence when its nucleotide sequence is
                       strictly identical with the query sequence.

              --threads positive integer
                        Number of computation threads to use (1 to 256) with --uchime_ref. The number of threads
                        should be less than or equal to the number of available CPU cores. The default is to use
                        all available resources and to launch one thread per logical core.

              --uchime_denovo filename
                       Detect chimeras present in the  fasta-formatted  filename,  without  external  references
                       (i.e.  de  novo).  Automatically  sort  the sequences in filename by decreasing abundance
                       beforehand (see the sorting section for details). Multithreading is not supported.

              --uchime_ref filename
                       Detect chimeras present in the fasta-formatted filename by comparing them with  reference
                       sequences (option --db). Multithreading is supported.

              --uchimealns filename
                        Write the 3-way global alignments (parentA, parentB, chimera) to filename using a
                        human-readable format. Use --alignwidth to modify the alignment width. Output order may
                        vary when using multiple threads.

              --uchimeout filename
                       Write  chimera  detection results to filename using the uchime tab-separated format of 18
                       fields (see the list below). Use --uchimeout5 to use a format compatible with usearch  v5
                        and earlier versions. Row output order may vary when using multiple threads.

                              1.  score: higher score means a more likely chimeric alignment.

                              2.  Q: query sequence label.

                              3.  A: parent A sequence label.

                              4.  B: parent B sequence label.

                              5.  T:  top  parent  sequence  label (i.e. parent most similar to the query). That
                                  field is removed when using --uchimeout5.

                              6.  idQM: percentage of similarity between the query (Q) and the model (M),
                                  constructed from part of parent A and part of parent B.

                              7.  idQA: percentage of similarity of query (Q) and parent A.

                              8.  idQB: percentage of similarity of query (Q) and parent B.

                              9.  idAB: percentage of similarity of parent A and parent B.

                              10. idQT: percentage of similarity of query (Q) and top parent (T).

                              11. LY: yes votes in the left part of the model.

                              12. LN: no votes in the left part of the model.

                              13. LA: abstain votes in the left part of the model.

                              14. RY: yes votes in the right part of the model.

                              15. RN: no votes in the right part of the model.

                              16. RA: abstain votes in the right part of the model.

                              17. div: divergence, defined as (idQM - idQT).

                              18. YN: query is chimeric (Y), or not (N), or is a borderline case (?).

              --uchimeout5
                       When  using  --uchimeout, write chimera detection results using a tab-separated format of
                       17 fields (drop the 5th field of --uchimeout), compatible  with  usearch  version  5  and
                       earlier versions.
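
                        A sketch of reading --uchimeout rows into named fields, following the 18-field list above
                        (17 fields with --uchimeout5). The helper names are hypothetical and values are kept as
                        strings.

```python
from typing import Dict, Iterable, List

UCHIMEOUT_FIELDS = ["score", "Q", "A", "B", "T", "idQM", "idQA", "idQB", "idAB",
                    "idQT", "LY", "LN", "LA", "RY", "RN", "RA", "div", "YN"]
# --uchimeout5 drops the 5th field (T, the top parent)
UCHIMEOUT5_FIELDS = UCHIMEOUT_FIELDS[:4] + UCHIMEOUT_FIELDS[5:]


def parse_uchimeout_line(line: str, uchimeout5: bool = False) -> Dict[str, str]:
    """Map one tab-separated result row to its field names."""
    names = UCHIMEOUT5_FIELDS if uchimeout5 else UCHIMEOUT_FIELDS
    values = line.rstrip("\n").split("\t")
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(values)}")
    return dict(zip(names, values))


def chimeric_queries(lines: Iterable[str], uchimeout5: bool = False) -> List[str]:
    """Labels of the queries flagged as chimeric (YN == 'Y'); borderline '?' rows are skipped."""
    records = (parse_uchimeout_line(line, uchimeout5) for line in lines)
    return [rec["Q"] for rec in records if rec["YN"] == "Y"]
```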

              --xn real
                       No vote weight (parameter beta in the scoring function) (default value is 8.0).

       Clustering options:

              vsearch  implements  a  single-pass,  greedy  star-clustering algorithm, similar to the algorithms
              implemented in usearch, DNAclust and sumaclust for example. Important parameters  are  the  global
              clustering threshold (--id) and the pairwise identity definition (--iddef).

              --centroids filename
                       Output  cluster  centroid  sequences  to  filename,  in fasta format. The centroid is the
                       sequence that seeded the cluster (i.e. the first sequence of the cluster).

              --cluster_fast filename
                        Cluster the fasta sequences in filename, after automatically sorting them by decreasing
                        sequence length.

              --cluster_size filename
                        Cluster the fasta sequences in filename, after automatically sorting them by decreasing
                        sequence abundance.

              --cluster_smallmem filename
                        Cluster the fasta sequences in filename without automatically modifying their order
                        beforehand. Sequences are expected to be sorted by decreasing sequence length, unless
                        --usersort is used.
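
                        The greedy star clustering can be sketched as below, in the --cluster_fast order
                        (decreasing length). This is a simplification: each sequence joins the first existing
                        centroid (in creation order) whose identity reaches the threshold, which is an assumption
                        rather than vsearch's candidate selection, and the default identity function is difflib's
                        SequenceMatcher.ratio(), a stand-in that matches none of the --iddef definitions.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List


def stand_in_identity(a: str, b: str) -> float:
    # NOT a vsearch identity definition; used only to keep the sketch self-contained
    return SequenceMatcher(None, a, b).ratio()


def greedy_cluster(seqs: Dict[str, str], threshold: float,
                   identity: Callable[[str, str], float] = stand_in_identity) -> Dict[str, List[str]]:
    """Return {centroid label: member labels}, centroid first."""
    # --cluster_fast order: decreasing length (the sort is stable, so ties keep input order)
    order = sorted(seqs, key=lambda label: len(seqs[label]), reverse=True)
    clusters: Dict[str, List[str]] = {}
    for label in order:
        for centroid in clusters:
            if identity(seqs[label], seqs[centroid]) >= threshold:
                clusters[centroid].append(label)  # close enough: join this cluster
                break
        else:
            clusters[label] = [label]  # no centroid close enough: seed a new cluster
    return clusters
```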

              --clusters string
                        Output each cluster to a separate fasta file using the prefix string and a counter (0, 1,
                        2, etc.) to construct the path and filenames.

              --consout filename
                       Output cluster consensus sequences to filename. For each cluster, a multiple alignment is
                       computed,  and  a  consensus  sequence  is  constructed  by  taking  the  majority symbol
                       (nucleotide or gap) from each column of the alignment. Columns containing a  majority  of
                       gaps are skipped, except for terminal gaps.
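
                        A minimal sketch of majority-rule consensus on an existing multiple alignment. It assumes
                        that "except for terminal gaps" means terminal gaps are not counted in a column, and that
                        ties go to the symbol seen first; the function name is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def consensus(alignment: List[str]) -> str:
    """Majority-rule consensus of equal-length aligned rows ('-' marks a gap)."""
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    # span [lo, hi) between the first and last residue of each row; gaps outside
    # that span are terminal gaps and are not counted (assumption)
    spans: List[Tuple[int, int]] = []
    for row in alignment:
        lo = len(row) - len(row.lstrip("-"))
        hi = len(row.rstrip("-"))
        spans.append((lo, hi))
    result = []
    for col in range(width):
        counts = Counter(row[col] for row, (lo, hi) in zip(alignment, spans) if lo <= col < hi)
        if not counts:
            continue
        symbol, _ = counts.most_common(1)[0]  # ties: the first symbol encountered wins
        if symbol != "-":  # columns dominated by internal gaps are skipped
            result.append(symbol)
    return "".join(result)
```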

              --id real
                       Do  not add the target to the cluster if the pairwise identity with the centroid is lower
                        than real (value ranging from 0.0 to 1.0 included). The pairwise identity is defined as
                        (matching columns) / (alignment length - terminal gaps). That definition
                       can be modified by --iddef.

              --iddef 0|1|2|3|4
                       Change the pairwise identity definition used in --id. Values accepted are:

                              0.  CD-HIT definition: (matching columns) / (shortest sequence length).

                              1.  edit distance: (matching columns) / (alignment length).

                              2.  edit distance excluding terminal gaps (same as --id).

                              3.  Marine Biological Lab definition  counting  each  extended  gap  (internal  or
                                  terminal) as a single difference: 1.0 - [(mismatches + gaps)/(longest sequence
                                  length)]

                              4.  BLAST  definition,  equivalent  to  --iddef  2 in a context of global pairwise
                                  alignment.
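
                        The five definitions can be computed from a pairwise alignment given as two gapped rows of
                        equal length ("-" marks a gap). In this sketch, terminal gap columns are assumed to be the
                        leading and trailing gap runs of either row, and T/U equivalence and ambiguous symbols are
                        ignored for brevity.

```python
from typing import Dict, Set


def _gap_runs(row: str) -> int:
    """Number of gaps in a row, counting each run of consecutive '-' once."""
    runs, in_gap = 0, False
    for ch in row:
        if ch == "-" and not in_gap:
            runs += 1
        in_gap = ch == "-"
    return runs


def _terminal_columns(row: str) -> Set[int]:
    """Columns belonging to the leading or trailing gap run of a row."""
    n = len(row)
    lead = n - len(row.lstrip("-"))
    trail = n - len(row.rstrip("-"))
    return set(range(lead)) | set(range(n - trail, n))


def identities(q: str, t: str) -> Dict[int, float]:
    """Pairwise identity under --iddef 0 to 4, for two aligned rows of equal length."""
    if len(q) != len(t):
        raise ValueError("aligned rows must have the same length")
    q, t = q.upper(), t.upper()
    matches = sum(1 for a, b in zip(q, t) if a == b and a != "-")
    mismatches = sum(1 for a, b in zip(q, t) if a != b and "-" not in (a, b))
    gap_events = _gap_runs(q) + _gap_runs(t)
    terminal = len(_terminal_columns(q) | _terminal_columns(t))
    q_len, t_len = len(q.replace("-", "")), len(t.replace("-", ""))
    aln_len = len(q)
    id2 = matches / (aln_len - terminal)
    return {
        0: matches / min(q_len, t_len),                          # CD-HIT
        1: matches / aln_len,                                    # edit distance
        2: id2,                                                  # excluding terminal gaps
        3: 1.0 - (mismatches + gap_events) / max(q_len, t_len),  # Marine Biological Lab
        4: id2,                                                  # BLAST, global context
    }
```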

              --msaout filename
                       Output a multiple sequence alignment  and  a  consensus  sequence  for  each  cluster  to
                       filename,  in  fasta format. The consensus sequence is constructed by taking the majority
                       symbol (nucleotide or gap) from each  column  of  the  alignment.  Columns  containing  a
                       majority of gaps are skipped, except for terminal gaps.

              --qmask none|dust|soft
                        Mask simple repeats and low-complexity regions in sequences using the dust or the soft
                        algorithms, or do not mask (none). Warning: when using soft masking, clustering becomes
                        case sensitive. The default is to mask using dust.

              --sizein Take  into  account the abundance annotations present in the input fasta file (search for
                       the pattern "[>;]size=integer[;]" in sequence headers).

              --sizeout
                       Add abundance annotations to the output fasta files (add the pattern ";size=integer;"  to
                       sequence headers). If --sizein is specified, abundance annotations are reported to output
                       files,  and  each  cluster  centroid  receives a new abundance value corresponding to the
                       total abundance of the  amplicons  included  in  the  cluster  (--centroids  option).  If
                       --sizein is not specified, input abundances are set to 1 for amplicons, and to the number
                       of amplicons per cluster for centroids.

              --strand plus|both
                       When  comparing  sequences with the cluster seed, check the plus strand only (default) or
                       check both strands.

              --threads positive integer
                        Number of computation threads to use (1 to 256). The number of threads should be less
                        than or equal to the number of available CPU cores. The default is to use all available
                        resources and to launch one thread per logical core.

              --uc filename
                       Output clustering results in filename using a uclust-like format. For  a  description  of
                       the format, see <http://www.drive5.com/usearch/manual/ucout.html>.

              --usersort
                       When  using  --cluster_smallmem,  allow  any  sequence input order, not just a decreasing
                       length ordering.

              Most searching options also apply to clustering:
                       --alnout, --blast6out, --fastapairs, --matched, --notmatched,  --maxaccept,  --maxreject,
                        --samout, --userout, --userfields, score filtering, gap penalties, masking (see the
                        Searching section).

       Dereplication options:

              --derep_fulllength filename
                       Merge strictly identical sequences contained in filename. Identical sequences are defined
                       as having the same length and the same string of nucleotides (case insensitive, T  and  U
                       are considered the same).

              --maxuniquesize positive integer
                       Discard sequences with an abundance value greater than integer.

              --minuniquesize positive integer
                       Discard sequences with an abundance value smaller than integer.

              --output filename
                       Write  the  dereplicated  sequences to filename, in fasta format and sorted by decreasing
                       abundance. Identical sequences receive the header of the first sequence of  their  group.
                       If  --sizeout  is  used,  the  number of occurrences (i.e. abundance) of each sequence is
                       indicated at the end of their fasta header using the pattern ";size=integer;".

              --sizein Take into account the abundance annotations present in the input fasta file  (search  for
                       the pattern "[>;]size=integer[;]" in sequence headers).

              --sizeout
                       Add  abundance  annotations to the output fasta file (add the pattern ";size=integer;" to
                       sequence headers).  If --sizein  is  specified,  each  unique  sequence  receives  a  new
                       abundance  value  corresponding  to  its  total  abundance  (sum of the abundances of its
                       occurrences). If --sizein is not specified, input abundances  are  set  to  1,  and  each
                       unique sequence receives a new abundance value corresponding to its number of occurrences
                       in the input file.

              --strand plus|both
                       When  searching for strictly identical sequences, check the plus strand only (default) or
                       check both strands.

              --topn positive integer
                       Output only the top integer sequences (i.e. the most abundant).

              --uc filename
                       Output dereplication results in filename using a uclust-like format. For a description of
                       the format, see  <http://www.drive5.com/usearch/manual/ucout.html>.  In  the  context  of
                       dereplication, the option --uc_allhits has no effect on the --uc output.
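
               The dereplication rules above can be sketched as follows. Records are (label, sequence, abundance)
               tuples, with abundance 1 when --sizein is not used; the helper names are hypothetical, and applying
               --minuniquesize/--maxuniquesize before --topn, and keeping input order among equally abundant
               sequences, are assumptions.

```python
from typing import Dict, List, Optional, Tuple

# IUPAC complements, so that ambiguous symbols are reverse-complemented too
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

Record = Tuple[str, str, int]  # (label, sequence, abundance)


def _key(seq: str) -> str:
    # identity is case insensitive and treats U as T
    return seq.upper().replace("U", "T")


def _revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def dereplicate(records: List[Record], strand_both: bool = False, minuniquesize: int = 1,
                maxuniquesize: Optional[int] = None, topn: Optional[int] = None) -> List[Record]:
    groups: Dict[str, list] = {}
    for label, seq, size in records:
        key = _key(seq)
        if strand_both and key not in groups and _revcomp(key) in groups:
            key = _revcomp(key)  # --strand both: a reverse-complement match joins that group
        if key in groups:
            groups[key][2] += size  # sum the abundances of identical sequences
        else:
            groups[key] = [label, seq, size]  # header and sequence of the first occurrence
    uniques: List[Record] = [(g[0], g[1], g[2]) for g in groups.values()
                             if g[2] >= minuniquesize
                             and (maxuniquesize is None or g[2] <= maxuniquesize)]
    uniques.sort(key=lambda rec: rec[2], reverse=True)  # stable: ties keep input order
    return uniques if topn is None else uniques[:topn]
```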

       Masking options:

              An  input  sequence  can be composed of lower- or uppercase nucleotides. Lowercase nucleotides are
              silently set to uppercase before masking, unless the --qmask soft option is  used.  Here  are  the
              results  of  combined masking options --qmask (or --dbmask for database sequences) and --hardmask,
              assuming each input sequence contains both lower and uppercase nucleotides:

              qmask   hardmask                       action
              ───────────────────────────────────────────────────────────────────
              none    off        no masking, all symbols uppercased
              none    on         no masking, all symbols uppercased
              dust    off        masked symbols lowercased, others uppercased
              dust    on         masked symbols changed to Ns, others uppercased
              soft    off        lowercase symbols masked, no case changes
              soft    on         lowercase symbols masked and changed to Ns
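
               The table can be expressed as a small function. Dust itself is not implemented: the hypothetical
               dust_positions argument stands in for the positions that the dust algorithm would mask.

```python
from typing import AbstractSet


def apply_masking(seq: str, qmask: str, hardmask: bool,
                  dust_positions: AbstractSet[int] = frozenset()) -> str:
    """Return seq as transformed by a --qmask/--dbmask value and --hardmask."""
    if qmask == "none":
        return seq.upper()  # no masking; --hardmask has no effect
    if qmask == "dust":
        masked = dust_positions  # hypothetical stand-in for the dust algorithm's output
        base = seq.upper()
    elif qmask == "soft":
        masked = {i for i, ch in enumerate(seq) if ch.islower()}
        base = seq  # no case changes
    else:
        raise ValueError("qmask must be 'none', 'dust' or 'soft'")
    out = []
    for i, ch in enumerate(base):
        if i in masked:
            out.append("N" if hardmask else ch.lower())
        else:
            out.append(ch)
    return "".join(out)
```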

              --hardmask
                       Mask low-complexity regions by replacing them with Ns instead of setting  them  to  lower
                       case.

              --maskfasta filename
                       Mask  simple  repeats  and low-complexity regions in sequences contained in filename. The
                       default is to mask using dust (use --qmask to modify that behavior).

              --output filename
                       Write the masked sequences to filename, in fasta format.

              --qmask none|dust|soft
                       Mask simple repeats and low-complexity regions in sequences using the dust  or  the  soft
                       algorithms, or do not mask (none). The default is to mask using dust.

              --threads positive integer
                        Number of computation threads to use (1 to 256). The number of threads should be less
                        than or equal to the number of available CPU cores. The default is to use all available
                        resources and to launch one thread per logical core.

       Pairwise alignment options:

              The  results  of the n * (n - 1) / 2 pairwise alignments are written to the result files specified
              with --alnout, --blast6out, --fastapairs, --matched, --notmatched, --samout, --uc or --userout (see
              Searching section below). Specify either the --acceptall option to output all pairwise alignments,
              or specify an identity level with --id  to  discard  weak  alignments.  Most  other  accept/reject
              options (see Searching options below) may also be used. Sequences are aligned on their plus strand
              only.
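
               A sketch of the all-vs-all loop, with a caller-supplied identity function (hypothetical); passing
               min_id=None plays the role of --acceptall. Each unordered pair is visited once, which gives the
               n * (n - 1) / 2 alignments.

```python
from itertools import combinations
from typing import Callable, List, Optional, Sequence, Tuple


def allpairs(seqs: Sequence[str], identity: Callable[[str, str], float],
             min_id: Optional[float] = None) -> List[Tuple[int, int, float]]:
    """Align every unordered pair once; min_id=None keeps all pairs (like --acceptall)."""
    hits = []
    for i, j in combinations(range(len(seqs)), 2):  # n * (n - 1) / 2 pairs
        pid = identity(seqs[i], seqs[j])  # plus strand only
        if min_id is None or pid >= min_id:
            hits.append((i, j, pid))
    return hits
```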

              --acceptall
                       Write  the  results  of  all  alignments to output files. This option overrides all other
                       accept/reject options (including --id).

              --allpairs_global filename
                       Perform optimal global pairwise alignments of all vs. all fasta  sequences  contained  in
                       filename. This command is multi-threaded.

              --id real
                       Reject the sequence match if the pairwise identity is lower than real (value ranging from
                       0.0 to 1.0 included).

              --threads positive integer
                        Number of computation threads to use (1 to 256). The number of threads should be less
                        than or equal to the number of available CPU cores. The default is to use all available
                        resources and to launch one thread per logical core.

       Searching options:

              --alnout filename
                        Write pairwise global alignments to filename using a human-readable format. Use --rowlen
                        to modify the width of alignment lines. Output order may vary when using multiple threads.

              --blast6out filename
                        Write search results to filename using a blast-like tab-separated format of twelve fields
                        (listed below), with one line per query-target match (or per query without a match, if
                        --output_no_hits is used). Output order may vary when using multiple threads. A similar
                        output can be obtained with --userout filename and --userfields
                        query+target+id+alnlen+mism+opens+qlo+qhi+tlo+thi+evalue+bits. A complete list and
                        description is available in the section "Userfields" of this manual.

                              1.  query: query label.

                              2.  target: target (database sequence) label. The field is set to "*" if there  is
                                  no alignment.

                              3.  id:  percentage  of  identity  (real  value  ranging  from  0.0 to 100.0). The
                                  percentage identity is defined as 100 * (matching columns) / (alignment length
                                  - terminal gaps). See fields id0 to id4 for other definitions.

                              4.  alnlen: length of the query-target alignment (number of columns). The field is
                                  set to 0 if there is no alignment.

                              5.  mism: number of mismatches in the alignment (zero or positive integer value).

                              6.  opens: number of columns containing a gap opening (zero  or  positive  integer
                                  value).

                              7.  qlo:  first nucleotide of the query aligned with the target. Always equal to 1
                                  if there is an alignment, 0 otherwise.

                              8.  qhi: last nucleotide of the query aligned with the target. Always equal to the
                                  length of the pairwise alignment. The field  is  set  to  0  if  there  is  no
                                  alignment.

                              9.  tlo: first nucleotide of the target aligned with the query. Always equal to 1
                                  if there is an alignment, 0 otherwise.

                              10. thi: last nucleotide of the target aligned with the query. Always equal to the
                                  length of the pairwise alignment. The field  is  set  to  0  if  there  is  no
                                  alignment.

                              11. evalue: expectation value (not computed for nucleotide alignments). Always set
                                  to -1.

                              12. bits: bit score (not computed for nucleotide alignments). Always set to 0.
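
                        A sketch of reading --blast6out rows into the twelve fields above; the class and function
                        names are hypothetical, and a target of "*" (no alignment) is mapped to None.

```python
from typing import NamedTuple, Optional


class Blast6Hit(NamedTuple):
    query: str
    target: Optional[str]  # None when the row reports a query without alignment ("*")
    id: float
    alnlen: int
    mism: int
    opens: int
    qlo: int
    qhi: int
    tlo: int
    thi: int
    evalue: float
    bits: float


def parse_blast6_line(line: str) -> Blast6Hit:
    f = line.rstrip("\n").split("\t")
    if len(f) != 12:
        raise ValueError(f"expected 12 fields, got {len(f)}")
    return Blast6Hit(
        query=f[0],
        target=None if f[1] == "*" else f[1],
        id=float(f[2]),
        alnlen=int(f[3]), mism=int(f[4]), opens=int(f[5]),
        qlo=int(f[6]), qhi=int(f[7]), tlo=int(f[8]), thi=int(f[9]),
        evalue=float(f[10]), bits=float(f[11]),
    )
```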

              --db filename
                       Compare query sequences (specified with --usearch_global) to the  fasta-formatted  target
                       sequences contained in filename, using global pairwise alignment.

              --dbmask none|dust|soft
                        Mask simple repeats and low-complexity regions in target database sequences using the
                        dust or the soft algorithms, or do not mask (none). Warning: when using soft masking,
                        search commands become case sensitive. The default is to mask using dust.

              --dbmatched filename
                       Write  database  target  sequences  matching  at least one query sequence to filename, in
                       fasta format. If the option --sizeout is used, the number of queries  that  matched  each
                       target sequence is indicated using the pattern ";size=integer;".

              --dbnotmatched filename
                       Write  database  target  sequences  not  matching  query  sequences to filename, in fasta
                       format.

              --fastapairs filename
                       Write pairwise alignments of query and target sequences to filename, in fasta format.

              --fulldp Dummy option for compatibility with usearch. To maximize search sensitivity, vsearch uses
                        an 8-way 16-bit SIMD vectorized full dynamic programming algorithm (Needleman-Wunsch),
                       whether or not --fulldp is specified.

              --gapext string
                       Set  penalties  for  a  gap  extension.  See  --gapopen for a complete description of the
                       penalty declaration system. The default is to initialize the six gap extension penalties
                       using  a  penalty  of  2  for  extending  internal  gaps and a penalty of 1 for extending
                       terminal gaps, in both query and target sequences (i.e. 2I/1E).

              --gapopen string
                       Set penalties for a gap opening. A gap opening can occur in six  different  contexts:  in
                       the  query  (Q)  or in the target (T) sequence, at the left (L) or right (R) extremity of
                       the sequence, or inside the sequence (I). Sequence symbols (Q and T) can be combined with
                       location symbols (L, I, and R),  and  numerical  values  to  declare  penalties  for  all
                       possible  contexts:  aQL/bQI/cQR/dTL/eTI/fTR, where abcdef are zero or positive integers,
                       and "/" is used as a separator.
                       To simplify declarations, the location symbols (L, I, and R) can be combined, the  symbol
                       (E)  can be used to treat both extremities (L and R) equally, and the symbols Q and T can
                       be omitted to treat query and target sequences equally. For instance, the default  is  to
                       declare a penalty of 20 for opening internal gaps and a penalty of 2 for opening terminal
                       gaps  (left  or  right),  in  both  query  and  target sequences (i.e. 20I/2E). If only a
                       numerical value is given, without any sequence  or  location  symbol,  then  the  penalty
                       applies to all gap openings. To forbid gap openings, an infinite penalty can be
                       declared with the symbol "*". To use vsearch as a semi-global aligner, a zero penalty can
                       be applied to the left (L) or right (R) terminal gaps.
                       vsearch always initializes the six gap opening penalties using the default parameters
                       (20I/2E). The user is then free to declare only the values they want to modify. The
                       string is scanned from left to right, the accepted symbols are (0123456789/LIREQT*), and
                       later values override previous values.
                       Please  note  that  vsearch,  in  contrast to usearch, only allows integer gap penalties.
                       Because the lowest gap penalties are 0.5 by default in usearch, all  default  scores  and
                       gap  penalties  in  vsearch  have  been  doubled  to maintain equivalent penalties and to
                       produce identical alignments.
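                       The declaration system above can be sketched in Python. parse_penalties is a hypothetical
                       helper, not part of vsearch: it applies the rules as described here (tokens separated by
                       "/", a leading integer or "*", Q/T omitted meaning both sequences, L/I/R/E omitted meaning
                       all locations, later tokens overriding earlier ones). It is not vsearch's own parser, so
                       edge cases may differ.

```python
import math

def parse_penalties(spec, defaults):
    """Apply a gap penalty string such as "20I/2E" on top of a defaults dict.

    Keys are (sequence, location) pairs: sequence in "QT", location in "LIR".
    """
    penalties = dict(defaults)
    for token in filter(None, spec.split("/")):
        # Leading value: an integer, or "*" for an infinite penalty.
        digits = len(token) - len(token.lstrip("0123456789"))
        if digits:
            value, symbols = int(token[:digits]), token[digits:]
        elif token.startswith("*"):
            value, symbols = math.inf, token[1:]
        else:
            raise ValueError(f"missing value in {token!r}")
        if set(symbols) - set("QTLIRE"):
            raise ValueError(f"unexpected symbol in {token!r}")
        # Omitted Q/T means both sequences; E stands for both extremities (L and R).
        seqs = [s for s in "QT" if s in symbols] or ["Q", "T"]
        locs = (set(symbols.replace("E", "LR")) & set("LIR")) or set("LIR")
        for seq in seqs:
            for loc in locs:
                penalties[(seq, loc)] = value
    return penalties

# Defaults described above: 20I/2E for gap openings, 2I/1E for gap extensions.
GAPOPEN_DEFAULT = parse_penalties("20I/2E", {})
GAPEXT_DEFAULT = parse_penalties("2I/1E", {})
```

                       For example, parse_penalties("0L", GAPOPEN_DEFAULT) makes left terminal gap openings
                       free in both sequences (semi-global at the left end), and "*" forbids all gap openings.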

              --hardmask
                       Mask low-complexity regions by replacing them with Ns instead of setting  them  to  lower
                       case. For more information, please see the Masking section.

              --id real
                       Reject the sequence match if the pairwise identity is lower than real (value ranging from
                       0.0  to  1.0 included). The search process sorts target sequences by decreasing number of
                       k-mers they have in common with the query sequence, using that information as a proxy for
                       sequence similarity. That efficient pre-filtering will also prevent  pairwise  alignments
                       with  weakly matching targets, as there needs to be at least 6 shared k-mers to start the
                       pairwise alignment, and at least one out of every 16 k-mers from the query needs to match
                       the target. Consequently, using values lower than --id 0.5 is not likely to capture  more
                       weakly matching targets. By default, the pairwise identity is defined as (matching
                       columns) / (alignment length - terminal gaps). That definition can be modified with
                       --iddef.
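                       The k-mer pre-filter described above can be illustrated with a short Python sketch. It
                       assumes that distinct shared words are counted, and it reads "one out of every 16" as
                       shared * 16 >= number of query k-mers. vsearch's actual index, masking and counting
                       details are not described here; the function names are hypothetical.

```python
def kmers(seq, k=8):
    """Distinct words of length k (8 is the default --wordlength)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def candidate_targets(query, targets, k=8, min_shared=6):
    """Targets worth aligning, ranked by decreasing number of shared k-mers.

    targets maps labels to sequences. A target is kept only if it shares at
    least min_shared k-mers with the query and at least one out of every 16
    query k-mers (assumed reading of the rule above).
    """
    query_words = kmers(query, k)
    ranked = []
    for label, seq in targets.items():
        shared = len(query_words & kmers(seq, k))
        if shared >= min_shared and shared * 16 >= len(query_words):
            ranked.append((shared, label))
    # Highest k-mer count first; ties are ordered by label for determinism.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return [label for _, label in ranked]
```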

              --iddef 0|1|2|3|4
                       Change the pairwise identity definition used in --id. Values accepted are:

                              0.  CD-HIT definition: (matching columns) / (shortest sequence length).

                              1.  edit distance: (matching columns) / (alignment length).

                              2.  edit distance excluding terminal gaps (same as --id).

                              3.  Marine  Biological  Lab  definition  counting  each  extended gap (internal or
                                  terminal) as a single difference: 1.0 - [(mismatches + gaps)/(longest sequence
                                  length)]

                              4.  BLAST definition, equivalent to --iddef 2 in  a  context  of  global  pairwise
                                  alignment.

                       The  option  --userfields  accepts the fields id0 to id4, in addition to the field id, to
                       report the pairwise identity values corresponding to the different definitions.
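                       To make the definitions concrete, here is a Python sketch that computes --iddef 0 to 3
                       from the two rows of a global alignment (gaps written as "-"). It follows the formulas as
                       stated above, returns fractions (as used by --id) rather than percentages, and treats
                       gap columns before the first and after the last gap-free column as terminal gaps.
                       vsearch's internal bookkeeping may differ in edge cases. --iddef 4 is left out because
                       it is defined above by reference to --iddef 2.

```python
def gap_runs(row):
    """Number of gaps in an aligned row, each extended gap counted once."""
    return sum(1 for i, c in enumerate(row) if c == "-" and (i == 0 or row[i - 1] != "-"))

def identities(qrow, trow):
    """Pairwise identity under --iddef 0..3 for two aligned rows of equal length."""
    if len(qrow) != len(trow):
        raise ValueError("aligned rows must have the same length")
    cols = list(zip(qrow, trow))
    matches = sum(1 for q, t in cols if q == t != "-")
    mismatches = sum(1 for q, t in cols if q != t and "-" not in (q, t))
    gapped = [("-" in (q, t)) for q, t in cols]
    # Terminal gaps: gap columns before the first / after the last gap-free column.
    lead = next((i for i, g in enumerate(gapped) if not g), len(cols))
    trail = next((i for i, g in enumerate(reversed(gapped)) if not g), len(cols))
    internal_len = max(len(cols) - lead - trail, 0)
    qlen = len(qrow) - qrow.count("-")
    tlen = len(trow) - trow.count("-")

    def ratio(num, den):
        return num / den if den else 0.0

    diffs = mismatches + gap_runs(qrow) + gap_runs(trow)
    return {
        0: ratio(matches, min(qlen, tlen)),   # CD-HIT: shortest sequence length
        1: ratio(matches, len(cols)),         # whole alignment, terminal gaps included
        2: ratio(matches, internal_len),      # default (--id): terminal gaps excluded
        3: 1.0 - ratio(diffs, max(qlen, tlen)),  # MBL: each extended gap counts once
    }
```

                       For the rows "--ACGTACGT" and "TTACGAACGT" (7 matches, 1 mismatch, one 2-column terminal
                       gap), definitions 0 and 2 give 7/8, definition 1 gives 7/10, and definition 3 gives 0.8.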

              --idprefix positive integer
                       Reject the sequence match if the first integer nucleotides of the target do not match the
                       query.

              --idsuffix positive integer
                       Reject the sequence match if the last integer nucleotides of the target do not match  the
                       query.

              --leftjust
                       Reject the sequence match if the pairwise alignment begins with gaps.

              --match integer
                       Score  assigned  to  a  match (i.e. identical nucleotides) in the pairwise alignment. The
                       default value is 2.

              --matched filename
                       Write query sequences matching database target sequences to filename, in fasta format.

              --maxaccepts positive integer
                       Maximum number of hits to accept before stopping the search. The default value is 1. This
                       option works together with --maxrejects. The search process sorts target sequences by
                       decreasing number of k-mers they have in common with the query sequence, using that
                       information as a proxy for sequence similarity. After pairwise alignment, if the first
                       target sequence passes the acceptance criteria, it is accepted as the best hit and the
                       search process stops for that query. If --maxaccepts is set to a higher value, more hits
                       are accepted. If --maxaccepts and --maxrejects are both set to 0, the complete database
                       is searched.

              --maxdiffs positive integer
                       Reject the sequence match if the  alignment  contains  at  least  integer  substitutions,
                       insertions or deletions.

              --maxgaps positive integer
                       Reject  the  sequence  match  if  the  alignment  contains at least integer insertions or
                       deletions.

              --maxhits positive integer
                       Maximum number of hits to show  once  the  search  is  terminated  (hits  are  sorted  by
                       decreasing identity). Unlimited by default. That option applies to --alnout, --blast6out,
                       --fastapairs, --samout, --uc, or --userout output files.

              --maxid real
                       Reject  the  sequence  match  if  the percentage of identity between the two sequences is
                       greater than real.

              --maxqsize positive integer
                       Reject query sequences with an abundance greater than integer.

              --maxqt real
                       Reject if the query/target sequence length ratio is greater than real.

              --maxrejects positive integer
                       Maximum number of non-matching target sequences to consider before stopping the search.
                       The default value is 32. This option works together with --maxaccepts. The search process
                       sorts target sequences by decreasing number of k-mers they have in common with the query
                       sequence, using that information as a proxy for sequence similarity. After pairwise
                       alignment, if none of the first 32 examined target sequences passes the acceptance
                       criteria, the search process stops for that query (no hit). If --maxrejects is set to a
                       higher value, more target sequences are considered. If --maxaccepts and --maxrejects are
                       both set to 0, the complete database is searched.
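                       The interplay of --maxaccepts and --maxrejects can be sketched as a loop over candidate
                       targets already ranked by shared k-mers. This Python sketch is an illustration, not
                       vsearch's implementation: the accept function stands for all acceptance criteria (--id
                       and the other reject options), and treating 0 as "no limit" for each option on its own is
                       an assumption, since the text above only describes the case where both are 0.

```python
def search(candidates, accept, maxaccepts=1, maxrejects=32):
    """Return accepted hits, examining candidates in ranked order.

    candidates: targets sorted by decreasing number of shared k-mers.
    accept: function returning True if the aligned target passes all criteria.
    A limit of 0 is treated as unlimited (assumption, see text).
    """
    hits, rejects = [], 0
    for target in candidates:
        if accept(target):
            hits.append(target)
            if maxaccepts and len(hits) >= maxaccepts:
                break  # enough hits: stop searching for this query
        else:
            rejects += 1
            if maxrejects and rejects >= maxrejects:
                break  # too many failures: give up on this query
    return hits
```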

              --maxsizeratio real
                       Reject if the query/target abundance ratio is greater than real.

              --maxsl real
                       Reject if the shorter/longer sequence length ratio is greater than real.

              --maxsubs positive integer
                       Reject  the  sequence  match  if  the  pairwise  alignment  contains  more  than  integer
                       substitutions.

              --mid real
                       Reject the sequence match if the percentage of identity is lower than real (ignoring  all
                       gaps, internal and terminal).

              --mincols positive integer
                       Reject the sequence match if the alignment length is shorter than integer.

              --minqt real
                       Reject if the query/target sequence length ratio is lower than real.

              --minsizeratio real
                       Reject if the query/target abundance ratio is lower than real.

              --minsl real
                       Reject if the shorter/longer sequence length ratio is lower than real.

              --mintsize positive integer
                       Reject target sequences with an abundance lower than integer.

              --mismatch integer
                       Score  assigned to a mismatch (i.e. different nucleotides) in the pairwise alignment. The
                       default value is -4.

              --notmatched filename
                       Write query sequences not matching  database  target  sequences  to  filename,  in  fasta
                       format.

              --output_no_hits
                       Write  both  matching  and  non-matching  queries  to  --alnout, --blast6out, --samout or
                       --userout output files (--uc and --uc_allhits output files  always  feature  non-matching
                       queries). Non-matching queries are labelled "No hits" in --alnout files.

              --qmask none|dust|soft
                       Mask simple repeats and low-complexity regions in query sequences using the dust or the
                       soft algorithm, or do not mask (none). Warning: when using soft masking, search commands
                       become case sensitive. The default is to mask using dust.

              --query_cov real
                       Reject  if  the  fraction of the query aligned to the target sequence is lower than real.
                       The query coverage is computed  as  (matches  +  mismatches)  /  query  sequence  length.
                       Internal or terminal gaps are not taken into account.

              --rightjust
                       Reject the sequence match if the pairwise alignment ends with gaps.

              --rowlen positive integer
                       Width  of  alignment  lines  in  --alnout  output.  The  default value is 64. Set to 0 to
                       eliminate wrapping.

              --samout filename
                       Write alignment results to filename in the SAM format. For a description of  the  format,
                       see  <https://github.com/samtools/hts-specs>.  Output  order may vary when using multiple
                       threads.

              --self   Reject the sequence match if the query and target labels are identical.

              --selfid Reject the sequence match if the query and target sequences are strictly identical.

              --sizeout
                       Add abundance annotations to the output of the  option  --dbmatched  (using  the  pattern
                       ";size=integer;"), to report the number of queries that matched each target.

              --strand plus|both
                       When  searching for similar sequences, check the plus strand only (default) or check both
                       strands.

              --target_cov real
                       Reject the sequence match if the fraction of the target sequence  aligned  to  the  query
                       sequence  is lower than real. The target coverage is computed as (matches + mismatches) /
                       target sequence length.  Internal or terminal gaps are not taken into account.
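                       Both coverage values can be derived from the aln user field (M, D and I columns, where D
                       is a gap in the query and I a gap in the target): matches + mismatches is the number of M
                       columns. In this Python sketch the function name is hypothetical and the sequence lengths
                       are passed in explicitly. Note that --query_cov and --target_cov take fractions, while
                       the qcov and tcov user fields report percentages.

```python
def coverage(aln, query_length, target_length):
    """Return (query_cov, target_cov) as fractions from an M/D/I alignment string."""
    if set(aln) - set("MDI"):
        raise ValueError("aln must contain only M, D and I")
    aligned = aln.count("M")  # columns with a nucleotide in both sequences
    return aligned / query_length, aligned / target_length
```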

              --threads positive integer
                       Number of computation threads to use (1 to 256). The number of threads should be less than
                       or equal to the number of available CPU cores. The default is to use all available
                       resources and to launch one thread per logical core.

              --top_hits_only
                       Output only the hits with the highest percentage of identity with the query.

              --uc filename
                       Output searching results in filename using a uclust-like format. For a description of the
                       format, see <http://www.drive5.com/usearch/manual/ucout.html>. Output order may vary when
                       using multiple threads.

              --uc_allhits
                       When using the --uc option, show all hits, not just the top hit for each query.

              --usearch_global filename
                       Compare target sequences (--db) to  the  fasta-formatted  query  sequences  contained  in
                       filename, using global pairwise alignment.

              --userfields string
                       When  using --userout, select and order the fields written to the output file. Fields are
                       separated by "+" (e.g. query+target+id). See the "Userfields" section for a complete list
                       of fields.

              --userout filename
                       Write user-defined tab-separated output to filename. Select the fields  with  the  option
                       --userfields. Output order may vary when using multiple threads. If --userfields is empty
                       or not present, filename is empty.

              --weak_id real
                       Show hits with a percentage of identity of at least real, without terminating the search.
                       A normal search stops as soon as enough hits are found (as defined by --maxaccepts,
                       --maxrejects, and --id). As weak hits reported by --weak_id do not count toward
                       --maxaccepts, high --id values can be used, hence preserving both speed and sensitivity.
                       Logically, real must be smaller than the value indicated by --id.

              --wordlength positive integer
                       Length of words (i.e. k-mers) for database indexing. The range of  possible  values  goes
                       from  3  to  15, but values near 8 are generally recommended. Longer words may reduce the
                       sensitivity for weak similarities, but can increase accuracy. On the other hand,  shorter
                       words  may increase sensitivity, but can reduce accuracy. Computation time will generally
                       increase with shorter words and decrease with longer words.  Memory  requirements  for  a
                       part of the index increase by a factor of 4 each time the word length increases by one
                       nucleotide, and this generally becomes significant for  long  words  (12  or  more).  The
                       default value is 8.
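                       The factor of 4 follows from the four-letter nucleotide alphabet: there are 4^k possible
                       words of length k. The Python sketch below only counts possible words; that this count
                       drives the size of the relevant part of the index is an assumption, and the sketch does
                       not model vsearch's actual memory layout.

```python
def possible_words(k):
    """Number of distinct nucleotide words of length k over A, C, G and T."""
    if not 3 <= k <= 15:
        raise ValueError("--wordlength accepts values from 3 to 15")
    return 4 ** k

for k in (8, 9, 12):
    # word length, possible words, growth relative to the default of 8
    print(k, possible_words(k), possible_words(k) // possible_words(8))
# 8 65536 1
# 9 262144 4
# 12 16777216 256
```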

       Shuffling options:

              --output filename
                       Write the shuffled sequences to filename, in fasta format.

              --seed positive integer
                       When  shuffling sequence order, use integer as seed. A given seed will always produce the
                       same output order (useful for replicability). Set  to  0  to  use  a  pseudo-random  seed
                       (default behavior).

              --shuffle filename
                       Pseudo-randomly shuffle the order of sequences contained in filename.

              --topn positive integer
                       Output only the top integer sequences.
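              The effect of --seed can be illustrated with Python's random module: a fixed non-zero seed
              reproduces the same order, while 0 draws an unpredictable seed. This only mirrors the documented
              behavior; vsearch uses its own generator, so the orders produced here will not match vsearch
              output. The function name is hypothetical.

```python
import random

def shuffle_records(records, seed=0, topn=None):
    """Shuffle records reproducibly for seed != 0; seed 0 means an unpredictable order."""
    rng = random.Random(seed if seed != 0 else None)  # None: seeded from the OS
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled if topn is None else shuffled[:topn]
```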

       Sorting options:
              Fasta entries are sorted by decreasing abundance (--sortbysize) or sequence length
              (--sortbylength). To obtain a stable sorting order, ties are broken by decreasing abundance and
              then by label in increasing alphanumerical order (--sortbylength), or just by label in
              increasing alphanumerical order (--sortbysize). Label sorting assumes that all sequences have
              unique labels. The same applies to the automatic sorting performed during chimera checking
              (--uchime_denovo), dereplication (--derep_fulllength), and clustering (--cluster_fast and
              --cluster_size).
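              The ordering rules can be written as sort keys. In this Python sketch each record is a
              hypothetical (label, sequence, abundance) tuple, and Python's default string comparison
              (character code order) stands in for "alphanumerical order", which may not match vsearch's
              exact collation.

```python
def sort_by_length(records):
    """--sortbylength: decreasing length, then decreasing abundance, then label."""
    return sorted(records, key=lambda r: (-len(r[1]), -r[2], r[0]))

def sort_by_size(records):
    """--sortbysize: decreasing abundance, then label."""
    return sorted(records, key=lambda r: (-r[2], r[0]))

records = [("b", "ACGT", 5), ("a", "ACGT", 5), ("c", "ACGTA", 1), ("d", "AC", 9)]
print([r[0] for r in sort_by_length(records)])  # ['c', 'a', 'b', 'd']
print([r[0] for r in sort_by_size(records)])    # ['d', 'a', 'b', 'c']
```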

              --maxsize positive integer
                       When using --sortbysize, discard sequences with an abundance value greater than integer.

              --minsize positive integer
                       When using --sortbysize, discard sequences with an abundance value smaller than integer.

              --output filename
                       Write the sorted sequences to filename, in fasta format.

              --relabel string
                       Relabel sequences using the prefix string and a ticker (1, 2, 3, etc.) to construct the
                       new headers. Use --sizeout to conserve the abundance annotations.

              --sizeout
                       When  using  --relabel,  report abundance annotations to the output fasta file (using the
                       pattern ";size=integer;").

              --sortbylength filename
                       Sort by decreasing length the sequences contained in filename. See  the  general  options
                       --minseqlength and --maxseqlength to eliminate short and long sequences.

              --sortbysize filename
                       Sort   by   decreasing  abundance  the  sequences  contained  in  filename  (the  pattern
                       "[>;]size=integer[;]" has to be present). See the  options  --minsize  and  --maxsize  to
                       eliminate rare and dominant sequences.

              --topn positive integer
                       Output only the top integer sequences (i.e. the longest or the most abundant).
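              The following Python sketch shows how abundance annotations matching "[>;]size=integer[;]" can
              be read, and how --relabel combined with --sizeout could build new headers. The helper names are
              hypothetical, and the exact header layout written by vsearch (for instance the trailing ";") is
              an assumption based on the ";size=integer;" pattern given above.

```python
import re

SIZE_PATTERN = re.compile(r"[>;]size=(\d+);?")

def abundance(header):
    """Return the integer abundance in a fasta header, or None if absent."""
    match = SIZE_PATTERN.search(header)
    return int(match.group(1)) if match else None

def relabel(headers, prefix, sizeout=False):
    """New headers: prefix plus a ticker (1, 2, 3, ...), keeping ;size= if asked."""
    new_headers = []
    for ticker, header in enumerate(headers, start=1):
        label = f">{prefix}{ticker}"
        size = abundance(header)
        if sizeout and size is not None:
            label += f";size={size};"
        new_headers.append(label)
    return new_headers
```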

       Userfields (fields accepted by the --userfields option):

              aln      Print  a  string  of M (match), D (delete, i.e. a gap in the query) and I (insert, i.e. a
                       gap in the target) representing the pairwise  alignment.  Empty  field  if  there  is  no
                       alignment.

              alnlen   Print the length of the query-target alignment (number of columns). The field is set to 0
                       if there is no alignment.

              bits     Bit score (not computed for nucleotide alignments). Always set to 0.

              caln     Compact  representation  of  the  pairwise  alignment  using  the  CIGAR  format (Compact
                       Idiosyncratic Gapped Alignment Report): M (match), D (deletion) and I (insertion).  Empty
                       field if there is no alignment.
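                       The relation between aln and caln is run-length encoding. The Python sketch below compacts
                       an aln string; whether vsearch omits a run length of 1 (as in "2D4MIM") is an assumption
                       here, so the sketch makes it optional. The function name is hypothetical.

```python
from itertools import groupby

def compact_alignment(aln, omit_ones=True):
    """Run-length encode an M/D/I alignment string into CIGAR-like notation."""
    parts = []
    for op, run in groupby(aln):
        length = sum(1 for _ in run)
        # Optionally drop the count for single-column runs.
        parts.append(op if omit_ones and length == 1 else f"{length}{op}")
    return "".join(parts)
```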

              evalue   E-value (not computed for nucleotide alignments). Always set to -1.

              exts     Number of columns containing a gap extension (zero or positive integer value).

              gaps     Number of columns containing a gap (zero or positive integer value).

              id       Percentage of identity (real value ranging from 0.0 to 100.0). The percentage identity is
                       defined as 100 * (matching columns) / (alignment length - terminal gaps).

              id0      CD-HIT  definition  of  the percentage of identity (real value ranging from 0.0 to 100.0)
                       using the length of the shortest sequence in the pairwise alignment as denominator: 100 *
                       (matching columns) / (shortest sequence length).

              id1      The percentage of identity (real value ranging from 0.0 to 100.0) is defined as the  edit
                       distance: 100 * (matching columns) / (alignment length).

              id2      The  percentage of identity (real value ranging from 0.0 to 100.0) is defined as the edit
                       distance, excluding terminal gaps. The field id2 is an alias for the field id.

              id3      Marine Biological Lab definition of the percentage of identity (real value  ranging  from
                       0.0  to  100.0), counting each extended gap (internal or terminal) as a single difference
                       and using the length of the longest sequence in the pairwise  alignment  as  denominator:
                       100 * (1.0 - [(mismatches + gaps) / (longest sequence length)]).

              id4      BLAST  definition  of  the percentage of identity (real value ranging from 0.0 to 100.0),
                       equivalent to --iddef 2 in a context of global pairwise alignment.

              ids      Number of matches in the alignment (zero or positive integer value).

              mism     Number of mismatches in the alignment (zero or positive integer value).

              opens    Number of columns containing a gap opening (zero or positive integer value).

              pairs    Number of columns containing only nucleotides. That value corresponds to  the  length  of
                       the alignment minus the gap-containing columns (zero or positive integer value).

              pctgaps  Number of columns containing gaps expressed as a percentage of the alignment length (real
                       value ranging from 0.0 to 100.0).

              pctpv    Percentage  of  positive  columns.  When  working  with  nucleotide  sequences,  this  is
                       equivalent to the percentage of matches (real value ranging from 0.0 to 100.0).

              pv       Number of positive columns. When working with nucleotide sequences, this is equivalent to
                       the number of matches (zero or positive integer value).

              qcov     Percentage of the query sequence that is aligned with the target sequence (real value
                       ranging  from  0.0  to  100.0).  The  query  coverage  is  computed as 100.0 * (matches +
                       mismatches) / query sequence length.  Internal  or  terminal  gaps  are  not  taken  into
                       account. The field is set to 0.0 if there is no alignment.

              qframe   Query  frame (-3 to +3). That field only concerns coding sequences and is not computed by
                       vsearch. Always set to +0.

              qhi      Last nucleotide of the query aligned with the target. Always equal to the length  of  the
                       pairwise alignment. The field is set to 0 if there is no alignment.

              qihi     Last nucleotide of the query aligned with the target (ignoring terminal gaps). Nucleotide
                       numbering starts from 1. The field is set to 0 if there is no alignment.

              qilo     First nucleotide of the query aligned with the target (ignoring initial gaps). Nucleotide
                       numbering starts from 1. The field is set to 0 if there is no alignment.

              ql       Query  sequence  length  (positive  integer  value). The field is set to 0 if there is no
                       alignment.

              qlo      First nucleotide of the query aligned with the target. Always equal to 1 if there  is  an
                       alignment, 0 otherwise.

              qrow     Print  the sequence of the query segment as seen in the pairwise alignment (i.e. with gap
                       insertions if need be). Empty field if there is no alignment.

              qs       Query segment length. Always equal to query sequence length.

              qstrand  Query strand orientation (+ or - for nucleotide sequences). Empty field if  there  is  no
                       alignment.

              query    Query label.

              raw      Raw  alignment  score (negative, null or positive integer value). The score is the sum of
                       match rewards minus mismatch penalties, gap openings and gap extensions. The field is set
                       to 0 if there is no alignment.

              target   Target label. The field is set to "*" if there is no alignment.

              tcov     Percentage of the target sequence that is aligned with the query sequence (real value
                       ranging  from  0.0  to  100.0).  The  target  coverage  is computed as 100.0 * (matches +
                       mismatches) / target sequence length.  Internal or  terminal  gaps  are  not  taken  into
                       account.  The field is set to 0.0 if there is no alignment.

              tframe   Target frame (-3 to +3). That field only concerns coding sequences and is not computed by
                       vsearch. Always set to +0.

              thi      Last  nucleotide  of the target aligned with the query. Always equal to the length of the
                       pairwise alignment. The field is set to 0 if there is no alignment.

              tihi     Last nucleotide of the target aligned with the query (ignoring terminal gaps). Nucleotide
                       numbering starts from 1. The field is set to 0 if there is no alignment.

              tilo     First nucleotide of the target aligned with the query (ignoring initial gaps). Nucleotide
                       numbering starts from 1. The field is set to 0 if there is no alignment.

              tl       Target sequence length (positive integer value). The field is set to 0  if  there  is  no
                       alignment.

              tlo      First  nucleotide  of the target aligned with the query. Always equal to 1 if there is an
                       alignment, 0 otherwise.

              trow     Print the sequence of the target segment as seen in the pairwise alignment (i.e. with gap
                       insertions if need be). Empty field if there is no alignment.

              ts       Target segment length. Always equal to target sequence length. The field is set to  0  if
                       there is no alignment.

              tstrand  Target  strand  orientation  (+  or  -  for  nucleotide sequences). Always set to "+", so
                       reverse strand matches have tstrand "+" and qstrand "-".  Empty  field  if  there  is  no
                       alignment.

DELIBERATE CHANGES

       If  you are a usearch user, our objective is to make you feel at home. That's why vsearch was designed to
       behave like usearch, to some extent. Like any complex software, usearch  is  not  free  from  quirks  and
       inconsistencies.  We  decided  not  to reproduce some of them, and for complete transparency, to document
       here the deliberate changes we made.

       During a search with usearch, when using the options --blast6out and --output_no_hits, for  queries  with
       no match the number of fields reported is 13, where it should be 12. This is corrected in vsearch.

       The field raw of the --userfields option is not informative in usearch. This is corrected in vsearch.

       The  fields  qlo,  qhi,  tlo,  thi  now  have  counterparts  (qilo, qihi, tilo, tihi) reporting alignment
       coordinates ignoring terminal gaps.

       In usearch, when using the option --output_no_hits, queries that receive no match are reported in the
       blast6out file, but not in the alignment output file. This is corrected in vsearch.

       vsearch  introduces  a  new  --cluster_size  command  that sorts sequences by decreasing abundance before
       clustering.

       vsearch reintroduces --iddef alternative pairwise identity definitions that were removed from usearch.

       vsearch extends the --topn option to sorting commands.

       vsearch  extends   the   --sizein   option   to   dereplication   (--derep_fulllength)   and   clustering
       (--cluster_fast).

       vsearch treats T and U as identical nucleotides during dereplication.

       vsearch sorting is stabilized by using sequence abundances or sequence labels as secondary or tertiary
       keys.

NOVELTIES

       vsearch introduces new options not present in usearch 7. They are described in the "Options"  section  of
       this manual. Here is a short list:

       - alignwidth (chimera checking)

       - cluster_size (clustering)

       - fasta_width (general option)

       - iddef (clustering, pairwise alignment, searching)

       - maxuniquesize (dereplication)

       - shuffle (shuffling)

EXAMPLES

       Align all sequences in a database with each other and output all pairwise alignments:

              vsearch --allpairs_global database.fas --alnout results.aln --acceptall

       Check  for  the  presence  of chimeras (de novo); parents should be at least 1.5 times more abundant than
       chimeras. Output non-chimeric sequences in fasta format (no wrapping):

              vsearch --uchime_denovo queries.fas --nonchimeras results.fas --fasta_width 0 --abskew 1.5

       Cluster with a 97% similarity threshold, collect cluster centroids, and write cluster descriptions  using
       a uclust-like format:

              vsearch --cluster_fast queries.fas --id 0.97 --centroids centroids.fas --uc clusters.uc

       Dereplicate  the  sequences contained in queries.fas, take into account the abundance information already
       present, write unwrapped sequences to output with the new abundance information,  discard  all  sequences
       with an abundance of 1:

              vsearch --derep_fulllength queries.fas --output queries_derep.fas --sizein --sizeout
              --fasta_width 0 --minuniquesize 2

       Mask simple repeats and low complexity regions in the input fasta file (masked regions  are  lowercased),
       and write the results to the output file:

              vsearch --maskfasta queries.fas --output queries_masked.fas --qmask dust

       Search queries in a reference database with an 80% similarity threshold, taking terminal gaps into
       account when calculating pairwise similarities:

              vsearch --usearch_global queries.fas --db references.fas --alnout results.aln --id 0.8 --iddef 1

       Search a sequence dataset against itself (ignore self hits), get all matches with at least 60%  identity,
       and collect results in a blast-like tab-separated format:

              vsearch  --usearch_global  queries.fas --db queries.fas --id 0.6 --self --blast6out results.blast6
              --maxaccepts 0 --maxrejects 0

       Shuffle the input fasta file (change the order of sequences) in a repeatable fashion  (fixed  seed),  and
       write unwrapped fasta sequences to the output file:

              vsearch --shuffle queries.fas --output queries_shuffled.fas --seed 13 --fasta_width 0

       Sort the sequences contained in queries.fas by decreasing abundance (using the "size=integer" information),
       relabel the sequences while preserving the abundance information (with --sizeout), and keep only sequences
       with an abundance equal to or greater than 2:

              vsearch   --sortbysize   queries.fas  --output  queries_sorted.fas  --relabel  sampleA_  --sizeout
              --minsize 2
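       The combined effect of --sortbysize, --relabel, --sizeout and --minsize can be sketched in Python. Ties in
       abundance are broken by label, following the v1.0.1 note that labels are the secondary sort key. Numbering
       the new labels from 1 after the prefix is an assumption, and the function name is hypothetical.

```python
import re

SIZE_RE = re.compile(r";size=(\d+);?")

def sort_by_size(records, relabel="sampleA_", sizeout=True, minsize=1):
    """Filter, sort and relabel (header, sequence) pairs by abundance."""
    kept = []
    for header, seq in records:
        m = SIZE_RE.search(header)
        size = int(m.group(1)) if m else 1
        if size >= minsize:
            kept.append((size, header, seq))
    kept.sort(key=lambda r: (-r[0], r[1]))  # abundance desc, then label
    out = []
    for i, (size, _header, seq) in enumerate(kept, 1):
        label = f"{relabel}{i}"             # numbering from 1 is assumed
        if sizeout:
            label += f";size={size};"       # keep abundance after relabeling
        out.append((label, seq))
    return out
```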

AUTHORS

       Implementation by Torbjørn Rognes and Tomás Flouri, documentation by Frédéric Mahé.

REPORTING BUGS

       Submit suggestions and bug reports at <https://github.com/torognes/vsearch/issues>, send a pull request
       on <https://github.com/torognes/vsearch>, or compose a friendly or curmudgeonly e-mail to Torbjørn Rognes
       <torognes@ifi.uio.no>.

AVAILABILITY

       Source code and binaries are available at <https://github.com/torognes/vsearch>.

COPYRIGHT

       Copyright (C) 2014, 2015 Torbjørn Rognes, Frédéric Mahé and Tomás Flouri.

       This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero
       General Public License as published by the Free Software Foundation, either version 3 of the License,  or
       any later version.

       This  program  is  distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even
       the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU  Affero  General
       Public License for more details.

       You  should  have  received  a copy of the GNU Affero General Public License along with this program.  If
       not, see <http://www.gnu.org/licenses/>.

       vsearch includes code from Google's CityHash project by Geoff Pike and Jyrki Alakuijala, providing some
       excellent hash functions available under an MIT license.

       vsearch includes code derived from Tatusov and Lipman's DUST program that is in the public domain.

       vsearch binaries may include code from the zlib library, copyright Jean-Loup Gailly and Mark Adler.

       vsearch binaries may include code from the bzip2 library, copyright Julian R. Seward.

SEE ALSO

       swipe,  an  extremely  fast  pairwise  local  (Smith-Waterman)  database  search tool by Torbjørn Rognes,
       available at <https://github.com/torognes/swipe>.

       swarm, a fast and accurate amplicon clustering method by Frédéric Mahé and Torbjørn Rognes, available  at
       <https://github.com/torognes/swarm>.

VERSION HISTORY

       New features and important modifications of vsearch (short-lived or minor bug releases may not be
       mentioned):

              v1.0.0 released November 28th, 2014
                     First public release.

              v1.0.1 released December 1st, 2014
                     Bug fixes (sortbysize, semicolon after  size  annotation  in  headers)  and  minor  changes
                     (labels as secondary sort key for most sorts, treat T and U as identical for dereplication,
                     only output size in dbmatched file if sizeout specified).

              v1.0.2 released December 6th, 2014
                     Bug fixes (ssse3/sse4.1 requirement, memory leak).

              v1.0.3 released December 6th, 2014
                     Bug fix (now writes help to stdout instead of stderr).

              v1.0.4 released December 8th, 2014
                     Added --allpairs_global option. Reduced memory requirements slightly. Removed memory leaks.

              v1.0.5 released December 9th, 2014
                     Fixes a minor bug with --allpairs_global and --acceptall options.

              v1.0.6 released December 14th, 2014
                     Fixes a memory allocation bug in chimera detection (--uchime_ref option).

              v1.0.7 released December 19th, 2014
                     Fixes a bug in the output from chimera detection with the --uchimeout option.

              v1.0.8 released January 22nd, 2015
                     Introduces several changes and bug fixes:

                     - a new linear memory aligner for alignment of sequences longer than 5,000 nucleotides,

                     - a  new  --cluster_size  command  that  sorts  sequences  by  decreasing  abundance before
                       clustering,

                     - meaning of userfields qlo, qhi, tlo, thi changed for compatibility with usearch,

                     - new userfields qilo, qihi, tilo, tihi give alignment coordinates ignoring terminal gaps,

                     - in --uc output files, a perfect alignment is indicated with a "=" sign,

                     - the option  --cluster_fast  will  now  sort  sequences  by  decreasing  length,  then  by
                       decreasing abundance and finally by sequence identifier,

                     - default --maxseqlength value set to 50,000 nucleotides,

                     - fix for bug in alignment in rare cases,

                     - fix for lack of detection of under- or overflow in SIMD aligner.
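                      The new --cluster_fast input order described above can be expressed as a sort key in
                      Python. The sketch assumes records of the form (label, sequence, abundance) and that the
                      final tie-break on the identifier is ascending; the function name is hypothetical.

```python
def cluster_fast_order(records):
    """Sort (label, sequence, abundance) records for --cluster_fast-style input.

    Keys: decreasing length, then decreasing abundance, then label
    (ascending order for the label is an assumption).
    """
    return sorted(records, key=lambda r: (-len(r[1]), -r[2], r[0]))
```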

              v1.0.9 released January 22nd, 2015
                     Fixes a bug in the function sorting sequences by decreasing abundance (--sortbysize).

              v1.0.10 released January 23rd, 2015
                     Fixes  a  bug  where  the  sizein  option  was  ignored and always treated as on, affecting
                     clustering and dereplication commands.

              v1.0.11 released February 5th, 2015
                     Introduces the possibility to output  results  in  SAM  format  (for  clustering,  pairwise
                     alignment and searching).

              v1.0.12 released February 6th, 2015
                     Temporarily fixes a problem with long headers in FASTA files.

              v1.0.13 released February 17th, 2015
                     Fix  a  memory  allocation  problem  when  computing  multiple sequence alignments with the
                     --msaout and --consout options, as well as a memory leak.  Also increased line  buffer  for
                     reading FASTA files to 4MB.

              v1.0.14 released February 17th, 2015
                     Fix  a  bug  where  the multiple alignment and consensus sequence computed after clustering
                     ignored the strand of the sequences.  Also decreased size of line buffer for reading  FASTA
                     files to 1MB again due to excessive stack memory usage.

              v1.0.15 released February 18th, 2015
                     Fix  bug  in calculation of identity metric between sequences when using the MBL definition
                     (--iddef 3).
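                      For reference, the MBL identity can be sketched in Python, assuming the definition given for
                      --iddef 3 in the vsearch option list: 1.0 - (mismatches + gap openings) / (length of the
                      longer sequence), where each gap opening counts once however long the gap is. The function
                      name is hypothetical and this is not the code that was fixed in this release.

```python
def mbl_identity(q_aln, t_aln):
    """MBL-style identity from two aligned strings ('-' marks a gap).

    Assumed definition: 1 - (mismatches + gap openings) / longer sequence length.
    A run of consecutive gap columns in the same sequence is one opening.
    """
    if len(q_aln) != len(t_aln):
        raise ValueError("aligned strings must have the same length")
    mismatches = gap_opens = 0
    prev = None  # which sequence carried a gap in the previous column, if any
    for a, b in zip(q_aln.upper(), t_aln.upper()):
        if a == "-" or b == "-":
            state = "query" if a == "-" else "target"
            if state != prev:
                gap_opens += 1
            prev = state
        else:
            prev = None
            if a != b:
                mismatches += 1
    longest = max(len(q_aln.replace("-", "")), len(t_aln.replace("-", "")))
    return 1.0 - (mismatches + gap_opens) / longest
```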

              v1.0.16 released February 19th, 2015
                     Integrated patches from Debian for increased compatibility with various architectures.

              v1.1.0 released February 20th, 2015
                     Added the --quiet option to suppress all output to stdout and stderr except for warnings
                     and fatal errors. Added the --log option to write messages to a log file.

              v1.1.1 released February 20th, 2015
                     Added info about --log and --quiet options to help text.

              v1.1.2 released March 18th, 2015
                     Fix bug with large datasets. Fix format of help info.

              v1.1.3 released March 18th, 2015
                     Fix more bugs with large datasets.

version 1.1.3                                    March 18, 2015                                       vsearch(1)