Provided by: swarm_3.1.0+dfsg-1_amd64

NAME

       swarm — find clusters of nearly-identical nucleotide amplicons

SYNOPSIS

       swarm -h|-v

       High-precision clustering:

       swarm [filename]

       swarm [-d 1] [-nrz] [-a int] [-i filename] [-l filename] [-o filename] [-s filename]
             [-t int] [-u filename] [-w filename] [filename]

       swarm [-d 1] -f [-nrz] [-a int] [-b int] [-c|-y int] [-i filename] [-l filename]
             [-o filename] [-s filename] [-t int] [-u filename] [-w filename] [filename]

       Conservative clustering:

       swarm -d 2+ [-nrz] [-a int] [-e int] [-g int] [-i filename] [-l filename] [-m int]
             [-o filename] [-p int] [-s filename] [-t int] [-u filename] [-w filename] [filename]

       Dereplication (merge strictly identical sequences):

       swarm -d 0 [-rz] [-a int] [-i filename] [-l filename] [-o filename] [-s filename]
             [-u filename] [-w filename] [filename]

DESCRIPTION

       Environmental or clinical molecular studies generate large volumes of amplicons (e.g., 16S
       or 18S SSU-rRNA sequences) that need to be grouped into clusters.  Traditional  clustering
       methods are based on greedy, input-order dependent algorithms, with arbitrary selection of
       cluster centroids and cluster limits (often 97% similarity). To address that problem, we
       developed swarm, a fast and robust method that recursively groups amplicons with d or fewer
       differences (i.e., substitutions, insertions or deletions). swarm produces natural and
       stable  clusters  centered  on  local  peaks  of  abundance,  mostly free from input-order
       dependency induced by centroid selection.

       Exact clustering is impractical on large data sets when using a naïve all-vs-all  approach
       (more precisely a 2-combination without repetitions), as it implies unrealistic numbers of
       pairwise comparisons. swarm is based on a maximum number  of  differences  d  between  two
       amplicons,  and  focuses  only  on  very close local relationships. For d = 1, the default
       value, swarm uses an algorithm of linear complexity that  generates  all  possible  single
       mutations  and  performs  exact-string  matching  by  comparing  hash-values. For d = 2 or
       greater, swarm uses an algorithm of quadratic complexity  that  performs  pairwise  string
       comparisons. Efficient k-mer-based filtering and an astute reuse of comparison results
       obtained during the clustering process allow swarm to avoid most of the amplicon
       comparisons needed in a naïve approach. To speed up the remaining amplicon comparisons,
       swarm implements an extremely fast Needleman-Wunsch algorithm making use of the  Streaming
       SIMD  Extensions  (SSE2)  of  x86-64 CPUs, NEON instructions of ARM64 CPUs, or Altivec/VMX
       instructions of POWER8 CPUs. If SSE2 instructions are not available, swarm exits  with  an
       error message.
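       The two strategies above can be illustrated with a minimal Python sketch (not swarm's
       C++ code; all function names are hypothetical). microvariants() enumerates every sequence
       one substitution, insertion or deletion away from an amplicon, and a Python set stands in
       for swarm's table of hashed sequences. may_be_within() shows the general idea of a k-mer
       prefilter for d >= 2: one edit removes at most k k-mers and adds at most k new ones, so
       presence/absence vectors differing in more than 2kd positions cannot belong to amplicons
       within d differences. swarm's actual threshold and data layout may differ.

```python
NUCLEOTIDES = "ACGT"


def microvariants(seq):
    """Every sequence exactly one substitution, insertion or deletion away from seq."""
    variants = set()
    for i in range(len(seq)):
        variants.add(seq[:i] + seq[i + 1:])  # deletion at position i
        for n in NUCLEOTIDES:
            if n != seq[i]:
                variants.add(seq[:i] + n + seq[i + 1:])  # substitution at position i
    for i in range(len(seq) + 1):
        for n in NUCLEOTIDES:
            variants.add(seq[:i] + n + seq[i:])  # insertion at offset i
    return variants


def d1_neighbours(seq, known):
    """Sequences of `known` (any hashed container) exactly one difference away from seq."""
    return sorted(v for v in microvariants(seq) if v in known)


def kmer_profile(seq, k=5):
    """Presence/absence of the 4**k possible k-mers, packed into an integer bit vector."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    bits = 0
    for i in range(len(seq) - k + 1):
        index = 0
        for base in seq[i:i + k]:
            index = index * 4 + code[base]
        bits |= 1 << index
    return bits


def may_be_within(a, b, d, k=5):
    """False: a and b certainly differ by more than d edits. True: worth aligning them."""
    # each edit flips at most 2k entries of the presence/absence vector
    differing = bin(kmer_profile(a, k) ^ kmer_profile(b, k)).count("1")
    return differing <= 2 * k * d
```

With k = 5, the vector has 4^5 = 1024 positions; only pairs passing the filter would go on
to a full pairwise alignment.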

       swarm  can  read  nucleotide  amplicons  in  fasta  format  from a normal file or from the
       standard input (using a pipe or a redirection). The amplicon  header  is  defined  as  the
       string  comprised  between  the  '>'  symbol  and  the first space or the end of the line,
       whichever comes first. Header length is currently limited to 2048 characters (including
       '>', a linefeed and a final null character). Each header must end with an abundance
       annotation representing the amplicon copy number and defined as '_' followed by a positive
       integer. See option -z for input data using usearch/vsearch's abundance annotation format
       (';size=integer[;]'). Once stripped of the abundance annotation, the remaining part of
       the header is called the label. In summary:

                         >header[[:blank:]]   and   header = label_[1-9][0-9]*$

       Abundance  annotations play a crucial role in the clustering process, and swarm exits with
       an error message if that information is not available. As swarm outputs lists of  amplicon
       labels,  amplicon  labels must be unique to avoid any ambiguity; swarm exits with an error
       message if labels are not unique. The amplicon sequence is defined as a string  of  [ACGT]
       or  [ACGU] symbols (case insensitive, 'U' is replaced with 'T' internally), starting after
       the end of the header line and ending before the next header line or the file  end;  swarm
       silently  removes  newline  symbols  ('\n' or '\r') and exits with an error message if any
       other symbol is present. Lastly, if sequences are not all unique, i.e. were  not  properly
       dereplicated, swarm will exit with an error message.
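       The input rules above (header boundary, abundance suffix, -z style, allowed symbols,
       uniqueness) can be checked with a short Python sketch. It is not swarm's parser: the
       function names are hypothetical, and the default_abundance parameter is only an assumed
       approximation of what option -a does (the whole header is kept as the label).

```python
import re

# swarm style: label followed by '_' and a positive integer without leading zeros
SWARM_STYLE = re.compile(r"^(?P<label>.*)_(?P<abundance>[1-9][0-9]*)$")
# usearch/vsearch style (option -z): label, ';size=integer', optional final ';'
USEARCH_STYLE = re.compile(r"^(?P<label>.*);size=(?P<abundance>[1-9][0-9]*);?$")


def parse_header(line, usearch_style=False, default_abundance=None):
    """Return (label, abundance) from a fasta header line."""
    if not line.startswith(">"):
        raise ValueError("not a fasta header: " + line)
    # the header stops at the first blank (space or tab) or at the end of the line
    header = re.split(r"[ \t]", line[1:].rstrip("\r\n"), maxsplit=1)[0]
    match = (USEARCH_STYLE if usearch_style else SWARM_STYLE).match(header)
    if match:
        return match.group("label"), int(match.group("abundance"))
    if default_abundance is not None:
        # assumed -a behaviour: keep the whole header as the label
        return header, default_abundance
    raise ValueError("missing abundance annotation: " + header)


def normalise_sequence(lines):
    """Join sequence lines, drop '\\n'/'\\r', uppercase, turn U into T, reject other symbols."""
    seq = "".join(lines).replace("\r", "").replace("\n", "").upper().replace("U", "T")
    illegal = set(seq) - set("ACGT")
    if illegal:
        raise ValueError("illegal symbols: " + "".join(sorted(illegal)))
    return seq


def check_unique(entries):
    """entries: iterable of (label, sequence); labels and sequences must all be unique."""
    labels, sequences = set(), set()
    for label, seq in entries:
        if label in labels:
            raise ValueError("duplicated label: " + label)
        if seq in sequences:
            raise ValueError("duplicated sequence (not dereplicated): " + label)
        labels.add(label)
        sequences.add(seq)
```

The greedy label group means that a header such as 'a_b_10' yields the label 'a_b' and an
abundance of 10.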

       Clusters  are  written  to  output  files (specified with -i, -o, -s and -u) by decreasing
       abundance of their seed sequences, and then by alphabetical order of seed sequence labels.
       An  exception  to  that  is the -w (--seeds) output, which is sorted by decreasing cluster
       abundance (sum of abundances of all sequences in the cluster), and  then  by  alphabetical
       order of seed sequence labels. This is particularly useful for post-clustering steps, such
       as de novo chimera detection, that require clusters to be sorted by decreasing abundances.
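       A minimal Python sketch of the two orderings, assuming a cluster is represented as a list
       of (label, abundance) pairs with the seed first, and assuming plain code-point comparison
       of labels for "alphabetical order" (swarm's exact collation is not specified here). The
       function names are hypothetical.

```python
def sort_for_main_outputs(clusters):
    """Order of -i, -o, -s and -u: decreasing seed abundance, then seed label."""
    return sorted(clusters, key=lambda c: (-c[0][1], c[0][0]))


def sort_for_seeds_output(clusters):
    """Order of -w: decreasing total cluster abundance, then seed label."""
    return sorted(clusters, key=lambda c: (-sum(ab for _, ab in c), c[0][0]))


clusters = [[("s1", 10)], [("s2", 8), ("x", 5)], [("s0", 10)]]
# seed order for -o: s0, s1, s2 (s2 has the least abundant seed)
# seed order for -w: s2, s0, s1 (s2 has the largest total abundance, 13)
```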

   General options
       -h, --help
                display this help and exit successfully.

       -t, --threads positive integer
                 number of computation threads to use. Values between 1 and 256 are accepted, but
                 we recommend using a number of threads less than or equal to the number of
                 available CPU cores. Default number of threads is 1.

       -v, --version
                output version information and exit successfully.

       --       delimit the option list. Later arguments, if any, are treated as operands even if
                they  begin  with  '-'.  For  example, 'swarm -- -file.fasta' reads from the file
                '-file.fasta'.

   Clustering options
       -d, --differences zero or positive integer
                 maximum number of differences allowed between two amplicons, meaning that two
                 amplicons will be grouped if they have integer (or fewer) differences. This is
                 swarm's most important parameter. The number of differences is calculated as the
                 number of mismatches (substitutions, insertions or deletions) between the two
                 amplicons once the optimal pairwise global alignment has been found (see
                 'pairwise alignment advanced options' to influence that step). Any integer from
                 0 to 255 can be used, but high d values will decrease the taxonomical resolution
                 of swarm results. Commonly used d values are 1, 2 or 3, rarely higher. When using
                 d = 0, swarm will output results corresponding to a strict dereplication of the
                 dataset, i.e. merging identical amplicons. Warning: whatever the d value, swarm
                 requires fasta entries to carry abundance values. Default number of differences
                 d is 1.

       -n, --no-otu-breaking
                when  working  with  d  =  1,  deactivate  the  built-in  cluster refinement (not
                recommended). Amplicon abundance values are used to  identify  transitions  among
                in-contact  clusters  and to separate them, yielding higher-resolution clustering
                results. That option prevents  that  separation,  and  in  practice,  allows  the
                creation  of  a  link  between  amplicons  A and B, even if the abundance of B is
                higher than the abundance of A.
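                 The effect of cluster refinement can be sketched in Python for d = 1. This is a
                 simplified model, not swarm's algorithm: the helper names are hypothetical, the
                 clustering is a plain breadth-first search from the most abundant unclustered
                 amplicon, and refinement is modelled only as refusing a link from A to B when B
                 is more abundant than A (breaking=False plays the role of -n).

```python
from collections import deque


def within_one(a, b):
    """True if a and b differ by at most one substitution, insertion or deletion."""
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) <= 1
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter sequence
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    return a[i:] == b[i + 1:]  # skip the single extra base of b


def cluster_d1(amplicons, breaking=True):
    """amplicons: {label: (sequence, abundance)}. Returns clusters as lists of labels."""
    order = sorted(amplicons, key=lambda label: (-amplicons[label][1], label))
    assigned, clusters = set(), []
    for seed in order:
        if seed in assigned:
            continue
        assigned.add(seed)
        cluster, queue = [seed], deque([seed])
        while queue:
            current = queue.popleft()
            seq_c, ab_c = amplicons[current]
            for other in order:
                if other in assigned:
                    continue
                seq_o, ab_o = amplicons[other]
                if breaking and ab_o > ab_c:
                    continue  # no link towards a more abundant amplicon
                if within_one(seq_c, seq_o):
                    assigned.add(other)
                    cluster.append(other)
                    queue.append(other)
        clusters.append(cluster)
    return clusters


toy = {"A": ("AAAA", 10), "B": ("AAAT", 2), "C": ("AATT", 8)}
# cluster_d1(toy)                 -> [['A', 'B'], ['C']]
# cluster_d1(toy, breaking=False) -> [['A', 'B', 'C']]
```

                 In the toy data, A and C are two abundance peaks connected only through the
                 rare amplicon B; refinement keeps them apart, while disabling it merges them.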

   Fastidious options
       -b, --boundary positive integer
                when using the option --fastidious (-f), define the  minimum  abundance  of  what
                should  be considered a large cluster. By default, a cluster with an abundance of
                 3 or more is considered large. Conversely, a cluster is small if it has an
                 abundance of 2 or less, meaning that it is composed of either a single amplicon
                 of abundance 1 or 2, or two amplicons of abundance 1. Any positive value greater
                 than 1 can be specified. Using higher boundary values can reduce the number of clusters
                (up to a point), and will reduce the taxonomical resolution of swarm results.  It
                will also slightly increase computation time.

       -c, --ceiling positive integer
                when  using the option --fastidious (-f), define swarm's maximum memory footprint
                (in megabytes). swarm will adjust the --bloom-bits (-y) value of the Bloom filter
                to  fit  within the specified amount of memory. The value must be at least 8. See
                the --bloom-bits (-y) option  for  an  alternative  way  to  control  the  memory
                footprint.

       -f, --fastidious
                when working with d = 1, perform a second clustering pass to reduce the number of
                small clusters  (recommended  option).  During  the  first  clustering  pass,  an
                intermediate  amplicon can be missing for purely stochastic reasons, interrupting
                 the aggregation process. The fastidious option will create virtual amplicons,
                 allowing small clusters to be grafted onto larger ones. By default, a cluster is
                considered large if it has a total abundance of 3 or  more  (see  the  --boundary
                option  to  modify  that value). To speed things up, swarm uses a Bloom filter to
                store intermediate results. Warning, the second clustering pass can  be  2  to  3
                times  slower  than  the  first  pass  and requires much more memory to store the
                virtual amplicons  in  Bloom  filters.  See  the  options  --bloom-bits  (-y)  or
                --ceiling  (-c)  to  control  the  memory  footprint  of  the  Bloom  filter. The
                fastidious option modifies clustering results: the output files produced  by  the
                 options --log (-l), --output-file (-o), --mothur (-r), --uclust-file (-u), and --seeds
                (-w) are updated to reflect these modifications; the file --statistics-file  (-s)
                is  partially  updated  (columns  6  and  7  are  not  updated);  the output file
                --internal-structure (-i) is partially updated  (column  5  is  not  updated  for
                amplicons that belonged to the small cluster).
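                 The grafting idea can be sketched in Python: a small cluster is grafted when one
                 of its amplicons and one amplicon of a large cluster share a common
                 microvariant (a virtual amplicon), i.e. are within two differences. This is an
                 illustration under stated assumptions, not swarm's implementation: a dict of
                 sets replaces the Bloom filter, the function names are hypothetical, and
                 choosing the most abundant large cluster when several match is our assumption.

```python
NUCLEOTIDES = "ACGT"


def microvariants(seq):
    """All sequences one substitution, insertion or deletion away from seq."""
    subs = {seq[:i] + n + seq[i + 1:]
            for i in range(len(seq)) for n in NUCLEOTIDES if n != seq[i]}
    dels = {seq[:i] + seq[i + 1:] for i in range(len(seq))}
    ins = {seq[:i] + n + seq[i:] for i in range(len(seq) + 1) for n in NUCLEOTIDES}
    return subs | dels | ins


def graft_small_clusters(clusters, boundary=3):
    """clusters: list of lists of (sequence, abundance). Returns {small index: large index}."""
    def mass(cluster):
        return sum(abundance for _, abundance in cluster)

    large = [i for i, c in enumerate(clusters) if mass(c) >= boundary]
    small = [i for i, c in enumerate(clusters) if mass(c) < boundary]
    # virtual amplicons generated from large-cluster amplicons, with their owner clusters
    virtual = {}
    for i in large:
        for seq, _ in clusters[i]:
            for v in microvariants(seq):
                virtual.setdefault(v, set()).add(i)
    grafts = {}
    for j in small:
        candidates = set()
        for seq, _ in clusters[j]:
            for v in microvariants(seq):
                candidates |= virtual.get(v, set())
        if candidates:
            # assumption: prefer the most abundant large cluster, then the lowest index
            grafts[j] = min(candidates, key=lambda i: (-mass(clusters[i]), i))
    return grafts


clusters = [[("AAAAAAAA", 5)], [("AAAAAATT", 1)], [("CCCCCCCC", 1)]]
# graft_small_clusters(clusters) -> {1: 0}; 'AAAAAAAT' is the shared virtual amplicon
```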

       -y, --bloom-bits positive integer
                when  using the option --fastidious (-f), define the size (in bits) of each entry
                 in the Bloom filter. That option allows users to balance the efficiency (i.e. speed)
                and  the  memory  footprint of the Bloom filter. Large values will make the Bloom
                filter more efficient but will require more memory. Any value between  2  and  64
                can  be  used.  Default  value  is  16.  See  the  --ceiling  (-c)  option for an
                alternative way to control the memory footprint.
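                 The speed/memory trade-off can be illustrated with a generic Bloom filter
                 sized in bits per entry. This Python sketch is not swarm's data structure: the
                 class is hypothetical, it uses SHA-256 with double hashing, and it picks the
                 textbook number of hash functions, (bits per entry) x ln 2.

```python
import hashlib
import math


class BloomFilter:
    """Bloom filter with a fixed number of bits per expected entry (cf. -y)."""

    def __init__(self, expected_entries, bits_per_entry=16):
        self.size = max(1, expected_entries * bits_per_entry)  # total number of bits
        self.hashes = max(1, round(bits_per_entry * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        # double hashing: derive all bit positions from two base hash values
        return [(h1 + i * h2) % self.size for i in range(self.hashes)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # may return a false positive, never a false negative
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

    def false_positive_rate(self, n):
        """Standard approximation (1 - e^(-k n / m))^k after n insertions."""
        return (1 - math.exp(-self.hashes * n / self.size)) ** self.hashes
```

                 In this sketch, memory is expected_entries x bits_per_entry / 8 bytes (2 bytes
                 per entry with 16 bits), and the estimated false positive rate at full load
                 drops from about 2% with 8 bits per entry to about 0.05% with 16.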

   Input/output options
       -a, --append-abundance positive integer
                set abundance value to use when some or all amplicons  in  the  input  file  lack
                abundance  values (_integer, or ;size=integer; when using -z). Warning, it is not
                recommended to use swarm on datasets where abundance values are all identical. We
                provide  that  option  as  a courtesy to advanced users, please use it carefully.
                swarm exits with an error message if abundance values are  missing  and  if  this
                option is not used.

       -i, --internal-structure filename
                 output all pairs of nearly-identical amplicons to filename using a five-column
                tab-delimited format:

                       1.  amplicon A label (header without abundance annotations).

                       2.  amplicon B label (header without abundance annotations).

                       3.  number of differences between amplicons A and B (positive integer).

                       4.  cluster number (positive integer).  Clusters  are  numbered  in  their
                           order  of  delineation,  starting  from  1.  All  pairs  of  amplicons
                           belonging to the same cluster will receive the same number.

                       5.  cumulative number of steps from the cluster seed to amplicon B
                           (positive integer). When using the option --fastidious (-f), the
                           actual number of steps between grafted amplicons and the cluster seed
                           cannot be re-computed efficiently and is always set to 2 for the
                           amplicon pair linking the small cluster to the large cluster.
                           Cumulative numbers of steps within the small cluster (if any) are left
                           unchanged.
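                 A small Python sketch that reads such a file, assuming one tab-separated pair
                 per line and no header line, and reports for each cluster how many links it
                 contains and its largest column-5 value. The function name is hypothetical;
                 with --fastidious, column 5 of grafted amplicons follows the rule above.

```python
from collections import defaultdict


def summarise_internal_structure(lines):
    """Return {cluster number: (number of links, largest cumulative step count)}."""
    links = defaultdict(int)
    max_steps = defaultdict(int)
    for line in lines:
        label_a, label_b, differences, cluster, steps = line.rstrip("\n").split("\t")
        cluster = int(cluster)
        links[cluster] += 1
        max_steps[cluster] = max(max_steps[cluster], int(steps))
    return {c: (links[c], max_steps[c]) for c in sorted(links)}


example = ["s1\ts2\t1\t1\t1\n", "s2\ts3\t1\t1\t2\n", "s4\ts5\t1\t2\t1\n"]
# summarise_internal_structure(example) -> {1: (2, 2), 2: (1, 1)}
```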

       -l, --log filename
                output all messages to filename instead of standard error, with the exception  of
                error  messages  of  course. That option is useful in situations where writing to
                standard error is problematic (for example, with certain job schedulers).

       -o, --output-file filename
                output clustering results to filename. Results consist of a list of clusters, one
                cluster  per  line.  A cluster is a list of amplicon headers separated by spaces.
                That output format can be modified by the option --mothur  (-r).  Default  is  to
                write to standard output.

       -r, --mothur
                output  clustering  results  in  a  format  compatible  with  Mothur. That option
                modifies swarm's default output format.

       -s, --statistics-file filename
                output statistics to filename. The file is a tab-separated table with one cluster
                per row and seven columns of information:

                       1.  number of unique amplicons in the cluster,

                       2.  total abundance of amplicons in the cluster,

                       3.  label of the initial seed (header without abundance annotations),

                       4.  abundance of the initial seed,

                       5.  number of amplicons with an abundance of 1 in the cluster,

                       6.  maximum  number  of  iterations before the cluster reached its natural
                           limit,

                       7.  cumulative number of steps along the path joining the seed and the
                           furthest amplicon in the cluster. Please note that the actual
                           number of differences between the seed and the furthest amplicon is
                           usually much smaller. When using the option --fastidious (-f), grafted
                           amplicons are not taken into account.
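                 Columns 1 to 5 can be derived from the cluster members alone, as in the Python
                 sketch below (the function name and the list-of-pairs representation with the
                 seed first are assumptions). Columns 6 and 7 depend on the clustering path and
                 are not computed here.

```python
def statistics_row(cluster):
    """cluster: list of (label, abundance) pairs, initial seed first. Columns 1-5 of -s."""
    seed_label, seed_abundance = cluster[0]
    return (
        len(cluster),                            # 1. number of unique amplicons
        sum(ab for _, ab in cluster),            # 2. total abundance
        seed_label,                              # 3. label of the initial seed
        seed_abundance,                          # 4. abundance of the initial seed
        sum(1 for _, ab in cluster if ab == 1),  # 5. amplicons with an abundance of 1
    )


row = statistics_row([("s1", 10), ("s2", 1), ("s3", 1), ("s4", 3)])
# "\t".join(map(str, row)) -> "4\t15\ts1\t10\t2"
```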

       -u, --uclust-file filename
                output clustering results in filename using a  tab-separated  uclust-like  format
                 with 10 columns and 3 different types of entries (S, H or C). That option does not
                modify swarm's default output format. Each fasta sequence in the input  file  can
                be  either  a  cluster  centroid  (S) or a hit (H) assigned to a cluster. Cluster
                records (C) summarize information  (size,  centroid  header)  for  each  cluster.
                Column content varies with the type of entry (S, H or C):

                       1.  Record type: S, H, or C.

                       2.  Cluster number (zero-based).

                       3.  Centroid length (S), query length (H), or cluster size (C).

                       4.  Percentage of similarity with the centroid sequence (H), or set to '*'
                           (S, C).

                       5.  Match orientation + or - (H), or set to '*' (S, C).

                       6.  Not used, always set to '*' (S, C) or to zero (H).

                       7.  Not used, always set to '*' (S, C) or to zero (H).

                       8.  set to '*' (S, C) or, for H, compact representation  of  the  pairwise
                           alignment   using  the  CIGAR  format  (Compact  Idiosyncratic  Gapped
                           Alignment Report): M (match), D  (deletion)  and  I  (insertion).  The
                           equal  sign  '=' indicates that the query is identical to the centroid
                           sequence.

                       9.  Header of the query sequence (H), or of the centroid sequence (S, C).

                       10. Header of the centroid sequence (H), or set to '*' (S, C).
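                 The CIGAR idea of column 8 can be sketched in Python from two aligned strings
                 using '-' for gaps. Assumptions, all labelled: the function name is
                 hypothetical, a gap in the query is written D and a gap in the centroid I,
                 and every run length is written out, whereas swarm's exact rendering (for
                 instance of runs of length 1) may differ.

```python
from itertools import groupby


def cigar(query_aln, centroid_aln):
    """Run-length encoded alignment columns: M (match), D (deletion), I (insertion)."""
    if query_aln == centroid_aln:
        return "="  # query identical to the centroid
    ops = []
    for q, c in zip(query_aln, centroid_aln):
        if q == "-":
            ops.append("D")  # assumption: centroid base missing from the query
        elif c == "-":
            ops.append("I")  # assumption: extra base in the query
        else:
            ops.append("M")  # aligned bases, identical or not
    return "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))


# cigar("AC-T", "ACGT") -> "2M1D1M"
```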

       -w, --seeds filename
                output  cluster  representative  sequences  to  filename  in  fasta  format.  The
                abundance  value  of  each cluster representative is the sum of the abundances of
                 all the amplicons in the cluster. Fasta headers are formatted as follows:
                '>label_integer',  or  '>label;size=integer;'  if  the  -z  option  is  used, and
                sequences are uppercased. Sequences are sorted by decreasing abundance, and  then
                by alphabetical order of sequence labels.
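                 A minimal Python sketch of one -w record, assuming the whole sequence is written
                 on a single line (line wrapping is not specified above) and using a hypothetical
                 function name:

```python
def seed_record(label, total_abundance, sequence, usearch_style=False):
    """Fasta record for a -w entry: summed abundance in the header, uppercased sequence."""
    if usearch_style:  # -z
        header = f">{label};size={total_abundance};"
    else:
        header = f">{label}_{total_abundance}"
    return f"{header}\n{sequence.upper()}\n"


# seed_record("s1", 15, "acgt") -> ">s1_15\nACGT\n"
```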

       -z, --usearch-abundance
                accept     amplicon     abundance     values     in    usearch/vsearch's    style
                (>label;size=integer[;]). That option influences the abundance  annotation  style
                used in swarm's standard output (-o), as well as the output of options -r, -u and
                -w.

   Pairwise alignment advanced options
       When using d > 1, swarm recognizes advanced command-line options modifying the pairwise
       global alignment scoring parameters:

              -m, --match-reward positive integer
                       Default reward for a nucleotide match is 5.

              -p, --mismatch-penalty positive integer
                       Default penalty for a nucleotide mismatch is 4.

              -g, --gap-opening-penalty positive integer
                       Default gap opening penalty is 12.

              -e, --gap-extension-penalty positive integer
                       Default gap extension penalty is 4.

       As swarm focuses on close relationships (e.g., d = 2 or 3), clustering results are
       resilient to modifications of the pairwise alignment model parameters. When clustering
       with a higher d value, modifying model parameters has a stronger impact.
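       How the default parameters translate into a number of differences can be sketched with
       a textbook affine-gap (Gotoh) global alignment in Python. This is not swarm's SIMD code,
       and several details are assumptions: a gap of length L costs gap_open + L x gap_extend,
       terminal gaps are penalized like internal ones, and score ties are broken towards fewer
       differences. Each cell stores (score, -differences), so max() compares scores first.

```python
NEG_INF = float("-inf")


def count_differences(a, b, match=5, mismatch=4, gap_open=12, gap_extend=4):
    """Mismatches plus gap columns in one optimal global alignment of a and b."""
    n, m = len(a), len(b)
    worst = (NEG_INF, 0)
    M = [[worst] * (m + 1) for _ in range(n + 1)]  # a[i-1] aligned with b[j-1]
    X = [[worst] * (m + 1) for _ in range(n + 1)]  # a[i-1] aligned with a gap
    Y = [[worst] * (m + 1) for _ in range(n + 1)]  # b[j-1] aligned with a gap
    M[0][0] = (0, 0)
    for i in range(1, n + 1):
        X[i][0] = (-(gap_open + i * gap_extend), -i)
    for j in range(1, m + 1):
        Y[0][j] = (-(gap_open + j * gap_extend), -j)

    def step(cell, score, diffs):
        return (cell[0] + score, cell[1] - diffs)

    opening = -(gap_open + gap_extend)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = a[i - 1] == b[j - 1]
            diagonal = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            M[i][j] = step(diagonal, match if same else -mismatch, 0 if same else 1)
            X[i][j] = max(step(M[i - 1][j], opening, 1),
                          step(X[i - 1][j], -gap_extend, 1),
                          step(Y[i - 1][j], opening, 1))
            Y[i][j] = max(step(M[i][j - 1], opening, 1),
                          step(Y[i][j - 1], -gap_extend, 1),
                          step(X[i][j - 1], opening, 1))
    return -max(M[n][m], X[n][m], Y[n][m])[1]


# count_differences("ACGT", "ACCT") -> 1 (one substitution)
# count_differences("ACGT", "ACGGT") -> 1 (one insertion)
```

       Two amplicons would then be linked when count_differences() returns d or less; changing
       the penalties can change which alignment is optimal, and hence the count.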

EXAMPLES

       Cluster the compressed data set myfile.fasta.gz using the finest resolution possible (1
       difference by default, built-in breaking, fastidious option) using 4 computation threads.
       Clusters are written to the file myfile.swarms, and cluster representatives are written to
       myfile.representatives.fasta:
              zcat myfile.fasta.gz | \
                  swarm \
                      -t 4 \
                      -f \
                      -w myfile.representatives.fasta \
                      -o myfile.swarms

AUTHORS

       Concept by Frédéric Mahé, implementation by Torbjørn Rognes.

CITATION

       Mahé F, Rognes T, Quince C, de Vargas  C,  Dunthorn  M.  (2014)  Swarm:  robust  and  fast
       clustering       method      for      amplicon-based      studies.       PeerJ      2:e593
       ⟨https://doi.org/10.7717/peerj.593⟩.

       Mahé F, Rognes T, Quince C, de Vargas C, Dunthorn M. (2015) Swarm v2: highly-scalable  and
       high-resolution amplicon clustering.  PeerJ 3:e1420 ⟨https://doi.org/10.7717/peerj.1420⟩.

REPORTING BUGS

       Submit  suggestions  and bug-reports at ⟨https://github.com/torognes/swarm/issues⟩, send a
       pull request  at  ⟨https://github.com/torognes/swarm/pulls⟩,  or  compose  a  friendly  or
       curmudgeonly   e-mail  to  Frédéric  Mahé  ⟨frederic.mahe@cirad.fr⟩  and  Torbjørn  Rognes
       ⟨torognes@ifi.uio.no⟩.

AVAILABILITY

       Source code and binaries available at ⟨https://github.com/torognes/swarm⟩.

COPYRIGHT

       Copyright (C) 2012-2021 Frédéric Mahé & Torbjørn Rognes

       This program is free software: you can redistribute it and/or modify it under the terms of
       the GNU Affero General Public License as published by the Free Software Foundation, either
       version 3 of the License, or any later version.

       This program is distributed in the hope that it will be useful, but WITHOUT ANY  WARRANTY;
       without  even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
       See the GNU Affero General Public License for more details.

       You should have received a copy of the GNU Affero General Public License along  with  this
       program.  If not, see ⟨https://www.gnu.org/licenses/⟩.

SEE ALSO

       swipe, an extremely fast Smith-Waterman database search tool by Torbjørn Rognes (available
       at ⟨https://github.com/torognes/swipe⟩).

       vsearch, an open-source re-implementation of the  classic  uclust  clustering  method  (by
       Robert  C.  Edgar),  along  with  other amplicon filtering and searching tools. vsearch is
       implemented by Torbjørn Rognes and documented  by  Frédéric  Mahé,  and  is  available  at
       ⟨https://github.com/torognes/vsearch⟩.

VERSION HISTORY

       New features and important modifications of swarm (short-lived or minor bug releases are
       not mentioned):

              v3.1.0 released March 1, 2021
                     Version 3.1.0 includes a fix for a bug in the  16-bit  SIMD  alignment  code
                     that  was  exposed  with a combination of d>1, long sequences, and very high
                     gap penalties. The code has also been cleaned up, tested and improved
                     substantially,  and  it  is  now fully C++11 compliant. Support for macOS on
                     Apple Silicon (ARM64) has been added.

              v3.0.0 released October 24, 2019
                     Version 3.0.0 introduces a faster algorithm for d = 1, and a reduced  memory
                     footprint.  Swarm  has  been ported to Windows x86-64, GNU/Linux ARM 64, and
                     GNU/Linux  POWER8.  Internal  code  has  been  modernized,   hardened,   and
                     thoroughly tested. Strict dereplication of input sequences is now mandatory.
                     The --seeds option (-w) now outputs results sorted by decreasing  abundance,
                     and then by alphabetical order of sequence labels.

              v2.2.2 released December 12, 2017
                     Version  2.2.2  fixes  a  bug that would cause swarm to wait forever in very
                     rare cases when multiple threads were used.

              v2.2.1 released October 27, 2017
                     Version 2.2.1 fixes a memory  allocation  bug  for  d  =  1  and  duplicated
                     sequences.

              v2.2.0 released October 17, 2017
                     Version  2.2.0  fixes  several  problems  and  improves usability. Corrected
                     output to structure and uclust files when using fastidious  mode.  Corrected
                     abundance  output  in  some  cases. Added check for duplicated sequences and
                     fixed check for duplicated sequence IDs. Checks for empty  sequences.  Sorts
                     sequences  by additional fields to improve stability. Improves compatibility
                     with compilers and operating systems.   Outputs  sequences  in  upper  case.
                     Allows  64-bit  abundances. Shows message when waiting for input from stdin.
                     Improves error messages and warnings.  Improves  checking  of  command  line
                     options.   Fixes   remaining   errors   reported   by  test  suite.  Updates
                     documentation.

              v2.1.13 released March 8, 2017
                     Version 2.1.13 removes a bug with the progress bar when writing seeds.

              v2.1.12 released January 16, 2017
                     Version 2.1.12 removes a debugging message.

              v2.1.11 released January 16, 2017
                     Version 2.1.11  fixes  two  bugs  related  to  the  SIMD  implementation  of
                     alignment that might result in incorrect alignments and scores. The bugs
                     only apply when d > 1.

              v2.1.10 released December 22, 2016
                     Version 2.1.10 fixes two bugs related to gap penalties of  alignments.   The
                     first bug may lead to wrong alignments and similarity percentages reported in
                     UCLUST (.uc) files. The second bug makes swarm use a slightly higher gap
                     extension penalty than specified. The default gap extension penalty used
                     has actually been 4.5 instead of 4.

              v2.1.9 released July 6, 2016
                     Version 2.1.9 fixes errors when compiling with GCC version 6.

              v2.1.8 released March 11, 2016
                     Version 2.1.8 fixes a rare bug triggered  when  clustering  extremely  short
                     undereplicated  sequences. Also, alignment parameters are not shown when d =
                     1.

              v2.1.7 released February 24, 2016
                     Version 2.1.7 fixes a bug in the output of seeds with the -w option when d >
                     1 that was not properly fixed in version 2.1.6. It also handles ASCII
                     character #13 (CR) in FASTA files better. Swarm will now exit with status 0
                     if  the  -h  or  the  -v  option  is specified. The help text and some error
                     messages have been improved.

              v2.1.6 released December 14, 2015
                     Version 2.1.6 fixes problems with older  compilers  that  do  not  have  the
                     x86intrin.h header file. It also fixes a bug in the output of seeds with the
                     -w option when d > 1.

              v2.1.5 released September 8, 2015
                     Version 2.1.5 fixes minor bugs.

              v2.1.4 released September 4, 2015
                     Version 2.1.4 fixes minor bugs in the swarm algorithm used for d = 1.

              v2.1.3 released August 28, 2015
                     Version 2.1.3 adds checks of numeric option arguments.

              v2.1.1 released March 31, 2015
                     Version 2.1.1 fixes a bug with the  fastidious  option  that  caused  it  to
                     ignore some connections between large and small clusters.

              v2.1.0 released March 24, 2015
                     Version 2.1.0 marks the first official release of swarm v2.

              v2.0.7 released March 18, 2015
                     Version  2.0.7  writes  abundance  information  in  usearch style when using
                     options -w (--seeds) in combination with -z (--usearch-abundance).

              v2.0.6 released March 13, 2015
                     Version 2.0.6 fixes a minor bug.

              v2.0.5 released March 13, 2015
                     Version 2.0.5 improves the implementation of the fastidious option and  adds
                     options  to  control  memory  usage  of  the  Bloom  filter (-y and -c).  In
                     addition, an option (-w) allows users to output cluster representative sequences
                     with updated abundances (sum of all abundances inside each cluster). This
                     version also enables swarm to run with d = 0.

              v2.0.4 released March 6, 2015
                     Version 2.0.4 includes a fully parallelised implementation of the fastidious
                     option.

              v2.0.3 released March 4, 2015
                     Version  2.0.3  includes  a working implementation of the fastidious option,
                     but only the initial clustering is parallelized.

              v2.0.2 released February 26, 2015
                     Version 2.0.2 fixes SSSE3 problems.

              v2.0.1 released February 26, 2015
                     Version  2.0.1  is  a  development   version   that   contains   a   partial
                     implementation of the fastidious option, but it is not usable yet.

              v2.0.0 released December 3, 2014
                     Version  2.0.0  is  faster  and  easier to use, providing new output options
                     (--internal-structure  and  --log),   new   control   options   (--boundary,
                     --fastidious,  --no-otu-breaking),  and built-in cluster refinement (no need
                     to use the python script anymore). When using default  parameters,  a  novel
                     and  considerably  faster algorithmic approach is used, guaranteeing swarm's
                     scalability.

              v1.2.21 released February 26, 2015
                     Version 1.2.21 is supposed to fix some problems related to the  use  of  the
                     SSSE3 CPU instructions which are not always available.

              v1.2.20 released November 6, 2014
                     Version  1.2.20  presents  a  production-ready  version  of  the alternative
                     algorithm (option -a), with optional built-in cluster breaking (option  -n).
                     That   alternative  algorithmic  approach  (usable  only  with  d  =  1)  is
                     considerably faster than currently used clustering algorithms, and can  deal
                     with  datasets  of  100  million unique amplicons or more in a few hours. Of
                     course, results are rigorously identical to the results previously produced
                     with swarm. That release also introduces new options to control swarm output
                     (options -i and -l).

              v1.2.19 released October 3, 2014
                     Version 1.2.19 fixes a problem related to  abundance  information  when  the
                     sequence label includes multiple underscore characters.

              v1.2.18 released September 29, 2014
                     Version 1.2.18 re-enables the possibility of reading sequences from stdin if
                     no file name is specified on the command line. It also fixes a  bug  related
                     to CPU features detection.

              v1.2.17 released September 28, 2014
                     Version 1.2.17 fixes a memory allocation bug introduced in version 1.2.15.

              v1.2.16 released September 27, 2014
                     Version  1.2.16  fixes  a  bug  in  the abundance sort introduced in version
                     1.2.15.

              v1.2.15 released September 27, 2014
                     Version 1.2.15 sorts the input sequences in order  of  decreasing  abundance
                     unless  they  are  detected to be sorted already. When using the alternative
                     algorithm for d = 1 it also  sorts  all  subseeds  in  order  of  decreasing
                     abundance.

              v1.2.14 released September 27, 2014
                     Version  1.2.14  fixes  a  bug in the output with the --swarm_breaker option
                     (-b) when using the alternative algorithm (-a).

              v1.2.12 released August 18, 2014
                     Version 1.2.12  introduces  an  option  --alternative-algorithm  to  use  an
                     extremely  fast,  experimental clustering algorithm for the special case d =
                     1. Multithreading scalability of the default algorithm has  been  noticeably
                     improved.

              v1.2.10 released August 8, 2014
                     Version  1.2.10 allows amplicon abundances to be specified using the usearch
                     style in the sequence header (e.g.  '>id;size=1')  when  the  -z  option  is
                     chosen.

              v1.2.8 released August 5, 2014
                     Version  1.2.8  fixes  an  error  with  the  gap extension penalty. Previous
                     versions used a gap penalty twice as large as intended. That bug  correction
                     induces small changes in clustering results.

              v1.2.6 released May 23, 2014
                     Version  1.2.6 introduces an option --mothur to output clustering results in
                     a format compatible with the microbial ecology community  analysis  software
                     suite Mothur ( ⟨https://www.mothur.org/⟩).

              v1.2.5 released April 11, 2014
                     Version  1.2.5  removes  the  need  for  a POPCNT hardware instruction to be
                     present. swarm now automatically checks whether POPCNT is available and uses
                     a   slightly   slower  software  implementation  if  not.  Only  basic  SSE2
                     instructions are now required to run swarm.

              v1.2.4 released January 30, 2014
                     Version 1.2.4 introduces an option --break-swarms to  output  all  pairs  of
                     amplicons  with  d differences to standard error. That option is used by the
                     companion script `swarm_breaker.py` to refine swarm results. The  syntax  of
                     the inline assembly code is changed for compatibility with more compilers.

              v1.2 released May 16, 2013
                     Version  1.2  greatly  improves speed by using alignment-free comparisons of
                     amplicons based on k-mer word content. For each amplicon, the presence or
                     absence of all possible 5-mers is computed and recorded in a 1024-bit
                     vector. Vector comparisons are extremely fast  and  drastically  reduce  the
                     number  of  costly  pairwise  alignments performed by swarm. While remaining
                     exact, swarm 1.2 can be more than 100-times  faster  than  swarm  1.1,  when
                     using  a  single  thread  with  a  large set of sequences. The minor version
                     1.1.1, published just before, adds compatibility with Apple  computers,  and
                     corrects  an  issue in the pairwise global alignment step that could lead to
                     sub-optimal alignments.

              v1.1 released February 26, 2013
                     Version 1.1 introduces two new important options: the possibility to  output
                     clustering  results  using  the uclust output format, and the possibility to
                     output detailed statistics on each cluster. swarm 1.1 is  also  faster:  new
                     filters based on pairwise amplicon sequence lengths and composition
                     comparisons reduce the number of pairwise alignments needed and speed up the
                     clustering.

              v1.0 released November 10, 2012
                     First public release.