Provided by: swarm_3.0.0+dfsg-2build1_amd64

NAME

       swarm — find clusters of nearly-identical nucleotide amplicons

SYNOPSIS

       swarm -h|v

       High-precision clustering:

       swarm [filename]

       swarm [-d 1] [-nrz] [-a int] [-i filename] [-l filename] [-o filename] [-s filename]
             [-t int] [-u filename] [-w filename] [filename]

       swarm [-d 1] -f [-nrz] [-a int] [-b int] [-c|y int] [-i filename] [-l filename]
             [-o filename] [-s filename] [-t int] [-u filename] [-w filename] [filename]

       Conservative clustering:

       swarm -d 2+ [-nrz] [-a int] [-e int] [-g int] [-i filename] [-l filename] [-m int]
             [-o filename] [-p int] [-s filename] [-t int] [-u filename] [-w filename] [filename]

       Dereplication (merge strictly identical sequences):

       swarm -d 0 [-rz] [-a int] [-i filename] [-l filename] [-o filename] [-s filename]
             [-u filename] [-w filename] [filename]

DESCRIPTION

       Environmental or clinical molecular studies generate large volumes of amplicons (e.g., 16S
       or 18S SSU-rRNA sequences) that need to be clustered into molecular operational  taxonomic
       units  (OTUs).  Common  clustering  methods  are  based  on  greedy, input-order dependent
       algorithms, with arbitrary selection of global cluster  size  and  cluster  centroids.  To
       address that problem, we developed swarm, a fast and robust method that recursively groups
       amplicons with d or fewer differences (i.e. substitutions, insertions or deletions). swarm
       produces  natural  and  stable  clusters centered on local peaks of abundance, mostly free
       from input-order dependency induced by centroid selection.

       Exact clustering is impractical on large data sets when using a naïve all-vs-all  approach
       (more precisely a 2-combination without repetitions), as it implies unrealistic numbers of
       pairwise comparisons. swarm is based on a maximum number  of  differences  d  between  two
       amplicons,  and  focuses  only  on  very close local relationships. For d = 1, the default
       value, swarm uses an algorithm of linear complexity that  generates  all  possible  single
       mutations  and  performs  exact-string  matching  by  comparing  hash-values. For d = 2 or
       greater, swarm uses an algorithm of quadratic complexity  that  performs  pairwise  string
       comparisons. Efficient k-mer-based filtering and an astute use of comparison results
       obtained during the clustering process allow swarm to avoid most of the amplicon
       comparisons  needed  in  a naïve approach. To speed up the remaining amplicon comparisons,
       swarm implements an extremely fast Needleman-Wunsch algorithm making use of the  Streaming
       SIMD Extensions (SSE2) of modern x86-64 CPUs, or NEON instructions of ARM-64 CPUs. If SSE2
       instructions are not available, swarm exits with an error message.
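
       The hash-based strategy described above for d = 1 can be illustrated with a short Python
       sketch. It is not swarm's implementation: the names microvariants and neighbours are
       hypothetical, and a Python dict stands in for swarm's hash table. The sketch generates
       every sequence that is one substitution, insertion or deletion away from a query, then
       keeps the ones that exist in the dataset.

```python
def microvariants(seq, alphabet="ACGT"):
    """Return the set of sequences exactly one edit away from seq."""
    variants = set()
    for i in range(len(seq)):
        # Deletion of the nucleotide at position i.
        variants.add(seq[:i] + seq[i + 1:])
        # Substitution of the nucleotide at position i.
        for base in alphabet:
            if base != seq[i]:
                variants.add(seq[:i] + base + seq[i + 1:])
    # Insertion of one nucleotide before position i (or at the end).
    for i in range(len(seq) + 1):
        for base in alphabet:
            variants.add(seq[:i] + base + seq[i:])
    return variants


def neighbours(seq, index):
    """List labels of dataset sequences one difference away from seq.

    index is a dict mapping each unique sequence to its label
    (an assumed layout chosen for this example).
    """
    return sorted(index[v] for v in microvariants(seq) if v in index)


# Made-up toy dataset.
amplicons = {"ACGT": "a", "ACGA": "b", "ACG": "c", "ACGTT": "d", "TTTT": "e"}
close = neighbours("ACGT", amplicons)
```

       The number of generated variants grows linearly with sequence length, which is why the
       work per amplicon does not depend on the number of amplicons in the dataset.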

       swarm can read nucleotide amplicons in fasta  format  from  a  normal  file  or  from  the
       standard  input  (using  a  pipe  or a redirection). The amplicon header is defined as the
       string comprised between the '>' symbol and the first  space  or  the  end  of  the  line,
       whichever  comes first. Each header must end with an abundance annotation representing the
       amplicon copy number and defined as '_' followed by a positive integer. See option -z  for
       input data using usearch/vsearch's abundance annotation format (';size=integer[;]'). Once
       stripped of the abundance annotation, the remaining part of the header is called the
       label. In summary:

                         >header[[:blank:]]   and   header = label_[1-9][0-9]*$
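
       The header rule summarized above can be checked before running swarm with a few lines of
       Python. This is a sketch under assumptions: parse_header is a hypothetical helper, the
       label is assumed to be non-empty, and the regular expressions only mirror the two
       annotation styles described in this manual.

```python
import re

# Default style: label_integer. Style accepted with -z: label;size=integer[;]
DEFAULT_STYLE = re.compile(r"^(?P<label>.+)_(?P<abundance>[1-9][0-9]*)$")
USEARCH_STYLE = re.compile(r"^(?P<label>.+);size=(?P<abundance>[1-9][0-9]*);?$")


def parse_header(line, usearch_style=False):
    """Split a fasta header line into (label, abundance)."""
    if not line.startswith(">"):
        raise ValueError("not a fasta header line")
    # The header stops at the first blank (space or tab) or at the end of the line.
    header = re.split(r"[ \t]", line[1:].rstrip("\r\n"), maxsplit=1)[0]
    pattern = USEARCH_STYLE if usearch_style else DEFAULT_STYLE
    match = pattern.match(header)
    if match is None:
        raise ValueError(f"missing abundance annotation: {header!r}")
    return match.group("label"), int(match.group("abundance"))


default_example = parse_header(">seq1_42 some description")
usearch_example = parse_header(">seq1;size=7;", usearch_style=True)
```

       Labels that themselves contain underscores are handled because the pattern anchors the
       abundance annotation at the end of the header.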

       Abundance  annotations play a crucial role in the clustering process, and swarm exits with
       an error message if that information is not available. As swarm outputs lists of  amplicon
       labels,  amplicon  labels must be unique to avoid any ambiguity; swarm exits with an error
       message if labels are not unique. The amplicon sequence is defined as a string  of  [ACGT]
       or  [ACGU] symbols (case insensitive, 'U' is replaced with 'T' internally), starting after
       the end of the header line and ending before the next header line or the file  end;  swarm
       silently  removes  newline  symbols  ('\n' or '\r') and exits with an error message if any
       other symbol is present. Lastly, if sequences are not all unique, i.e. were  not  properly
       dereplicated, swarm will exit with an error message.

       Clusters  are  written  to  output  files (specified with -i, -o, -s and -u) by decreasing
       abundance of their seed sequences, and then by alphabetical order of seed sequence labels.
       An  exception  to  that  is the -w (--seeds) output, which is sorted by decreasing cluster
       abundance (sum of abundances of all sequences in the cluster), and  then  by  alphabetical
       order of seed sequence labels. This is particularly useful for post-clustering steps, such
       as de novo chimera detection, that require clusters to be sorted by decreasing abundances.
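
       The two sort orders can be expressed as Python sort keys. This is only an illustration:
       the cluster layout (a list of (label, abundance) pairs with the seed first) is an
       assumption made for the example, and Python's string comparison may differ from swarm's
       alphabetical order for unusual characters.

```python
def sort_by_seed(clusters):
    """Order used for -i, -o, -s and -u: seed abundance, then seed label."""
    return sorted(clusters, key=lambda c: (-c[0][1], c[0][0]))


def sort_by_mass(clusters):
    """Order used for -w: total cluster abundance, then seed label."""
    return sorted(clusters, key=lambda c: (-sum(a for _, a in c), c[0][0]))


# Made-up clusters; the first pair of each cluster is its seed.
clusters = [
    [("b", 10), ("x", 9)],
    [("a", 10)],
    [("c", 12), ("y", 1)],
]
seed_order = [c[0][0] for c in sort_by_seed(clusters)]
mass_order = [c[0][0] for c in sort_by_mass(clusters)]
```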

   General options
       -h, --help
                display this help and exit successfully.

       -t, --threads positive integer
                number of computation threads to use. Values between 1 and 256 are accepted, but
                we recommend using a number of threads less than or equal to the number of
                available CPU cores. Default number of threads is 1.

       -v, --version
                output version information and exit successfully.

       --       delimit the option list. Later arguments, if any, are treated as operands even if
                they  begin  with  '-'.  For  example, 'swarm -- -file.fasta' reads from the file
                '-file.fasta'.

   Clustering options
       -d, --differences zero or positive integer
                maximum number of differences allowed between two  amplicons,  meaning  that  two
                amplicons will be grouped if they have integer (or fewer) differences. This is
                swarm's most important parameter. The number of differences is calculated as  the
                number  of  mismatches  (substitutions,  insertions or deletions) between the two
                amplicons once  the  optimal  pairwise  global  alignment  has  been  found  (see
                'pairwise  alignment advanced options' to influence that step).  Any integer from
                0 to 255 can be used, but high d values will decrease the taxonomical  resolution
                of swarm results. Commonly used d values are 1, 2 or 3, rarely higher. When using
                d = 0, swarm will output results corresponding to a strict dereplication  of  the
                dataset,  i.e.  merging identical amplicons. Warning, whatever the d value, swarm
                requires fasta entries to present abundance values. Default number of differences
                d is 1.
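
                To get an intuition for what d counts, the following Python sketch computes a
                unit-cost edit distance between two sequences. It is an approximation for
                illustration only: swarm counts differences along the optimal global alignment
                under its own scoring parameters, which may not always equal the plain edit
                distance. The names differences and can_be_grouped are hypothetical.

```python
def differences(a, b):
    """Unit-cost edit distance (substitutions, insertions, deletions)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # deletion from a
                current[j - 1] + 1,          # insertion into a
                previous[j - 1] + (x != y),  # match or substitution
            ))
        previous = current
    return previous[-1]


def can_be_grouped(a, b, d=1):
    """True if the two sequences differ by d or fewer edits."""
    return differences(a, b) <= d
```

                For example, can_be_grouped("ACGT", "ACGA") holds with the default d = 1,
                whereas "ACGT" and "AGCT" need d = 2.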

       -n, --no-otu-breaking
                when   working   with  d  =  1,  deactivate  the  built-in  OTU  refinement  (not
                recommended). Amplicon abundance values are used to  identify  transitions  among
                in-contact  OTUs  and  to  separate  them,  yielding higher-resolution clustering
                results. That option prevents  that  separation,  and  in  practice,  allows  the
                creation  of  a  link  between  amplicons  A and B, even if the abundance of B is
                higher than the abundance of A.

   Fastidious options
       -b, --boundary positive integer
                when using the option --fastidious (-f), define the minimum mass of a large  OTU.
                By default, an OTU with a mass of 3 or more is considered large. Conversely, an
                OTU is small if it has a mass of less than 3, meaning that it is composed of
                one amplicon of abundance 1 or 2, or of two amplicons of abundance 1. Any integer
                value greater than 1 can be specified. Using higher boundary values will speed up
                the  second  pass,  but  also reduce the taxonomical resolution of swarm results.
                Default mass of a large OTU is 3.
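
                The boundary test can be written in a few lines of Python. This is a sketch,
                assuming that the mass of an OTU is the sum of the abundances of its amplicons;
                mass and is_large are hypothetical names.

```python
def mass(otu):
    """Sum of the abundances of all amplicons in the OTU."""
    return sum(abundance for _label, abundance in otu)


def is_large(otu, boundary=3):
    """True if the OTU reaches the minimum mass of a large OTU."""
    return mass(otu) >= boundary


# Made-up OTUs, each a list of (label, abundance) pairs.
otus = [[("a", 2)], [("b", 1), ("c", 1)], [("d", 2), ("e", 1)]]
sizes = ["large" if is_large(otu) else "small" for otu in otus]
```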

       -c, --ceiling positive integer
                when using the option --fastidious (-f), define swarm's maximum memory  footprint
                (in megabytes). swarm will adjust the --bloom-bits (-y) value of the Bloom filter
                to fit within the specified amount of memory. The value must be at least 8.

       -f, --fastidious
                when working with d = 1, perform a second clustering pass to reduce the number of
                small   OTUs   (recommended   option).  During  the  first  clustering  pass,  an
                intermediate amplicon can be missing for purely stochastic reasons,  interrupting
                the aggregation process. The fastidious option will create virtual amplicons,
                allowing small OTUs to be grafted onto bigger ones. By default, an OTU is small if it
                has  a  mass  of  2  or less (see the --boundary option to modify that value). To
                speed things up, swarm  uses  a  Bloom  filter  to  store  intermediate  results.
                Warning,  the  second  clustering  pass can be 2 to 3 times slower than the first
                pass and requires much more memory  to  store  the  virtual  amplicons  in  Bloom
                filters.  See  the  options  --bloom-bits  (-y)  or --ceiling (-c) to control the
                memory footprint of the Bloom filter. The fastidious option  modifies  clustering
                results: the output files produced by the options --log (-l), --output-file (-o),
                --mothur (-r), --uclust-file (-u), and --seeds (-w) are updated to reflect these
                modifications;  the  file  --statistics-file (-s) is partially updated (columns 6
                and 7 are not updated); the output file --internal-structure  (-i)  is  partially
                updated (column 5 is not updated for amplicons that belonged to the small OTU).

       -y, --bloom-bits positive integer
                when  using the option --fastidious (-f), define the size (in bits) of each entry
                in the Bloom filter. That option helps balance the efficiency (i.e. speed)
                and  the  memory  footprint of the Bloom filter. Large values will make the Bloom
                filter more efficient but will require more memory. Any value between  2  and  64
                can  be  used.  Default  value  is  16.  See  the  --ceiling  (-c)  option for an
                alternative way to control the memory footprint.

   Input/output options
       -a, --append-abundance positive integer
                set abundance value to use when some or all amplicons  in  the  input  file  lack
                abundance  values (_integer, or ;size=integer; when using -z). Warning, it is not
                recommended to use swarm on datasets where abundance values are all identical. We
                provide  that  option  as  a courtesy to advanced users, please use it carefully.
                swarm exits with an error message if abundance values are  missing  and  if  this
                option is not used.

       -i, --internal-structure filename
                output all pairs of nearly-identical amplicons to filename using a five-column
                tab-delimited format:

                       1.  amplicon A label (header without abundance annotations).

                       2.  amplicon B label (header without abundance annotations).

                       3.  number of differences between amplicons A and B (positive integer).

                       4.  OTU number (positive integer). OTUs are numbered  in  their  order  of
                           delineation,  starting from 1. All pairs of amplicons belonging to the
                           same OTU will receive the same number.

                       5.  cumulative number of steps from the OTU seed to amplicon B (positive
                           integer). When using the option --fastidious (-f), the actual number
                           of steps between grafted amplicons and the OTU seed cannot be
                           recomputed efficiently and is always set to 2 for the amplicon pair
                           linking the small OTU to the big OTU. Cumulative numbers of steps in
                           the small OTU (if any) are left unchanged.
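
                The five columns listed above can be loaded for downstream analysis with a
                short Python reader. This is a minimal sketch: Link and read_internal_structure
                are hypothetical names, and the example content is made up, not real swarm
                output.

```python
import io
from typing import NamedTuple


class Link(NamedTuple):
    amplicon_a: str   # column 1: label of amplicon A
    amplicon_b: str   # column 2: label of amplicon B
    differences: int  # column 3: differences between A and B
    otu: int          # column 4: OTU number, starting from 1
    steps: int        # column 5: steps from the OTU seed to B


def read_internal_structure(handle):
    """Parse the five tab-separated columns written by -i."""
    links = []
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 5:
            continue  # skip blank or unexpected lines
        a, b, differences, otu, steps = fields
        links.append(Link(a, b, int(differences), int(otu), int(steps)))
    return links


# Made-up example content.
example = io.StringIO("s1\ts2\t1\t1\t1\ns2\ts3\t1\t1\t2\n")
links = read_internal_structure(example)
```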

       -l, --log filename
                output  all messages to filename instead of standard error, with the exception of
                error messages of course. That option is useful in situations  where  writing  to
                standard error is problematic (for example, with certain job schedulers).

       -o, --output-file filename
                output clustering results to filename. Results consist of a list of OTUs, one OTU
                per line. An OTU is a list of amplicon headers separated by spaces.  That  output
                format  can  be  modified  by  the  option  --mothur (-r). Default is to write to
                standard output.
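
                A file in this default format can be read back with a few lines of Python.
                The sketch below assumes the default abundance style (label_integer, without
                -z or -r); read_otus and split_header are hypothetical helper names, and the
                example content is made up.

```python
import io


def split_header(header):
    """Split 'label_integer' at the last underscore."""
    label, _, abundance = header.rpartition("_")
    return label, int(abundance)


def read_otus(handle):
    """Return one list of (label, abundance) pairs per OTU line."""
    otus = []
    for line in handle:
        headers = line.split()
        if headers:
            otus.append([split_header(h) for h in headers])
    return otus


# Made-up example content: two OTUs.
example = io.StringIO("a_10 b_2 c_1\nd_3\n")
otus = read_otus(example)
masses = [sum(abundance for _, abundance in otu) for otu in otus]
```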

       -r, --mothur
                output clustering results  in  a  format  compatible  with  Mothur.  That  option
                modifies swarm's default output format.

       -s, --statistics-file filename
                output statistics to filename. The file is a tab-separated table with one OTU per
                row and seven columns of information:

                       1.  number of unique amplicons in the OTU,

                       2.  total abundance of amplicons in the OTU,

                       3.  label of the initial seed (header without abundance annotations),

                       4.  abundance of the initial seed,

                       5.  number of amplicons with an abundance of 1 in the OTU,

                       6.  maximum number of iterations before the OTU reached its natural limit,

                       7.  cumulative number of steps along the path joining the seed and the
                           furthermost amplicon in the OTU. Please note that the actual number of
                           differences between the seed and the furthermost amplicon  is  usually
                           much  smaller.  When  using  the  option  --fastidious  (-f),  grafted
                           amplicons are not taken into account.
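
                The seven columns can be mapped to named fields with a short Python reader.
                This is a sketch: OtuStats and read_statistics are hypothetical names chosen
                for the example, and the sample row is made up.

```python
import io
from typing import NamedTuple


class OtuStats(NamedTuple):
    unique_amplicons: int  # column 1
    total_abundance: int   # column 2
    seed_label: str        # column 3
    seed_abundance: int    # column 4
    singletons: int        # column 5: amplicons with an abundance of 1
    iterations: int        # column 6
    steps: int             # column 7


def read_statistics(handle):
    """Parse the seven tab-separated columns written by -s."""
    rows = []
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 7:
            continue  # skip blank or unexpected lines
        rows.append(OtuStats(int(fields[0]), int(fields[1]), fields[2],
                             *map(int, fields[3:])))
    return rows


# Made-up example row.
stats = read_statistics(io.StringIO("3\t13\ta\t10\t1\t1\t2\n"))
```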

       -u, --uclust-file filename
                output clustering results in filename using a  tab-separated  uclust-like  format
                with 10 columns and 3 different types of entries (S, H or C). That option does not
                modify swarm's default output format. Each fasta sequence in the input  file  can
                be  either  a  cluster  centroid  (S) or a hit (H) assigned to a cluster. Cluster
                records (C) summarize information  (size,  centroid  header)  for  each  cluster.
                Column content varies with the type of entry (S, H or C):

                       1.  Record type: S, H, or C.

                       2.  Cluster number (zero-based).

                       3.  Centroid length (S), query length (H), or cluster size (C).

                       4.  Percentage of similarity with the centroid sequence (H), or set to '*'
                           (S, C).

                       5.  Match orientation + or - (H), or set to '*' (S, C).

                       6.  Not used, always set to '*' (S, C) or to zero (H).

                       7.  Not used, always set to '*' (S, C) or to zero (H).

                       8.  set to '*' (S, C) or, for H, compact representation  of  the  pairwise
                           alignment   using  the  CIGAR  format  (Compact  Idiosyncratic  Gapped
                           Alignment Report): M (match), D  (deletion)  and  I  (insertion).  The
                           equal  sign  '=' indicates that the query is identical to the centroid
                           sequence.

                       9.  Header of the query sequence (H), or of the centroid sequence (S, C).

                       10. Header of the centroid sequence (H), or set to '*' (S, C).
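
                As an illustration of how the record types fit together, the following Python
                sketch rebuilds cluster membership from the S and H records and ignores the C
                summaries. The function name clusters_from_uclust is hypothetical and the
                sample records are made up for the example.

```python
import io
from collections import defaultdict


def clusters_from_uclust(handle):
    """Map each cluster number to the headers of its members."""
    members = defaultdict(list)
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 10:
            continue  # skip blank or unexpected lines
        record_type, cluster, header = fields[0], int(fields[1]), fields[8]
        # S: centroid, H: hit assigned to a cluster. C records only summarize.
        if record_type in ("S", "H"):
            members[cluster].append(header)
    return dict(members)


# Made-up example records.
example = io.StringIO(
    "S\t0\t4\t*\t*\t*\t*\t*\ta_10\t*\n"
    "H\t0\t4\t75.0\t+\t0\t0\t4M\tb_2\ta_10\n"
    "C\t0\t2\t*\t*\t*\t*\t*\ta_10\t*\n"
)
clusters = clusters_from_uclust(example)
```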

       -w, --seeds filename
                output OTU representative sequences to filename in fasta  format.  The  abundance
                value  of  each  OTU  representative  is  the  sum  of  the abundances of all the
                amplicons in the OTU. Fasta headers are formatted as follows: '>label_integer', or
                '>label;size=integer;'  if  the  -z option is used, and sequences are uppercased.
                Sequences are sorted by decreasing abundance, and then by alphabetical  order  of
                sequence labels.

       -z, --usearch-abundance
                accept     amplicon     abundance     values     in    usearch/vsearch's    style
                (>label;size=integer[;]). That option influences the abundance  annotation  style
                used in swarm's standard output (-o), as well as the output of options -r, -u and
                -w.

   Pairwise alignment advanced options
       When using d > 1, swarm recognizes advanced command-line options modifying the pairwise
       global alignment scoring parameters:

              -m, --match-reward positive integer
                       Default reward for a nucleotide match is 5.

              -p, --mismatch-penalty positive integer
                       Default penalty for a nucleotide mismatch is 4.

              -g, --gap-opening-penalty positive integer
                       Default gap opening penalty is 12.

              -e, --gap-extension-penalty positive integer
                       Default gap extension penalty is 4.

       As swarm focuses on close relationships (e.g., d = 2 or 3), clustering results are
       resilient to modifications of the pairwise alignment model parameters. When clustering
       using a higher d value, modifying model parameters has a stronger impact.

EXAMPLES

       Cluster the compressed data set myfile.fasta.gz into OTUs using the finest resolution
       possible (1 difference by default, built-in breaking, fastidious option) using 4
       computation threads. OTUs are written to the file myfile.swarms, and OTU representatives
       are written to myfile.representatives.fasta:
              zcat myfile.fasta.gz | \
                  swarm \
                      -t 4 \
                      -f \
                      -w myfile.representatives.fasta \
                      -o myfile.swarms

AUTHORS

       Concept by Frédéric Mahé, implementation by Torbjørn Rognes.

CITATION

       Mahé F, Rognes T, Quince C, de Vargas  C,  Dunthorn  M.  (2014)  Swarm:  robust  and  fast
       clustering       method      for      amplicon-based      studies.       PeerJ      2:e593
       ⟨https://doi.org/10.7717/peerj.593⟩.

       Mahé F, Rognes T, Quince C, de Vargas C, Dunthorn M. (2015) Swarm v2: highly-scalable  and
       high-resolution amplicon clustering.  PeerJ 3:e1420 ⟨https://doi.org/10.7717/peerj.1420⟩.

REPORTING BUGS

       Submit  suggestions  and bug-reports at ⟨https://github.com/torognes/swarm/issues⟩, send a
       pull request at ⟨https://github.com/torognes/swarm⟩, or compose a friendly or curmudgeonly
       e-mail     to    Frédéric    Mahé    ⟨frederic.mahe@cirad.fr⟩    and    Torbjørn    Rognes
       ⟨torognes@ifi.uio.no⟩.

AVAILABILITY

       Source code and binaries available at ⟨https://github.com/torognes/swarm⟩.

COPYRIGHT

       Copyright (C) 2012-2019 Frédéric Mahé & Torbjørn Rognes

       This program is free software: you can redistribute it and/or modify it under the terms of
       the GNU Affero General Public License as published by the Free Software Foundation, either
       version 3 of the License, or any later version.

       This program is distributed in the hope that it will be useful, but WITHOUT ANY  WARRANTY;
       without  even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
       See the GNU Affero General Public License for more details.

       You should have received a copy of the GNU Affero General Public License along  with  this
       program.  If not, see ⟨http://www.gnu.org/licenses/⟩.

SEE ALSO

       swipe, an extremely fast Smith-Waterman database search tool by Torbjørn Rognes (available
       at ⟨https://github.com/torognes/swipe⟩).

       vsearch, an open-source re-implementation of the  classic  uclust  clustering  method  (by
       Robert  C.  Edgar),  along  with  other amplicon filtering and searching tools. vsearch is
       implemented by Torbjørn Rognes and documented  by  Frédéric  Mahé,  and  is  available  at
       ⟨https://github.com/torognes/vsearch⟩.

VERSION HISTORY

       New features and important modifications of swarm (short-lived or minor bug releases are
       not mentioned):

              v3.0.0 released October 24, 2019
                     Version 3.0.0 introduces a faster algorithm for d = 1, and a reduced  memory
                     footprint.  Swarm  has  been ported to Windows x86-64, GNU/Linux ARM 64, and
                     GNU/Linux  POWER8.  Internal  code  has  been  modernized,   hardened,   and
                     thoroughly tested. Strict dereplication of input sequences is now mandatory.
                     The --seeds option (-w) now outputs results sorted by decreasing  abundance,
                     and then by alphabetical order of sequence labels.

              v2.2.2 released December 12, 2017
                     Version  2.2.2  fixes  a  bug that would cause swarm to wait forever in very
                     rare cases when multiple threads were used.

              v2.2.1 released October 27, 2017
                     Version 2.2.1 fixes a memory  allocation  bug  for  d  =  1  and  duplicated
                     sequences.

              v2.2.0 released October 17, 2017
                     Version  2.2.0  fixes  several  problems  and  improves usability. Corrected
                     output to structure and uclust files when using fastidious  mode.  Corrected
                     abundance  output  in  some  cases. Added check for duplicated sequences and
                     fixed check for duplicated sequence IDs. Checks for empty  sequences.  Sorts
                     sequences  by additional fields to improve stability. Improves compatibility
                     with compilers and operating systems.   Outputs  sequences  in  upper  case.
                     Allows  64-bit  abundances. Shows message when waiting for input from stdin.
                     Improves error messages and warnings.  Improves  checking  of  command  line
                     options.   Fixes   remaining   errors   reported   by  test  suite.  Updates
                     documentation.

              v2.1.13 released March 8, 2017
                     Version 2.1.13 removes a bug with the progress bar when writing seeds.

              v2.1.12 released January 16, 2017
                     Version 2.1.12 removes a debugging message.

              v2.1.11 released January 16, 2017
                     Version 2.1.11  fixes  two  bugs  related  to  the  SIMD  implementation  of
                     alignment  that  might  result  in incorrect alignments and scores.  The bug
                     only applies when d > 1.

              v2.1.10 released December 22, 2016
                     Version 2.1.10 fixes two bugs related to gap penalties of  alignments.   The
                     first bug may lead to wrong alignments and similarity percentages reported in
                     UCLUST (.uc) files. The second bug makes swarm use a slightly higher gap
                     extension penalty than specified. The default gap extension penalty used
                     has actually been 4.5 instead of 4.

              v2.1.9 released July 6, 2016
                     Version 2.1.9 fixes errors when compiling with GCC version 6.

              v2.1.8 released March 11, 2016
                     Version 2.1.8 fixes a rare bug triggered  when  clustering  extremely  short
                     undereplicated  sequences. Also, alignment parameters are not shown when d =
                     1.

              v2.1.7 released February 24, 2016
                     Version 2.1.7 fixes a bug in the output of seeds with the -w option when d >
                     1  that  was  not  properly  fixed  in  version 2.1.6. It also handles ascii
                     character #13 (CR) in FASTA files better. Swarm will now exit with status  0
                     if  the  -h  or  the  -v  option  is specified. The help text and some error
                     messages have been improved.

              v2.1.6 released December 14, 2015
                     Version 2.1.6 fixes problems with older  compilers  that  do  not  have  the
                     x86intrin.h header file. It also fixes a bug in the output of seeds with the
                     -w option when d > 1.

              v2.1.5 released September 8, 2015
                     Version 2.1.5 fixes minor bugs.

              v2.1.4 released September 4, 2015
                     Version 2.1.4 fixes minor bugs in the swarm algorithm used for d = 1.

              v2.1.3 released August 28, 2015
                     Version 2.1.3 adds checks of numeric option arguments.

              v2.1.1 released March 31, 2015
                     Version 2.1.1 fixes a bug with the  fastidious  option  that  caused  it  to
                     ignore some connections between large and small OTUs.

              v2.1.0 released March 24, 2015
                     Version 2.1.0 marks the first official release of swarm v2.

              v2.0.7 released March 18, 2015
                     Version  2.0.7  writes  abundance  information  in  usearch style when using
                     options -w (--seeds) in combination with -z (--usearch-abundance).

              v2.0.6 released March 13, 2015
                     Version 2.0.6 fixes a minor bug.

              v2.0.5 released March 13, 2015
                     Version 2.0.5 improves the implementation of the fastidious option and  adds
                     options  to  control  memory  usage  of  the  Bloom  filter (-y and -c).  In
                     addition, an option (-w) outputs OTU representative sequences with
                     updated  abundances  (sum  of  all abundances inside each OTU). This version
                     also enables swarm to run with d = 0.

              v2.0.4 released March 6, 2015
                     Version 2.0.4 includes a fully parallelised implementation of the fastidious
                     option.

              v2.0.3 released March 4, 2015
                     Version  2.0.3  includes  a working implementation of the fastidious option,
                     but only the initial clustering is parallelized.

              v2.0.2 released February 26, 2015
                     Version 2.0.2 fixes SSSE3 problems.

              v2.0.1 released February 26, 2015
                     Version  2.0.1  is  a  development   version   that   contains   a   partial
                     implementation of the fastidious option, but it is not usable yet.

              v2.0.0 released December 3, 2014
                     Version  2.0.0  is  faster  and  easier to use, providing new output options
                     (--internal-structure  and  --log),   new   control   options   (--boundary,
                     --fastidious,  --no-otu-breaking),  and  built-in OTU refinement (no need to
                     use the python script anymore). When using default parameters, a  novel  and
                     considerably  faster  algorithmic  approach  is  used,  guaranteeing swarm's
                     scalability.

              v1.2.21 released February 26, 2015
                     Version 1.2.21 is supposed to fix some problems related to the  use  of  the
                     SSSE3 CPU instructions which are not always available.

              v1.2.20 released November 6, 2014
                     Version  1.2.20  presents  a  production-ready  version  of  the alternative
                     algorithm (option -a), with optional built-in OTU breaking (option -n). That
                     alternative  algorithmic  approach  (usable only with d = 1) is considerably
                     faster than currently used clustering algorithms, and can deal with datasets
                     of  100  million unique amplicons or more in a few hours. Of course, results
                     are rigorously identical to the results previously produced with swarm.
                     That release also introduces new options to control swarm output (options -i
                     and -l).

              v1.2.19 released October 3, 2014
                     Version 1.2.19 fixes a problem related to  abundance  information  when  the
                     sequence label includes multiple underscore characters.

              v1.2.18 released September 29, 2014
                     Version  1.2.18 reenables the possibility of reading sequences from stdin if
                     no file name is specified on the command line. It also fixes a  bug  related
                     to CPU features detection.

              v1.2.17 released September 28, 2014
                     Version 1.2.17 fixes a memory allocation bug introduced in version 1.2.15.

              v1.2.16 released September 27, 2014
                     Version  1.2.16  fixes  a  bug  in  the abundance sort introduced in version
                     1.2.15.

              v1.2.15 released September 27, 2014
                     Version 1.2.15 sorts the input sequences in order  of  decreasing  abundance
                     unless  they  are  detected to be sorted already. When using the alternative
                     algorithm for d = 1 it also  sorts  all  subseeds  in  order  of  decreasing
                     abundance.

              v1.2.14 released September 27, 2014
                     Version  1.2.14  fixes  a  bug in the output with the --swarm_breaker option
                     (-b) when using the alternative algorithm (-a).

              v1.2.12 released August 18, 2014
                     Version 1.2.12  introduces  an  option  --alternative-algorithm  to  use  an
                     extremely  fast,  experimental clustering algorithm for the special case d =
                     1. Multithreading scalability of the default algorithm has  been  noticeably
                     improved.

              v1.2.10 released August 8, 2014
                     Version  1.2.10 allows amplicon abundances to be specified using the usearch
                     style in the sequence header (e.g.  '>id;size=1')  when  the  -z  option  is
                     chosen.

              v1.2.8 released August 5, 2014
                     Version  1.2.8  fixes  an  error  with  the  gap extension penalty. Previous
                     versions used a gap penalty twice as large as intended. That bug  correction
                     induces small changes in clustering results.

              v1.2.6 released May 23, 2014
                     Version  1.2.6 introduces an option --mothur to output clustering results in
                     a format compatible with the microbial ecology community  analysis  software
                     suite Mothur (⟨https://www.mothur.org/⟩).

              v1.2.5 released April 11, 2014
                     Version  1.2.5  removes  the  need  for  a POPCNT hardware instruction to be
                     present. swarm now automatically checks whether POPCNT is available and uses
                     a   slightly   slower  software  implementation  if  not.  Only  basic  SSE2
                     instructions are now required to run swarm.

              v1.2.4 released January 30, 2014
                     Version 1.2.4 introduces an option --break-swarms to  output  all  pairs  of
                     amplicons  with  d differences to standard error. That option is used by the
                     companion script `swarm_breaker.py` to refine swarm results. The  syntax  of
                     the inline assembly code is changed for compatibility with more compilers.

              v1.2 released May 16, 2013
                     Version  1.2  greatly  improves speed by using alignment-free comparisons of
                     amplicons based on k-mer word content. For each amplicon, the
                     presence-absence of all possible 5-mers is computed and recorded in a 1024-bit
                     vector. Vector comparisons are extremely fast  and  drastically  reduce  the
                     number  of  costly  pairwise  alignments performed by swarm. While remaining
                     exact, swarm 1.2 can be more than 100 times faster than swarm 1.1, when
                     using  a  single  thread  with  a  large set of sequences. The minor version
                     1.1.1, published just before, adds compatibility with Apple  computers,  and
                     corrects  an  issue in the pairwise global alignment step that could lead to
                     sub-optimal alignments.

              v1.1 released February 26, 2013
                     Version 1.1 introduces two new important options: the possibility to  output
                     clustering  results  using  the uclust output format, and the possibility to
                     output detailed statistics on each  OTU.  swarm  1.1  is  also  faster:  new
                     filterings  based  on  pairwise  amplicon  sequence  lengths and composition
                     comparisons reduce the number of pairwise alignments needed and speed up the
                     clustering.

              v1.0 released November 10, 2012
                     First public release.