Provided by: swarm_3.1.5+dfsg-2_amd64 bug

NAME

       swarm — find clusters of nearly-identical nucleotide amplicons

SYNOPSIS

       swarm -h|v

       High-precision clustering:

       swarm [filename]

       swarm [-d 1] [-nrz] [-a int] [-i filename] [-j filename] [-l filename] [-o filename]
             [-s filename] [-t int] [-u filename] [-w filename] [filename]

       swarm [-d 1] -f [-nrz] [-a int] [-b int] [-c|y int] [-i filename] [-j filename]
             [-l filename] [-o filename] [-s filename] [-t int] [-u filename] [-w filename]
             [filename]

       Conservative clustering:

       swarm -d 2+ [-nrxz] [-a int] [-e int] [-g int] [-i filename] [-l filename] [-m int]
             [-o filename] [-p int] [-s filename] [-t int] [-u filename] [-w filename] [filename]

       Dereplication (merge strictly identical sequences):

       swarm -d 0 [-rz] [-a int] [-i filename] [-l filename] [-o filename] [-s filename]
             [-u filename] [-w filename] [filename]

DESCRIPTION

       Environmental or clinical molecular studies generate large volumes of amplicons (e.g., 16S
       or  18S  SSU-rRNA sequences) that need to be grouped into clusters. Traditional clustering
       methods are based on greedy, input-order dependent algorithms, with arbitrary selection of
       cluster  centroids  and cluster limits (often 97%-similarity). To address that problem, we
       developed swarm, a fast and robust method that recursively groups amplicons with d or less
       differences  (i.e.  substitutions,  insertions  or  deletions). swarm produces natural and
       stable clusters centered on  local  peaks  of  abundance,  mostly  free  from  input-order
       dependency induced by centroid selection.

       Exact  clustering is impractical on large data sets when using a naïve all-vs-all approach
       (more precisely a 2-combination without repetitions), as it implies unrealistic numbers of
       pairwise  comparisons.  swarm  is  based  on a maximum number of differences d between two
       amplicons, and focuses only on very close local relationships. For  d  =  1,  the  default
       value,  swarm  uses  an  algorithm of linear complexity that generates all possible single
       mutations and performs exact-string matching by  comparing  hash-values.  For  d  =  2  or
       greater,  swarm  uses  an  algorithm of quadratic complexity that performs pairwise string
       comparisons. An efficient k-mer-based filtering and an astute use of  comparisons  results
       obtained  during  the  clustering  process  allows  swarm  to  avoid  most of the amplicon
       comparisons needed in a naïve approach. To speed up the  remaining  amplicon  comparisons,
       swarm  implements an extremely fast Needleman-Wunsch algorithm making use of the Streaming
       SIMD Extensions (SSE2) of x86-64 CPUs, NEON instructions of  ARM64  CPUs,  or  Altivec/VMX
       instructions  of  POWER8 CPUs. If SSE2 instructions are not available, swarm exits with an
       error message.

       swarm can read nucleotide amplicons in fasta  format  from  a  normal  file  or  from  the
       standard  input  (using  a  pipe  or a redirection). The amplicon header is defined as the
       string comprised between the '>' symbol and the first  space  or  the  end  of  the  line,
       whichever  comes first. Each header must end with an abundance annotation representing the
       amplicon copy number and defined as '_' followed by a positive integer. See option -z  for
       input  data using usearch/vsearch's abundance annotation format (';size=integer[;]'). Once
       stripped from the abundance annotation, the remaining part  of  the  header  is  call  the
       label. In summary, using regular expression patterns:

                         >header[[:blank:]]   and   header = label_[1-9][0-9]*$

       Abundance  annotations play a crucial role in the clustering process, and swarm exits with
       an error message if that information is not available. As swarm outputs lists of  amplicon
       labels,  amplicon  labels must be unique to avoid any ambiguity; swarm exits with an error
       message if labels are not unique. The amplicon sequence is defined as a string  of  [ACGT]
       or  [ACGU] symbols (case insensitive, 'U' is replaced with 'T' internally), starting after
       the end of the header line and ending before the next header line or the file  end;  swarm
       silently  removes  newline  symbols  ('\n' or '\r') and exits with an error message if any
       other symbol is present. Accepted sequence lengths range from 1 nucleotide to  67  million
       nucleotides. Please note that processing 67-Mb sequences requires at least 32 gigabytes of
       memory. Lastly, if sequences are not all unique,  i.e.  were  not  properly  dereplicated,
       swarm will exit with an error message.

       Clusters  are  written  to  output  files (specified with -i, -o, -s and -u) by decreasing
       abundance of their seed sequences, and then by alphabetical order of seed sequence labels.
       An  exception  to  that  is the -w (--seeds) output, which is sorted by decreasing cluster
       abundance (sum of abundances of all sequences in the cluster), and  then  by  alphabetical
       order of seed sequence labels. This is particularly useful for post-clustering steps, such
       as de novo chimera detection, that require clusters to be sorted by decreasing abundances.

   General options
       -h, --help
                display this help and exit successfully.

       -t, --threads positive integer
                number of computation threads to use. Values between 1 and 512 are accepted,  but
                we  recommend  to  use  a  number  of  threads  lesser  or equal to the number of
                available CPU cores. Default number of threads is 1.

       -v, --version
                output version information and exit successfully.

       --       delimit the option list. Later arguments, if any, are treated as operands even if
                they  begin  with  '-'.  For  example, 'swarm -- -file.fasta' reads from the file
                '-file.fasta'.

   Clustering options
       -d, --differences zero or positive integer
                maximum number of differences allowed between two  amplicons,  meaning  that  two
                amplicons  will  be  grouped  if they have integer (or less) differences. This is
                swarm's most important parameter. The number of differences is calculated as  the
                number  of  mismatches  (substitutions,  insertions or deletions) between the two
                amplicons once  the  optimal  pairwise  global  alignment  has  been  found  (see
                'pairwise alignment advanced options' to influence that step).

                Any  integer  from  0  to  255  can  be used, but high d values will decrease the
                taxonomical resolution of swarm results. Commonly used d values are 1,  2  or  3,
                rarely  higher.  When  using  d = 0, swarm will output results corresponding to a
                strict dereplication of the dataset, i.e. merging identical  amplicons.  Warning,
                whatever  the  d value, swarm requires fasta entries to present abundance values.
                Default number of differences d is 1.

       -n, --no-otu-breaking
                when working with  d  =  1,  deactivate  the  built-in  cluster  refinement  (not
                recommended).  Amplicon  abundance  values are used to identify transitions among
                in-contact clusters and to separate them, yielding  higher-resolution  clustering
                results.  That  option  prevents  that  separation,  and  in practice, allows the
                creation of a link between amplicons A and B, even  if  the  abundance  of  B  is
                higher than the abundance of A.

   Fastidious options
       -b, --boundary positive integer
                when  using  the  option  --fastidious (-f), define the minimum abundance of what
                should be considered a  large  cluster.  By  default,  a  cluster  with  a  total
                abundance  of 3 or more is considered large. Conversely, a cluster is small if it
                has a total abundance of 2 or less, meaning that it is  composed  of  either  one
                amplicon  of  abundance  2,  or  two amplicons of abundance 1, or one amplicon of
                abundance 1. Any positive value greater than 1 can  be  specified.  Using  higher
                boundary  values  can  reduce  the  number  of clusters (up to a point), and will
                reduce the taxonomical  resolution  of  swarm  results.  It  will  also  slightly
                increase computation time.

       -c, --ceiling positive integer
                when  using the option --fastidious (-f), define swarm's maximum memory footprint
                (in megabytes). swarm will adjust the --bloom-bits (-y) value of the Bloom filter
                to  fit  within  the specified amount of memory. Values accepted range from 40 to
                1,073,741,824 megabytes. See the --bloom-bits (-y) option for an alternative  way
                to control the memory footprint.

       -f, --fastidious
                when working with d = 1, perform a second clustering pass to reduce the number of
                small clusters  (recommended  option).  During  the  first  clustering  pass,  an
                intermediate  amplicon can be missing for purely stochastic reasons, interrupting
                the aggregation process. The fastidious option  will  create  virtual  amplicons,
                allowing  to  graft  small  clusters  upon  larger ones. By default, a cluster is
                considered large if it has a total abundance of 3 or  more  (see  the  --boundary
                option to modify that value).

                To  speed  things  up,  swarm  uses a Bloom filter to store intermediate results.
                Warning, the second clustering pass can be 2 to 3 times  slower  than  the  first
                pass  and  requires  much  more  memory  to  store the virtual amplicons in Bloom
                filters. See the options --bloom-bits (-y)  or  --ceiling  (-c)  to  control  the
                memory footprint of the Bloom filter.

                The  fastidious  option modifies clustering results: the output files produced by
                the options --log (-l), --output-file (-o),  --mothur  (-r),  --uclust-file,  and
                --seeds   (-w)   are   updated   to   reflect   these   modifications;  the  file
                --statistics-file (-s) is partially updated (columns 6 and 7  are  not  updated);
                the  output  file --internal-structure (-i) is partially updated (column 5 is not
                updated for amplicons that belonged to the small cluster).

       -y, --bloom-bits positive integer
                when using the option --fastidious (-f), define the size (in bits) of each  entry
                in  the  Bloom  filter. That option allows to balance the efficiency (i.e. speed)
                and the memory footprint of the Bloom filter. Large values will  make  the  Bloom
                filter  more  efficient  but will require more memory. Any value between 2 and 64
                can be used.  Default  value  is  16.  See  the  --ceiling  (-c)  option  for  an
                alternative way to control the memory footprint.

   Input/output options
       -a, --append-abundance positive integer
                set  abundance  value  to  use  when some or all amplicons in the input file lack
                abundance values (_integer, or ;size=integer; when using -z). Warning, it is  not
                recommended to use swarm on datasets where abundance values are all identical. We
                provide that option as a courtesy to advanced users,  please  use  it  carefully.
                swarm  exits  with  an  error message if abundance values are missing and if this
                option is not used.

       -i, --internal-structure filename
                output all pairs of nearly-identical amplicons to filename  using  a  five-column
                tab-delimited format:

                       1.  amplicon A label (header without abundance annotations).

                       2.  amplicon B label (header without abundance annotations).

                       3.  number of differences between amplicons A and B (positive integer).

                       4.  cluster  number  (positive  integer).  Clusters  are numbered in their
                           order  of  delineation,  starting  from  1.  All  pairs  of  amplicons
                           belonging to the same cluster will receive the same number.

                       5.  cummulated  number  of  steps  from  the  cluster  seed  to amplicon B
                           (positive integer). When  using  the  option  --fastidious  (-f),  the
                           actual  number of steps between grafted amplicons and the cluster seed
                           cannot be re-computed efficiently and is  always  set  to  2  for  the
                           amplicon  pair  linking  the  small  cluster  to  the  large  cluster.
                           Cummulated number of steps in the small  cluster  (if  any)  are  left
                           unchanged.

       -j, --network-file filename
                (advanced  users)  when working with d = 1, dump raw amplicon network to filename
                using a two-column tab-delimited table of  headers  with  abundance  annotations.
                Each  line  represents  a connection between two similar amplicons, from the most
                abundant to the lesser abundant. When amplicons have the  same  abundance  value,
                connections  are  bi-directional and are represented on two lines: A to B, then B
                to A.

                In order to delineate clusters  and  to  compute  the  equivalent  of  a  minimal
                spanning  tree  for  each  cluster (see option --internal-structure), swarm first
                builds a network of similar amplicons. This option  is  for  advanced  users  who
                would like to explore this raw network.

       -l, --log filename
                output  all messages to filename instead of standard error, with the exception of
                error messages of course. That option is useful in situations  where  writing  to
                standard error is problematic (for example, with certain job schedulers).

       -o, --output-file filename
                output clustering results to filename. Results consist of a list of clusters, one
                cluster per line. A cluster is a list of amplicon headers  separated  by  spaces.
                That  output  format  can  be modified by the option --mothur (-r). Default is to
                write to standard output.

       -r, --mothur
                output clustering results  in  a  format  compatible  with  Mothur.  That  option
                modifies swarm's default output format.

       -s, --statistics-file filename
                output statistics to filename. The file is a tab-separated table with one cluster
                per row and seven columns of information:

                       1.  number of unique amplicons in the cluster,

                       2.  total abundance of amplicons in the cluster,

                       3.  label of the initial seed (header without abundance annotations),

                       4.  abundance of the initial seed,

                       5.  number of amplicons with an abundance of 1 in the cluster,

                       6.  maximum number of iterations before the cluster  reached  its  natural
                           limit,

                       7.  cummulated  number  of  steps  along the path joining the seed and the
                           furthermost amplicon in the  cluster.  Please  note  that  the  actual
                           number of differences between the seed and the furthermost amplicon is
                           usually much smaller. When using the option --fastidious (-f), grafted
                           amplicons are not taken into account.

       -u, --uclust-file filename
                output  clustering  results  in filename using a tab-separated uclust-like format
                with 10 columns and 3 different type of entries (S, H or C). That option does not
                modify  swarm's  default output format. Each fasta sequence in the input file can
                be either a cluster centroid (S) or a hit (H)  assigned  to  a  cluster.  Cluster
                records  (C)  summarize  information  for  each cluster (number of hits, centroid
                header). Column content varies with the type of entry (S, H or C):

                       1.  Record type: S, H, or C.

                       2.  Cluster number (zero-based).

                       3.  Centroid length (S), query length (H), or number of hits (C).

                       4.  Percentage of similarity with the centroid sequence (H), or set to '*'
                           (S, C).

                       5.  Match orientation + or - (H), or set to '*' (S, C).

                       6.  Not used, always set to '*' (S, C) or to zero (H).

                       7.  Not used, always set to '*' (S, C) or to zero (H).

                       8.  set  to  '*'  (S, C) or, for H, compact representation of the pairwise
                           alignment  using  the  CIGAR  format  (Compact  Idiosyncratic   Gapped
                           Alignment  Report):  M  (match),  D  (deletion) and I (insertion). The
                           equal sign '=' indicates that the query is identical to  the  centroid
                           sequence.

                       9.  Header of the query sequence (H), or of the centroid sequence (S, C).

                       10. Header of the centroid sequence (H), or set to '*' (S, C).

       -w, --seeds filename
                output  cluster  representative  sequences  to  filename  in  fasta  format.  The
                abundance value of each cluster representative is the sum of  the  abundances  of
                all  the  amplicons  in  the  cluster.  Fasta  headers  are  formated as follows:
                '>label_integer', or  '>label;size=integer;'  if  the  -z  option  is  used,  and
                sequences  are uppercased. Sequences are sorted by decreasing abundance, and then
                by alphabetical order of sequence labels.

       -z, --usearch-abundance
                accept    amplicon    abundance     values     in     usearch/vsearch's     style
                (>label;size=integer[;]).  That  option influences the abundance annotation style
                used in swarm's standard output (-o), as well as the output of options -r, -u and
                -w.

   Pairwise alignment advanced options
       when  using  d  > 1, swarm recognizes advanced command-line options modifying the pairwise
       global alignment scoring parameters:

              -m, --match-reward positive integer
                       Default reward for a nucleotide match is 5.

              -p, --mismatch-penalty positive integer
                       Default penalty for a nucleotide mismatch is 4.

              -g, --gap-opening-penalty positive integer
                       Default gap opening penalty is 12.

              -e, --gap-extension-penalty positive integer
                       Default gap extension penalty is 4.

              -x, --disable-sse3
                       On the x86-64 CPU architecture, disable SSE3 and later instructions.  This
                       option is meant for developers, not for regular users.

       As  swarm  focuses  on  close  relationships  (e.g.,  d  = 2 or 3), clustering results are
       resilient to pairwise alignment model parameters modifications. When  clustering  using  a
       higher d value, modifying model parameters has a stronger impact.

EXAMPLES

       Clusterize  the  compressed  data set myfile.fasta using the finest resolution possible (1
       difference by default, built-in breaking, fastidious option) using 4 computation  threads.
       Clusters are written to the file myfile.swarms, and cluster representatives are written to
       myfile.representatives.fasta:
              zcat myfile.fasta.gz | \
                  swarm \
                      -t 4 \
                      -f \
                      -w myfile.representatives.fasta \
                      -o myfile.swarms

AUTHORS

       Concept by Frédéric Mahé, implementation by Torbjørn Rognes.

CITATION

       Mahé F, Rognes T, Quince C, de Vargas  C,  Dunthorn  M.  (2014)  Swarm:  robust  and  fast
       clustering       method      for      amplicon-based      studies.       PeerJ      2:e593
       ⟨https://doi.org/10.7717/peerj.593⟩.

       Mahé F, Rognes T, Quince C, de Vargas C, Dunthorn M. (2015) Swarm v2: highly-scalable  and
       high-resolution amplicon clustering.  PeerJ 3:e1420 ⟨https://doi.org/10.7717/peerj.1420⟩.

       Mahé  F,  Czech L, Stamatakis A, Quince C, de Vargas C, Dunthorn M, Rognes T. (2021) Swarm
       v3:       towards       tera-scale       amplicon       clustering.         Bioinformatics
       ⟨https://doi.org/10.1093/bioinformatics/btab493⟩.

REPORTING BUGS

       Submit  suggestions  and bug-reports at ⟨https://github.com/torognes/swarm/issues⟩, send a
       pull request  at  ⟨https://github.com/torognes/swarm/pulls⟩,  or  compose  a  friendly  or
       curmudgeonly   e-mail  to  Frédéric  Mahé  ⟨frederic.mahe@cirad.fr⟩  and  Torbjørn  Rognes
       ⟨torognes@ifi.uio.no⟩.

AVAILABILITY

       Source code and binaries available at ⟨https://github.com/torognes/swarm⟩.

COPYRIGHT

       Copyright (C) 2012-2024 Frédéric Mahé & Torbjørn Rognes

       This program is free software: you can redistribute it and/or modify it under the terms of
       the GNU Affero General Public License as published by the Free Software Foundation, either
       version 3 of the License, or any later version.

       This program is distributed in the hope that it will be useful, but WITHOUT ANY  WARRANTY;
       without  even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
       See the GNU Affero General Public License for more details.

       You should have received a copy of the GNU Affero General Public License along  with  this
       program.  If not, see ⟨https://www.gnu.org/licenses/⟩.

SEE ALSO

       swipe, an extremely fast Smith-Waterman database search tool by Torbjørn Rognes (available
       at ⟨https://github.com/torognes/swipe⟩).

       vsearch, an open-source re-implementation of the  classic  uclust  clustering  method  (by
       Robert  C.  Edgar),  along  with  other amplicon filtering and searching tools. vsearch is
       implemented by Torbjørn Rognes and documented  by  Frédéric  Mahé,  and  is  available  at
       ⟨https://github.com/torognes/vsearch⟩.

VERSION HISTORY

       New  features  and important modifications of swarm (short lived or minor bug releases are
       not mentioned):

              v3.1.5 released March 31, 2024
                     Version 3.1.5 changes the minimal  value  for  the  ceiling  option  from  8
                     megabytes  to  40  megabytes,  and  fixes four minor bugs. Warning, peak RSS
                     memory  increased  by  5  to  10%  when  d  >=  2.  Version  3.1.5  improves
                     documentation  (now  covering  option --network_file), adds more compilation
                     checks and eliminates 50 compilation warnings with GCC 13, GCC 14 and  clang
                     19, as well as 1,677 static analysis warnings.

              v3.1.4 released September 20, 2023
                     Version 3.1.4 fixes a minor bug. It eliminates compilation warnings with GCC
                     13 and clang 18, as well as 1,040  static  analysis  warnings.  The  maximal
                     number of threads swarm can run is now 512, instead of 256. Compilation with
                     runtime checks (`-DNDEBUG`) is now the default. When d > 1,  overall  memory
                     allocations remain unchanged, but peak RSS memory increased by 6 to 10%, due
                     to a change in the timing  of  memory  deallocations.  Peak  RSS  memory  is
                     expected to regress to its prior levels as refactoring continues.

              v3.1.3 released December 5, 2022
                     Version  3.1.3  fixes a regression introduced in version 3.1.1 (memory over-
                     allocation when d >  1).  It  also  fixes  a  minor  off-by-one  error  when
                     allocating  memory  for a Bloom filter, compilation warnings with GCC 12 and
                     clang 13, as well as static analysis warnings. Documentation  was  improved,
                     as well as our test suite (swarm-tests).

              v3.1.2 released November 10, 2022
                     Fix a bug with fastidious mode introduced in version 3.1.1, that could cause
                     Swarm to crash. Probably due to allocating too much memory.

              v3.1.1 released September 29, 2022
                     Version 3.1.1 eliminates a risk of segmentation fault  with  extremely  long
                     sequence  headers.  Documentation and error messages have been improved, and
                     code cleaning continued.

              v3.1.0 released March 1, 2021
                     Version 3.1.0 includes a fix for a bug in the  16-bit  SIMD  alignment  code
                     that  was  exposed  with a combination of d>1, long sequences, and very high
                     gap penalties. The code has also been been cleaned up, tested  and  improved
                     substantially,  and  it  is  now fully C++11 compliant. Support for macOS on
                     Apple Silicon (ARM64) has been added.

              v3.0.0 released October 24, 2019
                     Version 3.0.0 introduces a faster algorithm for d = 1, and a reduced  memory
                     footprint.  Swarm  has  been ported to Windows x86-64, GNU/Linux ARM 64, and
                     GNU/Linux  POWER8.  Internal  code  has  been  modernized,   hardened,   and
                     thoroughly tested. Strict dereplication of input sequences is now mandatory.
                     The --seeds option (-w) now outputs results sorted by decreasing  abundance,
                     and then by alphabetical order of sequence labels.

              v2.2.2 released December 12, 2017
                     Version  2.2.2  fixes  a  bug that would cause swarm to wait forever in very
                     rare cases when multiple threads were used.

              v2.2.1 released October 27, 2017
                     Version 2.2.1 fixes a memory  allocation  bug  for  d  =  1  and  duplicated
                     sequences.

              v2.2.0 released October 17, 2017
                     Version  2.2.0  fixes  several  problems  and  improves usability. Corrected
                     output to structure and uclust files when using fastidious  mode.  Corrected
                     abundance  output  in  some  cases. Added check for duplicated sequences and
                     fixed check for duplicated sequence IDs. Checks for empty  sequences.  Sorts
                     sequences  by additional fields to improve stability. Improves compatibility
                     with compilers and operating systems.   Outputs  sequences  in  upper  case.
                     Allows  64-bit  abundances. Shows message when waiting for input from stdin.
                     Improves error messages and warnings.  Improves  checking  of  command  line
                     options.   Fixes   remaining   errors   reported   by  test  suite.  Updates
                     documentation.

              v2.1.13 released March 8, 2017
                     Version 2.1.13 removes a bug with the progress bar when writing seeds.

              v2.1.12 released January 16, 2017
                     Version 2.1.12 removes a debugging message.

              v2.1.11 released January 16, 2017
                     Version 2.1.11  fixes  two  bugs  related  to  the  SIMD  implementation  of
                     alignment  that  might  result  in incorrect alignments and scores.  The bug
                     only applies when d > 1.

              v2.1.10 released December 22, 2016
                     Version 2.1.10 fixes two bugs related to gap penalties of  alignments.   The
                     first bug may lead to wrong aligments and similarity percentages reported in
                     UCLUST (.uc) files. The second bug makes swarm use  a  slightly  higher  gap
                     extension  penalty  than  specified.  The default gap extension penalty used
                     have actually been 4.5 instead of 4.

              v2.1.9 released July 6, 2016
                     Version 2.1.9 fixes errors when compiling with GCC version 6.

              v2.1.8 released March 11, 2016
                     Version 2.1.8 fixes a rare bug triggered  when  clustering  extremely  short
                     undereplicated  sequences. Also, alignment parameters are not shown when d =
                     1.

              v2.1.7 released February 24, 2016
                     Version 2.1.7 fixes a bug in the output of seeds with the -w option when d >
                     1  that  was  not  properly  fixed  in  version 2.1.6. It also handles ascii
                     character #13 (CR) in FASTA files better. Swarm will now exit with status  0
                     if  the  -h  or  the  -v  option  is specified. The help text and some error
                     messages have been improved.

              v2.1.6 released December 14, 2015
                     Version 2.1.6 fixes problems with older  compilers  that  do  not  have  the
                     x86intrin.h header file. It also fixes a bug in the output of seeds with the
                     -w option when d > 1.

              v2.1.5 released September 8, 2015
                     Version 2.1.5 fixes minor bugs.

              v2.1.4 released September 4, 2015
                     Version 2.1.4 fixes minor bugs in the swarm algorithm used for d = 1.

              v2.1.3 released August 28, 2015
                     Version 2.1.3 adds checks of numeric option arguments.

              v2.1.1 released March 31, 2015
                     Version 2.1.1 fixes a bug with the  fastidious  option  that  caused  it  to
                     ignore some connections between large and small clusters.

              v2.1.0 released March 24, 2015
                     Version 2.1.0 marks the first official release of swarm v2.

              v2.0.7 released March 18, 2015
                     Version  2.0.7  writes  abundance  information  in  usearch style when using
                     options -w (--seeds) in combination with -z (--usearch-abundance).

              v2.0.6 released March 13, 2015
                     Version 2.0.6 fixes a minor bug.

              v2.0.5 released March 13, 2015
                     Version 2.0.5 improves the implementation of the fastidious option and  adds
                     options  to  control  memory  usage  of  the  Bloom  filter (-y and -c).  In
                     addition, an option (-w) allows to output cluster representatives  sequences
                     with  updated  abundances  (sum of all abundances inside each cluster). This
                     version also enables swarm to run with d = 0.

              v2.0.4 released March 6, 2015
                     Version 2.0.4 includes a fully parallelised implementation of the fastidious
                     option.

              v2.0.3 released March 4, 2015
                     Version  2.0.3  includes  a working implementation of the fastidious option,
                     but only the initial clustering is parallelized.

              v2.0.2 released February 26, 2015
                     Version 2.0.2 fixes SSSE3 problems.

              v2.0.1 released February 26, 2015
                     Version  2.0.1  is  a  development   version   that   contains   a   partial
                     implementation of the fastidious option, but it is not usable yet.

              v2.0.0 released December 3, 2014
                     Version  2.0.0  is  faster  and  easier to use, providing new output options
                     (--internal-structure  and  --log),   new   control   options   (--boundary,
                     --fastidious,  --no-otu-breaking),  and built-in cluster refinement (no need
                     to use the python script anymore). When using default  parameters,  a  novel
                     and  considerably  faster algorithmic approach is used, guaranteeing swarm's
                     scalability.

              v1.2.21 released February 26, 2015
                     Version 1.2.21 is supposed to fix some problems related to the  use  of  the
                     SSSE3 CPU instructions which are not always available.

              v1.2.20 released November 6, 2014
                     Version  1.2.20  presents  a  production-ready  version  of  the alternative
                     algorithm (option -a), with optional built-in cluster breaking (option  -n).
                     That   alternative  algorithmic  approach  (usable  only  with  d  =  1)  is
                     considerably faster than currently used clustering algorithms, and can  deal
                     with  datasets  of  100  million unique amplicons or more in a few hours. Of
                     course, results are rigourously identical to the results previously produced
                     with swarm. That release also introduces new options to control swarm output
                     (options -i and -l).

              v1.2.19 released October 3, 2014
                     Version 1.2.19 fixes a problem related to  abundance  information  when  the
                     sequence label includes multiple underscore characters.

              v1.2.18 released September 29, 2014
                     Version  1.2.18 reenables the possibility of reading sequences from stdin if
                     no file name is specified on the command line. It also fixes a  bug  related
                     to CPU features detection.

              v1.2.17 released September 28, 2014
                     Version 1.2.17 fixes a memory allocation bug introduced in version 1.2.15.

              v1.2.16 released September 27, 2014
                     Version  1.2.16  fixes  a  bug  in  the abundance sort introduced in version
                     1.2.15.

              v1.2.15 released September 27, 2014
                     Version 1.2.15 sorts the input sequences in order  of  decreasing  abundance
                     unless  they  are  detected to be sorted already. When using the alternative
                     algorithm for d = 1 it also  sorts  all  subseeds  in  order  of  decreasing
                     abundance.

              v1.2.14 released September 27, 2014
                     Version  1.2.14  fixes  a  bug in the output with the --swarm_breaker option
                     (-b) when using the alternative algorithm (-a).

              v1.2.12 released August 18, 2014
                     Version 1.2.12  introduces  an  option  --alternative-algorithm  to  use  an
                     extremely  fast,  experimental clustering algorithm for the special case d =
                     1. Multithreading scalability of the default algorithm has  been  noticeably
                     improved.

              v1.2.10 released August 8, 2014
                     Version  1.2.10 allows amplicon abundances to be specified using the usearch
                     style in the sequence header (e.g.  '>id;size=1')  when  the  -z  option  is
                     chosen.

              v1.2.8 released August 5, 2014
                     Version  1.2.8  fixes  an  error  with  the  gap extension penalty. Previous
                     versions used a gap penalty twice as large as intended. That bug  correction
                     induces small changes in clustering results.

              v1.2.6 released May 23, 2014
                     Version  1.2.6 introduces an option --mothur to output clustering results in
                     a format compatible with the microbial ecology community  analysis  software
                     suite Mothur ( ⟨https://www.mothur.org/⟩).

              v1.2.5 released April 11, 2014
                     Version  1.2.5  removes  the  need  for  a POPCNT hardware instruction to be
                     present. swarm now automatically checks whether POPCNT is available and uses
                     a   slightly   slower  software  implementation  if  not.  Only  basic  SSE2
                     instructions are now required to run swarm.

              v1.2.4 released January 30, 2014
                     Version 1.2.4 introduces an option --break-swarms to  output  all  pairs  of
                     amplicons  with  d differences to standard error. That option is used by the
                     companion script `swarm_breaker.py` to refine swarm results. The  syntax  of
                     the inline assembly code is changed for compatibility with more compilers.

              v1.2 released May 16, 2013
                     Version  1.2  greatly  improves speed by using alignment-free comparisons of
                     amplicons based on k-mer word content.  For  each  amplicon,  the  presence-
                     absence  of  all  possible  5-mers  is  computed and recorded in a 1024-bits
                     vector. Vector comparisons are extremely fast  and  drastically  reduce  the
                     number  of  costly  pairwise  alignments performed by swarm. While remaining
                     exact, swarm 1.2 can be more than 100-times  faster  than  swarm  1.1,  when
                     using  a  single  thread  with  a  large set of sequences. The minor version
                     1.1.1, published just before, adds compatibility with Apple  computers,  and
                     corrects  an  issue in the pairwise global alignment step that could lead to
                     sub-optimal alignments.

              v1.1 released February 26, 2013
                     Version 1.1 introduces two new important options: the possibility to  output
                     clustering  results  using  the uclust output format, and the possibility to
                     output detailed statistics on each cluster. swarm 1.1 is  also  faster:  new
                     filterings  based  on  pairwise  amplicon  sequence  lengths and composition
                     comparisons reduce the number of pairwise alignments needed and speed up the
                     clustering.

              v1.0 released November 10, 2012
                     First public release.