Ubuntu Manpage: unikmer - Toolkit for nucleic acid k-mer analysis

Provided by: unikmer_0.18.8-1ubuntu0.1_amd64

NAME

       unikmer - Toolkit for nucleic acid k-mer analysis

DESCRIPTION

       unikmer - Unique-Kmer Toolkit

       unikmer is a toolkit for nucleic acid k-mer analysis. It provides set operations on k-mers,
       optionally with TaxIds but without count information.

       K-mers are either encoded (k<=32) or hashed (arbitrary k) into 'uint64' values and serialized to
       binary files with the extension '.unik'.
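
       The k<=32 limit follows from packing each base into 2 bits of a 64-bit integer. The sketch below
       illustrates that idea only; the base-to-bits mapping (A=0, C=1, G=2, T=3) and the function names
       are assumptions for illustration, not taken from the unikmer source.

```python
# Illustrative 2-bit packing of a k-mer into an unsigned 64-bit value.
# The mapping below is an assumption; unikmer's actual encoding may differ.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_kmer(kmer: str) -> int:
    """Pack a k-mer (1 <= k <= 32) into an integer that fits in 64 bits."""
    k = len(kmer)
    if not 1 <= k <= 32:
        raise ValueError("k must be in [1, 32] to fit in 64 bits")
    code = 0
    for base in kmer.upper():
        # Shift the previous bases left and append the new base's 2 bits.
        code = (code << 2) | BASE_TO_BITS[base]
    return code


def decode_kmer(code: int, k: int) -> str:
    """Unpack k bases from an integer produced by encode_kmer."""
    bases = []
    for _ in range(k):
        bases.append(BITS_TO_BASE[code & 3])  # lowest 2 bits hold the last base
        code >>= 2
    return "".join(reversed(bases))
```

       With this mapping, 32 bases use exactly 64 bits, which is why longer k-mers have to be hashed
       instead of encoded.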

       TaxIds can be assigned when counting k-mers from genome sequences, and the LCA (Lowest Common
       Ancestor) is computed during set operations, including union, intersection, set difference, and
       finding unique and repeated k-mers.
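
       As a rough illustration of the LCA step, the sketch below finds the lowest common ancestor of two
       TaxIds in a child-to-parent map, such as one that could be built from "nodes.dmp". The function
       names and the toy taxonomy are hypothetical, and this is not necessarily how unikmer implements it.

```python
# Minimal LCA sketch over a child -> parent map.
# Assumption: the root is its own parent, as in NCBI's nodes.dmp (TaxId 1).


def lineage(taxid: int, parent: dict) -> list:
    """Return the path from taxid up to the root, inclusive."""
    path = [taxid]
    while parent[taxid] != taxid:
        taxid = parent[taxid]
        path.append(taxid)
    return path


def lca(a: int, b: int, parent: dict) -> int:
    """Return the lowest common ancestor of TaxIds a and b."""
    ancestors = set(lineage(a, parent))
    # Walk upward from b; the first node also above a is the LCA.
    for node in lineage(b, parent):
        if node in ancestors:
            return node
    raise ValueError("no common ancestor")


# Toy taxonomy with made-up TaxIds: 1 is the root, 3 and 4 are siblings under 2.
toy_parent = {1: 1, 2: 1, 3: 2, 4: 2, 5: 1}
```

       In this toy tree, a k-mer carrying TaxId 3 in one file and TaxId 4 in another would be assigned
       TaxId 2 after a set operation, under the assumptions above.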

       Version: v0.17.2

       Author: Wei Shen <shenwei356@gmail.com>

       Documents  : https://shenwei356.github.io/unikmer
       Source code: https://github.com/shenwei356/unikmer

       Dataset (optional):

              Manipulating k-mers with TaxIds requires taxonomy files, e.g., from the NCBI Taxonomy
              database. Download ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz and extract
              "nodes.dmp", "names.dmp", "delnodes.dmp" and "merged.dmp" into ~/.unikmer/ or some other
              directory. You can later point to that directory with the flag --data-dir or the environment
              variable UNIKMER_DB.

              For GTDB, use https://github.com/nick-youngblut/gtdb_to_taxdump for taxonomy conversion.

              Note that TaxIds are represented as uint32 and stored in 4 or fewer bytes, so all TaxIds must
              be in the range [1, 4294967295].
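
              The "4 or fewer bytes" remark can be read as: the smaller the largest TaxId, the fewer bytes
              are needed per TaxId. The sketch below computes that minimum width; the function name is
              hypothetical, and how unikmer actually picks the width is an assumption here.

```python
# Minimum number of bytes needed to store any TaxId up to max_taxid.
# Assumption: a fixed-width unsigned integer is used for every TaxId in a file.


def taxid_bytes(max_taxid: int) -> int:
    """Return the byte width (1 to 4) needed for TaxIds in [1, max_taxid]."""
    if not 1 <= max_taxid <= 0xFFFFFFFF:
        raise ValueError("max_taxid must be in [1, 4294967295]")
    # Round the bit length up to a whole number of bytes.
    return (max_taxid.bit_length() + 7) // 8
```

              For example, a maximum TaxId of 65535 fits in 2 bytes, while the uint32 maximum of
              4294967295 needs all 4.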

   Usage:
              unikmer [command]

   Available Commands:
       common Find k-mers shared by most of multiple binary files

       concat Concatenate multiple binary files without removing duplicates

       count  Generate k-mers (sketch) from FASTA/Q sequences

       decode Decode encoded integer to k-mer text

       diff   Set difference of multiple binary files

       dump   Convert plain k-mer text to binary format

       encode Encode plain k-mer text to integer

       filter Filter low-complexity k-mers (experimental)

       genautocomplete
              generate shell autocompletion script (bash|zsh|fish|powershell)

       grep   Search k-mers from binary files

       head   Extract the first N k-mers

       help   Help about any command

       info   Information of binary files

       inter  Intersection of multiple binary files

       locate Locate k-mers in genome

       merge  Merge k-mers from sorted chunk files

       num    Quickly inspect number of k-mers in binary files

       rfilter
              Filter k-mers by taxonomic rank

       sample Sample k-mers from binary files

       sort   Sort k-mers in binary files to reduce file size

       split  Split k-mers into sorted chunk files

       tsplit Split k-mers according to taxid

       union  Union of multiple binary files

       uniqs  Mapping k-mers back to genome and find unique subsequences

       version
              Print version information and check for update

       view   Read and output binary format to plain text

   Flags:
       -c, --compact
              write compact binary file with little loss of speed

       --compression-level int
              compression level (default -1)

       --data-dir string
              directory containing NCBI Taxonomy files, including nodes.dmp, names.dmp, merged.dmp and
              delnodes.dmp (default "~/.unikmer")

       -h, --help
              help for unikmer

       -I, --ignore-taxid
              ignore taxonomy information

       -i, --infile-list string
              file listing input files (one file per line); if given, these are appended to the files from
              command-line arguments

       --max-taxid uint32
              with smaller TaxIds, less space is needed to store them. The default value is (1<<32)-1, which
              is enough for NCBI Taxonomy TaxIds (default 4294967295)

       -C, --no-compress
              do not compress binary file (not recommended)

       --nocheck-file
              do not check binary files; use this when reading from process substitution or a named pipe

       -j, --threads int
              number of CPUs to use. (default value: 1 for single-CPU PC, 2 for others) (default 2)

       --verbose
              print verbose information

       Use "unikmer [command] --help" for more information about a command.

AUTHOR

        This manpage was written by Nilesh Patra for the Debian distribution and
        can be used for any other usage of the program.

unikmer 0.18.3                                     August 2021                                        UNIKMER(1)