Ubuntu Manpage: unikmer - Toolkit for nucleic acid k-mer analysis

NAME

       unikmer - Toolkit for nucleic acid k-mer analysis

DESCRIPTION

       unikmer - Toolkit for k-mers with taxonomic information

       unikmer is a toolkit for nucleic acid k-mer analysis. It provides functions including set
       operations on k-mers, optionally with TaxIds but without count information.

       K-mers are either encoded (k<=32) or hashed (arbitrary k) into 'uint64' and serialized in
       binary files with the extension '.unik'.
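
       The k<=32 limit follows from packing 2 bits per base into 64 bits. The sketch below
       illustrates that idea only: the base-to-bits mapping (A=0, C=1, G=2, T=3) and the
       function names are assumptions for illustration, not taken from unikmer's source.

```python
# Assumed 2-bit code per base; the mapping actually used by unikmer may differ.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_kmer(kmer: str) -> int:
    """Pack a k-mer (1 <= k <= 32) into an integer that fits in 64 bits."""
    if not 0 < len(kmer) <= 32:
        raise ValueError("k must be in the range [1, 32]")
    code = 0
    for base in kmer.upper():
        # Shift the previous bases left and append the 2-bit code of this base.
        # Raises KeyError for anything other than A, C, G or T.
        code = (code << 2) | BASE_TO_BITS[base]
    return code


def decode_kmer(code: int, k: int) -> str:
    """Unpack an integer produced by encode_kmer back into a k-mer of length k."""
    bases = []
    for _ in range(k):
        bases.append(BITS_TO_BASE[code & 3])  # lowest 2 bits hold the last base
        code >>= 2
    return "".join(reversed(bases))
```

       With this layout a 32-mer uses all 64 bits, which is why longer k-mers have to be
       hashed instead.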

       TaxIds can be assigned when counting k-mers from genome sequences, and the LCA (Lowest
       Common Ancestor) is computed during set operations, including union, intersection, set
       difference, and finding unique and repeated k-mers.
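
       As a rough illustration of what an LCA computation does when the same k-mer carries two
       different TaxIds, the sketch below walks a child-to-parent map. The toy taxonomy, its
       TaxIds and the function names are hypothetical, and unikmer's own implementation may
       work differently.

```python
def lineage(taxid, parent):
    """Return the path from taxid up to the root (the node that is its own parent)."""
    path = [taxid]
    while parent[taxid] != taxid:
        taxid = parent[taxid]
        path.append(taxid)
    return path


def lca(a, b, parent):
    """Return the lowest common ancestor of TaxIds a and b."""
    ancestors = set(lineage(a, parent))
    # The first node on b's path to the root that is also an ancestor of a is the LCA.
    for node in lineage(b, parent):
        if node in ancestors:
            return node
    raise ValueError("no common ancestor")


# Hypothetical toy taxonomy: 1 is the root; 3 and 4 are children of 2.
PARENT = {1: 1, 2: 1, 3: 2, 4: 2, 5: 1}
```

       Here lca(3, 4, PARENT) gives 2, while lca(3, 5, PARENT) falls back to the root, 1.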

       Version: v0.19.0

       Author: Wei Shen <shenwei356@gmail.com>

       Documents  : https://bioinf.shenwei.me/unikmer
       Source code: https://github.com/shenwei356/unikmer

       Dataset (optional):

              Manipulating k-mers with TaxIds requires taxonomy files, e.g., from the NCBI Taxonomy
              database. Extract "nodes.dmp", "names.dmp", "delnodes.dmp" and "merged.dmp" from
              ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz into ~/.unikmer/ or some other
              directory, which you can later refer to with the flag --data-dir or the environment
              variable UNIKMER_DB.

              For GTDB, use 'taxonkit create-taxdump' to create NCBI-style taxonomy  dump  files,
              or download from:

              https://github.com/shenwei356/gtdb-taxonomy
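
              For orientation, NCBI-style "nodes.dmp" files separate fields with a tab, a
              vertical bar and a tab, and end each line with a tab and a vertical bar. The
              sketch below reads the first three fields (TaxId, parent TaxId, rank) into
              dictionaries. The function name and the sample rows are illustrative
              assumptions; this is not how unikmer itself loads the files.

```python
def parse_nodes(lines):
    """Build {taxid: parent_taxid} and {taxid: rank} from nodes.dmp-style lines."""
    parent = {}
    rank = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.endswith("\t|"):
            line = line[:-2]  # drop the end-of-row marker
        fields = line.split("\t|\t")
        taxid = int(fields[0])
        parent[taxid] = int(fields[1])
        rank[taxid] = fields[2]
    return parent, rank


# Illustrative rows, truncated to three fields; real files carry more columns.
SAMPLE = [
    "1\t|\t1\t|\tno rank\t|\n",
    "2\t|\t131567\t|\tsuperkingdom\t|\n",
]
PARENT, RANK = parse_nodes(SAMPLE)
```

              To read a real file, pass an open file handle for "nodes.dmp" in place of
              SAMPLE.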

              Note that TaxIds are represented using uint32 and stored in 4 or fewer bytes, so all
              TaxIds should be in the range [1, 4294967295].
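
              To make "4 or fewer bytes" concrete, the sketch below computes the minimum
              number of whole bytes needed to hold a given maximum TaxId. It is plain
              arithmetic under the assumption that whole bytes are used; the function name
              is hypothetical and the on-disk layout used by unikmer is not described here.

```python
def taxid_byte_width(max_taxid: int) -> int:
    """Return the minimum number of whole bytes that can hold max_taxid."""
    if not 1 <= max_taxid <= 0xFFFFFFFF:
        raise ValueError("TaxIds must be in the range [1, 4294967295]")
    # Round the bit length up to the next multiple of 8 bits.
    return (max_taxid.bit_length() + 7) // 8
```

              For example, a maximum TaxId of 65535 fits in 2 bytes, while 65536 needs 3 and
              4294967295 needs all 4.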

   Usage:
              unikmer [command]

   Available Commands:
              autocompletion  Generate shell autocompletion script (bash|zsh|fish|powershell)
              common          Find k-mers shared by most of multiple binary files
              concat          Concatenate multiple binary files without removing duplicates
              count           Generate k-mers (sketch) from FASTA/Q sequences
              decode          Decode encoded integer to k-mer text
              diff            Set difference of multiple binary files
              dump            Convert plain k-mer text to binary format
              encode          Encode plain k-mer text to integer
              filter          Filter out low-complexity k-mers (experimental)
              grep            Search k-mers from binary files
              head            Extract the first N k-mers
              info            Information of binary files
              inter           Intersection of multiple binary files
              locate          Locate k-mers in genome
              merge           Merge k-mers from sorted chunk files
              num             Quickly inspect number of k-mers in binary files
              rfilter         Filter k-mers by taxonomic rank
              sample          Sample k-mers from binary files
              sort            Sort k-mers in binary files to reduce file size
              split           Split k-mers into sorted chunk files
              tsplit          Split k-mers according to taxid
              union           Union of multiple binary files
              uniqs           Mapping k-mers back to genome and find unique subsequences
              version         Print version information and check for update
              view            Read and output binary format to plain text

   Flags:
       -c, --compact
              write compact binary file with little loss of speed

       --compression-level int
              compression level (default -1)

       --data-dir string
              directory  containing  NCBI  Taxonomy  files,   including   nodes.dmp,   names.dmp,
              merged.dmp and delnodes.dmp (default "~/.unikmer")

       -h, --help
              help for unikmer

       -I, --ignore-taxid
              ignore taxonomy information

       -i, --infile-list string
              file listing input files (one file per line); if given, they are appended to the
              files from command-line arguments

       --max-taxid uint32
              maximum TaxId; smaller TaxIds need less space to store. The default value is
              1<<32-1, which is enough for NCBI Taxonomy TaxIds (default 4294967295)

       -C, --no-compress
              do not compress binary file (not recommended)

       --nocheck-file
              do not check binary files; use when reading from process substitution or a named
              pipe

       -j, --threads int
              number of CPUs to use (default 4)

       --verbose
              print verbose information

       Use "unikmer [command] --help" for more information about a command.