Ubuntu Manpage: unikmer - Toolkit for nucleic acid k-mer analysis

NAME

       unikmer - Toolkit for nucleic acid k-mer analysis

DESCRIPTION

       unikmer - Unique-Kmer Toolkit

       unikmer is a toolkit for nucleic acid k-mer analysis. It provides functions including set
       operations on k-mers, optionally with TaxIds but without count information.

       K-mers are either encoded (k<=32) or hashed (arbitrary k) into 'uint64', and serialized in
       binary files with the extension '.unik'.
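
       The k<=32 limit follows from packing 2 bits per base into 64 bits. The sketch below
       illustrates that idea only; the bit assignment (A=0, C=1, G=2, T=3) and the function
       names are assumptions made for this example, not taken from unikmer's source.

```python
# Illustrative 2-bit packing of a k-mer into an unsigned 64-bit integer.
# Assumption: A=0, C=1, G=2, T=3; unikmer's real encoding may differ.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_kmer(kmer: str) -> int:
    """Pack a k-mer (1 <= k <= 32) into an integer, 2 bits per base."""
    k = len(kmer)
    if not 1 <= k <= 32:
        raise ValueError("k must be in [1, 32] to fit into 64 bits")
    code = 0
    for base in kmer.upper():
        # Shift previous bases left and append the new base's 2 bits.
        code = (code << 2) | BASE_TO_BITS[base]  # KeyError on non-ACGT bases
    return code


def decode_kmer(code: int, k: int) -> str:
    """Unpack an integer back into a k-mer; k is needed to restore leading A's."""
    bases = []
    for _ in range(k):
        bases.append(BITS_TO_BASE[code & 3])  # lowest 2 bits hold the last base
        code >>= 2
    return "".join(reversed(bases))


if __name__ == "__main__":
    print(encode_kmer("ACGT"))   # 27, i.e. binary 00 01 10 11
    print(decode_kmer(27, 4))    # ACGT
```

       With 2 bits per base, 32 bases fill exactly 64 bits, which is why longer k-mers have
       to be hashed instead.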

       TaxIds can be assigned when counting k-mers from genome sequences, and the LCA (Lowest
       Common Ancestor) is computed during set operations, including computing union,
       intersection, set difference, unique and repeated k-mers.
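
       As an illustration of the LCA step, the sketch below finds the lowest common ancestor
       of two TaxIds in a child-to-parent map. The toy taxonomy and the function names are
       made up for this example; this is one simple way to compute an LCA, not necessarily
       how unikmer does it.

```python
def lineage(taxid, parent):
    """Return the path from taxid up to the root (the node that is its own parent)."""
    path = [taxid]
    while parent[taxid] != taxid:
        taxid = parent[taxid]
        path.append(taxid)
    return path


def lca(a, b, parent):
    """Lowest common ancestor: the first node on b's lineage that is also on a's."""
    ancestors = set(lineage(a, parent))
    for node in lineage(b, parent):
        if node in ancestors:
            return node
    raise ValueError("no common ancestor")


# Hypothetical toy taxonomy (child -> parent); 1 is the root.
TOY_PARENT = {1: 1, 2: 1, 10: 2, 11: 2, 20: 10}

if __name__ == "__main__":
    print(lca(20, 11, TOY_PARENT))  # 2
    print(lca(20, 10, TOY_PARENT))  # 10
```

       When the same k-mer appears in two inputs with different TaxIds, a merged record can
       carry the LCA of the two.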

       Version: v0.17.2

       Author: Wei Shen <shenwei356@gmail.com>

       Documents  : https://shenwei356.github.io/unikmer
       Source code: https://github.com/shenwei356/unikmer

       Dataset (optional):

              Manipulating k-mers with TaxIds requires taxonomy files from, e.g., the NCBI Taxonomy
              database. Please extract "nodes.dmp", "names.dmp", "delnodes.dmp" and "merged.dmp"
              from the link below into ~/.unikmer/ or some other directory, which you can later
              refer to using the flag --data-dir or the environment variable UNIKMER_DB:

              ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
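
              To show what "nodes.dmp" contains, the sketch below reads the child-to-parent
              relation from it, assuming the usual taxdump layout in which fields are
              separated by "<tab>|<tab>" and the first two fields are the TaxId and its
              parent TaxId. The function name and the sample rows are hypothetical.

```python
import io


def read_parents(handle):
    """Read a child -> parent TaxId map from nodes.dmp-style lines."""
    parent = {}
    for line in handle:
        # Assumed layout: tax_id <tab>|<tab> parent_tax_id <tab>|<tab> rank ...
        fields = line.rstrip("\n").split("\t|\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        parent[int(fields[0])] = int(fields[1])
    return parent


# Made-up sample rows in the assumed layout (not real taxonomy data).
SAMPLE = (
    "1\t|\t1\t|\tno rank\t|\n"
    "2\t|\t1\t|\tsuperkingdom\t|\n"
    "10\t|\t2\t|\tgenus\t|\n"
)

if __name__ == "__main__":
    print(read_parents(io.StringIO(SAMPLE)))
```

              For a real file, pass an open handle instead, e.g. one opened on "nodes.dmp"
              inside the chosen data directory.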

              For  GTDB,  use  https://github.com/nick-youngblut/gtdb_to_taxdump   for   taxonomy
              conversion.

              Note that TaxIds are represented using uint32 and stored in 4 or fewer bytes; all
              TaxIds should be in the range [1, 4294967295].
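
              The "4 or fewer bytes" remark can be made concrete with a small calculation
              of how many bytes the largest TaxId needs. This is a sketch only: the function
              names and the big-endian byte order are assumptions for illustration, not a
              description of the '.unik' file format.

```python
MAX_UINT32 = 0xFFFFFFFF  # 4294967295


def bytes_for_max_taxid(max_taxid: int) -> int:
    """Smallest number of bytes (1 to 4) able to hold every TaxId up to max_taxid."""
    if not 1 <= max_taxid <= MAX_UINT32:
        raise ValueError("TaxIds must be in the range [1, 4294967295]")
    return (max_taxid.bit_length() + 7) // 8


def pack_taxid(taxid: int, width: int) -> bytes:
    """Store a TaxId in exactly `width` bytes (big-endian is an assumption here)."""
    return taxid.to_bytes(width, "big")


def unpack_taxid(raw: bytes) -> int:
    """Inverse of pack_taxid."""
    return int.from_bytes(raw, "big")


if __name__ == "__main__":
    for limit in (255, 65535, 16777215, MAX_UINT32):
        print(limit, bytes_for_max_taxid(limit))
```

              A smaller upper bound on TaxIds therefore lets each TaxId occupy fewer bytes.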

   Usage:
              unikmer [command]

   Available Commands:
       common Find k-mers shared by most of multiple binary files

       concat Concatenate multiple binary files without removing duplicates

       count  Generate k-mers (sketch) from FASTA/Q sequences

       decode Decode encoded integer to k-mer text

       diff   Set difference of multiple binary files

       dump   Convert plain k-mer text to binary format

       encode Encode plain k-mer text to integer

       filter Filter low-complexity k-mers (experimental)

       genautocomplete
              generate shell autocompletion script (bash|zsh|fish|powershell)

       grep   Search k-mers from binary files

       head   Extract the first N k-mers

       help   Help about any command

       info   Information of binary files

       inter  Intersection of multiple binary files

       locate Locate k-mers in genome

       merge  Merge k-mers from sorted chunk files

       num    Quickly inspect number of k-mers in binary files

       rfilter
              Filter k-mers by taxonomic rank

       sample Sample k-mers from binary files

       sort   Sort k-mers in binary files to reduce file size

       split  Split k-mers into sorted chunk files

       tsplit Split k-mers according to taxid

       union  Union of multiple binary files

       uniqs  Mapping k-mers back to genome and find unique subsequences

       version
              Print version information and check for update

       view   Read and output binary format to plain text

   Flags:
       -c, --compact
              write compact binary file with little loss of speed

       --compression-level int
              compression level (default -1)

       --data-dir string
              directory   containing   NCBI   Taxonomy  files,  including  nodes.dmp,  names.dmp,
              merged.dmp and delnodes.dmp (default "~/.unikmer")

       -h, --help
              help for unikmer

       -I, --ignore-taxid
              ignore taxonomy information

       -i, --infile-list string
              file listing input files (one file per line); if given, they are appended to the
              files from CLI arguments

       --max-taxid uint32
              for smaller TaxIds, we can use less space to store TaxIds. The default value is
              1<<32-1, which is enough for NCBI Taxonomy TaxIds (default 4294967295)

       -C, --no-compress
              do not compress binary file (not recommended)

       --nocheck-file
              do not check binary file, when using process substitution/named pipe

       -j, --threads int
              number of CPUs to use. (default value: 1 for single-CPU PC, 2 for others)  (default
              2)

       --verbose
              print verbose information

       Use "unikmer [command] --help" for more information about a command.

AUTHOR

        This manpage was written by Nilesh Patra for the Debian distribution and
        can be used for any other usage of the program.