Ubuntu Manpage: seqkit - cross-platform and ultrafast toolkit for FASTA/Q file manipulation

NAME

       seqkit - cross-platform and ultrafast toolkit for FASTA/Q file manipulation

DESCRIPTION

       SeqKit -- a cross-platform and ultrafast toolkit for FASTA/Q file manipulation

       Version: 2.1.0

       Author: Wei Shen <shenwei356@gmail.com>

       Documents  : http://bioinf.shenwei.me/seqkit
       Source code: https://github.com/shenwei356/seqkit
       Please cite: https://doi.org/10.1371/journal.pone.0163962

       Seqkit utilizes the pgzip (https://github.com/klauspost/pgzip) package to read and write gzip files, and
       the gzip files it outputs are slightly larger than files generated by GNU gzip.

       Seqkit writes gzip files very fast, much faster than the multi-threaded pigz, so there is no need
       to pipe the result to gzip/pigz.

   Usage:
              seqkit [command]

   Available Commands:
       amplicon
              extract amplicon (or specific region around it) via primer(s)

       bam    monitoring and online histograms of BAM record features

       common find common sequences of multiple files by id/name/sequence

       concat concatenate sequences with same ID from multiple files

       convert
              convert FASTQ quality encoding between Sanger, Solexa and Illumina

       duplicate
              duplicate sequences N times

       faidx  create FASTA index file and extract subsequence

       fish   look for short sequences in larger sequences using local alignment

       fq2fa  convert FASTQ to FASTA

       fx2tab convert FASTA/Q to tabular format (and length, GC content, average quality...)

       genautocomplete
              generate shell autocompletion script (bash|zsh|fish|powershell)

       grep   search sequences by ID/name/sequence/sequence motifs, mismatch allowed

       head   print first N FASTA/Q records

       head-genome
              print sequences of the first genome with common prefixes in name

       locate locate subsequences/motifs, mismatch allowed

       mutate edit sequence (point mutation, insertion, deletion)

       pair   match up paired-end reads from two fastq files

       range  print FASTA/Q records in a range (start:end)

       rename rename duplicated IDs

       replace
              replace name/sequence by regular expression

       restart
              reset start position for circular genome

       rmdup  remove duplicated sequences by ID/name/sequence

       sample sample sequences by number or proportion

       sana   sanitize broken single line FASTQ files

       scat   real time recursive concatenation and streaming of fastx files

       seq    transform sequences (extract ID, filter by length, remove gaps...)

       shuffle
              shuffle sequences

       sliding
              extract subsequences in sliding windows

       sort   sort sequences by id/name/sequence/length

       split  split sequences into files by id/seq region/size/parts (mainly for FASTA)

       split2 split sequences into files by size/parts (FASTA, PE/SE FASTQ)

       stats  simple statistics of FASTA/Q files

       subseq get subsequences by region/gtf/bed, including flanking sequences

       tab2fx convert tabular format to FASTA/Q format

       translate
              translate DNA/RNA to protein sequence (supporting ambiguous bases)

       version
              print version information and check for update

       watch  monitoring and online histograms of sequence features

   Flags:
       --alphabet-guess-seq-length int
              length of sequence prefix of the first FASTA record based on which  seqkit  guesses  the  sequence
              type (0 for whole seq) (default 10000)

       -h, --help
              help for seqkit

       --id-ncbi
              FASTA head is NCBI-style, e.g. >gi|110645304|ref|NC_002516.2| Pseud...

       --id-regexp string
              regular expression for parsing ID (default "^(\\S+)\\s?")
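
              The effect of the default pattern can be illustrated outside seqkit. The following is a minimal
              Python sketch, not seqkit's own code: the function name parse_id is hypothetical, and returning
              the whole header when nothing matches is an assumption. The pattern is the default shown above
              with the help-text escaping removed.

```python
import re

# Default --id-regexp value, written as a raw string (help text shows it escaped).
DEFAULT_ID_REGEXP = r"^(\S+)\s?"


def parse_id(header: str, pattern: str = DEFAULT_ID_REGEXP) -> str:
    """Return the first capture group matched against a header line (without '>')."""
    match = re.search(pattern, header)
    if match is None or match.lastindex is None:
        # Assumption: with no match (or no capture group), use the whole header.
        return header
    return match.group(1)


if __name__ == "__main__":
    print(parse_id("seq1 Homo sapiens chromosome 1"))
    print(parse_id("gi|110645304|ref|NC_002516.2| Pseud"))
```

              With the default pattern the ID is everything up to the first whitespace, so an NCBI-style
              header yields the full pipe-delimited token rather than the accession alone.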

       --infile-list string
              file listing input files (one file per line); if given, these files are appended to the files
              given as command-line arguments
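
              The merge order described for this flag can be sketched as follows. This is an illustrative
              Python model only: the function name collect_inputs and the file names are hypothetical, and
              skipping blank lines is an assumption rather than documented behaviour.

```python
from pathlib import Path
from typing import List, Optional


def collect_inputs(cli_files: List[str], infile_list: Optional[str] = None) -> List[str]:
    """Return command-line files followed by the files named in the list file."""
    files = list(cli_files)
    if infile_list is not None:
        for line in Path(infile_list).read_text().splitlines():
            name = line.strip()
            if name:  # assumption: blank lines are ignored
                files.append(name)
    return files


if __name__ == "__main__":
    # Hypothetical list file with one input path per line.
    Path("inputs.txt").write_text("c.fa\nd.fa.gz\n")
    print(collect_inputs(["a.fa", "b.fa"], "inputs.txt"))
```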

       -w, --line-width int
              line width when outputting FASTA format (0 for no wrap) (default 60)
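
              As a rough illustration of what the line width means, the Python sketch below wraps a sequence
              at a fixed width and treats 0 as "no wrap". It models the documented option only; the function
              name wrap_sequence is hypothetical and this is not seqkit's implementation.

```python
def wrap_sequence(seq: str, width: int = 60) -> str:
    """Break a sequence into lines of at most `width` characters (0 disables wrapping)."""
    if width <= 0:
        return seq
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width))


if __name__ == "__main__":
    print(wrap_sequence("ACGT" * 20))      # default width of 60
    print(wrap_sequence("ACGT" * 20, 0))   # single line
```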

       -o, --out-file string
              out file ("-" for stdout, suffix .gz for gzipped out) (default "-")
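
              The output naming rule ("-" for stdout, a .gz suffix for gzipped output) can be modelled with
              a short Python sketch. The function name open_output is hypothetical, and Python's standard
              gzip module stands in for the pgzip package that seqkit itself uses.

```python
import gzip
import sys


def open_output(path: str = "-"):
    """Return a text handle: stdout for "-", gzip for *.gz, a plain file otherwise."""
    if path == "-":
        return sys.stdout
    if path.endswith(".gz"):
        return gzip.open(path, "wt")  # stand-in for pgzip
    return open(path, "w")


if __name__ == "__main__":
    out = open_output("-")
    out.write(">seq1\nACGT\n")
```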

       --quiet
              be quiet and do not show extra information

       -t, --seq-type string
              sequence type (dna|rna|protein|unlimit|auto) (for auto, the type is detected automatically from
              the first sequence) (default "auto")

       -j, --threads int
              number of CPUs (can also be set with the environment variable SEQKIT_THREADS) (default 4)

       Use "seqkit [command] --help" for more information about a command.

AUTHOR

       This  manpage was written by Nilesh Patra for the Debian distribution and can be used for any other usage
       of the program.

seqkit 2.1.0+ds                                   January 2022                                         SEQKIT(1)