Ubuntu Manpage: seqkit - cross-platform and ultrafast toolkit for FASTA/Q file manipulation

NAME

       seqkit - cross-platform and ultrafast toolkit for FASTA/Q file manipulation

DESCRIPTION

       SeqKit -- a cross-platform and ultrafast toolkit for FASTA/Q file manipulation

       Version: 2.1.0

       Author: Wei Shen <shenwei356@gmail.com>

       Documents: http://bioinf.shenwei.me/seqkit

       Source code: https://github.com/shenwei356/seqkit

       Please cite: https://doi.org/10.1371/journal.pone.0163962

       SeqKit uses the pgzip (https://github.com/klauspost/pgzip) package to read and write gzip
       files. The gzip files it writes are slightly larger than those generated by GNU gzip.

       SeqKit writes gzip files much faster than the multi-threaded pigz, so there is no need to
       pipe the result to gzip/pigz.
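
       The size difference noted above does not change the content: gzip streams of different
       sizes can decode to identical data. The following is a minimal sketch using only Python's
       standard gzip module (not pgzip or seqkit itself); the FASTA record and the function name
       are made up for illustration.

```python
import gzip


def compare_gzip_outputs(data: bytes):
    """Compress the same data at two levels and compare size and content.

    Returns (size_at_level_1, size_at_level_9, contents_identical).
    """
    fast = gzip.compress(data, compresslevel=1)  # quick, usually larger
    best = gzip.compress(data, compresslevel=9)  # slower, usually smaller
    identical = gzip.decompress(fast) == gzip.decompress(best) == data
    return len(fast), len(best), identical


if __name__ == "__main__":
    # Hypothetical input: one short FASTA record repeated many times.
    sample = b">seq1 example\nACGTACGTACGTACGT\n" * 1000
    size_fast, size_best, identical = compare_gzip_outputs(sample)
    print(size_fast, size_best, identical)
```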

   Usage:
              seqkit [command]

   Available Commands:
       amplicon
              extract amplicon (or specific region around it) via primer(s)

       bam    monitoring and online histograms of BAM record features

       common find common sequences of multiple files by id/name/sequence

       concat concatenate sequences with same ID from multiple files

       convert
              convert FASTQ quality encoding between Sanger, Solexa and Illumina

       duplicate
              duplicate sequences N times

       faidx  create FASTA index file and extract subsequence

       fish   look for short sequences in larger sequences using local alignment

       fq2fa  convert FASTQ to FASTA

       fx2tab convert FASTA/Q to tabular format (and length, GC content, average quality...)

       genautocomplete
              generate shell autocompletion script (bash|zsh|fish|powershell)

       grep   search sequences by ID/name/sequence/sequence motifs, mismatch allowed

       head   print first N FASTA/Q records

       head-genome
              print sequences of the first genome with common prefixes in name

       locate locate subsequences/motifs, mismatch allowed

       mutate edit sequence (point mutation, insertion, deletion)

       pair   match up paired-end reads from two fastq files

       range  print FASTA/Q records in a range (start:end)

       rename rename duplicated IDs

       replace
              replace name/sequence by regular expression

       restart
              reset start position for circular genome

       rmdup  remove duplicated sequences by ID/name/sequence

       sample sample sequences by number or proportion

       sana   sanitize broken single line FASTQ files

       scat   real time recursive concatenation and streaming of fastx files

       seq    transform sequences (extract ID, filter by length, remove gaps...)

       shuffle
              shuffle sequences

       sliding
              extract subsequences in sliding windows

       sort   sort sequences by id/name/sequence/length

       split  split sequences into files by id/seq region/size/parts (mainly for FASTA)

       split2 split sequences into files by size/parts (FASTA, PE/SE FASTQ)

       stats  simple statistics of FASTA/Q files

       subseq get subsequences by region/gtf/bed, including flanking sequences

       tab2fx convert tabular format to FASTA/Q format

       translate
              translate DNA/RNA to protein sequence (supporting ambiguous bases)

       version
              print version information and check for update

       watch  monitoring and online histograms of sequence features

   Flags:
       --alphabet-guess-seq-length int
              length  of  sequence prefix of the first FASTA record based on which seqkit guesses
              the sequence type (0 for whole seq) (default 10000)

       -h, --help
              help for seqkit

       --id-ncbi
              FASTA header is NCBI-style, e.g. >gi|110645304|ref|NC_002516.2| Pseud...

       --id-regexp string
              regular expression for parsing ID (default "^(\\S+)\\s?")

       --infile-list string
              file listing input files (one file per line); if given, they are appended to the
              files from command-line arguments

       -w, --line-width int
              line width when outputting FASTA format (0 for no wrap) (default 60)

       -o, --out-file string
              out file ("-" for stdout, suffix .gz for gzipped out) (default "-")

       --quiet
              be quiet and do not show extra information

       -t, --seq-type string
              sequence type (dna|rna|protein|unlimit|auto) (for auto, the type is detected
              automatically from the first sequence) (default "auto")

       -j, --threads int
              number of CPUs (can also be set with the environment variable SEQKIT_THREADS)
              (default 4)

       Use "seqkit [command] --help" for more information about a command.
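
       To illustrate two of the flags above, here is a rough Python sketch of what the default
       --id-regexp pattern captures and what a --line-width of 60 (or 0) means for FASTA output.
       It is not seqkit's implementation: the function names are hypothetical, the pattern is the
       documented default with the help text's string escaping removed, and the fallback when the
       pattern does not match is an assumption.

```python
import re

# Default pattern from the --id-regexp entry ("^(\\S+)\\s?" in the help text).
DEFAULT_ID_REGEXP = re.compile(r"^(\S+)\s?")


def parse_id(header: str, pattern: re.Pattern = DEFAULT_ID_REGEXP) -> str:
    """Return the first capture group of the pattern applied to a header.

    The header is given without its leading '>' or '@'.
    Assumption: if the pattern does not match, the whole header is returned.
    """
    match = pattern.search(header)
    return match.group(1) if match else header


def wrap_sequence(seq: str, line_width: int = 60) -> str:
    """Break a sequence into lines of line_width characters (0 = no wrap)."""
    if line_width <= 0:
        return seq
    return "\n".join(
        seq[i:i + line_width] for i in range(0, len(seq), line_width)
    )


if __name__ == "__main__":
    print(parse_id("seq1 some description"))
    print(wrap_sequence("ACGT" * 40))
```

       With the default pattern, the ID is everything up to the first whitespace character, so a
       header such as "seq1 some description" yields "seq1".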

AUTHOR

       This manpage was written by Nilesh Patra for the Debian distribution and can be  used  for
       any other usage of the program.