Ubuntu Manpage: seqkit - cross-platform and ultrafast toolkit for FASTA/Q file manipulation

NAME

       seqkit - cross-platform and ultrafast toolkit for FASTA/Q file manipulation

DESCRIPTION

       SeqKit -- a cross-platform and ultrafast toolkit for FASTA/Q file manipulation

       Version: 2.1.0

       Author: Wei Shen <shenwei356@gmail.com>

       Documents: http://bioinf.shenwei.me/seqkit

       Source code: https://github.com/shenwei356/seqkit

       Please cite: https://doi.org/10.1371/journal.pone.0163962

       SeqKit uses the pgzip (https://github.com/klauspost/pgzip) package to read and write gzip files, and
       the gzip files it writes may be slightly larger than those generated by GNU gzip.

       SeqKit writes gzip files very fast, much faster than the multi-threaded pigz, so there is no need to
       pipe its output to gzip or pigz.
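
       As a rough illustration of the output convention documented for -o under Flags ("-" for stdout, a
       .gz suffix for gzipped output), the Python sketch below picks an output stream from the file name.
       It is not SeqKit's Go implementation, and the helper name open_output is hypothetical. Files written
       this way are ordinary gzip streams, so the standard gzip module can read them back.

```python
import contextlib
import gzip
import os
import sys
import tempfile
from typing import Iterator, TextIO


@contextlib.contextmanager
def open_output(path: str) -> Iterator[TextIO]:
    """Hypothetical helper mirroring the -o convention:
    "-" means stdout, a ".gz" suffix means gzip-compressed text,
    anything else is a plain text file."""
    if path == "-":
        yield sys.stdout  # stdout is deliberately not closed on exit
        return
    # Choose the opener by suffix; both accept the "wt" (write text) mode.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "wt") as fh:
        yield fh


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "demo.fa.gz")
    with open_output(out) as fh:
        fh.write(">seq1\nACGTACGT\n")
    # The result is a regular gzip stream, readable by Python's gzip module.
    with gzip.open(out, "rt") as fh:
        print(fh.read(), end="")
```

       Wrapping the choice in a context manager lets callers use one with-statement for all three cases
       without accidentally closing stdout.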

   Usage:
              seqkit [command]

   Available Commands:
       amplicon
              extract amplicon (or specific region around it) via primer(s)

       bam    monitoring and online histograms of BAM record features

       common find common sequences of multiple files by id/name/sequence

       concat concatenate sequences with same ID from multiple files

       convert
              convert FASTQ quality encoding between Sanger, Solexa and Illumina

       duplicate
              duplicate sequences N times

       faidx  create FASTA index file and extract subsequence

       fish   look for short sequences in larger sequences using local alignment

       fq2fa  convert FASTQ to FASTA

       fx2tab convert FASTA/Q to tabular format (and length, GC content, average quality...)

       genautocomplete
              generate shell autocompletion script (bash|zsh|fish|powershell)

       grep   search sequences by ID/name/sequence/sequence motifs, mismatch allowed

       head   print first N FASTA/Q records

       head-genome
              print sequences of the first genome with common prefixes in name

       locate locate subsequences/motifs, mismatch allowed

       mutate edit sequence (point mutation, insertion, deletion)

       pair   match up paired-end reads from two fastq files

       range  print FASTA/Q records in a range (start:end)

       rename rename duplicated IDs

       replace
              replace name/sequence by regular expression

       restart
              reset start position for circular genome

       rmdup  remove duplicated sequences by ID/name/sequence

       sample sample sequences by number or proportion

       sana   sanitize broken single line FASTQ files

       scat   real time recursive concatenation and streaming of fastx files

       seq    transform sequences (extract ID, filter by length, remove gaps...)

       shuffle
              shuffle sequences

       sliding
              extract subsequences in sliding windows

       sort   sort sequences by id/name/sequence/length

       split  split sequences into files by id/seq region/size/parts (mainly for FASTA)

       split2 split sequences into files by size/parts (FASTA, PE/SE FASTQ)

       stats  simple statistics of FASTA/Q files

       subseq get subsequences by region/gtf/bed, including flanking sequences

       tab2fx convert tabular format to FASTA/Q format

       translate
              translate DNA/RNA to protein sequence (supporting ambiguous bases)

       version
              print version information and check for update

       watch  monitoring and online histograms of sequence features
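
       To illustrate what fq2fa does, here is a minimal Python sketch that assumes strict four-line FASTQ
       records (header, sequence, "+" line, quality) with no line-wrapped sequences. It is not SeqKit's
       implementation, and the function name fastq_to_fasta is hypothetical. The header text after "@" is
       kept and the quality line is dropped.

```python
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield unwrapped FASTA lines from strict 4-line FASTQ records.

    Assumes each record is exactly: '@header', sequence, '+...', quality.
    Line-wrapped (multi-line) FASTQ is not handled by this sketch."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header line, got {header!r}")
        try:
            seq = next(it).rstrip("\n")
            plus = next(it)
            next(it)  # quality line: not needed for FASTA
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        if not plus.startswith("+"):
            raise ValueError("expected '+' separator line")
        yield ">" + header[1:]  # keep the full header text (ID + description)
        yield seq


if __name__ == "__main__":
    fastq = ["@read1 lane=1\n", "ACGTN\n", "+\n", "IIII#\n"]
    print("\n".join(fastq_to_fasta(fastq)))
```

       Reading records positionally, rather than scanning for "@", avoids misreading a quality line that
       happens to begin with "@".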

   Flags:
       --alphabet-guess-seq-length int
              length of the sequence prefix of the first FASTA record from which seqkit guesses the sequence
              type (0 for the whole sequence) (default 10000)

       -h, --help
              help for seqkit

       --id-ncbi
              FASTA header is NCBI-style, e.g. >gi|110645304|ref|NC_002516.2| Pseud...

       --id-regexp string
              regular expression for parsing ID (default "^(\\S+)\\s?")

       --infile-list string
              file listing input files (one file per line); if given, these files are appended to those given
              as command-line arguments

       -w, --line-width int
              line width when outputting FASTA format (0 for no wrap) (default 60)

       -o, --out-file string
              output file ("-" for stdout; a .gz suffix produces gzipped output) (default "-")

       --quiet
              be quiet and do not show extra information

       -t, --seq-type string
              sequence type (dna|rna|protein|unlimit|auto) (for auto, the type is detected automatically from
              the first sequence) (default "auto")

       -j, --threads int
              number of CPUs (can also be set with the environment variable SEQKIT_THREADS) (default 4)
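
       The default --id-regexp (shown above in quoted form, i.e. the pattern ^(\S+)\s?) takes the ID to be
       everything in the header before the first whitespace, and --line-width wraps sequence lines (default
       60; 0 disables wrapping). The Python sketch below illustrates both conventions. The helper names
       parse_id and wrap_sequence are hypothetical, and Python's re module is assumed to treat this simple
       pattern the same way as Go's regexp package.

```python
import re
from typing import List

# The default --id-regexp "^(\\S+)\\s?" written as a Python raw string.
DEFAULT_ID_REGEXP = re.compile(r"^(\S+)\s?")


def parse_id(header: str, pattern: re.Pattern = DEFAULT_ID_REGEXP) -> str:
    """Return capture group 1 of pattern applied to a FASTA header
    (the text after '>'). Falling back to the whole header when nothing
    matches is an assumption of this sketch."""
    m = pattern.search(header)
    return m.group(1) if m else header


def wrap_sequence(seq: str, width: int = 60) -> List[str]:
    """Split seq into lines of at most width characters; 0 disables wrapping."""
    if width <= 0:
        return [seq]
    return [seq[i:i + width] for i in range(0, len(seq), width)]


if __name__ == "__main__":
    print(parse_id("chr1 assembled chromosome"))  # chr1
    print("\n".join(wrap_sequence("ACGT" * 20)))  # one line of 60 bases, one of 20
```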

       Use "seqkit [command] --help" for more information about a command.
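
       The --infile-list option appends the files named in a list file (one per line) to those given on
       the command line. A minimal Python sketch of that merging order follows; the function name
       collect_inputs is hypothetical, and skipping blank lines and trimming surrounding whitespace are
       assumptions of the sketch, not documented seqkit behavior.

```python
import os
import tempfile
from typing import Iterable, List, Optional


def collect_inputs(cli_files: Iterable[str], infile_list: Optional[str] = None) -> List[str]:
    """Return command-line files first, then those listed in infile_list.

    Assumption of this sketch: blank lines in the list file are skipped
    and surrounding whitespace is trimmed."""
    files = list(cli_files)
    if infile_list:
        with open(infile_list) as fh:
            for line in fh:
                name = line.strip()
                if name:
                    files.append(name)
    return files


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("b.fa\n\nc.fq.gz\n")
    try:
        print(collect_inputs(["a.fa"], tmp.name))  # ['a.fa', 'b.fa', 'c.fq.gz']
    finally:
        os.remove(tmp.name)
```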

AUTHOR

       This manpage was written by Nilesh Patra for the Debian distribution and may be used by others.