Ubuntu Manpage: sortmerna - tool for filtering, mapping and OTU-picking NGS reads

NAME

       sortmerna - tool for filtering, mapping and OTU-picking NGS reads

SYNOPSIS

       sortmerna --ref db.fasta,db.idx --reads file.fa --aligned base_name_output [OPTIONS]

DESCRIPTION

       SortMeRNA  is  a  biological sequence analysis tool for filtering, mapping and OTU-picking
       NGS reads. The core algorithm is based on  approximate  seeds  and  allows  for  fast  and
       sensitive analyses of nucleotide sequences. The main application of SortMeRNA is filtering
       rRNA from  metatranscriptomic  data.   Additional  applications  include  OTU-picking  and
       taxonomy assignment, available through QIIME v1.9+ (http://qiime.org - v1.9.0-rc1).

       SortMeRNA  takes as input a file of reads (fasta or fastq format) and one or multiple rRNA
       database file(s), and sorts apart rRNA and rejected reads into two files specified by  the
       user.  Optionally,  it can provide high quality local alignments of rRNA reads against the
       rRNA database. SortMeRNA works with Illumina, 454, Ion Torrent and PacBio  data,  and  can
       produce SAM and BLAST-like alignments.

OPTIONS

   MANDATORY OPTIONS
       --ref STRING,STRING
              FASTA reference file, index file
              Example:
              --ref /path/to/file1.fasta,/path/to/index1
              If passing multiple reference sequence files, separate them by ':'
              Example:
              --ref /path/f1.fasta,/path/index1:/path/f2.fasta,/path/index2

       --reads STRING
              FASTA/FASTQ reads file

       --aligned STRING
              aligned reads filepath + base file name (appropriate extension will be added)
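
       The ':'-separated --ref syntax described above can be assembled from a list of
       reference/index pairs. The following is a minimal shell sketch, not part of SortMeRNA:
       build_ref_arg is a hypothetical helper, the paths are the placeholders used in the
       examples above, and the command is only printed so that the sketch runs without
       SortMeRNA installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of SortMeRNA): join "fasta,index" pairs
# with ':' as the --ref option expects for multiple references.
build_ref_arg() {
    ref=""
    for pair in "$@"; do
        if [ -z "$ref" ]; then
            ref=$pair
        else
            ref="$ref:$pair"
        fi
    done
    printf '%s\n' "$ref"
}

ref_arg=$(build_ref_arg \
    "/path/f1.fasta,/path/index1" \
    "/path/f2.fasta,/path/index2")

# Print the command rather than running it.
cmd="sortmerna --ref $ref_arg --reads file.fa --aligned base_name_output"
echo "$cmd"
```

       Each pair must name a FASTA file and the index built for it; the helper does no
       checking of its own.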

   COMMON OPTIONS
       --other STRING
              rejected reads filepath + base file name (appropriate extension will be added)

       --fastx BOOL
              output FASTA/FASTQ file (default: off, for aligned and/or rejected reads)

       --sam BOOL
              output SAM alignment (default: off, for aligned reads only)

       --SQ BOOL
              add SQ tags to the SAM file (default: off)

       --blast INT
              output alignments in various Blast-like formats
              0 - pairwise
              1 - tabular (Blast -m 8 format)
              2 - tabular + column for CIGAR
              3 - tabular + columns for CIGAR and query coverage

       --log BOOL
              output overall statistics (default: off)

       --num_alignments INT
              report   first   INT   alignments   per   read   reaching   E-value  (default:  -1,
              --num_alignments 0 signifies all alignments will be output)

       or (default)

       --best INT
              report INT best alignments per read reaching  E-value  (default:  1)  by  searching
              --min_lis  INT  candidate  alignments  (--best 0 signifies all candidate alignments
              will be searched)

       --min_lis INT
              search all alignments having the first INT longest LIS (default: 2). LIS stands for
              Longest Increasing Subsequence; it is computed from the seeds' positions to expand
              hits into longer matches prior to Smith-Waterman alignment.

       --print_all_reads
              output null alignment strings for non-aligned reads (default: off)  to  SAM  and/or
              BLAST tabular files

       --paired_in BOOL
              both paired-end reads go in --aligned fasta/q file (default: off, interleaved reads
              only, see Section 4.2.4 of User Manual)

       --paired_out BOOL
              both paired-end reads go in --other fasta/q file (default: off,  interleaved  reads
              only, see Section 4.2.4 of User Manual)

       --match INT
              SW score (positive integer) for a match (default: 2)

       --mismatch INT
              SW penalty (negative integer) for a mismatch (default: -3)

       --gap_open INT
              SW penalty (positive integer) for introducing a gap (default: 5)

       --gap_ext INT
              SW penalty (positive integer) for extending a gap (default: 2)

       -N INT
              SW penalty for ambiguous letters (N's) (default: scored as --mismatch)

       -F BOOL
              search only the forward strand (default: off)

       -R BOOL
              search only the reverse-complementary strand (default: off)

       -a INT
              number of threads to use (default: 1)

       -e DOUBLE
              E-value threshold  (default: 1)

       -m INT
              INT Mbytes for loading the reads into memory (default: 1024, maximum -m INT is
              5872)

       -v BOOL
              verbose  (default: off)
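
       The common options above can be combined into a single filtering run. The sketch below
       is illustrative only: clamp_mem is a hypothetical helper that enforces the documented
       -m maximum of 5872, the output base names rrna and non_rrna are placeholders, and the
       BOOL options are assumed to be switches given without a value. The command is printed
       rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: clamp a requested read-buffer size (Mbytes)
# to the documented -m maximum of 5872.
clamp_mem() {
    requested=$1
    max=5872
    if [ "$requested" -gt "$max" ]; then
        echo "$max"
    else
        echo "$requested"
    fi
}

mem=$(clamp_mem 8000)

# Assemble the argument list; file names are placeholders.
# Assumption: BOOL options such as --fastx and --log take no value.
set -- --ref /path/to/file1.fasta,/path/to/index1 \
       --reads file.fa \
       --aligned rrna --other non_rrna \
       --fastx --log -a 4 -m "$mem"

# Print the command rather than running it.
cmd="sortmerna $*"
echo "$cmd"
```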

   OTU PICKING OPTIONS
       --id DOUBLE
              %id similarity threshold (the alignment must  still  pass  the  E-value  threshold,
              default: 0.97)

       --coverage DOUBLE
              %query  coverage  threshold  (the  alignment must still pass the E-value threshold,
              default: 0.97)

       --de_novo_otu BOOL
              FASTA/FASTQ file for reads matching database < %id
              (set using --id) and < %cov (set using --coverage)
              (alignment must still pass the E-value threshold, default: off)

       --otu_map BOOL
              output OTU map (input to QIIME's make_otu_table.py, default: off)
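
       As a rough illustration of the OTU-picking options above, the sketch below checks the
       two thresholds before building a command line. It assumes that --id and --coverage are
       given as fractions between 0 and 1 (as the 0.97 defaults suggest) and that --otu_map
       and --de_novo_otu are switches given without a value; is_fraction is a hypothetical
       helper and the output base name otus is a placeholder. The command is printed rather
       than executed.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument is a plain
# non-negative decimal number no greater than 1.
is_fraction() {
    awk -v x="$1" 'BEGIN { exit !(x ~ /^[0-9]*\.?[0-9]+$/ && x + 0 <= 1) }'
}

id=0.97     # documented default for --id
cov=0.97    # documented default for --coverage

otu_cmd=""
if is_fraction "$id" && is_fraction "$cov"; then
    # Assumption: --otu_map and --de_novo_otu take no value.
    otu_cmd="sortmerna --ref db.fasta,db.idx --reads file.fa --aligned otus --id $id --coverage $cov --otu_map --de_novo_otu"
    echo "$otu_cmd"
else
    echo "thresholds must be between 0 and 1" >&2
fi
```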

   ADVANCED OPTIONS
       see SortMeRNA user manual for more details

       --passes INT
              three intervals at which to place the seed on the read (L is the seed length set in
              indexdb_rna(1), default: L,L/2,3)

       --edges INT
              number (or percent if INT followed by % sign) of nucleotides to add to each edge of
              the read prior to SW local alignment (default: 4)

       --num_seeds INT
              number of seeds matched before searching for candidate LIS (default: 2)

       --full_search BOOL
              search for all 0-error and 1-error seed matches in the index rather than stopping
              after finding a 0-error match (<1% gain in sensitivity with up to a four-fold
              decrease in speed, default: off)

       --pid BOOL
              add pid to output file names (default: off)

       -h BOOL
              help

       --version BOOL
              SortMeRNA version number
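
       The --passes default of L,L/2,3 described above depends on the seed length L chosen
       when the index was built. The sketch below computes that default triple for a given L.
       default_passes is a hypothetical helper, the seed length of 18 is only an illustrative
       value, and passing the three intervals to --passes as one comma-separated value is an
       assumption based on the form of the documented default.

```shell
#!/bin/sh
# Hypothetical helper: print the documented default intervals L,L/2,3
# for a seed length L (integer division for L/2).
default_passes() {
    L=$1
    echo "$L,$((L / 2)),3"
}

seed_len=18   # illustrative value; use the length set in indexdb_rna(1)
passes=$(default_passes "$seed_len")

# Print the command rather than running it.
echo "sortmerna --ref db.fasta,db.idx --reads file.fa --aligned base_name_output --passes $passes"
```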