Ubuntu Manpage: metabat1 - MetaBAT: Metagenome Binning based on Abundance and Tetranucleotide frequency

NAME

       metabat1  -  MetaBAT:  Metagenome Binning based on Abundance and Tetranucleotide frequency
       (version 1)

DESCRIPTION

       MetaBAT: Metagenome Binning based on Abundance and Tetranucleotide frequency  (version  1)
       by Don Kang (ddkang@lbl.gov), Jeff Froula, Rob Egan, and Zhong Wang (zhongwang@lbl.gov)

OPTIONS

       -h [ --help ]
              produce help message

       -i [ --inFile ] arg
              Contigs in (gzipped) fasta file format [Mandatory]

       -o [ --outFile ] arg
              Base  file  name for each bin. The default output is fasta format. Use -l option to
              output only contig names [Mandatory]

       -a [ --abdFile ] arg
              A file having mean and variance of base coverage depth (tab  delimited;  the  first
              column  should  be contig names, and the first row will be considered as the header
              and be skipped) [Optional]
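
       As a rough illustration of the layout described for abdFile, the sketch below reads
       a tab-delimited table, skips the first row as a header, and keys each row by the
       contig name in the first column. The function name read_depth_table and the example
       column names and values are hypothetical; the manual only specifies the first
       column and the header row, so the remaining columns are simply kept as numbers.

```python
import csv
import io


def read_depth_table(handle):
    """Read a tab-delimited depth table keyed by contig name (first column)."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # the first row is treated as a header and skipped
    table = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        # first column: contig name; remaining columns: numeric depth values
        table[row[0]] = [float(value) for value in row[1:]]
    return table


# Hypothetical example content; column names and values are made up.
example = "contig\tmean\tvar\nc1\t10.5\t4.0\nc2\t3.0\t1.5\n"
depths = read_depth_table(io.StringIO(example))
```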

       --cvExt
              Use when a coverage file without variance (from third-party tools) is supplied
              instead of an abdFile from jgi_summarize_bam_contig_depths

       -p [ --pairFile ] arg
              A  file  having  paired  reads mapping information. Use it to increase sensitivity.
              (tab delimited; should have 3 columns: contig index (which the file is ordered
              by), its mate contig index, and supporting mean read coverage. The first row
              will be considered as the header and be skipped) [Optional]
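
       The three-column pairFile layout can be pictured with a small reader such as the
       one below. It is only a sketch: read_pair_table is a hypothetical helper, the
       example rows are invented, and treating contig indices as integers is an
       assumption that the manual does not state.

```python
import io


def read_pair_table(handle):
    """Read (contig index, mate contig index, mean read coverage) rows."""
    next(handle, None)  # the first row is treated as a header and skipped
    pairs = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # ignore blank or incomplete lines
        # assumption: both indices are integers, coverage is a float
        pairs.append((int(fields[0]), int(fields[1]), float(fields[2])))
    return pairs


# Hypothetical example content; header names and values are made up.
example = "contig\tmate\tcoverage\n0\t3\t12.5\n1\t2\t4.0\n"
pairs = read_pair_table(io.StringIO(example))
```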

       --p1 arg (=0)
              Probability cutoff for bin seeding. It mainly controls the number of potential bins
              and their specificity. Higher values produce more, and more specific, bins.
              (Percentage; Should be between 0 and 100)

       --p2 arg (=0)
              Probability cutoff for secondary neighbors. It supports p1 and should be close to
              p1. (Percentage; Should be between 0 and 100)

       --minProb arg (=0)
              Minimum probability for binning consideration. It controls sensitivity.  Usually it
              should be >= 75. (Percentage; Should be between 0 and 100)

       --minBinned arg (=0)
              Minimum proportion of already binned neighbors for a contig's membership inference.
              It controls specificity. Usually it would be <= 50. (Percentage; Should be between
              0 and 100)

       --verysensitive
              For greater sensitivity, especially in a simple community. It is the  shortcut  for
              --p1 90 --p2 85 --pB 20 --minProb 75 --minBinned 20 --minCorr 90

       --sensitive
              For  better  sensitivity  [default]. It is the shortcut for --p1 90 --p2 90 --pB 20
              --minProb 80 --minBinned 40 --minCorr 92

       --specific
              For better specificity. Different from --sensitive when using  correlation  binning
              or  ensemble  binning.  It is the shortcut for --p1 90 --p2 90 --pB 30 --minProb 80
              --minBinned 40 --minCorr 96

       --veryspecific
              For greater specificity. No correlation binning for short contig recruiting. It  is
              the shortcut for --p1 90 --p2 90 --pB 40 --minProb 80 --minBinned 40

       --superspecific
              For  the best specificity. It is the shortcut for --p1 95 --p2 90 --pB 50 --minProb
              80 --minBinned 20
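
       The preset shortcuts listed above can be collected into one lookup table, which
       makes the differences between them easier to compare. The values below are copied
       from the option descriptions; the names PRESETS and expand_preset are hypothetical
       and are not part of metabat1 itself.

```python
# Preset values as given in the option descriptions of this manual.
PRESETS = {
    "--verysensitive": {"p1": 90, "p2": 85, "pB": 20, "minProb": 75,
                        "minBinned": 20, "minCorr": 90},
    "--sensitive": {"p1": 90, "p2": 90, "pB": 20, "minProb": 80,
                    "minBinned": 40, "minCorr": 92},
    "--specific": {"p1": 90, "p2": 90, "pB": 30, "minProb": 80,
                   "minBinned": 40, "minCorr": 96},
    # The last two presets list no --minCorr value in the manual.
    "--veryspecific": {"p1": 90, "p2": 90, "pB": 40, "minProb": 80,
                       "minBinned": 40},
    "--superspecific": {"p1": 95, "p2": 90, "pB": 50, "minProb": 80,
                        "minBinned": 20},
}


def expand_preset(name):
    """Return the explicit argument list that a preset is a shortcut for."""
    args = []
    for key, value in PRESETS[name].items():
        args += ["--" + key, str(value)]
    return args


expanded = expand_preset("--superspecific")
```

       Here expanded holds the long-form arguments for --superspecific, in the order the
       manual lists them.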

       --minCorr arg (=0)
              Minimum Pearson correlation coefficient for binning missed contigs to increase
              sensitivity  (Helpful  when  there are many samples). Should be very high (>=90) to
              reduce contamination. (Percentage; Should be between 0 and 100; 0 disables)

       --minSamples arg (=10)
              Minimum number of samples for considering correlation-based recruiting

       -x [ --minCV ] arg (=1)
              Minimum mean coverage of a contig to consider for abundance distance calculation in
              each library

       --minCVSum arg (=2)
              Minimum  total  mean  coverage  of  a contig (sum of all libraries) to consider for
              abundance distance calculation
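
       One possible reading of how --minCV and --minCVSum interact is sketched below: a
       contig whose summed mean coverage falls below minCVSum contributes no libraries,
       and otherwise only libraries with mean coverage of at least minCV are used. This
       is an interpretation of the two descriptions, not the program's actual code, and
       usable_libraries is a hypothetical name.

```python
def usable_libraries(means, min_cv=1.0, min_cv_sum=2.0):
    """Return indices of libraries usable for abundance distance (assumed logic)."""
    # assumed: the total over all libraries must reach minCVSum
    if sum(means) < min_cv_sum:
        return []
    # assumed: each library must individually reach minCV
    return [i for i, mean in enumerate(means) if mean >= min_cv]


kept = usable_libraries([0.5, 3.0, 1.0])
```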

       -s [ --minClsSize ] arg (=200000)
              Minimum size of a bin to be considered as the output

       -m [ --minContig ] arg (=2500)
              Minimum size of a contig to be considered for binning (should  be  >=1500;  ideally
              >=2500).  If  #  of  samples  >= minSamples, small contigs (>=1000) will be given a
              chance to be recruited to existing bins by default.

       --minContigByCorr arg (=1000)
              Minimum size of a contig to be considered for recruiting by Pearson correlation
              coefficients  (activated  only  if  #  of  samples  >=  minSamples;  disabled  when
              minContigByCorr > minContig)
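
       The size rules for --minContig, --minContigByCorr and --minSamples can be combined
       into a small decision function. This is a minimal sketch of the rules as worded
       above, using the documented defaults; contig_role and its return labels are
       hypothetical, and the sketch ignores --minCorr, which may also affect whether
       correlation-based recruiting runs.

```python
def contig_role(length, n_samples, min_contig=2500,
                min_contig_by_corr=1000, min_samples=10):
    """Classify a contig as 'binning', 'recruit', or 'skipped' by its length."""
    if length >= min_contig:
        return "binning"  # large enough for regular binning
    # Recruiting is activated only with enough samples, and is disabled
    # when minContigByCorr > minContig.
    corr_active = (n_samples >= min_samples
                   and min_contig_by_corr <= min_contig)
    if corr_active and length >= min_contig_by_corr:
        return "recruit"  # may be recruited to an existing bin
    return "skipped"


role = contig_role(1500, n_samples=12)
```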

       -t [ --numThreads ] arg (=0)
              Number of threads to use (0: use all cores)

       --minShared arg (=50)
              Percentage cutoff for merging fuzzy contigs

       --fuzzy
              Binning with fuzziness which assigns multiple  memberships  of  a  contig  to  bins
              (activated only with --pairFile at the moment)

       -l [ --onlyLabel ]
              Output only sequence labels as a list in a column without sequences

       -S [ --sumLowCV ]
              If  set,  then every sample that falls below the minCV will be used in an aggregate
              sample

       -V [ --maxVarRatio ] arg (=0)
              Ignore any contigs where variance / mean exceeds this ratio (0 disables)
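
       The --maxVarRatio filter amounts to a single comparison, sketched here under the
       assumption that a contig with zero mean coverage is never flagged (the manual does
       not say how that case is handled). The function name exceeds_var_ratio is
       hypothetical.

```python
def exceeds_var_ratio(mean, variance, max_var_ratio=0.0):
    """Return True if variance / mean exceeds the ratio; 0 disables the check."""
    if max_var_ratio == 0:
        return False  # 0 disables the filter
    if mean == 0:
        return False  # assumption: avoid division by zero, do not flag
    return variance / mean > max_var_ratio


flagged = exceeds_var_ratio(mean=10.0, variance=50.0, max_var_ratio=4.0)
```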

       --saveTNF arg
              File to save (or load if exists) TNF matrix for each contig in input

       --saveDistance arg
              File to save (or load if exists) distance graph at lowest probability cutoff

       --saveCls
              Save cluster memberships in a matrix format

       --unbinned
              Generate [outFile].unbinned.fa file for unbinned contigs

       --noBinOut
              No bin output. Usually combined with --saveCls to check only contig memberships

       -B [ --B ] arg (=20)
              Number of bootstrap iterations for ensemble binning (Recommended to be >=20)

       --pB arg (=50)
              Proportion   of   shared   membership   in   bootstrapping.   Major   control   for
              sensitivity/specificity. The higher, the more specific. (Percentage; Should be between 0
              and 100)

       --seed arg (=0)
              Seed for reproducibility in ensemble binning, though results might still differ
              slightly. (0: use random seed)

       --keep
              Keep the intermediate files for later usage

       -d [ --debug ]
              Debug output

       -v [ --verbose ]
              Verbose output

AUTHOR

        This manpage was written by Andreas Tille for the Debian distribution and
        can be used for any other usage of the program.