Provided by: meryl_0~20150903+r2013-8build3_amd64 (Ubuntu lunar)

NAME

       meryl - in- and out-of-core kmer counting and utilities

SYNOPSIS

   Estimating memory requirements
       meryl -P -m kmersize [-c #] [-p] -s seq.fasta

       meryl -P -m kmersize [-c #] [-p] -n mercount

   Building a table
       meryl -B -m kmersize [-c #] [-p] [-v] [-f|-r|-C] [-L minoccurrence] [-U maxoccurrence]
       [-threads n | {-segments segments | -memory megabytes} [-configbatch [-sge jobname]]]
       -s seq.fasta -o tblprefix

       meryl -countbatch number [-sgebuild "qsuboptionstring"] -o tblprefix

       meryl -mergebatch number [-sgemerge "qsuboptionstring"] -o tblprefix

   Performing operations on a table
       meryl -M operation [-v] -s tblprefix [-s tblprefix2 ...] -o output

   Dumping a table
       meryl -Dh -s tblprefix

       meryl -Dt -n mincount -s tblprefix

DESCRIPTION

       meryl computes the kmer content of genomic sequences.  Kmer content is represented as a
       list of kmers and the number of times each occurs in the input sequences.  The kmer can be
       restricted to only the forward kmer, only the reverse kmer, or the canonical kmer (the
       lexicographically smaller of the forward and reverse kmer at each location).  Meryl can
       report the histogram of counts or the list of kmers and their counts, and can perform
       mathematical and set operations on the processed data files.
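       The three kmer flavors can be illustrated with a short Python sketch.  It assumes plain
       ACGT input, reads "reverse kmer" as the reverse complement, and skips any window that
       contains another base such as N.  The names count_kmers and reverse_complement and the
       mode strings are hypothetical; they are not part of meryl or its C++ library.

```python
from collections import Counter

# Watson-Crick complement; this sketch assumes upper-case ACGT input.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of an ACGT string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def count_kmers(seq, k, mode="canonical"):
    """Count kmers of size k in seq.

    mode is "forward", "reverse" or "canonical" (hypothetical names that
    mirror the -f, -r and -C options).
    """
    counts = Counter()
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        fwd = seq[i:i + k]
        if set(fwd) - set("ACGT"):
            continue  # skip windows with ambiguous bases such as N
        rev = reverse_complement(fwd)
        if mode == "forward":
            counts[fwd] += 1
        elif mode == "reverse":
            counts[rev] += 1
        elif mode == "canonical":
            counts[min(fwd, rev)] += 1  # lexicographically smaller strand
        else:
            raise ValueError(f"unknown mode: {mode}")
    return counts
```

       For example, count_kmers("ACGTT", 2) gives AC: 2, CG: 1, AA: 1, because the forward kmer
       GT and the forward kmer AC are each other's reverse complement and collapse to AC.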

       The output of meryl is a pair of binary files (tblprefix.mcidx and tblprefix.mcdat),
       together called a meryl database, which can be quickly dumped to give either a histogram
       of counts or the actual counts.  A C++ library is supplied for direct access to the files.

OPTIONS

       -P     Estimate memory requirements.  Given a sequence file (-s) or an upper limit on the
              number of mers in the file (-n), compute the table size (-t in build) to minimize
              the memory usage.  This mode recognizes the following options:

              -m #   size of a mer (required)

              -c #   homopolymer compression (optional)

              -p     enable positions

              -s seq.fasta
                     Sequence file to be scanned to determine the number of mers

              -n #   compute parameters assuming a file with this many mers in it

              Only one of -s and -n needs to be specified.  If both are given, -s takes priority.
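              The value given to -n is an upper limit on the number of mers.  A Python sketch of
              one way to derive such a limit from a FASTA file: a record of length L holds at most
              L - k + 1 mers of size k.  It does not reproduce meryl's own memory model, homopolymer
              compression (-c), or its handling of ambiguous bases; parse_fasta and
              mer_upper_bound are hypothetical names.

```python
def parse_fasta(text):
    """Return the sequences of a FASTA-formatted string, one entry per record."""
    sequences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if it had any sequence.
            if current:
                sequences.append("".join(current))
            current = []
        else:
            current.append(line)
    if current:
        sequences.append("".join(current))
    return sequences


def mer_upper_bound(text, k):
    """Upper bound on the number of k-mers: a record of length L holds L - k + 1."""
    return sum(max(0, len(seq) - k + 1) for seq in parse_fasta(text))
```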

       -B     Compute the mer-count tables given a sequence file (-s) and the options below.  By
              default, both strands are processed.

              -f     only build for the forward strand

              -r     only build for the reverse strand

              -C     use canonical mers (assumes both strands)

              -L #   DON'T save mers that occur fewer than # times

              -U #   DON'T save mers that occur more than # times

              -m #   size of a mer (required)

              -c #   homopolymer compression (optional)

              -p     enable positions

              -s seq.fasta
                     sequence to build the table for

              -o tblprefix
                     output table prefix

              -v     entertain the user
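              The -L and -U limits act as a filter on the counts.  A minimal Python sketch of
              that filter applied to a mer -> count dict; filter_counts is a hypothetical name,
              and this page does not say whether meryl applies the limits during or after
              counting.

```python
def filter_counts(counts, min_occurrence=None, max_occurrence=None):
    """Keep only mers whose count lies in [min_occurrence, max_occurrence].

    None means "no limit", mirroring an omitted -L or -U.
    """
    kept = {}
    for mer, n in counts.items():
        if min_occurrence is not None and n < min_occurrence:
            continue  # -L: occurs fewer than min_occurrence times
        if max_occurrence is not None and n > max_occurrence:
            continue  # -U: occurs more than max_occurrence times
        kept[mer] = n
    return kept
```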

              The meryl process can run in one large memory batch, in many small memory batches,
              or under SGE control, all with or without using multiple CPU cores.  By default,
              the computation is done as one large sequential process.  Multi-threaded operation
              is possible at additional memory expense, as is segmented operation at additional
              I/O expense.

              Threaded operation
                     Split the counting into n almost equally sized pieces.  This uses an extra
                     h MB per thread, where h is the memory estimate reported by -P.

                     -threads n
                            use n threads to build
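                     Why threads cost extra memory can be seen in a small Python sketch: each
                     worker fills its own table for one piece, and the tables are summed at the
                     end.  Splitting by kmer start offset is an assumption for illustration only,
                     not meryl's actual partitioning, and Python threads show the structure
                     without giving a CPU speedup.  threaded_count is a hypothetical name.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_range(seq, k, start, stop):
    """Count the forward kmers that start at offsets start..stop-1."""
    return Counter(seq[i:i + k] for i in range(start, stop))


def threaded_count(seq, k, n_threads):
    """Count kmers with n_threads workers, each filling its own table."""
    n_positions = max(0, len(seq) - k + 1)
    # Near-equal split of the kmer start offsets into n_threads pieces.
    bounds = [i * n_positions // n_threads for i in range(n_threads + 1)]
    pieces = list(zip(bounds, bounds[1:]))
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = pool.map(lambda piece: count_range(seq, k, *piece), pieces)
        for partial in partials:
            # Each per-thread table is extra memory until it is merged here.
            total.update(partial)
    return total
```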

              Segmented, sequential operation
                     Split the counting into pieces that each fit in no more than m MB of
                     memory, or into n equal-sized pieces.  Each piece is computed sequentially,
                     and the results are merged at the end.  Only one of -memory and -segments is
                     needed.

                     -memory m
                            use at most m MB of memory per segment

                     -segments n
                            use n segments
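                     The trade of memory for I/O can be sketched in Python: each pass rescans the
                     whole sequence but keeps only the mers in its own slice of the kmer space, so
                     each pass holds roughly 1/n of the table.  Slicing by a 2-bit encoding of the
                     mer is a hypothetical scheme chosen for illustration, and the sketch assumes
                     ACGT-only input; segmented_count and mer_value are hypothetical names.

```python
from collections import Counter

_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def mer_value(mer):
    """Encode an ACGT mer as an integer, two bits per base (order-preserving)."""
    value = 0
    for base in mer:
        value = (value << 2) | _CODE[base]
    return value


def segmented_count(seq, k, n_segments):
    """Count kmers in n_segments sequential passes and merge the results.

    Pass s keeps only mers whose encoding falls in the s-th slice of the
    4**k value range.  The slices are disjoint and ordered, so merging is
    a concatenation of each pass's sorted output.
    """
    space = 4 ** k
    merged = []
    for s in range(n_segments):
        lo, hi = s * space // n_segments, (s + 1) * space // n_segments
        counts = Counter()
        for i in range(len(seq) - k + 1):  # every pass rereads the input
            mer = seq[i:i + k]
            if lo <= mer_value(mer) < hi:
                counts[mer] += 1
        merged.extend(sorted(counts.items(), key=lambda item: mer_value(item[0])))
    return merged
```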

              Segmented, batched operation
                     Same as sequential operation, except that each segment can be executed
                     manually, in parallel.  Only one of -memory and -segments is needed.  See
                     also the EXAMPLE section on this page.

                     -memory m
                            use at most m MB of memory per segment

                     -segments n
                            use n segments

                     -configbatch
                            create the batches

                     -countbatch n
                            run batch number n

                     -mergebatch
                            merge the batches

                     Batched mode can also run on a Sun Grid Engine (SGE) grid.

                     -sge jobname
                            unique job name for this execution.  Meryl will submit jobs named
                            mpjobname, ncjobname and nmjobname for the prepare, count and merge
                            phases, respectively.

                     -sgebuild "options"

                     -sgemerge "options"
                            any additional options to pass to qsub(1) for the count (-sgebuild)
                            or merge (-sgemerge) jobs, e.g., "-p -153 -pe thread 2 -A
                            merylaccount".  N.B.: -N will be ignored.  N.B.: be sure to quote
                            the options.

       -M     Given a list of tables, perform a math, logical or threshold operation.  Unless
              noted otherwise, all operations take any number of databases.  Math operations are:

              min    count is the minimum count across all databases.  If the mer does NOT exist
                     in all databases, it has a zero count and is NOT in the output.

              minexist
                     count is the minimum count for all databases that contain the mer

              max    count is the maximum count for all databases

              add    count is the sum of the counts for all databases

              sub    count is the first minus the second (binary only)

              abs    count is the absolute value of the first minus the second (binary only)
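              The math operations can be modeled on plain mer -> count dicts.  A Python sketch,
              assuming that a mer absent from a database counts as zero and that any mer whose
              result is zero or negative is left out of the output; how meryl itself treats a
              negative sub result is not stated on this page.  merge_math is a hypothetical name.

```python
def merge_math(op, *dbs):
    """Combine mer -> count dicts as described for the -M math operations.

    Mers whose result is zero or negative are left out of the output.
    """
    if op in ("sub", "abs") and len(dbs) != 2:
        raise ValueError(f"{op} takes exactly two databases")
    out = {}
    for mer in set().union(*dbs):
        counts = [db.get(mer, 0) for db in dbs]  # 0 where the mer is absent
        if op == "min":
            value = min(counts)  # absent from any database -> 0 -> dropped
        elif op == "minexist":
            value = min(db[mer] for db in dbs if mer in db)
        elif op == "max":
            value = max(counts)
        elif op == "add":
            value = sum(counts)
        elif op == "sub":
            value = counts[0] - counts[1]
        elif op == "abs":
            value = abs(counts[0] - counts[1])
        else:
            raise ValueError(f"unknown math operation: {op}")
        if value > 0:
            out[mer] = value
    return out
```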

              Logical operations are:

              and    outputs mer iff it exists in all databases

              nand   outputs mer iff it exists in at least one, but not all, databases

              or     outputs mer iff it exists in at least one database

              xor    outputs mer iff it exists in an odd number of databases
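              Each logical operation depends only on how many input databases contain a mer.  A
              Python sketch that returns the selected mers as a set; the counts meryl writes for
              those mers are not described here, so the sketch omits them.  merge_logical is a
              hypothetical name.

```python
def merge_logical(op, *dbs):
    """Return the set of mers selected by the -M logical operations.

    Each database may be a dict (mer -> count) or a set of mers; only
    presence matters here.
    """
    n = len(dbs)
    keep_if = {
        "and": lambda hits: hits == n,       # in all databases
        "nand": lambda hits: 0 < hits < n,   # in at least one, but not all
        "or": lambda hits: hits >= 1,        # in at least one
        "xor": lambda hits: hits % 2 == 1,   # in an odd number
    }
    if op not in keep_if:
        raise ValueError(f"unknown logical operation: {op}")
    selected = set()
    for mer in set().union(*dbs):
        hits = sum(1 for db in dbs if mer in db)
        if keep_if[op](hits):
            selected.add(mer)
    return selected
```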

              Threshold operations are:

              lessthan x
                     outputs mer iff it has count <  x

              lessthanorequal x
                     outputs mer iff it has count <= x

              greaterthan x
                     outputs mer iff it has count >  x

              greaterthanorequal x
                     outputs mer iff it has count >= x

              equal x
                     outputs mer iff it has count == x

              Threshold operations work on exactly one database.
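              Each threshold operation is a single comparison of a mer's count against x.  A
              Python sketch over one mer -> count dict, keeping the surviving mers' counts; the
              names THRESHOLD_OPS and merge_threshold are hypothetical.

```python
import operator

# Comparison used by each threshold operation: keep a mer iff compare(count, x).
THRESHOLD_OPS = {
    "lessthan": operator.lt,
    "lessthanorequal": operator.le,
    "greaterthan": operator.gt,
    "greaterthanorequal": operator.ge,
    "equal": operator.eq,
}


def merge_threshold(op, x, db):
    """Filter a single mer -> count dict with a threshold operation."""
    compare = THRESHOLD_OPS[op]
    return {mer: n for mer, n in db.items() if compare(n, x)}
```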

              -s tblprefix
                     use tblprefix as a database

              -o tblprefix
                     create this output

              -v     entertain the user

       -D     Dump table (not all of these work)

              -Dd    Dump a histogram of the distances between occurrences of the same mer.

              -Dt    Dump mers whose count is >= a threshold.  Use -n to specify the threshold.

              -Dc    Count the number of mers, distinct mers and unique mers (mers that occur
                     exactly once).

              -Dh    Dump (to stdout) a histogram of mer counts.

              -s     Read the count table with this prefix (leave off the .mcdat or .mcidx
                     extension).
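              The -Dh, -Dc and -Dt summaries are simple reductions over the mer -> count table.
              A Python sketch of those three, assuming "mers" means the total number of
              occurrences, "distinct" the number of different mers, and "unique" the mers with
              count 1; the function names are hypothetical and the output format of meryl is not
              reproduced.

```python
from collections import Counter


def count_histogram(db):
    """-Dh analogue: for each count value, how many distinct mers have it."""
    return dict(sorted(Counter(db.values()).items()))


def count_summary(db):
    """-Dc analogue: total, distinct and unique (count == 1) mers."""
    return {
        "mers": sum(db.values()),
        "distinct": len(db),
        "unique": sum(1 for n in db.values() if n == 1),
    }


def dump_threshold(db, n):
    """-Dt analogue: mers whose count is >= n."""
    return {mer: c for mer, c in db.items() if c >= n}
```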

EXAMPLE

   Batch creation of a table
       Initialize the computation with -configbatch, which needs all of the build options.  Run
       every -countbatch job, then run -mergebatch to complete the table.

              meryl -configbatch -B [options] -o file
              meryl -countbatch 0 -o file
              meryl -countbatch 1 -o file
              ...
              meryl -countbatch N -o file
              meryl -mergebatch N -o file
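       The batch sequence above can be generated programmatically, for instance before handing
       the count jobs to a scheduler.  A Python sketch that builds the argument lists in the
       same order, using only the flags shown in the example; batch_commands is a hypothetical
       name, and how many batches -configbatch actually creates is not determined here.

```python
import shlex


def batch_commands(build_options, prefix, last_batch):
    """Return, in order, the meryl command lines of the batch example.

    build_options: the -B options passed to -configbatch.
    last_batch: the N of the example; batches 0..N are counted.
    """
    commands = [["meryl", "-configbatch", "-B", *build_options, "-o", prefix]]
    for i in range(last_batch + 1):
        commands.append(["meryl", "-countbatch", str(i), "-o", prefix])
    commands.append(["meryl", "-mergebatch", str(last_batch), "-o", prefix])
    return commands


if __name__ == "__main__":
    # N = 1 is illustrative only; this sketch does not know how many
    # batches -configbatch creates.
    opts = ["-m", "22", "-C", "-s", "seq.fasta", "-segments", "2"]
    for argv in batch_commands(opts, "file", 1):
        print(shlex.join(argv))
```

       The lists can be passed directly to subprocess.run, which avoids re-quoting; shlex.join
       is used above only to print them readably.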

SEE ALSO

       simple(1), mapMers(1), mapMers-depth(1), kmer-mask(1)