Ubuntu Manpage: obistat - description of obistat

Provided by: obitools_1.2.12+dfsg-2_amd64

NAME

       obistat - description of obistat

       obistat computes basic statistics for attribute values of sequence records.  The sequence
       records can be categorized or not using one or several -c options.  By default,  only  the
       number of sequence records and the total count are computed for each category.  Additional
       statistics can be computed for attribute values in each category, like:

          · minimum value (-m option)

          · maximum value (-M option)

          · mean value (-a option)

          · variance (-v option)

          · standard deviation (-s option)

       The result is a contingency table with the different categories in rows, and the  computed
       statistics in columns.
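
       As an illustration of the default output, the Python sketch below groups records by one
       attribute and reports, for each category, the number of records and the summed count
       attribute. It is not obistat's implementation: the records are hypothetical dictionaries,
       default_stats is an illustrative name, and treating a missing count as 1 is an assumption.

```python
from collections import defaultdict

# Hypothetical records: each dict stands in for one sequence record's attributes.
records = [
    {"sample": "A", "count": 3},
    {"sample": "A", "count": 1},
    {"sample": "B", "count": 5},
]

def default_stats(records, key):
    """Return {category: (number_of_records, total_count)}."""
    table = defaultdict(lambda: [0, 0])
    for rec in records:
        row = table[rec[key]]
        row[0] += 1                    # one more record in this category
        row[1] += rec.get("count", 1)  # assumption: a missing count means 1
    return {cat: tuple(row) for cat, row in table.items()}

stats = default_stats(records, "sample")
for category in sorted(stats):
    # One row per category, one column per statistic.
    print(category, *stats[category])
```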

OBISTAT SPECIFIC OPTIONS

       -c <KEY>, --category-attribute=<KEY>
                 Attribute  used  to  categorize  the sequence records. Several -c options can be
                 combined.

                 TIP:
                     The <KEY> can be simply the key of an attribute, or a Python expression
                     similar to those accepted by the -p option of obigrep.

              Example:

                     > obistat -c sample -c seq_length seq.fasta

                 This  command  prints  the  number  of sequence records and total count for each
                 combination of sample and sequence length.
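
                 Combining several -c options amounts to grouping on a tuple of values. A minimal
                 Python sketch of that idea follows; the records and the count_by helper are
                 hypothetical, and seq_length is stored here as a plain attribute for simplicity.

```python
from collections import Counter

# Hypothetical records carrying the two attributes used as categories.
records = [
    {"sample": "A", "seq_length": 100},
    {"sample": "A", "seq_length": 100},
    {"sample": "A", "seq_length": 98},
    {"sample": "B", "seq_length": 100},
]

def count_by(records, keys):
    """Count records for each combination of the given attribute keys."""
    return Counter(tuple(rec[k] for k in keys) for rec in records)

combos = count_by(records, ["sample", "seq_length"])
for combo in sorted(combos):
    # Each distinct (sample, seq_length) pair becomes one row.
    print(*combo, combos[combo])
```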

       -m <KEY>, --min=<KEY>
                 Computes the minimum value of attribute <KEY> for each category.

              Example:

                     > obistat -c sample -m seq_length seq.fastq

                 This command computes the minimum sequence length observed for each sample.

       -M <KEY>, --max=<KEY>
                 Computes the maximum value of attribute <KEY> for each category.

              Example:

                     > obistat -c sample -M seq_length seq.fastq

                 This command computes the maximum sequence length observed for each sample.
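
                 The per-category minimum and maximum described for -m and -M can be sketched in
                 Python as below. The data and the min_max_by helper are hypothetical and only
                 illustrate the computation, not obistat's code.

```python
# Hypothetical records: one dict per sequence record.
records = [
    {"sample": "A", "seq_length": 100},
    {"sample": "A", "seq_length": 98},
    {"sample": "A", "seq_length": 120},
    {"sample": "B", "seq_length": 75},
]

def min_max_by(records, category, key):
    """Return {category: (minimum, maximum)} of attribute `key`."""
    result = {}
    for rec in records:
        value = rec[key]
        # Start from the current value the first time a category is seen.
        lo, hi = result.get(rec[category], (value, value))
        result[rec[category]] = (min(lo, value), max(hi, value))
    return result

extremes = min_max_by(records, "sample", "seq_length")
for sample in sorted(extremes):
    print(sample, *extremes[sample])
```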

       -a <KEY>, --mean=<KEY>
                 Computes the mean value of attribute <KEY> for each category.

              Example:

                     > obistat -c sample -a seq_length seq.fastq

                 This command computes the mean sequence length observed for each sample.

       -v <KEY>, --variance=<KEY>
                 Computes the variance of attribute <KEY> for each category.

              Example:

                     > obistat -c genus_name -v reverse_error seq.fastq

                 This command computes the variance of the  number  of  errors  observed  in  the
                 reverse primer for each genus.

       -s <KEY>, --std-dev=<KEY>
                 Computes the standard deviation of attribute <KEY> for each category.

              Example:

                     > obistat -c genus_name -s reverse_error seq.fastq

                 This command computes the standard deviation of the number of errors observed in
                 the reverse primer for each genus.
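
                 For reference, the mean, variance and standard deviation per category can be
                 sketched with Python's statistics module. This is an illustration on hypothetical
                 data, and it assumes the population estimators (pvariance, pstdev); this page
                 does not say whether obistat uses the population or the sample formula.

```python
import statistics
from collections import defaultdict

# Hypothetical records: error counts in the reverse primer, per genus.
records = [
    {"genus_name": "Abies", "reverse_error": 0},
    {"genus_name": "Abies", "reverse_error": 2},
    {"genus_name": "Abies", "reverse_error": 4},
    {"genus_name": "Picea", "reverse_error": 1},
    {"genus_name": "Picea", "reverse_error": 1},
]

def spread_by(records, category, key):
    """Return {category: (mean, variance, standard deviation)} of `key`."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[category]].append(rec[key])
    return {
        # Assumption: population variance and standard deviation.
        cat: (statistics.mean(v), statistics.pvariance(v), statistics.pstdev(v))
        for cat, v in groups.items()
    }

spread = spread_by(records, "genus_name", "reverse_error")
for genus in sorted(spread):
    print(genus, *spread[genus])
```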

OPTIONS TO SPECIFY INPUT FORMAT

   Restrict the analysis to a sub-part of the input file
       --skip <N>
              The first N sequence records of the file are discarded from the analysis and are
              not reported to the output file.

       --only <N>
              Only the next N sequence records of the file are analyzed. The sequences that follow
              are neither analyzed nor reported to the output file. This option can be combined
              with the --skip option.
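
              The combined effect of --skip and --only is a window over the input. A minimal
              Python sketch using itertools.islice follows; the restrict helper and the record
              names are hypothetical.

```python
from itertools import islice

def restrict(records, skip=0, only=None):
    """Discard the first `skip` records, then yield at most `only` records."""
    stop = None if only is None else skip + only
    return islice(records, skip, stop)

# Ten hypothetical record identifiers: seq1 .. seq10.
records = [f"seq{i}" for i in range(1, 11)]

# Equivalent in spirit to: --skip 2 --only 3
selected = list(restrict(records, skip=2, only=3))
print(selected)
```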

   Sequence annotated format
       --genbank
              Input file is in genbank format.

       --embl Input file is in embl format.

   fasta related format
       --fasta
              Input file is in fasta format (including OBITools fasta extensions).

   fastq related format
       --sanger
              Input  file  is  in  Sanger  fastq  format  (standard  fastq  used  by  HiSeq/MiSeq
              sequencers).

       --solexa
              Input file is in fastq format produced by Solexa (Ga IIx) sequencers.

   ecoPCR related format
       --ecopcr
              Input file is in ecoPCR format.

       --ecopcrdb
              Input is an ecoPCR database.

   Specifying the sequence type
       --nuc  Input file contains nucleic sequences.

       --prot Input file contains protein sequences.

TAXONOMY RELATED OPTIONS

       -d <FILENAME>, --database=<FILENAME>
              Name of the ecoPCR taxonomy database.

       -t <FILENAME>, --taxonomy-dump=<FILENAME>
              Name of the NCBI taxonomy dump repository.

COMMON OPTIONS

       -h, --help
              Shows this help message and exits.

       --DEBUG
              Sets logging in debug mode.

OBISTAT USED SEQUENCE ATTRIBUTE

          · count

AUTHOR

       The OBITools Development Team - LECA

COPYRIGHT

       2019 - 2015, OBITools Development Team

 1.02 12                                   Jan 28, 2019                                OBISTAT(1)