Provided by: obitools_1.2.12+dfsg-2_amd64


       ecotaxspecificity - description of ecotaxspecificity

       The ecotaxspecificity command evaluates barcode resolution at different taxonomic ranks.

       As inputs, it takes a sequence record file annotated with taxids in the sequence header,
       and a database formatted as an ecoPCR database (see obitaxonomy) or an NCBI taxdump (see
       the NCBI FTP site).

       An example of output is reported below:

          Number of sequences added in graph: 284
          Number of nodes in all components: 269
          Number of sequences lost: 15!
          rank                      taxon_ok      taxon_total     percent
          order                            8               8        100.00
          superfamily                      1               1        100.00
          parvorder                        1               1        100.00
          subkingdom                       1               1        100.00
          superkingdom                     1               1        100.00
          kingdom                          3               3        100.00
          phylum                           5               5        100.00
          infraorder                       1               1        100.00
          subfamily                        3               3        100.00
          class                            6               6        100.00
          species                         35             176         19.89
          superorder                       1               1        100.00
          suborder                         1               1        100.00
          subtribe                         1               1        100.00
          subclass                         3               3        100.00
          genus                            9              15         60.00
          superclass                       1               1        100.00
          family                          10              10        100.00
          tribe                            2               2        100.00
          subphylum                        1               1        100.00

       In this example, the input sequence file contains 284 sequence records, but only 269 have
       been examined, because taxonomic information was not recovered for the 15 remaining
       records.

       “taxon_total” refers to the number of different taxa observed at this rank in the sequence
       record file (when taxonomic  information  is  available  at  this  rank),  and  “taxon_ok”
       corresponds  to  the  number of taxa that the barcode sequence identifies unambiguously in
       the taxonomic database. In this example, the sequence records correspond to 176  different
       species,  but  only  35  of  these  have specific barcodes. “percent” is the percentage of
       unambiguously identified taxa among the total number of taxa (taxon_ok/taxon_total*100).
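
       The percent column can be recomputed by hand from the two counts. The following is a
       minimal Python sketch of that arithmetic only; the function names are hypothetical and
       are not part of OBITools. With the species and genus rows of the example above it
       gives 19.89 and 60.00.

```python
def percent_unambiguous(taxon_ok: int, taxon_total: int) -> float:
    """Return taxon_ok / taxon_total * 100, rounded to two decimals."""
    if taxon_total <= 0:
        raise ValueError("taxon_total must be a positive integer")
    if not 0 <= taxon_ok <= taxon_total:
        raise ValueError("taxon_ok must lie between 0 and taxon_total")
    # Multiply first so that exact ratios such as 9/15 stay exact.
    return round(100 * taxon_ok / taxon_total, 2)


def format_percent(taxon_ok: int, taxon_total: int) -> str:
    """Format the value with two decimals, as in the percent column."""
    return f"{percent_unambiguous(taxon_ok, taxon_total):.2f}"


if __name__ == "__main__":
    # Counts copied from the example output (species and genus rows).
    print("species", format_percent(35, 176))
    print("genus  ", format_percent(9, 15))
```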


       -e INT, --errors=<INT>
                 Two sequences are considered as different if they have INT or  more  differences
                 (default: 1).


                        > ecotaxspecificity -d my_ecopcr_database -e 5 seq.fasta

                 This command considers that two sequences with fewer than 5 differences
                 correspond to the same barcode.
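
                 The threshold rule can be illustrated with a short Python sketch. It assumes
                 that a difference is a mismatch at one position between two sequences of
                 equal length. That is a simplification chosen for illustration and may not
                 match how ecotaxspecificity compares sequences. The function names are
                 hypothetical.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count mismatching positions between two equal-length sequences.

    Assumption: a position-by-position comparison stands in for the
    comparison that ecotaxspecificity really performs.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only handles equal-length sequences")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def are_different(seq_a: str, seq_b: str, errors: int = 1) -> bool:
    """Apply the -e rule: different if there are `errors` or more differences."""
    if errors < 1:
        raise ValueError("errors must be at least 1")
    return count_differences(seq_a, seq_b) >= errors


if __name__ == "__main__":
    a, b = "ACGTACGT", "TGCAACGT"  # 4 mismatching positions
    print(are_different(a, b))             # default threshold of 1
    print(are_different(a, b, errors=5))   # threshold from the -e 5 example
```

                 With the default threshold a single mismatch is enough to separate two
                 sequences. With a threshold of 5, the two sequences in the sketch (4
                 mismatches) are treated as the same barcode.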


       -d <FILENAME>, --database=<FILENAME>
              ecoPCR taxonomy Database name

       -t <FILENAME>, --taxonomy-dump=<FILENAME>
              NCBI Taxonomy dump repository name


   Restrict the analysis to a sub-part of the input file
       --skip <N>
              The N first sequence records of the file are discarded from the  analysis  and  not
              reported to the output file

       --only <N>
              Only the N next sequence records of the file are analyzed. The following sequences
              in the file are neither analyzed nor reported to the output file. This option
              can be used together with the --skip option.
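
              The combined effect of the two options amounts to selecting a window of
              records. A minimal Python sketch, assuming the records are available as any
              iterable; select_records is a hypothetical helper, not an OBITools function:

```python
from itertools import islice


def select_records(records, skip=0, only=None):
    """Yield the records kept when the first `skip` are discarded and at
    most `only` of the following ones are analyzed (None means no limit)."""
    if skip < 0 or (only is not None and only < 0):
        raise ValueError("skip and only must be non-negative")
    stop = None if only is None else skip + only
    return islice(records, skip, stop)


if __name__ == "__main__":
    record_ids = [f"seq{i}" for i in range(1, 11)]  # placeholder identifiers
    # Discard the first 2 records, then analyze the next 3.
    print(list(select_records(record_ids, skip=2, only=3)))
```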

   Sequence annotated format
       --genbank
              Input file is in genbank format.

       --embl Input file is in embl format.

   fasta related format
       --fasta
              Input file is in fasta format (including OBITools fasta extensions).

   fastq related format
       --sanger
              Input file is in Sanger fastq format (standard fastq used by HiSeq/MiSeq
              sequencers).

       --solexa
              Input file is in fastq format produced by Solexa (Ga IIx) sequencers.

   ecoPCR related format
       --ecopcr
              Input file is in ecoPCR format.

       --ecopcrdb
              Input is an ecoPCR database.

   Specifying the sequence type
       --nuc  Input file contains nucleic sequences.

       --prot Input file contains protein sequences.


       -h, --help
              Shows this help message and exits.

       --DEBUG
              Sets logging in debug mode.


          · taxid


       The OBITools Development Team - LECA


       2019 - 2015, OBITools Development Team

 1.02 12                                   Jan 28, 2019                      ECOTAXSPECIFICITY(1)