Provided by: obitools_1.2.12+dfsg-2_amd64

NAME

       obiaddtaxids - description of obiaddtaxids

       The  obiaddtaxids  command  annotates  sequence  records  with  a  taxid  based on a taxon
       scientific name stored in the sequence record header.

       Taxonomic information linking a taxid to a taxon scientific name is stored in  a  database
       formatted as an ecoPCR database (see obitaxonomy) or a NCBI taxdump (see NCBI ftp site).

       The taxon scientific name is extracted from the sequence record header in one of three
       ways, two of which are selected by an option:

          · By default, the sequence identifier is used. Underscore characters (_) are
            replaced by spaces before the taxon scientific name is looked up in the taxonomic
            database.

          · If the input file is an OBITools extended fasta format, the -k option  specifies  the
            attribute containing the taxon scientific name.

          · If the input file is a fasta file downloaded from the UNITE or SILVA web site,
            the -f option specifies this source so that the associated taxonomic information
            is parsed correctly.
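
       The default and -k cases above can be illustrated with a minimal Python sketch. It is
       not the obiaddtaxids implementation: the function name extract_taxon_name and its
       arguments are hypothetical, and the record is assumed to be already parsed into an
       identifier and a dictionary of attributes.

```python
def extract_taxon_name(identifier, attributes, key=None):
    """Return the candidate scientific name for one sequence record.

    identifier -- sequence identifier taken from the record header
    attributes -- dict of key=value annotations from the header
    key        -- attribute name, as would be passed with -k (None = default)
    """
    if key is not None:
        # -k case: read the name from the named attribute (None if absent).
        return attributes.get(key)
    # Default case: use the identifier, with underscores replaced by spaces.
    # The man page describes this substitution for the default case only.
    return identifier.replace("_", " ")


print(extract_taxon_name("Abies_alba", {}))
print(extract_taxon_name("seq001", {"species_name": "Abies alba"},
                         key="species_name"))
```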

       For  each sequence record, obiaddtaxids tries to match the extracted taxon scientific name
       with those stored in the taxonomic database.

          · If a match is found, the sequence record is annotated with the corresponding taxid.

       Otherwise,

          · If the -g option is set and the taxon name is composed of  two  words  and  only  the
            first  one  is  found  in  the  taxonomic  database at the ‘genus’ rank, obiaddtaxids
            considers that it found the genus associated with this sequence record and it  stores
            this sequence record in the file specified by the -g option.

          · If the -u option is set and no taxonomic information is retrieved from the scientific
            taxon name, the sequence record is stored in the file specified by the -u option.

          Example

              > obiaddtaxids -k species_name -g genus_identified.fasta \
                             -u unidentified.fasta -d my_ecopcr_database \
                             my_sequences.fasta > identified.fasta

          Tries to match the value associated with the species_name key of each  sequence  record
          from   the  my_sequences.fasta  file  with  a  taxon  name  from  the  ecoPCR  database
          my_ecopcr_database.

              · If there is an exact match, the sequence record is stored in the identified.fasta
                file.

              · If not, and the species_name value is composed of two words, obiaddtaxids
                treats the first word as a genus name and looks it up in the taxonomic
                database.

                   · If   a   genus   is   found,   the   sequence   record   is  stored  in  the
                     genus_identified.fasta file.

                   · Otherwise the sequence record is stored in the unidentified.fasta file.
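
          The routing described in this example can be sketched in Python as follows. This
          is an illustration under stated assumptions, not the tool's code: route_record and
          TOY_TAXONOMY are hypothetical names, the taxids are made up, and the "unrouted"
          outcome is a placeholder because this page does not say what happens to a record
          when the relevant output option is not set.

```python
# Toy taxonomy: scientific name -> (taxid, rank). Taxids are invented
# for illustration and are not real NCBI identifiers.
TOY_TAXONOMY = {
    "Abies": (1001, "genus"),
    "Abies alba": (1002, "species"),
}


def route_record(name, taxonomy, genus_output=False, unidentified_output=False):
    """Return (destination, taxid) for one extracted taxon name."""
    # Exact match: the record would be annotated with the taxid.
    if name in taxonomy:
        return ("identified", taxonomy[name][0])

    # Genus fallback: only with a genus output file (-g), a two-word name,
    # and a first word found at the 'genus' rank.
    words = name.split()
    if genus_output and len(words) == 2:
        hit = taxonomy.get(words[0])
        if hit is not None and hit[1] == "genus":
            return ("genus_found", None)

    # No usable match: goes to the unidentified file if one was given (-u).
    if unidentified_output:
        return ("unidentified", None)

    # Behaviour without -g / -u is not documented here (assumption).
    return ("unrouted", None)


for name in ("Abies alba", "Abies fictiva", "Unknownus taxon"):
    print(name, "->", route_record(name, TOY_TAXONOMY, True, True))
```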

OBIADDTAXIDS SPECIFIC OPTIONS

       -f <FORMAT>, --format=<FORMAT>
              Format of the sequence file. Possible formats are:

                 · raw: for regular OBITools extended fasta files (default value).

                 · UNITE: for fasta files downloaded from the UNITE web site.

                 · SILVA: for fasta files downloaded from the SILVA web site.

       -k <KEY>, --key-name=<KEY>
              Key of the attribute containing the taxon name in sequence files  in  the  OBITools
              extended fasta format.

       -a <ANCESTOR>, --restricting_ancestor=<ANCESTOR>
              Restricts the taxid search to taxa placed under a specified ancestor.

              <ANCESTOR> can be a taxid (integer) or a key (string).

                 · If  it  is  a  taxid,  this  taxid  is used to restrict the search for all the
                   sequence records.

                 · If it is a key, obiaddtaxids looks for the ancestor taxid in the corresponding
                   attribute.  This  allows  having  a  different  ancestor  restriction for each
                   sequence record.
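
              The two interpretations of <ANCESTOR> can be sketched in Python as below. The
              names resolve_ancestor, is_under and parent_of are hypothetical, the taxids
              are made up, and treating a taxon as being under itself is an assumption of
              this sketch rather than documented behaviour.

```python
def resolve_ancestor(option_value, attributes):
    """Return the ancestor taxid (int) that applies to one record, or None."""
    try:
        # An integer value is a taxid shared by all sequence records.
        return int(option_value)
    except ValueError:
        pass
    # Otherwise the value is a key: read the taxid from that attribute.
    value = attributes.get(option_value)
    return int(value) if value is not None else None


def is_under(taxid, ancestor, parent_of):
    """Walk up a child -> parent map and report whether ancestor is reached."""
    seen = set()
    while taxid is not None and taxid not in seen:
        if taxid == ancestor:
            return True
        seen.add(taxid)  # guards against a root that is its own parent
        taxid = parent_of.get(taxid)
    return False


# Toy child -> parent map with invented taxids; the root points to itself.
parent_of = {1002: 1001, 1001: 1, 1: 1}

print(resolve_ancestor("1001", {}))
print(resolve_ancestor("family_taxid", {"family_taxid": "1001"}))
print(is_under(1002, 1001, parent_of))
```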

       -g <FILENAME>, --genus_found=<FILENAME>
              File used to store sequences with a match found for the genus.

              CAUTION:
                 this option is not valid with the UNITE format.

       -u <FILENAME>, --unidentified=<FILENAME>
              File used to store sequences with no taxonomic match found.

TAXONOMY RELATED OPTIONS

       -d <FILENAME>, --database=<FILENAME>
              ecoPCR taxonomy database name

       -t <FILENAME>, --taxonomy-dump=<FILENAME>
              NCBI Taxonomy dump repository name

COMMON OPTIONS

       -h, --help
              Shows this help message and exits.

       --DEBUG
              Sets logging in debug mode.

OBIADDTAXIDS ADDED SEQUENCE ATTRIBUTE

          · taxid

AUTHOR

       The OBITools Development Team - LECA

COPYRIGHT

       2015 - 2019, OBITools Development Team

 1.02 12                                   Jan 28, 2019                           OBIADDTAXIDS(1)