Ubuntu Manpage: obicut - description of obicut

Provided by: obitools_1.2.12+dfsg-2_amd64

NAME

       obicut - description of obicut

       obicut is a command that trims sequence objects based on two integer values: the -b option
       gives the first position of the sequence to be kept, and the  -e  option  gives  the  last
       position to be kept. Both values can be computed using a Python expression.
          Example:

                 > obicut -b 50 -e seq_length seq1.fasta > seq2.fasta

              Keeps only the sequence part from the fiftieth position to the end.

          Example:

                 > obicut -b 50 -e seq_length-50 seq1.fasta > seq2.fasta

              Trims the first 49 and the last 50 nucleotides of the sequence object, since
              position 50 itself is kept.
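
       The trimming rule can be illustrated with a short Python sketch. It is not obicut's
       implementation: it assumes that positions are counted from 1, that both bounds are
       inclusive (as the wording "first position to be kept" and "last position to be kept"
       suggests), and that seq_length is the only variable available to the expressions.
       The function name cut is hypothetical.

```python
def cut(sequence, begin_expr, end_expr):
    """Keep positions begin..end of `sequence`.

    Assumption: positions are 1-based and both bounds are inclusive.
    """
    # Variables made visible to the expressions (only seq_length is assumed here).
    env = {"seq_length": len(sequence)}
    begin = int(eval(begin_expr, {}, env))
    end = int(eval(end_expr, {}, env))
    # Convert the 1-based inclusive range to a Python slice.
    return sequence[begin - 1:end]


seq = "ACGT" * 30  # a 120-nucleotide toy sequence

tail = cut(seq, "50", "seq_length")       # mirrors: -b 50 -e seq_length
core = cut(seq, "50", "seq_length-50")    # mirrors: -b 50 -e seq_length-50
print(len(tail), len(core))
```

       Under these assumptions, -b 50 keeps position 50 itself, so 49 leading nucleotides are
       removed.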

OBICUT SPECIFIC OPTIONS

       -b <INTEGER>, --begin=<INTEGER>
              Integer value (possibly calculated using a Python expression) indicating the first
              position of the sequence to be kept.

       -e <INTEGER>, --end=<INTEGER>
              Integer value (possibly calculated using a Python expression) indicating the last
              position of the sequence to be kept.

SEQUENCE RECORD SELECTION OPTIONS

       -s <REGULAR_PATTERN>, --sequence=<REGULAR_PATTERN>
                 Regular expression pattern to be tested against the sequence itself. The pattern
                 is case insensitive.

              Examples:

                     > obigrep -s 'GAATTC' seq1.fasta > seq2.fasta

                 Selects only the sequence records that contain an EcoRI restriction site.

                     > obigrep -s 'A{10,}' seq1.fasta > seq2.fasta

                 Selects only the sequence records that contain a stretch of at least 10 A.

                     > obigrep -s '^[ACGT]+$' seq1.fasta > seq2.fasta

                 Selects only the sequence records that do not contain ambiguous nucleotides.
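
                 The three patterns above can be tried out with Python's re module. This is
                 a sketch only: it assumes the pattern is searched anywhere in the sequence
                 and compiled case-insensitively, and the helper name matches_sequence is
                 hypothetical.

```python
import re


def matches_sequence(pattern, sequence):
    """Return True if `pattern` is found in `sequence`, ignoring case."""
    return re.search(pattern, sequence, re.IGNORECASE) is not None


# EcoRI site, matched regardless of case.
print(matches_sequence("GAATTC", "ttgaattcaa"))

# A stretch of at least ten A.
print(matches_sequence("A{10,}", "cg" + "a" * 10 + "cg"))

# Only unambiguous nucleotides: the N makes this one fail.
print(matches_sequence("^[ACGT]+$", "ACGTN"))
```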

       -D <REGULAR_PATTERN>, --definition=<REGULAR_PATTERN>
                 Regular expression pattern to be tested against the definition of  the  sequence
                 record. The pattern is case sensitive.

              Example:

                     > obigrep -D '[Cc]hloroplast' seq1.fasta > seq2.fasta

                 Selects  only  the  sequence  records  whose  definition contains chloroplast or
                 Chloroplast.

       -I <REGULAR_PATTERN>, --identifier=<REGULAR_PATTERN>
                 Regular expression pattern to be tested against the identifier of  the  sequence
                 record. The pattern is case sensitive.

              Example:

                     > obigrep -I '^GH' seq1.fasta > seq2.fasta

                 Selects only the sequence records whose identifier begins with GH.

       --id-list=<FILENAME>
                 <FILENAME> points to a text file containing the list of sequence record
                 identifiers to be selected. The file format consists of a single identifier per
                 line.

              Example:

                     > obigrep --id-list=my_id_list.txt seq1.fasta > seq2.fasta

                 Selects   only   the  sequence  records  whose  identifier  is  present  in  the
                 my_id_list.txt file.
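
                 A minimal sketch of this selection in Python, assuming records are modelled
                 as (identifier, sequence) tuples and that blank lines in the list are
                 ignored. Both helper names are hypothetical, and the treatment of blank
                 lines is an assumption rather than documented behaviour.

```python
def read_id_list(lines):
    """Collect one identifier per line, skipping empty lines (assumed)."""
    return {line.strip() for line in lines if line.strip()}


def select_by_id(records, wanted):
    """Keep the (identifier, sequence) records whose identifier is wanted."""
    return [record for record in records if record[0] in wanted]


# Stand-in for the lines of my_id_list.txt.
wanted = read_id_list(["GH001\n", "\n", "GH007\n"])

records = [("GH001", "ACGT"), ("GH002", "TTGA"), ("GH007", "CCGG")]
print(select_by_id(records, wanted))
```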

       -a <KEY>:<REGULAR_PATTERN>, --attribute=<KEY>:<REGULAR_PATTERN>
                 Regular expression pattern matched against the attributes of the sequence
                 record. The value of this option has the form key:regular_pattern. The
                 pattern is case sensitive. Several -a options can be used on the same command
                 line; in that case, the selected sequence records match all
                 constraints.

              Example:

                     > obigrep -a 'family_name:Asteraceae' seq1.fasta > seq2.fasta

                 Selects the sequence records containing an attribute whose  key  is  family_name
                 and value is Asteraceae.
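
                 How several key:regular_pattern constraints combine can be sketched as
                 follows. The sketch assumes attributes are held in a dictionary, that a
                 record lacking the key does not match, and that the pattern is searched in
                 the string form of the value. The function name is hypothetical.

```python
import re


def matches_attributes(attributes, constraints):
    """Return True only if every 'key:pattern' constraint is satisfied."""
    for constraint in constraints:
        # Split on the first colon only, so the pattern may itself contain colons.
        key, pattern = constraint.split(":", 1)
        if key not in attributes:
            return False
        # Case-sensitive search, as stated for the -a option.
        if re.search(pattern, str(attributes[key])) is None:
            return False
    return True


record = {"family_name": "Asteraceae", "taxid": 4210}
print(matches_attributes(record, ["family_name:Asteraceae"]))
print(matches_attributes(record, ["family_name:Asteraceae", "genus_name:Bellis"]))
```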

       -A <KEY>, --has-attribute=<KEY>
                 Selects sequence records having an attribute whose key is <KEY>.

              Example:

                     > obigrep -A taxid seq1.fasta > seq2.fasta

                 Selects only the sequence records having a taxid attribute defined.

       -p <PYTHON_EXPRESSION>, --predicat=<PYTHON_EXPRESSION>
                 Python  boolean  expression  to  be  evaluated  for  each  sequence  record. The
                 attribute keys defined for each sequence record can be used in the expression as
                 variable  names.   An  extra  variable  named  ‘sequence’ refers to the sequence
                 record itself. Several -p options can be used on the same command line; in
                 that case, the selected sequence records match all constraints.

              Example:

                     >  obigrep -p '(forward_error<2) and (reverse_error<2)' \
                        seq1.fasta > seq2.fasta

                 Selects   only  the  sequence  records  whose  forward_error  and  reverse_error
                 attributes have a value smaller than two.
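
                 The evaluation model described for -p can be approximated in Python. This
                 sketch is an assumption about the mechanism, not the tool's code: attribute
                 keys become local variable names, 'sequence' is modelled as a plain string
                 rather than a full sequence record, and a missing attribute raises a
                 NameError. The function name keep is hypothetical.

```python
def keep(attributes, sequence, predicates):
    """Return True if every predicate expression evaluates to a true value."""
    # Attribute keys are exposed as variable names, plus the extra 'sequence'.
    env = dict(attributes)
    env["sequence"] = sequence
    return all(eval(predicate, {}, env) for predicate in predicates)


attributes = {"forward_error": 1, "reverse_error": 0}
predicate = "(forward_error<2) and (reverse_error<2)"

print(keep(attributes, "ACGT", [predicate]))
print(keep({"forward_error": 3, "reverse_error": 0}, "ACGT", [predicate]))
```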

       -L <##>, --lmax=<##>
                 Selects sequence records whose sequence length is equal to or shorter than lmax.

              Example:

                     > obigrep -L 100 seq1.fasta > seq2.fasta

                 Selects only the sequence records that have a sequence length equal to or
                 shorter than 100bp.

       -l <##>, --lmin=<##>
                 Selects sequence records whose sequence length is equal to or longer than lmin.

              Example:

                     > obigrep -l 100 seq1.fasta > seq2.fasta

                 Selects only the sequence records that have a sequence length equal to or
                 longer than 100bp.

       -v, --inverse-match
                 Inverts the sequence record selection.

              Example:

                     > obigrep -v -l 100 seq1.fasta > seq2.fasta

                 Selects only the sequence records that  have  a  sequence  length  shorter  than
                 100bp.
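
                 The interaction of -l, -L and -v can be summarised with a small Python
                 sketch. It assumes that both length bounds are inclusive, as described
                 above, and that -v negates the combined result. The function name selected
                 is hypothetical.

```python
def selected(length, lmin=None, lmax=None, inverse=False):
    """Apply inclusive length bounds, then optionally invert the decision."""
    within = (lmin is None or length >= lmin) and (lmax is None or length <= lmax)
    # With inverse=True, records that would be selected are dropped and vice versa.
    return within != inverse


print(selected(100, lmin=100))                # mirrors: -l 100
print(selected(99, lmin=100, inverse=True))   # mirrors: -v -l 100
print(selected(100, lmin=100, inverse=True))  # exactly 100bp is rejected by -v -l 100
```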

TAXONOMY RELATED OPTIONS

       -d <FILENAME>, --database=<FILENAME>
              Name of the ecoPCR taxonomy database.

       -t <FILENAME>, --taxonomy-dump=<FILENAME>
              Name of the NCBI taxonomy dump repository.

       --require-rank=<RANK_NAME>
              Selects sequence records whose taxid tag has a parent of rank <RANK_NAME>.

       -r <TAXID>, --required=<TAXID>
              required taxid

       -i <TAXID>, --ignore=<TAXID>
              ignored taxid

COMMON OPTIONS

       -h, --help
              Shows this help message and exits.

       --DEBUG
              Sets logging in debug mode.

AUTHOR

       The OBITools Development Team - LECA

COPYRIGHT

       2019 - 2015, OBITool Development Team

 1.02 12                                   Jan 28, 2019                                 OBICUT(1)