Ubuntu Manpage: obisubset - description of obisubset

Provided by: obitools_1.2.12+dfsg-2_amd64

NAME

       obisubset - description of obisubset

       The obisubset command extracts a subset of samples from a sequence file after its
       dereplication with the obiuniq program.

OBISUBSET SPECIFIC OPTIONS

       -s <TAGNAME>, --sample=<TAGNAME>,
              The -s option specifies the tag containing the sample descriptions; the default
              value is merged_sample.

              Example:

                     > obiuniq -m sample seq1.fasta > seq2.fasta
                     > obisubset -s merged_sample -n sample1 seq2.fasta > seq3.fasta

                 After the dereplication of the sequences using the sample attribute, the
                 per-sample information is stored in the new attribute merged_sample.

       -o <TAGNAME>, --other-tag=<TAGNAME>,
              Another tag to clean according to the sample subset.

              Example:

                     > obisubset -s merged_sample -o <TAGNAME> -n sample1 seq2.fasta > seq3.fasta

       -l <FILENAME>, --sample-list=<FILENAME>,
              File containing the sample names (one sample id per line).

              Example:

                     > obisubset -s merged_sample -o <TAGNAME> -l ids.txt seq2.fasta > seq3.fasta

       -p <REGEX>, --sample-pattern=<REGEX>,
              A regular expression pattern matching the sample ids to extract.

              Example:

                     > obisubset -s merged_sample -o <TAGNAME> -p "negative_.*" seq2.fasta > seq3.fasta

       -n <SAMPLEIDS>, --sample-name=<SAMPLEIDS>,
              A sample id to extract.

              Example:

                     > obisubset -s merged_sample -o <TAGNAME> -n sample1 seq2.fasta > seq3.fasta
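
       The three ways of naming samples (-n, -l and -p) can be pictured with the following
       Python sketch. It is an illustration only, not the obisubset implementation: the
       function select_samples and its parameters are hypothetical, and two behaviours are
       assumptions that this page does not state: that the options combine as a union, and
       that the regular expression must match the whole sample id.

```python
import re


def select_samples(all_ids, names=(), list_lines=(), pattern=None):
    """Return the sample ids selected by name, by list-file lines, or by regex.

    Hypothetical helper: mimics -n (names), -l (list_lines) and -p (pattern).
    """
    wanted = set(names)
    # -l: one sample id per line; skipping blank lines is an assumption.
    wanted.update(line.strip() for line in list_lines if line.strip())
    selected = [s for s in all_ids if s in wanted]
    if pattern is not None:
        regex = re.compile(pattern)
        # Assumption: the pattern must match the whole id (fullmatch).
        selected += [s for s in all_ids
                     if s not in wanted and regex.fullmatch(s)]
    return selected


ids = ["sample1", "sample2", "negative_1", "negative_2"]
by_name = select_samples(ids, names=["sample1"])
by_list = select_samples(ids, list_lines=["sample2\n", "\n", "negative_1\n"])
by_pattern = select_samples(ids, pattern="negative_.*")
```

       Here by_list stands in for the content of a file such as ids.txt, read line by line.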

OPTIONS TO SPECIFY INPUT FORMAT

   Restrict the analysis to a sub-part of the input file
       --skip <N>
              The first N sequence records of the file are discarded from the analysis and are
              not reported to the output file.

       --only <N>
              Only the next N sequence records of the file are analyzed. The following sequences
              in the file are neither analyzed nor reported to the output file. This option
              can be used together with the --skip option.
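
       When both options are given, the effect described above amounts to taking a window of
       records: drop the first N, then keep the next N. A minimal Python sketch of that
       reading follows; the function restrict is hypothetical and does not come from the
       OBITools code base.

```python
from itertools import islice


def restrict(records, skip=0, only=None):
    """Drop the first `skip` records, then yield at most `only` records."""
    stop = None if only is None else skip + only
    return islice(records, skip, stop)


# Ten placeholder records standing in for the sequences of an input file.
records = [f"seq{i}" for i in range(1, 11)]
window = list(restrict(records, skip=2, only=3))
```

       With skip=2 and only=3, the window holds the third, fourth and fifth records.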

   Sequence annotated format
       --genbank
              Input file is in genbank format.

       --embl Input file is in embl format.

   fasta related format
       --fasta
              Input file is in fasta format (including OBITools fasta extensions).

   fastq related format
       --sanger
              Input  file  is  in  Sanger  fastq  format  (standard  fastq  used  by  HiSeq/MiSeq
              sequencers).

       --solexa
              Input file is in fastq format produced by Solexa (Ga IIx) sequencers.

   ecoPCR related format
       --ecopcr
              Input file is in ecoPCR format.

       --ecopcrdb
              Input is an ecoPCR database.

   Specifying the sequence type
       --nuc  Input file contains nucleic sequences.

       --prot Input file contains protein sequences.

OPTIONS TO SPECIFY OUTPUT FORMAT

   Standard output format
       --fasta-output
              Output sequences in OBITools fasta format.

       --fastq-output
              Output sequences in Sanger fastq format.

   Generating an ecoPCR database
       --ecopcrdb-output=<PREFIX_FILENAME>
              Creates an ecoPCR database from the resulting sequence records.

   Miscellaneous option
       --uppercase
              Print sequences in upper case (default is lower case).

COMMON OPTIONS

       -h, --help
              Shows this help message and exits.

       --DEBUG
              Sets logging in debug mode.

OBISUBSET MODIFIES SEQUENCE ATTRIBUTES

            · count

            · merged_*
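
       As an illustration of how these two attributes could change for one sequence record,
       the Python sketch below keeps only the selected samples in merged_sample and
       recomputes count as their sum. It rests on assumptions not stated on this page: that
       merged_sample maps each sample id to a read count, as written by obiuniq -m sample,
       and that a record with no reads left in the selected samples is dropped. The function
       subset_record is hypothetical.

```python
def subset_record(attributes, keep, sample_tag="merged_sample"):
    """Return a copy of `attributes` restricted to the samples in `keep`.

    Returns None when no selected sample is present (assumed to mean the
    sequence is left out of the output).
    """
    per_sample = attributes.get(sample_tag, {})
    kept = {s: n for s, n in per_sample.items() if s in keep}
    if not kept:
        return None
    updated = dict(attributes)
    updated[sample_tag] = kept
    # Assumption: count becomes the total over the remaining samples.
    updated["count"] = sum(kept.values())
    return updated


record = {"count": 7, "merged_sample": {"sample1": 4, "sample2": 3}}
result = subset_record(record, {"sample1"})
dropped = subset_record(record, {"sample9"})
```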

OBISUBSET USED SEQUENCE ATTRIBUTES

          · count

          · merged_*

AUTHOR

       The OBITools Development Team - LECA

COPYRIGHT

       2019 - 2015, OBITools Development Team

 1.02 12                                   Jan 28, 2019                              OBISUBSET(1)