Ubuntu Manpage: obiextract - description of obiextract

Provided by: obitools_1.2.12+dfsg-2_amd64

NAME

       obiextract - description of obiextract

       The obiextract command extracts a subset of samples from a complete dataset.

       The names of the samples to extract can be given either directly on the command line, one
       name per option, or in a file containing one sample name per line.

       The count attribute of each sequence and the attribute describing the distribution of the
       sequence occurrences among samples are updated according to the selected samples.

       A sequence that occurs in none of the selected samples is not kept in the output of
       obiextract.
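
       As a hedged illustration of the behaviour described above, the sketch below writes a tiny
       input file and runs obiextract on it when the command is installed. The file names
       toy.fasta and toy_sampleA.fasta, the sequence identifiers, and the header layout (a count
       attribute plus a merged_sample dictionary) are assumptions made for the example, not
       taken from this page.

```shell
# Toy OBITools-style fasta file (header layout is an assumption for illustration).
cat > toy.fasta <<'EOF'
>seq1 count=10; merged_sample={'sampleA': 3, 'sampleB': 7};
acgtacgtacgt
>seq2 count=4; merged_sample={'sampleC': 4};
ttgacctgaatc
EOF

# Run the extraction only if obiextract is available on this machine.
if command -v obiextract >/dev/null 2>&1; then
    # seq2 occurs only in sampleC, so it should be absent from the output;
    # the count of seq1 should be recomputed from the selected sample.
    obiextract -e sampleA toy.fasta > toy_sampleA.fasta
else
    echo "obiextract not found; toy.fasta was written but not processed" >&2
fi
```

       Comparing the headers of toy.fasta and toy_sampleA.fasta is a quick way to check how the
       count and sample attributes were rewritten.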

OBIEXTRACT SPECIFIC OPTIONS

       -s <KEY>, --sample=<KEY>
              Attribute containing sample descriptions. By default the attribute  name  used  for
              describing samples is set to merged_sample.

       -e <SAMPLE_NAME>, --extract=<SAMPLE_NAME>
                 Indicates which <SAMPLE_NAME> has to be extracted. Several -e options can be
                 given to specify several samples. To extract a large number of samples, see
                 the -E option described below.

                 TIP:
                     The <KEY> can simply be the key of an attribute, or a Python expression,
                     as for the -p option of obigrep.

              Example:

                     > obiextract -e sampleA -e sampleB allseqs.fasta > samplesAB.fasta

                 This command extracts from the allseqs.fasta file data related to samples A  and
                 B.

       -E <FILENAME>, --extract-list=<FILENAME>
                 Name of a file containing the list of samples to extract. The file must be a
                 plain text file with one sample name per line.

              Example:

                     > obiextract -E subset.txt allseqs.fasta > subset_samples.fasta

                 This command extracts from the allseqs.fasta file data related to samples listed
                 in the subset.txt file.
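
                 The sample list can be produced in any way that yields one name per line. A
                 minimal sketch, assuming the hypothetical sample names sampleA, sampleB and
                 sampleC and reusing the file names from the example above:

```shell
# Write one sample name per line; sort -u removes accidental duplicates.
printf '%s\n' sampleB sampleA sampleC sampleA | sort -u > subset.txt

# Run the extraction only when both the tool and the input file exist.
if command -v obiextract >/dev/null 2>&1 && [ -f allseqs.fasta ]; then
    obiextract -E subset.txt allseqs.fasta > subset_samples.fasta
fi
```

                 Keeping the list in a file makes the selection easy to review and to reuse
                 across several runs.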

OPTIONS TO SPECIFY INPUT FORMAT

   Restrict the analysis to a sub-part of the input file
       --skip <N>
              The first N sequence records of the file are discarded from the analysis and are
              not reported to the output file.

       --only <N>
              Only the next N sequence records of the file are analyzed. The following sequences
              in the file are neither analyzed nor reported to the output file. This option
              can be used together with the --skip option.
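
              As a sketch of how the two options are expected to combine, the commands below
              would ignore the first 100 records and then analyze the following 50, that is
              records 101 to 150 of the input. The numbers, the sample name sampleA, and the
              output file name window.fasta are arbitrary choices for illustration.

```shell
SKIP=100   # records discarded at the start of the file
ONLY=50    # records analyzed after the skipped ones

# 1-based positions of the first and last record that should be analyzed.
FIRST=$((SKIP + 1))
LAST=$((SKIP + ONLY))
echo "records $FIRST to $LAST"

# Run the extraction only when both the tool and the input file exist.
if command -v obiextract >/dev/null 2>&1 && [ -f allseqs.fasta ]; then
    obiextract --skip "$SKIP" --only "$ONLY" -e sampleA allseqs.fasta > window.fasta
fi
```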

   Sequence annotated format
       --genbank
              Input file is in genbank format.

       --embl Input file is in embl format.

   fasta related format
       --fasta
              Input file is in fasta format (including OBITools fasta extensions).

   fastq related format
       --sanger
              Input  file  is  in  Sanger  fastq  format  (standard  fastq  used  by  HiSeq/MiSeq
              sequencers).

       --solexa
              Input file is in fastq format produced by Solexa (Ga IIx) sequencers.

   ecoPCR related format
       --ecopcr
              Input file is in ecoPCR format.

       --ecopcrdb
              Input is an ecoPCR database.

   Specifying the sequence type
       --nuc  Input file contains nucleic sequences.

       --prot Input file contains protein sequences.

OPTIONS TO SPECIFY OUTPUT FORMAT

   Standard output format
       --fasta-output
              Output sequences in OBITools fasta format.

       --fastq-output
              Output sequences in Sanger fastq format.

   Generating an ecoPCR database
       --ecopcrdb-output=<PREFIX_FILENAME>
              Creates an ecoPCR database from the resulting sequence records.

   Miscellaneous option
       --uppercase
              Print sequences in upper case (default is lower case).

COMMON OPTIONS

       -h, --help
              Shows this help message and exits.

       --DEBUG
              Sets logging in debug mode.

OBIEXTRACT MODIFIED SEQUENCE ATTRIBUTES

          · count

OBIEXTRACT USED SEQUENCE ATTRIBUTE

          · count

AUTHOR

       The OBITools Development Team - LECA

COPYRIGHT

       2019 - 2015, OBITools Development Team

 1.02 12                                   Jan 28, 2019                             OBIEXTRACT(1)