Provided by: obitools_1.2.12+dfsg-2_amd64

NAME

       obisample - description of obisample

       obisample randomly resamples sequence records with or without replacement.

OBISAMPLE SPECIFIC OPTIONS

       -s ###, --sample-size ###
                 Specifies the size of the generated sample.

                     · without the -a option, the sample size is the exact number of
                       sequence records to sample (default: the number of sequence records in
                       the input file).

                     · with the -a option, the sample size is a fraction of the number of
                       sequence records in the input file, given as a number between 0 and 1.

              Example:

                     > obisample -s 1000 seq1.fasta > seq2.fasta

                 Randomly samples 1000 sequence records from the seq1.fasta file, with
                 replacement, and saves them in the seq2.fasta file.
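
                 To make "with replacement" concrete, here is a minimal Python sketch. It is
                 an illustration only, not the obisample implementation: the function name
                 sample_with_replacement is hypothetical, records are plain strings, and the
                 count attribute is ignored.

```python
import random


def sample_with_replacement(records, size=None, seed=None):
    """Draw `size` records uniformly at random, allowing repeats.

    Hypothetical helper for illustration; not part of OBITools.
    """
    rng = random.Random(seed)
    if size is None:
        # Mirrors the documented default: as many records as the input holds.
        size = len(records)
    return rng.choices(records, k=size)


records = ["seq%d" % i for i in range(10)]

# The sample may be larger than the input because records can repeat.
sample = sample_with_replacement(records, size=25, seed=1)
print(len(sample))
```

                 Because each draw is independent, the same record can appear several times,
                 which is why the requested size may exceed the number of input records.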

       -a, --approx-sampling
                 Switches the resampling algorithm to an approximate one, useful for large
                 files.

                 The default algorithm selects exactly the number of sequence records specified
                 with the -s option. When the -a option is set, each sequence record is instead
                 selected with a probability that depends on its count attribute and on the
                 fraction given with -s, so the size of the output is only approximately the
                 requested fraction of the input.

              Example:

                     > obisample -s 0.5 -a seq1.fastq > seq2.fastq

                 Randomly samples about half of the sequence records of the seq1.fastq file,
                 without replacement, and saves them in the seq2.fastq file.
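
                 The exact rule obisample applies with -a is not spelled out here. One
                 plausible reading, sketched below in Python as an assumption, is that every
                 one of the count copies of a record is kept independently with probability
                 equal to the fraction. The function name approx_sample and the
                 (identifier, count) tuples are hypothetical.

```python
import random


def approx_sample(records, fraction, seed=None):
    """Yield (identifier, kept) pairs for records that keep at least one copy.

    `records` is an iterable of (identifier, count) tuples. Each of the
    `count` copies is kept independently with probability `fraction`.
    This is an assumed model, not the confirmed obisample algorithm.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    for identifier, count in records:
        kept = sum(1 for _ in range(count) if rng.random() < fraction)
        if kept > 0:
            yield identifier, kept


records = [("seqA", 3), ("seqB", 0), ("seqC", 5)]
for identifier, kept in approx_sample(records, 0.5, seed=3):
    print(identifier, kept)
```

                 Under this model the input is read once and never held in memory, which
                 would explain why the option is recommended for large files.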

       -w, --without-replacement
                 Asks for sampling without replacement.

              Example:

                     > obisample -s 1000 -w seq1.fasta > seq2.fasta

                 Randomly samples 1000 sequence records from the seq1.fasta file, without
                 replacement (the input file must contain at least 1000 sequences), and saves
                 them in the seq2.fasta file.
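
                 The size constraint follows directly from drawing without replacement, as
                 this minimal Python sketch shows. It is an illustration only: the function
                 name sample_without_replacement is hypothetical and the count attribute is
                 ignored.

```python
import random


def sample_without_replacement(records, size, seed=None):
    """Draw `size` distinct records; fail if the input is too small.

    Hypothetical helper for illustration; not part of OBITools.
    """
    if size > len(records):
        raise ValueError(
            "cannot draw %d records from %d without replacement"
            % (size, len(records))
        )
    return random.Random(seed).sample(records, size)


records = ["seq%d" % i for i in range(10)]
sample = sample_without_replacement(records, 4, seed=1)

# Every sampled record is distinct.
print(len(sample), len(set(sample)))
```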

OPTIONS TO SPECIFY INPUT FORMAT

   Restrict the analysis to a sub-part of the input file
       --skip <N>
              The first N sequence records of the file are discarded from the analysis and
              are not reported to the output file.

       --only <N>
              Only the next N sequence records of the file are analyzed. The sequences that
              follow them in the file are neither analyzed nor reported to the output file.
              This option can be combined with the --skip option.
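
              Combining --skip and --only selects a window of records. The Python sketch
              below shows that window on a plain iterable; the function name restrict and
              its parameters are hypothetical and only mirror the two options.

```python
from itertools import islice


def restrict(records, skip=0, only=None):
    """Drop the first `skip` records, then keep at most `only` records.

    Hypothetical helper for illustration; not part of OBITools.
    """
    stop = None if only is None else skip + only
    return islice(records, skip, stop)


# Skip the first 2 records, then keep the next 3.
window = list(restrict(range(10), skip=2, only=3))
print(window)  # [2, 3, 4]
```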

   Sequence annotated format
       --genbank
              Input file is in genbank format.

       --embl Input file is in embl format.

   fasta related format
       --fasta
              Input file is in fasta format (including OBITools fasta extensions).

   fastq related format
       --sanger
              Input  file  is  in  Sanger  fastq  format  (standard  fastq  used  by  HiSeq/MiSeq
              sequencers).

       --solexa
              Input file is in fastq format produced by Solexa (Ga IIx) sequencers.

   ecoPCR related format
       --ecopcr
              Input file is in ecoPCR format.

       --ecopcrdb
              Input is an ecoPCR database.

   Specifying the sequence type
       --nuc  Input file contains nucleic sequences.

       --prot Input file contains protein sequences.

COMMON OPTIONS

       -h, --help
              Shows this help message and exits.

       --DEBUG
              Sets logging in debug mode.

OBISAMPLE USED SEQUENCE ATTRIBUTE

          · count
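
       As an illustration of where count comes from, the Python sketch below reads it from
       a fasta header. It assumes the OBITools fasta extension stores attributes as
       "key=value;" pairs after the identifier, and that a record without a count attribute
       stands for a single sequence; both points are assumptions, and the function name
       record_count is hypothetical.

```python
import re

# Matches "count=<integer>;" as a whole attribute, not as the tail of
# another key such as "seq_count".
COUNT_RE = re.compile(r"(?:^|\s)count=(\d+);")


def record_count(header):
    """Return the count attribute of a fasta header line, or 1 if absent."""
    match = COUNT_RE.search(header)
    return int(match.group(1)) if match else 1


print(record_count(">seq1 count=12; a description"))  # 12
print(record_count(">seq2 a description"))            # 1
```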

AUTHOR

       The OBITools Development Team - LECA

COPYRIGHT

       2015 - 2019, OBITools Development Team

 1.02 12                                   Jan 28, 2019                              OBISAMPLE(1)