
Provided by: baitfisher_1.2.7+git20211020.de26d5c+dfsg-1_amd64

NAME

       BaitFilter-v1.0.6 - manual page for BaitFilter-v1.0.6

DESCRIPTION

       Welcome to Bait-Filter, version 1.0.6.

       USAGE:

       ./BaitFilter-v1.0.6
              -i <string> [-o <string>] [-c <string>] [-m <string>]
              [--blast-second-hit-evalue <floating point number>]
              [--blast-first-hit-evalue <floating point number>]
              [--blast-min-hit-coverage-of-baits-in-tiling-stack <floating point number>]
              [--ref-blast-db <string>] [--blast-extra-commandline <string>]
              [--blast-evalue-cutoff <floating point number>] [-B <string>]
              [-t <positive integer>] [--ID-prefix <string>] [-S]
              [--verbosity <unsigned integer>] [-b <string>] [--] [--version] [-h]

       Where:

       -i <string>,  --input-bait-file-name <string>

       (required)
              Name of the input bait locus file. This is the bait file obtained from the
              BaitFisher program or from a previous filter run with BaitFilter.

       -o <string>,  --output-bait-file-name <string>

              Name  of the output bait file. All modes, except the conversion mode, produce files
              in the BaitFisher format.

       -c <string>,  --convert <string>

              Allows the user to produce the final output file, which can be uploaded to a
              bait-producing company. In this mode, BaitFilter reads the input bait file and,
              instead of filtering it, produces a custom bait file that can be uploaded to the
              bait-producing company. To avoid confusion, a filtering step cannot be done in
              the same run as the conversion. If you want to filter a bait file and convert
              the output, you need to call this program more than once: first to do the
              filtering and then to do the conversion. Currently the only allowed conversion
              parameter is "four-column-upload".

              New output formats can be added upon request. Please contact the author,
              Christoph Mayer <c.mayer.zfmk@uni-bonn.de>.

       -m <string>,  --mode <string>

              Apart from the input file option, the mode option is the most important option.
              It specifies which filter mode BaitFilter uses (see the user manual for more
              details):

       "ab":  Retain only the best bait locus for each alignment file, using as optimality
              criterion the minimal total number of required baits.

       "as":  Retain only the best bait locus for each alignment file, using as optimality
              criterion the maximal number of sequences the result is based on.

       "fb":  Retain only the best bait locus for each feature (e.g. CDS), using as
              optimality criterion the minimal total number of required baits. Only
              applicable if alignment cutting has been used in BaitFisher.

       "fs":  Retain only the best bait locus for each feature (e.g. CDS), using as
              optimality criterion the maximal number of sequences the result is based on.
              Only applicable if alignment cutting has been used in BaitFisher.

       "blast-a":
              Remove all bait regions of all ALIGNMENTs for which one or more baits have at
              least two good hits to a reference genome. (Not recommended.)

       "blast-f":
              Remove all bait regions of all FEATUREs for which one or more baits have at
              least two good hits to a reference genome. (Not recommended.)

       "blast-l":
              Remove only the bait REGIONs that contain a bait with multiple good hits to a
              reference genome. (Recommended over blast-f and blast-a.)

       "blast-c":
              Conduct a coverage filter run without a search for multiple hits. Requires the
              --blast-min-hit-coverage-of-baits-in-tiling-stack option to be specified.

       "thin-b":
              Thin out a bait file to every Nth bait region, choosing the start position
              that minimizes the number of baits.

       "thin-s":
              Thin out a bait file to every Nth bait region, choosing the start position
              that maximizes the number of sequences.

       "thin-b-old":
              Similar to thin-b, but treats all loci as if they came from one alignment
              file. Identical to the behaviour of thin-b in version 1.0.5 or earlier.

       "thin-s-old":
              Similar to thin-s, but treats all loci as if they came from one alignment
              file. Identical to the behaviour of thin-s in version 1.0.5 or earlier.
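              The four best-locus modes (ab, as, fb, fs) amount to picking one winner per
              group. The Python sketch below is not BaitFilter's implementation; it assumes
              each candidate locus is summarized by hypothetical fields (group, start,
              num_baits, num_sequences) and that ties are broken by the other criterion and
              then by the lower start coordinate, which this page does not specify. For
              "ab"/"as" the group would be the alignment file, for "fb"/"fs" the feature.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Locus:
    # Hypothetical summary of one candidate bait region start.
    group: str          # alignment file ("ab"/"as") or feature ("fb"/"fs")
    start: int          # start coordinate of the bait region
    num_baits: int      # baits needed to tile this region
    num_sequences: int  # sequences the bait design is based on


def best_locus_per_group(loci: Iterable[Locus], criterion: str) -> Dict[str, Locus]:
    """Keep one locus per group.

    criterion "b": fewest baits (modes ab/fb); "s": most sequences (modes as/fs).
    Tie-breaking by the other criterion, then by start, is an assumption.
    """
    def rank(locus: Locus):
        if criterion == "b":
            return (locus.num_baits, -locus.num_sequences, locus.start)
        if criterion == "s":
            return (-locus.num_sequences, locus.num_baits, locus.start)
        raise ValueError(f"unknown criterion: {criterion!r}")

    best: Dict[str, Locus] = {}
    for locus in loci:
        current = best.get(locus.group)
        # Replace the current winner only if this locus ranks strictly better.
        if current is None or rank(locus) < rank(current):
            best[locus.group] = locus
    return best


loci = [
    Locus("gene1.fas", 0, 20, 12),
    Locus("gene1.fas", 15, 16, 9),
    Locus("gene2.fas", 3, 30, 25),
]
print(best_locus_per_group(loci, "b")["gene1.fas"].start)
print(best_locus_per_group(loci, "s")["gene1.fas"].start)
```

              With these made-up numbers, criterion "b" keeps the locus at start 15 (16
              baits) and criterion "s" keeps the locus at start 0 (12 sequences), showing how
              the two criteria can disagree on the same alignment.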

       --blast-second-hit-evalue <floating point number>

              Maximum E-value for the second-best hit of a bait against the genome. A bait is
              considered to bind ambiguously if it has at least two good hits to different
              loci of the genome. This option is the E-value threshold for the second-best
              hit. Default: 0.000001

       --blast-first-hit-evalue <floating point number>

              Maximum E-value for the first (best) hit of a bait against the genome. A bait
              is considered to bind ambiguously if it has at least two good hits to different
              loci of the genome. This option is the E-value threshold for the first/best hit.
              Default: 0.000001
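              Taken together, the two E-value options define when a bait counts as binding
              ambiguously. The following Python sketch illustrates that rule under stated
              assumptions: it takes one E-value per distinct genome locus (how BaitFilter
              merges hits to the same locus is not described here), and it treats both
              thresholds as inclusive maxima. The function name is hypothetical.

```python
from typing import Sequence

DEFAULT_HIT_EVALUE = 0.000001  # documented default for both thresholds


def binds_ambiguously(
    hit_evalues: Sequence[float],
    first_hit_evalue: float = DEFAULT_HIT_EVALUE,
    second_hit_evalue: float = DEFAULT_HIT_EVALUE,
) -> bool:
    """Return True if a bait has two good hits to different genome loci.

    Assumption: hit_evalues holds one E-value per distinct locus, and
    "maximum E-value" means the threshold itself still counts as good.
    """
    ranked = sorted(hit_evalues)  # best (smallest) E-value first
    if len(ranked) < 2:
        return False  # a single hit cannot make a bait ambiguous
    return ranked[0] <= first_hit_evalue and ranked[1] <= second_hit_evalue
```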

       --blast-min-hit-coverage-of-baits-in-tiling-stack <floating point number>

              Can be specified together with the following modes (-m option): blast-a,
              blast-f, blast-l, blast-c. In all these modes, a blast analysis of all baits
              against a reference genome is conducted. This option specifies a minimum query
              hit coverage that at least one bait in each tiling stack (i.e. each column in
              the tiling design) must reach; otherwise the bait region is discarded. If the
              option is not specified, no hit coverage is checked. The coverage of each bait
              is the length of its best hit against the specified genome divided by the
              length of the bait. The highest coverage is then determined for each bait
              stack of the tiling design. If this option is used together with another
              filter, the order in which the two are applied matters for the final result:
              for the modes blast-a, blast-f and blast-l, the hit coverage is checked after
              filtering for baits with multiple good hits to the reference genome.
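              The coverage rule above can be sketched in a few lines of Python. This is an
              illustration, not BaitFilter's code: it assumes each bait is reduced to the
              pair (length of its best hit, bait length), that a bait without any hit is
              given a hit length of 0, and that reaching the threshold exactly counts as
              passing. How the hit length is measured is not specified on this page.

```python
from typing import Sequence, Tuple

# One bait: (length of its best hit against the genome, length of the bait).
Bait = Tuple[int, int]


def bait_coverage(best_hit_length: int, bait_length: int) -> float:
    """Query hit coverage of one bait: best-hit length divided by bait length."""
    return best_hit_length / bait_length


def region_passes_coverage(stacks: Sequence[Sequence[Bait]], min_coverage: float) -> bool:
    """Keep a bait region only if every tiling stack has a bait reaching min_coverage."""
    for stack in stacks:
        # Highest coverage among the baits of this stack (column of the tiling design).
        best = max(bait_coverage(hit_len, bait_len) for hit_len, bait_len in stack)
        if best < min_coverage:
            return False  # one weak stack discards the whole region
    return True
```

              For example, a region with stacks [(120, 120), (60, 120)] and
              [(90, 120), (30, 120)] has stack maxima 1.0 and 0.75, so it passes a threshold
              of 0.7 but is discarded at 0.8.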

       --ref-blast-db <string>

              Base name of the blast database. This name is passed to the blast command and
              is the name of the FASTA file of your reference genome. IMPORTANT: The
              makeblastdb program has to be called before starting the Bait-Filter program;
              makeblastdb takes the FASTA file and creates the database files from it. Cannot
              be specified together with the --blast-result-file option.

       --blast-extra-commandline <string>

              When invoking the blast command, extra command line parameters can be passed to
              the blast program with the aid of this option. For example, this option allows
              you to specify the number of threads the blast program should use:
              --blast-extra-commandline "-num_threads 20" sets the number of threads to 20.
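              A blast-based run therefore involves two commands: building the database with
              makeblastdb and then calling BaitFilter with --ref-blast-db. The Python sketch
              below only assembles the argument lists (it does not run anything); the file
              names genome.fas, baits.txt and baits-blast-l.txt are hypothetical. It relies
              on makeblastdb naming the database after the -in file when -out is omitted,
              so the FASTA name can be reused for --ref-blast-db.

```python
import shlex
from typing import List


def makeblastdb_command(genome_fasta: str) -> List[str]:
    # Without -out, makeblastdb names the database after the input file,
    # so the same FASTA name can be passed to --ref-blast-db.
    return ["makeblastdb", "-in", genome_fasta, "-dbtype", "nucl"]


def blast_l_command(bait_file: str, output_file: str, genome_fasta: str, threads: int) -> List[str]:
    return [
        "./BaitFilter-v1.0.6",
        "-i", bait_file,
        "-o", output_file,
        "-m", "blast-l",
        "--ref-blast-db", genome_fasta,
        # The whole string is a single argument that BaitFilter hands to blast.
        "--blast-extra-commandline", f"-num_threads {threads}",
    ]


for cmd in (makeblastdb_command("genome.fas"),
            blast_l_command("baits.txt", "baits-blast-l.txt", "genome.fas", 20)):
    print(shlex.join(cmd))  # shell-quoted form, e.g. for a job script
```

              To actually execute the commands, each list can be passed to
              subprocess.run(cmd, check=True). Keeping "-num_threads 20" as one list element
              avoids the quoting mistakes that easily happen when the option is typed in a
              shell.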

       --blast-evalue-cutoff <floating point number>

              When conducting a blast search, a maximum E-value can be specified when calling
              the blast program, so that hits with a higher E-value are not reported.
              BaitFilter always specifies such an E-value when calling the blast program. By
              default, BaitFilter passes twice the value of --blast-second-hit-evalue. If a
              coverage filter is requested and twice the value of --blast-second-hit-evalue
              is smaller than 0.001, the default is set to 0.001 instead. This should
              guarantee that all hits needed for the blast and/or coverage filter are found.
              A different E-value threshold can be specified with this option. As of version
              1.0.6 of this program, the value is automatically raised to at least 0.001 if
              the coverage filter is used, which makes this option unnecessary in most cases.
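              The default cutoff described above is simple arithmetic. The Python sketch
              below restates it; the function name is hypothetical, and it assumes (the text
              does not say so explicitly) that the version 1.0.6 floor of 0.001 also applies
              to a value given with --blast-evalue-cutoff when the coverage filter is active.

```python
from typing import Optional

COVERAGE_FILTER_FLOOR = 0.001


def blast_evalue_cutoff(
    second_hit_evalue: float = 0.000001,
    coverage_filter: bool = False,
    user_cutoff: Optional[float] = None,
) -> float:
    """E-value cutoff handed to blast, following the --blast-evalue-cutoff rules.

    Assumption: the 0.001 floor is applied to user-supplied values as well.
    """
    # Default: twice the second-hit threshold, unless the user gave a value.
    cutoff = 2 * second_hit_evalue if user_cutoff is None else user_cutoff
    if coverage_filter:
        # The coverage filter needs weaker hits too, so never go below 0.001.
        cutoff = max(cutoff, COVERAGE_FILTER_FLOOR)
    return cutoff
```

              With the documented default of 0.000001 this gives 0.000002 without a coverage
              filter and 0.001 with one.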

       -B <string>,  --blast-executable <string>

              Name of, or path to, the blast executable. Default: blastn. Minimum blast
              version: BLAST+ 2.2.x. Cannot be specified together with the
              --blast-result-file option.

       -t <positive integer>,  --thinning-step-width <positive integer>

              Thin out the bait file by retaining only every Nth bait region. The integer
              after the option specifies the step width N. This option is required if one of
              the modes thin-b, thin-s, thin-b-old or thin-s-old is active; in all other
              modes it must not be set.
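              Thinning with step width N leaves N possible start offsets, and the thin-b and
              thin-s modes choose among them. The Python sketch below illustrates that
              search; it is not BaitFilter's code. Assumptions: regions are reduced to
              hypothetical (start, num_baits, num_sequences) tuples and ordered by start,
              offsets are compared by totals over the kept regions, and the first offset
              wins on ties. Because totals are compared directly, an offset that keeps fewer
              regions can win under "b" when the count does not divide evenly; how BaitFilter
              handles this is not stated here.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical bait region summary: (start, num_baits, num_sequences).
Region = Tuple[int, int, int]


def thin(regions: Sequence[Region], step: int, criterion: str) -> List[Region]:
    """Keep every step-th region, choosing the start offset by criterion.

    criterion "b": minimize the total number of baits (thin-b).
    criterion "s": maximize the total number of sequences (thin-s).
    """
    if step < 1:
        raise ValueError("the step width N must be a positive integer")
    if criterion not in ("b", "s"):
        raise ValueError(f"unknown criterion: {criterion!r}")
    ordered = sorted(regions)  # order by start coordinate
    best: List[Region] = []
    best_score = None
    for offset in range(min(step, len(ordered))):
        kept = ordered[offset::step]
        if criterion == "b":
            score = sum(r[1] for r in kept)   # fewer baits is better
        else:
            score = -sum(r[2] for r in kept)  # more sequences is better
        if best_score is None or score < best_score:
            best, best_score = kept, score
    return best


def thin_per_alignment(
    by_alignment: Dict[str, Sequence[Region]], step: int, criterion: str
) -> Dict[str, List[Region]]:
    # thin-b / thin-s: every alignment gets its own best offset.
    # thin-b-old / thin-s-old would instead pick one offset for the whole file.
    return {name: thin(regs, step, criterion) for name, regs in by_alignment.items()}
```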

       --ID-prefix <string>

              In the conversion mode to the four-column-upload file format, each converted
              file should get a unique ProbeID prefix, since ProbeIDs must not be identical,
              even across multiple files. With this option the user can specify a prefix
              string for all probe IDs in the four-column-upload file created by BaitFilter.

       -S,  --stats

              Compute bait file statistics for the input file and report them. These
              statistics are computed automatically in all filter modes (-m option) and in
              the conversion mode (-c option). The purpose of the -S option is to compute the
              statistics without filtering or converting the input file. In particular, the
              -S mode does not require an output file.

              This option has no effect if combined with the -m or -c modes.

       --verbosity <unsigned integer>

              The verbosity option controls the amount of information Bait-Filter writes to
              the console while running. 0: print only the welcome message and essential
              error messages that cause the program to exit; 1: also report warnings; 2:
              also report progress; 3: report more detailed progress; >10: debug output;
              maximum 10000: write all possible diagnostic output. A value of at least 2 is
              required to report the startup parameters.

       -b <string>,  --blast-result-file <string>

              Conducting a blast analysis of all baits against a reference genome can take a
              long time. If different filtering parameters, e.g. different coverage
              thresholds, are to be compared, the same blast search would otherwise have to
              be repeated for each run. With this option, the blast search is skipped and the
              specified blast result file is used instead. Use this option with caution: no
              checks are done (so far) to ensure that the blast result file corresponds to
              the specified bait file. After a run that conducted a blast search, BaitFilter
              does not delete the blast result file; it remains in the working directory
              under the name blast_result.txt. It can be moved or renamed and then be
              specified with this option as the input file for further BaitFilter runs. If
              you have the slightest doubt whether you are using the correct blast result
              file, do not use this option. This option is only allowed in modes that would
              normally do a blast search. It cannot be specified together with the
              --blast-executable, --blast-evalue-cutoff, --blast-extra-commandline and
              --ref-blast-db options, since these are specific to runs in which a blast
              search is conducted.

       --,  --ignore_rest

              Ignores the rest of the labeled arguments following this flag.

       --version

              Displays version information and exits.

       -h,  --help

              Displays usage information and exits.

              The Bait-Filter program has been designed to post-process the output of the
              BaitFisher program in order to select appropriate bait regions and to create
              the final bait set. BaitFilter offers several filtering and conversion modes.
              If multiple filtering steps and a final conversion are required, BaitFilter has
              to be started multiple times, and the output of each run is used as the input
              of the next step.
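              Chaining runs in this way is mechanical: each run's -o file becomes the next
              run's -i file, and filtering (-m) and conversion (-c) stay in separate runs.
              The Python sketch below builds such a chain of command lines without executing
              them. The helper name, the intermediate file names and the ID prefix "Set1_"
              are hypothetical; only the options themselves come from this page, and the
              check looks for the short forms -m and -c only.

```python
import shlex
from typing import List, Sequence, Tuple

BAITFILTER = "./BaitFilter-v1.0.6"

# One stage: (output file, mode arguments such as ["-m", "ab"]).
Stage = Tuple[str, Sequence[str]]


def chain_runs(first_input: str, stages: Sequence[Stage]) -> List[List[str]]:
    """Build one BaitFilter command per stage, feeding each output into the next run."""
    commands = []
    current_input = first_input
    for output_file, mode_args in stages:
        if "-m" in mode_args and "-c" in mode_args:
            # Filtering and conversion cannot be combined in one run.
            raise ValueError("use separate stages for -m and -c")
        commands.append([BAITFILTER, "-i", current_input, "-o", output_file, *mode_args])
        current_input = output_file  # the next run reads this run's output
    return commands


commands = chain_runs("baits.txt", [
    ("baits-ab.txt", ["-m", "ab"]),
    ("baits-upload.txt", ["-c", "four-column-upload", "--ID-prefix", "Set1_"]),
])
for cmd in commands:
    print(shlex.join(cmd))
```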

              The BaitFisher program designs baits for every locus at which a full bait
              region can be designed. A bait region can start at every nucleotide as long as
              the remaining sequence is long enough. This output has to be reduced, and the
              purpose of BaitFilter is to find, for each feature, gene or alignment, the
              optimal locus or loci for the bait regions. Before determining the locus with
              the fewest baits or the largest sequence coverage, one might want to determine
              which baits are expected to bind specifically in a given reference genome. This
              is achieved by conducting a blast search of the baits against a genome. Baits
              that are highly similar to at least two loci of the genome can be identified
              and their bait regions removed. The blast search result can also be used to
              require a minimum hit coverage of the baits in a bait region against the
              reference genome. After removing bait regions at inferior loci, the optimal
              starting locus (start coordinate) of the bait regions can be inferred with the
              aid of different criteria in a subsequent run of BaitFilter. As input,
              BaitFilter requires a bait file generated by the BaitFisher program or by a
              previous filtering run of BaitFilter. This bait file is specified with the -i
              command line parameter (see above). Furthermore, the user has to specify an
              output file name with the -o parameter and a filter mode with the -m
              parameter.

              To convert a file to the final, uploadable output format, see the -c option
              above.

              To compute statistics for an input bait file, see the -S option above.

              The different filter modes provided by BaitFilter are the following:

              1a) Retain only the best bait locus per alignment file (mode ab). Criterion:
              minimize the number of required baits.

              1b) Retain only the best bait locus per alignment file (mode as). Criterion:
              maximize the number of sequences.

              2a) Retain only the best bait locus per feature (mode fb; requires that
              features were selected in BaitFisher). Criterion: minimize the number of
              required baits.

              2b) Retain only the best bait locus per feature (mode fs; requires that
              features were selected in BaitFisher). Criterion: maximize the number of
              sequences.

              3) Use a blast search of the bait sequences against a reference genome to
              detect putative non-unique target loci. Non-unique target sites will have
              multiple good hits against the reference genome. Furthermore, a minimum
              coverage of the best blast hit of each bait sequence against the genome can be
              specified. Note that all blast modes require additional command line
              parameters! These modes remove bait regions for which multiple good blast hits
              were found or for which baits have insufficiently long hits. Different versions
              of this mode are available:

              3a) If a single bait is not unique, remove all bait regions from the current
              gene (mode blast-a).

              3b) If a single bait is not unique, remove all bait regions from the current
              feature, if applicable (mode blast-f).

              3c) If a single bait is not unique, remove only the bait region that contains
              this bait (mode blast-l).

              4) Thin out the given bait file: retain only every Nth bait region, where N has
              to be specified by the user with the -t option. Two submodes are available:

              4a) Thin out bait regions by retaining only every Nth bait region in a bait
              file (mode thin-b). The starting offset will be chosen such that the number of
              required baits is minimized.

              4b) Thin out bait regions by retaining only every Nth bait region in a bait
              file (mode thin-s). The starting offset will be chosen such that the number of
              sequences the result is based on is maximized.
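              The three blast variants differ only in how much is removed once a bait is
              flagged as non-unique: the whole alignment (3a), the whole feature (3b) or just
              the region (3c). The Python sketch below expresses that removal scope; it is an
              illustration, not BaitFilter's code. The BaitRegion record and its fields are
              hypothetical, the set of ambiguous bait IDs is assumed to come from a blast
              step like the one described for --blast-second-hit-evalue, and feature names
              and start coordinates are assumed to identify a region uniquely.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class BaitRegion:
    # Hypothetical view of one bait region in a bait file.
    alignment: str
    feature: str  # empty string if no alignment cutting was used
    start: int
    bait_ids: Tuple[str, ...]


def blast_filter(regions: Iterable[BaitRegion], ambiguous: Set[str], mode: str) -> List[BaitRegion]:
    """Drop regions according to the removal scope of a blast mode.

    blast-a (3a): whole alignment; blast-f (3b): whole feature; blast-l (3c): only the region.
    """
    scopes = {
        "blast-a": lambda r: (r.alignment,),
        "blast-f": lambda r: (r.alignment, r.feature),
        "blast-l": lambda r: (r.alignment, r.feature, r.start),
    }
    if mode not in scopes:
        raise ValueError(f"unsupported mode: {mode!r}")
    scope = scopes[mode]
    regions = list(regions)
    # Scopes containing at least one bait flagged as binding ambiguously.
    flagged = {scope(r) for r in regions if any(b in ambiguous for b in r.bait_ids)}
    return [r for r in regions if scope(r) not in flagged]
```

              With one ambiguous bait, blast-l loses a single region while blast-a can lose
              every region of the alignment, which is why the page recommends blast-l over
              blast-f and blast-a.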

       ./BaitFilter-v1.0.6  version: 1.0.6

SEE ALSO

       The  full  documentation  for BaitFilter-v1.0.6 is maintained as a Texinfo manual.  If the
       info and BaitFilter-v1.0.6 programs are properly installed at your site, the command

              info BaitFilter-v1.0.6

       should give you access to the complete manual.