Provided by: trnascan-se_1.3.1-1_amd64

NAME

       tRNAscan-SE - improved detection of transfer RNA genes in genomic sequence

SYNOPSIS

       tRNAscan-SE [options] seqfile(s)

DESCRIPTION

       tRNAscan-SE searches for transfer RNAs in genomic sequence seqfile(s) using three separate methods to
       achieve a combination of speed, sensitivity, and selectivity not available with any one of the programs
       individually.

       tRNAscan-SE  was  written  in  the  PERL  (version  5.0)  script  language.  Input consists of DNA or RNA
       sequences in FASTA format.  tRNA predictions are output in  standard  tabular,  ACeDB-compatible,  or  an
       extended  format  including  tRNA  secondary  structure  information.  tRNAscan-SE does no tRNA detection
       itself, but instead combines the strengths of three independent tRNA prediction programs  by  negotiating
       the  flow  of  information among them, performing a limited amount of post-processing, and outputting the
       result.

       tRNAscan-SE combines the specificity of the Cove probabilistic RNA prediction  package  (Eddy  &  Durbin,
       1994) with the speed and sensitivity of tRNAscan 1.3 (Fichant & Burks, 1991) plus an implementation of an
       algorithm described by Pavesi and colleagues (1994) which searches for eukaryotic pol III tRNA  promoters
       (our  implementation  referred  to  as  EufindtRNA).   tRNAscan  and  EufindtRNA  are  used as first-pass
       prefilters to identify "candidate" tRNA regions of the sequence.  These subsequences are then  passed  to
       Cove  for  further  analysis,  and  output  if  Cove  confirms the initial tRNA prediction.  In this way,
       tRNAscan-SE attains the best of both worlds: (1) a false positive rate as low as that of Cove analysis
       alone, (2) the combined sensitivities of tRNAscan and EufindtRNA (detection of 99% of true tRNAs), and
       (3) search speed 1,000 to 3,000 times faster than Cove analysis and  30  to  90  times  faster  than  the
       original  tRNAscan  1.3  (tRNAscan-SE  uses  both  a code-optimized version of tRNAscan 1.3 which gives a
       650-fold increase in speed, and a fast C implementation of the Pavesi et al. algorithm).

       tRNAscan-SE was designed to make rapid,  sensitive  searches  of  genomic  sequence  feasible  using  the
       selectivity  of the Cove analysis package.  Search sensitivity was optimized with eukaryote cytoplasmic &
       eubacterial sequences, but it may be applied more broadly with a slight reduction in sensitivity.

       In the default tabular output format, each new tRNA in a sequence is consecutively numbered in the  'tRNA
       #'  column.   'tRNA  Bounds'  specify  the  starting (5') and ending (3') nucleotide bounds for the tRNA.
       tRNAs found on the reverse (lower) strand are indicated by having the Begin (5') bound greater  than  the
       End (3') bound.

       The 'tRNA Type' is the predicted amino acid charged to the tRNA molecule based on the predicted anticodon
       (written 5'->3') displayed in the next column.  tRNAs that fit criteria for potential pseudogenes (poor
       primary or secondary structure) will be marked with "Pseudo" in the 'tRNA Type' column (pseudogene
       checking is further discussed in the Methods section of the program manual).  If  there  is  a  predicted
       intron  in  the  tRNA,  the  next  two  columns indicate the nucleotide bounds.  If there is no predicted
       intron, both of these columns contain zero.
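
       The column conventions above can be sketched as a small parser.  This is only an illustration: the
       whitespace-separated layout, the leading sequence-name column, and the sample row are assumptions made
       for the example, not a specification of the program's output.

```python
from dataclasses import dataclass


@dataclass
class TrnaHit:
    seq_name: str
    trna_num: int
    begin: int
    end: int
    trna_type: str
    anticodon: str
    intron_begin: int
    intron_end: int
    cove_score: float

    @property
    def strand(self) -> str:
        # A Begin (5') bound greater than the End (3') bound marks the reverse strand.
        return "-" if self.begin > self.end else "+"

    @property
    def has_intron(self) -> bool:
        # Both intron columns contain zero when no intron is predicted.
        return not (self.intron_begin == 0 and self.intron_end == 0)

    @property
    def is_pseudo(self) -> bool:
        # Possible pseudogenes carry "Pseudo" in the 'tRNA Type' column.
        return self.trna_type == "Pseudo"


def parse_row(line: str) -> TrnaHit:
    # Assumption: columns are separated by whitespace, sequence name first.
    f = line.split()
    return TrnaHit(f[0], int(f[1]), int(f[2]), int(f[3]), f[4], f[5],
                   int(f[6]), int(f[7]), float(f[8]))


# Hypothetical row, made up for illustration only.
hit = parse_row("MySeq1\t2\t5120\t5048\tLeu\tCAA\t0\t0\t61.32")
print(hit.strand, hit.has_intron, hit.is_pseudo)
```

       Header lines must be skipped before calling parse_row unless the results were written with the -b
       option.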

       The final column is the Cove score for the tRNA in bits of information.  Specifically, it is  a  log-odds
       score:  the  log  of  the  ratio  of the probability of the sequence given the tRNA covariance model used
       (developed from hand-alignment of 1415 tRNAs), and the probability of the sequence given a simple  random
       sequence  model.   tRNAscan-SE  counts any sequence that attains a score of 20.0 bits or larger as a tRNA
       (based on empirical studies conducted by Eddy & Durbin in ref #2).
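
       As a worked illustration of the score definition, the sketch below computes a log-odds score in bits
       (base-2 logarithm) from two probabilities and applies the 20.0-bit cutoff.  The function names and the
       probabilities are hypothetical; Cove itself derives the score from the covariance model rather than
       from two explicit numbers.

```python
import math

COVE_CUTOFF_BITS = 20.0  # default reporting threshold stated above


def log_odds_bits(p_given_trna_model: float, p_given_random_model: float) -> float:
    # Score in bits: log2 of the ratio of the two sequence probabilities.
    return math.log2(p_given_trna_model / p_given_random_model)


def is_reported(score_bits: float, cutoff: float = COVE_CUTOFF_BITS) -> bool:
    # A score of 20.0 bits or larger counts as a tRNA.
    return score_bits >= cutoff


# Made-up probabilities: the sequence is 2**20 times more likely under the
# tRNA model than under the random model, i.e. a score of 20 bits.
score = log_odds_bits(2.0 ** -40, 2.0 ** -60)
print(round(score, 3), is_reported(score))
```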

OPTIONS

       -h     Prints entire list of program options, each with a brief, one-line description.

       -P     This option selects the prokaryotic covariance model for tRNA analysis, and loosens the search
              parameters for EufindtRNA to improve detection of prokaryotic tRNAs.  Use of this mode with
              prokaryotic sequences will also improve bounds prediction of the 3' end (the terminal CCA
              triplet).

       -A     This  option  selects an archaeal-specific covariance model for tRNA analysis, as well as slightly
              loosening the EufindtRNA search cutoffs.

       -O     This parameter bypasses the fast first-pass scanners that are poor at detecting  organellar  tRNAs
              and  runs  Cove  analysis  only.   Since true organellar tRNAs have been found to have Cove scores
              between 15 and 20 bits, the search cutoff is  lowered  from  20  to  15  bits.   Also,  pseudogene
              checking  is  disabled  since  it  is  only applicable to eukaryotic cytoplasmic tRNA pseudogenes.
              Since Cove-only mode is used, searches will be very slow (see -C option  below)  relative  to  the
              default mode.

       -G     This  option  selects  the  general tRNA covariance model that was trained on tRNAs from all three
              phylogenetic domains (archaea, bacteria, & eukarya).  This mode can be used when analyzing a mixed
              collection  of  sequences  from  more  than  one  phylogenetic  domain,  with  only slight loss of
              sensitivity and selectivity.  The original publication describing  this  program  and  tRNAscan-SE
              version  1.0  used  this  general  tRNA model exclusively.  If you wish to compare scores to those
              found in the paper or scans using v1.0, use this option.  Use of this option  is  compatible  with
              all other search mode options described in this section.

       -C     Directs  tRNAscan-SE to analyze sequences using Cove analysis only.  This option allows a slightly
              more sensitive search than the default tRNAscan + EufindtRNA -> Cove mode, but is much slower  (by
              approx.  250  to 3,000 fold).  Output format and other program defaults are otherwise identical to
              the normal analysis.

       -H     This option displays the breakdown of the two components of the covariance model bit score.  Since
              tRNA  pseudogenes  often  have  one  very low component (good secondary structure but poor primary
              sequence similarity to the tRNA model, or vice versa), this information may be useful in  deciding
              whether  a  low-scoring  tRNA  is  likely  to be a pseudogene.  The heuristic pseudogene detection
              filter uses this information to flag possible pseudogenes -- use this option to see why a  hit  is
              marked  as  a possible pseudogene.  The user may wish to examine score breakdowns from known tRNAs
              in the organism of interest to get a frame of reference.

       -D     Manually disable checking tRNAs for poor primary or secondary structure scores often indicative of
              eukaryotic  pseudogenes.   This  will  slightly  speed  the  program  &  may be necessary for non-
              eukaryotic sequences that are flagged as possible pseudogenes  but  are  known  to  be  functional
              tRNAs.

       -o <file>
              Output final results to <file>.

       -f <file>
              Save  final  results  and Cove tRNA secondary structure predictions to <file>.  This output format
              makes visual inspection of individual tRNA predictions easier since the tRNA sequence is displayed
              along with the predicted tRNA base pairings.

       -a     Output final results in ACeDB format instead of the default tabular format.

       -m <file>
              Save  statistics  summary  for  run.   This option directs tRNAscan-SE to write a brief summary to
              <file> which contains the run options selected as well  as  statistics  on  the  number  of  tRNAs
              detected  at  each  phase  of the search, search speed, and other bits of information.  See Manual
              documentation for explanation of each statistic.

       -d     Display program progress.  Messages indicating the current phase of the tRNA search are printed to
              standard  output.  If final results are also being sent to standard output, some of these messages
              will be suppressed so as to not interrupt display of the results.

       -l <file>
              Save log of program progress in <file>.  Identical to -d option, but sends messages to <file>
              instead  of standard output.  Note: the -d option overrides the -l option if both are specified on
              the same command line.

       -q     Quiet mode: the credits & run  option  selections  normally  printed  to  standard  error  at  the
              beginning of each run are suppressed.

       -b     Use  brief  output  format.   This  eliminates  column headers that appear by default when writing
              results in tabular output format.  Useful if results are to be parsed or piped to another program.

       -N     This option causes tRNAscan-SE to output a tRNA's corresponding codon in place of its anticodon.
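
              The codon that pairs with an anticodon is its reverse complement, since both are written
              5'->3'.  A minimal sketch of that relationship follows; the helper name is hypothetical, and
              whether tRNAscan-SE computes the codon in exactly this way is an assumption.

```python
# Complement table covering DNA and RNA letters; output uses DNA letters.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def anticodon_to_codon(anticodon: str) -> str:
    # Complement each base, then reverse so the codon also reads 5'->3'.
    return anticodon.translate(_COMPLEMENT)[::-1]


print(anticodon_to_codon("CAT"))  # methionine anticodon CAT pairs with codon ATG
```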

       -(Option)#
              The '#' symbol may be used as shorthand to specify "default" file names  for  output  files.   The
              default file names are constructed by using the input sequence file name, followed by an extension
              specifying the output file type <seqfile.ext> where '.ext' is:

              Extension   Option    Description
              ---------   ------    -----------
               .out        -o       final results
               .stats      -m       summary statistics file
               .log        -l       run progress file
               .ss         -f       secondary structures save file
               .fpass.out  -r       formatted, tabular output
                                    from first-pass scans
               .fpos       -F       FASTA file of tRNAs identified in
                                    first-pass scans that were found to be
                                    false positives by Cove analysis

              Notes:

              1)  If  the  input sequence file name has the extensions '.fa' or '.seq', these extensions will be
              removed before using the filename as a prefix for default file names.  (example -- input file name
              Mygene.seq will have the output file name Mygene.out if the '-o#' option is used).

              2)  If  more  than  one  sequence file is specified on the command line, the "default" output file
              prefix will be the name of the FIRST sequence file on the command line.   Use  the  -p  option  to
              change  this  default name to something more appropriate when using more than one sequence file on
              the command line.
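
              The naming rules in the table and notes above can be sketched as follows.  The function and its
              arguments are hypothetical, and the handling of directory components in the input path is not
              described here, so the sketch leaves the path untouched.

```python
from typing import Optional, Sequence

# Extension table from the description above.
DEFAULT_EXTENSIONS = {
    "-o": ".out",
    "-m": ".stats",
    "-l": ".log",
    "-f": ".ss",
    "-r": ".fpass.out",
    "-F": ".fpos",
}


def default_output_name(option: str, seqfiles: Sequence[str],
                        label: Optional[str] = None) -> str:
    if label is not None:
        # A -p <label> prefix replaces the input sequence file name.
        prefix = label
    else:
        # Note 2: only the FIRST sequence file supplies the prefix.
        prefix = seqfiles[0]
        # Note 1: a trailing '.fa' or '.seq' is removed first.
        for ext in (".fa", ".seq"):
            if prefix.endswith(ext):
                prefix = prefix[:-len(ext)]
                break
    return prefix + DEFAULT_EXTENSIONS[option]


print(default_output_name("-o", ["Mygene.seq"]))  # Mygene.out, as in note 1
```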

       -p <label>
              Use <label> prefix as the default output file prefix when using '#' for file  name  specification.
              <label> is used in place of the input sequence file name.

       -y     This option displays which of the first-pass scanners detected the tRNA being output.  "Ts", "Eu",
              or "Bo" will appear in the last column of Tabular output, indicating  that  either  tRNAscan  1.4,
              EufindtRNA, or both scanners detected the tRNA, respectively.

       -X <score>
              Set  Cove cutoff score for reporting tRNAs (default=20).  This option allows the user to specify a
              different Cove score threshold for reporting tRNAs.  It  is  not  recommended  that  novice  users
              change  this  cutoff,  as  a  lower cutoff score will increase the number of pseudogenes and other
              false positives found by tRNAscan-SE (especially when  used  with  the  "Cove  only"  scan  mode).
              Conversely, a higher cutoff than 20.0 bits will likely cause true tRNAs to be missed by tRNAscan-SE
              (numerous "real" tRNAs have been found just above the 20.0 cutoff).  Knowledgeable users may wish
              to  experiment  with  this  parameter  to find very unusual tRNAs or pseudogenes beyond the normal
              range of detection with the preceding caveats in mind.

       -L <length>
              Set max length of tRNA intron+variable region (default=116bp).  The default  maximum  tRNA  length
              for tRNAscan-SE is 192 bp, but this limit can be increased with this option to allow searches with
              no practical limit on tRNA length.  In the first phase of tRNAscan-SE, EufindtRNA searches  for  A
              and  B  boxes  of  <length>  maximum  distance  apart,  and passes only the 5' and 3' tRNA ends to
              covariance model analysis for confirmation (removing the  bulk  of  long  intervening  sequences).
              tRNAs  containing  group I and II introns have been detected by setting this parameter to over 800
              bp.  Caution: group I or II introns in tRNAs tend to occur in positions other than  the  canonical
              position  of  protein-spliced  introns, so tRNAscan-SE mispredicts the intron bounds and anticodon
              sequence for these cases.  tRNA bound predictions, however, have been  found  to  be  reliable  in
              these same tRNAs.

       -I <score>
              This  score  cutoff  affects the sensitivity of the first-pass scanner EufindtRNA.  This parameter
              should not need to be adjusted from its default values (variable depending on search mode), but is
              included  for  users  who  are  familiar  with  the  Pavesi et al. (1994) paper and wish to set it
              manually.  See Lowe & Eddy (1997) for details on parameter values used by tRNAscan-SE depending on
              the search mode.

       -B <number>
              By default, tRNAscan-SE adds 7 nucleotides to both ends of tRNA predictions when first-pass tRNA
              predictions are passed to covariance model (CM) analysis; this option sets the padding to <number>
              nucleotides instead.  CM analysis generally trims these bounds back down, but on occasion the
              padding allows recovery of a first-pass tRNA prediction that would otherwise be truncated.
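
              A rough sketch of the padding step, assuming 1-based coordinates and clamping at the sequence
              ends (neither detail is stated here); the function name is hypothetical.

```python
from typing import Tuple


def pad_bounds(begin: int, end: int, seq_len: int, padding: int = 7) -> Tuple[int, int]:
    # Widen a first-pass prediction by `padding` nucleotides on both ends.
    if begin <= end:
        # Forward strand: extend left of begin and right of end.
        return max(1, begin - padding), min(seq_len, end + padding)
    # Reverse strand (begin > end): the 5' bound is the larger coordinate.
    return min(seq_len, begin + padding), max(1, end - padding)


print(pad_bounds(100, 172, 1000))  # (93, 179)
```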

       -g <file>
              Use exceptions to "universal" genetic code specified in <file>.  By default,  tRNAscan-SE  uses  a
              standard  universal  codon  ->  amino  acid  translation table that is specified at the end of the
              tRNAscan-SE.src source file.  This option allows the user to specify  exceptions  to  the  default
              translation  table.  The user may use any one of several alternate translation code files included
              in this package (see files 'gcode.*'), or create a new alternate  translation  file.   See  Manual
              documentation for specification of file format, or refer to included examples files.

              Note:  this option does not have any effect when using the -T or -E options -- you must be running
              in default or Cove only analysis mode.

       -c <file>
              For users who have developed their own tRNA covariance models using the Cove program "coveb"  (see
              Cove  documentation),  this  parameter allows substitution for the default tRNA covariance models.
              May be useful for extending Cove-only mode detection of particularly strange tRNA species such  as
              mitochondrial tRNAs.

       -Q     By default, if an output result file to be written to already exists, the user is prompted whether
              the file should be over-written or appended to.  Using this option forces overwriting of
              pre-existing files without an interactive prompt.  This option may be handy for batch-processing and
              running tRNAscan-SE in the background.

       -n <EXPR>
              Search only sequences with names matching <EXPR> string.  <EXPR>  may  contain  *  or  ?  wildcard
              characters,  but  the  user should remember to enclose these expressions in single quotes to avoid
              shell expansion.  Only those sequences with names (first non-white space word after ">" symbol  on
              FASTA name/description line) matching <EXPR> are analyzed for tRNAs.
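
              The matching rule can be approximated with Python's fnmatch module, as in the sketch below.
              This is an approximation only: fnmatch also interprets [...] character sets, and the
              case-sensitive comparison is an assumption.  The helper names and headers are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def sequence_name(header: str) -> str:
    # First non-whitespace word after the ">" symbol on the FASTA header line.
    words = header.lstrip(">").split()
    return words[0] if words else ""


def select_headers(headers: Iterable[str], expr: str) -> List[str]:
    # Keep only the sequences whose name matches the wildcard expression.
    return [h for h in headers if fnmatchcase(sequence_name(h), expr)]


headers = [">chrI first chromosome", ">chrII second chromosome", ">mito organelle"]
print(select_headers(headers, "chr*"))
```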

       -s <EXPR>
              Start  search  at  first  sequence  with  name matching <EXPR> string and continue to end of input
              sequence file(s).  This may be useful for re-starting crashed/aborted runs at the point where  the
              previous  run  stopped.   (If  same  names  for output file(s) are used, program will ask if files
              should be over-written or appended to -- choose append and  run  will  successfully  be  restarted
              where it left off).

       -T     Directs  tRNAscan-SE  to  use only tRNAscan to analyze sequences.  This mode will default to using
              "strict" parameters with tRNAscan analysis (similar to tRNAscan version 1.3 operation).  This mode
              of  operation  is  faster  (3-5  times  faster  than  default  mode  analysis), but will result in
              approximately 0.2 to 0.6 false positive tRNAs per Mbp, decreased sensitivity,  and  less  reliable
              prediction of anticodons, tRNA isotype, and introns.

       -t <mode>
              Explicitly  set tRNAscan params, where <mode> = R or S (R=relaxed, S=strict tRNAscan v1.3 params).
              This option allows selection of strict or relaxed search parameters  for  tRNAscan  analysis.   By
              default, "strict" parameters are used.  Relaxed parameters may give very slightly increased search
              sensitivity, but increase search time by 20-40 fold.

       -E     Run EufindtRNA alone to search for tRNAs.  Since Cove is not being used as a secondary  filter  to
              remove false positives, this run mode defaults to "Normal" parameters, which more closely
              approximate the sensitivity and selectivity of the original algorithm described by Pavesi and
              colleagues (see the next option, -e for a description of the various run modes).

       -e <mode>
              Explicitly  set  EufindtRNA  params,  where  <mode>= R, N, or S (relaxed, normal, or strict).  The
              "relaxed" mode is used for EufindtRNA when  using  tRNAscan-SE  in  default  mode.   With  relaxed
              parameters,  tRNAs  that  lack  pol  III  poly-T  terminators are not penalized, increasing search
              sensitivity, but decreasing selectivity.  When Cove analysis is being used as a  secondary  filter
              for false positives (as in tRNAscan-SE's default mode), overall selectivity is not decreased.

              Using  "normal"  parameters  with  EufindtRNA  does  incorporate a log odds score for the distance
              between the B box and the first poly-T terminator, but does not disqualify tRNAs that do not  have
              a terminator signal within 60 nucleotides.  This mode is used by default when Cove analysis is not
              being used as a secondary false positive filter.

              Using "strict" parameters with EufindtRNA also incorporates a log  odds  score  for  the  distance
              between  the  B  box  and the first poly-T terminator, but _rejects_ tRNAs that do not have such a
              signal within 60 nucleotides of the end of the B box.  This mode  most  closely  approximates  the
              originally  published search algorithm (3); sensitivity is reduced relative to using "relaxed" and
              "normal" modes, but selectivity is increased which is important if no secondary  filter,  such  as
              Cove  analysis,  is  being  used  to remove false positives.  This mode will miss most prokaryotic
              tRNAs since the poly-T terminator signal is a feature specific to eukaryotic tRNA genes (always
              use "relaxed" mode for scanning prokaryotic sequences for tRNAs).
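
              The difference between the three modes can be summarized as a decision rule.  The sketch below
              is illustrative only: the function, its return values, and the inclusive 60-nucleotide
              comparison are assumptions, and the actual log odds scoring is not reproduced.

```python
from typing import Optional, Tuple

MAX_TERMINATOR_DISTANCE = 60  # nucleotides from the end of the B box


def terminator_policy(mode: str, terminator_distance: Optional[int]) -> Tuple[bool, bool]:
    """Return (keep_candidate, scores_terminator_distance) for a mode.

    terminator_distance is the distance from the B box to the first poly-T
    terminator, or None if no terminator was found.
    """
    if mode not in ("R", "N", "S"):
        raise ValueError("mode must be R, N, or S")
    found = (terminator_distance is not None
             and terminator_distance <= MAX_TERMINATOR_DISTANCE)
    if mode == "R":
        # Relaxed: a missing terminator is not penalized.
        return True, False
    if mode == "N":
        # Normal: the distance is scored, but a missing terminator
        # does not disqualify the candidate.
        return True, True
    # Strict: the distance is scored and a missing terminator rejects.
    return found, True


print(terminator_policy("S", None))  # (False, True)
```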

       -r <file>
              Save tabular, formatted output results from tRNAscan and/or EufindtRNA first pass scans in <file>.
              The format is similar to the final tabular output format, except no Cove  score  is  available  at
              this  point  in the search (if EufindtRNA has detected the tRNA, the negative log likelihood score
              is given).  Also, the sequence ID number and source sequence length appear in  the  columns  where
              intron  bounds  are shown in final output.  This option may be useful for examining false positive
              tRNAs predicted by first-pass scans that have been filtered out by Cove analysis.

       -u <file>
              This option allows the user to re-generate results from regions identified  to  have  tRNAs  by  a
              previous  tRNAscan-SE  run.   Either  a  regular  tabular result file, or output saved with the -r
              option may be used as the specified <file>.  This option is  particularly  useful  for  generating
              either  secondary  structure  output (-f option) or ACeDB output (-a option) without having to re-
              scan entire sequences.  Alternatively, if the -r option is used to generate the  previous  results
              file,  tRNAscan-SE  will  pick up at the stage of Cove-confirmation of tRNAs and output final tRNA
              predictions as with a normal run.

              Note: the -n and -s options will not work in conjunction with this option.

       -F <file>
              Save first-pass candidate tRNAs in <file> that were then found  to  be  false  positives  by  Cove
              analysis.   This option saves candidate tRNAs found by either tRNAscan and/or EufindtRNA that were
              then rejected by Cove analysis as being false positives.  tRNAs are saved in  the  FASTA  sequence
              format.

       -M <file>
              This  option  may  be used when scanning a collection of known tRNA sequences to identify possible
              false negatives (incorrectly missed by tRNAscan-SE) or sequences incorrectly annotated as tRNAs
              (correctly passed over by tRNAscan-SE).  Examination of primary & secondary structure covariance
              model scores (-H option), and visual inspection of secondary structures (use -f option) may be
              helpful in resolving identification conflicts.

SEE ALSO

       User Manual and tutorial: Manual.ps (postscript), MANUAL (text)

BUGS

       No major bugs known.

NOTES

       This  software  and  documentation  is  Copyright  (C) 1996, Todd M.J. Lowe & Sean R. Eddy.  It is freely
       distributable under  terms  of  the  GNU  General  Public  License.  See  COPYING,  in  the  source  code
       distribution, for more details, or contact me.

       Todd Lowe
       Dept. of Genetics, Washington Univ. School of Medicine
       660 S. Euclid Box 8232
       St Louis, MO 63110 USA
       Phone: 1-314-362-7667
       FAX  : 1-314-362-2985
       Email: lowe@genetics.wustl.edu

REFERENCES

       1.  Fichant,  G.A.  and  Burks, C. (1991) "Identifying potential tRNA genes in genomic DNA sequences", J.
       Mol. Biol., 220, 659-671.

       2. Eddy, S.R. and Durbin, R. (1994) "RNA sequence analysis using covariance models",  Nucl.  Acids  Res.,
       22, 2079-2088.

       3.  Pavesi,  A.,  Conterio,  F.,  Bolchi,  A.,  Dieci,  G.,  Ottonello,  S. (1994) "Identification of new
       eukaryotic tRNA genes in genomic DNA databases by a multistep weight matrix analysis  of  transcriptional
       control regions", Nucl. Acids Res., 22, 1247-1256.

       4. Lowe, T.M. & Eddy, S.R. (1997) "tRNAscan-SE: A program for improved detection of transfer RNA genes in
       genomic sequence", Nucl. Acids Res., 25, 955-964.