Provided by: trnascan-se_1.3.1-1_amd64 bug

NAME

       tRNAscan-SE - improved detection of transfer RNA genes in genomic sequence

SYNOPSIS

       tRNAscan-SE [options] seqfile(s)

DESCRIPTION

       tRNAscan-SE searches for transfer RNAs in genomic sequence seqfile(s) using three separate
       methods to achieve a combination of speed, sensitivity, and selectivity not available with
       each program individually.

       tRNAscan-SE  was written in the PERL (version 5.0) script language.  Input consists of DNA
       or RNA sequences in FASTA format.  tRNA predictions are output in standard tabular, ACeDB-
       compatible,  or  an  extended  format  including  tRNA  secondary  structure  information.
       tRNAscan-SE does no tRNA detection itself, but instead combines  the  strengths  of  three
       independent  tRNA  prediction  programs by negotiating the flow of information among them,
       performing a limited amount of post-processing, and outputting the result.

       tRNAscan-SE combines the specificity of the  Cove  probabilistic  RNA  prediction  package
       (Eddy  &  Durbin,  1994)  with the speed and sensitivity of tRNAscan 1.3 (Fichant & Burks,
       1991) plus an implementation of an algorithm described by  Pavesi  and  colleagues  (1994)
       which  searches  for  eukaryotic pol III tRNA promoters (our implementation referred to as
       EufindtRNA).  tRNAscan and EufindtRNA  are  used  as  first-pass  prefilters  to  identify
       "candidate"  tRNA regions of the sequence.  These subsequences are then passed to Cove for
       further analysis, and output if Cove confirms the initial tRNA prediction.  In  this  way,
       tRNAscan-SE  attains  the  best  of  both worlds: (1) a false positive rate equally low to
       using Cove analysis, (2) the combined sensitivities of tRNAscan and EufindtRNA  (detection
       of 99% of true tRNAs), and (3) search speed 1,000 to 3,000 times faster than Cove analysis
       and 30 to 90 times faster than the original tRNAscan 1.3 (tRNAscan-SE uses  both  a  code-
       optimized  version  of tRNAscan 1.3 which gives a 650-fold increase in speed, and a fast C
       implementation of the Pavesi et al. algorithm).

       tRNAscan-SE was designed to make rapid, sensitive searches of  genomic  sequence  feasible
       using the selectivity of the Cove analysis package.  Search sensitivity was optimized with
       eukaryote cytoplasmic & eubacterial sequences, but it may be applied more broadly  with  a
       slight reduction in sensitivity.

       In  the  default  tabular  output  format,  each  new  tRNA in a sequence is consecutively
       numbered in the 'tRNA #' column.  'tRNA Bounds' specify the starting (5') and ending  (3')
       nucleotide  bounds  for the tRNA.  tRNAs found on the reverse (lower) strand are indicated
       by having the Begin (5') bound greater than the End (3') bound.

       The 'tRNA Type' is the predicted amino acid charged to the  tRNA  molecule  based  on  the
       predicted  anticodon  (written  5'->3')  displayed  in  the  next column.   tRNAs that fit
       criteria for potential pseudogenes (poor primary or secondary structure), will  be  marked
       with  "Pseudo"  in the 'tRNA Type' column (pseudogene checking is further discussed in the
       Methods section of the program manual).  If there is a predicted intron in the  tRNA,  the
       next two columns indicate the nucleotide bounds.  If there is no predicted intron, both of
       these columns contain zero.

       The final column is the Cove score for the tRNA in bits of information.  Specifically,  it
       is  a  log-odds  score:  the log of the ratio of the probability of the sequence given the
       tRNA covariance model  used  (developed  from  hand-alignment  of  1415  tRNAs),  and  the
       probability  of the sequence given a simple random sequence model.  tRNAscan-SE counts any
       sequence that attains a score of 20.0 bits or larger as a tRNA (based on empirical studies
       conducted by Eddy & Durbin in ref #2).

OPTIONS

       -h     Prints entire list of program options, each with a brief, one-line description.

       -P     This  option selects the prokaryotic covariace model for tRNA analysis, and loosens
              the search parameters for EufindtRNA to improve  detection  of  prokaryotic  tRNAs.
              Use  of this mode with prokaryotic sequences will also improve bounds prediction of
              the 3' end (the terminal CAA triplet).

       -A     This option selects an archaeal-specific covariance model  for  tRNA  analysis,  as
              well as slightly loosening the EufindtRNA search cutoffs.

       -O     This  parameter  bypasses  the  fast first-pass scanners that are poor at detecting
              organellar tRNAs and runs Cove analysis only.  Since  true  organellar  tRNAs  have
              been found to have Cove scores between 15 and 20 bits, the search cutoff is lowered
              from 20 to 15 bits.  Also,  pseudogene  checking  is  disabled  since  it  is  only
              applicable  to  eukaryotic  cytoplasmic  tRNA pseudogenes.  Since Cove-only mode is
              used, searches will be very slow (see -C option  below)  relative  to  the  default
              mode.

       -G     This  option  selects  the  general tRNA covariance model that was trained on tRNAs
              from all three phylogenetic domains (archaea, bacteria, & eukarya).  This mode  can
              be  used  when  analyzing  a  mixed  collection  of  sequences  from  more than one
              phylogenetic domain, with only slight loss of  sensitivity  and  selectivity.   The
              original  publication describing this program and tRNAscan-SE version 1.0 used this
              general tRNA model exclusively.  If you wish to compare scores to  those  found  in
              the  paper  or scans using v1.0, use this option.  Use of this option is compatible
              with all other search mode options described in this section.

       -C     Directs tRNAscan-SE to analyze sequences using Cove  analysis  only.   This  option
              allows  a  slightly more sensitive search than the default tRNAscan + EufindtRNA ->
              Cove mode, but is much slower (by approx. 250 to 3,000 fold).   Output  format  and
              other program defaults are otherwise identical to the normal analysis.

       -H     This  option  displays  the breakdown of the two components of the covariance model
              bit score.  Since  tRNA  pseudogenes  often  have  one  very  low  component  (good
              secondary structure but poor primary sequence similarity to the tRNA model, or vice
              versa), this information may be useful in deciding whether a  low-scoring  tRNA  is
              likely  to  be  a  pseudogene.  The heuristic pseudogene detection filter uses this
              information to flag possible pseudogenes -- use this option to see  why  a  hit  is
              marked  as  a  possible  pseudogene.  The user may wish to examine score breakdowns
              from known tRNAs in the organism of interest to get a frame of reference.

       -D     Manually disable checking tRNAs for poor  primary  or  secondary  structure  scores
              often indicative of eukaryotic pseudogenes.  This will slightly speed the program &
              may be  necessary  for  non-eukaryotic  sequences  that  are  flagged  as  possible
              pseudogenes but are known to be functional tRNAs.

       -o <file>
              Output final results to <file>.

       -f <file>
              Save  final  results and Cove tRNA secondary structure predictions to <file>.  This
              output format makes visual inspection of individual tRNA predictions  easier  since
              the tRNA sequence is displayed along with the predicted tRNA base pairings.

       -a     Output final results in ACeDB format instead of the default tabular format.

       -m <file>
              Save  statistics summary for run.  This option directs tRNAscan-SE to write a brief
              summary to <file> which contains the run options selected as well as statistics  on
              the  number  of tRNAs detected at each phase of the search, search speed, and other
              bits of information.  See Manual documentation for explanation of each statistic.

       -d     Display program progress.  Messages indicating which phase of the tRNA  search  are
              printed  to  standard  output.  If  final  results  are also being sent to standard
              output, some of these messages will be suppressed so as to not interrupt display of
              the results.

       -l <file>
              Save  log of program progress in <file>.  Identical to -d option, but sends message
              to <file> instead of standard output.  Note: the -d option overrides the -l  option
              if both are specified on the same command line.

       -q     Quiet  mode: the credits & run option selections normally printed to standard error
              at the beginning of each run are suppressed.

       -b     Use brief output format.  This eliminates column headers  that  appear  by  default
              when  writing results in tabular output format.  Useful if results are to be parsed
              or piped to another program.

       -N     This option causes tRNAscan-SE to output a tRNA's corresponding codon in  place  of
              its anticodon.

       -(Option)#
              The  '#' symbol may be used as shorthand to specify "default" file names for output
              files.  The default file names are constructed by using  the  input  sequence  file
              name,  followed by an extension specifying the output file type <seqfile.ext> where
              '.ext' is:

              Extension   Option    Description
              ---------   ------    -----------
               .out        -o       final results
               .stats      -m       summary statistics file
               .log        -l       run progress file
               .ss         -f       secondary structures save file
               .fpass.out  -r       formatted, tabular output
                                    from first-pass scans
               .fpos       -F       FASTA file of tRNAs identified  in                 first-pass
              scans that were found to be                false positives by Cove analysis

              Notes:

              1)  If  the  input  sequence  file  name  has the extensions '.fa' or '.seq', these
              extensions will be removed before using the filename as a prefix for  default  file
              names.   (example  --  input  file  name  Mygene.seq will have the output file name
              Mygene.out if the '-o#' option is used).

              2) If more than one sequence file is specified on the command line,  the  "default"
              output file prefix will be the name of the FIRST sequence file on the command line.
              Use the -p option to change this default name to something  more  appropriate  when
              using more than one sequence file on the command line.

       -p <label>
              Use  <label>  prefix as the default output file prefix when using '#' for file name
              specification.  <label> is used in place of the input sequence file name.

       -y     This option displays which of the  first-pass  scanners  detected  the  tRNA  being
              output.   "Ts",  "Eu",  or  "Bo"  will appear in the last column of Tabular output,
              indicating that either tRNAscan 1.4, EufindtRNA,  or  both  scanners  detected  the
              tRNA, respectively.

       -X <score>
              Set  Cove  cutoff  score  for reporting tRNAs (default=20).  This option allows the
              user to specify a different Cove score threshold for reporting tRNAs.   It  is  not
              recommended  that  novice  users  change  this cutoff, as a lower cutoff score will
              increase the number of pseudogenes and other false positives found  by  tRNAscan-SE
              (especially when used with the "Cove only" scan mode).  Conversely, a higher cutoff
              than 20.0 bits will likely cause true tRNAs to  be  missed  by  tRNAscan  (numerous
              "real"  tRNAs  have been found just above the 20.0 cutoff).  Knowledgable users may
              wish to experiment with this parameter to find very unusual  tRNAs  or  pseudogenes
              beyond the normal range of detection with the preceding caveats in mind.

       -L <length>
              Set max length of tRNA intron+variable region (default=116bp).  The default maximum
              tRNA length for tRNAscan-SE is 192 bp, but this limit can be  increased  with  this
              option  to  allow  searches  with  no practical limit on tRNA length.  In the first
              phase of tRNAscan-SE, EufindtRNA searches for A and B  boxes  of  <length>  maximum
              distance  apart,  and  passes  only  the  5'  and  3' tRNA ends to covariance model
              analysis for confirmation (removing the bulk of long intervening sequences).  tRNAs
              containing  group  I and II introns have been detected by setting this parameter to
              over 800 bp.  Caution: group I or II introns in tRNAs tend to  occur  in  positions
              other  than  the  canonical  position  of  protein-spliced  introns, so tRNAscan-SE
              mispredicts the intron bounds and anticodon sequence for these cases.   tRNA  bound
              predictions, however, have been found to be reliable in these same tRNAs.

       -I <score>
              This  score  cutoff  affects  the sensitivity of the first-pass scanner EufindtRNA.
              This parameter should not need to be adjusted from  its  default  values  (variable
              depending  on  search  mode),  but  is included for users who are familiar with the
              Pavesi et al. (1994) paper and wish to set it manually.  See Lowe & Eddy (1997) for
              details on parameter values used by tRNAscan-SE depending on the search mode.

       -B <number>
              By  default,  tRNAscan-SE  adds 7 nucleotides to both ends of tRNA predictions when
              first-pass tRNA predictions are passed  to  covariance  model  (CM)  analysis.   CM
              analysis  generally  trims  these  bounds  back  down,  but  on  occassion,  allows
              prediction of an otherwise truncated first-pass tRNA prediction.

       -g <file>
              Use exceptions to "universal"  genetic  code  specified  in  <file>.   By  default,
              tRNAscan-SE uses a standard universal codon -> amino acid translation table that is
              specified at the end of the tRNAscan-SE.src source file.  This  option  allows  the
              user  to specify exceptions to the default translation table.  The user may use any
              one of several alternate translation code files included in this package (see files
              'gcode.*'),  or  create a new alternate translation file.  See Manual documentation
              for specification of file format, or refer to included examples files.

              Note: this option does not have any effect when using the -T or -E options  --  you
              must be running in default or Cove only analysis mode.

       -c <file>
              For  users  who  have  developed  their  own  tRNA covariance models using the Cove
              program "coveb" (see Cove documentation), this parameter  allows  substitution  for
              the  default  tRNA  covariance  models.  May be useful for extending Cove-only mode
              detection of particularly strange tRNA species such as mitochondrial tRNAs.

       -Q     By default, if an output result file to be written to already exists, the  user  is
              prompted  whether  the  file  should  be  over-written  or appended to.  Using this
              options forces overwriting of pre-existing files  without  an  interactive  prompt.
              This  option  may  be  handy  for  batch-processing  and running tRNAscan-SE in the
              background.

       -n <EXPR>
              Search only sequences with names matching <EXPR> string.  <EXPR> may contain * or ?
              wildcard  characters,  but the user should remember to enclose these expressions in
              single quotes to avoid shell expansion.  Only those  sequences  with  names  (first
              non-white  space  word  after  ">"  symbol on FASTA name/description line) matching
              <EXPR> are analyzed for tRNAs.

       -s <EXPR>
              Start search at first sequence with name matching <EXPR> string and continue to end
              of input sequence file(s).  This may be useful for re-starting crashed/aborted runs
              at the point where the previous run stopped.  (If same names for output file(s) are
              used,  program  will  ask  if files should be over-written or appended to -- choose
              append and run will successfully be restarted where it left off).

       -T     Directs tRNAscan-SE to use only tRNAscan to  analyze  sequences.   This  mode  will
              default  to  using  "strict" parameters with tRNAscan analysis (similar to tRNAscan
              version 1.3 operation).  This mode of operation is faster (3-5  times  faster  than
              default  mode analysis), but will result in approximately 0.2 to 0.6 false positive
              tRNAs per Mbp, decreased sensitivity, and less reliable prediction  of  anticodons,
              tRNA isotype, and introns.

       -t <mode>
              Explicitly set tRNAscan params, where <mode> = R or S (R=relaxed, S=strict tRNAscan
              v1.3 params).  This option allows selection of strict or relaxed search  parameters
              for  tRNAscan  analysis.   By  default,  "strict"  parameters  are  used.   Relaxed
              parameters may give very slightly increased search sensitivity, but increase search
              time by 20-40 fold.

       -E     Run  EufindtRNA  alone  to  search  for  tRNAs.   Since Cove is not being used as a
              secondary filter to remove false positives, this  run  mode  defaults  to  "Normal"
              parameters  which  more closely approximates the sensitivity and selectivity of the
              original algorithm describe by Pavesi and colleagues (see the next option, -e for a
              description of the various run modes).

       -e <mode>
              Explicitly  set  EufindtRNA  params,  where <mode>= R, N, or S (relaxed, normal, or
              strict).  The "relaxed" mode is used  for  EufindtRNA  when  using  tRNAscan-SE  in
              default  mode.  With relaxed parameters, tRNAs that lack pol III poly-T terminators
              are not penalized, increasing search sensitivity, but decreasing selectivity.  When
              Cove  analysis  is  being  used  as  a  secondary filter for false positives (as in
              tRNAscan-SE's default mode), overall selectivity is not decreased.

              Using "normal" parameters with EufindtRNA does incorporate a log odds score for the
              distance between the B box and the first poly-T terminator, but does not disqualify
              tRNAs that do not have a terminator signal within 60  nucleotides.   This  mode  is
              used  by default when Cove analysis is not being used as a secondary false positive
              filter.

              Using "strict" parameters with EufindtRNA also incorporates a log  odds  score  for
              the distance between the B box and the first poly-T terminator, but _rejects_ tRNAs
              that do not have such a signal within 60 nucleotides of the end of the B box.  This
              mode  most  closely  approximates  the  originally  published search algorithm (3);
              sensitivity is  reduced  relative  to  using  "relaxed"  and  "normal"  modes,  but
              selectivity  is  increased  which is important if no secondary filter, such as Cove
              analysis, is being used to remove  false  positives.   This  mode  will  miss  most
              prokaryotic  tRNAs  since  the  poly-T  terminator  signal is a feature specific to
              eukaryotic  tRNAs  genes  (always  use  "relaxed"  mode  for  scanning  prokaryotic
              sequences for tRNAs).

       -r <file>
              Save  tabular,  formatted output results from tRNAscan and/or EufindtRNA first pass
              scans in <file>.  The format is similar to the final tabular output format,  except
              no  Cove score is available at this point in the search (if EufindtRNA has detected
              the tRNA, the negative log likelihood score  is  given).   Also,  the  sequence  ID
              number  and  source  sequence  length appear in the columns where intron bounds are
              shown in final output.  This option may be  useful  for  examining  false  positive
              tRNAs predicted by first-pass scans that have been filtered out by Cove analysis.

       -u <file>
              This  option allows the user to re-generate results from regions identified to have
              tRNAs by a previous tRNAscan-SE run.  Either a  regular  tabular  result  file,  or
              output  saved  with the -r option may be used as the specified <file>.  This option
              is particularly useful for generating either secondary structure output (-f option)
              or   ACeDB   output  (-a  option)  without  having  to  re-scan  entire  sequences.
              Alternatively, if the -r option is used to  generate  the  previous  results  file,
              tRNAscan-SE  will  pick  up  at  the stage of Cove-confirmation of tRNAs and output
              final tRNA predicitons as with a normal run.

              Note: the -n and -s options will not work in conjunction with this option.

       -F <file>
              Save first-pass candidate tRNAs  in  <file>  that  were  then  found  to  be  false
              positives  by  Cove  analysis.   This  option saves candidate tRNAs found by either
              tRNAscan and/or EufindtRNA that were then rejected by Cove analysis as being  false
              positives.  tRNAs are saved in the FASTA sequence format.

       -M <file>
              This  option  may  be  used  when  scanning a collection of known tRNA sequences to
              identify possible false negatives (incorreclty missed by tRNAscan-SE) or  sequences
              incorrectly annotated as tRNAs (correctly passed over by tRNAscan-SE).  Examination
              of primary & secondary structure covariance model scores (-H  option),  and  visual
              inspection  of  secondary  structures  (use  -F  option)  may  be helpful resolving
              identification conflicts.

SEE ALSO

       User Manual and tutorial: Manual.ps (postscript), MANUAL (text)

BUGS

       No major bugs known.

NOTES

       This software and documentation is Copyright (C) 1996, Todd M.J. Lowe & Sean R. Eddy.   It
       is freely distributable under terms of the GNU General Public License. See COPYING, in the
       source code distribution, for more details, or contact me.

       Todd Lowe
       Dept. of Genetics, Washington Univ. School of Medicine
       660 S. Euclid Box 8232
       St Louis, MO 63110 USA
       Phone: 1-314-362-7667
       FAX  : 1-314-362-2985
       Email: lowe@genetics.wustl.edu

REFERENCES

       1. Fichant, G.A. and Burks, C. (1991) "Identifying potential tRNA  genes  in  genomic  DNA
       sequences", J. Mol. Biol., 220, 659-671.

       2. Eddy, S.R. and Durbin, R. (1994) "RNA sequence analysis using covariance models", Nucl.
       Acids Res., 22, 2079-2088.

       3. Pavesi, A., Conterio, F., Bolchi, A., Dieci, G., Ottonello, S.  (1994)  "Identification
       of  new  eukaryotic  tRNA  genes  in  genomic  DNA  databases by a multistep weight matrix
       analysis of transcriptional control regions", Nucl. Acids Res., 22, 1247-1256.

       4. Lowe, T.M. & Eddy, S.R. (1997)  "tRNAscan-SE:  A  program  for  improved  detection  of
       transfer RNA genes in genomic sequence", Nucl. Acids Res., 25, 955-964.