Provided by: trnascan-se_1.3.1-1_amd64 bug

NAME

       tRNAscan-SE - improved detection of transfer RNA genes in genomic sequence

SYNOPSIS

       tRNAscan-SE [options] seqfile(s)

DESCRIPTION

       tRNAscan-SE  searches  for  transfer  RNAs in genomic sequence seqfile(s) using three separate methods to
       achieve  a  combination  of  speed,  sensitivity,  and  selectivity  not  available  with  each   program
       individually.

       tRNAscan-SE  was  written  in  the  PERL  (version  5.0)  script  language.  Input consists of DNA or RNA
       sequences in FASTA format.  tRNA predictions are output in  standard  tabular,  ACeDB-compatible,  or  an
       extended  format  including  tRNA  secondary  structure  information.  tRNAscan-SE does no tRNA detection
       itself, but instead combines the strengths of three independent tRNA prediction programs  by  negotiating
       the  flow  of  information among them, performing a limited amount of post-processing, and outputting the
       result.

       tRNAscan-SE combines the specificity of the Cove probabilistic RNA prediction  package  (Eddy  &  Durbin,
       1994) with the speed and sensitivity of tRNAscan 1.3 (Fichant & Burks, 1991) plus an implementation of an
       algorithm  described by Pavesi and colleagues (1994) which searches for eukaryotic pol III tRNA promoters
       (our implementation referred  to  as  EufindtRNA).   tRNAscan  and  EufindtRNA  are  used  as  first-pass
       prefilters  to  identify "candidate" tRNA regions of the sequence.  These subsequences are then passed to
       Cove for further analysis, and output if Cove  confirms  the  initial  tRNA  prediction.   In  this  way,
       tRNAscan-SE  attains  the  best  of  both  worlds:  (1)  a  false positive rate equally low to using Cove
       analysis, (2) the combined sensitivities of tRNAscan and EufindtRNA (detection of 99% of true tRNAs), and
       (3) search speed 1,000 to 3,000 times faster than Cove analysis and  30  to  90  times  faster  than  the
       original  tRNAscan  1.3  (tRNAscan-SE  uses  both  a code-optimized version of tRNAscan 1.3 which gives a
       650-fold increase in speed, and a fast C implementation of the Pavesi et al. algorithm).

       tRNAscan-SE was designed to make rapid,  sensitive  searches  of  genomic  sequence  feasible  using  the
       selectivity  of the Cove analysis package.  Search sensitivity was optimized with eukaryote cytoplasmic &
       eubacterial sequences, but it may be applied more broadly with a slight reduction in sensitivity.

       In the default tabular output format, each new tRNA in a sequence is consecutively numbered in the  'tRNA
       #'  column.   'tRNA  Bounds'  specify  the  starting (5') and ending (3') nucleotide bounds for the tRNA.
       tRNAs found on the reverse (lower) strand are indicated by having the Begin (5') bound greater  than  the
       End (3') bound.

       The 'tRNA Type' is the predicted amino acid charged to the tRNA molecule based on the predicted anticodon
       (written  5'->3') displayed in the next column.   tRNAs that fit criteria for potential pseudogenes (poor
       primary or secondary structure), will be marked with "Pseudo"  in  the  'tRNA  Type'  column  (pseudogene
       checking  is  further  discussed  in the Methods section of the program manual).  If there is a predicted
       intron in the tRNA, the next two columns indicate the  nucleotide  bounds.   If  there  is  no  predicted
       intron, both of these columns contain zero.

       The  final  column is the Cove score for the tRNA in bits of information.  Specifically, it is a log-odds
       score: the log of the ratio of the probability of the sequence  given  the  tRNA  covariance  model  used
       (developed  from hand-alignment of 1415 tRNAs), and the probability of the sequence given a simple random
       sequence model.  tRNAscan-SE counts any sequence that attains a score of 20.0 bits or larger  as  a  tRNA
       (based on empirical studies conducted by Eddy & Durbin in ref #2).

OPTIONS

       -h     Prints entire list of program options, each with a brief, one-line description.

       -P     This  option  selects  the  prokaryotic  covariace model for tRNA analysis, and loosens the search
              parameters for EufindtRNA to improve detection of  prokaryotic  tRNAs.   Use  of  this  mode  with
              prokaryotic  sequences  will  also  improve  bounds  prediction  of  the  3' end (the terminal CAA
              triplet).

       -A     This option selects an archaeal-specific covariance model for tRNA analysis, as well  as  slightly
              loosening the EufindtRNA search cutoffs.

       -O     This  parameter  bypasses the fast first-pass scanners that are poor at detecting organellar tRNAs
              and runs Cove analysis only.  Since true organellar tRNAs have been  found  to  have  Cove  scores
              between  15  and  20  bits,  the  search  cutoff  is lowered from 20 to 15 bits.  Also, pseudogene
              checking is disabled since it is only  applicable  to  eukaryotic  cytoplasmic  tRNA  pseudogenes.
              Since  Cove-only  mode  is  used, searches will be very slow (see -C option below) relative to the
              default mode.

       -G     This option selects the general tRNA covariance model that was trained on  tRNAs  from  all  three
              phylogenetic domains (archaea, bacteria, & eukarya).  This mode can be used when analyzing a mixed
              collection  of  sequences  from  more  than  one  phylogenetic  domain,  with  only slight loss of
              sensitivity and selectivity.  The original publication describing  this  program  and  tRNAscan-SE
              version  1.0  used  this  general  tRNA model exclusively.  If you wish to compare scores to those
              found in the paper or scans using v1.0, use this option.  Use of this option  is  compatible  with
              all other search mode options described in this section.

       -C     Directs  tRNAscan-SE to analyze sequences using Cove analysis only.  This option allows a slightly
              more sensitive search than the default tRNAscan + EufindtRNA -> Cove mode, but is much slower  (by
              approx.  250  to 3,000 fold).  Output format and other program defaults are otherwise identical to
              the normal analysis.

       -H     This option displays the breakdown of the two components of the covariance model bit score.  Since
              tRNA pseudogenes often have one very low component (good  secondary  structure  but  poor  primary
              sequence  similarity to the tRNA model, or vice versa), this information may be useful in deciding
              whether a low-scoring tRNA is likely to be  a  pseudogene.   The  heuristic  pseudogene  detection
              filter  uses  this information to flag possible pseudogenes -- use this option to see why a hit is
              marked as a possible pseudogene.  The user may wish to examine score breakdowns from  known  tRNAs
              in the organism of interest to get a frame of reference.

       -D     Manually disable checking tRNAs for poor primary or secondary structure scores often indicative of
              eukaryotic  pseudogenes.   This  will  slightly  speed  the  program  &  may be necessary for non-
              eukaryotic sequences that are flagged as possible pseudogenes  but  are  known  to  be  functional
              tRNAs.

       -o <file>
              Output final results to <file>.

       -f <file>
              Save  final  results  and Cove tRNA secondary structure predictions to <file>.  This output format
              makes visual inspection of individual tRNA predictions easier since the tRNA sequence is displayed
              along with the predicted tRNA base pairings.

       -a     Output final results in ACeDB format instead of the default tabular format.

       -m <file>
              Save statistics summary for run.  This option directs tRNAscan-SE to  write  a  brief  summary  to
              <file>  which  contains  the  run  options  selected  as well as statistics on the number of tRNAs
              detected at each phase of the search, search speed, and other bits  of  information.   See  Manual
              documentation for explanation of each statistic.

       -d     Display  program  progress.   Messages  indicating  which  phase of the tRNA search are printed to
              standard output. If final results are also being sent to standard output, some of  these  messages
              will be suppressed so as to not interrupt display of the results.

       -l <file>
              Save  log  of  program  progress  in  <file>.  Identical to -d option, but sends message to <file>
              instead of standard output.  Note: the -d option overrides the -l option if both are specified  on
              the same command line.

       -q     Quiet  mode:  the  credits  &  run  option  selections  normally  printed to standard error at the
              beginning of each run are suppressed.

       -b     Use brief output format.  This eliminates column headers  that  appear  by  default  when  writing
              results in tabular output format.  Useful if results are to be parsed or piped to another program.

       -N     This option causes tRNAscan-SE to output a tRNA's corresponding codon in place of its anticodon.

       -(Option)#
              The  '#'  symbol  may  be used as shorthand to specify "default" file names for output files.  The
              default file names are constructed by using the input sequence file name, followed by an extension
              specifying the output file type <seqfile.ext> where '.ext' is:

              Extension   Option    Description
              ---------   ------    -----------
               .out        -o       final results
               .stats      -m       summary statistics file
               .log        -l       run progress file
               .ss         -f       secondary structures save file
               .fpass.out  -r       formatted, tabular output
                                    from first-pass scans
               .fpos       -F       FASTA file of tRNAs identified in                first-pass scans that  were
              found to be                false positives by Cove analysis

              Notes:

              1)  If  the  input sequence file name has the extensions '.fa' or '.seq', these extensions will be
              removed before using the filename as a prefix for default file names.  (example -- input file name
              Mygene.seq will have the output file name Mygene.out if the '-o#' option is used).

              2) If more than one sequence file is specified on the command  line,  the  "default"  output  file
              prefix  will  be  the  name  of the FIRST sequence file on the command line.  Use the -p option to
              change this default name to something more appropriate when using more than one sequence  file  on
              the command line.

       -p <label>
              Use  <label>  prefix as the default output file prefix when using '#' for file name specification.
              <label> is used in place of the input sequence file name.

       -y     This option displays which of the first-pass scanners detected the tRNA being output.  "Ts", "Eu",
              or "Bo" will appear in the last column of Tabular output, indicating  that  either  tRNAscan  1.4,
              EufindtRNA, or both scanners detected the tRNA, respectively.

       -X <score>
              Set  Cove cutoff score for reporting tRNAs (default=20).  This option allows the user to specify a
              different Cove score threshold for reporting tRNAs.  It  is  not  recommended  that  novice  users
              change  this  cutoff,  as  a  lower cutoff score will increase the number of pseudogenes and other
              false positives found by tRNAscan-SE (especially when  used  with  the  "Cove  only"  scan  mode).
              Conversely,  a  higher cutoff than 20.0 bits will likely cause true tRNAs to be missed by tRNAscan
              (numerous "real" tRNAs have been found just above the 20.0 cutoff).  Knowledgable users  may  wish
              to  experiment  with  this  parameter  to find very unusual tRNAs or pseudogenes beyond the normal
              range of detection with the preceding caveats in mind.

       -L <length>
              Set max length of tRNA intron+variable region (default=116bp).  The default  maximum  tRNA  length
              for tRNAscan-SE is 192 bp, but this limit can be increased with this option to allow searches with
              no  practical  limit on tRNA length.  In the first phase of tRNAscan-SE, EufindtRNA searches for A
              and B boxes of <length> maximum distance apart, and passes  only  the  5'  and  3'  tRNA  ends  to
              covariance  model  analysis  for  confirmation  (removing the bulk of long intervening sequences).
              tRNAs containing group I and II introns have been detected by setting this parameter to  over  800
              bp.   Caution:  group I or II introns in tRNAs tend to occur in positions other than the canonical
              position of protein-spliced introns, so tRNAscan-SE mispredicts the intron  bounds  and  anticodon
              sequence  for  these  cases.   tRNA  bound predictions, however, have been found to be reliable in
              these same tRNAs.

       -I <score>
              This score cutoff affects the sensitivity of the first-pass scanner  EufindtRNA.   This  parameter
              should not need to be adjusted from its default values (variable depending on search mode), but is
              included  for  users  who  are  familiar  with  the  Pavesi et al. (1994) paper and wish to set it
              manually.  See Lowe & Eddy (1997) for details on parameter values used by tRNAscan-SE depending on
              the search mode.

       -B <number>
              By default, tRNAscan-SE adds 7 nucleotides to both ends of tRNA predictions when  first-pass  tRNA
              predictions  are  passed  to  covariance  model  (CM) analysis.  CM analysis generally trims these
              bounds back down, but on occassion, allows prediction of an otherwise  truncated  first-pass  tRNA
              prediction.

       -g <file>
              Use  exceptions  to  "universal" genetic code specified in <file>.  By default, tRNAscan-SE uses a
              standard universal codon -> amino acid translation table that is  specified  at  the  end  of  the
              tRNAscan-SE.src  source  file.   This  option allows the user to specify exceptions to the default
              translation table.  The user may use any one of several alternate translation code files  included
              in  this  package  (see  files 'gcode.*'), or create a new alternate translation file.  See Manual
              documentation for specification of file format, or refer to included examples files.

              Note: this option does not have any effect when using the -T or -E options -- you must be  running
              in default or Cove only analysis mode.

       -c <file>
              For  users who have developed their own tRNA covariance models using the Cove program "coveb" (see
              Cove documentation), this parameter allows substitution for the default  tRNA  covariance  models.
              May  be useful for extending Cove-only mode detection of particularly strange tRNA species such as
              mitochondrial tRNAs.

       -Q     By default, if an output result file to be written to already exists, the user is prompted whether
              the file should be over-written or appended to.  Using this options  forces  overwriting  of  pre-
              existing  files  without an interactive prompt.  This option may be handy for batch-processing and
              running tRNAscan-SE in the background.

       -n <EXPR>
              Search only sequences with names matching <EXPR> string.  <EXPR>  may  contain  *  or  ?  wildcard
              characters,  but  the  user should remember to enclose these expressions in single quotes to avoid
              shell expansion.  Only those sequences with names (first non-white space word after ">" symbol  on
              FASTA name/description line) matching <EXPR> are analyzed for tRNAs.

       -s <EXPR>
              Start  search  at  first  sequence  with  name matching <EXPR> string and continue to end of input
              sequence file(s).  This may be useful for re-starting crashed/aborted runs at the point where  the
              previous  run  stopped.   (If  same  names  for output file(s) are used, program will ask if files
              should be over-written or appended to -- choose append and  run  will  successfully  be  restarted
              where it left off).

       -T     Directs  tRNAscan-SE  to  use only tRNAscan to analyze sequences.  This mode will default to using
              "strict" parameters with tRNAscan analysis (similar to tRNAscan version 1.3 operation).  This mode
              of operation is faster (3-5  times  faster  than  default  mode  analysis),  but  will  result  in
              approximately  0.2  to  0.6 false positive tRNAs per Mbp, decreased sensitivity, and less reliable
              prediction of anticodons, tRNA isotype, and introns.

       -t <mode>
              Explicitly set tRNAscan params, where <mode> = R or S (R=relaxed, S=strict tRNAscan v1.3  params).
              This  option  allows  selection  of strict or relaxed search parameters for tRNAscan analysis.  By
              default, "strict" parameters are used.  Relaxed parameters may give very slightly increased search
              sensitivity, but increase search time by 20-40 fold.

       -E     Run EufindtRNA alone to search for tRNAs.  Since Cove is not being used as a secondary  filter  to
              remove  false  positives,  this  run  mode  defaults  to  "Normal"  parameters  which more closely
              approximates the sensitivity and selectivity of the original  algorithm  describe  by  Pavesi  and
              colleagues (see the next option, -e for a description of the various run modes).

       -e <mode>
              Explicitly  set  EufindtRNA  params,  where  <mode>= R, N, or S (relaxed, normal, or strict).  The
              "relaxed" mode is used for EufindtRNA when  using  tRNAscan-SE  in  default  mode.   With  relaxed
              parameters,  tRNAs  that  lack  pol  III  poly-T  terminators are not penalized, increasing search
              sensitivity, but decreasing selectivity.  When Cove analysis is being used as a  secondary  filter
              for false positives (as in tRNAscan-SE's default mode), overall selectivity is not decreased.

              Using  "normal"  parameters  with  EufindtRNA  does  incorporate a log odds score for the distance
              between the B box and the first poly-T terminator, but does not disqualify tRNAs that do not  have
              a terminator signal within 60 nucleotides.  This mode is used by default when Cove analysis is not
              being used as a secondary false positive filter.

              Using  "strict"  parameters  with  EufindtRNA  also incorporates a log odds score for the distance
              between the B box and the first poly-T terminator, but _rejects_ tRNAs that do  not  have  such  a
              signal  within  60  nucleotides  of the end of the B box.  This mode most closely approximates the
              originally published search algorithm (3); sensitivity is reduced relative to using "relaxed"  and
              "normal"  modes,  but  selectivity is increased which is important if no secondary filter, such as
              Cove analysis, is being used to remove false positives.  This  mode  will  miss  most  prokaryotic
              tRNAs  since  the poly-T terminator signal is a feature specific to eukaryotic tRNAs genes (always
              use "relaxed" mode for scanning prokaryotic sequences for tRNAs).

       -r <file>
              Save tabular, formatted output results from tRNAscan and/or EufindtRNA first pass scans in <file>.
              The format is similar to the final tabular output format, except no Cove  score  is  available  at
              this  point  in the search (if EufindtRNA has detected the tRNA, the negative log likelihood score
              is given).  Also, the sequence ID number and source sequence length appear in  the  columns  where
              intron  bounds  are shown in final output.  This option may be useful for examining false positive
              tRNAs predicted by first-pass scans that have been filtered out by Cove analysis.

       -u <file>
              This option allows the user to re-generate results from regions identified  to  have  tRNAs  by  a
              previous  tRNAscan-SE  run.   Either  a  regular  tabular result file, or output saved with the -r
              option may be used as the specified <file>.  This option is  particularly  useful  for  generating
              either  secondary  structure  output (-f option) or ACeDB output (-a option) without having to re-
              scan entire sequences.  Alternatively, if the -r option is used to generate the  previous  results
              file,  tRNAscan-SE  will  pick up at the stage of Cove-confirmation of tRNAs and output final tRNA
              predicitons as with a normal run.

              Note: the -n and -s options will not work in conjunction with this option.

       -F <file>
              Save first-pass candidate tRNAs in <file> that were then found  to  be  false  positives  by  Cove
              analysis.   This option saves candidate tRNAs found by either tRNAscan and/or EufindtRNA that were
              then rejected by Cove analysis as being false positives.  tRNAs are saved in  the  FASTA  sequence
              format.

       -M <file>
              This  option  may  be used when scanning a collection of known tRNA sequences to identify possible
              false negatives (incorreclty missed by tRNAscan-SE) or sequences incorrectly  annotated  as  tRNAs
              (correctly  passed  over by tRNAscan-SE).  Examination of primary & secondary structure covariance
              model scores (-H option), and visual inspection of secondary structures (use  -F  option)  may  be
              helpful resolving identification conflicts.

SEE ALSO

       User Manual and tutorial: Manual.ps (postscript), MANUAL (text)

BUGS

       No major bugs known.

NOTES

       This  software  and  documentation  is  Copyright  (C) 1996, Todd M.J. Lowe & Sean R. Eddy.  It is freely
       distributable under  terms  of  the  GNU  General  Public  License.  See  COPYING,  in  the  source  code
       distribution, for more details, or contact me.

       Todd Lowe
       Dept. of Genetics, Washington Univ. School of Medicine
       660 S. Euclid Box 8232
       St Louis, MO 63110 USA
       Phone: 1-314-362-7667
       FAX  : 1-314-362-2985
       Email: lowe@genetics.wustl.edu

REFERENCES

       1.  Fichant,  G.A.  and  Burks, C. (1991) "Identifying potential tRNA genes in genomic DNA sequences", J.
       Mol. Biol., 220, 659-671.

       2. Eddy, S.R. and Durbin, R. (1994) "RNA sequence analysis using covariance models",  Nucl.  Acids  Res.,
       22, 2079-2088.

       3.  Pavesi,  A.,  Conterio,  F.,  Bolchi,  A.,  Dieci,  G.,  Ottonello,  S. (1994) "Identification of new
       eukaryotic tRNA genes in genomic DNA databases by a multistep weight matrix analysis  of  transcriptional
       control regions", Nucl. Acids Res., 22, 1247-1256.

       4. Lowe, T.M. & Eddy, S.R. (1997) "tRNAscan-SE: A program for improved detection of transfer RNA genes in
       genomic sequence", Nucl. Acids Res., 25, 955-964.

tRNAscan-SE 1.1                                   November 1997                                   tRNAscan-SE(1)