Provided by: profphd_1.0.42-1_all
NAME
prof - secondary structure and solvent accessibility predictor
SYNOPSIS
prof [INPUTFILE+] [OPTIONS]
DESCRIPTION
Secondary structure is predicted by a system of neural networks with an expected average accuracy > 72% for the three states helix, strand and loop (Rost & Sander, PNAS, 1993, 90, 7558-7562; Rost & Sander, JMB, 1993, 232, 584-599; and Rost & Sander, Proteins, 1994, 19, 55-72; evaluation of accuracy). Evaluated on the same data set, PROFsec is rated at ten percentage points higher three-state accuracy than methods using only single sequence information, and at more than six percentage points higher than, e.g., a method using alignment information based on statistics (Levin, Pascarella, Argos & Garnier, Prot. Engng., 6, 849-54, 1993).

PHDsec predictions have three main features:

1. improved accuracy through evolutionary information from multiple sequence alignments
2. improved beta-strand prediction through a balanced training procedure
3. more accurate prediction of secondary structure segments by using a multi-level system

Solvent accessibility is predicted by a neural network method with a correlation coefficient (correlation between experimentally observed and predicted relative solvent accessibility) of 0.54, cross-validated on a set of 238 globular proteins (Rost & Sander, Proteins, 1994, 20, 216-226; evaluation of accuracy). The output of the neural network codes for 10 states of relative accessibility. Expressed in units of the difference between prediction by homology modelling (best method) and prediction at random (worst method), PROFacc is some 26 percentage points superior to a comparable neural network using three output states (buried, intermediate, exposed) and using no information from multiple alignments.

Transmembrane helices in integral membrane proteins are predicted by a system of neural networks. The shortcoming of the network system is that the predicted helices are often too long. These are cut by an empirical filter. The final prediction (Rost et al., Protein Science, 1995, 4, 521-533; evaluation of accuracy) has an expected per-residue accuracy of about 95%. The number of false positives, i.e., transmembrane helices predicted in globular proteins, is about 2%.

The neural network prediction of transmembrane helices (PHDhtm) is refined by a dynamic programming-like algorithm. This method resulted in correct predictions of all transmembrane helices for 89% of the 131 proteins used in a cross-validation test; more than 98% of the transmembrane helices were correctly predicted. The output of this method is used to predict topology, i.e., the orientation of the N-terminus with respect to the membrane. The expected accuracy of the topology prediction is > 86%. Prediction accuracy is higher than average for eukaryotic proteins and lower than average for prokaryotes. PHDtopology is more accurate than all other methods tested on identical data sets.

If no output file option (such as --fileRdb or --fileOut) is given, the RDB formatted output is written into ./INPUTFILENAME.prof, where 'prof' replaces the extension of the input file. If the input file name has no extension, '.prof' is appended to it.

Output format

The RDB format is self-annotating; see the example outputs in /share/profphd/prof/exa.
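The default output naming rule described above can be sketched in Python. This is only an illustration of the rule as worded in this page, not code taken from prof: the function name default_prof_output is hypothetical, and treating only the last dot-separated suffix as "the extension" is an assumption.

```python
import os


def default_prof_output(input_path: str) -> str:
    """Return the assumed default RDB output path for an input file.

    Hypothetical helper: mirrors the rule "replace the extension with
    'prof', or append '.prof' if there is no extension".
    """
    name = os.path.basename(input_path)
    # Assumption: only the last suffix counts as the extension.
    stem, _ext = os.path.splitext(name)
    # With no extension, stem == name, so '.prof' is simply appended.
    return "./" + stem + ".prof"


print(default_prof_output("/share/profphd/prof/exa/1ppt.hssp"))  # ./1ppt.prof
print(default_prof_output("query"))                              # ./query.prof
```

Passing fileRdb= or fileOut= explicitly avoids relying on this default.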
REFERENCES
Rost, B. and Sander, C. (1994a). Combining evolutionary information and neural networks to predict protein secondary structure. Proteins, 19(1), 55-72.

Rost, B. and Sander, C. (1994b). Conservation and prediction of solvent accessibility in protein families. Proteins, 20(3), 216-26.

Rost, B., Casadio, R., Fariselli, P., and Sander, C. (1995). Transmembrane helices predicted at 95% accuracy. Protein Sci, 4(3), 521-33.
OPTIONS
See each keyword for more help. Most of these are likely to be broken.

a
    alternative connectivity patterns (default=3)
3
    predict sec + acc + htm
acc
    predict solvent accessibility, only
ali
    add alignment to 'human-readable' PROF output file(s)
arch
    system architecture (e.g.: SGI64|SGI5|SGI32|SUNMP|ALPHA)
ascii
    write 'human-readable' PROF output file(s)
best
    PROF with best accuracy and longest run-time
both
    predict secondary structure and solvent accessibility
data
    data=<all|brief|normal|detail> for HTML out: only those parts of predictions written
debug
    keep most intermediate files, print debugging messages
dirWork
    work directory, default: a temporary directory from File::Temp::tempdir. Must be a fully qualified path. Known to work.
doEval
    DO evaluation for list (only for known structures and lists)
doFilterHssp
    filter the input HSSP file (excluding some pairs)
doHtmfil
    DO filter the membrane prediction (default)
doHtmisit
    DO check strength of predicted membrane helix (default)
doHtmref
    DO refine the membrane prediction (default)
doHtmtop
    DO membrane helix topology (default)
dssp
    convert PROF into DSSP format
expand
    expand insertions when converting output to MSF format
fast
    PROF with lowest accuracy and highest speed
fileCasp
    name of PROF output in CASP format (file.caspProf)
fileDssp
    name of PROF output in DSSP format (file.dsspProf)
fileHtml
    name of PROF output in HTML format (file.htmlProf)
fileMsf
    name of PROF output in MSF format (file.msfProf)
fileNotHtm
    name of file flagging that no membrane helix was found
fileOut
    name of PROF output in RDB format (file.rdbProf). Known to work.
fileProf
    name of PROF output in human readable format (file.prof). Broken.
fileRdb
    name of PROF output in RDB format (file.rdbProf). Known to work.
fileSaf
    name of PROF output in SAF format (file.safProf)
filter
    filter the input HSSP file (excluding some pairs)
good
    PROF with good accuracy and moderate speed
graph
    add ASCII graph to 'human-readable' PROF output file(s)
htm
    use 'htm=<N|0.N>' to set the threshold for the minimal transmembrane helix detected; default is 'htm=8' (resp. htm=0.8). Smaller numbers give more false positives and fewer false negatives!
html
    argument 'html' or 'html=<all|body|head>': write HTML format of prediction
    'html' converts the PROF output to HTML
    'html=body' restricts the HTML file to the HTML_BODY tag part
    'html=head' restricts the HTML file to the HTML_HEADER tag part
    'html=all' gives both HEADER and BODY
keepConv
    keep the conversion of the input file to HSSP format
keepFilter
    argument <*|doKeepFilter=1>: keep the filtered HSSP file
keepHssp
    argument <*|doKeepHssp=1>: keep the intermediate HSSP file
keepNetDb
    argument <*|doKeepNetDb=1>: keep the intermediate DbNet file(s)
list
    argument <*|isList=1>: input file is a list of files
msf
    convert PROF into MSF format
nice
    give 'nice-D' to set the nice value (priority) of the job
noProfHead
    do NOT copy file with tables into local directory
noSearch
    short for doSearchFile=0, i.e. no searching of DB files
noascii
    suppress writing ASCII (i.e. human readable) result files
nohtml
    suppress writing HTML result files
nonice
    job will not be niced, i.e. not run with lower priority
notEval
    DO NOT check accuracy, even for known structures
notHtmfil
    do NOT filter the membrane prediction
notHtmisit
    do NOT check whether or not membrane helix is strong enough
notHtmref
    do NOT refine the membrane prediction
notHtmtop
    do NOT predict membrane helix topology
nresPerLineAli
    Number of characters used for MSF file. Default: 50.
numresMin
    Minimal number of residues to run network, otherwise prd=symbolPrdShort. Default: 9.
optJury
    Adds PHD to jury. Default: `normal,usePHD'.
    Many other parameters change the default for this one as a side effect; the following list is not comprehensive: phd, nophd, /^para(3|Both|Sec|Acc|Htm|CapH|CapE|CapHE)/, /^para?/, jct
para3
    Parameter file for sec+acc+htm. Default: `<DIRPROF>/net/PROFboth_best.par'.
paraAcc
    Parameter file for acc. Default: `<DIRPROF>/net/PROFacc_best.par'.
paraBoth
    Parameter file for sec+acc. Default: `<DIRPROF>/net/PROFboth_best.par'.
paraSec
    Parameter file for sec. Default: `<DIRPROF>/net/PROFsec_best.par'.
riSubAcc
    Minimal reliability index (RI) for subset PROFacc. Default: 4.
riSubSec
    Minimal reliability index (RI) for subset PROFsec. Default: 5.
riSubSym
    Symbol for residues predicted with RI < riSubSec/Acc. Default: `.'.
s_k_i_p
    problems, manual, hints, notation, txt, known, DONE, Date, date, aa, Lhssp, numaa, code
saf
    convert PROF into SAF format
scrAddHelp
scrGoal
    neural network switching
scrHelpTxt
    Input file formats accepted: hssp,dssp,msf,saf,fastamul,pirmul,fasta,pir,gcg,swiss
scrIn
    list_of_files (or single file) parameter_file
scrName
    prof
scrNarg
    2
sec
    predict secondary structure, only
silent
    no information written to screen - this is the default
skipMissing
    do not abort if input file is missing!
sourceFile
    prof
test
    is just a test (faster)
translate-jobid-in-param-values
    String 'jobid' gets substituted with $par{jobid}
tst
    quick run through program, low accuracy
user
    user name
--version
    Print version
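The riSubSec, riSubAcc and riSubSym options describe a "subset" view in which residues predicted below a minimal reliability index are replaced by a symbol. A minimal Python sketch of that idea follows. It is not prof's implementation: the function name mask_low_reliability is hypothetical, the prediction and RI strings are made-up illustration data, and it assumes the RI is one digit (0-9) per residue.

```python
def mask_low_reliability(pred: str, ri: str, min_ri: int = 5, symbol: str = ".") -> str:
    """Replace predicted states whose reliability index is below min_ri.

    Hypothetical helper. Defaults follow riSubSec (5) and riSubSym ('.')
    as documented above; the one-digit-per-residue RI is an assumption.
    """
    if len(pred) != len(ri):
        raise ValueError("prediction and RI strings must have equal length")
    # Keep the predicted state where RI >= min_ri, else emit the symbol.
    return "".join(p if int(r) >= min_ri else symbol for p, r in zip(pred, ri))


# Illustrative strings only, not real prof output.
print(mask_low_reliability("HHHHLLEEE", "987342659"))  # HHH...EEE
```

For solvent accessibility the same sketch would be called with min_ri=4, matching the riSubAcc default.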
AUTHOR
Rost B, Sander C, Fariselli P, Casadio R, Liu J, Yachdav G, Kajan L.
EXAMPLES
Prediction from alignment in HSSP file for best results

    prof /share/profphd/prof/exa/1ppt.hssp fileRdb=/tmp/1ppt.hssp.prof

Prediction from a single sequence

    prof /share/profphd/prof/exa/1ppt.f fileRdb=/tmp/1ppt.f.rdbProf

phd.pl invocation

    /share/profphd/prof/embl/phd.pl /share/profphd/prof/exa/1ppt.hssp htm \
        fileOutPhd=/tmp/query.phdPred fileOutRdb=/tmp/query.phdRdb \
        fileNotHtm=/tmp/query.phdNotHtm
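When prof is driven from a script, the key=value arguments shown in the examples can be assembled as an argument list. The Python sketch below only builds and prints a command line and does not run prof. The helper name build_prof_command is hypothetical, and the only arguments used (fileRdb, optJury=normal, both) are ones documented in this page.

```python
import shlex


def build_prof_command(input_file, rdb_out, extra=()):
    """Assemble an argv list for prof; nothing is executed here.

    Hypothetical helper for illustration only.
    """
    argv = ["prof", input_file, f"fileRdb={rdb_out}"]
    # Extra arguments are bare keywords or key=value pairs, e.g. "both".
    argv.extend(extra)
    return argv


cmd = build_prof_command(
    "/tmp/1a3q.hssp",
    "/tmp/1a3q.hssp.profRdb",
    ["optJury=normal", "both"],  # workaround arguments from the BUGS section
)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) on a machine where prof is installed; passing a list rather than a single string avoids shell quoting issues with file names.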
ENVIRONMENT
PROFPHDDIR
    Overrides the prof package directory /share/profphd.
RGUTILSDIR
    Overrides the location of librg-utils-perl, /share/librg-utils-perl.
FILES
*.rdbProf
    default output file extension
/share/profphd/prof
    default data directory
BUGS
Please report bugs at <https://rostlab.org/bugzilla3/enter_bug.cgi?product=profphd>.

Prediction from an HSSP file fails when residue lines with exclamation marks `!' are present. As a workaround, use 'optJury=normal' and 'both' like this:

    prof /tmp/1a3q.hssp fileRdb=/tmp/1a3q.hssp.profRdb optJury=normal both
SEE ALSO
Main website <http://www.predictprotein.org/>
Documentation <http://www.predictprotein.org/docs.php>
Community website <http://groups.google.com/group/PredictProtein>
FTP <ftp://rostlab.org/pub/cubic/downloads/prof>
Newsgroups <http://groups.google.com/group/PredictProtein>