Provided by: toppic_1.5.3+dfsg1-1build3_amd64 

NAME
topmg - Top-down mass spectrometry based proteoform identification using Mass Graphs
SYNOPSIS
topmg [options] database-file-name spectrum-file-names
DESCRIPTION
TopMG (Top-down mass spectrometry based proteoform identification using Mass Graphs) is a software tool
for identifying ultra-modified proteoforms by searching top-down tandem mass spectra against a protein
sequence database. It is capable of identifying proteoforms with multiple variable PTMs and unexpected
alterations, such as histone proteoforms and phosphorylated ones. It uses mass graphs, which efficiently
represent candidate proteoforms with multiple variable PTMs, to increase the speed and sensitivity in
proteoform identification. In addition, approximate spectrum-based filtering methods are employed for
protein sequence filtering, and a Markov chain Monte Carlo method (TopMCMC) is used for estimating the
statistical significance of identifications.
1. Input
• A protein database file in the FASTA format
• A mass spectrum data file in the msalign format
• A text file of variable PTMs
• A text file of fixed PTMs (optional)
• A text file containing LC/MS feature information (optional)
2. Output
TopMG outputs two csv files, an xml file, and a collection of html files for identified proteoforms. For
example, when the input mass spectrum data file is spectra_ms2.msalign, the output includes:
• spectra_ms2_topmg_prsm.csv: a csv file containing identified PrSMs with an E-value or spectrum level
FDR cutoff.
• spectra_ms2_topmg_proteoform.csv: a csv file containing identified proteoforms with an E-value or
proteoform level FDR cutoff.
• spectra_ms2_topmg_proteoform.xml: an xml file containing identified proteoforms with the E-value or
proteoform level FDR cutoff.
• spectra_html/topmg_prsm_cutoff: a folder containing JavaScript files of identified PrSMs using the
E-value or spectrum level FDR cutoff.
• spectra_html/topmg_proteoform_cutoff: a folder containing JavaScript files of identified PrSMs
using the E-value or proteoform level FDR cutoff.
• spectra_html/topview: a folder containing html files for the visualization of identified PrSMs.
To browse identified proteins, proteoforms, and PrSMs, open the file spectra_html/topview/index.html in
a web browser. Google Chrome is recommended; Firefox and Internet Explorer are not recommended.
When the input contains two or more spectrum files, TopMG outputs two csv files, an xml file, and a
collection of html files for each input file. When a file name is specified for combined identifications,
it combines spectra and proteoforms identified from all the input files, removes redundant proteoform
identifications, and reports two csv files, an xml file, and a collection of html files for the combined
results. For example, when the input is spectra1_ms2.msalign and spectra2_ms2.msalign and the combined
output file name is "combined," the output files are:
• combined_ms2_topmg_prsm.csv: a csv file containing PrSMs identified from all the input files with an
E-value or spectrum level FDR cutoff.
• combined_ms2_topmg_proteoform.csv: a csv file containing proteoforms identified from all the input
files with an E-value or proteoform level FDR cutoff.
• combined_ms2_topmg_proteoform.xml: an xml file containing proteoforms identified from all the input
files with the E-value or proteoform level FDR cutoff.
• combined_html/topmg_prsm_cutoff: a folder containing JavaScript files of PrSMs identified from all the
input files using the E-value or spectrum level FDR cutoff.
• combined_html/topmg_proteoform_cutoff: a folder containing JavaScript files of PrSMs identified from
all the input files using the E-value or proteoform level FDR cutoff.
• combined_html/topview: a folder containing html files for the visualization of identified PrSMs.
OPTIONS
-h [ --help ] Print the help message.
-a [ --activation ] <CID|HCD|ETD|UVPD|FILE> Fragmentation method of tandem mass spectra. When FILE is
used, fragmentation methods of spectra are given in the input spectral data file. Default value: FILE.
-f [ --fixed-mod ] <C57|C58|a fixed modification file> Set fixed modifications. Three available options:
C57, C58, or the name of a text file specifying fixed modifications (see an example file). When C57 is
selected, carbamidomethylation on cysteine is the only fixed modification. When C58 is selected,
carboxymethylation on cysteine is the only fixed modification.
-n [ --n-terminal-form ] <a list of allowed N-terminal forms> Set N-terminal forms of proteins. Four
N-terminal forms can be selected: NONE, NME, NME_ACETYLATION, and M_ACETYLATION. NONE stands for no
modifications, NME for N-terminal methionine excision, NME_ACETYLATION for N-terminal acetylation after
the initiator methionine is removed, and M_ACETYLATION for N-terminal methionine acetylation. When
multiple forms are allowed, they are separated by commas. Default value:
NONE,M_ACETYLATION,NME,NME_ACETYLATION.
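The N-terminal forms above can be illustrated with a short Python sketch. It parses a comma-separated
-n value and shows what each form does to a protein sequence that begins with methionine. The function
names are hypothetical, and the acetylation shift is the standard monoisotopic mass of an acetyl group
(42.010565 Da). This is an illustration only, not TopMG's internal implementation.

```python
# Hypothetical sketch: parse a TopMG -n value and apply each N-terminal form
# to one protein sequence. Not TopMG's internal code.
ACETYL_MASS = 42.010565  # monoisotopic mass of an acetyl group (C2H2O), in Da

ALLOWED_FORMS = {"NONE", "NME", "NME_ACETYLATION", "M_ACETYLATION"}


def parse_n_terminal_forms(arg):
    """Split a comma-separated -n value and reject unknown form names."""
    forms = [f.strip() for f in arg.split(",") if f.strip()]
    unknown = [f for f in forms if f not in ALLOWED_FORMS]
    if unknown:
        raise ValueError("unknown N-terminal form(s): " + ", ".join(unknown))
    return forms


def apply_n_terminal_forms(seq, forms):
    """Return (form, sequence, N-terminal mass shift in Da) for each form.

    Forms other than NONE need an initiator methionine, so they are skipped
    when seq does not start with 'M'.
    """
    results = []
    for form in forms:
        if form == "NONE":
            results.append((form, seq, 0.0))  # unmodified N-terminus
        elif not seq.startswith("M"):
            continue  # no initiator Met to remove or acetylate
        elif form == "NME":
            results.append((form, seq[1:], 0.0))  # Met removed
        elif form == "NME_ACETYLATION":
            results.append((form, seq[1:], ACETYL_MASS))  # Met removed, new N-terminus acetylated
        elif form == "M_ACETYLATION":
            results.append((form, seq, ACETYL_MASS))  # Met kept and acetylated
    return results


if __name__ == "__main__":
    default = "NONE,M_ACETYLATION,NME,NME_ACETYLATION"  # TopMG default for -n
    for form, s, shift in apply_n_terminal_forms("MSTAK", parse_n_terminal_forms(default)):
        print(f"{form:16s} {s:6s} {shift:+.6f}")
```

Validating the names before use mirrors the fact that only these four forms are accepted by -n.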
-d [ --decoy ] Use a shuffled decoy protein database to estimate spectrum and proteoform level FDRs. When
-d is chosen, a shuffled decoy database is automatically generated and appended to the target database
before database search, and FDRs are estimated using the target-decoy approach.
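As an illustration of the general target-decoy approach (not TopMG's own code), the Python sketch below
ranks PrSMs by E-value, estimates the FDR at each threshold as the number of decoy hits divided by the
number of target hits, converts these estimates to q-values, and keeps target PrSMs at or below a cutoff
such as the 0.05 used with -t FDR -v 0.05. The (e_value, is_decoy) tuple layout is an assumption made for
the example, and ties in E-value are not treated specially.

```python
# Sketch of target-decoy FDR filtering on E-values (general technique,
# not TopMG's implementation). Each PrSM is an (e_value, is_decoy) tuple.


def q_values(prsms):
    """Return (e_value, is_decoy, q_value) sorted from best to worst E-value."""
    ranked = sorted(prsms, key=lambda p: p[0])  # smaller E-value = better match
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among all PrSMs at or above this rank.
        fdrs.append(min(1.0, decoys / targets) if targets else 1.0)
    # q-value: the lowest FDR threshold at which this PrSM would be accepted.
    qs = fdrs[:]
    for i in range(len(qs) - 2, -1, -1):
        qs[i] = min(qs[i], qs[i + 1])
    return [(e, d, q) for (e, d), q in zip(ranked, qs)]


def filter_by_fdr(prsms, cutoff):
    """Keep target PrSMs whose q-value is at most cutoff (e.g. 0.05 for 5%)."""
    return [(e, q) for e, d, q in q_values(prsms) if not d and q <= cutoff]


if __name__ == "__main__":
    example = [(1e-10, False), (1e-8, False), (1e-6, False),
               (1e-4, True), (1e-3, False), (1e-2, True)]
    for e_value, q in filter_by_fdr(example, 0.05):
        print(f"E-value {e_value:.0e}  q-value {q:.3f}")
```

Using q-values rather than the raw running FDR makes the accepted set grow monotonically as the cutoff is
relaxed.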
-e [ --mass-error-tolerance ] <a positive integer> Set the error tolerance for precursor and fragment
masses in ppm. Default value: 15.
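Because -e is a relative tolerance, the absolute window in Daltons grows with mass: at the default 15 ppm,
a 10,000 Da precursor is matched within about +/- 0.15 Da, but a 1,000 Da fragment only within +/- 0.015
Da. A minimal Python sketch of this conversion follows; the function names are hypothetical, and
computing the window from the theoretical mass is an assumption.

```python
# Hypothetical helpers: convert a ppm tolerance (as set by -e) into Daltons.


def ppm_to_da(mass_da, ppm):
    """Absolute tolerance in Da for a relative tolerance given in ppm."""
    return mass_da * ppm * 1e-6


def within_tolerance(observed_da, theoretical_da, ppm=15.0):
    """True if the observed mass lies within +/- ppm of the theoretical mass.

    15 ppm mirrors TopMG's default for -e; computing the window from the
    theoretical mass is an assumption of this sketch.
    """
    return abs(observed_da - theoretical_da) <= ppm_to_da(theoretical_da, ppm)


if __name__ == "__main__":
    for mass in (1_000.0, 10_000.0, 30_000.0):
        print(f"{mass:>9,.1f} Da at 15 ppm -> +/- {ppm_to_da(mass, 15):.4f} Da")
```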
-p [ --proteoform-error-tolerance ] <a positive number> Set the error tolerance for identifying PrSM
clusters (in Dalton). Default value: 1.2 Dalton.
-M [ --max-shift ] <a number> Set the maximum absolute value for unexpected mass shifts (in Dalton).
Default value: 500.
-t [ --spectrum-cutoff-type ] <EVALUE|FDR> Set the spectrum level cutoff type for filtering PrSMs.
Default value: EVALUE.
-v [ --spectrum-cutoff-value ] <a positive number> Set the spectrum level cutoff value for filtering
PrSMs. Default value: 0.01.
-T [ --proteoform-cutoff-type ] <EVALUE|FDR> Set the proteoform level cutoff type for filtering
proteoforms and PrSMs. Default value: EVALUE.
-V [ --proteoform-cutoff-value ] <a positive number> Set the proteoform level cutoff value for filtering
proteoforms and PrSMs. Default value: 0.01.
-i [ --mod-file-name ] <a modification file> Specify a text file of variable PTMs. See an example file.
-u [ --thread-number ] <a positive number> Set the number of threads used in the computation. Default
value: 1. The maximum number of threads is determined by the CPU and memory of the computer used for
computation. About 4 GB of memory is required for each thread. If the computer has 16 GB of memory and a
CPU with 8 cores, the maximum number of threads is 4 because 4 threads require 16 GB of memory.
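The rule of thumb above (about 4 GB per thread, capped by the number of CPU cores) can be turned into a
small Python helper for choosing a value for -u. The 4 GB figure comes from this page; the helper names
are hypothetical, and reading physical memory through os.sysconf works on Linux and most Unix-like
systems but not on Windows, where the helper falls back to a single thread.

```python
import os

GB_PER_THREAD = 4  # approximate memory per TopMG thread, as stated on this page


def max_threads(total_mem_gb, cpu_cores):
    """Largest thread count allowed by both CPU cores and available memory.

    Returns at least 1, because -u must be a positive number.
    """
    by_memory = int(total_mem_gb // GB_PER_THREAD)
    return max(1, min(cpu_cores, by_memory))


def physical_memory_gb():
    """Total physical memory in GiB via os.sysconf; 0.0 where unsupported."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        return 0.0


if __name__ == "__main__":
    n = max_threads(physical_memory_gb(), os.cpu_count() or 1)
    print(f"suggested: topmg -i variable_mods.txt -u {n} proteins.fasta spectra_ms2.msalign")
```

The operating system may report slightly less memory than the nominal size, so on a "16 GB" machine the
helper can suggest one thread fewer than the hand calculation; erring low is the safer choice given that
TopMG may crash when memory runs out.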
-x [ --no-topfd-feature ] Specify that there are no TopFD feature files for proteoform identification.
-D [ --use-asf-diagonal ] Use the ASF-DIAGONAL method for protein sequence filtering. The default
filtering method is ASF-RESTRICT. When -D is selected, both ASF-RESTRICT and ASF-DIAGONAL will be used.
The combined approach may identify more PrSMs, but it is much slower than using ASF-RESTRICT only. See
this paper for more details.
-P [ --var-ptm ] <a positive number> Set the maximum number of variable PTM sites in a proteoform.
Default value: 5.
-s [ --num-shift ] <0|1|2> Set the maximum number of unexpected mass shifts in a proteoform. Default
value: 0.
-c [ --combined-file-name ] <a filename> Specify an output file name for combined identifications when
the input consists of multiple spectrum files.
-k [ --keep ] Keep intermediate files generated by TopMG.
ADVANCED OPTIONS
-j [ --proteo-graph-dis ] <a positive number> Set the length of the largest gap in constructing
proteoform graphs. Default value: 40. See this paper for more details.
-G [ --var-ptm-in-gap ] <a positive number> Set the maximum number of variable PTM sites in a gap in a
proteoform graph. Default value: 5. See this paper for more details.
EXAMPLES
To use the following examples, a variable modification file variable_mods.txt is needed in the current
folder. (See an example.)
• Search a deconvoluted MS/MS spectrum file spectra_ms2.msalign against a protein database file
proteins.fasta with feature files. The user does not need to specify the feature file name. TopMG will
automatically obtain the names of feature files from the spectrum file name spectra_ms2.msalign.
topmg -i variable_mods.txt proteins.fasta spectra_ms2.msalign
• Search two deconvoluted MS/MS spectrum files spectra1_ms2.msalign and spectra2_ms2.msalign against a
protein database file proteins.fasta with feature files. In addition, all identifications are combined
and reported using a file name "combined."
topmg -i variable_mods.txt -c combined proteins.fasta spectra1_ms2.msalign spectra2_ms2.msalign
• Search all deconvoluted MS/MS spectrum files in the current folder against a protein database file
proteins.fasta with feature files.
topmg -i variable_mods.txt proteins.fasta *_ms2.msalign
• Search a deconvoluted MS/MS spectrum file spectra_ms2.msalign against a protein database file
proteins.fasta without feature files.
topmg -i variable_mods.txt -x proteins.fasta spectra_ms2.msalign
• Search a deconvoluted MS/MS spectrum file spectra_ms2.msalign against a protein database file
proteins.fasta with feature files and a fixed modification: carbamidomethylation on cysteine.
topmg -i variable_mods.txt -f C57 proteins.fasta spectra_ms2.msalign
• Search a deconvoluted MS/MS spectrum file spectra_ms2.msalign against a protein database file
proteins.fasta with feature files. In an identified proteoform, at most 1 unexpected mass shift and 4
variable PTMs are allowed and the maximum value for unexpected mass shifts is 10,000 Dalton.
topmg -i variable_mods.txt -P 4 -s 1 -M 10000 proteins.fasta spectra_ms2.msalign
• Search a deconvoluted MS/MS spectrum file spectra_ms2.msalign against a protein database file
proteins.fasta with feature files. The error tolerance for precursor and fragment masses is 5 ppm.
topmg -i variable_mods.txt -e 5 proteins.fasta spectra_ms2.msalign
• Search a deconvoluted MS/MS spectrum file spectra_ms2.msalign against a protein database file
proteins.fasta with feature files. Use the target decoy approach to compute spectrum level and
proteoform level FDRs, filter identified proteoform spectrum-matches by a 5% spectrum level FDR, and
filter identified proteoforms by a 5% proteoform level FDR.
topmg -i variable_mods.txt -d -t FDR -v 0.05 -T FDR -V 0.05 proteins.fasta spectra_ms2.msalign
• Search a deconvoluted MS/MS spectrum file spectra_ms2.msalign against a protein database file
proteins.fasta with feature files. Use 6 CPU threads to speed up the computation. About 24 GB of memory
is required for 6 threads. If the computer does not have enough memory, TopMG may crash.
topmg -i variable_mods.txt -u 6 proteins.fasta spectra_ms2.msalign
SEE ALSO
• topfd (1)
• toppic (1)
• topdiff (1)
MAN PAGE PRODUCTION
This man page was written by Filippo Rusconi <lopippo@debian.org>. Material was taken from
http://proteomics.informatics.iupui.edu/software/toppic/manual.html.
AUTHOR
Filippo Rusconi <lopippo@debian.org> and upstream authors (Dr. Xiaowen Liu's Lab at Indiana
University-Purdue University Indianapolis and others)
COPYRIGHT
Filippo Rusconi and Indiana University-Purdue University Indianapolis
1 20200521 TOPMG(1)