Provided by: toppic_1.3.0+dfsg1-4build1_amd64

NAME

       topmg - Top-down mass spectrometry based proteoform identification using Mass Graphs

SYNOPSIS

          topmg [options] database-file-name spectrum-file-names

DESCRIPTION

       TopMG  (Top-down  mass spectrometry based proteoform identification using Mass Graphs) is a software tool
       for identifying ultra-modified proteoforms by searching top-down tandem mass spectra  against  a  protein
       sequence  database.  It  is capable of identifying proteoforms with multiple variable PTMs and unexpected
       alterations, such as histone proteoforms and phosphorylated ones. It uses mass graphs, which  efficiently
       represent  candidate  proteoforms  with  multiple variable PTMs, to increase the speed and sensitivity in
       proteoform identification. In addition, approximate spectrum-based filtering  methods  are  employed  for
       protein  sequence  filtering,  and a Markov chain Monte Carlo method (TopMCMC) is used for estimating the
       statistical significance of identifications.

       1. Input

             • A protein database file in the FASTA format

             • A mass spectrum data file in the msalign format

             • A text file of variable PTMs

             • A text file of fixed PTMs (optional)

             • A text file containing LC/MS feature information (optional)

       2. Output

       TopMG outputs two csv files, an xml file, and a collection of html files for identified proteoforms.  For
       example, when the input mass spectrum data file is spectra_ms2.msalign, the output includes:

          • spectra_ms2_topmg_prsm.csv: a csv file containing proteoform-spectrum matches (PrSMs) identified
            with an E-value or spectrum level FDR cutoff.

          • spectra_ms2_topmg_proteoform.csv: a csv file containing proteoforms identified with an E-value or
            proteoform level FDR cutoff.

          • spectra_ms2_topmg_proteoform.xml: an xml file containing proteoforms identified with an E-value or
            proteoform level FDR cutoff.

          • spectra_html/topmg_prsm_cutoff: a folder containing JavaScript files of PrSMs identified using the
            E-value or spectrum level FDR cutoff.

          • spectra_html/topmg_proteoform_cutoff: a folder containing JavaScript files of PrSMs identified
            using the E-value or proteoform level FDR cutoff.

          • spectra_html/topview: a folder containing html files for the visualization of identified PrSMs.

       To browse identified proteins, proteoforms, and PrSMs, open the file spectra_html/topview/index.html
       in a web browser. Google Chrome is recommended; Firefox and Internet Explorer are not.

       When the input contains two or more spectrum files, TopMG outputs two csv files, an xml file, and a
       collection of html files for each input file. When a file name is specified for combined identifications
       (option -c), TopMG also combines the spectra and proteoforms identified from all the input files, removes
       redundant proteoform identifications, and reports two csv files, an xml file, and a collection of html
       files for the combined results. For example, when the input is spectra1_ms2.msalign and
       spectra2_ms2.msalign and the combined output file name is "combined", the output files are:

          • combined_ms2_topmg_prsm.csv: a csv file containing PrSMs identified from all the input files with
            an E-value or spectrum level FDR cutoff.

          • combined_ms2_topmg_proteoform.csv: a csv file containing proteoforms identified from all the input
            files with an E-value or proteoform level FDR cutoff.

          • combined_ms2_topmg_proteoform.xml: an xml file containing proteoforms identified from all the input
            files with an E-value or proteoform level FDR cutoff.

          • combined_html/topmg_prsm_cutoff: a folder containing JavaScript files of PrSMs identified from all
            the input files using the E-value or spectrum level FDR cutoff.

          • combined_html/topmg_proteoform_cutoff: a folder containing JavaScript files of PrSMs identified
            from all the input files using the E-value or proteoform level FDR cutoff.

          • combined_html/topview: a folder containing html files for the visualization of identified PrSMs.
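
       The naming pattern in the two output lists above can be sketched as a small shell helper. This is
       only an illustration inferred from the examples on this page: the function names are hypothetical,
       and the rule "strip the _ms2.msalign suffix to get the prefix" is an assumption that matches the
       examples shown here, not a documented guarantee.

```shell
# Hypothetical helpers: print the output paths this page lists for a given prefix.
topmg_outputs_for_prefix() {
    prefix=$1
    printf '%s\n' \
        "${prefix}_ms2_topmg_prsm.csv" \
        "${prefix}_ms2_topmg_proteoform.csv" \
        "${prefix}_ms2_topmg_proteoform.xml" \
        "${prefix}_html/topmg_prsm_cutoff" \
        "${prefix}_html/topmg_proteoform_cutoff" \
        "${prefix}_html/topview"
}

# Assumption: the prefix is the spectrum file name without "_ms2.msalign".
topmg_outputs_for_spectrum() {
    case $1 in
        *_ms2.msalign) topmg_outputs_for_prefix "${1%_ms2.msalign}" ;;
        *) echo "expected a file name ending in _ms2.msalign: $1" >&2
           return 1 ;;
    esac
}

topmg_outputs_for_spectrum spectra_ms2.msalign   # single input file
topmg_outputs_for_prefix combined                # name given with -c
```

       Such a list can be handy for checking that a run produced everything it should before moving
       results elsewhere.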

OPTIONS

       -h [ --help ] Print the help message.

       -a [ --activation ] <CID|HCD|ETD|UVPD|FILE> Fragmentation method of tandem mass  spectra.  When  FILE  is
       used, fragmentation methods of spectra are given in the input spectral data file. Default value: FILE.

       -f  [ --fixed-mod ] <C57|C58|a fixed modification file> Set fixed modifications. Three available options:
       C57, C58, or the name of a text file specifying fixed modifications (see an example file).  When  C57  is
       selected,  carbamidomethylation  on  cysteine  is  the  only  fixed  modification.  When C58 is selected,
       carboxymethylation on cysteine is the only fixed modification.

       -n [ --n-terminal-form ] <a list of allowed N-terminal forms> Set  N-terminal  forms  of  proteins.  Four
       N-terminal  forms  can  be  selected:  NONE,  NME, NME_ACETYLATION, and M_ACETYLATION. NONE stands for no
       modifications, NME for N-terminal methionine excision, NME_ACETYLATION for N-terminal  acetylation  after
       the  initiator  methionine  is  removed,  and  M_ACETYLATION  for N-terminal methionine acetylation. When
       multiple    forms    are    allowed,    they    are    separated    by     commas.     Default     value:
       NONE,M_ACETYLATION,NME,NME_ACETYLATION.
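
       A value intended for -n can be checked before a long search is started. The sketch below is a
       hypothetical pre-flight check, not part of TopMG; it accepts only the four form names listed above,
       separated by commas, and may be stricter than TopMG itself.

```shell
# Hypothetical check: succeed only if every comma-separated item is a known N-terminal form.
valid_n_terminal_forms() {
    list=$1
    [ -n "$list" ] || return 1
    # Reject leading, trailing, or doubled commas (empty items).
    case $list in
        ,*|*,|*,,*) return 1 ;;
    esac
    old_ifs=$IFS
    IFS=,
    set -f                      # no pathname expansion while splitting
    status=0
    for form in $list; do
        case $form in
            NONE|NME|NME_ACETYLATION|M_ACETYLATION) ;;
            *) status=1 ;;
        esac
    done
    set +f
    IFS=$old_ifs
    return $status
}

if valid_n_terminal_forms "NONE,NME"; then
    echo "NONE,NME is acceptable for -n"
fi
```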

       -d [ --decoy ] Use a shuffled decoy protein database to estimate spectrum and proteoform level FDRs. When
       -d is chosen, a shuffled decoy database is automatically generated and appended to the target database
       before the database search, and FDRs are estimated using the target-decoy approach.

       -e  [  --mass-error-tolerance  ]  <a positive integer> Set the error tolerance for precursor and fragment
       masses in ppm. Default value: 15.
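
       To get a feel for a ppm tolerance, it can be converted to Dalton for a given mass: a tolerance of
       e ppm at mass m corresponds to m × e / 1,000,000 Da. The helper below is a hypothetical illustration
       of that arithmetic only; it says nothing about how TopMG applies the tolerance internally.

```shell
# Hypothetical helper: convert a ppm tolerance to Dalton for a given mass (in Dalton).
ppm_to_dalton() {
    awk -v mass="$1" -v ppm="$2" 'BEGIN { printf "%.4f\n", mass * ppm / 1000000 }'
}

ppm_to_dalton 10000 15   # default 15 ppm at 10,000 Da: prints 0.1500
ppm_to_dalton 10000 5    # 5 ppm at 10,000 Da: prints 0.0500
```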

       -p [ --proteoform-error-tolerance ] <a positive number> Set the  error  tolerance  for  identifying  PrSM
       clusters (in Dalton). Default value: 1.2 Dalton.

       -M  [  --max-shift  ]  <a  number> Set the maximum absolute value for unexpected mass shifts (in Dalton).
       Default value: 500.

       -t [ --spectrum-cutoff-type ] <EVALUE|FDR> Set the  spectrum  level  cutoff  type  for  filtering  PrSMs.
       Default value: EVALUE.

       -v  [  --spectrum-cutoff-value  ]  <a  positive number> Set the spectrum level cutoff value for filtering
       PrSMs. Default value: 0.01.

       -T [ --proteoform-cutoff-type  ]  <EVALUE|FDR>  Set  the  proteoform  level  cutoff  type  for  filtering
       proteoforms and PrSMs. Default value: EVALUE.

       -V  [ --proteoform-cutoff-value ] <a positive number> Set the proteoform level cutoff value for filtering
       proteoforms and PrSMs. Default value: 0.01.

       -i [ --mod-file-name ] <a modification file> Specify a text file of variable PTMs. See an example file.

       -u [ --thread-number ] <a positive number> Set the number of threads used in the computation. Default
       value: 1. The maximum number of threads is determined by the CPU and memory of the computer used for
       computation. About 4 GB memory is required for each thread. For example, if the computer has 16 GB memory
       and a CPU with 8 cores, the maximum number of threads is 4, because 16 GB of memory supports only 4 threads.
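
       The rule of thumb above (about 4 GB per thread, and no more threads than cores) can be written as a
       short calculation. The function name is hypothetical, and the memory and core counts are passed in
       by hand rather than detected, since detection differs between systems.

```shell
# Hypothetical helper: largest -u value given memory in GB and the number of CPU cores,
# assuming about 4 GB of memory per thread as stated above.
max_topmg_threads() {
    mem_gb=$1
    cores=$2
    by_mem=$((mem_gb / 4))
    if [ "$by_mem" -lt "$cores" ]; then
        n=$by_mem
    else
        n=$cores
    fi
    # -u needs a positive number; with under 4 GB even one thread may run short of memory.
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

max_topmg_threads 16 8   # prints 4
```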

       -x [ --no-topfd-feature ] Specify that there are no TopFD feature files for proteoform identification.

       -D  [  --use-asf-diagonal  ]  Use  the  ASF-DIAGONAL  method  for protein sequence filtering. The default
       filtering method is ASF-RESTRICT. When -D is selected, both ASF-RESTRICT and ASF-DIAGONAL will  be  used.
       The  combined  approach  may identify more PrSMs, but it is much slower than using ASF-RESTRICT only. See
       this paper for more details.

       -P [ --var-ptm ] <a positive number> Set the maximum number  of  variable  PTM  sites  in  a  proteoform.
       Default value: 5.

       -s [ --num-shift ] <0|1|2> Set the maximum number of unexpected mass shifts in a proteoform. Default
       value: 0.

       -c [ --combined-file-name ] <a filename> Specify an output file name for  combined  identifications  when
       the input consists of multiple spectrum files.

       -k [ --keep ] Keep intermediate files generated by TopMG.

ADVANCED OPTIONS

       -j  [  --proteo-graph-dis  ]  <a  positive  number>  Set  the  length  of the largest gap in constructing
       proteoform graphs. Default value: 40. See this paper for more details.

       -G [ --var-ptm-in-gap ] <a positive number> Set the maximum number of variable PTM sites in a  gap  in  a
       proteoform graph. Default value: 5. See this paper for more details.

EXAMPLES

       To use the following examples, a variable modification file variable_mods.txt in the current folder is
       needed. (See an example.)

       • Search a  deconvoluted  MS/MS  spectrum  file  spectra_ms2.msalign  against  a  protein  database  file
         proteins.fasta  with feature files. The user does not need to specify the feature file name. TopMG will
         automatically obtain the names of feature files from the spectrum file name spectra_ms2.msalign.

         topmg -i variable_mods.txt proteins.fasta spectra_ms2.msalign

       • Search two deconvoluted MS/MS spectrum files spectra1_ms2.msalign and  spectra2_ms2.msalign  against  a
         protein  database file proteins.fasta with feature files. In addition, all identifications are combined
         and reported using the file name "combined".

         topmg -i variable_mods.txt -c combined proteins.fasta spectra1_ms2.msalign spectra2_ms2.msalign

       • Search all deconvoluted MS/MS spectrum files in the current folder  against  a  protein  database  file
         proteins.fasta with feature files.

         topmg -i variable_mods.txt proteins.fasta *_ms2.msalign

       • Search  a  deconvoluted  MS/MS  spectrum  file  spectra_ms2.msalign  against  a  protein  database file
         proteins.fasta without feature files.

         topmg -i variable_mods.txt -x proteins.fasta spectra_ms2.msalign

       • Search a  deconvoluted  MS/MS  spectrum  file  spectra_ms2.msalign  against  a  protein  database  file
         proteins.fasta with feature files and a fixed modification: carbamidomethylation on cysteine.

         topmg -i variable_mods.txt -f C57 proteins.fasta spectra_ms2.msalign

       • Search  a  deconvoluted  MS/MS  spectrum  file  spectra_ms2.msalign  against  a  protein  database file
         proteins.fasta with feature files. In an identified proteoform, at most 1 unexpected mass shift  and  4
         variable PTMs are allowed and the maximum value for unexpected mass shifts is 10,000 Dalton.

         topmg -i variable_mods.txt -P 4 -s 1 -M 10000 proteins.fasta spectra_ms2.msalign

       • Search  a  deconvoluted  MS/MS  spectrum  file  spectra_ms2.msalign  against  a  protein  database file
         proteins.fasta with feature files. The error tolerance for precursor and fragment masses is 5 ppm.

         topmg -i variable_mods.txt -e 5 proteins.fasta spectra_ms2.msalign

       • Search a  deconvoluted  MS/MS  spectrum  file  spectra_ms2.msalign  against  a  protein  database  file
         proteins.fasta  with  feature  files.  Use  the  target  decoy  approach  to compute spectrum level and
         proteoform level FDRs, filter identified proteoform-spectrum matches (PrSMs) by a 5% spectrum level FDR, and
         filter identified proteoforms by a 5% proteoform level FDR.

         topmg -i variable_mods.txt -d -t FDR -v 0.05 -T FDR -V 0.05 proteins.fasta spectra_ms2.msalign

       • Search  a  deconvoluted  MS/MS  spectrum  file  spectra_ms2.msalign  against  a  protein  database file
         proteins.fasta with feature files. Use 6 CPU threads to speed up the computation. About 24 GB memory is
         required for 6 threads. If the computer lacks enough memory, TopMG may crash.

         topmg -i variable_mods.txt -u 6 proteins.fasta spectra_ms2.msalign
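
       A note on the wildcard example above: in a POSIX shell, a pattern such as *_ms2.msalign that matches
       no file is passed to the command unchanged, as a literal string. The sketch below is a hypothetical
       guard (not part of TopMG) that counts the matching files first, so that the search is only started
       when there is something to search.

```shell
# Hypothetical guard: count the files that *_ms2.msalign matches in a folder.
count_spectra() {
    dir=${1:-.}
    n=0
    for f in "$dir"/*_ms2.msalign; do
        # An unmatched pattern stays literal, so test that the path really exists.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

if [ "$(count_spectra .)" -eq 0 ]; then
    echo "no *_ms2.msalign files in the current folder" >&2
else
    echo "found $(count_spectra .) spectrum file(s) for the wildcard example"
fi
```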

SEE ALSO

       • topfd (1)

       • toppic (1)

       • topdiff (1)

MAN PAGE PRODUCTION

       This   man   page   was  written  by  Filippo  Rusconi  <lopippo@debian.org>.  Material  was  taken  from
       http://proteomics.informatics.iupui.edu/software/toppic/manual.html.

AUTHOR

       Filippo Rusconi <lopippo@debian.org> and upstream authors (Dr. Xiaowen Liu's Lab at  Indiana  University-
       Purdue University Indianapolis and others)

COPYRIGHT

       Filippo Rusconi and Indiana University-Purdue University Indianapolis