Provided by: gromacs-data_2020-2build1_all


       gmx-tune_pme - Time mdrun as a function of PME ranks to optimize settings


          gmx tune_pme [-s [<.tpr>]] [-cpi [<.cpt>]] [-table [<.xvg>]]
                       [-tablep [<.xvg>]] [-tableb [<.xvg>]]
                       [-rerun [<.xtc/.trr/...>]] [-ei [<.edi>]] [-p [<.out>]]
                       [-err [<.log>]] [-so [<.tpr>]] [-o [<.trr/.cpt/...>]]
                       [-x [<.xtc/.tng>]] [-cpo [<.cpt>]]
                       [-c [<.gro/.g96/...>]] [-e [<.edr>]] [-g [<.log>]]
                       [-dhdl [<.xvg>]] [-field [<.xvg>]] [-tpi [<.xvg>]]
                       [-tpid [<.xvg>]] [-eo [<.xvg>]] [-px [<.xvg>]]
                       [-pf [<.xvg>]] [-ro [<.xvg>]] [-ra [<.log>]]
                       [-rs [<.log>]] [-rt [<.log>]] [-mtx [<.mtx>]]
                       [-swap [<.xvg>]] [-bo [<.trr/.cpt/...>]] [-bx [<.xtc>]]
                       [-bcpo [<.cpt>]] [-bc [<.gro/.g96/...>]] [-be [<.edr>]]
                       [-bg [<.log>]] [-beo [<.xvg>]] [-bdhdl [<.xvg>]]
                       [-bfield [<.xvg>]] [-btpi [<.xvg>]] [-btpid [<.xvg>]]
                       [-bdevout [<.xvg>]] [-brunav [<.xvg>]] [-bpx [<.xvg>]]
                       [-bpf [<.xvg>]] [-bro [<.xvg>]] [-bra [<.log>]]
                       [-brs [<.log>]] [-brt [<.log>]] [-bmtx [<.mtx>]]
                       [-bdn [<.ndx>]] [-bswap [<.xvg>]] [-xvg <enum>]
                       [-mdrun <string>] [-np <int>] [-npstring <enum>]
                       [-ntmpi <int>] [-r <int>] [-max <real>] [-min <real>]
                       [-npme <enum>] [-fix <int>] [-rmax <real>]
                       [-rmin <real>] [-[no]scalevdw] [-ntpr <int>]
                       [-steps <int>] [-resetstep <int>] [-nsteps <int>]
                       [-[no]launch] [-[no]bench] [-[no]check]
                       [-gpu_id <string>] [-[no]append] [-[no]cpnum]
                       [-deffnm <string>]


       For  a  given  number  -np or -ntmpi of ranks, gmx tune_pme systematically times gmx mdrun
       with various numbers of PME-only ranks and determines which setting is  fastest.  It  will
       also  test whether performance can be enhanced by shifting load from the reciprocal to the
       real space part of the Ewald sum.  Simply pass your .tpr file  to  gmx  tune_pme  together
       with other options for gmx mdrun as needed.

       gmx  tune_pme  needs  to call gmx mdrun and so requires that you specify how to call mdrun
       with the argument to the -mdrun parameter. Depending how you have  built  GROMACS,  values
       such as 'gmx mdrun', 'gmx_d mdrun', or 'mdrun_mpi' might be needed.
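
       For example, with an MPI-enabled build installed under the (assumed) binary name
       gmx_mpi, the -mdrun argument would be quoted as a single string. The sketch below
       only prints the resulting command line rather than executing it:

```shell
# Sketch only: 'gmx_mpi mdrun' is an assumed command for an MPI-enabled build;
# substitute whatever your installation provides. Printed, not run.
mdrun_cmd="gmx_mpi mdrun"
# The command contains a space, so it must be quoted when passed to -mdrun:
echo "gmx tune_pme -np 16 -s topol.tpr -mdrun '$mdrun_cmd'"
```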

       The program that runs MPI programs can be set in the environment variable MPIRUN (defaults
       to 'mpirun'). Note that for certain MPI frameworks, you need  to  provide  a  machine-  or
       hostfile. This can also be passed via the MPIRUN variable, e.g.

       export MPIRUN="/usr/local/mpirun -machinefile hosts"

       Note that in such cases it is normally necessary to compile and/or  run  gmx  tune_pme
       without MPI support, so that it can call the MPIRUN program.

       Before doing the actual benchmark runs, gmx tune_pme will do a quick  check  whether  gmx
       mdrun  works  as  expected  with  the  provided  parallel settings if the -check option is
       activated (the default).  Please call gmx tune_pme with the normal options you would  pass
       to  gmx  mdrun  and add -np for the number of ranks to perform the tests on, or -ntmpi for
       the number of threads. You can also add -r to repeat each  test  several  times  to  get
       better statistics.
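
       A minimal benchmarking invocation along those lines might look as follows (the  input
       file name and the rank/repeat counts are placeholder values; the command is printed,
       not executed):

```shell
# Illustrative sketch: benchmark on 8 ranks, repeating each test 3 times.
# topol.tpr is a placeholder input; the command is printed, not run here.
nranks=8
repeats=3
cmd="gmx tune_pme -np $nranks -r $repeats -s topol.tpr"
echo "$cmd"
```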

       gmx tune_pme can test various real space / reciprocal space workloads for you. With  -ntpr
       you  control  how  many extra .tpr files will be written with enlarged cutoffs and smaller
       Fourier grids respectively.  Typically, the  first  test  (number  0)  will  be  with  the
       settings  from  the  input  .tpr  file;  the last test (number ntpr) will have the Coulomb
       cutoff specified by -rmax with a somewhat smaller PME grid at the same time.  In this last
       test, the Fourier spacing is multiplied with rmax/rcoulomb.  The remaining .tpr files will
       have equally-spaced Coulomb radii (and Fourier spacings) between these extremes. Note that
       you can set -ntpr to 1 if you just seek the optimal number of PME-only ranks; in that case
       your input .tpr file will remain unchanged.
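
       As an illustration of such a scan, the sketch below would request 4 benchmark  .tpr
       files with Coulomb radii up to 1.2 nm (the numbers are arbitrary examples, and the
       command is printed rather than executed):

```shell
# Sketch: scan 4 cutoff/grid combinations, capping rcoulomb at 1.2 nm.
# Using -ntpr 1 instead would leave the input .tpr unchanged. Printed, not run.
cmd="gmx tune_pme -np 32 -s topol.tpr -ntpr 4 -rmax 1.2"
echo "$cmd"
```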

       For the benchmark runs, the default of 1000 time steps should suffice for most MD systems.
       The  dynamic  load balancing needs about 100 time steps to adapt to local load imbalances,
       therefore the time step counters are by default reset after 100 steps. For  large  systems
       (>1M  atoms),  as  well  as  for  a  higher  accuracy  of the measurements, you should set
       -resetstep to a higher value.  From the 'DD' load imbalance entries in the  md.log  output
       file you can tell after how many steps the load is sufficiently balanced. Example call:

       gmx tune_pme -np 64 -s protein.tpr -launch

       After calling gmx mdrun several times, detailed performance information is  available  in
       the  output  file perf.out.  Note that during the benchmarks a couple of temporary files
       are written (options -b*); these are automatically deleted after each test.

       If  you want the simulation to be started automatically with the optimized parameters, use
       the command line option -launch.

       Basic support for GPU-enabled mdrun exists. Give a string containing the IDs of  the  GPUs
       that  you wish to use in the optimization in the -gpu_id command-line argument. This works
       exactly like mdrun -gpu_id, does not imply a mapping, and merely declares the eligible set
       of   GPU   devices.  gmx  tune_pme  will  construct  calls  to  mdrun  that  use  this  set
       appropriately. gmx tune_pme does not support -gputasks.
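
       For instance, to declare GPU devices 0 and 1 as eligible during the tuning  runs  (the
       device IDs are system-specific placeholders; the command is printed, not executed):

```shell
# Sketch: make GPUs 0 and 1 eligible, using the same ID-string syntax
# as mdrun -gpu_id. Device IDs depend on your machine. Printed, not run.
cmd="gmx tune_pme -np 4 -s topol.tpr -gpu_id 01"
echo "$cmd"
```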


       Options to specify input files:

       -s [<.tpr>] (topol.tpr)
              Portable xdr run input file

       -cpi [<.cpt>] (state.cpt) (Optional)
              Checkpoint file

       -table [<.xvg>] (table.xvg) (Optional)
              xvgr/xmgr file

       -tablep [<.xvg>] (tablep.xvg) (Optional)
              xvgr/xmgr file

       -tableb [<.xvg>] (table.xvg) (Optional)
              xvgr/xmgr file

       -rerun [<.xtc/.trr/...>] (rerun.xtc) (Optional)
              Trajectory: xtc trr cpt gro g96 pdb tng

       -ei [<.edi>] (sam.edi) (Optional)
              ED sampling input

       Options to specify output files:

       -p [<.out>] (perf.out)
              Generic output file

       -err [<.log>] (bencherr.log)
              Log file

       -so [<.tpr>] (tuned.tpr)
              Portable xdr run input file

       -o [<.trr/.cpt/...>] (traj.trr)
              Full precision trajectory: trr cpt tng

       -x [<.xtc/.tng>] (traj_comp.xtc) (Optional)
              Compressed trajectory (tng format or portable xdr format)

       -cpo [<.cpt>] (state.cpt) (Optional)
              Checkpoint file

       -c [<.gro/.g96/...>] (confout.gro)
              Structure file: gro g96 pdb brk ent esp

       -e [<.edr>] (ener.edr)
              Energy file

       -g [<.log>] (md.log)
              Log file

       -dhdl [<.xvg>] (dhdl.xvg) (Optional)
              xvgr/xmgr file

       -field [<.xvg>] (field.xvg) (Optional)
              xvgr/xmgr file

       -tpi [<.xvg>] (tpi.xvg) (Optional)
              xvgr/xmgr file

       -tpid [<.xvg>] (tpidist.xvg) (Optional)
              xvgr/xmgr file

       -eo [<.xvg>] (edsam.xvg) (Optional)
              xvgr/xmgr file

       -px [<.xvg>] (pullx.xvg) (Optional)
              xvgr/xmgr file

       -pf [<.xvg>] (pullf.xvg) (Optional)
              xvgr/xmgr file

       -ro [<.xvg>] (rotation.xvg) (Optional)
              xvgr/xmgr file

       -ra [<.log>] (rotangles.log) (Optional)
              Log file

       -rs [<.log>] (rotslabs.log) (Optional)
              Log file

       -rt [<.log>] (rottorque.log) (Optional)
              Log file

       -mtx [<.mtx>] (nm.mtx) (Optional)
              Hessian matrix

       -swap [<.xvg>] (swapions.xvg) (Optional)
              xvgr/xmgr file

       -bo [<.trr/.cpt/...>] (bench.trr)
              Full precision trajectory: trr cpt tng

       -bx [<.xtc>] (bench.xtc)
              Compressed trajectory (portable xdr format): xtc

       -bcpo [<.cpt>] (bench.cpt)
              Checkpoint file

       -bc [<.gro/.g96/...>] (bench.gro)
              Structure file: gro g96 pdb brk ent esp

       -be [<.edr>] (bench.edr)
              Energy file

       -bg [<.log>] (bench.log)
              Log file

       -beo [<.xvg>] (benchedo.xvg) (Optional)
              xvgr/xmgr file

       -bdhdl [<.xvg>] (benchdhdl.xvg) (Optional)
              xvgr/xmgr file

       -bfield [<.xvg>] (benchfld.xvg) (Optional)
              xvgr/xmgr file

       -btpi [<.xvg>] (benchtpi.xvg) (Optional)
              xvgr/xmgr file

       -btpid [<.xvg>] (benchtpid.xvg) (Optional)
              xvgr/xmgr file

       -bdevout [<.xvg>] (benchdev.xvg) (Optional)
              xvgr/xmgr file

       -brunav [<.xvg>] (benchrnav.xvg) (Optional)
              xvgr/xmgr file

       -bpx [<.xvg>] (benchpx.xvg) (Optional)
              xvgr/xmgr file

       -bpf [<.xvg>] (benchpf.xvg) (Optional)
              xvgr/xmgr file

       -bro [<.xvg>] (benchrot.xvg) (Optional)
              xvgr/xmgr file

       -bra [<.log>] (benchrota.log) (Optional)
              Log file

       -brs [<.log>] (benchrots.log) (Optional)
              Log file

       -brt [<.log>] (benchrott.log) (Optional)
              Log file

       -bmtx [<.mtx>] (benchn.mtx) (Optional)
              Hessian matrix

       -bdn [<.ndx>] (bench.ndx) (Optional)
              Index file

       -bswap [<.xvg>] (benchswp.xvg) (Optional)
              xvgr/xmgr file

       Other options:

       -xvg <enum> (xmgrace)
              xvg plot formatting: xmgrace, xmgr, none

       -mdrun <string>
              Command line to run a simulation, e.g. 'gmx mdrun' or 'mdrun_mpi'

       -np <int> (1)
              Number of ranks to run the tests on (must be > 2 for separate PME ranks)

       -npstring <enum> (np)
              Name of the $MPIRUN option that specifies the number of ranks to use ('np', or 'n';
              use 'none' if there is no such option): np, n, none

       -ntmpi <int> (1)
              Number of MPI-threads to run the tests on (turns MPI & mpirun off)

       -r <int> (2)
              Repeat each test this often

       -max <real> (0.5)
              Max fraction of PME ranks to test with

       -min <real> (0.25)
              Min fraction of PME ranks to test with

       -npme <enum> (auto)
              Within -min and -max, benchmark all possible values for -npme, or just a reasonable
              subset. Auto neglects -min and -max and chooses reasonable values  around  a  guess
              for npme derived from the .tpr: auto, all, subset

       -fix <int> (-2)
              If  >=  -1,  do not vary the number of PME-only ranks, instead use this fixed value
              and only vary rcoulomb and the PME grid spacing.

        -rmax <real> (0)
               If >0, maximal rcoulomb for -ntpr>1 (rcoulomb upscaling  results  in  fourier  grid
               downscaling)

       -rmin <real> (0)
              If >0, minimal rcoulomb for -ntpr>1

       -[no]scalevdw (yes)
              Scale rvdw along with rcoulomb

       -ntpr <int> (0)
              Number  of  .tpr files to benchmark. Create this many files with different rcoulomb
              scaling factors depending on -rmin and -rmax. If  <  1,  automatically  choose  the
              number of .tpr files to test

       -steps <int> (1000)
              Take timings for this many steps in the benchmark runs

       -resetstep <int> (1500)
              Let  dlb equilibrate this many steps before timings are taken (reset cycle counters
              after this many steps)

       -nsteps <int> (-1)
              If non-negative, perform this many steps in the real run  (overwrites  nsteps  from
              .tpr, add .cpt steps)

       -[no]launch (no)
              Launch the real simulation after optimization

       -[no]bench (yes)
              Run the benchmarks or just create the input .tpr files?

       -[no]check (yes)
              Before the benchmark runs, check whether mdrun works in parallel

       -gpu_id <string>
              List of unique GPU device IDs that are eligible for use

       -[no]append (yes)
              Append  to  previous output files when continuing from checkpoint instead of adding
              the simulation part number to all file names (for launch only)

       -[no]cpnum (no)
              Keep and number checkpoint files (launch only)

       -deffnm <string>
              Set the default filenames (launch only)



       More information about GROMACS is available at <>.


       2020, GROMACS development team