Ubuntu Manpage: gmx-wham - Perform weighted histogram analysis after umbrella sampling

Provided by: gromacs-data_2018.1-1_all

NAME

       gmx-wham - Perform weighted histogram analysis after umbrella sampling

SYNOPSIS

          gmx wham [-ix [<.dat>]] [-if [<.dat>]] [-it [<.dat>]] [-ip [<.dat>]]
                   [-is [<.dat>]] [-iiact [<.dat>]] [-tab [<.dat>]]
                   [-o [<.xvg>]] [-hist [<.xvg>]] [-oiact [<.xvg>]]
                   [-bsres [<.xvg>]] [-bsprof [<.xvg>]] [-xvg <enum>]
                   [-min <real>] [-max <real>] [-[no]auto] [-bins <int>]
                   [-temp <real>] [-tol <real>] [-[no]v] [-b <real>]
                   [-e <real>] [-dt <real>] [-[no]histonly] [-[no]boundsonly]
                   [-[no]log] [-unit <enum>] [-zprof0 <real>] [-[no]cycl]
                   [-[no]sym] [-[no]ac] [-acsig <real>] [-ac-trestart <real>]
                   [-nBootstrap <int>] [-bs-method <enum>] [-bs-tau <real>]
                   [-bs-seed <int>] [-histbs-block <int>] [-[no]vbs]

DESCRIPTION

gmx wham is an analysis program that implements the Weighted Histogram Analysis Method
(WHAM). It is intended to analyze output files generated by umbrella sampling simulations
to compute a potential of mean force (PMF).

gmx wham is currently not fully up to date. It only supports pull setups where the first
pull coordinate(s) is/are umbrella pull coordinates and, if multiple coordinates need to
be analyzed, all used the same geometry and dimensions. In most cases this is not an
issue.

At present, three input modes are supported.

• With option -it, the user provides a file which contains the file names of the umbrella
simulation run-input files (.tpr files), AND, with option -ix, a file which contains
file names of the pullx mdrun output files. The .tpr and pullx files must be in
corresponding order, i.e. the first .tpr created the first pullx, etc.

• Same as the previous input mode, except that the the user provides the pull force output
file names (pullf.xvg) with option -if. From the pull force the position in the
umbrella potential is computed. This does not work with tabulated umbrella potentials.

• With option -ip, the user provides file names of (gzipped) .pdo files, i.e. the GROMACS
3.3 umbrella output files. If you have some unusual reaction coordinate you may also
generate your own .pdo files and feed them with the -ip option into to gmx wham. The
.pdo file header must be similar to the following:

# UMBRELLA 3.0
# Component selection: 0 0 1
# nSkip 1
# Ref. Group 'TestAtom'
# Nr. of pull groups 2
# Group 1 'GR1' Umb. Pos. 5.0 Umb. Cons. 1000.0
# Group 2 'GR2' Umb. Pos. 2.0 Umb. Cons. 500.0
#####

The number of pull groups, umbrella positions, force constants, and names may (of
course) differ. Following the header, a time column and a data column for each pull
group follows (i.e. the displacement with respect to the umbrella center). Up to four
pull groups are possible per .pdo file at present.

By default, all pull coordinates found in all pullx/pullf files are used in WHAM. If only
some of the pull coordinates should be used, a pull coordinate selection file (option -is)
can be provided. The selection file must contain one line for each tpr file in
tpr-files.dat. Each of these lines must contain one digit (0 or 1) for each pull
coordinate in the tpr file. Here, 1 indicates that the pull coordinate is used in WHAM,
and 0 means it is omitted. Example: If you have three tpr files, each containing 4 pull
coordinates, but only pull coordinates 1 and 2 should be used, coordsel.dat looks like
this:

1 1 0 0
1 1 0 0
1 1 0 0

By default, the output files are:

``-o`` PMF output file
``-hist`` Histograms output file

Always check whether the histograms sufficiently overlap.

The umbrella potential is assumed to be harmonic and the force constants are read from the
.tpr or .pdo files. If a non-harmonic umbrella force was applied a tabulated potential can
be provided with -tab.

WHAM options
• -bins Number of bins used in analysis

• -temp Temperature in the simulations

• -tol Stop iteration if profile (probability) changed less than tolerance

• -auto Automatic determination of boundaries

• -min,-max Boundaries of the profile

The data points that are used to compute the profile can be restricted with options -b,
-e, and -dt. Adjust -b to ensure sufficient equilibration in each umbrella window.

With -log (default) the profile is written in energy units, otherwise (with -nolog) as
probability. The unit can be specified with -unit. With energy output, the energy in the
first bin is defined to be zero. If you want the free energy at a different position to
be zero, set -zprof0 (useful with bootstrapping, see below).

For cyclic or periodic reaction coordinates (dihedral angle, channel PMF without osmotic
gradient), the option -cycl is useful. gmx wham will make use of the periodicity of the
system and generate a periodic PMF. The first and the last bin of the reaction coordinate
will assumed be be neighbors.

Option -sym symmetrizes the profile around z=0 before output, which may be useful for,
e.g. membranes.

Parallelization
If available, the number of OpenMP threads used by gmx wham can be controlled by setting
the OMP_NUM_THREADS environment variable.

Autocorrelations
With -ac, gmx wham estimates the integrated autocorrelation time (IACT) tau for each
umbrella window and weights the respective window with 1/[1+2*tau/dt]. The IACTs are
written to the file defined with -oiact. In verbose mode, all autocorrelation functions
(ACFs) are written to hist_autocorr.xvg. Because the IACTs can be severely underestimated
in case of limited sampling, option -acsig allows one to smooth the IACTs along the
reaction coordinate with a Gaussian (sigma provided with -acsig, see output in iact.xvg).
Note that the IACTs are estimated by simple integration of the ACFs while the ACFs are
larger 0.05. If you prefer to compute the IACTs by a more sophisticated (but possibly
less robust) method such as fitting to a double exponential, you can compute the IACTs
with gmx analyze and provide them to gmx wham with the file iact-in.dat (option -iiact),
which should contain one line per input file (.pdo or pullx/f file) and one column per
pull coordinate in the respective file.

Error analysis
Statistical errors may be estimated with bootstrap analysis. Use it with care, otherwise
the statistical error may be substantially underestimated. More background and examples
for the bootstrap technique can be found in Hub, de Groot and Van der Spoel, JCTC (2010)
6: 3713-3720. -nBootstrap defines the number of bootstraps (use, e.g., 100). Four
bootstrapping methods are supported and selected with -bs-method.

• b-hist Default: complete histograms are considered as independent data points, and the
bootstrap is carried out by assigning random weights to the histograms (“Bayesian
bootstrap”). Note that each point along the reaction coordinate must be covered by
multiple independent histograms (e.g. 10 histograms), otherwise the statistical error is
underestimated.

• hist Complete histograms are considered as independent data points. For each
bootstrap, N histograms are randomly chosen from the N given histograms (allowing
duplication, i.e. sampling with replacement). To avoid gaps without data along the
reaction coordinate blocks of histograms (-histbs-block) may be defined. In that case,
the given histograms are divided into blocks and only histograms within each block are
mixed. Note that the histograms within each block must be representative for all
possible histograms, otherwise the statistical error is underestimated.

• traj The given histograms are used to generate new random trajectories, such that the
generated data points are distributed according the given histograms and properly
autocorrelated. The autocorrelation time (ACT) for each window must be known, so use -ac
or provide the ACT with -iiact. If the ACT of all windows are identical (and known), you
can also provide them with -bs-tau. Note that this method may severely underestimate
the error in case of limited sampling, that is if individual histograms do not represent
the complete phase space at the respective positions.

• traj-gauss The same as method traj, but the trajectories are not bootstrapped from the
umbrella histograms but from Gaussians with the average and width of the umbrella
histograms. That method yields similar error estimates like method traj.

Bootstrapping output:

• -bsres Average profile and standard deviations

• -bsprof All bootstrapping profiles

With -vbs (verbose bootstrapping), the histograms of each bootstrap are written, and, with
bootstrap method traj, the cumulative distribution functions of the histograms.

OPTIONS

       Options to specify input files:

       -ix [<.dat>] (pullx-files.dat) (Optional)
              Generic data file

       -if [<.dat>] (pullf-files.dat) (Optional)
              Generic data file

       -it [<.dat>] (tpr-files.dat) (Optional)
              Generic data file

       -ip [<.dat>] (pdo-files.dat) (Optional)
              Generic data file

       -is [<.dat>] (coordsel.dat) (Optional)
              Generic data file

       -iiact [<.dat>] (iact-in.dat) (Optional)
              Generic data file

       -tab [<.dat>] (umb-pot.dat) (Optional)
              Generic data file

       Options to specify output files:

       -o [<.xvg>] (profile.xvg)
              xvgr/xmgr file

       -hist [<.xvg>] (histo.xvg)
              xvgr/xmgr file

       -oiact [<.xvg>] (iact.xvg) (Optional)
              xvgr/xmgr file

       -bsres [<.xvg>] (bsResult.xvg) (Optional)
              xvgr/xmgr file

       -bsprof [<.xvg>] (bsProfs.xvg) (Optional)
              xvgr/xmgr file

       Other options:

       -xvg <enum> (xmgrace)
              xvg plot formatting: xmgrace, xmgr, none

       -min <real> (0)
              Minimum coordinate in profile

       -max <real> (0)
              Maximum coordinate in profile

       -[no]auto (yes)
              Determine min and max automatically

       -bins <int> (200)
              Number of bins in profile

       -temp <real> (298)
              Temperature

       -tol <real> (1e-06)
              Tolerance

       -[no]v (no)
              Verbose mode

       -b <real> (50)
              First time to analyse (ps)

       -e <real> (1e+20)
              Last time to analyse (ps)

       -dt <real> (0)
              Analyse only every dt ps

       -[no]histonly (no)
              Write histograms and exit

       -[no]boundsonly (no)
              Determine min and max and exit (with -auto)

       -[no]log (yes)
              Calculate the log of the profile before printing

       -unit <enum> (kJ)
              Energy unit in case of log output: kJ, kCal, kT

       -zprof0 <real> (0)
              Define profile to 0.0 at this position (with -log)

       -[no]cycl (no)
              Create cyclic/periodic profile. Assumes min and max are the same point.

       -[no]sym (no)
              Symmetrize profile around z=0

       -[no]ac (no)
              Calculate integrated autocorrelation times and use in wham

       -acsig <real> (0)
              Smooth autocorrelation times along reaction coordinate with Gaussian of this sigma

       -ac-trestart <real> (1)
              When computing autocorrelation functions, restart computing every .. (ps)

       -nBootstrap <int> (0)
              nr of bootstraps to estimate statistical uncertainty (e.g., 200)

       -bs-method <enum> (b-hist)
              Bootstrap method: b-hist, hist, traj, traj-gauss

       -bs-tau <real> (0)
              Autocorrelation  time  (ACT)  assumed  for all histograms. Use option -ac if ACT is
              unknown.

       -bs-seed <int> (-1)
              Seed for bootstrapping. (-1 = use time)

       -histbs-block <int> (8)
              When mixing histograms only mix within blocks of -histbs-block.

       -[no]vbs (no)
              Verbose bootstrapping. Print the CDFs and a histogram file for each bootstrap.

COPYRIGHT

       2018, GROMACS development team

NAME

SYNOPSIS

DESCRIPTION

OPTIONS

SEE ALSO

COPYRIGHT