ForceBalance  1.7.4
Automated optimization of force fields and empirical potentials
Usage

This page describes how to use the ForceBalance software.

A good starting point for using this software package is to run the scripts contained in the bin directory on the example jobs in the studies directory.

ForceBalance.py is the main executable script for force field optimization. It requires an input file and a Directory structure. MakeInputFile.py will create an example input file containing all options, their default values, and a short description for each option.

Input file

A minimal input file for ForceBalance might look something like this:

 $options
 jobtype newton
 forcefield water.itp
 $end
 
 $target
 name cluster-02
 type abinitio_gmx
 $end
 
 $target
 name cluster-03
 type abinitio_gmx
 $end

Global options for a ForceBalance job are given in the $options section while the settings for each Target are given in the $target sections. These are the only two section types.

The most important general options to note are: jobtype specifies the optimization algorithm to use and forcefield specifies the force field file name (there may be more than one of these). The most important target options to note are: name specifies the target name and type specifies the type of target (must correspond to a subdirectory in targets/ ). All options are explained in the Option Index.

Directory structure

The directory structure for our example job would look like:

 <root>
   +- forcefield
   |   |- water.itp
   +- targets
   |   +- cluster-02
   |   |   +- settings (contains job settings)
   |   |   |   |- shot.mdp
   |   |   |   |- topol.top
   |   |   |- all.gro (contains geometries)
   |   |   |- qdata.txt (contains QM data)
   |   +- cluster-03
   |   |   +- settings
   |   |   |   |- shot.mdp
   |   |   |   |- topol.top
   |   |   |- all.gro
   |   |   |- qdata.txt
   |   +- <more target directories>
   |- input.in
   +- temp
   |   |- iter_0001
   |   |- iter_0002
   |   |   |- <files generated during runtime>
   +- result
   |   |- water.itp (Optimized force field, generated on completion)
   |- input.in (ForceBalance input file)

The top-level directory names forcefield and targets are fixed and cannot be changed. forcefield contains the force field files that you're optimizing, and targets contains all of the reference data as well as the input files for simulating that data using the force field. Each subdirectory in targets corresponds to a single target, and its contents depend on the specific kind of target and its corresponding Target class.

The temp directory is the temporary workspace of the program, and the result directory is where the optimized force field files are deposited after the optimization job is done. These two directories are created if not already there.

Note the force field file, water.itp and the two fitting targets cluster-02 and cluster-03 match the target sections in the input file. There are two energy and force matching targets here; each directory contains the relevant geometries (in all.gro ) and reference data (in qdata.txt ).

Setting up the targets

There are many targets one can choose from.

One feature of ForceBalance is that targets can be linearly combined to produce an aggregate objective function. For example, our recently developed polarizable water model contains energy and force matching, binding energies, normal mode frequencies, density, and enthalpy of vaporization. With the AMOEBA functional form and 19 adjustable parameters, we developed a highly accurate model that fitted all of these properties to very high accuracy.

Due to the diverse nature of these calculations, they need to be set up in a specific way that is recognized by ForceBalance. The setup is different for each type of simulation, and we invite you to learn by example through looking at the files in the studies directory.

Energy and force matching

In these relatively simple simulations, the objective function is computed from the squared difference in the potential energy and forces (gradients) between the force field and reference (QM) method, evaluated at a number of stored geometries called snapshots. The mathematics are implemented in abinitio.py while the interfaces to simulation software exist in derived classes in gmxio.py, tinkerio.py, amberio.py and openmmio.py.

All energy and force matching targets require a coordinate trajectory file (all.gro ) and a quantum data file (qdata.txt ). The coordinate trajectory file contains the Cartesian coordinates of the snapshots, preferably in the file format of the simulation software used (all.gro is the most extensively tested.) The quantum data file is formatted according to a very simple specification:

 JOB 0
 COORDS x1 y1 z1 x2 y2 z2 ... (floating point numbers in Angstrom)
 ENERGY (floating point number in Hartree)
 FORCES fx1 fy1 fz1 ... (floating point numbers in Hartree/bohr - this is a misnomer because they are actually gradients, which differ by a sign from the forces.)
 
 JOB 1
 ...

The coordinates in the quantum data file should be consistent with the coordinate trajectory file, although ForceBalance will use the latter most of the time. It is easy to generate the quantum data file from parsing the output of quantum chemistry software. ForceBalance contains methods for parsing Q-Chem output files in the molecule.py class.

In addition to all.gro and qdata.txt , the simulation setup files are required. These contain settings needed by the simulation software for the calculation to run. For Gromacs calculations, a topology (.top) file and a run parameter file (.mdp) are required. These should be placed in the settings subdirectory within the directory belonging to the target.

As a side note: If you wish to tune a number in the .mdp file, simply move it to the forcefield directory and specify it as a force field file. ForceBalance will now be able to tune any highlighted parameters in the file, although it will also place copies of this file in all target directories within temp while the program is running.

potentials

ForceBalance contains methods for evaluating electrostatic potentials given a collection of point charges. At this time, this functionality is very experimental and risky to use for systems containing more than one molecule. This is because ForceBalance evaluates the electrostatic potentials internally, and we don't have an infrastructure for building a full topology consisting of many molecules. Currently, we assume that the electrostatic potential fitting contains only one molecule.

Once again, the coordinate trajectory file and quantum data files are used to specify the calculation. However, now the coordinates for evaluating the potential, and the reference potential values, are included:

 JOB 0
 COORDS x1 y1 z1 x2 y2 z2 ...
 ENERGY e
 FORCES fx1 fy1 fz1 ...
 ESPXYZ ex1 ey1 ez1 ex2 ey2 ez2 ... 
 ESPVAL ev1 ev2 ev3 ..

Running the optimization

To run ForceBalance, make sure the calculation is set up properly (refer to the above sections), and then type in:

ForceBalance.py input.in

In general it's impossible to set up a calculation perfectly the first time, in which case the calculation will crash. ForceBalance will try to print helpful error messages to guide you toward setting up your calculation properly.

Further example inputs and outputs are given in the Tutorial section.