![]() |
ForceBalance
1.7.4
Automated optimization of force fields and empirical potentials
|
This is a tutorial page, but if you haven't installed ForceBalance yet please go to the Installation page first.
It is very much in process, and there are many more examples to come.
After everything is installed, go to the studies/001_water_tutorial
directory in the distribution. Extract the targets.tar.bz2
archive file. Now execute:
ForceBalance.py very_simple.in
If the installation was successful, you will get an output file similar to very_simple.out
. What follows is a description of the output file and what ForceBalance is actually doing.
ForceBalance begins by reading the force field files from the forcefield
directory. The parameters to be optimized are specified in the parameter file by adding a special comment inside the file. For example, in the water.itp
file, we specify that the two van der Waals parameters on oxygen are to be optimized, using the following syntax:
OW 8 15.99940 0.000 A 3.15365e-01 6.48520e-01 ; PRM 5 6
The comment PRM 5 6
signals that "the parameters in fields 5
and 6 are to be optimized." The force field parser stores the physical value of the parameter and gives the parameter a name. These are printed out in the output file:
Reading force field from file: water.itp #=========================================================# #| Starting parameter indices, physical values and IDs |# #=========================================================# 0 [ 3.1537e-01 ] : VDWS:OW 1 [ 6.4852e-01 ] : VDWT:OW 2 [ 9.5720e-02 ] : BONDSB:HWOW 3 [ 5.0242e+05 ] : BONDSK:HWOW 4 [ 1.0452e+02 ] : ANGLESB:HWOWHW 5 [ 6.2802e+02 ] : ANGLESK:HWOWHW 6 [ 5.2000e-01 ] : COUL:SOL-2 COUL:SOL-3 7 [ -1.0400e+00 ] : COUL:SOL-4 8 [ 1.2800e-01 ] : VSITE3B:SOL-4 VSITE3A:SOL-4 -----------------------------------------------------------
The next section it prints out are a set of rescaling factors which are important for various aspects of the optimization. They are discussed further in this forum post. For now it suffices to say that these values represent the natural size of the parameter, or more specifically how much the parameter is expected to vary.
#========================================================# #| Rescaling Factors (Lower Takes Precedence): |# #========================================================# BONDSB : 5.29177e-02 BONDSK : 9.37583e+05 VSITE3A : 5.29177e-02 VSITE3B : 5.29177e-02 ANGLESB : 5.72958e+01 VDWS : 5.29177e-02 ANGLESK : 6.05928e+02 COUL : 1.00000e+00 VDWT : 2.47894e+00 ----------------------------------------------------------
Next, it prints out user-specified options that pertain to the force field the targets, the objective function and the optimizer. Options that are left at their default values (in this case, most) are not printed out. Use verbose_options True
in the input file to print out all of the options.
Now for the good stuff - the optimizer begins. ForceBalance computes each target and prints out an indicator for each one, then provides a breakdown of the overall objective function:
#========================================================# #| Main Optimizer |# #| Newton-Raphson Mode (Adaptive Radius) |# #========================================================# #=======================================================================# #| Target: cluster-06 Type: AbInitio_GMX Objective = 1.12035e-01 |# #| Difference Denominator Percent |# #| Physical Variable (Calc-Ref) RMS (Ref) Difference |# #=======================================================================# Energy (kJ/mol) 9.4124 27.3135 34.4605% Gradient (kJ/mol/A) 39.1963 119.0841 32.9148% ------------------------------------------------------------------------- #=======================================================================# #| Target: cluster-12 Type: AbInitio_GMX Objective = 1.04039e-01 |# #| Difference Denominator Percent |# #| Physical Variable (Calc-Ref) RMS (Ref) Difference |# #=======================================================================# Energy (kJ/mol) 15.2291 47.3455 32.1658% Gradient (kJ/mol/A) 38.5401 118.0240 32.6545% ------------------------------------------------------------------------- #====================================================================# #| Objective Function Breakdown |# #| Target Name Residual x Weight = Contribution |# #====================================================================# cluster-06 0.11203 0.500 5.60173e-02 cluster-12 0.10404 0.500 5.20195e-02 Regularization 0.00000 1.000 0.00000e+00 Total 1.08037e-01 ---------------------------------------------------------------------- Step |k| |dk| |grad| -=X2=- Delta(X2) StepQual 0 0.000e+00 0.000e+00 3.206e+00 1.08037e-01 0.000e+00 0.000
In this example job, the targets were QM energies and forces for clusters of 6 and 12 water molecules. In the initial step (using the default TIP4P parameters) and for the first target (6-mers), the RMS error for energies is 9.4124 kJ/mol (34% of the variance in the QM energies themselves), and the RMS error for atomistic forces is 32% (again, scaled to the variance of the QM forces). Similar information is printed out for the 12-mers. Each target contributes to the overall objective function, whose value is 1.080e-01. The parameters are at their initial values, which means that any penalty function will have a value of zero (the Regularization term).
Next, ForceBalance takes a step in the parameter space. The default algorithm (a variation of Newton-Raphson) uses first and second derivative information; the gradient is printed to the screen, as is the parameter step.
#========================================================# #| Total Gradient |# #========================================================# 0 [ -1.36605285e-01 ] : VDWS:OW 1 [ -2.24335748e-01 ] : VDWT:OW 2 [ -3.14688760e+00 ] : BONDSB:HWOW 3 [ 3.54975985e-01 ] : BONDSK:HWOW 4 [ -3.24607484e-01 ] : ANGLESB:HWOWHW 5 [ 8.92900123e-02 ] : ANGLESK:HWOWHW 6 [ -7.50893285e-02 ] : COUL:SOL-2 COUL:SOL-3 7 [ -2.44318391e-01 ] : COUL:SOL-4 8 [ -2.23561237e-02 ] : VSITE3B:SOL-4 VSITE3A:SOL-4 ---------------------------------------------------------- Levenberg-Marquardt: Newton-Raphson step found (length 1.000e-01), 0.92958359 added to Hessian diagonal #========================================================# #| Mathematical Parameters (Current + Step = Next) |# #========================================================# 0 [ 0.0000e+00 + 2.3057e-02 = 2.3057e-02 ] : VDWS:OW 1 [ 0.0000e+00 + 3.7320e-02 = 3.7320e-02 ] : VDWT:OW 2 [ 0.0000e+00 + 5.7207e-03 = 5.7207e-03 ] : BONDSB:HWOW 3 [ 0.0000e+00 - 3.8228e-02 = -3.8228e-02 ] : BONDSK:HWOW 4 [ 0.0000e+00 + 8.6160e-03 = 8.6160e-03 ] : ANGLESB:HWOWHW 5 [ 0.0000e+00 - 7.4597e-02 = -7.4597e-02 ] : ANGLESK:HWOWHW 6 [ 0.0000e+00 - 7.4155e-03 = -7.4155e-03 ] : COUL:SOL-2 COUL:SOL-3 7 [ 0.0000e+00 + 2.9107e-02 = 2.9107e-02 ] : COUL:SOL-4 8 [ 0.0000e+00 + 6.3479e-03 = 6.3479e-03 ] : VSITE3B:SOL-4 VSITE3A:SOL-4 ---------------------------------------------------------- #========================================================# #| Physical Parameters (Current + Step = Next) |# #========================================================# 0 [ 3.1537e-01 + 1.2201e-03 = 3.1659e-01 ] : VDWS:OW 1 [ 6.4852e-01 + 9.2513e-02 = 7.4103e-01 ] : VDWT:OW 2 [ 9.5720e-02 + 3.0273e-04 = 9.6023e-02 ] : BONDSB:HWOW 3 [ 5.0242e+05 - 3.5842e+04 = 4.6657e+05 ] : BONDSK:HWOW 4 [ 1.0452e+02 + 4.9366e-01 = 1.0501e+02 ] : ANGLESB:HWOWHW 5 [ 6.2802e+02 - 4.5200e+01 = 5.8282e+02 ] : ANGLESK:HWOWHW 6 [ 5.2000e-01 - 7.4155e-03 = 5.1258e-01 ] : COUL:SOL-2 COUL:SOL-3 7 [ -1.0400e+00 + 2.9107e-02 = -1.0109e+00 ] : COUL:SOL-4 8 [ 1.2800e-01 + 3.3591e-04 = 1.2834e-01 ] : VSITE3B:SOL-4 VSITE3A:SOL-4 ----------------------------------------------------------
Note that the step length is limited to a "trust radius" of 0.1 - this option is tunable. The step in parameter space is given in terms of the "mathematical parameters" - the internal optimization variables - and the "physical parameters" which are the actual values in the force field files. The mathematical parameters are mainly useful because they can be used to restart an optimization by creating a read_mvals
section to the input file and pasting the lines from the output.
ForceBalance now computes the objective function again, using the new parameter values.
#=======================================================================# #| Target: cluster-06 Type: AbInitio_GMX Objective = 7.55909e-02 |# #| Difference Denominator Percent |# #| Physical Variable (Calc-Ref) RMS (Ref) Difference |# #=======================================================================# Energy (kJ/mol) 8.0920 27.3135 29.6263% Gradient (kJ/mol/A) 30.1378 119.0841 25.3080% ------------------------------------------------------------------------- #=======================================================================# #| Target: cluster-12 Type: AbInitio_GMX Objective = 7.17029e-02 |# #| Difference Denominator Percent |# #| Physical Variable (Calc-Ref) RMS (Ref) Difference |# #=======================================================================# Energy (kJ/mol) 13.2306 47.3455 27.9447% Gradient (kJ/mol/A) 30.1330 118.0240 25.5312% ------------------------------------------------------------------------- #===================================================================================# #| Objective Function Breakdown |# #| Target Name Residual x Weight = Contribution (Current-Prev) |# #===================================================================================# cluster-06 0.07559 0.500 3.77955e-02 ( -1.822e-02 ) cluster-12 0.07170 0.500 3.58514e-02 ( -1.617e-02 ) Regularization 0.00010 1.000 9.99999e-05 ( +1.000e-04 ) Total 7.37469e-02 ( -3.429e-02 ) ------------------------------------------------------------------------------------- Step |k| |dk| |grad| -=X2=- Delta(X2) StepQual 1 1.000e-01 1.000e-01 1.941e-01 7.37469e-02 3.429e-02 1.001
Using the new parameter values, the values for each target have gone down - that is to say, the force field now produces better agreement with the reference data. In the objective function breakdown, improvements (i.e. decreasing values) from the previous step are printed in green while increasing values are printed in red. The "Regularization" term is printed in red because the parameters have moved from their initial values, so the penalty function is now finite.
The last line reports:
Step
) |k|
) |dk|
) |grad|
) -=X2=-
) StepQual
value of 1.0 signifies that the trust radius can be increased.Eventually, the optimization will converge. For this job (and when this documentation was written) it took five steps:
#=======================================================================# #| Target: cluster-06 Type: AbInitio_GMX Objective = 6.30676e-02 |# #| Difference Denominator Percent |# #| Physical Variable (Calc-Ref) RMS (Ref) Difference |# #=======================================================================# Energy (kJ/mol) 7.9083 27.3135 28.9539% Gradient (kJ/mol/A) 24.1991 119.0841 20.3210% ------------------------------------------------------------------------- #=======================================================================# #| Target: cluster-12 Type: AbInitio_GMX Objective = 5.89806e-02 |# #| Difference Denominator Percent |# #| Physical Variable (Calc-Ref) RMS (Ref) Difference |# #=======================================================================# Energy (kJ/mol) 12.9053 47.3455 27.2578% Gradient (kJ/mol/A) 24.3021 118.0240 20.5908% ------------------------------------------------------------------------- #===================================================================================# #| Objective Function Breakdown |# #| Target Name Residual x Weight = Contribution (Current-Prev) |# #===================================================================================# cluster-06 0.06307 0.500 3.15338e-02 ( -4.544e-07 ) cluster-12 0.05898 0.500 2.94903e-02 ( -7.454e-06 ) Regularization 0.00170 1.000 1.70115e-03 ( +7.704e-06 ) Total 6.27252e-02 ( -2.045e-07 ) ------------------------------------------------------------------------------------- Step |k| |dk| |grad| -=X2=- Delta(X2) StepQual 5 4.124e-01 2.498e-03 2.130e-05 6.27252e-02 2.045e-07 1.029 Convergence criterion reached for gradient norm (1.00e-04) @========================================================@ @| Final objective function value |@ @| Full: 6.272524e-02 Un-penalized: 6.102410e-02 |@ @========================================================@ #========================================================# #| Final optimization parameters: |# #| Paste to input file to restart |# #========================================================# read_mvals 0 [ 3.3161e-02 ] : VDWS:OW 1 [ 4.3311e-02 ] : VDWT:OW 2 [ 5.5070e-03 ] : BONDSB:HWOW 3 [ -4.5933e-02 ] : BONDSK:HWOW 4 [ 1.5497e-02 ] : ANGLESB:HWOWHW 5 [ -3.7655e-01 ] : ANGLESK:HWOWHW 6 [ 2.4929e-03 ] : COUL:SOL-2 COUL:SOL-3 7 [ 1.1874e-02 ] : COUL:SOL-4 8 [ 1.5108e-01 ] : VSITE3B:SOL-4 VSITE3A:SOL-4 /read_mvals #========================================================# #| Final physical parameters: |# #========================================================# 0 [ 3.1712e-01 ] : VDWS:OW 1 [ 7.5589e-01 ] : VDWT:OW 2 [ 9.6011e-02 ] : BONDSB:HWOW 3 [ 4.5935e+05 ] : BONDSK:HWOW 4 [ 1.0541e+02 ] : ANGLESB:HWOWHW 5 [ 3.9986e+02 ] : ANGLESK:HWOWHW 6 [ 5.2249e-01 ] : COUL:SOL-2 COUL:SOL-3 7 [ -1.0281e+00 ] : COUL:SOL-4 8 [ 1.3599e-01 ] : VSITE3B:SOL-4 VSITE3A:SOL-4 ---------------------------------------------------------- The final force field has been printed to the 'result' directory. #========================================================# #| Congratulations, ForceBalance has finished |# #| Give yourself a pat on the back! |# #========================================================#
As you can see, the objective function has decreased considerably since the previous step, and most of the improvement was due to reducing the error in the atomistic forces. In the result
directory, you will find an updated water.itp
file with the optimized parameter values.
This newly generated force field is a better fit to the reference data, but is it actually a better force field or did we just overfit the data? To answer this question, look at validate.in
where the job type is set to single
, and there are many more targets. In particular, we are now including QM energies and forces for many cluster sizes ranging from 2 to 12.
Take the read_mvals
section from the output file of your previous run, paste it intointo the $options
section of validate.in
, and run ForceBalance.py validate.in
. ForceBalance will now evaluate the objective function using the force field parameters from the previous optimization.
You should see the following output:
#=======================================================================# #| Target: cluster-02 Type: AbInitio_GMX Objective = 6.59279e-02 |# #| Difference Denominator Percent |# #| Physical Variable (Calc-Ref) RMS (Ref) Difference |# #=======================================================================# Energy (kJ/mol) 2.6852 8.9926 29.8605% Gradient (kJ/mol/A) 24.4491 120.0100 20.3725% ------------------------------------------------------------------------- #=======================================================================# #| Target: cluster-03 Type: AbInitio_GMX Objective = 6.86838e-02 |# #| Difference Denominator Percent |# #| Physical Variable (Calc-Ref) RMS (Ref) Difference |# #=======================================================================# Energy (kJ/mol) 4.1222 13.3676 30.8370% Gradient (kJ/mol/A) 24.1603 119.6514 20.1922% ------------------------------------------------------------------------- #=======================================================================# #| Target: cluster-04 Type: AbInitio_GMX Objective = 6.99336e-02 |# #| Difference Denominator Percent |# #| Physical Variable (Calc-Ref) RMS (Ref) Difference |# #=======================================================================# Energy (kJ/mol) 5.3337 17.0641 31.2567% Gradient (kJ/mol/A) 24.4117 121.2622 20.1313% ------------------------------------------------------------------------- #=======================================================================# #| Target: cluster-05 Type: AbInitio_GMX Objective = 6.83413e-02 |# #| Difference Denominator Percent |# #| Physical Variable (Calc-Ref) RMS (Ref) Difference |# #=======================================================================# Energy (kJ/mol) 6.4445 20.9275 30.7946% Gradient (kJ/mol/A) 24.2035 120.2407 20.1292% ------------------------------------------------------------------------- #=======================================================================# #| Target: cluster-06 Type: AbInitio_GMX Objective = 6.30676e-02 |# #| Difference Denominator Percent |# #| Physical Variable (Calc-Ref) RMS (Ref) Difference |# #=======================================================================# Energy (kJ/mol) 7.9083 27.3135 28.9539% Gradient (kJ/mol/A) 24.1990 119.0841 20.3209% ------------------------------------------------------------------------- #=======================================================================# #| Target: cluster-07 Type: AbInitio_GMX Objective = 7.23291e-02 |# #| Difference Denominator Percent |# #| Physical Variable (Calc-Ref) RMS (Ref) Difference |# #=======================================================================# Energy (kJ/mol) 9.0541 28.5018 31.7666% Gradient (kJ/mol/A) 24.5603 119.6730 20.5228% ------------------------------------------------------------------------- #=======================================================================# #| Target: cluster-08 Type: AbInitio_GMX Objective = 6.47534e-02 |# #| Difference Denominator Percent |# #| Physical Variable (Calc-Ref) RMS (Ref) Difference |# #=======================================================================# Energy (kJ/mol) 9.9105 33.9780 29.1676% Gradient (kJ/mol/A) 24.5318 118.7419 20.6598% ------------------------------------------------------------------------- #=======================================================================# #| Target: cluster-09 Type: AbInitio_GMX Objective = 6.31661e-02 |# #| Difference Denominator Percent |# #| Physical Variable (Calc-Ref) RMS (Ref) Difference |# #=======================================================================# Energy (kJ/mol) 10.6633 37.0249 28.8003% Gradient (kJ/mol/A) 24.5632 119.9943 20.4703% ------------------------------------------------------------------------- #=======================================================================# #| Target: cluster-10 Type: AbInitio_GMX Objective = 6.16248e-02 |# #| Difference Denominator Percent |# #| Physical Variable (Calc-Ref) RMS (Ref) Difference |# #=======================================================================# Energy (kJ/mol) 11.3800 40.4430 28.1383% Gradient (kJ/mol/A) 24.6246 119.4050 20.6227% ------------------------------------------------------------------------- #=======================================================================# #| Target: cluster-11 Type: AbInitio_GMX Objective = 5.88603e-02 |# #| Difference Denominator Percent |# #| Physical Variable (Calc-Ref) RMS (Ref) Difference |# #=======================================================================# Energy (kJ/mol) 12.7370 47.0656 27.0623% Gradient (kJ/mol/A) 24.6241 118.6740 20.7493% ------------------------------------------------------------------------- #=======================================================================# #| Target: cluster-12 Type: AbInitio_GMX Objective = 5.89805e-02 |# #| Difference Denominator Percent |# #| Physical Variable (Calc-Ref) RMS (Ref) Difference |# #=======================================================================# Energy (kJ/mol) 12.9053 47.3455 27.2578% Gradient (kJ/mol/A) 24.3021 118.0240 20.5908% ------------------------------------------------------------------------- #=======================================================================# #| Target: cluster-13 Type: AbInitio_GMX Objective = 5.94277e-02 |# #| Difference Denominator Percent |# #| Physical Variable (Calc-Ref) RMS (Ref) Difference |# #=======================================================================# Energy (kJ/mol) 13.7931 50.5451 27.2887% Gradient (kJ/mol/A) 24.7275 119.3178 20.7241% ------------------------------------------------------------------------- #=======================================================================# #| Target: cluster-14 Type: AbInitio_GMX Objective = 5.53305e-02 |# #| Difference Denominator Percent |# #| Physical Variable (Calc-Ref) RMS (Ref) Difference |# #=======================================================================# Energy (kJ/mol) 14.1450 54.7952 25.8143% Gradient (kJ/mol/A) 24.6526 119.0962 20.6997% ------------------------------------------------------------------------- #====================================================================# #| Objective Function Breakdown |# #| Target Name Residual x Weight = Contribution |# #====================================================================# cluster-02 0.06593 0.077 5.07138e-03 cluster-03 0.06868 0.077 5.28337e-03 cluster-04 0.06993 0.077 5.37951e-03 cluster-05 0.06834 0.077 5.25702e-03 cluster-06 0.06307 0.077 4.85135e-03 cluster-07 0.07233 0.077 5.56378e-03 cluster-08 0.06475 0.077 4.98103e-03 cluster-09 0.06317 0.077 4.85893e-03 cluster-10 0.06162 0.077 4.74037e-03 cluster-11 0.05886 0.077 4.52772e-03 cluster-12 0.05898 0.077 4.53697e-03 cluster-13 0.05943 0.077 4.57136e-03 cluster-14 0.05533 0.077 4.25619e-03 Regularization 0.00170 1.000 1.70118e-03 Total 6.55802e-02 ----------------------------------------------------------------------
As you can see, the agreement is comparable for all of the cluster sizes, and this effectively means that we were able to achieve an accurate fit to QM energies and forces for a wide range of cluster sizes using only the 6-mers and 12-mers.
A truly good force field needs to accurately reproduce experimental measurements, but these are more difficult to compute and optimize. ForceBalance provides methods for optimizing using experimental targets, but it is beyond the scope of this tutorial. However, hopefully this simple example helps to explain how force field optimization works within the framework of ForceBalance.
Feel free to explore using the other provided input files:
0.energy_force.in
uses all of the targets - cluster sizes 2 through 14 - in the optimization. 1.netforce_torque.in
includes net forces on water molecules and torques in the optimization. 2.L1_penalty.in
uses a L1 penalty function, effectively causing only some parameters to change and not others. 3.no_penalty.in
illustrates what happens when no penalty function is used at all. 4.change_factor.in
shows the effect of changing the rescaling factors. 5.gradient.in
performs a finite-difference check on the objective function gradient.