ForceBalance API  1.3
Automated optimization of force fields and empirical potentials
ffyapf.py
Go to the documentation of this file.
1 """ @package forcebalance.forcefield
2 
3 Force field module.
4 
5 In ForceBalance a 'force field' is built from a set of files containing
6 physical parameters. These files can be anything that enter into any
7 computation - our original program was quite dependent on the GROMACS
8 force field format, but this program is set up to allow very general
9 input formats.
10 
11 We introduce several important concepts:
12 
13 1) Adjustable parameters are allocated into a vector.
14 
15 To cast the force field optimization as a math problem, we treat all
16 of the parameters on equal footing and write them as indices in a parameter vector.
17 
18 2) A mapping from interaction type to parameter number.
19 
20 Each element in the parameter vector corresponds to one or more
21 interaction types. Whenever we change the parameter vector and
22 recompute the objective function, this amounts to changing the
23 physical parameters in the simulations, so we print out new
24 force field files for external programs. In addition, when
25 these programs are computing the objective function we are often
26 in low-level subroutines that compute terms in the energy and
27 force. If we need an analytic derivative of the objective
28 function, then these subroutines need to know which index of the
29 parameter vector needs to be modified.
30 
31 This is done by way of a hash table: for example, when we are
32 computing a Coulomb interaction between atom 4 and atom 5, we
33 can build the words 'COUL4' and 'COUL5' and look it up in the
34 parameter map; this gives us two numbers (say, 10 and 11)
35 corresponding to the eleventh and twelfth element of the
36 parameter vector. Then we can compute the derivatives of
37 the energy w/r.t. these parameters (in this case, COUL5/rij
38 and COUL4/rij) and increment these values in the objective function
39 gradient.
40 
41 In custom-implemented force fields (see counterpoisematch.py)
42 the hash table can also be used to look up parameter values for
43 computation of interactions. This is probably not the fastest way
44 to do things, however.
45 
46 3) Distinction between physical and mathematical parameters.
47 
48 The optimization algorithm works in a space that is related to, but
49 not exactly the same as the physical parameter space. The reasons
50 for why we do this are:
51 
52 a) Each parameter has its own physical units. On the one hand
53 it's not right to treat different physical units all on the same
54 footing, so nondimensionalization is desirable. To make matters
55 worse, the force field parameters can be small as 1e-8 or as
56 large as 1e+6 depending on the parameter type. This means the
57 elements of the objective function gradient / Hessian have
58 elements that differ from each other in size by 10+ orders of
59 magnitude, leading to mathematical instabilities in the optimizer.
60 
61 b) The parameter space can be constrained, most notably for atomic
62 partial charges where we don't want to change the overall charge
63 on a molecule. Thus we wish to project out certain movements
64 in the mathematical parameters such that they don't change the physical
65 parameters.
66 
67 c) We wish to regularize our optimization so as to avoid changing
68 our parameters in very insensitive directions (linear dependencies).
69 However, the sensitivity of the objective function to changes in the
70 force field depends on the physical units!
71 
72 For all of these reasons, we introduce a 'transformation matrix'
73 which maps mathematical parameters onto physical parameters. The
74 diagonal elements in this matrix are rescaling factors; they take the
75 mathematical parameter and magnify it by this constant factor. The
76 off-diagonal elements correspond to rotations and other linear
77 transformations, and currently I just use them to project out the
78 'increase the net charge' direction in the physical parameter space.
79 
80 Note that with regularization, these rescaling factors are equivalent
81 to the widths of prior distributions in a maximum likelihood framework.
82 Because there is such a correspondence between rescaling factors and
83 choosing a prior, they need to be chosen carefully. This is work in
84 progress. Another possibility is to sample the width of the priors from
85 a noninformative distribution -- the hyperprior (we can choose the
86 Jeffreys prior or something). This is work in progress.
87 
88 Right now only GROMACS parameters are supported, but this class is extensible,
89 we need more modules!
90 
91 @author Lee-Ping Wang
92 @date 04/2012
93 
94 """
95 from __future__ import division
96 
97 from builtins import zip
98 from builtins import str
99 from builtins import range
100 import os
101 import sys
102 import numpy as np
103 from numpy import sin, cos, tan, sinh, cosh, tanh, exp, log, sqrt, pi
104 from re import match, sub, split
105 import forcebalance
106 from forcebalance import gmxio, qchemio, tinkerio, custom_io, openmmio, amberio, psi4io, smirnoffio
107 from forcebalance.finite_difference import in_fd
108 from forcebalance.nifty import *
109 # from string import count
110 from copy import deepcopy
111 import traceback
112 import itertools
113 from collections import OrderedDict, defaultdict
114 from forcebalance.output import getLogger
115 logger = getLogger(__name__)
117 try:
118  from lxml import etree
119 except:
120  logger.warning("Failed to import lxml module, needed by OpenMM engine")
121 
122 FF_Extensions = {
123  "itp": "gmx",
124  "top": "gmx",
125  "in": "qchem",
126  "prm": "tinker",
127  "gen": "custom",
128  "xml": "openmm",
129  "offxml": "smirnoff",
130  "frcmod": "frcmod",
131  "mol2": "mol2",
132  "gbs": "gbs",
133  "grid": "grid"
134 }
135 """ Recognized force field formats. """
136 FF_IOModules = {
138  "qchem": qchemio.QCIn_Reader,
139  "tinker": tinkerio.Tinker_Reader,
140  "custom": custom_io.Gen_Reader,
141  "openmm": openmmio.OpenMM_Reader,
142  "smirnoff": smirnoffio.SMIRNOFF_Reader,
143  "frcmod": amberio.FrcMod_Reader,
144  "mol2": amberio.Mol2_Reader,
145  "gbs": psi4io.GBS_Reader,
146  "grid": psi4io.Grid_Reader
147 }
148 
149 
150 def determine_fftype(ffname, verbose=False):
151  """ Determine the type of a force field file. It is possible to
152  specify the file type explicitly in the input file using the
153  syntax 'force_field.ext:type'. Otherwise this function will try
154  to determine the force field type by extension. """
155 
156  fsplit = ffname.split('/')[-1].split(':')
157  fftype = None
158  if verbose: logger.info("Determining file type of %s ..." % fsplit[0])
159  if len(fsplit) == 2:
160  if fsplit[1] in FF_IOModules:
161  if verbose: logger.info("We're golden! (%s)\n" % fsplit[1])
162  fftype = fsplit[1]
163  else:
164  if verbose:
165  logger.info(
166  "\x1b[91m Warning: \x1b[0m %s not in supported types (%s)!\n"
167  % (fsplit[1], ', '.join(FF_IOModules.keys())))
168  elif len(fsplit) == 1:
169  if verbose:
170  logger.info(
171  "Guessing from extension (you may specify type with filename:type) ..."
172  )
173  ffname = fsplit[0]
174  ffext = ffname.split('.')[-1]
175  if ffext in FF_Extensions:
176  guesstype = FF_Extensions[ffext]
177  if guesstype in FF_IOModules:
178  if verbose:
179  logger.info("guessing %s -> %s!\n" % (ffext, guesstype))
180  fftype = guesstype
181  else:
182  if verbose:
183  logger.info(
184  "\x1b[91m Warning: \x1b[0m %s not in supported types (%s)!\n"
185  % (fsplit[0], ', '.join(FF_IOModules.keys())))
186  else:
187  if verbose:
188  logger.info(
189  "\x1b[91m Warning: \x1b[0m %s not in supported extensions (%s)!\n"
190  % (ffext, ', '.join(FF_Extensions.keys())))
191  if fftype is None:
192  if verbose: logger.warning("Force field type not determined!\n")
193  #sys.exit(1)
194  return fftype
195 
196 
197 # Thanks to tos9 from #python on freenode. :)
198 class BackedUpDict(dict):
199  def __init__(self, backup_dict):
200  super(BackedUpDict, self).__init__()
201  self.backup_dict = backup_dict
202 
203  def __missing__(self, key):
204  try:
205  return self.backup_dict[self['AtomType']][key]
206  except:
207  logger.error(
208  'The key %s does not exist as an atom attribute or as an atom type attribute!\n'
209  % key)
210  raise KeyError
211 
212 
213 class FF(forcebalance.BaseClass):
214  """ Force field class.
215 
216  This class contains all methods for force field manipulation.
217  To create an instance of this class, an input file is required
218  containing the list of force field file names. Everything else
219  inside this class pertaining to force field generation is self-contained.
220 
221  For details on force field parsing, see the detailed documentation for addff.
222 
223  """
224 
225  def __init__(self, options, verbose=True, printopt=True):
226  """Instantiation of force field class.
227 
228  Many variables here are initialized to zero, but they are filled out by
229  methods like addff, rsmake, and mktransmat.
230 
231  """
232  super(FF, self).__init__(options)
233  #======================================#
234  # Options that are given by the parser #
235  #======================================#
236 
239  self.set_option(None, None, 'root', os.getcwd())
240 
241  self.set_option(options, 'forcefield', 'fnms')
242 
243  self.set_option(options, 'ffdir')
244 
245  self.set_option(options, 'priors')
246 
247  self.set_option(options, 'constrain_charge')
248 
249  self.set_option(options, 'logarithmic_map')
250 
251  self.set_option(options, 'amoeba_pol')
252 
253  self.set_option(options, 'amoeba_eps')
254 
255  self.set_option(options, 'rigid_water')
256 
257  self.set_option(options, 'use_pvals')
258 
259  self.set_option(options, 'duplicate_pnames')
260 
261  #======================================#
262  # Variables which are set here #
263  #======================================#
264  # A lot of these variables are filled out by calling the methods.
265 
266 
267  self.ffdata = {}
268  self.ffdata_isxml = {}
269 
270  self.map = {}
271 
272  self.plist = []
273 
274  self.patoms = []
275 
279  self.pfields = []
280 
281  self.rs = []
282 
283  self.tm = None
284 
285  self.tmI = None
286 
287  self.excision = None
288 
289  self.np = 0
290 
291  self.pvals0 = []
292 
293  self.Readers = OrderedDict()
294 
295  self.atomnames = []
296 
297  # Read the force fields into memory.
298  for fnm in self.fnms:
299  if verbose:
300  logger.info("Reading force field from file: %s\n" % fnm)
301  self.addff(fnm)
302 
303 
305  self.FFAtomTypes = OrderedDict()
306  for Reader in self.Readers.values():
307  for k, v in Reader.AtomTypes.items():
308  if k in self.FFAtomTypes:
310  'Trying to insert atomtype %s into the force field, but it is already there'
311  % k)
312  self.FFAtomTypes[k] = v
313  # This is an ordered dictionary of {'Molecule' : [{'AtomType' : string, 'ResidueNumber' : int, 'ResidueName' : string,
314  # 'AtomName' : string, 'ChargeGroup' : int, 'Charge' : float}]}
315  # Each value in the dictionary is a list of BackedUpDictionaries.
316  # If we query a BackedUpDictionary and the value does not exist,
317  # then it will query the backup dictionary using the 'AtomType' value.
318  # Thus, if we look up the mass of 'HW1' or 'HW2' in the dictionary, it will
319  # return the mass for 'HW' in the AtomTypes dictionary.
320  self.FFMolecules = OrderedDict()
321  for Reader in self.Readers.values():
322  for Molecule, AtomList in Reader.Molecules.items():
323  for FFAtom in AtomList:
324  FFAtomWithDefaults = BackedUpDict(self.FFAtomTypes)
325  for k, v in FFAtom.items():
326  FFAtomWithDefaults[k] = v
327  self.FFMolecules.setdefault(Molecule,
328  []).append(FFAtomWithDefaults)
329 
330  # Set the initial values of parameter arrays.
331 
332  self.pvals0 = np.array(self.pvals0)
333 
334  # Prepare various components of the class for first use.
335 
336  self.list_map()
337  if verbose:
338 
339  bar = printcool(
340  "Starting parameter indices, physical values and IDs")
341  self.print_map()
342  logger.info(bar)
343 
344  self.rsmake(printfacs=verbose)
345 
346  self.mktransmat()
347 
348  self.redirect = {}
349 
350  self.linedestroy_save = []
351  self.prmdestroy_save = []
353  self.prmdestroy_this = []
354 
355  if printopt:
357  self.PrintOptionDict, title="Setup for force field")
358 
359  @classmethod
360  def fromfile(cls, fnm):
361  ffdir = os.path.split(fnm)[0]
362  fnm = os.path.split(fnm)[1]
363  options = {
364  'forcefield': [fnm],
365  'ffdir': ffdir,
366  'duplicate_pnames': True
367  }
368  return cls(options, verbose=False, printopt=False)
369 
370  def __getstate__(self):
371  state = deepcopy(self.__dict__)
372  for ffname in self.ffdata:
373  if self.ffdata_isxml[ffname]:
374  temp = etree.tostring(self.ffdata[ffname])
375  del state['ffdata'][ffname]
376  state['ffdata'][ffname] = temp
377  return state
378 
379  def __setstate__(self, state):
380  self.__dict__.update(state)
381  for ffname in self.ffdata:
382  if self.ffdata_isxml[ffname]:
383  temp = etree.ElementTree(etree.fromstring(self.ffdata[ffname]))
384  self.ffdata[ffname] = temp
385 
386  def addff(self, ffname, xmlScript=False):
387  """ Parse a force field file and add it to the class.
388 
389  First, figure out the type of force field file. This is done
390  either by explicitly specifying the type using for example,
391  <tt> ffname force_field.xml:openmm </tt> or we figure it out
392  by looking at the file extension.
393 
394  Next, parse the file. Currently we support two classes of
395  files - text and XML. The two types are treated very
396  differently; for XML we use the parsers in libxml (via the
397  python lxml module), and for text files we have our own
398  in-house parsing class. Within text files, there is also a
399  specialized GROMACS and TINKER parser as well as a generic
400  text parser.
401 
402  The job of the parser is to determine the following things:
403  1) Read the user-specified selection of parameters being fitted
404  2) Build a mapping (dictionary) of <tt> parameter identifier -> index in parameter vector </tt>
405  3) Build a list of physical parameter values
406  4) Figure out where to replace the parameter values in the force field file when the values are changed
407  5) Figure out which parameters need to be repeated or sign-flipped
408 
409  Generally speaking, each parameter value in the force field
410  file has a <tt> unique parameter identifier <tt>. The
411  identifier consists of three parts - the interaction type, the
412  parameter subtype (within that interaction type), and the
413  atoms involved.
414 
415  --- If XML: ---
416 
417  The force field file is read in using the lxml Python module. Specify
418  which parameter you want to fit using by adding a 'parameterize' element
419  to the end of the force field XML file, like so.
420 
421  @code
422  <AmoebaVdwForce type="BUFFERED-14-7">
423  <Vdw class="74" sigma="0.2655" epsilon="0.056484" reduction="0.910" parameterize="sigma, epsilon, reduction" />
424  @endcode
425 
426  In this example, the parameter identifier would look like <tt> Vdw/74/epsilon </tt>.
427 
428  --- If GROMACS (.itp) or TINKER (.prm) : ---
429 
430  Follow the rules in the ITP_Reader or Tinker_Reader derived
431  class. Read the documentation in the class documentation or
432  the 'feed' method to learn more. In all cases the parameter
433  is tagged using <tt> # PRM 3 </tt> (where # denotes a comment,
434  the word PRM stays the same, and 3 is the field number starting
435  from zero.)
436 
437  --- If normal text : ---
438 
439  The parameter identifier is simply built using the file name,
440  line number, and field. Thus, the identifier is unique but
441  completely noninformative (which is not ideal for our
442  purposes, but it works.)
443 
444  --- Endif ---
445 
446  @warning My program currently assumes that we are only using one MM program per job.
447  If we use CHARMM and GROMACS to perform simulations as part of the same TARGET, we will
448  get messed up. Maybe this needs to be fixed in the future, with program prefixes to
449  parameters like C_ , G_ .. or simply unit conversions, you get the idea.
450 
451  @warning I don't think the multiplier actually works for analytic derivatives unless
452  the interaction calculator knows the multiplier as well. I'm sure I can make this
453  work in the future if necessary.
454 
455  @param[in] ffname Name of the force field file
456 
457  """
458  fftype = determine_fftype(ffname)
459  ffname = ffname.split(':')[0]
460 
461  # Set the Tinker PRM file, which will help for running programs like "analyze".
462  if fftype == "tinker":
463  if hasattr(self, "tinkerprm"):
465  "There should only be one TINKER parameter file")
466  else:
467  self.tinkerprm = ffname
468 
469  # Set the OpenMM XML file, which will help for running OpenMM.
470  if fftype == "openmm":
471  if hasattr(self, "openmmxml"):
473  "There should only be one OpenMM XML file - confused!!")
474  else:
475  self.openmmxml = ffname
476 
477  if fftype == "smirnoff":
478  if hasattr(self, "offxml"):
480  "There should only be one SMIRNOFF XML file - confused!!")
481  else:
482  self.offxml = ffname
483 
484  self.amber_mol2 = []
485  if fftype == "mol2":
486  self.amber_mol2.append(ffname)
488  self.amber_frcmod = []
489  if fftype == "frcmod":
490  self.amber_frcmod.append(ffname)
491 
492  # Determine the appropriate parser from the FF_IOModules dictionary.
493  # If we can't figure it out, then use the base reader, it ain't so bad. :)
494  Reader = FF_IOModules.get(fftype, forcebalance.BaseReader)
495 
496  # Open the force field using an absolute path and read its contents into memory.
497  absff = os.path.join(self.root, self.ffdir, ffname)
498 
499  # Create an instance of the Reader.
500  # The reader is essentially a finite state machine that allows us to
501  # build the pid.
502  self.Readers[ffname] = Reader(ffname)
503  if fftype in ["openmm", "smirnoff"]:
504 
505  try:
506  self.ffdata[ffname] = etree.parse(absff)
507  except NameError:
508  logger.error(
509  "If etree not defined, please check if lxml module has been installed"
510  )
511  raise
512  self.ffdata_isxml[ffname] = True
513  # Process the file
514  self.addff_xml(ffname)
515  else:
516 
517  self.ffdata[ffname] = [
518  line.expandtabs() for line in open(absff).readlines()
519  ]
520  self.ffdata_isxml[ffname] = False
521  # Process the file
522  self.addff_txt(ffname, fftype, xmlScript)
523  if hasattr(self.Readers[ffname], 'atomnames'):
524  if len(self.atomnames) > 0:
525  sys.stderr.write(
526  'Found more than one force field containing atom names; skipping the second one (%s)\n'
527  % ffname)
528  else:
529  self.atomnames += self.Readers[ffname].atomnames
530 
531  def check_dupes(self, pid, ffname, ln, pfld):
532  """ Check to see whether a given parameter ID already exists, and provide an alternate if needed. """
533  pid_ = pid
534 
535  have_pids = [f[0] for f in self.pfields]
536 
537  if pid in have_pids:
538  pid0 = pid
539  extranum = 0
540  dupfnms = [i[1] for i in self.pfields if pid == i[0]]
541  duplns = [i[2] for i in self.pfields if pid == i[0]]
542  dupflds = [i[3] for i in self.pfields if pid == i[0]]
543  while pid in have_pids:
544  pid = "%s%i" % (pid0, extranum)
545  extranum += 1
546 
547  def warn_or_err(*args):
548  if self.duplicate_pnames:
549  logger.warn(*args)
550  else:
551  logger.error(*args)
552 
553  warn_or_err("Encountered an duplicate parameter ID (%s)\n" % pid_)
554  warn_or_err("file %s line %i field %i duplicates:\n" %
555  (os.path.basename(ffname), ln + 1, pfld))
556  for dupfnm, dupln, dupfld in zip(dupfnms, duplns, dupflds):
557  warn_or_err(
558  "file %s line %i field %i\n" % (dupfnm, dupln + 1, dupfld))
559  if self.duplicate_pnames:
560  logger.warn("Parameter name has been changed to %s\n" % pid)
561  else:
562  raise RuntimeError
563  return pid
564 
565  def addff_txt(self, ffname, fftype, xmlScript):
566  """ Parse a text force field and create several important instance variables.
567 
568  Each line is processed using the 'feed' method as implemented
569  in the reader class. This essentially allows us to create the
570  correct parameter identifier (pid), because the pid comes from
571  more than the current line, it also depends on the section that
572  we're in.
573 
574  When 'PRM' or 'RPT' is encountered, we do several things:
575  - Build the parameter identifier and insert it into the map
576  - Point to the file name, line number, and field where the parameter may be modified
577 
578  Additionally, when 'PRM' is encountered:
579  - Store the physical parameter value (this is permanent; it's the original value)
580  - Increment the total number of parameters
581 
582  When 'RPT' is encountered we don't expand the parameter vector
583  because this parameter is a copy of an existing one. If the
584  parameter identifier is preceded by MINUS_, we chop off the
585  prefix but remember that the sign needs to be flipped.
586 
587  """
588 
589  for ln, line in enumerate(self.ffdata[ffname]):
590  try:
591  self.Readers[ffname].feed(line)
592  except:
593  logger.warning(traceback.format_exc() + '\n')
594  logger.warning(
595  "The force field reader crashed when trying to read the following line:\n"
596  )
597  logger.warning(line + '\n')
598  traceback.print_exc()
600  "The force field parser got confused! The traceback and line in question are printed above."
601  )
602  sline = self.Readers[ffname].Split(line)
603 
604  kwds = list(
605  itertools.chain(*[[i, "/%s" % i]
606  for i in ['PRM', 'PARM', 'RPT', 'EVAL']]))
607 
608  marks = OrderedDict()
609  for k in kwds:
610  if sline.count(k) > 1:
611  logger.error(line)
612  logger.error(
613  "The above line contains multiple occurrences of the keyword %s\n"
614  % k)
615  raise RuntimeError
616  elif sline.count(k) == 1:
617  marks[k] = (np.array(sline) == k).argmax()
618  marks['END'] = len(sline)
619 
620  pmark = marks.get('PRM', None)
621  if pmark is None: pmark = marks.get('PARM', None)
622  rmark = marks.get('RPT', None)
623  emark = marks.get('EVAL', None)
624 
625  if pmark is not None:
626  pstop = min([i for i in marks.values() if i > pmark])
627  pflds = [
628  int(i) for i in sline[pmark + 1:pstop]
629  ] # The integers that specify the parameter word positions
630  for pfld in pflds:
631  # For each of the fields that are to be parameterized (indicated by PRM #),
632  # assign a parameter type to it according to the Interaction Type -> Parameter Dictionary.
633  pid = self.Readers[ffname].build_pid(pfld)
634  if xmlScript:
635  pid = 'Script/' + sline[pfld - 2] + '/'
636  pid = self.check_dupes(pid, ffname, ln, pfld)
637  self.map[pid] = self.np
638  # This parameter ID has these atoms involved.
639  self.patoms.append([self.Readers[ffname].molatom])
640  # Also append pid to the parameter list
641  self.assign_p0(self.np, float(sline[pfld]))
642  self.assign_field(self.np, pid, ffname, ln, pfld, 1)
643  self.np += 1
644  if rmark is not None:
645  parse = rmark + 1
646  stopparse = min([i for i in marks.values() if i > rmark])
647  while parse < stopparse:
648  # Between RPT and /RPT, the words occur in pairs.
649  # First is a number corresponding to the field that contains the dependent parameter.
650  # Second is a string corresponding to the 'pid' that this parameter depends on.
651  pfld = int(sline[parse])
652  try:
653  prep = self.map[sline[parse + 1].replace('MINUS_', '')]
654  except:
655  sys.stderr.write("Valid parameter IDs:\n")
656  count = 0
657  for i in self.map:
658  sys.stderr.write("%25s" % i)
659  if count % 3 == 2:
660  sys.stderr.write("\n")
661  count += 1
662  sys.stderr.write(
663  "\nOffending ID: %s\n" % sline[parse + 1])
664 
665  logger.error(
666  'Parameter repetition entry in force field file is incorrect (see above)\n'
667  )
668  raise RuntimeError
669  pid = self.Readers[ffname].build_pid(pfld)
670  pid = self.check_dupes(pid, ffname, ln, pfld)
671  self.map[pid] = prep
672  # This repeated parameter ID also has these atoms involved.
673  self.patoms[prep].append(self.Readers[ffname].molatom)
674  self.assign_field(prep, pid, ffname, ln, pfld,
675  "MINUS_" in sline[parse + 1] and -1 or 1)
676  parse += 2
677  if emark is not None:
678  parse = emark + 1
679  stopparse = min([i for i in marks.values() if i > emark])
680  while parse < stopparse:
681  # Between EVAL and /EVAL, the words occur in pairs.
682  # First is a number corresponding to the field that contains the dependent parameter.
683  # Second is a Python command that determines how to calculate the parameter.
684  pfld = int(sline[parse])
685  evalcmd = sline[parse +
686  1] # This string is actually Python code!!
687  pid = self.Readers[ffname].build_pid(pfld)
688  pid = self.check_dupes(pid, ffname, ln, pfld)
689  # EVAL parameters have no corresponding parameter index
690  #self.map[pid] = None
691  #self.map[pid] = prep
692  # This repeated parameter ID also has these atoms involved.
693  #self.patoms[prep].append(self.Readers[ffname].molatom)
694  self.assign_field(None, pid, ffname, ln, pfld, None,
695  evalcmd)
696  parse += 2
697 
698  def addff_xml(self, ffname):
699  """ Parse an XML force field file and create important instance variables.
700 
701  This was modeled after addff_txt, but XML and text files are
702  fundamentally different, necessitating two different methods.
703 
704  We begin with an _ElementTree object. We search through the tree
705  for the 'parameterize' and 'parameter_repeat' keywords. Each time
706  the keyword is encountered, we do the same four actions that
707  I describe in addff_txt.
708 
709  It's hard to specify precisely the location in an XML file to
710  change a force field parameter. I can create a list of tree
711  elements (essentially pointers to elements within a tree), but
712  this method breaks down when I copy the tree because I have no
713  way to refer to the copied tree elements. Fortunately, lxml
714  gives me a way to represent a tree using a flat list, and my
715  XML file 'locations' are represented using the positions in
716  the list.
717 
718  @warning The sign-flip hasn't been implemented yet. This
719  shouldn't matter unless your calculation contains repeated
720  parameters with opposite sign.
721 
722  """
723 
724  #check if xml file contains a script
725  #throw error if more than one script
726  #write script into .txt file and parse as text
727  fflist = list(self.ffdata[ffname].iter())
728  scriptElements = [elem for elem in fflist if elem.tag == 'Script']
729  if len(scriptElements) > 1:
730  logger.error(
731  'XML file' + ffname +
732  'contains more than one script! Consolidate your scripts into one script!\n'
733  )
734  raise RuntimeError
735  elif len(scriptElements) == 1:
736  Script = scriptElements[0].text
737  ffnameList = ffname.split('.')
738  ffnameScript = ffnameList[0] + 'Script.txt'
739  absScript = os.path.join(self.root, self.ffdir, ffnameScript)
740  if os.path.exists(absScript):
741  logger.error('XML file ' + absScript +
742  ' already exists on disk! Please delete it\n')
743  raise RuntimeError
744  wfile = forcebalance.nifty.wopen(absScript)
745  wfile.write(Script)
746  wfile.close()
747  self.addff(ffnameScript, xmlScript=True)
748  os.unlink(absScript)
749 
750  for e in self.ffdata[ffname].getroot().xpath('//@parameterize/..'):
751  parameters_to_optimize = sorted(
752  [i.strip() for i in e.get('parameterize').split(',')])
753  for p in parameters_to_optimize:
754  if p not in e.attrib:
755  logger.error(
756  "Parameter \'%s\' is not found for \'%s\', please check %s"
757  % (p, e.get('type'), ffname))
758  raise RuntimeError
759  pid = self.Readers[ffname].build_pid(e, p)
760  self.map[pid] = self.np
761  self.assign_p0(self.np, float(e.get(p)))
762  self.assign_field(self.np, pid, ffname, fflist.index(e), p, 1)
763  self.np += 1
764  self.patoms.append([])
765 
766  for e in self.ffdata[ffname].getroot().xpath('//@parameter_repeat/..'):
767  for field in e.get('parameter_repeat').split(','):
768  parameter_name = field.strip().split('=')[0]
769  if parameter_name not in e.attrib:
770  logger.error(
771  "Parameter \'%s\' is not found for \'%s\', please check %s"
772  % (parameter_name, e.get('type'), ffname))
773  raise RuntimeError
774  dest = self.Readers[ffname].build_pid(e, parameter_name)
775  src = field.strip().split('=')[1]
776  if src in self.map:
777  self.map[dest] = self.map[src]
778  else:
780  "Warning: You wanted to copy parameter from %s to %s, but the source parameter does not seem to exist!"
781  % (src, dest))
782  self.assign_field(self.map[dest], dest, ffname,
783  fflist.index(e),
784  dest.split('/')[1], 1)
785 
786  for e in self.ffdata[ffname].getroot().xpath('//@parameter_eval/..'):
787  for field in e.get('parameter_eval').split(','):
788  parameter_name = field.strip().split('=')[0]
789  if parameter_name not in e.attrib:
790  logger.error(
791  "Parameter \'%s\' is not found for \'%s\', please check %s"
792  % (parameter_name, e.get('type'), ffname))
793  raise RuntimeError
794  dest = self.Readers[ffname].build_pid(e, parameter_name)
795  evalcmd = field.strip().split('=')[1]
796  self.assign_field(None, dest, ffname, fflist.index(e),
797  dest.split('/')[1], None, evalcmd)
798 
799  def make(self, vals=None, use_pvals=False, printdir=None, precision=12):
800  """ Create a new force field using provided parameter values.
801 
802  This big kahuna does a number of things:
803  1) Creates the physical parameters from the mathematical parameters
804  2) Creates force fields with physical parameters substituted in
805  3) Prints the force fields to the specified file.
806 
807  It does NOT store the mathematical parameters in the class state
808  (since we can only hold one set of parameters).
809 
810  @param[in] printdir The directory that the force fields are printed to; as usual
811  this is relative to the project root directory.
812  @param[in] vals Input parameters. I previously had an option where it uses
813  stored values in the class state, but I don't think that's a good idea anymore.
814  @param[in] use_pvals Switch for whether to bypass the coordinate transformation
815  and use physical parameters directly.
816 
817  """
818  if type(vals) == np.ndarray and vals.ndim != 1:
819  logger.error('Please only pass 1-D arrays\n')
820  raise RuntimeError
821  if len(vals) != self.np:
822  logger.error(
823  'Input parameter np.array (%i) not the required size (%i)\n' %
824  (len(vals), self.np))
825  raise RuntimeError
826  if use_pvals or self.use_pvals:
827  logger.info("Using physical parameters directly!\r")
828  pvals = vals.copy().flatten()
829  else:
830  pvals = self.create_pvals(vals)
831 
832  OMMFormat = "%%.%ie" % precision
833 
834  def TXTFormat(number, precision):
835  SciNot = "%% .%ie" % precision
836  if abs(number) < 1000 and abs(number) > 0.001:
837  Decimal = "%% .%if" % precision
838  Num = Decimal % number
839  Mum = Decimal % (-1 * number)
840  if (float(Num) == float(Mum)):
841  return Decimal % abs(number)
842  else:
843  return Decimal % number
844  else:
845  Num = SciNot % number
846  Mum = SciNot % (-1 * number)
847  if (float(Num) == float(Mum)):
848  return SciNot % abs(number)
849  else:
850  return SciNot % number
851 
852  pvals = list(pvals)
853  # pvec1d(vals, precision=4)
854  newffdata = deepcopy(self.ffdata)
855 
856  # The dictionary that takes parameter names to physical values.
857  PRM = {i: pvals[self.map[i]] for i in self.map}
858 
859  #======================================#
860  # Print the new force field. #
861  #======================================#
862 
863  xml_lines = OrderedDict([(fnm, list(newffdata[fnm].iter()))
864  for fnm in self.fnms
865  if self.ffdata_isxml[fnm]])
866 
867  for i in range(len(self.pfields)):
868  pfield = self.pfields[i]
869  pid, fnm, ln, fld, mult, cmd = pfield
870  # XML force fields are easy to print.
871  # Our 'pointer' to where to replace the value
872  # is given by the position of this line in the
873  # iterable representation of the tree and the
874  # field number.
875  # if type(newffdata[fnm]) is etree._ElementTree:
876  if cmd is not None:
877  try:
878  # Bobby Tables, anyone?
879  if any(
880  [x in cmd
881  for x in ("system", "subprocess", "import")]):
883  "The command %s (written in the force field file) appears to be unsafe!"
884  % cmd)
885  wval = eval(cmd.replace("PARM", "PRM"))
886  # Attempt to allow evaluated parameters to be functions of each other.
887  PRM[pid] = wval
888  except:
889  logger.error(traceback.format_exc() + '\n')
890  logger.error(
891  "The command %s (written in the force field file) cannot be evaluated!\n"
892  % cmd)
893  raise RuntimeError
894  else:
895  wval = mult * pvals[self.map[pid]]
896  if self.ffdata_isxml[fnm]:
897  xml_lines[fnm][ln].attrib[fld] = OMMFormat % (wval)
898  # list(newffdata[fnm].iter())[ln].attrib[fld] = OMMFormat % (wval)
899  # Text force fields are a bit harder.
900  # Our pointer is given by the line and field number.
901  # We take care to preserve whitespace in the printout
902  # so that the new force field still has nicely formated
903  # columns.
904  else:
905  # Split the string into whitespace and data fields.
906  sline = self.Readers[fnm].Split(newffdata[fnm][ln])
907  whites = self.Readers[fnm].Whites(newffdata[fnm][ln])
908  # Align whitespaces and fields (it should go white, field, white, field)
909  if newffdata[fnm][ln][0] != ' ':
910  whites = [''] + whites
911  # Subtract one whitespace, unless the line begins with a minus sign.
912  if not match('^-', sline[fld]) and len(whites[fld]) > 1:
913  whites[fld] = whites[fld][:-1]
914  # Actually replace the field with the physical parameter value.
915  if precision == 12:
916  newrd = "% 17.12e" % (wval)
917  else:
918  newrd = TXTFormat(wval, precision)
919  # The new word might be longer than the old word.
920  # If this is the case, we can try to shave off some whitespace.
921  Lold = len(sline[fld])
922  if not match('^-', sline[fld]):
923  Lold += 1
924  Lnew = len(newrd)
925  if Lnew > Lold:
926  Shave = Lnew - Lold
927  if Shave < (len(whites[fld + 1]) + 2):
928  whites[fld + 1] = whites[fld + 1][:-Shave]
929  sline[fld] = newrd
930  # Replace the line in the new force field.
931  newffdata[fnm][ln] = ''.join(
932  [(whites[j] if
933  (len(whites[j]) > 0 or j == 0) else ' ') + sline[j]
934  for j in range(len(sline))]) + '\n'
935 
936  if printdir is not None:
937  absprintdir = os.path.join(self.root, printdir)
938  else:
939  absprintdir = os.getcwd()
940 
941  if not os.path.exists(absprintdir):
942  logger.info('Creating the directory %s to print the force field\n'
943  % absprintdir)
944  os.makedirs(absprintdir)
945 
946  for fnm in newffdata:
947  if self.ffdata_isxml[fnm]:
948  with wopen(os.path.join(absprintdir, fnm), binary=True) as f:
949  newffdata[fnm].write(f)
950  elif 'Script.txt' in fnm:
951  # if the xml file contains a script, ForceBalance will generate
952  # a temporary .txt file containing the script and any updates.
953  # We copy the updates made in the .txt file into the xml file by:
954  # First, find xml file corresponding to this .txt file
955  # Second, copy context of the .txt file into the text attribute
956  # of the script element (assumed to be the last element)
957  # Third. open the updated xml file as in the if statement above
958  tempText = "".join(newffdata[fnm])
959  fnmXml = fnm.split('Script')[0] + '.xml'
960  Ntemp = len(list(newffdata[fnmXml].iter()))
961  list(newffdata[fnmXml].iter())[Ntemp - 1].text = tempText
962  '''
963  scriptElements = [elem for elem in fflist if elem.tag=='Script']
964  if len(scriptElements) > 1:
965  logger.error('XML file'+ffname+'contains more than one script! Consolidate your scripts into one script!\n')
966  raise RuntimeError
967  else:
968  '''
969  with wopen(
970  os.path.join(absprintdir, fnmXml), binary=True) as f:
971  newffdata[fnmXml].write(f)
972  else:
973  with wopen(os.path.join(absprintdir, fnm)) as f:
974  f.writelines(newffdata[fnm])
975 
976  return pvals
977 
978  def make_redirect(self, mvals):
979  Groups = defaultdict(list)
980  for p, pid in enumerate(self.plist):
981  if 'Exponent' not in pid or len(pid.split()) != 1:
983  "Fusion penalty currently implemented only for basis set optimizations, where parameters are like this: Exponent:Elem=H,AMom=D,Bas=0,Con=0"
984  )
985  Data = dict([(i.split('=')[0], i.split('=')[1])
986  for i in pid.split(':')[1].split(',')])
987  if 'Con' not in Data or Data['Con'] != '0':
989  "More than one contraction coefficient found! You should expect the unexpected"
990  )
991  key = Data['Elem'] + '_' + Data['AMom']
992  Groups[key].append(p)
993  #pvals = self.FF.create_pvals(mvals)
994  #print "pvals: ", pvals
995 
996  pvals = self.create_pvals(mvals)
997  logger.info("pvals:\n")
998  logger.info(str(pvals) + '\n')
999 
1000  Thresh = 1e-4
1001 
1002  for gnm, pidx in Groups.items():
1003  # The group of parameters for a particular element / angular momentum.
1004  pvals_grp = pvals[pidx]
1005  # The order that the parameters come in.
1006  Order = np.argsort(pvals_grp)
1007  for p in range(len(Order) - 1):
1008  # The pointers to the parameter indices.
1009  pi = pidx[Order[p]]
1010  pj = pidx[Order[p + 1]]
1011  # pvals[pi] is the SMALLER parameter.
1012  # pvals[pj] is the LARGER parameter.
1013  dp = np.log(pvals[pj]) - np.log(pvals[pi])
1014  if dp < Thresh:
1015  if pi in self.redirect:
1016  pk = self.redirect[pi]
1017  else:
1018  pk = pi
1019  #if pj not in self.redirect:
1020  #print "Redirecting parameter %i to %i" % (pj, pk)
1021  #self.redirect[pj] = pk
1022  #print self.redirect
1023 
1024  def find_spacings(self):
1025  Groups = defaultdict(list)
1026  for p, pid in enumerate(self.plist):
1027  if 'Exponent' not in pid or len(pid.split()) != 1:
1029  "Fusion penalty currently implemented only for basis set optimizations, where parameters are like this: Exponent:Elem=H,AMom=D,Bas=0,Con=0"
1030  )
1031  Data = dict([(i.split('=')[0], i.split('=')[1])
1032  for i in pid.split(':')[1].split(',')])
1033  if 'Con' not in Data or Data['Con'] != '0':
1035  "More than one contraction coefficient found! You should expect the unexpected"
1036  )
1037  key = Data['Elem'] + '_' + Data['AMom']
1038  Groups[key].append(p)
1039 
1040  pvals = self.create_pvals(np.zeros(self.np))
1041  logger.info("pvals:\n")
1042  logger.info(str(pvals) + '\n')
1043 
1044  spacdict = {}
1045  for gnm, pidx in Groups.items():
1046  # The group of parameters for a particular element / angular momentum.
1047  pvals_grp = pvals[pidx]
1048  # The order that the parameters come in.
1049  Order = np.argsort(pvals_grp)
1050  spacs = []
1051  for p in range(len(Order) - 1):
1052  # The pointers to the parameter indices.
1053  pi = pidx[Order[p]]
1054  pj = pidx[Order[p + 1]]
1055  # pvals[pi] is the SMALLER parameter.
1056  # pvals[pj] is the LARGER parameter.
1057  dp = np.log(pvals[pj]) - np.log(pvals[pi])
1058  spacs.append(dp)
1059  if len(spacs) > 0:
1060  spacdict[gnm] = np.mean(np.array(spacs))
1061  return spacdict
1062 
1063  def create_pvals(self, mvals):
1064  """Converts mathematical to physical parameters.
1065 
1066  First, mathematical parameters are rescaled and rotated by
1067  multiplying by the transformation matrix, followed by adding
1068  the original physical parameters.
1069 
1070  @param[in] mvals The mathematical parameters
1071  @return pvals The physical parameters
1072 
1073  """
1074  if isinstance(mvals, list):
1075  mvals = np.array(mvals)
1076  for p in self.redirect:
1077  mvals[p] = 0.0
1078  if self.logarithmic_map:
1079  try:
1080  pvals = np.exp(mvals.flatten()) * self.pvals0
1081  except:
1082  logger.exception(str(mvals) + '\n')
1083  logger.error('What the hell did you do?\n')
1084  raise RuntimeError
1085  else:
1086  pvals = flat(np.dot(self.tmI, col(mvals))) + self.pvals0
1087  concern = ['polarizability', 'epsilon', 'VDWT']
1088  # Guard against certain types of parameters changing sign.
1089 
1090  for i in range(self.np):
1091  if any([j in self.plist[i]
1092  for j in concern]) and pvals[i] * self.pvals0[i] < 0:
1093  #print "Parameter %s has changed sign but it's not allowed to! Setting to zero." % self.plist[i]
1094  pvals[i] = 0.0
1095  # Redirect parameters (for the fusion penalty function.)
1096  for p in self.redirect:
1097  pvals[p] = pvals[self.redirect[p]]
1098  # if not in_fd():
1099  # print pvals
1100  #print "pvals = ", pvals
1101 
1102  return pvals
1103 
1104  def create_mvals(self, pvals):
1105  """Converts physical to mathematical parameters.
1106 
1107  We create the inverse transformation matrix using SVD.
1108 
1109  @param[in] pvals The physical parameters
1110  @return mvals The mathematical parameters
1111  """
1112 
1113  if self.logarithmic_map:
1114  logger.error(
1115  'create_mvals has not been implemented for logarithmic_map\n')
1116  raise RuntimeError
1117  mvals = flat(np.dot(invert_svd(self.tmI), col(pvals - self.pvals0)))
1118 
1119  return mvals
1120 
1121  def rsmake(self, printfacs=True):
1122  """Create the rescaling factors for the coordinate transformation in parameter space.
1123 
1124  The proper choice of rescaling factors (read: prior widths in maximum likelihood analysis)
1125  is still a black art. This is a topic of current research.
1126 
1127  @todo Pass in rsfactors through the input file
1128 
1129  @param[in] printfacs List for printing out the resecaling factors
1130 
1131  """
1132  typevals = OrderedDict()
1133  rsfactors = OrderedDict()
1134  rsfac_list = []
1135 
1137  termtypelist = []
1138  for f in self.fnms:
1139  if self.Readers[f].pdict == "XML_Override":
1140  for i, pName in enumerate(self.map):
1141  for p in self.pfields:
1142  if p[0] == pName:
1143  fnm = p[1]
1144  break
1145  ffnameScript = f.split('.')[0] + 'Script.txt'
1146  if fnm == f or fnm == ffnameScript:
1147  ttstr = '/'.join(
1148  [pName.split('/')[0],
1149  pName.split('/')[1]])
1150  if ttstr not in termtypelist:
1151  termtypelist.append(ttstr)
1152  else:
1153  for i in self.Readers[f].pdict:
1154  for j in self.Readers[f].pdict[i]:
1155  if isint(str(j)):
1156  ttstr = i + self.Readers[f].pdict[i][j]
1157  if ttstr not in termtypelist:
1158  termtypelist.append(ttstr)
1159  for termtype in termtypelist:
1160  for pid in self.map:
1161  if termtype in pid:
1162  typevals.setdefault(termtype,
1163  []).append(self.pvals0[self.map[pid]])
1164  for termtype in typevals:
1165  # The old, horrendously complicated rule
1166  # rsfactors[termtype] = exp(mean(log(abs(array(typevals[termtype]))+(abs(array(typevals[termtype]))==0))))
1167  # The newer, maximum rule (thanks Baba)
1168  maxval = max(abs(np.array(typevals[termtype])))
1169  # When all initial parameter values are zero, it could be a problem...
1170  if maxval == 0:
1171  maxval += 1
1172  rsfactors[termtype] = maxval
1173  rsfac_list.append(termtype)
1174  # Physically motivated overrides
1175  rs_override(rsfactors, termtype)
1176  # Overrides from input file
1177  for termtype in self.priors:
1178  while termtype in rsfac_list:
1179  rsfac_list.remove(termtype)
1180  rsfac_list.append(termtype)
1181  rsfactors[termtype] = self.priors[termtype]
1182 
1183  # for line in os.popen("awk '/rsfactor/ {print $2,$3}' %s" % pkg.options).readlines():
1184  # rsfactors[line.split()[0]] = float(line.split()[1])
1185  if printfacs:
1186  bar = printcool(
1187  "Rescaling Factors by Type (Lower Takes Precedence):", color=1)
1188  logger.info(''.join([
1189  " %-35s : %.5e\n" % (i, rsfactors[i]) for i in rsfac_list
1190  ]))
1191  logger.info(bar)
1192  self.rs_ord = OrderedDict([(i, rsfactors[i]) for i in rsfac_list])
1193 
1194  self.rs = np.ones(len(self.pvals0))
1195  self.rs_type = OrderedDict()
1196  have_rs = []
1197  for pnum in range(len(self.pvals0)):
1198  for termtype in rsfac_list:
1199  if termtype in self.plist[pnum]:
1200  if pnum not in have_rs:
1201  self.rs[pnum] = rsfactors[termtype]
1202  self.rs_type[pnum] = termtype
1203  elif self.rs[pnum] != rsfactors[termtype]:
1204  self.rs[pnum] = rsfactors[termtype]
1205  self.rs_type[pnum] = termtype
1206  have_rs.append(pnum)
1208  for pnum in range(len(self.pvals0)):
1209  if pnum not in have_rs:
1210  self.rs_type[pnum] = self.plist[pnum][0]
1211  if printfacs:
1212  bar = printcool(
1213  "Rescaling Types / Factors by Parameter Number:", color=1)
1214  self.print_map(vals=[
1215  " %-28s : %.5e" % (self.rs_type[pnum], self.rs[pnum])
1216  for pnum in range(len(self.pvals0))
1217  ])
1218  logger.info(bar)
1219 
1220  def make_rescale(self,
1221  scales,
1222  mvals=None,
1223  G=None,
1224  H=None,
1225  multiply=True,
1226  verbose=False):
1227  """ Obtain rescaled versions of the inputs according to dictionary values
1228  in "scales" (i.e. a replacement or multiplicative factor on self.rs_ord).
1229  Note that self.rs and self.rs_ord are not updated in this function. You
1230  need to do that outside.
1231 
1232  The purpose of this function is to simulate the effect of changing the
1233  parameter scale factors in the force field. If the scale factor is N,
1234  then a unit change in the mathematical parameters produces a change of
1235  N in the physical parameters. Thus, for a given point in the physical
1236  parameter space, the mathematical parameters are proportional to 1/N,
1237  the gradient is proportional to N, and the Hessian is proportional to N^2.
1238 
1239  Parameters
1240  ----------
1241  mvals : numpy.ndarray
1242  Parameters to be transformed, if desired. Must be same length as number of parameters.
1243  G : numpy.ndarray
1244  Gradient to be transformed, if desired. Must be same length as number of parameters.
1245  H : numpy.ndarray
1246  Hessian to be transformed, if desired. Must be square matrix with side-length = number of parameters.
1247  scales : OrderedDict
1248  A dictionary with the same keys as self.rs_ord and floating point values.
1249  These represent the values with which to multiply the existing scale factors
1250  (if multiply == True) or replace them (if multiply == False).
1251  Pro Tip: Create this variable from a copy of self.rs_ord
1252  multiply : bool
1253  When set to True, the new scale factors are the existing scale factors
1254  verbose : bool
1255  Loud and noisy
1256 
1257  Returns
1258  -------
1259  answer : OrderedDict
1260  Output dictionary containing :
1261  'rs' : New parameter scale factors (multiplied by scales if multiply=True, or replaced if multiply=False)
1262  'rs_ord' : New parameter scale factor dictionary
1263  'mvals' : New parameter values (if mvals is provided)
1264  'G' : New gradient (if G is provided)
1265  'H' : New Hessian (if H is provided)
1266  """
1267  if type(scales) != OrderedDict:
1268  raise RuntimeError('scales must have type OrderedDict')
1269  if scales.keys() != self.rs_ord.keys():
1270  raise RuntimeError('scales should have same keys as self.rs_ord')
1271  # Make the new dictionary of rescaling factors
1272  if multiply == False:
1273  rsord_out = deepcopy(scales)
1274  else:
1275  rsord_out = OrderedDict(
1276  [(k, scales[k] * self.rs_ord[k]) for k in scales.keys()])
1277  answer = OrderedDict()
1278  answer['rs_ord'] = rsord_out
1279  # Make the new array of rescaling factors
1280  rs_out = np.array([rsord_out[self.rs_type[p]] for p in range(self.np)])
1281  answer['rs'] = rs_out
1282  if mvals is not None:
1283  if mvals.shape != (self.np, ):
1284  raise RuntimeError('mvals has the wrong shape')
1285  mvals_out = deepcopy(mvals)
1286  for p in range(self.np):
1287  # Remember that for the same physical parameters, the mathematical
1288  # parameters are inversely proportional to scale factors
1289  mvals_out[p] *= (float(self.rs[p]) / float(rs_out[p]))
1290  answer['mvals'] = mvals_out
1291  if G is not None:
1292  if G.shape != (self.np, ):
1293  raise RuntimeError('G has the wrong shape')
1294  G_out = deepcopy(G)
1295  for p in range(self.np):
1296  # The gradient should be proportional to the scale factors
1297  G_out[p] *= (float(rs_out[p]) / float(self.rs[p]))
1298  answer['G'] = G_out
1299  if H is not None:
1300  if H.shape != (self.np, self.np):
1301  raise RuntimeError('H has the wrong shape')
1302  H_out = deepcopy(H)
1303  for p in range(self.np):
1304  # The Hessian should be proportional to the product of two scale factors
1305  H_out[p, :] *= (float(rs_out[p]) / float(self.rs[p]))
1306  for p in range(self.np):
1307  H_out[:, p] *= (float(rs_out[p]) / float(self.rs[p]))
1308  answer['H'] = H_out
1309  # The final parameters, gradient and Hessian should be consistent with the
1310  # returned scale factors.
1311  return answer
1312 
1313  def mktransmat(self):
1314  """ Create the transformation matrix to rescale and rotate the mathematical parameters.
1315 
1316  For point charge parameters, project out perturbations that
1317  change the total charge.
1318 
1319  First build these:
1320 
1321  'qmap' : Just a list of parameter indices that point to charges.
1322 
1323  'qid' : For each parameter in the qmap, a list of the affected atoms :)
1324  A potential target for the molecule-specific thang.
1325 
1326  Then make this:
1327 
1328  'qtrans2' : A transformation matrix that rotates the charge parameters.
1329  The first row is all zeros (because it corresponds to increasing the charge on all atoms)
1330  The other rows correspond to changing one of the parameters and decreasing all of the others
1331  equally such that the overall charge is preserved.
1332 
1333  'qmat2' : An identity matrix with 'qtrans2' pasted into the right place
1334 
1335  'transmat': 'qmat2' with rows and columns scaled using self.rs
1336 
1337  'excision': Parameter indices that need to be 'cut out' because they are irrelevant and
1338  mess with the matrix diagonalization
1339 
1340  @todo Only project out changes in total charge of a molecule, and perhaps generalize to
1341  fragments of molecules or other types of parameters.
1342  @todo The AMOEBA selection of charge depends not only on the atom type, but what that atom is bonded to.
1343  """
1344  self.qmap = []
1345  self.qid = []
1346  self.qid2 = []
1347  qnr = 1
1348  concern = ['COUL', 'c0', 'charge']
1349  qmat2 = np.eye(self.np)
1350 
1351  def insert_mat(qtrans2, qmap):
1352  # Write the qtrans2 block into qmat2.
1353  x = 0
1354  for i in range(self.np):
1355  if i in qmap:
1356  y = 0
1357  for j in qmap:
1358  qmat2[i, j] = qtrans2[x, y]
1359  y += 1
1360  x += 1
1361 
1362  def build_qtrans2(tq, qid, qmap):
1363  """ Build the matrix that ensures the net charge does not change. """
1364  nq = len(qmap)
1365  # tq = Total number of atomic charges that are being optimized on the molecule
1366  # NOTE: This may be greater than the number of charge parameters (nq)
1367  # The reason for the "one" here is because LP wanted to have multiple charge constraints
1368  # at some point in the future
1369  cons0 = np.ones((1, tq))
1370  cons = np.zeros((cons0.shape[0], nq))
1371  # Identity matrix equal to the number of charge parameters
1372  qtrans2 = np.eye(nq)
1373  # This is just one
1374  for i in range(cons.shape[0]):
1375  # Loop over the number of charge parameters
1376  for j in range(cons.shape[1]):
1377  # Each element of qid is a list that points to atom indices.
1378  # LPW: This code is breaking when we're not optimizing ALL the charges
1379  # Replace cons0[i][k-1] with all ones
1380  # cons[i][j] = sum([cons0[i][k-1] for k in qid[j]])
1381  cons[i][j] = float(len(qid[j]))
1382  cons[i] /= np.linalg.norm(cons[i])
1383  for j in range(i):
1384  cons[i] = orthogonalize(cons[i], cons[j])
1385  qtrans2[i, :] = 0
1386  for j in range(nq - i - 1):
1387  qtrans2[i + j + 1, :] = orthogonalize(
1388  qtrans2[i + j + 1, :], cons[i])
1389  return qtrans2
1390 
1391  # Here we build a charge constraint for each molecule.
1392  if any(len(r.adict) > 0 for r in self.Readers.values()):
1393  logger.info("Building charge constraints...\n")
1394  # Build a concatenated dictionary
1395  Adict = OrderedDict()
1396  # This is a loop over files
1397  for r in self.Readers.values():
1398  # This is a loop over molecules
1399  for k, v in r.adict.items():
1400  Adict[k] = v
1401  nmol = 0
1402  for molname, molatoms in Adict.items():
1403  mol_charge_count = np.zeros(self.np)
1404  tq = 0
1405  qmap = []
1406  qid = []
1407  for i in range(self.np):
1408  qct = 0
1409  qidx = []
1410  for imol, iatoms in self.patoms[i]:
1411  for iatom in iatoms:
1412  if imol == molname and iatom in molatoms:
1413  qct += 1
1414  tq += 1
1415  qidx.append(molatoms.index(iatom))
1416  if any([j in self.plist[i] for j in concern]) and qct > 0:
1417  qmap.append(i)
1418  qid.append(qidx)
1419  logger.info(
1420  "Parameter %i occurs %i times in molecule %s in locations %s (%s)\n"
1421  % (i, qct, molname, str(qidx), self.plist[i]))
1422  #Here is where we build the qtrans2 matrix.
1423  if len(qmap) > 0:
1424  qtrans2 = build_qtrans2(tq, qid, qmap)
1425  if self.constrain_charge:
1426  insert_mat(qtrans2, qmap)
1427  if nmol == 0:
1428  self.qid = qid
1429  self.qmap = qmap
1430  # The warning about ESP fitting is not very helpful
1431  # else:
1432  # logger.info("Note: ESP fitting will be performed assuming that molecule id %s is the FIRST molecule and the only one being fitted.\n" % molname)
1433  nmol += 1
1434  elif self.constrain_charge:
1436  "'adict' {molecule:atomnames} was not found.\n This isn't a big deal if we only have one molecule, but might cause problems if we want multiple charge neutrality constraints."
1437  )
1438  qnr = 0
1439  if any(
1440  [self.Readers[i].pdict == "XML_Override" for i in self.fnms]):
1441  # Hack to count the number of atoms for each atomic charge parameter, when the force field is an XML file.
1442  # This needs to be changed to Chain or Molecule
1443  logger.info(str([determine_fftype(k) for k in self.ffdata]))
1444  ListOfAtoms = list(
1445  itertools.chain(*[[
1446  e.get('type') for e in self.ffdata[k].getroot().xpath(
1447  '//Residue/Atom')
1448  ] for k in self.ffdata
1449  if determine_fftype(k) == "openmm"]))
1450  for i in range(self.np):
1451  if any([j in self.plist[i] for j in concern]):
1452  self.qmap.append(i)
1453  if 'Multipole/c0' in self.plist[
1454  i] or 'Atom/charge' in self.plist[i]:
1455  AType = self.plist[i].split('/')[-1].split('.')[0]
1456  nq = ListOfAtoms.count(AType)
1457  else:
1458  thisq = []
1459  for k in self.plist[i].split():
1460  for j in concern:
1461  if j in k:
1462  thisq.append(k.split('-')[-1])
1463  break
1464  try:
1465  self.qid2.append(
1466  np.array(
1467  [self.atomnames.index(k) for k in thisq]))
1468  except:
1469  pass
1470  nq = sum(
1471  np.array(
1472  [self.plist[i].count(j) for j in concern]))
1473  self.qid.append(qnr + np.arange(nq))
1474  qnr += nq
1475  if len(self.qid2) == 0:
1476  sys.stderr.write(
1477  'Unable to match atom numbers up with atom names (minor issue, unless doing ESP fitting). \nAre atom names implemented in the force field parser?\n'
1478  )
1479  else:
1480  self.qid = self.qid2
1481  tq = qnr - 1
1482  #Here is where we build the qtrans2 matrix.
1483  if len(self.qmap) > 0:
1484  cons0 = np.ones((1, tq))
1485  qtrans2 = build_qtrans2(tq, self.qid, self.qmap)
1486  # Insert qtrans2 into qmat2.
1487  if self.constrain_charge:
1488  insert_mat(qtrans2, self.qmap)
1489 
1490 
1492  if self.constrain_charge:
1493  MultipoleAtoms = set(
1494  [p.split('/')[-1] for p in self.plist if 'Multipole' in p])
1495  QuadrupoleGrps = [[
1496  i for i, p in enumerate(self.plist)
1497  if 'Multipole' in p and p.split('/')[-1] == A
1498  and p.split('/')[1] in ['q11', 'q22', 'q33']
1499  ] for A in MultipoleAtoms]
1500  for Grp in QuadrupoleGrps:
1501  qid = [np.array([i]) for i in range(3)]
1502  tq = 3
1503  qtrans2 = build_qtrans2(tq, qid, Grp)
1504  logger.info(
1505  "Making sure that quadrupoles are traceless (for parameter IDs %s)\n"
1506  % str(Grp))
1507  insert_mat(qtrans2, Grp)
1508 
1509  #ListOfAtoms = list(itertools.chain(*[[e.get('type') for e in self.ffdata[k].getroot().xpath('//Multipole')] for k in self.ffdata]))
1510 
1511  # print "Charge parameter constraint matrix - feel free to check it"
1512  # for i in qmat2:
1513  # for j in i:
1514  # print "% .3f" % j,
1515  # print
1516  # print
1517 
1518  # There is a bad bug here .. this matrix multiplication operation doesn't work!!
1519  # I will proceed using loops. This is unsettling.
1520  # Input matrices are qmat2 and self.rs (diagonal)
1521  transmat = np.dot(qmat2, np.diag(self.rs))
1522  transmat1 = np.zeros((self.np, self.np))
1523  for i in range(self.np):
1524  for k in range(self.np):
1525  transmat1[i, k] = qmat2[i, k] * self.rs[k]
1526 
1527  # This prints out the difference between the result of matrix multiplication
1528  # and the manual multiplication.
1529  #print transmat1
1530  #print transmat
1531  if len(transmat) > 0 and np.max(np.abs(transmat1 - transmat)) > 0.0:
1532  logger.warning(
1533  'The difference between the numpy multiplication and the manual multiplication is \x1b[1;91m%f\x1b[0m, '
1534  'but it should be zero.\n' % np.max(
1535  np.abs(transmat1 - transmat)))
1536 
1537  transmat = np.array(transmat1, copy=True)
1538  transmatNS = np.array(transmat, copy=True)
1539  self.excision = []
1540  for i in range(self.np):
1541  if abs(transmatNS[i, i]) < 1e-8:
1542  self.excision.append(i)
1543  transmatNS[i, i] += 1
1544  self.excision = list(set(self.excision))
1545  for i in self.excision:
1546  transmat[i, :] = np.zeros(self.np)
1547  self.tm = transmat
1548  self.tmI = transmat.T
1549 
1550  def list_map(self):
1551  """ Create the plist, which is like a reversed version of the parameter map. More convenient for printing. """
1552  if len(self.map) == 0:
1554  'The parameter map has no elements (Okay if we are not actually tuning any parameters.)'
1555  )
1556  else:
1557  self.plist = [
1558  [] for j in range(max([self.map[i] for i in self.map]) + 1)
1559  ]
1560  for i in self.map:
1561  self.plist[self.map[i]].append(i)
1562  for i in range(self.np):
1563  self.plist[i] = ' '.join(natural_sort(self.plist[i]))
1564 
1565  def print_map(self, vals=None, precision=4):
1566  """Prints out the (physical or mathematical) parameter indices, IDs and values in a visually appealing way."""
1567  if vals is None:
1568  vals = self.pvals0
1569  logger.info('\n'.join([
1570  "%4i [ %s ]" %
1571  (self.plist.index(i),
1572  "%% .%ie" % precision % float(vals[self.plist.index(i)])
1573  if isfloat(str(vals[self.plist.index(i)])) else
1574  (str(vals[self.plist.index(i)]))) + " : " + "%s" % i.split()[0]
1575  for i in self.plist
1576  ]))
1577  logger.info('\n')
1578 
1579  def sprint_map(self, vals=None, precision=4):
1580  """Prints out the (physical or mathematical) parameter indices, IDs and values to a string."""
1581  if vals is None:
1582  vals = self.pvals0
1583  out = '\n'.join([
1584  "%4i [ %s ]" %
1585  (self.plist.index(i),
1586  "%% .%ie" % precision % float(vals[self.plist.index(i)])
1587  if isfloat(str(vals[self.plist.index(i)])) else
1588  (str(vals[self.plist.index(i)]))) + " : " + "%s" % i.split()[0]
1589  for i in self.plist
1590  ])
1591  return out
1592 
1593  def assign_p0(self, idx, val):
1594  """ Assign physical parameter values to the 'pvals0' array.
1595 
1596  @param[in] idx The index to which we assign the parameter value.
1597  @param[in] val The parameter value to be inserted.
1598  """
1599  if idx == len(self.pvals0):
1600  self.pvals0.append(val)
1601  else:
1602  self.pvals0[idx] = val
1603 
1604  def assign_field(self, idx, pid, fnm, ln, pfld, mult, cmd=None):
1605  """ Record the locations of a parameter in a txt file; [[file name, line number, field number, and multiplier]].
1606 
1607  Note that parameters can have multiple locations because of the repetition functionality.
1608 
1609  @param[in] idx The (not necessarily unique) index of the parameter.
1610  @param[in] pid The unique parameter name.
1611  @param[in] fnm The file name of the parameter field.
1612  @param[in] ln The line number within the file (or the node index in the flattened xml)
1613  @param[in] pfld The field within the line (or the name of the attribute in the xml)
1614  @param[in] mult The multiplier (this is usually 1.0)
1615 
1616  """
1617  self.pfields.append([pid, fnm, ln, pfld, mult, cmd])
1618 
1619  def __eq__(self, other):
1620  # check equality of forcefields using comparison of pfields and map
1621  if isinstance(other, FF):
1622  # list comprehension removes pid/filename element of pfields since we don't care about filename uniqueness
1623  self_pfields = [p[2:] for p in self.pfields]
1624  other_pfields = [p[2:] for p in other.pfields]
1625 
1626  return self_pfields == other_pfields and\
1627  self.map == other.map and\
1628  (self.pvals0 == other.pvals0).all()
1629 
1630  # we only compare two forcefield objects
1631  else:
1632  return NotImplemented
1633 
1634 
1635 def rs_override(rsfactors, termtype, Temperature=298.15):
1636  """ This function takes in a dictionary (rsfactors) and a string (termtype).
1637 
1638  If termtype matches any of the strings below, rsfactors[termtype] is assigned
1639  to one of the numbers below.
1640 
1641  This is LPW's attempt to simplify the rescaling factors.
1642 
1643  @param[out] rsfactors The computed rescaling factor.
1644  @param[in] termtype The interaction type (corresponding to a physical unit)
1645  @param[in] Temperature The temperature for computing the kT energy scale
1646 
1647  """
1648  if match('PIMPDIHS[1-6]K|PDIHMULS[1-6]K|PDIHS[1-6]K|RBDIHSK[1-5]|MORSEC',
1649  termtype):
1650  # eV or eV rad^-2
1651  rsfactors[termtype] = 96.4853
1652  elif match('UREY_BRADLEYK1|ANGLESK', termtype):
1653  rsfactors[termtype] = 96.4853 * 6.28
1654  elif match('COUL|VPAIR_BHAMC|QTPIEA', termtype):
1655  # elementary charge, or unitless, or already in atomic unit
1656  rsfactors[termtype] = 1.0
1657  elif match('QTPIEC|QTPIEH', termtype):
1658  # eV to atomic unit
1659  rsfactors[termtype] = 27.2114
1660  elif match(
1661  'BONDSB|UREY_BRADLEYB|MORSEB|VDWS|VPAIRS|VSITE|VDW_BHAMA|VPAIR_BHAMA',
1662  termtype):
1663  # nm to atomic unit
1664  rsfactors[termtype] = 0.05291772
1665  elif match('BONDSK|UREY_BRADLEYK2', termtype):
1666  # au bohr^-2
1667  rsfactors[termtype] = 34455.5275 * 27.2114
1668  elif match('PDIHS[1-6]B|ANGLESB|UREY_BRADLEYB', termtype):
1669  # radian
1670  rsfactors[termtype] = 57.295779513
1671  elif match('VDWT|VDW_BHAMB|VPAIR_BHAMB', termtype):
1672  # VdW well depth; using kT. This was a tough one because the energy scale is so darn small.
1673  rsfactors[termtype] = kb * Temperature
1674  elif match('MORSEE', termtype):
1675  rsfactors[termtype] = 18.897261
def make_rescale(self, scales, mvals=None, G=None, H=None, multiply=True, verbose=False)
Obtain rescaled versions of the inputs according to dictionary values in "scales" (i...
Definition: ffyapf.py:1279
pvals0
Initial value of physical parameters.
Definition: ffyapf.py:295
Readers
A dictionary of force field reader classes.
Definition: ffyapf.py:297
tmI
The transpose of the transformation matrix.
Definition: ffyapf.py:289
Class for parsing OpenMM force field files.
Definition: smirnoffio.py:117
ffdata
As these options proliferate, the force field class becomes less standalone.
Definition: ffyapf.py:271
def rs_override(rsfactors, termtype, Temperature=298.15)
This function takes in a dictionary (rsfactors) and a string (termtype).
Definition: ffyapf.py:1668
def __setstate__(self, state)
Definition: ffyapf.py:383
def make_redirect(self, mvals)
Definition: ffyapf.py:987
np
The total number of parameters.
Definition: ffyapf.py:293
def mktransmat(self)
Create the transformation matrix to rescale and rotate the mathematical parameters.
Definition: ffyapf.py:1357
Nifty functions, intended to be imported by any module within ForceBalance.
def assign_field(self, idx, pid, fnm, ln, pfld, mult, cmd=None)
Record the locations of a parameter in a txt file; [[file name, line number, field number...
Definition: ffyapf.py:1636
def find_spacings(self)
Definition: ffyapf.py:1033
atomnames
A list of atom names (this is new, for ESP fitting)
Definition: ffyapf.py:299
def __getstate__(self)
Definition: ffyapf.py:374
rs
List of rescaling factors.
Definition: ffyapf.py:285
def __eq__(self, other)
Definition: ffyapf.py:1639
linedestroy_save
Destruction dictionary (experimental).
Definition: ffyapf.py:354
Finite state machine for parsing DVR grid files.
Definition: psi4io.py:254
def determine_fftype(ffname, verbose=False)
Determine the type of a force field file.
Definition: ffyapf.py:156
def addff_xml(self, ffname)
Parse an XML force field file and create important instance variables.
Definition: ffyapf.py:730
def addff(self, ffname, xmlScript=False)
Parse a force field file and add it to the class.
Definition: ffyapf.py:462
def __init__(self, options, verbose=True, printopt=True)
Instantiation of force field class.
Definition: ffyapf.py:235
def create_mvals(self, pvals)
Converts physical to mathematical parameters.
Definition: ffyapf.py:1122
def invert_svd(X, thresh=1e-12)
Invert a matrix using singular value decomposition.
Definition: nifty.py:621
def isint(word)
ONLY matches integers! If you have a decimal point? None shall pass!
Definition: molecule.py:401
rs_ord
Takes the dictionary &#39;BONDS&#39;:{3:&#39;B&#39;, 4:&#39;K&#39;}, &#39;VDW&#39;:{4:&#39;S&#39;, 5:&#39;T&#39;}, and turns it into a list of term t...
Definition: ffyapf.py:1204
def flat(vec)
Given any list, array, or matrix, return a single-index array.
Definition: nifty.py:467
tm
The transformation matrix for mathematical -> physical parameters.
Definition: ffyapf.py:287
Interaction type -> Parameter Dictionary.
Definition: psi4io.py:39
def natural_sort(l)
Return a natural sorted list.
Definition: nifty.py:285
Finite state machine for parsing custom GROMACS force field files.
Definition: custom_io.py:41
map
The mapping of interaction type -> parameter number.
Definition: ffyapf.py:274
def __missing__(self, key)
Definition: ffyapf.py:205
def rsmake(self, printfacs=True)
Create the rescaling factors for the coordinate transformation in parameter space.
Definition: ffyapf.py:1143
Finite state machine for parsing Mol2 force field file.
Definition: amberio.py:500
def assign_p0(self, idx, val)
Assign physical parameter values to the &#39;pvals0&#39; array.
Definition: ffyapf.py:1617
def col(vec)
Given any list, array, or matrix, return a 1-column 2D array.
Definition: nifty.py:448
excision
Indices to exclude from optimization / Hessian inversion.
Definition: ffyapf.py:291
redirect
Creates plist from map.
Definition: ffyapf.py:352
def print_map(self, vals=None, precision=4)
Prints out the (physical or mathematical) parameter indices, IDs and values in a visually appealing w...
Definition: ffyapf.py:1583
def make(self, vals=None, use_pvals=False, printdir=None, precision=12)
Create a new force field using provided parameter values.
Definition: ffyapf.py:826
def check_dupes(self, pid, ffname, ln, pfld)
Check to see whether a given parameter ID already exists, and provide an alternate if needed...
Definition: ffyapf.py:538
def __init__(self, backup_dict)
Definition: ffyapf.py:201
def orthogonalize(vec1, vec2)
Given two vectors vec1 and vec2, project out the component of vec1 that is along the vec2-direction...
Definition: nifty.py:608
patoms
A listing of parameter number -> atoms involved.
Definition: ffyapf.py:278
def warn_press_key(warning, timeout=10)
Definition: nifty.py:1599
def printcool_dictionary(Dict, title="Dictionary Keys : Values", bold=False, color=2, keywidth=25, topwidth=50, center=True, leftpad=0)
See documentation for printcool; this is a nice way to print out keys/values in a dictionary...
Definition: nifty.py:366
Finite state machine for parsing GROMACS force field files.
Definition: gmxio.py:362
def printcool(text, sym="#", bold=False, color=2, ansi=None, bottom='-', minwidth=50, center=True, sym2="=")
Cool-looking printout for slick formatting of output.
Definition: nifty.py:321
pfields
A list where pfields[i] = [pid,&#39;file&#39;,line,field,mult,cmd], basically a new way to modify force field...
Definition: ffyapf.py:283
def addff_txt(self, ffname, fftype, xmlScript)
Parse a text force field and create several important instance variables.
Definition: ffyapf.py:594
plist
The listing of parameter number -> interaction types.
Definition: ffyapf.py:276
Finite state machine for parsing FrcMod force field file.
Definition: amberio.py:556
Finite state machine for parsing TINKER force field files.
Definition: tinkerio.py:97
def sprint_map(self, vals=None, precision=4)
Prints out the (physical or mathematical) parameter indices, IDs and values to a string.
Definition: ffyapf.py:1598
Finite state machine for parsing Q-Chem input files.
Definition: qchemio.py:33
def create_pvals(self, mvals)
Converts mathematical to physical parameters.
Definition: ffyapf.py:1083
Force field class.
Definition: ffyapf.py:226
def fromfile(cls, fnm)
Definition: ffyapf.py:364
def isfloat(word)
Matches ANY number; it can be a decimal, scientific notation, integer, or what have you...
Definition: molecule.py:406
def wopen(dest, binary=False)
If trying to write to a symbolic link, remove it first.
Definition: nifty.py:1304
def list_map(self)
Create the plist, which is like a reversed version of the parameter map.
Definition: ffyapf.py:1567
FFAtomTypes
WORK IN PROGRESS ## This is a dictionary of {&#39;AtomType&#39;:{&#39;Mass&#39; : float, &#39;Charge&#39; : float...
Definition: ffyapf.py:309