1 """ @package forcebalance.forcefield 5 In ForceBalance a 'force field' is built from a set of files containing 6 physical parameters. These files can be anything that enter into any 7 computation - our original program was quite dependent on the GROMACS 8 force field format, but this program is set up to allow very general 11 We introduce several important concepts: 13 1) Adjustable parameters are allocated into a vector. 15 To cast the force field optimization as a math problem, we treat all 16 of the parameters on equal footing and write them as indices in a parameter vector. 18 2) A mapping from interaction type to parameter number. 20 Each element in the parameter vector corresponds to one or more 21 interaction types. Whenever we change the parameter vector and 22 recompute the objective function, this amounts to changing the 23 physical parameters in the simulations, so we print out new 24 force field files for external programs. In addition, when 25 these programs are computing the objective function we are often 26 in low-level subroutines that compute terms in the energy and 27 force. If we need an analytic derivative of the objective 28 function, then these subroutines need to know which index of the 29 parameter vector needs to be modified. 31 This is done by way of a hash table: for example, when we are 32 computing a Coulomb interaction between atom 4 and atom 5, we 33 can build the words 'COUL4' and 'COUL5' and look it up in the 34 parameter map; this gives us two numbers (say, 10 and 11) 35 corresponding to the eleventh and twelfth element of the 36 parameter vector. Then we can compute the derivatives of 37 the energy w/r.t. these parameters (in this case, COUL5/rij 38 and COUL4/rij) and increment these values in the objective function 41 In custom-implemented force fields (see counterpoisematch.py) 42 the hash table can also be used to look up parameter values for 43 computation of interactions. This is probably not the fastest way 44 to do things, however. 46 3) Distinction between physical and mathematical parameters. 48 The optimization algorithm works in a space that is related to, but 49 not exactly the same as the physical parameter space. The reasons 50 for why we do this are: 52 a) Each parameter has its own physical units. On the one hand 53 it's not right to treat different physical units all on the same 54 footing, so nondimensionalization is desirable. To make matters 55 worse, the force field parameters can be small as 1e-8 or as 56 large as 1e+6 depending on the parameter type. This means the 57 elements of the objective function gradient / Hessian have 58 elements that differ from each other in size by 10+ orders of 59 magnitude, leading to mathematical instabilities in the optimizer. 61 b) The parameter space can be constrained, most notably for atomic 62 partial charges where we don't want to change the overall charge 63 on a molecule. Thus we wish to project out certain movements 64 in the mathematical parameters such that they don't change the physical 67 c) We wish to regularize our optimization so as to avoid changing 68 our parameters in very insensitive directions (linear dependencies). 69 However, the sensitivity of the objective function to changes in the 70 force field depends on the physical units! 72 For all of these reasons, we introduce a 'transformation matrix' 73 which maps mathematical parameters onto physical parameters. The 74 diagonal elements in this matrix are rescaling factors; they take the 75 mathematical parameter and magnify it by this constant factor. The 76 off-diagonal elements correspond to rotations and other linear 77 transformations, and currently I just use them to project out the 78 'increase the net charge' direction in the physical parameter space. 80 Note that with regularization, these rescaling factors are equivalent 81 to the widths of prior distributions in a maximum likelihood framework. 82 Because there is such a correspondence between rescaling factors and 83 choosing a prior, they need to be chosen carefully. This is work in 84 progress. Another possibility is to sample the width of the priors from 85 a noninformative distribution -- the hyperprior (we can choose the 86 Jeffreys prior or something). This is work in progress. 88 Right now only GROMACS parameters are supported, but this class is extensible, 95 from __future__
import division
97 from builtins
import zip
98 from builtins
import str
99 from builtins
import range
103 from numpy
import sin, cos, tan, sinh, cosh, tanh, exp, log, sqrt, pi
104 from re
import match, sub, split
106 from forcebalance
import gmxio, qchemio, tinkerio, custom_io, openmmio, amberio, psi4io, smirnoffio
110 from copy
import deepcopy
113 from collections
import OrderedDict, defaultdict
114 from forcebalance.output
import getLogger
115 logger = getLogger(__name__)
118 from lxml
import etree
120 logger.warning(
"Failed to import lxml module, needed by OpenMM engine")
129 "offxml":
"smirnoff",
135 """ Recognized force field formats. """ 141 "openmm": openmmio.OpenMM_Reader,
151 """ Determine the type of a force field file. It is possible to 152 specify the file type explicitly in the input file using the 153 syntax 'force_field.ext:type'. Otherwise this function will try 154 to determine the force field type by extension. """ 156 fsplit = ffname.split(
'/')[-1].split(
':')
158 if verbose: logger.info(
"Determining file type of %s ..." % fsplit[0])
160 if fsplit[1]
in FF_IOModules:
161 if verbose: logger.info(
"We're golden! (%s)\n" % fsplit[1])
166 "\x1b[91m Warning: \x1b[0m %s not in supported types (%s)!\n" 167 % (fsplit[1],
', '.join(FF_IOModules.keys())))
168 elif len(fsplit) == 1:
171 "Guessing from extension (you may specify type with filename:type) ..." 174 ffext = ffname.split(
'.')[-1]
175 if ffext
in FF_Extensions:
176 guesstype = FF_Extensions[ffext]
177 if guesstype
in FF_IOModules:
179 logger.info(
"guessing %s -> %s!\n" % (ffext, guesstype))
184 "\x1b[91m Warning: \x1b[0m %s not in supported types (%s)!\n" 185 % (fsplit[0],
', '.join(FF_IOModules.keys())))
189 "\x1b[91m Warning: \x1b[0m %s not in supported extensions (%s)!\n" 190 % (ffext,
', '.join(FF_Extensions.keys())))
192 if verbose: logger.warning(
"Force field type not determined!\n")
199 def __init__(self, backup_dict):
208 'The key %s does not exist as an atom attribute or as an atom type attribute!\n' 213 class FF(forcebalance.BaseClass):
214 """ Force field class. 216 This class contains all methods for force field manipulation. 217 To create an instance of this class, an input file is required 218 containing the list of force field file names. Everything else 219 inside this class pertaining to force field generation is self-contained. 221 For details on force field parsing, see the detailed documentation for addff. 225 def __init__(self, options, verbose=True, printopt=True):
226 """Instantiation of force field class. 228 Many variables here are initialized to zero, but they are filled out by 229 methods like addff, rsmake, and mktransmat. 239 self.set_option(
None,
None,
'root', os.getcwd())
241 self.set_option(options,
'forcefield',
'fnms')
243 self.set_option(options,
'ffdir')
245 self.set_option(options,
'priors')
247 self.set_option(options,
'constrain_charge')
249 self.set_option(options,
'logarithmic_map')
251 self.set_option(options,
'amoeba_pol')
253 self.set_option(options,
'amoeba_eps')
255 self.set_option(options,
'rigid_water')
257 self.set_option(options,
'use_pvals')
259 self.set_option(options,
'duplicate_pnames')
298 for fnm
in self.fnms:
300 logger.info(
"Reading force field from file: %s\n" % fnm)
306 for Reader
in self.
Readers.values():
307 for k, v
in Reader.AtomTypes.items():
310 'Trying to insert atomtype %s into the force field, but it is already there' 321 for Reader
in self.
Readers.values():
322 for Molecule, AtomList
in Reader.Molecules.items():
323 for FFAtom
in AtomList:
325 for k, v
in FFAtom.items():
326 FFAtomWithDefaults[k] = v
328 []).append(FFAtomWithDefaults)
340 "Starting parameter indices, physical values and IDs")
344 self.
rsmake(printfacs=verbose)
357 self.PrintOptionDict, title=
"Setup for force field")
361 ffdir = os.path.split(fnm)[0]
362 fnm = os.path.split(fnm)[1]
366 'duplicate_pnames':
True 368 return cls(options, verbose=
False, printopt=
False)
371 state = deepcopy(self.__dict__)
372 for ffname
in self.
ffdata:
374 temp = etree.tostring(self.
ffdata[ffname])
375 del state[
'ffdata'][ffname]
376 state[
'ffdata'][ffname] = temp
380 self.__dict__.update(state)
381 for ffname
in self.
ffdata:
383 temp = etree.ElementTree(etree.fromstring(self.
ffdata[ffname]))
384 self.
ffdata[ffname] = temp
386 def addff(self, ffname, xmlScript=False):
387 """ Parse a force field file and add it to the class. 389 First, figure out the type of force field file. This is done 390 either by explicitly specifying the type using for example, 391 <tt> ffname force_field.xml:openmm </tt> or we figure it out 392 by looking at the file extension. 394 Next, parse the file. Currently we support two classes of 395 files - text and XML. The two types are treated very 396 differently; for XML we use the parsers in libxml (via the 397 python lxml module), and for text files we have our own 398 in-house parsing class. Within text files, there is also a 399 specialized GROMACS and TINKER parser as well as a generic 402 The job of the parser is to determine the following things: 403 1) Read the user-specified selection of parameters being fitted 404 2) Build a mapping (dictionary) of <tt> parameter identifier -> index in parameter vector </tt> 405 3) Build a list of physical parameter values 406 4) Figure out where to replace the parameter values in the force field file when the values are changed 407 5) Figure out which parameters need to be repeated or sign-flipped 409 Generally speaking, each parameter value in the force field 410 file has a <tt> unique parameter identifier <tt>. The 411 identifier consists of three parts - the interaction type, the 412 parameter subtype (within that interaction type), and the 417 The force field file is read in using the lxml Python module. Specify 418 which parameter you want to fit using by adding a 'parameterize' element 419 to the end of the force field XML file, like so. 422 <AmoebaVdwForce type="BUFFERED-14-7"> 423 <Vdw class="74" sigma="0.2655" epsilon="0.056484" reduction="0.910" parameterize="sigma, epsilon, reduction" /> 426 In this example, the parameter identifier would look like <tt> Vdw/74/epsilon </tt>. 428 --- If GROMACS (.itp) or TINKER (.prm) : --- 430 Follow the rules in the ITP_Reader or Tinker_Reader derived 431 class. Read the documentation in the class documentation or 432 the 'feed' method to learn more. In all cases the parameter 433 is tagged using <tt> # PRM 3 </tt> (where # denotes a comment, 434 the word PRM stays the same, and 3 is the field number starting 437 --- If normal text : --- 439 The parameter identifier is simply built using the file name, 440 line number, and field. Thus, the identifier is unique but 441 completely noninformative (which is not ideal for our 442 purposes, but it works.) 446 @warning My program currently assumes that we are only using one MM program per job. 447 If we use CHARMM and GROMACS to perform simulations as part of the same TARGET, we will 448 get messed up. Maybe this needs to be fixed in the future, with program prefixes to 449 parameters like C_ , G_ .. or simply unit conversions, you get the idea. 451 @warning I don't think the multiplier actually works for analytic derivatives unless 452 the interaction calculator knows the multiplier as well. I'm sure I can make this 453 work in the future if necessary. 455 @param[in] ffname Name of the force field file 459 ffname = ffname.split(
':')[0]
462 if fftype ==
"tinker":
463 if hasattr(self,
"tinkerprm"):
465 "There should only be one TINKER parameter file")
470 if fftype ==
"openmm":
471 if hasattr(self,
"openmmxml"):
473 "There should only be one OpenMM XML file - confused!!")
477 if fftype ==
"smirnoff":
478 if hasattr(self,
"offxml"):
480 "There should only be one SMIRNOFF XML file - confused!!")
489 if fftype ==
"frcmod":
494 Reader = FF_IOModules.get(fftype, forcebalance.BaseReader)
497 absff = os.path.join(self.root, self.ffdir, ffname)
502 self.
Readers[ffname] = Reader(ffname)
503 if fftype
in [
"openmm",
"smirnoff"]:
506 self.
ffdata[ffname] = etree.parse(absff)
509 "If etree not defined, please check if lxml module has been installed" 518 line.expandtabs()
for line
in open(absff).readlines()
522 self.
addff_txt(ffname, fftype, xmlScript)
523 if hasattr(self.
Readers[ffname],
'atomnames'):
526 'Found more than one force field containing atom names; skipping the second one (%s)\n' 532 """ Check to see whether a given parameter ID already exists, and provide an alternate if needed. """ 535 have_pids = [f[0]
for f
in self.
pfields]
540 dupfnms = [i[1]
for i
in self.
pfields if pid == i[0]]
541 duplns = [i[2]
for i
in self.
pfields if pid == i[0]]
542 dupflds = [i[3]
for i
in self.
pfields if pid == i[0]]
543 while pid
in have_pids:
544 pid =
"%s%i" % (pid0, extranum)
547 def warn_or_err(*args):
548 if self.duplicate_pnames:
553 warn_or_err(
"Encountered an duplicate parameter ID (%s)\n" % pid_)
554 warn_or_err(
"file %s line %i field %i duplicates:\n" %
555 (os.path.basename(ffname), ln + 1, pfld))
556 for dupfnm, dupln, dupfld
in zip(dupfnms, duplns, dupflds):
558 "file %s line %i field %i\n" % (dupfnm, dupln + 1, dupfld))
559 if self.duplicate_pnames:
560 logger.warn(
"Parameter name has been changed to %s\n" % pid)
565 def addff_txt(self, ffname, fftype, xmlScript):
566 """ Parse a text force field and create several important instance variables. 568 Each line is processed using the 'feed' method as implemented 569 in the reader class. This essentially allows us to create the 570 correct parameter identifier (pid), because the pid comes from 571 more than the current line, it also depends on the section that 574 When 'PRM' or 'RPT' is encountered, we do several things: 575 - Build the parameter identifier and insert it into the map 576 - Point to the file name, line number, and field where the parameter may be modified 578 Additionally, when 'PRM' is encountered: 579 - Store the physical parameter value (this is permanent; it's the original value) 580 - Increment the total number of parameters 582 When 'RPT' is encountered we don't expand the parameter vector 583 because this parameter is a copy of an existing one. If the 584 parameter identifier is preceded by MINUS_, we chop off the 585 prefix but remember that the sign needs to be flipped. 589 for ln, line
in enumerate(self.
ffdata[ffname]):
591 self.
Readers[ffname].feed(line)
593 logger.warning(traceback.format_exc() +
'\n')
595 "The force field reader crashed when trying to read the following line:\n" 597 logger.warning(line +
'\n')
598 traceback.print_exc()
600 "The force field parser got confused! The traceback and line in question are printed above." 602 sline = self.
Readers[ffname].Split(line)
605 itertools.chain(*[[i,
"/%s" % i]
606 for i
in [
'PRM',
'PARM',
'RPT',
'EVAL']]))
608 marks = OrderedDict()
610 if sline.count(k) > 1:
613 "The above line contains multiple occurrences of the keyword %s\n" 616 elif sline.count(k) == 1:
617 marks[k] = (np.array(sline) == k).argmax()
618 marks[
'END'] = len(sline)
620 pmark = marks.get(
'PRM',
None)
621 if pmark
is None: pmark = marks.get(
'PARM',
None)
622 rmark = marks.get(
'RPT',
None)
623 emark = marks.get(
'EVAL',
None)
625 if pmark
is not None:
626 pstop = min([i
for i
in marks.values()
if i > pmark])
628 int(i)
for i
in sline[pmark + 1:pstop]
633 pid = self.
Readers[ffname].build_pid(pfld)
635 pid =
'Script/' + sline[pfld - 2] +
'/' 637 self.
map[pid] = self.
np 644 if rmark
is not None:
646 stopparse = min([i
for i
in marks.values()
if i > rmark])
647 while parse < stopparse:
651 pfld = int(sline[parse])
653 prep = self.
map[sline[parse + 1].replace(
'MINUS_',
'')]
655 sys.stderr.write(
"Valid parameter IDs:\n")
658 sys.stderr.write(
"%25s" % i)
660 sys.stderr.write(
"\n")
663 "\nOffending ID: %s\n" % sline[parse + 1])
666 'Parameter repetition entry in force field file is incorrect (see above)\n' 669 pid = self.
Readers[ffname].build_pid(pfld)
675 "MINUS_" in sline[parse + 1]
and -1
or 1)
677 if emark
is not None:
679 stopparse = min([i
for i
in marks.values()
if i > emark])
680 while parse < stopparse:
684 pfld = int(sline[parse])
685 evalcmd = sline[parse +
687 pid = self.
Readers[ffname].build_pid(pfld)
699 """ Parse an XML force field file and create important instance variables. 701 This was modeled after addff_txt, but XML and text files are 702 fundamentally different, necessitating two different methods. 704 We begin with an _ElementTree object. We search through the tree 705 for the 'parameterize' and 'parameter_repeat' keywords. Each time 706 the keyword is encountered, we do the same four actions that 707 I describe in addff_txt. 709 It's hard to specify precisely the location in an XML file to 710 change a force field parameter. I can create a list of tree 711 elements (essentially pointers to elements within a tree), but 712 this method breaks down when I copy the tree because I have no 713 way to refer to the copied tree elements. Fortunately, lxml 714 gives me a way to represent a tree using a flat list, and my 715 XML file 'locations' are represented using the positions in 718 @warning The sign-flip hasn't been implemented yet. This 719 shouldn't matter unless your calculation contains repeated 720 parameters with opposite sign. 727 fflist = list(self.
ffdata[ffname].iter())
728 scriptElements = [elem
for elem
in fflist
if elem.tag ==
'Script']
729 if len(scriptElements) > 1:
731 'XML file' + ffname +
732 'contains more than one script! Consolidate your scripts into one script!\n' 735 elif len(scriptElements) == 1:
736 Script = scriptElements[0].text
737 ffnameList = ffname.split(
'.')
738 ffnameScript = ffnameList[0] +
'Script.txt' 739 absScript = os.path.join(self.root, self.ffdir, ffnameScript)
740 if os.path.exists(absScript):
741 logger.error(
'XML file ' + absScript +
742 ' already exists on disk! Please delete it\n')
744 wfile = forcebalance.nifty.wopen(absScript)
747 self.
addff(ffnameScript, xmlScript=
True)
750 for e
in self.
ffdata[ffname].getroot().xpath(
'//@parameterize/..'):
751 parameters_to_optimize = sorted(
752 [i.strip()
for i
in e.get(
'parameterize').split(
',')])
753 for p
in parameters_to_optimize:
754 if p
not in e.attrib:
756 "Parameter \'%s\' is not found for \'%s\', please check %s" 757 % (p, e.get(
'type'), ffname))
759 pid = self.
Readers[ffname].build_pid(e, p)
760 self.
map[pid] = self.
np 766 for e
in self.
ffdata[ffname].getroot().xpath(
'//@parameter_repeat/..'):
767 for field
in e.get(
'parameter_repeat').split(
','):
768 parameter_name = field.strip().split(
'=')[0]
769 if parameter_name
not in e.attrib:
771 "Parameter \'%s\' is not found for \'%s\', please check %s" 772 % (parameter_name, e.get(
'type'), ffname))
774 dest = self.
Readers[ffname].build_pid(e, parameter_name)
775 src = field.strip().split(
'=')[1]
777 self.
map[dest] = self.
map[src]
780 "Warning: You wanted to copy parameter from %s to %s, but the source parameter does not seem to exist!" 784 dest.split(
'/')[1], 1)
786 for e
in self.
ffdata[ffname].getroot().xpath(
'//@parameter_eval/..'):
787 for field
in e.get(
'parameter_eval').split(
','):
788 parameter_name = field.strip().split(
'=')[0]
789 if parameter_name
not in e.attrib:
791 "Parameter \'%s\' is not found for \'%s\', please check %s" 792 % (parameter_name, e.get(
'type'), ffname))
794 dest = self.
Readers[ffname].build_pid(e, parameter_name)
795 evalcmd = field.strip().split(
'=')[1]
797 dest.split(
'/')[1],
None, evalcmd)
799 def make(self, vals=None, use_pvals=False, printdir=None, precision=12):
800 """ Create a new force field using provided parameter values. 802 This big kahuna does a number of things: 803 1) Creates the physical parameters from the mathematical parameters 804 2) Creates force fields with physical parameters substituted in 805 3) Prints the force fields to the specified file. 807 It does NOT store the mathematical parameters in the class state 808 (since we can only hold one set of parameters). 810 @param[in] printdir The directory that the force fields are printed to; as usual 811 this is relative to the project root directory. 812 @param[in] vals Input parameters. I previously had an option where it uses 813 stored values in the class state, but I don't think that's a good idea anymore. 814 @param[in] use_pvals Switch for whether to bypass the coordinate transformation 815 and use physical parameters directly. 818 if type(vals) == np.ndarray
and vals.ndim != 1:
819 logger.error(
'Please only pass 1-D arrays\n')
821 if len(vals) != self.
np:
823 'Input parameter np.array (%i) not the required size (%i)\n' %
824 (len(vals), self.
np))
826 if use_pvals
or self.use_pvals:
827 logger.info(
"Using physical parameters directly!\r")
828 pvals = vals.copy().flatten()
832 OMMFormat =
"%%.%ie" % precision
834 def TXTFormat(number, precision):
835 SciNot =
"%% .%ie" % precision
836 if abs(number) < 1000
and abs(number) > 0.001:
837 Decimal =
"%% .%if" % precision
838 Num = Decimal % number
839 Mum = Decimal % (-1 * number)
840 if (float(Num) == float(Mum)):
841 return Decimal % abs(number)
843 return Decimal % number
845 Num = SciNot % number
846 Mum = SciNot % (-1 * number)
847 if (float(Num) == float(Mum)):
848 return SciNot % abs(number)
850 return SciNot % number
854 newffdata = deepcopy(self.
ffdata)
857 PRM = {i: pvals[self.
map[i]]
for i
in self.
map}
863 xml_lines = OrderedDict([(fnm, list(newffdata[fnm].iter()))
867 for i
in range(len(self.
pfields)):
869 pid, fnm, ln, fld, mult, cmd = pfield
881 for x
in (
"system",
"subprocess",
"import")]):
883 "The command %s (written in the force field file) appears to be unsafe!" 885 wval = eval(cmd.replace(
"PARM",
"PRM"))
889 logger.error(traceback.format_exc() +
'\n')
891 "The command %s (written in the force field file) cannot be evaluated!\n" 895 wval = mult * pvals[self.
map[pid]]
897 xml_lines[fnm][ln].attrib[fld] = OMMFormat % (wval)
906 sline = self.
Readers[fnm].Split(newffdata[fnm][ln])
907 whites = self.
Readers[fnm].Whites(newffdata[fnm][ln])
909 if newffdata[fnm][ln][0] !=
' ':
910 whites = [
''] + whites
912 if not match(
'^-', sline[fld])
and len(whites[fld]) > 1:
913 whites[fld] = whites[fld][:-1]
916 newrd =
"% 17.12e" % (wval)
918 newrd = TXTFormat(wval, precision)
921 Lold = len(sline[fld])
922 if not match(
'^-', sline[fld]):
927 if Shave < (len(whites[fld + 1]) + 2):
928 whites[fld + 1] = whites[fld + 1][:-Shave]
931 newffdata[fnm][ln] =
''.join(
933 (len(whites[j]) > 0
or j == 0)
else ' ') + sline[j]
934 for j
in range(len(sline))]) +
'\n' 936 if printdir
is not None:
937 absprintdir = os.path.join(self.root, printdir)
939 absprintdir = os.getcwd()
941 if not os.path.exists(absprintdir):
942 logger.info(
'Creating the directory %s to print the force field\n' 944 os.makedirs(absprintdir)
946 for fnm
in newffdata:
948 with
wopen(os.path.join(absprintdir, fnm), binary=
True)
as f:
949 newffdata[fnm].write(f)
950 elif 'Script.txt' in fnm:
958 tempText =
"".join(newffdata[fnm])
959 fnmXml = fnm.split(
'Script')[0] +
'.xml' 960 Ntemp = len(list(newffdata[fnmXml].iter()))
961 list(newffdata[fnmXml].iter())[Ntemp - 1].text = tempText
963 scriptElements = [elem for elem in fflist if elem.tag=='Script'] 964 if len(scriptElements) > 1: 965 logger.error('XML file'+ffname+'contains more than one script! Consolidate your scripts into one script!\n') 970 os.path.join(absprintdir, fnmXml), binary=
True)
as f:
971 newffdata[fnmXml].write(f)
973 with
wopen(os.path.join(absprintdir, fnm))
as f:
974 f.writelines(newffdata[fnm])
979 Groups = defaultdict(list)
980 for p, pid
in enumerate(self.
plist):
981 if 'Exponent' not in pid
or len(pid.split()) != 1:
983 "Fusion penalty currently implemented only for basis set optimizations, where parameters are like this: Exponent:Elem=H,AMom=D,Bas=0,Con=0" 985 Data = dict([(i.split(
'=')[0], i.split(
'=')[1])
986 for i
in pid.split(
':')[1].split(
',')])
987 if 'Con' not in Data
or Data[
'Con'] !=
'0':
989 "More than one contraction coefficient found! You should expect the unexpected" 991 key = Data[
'Elem'] +
'_' + Data[
'AMom']
992 Groups[key].append(p)
997 logger.info(
"pvals:\n")
998 logger.info(str(pvals) +
'\n')
1002 for gnm, pidx
in Groups.items():
1004 pvals_grp = pvals[pidx]
1006 Order = np.argsort(pvals_grp)
1007 for p
in range(len(Order) - 1):
1010 pj = pidx[Order[p + 1]]
1013 dp = np.log(pvals[pj]) - np.log(pvals[pi])
1025 Groups = defaultdict(list)
1026 for p, pid
in enumerate(self.
plist):
1027 if 'Exponent' not in pid
or len(pid.split()) != 1:
1029 "Fusion penalty currently implemented only for basis set optimizations, where parameters are like this: Exponent:Elem=H,AMom=D,Bas=0,Con=0" 1031 Data = dict([(i.split(
'=')[0], i.split(
'=')[1])
1032 for i
in pid.split(
':')[1].split(
',')])
1033 if 'Con' not in Data
or Data[
'Con'] !=
'0':
1035 "More than one contraction coefficient found! You should expect the unexpected" 1037 key = Data[
'Elem'] +
'_' + Data[
'AMom']
1038 Groups[key].append(p)
1041 logger.info(
"pvals:\n")
1042 logger.info(str(pvals) +
'\n')
1045 for gnm, pidx
in Groups.items():
1047 pvals_grp = pvals[pidx]
1049 Order = np.argsort(pvals_grp)
1051 for p
in range(len(Order) - 1):
1054 pj = pidx[Order[p + 1]]
1057 dp = np.log(pvals[pj]) - np.log(pvals[pi])
1060 spacdict[gnm] = np.mean(np.array(spacs))
1064 """Converts mathematical to physical parameters. 1066 First, mathematical parameters are rescaled and rotated by 1067 multiplying by the transformation matrix, followed by adding 1068 the original physical parameters. 1070 @param[in] mvals The mathematical parameters 1071 @return pvals The physical parameters 1074 if isinstance(mvals, list):
1075 mvals = np.array(mvals)
1078 if self.logarithmic_map:
1080 pvals = np.exp(mvals.flatten()) * self.
pvals0 1082 logger.exception(str(mvals) +
'\n')
1083 logger.error(
'What the hell did you do?\n')
1087 concern = [
'polarizability',
'epsilon',
'VDWT']
1090 for i
in range(self.
np):
1091 if any([j
in self.
plist[i]
1092 for j
in concern])
and pvals[i] * self.
pvals0[i] < 0:
1105 """Converts physical to mathematical parameters. 1107 We create the inverse transformation matrix using SVD. 1109 @param[in] pvals The physical parameters 1110 @return mvals The mathematical parameters 1113 if self.logarithmic_map:
1115 'create_mvals has not been implemented for logarithmic_map\n')
1121 def rsmake(self, printfacs=True):
1122 """Create the rescaling factors for the coordinate transformation in parameter space. 1124 The proper choice of rescaling factors (read: prior widths in maximum likelihood analysis) 1125 is still a black art. This is a topic of current research. 1127 @todo Pass in rsfactors through the input file 1129 @param[in] printfacs List for printing out the resecaling factors 1132 typevals = OrderedDict()
1133 rsfactors = OrderedDict()
1139 if self.
Readers[f].pdict ==
"XML_Override":
1140 for i, pName
in enumerate(self.
map):
1145 ffnameScript = f.split(
'.')[0] +
'Script.txt' 1146 if fnm == f
or fnm == ffnameScript:
1148 [pName.split(
'/')[0],
1149 pName.split(
'/')[1]])
1150 if ttstr
not in termtypelist:
1151 termtypelist.append(ttstr)
1153 for i
in self.
Readers[f].pdict:
1154 for j
in self.
Readers[f].pdict[i]:
1156 ttstr = i + self.
Readers[f].pdict[i][j]
1157 if ttstr
not in termtypelist:
1158 termtypelist.append(ttstr)
1159 for termtype
in termtypelist:
1160 for pid
in self.
map:
1162 typevals.setdefault(termtype,
1164 for termtype
in typevals:
1168 maxval = max(abs(np.array(typevals[termtype])))
1172 rsfactors[termtype] = maxval
1173 rsfac_list.append(termtype)
1177 for termtype
in self.priors:
1178 while termtype
in rsfac_list:
1179 rsfac_list.remove(termtype)
1180 rsfac_list.append(termtype)
1181 rsfactors[termtype] = self.priors[termtype]
1187 "Rescaling Factors by Type (Lower Takes Precedence):", color=1)
1188 logger.info(
''.join([
1189 " %-35s : %.5e\n" % (i, rsfactors[i])
for i
in rsfac_list
1192 self.
rs_ord = OrderedDict([(i, rsfactors[i])
for i
in rsfac_list])
1194 self.
rs = np.ones(len(self.
pvals0))
1197 for pnum
in range(len(self.
pvals0)):
1198 for termtype
in rsfac_list:
1199 if termtype
in self.
plist[pnum]:
1200 if pnum
not in have_rs:
1201 self.
rs[pnum] = rsfactors[termtype]
1203 elif self.
rs[pnum] != rsfactors[termtype]:
1204 self.
rs[pnum] = rsfactors[termtype]
1206 have_rs.append(pnum)
1208 for pnum
in range(len(self.
pvals0)):
1209 if pnum
not in have_rs:
1213 "Rescaling Types / Factors by Parameter Number:", color=1)
1215 " %-28s : %.5e" % (self.
rs_type[pnum], self.
rs[pnum])
1216 for pnum
in range(len(self.
pvals0))
1227 """ Obtain rescaled versions of the inputs according to dictionary values 1228 in "scales" (i.e. a replacement or multiplicative factor on self.rs_ord). 1229 Note that self.rs and self.rs_ord are not updated in this function. You 1230 need to do that outside. 1232 The purpose of this function is to simulate the effect of changing the 1233 parameter scale factors in the force field. If the scale factor is N, 1234 then a unit change in the mathematical parameters produces a change of 1235 N in the physical parameters. Thus, for a given point in the physical 1236 parameter space, the mathematical parameters are proportional to 1/N, 1237 the gradient is proportional to N, and the Hessian is proportional to N^2. 1241 mvals : numpy.ndarray 1242 Parameters to be transformed, if desired. Must be same length as number of parameters. 1244 Gradient to be transformed, if desired. Must be same length as number of parameters. 1246 Hessian to be transformed, if desired. Must be square matrix with side-length = number of parameters. 1247 scales : OrderedDict 1248 A dictionary with the same keys as self.rs_ord and floating point values. 1249 These represent the values with which to multiply the existing scale factors 1250 (if multiply == True) or replace them (if multiply == False). 1251 Pro Tip: Create this variable from a copy of self.rs_ord 1253 When set to True, the new scale factors are the existing scale factors 1259 answer : OrderedDict 1260 Output dictionary containing : 1261 'rs' : New parameter scale factors (multiplied by scales if multiply=True, or replaced if multiply=False) 1262 'rs_ord' : New parameter scale factor dictionary 1263 'mvals' : New parameter values (if mvals is provided) 1264 'G' : New gradient (if G is provided) 1265 'H' : New Hessian (if H is provided) 1267 if type(scales) != OrderedDict:
1268 raise RuntimeError(
'scales must have type OrderedDict')
1269 if scales.keys() != self.
rs_ord.keys():
1270 raise RuntimeError(
'scales should have same keys as self.rs_ord')
1272 if multiply ==
False:
1273 rsord_out = deepcopy(scales)
1275 rsord_out = OrderedDict(
1276 [(k, scales[k] * self.
rs_ord[k])
for k
in scales.keys()])
1277 answer = OrderedDict()
1278 answer[
'rs_ord'] = rsord_out
1280 rs_out = np.array([rsord_out[self.
rs_type[p]]
for p
in range(self.
np)])
1281 answer[
'rs'] = rs_out
1282 if mvals
is not None:
1283 if mvals.shape != (self.
np, ):
1284 raise RuntimeError(
'mvals has the wrong shape')
1285 mvals_out = deepcopy(mvals)
1286 for p
in range(self.
np):
1289 mvals_out[p] *= (float(self.
rs[p]) / float(rs_out[p]))
1290 answer[
'mvals'] = mvals_out
1292 if G.shape != (self.
np, ):
1293 raise RuntimeError(
'G has the wrong shape')
1295 for p
in range(self.
np):
1297 G_out[p] *= (float(rs_out[p]) / float(self.
rs[p]))
1300 if H.shape != (self.
np, self.
np):
1301 raise RuntimeError(
'H has the wrong shape')
1303 for p
in range(self.
np):
1305 H_out[p, :] *= (float(rs_out[p]) / float(self.
rs[p]))
1306 for p
in range(self.
np):
1307 H_out[:, p] *= (float(rs_out[p]) / float(self.
rs[p]))
1314 """ Create the transformation matrix to rescale and rotate the mathematical parameters. 1316 For point charge parameters, project out perturbations that 1317 change the total charge. 1321 'qmap' : Just a list of parameter indices that point to charges. 1323 'qid' : For each parameter in the qmap, a list of the affected atoms :) 1324 A potential target for the molecule-specific thang. 1328 'qtrans2' : A transformation matrix that rotates the charge parameters. 1329 The first row is all zeros (because it corresponds to increasing the charge on all atoms) 1330 The other rows correspond to changing one of the parameters and decreasing all of the others 1331 equally such that the overall charge is preserved. 1333 'qmat2' : An identity matrix with 'qtrans2' pasted into the right place 1335 'transmat': 'qmat2' with rows and columns scaled using self.rs 1337 'excision': Parameter indices that need to be 'cut out' because they are irrelevant and 1338 mess with the matrix diagonalization 1340 @todo Only project out changes in total charge of a molecule, and perhaps generalize to 1341 fragments of molecules or other types of parameters. 1342 @todo The AMOEBA selection of charge depends not only on the atom type, but what that atom is bonded to. 1348 concern = [
'COUL',
'c0',
'charge']
1349 qmat2 = np.eye(self.
np)
1351 def insert_mat(qtrans2, qmap):
1354 for i
in range(self.
np):
1358 qmat2[i, j] = qtrans2[x, y]
1362 def build_qtrans2(tq, qid, qmap):
1363 """ Build the matrix that ensures the net charge does not change. """ 1369 cons0 = np.ones((1, tq))
1370 cons = np.zeros((cons0.shape[0], nq))
1372 qtrans2 = np.eye(nq)
1374 for i
in range(cons.shape[0]):
1376 for j
in range(cons.shape[1]):
1381 cons[i][j] = float(len(qid[j]))
1382 cons[i] /= np.linalg.norm(cons[i])
1386 for j
in range(nq - i - 1):
1388 qtrans2[i + j + 1, :], cons[i])
1392 if any(len(r.adict) > 0
for r
in self.
Readers.values()):
1393 logger.info(
"Building charge constraints...\n")
1395 Adict = OrderedDict()
1397 for r
in self.
Readers.values():
1399 for k, v
in r.adict.items():
1402 for molname, molatoms
in Adict.items():
1403 mol_charge_count = np.zeros(self.
np)
1407 for i
in range(self.
np):
1410 for imol, iatoms
in self.
patoms[i]:
1411 for iatom
in iatoms:
1412 if imol == molname
and iatom
in molatoms:
1415 qidx.append(molatoms.index(iatom))
1416 if any([j
in self.
plist[i]
for j
in concern])
and qct > 0:
1420 "Parameter %i occurs %i times in molecule %s in locations %s (%s)\n" 1421 % (i, qct, molname, str(qidx), self.
plist[i]))
1424 qtrans2 = build_qtrans2(tq, qid, qmap)
1425 if self.constrain_charge:
1426 insert_mat(qtrans2, qmap)
1434 elif self.constrain_charge:
1436 "'adict' {molecule:atomnames} was not found.\n This isn't a big deal if we only have one molecule, but might cause problems if we want multiple charge neutrality constraints." 1440 [self.
Readers[i].pdict ==
"XML_Override" for i
in self.fnms]):
1446 e.get(
'type')
for e
in self.
ffdata[k].getroot().xpath(
1450 for i
in range(self.
np):
1451 if any([j
in self.
plist[i]
for j
in concern]):
1453 if 'Multipole/c0' in self.
plist[
1454 i]
or 'Atom/charge' in self.
plist[i]:
1455 AType = self.
plist[i].split(
'/')[-1].split(
'.')[0]
1456 nq = ListOfAtoms.count(AType)
1459 for k
in self.
plist[i].split():
1462 thisq.append(k.split(
'-')[-1])
1467 [self.
atomnames.index(k)
for k
in thisq]))
1472 [self.
plist[i].count(j)
for j
in concern]))
1473 self.
qid.append(qnr + np.arange(nq))
1475 if len(self.
qid2) == 0:
1477 'Unable to match atom numbers up with atom names (minor issue, unless doing ESP fitting). \nAre atom names implemented in the force field parser?\n' 1483 if len(self.
qmap) > 0:
1484 cons0 = np.ones((1, tq))
1485 qtrans2 = build_qtrans2(tq, self.
qid, self.
qmap)
1487 if self.constrain_charge:
1488 insert_mat(qtrans2, self.
qmap)
1492 if self.constrain_charge:
1493 MultipoleAtoms = set(
1494 [p.split(
'/')[-1]
for p
in self.
plist if 'Multipole' in p])
1496 i
for i, p
in enumerate(self.
plist)
1497 if 'Multipole' in p
and p.split(
'/')[-1] == A
1498 and p.split(
'/')[1]
in [
'q11',
'q22',
'q33']
1499 ]
for A
in MultipoleAtoms]
1500 for Grp
in QuadrupoleGrps:
1501 qid = [np.array([i])
for i
in range(3)]
1503 qtrans2 = build_qtrans2(tq, qid, Grp)
1505 "Making sure that quadrupoles are traceless (for parameter IDs %s)\n" 1507 insert_mat(qtrans2, Grp)
1521 transmat = np.dot(qmat2, np.diag(self.
rs))
1522 transmat1 = np.zeros((self.
np, self.
np))
1523 for i
in range(self.
np):
1524 for k
in range(self.
np):
1525 transmat1[i, k] = qmat2[i, k] * self.
rs[k]
1531 if len(transmat) > 0
and np.max(np.abs(transmat1 - transmat)) > 0.0:
1533 'The difference between the numpy multiplication and the manual multiplication is \x1b[1;91m%f\x1b[0m, ' 1534 'but it should be zero.\n' % np.max(
1535 np.abs(transmat1 - transmat)))
1537 transmat = np.array(transmat1, copy=
True)
1538 transmatNS = np.array(transmat, copy=
True)
1540 for i
in range(self.
np):
1541 if abs(transmatNS[i, i]) < 1e-8:
1543 transmatNS[i, i] += 1
1546 transmat[i, :] = np.zeros(self.
np)
1548 self.
tmI = transmat.T
1551 """ Create the plist, which is like a reversed version of the parameter map. More convenient for printing. """ 1552 if len(self.
map) == 0:
1554 'The parameter map has no elements (Okay if we are not actually tuning any parameters.)' 1558 []
for j
in range(max([self.
map[i]
for i
in self.
map]) + 1)
1562 for i
in range(self.
np):
1565 def print_map(self, vals=None, precision=4):
1566 """Prints out the (physical or mathematical) parameter indices, IDs and values in a visually appealing way.""" 1569 logger.info(
'\n'.join([
1571 (self.
plist.index(i),
1572 "%% .%ie" % precision % float(vals[self.
plist.index(i)])
1574 (str(vals[self.
plist.index(i)]))) +
" : " +
"%s" % i.split()[0]
1579 def sprint_map(self, vals=None, precision=4):
1580 """Prints out the (physical or mathematical) parameter indices, IDs and values to a string.""" 1585 (self.
plist.index(i),
1586 "%% .%ie" % precision % float(vals[self.
plist.index(i)])
1588 (str(vals[self.
plist.index(i)]))) +
" : " +
"%s" % i.split()[0]
1594 """ Assign physical parameter values to the 'pvals0' array. 1596 @param[in] idx The index to which we assign the parameter value. 1597 @param[in] val The parameter value to be inserted. 1599 if idx == len(self.
pvals0):
1604 def assign_field(self, idx, pid, fnm, ln, pfld, mult, cmd=None):
1605 """ Record the locations of a parameter in a txt file; [[file name, line number, field number, and multiplier]]. 1607 Note that parameters can have multiple locations because of the repetition functionality. 1609 @param[in] idx The (not necessarily unique) index of the parameter. 1610 @param[in] pid The unique parameter name. 1611 @param[in] fnm The file name of the parameter field. 1612 @param[in] ln The line number within the file (or the node index in the flattened xml) 1613 @param[in] pfld The field within the line (or the name of the attribute in the xml) 1614 @param[in] mult The multiplier (this is usually 1.0) 1617 self.
pfields.append([pid, fnm, ln, pfld, mult, cmd])
1621 if isinstance(other, FF):
1623 self_pfields = [p[2:]
for p
in self.
pfields]
1624 other_pfields = [p[2:]
for p
in other.pfields]
1626 return self_pfields == other_pfields
and\
1627 self.
map == other.map
and\
1632 return NotImplemented
1635 def rs_override(rsfactors, termtype, Temperature=298.15):
1636 """ This function takes in a dictionary (rsfactors) and a string (termtype). 1638 If termtype matches any of the strings below, rsfactors[termtype] is assigned 1639 to one of the numbers below. 1641 This is LPW's attempt to simplify the rescaling factors. 1643 @param[out] rsfactors The computed rescaling factor. 1644 @param[in] termtype The interaction type (corresponding to a physical unit) 1645 @param[in] Temperature The temperature for computing the kT energy scale 1648 if match(
'PIMPDIHS[1-6]K|PDIHMULS[1-6]K|PDIHS[1-6]K|RBDIHSK[1-5]|MORSEC',
1651 rsfactors[termtype] = 96.4853
1652 elif match(
'UREY_BRADLEYK1|ANGLESK', termtype):
1653 rsfactors[termtype] = 96.4853 * 6.28
1654 elif match(
'COUL|VPAIR_BHAMC|QTPIEA', termtype):
1656 rsfactors[termtype] = 1.0
1657 elif match(
'QTPIEC|QTPIEH', termtype):
1659 rsfactors[termtype] = 27.2114
1661 'BONDSB|UREY_BRADLEYB|MORSEB|VDWS|VPAIRS|VSITE|VDW_BHAMA|VPAIR_BHAMA',
1664 rsfactors[termtype] = 0.05291772
1665 elif match(
'BONDSK|UREY_BRADLEYK2', termtype):
1667 rsfactors[termtype] = 34455.5275 * 27.2114
1668 elif match(
'PDIHS[1-6]B|ANGLESB|UREY_BRADLEYB', termtype):
1670 rsfactors[termtype] = 57.295779513
1671 elif match(
'VDWT|VDW_BHAMB|VPAIR_BHAMB', termtype):
1673 rsfactors[termtype] = kb * Temperature
1674 elif match(
'MORSEE', termtype):
1675 rsfactors[termtype] = 18.897261
def make_rescale(self, scales, mvals=None, G=None, H=None, multiply=True, verbose=False)
Obtain rescaled versions of the inputs according to dictionary values in "scales" (i...
pvals0
Initial value of physical parameters.
Readers
A dictionary of force field reader classes.
tmI
The transpose of the transformation matrix.
Class for parsing OpenMM force field files.
ffdata
As these options proliferate, the force field class becomes less standalone.
def rs_override(rsfactors, termtype, Temperature=298.15)
This function takes in a dictionary (rsfactors) and a string (termtype).
def __setstate__(self, state)
def make_redirect(self, mvals)
np
The total number of parameters.
def mktransmat(self)
Create the transformation matrix to rescale and rotate the mathematical parameters.
Nifty functions, intended to be imported by any module within ForceBalance.
def assign_field(self, idx, pid, fnm, ln, pfld, mult, cmd=None)
Record the locations of a parameter in a txt file; [[file name, line number, field number...
atomnames
A list of atom names (this is new, for ESP fitting)
rs
List of rescaling factors.
linedestroy_save
Destruction dictionary (experimental).
Finite state machine for parsing DVR grid files.
def determine_fftype(ffname, verbose=False)
Determine the type of a force field file.
def addff_xml(self, ffname)
Parse an XML force field file and create important instance variables.
def addff(self, ffname, xmlScript=False)
Parse a force field file and add it to the class.
def __init__(self, options, verbose=True, printopt=True)
Instantiation of force field class.
def create_mvals(self, pvals)
Converts physical to mathematical parameters.
def invert_svd(X, thresh=1e-12)
Invert a matrix using singular value decomposition.
def isint(word)
ONLY matches integers! If you have a decimal point? None shall pass!
rs_ord
Takes the dictionary 'BONDS':{3:'B', 4:'K'}, 'VDW':{4:'S', 5:'T'}, and turns it into a list of term t...
def flat(vec)
Given any list, array, or matrix, return a single-index array.
tm
The transformation matrix for mathematical -> physical parameters.
Interaction type -> Parameter Dictionary.
def natural_sort(l)
Return a natural sorted list.
Finite state machine for parsing custom GROMACS force field files.
map
The mapping of interaction type -> parameter number.
def __missing__(self, key)
def rsmake(self, printfacs=True)
Create the rescaling factors for the coordinate transformation in parameter space.
Finite state machine for parsing Mol2 force field file.
def assign_p0(self, idx, val)
Assign physical parameter values to the 'pvals0' array.
def col(vec)
Given any list, array, or matrix, return a 1-column 2D array.
excision
Indices to exclude from optimization / Hessian inversion.
redirect
Creates plist from map.
def print_map(self, vals=None, precision=4)
Prints out the (physical or mathematical) parameter indices, IDs and values in a visually appealing w...
def make(self, vals=None, use_pvals=False, printdir=None, precision=12)
Create a new force field using provided parameter values.
def check_dupes(self, pid, ffname, ln, pfld)
Check to see whether a given parameter ID already exists, and provide an alternate if needed...
def __init__(self, backup_dict)
def orthogonalize(vec1, vec2)
Given two vectors vec1 and vec2, project out the component of vec1 that is along the vec2-direction...
patoms
A listing of parameter number -> atoms involved.
def warn_press_key(warning, timeout=10)
def printcool_dictionary(Dict, title="Dictionary Keys : Values", bold=False, color=2, keywidth=25, topwidth=50, center=True, leftpad=0)
See documentation for printcool; this is a nice way to print out keys/values in a dictionary...
Finite state machine for parsing GROMACS force field files.
def printcool(text, sym="#", bold=False, color=2, ansi=None, bottom='-', minwidth=50, center=True, sym2="=")
Cool-looking printout for slick formatting of output.
pfields
A list where pfields[i] = [pid,'file',line,field,mult,cmd], basically a new way to modify force field...
def addff_txt(self, ffname, fftype, xmlScript)
Parse a text force field and create several important instance variables.
plist
The listing of parameter number -> interaction types.
Finite state machine for parsing FrcMod force field file.
Finite state machine for parsing TINKER force field files.
def sprint_map(self, vals=None, precision=4)
Prints out the (physical or mathematical) parameter indices, IDs and values to a string.
Finite state machine for parsing Q-Chem input files.
def create_pvals(self, mvals)
Converts mathematical to physical parameters.
def isfloat(word)
Matches ANY number; it can be a decimal, scientific notation, integer, or what have you...
def wopen(dest, binary=False)
If trying to write to a symbolic link, remove it first.
def list_map(self)
Create the plist, which is like a reversed version of the parameter map.
FFAtomTypes
WORK IN PROGRESS ## This is a dictionary of {'AtomType':{'Mass' : float, 'Charge' : float...