Tinkergpu:Forcebalance

From tinkergpu
Jump to: navigation, search

Background

ForceBalance is an automatic parameter optimization tool that was developed by Lee-Ping Wang et al. Please read at least these three papers I listed here to get a basic understanding of ForceBalance.

   J. Phys. Chem. Lett., 2014, 5 (11), pp 1885–1891 (ForceBalance paper, TIP3P and TIP4P as examples)

   J. Phys. Chem. B, 2013, 117 (34), pp 9956–9972 (iAMOEBA paper)

   J. Phys. Chem. B, 2015, 119 (29), pp 9423–9437 (AMOEBA14 paper)

In the following I will provide a brief tutorial on how to set up and run ForceBalance according to my experience. 

 

ForceBalance Installation

According to my experience, this is the simplest way to install ForceBalance software.

Make sure you have:

1) python 2.7 or later version;

2) pip (how to get it and install: https://pip.pypa.io/en/stable/installing);

3) forcebalance source code.

Keep in mind: whichever dependencies required by next steps, you can easily install them by pip.

pip install xxx 

Install 'forcebalance':

Go to forcebalance directory and do the following:

python setup.py install --user

This will install the python modules in ~/.local/lib/python?.?/site-packages/forcebalance*/forcebalance

Add the ForceBalance executable to your PATH

export PATH=$PATH:~/.local/bin

 

ForceBalance setup

 I will take forcebalance directory /forcebalance/studies/015_tinker_amoeba as an example.

A directory called "forcefield" is necessary, under which the amoeba parameter file is presented and a shot.key file for gas phase run. 

Another directory called "targets" is necessary which specify different targets (such as cluster force and energy matching, liquid properties, cluster binding energies etc).

Remember: you can design your own target either like the current ForceBalance supported ones, or completely different ones, which you may need to modify the source code.

Running ForceBalance is a simple command: ForceBalance example.in

The example.in file is the FB input which tells FB program to do the optimization/single point task. About the detailed control keywords, please look at the forcebalance manual file. 

ForceBalance Run with TINKER for AMOEBA+ model

New features added to ForceBalance

ForceBalance code was modified to deal with our new terms in AMOEBA+ model, which includes, charge transfer, charge penetration (as a correction of multipole-multipole interation), vdW shape parameters and Quadupole scaling.

All these modifications are made in tinkerio.pyfile.

allp = [ ..., "ct', 'delta-halgren', 'gamma-halgren','cp', ...]
eckeys = [..., 'Charge-Charge', ...] #Chengwen added
pdict = {..., 'CT'           : {'Atom':[1], 2:'A',3:'B'},   # CT parameters
         'DELTA-HALGREN': {'Atom':[], 0:''},    #Delta in VDW function
         'GAMMA-HALGREN': {'Atom':[], 0:''},    #Gamma in VDW function
         'QPOLESCALING' : {'Atom':[], 0:'', #Quadrupole scaling factor&nbsp
         ...};

Liquid simulation parallelization

      The original code was modified in the liquid.py file in order to run liquid simulation parallelly. Basically, I am using a combination of "ssh" and background command "&" to accomplish this goal. Besides, inparser'.py file, we need to add a line to specify the file location which will specify file contains the temperature-node list information. Here is the core part of the code:

   
 373    #! Modified by Chengwen Liu for multiple trajectores liquid simulations
 374    #==============================================================================================================
 375     def npt_simulation(self, temperature, pressure, simnum):
 376         """ Submit Several NPT simulations on our Clusters """
 377         temps = [];nodes = []
 378         lines = file(self.liquid_node).readlines()
 379         for line in lines:
 380             data = line.split()
 381             temps.append(float(data[0]))
 382             nodes.append(str(data[1]))
 383         if not os.path.exists('npt_result.p'):
 384             link_dir_contents(os.path.join(self.root,self.rundir),os.getcwd())
 385             self.last_traj += [os.path.join(os.getcwd(), i) for i in self.extra_output]
 386             self.liquid_mol[simnum%len(self.liquid_mol)].write(self.liquid_coords, ftype='tinker' if self.engname == 'tinker' else None)
 387             if temperature in temps[0:-1]:
 388               cmdstr = 'ssh %s "source ~/.forcebalance; cd %s; nohup %s python -u npt.py %s %.3f %.3f > npt.out 2>err.log &" ' % (nodes[int(temps.index(temperature))], os.getcwd(), self.nptpfx, self.eng
 389               _exec(cmdstr, copy_stderr=True, outfnm='npt.out')
 390
 391             if temperature == temps[-1]:
 392               cmdstr = '%s python -u npt.py %s %.3f %.3f >npt.out 2>err.log' % (self.nptpfx, self.engname, temperature, pressure)
 393               _exec(cmdstr, copy_stderr=True, outfnm='npt.out')
 394
 395               #Check whether ALL simulations are completed; if not, wait for 30sec
 396               parent_dir = os.getcwd()[0:-14]
 397               sleepingtime = 0.0
 398               allFiles = 0
 399               while allFiles != len(temps):
 400                 allFiles = 0
 401                 for temp in temps:
 402                   directory_name = parent_dir+str(temp)+"K-1.0atm/"
 403                   allFiles = allFiles + int(os.path.isfile(directory_name+'npt_result.p'))
 404                   if not os.path.isfile(directory_name+'npt_result.p'):
 405                     print("File %s does not exist, I will sleep for 30sec"%(directory_name+'npt_result.p'))
 406                     time.sleep(30.0)
 407                     sleepingtime += 30.0
 408                 #if sleepingtime >= 900:break # which means some trajectories were probably crashed.
 409 #Above ============================================================================================================

 

ForceBalance run with Tinker-OpenMM

For GPU run, we have to change the following lines:

1. In tinkerio.py: rbytes=1024 should be smaller, otherwise it will cause a "stream interleave" error. (change it to 1)

2. In npt.py, comment out the PrintEDA function calls. In tinker-openmm it will not print this information.

3. In tinkerio.py, change "dynamic" to "dynamic_omm", which is the GPU executable. For gas phase, we are still using "dynamic".

4. In order to speed up the post-processing (analyze *.arc file to get energy and dipole gradient), I made a new version of energy_derivatives() function in npt.py, which allows one to run as many analyze jobs as possible.

188 def energy_derivatives_TINKER(FF, mvals, h, pgrad, length, AGrad=True):
189     #initialize the energy gradient and dipole gradient array
190     G   = np.zeros((FF.np,length))
191     GDx = np.zeros((FF.np,length))
192     GDy = np.zeros((FF.np,length))
193     GDz = np.zeros((FF.np,length))
194     if not AGrad:
195         return G, GDx, GDy, GDz
196     #Actually tinkerpath is an argument in FB
197     tinkerpath = "/home/liuchw/bin/tinkerOMM0910"
198     #Record key file except for the first line
199     lines = file("liquid-md.key").readlines()[1:]
200     ofile = open("runAna_m.sh","w")
201     ofil1 = open("runAna_p.sh","w")
202     #backup the current water.prm
203     os.rename("water.prm", "water.prm.org")
204     for i in pgrad:
205         #minus and plus
206         prmfile1 = open("liquid_%02d_m.key"%i, 'w')
207         prmfile2 = open("liquid_%02d_p.key"%i, 'w')
208         prmfile1.write("parameters ./water_%02d_m.prm\n"%i)
209         prmfile2.write("parameters ./water_%02d_p.prm\n"%i)
210         for line in lines:
211             prmfile1.write(line)
212             prmfile2.write(line)
213         prmfile1.close()
214         prmfile2.close()
215
216         mvals_= mvals
217         mvals_[i] += -abs(h)
218         FF.make(mvals_)
219         os.rename("water.prm", "water_%02d_m.prm"%i)
220         mvals_[i] += abs(h)*2.0
221         FF.make(mvals_)
222         os.rename("water.prm", "water_%02d_p.prm"%i)
223         mvals_[i] += -abs(h)
224
225         if i == pgrad[-1]:
226             cmdstr1 = tinkerpath+"/analyze ./liquid-md.arc -k ./liquid_%02d_m.key G,E,M > liquid_%02d_m.out \n"%(i,i)
227             cmdstr2 = tinkerpath+"/analyze ./liquid-md.arc -k ./liquid_%02d_p.key G,E,M > liquid_%02d_p.out \n"%(i,i)
228         else:
229             cmdstr1 = tinkerpath+"/analyze ./liquid-md.arc -k ./liquid_%02d_m.key G,E,M > liquid_%02d_m.out &\n"%(i,i)
230             cmdstr2 = tinkerpath+"/analyze ./liquid-md.arc -k ./liquid_%02d_p.key G,E,M > liquid_%02d_p.out &\n"%(i,i)
231         ofile.write(cmdstr1)
232         ofil1.write(cmdstr2)
233     ofile.close()
234     ofil1.close()
235     #wait 5sec, for safe
236     time.sleep(5.0)
237     os.rename("water.prm.org", "water.prm")
238
239     os.system("sh runAna_m.sh")
240     os.system("sh runAna_p.sh")
241     #Check whether all analyze jobs finished!
242     readFlag = 0
243     while readFlag==0:
244         cmdstr1 = "grep 'Dipole X,Y,Z-Components :' liquid_*_m.out >dipole_m.dat"
245         cmdstr2 = "grep 'Dipole X,Y,Z-Components :' liquid_*_p.out >dipole_p.dat"
246         os.system(cmdstr1)
247         os.system(cmdstr2)
248         nLinesDm = sum(1 for line in open("dipole_m.dat"))
249         nLinesDp = sum(1 for line in open("dipole_p.dat"))
250         if ((nLinesDm==length*len(pgrad)) and (nLinesDp==length*len(pgrad))):
251             readFlag = 1
252             break
253         else:
254             print("Some analyze jobs are still running! I will sleep for 30 sec!")
255             time.sleep(30.0)
256     #If all jobs finished, calculate numerical gradients
257     if readFlag==1:
258         for i in pgrad:
259             eanl_m = []
260             eanl_p = []
261             dip_px = []
262             dip_py = []
263             dip_pz = []
264             dip_mx = []
265             dip_my = []
266             dip_mz = []
267             lines = file("liquid_%02d_m.out"%i).readlines()
268             for line in lines:
269                 s = line.split()
270                 if 'Total Potential Energy : ' in line:
271                     eanl_m.append(float(s[4]) * 4.184)
272                 if 'Dipole X,Y,Z-Components :' in line:
273                     dip_mx.append(float(s[-3]))
274                     dip_my.append(float(s[-2]))
275                     dip_mz.append(float(s[-1]))
276             lines = file("liquid_%02d_p.out"%i).readlines()
277             for line in lines:
278                 s = line.split()
279                 if 'Total Potential Energy : ' in line:
280                     eanl_p.append(float(s[4]) * 4.184)
281                 if 'Dipole X,Y,Z-Components :' in line:
282                     dip_px.append(float(s[-3]))
283                     dip_py.append(float(s[-2]))
284                     dip_pz.append(float(s[-1]))

285
286             #use np.array here
287             eanl_p = np.array(eanl_p)
288             eanl_m = np.array(eanl_m)
289             dip_px = np.array(dip_px)
290             dip_py = np.array(dip_py)
291             dip_pz = np.array(dip_pz)
292             dip_mx = np.array(dip_mx)
293             dip_my = np.array(dip_my)
294             dip_mz = np.array(dip_mz)
295             #2-sides numerical grad.
296             G[i,:]   = (eanl_p - eanl_m)/(2*h)
297             GDx[i,:] = (dip_px - dip_mx)/(2*h)
298             GDy[i,:] = (dip_py - dip_my)/(2*h)
299             GDz[i,:] = (dip_pz - dip_mz)/(2*h)
300     return G, GDx, GDy, GDz


Code availability

All of the modified code can be found in https://biomol.bme.utexas.edu/~liuchw/