Speeding Up Analysis¶
In this tutorial, we’ll explore how to speed up analysis of LAMMPS trajectory data with lammpsio. We will analyze a dump file from a simulation of a single 100-bead linear polymer represented using the Kremer–Grest model. All particles interact pairwise through the Weeks-Chandler-Anderson potential:
Additionally, bonded particles interact via the FENE bond potential:
We’ve chosen standard model parameters of \(\sigma=1\), \(\varepsilon=1\), \(R_0=1.5\), and \(k=30\) in LAMMPS lj units.
We’ve included the LAMMPS input and initial configuration files if you want to run the simulation yourself and follow along. Note that to keep compute time short, we have reduced the length of the trajectory. Averages computed from this trajectory are probably not reliable, but you can extend the simulation if you’d like to improve them!
First, we import lammpsio and load the corresponding dump file.
[35]:
import lammpsio
import numpy
import numba
traj = lammpsio.DumpFile("traj.lammpstrj")
Calculating the radius of gyration¶
We will calculate the radius of gyration \(R_g\), a key measure of polymer size:
where \(\mathbf{R}_i\) is the position vector of particle i, N is the number of beads (\(N=100\) for our polymer), and \(\mathbf{R}_{\rm cm}\) is the position vector of the center of mass of the polymer:
Note that the position vectors of the particles should all be unwrapped to account for the periodic boundary conditions. lammpsio makes it easy to extract this data from LAMMPS dump files!
Literal implementation¶
We’ll start by calculating \(R_g^2\) with a literal Python implementation that uses loops.
[36]:
def compute_rg(pos):
N = pos.shape[0]
# Compute center of mass
rcm = numpy.zeros(3)
for i in range(N):
rcm += pos[i]
rcm /= N
# Compute radius of gyration squared
rg_sqr = 0
for i in range(N):
dr = pos[i] - rcm
rg_sqr += dr[0]**2 + dr[1]**2 + dr[2]**2
rg_sqr /= N
return rg_sqr
We will time how long it takes to evaluate \(R_g^2\) over all configurations in the trajectory.
[37]:
%%timeit -n 100 -r 3
rg_sqr = []
for snapshot in traj:
pos = snapshot.position + 2 * snapshot.box.high[0] * snapshot.image
rg_sqr.append(compute_rg(pos))
40.2 ms ± 920 μs per loop (mean ± std. dev. of 3 runs, 100 loops each)
It is known that loops are usually quite slow in Python. Next, we’ll look at ways to speed things up using either NumPy or Numba.
NumPy¶
One approach we can take is to use NumPy’s vectorized array operations. Let’s replace the first loop with numpy.mean and the second loop with a combination of numpy.sum to get the squared distance from the center of mass for each particle and numpy.mean to average this quantity over all particles.
[38]:
def compute_rg_numpy(pos):
rcm = numpy.mean(pos, axis=0)
rg_sqr = numpy.mean(numpy.sum((pos - rcm)**2, axis=1))
return rg_sqr
Now we repeat the same timing.
[39]:
%%timeit -n 100 -r 3
rg_sqr = []
for snapshot in traj:
pos = snapshot.position + 2 * snapshot.box.high[0] * snapshot.image
rg_sqr.append(compute_rg_numpy(pos))
30.3 ms ± 300 μs per loop (mean ± std. dev. of 3 runs, 100 loops each)
The vectorized NumPy approach is significantly faster than the literal implementation!
Numba¶
Alternatively, we can use just-in-time (JIT) compilation with numba to speed up our calculations. We’ll take the compute_rg function from our first implementation, then use numba.njit to enable JIT compilation. If you were writing the function from scratch, you could put the @numba.njit decorator on your function instead.
[40]:
compute_rg_numba = numba.njit(compute_rg)
Now we’ll go ahead with timing! Note that the first time compute_rg_numba gets called will be slower than subsequent calls. We don’t worry about separating that out here because we repeat the timing many times.
[ ]:
%%timeit -n 100 -r 3
rg_sqr = []
for snapshot in traj:
pos = snapshot.position + 2 * snapshot.box.high[0] * snapshot.image
rg_sqr.append(compute_rg_numba(pos))
29.8 ms ± 448 μs per loop (mean ± std. dev. of 3 runs, 100 loops each)
This approach gives a similar speed up as using NumPy, but we didn’t need to rewrite anything!
Summary¶
lammpsio makes it simple to load and analyze LAMMPS dump files in Python. As shown above, taking advantage of the Python ecosystem can dramatically speed up your analysis! NumPy and Numba are just two examples that provide significant performance gains. There are many other tools that lammpsio can interface with to optimize your specific workflows!
Disclaimer: the timings above are for our workstation. Your mileage may vary based on your system specifications.