Creating an Ethanol-Water Mixture

In this tutorial, we’ll walk through setting up an atomistic system containing water and ethanol molecules using lammpsio. This tutorial demonstrates how to work with molecular systems that have bonds, angles, and dihedral angles across multiple species. At the end, we will create a data file ready to be used by LAMMPS.

First, we import the necessary packages.

[1]:
import json
import subprocess

import lammpsio
import numpy

System parameters

We define the composition of our mixture and the simulation box size. Our system will contain 125 water molecules and 125 ethanol molecules, creating an equimolar mixture in a cubic box with a side length of 25 Å.

[2]:
N_water = 125
N_ethanol = 125
L_box = 25

Generating the molecular configuration

Before we can build our LAMMPS data file, we need to create a molecular configuration. We’ll use PACKMOL, a powerful tool for creating initial configurations, to set up a dense, nonoverlapping arrangement of molecules.

For this tutorial, we’ve provided the necessary input files for you to follow along:

  • water.json: LAMMPS JSON molecule file for water

  • ethanol.json: LAMMPS JSON molecule file for ethanol

We start by reading the molecular structures from the LAMMPS JSON molecule files. These files contain the atomic coordinates, masses, types, and topological information (bonds, angles, and dihedrals) for each molecule type. The provided files use OPLS-aa types for ethanol and TIP3P types for water. We assume that all sections containing atom IDs are sorted.

[3]:
with open("water.json", "r") as f:
    water_data = json.load(f)

with open("ethanol.json", "r") as f:
    ethanol_data = json.load(f)

Preparing and running PACKMOL

Now we create the necessary input files for PACKMOL. We write XYZ coordinate files for each molecule type and generate a PACKMOL input file that specifies how to pack the molecules in our simulation box.

[4]:
# write water xyz file
with open("water.xyz", "w") as f:
    f.write(f"{len(water_data["coords"]["data"])}\n")
    f.write("Angstrom\n")
    for cords, types in zip(water_data["coords"]["data"], water_data["types"]["data"]):
        id, x, y, z = cords
        id, _type = types
        f.write(f"{_type} {x} {y} {z} \n")

# write ethanol xyz file
with open("ethanol.xyz", "w") as f:
    f.write(f"{len(ethanol_data["coords"]["data"])}\n")
    f.write("Angstrom\n")
    for cords, types in zip(ethanol_data["coords"]["data"], ethanol_data["types"]["data"]):
        id, x, y, z = cords
        id, _type = types
        f.write(f"{_type} {x} {y} {z} \n")

# create PACKMOL input file
with open("pack.inp", "w") as f:
    lines = [
        "tolerance 2.0\n",
        "output mixture.xyz\n",
        "filetype xyz\n",
        f"pbc 0 0 0 {L_box} {L_box} {L_box}\n",
        "structure water.xyz\n",
        f"  number {N_water}\n",
        f"  inside box 0 0 0 {L_box} {L_box} {L_box}\n",
        "end structure\n",
        "structure ethanol.xyz\n",
        f"  number {N_ethanol}\n",
        f"  inside box 0 0 0 {L_box} {L_box} {L_box}\n",
        "end structure\n"
    ]
    f.writelines(lines)

With our input files prepared, we can now execute PACKMOL to generate the mixed molecular configuration. This will create a single file mixture.xyz containing all molecules positioned within our simulation box.

[5]:
subprocess.run("packmol < pack.inp", shell=True, capture_output=True);

Reading packed coordinates

Now we read the molecular coordinates from the mixture.xyz file generated by PACKMOL and separate them into water and ethanol positions. PACKMOL writes coordinates in the order specified in the input, so water molecules appear first followed by ethanol molecules. We also store the types of the packed particles and number of atoms in each molecule, as this will be needed several times throughout the tutorial.

[6]:
data = numpy.genfromtxt('mixture.xyz', skip_header=2, dtype=[('type', 'U8'), ('x', float), ('y', float), ('z', float)])
positions = numpy.column_stack((data['x'], data['y'], data['z']))
types = data['type']

atoms_per_water = len(water_data["coords"]["data"])
atoms_per_ethanol = len(ethanol_data["coords"]["data"])

Creating the snapshot

Now we create a Snapshot to hold all our molecular data. We calculate the total number of atoms and define a simulation box that contains our system. We use the same box dimensions that were specified in our PACKMOL input file: a cubic box extending from 0 to 25 Å in each direction.

[7]:
N_total = N_water * atoms_per_water + N_ethanol * atoms_per_ethanol
box = lammpsio.Box([0, 0, 0], [L_box, L_box, L_box])

snap = lammpsio.Snapshot(N_total, box, step=0)

Assigning atomic properties

Now we assign positions, masses, and type IDs to each atom. The packed coordinates are assigned to each atom directly from the PACKMOL output.

[8]:
snap.position = positions

We use a set to get the unique atom types from our packed system and assign each a numeric identifier. Then, we create a LabelMap to link integer type IDs to their corresponding atom type, which could be useful for later force field assignment.

[9]:
unique_types = set(types)
snap.type_label = lammpsio.LabelMap({idx+1: atom_type for idx, atom_type in enumerate(unique_types)})

We use the LabelMap to convert the atom types and assign integer type IDs. We extract the atomic mass information from each molecule’s JSON and replicate them by the number of each molecule. We concatenate these together, remembering that water’s coordinates were generated first by PACKMOL, and assign them to the Snapshot object.

[10]:
# assign type IDs
snap.typeid = [snap.type_label.inverse[t] for t in types]

# assign masses
water_mass = [mass for _, mass in water_data["masses"]["data"]]
ethanol_mass = [mass for _, mass in ethanol_data["masses"]["data"]]

snap.mass = water_mass * N_water + ethanol_mass * N_ethanol

Parsing topology

To prevent code duplication, we write a helper function that will parse our topology objects and handle the replication across multiple molecules.

[11]:
def get_topology(topology_data, num_repeats, atoms_per_molecule):
    """Get topology for a mixture of molecules.

    Args:
        topology_data: List of topology data for each molecule type from JSON
            molecule files
        num_repeats: List of number of molecules for each molecule type
        atoms_per_molecule: List of atoms per molecule for each molecule type

    Returns:
        members_list: List of member atom indices for each topology item
        types_list: List of topology type labels for each topology item
    """
    members_list = []
    types_list = []

    if len(topology_data) != len(num_repeats) or len(topology_data) != len(atoms_per_molecule):
        raise ValueError("Length of topology_data, num_repeats, and atoms_per_molecule must be the same.")

    offset = 0
    for topology, repeats, atoms_per_mol in zip(topology_data, num_repeats, atoms_per_molecule):
        for _ in range(repeats):
            for topology_item in topology:
                topology_type = topology_item[0]
                member_indices = topology_item[1:]

                types_list.append(topology_type)
                members_list.append([offset + idx for idx in member_indices])
            offset += atoms_per_mol

    return members_list, types_list

Creating bonds

We extract the bond topology from the molecular data files we loaded earlier. Each molecule type has its own set of bonds defined in the JSON files. We start by reading the bond information from both molecule types.

[12]:
water_bond = water_data["bonds"]["data"]
ethanol_bond = ethanol_data["bonds"]["data"]

Now we calculate the total number of bonds, use our helper function to get the bond topology for all molecules, and get the unique bonds in the system.

[13]:
# calculate total number of bonds
N_bonds = N_water * len(water_bond) + N_ethanol * len(ethanol_bond)

# get bond topology for all molecules using our helper function
bond_members, bond_types = get_topology(
    topology_data=[water_data["bonds"]["data"], ethanol_data["bonds"]["data"]],
    num_repeats=[N_water, N_ethanol],
    atoms_per_molecule=[atoms_per_water, atoms_per_ethanol],
)

# get unique bond types
unique_bonds = set(bond_types)

Finally, we create and assign the Bonds object with all the bond connectivity, LabelMap, and integer type-id information for our system.

[14]:

snap.bonds = lammpsio.Bonds(N=N_bonds, num_types=len(unique_bonds)) snap.bonds.members = bond_members snap.bonds.type_label = lammpsio.LabelMap({idx+1:bond_type for idx, bond_type in enumerate(unique_bonds)}) snap.bonds.typeid = [snap.bonds.type_label.inverse[bond_type] for bond_type in bond_types]

Creating angles

We follow the same approach for angles as we did for bonds: extract the angle topology from the molecule JSON files, replicate the per-molecule angles to all molecules in the system, and finally create the Angles object to store all the angle information and the LabelMap for the system.

[15]:
# get angle topology from molecule JSON data
water_angles = water_data["angles"]["data"]
ethanol_angles = ethanol_data["angles"]["data"]

# calculate total number of angles
N_angles = N_water * len(water_angles) + N_ethanol * len(ethanol_angles)

# get angle members and types for all molecules using our helper function
angle_members, angle_types = get_topology(
    topology_data=[water_data["angles"]["data"], ethanol_data["angles"]["data"]],
    num_repeats=[N_water, N_ethanol],
    atoms_per_molecule=[atoms_per_water, atoms_per_ethanol],
)

# get unique angle types
unique_angles = set(angle_types)

# create and assign the Angles object
snap.angles = lammpsio.Angles(N=N_angles, num_types=len(unique_angles))
snap.angles.members = angle_members
snap.angles.type_label = lammpsio.LabelMap({idx+1:angle_type for idx, angle_type in enumerate(unique_angles)})
snap.angles.typeid = [snap.angles.type_label.inverse[angle_type] for angle_type in angle_types]

Creating dihedral angles

We also need dihedral angles (4-atom torsional angles) to properly define the molecular geometry of ethanol. Water molecules don’t have dihedral angles since they only have 3 atoms.

We follow the same protocol as above, but we only process ethanol molecules since water doesn’t have dihedrals.

[16]:
# get dihedral topology from ethanol JSON data
ethanol_dihedrals = ethanol_data["dihedrals"]["data"]

# calculate total number of dihedrals in the system
N_dihedrals = N_ethanol * len(ethanol_dihedrals)

# get dihedral members and types for all ethanol molecules using our helper function
dihedral_members, dihedral_types = get_topology(
    topology_data=[ethanol_data["dihedrals"]["data"]],
    num_repeats=[N_ethanol],
    atoms_per_molecule=[atoms_per_ethanol],
)

# get unique dihedral types
unique_dihedrals = set(dihedral_types)

# create and assign the Dihedrals object
snap.dihedrals = lammpsio.Dihedrals(N=N_dihedrals, num_types=len(unique_dihedrals))
snap.dihedrals.members = dihedral_members
snap.dihedrals.type_label = lammpsio.LabelMap({idx+1: dihedral_type for idx, dihedral_type in enumerate(unique_dihedrals)})
snap.dihedrals.typeid = [snap.dihedrals.type_label.inverse[dihedral_type] for dihedral_type in dihedral_types]

Save to LAMMPS data file

Finally, we save the complete molecular system to a LAMMPS data file.

[17]:
# Write to LAMMPS data file
lammpsio.DataFile.create(
    filename="ethanol_water_mixture.data", snapshot=snap
)
[17]:
<lammpsio.data.DataFile at 0x105279730>

Summary

In this tutorial, we created an ethanol-water mixture using lammpsio. We demonstrated how to work with atomistic systems that require bonds, angles, and dihedral angles. The key steps were:

  1. Read molecular data from LAMMPS JSON files containing coordinates, masses, and topological information

  2. Generate coordinates using PACKMOL to create a realistic configuration of molecules

  3. Build molecular topology by extracting and processing bonds, angles, and dihedrals from the JSON molecule files

  4. Create the data file ready to be used in LAMMPS

This approach shows how lammpsio can interface with LAMMPS molecule JSON files and PACKMOL to handle molecular systems with multiple species!