CageWater: Accurately predicting water properties at rocket speed
Lakshmanji Verma and Ken Dill
Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY, USA
Water is everywhere, and life depends on it—we are about 70% water. Yet despite its ubiquity, water remains one of the most poorly understood liquids. It exhibits striking anomalies—such as a density maximum and unusually high heat capacity—that make it both scientifically fascinating and notoriously difficult to model. Atomistic simulations remain the most reliable way to reproduce its behavior, but they are computationally expensive and generally accurate only near ambient conditions.
CageWater addresses these challenges with a fully analytical statistical-mechanical model. It predicts water’s properties roughly 1000× faster than explicit simulations and remains accurate across a wide range of temperatures and pressures. It also provides a microscopic understanding of the origins of water’s anomalies; for example, the high heat capacity results from the breaking of stronger, cooperative bonds. It even resolves long-standing controversies about the liquid-liquid phase separation and critical point in supercooled water. By providing a fast and reliable description of pure water, CageWater also paves the way for modeling solvation and for efficient simulations of biomolecules (proteins, DNA, drugs, etc.) and chemical materials in aqueous solution under diverse thermodynamic conditions, accelerating drug discovery and materials design.
A major challenge, beyond the theoretical development, was calibrating and validating the model against the enormous volume of experimental data available for water. Although CageWater itself can run on any computer, its nonlinear structure and high-dimensional parameter space make heuristic optimization impractical on a standard workstation: calibrating the model required exploring millions of candidate parameter sets and comparing each against thousands of experimental data points.
To achieve this, we used a genetic algorithm (GA), a classical machine-learning approach. Using the Intel Sapphire Rapids nodes on the SeaWulf cluster, we parallelized the evaluation of millions of candidate solutions, enabling rapid evolutionary searches for the global optimum. This large-scale calibration was crucial for identifying optimal parameters that allow CageWater to match, and in some cases surpass, the accuracy of leading explicit and polarizable water models (TIP3P, TIP4P, TIP5P, SPC, MB-pol) at a tiny fraction of the computational cost.
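
The calibration strategy described above can be illustrated with a minimal Python sketch of a genetic algorithm fitting model parameters to reference data. This is not the CageWater code. The function toy_model is a hypothetical three-parameter parabola standing in for the analytical model, the reference points are synthetic, and the names loss, tournament, run_ga, and map_fn, along with all settings (population size, mutation scale, tournament size), are assumptions chosen only for illustration. The fitness evaluation goes through a pluggable map_fn because that is the step one would distribute across cores or nodes.

```python
import random
from functools import partial


def toy_model(params, temperature):
    """Hypothetical stand-in for an analytical model (NOT the CageWater equations)."""
    a, b, c = params
    return a + b * (temperature - c) ** 2


def loss(params, data):
    """Mean squared error between model predictions and reference points."""
    return sum((toy_model(params, t) - y) ** 2 for t, y in data) / len(data)


def tournament(ranked, rng, k=3):
    """Return the parameters of the best of k randomly drawn candidates."""
    return min(rng.sample(ranked, k), key=lambda sp: sp[0])[1]


def run_ga(data, bounds, pop_size=60, generations=80, mutation_scale=0.05,
           elite=2, seed=0, map_fn=map):
    """Evolve parameter sets within bounds; returns (best, best_loss, history)."""
    rng = random.Random(seed)
    evaluate = partial(loss, data=data)  # picklable, so it also works with process pools
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Expensive, embarrassingly parallel step. Assumption: on a multi-core
        # machine, map_fn could be concurrent.futures.ProcessPoolExecutor().map.
        scores = list(map_fn(evaluate, population))
        ranked = sorted(zip(scores, population), key=lambda sp: sp[0])
        history.append(ranked[0][0])
        # Elitism: carry the best candidates over unchanged.
        next_pop = [list(p) for _, p in ranked[:elite]]
        while len(next_pop) < pop_size:
            p1, p2 = tournament(ranked, rng), tournament(ranked, rng)
            child = []
            for (lo, hi), x1, x2 in zip(bounds, p1, p2):
                w = rng.random()
                gene = w * x1 + (1.0 - w) * x2                # blend crossover
                gene += rng.gauss(0.0, mutation_scale * (hi - lo))  # mutation
                child.append(min(hi, max(lo, gene)))          # clamp to bounds
            next_pop.append(child)
        population = next_pop
    scores = list(map_fn(evaluate, population))
    best_loss, best = min(zip(scores, population), key=lambda sp: sp[0])
    return best, best_loss, history


if __name__ == "__main__":
    # Synthetic reference data generated from known parameters (illustrative only).
    true_params = (1.0, -8e-6, 277.0)
    data = [(float(t), toy_model(true_params, float(t))) for t in range(250, 380, 10)]
    bounds = [(0.9, 1.1), (-1e-4, 0.0), (250.0, 300.0)]
    best, best_loss, history = run_ga(data, bounds)
    print("best parameters:", best)
    print("best loss:", best_loss)
```

Because the elite candidates are copied unchanged into each new generation, the best loss recorded in history never increases. In a real calibration, the loss would combine many properties and state points, and the evaluation step would dominate the run time, which is why distributing it is the natural way to scale.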

References
Lakshmanji Verma and Ken A. Dill, Statistical Mechanical Theory of Liquid Water, J. Chem. Theory Comput. 2025, 21 (16), 7755–7764.
