Structure to Energy and Forces (S2EF) #

The S2EF task takes an atomic system as input and predicts the energy of the entire system and the forces on each atom. This is our most general task, ultimately serving as a surrogate to DFT. A model that performs well on this task can accelerate other applications such as molecular dynamics and transition state calculations.

Steps for training an S2EF model#

  1. Define or load a configuration (config), which includes the following:

  • task

  • model

  • optimizer

  • dataset

  • trainer

  2. Create a ForcesTrainer object

  3. Train the model

  4. Validate the model

Warning

For storage and compute reasons we use a very small subset of the OC20 S2EF dataset for this tutorial. Results will be considerably worse than presented in our paper.

Imports#

from ocpmodels.trainers import ForcesTrainer
from ocpmodels.datasets import TrajectoryLmdbDataset
import ocpmodels.models
from ocpmodels.common import logger
from ocpmodels.common.utils import setup_logging
setup_logging()

import numpy as np
import copy
import os

Dataset#

%%bash
mkdir data
cd data
wget -q -nc http://dl.fbaipublicfiles.com/opencatalystproject/data/tutorial_data.tar.gz -O tutorial_data.tar.gz
tar -xzvf tutorial_data.tar.gz
mkdir: cannot create directory ‘data’: File exists
./
./is2re/
./is2re/train_100/
./is2re/train_100/data.lmdb
./is2re/train_100/data.lmdb-lock
./is2re/val_20/
./is2re/val_20/data.lmdb
./is2re/val_20/data.lmdb-lock
./s2ef/
./s2ef/train_100/
./s2ef/train_100/data.lmdb
./s2ef/train_100/data.lmdb-lock
./s2ef/val_20/
./s2ef/val_20/data.lmdb
./s2ef/val_20/data.lmdb-lock
train_src = "data/s2ef/train_100"
val_src = "data/s2ef/val_20"

Normalize data#

If you wish to normalize the targets, you must compute the mean and standard deviation of the energy values. Because the forces are the negative gradient of the energy, we use the same multiplicative factor (the energy standard deviation) for the forces.

train_dataset = TrajectoryLmdbDataset({"src": train_src})

energies = []
for data in train_dataset:
  energies.append(data.y)

mean = np.mean(energies)
stdev = np.std(energies)
/home/runner/micromamba-root/envs/buildenv/lib/python3.9/site-packages/IPython/core/interactiveshell.py:3433: UserWarning: TrajectoryLmdbDataset is deprecated and will be removed in the future.Please use 'LmdbDataset' instead.
  exec(code_obj, self.user_global_ns, self.user_ns)
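
To make the force normalization concrete, the toy sketch below shows how these statistics are applied: energies are shifted by the mean and scaled by the standard deviation, while forces are only scaled by that same standard deviation (their mean is taken to be zero). This is purely illustrative; the trainer applies the normalization internally from the values passed in the dataset config below.

# Illustrative only: how the normalization statistics are used.
# The trainer handles this internally via the dataset config below.
def normalize_energy(e, mean, stdev):
    return (e - mean) / stdev   # shift and scale energies

def normalize_forces(f, stdev):
    return f / stdev            # same multiplicative factor, no shift

print(normalize_energy(energies[0], mean, stdev))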

Define the Config#

For this example, we will explicitly define the config; however, a set of default configs can be found here. Default config yaml files can easily be loaded with the following utility. Loading a yaml config is preferable when launching jobs from the command line. We have included our best models’ config files here for reference.
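
As a rough sketch of that command-line workflow, a config YAML can be read with PyYAML and the resulting dictionaries handed to the trainer. The path below is illustrative, and note that many OCP configs use an includes key to pull in shared base settings, which a plain yaml.safe_load will not resolve; the utilities in the ocp repository handle that for you.

import yaml

# Sketch: load a config from YAML instead of defining it inline.
# The path is illustrative; point it at the config file you actually want.
with open("path/to/config.yml") as f:
    config = yaml.safe_load(f)

# Top-level key names mirror the assembled trainer config shown further below.
task = config["task"]
model = config["model"]
optimizer = config["optim"]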

Note: for demonstration purposes we train for only a single epoch with a reduced batch size (due to GPU memory constraints); modify accordingly for full convergence.

# Task
task = {
    'dataset': 'trajectory_lmdb', # dataset used for the S2EF task
    'description': 'Regressing to energies and forces for DFT trajectories from OCP',
    'type': 'regression',
    'metric': 'mae',
    'labels': ['potential energy'],
    'grad_input': 'atomic forces',
    'train_on_free_atoms': True,
    'eval_on_free_atoms': True
}
# Model
model = {
    'name': 'gemnet_t',
    "num_spherical": 7,
    "num_radial": 128,
    "num_blocks": 3,
    "emb_size_atom": 512,
    "emb_size_edge": 512,
    "emb_size_trip": 64,
    "emb_size_rbf": 16,
    "emb_size_cbf": 16,
    "emb_size_bil_trip": 64,
    "num_before_skip": 1,
    "num_after_skip": 2,
    "num_concat": 1,
    "num_atom": 3,
    "cutoff": 6.0,
    "max_neighbors": 50,
    "rbf": {"name": "gaussian"},
    "envelope": {
      "name": "polynomial",
      "exponent": 5,
    },
    "cbf": {"name": "spherical_harmonics"},
    "extensive": True,
    "otf_graph": False,
    "output_init": "HeOrthogonal",
    "activation": "silu",
    "scale_file": "configs/s2ef/all/gemnet/scaling_factors/gemnet-dT.json",
    "regress_forces": True,
    "direct_forces": True,
}
# Optimizer
optimizer = {
    'batch_size': 1,         # originally 32
    'eval_batch_size': 1,    # originally 32
    'num_workers': 2,
    'lr_initial': 5.e-4,
    'optimizer': 'AdamW',
    'optimizer_params': {"amsgrad": True},
    'scheduler': "ReduceLROnPlateau",
    'mode': "min",
    'factor': 0.8,
    'patience': 3,
    'max_epochs': 1,         # used for demonstration purposes
    'force_coefficient': 100,
    'ema_decay': 0.999,
    'clip_grad_norm': 10,
    'loss_energy': 'mae',
    'loss_force': 'l2mae',
}
# Dataset
dataset = [
  {'src': train_src,
   'normalize_labels': True,
   "target_mean": mean,
   "target_std": stdev,
   "grad_target_mean": 0.0,
   "grad_target_std": stdev
   }, # train set 
  {'src': val_src}, # val set (optional)
]
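
One thing that is easy to trip over is the scale_file path in the model config: it is interpreted relative to the current working directory, and trainer construction fails if the JSON file cannot be found (as the traceback further below shows). A quick defensive check, sketched here, makes the problem obvious up front.

# Sanity check: the GemNet scaling-factor file must exist at the given path,
# otherwise the trainer raises "Scale file ... does not exist."
if not os.path.exists(model["scale_file"]):
    print(f"Scale file not found at {model['scale_file']}; "
          "update the path to point into your ocp checkout.")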

Create the trainer#

trainer = ForcesTrainer(
    task=task,
    model=copy.deepcopy(model), # copied for later use, not necessary in practice.
    dataset=dataset,
    optimizer=optimizer,
    identifier="S2EF-example",
    run_dir="./", # directory to save results if is_debug=False. Prediction files are saved here so be careful not to override!
    is_debug=False, # if True, do not save checkpoint, logs, or results
    print_every=5,
    seed=0, # random seed to use
    logger="tensorboard", # logger of choice (tensorboard and wandb supported)
    local_rank=0,
    amp=True, # use PyTorch Automatic Mixed Precision (faster training and less memory usage),
)
amp: true
cmd:
  checkpoint_dir: ./checkpoints/2022-10-31-18-01-36-S2EF-example
  commit: cba9fb6
  identifier: S2EF-example
  logs_dir: ./logs/tensorboard/2022-10-31-18-01-36-S2EF-example
  print_every: 5
  results_dir: ./results/2022-10-31-18-01-36-S2EF-example
  seed: 0
  timestamp_id: 2022-10-31-18-01-36-S2EF-example
dataset:
  grad_target_mean: 0.0
  grad_target_std: !!python/object/apply:numpy.core.multiarray.scalar
  - &id001 !!python/object/apply:numpy.dtype
    args:
    - f8
    - false
    - true
    state: !!python/tuple
    - 3
    - <
    - null
    - null
    - null
    - -1
    - -1
    - 0
  - !!binary |
    dPVlWhRA+D8=
  normalize_labels: true
  src: data/s2ef/train_100
  target_mean: !!python/object/apply:numpy.core.multiarray.scalar
  - *id001
  - !!binary |
    zSXlDMrm3D8=
  target_std: !!python/object/apply:numpy.core.multiarray.scalar
  - *id001
  - !!binary |
    dPVlWhRA+D8=
gpus: 0
logger: tensorboard
model: gemnet_t
model_attributes:
  activation: silu
  cbf:
    name: spherical_harmonics
  cutoff: 6.0
  direct_forces: true
  emb_size_atom: 512
  emb_size_bil_trip: 64
  emb_size_cbf: 16
  emb_size_edge: 512
  emb_size_rbf: 16
  emb_size_trip: 64
  envelope:
    exponent: 5
    name: polynomial
  extensive: true
  max_neighbors: 50
  num_after_skip: 2
  num_atom: 3
  num_before_skip: 1
  num_blocks: 3
  num_concat: 1
  num_radial: 128
  num_spherical: 7
  otf_graph: false
  output_init: HeOrthogonal
  rbf:
    name: gaussian
  regress_forces: true
  scale_file: configs/s2ef/all/gemnet/scaling_factors/gemnet-dT.json
noddp: false
optim:
  batch_size: 1
  clip_grad_norm: 10
  ema_decay: 0.999
  eval_batch_size: 1
  factor: 0.8
  force_coefficient: 100
  loss_energy: mae
  loss_force: l2mae
  lr_initial: 0.0005
  max_epochs: 1
  mode: min
  num_workers: 2
  optimizer: AdamW
  optimizer_params:
    amsgrad: true
  patience: 3
  scheduler: ReduceLROnPlateau
slurm: {}
task:
  dataset: trajectory_lmdb
  description: Regressing to energies and forces for DFT trajectories from OCP
  eval_on_free_atoms: true
  grad_input: atomic forces
  labels:
  - potential energy
  metric: mae
  train_on_free_atoms: true
  type: regression
trainer: forces
val_dataset:
  src: data/s2ef/val_20

2022-10-31 18:01:30 (INFO): Batch balancing is disabled for single GPU training.
2022-10-31 18:01:30 (INFO): Batch balancing is disabled for single GPU training.
2022-10-31 18:01:30 (INFO): Loading dataset: trajectory_lmdb
2022-10-31 18:01:30 (INFO): Loading model: gemnet_t
/home/runner/micromamba-root/envs/buildenv/lib/python3.9/site-packages/torch/cuda/amp/grad_scaler.py:115: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available.  Disabling.
  warnings.warn("torch.cuda.amp.GradScaler is enabled, but CUDA is not available.  Disabling.")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [6], line 1
----> 1 trainer = ForcesTrainer(
      2     task=task,
      3     model=copy.deepcopy(model), # copied for later use, not necessary in practice.
      4     dataset=dataset,
      5     optimizer=optimizer,
      6     identifier="S2EF-example",
      7     run_dir="./", # directory to save results if is_debug=False. Prediction files are saved here so be careful not to override!
      8     is_debug=False, # if True, do not save checkpoint, logs, or results
      9     print_every=5,
     10     seed=0, # random seed to use
     11     logger="tensorboard", # logger of choice (tensorboard and wandb supported)
     12     local_rank=0,
     13     amp=True, # use PyTorch Automatic Mixed Precision (faster training and less memory usage),
     14 )

File ~/work/ml_catalysis_tutorials/ml_catalysis_tutorials/ocp/ocpmodels/trainers/forces_trainer.py:88, in ForcesTrainer.__init__(self, task, model, dataset, optimizer, identifier, normalizer, timestamp_id, run_dir, is_debug, is_hpo, print_every, seed, logger, local_rank, amp, cpu, slurm, noddp)
     67 def __init__(
     68     self,
     69     task,
   (...)
     86     noddp=False,
     87 ):
---> 88     super().__init__(
     89         task=task,
     90         model=model,
     91         dataset=dataset,
     92         optimizer=optimizer,
     93         identifier=identifier,
     94         normalizer=normalizer,
     95         timestamp_id=timestamp_id,
     96         run_dir=run_dir,
     97         is_debug=is_debug,
     98         is_hpo=is_hpo,
     99         print_every=print_every,
    100         seed=seed,
    101         logger=logger,
    102         local_rank=local_rank,
    103         amp=amp,
    104         cpu=cpu,
    105         name="s2ef",
    106         slurm=slurm,
    107         noddp=noddp,
    108     )

File ~/work/ml_catalysis_tutorials/ml_catalysis_tutorials/ocp/ocpmodels/trainers/base_trainer.py:205, in BaseTrainer.__init__(self, task, model, dataset, optimizer, identifier, normalizer, timestamp_id, run_dir, is_debug, is_hpo, print_every, seed, logger, local_rank, amp, cpu, name, slurm, noddp)
    203 if distutils.is_master():
    204     print(yaml.dump(self.config, default_flow_style=False))
--> 205 self.load()
    207 self.evaluator = Evaluator(task=name)

File ~/work/ml_catalysis_tutorials/ml_catalysis_tutorials/ocp/ocpmodels/trainers/base_trainer.py:214, in BaseTrainer.load(self)
    212 self.load_datasets()
    213 self.load_task()
--> 214 self.load_model()
    215 self.load_loss()
    216 self.load_optimizer()

File ~/work/ml_catalysis_tutorials/ml_catalysis_tutorials/ocp/ocpmodels/trainers/base_trainer.py:369, in BaseTrainer.load_model(self)
    364 bond_feat_dim = self.config["model_attributes"].get(
    365     "num_gaussians", 50
    366 )
    368 loader = self.train_loader or self.val_loader or self.test_loader
--> 369 self.model = registry.get_model_class(self.config["model"])(
    370     loader.dataset[0].x.shape[-1]
    371     if loader
    372     and hasattr(loader.dataset[0], "x")
    373     and loader.dataset[0].x is not None
    374     else None,
    375     bond_feat_dim,
    376     self.num_targets,
    377     **self.config["model_attributes"],
    378 ).to(self.device)
    380 if distutils.is_master():
    381     logging.info(
    382         f"Loaded {self.model.__class__.__name__} with "
    383         f"{self.model.num_params} parameters."
    384     )

File ~/work/ml_catalysis_tutorials/ml_catalysis_tutorials/ocp/ocpmodels/models/gemnet/gemnet.py:261, in GemNetT.__init__(self, num_atoms, bond_feat_dim, num_targets, num_spherical, num_radial, num_blocks, emb_size_atom, emb_size_edge, emb_size_trip, emb_size_rbf, emb_size_cbf, emb_size_bil_trip, num_before_skip, num_after_skip, num_concat, num_atom, regress_forces, direct_forces, cutoff, max_neighbors, rbf, envelope, cbf, extensive, otf_graph, use_pbc, output_init, activation, num_elements, scale_file)
    252 self.int_blocks = torch.nn.ModuleList(int_blocks)
    254 self.shared_parameters = [
    255     (self.mlp_rbf3.linear.weight, self.num_blocks),
    256     (self.mlp_cbf3.weight, self.num_blocks),
    257     (self.mlp_rbf_h.linear.weight, self.num_blocks),
    258     (self.mlp_rbf_out.linear.weight, self.num_blocks + 1),
    259 ]
--> 261 load_scales_compat(self, scale_file)

File ~/work/ml_catalysis_tutorials/ml_catalysis_tutorials/ocp/ocpmodels/modules/scaling/compat.py:55, in load_scales_compat(module, scale_file)
     52 def load_scales_compat(
     53     module: nn.Module, scale_file: Optional[Union[str, ScaleDict]]
     54 ):
---> 55     scale_dict = _load_scale_dict(scale_file)
     56     if not scale_dict:
     57         return

File ~/work/ml_catalysis_tutorials/ml_catalysis_tutorials/ocp/ocpmodels/modules/scaling/compat.py:31, in _load_scale_dict(scale_file)
     29 path = Path(scale_file)
     30 if not path.exists():
---> 31     raise ValueError(f"Scale file {path} does not exist.")
     33 scale_dict: Optional[ScaleDict] = None
     34 if path.suffix == ".pt":

ValueError: Scale file configs/s2ef/all/gemnet/scaling_factors/gemnet-dT.json does not exist.
trainer.model

Train the model#

trainer.train()

Validate the model#

Load the best checkpoint#

The checkpoints directory contains two checkpoint files:

  • best_checkpoint.pt - Model parameters corresponding to the best val performance during training. Used for predictions.

  • checkpoint.pt - Model parameters and optimizer settings for the latest checkpoint. Used to continue training.
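
If a run is interrupted, training can be resumed by loading checkpoint.pt into a trainer built with the same config and calling train() again. A minimal sketch, reusing the load_checkpoint call demonstrated below, might look like this; fill in the checkpoint directory of the run you want to resume.

# Sketch: resume training from the latest checkpoint of an earlier run.
# Replace the path with the checkpoint_dir of the run you want to resume.
resume_path = "./checkpoints/<timestamp>-S2EF-example/checkpoint.pt"  # illustrative
trainer.load_checkpoint(checkpoint_path=resume_path)
trainer.train()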

# The `best_checkpoint.pt` file contains the checkpoint with the best val performance
checkpoint_path = os.path.join(trainer.config["cmd"]["checkpoint_dir"], "best_checkpoint.pt")
checkpoint_path
# Append a test set to the dataset list. We reuse the val set here for demonstration.

# Dataset
dataset.append(
  {'src': val_src}, # test set (optional)
)
dataset
pretrained_trainer = ForcesTrainer(
    task=task,
    model=model,
    dataset=dataset,
    optimizer=optimizer,
    identifier="S2EF-val-example",
    run_dir="./", # directory to save results if is_debug=False. Prediction files are saved here so be careful not to override!
    is_debug=False, # if True, do not save checkpoint, logs, or results
    print_every=10,
    seed=0, # random seed to use
    logger="tensorboard", # logger of choice (tensorboard and wandb supported)
    local_rank=0,
    amp=True, # use PyTorch Automatic Mixed Precision (faster training and less memory usage)
)

pretrained_trainer.load_checkpoint(checkpoint_path=checkpoint_path)

Run on the test set#

# make predictions on the existing test_loader
predictions = pretrained_trainer.predict(pretrained_trainer.test_loader, results_file="s2ef_results", disable_tqdm=False)
energies = predictions["energy"]
forces = predictions["forces"]