We are working with collaborators at Facebook AI Research on the Open Catalyst Project. We released the first step in this project: an open dataset of more than 1.2M adsorbate relaxations and 250M DFT single-point calculations across catalyst materials and chemistry space.
In addition to the dataset, we have released a set of baseline models on GitHub built on state-of-the-art graph convolution networks. These show what is feasible with current models and how to load and work with the dataset. Get in touch if you have questions!
We develop methods for machine learning in materials, uncertainty quantification, and automation. You can find a complete list of our software efforts on GitHub.
Automating surface chemistry and catalysis calculations can be complicated. Our workflows are built on top of FireWorks, pymatgen, Luigi, ASE, and other helpful toolkits. With this system we perform roughly 100 to 200 DFT calculations per day across a variety of chemistries and materials. Much of the effort goes into the active learning system that finds and schedules interesting calculations. You can read more about this system
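To make the scheduling step concrete, here is a minimal Python sketch, assuming each candidate calculation already carries a priority score from the active learning system and that a fixed number of new jobs is submitted per batch. The function `schedule_batch`, the dictionary layout, and the example system labels are all hypothetical; this is not the interface of FireWorks, Luigi, or our production code.

```python
import heapq

def schedule_batch(candidates, submitted, batch_size):
    """Pick the highest-priority calculations that have not been submitted yet.

    candidates: dict mapping a (hypothetical) calculation ID to a priority score
    submitted:  set of calculation IDs already queued or finished
    """
    pending = ((score, calc_id) for calc_id, score in candidates.items()
               if calc_id not in submitted)
    # nlargest keeps a bounded heap, so selecting k of n items costs O(n log k)
    best = heapq.nlargest(batch_size, pending)
    return [calc_id for _, calc_id in best]

# Illustrative candidates with made-up priority scores
candidates = {"CO/Pt3Ni(111)": 0.9, "H/Cu(100)": 0.2,
              "OH/PdAg(211)": 0.7, "N/Fe(110)": 0.5}
print(schedule_batch(candidates, submitted={"CO/Pt3Ni(111)"}, batch_size=2))
# ['OH/PdAg(211)', 'N/Fe(110)']
```

A real scheduler would also need to account for cluster capacity and failed jobs; the top-k selection is only the ranking core.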
We are part of a DOE-funded group led by Prof. Andy Peterson at Brown to develop machine learning potentials and the AMP code for exascale machines. We have been exploring PyTorch as a framework for faster training of more complex models. We are also working on various active learning strategies to accelerate the train/predict/calculate pipeline.
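The train/predict/calculate pipeline can be sketched as a loop. This is a toy illustration, not the AMP implementation: it assumes a nearest-neighbour surrogate whose uncertainty proxy is simply the distance to the closest labeled point, and `expensive_calculation` is a hypothetical stand-in for a DFT call.

```python
import math

def expensive_calculation(x):
    # Hypothetical stand-in for a DFT single-point; NOT a real calculator
    return math.sin(3 * x) + 0.5 * x

def predict(train, x):
    """Nearest-neighbour surrogate: returns (prediction, uncertainty proxy)."""
    nearest_x, nearest_y = min(train, key=lambda p: abs(p[0] - x))
    return nearest_y, abs(nearest_x - x)  # farther from the data -> less trustworthy

def active_learning(pool, n_initial=2, n_rounds=5):
    # "train": start from a few labeled points (the nearest-neighbour model
    # is retrained implicitly whenever a new point is appended)
    train = [(x, expensive_calculation(x)) for x in pool[:n_initial]]
    remaining = list(pool[n_initial:])
    for _ in range(n_rounds):
        if not remaining:
            break
        # "predict": score every unlabeled point by its uncertainty
        scored = [(predict(train, x)[1], x) for x in remaining]
        _, x_next = max(scored)
        # "calculate": label the most uncertain point and add it to the training set
        remaining.remove(x_next)
        train.append((x_next, expensive_calculation(x_next)))
    return train

pool = [i / 10 for i in range(21)]  # 0.0, 0.1, ..., 2.0
labeled = active_learning(pool)
print(sorted(round(x, 1) for x, _ in labeled))
```

Distance-based exploration is only one acquisition heuristic; model-based uncertainties (for example from an ensemble) can be swapped into `predict` without changing the loop.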
We showed that crystal graph convolution networks can be applied to surface chemistry and catalysis with some modifications. Our fork of the CGCNN repository contains the changes described in our papers. We are in the process of making these models easier to use and develop with collaborators.
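For reference, the node update in the original crystal graph convolution network (Xie and Grossman) is shown below. Here $\mathbf{v}_i^{(t)}$ is the feature vector of atom $i$ after layer $t$, $\mathbf{u}_{(i,j)_k}$ is the feature vector of the $k$-th bond between atoms $i$ and $j$, $\oplus$ is concatenation, $\odot$ is element-wise multiplication, $\sigma$ is the sigmoid gate, and $g$ is a nonlinearity (softplus in the reference implementation). The modifications for surfaces and adsorbates in our papers are not reproduced here.

```latex
\mathbf{z}_{(i,j)_k}^{(t)} = \mathbf{v}_i^{(t)} \oplus \mathbf{v}_j^{(t)} \oplus \mathbf{u}_{(i,j)_k}

\mathbf{v}_i^{(t+1)} = \mathbf{v}_i^{(t)}
  + \sum_{j,k} \sigma\!\left(\mathbf{z}_{(i,j)_k}^{(t)} \mathbf{W}_f^{(t)} + \mathbf{b}_f^{(t)}\right)
  \odot g\!\left(\mathbf{z}_{(i,j)_k}^{(t)} \mathbf{W}_s^{(t)} + \mathbf{b}_s^{(t)}\right)
```

The sigmoid term acts as a learned filter on each neighbour's contribution, so the network can weight bonds differently rather than summing all neighbours equally.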
Uncertainty estimation is important in engineering and active learning, and UQ applied to graph convolution networks is still an emerging area. We explored a number of different strategies and collected our results as a series of Jupyter notebooks.
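One common UQ strategy is an ensemble: train several models and use the spread of their predictions as the uncertainty. The sketch below shows that idea plus a simple calibration check. It assumes each model is any callable returning a scalar prediction; the lambda "models" are hypothetical placeholders, and this is not necessarily one of the strategies in our notebooks.

```python
import statistics

def ensemble_predict(models, x):
    """Return the ensemble mean and the sample standard deviation across members."""
    preds = [model(x) for model in models]
    return statistics.mean(preds), statistics.stdev(preds)

def coverage(errors, stds, k=1.0):
    """Fraction of errors whose magnitude is within k predicted standard deviations.

    For well-calibrated Gaussian uncertainties, k=1 should give roughly 68%.
    """
    inside = sum(abs(err) <= k * std for err, std in zip(errors, stds))
    return inside / len(errors)

# Hypothetical ensemble members (e.g. the same network trained from different seeds)
models = [lambda x: 1.0 * x, lambda x: 1.1 * x, lambda x: 0.9 * x]
mean, std = ensemble_predict(models, 2.0)
print(mean, std)
```

Comparing the observed coverage against the nominal value (about 68% at k=1, about 95% at k=2) is a quick way to see whether an uncertainty estimate is over- or under-confident.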
We use Google Colab and other hosted Jupyter instances for much of our day-to-day work. We’ve been exploring how to demonstrate active learning strategies with DFT calculations on these platforms, and showed that Google Colab GPU instances can run Quantum ESPRESSO (QE) very quickly! See John Kitchin’s updated DFT book for examples.
Large consistent datasets can be difficult to find and use in catalysis, and this has limited the development of deep learning models and representations. We’re working with collaborators to improve this further, but for now you can find the datasets we have published or use internally for developing new methods.
We have performed about 100,000 adsorption energy calculations over the past couple of years across a range of intermetallics. About 50,000 are considered high quality, with relatively little surface/adsorbate movement. This subset is what we use for training.
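A filter for "relatively little movement" could look like the following sketch, which compares initial and relaxed Cartesian positions atom by atom. The function names and the 1.0 Å threshold are hypothetical illustrations, not the criterion actually used to select the 50,000-calculation subset.

```python
import math

def max_displacement(initial, final):
    """Largest distance any atom moved between two lists of (x, y, z) positions.

    Ignores periodic boundary conditions, so an atom that wraps across a cell
    boundary would show a spuriously large displacement.
    """
    return max(math.dist(a, b) for a, b in zip(initial, final))

def is_high_quality(initial, final, threshold=1.0):
    # threshold (in Å) is an illustrative value, not a documented cutoff
    return max_displacement(initial, final) <= threshold

initial = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
relaxed = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.0)]
print(is_high_quality(initial, relaxed))
# True
```

In practice one might apply separate thresholds to adsorbate atoms and surface atoms, since the text distinguishes the two kinds of movement.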
The data are provided as a pickle instead of JSON. If the function is not working, you may need to downgrade your ASE to version 3.17. If you have any questions or comments about our data, do not hesitate to contact us.
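Reading a pickle needs only the Python standard library, as in the sketch below. The record layout and file name are hypothetical placeholders since the schema is not described here. Note that unpickling reconstructs objects from their original classes, so if a file contains library objects (for example from ASE, which is an assumption), a compatible version of that library must be installed; a version mismatch is one plausible reason for the ASE note above.

```python
import os
import pickle
import tempfile

def save_records(records, path):
    with open(path, "wb") as f:
        pickle.dump(records, f)

def load_records(path):
    # Only unpickle files from sources you trust: pickle can execute arbitrary code.
    # Pickled objects also need their defining classes to be importable, in a
    # version compatible with the one that wrote the file.
    with open(path, "rb") as f:
        return pickle.load(f)

# Round trip with a hypothetical placeholder record (not real data)
records = [{"id": "example-0", "energy": 0.0}]
path = os.path.join(tempfile.mkdtemp(), "example_records.pkl")
save_records(records, path)
print(load_records(path))
# [{'id': 'example-0', 'energy': 0.0}]
```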
Surface energies for asymmetric inorganic surfaces are not well defined, but they are important for bimetallic nanoparticle shape and faceting. We calculated several thousand cleavage energies across a range of stable inorganic crystals from the Materials Project, which we used for training graph convolution networks. We are working with collaborators to address the limitations for asymmetric surfaces.
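The standard slab expression shows why asymmetric surfaces are a problem: $(E_\text{slab} - n E_\text{bulk}) / 2A$ splits the excess energy evenly over two faces of area $A$, so for a slab whose top and bottom faces differ it yields only their average, the cleavage energy. A minimal sketch, assuming a slab that is stoichiometric with $n$ bulk formula units and energies in eV with area in Å²; whether our calculations used exactly this form is not stated here, and the input numbers are illustrative only.

```python
# 1 eV/Å^2 expressed in J/m^2 (1 eV = 1.602176634e-19 J, 1 Å^2 = 1e-20 m^2)
EV_PER_A2_TO_J_PER_M2 = 16.02176634

def cleavage_energy(e_slab, n_bulk_units, e_bulk_per_unit, area):
    """(E_slab - n * E_bulk) / (2 * A), returned in J/m^2.

    e_slab and e_bulk_per_unit in eV, area of ONE face in Å^2. For an
    asymmetric slab this is the average of the two surface energies,
    not the energy of either face alone.
    """
    excess = e_slab - n_bulk_units * e_bulk_per_unit
    return excess / (2 * area) * EV_PER_A2_TO_J_PER_M2

# Illustrative numbers only: 5 eV excess spread over 2 x 20 Å^2
print(cleavage_energy(-100.0, 10, -10.5, 20.0))
```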