
catlas.prediction_steps.enumerate_adslabs_wrapper(config, surface_bag, adsorbate_bag)

Generate adslabs from all filtered surfaces and adsorbates.

  • config (dict) – A config dictionary specifying what surfaces to filter.

  • surface_bag (dask.bag.Bag) – surfaces to enumerate

  • adsorbate_bag (dask.bag.Bag) – adsorbates to enumerate


Returns

  • dask.bag.Bag – a bag of metadata for the adslabs

  • dask.bag.Bag – a bag of enumerated adslabs
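
To illustrate what enumerating adslabs from two collections can look like, here is a minimal sketch that assumes every filtered surface is paired with every adsorbate. It uses plain Python lists rather than dask Bags, and `pair_surfaces_with_adsorbates` is a hypothetical helper written for this example, not part of catlas.

```python
from itertools import product


def pair_surfaces_with_adsorbates(surfaces, adsorbates):
    """Hypothetical helper: build one record per (surface, adsorbate) pair.

    Assumes a full Cartesian product; the real wrapper may pair differently.
    """
    return [
        {"surface": surface, "adsorbate": adsorbate}
        for surface, adsorbate in product(surfaces, adsorbates)
    ]


# Two surfaces and three adsorbates give six candidate adslab records.
pairs = pair_surfaces_with_adsorbates(["Pt(111)", "Cu(100)"], ["*H", "*OH", "*CO"])
print(len(pairs))
```

With dask Bags the same pairing would be evaluated lazily and in parallel; the list version only shows the shape of the result.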


catlas.prediction_steps.enumerate_surfaces_and_filter(config, filtered_catalyst_bag, bulk_num)

Enumerate surfaces from an input bulk bag according to the input config.

  • config (dict) – A config dictionary specifying what surfaces to filter.

  • filtered_catalyst_bag (dask.bag.Bag) – Bulk materials to enumerate surfaces for.

  • bulk_num (int) – The number of bulk materials in the input bag.


Returns

  • dask.bag.Bag – a dask Bag containing filtered surfaces

  • int – the number of slabs enumerated from filtered bulks before slab filtering


catlas.prediction_steps.finish_sankey_diagram(sankey, num_unfiltered_slabs, num_filtered_slabs, num_adslabs, inference_list, run_id) → catlas.sankey.sankey_utils.Sankey

Make sankey diagram for the catlas run.

  • sankey (catlas.sankey.sankey_utils.Sankey) – Sankey object from predictions run

  • num_unfiltered_slabs (int) – the number of slabs before filtering

  • num_filtered_slabs (int) – the number of slabs after filtering

  • num_adslabs (int) – the number of adslabs enumerated

  • num_inferred (int) – the number of adslabs that inference was run on

  • run_id (str) – an arbitrary string identifying the run


Returns

  • catlas.sankey.sankey_utils.Sankey – finished Sankey diagram
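
The counts passed to this function describe how many objects survive each stage of a run. As a rough sketch of how such counts relate to each other, the hypothetical helper below (`stage_summary`, not part of catlas) derives the quantities a flow diagram would typically display from the four documented counts. The numbers in the usage line are made up for illustration.

```python
def stage_summary(num_unfiltered_slabs, num_filtered_slabs, num_adslabs, num_inferred):
    """Hypothetical helper: summarize how counts change between stages."""
    return {
        # Slabs dropped by the slab filters.
        "slabs_removed_by_filtering": num_unfiltered_slabs - num_filtered_slabs,
        # Average number of adslabs enumerated per surviving slab.
        "adslabs_per_filtered_slab": (
            num_adslabs / num_filtered_slabs if num_filtered_slabs else 0.0
        ),
        # Adslabs that never reached inference.
        "adslabs_not_inferred": num_adslabs - num_inferred,
    }


summary = stage_summary(120, 45, 900, 850)
print(summary)
```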


catlas.prediction_steps.generate_outputs(config, adslab_atoms_bag, results_bag, run_id, inference, most_recent_step)

Process the remaining outputs selected in the input yaml for catlas.

  • config (dict) – A config dictionary specifying what surfaces to filter.

  • results_bag (dask.bag.Bag) – A dask Bag object of adslabs and their predicted adsorption energies.

  • run_id (str) – A string with a timestamp uniquely identifying the run.

  • inference (bool) – Whether a model was used to predict adsorption energies during the execution of this script.


Returns

  • int – the number of adslabs immediately after adslab enumeration

  • int – the number of adslabs that inference was run on

  • int – the number of adslabs remaining after all inference


catlas.prediction_steps.load_adsorbates_and_filter(config, sankey)

Load adsorbates and filter them according to the input config. Update sankey diagram based on adsorbate filtering.

  • config (dict) – a config dictionary specifying what adsorption calculations to run.

  • sankey (catlas.sankey.sankey_utils.Sankey) – a Sankey object describing how objects have been filtered so far.


Returns

  • dict – a dictionary containing adsorbates that survived filtering

  • catlas.sankey.sankey_utils.Sankey – a Sankey object updated with adsorbate filtering

  • int – the number of bulk materials that survived filtering


catlas.prediction_steps.load_bulks_and_filter(config, client, sankey)

Load bulk materials from file and filter them according to the input config file. Update sankey diagram based on bulk filtering.

  • config (dict) – a dictionary specifying what adsorption calculations to run.

  • client (dask.distributed.Client) – a Dask cluster that runs calculations during execution of this program.

  • sankey (catlas.sankey.sankey_utils.Sankey) – a Sankey object describing how objects have been filtered so far.


Returns

  • dict – a dictionary containing bulk materials that survived filtering

  • catlas.sankey.sankey_utils.Sankey – a Sankey object updated with bulk filtering


catlas.prediction_steps.make_predictions(config, adslab_atoms_bag, results_bag)

Make predictions on enumerated adslabs. This may include a single inference step or multiple steps with optional intermediate filtering.

  • config (dict) – A config dictionary specifying what surfaces to filter.

  • results_bag (dask.bag.Bag) – a bag of metadata for the adslabs

  • adslab_atoms_bag (dask.bag.Bag) – a bag of enumerated adslabs


Returns

  • dask.bag.Bag – a dask Bag containing adslabs and their predicted adsorption energies according to models specified in the config file.

  • dask.bag.Bag – a dask Bag containing adslabs before any predictions were run.

  • bool – True if a model was used to predict adsorption energies of the inputs.

  • str – the name of the column corresponding to the minimum adsorption energy on each surface according to the model that was run last during predictions.
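
The last return value names a column holding the minimum adsorption energy on each surface. As a minimal sketch of that idea, assuming each result is a dictionary with a surface identifier and a predicted energy, the helper below keeps the lowest energy per surface. The keys `surface_id` and `E_model_a` and the function `min_energy_per_surface` are hypothetical and chosen only for this example.

```python
def min_energy_per_surface(records, energy_key):
    """Hypothetical helper: map each surface to its lowest predicted energy."""
    best = {}
    for record in records:
        surface = record["surface_id"]
        energy = record[energy_key]
        # Keep the most negative (lowest) energy seen for this surface.
        if surface not in best or energy < best[surface]:
            best[surface] = energy
    return best


records = [
    {"surface_id": "s1", "E_model_a": -0.5},
    {"surface_id": "s1", "E_model_a": -1.2},
    {"surface_id": "s2", "E_model_a": 0.3},
]
minima = min_energy_per_surface(records, "E_model_a")
print(minima)
```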



Parse and prepare inputs for use by the main script. This function loads and validates the config, pulls the run_id from the config, generates the dask cluster script from the dask cluster script path, makes a folder for the outputs, generates parity plots if available for the model(s) in use, generates a Sankey dictionary for later use in the script, and writes “CATLAS” in large isometrically displayed ASCII letters.

  • config_path (str) – a path to a config yml file describing what adsorption calculations to run in the main script.

  • dask_cluster_script_path (str) – a path to a script that connects to a dask cluster that executes calculations run during the main script.


Returns

  • dict – a config describing what adsorption predictions to run

  • str – the text of a Python script that connects to a Dask cluster

  • str – a string including a timestamp that uniquely identifies this run
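
The run identifier is described as a string that includes a timestamp. One way such an identifier could be built is sketched below. The format string, the label argument, and the helper name `make_run_id` are assumptions for illustration; the actual format used by catlas may differ.

```python
from datetime import datetime


def make_run_id(label, now=None):
    """Hypothetical helper: prefix a label with a sortable timestamp."""
    # Allow the caller to inject a fixed time so the result is reproducible.
    now = now or datetime.now()
    return f"{now.strftime('%Y%m%d_%H%M%S')}_{label}"


run_id = make_run_id("demo", datetime(2024, 1, 2, 3, 4, 5))
print(run_id)  # 20240102_030405_demo
```

Putting the timestamp first means output folders named after the run sort chronologically.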
