prediction_steps

catlas.prediction_steps.enumerate_adslabs_wrapper(config, surface_bag, adsorbate_bag)

Generate adslabs from all filtered surfaces and adsorbates.

Parameters
  • config (dict) – A config file specifying what surfaces to filter.

  • surface_bag (dask.bag.Bag) – surfaces to enumerate

  • adsorbate_bag (dask.bag.Bag) – adsorbates to enumerate

Returns

  • dask.bag.Bag – a bag of metadata for the adslabs

  • dask.bag.Bag – a bag of enumerated adslabs

Return type

tuple of (dask.bag.Bag, dask.bag.Bag)
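The two documented outputs (adslab metadata and adslab structures) can be illustrated with plain Python lists standing in for dask bags. This is a minimal sketch that assumes every filtered surface is paired with every filtered adsorbate. The dict keys and the helpers `place_adsorbate` and `enumerate_adslabs_sketch` are hypothetical, and the sketch does not reproduce catlas's actual adsorption-site enumeration.

```python
from itertools import product


def place_adsorbate(surface, adsorbate):
    """Hypothetical stand-in for building an adslab structure."""
    return f"{adsorbate['smiles']}@{surface['bulk_id']}{surface['miller']}"


def enumerate_adslabs_sketch(surfaces, adsorbates):
    """Pair every surface with every adsorbate (a Cartesian product).

    Returns two parallel lists, mirroring the two documented bags:
    metadata for each adslab and the adslab "structures" themselves.
    """
    metadata, adslab_atoms = [], []
    for surface, adsorbate in product(surfaces, adsorbates):
        metadata.append({
            "bulk_id": surface["bulk_id"],
            "miller": surface["miller"],
            "adsorbate": adsorbate["smiles"],
        })
        adslab_atoms.append(place_adsorbate(surface, adsorbate))
    return metadata, adslab_atoms


surfaces = [
    {"bulk_id": "bulk-A", "miller": (1, 1, 1)},
    {"bulk_id": "bulk-A", "miller": (1, 0, 0)},
]
adsorbates = [{"smiles": "*CO"}, {"smiles": "*OH"}, {"smiles": "*H"}]
metadata, adslab_atoms = enumerate_adslabs_sketch(surfaces, adsorbates)
print(len(metadata))  # 2 surfaces x 3 adsorbates
```

With real dask bags, you can write the same all-pairs step as `surface_bag.product(adsorbate_bag)`. It yields `(surface, adsorbate)` tuples that a `.map(...)` call can turn into adslabs.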

catlas.prediction_steps.enumerate_surfaces_and_filter(config, filtered_catalyst_bag, bulk_num)

Enumerate surfaces from an input bulk bag according to the input config.

Parameters
  • config (dict) – A config file specifying what surfaces to filter.

  • filtered_catalyst_bag (dask.bag.Bag) – Bulk materials to enumerate surfaces for.

  • bulk_num (int) – The number of bulk materials in the input bag.

Returns

  • dask.bag.Bag – a dask Bag containing the filtered surfaces

  • int – the number of slabs enumerated from the filtered bulks before slab filtering

Return type

tuple of (dask.bag.Bag, int)
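The function returns the surviving surfaces together with the slab count before filtering. This suggests a flatten, count, then filter pattern. The sketch below uses lists in place of bags. It assumes each bulk record carries a hypothetical `candidate_millers` list and that the config holds a hypothetical `slab_filters.max_miller_index` limit. Neither is documented catlas configuration.

```python
def enumerate_surfaces_sketch(config, bulks):
    """Flatten bulks into slabs, count them, then apply a slab filter."""
    # One slab per (bulk, Miller index) pair; a real slab enumerator can
    # also return several terminations per Miller index.
    slabs = [
        {"bulk_id": bulk["bulk_id"], "miller": miller}
        for bulk in bulks
        for miller in bulk["candidate_millers"]
    ]
    num_unfiltered = len(slabs)  # counted before slab filtering
    limit = config["slab_filters"]["max_miller_index"]  # hypothetical key
    filtered = [s for s in slabs if max(abs(i) for i in s["miller"]) <= limit]
    return filtered, num_unfiltered


config = {"slab_filters": {"max_miller_index": 1}}
bulks = [
    {"bulk_id": "bulk-A", "candidate_millers": [(1, 0, 0), (1, 1, 1), (2, 1, 0)]},
    {"bulk_id": "bulk-B", "candidate_millers": [(1, 1, 0), (2, 2, 1)]},
]
surfaces, num_unfiltered = enumerate_surfaces_sketch(config, bulks)
```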

catlas.prediction_steps.finish_sankey_diagram(sankey, num_unfiltered_slabs, num_filtered_slabs, num_adslabs, inference_list, run_id) → catlas.sankey.sankey_utils.Sankey

Make the Sankey diagram for the catlas run.

Parameters
  • sankey (catlas.sankey.sankey_utils.Sankey) – Sankey object from predictions run

  • num_unfiltered_slabs (int) – the number of slabs before filtering

  • num_filtered_slabs (int) – the number of slabs after filtering

  • num_adslabs (int) – the number of adslabs enumerated

  • inference_list – the number of adslabs that inference was run on

  • run_id (str) – an arbitrary string identifying the run

Returns

finished Sankey diagram

Return type

catlas.sankey.sankey_utils.Sankey
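A Sankey diagram is a set of weighted flows between labelled nodes. Finishing it therefore amounts to appending the slab and adslab counts as links. The sketch below uses a hypothetical `FlowRecorder` class rather than `catlas.sankey.sankey_utils.Sankey`, whose methods are not documented here. The node labels and the `finish_flows_sketch` helper are also illustrative.

```python
class FlowRecorder:
    """Hypothetical minimal Sankey model: labelled nodes joined by weighted links."""

    def __init__(self):
        self.nodes = []
        self.links = []  # (source_index, target_index, value)

    def _node(self, label):
        # Register a label the first time it is seen and return its index.
        if label not in self.nodes:
            self.nodes.append(label)
        return self.nodes.index(label)

    def add_flow(self, source, target, value):
        self.links.append((self._node(source), self._node(target), value))


def finish_flows_sketch(flows, num_unfiltered_slabs, num_filtered_slabs,
                        num_adslabs, num_inferred):
    # Slabs either survive the slab filters or are dropped.
    flows.add_flow("Slabs", "Filtered slabs", num_filtered_slabs)
    flows.add_flow("Slabs", "Rejected slabs",
                   num_unfiltered_slabs - num_filtered_slabs)
    # Adslabs are either sent to inference or not.
    flows.add_flow("Adslabs", "Inferred adslabs", num_inferred)
    flows.add_flow("Adslabs", "Not inferred", num_adslabs - num_inferred)
    return flows


flows = finish_flows_sketch(FlowRecorder(), 120, 80, 960, 600)
```

Index-based links like these match the shape that Plotly's `go.Sankey` expects: separate `source`, `target`, and `value` lists under `link`.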

catlas.prediction_steps.generate_outputs(config, adslab_atoms_bag, results_bag, run_id, inference, most_recent_step)

Process the remaining outputs selected in the input YAML file for catlas.

Parameters
  • config (dict) – A config file specifying what surfaces to filter.

  • results_bag (dask.bag.Bag) – A dask Bag object of adslabs and their predicted adsorption energies.

  • run_id (str) – A string with a timestamp uniquely identifying the run.

  • inference (bool) – Whether a model was used to predict adsorption energies during the execution of this script.

Returns

  • int – the number of adslabs immediately after adslab enumeration

  • int – the number of adslabs that inference was run on

  • int – the number of adslabs remaining after all inference

Return type

tuple of (int, int, int)
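The three returned counts correspond to successive stages of the run. The sketch below shows one way to derive such counts from a list of result rows. The row layout, the `min_E_model_a` column name, and the energy cutoff used as the post-inference filter are all hypothetical.

```python
def count_adslab_stages(results, energy_column, max_energy):
    """Return (enumerated, inferred, remaining) counts.

    A row with no value in `energy_column` is treated as never sent to
    inference; `max_energy` is a hypothetical post-inference filter.
    """
    num_enumerated = len(results)
    inferred = [row for row in results if row.get(energy_column) is not None]
    remaining = [row for row in inferred if row[energy_column] <= max_energy]
    return num_enumerated, len(inferred), len(remaining)


results = [
    {"adsorbate": "*CO", "min_E_model_a": -0.9},
    {"adsorbate": "*CO", "min_E_model_a": 0.4},
    {"adsorbate": "*OH", "min_E_model_a": -0.1},
    {"adsorbate": "*H"},  # never reached inference
]
counts = count_adslab_stages(results, "min_E_model_a", max_energy=0.0)
```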

catlas.prediction_steps.load_adsorbates_and_filter(config, sankey)

Load adsorbates and filter them according to the input config. Update the Sankey diagram to reflect adsorbate filtering.

Parameters
  • config (dict) – a config file specifying what adsorption calculations to run.

  • sankey (catlas.sankey.sankey_utils.Sankey) – a Sankey object describing how objects have been filtered so far.

Returns

  • dict – a dictionary containing the adsorbates that survived filtering

  • catlas.sankey.sankey_utils.Sankey – the Sankey object updated with adsorbate filtering

  • int – the number of adsorbates that survived filtering

Return type

tuple of (dict, catlas.sankey.sankey_utils.Sankey, int)
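Adsorbate filtering and the matching Sankey update can be sketched as follows. The sketch assumes the adsorbates are a dict keyed by SMILES string. It also assumes the config lists allowed SMILES under a hypothetical `adsorbate_filters.filter_by_smiles` key. A plain list of `(source, target, value)` tuples stands in for the Sankey object.

```python
def filter_adsorbates_sketch(adsorbates, config, sankey_links):
    """Keep adsorbates whose SMILES string is on an allow-list."""
    allowed = set(config["adsorbate_filters"]["filter_by_smiles"])  # hypothetical key
    kept = {smiles: data for smiles, data in adsorbates.items() if smiles in allowed}
    # Record the filtering as two flows: kept and rejected adsorbates.
    sankey_links = sankey_links + [
        ("Adsorbates", "Kept adsorbates", len(kept)),
        ("Adsorbates", "Rejected adsorbates", len(adsorbates) - len(kept)),
    ]
    return kept, sankey_links, len(kept)


adsorbates = {
    "*CO": {"formula": "CO"},
    "*OH": {"formula": "OH"},
    "*H": {"formula": "H"},
    "*OCH3": {"formula": "CH3O"},
}
config = {"adsorbate_filters": {"filter_by_smiles": ["*CO", "*H"]}}
kept, links, num_kept = filter_adsorbates_sketch(adsorbates, config, [])
```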

catlas.prediction_steps.load_bulks_and_filter(config, client, sankey)

Load bulk materials from file and filter them according to the input config file. Update the Sankey diagram to reflect bulk filtering.

Parameters
  • config (dict) – a dictionary specifying what adsorption calculations to run.

  • client (dask.distributed.Client) – a Dask cluster that runs calculations during execution of this program.

  • sankey (catlas.sankey.sankey_utils.Sankey) – a Sankey object describing how objects have been filtered so far.

Returns

  • dict – a dictionary containing the bulk materials that survived filtering

  • catlas.sankey.sankey_utils.Sankey – the Sankey object updated with bulk filtering

Return type

tuple of (dict, catlas.sankey.sankey_utils.Sankey)
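Bulk filtering by composition can be sketched with set operations. The `bulk_filters.require_elements` and `bulk_filters.exclude_elements` keys are hypothetical illustrations, not catlas's documented filter names. The returned kept and rejected counts are the kind of numbers a Sankey update would consume. The sketch runs locally, so it has no counterpart to the `client` argument.

```python
def filter_bulks_sketch(bulks, config):
    """Keep bulks containing every required element and no excluded one."""
    filters = config["bulk_filters"]  # hypothetical keys below
    required = set(filters.get("require_elements", []))
    excluded = set(filters.get("exclude_elements", []))
    kept = {}
    for bulk_id, elements in bulks.items():
        present = set(elements)
        if required <= present and excluded.isdisjoint(present):
            kept[bulk_id] = elements
    # (kept, rejected) counts for the Sankey update.
    return kept, (len(kept), len(bulks) - len(kept))


bulks = {
    "bulk-A": ["Cu", "Pd"],
    "bulk-B": ["Cu", "O"],
    "bulk-C": ["Ni", "Pd"],
    "bulk-D": ["Cu"],
}
config = {"bulk_filters": {"require_elements": ["Cu"], "exclude_elements": ["O"]}}
kept_bulks, flow_counts = filter_bulks_sketch(bulks, config)
```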

catlas.prediction_steps.make_predictions(config, adslab_atoms_bag, results_bag)

Make predictions on enumerated adslabs. This may be a single inference step or several steps with optional filtering between them.

Parameters
  • config (dict) – A config file specifying what surfaces to filter.

  • adslab_atoms_bag (dask.bag.Bag) – a bag of enumerated adslabs

  • results_bag (dask.bag.Bag) – a bag of metadata for the adslabs

Returns

  • dask.bag.Bag – a dask Bag containing adslabs and their predicted adsorption energies according to the models specified in the config file

  • dask.bag.Bag – a dask Bag containing the adslabs before any predictions were run

  • bool – True if a model was used to predict adsorption energies of the inputs

  • str – the name of the column holding the minimum adsorption energy on each surface according to the model that was run last during predictions

Return type

tuple of (dask.bag.Bag, dask.bag.Bag, bool, str)
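Multi-step inference with intermediate filtering, and the four documented return values, can be illustrated with a short sketch. Each step is assumed to be a hypothetical `(column_name, model, keep_fraction)` triple, where `model` is any callable returning an energy. After each step, only the lowest-energy fraction of adslabs is passed on. The final per-surface minimum is stored under a `min_` column, and that column's name is returned. Lists stand in for dask bags.

```python
from operator import itemgetter


def make_predictions_sketch(adslabs, steps):
    """Chain inference steps with filtering in between.

    Returns (predictions, original adslabs, whether inference ran, name of
    the per-surface minimum-energy column from the last step).
    """
    if not steps:
        # No model was run, so there are no predictions.
        return [], adslabs, False, None
    current = [dict(row) for row in adslabs]  # copies; inputs stay untouched
    for column, model, keep_fraction in steps:
        for row in current:
            row[column] = model(row)
        # Intermediate filtering: keep only the lowest-energy fraction.
        current.sort(key=itemgetter(column))
        current = current[: max(1, round(len(current) * keep_fraction))]
    # Minimum energy per surface according to the last model that ran.
    min_column = f"min_{column}"
    best = {}
    for row in current:
        surface = row["surface"]
        best[surface] = min(best.get(surface, row[column]), row[column])
    for row in current:
        row[min_column] = best[row["surface"]]
    return current, adslabs, True, min_column


fast = {("A", 0): -0.5, ("A", 1): -0.6, ("B", 0): 0.0, ("B", 1): -0.1}
accurate = {("A", 0): -0.45, ("A", 1): -0.7}
adslabs = [{"surface": s, "site": i} for s in ("A", "B") for i in (0, 1)]
steps = [
    ("E_fast", lambda row: fast[(row["surface"], row["site"])], 0.5),
    ("E_accurate", lambda row: accurate[(row["surface"], row["site"])], 1.0),
]
predictions, original, inferred, min_column = make_predictions_sketch(adslabs, steps)
```

A common motivation for this kind of chaining is to rank candidates with a cheap model and re-score only the best ones with a more expensive model. This section does not say whether catlas filters by fraction, by threshold, or by another rule.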

catlas.prediction_steps.parse_inputs()

Parse and prepare inputs for use by the main script. This function loads and validates the config, pulls the run_id from the config, reads the Dask cluster script from the Dask cluster script path, makes a folder for the outputs, generates parity plots if they are available for the model(s) in use, generates a Sankey dictionary for later use in the script, and writes “CATLAS” in large isometric ASCII letters.

Parameters
  • config_path (str) – a path to a config yml file describing what adsorption calculations to run in the main script.

  • dask_cluster_script_path (str) – a path to a script that connects to a dask cluster that executes calculations run during the main script.

Returns

  • dict – a config describing what adsorption predictions to run

  • str – the text of a Python script that connects to a Dask cluster

  • str – a string including a timestamp that uniquely identifies this run

Return type

tuple of (dict, str, str)
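Three of the preparation steps (reading the cluster script, building a timestamped run_id, and creating an output folder) can be sketched with the standard library. The sketch assumes the two paths arrive as command-line arguments and that the caller supplies a config loader, for example a YAML reader. The `output_options.run_name` key, the `outputs/` folder layout, the timestamp format, and the `parse_inputs_sketch` helper are hypothetical.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def parse_inputs_sketch(argv, load_config, output_root="."):
    """Return (config, dask_cluster_script_text, run_id)."""
    config_path, dask_cluster_script_path = argv[1], argv[2]
    config = load_config(config_path)
    # Keep the cluster script as text for the main script to use later.
    dask_cluster_script = Path(dask_cluster_script_path).read_text()
    # Unique run id: a configured run name plus a timestamp.
    run_name = config["output_options"]["run_name"]  # hypothetical key
    run_id = f"{run_name}_{datetime.now():%Y%m%d-%H%M%S}"
    Path(output_root, "outputs", run_id).mkdir(parents=True, exist_ok=True)
    return config, dask_cluster_script, run_id


with tempfile.TemporaryDirectory() as tmp:
    script_path = Path(tmp, "cluster.py")
    script_path.write_text("print('connecting')\n")
    config, script_text, run_id = parse_inputs_sketch(
        ["main.py", str(Path(tmp, "config.yml")), str(script_path)],
        load_config=lambda path: {"output_options": {"run_name": "demo"}},
        output_root=tmp,
    )
    output_dir_exists = Path(tmp, "outputs", run_id).is_dir()
```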