prediction_steps

catlas.prediction_steps.enumerate_adslabs_wrapper(config, surface_bag, adsorbate_bag)

Generate adslabs from all filtered surfaces and adsorbates.

Parameters

config (dict) – A config file specifying what surfaces to filter.
surface_bag (dask.bag.Bag) – surfaces to enumerate
adsorbate_bag (dask.bag.Bag) – adsorbates to enumerate

Returns

dask.bag.Bag – a bag of metadata for the adslabs
dask.bag.Bag – a bag of enumerated adslabs

Return type

tuple of (dask.bag.Bag, dask.bag.Bag)
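
A minimal sketch of the pairing idea behind this step, assuming each filtered surface is combined with each adsorbate. It uses plain Python lists instead of dask bags; `pair_surfaces_with_adsorbates` and the dictionary keys are hypothetical names for illustration, not part of catlas, and the real enumeration may produce several placements per pair.

```python
from itertools import product


def pair_surfaces_with_adsorbates(surfaces, adsorbates):
    """Return one metadata dict per (surface, adsorbate) combination.

    Hypothetical stand-in: catlas operates on dask bags, not lists.
    """
    return [
        {"surface": surface["name"], "adsorbate": adsorbate["smiles"]}
        for surface, adsorbate in product(surfaces, adsorbates)
    ]


# Illustrative inputs; the keys "name" and "smiles" are assumptions.
surfaces = [{"name": "Pt(111)"}, {"name": "Cu(100)"}]
adsorbates = [{"smiles": "*CO"}, {"smiles": "*OH"}, {"smiles": "*H"}]
pairs = pair_surfaces_with_adsorbates(surfaces, adsorbates)
```

With two surfaces and three adsorbates, the sketch yields six combinations, which is why the number of adslabs can grow quickly relative to the number of surfaces.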

catlas.prediction_steps.enumerate_surfaces_and_filter(config, filtered_catalyst_bag, bulk_num)

Enumerate surfaces from an input bulk bag according to the input config.

Parameters

config (dict) – A config file specifying what surfaces to filter.
filtered_catalyst_bag (dask.bag.Bag) – Bulk materials to enumerate surfaces for.
bulk_num (int) – The number of bulk materials in the input bag.

Returns

dask.bag.Bag – A dask Bag containing filtered surfaces.
int – The number of slabs enumerated from filtered bulks before slab filtering.

Return type

tuple of (dask.bag.Bag, int)
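
The two return values can be illustrated with a small sketch, assuming the count is taken after enumeration but before slab filtering. Everything here is hypothetical (the function name, the `slabs` and `miller` keys, and the Miller-index criterion); catlas reads its actual filters from the config and works on dask bags.

```python
def enumerate_and_filter(bulks, max_miller_index):
    """Flatten slabs from all bulks, count them, then filter.

    Hypothetical stand-in using lists; the filter criterion is an assumption.
    """
    # Enumerate: collect every slab of every bulk into one flat list.
    slabs = [slab for bulk in bulks for slab in bulk["slabs"]]
    # Record the count before any slab filter is applied.
    num_unfiltered = len(slabs)
    # Filter: keep slabs whose largest Miller index is within the limit.
    filtered = [s for s in slabs if max(s["miller"]) <= max_miller_index]
    return filtered, num_unfiltered


bulks = [
    {"id": "bulk-A", "slabs": [{"miller": (1, 1, 1)}, {"miller": (2, 1, 0)}]},
    {"id": "bulk-B", "slabs": [{"miller": (1, 0, 0)}]},
]
filtered_slabs, num_unfiltered_slabs = enumerate_and_filter(bulks, max_miller_index=1)
```

Returning the pre-filter count alongside the filtered collection lets a later step report how many slabs each filter removed.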

catlas.prediction_steps.finish_sankey_diagram(sankey, num_unfiltered_slabs, num_filtered_slabs, num_adslabs, inference_list, run_id) → catlas.sankey.sankey_utils.Sankey

Make the Sankey diagram for the catlas run.

Parameters

sankey (catlas.sankey.sankey_utils.Sankey) – Sankey object from predictions run
num_unfiltered_slabs (int) – the number of slabs before filtering
num_filtered_slabs (int) – the number of slabs after filtering
num_adslabs (int) – the number of adslabs enumerated
num_inferred (int) – the number of adslabs that inference was run on
run_id (str) – an arbitrary string identifying the run

Returns

finished Sankey diagram

Return type

catlas.sankey.sankey_utils.Sankey

catlas.prediction_steps.generate_outputs(config, adslab_atoms_bag, results_bag, run_id, inference, most_recent_step)

Process the remaining outputs selected in the input YAML file for catlas.

Parameters

config (dict) – A config file specifying what surfaces to filter.
results_bag (dask.bag.Bag) – A dask Bag object of adslabs and their predicted adsorption energies.
run_id (str) – A string with a timestamp uniquely identifying the run.
inference (bool) – Whether a model was used to predict adsorption energies during the execution of this script.

Returns

int – the number of adslabs immediately after adslab enumeration
int – the number of adslabs that inference was run on
int – the number of adslabs remaining after all inference

Return type

tuple of (int, int, int)

catlas.prediction_steps.load_adsorbates_and_filter(config, sankey)

Load adsorbates and filter them according to the input config. Update sankey diagram based on adsorbate filtering.

Parameters

config (dict) – a config file specifying what adsorption calculations to run.
sankey (catlas.sankey.sankey_utils.Sankey) – a Sankey object describing how objects have been filtered so far.

Returns

dict – a dictionary containing adsorbates that survived filtering.
catlas.sankey.sankey_utils.Sankey – a Sankey object updated with bulk filtering.
int – the number of bulk materials that survived filtering.

Return type

tuple of (dict, catlas.sankey.sankey_utils.Sankey, int)

catlas.prediction_steps.load_bulks_and_filter(config, client, sankey)

Load bulk materials from file and filter them according to the input config file. Update sankey diagram based on bulk filtering.

Parameters

config (dict) – a dictionary specifying what adsorption calculations to run.
client (dask.distributed.Client) – a Dask cluster that runs calculations during execution of this program.
sankey (catlas.sankey.sankey_utils.Sankey) – a Sankey object describing how objects have been filtered so far.

Returns

dict – a dictionary containing bulk materials that survived filtering.
catlas.sankey.sankey_utils.Sankey – a Sankey object updated with bulk filtering.

Return type

tuple of (dict, catlas.sankey.sankey_utils.Sankey)

catlas.prediction_steps.make_predictions(config, adslab_atoms_bag, results_bag)

Make predictions on enumerated adslabs. This may involve a single inference step or multiple steps with optional intermediate filtering.

Parameters

config (dict) – A config file specifying what surfaces to filter.
results_bag (dask.bag.Bag) – a bag of metadata for the adslabs
adslab_atoms_bag (dask.bag.Bag) – a bag of enumerated adslabs

Returns

dask.bag.Bag – a dask Bag containing adslabs and their predicted adsorption energies according to models specified in the config file.
dask.bag.Bag – a dask Bag containing adslabs before any predictions were run.
bool – True if a model was used to predict adsorption energies of the inputs.
str – the name of the column corresponding to the minimum adsorption energy on each surface according to the model that was run last during predictions.

Return type

tuple of (dask.bag.Bag, dask.bag.Bag, bool, str)
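
The idea of several inference steps with optional intermediate filtering can be sketched as follows. This is an illustration under stated assumptions, not the catlas implementation: `run_inference_steps`, the step keys (`label`, `predict`, `keep_below`), and the `dE_` column naming are all hypothetical, and the "models" are plain functions.

```python
def run_inference_steps(adslabs, steps):
    """Apply each step's predictor in order, optionally filtering in between.

    Returns the surviving rows and the name of the column written last.
    """
    remaining = [dict(row) for row in adslabs]  # copy so inputs stay untouched
    last_column = None
    for step in steps:
        column = "dE_" + step["label"]  # hypothetical column naming
        for row in remaining:
            row[column] = step["predict"](row)
        # Optional intermediate filter: keep rows below an energy threshold.
        threshold = step.get("keep_below")
        if threshold is not None:
            remaining = [row for row in remaining if row[column] < threshold]
        last_column = column
    return remaining, last_column


adslabs = [{"id": 1, "descriptor": -1.0}, {"id": 2, "descriptor": 0.5}]
steps = [
    # A cheap first pass that discards rows with non-negative predictions.
    {"label": "cheap", "predict": lambda row: row["descriptor"], "keep_below": 0.0},
    # A second pass that only runs on the survivors.
    {"label": "accurate", "predict": lambda row: 2 * row["descriptor"]},
]
results, last_column = run_inference_steps(adslabs, steps)
```

Filtering between steps means a more expensive later model only sees the candidates that the earlier model did not rule out.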

catlas.prediction_steps.parse_inputs()

Parse and prepare inputs for use by the main script. This function loads and validates the config, pulls the run_id from the config, generates the dask cluster script from the dask cluster script path, makes a folder for the outputs, generates parity plots if available for the model(s) in use, generates a Sankey dictionary for later use in the script, and writes “CATLAS” in large isometrically displayed ASCII letters.

Parameters

config_path (str) – a path to a config yml file describing what adsorption calculations to run in the main script.
dask_cluster_script_path (str) – a path to a script that connects to a dask cluster that executes calculations run during the main script.

Returns

dict – a config describing what adsorption predictions to run.
str – the text of a Python script that connects to a Dask cluster.
str – a string including a timestamp that uniquely identifies this run.

Return type

tuple of (dict, str, str)
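
As an illustration of a run identifier that includes a timestamp, here is a minimal sketch. The helper name `make_run_id` and the `YYYYMMDD_HHMMSS_name` layout are assumptions made for this example; the format catlas actually uses is not specified on this page.

```python
from datetime import datetime


def make_run_id(run_name, now=None):
    """Build a run identifier by prefixing a name with a timestamp.

    Hypothetical helper; the timestamp layout is an assumption.
    """
    if now is None:
        now = datetime.now()
    return f"{now.strftime('%Y%m%d_%H%M%S')}_{run_name}"


# A fixed datetime keeps the example reproducible.
example_id = make_run_id("demo", now=datetime(2024, 1, 2, 3, 4, 5))
```

Passing the time in as an argument keeps the helper easy to test, while the default covers normal use.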