prediction_steps
- catlas.prediction_steps.enumerate_adslabs_wrapper(config, surface_bag, adsorbate_bag)
Generate adslabs from all filtered surfaces and adsorbates.
- Parameters
config (dict) – A config file specifying what surfaces to filter.
surface_bag (dask.bag.Bag) – surfaces to enumerate
adsorbate_bag (dask.bag.Bag) – adsorbates to enumerate
- Returns
dask.bag.Bag: a bag of metadata for the adslabs
dask.bag.Bag: a bag of enumerated adslabs
- Return type
dask.bag.Bag
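As a rough illustration of what "generate adslabs from all filtered surfaces and adsorbates" could look like, the sketch below pairs every surface with every adsorbate. It is not the catlas implementation: it uses plain lists and `itertools.product` in place of dask bags, it assumes that every surface is combined with every adsorbate, and the helper name `pair_surfaces_with_adsorbates` and the dictionary keys are hypothetical.

```python
from itertools import product

# Hypothetical stand-ins for the contents of surface_bag and adsorbate_bag.
surfaces = [
    {"bulk_id": "mp-1", "miller": (1, 1, 1)},
    {"bulk_id": "mp-1", "miller": (1, 0, 0)},
]
adsorbates = [{"smiles": "*H"}, {"smiles": "*OH"}, {"smiles": "*CO"}]


def pair_surfaces_with_adsorbates(surfaces, adsorbates):
    """Return one metadata dict per (surface, adsorbate) combination.

    Assumption: each surface is paired with each adsorbate (a full
    cross product); catlas may apply further rules that are not shown here.
    """
    return [
        {**surface, **adsorbate}
        for surface, adsorbate in product(surfaces, adsorbates)
    ]


adslab_metadata = pair_surfaces_with_adsorbates(surfaces, adsorbates)
print(len(adslab_metadata))
```

With two surfaces and three adsorbates, the sketch produces six metadata entries, one per combination.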
- catlas.prediction_steps.enumerate_surfaces_and_filter(config, filtered_catalyst_bag, bulk_num)
Enumerate surfaces from an input bulk bag according to the input config.
- Parameters
config (dict) – A config file specifying what surfaces to filter.
filtered_catalyst_bag (dask.bag.Bag) – Bulk materials to enumerate surfaces for.
bulk_num (int) – The number of bulk materials in the input bag.
- Returns
dask.bag.Bag: A dask Bag containing filtered surfaces.
int: The number of slabs enumerated from filtered bulks before slab filtering.
- Return type
dask.bag.Bag
- catlas.prediction_steps.finish_sankey_diagram(sankey, num_unfiltered_slabs, num_filtered_slabs, num_adslabs, inference_list, run_id) → catlas.sankey.sankey_utils.Sankey
Make sankey diagram for the catlas run.
- Parameters
sankey (catlas.sankey.sankey_utils.Sankey) – Sankey object from predictions run
num_unfiltered_slabs (int) – the number of slabs before filtering
num_filtered_slabs (int) – the number of slabs after filtering
num_adslabs (int) – the number of adslabs enumerated
num_inferred (int) – the number of adslabs that inference was run on
run_id (str) – an arbitrary string identifying the run
- Returns
finished Sankey diagram
- Return type
catlas.sankey.sankey_utils.Sankey
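The counts passed to this function describe how many objects survive each stage of a run. The sketch below only shows the arithmetic relating those counts; it does not use or reproduce the `Sankey` class, and the helper name `summarize_run_counts`, the output keys, and the example numbers are hypothetical.

```python
def summarize_run_counts(num_unfiltered_slabs, num_filtered_slabs,
                         num_adslabs, num_inferred):
    """Derive kept/dropped counts for each stage from the raw totals."""
    return {
        # Slabs discarded by the slab filters.
        "slabs_removed_by_filtering": num_unfiltered_slabs - num_filtered_slabs,
        "slabs_kept": num_filtered_slabs,
        "adslabs_enumerated": num_adslabs,
        # Adslabs that were enumerated but never passed to a model.
        "adslabs_without_inference": num_adslabs - num_inferred,
        "adslabs_inferred": num_inferred,
    }


# Illustrative numbers only, not taken from a real run.
summary = summarize_run_counts(120, 45, 900, 300)
print(summary)
```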
- catlas.prediction_steps.generate_outputs(config, adslab_atoms_bag, results_bag, run_id, inference, most_recent_step)
Process the remaining outputs selected in the input YAML for catlas.
- Parameters
config (dict) – A config file specifying what surfaces to filter.
results_bag (dask.bag.Bag) – A dask Bag object of adslabs and their predicted adsorption energies.
run_id (str) – A string with a timestamp uniquely identifying the run.
inference (bool) – Whether a model was used to predict adsorption energies during the execution of this script.
- Returns
int: the number of adslabs immediately after adslab enumeration
int: the number of adslabs that inference was run on
int: the number of adslabs remaining after all inference
- Return type
int
- catlas.prediction_steps.load_adsorbates_and_filter(config, sankey)
Load adsorbates and filter them according to the input config. Update sankey diagram based on adsorbate filtering.
- Parameters
config (dict) – a config file specifying what adsorption calculations to run.
sankey (catlas.sankey.sankey_utils.Sankey) – a Sankey object describing how objects have been filtered so far.
- Returns
dict: a dictionary containing adsorbates that survived filtering.
catlas.sankey.sankey_utils.Sankey: a Sankey object updated with adsorbate filtering.
int: the number of bulk materials that survived filtering.
- Return type
dict
- catlas.prediction_steps.load_bulks_and_filter(config, client, sankey)
Load bulk materials from file and filter them according to the input config file. Update sankey diagram based on bulk filtering.
- Parameters
config (dict) – a dictionary specifying what adsorption calculations to run.
client (dask.distributed.Client) – a Dask cluster that runs calculations during execution of this program.
sankey (catlas.sankey.sankey_utils.Sankey) – a Sankey object describing how objects have been filtered so far.
- Returns
dict: a dictionary containing bulk materials that survived filtering.
catlas.sankey.sankey_utils.Sankey: a Sankey object updated with bulk filtering.
- Return type
dict
- catlas.prediction_steps.make_predictions(config, adslab_atoms_bag, results_bag)
Make predictions on enumerated adslabs. This may include a single inference step or multiple with optional intermediate filtering.
- Parameters
config (dict) – A config file specifying what surfaces to filter.
results_bag (dask.bag.Bag) – a bag of metadata for the adslabs
adslab_atoms_bag (dask.bag.Bag) – a bag of enumerated adslabs
- Returns
dask.bag.Bag: a dask Bag containing adslabs and their predicted adsorption energies according to models specified in the config file.
dask.bag.Bag: a dask Bag containing adslabs before any predictions were run.
bool: True if a model was used to predict adsorption energies of the inputs.
str: the name of the column corresponding to the minimum adsorption energy on each surface according to the model that was run last during predictions.
- Return type
dask.bag.Bag
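The idea of running one or more inference steps with optional filtering in between can be sketched with plain Python. This is an illustration under stated assumptions, not the catlas code: models are ordinary callables rather than trained networks, rows are dictionaries rather than dask bag elements, and the names `run_prediction_steps`, `keep_if`, and the `"min_" + label` column convention are hypothetical.

```python
def run_prediction_steps(adslabs, steps):
    """Apply each step's model in order, filtering between steps if asked.

    Each step is a dict with a 'label', a 'model' callable, and an
    optional 'keep_if' predicate on the predicted energy (all hypothetical).
    Returns the surviving rows, whether any model ran, and the name of the
    column written by the last model.
    """
    results = [dict(row) for row in adslabs]  # copy so inputs stay untouched
    inference = False
    most_recent_step = None
    for step in steps:
        column = "min_" + step["label"]
        for row in results:
            row[column] = step["model"](row)
        inference = True
        most_recent_step = column
        keep_if = step.get("keep_if")
        if keep_if is not None:
            # Intermediate filtering: drop rows before the next model runs.
            results = [row for row in results if keep_if(row[column])]
    return results, inference, most_recent_step


# Toy inputs; 'guess' stands in for whatever a real model would compute from.
adslabs = [{"id": 0, "guess": -0.2}, {"id": 1, "guess": 0.9}, {"id": 2, "guess": -1.4}]
steps = [
    {"label": "cheap", "model": lambda row: row["guess"], "keep_if": lambda e: e < 0.0},
    {"label": "accurate", "model": lambda row: row["guess"] * 2},
]
results, inference, most_recent_step = run_prediction_steps(adslabs, steps)
print([row["id"] for row in results], inference, most_recent_step)
```

In this toy run the first step removes the adslab with a positive predicted energy, so the second model only evaluates the two survivors.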
- catlas.prediction_steps.parse_inputs()
Parse and prepare inputs for use by the main script. This function loads and validates the config, pulls the run_id from the config, generates the dask cluster script from the dask cluster script path, makes a folder for the outputs, generates parity plots if available for the model(s) in use, generates a Sankey dictionary for later use in the script, and writes “CATLAS” in large isometrically displayed ASCII letters.
- Parameters
config_path (str) – a path to a config yml file describing what adsorption calculations to run in the main script.
dask_cluster_script_path (str) – a path to a script that connects to a dask cluster that executes calculations run during the main script.
- Returns
dict: a config describing what adsorption predictions to run.
str: the text of a python script that connects to a Dask cluster.
str: a string including a timestamp that uniquely identifies this run.
- Return type
dict