filter_utils

catlas.filter_utils.filter_best_facet_by_surface_property(bag_partition, name: str, val: dict)

Parse each facet and pick the one predicted to have the lowest energy by the broken bond model or the surface density model.

Parameters
  • bag_partition (Iterable[dict]) – a partition of a dask bag containing enumerated surfaces

  • name (str) – filter name to be applied (comes from the main catlas config)

  • val (dict) – values associated with name from the config yaml file, which further specify how the filter should be applied

Returns

the bag partition with undesired slabs filtered out

Return type

Iterable[dict]

catlas.filter_utils.filter_by_surface_property(bag_partition, name: str, val: dict)

Parse all Miller facets for each material and pick those predicted to be lower in energy by the broken bond model or the surface density model.

Parameters
  • bag_partition (Iterable[dict]) – a partition of a dask bag containing enumerated surfaces

  • name (str) – filter name to be applied (comes from the main catlas config)

  • val (dict) – values associated with name from the config yaml file, which further specify how the filter should be applied

Returns

the bag partition with undesired slabs filtered out

Return type

Iterable[dict]

catlas.filter_utils.filter_columns_by_type(df, type_kws)

Filter columns of a dataframe based on what datatype they contain.

If any element of the provided list is present in the string representation of the type of the first non-None element of the column, that column name will be included in the list of returned columns.

Example: if type_kws=['ocp', 'ocdata'], the filter matches any column whose first valid element is an ocp.ocpmodels.preprocessing.atoms_to_graphs.AtomsToGraphs object, an ocdata.surfaces.Surface object, or a pydocparser.Parser object (the last one matches because 'ocp' is a substring of 'pydocparser').

Parameters
  • df (pd.core.frame.DataFrame) – a pandas DataFrame

  • type_kws (list[str]) – If any element of this list is present in the string representation of the type of the first non-None element of a column, that column matches the filter.

Returns

A boolean series indexed by column name. A value is True if the column contains a type matching the filtering criteria.

Return type

pd.core.series.Series[bool]
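The keyword matching described above can be sketched without pandas. This is a minimal illustration, not the catlas implementation: a plain dict of lists stands in for the DataFrame, `columns_matching_type` and `first_valid` are hypothetical helper names, and `fractions.Fraction` stands in for a large object type.

```python
from fractions import Fraction  # stand-in for a "large object" type


def first_valid(values):
    """Return the first element that is not None, or None if there is none."""
    return next((v for v in values if v is not None), None)


def columns_matching_type(columns, type_kws):
    """Map each column name to True if a keyword appears in str(type(...))
    of the column's first non-None element.

    `columns` is a dict of lists standing in for a DataFrame (assumption).
    """
    result = {}
    for name, values in columns.items():
        type_repr = str(type(first_valid(values)))
        result[name] = any(kw in type_repr for kw in type_kws)
    return result


columns = {
    "energy": [None, -1.2, -0.7],
    "ratio": [Fraction(1, 3), None],
    "label": ["a", "b"],
}
mask = columns_matching_type(columns, ["fractions"])
```

Because the match is a plain substring test on the type's string representation, short keywords can match more types than intended.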

catlas.filter_utils.get_bond_length(ucell, neighbor_factor)

Gets the bond lengths of all symmetrically distinct sites and organizes them as a dictionary with the unique Wyckoff symbol as key and the bond length as a float value.

Parameters
  • ucell (pymatgen.structure.Structure) – PMG Structure representation of a bulk unit cell.

  • neighbor_factor (float) – buffer for the radius to look for neighbors in order to calculate bond length

Returns

{wyckoff symbol (str): bond length (float)}

Return type

dict

catlas.filter_utils.get_broken_bonds(row: dict, neighbor_factor: float) → float

Estimates surface energy using a broken bond model.

Parameters
  • ucell (pymatgen.structure.Structure) – PMG Structure representation of a bulk unit cell.

  • slab (pymatgen.structure.Structure) – PMG Structure representation of a slab cell.

  • ecoh (float) – Cohesive energy which correlates to the surface energy

  • neighbor_factor (float) – buffer for the radius to look for neighbors in order to calculate bond length

Returns

Rough estimate of surface energy

Return type

float

catlas.filter_utils.get_bulk_cn(ucell, neighbor_factor)

Gets coordination number of each symmetrically distinct site in the unit cell and organizes it as a dictionary with the unique Wyckoff symbol as key and the coordination number as an int value.

Parameters
  • ucell (pymatgen.structure.Structure) – PMG Structure representation of a bulk unit cell.

  • neighbor_factor (float) – buffer for the radius to look for neighbors in order to calculate bond length

Returns

{wyckoff symbol (str): coordination number (int)}

Return type

dict

catlas.filter_utils.get_center_of_mass(pmg_struct)

Calculates the center of mass of a pmg structure.

Parameters

pmg_struct (pymatgen.core.structure.Structure) – pymatgen structure to be considered.

Returns

the center of mass

Return type

numpy.ndarray
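For reference, a center of mass is the mass-weighted average of the site positions. The sketch below uses plain tuples rather than a pymatgen structure and returns a tuple rather than a numpy.ndarray. The name `center_of_mass` and its arguments are illustrative assumptions, and this page does not say whether catlas works in Cartesian or fractional coordinates.

```python
def center_of_mass(masses, positions):
    """Mass-weighted mean of 3D positions.

    masses: list of floats; positions: list of (x, y, z) tuples.
    """
    total = sum(masses)
    return tuple(
        sum(m * p[axis] for m, p in zip(masses, positions)) / total
        for axis in range(3)
    )


# Two sites on the x axis; the heavier one pulls the center toward x = 4.
com = center_of_mass([1.0, 3.0], [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)])
```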

catlas.filter_utils.get_decomposition_bools_from_list(pbx, pbx_entry, conditions)

Evaluates decomposition energies at specified pH and voltage points.

Parameters
  • pbx (pymatgen.analysis.pourbaix_diagram.PourbaixDiagram) – an electrochemical stability diagram for the reference system.

  • pbx_entry (pymatgen.analysis.pourbaix_diagram.PourbaixEntry) – a pourbaix entry specific to the material we want to calculate the decomposition energy of.

  • conditions (list[dict]) – Conditions to evaluate the decomposition energy at. Each dictionary contains a pH and a voltage, both expressed as floats.

Returns

Whether the input entry is stable under each set of conditions.

Return type

Iterable[bool]

catlas.filter_utils.get_decomposition_bools_from_range(pbx, pbx_entry, conditions)

Evaluates decomposition energies at regular pH and voltage windows within specified intervals.

Parameters
  • pbx (pymatgen.analysis.pourbaix_diagram.PourbaixDiagram) – a pourbaix diagram object containing information about the reference chemical system.

  • pbx_entry (pymatgen.analysis.pourbaix_diagram.PourbaixEntry) – a pourbaix entry specific to the material we want to calculate the decomposition energy of.

  • conditions (dict) – A dictionary specifying what condition or sets of conditions to evaluate the decomposition energy at.

Returns

A list corresponding to whether the input entry is stable under each set of conditions.

Return type

Iterable[bool]
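Evaluating on a regular grid within pH and voltage intervals can be sketched as follows. Everything here is an assumption made for illustration: the helper names, the single shared step size, the stability threshold `max_energy`, and the toy energy function that stands in for a real Pourbaix diagram query. The sketch returns one boolean per grid point; how catlas structures its `conditions` dictionary and aggregates results is not specified on this page.

```python
from itertools import product


def regular_points(lower, upper, step):
    """Evenly spaced points from lower to upper, inclusive."""
    count = int(round((upper - lower) / step))
    return [lower + i * step for i in range(count + 1)]


def stability_over_range(energy_at, ph_range, voltage_range, step, max_energy):
    """Return one bool per (pH, voltage) grid point.

    energy_at: callable (pH, voltage) -> decomposition energy (stand-in).
    A point counts as stable when its energy is at most max_energy.
    """
    grid = product(
        regular_points(*ph_range, step),
        regular_points(*voltage_range, step),
    )
    return [energy_at(ph, v) <= max_energy for ph, v in grid]


def toy_energy(ph, voltage):
    """Made-up energy surface that grows away from pH 7 and 0 V."""
    return 0.1 * abs(ph - 7.0) + abs(voltage)


# Grid order: pH varies slowest, voltage fastest.
bools = stability_over_range(toy_energy, (6.0, 8.0), (0.0, 1.0), 1.0, 0.5)
```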

catlas.filter_utils.get_elements_in_groups(groups: list) → list

Grabs the element symbols of all elements in the specified groups.

Parameters

groups (list[str]) – Names of groups to include in the output. Any element in any group will be included. Valid groups are listed in the implemented_groups variable.

Returns

Elements included in the input groups.

Return type

list[str]
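The group lookup amounts to a union over a table of group names. In the sketch below, the group names and the partial element lists in `EXAMPLE_GROUPS` are hypothetical. The real valid names live in the `implemented_groups` variable, which is not reproduced here, and the error handling is likewise an assumption.

```python
# Hypothetical, partial group table for illustration only.
EXAMPLE_GROUPS = {
    "alkali": ["Li", "Na", "K", "Rb", "Cs"],
    "halogen": ["F", "Cl", "Br", "I"],
}


def elements_in_groups(groups, table=EXAMPLE_GROUPS):
    """Return the union of element symbols across the requested groups."""
    unknown = [g for g in groups if g not in table]
    if unknown:
        raise ValueError(f"Unknown groups: {unknown}")
    elements = []
    for group in groups:
        for symbol in table[group]:
            if symbol not in elements:  # keep order, skip duplicates
                elements.append(symbol)
    return elements


selected = elements_in_groups(["halogen", "alkali"])
```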

catlas.filter_utils.get_first_type(x)

Get the type of the input, unpacking lists first if necessary. This is used to discard large objects from the output df of catlas if they are specified as unnecessary in the config yaml by examining the type of objects in a list where applicable.

Parameters

x (Any) – An object to get the type of

Returns

The type of the input

Return type

type
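A minimal sketch of the list-unpacking behavior described above. The function name `first_type` is illustrative. Returning `list` for an empty list and unpacking only one level are assumptions, since this page does not state how those cases are handled.

```python
def first_type(x):
    """Return the type of x, or of its first element if x is a non-empty list."""
    if isinstance(x, list) and x:
        return type(x[0])
    return type(x)


scalar_type = first_type("abc")      # a plain value
element_type = first_type([1.5, 2])  # a list is judged by its first element
empty_type = first_type([])          # nothing to unpack (assumed behavior)
```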

catlas.filter_utils.get_pourbaix_info(entry: dict) → dict

Construct a Pourbaix diagram for a material. This currently only supports Materials Project (MP) inputs.

Parameters

entry (dict) – bulk structure entry as constructed by catlas.load_bulk_structures.load_bulks_from_db

Raises

ValueError – The bulk ID provided in the entry is not a Materials Project ID

catlas.filter_utils.get_pourbaix_stability(entry: dict, conditions: dict) → list

Evaluate whether a material will be stable under various electrochemical conditions.

Parameters
  • entry (dict) – A dictionary containing the bulk entry which will be assessed

  • conditions (dict) – The dictionary of Pourbaix settings set in the config yaml

Returns

For each input condition, True if the material is stable under that condition

Return type

list[bool]

catlas.filter_utils.get_surface_density(row: dict, neighbor_factor: float) → float

Estimates surface density multiplied by cohesive energy.

Parameters
  • ucell (pymatgen.structure.Structure) – PMG Structure representation of a bulk unit cell.

  • slab (pymatgen.structure.Structure) – PMG Structure representation of a slab cell.

  • ecoh (float) – Cohesive energy which correlates to the surface energy

  • neighbor_factor (float) – buffer for the radius to look for neighbors in order to calculate bond length

Returns

Rough estimate of cohesive energy x surface density

Return type

float

catlas.filter_utils.get_total_bb(ucell, slab, neighbor_factor: float) → float

Calculates the total ratio of broken bonds to bulk coordination number. Often used as a factor in surface energy.

Parameters
  • ucell (pymatgen.structure.Structure) – PMG Structure representation of a bulk unit cell.

  • slab (pymatgen.structure.Structure) – PMG Structure representation of a slab cell.

  • neighbor_factor (float) – buffer for the radius to look for neighbors in order to calculate bond length

Returns

Sum of undercoordination/full bulk coordination for each surface site

Return type

float
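The returned quantity, the sum over surface sites of undercoordination divided by bulk coordination, can be illustrated with plain numbers. This sketch assumes the coordination numbers are already known and skips the pymatgen neighbor search entirely. The function name and the pair-based input format are hypothetical.

```python
def total_broken_bond_ratio(sites):
    """Sum (bulk_cn - surface_cn) / bulk_cn over surface sites.

    sites: iterable of (bulk_cn, surface_cn) pairs, one per surface site.
    """
    return sum((bulk_cn - surf_cn) / bulk_cn for bulk_cn, surf_cn in sites)


# An fcc(111) surface atom keeps 9 of its 12 bulk nearest neighbors,
# so each such site contributes (12 - 9) / 12 = 0.25.
ratio = total_broken_bond_ratio([(12, 9), (12, 9)])
```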

catlas.filter_utils.get_total_nn(ucell, slab, neighbor_factor: float) → int

Calculates the sum of nearest neighbors for each surface site.

Parameters
  • ucell (pymatgen.structure.Structure) – PMG Structure representation of a bulk unit cell.

  • slab (pymatgen.structure.Structure) – PMG Structure representation of a slab cell.

  • neighbor_factor (float) – buffer for the radius to look for neighbors in order to calculate bond length

Returns

Sum of surface coordination number

Return type

int

catlas.filter_utils.pb_query_and_write(entry: dict, lmdb_path: str)

Pull pourbaix info from MP and write it to the lmdb.

Parameters
  • entry (dict) – entry to query

  • lmdb_path (str) – path of lmdb to write to

catlas.filter_utils.surface_area(slab)

Gets the cross-sectional surface area of the slab.

Parameters

slab (pymatgen.structure.Structure) – PMG Structure representation of a slab.

Returns

surface area

Return type

float

catlas.filter_utils.write_pourbaix_info(pbx_entry: dict, lmdb_path)

Write the pourbaix query info to lmdb for future use.

Parameters
  • pbx_entry (dict) – Relevant pourbaix query info for a single mpid.

  • lmdb_path (str) – Location where the lmdb will be written, including file name.