dask_utils#

catlas.dask_utils.bag_split_individual_partitions(bag)#

Split a bag into as many partitions as possible.

Parameters

bag (dask.bag.Bag) – a dask Bag

Returns

the input bag, split into more partitions.

Return type

dask.bag.Bag

catlas.dask_utils.split_balance_df_partitions(df, npartitions)#

Repartition a dask dataframe.

Parameters
  • df (dask.core.frame.DataFrame) – a dask DataFrame to repartition.

  • npartitions (int) – a number of partitions to use. Set to -1 to infer.

Returns

rebalanced dataframe

Return type

dask.core.frame.DataFrame

catlas.dask_utils.to_pickles(b, path, name_function=None, compute=True, **kwargs)#

Send a dask dataframe to a series of pickle files.

Parameters
  • b (dask.dataframe.core.DataFrame) – Dataframe to pickle.

  • path (str) – Folder location where pickles are written to.

  • name_function (function, optional) – Function defining how pickle files are named. Defaults to None.

  • compute (bool, optional) – Whether to compute the dataframe before pickling. Defaults to True.

Returns

_description_

Return type

_type_