dask_utils
dask_utils#
- catlas.dask_utils.bag_split_individual_partitions(bag)#
Split a bag into as many partitions as possible.
- Parameters
bag (dask.bag.Bag) – a dask Bag
- Returns
the input bag, split into more partitions.
- Return type
dask.bag.Bag
- catlas.dask_utils.split_balance_df_partitions(df, npartitions)#
Repartition a dask dataframe.
- Parameters
df (dask.core.frame.DataFrame) – a dask DataFrame to repartition.
npartitions (int) – a number of partitions to use. Set to -1 to infer.
- Returns
rebalanced dataframe
- Return type
dask.core.frame.DataFrame
- catlas.dask_utils.to_pickles(b, path, name_function=None, compute=True, **kwargs)#
Send a dask dataframe to a series of pickle files.
- Parameters
b (dask.dataframe.core.DataFrame) – Dataframe to pickle.
path (str) – Folder location where pickles are written to.
name_function (function, optional) – Function defining how pickle files are named. Defaults to None.
compute (bool, optional) – Whether to compute the dataframe before pickling. Defaults to True.
- Returns
_description_
- Return type
_type_