frflib.utils.ml_analysis.processing

Module for generating a preprocessing pipeline with sklearndf.

Functions

`make_static_list`(→ list)	Create a list to store the preprocessing transformations we want to apply to each column.
`make_dict_preprocessing`(→ dict)	Create a dict that groups column names by the transformation to apply to them (categorical or numerical).
`make_pipeline`(→ sklearn.compose.ColumnTransformer)	Concatenate the numerical and categorical pipelines into the final pipeline that handles data preprocessing.

Module Contents

frflib.utils.ml_analysis.processing.make_static_list(df: pandas.DataFrame) → list

Create a list to store the preprocessing transformations we want to apply to each column.

Parameters:: df (pd.DataFrame) – input DataFrame
Returns:: list giving, for every column, the transformation to apply to it
Return type:: list
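The docstring does not specify how each column's transformation is chosen or how list entries are shaped. As a minimal sketch, assuming each entry is a `(column_name, transformation)` pair and that numeric dtypes map to `"numerical"` while everything else maps to `"categorical"`, a function like this could look as follows. The name `make_static_list_sketch` and both labels are hypothetical, not the library's implementation.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def make_static_list_sketch(df: pd.DataFrame) -> list:
    """Hypothetical sketch: pair each column with a transformation label.

    Assumption: numeric dtypes -> "numerical", all other dtypes -> "categorical".
    """
    return [
        (column, "numerical" if is_numeric_dtype(df[column]) else "categorical")
        for column in df.columns
    ]


df = pd.DataFrame({"age": [31, np.nan, 45], "city": ["Paris", "Lyon", np.nan]})
print(make_static_list_sketch(df))  # [('age', 'numerical'), ('city', 'categorical')]
```

Keeping the column name in each entry is what lets a later step group columns without access to the original DataFrame.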

frflib.utils.ml_analysis.processing.make_dict_preprocessing(data_static: list) → dict

Create a dict that groups column names by the transformation to apply to them (categorical or numerical).

Parameters:: data_static (list) – list giving, for every column, the transformation to apply to it, as returned by make_static_list
Returns:: dict mapping each available transformation to the list of column names it applies to
Return type:: dict
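Under the same assumption that `data_static` holds `(column_name, transformation)` pairs, the grouping step can be sketched as a simple inversion from "column → transformation" to "transformation → columns". The function name `make_dict_preprocessing_sketch` and the `"numerical"`/`"categorical"` keys are hypothetical.

```python
def make_dict_preprocessing_sketch(data_static: list) -> dict:
    """Hypothetical sketch: group column names under their transformation label."""
    # Pre-create the two expected keys so each branch exists even if it has no columns
    dict_preprocessing = {"numerical": [], "categorical": []}
    for column, transformation in data_static:
        # Unknown labels get their own list instead of raising
        dict_preprocessing.setdefault(transformation, []).append(column)
    return dict_preprocessing


data_static = [("age", "numerical"), ("income", "numerical"), ("city", "categorical")]
print(make_dict_preprocessing_sketch(data_static))
# {'numerical': ['age', 'income'], 'categorical': ['city']}
```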

frflib.utils.ml_analysis.processing.make_pipeline(dict_preprocessing: dict, fill_num=DEFAULT_NUM_FILLNA, fill_cat=DEFAULT_CAT_FILLNA) → sklearn.compose.ColumnTransformer

Concatenate the numerical and categorical pipelines into the final pipeline that handles data preprocessing.

Parameters:

dict_preprocessing (dict) – dict with the name of transformation to apply to categorical and numerical columns
fill_num (str, optional) – strategy used to fill missing numerical values, defaults to “mean”
fill_cat (str, optional) – strategy used to fill missing categorical values, defaults to “constant”

Returns:

Final pipeline with all preprocessing steps

Return type:

sklearndf.transformation.ColumnTransformerDF
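The defaults “mean” and “constant” match strategy names accepted by scikit-learn's `SimpleImputer`, which suggests how the two branches might be built. The sketch below uses plain scikit-learn (`ColumnTransformer`, `Pipeline`, `SimpleImputer`). The function name `make_pipeline_sketch`, the `"missing"` fill value, and the `OneHotEncoder` step are assumptions for illustration, not the library's actual steps. With sklearndf, the corresponding DF-suffixed wrappers would presumably be used to obtain a `ColumnTransformerDF`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def make_pipeline_sketch(
    dict_preprocessing: dict, fill_num: str = "mean", fill_cat: str = "constant"
) -> ColumnTransformer:
    """Hypothetical sketch: one imputing branch per column group, concatenated."""
    # Numerical branch: impute missing values with the given SimpleImputer strategy
    num_pipeline = Pipeline([("imputer", SimpleImputer(strategy=fill_num))])
    # Categorical branch: impute (placeholder "missing" when fill_cat="constant"),
    # then one-hot encode; the encoder step is an assumption
    cat_pipeline = Pipeline([
        ("imputer", SimpleImputer(strategy=fill_cat, fill_value="missing")),
        ("encoder", OneHotEncoder(handle_unknown="ignore")),
    ])
    # ColumnTransformer applies each branch to its own columns and stacks the outputs
    return ColumnTransformer([
        ("numerical", num_pipeline, dict_preprocessing["numerical"]),
        ("categorical", cat_pipeline, dict_preprocessing["categorical"]),
    ])


df = pd.DataFrame({"age": [31, np.nan, 45], "city": ["Paris", "Lyon", np.nan]})
preprocessor = make_pipeline_sketch({"numerical": ["age"], "categorical": ["city"]})
X = preprocessor.fit_transform(df)
print(X.shape)  # (3, 4): 1 imputed numeric column + 3 one-hot "city" columns
```

Selecting columns by name keeps each branch independent of column order in the input DataFrame.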