eazieda package
Submodules
eazieda.eazieda module
-
eazieda.eazieda.corr_plot(data, features, method='pearson', plot_width=600, plot_height=400)
Generates a correlation plot for a list of features in a given dataframe.
- Parameters
data (pandas.core.frame.DataFrame) – The input dataframe
features (list) – A list of strings naming the numerical features to correlate. It must contain at least two names (len(features) >= 2).
method (str, default = "pearson") – The correlation method. The other supported methods are “spearman” and “kendall”.
plot_width (int, default = 600) – The width of the plot
plot_height (int, default = 400) – The height of the plot
- Returns
An interactive altair correlation plot
- Return type
altair plot
Examples
>>> from eazieda.eazieda import corr_plot
>>> from vega_datasets import data
>>> df = data.iris()
>>> corr_plot(df, ["petal_length", "petal_width", "sepal_length"])
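To make the inputs to the plot more concrete, here is a minimal sketch of how the pairwise correlations for a list of features could be computed with pandas and reshaped into long form, which is the shape a heatmap-style chart typically consumes. It is not taken from the eazieda source: the helper name corr_long and the column names feature_1, feature_2 and correlation are assumptions made for illustration only.

```python
import pandas as pd


def corr_long(data, features, method="pearson"):
    """Hypothetical helper (not part of eazieda): pairwise correlations in long form."""
    if len(features) < 2:
        raise ValueError("features must contain at least two column names")
    # Square correlation matrix restricted to the requested columns.
    corr = data[features].corr(method=method)
    # One row per (feature_1, feature_2) pair.
    return (
        corr.rename_axis("feature_1")
        .reset_index()
        .melt(id_vars="feature_1", var_name="feature_2", value_name="correlation")
    )


# Small made-up dataset: y rises with x, z falls as x rises.
df = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 4.0],
        "y": [2.0, 4.0, 6.0, 8.0],
        "z": [4.0, 3.0, 2.0, 1.0],
    }
)
pairs = corr_long(df, ["x", "y", "z"])
print(pairs)
```

Passing method="spearman" switches to rank correlation through the same pandas call.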
-
eazieda.eazieda.missing_impute(data, impute=False, method_num='mean', method_non_num='most_frequent')
Returns the number and percentage of missing values for each column in the dataframe, and optionally imputes the missing values in place.
- Parameters
data (pandas.core.frame.DataFrame) – The pandas DataFrame in which missing values are to be detected
impute (bool, default = False) – Whether to impute the missing values in place.
method_num (str, default = "mean") – The method used for imputing missing numerical values. Only applicable if impute=True. One of ‘drop’, ‘mean’, ‘median’.
method_non_num (str, default = "most_frequent") – The method used for imputing missing non-numerical values. Only applicable if impute=True. One of ‘drop’, ‘most_frequent’.
- Returns
A dataframe containing two columns: the number of missing values and the percentage of missing values for each column
- Return type
pandas.core.frame.DataFrame
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from eazieda.eazieda import missing_impute
>>> df = pd.DataFrame([[1, "x"], [np.nan, "y"], [2, np.nan], [3, "y"]], columns=['a', 'b'])
>>> missing_impute(df)
   n_missing percent
a          1     25%
b          1     25%
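For readers who want to see the underlying idea, the following is a rough sketch of a per-column missing-value summary and a simple imputation written directly with pandas. It is not the eazieda implementation: missing_summary and impute_copy are hypothetical names, the sketch returns a filled copy rather than modifying the input in place, it reports percentages as numbers rather than formatted strings, and it deliberately leaves out the ‘drop’ option because its exact behaviour is not described here.

```python
import numpy as np
import pandas as pd


def missing_summary(data):
    """Hypothetical helper: count and percentage of missing values per column."""
    n_missing = data.isna().sum()
    percent = (n_missing / len(data) * 100).round(1)
    return pd.DataFrame({"n_missing": n_missing, "percent": percent})


def impute_copy(data, method_num="mean", method_non_num="most_frequent"):
    """Hypothetical helper: return a copy with missing values filled per column."""
    if method_num not in ("mean", "median"):
        raise ValueError("this sketch only supports 'mean' and 'median'")
    if method_non_num != "most_frequent":
        raise ValueError("this sketch only supports 'most_frequent'")
    out = data.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            fill = out[col].mean() if method_num == "mean" else out[col].median()
        else:
            # mode() ignores missing values; take the first mode on ties.
            fill = out[col].mode().iloc[0]
        out[col] = out[col].fillna(fill)
    return out


df = pd.DataFrame(
    [[1, "x"], [np.nan, "y"], [2, np.nan], [3, "y"]], columns=["a", "b"]
)
summary = missing_summary(df)
filled = impute_copy(df)
print(summary)
print(filled)
```

Working on a copy keeps the original dataframe available for comparison, which can be helpful when checking how much an imputation changes the data.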