=====
Usage
=====

All of the examples below need ``pandas``; the missing-data and outlier examples also use ``numpy``::

    import numpy as np
    import pandas as pd

To generate correlation plots for your data frame, import the ``corr_plot`` function from the ``eazieda.corr_plot`` module::

    from eazieda.corr_plot import corr_plot

Then pass your data frame (``df``) and the columns you want to plot::

    corr_plot(df, ['column1', 'column2', 'column3'])

This gives you an interactive correlation plot implemented in Altair.

*Make sure to pass at least two columns to get the correlation!*

To generate histograms and bar plots from your data, import the ``histograms`` function from the ``eazieda.histograms`` module::

    from eazieda.histograms import histograms

Then pass your data frame (``df``) and the columns you want to plot::

    histograms(df, ['numeric_column', 'categorical_column'], num_cols=2)

This gives you a combined plot of histograms (for numeric columns) and bar plots (for categorical columns) implemented in Altair. You can control the number of columns in the plot layout with ``num_cols`` and the plot dimensions with ``plot_width`` and ``plot_height``.

To detect missing data in your data frame, import the ``missing_detect`` function from the ``eazieda.missing_detect`` module::

    from eazieda.missing_detect import missing_detect

To detect the missing data, do this::

    df = pd.DataFrame(
        [[1, "x"], [np.nan, "y"], [2, np.nan], [3, "y"]],
        columns=['a', 'b'],
    )
    missing_detect(df)

You get back a data frame that holds the number of missing values and their corresponding percentage.

To deal with missing data in your data frame, import the ``missing_impute`` function from the ``eazieda.missing_impute`` module::

    from eazieda.missing_impute import missing_impute

To impute the missing data, do this::

    df = pd.DataFrame(
        [[1, "x"], [np.nan, "y"], [2, np.nan], [3, "y"]],
        columns=['a', 'b'],
    )
    missing_impute(df)

You get an imputed data frame back. You can control the type of imputation with ``method_num`` and ``method_non_num``.

To detect outliers in your data, import the ``outliers_detect`` function from the ``eazieda.outliers_detect`` module::

    from eazieda.outliers_detect import outliers_detect

To detect the outliers, do this::

    s = pd.Series([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1e14])
    outliers_detect(s)

You can control the method used to detect the outliers with ``method``, which can be one of ``iforest``, ``iqr`` or ``zscore``.

To remove outliers from your data, import the ``remove_outliers`` function from the ``eazieda.outliers_detect`` module::

    from eazieda.outliers_detect import remove_outliers

To remove the outliers, do this::

    s = pd.Series([1, 1e14])
    outliers = np.array([False, True])
    s_without_outliers = remove_outliers(s, outliers)

You can choose to do the removal in place with ``inplace=True``.
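
To make the ``iqr`` option more concrete, the sketch below shows the conventional interquartile-range rule in plain Python. It is an illustration only, not eazieda's implementation: the helper name ``iqr_outlier_mask``, the 1.5 multiplier and the quartile method used by ``statistics.quantiles`` are all assumptions made for this example, and ``outliers_detect`` may differ in each of them.

```python
from statistics import quantiles


def iqr_outlier_mask(values, k=1.5):
    """Hypothetical helper: flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # One boolean per observation: True marks a suspected outlier.
    return [v < lower or v > upper for v in values]


# Same values as the outliers_detect example: ten ones and one huge value.
data = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1e14]
mask = iqr_outlier_mask(data)

# Keep only the observations that were not flagged.
kept = [v for v, is_outlier in zip(data, mask) if not is_outlier]
```

With this data only the last value is flagged. A mask of this shape, with one boolean per observation, plays the same role as the ``outliers`` array passed to ``remove_outliers``.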