Smoothing and Filtering of Data
- class smoothing_and_filtering_functions.Remove_Outliers
This class is used to provide methods for removing outliers from a given dataset.
- removeOutlier(columnName, n)
removeOutlier removes outliers using the mean and standard deviation. The method computes the mean and standard deviation of the values in the given column, derives the lower and upper boundary values from them, and removes the rows that fall outside those boundaries. The width of the boundaries is set by a standard deviation coefficient (n) in the range from 1.0 to 3.0.
- Parameters
df (pandas.core.frame.DataFrame) – input dataframe
columnName (str) – a column of the input dataframe
n (float) – standard deviation coefficient
- Returns
filtered dataframe
- Return type
pandas.core.frame.DataFrame
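The boundary rule described for removeOutlier can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name `remove_outliers_std` is hypothetical, and the boundaries are assumed to be mean ± n · std, with the sample standard deviation and inclusive limits.

```python
import pandas as pd


def remove_outliers_std(df: pd.DataFrame, column_name: str, n: float) -> pd.DataFrame:
    """Keep rows whose value lies within mean +/- n * std of the column (assumed rule)."""
    mean = df[column_name].mean()
    std = df[column_name].std()  # sample standard deviation (pandas default, ddof=1)
    lower, upper = mean - n * std, mean + n * std
    return df[df[column_name].between(lower, upper)]


# One value (50.0) lies far away from the rest of the column.
df = pd.DataFrame({"value": [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 50.0]})
filtered = remove_outliers_std(df, "value", n=2.0)
print(filtered)
```

A smaller n narrows the boundaries and removes more rows; a larger n keeps more of the data.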
- removeOutlier_q(columnName, n1, n2)
removeOutlier_q removes data that lie below the lower quantile or above the upper quantile. Both quantiles are given as values in the range from 0.0 to 1.0.
- Parameters
df (pandas.core.frame.DataFrame) – input dataframe
columnName (str) – a column of the input dataframe
n2 (float) – lower quantile
n1 (float) – upper quantile
- Returns
filtered dataframe
- Return type
pandas.core.frame.DataFrame
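A quantile-based filter of this kind could look like the sketch below. The function name and the parameter names `lower_q` and `upper_q` are hypothetical and chosen for readability; inclusive limits are an assumption.

```python
import pandas as pd


def remove_outliers_quantile(
    df: pd.DataFrame, column_name: str, lower_q: float, upper_q: float
) -> pd.DataFrame:
    """Keep rows whose value lies between the two quantiles of the column."""
    low = df[column_name].quantile(lower_q)
    high = df[column_name].quantile(upper_q)
    return df[df[column_name].between(low, high)]


# Values 1..100: the lowest and highest 5 % are dropped.
df = pd.DataFrame({"value": list(range(1, 101))})
filtered = remove_outliers_quantile(df, "value", lower_q=0.05, upper_q=0.95)
print(filtered["value"].min(), filtered["value"].max())
```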
- removeOutlier_z(columnName, n)
removeOutlier_z removes outliers using the Z-score. The Z-score of a value is the number of standard deviations by which it lies above or below the mean.
- Parameters
df (pandas.core.frame.DataFrame) – input dataframe
columnName (str) – a column of the input dataframe
n (float) – standard deviation coefficient
- Returns
filtered dataframe
- Return type
pandas.core.frame.DataFrame
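As an illustration of Z-score filtering, the sketch below computes z = (x - mean) / std for every value and keeps the rows with |z| <= n. The function name is hypothetical, and the use of the population standard deviation (ddof=0) is an assumption; the library may compute the score differently.

```python
import pandas as pd


def remove_outliers_zscore(df: pd.DataFrame, column_name: str, n: float) -> pd.DataFrame:
    """Keep rows whose absolute Z-score does not exceed n."""
    col = df[column_name]
    z = (col - col.mean()) / col.std(ddof=0)  # population standard deviation (assumed)
    return df[z.abs() <= n]


df = pd.DataFrame({"value": [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 50.0]})
filtered = remove_outliers_zscore(df, "value", n=2.0)
print(filtered)
```

With very small samples a single extreme value inflates the standard deviation, so a threshold of 3.0 may remove nothing; the example uses 2.0 for that reason.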
- class smoothing_and_filtering_functions.Converter
This class checks the columns of a dataset for datetime data and converts them to the datetime data type.
- dateTime_converter()
Timeseries converter
- Parameters
df (pandas.core.frame.DataFrame) – input dataframe
- Returns
converted dataframe
- Return type
pandas.core.frame.DataFrame
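One way such a conversion might work is to try parsing every non-numeric column and keep the result only when parsing succeeds. The sketch below is an assumption about the approach, not the class's actual logic, and `convert_datetime_columns` is a hypothetical name.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def convert_datetime_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Convert every column that parses cleanly as datetime; leave the others unchanged."""
    out = df.copy()
    for col in out.columns:
        # Numeric columns and columns that are already datetime are skipped.
        if is_numeric_dtype(out[col]) or is_datetime64_any_dtype(out[col]):
            continue
        try:
            out[col] = pd.to_datetime(out[col])
        except (ValueError, TypeError):
            continue  # not datetime data, keep the column as it is
    return out


df = pd.DataFrame(
    {
        "timestamp": ["2021-01-01 00:00:00", "2021-01-01 01:00:00"],
        "label": ["pump", "valve"],
        "value": [1.0, 2.0],
    }
)
converted = convert_datetime_columns(df)
print(converted.dtypes)
```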
- class smoothing_and_filtering_functions.Smoothing
This class is used to provide methods for smoothing a given dataset.
- median_filter(columnName, filter_length)
Data smoothing using median filter function
- Parameters
df (pandas.core.frame.DataFrame) – input dataframe
columnName (str) – a column of the input dataframe
filter_length (int) – filter length parameter
- Returns
smoothed dataframe (median filter)
- Return type
pandas.core.frame.DataFrame
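A median filter replaces each value with the median of its neighbourhood, which suppresses isolated spikes. The sketch below uses a centred pandas rolling window; the function is a stand-in written for this example, and the centred window with shortened edges is an assumption.

```python
import pandas as pd


def median_filter(df: pd.DataFrame, column_name: str, filter_length: int) -> pd.DataFrame:
    """Replace each value with the median of a centred window of filter_length samples."""
    out = df.copy()
    out[column_name] = (
        out[column_name]
        .rolling(window=filter_length, center=True, min_periods=1)
        .median()
    )
    return out


# A single spike (100.0) in an otherwise flat signal.
df = pd.DataFrame({"value": [1.0, 1.0, 100.0, 1.0, 1.0]})
smoothed = median_filter(df, "value", filter_length=3)
print(smoothed["value"].tolist())
```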
- moving_average(columnName, filter_length)
Data smoothing using moving average function
- Parameters
df (pandas.core.frame.DataFrame) – input dataframe
columnName (str) – a column of the input dataframe
filter_length (int) – filter length parameter
- Returns
smoothed dataframe (moving average)
- Return type
pandas.core.frame.DataFrame
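A moving average can be sketched with a pandas rolling mean. The trailing window used here is an assumption (the library may centre the window or treat the edges differently), and the function is a stand-in for illustration.

```python
import pandas as pd


def moving_average(df: pd.DataFrame, column_name: str, filter_length: int) -> pd.DataFrame:
    """Replace each value with the mean of the last filter_length samples."""
    out = df.copy()
    out[column_name] = out[column_name].rolling(window=filter_length).mean()
    return out


df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 5.0]})
smoothed = moving_average(df, "value", filter_length=3)
print(smoothed)
```

The first filter_length - 1 rows are NaN in this variant because the window is not yet full.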
- savitzky_golay(columnName, filter_length, order)
Data smoothing using the Savitzky-Golay filter
- Parameters
df (pandas.core.frame.DataFrame) – input dataframe
columnName (str) – a column of the input dataframe
filter_length (int) – filter length parameter
order (int) – order of the polynomial to fit the function
- Returns
smoothed dataframe (Savitzky-Golay)
- Return type
pandas.core.frame.DataFrame
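A Savitzky-Golay filter fits a polynomial of the given order to each window of filter_length samples and evaluates it at the window centre. The sketch below assumes `scipy.signal.savgol_filter` as the backend, which the source does not confirm; the function is a stand-in for illustration.

```python
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter


def savitzky_golay(
    df: pd.DataFrame, column_name: str, filter_length: int, order: int
) -> pd.DataFrame:
    """Smooth one column with a Savitzky-Golay filter (order must be below filter_length)."""
    out = df.copy()
    out[column_name] = savgol_filter(
        out[column_name].to_numpy(), window_length=filter_length, polyorder=order
    )
    return out


# A pure quadratic is reproduced by an order-2 fit, which makes the behaviour easy to check.
t = np.arange(11, dtype=float)
df = pd.DataFrame({"value": t**2})
smoothed = savitzky_golay(df, "value", filter_length=5, order=2)
print(np.allclose(smoothed["value"], df["value"]))
```

An odd filter_length is the safe choice, since older SciPy versions reject even window lengths.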
- class smoothing_and_filtering_functions.TimeSeriesOOP(current_df, column_of_interest, time_column, col_group)
This class provides methods for interpolating a time-series dataset. Interpolation is a method of finding new data points within the range of a discrete set of known data points.
- int_df_bfill(column_of_interest, col_group)
Backward fill: fills missing values with the next data point. Backward fill propagates the next observed non-null value backward until another non-null value is met.
- Parameters
column_of_interest (str) – column selected for processing
col_group (list) – list of columns for grouping
- Returns
processed dataframe
- Return type
pandas.core.frame.DataFrame
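Backward fill within groups can be sketched with pandas `groupby(...).bfill()`. The function name and the example data are hypothetical; the sketch only illustrates the fill behaviour and is not the class's implementation.

```python
import numpy as np
import pandas as pd


def backward_fill(df: pd.DataFrame, column_of_interest: str, col_group: list) -> pd.DataFrame:
    """Fill each missing value with the next non-null value of the same group."""
    out = df.copy()
    out[column_of_interest] = out.groupby(col_group)[column_of_interest].bfill()
    return out


df = pd.DataFrame(
    {
        "sensor": ["a", "a", "a", "b", "b", "b"],
        "value": [np.nan, 1.0, np.nan, np.nan, np.nan, 5.0],
    }
)
filled = backward_fill(df, "value", ["sensor"])
print(filled)
```

The trailing gap of sensor "a" stays empty: there is no later value in that group to propagate, and values never cross group boundaries.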
- int_df_ffill(column_of_interest, col_group)
Forward fill: fills missing values with the previous data point. Forward fill propagates the last observed non-null value forward until another non-null value is encountered.
- Parameters
column_of_interest (str) – column selected for processing
col_group (list) – list of columns for grouping
- Returns
processed dataframe
- Return type
pandas.core.frame.DataFrame
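Forward fill within groups can be sketched in the same spirit with pandas `groupby(...).ffill()`. The function name and data are hypothetical and only illustrate the fill behaviour.

```python
import numpy as np
import pandas as pd


def forward_fill(df: pd.DataFrame, column_of_interest: str, col_group: list) -> pd.DataFrame:
    """Fill each missing value with the last non-null value of the same group."""
    out = df.copy()
    out[column_of_interest] = out.groupby(col_group)[column_of_interest].ffill()
    return out


df = pd.DataFrame(
    {
        "sensor": ["a", "a", "a", "b", "b", "b"],
        "value": [2.0, np.nan, np.nan, np.nan, 7.0, np.nan],
    }
)
filled = forward_fill(df, "value", ["sensor"])
print(filled)
```

The leading gap of sensor "b" stays empty because no earlier value exists in that group.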
- make_interpolation_cubic(column_of_interest, col_group)
Cubic interpolation offers true continuity between the segments. To achieve this, it requires not only the two endpoints of each segment but also the two points on either side of them.
- Parameters
column_of_interest (str) – column selected for processing
col_group (list) – list of columns for grouping
- Returns
processed dataframe
- Return type
pandas.core.frame.DataFrame
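Cubic interpolation per group can be sketched with pandas `Series.interpolate(method="cubic")`, which relies on SciPy. Whether the class uses this call is an assumption, and the function name is hypothetical. The example removes two points from a cubic curve, which a cubic interpolant recovers.

```python
import numpy as np
import pandas as pd


def interpolate_cubic(df: pd.DataFrame, column_of_interest: str, col_group: list) -> pd.DataFrame:
    """Fill gaps inside each group with cubic interpolation over the row index."""
    out = df.copy()
    out[column_of_interest] = out.groupby(col_group)[column_of_interest].transform(
        lambda s: s.interpolate(method="cubic")  # requires SciPy
    )
    return out


x = np.arange(8, dtype=float)
df = pd.DataFrame({"sensor": ["a"] * 8, "value": x**3})
df.loc[[3, 4], "value"] = np.nan  # knock out two interior points

filled = interpolate_cubic(df, "value", ["sensor"])
print(filled["value"].round(3).tolist())
```

The neighbouring points on both sides of the gap shape the curve, which is why cubic interpolation needs more known points than the linear variant.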
- make_interpolation_liner(column_of_interest, col_group)
Linear interpolation: the points are simply joined by straight line segments. Each segment (bounded by two data points) can be interpolated independently.
- Parameters
column_of_interest (str) – column selected for processing
col_group (list) – list of columns for grouping
- Returns
processed dataframe
- Return type
pandas.core.frame.DataFrame
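Linear interpolation per group can be sketched with pandas `Series.interpolate(method="linear")`. The function name is hypothetical, and the sketch assumes equally spaced rows, which is how pandas treats the "linear" method.

```python
import numpy as np
import pandas as pd


def interpolate_linear(df: pd.DataFrame, column_of_interest: str, col_group: list) -> pd.DataFrame:
    """Fill gaps inside each group by joining the known points with straight lines."""
    out = df.copy()
    out[column_of_interest] = out.groupby(col_group)[column_of_interest].transform(
        lambda s: s.interpolate(method="linear")
    )
    return out


df = pd.DataFrame(
    {
        "sensor": ["a", "a", "a", "b", "b", "b", "b"],
        "value": [0.0, np.nan, 2.0, 10.0, np.nan, np.nan, 40.0],
    }
)
filled = interpolate_linear(df, "value", ["sensor"])
print(filled["value"].tolist())
```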
- process_dataframe(col_group, column_of_interest)
Resampling the dataframe
- Parameters
col_group (list) – list of columns for grouping
column_of_interest (str) – column selected for processing
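The source does not say how the resampling is done. One plausible reading is that each group is placed on a regular time grid, so that missing timestamps appear as empty rows for a later fill or interpolation step. The sketch below works under that assumption; the function name, the `rule` parameter and the 60-minute grid are all hypothetical.

```python
import pandas as pd


def resample_groups(
    df: pd.DataFrame,
    time_column: str,
    col_group: list,
    column_of_interest: str,
    rule: str = "60min",
) -> pd.DataFrame:
    """Put each group on a regular time grid; empty bins become NaN."""
    return (
        df.set_index(time_column)
        .groupby(col_group)[column_of_interest]
        .resample(rule)
        .mean()
        .reset_index()
    )


df = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2021-01-01 00:00", "2021-01-01 02:00", "2021-01-01 00:00", "2021-01-01 01:00"]
        ),
        "sensor": ["a", "a", "b", "b"],
        "value": [1.0, 3.0, 10.0, 20.0],
    }
)
resampled = resample_groups(df, "time", ["sensor"], "value")
print(resampled)
```

Sensor "a" has no reading at 01:00, so the resampled frame gains one row with a missing value at that timestamp.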