Smoothing and Filtering of Data

class smoothing_and_filtering_functions.Remove_Outliers

This class is used to provide methods for removing outliers from a given dataset.

removeOutlier(columnName, n)

Method based on the mean and standard deviation. It computes the mean and standard deviation of the values in the given column, derives the lower and upper boundary values as mean ± n × standard deviation, and removes the rows that fall outside them. The standard deviation coefficient n is expected to lie in the range 1.0 to 3.0.

Parameters

df (pandas.core.frame.DataFrame) – input dataframe
columnName (str) – a column of the input dataframe
n (float) – standard deviation coefficient

Returns

filtered dataframe

Return type

pandas.core.frame.DataFrame
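The boundary rule described for removeOutlier can be sketched in plain pandas as follows. This is a minimal illustration, not the library's implementation: the function name remove_outliers_std and the sample data are hypothetical, and the use of the sample standard deviation (the pandas default) is an assumption.

```python
import pandas as pd


def remove_outliers_std(df: pd.DataFrame, column_name: str, n: float) -> pd.DataFrame:
    """Keep rows whose value lies within mean +/- n * std of the column."""
    mean = df[column_name].mean()
    std = df[column_name].std()  # sample standard deviation (ddof=1), an assumption
    lower, upper = mean - n * std, mean + n * std
    return df[df[column_name].between(lower, upper)]


# Hypothetical data: nine readings near 10 and one obvious outlier.
df = pd.DataFrame(
    {"value": [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 11.0, 9.0, 10.0, 100.0]}
)
filtered = remove_outliers_std(df, "value", n=2.0)
```

A smaller n tightens the boundaries and removes more rows; a larger n keeps more of the data.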

removeOutlier_q(columnName, n1, n2)

Method that removes data lying below the lower quantile or above the upper quantile. Both quantiles are given as fractions in the range 0.0 to 1.0.

Parameters

df (pandas.core.frame.DataFrame) – input dataframe
columnName (str) – a column of the input dataframe
n1 (float) – lower quantile
n2 (float) – upper quantile

Returns

filtered dataframe

Return type

pandas.core.frame.DataFrame
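A quantile-based filter of this kind might look like the sketch below. The function name remove_outliers_quantile, its argument names, and the sample data are hypothetical; only standard pandas calls are used.

```python
import pandas as pd


def remove_outliers_quantile(
    df: pd.DataFrame, column_name: str, lower_q: float, upper_q: float
) -> pd.DataFrame:
    """Keep rows whose value lies between the two quantiles of the column."""
    # Series.quantile with a list returns one boundary value per requested quantile.
    lower, upper = df[column_name].quantile([lower_q, upper_q])
    return df[df[column_name].between(lower, upper)]


# Hypothetical data: the integers 1..100.
df = pd.DataFrame({"value": list(range(1, 101))})
filtered = remove_outliers_quantile(df, "value", 0.05, 0.95)
```

Unlike the standard deviation rule, this approach always trims a fixed share of the data from each tail, whether or not those values are extreme.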

removeOutlier_z(columnName, n)

Method based on the z-score. The z-score of a value is the number of standard deviations by which it lies above or below the mean.

Parameters

df (pandas.core.frame.DataFrame) – input dataframe
columnName (str) – a column of the input dataframe
n (float) – standard deviation coefficient

Returns

filtered dataframe

Return type

pandas.core.frame.DataFrame
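The z-score criterion can be illustrated with a short pandas sketch. The function name remove_outliers_zscore and the data are hypothetical, and two details are assumptions rather than documented behaviour: the population standard deviation (ddof=0) is used, and rows are dropped when the absolute z-score exceeds the threshold.

```python
import pandas as pd


def remove_outliers_zscore(
    df: pd.DataFrame, column_name: str, threshold: float
) -> pd.DataFrame:
    """Keep rows whose absolute z-score does not exceed the threshold."""
    col = df[column_name]
    # z-score: distance from the mean, measured in standard deviations.
    z = (col - col.mean()) / col.std(ddof=0)
    return df[z.abs() <= threshold]


# Hypothetical data: nine readings near 10 and one obvious outlier.
df = pd.DataFrame(
    {"value": [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 11.0, 9.0, 10.0, 100.0]}
)
filtered = remove_outliers_zscore(df, "value", threshold=2.5)
```

Note that in a very small sample a single extreme value inflates the standard deviation, so its own z-score stays bounded and a high threshold may never remove it.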

class smoothing_and_filtering_functions.Converter

This class checks the columns of a dataset for datetime data and converts them to the datetime datatype.

dateTime_converter()

Time-series converter: converts columns holding datetime data to the datetime datatype.

Parameters: df (pandas.core.frame.DataFrame) – input dataframe
Returns: converted dataframe
Return type: pandas.core.frame.DataFrame
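One way such a conversion could work is to try parsing every text column and keep the result only when parsing succeeds. The sketch below follows that idea; the function name convert_datetime_columns, the detection strategy, and the sample data are all assumptions, not the behaviour of dateTime_converter itself.

```python
import pandas as pd


def convert_datetime_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which text columns that parse as dates become datetime."""
    out = df.copy()
    for name in out.columns:
        # Only text columns are candidates; numeric columns are left untouched.
        if pd.api.types.is_string_dtype(out[name]):
            try:
                out[name] = pd.to_datetime(out[name])
            except (ValueError, TypeError):
                pass  # not datetime data, keep the column as it is
    return out


# Hypothetical data: one timestamp column, one label column, one numeric column.
df = pd.DataFrame(
    {
        "timestamp": ["2021-01-01 00:00", "2021-01-01 01:00"],
        "label": ["north", "south"],
        "value": [1.0, 2.0],
    }
)
converted = convert_datetime_columns(df)
```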

class smoothing_and_filtering_functions.Smoothing

This class is used to provide methods for smoothing a given dataset.

median_filter(columnName, filter_length)

Data smoothing using a median filter.

Parameters

df (pandas.core.frame.DataFrame) – input dataframe
columnName (str) – a column of the input dataframe
filter_length (int) – length of the filter window

Returns

smoothed dataframe (median filter)

Return type

pandas.core.frame.DataFrame
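A median filter replaces each value with the median of its neighbourhood, which suppresses isolated spikes. The sketch below uses a centred pandas rolling window; the function name median_smooth, the choice of rolling (rather than another filter routine), and the edge handling via min_periods=1 are assumptions.

```python
import pandas as pd


def median_smooth(df: pd.DataFrame, column_name: str, filter_length: int) -> pd.DataFrame:
    """Return a copy with the column replaced by its centred rolling median."""
    out = df.copy()
    out[column_name] = (
        out[column_name]
        .rolling(window=filter_length, center=True, min_periods=1)
        .median()
    )
    return out


# Hypothetical data with a single spike at position 2.
df = pd.DataFrame({"value": [1.0, 2.0, 50.0, 3.0, 4.0]})
smoothed = median_smooth(df, "value", filter_length=3)
```

With a window of three samples the spike of 50 disappears entirely, because it is never the middle value of any window.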

moving_average(columnName, filter_length)

Data smoothing using a moving average.

Parameters

df (pandas.core.frame.DataFrame) – input dataframe
columnName (str) – a column of the input dataframe
filter_length (int) – length of the averaging window

Returns

smoothed dataframe (moving average)

Return type

pandas.core.frame.DataFrame
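A moving average replaces each value with the mean of the samples in a sliding window. The following sketch is illustrative only: the function name moving_average_smooth is hypothetical, and the centred window with min_periods=1 at the edges is an assumption.

```python
import pandas as pd


def moving_average_smooth(
    df: pd.DataFrame, column_name: str, filter_length: int
) -> pd.DataFrame:
    """Return a copy with the column replaced by its centred rolling mean."""
    out = df.copy()
    out[column_name] = (
        out[column_name]
        .rolling(window=filter_length, center=True, min_periods=1)
        .mean()
    )
    return out


# Hypothetical data: a straight ramp, which a symmetric average leaves unchanged
# in the interior and flattens slightly at the two ends.
df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 5.0]})
smoothed = moving_average_smooth(df, "value", filter_length=3)
```

Compared with the median filter, the moving average spreads a spike over its neighbours instead of removing it.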

savitzky_golay(columnName, filter_length, order)

Data smoothing using a Savitzky-Golay filter.

Parameters

df (pandas.core.frame.DataFrame) – input dataframe
columnName (str) – a column of the input dataframe
filter_length (int) – length of the filter window
order (int) – order of the polynomial fitted within each window

Returns

smoothed dataframe (Savitzky-Golay)

Return type

pandas.core.frame.DataFrame
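A Savitzky-Golay filter fits a low-order polynomial to each window of samples and takes the fitted value at the window centre. The sketch below relies on scipy.signal.savgol_filter; whether the library uses this routine is an assumption, and the function name savgol_smooth is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter


def savgol_smooth(
    df: pd.DataFrame, column_name: str, filter_length: int, order: int
) -> pd.DataFrame:
    """Return a copy with the column smoothed by a Savitzky-Golay filter."""
    out = df.copy()
    # filter_length is the window length and must be larger than the polynomial order.
    out[column_name] = savgol_filter(out[column_name].to_numpy(), filter_length, order)
    return out


# Hypothetical data: an exact quadratic. A filter of order 2 reproduces it,
# which makes the polynomial-fitting nature of the method visible.
t = np.arange(11, dtype=float)
df = pd.DataFrame({"value": t ** 2})
smoothed = savgol_smooth(df, "value", filter_length=5, order=2)
```

Because the filter preserves polynomial trends up to the chosen order, it tends to keep peak shapes better than a moving average of the same window length.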

class smoothing_and_filtering_functions.TimeSeriesOOP(current_df, column_of_interest, time_column, col_group)

This class provides methods for interpolating a time-series dataset. Interpolation estimates new data points within the range of a discrete set of known data points.

int_df_bfill(column_of_interest, col_group)

Backward fill: fills each missing value with the next observed data point. The next non-null value is propagated backward across the gap until the preceding non-null value is reached.

Parameters

column_of_interest (str) – column selected for processing
col_group (list) – list of columns for grouping

Returns

processed dataframe

Return type

pandas.core.frame.DataFrame
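Grouped backward filling can be sketched with pandas groupby. The function name backward_fill and the sample columns are hypothetical; the sketch only shows the fill rule, not how int_df_bfill is implemented.

```python
import numpy as np
import pandas as pd


def backward_fill(df: pd.DataFrame, column_of_interest: str, col_group: list) -> pd.DataFrame:
    """Return a copy in which gaps are filled with the next value of the same group."""
    out = df.copy()
    out[column_of_interest] = out.groupby(col_group)[column_of_interest].bfill()
    return out


# Hypothetical data: two sensors, each with missing readings.
df = pd.DataFrame(
    {
        "sensor": ["a", "a", "a", "b", "b"],
        "value": [np.nan, 2.0, np.nan, np.nan, 5.0],
    }
)
filled = backward_fill(df, "value", ["sensor"])
```

The trailing gap of sensor "a" stays empty: there is no later value in that group to copy, and values from sensor "b" are not used.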

int_df_ffill(column_of_interest, col_group)

Forward fill: fills each missing value with the previous observed data point. The last observed non-null value is propagated forward until another non-null value is encountered.

Parameters

column_of_interest (str) – column selected for processing
col_group (list) – list of columns for grouping

Returns

processed dataframe

Return type

pandas.core.frame.DataFrame
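The forward-fill rule can be illustrated in the same way. The function name forward_fill and the sample data are hypothetical, and the sketch is not taken from the implementation of int_df_ffill.

```python
import numpy as np
import pandas as pd


def forward_fill(df: pd.DataFrame, column_of_interest: str, col_group: list) -> pd.DataFrame:
    """Return a copy in which gaps are filled with the last value of the same group."""
    out = df.copy()
    out[column_of_interest] = out.groupby(col_group)[column_of_interest].ffill()
    return out


# Hypothetical data: sensor "a" has a trailing gap, sensor "b" a leading gap.
df = pd.DataFrame(
    {
        "sensor": ["a", "a", "a", "b", "b"],
        "value": [1.0, np.nan, np.nan, np.nan, 5.0],
    }
)
filled = forward_fill(df, "value", ["sensor"])
```

Here the leading gap of sensor "b" remains empty, since no earlier value exists within that group.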

make_interpolation_cubic(column_of_interest, col_group)

Cubic interpolation: fits a smooth curve through the data so that adjacent segments join without kinks. Interpolating a segment therefore requires not only its two endpoints but also the points on either side of them.

Parameters

column_of_interest (str) – column selected for processing
col_group (list) – list of columns for grouping

Returns

processed dataframe

Return type

pandas.core.frame.DataFrame
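The effect of cubic interpolation can be seen on a small series. The sketch below uses Series.interpolate(method="cubic"), which depends on SciPy being installed; that the library uses this call is an assumption, and grouping is left out to keep the example short.

```python
import numpy as np
import pandas as pd

# Hypothetical data: samples of y = x**3 at x = 0..7 with two readings removed.
x = np.arange(8, dtype=float)
series = pd.Series(x ** 3)
series.iloc[[3, 4]] = np.nan

# Cubic interpolation uses the neighbouring known points on both sides of the gap,
# so it can follow the curvature of the data.
filled_cubic = series.interpolate(method="cubic")

# Linear interpolation of the same gap, for comparison, cuts straight across it.
filled_linear = series.interpolate(method="linear")
```

On this curved data the cubic result recovers the missing values 27 and 64 closely, while the straight-line estimate overshoots them.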

make_interpolation_liner(column_of_interest, col_group)

Linear interpolation: the data points are joined by straight line segments. Each segment, bounded by two data points, can be interpolated independently.

Parameters

column_of_interest (str) – column selected for processing
col_group (list) – list of columns for grouping

Returns

processed dataframe

Return type

pandas.core.frame.DataFrame
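Grouped linear interpolation might be sketched as below. The function name interpolate_linear and the sample data are hypothetical, and the combination of groupby with transform is one possible approach, not necessarily the one used by make_interpolation_liner.

```python
import numpy as np
import pandas as pd


def interpolate_linear(df: pd.DataFrame, column_of_interest: str, col_group: list) -> pd.DataFrame:
    """Return a copy in which gaps are filled on a straight line within each group."""
    out = df.copy()
    out[column_of_interest] = out.groupby(col_group)[column_of_interest].transform(
        lambda s: s.interpolate(method="linear")
    )
    return out


# Hypothetical data: two sensors with gaps between known readings.
df = pd.DataFrame(
    {
        "sensor": ["a", "a", "a", "a", "b", "b", "b"],
        "value": [0.0, np.nan, np.nan, 3.0, 10.0, np.nan, 30.0],
    }
)
filled = interpolate_linear(df, "value", ["sensor"])
```

The default "linear" method treats the rows as equally spaced; for unevenly spaced timestamps a time-aware method would be needed.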

process_dataframe(col_group, column_of_interest)

Resamples the dataframe.

Parameters

col_group (list) – list of columns for grouping
column_of_interest (str) – column selected for processing
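Resampling typically places each group's readings on a regular time grid, which exposes the gaps that the interpolation methods then fill. The sketch below is an illustration under stated assumptions: the function name resample_groups, the freq argument, the daily grid, and aggregation by mean are all hypothetical choices, not documented behaviour of process_dataframe.

```python
import pandas as pd


def resample_groups(
    df: pd.DataFrame,
    col_group: list,
    column_of_interest: str,
    time_column: str,
    freq: str = "D",
) -> pd.DataFrame:
    """Average the column per group on a regular time grid (daily by default)."""
    return (
        df.set_index(time_column)
        .groupby(col_group)[column_of_interest]
        .resample(freq)
        .mean()
        .reset_index()
    )


# Hypothetical data: sensor "a" has no reading on 2021-01-02.
df = pd.DataFrame(
    {
        "sensor": ["a", "a", "a", "b"],
        "time": pd.to_datetime(
            ["2021-01-01 00:00", "2021-01-01 12:00", "2021-01-03 00:00", "2021-01-01 06:00"]
        ),
        "value": [1.0, 3.0, 5.0, 10.0],
    }
)
resampled = resample_groups(df, ["sensor"], "value", "time")
```

The result contains one row per group and day; the day without readings appears as a missing value that a fill or interpolation step can handle.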