Regression Class

class Regression_Group4.Regression(dataframe)

This class reads a pandas DataFrame and trains different regression models on the given data. The dataframe is passed in when an instance of the class is created. Afterwards a target column is chosen and the data is split into training and test sets. One of the provided regression types can then be selected and the model trained. The accuracy of the model is reported through several plots and key figures. A dedicated method accepts new input values and returns the model's prediction.

This class was built by Team 4.

MainEffectsPlot()

Plots how the target value changes with respect to each input value independently (for each line graph, all other input values are fixed at their mean).

Returns: Figure
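
The curves behind such a plot can be computed as in the following minimal sketch. It is not the class's own implementation: it assumes a fitted scikit-learn estimator, and the helper name `main_effect_curves` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def main_effect_curves(model, X, n_points=20):
    """Return {column: (grid, predictions)} with all other columns held at their mean."""
    means = X.mean()
    curves = {}
    for column in X.columns:
        grid = np.linspace(X[column].min(), X[column].max(), n_points)
        # Start from a frame in which every row is the column-wise mean ...
        frame = pd.DataFrame(
            np.tile(means.to_numpy(), (n_points, 1)), columns=X.columns
        )
        # ... and vary only the column under inspection.
        frame[column] = grid
        curves[column] = (grid, model.predict(frame))
    return curves


# Hypothetical sample data: y = 2*a - 3*b exactly.
X = pd.DataFrame({"a": [0.0, 1.0, 2.0, 3.0], "b": [1.0, 0.0, 2.0, 1.0]})
y = 2 * X["a"] - 3 * X["b"]
model = LinearRegression().fit(X, y)
curves = main_effect_curves(model, X, n_points=5)
```

Each `(grid, predictions)` pair corresponds to one line graph; drawing them is left to the plotting library of choice.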

_build_elastic_net_regression()

Elastic Net regression (linear regression with combined L1 and L2 regularization)

Returns: Fitted estimator
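
Going by the method name, the fitted estimator is presumably an elastic net model. A minimal sketch with scikit-learn's `ElasticNet`, assuming that library is used underneath; the training data is made up and the hyperparameters are scikit-learn's defaults, not values confirmed for this class:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Made-up training data for illustration only.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 3))
y_train = X_train @ np.array([1.5, 0.0, -2.0]) + 0.5

# alpha and l1_ratio are scikit-learn's defaults, written out explicitly.
elastic_net = ElasticNet(alpha=1.0, l1_ratio=0.5)
elastic_net.fit(X_train, y_train)
predictions = elastic_net.predict(X_train[:5])
```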

_build_linear_regression()

Ordinary least squares Linear Regression

Returns: Fitted estimator

_build_ridge_regression(max_iter=15000, solver='auto')

Ridge Regression or Tikhonov regularization

Parameters

max_iter (int, optional) – Maximum number of iterations for the conjugate gradient solver, defaults to 15000
solver (str = 'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga', 'lbfgs', optional) – Solver to use in the computational routines, defaults to ‘auto’

Returns

Fitted estimator
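
The two parameters share their names with arguments of scikit-learn's `Ridge`. A minimal sketch, assuming the method forwards them unchanged (the training data is made up):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Made-up training data for illustration only.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 2))
y_train = X_train @ np.array([3.0, -1.0]) + 2.0

# max_iter only affects the iterative solvers; 'auto' lets scikit-learn pick one.
ridge = Ridge(max_iter=15000, solver="auto")
ridge.fit(X_train, y_train)
```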

_build_sgd_regression(max_iter=1000)

Linear model fitted by minimizing a regularized empirical loss with SGD.

Parameters: max_iter (int, optional) – The maximum number of passes over the training data, defaults to 1000
Returns: Fitted estimator
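
A minimal sketch with scikit-learn's `SGDRegressor`, assuming `max_iter` is passed through as is. The data is made up, and standardizing the features first is an assumption of this sketch, added because SGD is sensitive to feature scale:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Made-up training data for illustration only.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
y_train = X_train @ np.array([1.0, 2.0])

# Standardize the features before fitting (assumption of this sketch).
X_scaled = StandardScaler().fit_transform(X_train)

sgd = SGDRegressor(max_iter=1000, random_state=0)
sgd.fit(X_scaled, y_train)
```

After fitting, `sgd.n_iter_` reports how many passes were actually needed before the stopping criterion was met.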

_build_svm_regression(kernel='poly', degree=3, svmNumber=0.5, maxIterations=-1)

Nu Support Vector Regression

Parameters

kernel (str = 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed', optional) – Specifies the kernel type to be used in the algorithm, defaults to ‘poly’
degree (int, optional) – Degree of the polynomial kernel function, defaults to 3
svmNumber (float, optional) – An upper bound on the fraction of training errors and a lower bound on the fraction of support vectors; should be in the interval (0, 1), defaults to 0.5
maxIterations (int, optional) – Hard limit on iterations within solver, or -1 for no limit, defaults to -1

Returns

Fitted estimator
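
The parameter descriptions match those of scikit-learn's `NuSVR`. A minimal sketch, assuming `svmNumber` maps to `nu` and `maxIterations` maps to `max_iter` (this mapping is inferred from the descriptions, not confirmed; the data is made up):

```python
import numpy as np
from sklearn.svm import NuSVR

# Made-up training data for illustration only.
rng = np.random.default_rng(0)
X_train = rng.uniform(-1, 1, size=(60, 2))
y_train = X_train[:, 0] ** 3 + X_train[:, 1]

# Assumed mapping: svmNumber -> nu, maxIterations -> max_iter (-1 means no limit).
svr = NuSVR(kernel="poly", degree=3, nu=0.5, max_iter=-1)
svr.fit(X_train, y_train)
predictions = svr.predict(X_train[:3])
```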

build_regression(regression_name, **args)

Builds a specified Regression Model with the given Training Data.

Parameters

regression_name (str = 'Support Vector Machine Regression ','Elastic Net Regression ','Ridge Regression ','Linear Regression ', 'Stochastic Gradient Descent Regression ') – Name of the chosen Regression Model
args – Arguments depending on the chosen Regression Model.

Returns

self.regression, params
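
One way such a name-based dispatch could look is sketched here. This is an illustration, not the class's code: `ESTIMATORS` and `build_regression_sketch` are hypothetical names, the extra arguments use scikit-learn's keyword names rather than the class's own, and the name is stripped of surrounding whitespace because the documented names carry a trailing space.

```python
from sklearn.linear_model import ElasticNet, LinearRegression, Ridge, SGDRegressor
from sklearn.svm import NuSVR

# Hypothetical lookup table from display name to estimator class.
ESTIMATORS = {
    "Support Vector Machine Regression": NuSVR,
    "Elastic Net Regression": ElasticNet,
    "Ridge Regression": Ridge,
    "Linear Regression": LinearRegression,
    "Stochastic Gradient Descent Regression": SGDRegressor,
}


def build_regression_sketch(regression_name, X_train, y_train, **args):
    """Fit the estimator registered under regression_name and return (model, params)."""
    key = regression_name.strip()
    if key not in ESTIMATORS:
        raise ValueError(f"Unknown regression name: {regression_name!r}")
    model = ESTIMATORS[key](**args).fit(X_train, y_train)
    return model, model.get_params()


model, params = build_regression_sketch(
    "Ridge Regression ",
    [[0.0], [1.0], [2.0], [3.0]],
    [0.0, 1.0, 2.0, 3.0],
    max_iter=15000,
)
```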

dropColumns(label_drop)

Deletes the specified column from the dataframe.

Parameters: label_drop (str) – Name of the column to drop
Returns: None
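
In plain pandas, dropping a column by name looks like the following sketch; the sample dataframe is made up, and whether the class calls `DataFrame.drop` internally is an assumption.

```python
import pandas as pd

# Made-up dataframe for illustration only.
df = pd.DataFrame({"price": [10, 20], "color": ["red", "blue"], "id": [1, 2]})

label_drop = "id"
df = df.drop(columns=[label_drop])
```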

get_dataframe_description()

Returns a brief overview with basic information about the dataframe.

Returns: Dataframe Overview

get_dataframe_head()

Returns the head of the dataframe.

Returns: Dataframe Head

get_regression_type()

Returns the Regression Type of the created model.

Returns: Regression Type

plot_correlation(label_target_1, label_target_2)

Plots two user-defined columns against each other in a scatter plot to show their correlation.

Parameters

label_target_1 (str) – First column name to plot
label_target_2 (str) – Second column name to plot

Returns

None

plot_heatmap_correlation(figsize=(5, 4))

Plots a correlation matrix / heatmap to show the correlation between the different columns of the dataframe. Serves as decision support for the user.

Parameters: figsize (tuple, optional) – Size of the figure, defaults to (5,4)
Returns: None
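
The matrix behind such a heatmap can be computed with pandas alone, as in this minimal sketch on made-up data. Restricting the computation to numeric columns is an assumption of the sketch.

```python
import pandas as pd

# Made-up dataframe for illustration only.
df = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 4.0],
        "y": [2.0, 4.0, 6.0, 8.0],
        "z": [4.0, 3.0, 2.0, 1.0],
        "label": ["a", "b", "c", "d"],
    }
)

# Pairwise Pearson correlation of the numeric columns only.
corr = df.select_dtypes(include="number").corr()
```

The resulting square matrix is what a heatmap function such as `seaborn.heatmap` would typically render; which plotting library the class uses is not stated here.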

plot_regression_1()

Plots the deviation between the prediction and the actual test data of the target label.

Returns: Figure

plot_regression_2(label_second)

Plots the deviation between the prediction and the actual test data for a chosen column.

Parameters: label_second (str) – Column (dimension) to plot
Returns: None

regression_function(user_input)

Feeds user inputs into the Regression Model and outputs the Prediction.

Parameters: user_input (pd.DataFrame) – Input values as a pandas DataFrame
Returns: model prediction
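
A minimal sketch of predicting from a one-row dataframe, using a plain scikit-learn model in place of the class. The column names and data are made up, and reordering the input columns to match the training columns is a precaution of this sketch, not documented behavior.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up training data: price = 1000 * size + 5000 * rooms exactly.
X_train = pd.DataFrame(
    {"size": [50.0, 80.0, 100.0, 120.0], "rooms": [2.0, 3.0, 3.0, 5.0]}
)
y_train = 1000.0 * X_train["size"] + 5000.0 * X_train["rooms"]
model = LinearRegression().fit(X_train, y_train)

# New input as a dataframe with the same feature columns as the training data.
user_input = pd.DataFrame({"rooms": [3.0], "size": [90.0]})
prediction = model.predict(user_input[X_train.columns])
```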

split_columns(label_target, new_label_1, new_label_2, seperator=' ')

Splits a specified column into two new columns.

Parameters

label_target (str) – Target column to split
new_label_1 (str) – Name of the first new column
new_label_2 (str) – Name of the second new column
seperator (str, optional) – Separator string at which the target column is split, defaults to ' '

Returns

None
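
In plain pandas, a two-way split of a string column can be written as in this sketch. The data and column names are made up; limiting the split to the first occurrence of the separator and keeping the original column are assumptions of the sketch.

```python
import pandas as pd

# Made-up dataframe for illustration only.
df = pd.DataFrame({"car": ["audi a4", "bmw x5", "vw golf"]})

# n=1 splits at the first separator only, so exactly two parts result.
df[["brand", "model"]] = df["car"].str.split(" ", n=1, expand=True)
```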

split_train_test(label_target, testsize=0.3, random_state=1, deleting_na=False, scaling=False, deleting_duplicates=False)

Preprocesses the data. Sets a target column and splits the given dataframe into training and test data. Optionally removes missing values, scales the dataset (centering and scaling to unit variance), and deletes duplicates.

Parameters

label_target (str) – Sets the target column for the regression
testsize (float, optional) – Represents the proportion of the dataset to include in the test split, defaults to 0.3
random_state (int, optional) – Controls the shuffling applied to the data before applying the split, defaults to 1
deleting_na (bool, optional) – Remove missing values, defaults to False
scaling (bool, optional) – Standardize features by removing the mean and scaling to unit variance, defaults to False
deleting_duplicates (bool, optional) – Deletes duplicates, defaults to False

Returns

None
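
The preprocessing steps can be sketched with pandas and scikit-learn as follows. This is an illustration with all three options switched on, not the class's code: the data is made up, missing values are assumed to be removed row-wise, and fitting the scaler on the training part only is a choice of this sketch.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up data with one missing value and one duplicated row.
df = pd.DataFrame(
    {
        "x1": [1.0, 2.0, 2.0, 4.0, 5.0, None, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0],
        "x2": [3.0, 1.0, 1.0, 0.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0],
        "y": [5.0, 4.0, 4.0, 6.0, 9.0, 8.0, 13.0, 16.0, 19.0, 19.0, 22.0, 24.0],
    }
)

df = df.dropna()           # corresponds to deleting_na=True (rows assumed)
df = df.drop_duplicates()  # corresponds to deleting_duplicates=True

label_target = "y"
X = df.drop(columns=[label_target])
y = df[label_target]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# Corresponds to scaling=True: fit on the training part, apply to both parts.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training rows only keeps information from the test set out of the model; the class may order these steps differently.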