Regression Class

class Regression_Group4.Regression(dataframe)

This class reads a pandas DataFrame and trains different regression models on the given data. The dataframe is passed in when an instance of the class is created. Afterwards, a target column is chosen and the data is split into training and test sets. One of the provided regression types can then be selected and the model trained. The accuracy of the model is reported through various plots and key figures. A dedicated method accepts new input values and returns the model's prediction.

This class was built by Team 4.

MainEffectsPlot()

Plots the development of the target value with respect to each input value independently (for each line graph, all other input values are fixed at their mean).

Returns

Figure
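The computation behind such a main-effects plot can be sketched as follows. This is a minimal illustration, not the class's actual implementation: it assumes a fitted scikit-learn-style estimator with a predict method, and the helper name main_effects, the grid size and the toy data are hypothetical. Instead of drawing the figure, it returns the curve data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def main_effects(model, X: pd.DataFrame, n_points: int = 20) -> dict:
    """Hypothetical helper: sweep each feature over its observed range while
    every other feature is held at its column mean.
    Returns {feature: (grid, predictions)}."""
    means = X.mean()
    curves = {}
    for col in X.columns:
        grid = np.linspace(X[col].min(), X[col].max(), n_points)
        # Every row starts at the column means ...
        frame = pd.DataFrame([means.to_numpy()] * n_points, columns=X.columns)
        # ... and only the current feature varies.
        frame[col] = grid
        curves[col] = (grid, model.predict(frame))
    return curves


# Made-up toy data with a known linear relationship: y = 2*a - 3*b + 1
rng = np.random.default_rng(0)
X = pd.DataFrame({"a": rng.uniform(0, 10, 50), "b": rng.uniform(0, 5, 50)})
y = 2 * X["a"] - 3 * X["b"] + 1
model = LinearRegression().fit(X, y)
curves = main_effects(model, X)
```

Each (grid, predictions) pair corresponds to one line graph and could be drawn with, for example, matplotlib's plt.plot(grid, predictions).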

_build_elastic_net_regression()

Elastic Net Regression (linear regression with combined L1 and L2 regularization)

Returns

Fitted estimator
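Assuming this method wraps scikit-learn's ElasticNet (the method name suggests so, but the estimator and its hyperparameters are not documented here), a comparable model can be fitted directly. The synthetic data and the alpha and l1_ratio values are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Synthetic target: only the first two features actually matter.
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

# alpha scales the total penalty; l1_ratio mixes the L1 (lasso) and L2 (ridge) terms.
# Both values are illustrative assumptions, not the class's settings.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
# The L1 part pushes the coefficients of the irrelevant features toward zero.
```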

_build_linear_regression()

Ordinary least squares Linear Regression

Returns

Fitted estimator
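A sketch of ordinary least squares, assuming the method wraps scikit-learn's LinearRegression. It fits noise-free synthetic data and checks the result against numpy's least-squares solver applied to the design matrix with an intercept column.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data: y = 4 + 1.5*x1 - 0.5*x2
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]], dtype=float)
y = 4 + 1.5 * X[:, 0] - 0.5 * X[:, 1]

# Minimizes the sum of squared residuals.
ols = LinearRegression().fit(X, y)

# Same solution from a direct least-squares solve on [1, X].
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
```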

_build_ridge_regression(max_iter=15000, solver='auto')

Ridge Regression or Tikhonov regularization

Parameters
  • max_iter (int, optional) – Maximum number of iterations for conjugate gradient solver, defaults to 15000

  • solver (str = 'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga', 'lbfgs', optional) – Solver to use in the computational routines, defaults to ‘auto’

Returns

Fitted estimator
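Assuming the method forwards its arguments to scikit-learn's Ridge, the documented defaults correspond to the call below; alpha is an assumption left at scikit-learn's default of 1.0. In scikit-learn, max_iter only affects the iterative solvers (such as 'sparse_cg', 'lsqr', 'sag', 'saga' and 'lbfgs'), so with solver='auto' on dense data it may have no effect.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=60)

# Documented defaults; alpha=1.0 is assumed (scikit-learn's default).
ridge = Ridge(alpha=1.0, max_iter=15000, solver="auto").fit(X, y)
ols = LinearRegression().fit(X, y)

# The L2 penalty shrinks the coefficient vector relative to plain OLS.
ridge_norm = np.linalg.norm(ridge.coef_)
ols_norm = np.linalg.norm(ols.coef_)
```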

_build_sgd_regression(max_iter=1000)

Linear model fitted by minimizing a regularized empirical loss with SGD.

Parameters

max_iter (int, optional) – The maximum number of passes over the training data, defaults to 1000

Returns

Fitted estimator
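A minimal sketch assuming the method wraps scikit-learn's SGDRegressor. Because stochastic gradient descent is sensitive to feature scales, the example standardizes the features in a pipeline first; whether the class does this itself is not documented (see the scaling option of split_train_test).

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.uniform(0, 100, size=(200, 2))
y = 0.3 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=1.0, size=200)

# Standardize, then run SGD for at most 1000 passes over the training data.
sgd = make_pipeline(StandardScaler(), SGDRegressor(max_iter=1000, random_state=0))
sgd.fit(X, y)
r2 = sgd.score(X, y)  # coefficient of determination R^2 on the training data
```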

_build_svm_regression(kernel='poly', degree=3, svmNumber=0.5, maxIterations=-1)

Nu Support Vector Regression

Parameters
  • kernel (str = 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed', optional) – Specifies the kernel type to be used in the algorithm, defaults to ‘poly’

  • degree (int, optional) – Degree of the polynomial kernel function, defaults to 3

  • svmNumber (float, optional) – An upper bound on the fraction of training errors and a lower bound on the fraction of support vectors, should be in the interval (0, 1), defaults to 0.5

  • maxIterations (int, optional) – Hard limit on iterations within solver, or -1 for no limit, defaults to -1

Returns

Fitted estimator
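The sketch below assumes the method wraps scikit-learn's NuSVR and maps svmNumber to nu and maxIterations to max_iter; that mapping is inferred from the parameter descriptions, not confirmed by this page. sv_fraction gives a quick way to inspect the support-vector fraction that svmNumber bounds from below.

```python
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(3)
X = rng.uniform(-2, 2, size=(80, 1))
y = 0.5 * X[:, 0] ** 3 + rng.normal(scale=0.1, size=80)

# Assumed mapping of the documented parameters onto NuSVR:
# svmNumber -> nu, maxIterations -> max_iter (-1 means no iteration limit).
svr = NuSVR(kernel="poly", degree=3, nu=0.5, max_iter=-1).fit(X, y)

predictions = svr.predict(X)
# Fraction of training samples that ended up as support vectors.
sv_fraction = len(svr.support_) / len(X)
```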

build_regression(regression_name, **args)

Builds a specified Regression Model with the given Training Data.

Parameters
  • regression_name (str = 'Support Vector Machine Regression ','Elastic Net Regression ','Ridge Regression ','Linear Regression ', 'Stochastic Gradient Descent Regression ') – Name of the chosen Regression Model

  • args – Arguments depending on the chosen Regression Model.

Returns

self.regression, params
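One way such a name-based dispatch could work is sketched below. The registry, the helper functions and the returned params dictionary are hypothetical, and the estimator behind each name is assumed from the builder methods above. Note that the documented names end in a trailing space; the sketch copies them verbatim, so a name without that space is rejected.

```python
from typing import Any, Callable, Dict, Tuple

import numpy as np
from sklearn.linear_model import ElasticNet, LinearRegression, Ridge, SGDRegressor
from sklearn.svm import NuSVR


def _svm(kernel="poly", degree=3, svmNumber=0.5, maxIterations=-1):
    # Assumed parameter mapping onto NuSVR.
    return NuSVR(kernel=kernel, degree=degree, nu=svmNumber, max_iter=maxIterations)


def _ridge(max_iter=15000, solver="auto"):
    return Ridge(max_iter=max_iter, solver=solver)


def _sgd(max_iter=1000):
    return SGDRegressor(max_iter=max_iter)


# Hypothetical registry; keys copied verbatim from the documented names.
BUILDERS: Dict[str, Callable[..., Any]] = {
    "Support Vector Machine Regression ": _svm,
    "Elastic Net Regression ": ElasticNet,
    "Ridge Regression ": _ridge,
    "Linear Regression ": LinearRegression,
    "Stochastic Gradient Descent Regression ": _sgd,
}


def build_regression(regression_name: str, X, y, **args) -> Tuple[Any, Dict[str, Any]]:
    """Hypothetical dispatcher: look up the builder, pass **args through, fit."""
    if regression_name not in BUILDERS:
        raise ValueError(f"Unknown regression type: {regression_name!r}")
    model = BUILDERS[regression_name](**args).fit(X, y)
    return model, args


X = np.arange(20, dtype=float).reshape(-1, 1)
y = 3 * X[:, 0] + 1
model, params = build_regression("Ridge Regression ", X, y, max_iter=500)
```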

dropColumns(label_drop)

Deletes a specified column within the dataframe.

Parameters

label_drop (str) – Name of the column to drop

Returns

None
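Since dropColumns returns None, it presumably modifies the stored dataframe in place. Assuming it wraps pandas' DataFrame.drop, the equivalent operation on a toy dataframe is:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "x": [0.5, 1.5, 2.5], "y": [1.0, 3.0, 5.0]})

# Remove the "id" column from df itself; the call returns None when inplace=True.
df.drop(columns="id", inplace=True)
```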

get_dataframe_description()

Returns a brief overview with basic information about the dataframe.

Returns

Dataframe Overview

get_dataframe_head()

Returns the head of the dataframe.

Returns

Dataframe Head
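Both accessors above likely correspond to standard pandas calls; the sketch assumes DataFrame.describe() for the overview and DataFrame.head() for the head, which this page does not confirm.

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.0, 6.0, 8.0]})

# Per numeric column: count, mean, std, min, 25%, 50%, 75%, max.
summary = df.describe()
# First n rows (pandas defaults to n=5).
first_rows = df.head(2)
```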

get_regression_type()

Returns the Regression Type of the created model.

Returns

Regression Type

plot_correlation(label_target_1, label_target_2)

Plots two user-defined columns against each other in a scatter plot to show their correlation.

Parameters
  • label_target_1 (str) – First column name to plot

  • label_target_2 (str) – Second column name to plot

Returns

None
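The correlation shown by such a scatter plot is commonly quantified with the Pearson coefficient. The sketch below computes it by hand and compares it with pandas' Series.corr; the column names and values are made up. Drawing the scatter plot itself (for example with DataFrame.plot.scatter) requires matplotlib and is omitted.

```python
import numpy as np
import pandas as pd

# Made-up example columns.
df = pd.DataFrame({"size": [50, 70, 80, 100, 120], "price": [150, 200, 230, 290, 330]})
a, b = df["size"], df["price"]

# Pearson r: sum of co-deviations divided by the product of the deviation norms.
r_manual = ((a - a.mean()) * (b - b.mean())).sum() / np.sqrt(
    ((a - a.mean()) ** 2).sum() * ((b - b.mean()) ** 2).sum()
)
r_pandas = a.corr(b)  # pandas uses method="pearson" by default
```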

plot_heatmap_correlation(figsize=(5, 4))

Plots a correlation matrix (heatmap) showing the correlation between the columns of the dataframe. Serves as decision support for the user.

Parameters

figsize (tuple, optional) – Size of the figure, defaults to (5,4)

Returns

None
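The matrix behind such a heatmap can be computed with pandas' DataFrame.corr, shown here on made-up columns. How the class renders it is not documented; one common option, assuming seaborn and matplotlib are installed, is seaborn.heatmap(corr, annot=True) on axes created with plt.subplots(figsize=(5, 4)).

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],  # perfectly correlated with a
    "c": [5, 4, 3, 2, 1],   # perfectly anti-correlated with a
})

# Pairwise Pearson correlation matrix (one row and one column per input column).
corr = df.corr()
```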

plot_regression_1()

Plots the deviation between the prediction and the actual test data of the target label.

Returns

Figure
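Which key figures the class reports is not documented on this page. As an assumption, the sketch below computes the per-sample deviations together with three common scikit-learn metrics (MAE, RMSE and R²) for a small set of hypothetical test values and predictions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical test targets and model predictions.
y_test = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.5, 5.5, 7.0, 11.0])

residuals = y_test - y_pred                     # per-sample deviation
mae = mean_absolute_error(y_test, y_pred)       # mean of |residuals|
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)                   # 1 - SS_res / SS_tot
```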

plot_regression_2(label_second)

Plots the deviation between the prediction and the actual test data with respect to a chosen column.

Parameters

label_second (str) – Column (dimension) to plot

Returns

None

regression_function(user_input)

Feeds user inputs into the Regression Model and outputs the Prediction.

Parameters

user_input (pd.DataFrame) – Input values as a pandas DataFrame

Returns

model prediction
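A minimal sketch of feeding new values into a trained model, assuming a scikit-learn estimator. The new DataFrame must use the same feature columns, in the same order, as the training data. If split_train_test was called with scaling=True, new inputs need the same transformation; bundling the scaler and model in a pipeline, as assumed here, handles that automatically. Whether regression_function does this internally is not documented.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

train = pd.DataFrame({"x1": [1.0, 2.0, 3.0, 4.0, 5.0], "x2": [0.0, 1.0, 0.0, 1.0, 0.0]})
target = 2 * train["x1"] + 5 * train["x2"]

# The pipeline reapplies the training-time scaling to every new input.
model = make_pipeline(StandardScaler(), LinearRegression()).fit(train, target)

# New input: same column names, in the same order, as the training features.
user_input = pd.DataFrame({"x1": [6.0], "x2": [1.0]})
prediction = model.predict(user_input)
```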

split_columns(label_target, new_label_1, new_label_2, seperator=' ')

Splits a specified column into two new columns.

Parameters
  • label_target (str) – Target column to split

  • new_label_1 (str) – Name of the first new column

  • new_label_2 (str) – Name of the second new column

  • seperator (str, optional) – Separator used to split the target column, defaults to a single space (' ')

Returns

None
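Assuming split_columns wraps pandas' Series.str.split, an equivalent call is shown below. Splitting only at the first separator (n=1) is an assumption that guarantees exactly two parts; the page does not say how values containing several separators are handled.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ada Lovelace", "Alan Turing", "Grace Brewster Hopper"]})

# Split at the first space only, so every row yields exactly two parts.
parts = df["name"].str.split(" ", n=1, expand=True)
df["first"], df["last"] = parts[0], parts[1]
```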

split_train_test(label_target, testsize=0.3, random_state=1, deleting_na=False, scaling=False, deleting_duplicates=False)

Method to preprocess the data. Sets a target column and splits the given dataframe into training and test data. Optionally removes missing values (NA), standardizes the dataset (centering and scaling to unit variance) and removes duplicates.

Parameters
  • label_target (str) – Sets the target column for the regression

  • testsize (float, optional) – Represents the proportion of the dataset to include in the test split, defaults to 0.3

  • random_state (int, optional) – Controls the shuffling applied to the data before applying the split, defaults to 1

  • deleting_na (bool, optional) – Remove missing values, defaults to False

  • scaling (bool, optional) – Standardize features by removing the mean and scaling to unit variance, defaults to False

  • deleting_duplicates (bool, optional) – Deletes duplicates, defaults to False

Returns

None
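The preprocessing steps can be sketched as a stand-alone function. This is a hypothetical reconstruction, not the class's code: it assumes dropna() (which drops rows with missing values), drop_duplicates(), scikit-learn's train_test_split and StandardScaler, and it fits the scaler on the training split only. The page does not specify the order of these steps or whether the target is scaled.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def split_train_test(df, label_target, testsize=0.3, random_state=1,
                     deleting_na=False, scaling=False, deleting_duplicates=False):
    """Hypothetical stand-alone version of the documented preprocessing."""
    if deleting_na:
        df = df.dropna()            # drops rows containing any missing value
    if deleting_duplicates:
        df = df.drop_duplicates()   # drops fully duplicated rows
    X = df.drop(columns=label_target)
    y = df[label_target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=testsize, random_state=random_state)
    if scaling:
        # Fit on the training split only, so test statistics do not leak in.
        scaler = StandardScaler().fit(X_train)
        X_train = pd.DataFrame(scaler.transform(X_train), columns=X.columns, index=X_train.index)
        X_test = pd.DataFrame(scaler.transform(X_test), columns=X.columns, index=X_test.index)
    return X_train, X_test, y_train, y_test


# Toy data with one missing value and one duplicated row.
data = pd.DataFrame({"x": [1, 2, 2, 3, 4, 5, 6, 7, 8, np.nan, 10, 11],
                     "y": [2, 4, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]})
X_train, X_test, y_train, y_test = split_train_test(
    data, "y", deleting_na=True, scaling=True, deleting_duplicates=True)
```

With the missing row and the duplicate removed, ten rows remain, of which 30 % (three rows) go to the test split.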