Regression Class
- class Regression_Group4.Regression(dataframe)
This class reads a Pandas DataFrame and trains different regression models on the given data. The dataframe is passed in when an instance of the class is created. Afterwards a target column is chosen and the data is divided into training and test sets. One of the provided regression types can then be selected and the model trained. Information on the accuracy of the model is provided by means of various plots and key figures. A dedicated method accepts new input values and returns the prediction of the model.
This class was built by Team 4.
- MainEffectsPlot()
Plots the development of the target value with respect to each input variable independently (for each line graph, all other variables are fixed at their mean).
- Returns
Figure
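The idea behind a main effects plot can be sketched as follows. This is a minimal illustration, not the class's actual implementation: the helper `main_effect_curves`, the toy data, and the use of scikit-learn's `LinearRegression` are assumptions made for the example. It sweeps one feature over its observed range while holding every other feature at its mean, and records the model's predictions along that sweep.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def main_effect_curves(model, X, n_points=20):
    """Return {feature: (grid, predictions)} with all other features held at their mean.

    Hypothetical helper for illustration; not part of the documented class.
    """
    means = X.mean()
    curves = {}
    for col in X.columns:
        grid = np.linspace(X[col].min(), X[col].max(), n_points)
        # Start from a frame in which every row equals the column means ...
        probe = pd.DataFrame([means.to_numpy()] * n_points, columns=X.columns)
        # ... then sweep only the feature under study.
        probe[col] = grid
        curves[col] = (grid, model.predict(probe))
    return curves


# Toy data (assumed): y = 2*a + 3*b
rng = np.random.default_rng(0)
X = pd.DataFrame({"a": rng.uniform(0, 1, 50), "b": rng.uniform(0, 1, 50)})
y = 2 * X["a"] + 3 * X["b"]

model = LinearRegression().fit(X, y)
curves = main_effect_curves(model, X)
```

Each `(grid, predictions)` pair can then be drawn as one line graph, for example with `matplotlib.pyplot.plot(grid, predictions)`.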
- _build_elastic_net_regression()
Elastic Net regression (linear regression with combined L1 and L2 regularization)
- Returns
Fitted estimator
- _build_linear_regression()
Ordinary least squares Linear Regression
- Returns
Fitted estimator
- _build_ridge_regression(max_iter=15000, solver='auto')
Ridge Regression or Tikhonov regularization
- Parameters
max_iter (int, optional) – Maximum number of iterations for conjugate gradient solver, defaults to 15000
solver (str = 'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga', 'lbfgs', optional) – Solver to use in the computational routines, defaults to ‘auto’
- Returns
Fitted estimator
- _build_sgd_regression(max_iter=1000)
Linear model fitted by minimizing a regularized empirical loss with SGD.
- Parameters
max_iter (int, optional) – The maximum number of passes over the training data, defaults to 1000
- Returns
Fitted estimator
- _build_svm_regression(kernel='poly', degree=3, svmNumber=0.5, maxIterations=-1)
Nu Support Vector Regression
- Parameters
kernel (str = 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed', optional) – Specifies the kernel type to be used in the algorithm, defaults to ‘poly’
degree (int, optional) – Degree of the polynomial kernel function, defaults to 3
svmNumber (float, optional) – An upper bound on the fraction of training errors and a lower bound on the fraction of support vectors; should be in the interval (0, 1), defaults to 0.5
maxIterations (int, optional) – Hard limit on iterations within solver, or -1 for no limit, defaults to -1
- Returns
Fitted estimator
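The parameter descriptions above match scikit-learn's `NuSVR`. The sketch below assumes, without confirmation from the source, that `svmNumber` is forwarded as `nu` and `maxIterations` as `max_iter`; the training data is made up for the example.

```python
import numpy as np
from sklearn.svm import NuSVR

# Documented defaults of _build_svm_regression.
kernel, degree, svmNumber, maxIterations = "poly", 3, 0.5, -1

# Toy training data (assumed): a cubic term plus a linear term.
rng = np.random.default_rng(0)
X_train = rng.uniform(-1, 1, size=(40, 2))
y_train = X_train[:, 0] ** 3 + 0.5 * X_train[:, 1]

# Assumed mapping: svmNumber -> nu, maxIterations -> max_iter.
model = NuSVR(kernel=kernel, degree=degree, nu=svmNumber, max_iter=maxIterations)
model.fit(X_train, y_train)

predictions = model.predict(X_train[:5])
```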
- build_regression(regression_name, **args)
Builds a specified Regression Model with the given Training Data.
- Parameters
regression_name (str = 'Support Vector Machine Regression ','Elastic Net Regression ','Ridge Regression ','Linear Regression ', 'Stochastic Gradient Descent Regression ') – Name of the chosen Regression Model
args – Arguments depending on the chosen Regression Model.
- Returns
self.regression, params
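One plausible way to select a model by name is a lookup table from the documented names to estimator classes. The sketch below is an assumption about the approach, not the class's code: the registry `REGRESSORS`, the function `build_regression_sketch`, and the choice of scikit-learn estimators are hypothetical. Names are stripped before lookup because the documented names carry a trailing space.

```python
from sklearn.linear_model import ElasticNet, LinearRegression, Ridge, SGDRegressor
from sklearn.svm import NuSVR

# Hypothetical registry: documented model names -> scikit-learn estimator classes.
REGRESSORS = {
    "Support Vector Machine Regression": NuSVR,
    "Elastic Net Regression": ElasticNet,
    "Ridge Regression": Ridge,
    "Linear Regression": LinearRegression,
    "Stochastic Gradient Descent Regression": SGDRegressor,
}


def build_regression_sketch(regression_name, X_train, y_train, **args):
    """Fit the estimator registered under regression_name; return it with its parameters."""
    key = regression_name.strip()  # tolerate trailing spaces in the name
    if key not in REGRESSORS:
        raise ValueError(f"Unknown regression type: {regression_name!r}")
    estimator = REGRESSORS[key](**args)
    estimator.fit(X_train, y_train)
    return estimator, estimator.get_params()


# Toy training data (assumed): y = 2*x + 1
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [1.0, 3.0, 5.0, 7.0]

model, params = build_regression_sketch(
    "Ridge Regression ", X_train, y_train, max_iter=15000, solver="auto"
)
```

Keeping the names in one table means an unsupported name fails early with a clear error instead of falling through a chain of conditionals.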
- dropColumns(label_drop)
Deletes a specified column within the dataframe.
- Parameters
label_drop (str) – Target Columns to drop
- Returns
None
- get_dataframe_description()
Returns a brief overview with basic information about the dataframe.
- Returns
Dataframe Overview
- get_dataframe_head()
Returns the head of the dataframe.
- Returns
Dataframe Head
- get_regression_type()
Returns the Regression Type of the created model.
- Returns
Regression Type
- plot_correlation(label_target_1, label_target_2)
Plots two user-selected columns against each other in a scatter plot to show their correlation.
- Parameters
label_target_1 (str) – First column name to plot
label_target_2 (str) – Second column name to plot
- Returns
None
- plot_heatmap_correlation(figsize=(5, 4))
Plots a correlation matrix / heatmap to show the correlation between different columns of the dataframe. Serves as decision support for the user.
- Parameters
figsize (tuple, optional) – Size of the figure, defaults to (5,4)
- Returns
None
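The numbers behind such a heatmap are a pairwise correlation matrix. As a small sketch with made-up data, pandas can compute it directly; how the class renders the matrix is not stated here, so plotting is left out.

```python
import pandas as pd

# Toy data (assumed for illustration).
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with x
    "z": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with x
})

# Pairwise Pearson correlation between all columns.
corr = df.corr()

# Columns ranked by absolute correlation with a candidate target, here "y".
ranking = corr["y"].drop("y").abs().sort_values(ascending=False)
```

Columns with a high absolute correlation to the intended target are natural candidates to keep; columns that correlate strongly with each other may be redundant.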
- plot_regression_1()
Plots the deviation between the prediction and the actual test data of the Target Label.
- Returns
Figure
- plot_regression_2(label_second)
Plots the deviation between the prediction and the actual test data for a chosen column.
- Parameters
label_second (str) – Set dimension / column to plot
- Returns
None
- regression_function(user_input)
Feeds user inputs into the Regression Model and outputs the Prediction.
- Parameters
user_input (pd.DataFrame) – Give input as Pandas Dataframe
- Returns
model prediction
- split_columns(label_target, new_label_1, new_label_2, seperator=' ')
Splits a specified column into two new columns.
- Parameters
label_target (str) – Target column to split
new_label_1 (str) – Name of the first new column
new_label_2 (str) – Name of the second new column
seperator (str, optional) – Separator string at which the target column is split, defaults to ' ' (a single space)
- Returns
None
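Splitting one column into two can be sketched with pandas string methods. This is an illustration with made-up data and column names; whether the class splits only at the first separator or keeps the original column afterwards is not stated, so both choices here are assumptions.

```python
import pandas as pd

# Toy data (assumed); "name" plays the role of label_target.
df = pd.DataFrame({"name": ["Ada Lovelace", "Alan Turing"]})

# Split once on the separator; expand=True returns one column per part.
parts = df["name"].str.split(" ", n=1, expand=True)
df["first"] = parts[0]  # new_label_1
df["last"] = parts[1]   # new_label_2
```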
- split_train_test(label_target, testsize=0.3, random_state=1, deleting_na=False, scaling=False, deleting_duplicates=False)
Preprocesses the data. Sets a target column and splits the given dataframe into training and test data. Optionally removes missing values (NA), scales the dataset (centering and scaling to unit variance), and deletes duplicates.
- Parameters
label_target (str) – Sets the target column for the regression
testsize (float, optional) – Represents the proportion of the dataset to include in the test split, defaults to 0.3
random_state (int, optional) – Controls the shuffling applied to the data before applying the split, defaults to 1
deleting_na (bool, optional) – Remove missing values, defaults to False
scaling (bool, optional) – Standardize features by removing the mean and scaling to unit variance, defaults to False
deleting_duplicates (bool, optional) – Deletes duplicates, defaults to False
- Returns
None
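The preprocessing steps described for `split_train_test` can be sketched with pandas and scikit-learn. The data, the column names, and the order of operations are assumptions for the example; in particular, fitting the scaler on the training split only is a common practice, not something the source confirms for this class.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy data (assumed): one missing value and one duplicated row.
df = pd.DataFrame({
    "size": [50.0, 60.0, 60.0, 80.0, None, 100.0, 120.0, 150.0, 90.0, 70.0, 110.0, 130.0],
    "rooms": [2, 2, 2, 3, 3, 4, 4, 5, 3, 3, 4, 5],
    "price": [150, 180, 180, 240, 200, 310, 350, 450, 270, 210, 330, 400],
})

df = df.dropna()           # corresponds to deleting_na=True
df = df.drop_duplicates()  # corresponds to deleting_duplicates=True

# label_target="price": separate the target column from the features.
X = df.drop(columns="price")
y = df["price"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# scaling=True: fit on the training split only, then apply to both splits.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training data alone keeps information from the test set out of the model, which gives a more honest accuracy estimate.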