Random Forest Regression Class

class Regression_final.Regressor(X, Y)

Class Regressor: Random Forest Regressor

This class contains the methods for the Random Forest Regressor.

Class input parameters:

Parameters

df (Pandas DataFrame) – The input data frame
estimator (Integer) – User input. Number of decision trees to build; the forest's prediction is the average of the predictions of all trees.
test_size (float) – User input. Proportion of the dataset to hold out as test data when the dataset is split.

Class Output Parameters:

Parameters

Y_pred (float) – The resulting output of the Regression test
Y_test (float) – The expected output of the Regression test
score (R squared) – Model accuracy on the Training data
RMSE (float) – Root mean squared error
Error_message (str) – Error message if an exception was encountered during the processing of the code
flag (bool) – internal flag for marking if an error occurred while processing a previous method

Type

float

gridSearchCV(reg, X_train, y_train)

gridSearchCV Method :

This method returns the best parameters on which the model is to be fitted.

Searched Parameters:

max_features: Number of features to consider at every split
max_depth: Maximum number of levels in a tree
min_samples_split: Minimum number of samples required to split a node
min_samples_leaf: Minimum number of samples required at each leaf node

Returns: Best parameter values on which regressor is fitted.
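
The search described above can be sketched with scikit-learn's GridSearchCV. This is only an illustration, not the class's actual implementation: the candidate values in param_grid, the function name grid_search_sketch, the 3-fold cross validation, and the synthetic data are all assumptions, since the documentation does not list them.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Hypothetical candidate values; the grid actually searched is not documented.
param_grid = {
    "max_features": ["sqrt", 1.0],
    "max_depth": [3, None],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
}


def grid_search_sketch(reg, X_train, y_train):
    """Return the best combination from param_grid by cross-validated R^2."""
    search = GridSearchCV(reg, param_grid, cv=3, scoring="r2")
    search.fit(X_train, y_train)
    return search.best_params_


# Synthetic data, used only to make the sketch runnable.
X, y = make_regression(n_samples=80, n_features=4, noise=0.1, random_state=0)
best_params = grid_search_sketch(
    RandomForestRegressor(n_estimators=10, random_state=0), X, y
)
print(best_params)
```

The returned dictionary has one entry per searched parameter and can be passed to the regressor as keyword arguments.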

model(estimator, test_size)

model Method :

This method splits the data into train and test sets, then creates a model based on the user inputs estimator and test_size.

It calls the ‘gridSearchCV’ method, which returns the best parameters on which the model can be fitted.

It then fits the model with the best parameters obtained from the grid search cross validation, tests it on the test dataset, and returns the predicted values ‘Y_pred’.

Parameters

estimator (Integer) – User Input - Number of decision trees to build; the forest's prediction is the average of the predictions of all trees.
test_size (float) – User Input - Proportion of the dataset to hold out as test data when the dataset is split.
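
A minimal sketch of the split, fit, and predict flow described for this method, assuming scikit-learn is the underlying library. The function name model_sketch, the best_params argument (standing in for the output of gridSearchCV), the fixed random_state, and the synthetic data are assumptions made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def model_sketch(X, Y, estimator, test_size, best_params=None):
    """Split the data, fit a random forest, and predict on the held-out part."""
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=test_size, random_state=0
    )
    # best_params is a hypothetical stand-in for the grid search result.
    reg = RandomForestRegressor(
        n_estimators=estimator, random_state=0, **(best_params or {})
    )
    reg.fit(X_train, Y_train)
    Y_pred = reg.predict(X_test)
    return Y_test, Y_pred


# Synthetic data, used only to make the sketch runnable.
X, Y = make_regression(n_samples=100, n_features=4, noise=0.1, random_state=0)
Y_test, Y_pred = model_sketch(X, Y, estimator=20, test_size=0.25)
print(len(Y_test), len(Y_pred))
```

With test_size=0.25 and 100 samples, 25 samples are held out for testing and the model is fitted on the remaining 75.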

prediction_plot(Y_test, Y_pred)

prediction_plot Method :

This method displays a plot comparing the real predictions with the optimal predictions. The optimal predictions are the true values, drawn as a line plot; the real predictions are the values predicted by the regression model, drawn as a scatter plot.
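
One way such a plot could be drawn with matplotlib is sketched below. The function name prediction_plot_sketch, the axis labels, the colors, and the sample values are assumptions; the actual styling used by the class is not documented. The optimal prediction is drawn as the diagonal on which the predicted value equals the true value.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def prediction_plot_sketch(Y_test, Y_pred):
    """Scatter the model's predictions against the true values."""
    Y_test = np.asarray(Y_test, dtype=float)
    Y_pred = np.asarray(Y_pred, dtype=float)
    fig, ax = plt.subplots()
    # Real predictions: one point per test sample.
    ax.scatter(Y_test, Y_pred, label="Real prediction")
    # Optimal predictions: the line where prediction equals the true value.
    limits = [Y_test.min(), Y_test.max()]
    ax.plot(limits, limits, color="red", label="Optimal prediction")
    ax.set_xlabel("Y_test")
    ax.set_ylabel("Y_pred")
    ax.legend()
    return fig, ax


# Made-up values, used only to make the sketch runnable.
fig, ax = prediction_plot_sketch([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
```

The closer the scattered points lie to the line, the better the predictions match the true values.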

result(Y_test, Y_pred)

result Method :

This method displays the metrics ‘R-squared score’ and ‘RMSE - Root Mean Squared Error’ to analyze the performance of the model.

R-Squared Score: It is a statistical measure that represents the proportion of the variance for a dependent variable that’s explained by an independent variable or variables in a regression model.

RMSE: Mean Squared Error represents the average of the squared difference between the original and predicted values in the data set. It measures the variance of the residuals. Root Mean Squared Error is the square root of Mean Squared error. It measures the standard deviation of residuals.

Parameters

Y_test (float) – The expected output of the Regression test
Y_pred (float) – The resulting output of the Regression test
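
The two metrics can be computed directly from their definitions. The sketch below is a plain-Python illustration with made-up numbers, not the class's implementation; the function name result_sketch is an assumption.

```python
import math


def result_sketch(Y_test, Y_pred):
    """Return (R-squared, RMSE) for true values Y_test and predictions Y_pred."""
    n = len(Y_test)
    mean_true = sum(Y_test) / n
    # Residual sum of squares: squared differences between true and predicted.
    ss_res = sum((t - p) ** 2 for t, p in zip(Y_test, Y_pred))
    # Total sum of squares: squared deviations of the true values from their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in Y_test)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)  # square root of the mean squared error
    return r2, rmse


r2, rmse = result_sketch([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(r2, rmse)
```

Here only the last prediction is off by 1, so the residual sum of squares is 1 and the total sum of squares is 5, giving an R-squared of about 0.8 and an RMSE of 0.5.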