Random Forest Regression Class

class Regression_final.Regressor(X, Y)

Class Regressor: Random Forest Regressor

This class contains the methods for the Random Forest Regressor.

Class input parameters:

Parameters
  • df (Pandas DataFrame) – The input data frame

  • estimator (Integer) – User input. Number of decision trees to build before averaging their predictions.

  • test_size (float) – User input. Proportion of the dataset to hold out as test data when the dataset is split.

Class Output Parameters:

Parameters
  • Y_pred (float) – The resulting output of the Regression test

  • Y_test (float) – The expected output of the Regression test

  • score (R squared) – Model accuracy on the Training data

  • RMSE (float) – Root mean squared error

  • Error_message (str) – Error message if an exception was encountered during the processing of the code

  • flag (bool) – Internal flag marking whether an error occurred while processing a previous method

Type

float

gridSearchCV(reg, X_train, y_train)

gridSearchCV Method :

This method returns the best parameters on which the model is to be fitted.

Searched Parameters:
  1. max_features: Number of features to consider at every split

  2. max_depth : Maximum number of levels in tree

  3. min_samples_split: Minimum number of samples required to split a node

  4. min_samples_leaf: Minimum number of samples required at each leaf node

Returns

The best parameter values, on which the regressor is then fitted.
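
The search described above can be sketched with scikit-learn's GridSearchCV. This is only an illustration, not the class's actual implementation: the candidate values in param_grid, the 3-fold cross validation, the "r2" scoring, and the helper name grid_search are all assumptions, since the grid that is really searched is not documented here.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Hypothetical candidate values for the four searched parameters.
param_grid = {
    "max_features": ["sqrt", 1.0],   # features considered at every split
    "max_depth": [3, None],          # maximum number of levels in a tree
    "min_samples_split": [2, 5],     # samples required to split a node
    "min_samples_leaf": [1, 2],      # samples required at each leaf node
}


def grid_search(reg, X_train, y_train):
    """Return the best parameter combination found by cross-validated search."""
    search = GridSearchCV(reg, param_grid, cv=3, scoring="r2")
    search.fit(X_train, y_train)
    return search.best_params_


# Small synthetic dataset so the sketch runs quickly.
X, y = make_regression(n_samples=80, n_features=4, noise=0.1, random_state=0)
best_params = grid_search(RandomForestRegressor(n_estimators=10, random_state=0), X, y)
print(best_params)
```

The returned dictionary can be passed to the regressor as keyword arguments before the final fit.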

model(estimator, test_size)

model Method :

This method splits the data into train and test sets, then creates a model based on the user inputs ‘estimator’ and ‘test_size’.

It calls the ‘gridSearchCV’ method, which returns the best parameters on which the model can be fitted.

It then fits the model with the best parameters obtained from the grid search cross validation, tests it on the test dataset, and returns the predicted values ‘Y_pred’.

Parameters
  • estimator (Integer) – User input. Number of decision trees to build before averaging their predictions.

  • test_size (float) – User input. Proportion of the dataset to hold out as test data when the dataset is split.
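
The split, fit, and predict steps described above might look roughly like the following. It is a minimal sketch under assumptions: the function name fit_and_predict, the best_params argument, and the fixed random_state are hypothetical, and the grid search step is left out to keep the example short.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def fit_and_predict(X, Y, estimator, test_size, best_params=None):
    """Split the data, fit a random forest, and predict on the held-out part."""
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=test_size, random_state=0
    )
    # best_params stands in for the output of the grid search (assumption).
    reg = RandomForestRegressor(
        n_estimators=estimator, random_state=0, **(best_params or {})
    )
    reg.fit(X_train, Y_train)
    Y_pred = reg.predict(X_test)
    return Y_test, Y_pred


# Synthetic data in place of the user's X and Y.
X, Y = make_regression(n_samples=100, n_features=4, noise=0.1, random_state=0)
Y_test, Y_pred = fit_and_predict(X, Y, estimator=20, test_size=0.25)
print(len(Y_test), len(Y_pred))
```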

prediction_plot(Y_test, Y_pred)

prediction_plot Method :

This method displays a plot comparing the real predictions with the optimal predictions. The optimal predictions are the true values, drawn as a line plot; the real predictions are the values predicted by the regression model, drawn as a scatter plot.
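
A plot of this kind could be drawn with matplotlib as sketched below. The layout is an assumption: true values on the x-axis, predicted values on the y-axis, and the optimal prediction shown as the diagonal where the two are equal. The axis labels, legend text, and sample numbers are made up for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def prediction_plot(Y_test, Y_pred):
    """Scatter the model's predictions against a line of perfect predictions."""
    fig, ax = plt.subplots()
    # Real prediction: one point per test sample.
    ax.scatter(Y_test, Y_pred, label="Real prediction")
    # Optimal prediction: the line where predicted value equals true value.
    low, high = min(Y_test), max(Y_test)
    ax.plot([low, high], [low, high], color="red", label="Optimal prediction")
    ax.set_xlabel("Y_test")
    ax.set_ylabel("Y_pred")
    ax.legend()
    return fig


fig = prediction_plot([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

Points close to the line indicate predictions close to the true values.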

result(Y_test, Y_pred)

result Method :

This method displays the metrics ‘R-squared score’ and ‘RMSE - Root Mean Squared Error’ to analyze the performance of the model.

R-Squared Score: It is a statistical measure that represents the proportion of the variance for a dependent variable that’s explained by an independent variable or variables in a regression model.

RMSE: Mean Squared Error represents the average of the squared difference between the original and predicted values in the data set. It measures the variance of the residuals. Root Mean Squared Error is the square root of Mean Squared error. It measures the standard deviation of residuals.

Parameters
  • Y_test (float) – The expected output of the Regression test

  • Y_pred (float) – The resulting output of the Regression test
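
Both metrics can be computed directly from their definitions. The sketch below uses plain Python rather than whatever the class calls internally; the function name r2_and_rmse and the sample values are illustrative only.

```python
import math


def r2_and_rmse(y_test, y_pred):
    """Compute the R-squared score and the root mean squared error."""
    n = len(y_test)
    mean_true = sum(y_test) / n
    # Residual sum of squares: squared gaps between true and predicted values.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_test, y_pred))
    # Total sum of squares: squared gaps between true values and their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_test)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)  # square root of the mean squared error
    return r2, rmse


r2, rmse = r2_and_rmse([3.0, 5.0, 7.0, 9.0], [2.0, 6.0, 6.0, 10.0])
print(r2, rmse)
```

Here every prediction is off by exactly 1, so the mean squared error is 1 and the residual sum of squares is 4 against a total sum of squares of 20.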