Random Forest Classification Class

class RF_classfication.Sample(X, Y)

Sample class for random forest classification.

This class contains the methods used to build and evaluate a random forest classifier.

Class input parameters:

Parameters
  • df (Pandas DataFrame) – The input data frame

  • estimator (Integer) – Number of decision trees to be specified by user

  • test_size (float) – User input: proportion of the dataset to hold out as test data when the dataset is split.

Class output parameters:

Parameters
  • Y_pred (int array) – The class labels predicted by the classifier on the test data

  • Y_test (int array) – The expected (true) class labels for the test data

  • acctrain (float) – Model accuracy on the Training data

  • acctest (float) – Model accuracy on the Testing data

  • Error_message (str) – Error message if an exception was encountered during the processing of the code

  • flag (bool) – Internal flag marking whether an error occurred while processing a previous method

RandomizedSearchoptim()

RandomizedSearchoptim Method :

This method uses randomized search cross-validation to find the best parameters on which the model is to be fitted, and returns them.

Parameters:

  • max_features – Number of features to consider at every split

  • max_depth – Maximum number of levels in a tree

  • min_samples_split – Minimum number of samples required to split a node

  • min_samples_leaf – Minimum number of samples required at each leaf node

  • bootstrap – Method of selecting samples for training each tree

Returns

Best parameters
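
A minimal sketch of how such a search could look, assuming the class relies on scikit-learn's RandomizedSearchCV and RandomForestClassifier (the source does not confirm this). The helper name randomized_search_sketch, the candidate values, n_iter, and cv are illustrative assumptions, not the values used by RandomizedSearchoptim.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV


def randomized_search_sketch(X, y, n_estimators=20, n_iter=5, seed=0):
    """Return the best hyperparameters found by a small randomized search.

    Hypothetical helper: the candidate values below are assumptions.
    """
    param_distributions = {
        "max_features": ["sqrt", "log2"],   # features considered at every split
        "max_depth": [5, 10, None],         # maximum number of levels in a tree
        "min_samples_split": [2, 5, 10],    # samples required to split a node
        "min_samples_leaf": [1, 2, 4],      # samples required at each leaf node
        "bootstrap": [True, False],         # sample with or without replacement
    }
    search = RandomizedSearchCV(
        estimator=RandomForestClassifier(n_estimators=n_estimators, random_state=seed),
        param_distributions=param_distributions,
        n_iter=n_iter,        # number of random parameter combinations tried
        cv=3,                 # 3-fold cross-validation for each combination
        random_state=seed,
    )
    search.fit(X, y)
    return search.best_params_


# Synthetic data stands in for the user's dataset.
X_demo, y_demo = make_classification(n_samples=120, n_features=8, random_state=0)
best_params = randomized_search_sketch(X_demo, y_demo)
print(best_params)
```

Randomized search samples a fixed number of combinations instead of trying every one, which keeps the cost bounded when the grid is large.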

accuracy()

accuracy Method :

Classification accuracy measures a classification model’s performance as the number of correct predictions divided by the total number of predictions.

This method returns the accuracy for the training and testing dataset.
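
The definition above can be sketched in a few lines of plain Python. The function name accuracy_sketch and the sample labels are hypothetical and only illustrate the calculation; they are not part of the class.

```python
def accuracy_sketch(y_true, y_pred):
    """Fraction of predictions that match the expected labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Illustrative labels: 4 of the 5 predictions are correct.
y_true_demo = [1, 0, 1, 1, 0]
y_pred_demo = [1, 0, 0, 1, 0]
acc_demo = accuracy_sketch(y_true_demo, y_pred_demo)
print(acc_demo)  # 0.8
```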

model(estimator, test_size)

Model Method :

This method splits the data into train and test sets, then creates a model based on the user inputs estimator (the number of decision trees) and test_size.

It calls the method ‘RandomizedSearchoptim’, which returns the best parameters on which the model can be fitted.

It then fits the model using the best parameters obtained from randomized search cross-validation, tests it on the test dataset, and returns the predicted values ‘Y_pred’.

Parameters
  • estimator (Integer) – User Input - Number of decision trees for random forest classifier

  • test_size (float) – User input: proportion of the dataset to hold out as test data when the dataset is split.
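
The split, fit, and predict steps described above might look roughly like the following, assuming scikit-learn's train_test_split and RandomForestClassifier. The helper fit_and_predict_sketch, its best_params argument, and the synthetic data are hypothetical; the actual model method may differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def fit_and_predict_sketch(X, y, estimator=50, test_size=0.25, best_params=None, seed=0):
    """Split the data, fit a random forest, and predict on the held-out part.

    Hypothetical helper: best_params stands in for the result of a prior
    hyperparameter search and defaults to scikit-learn's own defaults.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    clf = RandomForestClassifier(
        n_estimators=estimator, random_state=seed, **(best_params or {})
    )
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    acc_train = clf.score(X_train, y_train)  # accuracy on the training data
    acc_test = clf.score(X_test, y_test)     # accuracy on the testing data
    return y_pred, y_test, acc_train, acc_test


# Synthetic data stands in for the user's dataset.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
y_pred, y_test, acc_train, acc_test = fit_and_predict_sketch(
    X_demo, y_demo, estimator=25, test_size=0.25
)
print(len(y_pred), acc_train, acc_test)
```

With test_size=0.25 and 200 samples, 50 rows are held out for testing and 150 are used for training.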

report()

report Method :

This method prints the Confusion Matrix and classification report for the performance of the whole model.

Confusion matrix - A confusion matrix is a summary of prediction results on a classification problem. The numbers of correct and incorrect predictions are summarized as counts and broken down by class.

Classification Report - The classification report displays the precision, recall, F1, and support scores for the model.

Precision - Precision is the ratio of correctly predicted positive observations to the total predicted positive observations.

Recall (Sensitivity) - Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class.

F1 Score - The F1 score represents the balance between precision and recall. It is the harmonic mean of precision and recall, and is a good metric for analyzing model performance on an imbalanced dataset.
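
To make these definitions concrete, here is a small plain-Python sketch for the binary case. The function binary_report_sketch and the sample labels are hypothetical; the matrix layout (rows are actual classes, columns are predicted classes) is an assumed convention, and the report method itself may compute or print these values differently.

```python
def binary_report_sketch(y_true, y_pred, positive=1):
    """Confusion counts plus precision, recall, and F1 for one positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1   # predicted positive, actually positive
        elif p == positive:
            fp += 1   # predicted positive, actually negative
        elif t == positive:
            fn += 1   # predicted negative, actually positive
        else:
            tn += 1   # predicted negative, actually negative
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],  # rows: actual, columns: predicted
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative labels: 3 true positives, 1 false positive, 1 false negative, 3 true negatives.
y_true_demo = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred_demo = [1, 0, 1, 0, 1, 0, 1, 0]
report_demo = binary_report_sketch(y_true_demo, y_pred_demo)
print(report_demo)
```

In this example precision and recall are both 3/4, so their harmonic mean, the F1 score, is also 0.75.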