Time Series Random Forest Regression Class
- class TimeSeries_Final.Timeseries(df, Input, base, target, time_col, sysCodeNo)
Class Timeseries: Random forest regressor for time series.
This class contains the methods that implement a random forest regressor for time series data. Class input parameters:
- Parameters
df (Pandas DataFrame) – The input data frame
estimator (Integer) – User input: number of decision trees to build before averaging their predictions.
test_size (float) – User input: proportion of the dataset to hold out as test data when the dataset is split.
Class Output Parameters:
- Parameters
Y_pred (float) – The resulting output of the Regression test
Y_test (float) – The expected output of the Regression test
score (float) – R-squared: model accuracy on the training data
RMSE (float) – Root mean squared error
Target (int) – Occupancy to be predicted at a user specified time.
Error_message (str) – Error message if an exception was encountered during the processing of the code
flag (bool) – Internal flag for marking if an error occurred while processing a previous method
- DataPrep()
DataPrep Method :
This method performs all the data preprocessing required for the regressor model. It extracts the year, month, day, hours and minutes from the time column. It also calculates the rate from the target and base columns. The minutes column of the given dataset is binned into 15-minute intervals for better prediction of the target value.
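The preprocessing steps described above can be sketched with pandas as follows. This is a minimal illustration, not the class's actual code: the helper name `prepare_time_features`, the sample column names, and the definition of the rate as target divided by base are all assumptions.

```python
import pandas as pd

def prepare_time_features(df, time_col, base, target):
    """Hypothetical helper mirroring the steps described for DataPrep."""
    out = df.copy()
    ts = pd.to_datetime(out[time_col])
    # Split the timestamp into separate calendar and clock features.
    out["year"] = ts.dt.year
    out["month"] = ts.dt.month
    out["day"] = ts.dt.day
    out["hour"] = ts.dt.hour
    # Bin minutes into 15-minute intervals: 0-14 -> 0, 15-29 -> 15, and so on.
    out["minute"] = (ts.dt.minute // 15) * 15
    # Assumption: the rate is the target column divided by the base column.
    out["rate"] = out[target] / out[base]
    return out

# Illustrative data only; the column names are made up for this sketch.
sample = pd.DataFrame({
    "timestamp": ["2016-10-04 07:59:42", "2016-10-04 08:25:42"],
    "capacity": [500, 500],
    "occupancy": [50, 125],
})
prepared = prepare_time_features(sample, "timestamp", "capacity", "occupancy")
```

Flooring the minute to a multiple of 15 keeps the feature numeric while reducing it to four possible values per hour.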
- RandomizedSearchoptim()
RandomizedSearchoptim Method :
This method returns the best parameters on which the model is to be fitted.
- Parameters:
max_features: Number of features to consider at every split
max_depth: Maximum number of levels in a tree
min_samples_split: Minimum number of samples required to split a node
min_samples_leaf: Minimum number of samples required at each leaf node
bootstrap: Method of selecting samples for training each tree
- Returns
Best parameter values on which regressor is fitted.
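A randomized search over the parameters listed above might look like the sketch below, which uses scikit-learn's `RandomizedSearchCV`. The candidate values, the number of iterations, the fold count and the synthetic data are assumptions chosen for illustration; the class's actual search space is not documented here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Assumed candidate values for each parameter named in the documentation.
param_distributions = {
    "max_features": ["sqrt", 1.0],
    "max_depth": [5, 10, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "bootstrap": [True, False],
}

# Small synthetic dataset so the sketch runs on its own.
rng = np.random.default_rng(0)
X = rng.random((60, 4))
y = 2.0 * X[:, 0] + rng.normal(scale=0.05, size=60)

search = RandomizedSearchCV(
    estimator=RandomForestRegressor(n_estimators=10, random_state=0),
    param_distributions=param_distributions,
    n_iter=5,        # number of random parameter combinations to try
    cv=3,            # 3-fold cross validation for each combination
    random_state=0,
)
search.fit(X, y)
best_params = search.best_params_  # dict with one chosen value per parameter
```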
- Results()
Results Method :
This method displays the metrics ‘R-squared score’ and ‘RMSE - Root Mean Squared Error’ to analyze the performance of the model:
R-Squared Score: It is a statistical measure that represents the proportion of the variance for a dependent variable that’s explained by an independent variable or variables in a regression model.
RMSE: Mean Squared Error represents the average of the squared difference between the original and predicted values in the data set. It measures the variance of the residuals. Root Mean Squared Error is the square root of Mean Squared error. It measures the standard deviation of residuals.
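Both metrics can be computed directly from their definitions. The sketch below uses NumPy only; the function name `r2_and_rmse` and the sample values are illustrative assumptions, not part of the class.

```python
import numpy as np

def r2_and_rmse(y_true, y_pred):
    """Compute R-squared and RMSE from true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    # MSE is the mean of the squared residuals; RMSE is its square root.
    mse = np.mean(residuals ** 2)
    rmse = np.sqrt(mse)
    # R-squared is 1 minus the ratio of residual to total sum of squares.
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return r2, rmse

r2, rmse = r2_and_rmse([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
```

For these sample values the residual sum of squares is 2 and the total sum of squares is 20, so R-squared is 0.9 and RMSE is the square root of 0.5.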
This method also draws a plot comparing the real prediction with the optimal prediction. The optimal prediction consists of the true values, plotted as a line, and the real prediction consists of the values predicted by the regression model, plotted as a scatter plot.
- SelectedSysCode(Input)
SelectedSysCode Method :
This method performs all the data preprocessing required for the regressor model on the user input data. It extracts the year, month, day, hours and minutes from the time column. The minutes of the user specified time are binned into 15-minute intervals for better prediction of the target value.
The data frame is filtered by the user specified sysCodeNo (system code number) in order to obtain the target value for that system code number. Finally, the method initializes the features and target column.
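The filtering and feature/target initialization described above could be sketched as follows. The helper name `select_sys_code` and every column name in the sample data are hypothetical; the real class reads them from its constructor arguments.

```python
import pandas as pd

def select_sys_code(df, sys_code_col, sys_code, feature_cols, target_col):
    """Hypothetical helper: keep one system code, then split features and target."""
    # Keep only the rows that belong to the requested system code number.
    subset = df[df[sys_code_col] == sys_code]
    X = subset[feature_cols]
    y = subset[target_col]
    return X, y

# Illustrative data only; the codes and columns are made up for this sketch.
data = pd.DataFrame({
    "sys_code": ["A01", "A01", "B02"],
    "hour": [7, 8, 7],
    "minute": [45, 15, 30],
    "occupancy": [61, 64, 120],
})
X_sel, y_sel = select_sys_code(data, "sys_code", "A01", ["hour", "minute"], "occupancy")
```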
- UserSelectedmodel(estimator, test_size)
UserSelectedmodel Method:
This method splits the data into train and test sets, then creates a model based on the user inputs estimator and test_size and the specific system code number.
It calls the method ‘RandomizedSearchoptim’, which returns the best parameters on which the model can be fitted.
It then fits the model with the best parameters obtained from randomized search cross validation and tests it on the test dataset, then returns the predicted value ‘Y_pred’, i.e. the occupancy at a user specified date and time.
- Parameters
estimator (Integer) – User input: number of decision trees to build before averaging their predictions.
test_size (float) – User input: proportion of the dataset to hold out as test data when the dataset is split.
- fullmodel()
fullmodel Method :
This method includes the initialization of features and target for training and testing the regressor model on the whole dataset.
- model(estimator, test_size)
model Method :
This method splits the data into train and test sets, then creates a model based on the user inputs estimator and test_size.
It calls the method ‘RandomizedSearchoptim’, which returns the best parameters on which the model can be fitted.
It then fits the model with the best parameters obtained from randomized search cross validation and tests it on the test dataset, then returns the predicted value ‘Y_pred’.
- Parameters
estimator (Integer) – User input: number of decision trees to build before averaging their predictions.
test_size (float) – User input: proportion of the dataset to hold out as test data when the dataset is split.
- Returns
Modified set of class parameters
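The split, search, fit and predict flow described for the model method can be sketched with scikit-learn as below. This is an assumption-laden illustration: the function name `fit_forest`, the reduced search space, the fixed `random_state` and the synthetic data are not taken from the class, and the documentation does not say whether the real split shuffles rows, so the sketch uses the `train_test_split` defaults.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

def fit_forest(X, y, estimator, test_size):
    """Hypothetical helper: split, tune, fit, and predict on the test set."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=0
    )
    search = RandomizedSearchCV(
        RandomForestRegressor(n_estimators=estimator, random_state=0),
        param_distributions={
            "max_depth": [5, 10, None],
            "min_samples_leaf": [1, 2, 4],
        },
        n_iter=4,
        cv=3,
        random_state=0,
    )
    # Fitting refits the best parameter combination on the full training set.
    search.fit(X_train, y_train)
    y_pred = search.predict(X_test)
    # R-squared of the refitted model on the training data.
    score = search.score(X_train, y_train)
    return y_pred, y_test, score

# Small synthetic dataset so the sketch runs on its own.
rng = np.random.default_rng(1)
X_demo = rng.random((80, 3))
y_demo = 3.0 * X_demo[:, 1] + rng.normal(scale=0.05, size=80)
Y_pred, Y_test, score = fit_forest(X_demo, y_demo, estimator=10, test_size=0.25)
```

For genuinely ordered time series, passing `shuffle=False` to `train_test_split` keeps the test set strictly later than the training set, which may be preferable to a random split.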