KNN, SVM and LR Classification Class
- class ClassificationClass.Classification(dataframe, column_names=[])
This class builds one of three possible classifiers (KNN, LR, SVM) from a given dataset.
- _build_LogisticRegression_classifier(solver='liblinear')
Builds the Logistic Regression Classifier with the given dataframe.
- Parameters
solver (string) – (optional, default=liblinear) Name of the Solver: liblinear, newton-cg, lbfgs, sag, saga.
- Returns
sklearn.linear_model._logistic.LogisticRegression
- _build_SVM_classifier(kernel_fun='linear')
Builds the Support Vector Machine Classifier with the given dataframe.
- Parameters
kernel_fun (string) – (optional, default=linear) Name of the kernel function: linear, poly, rbf, sigmoid.
- Returns
sklearn.svm.SVC
- _build_knn_classifier(k=5)
Builds the K-Nearest Neighbor Classifier with the given dataframe.
- Parameters
k (int) – (optional, default=5) Number of neighbors (k) to use
- Returns
sklearn.neighbors.KNeighborsClassifier
- _upsample_dataset(y_column_name)
Function for upsampling data
- Parameters
y_column_name (string) – Name of the target column of the dataset
- Returns
pandas.core.frame.DataFrame – The upsampled dataframe
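The reference above does not say how the upsampling is done. A minimal sketch, assuming that every class is resampled with replacement up to the size of the largest class using sklearn.utils.resample; the function name upsample_dataset, its random_state argument and the toy data are hypothetical and not part of the class:

```python
import pandas as pd
from sklearn.utils import resample


def upsample_dataset(dataframe, y_column_name, random_state=0):
    """Resample each class with replacement up to the size of the largest class.

    Assumption: this is one common upsampling strategy, not necessarily
    the one used by _upsample_dataset.
    """
    class_counts = dataframe[y_column_name].value_counts()
    target_size = int(class_counts.max())

    parts = []
    for label, count in class_counts.items():
        subset = dataframe[dataframe[y_column_name] == label]
        if count < target_size:
            # Draw rows with replacement until the class reaches target_size
            subset = resample(
                subset,
                replace=True,
                n_samples=target_size,
                random_state=random_state,
            )
        parts.append(subset)

    return pd.concat(parts).reset_index(drop=True)


# Hypothetical toy data: class "a" has 4 rows, class "b" has 2 rows
toy = pd.DataFrame(
    {"feature": [1, 2, 3, 4, 5, 6], "label": ["a", "a", "a", "a", "b", "b"]}
)
balanced = upsample_dataset(toy, "label")
print(balanced["label"].value_counts())
```

After the call, both classes have the same number of rows, so a classifier trained on the result is less biased toward the majority class.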
- build_classifier(classifier_name, *args)
The second of the three important class methods. This function builds the heart of the class: the classifier. The classifier_name argument selects one of three classifier types.
- Parameters
classifier_name (str) – Name of the desired classifier. Three options: KNN = K-Nearest Neighbor, SVM = Support Vector Machine, LR = Logistic Regression
args – Argument depending on the chosen classifier. For KNN: the value of k (int); for SVM: the desired kernel_fun (string = linear, poly, rbf, sigmoid); for LR: the desired solver (string = liblinear, newton-cg, lbfgs, sag, saga)
- Returns
None
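The dispatch on classifier_name can be sketched as follows. This is an illustration under assumptions, not the class's actual code: the helper make_classifier is hypothetical, it returns the estimator instead of storing it (build_classifier itself returns None), and it does not fit the estimator on any data. The defaults mirror those documented for the three private builder methods.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def make_classifier(classifier_name, *args):
    """Return an unfitted scikit-learn estimator for 'KNN', 'SVM' or 'LR'.

    Hypothetical helper: the first positional argument, if given,
    overrides the documented default of the chosen classifier.
    """
    if classifier_name == "KNN":
        k = args[0] if args else 5
        return KNeighborsClassifier(n_neighbors=k)
    if classifier_name == "SVM":
        kernel_fun = args[0] if args else "linear"
        return SVC(kernel=kernel_fun)
    if classifier_name == "LR":
        solver = args[0] if args else "liblinear"
        return LogisticRegression(solver=solver)
    raise ValueError(f"Unknown classifier_name: {classifier_name!r}")


knn = make_classifier("KNN", 3)      # KNN with k=3
svm = make_classifier("SVM", "rbf")  # SVM with an RBF kernel
lr = make_classifier("LR")           # LR with the default solver
print(type(knn).__name__, type(svm).__name__, type(lr).__name__)
```

Raising ValueError for an unknown name is a design choice of this sketch; the reference does not say how the class handles an unsupported classifier_name.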
- describe_dataframe()
Describes the internal dataframe. (Not used in Streamlit, but helpful for other applications)
- Returns
pd.DataFrame.describe()
- drop_dataframe_column(column_name)
This function drops a given column from the dataset. (Not used in Streamlit, but helpful for other applications)
- Parameters
column_name (str) – Name of the column to drop from the dataset
- Returns
None
- get_classifer()
Returns the classifier that was built by the build_classifier method. (Not used in Streamlit, but helpful for other applications)
- Returns
The scikit-learn classifier object
- show_classifier_accuracy()
The third of the three important class methods. Shows the accuracy of the classifier by reporting the accuracy for each class and by showing a correlation matrix.
- Returns
None
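Per-class accuracy can be derived from a matrix of true versus predicted labels. The reference calls this a correlation matrix; the sketch below assumes a confusion matrix is meant, since that is the usual companion to per-class accuracy. The label arrays are made-up illustration data, and the class's real output format may differ.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels for a three-class problem
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 0, 2])

# Rows are true classes, columns are predicted classes
matrix = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])

# Accuracy per class: correctly predicted samples of a class
# divided by all samples that truly belong to that class
per_class_accuracy = matrix.diagonal() / matrix.sum(axis=1)

print(matrix)
for label, accuracy in zip([0, 1, 2], per_class_accuracy):
    print(f"class {label}: {accuracy:.2f}")
```

Here class 1 is always recognized, while classes 0 and 2 each lose one of their three samples to another class.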
- show_correlation(figsize=(5, 4))
Shows the correlation of the given dataset. (Not used in Streamlit, but helpful for other applications)
- Parameters
figsize (list) – (optional, default=(5, 4)) x and y size of the plotted figure
- Returns
None
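The numbers behind such a correlation plot can be sketched with pandas alone. This assumes pairwise Pearson correlation as computed by DataFrame.corr; the plotting step and the figsize argument are left out, and the toy data is hypothetical.

```python
import pandas as pd

# Hypothetical data: y rises with x, z falls as x rises
toy = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1]})

# Pairwise Pearson correlation between all numeric columns
correlation = toy.corr()
print(correlation.round(2))
```

Values close to 1 or -1 indicate strongly linearly related columns, which is the kind of redundancy a correlation plot is meant to reveal before training.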
- show_dataframe_head()
Returns the head of the internal dataframe. (Not used in Streamlit, but helpful for other applications)
- Returns
pd.DataFrame.head()
- show_unique_values()
Shows the unique values of each column of the internal dataset. (Not used in Streamlit, but helpful for other applications)
- Returns
None
- split_train_test(y_column_name, test_size=0.2, random_state=0, upsample=False, scaling=False, deleting_na=False)
The first of the three important class methods. This function splits the input dataframe into training and test data and provides three optional data preparation steps:
rows with missing (NA) values are deleted
the data is upsampled (see: _upsample_dataset)
the data is scaled to values between 0 and 1
- Parameters
y_column_name (str) – Name of the target column
test_size (float,optional) – Fraction of the data used as test data, default=0.2
random_state (int,optional) – Value for the seed, default=0
upsample (boolean,optional) – If True, the data is upsampled, default = False
scaling (boolean,optional) – If True, the data is scaled, default = False
deleting_na (boolean,optional) – If True, all NA data is deleted, default = False
- Returns
None
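The preparation and split steps can be sketched with pandas and scikit-learn. This is a sketch under assumptions: the function name split_prepared is hypothetical, the upsampling step is left out, the four parts are returned rather than stored, and the scaler is fitted on the training data only (the reference does not say whether the class scales before or after the split).

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


def split_prepared(dataframe, y_column_name, test_size=0.2, random_state=0,
                   scaling=False, deleting_na=False):
    """Optionally drop NA rows, split into train/test, optionally scale."""
    if deleting_na:
        # Remove every row that contains at least one missing value
        dataframe = dataframe.dropna()

    X = dataframe.drop(columns=[y_column_name])
    y = dataframe[y_column_name]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )

    if scaling:
        # Assumption: fit on the training data only, so test values
        # may fall slightly outside the range 0 to 1
        scaler = MinMaxScaler()
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)

    return X_train, X_test, y_train, y_test


# Hypothetical toy data: 11 rows, the last one has a missing value
toy = pd.DataFrame({
    "a": [float(i) for i in range(10)] + [float("nan")],
    "b": [10.0 * i for i in range(1, 12)],
    "label": [0, 1] * 5 + [0],
})
X_train, X_test, y_train, y_test = split_prepared(
    toy, "label", scaling=True, deleting_na=True
)
print(X_train.shape, X_test.shape)
```

With deleting_na=True the incomplete row is dropped first, so the default test_size of 0.2 splits the remaining 10 rows into 8 training and 2 test rows.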