KNN, SVM and LR Classification Class

class ClassificationClass.Classification(dataframe, column_names=[])

This class builds one of three possible classifiers (KNN, LR, SVM) from a given dataset.

_build_LogisticRegression_classifier(solver='liblinear')

Builds the Logistic Regression Classifier with the given dataframe.

Parameters: solver (string) – (optional, default=liblinear) Name of the Solver: liblinear, newton-cg, lbfgs, sag, saga.
Returns: sklearn.linear_model._logistic.LogisticRegression

_build_SVM_classifier(kernel_fun='linear')

Builds the Support Vector Machine Classifier with the given dataframe.

Parameters: kernel_fun (string) – (optional, default=linear) Name of the kernel function: linear, poly, rbf, sigmoid.
Returns: sklearn.svm.SVC

_build_knn_classifier(k=5)

Builds the K-Nearest Neighbor Classifier with the given dataframe.

Parameters: k (int) – (optional, default=5) Number of neighbors (k) used for classification
Returns: sklearn.neighbors.KNeighborsClassifier

_upsample_dataset(y_column_name)

Upsamples the internal dataset with respect to the target column.

Parameters: y_column_name (string) – Name of the target column of the dataset
Returns: pandas.core.frame.DataFrame – The upsampled dataframe
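
The documentation does not state how the upsampling is carried out. One common approach, shown here purely as an assumption, is to resample each minority class with replacement until it matches the size of the largest class. The helper name upsample_minority_classes and the toy data are hypothetical and not part of the class.

```python
import pandas as pd


def upsample_minority_classes(df, y_column_name, random_state=0):
    """Hypothetical helper: resample each class up to the size of the largest class."""
    target_size = df[y_column_name].value_counts().max()
    parts = []
    for _, group in df.groupby(y_column_name):
        if len(group) < target_size:
            # Draw the missing number of rows (with replacement) from this class.
            extra = group.sample(
                n=target_size - len(group), replace=True, random_state=random_state
            )
            group = pd.concat([group, extra])
        parts.append(group)
    return pd.concat(parts).reset_index(drop=True)


# Toy data: class "a" has four rows, class "b" only two.
toy = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5, 6], "label": ["a", "a", "a", "a", "b", "b"]}
)
balanced = upsample_minority_classes(toy, "label")
print(balanced["label"].value_counts().to_dict())
```

After the call, both classes contain the same number of rows; the original rows are kept and only duplicates of minority-class rows are added.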

build_classifier(classifier_name, *args)

The second of the three important class methods. It builds the heart of the class: the classifier. The classifier_name argument selects one of three classifier types.

Parameters

classifier_name (str) – Name of the desired classifier. Three options: KNN = K-Nearest Neighbor, SVM = Support Vector Machine, LR = Logistic Regression
args – Argument depending on the chosen classifier. For KNN: the value of k (int); for SVM: the desired kernel_fun (string = linear, poly, rbf, sigmoid); for LR: the desired solver (string = liblinear, newton-cg, lbfgs, sag, saga)

Returns

None
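
The dispatch on classifier_name can be pictured roughly as follows. This is a sketch and not the class's actual code: the function name make_classifier is hypothetical, and mapping k, kernel_fun and solver onto scikit-learn's n_neighbors, kernel and solver arguments is an assumption based on the parameter descriptions above.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def make_classifier(classifier_name, *args):
    """Hypothetical helper: return an unfitted scikit-learn classifier by name."""
    if classifier_name == "KNN":
        k = args[0] if args else 5  # documented default of k
        return KNeighborsClassifier(n_neighbors=k)
    if classifier_name == "SVM":
        kernel_fun = args[0] if args else "linear"  # documented default kernel
        return SVC(kernel=kernel_fun)
    if classifier_name == "LR":
        solver = args[0] if args else "liblinear"  # documented default solver
        return LogisticRegression(solver=solver)
    raise ValueError(f"Unknown classifier name: {classifier_name!r}")


knn = make_classifier("KNN", 3)
svm = make_classifier("SVM", "rbf")
lr = make_classifier("LR")
print(type(knn).__name__, type(svm).__name__, type(lr).__name__)
```

Raising a ValueError for an unknown name is a design choice of this sketch; the documentation does not say how the class reacts to an unsupported classifier_name.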

describe_dataframe(): Describes the internal dataframe. (Not used in Streamlit, but helpful for other applications) :return: pd.Dataframe.describe()

drop_dataframe_column(column_name)

This function drops a given column from the dataset. (Not used in Streamlit, but helpful for other applications)

Parameters: column_name (str) – Name of the column to drop from the dataset
Returns: None

get_classifer(): Returns the Classifier which was build in the build_classifier method. (Not used in Streamlit, but helpful for other applications) :return: sklearn.classifier

show_classifier_accuracy()

The third of the three important class methods. It shows the accuracy of the classifier by reporting the accuracy for each class and by showing a correlation matrix.

Returns: None
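
How the per-class accuracy is computed is not stated. A minimal sketch, assuming it means the fraction of correctly predicted samples within each true class, derives it from scikit-learn's confusion_matrix. Using a confusion matrix here is an assumption of this example, and the labels are made up for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up true and predicted labels for three classes.
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 0, 2]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

# Correct predictions (diagonal) divided by the number of samples per true class.
per_class_accuracy = np.diag(cm) / cm.sum(axis=1)

for label, acc in enumerate(per_class_accuracy):
    print(f"class {label}: {acc:.2f}")
```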

show_correlation(figsize=(5, 4))

Shows the correlation of the given dataset. (Not used in Streamlit, but helpful for other applications)

Parameters: figsize (list, optional) – x and y size of the plotted figure, default=(5, 4)
Returns: None

show_dataframe_head(): Returns the Head of the internal dataframe. (Not used in Streamlit, but helpful for other applications) :return: pd.Dataframe.head()

show_unique_values(): Shows the unique values of each column of the internal dataset (Not used in Streamlit, but helpful for other applications) :return: None

split_train_test(y_column_name, test_size=0.2, random_state=0, upsample=False, scaling=False, deleting_na=False)

The first of the three important class methods. This function splits the input dataframe into training and test data and provides three optional steps for data preparation:

Rows with unavailable (NA) data are deleted

The data is upsampled (see: _upsample_dataset)

The data is scaled to values between 0 and 1

Parameters

y_column_name (str) – Name of the target column
test_size (float, optional) – Fraction of the data used as test data, default=0.2
random_state (int, optional) – Seed for the random split, default=0
upsample (boolean, optional) – If True, the data is upsampled, default=False
scaling (boolean, optional) – If True, the data is scaled, default=False
deleting_na (boolean, optional) – If True, all NA data is deleted, default=False

Returns

None
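
The preparation steps can be sketched with pandas and scikit-learn as shown below. This only illustrates the described behaviour under stated assumptions: the toy dataframe is made up, upsampling is left out, and fitting the scaler on the training part only is a choice of this sketch, since the documentation does not say in which order or on which subset the class applies scaling.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Made-up data with one missing value in column "f1".
df = pd.DataFrame({
    "f1": [1.0, 2.0, None, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0],
    "f2": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0],
    "target": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})

# Corresponds to deleting_na=True: drop rows that contain missing values.
df = df.dropna()

# Separate the features from the target column (y_column_name="target").
X = df.drop(columns=["target"])
y = df["target"]

# Split with the documented defaults test_size=0.2 and random_state=0.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Corresponds to scaling=True: fit on the training data, then transform both parts.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

Because the scaler is fitted on the training data only, the scaled training features lie between 0 and 1, while test values can fall slightly outside that range.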