KNN-, SVM- and LR-Classification Class

class ClassificationClass.Classification(dataframe, column_names=[])

This class builds one of three possible classifiers (KNN, LR, SVM) from a given dataset.

_build_LogisticRegression_classifier(solver='liblinear')

Builds the Logistic Regression Classifier with the given dataframe.

Parameters

solver (string) – (optional, default=liblinear) Name of the Solver: liblinear, newton-cg, lbfgs, sag, saga.

Returns

sklearn.linear_model._logistic.LogisticRegression

_build_SVM_classifier(kernel_fun='linear')

Builds the Support Vector Machine Classifier with the given dataframe.

Parameters

kernel_fun (string) – (optional, default=linear) Name of the kernel function: linear, poly, rbf, sigmoid.

Returns

sklearn.svm.SVC

_build_knn_classifier(k=5)

Builds the K-Nearest Neighbor Classifier with the given dataframe.

Parameters

k (int) – (optional, default=5) Number of neighbors (the value of k).

Returns

sklearn.neighbors.KNeighborsClassifier

_upsample_dataset(y_column_name)

Upsamples the dataset with respect to the target column.

Parameters

y_column_name (string) – Name of the target column of the dataset

Returns

pandas.core.frame.DataFrame – The upsampled dataframe
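The documentation does not say how the upsampling is performed. One common approach is to resample the smaller classes with replacement until every class is as large as the majority class. The following is a minimal sketch under that assumption, using only pandas; the function name upsample_minority_classes and the example data are hypothetical and not part of the class.

```python
import pandas as pd


def upsample_minority_classes(df: pd.DataFrame, y_column_name: str,
                              random_state: int = 0) -> pd.DataFrame:
    """Hypothetical helper: grow every class to the size of the largest class."""
    target_size = df[y_column_name].value_counts().max()
    parts = []
    for _, group in df.groupby(y_column_name):
        deficit = target_size - len(group)
        if deficit > 0:
            # Draw the missing rows from the same class, with replacement.
            extra = group.sample(n=deficit, replace=True,
                                 random_state=random_state)
            group = pd.concat([group, extra])
        parts.append(group)
    return pd.concat(parts, ignore_index=True)


# Example data (assumption): four rows of class 0, two rows of class 1.
df = pd.DataFrame({"feature": [1, 2, 3, 4, 5, 6],
                   "label": [0, 0, 0, 0, 1, 1]})
balanced = upsample_minority_classes(df, "label")
```

In this sketch the original rows are kept and only the minority class receives duplicated rows, so no information from the majority class is discarded.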

build_classifier(classifier_name, *args)

The second of the three important class methods. This function builds the heart of the class: the classifier. The classifier_name argument selects one of three classifier types.

Parameters
  • classifier_name (str) – Name of the desired classifier. Three options: KNN = K-Nearest Neighbor, SVM = Support Vector Machine, LR = Logistic Regression

  • args – Argument depending on the chosen classifier. For KNN: the value of k (int); for SVM: the desired kernel_fun (string = linear, poly, rbf, sigmoid); for LR: the desired solver (string = liblinear, newton-cg, lbfgs, sag, saga)

Returns

None
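How classifier_name and args could map onto the three scikit-learn estimators can be sketched as follows. This is an illustration, not the class's actual implementation: the function make_classifier is hypothetical, and unlike build_classifier (which returns None) it returns the unfitted estimator so that the result can be inspected. The defaults mirror those documented for the private builder methods.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def make_classifier(classifier_name: str, *args):
    """Hypothetical stand-in for build_classifier; returns an unfitted estimator."""
    if classifier_name == "KNN":
        k = args[0] if args else 5                   # documented default: k=5
        return KNeighborsClassifier(n_neighbors=k)
    if classifier_name == "SVM":
        kernel_fun = args[0] if args else "linear"   # documented default: linear
        return SVC(kernel=kernel_fun)
    if classifier_name == "LR":
        solver = args[0] if args else "liblinear"    # documented default: liblinear
        return LogisticRegression(solver=solver)
    raise ValueError(f"Unknown classifier_name: {classifier_name!r}")


knn = make_classifier("KNN", 3)
svm = make_classifier("SVM", "rbf")
lr = make_classifier("LR")
```

Raising a ValueError for an unknown name is a design choice of this sketch; the documentation does not state how the class reacts to an unsupported classifier_name.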

describe_dataframe()

Describes the internal dataframe. (Not used in Streamlit, but helpful for other applications.)

Returns

pd.DataFrame.describe()

drop_dataframe_column(column_name)

This function drops a given column from the dataset. (Not used in Streamlit, but helpful for other applications)

Parameters

column_name (str) – Name of the column to drop from the dataset

Returns

None

get_classifer()

Returns the classifier which was built in the build_classifier method. (Not used in Streamlit, but helpful for other applications.)

Returns

sklearn.classifier

show_classifier_accuracy()

The third of the three important class methods. Shows the accuracy of the classifier by giving the accuracy for each class and by showing a correlation matrix.

Returns

None
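The documentation does not specify how the accuracy for each class is computed. One plausible reading is the share of samples of each true class that were predicted correctly, which can be read off a confusion matrix. The sketch below works under that assumption; the label arrays are made-up example data, and the use of sklearn.metrics.confusion_matrix is a choice of this sketch, not a confirmed detail of the class.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up true and predicted labels for three classes (assumption).
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 0, 2])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

# Correct predictions per class divided by the number of samples in that class.
per_class_accuracy = cm.diagonal() / cm.sum(axis=1)

for label, acc in enumerate(per_class_accuracy):
    print(f"class {label}: {acc:.2f}")
```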

show_correlation(figsize=(5, 4))

Shows the correlation of the given dataset. (Not used in Streamlit, but helpful for other applications.)

Parameters

figsize (list) – (optional, default=(5, 4)) x and y size of the plotted figure

Returns

None

show_dataframe_head()

Returns the head of the internal dataframe. (Not used in Streamlit, but helpful for other applications.)

Returns

pd.DataFrame.head()

show_unique_values()

Shows the unique values of each column of the internal dataset. (Not used in Streamlit, but helpful for other applications.)

Returns

None

split_train_test(y_column_name, test_size=0.2, random_state=0, upsample=False, scaling=False, deleting_na=False)

The first of the three important class methods. This function splits the input dataframe into training and test data and provides three optional steps for data preparation:

  1. data which is not available (NA) is deleted

  2. data is upsampled (see: _upsample_dataset)

  3. data is scaled to a value between 0 and 1

Parameters
  • y_column_name (str) – Name of the target column

  • test_size (float, optional) – Fraction of the data used as test data, default=0.2

  • random_state (int, optional) – Value for the seed, default=0

  • upsample (boolean, optional) – If True, the data is upsampled, default=False

  • scaling (boolean, optional) – If True, the data is scaled, default=False

  • deleting_na (boolean, optional) – If True, all NA data is deleted, default=False

Returns

None
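The preparation and split described for split_train_test can be approximated with pandas and scikit-learn as shown below. This is a sketch under several assumptions: the function prepare_and_split is hypothetical, it returns the splits instead of storing them (the documented method returns None), upsampling is left out, and the scaler is fitted on the training split only. The documentation does not say in which order the class applies these steps.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


def prepare_and_split(df, y_column_name, test_size=0.2, random_state=0,
                      scaling=False, deleting_na=False):
    """Hypothetical sketch of the optional preparation steps and the split."""
    if deleting_na:
        df = df.dropna()  # drop every row that contains a missing value
    X = df.drop(columns=[y_column_name])
    y = df[y_column_name]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    if scaling:
        # Scale features to [0, 1] using the training data's min and max.
        scaler = MinMaxScaler()
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)
    return X_train, X_test, y_train, y_test


# Example data (assumption): eleven rows, one of them with a missing value.
df = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, None, 7.0, 8.0, 9.0, 10.0, 11.0],
    "x2": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0],
    "label": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})
X_train, X_test, y_train, y_test = prepare_and_split(
    df, "label", scaling=True, deleting_na=True)
```

Fitting the scaler on the training split alone keeps information from the test data out of the preprocessing; as a consequence, scaled test values can fall slightly outside the 0 to 1 range.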