Neural Network Training using Scikit-Learn

This notebook trains a neural network to classify the Iris dataset using MLPClassifier from Scikit-Learn rather than hand-written training functions.


import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix, classification_report
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Load the pre-split train/test arrays from a NumPy .npz archive
data = np.load("iris_train_test_data.npz")

# Extract the feature matrices and the label arrays
X_train, y_train = data["X_train"], data["y_train"]
X_test, y_test = data["X_test"], data["y_test"]

print("Dataset loaded successfully.")

Dataset loaded successfully.
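
The notebook does not show how iris_train_test_data.npz was produced. The following is a minimal sketch of one way such a file could be built, assuming the labels are one-hot encoded and 45 of the 150 samples are held out for testing (which matches the support column in the report at the end). The split seed, the absence of feature scaling, and the temporary output path are all assumptions made for illustration, so the resulting split will not reproduce the original file.

```python
import os
import tempfile

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load Iris: X has shape (150, 4), y is a 1-D vector of labels 0, 1, 2
X, y = load_iris(return_X_y=True)

# One-hot encode the labels (assumption: the original file stores them this way)
Y = np.eye(3, dtype=int)[y]

# Hold out 45 samples for testing (assumed split size and seed)
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=45, random_state=0
)

# Write to a temporary directory so that no existing file is overwritten
path = os.path.join(tempfile.mkdtemp(), "iris_train_test_data.npz")
np.savez(path, X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test)

# Reload and check the array shapes
with np.load(path) as data:
    shapes = {key: data[key].shape for key in data.files}
print(shapes)
```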

Defining and Training the Neural Network

We will use MLPClassifier from Scikit-Learn, which is a multi-layer perceptron classifier.

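Hidden Layer Size: We use one hidden layer with 5 neurons (hidden_layer_sizes=(5,)).
Activation Function: The hidden layer uses the sigmoid function (activation='logistic'). The output activation cannot be set by the user: Scikit-Learn uses softmax for a 1-D vector of multiclass labels and one independent sigmoid per class for a 2-D indicator target, such as the one-hot labels used here.
Optimizer: The weights are updated with Adam (solver='adam').
Learning Rate: The initial learning rate is set to 0.01 (learning_rate_init=0.01).
Batch Size: The mini-batch size is set to 10 (batch_size=10).
Epochs (Max Iterations): Training runs for at most 1000 epochs (max_iter=1000); it stops earlier if the training loss stops improving.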

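MLPClassifier always minimizes log-loss (cross-entropy); there is no loss parameter to set. What changes with the type of problem is the output activation, which Scikit-Learn chooses from the format of the target variable:

If y_train is a 1-D array of class labels with more than two classes, the output layer uses softmax and the loss is categorical cross-entropy.
If y_train is a 2-D indicator matrix, which includes one-hot encoded labels, the problem is treated as multilabel: each output unit gets its own sigmoid and the loss is binary log-loss summed over the output units. This is the case in this notebook, because the labels are one-hot encoded (hence the argmax calls in the evaluation code).
For binary classification with a 1-D target, a single sigmoid output unit is trained with binary log-loss.
For regression problems, MLPRegressor uses an identity output activation and squared error loss (half the mean squared error).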
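
The dependence of the output layer on the target format can be checked directly through the fitted attribute out_activation_. The sketch below is illustrative only: it loads Iris from sklearn.datasets instead of the notebook's .npz file and uses a short training run (max_iter=50, random_state=0 are arbitrary choices), since only the model configuration matters here.

```python
import warnings

import numpy as np
from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)    # y: 1-D vector of labels 0, 1, 2
Y_onehot = np.eye(3, dtype=int)[y]   # same labels as a 2-D indicator matrix

# Short run; convergence is irrelevant for inspecting the output activation
params = dict(hidden_layer_sizes=(5,), activation="logistic",
              max_iter=50, random_state=0)

with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    clf_labels = MLPClassifier(**params).fit(X, y)
    clf_onehot = MLPClassifier(**params).fit(X, Y_onehot)

print(clf_labels.out_activation_)        # softmax
print(clf_onehot.out_activation_)        # logistic
print(clf_labels.predict(X[:2]).shape)   # (2,)   one label per sample
print(clf_onehot.predict(X[:2]).shape)   # (2, 3) one 0/1 indicator row per sample
```

If softmax outputs are wanted, one option is to pass integer labels to fit, for example y_train.argmax(axis=1), in which case predict returns class labels directly.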


# Define the MLPClassifier model
mlp = MLPClassifier(hidden_layer_sizes=(5,),
                    activation='logistic',   # Sigmoid activation in the hidden layer
                    solver='adam',           # Optimizer
                    learning_rate_init=0.01,
                    batch_size=10,
                    max_iter=1000,           # Upper bound on the number of epochs
                    random_state=123)        # Fixed seed for reproducible weights

# Train the model
mlp.fit(X_train, y_train)

print("Training complete.")

Training complete.
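
Since max_iter is only an upper bound, it can be useful to check how long training actually ran. With the default settings (tol=1e-4, n_iter_no_change=10), the Adam solver stops once the training loss has failed to improve by at least tol for 10 consecutive epochs. The sketch below shows the relevant fitted attributes; it trains on the full Iris data from sklearn.datasets as a stand-in for the notebook's training split, so the numbers it prints will differ from those of the model above.

```python
import warnings

from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)  # stand-in data, not the notebook's split

clf = MLPClassifier(hidden_layer_sizes=(5,), activation="logistic",
                    solver="adam", learning_rate_init=0.01,
                    batch_size=10, max_iter=1000, random_state=123)

# Silence the warning that is raised if max_iter is reached before convergence
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    clf.fit(X, y)

print("epochs run:", clf.n_iter_)
print("final training loss:", clf.loss_)
print("stopped before max_iter:", clf.n_iter_ < clf.max_iter)

# loss_curve_ holds one training-loss value per epoch and can be plotted
# to inspect convergence
print("loss values recorded:", len(clf.loss_curve_))
```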

Model Evaluation

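We now evaluate the trained model on the test set (X_test, y_test). Because the labels are one-hot encoded, both the true and the predicted labels are converted to class indices with argmax before the confusion matrix and the classification report are computed.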


# Class names in label-index order
class_names = ['setosa', 'versicolor', 'virginica']

# Predict on test set; with one-hot targets, predict() returns a 0/1 indicator matrix
y_pred = mlp.predict(X_test)

# Convert one-hot rows to integer class indices
# (note: argmax maps a predicted row that contains no 1 to class 0)
y_true_idx = y_test.argmax(axis=1)
y_pred_idx = y_pred.argmax(axis=1)

# Compute confusion matrix (rows: actual class, columns: predicted class)
cm = confusion_matrix(y_true_idx, y_pred_idx)

# Convert to DataFrame for better visualization
df_cm = pd.DataFrame(cm, index=class_names, columns=class_names)

# Plot confusion matrix
plt.figure(figsize=(6, 6))
sns.heatmap(df_cm, annot=True, fmt="d", cmap="Blues", linewidths=0.5)
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.title('Confusion Matrix')
plt.show()

# Display classification report
print(classification_report(y_true_idx, y_pred_idx, target_names=class_names))

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        14
  versicolor       0.93      0.88      0.90        16
   virginica       0.88      0.93      0.90        15

    accuracy                           0.93        45
   macro avg       0.94      0.94      0.94        45
weighted avg       0.93      0.93      0.93        45
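
One caveat about the y_pred.argmax(axis=1) step in the evaluation code: with one-hot targets, predict thresholds each sigmoid output at 0.5 independently, so a predicted row can contain no 1 at all, or more than one. Taking argmax of such a row silently picks class 0 in the first case and the lowest-indexed active class in the second. The toy example below uses made-up probabilities (not output from the model above) to show the difference from taking argmax over the probabilities themselves, which is what mlp.predict_proba(X_test).argmax(axis=1) would do.

```python
import numpy as np

# Hypothetical per-class sigmoid outputs for three samples (illustrative values)
proba = np.array([
    [0.90, 0.05, 0.10],   # confident: class 0
    [0.10, 0.40, 0.45],   # no output above 0.5
    [0.20, 0.60, 0.70],   # two outputs above 0.5
])

# Indicator matrix obtained by thresholding each output at 0.5
indicator = (proba > 0.5).astype(int)
print(indicator)

# argmax on the thresholded rows: ties resolve to the first index
from_indicator = indicator.argmax(axis=1)
print(from_indicator)   # [0 0 1]

# argmax on the probabilities: picks the most probable class
from_proba = proba.argmax(axis=1)
print(from_proba)       # [0 2 2]
```

Whether this affects the reported metrics depends on how many test rows fall into these two cases, which the notebook output does not show.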