A4.3.3 Explain the role of hyperparameter tuning when evaluating supervised learning algorithms. (HL only)

• Accuracy, precision, recall and F1 score as evaluation metrics 
• The role of hyperparameter tuning on model performance 
• Overfitting and underfitting when training algorithms

The Big Idea

In supervised machine learning, training an algorithm involves not only learning patterns from data but also choosing the right configuration for the algorithm's behavior; this is where hyperparameter tuning comes in. Hyperparameters are settings that sit outside the model's learned parameters (like weights) and must be chosen before training begins, either by hand or by an automated search. They control aspects like model complexity (for example, tree depth), learning rate, and regularization strength.

Hyperparameter tuning is the process of systematically selecting the set of these values that gives the best model performance, so that the model avoids both underfitting and overfitting and generalizes well to unseen data.


Key Evaluation Metrics

Before tuning, we need a way to evaluate how well a model performs. The following metrics are essential when assessing classification models:

1. Accuracy

  • Measures the proportion of total correct predictions.
$$\text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{Total Predictions}}$$
  • Works well when classes are balanced; on imbalanced data it can look high even when most of the minority class is misclassified.

2. Precision

  • Measures how many predicted positives were actually correct.
$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$

3. Recall (Sensitivity)

  • Measures how many actual positives were correctly predicted.
$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$

4. F1 Score

  • Harmonic mean of precision and recall—useful when you need to balance both.
$$\text{F1} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

Example: In spam detection, high precision ensures few false spam alerts (important emails aren't lost), while high recall ensures most spam is caught.
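As a rough illustration of how the four formulas above fit together, the following Python sketch computes them from confusion-matrix counts. The function name `classification_metrics` and the spam-filter counts are made up for this example; they are assumptions, not results from a real dataset.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    # Guard each division so an empty denominator yields 0.0 instead of an error.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical spam filter evaluated on 1000 emails, 50 of which are spam.
metrics = classification_metrics(tp=30, fp=10, fn=20, tn=940)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, accuracy is 0.97 while recall is only 0.60, which shows how accuracy can look strong on imbalanced data even though 20 of the 50 spam emails slip through.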


The Role of Hyperparameter Tuning

What Are Hyperparameters?

Hyperparameters control the learning process and structure of the model. Unlike model parameters (e.g., weights), they are not learned during training—they are set before training begins.

Examples:

  • k in K-Nearest Neighbours
  • Max depth in decision trees
  • Learning rate in gradient descent
  • Regularization strength (e.g., L2 penalty)
  • Batch size and epochs in neural networks
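To make the distinction concrete, here is a minimal sketch, assuming a one-weight linear model trained by gradient descent. The weight `w` is a model parameter learned from the data, while `learning_rate` and `epochs` are hyperparameters fixed before training starts. The function name `train_linear` and the toy data are hypothetical.

```python
def train_linear(xs, ys, learning_rate, epochs):
    """Fit y ~ w * x by gradient descent on mean squared error."""
    w = 0.0  # model parameter: learned during training
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


# Toy data generated from y = 2x, so the ideal weight is 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Same data and same epochs; only the learning-rate hyperparameter changes.
w_good = train_linear(xs, ys, learning_rate=0.05, epochs=200)
w_slow = train_linear(xs, ys, learning_rate=0.0001, epochs=200)
w_diverged = train_linear(xs, ys, learning_rate=0.2, epochs=200)
print(w_good, w_slow, w_diverged)
```

On this toy data, the middle learning rate should bring `w` very close to 2, the tiny one leaves it far from 2 after the same number of epochs, and the large one overshoots on every step and diverges.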

Why Tuning Matters

The choice of hyperparameters can drastically affect:

  • Model accuracy on both training and validation sets
  • Training time
  • Generalization to new data

Tuning finds the “sweet spot” between:

Underfitting

  • Model is too simple to capture the underlying pattern.
  • High training and test error.
  • Example: A decision tree with max depth = 1.

Overfitting

  • Model is too complex and memorizes training data (including noise).
  • Low training error, but high test error.
  • Example: K-NN with k = 1. It classifies every training point perfectly but performs poorly on new data.
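The two extremes can be sketched with a small hand-written K-NN classifier. This is only an illustration under stated assumptions: the data is synthetic (one feature, with 20% of labels flipped as noise), and the helper names `make_data`, `knn_predict` and `accuracy` are invented for this example.

```python
import random
from collections import Counter


def make_data(n, rng, noise=0.2):
    """Synthetic 1-D data: label is 1 when x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label  # label noise that a very flexible model can memorize
        data.append((x, label))
    return data


def knn_predict(train, point, k):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda item: (item[0] - point) ** 2)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, data, k):
    """Fraction of points in `data` that K-NN (fitted on `train`) labels correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in data)
    return correct / len(data)


rng = random.Random(0)        # fixed seed so the run is repeatable
train = make_data(101, rng)   # odd size, so the majority class is never tied
test = make_data(100, rng)

for k in (1, 15, len(train)):
    print(f"k={k:3d}  train={accuracy(train, train, k):.2f}  "
          f"test={accuracy(train, test, k):.2f}")
```

With k = 1 each training point is its own nearest neighbour, so training accuracy is perfect even though the model has memorized the noisy labels. With k equal to the size of the training set, every prediction is simply the majority class, which is the underfitting extreme. A moderate k usually sits between the two.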

Common Tuning Strategies

  • Grid Search: Try every combination of hyperparameters in a predefined grid.
  • Random Search: Sample random combinations within a parameter range.
  • Bayesian Optimization: Use probabilistic models to predict better combinations based on past results.
  • Cross-Validation: Not a search method in itself, but the usual way to score each candidate setting during a search, by averaging performance across multiple data folds.
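As a minimal sketch of how grid search and cross-validation combine, the code below scores several candidate values of k for a hand-written K-NN classifier using 5-fold cross-validation. The synthetic data and the helper names `cross_val_accuracy` and `grid_search` are assumptions made for this example; a library implementation would differ in detail.

```python
import random
from collections import Counter


def knn_predict(train, point, k):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda item: (item[0] - point) ** 2)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(data, k_neighbours, folds=5):
    """Average validation accuracy of K-NN over `folds` train/validation splits."""
    scores = []
    for i in range(folds):
        # Fold i: every folds-th item is held out, the rest is used for training.
        validation = data[i::folds]
        training = [item for j, item in enumerate(data) if j % folds != i]
        correct = sum(knn_predict(training, x, k_neighbours) == y
                      for x, y in validation)
        scores.append(correct / len(validation))
    return sum(scores) / folds


def grid_search(data, candidate_ks, folds=5):
    """Score every candidate k and return (best_k, scores_by_k)."""
    scores = {k: cross_val_accuracy(data, k, folds) for k in candidate_ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic data: label is 1 when x > 0.5, with 20% of labels flipped as noise.
rng = random.Random(0)
data = []
for _ in range(100):
    x = rng.random()
    label = int(x > 0.5)
    if rng.random() < 0.2:
        label = 1 - label
    data.append((x, label))

candidate_ks = [1, 5, 15, 41]  # the "grid": odd values avoid tied votes
best_k, scores = grid_search(data, candidate_ks)
print("cross-validated accuracy by k:", scores)
print("best k:", best_k)
```

Here the grid has a single dimension. With several hyperparameters, the same loop would run over every combination of values, and replacing the fixed list with randomly sampled values would turn it into random search.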

Student-Relatable Example

Suppose you're building a model to predict if a student will pass or fail a course based on attendance, homework completion, and quiz scores.

  • You use a decision tree. If you set the max depth too low, the model can't capture the difference between diligent and non-diligent students (underfitting).
  • If you set the max depth too high, it memorizes the specific quirks of your training data—like one student who skipped class but aced every quiz (overfitting).
  • By tuning the depth with cross-validation, you find the best balance between generality and specificity, for example raising your F1 score from 0.65 to 0.81.

Summary

Hyperparameter tuning is a crucial process that controls how a supervised learning model behaves and how well it performs. By carefully selecting values like model depth, regularization strength, or learning rate, we can improve metrics like accuracy, precision, recall, and F1 score, and avoid the pitfalls of underfitting and overfitting. A well-tuned model doesn't just perform well on training data—it makes reliable predictions in the real world.