A4.3.8 Outline the structure and function of ANNs and how multi-layer networks are used to model complex patterns in data sets. (HL only)

• An artificial neural network (ANN) uses interconnected nodes, or “neurons”, to process and learn from input data, enabling tasks such as classification, regression and pattern recognition

• Sketch of a single perceptron, highlighting its input, weights, bias, activation function and output

• Sketch of a multi-layer perceptron (MLP) encompassing the input layer, one or more hidden layers and the output layer

The Big Idea

Artificial Neural Networks (ANNs) are computational models inspired by the biological neurons in the human brain. They consist of layers of interconnected processing units, or artificial neurons, which learn to map input data to outputs through training. ANNs are capable of capturing complex, nonlinear relationships in data, making them powerful tools for classification, regression, and pattern recognition tasks.

While a single-layer network can solve only simple, linearly separable problems, modern ANNs typically use multi-layer architectures—called Multi-Layer Perceptrons (MLPs)—to model nonlinear and high-dimensional patterns in large datasets.

1. The Perceptron: The Fundamental Unit

A perceptron is the simplest type of artificial neuron. It computes a weighted sum of its inputs, adds a bias, and passes the result through an activation function to produce an output.

Structure of a Single Perceptron:

$$\text{Output} = \phi\left( \sum_{i=1}^{n} w_i x_i + b \right)$$

Where:

$x_i$ : Input values
$w_i$ : Weights corresponding to each input
$b$ : Bias term (shifts the weighted sum independently of the inputs, moving the activation threshold)
$\phi$ : Activation function (e.g., step, sigmoid, ReLU)

Function:

The weights determine the importance of each input.
The activation function introduces non-linearity, allowing the network to model complex patterns.
The perceptron outputs a signal that can be used for decision-making (e.g., classification).
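The computation described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the names `step` and `perceptron` are hypothetical, and the weights and bias are hand-picked (not learned) so that the unit behaves like a logical AND gate.

```python
def step(z):
    """Step activation: output 1 when the weighted sum reaches 0, else 0."""
    return 1 if z >= 0 else 0


def perceptron(inputs, weights, bias, activation=step):
    """Weighted sum of the inputs plus the bias, passed through the activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)


# Assumed, hand-picked parameters: both inputs must be 1 to overcome the bias.
AND_WEIGHTS = [1.0, 1.0]
AND_BIAS = -1.5

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", perceptron([x1, x2], AND_WEIGHTS, AND_BIAS))
```

Changing only the bias to -0.5 would turn the same unit into a logical OR, which shows how the bias moves the decision threshold without touching the weights.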

2. Sketch: Single Perceptron

Inputs:    x₁ ----->┐
                    │
           x₂ ----->┼── [ Σ wᵢxᵢ + b ] ──> Activation φ ──> Output
                    │
           xₙ ----->┘

Weights $w_1, w_2, \dots, w_n$ applied to inputs
Bias $b$ added
Activation function (e.g., ReLU, sigmoid)
Single scalar output
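These notes say that ANNs learn through training but do not describe a training procedure. As an illustration only, the sketch below uses the classic perceptron learning rule, which is brought in here as an assumption and is not part of the original notes. It nudges each weight by `learning_rate * error * input` whenever the prediction is wrong. The function names, the learning rate, the epoch count and the OR data set are all hypothetical choices.

```python
def step(z):
    """Step activation: output 1 when the weighted sum reaches 0, else 0."""
    return 1 if z >= 0 else 0


def predict(weights, bias, inputs):
    """Forward pass of a single perceptron."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(samples, learning_rate=1.0, epochs=20):
    """Perceptron learning rule: adjust weights and bias after each mistake."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)  # -1, 0 or +1
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Logical OR is linearly separable, so a single perceptron can learn it.
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_perceptron(OR_SAMPLES)

for inputs, target in OR_SAMPLES:
    print(inputs, "target", target, "predicted", predict(weights, bias, inputs))
```

The rule only settles on a solution when the classes can be separated by a straight line (or hyperplane). For a problem such as XOR it never converges, which is one motivation for adding hidden layers.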

3. Multi-Layer Perceptron (MLP): Modeling Complex Patterns

A Multi-Layer Perceptron is an ANN with at least one hidden layer between the input and output layers. Each layer consists of multiple neurons, and each neuron in one layer connects to every neuron in the next layer (fully connected architecture).

Structure:

Input Layer: Receives raw features (e.g., pixel values, sensor readings)
Hidden Layers: One or more layers of neurons that transform the data through nonlinear activation functions
Output Layer: Produces final prediction (e.g., class probabilities, regression output)

Activation functions (common choices):

ReLU (Rectified Linear Unit): $\phi(x) = \max(0, x)$
Sigmoid: $\phi(x) = \frac{1}{1 + e^{-x}}$
Tanh: $\phi(x) = \tanh(x)$
Softmax: Often used in the output layer for multi-class classification
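The activation functions listed above can be written directly with Python's standard `math` module. This is a minimal sketch for single values (and a list of scores for softmax). The function names are illustrative, and the softmax formula $e^{z_i} / \sum_j e^{z_j}$ is the standard definition rather than something stated in these notes.

```python
import math


def relu(z):
    """ReLU: passes positive values through, clips negatives to 0."""
    return max(0.0, z)


def sigmoid(z):
    """Sigmoid: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Tanh: squashes any real number into the range (-1, 1)."""
    return math.tanh(z)


def softmax(scores):
    """Softmax: turns a list of scores into probabilities that sum to 1."""
    largest = max(scores)
    # Subtracting the largest score avoids overflow in exp() without
    # changing the result.
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


print(relu(-2.0), relu(3.0))
print(sigmoid(0.0))
print(tanh(0.0))
print(softmax([1.0, 2.0, 3.0]))
```

Softmax differs from the other three in that it acts on a whole layer at once, which is why it suits an output layer where each neuron represents one class.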

4. Sketch: Multi-Layer Perceptron (MLP)

Input Layer       Hidden Layer(s)                Output Layer

  x₁   ──┐          ●        ●        ●          ┐
         ├───────>  ●        ●        ●  ───────>●─> y₁
  x₂   ──┤          ●        ●        ●          │
         └───────>  ●        ●        ●  ───────>●─> y₂
  ...               ↑        ↑        ↑
  xₙ   ──────────> (weighted connections + activations)

Each arrow represents a weight.
Each circle applies a weighted sum + bias + activation function.
Output layer neurons may represent classes (for classification) or real values (for regression).
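To make the layer-by-layer flow concrete, here is a small forward-pass sketch in plain Python. It is an illustration under stated assumptions: the helper names `dense_layer` and `mlp_forward` are hypothetical, and the weights are hand-picked rather than trained. They are chosen so that a network with one hidden layer of two ReLU neurons computes XOR, a pattern that no single perceptron can represent because it is not linearly separable.

```python
def relu(z):
    """ReLU activation used in the hidden layer."""
    return max(0.0, z)


def identity(z):
    """No activation: the output neuron returns its weighted sum directly."""
    return z


def dense_layer(inputs, weights, biases, activation):
    """Fully connected layer: every neuron sees every input."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs


def mlp_forward(inputs, layers):
    """Feed the inputs through each layer in turn."""
    values = inputs
    for weights, biases, activation in layers:
        values = dense_layer(values, weights, biases, activation)
    return values


# Assumed hand-picked parameters: 2 inputs -> 2 hidden neurons -> 1 output.
XOR_LAYERS = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0], relu),  # hidden layer
    ([[1.0, -2.0]], [0.0], identity),               # output layer
]

for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), "->", mlp_forward([x1, x2], XOR_LAYERS))
```

The second hidden neuron only activates when both inputs are 1, and the output layer subtracts twice its value to cancel the first neuron's response in that case. Without the ReLU in the hidden layer, the two layers would collapse into a single linear function and XOR could not be computed.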

5. Applications of ANNs

Classification

Image recognition (e.g., handwritten digit classification)
Sentiment analysis (e.g., predicting movie review polarity)
Disease detection (e.g., cancer vs. non-cancer)

Regression

House price prediction based on location, size, etc.
Estimating continuous sensor values from physical systems

Pattern Recognition

Speech recognition
Gesture recognition in video
Fraud detection

Summary

Artificial Neural Networks are loosely modeled on biological neural systems and process data through layers of interconnected nodes. The perceptron is the fundamental building block, and by combining multiple perceptrons into multi-layer networks with nonlinear activation functions, ANNs can model complex, nonlinear patterns in high-dimensional data. With broad applications across classification, regression, and pattern recognition, ANNs are foundational to modern AI, particularly in deep learning systems.