
Perceptron Learning Calculator

Calculates updated perceptron weights and bias after a single learning step using the perceptron learning rule.

Formula

w_i(new) = w_i(old) + η × (y − ŷ) × x_i
b(new) = b(old) + η × (y − ŷ)

Here w_i is the weight for input i; η (eta) is the learning rate; y is the true label (target output); ŷ is the predicted output from the step activation function; x_i is the i-th input feature; b is the bias term. The error term (y − ŷ) is 0 when the prediction is correct, +1 when the true label is 1 but the prediction was 0, and −1 when the true label is 0 but the prediction was 1.

Source: Rosenblatt, F. (1958). The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review, 65(6), 386–408.

How it works

The perceptron is the simplest form of an artificial neural network — a binary linear classifier that maps a vector of inputs to a single binary output. Given a set of inputs and their corresponding weights, the perceptron computes a weighted sum (the net input), compares it to a threshold, and fires a 1 if the sum meets or exceeds the threshold, or a 0 otherwise. This step-function activation is the hallmark of the classical perceptron model introduced by Rosenblatt.

The core learning rule updates each weight according to the formula: w_i(new) = w_i(old) + η × (y − ŷ) × x_i, where η is the learning rate (a small positive value controlling step size), y is the true class label, ŷ is the perceptron's prediction, and x_i is the corresponding input feature. The bias is updated analogously: b(new) = b(old) + η × (y − ŷ). When the prediction is correct, the error term (y − ŷ) equals zero and no update occurs. When the prediction is wrong, weights and bias shift in the direction that would have produced the correct output.
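The update rule can be sketched in a few lines of Python. This is an illustrative sketch, not the calculator's own code; `step` and `perceptron_update` are hypothetical names:

```python
def step(net, threshold=0.0):
    """Step (Heaviside) activation: fire 1 if the net input meets the threshold."""
    return 1 if net >= threshold else 0

def perceptron_update(weights, bias, x, y, eta=0.1):
    """One learning step: w_i += eta*(y - y_hat)*x_i and b += eta*(y - y_hat)."""
    net = sum(w * xi for w, xi in zip(weights, x)) + bias
    y_hat = step(net)
    error = y - y_hat                      # 0 when correct, so no update occurs
    new_weights = [w + eta * error * xi for w, xi in zip(weights, x)]
    new_bias = bias + eta * error
    return new_weights, new_bias, y_hat

# A correctly classified sample leaves the weights and bias untouched.
w, b, y_hat = perceptron_update([0.5, -0.3], 0.1, [1, 0], y=1)
print(w, b, y_hat)  # [0.5, -0.3] 0.1 1
```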

This calculator supports up to three input features (x₁, x₂, x₃) along with their respective weights and a bias term. It is used in machine learning coursework to trace through single-epoch updates, in AI research to prototype simple classifiers, and by practitioners who want to build intuition about gradient-descent-based learning before tackling multi-layer networks and backpropagation. Set x₃ and w₃ to 0 for a standard two-feature perceptron.

Worked example

Suppose a perceptron is learning a simple binary classification task. The current state is: w₁ = 0.5, w₂ = −0.3, b = 0.1, η = 0.1, and the activation threshold is 0.

A training sample arrives with inputs x₁ = 1, x₂ = 0, and true label y = 1.

Step 1 — Compute the weighted sum:
Net = (0.5 × 1) + (−0.3 × 0) + 0.1 = 0.5 + 0 + 0.1 = 0.6

Step 2 — Apply the step activation:
0.6 ≥ 0 (threshold), so ŷ = 1

Step 3 — Compute the error:
error = y − ŷ = 1 − 1 = 0

Step 4 — Update weights and bias:
Since error = 0, no update is needed.
w₁(new) = 0.5 + 0.1 × 0 × 1 = 0.5
w₂(new) = −0.3 + 0.1 × 0 × 0 = −0.3
b(new) = 0.1 + 0.1 × 0 = 0.1

Now consider a misclassified sample: x₁ = 1, x₂ = 1, true label y = 1, but with weights w₁ = −0.4, w₂ = −0.3, b = 0.1.
Net = (−0.4 × 1) + (−0.3 × 1) + 0.1 = −0.6 → ŷ = 0
error = 1 − 0 = 1
w₁(new) = −0.4 + 0.1 × 1 × 1 = −0.3
w₂(new) = −0.3 + 0.1 × 1 × 1 = −0.2
b(new) = 0.1 + 0.1 × 1 = 0.2
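The misclassified case can be checked with a short standalone snippet (results are rounded to absorb floating-point noise; the variable names are illustrative):

```python
eta = 0.1
w1, w2, b = -0.4, -0.3, 0.1          # current state from the example above
x1, x2, y = 1, 1, 1                  # the misclassified training sample

net = w1 * x1 + w2 * x2 + b          # -0.6
y_hat = 1 if net >= 0 else 0         # 0: prediction is wrong
err = y - y_hat                      # +1
w1, w2, b = (round(w1 + eta * err * x1, 10),
             round(w2 + eta * err * x2, 10),
             round(b + eta * err, 10))
print(w1, w2, b)  # -0.3 -0.2 0.2
```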

Limitations & notes

The classical perceptron learning rule is guaranteed to converge only when the training data is linearly separable, that is, when a hyperplane can perfectly divide the two classes. For non-linearly separable problems (such as XOR), the perceptron never converges and the weights oscillate indefinitely. This limitation motivated the development of multi-layer perceptrons (MLPs) and the backpropagation algorithm.

Additionally, this calculator computes a single learning step for up to three features; real-world perceptron training iterates over an entire dataset, potentially for many epochs. The step (Heaviside) activation used here is non-differentiable at zero and has zero gradient everywhere else, which rules out gradient-based optimisation and is why modern networks use smooth activations such as sigmoid or ReLU. The learning rate η must be chosen carefully: too large causes oscillation, too small causes slow convergence. Finally, the standard perceptron outputs only binary classes (0 or 1), making it unsuitable for multi-class or regression problems without extension.
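The full-dataset, multi-epoch training the calculator does not perform can be sketched as follows, here on the AND gate (a linearly separable toy dataset chosen for illustration, with an arbitrary η):

```python
# Multi-epoch perceptron training on the AND gate. Because the data is
# linearly separable, the loop is guaranteed to reach an error-free pass.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b, eta = [0.0, 0.0], 0.0, 0.1

for epoch in range(100):
    errors = 0
    for x, y in data:
        net = sum(wi * xi for wi, xi in zip(w, x)) + b
        err = y - (1 if net >= 0 else 0)      # step activation, threshold 0
        if err != 0:
            w = [wi + eta * err * xi for wi, xi in zip(w, x)]
            b += eta * err
            errors += 1
    if errors == 0:                           # a full pass with no updates: converged
        break

preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0 for x, _ in data]
print(preds)  # [0, 0, 0, 1]: the learned boundary implements AND
```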

Frequently asked questions

What is the perceptron learning rule?

The perceptron learning rule updates each weight by adding the product of the learning rate, the prediction error (true label minus predicted output), and the corresponding input value. Weights are adjusted only when the perceptron makes an incorrect prediction, nudging the decision boundary toward classifying the sample correctly.

What does the learning rate η do in the perceptron algorithm?

The learning rate η controls how large each weight update step is. A high learning rate makes large updates that can cause instability or overshooting, while a low learning rate makes small, cautious updates that may converge more slowly. Typical values range from 0.01 to 0.5 for simple perceptron tasks.

Why does the perceptron fail on the XOR problem?

The XOR function is not linearly separable — no single straight line can divide its outputs into two classes. The perceptron can only learn linear decision boundaries, so it cannot represent XOR. This fundamental limitation was famously highlighted by Minsky and Papert in 1969 and spurred research into multi-layer networks.
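The non-convergence is easy to demonstrate empirically: running the same update rule over the XOR truth table never produces an error-free pass, no matter how many epochs are allowed (a small sketch under the same step-activation assumptions as above):

```python
# Single-layer perceptron training on XOR: since no line separates the
# classes, no weight vector classifies all four samples, so an error-free
# pass is impossible and the weights oscillate forever.
xor_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
w, b, eta = [0.0, 0.0], 0.0, 0.1

converged = False
for epoch in range(1000):
    errors = 0
    for x, y in xor_data:
        net = sum(wi * xi for wi, xi in zip(w, x)) + b
        err = y - (1 if net >= 0 else 0)
        w = [wi + eta * err * xi for wi, xi in zip(w, x)]
        b += eta * err
        errors += abs(err)
    if errors == 0:
        converged = True
        break

print(converged)  # False
```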

What is the difference between the perceptron and logistic regression?

Both are linear binary classifiers, but they differ in activation and loss. The perceptron uses a hard step function and updates only on misclassified points, while logistic regression uses a sigmoid activation and minimises a probabilistic log-loss across all data points. Logistic regression produces probability estimates; the perceptron does not.

How is the bias term updated in a perceptron?

The bias is treated as a special weight connected to a constant input of 1. It is updated with the same rule as other weights: b(new) = b(old) + η × (y − ŷ). The bias allows the decision boundary to shift away from the origin, giving the perceptron greater flexibility in separating classes.
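The bias-as-weight view can be shown with the worked-example values from above, prepending a constant input of 1 so the bias is updated by exactly the same rule (illustrative variable names):

```python
# Fold the bias into the weight vector as w_0, paired with a constant input 1.
eta = 0.1
w_aug = [0.1, 0.5, -0.3]   # [b, w1, w2] from the worked example
x_aug = [1, 1, 0]          # constant 1, then x1 = 1, x2 = 0
y = 1

net = sum(wi * xi for wi, xi in zip(w_aug, x_aug))   # 0.1 + 0.5 + 0 = 0.6
y_hat = 1 if net >= 0 else 0
err = y - y_hat                                      # 0: prediction correct
w_aug = [wi + eta * err * xi for wi, xi in zip(w_aug, x_aug)]
print(y_hat, w_aug)  # 1 [0.1, 0.5, -0.3]
```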

Last updated: 2025-01-15 · Formula verified against primary sources.