
Confusion Matrix Calculator

Calculates all key classification metrics — accuracy, precision, recall, F1 score, specificity, and MCC — from a binary confusion matrix of TP, FP, TN, and FN values.

Formula

TP (True Positives): correctly predicted positive instances.
TN (True Negatives): correctly predicted negative instances.
FP (False Positives): negative instances incorrectly predicted as positive (Type I error).
FN (False Negatives): positive instances incorrectly predicted as negative (Type II error).

Accuracy measures overall correctness. Precision measures what fraction of predicted positives are truly positive. Recall (Sensitivity) measures what fraction of actual positives were correctly identified. F1 Score is the harmonic mean of precision and recall, balancing both. Specificity measures what fraction of actual negatives were correctly identified. MCC (Matthews Correlation Coefficient) provides a balanced metric even on imbalanced datasets, ranging from -1 (inverse prediction) to +1 (perfect prediction).
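In symbols, the definitions above correspond to:

```latex
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + FP + TN + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \qquad
\text{Recall} = \frac{TP}{TP + FN} \qquad
\text{Specificity} = \frac{TN}{TN + FP} \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \\
\text{MCC} &= \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}
\end{aligned}
```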

Source: Powers, D.M.W. (2011). Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation. Journal of Machine Learning Technologies, 2(1), 37–63.

How it works

A confusion matrix is a 2×2 contingency table that summarizes the performance of a binary classification model by comparing its predicted labels against the actual ground-truth labels. The four cells — True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN) — capture every possible outcome of a binary prediction. From these four numbers, a rich suite of metrics can be derived that describe completely different aspects of classifier behaviour, from its tendency to produce false alarms to its ability to detect rare events.

Accuracy is the most intuitive metric, measuring the fraction of all predictions that were correct. However, on imbalanced datasets (e.g., 95% negative examples), a naive classifier that always predicts negative can achieve 95% accuracy while being completely useless. This is why precision, recall, specificity, F1 score, and the Matthews Correlation Coefficient (MCC) are critical complements. Precision (Positive Predictive Value) answers: of all predicted positives, how many were correct? Recall (Sensitivity, True Positive Rate) answers: of all actual positives, how many did the model find? The F1 score is the harmonic mean of precision and recall, penalising extreme imbalances between the two. Specificity (True Negative Rate) measures the model's ability to correctly reject negative instances. The MCC is widely regarded as the single most reliable metric for binary classification on imbalanced data, as it accounts for all four cells of the matrix and yields a value between -1 and +1.

These metrics are used across a vast range of domains. In medical diagnostics, recall (sensitivity) is paramount — missing a disease (FN) is typically far more costly than a false alarm (FP). In spam filtering, precision matters more — falsely flagging legitimate email as spam (FP) is annoying and costly. In fraud detection, both matter greatly. The False Positive Rate (FPR) and False Negative Rate (FNR) are also key inputs to Receiver Operating Characteristic (ROC) analysis and the construction of ROC curves, which visualise trade-offs across decision thresholds.

Worked example

Suppose a medical diagnostic model is tested on 145 patients. Of 55 patients who actually have the disease, the model correctly identifies 50 (TP = 50) and misses 5 (FN = 5). Of 90 healthy patients, the model correctly labels 80 (TN = 80) as healthy but incorrectly flags 10 (FP = 10) as diseased. Enter TP = 50, FP = 10, TN = 80, FN = 5.

Accuracy = (50 + 80) / (50 + 10 + 80 + 5) = 130 / 145 = 89.66%

Precision = 50 / (50 + 10) = 50 / 60 = 83.33% — of patients flagged as diseased, 83.33% truly are.

Recall = 50 / (50 + 5) = 50 / 55 = 90.91% — the model catches 90.91% of all actual disease cases.

Specificity = 80 / (80 + 10) = 80 / 90 = 88.89% — it correctly clears 88.89% of healthy patients.

F1 Score = 2 × (0.8333 × 0.9091) / (0.8333 + 0.9091) = 2 × 0.7576 / 1.7424 = 86.96%

MCC = (50 × 80 − 10 × 5) / √(60 × 55 × 90 × 85) = (4000 − 50) / √(25,245,000) = 3950 / 5024.4 ≈ 0.7862 — a strong positive correlation between predictions and reality.

The high MCC of 0.786 confirms the model is genuinely predictive, not just exploiting class imbalance.
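The arithmetic above can be double-checked in a few lines of Python (the variable names are mine, not the calculator's):

```python
import math

# Worked-example counts: TP = 50, FP = 10, TN = 80, FN = 5.
tp, fp, tn, fn = 50, 10, 80, 5

print(round((tp + tn) / (tp + fp + tn + fn), 4))  # accuracy    -> 0.8966
print(round(tp / (tp + fp), 4))                   # precision   -> 0.8333
print(round(tp / (tp + fn), 4))                   # recall      -> 0.9091
print(round(tn / (tn + fp), 4))                   # specificity -> 0.8889

p, r = tp / (tp + fp), tp / (tp + fn)
print(round(2 * p * r / (p + r), 4))              # F1  -> 0.8696

denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
print(round((tp * tn - fp * fn) / denom, 4))      # MCC -> 0.7862
```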

Limitations & notes

This calculator is designed for binary (two-class) classification only. Multi-class problems require an N×N confusion matrix and per-class metric averaging strategies (macro, micro, weighted), which are not covered here. All metrics assume a fixed decision threshold — typically 0.5 for probabilistic classifiers — and results will vary significantly if the threshold is tuned. When any denominator equals zero (e.g., TP + FP = 0 when no positives are predicted), precision and related metrics are undefined (NaN); the calculator flags this automatically. When any row or column of the confusion matrix sums to zero, the MCC denominator vanishes; the calculator then reports MCC = 0, a widely used convention (adopted, for example, by scikit-learn). For highly imbalanced datasets, even a large MCC should be interpreted alongside domain context. These metrics describe average performance on a test set and do not characterise model calibration, uncertainty, or behaviour on out-of-distribution data. Ensure that the test set is representative of the deployment population to avoid misleading conclusions.
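A sketch of the edge-case handling described above — undefined precision as NaN, and the MCC-equals-zero convention when the denominator vanishes (helper names are mine, not the calculator's):

```python
import math

def safe_precision(tp, fp):
    """Precision is undefined (NaN) when nothing is predicted positive."""
    return tp / (tp + fp) if tp + fp else float("nan")

def safe_mcc(tp, fp, tn, fn):
    """Return 0 by convention when any row/column sum is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(safe_precision(0, 0))    # nan: no positive predictions at all
print(safe_mcc(0, 0, 90, 10))  # 0.0: the predicted-positive column is empty
```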

Frequently asked questions

What is the difference between precision and recall, and which matters more?

Precision measures how often a positive prediction is correct (low FP rate), while recall measures how often actual positives are found (low FN rate). Which matters more is entirely domain-dependent: in cancer screening, high recall is critical to avoid missing cases; in email spam filtering, high precision is preferred to avoid blocking legitimate emails. The F1 score balances both when neither can be prioritised alone.

Why is accuracy a misleading metric for imbalanced datasets?

On an imbalanced dataset where 99% of samples are negative, a classifier that always predicts 'negative' achieves 99% accuracy while having 0% recall — it never detects a single positive case. Metrics like precision, recall, F1, and MCC account for the distribution of classes and provide a more honest picture of model performance.
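The always-negative classifier described above can be reproduced in a few lines (synthetic labels of my own construction):

```python
# 99 negatives, 1 positive; the model predicts negative for everything.
actual    = [0] * 99 + [1]
predicted = [0] * 100

tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))

accuracy = (tp + tn) / len(actual)
recall = tp / (tp + fn)
print(accuracy)  # 0.99 -- looks excellent
print(recall)    # 0.0  -- but it never finds a single positive
```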

What is a good MCC value for a binary classifier?

MCC ranges from -1 to +1. An MCC of 0 indicates performance no better than random guessing. Values above 0.5 are generally considered good, above 0.7 are strong, and above 0.9 are excellent. Unlike F1 score, MCC is symmetric with respect to both classes, making it particularly reliable for imbalanced problems.

What is the relationship between recall and sensitivity, and between specificity and the false positive rate?

Recall and sensitivity are exactly the same metric — both equal TP / (TP + FN). Specificity (TNR) and the False Positive Rate (FPR) are complements: FPR = 1 − Specificity = FP / (FP + TN). These relationships are foundational to ROC curve analysis, where the TPR (recall) is plotted against the FPR across decision thresholds.
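The complement identity can be verified numerically with the worked example's counts (FP = 10, TN = 80):

```python
fp, tn = 10, 80
specificity = tn / (tn + fp)  # TNR = 80/90
fpr = fp / (fp + tn)          # FPR = 10/90
print(round(specificity + fpr, 10))  # 1.0 -- FPR = 1 - specificity
```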

How do I choose the right classification threshold to optimise these metrics?

Most probabilistic classifiers output a score between 0 and 1, and the default threshold of 0.5 is rarely optimal. You can plot the ROC curve (TPR vs. FPR) or the Precision-Recall curve across all thresholds and choose the operating point that best matches your application's cost structure. Lowering the threshold increases recall but reduces precision; raising it does the opposite. Use this calculator to evaluate the full metric suite at each candidate threshold.
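As a minimal sketch of that sweep (the scores and labels are synthetic, invented for illustration), evaluating precision and recall at several candidate thresholds makes the trade-off concrete:

```python
# Synthetic classifier scores with their true labels (1 = positive).
scores = [0.95, 0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.1]
labels = [1,    1,   1,   0,   1,    0,   1,   0]

for threshold in (0.25, 0.5, 0.7):
    preds = [int(s >= threshold) for s in scores]
    tp = sum(p and l for p, l in zip(preds, labels))
    fp = sum(p and not l for p, l in zip(preds, labels))
    fn = sum((not p) and l for p, l in zip(preds, labels))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")

# Lowering the threshold trades precision for recall:
#   threshold=0.25: precision=0.71, recall=1.00
#   threshold=0.5:  precision=0.80, recall=0.80
#   threshold=0.7:  precision=1.00, recall=0.60
```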

Last updated: 2025-01-15 · Formula verified against primary sources.