Mathematics · Probability & Statistics · Inferential Statistics
T-Test Calculator
Calculates the one-sample t-statistic and degrees of freedom to test whether a sample mean differs significantly from a known or hypothesised population mean.
Calculator
Formula
t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}
\bar{x} is the sample mean, \mu_0 is the hypothesised population mean (null hypothesis value), s is the sample standard deviation, and n is the sample size. The denominator s / \sqrt{n} is the standard error of the mean. Degrees of freedom are given by df = n - 1.
Source: Student (W.S. Gosset), 'The Probable Error of a Mean', Biometrika, 6(1), 1908.
How it works
The one-sample t-test addresses the question: given a sample drawn from a population, is there sufficient evidence that the true population mean differs from some pre-specified value μ0? The procedure rests on the assumption that the underlying data are approximately normally distributed — a condition that, by the Central Limit Theorem, holds reasonably well for sample sizes of 30 or more even when the population distribution is not perfectly normal. When the population standard deviation σ is unknown (the common case), the t-distribution is used instead of the standard normal distribution because it accounts for the extra uncertainty introduced by estimating variability from the sample itself.
The formula is t = (x̄ − μ0) / (s / √n). The numerator measures the raw difference between what you observed (the sample mean x̄) and what you expected under the null hypothesis (μ0). The denominator — the standard error of the mean — scales this difference by how much sampling variability you would expect given your sample size n and sample standard deviation s. A larger sample size shrinks the standard error, making the same raw difference yield a larger t-statistic and therefore stronger evidence against the null hypothesis. The degrees of freedom df = n − 1 determine the shape of the reference t-distribution; as df increases the t-distribution approaches the standard normal.
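The calculation can be sketched in a few lines of Python (a minimal illustration with made-up numbers; the function name and values are ours, not part of the calculator):

```python
import math

def one_sample_t(sample_mean, mu0, s, n):
    """Return (t-statistic, degrees of freedom) for a one-sample t-test."""
    se = s / math.sqrt(n)           # standard error of the mean
    t = (sample_mean - mu0) / se    # observed difference in standard-error units
    return t, n - 1                 # df = n - 1

t_stat, df = one_sample_t(105, 100, 10, 25)
# se = 10 / 5 = 2, so t_stat = 5 / 2 = 2.5 and df = 24
```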
In practice the t-statistic is compared to a critical value from the t-distribution at a chosen significance level α (most commonly 0.05 for a two-tailed test). If |t| exceeds the critical value, or equivalently if the p-value is less than α, the null hypothesis is rejected. The t-test is used across virtually every quantitative discipline: a pharmaceutical company testing whether a new drug produces a different mean biomarker level than the known baseline, a manufacturer checking whether produced components meet a specification, an educator evaluating whether a new curriculum changes average test scores, or a financial analyst testing whether a portfolio's mean return differs from a benchmark.
Worked example
Suppose a nutritionist hypothesises that adults in a particular city consume a mean of μ0 = 2000 kcal per day. She surveys a random sample of n = 36 adults and finds a sample mean of x̄ = 2150 kcal with a sample standard deviation of s = 420 kcal.
Step 1 — Compute the standard error:
SE = s / √n = 420 / √36 = 420 / 6 = 70 kcal
Step 2 — Compute the t-statistic:
t = (x̄ − μ0) / SE = (2150 − 2000) / 70 = 150 / 70 ≈ 2.1429
Step 3 — Degrees of freedom:
df = n − 1 = 36 − 1 = 35
Step 4 — Interpret:
For a two-tailed test at α = 0.05 with df = 35, the critical t-value is approximately ±2.030. Since |2.1429| > 2.030, we reject the null hypothesis and conclude there is statistically significant evidence that mean daily caloric intake in this city differs from 2000 kcal. The corresponding p-value is approximately 0.039, confirming the result is significant at the 5% level.
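The arithmetic in Steps 1–3 can be checked directly with the standard library (a sketch; the exact p-value needs a t-distribution routine, so it is shown only as a hedged comment using SciPy):

```python
import math

x_bar, mu0 = 2150, 2000   # sample mean and hypothesised mean (kcal)
s, n = 420, 36            # sample standard deviation and sample size

se = s / math.sqrt(n)     # Step 1: 420 / 6 = 70
t = (x_bar - mu0) / se    # Step 2: 150 / 70 ≈ 2.1429
df = n - 1                # Step 3: 35

# Exact two-tailed p-value, if SciPy is available:
#   from scipy.stats import t as t_dist
#   p = 2 * t_dist.sf(abs(t), df)
```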
Limitations & notes
The one-sample t-test makes several assumptions that should be checked before drawing conclusions. First, observations must be independent — if the data contain repeated measurements, time-series autocorrelation, or cluster effects, the standard error calculation is biased. Second, the test assumes the data (or the sampling distribution of the mean) are approximately normally distributed; for very small samples (n < 15) from heavily skewed or heavy-tailed distributions, consider a non-parametric alternative such as the Wilcoxon signed-rank test. Third, the sample standard deviation s must be a reasonable estimate of population variability — outliers can inflate s dramatically, distorting the t-statistic. Fourth, this calculator covers only the one-sample case; testing the difference between two group means requires an independent-samples or paired t-test. Finally, statistical significance does not imply practical significance — a very large sample can yield a highly significant t-statistic for a trivially small effect. Always report effect size (e.g. Cohen's d) alongside the t-statistic.
Frequently asked questions
What is a good t-statistic value for rejecting the null hypothesis?
There is no universally 'good' t-value because the critical threshold depends on both the degrees of freedom and your chosen significance level. For a two-tailed test at α = 0.05, the critical value approaches ±1.96 as the sample size grows large (df → ∞), but is ±2.306 at df = 8 and ±2.042 at df = 30. You must compare your computed t-statistic to the appropriate critical value from a t-distribution table or use software to obtain an exact p-value.
What is the difference between a one-tailed and two-tailed t-test?
A two-tailed test asks whether the sample mean is significantly different from μ<sub>0</sub> in either direction (higher or lower), while a one-tailed test asks specifically whether it is significantly greater than, or significantly less than, μ<sub>0</sub>. For the same t-statistic, a one-tailed test yields half the p-value of a two-tailed test. Use a one-tailed test only when the direction of the effect was specified before data collection; otherwise the two-tailed test is the safer default.
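The halving relationship can be demonstrated with the large-sample (normal) limit of the t-distribution, which the standard library provides (an approximation sketch; for finite df the exact t-distribution, e.g. via SciPy, would be used instead):

```python
from statistics import NormalDist

t_stat = 2.1429
z = NormalDist()  # standard normal: the df → ∞ limit of the t-distribution

p_two = 2 * (1 - z.cdf(abs(t_stat)))  # two-tailed p-value
p_one = 1 - z.cdf(t_stat)             # one-tailed ("greater than") p-value
# for a positive t_stat, p_one is exactly half of p_two
```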
How large does my sample need to be for the t-test to be valid?
As a practical rule of thumb, n ≥ 30 is often cited as sufficient for the Central Limit Theorem to ensure the sampling distribution of the mean is approximately normal, making the t-test robust even if the underlying data are moderately non-normal. For normally distributed data, the t-test is valid for any sample size. For small samples (n < 15) from clearly non-normal or heavily skewed populations, consider the Wilcoxon signed-rank test instead.
What is the difference between the t-test and the z-test?
The z-test is used when the population standard deviation σ is known, using the standard normal distribution as the reference. The t-test is used when σ is unknown and must be estimated from the sample using s — which is almost always the case in practice. As sample size increases, the t-distribution converges to the standard normal, so for n ≥ 120 the two tests produce nearly identical results.
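The ±1.96 limit quoted earlier is the standard normal quantile, which can be computed with the standard library (a sketch; exact finite-df critical values such as 2.030 at df = 35 would come from a t-distribution routine like SciPy's):

```python
from statistics import NormalDist

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
# z_crit ≈ 1.96; t critical values shrink toward this limit as df grows
# (e.g. scipy.stats.t.ppf(1 - alpha / 2, df) for exact values)
```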
How do I compute Cohen's d from these results to measure effect size?
Cohen's d for a one-sample t-test is computed as d = (x̄ − μ<sub>0</sub>) / s — the raw mean difference divided by the sample standard deviation (not the standard error). Conventional benchmarks are d = 0.2 (small), d = 0.5 (medium), and d = 0.8 (large). Effect size is independent of sample size, which makes it essential for interpreting whether a statistically significant result is also practically meaningful.
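Applied to the worked example above (a quick check, not part of the calculator's output):

```python
x_bar, mu0, s = 2150, 2000, 420   # values from the worked example

d = (x_bar - mu0) / s  # Cohen's d: mean difference in standard-deviation units
# d = 150 / 420 ≈ 0.357 — between the "small" (0.2) and "medium" (0.5) benchmarks
```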
Last updated: 2025-01-15 · Formula verified against primary sources.