Kolmogorov-Smirnov Test Calculator
Computes the one-sample Kolmogorov-Smirnov test statistic D and approximate p-value to assess whether a sample follows a specified continuous distribution.
Calculator
Formula
D_n is the KS test statistic — the supremum (maximum) of the absolute difference between F_n(x), the empirical cumulative distribution function (ECDF) of the sample of size n, and F_0(x), the theoretical CDF of the hypothesized distribution. A large D_n provides evidence against the null hypothesis that the sample was drawn from F_0.
Source: Kolmogorov, A.N. (1933). Sulla determinazione empirica di una legge di distribuzione. Giornale dell'Istituto Italiano degli Attuari, 4, 83–91.
How it works
The one-sample KS test compares the empirical cumulative distribution function (ECDF) of your observed data against the theoretical CDF of a specified distribution. The ECDF at any point x is simply the proportion of sample values less than or equal to x. As your sample size grows, the ECDF converges to the true underlying CDF — the KS test formalizes whether the gap between the empirical and theoretical CDFs is small enough to be explained by chance.
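As a minimal sketch of the ECDF idea (plain Python; the function name `ecdf` is illustrative, not from any particular library), the ECDF at x is just the count of sorted sample values at or below x, divided by n:

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical CDF F_n of `sample` as a callable."""
    xs = sorted(sample)
    n = len(xs)
    def F(x):
        # F_n(x) = (number of sample values <= x) / n
        return bisect_right(xs, x) / n
    return F

F = ecdf([10.02, 9.98, 10.05, 10.01, 9.97, 10.03, 9.99, 10.04, 10.00, 10.02])
print(F(10.00))  # 0.4 — four of the ten values are <= 10.00
```

Note that F is right-continuous: it already includes the jump at any observed value, which matters when locating the KS supremum.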
The test statistic D is defined as the maximum absolute difference between the ECDF F_n(x) and the theoretical CDF F_0(x) over all x. Formally, D_n = sup|F_n(x) − F_0(x)|. Because F_n is a step function that jumps at each data point, the supremum is found by comparing F_0 at each sorted value against both the ECDF just after the jump, i/n, and just before it, (i−1)/n. Under the null hypothesis that the sample comes from F_0, the distribution of the scaled statistic (√n · D_n) converges to the Kolmogorov distribution, which allows computation of an asymptotic p-value. The critical value at significance level α is given by c(α)/√n, where c(α) equals approximately 1.2238 at α = 0.10, 1.3581 at α = 0.05, and 1.6276 at α = 0.01.
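Both computations can be sketched in plain Python under the definitions above (function names are illustrative; the normal CDF uses math.erf, and the p-value sums the alternating Kolmogorov tail series 2·Σ(−1)^(k−1)·exp(−2k²nD²)):

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """One-sample KS statistic D_n = sup_x |F_n(x) - F_0(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        # the ECDF jumps at x: check both just after (i/n) and just before ((i-1)/n)
        d = max(d, abs(i / n - f0), abs((i - 1) / n - f0))
    return d

def ks_pvalue(d, n):
    """Asymptotic p-value: 2 * sum_k (-1)^(k-1) * exp(-2 k^2 n d^2)."""
    t = math.sqrt(n) * d
    s = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * t) ** 2)
                  for k in range(1, 101))
    return max(0.0, min(1.0, s))

data = [10.02, 9.98, 10.05, 10.01, 9.97, 10.03, 9.99, 10.04, 10.00, 10.02]
d = ks_statistic(data, lambda x: normal_cdf(x, 10.00, 0.03))
print(round(d, 4), round(ks_pvalue(d, len(data)), 2))  # 0.2475 0.57
```

Remember that the series-based p-value is an asymptotic approximation and is rough at small n, as noted in the limitations below.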
In practice, the KS test is applied in many fields: engineers use it to verify that material strength data follows a normal distribution before applying stress-based reliability models; financial analysts test whether asset returns follow a theoretical distribution; and machine learning practitioners use two-sample variants to detect dataset drift between training and production data. This calculator handles the one-sample test against normal and uniform distributions, which covers a large fraction of real-world use cases.
Worked example
Suppose a quality engineer measures the diameter (in mm) of 10 ball bearings: 10.02, 9.98, 10.05, 10.01, 9.97, 10.03, 9.99, 10.04, 10.00, 10.02. She wants to test at α = 0.05 whether these diameters are normally distributed with mean μ = 10.00 mm and standard deviation σ = 0.03 mm.
Step 1 — Sort the data: 9.97, 9.98, 9.99, 10.00, 10.01, 10.02, 10.02, 10.03, 10.04, 10.05.
Step 2 — Compute the ECDF and theoretical CDF at each point: Because the ECDF jumps at each sorted value x_i, compare F_0(x_i) against both i/n (the ECDF just after the jump) and (i−1)/n (just before it). For x = 9.97 (i = 1), F_0(9.97) = Φ((9.97−10.00)/0.03) = Φ(−1) ≈ 0.1587; the two differences are |1/10 − 0.1587| = 0.0587 and |0/10 − 0.1587| = 0.1587.
Step 3 — Find the maximum difference: Repeating for all 10 values, the largest absolute difference occurs just below x = 10.02, where F_0(10.02) = Φ(0.6667) ≈ 0.7475 but the ECDF has only reached 5/10 = 0.50, giving D ≈ 0.7475 − 0.50 = 0.2475.
Step 4 — Compare to critical value: At α = 0.05 with n = 10, the asymptotic critical value is 1.3581/√10 ≈ 0.4295. (Exact small-sample tables give a slightly smaller value, ≈ 0.409, which does not change the conclusion here.)
Step 5 — Conclusion: Since D ≈ 0.2475 < 0.4295, we fail to reject H₀. The data are consistent with a normal distribution having μ = 10.00 and σ = 0.03.
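The five steps above can be checked with a short self-contained script (a sketch; the closed form c(α) = √(−ln(α/2)/2) reproduces the asymptotic critical values quoted earlier):

```python
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

data = sorted([10.02, 9.98, 10.05, 10.01, 9.97, 10.03, 9.99, 10.04, 10.00, 10.02])
n = len(data)

# Steps 2-3: compare F_0 against the ECDF just after (i/n) and just before ((i-1)/n) each point
D = max(max(abs(i / n - normal_cdf(x, 10.00, 0.03)),
            abs((i - 1) / n - normal_cdf(x, 10.00, 0.03)))
        for i, x in enumerate(data, start=1))

# Step 4: asymptotic critical value c(alpha)/sqrt(n), with c(alpha) = sqrt(-ln(alpha/2)/2)
crit = math.sqrt(-math.log(0.05 / 2.0) / 2.0) / math.sqrt(n)

# Step 5: fail to reject H0 when D < crit
print(round(D, 4), round(crit, 4), D < crit)  # 0.2475 0.4295 True
```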
Limitations & notes
The KS test has several important limitations to keep in mind. First, it is most powerful near the center of the distribution and less sensitive to differences in the tails, which can miss important departures from normality in extreme values — the Anderson-Darling test is preferred when tail behavior matters. Second, the asymptotic p-value approximation used here is less accurate for very small samples (n < 10); exact tables or simulation-based p-values should be used in those cases. Third, when distribution parameters (mean, standard deviation) are estimated from the same sample being tested rather than specified a priori, the standard KS test becomes too conservative — its actual Type I error rate falls below the nominal level — and the Lilliefors correction should be applied instead. Fourth, the test only evaluates whether the data fit a single fully specified distribution; it does not select or rank competing distributions. Finally, passing the KS test does not prove that the data follow the hypothesized distribution — it merely means the evidence is insufficient to reject that assumption at the chosen significance level.
Frequently asked questions
What is the difference between the one-sample and two-sample KS tests?
The one-sample KS test (computed here) compares a dataset against a specific theoretical distribution such as normal or uniform. The two-sample KS test compares two empirical datasets to determine whether they were drawn from the same underlying distribution, without specifying what that distribution is. Both use the same D statistic concept but different critical value tables.
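As a sketch of the two-sample statistic (plain Python, illustrative name), a merge scan over the pooled sorted values tracks the gap between the two ECDFs, handling ties by advancing past all copies of a value before measuring:

```python
def ks_2sample_statistic(a, b):
    """Two-sample KS statistic: sup_x |F_a(x) - F_b(x)| over the pooled values."""
    a, b = sorted(a), sorted(b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        # advance both ECDFs past every copy of x before comparing them
        while i < na and a[i] == x:
            i += 1
        while j < nb and b[j] == x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d

print(ks_2sample_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```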
Should I use the KS test or the Shapiro-Wilk test to check normality?
For testing normality specifically, the Shapiro-Wilk test is generally more powerful than the KS test, particularly for small to moderate sample sizes (n ≤ 50). The KS test is more general and can test against any continuous distribution, not just normal. For normality testing with estimated parameters, the Lilliefors test (a modified KS test) is more appropriate than the standard KS test.
What does it mean if the p-value is greater than 0.05?
A p-value greater than 0.05 means you fail to reject the null hypothesis at the 5% significance level. This indicates that the observed maximum difference between your sample's ECDF and the theoretical CDF is not statistically significant — the data are consistent with having been drawn from the specified distribution. It does not prove the distribution is correct, only that you lack sufficient evidence to rule it out.
Can I use the KS test if I estimated the distribution parameters from the data?
Not with standard KS critical values. When you estimate parameters (e.g., mean and standard deviation) from the same sample, the actual Type I error rate is much lower than the nominal level, making the test too conservative. In this situation you should use the Lilliefors test, which provides corrected critical values specifically for parameters estimated from sample data.
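The effect of the correction can also be approximated by simulation (a Monte Carlo sketch with illustrative names, not the published Lilliefors tables): repeatedly draw normal samples, estimate μ and σ from each draw, compute D against the fitted normal, and take the 95th percentile of the simulated D values as the corrected critical value.

```python
import math
import random
import statistics

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_stat_fitted(sample):
    """KS statistic against a normal with mu, sigma estimated from `sample` itself."""
    xs = sorted(sample)
    n = len(xs)
    mu = statistics.fmean(xs)
    sigma = statistics.stdev(xs)
    return max(max(abs(i / n - normal_cdf(x, mu, sigma)),
                   abs((i - 1) / n - normal_cdf(x, mu, sigma)))
               for i, x in enumerate(xs, start=1))

def lilliefors_critical(n, alpha=0.05, sims=5000, seed=0):
    """Monte Carlo estimate of the corrected critical value at level alpha."""
    rng = random.Random(seed)
    ds = sorted(ks_stat_fitted([rng.gauss(0.0, 1.0) for _ in range(n)])
                for _ in range(sims))
    return ds[int((1.0 - alpha) * sims)]

# For n = 20 the simulated cutoff lands near the published Lilliefors value (~0.19),
# well below the standard asymptotic KS critical value 1.3581 / sqrt(20) ~ 0.304 —
# which is why the uncorrected test rejects too rarely when parameters are fitted.
```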
How large a sample do I need for the KS test to be reliable?
The asymptotic p-value approximation becomes increasingly accurate as n grows and is generally reliable for n ≥ 30. For smaller samples, the test has low power — it may fail to detect even substantial departures from the hypothesized distribution. For very small samples (n < 10), use exact KS tables rather than the asymptotic approximation. Power increases substantially as n exceeds 50 or 100.
Last updated: 2025-01-15 · Formula verified against primary sources.