Mahalanobis Distance Calculator
Understanding Mahalanobis distance (2D)
Mahalanobis distance measures how far an observation is from the center of a multivariate distribution after accounting for scale (variance) and correlation (covariance). Unlike Euclidean distance, it adapts to the “shape” of the data cloud: directions with high variance contribute less, and a deviation that follows the correlation between variables counts as less unusual than one that goes against it. This makes the metric useful for outlier detection, anomaly scoring, clustering, and multivariate quality control.
This calculator is for the two-variable case, where you provide an observation vector x, a mean vector μ, and a 2×2 covariance matrix Σ.
Definition and core formula
Let x be your observation and μ be the mean of the reference distribution. Define the centered vector:
v = x − μ
The (squared) Mahalanobis distance is:
D² = vT Σ−1 v
And the distance is D = √(D²).
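The definition holds for any number of variables, not only two. The following is a minimal sketch of it in plain Python, assuming Σ is given as a list of rows. The names `solve_linear` and `mahalanobis_squared` and the `1e-12` pivot tolerance are illustrative choices, not part of this calculator. It solves Σy = v rather than forming Σ−1 explicitly.

```python
import math


def solve_linear(a, b):
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's inputs are not modified.
    m = [[float(val) for val in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Choose the row with the largest pivot magnitude for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:  # assumed tolerance, not a standard value
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (m[i][n] - s) / m[i][i]
    return y


def mahalanobis_squared(x, mu, sigma):
    """Return D^2 = v^T Sigma^-1 v, where v = x - mu."""
    v = [xi - mi for xi, mi in zip(x, mu)]
    y = solve_linear(sigma, v)  # y = Sigma^-1 v
    return sum(vi * yi for vi, yi in zip(v, y))


d_squared = mahalanobis_squared([3.0, 4.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(d_squared, math.sqrt(d_squared))
```

With the identity matrix as Σ, the result reduces to the squared Euclidean distance, which is a quick sanity check on any implementation.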
2×2 expansion (what the calculator computes)
In two dimensions:
x = (x1, x2)T, μ = (μ1, μ2)T, and
Σ = [ [σ11, σ12], [σ21, σ22] ]
The determinant of the covariance matrix is:
det(Σ) = σ11σ22 − σ12σ21
If det(Σ) ≠ 0, then the inverse is:
Σ−1 = (1 / det(Σ)) · [ [σ22, −σ12], [−σ21, σ11] ]
Let v1 = x1 − μ1 and v2 = x2 − μ2. Then:
D² = [v1 v2] · Σ−1 · [v1, v2]T = (σ22·v1² − (σ12 + σ21)·v1·v2 + σ11·v2²) / det(Σ)
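The two-variable computation can be sketched in a few lines of Python. This is an illustration of the formulas above, not the calculator's actual source; the function name `mahalanobis_2d` and its argument order are assumptions.

```python
import math


def mahalanobis_2d(x1, x2, mu1, mu2, s11, s12, s21, s22):
    """Return (D squared, D) using the explicit 2x2 inverse."""
    det = s11 * s22 - s12 * s21
    if det == 0:
        raise ValueError("det(Sigma) is zero; the matrix has no inverse")
    v1 = x1 - mu1
    v2 = x2 - mu2
    # v^T [[s22, -s12], [-s21, s11]] v, still to be divided by det(Sigma).
    numerator = s22 * v1 * v1 - (s12 + s21) * v1 * v2 + s11 * v2 * v2
    d_squared = numerator / det
    if d_squared < 0:
        raise ValueError("negative D squared; Sigma is not positive definite")
    return d_squared, math.sqrt(d_squared)


d_squared, d = mahalanobis_2d(180, 90, 170, 70, 36, 30, 30, 100)
print(f"D^2 = {d_squared:.4f}, D = {d:.3f}")
```

The sketch rejects an exactly zero determinant only. A real tool would probably also warn when the determinant is very small relative to the variances.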
How to interpret the result
Interpretation depends on assumptions about your reference distribution and on the number of dimensions. In the special (but common) case where the data are approximately multivariate normal with k variables, the statistic D² approximately follows a chi-square distribution with k degrees of freedom.
For this page (k = 2):
- If the model is reasonable, typical points have smaller D values (closer to the mean in covariance-adjusted units).
- Large D suggests the observation is unusual relative to the covariance structure (a candidate outlier/anomaly), but “large” depends on your tolerance and context.
- Because D² is chi-square-like under normality, thresholds are more principled in terms of D² cutoffs (percentiles) than fixed D cutoffs.
Quick reference table (2D, approximate under bivariate normal)
| Coverage (inside contour) | Chi-square cutoff for D² (df = 2) | Equivalent D = √(cutoff) | Practical interpretation |
|---|---|---|---|
| ≈ 68% | ≈ 2.28 | ≈ 1.51 | “Near typical” in 2D (analogous to the 1σ idea, but not identical) |
| ≈ 95% | ≈ 5.99 | ≈ 2.45 | Common outlier screening boundary |
| ≈ 99% | ≈ 9.21 | ≈ 3.03 | Stronger anomaly flag (fewer false positives) |
These numbers are guidelines, not universal rules. If your data are heavy-tailed, skewed, multi-modal, or the mean/covariance estimates are unstable, then “chi-square thresholds” can be misleading.
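For two degrees of freedom the chi-square distribution has a closed form: its CDF is 1 − exp(−t/2), so the cutoff for a given coverage p is −2·ln(1 − p). The sketch below uses that identity to reproduce the table values; the function names are illustrative, and the shortcut applies only to the k = 2 case.

```python
import math


def d_squared_cutoff_2d(coverage):
    """Chi-square quantile for df = 2: solve 1 - exp(-t / 2) = coverage."""
    if not 0.0 <= coverage < 1.0:
        raise ValueError("coverage must be in [0, 1)")
    return -2.0 * math.log(1.0 - coverage)


def coverage_2d(d_squared):
    """Probability that D squared <= d_squared under a bivariate normal model."""
    return 1.0 - math.exp(-d_squared / 2.0)


for p in (0.68, 0.95, 0.99):
    cutoff = d_squared_cutoff_2d(p)
    print(f"{p:.0%}: D^2 cutoff = {cutoff:.2f}, D = {math.sqrt(cutoff):.2f}")
```

For other dimensions there is no such simple expression, and a statistics library's chi-square quantile function would be the usual route.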
Worked example (fully computed)
Suppose your two variables are (1) height and (2) weight. Let the reference mean be μ = (170, 70). Let the covariance matrix be:
Σ = [ [36, 30], [30, 100] ]
Interpretation: variance of height is 36 (sd 6), variance of weight is 100 (sd 10), and covariance is 30 (positive correlation).
Now evaluate the observation x = (180, 90):
- v = x − μ = (10, 20)
- det(Σ) = 36·100 − 30·30 = 3600 − 900 = 2700
- Σ−1 = (1/2700) · [ [100, −30], [−30, 36] ]
Compute D²:
- [ [100, −30], [−30, 36] ] · (10, 20)T = (100·10 − 30·20, −30·10 + 36·20) = (400, 420)
- vT · (…) = (10, 20) · (400, 420) = 10·400 + 20·420 = 12400
- D² = 12400 / 2700 ≈ 4.5926
- D = √4.5926 ≈ 2.143
Interpretation: In 2D, D ≈ 2.143 corresponds to D² ≈ 4.59, which is below the 95% cutoff (5.99). Under an approximately bivariate normal reference, this point is somewhat unusual but not extreme enough to be outside a common 95% ellipse.
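The arithmetic above can be checked step by step with a short script. It mirrors the bullets one by one and adds one extra quantity, the coverage implied by D² under the bivariate normal assumption (using the df = 2 identity CDF = 1 − exp(−t/2)). Variable names are illustrative.

```python
import math

mu = (170.0, 70.0)
sigma = ((36.0, 30.0), (30.0, 100.0))
x = (180.0, 90.0)

# Centered vector v = x - mu.
v = (x[0] - mu[0], x[1] - mu[1])

# Determinant of the 2x2 covariance matrix.
det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]

# Multiply the unscaled inverse [[s22, -s12], [-s21, s11]] by v.
w = (
    sigma[1][1] * v[0] - sigma[0][1] * v[1],
    -sigma[1][0] * v[0] + sigma[0][0] * v[1],
)

# v^T w, then divide by det to get D squared.
numerator = v[0] * w[0] + v[1] * w[1]
d_squared = numerator / det
d = math.sqrt(d_squared)

# Share of a bivariate normal reference population with a smaller D squared.
coverage = 1.0 - math.exp(-d_squared / 2.0)

print(f"v = {v}, det = {det}, w = {w}, numerator = {numerator}")
print(f"D^2 = {d_squared:.4f}, D = {d:.3f}, implied coverage = {coverage:.3f}")
```

The implied coverage comes out at roughly 0.90, consistent with the point lying inside the 95% ellipse but well outside the 68% one.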
Assumptions, input requirements, and limitations
- 2×2 only: This page computes Mahalanobis distance for exactly two variables. For higher dimensions you need a k×k covariance matrix inversion.
- Covariance should be symmetric: For a true covariance matrix, σ12 = σ21. If you enter different values, you are no longer using a valid covariance matrix; results may be hard to interpret.
- Matrix must be invertible: det(Σ) must be non-zero. If det(Σ) = 0 (or extremely close to 0), the matrix cannot be reliably inverted, and the distance is undefined/unstable.
- Positive definiteness matters: Practical covariance matrices are typically positive definite. If Σ is not positive definite, D² can come out zero or negative for a nonzero v, in which case D is undefined and the result has no geometric meaning.
- Units: x and μ must be in the same units for each variable. Covariances must match those units squared (and cross-units for off-diagonals).
- Distributional caveat: Using chi-square thresholds for “outliers” is most justified when the reference distribution is approximately multivariate normal and μ/Σ are representative estimates.
- Estimation sensitivity: If μ and Σ are estimated from small samples or contain outliers, Mahalanobis distances can be distorted. Robust covariance estimators may be preferable in such cases.
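Several of the input requirements above can be checked before any distance is computed. The following is a hedged sketch of such a pre-check, not the calculator's own validation. The function name and the `rel_tol` threshold are assumptions. For a symmetric 2×2 matrix, positive definiteness is equivalent to σ11 > 0 and det(Σ) > 0.

```python
def check_covariance_2x2(s11, s12, s21, s22, rel_tol=1e-9):
    """Return a list of problems found; an empty list means the matrix looks usable."""
    problems = []
    if s12 != s21:
        problems.append("not symmetric: sigma12 differs from sigma21")
    if s11 <= 0 or s22 <= 0:
        problems.append("variances sigma11 and sigma22 must be positive")
    det = s11 * s22 - s12 * s21
    # Compare det with the product of variances so the check does not depend on units.
    scale = abs(s11 * s22)
    if det <= 0:
        problems.append("det(Sigma) <= 0: singular or not positive definite")
    elif det <= rel_tol * scale:  # rel_tol is an assumed, adjustable threshold
        problems.append("det(Sigma) is tiny relative to the variances: nearly singular")
    return problems


print(check_covariance_2x2(36, 30, 30, 100))  # valid example matrix
print(check_covariance_2x2(1, 1, 1, 1))       # perfectly correlated variables
```

For a symmetric matrix the ratio det(Σ) / (σ11·σ22) equals 1 − ρ², where ρ is the correlation, so the relative tolerance effectively flags correlations extremely close to ±1.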
FAQ
What does Mahalanobis distance measure?
It measures how far a point is from a distribution’s mean after accounting for variance and correlation—effectively a covariance-adjusted “number of standard deviations” in multivariate space.
Why is covariance needed?
Covariance captures both scaling (variances) and correlation (how variables move together). Mahalanobis distance down-weights directions with high variance and corrects for correlated axes, unlike Euclidean distance.
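A small comparison makes the difference concrete. The sketch below takes two offsets from the mean that have the same Euclidean length, one following the positive correlation and one going against it, and evaluates both with an illustrative covariance matrix [ [36, 30], [30, 100] ]. The helper names are hypothetical.

```python
import math


def euclidean(v1, v2):
    """Plain Euclidean length of the offset (v1, v2)."""
    return math.hypot(v1, v2)


def mahalanobis(v1, v2, s11, s12, s22):
    """Mahalanobis length of the offset (v1, v2) for a symmetric 2x2 covariance."""
    det = s11 * s22 - s12 * s12
    return math.sqrt((s22 * v1 * v1 - 2 * s12 * v1 * v2 + s11 * v2 * v2) / det)


s11, s12, s22 = 36.0, 30.0, 100.0
with_trend = (10.0, 20.0)      # both variables above their means
against_trend = (10.0, -20.0)  # one above, one below

for name, (v1, v2) in (("with trend", with_trend), ("against trend", against_trend)):
    print(f"{name}: Euclidean = {euclidean(v1, v2):.2f}, "
          f"Mahalanobis = {mahalanobis(v1, v2, s11, s12, s22):.2f}")
```

Both offsets have a Euclidean length of about 22.36, yet the Mahalanobis distance is about 2.14 for the first and about 3.67 for the second, because the second contradicts the positive correlation.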
What if σ12 ≠ σ21?
A valid covariance matrix is symmetric. If they differ, you likely have a data entry error. Consider setting both to the same value (often the estimated covariance between the two variables).
What does “invertible covariance matrix” mean?
It means det(Σ) ≠ 0 so Σ has a well-defined inverse. If the variables are perfectly (or near-perfectly) linearly dependent, Σ becomes singular (non-invertible) and Mahalanobis distance cannot be computed reliably.
How do I choose an outlier cutoff?
If a multivariate normal approximation is reasonable, use D² cutoffs from the chi-square distribution with k degrees of freedom (here k = 2). For example, the 95% cutoff is D² ≈ 5.99 (D ≈ 2.45). Otherwise, consider empirical/robust thresholds.
How to use this calculator
- Enter the observation values x1 and x2, each in the units shown by its field.
- Enter the means μ1 and μ2 in the same units as the corresponding observation values.
- Enter the covariance matrix entries σ11, σ12, σ21, and σ22.
- Run the calculation and compare the output with a second scenario before acting on it.
