Kullback–Leibler Divergence Calculator
Introduction: What this calculator does
This page helps you compare two discrete probability distributions (probability vectors) P and Q defined over the same set of outcomes (categories). You enter the probabilities as comma-separated lists (for example, 0.6, 0.4). The calculator then reports common information-theoretic divergence measures, such as:
- KL divergence (sometimes called “forward KL”)
- Reverse KL
- Cross-entropy
- Jensen–Shannon divergence (JSD), a symmetrized and smoothed alternative
You can also choose the log base (natural log gives results in nats; base 2 gives results in bits).
Definitions and formulas (discrete case)
Assume P and Q are discrete distributions over outcomes i = 1, …, n, with P(i) ≥ 0, Q(i) ≥ 0, and ∑ P(i) = ∑ Q(i) = 1.
KL divergence
The Kullback–Leibler divergence from Q to P, written D_KL(P‖Q), is:
D_KL(P‖Q) = ∑ P(i) log(P(i) / Q(i))
If you choose ln, then log is the natural logarithm and the unit is nats. If you choose log2, the unit is bits.
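The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation; the function name `kl_divergence` and the `base` parameter are assumptions made for this example.

```python
import math

def kl_divergence(p, q, base=math.e):
    """Discrete D_KL(P||Q) = sum_i P(i) * log(P(i) / Q(i)).

    base=math.e gives nats, base=2 gives bits.
    """
    if len(p) != len(q):
        raise ValueError("P and Q must list the same outcomes in the same order")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue          # a term with P(i) = 0 contributes 0 by convention
        if qi == 0:
            return math.inf   # P puts mass on an outcome that Q rules out
        total += pi * math.log(pi / qi, base)
    return total

print(kl_divergence([0.6, 0.4], [0.5, 0.5]))          # result in nats
print(kl_divergence([0.6, 0.4], [0.5, 0.5], base=2))  # result in bits
```

Switching the base only rescales the result: the value in bits equals the value in nats divided by ln 2.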
Cross-entropy
Cross-entropy of Q relative to P is:
H(P, Q) = - ∑ P(i) log Q(i)
It relates to KL divergence via:
H(P, Q) = H(P) + D_KL(P‖Q), where H(P) = -∑ P(i) log P(i) is the entropy of P.
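The identity H(P, Q) = H(P) + D_KL(P‖Q) can be checked numerically. The sketch below computes each quantity independently and compares the two sides; the helper names (`entropy`, `cross_entropy`, `kl_divergence`) are illustrative choices, and natural logs are assumed.

```python
import math

def entropy(p):
    """H(P) = -sum_i P(i) ln P(i), skipping zero-probability outcomes."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(P, Q) = -sum_i P(i) ln Q(i); infinite if Q(i) = 0 where P(i) > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total -= pi * math.log(qi)
    return total

def kl_divergence(p, q):
    """D_KL(P||Q) = sum_i P(i) ln(P(i) / Q(i)), assuming Q(i) > 0 where P(i) > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

P = [0.6, 0.4]
Q = [0.5, 0.5]
lhs = cross_entropy(P, Q)
rhs = entropy(P) + kl_divergence(P, Q)
print(lhs, rhs)  # the two values agree up to floating-point rounding
```

Because H(P) does not depend on Q, minimizing cross-entropy over Q is equivalent to minimizing D_KL(P‖Q).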
Jensen–Shannon divergence (JSD)
JSD is a symmetric, smoothed divergence based on the mixture M = (P + Q)/2:
JSD(P, Q) = 1/2 · D_KL(P‖M) + 1/2 · D_KL(Q‖M)
With base-2 logs, JSD is bounded between 0 and 1 bit for discrete distributions.
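A minimal sketch of JSD, assuming base-2 logs by default; the names `js_divergence` and `_kl` are hypothetical. Note that M(i) is positive whenever P(i) or Q(i) is positive, so neither KL term can divide by zero.

```python
import math

def _kl(p, q, base):
    """D_KL(P||Q) restricted to outcomes with P(i) > 0."""
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q, base=2):
    """JSD(P, Q) = 1/2 * D_KL(P||M) + 1/2 * D_KL(Q||M), with M = (P + Q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m, base) + 0.5 * _kl(q, m, base)

print(js_divergence([0.6, 0.4], [0.5, 0.5]))  # small: the distributions are close
print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # disjoint supports reach the 1-bit bound
```

Unlike KL(P‖Q), the second call stays finite even though each distribution assigns zero probability to the other's only outcome.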
How to interpret the results
- KL(P‖Q) = 0 only when the distributions match exactly (for all outcomes with nonzero probability).
- Asymmetry matters: KL(P‖Q) is generally not equal to KL(Q‖P). “Forward KL” heavily penalizes cases where P assigns probability to outcomes that Q considers impossible.
- Units: nats (ln) or bits (log2). Bits are often easier to interpret as “extra bits per event” under an optimal code.
- Cross-entropy can be read as the expected coding cost if data truly follow P but you code using a model Q.
- JSD is often preferred when you want a finite, symmetric comparison and better behavior around zeros.
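The asymmetry is easiest to see when one distribution rules out an outcome entirely. The sketch below uses an illustrative pair (not taken from the calculator) in which Q assigns zero probability to the second outcome; `kl_divergence` is a hypothetical helper using natural logs.

```python
import math

def kl_divergence(p, q):
    """D_KL(P||Q) in nats; infinite if Q(i) = 0 where P(i) > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

P = [0.5, 0.5]
Q = [1.0, 0.0]  # Q treats the second outcome as impossible

forward = kl_divergence(P, Q)  # infinite: P gives weight to an outcome Q rules out
reverse = kl_divergence(Q, P)  # finite: equals ln 2, since Q's only outcome has P = 0.5
print(forward, reverse)
```

Swapping the arguments turns an infinite divergence into a modest one, which is why the direction should always be reported alongside the value.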
Worked example
Let:
P = 0.6, 0.4
Q = 0.5, 0.5
Using natural logs:
D_KL(P‖Q) = 0.6·ln(0.6/0.5) + 0.4·ln(0.4/0.5)
= 0.6·ln(1.2) + 0.4·ln(0.8) ≈ 0.6·0.1823 + 0.4·(-0.2231) ≈ 0.0201 nats
This is small, indicating P and Q are close.
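The hand calculation can be reproduced term by term. This short sketch also converts the result to bits by dividing by ln 2; the variable names are illustrative.

```python
import math

P = [0.6, 0.4]
Q = [0.5, 0.5]

# One term per outcome: P(i) * ln(P(i) / Q(i))
terms = [p * math.log(p / q) for p, q in zip(P, Q)]

kl_nats = sum(terms)             # about 0.0201 nats, as in the worked example
kl_bits = kl_nats / math.log(2)  # the same divergence expressed in bits

print("terms:", terms)
print("KL in nats:", kl_nats)
print("KL in bits:", kl_bits)
```

The first term is positive and the second negative: individual terms can have either sign, but their sum is never negative.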
Metric comparison (at a glance)
| Metric | Discrete formula | Symmetric? | Range / behavior | Notes |
|---|---|---|---|---|
| KL(P‖Q) | ∑ P(i) log(P(i)/Q(i)) | No | ≥ 0; can be ∞ | Undefined/infinite if Q(i)=0 where P(i)>0 |
| KL(Q‖P) | ∑ Q(i) log(Q(i)/P(i)) | No | ≥ 0; can be ∞ | Highlights different failure modes than KL(P‖Q) |
| Cross-entropy H(P,Q) | −∑ P(i) log Q(i) | No | ≥ H(P); can be ∞ | Common in classification/log-loss settings |
| JSD(P,Q) | ½·KL(P‖M)+½·KL(Q‖M), M=(P+Q)/2 | Yes | Finite; bounded (≤ 1 bit with log2) | More stable and interpretable for “distance-like” comparison |
Limitations and assumptions (important)
- Same length / same outcomes: The i-th entry of P must correspond to the same outcome as the i-th entry of Q. If you reorder one list, the divergence changes.
- Non-negative inputs: Probabilities must be ≥ 0. Negative values are not meaningful for KL/cross-entropy/JSD.
- Normalization: Many calculators (and some scripts) will normalize lists that don’t sum to 1. This is convenient, but it changes the interpretation if your inputs were counts/weights rather than probabilities. If you provide raw counts, be aware you are effectively comparing the normalized empirical distributions.
- Zeros in P: Terms with P(i)=0 contribute 0 to KL (by the limit behavior), so they do not cause problems by themselves.
- Zeros in Q: If Q(i)=0 while P(i)>0, then KL(P‖Q) diverges to infinity (because you are assigning zero probability to an event that occurs under P). This is not a bug; it reflects an impossible event under Q.
- Finite precision: Very small probabilities can lead to large logs and numerical instability. If you see surprising outputs, consider smoothing (e.g., adding a tiny epsilon to each probability and renormalizing), and report the method if used.
- Discrete-only: These formulas apply to discrete distributions. Continuous distributions require integrals and careful handling of densities (and units) rather than probabilities.
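The normalization and smoothing steps mentioned in the list above might look like the following. This is a sketch under stated assumptions: the function names and the default epsilon of 1e-9 are illustrative, and this page does not say whether the calculator itself normalizes or smooths its inputs.

```python
def normalize(values):
    """Rescale non-negative weights or counts so they sum to 1."""
    if any(v < 0 for v in values):
        raise ValueError("probabilities must be non-negative")
    total = sum(values)
    if total <= 0:
        raise ValueError("at least one entry must be positive")
    return [v / total for v in values]

def smooth(values, eps=1e-9):
    """Add a small epsilon to every entry, then renormalize (additive smoothing)."""
    return normalize([v + eps for v in values])

counts = [3, 1]                     # raw counts rather than probabilities
print(normalize(counts))            # the normalized empirical distribution
print(smooth([1.0, 0.0], eps=1e-6)) # the zero entry becomes small but positive
```

Smoothing makes KL(P‖Q) finite when Q contains zeros, but the result then depends on the chosen epsilon, so it is worth reporting alongside the value.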
References
- T. M. Cover & J. A. Thomas, Elements of Information Theory.
- Wikipedia: Kullback–Leibler divergence; Jensen–Shannon divergence (for quick reference).
How to use this calculator
- Enter Probabilities P as a comma-separated list (for example, 0.6, 0.4).
- Enter Probabilities Q as a comma-separated list of the same length, with outcomes in the same order as P.
- Choose the Log base: natural log for results in nats, or base 2 for results in bits.
- Run the calculation and compare the output with a second scenario before acting on it.
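For readers who want to reproduce the workflow offline, the steps above can be sketched end to end. Everything here is hypothetical: `parse_probabilities`, `run_calculator`, the `LOG_BASES` table, and the result keys are names invented for this example, not the calculator's own code.

```python
import math

LOG_BASES = {"ln": math.e, "log2": 2}  # assumed labels for the two log-base options

def parse_probabilities(text):
    """Turn a comma-separated string such as '0.6, 0.4' into a list of floats."""
    return [float(token) for token in text.split(",") if token.strip()]

def kl_divergence(p, q, base):
    """D_KL(P||Q); infinite if Q(i) = 0 where P(i) > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi, base)
    return total

def run_calculator(p_text, q_text, log_base="ln"):
    """Parse both inputs, check their lengths, and return forward and reverse KL."""
    p = parse_probabilities(p_text)
    q = parse_probabilities(q_text)
    if len(p) != len(q):
        raise ValueError("P and Q must have the same number of entries")
    base = LOG_BASES[log_base]
    return {
        "forward_kl": kl_divergence(p, q, base),
        "reverse_kl": kl_divergence(q, p, base),
    }

print(run_calculator("0.6, 0.4", "0.5, 0.5"))
print(run_calculator("0.6, 0.4", "0.5, 0.5", log_base="log2"))
```

Running it with a second pair of inputs gives the comparison scenario suggested in the last step.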
