Kullback–Leibler Divergence Calculator

Editorial review by: JJ Ben-Joseph

Introduction: What this calculator does

This page helps you compare two discrete probability distributions (probability vectors) P and Q defined over the same set of outcomes (categories). You enter the probabilities as comma-separated lists (for example, 0.6, 0.4). The calculator then reports common information-theoretic divergence measures, such as:

KL divergence $D_{KL}(P \parallel Q)$ (sometimes called “forward KL”)
Reverse KL $D_{KL}(Q \parallel P)$
Cross-entropy $H(P, Q)$
Jensen–Shannon divergence (JSD), a symmetrized and smoothed alternative

You can also choose the log base (natural log gives results in nats; base 2 gives results in bits).

Definitions and formulas (discrete case)

Assume $P$ and $Q$ are discrete distributions over outcomes $i = 1, \dots, n$, with $P_i \geq 0$, $Q_i \geq 0$, and $\sum_{i} P_i = \sum_{i} Q_i = 1$.

KL divergence

The Kullback–Leibler divergence from $P$ to $Q$ is:

$$D_{KL}(P \parallel Q) = \sum_{i=1}^{n} P_i \log\frac{P_i}{Q_i}$$

If you choose ln, then $\log$ is the natural log $\ln$ and the unit is nats. If you choose log2, then $\log$ is $\log_2$ and the unit is bits.
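A minimal Python sketch of the formula above, assuming P and Q are equal-length lists that already sum to 1. The function name `kl_divergence` and its `base` parameter are illustrative, not this calculator's actual code. It follows the conventions described on this page: terms with $P_i = 0$ contribute 0, and the result is infinite when $Q_i = 0$ but $P_i > 0$.

```python
import math

def kl_divergence(p, q, base=math.e):
    """Discrete KL divergence D_KL(P || Q) = sum_i P_i * log(P_i / Q_i).

    Assumes p and q are equal-length sequences of probabilities that each
    sum to 1. Returns math.inf when some Q_i == 0 while P_i > 0.
    """
    if len(p) != len(q):
        raise ValueError("P and Q must have the same number of outcomes")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # 0 * log(0 / q_i) is taken as 0 by convention
        if q_i == 0:
            return math.inf  # P puts mass on an outcome Q calls impossible
        total += p_i * math.log(p_i / q_i)
    # Change of base: log_b(x) = ln(x) / ln(b); base=math.e gives nats, base=2 gives bits
    return total / math.log(base)

print(kl_divergence([0.6, 0.4], [0.5, 0.5]))          # nats
print(kl_divergence([0.6, 0.4], [0.5, 0.5], base=2))  # bits
```

Passing `base=2` divides the natural-log sum by $\ln 2$, which is equivalent to using $\log_2$ in every term.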

Cross-entropy

Cross-entropy of $P$ relative to $Q$ is:

$$H(P, Q) = -\sum_{i=1}^{n} P_i \log Q_i$$

It relates to KL divergence via:

$$H(P, Q) = H(P) + D_{KL}(P \parallel Q)$$

where $H(P) = -\sum_{i} P_i \log P_i$ is the entropy of $P$.
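To check this identity numerically, here is a short Python sketch using natural logs, so every value is in nats. The helper names are hypothetical; each sum skips terms with $P_i = 0$, matching the convention that $0 \log 0 = 0$.

```python
import math

def entropy(p):
    """H(P) = -sum_i P_i * ln(P_i), in nats; zero-probability terms contribute 0."""
    return -sum(p_i * math.log(p_i) for p_i in p if p_i > 0)

def cross_entropy(p, q):
    """H(P, Q) = -sum_i P_i * ln(Q_i), in nats; infinite if Q_i == 0 where P_i > 0."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue
        if q_i == 0:
            return math.inf
        total -= p_i * math.log(q_i)
    return total

def kl_divergence(p, q):
    """D_KL(P || Q) in nats, computed directly from its definition."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue
        if q_i == 0:
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total

p, q = [0.6, 0.4], [0.5, 0.5]
# H(P, Q) and H(P) + D_KL(P || Q) should agree up to floating-point rounding
print(cross_entropy(p, q), entropy(p) + kl_divergence(p, q))
```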

Jensen–Shannon divergence (JSD)

JSD is a symmetric, smoothed divergence based on the mixture $M = \frac{1}{2} (P + Q)$ :

$$\mathrm{JSD}(P, Q) = \tfrac{1}{2} D_{KL}(P \parallel M) + \tfrac{1}{2} D_{KL}(Q \parallel M)$$

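With base-2 logs, JSD is bounded between 0 and 1 bit for discrete distributions (between 0 and $\ln 2 \approx 0.693$ nats with natural logs). The upper bound is reached exactly when P and Q have no outcomes in common.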
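A Python sketch of JSD built from the mixture $M$, assuming equal-length probability lists. The function names and the default `base=2.0` (bits) are choices made for this example. Because $M_i$ is positive wherever $P_i$ or $Q_i$ is, the internal KL helper never meets a zero denominator and can skip the infinity check.

```python
import math

def kl_divergence(p, q, base=2.0):
    # Discrete KL divergence; terms with P_i == 0 contribute 0.
    # No Q_i == 0 check: this helper is only called with q = M, which is
    # positive wherever p is positive.
    return sum(p_i * math.log(p_i / q_i, base) for p_i, q_i in zip(p, q) if p_i > 0)

def jensen_shannon(p, q, base=2.0):
    """JSD(P, Q) = 1/2 KL(P || M) + 1/2 KL(Q || M), with M = (P + Q) / 2."""
    m = [(p_i + q_i) / 2 for p_i, q_i in zip(p, q)]
    return 0.5 * kl_divergence(p, m, base) + 0.5 * kl_divergence(q, m, base)

print(jensen_shannon([0.6, 0.4], [0.5, 0.5]))  # small, in bits
print(jensen_shannon([1.0, 0.0], [0.0, 1.0]))  # disjoint supports: the 1-bit maximum
```

Passing `base=math.e` gives nats instead, in which case the disjoint-support case should come out as $\ln 2$.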

How to interpret the results

KL(P‖Q) = 0 if and only if P and Q are identical; for any other pair it is strictly positive.
Asymmetry matters: KL(P‖Q) is generally not equal to KL(Q‖P). “Forward KL” heavily penalizes outcomes that P considers likely but Q considers rare, and it becomes infinite if Q assigns zero probability to an outcome that P allows.
Units: nats (ln) or bits (log2). In bits, KL(P‖Q) is the expected number of extra bits per event when data drawn from P are encoded with a code optimized for Q instead of one optimized for P.
Cross-entropy can be read as the expected coding cost if data truly follow P but you code using a model Q.
JSD is often preferred when you want a finite, symmetric comparison and better behavior around zeros.
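The asymmetry and the zero-probability behavior described above can be seen with a few lines of Python. This is an illustrative sketch in nats; the second pair of distributions is made up for demonstration.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in nats; returns math.inf if Q_i == 0 while P_i > 0."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue
        if q_i == 0:
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total

p = [0.6, 0.4]
q = [0.5, 0.5]
print(kl_divergence(p, q), kl_divergence(q, p))  # about 0.0201 vs 0.0204 nats

# Q rules out the second outcome, which P still considers possible
p2 = [0.9, 0.1]
q2 = [1.0, 0.0]
print(kl_divergence(p2, q2))  # inf: forward KL diverges
print(kl_divergence(q2, p2))  # finite: terms where the first argument is 0 drop out
```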

Worked example

Let:

P = 0.6, 0.4
Q = 0.5, 0.5

Using natural logs:

$$D_{KL}(P \parallel Q) = 0.6 \ln\frac{0.6}{0.5} + 0.4 \ln\frac{0.4}{0.5}$$

$$= 0.6 \ln 1.2 + 0.4 \ln 0.8 \approx 0.6(0.1823) + 0.4(-0.2231) \approx 0.0201 \text{ nats}$$

This is small (about 0.029 bits), indicating that P and Q are close.
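The same arithmetic in Python, using only the standard `math` module. Individual terms can be negative (the second one here); only the total is guaranteed to be non-negative. Dividing by $\ln 2$ converts nats to bits.

```python
import math

p = [0.6, 0.4]
q = [0.5, 0.5]

# Each term is P_i * ln(P_i / Q_i)
terms = [p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q)]
kl_nats = sum(terms)
# Convert nats to bits by dividing by ln 2
kl_bits = kl_nats / math.log(2)

print([round(t, 4) for t in terms])         # [0.1094, -0.0893]
print(round(kl_nats, 4), round(kl_bits, 4))  # 0.0201 0.029
```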

Metric comparison (at a glance)

Metric	Discrete formula	Symmetric?	Range / behavior	Notes
KL(P‖Q)	∑ P(i) log(P(i)/Q(i))	No	≥ 0; can be ∞	Undefined/infinite if Q(i)=0 where P(i)>0
KL(Q‖P)	∑ Q(i) log(Q(i)/P(i))	No	≥ 0; can be ∞	Highlights different failure modes than KL(P‖Q)
Cross-entropy H(P,Q)	−∑ P(i) log Q(i)	No	≥ H(P); can be ∞	Common in classification/log-loss settings
JSD(P,Q)	½·KL(P‖M)+½·KL(Q‖M), M=(P+Q)/2	Yes	Finite; bounded (≤ 1 bit with log2)	More stable and interpretable for “distance-like” comparison

Limitations and assumptions (important)

Same length / same outcomes: The i-th entry of P must correspond to the same outcome as the i-th entry of Q. If you reorder only one of the lists, the result generally changes.
Non-negative inputs: Probabilities must be ≥ 0. Negative values are not meaningful for KL/cross-entropy/JSD.
Normalization: Many calculators (and some scripts) will normalize lists that don’t sum to 1. This is convenient, but it changes the interpretation if your inputs were counts/weights rather than probabilities. If you provide raw counts, be aware you are effectively comparing the normalized empirical distributions.
Zeros in P: Terms with P(i)=0 contribute 0 to KL (by the limit behavior), so they do not cause problems by themselves.
Zeros in Q: If Q(i)=0 while P(i)>0, then KL(P‖Q) diverges to infinity (because you are assigning zero probability to an event that occurs under P). This is not a bug; it reflects an impossible event under Q.
Finite precision: Very small probabilities can lead to large logs and numerical instability. If you see surprising outputs, consider smoothing (e.g., adding a tiny epsilon to each probability and renormalizing), and report the method if used.
Discrete-only: These formulas apply to discrete distributions. Continuous distributions require integrals and careful handling of densities (and units) rather than probabilities.
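One way to apply the epsilon smoothing mentioned in the list above, sketched in Python. The `smooth` helper and its default `eps=1e-9` are arbitrary illustrative choices, not a standard; the smoothed divergence depends strongly on `eps`, which is why the method should be reported.

```python
import math

def smooth(dist, eps=1e-9):
    """Add a small constant eps to every entry, then renormalize to sum to 1.

    eps is an arbitrary illustrative value; report whatever value you use.
    """
    shifted = [x + eps for x in dist]
    total = sum(shifted)
    return [x / total for x in shifted]

def kl_divergence(p, q):
    # D_KL(P || Q) in nats; returns math.inf when Q_i == 0 but P_i > 0
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue
        if q_i == 0:
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total

p = [0.5, 0.5]
q = [1.0, 0.0]
print(kl_divergence(p, q))                  # inf
print(kl_divergence(smooth(p), smooth(q)))  # large but finite; depends on eps
```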


How to use this calculator

Enter Probabilities P as a comma-separated list (for example, 0.6, 0.4).
Enter Probabilities Q as a comma-separated list with the same number of entries, in the same outcome order as P.
Choose the log base: ln for results in nats, or log2 for results in bits.
Run the calculation and read the reported divergence metrics; comparing forward and reverse KL shows how asymmetric the comparison is.
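If you want to reproduce the input checks in your own script, here is a hypothetical Python helper that turns comma-separated fields into probability lists. This page does not document how the calculator itself parses input; the names `parse_distribution` and `parse_pair`, and the choice to normalize by default, are assumptions for illustration.

```python
import math

def parse_distribution(text, normalize=True):
    """Parse a comma-separated list such as "0.6, 0.4" into a list of floats.

    Hypothetical helper: rejects negative or non-finite entries and, if
    normalize is True, rescales the values so they sum to 1.
    """
    # Skip empty pieces so a trailing comma is tolerated; float() ignores spaces
    values = [float(part) for part in text.split(",") if part.strip()]
    if not values:
        raise ValueError("enter at least one probability")
    if any(v < 0 or not math.isfinite(v) for v in values):
        raise ValueError("probabilities must be finite and non-negative")
    total = sum(values)
    if total == 0:
        raise ValueError("probabilities cannot all be zero")
    return [v / total for v in values] if normalize else values

def parse_pair(p_text, q_text):
    """Parse P and Q and check that they describe the same number of outcomes."""
    p, q = parse_distribution(p_text), parse_distribution(q_text)
    if len(p) != len(q):
        raise ValueError(f"P has {len(p)} entries but Q has {len(q)}")
    return p, q

print(parse_pair("6, 4", "1, 1"))
```

With normalization on, raw counts such as "6, 4" and "1, 1" become [0.6, 0.4] and [0.5, 0.5], so you are comparing normalized empirical distributions, as described under Limitations and assumptions.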



