Model Scaling Law Performance Calculator

JJ Ben-Joseph

Empirical scaling laws summarize how training loss tends to improve as you increase training resources. This calculator focuses on the common “data scaling” relationship where loss decreases as a power law in the number of training tokens.

Introduction: What this calculator estimates

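Given a baseline training run with token count N0 and observed training loss L0, plus a scaling exponent α and an irreducible loss floor B, the calculator fits the coefficient A, projects the loss at a new token count N1, and, if you supply a target loss, estimates the number of tokens needed to reach it.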

Definitions and units

The scaling-law formula

The calculator uses the common form:

$$L(N) = A \cdot N^{-\alpha} + B$$
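As a quick illustration of the curve's shape, the Python sketch below evaluates the formula for hypothetical parameter values. The function name `scaling_loss` and the numbers are illustrative assumptions, not the calculator's own code. It shows that each doubling of N multiplies the reducible part of the loss, L(N) − B, by the same factor 2^(−α).

```python
def scaling_loss(n_tokens: float, a: float, alpha: float, b: float) -> float:
    """Evaluate L(N) = A * N**(-alpha) + B for a positive token count."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return a * n_tokens ** (-alpha) + b


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to illustrate the shape.
    a, alpha, b = 50.0, 0.2, 1.0
    n = 1e9
    for _ in range(4):
        loss = scaling_loss(n, a, alpha, b)
        # The reducible part (loss - b) shrinks by 2**(-alpha) per doubling.
        print(f"N = {n:.1e}  L = {loss:.4f}  L - B = {loss - b:.4f}")
        n *= 2
```

Because the shrink factor per doubling is constant, the absolute improvement per doubling gets smaller as the loss approaches B.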

Solving for A from the baseline

Using the baseline observation (N0, L0):

$$A = (L_0 - B) \cdot N_0^{\alpha}$$

This requires L0 > B. If L0 is less than or equal to B, the fitted A is non-positive and the model no longer represents a diminishing-loss curve.
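A minimal Python sketch of this step, including the L0 > B check, might look like the following. The function name `fit_coefficient` is hypothetical, and the requirement that α and N0 be positive is an assumption added here for safety rather than something the calculator is known to enforce.

```python
def fit_coefficient(n0: float, l0: float, alpha: float, b: float) -> float:
    """Return A = (L0 - B) * N0**alpha from one baseline observation."""
    # Assumption: token count and exponent must be positive.
    if n0 <= 0 or alpha <= 0:
        raise ValueError("n0 and alpha must be positive")
    # The baseline loss has to sit above the floor, otherwise A <= 0.
    if l0 <= b:
        raise ValueError("baseline loss L0 must be greater than the floor B")
    return (l0 - b) * n0 ** alpha


if __name__ == "__main__":
    # Hypothetical baseline: 100 tokens, loss 3.0, alpha 0.5, floor 1.0.
    print(fit_coefficient(100, 3.0, 0.5, 1.0))
```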

Projecting loss at N1

Once A is known, the projected loss at N1 is:

$$L(N_1) = A \cdot N_1^{-\alpha} + B$$
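Substituting the baseline expression for A gives an equivalent ratio form, L(N1) = (L0 − B) × (N1 / N0)^(−α) + B, which avoids computing a very large intermediate A. The sketch below uses that form; the function name `project_loss` and the sample inputs are hypothetical, and this is not necessarily how the calculator computes the value internally.

```python
def project_loss(n0: float, l0: float, n1: float, alpha: float, b: float) -> float:
    """Project loss at n1 tokens from a baseline (n0, l0).

    Uses L(N1) = (L0 - B) * (N1 / N0)**(-alpha) + B, which is
    algebraically the same as A * N1**(-alpha) + B with
    A = (L0 - B) * N0**alpha.
    """
    if n0 <= 0 or n1 <= 0:
        raise ValueError("token counts must be positive")
    if l0 <= b:
        raise ValueError("baseline loss L0 must be greater than the floor B")
    return (l0 - b) * (n1 / n0) ** (-alpha) + b


if __name__ == "__main__":
    # Hypothetical inputs: 4x more tokens with alpha = 0.5 halves the
    # reducible loss (L0 - B).
    print(project_loss(n0=100, l0=3.0, n1=400, alpha=0.5, b=1.0))
```

The ratio form also makes it clear that only the multiple N1 / N0 matters, not the absolute token counts.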

Solving for tokens needed to reach a target loss

If you provide a target loss Ltarget (must satisfy Ltarget > B), then:

$$N_{\text{target}} = \left(\frac{A}{L_{\text{target}} - B}\right)^{1/\alpha}$$
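The inversion can be sketched in Python as follows. The function name `tokens_for_target` and the sample values are hypothetical; the code fits A from the baseline and then applies the formula above, rejecting targets at or below the floor.

```python
def tokens_for_target(n0: float, l0: float, l_target: float,
                      alpha: float, b: float) -> float:
    """Return the token count at which the fitted curve reaches l_target."""
    if n0 <= 0 or alpha <= 0:
        raise ValueError("n0 and alpha must be positive")
    if l0 <= b:
        raise ValueError("baseline loss L0 must be greater than the floor B")
    if l_target <= b:
        raise ValueError("target loss must be greater than the floor B")
    a = (l0 - b) * n0 ** alpha          # fit A from the baseline
    return (a / (l_target - b)) ** (1.0 / alpha)


if __name__ == "__main__":
    # Hypothetical inputs: halving the reducible loss with alpha = 0.5
    # needs 2**(1/0.5) = 4 times the baseline tokens.
    print(tokens_for_target(n0=100, l0=3.0, l_target=2.0, alpha=0.5, b=1.0))
```

A target above L0 is still valid input: the result is then smaller than N0, meaning the curve already passed that loss before the baseline.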

How to interpret the results

Worked example (matches the default inputs)

Suppose you observed:

First compute A:

Now project loss at N1:

If you also set Ltarget = 1.5:

This illustrates a common takeaway: pushing training loss close to the floor B can require enormous increases in tokens.
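To see this takeaway numerically, the sketch below sweeps the target loss toward the floor and prints the required multiple of baseline tokens, N_target / N0 = ((L0 − B) / (Ltarget − B))^(1/α). All inputs except Ltarget = 1.5 are hypothetical values picked for illustration and are not necessarily the calculator's defaults; `token_multiplier` is likewise a hypothetical helper name.

```python
def token_multiplier(l0: float, l_target: float, alpha: float, b: float) -> float:
    """Return N_target / N0 implied by the power-law fit."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if l0 <= b or l_target <= b:
        raise ValueError("both losses must be greater than the floor B")
    return ((l0 - b) / (l_target - b)) ** (1.0 / alpha)


if __name__ == "__main__":
    # Hypothetical baseline loss, exponent, and floor (illustration only).
    l0, alpha, b = 2.0, 0.3, 1.2
    for l_target in (1.8, 1.5, 1.3, 1.25):
        m = token_multiplier(l0, l_target, alpha, b)
        print(f"target {l_target:.2f}: about {m:,.0f}x the baseline tokens")
```

The multiplier grows without bound as the target approaches B, because the denominator Ltarget − B goes to zero and is then raised to the power 1/α.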

Quick comparison: what changes when you scale data?

| Change | What happens to L(N)? | Practical implication |
| --- | --- | --- |
| Increase N (more tokens) | L decreases roughly as $N^{-\alpha}$ until it nears B | Diminishing returns; biggest gains are earlier |
| Increase α | Curve falls faster with N | Fewer extra tokens needed for the same loss drop |
| Increase B | Loss floor rises; all projections shift upward | Data/architecture/label noise may be limiting |
| Increase L0 with same N0 | Implied A increases | Worse baseline implies higher losses at all N unless you change B or α |
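The comparison above can be checked with a small sensitivity sketch: vary one input at a time and re-project the loss at the same N1. The baseline values and the helper name `project_loss` are hypothetical, and the curve is refit through the baseline point in every scenario, which is an assumption about how the inputs interact.

```python
def project_loss(n0: float, l0: float, n1: float, alpha: float, b: float) -> float:
    """Projected loss at n1 for a curve refit through the baseline (n0, l0)."""
    if n0 <= 0 or n1 <= 0:
        raise ValueError("token counts must be positive")
    if l0 <= b:
        raise ValueError("baseline loss L0 must be greater than the floor B")
    return (l0 - b) * (n1 / n0) ** (-alpha) + b


if __name__ == "__main__":
    # Hypothetical reference scenario (illustration only).
    base = dict(n0=1e9, l0=2.5, n1=1e10, alpha=0.3, b=1.0)
    scenarios = {
        "reference": base,
        "more tokens (N1 x10)": {**base, "n1": 1e11},
        "higher alpha": {**base, "alpha": 0.4},
        "higher floor B": {**base, "b": 1.2},
        "higher baseline L0": {**base, "l0": 2.8},
    }
    for name, params in scenarios.items():
        print(f"{name:22s} projected loss = {project_loss(**params):.4f}")
```

Note that with the baseline held fixed, raising B raises projections only for N1 greater than N0; for N1 below N0 the refit curve moves the other way.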

Assumptions & limitations (read before acting on projections)

Practical tips

How to use this calculator

  1. Enter Baseline Dataset Tokens (N₀) in the unit shown by the field.
  2. Enter Observed Baseline Loss (L₀), the training loss measured at N₀.
  3. Enter Scaling Exponent (α), a dimensionless positive number.
  4. Run the calculation and compare the output with a second scenario before acting on it.
