Empirical Bayes information quantifies how much information each “other” observation z_j contributes to estimating a particular parameter mu_i in empirical Bayes estimation via generalized Tweedie’s formula. This concept, introduced by Efron (2011), formalizes the fundamental trade-off in score-based estimation: the score function nabla log f(z) must be estimated from data, and the quality of that estimate determines the gap between empirical and true Bayes performance.
The conditional regret of using the empirical Bayes estimate mu-hat(z_0) = z_0 + l-hat’(z_0) in place of the true Bayes estimate mu^+(z_0) = z_0 + l’(z_0) is Reg(z_0) = E{(l-hat’(z_0) - l’(z_0))^2 | z_0}: the conditional mean squared error of the estimated score. This decays as Reg(z_0) ~ c(z_0)/N, where N is the number of observations used to estimate the score. The empirical Bayes information at z_0 is defined as I(z_0) = 1/c(z_0), so that Reg(z_0) ~ 1/(N * I(z_0)).
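The 1/N decay of the regret can be checked by simulation. Below is a minimal sketch under assumed choices that are not from the source: a normal-normal model (mu ~ N(0, A), z | mu ~ N(mu, 1), so the marginal is N(0, A+1) and l’(z_0) = -z_0/(A+1)), a kernel density estimate of the score with a standard N^(-1/5) bandwidth, and illustrative values for A, z_0, and the replication count.

```python
import numpy as np

rng = np.random.default_rng(0)
A = 1.0                       # assumed prior variance: mu ~ N(0, A), z | mu ~ N(mu, 1)
z0 = 1.0                      # evaluation point (illustrative choice)
true_score = -z0 / (A + 1)    # marginal is N(0, A+1), so l'(z0) = -z0/(A+1)

def kde_score(z, z0, h):
    """Estimate l'(z0) = f'(z0)/f(z0) with a Gaussian kernel of bandwidth h."""
    u = (z0 - z) / h
    k = np.exp(-0.5 * u**2)        # unnormalized Gaussian kernel (constant cancels in the ratio)
    f = k.mean() / h               # kernel density estimate at z0
    fprime = -(u * k).mean() / h**2  # its derivative at z0
    return fprime / f

def regret(N, reps=400):
    """Monte Carlo estimate of Reg(z0) = E[(l-hat'(z0) - l'(z0))^2]."""
    h = N ** (-1 / 5)              # standard KDE bandwidth rate (assumption)
    errs = []
    for _ in range(reps):
        z = rng.normal(0.0, np.sqrt(A + 1), size=N)
        errs.append((kde_score(z, z0, h) - true_score) ** 2)
    return float(np.mean(errs))

for N in (200, 2000):
    print(N, regret(N))
```

With a kernel estimator the exact rate depends on the bandwidth choice, but the qualitative behavior matches the text: the regret shrinks as N grows.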
In the context of diffusion models, this framework quantifies how well a score network s_theta(x,t) can approximate the true score nabla log p_t(x): the regret plays the role of the pointwise score matching loss, and the information governs how that loss scales with training data. The James-Stein case gives I(z_0) = (V/z_0)^2 / 2, showing that information is highest near the center of the distribution and lowest in the tails.
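The score-matching side of the analogy can be made concrete with a toy denoising score matching fit. This is a sketch under assumptions not in the source: one-dimensional N(0, 1) data, a single noise level sigma, and a one-parameter linear "score network" s_theta(x) = theta * x. Regressing on the denoising target -(x - y)/sigma^2 should recover the true score slope of the noised marginal N(0, 1 + sigma^2), namely -1/(1 + sigma^2).

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 1.0                               # assumed noise level of the perturbation kernel
y = rng.normal(0.0, 1.0, size=20000)      # clean "data" samples, N(0, 1)
eps = rng.normal(0.0, 1.0, size=y.size)
x = y + sigma * eps                       # noised samples from p_sigma = N(0, 1 + sigma^2)

# Denoising score matching target: -(x - y)/sigma^2
target = -(x - y) / sigma**2

# Least-squares fit of the linear score model s_theta(x) = theta * x
theta_hat = np.sum(x * target) / np.sum(x * x)

# True score of N(0, 1 + sigma^2) is -x/(1 + sigma^2), so theta_hat should be near -0.5
print("theta_hat:", theta_hat)
```

The fitted slope approximating -1/(1 + sigma^2) illustrates the claim in the text: minimizing the score matching loss drives s_theta toward the true score of the noised marginal, and the residual loss is exactly the regret of the preceding section.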
Key Details
- Definition: I(z_0) = 1/c(z_0), where Reg(z_0) ~ c(z_0)/N
- Regret equals squared error of estimated score: E{(l-hat’(z_0) - l’(z_0))^2 | z_0}
- James-Stein case: I(z_0) = (V/z_0)^2 / 2
- High near the center of f(z), low in the tails where data is sparse
- Directly analogous to score estimation error in diffusion model training
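The James-Stein formula in the list can be verified numerically. This sketch uses the standard James-Stein plug-in for the score, l-hat’(z_0) = -z_0 (N-2)/S with S = sum of z_j^2, when the marginal is N(0, V); the values of V, z_0, N, and the replication count are illustrative assumptions. The simulated regret should match the prediction Reg(z_0) ~ 1/(N * I(z_0)) with I(z_0) = (V/z_0)^2 / 2.

```python
import numpy as np

rng = np.random.default_rng(1)
V = 2.0                        # assumed marginal variance: z_j ~ N(0, V)
z0 = 1.5                       # evaluation point (illustrative choice)
N = 1000
I_z0 = (V / z0) ** 2 / 2       # empirical Bayes information at z0 (formula from the text)
true_score = -z0 / V           # l'(z0) for the N(0, V) marginal

errs = []
for _ in range(2000):
    z = rng.normal(0.0, np.sqrt(V), size=N)
    S = np.sum(z ** 2)
    score_hat = -z0 * (N - 2) / S   # James-Stein plug-in estimate of -z0/V
    errs.append((score_hat - true_score) ** 2)

reg = float(np.mean(errs))
print("simulated regret:   ", reg)
print("predicted 1/(N*I):  ", 1.0 / (N * I_z0))
```

Up to Monte Carlo noise and O(1/N) corrections, the two printed numbers agree, confirming that c(z_0) = 2 (z_0/V)^2 and hence I(z_0) = (V/z_0)^2 / 2 in this case.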