A noise schedule for diffusion models that destroys information more gradually than the linear schedule, which is particularly beneficial for lower-resolution images. It is defined via ᾱ_t = f(t)/f(0), where f(t) = cos²((t/T + s)/(1+s) · π/2) and s = 0.008 is a small offset. This gives a near-linear drop-off of ᾱ_t in the middle of the process, while ᾱ_t changes very little near t=0 and t=T, which avoids abrupt changes in noise level at either end of the diffusion trajectory.
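As a minimal sketch of the definition above, the following NumPy snippet evaluates ᾱ_t for t = 0..T. The function name `cosine_alpha_bar` and the choice T = 1000 are illustrative assumptions, not part of the original definition.

```python
import numpy as np

def cosine_alpha_bar(T, s=0.008):
    """Return alpha_bar_t for t = 0..T under the cosine schedule."""
    t = np.arange(T + 1, dtype=np.float64)
    # f(t) = cos^2(((t/T + s) / (1 + s)) * pi/2)
    f = np.cos((t / T + s) / (1 + s) * np.pi / 2) ** 2
    # Normalise by f(0) so that alpha_bar_0 = 1.
    return f / f[0]

alpha_bar = cosine_alpha_bar(T=1000)  # T = 1000 is an illustrative choice
```

The resulting array starts at 1, decreases monotonically, and is close to 0 at t = T.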

The per-step variances follow from β_t = 1 − ᾱ_t/ᾱ_{t−1} and are clipped to at most 0.999. The clipping prevents singularities near t=T, where f(t) approaches zero and β_t would otherwise approach 1. Introduced by Nichol & Dhariwal (2021), the cosine schedule improves both log-likelihoods and sample quality compared to the linear schedule, especially on 32×32 and 64×64 images, where the linear schedule wastes diffusion capacity on near-pure-noise timesteps that contribute little to learning.
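The clipping step can be sketched as follows, assuming the usual relation β_t = 1 − ᾱ_t/ᾱ_{t−1} between the per-step variances and the cumulative products. The helper name `cosine_betas`, the parameter name `max_beta`, and T = 1000 are hypothetical choices for illustration.

```python
import numpy as np

def cosine_betas(T, s=0.008, max_beta=0.999):
    """Return (clipped, raw) beta_t for t = 1..T under the cosine schedule."""
    t = np.arange(T + 1, dtype=np.float64)
    f = np.cos((t / T + s) / (1 + s) * np.pi / 2) ** 2
    alpha_bar = f / f[0]
    # beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}
    raw = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
    # Cap beta_t so the last steps do not reach 1.
    clipped = np.clip(raw, 0.0, max_beta)
    return clipped, raw

clipped, raw = cosine_betas(T=1000)  # T = 1000 is an illustrative choice
```

Returning both arrays makes it easy to inspect which timesteps the cap actually affects; in this setup the unclipped value only exceeds 0.999 at the very end of the process.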

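The offset s = 0.008 prevents β_t from being too small near t=0. Nichol & Dhariwal report that tiny amounts of noise at the beginning of the process made it hard for the network to predict the noise ε accurately enough, and they chose s so that √β_0 is slightly smaller than the pixel bin size 1/127.5.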
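To illustrate the effect of the offset, the sketch below compares the first-step β with and without s. The helper `first_beta` and T = 1000 are assumptions made for this example only.

```python
import numpy as np

def first_beta(T, s):
    """beta at the first step, 1 - f(1)/f(0), for a given offset s."""
    def f(t):
        return np.cos((t / T + s) / (1 + s) * np.pi / 2) ** 2
    return 1.0 - f(1) / f(0)

T = 1000  # illustrative number of diffusion steps
beta_no_offset = first_beta(T, s=0.0)
beta_with_offset = first_beta(T, s=0.008)

# How much larger the first noise step becomes once the offset is applied.
ratio = beta_with_offset / beta_no_offset
```

With s = 0 the first step adds almost no noise; the offset raises the first-step variance by more than an order of magnitude in this setting.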

Key Details

  • f(t) = cos²((t/T + s)/(1+s) · π/2) with s=0.008
  • Gentle transitions at both endpoints of the diffusion process
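  • NLL on ImageNet 64×64 improves from 3.99 to 3.57 bits/dim in the paper's ablation, a gap that also reflects other changes (more diffusion steps, a different training objective), not the schedule alone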
  • β_t clipped at 0.999
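
The contrast with the linear schedule can be checked numerically. The sketch below assumes T = 1000 and the linear β range 1e-4 to 0.02 from the original DDPM setup (Ho et al., 2020); the 0.01 threshold for "near-pure noise" is an arbitrary illustrative cutoff.

```python
import numpy as np

T = 1000  # illustrative number of diffusion steps
t = np.arange(1, T + 1, dtype=np.float64)

# Linear schedule: beta_t rises linearly, alpha_bar_t is the running product.
betas_lin = np.linspace(1e-4, 0.02, T)
alpha_bar_lin = np.cumprod(1.0 - betas_lin)

# Cosine schedule: alpha_bar_t = f(t) / f(0).
s = 0.008

def f(u):
    return np.cos((u / T + s) / (1 + s) * np.pi / 2) ** 2

alpha_bar_cos = f(t) / f(0.0)

# Fraction of timesteps where almost no signal remains.
threshold = 0.01  # arbitrary cutoff for "near-pure noise"
frac_lin = np.mean(alpha_bar_lin < threshold)
frac_cos = np.mean(alpha_bar_cos < threshold)
```

Under these assumptions the linear schedule spends a noticeably larger share of its timesteps below the cutoff than the cosine schedule does, and the cosine schedule retains more signal at the midpoint.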
