A noise schedule for diffusion models that destroys information more gradually than the linear schedule, which is particularly beneficial for lower-resolution images. It is defined via ᾱ_t = f(t)/f(0), where f(t) = cos²((t/T + s)/(1+s) · π/2) and s = 0.008 is a small offset. The resulting ᾱ_t falls off almost linearly in the middle of the process and changes very little near t=0 and t=T, which avoids abrupt changes in noise level at either end of the diffusion trajectory.
The per-step variances follow as β_t = 1 − ᾱ_t/ᾱ_{t−1} and are clipped at 0.999 to prevent singularities near t=T, where f(t) approaches zero and β_t would otherwise approach 1. Introduced by Nichol & Dhariwal (2021), the cosine schedule improves both log-likelihoods and sample quality compared to the linear schedule, especially on 32×32 and 64×64 images, where the linear schedule spends many late timesteps on near-pure-noise inputs that contribute little to learning.
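The offset s = 0.008 prevents β_t from being too small near t=0. Nichol & Dhariwal found that tiny amounts of noise at the start of the process made it hard for the network to predict the noise ε accurately, and chose s so that √β_0 is slightly smaller than the pixel bin size 1/127.5.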
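The definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than the reference implementation: the function names `cosine_alpha_bar` and `cosine_betas` and the choice T = 1000 are assumptions made for this example, and β_t is derived as 1 − ᾱ_t/ᾱ_{t−1}.

```python
import math
from typing import List


def cosine_alpha_bar(t: int, T: int, s: float = 0.008) -> float:
    """Return alpha_bar_t = f(t) / f(0) for the cosine schedule."""
    def f(u: float) -> float:
        return math.cos((u / T + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def cosine_betas(T: int, s: float = 0.008, max_beta: float = 0.999) -> List[float]:
    """Return [beta_1, ..., beta_T], each clipped to at most max_beta."""
    betas = []
    for t in range(1, T + 1):
        # beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}
        ratio = cosine_alpha_bar(t, T, s) / cosine_alpha_bar(t - 1, T, s)
        # Near t = T the ratio approaches 0, so beta_t approaches 1; clip it.
        betas.append(min(1.0 - ratio, max_beta))
    return betas


# T = 1000 is an illustrative choice, not a value taken from the text above.
betas = cosine_betas(T=1000)
print("first beta:", betas[0])
print("last beta:", betas[-1])
```

With these settings only steps very close to t = T reach the clip value. Everywhere else β_t is determined entirely by the ratio of consecutive ᾱ values.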
Key Details
- f(t) = cos²((t/T + s)/(1+s) · π/2) with s=0.008
- Gentle transitions at both endpoints of the diffusion process
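- NLL on ImageNet 64×64 improves from 3.99 to 3.57 bits/dim in Nichol & Dhariwal's ablation; this gain reflects the cosine schedule combined with the paper's other changes (more diffusion steps and a different training objective), not the schedule alone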
- β_t clipped at 0.999
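
The claim that the linear schedule spends many timesteps near pure noise can be checked numerically. The sketch below rests on several assumptions that are not stated in this entry: T = 1000, a linear schedule with β_t evenly spaced from 1e-4 to 0.02 (the commonly used DDPM setting), and an arbitrary threshold of ᾱ_t < 0.01 as a stand-in for "near-pure noise". The helper names are hypothetical.

```python
import math
from typing import List

T = 1000  # assumed number of diffusion steps, for illustration only


def cosine_alpha_bars(T: int, s: float = 0.008) -> List[float]:
    """alpha_bar_t = f(t) / f(0) for t = 1..T (unclipped cosine schedule)."""
    def f(t: float) -> float:
        return math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2
    return [f(t) / f(0) for t in range(1, T + 1)]


def linear_alpha_bars(T: int, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """alpha_bar_t as the running product of (1 - beta_i), with beta_i linear."""
    alpha_bars, prod = [], 1.0
    for i in range(T):
        beta = beta_start + (beta_end - beta_start) * i / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def fraction_below(alpha_bars: List[float], threshold: float = 0.01) -> float:
    """Share of timesteps whose alpha_bar is below the (arbitrary) threshold."""
    return sum(a < threshold for a in alpha_bars) / len(alpha_bars)


linear_frac = fraction_below(linear_alpha_bars(T))
cosine_frac = fraction_below(cosine_alpha_bars(T))
print(f"linear: {linear_frac:.1%} of steps have alpha_bar < 0.01")
print(f"cosine: {cosine_frac:.1%} of steps have alpha_bar < 0.01")
```

Under these assumptions the linear schedule has a noticeably larger share of timesteps below the threshold than the cosine schedule, which is the behavior described above. The exact percentages depend on the threshold chosen.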