LiSSA (Linear time Stochastic Second-order Algorithm) is a computationally efficient method for approximating the inverse Hessian matrix, adopted in ZHANGB2024 to make certified unlearning practical for deep neural networks.

In certified unlearning, the Newton update requires computing (H_{w*} + lambda*I)^{-1} * grad_L(w*, D_r), where H_{w*} is the Hessian of the loss at the minimizer w* and lambda is the local convex approximation parameter. Direct computation of the inverse Hessian has complexity O(np^2 + p^3), which is prohibitive for large networks with p parameters and n training samples.
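To make that cost concrete, here is a minimal sketch of the direct (exact) Newton step on a toy quadratic problem; all names and sizes are illustrative stand-ins, not from ZHANGB2024:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 50                                    # number of parameters
A = rng.standard_normal((p, p))
hessian = A @ A.T / p                     # symmetric PSD stand-in for H_{w*}
grad = rng.standard_normal(p)             # stand-in for grad_L(w*, D_r)
lam = 0.1                                 # local convex approximation parameter

# Exact update direction (H + lambda*I)^{-1} grad: factorizing the
# regularized Hessian costs O(p^3), which LiSSA is designed to avoid.
direction = np.linalg.solve(hessian + lam * np.eye(p), grad)
```

For p in the millions, even storing the p x p Hessian is infeasible, which is why the stochastic approximation below matters.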

LiSSA approximates the inverse Hessian through a stochastic recursive procedure. Given s i.i.d. samples {H_{1,lambda}, …, H_{s,lambda}} of the regularized Hessian matrix, the approximation is computed iteratively:

  • H_tilde_{0,lambda}^{-1} = I
  • H_tilde_{t,lambda}^{-1} = I + (I - H_{t,lambda}/H) * H_tilde_{t-1,lambda}^{-1}

where H is an upper bound on the Hessian norm. The key theoretical guarantee (Proposition 3.5 in ZHANGB2024) is that H_tilde_{s,lambda}^{-1} / H is an asymptotically unbiased estimator of (H_{w*} + lambda*I)^{-1}: in the limit, E[H_tilde_{infinity,lambda}^{-1} / H] = E[(H_{w*} + lambda*I)^{-1}].
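The recursion above can be sketched numerically. The fragment below runs the matrix-form recursion on a single fixed regularized Hessian (ignoring sampling noise, so the limit is exact rather than in expectation); H_bound, lam, and the step count are illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 20
A = rng.standard_normal((p, p))
H = A @ A.T / p                           # stand-in Hessian H_{w*}
lam = 0.5
H_lam = H + lam * np.eye(p)               # regularized Hessian H_{t,lambda}
H_bound = 1.1 * np.linalg.norm(H_lam, 2)  # upper bound on the Hessian norm

# H_tilde_{0,lambda}^{-1} = I, then iterate the LiSSA recursion.
H_inv_approx = np.eye(p)
for _ in range(2000):                     # s recursion steps
    H_inv_approx = np.eye(p) + (np.eye(p) - H_lam / H_bound) @ H_inv_approx

estimate = H_inv_approx / H_bound         # rescale by the norm bound
err = np.max(np.abs(estimate - np.linalg.inv(H_lam)))
```

The fixed point of X = I + (I - H_lam/H_bound) X is X = H_bound * H_lam^{-1}, so dividing by H_bound recovers (H + lambda*I)^{-1}; convergence holds because scaling by H_bound puts every eigenvalue of H_lam/H_bound strictly between 0 and 1.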

Using the Hessian-vector product technique, the product H_tilde_{s,lambda}^{-1} * grad_L(w*, D_u) can be computed in O(sp) time, independent of the full dataset size n, since each recursion step applied to a vector needs only one Hessian-vector product rather than an explicit Hessian. The gradient computation is also made efficient: because w* is a stationary point of the loss over the full dataset, grad_L(w*, D_r) = -n_u/(n - n_u) * grad_L(w*, D_u), reducing the gradient cost from O(np) to O(n_u * p) when the unlearned set (of size n_u) is much smaller than the full dataset.
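The gradient identity can be checked numerically on a least-squares problem, where the full-data minimizer w* is available in closed form (the setup below is a hypothetical illustration, not the paper's experiment):

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_u, p = 200, 20, 5
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
w_star, *_ = np.linalg.lstsq(X, y, rcond=None)   # full-data minimizer: grad ~ 0

def grad(Xs, ys, w):
    # Mean-squared-error gradient: (1/m) * X^T (X w - y)
    return Xs.T @ (Xs @ w - ys) / len(ys)

g_u = grad(X[:n_u], y[:n_u], w_star)             # unlearned set D_u
g_r = grad(X[n_u:], y[n_u:], w_star)             # retained set D_r

# grad_L(w*, D_r) = -n_u/(n - n_u) * grad_L(w*, D_u)
print(np.allclose(g_r, -n_u / (n - n_u) * g_u))  # prints True
```

The identity follows from n_u * g_u + (n - n_u) * g_r = n * grad_L(w*, D) = 0 at the stationary point, so only the (much smaller) unlearned set needs a gradient pass.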

In experiments on MNIST with an MLP, LiSSA achieves a 470x speedup over computing the exact inverse Hessian while maintaining the same certified unlearning guarantees.


method