Summation form unlearning is the foundational approach to exact machine unlearning proposed by Cao & Yang (2015). It converts learning algorithms into a “summation form” based on statistical query (SQ) learning, where the model depends on a small number of aggregate summations rather than individual data points. Unlearning then reduces to subtracting one sample’s contributions from these summations and recomputing the model.

For non-adaptive SQ algorithms (e.g., naive Bayes, the chi-squared test), the learning function takes the form Learn(G₁, G₂, …, G_m), where G_k = Σᵢ g_k(x_i, l_i) and each g_k is an efficiently computable per-sample transformation. To unlearn sample x_p, compute G_k′ = G_k − g_k(x_p, l_p) for each k and recompute Learn on the updated summations. For adaptive SQ algorithms (e.g., gradient descent, k-means, SVM), the summations depend on the current model state, so contributions cannot simply be subtracted; instead, the model that converged on the full data set serves as a warm start for retraining on the remaining data, which typically requires far fewer iterations than training from scratch.
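A minimal sketch of the non-adaptive case, using a naive-Bayes-style learner whose model depends only on class counts and per-class feature sums; the function names and data here are illustrative, not from the paper:

```python
import numpy as np

def g(x, label, n_classes):
    """Per-sample contribution g_k: a one-hot class count and a per-class feature sum."""
    class_count = np.zeros(n_classes)
    class_count[label] = 1.0
    feature_sum = np.zeros((n_classes, x.shape[0]))
    feature_sum[label] = x
    return class_count, feature_sum

def learn(class_counts, feature_sums):
    """Recompute the model from the summations alone (class priors, per-class feature means)."""
    priors = class_counts / class_counts.sum()
    means = feature_sums / class_counts[:, None]
    return priors, means

# Build the summations G_k over the data set once.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0, 1, 0])
counts = np.zeros(2)
sums = np.zeros((2, 2))
for xi, yi in zip(X, y):
    c, s = g(xi, yi, 2)
    counts += c
    sums += s

# Unlearn sample 0: subtract its contributions, then recompute Learn.
c0, s0 = g(X[0], y[0], 2)
counts -= c0
sums -= s0
priors, means = learn(counts, sums)
```

Because the model is a pure function of the summations, the result is identical to retraining from scratch on the remaining samples.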

Key Details

  • Non-adaptive unlearning: O(1) per summation, complete by construction
  • Adaptive unlearning: Resume from near-converged state, typically 10-20x fewer iterations than from scratch
  • Completeness: Guaranteed for non-adaptive (identical to retrained) and single-convergence-point adaptive algorithms
  • Timeliness: Asymptotic speedup of O(N/m) for non-adaptive, empirically 4.5x-95000x across systems
  • Scope: Applies to naive Bayes, linear regression, logistic regression, SVM, k-means, decision trees
  • Lineage tracking: Introduces the broader concept of tracking a sample’s complete propagation through derived data and models, so that its entire lineage can be forgotten
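The adaptive warm-start behavior noted above can be sketched with gradient descent on a least-squares problem; the learner, data, learning rate, and tolerance are illustrative assumptions, not values from the paper:

```python
import numpy as np

def gd(X, y, w0, lr=0.1, tol=1e-8, max_iter=10000):
    """Plain gradient descent for least squares; returns weights and iterations used."""
    w = w0.copy()
    for it in range(max_iter):
        grad = X.T @ (X @ w - y) / len(y)
        w_new = w - lr * grad
        if np.linalg.norm(w_new - w) < tol:
            return w_new, it + 1
        w = w_new
    return w, max_iter

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=100)

w_full, _ = gd(X, y, np.zeros(3))            # converge on the full data set
X_rem, y_rem = X[1:], y[1:]                  # delete sample 0
w_cold, it_cold = gd(X_rem, y_rem, np.zeros(3))   # retrain from scratch
w_warm, it_warm = gd(X_rem, y_rem, w_full)        # warm start from converged state
```

Removing one sample barely moves the optimum, so the warm start begins near the new solution and re-converges in far fewer iterations than the cold start, while reaching the same weights.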
