Abstract
The problem of privacy-preserving data analysis has a long history spanning multiple disciplines. As electronic data about individuals becomes increasingly detailed, and as technology enables ever more powerful collection and curation of these data, the need increases for a robust, meaningful, and mathematically rigorous definition of privacy, together with a computationally rich class of algorithms that satisfy this definition. Differential Privacy is such a definition. After motivating and discussing the meaning of differential privacy, the preponderance of this monograph is devoted to fundamental techniques for achieving differential privacy, and application of these techniques in creative combinations, using the query-release problem as an ongoing example. A key point is that, by rethinking the computational goal, one can often obtain far better results than would be achieved by methodically replacing each step of a non-private computation with a differentially private implementation.
Summary
This monograph provides the canonical reference for the mathematical theory of differential privacy. Chapter 2 introduces the formal definition of (epsilon, delta)-differential privacy (Definition 2.4): a randomized algorithm M is (epsilon, delta)-differentially private if for all neighboring databases x, y differing in at most one record and all measurable sets S, Pr[M(x) in S] <= exp(epsilon) * Pr[M(y) in S] + delta. The parameter epsilon controls the multiplicative privacy loss, while delta allows a small additive probability of failure. When delta = 0, the definition reduces to pure epsilon-differential privacy. Key properties established include closure under post-processing (Proposition 2.1) and group privacy (Theorem 2.2).
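The definition can be illustrated with randomized response, the classic coin-flip protocol discussed in Chapter 2. A minimal sketch (the function name is ours; the constants are the standard ones):

```python
import math

# Randomized response: flip a coin; on heads report the true bit,
# on tails report the outcome of a second coin flip. The true bit
# is therefore reported with probability 3/4.
def report_prob(true_bit: int, reported_bit: int) -> float:
    """Pr[reported_bit | true_bit] under randomized response."""
    return 0.75 if reported_bit == true_bit else 0.25

# Neighboring inputs differ in one bit, so the worst-case likelihood
# ratio over outputs bounds the privacy loss: 0.75 / 0.25 = 3, i.e.
# randomized response is (ln 3, 0)-differentially private.
worst_ratio = max(report_prob(1, b) / report_prob(0, b) for b in (0, 1))
epsilon = math.log(worst_ratio)
```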
Chapter 3 develops the fundamental mechanisms for achieving differential privacy. The Laplace mechanism (Definition 3.3) adds noise from Lap(Delta_f / epsilon) to each coordinate of a real-valued query f, where Delta_f is the L1 sensitivity. Appendix A provides the Gaussian mechanism, which instead adds N(0, sigma^2) noise to each coordinate, achieving (epsilon, delta)-DP when sigma >= c * Delta_2_f / epsilon for c^2 > 2 ln(1.25/delta) (Theorem A.1). The Gaussian mechanism is calibrated to the L2 sensitivity rather than the L1 sensitivity, and the L2 sensitivity is often much smaller for high-dimensional queries. The chapter also covers composition: the basic composition theorem (Theorem 3.16) shows that k-fold composition of (epsilon_i, delta_i)-DP mechanisms yields (sum epsilon_i, sum delta_i)-DP, while the advanced composition theorem (Theorem 3.20) gives the tighter bound epsilon' = sqrt(2k ln(1/delta')) * epsilon + k * epsilon * (e^epsilon - 1) for k-fold adaptive composition of (epsilon, delta)-DP mechanisms.
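Both mechanisms can be sketched in a few lines. This is an illustrative implementation under the stated calibrations, not code from the monograph; the function names and the counting-query example are our own:

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(true_answer: float, l1_sensitivity: float,
                      epsilon: float) -> float:
    """Definition 3.3: add Lap(Delta_f / epsilon) noise for pure epsilon-DP."""
    return true_answer + rng.laplace(0.0, l1_sensitivity / epsilon)

def gaussian_sigma(l2_sensitivity: float, epsilon: float, delta: float) -> float:
    """Theorem A.1 calibration: sigma = sqrt(2 ln(1.25/delta)) * Delta_2 / epsilon."""
    return math.sqrt(2 * math.log(1.25 / delta)) * l2_sensitivity / epsilon

def gaussian_mechanism(true_answer: float, l2_sensitivity: float,
                       epsilon: float, delta: float) -> float:
    """Add N(0, sigma^2) noise for (epsilon, delta)-DP."""
    sigma = gaussian_sigma(l2_sensitivity, epsilon, delta)
    return true_answer + rng.normal(0.0, sigma)

# A counting query ("how many records satisfy P?") has sensitivity 1
# in both norms: changing one record changes the count by at most 1.
noisy_count = laplace_mechanism(100.0, l1_sensitivity=1.0, epsilon=0.5)
```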
These foundational results underpin the privacy analysis in certified machine unlearning. The Gaussian mechanism provides the noise injection used in gradient perturbation and output perturbation approaches to unlearning, while the advanced composition theorem governs how privacy degrades across sequential unlearning requests. The sensitivity framework determines how much noise is needed given the influence of individual data points on model parameters.
Key Contributions
- Formal definition of (epsilon, delta)-differential privacy with rigorous treatment of the probabilistic semantics
- The Laplace mechanism for achieving pure epsilon-DP for real-valued queries, calibrated to L1 sensitivity
- The Gaussian mechanism (Appendix A) for achieving (epsilon, delta)-DP, calibrated to L2 sensitivity
- Basic composition theorem: epsilons and deltas add up linearly under composition
- Advanced composition theorem: epsilon grows as O(sqrt(k)) rather than O(k) under k-fold adaptive composition, at the cost of an additional delta’ failure probability
- The exponential mechanism for differentially private selection from discrete sets
- Framework of approximate max-divergence and statistical distance for proving composition (Lemma 3.17)
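The exponential mechanism listed above admits a short sketch: sample each candidate with probability proportional to exp(epsilon * u(r) / (2 * Delta_u)), where u is the utility function and Delta_u its sensitivity. The implementation and the label-selection example below are ours:

```python
import math
import random

def exponential_mechanism(candidates, utility, u_sensitivity, epsilon, seed=0):
    """Select r from a discrete candidate set with probability proportional
    to exp(epsilon * utility(r) / (2 * u_sensitivity)); pure epsilon-DP."""
    rng = random.Random(seed)
    weights = [math.exp(epsilon * utility(r) / (2 * u_sensitivity))
               for r in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

# Example: privately select the most common label. The utility is the
# label's count, which changes by at most 1 when one record changes,
# so Delta_u = 1.
counts = {"a": 40, "b": 35, "c": 5}
choice = exponential_mechanism(list(counts), counts.get,
                               u_sensitivity=1.0, epsilon=1.0)
```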
Methodology
The monograph develops a systematic mathematical framework proceeding from definitions through basic mechanisms to composition theorems. Privacy guarantees are proved by bounding the log-likelihood ratio (the privacy loss) between outputs on neighboring databases. The Gaussian mechanism proof (Appendix A) bounds the privacy loss random variable |1/(2sigma^2) * (2x*Delta_f + Delta_f^2)|, where x is the Gaussian noise draw, and uses Gaussian tail bounds to show that it exceeds epsilon with probability at most delta. The advanced composition theorem proof applies Azuma's inequality to the sequence of per-step privacy loss random variables, exploiting the martingale structure that arises from the bounded expectation of each step's privacy loss (Lemma 3.18: D(Y||Z) <= epsilon * (e^epsilon - 1) when D_inf(Y||Z) <= epsilon and D_inf(Z||Y) <= epsilon).
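The tail-bound argument can be checked numerically. The sketch below (our own; the epsilon and delta values are arbitrary) samples the Gaussian-mechanism privacy loss at the Theorem A.1 calibration and estimates how often its magnitude exceeds epsilon:

```python
import math
import numpy as np

epsilon, delta, sensitivity = 0.5, 1e-5, 1.0
sigma = math.sqrt(2 * math.log(1.25 / delta)) * sensitivity / epsilon

# Privacy loss of the Gaussian mechanism for a noise draw x ~ N(0, sigma^2)
# on neighboring databases whose query answers differ by Delta_f:
# |(2*x*Delta_f + Delta_f**2) / (2*sigma**2)|.
rng = np.random.default_rng(0)
x = rng.normal(0.0, sigma, size=1_000_000)
loss = np.abs((2 * x * sensitivity + sensitivity**2) / (2 * sigma**2))

# The calibration guarantees Pr[loss > epsilon] <= delta; the Gaussian
# tail bound is loose, so the empirical rate is typically well below delta.
tail_prob = float(np.mean(loss > epsilon))
```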
Key Findings
- The Gaussian mechanism requires sigma >= sqrt(2 ln(1.25/delta)) * Delta_2_f / epsilon, where Delta_2_f is the L2 sensitivity
- Under k-fold adaptive composition of (epsilon, delta)-DP mechanisms, the overall guarantee is (epsilon', k*delta + delta')-DP, where epsilon' = sqrt(2k ln(1/delta')) * epsilon + k * epsilon * (e^epsilon - 1)
- For small epsilon, the k * epsilon * (e^epsilon - 1) term is O(k * epsilon^2), so the advanced composition bound is approximately epsilon' ~ sqrt(2k ln(1/delta')) * epsilon
- Pure epsilon-DP composes to k*epsilon under k-fold use; (epsilon, delta)-DP composes to (k*epsilon, k*delta)-DP under basic composition
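The two composition regimes above can be compared directly; the parameter choices in this sketch are illustrative, not from the monograph:

```python
import math

def basic_epsilon(epsilon: float, k: int) -> float:
    """Theorem 3.16: privacy parameters add linearly under composition."""
    return k * epsilon

def advanced_epsilon(epsilon: float, k: int, delta_prime: float) -> float:
    """Theorem 3.20: k-fold adaptive composition of (epsilon, delta)-DP
    mechanisms is (epsilon', k*delta + delta')-DP for this epsilon'."""
    return (math.sqrt(2 * k * math.log(1 / delta_prime)) * epsilon
            + k * epsilon * (math.exp(epsilon) - 1))

# 1000 queries at epsilon = 0.01 each: the linear bound gives a total
# epsilon of 10, while the sqrt(k)-style bound stays below 2, at the
# cost of the extra delta' = 1e-5 failure probability.
k, eps, delta_prime = 1000, 0.01, 1e-5
basic = basic_epsilon(eps, k)
advanced = advanced_epsilon(eps, k, delta_prime)
```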
Important References
- Calibrating Noise to Sensitivity in Private Data Analysis
- The Composition Theorem for Differential Privacy
- Deep Learning with Differential Privacy