Traditional privacy frameworks were built for a world where data had a defined lifecycle: collection, storage, use, deletion. Machine learning breaks this model. Data is not just stored and retrieved. It is transformed into model parameters that persist indefinitely, resist deletion, and can regenerate approximations of the original data.
Privacy by Design, originally articulated by Ann Cavoukian, requires reimagining for the AI context. The EIAF extends its seven foundational principles with AI-specific implementations.
The AI Privacy Threat Model
AI systems create privacy risks that traditional data protection does not address. Model memorization occurs when models retain specific training examples that can be extracted through targeted prompting. Inference attacks derive sensitive attributes from seemingly innocuous data points. Membership inference determines whether a specific individual’s data was in the training set. Model inversion reconstructs training data from model outputs.
These are not theoretical attacks. They have been demonstrated repeatedly in research literature and are increasingly accessible to practitioners.
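One of the simplest demonstrated attacks is loss-threshold membership inference: models tend to incur lower loss on examples they were trained on than on unseen data, and an attacker who can observe per-example loss can exploit that gap. The sketch below is purely illustrative, with invented toy losses and a hypothetical `membership_guess` helper; it is not a reference to any specific EIAF mechanism.

```python
# Illustrative sketch: loss-threshold membership inference.
# A model that has memorized its training set shows systematically
# lower loss on members than on non-members; the attacker just
# thresholds the observed loss. All values below are toy data.

def membership_guess(loss: float, threshold: float) -> bool:
    """Guess 'member' when the model's loss on the example is low."""
    return loss < threshold

# Toy per-example losses: training examples are fit more tightly.
train_losses = [0.05, 0.10, 0.08]   # examples the model saw in training
unseen_losses = [0.90, 1.20, 0.75]  # examples it never saw

threshold = 0.5
guesses = [membership_guess(l, threshold)
           for l in train_losses + unseen_losses]
# With a clear loss gap, the attacker labels every training example
# a member and every unseen example a non-member.
```

Real attacks calibrate the threshold against shadow models rather than picking it by hand, but the underlying signal is exactly this loss gap.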
The EIAF Privacy Architecture
The EIAF requires AI-specific Data Protection Impact Assessments that go beyond standard DPIA methodology. Seven additional modules assess training data provenance, model memorization risk, inference attack surface, data minimization in feature engineering, consent management for model training, cross-border data flows in distributed training, and machine unlearning capability.
Differential privacy, federated learning, and synthetic data generation are technical controls that can reduce privacy risk while preserving model utility. The framework does not mandate specific technologies but requires privacy-preserving measures proportional to data sensitivity and system risk tier.
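To make the first of these controls concrete, here is a minimal sketch of the classic Laplace mechanism for an epsilon-differentially-private count query. The function names (`laplace_noise`, `dp_count`) and the example data are my own illustration, not part of the EIAF or any specific library.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample from Laplace(0, scale) via the inverse-CDF transform."""
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def dp_count(records, epsilon: float) -> float:
    """Release a count under epsilon-differential privacy.

    A count query has sensitivity 1 (adding or removing one person's
    record changes the result by at most 1), so adding Laplace noise
    with scale 1/epsilon yields an epsilon-DP release.
    """
    return len(records) + laplace_noise(1.0 / epsilon)

# Smaller epsilon -> larger noise -> stronger privacy, lower accuracy.
noisy = dp_count(["record"] * 100, epsilon=0.5)
```

Production systems should use a vetted library rather than hand-rolled sampling, since floating-point subtleties can leak information; the sketch only shows the shape of the sensitivity-to-noise tradeoff.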
The Right to Be Forgotten in ML
GDPR’s right to erasure creates a fundamental challenge for machine learning. Deleting someone’s data from a database is straightforward. Removing their influence from a trained model is computationally expensive and sometimes incomplete. The EIAF addresses this through machine unlearning requirements: documented capability to remove individual data influence, verification methods to confirm removal, and retention schedules for model retraining.
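The gold-standard verification method is exact unlearning: retrain from scratch without the erased record and confirm the served model matches that retrained one. The toy sketch below uses a deliberately trivial "model" (the mean of the data) to show the shape of the remove-retrain-verify loop; `train` and `unlearn` are hypothetical names, and real unlearning research exists precisely because full retraining is too expensive for large models.

```python
# Illustrative sketch of exact unlearning. The 'model' is a single
# parameter (the mean), so retraining is cheap; for deep models this
# same loop is what makes erasure computationally expensive.

def train(data):
    """'Train' a one-parameter model: the mean of the data."""
    return sum(data) / len(data)

def unlearn(data, record):
    """Remove a record and retrain from scratch on what remains."""
    remaining = [x for x in data if x != record]
    return train(remaining), remaining

data = [2.0, 4.0, 6.0, 100.0]      # 100.0 is subject to an erasure request
model = train(data)                 # heavily influenced by the outlier
clean_model, remaining = unlearn(data, 100.0)

# Verification: the unlearned model must be indistinguishable from one
# trained from scratch on the remaining data.
assert clean_model == train(remaining)
```

Approximate unlearning methods trade this exact guarantee for speed, which is why the framework pairs removal capability with verification methods and scheduled retraining rather than trusting removal claims alone.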
Privacy in the age of machine learning is not a checkbox. It is an architecture decision that must be made before the first training run.