Data Minimization in the GDPR: A Primer

Data minimization is an important principle in the European Union’s General Data Protection Regulation (GDPR). Data processing should use only as much personal data as is required to accomplish a given task. Additionally, data collected for one purpose generally cannot be repurposed without further consent.

Data minimization is referenced in five separate sections of the GDPR. While data minimization is not a novel concept in data management, the GDPR re-emphasizes the importance of applying it in practice. In fact, it is impossible to be GDPR-compliant without implementing data minimization rules and processes at every step in the data lifecycle. This means that companies must limit the collection, storage, and usage of personal data to what is relevant, adequate, and absolutely necessary for the purpose for which the data is processed.

The first step in determining data limitations is understanding the data’s context. Typically, an organization’s data exists in three different environments with three different purposes: (1) dev/test, (2) production, and (3) data warehousing & analytics.

In dev/test environments…
In dev/test environments, the testing data must behave in the same fashion as the data held in the production environment. As such, the data must have realistic behavior and distribution. However, any personal data in dev/test datasets must be suppressed. This dilemma can be solved in a variety of ways. Simply omitting the sensitive data from the dev/test environment can be a quick and easy fix, but it restricts realistic testing scenarios. Using fictitious datasets can enable the full range of testing scenarios but may not accurately reflect the behavior seen in production data. For optimal business use, masking the sensitive data in a realistic and consistent fashion satisfies GDPR requirements while maintaining production-quality behavior and distribution.
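As a rough illustration of masking "in a realistic and consistent fashion," the sketch below derives replacement values from a keyed hash, so the same production value always maps to the same masked value across tables and test runs. The key, the name pool, and the function names (`mask_name`, `mask_email`) are hypothetical and chosen only for this example; it is one possible approach, not a prescribed method.

```python
import hashlib
import hmac

# Hypothetical secret for illustration; in practice it would live outside dev/test.
MASKING_KEY = b"example-masking-key"

# Hypothetical pool of realistic-looking replacement first names.
FAKE_FIRST_NAMES = ["Alex", "Blake", "Casey", "Drew", "Emery", "Finley", "Gray", "Harper"]


def _digest(value: str) -> bytes:
    """Keyed hash of the input: the same input always yields the same digest."""
    return hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).digest()


def mask_name(name: str) -> str:
    """Replace a real first name with a name from the pool, consistently."""
    index = int.from_bytes(_digest(name)[:4], "big") % len(FAKE_FIRST_NAMES)
    return FAKE_FIRST_NAMES[index]


def mask_email(email: str) -> str:
    """Replace the address with a token-based one that still has a valid email shape."""
    token = _digest(email.lower()).hex()[:10]
    return f"user_{token}@example.com"
```

Because the mapping is deterministic, joins between masked tables still line up, which preserves much of the behavior that testing relies on. A small name pool like this one does not reproduce the distribution of real names; a larger pool would be needed for that.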

In production environments…
For production environments, data minimization is less about obscuring data than it is about limiting data collection to begin with. When big data first arrived on the scene, it set off a trend toward collecting every available bit of data. In practice, the personal data that is actually useful for customer service is often a fairly small set. When requesting personal data, first ask: “Do I really need this data point?”
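One way to put the "Do I really need this data point?" question into practice is an allowlist applied at the point of collection. The sketch below is a minimal example; the field names and the `minimize` helper are hypothetical, and the right allowlist depends entirely on the purpose of the processing.

```python
# Hypothetical allowlist: the only fields this imaginary signup purpose needs.
REQUIRED_FIELDS = {"email", "display_name", "country"}


def minimize(submitted: dict) -> dict:
    """Keep only allowlisted fields; anything else is dropped before storage."""
    return {key: value for key, value in submitted.items() if key in REQUIRED_FIELDS}


# Example form submission containing more than the purpose requires.
form = {
    "email": "pat@example.com",
    "display_name": "Pat",
    "country": "DE",
    "date_of_birth": "1980-01-01",
    "phone": "+49 000 0000000",
}
stored = minimize(form)
```

An allowlist is used here rather than a blocklist so that a newly added form field is excluded by default until someone decides it is needed.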

In data warehousing & analytics environments…
For data warehousing and analytics, the question changes slightly to: “Do I still need this data point?” In order for data to be useful, it needs to be timely and accurate. As time passes, data elements become less useful. What’s the probability that an address collected 65 years ago is still current? Sometimes, data points become irrelevant or unnecessary before any time passes at all. Wherever possible, companies should minimize personal data exposure in data warehousing & analytics environments through protection measures like encryption, masking, and monitoring. For example, analyzing purchasing trends does not necessarily require keeping complete credit card numbers. If the card numbers are still needed, they could be partially masked, for instance by retaining only the last few digits, to limit exposure.
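The two ideas in this section, asking whether data is still needed and partially masking what remains, can be sketched as follows. The two-year retention window and the function names are assumptions made for illustration; the GDPR does not prescribe a specific retention period or masking format.

```python
from datetime import date, timedelta

# Hypothetical retention window; the right value depends on the processing purpose.
RETENTION = timedelta(days=365 * 2)


def is_expired(collected_on: date, today: date) -> bool:
    """True when a record is older than the retention window."""
    return today - collected_on > RETENTION


def mask_card_number(card_number: str) -> str:
    """Keep the last four digits; replace all other digits with '*'.

    Separators such as spaces or dashes are left in place.
    """
    total_digits = sum(1 for c in card_number if c.isdigit())
    keep_from = max(total_digits - 4, 0)  # index of the first digit to keep
    seen = 0
    masked = []
    for c in card_number:
        if c.isdigit():
            masked.append(c if seen >= keep_from else "*")
            seen += 1
        else:
            masked.append(c)
    return "".join(masked)
```

For example, `mask_card_number("4111 1111 1111 1111")` returns `"**** **** **** 1111"`, which is still enough to group purchases by card within a report without storing the full number.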

From dev/test to analytics, data minimization requires understanding a business’s needs and collecting, storing, and using only the data that relates to those needs. Those needs shift constantly as a business grows and competes in the marketplace, but ensuring the privacy of personal data is a constant requirement of the GDPR, which takes effect on May 25, 2018. Will you be ready? We’re here to help!