GDPR Technical Series #3: Understanding Terminology for Finding Personal Data on Oct 16, 2018
In the previous posts of our GDPR series, we covered element-level protection techniques and how they map to anonymization and pseudonymization. In this post, we’ll focus on the difficult task of finding what to protect. In the context of GDPR, this means finding where a given organization stores its personal data.
Before turning to techniques for finding personal data, it helps to settle terminology. The industry uses several closely related terms for this activity, each with its own emphasis and connotation.
Data Discovery generally refers to finding, extracting and aggregating data within an enterprise to support various business initiatives, ranging from analysis to decision making. Data discovery does not refer specifically to personal data, although personal data might be among the types of data discovered.
eDiscovery is a special instance of Data Discovery that refers to finding data relevant to a legal case.
Data Classification refers to classifying, and often tagging, documents into classification levels specified by an organization. Example classifications could include the level of sharing allowed for a particular document, such as PUBLIC, RESTRICTED, or PRIVATE. There can also be hierarchical classifications that add granularity within a top-level classification, for example PRIVATE.PAYROLL-DEPT. Data classification doesn’t have to be specific to personal data.
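To make the hierarchical case concrete, here is a minimal sketch of how dotted labels such as PRIVATE.PAYROLL-DEPT could be checked against a top-level class. The function names, the dot separator as a parsing rule, and the sample file names are assumptions made for illustration; real classification tools define their own label formats.

```python
def parse_label(label):
    """Split a dotted label such as 'PRIVATE.PAYROLL-DEPT' into its levels."""
    return tuple(part for part in label.split(".") if part)


def falls_under(label, ancestor):
    """Return True if `label` equals `ancestor` or is nested beneath it."""
    label_parts = parse_label(label)
    ancestor_parts = parse_label(ancestor)
    # A label falls under an ancestor when the ancestor's levels form its prefix.
    return label_parts[:len(ancestor_parts)] == ancestor_parts


# Hypothetical documents tagged with the example labels from the text.
documents = {
    "press-release.docx": "PUBLIC",
    "board-minutes.pdf": "RESTRICTED",
    "salaries-2018.xlsx": "PRIVATE.PAYROLL-DEPT",
}

# Everything classified PRIVATE, at any depth of the hierarchy.
private_docs = sorted(
    name for name, label in documents.items() if falls_under(label, "PRIVATE")
)
print(private_docs)
```

Comparing whole levels rather than raw string prefixes avoids treating a label like PRIVATE-OLD as if it were nested under PRIVATE.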
Categorization is a flexible form of sorting data sets into groups based on inherent similarities between them, as opposed to externally imposed criteria used in classification. Two different documents that are about a particular topic might fall within a category, but depending on classification criteria, might fall into different classes. Despite the differences, categorization can often be used to aid classification. For a more in-depth discussion on the differences between classification and categorization, please refer to Elin K. Jacob’s paper “Classification and Categorization: A Difference that makes a Difference”.
Data Mapping refers to both identifying and mapping the flow of personal data across an enterprise. GDPR’s Article 30, which requires records of data processing activities to be kept by controllers and processors, is the key GDPR tenet driving Data Mapping requirements. While the GDPR itself doesn’t refer to Data Mapping, the ICO, for example, recommends in its section on documenting processing activities, “A good way to start is by doing an information audit or data-mapping exercise to clarify what personal data your organisation holds and where.”
Data Cataloging refers to creating an inventory of all data within the enterprise. It is often part of a larger data governance program and includes items which may or may not be personal data. Due to the breadth of requirements, data cataloging often does not get to the depth needed to understand and protect personal data.
Detection of Personal Data
The term Detection of Personal Data refers to the item-level detection of personal data in an enterprise, and it distinguishes this activity from general data discovery and eDiscovery. Detection of Personal Data is a subset of an activity generally described as ‘Detection of Sensitive Data.’ The difference between ‘Sensitive Data’ and ‘Personal Data’ is that Personal Data is sensitive data that can be tied to a real person, as defined by the GDPR. For example, a Vehicle Identification Number (VIN) might be viewed as Sensitive Data, and not Personal Data, as long as it cannot be associated with a particular person; once an association can be made, it becomes Personal Data. The association may not exist at the time of detection; additional information, provided later, might create it. This variability in timing is why schemes that claim to find only Personal Data, and not general Sensitive Data, run into difficulty. While matching the approach to GDPR terms sounds attractive, finding only Personal Data is not sufficient, because any Sensitive Data could become Personal Data at a later time.
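The VIN example can be sketched in a few lines. This is only an illustration of the timing problem, not a detection method: the `DetectedItem` type, the `label` function, the `known_links` lookup, and all values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedItem:
    kind: str      # type of value found, e.g. "VIN"
    value: str     # the detected value itself
    location: str  # where it was found


def label(item, known_links):
    """Return 'PERSONAL' if the value is linked to a person, else 'SENSITIVE'."""
    return "PERSONAL" if (item.kind, item.value) in known_links else "SENSITIVE"


# A placeholder VIN found during a scan, with no link to a person yet.
item = DetectedItem("VIN", "EXAMPLE-VIN-0001", "fleet_db.vehicles")
known_links = {}
before = label(item, known_links)

# Later, another data set ties the same VIN to an individual.
known_links[("VIN", "EXAMPLE-VIN-0001")] = "customer-42"
after = label(item, known_links)

print(before, after)
```

The detected item never changes; only the surrounding information does. A scan that discarded the VIN because it was not Personal Data at detection time would have nothing to relabel once the link appeared.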
Note that the term “Sensitive Personal Data” is sometimes used to refer to what the GDPR officially calls “Special Categories of Personal Data”. For the purposes of the above discussion, “Sensitive Personal Data” is to be treated as a subset of the term “Personal Data”.
Going forward in this GDPR series, we will use the term “Detection of Personal Data” when referring to the activities needed for GDPR compliance. In our next post, we’ll dive deeper into the categories of personal data and mechanics for finding them within large enterprise stores.