Rethink Data Management Practices for CCPA, GDPR and Beyond

[Part 1 of a 3-Part Blog Series]
One of the big differences between privacy and security regulations is the role they give technological solutions in the compliance effort. Security regulations speak of tools, technical controls, and automated solutions that protect data. Privacy regulations, on the other hand, are written as if compliance can be achieved on paper, through policies, contracts, and training. The California Consumer Privacy Act (CCPA) and the General Data Protection Regulation (GDPR) are good examples of privacy regulations that can no longer be addressed with this “paper compliance” approach. Yet both say very little about the crucial role automation must play in supporting compliance.

Maintain customer trust

Before delving into the specific regulatory requirements that rely heavily on privacy technology, it’s worth recalling the forces behind any compliance effort related to the CCPA and the GDPR. These are breach notification, the increased cost of mishandling personal information, and, in the case of the CCPA, a clear distrust between businesses and their consumers. Breach notification draws attention to the company in question and opens it to deeper scrutiny of its privacy practices by the relevant regulators. That scrutiny can bring higher fines and, in California, litigation costs through a private right of action. Given the data-sharing practices that led to the creation of the CCPA, companies should pay particular attention to the regulation’s requirements that give control to consumers (the data subjects of this regulation). Consumers increasingly view those rights as privacy self-defense tools.

Do you know where your PI is located?

Automating data management practices to meet privacy regulations is intended to make the management of data subjects’ personal information more effective and more efficient. This blog series is organized into parts that address each layer of control over that information. As shown in Image A below, we start with the basic capability to recognize data subjects across systems and formats (Part 1). The second topic is the ability to protect data as it is used, by minimizing it based on need, and to delete a data subject’s data completely when required (Part 2). We close with the challenge of associating third parties with the data and data subjects they process (Part 3).



Image A: Layers of Control over Personal Information (PI)

Part 1: Know your data subjects

Many privacy regulations, including the CCPA and the GDPR, require companies to share with data subjects, upon request, the data they maintain about them. A request to exercise this basic privacy right is commonly referred to as a Data Subject Access Request (DSAR). DSARs are not new to companies: access rights have been part of privacy regulations around the world since the second half of the 20th century.

Raising the bar

Privacy regulations provide some exceptions that limit the effort a company is expected to spend when searching for information to satisfy a DSAR. However, common practice stretches the boundaries of these exceptions and often limits the DSAR response to a search of a handful of predetermined repositories. This practice is not sustainable for several reasons.

  • First, because breach notifications are now so common, data subjects who previously submitted a DSAR may later discover that the response they received was incomplete. With the private right of action offered under the CCPA, a California consumer affected by a breach could claim the following: had they known the true extent of the data the breached company maintained about them, they would have asked for it to be erased. They could then sue for damages arising from the incomplete DSAR response.
  • Second, breaches lead to audits by regulators, which in turn may include a review of the company’s overall privacy practices, including how the company responded to the DSARs it previously received.
  • Last, but not least, data subjects today are more jaded and knowledgeable about the information companies collect and are more inclined to challenge a thin or neatly curated DSAR response.

Can technology rescue poor data management?

It is not easy to find information about data subjects across repositories. Years of poor data management practices make this task a technological challenge. Several specific gaps must be closed before companies can meet this requirement effectively and efficiently.

Example:
The inconsistent management of primary and foreign keys, which databases use to match identities across tables, is one area that sorely needs improvement. Each table in a relational database typically has a key that allows its data to be matched with data in other tables. For example, a company may have three tables in a database:

  • contact information for its customers
  • transactions by those customers
  • the customers’ responses to satisfaction surveys

In all three tables, customers are identified by their Customer ID number (CID). The CID is the primary key that links the three tables. It allows the company to correctly connect the contact information, transactions, and survey responses of the same customer.

The problem many companies face is that a database may contain thousands of tables that identify the same person with different keys (foreign keys). Often there is no mapping that connects those keys back to the primary key. Different tables were created by different database users who were concerned only with connecting the few tables they needed. Managing primary and foreign keys means diligently tracking how data subjects are identified at the database level. Then, when a DSAR must be fulfilled, the search for the data subject can run efficiently across all of those keys.
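To make the idea concrete, here is a minimal Python sketch of a DSAR search that relies on a key map, using an in-memory SQLite database. Everything in it is hypothetical and for illustration only:

  • the table names
  • the KEY_MAP dictionary, which records which column identifies the data subject in each table
  • the assumption that the survey table is keyed by email address rather than by CID

It illustrates that once such a mapping exists, a single lookup can reach every table, including tables that do not use the primary key.

```python
import sqlite3

# Hypothetical key map: for each table, the column that identifies the data
# subject and the type of identifier stored in that column.
KEY_MAP = {
    "customers": ("cid", "cid"),
    "transactions": ("cid", "cid"),
    "surveys": ("respondent_email", "email"),  # keyed by email, not by CID
}


def build_demo_db() -> sqlite3.Connection:
    """Create a small illustrative database with three related tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (cid INTEGER PRIMARY KEY, name TEXT, email TEXT);
        CREATE TABLE transactions (tx_id INTEGER PRIMARY KEY, cid INTEGER, amount REAL);
        CREATE TABLE surveys (survey_id INTEGER PRIMARY KEY, respondent_email TEXT, score INTEGER);
        INSERT INTO customers VALUES (1, 'Ana Diaz', 'ana@example.com'), (2, 'Bo Li', 'bo@example.com');
        INSERT INTO transactions VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
        INSERT INTO surveys VALUES (100, 'ana@example.com', 9), (101, 'bo@example.com', 7);
    """)
    return conn


def find_subject_records(conn: sqlite3.Connection, cid: int) -> dict:
    """Collect every row tied to one data subject across all mapped tables."""
    # Start from the primary key and resolve the other identifiers we know.
    row = conn.execute("SELECT email FROM customers WHERE cid = ?", (cid,)).fetchone()
    if row is None:
        return {}  # unknown data subject
    identifiers = {"cid": cid, "email": row[0]}

    results = {}
    for table, (column, id_type) in KEY_MAP.items():
        # Table and column names come from the trusted KEY_MAP, never from user
        # input; the identifier value itself is passed as a bound parameter.
        query = f"SELECT * FROM {table} WHERE {column} = ? ORDER BY 1"
        results[table] = conn.execute(query, (identifiers[id_type],)).fetchall()
    return results
```

For example, find_subject_records(build_demo_db(), 1) returns the customer’s contact row, both of their transactions, and the survey response stored under their email address. In a real environment, the key map would come from the key-management work described above rather than from a hand-written dictionary.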

Example:
The clear naming of data elements in database tables is another common challenge. It must be addressed so that data subjects can receive a complete view of the data companies process about them. This is especially relevant when the meaning of the data is not self-evident. Machine learning and artificial intelligence can work out that a string of eight digits in a table’s column represents dates. Without clear column headers, however, no technology can determine what a date stands for: account creation, the last transaction, the last customer service call, or some other activity. Traditionally, headers were named without any expectation that anyone other than the database administrators and a few users would need to understand them. Privacy regulations require us to rethink this practice and establish new requirements to guide how we manage data.
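The following minimal Python sketch illustrates this distinction. A pattern check can recognize that an eight-digit value is a valid YYYYMMDD date, but what the date means has to come from metadata. Here that metadata is a hypothetical data dictionary (DATA_DICTIONARY) that maps cryptic headers such as dt1 to plain-language descriptions. The header names and descriptions are invented for illustration.

```python
from datetime import datetime

# Hypothetical data dictionary maintained by data owners: cryptic column
# headers mapped to plain-language descriptions usable in a DSAR response.
DATA_DICTIONARY = {
    "dt1": "Account creation date",
    "dt2": "Date of last transaction",
}


def looks_like_date(value: str) -> bool:
    """Pattern-level check: is this an eight-digit YYYYMMDD calendar date?"""
    if len(value) != 8 or not value.isdigit():
        return False
    try:
        datetime.strptime(value, "%Y%m%d")
        return True
    except ValueError:
        return False


def describe_column(header: str, sample: str) -> str:
    """Describe a column for a data subject, using the dictionary if possible."""
    if not looks_like_date(sample):
        return f"{header}: not recognized as a date"
    # The pattern tells us *that* the value is a date; only the metadata
    # tells us *which* date it is.
    meaning = DATA_DICTIONARY.get(header)
    return f"{header}: {meaning}" if meaning else f"{header}: date of unknown meaning"
```

For example, describe_column("dt1", "20190314") yields "dt1: Account creation date". The same value under an undocumented header yields only "date of unknown meaning", which tells a data subject very little.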

I have orphaned what?

When we speak of data subjects, we think of individuals who have transacted with the company in one way or another (employees, applicants, customers, visitors, etc.). One of the dirty secrets of data management is that every company also has many “orphaned identities” in its repositories. Orphaned identities are data subjects that cannot be tied to any processing activity or privacy commitment. They can appear in many places:

  • information added by employees who use company resources for personal purposes (the resumes of nannies, for example)
  • information received in error from partners
  • tables with no keys

Orphaned identities are still data subjects, and the inappropriate disclosure of their personal information can still trigger breach notification obligations. The ability to find orphaned identities and remediate them (most likely by eliminating them) is an essential data management practice. It keeps the inventory of data subjects clean and valid, and it reduces the company’s risk profile.
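A minimal Python sketch of this check follows. It assumes the company already has some inventory that ties known data subjects to processing activities. Under that assumption, a record is flagged as an orphaned identity when its identifier is missing from the inventory or maps to no processing activity at all. The Record type, the INVENTORY contents, and the repository names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    source: str      # repository or table where the record was found
    identifier: str  # e.g., an email address detected in the record


# Hypothetical inventory: known data subjects mapped to the processing
# activities (purposes) they are tied to.
INVENTORY = {
    "ana@example.com": {"customer account"},
    "bo@example.com": {"employment"},
    "old@example.com": set(),  # known, but no remaining processing activity
}


def find_orphans(records, inventory):
    """Return records whose subject is unknown or tied to no processing activity."""
    # inventory.get() returns None for unknown subjects and an empty set for
    # subjects with no activity; both are falsy, so both are flagged.
    return [r for r in records if not inventory.get(r.identifier)]
```

Run against a CRM record, a resume found on a shared drive, and a row in a legacy table, the check leaves out the CRM record and flags the other two for review.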

Set standards, goals, and reduce risk

When it comes to unstructured data, such as documents, we need some consensus on how to accurately recognize an identity in data. For example, is “April Green” the name of a person, a street, or a company, or just a noun and an adjective that happen to appear next to each other? Different techniques address such questions by calculating the statistical confidence that certain data represents one thing and not another. However, there are no standards to guide how statistical confidence should be used when searching for personal information in response to DSARs. Such standards are important to create consistency and to provide a rationale whenever a data subject or regulator challenges existing practices.
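The sketch below is a deliberately simple, hypothetical Python illustration of the kind of standard described here. It scores whether a candidate span such as “April Green” names a person, using a few hand-set context cues, then compares the score against an explicit threshold. The cue weights and the 0.6 threshold are assumptions chosen for illustration; a production system would more likely rely on a trained named-entity recognition model. What matters is that the threshold is a documented number that can be cited when a data subject or regulator asks why a record was or was not treated as personal information.

```python
# Hypothetical, hand-set cue weights (illustration only). A real system would
# more likely use a trained named-entity recognition model's confidence scores.
PERSON_CUES = {"ms.": 0.4, "mr.": 0.4, "dear": 0.3, "signed": 0.3}
NON_PERSON_CUES = {"street": 0.5, "st.": 0.5, "avenue": 0.5, "inc.": 0.5, "llc": 0.5}

# Assumed policy threshold: the documented confidence at which a candidate is
# treated as a person's name for DSAR search purposes.
PERSON_THRESHOLD = 0.6


def person_confidence(tokens, start, end, base=0.5):
    """Score how likely tokens[start:end] names a person, from neighboring words."""
    before = tokens[start - 1].lower() if start > 0 else ""
    after = tokens[end].lower() if end < len(tokens) else ""
    score = base + PERSON_CUES.get(before, 0.0) - NON_PERSON_CUES.get(after, 0.0)
    return max(0.0, min(1.0, score))  # clamp to the [0, 1] range


def is_person(tokens, start, end):
    """Apply the documented threshold to the confidence score."""
    return person_confidence(tokens, start, end) >= PERSON_THRESHOLD
```

With these assumed weights, “April Green” preceded by “Ms.” scores 0.9 and counts as a person. The same words followed by “Street” score 0.0. The bare phrase with no context scores 0.5, which falls below the threshold.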

The correct identification of data subjects across the enterprise will allow the organization to develop a complete and accurate inventory of its data subjects, a goal most companies have yet to achieve. Such an organized view of data subjects is not only the best way to address DSARs. It is also the first stepping stone toward the other data management tasks companies face, such as those covered in the next two posts in this series.