Gartner on Masking and Encryption for PII/PHI Protection
Sep 14, 2015
Gartner gets it almost entirely right in its newly published Technical Advice article, “Protecting PII and PHI With Data Masking, Format-Preserving Encryption and Tokenization” (Joerg Fritsch and Ramon Krikken, published September 10, 2015).
The paper examines the uses and impacts of tokenization, data masking (both static and dynamic masking are explored) and format-preserving encryption.
Here’s what I liked and didn’t like about the research.
- They got data masking exactly right. Their recommendation: use static data masking to de-identify PHI and PII data for all secondary uses where the characteristics of the original data are required but the original data itself is not (for example, application testing and development, or data publishing for medical studies).
- They drew the correct distinction that encryption and/or tokenization play a de-identification role when some set of users still needs access to the original cleartext data. Their recommendation: use reversible techniques, such as dynamic data masking or tokenization, for all operational uses of data when some users still need to view sensitive data in the clear, or when you are uncertain about the future secondary use of your data.
- Gartner finally seems to be coming around to the reality that dynamic masking is of limited use and faces real practical deployment hurdles, challenges and outright barriers. Their table comparing techniques lists no recommended applications for dynamic masking, and it accurately reflects the reduced masking options and the late-binding challenges inherent in this approach. At Dataguise, we’ve chosen not to implement dynamic masking for SQL because of these problems, and because it won’t scale to large, complex database environments. (The exception is dynamic masking for Cassandra NoSQL, which uses CQL in a cleaner, more scalable way.)
- Gartner is enamored with the notion that data protection does not equal privacy, and that true privacy is achieved only when protected data is compared to and analyzed against all other published or known data elements. This is known as inference risk, and it is a legitimate but small issue for most enterprises. The reality of today’s modern enterprise is that breach risk is job #1, #2, and #3. People are hacking to steal, sell, post, and broadcast stolen data. There may be a much smaller subset of attackers running inference attacks on data, but that is a business risk several orders of magnitude smaller. Equally, businesses need to protect sensitive data to achieve regulatory compliance. I know of NO compliance legislation that mandates full protection of sensitive data against inference risk; it would simply be too hard to govern, implement, or verify. As a result, enterprises striving for compliance are giving little or no thought to the role of inference in achieving HIPAA, PCI, or state privacy compliance. In 200+ customer engagements, we’ve been asked about inference and “privacy analytics” only twice.
- The report indicates that the only way to run static data masking is through an ETL process (referred to as “Batch mode”). Here at Dataguise, we’ve built dedicated masking agents that run in place, inside the database, for Oracle, Microsoft, IBM DB2, Teradata and other leading databases. These can run as batch jobs or, more practically, as continuous protection operations that mask new database changes incrementally in a live production environment. Because sensitive data never has to be extracted, staged, and reloaded, an in-place masking agent has tremendous performance and security advantages over ETL masking, and it should be the first choice for customers running large or complex masking jobs.
- Finally, Gartner misses some of the creative hybrid options that sit between these techniques. For instance, Dataguise uses the FPE algorithm, which is strong, secure, and well researched cryptographically, but we use it in a mode that produces only masks. The result is format-preserving masking, and we’re bullish that it can deliver the best of both worlds: lower architectural complexity, higher and provable security, format preservation, PCI scope reduction, and more.
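To make the static-versus-reversible distinction in the first two points concrete, here is a minimal Python sketch. The names `static_mask_ssn` and `TokenVault` are hypothetical, not Gartner's or Dataguise's API. The static mask derives a fake SSN from a keyed HMAC, so the same input always yields the same output (which keeps joins intact in test data) but offers no way back to the original. The vault-based tokenizer, by contrast, stores the mapping so that an authorized caller can recover the cleartext.

```python
import hashlib
import hmac
import secrets

DIGITS = "0123456789"


def static_mask_ssn(ssn: str, key: bytes) -> str:
    """Irreversibly replace an SSN with a deterministic, same-shaped fake.

    Same input + same key -> same output, so joins across masked tables still
    line up. Collisions are possible; this toy does not try to prevent them.
    """
    digest = hmac.new(key, ssn.encode(), hashlib.sha256).digest()
    digits = "".join(DIGITS[b % 10] for b in digest[:9])
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


class TokenVault:
    """Reversible tokenization: random tokens plus a lookup table (the vault)."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str, max_attempts: int = 1000) -> str:
        if value in self._value_to_token:
            return self._value_to_token[value]
        if not any(ch in DIGITS for ch in value):
            raise ValueError("value has no digits to replace")
        for _ in range(max_attempts):
            # Replace each digit with a random digit; keep separators such as '-'.
            token = "".join(secrets.choice(DIGITS) if ch in DIGITS else ch for ch in value)
            if token != value and token not in self._token_to_value:
                self._token_to_value[token] = value
                self._value_to_token[value] = token
                return token
        raise RuntimeError("could not find an unused token")

    def detokenize(self, token: str, authorized: bool) -> str:
        # Only callers with a legitimate need may see the cleartext.
        if not authorized:
            raise PermissionError("caller is not allowed to see cleartext")
        return self._token_to_value[token]
```

The masked SSN cannot be reversed, which suits testing, development, and publishing; the vault suits operational data, at the cost of running and securing the vault itself.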
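Inference risk, as described in the point about privacy above, is easiest to see as a toy linkage attack: a release with names removed is joined against a public dataset on quasi-identifiers such as ZIP code, birth date, and sex. Every record, name, and field choice below is a made-up assumption for illustration. Any released record whose quasi-identifiers match exactly one public record is re-identified.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

# Hypothetical de-identified release: names removed, diagnosis kept.
released = [
    {"zip": "02138", "birth_date": "1965-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1970-01-01", "sex": "M", "diagnosis": "asthma"},
    {"zip": "02139", "birth_date": "1980-05-05", "sex": "F", "diagnosis": "diabetes"},
]

# Hypothetical public dataset (e.g. a directory) that still carries names.
public = [
    {"name": "Alice", "zip": "02138", "birth_date": "1965-07-31", "sex": "F"},
    {"name": "Bob", "zip": "02139", "birth_date": "1970-01-01", "sex": "M"},
    {"name": "Carl", "zip": "02139", "birth_date": "1970-01-01", "sex": "M"},
    {"name": "Dana", "zip": "02140", "birth_date": "1980-05-05", "sex": "F"},
]


def link(released, public, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs for records uniquely matched on `keys`."""
    counts = Counter(tuple(p[k] for k in keys) for p in public)
    names = {tuple(p[k] for k in keys): p["name"] for p in public}
    matches = []
    for r in released:
        qi = tuple(r[k] for k in keys)
        if counts.get(qi) == 1:  # exactly one candidate: re-identified
            matches.append((names[qi], r["diagnosis"]))
    return matches


if __name__ == "__main__":
    print(link(released, public))  # [('Alice', 'hypertension')]
```

The second record is not re-identified because two public records share its quasi-identifiers, and the third has no public match at all.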
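The incremental, in-place masking described in the point about ETL can be sketched with Python's built-in `sqlite3` module standing in for a production database. The `patients` table, the `ssn` column, the keep-last-four mask, and the high-water-mark scheme are all hypothetical assumptions, not how Dataguise's agents work. The idea shown is that masking runs as an `UPDATE` inside the database, so cleartext is never extracted to a staging area, and each pass touches only rows added since the previous pass. This sketch tracks inserts only; catching updates to existing rows would need triggers or change data capture.

```python
import sqlite3


def mask_new_rows(conn: sqlite3.Connection, last_id: int) -> int:
    """Mask SSNs in rows added since `last_id`, in place; return the new high-water mark.

    Hypothetical schema: patients(id INTEGER PRIMARY KEY AUTOINCREMENT, ssn TEXT).
    AUTOINCREMENT guarantees ids are never reused, so `id > last_id` means "new row".
    """
    with conn:  # commits on success, rolls back on error
        (high,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM patients").fetchone()
        # SQLite computes the mask itself: keep only the last four characters.
        conn.execute(
            "UPDATE patients SET ssn = 'XXX-XX-' || substr(ssn, -4) "
            "WHERE id > ? AND id <= ?",
            (last_id, high),
        )
    return max(high, last_id)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY AUTOINCREMENT, ssn TEXT)")
    conn.executemany("INSERT INTO patients (ssn) VALUES (?)", [("123-45-6789",), ("987-65-4321",)])
    conn.commit()
    mark = mask_new_rows(conn, 0)  # first pass masks both existing rows
    conn.execute("INSERT INTO patients (ssn) VALUES ('555-12-3456')")
    conn.commit()
    mark = mask_new_rows(conn, mark)  # second pass masks only the new row
    print([row[0] for row in conn.execute("SELECT ssn FROM patients ORDER BY id")])
    # ['XXX-XX-6789', 'XXX-XX-4321', 'XXX-XX-3456']
```

Bounding each pass by the `MAX(id)` read at the start means rows inserted while a pass is running are simply left for the next pass instead of being skipped.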
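FPE is normally reversible by anyone holding the key. One way to get a mask out of an FPE-style construction, assumed here for illustration and not claimed to be Dataguise's method, is to encrypt with a key generated per masking job and then discard it. The toy below is not a standards-compliant cipher: it is a 10-round Feistel network over decimal digits with HMAC-SHA256 as the round function, loosely following the structure of NIST SP 800-38G FF1. The function name `fp_mask_digits` is hypothetical. Because every Feistel round is invertible, the output keeps the input's length and digit alphabet and distinct inputs never collide, which preserves uniqueness constraints. Production use would call for a vetted FF1 implementation.

```python
import hashlib
import hmac
import secrets

DIGITS = "0123456789"


def _round_value(key: bytes, n: int, i: int, half: str) -> int:
    # Round function F: keyed HMAC over (length, round index, right half).
    msg = f"{n}|{i}|{half}".encode()
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def fp_mask_digits(value: str, key: bytes, rounds: int = 10) -> str:
    """Map a digit string to another digit string of the same length (a permutation per key)."""
    if len(value) < 2 or any(ch not in DIGITS for ch in value):
        raise ValueError("expected a string of at least two decimal digits")
    n = len(value)
    u = n // 2
    v = n - u
    a, b = value[:u], value[u:]
    for i in range(rounds):
        m = u if i % 2 == 0 else v  # width of the half being updated this round
        c = (int(a) + _round_value(key, n, i, b)) % (10 ** m)
        a, b = b, str(c).zfill(m)  # swap halves, keep leading zeros
    return a + b


if __name__ == "__main__":
    job_key = secrets.token_bytes(32)  # per-job key, never persisted
    print(fp_mask_digits("4111111111111111", job_key))
    # Dropping the only reference to the key makes the mask practically one-way
    # (note: `del` does not securely wipe memory).
    del job_key
```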
All in all, though, there is a lot of good research and market guidance here. Congratulations to Fritsch and Krikken for finally tackling the whens and hows of choosing between masking, tokenization, and encryption for data protection, and for putting them into a logical framework. This is sorely needed, and to my knowledge this report is the first attempt to clearly distinguish these techniques. The full research report is available at Gartner.com and requires a Gartner subscription: https://www.gartner.com/doc/3128117?ref=SiteSearch&sthkw=joerg&fnl=search&srcId=1-3478922254