Attack the Big Data Security Problem by Reversing the 80-20 Rule
Jun 11, 2015
Big data analysts spend most of their time on data prep, much of it ensuring that sensitive data fields are secure. Read about solutions that could allow analysts to focus more attention on algorithms and reporting.
The 80-20 rule, popularized by management consultant Joseph Juran and named for economist Vilfredo Pareto, is a rule of thumb commonly used in business. For example, 80% of a company’s business may be conducted using only 20% of its available analytics reports, or 80% of a company’s product returns may come from 20% of its products.
This 80-20 rule also burdens big data analysts when they have to apply security rules to big data. It wreaks havoc because many big data analysts spend 80% of their time in the data prep process, and a large part of this effort is ensuring that sensitive data fields embedded in big data are secure before the data is brought into analytics work. This leaves analysts only 20% of their time to work on analytics algorithms and reporting for the end business.
The problem in a nutshell
When big data comes in from websites and Internet of Things (IoT) sources like machines, it arrives completely unedited. The roughly 5% of data elements that are highly sensitive are tangled up with reams of raw logging data, clickstream data, sentiment data, and general noise, all of which you have to work through to determine where security controls are needed.
The data enters at breakneck velocities, making data cleaning and security processes even more difficult. To compound matters further, businesses want to be able to blend in data from different data sources for their analytics. This means that every contributing raw data source must be cleaned and prepped for quality and security before it can be committed to a central data repository for use in analytics or used in live streams for real-time analytics.
How to address the problem
“The most effective way of attacking the big data security challenge is by adopting a data-centric approach to securing and prepping your data,” said Jeremy Stieglitz, vice president of products for Dataguise, which sells security software that detects, audits, protects, and monitors sensitive data assets in real time. “We don’t think that the answer is coding to an API (application programming interface) and developing applications to handle it.” Instead, Dataguise uses two methods for securing big data.
- Masking: A data analyst can “mask out” a security-sensitive data element so it is redacted from all incoming big data streams that contain it. Once a data mask is installed, it is more or less permanent.
- Encryption: A data analyst can encrypt security-sensitive data elements so they cannot be seen or used. Encryption offers flexibility that masking doesn’t, because at any time you can choose to decrypt a data element to make it available for analytics. Encryption controls can also be used with security permissions across the company that allow certain power users access to the data, while classifying the data as off-limits to general and casual users.
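The contrast between the two methods can be sketched in a few lines of Python. This is only an illustration, not how Dataguise or any other product works: the patterns, the `mask_record` function, the `ReversibleVault` class, and the role names are all hypothetical. Because Python's standard library has no symmetric cipher, the reversible half uses a simple token lookup as a stand-in for encryption; a real system would use a vetted cryptography library and proper key management.

```python
import re

# Hypothetical patterns for two kinds of sensitive values.
# Real tools use far richer detection than two regular expressions.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def mask_record(text: str) -> str:
    """Masking: redact every match. The original value is not kept anywhere."""
    for label, pattern in SENSITIVE_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()} REDACTED]", text)
    return text


class ReversibleVault:
    """Stand-in for encryption (NOT real cryptography): swaps sensitive
    values for opaque tokens and restores them only for permitted roles."""

    def __init__(self, allowed_roles):
        self._allowed_roles = set(allowed_roles)
        self._store = {}  # token -> original value

    def protect(self, text: str) -> str:
        def _swap(match):
            token = f"<tok:{len(self._store)}>"
            self._store[token] = match.group(0)
            return token

        for pattern in SENSITIVE_PATTERNS.values():
            text = pattern.sub(_swap, text)
        return text

    def reveal(self, text: str, role: str) -> str:
        # Roles without permission get the protected text back unchanged.
        if role not in self._allowed_roles:
            return text
        for token, value in self._store.items():
            text = text.replace(token, value)
        return text


line = "user jane@example.com ssn 123-45-6789 clicked checkout"
masked = mask_record(line)

vault = ReversibleVault(allowed_roles={"power_user"})
protected = vault.protect(line)
for_power_user = vault.reveal(protected, "power_user")
for_casual_user = vault.reveal(protected, "casual_user")
```

In this sketch, `masked` can never be turned back into the original line, while `protected` can be restored, but only for a role that has been granted access.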
Being able to selectively apply security restrictions to incoming big data streams is critical, because privacy is a concern to businesses and customers. Rigorous security regulations must also be observed in industries such as finance, insurance, and healthcare. There is the additional danger that employees can exploit their access to sensitive data in ways that can expose companies to liability.
“In one case, two data analysts were able to see data that normally should have been secured and restricted, and to personally profit from the information about a particular company in advance of market reports,” said Stieglitz. “This was an issue of security access control, and it placed both the employees and their company in the position of potentially being ‘inside traders.’”
As big data liability and governance concerns grow, so will the demand for securing this data. A security tool that flags only the data fields that must be secured (typically four to five percent of total data fields) can be run against incoming big data without unduly impacting system performance.
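As a rough illustration of what flagging only the sensitive fields might look like, the Python sketch below scans records and marks a field as sensitive if its name or its value looks sensitive. The name hints, the value pattern, and both function names are assumptions made for this example; they do not describe any particular product's detection logic.

```python
import re

# Hypothetical hints: field names that suggest sensitive content.
SENSITIVE_NAME_HINTS = ("ssn", "card", "email", "dob")
# Hypothetical value check: strings shaped like a US Social Security number.
SSN_VALUE = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def flag_sensitive_fields(records):
    """Return the set of field names that look sensitive by name or by value."""
    flagged = set()
    for record in records:
        for field, value in record.items():
            name_hit = any(hint in field.lower() for hint in SENSITIVE_NAME_HINTS)
            value_hit = isinstance(value, str) and SSN_VALUE.match(value) is not None
            if name_hit or value_hit:
                flagged.add(field)
    return flagged


def flagged_share(records, flagged):
    """Fraction of all distinct fields that were flagged as sensitive."""
    all_fields = {field for record in records for field in record}
    return len(flagged) / len(all_fields) if all_fields else 0.0


# Nineteen ordinary fields plus one sensitive field: 1 of 20 fields is flagged.
record = {f"f{i}": "x" for i in range(19)}
record["customer_email"] = "jane@example.com"
flagged = flag_sensitive_fields([record])
share = flagged_share([record], flagged)
```

Only the flagged fields would then need masking or encryption, which is why this kind of targeted pass can leave the bulk of the incoming data untouched.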
“It is good timing for big data security tools, because the big data security discussion has moved into corporate boardrooms and also into dialogues with company customers,” said Stieglitz.