Big Data Security: Progress is Made, But is it Enough?

It’s taken a few years and more than a few major data breaches, but it would appear the tide is finally beginning to turn when it comes to awareness of the importance of protecting data stored in analytical and transactional systems alike. But don’t let your guard down yet, as major security threats—such as gaps in the Hadoop security stack, ransomware, and corporate bureaucracy—continue to threaten data sanctity.

It wasn’t that long ago that businesses didn’t even bother thinking about customer trust. Indeed, PricewaterhouseCoopers didn’t even track trust 20 years ago. But a rash of accounting frauds (Enron, WorldCom, et al.) in the early 2000s popped that bubble, and before long, hackers were weaseling their way into unsecured corporate databases to pilfer customer records on a regular basis.

In 2013, only 37% of CEOs worried that lack of trust in business would harm their company’s growth, according to PwC’s 2016 CEO Global Survey. By 2016, that number had jumped to 58%. And today, nearly nine out of 10 CEOs in the US are somewhat or extremely concerned about cyber threats, according to the report.

It’s clear that data security has finally landed on the corporate radar with a loud, sickening smack. But just because one is aware of the threats doesn’t make them go away. In fact, the security situation may be getting worse thanks to the adoption of distributed big data platforms, like Hadoop.

Holey Hadoop, Batman!

While Hadoop continues to be the focus of many organizations’ big data analytic and IoT strategies, the distributed storage and computing platform also continues to bear bad news for those who value the security of data.

“The continuing growth of Hadoop as a platform for data analysis and, increasingly, for more operational data processing uses has created data security issues that are not being addressed,” Gartner analyst Merv Adrian wrote in a December research report titled Rethink and Extend Data Security Policies to Include Hadoop.

The report took the three major commercial Hadoop distributors to task for creating “three distinct competing stacks of security software.” What’s more, each of the stacks is immature, is not comprehensive, and appears destined to promote incompatibility and vendor lock-in, Adrian’s report says.

“Unlike DBMSs,” the respected analyst says, “Hadoop software stacks have not had built-in security capabilities and, because they increase utilization of file system-based data that is not otherwise protected, new vulnerabilities can emerge that compromise carefully crafted data security regimes.”

What’s more, the nature of Hadoop data lakes raises other risks: raw unstructured and semi-structured data of unknown quality is written as-is and structured only when it is read (an approach known as schema-on-read).

“Unlike DBMSs, which are typically used to store known data that conforms to predetermined policies about quality, ownership and standards, Hadoop creates the possibility of presenting users with ‘dark data,’” Adrian writes.
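To make the schema-on-read idea concrete, here is a minimal Python sketch. It is not Hadoop code: the sample records, the field names, and the read_with_schema helper are all hypothetical and chosen only for illustration. Nothing is validated when the raw lines are written, so records of unknown quality surface only when a reader tries to apply a schema to them.

```python
import json

# Hypothetical raw records as they might land in a data lake.
# No schema is enforced at write time, so anything can be stored.
RAW_LINES = [
    '{"user": "alice", "amount": "19.99"}',
    '{"user": "bob"}',
    'not even json',
]


def read_with_schema(lines):
    """Apply an assumed schema (user: str, amount: float) at read time.

    Returns a tuple (rows, rejects): rows that fit the schema, and raw
    lines that could not be parsed or were missing a required field.
    """
    rows, rejects = [], []
    for line in lines:
        try:
            record = json.loads(line)
            rows.append({
                "user": str(record["user"]),
                "amount": float(record["amount"]),
            })
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, a missing field, or an unexpected shape.
            rejects.append(line)
    return rows, rejects


rows, rejects = read_with_schema(RAW_LINES)
print(len(rows), "structured rows;", len(rejects), "rejected")
```

The rejected lines are the kind of unvetted content a conventional database would have refused at write time.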

Third-Party Security

This gap in Hadoop security protections has created room for third-party vendors to operate and exploit. Informatica, which is attempting to pivot its long-standing dominance in ETL for data warehouses into the new distributed Hadoop world, is one of the vendors looking to make a splash.

“Despite all the time, effort and billions, maybe trillions, of dollars spent, security is not working,” says Amit Walia, executive vice president and chief product officer for Informatica. “Security breaches are still on the rise because most organizations are taking the wrong approach; they are focused on securing the end-points.”

Informatica says it has a better approach with a new release of its existing security product, Secure@Source. Unveiled Wednesday, the software combines several important capabilities needed to protect data as it sits in Hadoop and legacy environments, including automated discovery of sensitive data, proliferation analysis, anomalous user activity detection, multi-factor data risk analytics, and automated orchestration of remediation, the Redwood City, California company says.
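As a rough illustration of what "automated discovery of sensitive data" can mean at its simplest, the Python sketch below scans text for values that look like email addresses or US Social Security numbers. This is an assumption-laden toy and not a description of how Informatica's product works: the two patterns, the PATTERNS table, and the find_sensitive helper are hypothetical, and real discovery tools would need far more than regular expressions.

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_sensitive(text):
    """Return a dict mapping each pattern name to the matches found in text.

    Pattern names with no matches are omitted from the result.
    """
    found = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[name] = matches
    return found


sample = "Contact jane@example.com, SSN 123-45-6789."
print(find_sensitive(sample))
```

Pattern matching of this kind produces both false positives and misses, which is one reason such a scan would only be a starting point.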

Another vendor angling for a piece of the emerging Hadoop security pie, as defined by Gartner, is Dataguise.

“Gartner has again nailed the importance of broadening one’s perspective when it comes to Hadoop,” says JT Sison, VP of marketing and business development for the Fremont, California company. “As mentioned in the report, there are many threats to consider regardless of the data framework selected so it will be necessary for organizations to orchestrate a Hadoop security stack.”

Dataguise says its DGProtect offering can help companies detect, audit, protect, and monitor their sensitive data assets residing in Hadoop, in the cloud, and other repositories, such as NoSQL databases like Apache Cassandra, Teradata warehouses, and even Microsoft SharePoint.

Bureaucratic Malaise

Meanwhile, a new report from Intel finds that corporate bureaucracy also poses a risk to effective remediation of the security threat posed by cybercriminals.

“…[C]ybercriminals have the advantage, thanks to the incentives for cybercrime creating a big business in a fluid and dynamic marketplace,” Intel’s McAfee subsidiary writes in the report, titled Tilting the Playing Field: How Misaligned Incentives Work Against Cybersecurity. “Defenders on the other hand, often operate in bureaucratic hierarchies, making them hard-pressed to keep up.”