DG for Hadoop offers businesses a straightforward and economical way to determine whether sensitive data is stored in their Hadoop repositories, evaluate their data exposure and compliance risk, and enforce the most appropriate remediation actions to protect their companies from financial and brand damage.
Challenges of Data Privacy and Compliance for Big Data
Petabytes of new data are accumulating and propagating across most businesses. Some of this data comes from external sources and from customer interaction channels, such as web sites, call centers, Facebook, and Twitter. Other data resides in traditional data repositories, such as RDBMSs and file servers. To mine these large volumes and varieties of data in a cost-efficient way, companies are adopting new technologies such as Hadoop. Line-of-business managers are benefiting from Hadoop and its ability to enable the analysis of previously inaccessible data patterns. Security officers, however, are concerned about the nature of the information assembled in Hadoop and its uncontrolled accessibility. They are well aware of the potentially catastrophic financial losses and brand damage that compliance breaches can cause their business. They also know that traditional top-down security approaches, which surround all data with countless layers of peripheral protection, do not work: they inhibit data accessibility at a macro level and fail to take into account the variety and volume of data stored in Hadoop. These legacy security approaches are too complex, too expensive, and incapable of selectively protecting the data that matters.
A Bull's-Eye Approach to Data Privacy Protection for Big Data
DG for Hadoop offers companies a straightforward and economical way to determine where data security is needed and then enforce it, right on the data that needs it. To maintain compliance on a constant basis, security personnel can easily schedule DG for Hadoop to detect and protect sensitive data at the frequency they need. This greatly reduces the complexity and cost of data privacy protection. With DG for Hadoop, security personnel define what types of data are sensitive for their business and then schedule discovery tasks to automatically determine whether such data is present in their Hadoop implementations.
After analyzing the results of the discovery tasks, they can instruct DG for Hadoop to protect specific data sets with the most appropriate remediation technique, such as masking or encryption. Masking replaces sensitive data with other realistic data of the same kind. This technique is useful when the data can be changed without compromising the scope of the analysis performed on it. In situations in which authorized users need to access the actual sensitive data for their analysis, encryption is the preferred data protection technique. Encryption makes data unreadable for all users but the ones granted a decryption key.
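To make the discovery and masking ideas concrete, here is a minimal Python sketch. It is not DG for Hadoop's implementation or API. It assumes a single hypothetical sensitive data type (US SSN-like values matched by a regular expression), and the names `SSN_PATTERN`, `discover`, `mask_value`, and `mask_line` are invented for illustration. Masking here swaps each digit for another digit derived from a keyed hash, so the masked value keeps the same format as the original.

```python
import hashlib
import hmac
import re

# Hypothetical definition of one sensitive data type (SSN-like values).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def discover(lines):
    """Return (line_number, value) pairs for every SSN-like value found."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in SSN_PATTERN.finditer(line):
            hits.append((number, match.group()))
    return hits


def mask_value(value, secret=b"demo-secret"):
    """Replace each digit with a substitute digit, keeping the format.

    Substitute digits come from an HMAC of the value, so the same input
    always maps to the same masked output. Illustrative only: the digit
    mapping is slightly biased and is not a vetted masking scheme.
    """
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    digits = (str(int(ch, 16) % 10) for ch in digest)
    return "".join(next(digits) if ch.isdigit() else ch for ch in value)


def mask_line(line):
    """Mask every SSN-like value in a line of text."""
    return SSN_PATTERN.sub(lambda m: mask_value(m.group()), line)


if __name__ == "__main__":
    sample = ["name=Ann ssn=123-45-6789", "no sensitive data here"]
    print(discover(sample))
    print([mask_line(line) for line in sample])
```

Because the masked value keeps the `ddd-dd-dddd` shape and is deterministic for a given secret, analyses that only need a consistent, well-formed identifier can still run on the masked data. When analysts need the real values, encryption with managed keys is the better fit, as described above.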
Flexibility to Adapt to Different Business Needs
Recognizing the unique characteristics of Big Data, DG for Hadoop parses multiple terabytes of structured, unstructured, and semi-structured data in just a few hours.
Due to the volume of new data produced on a daily basis, any security solution must be able to act incrementally on new data feeds. DG for Hadoop can deliver both detection and protection of sensitive data incrementally and at different stages of its aggregation. For example, it can scan and protect data either at ingestion time or once the data is in Hadoop. Additionally, when the data originates from an RDBMS or from other traditional data repositories, such as Microsoft SharePoint servers or file systems, other components of the DgSecure product line can complement DG for Hadoop by scanning the data and taking the appropriate actions at the source, within these repositories. These options allow security officers and IT managers to choose the most effective protection workflow for their businesses.
- Detects and protects sensitive data at collection points or after it is stored in HDFS or NFS
- Allows users to implement either masking or encryption for remediation
- Leverages Hadoop APIs or works with Flume, Sqoop, NFS and other access methods
- Sets access permissions on files to facilitate user credential enforcement
- Preserves data integrity when masking
- Schedules data inspection and policy enforcement at the frequency users need
- Interoperates with central directory services
- Maintains highly secure data environments