FAQs

Company – Why Dataguise

Dataguise offers a unique approach to sensitive data protection in Hadoop. This approach combines intelligent data discovery with automated, data-centric encryption, masking, and audit tools for Flume, Sqoop, FTP, MapReduce, Hive, and Spark.

Large customers in Finance, Insurance, Healthcare, Government, Technology and Retail utilize Dataguise for two business goals:

1. Reduce breach risk and data loss through sensitive data protection

2. Address Hadoop compliance and regulatory mandates covering PII, PCI, PHI, HIPAA, and data privacy and residency laws

The ability to discover, protect, and audit access to sensitive data gives organizations an additional layer of protection beyond the access control, authentication, authorization, and data-at-rest encryption already available in Hadoop. This protective layer provides a unique degree of precision, control, and auditing of sensitive data, ideally suited to customers that may need to:

* Share data with “semi-trusted” users either inside the organization or out to partners

* Sell data to third parties

* Control and monitor internal access to gain better protection against insider risk

* Partially reveal data (while preserving data uniqueness for analytics) through intelligent data masking or format-preserving encryption (see the sketch below)
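
As a concrete illustration of the last point, the sketch below shows one way to partially reveal a value while keeping the masked result deterministic, so it still works as a join key for analytics: keep the last four characters and the punctuation, and replace the remaining digits with digits derived from a keyed hash. This is a hedged, conceptual example, not DgSecure's masking or format-preserving encryption; the key, the ID format, and the class name are assumptions.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

// Illustrative deterministic masking: keep the last four characters and the
// original separators, and replace the other digits with digits derived from
// a keyed hash. The same input always yields the same masked value, so the
// masked column remains usable for joins and distinct counts. This is a
// sketch only, not real format-preserving encryption.
public final class PartialMaskSketch {

    public static String maskId(String id, byte[] key) throws Exception {
        Mac hmac = Mac.getInstance("HmacSHA256");
        hmac.init(new SecretKeySpec(key, "HmacSHA256"));
        byte[] digest = hmac.doFinal(id.getBytes(StandardCharsets.UTF_8));

        StringBuilder masked = new StringBuilder();
        int cutoff = id.length() - 4;               // last four stay in the clear
        for (int i = 0; i < id.length(); i++) {
            char c = id.charAt(i);
            if (i < cutoff && Character.isDigit(c)) {
                // Substitute a digit taken from the keyed hash of the whole value.
                masked.append(Math.floorMod(digest[i % digest.length], 10));
            } else {
                masked.append(c);                   // keep separators and the revealed tail
            }
        }
        return masked.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] key = "demo-key-not-for-production".getBytes(StandardCharsets.UTF_8);
        System.out.println(maskId("123-45-6789", key)); // deterministic output
        System.out.println(maskId("123-45-6789", key)); // same input, same masked value
    }
}
```

One design note: real format-preserving encryption (for example, NIST FF1) is reversible by anyone holding the key, whereas this keyed-hash masking is one-way; which property is appropriate depends on whether downstream users ever need the original value back.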

Most customers in regulated markets already utilize existing Hadoop security capabilities: authentication and access control (Kerberos), file-system and network isolation and segmentation (ACLs and network firewalls), file or volume encryption, and activity monitoring (logging, auditing, and data lineage). By operating specifically at the sensitive data element level (locking IDs or names, for instance), Dataguise fits into and enriches these existing systems in a simple, non-blocking manner.

Businesses are rapidly adding new data sources to Hadoop analytics; these sources can include logging data, clickstream data, and customer feedback and sentiment data. The net result is that much of this new data going into Hadoop is “gray” – harder to structure, harder to cleanse, and harder to assess for the location and amount of sensitive data. A sensitive data discovery service helps organizations bring context and protection to this co-mingled, raw, noisy data being utilized in Hadoop.

The process begins with the definition of a security policy: organizations select which sensitive elements they need to discover, and the rest of the process is automated. Through agents for data ingest (Flume, Sqoop, FTP) as well as agents for at-rest data (HDFS, Hive, Pig), the Dataguise discovery service analyzes all data, filtering and counting sensitive data elements in text, CSV, log, Avro, and SequenceFile formats, as well as common unstructured formats (Word, Excel, PowerPoint, SMS, email).

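Conceptually, a discovery pass applies a policy of sensitive-element patterns to raw records and counts the hits per element. The small Java sketch below illustrates only that idea; it is not the DgSecure policy engine or its API, and the two rules and sample records are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Conceptual sketch of element-level discovery: scan raw records against a
// small "policy" of sensitive-element patterns and count the hits per element.
public class DiscoverySketch {

    public static void main(String[] args) {
        // Hypothetical policy: the sensitive elements an organization chose to discover.
        Map<String, Pattern> policy = new LinkedHashMap<>();
        policy.put("US_SSN", Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b"));
        policy.put("EMAIL", Pattern.compile("\\b[\\w.%+-]+@[\\w.-]+\\.[A-Za-z]{2,}\\b"));

        // Illustrative raw records as they might arrive via Flume, Sqoop, or FTP.
        List<String> records = List.of(
                "jane.doe@example.com,415-555-0100,comment about the product",
                "order 8831, ssn 123-45-6789, shipped",
                "clickstream: /home -> /pricing -> /signup");

        // Count how often each configured element appears across the records.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String record : records) {
            for (Map.Entry<String, Pattern> rule : policy.entrySet()) {
                Matcher m = rule.getValue().matcher(record);
                while (m.find()) {
                    counts.merge(rule.getKey(), 1, Integer::sum);
                }
            }
        }
        counts.forEach((element, count) ->
                System.out.println(element + " found " + count + " time(s)"));
    }
}
```
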
Our encryption engine runs as an automated process (“agent”) for these data loaders (FTP, Flume, Sqoop). We also support native field- and row-level encryption inside an HDFS encryption agent. More generically, we provide a JAR for invoking encryption and decryption and have built decryption UDFs for Pig, Hive, and MapReduce.

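To make the UDF idea concrete, here is a minimal sketch of a Hive decryption UDF in Java, using Hive's standard UDF extension point. The “decryption” is only a Base64 placeholder standing in for a real cipher; DgSecure's actual JAR, algorithms, and key management are not shown, and the class name is hypothetical.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical Hive UDF showing how a decryption routine could be exposed to
// HiveQL. The Base64 round-trip stands in for a real cipher and key lookup.
public final class DecryptFieldUDF extends UDF {

    public Text evaluate(Text encrypted) {
        if (encrypted == null) {
            return null;                 // propagate SQL NULLs unchanged
        }
        // Placeholder "decryption": decode Base64 back to the original value.
        byte[] plain = Base64.getDecoder().decode(encrypted.toString());
        return new Text(new String(plain, StandardCharsets.UTF_8));
    }
}
```

In practice such a function would be added to a Hive session with ADD JAR, registered with CREATE TEMPORARY FUNCTION, and then applied to the encrypted column in a query; the same pattern extends to Pig and MapReduce through their respective UDF and library interfaces.
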
Dataguise is certified on all three major Hadoop distributions: Cloudera, Hortonworks, and MapR. In addition, downloadable sandbox trials are available on both the Hortonworks and MapR partner websites. Dataguise also has production customers running DgSecure for Hadoop on Apache Hadoop, Amazon Elastic MapReduce, and Pivotal HD.