Company – Why Dataguise
Most customers in regulated markets already use Hadoop's existing security capabilities: authentication (Kerberos), access control and isolation of file systems and networks (ACLs and network firewalls), file or volume encryption, and activity monitoring (logging, auditing, and data lineage). Because Dataguise operates specifically at the level of individual sensitive data elements (protecting IDs or names, for instance), it complements and enriches these existing controls in a simple, non-disruptive way.
Businesses are rapidly adding new data sources to Hadoop analytics, such as log data, clickstream data, and customer feedback and sentiment data. As a result, more and more of the data flowing into Hadoop is “gray”: its structure is harder to retain, it is harder to cleanse, and it is harder to determine where sensitive data resides and how much of it there is. A sensitive data discovery service helps organizations bring context and protection to this new, commingled, raw, and noisy data in Hadoop.
The process begins with defining a security policy: organizations select which sensitive elements they need to discover, and the rest of the process is automated. Using agents for data ingest (Flume, Sqoop, FTP) as well as agents for data at rest (HDFS, Hive, Pig), the Dataguise discovery service analyzes all data, then filters and counts sensitive data elements in text (.txt), CSV, log, Avro, and SequenceFile data, as well as in common unstructured formats (Word, Excel, PowerPoint, SMS, email).
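The policy-driven discovery step described above can be illustrated with a small sketch. This is not Dataguise's implementation: the element names (`SSN`, `EMAIL`, `CREDIT_CARD`), the regular expressions, and the `discover` and `luhn_valid` functions are hypothetical, and production discovery tools use far richer detection rules. The policy is modeled as a set of element types to look for; each input line is scanned and matches are counted per type, with a Luhn checksum used to discard digit runs that cannot be valid card numbers.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical detectors keyed by sensitive element type. These patterns are
# deliberately simple and will miss many real-world formats.
DETECTORS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    # Double every second digit, counting from the rightmost one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def discover(lines: Iterable[str], policy: set[str]) -> Counter:
    """Count occurrences of each sensitive element type named in `policy`."""
    counts: Counter = Counter()
    for line in lines:
        for element in policy:
            for match in DETECTORS[element].finditer(line):
                # Drop digit runs that cannot be valid card numbers.
                if element == "CREDIT_CARD" and not luhn_valid(match.group()):
                    continue
                counts[element] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "id,name,ssn,email",
        "1,Alice,123-45-6789,alice@example.com",
        "card=4111 1111 1111 1111 ok",
    ]
    counts = discover(sample, {"SSN", "EMAIL", "CREDIT_CARD"})
    print(sorted(counts.items()))
    # prints [('CREDIT_CARD', 1), ('EMAIL', 1), ('SSN', 1)]
```

Because the policy only selects which detectors run, narrowing it (for example to `{"EMAIL"}`) changes what is counted without touching the scanning loop, which mirrors the idea that the policy is the only manual step.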
Our encryption engine runs as an automated process (an “agent”) for the same data loaders (FTP, Flume, Sqoop). We also support native field- and row-level encryption through an HDFS encryption agent. More generally, we provide a JAR for invoking encryption and decryption, along with decryption UDFs for Pig, Hive, and MapReduce.
Dataguise is certified on all three major Hadoop distributions: Cloudera, Hortonworks, and MapR. In addition, downloadable Sandbox trials are available on both the Hortonworks and MapR partner websites. Dataguise also has production customers running DgSecure for Hadoop on Apache Hadoop, Amazon Elastic MapReduce, and Pivotal HD.