Customers can search structured, semi-structured, or unstructured content for a variety of sensitive data elements, such as credit card numbers, Social Security numbers, names, addresses, medical IDs, ABA bank routing numbers, and financial codes. In addition to using the predefined templates for these sensitive data types, customers can build their own custom sensitive data elements with a sophisticated regex builder.
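To make the idea of a custom, regex-based sensitive data element concrete, here is a minimal Python sketch. It is not DgSecure's regex builder or API; the pattern, the function names (`luhn_valid`, `find_card_numbers`), and the choice to pair the regex with a Luhn checksum are all assumptions made for illustration.

```python
import re

# Hypothetical custom element: 13-16 digit card numbers, with optional
# single spaces or hyphens between digits. Not a DgSecure template.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return candidate card numbers (digits only) that pass the checksum."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

Pairing the pattern with a checksum is one common way to cut down on matches that merely look like card numbers, such as order or tracking numbers of the same length.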
Does DgSecure Discovery support discovering sensitive data for common data protection regulations and standards? If so, which ones?
Can DgSecure Discovery leverage external information to look for specific sensitive data elements? If so, how?
Does DgSecure Discovery support a distributed deployment and does it work scalably across distributed architectures?
Yes. DgSecure Discovery can be deployed in distributed environments and is designed to use resources efficiently in multi-node or multi-host deployments at scale. For example, DgSecure Discovery for Hadoop uses an agent-based architecture that runs all discovery tasks natively as Java MapReduce jobs on the Hadoop cluster. DgSecure Discovery for Databases also uses an agent-based architecture, running as a multi-threaded service across database instances within an enterprise.
Although discovery can be resource-intensive even in a distributed architecture, customers can tune DgSecure Discovery to fit within their infrastructure constraints. For example, the Hadoop HDFS agent can be throttled by limiting the number of map tasks it uses. Our experience with large global production deployments at customer sites shows that DgSecure Discovery scales with a low performance overhead of 5-10%. Dataguise is continually working to reduce this overhead and improve discovery performance.
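The throttling idea, capping how many scan tasks run at once so that discovery stays within a resource budget, can be illustrated with a generic Python sketch. This is an analogy only: it uses a bounded thread pool rather than Hadoop map tasks, and `scan_with_limit`, `scan_fn`, and `max_parallel` are hypothetical names, not DgSecure settings.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_with_limit(splits, scan_fn, max_parallel):
    """Apply scan_fn to every split, running at most max_parallel at a time.

    Results are returned in the same order as the input splits.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    # The pool size is the throttle: no more than max_parallel
    # scan_fn calls are ever in flight simultaneously.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(scan_fn, splits))
```

A lower `max_parallel` trades longer total scan time for a smaller footprint on shared infrastructure, which is the same trade-off a cap on map tasks makes.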
DgSecure Discovery uses three techniques to minimize false positives and false negatives. First, it uses contextual data (column names, keywords, reference data, metadata, and primary key/foreign key relationships) to match and disambiguate sensitive data elements more accurately. Second, for structured data, DgSecure Discovery can present a confidence score so that users can filter results by extent of match, reducing or removing near-miss matches. Third, Dataguise is continually adding advanced computational and statistical methods to reduce false negatives, including NLP, Bayesian inference, domain-based ontologies, and machine learning techniques.
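The first two techniques, using context and filtering on a confidence score, can be sketched in a few lines of Python. This is an illustrative assumption about how such a score might be formed, not DgSecure's actual scoring method; the pattern, the name hints, the `name_boost` value, and both function names are hypothetical.

```python
import re

# Hypothetical pattern and column-name hints for Social Security numbers.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")
SSN_NAME_HINTS = ("ssn", "social")


def column_confidence(column_name, values, name_boost=0.25):
    """Score a column from 0.0 to 1.0 for how likely it is to hold SSNs.

    The base score is the fraction of sampled values matching the pattern.
    A matching column name adds name_boost, but only when at least one
    value matched, so the name alone never produces a hit.
    """
    if not values:
        return 0.0
    match_ratio = sum(1 for v in values if SSN_PATTERN.match(v)) / len(values)
    hinted = any(hint in column_name.lower() for hint in SSN_NAME_HINTS)
    boost = name_boost if hinted and match_ratio > 0 else 0.0
    return min(match_ratio + boost, 1.0)


def filter_columns(columns, threshold):
    """Keep the names of columns whose confidence meets the threshold."""
    return [
        name for name, values in columns.items()
        if column_confidence(name, values) >= threshold
    ]
```

With a threshold of 0.6, a column named `cust_ssn` in which half of the sampled values match would be reported, while a column named `notes` with the same match rate would be filtered out as a probable false positive.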