It must be about time for Strata again as all the major big data players are busy readying their “Spring Break” round of announcements. First up, and perhaps addressing the most important aspect of putting big data to work in real corporate IT data centers, is a new security solution from the folks at Dataguise (think “data in disguise”) aimed at NoSQL.
Dataguise has been successful with their cornerstone solution for Hadoop-stored big data, making the data lake/hub/refinery practical for many organizations (and whose data isn’t sensitive?) through an ability to automatically discover and disguise key data bits across the cluster that fit certain patterns (e.g., SSNs or other PII). Unless, of course, the user has the right credentials, in which case the data is made transparent.
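To make the discover-and-disguise idea concrete, here is a minimal Python sketch. It is not Dataguise's implementation; the `SSN_PATTERN` regex, the `disguise` function, and the `authorized` flag are all hypothetical names chosen for illustration, and a real product would cover many more patterns and tie into actual credential checks.

```python
import re

# Hypothetical pattern: US Social Security numbers written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def disguise(text: str, authorized: bool = False) -> str:
    """Return text as-is for authorized readers; otherwise mask SSN-shaped values."""
    if authorized:
        # Readers with the right credentials see the data in the clear.
        return text
    # Everyone else sees only the last four digits of each match.
    return SSN_PATTERN.sub(lambda m: "XXX-XX-" + m.group()[-4:], text)


record = "Customer 42, SSN 123-45-6789, region West"
print(disguise(record))                   # masked view
print(disguise(record, authorized=True))  # clear view
```

The point of the sketch is only that the same stored value can be presented in disguised or clear form depending on who is asking.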
Now they are bringing this data disguiser technology to Cassandra in a product called DgSecure for NoSQL, which later this year should also support HBase, MongoDB, and others. Interestingly, Cassandra is first up because Dataguise's clients demanded it. Apparently Cassandra is a key production database for a whole bunch of big companies that rely on it for big sensitive data use cases.
There is some magic in making this data disguising work on a powerful NoSQL database like Cassandra. The whole point of using NoSQL (instead of SQL) is to gain high performance and distributed availability. Layered security solutions can’t slow these databases down or impose fragile processes or potential failure points. Here we understand that the Dataguise folks worked hard with DataStax to hook in while preserving all the NoSQL benefits.
Dataguise solutions also provide a full audit trail of who accessed what sensitive data for forensic analysis, but we think the proactive parts are where app owners and data security folks should really focus. This is definitely an area to keep an eye on, and as a former SSO (from my USAF days), I think the Dataguise folks have a key piece of the big data puzzle for IT management and oversight.