Between the advent of Hadoop and the rise of the cloud, there is not only more data than ever to manage, but that data is also more distributed than ever. And as Hadoop deployments begin to shift into the cloud, it is starting to look as though there soon might be a lot more Big Data deployments on public cloud computing platforms.
With that issue in mind, Dataguise, a provider of data governance software optimized for Big Data environments, today announced it is extending its DgSecure platform to support deployments of Hadoop in the cloud. As part of the effort, Dataguise is also announcing that Altiscale and Qubole, two providers of Big Data platforms in the cloud, have joined the Dataguise Big Data Protection Partner Program, which already includes Cloudera, Hortonworks, MapR and Amazon Web Services.
Dataguise CEO Manmeet Singh said that, next up, Dataguise plans to add support for other types of NoSQL databases, including Cassandra and MongoDB.
All told, the amount of data flowing into all these Big Data platforms is creating a major management challenge for IT organizations. With vast amounts of data both inside and outside the enterprise, managing all that data is creating an opportunity for managed service providers. While there is no shortage of data management and governance platforms, Singh contended, none of them are optimized to handle the complexity of Big Data environments.
Singh said DgSecure not only makes it simpler to govern all that data, it also provides data discovery tools that help identify which data is the most sensitive from a business process perspective. For example, DgSecure can identify which data may be personally identifiable information, automatically encrypt it, and then enable policies to be defined specifically to manage it.
Previously, noted Singh, the application of encryption would have been an all-or-nothing proposition, when in reality only 3 percent to 5 percent of most enterprise data actually needs to be encrypted.
In terms of managed services, data management clearly represents a major new opportunity for IT services providers, especially as the amount of data being stored in multiple Big Data platforms exceeds both the capacity and the skill set of the internal IT organization.
Of course, there’s no doubt at the moment that the majority of Big Data platforms are being deployed on premises. But as IT organizations become more cognizant of the cost of storing all that data locally, a lot more of them are going to be looking for ways to deploy those platforms in the cloud, where the cost of storage is already approaching pennies per gigabyte.