Hadoop 2.0’s Deep Impact on Big Data and Big Data Technologies

Businesses that ventured early into big data territory leaned on cloud computing for their Hadoop pilot projects, but that’s changing, according to Merv Adrian, analyst with Stamford, Conn.-based Gartner Inc. These days, his clients are increasingly asking how to deploy Hadoop on-premises.

Observations like this highlight how quickly big data and its related technologies are evolving. That’s certainly the case, Adrian and fellow analyst Nick Heudecker argued, with the release of Hadoop 2.0, which became generally available in October. The updated version of the Apache Software Foundation’s popular distributed computing framework made headlines last year predominantly because of a new feature called YARN (Yet Another Resource Negotiator), which essentially breaks Hadoop out of batch processing and into the real-time world. The Gartner analysts said the more robust version of Hadoop will almost certainly lead to an uptick in deployments and more use cases.

“As people gain experience, we expect them to build larger projects,” Adrian said during a recent webinar he hosted with Heudecker. And not just larger projects, but completely new projects that can interact with each other in ways they’ve never been able to before. Hadoop 2.0 might even be robust enough to move deeper into the organization and be integrated with the larger technology stack. And, while most businesses are using big data technology to tackle good, old-fashioned transactional data, Hadoop 2.0 can help businesses pursue unstructured and semi-structured data types, the experts said.

Promise aside, Hadoop 2.0 is not flawless. One significant weakness is security, or the lack thereof. “It’s important to note that these systems grew up largely in Web-centric, engineering-led companies that were primarily dealing with public data,” Heudecker said. “As enterprises adopt the technology, this needs to be backfilled.”
