Hadoop Summit Opens Amid Growing Interest in Big Data

Hadoop is front and center at a three-day Hadoop Summit conference taking place this week in San Jose, California. The Summit, sponsored by Yahoo and by key Apache Hadoop provider Hortonworks, is expected to attract more attendees and companies than last year's event.

More than 3,000 participants are expected at the Summit, which begins Tuesday, and more than 80 organizations are sponsoring it. One of the key drivers of this year's conference is YARN, a cluster resource management layer released last October whose acronym stands for Yet Another Resource Negotiator. YARN lets Hadoop run more than batch-oriented jobs, because cluster resources can be allocated as needed to match different workloads.

As the conference opened, Hortonworks announced that it was launching a YARN Ready program as part of its Partner Certification effort. The new program is meant to assure customers that tools and applications certified as YARN Ready are fully compatible with the Hortonworks Data Platform.

"As more organizations move from single-application Hadoop clusters to a more versatile, integrated Hadoop 2 data platform hosting multiple applications," Hortonworks said in a statement, "YARN is strategically positioned as the true integration point of today's enterprise data layer."

Informatica, MapR

Informatica is expected to show its newest data integration tools, such as its PowerCenter Big Data Edition visual development platform with reusable business rules, running on Hortonworks Data Platform 2.1. It will also exhibit Vibe Data Stream for Machine Data, which is intended to collect streams of messages and events from connected devices, the "things" in the Internet of Things, into a Hadoop environment.

MapR Technologies is launching what it calls the industry's first Hadoop application gallery, featuring big data solutions from a variety of Hadoop ecosystem partners. MapR and big data software provider Syncsort are also announcing a partnership to help customers move data processing from legacy systems onto a Hadoop platform using Hadoop-based extract, transform, and load (ETL) software.

Dataguise is unveiling its DgSecure data protection platform with the Hortonworks HDP 2.1 Sandbox. DgSecure is designed to protect sensitive data and to address compliance and data governance requirements in Apache Hadoop.

Skytree, Trifacta

Skytree is making its predictive analytics software available on Apache Hadoop YARN so that analytics can run directly on Hadoop clusters. Trifacta announced it has certified its Data Transformation Platform with the Hortonworks Data Platform 2.1.

Keynotes include Microsoft discussing “Transforming data into action using Hadoop, Excel, and the Cloud,” and Red Hat addressing “Enterprise Hadoop and the open hybrid Cloud.”

Apache Hadoop is an open-source software project and community focused on distributed processing of big data across clusters of commodity servers, which can number in the thousands. On its Web site, Hortonworks notes that Apache Hadoop "enables businesses to gain insight from massive amounts of structured and unstructured data quickly."

The Enterprise Hadoop open-source ecosystem includes a number of related projects, such as Falcon for data governance, the HDFS distributed file system, Hive and Tez for SQL queries, and Oozie for operations.