Big Data Predictions (Scorecard from last year)

Ugh, we’re coming up on halfway through the year already. 2015! Remember when you were a kid and thought that by 2015 we’d all be riding hovercrafts, carrying sting-ray guns, and eating meal pellets (meals that popped out of a drop of water)?

Well, with the exception of driverless cars, those sci-fi predictions are still decades away. What’s not decades away is a review of the 2015 predictions I made at last year’s Hadoop Summit. I plan on making this an annual review. After all, what good are predictions if nobody is around to validate them?

I made my 2015 Big Data predictions way back in June 2014 at the Big Data Summit in San Jose. (Why does everyone have to be forced into end-of-year predictions anyway?)

Here’s what I said one year ago, and I’ve got the tweets to prove it. Here’s a snapshot of the first one:

twitter1

Well, not exactly a 2015 prediction (not yet, anyway), but quite accurate. Metadata is now lurking everywhere in the security projects we’re tackling in Big Data. Specifically, we’re increasingly focused on adding metadata discovery tools that can scan, sample, parse, and leverage metadata in our quest for better accuracy and better performance in sensitive data discovery on 1 GB RC/ORC/Avro/JSON/XML files. This is a huge accelerator for structured file discovery, potentially speeding discovery by 10-100x by obviating or dramatically reducing the need to sample and examine the raw data when the metadata announces the structure, format, and content of Sequence, Avro, ORC, and similar files for you.
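To make the idea concrete, here is a minimal sketch of schema-only discovery, assuming the file format exposes its schema as JSON (Avro container files carry theirs in the file header). It walks the schema and flags suspicious field names without touching a single record. The pattern list, the function name `find_sensitive_fields`, and the example schema are all made up for illustration; this is not how DgSecure does it.

```python
import json
import re

# Hypothetical name patterns; a real tool would use a much richer rule set.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"ssn|social_?security", re.IGNORECASE),
    "email": re.compile(r"e_?mail", re.IGNORECASE),
    "card": re.compile(r"credit_?card|card_?(number|num|no)", re.IGNORECASE),
}


def find_sensitive_fields(schema, path=""):
    """Walk a parsed Avro schema and return (field_path, label) pairs
    for every field whose name matches a sensitive pattern."""
    hits = []
    if isinstance(schema, list):
        # A union such as ["null", "string"]: check every branch.
        for branch in schema:
            hits.extend(find_sensitive_fields(branch, path))
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            for field in schema.get("fields", []):
                name = field["name"]
                field_path = f"{path}.{name}" if path else name
                for label, pattern in SENSITIVE_PATTERNS.items():
                    if pattern.search(name):
                        hits.append((field_path, label))
                # Recurse into nested records, arrays, maps, and unions.
                hits.extend(find_sensitive_fields(field["type"], field_path))
        elif kind == "array":
            hits.extend(find_sensitive_fields(schema["items"], path))
        elif kind == "map":
            hits.extend(find_sensitive_fields(schema["values"], path))
        elif isinstance(kind, (dict, list)):
            # A type wrapper such as {"type": {...}}.
            hits.extend(find_sensitive_fields(kind, path))
    # Primitive type names (plain strings) carry no fields, so they add nothing.
    return hits


# Invented example schema, standing in for one read from a file header.
EXAMPLE_SCHEMA = json.loads("""
{"type": "record", "name": "Customer", "fields": [
  {"name": "id", "type": "long"},
  {"name": "email_address", "type": ["null", "string"]},
  {"name": "billing", "type": {"type": "record", "name": "Billing", "fields": [
    {"name": "card_number", "type": "string"},
    {"name": "zip", "type": "string"}]}}
]}
""")

if __name__ == "__main__":
    for field_path, label in find_sensitive_fields(EXAMPLE_SCHEMA):
        print(f"{field_path}: looks like {label}")
```

Field names alone can mislead, of course, so a sketch like this would only narrow down which columns deserve a closer look at the actual data.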

We will launch auto-discovery with metadata management in DgSecure 5.1 in the fall.

Last June, feeling that my metadata tweet lacked specificity, I followed it up 20 minutes later with a more reasonably time-bounded prediction:

twitter2

On this one, I give myself only mixed grades. I was wrong about Tez, as Apache Drill is now the hot property in SQL query engines, and Hive is the gorilla, with Pig fading very fast. But three of four (if you count metadata) ain’t bad. I kinda sorta missed Spark too. Oh well, better luck next year (this week, to be precise, as we’re exhibiting again at this year’s Hadoop Summit).

I will document via Twitter (proof!) and review in 2016. No doubt by then the Hadoop and Big Data landscape will have changed enough that all of this will look wrong, and the nice thing about next June is that nobody will notice, except me, driving my new hovercraft.