Hacker Interrupted

Big Data is Dumb.

by Juniper Employee 07-08-2012 08:55 AM - edited 07-09-2012 02:56 PM

Let me be blunt: big data is going to disappoint.


I have seen this movie before. In 1999 I started a company that would eventually be classified by industry analysts under the banner of Application Service Providers (ASPs). The analysts said the ASP market was going to be HUGE, and venture capitalists began tripping over themselves to try to fund us. The market never materialized, and most ASPs were left for dead. The funding calls dried up.

 

Thankfully, we survived, and a few short years later a funny thing started happening. We were lumped into a newly defined market called Software as a Service. And it was going to be huge. And again, venture capitalists began tripping over themselves to offer capital. Years later, the market, while successful, has yet to live up to the grandiose industry predictions.

 

This movie repeats itself over and over in tech. A buzzword emerges followed by Big Predictions, and then equally Big Disappointment.

 

Big Data is the current fad getting crammed down our throats. It will also be the next trend to disappoint venture capitalists and CIOs.

 

There are three reasons Big Data won’t live up to its huge expectations:

 

1. The Terminology Spiral: Start-ups aren’t stupid. They realize that slapping the Big Data moniker on their company, no matter how half-assed, will help them raise funding. As a result, Big Data is rapidly becoming meaningless as hundreds of vendors throw themselves under the banner. This will keep happening for another 12-24 months until the category is so big it starts to burst. Enter phase two: companies will fracture the term in an effort to differentiate themselves from the masses (e.g. Cloud Big Data, Big Data Tools, Big Data Analytics). This will make the problem even worse, as customers will have to parse confusing vendor-speak. The Terminology Spiral makes marketing products more expensive, slows down customer understanding, and ultimately results in slower adoption.

 

2. Garbage In, Garbage Out: While you may fantasize that you can turn your business into a next-generation Moneyball, the reality is that Brad Pitt isn’t going to be playing you anytime soon. You’re not even going to get Jonah Hill. Moneyball works because there are very few, very quantifiable, and very well-documented statistics that measure success. Business is the exact opposite. You don’t collect sufficient data on your people, you don’t know enough about your customers, and you don’t know enough about your own business.

 

Ask any scientist: your brilliant discoveries only start happening when you have statistically significant and consistently collected data.

 

You think you are going to instantly figure out that left-handed people from Eastern Europe make the best HR people? After 20 years of looking at data, doctors still can’t even decide if butter or margarine is worse for you.

Big Data will only be as smart as your data collection.

 

3. Magnifying the Problem: While it might be impressive to brag at the country club that you have teams sifting through petabytes of data, the overwhelming majority of that data is useless.

 

The log files generated by most security solutions never get looked at. The focus on quantity over quality is what causes vendors to write broad signatures that are rife with false positives.
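
To put rough numbers on the false-positive problem, here is a minimal sketch. Everything in it is an assumption made up for illustration: the event counts, the two signatures, and their detection and false-positive rates are hypothetical, not measurements from any real product. It simply computes how many alerts each signature raises and what share of them are real attacks.

```python
from dataclasses import dataclass


@dataclass
class Signature:
    name: str
    detection_rate: float       # fraction of real attacks that get flagged
    false_positive_rate: float  # fraction of benign events that get flagged


def alert_stats(sig, total_events, attacks):
    """Return (number of alerts, share of alerts that are real attacks)."""
    benign = total_events - attacks
    true_pos = sig.detection_rate * attacks
    false_pos = sig.false_positive_rate * benign
    alerts = true_pos + false_pos
    precision = true_pos / alerts if alerts else 0.0
    return alerts, precision


# Hypothetical traffic: one million events, one hundred of them real attacks.
TOTAL_EVENTS = 1_000_000
ATTACKS = 100

# Hypothetical signatures: the broad one catches more but flags 1% of benign traffic.
broad = Signature("broad", detection_rate=0.95, false_positive_rate=0.01)
narrow = Signature("narrow", detection_rate=0.80, false_positive_rate=0.0001)

for sig in (broad, narrow):
    alerts, precision = alert_stats(sig, TOTAL_EVENTS, ATTACKS)
    print(f"{sig.name}: {alerts:.0f} alerts, {precision:.1%} are real attacks")
```

Under these made-up rates, the broad signature raises roughly ten thousand alerts of which fewer than 1% are real, while the narrow one raises fewer than two hundred. That is the kind of log nobody ends up reading.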

 

People have convinced themselves that more data is always better.

 

The reverse is actually true. Logically, you want to store the smallest dataset possible that contains all of your useful data. The more data you have, the more time and compute power it takes to sift through it. You are actually magnifying the problem by parsing through useless data.

 

Worse, most companies don’t have the staff to make the data actionable. They don’t have teams of Harvard PhDs sitting behind arrays of monitors shaking out brilliant insights. They have overworked engineers who don’t have the cycles or training to solve these problems.

 

Most companies have about the same chances of using Big Data effectively as The Waffle House has of winning a Michelin Star.

 

Unfortunately, these three issues will combine to keep the industry from reaching the potential it otherwise could.

 

None of this is to say that Juniper Security won’t use Big Data. It definitely will.

 

The difference is our starting point.  We will focus on tracking attackers with unparalleled specificity, eliminating false positives to deliver greater efficacy, and connecting the millions of devices that Juniper touches to react faster and better protect customers.

 

After all, as any batting coach can tell you: you don’t focus on the hit, you focus on the swing.

 

-David Koretz (Twitter: @dkoretz) is the VP and GM of Mykonos Software, a Juniper Networks Company.

Comments
by Sarmishtha on ‎07-24-2012 09:28 AM

Thank you for an interesting article. I especially like how you addressed the confusion that will arise around the name: as 'Big Data' gains prominence, companies will begin to differentiate themselves and what they offer. While I agree with your view from a company's perspective, I could not help but take a step back and wonder: what if we had (I am not sure whether we do now) models that could really use this data and separate the signal from the noise? We could very well do things like:

 

  1. Study diseases: sift through tons of material to establish a definite pattern behind any genetic causes.
  2. Countries such as India or China could use such models to break this data down, for example to track government subsidies given to poor people or to process the copious amounts of data behind India's UID.

Once again...construction of such a model and its usage may be challenging, but the opportunity would be great. I quote from Chris Anderson's article 'The End of Theory: The Data Deluge Makes the Scientific Method Obsolete': "The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all."
