It is interesting how different companies try to understand me. Sometimes they know things about me before I do. When I use Gmail, it shows me a map of any address mentioned in my email. Google shows ads based on the content of my email. Amazon makes recommendations based on the products I have purchased or browsed. My credit card company calls me the second I swipe my card at a retailer to verify that it was indeed me who made the purchase and that my card was not stolen. As I think about it, I am in many databases: gas stations, credit cards, Social Security, the IRS, immigration, airline reservations, hotel reservations, search engines, Twitter, Google and Yahoo email, past and present employers, online shopping, department store shopping, etc. I realize I am a digital trail maker! Your trails may be different from mine. Together, we contribute to ‘big data’. Organizations go to great lengths to gather data to improve business, increase customer satisfaction, or win elections.
Orbitz reportedly showed Mac users pricier hotel options than PC users. The rewards card at the supermarket tracks every purchase we make and tries to understand our buying behavior. Target reportedly figured out that a teen was pregnant before her family did. With a large number of databases and a large population, we are creating a ton of digital trails, consciously or unconsciously. According to IBM, we generate 2.5 quintillion bytes of data every day (a quintillion is a 1 followed by 18 zeros). Like many things digital, the growth is recent: 90% of the data in the world today has been created in the last two years alone. Furthermore, data is projected to grow at 40% a year, compared to 5% growth in global IT spending.
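To get a feel for what 40% versus 5% means, here is a rough back-of-the-envelope sketch in Python. It assumes both rates stay constant and compound annually, which is my simplification and not something the projections state. The function names are just for illustration.

```python
import math

# Figures quoted in the text above.
BYTES_PER_DAY = 2.5e18      # 2.5 quintillion bytes per day (IBM)
DATA_GROWTH = 0.40          # projected yearly growth in data
IT_SPEND_GROWTH = 0.05      # projected yearly growth in global IT spending


def compound(rate, years):
    """Growth factor after `years`, assuming a constant rate compounded yearly."""
    return (1 + rate) ** years


def doubling_time(rate):
    """Years needed to double at a constant yearly growth rate."""
    return math.log(2) / math.log(1 + rate)


# 1 exabyte = 10**18 bytes, so the daily figure is 2.5 exabytes.
exabytes_per_day = BYTES_PER_DAY / 1e18

if __name__ == "__main__":
    print(f"Data per day: {exabytes_per_day} exabytes")
    print(f"Data after 5 years: {compound(DATA_GROWTH, 5):.2f}x")
    print(f"IT spending after 5 years: {compound(IT_SPEND_GROWTH, 5):.2f}x")
    print(f"Data doubles every {doubling_time(DATA_GROWTH):.1f} years")
```

Under these assumptions, data grows to roughly 5.4 times its size in five years while IT spending grows to only about 1.3 times, so every dollar of IT budget has to handle far more data than before.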
This growth in data provides many opportunities and challenges. Let’s look at the opportunities first. Analyzing large sets of data effectively will enable companies to outshine the competition and improve productivity and innovation. For example, information gathered from the data can provide a potential annual value of $300B to US health care alone. It can also potentially increase retailers’ operating margins by 60%, according to McKinsey Global Institute. Processing data in real time enables online retailers to recommend products for you based on your current preferences. Data-driven decision making reduces personal bias and helps organizations act more objectively.
Now let’s look at some of the challenges. As difficult as gathering the data may be, handling it is no walk in the park. The picture below captures the challenges with big data nicely in four categories.
Here is a short definition of each of the four categories. Volume refers to the amount of data. Velocity refers to the speed at which the data is transmitted. Variety refers to the mix of structured and unstructured data. Veracity refers to the reliability of the data gathered. This is where signal and noise part ways.
We need to gather the data from different points and store it in data centers. This requires a well-planned IT infrastructure. Questions to answer include how to structure the database, how to store and back up the data, how to protect it, how to transport it securely, how to manage latency, how to handle disaster recovery, etc. The volume of data and the velocity at which it has to travel create many challenges.
In many cases, the data stream needs to be analyzed in real time. Deep Packet Inspection (DPI) is critical for security. It also creates an opportunity to provide targeted advertisements. To accomplish both, it is important that the underlying network elements support DPI, minimize operating costs, and provide revenue-generating opportunities. Juniper Networks® JunosV™ App Engine helps our customers accomplish this. By creating a target expression, customers can identify matching traffic. This traffic can then be duplicated or modified as required.
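To make the idea of a target expression concrete, here is a toy sketch in Python. It is not the JunosV App Engine API. The Packet class, match_traffic and duplicate_to are hypothetical names I made up for illustration, and a plain regular expression over the payload stands in for the target expression.

```python
import re
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical, simplified packet: addresses plus raw payload bytes."""
    src: str
    dst: str
    payload: bytes


def match_traffic(packets, target_expression):
    """Return the packets whose payload matches the target expression.

    The expression is a bytes regex here; this is an assumption made for
    the sketch, not how any particular product defines its expressions.
    """
    expr = re.compile(target_expression)
    return [p for p in packets if expr.search(p.payload)]


def duplicate_to(packets, mirror_dst):
    """Copy the matching packets to a second destination, e.g. an analyzer."""
    return [Packet(p.src, mirror_dst, p.payload) for p in packets]


# Hypothetical sample traffic using documentation-range IP addresses.
traffic = [
    Packet("10.0.0.1", "192.0.2.10", b"GET /shoes HTTP/1.1\r\nHost: shop.example\r\n"),
    Packet("10.0.0.2", "192.0.2.20", b"GET /news HTTP/1.1\r\nHost: news.example\r\n"),
    Packet("10.0.0.3", "192.0.2.10", b"GET /bags HTTP/1.1\r\nHost: shop.example\r\n"),
]

matched = match_traffic(traffic, rb"Host: shop\.example")
mirrored = duplicate_to(matched, "198.51.100.5")
```

Here the two requests to shop.example are picked out and copied toward a separate destination, while the original packets stay untouched. Real DPI has to do this at line rate across reassembled flows, which is why support in the network elements themselves matters.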
I would like to hear from you about your trails. Share your experiences of how data mining by organizations has affected you.