Big Data Analytics. You hear it everywhere these days. In the industry press. On the Internet. In your email inbox. On your boss’s mind. Everywhere. But perhaps your grandmother’s question captures it best: “That’s nice, dear. And what the dickens is Big Data Analytics?” OK, so that you can answer Granny’s question, let’s have a look at how these words play together today.
Let’s start with “analytics.” That “–ic” ending, from the Greek –ikos, turns a noun into an adjective (analysis to analytic) and, oddly, by adding an ‘s’, into another noun (analytics). You can’t find a definition of analytics that doesn’t have “analysis” in it somewhere. So clearly, we first have to figure out what “analysis” means.
One of my favorite dictionaries says something like this:
analysis a·nal·y·sis / əˈnaləsis / n. late 16th century: via medieval Latin from Greek analusis, from analuein ‘unloose’, from ana- ‘up’ + luein ‘loosen’. Detailed examination of the elements or structure of something, typically as a basis for discussion or interpretation.
analytics ana·lyt·ics / anəˈlɪtɪks / n. the systematic computational analysis of data or statistics, or information resulting from such analysis.
So analytics1 is the process of, or the results from, analysis of data for purposes of making decisions, reaching conclusions, or disproving models or theories about how some currently interesting aspect of the world works. In our day-to-day business data processing world, the “interesting aspects” are generally related to “business intelligence” – that is, information about trend detection and prediction in our customers’ businesses, or the larger marketplaces in which they participate.
We often refer to analytics techniques collectively, be they mathematical, statistical, or computational, as algorithms. Analytics algorithms are at the core of most data processing scenarios beyond conventional transaction processing.
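To make “analytics algorithm” concrete, here is a minimal sketch of one of the simplest trend-detection techniques: a moving average that smooths a noisy series so its underlying direction is easier to see. The window size and sales figures are illustrative assumptions, not from any real data set.

```python
def moving_average(values, window):
    """Return the average of each sliding window over the series."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical monthly sales figures, purely for illustration.
sales = [100, 102, 98, 110, 115, 112, 120]
print(moving_average(sales, 3))  # smoothed series reveals the upward trend
```

Trivial as it is, this is analytics in miniature: a computation over data whose output (a smoothed trend) supports a business conclusion the raw numbers obscure.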
“Big Data” ought to be a lot easier. We all know what “data” is, right? And “big” means, well, big. Paired, these words have become an industry buzz-phrase referring to data streams that can swamp present day data ingest and analysis capabilities. Gartner brought some conceptual organization to the area as early as 2001, defining big data as follows:
“Big data” is high-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
These “three Vs” have become an iconic definition of big data. Some say there’s a fourth (usually “veracity”). The Vs tell us that big data comes fast, comes large, and comes in diverse forms; that’s how we separate “big data” from “normal data.” So, not all data problems are “big,” but even the small ones can be problematic. And of course, “big” is relative; what seemed like a lot of data ten years ago may simply be business-as-usual today. In practice, “big data” frequently means data that, in one or more of its V characteristics, exceeds our ability to produce useful analyses in reasonable time frames.
In your excitement over those Vs, don’t neglect the “cost-effective, innovative forms of information processing” and “for enhanced insight and decision making” parts of the definition. They are the means and the payoff of big data, respectively; the things you should pay attention to in your big data quest.
Most analytics algorithms can be applied in both big data and normal data situations; they really don’t care how much data is involved. Big data difficulties crop up when practical constraints enter the picture, for example, data arrives faster than it can be consumed, overflows available storage space, or is in forms the analytics algorithms can’t handle. In addition, there are some analytics techniques targeted specifically at big data scenarios. Many are pretty technical in nature, but you might recognize “map/reduce”, “dimensionality reduction” and “stream sampling.” Finally, you may find custom-built analytics algorithms, especially in larger data shops. These capture and apply our customers’ unique, often proprietary, knowledge about how they do business and may be essential components of achieving competitive advantage in the marketplace.
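One of the big-data-specific techniques named above, stream sampling, can be sketched in a few lines. This is a minimal illustration of reservoir sampling (Vitter’s Algorithm R): it keeps a uniform random sample of k items from a stream of unknown length using only O(k) memory, which is exactly what you need when data arrives faster than it can be stored. The stream and sample size here are illustrative.

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of k items from a stream,
    without knowing the stream's length in advance."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (i + 1),
            # which keeps every item equally likely to survive.
            j = random.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

sample = reservoir_sample(range(1_000_000), 5)
print(sample)  # five items drawn uniformly from the million-item stream
```

Note that the full stream is never held in memory; each item is seen once and either kept or discarded, which is the essential trick behind many big-data algorithms.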
“Big data analytics,” then, must be about the problems of understanding and interpreting data streams that either stress, or are completely beyond, current abilities to handle them. Though likely annoying on a day-to-day basis, problems like this are good to have. This is where skills and capacities are stretched near their limits and where innovative solutions and new technologies emerge.
Of course, some non-analytic scenarios involving the mechanics of (mostly transactional) data ingestion also stress current high-volume and high-velocity capabilities. These scenarios differ from the intent of big data because techniques for dealing with them are well understood and do not require very innovative approaches to their resolution. Moreover, transactional data updates are individually small and their data types well constrained, failing the high-variety characteristic of big data scenarios.
So, in summary, we can capture all of this in a nice Venn diagram.
The business payoff of big data analytics centers on the strategic combination of data technology (data representation, storage, and retrieval), computational statistics (analytics algorithms), and detailed business acumen to deliver timely interpretation of business activity. The challenge for today’s data scientists is to learn how to use big data analytics techniques to develop predictive tools that can suggest future business actions with some useful level of confidence.
1 I know it’s tempting, but don’t confuse “data analytics” with “data mining.” Data miners use statistical techniques to discover previously unrecognized patterns and relationships in data sets. Data analytics reports on past, and even possible future, states of previously known patterns and relationships in data sets. So, data mining discovers patterns, and data analytics tracks patterns after they are understood.