Data scientist and editor of KDNuggets.com Gregory Piatetsky-Shapiro recently shared some of his big data insights for 2013.
In a Google+ Hangout, he gave his take on where big data helps companies, where he thinks we’ll see more focus, and the real role of the data scientist.
Before we jump into the predictions and commentary, we’d like to tell you a little bit about Piatetsky-Shapiro’s (@kdnuggets) background.
He started as a researcher in artificial intelligence databases when big data was “something like 1,000 MB.” He then organized the first workshop, and later a conference, on knowledge discovery and served as chief scientist for a number of startups.
For the past 10 years, he’s been a consultant or a “data scientist” as well as publisher of the KDNuggets newsletter.
Is Big Data Just a Trend?
Piatetsky-Shapiro calls big data a “real phenomenon” but warns that the promise it brings may be overstated.
“Big data can do things better,” he says. “It can optimize things.”
But he warns that it is not a miracle worker. Echoing his 2012 prediction, he notes that many of the predictions backed by big data have been wrong.
“The phenomenon is real. The potential is real. What is new is just a buzzword. It captures the promise that big data will do things better. It can do things better. It can optimize things. Maybe it will not produce miracles. Not perfectly predict what will happen.”
Bad Predictions Abound in 2013
Piatetsky-Shapiro predicts that we will continue to hear bad predictions on highly hyped topics like the end of the world.
He says that people tend to make predictions, but that intuition and data don’t always match up when the data is as large as it is today.
“People have this built-in tendency to make predictions. Unfortunately, our intuitions don’t work very well when data becomes large,” he says. “That’s why we need good statistical analysis and data science to make good predictions.”
Are We Missing an Opportunity with “Untapped Data”?
Piatetsky-Shapiro points to an IDC study that predicts the “digital universe will double every two years and grow to about 5,200 gigabytes per person in 2020.” The study also estimates that 23% of the data would be useful if it were “tagged and analyzed.”
However, it’s estimated that just 3% will be tagged and half a percent will be analyzed.
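To put those percentages in per-person terms, here is a rough back-of-the-envelope sketch in Python. It assumes, for illustration only, that each percentage applies to the full 5,200 GB per person projected for 2020; the article does not say which base the study used. The function and variable names are hypothetical.

```python
# Figures quoted from the IDC study as reported above.
GB_PER_PERSON_2020 = 5200   # projected digital universe per person in 2020
USEFUL_SHARE = 0.23         # share that would be useful if tagged and analyzed
TAGGED_SHARE = 0.03         # share expected to be tagged
ANALYZED_SHARE = 0.005      # share expected to be analyzed ("half a percent")


def share_in_gb(total_gb, share):
    """Convert a fractional share of the total into gigabytes."""
    return total_gb * share


def growth_factor(years, doubling_period=2.0):
    """Growth multiple after `years` if size doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)


# Assumption: every share is taken against the same 5,200 GB base.
useful_gb = share_in_gb(GB_PER_PERSON_2020, USEFUL_SHARE)
tagged_gb = share_in_gb(GB_PER_PERSON_2020, TAGGED_SHARE)
analyzed_gb = share_in_gb(GB_PER_PERSON_2020, ANALYZED_SHARE)
untapped_gb = useful_gb - analyzed_gb  # potentially useful but never analyzed

print(f"Potentially useful: {useful_gb:,.0f} GB per person")
print(f"Tagged:             {tagged_gb:,.0f} GB per person")
print(f"Analyzed:           {analyzed_gb:,.0f} GB per person")
print(f"Untapped:           {untapped_gb:,.0f} GB per person")
print(f"Growth over 8 years at a 2-year doubling rate: {growth_factor(8):.0f}x")
```

Under that assumption, the gap between what could be useful and what actually gets analyzed is far larger than the analyzed portion itself, which is the “untapped data” opportunity in question.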
What’s Behind the “Data Gap”?
Piatetsky-Shapiro says that it’s mostly a skills shortage, but not a “data scientist” shortage. As he did last year, he argues that the data scientist shortage is a bit overstated.
He says, “We’ve all probably read predictions about the mythical data scientist shortage of 150,000.”
While the job title is considered the “sexiest job of the 21st century,” he’s not buying in. He says that we need to look at where the real “talent shortage” is as it relates to big data.
“Although data scientist jobs grow very fast, they don’t grow as fast as other big data-related jobs,” he says. “For example, demand for Hadoop greatly outstrips demand for data scientists. Partly because most analytics professionals are not data scientists. Lots of people will just work on data engineering, data moving, etc.”
Defining the Data Scientist
Piatetsky-Shapiro offers a colorful definition of a “real data scientist.”
He says, “A data scientist is a combination of a statistician, a hacker and an MBA – in different proportions.”
More Vertical Focus in 2013
Piatetsky-Shapiro strips another layer off the “big data” buzz with another prediction – big data in motion. He defines this as delivering answers “not quite real-time,” but he says that it gives companies a much faster response to problems.
“People are able to build smaller, vertical-level or division-level units that are able to collect the relevant details and get responses faster,” he says. “Whether it’s a hardware appliance or some specific software, I think that’s a direction that we’re going. More vertical and more domain-specific. No more unified data warehouse.”
- Don’t miss the second part of the discussion with Gregory Piatetsky-Shapiro tomorrow, where he shares his top prediction for 2013.
Spotfire Blogging Team