Transactional data is still the foundation for many businesses trying to mine data for insights, but big data has opened an entirely new realm of data mining prospects for a multitude of industries.
Big data provides the opportunity to model human intent rather than simply model data, notes Mok Oh, chief scientist of PayPal.
“Ultimately, what we’re trying to model is every person’s brain – at least the part of the brain that decides how to shop, when to shop, and what you want,” Oh says. “We’re trying to reverse-engineer transactional data to figure out what people are going to buy next.”
As an example, a retailer might know that someone has purchased a shirt, but it doesn’t know that he’s looked at computer bags and jeans before buying that shirt, or that his brain is preternaturally focused on computer bags, and the retailer just doesn’t have what he needs. It’s easier to capture browsing behavior in an electronic platform like PayPal, but that still presents reverse-engineering challenges, Oh says.
Retailers maneuvering to boost sales are not the only beneficiaries. Many other organizations, including municipalities, are devising practical applications for big data sources.
The city of Boston, for example, has put into place an application that taps into big data to help locate potholes and dispatch repair teams to fix them. The smartphone application uses the phone’s accelerometer to detect bumps in the road – when a car hits a pothole, the app sends information about the bump, including its location, to a database.
Here are the top 5 big data source types:
- Social network profiles – Tapping user profiles from Facebook, LinkedIn, Yahoo, Google, and specific-interest social or travel sites, to cull individuals’ profiles and demographic information, and extend that to capture their hopefully like-minded networks
- Social influencers – Blog comments from editors, analysts, and subject-matter experts; user forums; Twitter and Facebook “likes”; Yelp-style catalog and review sites; and other review-centric sites such as Apple’s App Store, Amazon, and ZDNet
- Activity-generated data – Computer and mobile device log files, aka the “Internet of Things,” including website tracking information, application logs, and sensor data – such as check-ins and other location tracking – among other machine-generated content
- Software as a service (SaaS) and cloud applications – Data that’s already in the cloud but is difficult to move and merge with internal data
- Public – Data that is publicly available on the Web, from sources such as the World Bank, SEC/EDGAR, Wikipedia, and IMDb, and that may enhance the types of analyses that can be performed