10 Hadoop-able Problems (a summary)
So, the new company I work for, Affiliate Window, are pretty awesome. Technically, they’re not driven by what is cool, or what the latest buzzword is on The Twitter that one of the directors saw on the telebox. They do what is necessary to get the job done, using the best tools. If this requires some in-house dev, then time is found. If there’s a cool bit of tech from outside which fits the problem, then they’ll try it.
They’re also not hemmed in by the corporate, big enterprise world of “it’s the way others do it, so we should too”. They’re also good at long-term investment in their team and their tools. Plus, I get to use Ubuntu as my desktop. Rock on.
Anyway, a meeting was arranged for today where we could watch a presentation on Cloudera’s Hadoop (which you can see here at GoMeeting, although only on Windows and only after registering (great, more vendor lock-in!)). It was called ‘10 Common Hadoopable Problems’, given by Jeff Hammerbacher (their Chief Scientist no less!), and was basically things that you can do with Hadoop (that aren’t counting words…). I thought I would summarise them here, although I’d encourage every last one of you to watch it as it’s pretty interesting.
- Modelling True Risk – If you think about this in the context of banks or other financial institutions (which is, well, banks) this is a really useful way of burrowing deeper into your customers. You can suck in data about their spending habits, their credit, their repayments, everything. Munge it all together and squeeze out an answer on whether to lend them more money.
- Customer Churn Analysis – Hadoop was used here to analyse how a telco retained customers. Again, data from many different sources, including social networks AND the calls themselves (recorded and then voice analysed, I guess) were used to work out how and why the company were losing or gaining customers.
- Recommendation engines – I don’t really need to explain this one, do I? Thinking about this in terms of Google, this is like the ranking algorithm: sucking in a bunch of factors like popularity, link depth, buzz on Twitter etc. and then scoring links for display in score order later.
- Ad Targeting – Similar to the recommendation engine, but with the added dimension of the advertiser paying a premium for better ad space.
- Point Of Sale Transaction Analysis – On the face of it, this seems simple and straightforward: analysing the data that is provided by your P.O.S. device (your till). However, this could also include other factors like weather and local news, which could influence how and why consumers spend money in your store.
- Analysing Network Data To Predict Failure – The example given here was that of an electricity company which used smart-somethings to measure the electricity flying around their network. They could pump in past failures and current fluctuations and then pass the whole lot into a modelling engine to predict where failures would occur. It turned out that seemingly unconnected, small anomalies on the system were connected after all. There would have been no other way to mine this data.
- Threat Analysis/Fraud Detection – Another one for the financial sector and very similar to Modelling True Risk. Hadoop can be used to analyse spending habits, earnings and all sorts of other key metrics to work out whether a transaction is fraudulent. Yahoo! use Hadoop with this pattern to ascertain whether a certain piece of mail heading into Yahoo! Mail is actually spam.
- Trade Surveillance – Similar to Threat Analysis and Fraud Detection, but this time pointed squarely at the markets, analysing gathered historical and current live data to see if there is Insider Trading or Money Laundering afoot!
- Search Quality – Similar to the recommendation engine. This will analyse search attempts and then try to offer alternatives, based on data gathered and pumped into Hadoop about the links and the things people search for.
- Data “Sandbox” – This is probably the most ambiguous, but the most useful Hadoop-able problem. A data sandbox is just somewhere to dump data that you previously thought was too big, too useless or too disparate to get anything meaningful from. Instead of just chucking it away, throw it into Hadoop (which can easily handle it) then see if there IS anything you can glean from it. It’s cheap to run Hadoop and anyone can attach a datasource and push data in. It allows you to make otherwise arbitrary queries about stuff to see if it’s any use!
As you can see, most of these boil down to “Aggregate Data, Score Data, Present Score As Rank”, which, at its simplest, is what Hadoop can do. But the introduction of the idea of a Data Sandbox and the ability, using Sqoop, to push the analysed data back into a relational database (for a data warehouse, for example) means that you can run Hadoop independently and prove its worth in your business very cheaply.
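To make the “Aggregate Data, Score Data, Present Score As Rank” pattern a bit more concrete, here is a rough sketch in plain Python. It only mimics the map and reduce phases in a single process; it is not Hadoop code and nothing like it was shown in the presentation. The signal names, the weights and the sample records are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical weights per signal; the values are invented for this example.
WEIGHTS = {"popularity": 5, "link_depth": 2, "twitter_buzz": 3}


def map_phase(records):
    """Map step: emit one (item, weighted value) pair per input record."""
    for item, signal, value in records:
        yield item, WEIGHTS.get(signal, 0) * value


def reduce_phase(pairs):
    """Reduce step: aggregate the weighted values into one score per item."""
    totals = defaultdict(int)
    for item, weighted in pairs:
        totals[item] += weighted
    return dict(totals)


def rank(scores):
    """Present the scores as a ranking, highest score first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up input: (item, signal, raw value) rows, as if pulled from many sources.
records = [
    ("link-a", "popularity", 10),
    ("link-b", "popularity", 4),
    ("link-a", "twitter_buzz", 2),
    ("link-b", "link_depth", 5),
    ("link-c", "twitter_buzz", 20),
]

scores = reduce_phase(map_phase(records))
ranked = rank(scores)

for item, score in ranked:
    print(item, score)
```

On a real cluster the map and reduce steps would run in parallel across many machines and the input would come from files in HDFS rather than a list, but the shape of the job is the same.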

[...] 10 Hadoop-able Problems (a Summary) (From Mike Pearce – Blog) This is a nice primer for anyone wondering how their companies might benefit from Hadoop. It has mainstream uses beyond what many think, and it even works in tandem with your existing database. [...]
What We’re Reading About the Cloud: August 19
August 19, 2010 at 6:54 pm
Mike,
Really interesting stuff! Thanks for sharing!
Sharon
http://www.sharonmarkovsky.com
sharonmarkovsky
August 26, 2010 at 3:11 pm
[...] Follow this posting on Mike Pearce’s – blog… [...]
10 Hadoop-able Problems | Big Data Cloud
September 9, 2010 at 8:43 am
11. You could analyse blog posts for likely grocer’s apostrophes, too.
Russ Ferriday
September 9, 2010 at 9:22 pm
Haha! You could, yes!
… wait, what are you saying?
Mike Pearce
September 10, 2010 at 6:07 am
[...] we got a tad excited tonight when we ran across a post by Mike Pearce about "10 Hadoopable Problems": or in other words, 10 things you can do with Hadoop. But excitement turned to disappointment when [...]
Big Data and a Critique of Geek Culture | 【Facebook Club】facebook,games
September 9, 2010 at 9:31 pm
What are the most important users/applications of Hadoop outside of internet companies?…
To add to Charles’s answer: I’d say the next two verticals rounding out the top five would be telecom and retail spaces. I expect more movement from those two in 2011, as we’re seeing some early pilot projects bear fruit in each. You might also find…
Quora
September 20, 2010 at 12:24 am
[...] these 10 problems here This entry was posted in General. Bookmark the permalink. ← Redis VS [...]
10 problems where Hadoop can be used | Large Data Matters
September 23, 2010 at 7:20 am
[...] 10 Hadoop-able Problems (a summary) WTF would you want to use Hadoop for? Here are ten problems you could use Hadoop to solve. [...]
What I’m Reading – 2010-09-24 | Jeremiah Peschka
September 24, 2010 at 2:05 pm