May 19, 2012, 5:10 am GMT  

Power your webapp with Cloudera, Hadoop, Hive, Pig, and EC2

Ever wonder how http://www.trendingtopics.org/ collects and processes visitor information from Wikipedia? This Cloudera post walks you through how to leverage various cloud tools to power a processing-intensive web application. Overall, the steps look something like this:

  1. provision a Hadoop cluster on EC2 for compute capabilities
  2. load the logs into Hadoop
  3. process the log data, clean it up, apply trending algorithms to organize the data
  4. export the processed data into MySQL for the web application to use
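
To make step 3 a little more concrete, here is a minimal sketch in plain Python of what "clean it up and apply a trending algorithm" could look like on a small scale. This is not the code from the Cloudera post: the log line layout (`<date> <page> <views>`), the function names, the sample data, and the trend score (latest day's views minus the previous day's) are all assumptions made up for illustration. In the real pipeline this work would run as Hive or Pig jobs on the Hadoop cluster.

```python
from collections import defaultdict


def parse_line(line):
    """Parse one log line in the assumed '<date> <page> <views>' layout."""
    date, page, views = line.split()
    return date, page, int(views)


def daily_totals(lines):
    """Sum views per page per day, skipping blank lines."""
    totals = defaultdict(lambda: defaultdict(int))
    for line in lines:
        line = line.strip()
        if not line:
            continue
        date, page, views = parse_line(line)
        totals[page][date] += views
    return totals


def trend_scores(totals, recent_date, previous_date):
    """Rank pages by a toy trend score: recent views minus previous views."""
    scores = {
        page: by_date.get(recent_date, 0) - by_date.get(previous_date, 0)
        for page, by_date in totals.items()
    }
    # Highest score first, so the fastest-growing pages lead the list.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up sample data, not real Wikipedia traffic.
SAMPLE_LOG = [
    "2009-06-01 Hadoop 100",
    "2009-06-02 Hadoop 180",
    "2009-06-01 MySQL 300",
    "2009-06-02 MySQL 310",
    "",
    "2009-06-02 Hadoop 20",
]

ranked = trend_scores(daily_totals(SAMPLE_LOG), "2009-06-02", "2009-06-01")
print(ranked)
```

The ranked list is the kind of small, pre-computed result you would export to MySQL in step 4, so the web application never has to touch the raw logs.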

This is really cool stuff… at least for me ;-). Now maybe I can leverage a Hadoop cluster to take all of my PowerPoint slides and process and organize them into a consumable form… hmmmmm.

Read more on the Cloudera blog.

Filed under: web X.0 — appgirl @ 8:53 pm