Power your webapp with Cloudera, Hadoop, Hive, Pig, and EC2

Ever wonder how http://www.trendingtopics.org/ collects and processes Wikipedia's visitor traffic data? This Cloudera post walks through how to use various cloud tools to power a processing-intensive web application. Overall, the steps look something like this:

  1. provision a Hadoop cluster on EC2 for compute capacity
  2. load the logs into Hadoop
  3. process the log data: clean it up and apply trending algorithms to organize it
  4. export the processed data into MySQL for the web application to use
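To make step 3 a bit more concrete, here is a minimal local sketch of a MapReduce-style job in Python. It is not the trendingtopics.org code. It assumes input lines in the Wikipedia "pagecounts" layout (`<project> <page_title> <view_count> <bytes>`), and the function names and the `trend_score` formula (latest day's views divided by the average of the earlier days) are hypothetical, chosen only for illustration. The mapper drops malformed lines (the "clean it up" part), and the reducer sums views per article. On a real cluster the mapper and reducer would run as separate Hadoop Streaming scripts, with Hadoop doing the sort between them.

```python
from itertools import groupby
from operator import itemgetter

# Assumed input format (Wikipedia pagecounts style):
#   "<project> <page_title> <view_count> <bytes>", e.g. "en Main_Page 242 4737"
# Illustration only; not the actual trendingtopics.org job.


def map_pagecounts(lines, project="en"):
    """Mapper: emit (page_title, view_count) for one project, skipping dirty lines."""
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or fields[0] != project:
            continue  # wrong project or malformed line
        try:
            count = int(fields[2])
        except ValueError:
            continue  # non-numeric count: treat as dirty data and drop it
        yield fields[1], count


def reduce_counts(pairs):
    """Reducer: sum view counts per title.

    Hadoop sorts mapper output by key before the reduce phase; sorting here
    mimics that shuffle/sort step when running locally.
    """
    for title, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield title, sum(count for _, count in group)


def trend_score(daily_views):
    """Hypothetical trend score: latest day's views over the mean of earlier days.

    1.0 is added to the baseline so a page with no history does not divide by zero.
    Requires at least one value.
    """
    *history, latest = daily_views
    baseline = sum(history) / len(history) if history else 0.0
    return latest / (baseline + 1.0)


if __name__ == "__main__":
    sample = ["en Foo 10 100", "en Bar 5 50", "de Foo 7 70", "en Foo 3 30", "junk"]
    for title, total in reduce_counts(map_pagecounts(sample)):
        print(f"{title}\t{total}")
```

The per-article totals from a job like this are the kind of compact, pre-aggregated result that step 4 would then export into MySQL for the web app to query.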

This is really cool stuff… at least to me ;-). Now maybe I can use a Hadoop cluster to take all of my PowerPoint slides and process and organize them into something consumable… hmmmmm.

Read more on the Cloudera blog.
