Postings
Power your webapp with Cloudera, Hadoop, Hive, Pig, and EC2
Ever wonder how http://www.trendingtopics.org/ collects and processes visitor information from Wikipedia? This Cloudera post walks you through how to leverage various cloud tools to power a process-intensive web application. Overall, the steps look something like:
- provision a Hadoop cluster on EC2 for compute capabilities
- load the logs into Hadoop
- process the log data, clean it up, apply trending algorithms to organize the data
- export the processed data into MySQL for the web application to use
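To make the middle steps a bit more concrete, here is a minimal single-machine sketch in Python of the "clean, score, export" part. It is not the code from the Cloudera post, which runs this kind of work on Hadoop with Hive and Pig. Everything here is an assumption for illustration: the input is taken to be space-separated lines of the form `project page_title view_count bytes`, the "trend" is simply the change in views between two days, the function and table names are made up, and SQLite from the standard library stands in for MySQL.

```python
import sqlite3
from collections import defaultdict


def parse_line(line):
    """Parse 'project page_title view_count bytes'; return None if malformed."""
    parts = line.split()
    if len(parts) != 4:
        return None
    project, title, views, _size = parts
    if not views.isdigit():
        return None
    return project, title, int(views)


def daily_totals(lines_by_day, project="en"):
    """Clean the raw lines and sum views per page per day for one project."""
    totals = defaultdict(lambda: defaultdict(int))
    for day, lines in lines_by_day.items():
        for line in lines:
            parsed = parse_line(line)
            if parsed is None or parsed[0] != project:
                continue  # drop malformed rows and other projects
            _, title, views = parsed
            totals[title][day] += views
    return totals


def trend_scores(totals, previous_day, latest_day):
    """Hypothetical trend score: views on latest_day minus views on previous_day."""
    return {
        page: by_day.get(latest_day, 0) - by_day.get(previous_day, 0)
        for page, by_day in totals.items()
    }


def export_scores(conn, scores):
    """Write scores to a 'trends' table (SQLite here, standing in for MySQL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trends (page TEXT PRIMARY KEY, score INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO trends (page, score) VALUES (?, ?)",
        sorted(scores.items()),
    )
    conn.commit()
```

To try it, pass `daily_totals` a dict that maps a day label to a list of raw log lines, feed the result to `trend_scores` with two of those day labels, and hand the scores to `export_scores` along with a `sqlite3.connect(":memory:")` connection. The web application would then read from the `trends` table.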
This is really cool stuff… at least for me. Now maybe I can leverage a Hadoop cluster to take all of the PowerPoint slides and process and organize them for me into a consumable form… hmmmmm.
Read more on the Cloudera blog.
Filed under: web X.0 - appgirl @ 8:53 pm

