September 5, 2010, 6:02 pm UTC  

Power your webapp with Cloudera, Hadoop, Hive, Pig, and EC2

Ever wonder how http://www.trendingtopics.org/ collects and processes visitor information from Wikipedia? This Cloudera post walks you through how to use various cloud tools to power a process-intensive web application. Overall, the steps look something like:

  1. provision a Hadoop cluster on EC2 for compute capabilities
  2. load the logs into Hadoop
  3. process the log data, clean it up, apply trending algorithms to organize the data
  4. export the processed data into MySQL for the web application to use
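To make step 3 a little more concrete, here is a minimal single-machine sketch in plain Python. It is not the pipeline from the Cloudera post (that one runs on Hadoop with Hive and Pig); the log line format, the function names, and the "latest day minus the average of earlier days" score are all assumptions made up for illustration.

```python
from collections import defaultdict


def parse_line(line):
    """Parse one hypothetical log line: '<YYYYMMDD> <page_title> <views>'."""
    parts = line.split()
    if len(parts) != 3 or not parts[2].isdigit():
        return None  # drop malformed records (the "clean it up" part)
    date, page, views = parts
    return date, page, int(views)


def daily_views(lines):
    """Aggregate views per page per day: {page: {date: total_views}}."""
    totals = defaultdict(lambda: defaultdict(int))
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        date, page, views = record
        totals[page][date] += views
    return totals


def trend_score(by_date):
    """Assumed toy score: latest day's views minus the mean of earlier days."""
    dates = sorted(by_date)
    if len(dates) < 2:
        return 0.0
    latest = by_date[dates[-1]]
    earlier = [by_date[d] for d in dates[:-1]]
    return latest - sum(earlier) / len(earlier)


def top_trending(lines, n=3):
    """Return the n pages with the highest toy trend score."""
    totals = daily_views(lines)
    scored = [(page, trend_score(by_date)) for page, by_date in totals.items()]
    # Highest score first; ties broken alphabetically by page title.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:n]


# Made-up sample data, including one malformed line that gets skipped.
SAMPLE = [
    "20100901 Hadoop 100",
    "20100902 Hadoop 120",
    "20100903 Hadoop 400",
    "20100901 MySQL 300",
    "20100902 MySQL 310",
    "20100903 MySQL 305",
    "garbage line",
]

if __name__ == "__main__":
    for page, score in top_trending(SAMPLE, n=2):
        print(page, score)
```

On a real cluster the grouping and scoring would be spread across many machines, but the shape of the work (parse, clean, aggregate, rank) stays the same.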

This is really cool stuff… at least for me ;-). Now maybe I can leverage a Hadoop cluster to take all of the PowerPoint slides and process and organize them into a consumable form for me… hmmmmm.

Read more on the Cloudera blog.

Filed under: web X.0 — appgirl @ 8:53 pm

About

My name is Catherine Liao and you're reading the latest postings of various blogs I follow. You'll notice that the topics tend to center around Cloud Computing, Data Center, Virtualization, Servers, Web Technologies and 24x7 Operations.

These are topics that I'm interested in as I've spent a large chunk of my professional career building, deploying, and maintaining 24x7 application delivery environments. I use the knowledge I've garnered daily in my role as a Technology Solutions Architect for Cisco. I should note that this site is my personal site and does not reflect the views of Cisco.

Feel free to drop me a note if you find this site useful or if you'd like me to check out your blog. I can be reached at catherine.liao@gmail.com. You can also connect with me via LinkedIn or Twitter.

Looking for less "geeky" content? Check out my travel blog 1-Day Itinerary.
