July 29, 2010, 4:49 pm UTC  

Keeping the lights on

Having spent years running 24×7 internet-facing production systems, I find that monitoring is often the last element of an application delivery environment to be addressed, and that it is frequently built outside of the application delivery architecture. As we continue to build our application delivery infrastructure in the cloud, a good monitoring strategy will arm us with the information we need to make intelligent decisions.

So exactly what should be monitored?

Availability

The first element in a monitoring strategy is to determine whether the application is accessible. The simplest way to determine availability is ping. However, as most applications sit behind a load balancer, a ping response doesn’t necessarily mean that the application is responding to requests. Use a monitoring system that can speak application-layer protocols to ensure that the application is indeed healthy and responding to user requests. It’s best to leverage a third-party solution that can assess availability from multiple networks and provide an unbiased view of the availability of the application.
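To illustrate the difference between a ping and an application-layer check, here is a minimal sketch in Python using only the standard library. The function name `check_http`, the URL, and the expected marker text are assumptions made for this example, not part of any particular monitoring product. The probe counts as healthy only when the application returns a successful HTTP response whose body contains the expected text.

```python
import urllib.error
import urllib.request


def check_http(url, expected_text, timeout=5.0):
    """Return (healthy, detail) for one application-layer probe.

    `url` and `expected_text` are placeholders chosen by the operator,
    for example a status page and a string it always contains.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx or 5xx).
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, timeout, and similar.
        return False, f"unreachable: {exc}"
    if expected_text not in body:
        # A load balancer or error page may still answer with 200.
        return False, f"HTTP {status} but expected text missing"
    return True, f"HTTP {status}"
```

A scheduler could call something like `check_http("http://app.example.com/status", "OK")` every minute from several networks and alert when consecutive probes fail; the hostname and path here are hypothetical.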

Resource Utilization / Load

The next element in a good monitoring strategy is to determine how healthy a system is. Tracking the load of various system components will enable us to uncover bottlenecks within the application delivery environment. Leverage SNMP to capture and record utilization statistics on CPU, memory, disk IO, network IO, threads, and so on. Graph these stats to establish a baseline and find correlations between the monitored elements.
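Once utilization samples have been collected, the baseline and correlation steps can be sketched in a few lines of Python. This example assumes the samples are already available as plain lists of numbers (how they were gathered, via SNMP or otherwise, is outside the sketch), and the three-standard-deviation threshold is an arbitrary illustrative choice rather than a recommendation.

```python
import math
import statistics


def baseline(samples):
    """Return (mean, standard deviation) of historical samples."""
    return statistics.mean(samples), statistics.stdev(samples)


def is_anomalous(value, mean, stdev, threshold=3.0):
    """True when value is more than `threshold` deviations from the mean."""
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)
```

A coefficient near 1 between, say, CPU and disk IO samples taken at the same times suggests the two rise together, which is a hint about where to look for a bottleneck, not proof of cause.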

Performance

Performance monitoring is often the most challenging element of a monitoring strategy. Here we are concerned with how the application is performing for a given user. The most common approach is to create synthetic transactions simulating user behavior and run those transactions from different network locations. While availability and load monitoring focus on individual components within an application delivery environment, performance monitoring delivers a holistic view of how well the individual components are working together.
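A synthetic transaction can be sketched as an ordered list of named steps, each timed separately. In the Python example below, the function names and the idea of passing each step as a callable are assumptions for illustration; in practice each callable would drive a real request such as logging in or searching.

```python
import time


def run_synthetic_transaction(steps):
    """Run named steps in order; return a list of (name, seconds, ok).

    `steps` is a list of (name, callable) pairs. A step fails when its
    callable raises; later steps are skipped, since a real user could
    not continue past the failure either.
    """
    results = []
    for name, action in steps:
        started = time.perf_counter()
        try:
            action()
            ok = True
        except Exception:
            ok = False
        elapsed = time.perf_counter() - started
        results.append((name, elapsed, ok))
        if not ok:
            break
    return results


def total_time(results):
    """Total seconds spent across all steps that were attempted."""
    return sum(seconds for _, seconds, _ in results)
```

Recording per-step timings rather than a single total makes it easier to see which part of the user journey slowed down when the overall number moves.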

Security

The final element in our monitoring strategy is focused on security. Unfortunately, system security is often an afterthought, usually dealt with only after an intrusion has resulted in significant downtime. I urge everyone to proactively monitor changes in system behavior to minimize the time needed to discover and rectify an intrusion. At a minimum, track file- and network-level changes. Production systems should not see changes in system binaries. Unused TCP and UDP ports should remain closed. A change in either would indicate an anomaly and should be thoroughly investigated.
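File-level change tracking can be sketched as a hash snapshot that is compared against a later one. The Python example below is a simplified illustration of the idea, not a description of how Tripwire or any other product works internally; the function names are made up for this sketch, and a real deployment would also need to protect the stored baseline from tampering.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(directory):
    """Map each file's path (relative to directory) to its digest."""
    root = Path(directory)
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(baseline, current):
    """Report files added, removed, or modified since the baseline."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    modified = sorted(
        name
        for name in set(baseline) & set(current)
        if baseline[name] != current[name]
    )
    return {"added": added, "removed": removed, "modified": modified}
```

Taking a snapshot of a directory of system binaries right after deployment and diffing it on a schedule turns "binaries should not change" into something that can raise an alert. The same compare-against-baseline approach applies to a list of listening ports.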

So far I’ve not touched on the tools that can be leveraged to monitor each of these elements. Over the years I’ve worked with both open source and commercial tools. My favorites include Big Brother, Monit, Cacti, Analog (a bit long in the tooth now), Keynote, Tripwire, and lots of Perl scripting. A complete monitoring strategy will incorporate multiple tools, as I’ve yet to come across a single tool that does it all.

Filed under: cloud & virtualization, web X.0. Posted by appgirl @ 9:15 am
1 Comment

  1. We used to have something inside applications called a ‘heartbeat’ – some transaction that exercises the major parts of the app but has no effect other than checking that the app is still up.
    You run the heartbeat periodically and measure the time it takes to get a feeling for load. Other than that it’s an app-level ping.

    Comment by Martin Stein — December 24, 2009 @ 3:39 pm

About

My name is Catherine Liao and you're reading the latest postings of various blogs I follow. You'll notice that the topics tend to center around Cloud Computing, Data Center, Virtualization, Servers, Web Technologies and 24x7 Operations.

These are topics that I'm interested in as I've spent a large chunk of my professional career building, deploying, and maintaining 24x7 application delivery environments. I use the knowledge I've garnered daily in my role as a Technology Solutions Architect for Cisco. I should note that this site is my personal site and does not reflect the views of Cisco.

Feel free to drop me a note if you find this site useful or if you'd like for me to check out your blog. I can be reached at catherine.liao@gmail.com. You can also connect with me via LinkedIn or Twitter.

Looking for less "geeky" content? Check out my travel blog 1-Day Itinerary.
