Monitoring the Health of Your ElectricFlow Server Using statsd and Grafana

“If you can’t measure it, you can’t improve it.” That is why, since the release of ElectricFlow 5.2, we have supported health monitoring of the ElectricFlow server(s) using statsd and data visualization tools such as Grafana. In this article, I will walk you through the steps to set up statsd and Grafana using containers.

The Tools:

Before we explore how to take advantage of the health monitoring feature, let’s have a look at the tools we integrate with:

statsd and Graphite

statsd was created by Etsy, based on some earlier work at Flickr. It is a very simple (a few hundred lines of code) Node.js daemon that listens on a UDP port, extracts metric data from the messages it receives, and periodically flushes the results to Graphite. UDP is beneficial because it’s fast and won’t error out if nobody is listening. After all, you don’t want to slow down your application in order to measure it.
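To make the message format concrete, here is a minimal Python sketch of how a statsd-style line can be parsed. The `name:value|type` layout is the usual statsd line format; the function name `parse_statsd_line` and the example metric names are hypothetical, and this is not Etsy's implementation (which is written in Node.js).

```python
from typing import NamedTuple


class Metric(NamedTuple):
    name: str
    value: float
    kind: str  # "c" = counter, "g" = gauge, "ms" = timer


def parse_statsd_line(line: str) -> Metric:
    """Parse one statsd line of the form 'name:value|type'."""
    name, _, rest = line.strip().partition(":")
    value, _, kind = rest.partition("|")
    if not name or not value or not kind:
        raise ValueError(f"malformed statsd line: {line!r}")
    # Drop an optional sample-rate suffix such as '|@0.1'.
    kind = kind.split("|", 1)[0]
    return Metric(name, float(value), kind)


if __name__ == "__main__":
    print(parse_statsd_line("flow.ec601.cpu.user:12.5|g"))
```

The daemon keeps these values in memory and only writes aggregates to Graphite at each flush, which is why sending a metric costs the application a single UDP packet.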

Graphite is both a store for numeric time-series data and a graphical frontend that renders this data. It includes 3 components:

  1. Carbon: a daemon (based on Twisted) that listens for time-series data, in our case coming from statsd
  2. Whisper: a simple database to store these metrics
  3. Graphite webapp: Django-based frontend to display on-demand charts.

Graphite allows you to create new metrics on demand, by simply sending new data – so there is no need to ask IT to modify the configuration to incorporate new metrics.
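As an illustration of "creating a metric by simply sending data", the sketch below formats a line in Carbon's plaintext protocol (`path value timestamp`, one metric per line). In the setup described here statsd does this for you at every flush; the metric path, host, and port (2003 is Carbon's usual plaintext port) are assumptions for the example.

```python
import socket
import time
from typing import Optional


def carbon_line(path: str, value: float, timestamp: Optional[int] = None) -> bytes:
    """Format one metric for Carbon's plaintext protocol."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n".encode("ascii")


def send_to_carbon(line: bytes, host: str = "localhost", port: int = 2003) -> None:
    """Send a preformatted line to Carbon over TCP (host/port are assumptions)."""
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(line)


if __name__ == "__main__":
    # Only prints the line; call send_to_carbon() once a Carbon daemon is reachable.
    print(carbon_line("test.demo.value", 42, 1500000000))
```

If the path has never been seen before, Carbon creates the corresponding Whisper file on first write, with no configuration change needed.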

To help with the setup of those two applications, I’ve created a Vagrant environment that you can check out here.

Grafana

Grafana is a rich graphical frontend that allows you to display data from multiple sources like Graphite, InfluxDB, Elasticsearch, and more.

I’m not sure why, but I’ve never been a big fan of Graphite; maybe I was not familiar enough with it to feel comfortable. A few months back, a customer showed me what they had accomplished with Grafana, and I thought it looked very good. Before we added statsd and REST API support, customers had to use our command line tool, ectool, to extract data for reporting. Now, with the new releases of ElectricFlow, I wanted to see if I could integrate ElectricFlow 6.0, statsd and Grafana to reduce the load on the server and the database.

This time I decided to go the Docker route. I know, I’m adding yet another technology, but as Docker and Containers move to the mainstream, I figured it was time for me to get on the bandwagon. With a little bit of research I came across this Docker container and decided to test it out.

Docker

If you want to run Docker on Mac OS X or Windows, you will have to install VirtualBox first, as Docker uses Linux-specific kernel features. The installation will create a small Linux VM to serve as the host for your containers. If you run Linux, you do not need to do that, as Docker will use your host to run the container directly.

After installing VirtualBox, I started the container with

By default, the Linux VM (on Mac and Windows) is assigned the IP 192.168.99.100. If you run on Linux, use your host name or IP address. As it was my only container, those ports were available; you may need to redirect them. Check the Docker documentation for details.

How to integrate ElectricFlow 6.0, statsd and Grafana:

1. ElectricFlow Setup

Modify your <DATA_DIR>/conf/wrapper.conf with the following information:

Setting     Description
HOST        The name or IP of your statsd server. In our case, this is the IP of the VM hosting the containers.
PORT        The statsd UDP port that will receive the data. You may need to use the port on your Docker machine that you forward to 8125 in the statsd container. The nice thing about UDP is that if nobody is listening you won’t get any error, so you can turn it on even before your stats server is ready.
PREFIX      Used to prefix all your data in statsd so it is easier to locate if your statsd server is getting data from multiple services.
HOSTNAME    If you’re running in a cluster, this parameter is useful to separate your data by server. It’s also useful if you want to monitor your DEV and PROD ElectricFlow servers separately.

In order for any modification in this file to take effect, you need to restart your ElectricFlow server daemon.

2. Grafana Configuration

I then pointed my web browser to the IP of my Docker virtual machine (http://192.168.99.100), logged in to Grafana (admin/admin), and added my data source (be sure to keep the proxy setting).

grafana1

I created a new dashboard and a new license graph with 2 metrics.

One of the nice things with Grafana is that it auto-discovers the different data points available, so you don’t have to remember all of them. However, a data point must have been sent at least once before it shows up in that list.

grafana2

In the picture above, you’ll notice the reference to “flow” that reflects the setting in the wrapper.conf as well as ec601, which is my ElectricFlow server name.
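Put differently, the metric paths you browse in Grafana follow a predictable pattern built from those two settings. The helper below is a hypothetical sketch of that pattern, inferred from the paths used in this article rather than taken from ElectricFlow itself.

```python
def metric_path(kind: str, prefix: str, hostname: str, metric: str) -> str:
    """Build a Graphite path such as stats.gauges.<prefix>.<hostname>.<metric>.

    kind is the statsd namespace ("gauges", "counters" or "timers").
    """
    return ".".join(["stats", kind, prefix, hostname, metric])


if __name__ == "__main__":
    # "flow" and "ec601" are the prefix and server name used in this article.
    print(metric_path("gauges", "flow", "ec601", "cpu.user"))
```

Keeping a distinct server name per server is what lets you chart each node of a cluster, or DEV and PROD, side by side.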

The only issue I’ve found with this Docker container is that Graphite fills the disk very quickly, so you may want to update the retention policy or use larger external storage.
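To get a feel for why the disk fills up, here is a rough estimate of the size of one Whisper file for a given retention policy. It assumes the documented Whisper layout (a 16-byte header, 12 bytes of metadata per archive, and 12 bytes per data point); the retention values in the example are hypothetical, not the ones shipped in the container.

```python
from typing import List, Tuple

HEADER_BYTES = 16        # file-level metadata
ARCHIVE_INFO_BYTES = 12  # per-archive metadata
POINT_BYTES = 12         # 4-byte timestamp + 8-byte value


def whisper_file_bytes(archives: List[Tuple[int, int]]) -> int:
    """Estimate the size of one Whisper file.

    archives is a list of (seconds_per_point, retention_seconds) pairs.
    """
    points = sum(retention // step for step, retention in archives)
    return HEADER_BYTES + ARCHIVE_INFO_BYTES * len(archives) + POINT_BYTES * points


if __name__ == "__main__":
    # Hypothetical policy: 10s for 6 hours, 1 minute for 7 days, 10 minutes for 5 years.
    policy = [(10, 6 * 3600), (60, 7 * 86400), (600, 5 * 365 * 86400)]
    print(whisper_file_bytes(policy), "bytes per metric")
```

Whisper allocates each file at its full size when the metric is first written, so the total is roughly this figure multiplied by the number of distinct metric paths.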

3. Monitoring of Your ElectricFlow Server

Now that we have all the pieces in place to monitor our server, here are a few recommendations on what to monitor. This is not an exhaustive list, but it covers some key metrics to keep in mind, which we also monitor internally.

Note: In the paths below, “flow” is the prefix and ec601 is the name of the server.

Performance

  1. stats.gauges.flow.ec601.memory.G1_Old_Gen.usage.committed
    This is an obvious one to ensure your server does not run out of memory.
  2. stats.gauges.flow.ec601.cpu.user
    To verify that you have enough CPU power available to process all the requests.
  3. stats.gauges.flow.ec601.jobs.runnableSteps
    Indicates the number of steps that could be run during the last step scheduler invocation.
  4. stats.gauges.flow.ec601.api.active
    Gives you an idea of the general activity on your server.

Licenses and Users

  1. summarize(stats.counters.flow.ec601.login.count, '10m', 'sum', false)
    Aggregates the number of logins over each 10-minute period.
  2. stats.gauges.flow.ec601.licenses.hosts
    Checks the number of host licenses you are using. You can also query for managed hosts or steps or applications. I also recommend adding some thresholds to make it clear when you’ve reached your limit.
  3. stats.counters.flow.ec601.login.count ; stats.counters.flow.ec601.logouts.count ; diffSeries(#A,#B)
    This makes sure your scripts don’t pile up sessions. It is trickier to do in Grafana: you need to define each query, hide both queries (by clicking on the eye icon), and then create a third query that diffs the #A and #B series:

    grafana3
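For readers less familiar with Graphite functions, the sketch below mimics in plain Python what `summarize(..., '10m', 'sum')` and `diffSeries` compute on the login and logout counters. It is a simplified model for illustration (it ignores missing values and Graphite's own alignment options), and the sample data is made up.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, float]  # (unix timestamp in seconds, value)


def summarize_sum(points: List[Point], interval_s: int) -> List[Point]:
    """Sum values into fixed buckets aligned on multiples of interval_s."""
    buckets: Dict[int, float] = {}
    for ts, value in points:
        start = ts - ts % interval_s
        buckets[start] = buckets.get(start, 0.0) + value
    return sorted(buckets.items())


def diff_series(a: List[Point], b: List[Point]) -> List[Point]:
    """Subtract series b from series a, timestamp by timestamp."""
    b_by_ts = dict(b)
    return [(ts, value - b_by_ts.get(ts, 0.0)) for ts, value in a]


if __name__ == "__main__":
    logins = [(0, 2.0), (300, 1.0), (600, 4.0)]
    logouts = [(0, 1.0), (300, 0.0), (600, 4.0)]
    per_10m_in = summarize_sum(logins, 600)
    per_10m_out = summarize_sum(logouts, 600)
    print(diff_series(per_10m_in, per_10m_out))
```

A difference that keeps growing over time suggests sessions are being opened without ever being closed.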

Database

  1. stats.timers.flow.ec601.timers.Tx.commit.mean
    The mean transaction commit time. You’re looking for a smooth chart on this one.

4. Dashboarding

This is what my final dashboard looks like:

grafana4

As you can see my server is pretty tame :)

Feel free to share your beautiful dashboard, and any additional metrics you monitor, and why.

Happy monitoring!


*Bonus

Now that you have a statsd server running with a graphing frontend, why not take advantage of it to send some run-time data?

For example, let’s imagine you have a testing procedure that runs a bunch of tests and you collect 2 job properties, “nbTests” and “nbErrors”. You can simply send those 2 data points to statsd with the following code:

As you can see, I used the same “domain” as for the server stats (but you don’t have to).

Note the -u option (UDP) and the -w 1 option (one-second timeout), which ensure a bad connection does not hang your process.
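The same idea can be sketched in Python: build one statsd gauge line per job property and send them in a single UDP packet with a short timeout. The function names, the `flow.ec601.tests` prefix, and the host and port values are assumptions for illustration; adapt them to your own setup.

```python
import socket
from typing import Dict


def build_payload(values: Dict[str, int], prefix: str = "flow.ec601.tests") -> str:
    """Build one statsd gauge line per value, newline-separated."""
    return "\n".join(f"{prefix}.{name}:{value}|g" for name, value in values.items())


def send_payload(payload: str, host: str, port: int = 8125, timeout_s: float = 1.0) -> bool:
    """Send the payload over UDP; never block or fail the calling job."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout_s)
            sock.sendto(payload.encode("ascii"), (host, port))
        return True
    except OSError:
        # A monitoring hiccup should not break the test procedure.
        return False


if __name__ == "__main__":
    payload = build_payload({"nbTests": 120, "nbErrors": 3})
    print(payload)
    # 192.168.99.100 is the Docker VM address used earlier in this article.
    send_payload(payload, "192.168.99.100")
```

Swallowing socket errors is a deliberate choice here: the statistics are a nice-to-have, so the job result should not depend on the statsd server being reachable.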

The result is something like:

grafana5

Laurent Rochette

Laurent Rochette is a Professional Services Engineer with Electric Cloud. He trains customers on our products and helps them with deployments and consulting to enable them to use our products effectively. Prior to joining Electric Cloud, Laurent served as an IT Architect at Mentor Graphics. Laurent holds a Master’s degree in Computer Science from the Grenoble Polytechnic Institute (INPG) in France.
