Somebody asked me the other day, “How much does the ElectricAccelerator filesystem cache reduce I/O load on my build host?” This is an interesting question, because in some cases, the impact of Accelerator caching is a big part of the performance benefit. Consider the case of ClearCase dynamic views, which have notoriously bad performance, particularly for stat() operations. By reducing the number of times the build accesses the host filesystem, Accelerator can provide a substantial performance boost. In one extreme case, I saw a build that ran 50 times faster just by using a single Accelerator agent, because the host filesystem was so slow. In this post, I’ll show you how to determine how much Accelerator caching is doing for your build.
To evaluate Accelerator cache efficiency, we need to compare the total file access performed by the build with the portion that actually ends up hitting the host filesystem. The difference between these two values tells us how much of the total is served from the cache; in turn, the ratio of this difference to the total gives us the cache hit rate. For this comparison, we’ll look at the following specific types of filesystem access:
- The number of readdir() operations.
- The number of stat() operations.
- The amount of data read from disk.
- The amount of data written to disk.
Counting total file accesses
We can get the total number of accesses from agent performance metrics. We’ve looked at these once or twice before. Follow the directions here to obtain and aggregate metrics from the agents (make sure to get the latest version of the agentsummary script). Once you have that summary data, the Directory scans “to EFS” metric in the Caching section gives you the total number of readdir() operations performed by the build.
Next, you need the raw count of Lookup records from the Usage records section; this is the total number of stat() operations performed by the build.
Finally, you need the Total MB values for EFS disk reads and EFS disk writes from the Bandwidth section; these give us the total amount of data read and written by the build.
Counting host filesystem accesses
Now we need to find out how much of the I/O activity actually hit the host filesystem, that is, all the accesses that were not serviced by the cache. You’ll find this data in the emake performance metrics, which we’ve looked at a bit previously. To obtain these metrics, you need to enable emake performance logging by adding --emake-debug=g --emake-logfile=emake.dlog to your emake command-line options. When your build completes, you’ll find the metrics in the file emake.dlog. First, we need the DirCache readdirs and DirCache stats values in the Counter values section; these give us the number of readdir() and stat() operations that hit the host filesystem, respectively.
Next, we need the To disk and From disk data from the Bandwidth section; these tell us the amount of data written to and read from the host filesystem.
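If you launch your builds from a script, the logging options can be added there once and forgotten. Here is a minimal Python sketch of that idea. The two emake flags are the ones described in this section; the helper names `emake_command` and `run_build`, the `all` target, and the assumption that `emake` is on your PATH are mine, purely for illustration.

```python
import subprocess


def emake_command(extra_args=(), logfile="emake.dlog"):
    """Build an emake argument list with performance logging enabled.

    extra_args holds whatever targets or options your build normally uses
    (hypothetical here); logfile is where emake writes its metrics.
    """
    return ["emake", "--emake-debug=g", f"--emake-logfile={logfile}", *extra_args]


def run_build(extra_args=()):
    """Run the build; assumes emake is installed and on PATH."""
    return subprocess.run(emake_command(extra_args), check=True)


if __name__ == "__main__":
    # Only print the command here; call run_build() on a host with emake.
    print(" ".join(emake_command(["all"])))
```

When the build finishes, the metrics are in whatever file you passed as `logfile`.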
Computing cache efficiency
Now we can put all the numbers together and compute the cache efficiency:
| Metric | Total accesses | Host FS accesses | Cache hit rate |
|---|---|---|---|
| MB read from disk | 295 | 29 | 90% |
| MB written to disk | 219 | 168 | 23% |
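The hit rates in the table come straight from the definition given earlier: the difference between total and host filesystem accesses, divided by the total. A short Python sketch of that arithmetic, using the numbers from the table (the function name `cache_hit_rate` is just an illustrative choice):

```python
def cache_hit_rate(total, host_fs):
    """Fraction of accesses served from the cache instead of the host filesystem."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (total - host_fs) / total


# (total, host filesystem) values in MB, taken from the table above.
rows = {
    "MB read from disk": (295, 29),
    "MB written to disk": (219, 168),
}

for metric, (total, host_fs) in rows.items():
    print(f"{metric}: {cache_hit_rate(total, host_fs):.0%}")
```

The same function applies to the readdir() and stat() counts; just plug in the agent and emake values for your own build.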
The effect is dramatic, even on the small build I used for this example. This is why some people call emake a “ClearCase accelerator”. Of course, every build will have a slightly different profile, but in general you ought to see similar results.