In this continuation of the ElectricAccelerator vs. distcc battle royale, I’ll compare the performance of these two tools when building samba, a suite of tools that provide file and print services to Windows clients from Unix-like servers. Samba is a particularly interesting package for this comparison because distcc was originally created in order to accelerate samba builds, and until recently, the distcc project was hosted by the samba organization. In some sense, samba is the poster child for distcc acceleration, so it should work quite well with distcc.
For this investigation, there are two things that are really interesting about the samba project. First, although it is written entirely in C, it is a highly CPU-intensive build. This runs counter to my intuition and experience, which is that C compiles tend to be relatively light in terms of CPU usage (at least in comparison with C++ compiles, for example). You’ll see the evidence and impact of this in the results below.
Second, the samba build is structured as a single, non-recursive make invocation. This is markedly different from the other projects I’ve previously built in this series, both of which were built using the traditional recursive make style. The principal advantage of a non-recursive make build is that it enables you to completely, precisely specify all the dependencies across the entire system. In theory, this means that the build should be able to run at very high levels of parallelism, much higher than you might see with a typical recursive make build, due to the manner in which parallelism is implemented in gmake. It also means that in this comparison Accelerator is denied the benefit of one of its key optimizations, which is the ability to coalesce recursive makes into a single logical make. Or, to put it another way, it enables the gmake build to compete with Accelerator on a more even footing.
These tests were performed on a cluster of 12 physical servers, each with dual 2.4GHz Xeon CPUs with hyperthreading enabled. One system was designated the build host (and cluster manager, for Accelerator tests); the remaining hosts were used as worker nodes. The build host had 2 GB RAM, and the workers had 1.5 GB each. All systems were connected by a dedicated gigabit Ethernet switch, and they were all running RedHat Desktop 3, update 8. I used GNU make 3.79.1; distcc 3.1; and ElectricAccelerator 126.96.36.199685. I built samba 3.3.0, the latest release available at the time I did the tests.
Samba built cleanly out-of-the-box with both Accelerator and gmake with distcc in “pump” mode. As before, I used a trivial driver script to simplify the process of running the builds at various levels of parallelism from 1 to 22; that script simply extracted a clean copy of the sources, ran configure and started the build. I won’t bother reproducing the script here since there’s nothing magic in it, but it is available on request if somebody really wants to see it.
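For readers who want a feel for the driver described above, here is a minimal sketch in Python. It is not the script I actually used: the function names (`build_steps`, `timed_build`, `sweep_gmake`) are made up for illustration, timing only the build step is an assumption, and only the plain gmake invocation is shown. The distcc and Accelerator runs would pass a different command line, which this sketch does not try to reproduce.

```python
import subprocess
import time

def build_steps(tarball, src_dir, build_argv):
    """List the (working_dir, argv) pairs for one clean build."""
    return [
        (".", ["rm", "-rf", src_dir]),   # discard the previous source tree
        (".", ["tar", "xzf", tarball]),  # extract a clean copy of the sources
        (src_dir, ["./configure"]),      # configure from scratch
        (src_dir, list(build_argv)),     # run the build tool under test
    ]

def timed_build(tarball, src_dir, build_argv):
    """Run one clean build; return wall-clock seconds for the build step only."""
    steps = build_steps(tarball, src_dir, build_argv)
    for cwd, argv in steps[:-1]:
        subprocess.run(argv, cwd=cwd, check=True)
    cwd, argv = steps[-1]
    start = time.monotonic()
    subprocess.run(argv, cwd=cwd, check=True)
    return time.monotonic() - start

def sweep_gmake(tarball, src_dir, max_jobs=22):
    """Time a plain gmake build at every parallelism level from 1 to max_jobs."""
    results = {}
    for jobs in range(1, max_jobs + 1):
        results[jobs] = timed_build(tarball, src_dir, ["make", f"-j{jobs}"])
    return results
```

Calling `sweep_gmake` with the path to a source tarball and the directory that contains `configure` (both depend on how the package is laid out) would produce one timing per parallelism level, which is the raw data behind graphs like the ones below.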
First, let’s look at the complete results for all three build tools: gmake alone; gmake with distcc; and Accelerator (emake):
Although it’s hard to see the differences between distcc and emake in this view, I wanted to include it in order to make a couple of points about the performance of gmake alone on this build. As I said previously, this is a build that is very CPU-intensive. You can see the evidence of that in the performance data for gmake. Serially, the build runs in 30m46s; with -j2, the build time is reduced to 18m11s, about 1.7x faster. But after that point, trying to increase the parallelism actually makes the build run slower! This is because at -j2 we’ve already completely saturated the CPUs on the build host. If the build were less CPU-intensive, we would have seen some (possibly very small) improvement at higher levels, at least until we reached the point of saturation.
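The 1.7x figure is just the ratio of the two gmake times quoted above. A quick check in Python, using only those two measurements (the helper name `to_seconds` and the efficiency calculation are additions for illustration):

```python
def to_seconds(minutes, seconds):
    """Convert a build time given as minutes and seconds to seconds."""
    return minutes * 60 + seconds

serial = to_seconds(30, 46)    # gmake, serial build
parallel = to_seconds(18, 11)  # gmake with -j2

speedup = serial / parallel    # how many times faster -j2 is
efficiency = speedup / 2       # fraction of ideal 2x scaling achieved

print(f"speedup: {speedup:.2f}x, parallel efficiency: {efficiency:.0%}")
```

The ratio works out to roughly 1.69, which rounds to the 1.7x quoted above, so the second job slot is already well short of doubling throughput.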
Now let’s focus on the distcc and emake results starting with 5 workers, so that we can really get a good look at the difference at high levels of parallelism:
With this graph we can see some really interesting things. First, distcc beats Accelerator at low levels of parallelism, but Accelerator steadily gains on distcc with each additional CPU brought into the build. With about 11 CPUs in use, performance is tied. Past that, Accelerator takes and holds the lead. These results were a bit of a head scratcher for me. I was pleased to see that Accelerator won out in the end, but I was surprised that distcc demonstrated that early advantage, and that it maintained it for so long. After a few days of puzzling over this, I decided this must be the evidence of two separate factors playing together.
First, Accelerator seems to have slightly higher overhead per job run in the build. This explains why distcc beats Accelerator on this build at low levels of parallelism. But the overhead is pretty small, maybe 10 at most, and in the end, the performance is dominated by the second factor: Accelerator scales better than distcc, so it is able to bring more CPUs to bear on the task. The fact that Accelerator can run more jobs in parallel more than makes up for the small additional overhead.
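To see how those two factors can produce a crossover, here is a toy model in Python. Everything in it is an assumption made for illustration: the Amdahl-style formula, the names, and especially the parameter values, which are invented and not derived from the measurements in this post. It only shows that a tool with a little extra per-job overhead but a smaller non-parallel fraction loses at low worker counts and wins at high ones.

```python
def build_time(work, overhead, serial_fraction, workers):
    """Amdahl-style estimate of build time.

    Per-job overhead inflates the total work; only the non-serial
    part of that work is divided among the workers.
    """
    total = work * (1.0 + overhead)
    return total * (serial_fraction + (1.0 - serial_fraction) / workers)

# Hypothetical parameters, NOT measured from either tool.
LOW_OVERHEAD = dict(overhead=0.00, serial_fraction=0.03)    # cheaper jobs, scales worse
BETTER_SCALING = dict(overhead=0.10, serial_fraction=0.02)  # costlier jobs, scales better

def crossover(work, max_workers):
    """Return the first worker count where the better-scaling tool wins, if any."""
    for n in range(1, max_workers + 1):
        a = build_time(work, workers=n, **LOW_OVERHEAD)
        b = build_time(work, workers=n, **BETTER_SCALING)
        if b < a:
            return n
    return None

if __name__ == "__main__":
    for n in (5, 10, 15, 22):
        a = build_time(100.0, workers=n, **LOW_OVERHEAD)
        b = build_time(100.0, workers=n, **BETTER_SCALING)
        print(f"{n:2d} workers: low-overhead {a:6.2f}  better-scaling {b:6.2f}")
    print("crossover at", crossover(100.0, 22), "workers")
```

With different made-up parameters the crossover moves or disappears entirely, so the model says nothing about where the real tie point should be; it only illustrates the shape of the argument.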
Although distcc had an early lead in this round, in the end Accelerator is the clear winner. Here’s the updated scorecard:
|Package||Best distcc time||Best Accelerator time||Advantage|
Still, it’s clear that there’s room for improvement in spite of this victory. That’s good though — as they say, people seldom improve when they have no other model but themselves to copy after.
UPDATE: I finally got time to examine this build more closely. The results of my investigation are here.