ElectricAccelerator vs. distcc, round 2: MySQL

Previously, I compared the performance of ElectricAccelerator and distcc when building the Linux 2.6 kernel. In what I hope will be the first of several followups, I will repeat the experiment with different software packages, to determine whether that result was a one-off or whether ElectricAccelerator really does consistently outperform distcc. In this round, we’ll be building MySQL, a popular free-for-non-commercial-use database server.

MySQL is not as large as the Linux kernel, but it is still reasonably large by open source standards: about 7,000 total files (including both source and non-source files) in the source distribution. The code is written in both C and C++, a significant difference from the Linux kernel, which as far as I know is exclusively C (and assembly, of course). The serial build time on my test hardware is 20m2.8s (averaged over four runs). For those who haven’t read the previous articles, the test setup consists of nine physical servers configured as follows:

  • Dual Xeon 2.4GHz hyperthreaded CPUs
  • 8 systems with 1.5GB RAM; one system with 2GB RAM
  • Gigabit Ethernet connections on a dedicated switch
  • RedHat Desktop 3, update 8

I used the following software packages in this test:

  • GNU make 3.79.1
  • distcc 3.1
  • ElectricAccelerator 4.3.1.25685
  • MySQL 5.1.31

Process

As previously, I used a driver script to simplify running the tests:

[sourcecode language="bash"]
#!/bin/sh

# GNU make version, e.g. 3.79.1 (recorded for reference; not used below)
gver=`make --version | head -1 | awk '{ print $4 }' | sed -e s/,//`
PATH=/opt/ecloud/i686_Linux/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

package=mysql-5.1.31
unset ARCH
unset OUTTOP
unset LD_LIBRARY_PATH
DISTCC_POTENTIAL_HOSTS='blade1 blade2 blade4 blade5 blade6 blade7 blade10 blade11'
export DISTCC_DIR=`/bin/pwd`/.distcc
export DISTCC_POTENTIAL_HOSTS

mkdir "distcc-3.1-pump"
(
# One build at each level of parallelism, from -j 1 to -j 24.
for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
do
    pfx=../distcc-3.1-pump/distcc$i
    # Start every run from a freshly unpacked and configured source tree.
    rm -rf "$package"
    tar xzf "$package.tar.gz"
    cd "$package"
    ./configure

    rm -f $DISTCC_DIR/lock/*
    # $targets is never set in this script, so make builds its default target.
    (time pump make -j $i CC=distcc CXX="distcc g++" \
        $targets
    ) > "$pfx.out" 2>&1
    cd ..
done

echo DONE
) > "distcc-3.1-pump/dtest.out" 2>&1
[/sourcecode]

MySQL built cleanly out-of-the-box with both distcc and Accelerator: no makefile modifications were required, and distcc’s “pump” mode worked with no additional configuration. As you can see, for this comparison I ran the build at varying levels of parallelism (by altering the -j parameter for the distcc runs, and by altering the --emake-maxagents parameter for the Accelerator runs). This gave me the data needed to show how distcc and Accelerator scale as you add more and more nodes to your cluster. Note that although the driver script only does one run at each level of parallelism, I ran the driver script in its entirety three times, so that I could get a good average time for each build tool at each level of parallelism. Also note that I ran the Accelerator build once to generate a history file which was used for subsequent runs; the time for that initial build is not included in the results.
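Averaging the three passes is simple bookkeeping. Here is a minimal sketch of how it could be done, assuming the timings have already been pulled out of the log files in the "XmY.Zs" form that bash's time prints. The helper names to_seconds and average_times are my own for illustration, and the three sample values in the last line are placeholders, not measurements from these tests:

```bash
#!/bin/sh
# Convert a bash-style elapsed time such as 20m2.8s into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.1f\n", $1 * 60 + $2 }'
}

# Average any number of such times; prints seconds to one decimal place.
average_times() {
    for t in "$@"; do
        to_seconds "$t"
    done | awk '{ sum += $1 } END { if (NR > 0) printf "%.1f\n", sum / NR }'
}

# The serial build time quoted above, in seconds.
to_seconds 20m2.8s

# Placeholder inputs (NOT real results) to show the calling convention.
average_times 4m15.0s 4m17.0s 4m13.0s
```

Working in seconds rather than minutes-and-seconds also makes the speedup arithmetic later on much easier.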

Results

Here are the results:

There are a few things I find interesting about this result. First, of course, is the fact that Accelerator beats distcc, by a significant margin at high levels of parallelism. The best time delivered by distcc is about 4m15s, about 4.7x faster than serial time. The best time delivered by Accelerator is about 1m49s, about 11x faster than serial time. According to ElectricInsight, our build visualization and analysis tool, the best possible time for this build is about 1m20s, based on the dependency graph for the build.
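For anyone who wants to check the arithmetic, the speedup figures are just the serial time divided by the parallel time. A small sketch, using the times quoted above converted to seconds (the speedup function name is my own):

```bash
#!/bin/sh
# Speedup = serial time / parallel time, both in seconds.
speedup() {
    awk -v serial="$1" -v parallel="$2" \
        'BEGIN { printf "%.1f\n", serial / parallel }'
}

serial=1202.8              # 20m2.8s serial build

speedup "$serial" 255      # best distcc time, 4m15s
speedup "$serial" 109      # best Accelerator time, 1m49s
speedup "$serial" 80       # ElectricInsight's best possible time, 1m20s
```

The last line is my own derivation from the numbers above: a 1m20s build would be roughly a 15x speedup, which puts an upper bound on what any tool could achieve on this dependency graph.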

It’s also interesting to compare the performance of gmake -j with and without distcc. On its own, gmake -j provides about a 1.9x improvement, which is about what you would expect given that the build system has two physical CPUs. However, that tells us that of the total 4.7x improvement obtained with gmake -j and distcc, the addition of distcc only accounts for about 2.5x (NB: to get the total improvement from multiple techniques used in conjunction, you can multiply the improvement from each; for example, 2.5x times 1.9x is about 4.7x).
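Because the improvements multiply, you can back out the share attributable to distcc by dividing the combined speedup by the gmake -j speedup. A quick sketch using the rounded figures from the paragraph above (the function name is just for illustration):

```bash
#!/bin/sh
# If total = local * remote, then remote = total / local.
remaining_factor() {
    awk -v total="$1" -v known="$2" \
        'BEGIN { printf "%.1f\n", total / known }'
}

# 4.7x combined, 1.9x from gmake -j alone; prints the distcc share.
remaining_factor 4.7 1.9
```

This prints 2.5, the figure quoted above. Since the inputs are themselves rounded, treat the result as approximate.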

The next interesting thing is that Accelerator scales better on this build. gmake with distcc maxes out roughly when parallelism is about 10 or 11; Accelerator continues to see gains until it hits about 16 agents.

Finally, there is the surprising step pattern in the Accelerator timings: there is a significant improvement each time the number of agents is pushed past a multiple of three. I believe this is an artifact of the way I have the cluster configured, with three agents per dual-CPU host. The agent allocation algorithm in Accelerator tends to prefer to grab agents on the same host before grabbing agents from a different host, so most likely we’re seeing the effect of completely loading up one host followed by the addition of another host to the in-use pool.
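To make that hypothesis concrete, here is a deliberately simplified model, not Accelerator's actual allocation algorithm. It assumes every host is filled completely before the next one is touched, whereas the real allocator only tends to behave that way. The hosts_in_use function is hypothetical:

```bash
#!/bin/sh
# Simplified model: number of hosts needed for a given agent count,
# assuming each host is filled to capacity before the next is used.
hosts_in_use() {
    agents=$1
    per_host=$2
    # Integer ceiling of agents / per_host.
    echo $(( (agents + per_host - 1) / per_host ))
}

# Three agents per host, as in my cluster configuration.
for n in 1 2 3 4 5 6 7; do
    echo "$n agents -> $(hosts_in_use "$n" 3) host(s)"
done
```

Under this model a new host joins the pool at 4, 7, 10 agents and so on, which lines up with the steps just past each multiple of three.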

Conclusion

It looks like Accelerator wins this round. Here’s the updated scorecard:

Package | Best distcc time | Best Accelerator time | Advantage
Linux 2.6.28.1 | 4m25s | 2m38s | Accelerator
MySQL 5.1.31 | 4m15s | 1m49s | Accelerator

Eric Melski

Eric Melski was part of the team that founded Electric Cloud and is now Chief Architect. Before Electric Cloud, he was a Software Engineer at Scriptics, Inc. and Interwoven. He holds a BS in Computer Science from the University of Wisconsin. Eric also writes about software development at http://blog.melski.net/.