ElectricAccelerator Agent Metrics, part 2

ElectricAccelerator agent metrics provide a tremendous amount of data that you can use to analyze and improve the performance of your builds. Last time we saw how to collect the metrics and we explored the data presented in the Overall time usage section. This time, we’ll look at the data in the Usage records section.

Usage records

As a build is executed by Accelerator, the Electric File System tracks all filesystem and registry accesses made by commands in the build. This usage log is the means by which modifications made on cluster nodes are propagated back to the emake host, and it also forms the basis for our conflict detection algorithm. Because it is such a core piece of the system, we track numerous metrics specifically related to the usage log. Those metrics are reported in the Usage records section of the agent performance metrics.

After your build completes, use cmtool and agentsummary to get the agent metrics, then find the Usage record section. It looks something like this:

As you can see, Accelerator tracks a wide variety of file and registry operations:

Append Adding data to the end of an existing file.
Blind create Creating a new file.
Blind truncate Completely overwriting an existing file.
Create Creating a new file.
Create key Creating a registry key (Windows only).
Create(dir) Creating a directory.
Delete key Removing a registry key (Windows only).
Delete value Removing a registry value (Windows only).
Failed lookup stat() on an non-existent file.
Link Creating a hard link to an existing file.
Lookup stat() on an existing file.
Lookup key Lookup (or attempt to lookup) a registry key.
Modify Rewrite a portion of the contents of an existing file.
Modify atts Change the attributes of a file, such as access permissions.
New name Does not correspond to a specific operation; this is a placeholder used to facilitate reporting usage to emake.
Read Read the contents of a file.
Read value Read (or attempt to read) the contents of a registry value (Windows only).
Rename Rename a file.
Rename(dir) Rename a directory.
Set value Set a registry value (Windows only).
Set value if Set a registry value, with additional constraints on the value (Windows only).
Truncate Overwrite the contents of an existing file.
Unlink Remove a hard link to a file or directory.
Unlink(last) Delete a file or directory.

For each type of operation, the agent reports the following data:

  • Total number of records of that type reported to emake.
  • Portion of the total usage records reported represented by that type.
  • Average number of records of that type per job in the build.
  • Total number of raw records of that type reported by the EFS.
  • Portion of the raw records of that type reported to emake by the agent.

Why are there so many types of operations?

As of Accelerator 4.3.0, the EFS and agent track 25 different types of operations, almost double the variety that we tracked in Accelerator 3.5.0. Many of the different types are fairly subtle variations on one another. For example, the distinction between Create and Blind create is simply the flags used in the system call that created the file. Similarly, Create could be considered a special case of Modify. The question inevitably arises: wouldn’t it be simpler to track fewer types of operations?

The answer of course is yes, it would be simpler, but that simplicity would come at the cost of superior efficiency. For example, although Create is a special case of Modify usage, it has one very significant difference: Modify usage implies that a command used the previous contents of the file before making changes, but Create implies that the command did not use the previous contents. This seemingly trivial distinction is the difference between commands that must be serialized and commands that can run in parallel: two commands that modify the same file must be serialized (think of compile steps that each update a shared .PDB file), while two commands that each overwrite the same file can be run in parallel.

Reported Usage versus Raw Usage

You may have noted that there is sometimes a significant difference between the raw usage reported by the EFS to the agent, and the usage ultimately reported by the agent to emake. The simple explanation is that the agent has some smarts to eliminate redundant usage records, in order to minimize the number of records emake must manage and examine for conflicts.

A trivial example is eliminating redundant Read operations from the usage: the first time a file is read during a given job, the agent logs the usage. The second time the file is read during the same job, the agent discards the usage because it is redundant with the usage already logged for the file.

A more sophisticated example is eliminating Read operations on new files: when a file is created during a job, the agent logs the usage. If the file is later read during the same job, the agent discards the usage — the read cannot impact the dependency analysis that emake will perform, since it is reading data that was generated in the same job, rather than data generated during a different job in the build.

By volume, the most significant application of this technique is in the reduction of lookup operations. It’s astonishing just how many lookup operations occur in the course of an average build, and even more so how many of them are redundant. In the sample shown above, only a tiny fraction of the lookups were actually significant. The rest were discarded by the agent before ever being sent to emake, dramatically reducing the amount of data emake must manage. If anything, the sample shown here is on the low end in terms of the number of lookups generated — most builds generate far more. One customer’s build generated about 200 million lookups during a 50 minute long build.

Some types of usage, such as failed lookups, are not actually directly reported by the EFS. Instead, these types of records are synthesized by the agent to replace longer sequences of operations — again, in order to reduce the amount of data that emake must track.

What’s normal?

The usage profile of most builds is surprisingly similar: the vast majority of usage is lookups (both failed and successful); reads are the next most common, followed by creates. Just about everything else occurs so infrequently that it’s not worth calling out explicitly. To put concrete values on it:

Failed Lookup 50% – 60%
Read 25% – 30%
Lookup 10% – 15%
Create 2% – 10% (NB — usually 1 or 2 per job in the build)
Everything else 2% – 5%

How can I use this data?

There are two ways you can use the usage metrics to improve build performance. First, you can use the usage profile as a simple “gut check” to spot anomalous behavior. For example, a build that logs significantly more Create usage than others may be doing unnecessary additional work, such as building targets repeatedly. A build that logs excessive Read usage may indicate problems like badly factored header files which cause every compile to read every header. It’s hard to predict exactly what the problems might be; the key is to keep your eyes open for anything out of the ordinary.

The second consideration relates specifically to failed lookup usage. As you’ve seen, failed lookups account for the majority of all usage. Earlier I noted that failed lookups correspond to stat()-like operations on non-existent files, but what does that really mean?

The most common source of failed lookups is include path searches. Suppose your build specifies an include search path to the compiler, for example:

Now, imagine your source files contain #include directives like this:

When the compile looks for the file math.h, it will search for it in each directory specified in the include search path. Of course, the file exists in only one directory, so each attempt to find it in the other directories generates failed lookup usage. Now imagine a system in which the include search path contains dozens of directories. Every compile step in the build will search each of those directories for each header file included. You can see how quickly it adds up:

Depending on the build, there may be nothing you can do about this. But if you are able to, you could investigate optimizing the include search path:

  • Eliminate stale entries.
  • Reorder the include path so that more commonly used headers are found earlier.
  • Customize the include path per target. Some builds use a single include path for all compile steps, but if the megawidgets headers (for example) are used only by some targets, adding that directory to the include path for the other targets only bloats the search space.

Unfortunately, it’s hard to predict exactly how much impact changes like these could have on your build, although I can say that my experience has consistently shown that reducing the size of your build in any dimension often results in surprising performance benefits due to second-order effects. I encourage you to go for the “low hanging fruit” — make those changes that are easy to make and measure the impact. If the modifications pay off, find the next lowest hanging fruit and repeat the process. Keep going as long as the cost of making the changes is less than the benefit you see.

Follow me

Eric Melski

Eric Melski was part of the team that founded Electric Cloud and is now Chief Architect. Before Electric Cloud, he was a Software Engineer at Scriptics, Inc. and Interwoven. He holds a BS in Computer Science from the University of Wisconsin. Eric also writes about software development at http://blog.melski.net/.
Follow me

Share this:

2 responses to “ElectricAccelerator Agent Metrics, part 2”

  1. James Gan says:

    Is it possible to modify the block size of EFS? I don’t know the way of the copy from emake client to ea agents using EFS . if using file systems raw copy ,it should be modify according DISK I/O peroformances.

    • Eric Melski says:

      @James: No, it’s not possible the modify the block size of the EFS. What are you trying to accomplish?

Leave a Reply

Your email address will not be published. Required fields are marked *

Subscribe

Subscribe via RSS
Click here to subscribe to the Electric Cloud Blog via RSS

Subscribe to Blog via Email
Enter your email address to subscribe to this blog and receive notifications of new posts by email.