Imagine if you discovered your colleagues only work 4 hours a day. You thought everyone was working as hard as you until you started monitoring what they did all day. To your surprise, many of them were idle for hours at a time, just sitting still waiting for someone to give them work. And when they did work, it was in 10-minute bursts separated by more waiting around. I think you would be upset if this were true. You should only get a full day’s pay if you do a full day’s work, right?
Now re-read the previous paragraph and replace “colleagues” with “computer servers.” Would you still be mad at their laziness? You should be. In my experience computer servers are heavily underutilized, often only doing productive work 20% of the time. It really drives me nuts. I expect machines to work day and night for me! So why do we put up with this? We trade efficiency for simplicity.
Most server-based software vendors recommend that you install their software on a dedicated server. It’s not because the application needs all the resources (although it may), but because mixing different systems is too complicated. Upgrading one package can break another, downtime is harder to plan, they both want to use port 80, and on and on. The conclusion most of us came to is “servers are cheap, so give each application its own environment.” Throwing hardware at software problems seems like a good idea until you wake up one day with 100 servers all doing what 10 fully loaded boxes could handle. Think of the money being wasted on servers, rack space, electricity, cooling, and systems management time.
Enter server consolidation through virtualization. Virtualization software moves the sharing of resources outside of the OS environment. This allows each application to have its own server sandbox (solving the software problem) while consuming only part of a physical box. Multi-core servers with a lot of memory can support consolidation rates as high as 20:1, although 10:1 is typical. That means one physical box is now doing the work of 10. Hey, I like that! Even applications that appear moderately busy are usually not consistently consuming resources and will “share well” with other servers. Of course you have to use common sense. If your email server is pegging CPU and disk all day long, you don’t want to make it share a physical box with other applications.
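To make the “share well” idea concrete, here is a rough sketch of how you might guess at a consolidation plan. It packs servers onto physical hosts with a simple first-fit rule based on peak CPU share. The load numbers, the 80% ceiling, and the function name are all made up for illustration. Real capacity planning also has to account for memory, disk, and network.

```python
def pack_first_fit(loads_pct, capacity_pct=80):
    """Place each server (peak CPU %, hypothetical figures) on the first host with room.

    A server that is too busy to share anything still gets a host of its own.
    """
    hosts = []  # each host is a list of the loads placed on it
    for load in sorted(loads_pct, reverse=True):  # biggest consumers first
        for host in hosts:
            if sum(host) + load <= capacity_pct:
                host.append(load)
                break
        else:
            hosts.append([load])  # nothing had room, so start a new host
    return hosts


# Ten servers: one pegged email server and nine mostly idle boxes (assumed values).
loads = [90, 35, 20, 10, 10, 5, 5, 5, 5, 5]
hosts = pack_first_fit(loads)
for number, host in enumerate(hosts, start=1):
    print(f"host {number}: {host} -> {sum(host)}% peak")
```

With these made-up numbers the busy email server ends up alone, and the other nine servers fit on two hosts.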
Consolidation is the first level of virtualization and it is very popular because of the savings it provides. But is this good enough? It depends on what you are doing. In software engineering environments you may find that a lot of the servers exist to do specific tasks during different phases of development and don’t need to be running all the time. Testing, for example, may only happen at scheduled milestones. Builds may only run ten times a day for an hour. The rest of the time these machines are idle. That makes them great candidates for first level consolidation. But what if your testing matrix requires so many different server variants (OS, tool chain, language) that even with consolidation you can’t keep every flavor running? And what if you want to allow multiple developers to build simultaneously? You are right back to manually rotating 50 different environments through 20 available slots.
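A quick sketch shows how fast a testing matrix outgrows the available slots. The OS, toolchain, and language names below are placeholders I picked so the total comes out to 50 variants. They are not our actual matrix.

```python
from itertools import product

# Hypothetical variant lists, chosen only so that 5 x 5 x 2 = 50 environments.
oses = ["linux-a", "linux-b", "windows-a", "windows-b", "solaris"]
toolchains = ["gcc-old", "gcc-new", "msvc-old", "msvc-new", "vendor-cc"]
languages = ["c", "java"]

environments = list(product(oses, toolchains, languages))
slots = 20  # environments that can be running at the same time

# Ceiling division: how many rotations it takes to run every variant once.
rounds = -(-len(environments) // slots)

print(f"{len(environments)} environments, {slots} slots, {rounds} rotations")
```

Every extra OS or compiler multiplies the count, which is why rotating these by hand stops scaling.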
You need the next level of virtualization: automation. First, create a separate farm of physical servers for transient workloads. Next, collect the library of virtual machines that will be run on the farm. Now use automation software to provision, use, and remove virtual machines from the farm. Automation is the key. Work should happen even at 3am when you are sleeping. Combine this with your “always on” farm and you have a virtualization environment that is more flexible and efficient.
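The provision, use, remove cycle can be sketched in a few lines. This is a toy, in-memory model: the Farm class and the transient_vm helper are hypothetical names I invented, and no real virtualization API is being called. The point is only that the cleanup step is guaranteed to run, even when the work fails.

```python
from contextlib import contextmanager
from itertools import count


class Farm:
    """Toy stand-in for a farm of physical servers with a fixed number of VM slots."""

    def __init__(self, slots):
        self.slots = slots
        self.running = set()
        self._ids = count(1)

    def provision(self, template):
        if len(self.running) >= self.slots:
            raise RuntimeError("farm is full")
        vm = f"{template}-{next(self._ids)}"
        self.running.add(vm)
        return vm

    def remove(self, vm):
        self.running.discard(vm)


@contextmanager
def transient_vm(farm, template):
    """Provision a VM from a library template and always remove it afterwards."""
    vm = farm.provision(template)
    try:
        yield vm
    finally:
        farm.remove(vm)  # runs whether the work succeeded or raised


farm = Farm(slots=2)
with transient_vm(farm, "build-linux") as vm:
    print(f"running the 3am build on {vm}")
print(f"VMs left on the farm: {len(farm.running)}")
```

A scheduler would call something like transient_vm for each queued job, so slots are only occupied while work is actually happening.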
We use this approach at Electric Cloud and have had great results. It used to take a week to manually test a new release of our software to make sure it would properly upgrade all of our previous releases, on all platforms, with all supported databases. Configuring the environments was a manual task and the tests themselves were run manually. By automating the test and provisioning process we now run everything in 45 minutes! Each platform combination is provisioned automatically and the tests are run in parallel. When the tests are done a report is created and the environments are torn down. Previously developers could not run the full upgrade test suite on every build. Now they regularly do, and the benefits are phenomenal: besides providing insurance against being flamed for breaking a production build, the number of upgrade issues reported by customers has dropped dramatically.
The automation is done with our own ElectricCommander product, which has integrations with leading virtualization solutions. This made the provision/cleanup steps trivial, allowing us to concentrate on the actual tests we want to run.
The results of an upgrade test are shown above. The basic steps are:
• Provision a virtual server
• Copy files and set up the connection to the database (this example tests remote databases)
• Install the old version and import interesting test data
• Install the new version
• Download all of the install logs to prove the upgrade worked
• Do it again for other version/database/OS combinations (not shown).
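The steps above can be sketched as a small parallel driver. To be clear, this is not ElectricCommander and not our real test code. Every function name, version number, database, and OS name is a placeholder, and each step just records what it would do instead of touching a real machine.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_upgrade_test(old_version, database, os_name):
    """Walk one combination through the upgrade steps (stubbed: nothing is installed)."""
    steps = [
        f"provision virtual server ({os_name})",
        f"copy files and connect to remote {database}",
        f"install {old_version} and import test data",
        "install new version",
        "download install logs",
        "tear down environment",
    ]
    return {"combo": (old_version, database, os_name), "steps": steps}


# Hypothetical matrix: 2 old versions x 2 databases x 2 operating systems.
combos = list(product(["old-1", "old-2"], ["db-a", "db-b"], ["linux", "windows"]))

# Run the combinations side by side, the way each platform gets its own VM.
with ThreadPoolExecutor(max_workers=4) as pool:
    reports = list(pool.map(lambda combo: run_upgrade_test(*combo), combos))

for report in reports:
    print(report["combo"], "->", len(report["steps"]), "steps completed")
```

In a real setup each stubbed step would call out to the provisioning system and the installer, and the final report would come from the downloaded logs.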
Doing this is cheaper, faster, and gives us higher quality. Now that is hardware earning a full day’s pay!