Programmers automate things—it's what we do. In fact, it's pretty much all we do. Some of us automate payroll and billing, some of us automate the process of flying combat missions, and some of us automate the process of stalking people you knew in high school, but we're all in the automation business.

It's unsurprising, then, that programmers frequently automate the tasks that we ourselves do every day. Some of our automated processes are so good now that we don't even think of them as automated processes: it's easy to lose sight of the fact that compilers and linkers, and even operating systems, all automate things that used to be done by hand. Sometimes the automation is so good, and so pervasive, that the connection between the tool and the task is lost. I learned this the hard way a few days ago when I was staring at two blobs of text and trying to figure out whether they were different. The right answer was to use diff, but instead I scanned through the text with my eyes and my finger to see if I could find any differences. I use diff every day for a number of different tasks, but all of them are fairly specialized, and it didn't occur to me in that moment that diff had been created to automate precisely the task that I was doing by hand.

But there are several areas of programming where automated tasks are still very much in-your-face. For a little over a year now, I've been working on build automation, automated tests, and other infrastructure-type stuff. It's been surprisingly painful, but I think I've managed to pinpoint some of the pain sources. What follows are a few rules that should help avoid automation pains.

Automated Processes Must Be Automated

Well, duh!

But it's surprising how often people get this part wrong. When I started my first job, they had a pretty good set of unit tests that had the unfortunate habit of popping up an error window during one of the tests. The window was an expected part of the test, and there was no way to run the test in Debug mode without popping up this window. (And the culture was resistant to getting rid of a test.)

If you have a test like this, either fix it or get rid of it. You don't even have to get rid of it completely: you can make it something that gets run by hand. Some people might complain that this destroys your automation, but what really destroys your automation is having some non-automated portion sticking up in the middle.

If something's really automated, I should be able to kick it off before lunch and expect it to be done when I get back. If I come back from lunch to find a pop-up window waiting to be clicked, or a line asking me to press “Enter”, then the process isn't automated.

Sometimes, of course, you need some input from the user, like the tag to build from or the test suite to run. In situations like this, there are two things you can do to make it better. One is to clearly indicate when the need for human interaction is over, so that the human can go off and interact with something more interesting. Note that there's no such thing as too clear. Something like “Your work here is done. Go be productive and check back around 2:45.” is just fine. (But only include the time estimate if you're committed to keeping it accurate. I once ran into a 5-minute script that told users to be patient because it “could take up to two minutes”. Great way to create anxiety.)

The second way to help is to obtain all the input you need as close to the beginning of the automated process as possible. By which I mean right at the beginning. Like, immediately. Once I kick off a process that's supposed to be automated, I may keep looking at it for as much as a second before moving on to something else. Similarly, if I see more than a few lines of text scroll by, I assume I'm good to go. If I discover later that I needed to wait around for twenty seconds to provide some input, then I'm annoyed that the script wasted my time, and I'm even more annoyed that it expects me to waste twenty seconds every time I use it.

The ideal is to get all your input from the command line and/or a config file, and to check that data immediately. If I failed to provide all the input that the script needed, then I should instantly see an error message explaining what I did wrong. If you can't take the information on the command line, then at least get it right away after the program starts.
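Here's a minimal sketch of that pattern in Python; the script, the flag names, and the file paths are made up for illustration, not taken from any real project:

    # A sketch of "collect and validate all input up front" (hypothetical names).
    import argparse
    import os
    import sys

    def parse_and_validate():
        parser = argparse.ArgumentParser(description="Run an automated build.")
        parser.add_argument("--tag", required=True, help="VCS tag to build from")
        parser.add_argument("--config", default="build.conf",
                            help="build configuration file")
        args = parser.parse_args()  # exits with a usage message if --tag is missing

        if not os.path.isfile(args.config):
            sys.exit("error: config file %r not found" % args.config)
        return args

    if __name__ == "__main__":
        args = parse_and_validate()
        print("All input collected. Your work here is done; check back later.")
        # ...the long-running build steps go here...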

For example, if the automated process is going to ssh to a remote machine, and you don't want to force me to put my password on the command line (which would cause it to show up both on the screen and in my command history), then ask me right away for my password (using something like getpass) as soon as the process starts—don't wait until it's actually ready to connect to the remote machine.
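Something along these lines, say; the host name and the actual work are placeholders, but getpass itself is a standard Python module:

    # A sketch of "ask for the password immediately, use it much later".
    import getpass

    def long_running_preparation():
        pass  # e.g., compile, package, run tests: minutes of work, no input needed

    def connect_and_deploy(password):
        pass  # e.g., hand the password to an ssh library such as paramiko

    def main():
        # Prompt before doing anything slow, so the user can walk away afterwards.
        password = getpass.getpass("Password for build@example.com: ")
        long_running_preparation()
        connect_and_deploy(password)

    if __name__ == "__main__":
        main()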

Another sticking point could be sudo. By design, sudo makes it difficult to enter your password ahead of time, but /etc/sudoers is sufficiently flexible that you can probably make it possible for your script to perform whatever specific tasks it needs to without prompting for a password.
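For example, a sudoers entry along these lines (the user name and script path are invented for illustration) lets one specific script run as root without a password prompt:

    # Hypothetical /etc/sudoers.d/ entry; edit with visudo.
    # Let the "build" user run one specific script as root, no password required.
    build ALL=(root) NOPASSWD: /usr/local/bin/reset-test-network.sh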

Automated Processes Must Not Require Manual Setup or Teardown

This is almost the same as having an automated process that's not actually automated. If your “completely automated” process comes with a list of two dozen manual steps that have to be done before (or after) you run it, then your process isn't automated.

More than about four steps is completely unacceptable, and the ideal is just two:

  1. Check something out
  2. Run a script

Manual setup is more dangerous than blatant non-automation because it's more subtle. It's easy to think that your process is completely automated when it's not—at least, not quite. Think about it: do any of your “automated” processes require you to first edit a configuration file, copy files into just the right directory, set a few environment variables, or start some server by hand?

All of these things could be automated (or given reasonable defaults), so that you don't have to think about them. That'll make it easier to reproduce results later, and more importantly, it'll make it much easier for the next person to figure out how to get the process to work. One of the great benefits of automation is that it encapsulates knowledge about your process. But it only does that for parts of the process that you actually automate, so be sure to automate all of it.

Manual teardown is, if anything, even more insidious because it's entirely possible to run your script without doing the necessary manual teardown, and assume that everything is fine. It's even possible for other parts of your process to mask the need for a teardown step.

Suppose, for example, that one of your test suites alters the network configuration on the target machine, and never resets it. Also suppose that the normal practice for developers on your project is to start up a new VM to run each test suite against. So far, so good. One of the test suites screws up networking on its target VM, but nobody ever uses a VM for more than one test suite, so nobody notices a problem. This continues until one day when some innocent young developer (probably a new hire) recoils in horror at the thought of running each test suite as a separate manual step and decides to write a script that will run all your test suites in parallel against some fixed set of VMs, reusing VMs as needed.

Now there's a problem. There are always test failures, but they're not always the same tests. After a week of frustrating investigation, our intrepid automator discovers that the failing tests always run on the same VM as the test suite that changes the network configuration, and eventually solves the problem by adding an automated teardown step that resets the network. (Note that another solution would have been to have the tests themselves set up and tear down the VMs, which would have captured the process that the developers had previously been following, but which had escaped automation.)
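The shape of that fix is simple enough. Here's a sketch in Python, where run_suite() and the reset script are stand-ins for whatever your project actually uses:

    # Build the teardown into the automation itself.
    import subprocess

    def run_suite(suite, vm):
        pass  # kick off the given test suite against the given VM

    def run_suite_on_vm(suite, vm):
        try:
            run_suite(suite, vm)
        finally:
            # Always restore the VM's network configuration, even if the suite
            # failed, so the next suite starts from a known state.
            subprocess.run(["ssh", vm, "/usr/local/bin/restore-network-config.sh"],
                           check=True)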

Because they can hide in plain sight, it's important to be vigilant about teardown steps. They're a lot like resource leaks in traditional code. Speaking of which....

Automation Code Must Be Treated Like Code

If you're a good programmer, there are several things that you probably do with your code. You don't copy-and-paste, you factor out duplication, you create loosely coupled modules with simple interfaces, you remove code that's no longer pulling its weight, you collect libraries of utility functions, you write tests, you write documentation (at least useful comments, and maybe even standalone docs), and so on.

But for some reason, otherwise excellent programmers stop doing these things when confronted with automated tests or a build script. Are there actions that you perform in numerous tests? Put them in a library. Did you just make a copy of your build script for a new target? Factor out the common parts.
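As a sketch, assuming a Python test suite (the module and function names here are made up): pull the repeated steps into one helper module and import it everywhere, instead of pasting them into each test file.

    # test_helpers.py -- shared steps used by many tests.
    import shutil
    import tempfile

    def make_scratch_dir():
        """Create a throwaway working directory and return its path."""
        return tempfile.mkdtemp(prefix="mytests-")

    def remove_scratch_dir(path):
        """Clean up a directory created by make_scratch_dir()."""
        shutil.rmtree(path, ignore_errors=True)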

The hard part of factoring out automation code is figuring out where to put it. You need to pull it out into a separate project (like a “build” project), but you also want to be careful that this project isn't too tightly coupled with your other projects. Otherwise you end up making changes in multiple places whenever you need to update your build (or test) process for a particular project.

What else does good code need to do? It should not just be easy to use, but also hard to use the wrong way. Please don't assume that, just because the person using your build script is a programmer, he or she will never flip two command-line arguments by accident. Pretend you're going to ship your build script to the masses, and design the interface to be fool-proof.

When good code fails, it does so noisily. When it succeeds, it does so (relatively) quietly. Automation code too often gets this backwards: a happy, successful script will fill the terminal with output, error messages included. Pare the output down to what's actually useful, and get rid of irrelevant error messages, or at least redirect them to /dev/null. And when the code finishes its task, the last line on standard output should be “Successfully completed ________”!

On the other hand, the nature of some scripting languages makes it easy for errors to be missed. If you're using the Bourne shell, for example, be sure to put set -e at the top, so that failing commands aren't silently ignored.
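A rough Python equivalent is to stop at the first failing step and keep the chatter hidden unless something actually goes wrong; the build steps here are just placeholders:

    # Quiet on success, loud on failure; stops at the first failing step,
    # much as `set -e` does in a shell script.
    import subprocess
    import sys

    STEPS = [
        ["make", "clean"],
        ["make", "all"],
    ]

    def main():
        for step in STEPS:
            # Capture the chatter; only show it if the step actually fails.
            result = subprocess.run(step, capture_output=True, text=True)
            if result.returncode != 0:
                sys.stderr.write(result.stdout + result.stderr)
                sys.exit("FAILED: %s" % " ".join(step))
        print("Successfully completed build")

    if __name__ == "__main__":
        main()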

The hallmark of good code, however, is simplicity. Everybody recognizes this and at least pays lip-service to the idea that simplicity is good. But it's unfortunately easy to convince yourself (and your team) that simplicity isn't that important for your automated processes. I think this stems from a sort of level-confusion: the fact that you're automating the process really does mean that the simplicity of the process itself is less important. No matter how complex the process is, the automation will take care of it.

The important part is protecting our easily-overwhelmed human brains from complexity. The automation will shield you from complexity in your process, but in the long run, nothing is shielding you from the complexity of your code. Sooner or later, you're going to have to debug your automation code, and if the code is complicated, then you're going to be in trouble. So keep your code simple (and keep in mind that the complexity of your automation code is not orthogonal to the complexity of the process it's automating).

And of course, automation code—like all your code—needs to be under version control. Which brings us to...

Automated Processes Must Be Versioned

This doesn't just mean that your automation code should be under version control. What it means is this: if you're currently shipping version 5.0.11 of your software, you need to be able to go back and build or test version 3.6.0. This is true at least as long as you have customers potentially running 3.6.0, and possibly even longer than that (depending on liability issues, regulatory concerns, and whatnot).

If your automated process doesn't touch anything outside of your version control repository, then this is a freebie. You'll need to make sure your branching model gives you the granularity you want as far as building releases and release candidates (or whatever), but there's not a lot else you need to worry about.

Except that there's always stuff outside of version control. If you think there isn't, you probably need to think harder. What third-party packages and libraries are installed on your build machine? What versions of them did you use to build 3.6.0? What operating system version was your test machine running back then? What hardware were the tests running on? Can you still get your hands on a machine like that?

At first glance, it would seem like most of this (except probably hardware concerns) could be taken care of by something like pbuilder or by virtual machines. Alas, the reality is more challenging.

Let me explain what pbuilder is, just in case you've never heard of it before. It's essentially a shell script that sets up a minimal Debian system inside a chroot. The idea is that you can build a Debian package with pbuilder and be confident that your build doesn't accidentally depend on packages that you happen to have installed on your machine. The problem, from a versioning standpoint, is that it pulls down your build-dependencies using apt-get. So while you can be fairly certain that your pbuilder environment doesn't include unnecessary packages, you can't really be sure what package versions you had in your pbuilder environment two years ago. Unless you're willing to take a complete snapshot of a Debian mirror with every release, or put your local Debian mirror under version control.

You can always go a step further and use full-blown virtual machines in your automation, but you run into some of the same problems. Do you know what Red Hat package versions you built 3.6.0 against? What patches were installed on the Windows XP machine that the 20091110 release was tested against, and what were the policies of its Active Directory domain?

These aren't just theoretical concerns. This is stuff that's been causing me headaches for over a year. I inherited a large codebase, but no longer have access to the engineers that developed it. It came with a whole bunch of automated tests (good) that interact with several dozen virtual machines (fine), with essentially no documentation about how those VMs were supposed to be configured (uh-oh). Then, before we'd really gotten this test infrastructure sorted out, a bug in Xen corrupted some of the VM images (bad), and we apparently had no backups (BAD!). Now, I was able to configure some new VMs so that these tests (mostly) pass, but it's not possible for me to be sure that all of the tests are still testing what they're supposed to test, because I don't know how the VMs were originally configured.

One solution would be to have automated snapshots. Your build script, for example, could take a snapshot of the build VM before it starts. But although that will give you a way to replicate your build, it won't provide insight into why the VM might have changed. For that, what you really want is commit logs in version control. But there's just no way to put VM images under version control, is there?

Probably not, but here's how I think you can come close. Start with a few base disks: one for Windows XP SP2, one for Windows Vista, one for Debian Squeeze, and so on. Then put together some automation to recreate your desired configuration starting from the appropriate base disk. For Linux systems, this could be as simple as putting /etc into your version control system, along with, perhaps, a list of extra packages to be installed (along with desired versions). Alas, there doesn't seem to be any Windows equivalent of putting /etc under version control, and judging by this thread, Windows admins don't even know what they're missing[1]. But it shouldn't be too hard to throw together some VB scripts to configure a Windows box from scratch.
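A sketch of that idea, assuming a Debian-ish VM, a packages.txt of “name=version” lines, and an etc/ tree checked out next to it (all of which are assumptions about your setup, not any particular tool):

    # Rebuild a VM's configuration from version-controlled inputs.
    import subprocess

    def configure_vm(vm):
        # Install the exact package versions recorded in version control.
        with open("packages.txt") as f:
            packages = [line.strip() for line in f if line.strip()]
        subprocess.run(["ssh", "root@" + vm, "apt-get", "install", "-y"] + packages,
                       check=True)

        # Push the versioned /etc contents onto the machine.
        subprocess.run(["rsync", "-a", "etc/", "root@" + vm + ":/etc/"], check=True)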

Once you've created this extra automation, take care to actually use it. It does you no good to have the theoretical capability to recreate all of your VMs unless you actually keep that capability current, and the only way to keep it current is to force the VMs to be rebuilt frequently. If something is wrong with your VM construction, then things will fail after a rebuild, and you'll be able to find out what's wrong and fix it.

So, do I think you should re-generate all of your VMs every time you run a build or test script? Hold that thought. I have one last rule about automation.

Automated Processes Should Be Fast

The system I'm currently working on takes about 18 hours to build. Even a partial rebuild can take hours, so if I'm doing a build and something gets screwed up, I can easily find myself waiting for the build to finish at three in the morning.

Parts of the test suite can run for days on end.

At another job, we had a substantial core library whose unit tests took about 45 minutes to compile. On top of that, we had some individual tests that were slow to run, but they paled in comparison to the compile time. At some point, I managed to do some simple things (like using pre-compiled headers) that cut the compile time roughly in half, but that wasn't really enough. It needs to be easy for people to run automated unit tests whenever they make a change in the code. Forty-five minutes is too long, but twenty minutes isn't much better. It probably needed to be under a minute to see real benefits. As it was, running (and writing) unit tests was something that got delayed until we were close to a major milestone. We always wrote them, and they provided some value, but they weren't an actively useful part of the development process the way they could have been if they'd been faster.

There were a few reasons those unit tests took so long to compile. They were testing heavily-templated C++, which is notoriously slow. We also had some tests that we didn't really need (partly for regulatory reasons). The reasons for slow automated processes vary substantially, but there are a few suggestions that should be helpful for any project.

The first and best is also the simplest: When your automated process is finished, display the amount of time it took. Not the current time, and not the start time and/or the end time—let the computer do the subtraction itself and display the actual number of hours/minutes/seconds that the process took. Once you have this number showing up at the end of every script's output, you'll naturally notice when things slow down. This helps for both the short term (“Yikes! Turning that class into a template added five minutes to the build”) and the long term (“Gosh, I'm pretty sure that last year the first digit in my testing time was a two instead of a five.”). But it's the medium term that's most useful. You'll notice the times slowly creeping up and decide to do something about it.
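It's a few lines of code in just about any language; in Python, for instance (the actual work is a placeholder):

    # Print how long the whole run took. time.monotonic() isn't affected by
    # clock adjustments, so the subtraction is always sane.
    import time

    def run_the_whole_process():
        pass  # the real build or test run goes here

    def main():
        start = time.monotonic()
        run_the_whole_process()
        elapsed = int(time.monotonic() - start)
        hours, rest = divmod(elapsed, 3600)
        minutes, seconds = divmod(rest, 60)
        print("Successfully completed in %d:%02d:%02d" % (hours, minutes, seconds))

    if __name__ == "__main__":
        main()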

Don't do work you don't need to. Build systems have always checked dependencies to avoid re-building more than they need to. Extend that to your tests. Ideally, you can roll your tests into the build system, and it'll run only those tests that are actually affected by code that's changed.
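If your build system can't do that for you, even a crude stamp-file check helps: only rerun a suite when something it depends on is newer than the last successful run. A sketch, with made-up paths:

    # Skip the suite unless a source file changed since the stamp was touched.
    import pathlib

    STAMP = pathlib.Path(".last-test-run")
    SOURCES = pathlib.Path("src")

    def run_suite():
        pass  # the real test run goes here

    def needs_rerun():
        if not STAMP.exists():
            return True
        stamp_time = STAMP.stat().st_mtime
        return any(p.stat().st_mtime > stamp_time for p in SOURCES.rglob("*.py"))

    def main():
        if needs_rerun():
            run_suite()
            STAMP.touch()
        else:
            print("Sources unchanged; skipping tests.")

    if __name__ == "__main__":
        main()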

Along the same lines, don't be afraid to remove functionality that you don't need. This helps both speed and simplicity. This sort of opportunity is easiest to spot in tests. You might have, for example, lots of tests that test trivial things (think accessors), but which have a substantial time cost when added together. Or you might be testing a lot of classes or functions that haven't actually been used in one of your products for years. Or your process itself may have expanded unnecessarily over time (e.g., you've got nested pbuilder environments three levels deep, or VMs running inside other VMs).

So am I suggesting that you set up VMs from scratch for every test run, and somehow keep that fast too? Not quite: rebuild everything from scratch overnight or on the weekend. Do it regularly, to ensure that your build is fully automated (i.e., to ensure that your automation properly documents your process). But the rest of the time, be sure to keep your automation fast, so that it actually helps instead of getting in the way.

So those are my five rules of automation: make sure your automated processes are actually automated, don't tolerate manual setup or teardown, treat automation code like code, version your processes so you can reproduce old builds and tests, and make everything fast enough that people will actually use it. Obviously, you can take them or leave them, but keep in mind that they arose from about a year of substantial pain, and have some pity for the poor schmuck who inherits your code.