Environment-based DevOps Deployment using Puppet and MCollective

One of the challenges we ran into on my current project was how to treat the deployment of our puppet configuration the same way we treat the deployment of our applications: push it to ‘test’ environments and verify the changes before pushing them to production. We needed a way to validate that changes to the puppet code would produce the expected results in the production environment without actually applying them there.

Our solution was to set up five puppet environments, one for each of the environments into which code gets deployed. We then used a combination of puppet, mcollective and mercurial to promote changes between environments. With appropriate tests in each environment, we were able to validate that our infrastructure changes were ready to be promoted up the ladder.

Technical Setup

We grouped our machines into separate collectives, one per deployment environment. Each collective has a corresponding puppet environment, so when its machines execute a puppet run they pull their infrastructure code from that environment’s codebase. When the infrastructure code has been applied successfully to one environment, our continuous deployment server updates the next environment’s codebase to the same mercurial revision.

Our puppetmaster config (in /etc/puppet/puppet.conf) looks as follows:

manifest = /usr/share/puppet-recipes/$environment/puppet/manifests/site.pp
modulepath = /usr/share/puppet-recipes/$environment/puppet/modules
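
To make the $environment substitution concrete, here is a small Python sketch that mimics how the two settings above expand for a given environment name. It is illustrative only: resolve_paths is a hypothetical helper (not part of puppet), and the lowercase environment name "uat" is an assumption.

```python
from string import Template

# The two settings from puppet.conf above, copied verbatim.
MANIFEST = "/usr/share/puppet-recipes/$environment/puppet/manifests/site.pp"
MODULEPATH = "/usr/share/puppet-recipes/$environment/puppet/modules"


def resolve_paths(environment):
    """Expand $environment in both settings for a single environment name."""
    return {
        "manifest": Template(MANIFEST).substitute(environment=environment),
        "modulepath": Template(MODULEPATH).substitute(environment=environment),
    }


if __name__ == "__main__":
    # "uat" is an assumed environment name used only for illustration.
    for setting, path in resolve_paths("uat").items():
        print(setting, "=", path)
```

Each environment therefore reads from its own checkout under /usr/share/puppet-recipes, which is what allows the checkouts to sit at different mercurial revisions.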

Each puppet run is triggered via an mcollective agent that executes the following command:

/usr/sbin/puppetd --environment=${collective} --onetime --no-daemonize --verbose
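
As a rough illustration of how the ${collective} placeholder gets filled in, the Python sketch below assembles the same argument list for a given collective. It is not the agent's actual code: build_puppet_command, the name check and the optional noop switch are all assumptions made for this example (--noop itself is a standard puppet flag).

```python
import re
import shlex

# Assumption: collective names contain only letters, digits and underscores.
VALID_NAME = re.compile(r"^[A-Za-z0-9_]+$")


def build_puppet_command(collective, noop=False):
    """Return the puppetd argument list for one collective."""
    if not VALID_NAME.match(collective):
        raise ValueError("unexpected collective name: %r" % collective)
    command = [
        "/usr/sbin/puppetd",
        "--environment=%s" % collective,
        "--onetime",
        "--no-daemonize",
        "--verbose",
    ]
    if noop:
        # Report what would change without applying anything.
        command.append("--noop")
    return command


if __name__ == "__main__":
    # "uat" is an assumed collective name used only for illustration.
    print(" ".join(shlex.quote(arg) for arg in build_puppet_command("uat")))
```

Building the command as a list rather than a single string means the collective name never has to be interpreted by a shell.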

Execution Setup

The puppet environments we have configured are:

  1. NOOP
  2. CI
  3. DevTest
  4. UAT
  5. Production
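
The ordering matters because each stage takes its revision from the one before it. A minimal Python sketch of that ladder, assuming lowercase stage names and a hypothetical previous_stage helper:

```python
# Stage order as listed above; the lowercase names are an assumption.
STAGES = ["noop", "ci", "devtest", "uat", "production"]


def previous_stage(stage):
    """Return the stage whose last good revision feeds `stage`.

    The first stage has no predecessor (it takes the latest checkin),
    so None is returned for it.
    """
    index = STAGES.index(stage)
    return STAGES[index - 1] if index > 0 else None


if __name__ == "__main__":
    for stage in STAGES:
        print(stage, "<-", previous_stage(stage))
```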

Each of these environments corresponds to a different stage in our continuous deployment server. The first stage is the most interesting as it has the majority of the tests in place to catch issues with our puppet manifests. The NOOP run does the following:

  1. Pulls the latest checkin into the NOOP puppet environment codebase.
  2. Compiles the catalogs for each of our nodes using the NOOP codebase. This catches the majority of typos, missing dependencies, forgotten template variables and missing files.
  3. Runs a puppet NO-OP run against all nodes. This catches most of the remaining logic errors and dependency cycles that a puppet module change can introduce.
  4. Collects the report produced by the NO-OP run, which shows us what changes the latest codebase would apply to each environment. This is very useful for auditing and tracing purposes.
  5. If the NO-OP run completes without any errors, exposes the mercurial revision of the last checkin via our continuous deployment server.
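
The control flow of the steps above can be sketched in Python as follows. This is a sketch under stated assumptions rather than our actual build scripts: update_codebase, compile_catalog and noop_run are hypothetical hooks supplied by the caller, and the report shape (a dict with an "errors" list) is invented for the example.

```python
def run_noop_stage(revision, nodes, update_codebase, compile_catalog, noop_run):
    """Run the NOOP checks and say whether `revision` may be exposed.

    Hypothetical hooks (assumptions, not real puppet or mercurial APIs):
      update_codebase(environment, revision) -> None
      compile_catalog(node, environment)     -> True if the catalog compiled
      noop_run(node, environment)            -> report dict with an "errors" list
    """
    # Step 1: bring the NOOP codebase to the latest checkin.
    update_codebase("noop", revision)

    # Step 2: compile every node's catalog and stop early on any failure.
    uncompiled = [node for node in nodes if not compile_catalog(node, "noop")]
    if uncompiled:
        return {"revision": None, "failed": uncompiled, "reports": {}}

    # Steps 3 and 4: no-op run on every node, keeping each report for auditing.
    reports = {node: noop_run(node, "noop") for node in nodes}
    failed = [node for node in nodes if reports[node]["errors"]]

    # Step 5: expose the revision only when no node reported an error.
    return {
        "revision": None if failed else revision,
        "failed": failed,
        "reports": reports,
    }
```

Passing the three operations in as callables keeps the sketch independent of how the codebase is updated or how puppet is invoked, and makes the gating logic easy to exercise with fakes.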

The following four stages all do the same thing:

  1. Grab the mercurial revision exposed by the last successful run of the previous stage and update the appropriate environment codebase to that revision.
  2. Trigger a puppet run for all the machines in that collective, then capture and analyse the output to verify that there are no warnings or errors.
  3. If the run completes without any errors, expose the mercurial revision that was just applied via the continuous deployment server.
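
A minimal Python sketch of one such promotion stage, including a simple form of the output check, is shown below. The names run_promotion_stage, update_codebase and trigger_run are hypothetical, and the check assumes that puppet prefixes problem lines with their log level ("err:" or "warning:" in older releases, "Error:" or "Warning:" in newer ones); adjust the pattern to whatever your version actually prints.

```python
import re

# Assumption: problem lines start with their log level followed by a colon.
PROBLEM = re.compile(r"^(err|error|warning):", re.IGNORECASE)


def find_problems(output):
    """Return the warning and error lines found in captured puppet output."""
    stripped = (line.strip() for line in output.splitlines())
    return [line for line in stripped if PROBLEM.match(line)]


def run_promotion_stage(environment, revision, update_codebase, trigger_run):
    """Apply `revision` to one environment and decide whether to expose it.

    Hypothetical hooks (assumptions):
      update_codebase(environment, revision) -> None
      trigger_run(environment)               -> captured puppet output (str)
    """
    update_codebase(environment, revision)   # step 1
    output = trigger_run(environment)        # step 2
    problems = find_problems(output)
    # Step 3: the revision moves on only if the run was clean.
    return (None if problems else revision), problems
```

Returning the offending lines alongside the decision means the stage can show why a revision was held back.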

Because each of our deployment environments is set up similarly to the environments above it, this approach lets us verify that changes applied to a server in one environment will also work in the later environments. In our case, the primary difference between a CI environment and a production environment is that production has more servers (of the same type) and may offload some work to a dedicated server instead of hosting it on the same box as the application (for example, a db server running alongside an application server in CI and DevTest vs. an independent db server in UAT and Production).

This setup isn’t perfect – in particular, running the puppet NOOP stage whilst also running another stage can cause issues as puppet will fail if it detects another puppet run ongoing – but it provides us with a reasonable amount of certainty that the changes we have made are correct and will not break any of the systems in later environments.