Category: Puppet

Testing puppet manifests part 1 – Local Compilation

Testing puppet manifests

The pipeline approach we use to move our infrastructure changes from one environment to the next gives us some visibility into what will happen in an upstream environment. Still, it would be nice to get quick feedback on potential issues with our puppet codebase before we even apply the changes. We have come up with two mechanisms that provide very fast feedback and some assurance that our changes won’t immediately break the first upstream environment. The first, covered in this blog post, is local node compilation.

Node manifest compilation

In the same way that a developer compiles their code locally prior to checking in, the node manifest compilation step is a verification step that runs through every node defined in our puppet manifests and compiles its catalog. This catches errors such as:

  • Syntax errors
  • Missing resource errors – e.g. a file source is referenced but the file is not checked in
  • Missing variable errors for templates

The code to do this is pretty simple:

  1. Configure Puppet with the manifest file location (nodes.pp) and the module directory path
  2. Use the puppet parser to evaluate the manifest file and find all available nodes for compilation
  3. For each node found, create a Puppet node object and then call compile on it
  4. Collect any compilation failures, fail only at the end of the run, and report every node that failed in the output

require 'rubygems'
require 'puppet'
require 'colored'
require 'rake/clean'

desc "verifies correctness of node syntax"
task :verify_nodes, [:manifest_path, :module_path, :nodename_filter] do |task, args|
  fail "manifest_path must be specified" unless args[:manifest_path]
  fail "module_path must be specified" unless args[:module_path]

  setup_puppet args[:manifest_path], args[:module_path]
  nodes = collect_puppet_nodes args[:nodename_filter]
  failed_nodes = {}
  puts "Found: #{nodes.length} nodes to evaluate".cyan
  nodes.each do |nodename|
    print "Verifying node #{nodename}: ".cyan
    begin
      compile_catalog(nodename)
      puts "[ok]".green
    rescue => error
      puts "[FAILED] - #{error.message}".red
      failed_nodes[nodename] = error.message
    end
  end
  puts "The following nodes failed to compile => #{print_hash failed_nodes}".red unless failed_nodes.empty?
  raise "[Compilation Failure] at least one node failed to compile" unless failed_nodes.empty?
end

def print_hash nodes
  nodes.inject("\n") { |printed_hash, (key,value)| printed_hash << "\t #{key} => #{value} \n" }
end

# Build a node object with a minimal set of stubbed facts (the values a real
# agent would normally supply) and ask puppet to compile its catalog.
def compile_catalog(nodename)
  node = Puppet::Node.new(nodename)
  node.merge('architecture' => 'x86_64',
             'ipaddress' => '127.0.0.1',
             'hostname' => nodename,
             'fqdn' => "#{nodename}.localdomain",
             'operatingsystem' => 'redhat',
             'local_run' => 'true',
             'disable_asserts' => 'true')
  Puppet::Parser::Compiler.compile(node)
end

# Parse the manifests and return every node name they define, optionally
# filtered by a regular expression.
def collect_puppet_nodes(filter = ".*")
  parser = Puppet::Parser::Parser.new("environment")
  nodes = parser.environment.known_resource_types.nodes.keys
  nodes.select { |node| node =~ /#{filter}/ }
end

# Point puppet at the manifest file and module directory under test.
def setup_puppet manifest_path, module_path
  Puppet.settings.handlearg("--config", ".")
  Puppet.settings.handlearg("--manifest", manifest_path)
  Puppet.settings.handlearg("--modulepath", module_path)
  Puppet.parse_config
end

Code available here: https://github.com/oldNoakes/puppetTesting

Note that in our production code we break our nodes up into subsets and fork a separate process to compile each subset. Currently we run 20 parallel processes for over 400 nodes – a full run typically takes about 45 seconds on a fast machine (i.e. our build server) and up to 120 seconds on a slower one (i.e. the worst developer workstation we have).
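
For anyone curious, a rough sketch of that parallelisation might look like the following (the helper name and grouping strategy are illustrative, not the code we actually run; it reuses compile_catalog from above):

def compile_in_parallel(nodes, process_count = 20)
  # Split the node list into roughly equal subsets and compile each subset in
  # its own forked process; an uncaught compilation error makes the child exit
  # non-zero, which the parent turns into a task failure.
  slice_size = [(nodes.length / process_count.to_f).ceil, 1].max
  pids = nodes.each_slice(slice_size).map do |subset|
    fork { subset.each { |nodename| compile_catalog(nodename) } }
  end
  statuses = pids.map { |pid| Process.wait2(pid).last }
  raise "[Compilation Failure] at least one subset failed to compile" unless statuses.all?(&:success?)
end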

Environment based DevOps Deployment using Puppet and Mcollective

One of the challenges we ran into on my current project was how to treat the deployment of our puppet configuration the same way we treat the deployment of our applications – i.e. push it to ‘test’ environments to verify the changes before pushing them to production. We needed a way to validate that changes to the puppet code would produce the expected results when applied to the production environment, without actually pushing them there.

Our solution was to set up five different puppet environments, one for each of the environments into which code gets deployed. We then used a combination of puppet, mcollective and mercurial to promote changes between environments. With appropriate tests in each environment, we were able to validate that the infrastructure changes we had made were ready to be promoted up the ladder.

Technical Setup

We configured our machines into separate collectives that represent the deployment environment in which they live. Each collective has a corresponding puppet environment, so that when a machine runs puppet it pulls its infrastructure code from that environment’s codebase. A successful application of the infrastructure code to the previous environment triggers, via our continuous deployment server, an update of the environment’s codebase to the same mercurial revision.
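
For illustration, collective membership is the kind of thing that lives in each node's mcollective server.cfg; a UAT box might carry something like the following (example values, not our exact configuration):

collectives = mcollective,uat
main_collective = mcollective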

Our puppetmaster config (in /etc/puppet/puppet.conf) looks as follows:

manifest = /usr/share/puppet-recipes/$environment/puppet/manifests/site.pp
modulepath = /usr/share/puppet-recipes/$environment/puppet/modules
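
For example, a node that requests the CI environment would have its catalog compiled from the following paths (assuming the directory names match the environment names):

manifest = /usr/share/puppet-recipes/CI/puppet/manifests/site.pp
modulepath = /usr/share/puppet-recipes/CI/puppet/modules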

The puppet run itself is triggered via an mcollective agent running the following command:

/usr/sbin/puppetd --environment=${collective} --onetime --no-daemonize --verbose

Execution Setup

The puppet environments we have configured are:

  1. NOOP
  2. CI
  3. DevTest
  4. UAT
  5. Production

Each of these environments corresponds to a different stage in our continuous deployment server. The first stage is the most interesting as it has the majority of the tests in place to catch issues with our puppet manifests. The NOOP run does the following:

  1. Pulls the latest checkin into the NOOP puppet environment codebase
  2. Compiles the catalog for each of our nodes using the NOOP codebase – this catches the majority of typos, missing dependencies, forgotten template variables and missing files.
  3. Runs puppet in NO-OP mode against all nodes – this catches most of the remaining logical and cyclical dependency errors that a puppet module change can introduce (a sketch of how the output could be checked follows this list).
  4. The NO-OP run also produces an output report that shows exactly what changes the latest codebase would apply to each environment – this is very useful for auditing and tracing purposes.
  5. If the NO-OP run completes without any errors, the mercurial revision of the last checkin is exposed via our continuous deployment server.
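
To illustrate steps 3 and 4, the output analysis does not need to be elaborate: scanning the agent output for puppet's "err:" and "warning:" prefixes and keeping the "notice:" lines as the audit report is enough. The helper below is a sketch of that idea, not our exact tooling:

def analyse_noop_output(nodename, output)
  # Puppet prefixes problem lines with "err:" or "warning:"; any such line
  # fails the NOOP stage for this node.
  problems = output.lines.select { |line| line =~ /^(err|warning):/ }
  raise "NO-OP run on #{nodename} reported problems:\n#{problems.join}" unless problems.empty?
  # The "notice:" lines describe what the run would change; these become the
  # audit/trace report mentioned in step 4.
  output.lines.select { |line| line =~ /^notice:/ }
end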

The following four stages all do the same thing:

  1. Grab the mercurial revision exposed by the last successful run of the previous stage and update that environment's codebase to that revision (see the sketch after this list)
  2. Trigger a puppet run on all the machines in that collective – capture and analyse the output to verify there are no warnings or errors
  3. If the run completes without any errors, expose the mercurial revision that was just applied via the continuous deployment server
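
To make step 1 concrete: updating an environment's codebase is essentially a mercurial pull and update of that environment's checkout to the exposed revision. The sketch below is illustrative only (the helper name and error handling are assumptions); step 2 is simply the puppetd command shown earlier, triggered through mcollective for the matching collective:

def promote_environment(environment, revision)
  # Each puppet environment has its own mercurial checkout under
  # /usr/share/puppet-recipes (see the puppetmaster config above).
  repo = "/usr/share/puppet-recipes/#{environment}"
  system("hg pull -R #{repo}") or raise "hg pull failed for #{environment}"
  system("hg update -R #{repo} --rev #{revision}") or
    raise "could not update #{environment} to revision #{revision}"
end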

Because each of our deployment environments is set up similarly to the environments above it, this gives us the opportunity to verify that changes applied to a server will also work in upstream environments. The primary difference between the CI environment and the production environment in our case is that the latter has more servers (of the same type) and may offload some work to a dedicated server instead of hosting it on the same box as the application (i.e. a db server running alongside an application server in CI and DevTest vs. an independent db server in UAT and Production).

This setup isn’t perfect – in particular, running the puppet NOOP stage whilst another stage is running can cause issues, as puppet will fail if it detects a puppet run already in progress – but it provides us with a reasonable amount of certainty that the changes we have made are correct and will not break any of the systems in later environments.