Vendoring Dependencies

All software has to deal with dependencies. It is a fact of life. No program can execute without some supporting tools. These tools, more often than not, are made available by some sort of dependency management system.

There are many paths to including dependencies in your software. A popular approach, especially in interpreted languages like Python or Ruby, is to resolve dependencies when installing the program. In Python, pip install looks at the dependencies a program or library declares, then downloads and installs them into the environment.

The best argument for resolving dependencies during install (or runtime) is that you can automatically apply things like security fixes to dependent libraries across all applications. For example, if there is a libssl issue, you can install the new libssl, restart your processes and the new processes should automatically be using the newly installed library.

The biggest problem with this pattern is that it’s easy to have version conflicts. For example, if one application declares it needs version 1.2 of a dependency and some other library requires 1.1, the dependencies are in conflict. These conflicts are resolved by establishing some sort of sandboxed environment where each application can use the necessary dependencies in isolation. Unfortunately, by creating a sandboxed environment, you often lose the ability for a package to inherit system-wide libraries!

The better solution is to vendor dependencies with your program. By packaging the necessary libraries you eliminate the potential for conflicts. The negative is that you also eliminate the potential for automatically inheriting some library fixes, but in reality, this relationship is not black and white.

To make this more concrete, let’s look at an example. Say we have a package called “supersecret” that can sign and decrypt messages. It uses libcrypto for doing the complicated work in C. Our “supersecret” package installs a command line utility ss that uses the click library. The current version of click is 4.x, but we wrote this when 3.x was released. Let’s also assume that we use some feature that breaks our code if we’re using 4.x.

We’ll install it with pip

$ pip install supersecret

When this gets installed, it will use the system level shared libcrypto library. But, we’ve vendored our click dependency.
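As a rough sketch of what that vendoring might look like (the module layout and names here are hypothetical, not the actual supersecret package), the pinned copy of click lives inside the package and is imported from there, while libcrypto still comes from the system:

# supersecret/cli.py (hypothetical layout)
# A pinned copy of click 3.x is bundled at supersecret/_vendor/click/.
from supersecret._vendor import click

# libcrypto, by contrast, is loaded from the system, so security updates
# to the shared library are picked up automatically.
import ctypes.util
libcrypto = ctypes.CDLL(ctypes.util.find_library("crypto"))


@click.command()
@click.argument("message")
def ss(message):
    """Sign a message (illustration only)."""
    click.echo("signing: %s" % message)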

The benefit here is that we’ve eliminated the opportunity to conflict with other packages, while still inheriting beneficial updates from the lower level system.

The argument against this pattern is that keeping these dependencies up to date can be difficult. I’d argue that this is incorrect when you consider automated testing via a continuous integration server. For example, if we simply have click in our dependency list via setup.py or requirements.txt, we can assume our test suite will be run from scratch, downloading the latest version and revealing broken dependencies. While this requires tests that cover your library usage, that is a good practice regardless.
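For instance, a minimal setup.py (purely illustrative) might leave click unpinned so that a fresh CI run installs the latest release and a breaking 4.x shows up as failing tests rather than a surprise in production:

# setup.py (hypothetical)
from setuptools import find_packages, setup

setup(
    name="supersecret",
    version="0.1.0",
    packages=find_packages(),
    # unpinned on purpose: CI installs the latest click on every fresh run
    install_requires=["click"],
)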

To see a good example of how this vendoring pattern works in practice, take a look at the Go <http://golang.org/> language. Go has explicitly made the decision to push dependency resolution to build time. The result is that Go binaries can be copied to a machine and run without any other requirements.

One thing that would make vendoring even safer is a standard means of providing information about what libraries are vendored. For example, if you do use libssl, having a way to communicate that the dependency is vendored would allow an operator to recognize which applications may need to be updated when certain issues arise. With that in mind, as we’ve seen above, languages such as Python or Ruby make it trivial to utilize the system level dependencies for the critical components that typically come up when discussions arise regarding rotted code due to vendoring.

Vendoring is far from a panacea, but it does put the onus on the software author to take responsibility for dependencies. It also promotes working software over purity from the user’s perspective. Even if you are releasing services where you are the operator, managing your dependencies when you are working on the code will greatly simplify the rest of the build/release process.

Small Functions without an IDE

I’ve been reading Clean Code for a book club at work. So far, it is really a great book, as it provides attributes and techniques for understanding what clean code really looks like. My only complaint, which is probably the wrong word, is that the suggestions assume a more verbose language such as Java that you use with an IDE.

As I said, this really isn’t a complaint so much as an observation: the author mentions how things like renaming and extracting very small functions are painless thanks to the functionality of the IDE. In dynamically typed languages like Python, the same level of introspection doesn’t generally exist in the available tooling.

As an aside, function calls in Python can be expensive, so extracting a single function into many, much smaller functions does have the potential to slow things down. I saw this impact on a data export tool that needed to perform a suite of operations on each row. It had started as one huge function and I refactored it into a class with a ton of smaller methods. Unfortunately, this did slow things down somewhat, but considering the domain and the expense of maintaining the single, huge function, the slowdown was worth it.

Performance aside, I’d argue that it is definitely better to keep functions small and use more of them when writing any code. The question then is how do you manage the code base when you can’t reliably jump to function references automatically?

I certainly don’t have all the answers, but here are some things I’ve noticed that seem to help.

Use a Single File

While your editor might not support refactoring tools, it most certainly has the ability to search. When you need to pop around to different functions, keep the number of files to a minimum so you can easily use search to your advantage.

Use a Flat Namespace

Using a flat namespace goes hand in hand with keeping more functions / methods in a single file. Avoid nesting your modules to make it faster to find files within the code. One thing to note is that the goal here is not to keep a single folder with hundreds of files. The goal is to limit the scope of each folder / module to the code it will be using.

You can think of this in the same terms as refactoring your classes. If a file has functionality that seems out of place in the module, move it somewhere else. One benefit of using a dynamic language like Python is you don’t have the same one class per file requirements you see in something like Java.

Consistent Naming

Consistent naming is pretty obvious, but it is even more important in a dynamic language. Make absolutely sure you use the same names for the same objects / classes throughout your code base. If you are diligent in how you name your variables, search and replace can be extremely effective in refactoring.

Write Tests

Another obvious one here, but make sure you write tests. Smaller functions mean more functions. More functions should mean more tests. Fortunately, writing the tests is much, much easier.

class TestFoo(object):

    def test_foo_default(self): ...
    def test_foo_bar(self): ...
    def test_foo_bar_and_baz(self): ...
    def test_foo_bar_and_baz_and_kw(self): ...

If you’ve ever written a class like this, adding more functions should make your tests easier to understand as well. A common pattern is to write a test for each path a function can take. You often end up with a class that has tons of oddly named test functions with different variables mocked in order to test the different code paths in isolation. When a function gets refactored into many small functions (or methods) you see something more like this:

class TestFoo(object):

    def setup(self):
        self.foo = Foo()

    def test_foo(self):
        self.foo.bar = Mock()

        self.foo()

        assert self.foo.bar.called

    def test_bar(self):
        self.foo.baz = Mock()

        self.foo.bar()

        self.foo.baz.assert_called_with('/path/to/cfg')

In the above example, you can easily mock the functions that should be called and assert that the interface the function requires is being met. Since the functions are small, your tests end up being easy to isolate and you can test the typically very small bit of functionality that needs to happen in that function.

Use Good Tools

When I first started to program and found out about find ... | xargs grep, my life was forever changed. Hopefully your editor supports some integration of search tools. For example, I use Emacs along with projectile, which supports searching with ag. When I use these tools alongside the massive amount of functionality my editor provides, it is a very powerful environment. If you write code in a dynamic language, it is extremely important to take some time to master the tools available.

Conclusions

I’m sure there are other best practices that help to manage well factored code in dynamic languages. I’ve heard from some programmers who feel refactoring code into very small functions is “overkill” in a language like Python, but I’d argue these people are wrong. The cost associated with navigating the code base can be reduced a great deal using good tools and some simple best practices. The benefits of a clean, well tested code base far outweigh the cost of a developer reading the code.

Operators

Someone mentioned to me a little while back a disinterest in going to PyCon because it felt directed towards operators more than programmers. Basically, there have been more talks about integrations using Python than discussions regarding language features, libraries or development techniques. I think this trend is natural because Python has proven itself as a mainstream language that has solved many common programming problems. Therefore, when people talk about it, it is a matter of how Python was used rather than how to apply some programming technique using the language.

With that in mind, it got me thinking about “Operators” and what that means.

Where I work there are two types of operators. The first is the somewhat traditional system administrator. This role is focused on knowledge about the particular system being administered. There is still a good deal of automation work that happens at this level, but it is typically focused on administering a particular suite of applications. For example, managing apache httpd or bind9 via config files and rolling out updates using the specific package manager. There is typically more nuance to this sort of role than can be expressed in a paragraph, so needless to say, these are domain experts that understand the common and extreme corner cases for the applications and systems they administer.

The second type of operator is closer to the “ops” in DevOps. These operators are responsible for building the systems that run application software. These folks are responsible for designing the systems and infrastructure to run the custom applications. While more traditional sysadmins use configuration management, these operators master it. Ops must have a huge breadth of knowledge that spans everything. File systems, networking, databases, services, *nix, shell, version control and everything in between are all topics that Ops are familiar with.

As software developers, we think about abstract designs, while Ops makes the abstract concrete.

After working with Ops for a while, I have a huge amount of respect for the complexity that must be managed. There is no way to simply import cloud and cloud.start(). The tools available to Ops for encapsulating concepts are rudimentary by necessity. The configuration management tools are still very new and the terminology hasn’t coalesced into design patterns because everyone’s starting point is different. Ops is where Linux distros, databases, load balancers, firewalls, user management and apps come together to actually form working products.

It is this complexity that makes DevOps such an interesting place for software development. Amidst the myriad of programs and systems, there needs to be established concepts that can be reused as best practices, and eventually, as programs. Just as C revolutionized programming by allowing a way to build for different architectures, DevOps is creating the language, frameworks, and concepts to deploy large scale systems.

The current state of the art is using configuration management / orchestration tools to configure a system. While in many ways this is very high level, I’d argue that it is closer to assembly in the grand scheme of things. There is still room to encapsulate these tools and provide higher level abstractions that simplify and make safe the process of working with systems.

Docker and Chef

Chef is considered a “configuration management” tool, but it really is an environment automation tool. Chef makes an effort to perform operations on your system according to a series of recipes. In theory, these recipes provide a declarative means of:

  1. Defining the process of performing some operations
  2. Defining the different paths to complete an operation
  3. Defining the completed state of the system when the recipe has finished

An obvious, configuration specific example would be a chef recipe to add a new httpd config file in /etc/httpd/sites.enabled.d/ or somewhere similar. You can use tactics similar to those in make to check whether you have a newer file and decide how to apply the change.
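As an illustration of that make-style check (the paths here are hypothetical, and a real recipe would also notify httpd to reload), the logic boils down to something like:

import os
import shutil


def sync_config(src, dest):
    """Copy src to dest only if dest is missing or older than src."""
    if not os.path.exists(dest) or os.path.getmtime(src) > os.path.getmtime(dest):
        shutil.copy2(src, dest)
        return True   # changed; the caller would trigger an httpd reload
    return False      # already up to date, nothing to do


# sync_config('site-foo.conf', '/etc/httpd/sites.enabled.d/site-foo.conf')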

Defining the operations that need to happen, along with handling valid error cases, is non-trivial. When you add to that also defining what the final state should look like between processes running, file changes or even database updates, you have a ton of work to do with an incredible amount of room for error.

Docker, while it is not a configuration management tool, allows you to bundle your build with your configuration, thus separating some of the responsibility. This doesn’t preclude using chef as much as it limits it to configuring the system in which you will run the containers.

Putting this into more concrete terms, what we want is a cascading system that allows each level to encapsulate its responsibilities. By doing so, a declaration that some requirement has been met can allow the lower layer to report back a simple true/false.

In a nutshell, use chef to configure the host that will run your processes. Use docker containers to run your process with the production configuration bundled in the container. By doing so, you take advantage of Chef and its community cookbooks while making configuration of your application encapsulated in your build step and the resulting container.

While this should work, there are still questions to consider. Chef can dynamically find configuration values when converging, while a docker container’s filesystem is read only. While I don’t have a clear answer for this, my gut says it shouldn’t be that difficult to sort out in a reliable pattern. For example, chef could check out some tagged configuration from a git repo that gets mounted at /etc/$appname when running the container. Another option would be to use etcd to update the filesystem mounted in a container. In either case, the application uses the filesystem normally, while chef provides the dynamism when converging.

Another concern is that in order to use docker containers, it is important you have access to a docker registry. Fortunately, this is a relatively simple process. One downside is that there is not an OpenStack Swift backed v2 registry. The other option is to use docker hub and pay for private repositories. The images should be private because they include the production configuration.

It seems clear that a declarative system is valuable when configuring a host. Unfortunately, the reality is that the resources that are typically “declared” with Chef are too complex to maintain a completely declarative pattern. Using docker, a container can be tested reliably enough that a running container is sufficient to consider its dependency met in the declared state.
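For example (a sketch only, with a hypothetical container name), the “is the declared state met?” question can be reduced to asking whether the container is running:

import json
import subprocess


def container_running(name):
    """Return True if the named docker container is currently running."""
    try:
        out = subprocess.check_output(['docker', 'inspect', name])
    except subprocess.CalledProcessError:
        return False  # no such container
    info = json.loads(out.decode('utf-8'))
    return bool(info) and info[0]['State']['Running']


# A recipe (or any wrapper) can treat the boolean as the declaration:
# if container_running('app-foo') is True, the dependency is met.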

Emacs and Strings

If you’ve ever programmed any elisp (emacs lisp) you might have been frustrated and surprised by the lack of string handling functions. In Python, it is trivial to do things like:

print('Hello World!'.lower().split())

The lack of string functions in elisp has been improved greatly by s.el, but why haven’t these sorts of functions existed in Emacs in the first place? Obviously, I don’t know the answer, but I do have a theory.

Elisp is (obviously) a LISP and LISPs are functional! One tenet of functional languages is the use of immutable data. While many would argue immutability is not something elisp is known for, when acting on a buffer, it is effectively immutable. So, rather than load some string into memory, mutate it and use it somewhere, my hunch is early emacs authors saw things differently. Instead, they considered the buffer the place to act on strings. When you call an elisp function it acts like a monad or a transaction where the underlying text is effectively locked. Rather than loading it into some data structure, you instead are given access to the editor primitives to literally “edit” the text as necessary. When the function exits, the buffer is then returned to the UI and user in its new state.

The benefits here are:

  1. You use the same actions the user uses to manipulate text
  2. You re-use the same memory and content the editor is using

While it may feel confusing coming from other languages, if you think of all the tools available for editing text in Emacs, one could argue that string manipulation is not necessary.

Of course, my theory could be totally wrong, so who knows. Fortunately, there is s.el to help bridge the gap between editing buffers and manipulating text.

Announcing: Withenv

I wrote a tool to help sanely manage environment variables. Environment variables (env vars) are a great way to pass data to programs because they work practically everywhere with no setup. They are a lowest common denominator that almost all systems support, all the way from dev to production.

The problem with env vars is that they can be sticky. You are in a shell (zsh, bash, fish, etc...) and you set an environment variable. It exists and is available to every command from then on. If an env var contains an important secret such as a cloud account key, you could silently delete production nodes by mistake. Someone else could use your computer and do the same thing, with or without malicious intent.

Another difficulty with env vars is that they are a global key value store. Writing small shell scripts to export environment variables can be error prone. Copying and pasting or commenting out env vars in order to configure a script is easy to screw up. The fact these env vars are long lasting only makes it more difficult to automate reliably.

Withenv tries to improve this situation by providing some helpful features:

  • Set up the environment for each command without it leaking into your shell
  • Organization of your environment via YAML files
  • Cascading of your environment files in order to override specific values
  • Debugging the environment variables

Here is how it works.

Let’s say we have a script that starts up some servers. It uses some environment variables to choose how many servers to spin up, what cloud account to use and what role to configure them with (via Chef or Ansible or Salt, etc.). The script isn’t important, so we’ll just assume make create does all the work.

Let’s organize our environment YAML files. We’ll create an envs folder that we can use to populate our environment. It will have some directories to help build up an environment.

envs
├─ env
│  ├─ dev
│  └─ prod
└─ roles
   ├─ app-foo
   └─ app-bar

Now we’ll add some YAML files. For example, let’s create a YAML file in envs/env/dev that connects to a development account.

# envs/env/dev/rax_creds.yml
---
- RACKSPACE:
  - USERNAME: eric
  - API_KEY: 02aloksjdfp;aoidjf;aosdijf

You’ll notice that we used a nested data structure as well as lists. Using lists ensures we get an explicit ordering. We could have used a normal dictionary as well if the order doesn’t matter. The nesting ensures that each child entry will use the correct prefix. For example, the YAML above is equivalent to the following bash script.

export RACKSPACE_USERNAME=eric
export RACKSPACE_API_KEY='02aloksjdfp;aoidjf;aosdijf'

Now, let’s create another file for defining some object storage info.

# envs/env/dev/cloud_storage.yml
---
- STORAGE_BUCKET: devstore
- STORAGE_PREFIX: $STORAGE_BUCKET/dev

You’ll notice that the STORAGE_PREFIX uses the value of the STORAGE_BUCKET. You can do normal dollar prefixed replacements like you would in a shell. This includes any variables currently defined in your environment, such as $HOME or $USER, that are typically set. Also, by using a list (as defined by the -), we ensure that we apply the variables in order and the STORAGE_BUCKET exists for use within the STORAGE_PREFIX value.

With our environment YAML in place, we can now use the we command withenv provides in order to set up the environment before calling a command.

$ we -e envs/common.yml -d envs/env/dev -d envs/roles/app-foo make create

The -e flag lets you point to a specific YAML file, while the -d flag points to a directory of YAML files. The ordering of the flags is important because the last entry will take precedence. In the command above, we might have configured common.yml with a personal dev account along with our defaults. The envs/env/dev/ folder contains a rax_creds.yml file that overrides the default cloud account with the shared development account, leaving the other defaults alone.
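To make the cascading and substitution behavior concrete, here is a rough sketch of the idea in Python (an illustration only, not withenv’s actual implementation): later files override earlier ones, and $VAR references are expanded against the values collected so far plus the current environment.

import os
from string import Template

import yaml  # assumes PyYAML is installed


def flatten(data, prefix=""):
    """Yield (KEY, value) pairs from nested dicts/lists, prefixing children."""
    pairs = data.items() if isinstance(data, dict) else (
        pair for entry in data for pair in entry.items())
    for key, value in pairs:
        name = "%s_%s" % (prefix, key) if prefix else key
        if isinstance(value, (dict, list)):
            for child in flatten(value, name):
                yield child
        else:
            yield name, str(value)


def build_env(paths):
    """Apply YAML files in order; later files override earlier values."""
    env = dict(os.environ)
    for path in paths:
        with open(path) as fh:
            for key, value in flatten(yaml.safe_load(fh)):
                env[key] = Template(value).safe_substitute(env)
    return env


# subprocess.call(['make', 'create'], env=build_env([...])) would then run
# the command with the composed environment, without touching your shell.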

The one limitation is that you cannot use the output from commands as a value to an env var. For example, the following wouldn’t work to set a directory path.

CONFIG_PATH: `pwd`/etc/foo/

This might be fixed in the future, but at the moment it is not supported.

If you don’t pass any argument to the we command it will output the environment as a bash script, using export to set variables.

Withenv is available on pypi. Please let me know if you give it a try.

Log Buffering

Have you ever had code that needed to do some logging, but your logging configuration hadn’t been loaded yet? While it is a best practice to set up logging as early as possible, logging is still code that needs to be executed. The Python runtime will still do some setup (i.e. import everything) that MUST come before ANY code is executed, including your logging code.

One solution would be to jump through some hoops to make that code evaluated more lazily. For example, say you wanted to apply a decorator from some other package if it is installed. The first time the function is called, you could apply the decorator. This would get pretty complex pretty quickly.

class LazyDecorator(object):
    def __init__(self, entry_point):
        self.entry_point = entry_point
        self.func = None

    def find_decorator(self):
        # find our decorator...
        pass

    def __call__(self, f):
        self.original_func = f

        def lazy_wrapper(*args, **kw):
            if not self.func:
                # apply the real decorator to the original function the
                # first time through
                self.func = self.find_decorator()(self.original_func)
            return self.func(*args, **kw)
        return lazy_wrapper

I haven’t tried the code above, but it does rub me the wrong way. The reason is that we’re jumping through hoops just to do some logging. Function calls are expensive in Python, which means if you decorated a ton of functions, the result could end up as a lot of overhead for a feature that only affects start up.

Instead, we can just buffer the log output until after we’ve loaded our logging config.

import functools
import logging


class LazyLogger(object):

    LVLS = dict(
        debug=logging.DEBUG,
        info=logging.INFO,
        warning=logging.WARNING,
        error=logging.ERROR,
        critical=logging.CRITICAL,
        exception=logging.ERROR,
    )

    def __init__(self):
        self.messages = []

    def replay(self, logger=None):
        logger = logger or logging.getLogger(__name__)
        for level, msg, args, kw in self.messages:
            logger.log(level, msg, *args, **kw)

    __call__ = replay

    def capture(self, lvl, msg, *args, **kw):
        self.messages.append((lvl, msg, args, kw))

    def __getattr__(self, name):
        if name in self.LVLS:
            return functools.partial(self.capture, self.LVLS[name])
        raise AttributeError(name)

We can use this as our logging object in code that needs to log before logging has been configured. Then, when it is appropriate, we can replay our log by importing the logger and calling the replay method. We could even keep a registry of lazy loggers and call them all after configuring logging.
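As a quick usage sketch (the module and config file names are hypothetical):

# mypkg/config.py - a module that must log at import time
log = LazyLogger()
log.debug('looking for config in %s', '/etc/myapp')

# main.py - the entry point, once real logging is configured
import logging.config

logging.config.fileConfig('logging.ini')

from mypkg.config import log
log.replay()  # emits the buffered records through the real handlers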

The benefit of this tactic is that you avoid adding runtime complexity, while supporting the same logging patterns at startup / import time.

DevOps System Calls

One thing I’ve found when looking at DevOps is the adherence to specific tools. For example, if an organization uses chef, then it is expected that chef be responsible for all tasks. It is understandable to reuse knowledge gained in a system, but at the same time, all systems have pros and cons.

More importantly, each tool adheres to its own philosophies for how a system should be defined. Some are declarative while others are imperative, and almost all systems define their own (clever at times) verbiage for what the different elements of a system should be.

What the DevOps ecosystem really needs is a low level suite of common primitives we can build off of. A set of DevOps System Calls, if you will, that we can use to build higher order systems. The reason is to gain some guarantees we can start to assume will just work.

For example, in Python, when I write tests, I assume the standard library functions such as open or the socket module work as expected. You don’t see tests such as:

def test_open():
    with open('test_file.txt', 'w') as fh:
        fh.write('foo')

    assert open('test_file.txt').read() == 'foo'

We have similar expectations regarding much of the TCP/IP stack. We assume the bits are read correctly on the network hardware and passed to the OS, eventually landing in our program correctly. We take it for granted that the HTTP request becomes something like request.headers['Content-Type'] in our language of choice.

These assumptions let us consider our program in higher level terms that are portable across languages and systems. Every programmer understands what it means to open a file, connect to a database or make an HTTP request within our programs because our level of abstraction is reasonably high.

DevOps could use a similar standard and the implementation doesn’t matter. A machine might be created with Ansible, but configured via Chef. That part doesn’t matter. What matters is we can write simple code that manages our operations.

For example, let’s say I want to spin up a machine to run an app and a DB. Here is some pseudo code that might get the job done.

machine = cloud.create(flavor=provider.FLAVOR_COMPUTE)
machine.bootstrap()
app = packages.find('my-app')
machine.deploy(app)

This would compile to a suite of commands that trigger some DevOps tools to do the work necessary to build the machines. The configuration of what provider, available flavors, and repository locations would all live in OS level config, like you see for your OS networking, auth and everything else in /etc.

The key is that we can assume the calls will work or throw an error. The process is encapsulated in such a way that we don’t need to think about the provider, setting API keys in an environment, bootstrapping the node for our configuration management and every other tiny detail that needs to be performed and validated in order to consider the “recipe” or “playbook” as done.

Obviously, this is not trivial. But, if we consider where our tools excel and begin the process of encapsulating the tools behind some higher order concepts, we can begin to create a glossary and shared expectations. The result is a true Cloud OS.

Playing with Repose

At work we use a proxy called repose in front of most services in order to make common tasks such as auth, rate limiting, etc. consistent. In Python, this type of functionality might also be accomplished via WSGI middleware, but by using a separate proxy, you get two benefits.

  1. The service can be written in any language that understands HTTP.
  2. The service gets to avoid many orthogonal concerns.

While the reasoning for repose makes a lot of sense, for someone not familiar with Java, it can be a little daunting to play with. Fortunately, the repose folks have provided some packages to make playing with repose pretty easy.

We’ll start with a docker container to run repose. The repose docs have an example we can use as a template. But first let’s make a directory to play in.

$ mkdir repose-playground
$ cd repose-playground

Now lets create our Dockerfile:

FROM ubuntu

RUN apt-get update && apt-get install -y wget

RUN wget -O - http://repo.openrepose.org/debian/pubkey.gpg | apt-key add - && echo "deb http://repo.openrepose.org/debian stable main" > /etc/apt/sources.list.d/openrepose.list

RUN apt-get update && apt-get install -y \
  repose-valve \
  repose-filter-bundle \
  repose-extensions-filter-bundle

CMD ["java", "-jar", "/usr/share/repose/repose-valve.jar"]

The next step will be to start up our container and grab the default config files. This makes it much easier to experiment since we have decent defaults.

$ docker build -t repose-playground .
$ mkdir etc
$ docker run -it -v `pwd`/etc:/code repose-playground cp -r /etc/repose /code

Now that we have our config in ./etc/repose, we can try something out. Let’s change our default endpoint to point to a different website.

<?xml version="1.0" encoding="UTF-8"?>

<!-- To configure Repose see: http://wiki.openrepose.org/display/REPOSE/Configuration -->
<system-model xmlns="http://docs.openrepose.org/repose/system-model/v2.0">
    <repose-cluster id="repose">
        <nodes>
            <node id="repose_node1" hostname="localhost" http-port="8080"/>
        </nodes>
        <filters></filters>
        <services></services>
        <destinations>
            <!-- redirect to ionrock.org! -->
            <endpoint id="open_repose" protocol="http"
                      hostname="ionrock.org"
                      root-path="/" port="80"
                      default="true"/>
        </destinations>
    </repose-cluster>
</system-model>

Now we’ll run repose from our container, using our local config instead of the config in the container.

$ docker run -it -v `pwd`/etc/repose:/etc/repose -p 8080:8080 repose-playground

If you’re using boot2docker, you can use boot2docker ip to find the IP of your VM.

$ export REPOSE_HOST=`boot2docker ip`
$ curl "http://$REPOSE_HOST:8080"

You should see the homepage HTML from ionrock.org!

Once you have repose running, you can leave it up and change the config as needed. Repose will periodically pick up any changes without restarting.

I’ve gone ahead and automated the steps in this repose-playground repo. While it can be tricky to get started with repose, especially if you’re not familiar with Java, it is worth taking a look at repose for implementing orthogonal requirements that would otherwise make the essential application code more complex. This is especially true if you’re using a micro services model where the less code the better. Just run repose on the same node, proxying requests to your service, which only listens on localhost, and you’re good to go.

Docker vs. Provisioning

Lately, I’ve been playing around with Docker as I’ve moved back to OS X for development. At the same time, I’ve been getting acquainted with Chef in a reasonably complex production environment. As both systems have a decent level of overlap, it might be helpful to compare and contrast the different methodologies of these two deployment tactics.

What does Docker actually do?

Docker wraps up the container functionality built into the Linux kernel. Basically, it lets a process use the machine’s hardware in a very specific manner, using a predefined filesystem. When you use docker, it feels like starting up a tiny VM to run a process. But what really happens is that the container’s filesystem is used, along with the hardware provided by the kernel, to run the process in an isolated environment.

When you use Docker, you typically start from an “image”. The image is just an initial filesystem you’ll be starting from. From there, you might install some packages and add some files in order to run some process. When you are ready to run the process, you use docker run and it will get the filesystem ready and run the process using the computer’s hardware.

Where this differs from a VM is that you only start one process. While you might create a container that has Postgres, RabbitMQ and your own app installed, when you run docker run myimage myapp, no other processes are running. The container only provides the filesystem. It is up to the caller how the underlying hardware is accessed and utilized. This includes everything from the disk to the network.

What does a Provisioner do?

A provisioner, like Chef, configures a machine into a certain state. Like Docker, this means getting the file system sorted out, including installing packages, adding configuration, adding users, etc. A provisioner also can start processes on the machine as part of the provisioning process.

A provisioner usually starts from a known image. In this case, I’m using “image” in the more common VM context, where it is a snapshot of the OS. With that in mind, a provisioner doesn’t require a specific image, but rather the set of resources necessary to consider the provisioned machine as complete. For example, there is no reason you couldn’t use a provisioner to create user directories across a variety of Unix systems, including OS X and the BSDs.

Different Deployment Strategies

The key difference when using Docker or a provisioner is the strategy used for deployment. How do you take your infrastructure and configure it to run your applications consistently?

Docker takes the approach of deploying containers. The concept of a container is that it is self-contained. The OS doesn’t matter, assuming it supports docker. Your deployment then involves getting the container image and running the processes supported by the container.

From a development perspective, the deliverable artifact of the build process would be a container (or set of containers) to run your different processes. From there, you would configure your infrastructure accordingly, configuring the resources the processes can use at run time.

A provisioner takes a more generalized route. The provisioner configures the machine, therefore, it can use any number of deliverables to get your processes running. You can create system packages, programming language environments or even containers to get your system up and running.

The key difference from the devops perspective (the intersection of development and sysops) is that development within the constraints of the system must be coordinated with the provisioner. In other words, developers can’t simply choose some platform or application. All dependencies must be integrated into the provisioning system. A docker container, on the other hand, can be based on any image and use any resource available within the image’s filesystem.

What do you want to do?

The question of whether to use Docker or a provisioning system is not an either-or proposition. If you choose to use Docker containers as your deployment artifact, the actual machines may still need to be configured. There are options that avoid the need to use a provisioning system, but generally, you may still use something like Chef to maintain and provision the servers that will be running your containers.

One vector to make a decision on what strategy to use is the level of consistency across your infrastructure. If you are fine with developers creating containers that may use different operating systems and tooling, docker is an excellent choice. If you have hard requirements as to how your OS should be configured, using a provisioning system might be better suited for you.

Another thing to consider is development resources. It can be a blessing and a curse to provision a development system, no matter what system you use. Your team might be more than happy to take on managing containers efficiently, while other teams would be better off leaving most system decisions to the provisioning system. The ecosystem surrounding each platform is another consideration.

Conclusions

I don’t imagine that docker (and containers generally) will completely supplant provisioning services. But, I do believe the model does aid in producing more consistent deployment artifacts over time. Testing a container locally is a reasonably reliable means of ensuring it should run in production. That said, containers require that many resources (network, disk, etc.) be configured in order to work correctly. This is a non-trivial step, and making it work in development, especially when you consider devs using tools like boot2docker, can be a difficult process. It can be much easier to simply spin up a Vagrant VM with the necessary processes and be done with it. Fortunately, there are tools like docker compose and docker machine that seem to be addressing this shortcoming.