
It’s funny, people complain about YAML, like “Oh my god, YAMLs… So terrible.” And you have to step back for a second. I’m gonna just talk through my journey to where I get to the concept of infrastructure as data. If you have five Linux machines, the first thing you’re gonna try to do is write some Bash scripts. You’re gonna write some Bash scripts, some for loops, you’re gonna SSH things around…

[00:12:09.18] And if you’ve ever seen people write Bash scripts over time, you go into their home directory and you see files like “Do this stuff 01. Do this stuff 002, but don’t use it anymore, because you should be using the other script.” [laughter] So you have no version control, you have no semantics, no abstractions. You’re just writing Bash scripts.

Fast-forward to configuration management. We get things like CFEngine – big shout-out to Promise Theory – and then we get Puppet, Chef and Ansible, and then they formalize. It’s almost like the Ruby on Rails for shell scripting. So now we have this configuration management era and we all start to say “Infrastructure as code.” The problem is now you have to test it, people can write any code they want, it’s unbounded context, and it’s the same problems we had with software – how do you secure it? You’re gonna have bugs… it’s just all over the place. But it is a better place to be than we were before.

Now let’s get to infrastructure as configuration. Now we’re moving all of the abstractions into the runtime, so the how. How do you create a load balancer, what goes in the load balancer and how do you remove it? That implementation detail – we’re gonna have a lot of discipline and we’re gonna move it into these controllers. So if we’re talking about Kubernetes, these are gonna be the controllers. If you’ve been in cloud, you’ve already done this. We always have control planes that do the heavy lifting. Same is true for networking – we expose ports and protocols, not the control plane.
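To make the controller idea concrete, here is a minimal sketch of a reconcile loop. It is not how Kubernetes or any cloud control plane is implemented; `LoadBalancerSpec`, `FakeCloud` and `reconcile` are hypothetical names, and the "cloud" is an in-memory dictionary. The point is only that the create, update and delete logic lives in the controller, while the caller supplies nothing but the desired state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadBalancerSpec:
    """Desired state, the 'what'. Hypothetical fields for illustration."""
    name: str
    region: str
    backends: tuple


class FakeCloud:
    """Stand-in for a provider API; keeps the actual state in memory."""

    def __init__(self):
        self.load_balancers = {}


def reconcile(desired, cloud):
    """Drive actual state toward desired state and return the actions taken."""
    actions = []
    wanted = {spec.name: spec for spec in desired}

    # Create anything that is missing, update anything that has drifted.
    for name, spec in wanted.items():
        current = cloud.load_balancers.get(name)
        if current is None:
            cloud.load_balancers[name] = spec
            actions.append(f"create {name}")
        elif current != spec:
            cloud.load_balancers[name] = spec
            actions.append(f"update {name}")

    # Delete anything that exists but is no longer wanted.
    for name in list(cloud.load_balancers):
        if name not in wanted:
            del cloud.load_balancers[name]
            actions.append(f"delete {name}")

    return actions


cloud = FakeCloud()
web = LoadBalancerSpec("web", "us-west1", ("svc-a", "svc-b"))
print(reconcile([web], cloud))  # ['create web']
print(reconcile([web], cloud))  # [] because nothing has changed
print(reconcile([], cloud))     # ['delete web']
```

Running the same desired state twice produces no actions the second time, which is the property that lets the "how" stay hidden behind the controller.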

So configuration as data – we get to something very similar. Now what you say is “I want a load balancer in this region, pointed to those services.” No for loops, no if statements, no language concepts. So all you have is a data model. And that data model represents everything that the state machine on the other side can do. Why is this more powerful? Well, when you’re working with data, then you can manipulate the data much more easily than you can manipulate code.
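As a rough illustration of why data is easier to manipulate than code, the sketch below treats the load balancer request as a plain dictionary. The field names (`kind`, `name`, `region`, `services`) and the helpers `retarget` and `validate` are assumptions made up for this example, not any real API. Note that the declaration itself contains no loops or conditionals; those live only in the tooling that reads it.

```python
import copy
import json

# Hypothetical desired-state document; the field names are illustrative only.
load_balancer = {
    "kind": "LoadBalancer",
    "name": "web",
    "region": "us-west1",
    "services": ["checkout", "catalog"],
}


def retarget(doc, region):
    """Return a copy of the document pointed at another region."""
    out = copy.deepcopy(doc)
    out["region"] = region
    out["name"] = f'{doc["name"]}-{region}'
    return out


def validate(doc):
    """Return a list of problems found by inspecting the data alone."""
    problems = []
    for key in ("kind", "name", "region", "services"):
        if key not in doc:
            problems.append(f"missing field: {key}")
    if "services" in doc and not doc["services"]:
        problems.append("services is empty")
    return problems


# Stamp out per-region variants and check them without executing anything.
variants = [retarget(load_balancer, r) for r in ("us-east1", "europe-west1")]
for variant in variants:
    print(json.dumps(variant), validate(variant))
```

Copying, diffing, validating and rewriting the request are all ordinary dictionary operations here; doing the same to an arbitrary script would mean parsing or running it.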

We’ve seen this before, in the Go world – there’s 10,000 Hacker News posts, “Oh, the same thing, but written in Go.” I prefer Go as my favorite language. So every time we do things in a language-specific way, we end up having to rewrite this thing to be compatible with those libraries, and so forth. But when you move to infrastructure as data, we can have these high-level APIs. You can write them in JSON, you can write them in YAML, or if you’re an enterprise, you can go XML if that’s your thing…
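The format independence can be sketched with the standard library alone. The example below writes one hypothetical load balancer document as JSON and as XML and reads it back; the element and attribute names (`loadBalancer`, `service`) are invented for illustration. YAML would work the same way, but it needs a third-party parser, so it is left out here.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical document; the same data model regardless of wire format.
spec = {"name": "web", "region": "us-west1", "services": ["checkout", "catalog"]}


def to_json(doc):
    """Serialize the document as JSON text."""
    return json.dumps(doc, sort_keys=True)


def to_xml(doc):
    """Serialize the document as XML text using made-up element names."""
    root = ET.Element("loadBalancer", name=doc["name"], region=doc["region"])
    for svc in doc["services"]:
        ET.SubElement(root, "service").text = svc
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse the XML form back into the same dictionary shape."""
    root = ET.fromstring(text)
    return {
        "name": root.get("name"),
        "region": root.get("region"),
        "services": [el.text for el in root.findall("service")],
    }


print(to_json(spec))
print(to_xml(spec))

# Both encodings round-trip to the identical data model.
print(json.loads(to_json(spec)) == spec)
print(from_xml(to_xml(spec)) == spec)
```

Because the contract is the data model rather than a library, a client in any language only needs a serializer for one of these formats.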