More than thirteen years have passed since Herb Sutter declared that
the free lunch is over and concurrency is upon us, and yet
it’s hard to claim that most mainstream languages have made a strong shift
towards concurrent modes of programming. We have to admit that concurrency is
just hard, and the struggles of some of the world’s leading programming
languages bear witness to this challenge.

Unfortunately, most languages haven’t yet moved past the
threads vs. asynchronous dichotomy. You either use threads, or a
single-threaded event loop decorated with a bunch of bells and whistles
to make the code more palatable. Mixing threads with event loops is possible, but so
complicated that few programmers can afford the mental burden it places on their
applications.

Threads aren’t a bad thing in languages that have good library support for them,
and their scalability is much better than it used to be a decade ago,
but for very high levels of concurrency (~100,000 threads and above) they are
still inadequate. On the other hand, event-driven programming models are usually
single-threaded and don’t make good use of the underlying hardware. More offensively,
they significantly complicate the programming model.
I’ve enjoyed Bob Nystrom’s What Color is Your Function
for explaining how annoying the model of “non-blocking only here, please” is.
The core idea is that in the asynchronous model we have to mentally note the
blocking nature of every function, and this affects where we can call it from.

Python took the plunge with asyncio, which is so complicated that many
Python luminaries admit they don’t understand it, and of course
it suffers from the “function color” problem as well, where any blocking call
can ruin your day. C++ seems to be going in a similar direction with the
coroutines proposal for C++20, but C++ has much less ability to hide magic from
users than Python, so I predict it will end up with a glorious soup of templates
that even fewer will be able to understand. The fundamental issue here is that
both Python and C++ try to solve this problem on a library level, when it really
needs a language runtime solution.

What Go does right

As you’ve probably guessed from this article’s title, this brings us to Go. I’m
happy to go on record claiming that Go is the mainstream language that gets
this really right. And it does so by relying on two key principles in its core
design:

  1. Seamless light-weight preemptive concurrency across cores
  2. CSP and sharing by communicating

These two principles are implemented very well in Go, and in unison make
concurrent programming in it the best experience, by far, compared to other
popular programming languages today. The main reason for this is that they are
both implemented in the language runtime, rather than being delegated to
libraries.

You can think of goroutines as threads; it’s a fairly good mental model. They
are truly cheap threads, because the Go runtime implements launching them and
switching between them without deferring to the OS kernel. In a recent post
I measured goroutine switching time to be ~170 ns on my machine, 10x faster
than thread switching time.

But it’s not just the switching time; goroutines also start with small stacks
that can grow at run-time (something thread stacks cannot do), carefully tuned
so that millions of goroutines can run simultaneously.
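
To make this concrete, here’s a minimal sketch of my own (not from any Go
documentation) that launches 100,000 goroutines and waits for them with a
sync.WaitGroup; try doing that with OS threads:

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        const n = 100000
        var wg sync.WaitGroup
        results := make([]int, n)

        for i := 0; i < n; i++ {
            wg.Add(1)
            go func(i int) {
                defer wg.Done()
                results[i] = i * i // each goroutine writes only its own slot
            }(i)
        }

        wg.Wait()
        fmt.Println(results[n-1]) // 9999800001
    }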

There’s no magic here; consider this claim – if threads in C++ or JS or Python
were extremely lightweight and fast, we wouldn’t need async models. Well, this
is the case with Go. As Bob Nystrom says in his post – Go has eliminated the
distinction between synchronous and asynchronous code.
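
To illustrate, here’s a hedged sketch (the function name and URL are mine,
purely for demonstration): fetchStatus performs blocking I/O, and nothing
about it changes whether it’s called directly or launched concurrently with
go — there is no “color” in its signature:

    package main

    import (
        "fmt"
        "net/http"
    )

    // fetchStatus is an ordinary blocking function; nothing in its
    // signature marks it as asynchronous.
    func fetchStatus(url string) string {
        resp, err := http.Get(url) // blocks only this goroutine
        if err != nil {
            return err.Error()
        }
        defer resp.Body.Close()
        return resp.Status
    }

    func main() {
        // Call it synchronously...
        fmt.Println(fetchStatus("https://example.com"))

        // ...or concurrently, with the exact same function: the runtime
        // parks the blocked goroutine and keeps the OS thread busy elsewhere.
        done := make(chan string)
        go func() { done <- fetchStatus("https://example.com") }()
        fmt.Println(<-done)
    }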

That’s not all, however. The second principle is critical too. The main
objections to threads are not just about performance; there are also
correctness issues and complexity. Programming with threads is hard: it’s hard
to synchronize access to data structures without causing deadlocks; it’s hard
to reason about multiple threads accessing the same data; it’s hard to choose
the right locking granularity; and so on.

And this is where Go’s sharing by communicating principle comes in. In
idiomatic Go programs you won’t see a lot of mutexes, condition variables and
critical sections protecting shared data. In fact, you probably won’t see much
locking at all. This is because Go encourages programmers to use channels
instead, and channels are built into the language, with awesome features like
select, and so on. Proper use of channels removes the need for explicit
locking; channel-based code is easier to write correctly, tune for
performance, and debug.
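
As an illustration, here’s a minimal sketch of my own of sharing by
communicating: a single goroutine owns a counter, and other goroutines
increment or read it over channels, with select multiplexing the two, so no
mutex is needed:

    package main

    import "fmt"

    func main() {
        increments := make(chan int)
        reads := make(chan chan int)

        // The owner goroutine is the only code that touches the counter.
        go func() {
            counter := 0
            for {
                select {
                case n := <-increments:
                    counter += n
                case reply := <-reads:
                    reply <- counter
                }
            }
        }()

        done := make(chan struct{})
        for i := 0; i < 10; i++ {
            go func() {
                increments <- 1
                done <- struct{}{}
            }()
        }
        for i := 0; i < 10; i++ {
            <-done // wait for all increments to be delivered
        }

        reply := make(chan int)
        reads <- reply
        fmt.Println(<-reply) // 10
    }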

Moreover, by building these capabilities into the runtime, Go can offer great
tools like the race detector, which makes concurrency
bugs easier to smoke out. It all just fits together so nicely! Obviously many
challenges of concurrent programming remain in Go – these are the essential
complexities of the problem that no language is likely to remove; Go does a
great job at removing the incidental complexities, though.
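
Since the race detector ships with the standard toolchain (enable it with
go run -race, go test -race or go build -race), catching a bug like the
deliberately racy sketch below takes no extra setup (the code is mine, for
illustration only):

    package main

    import "fmt"

    func main() {
        counter := 0
        done := make(chan bool)

        // Two goroutines mutate counter without synchronization: a data
        // race that `go run -race` reports with both stack traces.
        go func() { counter++; done <- true }()
        go func() { counter++; done <- true }()

        <-done
        <-done
        fmt.Println(counter)
    }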

For these reasons, I believe Ryan Dahl – the creator of Node.js – is absolutely
right when he says:

[…] if you’re building a server, I can’t imagine using anything other than
Go. […] I think Node is not the best system to build a massive
server web. I would use Go for that. And honestly, that’s the reason why I
left Node. It was the realization that: oh, actually, this is not the best
server-side system ever.

Different languages are good for different things, which is why programmers
should have several sufficiently different languages in their arsenal.
If concurrency is central to your application, Go is the language to use.