Taking away control: The dumb programmer and the smart compiler

April 29, 2021

Good compilers prevent you from shooting yourself in the foot. Today, you have a wide assortment of programming languages to choose from, and you can take your pick of paradigm. First, a brief detour: the perspective of programmers comes full circle as they move from beginner to slightly experienced to intermediate.

The beginner is deathly afraid of the differences between programming languages. In the beginning, they stick to one language out of the fear of learning something new.

Once they become somewhat experienced, they start proclaiming that all languages are really the same, with the only differences being in the syntax. I don’t know why this idea is so prevalent, since it ignores the fact that some programming languages are better suited to certain ideas than others. It also encourages closed-mindedness: you believe that you already know all you need to know to do something, and that everything else is just an unnecessary optimization. Check out Steve Yegge’s blog post “Being the averagest” for more exposition on this phenomenon.

The intermediate programmers, which is where my hierarchy of programmers cuts off, come around to realizing that each programming language is really a different set of ideas. Each programming language is more conducive to certain kinds of thinking.

How do programming languages expose you to new ideas? Ironically, a programming language opens you up to certain kinds of thinking by imposing constraints.

Each language imposes constraints that mold your thinking in certain ways. This goes beyond syntax: the language could be more hands-on with the bookkeeping the compiler does. Take Rust, for example. The language doesn’t allow you to have two mutable references to the same thing at the same time (ignoring the interior mutability pattern for now). The compiler enforces this constraint, and to satisfy it, you have to architect your project in certain ways. See? The language influenced the way you think about your project. Just syntax, eh?
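To make the Rust constraint concrete, here is a minimal sketch. The names `bump` and `total_after_two_bumps` are made up for illustration. The commented-out line is the kind of code the compiler refuses, and the inner scope is one way of restructuring things so that only one mutable reference is alive at a time.

```rust
/// Adds `amount` to a counter through an exclusive (mutable) reference.
/// `bump` is a hypothetical helper, not part of any library.
fn bump(counter: &mut i32, amount: i32) {
    *counter += amount;
}

fn total_after_two_bumps() -> i32 {
    let mut counter = 0;

    {
        let first = &mut counter;
        // let second = &mut counter; // rejected with error E0499: `counter`
        //                            // is already mutably borrowed by `first`
        bump(first, 1);
    } // `first` ends here, so a new exclusive borrow is allowed

    let second = &mut counter;
    bump(second, 2);

    counter
}

fn main() {
    println!("{}", total_after_two_bumps());
}
```

Uncommenting the rejected line turns this into a compile error rather than a bug you discover at run time, which is the whole point of the constraint.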

In Python, you have a choice: stick to the style you know and love and never venture into functional pastures, or actually go ahead and learn a new way of thinking. Many are content with the first option. With a programming language like Haskell, there is no choice. It is much more opinionated, and a few projects in it would leave you with a greater understanding of functional ideas, which you may even apply when you’re doing projects in other languages.

The uniting notion behind the constraints in the above languages was to shape the way we write code. Assembly, with its gloriously unconstrained nature, lets you do pretty much whatever you want. You have to respect conventions when doing syscalls, but other than that? You can use one function calling convention in some places (passing arguments in registers) and another (just the stack) in other places. Of course, this is obvious: all programming languages eventually become machine code, which itself “proves” that assembly can support all paradigms of programming.

Deconstructing a programming language

To be more concrete, the main components of a programming language that shape the way you code are:

The compiler - This is a pillar of any programming language, and the properties of the compiler greatly inform the way you write code in that language. The compiler is the enforcer of constraints, and the really good ones make full use of the constraints imposed to help out the programmer. Consider Rust, for instance. The memory access constraint imposed by Rust is implemented by the compiler’s borrow checker, and to satisfy it, you may need to rearchitect your application. Then there is Haskell, whose type system is so rich that people often intentionally leave a hole in their code so that the compiler tells them the right type to use. This is an excellent case of the compiler making good use of the constraints imposed by the language to extract more information. Haskell’s compiler is also really good at type inference, so a lot of the time you do not have to worry about explicit typing.
The runtime - Depending on the programming language, the runtime may be very bare (like C’s) or more involved. The runtime in Go, for instance, manages threads for you. Rust keeps its runtime minimal, but some checks still happen while the program runs: out-of-bounds memory accesses, or multiple mutable references in case you’re using the interior mutability pattern. You may even have a garbage collector as part of the runtime. Whether your language allows reflection also depends on the runtime.
The standard tooling - Other than the compiler, the other tools provided by the language become an important aspect of shaping your experience. Go has an official formatting tool, so you have uniformly formatted Go codebases. It also has a race detector. Another important tool may be the package management story.
The standard library - Gone are the days when standard libraries were supposed to be lean and mean. No more fumbling about with different thread libraries on different operating systems. Today, the de facto expectation is that some sort of networking, threading, and file I/O primitives are part of the standard library.
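The run-time checks mentioned under the runtime item can be seen in a few lines of Rust. This is only a sketch using the standard library’s `RefCell` and slice `get`. The function names are made up for illustration.

```rust
use std::cell::RefCell;

/// Returns true if a second mutable borrow is refused while the first is alive.
fn second_mutable_borrow_is_refused() -> bool {
    let shared = RefCell::new(vec![1, 2, 3]);
    let first = shared.borrow_mut();
    // The borrow rule is checked while the program runs. `try_borrow_mut`
    // reports the conflict as an Err, where `borrow_mut` would panic.
    let refused = shared.try_borrow_mut().is_err();
    drop(first);
    refused
}

/// Bounds-checked read: `get` returns None rather than reading past the end.
fn read_at(values: &[i32], index: usize) -> Option<i32> {
    values.get(index).copied()
}

fn main() {
    println!("{}", second_mutable_borrow_is_refused());
    println!("{:?}", read_at(&[10, 20, 30], 1));
    println!("{:?}", read_at(&[10, 20, 30], 7));
}
```

Plain indexing (`values[7]`) performs the same bounds check but panics instead of returning `None`.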

The move towards more automatically managed resources

In the typical textbook hierarchy, assembly is a low-level language and C is a high-level language. But so is Python, and so is Haskell. The gulf between the lowest-level high-level language and the highest-level high-level language is massive enough to warrant more fine-grained categorization.

C is a step up from assembly. With C, you never have to worry about running out of variables. The compiler manages registers and the stack for you, instead of whatever half-assed Frankenstein register allocation pattern you came up with. Heck, the compiler goes ahead and does optimizations for you. Could the best assembly programmer, who lives and breathes assembly and x86 calling conventions, outperform the compiler? Who knows, maybe. Even if they do, what the C compiler does is bring something within spitting distance of that performance to the masses, and instantly. Humans are error-prone, and assembly, with no constraints as such, makes it extremely easy to reach an error density that rivals the mass density of neutron stars.

Where does C falter? Dynamic memory. You manage your own memory. What’s the next step up? Either the language runtime manages memory for you (garbage collection, as in Java), or the compiler imposes constraints that shepherd you back to the “right” way, as in Rust.
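Here is a small sketch of the Rust flavor of that shepherding. `Buffer` is a hypothetical type that records its own cleanup in a shared log, so you can see that memory is released when a value goes out of scope, without any explicit free call.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// A hypothetical resource that logs its own cleanup.
struct Buffer {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Buffer {
    // Called automatically when a Buffer goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("freed {}", self.name));
    }
}

fn cleanup_order() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _outer = Buffer { name: "outer", log: Rc::clone(&log) };
        {
            let _inner = Buffer { name: "inner", log: Rc::clone(&log) };
        } // `_inner` goes out of scope: its cleanup runs here
        log.borrow_mut().push("inner scope done".to_string());
    } // `_outer` is cleaned up here
    let result = log.borrow().clone();
    result
}

fn main() {
    for line in cleanup_order() {
        println!("{}", line);
    }
}
```

The compiler decides where each cleanup happens from the scopes alone, so forgetting to free a value or freeing it twice is not something you can write by accident in safe code.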

It is a similar case for threading. Like variables, programmers just love threads. They want to parallelize every problem they see. But of course, threads are expensive. So the nifty programmers came up with the idea of a thread pool, where a limited set of threads actually does the work. Many programmers don’t use thread pools or aren’t aware of them. This is where the power of the programming language gods to shape our code steps in. In Go, the runtime manages the threads. You can create as many goroutines (Go’s lightweight threads) as you want, and the runtime manages their mapping onto OS threads. So in effect, every Go programmer has fast threads, because it is built into the language! The same goes for concurrency primitives.
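For a feel of what is being taken off your hands, here is a hand-rolled thread pool, sketched in Rust with only the standard library. It is a toy under stated assumptions (the names `Job`, `run_on_pool`, and `sum_of_squares` are made up, and `workers` is assumed to be at least 1), not how the Go runtime schedules work internally.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Runs every job on a fixed set of `workers` OS threads and waits for all of them.
/// Assumes `workers >= 1`; with zero workers the jobs would never run.
fn run_on_pool(workers: usize, jobs: Vec<Job>) {
    let (sender, receiver) = mpsc::channel::<Job>();
    let receiver = Arc::new(Mutex::new(receiver));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let receiver = Arc::clone(&receiver);
            thread::spawn(move || loop {
                // The lock is held only while taking a job off the queue.
                let next = receiver.lock().unwrap().recv();
                match next {
                    Ok(job) => job(),
                    Err(_) => break, // channel closed: no more work
                }
            })
        })
        .collect();

    for job in jobs {
        sender.send(job).unwrap();
    }
    drop(sender); // closing the channel lets idle workers exit

    for handle in handles {
        handle.join().unwrap();
    }
}

/// Sums the squares of 1..=n by queueing one small job per number.
fn sum_of_squares(n: u64, workers: usize) -> u64 {
    let total = Arc::new(Mutex::new(0u64));
    let jobs: Vec<Job> = (1..=n)
        .map(|i| {
            let total = Arc::clone(&total);
            Box::new(move || {
                *total.lock().unwrap() += i * i;
            }) as Job
        })
        .collect();
    run_on_pool(workers, jobs);
    let result = *total.lock().unwrap();
    result
}

fn main() {
    println!("{}", sum_of_squares(10, 4));
}
```

That is roughly forty lines of queueing and locking for something a Go programmer gets by writing `go` in front of a function call.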

For I/O, NodeJS is a good example. Asynchronous I/O isn’t exactly a new idea, so why all this hoopla about asynchronicity in NodeJS? Because the language and its runtime remove the friction from doing asynchronous I/O. Most of the standard library and the packages on npm are oriented around asynchronous I/O, and the runtime takes care of the event loop that drives it. So for a new programmer, the easiest thing to do is to use these primitives. Other languages also support this paradigm, but NodeJS makes the choice for you.

Let’s dive into the deep end of constraints. What does an extremely limited programming language look like? eBPF sounds like a good candidate. eBPF is one of the key kernel developments in recent years. It intends to be a way to get quite a bit of the power of a kernel module (monitoring kernel data structures, network packets, observing programs, etc.) while being completely safe. eBPF takes this idea of shaping the way you write code to the extreme. You can call only a limited set of functions. You cannot have infinite loops, uninitialized variables, or out-of-bounds memory accesses. And your program must terminate within a set limit. Most of this seems quite garden-variety, but the most important constraint is that all execution paths of your program must be enumerable. You can’t have an exponential number of execution paths, for instance. So your cute program that enumerates all permutations of a string does not satisfy this constraint.

Why all this hassle? Well, the eBPF bytecode is to be run in the kernel, in kernel mode. And it is supposed to be safe. These two requirements pull against each other, so the eBPF bytecode that you can write is extremely constrained, and before it runs in kernel mode, it is first checked by a verifier. The verifier basically “runs” your code and checks all execution paths to see whether every one of them terminates in time and that no illegal memory accesses happen. Only once the verifier has finished in a reasonable amount of time and accepted the program does your eBPF code actually do something.
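To get a feel for what “checking all execution paths” means, here is a toy verifier sketched in Rust. Everything in it is an assumption made for illustration: the three-instruction bytecode, the `Rejection` reasons, and the `max_paths` budget are invented and are not the real eBPF instruction set or the kernel verifier’s actual rules. It only shows the shape of the idea: forbid backward jumps so every path must end, then walk the paths and give up if there are too many.

```rust
/// A hypothetical three-instruction bytecode. This is NOT real eBPF.
enum Instr {
    /// Straight-line work, then fall through to the next instruction.
    Work,
    /// Either fall through or jump to the absolute index `target`.
    Branch { target: usize },
    /// Stop the program.
    Exit,
}

#[derive(Debug, PartialEq)]
enum Rejection {
    BackwardJump,
    JumpOutOfRange,
    FallsOffEnd,
    TooManyPaths,
}

/// Walks every execution path and returns how many there are,
/// or the reason the program is rejected.
fn verify(program: &[Instr], max_paths: usize) -> Result<usize, Rejection> {
    let mut finished = 0;
    let mut pending = vec![0usize]; // program counters still to explore

    while let Some(mut pc) = pending.pop() {
        loop {
            match program.get(pc) {
                None => return Err(Rejection::FallsOffEnd),
                Some(Instr::Exit) => break,
                Some(Instr::Work) => pc += 1,
                Some(Instr::Branch { target }) => {
                    if *target >= program.len() {
                        return Err(Rejection::JumpOutOfRange);
                    }
                    if *target <= pc {
                        // No backward jumps means no loops, so every path ends.
                        return Err(Rejection::BackwardJump);
                    }
                    pending.push(*target); // explore the taken branch later
                    pc += 1; // and follow the fall-through now
                }
            }
        }
        finished += 1;
        if finished > max_paths {
            return Err(Rejection::TooManyPaths);
        }
    }
    Ok(finished)
}

fn main() {
    let program = [Instr::Branch { target: 2 }, Instr::Work, Instr::Exit];
    println!("{:?}", verify(&program, 16));
}
```

Note how quickly the path count grows: each extra `Branch` in a row doubles it, which is why a budget on paths ends up rejecting programs that look perfectly innocent.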

Thanks to the constraints imposed by its build infrastructure, eBPF is said to be a safe way of writing code that needs to monitor stuff at the kernel level. The end result of satisfying all those constraints is code that is safe to run in the kernel (unless some oversight in eBPF comes into the picture).

This article came about because I knew that with each programming language I used, I had a very different experience, but I couldn’t quite clearly discern why the programming language made me write code the way it did. Thus began many days of philosophical thought about what a programming language really is and a few observations about how newer languages are more opinionated and manage more things for you. Hopefully, some of it made sense :)

More things to check out:

A tour of metaprogramming models for generics
eBPF documentation
WebAssembly Operating System that achieves safety while letting everything run in kernel mode*

* Note: After Spectre and Meltdown, the software-only operating system is no longer safe.