Rust is for Professionals

April 13, 2021 at 08:20 AM | categories: Programming, Rust

A professional programmer delivers value through the authoring and maintaining of software that solves problems. (There are other important ways for professional programmers to deliver value but this post is about programming.)

Programmers rely on various tools to author software. Arguably the most important and consequential choice of tool is the programming language.

In this post, I will articulate why I believe Rust is a highly compelling choice of a programming language for software professionals. I will state my case that Rust disposes software to a lower defect rate, reduces total development and deployment costs, and is exceptionally satisfying to use. In short, I hope to convince you to learn and deploy Rust.

My Background and Disclaimers

Before I go too far, I'm targeting this post towards professional programmers - people who program (or support programming through roles like management) as their primary line of work or who spend sufficient time programming outside of work. I consider myself a professional programmer both because I am a full-time engineer in the software industry and because I contribute to some significant open source projects outside of my day job.

The statement "Rust is for Professionals" does not imply any logical variant thereof. For example, I am not implying Rust is not for non-professionals. Rather, the subject/thesis merely defines the audience I want to speak to: people who spend a lot of time authoring, maintaining, and supporting software and are invested in its longer-term outcomes.

I think opinion pieces about programming languages benefit from knowing the author's experience with programming. I first started hacking on code in the late 1990s. I've been a full-time software developer since 2007 after graduating with a degree in Computer Engineering (after an aborted attempt at Biomedical Engineering - hence my affinities for hardware and biological sciences). I've programmed in the following languages: C, C++ (only until C++11), C#, Erlang, Go, JavaScript, Java, Lua, Perl, PHP, Python, Ruby, Rust, shell, SQL, and Verilog. Notably missing from this list is a Lisp and a Haskell/Scala type language. Of these languages, I've spent the most time with C, C#, JavaScript, Perl, PHP, Python, and Rust.

I'm not that strong in computer science or language theory: many colleagues can talk circles around me when it comes to describing computer science and programming language concepts like algorithms, type theory, and common terms used to describe languages. (I have failed many technical interviews because of my limitations here.) In contrast, I perceive my technical strengths as applying an engineering rigor and practicality to problem solving. I care vastly more about how/why things work the way they do and the practical consequences of decisions/choices we make when it comes to software. I find that I tend to think about 2nd and 3rd order effects and broader or longer-term consequences more often than others. Some would call this systems engineering.

I've programmed all kinds of different software. Backend web services, desktop applications, web sites, Firefox browser internals, the Mercurial version control tool, build systems, system/machine management. Notably missing are mobile programming (e.g. iOS/Android) and serious embedded systems (I've hacked around with Raspberry Pis and Arduinos, but those seem very friendly compared to other embedded devices). My strongest affinity is probably towards systems software and general purpose tools: I enjoy building software that other people use to build things. Infrastructure if you will.

Finally, I am expressing my personal opinion in this post. I do not speak for any employer, present or former. While I would love to see more Rust at my current employer, this post is not an attempt to influence what happens behind my employer's walls: there are better ways to conduct successful nemawashi / 根回し than a public blog post. I am not affiliated with the Rust Project in any capacity beyond a very infrequent code contributor and issue filer: I view myself as a normal Rust user. I did work at Mozilla - the company that bankrolled most of Rust's initial development. I even briefly worked in the same small Vancouver office as Graydon Hoare, Rust's primary credited inventor! While I was keen for Rust to succeed because it was affiliated with my then employer, I was most definitely not a Rust evangelist or fan boy while at Mozilla. I have little to personally benefit from this post: I'm writing it because I enjoy writing and I believe the message is important.

With that out of the way, let's talk about Rust!

Rust Makes Me Irrationally Giddy

When I look back at my professional self when I was in my 20s, I feel like I was young and dumb and overly exuberant about computers, technology, new software, and the like. Now an older, more grizzled professional, I accept the reality that it is a miracle computers and software work as well as they do as often as they do. Point at any common task on a computer and an iceberg of complexity and nuance lingers under the surface. Our industry abounds in the repetition of proven sub-optimal ideas. You see practices cargo culted across the decades (like the 80 character terminal/line width and null-terminated strings, which can both be traced back to Hollerith punchcards from the late 19th century). You witness cycles of pendulum swings, the same fads and trends, just with different labels (microservices are the new SOA, YAML is the new XML, etc). I can definitely relate to people in this industry who want to drop everything and move to a farm or something (but I grew up in Indiana and had cows living down the street, so I know this lifestyle isn't for me).

Rust is the first programming language I've encountered in years that makes me excited. And not just normal excited: irrationally excited. Like the kind of excitement you have for something when you are naive about its limitations and don't know any better (like many blockchain/cryptocurrency advocates). I feel like the discovery of Rust is transporting me back to my younger self, before I discovered the ugly realities of how computers and software work, and is giving me hope that better tools, better ways of building software could actually exist. To channel my inner Marie Kondo: Rust sparks joy.

When I started learning Rust in earnest in 2018, I thought this was a fluke. It is just the butterflies you get when you think you fall in love, I told myself. Give it time: your irrational excitement will fade. But after using Rust for ~2.5 years now, my positive feelings about it have only grown stronger. There's a reason Rust has claimed the top spot in Stack Overflow's most loved languages survey for 5 years and running. And not by the skin of its teeth: Rust is blowing the competition out of the water. 19% over TypeScript and Python. 23% over Kotlin and Go. If this were a Forrester report for a company-offered product, Rust would be the clear market leader and marketers and salespeople would be using this result to sign up new customers in droves and print money hand over fist.

Let me tell you why Rust excites me.

Rust is Different (In a Good Way)

After you've learned enough programming languages, you start to see common patterns. Manual versus garbage collected memory management. Control flow primitives like if, else, do, while, for, unless. Nullable types. Variable declaration syntax. The list goes on.

To me, Rust introduced a number of new concepts, like match for control flow, enums as algebraic types, the borrow checker, the Option and Result types/enums and more. There were also behaviors of Rust that were different from languages I knew: variables are immutable by default, Result values must be checked for errors to avoid a compiler warning, the compiler refuses to compile code with detectable memory access issues, and tons more.
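
Here is a minimal sketch of a few of these concepts in one place. The Shape enum and the area, first_word, and parse_port functions are hypothetical names invented for illustration; they do not come from any real project.

```rust
// Hypothetical example types and functions, invented for illustration.

/// An enum whose variants carry different data (an algebraic data type).
#[derive(Debug, PartialEq)]
enum Shape {
    Circle { radius: f64 },
    Rectangle { width: f64, height: f64 },
}

/// `match` must handle every variant, or the code does not compile.
fn area(shape: &Shape) -> f64 {
    match shape {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rectangle { width, height } => width * height,
    }
}

/// `Option` takes the place of null: callers must handle the `None` case.
fn first_word(text: &str) -> Option<&str> {
    text.split_whitespace().next()
}

/// `Result` carries either a value or an error describing what went wrong.
fn parse_port(text: &str) -> Result<u16, std::num::ParseIntError> {
    text.trim().parse::<u16>()
}

fn main() {
    let shapes = vec![
        Shape::Circle { radius: 1.0 },
        Shape::Rectangle { width: 2.0, height: 3.0 },
    ];

    // Variables are immutable unless declared with `mut`.
    let mut total = 0.0;
    for shape in &shapes {
        total += area(shape);
    }
    println!("total area: {:.2}", total);

    match first_word("hello world") {
        Some(word) => println!("first word: {}", word),
        None => println!("no words"),
    }

    // Dropping this `Result` without inspecting it would trigger the
    // compiler's `unused_must_use` warning.
    match parse_port("8080") {
        Ok(port) => println!("port: {}", port),
        Err(err) => println!("invalid port: {}", err),
    }
}
```

Removing one of the arms in area, or removing mut from total, turns this into a compile error rather than a run-time surprise.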

Many of the new concepts weren't novel to Rust. But considering I've had exposure to many popular programming languages, the fact many were new to me means these aren't common features in mainstream languages. Learning Rust felt like fresh air to me: here was a language designed to be general purpose and make inroads into industry adoption while also willing to buck many of the trends of conventional language design from the last several decades.

When going against conventional practice, it is very easy to unintentionally alienate yourself from potential users. Design a programming language too unlike anything in common use and you are going to have a difficult time attracting users. This is a problem with many academic/opinionated programming languages (or so I hear). Rust does venture away from the tried and popular. And that does contribute to a steeper learning curve. However, there is enough familiarity in Rust's core language to give you a foothold when learning Rust. (And Rust's official learning resources are terrific.)

I feel like Rust's language designers set out to take a first principles approach to the language using modern ideas and ignoring old, disproven ones, realized they needed to ground the language in familiarity to achieve market penetration, and produced reasonable compromises to yield something that was new and novel but familiar enough to not completely alienate its large potential user base.

If you don't like being exposed to new ideas and ways of working, Rust's approach is probably a negative to you. But if you are like me and enjoy continuously expanding your knowledge and testing new ideas, Rust's novelty and willingness to be different is a much welcomed attribute.

Rust: Toolbox Included

It used to be that programming languages were just compilers or interpreters. In recent years, we've seen more and more programming languages bundled with other tools, such as build/packaging tools, code formatters, linters, documentation generators, language servers, centralized package repositories, and more.

I'm not sure what spurred this trend (maybe it was Go?), but I think it is a good move. Programming languages are ecosystems and the compiler/interpreter is just one part of a complex system. If you care about end-user experience and adoption (especially if you are a new language), you want as turnkey an on-boarding experience as possible. I think that's easier to pull off when you offer a cohesive, multi-tool strategy to attract and retain users.

We refer to programming languages with a comprehensive standard library as "batteries included." I'm going to refer to programming languages with additional included tools beyond the compiler/interpreter as "toolbox included."

Rust is very much a toolbox included language. (Unless you are installing it via your Linux distribution: in that case Linux packagers have likely unbundled all the tools into separate packages, making the experience a bit more end-user hostile, as Linux packagers tend to do for reasons that merit their own blog post. If you want to experience Rust the way its maintainers intended - the Director's Cut if you will - install Rust via rustup.)

In addition to the Rust compiler (rustc) and the Rust standard library, the following components are all officially developed and offered as part of the Rust programming language on GitHub:

  • Cargo - Rust's package manager and build system.
  • Clippy - A Rust linter.
  • rustdoc - Documentation generator for Rust projects.
  • rustfmt - A Rust code formatter.
  • rls - A Rust Language Server Protocol implementation.
  • crates.io - Rust's official, public package registry.
  • rustup - Previously mentioned Rust installer.
  • vscode-rust - Visual Studio Code extension adding support for Rust. (JetBrains has their own high quality extension for their IDEs, which they develop themselves.)
  • The Rust Programming Language Book
  • And many more.

As an end-user, having all these tools and resources at my fingertips, maintained by the official Rust project is an absolute joy.

For the local tools, rustup ensures they are upgraded as a group, so I don't have to worry about managing them. I periodically run rustup update to ensure my Rust toolbox is up-to-date and that's all I have to do.

Contrast with say Node.js, Python, and Ruby, where the package manager is on a separate release cadence from the core language and I have to think about managing multiple tools. (Rust will likely have to cross this bridge once there are multiple implementations of Rust or multiple popular package managers. But until then, things are very simple.)

Further contrast with languages like JavaScript/Node.js, Python, and Ruby, where tools like a code formatter, linter, and documentation generator aren't always developed under the core project umbrella. As an end-user, you have to know to seek out these additional value-add tools. Furthermore, you have to know which ones to use and how to configure them. The fragmentation also tends to yield varying levels of quality and end-user experience, to the detriment of end-users. The Rust toolbox, by contrast, feels simple and polished.

Rust's toolbox included approach enables me to follow unified practices (arguably best practices) while expending minimal effort. As a result, the following tend to be very similar across nearly every Rust project you'll run into:

  • Code formatting. (Nearly everyone uses rustfmt.)
  • Adherence to common coding and style conventions. (Nearly everyone uses clippy.)
  • Project documentation. (Nearly everyone uses rustdoc.)

Cargo could warrant its own dedicated section. But I'll briefly touch on it here.

Cargo is Rust's official package manager and build system. With cargo, you can:

  • Create new Rust projects with a common project layout.
  • Build projects.
  • Run project tests.
  • Update project dependencies.
  • Generate project documentation (via rustdoc).
  • Install other Rust projects from source.
  • Publish packages to Rust package registries.

As a build system, Cargo is generally a breeze to work with. Configuration files are TOML. Adding dependencies is often a 1 line addition to a Cargo.toml file. Dependencies often just work on the first try. It's not like say C/C++, where taking on a new dependency can easily consume a day or two to get it integrated in your build system and compatible with your source code base. I can't emphasize enough how much joy it brings to be able to leverage an "it just works" build tool for systems-level programming: I'm finding myself doing things in Rust like parsing ELF, PE, and Mach-O binaries because it is so easy to integrate low-level functionality like this into any Rust program. Cargo is boring. And when it comes to build systems, that's a massive compliment!

No other language I've used has as comprehensive and powerful a toolbox as Rust does. This toolbox is highly leveraged by the Rust community, resulting in remarkable consistency across projects. This consistency makes it easier to understand, use, and contribute back to other Rust projects. Contrast this with say C/C++, where large code bases often employ multiple tools in the same space on different parts of the same code base, leading to cognitive dissonance and overhead.

As a professional programmer, Rust's powerful and friendly toolbox enables me to build Rust software more easily than with other languages. I spend less time wrangling tools and more time coding. That translates to less overhead delivering value through software. Other languages would be wise to emulate aspects of Rust's model.

Rust is Humane

Of all the programming languages I've used, Rust seems to empathize with its users the most.

There are a few facets to this.

A lot of care seems to have gone into the end-user experience of the Rust toolbox.

The Rust compiler often gives extremely actionable error and warning messages. If something is wrong, it tells me why it is wrong, often pointing out exactly where in source code the problem resides, drawing carets to the source code where things went wrong. In many cases, the compiler will emit a suggested fix, which I can incorporate automatically by pressing a few keys in my IDE. Contrast this with C/C++ and even Go, which tend to have either too-terse-to-be-actionable or too-verbose-to-make-sense-of feedback. By comparison, output from other compilers often comes across as condescending, as if they are saying "git gud, idiot." Rust's compiler output tends to come across as "I'm sorry you had a problem: how can I help?" I feel like the compiler actually cares about my [valuable] time and satisfaction. It wants to keep me in flow.

Then there's Clippy, a Rust linter maintained as part of the Rust project.

One thing I love about Clippy is - like the compiler - many of the lints contain suggestions, which I can incorporate automatically through my IDE. So many other linters just tell you what is wrong and don't seem to go the extra mile to be respectful of my time by offering to fix it for me.

Another aspect of Clippy I love is it is like having an invisible Rust mentor continuously providing constructive feedback to help me level-up my Rust. I don't know how many times I've written Rust code similarly to how I would write code in other languages and Clippy suggests a more Rustic solution. Most of the time I'm like "oh, I didn't know about that: that's a much better pattern/solution than what I wrote!"

Do I agree with Clippy all the time? Nope. But I do find its signal to noise ratio is exceptionally high compared to other linters I've used. And Clippy is trivial to configure and override, so disagreements are easy to manage. Like the Rust compiler, I feel that Clippy is respectful of my time and has the long term maintainability and correctness of my software at heart.
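
As a small sketch of what this looks like in practice, the hypothetical sum_indexed function below is written the way one might write it in C. Clippy's needless_range_loop lint typically suggests the iterator form shown in sum_iter, and a local allow attribute is one way to override the lint when you disagree. Both function names are invented for illustration.

```rust
/// Index-based loop, written the way one might in C.
/// Clippy's `needless_range_loop` lint typically suggests iterating directly.
#[allow(clippy::needless_range_loop)] // opt out locally when you disagree with the lint
fn sum_indexed(values: &[i64]) -> i64 {
    let mut total = 0;
    for i in 0..values.len() {
        total += values[i];
    }
    total
}

/// The iterator-based form that Clippy nudges you toward.
fn sum_iter(values: &[i64]) -> i64 {
    values.iter().sum()
}

fn main() {
    let values = [1, 2, 3, 4];
    println!("{} {}", sum_indexed(&values), sum_iter(&values));
}
```

Both functions compute the same result; the second is shorter and has no index to get wrong.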

Then there's the Rust Community - the people behind the core Rust projects. The Rust Community is one of the most professional and welcoming I've seen. Their Code of Conduct is sufficiently comprehensive and actionable. They have their vigorous debates like any other community. But the conversation is civil. Bad apples are discarded when they crop up.

At a talk I gave about PyOxidizer at a Rust meetup a few years back, I made a comment in passing about a negative comment I encountered on a Rust sub-Reddit. After the talk, a moderator of that sub who was in the audience (unbeknownst to me) approached for more information so they could investigate, which they did.

I once tweeted about a somewhat confusing, not-very-actionable compiler error I encountered. A few minutes later, some compiler developers were conversing in replies. A few hours later, a pull request was created and a much better error message was merged in short order. I'm not a special one-off here either: I've stumbled across Stack Overflow questions and other forums where Rust core developers see that someone is encountering a confusing issue, question the process that got them to that point, and then make refinements to minimize it from happening in the future. The practice is very similar to what empathetic product managers and user experience designers do.

Not many other communities (or companies for that matter) seem to demonstrate such a high level of compassion and empathy for their users. To be honest, I'm not sure how Rust manages to pull it off, as this tends to be very expensive in terms of people time and it can be very easy to not prioritize. One thing is for certain: the Rust Community is loaded with empathetic people who care about the well-being of users of their products. And it shows from the interaction in forums to the software tools they produce. To everyone who has contributed in the Rust Community: thank you for all that you have done and for setting an example for the rest of us to live up to.

Rust is Surprisingly High Level

One of the reasons I avoided learning Rust for years is that I perceived it was too low level and therefore tedious. Rust was being advertised as a systems programming language and you would hear stories of fighting the borrow checker. I assumed I'd need to be thinking a lot about memory and ownership. I assumed the cost to author and maintain Rust code would be high. I thought Rust would be a safer C/C++, with many of the software development lifecycle caveats that apply. And for the software I was writing at the time, the value proposition of Rust seemed weak. I thought a combination of C and say Python was good enough. When I started writing PyOxidizer, I initially thought only the run-time code calling into the Python interpreter C APIs would be written in Rust and the rest would be Python.

How wrong I was!

When I actually started coding Rust, I was shocked at how high-level it felt. Now, depending on the space of your software, Rust code can be very low-level and tedious (not unlike C/C++). However, for the vast majority of code I author, Rust feels more like Python than C. And even the lower-level code feels much higher level than C or even C++.

In my mind, the expressiveness of Rust comes very close to higher-level, dynamic languages (like JavaScript, Python, and Ruby) while maintaining the raw speed of C/C++ all without sacrificing low-level control for cases when you need it. And it does all of this while maintaining strong safety guarantees (unlike say Go, which has the billion dollar mistake: null references).

I had a mental Venn diagram of the properties of programming languages (gc versus non-gc, static versus dynamic typing, compiled versus interpreted, etc) and which traits (like execution speed, development time, etc) would be possible and Rust invalidated large parts of that model!

You often don't need to think about memory management in Rust: once you understand the rules the borrow checker enforces, memory is largely something that exists but is managed for you by the language, just like in garbage collected languages. Of course there are scenarios where you should absolutely be thinking about memory and should have a grasp on what Rust is doing under the hood. But in my experience, most code can be blissfully ignorant of what is actually happening at the memory level. (However, awareness of value ownership when programming Rust does add overhead, so it's not like the cognitive load required for reasoning about memory disappears completely.)

Rust has both a stack and a heap. But when programming you often don't need to distinguish these locations. You can do things in Rust like take a reference to a stack allocated value, pass this reference around to other functions, and have those functions return references derived from it. This would be a CVE factory in C/C++. But because Rust's borrow checker verifies that no reference outlives the value it points to, this is safe (and a common practice) in Rust. It also predisposes the code towards better performance! Often in C/C++ you will allocate on the heap because you need to return a reference to memory and returning a reference to a stack allocated value is extremely dangerous. This heap allocation incurs run-time overhead. So Rust allowing you to do the fast thing safely is a nice mini win.
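
A minimal sketch of borrowing a stack allocated value: the hypothetical largest function (a name invented for illustration) receives a reference to a stack array and hands back a reference into that same array, with no heap allocation involved.

```rust
/// Returns a reference to the largest element of `values`.
/// The returned reference borrows from the input, so the compiler
/// guarantees it cannot outlive the data it points into.
fn largest(values: &[u32]) -> Option<&u32> {
    values.iter().max()
}

fn main() {
    // This array lives on the stack; nothing in this program allocates on the heap.
    let readings = [3, 9, 4];
    let biggest = largest(&readings);
    println!("largest: {:?}", biggest);

    // By contrast, handing out a reference to a function's own local
    // variable is rejected at compile time:
    //
    // fn dangling() -> &u32 { let x = 5; &x } // does not compile
}
```

The commented-out dangling function is the C/C++ hazard in miniature; in Rust it never makes it past the compiler.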

In many statically typed languages, I feel like my programming speed is substantially reduced by having to repeatedly spell out or think about type names. In C, it feels like I'm always writing type names so I can perform casting. Newer versions of C++ and Java have improved matters significantly (e.g. the auto keyword in C++ and var in Java). However, I haven't programmed them enough recently to know how they compare to Rust on this front. All I know is that I'm writing type names a lot less frequently in Rust than I thought I would be and that my programming output isn't limited by my typing speed as much as it historically was in C/C++.
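
To illustrate how rarely type names need to be spelled out, here is a small sketch. The word_counts function is a hypothetical example; only its signature names types and the compiler infers the rest.

```rust
use std::collections::HashMap;

/// Counts how often each word occurs. Only the signature spells out types.
fn word_counts(text: &str) -> HashMap<&str, usize> {
    // Key and value types are inferred from how `counts` is used below
    // and from the function's return type.
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = word_counts("the quick brown the lazy the");
    // `Vec<_>` asks the compiler to fill in the element type.
    let mut pairs: Vec<_> = counts.into_iter().collect();
    pairs.sort();
    println!("{:?}", pairs);
}
```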

Despite being compiled down to assembly and exposing extremely low-level control, Rust often feels like a higher-level language. Equivalent functionality in Rust is often more concise and/or readable than in C/C++, while performing similarly, all while having substantially stronger safety guarantees. As a professional programmer, the value proposition is blinding: Rust enables me to do more with less, achieve a lower defect rate, and not sacrifice on performance.

Correctness, Quality, Execution Speed, and Development Velocity: Pick 4

The operation of computers and operating systems is exceptionally complex.

All programming languages justifiably attempt to abstract away aspects of this complexity to make it easier to deliver value through software. For example:

  • Assembly is hard: here's a higher level language that compiles down to assembly or is implemented in a language that does.
  • Managing memory manually is hard: use garbage collection.
  • Concurrency is hard: only allow 1 thread to run at a time (JavaScript, Python, etc).
  • Text encoding is hard: strings are Unicode/UTF-8.
  • Operating systems have different interfaces: here's a pile of abstractions in the standard library for things like I/O, networking, filesystem paths, etc.
  • Strong, static typing isn't very flexible and can impose high change costs: use dynamic typing.
  • And tons more.

These abstractions often have undesirable consequences/trade-offs:

  • Garbage collection adds run-time overhead (10% is a number that's commonly cited).
  • Garbage collection adds random slowdowns/pauses, making it difficult to achieve consistency in long-tail latency optimization (i.e. ensuring consistency in P99.9, P99.99, and beyond percentiles).
  • Interpreted languages tend to be slower than compiled languages unless you invest lots of time into a JIT.
  • Limiting execution to a single thread limits the ability to harness the full power of modern CPUs, which tend to have several cores.
  • Primitives like environment variables, process arguments, and filenames aren't guaranteed to be UTF-8 and coercing them to UTF-8 can be lossy.
  • Dynamic typing doesn't catch as many bugs at compile time and you have to be more diligent about guarding invariants.
  • And tons more.
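
The UTF-8 trade-off above can be sketched in a few lines. The byte string below stands in for a hypothetical filename that is not valid UTF-8, and the strict and lossy helpers are names invented for illustration. For real process arguments, Rust's standard library offers std::env::args_os, which does not force UTF-8.

```rust
/// Strict conversion: fails rather than guessing when the bytes are not valid UTF-8.
fn strict(bytes: &[u8]) -> Result<String, std::string::FromUtf8Error> {
    String::from_utf8(bytes.to_vec())
}

/// Lossy conversion: each invalid sequence becomes U+FFFD, so the
/// original bytes can no longer be recovered from the result.
fn lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

fn main() {
    // Hypothetical filename bytes: the byte 0xFF is never valid in UTF-8.
    let raw = b"report\xFF.txt";
    println!("strict conversion succeeded? {}", strict(raw).is_ok());
    println!("lossy conversion: {}", lossy(raw));

    // Process arguments can be read without assuming UTF-8 at all.
    for arg in std::env::args_os() {
        println!("argument: {}", arg.to_string_lossy());
    }
}
```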

In other words, there are trade-offs with nearly every decision in programming language and [standard] library design. There are usually no obviously correct and undesirable consequence-free decisions.

And we further have to consider the fallibility of people and the inevitability that mistakes will be made, that bugs and regressions will be introduced and will need addressing. As an industry, we generally accept that mistakes occur and bugs are an unavoidable aspect of software development. If new features and enhancements are value, bugs and defects are anti-value. Like financial debt, existence of bugs and sub-optimal code can be tolerated to varying extents. But this is a highly nuanced topic and different people, companies, and projects will have different perspectives on it. We can all agree that bugs are an inevitable fact of software.

We also need to confront the reality that as an industry we have very little empirical data that says much of significance about topics like static versus dynamic typing. Although we do know some things. As Alex Gaynor informs us in "What science can tell us about C and C++'s security", the result of ~2/3 of security vulnerabilities being caused by memory unsafety seems to reproduce against a sufficiently diverse set of projects and companies. That result and the implications of it are worth paying attention to!

With that being said, let's dive into my take on the matter.

Of all the programming languages I've used, I feel that Rust has the strongest disposition towards authoring and maintaining correct, high-quality software. It does this by offering a myriad of features that are designed to prevent (or at least minimize) defects. In addition, I believe Rust shifts the detection of defects to earlier in the software development lifecycle, greatly reducing the cost to mitigate defects and therefore develop software.

(As an aside, every time the topic of Rust's safety and correctness comes up, random people on the Internet rush to their keyboards to say things along the lines of "C/C++ and other languages can be made to be just as safe as Rust: it's the bad programmers who are using C/C++ wrong." To those people: please stop. Your belief implies the infallibility of people and machines and that mistakes won't be made. If things like memory unsafety bugs in C/C++ could be prevented, industry titans like Apple, Google, and Microsoft would have found a way. These companies are likely taking many more measures to prevent security vulnerabilities than you are and yet the ~2/3 of security vulnerabilities being caused by memory unsafety (read: humans and machines failing to reason about run-time behavior) result still occurs. To the wiser among us, I urge you to call out perpetrators of this "good programmers don't create bugs" myth when you see it, just like you would/should if you encounter racist, sexist, or other non-inclusive behaviors. The reason is that belief in this myth can lead to physical or emotional harm, just like non-inclusive -isms. Security bugs, for example, can lead to disclosure of private or sensitive data, which can result in real world harm. Think a stalker or abusive former partner learning where you now live. Or a memory unsafety error in a medical device leading to device malfunction, injuring or killing a patient. Because this is a sensitive topic, I want to be clear that I'm not trying to compare the relative harms incurred by racism, sexism, other -isms, or the mythical perfect programmer. Rather, all I'm saying is each of these surpasses the minimum threshold of harm incurred that justifies calling out and stopping the harmful behavior. I believe that as professionals we have an ethical and professional obligation to actively squash the mythical perfect programmer fallacy when we encounter it. Debates on the merits and limits of tools to prevent/find defects are fine: belief in the perfect programmer is not. Sorry for the mini rant: I just get upset by people who think software exists in a vacuum and doesn't have real-world implications for people's safety.)

In the sections below, I'll outline some of Rust's features and behaviors that support my assertion that Rust is biased towards correct and higher quality outcomes and lowers total development cost.

The Borrow Checker

To the uninitiated, the borrow checker is perhaps Rust's most novel contribution to programming. It is a compile time mechanism that enforces various rules about how Rust code must behave. Think of these as laws that Rust code must obey. But these are more like societal laws, not scientific laws (which are irrefutable), as Rust's laws can be broken, often leading to negative consequences, just like societal laws.

Rust's ownership rules are as follows:

  • Each value in Rust has a variable that's called its owner.
  • There can only be one owner at a time.
  • When the owner goes out of scope, the value will be dropped / released.
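
The ownership rules can be sketched as follows. Token and consume are hypothetical names invented for illustration; the Drop implementation exists only to make the moment of release visible.

```rust
/// Hypothetical type that announces when it is dropped.
struct Token(&'static str);

impl Drop for Token {
    fn drop(&mut self) {
        println!("dropping {}", self.0);
    }
}

/// Takes ownership of the token; it is dropped when this function returns.
fn consume(token: Token) -> usize {
    token.0.len()
}

fn main() {
    let a = Token("first");
    let b = a; // ownership moves from `a` to `b`; there is still exactly one owner
    // println!("{}", a.0); // would not compile: `a` was moved

    let len = consume(b); // `b` moves into the function and is dropped there
    println!("len = {}", len);

    {
        let _scoped = Token("scoped");
    } // `_scoped` goes out of scope here and is dropped
    println!("end of main");
}
```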

Then there are rules about references (think intelligent pointers) to owned values:

  • At any given time, you can have either one mutable reference or any number of immutable references.
  • References must always be valid.
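
A minimal sketch of the reference rules, using the hypothetical helpers total and push_double (names invented for illustration). The commented-out lines show a pattern the compiler rejects.

```rust
/// Reads through a shared (immutable) reference.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

/// Writes through an exclusive (mutable) reference:
/// appends double the last element, or 0 if the vector is empty.
fn push_double(values: &mut Vec<i32>) {
    let last = values.last().copied().unwrap_or(0);
    values.push(last * 2);
}

fn main() {
    let mut data = vec![1, 2, 3];

    // Any number of immutable references may coexist.
    let first = &data;
    let second = &data;
    println!("{} {}", total(first), total(second));

    // A single mutable reference is allowed once the immutable ones
    // are no longer in use.
    push_double(&mut data);
    println!("{:?}", data);

    // Mixing the two is rejected at compile time:
    //
    // let reader = &data;
    // data.push(4);             // does not compile: `data` is still borrowed
    // println!("{:?}", reader);
}
```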

Put together, these rules say:

  • There is only a single canonical owner of any given value at any given time. The owner automatically releases/frees the value when it is no longer needed (much like a reference counting garbage collector does when the reference count goes to 0).
  • If there are references to an owned value, those references must be valid (the owned value hasn't been dropped/released) and you can only have either multiple readers or a single writer (not e.g. a reader and a writer).

The implications of these rules on the behavior of Rust code are significant:

  • Use after free isn't something you have to worry about because references can't point to dropped/released values.
  • Buffer underruns, overflows, and other illegal memory access can't exist because references must be valid and point to an owned value / memory range.
  • Memory level data races are prevented because the single writer or multiple readers rule prevents concurrent reading and writing. (An assertion here is any guards - like locks and mutexes - have appropriate barriers/fences in place to ensure correct behavior in multi-threaded contexts. The ones in the standard library should.)

I used to think that these rules limited the behavior of Rust code. That statement is true. However, as I've thought about it more, I've refined my take to be that ownership and reference rules reinforce properties that well-behaved software exhibits.

If a C/C++ program had illegal memory access, you would say it is buggy and the behavior is not correct. If a Java program attempted to mutate a value on thread A without a lock or other synchronization primitive and thread B raced to read it, leading to data inconsistency, you would also call that a bug and incorrect behavior. If a JavaScript/Python/Ruby function were changed such that it started mutating a value that should be constant, you would call that a bug and incorrect behavior.

While Rust's ownership and reference rules do limit what software can do, the functionality they are limiting is often unsafe or buggy, so losing this functionality is often desirable from a quality and correctness standpoint. Put another way, Rust's borrow checker eliminates entire classes of [common] bugs by preventing patterns that lead to incorrect, buggy behavior.

This. Is. Huge.

Rust's borrow checker catches bugs for which other languages have no automated mechanism or no low cost, low latency mechanism for detecting. There are ways to achieve aspects of what the borrow checker does in other languages. But they tend to require contorting your coding style to accomplish and/or employing high cost tools (often running asynchronously to the compiler) such as {address, memory, thread} sanitizers or fuzzing. With Rust, you get this bug detection built into the language and compiler: no additional tools needed. (I'm not saying you shouldn't run additional tools like sanitizers or fuzz testing against Rust: just that you get a significant benefit of these tools for a drastically lower cost since they are built in to the core language.)

Rust's ownership and reference rules help ensure your software is more well-behaved and bug-free. But, sometimes those rules are too strict. Fortunately, Rust isn't dogmatic about enforcing them. There are legitimate cases where you can't work in the confines of these rules.

Say you want to share a cache between multiple threads. Caches need to be both readable and writable by multiple threads. This violates the reference rules and maybe the single owner ownership rule, depending on how things are implemented. Fortunately, there are primitives in the std::sync module like RwLock and Arc (atomically reference counted) you can use here. Arc (and its non-threadsafe Rc counterpart) give you reference counting, just like a garbage collected language. Primitives like RwLock allow you to wrap an inner value and temporarily acquire an appropriate reference to it, mutable or non-mutable. There's a bit of sleight of hand here, but the tricks employed enable you to satisfy the ownership and reference rules and use common programming techniques and patterns while still having the safety and correctness protections the borrow checker enforces.
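
Here is a rough sketch of such a shared cache, assuming a simple string-to-integer map. The Cache alias, fill_cache, and lookup are hypothetical names chosen for this example, not an established API:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical alias: a map that many threads can share, read, and write.
type Cache = Arc<RwLock<HashMap<String, u64>>>;

// Spawns `workers` threads that each insert one entry into the shared cache.
fn fill_cache(workers: u64) -> Cache {
    let cache: Cache = Arc::new(RwLock::new(HashMap::new()));

    let handles: Vec<_> = (0..workers)
        .map(|i| {
            // Each thread gets its own reference-counted handle to the same map.
            let cache = Arc::clone(&cache);
            thread::spawn(move || {
                // `write()` grants temporary exclusive (mutable) access.
                let mut guard = cache.write().expect("lock poisoned");
                guard.insert(format!("key{}", i), i * 10);
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker panicked");
    }
    cache
}

// `read()` grants temporary shared access; many readers may hold it at once.
fn lookup(cache: &Cache, key: &str) -> Option<u64> {
    let guard = cache.read().expect("lock poisoned");
    guard.get(key).copied()
}
```

The lock guards are what keep the reference rules intact: a writer only ever receives its mutable reference while no reader or other writer holds one.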

Data Races: What Data Races?

Multi-threaded and concurrent programming is hard. Really hard. Like it is exceptionally easy to introduce hard-to-diagnose-and-debug bugs hard.

There are many reasons for this. We can all probably relate to the fact that reasoning about multi-threaded code is just hard: instead of 1 call stack to reason about there are N. Further complicating matters is that many of us don't have a firm grasp on how memory works at a very low level. Do you know all the ins and outs on how CPU caches work on the architecture you are targeting? Me neither! (But this is a very good place to start excavating a rabbit hole.)

If you are like me, you've spent many years of your professional career not having to care about multi-threading or concurrent programming because you spend so much time in languages with single threads, are only implementing code that runs in single threaded contexts, or you've recognized the reality that implementing this code safely and correctly is hard and you've intentionally avoided the space or chosen software architectures (like queue-based message passing) to minimize risks. Or maybe if you are say a Java programmer you sprinkle synchronized everywhere out of precaution or in response to race conditions / bugs once they are found. (Everyone's personal experience is different, of course.)

Long story short, the aforementioned ownership and reference rules enforced by the borrow checker eliminate data races. This was a major oh wow moment for me when I learned Rust: I had heard about memory safety but didn't realize the same forces behind it were also responsible for making concurrency safe!
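
To make this concrete, here is a small sketch in which two threads read the same data concurrently. It assumes a toolchain that provides std::thread::scope (stabilized after this post was first written), and sum_in_two_threads is a hypothetical helper:

```rust
use std::thread;

// Hypothetical helper: sums a slice using two threads that only read the data.
fn sum_in_two_threads(data: &[u64]) -> u64 {
    let (left, right) = data.split_at(data.len() / 2);

    thread::scope(|scope| {
        // Both threads hold shared (read-only) borrows, which the rules allow.
        let left_sum = scope.spawn(|| left.iter().sum::<u64>());
        let right_sum = scope.spawn(|| right.iter().sum::<u64>());

        left_sum.join().expect("left worker panicked")
            + right_sum.join().expect("right worker panicked")
    })
}

// By contrast, the compiler rejects two threads that mutate the same local:
//
//     let mut total = 0;
//     scope.spawn(|| total += 1);
//     scope.spawn(|| total += 1); // error: `total` borrowed as mutable twice
//
// The same single-writer rule that protects single-threaded code applies here.
```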

This property is referred to as fearless concurrency. I encourage you to read Aaron Turon's Fearless Concurrency as well as the Fearless Concurrency chapter in the Rust Book.

Operating System Abstractions Grounded in Reality

Rust is the only programming language I've used that attempts to expose operating system primitives like environment variables, command arguments, and filesystem paths and doesn't completely mess it up. Truth be told, this is kind of a niche topic. But as I help maintain a version control tool which needs to care about preserving content identically across systems, this topic is near and dear to my heart.

In POSIX land, primitives like environment variables, command arguments, and filesystem paths are char*, or a bag of null-terminated bytes.

On Windows, these primitives are wchar_t*, or wide characters.

On both POSIX and Windows, the encoding of the raw bytes can be... complicated.

Nearly every programming language / standard library in existence attempts to normalize these values to its native string type, which is typically Unicode or UTF-8. That's doable and correct a lot of the time. Until it isn't.

Rust, by contrast, has standard library APIs like std::env::vars() that will coerce operating system values to Rust's UTF-8 backed String type. But Rust also exposes the OsString type, which represents operating system native strings. And there are function variants like std::env::vars_os() to access the raw values instead of the UTF-8 normalized ones.
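
A short sketch of the difference follows. The describe and non_utf8_env_names helpers are hypothetical; std::env::vars_os(), OsStr::to_str(), and OsStr::to_string_lossy() are the standard library calls doing the work:

```rust
use std::env;
use std::ffi::{OsStr, OsString};

// Reports whether an operating system string happens to be valid UTF-8.
fn describe(value: &OsStr) -> String {
    match value.to_str() {
        Some(text) => format!("valid UTF-8: {}", text),
        // The lossy conversion replaces undecodable data with U+FFFD.
        None => format!("not valid UTF-8: {}", value.to_string_lossy()),
    }
}

// Lists the names of environment variables whose values are not valid UTF-8.
// Iterating with `std::env::vars()` instead would panic on such a value.
fn non_utf8_env_names() -> Vec<OsString> {
    env::vars_os()
        .filter(|(_, value)| value.to_str().is_none())
        .map(|(name, _)| name)
        .collect()
}
```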

Rust paths internally are stored as OsString, as that is the value passed to the C API to perform filesystem I/O. However, you can coerce paths to String easily enough or define paths in terms of String without jumping through hoops.
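
For instance, a sketch of moving between string and path types might look like this (config_path, display_name, and the config.toml file name are made up for the example):

```rust
use std::path::{Path, PathBuf};

// Builds a path from an ordinary string slice; no special conversion is needed.
fn config_path(dir: &str) -> PathBuf {
    Path::new(dir).join("config.toml")
}

// Converts the final path component to a String for display purposes.
// The lossy conversion is explicit, so the loss of fidelity is a visible choice.
fn display_name(path: &Path) -> String {
    path.file_name()
        .map(|name| name.to_string_lossy().into_owned())
        .unwrap_or_default()
}
```

Path::to_str() is the strict alternative: it returns an Option that is None when the path is not valid UTF-8, leaving the decision about what to do next with the caller.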

The point I'm trying to get across is that Rust's abstractions are grounded in the reality of how computers work. Given the choice, Rust will rarely sacrifice the ability to do something correctly. In cases like operating system interop, Rust gives you the choice of convenience or correctness, rather than forcing inconvenience or incorrectness on you, like nearly every other language.

Encoding and Enforcing Invariants in the Type System

Rust enums are algebraic data types. Rust enum variants can have values associated with them and Rust enums, like structs (Rust's main way to define a type), can have functions/methods hung off of them. Rust enums are effectively fully-featured, specialized types, where value instances must be a certain variant of that enum. This makes Rust enums much more powerful than in other languages where enums simply map to integer values and/or can't have associated functions. This power unlocks a lot of possibility and harnessed the right way can drastically improve correctness of code and lead to fewer defects.

Programming inevitably needs to deal with invariants, the various possibilities that can occur. Programmers will reach for control flow operators to handle these: if x do this, else if y do that, switch statements, and the like. Handling every possible invariant can be complex, especially as software evolves over time and the ground beneath you is constantly shifting.

As you become more familiar with Rust, you'll find yourself encoding and enforcing invariants in the type system more and more. And enums are likely the main way you accomplish this.

Let's start with a contrived example. In C/C++, if you had a function that accepted either an Apple or an Orange value, you might do something like: void eat(Apple* apple, Orange* orange). Then you'd have inline logic like if apple != null. In a dynamically typed language, you could pass a single argument, but you'd perform inline type comparison. e.g. with Python you'd write if isinstance(fruit, Apple).

With Rust, you'd declare and use an enum. e.g.

struct Apple {}
struct Orange {}

enum Fruit {
    Apple(Apple),
    Orange(Orange),
}

impl Fruit {
    fn eat(&self) {
        // Each arm binds the inner value held by that variant.
        match self {
            Self::Apple(apple) => { /* ... */ },
            Self::Orange(orange) => { /* ... */ },
        }
    }
}

fn main() {
    let apple = Fruit::Apple(Apple {});
    apple.eat();
}

This (again contrived) example shows how Rust enum variants can hold inner values, how we can define methods on Rust enums (so they behave like regular types), and introduces the match control flow operator.

Quickly, match is a super powerful operator. It will compare its argument against provided patterns and evaluate the arm that matches. Patterns must be comprehensive or the compiler will error. In the case of enums, if you add a variant - say Banana for our Fruit example - and fail to add that variant to existing match expressions, you will get compiler errors!
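
To illustrate the exhaustiveness check, here is a sketch of the Fruit enum after a Banana variant has been added. The variants carry no data here to keep the example short, and needs_peeling is a hypothetical function:

```rust
enum Fruit {
    Apple,
    Orange,
    Banana,
}

fn needs_peeling(fruit: &Fruit) -> bool {
    match fruit {
        Fruit::Apple => false,
        Fruit::Orange => true,
        // Deleting this arm makes the build fail with error E0004
        // (non-exhaustive patterns), pointing at the uncovered variant.
        Fruit::Banana => true,
    }
}
```

Note that a catch-all `_` arm would silence that error, so a newly added variant would quietly fall into the default behavior instead of being flagged.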

As you become more proficient with Rust, you'll find yourself moving lots of (often redundant) control flow expressions and conditional dispatch (if X do this, if Y do that) into enum variants and encoding the dispatched actions into that enum/type directly. Conceptually, this is logically little different from having a base type or interface or having a single wrapper class that holds various possible values. But the guarantees are stronger because each distinct possibility is strongly defined as an enum variant. And when combined with the match control flow operator, you can have the Rust compiler verify that all variants are accounted for every time you take conditional action based on the variant.

The 2 most common enums in Rust are Option and Result. The following sections will explain how they work and further demonstrate how invariants can be encoded and enforced in Rust's type system.

Option: A Better Way to Handle Nullability

Many programming languages have the concept of nullable types: the ability for a value to be null or some null-like value. You will often find this expressed in languages as null, nil, None, or some variant thereof.

When programming in these languages, nullable values must be accounted for or they can lead to errors. Languages like C/C++ and Go will attempt to resolve the address behind null/nil, leading to at least a program crash and possibly a security vulnerability. Languages like Java and Python will raise exceptions (NullPointerException in Java - frequently abbreviated NPE because it is so common - and likely TypeError in Python).

The prevalence of failure to account for nullable values is a major reason why null references were dubbed the billion dollar mistake by their inventor. (I suspect the real world value is much greater than $1B.)

Having an easy-to-ignore nullable invariant lingering in type systems seems like a massive foot gun to me. And indeed every programmer with sufficient experience has likely introduced a bug due to failure to account for null. I sure have!

Rust doesn't have a null value. Therefore no null references and no billion dollar mistake. Instead, Rust's standard library has Option, an enum representing nullable types / values. And Option is vastly superior to null values.

Option<T> is an enum with 2 variants, Some(T) or None: an instance of some type or nothing. What makes Option different from languages with null references is you have to explicitly ask for the inner value: there is no automatic dereference. Rust forces you to confront the reality that a value is nullable and by doing so can drastically reduce a very common bug class. I say drastically reduce instead of eliminate because it is still possible to shoot yourself in the foot. For example, you can call Option.unwrap() to obtain the inner value, triggering a panic if the None variant is present. Despite the potential for programming errors, this solution is strictly better than null references because Option forces you to confront the reality of nullability and use of the dangerous access mechanisms is relatively easy to audit for. (Clippy has some lints to encourage best practices here.)
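
A brief sketch of what asking for the inner value looks like in practice. The find_port and port_or_default functions, the configuration shape, and the 8080 fallback are all assumptions made for this example:

```rust
// Looks up a port by service name. The return type states that it may be absent.
fn find_port(config: &[(&str, u16)], name: &str) -> Option<u16> {
    config
        .iter()
        .find(|(key, _)| *key == name)
        .map(|(_, port)| *port)
}

// The caller cannot use the port without deciding what happens for `None`.
fn port_or_default(config: &[(&str, u16)], name: &str) -> u16 {
    match find_port(config, name) {
        Some(port) => port,
        None => 8080, // arbitrary fallback chosen for this sketch
    }
}

// Calling `find_port(config, name).unwrap()` would also compile, but it panics
// on `None`, which is why uses of `unwrap()` are worth auditing.
```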

The existence of Option<T> means that if you are operating on a non-Option value, that value is guaranteed to exist and not be null. If you are operating on Option, the fact it is optional is explicitly encoded in the type and you know you need to account for it. If the value passed into a function was once always defined and a later refactor changed it to be optional (or vice versa), that semantic change is reflected in the type system and Rust forces you to confront the implications when that change is made, not after it was deployed to production and you started seeing segfaults, NPEs, and the like.

After using Rust's Option<T> to express nullability, you will look at every other language with null references and bemoan how primitive and unsafe it feels by comparison. You will yearn for Rust's safer approach biasing towards correctness and higher quality software. Option<T> is a massive feature for the professional programmer who cares about these traits.

Result: A Better Way to Handle Errors

Different programming languages have different ways of handling errors. Returning integers or booleans to express success or failure is common. As is throwing and trapping/catching exceptions.

Like nullability, history has shown us that programmers often fail to handle error invariants, with bugs of varying severity ensuing. Even Linux filesystems fail to handle errors!

I argue that the traditional programming patterns we use to handle errors bias towards buggy outcomes, especially with the return an integer/error value approach. It is easy to forget to check the return value of a function. In C/C++, maybe a function once returned nothing (void) and was later refactored to return an integer error code. You have to know to audit for existing callers when making these changes or updating dependencies. Furthermore, handling errors requires effort. That if err != 0 or if err != nil pattern gets mighty annoying to type all of the time! Plus, you have to know what value to compare against: success can often be 0, -1, or 1 or any other arbitrary value. Getting error handling correct 100% of the time is hard. You will fail and this will lead to bugs.

Result is Rust's primary/preferred mechanism for propagating errors and it is different from traditional approaches.

Like Option<T>, Result<T, E> is an enum with 2 variants: Ok(T) and Err(E). That is, a value is either success, wrapping an inner value of type T or error, wrapping an inner value of type E describing that error.

Like Option<T>, Result<T, E> forces you to confront the existence of invariants. Before operating on the value returned by a function, you need to explicitly access it and that forces you to confront that an error could have occurred. In addition, the Result type is annotated with #[must_use], so the compiler will emit a warning when you ignore a returned Result. Scenarios like changing an infallible function returning a type T to fallible returning a Result<T, E> will fail to compile (due to typing violations) or make compiler warning noise if there are call sites that fail to account for that change.
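
As a small sketch (parse_age and describe_age are hypothetical names), here is a fallible function and a caller that has to look inside the Result before it can use the value:

```rust
use std::num::ParseIntError;

// A fallible function: the error type is part of the signature.
fn parse_age(text: &str) -> Result<u8, ParseIntError> {
    text.trim().parse::<u8>()
}

// The caller must handle both variants to get at the parsed number.
fn describe_age(text: &str) -> String {
    match parse_age(text) {
        Ok(age) => format!("age is {}", age),
        Err(err) => format!("invalid age: {}", err),
    }
}

// Writing `parse_age("42");` as a bare statement still compiles, but the
// compiler emits an `unused_must_use` warning because the Result is discarded.
```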

In addition to making it more likely that errors are acted upon correctly, Rust also contains a ? operator for simplifying handling of errors.

As I said above, typing patterns like if err != 0 or if err != nil can become extremely tedious. Your brain knows what it needs to type to handle errors but it takes precious seconds to do so, slowing you down. You may have code where the majority of the lines are the same error handling boilerplate over and over, increasing verbosity and arguably decreasing readability.

Rust's ? operator will return an Err(E) variant or evaluate to the inner value from the Ok(T) variant. So you can often add a ? operator after a function call returning a Result<T, E> to automatically propagate an error. Typing a single character is vastly easier and simpler than writing explicit control flow for error handling!

The benefits of ? are blatantly apparent when you have functions calling into multiple fallible functions. Long functions with multiple if err != 0 blocks followed by the next logical operation often reduce to a 1-liner. e.g. bar(foo()?)? or foo.do_x()?.do_y()?. When I said earlier that Rust feels like a higher level language, the ? operator is a significant contributor to that.
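
To show the difference side by side, here is a sketch of the same logic written with explicit control flow and then with ?. Both function names are made up for the comparison:

```rust
use std::num::ParseIntError;

// Explicit propagation: every fallible call needs its own match.
fn sum_verbose(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let first = match a.parse::<i32>() {
        Ok(value) => value,
        Err(err) => return Err(err),
    };
    let second = match b.parse::<i32>() {
        Ok(value) => value,
        Err(err) => return Err(err),
    };
    Ok(first + second)
}

// The same behavior using `?`: an Err returns early, an Ok yields its value.
fn sum_concise(a: &str, b: &str) -> Result<i32, ParseIntError> {
    Ok(a.parse::<i32>()? + b.parse::<i32>()?)
}
```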

There are some downsides to Result<T, E> in terms of programming overhead and consistency between Rust programs. I'll cover these later in the post.

Result<T, E> biases Rust code towards correctness by forcing programmers to confront the reality that an error could exist and should be handled. Once you program in Rust, you will look at error handling mechanisms like returning an error integer or nullable value, realize how brittle and/or tedious they are, and yearn for something better.

The unsafe Escape Hatch

If some of Rust's limitations are too much for you, Rust has an in case of emergency break glass feature called unsafe. This is kind of like C mode where you can do things like access and manipulate raw memory through pointers. You can cast a value to a pointer and back to a new Rust reference/value, effectively short circuiting the borrow checker for that particular reference/value.

A common misconception is unsafe disables the borrow checker and/or loosens type checking. This is incorrect: many of those features are still running in unsafe code. However, because Rust can't fully reason about what's happening (e.g. it doesn't know who owns a raw memory address and when it will be freed), it can't properly enforce all of its rules that guarantee safety, leading to, well, unsafety. (See Unsafe Rust for more on this topic.)

unsafe is a necessary evil. In many Rust programs, you won't have to ever use it. But if you do have to use it, its presence will draw review scrutiny like moths to light. So unlike say C/C++ where practically every memory access is a potential security bug and it is effectively impossible in many scenarios to comprehensively audit for memory safety (if there were there would be no memory safety bugs), using unsafe safely is often viable because scrutiny can be concentrated on its relatively few occurrences. And more experienced Rust programmers know how to encapsulate unsafe into safe wrappers, limiting how much code needs to be audited when code around unsafe changes.
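
Below is a minimal sketch of the safe wrapper idea, using a hypothetical swap_ends function. Safe code could do this with the slice swap method; raw pointers are used here only to show how an unsafe block can sit behind a safe signature:

```rust
// Swaps the first and last elements of a slice.
// Callers use an ordinary safe function and never see the raw pointers.
fn swap_ends(values: &mut [i32]) {
    let len = values.len();
    if len < 2 {
        return; // nothing to swap
    }

    let first = values.as_mut_ptr();
    // SAFETY: `len >= 2`, so offsets 0 and `len - 1` are both inside the slice,
    // and the exclusive borrow of `values` rules out any other access.
    unsafe {
        let last = first.add(len - 1);
        std::ptr::swap(first, last);
    }
}
```

A reviewer only has to check the few lines inside the unsafe block against the SAFETY comment, rather than auditing every caller.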

What I've personally been enlightened by is the myriad of operations that Rust considers unsafe. As you learn more and more Rust, you'll encounter random functions sprinkled across the standard library that are unsafe and you'll wonder why. The docs usually tell you and that's how you learn something new (and maybe horrifying) about how computers actually work.

Fearless Refactoring

A significant portion of the software development lifecycle is evolving existing code. Fixing bugs. Extending existing code with new functionality. Refactoring code to fix bugs or prepare for new features. Using code in new, unplanned ways.

In many code bases, the amount of people time spent evolving the code dwarfs the time for creating actual greenfield code/features. (Unfortunately, quantifying when you are doing evolution versus greenfield coding is quite difficult, so both facets often get lumped together into simply software development time. But in my mind they are discrete - although highly interdependent - units of work and the evolution time tends to dwarf the greenfield time on established projects.) So it follows that long-term evolution/maintainability of code bases is more important than initial code creation time.

There is a sufficient body of industry research demonstrating that the cost to fix defects rises exponentially as you progress through the software development lifecycle (do a search for say software development lifecycle cost of fixing a bug).

Furthermore, human memory functions not unlike multi-tier caches and your ability to recall information will diminish over time. (You probably know what you were doing 5 minutes ago, might remember what you were doing at this time yesterday, and probably have no clue what you were doing on this date 20 years ago.)

In terms of coding, the best way to address a defect is to not introduce it in the first place. If you can't do that, your goal is to detect and correct it as early in the development process as possible, as close as possible to when the source code creating that defect came into existence. Practically, in order of descending desirability:

  1. Don't introduce defect (this is impossible because humans are fallible).
  2. Detect and correct defect as soon as the bad key press occurs (within reason: you don't want the programmer to lose too much flow) (milliseconds later).
  3. At next build / test time (seconds or minutes later).
  4. When code is shared with others (maybe you push a branch and CI tells you something is wrong) (minutes to days later).
  5. During code review (minutes to days later).
  6. When code is integrated (e.g. merged) (minutes to days later).
  7. When code is deployed (minutes to days or even months later).
  8. When a bug is reported long after the code has been deployed (weeks to years later).

The earlier a defect is caught, the better the chances that the author (or other involved parties) have relevant code paged in and can fix it with less effort and with lower chances of introducing additional defects. For me, authoring new code is relatively easy compared to refactoring old code. That's because I have new code fully paged into my brain and I know it like the back of my hand. I know where the sharp edges are and how you'll get cut if you make certain changes. However, if several months pass without revisiting the code, most of that heightened awareness evaporates. If I need to change or review that code, my ability to do that with a high degree of confidence and efficiency is drastically eroded.

Generally speaking, the earlier a defect is caught, the less damage it can do. Ideally, a defect is caught and fixed at local development time, before you burden a reviewer with finding it and certainly before it causes harm or anti-value after being deployed!

In addition, compressing the software development lifecycle allows you to ship enhancements sooner, which enables you to deliver value sooner. This is what we're trying to do as professional programmers after all!

Because the cost to fix a defect rises exponentially as it moves through the software development lifecycle, it follows that you want defect detection to occur logarithmically to offset that cost. That means you want as many defects as possible to be caught as early as possible.

Compared to other programming languages I've used, Rust is exceptional at detecting defects earlier in the development lifecycle and as a result can drastically lower overall development costs. Here are the main factors contributing to this belief:

  • The type system is relatively strong and prevents many classes of bugs.
  • The borrow checker and the rules it enforces prevent safety issues at compile time. Some of these violations can be detected by other languages' compilers. However, in many cases sufficient auditing (like {address, memory, thread} sanitizers) is run much less frequently, often only in CI tests, which can be hours or days later.
  • Confidence that the above 2 function as advertised.
  • Invariants can be encoded and enforced in the type system through features like enums being algebraic data types.
  • Variables are immutable by default and must be explicitly annotated as mutable. This forces you to think about where and how data mutation occurs, enabling you to spot issues sooner.
  • Option<T> significantly curtails the billion dollar mistake.
  • Result<T, E> forces you to reckon with handling errors.
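
The immutable by default point from the list above can be sketched briefly (sum_positive is a hypothetical function):

```rust
fn sum_positive(values: &[i32]) -> i32 {
    // Bindings are immutable unless declared otherwise.
    let limit = 0;
    // limit = 1; // would not compile: cannot assign twice to immutable variable

    // Mutation has to be opted into, which makes it easy to spot in review.
    let mut total = 0;
    for value in values {
        if *value > limit {
            total += *value;
        }
    }
    total
}
```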

The Rust compiler is just exceptional at detecting common defects.

Did your code refactor introduce a use-after-free or dangling reference? Don't worry: the borrow checker will detect that. CVE prevented.

Did you introduce a race condition by performing a mutation somewhere that was previously immutable? The borrow checker will detect that. You potentially just saved hours of time debugging a hard-to-reproduce bug.

Did you add an enum variant but forget to add that variant to a match expression? If you avoided using the catch-all _ pattern, the compiler will tell you match arms aren't exhaustive and give you an error.

Did a value that was previously always defined become nullable? Changing the type from T to Option<T> will yield compiler errors due to type mismatch.

Did an Option<T> that was previously always Some(T) suddenly become None? Hopefully following Rust best practices means your code will just work. In the worst case you get a panic (with a stack trace). But that's on par with say a Java NPE and is strictly better than a null dereference that you get with languages like C/C++.

Did you change or add a function returning Result<T, E> but forget to check if that Result is an Ok(T) or Err(E)? The compiler will tell you.

I could go on. Rust is full of little examples like these where the core language and standard library nudge you towards working code and help detect defects earlier during development, saving vast amounts of time and money later.

The Rust compiler is so good at rooting out problems that many Rust programmers have adopted the expression, if it compiles it works. This statement is obviously falsifiable. But compared to every other programming language I've used, I'm shocked by how often it is true.

For other programming languages, a working compile is the beginning of your verification or debugging journey. For Rust, it often feels like the hard part is over and you are almost done. With other languages, you often have an indefinite number of iterations to fix language defects (like null dereferences or dynamic typing errors) beyond the compile step. You need to address these in addition to any logical/intent defects in your code. And fixing logical/intent defects could introduce more post-compile defects. As a programmer, you just don't know when the process will be done. With Rust, the compiler errors tell you exactly what the language defects are. So by the time you appease the compiler, you are left with just your logical/intent defects. I greatly prefer the Rust workflow which separates these because I'm getting clearer feedback on my progress: I know that once I've addressed all the language defects the compiler complains about that it is just a matter of fixing logical/intent defects. I know I'm a giant step closer to victory.

The Progress Principle is a psychological observation that people tend to prefer a series of many smaller wins over fewer larger wins. And (unexpected) setbacks can more than offset the benefits of wins. (The book is an easy read and I've found its insights applicable to software development workflows.) Whether Rust's language designers realized it or not, Rust's development workflow plays into our psychological dispositions as described by The Progress Principle: defects (setbacks) tend to occur earlier (at compile time), not at unexpected later times (during code review, CI testing, deploy, etc) and our progress towards a working solution is composed of small wins, such as fixing compiler errors and knowing when you transition from language defects into logical/intent defects. For me, this makes iterating on Rust more fulfilling and enjoyable than other languages.

Rust Makes You a Better Overall Programmer

Whether you realize it or not, every programmer has a personal, generalized model of how to program, how to reason about code, best practices, and what not. When we program, we specialize that model to the language and environment/project we're programming for. The mental model that each of us has is shaped by our experience: which languages we know, which concepts we've been exposed to, mistakes we've made, people we've worked with and the practices they've instilled.

If for no other reason, you should learn Rust to expand your generalized model of how to program so that you can apply Rust's principles outside of Rust.

Before I learned Rust, I had a mental model of the lifetimes of various values/variables/memory and how they would be used. If I were coding C, I would attempt to document these in function comments. e.g. if returning a pointer, the comment would say how long the memory behind that pointer lives or who is responsible for freeing it. So when I encountered Rust's ownership and reference rules when learning Rust, they substantially overlapped with my personal mental model of how you should reason about memory in order to avoid bugs. I distinctly remember reading the Rust Book and thinking wow, this seems to be a formalization of some of the concepts and best practices living in my head!

After using Rust for several months, I realized that my prior mental model around reasoning about safe program behavior was woefully incomplete and that Rust's was far superior.

Rust's different ways of doing things will inevitably force you to think about type design, data access patterns, control flow, etc more than most other programming languages. In most other languages, it is much easier to just write runnable code and defer the complexity around ensuring the code is safe/correct and free from certain classes of bugs, like memory access violations and race conditions. Rust's ways of doing things force you to confront many of these problems up-front, before anything runs.

Rust's stricter model and way about authoring software eventually percolates into your personal generalized model of how to program in any programming language. As you internalize patterns needed to program Rust proficiently, you will subconsciously cherry-pick aspects of Rust and apply them when programming in other languages, making you a better programmer in those languages.

For example, when you program C/C++, you will realize the minefield of memory safety issues that linger in those languages. Many of those mines never explode. But knowing Rust and the patterns needed to appease the borrow checker and write safe code, you have a better sense of where the mines are located, the patterns that lead to them exploding, and you can take preemptive steps or apply extra scrutiny to avoid tripping them. (If you are like me, you'll reach the conclusion that C/C++ is intrinsically unsafe and is beyond saving, vowing to avoid it as much as possible because it is just too dangerous to use safely/responsibly.)

Similarly, when programming in any language, you'll probably think more about variable mutability and non-mutability, even if those languages don't have the concept of mutability on variables. You'll be more attuned to certain patterns for mutating data: where mutation occurs, who has a mutable reference, when there are both mutable and non-mutable references in existence. Again, your knowledge from Rust will subconsciously raise your awareness for classes of bugs, making you a better programmer.

The same thing applies to multi-threaded programming and race conditions. After internalizing Rust's model of how to achieve multi-threading safely, you will probably not look at multi-threading in other languages the same way again. If you are like me, you will be horrified by how the lack of Rust's enforced ownership/reference rules predisposes code to so many horrible and hard-to-debug bugs. Again, you will probably find yourself changing your approach to multi-threading to minimize risk.
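To make the multi-threading point concrete, here is a small sketch using only the standard library. The function name parallel_count is hypothetical. The shared counter has to be wrapped in Arc (shared ownership across threads) and Mutex (exclusive access while mutating); handing threads a plain mutable reference to the counter would be rejected by the compiler.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper: increments one shared counter from several threads.
fn parallel_count(threads: usize, increments_per_thread: usize) -> usize {
    // Arc provides shared ownership; Mutex guards the mutation.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            // Each thread gets its own handle to the same counter.
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments_per_thread {
                    // The lock is held only for the duration of this statement.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    // Wait for every thread to finish before reading the result.
    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("total = {}", parallel_count(4, 1000));
}
```

In many other languages the equivalent code compiles fine without the lock and silently loses updates. Here the unsynchronized version does not get past the compiler.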

Fun fact: while at Mozilla I heard multiple anecdotes of [very intelligent] Firefox developers thinking they had found a bug in Rust's borrow checker because they thought it was impossible for a flagged error to occur. However, after sufficient investigation the result was always (maybe with an exception or two because Mozilla adopted Rust very early) that the Rust compiler was correct and the developer's assertions about how code could behave were incorrect. In these cases, the Rust compiler likely prevented hard-to-debug bugs or even exploitable security vulnerabilities. I remember one developer exclaiming that if the bug had shipped, it would have taken weeks to debug and would likely have gone unfixed for years unless its severity warranted staffing.

I strongly feel that I am a better programmer overall after learning Rust because I find myself applying the [best] practices that Rust enforces on me when programming in other languages. For this reason, even if you don't plan to use Rust in any serious capacity, I encourage people to learn Rust because exposure to its ideas will likely transform the ways you think about programming for the better.

Rust Downsides and Dispelling Some Rust Myths

This post has been rather positive about Rust so far. Rust, like everything, is far from perfect and it has its downsides. Professionals know the limitations of their tools and you should know some of the issues you'll run into when using Rust.

In addition, Rust is still a relatively young and unpopular programming language. Since relatively few people know Rust, there are a handful of myths and inaccuracies circling about the language. I'll also dispel some of those here.

Steeper Learning Curve

A common criticism levied against Rust is it is harder to learn than other programming languages. I think this is a valid concern. My experience is Rust took longer to learn and level-up than other languages I've learned recently, notably Go, Kotlin, and Ruby.

I think the primary reason for this is the borrow checker and the rules it enforces. Many programmers have never encountered forced following of ownership and reference rules before and this concept is completely foreign at first. I liken it to a new way to program. If you only have experience with dynamically typed languages that will allow you to compile a ham sandwich, there's a good chance you'll be frustrated by Rust. Rust will likely challenge your conceptions of how programming should work and may frustrate you in the process.

In addition to the borrow checker itself, there are a myriad of types and patterns you'll encounter and eventually need to understand to appease the borrow checker.
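One example of such a pattern, sketched under the assumption that several parts of a single-threaded program need to append to the same list: Rc gives shared ownership and RefCell moves the borrow check to run-time. The function record and the variable names are hypothetical.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical helper: appends a message to a log shared by several owners.
fn record(log: &Rc<RefCell<Vec<String>>>, message: &str) {
    // borrow_mut() enforces exclusive access at run-time and panics
    // if another borrow of the same RefCell is still active.
    log.borrow_mut().push(message.to_string());
}

fn main() {
    let log = Rc::new(RefCell::new(Vec::new()));

    // Two additional owners of the same underlying Vec.
    let log_for_parser = Rc::clone(&log);
    let log_for_writer = Rc::clone(&log);

    record(&log_for_parser, "parsed input");
    record(&log_for_writer, "wrote output");

    println!(
        "{} owners, entries: {:?}",
        Rc::strong_count(&log),
        log.borrow()
    );
}
```

This is only one of several such tools. Knowing when to reach for it, versus restructuring the code so that plain references suffice, is part of the learning curve.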

Beyond the borrow checker, Rust's standard library is comprehensive and offers a lot of types and traits. It will take a while to be exposed to many of them and know when/how to use each.

You will likely be adding 3rd party crates as dependencies to your project for common functionality not (yet) in the standard library. These expand the scope of concepts you need to learn.

I hope I'm not scaring anybody away: you can go pretty far in Rust without encountering or understanding most of the standard library. That being said, every new type, trait, concept, and crate you learn unlocks new possibilities and avenues for delivering value through programming. So there is an incentive to take the time to learn them sooner rather than later.

I learned Rust mostly independently for a personal project. While learning resources such as Learn Rust, the Rust Language Cheat Sheet, and even Clippy are fantastic, in hindsight I probably would have become more proficient sooner had I contributed to an existing Rust project and/or had ongoing technical collaboration with more experienced Rust developers. This is probably no different than any other programming language. But because of Rust's steeper learning curve, I think the benefits of peer exposure are more significant. That being said, I've heard anecdotes of teams with no Rust experience learning Rust together with successful results. So there's no formal recipe for success here.

Finally, despite the steeper learning curve, I'd say the return on investment pays off pretty quickly. As I've argued elsewhere in this post, the Rust compiler and type system help prevent many classes of bugs. So while it may take longer to initially learn and compose idiomatic Rust code, it won't take long for Rust to offset the time that you would have spent chasing bugs, performance optimizations, and the like.

Rust Moves Too Fast

Rust releases a new version every 6 weeks. By contrast, many other programming languages release ~yearly. This faster release cadence has been a common complaint about Rust.

Quickly, I think people conflate release cadence with churn and hardship from that release cadence. Generally speaking, release cadence isn't the thing you care about: it's how disrupted you are from the releases. If your old release continues to work just as well as the new release, release cadence doesn't really matter (many major websites deploy/release dozens of times per day and you don't care because you can't tell: you only care when the UI or behavior changes). So the thing most of us care about is how frequently Rust releases cause disruption. And disruption is often caused by backwards incompatibility and the introduction of new features, which when adopted, force upgrades.

A few years ago, I think the concern that Rust moves too fast was valid: there were significant features in seemingly every release and crates were eager to jump on the new features, forcing you to upgrade if you wanted to keep your dependency tree up to date. I feel like I caught the tail end of this relative chaos in 2018-2019.

But in the last 18-24 months, things seem to have quieted down. Many of the major language features that people were eager to jump on have landed. The only ongoing churn I'm aware of in Rust is in the async ecosystem, and that seems to be stabilizing. New Rust releases are generally pretty quiet in terms of must use features. The last milestone release in my mind was 1.45 in July 2020, which stabilized procedural macros. The community was pretty quick to jump on that feature/release. My Rust projects have targeted 1.45+ for a while now with minimal issues.

9 months with no major disruptions is on par with the release cadence of other programming languages.

In my opinion, the concern that Rust moves too fast, while once valid, no longer generally applies. Pockets of truth for segments of users caring about niche and lesser-used features, yes. But nothing that applies to the entire Rust ecosystem.

Compiling Is Too Slow

A lot of people have commented that Rust builds take too long. It is true: compiling Rust tends to take longer than C/C++, Go, Java, and other languages requiring an ahead-of-time compile step.

While a lot has been done to make the Rust compiler faster (it feels substantially faster than it was a few years ago), it still isn't as fast as other languages.

Not to dismiss the problem, but in a lot of cases, the speed of Rust compilation is fast enough. Incremental builds for small libraries or programs will take a few hundred milliseconds to a second or two. I suspect most of the people complaining about build times today are developing very large Rust programs (tens of thousands of lines of code and/or hundreds of dependencies).

A contributing problem to build times is dependency count. The simplicity of Cargo makes it very easy to accumulate dependencies in Rust and each additional crate will slow your build down. PyOxidizer has ~400 dependencies at this point in time, for example (I've been throwing the kitchen sink at it in terms of features).

There are a few things under your control to mitigate this problem.

First, install sccache, a transparent compiler cache. By default it caches to the local filesystem. But you can also point it at Redis, Memcached, or blob stores in AWS, Azure, or GCP. Firefox's CI uses an S3 backed cache and the hit rate (for both Rust and C/C++) is 90-99% on nearly every build. For PyOxidizer - a medium sized Rust project - sccache reduces full build times from ~53s wall and ~572s CPU to ~32s wall and ~225s CPU on my 16 core Ryzen 5950X. The wall time savings on a lower CPU core count machine are even more significant.

Speaking of CPU core counts, the second thing you can do is give yourself access to more CPU cores. Laptops tend to have at most 4 CPU cores. Consider buying desktops or moving builds to remote machines, often with dozens of CPU cores. This requires spending money. But when you factor in people time saved and the cost of that time and the value of someone's happiness/satisfaction, it can often be justified.

I'm not trying to dismiss the problems that slow builds can impose, but if you want to justify their cost, you can argue that the Rust compiler does more at compilation time than other languages and that this overhead brings benefits, such as preventing bugs earlier in the software development lifecycle. There's no such thing as a free lunch and Rust's relatively slower builds are a tax you pay for the correctness the compiler guarantees. To me, that's a justifiable trade-off.

Rust is Too Young or Isn't Production Ready

The isn't production ready concern is likely disproven by the existence of Rust in production in critical roles at a sufficient number of reputable companies. At this point, there are very few technical reasons to say Rust isn't production ready. Non-technical reasons such as lack of organizational knowledge or a limited talent pool for hiring from, yes. But little on the technical front.

The too young part is ultimately a judgement call for how comfortable you are with new technologies.

I'm generally pretty conservative/skeptical about adopting new technology. If you are in this industry long enough you eventually get humbled by your exuberance.

I was probably in the Rust is too young boat as late as 2017, maybe 2018. While I was cheering on Rust as a Mozillian, I was skeptical it was going to take off. Birthing successful languages is hard. The language still seemed to move too fast and have too many missing features. Things seemed to stabilize around the 2018 edition. That's also when you started commonly hearing of companies adopting Rust. Lots of startups at first. Then big companies started joining in.

Today, companies you have heard of like Amazon, Cloudflare, Discord, Dropbox, Facebook, Google, and Microsoft are adopting Rust to varying degrees. There are 58,750 published crates on crates.io.

I won't drop names, but I've heard of Rust spreading like wildfire at some companies you've heard of. The stories are pretty similar: random person or team wants to try Rust. Something small and isolated with a minimal blast radius in case of disaster is tried first. Rust is an overwhelming success. As more and more people are exposed to Rust, they see the light, cries for Rust become louder, and it becomes even more widely adopted.

The I'm Writing Fewer Bugs Trap

When I program in Rust, I strongly feel that my base rate of defect introduction is substantially lower than in other programming languages. I have confidence that the Rust compiler coupled with practices like encoding and enforcing invariants in the type system leads to fewer defects. In some cases I feel like the surface area for bugs is limited to logical defects, which are mis-expressions of the human programmer's intent. And since no automated tool can reliably scan for human intent, there's no way to prevent logical bugs, and that surface area is the best we can ever expect from automated scanning.
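As an illustration of what encoding an invariant in the type system can look like, here is a minimal sketch. The Percent type and its module are hypothetical. The idea is that the only way to obtain a value is through a checked constructor, so every function receiving a Percent can rely on it being in range without re-validating.

```rust
mod percent {
    // Hypothetical type: a percentage that is always within 0..=100.
    // The inner field is private, so code outside this module cannot
    // construct an out-of-range value.
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct Percent(u8);

    impl Percent {
        // The single place where the invariant is checked.
        pub fn new(value: u8) -> Option<Percent> {
            if value <= 100 {
                Some(Percent(value))
            } else {
                None
            }
        }

        pub fn get(self) -> u8 {
            self.0
        }

        // Cannot underflow because the invariant guarantees self.0 <= 100.
        pub fn remaining(self) -> Percent {
            Percent(100 - self.0)
        }
    }
}

use percent::Percent;

fn main() {
    let done = Percent::new(30).expect("30 is a valid percentage");
    println!("{}% done, {}% remaining", done.get(), done.remaining().get());

    // let bogus = Percent(250); // would not compile: the field is private
    assert!(Percent::new(250).is_none());
}
```

What remains possible is the logical kind of bug: passing a valid Percent that simply is not the one you meant.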

Knowing what tests to write and how much effort to invest in test writing is a difficult skill to level up and is full of trade-offs. With Rust, I find myself writing fewer tests than in other languages because I have confidence that the compiler will detect issues that would otherwise require explicit testing.

I feel that my beliefs and practices are rooted in reality and justifiable. Yet I recognize the danger in placing too much faith in my tools, in Rust.

In theory, Rust alleviates the need for running additional verification tools, like {address, memory, thread} sanitizers because the safe subset of Rust prevents the issues these tools detect. Many defects caught by fuzzing are also similarly prevented by the design of Rust (but not all: fuzzing is generally a good idea).

What I'm trying to say is that it is really easy to fall into a trap where you are over-confident about the abilities of Rust to prevent defects and you find yourself letting your guard down and not maintaining testing and other verification best practices.

I'm still evolving my beliefs in this area. But my general opinion is that you should still run things like {address, memory, thread} sanitizers and fuzzing because unsafe likely exists somewhere in the compiled code, as likely does C or assembly code. And because a chain is only as strong as its weakest link, it only takes one bug to undermine the safety of the entire system. So while these additional verification tools likely won't find as many issues as they would in unsafe languages, I still think it is a good idea to continue to run them against Rust, especially for high value code bases.

Error Handling

Result<T, E> isn't a panacea. Because errors are full on types rather than simple primitives like integers, you need to spend effort reasoning and coding about how different error types interact. And often you need to write a bit of boilerplate code to facilitate that interaction. This can cancel out a lot of the efficiency benefits of Rust's ? operator for handling errors.
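The boilerplate in question tends to look something like the following sketch, which uses only the standard library. The ConfigError type and the parse_port function are hypothetical. The From implementation is what lets the ? operator convert a ParseIntError into the custom error type automatically.

```rust
use std::fmt;
use std::num::ParseIntError;

// Hypothetical application error type wrapping the failures we care about.
#[derive(Debug)]
enum ConfigError {
    Missing(String),
    BadNumber(ParseIntError),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::Missing(key) => write!(f, "missing key: {}", key),
            ConfigError::BadNumber(err) => write!(f, "bad number: {}", err),
        }
    }
}

impl std::error::Error for ConfigError {}

// This conversion is the glue that `?` relies on below.
impl From<ParseIntError> for ConfigError {
    fn from(err: ParseIntError) -> Self {
        ConfigError::BadNumber(err)
    }
}

// Hypothetical parser for input of the form "port=8080".
fn parse_port(line: &str) -> Result<u16, ConfigError> {
    let value = line
        .strip_prefix("port=")
        .ok_or_else(|| ConfigError::Missing("port".to_string()))?;
    // `?` converts ParseIntError into ConfigError via the From impl.
    let port = value.trim().parse::<u16>()?;
    Ok(port)
}

fn main() {
    for line in ["port=8080", "port=abc", "host=example"] {
        match parse_port(line) {
            Ok(port) => println!("port {}", port),
            Err(err) => println!("error: {}", err),
        }
    }
}
```

Every additional underlying error type needs another enum variant, another Display arm, and another From implementation, which is the repetition that third-party error handling crates aim to reduce.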

There are a handful of 3rd party Rust crates specializing in error handling that you're likely to encounter. These include anyhow, error-chain, failure, and thiserror.

Rust's error handling landscape can at times feel fragmented and make you yearn for something more defined/opinionated in the standard library. The Rust Community recognizes that this is an area that can be improved and has formed an error handling project group to improve this space. So hopefully we see some quality of life improvements to error handling in time.

Conclusion

I am irrationally effusive about Rust. When I see this level of excitement in others, I am extremely skeptical. I was skeptical myself when my former colleagues at Mozilla were talking up Rust years ago. But having used Rust for 2.5 years now and authored tens of thousands of lines of Rust code, the initial relationship euphoria has worn off and I am most definitely in love.

Cynically, Rust has ruined programming in other languages for me. Less cynically, Rust has spoiled me.

When I look at other languages without the rules enforced by Rust's borrow checker, all I see are sharp edges waiting to materialize into bugs.

When I look at other languages with weaker type systems, I think about all the time I spend having to defend against invariant violations and how much cognitive load and programming/review effort I need to incur to maintain the baseline of quality that I get with Rust.

When I look at programming languages like Python, Ruby, and TypeScript where you can bolt a type system onto a language that doesn't have it, I think why would I want to do that when I can use an even better type system while likely achieving much better performance with Rust? (It's tempting to reach for a metaphor involving lipstick and pigs.)

When I look at other languages, I generally see the same pile of decades old ideas packaged in different boxes. Some of these ideas are good and probably timeless (e.g. functions and variables). Some are demonstrably bad and should be largely excised from common use (e.g. null references - the billion dollar mistake).

When I interface with Rust's tooling, I feel like it is respectful of my time and has my best interests (producing working software) at heart. I feel the maintainers of the tooling care about me.

When I program in Rust, I feel that I'm producing fewer defects overall. The compiler is catching defects that would otherwise be caught later in the software development lifecycle, where they would increase software development costs.

When I interact with Rust's community of people, respect and empathy abounds.

Does Rust have its problems and limitations? Of course it does: nothing is perfect! But in my opinion, its trade-offs are often strictly better than those found in other programming languages I've used.

At the end of the day, Rust is a programming language and therefore a tool. Adept professionals know not to get too attached to their tools: ultimately it is the value you deliver, not how you deliver it. (Of course the choice of tools can significantly impact the quality and timeline of value delivery!) Will my thoughts on Rust and preferred languages change over time as the landscape shifts? Of course they will! But for the time being, Rust brings so much to the table that its competition lacks that I'm overly excited about Rust and its ability to advance the state of software/programming and therefore the industry.

In closing, my current CTO uses the phrase commitment to craft as a desired mindset for their technical organization. That phrase translates to various themes: higher quality / lower defect rate, build with the long-term in mind, implement efficient solutions, etc. Like an artist reaches for a preferred paintbrush or a chef for a preferred knife because their preferred tool enables them to better express their craft, I feel that Rust often enables me to better express the potential of my professional craft more than other programming languages. I strongly feel that Rust predisposes software to higher quality outcomes - both in terms of defect rate and run-time efficiency - while also reducing total development and execution costs over the entire software development lifecycle. That makes Rust my first choice language - my go-to tool - for many new projects at this point in time. If you likewise value commitment to craft, I urge you to explore Rust so that you too can better harness the potential of our programming craft.

But don't take my word for it: read what 42 companies using Rust in production have to say.


Building Standalone Python Applications with PyOxidizer

June 24, 2019 at 09:00 AM | categories: Python, PyOxidizer, Rust

Python application distribution is generally considered an unsolved problem. At their PyCon 2019 keynote talk, Russell Keith-Magee identified code distribution as a potential black swan - an existential threat for longevity - for Python. In their words, Python hasn't ever had a consistent story for how I give my code to someone else, especially if that someone else isn't a developer and just wants to use my application. I completely agree. And I want to add my opinion that unless your target user is a Python developer, they shouldn't need to know anything about Python packaging, Python itself, or even the existence of Python in order to use your application. (And you can replace Python in the previous sentence with any programming language or software technology: most end-users don't care about the technical implementation, they just want to get stuff done.)

Today, I'm excited to announce the first release of PyOxidizer (project, documentation), an open source utility that aims to solve the Python application distribution problem! (The installation instructions are in the docs.)

Standalone Single File, No Dependencies Executable Python Applications

PyOxidizer's marquee feature is that it can produce a single file executable containing a fully-featured Python interpreter, its extensions, standard library, and your application's modules and resources. In other words, you can have a single .exe providing your application. And unlike other tools in this space which tend to be operating system specific, PyOxidizer works across platforms (currently Windows, macOS, and Linux - the most popular platforms for Python today). Executables built with PyOxidizer have minimal dependencies on the host environment and do not do anything complicated at run-time. I believe PyOxidizer is the only open source tool to have all these attributes.

On Linux, it is possible to build a fully statically linked executable. You can drop this executable into a chroot or container where it is the only file and it will just work. On macOS and Windows, the only library dependencies are on always-present or extremely common libraries. More details are in the docs.

At execution time, binaries built with PyOxidizer do not do anything special to run the Python interpreter. (Other tools in this space do things like create a temporary directory or SquashFS filesystem and extract Python to it.) PyOxidizer loads everything from memory and there is no explicit I/O being performed. When you import a Python module, the bytecode for that module is being loaded from a memory address in the executable using zero-copy. This makes PyOxidizer executables faster to start and import - faster than a python executable itself!

Current Release and Future Roadmap

Today's release of PyOxidizer is just the first release milestone in what I envision is a long and successful project history. While my over-arching goal with PyOxidizer is to solve vast swaths of the Python application distribution problem, I want to be clear that this first release comes nowhere close to doing so. I toiled over which features must be in the initial release. I ultimately decided that PyOxidizer's current functionality is extremely valuable to some audiences and that the project has matured to the point where more eyeballs and users would substantially help its development. (I could definitely use some help prioritizing which features to work on and for that I need users and user feedback.)

In today's release, PyOxidizer is good at producing executables embedding Python. It doesn't yet venture too far into the distribution part of the problem (I want it to be trivial to produce MSI installers, DMG images, deb/rpm packages, etc). But on Linux, this is already a huge step forward because PyOxidizer makes it easy (hopefully!) to produce binaries that should just work on other machines. (Anyone who has attempted to distribute Linux applications will tell you how painful this problem can be.)

Despite its limitations, I believe today's release of PyOxidizer to be a viable tool for some applications. And I believe PyOxidizer can start to replace existing tools in this space. (See the Comparisons to Other Tools document for how PyOxidizer compares to other Python packaging and distribution tools.)

Using today's release of PyOxidizer, larger user-facing applications using Python (like Dropbox, Kodi, MusicBrainz Picard, etc) could use PyOxidizer to produce self-contained executables. This would likely cut down on installer size, decrease install/update time (fewer files means faster operations), and hopefully make packaging simpler for application maintainers. Maintainers of Python utilities could produce self-contained executables, making their utilities faster to start and easier to package and distribute.

New Possibilities and Reliability for Python

By enabling support for self-contained, single file Python applications, PyOxidizer opens exciting new doors for Python. Because Python has historically required an explicit, separate runtime not part of the executable, Python was not viable (or was a hindrance) in many domains. For example, if you wanted to use Python to bootstrap a fresh server or empty container environment, you had a chicken-and-egg problem because you needed to install Python before you could use it.

Let's take Ansible for example. One of Ansible's features is that it remotes into a machine and runs things. The way it does this is it dynamically generates Python scripts locally, uploads them to the remote machine, and tells the remote to execute them. Those Python scripts require the existence of a Python interpreter on the remote machine. This means you need to install Python on a machine before you can control it with Ansible. Furthermore, because the remote's Python isn't under Ansible's control, you can assume very little about its behavior and capabilities, making interaction a bit brittle.

Using PyOxidizer, projects like Ansible could produce a self-contained executable containing a Python interpreter. They could transfer that single binary to the remote machine and execute it, instantly giving the remote machine access to a fully-featured and modern Python interpreter. From there, the sky is the limit. In Ansible's case, the executable could contain the full Ansible runtime, along with any 3rd party Python packages they wanted to leverage. This would allow execution to occur (possibly mostly independently) on the remote machine. This architecture is simpler, scales better, would likely result in faster operations, and would probably improve the quality of life for everyone involved, from application developers to its end users.

Self-contained Python applications built with PyOxidizer essentially solve the Python interpreter bootstrapping and reliability problems. By providing a Python interpreter and a known set of Python modules, you provide a highly deterministic and reliable execution environment for your application. You don't need to fret about which version of Python is installed: you know which version of Python you are using. You don't need to worry about which Python packages are installed: you control explicitly which packages are available. You don't need to worry about whether you are running in a virtualenv, what sys.path is set to, whether .pth files come into play, whether various PYTHON* environment variables can mess up your application, whether some Linux distribution packaged Python differently, what to put in your script's shebang, etc: executables built with PyOxidizer behave as you have instructed them to because they are compiled that way.

All of the concerns in the previous paragraph contribute to a larger problem in the eyes of application maintainers that can be summarized as Python isn't reliable. And because Python isn't reliable, many people reach the conclusion that Python shouldn't be used (this is the black swan that was referred to earlier). With PyOxidizer, the Python environment is isolated and highly deterministic making the reliability problem largely go away. This makes Python a more viable technology choice. And it enables application maintainers to aggressively adopt modern Python versions, utilize third party packages fearlessly, and spend far less time chasing an extremely long tail of issues related to Python environment variance. Succinctly, application developers can focus on building great applications instead of toiling with Python environment problems.

Project Status

PyOxidizer is still in its relative infancy. While it is far from feature complete, I'm mentally committed to working on the remaining major functionality. The Status document lists major missing functionality, lesser missing functionality, and potential future value-add functionality.

I want PyOxidizer to provide a Python application packaging and distribution experience that just works with minimal cognitive effort from Python application maintainers. I have spent a lot of effort documenting PyOxidizer. I care passionately about user experience and want everything about PyOxidizer to be simple and frustration free. I know things aren't there yet. The problems that PyOxidizer is attempting to solve are hard (that's a reason nobody has solved them well yet). I know there are details floating around in my head that haven't been added to the documentation yet. I know there are missing features and bugs in PyOxidizer. I know there are Packaging Pitfalls yet to be discovered.

This is where you come in.

I need your help to make PyOxidizer great. I encourage Python application maintainers reading this to head over to Getting Started and the Packaging User Guide and try to package your applications with PyOxidizer. If things don't work, let me know by filing an issue. If you are confused by lack of or unclear documentation, file an issue. If something frustrates you, file an issue. If you want to suggest I work on a certain feature or fix a bug, file an issue! Tweet to @indygreg to engage with me there. Join the pyoxidizer-users mailing list. While I feel PyOxidizer is usable today (that's why I'm announcing it), I need your feedback to help guide future prioritization.

Finally, I know PyOxidizer has significant implications for some companies and projects that use Python. While I'm not looking to enrich myself or make my livelihood from PyOxidizer, if PyOxidizer is useful to you and you'd like to send money my way as appreciation, you can do so on Patreon or PayPal. If not, that's totally fine: I wouldn't be making PyOxidizer open source if I didn't want to share it with the world for free! And I am financially well off as well. I just feel like there should be more financial contribution to open source because it would improve the health of the ecosystem and I can help achieve that end by advocating for it and giving myself.

Leveraging Rust

The oxidize part of PyOxidizer comes from Rust. (See the Wikipedia Rust article - for the chemical not the programming language - to understand where oxidize comes from.) The build time packaging and building functionality is implemented in Rust. And the binary that embeds and controls the Python interpreter in built applications is Rust code. Rationale for these decisions is explained in the FAQ.

This is my first non-toy project using Rust and I have to say that Rust is... incredible! I may have to author a dedicated blog post extolling the virtues of Rust. In short, Rust is now my go-to language for systems level projects. Unless you need the target platform versatility, I don't think C or C++ are defensible choices in 2019 given their security deficiencies. Languages like Go, Java, and various JVM or CLR languages are acceptable if you can tolerate having a garbage collector and/or a larger runtime. But what makes Rust superior in my mind is the ability for the compiler to prevent large classes of software bugs (especially those that turn into CVEs) and inefficiencies that have plagued our industry for decades. Rust is the first programming language I've used where I feel like the language itself, the compiler, the tools around it (cargo, rustfmt, clippy, rustup, etc), and the community surrounding it all actually care about and assist me with writing high quality software. Nothing else I've used comes even close.

What has surprised me most about Rust is how high level it feels for a systems level language that isn't garbage collected. When you program lower-level languages like C or C++, compared to a higher level language like Python, you have to type a lot more and be more explicit in nearly everything you do. While Rust is certainly not as expressive or compact as say Python, it is far, far closer to Python than I was expecting it to be. Yes, you do have to type more and think more about your code to appease the Rust compiler's constraints. But the return on that investment is the compiler preventing entire classes of bugs and C/C++ levels of performance. When I started PyOxidizer, the build time logic was implemented in Python and only the run-time pieces were in Rust. After learning a bit more Rust and realizing the obvious code quality benefits, I ditched Python and adopted Rust for the build time logic. And as the code base has grown and gone through various refactorings, I am so glad I did so! The Rust compiler has caught dozens of would-be bugs in Python. Granted, many of these can be attributed to having strong typing and compile time type checking and Rust is little different than say Java on this front. But a significant number of prevented bugs covered invariants in the code because of the way Rust's type system often intersects with control flow. e.g. match arms must be exhaustive, so you can't have unhandled values/types and unchecked Result instances result in a compiler warning. And clippy has been just fantastic helping to guide me towards writing more acceptable code following community accepted best practices.
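The two compiler behaviors mentioned at the end of that paragraph can be sketched as follows. The Target enum and both functions are hypothetical and are not taken from PyOxidizer's code.

```rust
// Hypothetical enum of build targets.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Target {
    Linux,
    MacOs,
    Windows,
}

// The match must cover every variant. Adding a new variant to Target
// without updating this function is a compile error, not a run-time bug.
fn executable_suffix(target: Target) -> &'static str {
    match target {
        Target::Linux | Target::MacOs => "",
        Target::Windows => ".exe",
    }
}

// Hypothetical validation returning a Result.
fn check_name(name: &str) -> Result<(), String> {
    if name.is_empty() {
        Err("name must not be empty".to_string())
    } else {
        Ok(())
    }
}

fn main() {
    // check_name("app"); // compiles, but the compiler warns about the unused Result

    // Handling the Result explicitly silences the warning.
    if let Err(err) = check_name("app") {
        eprintln!("invalid name: {}", err);
    }

    println!("app{}", executable_suffix(Target::Windows));
    println!("app{}", executable_suffix(Target::Linux));
    println!("app{}", executable_suffix(Target::MacOs));
}
```

Deleting the Target::Windows arm makes the program fail to compile with a non-exhaustive patterns error, which is the kind of check a dynamically typed implementation would only surface at run-time, if at all.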

Even though PyOxidizer is implemented in Rust, most end-users shouldn't have to care (beyond having to install a Rust compiler and build PyOxidizer from source). The existence of Rust should be abstracted away from Python packagers. I did this on purpose because I believe that users of an application shouldn't have to care about the technical implementation of that application. It is a bit unfortunate that I force users to install Rust before using PyOxidizer, but in my defense the target audience is technically savvy developers, bootstrapping Rust is easy, and PyOxidizer is young, so I think it is acceptable for now. If people get hung up on it, I can provide pre-compiled pyoxidizer executables.

But if you do know Rust, PyOxidizer being implemented in Rust opens up some exciting possibilities!

One exciting possibility with PyOxidizer is the ability to add Rust code to your Python application. PyOxidizer works by generating a default Rust application (main.rs) that simply instantiates and runs an embedded Python interpreter then exits. It essentially does what python or a Python script would do. The key takeaway here is your Python application is technically a Rust application (in the same way that python is technically a C application). And being a Rust application means you can add Rust code to that application. You can modify the autogenerated main.rs to do things before, during, and after the embedded Python interpreter runs. It's a regular Rust program and can do anything that Rust programs can do!

Another possibility - and variant of above - is embedding Python in existing Rust projects. PyOxidizer's mechanism for embedding a Python interpreter is implemented as a standalone Rust crate. You can add the pyembed crate to an existing Rust project and, a little build system magic later, your Rust project can embed and run a Python interpreter!

There's a lot of potential for hybrid Rust + Python programs. And I am very excited about the possibilities.

If you are a Rust programmer, PyOxidizer allows you to easily embed Python in your Rust application. If you are a Python programmer, PyOxidizer allows you to easily leverage Rust in your Python application. In short, the package ecosystem of the other becomes available to you. And if you aren't familiar with Rust, there are some potentially crazy possibilities. For example, Alacritty is a GPU accelerated terminal emulator written in Rust and Servo is an entire web browser engine written in Rust. With PyOxidizer, you could integrate a terminal emulator or browser engine as part of your Python application if you really wanted to. And, yes, Rust's packaging tools are so good that stuff like this tends to just work. As a concrete example, the pyoxidizer CLI tool contains libgit2 for performing in-process interactions with Git repositories. Adding this required a single line change to a Cargo.toml file and it just worked on Linux, macOS, and Windows. Stuff like this often takes hours to days to integrate in C/C++. It is quite ridiculous how easy it is to add (complex) components to Rust projects!

For years, Python projects have implemented extensions in C to realize performance wins. If your Python application is a Rust executable, then implementing this functionality in Rust (rather than C) seems rational. So we may see oxidized Python applications have their performance critical pieces slowly be rewritten in Rust. (Honestly, the Rust crates to interface between Rust and the CPython API still leave a bit to be desired, so the experience of writing this Rust code still isn't great. But things will certainly improve over time.)

This type of inside-out split language work has been practiced in Python for years. What PyOxidizer brings to the table is the ability to more easily port code outside-in. For example, you could implement performance-critical, early application logic such as config file parsing and command line argument parsing in Rust. You could then have Rust service some application functionality without Python. Why would you want this? Performance is a valid reason. Starting a Python interpreter, importing modules, and running code can consume several dozen or even hundreds of milliseconds. If you are writing performance sensitive applications, the existence of any Python can add enough latency that people no longer perceive the interaction as instantaneous. This added latency can make Python totally inappropriate for some contexts, such as for programs that run as part of populating your shell's prompt. Writing such code in Rust instead of Python dramatically increases the probability that the code is fast and likely delivers stronger correctness guarantees courtesy of Rust's compile time validation as well!
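To get a feel for this overhead on your own machine, here is a rough sketch that times interpreter startup with and without a few imports. It assumes the interpreter running the script (sys.executable) is the one you care about. The helper name startup_seconds and the choice of imported modules are illustrative, not anything PyOxidizer provides, and the numbers will vary a lot between machines.

```python
import subprocess
import sys
import time


def startup_seconds(args=("-c", "pass"), runs=5):
    """Return the best wall-clock time (seconds) for launching the interpreter.

    Taking the minimum over several runs reduces noise from a cold disk cache.
    """
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, *args], check=True)
        best = min(best, time.perf_counter() - start)
    return best


# A bare interpreter start versus one that also imports a few modules.
bare = startup_seconds()
with_imports = startup_seconds(("-c", "import json, argparse, subprocess"))

print(f"bare interpreter: {bare * 1000:.1f} ms")
print(f"with imports:     {with_imports * 1000:.1f} ms")
```

If either number is in the tens of milliseconds, a tool invoked on every shell prompt will already feel sluggish.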

An extreme practice of outside-in porting of Python to Rust would be to incrementally rewrite an entire Python application in Rust. Rust's ergonomics are exceptional and I do think we'll see people choose Rust where they previously would have chosen Python. I've done this myself with PyOxidizer and feel it is a very defensible decision to reach! I feel a bit conflicted releasing a tool which may undermine Python's popularity by encouraging use of Rust over Python. But at the end of the day, PyOxidizer increases the utility of both Python and Rust by giving each readier access to the other, and PyOxidizer improves the overall utility of Python by improving the application distribution story. I have no doubt PyOxidizer is a net benefit for the Python ecosystem, even if it does help usher in more people choosing Rust over Python. If I have an ulterior motive in developing PyOxidizer, it is to enable Mercurial's official distribution to be a Rust executable and for some functionality (like hg status) to be runnable without Python (for performance reasons).

Another possible use of PyOxidizer is as a library. All the build time functionality of PyOxidizer exists in a Rust crate. So, you can add the pyoxidizer crate to your own Rust project and use its code to do things like build a library containing Python, compile Python source modules to bytecode, or walk a directory tree and find Python resources within. The code is still heavily geared towards PyOxidizer and there's no promise of API stability. But this potential for library usage exists and if others want to experiment with building custom Python binaries not using the pyoxidizer CLI tool, using PyOxidizer as a library might save you a lot of time.

Standalone Python Distributions

One of the most time consuming parts of building PyOxidizer was figuring out how to build self-contained Python distributions. Typically, a Python build consists of a library, shared libraries for various extension modules, shared libraries required by the prior items, and a hodgepodge of other files, such as .py files implementing the Python standard library. The python-build-standalone project was created to automate creating special builds of Python which are self-contained and distributable. This requires doing dirty things with build systems. But I don't want to inflict the details on you here. What I do think is worth mentioning is how those Python distributions are distributed. The output of the build is a tarball containing the Python installation, build artifacts that can be used to link a custom libpython, and a PYTHON.json file describing the contents of the distribution. PyOxidizer reads the PYTHON.json file and learns how it should interact with that distribution. If you produce a Python distribution conforming to the format that python-build-standalone defines, you can use that Python with PyOxidizer.

While I have no urgency to do so at this time, I could see a future where this Python distribution format is standardized. Then maintainers of various Python distributions (CPython, PyPy, etc) would independently produce their own distributable artifacts conforming to this standard, in turn allowing machine consumers of Python distributions (such as PyOxidizer) to easily consume different Python distributions and do interesting things with them. You could even imagine these Python distribution archives being readily available as packages in your system's package manager and their locations exposed via the sysconfig Python module, making it easy for tools (like PyOxidizer) to find and use them.

Over time, I could see PyOxidizer's functionality rolling up into official packaging tools like pip, which would know how to consume the distribution archives and produce an executable containing a Python interpreter, required Python modules, etc.

Getting PyOxidizer's functionality rolled into official Python packaging tools is likely years away (if it ever happens). But I think standardizing a format that describes a Python distribution and (optionally) contains build artifacts that can be used to repackage it is a prerequisite and would be a good place to start this journey. I would certainly love for Python distributions (like CPython) to be in charge of producing official repackagable distributions because this is not something I want to be in the business of doing long term (I'm lazy, less equipped to make the correct decisions, and there are various trust and security concerns). And while I'm here, I am definitely interested in upstreaming some of the python-build-standalone functionality into the existing CPython build system because coercing CPython's build system to produce distributable binaries is currently a major pain and I'd love to enable others to do this. I just haven't had time nor do I know if the patches would be well received. If a CPython maintainer wants to get in touch, I'd love to have a conversation!

Conclusion

I started hacking on PyOxidizer in November 2018. After months of chipping away at it, I think I finally have a useful utility for some audiences. There are still a lot of missing features and some rough edges. But the core functionality is there and I'm convinced that PyOxidizer or its underlying technology could be an integral part of solving Python's application distribution black swan problem. I'm particularly proud of the hacks I concocted to coerce Python into importing module bytecode from memory using zero-copy. Those are documented in this blog post and in the pyembed crate docs.

So what are you waiting for? Head on over to the documentation, install PyOxidizer, and let me know how it goes by filing issues!

I hope you enjoy oxidizing your Python applications!


PyOxidizer Support for Windows

January 06, 2019 at 10:00 AM | categories: Python, PyOxidizer, Rust

A few weeks ago I introduced PyOxidizer, a project that aims to make it easier to produce completely self-contained executables embedding a Python interpreter (using Rust). A few days later I observed some PyOxidizer performance benefits.

After a few more hacking sessions, I'm very pleased to report that PyOxidizer is now working on Windows!

I am able to produce a standalone Windows .exe containing a fully featured CPython interpreter, all its library dependencies (OpenSSL, SQLite, liblzma, etc), and a copy of the Python standard library (both source and bytecode data). The binary weighs in at around 25 MB. (It could be smaller if we didn't embed .py source files or stripped some dependencies.) The only DLL dependencies of the exe are vcruntime140.dll and various system DLLs that are always present on Windows.

Like I did for Linux and macOS, I produced a Python script that performs ~500 import statements for the near entirety of the Python standard library. I then ran this script with both the official 64-bit Python distribution and an executable produced with PyOxidizer:

# Official CPython 3.7.2 Windows distribution.
$ time python.exe < import_stdlib.py
real    0m0.475s

# PyOxidizer with non-PGO CPython 3.7.2
$ time target/release/pyapp.exe < import_stdlib.py
real    0m0.347s

Compared to the official CPython distribution, a PyOxidizer executable can import almost the entirety of the Python standard library ~125ms faster - or ~73% of original. In terms of the percentage of speedup, the gains are similar to Linux and macOS. However, there is substantial new process overhead on Windows compared to POSIX architectures. On the same machine, a hello world Python process will execute in ~10ms on Linux and ~40ms on Windows. If we remove the startup overhead, importing the Python standard library runs at ~70% of its original time, making the relative speedup on par with that seen on macOS + APFS.
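The arithmetic behind those percentages can be reproduced with a few lines. This is only a sketch: it takes the two "real" times from the listing above and assumes that the same ~40ms of process startup overhead applies to both binaries, which is a simplification I am making for illustration.

```python
# Wall-clock ("real") times in seconds, taken from the listing above.
cpython = 0.475
pyoxidizer = 0.347

# Approximate cost of a hello world Python process on Windows, per the text.
# Assumption: both executables pay roughly the same fixed startup cost.
startup = 0.040

# Absolute saving in milliseconds.
delta_ms = (cpython - pyoxidizer) * 1000

# PyOxidizer's run time as a fraction of the CPython run time.
raw_ratio = pyoxidizer / cpython

# The same fraction once the fixed startup cost is removed from both sides.
adjusted_ratio = (pyoxidizer - startup) / (cpython - startup)

print(f"saved: {delta_ms:.0f} ms")
print(f"fraction of original: {raw_ratio:.3f}")
print(f"fraction of original without startup: {adjusted_ratio:.3f}")
```

Subtracting a fixed cost from both sides always pushes the ratio further from 1, which is why the startup-adjusted figure looks a little better than the raw one.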

Windows support is a major milestone for PyOxidizer. And it was the hardest platform to make work. CPython's build system on Windows uses Visual Studio project files. And coercing the build system to produce static libraries was a real pain. Lots of CPython's build tooling assumes Python is built in a very specific manner and multiple changes I made completely break those assumptions. On top of that, it's very easy to encounter problems with symbol name mismatch due to the use of __declspec(dllexport) and __declspec(dllimport). I spent several hours going down a rabbit hole learning how Rust generates symbols on Windows for extern {} items. Unfortunately, we currently have to use a Rust Nightly feature (the static-nobundle linkage kind) to get things to work. But I think there are options to remove that requirement.

Up to this point, my work on PyOxidizer has focused on prototyping the concept. With Windows out of the way and PyOxidizer working on Linux, macOS, and Windows, I have achieved confidence that my vision of a single executable embedding a full-featured Python interpreter is technically viable on major desktop platforms! (BSD people, I care about you too. The solution for Linux should be portable to BSD.) This means I can start focusing on features, usability, and optimization. In other words, I can start building a tool that others will want to use.

As always, you can follow my work on this blog and by following the python-build-standalone and PyOxidizer projects on GitHub.


Faster In-Memory Python Module Importing

December 28, 2018 at 12:40 PM | categories: Python, PyOxidizer, Rust

I recently blogged about distributing standalone Python applications. In that post, I announced PyOxidizer - a tool which leverages Rust to produce standalone executables embedding Python. One of the features of PyOxidizer is the ability to import Python modules embedded within the binary using zero-copy.

I also recently blogged about global kernel locks in APFS, which make filesystem operations slower on macOS. This was the latest wrinkle in a long battle against Python's slow startup times, which I've posted about on the official python-dev mailing list over the years.

Since I announced PyOxidizer a few days ago, I've had some productive holiday hacking sessions!

One milestone reached is that PyOxidizer now supports macOS.

With that milestone reached, I thought it would be interesting to compare the performance of a PyOxidizer executable versus a standard CPython build.

I produced a Python script that imports almost the entirety of the Python standard library - at least the modules implemented in Python. That's 508 import statements. I then executed this script using a typical python3.7 binary (with the standard library on the filesystem) and PyOxidizer-produced standalone executables with a module importer that loads Python modules from memory using zero copy.
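For readers who want to build a similar benchmark input, here is one possible way to generate such a script. It is not the script I used: it assumes Python 3.10 or newer for sys.stdlib_module_names (falling back to the built-in module names otherwise), it only lists top-level modules, and it does not filter out modules implemented in C, so the count will differ from 508. The function name build_import_script is made up for this sketch.

```python
import sys


def build_import_script(names):
    """Return the text of a script with one import statement per module name.

    Private modules (leading underscore) are skipped to keep the list tidy.
    """
    public = sorted(name for name in names if not name.startswith("_"))
    return "".join(f"import {name}\n" for name in public)


# sys.stdlib_module_names exists on Python 3.10+; older versions fall back
# to the (much smaller) set of modules compiled into the interpreter.
module_names = getattr(sys, "stdlib_module_names", sys.builtin_module_names)
script = build_import_script(module_names)

print(f"{len(script.splitlines())} import statements generated")
```

The generated text is deliberately not executed here, because importing some standard library modules has side effects. Write it to a file and review it before feeding it to an interpreter.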

# Homebrew installed CPython 3.7.2

# Cold disk cache.
$ sudo purge
$ time /usr/local/bin/python3.7 < import_stdlib.py
real   0m0.694s
user   0m0.354s
sys    0m0.121s

# Hot disk cache.
$ time /usr/local/bin/python3.7 < import_stdlib.py
real   0m0.319s
user   0m0.263s
sys    0m0.050s

# PyOxidizer with non-PGO/non-LTO CPython 3.7.2
$ time target/release/pyapp < import_stdlib.py
real   0m0.223s
user   0m0.201s
sys    0m0.017s

# PyOxidizer with PGO/non-LTO CPython 3.7.2
$ time target/release/pyapp < import_stdlib.py
real   0m0.234s
user   0m0.210s
sys    0m0.019s

# PyOxidizer with PGO+LTO CPython 3.7.2
$ sudo purge
$ time target/release/pyapp < import_stdlib.py
real   0m0.442s
user   0m0.252s
sys    0m0.059s

$ time target/release/pyapp < import_stdlib.py
real   0m0.221s
user   0m0.197s
sys    0m0.020s

First, the PyOxidizer times are all relatively similar regardless of whether PGO or LTO is used to build CPython. That's not too surprising, as I'm exercising a very limited subset of CPython (and I suspect the benefits of PGO/LTO aren't as pronounced due to the nature of the CPython API).

But the bigger result is the obvious speedup with PyOxidizer and its in-memory importing: with a hot disk cache, PyOxidizer can import almost the entirety of the Python standard library ~100ms faster than a typical standalone CPython install, finishing in ~70% of the original time! This comes out to ~0.19ms saved per import statement. If we run purge to clear out the disk cache, the performance delta increases to 252ms, with PyOxidizer finishing in ~64% of the original time. All these numbers are on a 2018 6-core 2.9 GHz i9 MacBook Pro, which has a pretty decent SSD.

And on Linux on an i7-6700K running in a Hyper-V VM:

# pyenv installed CPython 3.7.2

# Cold disk cache.
$ time ~/.pyenv/versions/3.7.2/bin/python < import_stdlib.py
real   0m0.405s
user   0m0.165s
sys    0m0.065s

# Hot disk cache.
$ time ~/.pyenv/versions/3.7.2/bin/python < import_stdlib.py
real   0m0.193s
user   0m0.161s
sys    0m0.032s

# PyOxidizer with PGO CPython 3.7.2

# Cold disk cache.
$ time target/release/pyapp < import_stdlib.py
real   0m0.227s
user   0m0.145s
sys    0m0.016s

# Hot disk cache.
$ time target/release/pyapp < import_stdlib.py
real   0m0.152s
user   0m0.136s
sys    0m0.016s

On a hot disk cache, the run-time improvement of PyOxidizer is ~41ms, or ~79% of original. This comes out to ~0.08ms per import statement. When flushing caches by writing 3 to /proc/sys/vm/drop_caches, the delta increases to ~178ms, or ~56% of original.
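The deltas, percentages, and per-import costs quoted for macOS and Linux all come from the same three calculations. The sketch below applies them to the "real" times in the listings. One assumption to flag: for the macOS hot-cache row I pair the baseline with the PGO+LTO build's 0.221s so that the hot and cold rows use the same binary. The summarize helper is a name invented for this example.

```python
IMPORTS = 508  # number of import statements in the benchmark script

# (CPython seconds, PyOxidizer seconds), "real" times from the listings.
runs = {
    "macOS hot": (0.319, 0.221),
    "macOS cold": (0.694, 0.442),
    "Linux hot": (0.193, 0.152),
    "Linux cold": (0.405, 0.227),
}


def summarize(baseline, oxidized, imports=IMPORTS):
    """Return (ms saved, fraction of original time, ms saved per import)."""
    delta_ms = (baseline - oxidized) * 1000
    return delta_ms, oxidized / baseline, delta_ms / imports


for name, (baseline, oxidized) in runs.items():
    delta_ms, fraction, per_import = summarize(baseline, oxidized)
    print(f"{name:10s} saved {delta_ms:5.0f} ms, "
          f"{fraction:.2f} of original, {per_import:.2f} ms per import")
```

Keeping the measurements in one table makes it easy to add a row for another platform and compare the relative speedup rather than just the absolute saving.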

Running the binaries under dtruss -c on macOS makes the difference clear. Here is the breakdown of system calls occurring at least 10 times:

# CPython standalone
fstatfs64                                      16
read_nocancel                                  19
ioctl                                          20
getentropy                                     22
pread                                          26
fcntl                                          27
sigaction                                      32
getdirentries64                                34
fcntl_nocancel                                106
mmap                                          114
close_nocancel                                129
open_nocancel                                 130
lseek                                         148
open                                          168
close                                         170
read                                          282
fstat64                                       403
stat64                                        833

# PyOxidizer
lseek                                          10
read                                           12
read_nocancel                                  14
fstat64                                        16
ioctl                                          22
munmap                                         31
stat64                                         33
sysctl                                         33
sigaction                                      36
mmap                                          122
madvise                                       193
getentropy                                    315

PyOxidizer avoids hundreds of open(), close(), read(), fstat64(), and stat64() calls. And by avoiding these calls, PyOxidizer not only avoids the userland-kernel overhead intrinsic to them, but also any additional overhead that APFS is imposing via its global lock(s).

(Why the PyOxidizer binary is making hundreds of calls to getentropy() I'm not sure. It's definitely coming from Python as a side-effect of a module import and it is something I'd like to fix, if possible.)

With this experiment, we finally have the ability to better isolate the impact of filesystem overhead on Python module importing and preliminary results indicate that the overhead is not insignificant - at least on the tested systems (I'll get data for Windows when PyOxidizer supports it). While the test is somewhat contrived (I don't think many applications import the entirety of the Python standard library), some Python applications do import hundreds of modules. And as I've written before, milliseconds matter. This is especially true if you are invoking Python processes hundreds or thousands of times in a build system, when running a test suite, for scripting, etc. Cumulatively you can be importing tens of thousands of modules. So I think shaving even fractions of a millisecond from module importing is important.

It's worth noting that in addition to the system call overhead, CPython's path-based importer runs substantially more Python code than PyOxidizer and this likely contributes several milliseconds of overhead as well. Because PyOxidizer applications are static, the importer can remain simple (finding a module in PyOxidizer is essentially a Rust HashMap<String, Vec<u8>> lookup). While it might be useful to isolate the filesystem overhead from Python code overhead, the thing that end-users care about is overall execution time: they don't care where that overhead is coming from. So I think it is fair to compare PyOxidizer - with its intrinsically simpler import model - with what Python typically does (scanning sys.path entries and looking for modules on the filesystem).
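To illustrate how little work a map-based importer needs to do, here is a pure Python analogue: a meta path finder that resolves modules from an in-memory dict instead of the filesystem. This is not PyOxidizer's importer, which is written in Rust and loads bytecode. The sketch compiles source text instead, and the class name InMemoryImporter and the module hello_mem are made up for the example.

```python
import importlib.abc
import importlib.util
import sys


class InMemoryImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Find and load modules from a dict mapping module name to source bytes."""

    def __init__(self, sources):
        self._sources = sources

    def find_spec(self, fullname, path=None, target=None):
        # Finding a module is a single dict lookup: no stat() or open() calls.
        if fullname not in self._sources:
            return None
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        # Returning None asks the import system to create a default module.
        return None

    def exec_module(self, module):
        source = self._sources[module.__name__]
        code = compile(source, f"<memory:{module.__name__}>", "exec")
        exec(code, module.__dict__)


# Hypothetical embedded module data.
MODULES = {"hello_mem": b"def greet():\n    return 'hello from memory'\n"}

# Put the in-memory importer ahead of the filesystem-based importers.
sys.meta_path.insert(0, InMemoryImporter(MODULES))

import hello_mem

print(hello_mem.greet())
```

Because the finder sits first on sys.meta_path, names it knows about never reach the path-based importer, so no sys.path scanning happens for them.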

Another difference is that PyOxidizer is almost completely statically linked. By contrast, a typical CPython install has compiled extension modules as standalone shared libraries and these shared libraries often link against other shared libraries (such as libssl). From dtruss timing information, I don't believe this difference contributes to significant overhead, however.

Finally, I haven't yet optimized PyOxidizer. I still have a few tricks up my sleeve that can likely shave off more overhead from Python startup. But so far the results are looking very promising. I dare say they are looking promising enough that Python distributions themselves might want to look into the area more thoroughly and consider distribution defaults that rely less on the every-Python-module-is-a-separate-file model.

Stay tuned for more PyOxidizer updates in the near future!

(I updated this post a day after initial publication to add measurements for Linux.)


Distributing Standalone Python Applications

December 18, 2018 at 03:35 PM | categories: Python, PyOxidizer, Rust

The Problem

Packaging and application distribution is a hard problem on multiple dimensions. For Python, large aspects of this problem space are more or less solved if you are distributing open source Python libraries and your target audience is developers (use pip and PyPI). But if you are distributing Python applications - standalone executables that use Python - your world can be much more complicated.

One of the primary reasons why distributing Python applications is difficult is because of the complex and often sensitive relationship between a Python application and the environment it runs in.

For starters we have the Python interpreter itself. If your application doesn't distribute the Python interpreter, you are at the whims of the Python interpreter provided by the host machine. You may want to target Python 3.7 only. But because Python 3.5 or 3.6 is the most recent version installed by many Linux distros, you are forced to support older Python versions and all their quirks and lack of features.

Going down the rabbit hole, even the presence of a supposedly compatible version of the Python interpreter isn't a guarantee for success! For example, the Python interpreter could have a built-in extension that links against an old version of a library. Just last week I was encountering weird SQLite bugs in Firefox's automation because Python was using an old version of SQLite with known bugs. Installing a modern SQLite fixed the problems. Or the interpreter could have modifications or extra installed packages interfering with the operation of your application. There are never-ending corner cases. And I can tell you from my experience with having to support the Firefox build system (which uses Python heavily) that you will encounter these corner cases given a broad enough user base.

And even if the Python interpreter on the target machine is fully compatible, getting your code to run on that interpreter could be difficult! Several Python applications leverage compiled extensions linking against Python's C API. Distributing the precompiled form of the extension can be challenging, especially when your code needs to link against 3rd party libraries, which may conflict with something on the target system. And, the precompiled extensions need to be built in a very delicate manner to ensure they can run on as many target machines as possible. But not distributing pre-built binaries requires the end-user be able to compile Python extensions. Not every user has such an environment and forcing this requirement on them is not user friendly.

From an application developer's point of view, distributing a copy of the Python interpreter along with your application is the only reliable way of guaranteeing a more uniform end-user experience. Yes, you will still have variability because every machine is different. But you've eliminated the Python interpreter from the set of unknowns and that is a huge win. (Unfortunately, distributing a Python interpreter comes with a host of other problems such as size bloat, security/patching concerns, poking the OS packaging bears, etc. But those problems are for another post.)

Existing Solutions

There are tons of existing tools for solving the Python application distribution problem.

The approach that tools like Shiv and PEX take is to leverage Python's built-in support for running zip files. Essentially, if there is a zip file containing a __main__.py file and you execute python file.zip (or have a zip file with a #!/usr/bin/env python shebang), Python can load modules in that zip file and execute an application within. Pretty cool!
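The zip mechanism is easy to try with nothing but the standard library. The sketch below builds a tiny zip containing a __main__.py plus one helper module and then runs it with the current interpreter. It says nothing about how Shiv or PEX build their archives; the file names app.zip and greeting.py are invented for the example.

```python
import subprocess
import sys
import tempfile
import zipfile
from pathlib import Path


def build_zip_app(dest):
    """Write a minimal runnable zip: a helper module and a __main__.py."""
    with zipfile.ZipFile(dest, "w") as zf:
        zf.writestr("greeting.py", "MESSAGE = 'hello from a zip'\n")
        zf.writestr("__main__.py", "import greeting\nprint(greeting.MESSAGE)\n")
    return dest


def run_zip_app(path):
    """Run `python <zip>` and return what the application printed."""
    result = subprocess.run(
        [sys.executable, str(path)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    app = build_zip_app(Path(tmp) / "app.zip")
    output = run_zip_app(app)

print(output)
```

Note that the interpreter still comes from the host machine here, which is exactly the dependency this approach cannot remove.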

This approach works great if your execution environment supports shebangs (Windows doesn't) and the Python interpreter is suitable. But if you need to support Windows or don't have control over the execution environment and can't guarantee the Python interpreter is good, this approach isn't suitable.

As stated above, we want to distribute the Python interpreter with our application to minimize variability. Let's talk about tools that do that.

XAR is a pretty cool offering from Facebook. XAR files are executables that contain SquashFS filesystems. Upon running the executable, SquashFS filesystems are created. For Python applications, the XAR contains a copy of the Python interpreter and all your Python modules. At run-time, these files are extracted to SquashFS filesystems and the Python interpreter is executed. If you squint hard enough, it is kind of like a pre-packaged, executable virtualenv which also contains the Python interpreter.

XARs are pretty cool (and aren't limited to Python). However, because XARs rely on SquashFS, they have a run-time requirement on the target machine. This is great if you only need to support Linux and macOS and your target machines support FUSE and SquashFS. But if you need to support Windows or a general user population without SquashFS support, XARs won't help you.

Zip files and XARs are great for enterprises that have tightly controlled environments. But for a general end-user population, we need something more robust against variance among target machines.

There are a handful of tools for packaging Python applications along with the Python interpreter in more resilient manners.

Nuitka converts Python source to C code then compiles and links that C code against libpython. You can perform a static link and compile everything down to a single executable. If you do the compiling properly, that executable should just work on pretty much every target machine. That's pretty cool and is exactly the kind of solution application distributors are looking for: you can't get much simpler than a self-contained executable! While I'd love to vouch for Nuitka and recommend using it, I haven't used it so can't. And I'll be honest, the prospect of compiling Python source to C code kind of terrifies me. That effectively makes Nuitka a new Python implementation and I'm not sure I can (yet) place the level of trust in Nuitka that I have for e.g. CPython and PyPy.

And that leads us to our final category of tools: freezing your code. There are a handful of tools like PyInstaller which automate the process of building your Python application (often via standard setup.py mechanisms), assembling all the requisite bits of the Python interpreter, and producing an artifact that can be distributed to end users. There are even tools that produce Windows installers, RPMs, DEBs, etc that you can sign and distribute.

These freezing tools are arguably the state of the art for Python application distribution to general user populations. On first glance it seems like all the needed tools are available here. But there are cracks below the surface.

Issues with Freezing

A common problem with freezing is it often relies on the Python interpreter used to build the frozen application. For example, when building a frozen application on Linux, it will bundle the system's Python interpreter with the frozen application. And that interpreter may link against libraries or libc symbol versions not available on all target machines. So, the build environment has to be just right in order for the binaries to run on as many target systems as possible. This isn't an insurmountable problem. But it adds overhead and complexity for application maintainers.

Another limitation is how these frozen applications handle importing Python modules.

Multiple tools take the approach of embedding an archive (usually a zip file) in the executable containing the Python standard library bits not part of libpython. This includes C extensions (compiled to .so or .pyd files) and Python source (.py) or bytecode (.pyc) files. There is typically a step - either at application start time or at module import time - where a file is extracted to the filesystem such that Python's filesystem-based importer can load it from there.

For example, PyInstaller extracts the standard library to a temporary directory at application start time (at least when running in single file mode). This can add significant overhead to the startup time of applications - more than enough to blow through people's ability to perceive something as instantaneous. This is acceptable for long-running applications. But for short-lived applications (like CLI tools or support tools for build systems), the overhead can be a non-starter. And the mere fact that you are doing filesystem write I/O establishes a requirement that the application have write access to the filesystem and that write I/O performs reasonably well, lest application performance suffer. These can be difficult pills to swallow!

Another limitation is that these tools often assume the executable being produced is only a Python application. Sometimes Python is part of a larger application. It would be useful to produce a library that can easily be embedded within a larger application.

Improving the State of the Art

Existing Python application distribution mechanisms don't tick all the requirements boxes for me. We have tools that are suitable for internal distribution in well-defined enterprise environments. And we have tools that target general user populations, albeit with a burden on application maintainers and often with a performance hit and/or limited flexibility.

I want something that allows me to produce a standalone, single file executable containing a Python interpreter, the Python standard library (or a subset of it), and all the custom code and resources my application needs. That executable should not require any additional library dependencies beyond what is already available on most target machines (e.g. libc). That executable should not require any special filesystem providers (e.g. FUSE/SquashFS) nor should it require filesystem write access nor perform filesystem write I/O at run-time. I should be able to embed a Python interpreter within a larger application, without the overhead of starting the Python interpreter if it isn't needed.

No existing solution ticks all of these boxes.

So I set out to build one.

One problem is producing a Python interpreter that is portable and fully-featured. You can't punt on this problem because if the core Python interpreter isn't produced in just the right way, your application will depend on libraries or symbol versions not available in all environments.

I've created the python-build-standalone project for automating the process of building Python interpreters suitable for use with standalone, distributable Python applications. The project produces (and has available for download) binary artifacts including a pre-compiled Python interpreter and object files used for compiling that interpreter. The Python interpreter is compiled with PGO/LTO using a modern Clang, helping to ensure that Python code runs as fast as it can. All of Python's dependencies are compiled from source with the modern toolchain and everything is aggressively statically linked to avoid external dependencies. The toolchain and pre-built distribution are available for downstream consumers to compile Python extensions with/against.

It's worth noting that a modern Clang toolchain is likely significantly different from what you use today. When producing manylinux wheels, it is recommended to use the pypa/manylinux Docker images. These Docker images are based on CentOS 5 (for maximum libc and other system library compatibility). While they do install a custom toolchain, Python and any extensions compiled in that environment are compiled with GCC 4.8.2 (as of this writing). That's a GCC from 2013. A lot has changed in compilers since 2013 and building Python and extensions with a compiler released in 2018 should result in various benefits (faster code, better warnings, etc).

If producing custom CPython builds for standalone distribution interests you, you should take a look at how I coerced CPython to statically link all extensions. Spoiler: it involves producing a custom-tailored Modules/Setup.local file that bypasses setup.py, along with some Makefile hacks. Because the build environment is deterministic and isolated in a container, we can get away with some ugly hacks.

A statically linked libpython from which you can produce a standalone binary embedding Python is only the first layer in the onion. The next layer is how to handle the Python standard library.

libpython only contains the code needed to run the core bits of the Python interpreter. If we attempt to run a statically linked python executable without the standard library in the filesystem, things fail pretty fast:

$ rm -rf lib
$ bin/python
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: initfsencoding: Unable to get the locale encoding
ModuleNotFoundError: No module named 'encodings'

Current thread 0x00007fe9a3432740 (most recent call first):
Aborted (core dumped)

I'll spare you the details for the moment, but initializing the CPython interpreter (via Py_Initialize()) requires that parts of the Python standard library be available. This means that in order to fulfill our dream of a single file executable, we will need custom code that teaches the embedded Python interpreter to load the standard library from within the binary... somehow.

As far as I know, efficient embedded standard library handling without run-time requirements does not exist in the current Python packaging/distribution ecosystem. So, I had to devise something new.

Enter PyOxidizer. PyOxidizer is a collection of Rust crates that facilitate building an embeddable Python library, which can easily be added to an executable. We need native code to interface with the Python C APIs in order to influence Python interpreter startup. It is 2018 and Rust is a better C/C++, so I chose Rust for this driver functionality instead of C. Plus, Rust's integrated build system makes it easier to automate the integration of the custom Python interpreter files into binaries.

The role of PyOxidizer is to take the pre-built Python interpreter files from python-build-standalone, combine those files with any other Python files needed to run an application, and marry them to a Rust crate. This Rust crate can trivially be turned into a self-contained executable containing a Python application. Or, it can be combined with another Rust project. Or it can be emitted as a library and integrated with a non-Rust application. There's a lot of flexibility by design.

The mechanism I used for embedding the Python standard library into a single file executable without incurring explicit filesystem access at run-time is (I believe) novel and somewhat crazy. Let me explain how it works.

First, there are no .so/.pyd shared library compiled Python extensions to worry about. This is because all compiled extensions are statically linked into the Python interpreter. To the interpreter, they exist as built-in modules. Typically, a CPython build will have some modules like _abc, _io, and sys provided by built-in modules. Modules like _json exist as standalone shared libraries that are loaded on demand. python-build-standalone's modifications to CPython's build system convert all these would-be standalone shared libraries into built-in modules. (Because we distribute the object files that compose the eventual libpython, it is possible to filter out unwanted modules to cut down on binary size if you don't want to ship a fully-featured Python interpreter.) Because there are no standalone shared libraries providing Python modules, we don't have the problem of needing to load a shared library to load a module, which would undermine our goal of no filesystem access to import modules. And that's a good thing, too, because dlopen() requires a path: you can't load a shared library from a memory address. (Fun fact: there are hacks like dlopen_with_offset() that provide an API to load a library from memory, but they require a custom libc. Google uses this approach for their internal single-file Python application solution.)
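
To see the distinction between built-in modules and shared-library extensions on whatever interpreter you have at hand, a small inspection sketch like the following may help. The helper names is_builtin() and describe() are made up for this illustration; sys.builtin_module_names is the standard attribute listing modules linked into the interpreter. What gets reported for _json depends on how your interpreter was built.

```python
import sys

# Names of modules compiled directly into this interpreter binary.
builtin_names = set(sys.builtin_module_names)


def is_builtin(name):
    """Return True if `name` is linked into the interpreter itself."""
    return name in builtin_names


def describe(name):
    """Report whether a module is built-in or was loaded from a file."""
    module = __import__(name)
    if is_builtin(name):
        return f"{name}: built-in (no file on disk)"
    origin = getattr(module, "__file__", None)
    return f"{name}: loaded from {origin}"


# sys and _io are normally built-in; _json is often a shared library,
# but a fully static build would report it as built-in too.
for name in ("sys", "_io", "_json"):
    print(describe(name))
```

On an interpreter produced by python-build-standalone in the way described above, every one of these would be expected to report as built-in.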

From the python-build-standalone artifacts, PyOxidizer collects all files belonging to the Python standard library (notably .py and .pyc files). It also collects other source, bytecode, and resource files needed to run a custom application.

The relevant files are assembled and serialized into data structures which contain the names of the resources and their raw content. These data structures are made available to Rust as &'static [u8] variables (essentially a static void* if you don't speak Rust).

Using the rust-cpython crate, PyOxidizer defines a custom Python extension module implemented purely in Rust. When loaded, the module parses the data structures containing available Python resource names and data into HashMap<&str, &[u8]> instances. In other words, it builds a native mapping from resource name to a pointer to its raw data. The Rust-implemented module exports to Python an API for accessing that data. From the Python side, you do the equivalent of MODULES.get_code('foo') to request the bytecode for a named Python module. When called, the Rust code will perform the lookup and return a memoryview instance pointing to the raw data. (The use of &[u8] and memoryview means that embedded resource data is loaded from its static, read-only memory location instead of copied into a data structure managed by Python. This zero copy approach translates to less overhead for importing modules. Although, the memory needs to be paged in by the operating system. So on slow filesystems, reducing I/O and e.g. compressing module data might be a worthwhile optimization. This can be a future feature.)
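
The real implementation lives in Rust, but the idea can be sketched in pure Python. The blob layout below (a count, an index of lengths, then names, then data) is an assumption invented for this example and is not PyOxidizer's actual format; pack_resources(), parse_resources(), and the Modules class are likewise hypothetical names. The point is that each lookup returns a memoryview slice into one shared buffer rather than a copy.

```python
import marshal
import struct


def pack_resources(resources):
    """Serialize {name: bytes} into one blob: a count, an index of
    (name length, data length) pairs, then all names, then all data."""
    names = [name.encode("utf-8") for name in resources]
    datas = list(resources.values())
    index = b"".join(
        struct.pack("<II", len(n), len(d)) for n, d in zip(names, datas)
    )
    return struct.pack("<I", len(names)) + index + b"".join(names) + b"".join(datas)


def parse_resources(blob):
    """Build {name: memoryview}; each value is a slice of `blob`, not a copy."""
    view = memoryview(blob)
    (count,) = struct.unpack_from("<I", view, 0)
    lengths = [struct.unpack_from("<II", view, 4 + 8 * i) for i in range(count)]
    name_pos = 4 + 8 * count
    data_pos = name_pos + sum(name_len for name_len, _ in lengths)
    table = {}
    for name_len, data_len in lengths:
        name = bytes(view[name_pos:name_pos + name_len]).decode("utf-8")
        table[name] = view[data_pos:data_pos + data_len]
        name_pos += name_len
        data_pos += data_len
    return table


class Modules:
    """Stand-in for the Rust-implemented extension module's API."""

    def __init__(self, blob):
        self._table = parse_resources(blob)

    def get_code(self, name):
        return self._table[name]


# Build a blob holding the bytecode of one tiny module named "foo".
code = compile("ANSWER = 6 * 7", "foo", "exec")
blob = pack_resources({"foo": marshal.dumps(code)})
MODULES = Modules(blob)

# marshal.loads() accepts any bytes-like object, including a memoryview.
namespace = {}
exec(marshal.loads(MODULES.get_code("foo")), namespace)
print(namespace["ANSWER"])  # prints 42
```

In the Rust version the backing buffer is static read-only data in the executable, so the equivalent of `blob` never has to be allocated or read from a file at run-time.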

Making data embedded within a binary available to a Python module is relatively easy. I'm definitely not the first person to come up with this idea. What is hard - and what I might be the first person to actually do - is making the Python module importing mechanism load all standard library modules via such a mechanism.

With a custom extension module built-in to the binary exposing module data, it should just be a matter of registering a custom sys.meta_path importer that knows how to load modules from that custom location. This problem turns out to be quite hard!
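
In an interpreter that is already initialized, such an importer is not much code. Here is a minimal sketch, assuming a plain dict of source bytes in place of the Rust-backed module; IN_MEMORY_SOURCES, InMemoryImporter, and the module name oxidized_demo are all invented for this example. The hard part discussed next is getting something like this registered early enough.

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical in-memory store: module name -> source bytes.
IN_MEMORY_SOURCES = {
    "oxidized_demo": b"GREETING = 'hello from memory'\n",
}


class InMemoryImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder and loader that serves modules from a dict instead of files."""

    def __init__(self, sources):
        self._sources = sources

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self._sources:
            return None  # let the next importer on sys.meta_path try
        return importlib.util.spec_from_loader(fullname, self, origin="memory")

    def create_module(self, spec):
        return None  # fall back to default module creation

    def exec_module(self, module):
        source = self._sources[module.__name__]
        code = compile(source, f"<memory:{module.__name__}>", "exec")
        exec(code, module.__dict__)


# Put our importer first so it is consulted before the default importers.
sys.meta_path.insert(0, InMemoryImporter(IN_MEMORY_SOURCES))

import oxidized_demo

print(oxidized_demo.GREETING)              # prints: hello from memory
print(hasattr(oxidized_demo, "__file__"))  # prints: False
```

Note that the imported module has no __file__ attribute, because it never came from a file.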

The initialization of a CPython interpreter is - as I've learned - a bit complex. A CPython interpreter must be initialized via Py_Initialize() before any Python code can run. That means in order to modify sys.meta_path, Py_Initialize() must finish.

A lot of activity occurs under the hood during initialization. Applications embedding Python have very little control over what happens during Py_Initialize(). You can change some superficial things like what filesystem paths to use to bootstrap sys.path and what encodings to use for stdio descriptors. But you can't really influence the core actions that are being performed. And there's no mechanism to directly influence sys.meta_path before an import is performed. (Perhaps there should be?)

During Py_Initialize(), the interpreter needs to configure the encodings for the filesystem and the stdio descriptors. Encodings are loaded from Python modules provided by the standard library. So, during the course of Py_Initialize(), the interpreter needs to import some modules originally backed by .py files. This creates a dilemma: if Py_Initialize() needs to import modules in the standard library, the standard library is backed by memory and isn't available to known importing mechanisms, and there's no opportunity to configure a custom sys.meta_path importer before Py_Initialize() runs, how do you teach the interpreter about your custom module importer and the location of the standard library modules needed by Py_Initialize()?
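
You can observe this on a stock interpreter by asking a fresh process which modules are already loaded before any user import runs. This is only a rough probe (the function name modules_loaded_at_startup() is made up, and the exact list varies by Python version and platform), but on CPython it should include encodings and codecs.

```python
import subprocess
import sys


def modules_loaded_at_startup():
    """Start a fresh interpreter and list the modules already imported
    by the time user code gets to run."""
    script = "import sys; print('\\n'.join(sorted(sys.modules)))"
    # -I isolates the interpreter from the environment; -S skips the site module.
    result = subprocess.run(
        [sys.executable, "-I", "-S", "-c", script],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


loaded = modules_loaded_at_startup()
print(len(loaded), "modules loaded before any user import")
print("encodings loaded:", "encodings" in loaded)
print("codecs loaded:", "codecs" in loaded)
```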

This is an extremely gnarly problem and it took me some hours and many false leads to come up with a solution.

My first attempt involved the esoteric frozen modules feature. (This work predated the use of a custom data structure and module containing modules data.) The Python interpreter has a const struct _frozen* PyImport_FrozenModules data structure defining an array of frozen modules. A frozen module is defined by its module name and precompiled bytecode data (roughly equivalent to .pyc file content). Partway through Py_Initialize(), the Python interpreter is able to import modules. And one of the built-in importers that is automatically registered knows how to load modules if they are in PyImport_FrozenModules!

I attempted to audit Python interpreter startup and find all modules that were imported during Py_Initialize(). I then defined a custom PyImport_FrozenModules containing these modules. In theory, the import of these modules during Py_Initialize() would be handled by the FrozenImporter and everything would just work: if I were able to get Py_Initialize() to complete, I'd be able to register a custom sys.meta_path importer immediately afterwards and we'd be set.

Things did not go as planned.

FrozenImporter doesn't fully conform to the PEP 451 requirements for setting specific attributes on modules. Without these attributes, the from . import aliases statement in encodings/__init__.py fails because the importer is unable to resolve the relative module name. Derp. One would think CPython's built-in importers would comply with PEP 451 and that all of Python's standard library could be imported as frozen modules. But this is not the case! I was able to hack around this particular failure by using an absolute import. But I hit another failure and did not want to excavate that rabbit hole. Once I realized that FrozenImporter was lacking mandated module attributes, I concluded that attempting to use frozen modules as a general import-from-memory mechanism was not viable. Furthermore, the C code backing FrozenImporter walks the PyImport_FrozenModules array and does a string compare on the module name to find matches. While I didn't benchmark, I was concerned that un-indexed scanning at import time would add considerable overhead when hundreds of modules were in play. (The C code backing BuiltinImporter uses the same approach and I do worry CPython's imports of built-in extension modules are causing measurable overhead.)

With frozen modules off the table, I needed to find another way to inject a custom module importer that was usable during Py_Initialize(). Because we control the source Python interpreter, modifications to the source code or even link-time modifications or run-time hacks like trampolines weren't off the table. But I really wanted things to work out of the box because I don't want to be in the business of maintaining patches to Python interpreters.

My foray into frozen modules enlightened me to the craziness that is the bootstrapping of Python's importing mechanism.

I remember hearing that the Python module importing mechanism used to be written in C and was rewritten in Python. And I knew that the importlib package defined interfaces allowing you to implement your own importers, which could be registered on sys.meta_path. But I didn't know how all of this worked at the interpreter level.

The internal initimport() C function is responsible for initializing the module importing mechanism. It does the equivalent of import _frozen_importlib, but using the PyImport_ImportFrozenModule() API. It then manipulates some symbols and calls _frozen_importlib._install() with references to the sys and _imp built-in modules. Later (in initexternalimport()), a _frozen_importlib_external module is imported and has code within it executed.

I was initially very confused by this because, while there are references to _frozen_importlib and _frozen_importlib_external all over the CPython code base, I couldn't figure out where the code for those modules actually lived! Some sleuthing of the build directory eventually revealed that the files Lib/importlib/_bootstrap.py and Lib/importlib/_bootstrap_external.py were frozen to the module names _frozen_importlib and _frozen_importlib_external, respectively.
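
This aliasing can be confirmed from a running interpreter. The sketch below assumes a standard CPython build and leans on the private _imp module purely for inspection; frozen_report() is a name invented for this example.

```python
import importlib
import sys

import _imp  # private CPython module; used here only for inspection
import _frozen_importlib
import _frozen_importlib_external


def frozen_report():
    """Return facts about how the import machinery modules are provided."""
    return {
        "bootstrap_is_frozen": _imp.is_frozen("_frozen_importlib"),
        "external_is_frozen": _imp.is_frozen("_frozen_importlib_external"),
        # importlib/__init__.py re-registers the frozen modules under
        # their public importlib.* names.
        "bootstrap_alias": sys.modules["importlib._bootstrap"] is _frozen_importlib,
        "external_alias": (
            sys.modules["importlib._bootstrap_external"] is _frozen_importlib_external
        ),
    }


for fact, value in frozen_report().items():
    print(fact, value)
```

On stock CPython all four values should be True: the modules are frozen into the binary, and importlib._bootstrap and importlib._bootstrap_external are the very same module objects under different names.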

Essentially what is happening is the bulk of Python's import machinery is implemented in Python (rather than C). But there's a chicken-and-egg problem where you can't run just any Python code (including any import statement) until the interpreter is partially or fully initialized.

When building CPython, the Python source code for importlib._bootstrap and importlib._bootstrap_external is compiled to bytecode. This bytecode is emitted to .h files, where it is exposed as a static char *. This bytecode is eventually referenced by the default PyImport_FrozenModules array, allowing the modules to be imported via the frozen importer's C API, which bypasses the higher-level importing mechanism, allowing it to work before the full importing mechanism is initialized.

initimport() and initexternalimport() both call Python functions in the frozen modules. And we can clearly look at the source of the corresponding modules and see the Python code do things like register the default importers on sys.meta_path.
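
The result of that registration is visible on sys.meta_path in any running interpreter. A quick check, assuming stock CPython (tools in your environment may have added further entries, and default_importers_present() is a hypothetical helper):

```python
import importlib.machinery
import sys


def default_importers_present():
    """Report which of CPython's default importers are on sys.meta_path."""
    defaults = {
        "BuiltinImporter": importlib.machinery.BuiltinImporter,
        "FrozenImporter": importlib.machinery.FrozenImporter,
        "PathFinder": importlib.machinery.PathFinder,
    }
    # The default entries are the classes themselves, not instances.
    return {name: cls in sys.meta_path for name, cls in defaults.items()}


for name, present in default_importers_present().items():
    print(name, "registered:", present)
```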

Whew, that was a long journey into the bowels of CPython's internals. How does all this help with single file Python executables?

Well, the predicament that led us down this rabbit hole was there was no way to register a custom module importer before Py_Initialize() completes and before an import is attempted during said Py_Initialize().

It took me a while, but I finally realized the frozen importlib._bootstrap_external module provided the window I needed! importlib._bootstrap_external/_frozen_importlib_external is always executed during Py_Initialize(). So if you can modify this module's code, you can run arbitrary code during Py_Initialize() and influence Python interpreter configuration. And since _frozen_importlib_external is a frozen module and the PyImport_FrozenModules array is writable and can be modified before Py_Initialize() is called, all one needs to do is replace the _frozen_importlib / _frozen_importlib_external bytecode in PyImport_FrozenModules and you can run arbitrary code during Python interpreter startup, before Py_Initialize() completes and before any standard library imports are performed!

My solution to this problem is to concatenate some custom Python code to importlib/_bootstrap_external.py. This custom code defines a sys.meta_path importer that knows how to use our Rust-backed built-in extension module to find and load module data. It redefines the _install() function so that this custom importer is registered on sys.meta_path when the function is called during Py_Initialize(). The new Python source is compiled to bytecode and the PyImport_FrozenModules array is modified at run-time to point to the modified _frozen_importlib_external implementation. When Py_Initialize() executes its first standard library import, module data is provided by the custom sys.meta_path importer, which grabs it from a Rust extension module, which reads it from a read-only data structure in the executable binary, which is converted to a Python memoryview instance and sent back to Python for processing.
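
The concatenate-and-recompile step can be sketched in isolation. To keep the example self-contained, BASE_SOURCE below is a tiny made-up stand-in for the text of importlib/_bootstrap_external.py, EXTRA_SOURCE is a stand-in for the appended custom code, and the strings in meta_path stand in for importer objects. Swapping the resulting bytecode into PyImport_FrozenModules happens in native code and is not shown.

```python
import marshal

# Stand-in for the original module source (hypothetical, heavily simplified).
BASE_SOURCE = '''
meta_path = []

def _install(bootstrap_module):
    meta_path.append("PathFinder")
'''

# Custom code appended after the original source. It keeps a reference to
# the original _install() and rebinds the name to a wrapper.
EXTRA_SOURCE = '''
_original_install = _install

def _install(bootstrap_module):
    _original_install(bootstrap_module)
    meta_path.insert(0, "InMemoryFinder")
'''


def build_patched_bytecode(base, extra, name):
    """Concatenate the sources, compile them, and serialize the code object."""
    code = compile(base + "\n" + extra, name, "exec")
    return marshal.dumps(code)


bytecode = build_patched_bytecode(BASE_SOURCE, EXTRA_SOURCE, "<patched bootstrap>")

# Simulate the interpreter executing the replaced module and then calling
# _install() during initialization.
namespace = {}
exec(marshal.loads(bytecode), namespace)
namespace["_install"](None)
print(namespace["meta_path"])  # prints ['InMemoryFinder', 'PathFinder']
```

Because the appended definition is executed last, it wins the name _install, so the caller picks up the wrapper without any change to the original source text.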

There's a bit of magic happening behind the scenes to make all of this work. PyOxidizer attempts to hide as much of the gory details as possible. From the perspective of an application maintainer, you just need to define a minimal config file and it handles most of the low-level details. And there's even a higher-level Rust API for configuring the embedded Python interpreter, should you need it.

python-build-standalone and PyOxidizer are still in their infancy. They are very much alpha quality. I consider them technology previews more than usable software at this point. But I think enough is there to demonstrate the viability of using Rust as the build system and run-time glue to build and distribute standalone applications embedding Python.

Time will tell if my utopian vision of zero-copy, no explicit filesystem I/O for Python module imports will pan out. Others who have ventured into this space have warned me that lots of Python modules rely on __file__ to derive paths to other resources, which are later stat()d and open()d. __file__ for in-memory modules doesn't exactly make sense and can't be operated on like normal paths/files. I'm not sure what the inevitable struggles to support these modules will lead to. Maybe we'll have to extract things to temporary directories like other standalone Python applications. Maybe PyOxidizer will take off and people will start using the ResourceReader API, which is apparently the proper way to do these things these days. (Caveat: PyOxidizer doesn't yet implement this API but support is planned.) Time will tell. I'm not opposed to gross hacks or writing more code as needed.
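
For a sense of what that API asks of an importer, here is a duck-typed sketch of a resource reader backed by a dict. The method names mirror the ResourceReader protocol (open_resource, resource_path, is_resource, contents); the class name and sample data are invented, and this is not PyOxidizer code. A loader would hand out such an object from its get_resource_reader() method.

```python
import io


class InMemoryResourceReader:
    """Duck-typed sketch of the ResourceReader protocol, backed by a dict."""

    def __init__(self, resources):
        self._resources = resources  # resource name -> bytes

    def open_resource(self, resource):
        """Return a binary file-like object for the named resource."""
        try:
            return io.BytesIO(self._resources[resource])
        except KeyError:
            raise FileNotFoundError(resource) from None

    def resource_path(self, resource):
        # In-memory resources have no path on the filesystem.
        raise FileNotFoundError(resource)

    def is_resource(self, name):
        return name in self._resources

    def contents(self):
        return iter(self._resources)


reader = InMemoryResourceReader({"schema.json": b'{"version": 1}'})
with reader.open_resource("schema.json") as fh:
    print(fh.read())  # prints b'{"version": 1}'
```

Code that reads resources through this kind of interface never needs __file__, which is what makes it compatible with in-memory modules.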

Producing highly distributable and performant Python applications has been far too difficult for far too long. My primary goal for PyOxidizer is to lower these barriers. By leveraging Rust, I also hope to bring Python and Rust closer together. I want to enable applications and libraries to effortlessly harness the powers of both of these fantastic programming languages.

Again, PyOxidizer is still in its infancy. I anticipate a significant amount of hacking over the holidays and hope to share updates in the weeks ahead. Until then, please leave comments, watch the project on GitHub, file issues for bugs and feature requests, etc and we'll see where things lead.

