Why You Shouldn't Use Git LFS

May 12, 2021 at 10:30 AM | categories: Mercurial, Git

I have long held the opinion that you should avoid Git LFS if possible. Since people keeping asking me why, I figured I'd capture my thoughts in a blog post so I have something to refer them to.

Here are my reasons for not using Git LFS.

Git LFS is a Stop Gap Solution

Git LFS was developed outside the official Git project to fulfill a very real market need that Git didn't/doesn't handle large files very well.

I believe it is inevitable that Git will gain better support for handling of large files, as this seems like a critical feature for a popular version control tool.

If you make this long bet, LFS is only an interim solution and its value proposition disappears after Git has better native support for large files.

LFS as a stop gap solution would be tolerable except for the fact that...

Git LFS is a One Way Door

The adoption or removal of Git LFS in a repository is an irreversible decision that requires rewriting history and losing your original commit SHAs.

In some contexts, rewriting history is tolerable. In many others, it is an extremely expensive proposition. My experience maintaining version control in professional contexts aligns with the opinion that rewriting history is expensive and should only be considered a measure of last resort. Maybe if tools made it easier to rewrite history without the negative consequences (e.g. GitHub would redirect references to old SHA1 in URLs and API calls) I would change my opinion here. Until that day, the drawbacks of losing history are just too high to stomach for many.

The reason adoption or removal of LFS is irreversible is due to the way Git LFS works. What LFS does is change the blob content that a Git commit/tree references. Instead of the content itself, it stores a pointer to the content. At checkout and commit time, LFS blobs/records are treated specially via a mechanism in Git that allows content to be rewritten as it moves between Git's core storage and its materialized representation. (The same filtering mechanism is responsible for normalizing line endings in text files. Although that feature is built into the core Git product and doesn't work exactly the same way. But the principles are the same.)

Since the LFS pointer is part of the Merkle tree that a Git commit derives from, you can't add or remove LFS from a repo without rewriting existing Git commit SHAs.

I want to explicitly call out that even if a rewrite is acceptable in the short term, things may change in the future. If you adopt LFS today, you are committing to a) running an LFS server forever b) incurring a history rewrite in the future in order to remove LFS from your repo, or c) ceasing to provide an LFS server and locking out people from using older Git commits. I don't think any of these are great options: I would prefer if there were a way to offboard from LFS in the future with minimal disruption. This is theoretically possible, but it requires the Git core product to recognize LFS blobs/records natively. There's no guarantee this will happen. So adoption of Git LFS is a one way door that can't be easily reversed.

LFS is More Complexity

LFS is more complex for Git end users.

Git users have to install, configure, and sometimes know about the existence of Git LFS. Version control should just work. Large file handling should just work. End-users shouldn't have to care that large files are handled slightly differently from small files.

The usability of Git LFS is generally pretty good. However, there's an upper limit on that usability as long as LFS exists outside the core Git product. And LFS will likely never be integrated into the core Git product because the Git maintainers know that LFS is only a stop gap solution. They would rather solve large files storage correctly than ~forever carry the legacy baggage of having to support LFS in the core product.

LFS is more complexity for Git server operators as well. Instead of a self-contained Git repository and server to support, you now have to support a likely separate HTTP server to facilitate LFS access. This isn't the hardest thing in the world, especially since we're talking about key-value blob storage, which is arguably a solved problem. But it's another piece of infrastructure to support and secure and it increases the surface area of complexity instead of minimizing it. As a server operator, I would much prefer if the large file storage were integrated into the core Git product and I simply needed to provide some settings for it to just work.

Mercurial Does LFS Slightly Better

Since I'm a maintainer of the Mercurial version control tool, I thought I'd throw out how Mercurial handles large file storage better than Git. Mercurial's large file handling isn't great, but I believe it is strictly better with regards to the trade-offs of adopting large file storage.

In Mercurial, use of LFS is a dynamic feature that server/repo operators can choose to enable or disable whenever they want. When the Mercurial server sends file content to a client, presence of external/LFS storage is a flag set on that file revision. Essentially, the flag says the data you are receiving is an LFS record, not the file content itself and the client knows how to resolve that record into content.

Conceptually, this is little different from Git LFS records in terms of content resolution. However, the big difference is this flag is part of the repository interchange data, not the core repository data as it is with Git. Since this flag isn't part of the Merkle tree used to derive the commit SHA, adding, removing, or altering the content of the LFS records doesn't require rewriting commit SHAs. The tracked content SHA - the data now stored in LFS - is still tracked as part of the Merkle tree, so the integrity of the commit / repository can still be verified.

In Mercurial, the choice of whether to use LFS and what to use LFS for is made by the server operator and settings can change over time. For example, you could start with no use of LFS and then one day decide to use LFS for all file revisions larger than 10 MB. Then a year later you lower that to all revisions larger than 1 MB. Then a year after that Mercurial gains better native support for large files and you decide to stop using LFS altogether.

Also in Mercurial, it is possible for clients to push a large file inline as part of the push operation. When the server sees that large file, it can be like this is a large file: I'm going to add it to the blob store and advertise it as LFS. Because the large file record isn't part of the Merkle tree, you can have nice things like this.

I suspect it is only a matter of time before Git's wire protocol learns the ability to dynamically advertise remote servers for content retrieval and this feature will be leveraged for better large file handling. Until that day, I suppose we're stuck with having to rewrite history with LFS and/or funnel large blobs through Git natively, with all the pain that entails.

Conclusion

This post summarized reasons to avoid Git LFS. Are there justifiable scenarios for using LFS? Absolutely! If you insist on using Git and insist on tracking many large files in version control, you should definitely consider LFS. (Although, if you are a heavy user of large files in version control, I would consider Plastic SCM instead, as they seem to have the most mature solution for large files handling.)

The main point of this post is to highlight some drawbacks with using Git LFS because Git LFS is most definitely not a magic bullet. If you can stomach the short and long term effects of Git LFS adoption, by all means, use Git LFS. But please make an informed decision either way.


Pure Rust Implementation of Apple Code Signing

April 14, 2021 at 01:45 PM | categories: PyOxidizer, Rust

A few weeks ago I (foolishly?) set out to implement Apple code signing (what Apple's codesign tool does) in pure Rust.

I wanted to quickly announce on this blog the existence of the project and the news that as of a few minutes ago, the tugger-apple-codesign crate implementing the code signing functionality is now published on crates.io!

So, you can now sign Apple binaries and bundles on non-Apple hardware by doing something like this:

$ cargo install tugger-apple-codesign
$ rcodesign sign /path/to/input /path/to/output

Current features include:

  • Robust support for parsing embedded signatures and most related data structures. rcodesign extract can be used to extract various signature data in raw or human readable form.
  • Parse and verify RFC 5652 Cryptographic Message Syntax (CMS) signature data.
  • Sign binaries. If a code signing certificate key pair is provided, a CMS signature will be created. This includes support for Time-Stamp Protocol (TSP) / RFC 3161 tokens. If no key pair is provided, you get an ad-hoc signature.
  • Signing bundles. Nested bundles and binaries will automatically be signed. Non-code resources will be digested and a CodeResources XML file will be produced.

The most notable missing features are:

  • No support for obtaining signing keys from keychains. If you want to sign with a cryptographic key pair, you'll need to point the tool at a PEM encoded key pair and CA chain.
  • No support for parsing the Code Signing Requirements language. We can parse the binary encoding produced by csreq -b and convert it back to this DSL. But we don't parse the human friendly language.
  • No support for notarization.

All of these could likely be implemented. However, I am not actively working on any of these features. If you would like to contribute support, make noise in the GitHub issue tracker.

The Rust API, CLI, and documentation are still a bit rough around the edges. I haven't performed thorough QA on aspects of the functionality. However, the tool is able to produce signed binaries that Apple's canonical codesign tool says are well-formed. So I'm reasonably confident some of the functionality works as intended. If you find bugs or missing features, please report them on GitHub. Or even better: submit pull requests!

As part of this project, I also created and published the cryptographic-message-syntax crate, which is a pure Rust partial implementation of RFC 5652, which defines the cryptographic message signing mechanism. This RFC is a bit dated and seems to have been superseded by RPKI. So you may want to look elsewhere before inventing new signing mechanisms that use this format.

Finally, it appears the Windows code signing mechanism (Authenticode) also uses RFC 5652 (or a variant thereof) for cryptographic signatures. So by implementing Apple code signatures, I believe I've done most of the legwork to implement Windows/PE signing! I'll probably implement Windows signing in a new crate whenever I hook up automatic code signing to PyOxidizer, which was the impetus for this work (I want to make it possible to build distributable Apple programs without Apple hardware, using as many open source Rust components as possible).


Rust is for Professionals

April 13, 2021 at 08:20 AM | categories: Programming, Rust

A professional programmer delivers value through the authoring and maintaining of software that solves problems. (There are other important ways for professional programmers to deliver value but this post is about programming.)

Programmers rely on various tools to author software. Arguably the most important and consequential choice of tool is the programming language.

In this post, I will articulate why I believe Rust is a highly compelling choice of a programming language for software professionals. I will state my case that Rust disposes software to a lower defect rate, reduces total development and deployment costs, and is exceptionally satisfying to use. In short, I hope to convince you to learn and deploy Rust.

My Background and Disclaimers

Before I go too far, I'm targeting this post towards professional programmers - people who program (or support programming through roles like management) as their primary line of work or who spend sufficient time programming outside of work. I consider myself a professional programmer both because I am a full-time engineer in the software industry and because I contribute to some significant open source projects outside of my day job.

The statement Rust is for Professionals does not imply any logical variant thereof. e.g. I am not implying Rust is not for non-professionals. Rather, the subject/thesis merely defines the audience I want to speak to: people who spend a lot of time authoring, maintaining, and supporting software and are invested in its longer-term outcomes.

I think opinion pieces about programming languages benefit from knowing the author's experience with programming. I first started hacking on code in the late 1990's. I've been a full-time software developer since 2007 after graduating with a degree in Computer Engineering (after an aborted attempt at Biomedical Engineering - hence my affinities for hardware and biological sciences). I've programmed in the following languages: C, C++ (only until C++11), C#, Erlang, Go, JavaScript, Java, Lua, Perl, PHP, Python, Ruby, Rust, shell, SQL, and Verilog. Notably missing from this list is a Lisp and a Haskell/Scala type language. Of these languages, I've spent the most time with C, C#, JavaScript, Perl, PHP, Python, and Rust.

I'm not that strong in computer science or language theory: many colleagues can talk circles around me when it comes to describing computer science and programming language concepts like algorithms, type theory, and common terms used to describe languages. (I have failed many technical interviews because of my limitations here.) In contrast, I perceive my technical strengths as applying an engineering rigor and practicality to problem solving. I care vastly more about how/why things work the way they do and the practical consequences of decisions/choices we make when it comes to software. I find that I tend to think about 2nd and 3rd order effects and broader or longer-term consequences more often than others. Some would call this systems engineering.

I've programmed all kinds of different software. Backend web services, desktop applications, web sites, Firefox browser internals, the Mercurial version control tool, build systems, system/machine management. Notably missing are mobile programming (e.g. iOS/Android) and serious embedded systems (I've hacked around with Raspberry Pis and Arduinos, but those seem very friendly compared to other embedded devices). My strongest affinity is probably towards systems software and general purpose tools: I enjoy building software that other people use to build things. Infrastructure if you will.

Finally, I am expressing my personal opinion in this post. I do not speak for any employer, present or former. While I would love to see more Rust at my current employer, this post is not an attempt to influence what happens behind my employer's walls: there a better ways to conduct successful nemawashi / 根回し than a public blog post. I am not affiliated with the Rust Project in any capacity beyond a very infrequent code contributor and issue filer: I view myself as a normal Rust user. I did work at Mozilla - the company who bankrolled most of Rust's initial development. I even briefly worked in the same small Vancouver office as Graydon Hoare, Rust's primary credited inventor! While I was keen for Rust to succeed because it was affiliated with my then employer, I was most definitely not a Rust evangelist or fan boy while at Mozilla. I have little to personally benefit from this post: I'm writing it because I enjoy writing and I believe the message is important.

With that out of the way, let's talk about Rust!

Rust Makes Me Irrationally Giddy

When I look back at my professional self when I was in my 20s, I feel like I was young and dumb and overly exuberant about computers, technology, new software, and the like. An older, more grizzled professional, I now accept the reality that it is a miracle computers and software work as well as they do as often as they do. Point at any common task on a computer and an iceberg of complexity and nuance lingers under the surface. Our industry is abound in the repetition of proven sub-optimal ideas. You see practices cargo culted across the decades (like the 80 character terminal/line width and null-terminated strings, which can both be traced back to Hollerith punchcards from the late 19th century). You witness cycles of pendulum swings, the same fads and trends, just with different labels (microservices are the new SOA, YAML is the new XML, etc). I can definitely relate to people in this industry who want to drop everything and move to a farm or something (but I grew up in Indiana and had cows living down the street, so I know this lifestyle isn't for me).

Rust is the first programming language I've encountered in years that makes me excited. And not just normal excited: irrationally excited. Like the kind of excitement you have for something when you are naive about its limitations and don't know any better (like many blockchain/cryptocurrency advocates). I feel like the discovery of Rust is transporting me back to my younger self, before I discovered the ugly realities of how computers and software work, and is giving me hope that better tools, better ways of building software could actually exist. To channel my inner Marie Kondo: Rust sparks joy.

When I started learning Rust in earnest in 2018, I thought this was a fluke. It is just the butterflies you get when you think you fall in love, I told myself. Give it time: your irrational excitement will fade. But after using Rust for ~2.5 years now, my positive feelings about it have only grown stronger. There's a reason Rust has claimed the top spot in Stack Overflow's most loved languages survey for 5 years and running. And not by the skin of its teeth: Rust is blowing the competition out of the water. 19% over TypeScript and Python. 23% over Kotlin and Go. If this were a Forrester report for a company-offered product, Rust would be the clear market leader and marketers and salespeople would be using this result to sign up new customers in droves and print money hand over fist.

Let me tell you why Rust excites me.

Rust is Different (In a Good Way)

After you've learned enough programming languages, you start to see common patterns. Manual versus garbage collected memory management. Control flow primitives like if, else, do, while, for, unless. Nullable types. Variable declaration syntax. The list goes on.

To me, Rust introduced a number of new concepts, like match for control flow, enums as algebraic types, the borrow checker, the Option and Result types/enums and more. There were also behaviors of Rust that were different from languages I knew: variables are immutable by default, Result types must be checked they aren't an error to avoid a compiler warning, refusing to compile if there are detectable memory access issues, and tons more.

Many of the new concepts weren't novel to Rust. But considering I've had exposure to many popular programming languages, the fact many were new to me means these aren't common features in mainstream languages. Learning Rust felt like fresh air to me: here was a language designed to be general purpose and make inroads into industry adoption while also willing to buck many of the trends of conventional language design from the last several decades.

When going against conventional practice, it is very easy to unintentionally alienate yourself from potential users. Design a programming language too unlike anything in common use and you are going to have a difficult time attracting users. This is a problem with many academic/opinionated programming languages (or so I hear). Rust does venture away from the tried and popular. And that does contribute to a steeper learning curve. However, there is enough familiarity in Rust's core language to give you a foothold when learning Rust. (And Rust's official learning resources are terrific.)

I feel like Rust's language designers set out to take a first principles approach to the language using modern ideas and ignoring old, disproven ones, realized they needed to ground the language in familiarity to achieve market penetration, and produced reasonable compromises to yield something that was new and novel but familiar enough to not completely alienate its large potential user base.

If you don't like being exposed to new ideas and ways of working, Rust's approach is probably a negative to you. But if you are like me and enjoy continuously expanding your knowledge and testing new ideas, Rust's novelty and willingness to be different is a much welcomed attribute.

Rust: Toolbox Included

It used to be that programming languages were just compilers or interpreters. In recent years, we've seen more and more programming languages bundled with other tools, such as build/packaging tools, code formatters, linters, documentation generators, language servers, centralized package repositories, and more.

I'm not sure what spurred this trend (maybe it was Go?), but I think it is a good move. Programming languages are ecosystems and the compiler/interpreter is just one part of a complex system. If you care about end-user experience and adoption (especially if you are a new language), you want an as turnkey on-boarding experience as possible. I think that's easier to pull off when you offer a cohesive, multi-tool strategy to attract and retain users.

We refer to programming languages with a comprehensive standard library as batteries included. I'm going to refer to programming languages with additional included tools beyond the compiler/interpreter as toolbox included.

Rust, is very much a toolbox included language. (Unless you are installing it via your Linux distribution: in that case Linux packagers have likely unbundled all the tools into separate packages, making the experience a bit more end-user hostile, as Linux packagers tend to do for reasons that merit their own blog post. If you want to experience Rust the way its maintainers intended - the Director's Cut if you will - install Rust via rustup.)

In addition to the Rust compiler (rustc) and the Rust standard library, the following components are all officially developed and offered as part of the Rust programming language on GitHub:

  • Cargo - Rust's package manager and build system.
  • Clippy - A Rust linter.
  • rustdoc - Documentation generator for Rust projects.
  • rustfmt - A Rust code formatter.
  • rls - A Rust Language Server Protocol implementation.
  • crates.io - Rust's official, public package registry.
  • rustup - Previously mentioned Rust installer.
  • vscode-rust - Visual Studio Code extension adding support for Rust. (JetBrains has their own high quality extension for their IDEs, which they develop themselves.)
  • The Rust Programming Language Book
  • And many more.

As an end-user, having all these tools and resources at my fingertips, maintained by the official Rust project is an absolute joy.

For the local tools, rustup ensures they are upgraded as a group, so I don't have to worry about managing them. I periodically run rustup update to ensure my Rust toolbox is up-to-date and that's all I have to do.

Contrast with say Node.js, Python, and Ruby, where the package manager is on a separate release cadence from the core language and I have to think about managing multiple tools. (Rust will likely have to cross this bridge once there are multiple implementations of Rust or multiple popular package managers. But until then, things are very simple.)

Further contrast with languages like JavaScript/Node.js, Python, and Ruby, where tools like a code formatter, linter, and documentation generator aren't always developed under the core project umbrella. As an end-user, you have to know to seek out these additional value-add tools. Furthermore, you have to know which ones to use and how to configure them. The fragmentation also tends to yield varying levels of quality and end-user experience, to the detriment of end-users. The Rust toolbox, by contrast, feels simple and polished.

Rust's toolbox included approach enables me to follow unified practices (arguably best practices) while expending minimal effort. As a result, the following tend to be very similar across nearly every Rust project you'll run into:

  • Code formatting. (Nearly everyone uses rustfmt.)
  • Adherence to common coding and style conventions. (Nearly everyone uses clippy.)
  • Project documentation. (Nearly everyone uses rustdoc.)

Cargo could warrant its own dedicated section. But I'll briefly touch on it here.

Cargo is Rust's official package manager and build system. With cargo, you can:

  • Create new Rust projects with a common project layout.
  • Build projects.
  • Run project tests.
  • Update project dependencies.
  • Generate project documentation (via rustdoc).
  • Install other Rust projects from source.
  • Publish packages to Rust package registries.

As a build system, Cargo is generally a breeze to work with. Configuration files are TOML. Adding dependencies is often a 1 line addition to a Cargo.toml file. Dependencies often just work on the first try. It's not like say C/C++, where taking on a new dependency can easily consume a day or two to get it integrated in your build system and compatible with your source code base. I can't emphasize enough how much joy it brings to be able to leverage an it just works build tool for systems-level programming: I'm finding myself doing things in Rust like parsing ELF, PE, and Mach-O binaries because it is so easy to integrate low-level functionality like this into any Rust program. Cargo is boring. And when it comes to build systems, that's a massive compliment!

No other language I've used has as comprehensive and powerful of a toolbox as Rust does. This toolbox is highly leveraged by the Rust community, resulting is remarkable consistency across projects. This consistency makes it easier to understand, use, and contribute back to other Rust projects. Contrast this with say C/C++, where large code bases often employ multiple tools in the same space on different parts of the same code base, leading to cognitive dissonance and overhead.

As a professional programmer, Rust's powerful and friendly toolbox enables me to build Rust software more easily than with other languages. I spend less time wrangling tools and more time coding. That translates to less overhead delivering value through software. Other languages would be wise to emulate aspects of Rust's model.

Rust is Humane

Of all the programming languages I've used, Rust seems to empathize with its users the most.

There's a few facets to this.

A lot of care seems to have gone into the end-user experience of the Rust toolbox.

The Rust compiler often gives extremely actionable error and warning messages. If something is wrong, it tells me why it is wrong, often pointing out exactly where in source code the problem resides, drawing carets to the source code where things went wrong. In many cases, the compiler will emit a suggested fix, which I can incorporate automatically by pressing a few keys in my IDE. Contrast this with C/C++ and even Go, which tend to have either too-terse-to-be-actionable or too-verbose-to-make-sense-of feedback. By comparison, output from other compilers often comes across as condescending, as if they are saying git gud, idiot. Rust's compiler output tends to come across as I'm sorry you had a problem: how can I help? I feel like the compiler actually cares about my [valuable] time and satisfaction. It wants to keep me in flow.

Then there's Clippy, a Rust linter maintained as part of the Rust project.

One thing I love about Clippy is - like the compiler - many of the lints contain suggestions, which I can incorporate automatically through my IDE. So many other linters just tell you what is wrong and don't seem to go the extra mile to be respectful of my time by offering to fix it for me.

Another aspect of Clippy I love is it is like having an invisible Rust mentor continuously providing constructive feedback to help me level-up my Rust. I don't know how many times I've written Rust code similarly to how I would write code in other languages and Clippy suggests a more Rustic solution. Most of the time I'm like oh, I didn't know about that: that's a much better pattern/solution than what I wrote!

Do I agree with Clippy all the time? Nope. But I do find its signal to noise ratio is exceptionally high compared to other linters I've used. And Clippy is trivial to configure and override, so disagreements are easy to manage. Like the Rust compiler, I feel that Clippy is respectful of my time and has the long term maintainability and correctness of my software at heart.

Then there's the Rust Community - the people behind the core Rust projects. The Rust Community is one of the most professional and welcoming I've seen. Their Code of Conduct is sufficiently comprehensive and actionable. They have their vigorous debates like any other community. But the conversation is civil. Bad apples are discarded when they crop up.

At a talk I made about PyOxidizer at a Rust meetup a few years back, I made a comment in passing about a negative comment I encountered on a Rust sub-Reddit. After the talk, a moderator of that sub who was in the audience (unbeknownst to me) approached for more information so they could investigate, which they did.

I once tweeted about a somewhat confusing, not-very-actionable compiler error I encountered. A few minutes later, some compiler developers were conversing in replies. A few hours later, a pull request was created and a much better error message was merged in short order. I'm not a special one-off here either: I've stumbled across Stack Overflow questions and other forums where Rust core developers see that someone is encountering a confusing issue, question the process that got them to that point, and then make refinements to minimize it from happening in the future. The practice is very similar to what empathetic product managers and user experience designers do.

Not many other communities (or companies for that matter) seem to demonstrate such a high level of compassion and empathy for their users. To be honest, I'm not sure how Rust manages to pull it off, as this tends to be very expensive in terms of people time and it can be very easy to not prioritize. One thing is for certain: the Rust Community is loaded with empathetic people who care about the well-being of users of their products. And it shows from the interaction in forums to the software tools they produce. To everyone who has contributed in the Rust Community: thank you for all that you have done and for setting an example for the rest of us to live up to.

Rust is Surprisingly High Level

One of the reasons I avoided learning Rust for years is that I perceived it was too low level and therefore tedious. Rust was being advertised as a systems programming language and you would hear stories of fighting the borrow checker. I assumed I'd need to be thinking a lot about memory and ownership. I assumed the cost to author and maintain Rust code would be high. I thought Rust would be a safer C/C++, with many of the software development lifecycle caveats that apply. And for the software I was writing at a time, the value proposition of Rust seemed weak. I thought a combination of C and say Python was good enough. When I started writing PyOxidizer, I initially thought only the run-time code calling into the Python interpreter C APIs would be written in Rust and the rest would be Python.

How wrong I was!

When I actually started coding Rust, I was shocked at how high-level it felt. Now, depending on the space of your software, Rust code can be very low-level and tedious (not unlike C/C++). However, for the vast majority of code I author, Rust feels more like Python than C. And even the lower-level code feels much higher level than C or even C++.

In my mind, the expressiveness of Rust comes very close to higher-level, dynamic languages (like JavaScript, Python, and Ruby) while maintaining the raw speed of C/C++ all without sacrificing low-level control for cases when you need it. And it does all of this while maintaining strong safety guarantees (unlike say Go, which has the billion dollar mistake: null references).

I had a mental Venn diagram of the properties of programming languages (gc versus non-gc, static versus dynamic typing, compiled versus interpreted, etc) and which traits (like execution speed, development time, etc) would be possible and Rust invalidated large parts of that model!

You often don't need to think about memory management in Rust: once you understand the rules the borrow checker enforces, memory is largely something that exists but is managed for you by the language, just like in garbage collected languages. Of course there are scenarios where you should absolutely be thinking about memory and should have a grasp on what Rust is doing under the hood. But in my experience, most code can be blissfully ignorant of what is actually happening at the memory level. (However, awareness of value ownership when programming Rust does add overhead, so it's not like the cognitive load required for reasoning about memory disappears completely.)

Rust has both a stack and a heap. But when programming you often don't need to distinguish these locations. You can do things in Rust like return a reference to a stack allocated value and pass this reference around to other functions. This would be a CVE factory in C/C++. But because of Rust's borrow checker, this is safe (and a common practice) in Rust. It also predisposes the code towards better performance! Often in C/C++ you will allocate on the heap because you need to return a reference to memory and returning a reference to a stack allocated value is extremely dangerous. This heap allocation incurs run-time overhead. So Rust allowing you to do the fast thing safely is a nice mini win.

In many statically typed languages, I feel like my programming speed is substantially reduced by having to repeatedly spell out or think about type names. In C, it feels like I'm always writing type names so I can perform casting. Newer versions of C++ and Java have improved matters significantly (e.g. the auto keyword). However, I haven't programmed them enough recently to know how they compare to Rust on this front. All I know is that I'm writing type names a lot less frequently in Rust than I thought I would be and that my programming output isn't limited by my typing speed as much as it historically was in C/C++.

Despite being compiled down to assembly and exposing extremely low-level control, Rust often feels like a higher-level language. Equivalent functionality in Rust is often more concise and/or readable than in C/C++, while performing similarly, all while having substantially stronger safety guarantees. As a professional programmer, the value proposition is blinding: Rust enables me to do more with less, achieve a lower defect rate, and not sacrifice on performance.

Correctness, Quality, Execution Speed, and Development Velocity: Pick 4

The operation of computers and operating systems is exceptionally complex.

All programming languages justifiably attempt to abstract away aspects of this complexity to make it easier to deliver value through software. For example:

  • Assembly is hard: here's a higher level language that compiles down to assembly or is implemented in a language that does.
  • Managing memory manually is hard: use garbage collection.
  • Concurrency is hard: only allow 1 thread to run at a time (JavaScript, Python, etc).
  • Text encoding is hard: strings are Unicode/UTF-8.
  • Operating systems have different interfaces: here's a pile of abstractions in the standard library for things like I/O, networking, filesystem paths, etc.
  • Strong, static typing isn't very flexible and can impose high change costs: use dynamic typing.
  • And tons more.

These abstractions often have undesirable consequences/trade-offs:

  • Garbage collection adds run-time overhead (10% is a number that's commonly cited).
  • Garbage collection adds random slowdowns/pauses, making it difficult to achieve consistency in long-tail latency optimization (i.e. ensuring consistency in P99.9, P99.99, and beyond percentiles).
  • Interpreted languages tend to be slower than compiled languages unless you invest lots of time into a JIT.
  • Limiting execution to a single thread limits the ability to harness the full power of modern CPUs, which tend to have several cores.
  • Primitives like environment variables, process arguments, and filenames aren't guaranteed to be UTF-8 and coercing them to UTF-8 can be lossy.
  • Dynamic typing doesn't catch as many bugs at compile time and you have to be more diligent about guarding against invariants.
  • And tons more.

In other words, there are trade-offs with nearly every decision in programming language and [standard] library design. There are usually no obviously correct and undesirable consequence-free decisions.

And we further have to consider the fallibility of people and the inevitability that mistakes will be made, that bugs and regressions will be introduced and will need addressing. As an industry, we generally accept that mistakes occur and bugs are an unavoidable aspect of software development. If new features and enhancements are value, bugs and defects are anti-value. Like financial debt, existence of bugs and sub-optimal code can be tolerated to varying extents. But this is a highly nuanced topic and different people, companies, and projects will have different perspectives on it. We can all agree that bugs are an inevitable fact of software.

We also need to confront the reality that as an industry we have very little empirical data that says much of significance about topics like static versus dynamic typing. Although we do know some things. As Alex Gaynor informs us in What science can tell us about C and C++'s security, the result of ~2/3 of security vulnerabilities being caused by memory unsafety seems to reproduce against a sufficiently diverse set of projects and companies. That result and the implications of it are worth paying attention to!

With that being said, let's dive into my take on the matter.

Of all the programming languages I've used, I feel that Rust has the strongest disposition towards authoring and maintaining correct, high-quality software. It does this by offering a myriad of features that are designed to prevent (or at least minimize) defects. In addition, I believe Rust shifts the detection of defects to earlier in the software development lifecycle, greatly reducing the cost to mitigate defects and therefore develop software.

(As an aside, every time the topic of Rust's safety and correctness comes up, random people on the Internet rush to their keyboards to say things along the lines of C/C++ and other languages can be made to be just as safe as Rust: it's the bad programmers who are using C/C++ wrong. To those people: please stop. Your belief implies the infallibility of people and machines and that mistakes won't be made. If things like memory unsafety bugs in C/C++ could be prevented, industry titans like Apple, Google, and Microsoft would have found a way. These companies are likely taking many more measures to prevent security vulnerabilities than you are and yet the ~2/3 of security vulnerabilities being caused by memory unsafety (read: humans and machines failing to reason about run-time behavior) result still occurs. To the wiser among us, I urge you to call out perpetrators of this good programmers don't create bugs myth when you see it, just like you would/should if you encounter racist, sexist, or other non-inclusive behaviors. The reason is that belief in this myth can lead to physical or emotional harm, just like non-inclusive -isms. Security bugs, for example, can lead to disclosure of private or sensitive data, which can result in real world harm. Think a stalker or abusive former partner learning where you now live. Or a memory unsafety error in a medical device leading to device malfunction, injuring or killing a patient. Because this is a sensitive topic, I want to be clear that I'm not trying to compare the relative harms incurred by racism, sexism, other -isms, or the mythical perfect programmer. Rather, all I'm saying is each of these surpass the minimum threshold of harm incurred that justifies calling out and stopping the harmful behavior. I believe that as professionals we have an ethical and professional obligation to actively squash the mythical perfect programmer fallacy when we encounter it. Debates on the merits and limits of tools to prevent/find defects is fine: belief in the perfect programmer is not. Sorry for the mini rant: I just get upset by people who think software exists in a vacuum and doesn't have real-world implications for people's safety.)

In the sections below, I'll outline some of Rust's features and behaviors that support my assertion that Rust is biased towards correct and higher quality outcomes and lowers total development cost.

The Borrow Checker

To the uninitiated, the borrow checker is perhaps Rust's most novel contribution to programming. It is a compile time mechanism that enforces various rules about how Rust code must behave. Think of these as laws that Rust code must obey. But these are more like societal laws, not scientific laws (which are irrefutable), as Rust's laws can be broken, often leading to negative consequences, just like societal laws.

Rust's ownership rules are as follows:

  • Each value in Rust has a variable that's called its owner.
  • There can only be one owner at a time.
  • When the owner goes out of scope, the value will be dropped / released.

Then there are rules about references (think intelligent pointers) to owned values:

  • At any given time, you can have either one mutable reference or any number of immutable references.
  • References must always be valid.

Put together, these rules say:

  • There is only a single canonical owner of any given value at any given time. The owner automatically releases/frees the value when it is no longer needed (just like a garbage collected language does when the reference count goes to 0).
  • If there are references to an owned value, that reference must be valid (the owned value hasn't been dropped/released) and you can only have either multiple readers or a single writer (not e.g. a reader and a writer).

The implications of these rules on the behavior of Rust code are significant:

  • Use after free isn't something you have to worry about because references can't point to dropped/released values.
  • Buffer underruns, overflows, and other illegal memory access can't exist because references must be valid and point to an owned value / memory range.
  • Memory level data races are prevented because the single writer or multiple readers rule prevents concurrent reading and writing. (An assertion here is any guards - like locks and mutexes - have appropriate barriers/fences in place to ensure correct behavior in multi-threaded contexts. The ones in the standard library should.)

I used to think that these rules limited the behavior of Rust code. That statement is true. However, as I've thought about it more, I've refined my take to be that ownership and reference rules reinforce properties that well-behaved software exhibits.

If a C/C++ program had illegal memory access, you would say it is buggy and the behavior is not correct. If a Java program attempted to mutate a value on thread A without a lock or other synchronization primitive and thread B raced to read it, leading to data inconsistency, you would also call that a bug and incorrect behavior. If a JavaScript/Python/Ruby function were changed such that it started mutating a value that should be constant, you would call that a bug and incorrect behavior.

While Rust's ownership and reference rules do limit what software can do, the functionality they are limiting is often unsafe or buggy, so losing this functionality is often desirable from a quality and correctness standpoint. Put another way, Rust's borrow checker eliminates entire classes of [common] bugs by preventing patterns that lead to incorrect, buggy behavior.

This. Is. Huge.

Rust's borrow checker catches bugs for which other languages have no automated mechanism or no low cost, low latency mechanism for detecting. There are ways to achieve aspects of what the borrow checker does in other languages. But they tend to require contorting your coding style to accomplish and/or employing high cost tools (often running asynchronously to the compiler) such as {address, memory, thread} sanitizers or fuzzing. With Rust, you get this bug detection built into the language and compiler: no additional tools needed. (I'm not saying you shouldn't run additional tools like sanitizers or fuzz testing against Rust: just that you get a significant benefit of these tools for a drastically lower cost since they are built in to the core language.)

Rust's ownership and reference rules help ensure your software is more well-behaved and bug-free. But, sometimes those rules are too strict. Fortunately, Rust isn't dogmatic about enforcing them. There are legitimate cases where you can't work in the confines of these rules.

Say you want to share a cache between multiple threads. Caches need to be both readable and writable by multiple threads. This violates the reference rules and maybe the single owner ownership rule, depending on how things are implemented. Fortunately, there are primitives in the std::sync module like RwLock and Arc (atomically reference counted) you can use here. Arc (and its non-threadsafe Rc counterpart) give you reference counting, just like a garbage collected language. Primitives like RwLock allow you to wrap an inner value and temporarily acquire an appropriate reference to it, mutable or non-mutable. There's a bit of slight of hand here, but the tricks employed enable you to satisfy the ownership and reference rules and use common programming techniques and patterns while still having the safety and correctness protections the borrow checker enforces.

Data Races: What Data Races?

Multi-threaded and concurrent programming is hard. Really hard. Like it is exceptionally easy to introduce hard-to-diagnose-and-debug bugs hard.

There are many reasons for this. We can all probably relate to the fact that reasoning about multi-threaded code is just hard: instead of 1 call stack to reason about there are N. Further complicating matters are that many of us don't have a firm grasp on how memory works at a very low level. Do you know all the ins and outs on how CPU caches work on the architecture you are targeting? Me neither! (But this is a very good place to start excavating a rabbit hole.)

If you are like me, you've spent many years of your professional career not having to care about multi-threading or concurrent programming because you spend so much time in languages with single threads, are only implementing code that runs in single threaded contexts, or you've recognized the reality that implementing this code safely and correctly is hard and you've intentionally avoided the space or chosen software architectures (like queue-based message passing) to minimize risks. Or maybe if you are say a Java programmer you sprinkle synchronized everywhere out of precaution or in response to race conditions / bugs once they are found. (Everyone's personal experience is different, of course.)

Long story short, the aforementioned ownership and reference rules enforced by the borrow checker eliminate data races. This was a major oh wow moment for me when I learned Rust: I had heard about memory safety but didn't realize the same forces behind it were also responsible for making concurrency safe!

This property is referred to as fearless concurrency. I encourage you to read Aaron Turon's Fearless Concurrency as well as the Fearless Concurrency chapter in the Rust Book as well.

Operating Systems Abstractions Ground in Reality

Rust is the only programming language I've used that attempts to expose operating system primitives like environment variables, command arguments, and filesystem paths and doesn't completely mess it up. Truth be told, this is kind of a niche topic. But as I help maintain a version control tool which needs to care about preserving content identically across systems, this topic is near and dear to my heart.

In POSIX land, primitives like environment variables, command arguments, and filesystem paths are char*, or a bag of null-terminated bytes.

On Windows, these primitives are wchar_t*, or wide bytes.

On both POSIX and Windows, the encoding of the raw bytes can be... complicated.

Nearly every programming language / standard library in existence attempts to normalize these values to its native string type, which is typically Unicode or UTF-8. That's doable and correct a lot of the time. Until it isn't.

Rust, by contrast, has standard library APIs like std::env::vars() that will coerce operating system values to Rust's UTF-8 backed String type. But Rust also exposes the OsString type, which represents operating system native strings. And there are function variants like std::env::vars_os() to access the raw values instead of the UTF-8 normalized ones.

Rust paths internally are stored as OsString, as that as the value passed to the C API to perform filesystem I/O. However, you can coerce paths to String easily enough or define paths in terms of String without jumping through hoops.

The point I'm trying to get across is that Rust's abstractions are ground in the reality of how computers work. Given the choice, Rust will rarely sacrifice the ability to do something correctly. In cases like operating system interop, Rust gives you the choice of convenience or correctness, rather than forcing inconvenience or incorrectness on you, like nearly every other language.

Encoding and Enforcing Invariants in the Type System

Rust enums are algebraic data types. Rust enum variants can have values associated with them and Rust enums, like structs (Rust's main way to define a type), can have functions/methods hung off of them. Rust enums are effectively fully-featured, specialized types, where value instances must be a certain variant of that enum. This makes Rust enums much more powerful than in other languages where enums simply map to integer values and/or can't have associated functions. This power unlocks a lot of possibility and harnessed the right way can drastically improve correctness of code and lead to fewer defects.

Programming inevitably needs to deal with invariants, the various possibilities that can occur. Programmers will reach for control flow operators to handle these: if x do this, else if y do that, switch statements, and the like. Handling every possible invariant can be complex, especially as software evolves over time and the ground beneath you is constantly shifting.

As you become more familiar with Rust, you'll find yourself encoding and enforcing invariants in the type system more and more. And enums are likely the main way you accomplish this.

Let's start with a contrived example. In C/C++, if you had a function that accepted either an Apple or an Orange value, you might do something like: void eat(Apple* apple, Orange* orange). Then you'd have inline logic like if apple != null. In a dynamically typed language, you could pass a single argument, but you'd perform inline type comparison. e.g. with Python you'd write if isinstance(fruit, Apple).

With Rust, you'd declare and use an enum. e.g.

struct Apple {}
struct Orange {}

enum Fruit {
    Apple(Apple),
    Orange(Orange),
}

impl Fruit {
    fn eat(&self) {
        match self {
            Self::Apple(apple) => { ... },
            Self::Orange(orange) => { ... },
        }
    }
}

let apple = Fruit::Apple(Apple { });
apple.eat();

This (again contrived) example shows how we Rust enum variants can hold inner values, how we can define methods on Rust enums (so they behave like regular types), and introduces the match control flow operator.

Quickly, match is a super powerful operator. It will compare its argument against provided patterns and evaluate the arm that matches. Patterns must be comprehensive or the compiler will error. In the case of enums, if you add a variant - say Banana for our Fruit example - and fail to add that variant to existing match expressions, you will get compiler errors!

As you become more proficient with Rust, you'll find yourself moving lots of (often redundant) control flow expressions and conditional dispatch (if X do this, if Y do that) into enum variants and encoding the dispatched actions into that enum/type directly. Conceptually, this is logically little different from having a base type or interface or by having a single wrapper class holds various possible values. But the guarantees are stronger because each distinct possibility is strongly defined as an enum variant. And when combined with the match control flow operator, you can have the Rust compiler verify that all variants are accounted for every time you take conditional action based on the variant.

The 2 most common enums in Rust are Option and Result. The following sections will explain how they work and further demonstrate how invariants can be encoded and enforced in Rust's type system.

Option: A Better Way to Handle Nullability

Many programming languages have the concept of nullable types: the ability for a value to be null or some null-like value. You will often find this expressed in languages as null, nil, None, or some variant thereof.

When programming in these languages, nullable values must be accounted or it could lead to errors. Languages like C/C++ and Go will attempt to to resolve the address behind null/nil, leading to at least a program crash and possibly a security vulnerability. Languages like Java and Python will raise exceptions (NullPointerException in Java - frequently abbreviated NPE because it is so common - and likely TypeError in Python).

The prevalence of failure to account for nullable values is a major reason why null references were coined by their inventor as the billion dollar mistake. (I suspect the real world value is much greater than $1B.)

Having an easy-to-ignore nullable invariant lingering in type systems seems like a massive foot gun to me. And indeed every programmer with sufficient experience has likely introduced a bug due to failure to account for null. I sure have!

Rust doesn't have a null value. Therefore no null references and no billion dollar mistake. Instead, Rust's standard library has Option, an enum representing nullable types / values. And Option is vastly superior to null values.

Option<T> is an enum with 2 variants, Some(T) or None: an instance of some type or nothing. What makes Option different from languages with null references is you have to explicitly ask for the inner value: there is no automatic dereference. Rust forces you to confront the reality that a value is nullable and by doing so can drastically reduce a very common bug class. I say drastically reduce instead of eliminate because it is still possible to shoot yourself in the foot. For example, you can call Option.unwrap() to obtain the inner value, triggering a panic if the None variant is present. Despite the potential for programming errors, this solution is strictly better than null references because Option forces you to confront the reality of nullability and use of the dangerous access mechanisms is relatively easy to audit for. (Clippy has some lints to encourage best practices here.)

The existence of Option<T> means that if you are operating on a non-Option value, that value is guaranteed to exist and not be null. If you are operating on Option, the fact it is optional is explicitly encoded in the type and you know you need to account for it. If the value passed into a function was once always defined and a later refactor changed it to be optional (or vice versa), that semantic change is reflected in the type system and Rust forces you to confront the implications when that change is made, not after it was deployed to production and you started seeing segfaults, NPEs, and the like.

After using Rust's Option<T> to express nullability, you will look at every other language with null references and bemoan how primitive and unsafe it feels by comparison. You will yearn for Rust's safer approach biasing towards correctness and higher quality software. Option<T> is massive feature for the professional programmer who cares about these traits.

Result: A Better Way to Handle Errors

Different programming languages have different ways of handling errors. Returning integers or booleans to express success or failure is common. As is throwing and trapping/catching exceptions.

Like nullability, history has shown us that programmers often fail to handle error invariants, with bugs of varying severity ensuing. Even Linux filesystems fail to handle errors!

I argue that the traditional programming patterns we use to handle errors bias towards buggy outcomes, especially with the return an integer/error value approach. It is easy to forget to check the return value of a function. In C/C++, maybe a function once returned nothing (void) and was later refactored to return an integer error code. You have to know to audit for existing callers when making these changes or updating dependencies. Furthermore, handling errors requires effort. That if err != 0 or if err != nil pattern gets mighty annoying to type all of the time! Plus, you have to know what value to compare against: success can often be 0, -1, or 1 or any other arbitrary value. Getting error handling correct 100% of the time is hard. You will fail and this will lead to bugs.

Result is Rust's primary/preferred mechanism for propagating errors and it is different from traditional approaches.

Like Option<T>, Result<T, E> is an enum with 2 variants: Ok(T) and Err(E). That is, a value is either success, wrapping an inner value of type T or error, wrapping an inner value of type E describing that error.

Like Option<T>, Result<T, E> forces you to confront the existence of invariants. Before operating on the value returned by a function, you need to explicitly access it and that forces you to confront that an error could have occurred. In addition, the Result type is annotated and the compiler will emit a warning when you don't check it. Scenarios like changing an infallible function returning a type T to fallible returning a Result<T, E> will fail to compile (due to typing violations) or make compiler warning noise if there are call sites that fail to account for that change.

In addition to making it more likely that errors are acted upon correctly, Rust also contains a ? operator for simplifying handling of errors.

As I said above, typing patterns like if err != 0 or if err != nil can become extremely tedious. Your brain knows what it needs to type to handle errors but it takes precious seconds to do so, slowing you down. You may have code where the majority of the lines are the same error handling boilerplate over and over, increasing verbosity and arguably decreasing readability.

Rust's ? operator will return an Err(E) variant or evaluate to the inner value from the Ok(T) variant. So you can often add an ? operator after a function call returning a Result<T, E> to automatically propagate an error. Typing a single character is vastly easier and simpler than writing explicit control flow for error handling!

The benefits of ? are blatantly apparent when you have functions calling into multiple fallible functions. Long functions with multiple if err != 0 blocks followed by the next logical operation often reduce to a 1-liner. e.g. bar(foo()?)? or foo.do_x()?.do_y()?. When I said earlier that Rust feels like a higher level language, the ? operator is a significant contributor to that.

There are some downsides to Result<T, E> in terms of programming overhead and consistency between Rust programs. I'll cover these later in the post.

Result<T, E> biases Rust code towards correctness by forcing programmers to confront the reality that an error could exist and should be handled. Once you program in Rust, you will look at error handling mechanisms like returning an error integer or nullable value, realize how brittle and/or tedious they are, and yearn for something better.

The unsafe Escape Hatch

If some of Rust's limitations are too much for you, Rust has an in case of emergency break glass feature called unsafe. This is kind of like C mode where you can do things like access and manipulate raw memory through pointers. You can cast a value to a pointer and back to a new Rust reference/value, effectively short circuiting the borrow checker for that particular reference/value.

A common misconception is unsafe disables the borrow checker and/or loosens type checking. This is incorrect: many of those features are still running in unsafe code. However, because Rust can't fully reason about what's happening (e.g. it doesn't know who owns a raw memory address and when it will be freed), it can't properly enforce all of its rules that guarantee safety, leading to, well, unsafety. (See Unsafe Rust for more on this topic.)

unsafe is a necessary evil. In many Rust programs, you won't have to ever use it. But if you do have to use it, its presence will draw review scrutiny like moths to light. So unlike say C/C++ where practically every memory access is a potential security bug and it is effectively impossible in many scenarios to comprehensively audit for memory safety (if there were there would be no memory safety bugs), using unsafe safely is often viable because scrutiny can be concentrated on its relatively few occurrences. And more experienced Rust programmers know how to encapsulate unsafe into safe wrappers, limiting how much code needs to be audited when code around unsafe changes.

What I've personally been enlightened by is the myriad of operations that Rust considers unsafe. As you learn more and more Rust, you'll encounter random functions sprinkled across the standard library that are unsafe and you'll wonder why. The docs usually tell you and that's how you learn something new (and maybe horrifying) about how computers actually work.

Fearless Refactoring

A significant portion of the software development lifecycle is evolving existing code. Fixing bugs. Extending existing code with new functionality. Refactoring code to fix bugs or prepare for new features. Using code in new, unplanned ways.

In many code bases, the amount of people time spent evolving the code dwarfs the time for creating actual greenfield code/features. (Unfortunately, quantifying when you are doing evolution versus greenfield coding is quite difficult, so both facets often get lumped together into simply software development time. But in my mind they are discrete - although highly interdependent - units of work and the evolution time tends to dwarf the greenfield time on established projects.) So it follows that long-term evolution/maintainability of code bases is more important than initial code creation time.

There is a sufficient body of industry research demonstrating that the cost to fix defects rises exponentially as you progress through the software development lifecycle (do a search for say software development lifecycle cost of fixing a bug).

Furthermore, human memory functions not unlike multi-tier caches and your ability to recall information will diminish over time. (You probably know what you were doing 5 minutes ago, might remember what you were doing at this time yesterday, and probably have no clue what you were doing on this date 20 years ago.)

In terms of coding, the best way to address a defect is to not introduce it in the first place. If you can't do that, your goal is to detect and correct it as early in the development process as possible, as close as possible to when the source code creating that defect came into existence. Practically, in order of descending desirability:

  1. Don't introduce defect (this is impossible because humans are fallible).
  2. Detect and correct defect as soon as the bad key press occurs (within reason: you don't want the programmer to lose too much flow) (milliseconds later).
  3. At next build / test time (seconds or minutes later).
  4. When code is shared with others (maybe you push a branch and CI tells you something is wrong) (minutes to days later).
  5. During code review (minutes to days later).
  6. When code is integrated (e.g. merged) (minutes to days later).
  7. When code is deployed (minutes to days or even months later).
  8. When a bug is reported long after the code has been deployed (weeks to years later).

The earlier a defect is caught, the better the chances that the author (or other involved parties) have relevant code paged in and can fix it with less effort and with lower chances of introducing additional defects. For me, authoring new code is relatively easy compared to refactoring old code. That's because I have new code fully paged into my brain and I know it like the back of my hand. I know where the sharp edges are and how you'll get cut if you make certain changes. However, if several months pass without revisiting the code, most of that heightened awareness evaporates. If I need to change or review that code, my ability to do that with a high degree of confidence and efficiency is drastically eroded.

Generally speaking, the earlier a defect is caught, the less damage it can do. Ideally, a defect is caught and fixed at local development time, before you burden a reviewer with finding it and certainly before it causes harm or anti-value after being deployed!

In addition, compressing the software development lifecycle allows you to ship enhancements sooner, which enables you to deliver value sooner. This is what we're trying to do as professional programmers after all!

Because the cost to fix a defect rises exponentially as it moves through the software development lifecycle, it follows that you want defect detection to occur logarithmically to offset that cost. That means you want as many defects as possible to be caught as early as possible.

Compared to other programming languages I've used, Rust is exceptional at detecting defects earlier in the development lifecycle and as a result can drastically lower overall development costs. Here are the main factors contributing to this belief:

  • The type system is relatively strong and prevents many classes of bugs.
  • The borrow checker and the rules it enforces prevent safety issues at compile time. Some of these violations can be detected by other languages' compilers. However, in many cases sufficient auditing (like {address, memory, thread} sanitizers) is run much less frequently, often only in CI tests, which can be hours or days later.
  • Confidence that the above 2 function as advertised.
  • Invariants can be encoded and enforced in the type system through features like enums being algebraic data types.
  • Variables are immutable by default and must be explicitly annotated as mutable. This forces you to think about where and how data mutation occurs, enabling you to spot issues sooner.
  • Option<T> significantly curtails the billion dollar mistake.
  • Result<T, E> forces you to reckon about handling errors.

The Rust compiler is just exceptional at detecting common defects.

Did your code refactor introduce a use-after-free or dangling reference? Don't worry: the borrow checker will detect that. CVE prevented.

Did you introduce a race condition by performing a mutation somewhere that was previously immutable? The borrow checker will detect that. You potentially just saved hours of time debugging a hard-to-reproduce bug.

Did you add an enum variant but forget to add that variant to a match expression? If you avoided using the match all _ expression, the compiler will tell you match arms aren't exhaustive and give you an error.

Did a value that was previously always defined become nullable? Changing the type from T to Option<T> will yield compiler errors due to type mismatch.

Did an Option<T> that was previously always Some(T) suddenly become None? Hopefully following Rust best practices mean your code will just work. In the worst case you get a panic (with a stack trace). But that's on par with say a Java NPE and is strictly better than a null dereference that you get with languages like C/C++.

Did you change or add a function returning Result<T, E> but forget to check if that Result is an Ok(T) or Err(E), the compiler will tell you.

I could go on. Rust is full of little examples like these where the core language and standard library nudge you towards working code and help detect defects earlier during development, saving vast amounts of time and money later.

The Rust compiler is so good at rooting out problems that many Rust programmers have adopted the expression, if it compiles it works. This statement is obviously falsifiable. But compared to every other programming language I've used, I'm shocked by how often it is true.

For other programming languages, a working compile is the beginning of your verification or debugging journey. For Rust, it often feels like the hard part is over and you are almost done. With other languages, you often have an indefinite number of iterations to fix language defects (like null dereferences or dynamic typing errors) beyond the compile step. You need to address these in addition to any logical/intent defects in your code. And fixing logical/intent defects could introduce more post-compile defects. As a programmer, you just don't know when the process will be done. With Rust, the compiler errors tell you exactly what the language defects are. So by the time you appease the compiler, you are left with just your logical/intent defects. I greatly prefer the Rust workflow which separates these because I'm getting clearer feedback on my progress: I know that once I've addressed all the language defects the compiler complains about that is just a matter of fixing logical/intent defects. I know I'm a giant step closer to victory.

The Progress Principle is a psychological observation that people tend to prefer a series of more smaller wins over fewer larger wins. And (unexpected) setbacks can more than offset the benefits of wins. (The book is an easy read and I've found its insights applicable to software development workflows.) Whether Rust's language designers realized it or not, Rust's development workflow plays into our psychological dispositions as described by The Progress Principle: defects (setbacks) tend to occur earlier (at compile time), not at unexpected later times (during code review, CI testing, deploy, etc) and our progress towards a working solution is composed of small wins, such as fixing compiler errors and knowing when you transition from language defects into logical/intent defects. For me, this makes iterating on Rust more fulfilling and enjoyable than other languages.

Rust Makes You a Better Overall Programmer

Whether you realize it or not, every programmer has a personal, generalized model of how to program, how to reason about code, best practices, and what not. When we program, we specialize that model to the language and environment/project we're programming for. The mental model that each of us has its shaped by our experience: which languages we know, which concepts we've been exposed, mistakes we've made, people we've worked with and the practices they've instilled.

If for no other reason, you should learn Rust to expand your generalized model of how to program so that you can apply Rust's principles outside of Rust.

Before I learned Rust, I had a mental model of the lifetimes of various values/variables/memory and how they would be used. If I were coding C, I would attempt to document these in function comments. e.g. if returning a pointer, the comment would say how long the memory behind that pointer lives or who is responsible for freeing it. So when I encountered Rust's ownership and reference rules when learning Rust, they substantially overlapped with my personal mental model of how you should reason about memory in order to avoid bugs. I distinctly remember reading the Rust Book and thinking wow, this seems to be a formalization of some of the concepts and best practices living in my head!

After using Rust for several months, I realized that my prior mental model around reasoning about safe program behavior was woefully incomplete and that Rust's was far superior.

Rust's different ways of doing things will inevitably force you to think about type design, data access patterns, control flow, etc more than most other programming languages. In most other languages, it is much easier to just write runnable code and defer the complexity around ensuring the code is safe/correct and free from certain classes of bugs, like memory access violations and race conditions. Rust's ways of doing things forces you to confront many of these problems up-front, before anything runs.

Rust's stricter model and way about authoring software eventually percolates into your personal generalized model of how to program in any programming language. As you internalize patterns needed to program Rust proficiently, you will subconsciously cherry-pick aspects of Rust and apply them when programming in other languages, making you a better programmer in those languages.

For example, when you program C/C++, you will realize the minefield of memory safety issues that linger in those languages. Many of those mines never explode. But knowing Rust and the patterns needed to appease the borrow checker and write safe code, you have a better sense of where the mines are located, the patterns that lead to them exploding, and you can take preemptive steps or apply extra scrutiny to avoid tripping them. (If you are like me, you'll reach the conclusion that C/C++ is intrinsically unsafe and is beyond saving, vowing to avoid it as much as possible because it is just too dangerous to use safely/responsibly.)

Similarly, when programming in any language, you'll probably think more about variable mutability and non-mutability, even if those languages don't have the concept of mutability on variables. You'll be more attune to certain patterns for mutating data: where mutation occurs, who has a mutable reference, when there are both mutable and non-mutable references in existence. Again, your knowledge from Rust will subconsciously raise your awareness for classes of bugs, making you a better programmer.

The same thing applies to multi-threaded programming and race conditions. After internalizing Rust's model of how to achieve multi-threading safely, you will probably not look at multi-threading in other languages the same way again. If you are like me, you will be horrified by how the lack of Rust's enforced ownership/reference rules predisposes code to so many horrible and hard-to-debug bugs. Again, you will probably find yourself changing your approach to multi-threading to minimize risk.

Fun fact: while at Mozilla I heard multiple anecdotes of [very intelligent] Firefox developers thinking they had found a bug in Rust's borrow checker because they thought it was impossible for a flagged error to occur. However, after sufficient investigation the result was always (maybe with an exception or two because Mozilla adopted Rust very early) that the Rust compiler was correct and the developer's assertions about how code could behave was incorrect. In these cases, the Rust compiler likely prevented hard-to-debug bugs or even exploitable security vulnerabilities. I remember one developer exclaiming that if the bug had shipped, it would have taken weeks to debug and would likely have gone unfixed for years unless its severity warranted staffing.

I strongly feel that I am a better programmer overall after learning Rust because I find myself applying the [best] practices that Rust enforces on me when programming in other languages. For this reason, even if you don't plan to use Rust in any serious capacity, I encourage people to learn Rust because exposure to its ideas will likely transform the ways you think about programming for the better.

Rust Downsides and Dispelling Some Rust Myths

This post has been rather positive about Rust so far. Rust, like everything, is far from perfect and it has its downsides. Professionals know the limitations of their tools and you should know some of the issues you'll run into when using Rust.

In addition, Rust is still a relatively young and unpopular programming language. Since relatively few people know Rust, there are a handful of myths and inaccuracies circling about the language. I'll also dispel some of those here.

Steeper Learning Curve

A common criticism levied against Rust is it is harder to learn than other programming languages. I think this is a valid concern. My experience is Rust took longer to learn and level-up than other languages I've learned recently, notably Go, Kotlin, and Ruby.

I think the primary reason for this is the borrow checker and the rules it enforces. Many programmers have never encountered forced following of ownership and reference rules before and this concept is completely foreign at first. I liken it to a new way to program. If you only have experience with dynamically typed languages that will allow you to compile a ham sandwich, there's a good chance you'll be frustrated by Rust. Rust will likely challenge your conceptions of how programming should work and may frustrate you in the process.

In addition to the borrow checker itself, there are a myriad of types and patterns you'll encounter and eventually need to understand to appease the borrow checker.

Beyond the borrow checker, Rust's standard library is comprehensive and offers a lot of types and traits. It will take a while to be exposed to many of them and know when/how to use each.

You will likely be adding 3rd party crates as dependencies to your project for common functionality not (yet) in the standard library. These expand the scope of concepts you need to learn.

I hope I'm not scaring anybody away: you can go pretty far in Rust without encountering or understanding most of the standard library. That being said, every new type, trait, concept, and crate you learn unlocks new possibilities and avenues for delivering value through programming. So there is an incentive to take the time to learn them sooner than later.

I learned Rust mostly independently for a personal project. While learning resources such as Learn Rust, the Rust Language Cheat Sheet, and even Clippy are fantastic, in hindsight I probably would have become more proficient sooner had I contributed to an existing Rust project and/or had ongoing technical collaboration with more experienced Rust developers. This is probably no different than any other programming language. But because of Rust's steeper learning curve, I think the benefits of peer exposure are more significant. That being said, I've heard anecdotes of teams with no Rust experience learning Rust together with successful results. So there's no formal recipe for success here.

Finally, despite the steeper learning curve, I'd say the return on investment pays off pretty quickly. As I've argued elsewhere in this post, the Rust compiler and type system helps prevent many classes of bugs. So while it may take longer to initially learn and compose idiomatic Rust code, it won't take long for Rust to offset the time that you would have spent chasing bugs, performance optimizations, and the like.

Rust Moves Too Fast

Rust releases a new version every 6 weeks. By contrast, many other programming languages release ~yearly. This faster release cadence has been a common complaint about Rust.

Quickly, I think people conflate release cadence with churn and hardship from that release cadence. Generally speaking, release cadence isn't the thing you care about: it's how disrupted you are from the releases. If your old release continues to work just as well as the new release, release cadence doesn't really matter (many major websites deploy/release dozens of times per day and you don't care because you can't tell: you only care when the UI or behavior changes). So the thing most of us care about is how frequently Rust releases cause disruption. And disruption is often caused by backwards incompatibility and the introduction of new features, which when adopted, force upgrades.

A few years ago, I think the concern that Rust moves too fast was valid: there were significant features in seemingly every release and crates were eager to jump on the new features, forcing you to upgrade if you wanted to keep your dependency tree up to date. I feel like I caught the tail end of this relative chaos in 2018-2019.

But in the last 18-24 months, things seem to have quieted down. Many of the major language features that people were eager to jump on have landed. The only ongoing churn I'm aware of in Rust is in the async ecosystem, and that seems to be stabilizing. New Rust releases are generally pretty quiet in terms of must use features. The last milestone release in my mind was 1.45 in July 2020, which stabilized procedural macros. The community was pretty quick to jump on that feature/release. My Rust projects have targeted 1.45+ for a while now with minimal issues.

9 months with no major disruptions is on par with the release cadence of other programming languages.

In my opinion, the concern that Rust moves too fast, while once valid, no longer generally applies. Pockets of truth for segments of users caring about niche and lesser-used features, yes. But nothing that applies to the entire Rust ecosystem.

Compiling Is Too Slow

A lot of people have commented that Rust builds take too long. It is true: compiling Rust tends to take longer than C/C++, Go, Java, and other languages requiring an ahead-of-time compile step.

While a lot has been done to make the Rust compiler faster (it feels substantially faster than it was a few years ago), it still isn't as fast as other languages.

Not to dismiss the problem, but in a lot of cases, the speed of Rust compilation is fast enough. Incremental builds for small libraries or programs will take a few hundred milliseconds to a second or two. I suspect most of the people complaining about build times today are developing very large Rust programs (tens of thousands of lines of code and/or hundreds of dependencies).

A contributing problem to build times is dependency count. The simplicity of Cargo makes it very easy to accumulate dependencies in Rust and each additional crate will slow your build down. PyOxidizer has ~400 dependencies at this point in time, for example (I've been throwing the kitchen sink at it in terms of features).

There are a few things under your control to mitigate this problem.

First, install sccache, a transparent compiler cache. By default it caches to the local filesystem. But you can also point it at Redis, Memcached, or blob stores in AWS, Azure, or GCP. Firefox's CI uses an S3 backed cache and the hit rate (for both Rust and C/C++) is 90-99% on nearly every build. For PyOxidizer - a medium sized Rust project - sccache reduces full build times from ~53s wall and ~572s CPU to ~32s wall to 225s CPU on my 16 core Ryzen 5950X. The wall time savings on a lower CPU core count machine are even more significant.

Speaking of CPU core counts, the second thing you can do is give yourself access to more CPU cores. Laptops tend to have at most 4 CPU cores. Consider buying desktops or moving builds to remote machines, often with dozens of CPU cores. This requires spending money. But when you factor in people time saved and the cost of that time and the value of someone's happiness/satisfaction, it can often be justified.

I'm not trying to dismiss the problems that slow builds can impose, but if you want to justify their cost, you can argue that the Rust compiler does more at compilation time than other languages and that this overhead brings benefits, such as preventing bugs earlier in the software development lifecycle. There's no such thing as a free lunch and Rust's relatively slower builds are a tax you pay for the correctness the compiler guarantees. To me, that's a justifiable trade-off.

Rust is Too Young or Isn't Production Ready

The isn't production ready concern is likely disproven by the existence of Rust in production in critical roles at a sufficient number of reputable companies. At this point, there are very few technical reasons to say Rust isn't production ready. Non-technical reasons such as lack of organizational knowledge or a limited talent pool for hiring from, yes. But little on the technical front.

The too young part is ultimately a judgement call for how comfortable you are with new technologies.

I'm generally pretty conservative/skeptical about adopting new technology. If you are in this industry long enough you eventually get humbled by your exuberance.

I was probably in the Rust is too young boat as late as 2017, maybe 2018. While I was cheering on Rust as a Mozillian, I was skeptical it was going to take off. Birthing successfully languages is hard. The language still seemed to move too fast and have too many missing features. Things seemed to stabilize around the 2018 edition. That's also when you started commonly hearing of companies adopting Rust. Lots of startups at first. Then big companies started joining in.

Today, companies you have heard of like Amazon, Cloudflare, Discord, Dropbox, Facebook, Google, and Microsoft are adopting Rust to varying degrees. There are 58,750 published crates on crates.io.

I won't drop names, but I've heard of Rust spreading like wildfire at some companies you've heard of. The stories are pretty similar: random person or team wants to try Rust. Something small and isolated with a minimal blast radius in case of disaster is tried first. Rust is an overwhelming success. As more and more people are exposed to Rust, they see the light, cries for Rust become louder, and it becomes even more widely adopted.

The I'm Writing Fewer Bugs Trap

When I program in Rust, I strongly feel that my base rate of defect introduction is substantially less than other programming languages. I have confidence that the Rust compiler coupled with practices like encoding and enforcing invariants in the type system leads to fewer defects. In some cases I feel like the surface area for bugs is limited to logical defects, which are mis-expressions of the human programmer's intent. And since no automated tool can reliably scan for human intent, there's no way to prevent logical bugs, and that surface area is the best we can ever expect from automated scanning.

Knowing what tests to write and how much effort to invest in test writing is a difficult skill to level up and is full of trade-offs. With Rust, I find myself writing fewer tests than in other languages because I have confidence that the compiler will detect issues that would otherwise require explicit testing.

I feel that my beliefs and practices are rooted in reality and justifiable. Yet I recognize the danger in placing too much faith in my tools, in Rust.

In theory, Rust alleviates the need for running additional verification tools, like {address, memory, thread} sanitizers because the safe subset of Rust prevents the issues these tools detect. Many defects caught by fuzzing are also similarly prevented by the design of Rust (but not all: fuzzing is generally a good idea).

What I'm trying to say is that it is really easy to fall into a trap where you are over-confident about the abilities of Rust to prevent defects and you find yourself letting your guard down and not maintaining testing and other verification best practices.

I'm still evolving my beliefs in this area. But my general opinion is that you should still run things like {address, memory, thread} sanitizers and fuzzing because unsafe likely exists somewhere in the compiled code, as likely does C or assembly code. And because a chain is only as strong as its weakest link, it only takes any bug to undermine the safety of the entire system. So while these additional verification tools likely won't find as many issues as they would in unsafe languages, I still think it is a good idea to continue to run them against Rust, especially for high value code bases.

Error Handling

Result<T, E> isn't a panacea. Because errors are full on types rather than simple primitives like integers, you need to spend effort reasoning and coding about how different error types interact. And often you need to write a bit of boilerplate code to facilitate that interaction. This can cancel out a lot of the efficiency benefits of Rust's ? operator for handling errors.

There are a handful of 3rd party Rust crates specializing in error handling that you'll likely to encounter. These include anyhow, error-chain, failure, and thiserror.

Rust's error handling landscape can at times feel fragmented and make you yearn for something more defined/opinionated in the standard library. The Rust Community recognizes that this is an area that can be improved and has formed an error handling project group to improve this space. So hopefully we see some quality of life improvements to error handling in time.

Conclusion

I am irrationally effusive about Rust. When I see this level of excitement in others, I am extremely skeptical. I was skeptical myself when my former colleagues at Mozilla were talking up Rust years ago. But having used Rust for 2.5 years now and authored tens of thousands of lines of Rust code, the initial relationship euphoria has worn off and I am most definitely in love.

Cynically, Rust has ruined in programming in other languages for me. Less cynically, Rust has spoiled me.

When I look at other languages without the rules enforced by Rust's borrow checker, all I see are sharp edges waiting to materialize into bugs.

When I look at other languages with weaker type systems, I think about all the time I spend having to defend against invariants and how much cognitive load and programming/review effort I need to incur to maintain the baseline of quality that I get with Rust.

When I look at programming languages like Python, Ruby, and TypeScript where you can bolt a type system onto a language that doesn't have it, I think why would I want to do that when I can use an even better type system while likely achieving much better performance with Rust? (It's tempting to reach for a metaphor involving lipstick and pigs.)

When I look at other languages, I generally see the same pile of decades old ideas packaged in different boxes. Some of these ideas are good and probably timeless (e.g. functions and variables). Some are demonstrably bad and should be largely excised from common use (e.g. null references - the billion dollar mistake).

When I interface with Rust's tooling, I feel like it is respectful of my time and has my best interests (producing working software) at heart. I feel the maintainers of the tooling care about me.

When I program in Rust, I feel that I'm producing fewer defects overall. The compiler is catching defects that would otherwise be caught later in the software development lifecycle, leading to increased software development costs.

When I interact with Rust's community of people, respect and empathy abounds.

Does Rust have its problems and limitations? Of course it does: nothing is perfect! But in my opinion, its trade-offs are often strictly better than those found in other programming languages I've used.

At the end of the day, Rust is a programming language and therefore a tool. Adept professionals know not to get too attached to your tools: ultimately it is the value you deliver, not how you deliver it. (Of course the choice of tools can significantly impact the quality and timeline of value delivery!) Will my thoughts on Rust and preferred languages change over time as the landscape shifts: of course they will! But for the time being, Rust brings so much to the table that its competition lacks that I'm overly excited about Rust and its ability to advance the state of software/programming and therefore the industry.

In closing, my current CTO uses the phrase commitment to craft as a desired mindset for their technical organization. That phrase translates to various themes: higher quality / lower defect rate, build with the long-term in mind, implement efficient solutions, etc. Like an artist reaches for a preferred paintbrush or a chef for a preferred knife because their preferred tool enables them to better express their craft, I feel that Rust often enables me to better express the potential of my professional craft more than other programming languages. I strongly feel that Rust predisposes software to higher quality outcomes - both in terms of defect rate and run-time efficiency - while also reducing total development and execution costs over the entire software development lifecycle. That makes Rust my first choice language - my go-to tool - for many new projects at this point in time. If you likewise value commitment to craft, I urge you to explore Rust so that you too can better harness the potential of our programming craft.

But don't take my word on it, read what 42 companies using Rust in production have to say.


Modern CI is Too Complex and Misdirected

April 07, 2021 at 09:00 AM | categories: CI, build system

The state of CI platforms is much stronger than it was just a few years ago. Overall, this is a good thing: access to powerful CI platforms enables software developers and companies to ship more reliable software more frequently, which benefits its users/customers. Centralized CI platforms like GitHub Actions, GitLab Pipelines, and Bitbucket provide benefits of scale, as the Internet serves as a collective information repository for how to use them. Do a search for how to do X on CI platform Y and you'll typically find some code you can copy and paste. Nobody wants to toil with wrangling their CI configuration after all: they just want to ship.

Modern CI Systems are Too Complex

The advancements in CI platforms have come at a cost: increased complexity. And the more I think about it, I'm coming around to the belief that modern CI systems are too complex. Let me explain.

At its core, a CI platform is a specialized remote code execution as a service (it's a feature, not a CVE!) where the code being executed is in pursuit of building, testing, and shipping software (unless you abuse it to mine cryptocurrency). So, CI platforms typically throw in a bunch of value-add features to enable you to ship software more easily. There are vastly different approaches and business models here. (I must tip my hat to GitHub Actions leveraging network effects via community maintained actions: this lowers TCO for GitHub as they don't need to maintain many actions, creates vendor lock-in as users develop a dependence on platform-proprietary actions, all while increasing the value of the platform for end-users - a rare product trifecta.) A common value-add property of CI platforms is some kind of configuration file (often YAML) which itself offers common functionality, such as configuring the version control checkout and specifying what commands to run. This is where we start to get into problems.

(I'm going to focus on GitHub Actions here, not because they are the worst (far from it), but because they seem to be the most popular and readers can relate more easily. But my commentary applies to other platforms like GitLab as well.)

The YAML configuration of modern CI platforms is... powerful. Here are features present in GitHub Actions workflow YAML:

  • An embedded templating system that results in the source YAML being expanded into a final YAML document that is actually evaluated. This includes a custom expression mini language.
  • Triggers for when to run jobs.
  • Named variables.
  • Conditional job execution.
  • Dependencies between jobs.
  • Defining Docker-based run-time environment.
  • Encrypted secrets.
  • Steps constituting each job and what actions those steps should take.

If we expand scope slightly to include actions maintained by GitHub, we also have steps/actions features for:

  • Performing Git checkouts.
  • Storing artifacts used by workflows/jobs.
  • Caching artifacts used by workflows/jobs.
  • Installing common programming languages and environments (like Java, Node.js, Python, and Ruby).
  • And a whole lot more.

And then of course there are 3rd party Actions. And there's a lot of them!

There's a lot of functionality here and a lot of it is arguably necessary: I'm hard pressed to name a feature to cut. (Although I'm no fan of using YAML as a programming language but I concede it use is a fair compromise compared to forcing people to write code to produce YAML or make equivalent API calls to do what the YAML would do.) All these features seem necessary for a sufficiently powerful CI offering. Nobody would use your offering if it didn't offer turnkey functionality after all.

So what's my complaint?

I posit that a sufficiently complex CI system becomes indistinguishable from a build system. I challenge you: try to convince me or yourself that GitHub Actions, GitLab CI, and other CI systems aren't build systems. The basic primitives are all there. GitHub Actions Workflows comprised of jobs comprised of steps are little different from say Makefiles comprised of rules comprised of commands to execute for that rule, with dependencies gluing everything together. The main difference is the form factor and the execution model (build systems are traditionally local and single machine but CI systems are remote/distributed).

Then we have a similar conjecture: a sufficiently complex build system becomes indistinguishable from a CI system. Earlier I said that CI systems are remote code execution as a service. While build systems are historically things that run locally (and therefore not a service), modern build systems like Bazel (or Buck or Gradle) are completely different animals. For example, Bazel has remote execution and remote caching as built-in features. Hey - those are built-in features of modern CI systems too! So here's a thought experiment: if I define a build system in Bazel and then define a server-side Git push hook so the remote server triggers Bazel to build, run tests, and post the results somewhere, is that a CI system? I think it is! A crude one. But I think that qualifies as a CI system.

If you squint hard enough, sufficiently complex CI systems and sufficiently complex build systems start to look like the same thing to me. At a very high level, both are providing a pool of servers offering general compute/execute functionality with specialized features in the domain of building/shipping software, like inter-task artifact exchange, caching, dependencies, and a frontend language to define how everything works.

(If you squint really hard you can start to see a value proposition of Kubernetes for even more general compute scheduling, but I'm not going to go that far in this post because it is a much harder point to make and I don't necessarily believe in it myself. But I thought I'd mention it as an interesting thought experiment. But an easier leap to make is to throw batch job execution (as is often found in data warehouses) in with build and CI systems as belonging in the same bucket: batch job execution also tends to have dependencies, exchange of artifacts between jobs, and I think can strongly resemble a CI system and therefore a build system.)

The thing that bugs me about modern CI systems is that I inevitably feel like I'm reinventing a build system and fragmenting build system logic. Your CI configuration inevitably devolves into a bunch of complex YAML with all kinds of caching and dependency optimizations to keep execution time low and reliability in check - just like your build system. You find yourself contorting your project's build system to work in the context of CI and vice versa. You end up managing two complex DAGs and platforms/systems instead of one.

Because build systems are more generic than CI systems (I think a sufficiently advanced build system can do a superset of the things that a sufficiently complex CI system can do), that means that CI systems are redundant with sufficiently advanced build systems. So going beyond the section title: CI systems aren't too complex: they shouldn't need to exist. Your CI functionality should be an extension of the build system.

In addition to the redundancy argument, I think unified systems are more user friendly. By integrating your CI system into your build system (which by definition can be driven locally as part of regular development workflows), you can expose the full power of the CI system to developers more easily. Think running ad-hoc CI jobs without having to push your changes to a remote server first, just like you can with local builds or tests. This is huge for ergonomics and can drastically compress the cycle time for changes to these systems (which are often brittle to change/test).

Don't get me wrong, aspects of CI systems not traditionally found in build systems (such as centralized results reporting and a UI/API for (re)triggering jobs) absolutely need to exist. Instead, it is the remote compute and work definition aspects that are completely redundant with build systems.

Let's explore the implications of build and CI systems being more of the same.

Modern CI Offerings are Targeting the Wrong Abstraction

If you assume that build and CI systems can be / are more of the same, then it follows that many modern CI offerings like GitHub Actions, GitLab CI, and others are targeting the wrong abstraction: they are defined as domain specific platforms for running CI systems when instead they should take a step back and target the broader general compute platform that is also needed for build systems (and maybe batch job execution, such as what's commonly found in data warehouses/pipelines).

Every CI offering is somewhere different on the spectrum here. I would go so far as to argue that GitHub Actions is more a CI product than a platform. Let me explain.

In my ideal CI platform, I have the ability to schedule an ad-hoc graph of tasks against that platform. I have the ability to hit some APIs with definitions of the tasks I want that platform to run and it accepts them, executes them, uploads artifacts somewhere, reports task results so dependent tasks can execute, etc.

There is a GitHub Actions API that allows you to interact with the service. But the critical feature it doesn't let me do is define ad-hoc units of work: the actual remote execute as a service. Rather, the only way to define units of work is via workflow YAML files checked into your repository. That's so constraining!

GitLab Pipelines is a lot better. GitLab Pipelines supports features like parent-child pipelines (dependencies between different pipelines), multi-project pipelines (dependencies between different projects/repos), and dynamic child pipelines (generate YAML files in pipeline job that defines a new pipeline). (I don't believe GitHub Actions supports any of these features.) Dynamic child pipelines are an important feature, as they mostly divorce the checked-in YAML configuration from the remote execute as a service feature. The main missing feature here is a generic API that allows you achieve this functionality without having to go through a parent pipeline / YAML first. If that API existed, you could build your own build/CI/batch execute system on top of GitLab Pipelines with fewer constraints imposed on you by GitLab Pipeline's opinionated YAML configuration files and the intended use of its creators. (Generally, I think a good litmus test for a well-designed platform or tool is when its authors are surprised by someone's unintended use for it. Of course this knife cuts both ways, as sometimes people do undesirable things, like mine cryptocurrency.)

CI offerings like GitHub Actions and GitLab Pipelines are more products than platforms because they tightly couple an opinionated configuration mechanism (YAML files) and web UI (and corresponding APIs) on top of a theoretically generic remote execute as a service offering. For me to consider these offerings as platforms, they need to grow the ability to schedule arbitrary compute via an API, without being constrained by the YAML officially supported out of the box. GitLab is almost there (the critical missing link is a schedule an inline-defined pipeline API). It is unknown if GitHub is - or is even interested in - pursuing this direction. (More on this later.)

Taskcluster: The Most Powerful CI Platform You've Never Heard Of

I wanted to just mention Taskcluster in passing as a counterexample to the CI offerings that GitHub, GitLab, and others are pursuing. But I found myself heaping praises towards it, so you get a full section on Taskcluster. This content isn't critical to the overall post, so feel free to skip. But if you want to know what a CI platform built for engineers looks like or you are a developer of CI platforms and would like to read about some worthwhile ideas to steal, keep reading.

Mozilla's Taskcluster is a generic CI platform originally built for Firefox. At the time it was conceived and initially built out in 2014-2015, there was nothing else quite like it. And I'm still not aware of anything that can match its raw capabilities. There might be something proprietary behind corporate walls. But nothing close to it in the open source domain. And even the proprietary CI platforms I'm aware of often fall short of Taskcluster's feature list.

To my knowledge, Taskcluster is the only publicly available, mega project scale, true CI platform in existence.

Germane to this post, one thing I love about Taskcluster is its core primitives around defining execution units. The core execute primitive in Taskcluster is a task. Tasks are connected together to form a DAG. (This is not unlike how a build system works.)

A task is created by issuing an API request to a queue service. That API request essentially says schedule this unit of work.

Tasks are defined somewhat generically, essentially as units of arbitrary compute along with metadata, such as task dependencies, permissions/scopes that task has, etc. That unit of work has many of the primitives that are familiar to you if you use GitHub Actions, GitLab Pipelines, etc: a list of commands to execute, which Docker image to execute in, paths to files constituting artifacts, retry settings, etc.

Taskcluster has features far beyond what are offered by GitHub, GitLab, and others today.

For example, Taskcluster offers an IAM-like scopes feature that moderates access control. Scopes control what actions you can perform, what services you have access to, which runner features you can use (e.g. whether you can use ptrace), which secrets you have access to, and more. As a concrete example, Firefox's Taskcluster settings are such that the cryptographic keys/secrets used to sign Firefox builds are inaccessible to untrusted tasks (like the equivalent of tasks initiated by PRs - the Try Server in Mozilla speak). Taskcluster is the only CI platform I'm aware of that has sufficient protections in place to mitigate the fact that CI platforms are gaping remote code execution as a service risks that can and should keep your internal security and risk teams up at night. Taskcluster's security model makes GitHub Actions, GitLab Pipelines, and other commonly used CI services look like data exfiltration and software supply chain vulnerability factories by comparison.

Taskcluster does support adding a YAML file to your repository to define tasks. However, because there's a generic scheduling API, you don't need to use it and you aren't constrained by its features. You could roll your own configuration/frontend for defining tasks: Taskcluster doesn't care because it is a true platform. In fact, Firefox mostly eschews this Taskcluster YAML, instead building out its own functionality for defining tasks. There's a pile of code checked into the Firefox repository that when run will derive the thousands of discrete tasks constituting Firefox's build and release DAG and will register the appropriate sub-graph as Taskcluster tasks. (This also happens to be a pile of YAML. But the programming primitives and control flow are largely absent from YAML files, making it a bit cleaner than the YAML DSL that e.g. GitHub and GitLab CI YAML has evolved into.) This functionality is its own mini build system where the Taskcluster platform is the execution/evaluation mechanism.

Taskcluster's model and capabilities are vastly beyond anything in GitHub Actions or GitLab Pipelines today. There's a lot of great ideas worth copying.

Unfortunately, Taskcluster is very much a power user CI offering. There's no centralized instance that anyone can use (unlike GitHub or GitLab). The learning curve is quite steep. All that power comes at a cost of complexity. I can't in good faith recommend Taskcluster to casual users. But if you want to host your own CI platform, other CI offerings don't quite cut it for you, and you can afford a few people to support your CI platform on an ongoing basis (i.e. your total cost to operate CI including people and machines is >$1M annually), then Taskcluster is worth considering.

Let's get back to the post at hand.

Looking to the Future

In my ideal world there exists a single remote code execution as a service platform purpose built for servicing both near real time and batch/delayed execution. It is probably tailored towards supporting software development, as those domain specific features set it apart from generic compute as a service tools like Kubernetes, Lambda, and others. But something more generic could potentially work.

The concept of a DAG is strongly baked into the execution model so you can define execution units as a graph, capturing dependencies. Sure, you could define isolated, ad-hoc units of work. But if you wanted to define a set of units, you could do that without having to run a persistent agent to coordinate execution through completion like build systems typically do. (Think of this as uploading your DAG to an execution service.)

In my ideal world, there is a single DAG dictating all build, testing, and release tasks. There is no DAG fragmentation at the build, CI, and other batch execute boundaries. No N+1 system or configuration to manage and additional platform to maintain because everything is unified. Economies of scale applies and overall efficiency improves through consolidation.

The platform consists of pools of workers running agents capable of performing work. There are probably pools for near real time / synchronous RPC style invocations and pools for scheduled / delayed / asynchronous execution. You can define your own worker pools and bring your own workers. Advanced customers will likely throw autoscaling groups consisting of highly ephemeral workers (such as EC2 spot instances) at these pools, scaling capacity to meet demand relatively cheaply, terminating workers and machines when capacity is no longer needed to save on billing costs (this is what Firefox's Taskcluster instance has been doing for at least 6 years).

To end-users, a local build consists of driving or scheduling the subset of the complete task graph necessary to produce the build artifacts you need. A CI build/test consists of the subset of the task graph necessary to achieve that (it is probably a superset of the local build graph). Same for releasing.

As for the configuration frontend and how execution units are defined, this platform only needs to provide a single thing: an API that can be used to schedule/execute work. However, for this product offering to be user-friendly, it should offer something like YAML configuration files like CI systems do today. That's fine: many (most?) users will stick to using the simplified YAML interface. Just as long as power users have an escape/scaling vector and can use the low-level schedule/execute API to write their own driver. People will write plug-ins for their build systems enabling it to integrate with this platform. Someone will coerce existing extensible build systems like Bazel, Buck, and Gradle to convert nodes in the build graph to compute tasks in this platform. This unlocks the unification of the build and CI systems (and maybe things like data pipelines too).

Finally, because we're talking about a specialized system tailored for software development, we need robust result/reporting APIs and interfaces. What good is all this fancy distributed remote compute if nobody can see what it is doing? This is probably the most specialized service of the bunch, as how you track results is exceptionally domain specific. Power users may want to build their own result tracking service, so keep that in mind. But the platform should provide a generic one (like what GitHub Actions and GitLab Pipelines do today) because it is a massive value add and few will use your product without such a feature.

Quickly, my proposed unified world will not alleviate the CI complexity concerns raised above: sufficiently large build/CI systems will always have an intrinsic complexity to them and possibly require specialists to maintain. However, because a complex CI system is almost always attached to a complex build system, by consolidating build and CI systems, you reduce the surface area of complexity (you don't have to worry about build/CI interop as much). Lower fragmentation reduces overall complexity, and is therefore a new win. (A similar line of thinking applies to justifying monorepositories.)

All of the components for my vision exist in some working form today. Bazel, Gradle Enterprise, and other modern build systems have RPCs for remote execute and/or caching. They are even extensible and you can write your own plugins to change core functionality for how the build system runs (to varying degrees of course). CI offerings like Taskcluster and GitLab Pipelines support scheduling DAGs of tasks (with Taskcluster's support far more suited for the desired end state). There are batch job execution frameworks like Airflow that look an awful lot like a domain-specific, specialized versions of Taskcluster. What we don't have a is a single product or service with all these features bundled as a cohesive offering.

I'm convinced that building what I'd like to see is not a question of if it can be done but whether we should and who will do it.

And this is where we probably run into problems. I hate to say it, but I'm skeptical this will exist as a widely available service outside a few corporations' walls any time soon. The reason is the total addressable market.

The value of my vision is through unification of discrete systems (build, CI, and maybe some one-offs like data pipelines) that are themselves complex enough that unification is something you'd want to do for business/efficiency reasons. After all, if it isn't complex/inefficient, you probably don't care about making it simpler/faster. Right here we are probably filtering out >90% of the market because their systems just aren't complex enough for this to matter.

This vision requires adoption of a sufficiently advanced build system so it can serve as the brains behind a unified DAG driving remote execute. Some companies and projects will adopt compatible, advanced build systems like Bazel because they have the resources, technical know-how, and efficiency incentives to pull it off. But many won't. The benefit of a more advanced build system over something simpler is often marginal. Factor in that many companies perceive build and CI support as product development overhead and a virtual cost center whose line item needs to be minimized. If you can get by on a less advanced build system that is good enough for a fraction of the cost without excessive hardship, that's the path many companies and projects will follow. Again, people and companies generally don't care about wrangling build and CI systems: they just want to ship.

The total addressable market for this idea seems too small for me to see any major player with the technical know-how to implement and offer such a service in the next few years. After all, we're not even over the hurdle that what I propose (unifying build and CI systems) is a good idea. Having worked in this space for a decade, witnessed the potential of Taskcluster's model, and seen former, present, and potential employers all struggling in this space to varying degrees, I know that this idea would be extremely valuable to some. (For some companies multiple millions of dollars could be saved annually by eliminating redundant human capital maintaining similar systems, reducing machine idle/run costs, and improving turnaround times of critical development loops.) As important as this would be to some companies, my intuition is they represent such a small sliver of the total addressable market that this slice of pie is too small for an existing CI operator like GitHub or GitLab to care about at this time. There are far more lucrative opportunities. (Such as security scanning, as laws/regulation/litigation are finally catching up to the software industry and forcing companies to take security and privacy more seriously, which translates to spending money on security services. This is why GitHub and GitLab have been stumbling over each other to announce new security features over the past 1-2 years.)

I don't think a startup in this area would be a good idea: customer acquisition is too hard. And because much of the core tech already exists in existing tools, there's not much of a moat in the way of proprietary IP to keep copycats with deep pockets at bay. Your best exit here is likely an early acquisition by a Microsoft/GitHub, GitLab, or wannabe player in this space like Amazon/AWS.

Rather, I think our best hope for seeing this vision realized is an operator of an existing major CI platform (either private or public) who also has major build system or other ad-hoc batch execute challenges will implement it and release it upon the world, either as open source or as a service offering. GitHub, GitLab, and other code hosting providers are the ideal candidates since their community effect could help drive industry adoption. But I'd happily accept pretty much any high quality offering from a reputable company!

I'm not sure when, but my money is on GitHub/Microsoft executing on this vision first. They have a stronger incentive in the form of broader market/product tie-ins (think integrated build and CI in Visual Studio or GitHub Workspaces [for Enterprises]). Furthermore, they'll feel the call from within. Microsoft has some really massive build systems and CI challenges (notably Windows). It is clear that elements of Microsoft are conducting development on GitHub, in the open even (at this point Satya Nadella's Microsoft has frozen over so many levels of hell that Dante's classics need new revisions). Microsoft engineers will feel the pain and limitations of discrete build and CI systems. Eventually there will be calls for at least a build system remote execute service/offering on GitHub. (This would naturally fall under GitHub's existing apparent market strategy of capturing more and more of the software development lifecycle.) My hope is GitHub (or whomever) will implement this as a unified platform/service/product rather than discrete services because as I've argued they are practically the same problem. But a unified offering isn't the path of least resistance, so who knows what will happen.

Conclusion

If I could snap my fingers and move industry's discrete build, CI, and maybe batch execute (e.g. data pipelines) ahead 10 years, I would:

  1. Take Mozilla's Taskcluster and its best-in-class specialized remote execute as a service platform.
  2. Add support for a real-time, synchronous execute API (like Bazel's remote execute API) to supplement the existing batch/asynchronous functionality.
  3. Define Starlark dialects so you define CI/release like primitives in build tools like Bazel. (You could also do YAML here. But if your configuration files devolve into DSL, just use a real programming language already.)
  4. Teach build tools like Bazel to work better when units of work that can take minutes or even hours to run (a synchronous/online driver model such as classically employed by build systems isn't appropriate for long-running test, release, or say data pipelines).
  5. Throw a polished web UI for platform interaction, result reporting, etc on top.
  6. Release it to the world.

Will this dream become a reality any time soon? Probably not. But I can dream. And maybe I'll have convinced a reader to pursue it.


Surprisingly Slow

April 06, 2021 at 07:00 AM | categories: Programming

I have an affinity for performance optimization and making software as efficient as possible. Over the years, I've encountered specific instances and common patterns that make software or computers slow. In this post, I'll shine a spotlight on some of them.

I'm titling this post Surprisingly Slow because the slowness was either surprising to me or the sub-optimal practices leading to slowness are prevalent enough that I think many programmers would be surprised by their existence.

The sections below are largely independent. So feel free to cherry pick the ones that interest you.

Environment Detection in Build Systems (e.g. configure and cmake)

This is the topic that inspired this post.

Build systems often feature an environment detection / configuration phase before the build phase. In UNIX land, autoconf generated configure scripts are prevalent. CMake is also popular. These tools run a bunch of code to probe the state of the current system so that the build configuration is appropriate for the current build environment. For example, they'll probe for which compiler to use, its version, and what bugs and capabilities it has.

This environment detection and configuration is a necessary evil because machines and environments often vary substantially and you need to account for those variances.

The problem is that this configuration step often takes longer to run than the build itself! Build systems for small programs or libraries will often spend 10+ seconds running configure and complete the actual compilation and linking in a fraction of that time. In other words, the setup to perform the build takes longer than the build itself!

Depending on how many CPU cores you have, the discrepancy may not be obvious. But I have a 16 core / 32 thread Ryzen 5950X as my primary PC and the relative slowness of the configuration step is painful to observe.

What I find even more shocking is that configuration time often still eclipses actual build time even for large projects. I'm not sure if this is still true, but a few years ago Mozilla observed that building LLVM/Clang on a 96 vCPU EC2 instance resulted in more time spent in cmake/configuring than compiling and linking! And that's a very large C++ project with thousands of source files being compiled!

Build configuration is often a discrete step that executes serially before what most people consider the actual build. To improve efficiency, build configuration needs to be parallelized. Even better, it should be integrated into the main build DAG itself so parts of the build can start running without having to wait for all build configuration. Unfortunately, many common tools performing build configuration can't easily be adapted to this model. So there's not much many of us can do.

Another solution to this problem is avoiding the problem of environment detection in the first place. If you have deterministic and reproducible build environments, you can take a lot of shortcuts to skip environment detection that just isn't needed any more. This is more or less the approach of modern build tools like Bazel. I do wonder how much of the speed gains from tools like Bazel are due to eliminating environment configuration. I suspect it is a lot!

New Process Overhead on Windows

New processes on Windows can't be spawned as quickly as they can on POSIX based operating systems, like Linux. On Windows, assume a new process will take 10-30ms to spawn. On Linux, new processes (often via fork() + exec() will take single digit milliseconds to spawn, if that).

However, thread creation on Windows is very fast (~dozens of microseconds).

These Stack Overflow have some more details.

A few dozen milliseconds is an eternity in CPU time. And it is long enough that it eats into a large percentage of the time budget for people to perceive something as instantaneous. So this may contribute to the perception that Windows is slower than Linux.

If your program architecture consists of spawning new processes left and right (this is common in UNIX land), this can pose performance problems on Windows, as the overhead of new process creation on Windows can really add up:

  • 10ms * 1,000 invocations = 10s
  • 20ms * 10,000 invocations = 200s
  • 30ms * 100,000 invocations = 3,000s

Using the example of configure above, configure files are often shell scripts. And shell scripts often do a lot of their work by spawning other processes like grep, sed, and sort. Even the [ operator could be a new process (seriously: there's probably a /usr/bin/[ executable in your POSIX environment). (Although [ might be a shell built-in.) Command pipe chains (e.g. command | grep | awk) spawn multiple processes serially and can be visually slow to run. Anyway, it is not uncommon for a configure script to spawn thousands of new processes. Assuming 10ms per process, at 1,000 invocations that is 10s of overhead just spawning new processes! This further exacerbates the problem in the previous section!

If your software runs on Windows, consider the impact that relatively slow process spawning will have. Consider a multi-threaded architecture or using longer-lived daemon/background processes instead.

Closing File Handles on Windows

Many years ago I was profiling Mercurial to help improve the working directory checkout speed on Windows, as users were observing that checkout times on Windows were much slower than on Linux, even on the same machine.

I thought I could chalk this up to NTFS versus Linux filesystems or general kernel/OS level efficiency differences. What I actually learned was much more surprising.

When I started profiling Mercurial on Windows, I observed that most I/O APIs were completing in a few dozen microseconds, maybe a single millisecond or two ever now and then. Windows/NTFS performance seemed great!

Except for CloseHandle(). These calls were often taking 1-10+ milliseconds to complete. It seemed odd to me that file writes - even sustained file writes that were sufficient to blow past any write buffering capacity - were fast but closes slow. It was even more perplexing that CloseHandle() was slow even if you were using completion ports (i.e. async I/O). This behavior for completion ports was counter to what the MSDN documentation said should happen (the function should return immediately and its status can be retrieved later).

While I didn't realize it at the time, the cause for this was/is Windows Defender. Windows Defender (and other anti-virus / scanning software) typically work on Windows by installing what's called a filesystem filter driver. This is a kernel driver that essentially hooks itself into the kernel and receives callbacks on I/O and filesystem events. It turns out the close file callback triggers scanning of written data. And this scanning appears to occur synchronously, blocking CloseHandle() from returning. This adds milliseconds of overhead. The net effect is that file mutation I/O on Windows is drastically reduced by Windows Defender and other A/V scanners.

As far as I can tell, as long as Windows Defender (and presumably other A/V scanners) are running, there's no way to make the Windows I/O APIs consistently fast. You can disable A/V scanning (at your own peril). But the trick that Mercurial employs (which has later been emulated by rustup among other tools) is to use a thread pool for calling CloseHandle(). Even if you perform all file open and write I/O on a single thread and use a background thread pool only for calling CloseHandle(), you can see a >3x speedup in time to write files. This optimization should ideally be employed by any software that creates or mutates as little as a few hundred files on Windows. This includes version control tools, installers, and archive extraction tools. Fun fact: rustup can extract tar files on Windows faster than open source and commercial fast extraction/copy tools because it employs this trick and more. I believe rustup on Windows is actually faster at extracting tar archives than it is on Linux!

The artificial I/O latency added by scanning software such as Windows Defender is super annoying. But the performance gains from working around it by using a thread pool for background is often worth the complexity. I have no doubt that if this optimization were baked into popular Windows tools (namely installers), people would be shocked by how much faster things could be.

Writing to Terminals

As a maintainer of Firefox's build system, I fielded a handful of reports from people complaining about builds being slower than their peers on identical hardware. While there are many causes for this, one of the most surprising was the impact the terminal has on build performance.

Writing to the terminal is usually fast. Until it isn't.

What I learned is that writing tons of output or getting clever with writing to the terminal (e.g. writing colors, moving the cursor position to write over existing content) can drastically slow down applications.

Writing to the terminal via stderr/stdout is likely performed via blocking I/O. So if the thing handling your write() (the terminal emulator) doesn't finish its handling promptly, your process just sits around waiting on the terminal to do its thing.

We discovered that different terminals have their own quirks. Historically, the Windows Command Prompt and the built-in Terminal.app on macOS were very slow at handling tons of output. I remember (but can't find the bug or commit to Firefox) when we made the build system quiet by default and that reduced build times by minutes in some configurations.

A few years ago, npm infamously had a performance sucking progress spinner. While I'm not sure how much of this was terminal slowness versus calling progress update code too frequently, the terminal likely played a part because terminals do have a limit to how often they can accept input to draw.

I've found that modern terminals are better about writing a ton of plain text than they were in ~2012, when I was tackling these problems in Firefox's build system. But I would still exercise extreme caution when doing fancy things with the terminal, like coloring text, drawing footers, etc. Always use buffered I/O to minimize the number of write() actually going to the terminal, flushing as needed (hopefully sparingly). Consider using an async thread for writing to stdout/stderr. Record the total time spent in blocking I/O to stdout/stderr so you can measure terminal I/O latency. And periodically compare the wall time delta between stdout/stderr connected to a terminal and /dev/null when running your program to see if there is a discrepancy worth caring about. Finally, consider throttling writes to the terminal. Instead of writing a footer after every line of output, consider buffering lines for a few milliseconds and emitting all lines plus the new footer in batches. If drawing a progress bar or spinner or something of that nature, I would limit drawing to ~10 Hz to minimize terminal overhead.

Thermal Throttling / ACPI C/P-States / Processor Throttling Behavior

We like to think that a computer and its processors are either on or off. If only things were that simple.

Processors are constantly changing their operating envelope as they are running. The following statements are all true (although not every item applies to all machines or CPU models):

  • The MHz each CPU core is running at can fluctuate wildly from 1 second to the next.
  • CPU cores may go to sleep or enter a very low power mode, even if others are running.
  • Cores may underclock significantly if temperature goes beyond a threshold. They may refuse to run faster until the temperature drops. Faulty sensors can lead to premature behavior.
  • Cores may only reach their maximum frequency if other cores are also running. The physical proximity of that other core may matter.
  • It could take dozens, hundreds, or even thousands of milliseconds for an idling core to ramp up to its full speed.
  • The behavior of power scaling can vary substantially depending on whether a machine is connected to external power or running off the battery.
  • The behavior of power scaling can vary substantially depending on whether the battery is fully charged or nearly empty.
  • Apple laptops may exhibit thermal throttling when charging from the left side. (Yes, seriously: always charge your MacBook Pro from the right. And if your employees use Apple laptops for CPU heavy tasks, consider an awareness campaign to encourage charging from the right side. Even better, deploy software that checks for left side charging and alert accordingly. Although I have yet to find any software or API to detect this.)
  • A core may slow down in order to process certain instructions (like AVX-512).

Modern CPUs are really dynamic beasts and their operating behavior is often seemingly unpredictable. Furthermore, CPU models can vary from one to the next. For example, an EPYC or Xeon processor will likely behave differently from a Ryzen or Core i7/i9 which will behave differently depending on whether you are running in a desktop or laptop. (I observed a few years ago that Xeon cores won't turbo as easily as consumer grade CPUs.)

Power fluctuations and their impact on performance are one of the reasons why it is extremely difficult to conduct proper benchmarks. When benchmarking, you need to control the power variable or at least report its state so results are qualified appropriately. I am very skeptical of benchmark results that don't report the power configuration/methodology (this is most of them, sadly) and especially of benchmarks conducted on laptops, as battery operated devices are much more susceptible to power throttling than desktops or servers.

I have personally had a MacBook Pro become thermal throttled because an internal screw came loose and blocked a fan from spinning. macOS didn't warn me: all I knew was that my Firefox builds become 2-3x slower for no apparent reason! I have also observed my MacBook Pro becoming hot due to left side charging. Charging from the right magically made things faster.

At Mozilla, when we started rolling out Xeon desktops to employees, we had reports of wildly varying build speeds. On some operating systems (Mozilla had very lax central machine provisioning and allowed people full domain of their company issued hardware), the default ACPI C/P-States were such that CPU cores were scaling differently.

What we observed was the compile phase of the build was fine. But some people were reporting linking times 2-4x longer (dozens of seconds to minutes) than others on equivalent configurations! This was a big deal because the wall time of an incremental/non-full build is dominated by linking time. We eventually discovered that on the slow machines, the CPU core doing the linking was only running at 25-50% of its potential. Think 1.0-1.5 GHz. But if you started additional CPU heavy tasks, that core ramped up. We discovered that different operating systems had different defaults for the ACPI C/P-States. The more conservative settings would result in CPU cores not scaling their frequency unless there was sufficient CPU load to merit it. Changing to more aggressive power settings ensured better and consistent results.

Laptops are highly susceptible to thermal throttling and aggressive power throttling to conserve battery. I hold the general opinion that laptops are just too variable to have reliable performance. Given the choice, I want CPU heavy workloads running in controlled and observed desktops or server environments.

But servers aren't immune: their ACPI C-State and P-State settings can drastically impact performance. Dialing these up to max so all the cores run at full (or are ready to run at full in a few milliseconds) is possible. However, this may greatly increase your power consumption. You can do this on some cloud providers (like AWS) for no additional direct cost to you. However, higher energy consumption is bad for the environment. Data centers already have a carbon footprint about the size of the airline industry (during non-pandemic times) and that footprint is growing. So think about your ethical responsibilities to the environment before having your server fleet consume potentially megawatts more power.

Python, Node.js, Ruby, and other Interpreter Startup Overhead

Complex systems will often execute Python, Node.js, and other interpreters thousands or more times during their execution. For example, the Firefox build system invokes thousands of Python processes performing common tasks, such as wrapping the compiler invocation. And the Mercurial test harness invokes thousands of Python processes by running hg as part of its testing. I've heard of similar stories involving Node.js, Ruby, and other interpreters, often in the context of use in build systems.

An oft ignored fact about launching a new interpreter process is that each invocation often takes single to dozens of milliseconds to initialize the interpreter. i.e. the new process spends time at the beginning of process execution just getting to the code you are telling it to run. Sometimes the new process overhead is so bad that the slowdown is obvious and rules out the use of a technology. The JVM historically has been notorious for this, which is why use of Java typically entails fewer, longer-running processes over more, domain-limited processes.

I've written about Python's startup overhead before. In 2014 I measured that Mercurial's test harness spends 10-18% of its total CPU time just getting to the point where the interpreter/process can run custom bytecode and 30-38% of its total CPU time getting to the point where Mercurial performs command dispatch (additional time here is mostly module importing overhead).

You may think that a few milliseconds of overhead can't matter that much. But if you multiply by 1,000, 10,000, 100,000 or more, milliseconds matter:

  • 1ms * 1,000 invocations = 1s
  • 10ms * 10,000 invocations = 100s
  • 100ms * 100,000 invocations = 10,000s (2.77 hours)

On Windows, this problem is compounded because of it relatively slow new process startup (see section above).

Programmers need to think long and hard about your process invocation model. Consider the use of fewer processes and/or consider alternative programming languages that don't have significant startup overhead if this could become a problem (anything that compiles down to assembly is usually fine).

Pretty Much all Storage I/O

Of my general affinity for performance optimization, I have a special affinity for I/O optimization. I think the main reason is that the disconnect between the potential for modern storage devices and what is actually achieved is so wide. On paper, software should be getting ~10x the performance from modern storage devices than what we typically see.

Modern storage devices are absurdly fast. The NVMe storage in my primary PC can sustain reads at >3 GB/s (>6 GB/s sequential), writes at ~1 GB/s (4+ GB/s sequential), can perform >500,000 I/O operations per second, and can service many I/O operations in the ~10 microsecond latency range. Modern NVMe storage is roughly on par with the performance of DDR2 DRAM (launched in 2003) in terms of throughput (latency still trails but ~10us is nothing to scoff at).

For comparison, the 1 TB Western Digital Caviar Black spinning disk I retired from my PC a few weeks ago can only do ~90 MB/s sequential reads and writes, 1-2 MB/s random reads and writes, ~12 ms access times. I'm unsure what IOPS is, but considering ~12 ms access times and the physical nature of spinning disks, it can't be more than a few hundred.

Modern NVMe storage is 1.5-3 magnitudes faster than the best spinning disks from little over a decade ago. So why isn't all storage I/O ~instantaneous?

The short answer is that most software fails to utilize the potential of modern storage devices or even worse actively undermines it through bad practices.

For the former, I'll refer you to the excellent Modern Storage is Plenty Fast. It is the APIs That are Bad. tl;dr you can harness the full power of your modern storage device if you bypass the standard OS/kernel I/O primitives and issue I/O operations directly against the device. So, software abstractions in the OS/kernel are eating a lot of potential.

For the software undermining storage device potential aspect, I'll briefly touch on the fsync() POSIX function. By calling this function, you effectively say be sure the state of this file descriptor is persisted to the storage device or I don't want to lose any changes I've made.

Data consistency and durability are important. But the cost to achieving them can be absurdly high. And as it turns out, it is also subtly difficult to do correctly in practice. I'll refer you to Dan Luu's excellent Files are Hard. The papers linked offer a sobering assessment. I'll reinforce the message with PostgreSQL's fsync() surprise, which chronicles how PostgreSQL maintainers learned about how Linux can flat out drop errors when performing device I/O, leading to data corruption. Yikes!

Anyway, about fsync(). The concept of fsync() is sound: ensure this thing is persisted to the storage device. But the implementation is often a pile of inefficiency leading to slowness.

On many Linux filesystems (including ext4), the implementation of fsync() is such that upon calls, all unflushed writes are persisted to storage. So if process A writes out a 1 GB file and process B writes 1 byte to another file and calls fsync() on that single byte write, Linux/ext4 will need to write 1 GB to the storage device, not 1 byte. So on Linux/ext4, all it takes is a random process somewhere to issue fsync() and all dirty page cache entries need to be flushed. On most systems, there's usually something continuously incurring write I/O, so the amount of storage device I/O incurred by fsync() is almost always larger than just the mutated file/directory you actually want persisted.

This behavior can cause a ton of problems. For starters, it artificially increases I/O latency. You'd think that calling fsync() after a minimal change would be ~instantaneous. But if there are lots of dirty pages to be flushed, it could take seconds. At my current employer, we ran into this exact problem with GitHub Enterprise, which has a monolithic architecture. A MySQL database was running off the same ext4 filesystem as the Git repositories. MySQL will call fsync() frequently to ensure transactions and the transaction journal are persisted to storage. But if a Git GC were running and Git just finished writing a multi-gigabyte packfile, MySQL's fsync() would be stuck waiting on Git's large write to finish persisting. This led to slowness of future MySQL transactions and even some application-level timeouts. When people say databases and other stores should be isolated to their own volumes/filesystems, fsync()'s wonky behavior is a big reason why.

Fortunately, newer versions of Linux/ext4 contain a fast commits feature that changes behavior and enables more granular flushing of fsync() to storage, just like it is documented to do. But as the feature is pretty new, it could take a while to stabilize and make its way to distros. I can't wait for it though!

Another problem with fsync() is that it is called more often than it needs to be. Now, if you have mission critical data and need consistency and durability, you should absolutely be calling fsync() appropriately. But the reality is that many data workloads and machine environments don't actually need strong data guarantees!

Take for example Kubernetes pods or CI runners. Or even servers for a stateless service. Ask yourself, what's the worst that could happen if the machine loses power and there is data loss on the local filesystem? In a lot of scenarios the answer is nothing. You've designed your system to be stateless and fault tolerant. You manage your servers as cattle. You treat local filesystems as ephemeral. So if a machine fails, you provision a new one to replace it. In these scenarios, fsync() buys you little to nothing but can cost you a lot!

The cost of avoidable fsync() can be substantial. Combined with the inefficient global flushing behavior of Linux/ext4, it can be a performance sapper, especially on slower storage devices. Fortunately, there are options. Many databases and other popular software has a way to prevent the issuance of fsync(). If your data is ephemeral, consider disabling fsync() for a likely significant performance boost! For software that doesn't support disabling fsync(), the aptly named eatmydata tool and LD_PRELOAD library can be used to nerf fsync() and other similar functionality by intercepting the function calls and making them no-op. Last but not least, for ephemeral machines, consider building a patched Linux kernel that turns fsync() and friends into no-ops. (I'm not sure of anyone who does this. But I've considered it because getting eatmydata to work in places like launched containers can be a bit of a pain.)

I'll close this section with a link to my favorite commit to the Firefox repository: Disable Places during reftests, preventing 50 GB of I/O. While this commit goes beyond disabling fsync(), fsync() (and its Windows equivalent) was responsible for some of the performance loss. Excessive I/O and needless persisting of changes to device can really sap performance. Storage software usually errors on the side of consistency (this is the correct default in my opinion). Given the costs that consistency imposes, you should seriously consider nerfing the guarantees and speeding up I/O when that option is viable for you.

Data Compression

I could write an entire post on the topic of data compression and its widespread suboptimal use. Here is the concise version.

At its core, data compression is a trade-off between CPU and I/O usage. Typically it involves one of the following scenarios:

  1. I/O (either storage or network) is the bottleneck, so we want to trade more CPU to reduce I/O throughput.
  2. At rest storage is expensive, so we want to trade more CPU for lower storage utilization/costs.

Since the early days of computing, a maxim has been that storage is slow and expensive compared to CPU. So trading CPU to reduce storage utilization seemed like a solid bet.

Fast forward to 2021.

As I wrote in the previous section, modern storage I/O is absurdly fast. It is also historically cheap.

Networks have also gotten faster. 1 gbps (125 MB/s) is pretty universal at this point. 2.5 gbps (312 MB/s) is getting deployed in consumer and office environments. 10 gbps (1250 MB/s) is common in data centers. And faster than 10 gbps is possible.

Meanwhile CPUs have somewhat plateaued in their single core performance in the past decade. We've been stuck at ~4 GHz for years. All of the performance gains from CPUs have come from adding more CPU cores to the package and instructions per cycle (IPC) efficiency wins (we've also gotten some agonizing security vulnerabilities like Spectre and Meltdown out of this IPC work as well).

What this all means is that the relative performance difference between CPUs and I/O has compressed significantly (pardon my pun). ~30 years ago, CPUs ran at ~100 MHz and Internet was using dial-up at say 50 kbps, or 0.05 mbps, or 6.25 kBps. That's 16,000 cycles per byte. Today, we're at ~4 GHz with say 1 Gbps / 125 MB/s networks. That's 32 cycles per byte, a decrease of 500x. (In fairness, the ratio closes when you consider that we likely have >1 CPU core competing for I/O and factor in IPC gains. But we're still talking about the relative difference in CPU and I/O decreasing by 1-1.5 magnitudes.) Years ago, trading CPU to lessen the I/O load was often obviously correct. Today, because of the advancements in I/O performance relative to CPU and a substantially reduced cycles per I/O byte budget, the answer is a lot murkier.

Not helping is the prevalence of ancient compression algorithms. DEFLATE - the algorithm behind the ubiquitous zlib library and gzip data format - is ~30 years old. DEFLATE was designed in an era when computers had like 1 MB RAM and 100 MB hard drives. Different times.

DEFLATE/zlib became very popular in a world where I/O was much slower and compression was often a necessity. Not using compression on a dial-up modem resulted in massive performance differences! And because of its popularity in the early days of the Internet, DEFLATE/zlib is available in the standard library of many programming languages. It seems to be the first compression format people reach for when someone says/thinks add compression.

The ubiquity of zlib is good from a dependency perspective: everyone can read zlib/gzip. But for scenarios where you control the reader and writer, use of zlib in 2021 constitutes negligence because its performance lags contemporary solutions. Modern compression libraries (zstandard is my favorite) can yield substantially faster compression and decompression speed while delivering better compression ratios in most data sets. My 2017 Better Compression with Zstandard post dives into the numbers. (I've been meaning to revisit that post since zstandard has since seen multiple 10+% speedups in subsequent releases, making it even more compelling.) If you don't need the ubiquity of zlib (e.g. you control the writers and readers), there's little reason to use zlib over something more modern. Compared to zlib, modern compression libraries like zstandard are the closest thing to magical pixie dust that you can sprinkle on your software for free performance.

If you are using compression (especially zlib) for real-time compression (sending compressed data somewhere where it will be decompressed immediately), you need to measure the line speed of the compressor and decompressor. Then compare that to the uncompressed line speed. Are you bottlenecked by I/O in the uncompressed case? If not, do you need the bandwidth or I/O capacity being saved by compression? If not, why are you using compression at all? You just measured that all compression did was artificially slow down your software for no reason! Given that zlib compression will often fail to saturate a 1 gbps link, there's a very real chance your use of compression introduces an artificial CPU bottleneck!

If you are using compression (especially zlib) for data archiving (storing compressed data somewhere where it will be decompressed eventually), you need to measure and compare compression ratios and line speeds of different compression formats and their settings. Like the real-time compression scenario, if decompression reduces your line speed from uncompressed, you are artificially slowing down access to your data. Maybe that's justified to save on storage costs. But in many cases, you can swap in a different compression library and get similar to better compression ratios while achieving better (de)compression speeds. Who wouldn't want free performance and storage cost reductions?

As an aside, one of the reasons I love zstandard is it can be tuned from something that is screaming fast (GB/s at compression and decompression ends) to something that is very slow on the compression side but yields terrific compression ratios, while still preserving GB/s decompression speeds. This enables you to use the same format for vastly different use cases. You can also dynamically change the storage characteristics of your data. For example, you can initially write data with a fast setting so you aren't CPU constrained on the writer. Then you can have some batch job come around and recompress your data with more aggressive settings, making it much smaller. It's not like zlib where the range of compression settings goes from kinda slow and not very good compression ratios to pretty slow and still not very good compression ratios.

When you know to look for it, inefficiency due to unjustified use of compression or failure to leverage modern compression libraries is everywhere. Here are some common operations in my daily workflow that are bottlenecked by use of slow compression formats and could be made faster by using a different compression format:

  • Installing Apt packages (packages are gzip compressed). (Fun fact, installing apt packages is also subject to fsync() slowness as described above because the package manager will issue an fsync() at least once for each package.)
  • Installing Homebrew packages (packages are gzip compressed).
  • Installing Python packages via pip (source archives are gzip tarballs and wheels are zip files, which use zlib compression).
  • Pushing/pulling Docker images (layers inside Docker images are gzip compressed).
  • Git (wire protocol data exchange and on-disk storage is using zlib). (When I added zstandard support to Mercurial, it reduced the transfer size from servers to ~89% of original while using ~60% of the server-side CPU.)

In the corporate world, there's probably multiple petabyte scale data warehouses, data lakes, data coliseums (I can't keep up with what we're calling them now) storing data in gzip. Dozens of terabytes could likely be shaved by moving to something like zstandard. If using LZMA (which has extremely slow decompression speeds), storage costs are cheap, but data access is extremely slow, making your data querying slow. I haven't had the opportunity to measure it, but I suspect some of the reputation Hadoop and other Big Data systems have for being slow is because they are CPU constrained by suboptimal use of compression.

My experience is that many programmers don't understand the trade-offs and nuances of compression and/or lack knowledge about the existence of more modern, superior compression libraries. Instead, the collective opinion is compression is good, use [zlib] compression. Like many things in software, the real world is complex and nuanced. The dynamics of the relative power and cost of computer components has shifted the pendulum towards compression adding more cost than it saves. And it hasn't helped that industry still widely uses a ~30 year old compression format (DEFLATE/zlib) that is far from ideal for modern computers. If you take the time to measure, I'm sure you'll find many cases where use of compression is either ill-advised or would benefit from a more modern compression library (like zstandard).

x86_64 Binaries in Linux Distribution Packages

Linux distributions often provide pre-built binaries to install via packaging tools (e.g. apt install or yum install).

To keep things simpler and to ensure maximum compatibility, these pre-built binaries are built such that they run on as many computers as possible. Currently, many Linux distributions (including RHEL and Debian) have binary compatibility with the first x86_64 processor, the AMD K8, launched in 2003. These processors featured modern instruction sets, like MMX, 3DNow!, SSE, and SSE2.

What this means is that by default, binaries provided by many Linux distributions won't contain instructions from modern Instruction Set Architectures (ISAs). No SSE4. No AVX. No AVX2. And more. (Well, technically binaries can contain newer instructions. But they likely won't be in default code paths and there will likely be run-time dispatching code to opt into using them.)

Furthermore, C/C++ compilers (like Clang and GCC) will also target an ancient x86_64 microarchitecture level by default (this is where the distribution's binary compatibility defaults come from). So if you compile your own code and don't specify settings like -march or -mtune to change the default targeting settings, your compiled binaries won't leverage SSE4, AVX, etc. You can still force your application to emit these instructions in dynamic code paths without -march/-mtune overrides. But you have to opt in and add additional code complexity to do that.

Because of the conservative microarchitecture targeting settings of compilers and distribution binaries by extension, that's nearly 20 years of ISA work and efficiency gains from more powerful ISAs (like superlinear vectorized instructions) left on the table. And here I get frustrated when my PRs linger unreviewed for more than a day. Imagine what it is like to be an AMD or Intel engineer and have your ISA work take ~decades to be adopted at scale!

Truth be told, I'm unsure how much of a performance impact this ISA backwards compatibility sacrifices. It will vary heavily from workload to workload. But I have no doubt there are some very large datacenters running CPU intensive workloads that could see massive efficiency gains by leveraging modern ISAs. If you are running thousands of servers and your CPU load isn't coming from a JIT'ed language like Java (JITs can emit instructions for the machine they are running on... because they compile just in time), it might very well be worth compiling CPU heavy packages (and their dependencies of course) from source targeting a modern microarchitecture level so you don't leave the benefits of modern ISAs on the table. And be forewarned: use of modern ISAs isn't a silver bullet! Some instructions can actually result in the CPU underclocking in order to run them, making code using those instructions fast but other code slow.

Maintaining binary compatibility with a vanishingly small number of ancient CPUs at the expense of performance on modern CPUs seems... questionable. Fortunately, Linux distributions and Clang/GCC are paying attention.

GCC 11 and Clang 12 define x86_64-{v2, v3, v4} architecture levels targeting ~Nehalem (released 2008), ~Haswell (released 2013), and AVX-512 (~2015), respectively. So you can add e.g. -march=x86_64-v3 to target Haswell era and newer and have the compiler emit SSE4, AVX, AVX2, and other modern instructions.

RHEL9 will be raising their minimum architecture requirement from x86_64 to x86_64-v2, effectively requiring CPUs from 2008+ instead of 2003+.

If you'd like to learn more about this topic, start at this Pharonix article and follow the links to other articles and mailing list discussions.

It's worth noting that at the time I write this, AWS 4th generation EC2 instances (c4, m4, and r4) all support AVX2 and I believe are compatible with GCC/Clang's x86_64-v3 target. And 5th generation Intel instances have AVX-512, presumably making them compatible with x86_64-v4. So even if your distribution targets x86_64-v2, there is still potential free performance from newer ISAs on the table.

If I were operating a server fleet consisting of thousands of machines, I would be very tempted to compile all packages from source targeting a modern microarchitecture level. This would be costly in terms of complexity. But for some workloads, the performance gains could be worth the effort. And this conservative targeting approach may provide justification for running modern-optimized Linux distributions or cloud vendor specific Linux distributions (e.g. Amazon Linux). I'm unsure if distributions like Amazon Linux take advantage of this. If not, they should look into it!

Read the next section for an example of where failure to leverage modern ISAs translates to a performance loss.

Many Implementations of Myers Diff and Other Line Based Diffing Algorithms

This one is rather domain specific but I find it an illustrious example because the behavior is quite counter-intuitive!

Various classes of software need to take two text documents and emit a textual diff of their contents. Think what git diff displays.

There are various algorithms for generating a diff of text. Myers Diff is probably the most famous. The run-time of the algorithms is proportional to the number of lines. Probably O(nlog(n)) or O(n^2).

These text-based diffing algorithms often operate at the line level (rather than say the byte or codepoint level) because it drastically limits the search space and minimizes n to keep the algorithm run-time in check.

Over the years, various people have realized that when diffing two text documents, large parts of the inputs are often identical (why would you diff unrelated content after all). So most implementations of diff algorithms have a myriad of optimizations to limit the number of lines compared. Two common optimizations are to identify and exclude the common prefix and suffix of the input.

This is over-simplified, but text-based diffing algorithms often do the following:

  1. Split the input into lines.
  2. Hash each line to facilitate fast line equivalence testing (comparing a u32 or u64 checksum is a ton faster than memcmp() or strcmp()).
  3. Identity and exclude common prefix and suffix lines.
  4. Feed remaining lines into diffing algorithm.

The idea is that steps 1-3 - which should be O(n) - reduce work for an algorithm (step 4) with run-time complexity worse than O(n). Sounds good on paper.

So what actually happens?

If you profile a number of these diff implementations, you find that steps 1-3 actually take more time than the supposedly slow/expensive algorithm! How can this be?!

One culprit is the line splitting. Even assuming we can use memory 0-copy / references for storing the line contents (as opposed to allocating a new string to hold each parsed line, which can be much less efficient), splitting text into lines can be grossly inefficient!

There are various reasons for this. Maybe you are decoding the text into code points rather than operate in the domain of bytes (you shouldn't need to decode the entire input to search for newlines). Maybe you are traversing the file one character/byte at a time looking for LF.

An efficient solution to this problem employs the use of vectorized CPU instructions (like AVX/AVX2) which can scan several bytes at a time looking for a sentinel value or matching a byte mask. So instead of 1 instruction per input byte, you have 1/n. Your C runtime library probably has assembly implementations of memchr(), strchr(), and similar functions and automatically chooses the newest/fastest assembly/instructions supported by the run-time CPU (glibc does).

In theory, compilers recognize such patterns and emit modern vectorized instructions automagically. In reality, because the default target ISA of compilers is relatively ancient compared to what your CPU is capable of (see previous section), you are stuck with old instructions and linear scanning. Your best bet is to stick with functions in the C runtime that are probably backed by assembly. (Although watch out for function call overhead.)

Another culprit causing inefficiency is hashing each line. The hashing is performed to reduce equivalence testing to a u32/u64 compare rather than strcmp(). Many implementations don't seem to give much consideration to the hashing algorithm, using something like crc32 or djb2. An inefficiency here is many older hashing algorithms operate at the byte level: you need to feed in 1 byte at a time, update state (XOR if often employed), then feed in the next byte. This is inefficient because it throws away the instruction pipelining and superscalar properties of modern CPUs. A better approach is to use a hashing algorithm that digests 4, 8, or more bytes at a time. Again, this lowers run-time from ~n cycles per byte to ~1/n.

Another common inefficiency is computing the lines and hashes of content in the common prefix and suffix. Use of memcmp() (or even better: hand-rolled assembly to give you the offset of the first divergence) is more efficient, as again, your C runtime library probably has assembly implementations of memcmp() which can compare input at near native memory speed.

I quite enjoy this example because it demonstrates that something that is seemingly O(n) is slower than O(nlog(n))/O(n^2). This is because often the result of the optimization reduces the n of the expensive algorithm to such a small value that the computational complexity is trivial. Compilers targeting ancient microarchitectures and failing to leverage vectorized instructions which unlock superlinear performance further shift the time towards the O(n) optimizations.

Conclusion

Computers and software can be surprisingly slow for surprising reasons. While this post was long and touched on a number of topics, it only scratched the surface of potential topics. I could easily find another 10 topics to write about. But that will have to be for another post.

Before I go, if you find inaccuracies in this post, please shoot me an email (address in resume in site header) so I can correct the post, as I don't want to unintentionally mislead others.

Also, computers and software are complex. When it comes to performance and optimizations, always be measuring. The issues I described could be manifesting in your software and environments but the effort to address them may not be worth the reward. Computers and software, like life, are full of trade-offs. Performance is just one trade-off. Please don't cargo cult my advice without measuring and applying critical thinking first.


Next Page »