Achieving A Completely Open Source Implementation of Apple Code Signing and Notarization

August 08, 2022 at 08:08 AM | categories: Apple, Rust

As I've previously blogged in Pure Rust Implementation of Apple Code Signing (2021-04-14) and Expanding Apple Ecosystem Access with Open Source, Multi Platform Code Signing (2022-04-25), I've been hacking on an open source implementation of Apple code signing and notarization using the Rust programming language. This takes the form of the apple-codesign crate / library and its rcodesign CLI executable. (Documentation / GitHub project.)

As of that most recent post in April, I was pretty happy with the relative stability of the implementation: we were able to sign, notarize, and staple Mach-O binaries, directory bundles (.app, .framework bundles, etc), XAR archives / flat packages / .pkg installers, and DMG disk images. Except for the known limitations, if Apple's official codesign and notarytool tools support it, so do we. This allows people to sign, notarize, and release Apple software from non-Apple operating systems like Linux and Windows. This opens up new avenues for Apple platform access.

A major limitation in previous versions of the apple-codesign crate was our reliance on Apple's Transporter tool for notarization. Transporter is a Java application made available for macOS, Linux, and Windows that speaks to Apple's servers and can upload assets to their notarization service. I used this tool at the time because it seemed to be officially supported by Apple and the path of least resistance to standing up notarization. But Transporter was a bit wonky to use and an extra dependency that you needed to install.

At WWDC 2022, Apple announced a new Notary API as part of the App Store Connect API. In what felt like a wink directly at me, Apple themselves even call out the possibility of leveraging this API to notarize from Linux! I knew as soon as I saw this that it was only a matter of time before I would be able to replace Transporter with a pure Rust client for the new HTTP API. (I was already thinking about using the unpublished HTTP API that notarytool uses. And from the limited reversing notes I have from before WWDC, it looks like the new official Notary API is very similar - possibly identical - to what notarytool uses. So kudos to Apple for opening up this access!)

I'm very excited to announce that we now have a pure Rust implementation of a client for Apple's Notary API in the apple-codesign crate. We can now notarize Apple software from any machine where you can get the Rust crate to compile, and we no longer have a dependency on the 3rd party Apple Transporter application. Notarization, like code signing, is 100% open source Rust code.
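For a sense of the API's shape, here is a hedged Python sketch of the first step of the documented Notary API flow: creating a submission that names the artifact and declares its SHA-256 digest. Field names follow Apple's public documentation; the real client in apple-codesign is Rust, and authentication (a signed App Store Connect API token) plus the artifact upload itself are omitted.

```python
import hashlib
import json
import tempfile

def notary_submission_body(path: str) -> str:
    """Build the JSON body for creating a Notary API submission
    (POST /notary/v2/submissions per Apple's public docs). The response
    would include storage credentials for uploading the artifact."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return json.dumps({
        "submissionName": path.rsplit("/", 1)[-1],
        "sha256": digest.hexdigest(),
    })

# Demonstrate against a throwaway "artifact".
with tempfile.NamedTemporaryFile(suffix=".zip", delete=False) as artifact:
    artifact.write(b"not a real signed app")
body = json.loads(notary_submission_body(artifact.name))
print(body["sha256"])
```

The digest lets Apple verify the uploaded bytes match what the submission described.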

As excited as I am to announce this new feature, I'm even more excited that it was largely implemented by a contributor, Robin Lambertz / @roblabla! They filed a GitHub feature request while WWDC 2022 was still ongoing and then submitted a PR a few days later. It took me a few months to get around to reviewing it (I try to avoid computer screens during summers), but it was a fantastic PR given the scope of the change. It never ceases to bring me joy when someone randomly contributes greatness to open source.

So, as of the just-released 0.17 release of the apple-codesign Rust crate and its corresponding rcodesign CLI tool, you can now rcodesign notary-submit to speak to Apple's Notary API using a pure Rust client. No more requirements on 3rd party, proprietary software. All you need to sign and notarize Apple applications is the self-contained rcodesign executable and a Linux, Windows, macOS, BSD, etc machine to run it on.

I'm stoked to finally achieve this milestone! There are probably thousands of companies and individuals who have wanted to release Apple software from non-macOS operating systems. (The existence and popularity of tools like fastlane seems to confirm this.) The historical lack of an Apple code signing and notarization solution that worked outside macOS has prevented this. Well, that barrier has officially fallen.

Release notes, documentation, and (self-signed) pre-built executables of the rcodesign executable for major platforms are available on the 0.17 release page.

Announcing the PyOxy Python Runner

May 10, 2022 at 08:00 AM | categories: Python, PyOxidizer

I'm pleased to announce the initial release of PyOxy. Binaries are available on GitHub.

(Yes, I used my pure Rust Apple code signing implementation to remotely sign the macOS binaries from GitHub Actions using a YubiKey plugged into my Windows desktop: that experience still feels magical to me.)

PyOxy is all of the following:

  • An executable program used for running Python interpreters.
  • A single file and highly portable (C)Python distribution.
  • An alternative python driver providing more control over the interpreter than what python itself provides.
  • A way to make some of PyOxidizer's technology more broadly available without using PyOxidizer.

Read the following sections for more details.

pyoxy Acts Like python

The pyoxy executable has a run-python sub-command that will essentially do what python would do:

$ pyoxy run-python
Python 3.9.12 (main, May  3 2022, 03:29:54)
[Clang 14.0.3 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.

A Python REPL. That's familiar!

You can even pass python arguments to it:

$ pyoxy run-python -- -c 'print("hello, world")'
hello, world

When a pyoxy executable is renamed to any filename beginning with python, it implicitly behaves like pyoxy run-python --.

$ mv pyoxy python3.9
$ ls -al python3.9
-rwxrwxr-x  1 gps gps 120868856 May 10  2022 python3.9

$ ./python3.9
Python 3.9.12 (main, May  3 2022, 03:29:54)
[Clang 14.0.3 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
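The name-based dispatch amounts to a few lines of logic. Here is a hypothetical Python sketch of the behavior (not pyoxy's actual Rust source):

```python
import os

def choose_mode(argv0: str) -> str:
    """Behave like `pyoxy run-python --` when the executable's file name
    begins with "python"; otherwise expose the normal pyoxy CLI."""
    name = os.path.basename(argv0)
    return "run-python" if name.startswith("python") else "cli"

print(choose_mode("./python3.9"))  # run-python
print(choose_mode("./pyoxy"))      # cli
```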

Single File Python Distributions

The official pyoxy executables are built with PyOxidizer and leverage the Python distributions provided by my python-build-standalone project. On Linux and macOS, a fully featured Python interpreter and its library dependencies are statically linked into pyoxy. The pyoxy executable also embeds a copy of the Python standard library and imports it from memory using the oxidized_importer Python extension module.

What this all means is that the official pyoxy executables can function as single file CPython distributions! Just download a pyoxy executable, rename it to python, python3, python3.9, etc and it should behave just like a normal python would!

Your Python installation has never been so simple. And fast: pyoxy should be a few milliseconds faster to initialize a Python interpreter, mostly because oxidized_importer avoids the filesystem overhead of looking for and loading .py[c] files.
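What importing from memory looks like in miniature - a toy Python sketch of the idea behind oxidized_importer, not its implementation:

```python
import importlib.abc
import importlib.util
import sys

# Module sources held as in-memory strings: importing them requires no
# filesystem stat() or read() calls at all.
MODULES = {"greetings": 'def hello():\n    return "hello from memory"\n'}

class MemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, name, path=None, target=None):
        if name in MODULES:
            return importlib.util.spec_from_loader(name, self)
        return None  # Defer to the regular filesystem importers.

    def exec_module(self, module):
        exec(MODULES[module.__name__], module.__dict__)

sys.meta_path.insert(0, MemoryFinder())

import greetings
print(greetings.hello())  # hello from memory
```

oxidized_importer does the same thing at a much larger scale, serving the entire standard library from data embedded in the executable.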

Low-Level Control Over the Python Interpreter with YAML

The pyoxy run-yaml command takes the path to a YAML file defining the embedded Python interpreter configuration and then launches that Python interpreter in-process:

$ cat > hello_world.yaml <<EOF
allocator_debug: true
interpreter_config:
  run_command: 'print("hello, world")'
EOF

$ pyoxy run-yaml hello_world.yaml
hello, world

Under the hood, PyOxy uses the pyembed Rust crate to manage embedded Python interpreters. The YAML document that PyOxy uses is simply deserialized into a pyembed::OxidizedPythonInterpreterConfig Rust struct, which pyembed uses to spawn a Python interpreter. This Rust struct offers near complete control over how the embedded Python interpreter behaves: it even allows you to tweak settings that are impossible to change from environment variables or python command arguments! (Beware: this power means you can easily cause the interpreter to crash if you feed it a bad configuration!)

YAML Based Python Applications

pyoxy run-yaml ignores all file content before the YAML --- start document delimiter. This means that on UNIX-like platforms you can create executable YAML files defining your Python application. e.g.

$ mkdir -p myapp
$ cat > myapp/__main__.py << EOF
print("hello from myapp")
EOF

$ cat > say_hello <<"EOF"
#!/bin/sh
"exec" "`dirname $0`/pyoxy" run-yaml "$0" -- "$@"
---
interpreter_config:
  run_module: 'myapp'
  module_search_paths: ["$ORIGIN"]
EOF

$ chmod +x say_hello

$ ./say_hello
hello from myapp

This means that to distribute a Python application, you can drop a copy of pyoxy in a directory, define an executable YAML file masquerading as a shell script, and run Python code with as little as two files!
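The preamble-skipping rule that makes this polyglot trick possible can be sketched as follows (a toy Python illustration; pyoxy's actual implementation is Rust):

```python
def strip_preamble(text: str) -> str:
    """Return the document starting at the YAML `---` document-start
    marker, discarding any shell preamble before it - mirroring how
    `pyoxy run-yaml` tolerates a shell wrapper at the top of the file."""
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.rstrip("\r\n") == "---":
            return "".join(lines[i:])
    return text  # No delimiter: treat the whole file as YAML.

script = '"exec" "pyoxy" run-yaml "$0" -- "$@"\n---\nrun_module: myapp\n'
print(strip_preamble(script))
```

The shell never reads past the exec line, and the YAML parser never sees anything before `---`, so one file satisfies both interpreters.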

The Future of PyOxy

PyOxy is very young. I hacked it together on a weekend in September 2021. I wanted to shore up some functionality before releasing it then. But I got perpetually sidetracked and never did the work. I figured it would be better to make a smaller splash with a lesser-baked product now than wait even longer. Anyway...

As part of building PyOxidizer I've built some peripheral technology:

  • Standalone and highly distributable Python builds via the python-build-standalone project.
  • The pyembed Rust crate for managing an embedded Python interpreter.
  • The oxidized_importer Python package/extension for importing modules from memory, among other things.
  • The Python packed resources data format for representing a collection of Python modules and resource files for efficient loading (by oxidized_importer).

I conceived PyOxy as a vehicle to enable people to leverage PyOxidizer's technology without imposing PyOxidizer onto them. I feel that PyOxidizer's broader technology is generally useful and too valuable to be gated behind using PyOxidizer.

PyOxy is only officially released for Linux and macOS for the moment. It definitely builds on Windows. However, I want to improve the single file executable experience before officially releasing PyOxy on Windows. This requires an extensive overhaul to oxidized_importer and the way it serializes Python resources to be loaded from memory.

I'd like to add a sub-command to produce a Python packed resources payload. With this, you could bundle/distribute a Python application as pyoxy plus a file containing your application's packed resources alongside YAML configuring the Python interpreter. Think of this as a more modern and faster version of the venerable zipapp approach. This would enable PyOxy to satisfy packaging scenarios provided by tools like Shiv, PEX, and XAR. However, unlike Shiv and PEX, pyoxy also provides an embedded Python interpreter, so applications are much more portable since there isn't reliance on the host machine having a Python interpreter installed.
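For comparison, here is what the stdlib zipapp approach looks like (a minimal sketch using a hypothetical myapp module). It illustrates the host-interpreter dependency that pyoxy would remove:

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp

# Build a tiny application tree, then pack it into a single-file .pyz
# archive -- the stdlib analogue of pyoxy plus a packed resources file.
with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp, "myapp")
    src.mkdir()
    src.joinpath("__main__.py").write_text('print("hello from myapp")\n')
    target = pathlib.Path(tmp, "myapp.pyz")
    zipapp.create_archive(src, target)
    # Unlike pyoxy, the archive still needs a Python interpreter on the host.
    result = subprocess.run([sys.executable, str(target)],
                            capture_output=True, text=True, check=True)
    print(result.stdout.strip())  # hello from myapp
```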

I'm really keen to see how others want to use pyoxy.

The YAML based control over the Python interpreter could be super useful for testing, benchmarking, and general Python interpreter configuration experimentation. It essentially opens the door to things previously only possible if you wrote code interfacing with Python's C APIs.

I can also envision tools that hide the existence of Python wanting to leverage the single file Python distribution property of pyoxy. For example, tools like Ansible could copy pyoxy to a remote machine to provide a well-defined Python execution environment without having to rely on what packages are installed. Or pyoxy could be copied into a container or other sandboxed/minimal environment to provide a Python interpreter.

And that's PyOxy. I hope you find it useful. Please file any bug reports or feature requests in PyOxidizer's issue tracker.

Expanding Apple Ecosystem Access with Open Source, Multi Platform Code Signing

April 25, 2022 at 08:00 AM | categories: Apple, Rust

A little over one year ago, I announced a project to implement Apple code signing in pure Rust. There have been quite a number of developments since that post and I thought a blog post was in order. So here we are!

But first, some background on why we're here.


(Skip this section if you just want to get to the technical bits.)

Apple runs some of the largest and most profitable software application ecosystems in existence. Gaining access to these ecosystems has traditionally required the use of macOS and membership in the Apple Developer Program.

For the most part this makes sense: if you want to develop applications for Apple operating systems you will likely utilize Apple's operating systems and Apple's official tooling for development and distribution. Sticking to the paved road is a good default!

But many people want more... flexibility. Open source developers, for example, often want to distribute cross-platform applications with minimal effort. There are entire programming language ecosystems where the operating system you are running on is abstracted away as an implementation detail for many applications. By imposing a de facto requirement that macOS, iOS, etc. development involve direct access to macOS and (often above-market-priced) Apple hardware, Apple's distribution requirements are effectively exclusionary and prevent interested parties from contributing to the ecosystem.

One of the aspects of software distribution on Apple platforms that trips a lot of people up is code signing and notarization. Essentially, you need to:

  1. Embed a cryptographic signature in applications that effectively attests to its authenticity from an Apple Developer Program associated account. (This is signing.)
  2. Upload your application to Apple so they can inspect it, verify it meets requirements, and likely store a copy of it. Apple then issues their own cryptographic signature called a notarization ticket, which then needs to be stapled/attached to the application being distributed so Apple operating systems can trust it. (This is notarization.)

Historically, these steps required Apple proprietary software run exclusively from macOS. This means that even if you are in a software ecosystem like Rust, Go, or the web platform where you can cross-compile apps without direct access to macOS (testing is obviously a different story), you would still need macOS somewhere if you wanted to sign and notarize your application. And signing and notarization are effectively required on macOS due to default security settings. On mobile platforms like iOS, it is impossible to distribute applications that aren't signed and notarized unless you are running a jailbroken device.

A lot of people (myself included) have grumbled at these requirements. Why should I be forced to involve an Apple machine as part of my software release process if I don't need macOS to build my application? Why do I have to go through a convoluted dance to sign and notarize my application at release time - can't it be more streamlined?

When I looked at this space last year, I saw some obvious inefficiencies and room to improve. So as I said then, I foolishly set out to reimplement Apple code signing so developers would have more flexibility and opportunity for distributing applications to Apple's ecosystems.

The ultimate goal of this work is to expand Apple ecosystem access to more developers. A year later, I believe I'm delivering a product capable of doing this.

One Year Later

Foremost, I'm excited to announce the release of rcodesign 0.14.0. This is the first time I'm publishing pre-built binaries (Linux, Windows, and macOS) of rcodesign. This reflects my confidence in the relative maturity of the software.

In case you are wondering, yes, the macOS rcodesign executable is self-signed: it was signed by a GitHub Actions Linux runner using a code signing certificate exclusive to a YubiKey. That YubiKey was plugged into a Windows 11 desktop next to my desk. The rcodesign executable was not copied between machines as part of the signing operation. Read on to learn about the sorcery that made this possible.

A lot has changed in the apple-codesign project / Rust crate in the last year! Just look at the changelog!

The project was renamed from tugger-apple-codesign.

(If you installed via cargo install, you'll need to cargo install --force apple-codesign to force Cargo to overwrite the rcodesign executable with one from a different crate.)

The rcodesign CLI executable is still there and more powerful than ever. You can still sign Apple applications from Linux, Windows, macOS, and any other platform you can get the Rust program to compile on.

There is now Sphinx documentation for the project. This is published alongside PyOxidizer's documentation (because I'm using a monorepo). There's some general documentation in there, such as a guide on how to selectively bypass Gatekeeper by deploying your own alternative code signing PKI in parallel to Apple's. (This seems like something many companies would want, but for whatever reason I'm not aware of anyone doing this - possibly because very few people understand how these systems work.)

There are bug fixes galore. When I look back at the state of rcodesign when I first blogged about it, I think of how naive I was. There were a myriad of applications that wouldn't pass notarization because of a long tail of bugs. There are still known issues. But I believe many applications will successfully sign and notarize now. I consider failures novel and worthy of bug reports - so please report them!

Read on to learn about some of the notable improvements in the past year (many of them occurring in the last two months).

Support for Signing Bundles, DMGs, and .pkg Installers

When I announced this project last year, only Mach-O binaries and trivially simple .app bundles were signable. And even then there were a ton of subtle issues.

rcodesign sign can now sign more complex bundles, including many nested bundles. There are reports of iOS app bundles signing correctly! (However, we don't yet have good end-user documentation for signing iOS apps. I will gladly accept PRs to improve the documentation!)

The tool also gained support for signing .dmg disk image files and .pkg flat package installers.

Known limitations with signing are now documented in the Sphinx docs.

I believe rcodesign now supports signing all the major file formats used for Apple software distribution. If you find something that doesn't sign and it isn't documented as a known issue with an existing GitHub issue tracking it, please report it!

Support for Notarization on Linux, Windows, and macOS

Apple publishes a Java tool named Transporter that enables you to upload artifacts to Apple for notarization. They make this tool available for Linux, Windows, and of course macOS.

While this tool isn't open source (as far as I know), usage of this tool enables you to notarize from Linux and Windows while still using Apple's official tooling for communicating with their servers.

rcodesign now has support for invoking Transporter and uploading artifacts to Apple for notarization. We now support notarizing bundles, .dmg disk images, and .pkg flat installer packages. I've successfully notarized all of these application types from Linux.

(I'm capable of implementing an alternative uploader in pure Rust but without assurances that Apple won't bring down the ban hammer for violating terms of use, this is a bridge I'm not yet willing to cross. The requirement to use Transporter is literally the only thing standing in the way of making rcodesign an all-in-one single file executable tool for signing and notarizing Apple software and I really wish I could deliver this user experience win without reprisal.)

With support for both signing and notarizing all application types, it is now possible to release Apple software without macOS involved in your release process.

YubiKey Integration

I try to use my YubiKeys as much as possible because a secret or private key stored on a YubiKey is likely more secure than a secret or private key sitting around on a filesystem somewhere. If you hack my machine, you can likely gain access to my private keys. But you will need physical access to my YubiKey and to compel or coerce me into unlocking it in order to gain access to its private keys.

rcodesign now has support for using YubiKeys for signing operations.

This does require enabling the off-by-default smartcard Cargo feature. So if building manually you'll need to e.g. cargo install --features smartcard apple-codesign.

The YubiKey integration comes courtesy of the amazing yubikey Rust crate. This crate will speak directly to the smartcard APIs built into macOS and Windows. So if you have an rcodesign build with YubiKey support enabled, YubiKeys should just work. Try it by plugging in your YubiKey and running rcodesign smartcard-scan.

YubiKey integration has its own documentation.

I even implemented some commands to make it easy to manage the code signing certificates on your YubiKey. For example, you can run rcodesign smartcard-generate-key --smartcard-slot 9c to generate a new private key directly on the device and then rcodesign generate-certificate-signing-request --smartcard-slot 9c --csr-pem-path csr.pem to create a Certificate Signing Request (CSR) for that key, which you can exchange for an Apple-issued signing certificate. This means you can easily create code signing certificates whose private key was generated directly on the hardware device and can never be exported. Generating keys this way is widely considered to be more secure than storing keys in software vaults, like Apple's Keychains.

Remote Code Signing

The feature I'm most excited about is what I'm calling remote code signing.

Remote code signing allows you to delegate the low-level cryptographic signature operations in code signing to a separate machine.

It's probably easiest to just demonstrate what it can do.

Earlier today I signed a macOS universal Mach-O executable from a GitHub-hosted Linux GitHub Actions runner using a YubiKey physically attached to the Windows 11 machine next to my desk at home. The signed application was not copied between machines.

Here's how I did it.

I have a GitHub Actions workflow that calls rcodesign sign --remote-signer. I manually triggered that workflow and started watching the near real time job output with my browser. Here's a screenshot of the job logs:

GitHub Actions initiating remote code signing

rcodesign sign --remote-signer prints out some instructions (including a wall of base64 encoded data) for what to do next. Importantly, it requests that someone else run rcodesign remote-sign to continue the signing process.

And here's a screenshot of me doing that from the Windows terminal:

Windows terminal output from running remote-sign command

This log shows us connecting and authenticating with the YubiKey along with some status updates regarding speaking to a remote server.

Finally, here's a screenshot of the GitHub Actions job output after I ran that command on my Windows machine:

GitHub Actions job output after remote signing completed

Remote signing enabled me to sign a macOS application from a GitHub Actions runner operated by GitHub while using a code signing certificate securely stored on my YubiKey plugged into a Windows machine hundreds of kilometers away from the GitHub Actions runner. Magic, right?

What's happening here is the 2 rcodesign processes are communicating with each other via websockets bridged by a central relay server. (I operate a default server free of charge. The server is open source and a Terraform module is available if you want to run your own server with hopefully just a few minutes of effort.) When the initiating machine wants to create a signature, it sends a message back to the signer requesting a cryptographic signature. The signature is then sent back to the initiator, who incorporates it.

I designed this feature with automated releases from CI systems (like GitHub Actions) in mind. I wanted a way where I could streamline the code signing and release process of applications without having to give a low trust machine in CI ~unlimited access to my private signing key. But the more I thought about it the more I realized there are likely many other scenarios where this could be useful. Have you ever emailed or Dropboxed an application for someone else to sign because you don't have an Apple issued code signing certificate? Now you have an alternative solution that doesn't require copying files around! As long as you can see the log output from the initiating machine or have that output communicated to you (say over a chat application or email), you can remotely sign files on another machine!

An Aside on the Security of Remote Signing

At this point, I'm confident the more security conscious among you have been grimacing for a few paragraphs now. Websockets through a central server operated by a 3rd party?! Giving remote machines access to perform code signing against arbitrary content?! Your fears and skepticism are 100% justified: I'd be thinking the same thing!

I fully recognize that a service that facilitates remote code signing makes for a very lucrative attack target! If abused, it could be used to coerce parties with valid code signing certificates to sign unwanted code, like malware. There are many, many, many wrong ways to implement such a feature. I pondered for hours about the threat modeling and how to make this feature as secure as possible.

Remote Code Signing Design and Security Considerations captures some of my high level design goals and security assessments. And Remote Code Signing Protocol goes into detail about the communications protocol, including the crypto (actual cryptography, not the fad) involved. The key takeaways are the protocol and server are designed such that a malicious server or man-in-the-middle can not forge signature requests. Signing sessions expire after a few minutes and 3rd parties (or the server) can't inject malicious messages that would result in unwanted signatures. There is an initial handshake to derive a session ephemeral shared encryption key and from there symmetric encryption keys are used so all meaningful messages between peers are end-to-end encrypted. About the worst a malicious server could do is conduct a denial of service. This is by design.
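To make the key-derivation step concrete, here is a minimal Python implementation of HKDF (RFC 5869, one of the specifications this project touches), the standard extract-and-expand construction for turning a shared secret into symmetric keys. This illustrates the building block, not the protocol's actual code:

```python
import hashlib
import hmac

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract: concentrate input keying material into a
    fixed-length pseudorandom key (PRK)."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand: stretch the PRK into `length` bytes of output
    keying material, bound to the context string `info`."""
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

# RFC 5869 test case 1: known inputs produce a known 42-byte key.
prk = hkdf_extract(bytes.fromhex("000102030405060708090a0b0c"), b"\x0b" * 22)
okm = hkdf_expand(prk, bytes.fromhex("f0f1f2f3f4f5f6f7f8f9"), 42)
print(okm.hex())
```

Binding derived keys to session context this way is part of what prevents a relay from transplanting messages between signing sessions.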

As I argue in Security Analysis in the Bigger Picture, I believe that my implementation of remote signing is more secure than many common practices because common practices today entail making copies of private keys and giving low trust machines (like CI workers) access to private keys. Or files are copied around without cryptographic chain-of-custody to prove against tampering. Yes, remote signing introduces a vector for remote access to use signing keys. But practiced as I intended, remote signing can eliminate the need to copy private keys or grant ~unlimited access to them. From a threat modeling perspective, I think the net restriction in key access makes remote signing more secure than the private key management practiced by many today.

All that being said, the giant asterisk here is I implemented my own cryptosystem to achieve end-to-end message security. If there are bugs in the design or implementation, that cryptosystem could come crashing down, bringing defenses against message forgery with it. At that point, a malicious server or privileged network actor could potentially coerce someone into signing unwanted software. But this is likely the extent of the damage: an offline attack against the signing key should not be possible since signing requires presence and since the private key is never transmitted over the wire. Even without the end-to-end encryption, the system is arguably more secure than leaving your private key lingering around as an easily exfiltrated CI secret (or similar).

(I apologize to every cryptographer I worked with at Mozilla who beat into me the commandment that thou shall not roll their own crypto: I have sinned and I feel remorseful.)

Cryptography is hard. And I'm sure I made plenty of subtle mistakes. Issue #552 tracks getting an audit of this protocol and code performed. And the aforementioned protocol design docs call out some of the places where I question decisions I've made.

If you would be interested in doing a security review on this feature, please get in touch on issue #552 or send me an email. If there's one immediate outcome I'd like from this blog post it would be for some white hat^Hknight to show up and give me peace of mind about the cryptosystem implementation.

Until then, please assume the end-to-end encryption is completely flawed. Consider asking someone with security or cryptographer in their job title for their opinion on whether this feature is safe for you to use. Hopefully we'll get a security review done soon and this caveat can go away!

If you do want to use this feature, Remote Code Signing contains some usage documentation, including how to use it with GitHub Actions. (I could also use some help productionizing a reusable GitHub Action to make this more turnkey! Although I'm hesitant to do it before I know the cryptosystem is sound.)

That was a long introduction to remote code signing. But I couldn't in good faith present the feature without addressing the security aspect. Hopefully I didn't scare you away! Traditional / local signing should have no security concerns (beyond the willingness to run software written by somebody you probably don't know, of course).

Apple Keychain Support

As of today's 0.14 release we now have early support for signing with code signing certificates stored in Apple Keychains! If you created your Apple code signing certificates in Keychain Access or Xcode, this is probably where your code signing certificates live.

I held off implementing this for the longest time because I didn't perceive there to be a benefit: if you are on macOS, just use Apple's official tooling. But with rcodesign gaining support for remote code signing and other features that could make it a compelling replacement for Apple tooling on all platforms, I figured we should provide the feature so we stop forcing people to export private keys from their Keychains.

This integration is very young and there's still a lot that can be done, such as automatically using an appropriate signing certificate based on what you are signing. Please file feature request issues if there's a must-have feature you are missing!

Better Debugging of Failures

Apple's code signing is complex. It is easy for there to be subtle differences between Apple's tooling and rcodesign.

rcodesign now has print-signature-info and diff-signatures commands to dump and compare YAML metadata pertinent to code signing to make it easier to compare behavior between code signing implementations and even multiple signing operations.

The documentation around debugging and reporting bugs now emphasizes using these tools to help identify bugs.

A Request For Users and Feedback

I now believe rcodesign to be generally usable. I've thrown a lot of random software at it and I feel like most of the big bugs and major missing features are behind us.

But I also feel it hasn't yet received wide enough attention to have confidence in that assessment.

If you want to help the development of this tool, the most important actions you can take are to attempt signing / notarization operations with it and report your results.

Does rcodesign spark joy? Please leave a comment in the GitHub discussion for the latest release!

Does rcodesign not work? I would very much appreciate a bug report! Details on how to file good bugs are in the docs.

Have general feedback? UI is confusing? Documentation is insufficient? Leave a comment in the aforementioned discussion. Or create a GitHub issue if you think it is actionable. I can't fix what I don't know about!

Have private feedback? Send me an email.


I could write thousands of words about all I learned from hacking on this project.

I've learned way too much about too many standards and specifications in the crypto space. RFCs 2986, 3161, 3280, 3281, 3447, 4210, 4519, 5280, 5480, 5652, 5869, 5915, 5958, and 8017 plus probably a few more. How cryptographic primitives are stored and expressed: ASN.1, OIDs, BER, DER, PEM, SPKI, PKCS#1, PKCS#8. You can show me the raw parse tree for an ASN.1 data structure and I can probably tell you what RFC defines it. I'm not proud of this. But I will say actually knowing what every field in an X.509 certificate does or the many formats that cryptographic keys are expressed in seems empowering. Before, I would just search for the openssl incantation to do something. Now, I know which ASN.1 data structures are involved and how to manipulate the fields within.

I've learned way too much about the minutiae of how Apple code signing actually works. The mechanism is way too complex for something in the security space. There was at least one high profile Gatekeeper bug in the past year allowing improperly signed code to run. I suspect there will be more: the surface area to exploit is just too large.

I think I'm proud of building an open source implementation of Apple's code signing. To my knowledge nobody else has done this outside of Apple. At least not to the degree I have. Then factor in that I was able to do this without access (or willingness) to look at Apple source code and much of the progress was achieved by diffing and comparing results with Apple's tooling. Hours of staring at diffoscope and comparing binary data structures. Hours of trying to find the magical settings that enabled a SHA-1 or SHA-256 digest to agree. It was tedious work for sure. I'll likely never see a financial return on the time equivalent it took me to develop this software. But, I suppose I can nerd brag that I was able to implement this!

But the real reward for this work will be if it opens up avenues to more (open source) projects distributing to the Apple ecosystems. This has historically been challenging for multiple reasons and many open source projects have avoided official / proper distribution channels to avoid the pain (or in some cases because of philosophical disagreements with the premise of having a walled software garden in the first place). I suspect things will only get worse, as I feel it is inevitable Apple clamps down on signing and notarization requirements on macOS due to the rising costs of malware and ransomware. So having an alternative, open source, and multi-platform implementation of Apple code signing seems like something important that should exist in order to provide opportunities to otherwise excluded developers. I would be humbled if my work empowers others. And this is all the reward I need.

Bulk Analyze Linux Packages with Linux Package Analyzer

January 09, 2022 at 09:10 PM | categories: packaging, Rust

I've frequently wanted to ask random questions about Linux packages and binaries:

  • Which packages provide a file X?
  • Which binaries link against a library X?
  • Which libraries/packages define/export a symbol X?
  • Which binaries reference an undefined symbol X?
  • Which features of ELF are in common use?
  • Which x86 instructions are in a binary?
  • What are the most common ELF section names and their sizes?
  • What are the differences between package X in distributions A and B?
  • Which ELF binaries have the most relocations?
  • And many more.

So, I built Linux Package Analyzer to facilitate answering questions like this.

Linux Package Analyzer is a Rust crate providing the lpa CLI tool. lpa currently supports importing Debian and RPM package repositories (the most popular Linux packaging formats) into a local SQLite database so subsequent analysis can be efficiently performed offline. In essence:

  1. Run lpa import-debian-repository or lpa import-rpm-repository and point the tool at the base URL of a Linux package repository.
  2. Package indices are fetched.
  3. Discovered .deb and .rpm files are downloaded.
  4. Installed files within each package archive are inspected.
  5. Binary/ELF files have their executable sections disassembled.
  6. Results are stored in a local SQLite database for subsequent analysis.

The LPA-built database currently stores the following:

  • Imported packages (name, version, source URL).
  • Installed files within each package (path, size).
  • ELF file information (parsed header fields, number of relocations, important metadata from the .dynamic section, etc).
  • ELF sections (number, type, flags, address, file offset, etc).
  • Dynamic libraries required by ELF files.
  • ELF symbols (name, demangled name, type constant, binding, visibility, version string, etc).
  • For x86 architectures, counts of unique instructions in each ELF file and counts of instructions referencing specific registers.
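Because everything lands in SQLite, questions like "which packages provide a file X?" reduce to a join. Here is a sketch in Python using a tiny in-memory database; the table and column names here are hypothetical and simplified stand-ins for lpa's actual schema, which may differ:

```python
import sqlite3

# Hypothetical, simplified tables mirroring the schema described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT, version TEXT);
CREATE TABLE package_file (package_id INTEGER, path TEXT, size INTEGER);
""")
conn.executemany("INSERT INTO package VALUES (?, ?, ?)", [
    (1, "libssl3", "3.0.2"),
    (2, "openssl", "3.0.2"),
])
conn.executemany("INSERT INTO package_file VALUES (?, ?, ?)", [
    (1, "usr/lib/x86_64-linux-gnu/libssl.so.3", 580144),
    (2, "usr/bin/openssl", 1022384),
])

# "Which packages provide a file X?" as a single join.
rows = conn.execute("""
    SELECT p.name, p.version
    FROM package p JOIN package_file f ON f.package_id = p.id
    WHERE f.path LIKE '%libssl.so%'
""").fetchall()
print(rows)  # → [('libssl3', '3.0.2')]
```

The other questions in the list above (undefined symbols, section sizes, relocation counts) follow the same pattern: one SELECT against the local database, no network access required.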

Using a command like lpa import-debian-repository --components main,multiverse,restricted,universe --architectures amd64 impish, I can import the (currently) ~96 GB of package data from 63,720 packages defining Ubuntu 21.10 to a local ~12 GB SQLite database and answer tons of random questions. Interesting insights yielded so far include:

  • The entirety of the package ecosystem for amd64 consists of 63,720 packages providing 6,704,222 files (168,730 of them ELF binaries) comprising 355,700,362,973 bytes in total.
  • Within the 168,730 ELF binaries are 5,286,210 total sections having 606,175 distinct names. There are also 116,688,943 symbols in symbol tables (debugging info is not included and local symbols not imported or exported are often not present in symbol tables) across 19,085,540 distinct symbol names. The unique symbol names total 1,263,441,355 bytes, or 4,574,688,289 bytes if you count occurrences across all symbol tables (this might be an over count due to how ELF string tables work).
  • The longest demangled ELF symbol is 271,800 characters and is defined in the file usr/lib/x86_64-linux-gnu/ provided by the libmshr2019.2 package.
  • The longest non-mangled ELF symbol is 5,321 characters and is defined in multiple files/packages, as it is part of a library provided by GCC.
  • Only 145 packages have files with indirect functions (IFUNCs). If you discard duplicates (mainly from GCC and glibc), you are left with ~11 packages. This does not appear to be a popular ELF feature!
  • With 54,764 references in symbol tables, strlen appears to be the most (recognized) widely used libc symbol. It even bests memcpy (52,726) and free (42,603).
  • MOV is the most frequent x86 instruction, followed by CALL. (I could write an entire blog post about observations about x86 instruction use.)

There's a trove of data in the SQLite database and the lpa commands only expose a fraction of it. I reckon a lot of interesting tweets, blog posts, research papers, and more could be derived from the data that lpa assembles.

lpa does all of its work in-process using pure Rust. The Debian and RPM repository interaction is handled via the debian-packaging and rpm-repository crates (which I wrote). ELF file parsing is handled by the (amazing) object crate. And x86 disassembling via the iced-x86 crate. Many tools similar to lpa call out to other processes to interface with .deb/.rpm packages, parse ELF files, disassemble x86, etc. Doing this in pure Rust makes life so much simpler as all the functionality is self-contained and I don't have to worry about run-time dependencies for random tools. This means that lpa should just work from Windows, macOS, and other non-Linux environments.

Linux Package Analyzer is very much in its infancy. And I don't really have a grand vision for it. (I built it, and some of the packaging code it is built on, in support of some even grander projects I have cooking.) Please file bugs, feature requests, and pull requests in GitHub. The project is currently part of the PyOxidizer repo (because I like monorepos). But I may pull it and other os/packaging/toolchain code into a new monorepo since the target audiences are different.

I hope others find this tool useful!

Rust Implementation of Debian Packaging Primitives

January 03, 2022 at 04:00 PM | categories: packaging, Rust

Does your Linux distribution use tools with apt in their name to manage system packages? If so, your system packages are using Debian packaging.

Most tools interfacing with Debian packages (.deb files) and repositories use functionality provided by the apt repository. This repository provides libraries like libapt as well as tools like apt-get and apt. Most of the functionality is implemented in C++.

I wanted to raise awareness that I've begun implementing Debian packaging primitives in pure Rust. The debian-packaging crate is published on crates.io. For now, it is developed inside the PyOxidizer repository (because I like monorepos).

So far, a handful of useful functionality is implemented:

  • Parsing and serializing control files
  • Reading repository index files and parsing their content.
  • Reading HTTP hosted repositories.
  • Publishing repositories to the filesystem and S3.
  • Writing changelog files.
  • Reading and writing .deb files.
  • Copying repositories.
  • Creating repositories.
  • PGP signing and verification operations.
  • Parsing and sorting version strings.
  • Dependency syntax parsing.
  • Dependency resolution.

Hopefully the documentation contains all you would want to know for how to use the crate.

The crate is designed to be used as a library so any Rust program can (hopefully) easily tap the power of the Debian packaging ecosystem.
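Version parsing and sorting is a good example of why a proper library matters: Debian versions have the shape [epoch:]upstream[-revision] with unusual ordering rules, notably that `~` sorts before everything, including the empty string. Here is a sketch in Python of dpkg's documented comparison algorithm (an illustration of the rules, not the crate's implementation):

```python
import re

def _order(c):
    """dpkg character ordering: '~' first, then letters, then the rest."""
    if c == "~":
        return -1
    if c.isalpha():
        return ord(c)
    return ord(c) + 256

def _cmp_nondigit(a, b):
    # Compare char by char using _order; a missing char counts as 0,
    # so '~' (order -1) sorts before end-of-string.
    for i in range(max(len(a), len(b))):
        oa = _order(a[i]) if i < len(a) else 0
        ob = _order(b[i]) if i < len(b) else 0
        if oa != ob:
            return -1 if oa < ob else 1
    return 0

def _cmp_string(a, b):
    # Alternate comparing longest non-digit runs (lexically, via _order)
    # and longest digit runs (numerically).
    while a or b:
        na, nb = re.match(r"\D*", a).group(), re.match(r"\D*", b).group()
        r = _cmp_nondigit(na, nb)
        if r:
            return r
        a, b = a[len(na):], b[len(nb):]
        da, db = re.match(r"\d*", a).group(), re.match(r"\d*", b).group()
        if int(da or 0) != int(db or 0):
            return -1 if int(da or 0) < int(db or 0) else 1
        a, b = a[len(da):], b[len(db):]
    return 0

def compare_versions(v1, v2):
    """Compare two Debian version strings; returns -1, 0, or 1."""
    def split(v):
        epoch, sep, rest = v.partition(":")
        if not sep:
            epoch, rest = "0", v
        upstream, sep, revision = rest.rpartition("-")
        if not sep:
            upstream, revision = rest, ""
        return int(epoch), upstream, revision
    e1, u1, r1 = split(v1)
    e2, u2, r2 = split(v2)
    if e1 != e2:
        return -1 if e1 < e2 else 1
    return _cmp_string(u1, u2) or _cmp_string(r1, r2)

assert compare_versions("1.0~rc1", "1.0") == -1   # ~ sorts before everything
assert compare_versions("2:0.9", "1.1") == 1      # epochs dominate
assert compare_versions("1.0-2", "1.0-10") == -1  # revisions compare numerically
```

Getting details like these right is exactly the kind of thing you want a well-tested library, rather than ad-hoc string comparison, to handle.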

As with most software, there are likely several bugs and many features not yet implemented. But I have bulk downloaded the entirety of some distributions' repositories without running into obvious parse/read failures. So I'm reasonably confident that important parts of the code (like control file parsing, repository index handling, and .deb file reading) work as advertised.

Hopefully someone out there finds this work useful!

Next Page »