Absorbing Commit Changes in Mercurial 4.8

November 05, 2018 at 09:25 AM | categories: Mercurial, Mozilla

Every so often a tool you use introduces a feature that is so useful that you can't imagine how things were before that feature existed. The recent 4.8 release of the Mercurial version control tool introduces such a feature: the hg absorb command.

hg absorb is a mechanism to automatically and intelligently incorporate uncommitted changes into prior commits. Think of it as hg histedit or git rebase -i with auto squashing.

Imagine you have a set of changes to prior commits in your working directory. hg absorb figures out which changes map to which commits and absorbs each of those changes into the appropriate commit. Using hg absorb, you can replace cumbersome and often merge-conflict-ridden history editing workflows with a single command that often just works. Read on for more details and examples.

Modern version control workflows often entail having multiple unlanded commits in flight. What this looks like varies heavily by the version control tool, standards and review workflows employed by the specific project/repository, and personal preferences.

A workflow practiced by a lot of projects is to author your changes as a sequence of standalone commits, with each commit representing a discrete, logical unit of work. Each commit is then reviewed/evaluated/tested on its own as part of a larger series. (This workflow is practiced by Firefox, the Git and Mercurial projects, and the Linux kernel, to name a few.)

A common task that arises when working with such a workflow is the need to incorporate changes into an old commit. For example, let's say we have a stack of the following commits:

$ hg show stack
  @  1c114a ansible/hg-web: serve static files as immutable content
  o  d2cf48 ansible/hg-web: synchronize templates earlier
  o  c29f28 ansible/hg-web: convert hgrc to a template
  o  166549 ansible/hg-web: tell hgweb that static files are in /static/
  o  d46d6a ansible/hg-web: serve static template files from httpd
  o  37fdad testing: only print when in verbose mode
 /   (stack base)
o  e44c2e (@) testing: install Mercurial 4.8 final

Contained within this stack are 5 commits changing the way that static files are served by hg.mozilla.org (but that's not important).

Let's say I submit this stack of commits for review. The reviewer spots a problem with the second commit (serve static template files from httpd) and wants me to make a change.

How do you go about making that change?

Again, this depends on the exact tool and workflow you are using.

A common workflow is to not rewrite the existing commits at all: you simply create a new fixup commit on top of the stack, leaving the existing commits as-is. e.g.:

$ hg show stack
  o  deadad fix typo in httpd config
  o  1c114a ansible/hg-web: serve static files as immutable content
  o  d2cf48 ansible/hg-web: synchronize templates earlier
  o  c29f28 ansible/hg-web: convert hgrc to a template
  o  166549 ansible/hg-web: tell hgweb that static files are in /static/
  o  d46d6a ansible/hg-web: serve static template files from httpd
  o  37fdad testing: only print when in verbose mode
 /   (stack base)
o  e44c2e (@) testing: install Mercurial 4.8 final

When the entire series of commits is incorporated into the repository, the end state of the files is the same, so all is well. But this strategy of using fixup commits (while popular - especially with Git-based tooling like GitHub that puts a larger emphasis on the end state of changes rather than the individual commits) isn't practiced by all projects. hg absorb will not help you if this is your workflow.

A popular variation of this fixup commit workflow is to author a new commit then incorporate this commit into a prior commit. This typically involves the following actions:

<save changes to a file>

$ hg commit
<type commit message>

$ hg histedit
<manually choose what actions to perform to what commits>

OR

<save changes to a file>

$ git add <file>
$ git commit
<type commit message>

$ git rebase --interactive
<manually choose what actions to perform to what commits>

Essentially, you produce a new commit. Then you run a history editing command and tell it what to do (e.g. to squash or fold one commit into another). That command then performs the work and produces a set of rewritten commits.

In simple cases, you may make a simple change to a single file. Things are pretty straightforward: you need to know which two commits to squash together, and this is often trivial. It can still be cumbersome, though, if there are several commits and it isn't clear which one should receive the new changes.

In more complex cases, you may make multiple modifications to multiple files. You may even want to squash your fixups into separate commits. And for some code reviews, this complex case can be quite common. It isn't uncommon for me to be incorporating dozens of reviewer-suggested changes across several commits!

These complex use cases are where things can get really complicated for version control tool interactions. Let's say we want to make multiple changes to a file and then incorporate those changes into multiple commits. To keep it simple, let's assume 2 modifications in a single file squashing into 2 commits:

<save changes to file>

$ hg commit --interactive
<select changes to commit>
<type commit message>

$ hg commit
<type commit message>

$ hg histedit
<manually choose what actions to perform to what commits>

OR

<save changes to file>

$ git add --interactive
<select changes to stage>

$ git commit
<type commit message>

$ git add <file>
$ git commit
<type commit message>

$ git rebase --interactive
<manually choose which actions to perform to what commits>

We can see that the number of actions required of users has already increased substantially. Not captured by the number of lines is the effort that must go into the interactive commands like hg commit --interactive, git add --interactive, hg histedit, and git rebase --interactive. For these commands, users must tell the VCS tool exactly what actions to take. This takes time and imposes cognitive load, which distracts the user from the task at hand and is bad for concentration and productivity. The user just wants to amend old commits: telling the VCS tool what actions to take is an obstacle in their way. (A compelling argument can be made that the work required with these workflows to produce a clean history is too much effort and it is easier to make the trade-off favoring simpler workflows versus cleaner history.)

These kinds of squash fixup workflows are what hg absorb is designed to make easier. When using hg absorb, the above workflow can be reduced to:

<save changes to file>

$ hg absorb
<hit y to accept changes>

OR

<save changes to file>

$ hg absorb --apply-changes

Let's assume the following changes are made in the working directory:

$ hg diff
diff --git a/ansible/roles/hg-web/templates/vhost.conf.j2 b/ansible/roles/hg-web/templates/vhost.conf.j2
--- a/ansible/roles/hg-web/templates/vhost.conf.j2
+++ b/ansible/roles/hg-web/templates/vhost.conf.j2
@@ -76,7 +76,7 @@ LimitRequestFields 1000
     # Serve static files straight from disk.
     <Directory /repo/hg/htdocs/static/>
         Options FollowSymLinks
-        AllowOverride NoneTypo
+        AllowOverride None
         Require all granted
     </Directory>

@@ -86,7 +86,7 @@ LimitRequestFields 1000
     # and URLs are versioned by the v-c-t revision, they are immutable
     # and can be served with aggressive caching settings.
     <Location /static/>
-        Header set Cache-Control "max-age=31536000, immutable, bad"
+        Header set Cache-Control "max-age=31536000, immutable"
     </Location>

     #LogLevel debug

That is, we have 2 separate uncommitted changes to ansible/roles/hg-web/templates/vhost.conf.j2.

Here is what happens when we run hg absorb:

$ hg absorb
showing changes for ansible/roles/hg-web/templates/vhost.conf.j2
        @@ -78,1 +78,1 @@
d46d6a7 -        AllowOverride NoneTypo
d46d6a7 +        AllowOverride None
        @@ -88,1 +88,1 @@
1c114a3 -        Header set Cache-Control "max-age=31536000, immutable, bad"
1c114a3 +        Header set Cache-Control "max-age=31536000, immutable"

2 changesets affected
1c114a3 ansible/hg-web: serve static files as immutable content
d46d6a7 ansible/hg-web: serve static template files from httpd
apply changes (yn)?
<press "y">
2 of 2 chunk(s) applied

hg absorb automatically figured out that the 2 separate uncommitted changes mapped to 2 different changesets (Mercurial's term for commit). It printed a summary of what lines would be changed in what changesets and prompted me to accept its plan for how to proceed. The human effort involved is a quick review of the proposed changes and answering a prompt.

At a technical level, hg absorb finds all uncommitted changes and attempts to map each changed line to an unambiguous prior commit. For every change that can be mapped cleanly, the uncommitted changes are absorbed into the appropriate prior commit. Commits impacted by the operation are rebased automatically. If a change cannot be mapped to an unambiguous prior commit, it is left uncommitted and users can fall back to an existing workflow (e.g. using hg histedit).
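
The mapping described above can be illustrated with a toy model. What follows is a minimal sketch in Python and not Mercurial's actual implementation: it assumes the whole stack is available as plain lists of lines, uses difflib to attribute each line to the commit that last wrote it, and only absorbs a change when every line it replaces belongs to a single commit in the stack. The function names (blame, plan_absorb), the shortened file contents, and the commit label "later" are all made up for the illustration.

```python
from difflib import SequenceMatcher


def opcodes(old, new):
    """Line-level edit script turning `old` into `new`."""
    return SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()


def blame(snapshots):
    """For each line of the newest snapshot, return the commit that last wrote it.

    `snapshots` is a list of (commit_id, lines) pairs, oldest first.
    """
    first_id, prev = snapshots[0]
    owners = [first_id] * len(prev)
    for commit_id, lines in snapshots[1:]:
        new_owners = []
        for tag, i1, i2, j1, j2 in opcodes(prev, lines):
            if tag == "equal":
                new_owners.extend(owners[i1:i2])            # untouched lines keep their owner
            else:
                new_owners.extend([commit_id] * (j2 - j1))  # lines written by this commit
        prev, owners = lines, new_owners
    return owners


def plan_absorb(snapshots, working, base_id):
    """Map each uncommitted change to one commit, or leave it alone if ambiguous."""
    owners = blame(snapshots)
    top = snapshots[-1][1]
    plan, leftover = {}, []
    for tag, i1, i2, j1, j2 in opcodes(top, working):
        if tag == "equal":
            continue
        change = (top[i1:i2], working[j1:j2])
        touched = set(owners[i1:i2])
        # Toy rule: absorb only if exactly one rewritable commit owns the old lines.
        # Pure insertions (no old lines) are simply left uncommitted here.
        if len(touched) == 1 and base_id not in touched:
            plan.setdefault(touched.pop(), []).append(change)
        else:
            leftover.append(change)
    return plan, leftover


# Shortened, hypothetical versions of vhost.conf.j2 through the stack.
base = ["    {% endfor %}", "    #LogLevel debug"]
v1 = [base[0],
      "    <Directory /repo/hg/hg_templates/static/>",
      "        Options None",
      "        AllowOverride NoneTypo",
      "        Require all granted",
      "    </Directory>",
      base[1]]
v2 = list(v1)
v2[1] = "    <Directory /repo/hg/htdocs/static/>"
v2[2] = "        Options FollowSymLinks"
v3 = v2[:6] + ["    <Location /static/>",
               '        Header set Cache-Control "max-age=31536000, immutable, bad"',
               "    </Location>",
               base[1]]

snapshots = [("base", base), ("d46d6a", v1), ("later", v2), ("1c114a", v3)]

# Working directory: both typos fixed, nothing committed yet.
working = list(v3)
working[3] = "        AllowOverride None"
working[7] = '        Header set Cache-Control "max-age=31536000, immutable"'

plan, leftover = plan_absorb(snapshots, working, base_id="base")
for commit_id, changes in plan.items():
    print(commit_id, changes)
print("left uncommitted:", leftover)
```

In this model a change that replaces adjacent lines owned by two different commits ends up in leftover, which mirrors the "left uncommitted" behavior described above.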

But wait - there's more!

The automatic rewriting logic of hg absorb is implemented by following the history of lines. This is fundamentally different from the approach taken by hg histedit or git rebase, which tend to rely on merge strategies based on the 3-way merge to derive a new version of a file given multiple input versions. This approach, combined with the fact that hg absorb skips over changes with an ambiguous application commit, means that hg absorb will never encounter merge conflicts! Now, you may be thinking: "if you ignore lines with ambiguous application targets, the patch would always apply cleanly using a classical 3-way merge." That sounds logically correct, but it isn't: hg absorb can avoid merge conflicts when the merging performed by hg histedit or git rebase -i would fail.

The above example attempts to exercise such a use case. Focusing on the initial change:

diff --git a/ansible/roles/hg-web/templates/vhost.conf.j2 b/ansible/roles/hg-web/templates/vhost.conf.j2
--- a/ansible/roles/hg-web/templates/vhost.conf.j2
+++ b/ansible/roles/hg-web/templates/vhost.conf.j2
@@ -76,7 +76,7 @@ LimitRequestFields 1000
     # Serve static files straight from disk.
     <Directory /repo/hg/htdocs/static/>
         Options FollowSymLinks
-        AllowOverride NoneTypo
+        AllowOverride None
         Require all granted
     </Directory>

This patch needs to be applied against the commit that introduced the line being changed. That commit had the following diff:

diff --git a/ansible/roles/hg-web/templates/vhost.conf.j2 b/ansible/roles/hg-web/templates/vhost.conf.j2
--- a/ansible/roles/hg-web/templates/vhost.conf.j2
+++ b/ansible/roles/hg-web/templates/vhost.conf.j2
@@ -73,6 +73,15 @@ LimitRequestFields 1000
         {% endfor %}
     </Location>

+    # Serve static files from templates directory straight from disk.
+    <Directory /repo/hg/hg_templates/static/>
+        Options None
+        AllowOverride NoneTypo
+        Require all granted
+    </Directory>
+
+    Alias /static/ /repo/hg/hg_templates/static/
+
     #LogLevel debug
     LogFormat "%h %v %u %t \"%r\" %>s %b %D \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\""
     ErrorLog "/var/log/httpd/hg.mozilla.org/error_log"

But after that commit was another commit with the following change:

diff --git a/ansible/roles/hg-web/templates/vhost.conf.j2 b/ansible/roles/hg-web/templates/vhost.conf.j2
--- a/ansible/roles/hg-web/templates/vhost.conf.j2
+++ b/ansible/roles/hg-web/templates/vhost.conf.j2
@@ -73,14 +73,21 @@ LimitRequestFields 1000
         {% endfor %}
     </Location>

-    # Serve static files from templates directory straight from disk.
-    <Directory /repo/hg/hg_templates/static/>
-        Options None
+    # Serve static files straight from disk.
+    <Directory /repo/hg/htdocs/static/>
+        Options FollowSymLinks
         AllowOverride NoneTypo
         Require all granted
     </Directory>

...

When we use hg histedit or git rebase -i to rewrite this history, the VCS first attempts to reorder commits before squashing 2 commits together. When it attempts to move the fixup diff to immediately after the commit that introduced the line, there is a good chance it will encounter a merge conflict. Essentially, your VCS is thinking: "you changed this line, but the lines around the change in the final version are different from the lines in the initial version. I don't know if those other lines matter and therefore I don't know what the end state should be, so I'm giving up and letting the user choose for me."

But since hg absorb operates at the line history level, it knows that this individual line wasn't actually changed (even though the lines around it were), assumes there is no conflict, and offers to absorb the change. So not only is hg absorb significantly simpler than today's hg histedit or git rebase -i workflows in terms of VCS command interactions, but it can also avoid time-consuming merge conflict resolution as well!
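
To make the difference concrete, here is a small Python sketch. It is a deliberately naive stand-in, not what hg histedit, git rebase, or hg absorb actually run: real tools use 3-way merges, which are smarter than exact context matching. The helper apply_hunk is hypothetical. It only replaces a line when the surrounding context lines match exactly, and passing empty context approximates "follow the line itself and ignore its neighbors".

```python
def apply_hunk(target, before, old, new, after):
    """Replace `old` with `new` where it sits between the exact context lines.

    Returns the patched list of lines, or None when the context cannot be
    found, which stands in for a conflict in this toy model.
    """
    needle = before + old + after
    for start in range(len(target) - len(needle) + 1):
        if target[start:start + len(needle)] == needle:
            cut = start + len(before)
            return target[:cut] + new + target[cut + len(old):]
    return None


# The block as it looked in the commit that introduced the typo.
introducing_version = [
    "    <Directory /repo/hg/hg_templates/static/>",
    "        Options None",
    "        AllowOverride NoneTypo",
    "        Require all granted",
    "    </Directory>",
]

# The fixup, with context taken from the final version of the file.
before = ["    <Directory /repo/hg/htdocs/static/>",
          "        Options FollowSymLinks"]
old = ["        AllowOverride NoneTypo"]
new = ["        AllowOverride None"]
after = ["        Require all granted",
         "    </Directory>"]

# Context-sensitive application: the neighbors differ, so this gives up.
with_context = apply_hunk(introducing_version, before, old, new, after)

# Line-only application: the line itself is unchanged, so this succeeds.
line_only = apply_hunk(introducing_version, [], old, new, [])

print("with context:", with_context)
print("line only:", line_only)
```

The first call fails because the Directory and Options lines were rewritten by a later commit, even though the AllowOverride line itself was not touched in between.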

Another feature of hg absorb is that all the rewriting occurs in memory and the working directory is not touched when running the command. This means that the operation is fast (working directory updates often account for a lot of the execution time of hg histedit or git rebase commands). It also means that tools looking at the last modified time of files (e.g. build systems like GNU Make) won't rebuild extra (unrelated) files that were touched as part of updating the working directory to an old commit in order to apply changes. This makes hg absorb more friendly to edit-compile-test-commit loops and allows developers to be more productive.
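
The build system point is easy to reproduce outside of any VCS. The Python sketch below is only an illustration, with a made-up file name and contents: it simulates a tool writing an older version of a file to disk and then restoring the current one. The content ends up identical, but the modification time moves forward, which is all a timestamp-based tool like GNU Make looks at.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "unrelated.conf"  # hypothetical file not involved in the fixup
    path.write_text("current contents\n")

    # Pretend the file was last written an hour ago.
    hour_ago = path.stat().st_mtime - 3600
    os.utime(path, (hour_ago, hour_ago))
    mtime_before = path.stat().st_mtime

    # Simulate updating the working directory to an old commit and back.
    path.write_text("contents at the old commit\n")
    path.write_text("current contents\n")

    content_unchanged = path.read_text() == "current contents\n"
    mtime_bumped = path.stat().st_mtime > mtime_before

print("content unchanged:", content_unchanged)
print("mtime bumped:", mtime_bumped)
```

Because hg absorb rewrites commits in memory, this round trip through the working directory never happens.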

And that's hg absorb in a nutshell.

When I first saw a demo of hg absorb at a Mercurial developer meetup, my jaw - along with those of everyone else in the room - hit the figurative floor. I thought it was magical and too good to be true. I thought Facebook (the original authors of the feature) were trolling us with an impossible demo. But it was all real. And now hg absorb is available in the core Mercurial distribution for anyone to use.

From my experience, hg absorb just works almost all of the time: I run the command and it maps all of my uncommitted changes to the appropriate commit and there's nothing more for me to do! In a word, it is magical.

To use hg absorb, you'll need to activate the absorb extension. Simply put the following in your hgrc config file:

[extensions]
absorb =

hg absorb is currently an experimental feature. That means there is no commitment to backwards compatibility and some rough edges are expected. I also anticipate new features (such as hg absorb --interactive) will be added before the experimental label is removed. If you encounter problems or want to leave comments, file a bug, make noise in #mercurial on Freenode, or submit a patch. But don't let the experimental label scare you away from using it: hg absorb is being used by some large install bases and also by many of the Mercurial core developers. The experimental label is mainly there because it is a brand new feature in core Mercurial and the experimental label is usually affixed to new features.

If you practice workflows that frequently require amending old commits, I think you'll be shocked at how much easier hg absorb makes these workflows. I think you'll find it to be a game changer: once you use hg absorb, you'll soon wonder how you managed to get work done without it.