git-diff-modulo-base: handle empty old/new ranges
Skip commit metadata if the reduced diff is empty
Add git-diff-module-base tests
You're reviewing a large pull request on a project where folks care about a clean commit history. The author has just force-pushed a new version of the pull request and you want to see only what has changed. But, oh no! They rebased on a more recent version of the destination branch, and now the diff is full of all sorts of unrelated changes. And how can you check if rebase conflicts were resolved appropriately?
Diff modulo Base comes to the rescue!
Use cargo build
or cargo install
with a reasonably recent version of stable
Rust. Two binaries are installed:
diff-modulo-base
operates on plain diff files to make the core diffing
algorithm availablegit-diff-modulo-base
directly compares Git commits or branchesThis utility allows you to compare the relevant changes of two versions of a rebased branch given three input diffs: two base diffs that show the changes since the respective merge bases and a target diff between the branches you are actually interested in.
As such, it is very similar to (and actually builds on) git range-diff
. The
difference is that the resulting output is arguably much more readable.
Example: Let's say your Git history looks like this, where main
is the
main development branch and feature
is some feature branch that has just been
rebased:
o--o--A--o--o-- ... -- o -- B -- o -- o (origin/main)
\ \
\ o -- o -- o -- D (origin/feature)
\
o -- o -- o -- C (origin/feature@{1})
Running
git diff-modulo base origin/main origin/feature@{1} origin/feature
produces output similar to git range-diff
. However, instead of producing a
"diff of diffs", the output is the origin/feature@{1}..origin/feature
diff
with noise from unrelated changes in A..B
filtered out.
The output of the tool is meant to help in the reviewing changes made to pull requests. In particular, it is meant to help answer two questions:
The output is technically an interleaving of two diffs:
C
and D
in the example).A
and B
in the example),
prefixed with hash (#
) characters.There may be lines starting with <
or >
. Those are changes that are deemed
"unimportant" by the tool. A change in the target diff that is entirely caused
by changes in the base version is considered unimportant unless it is likely to
have been involved in textual conflicts during rebase. Similarly, a change in
the base version is considered unimportant if it is not near any changes in the
base diffs.
Make sure to look at the available command-line options to see some alternative modes of usage. For example, it is possible to compare two individual commits. This occasionally comes up after cherry-picking:
git diff-modulo-base ${original-commit-hash} ${cherry-picked-commit-hash}
If you're working with repositories hosted on GitHub, it may be convenient to
automatically fetch pull request branches by adding a line such as the
following to the relevant remote
section of your local repository's
.git/config
:
fetch = +refs/pull/*:refs/remotes/origin/pull/*
Very similar output can also be obtained by invoking the underlying (not
Git-aware) diff-modulo-base
with the relevant diffs:
diff-modulo-base <(git diff A..origin/feature@{1}) <(git diff B..origin/feature) <(git diff origin/feature@{1}..origin/feature)
This produces the same kind of diff output, albeit not split according to
individual commits on the feature branch. The same effect can be achieved using
the --combined
flag of git diff-modulo-base
.
diff-modulo-base
expects standard Git-style diffs as input and works
standalone entirely based on the input diffs. In particular, the tool does not
attempt to open any of the files mentioned in the diffs.
The diff-modulo-base
command-line utility is built on a reusable library crate
that implements the underlying algorithm, including the required diff parsing
and writing.
Diffs are treated as "don't care about extended characters ASCII". That is, diffs are required to use an encoding where all ASCII characters use their standard, single-byte ASCII encoding, and any non-ASCII characters are encoded using bytes that do not correspond to ASCII characters. Just use UTF-8 and you'll be fine.
The underlying algorithm is, at a high level:
Parse the base and target diffs to obtain base and target sequences of "chunks". Chunks are either file headers or diff "hunks".
For hunks in the target diff, refer to the base diffs to check whether the
lines were changed relative to the corresponding base versions. If they were
not, mark them as context changes (lines that will be output with a <
or
>
prefix) and drop hunks or files that are entirely unmodified or context
changes.
The exact details of this algorithm are subject to change.
Contributions are welcome! Please send patches by email to ~nhaehnle/diff-modulo-base@lists.sr.ht. See the guide at https://git-send-email.io/ if you're unfamiliar with the process.
Please integrate this feature into as many development tools as you can find. To that end, the project is released under the MIT license.