a structural diff that understands syntax 🟥🟩
 
 
 
 
 
Wilfred Hughes d7396073ee Expand readme and add screenshot 2021-06-22 00:30:58 +07:00
.github/workflows Don't bother with clippy on CI for now 2021-03-07 14:10:44 +07:00
config * is a legal symbol constituent in lisps 2021-06-22 00:12:35 +07:00
img Expand readme and add screenshot 2021-06-22 00:30:58 +07:00
sample_files Making the JS sample file more interesting 2019-11-18 17:59:04 +07:00
src Add a TODO for why elisp_after.el has poor list highlighting 2021-06-22 00:26:50 +07:00
.gitignore Initial proof of concept 2018-12-29 15:29:42 +07:00
Cargo.lock Embed syntax.toml in binary 2021-06-20 16:32:22 +07:00
Cargo.toml Embed syntax.toml in binary 2021-06-20 16:32:22 +07:00
README.md Expand readme and add screenshot 2021-06-22 00:30:58 +07:00
after.js Adding a modified list to a sample file 2019-01-16 09:42:16 +07:00
before.js Adding a modified list to a sample file 2019-01-16 09:42:16 +07:00
text_diff_notes.md Add notes on LCS weaknesses 2021-03-21 13:34:01 +07:00

README.md

It's Difftastic!

Difftastic is an experimental structured diff tool that compares files based on their syntax.

screenshot

It is very much unfinished. It works reasonably well on heavily parenthesised data (Lisps, JSON), sometimes works on other languages with sufficient parentheses (Rust, JS), and falls back to a line-oriented diff otherwise.

See config/syntax.toml for how languages are defined.

Other Diff Techniques

There are a bunch of other ways of diffing text files. I summarise them here, along with example invocations.

Myers' diff algorithm

This is the default diff algorithm in GNU diff and git diff. It finds the longest common subsequence (LCS) and is used on a line-by-line basis.

There's a great introduction here and the original paper is An O(ND) Difference Algorithm and Its Variations, Myers 1986.

# Modern diff supports colour, but see also
# https://www.colordiff.org/
$ diff --color=always -u sample_files/css_before.css sample_files/css_after.css

Note that the original Unix diff used the Hunt-McIlroy algorithm.
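To make the LCS idea concrete, here is a minimal sketch of a line-by-line diff. It uses a plain O(N*M) dynamic programming table rather than Myers' O(ND) search, so it only illustrates the problem being solved, not how GNU diff or git implement it. The names `Edit` and `lcs_diff` are made up for this example.

```rust
/// One step of an edit script over lines (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Clone)]
pub enum Edit<'a> {
    Keep(&'a str),
    Remove(&'a str),
    Add(&'a str),
}

/// Diff two slices of lines with a simple LCS table.
/// This is not Myers' algorithm, just the same LCS problem solved naively.
pub fn lcs_diff<'a>(before: &[&'a str], after: &[&'a str]) -> Vec<Edit<'a>> {
    let (n, m) = (before.len(), after.len());
    // table[i][j] = LCS length of before[i..] and after[j..].
    let mut table = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            table[i][j] = if before[i] == after[j] {
                table[i + 1][j + 1] + 1
            } else {
                table[i + 1][j].max(table[i][j + 1])
            };
        }
    }

    // Walk the table from the start of both files, emitting edits.
    let (mut i, mut j) = (0, 0);
    let mut edits = Vec::new();
    while i < n && j < m {
        if before[i] == after[j] {
            edits.push(Edit::Keep(before[i]));
            i += 1;
            j += 1;
        } else if table[i + 1][j] >= table[i][j + 1] {
            edits.push(Edit::Remove(before[i]));
            i += 1;
        } else {
            edits.push(Edit::Add(after[j]));
            j += 1;
        }
    }
    // Whatever is left on either side is a pure removal or addition.
    edits.extend(before[i..].iter().map(|l| Edit::Remove(*l)));
    edits.extend(after[j..].iter().map(|l| Edit::Add(*l)));
    edits
}
```

Calling `lcs_diff(&["a", "b", "c"], &["a", "c", "d"])` keeps `a` and `c`, removes `b` and adds `d`.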

Patience Diff

Myers' diff has a problem with sliders:

 if (!$smtp_server) {
+       $smtp_server = $repo->config('sendemail.smtpserver');
+}
+if (!$smtp_server) {
        foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                if (-x $_) {
                        $smtp_server = $_;

Instead of:

+if (!$smtp_server) {
+       $smtp_server = $repo->config('sendemail.smtpserver');
+}
 if (!$smtp_server) {
        foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                if (-x $_) {

Git has an --indent-heuristic option that was added to reduce the likelihood of making a bad choice. There's a corpus of test files where the ideal diff has been chosen by a human, to test different heuristics.
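The reason sliders exist at all is that an inserted hunk can move up or down whenever the line it slides over matches the line at the opposite end of the hunk. Here is a rough sketch that enumerates every equivalent position for a hunk. The function name `slider_positions` is hypothetical and this is not git's implementation. A heuristic like the indent one then has to pick one of these positions.

```rust
/// Return every position the inserted hunk `lines[start..end]` can slide to
/// without changing the resulting file, ordered from topmost to bottommost.
/// `lines` is the "after" file. Hypothetical helper for illustration only.
pub fn slider_positions(lines: &[&str], start: usize, end: usize) -> Vec<(usize, usize)> {
    assert!(start < end && end <= lines.len(), "hunk must be non-empty and in range");

    // Slide up while the line above the hunk equals the hunk's last line.
    let (mut s, mut e) = (start, end);
    while s > 0 && lines[s - 1] == lines[e - 1] {
        s -= 1;
        e -= 1;
    }

    // From the topmost position, slide down while the hunk's first line
    // equals the line just below the hunk.
    let mut positions = vec![(s, e)];
    while e < lines.len() && lines[s] == lines[e] {
        s += 1;
        e += 1;
        positions.push((s, e));
    }
    positions
}
```

For the Perl example above, the hunk can sit at lines 0..3 (the readable version) or 1..4 (what plain Myers picked), and both describe the same change.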

The patience diff algorithm is an LCS algorithm that aims to do a better job with sliders. It often produces more readable results, at the cost of doing more work.

# Original behaviour
$ git diff --no-indent-heuristic --no-index sample_files/css_before.css sample_files/css_after.css
# As of git 2.11, this heuristic is enabled by default.
$ git diff --indent-heuristic --no-index sample_files/css_before.css sample_files/css_after.css
# Patience algorithm does a better job in this example.
$ git diff --patience --no-index sample_files/css_before.css sample_files/css_after.css
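As I understand it, the core of patience diff is to match lines that occur exactly once in both files, keep the longest run of those matches that appears in the same order on both sides, and then diff the gaps between them. Here is a minimal sketch of just the anchor-finding step. `patience_anchors` is a made-up name, and real implementations also recurse between anchors and fall back to another algorithm when no unique lines exist.

```rust
use std::collections::HashMap;

/// Pairs (index in `before`, index in `after`) of lines that are unique in
/// both inputs and appear in the same relative order on both sides.
/// Hypothetical helper: only the anchor step of a patience-style diff.
pub fn patience_anchors(before: &[&str], after: &[&str]) -> Vec<(usize, usize)> {
    // For each line: (count in before, last index in before,
    //                 count in after,  last index in after).
    let mut seen: HashMap<&str, (usize, usize, usize, usize)> = HashMap::new();
    for (i, line) in before.iter().enumerate() {
        let entry = seen.entry(*line).or_insert((0, 0, 0, 0));
        entry.0 += 1;
        entry.1 = i;
    }
    for (j, line) in after.iter().enumerate() {
        let entry = seen.entry(*line).or_insert((0, 0, 0, 0));
        entry.2 += 1;
        entry.3 = j;
    }

    // Keep lines unique on both sides, ordered by position in `before`.
    let mut pairs: Vec<(usize, usize)> = seen
        .values()
        .filter(|e| e.0 == 1 && e.2 == 1)
        .map(|e| (e.1, e.3))
        .collect();
    pairs.sort();

    // Longest increasing subsequence of the `after` indices (patience sorting).
    // tails[k] is the index into `pairs` ending the best run of length k + 1.
    let mut tails: Vec<usize> = Vec::new();
    let mut prev: Vec<Option<usize>> = vec![None; pairs.len()];
    for (idx, &(_, j)) in pairs.iter().enumerate() {
        let pos = tails.partition_point(|&t| pairs[t].1 < j);
        if pos > 0 {
            prev[idx] = Some(tails[pos - 1]);
        }
        if pos == tails.len() {
            tails.push(idx);
        } else {
            tails[pos] = idx;
        }
    }

    // Follow the back pointers from the end of the longest run.
    let mut anchors = Vec::new();
    let mut cur = tails.last().copied();
    while let Some(idx) = cur {
        anchors.push(pairs[idx]);
        cur = prev[idx];
    }
    anchors.reverse();
    anchors
}
```

Repeated lines such as a lone `}` never become anchors, which is why this approach tends to avoid the bad slider positions shown earlier.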

The Diff Match Patch author's website also has some excellent discussions of diff design (e.g. diff strategies).

Histogram Diff

Git 1.7.7+ also has a histogram algorithm, which aims to produce better results than Myers' algorithm but without the slowdown of the patience algorithm.

# Inferior to patience on this example file.
$ git diff --histogram --no-index sample_files/css_before.css sample_files/css_after.css
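My rough understanding of the name: the algorithm counts how often each line occurs, then prefers to split the files around a shared line that occurs rarely. The sketch below shows only that selection step, under that assumption. `rarest_common_line` is a hypothetical name, and the real implementation also extends matches into regions, recurses on both halves, and has a fallback for very repetitive input.

```rust
use std::collections::HashMap;

/// Pick a split point: the line present in both inputs that occurs least
/// often in `before`. Returns (first index in `before`, index in `after`).
/// Hypothetical helper sketching only the selection step.
pub fn rarest_common_line(before: &[&str], after: &[&str]) -> Option<(usize, usize)> {
    // Histogram: line -> (occurrence count in `before`, first index in `before`).
    let mut histogram: HashMap<&str, (usize, usize)> = HashMap::new();
    for (i, line) in before.iter().enumerate() {
        histogram.entry(*line).or_insert((0, i)).0 += 1;
    }

    // Scan `after`, keeping the match whose line is rarest in `before`.
    // Ties keep the earliest position in `after`.
    let mut best: Option<(usize, usize, usize)> = None; // (count, i, j)
    for (j, line) in after.iter().enumerate() {
        if let Some(&(count, i)) = histogram.get(*line) {
            if best.map_or(true, |(best_count, _, _)| count < best_count) {
                best = Some((count, i, j));
            }
        }
    }
    best.map(|(_, i, j)| (i, j))
}
```

Unlike the patience approach, this still finds a split point when no line is strictly unique, it just prefers the least common one.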

Side-by-side Diff

$ diff -y --color=always sample_files/css_before.css sample_files/css_after.css

Tree Diff

Most tree diff implementations focus on XML, and there's a great overview of techniques in this blog post.

Jane Street's patdiff implements a tree diff, using an A* algorithm.

prettydiff

prettydiff does really well out of the box with the sample files here. It implements LCS on words.

wu-diff

wu-diff doesn't have much documentation, but it gives the same results as other LCS implementations in Rust.

JSON diff

json-diff provides a proper structural diff for JSON files.

graphtage

graphtage compares structured data by parsing into a generic file format, then displaying a diff. It finds the optimal edit sequence, and even allows things like diffing JSON against YAML.

Lisp diffs

sdiff and diff-sexp explore s-expression oriented diffs.
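To give a feel for why parenthesised languages are the easy case for a structural diff, here is a small sketch that parses an s-expression into a tree and reports which nodes differ. It is not how Difftastic, sdiff or diff-sexp work: the names `Sexp`, `parse` and `changed_paths` are invented for this example, there is no handling of strings or comments, and children are compared pairwise by position rather than aligned, so an inserted list element marks the whole parent as changed.

```rust
/// A minimal s-expression tree (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Clone)]
pub enum Sexp {
    Atom(String),
    List(Vec<Sexp>),
}

/// Parse exactly one expression; returns None on unbalanced parentheses.
pub fn parse(src: &str) -> Option<Sexp> {
    let spaced = src.replace('(', " ( ").replace(')', " ) ");
    let tokens: Vec<&str> = spaced.split_whitespace().collect();
    let mut pos = 0;
    let expr = parse_expr(&tokens, &mut pos)?;
    if pos == tokens.len() { Some(expr) } else { None }
}

fn parse_expr(tokens: &[&str], pos: &mut usize) -> Option<Sexp> {
    let token = *tokens.get(*pos)?;
    *pos += 1;
    match token {
        "(" => {
            let mut items = Vec::new();
            while *tokens.get(*pos)? != ")" {
                items.push(parse_expr(tokens, pos)?);
            }
            *pos += 1; // consume the closing ")"
            Some(Sexp::List(items))
        }
        ")" => None,
        atom => Some(Sexp::Atom(atom.to_string())),
    }
}

/// Paths (child indices from the root) to the smallest nodes that differ.
pub fn changed_paths(a: &Sexp, b: &Sexp) -> Vec<Vec<usize>> {
    let mut out = Vec::new();
    walk(a, b, &mut Vec::new(), &mut out);
    out
}

fn walk(a: &Sexp, b: &Sexp, path: &mut Vec<usize>, out: &mut Vec<Vec<usize>>) {
    match (a, b) {
        // Same-length lists: descend and compare children pairwise.
        (Sexp::List(xs), Sexp::List(ys)) if xs.len() == ys.len() => {
            for (i, (x, y)) in xs.iter().zip(ys).enumerate() {
                path.push(i);
                walk(x, y, path, out);
                path.pop();
            }
        }
        _ if a == b => {}
        // Anything else is reported as a single changed node.
        _ => out.push(path.clone()),
    }
}
```

Comparing `(defun f (x) (+ x 1))` with `(defun f (x) (+ x 2))` reports only the path `[3, 2]`, the changed number, instead of the whole line.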