a structural diff that understands syntax 🟥🟩
 
 
 
 
 
Wilfred Hughes 28d5e51911 Fix crash on multibyte characters
Previously parsing would proceed a byte at a time, which would crash if
the source contained multibyte characters. Instead, try all the
regular expression patterns, and jump to the nearest match ahead.
2021-07-18 22:34:52 +07:00
.github/workflows Don't bother with clippy on CI for now 2021-03-07 14:10:44 +07:00
config OCaml allows ; as punctuation too 2021-07-17 00:02:59 +07:00
img Line up visible lines and include gaps if necessary 2021-07-07 22:01:19 +07:00
sample_files Add JSON to sample files 2021-07-05 13:13:47 +07:00
src Fix crash on multibyte characters 2021-07-18 22:34:52 +07:00
.gitignore Ignore data from perf when profiling 2021-07-05 13:17:05 +07:00
CHANGELOG.md Fix crash on multibyte characters 2021-07-18 22:34:52 +07:00
Cargo.lock Remove unused dependency 2021-07-17 15:35:43 +07:00
Cargo.toml Remove unused dependency 2021-07-17 15:35:43 +07:00
LICENSE Add LICENSE file 2021-07-04 11:41:39 +07:00
README.md Support using difftastic with built-in git commands 2021-07-18 15:01:32 +07:00
text_diff_notes.md Add notes on LCS weaknesses 2021-03-21 13:34:01 +07:00

README.md

It's Difftastic!

Difftastic is an experimental structured diff tool that compares files based on their syntax.

screenshot

It is very much unfinished. It works reasonably well on heavily parenthesised data (Lisps, JSON), sometimes works on other languages with enough parentheses (Rust, JS), and otherwise falls back to a line-oriented diff.

How It Works

(1) Parsing.

Difftastic treats source code as a sequence of atoms and (possibly nested) lists.

Language syntax is defined in config/syntax.toml: you provide regular expressions for atoms (including comments), open delimiters, and close delimiters.

This is heavily inspired by Comby, which handles a large number of languages by using a similar approach.
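The atoms-and-lists model can be illustrated with a short Rust sketch. This is not difftastic's parser: `Node`, `SyntaxConfig` and `parse` are hypothetical names, and fixed single-character delimiters with whitespace-separated atoms stand in for the regular expressions in config/syntax.toml (so string literals and comments containing spaces are not handled). The sketch walks the input by `char` rather than by byte, so multibyte characters are never split.

```rust
// Hypothetical tree type: every item is either an atom or a delimited list.
#[derive(Debug, PartialEq)]
enum Node {
    Atom(String),
    List { open: char, children: Vec<Node>, close: char },
}

// Hypothetical, simplified syntax config: single-char delimiters only
// (difftastic's real config uses regular expressions).
struct SyntaxConfig {
    open: &'static [char],
    close: &'static [char],
}

fn parse(src: &str, cfg: &SyntaxConfig) -> Vec<Node> {
    // Stack of partially built lists; the bottom entry holds top-level nodes.
    let mut stack: Vec<(Option<char>, Vec<Node>)> = vec![(None, Vec::new())];
    let mut atom = String::new();

    // Iterating over chars (not bytes) keeps multibyte characters intact.
    for c in src.chars() {
        let is_delim = cfg.open.contains(&c) || cfg.close.contains(&c);
        // Whitespace and delimiters end the current atom.
        if (c.is_whitespace() || is_delim) && !atom.is_empty() {
            stack.last_mut().unwrap().1.push(Node::Atom(std::mem::take(&mut atom)));
        }
        if cfg.open.contains(&c) {
            stack.push((Some(c), Vec::new()));
        } else if cfg.close.contains(&c) {
            if stack.len() > 1 {
                let (open, children) = stack.pop().unwrap();
                stack.last_mut().unwrap().1.push(Node::List { open: open.unwrap(), children, close: c });
            } else {
                // An unbalanced close delimiter is kept as a plain atom.
                stack[0].1.push(Node::Atom(c.to_string()));
            }
        } else if !c.is_whitespace() {
            atom.push(c);
        }
    }
    if !atom.is_empty() {
        stack.last_mut().unwrap().1.push(Node::Atom(atom));
    }
    // Lists still open at end of input: keep their contents, treating the
    // opening delimiter as a plain atom.
    while stack.len() > 1 {
        let (open, children) = stack.pop().unwrap();
        let parent = &mut stack.last_mut().unwrap().1;
        parent.push(Node::Atom(open.unwrap().to_string()));
        parent.extend(children);
    }
    stack.pop().unwrap().1
}

fn main() {
    let cfg = SyntaxConfig { open: &['(', '[', '{'], close: &[')', ']', '}'] };
    // Multibyte characters such as 'é' and 'π' parse without panicking.
    let tree = parse("(define (f x) [é π])", &cfg);
    println!("{:?}", tree);
}
```

A stack of open lists is the usual way to match delimiters in a single pass; the unbalanced-input handling here is an arbitrary choice for the sketch.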

(2) Diffing.

Difftastic treats diff calculations as a graph search problem. It finds the minimal diff using Dijkstra's algorithm.

This is based on the excellent Autochrome project.
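To make the graph-search framing concrete, here is a minimal Dijkstra sketch over two flat token sequences. Assumptions: a vertex is a pair of positions (one per file), matching two identical tokens costs 0, and marking a token as novel (removed or added) costs 1. These costs and the names `diff` and `Edit` are illustrative, not difftastic's actual cost model, and a real structural diff must also handle entering and leaving nested lists, which this sketch ignores.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Edit {
    Unchanged,
    Removed,
    Added,
}

/// Hypothetical diff: returns the minimal cost and the edit script.
fn diff(lhs: &[&str], rhs: &[&str]) -> (u32, Vec<Edit>) {
    let start = (0usize, 0usize);
    let goal = (lhs.len(), rhs.len());
    let mut dist: HashMap<(usize, usize), u32> = HashMap::new();
    let mut prev: HashMap<(usize, usize), ((usize, usize), Edit)> = HashMap::new();
    // Min-heap of (distance, vertex) via Reverse.
    let mut heap = BinaryHeap::new();
    dist.insert(start, 0);
    heap.push(Reverse((0u32, start)));

    while let Some(Reverse((d, v))) = heap.pop() {
        if v == goal {
            break; // first time the goal is popped, its distance is final
        }
        if d > dist[&v] {
            continue; // stale heap entry
        }
        let (i, j) = v;
        let mut edges = Vec::new();
        if i < lhs.len() && j < rhs.len() && lhs[i] == rhs[j] {
            edges.push(((i + 1, j + 1), 0, Edit::Unchanged));
        }
        if i < lhs.len() {
            edges.push(((i + 1, j), 1, Edit::Removed));
        }
        if j < rhs.len() {
            edges.push(((i, j + 1), 1, Edit::Added));
        }
        for (next, cost, edit) in edges {
            let nd = d + cost;
            if dist.get(&next).map_or(true, |&old| nd < old) {
                dist.insert(next, nd);
                prev.insert(next, (v, edit));
                heap.push(Reverse((nd, next)));
            }
        }
    }

    // Walk back from the goal to recover the edit script.
    let mut edits = Vec::new();
    let mut v = goal;
    while v != start {
        let (p, e) = prev[&v];
        edits.push(e);
        v = p;
    }
    edits.reverse();
    (dist[&goal], edits)
}

fn main() {
    let lhs = ["(", "a", "b", ")"];
    let rhs = ["(", "a", "c", ")"];
    let (cost, edits) = diff(&lhs, &rhs);
    println!("cost = {}, edits = {:?}", cost, edits);
}
```

With these costs the cheapest path matches as many tokens as possible and marks the rest as novel; in the example, `b` is removed and `c` is added for a total cost of 2.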

(3) Printing.

Difftastic prints a side-by-side diff that fits the width of the current terminal. It tries to align unchanged nodes on both sides (see screenshot above).
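A minimal sketch of the layout step, assuming lines have already been paired up with `None` as a gap where one side has nothing to show: split the terminal width into two equal columns and pad or truncate each side to fit. `fit` and `side_by_side` are hypothetical names, the width is passed in rather than detected, and each `char` is assumed to take one terminal column (not true for wide CJK characters or many emoji).

```rust
/// Truncate or pad `s` to exactly `width` chars. Counting chars rather than
/// bytes means multibyte text is never cut mid-character.
fn fit(s: &str, width: usize) -> String {
    let truncated: String = s.chars().take(width).collect();
    let pad = width - truncated.chars().count();
    format!("{}{}", truncated, " ".repeat(pad))
}

/// Render pre-aligned line pairs as two columns. A `None` is a gap that
/// keeps unchanged lines on both sides level with each other.
fn side_by_side(rows: &[(Option<&str>, Option<&str>)], term_width: usize) -> Vec<String> {
    const SEP: &str = " | ";
    let col = term_width.saturating_sub(SEP.len()) / 2;
    rows.iter()
        .map(|&(l, r)| format!("{}{}{}", fit(l.unwrap_or(""), col), SEP, fit(r.unwrap_or(""), col)))
        .collect()
}

fn main() {
    let rows = [
        (Some("(define (f x)"), Some("(define (f x)")),
        (Some("  (+ x 1))"), Some("  (+ x 2))")),
        (None, Some("(define y 3)")),
    ];
    for line in side_by_side(&rows, 40) {
        println!("{}", line);
    }
}
```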

Known Problems

Crashes. The code is underdocumented, undertested, and unfinished.

Performance. Difftastic scales relatively poorly on files with a large number of changes, and can use a lot of memory. This might be solved by A* search.

Replacing top-level expressions. If you delete a function and write a completely different new one, difftastic will match up the few tokens the two functions happen to share, rather than showing one removed and the other added.

Comments. Small changes to a comment can show up as big diffs, since the whole comment is a single atom.

Non-goals

Patch files. If you want to create a patch that you can later apply, use diff. Difftastic ignores whitespace, so its output is lossy. (AST patching is also a hard problem.)
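To see why ignoring whitespace makes the output lossy, consider a simplified tokenizer (a hypothetical stand-in for the real parser) that drops whitespace and emits each delimiter as its own token. Two inputs with different layouts produce identical token streams, so nothing derived from the tokens alone can reconstruct the original bytes.

```rust
/// Hypothetical tokenizer: whitespace is discarded, delimiters are separate tokens.
fn tokens(src: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = String::new();
    for c in src.chars() {
        if c.is_whitespace() || "()[]{}".contains(c) {
            if !cur.is_empty() {
                out.push(std::mem::take(&mut cur));
            }
            if !c.is_whitespace() {
                out.push(c.to_string());
            }
        } else {
            cur.push(c);
        }
    }
    if !cur.is_empty() {
        out.push(cur);
    }
    out
}

fn main() {
    let a = "(foo bar)";
    let b = "(foo\n    bar )";
    // Different bytes, identical token streams: the layout is gone.
    assert_ne!(a, b);
    assert_eq!(tokens(a), tokens(b));
    println!("{:?}", tokens(a));
}
```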

Dogfooding

Once you've compiled difftastic and it's on $PATH, you can try dogfooding.

To see the changes to the current git repo in difftastic, you can add the following to your .gitconfig and run git difftool.

[diff]
        tool = difftastic

[difftool]
        prompt = false

[difftool "difftastic"]
        cmd = difftastic "$LOCAL" "$REMOTE"

Alternatively, to run difftastic as the default diff engine for a git invocation:

$ CLICOLOR_FORCE=1 GIT_EXTERNAL_DIFF=difftastic git diff
$ CLICOLOR_FORCE=1 GIT_EXTERNAL_DIFF=difftastic git log -p --ext-diff

Further Reading

The wiki includes a thorough overview of alternative diffing techniques and tools.