a structural diff that understands syntax 🟥🟩
 
 
 
 
 
Wilfred Hughes 8b382e4356 Improved OCaml handling
Fixes #17
Fixes #15
2021-07-13 22:52:13 +07:00

It's Difftastic!

Difftastic is an experimental structured diff tool that compares files based on their syntax.

screenshot

It is very much unfinished. It works reasonably well on heavily parenthesised data (Lisps, JSON), works some of the time on other languages with enough delimiters (Rust, JS), and falls back to a line-oriented diff otherwise.

How It Works

(1) Parsing.

Difftastic treats source code as a sequence of atoms or (possibly nested) lists.

Language syntax is defined in config/syntax.toml: you provide regular expressions for atoms (including comments), open delimiters, and close delimiters.

This is heavily inspired by Comby, which handles a large number of languages by using a similar approach.
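
As a rough illustration of the atom/list model, here is a minimal sketch in Rust. It is not difftastic's actual parser: the `Node` type, the `parse` function, and the hard-coded bracket pairs are assumptions made for this example, and atoms are simply split on whitespace instead of being matched by the regular expressions from config/syntax.toml.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// A parsed syntax node: an indivisible atom or a delimited, possibly nested, list.
/// (Hypothetical type for illustration only.)
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Atom(String),
    List {
        open: char,
        close: char,
        children: Vec<Node>,
    },
}

/// Return the matching close delimiter if `c` opens a list.
/// The delimiter set is hard-coded here; a real tool would read it from config.
fn closer_for(c: char) -> Option<char> {
    match c {
        '(' => Some(')'),
        '[' => Some(']'),
        '{' => Some('}'),
        _ => None,
    }
}

fn is_closer(c: char) -> bool {
    matches!(c, ')' | ']' | '}')
}

/// Parse nodes until `expected_close` has been consumed, or until the input ends.
fn parse_nodes(chars: &mut Peekable<Chars<'_>>, expected_close: Option<char>) -> Vec<Node> {
    let mut nodes = Vec::new();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            // Whitespace only separates nodes; it is not kept.
            chars.next();
        } else if let Some(close) = closer_for(c) {
            chars.next();
            let children = parse_nodes(chars, Some(close));
            nodes.push(Node::List { open: c, close, children });
        } else if is_closer(c) {
            chars.next();
            if Some(c) == expected_close {
                return nodes;
            }
            // Unbalanced closer: keep it as an atom so no input is dropped.
            nodes.push(Node::Atom(c.to_string()));
        } else {
            // Read an atom up to the next whitespace or delimiter.
            let mut atom = String::new();
            while let Some(&c) = chars.peek() {
                if c.is_whitespace() || closer_for(c).is_some() || is_closer(c) {
                    break;
                }
                atom.push(c);
                chars.next();
            }
            nodes.push(Node::Atom(atom));
        }
    }
    nodes
}

/// Parse source text into a sequence of top-level nodes.
fn parse(src: &str) -> Vec<Node> {
    parse_nodes(&mut src.chars().peekable(), None)
}
```

Calling `parse("(foo [1 2]) bar")` yields two top-level nodes: a list containing the atom `foo` and a nested list, followed by the atom `bar`.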

(2) Diffing.

Difftastic treats diff calculations as a graph search problem. It finds the minimal diff using Dijkstra's algorithm.

This is based on the excellent Autochrome project.
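
To make the graph search idea concrete, here is a simplified sketch in Rust. It diffs two flat token sequences, whereas difftastic works on nested nodes, and the unit edge costs, the `Change` type, and the `diff` function are assumptions chosen for this example. Each vertex `(i, j)` means "the first `i` left tokens and the first `j` right tokens have been consumed"; Dijkstra's algorithm then finds a cheapest route from `(0, 0)` to the end of both sequences.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

/// One step in the diff of two token sequences. (Hypothetical type.)
#[derive(Debug, Clone, PartialEq)]
enum Change {
    Unchanged(String),
    Removed(String),
    Added(String),
}

/// Find a lowest-cost edit path with Dijkstra's algorithm.
/// Matching equal tokens costs 0; removing or adding a token costs 1.
fn diff(lhs: &[&str], rhs: &[&str]) -> Vec<Change> {
    let goal = (lhs.len(), rhs.len());
    let mut dist: HashMap<(usize, usize), usize> = HashMap::new();
    let mut prev: HashMap<(usize, usize), (usize, usize)> = HashMap::new();
    // `Reverse` turns the max-heap into a min-heap ordered by cost.
    let mut heap = BinaryHeap::new();

    dist.insert((0, 0), 0);
    heap.push(Reverse((0usize, (0usize, 0usize))));

    while let Some(Reverse((cost, (i, j)))) = heap.pop() {
        if (i, j) == goal {
            break;
        }
        if cost > dist[&(i, j)] {
            continue; // Stale heap entry: a cheaper route was already found.
        }
        let mut edges = Vec::new();
        if i < lhs.len() && j < rhs.len() && lhs[i] == rhs[j] {
            edges.push(((i + 1, j + 1), 0)); // Tokens match: advance both sides.
        }
        if i < lhs.len() {
            edges.push(((i + 1, j), 1)); // Remove a token from the left side.
        }
        if j < rhs.len() {
            edges.push(((i, j + 1), 1)); // Add a token from the right side.
        }
        for (next, weight) in edges {
            let next_cost = cost + weight;
            if dist.get(&next).map_or(true, |&d| next_cost < d) {
                dist.insert(next, next_cost);
                prev.insert(next, (i, j));
                heap.push(Reverse((next_cost, next)));
            }
        }
    }

    // Walk back from the goal to recover the sequence of changes.
    let mut changes = Vec::new();
    let mut pos = goal;
    while pos != (0, 0) {
        let (pi, pj) = prev[&pos];
        let change = if pos == (pi + 1, pj + 1) {
            Change::Unchanged(lhs[pi].to_string())
        } else if pos == (pi + 1, pj) {
            Change::Removed(lhs[pi].to_string())
        } else {
            Change::Added(rhs[pj].to_string())
        };
        changes.push(change);
        pos = (pi, pj);
    }
    changes.reverse();
    changes
}
```

Every vertex visited stays in the hash maps, which hints at why memory use can grow quickly when the inputs differ a lot.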

(3) Printing.

Difftastic prints a side-by-side diff that fits the current terminal. It will try to align unchanged nodes (see screenshot above).
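
The layout step can be sketched as follows. This is an illustration only, not difftastic's printing code: the `side_by_side` function is hypothetical, it assumes the rows have already been aligned (with `None` marking a gap on one side), it takes the terminal width as a parameter instead of querying the terminal, and it truncates long lines, which is a simplification.

```rust
/// Format aligned line pairs as two columns that together fit in `width` characters.
/// `None` on one side leaves a gap so that unchanged lines stay level.
fn side_by_side(rows: &[(Option<&str>, Option<&str>)], width: usize) -> Vec<String> {
    let separator = " | ";
    // Each column gets half of whatever is left once the separator is accounted for.
    let col = width.saturating_sub(separator.len()) / 2;
    rows.iter()
        .map(|(left, right)| {
            // Truncate by characters so multi-byte text is never split mid-character.
            let l: String = left.unwrap_or("").chars().take(col).collect();
            let r: String = right.unwrap_or("").chars().take(col).collect();
            // Pad the left column to a fixed width so the separator lines up.
            format!("{:<col$}{}{}", l, separator, r, col = col)
        })
        .collect()
}
```

With a width of 13, each column is 5 characters wide, so a row of `(Some("bar"), None)` is rendered as `bar` padded to 5 characters, then the separator, then an empty right column.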

Known Problems

Crashes. The code is underdocumented, undertested, and unfinished.

Performance. Difftastic scales relatively poorly on files with a large number of changes, and can use a lot of memory. This might be solved by A* search.

Replacing top-level expressions. If you delete a function and write a completely different one in its place, difftastic will pair up the few tokens the two have in common instead of showing one function as removed and the other as added.

Comments. A small change inside a comment can show up as a big diff, since comments are parsed as atoms.

Non-goals

Patch files. If you want to create a patch that you can later apply, use diff. Difftastic ignores whitespace, so its output is lossy. (AST patching is also a hard problem.)

Testing with Git

To try difftastic on git changes, add the following to your .gitconfig, adjusting the path to wherever you built difftastic:

[diff]
        tool = difftastic

[difftool "difftastic"]
        cmd = ~/projects/difftastic/target/debug/difftastic "$LOCAL" "$REMOTE"

You can then run git difftool -y to see the current repo changes in difftastic.

Further Reading

The wiki includes a thorough overview of alternative diffing techniques and tools.