|
|
|
|
@ -1,9 +1,22 @@
|
|
|
|
|
# Tree Diffing
|
|
|
|
|
|
|
|
|
|
The following diff tools build a tree and compare it, rather than a
|
|
|
|
|
simple textual comparison.
|
|
|
|
|
This page summarises some of the other tree diffing tools available.
|
|
|
|
|
|
|
|
|
|
## json-diff (Pairwise Comparison)
|
|
|
|
|
If you're in a hurry, start by looking at Autochrome. It's extremely
|
|
|
|
|
capable, and has an excellent description of the design.
|
|
|
|
|
|
|
|
|
|
If you're interested in a summary of the academic literature, [this
|
|
|
|
|
blog
|
|
|
|
|
post](http://useless-factor.blogspot.com/2008/01/matching-diffing-and-merging-xml.html)
|
|
|
|
|
(and its [accompanying
|
|
|
|
|
paper](http://useless-factor.blogspot.com/2008/01/matching-diffing-and-merging-xml.html)
|
|
|
|
|
-- mirrored under a CC BY-NC license) are great resources.
|
|
|
|
|
|
|
|
|
|
## json-diff (2012)
|
|
|
|
|
|
|
|
|
|
Languages: JSON
|
|
|
|
|
Algorithm: Pairwise comparison
|
|
|
|
|
Output: CLI colours
|
|
|
|
|
|
|
|
|
|
[json-diff](https://github.com/andreyvit/json-diff) performs a
|
|
|
|
|
structural diff of JSON files. It considers subtrees to be different
|
|
|
|
|
@ -13,17 +26,30 @@ entirely different.
|
|
|
|
|
json-diff is also noteworthy for its extremely readable display of
|
|
|
|
|
results.
|
|
|
|
|
|
|
|
|
|
## Autochrome (Dijkstra's Algorithm)
|
|
|
|
|
## GumTree (2014)
|
|
|
|
|
|
|
|
|
|
[Autochrome](https://fazzone.github.io/autochrome.html) parses Clojure
|
|
|
|
|
with a custom parser that preserves comments. Autochrome uses
|
|
|
|
|
Dijkstra's algorithm to compare syntax trees.
|
|
|
|
|
Languages: [~10 programming
|
|
|
|
|
languages](https://github.com/GumTreeDiff/gumtree/wiki/Languages)
|
|
|
|
|
Parser: Several, including [srcML](https://www.srcml.org/)
|
|
|
|
|
Algorithm: Top-down, then bottom-up
|
|
|
|
|
Ouput: HTML, Swing GUI, or text
|
|
|
|
|
|
|
|
|
|
Autochrome's webpage includes worked examples of the algorithm and a
|
|
|
|
|
discussion of design tradeoffs. It's a really great resource for
|
|
|
|
|
understanding tree diffing techniques in general.
|
|
|
|
|
[GumTree](https://github.com/GumTreeDiff/gumtree) can parse several
|
|
|
|
|
programming languages and then performs a tree-based diff, outputting
|
|
|
|
|
an HTML display.
|
|
|
|
|
|
|
|
|
|
## Tree Diff (A* Search)
|
|
|
|
|
The GumTree algorithm is described in the associated paper
|
|
|
|
|
'Fine-grained and accurate source code differencing' by Falleri et al
|
|
|
|
|
([DOI](http://doi.acm.org/10.1145/2642937.2642982),
|
|
|
|
|
[PDF](https://hal.archives-ouvertes.fr/hal-01054552/document)). It
|
|
|
|
|
performs a greedy top-down search for identical subtrees, then
|
|
|
|
|
performs a bottom-up search to match up the rest.
|
|
|
|
|
|
|
|
|
|
## Tree Diff (2017)
|
|
|
|
|
|
|
|
|
|
Languages: S-expression data format
|
|
|
|
|
Algorithm: A* search
|
|
|
|
|
Output: Merged s-expression file
|
|
|
|
|
|
|
|
|
|
Tristan Hume wrote a tree diffing algorithm during his 2017 internship
|
|
|
|
|
and Jane Street. The source code is not available, but [he has a blog
|
|
|
|
|
@ -35,29 +61,60 @@ configuration by Jane Street. It uses A* search to find the minimal
|
|
|
|
|
diff between them, and builds a new s-expression with a section marked
|
|
|
|
|
with `:date-switch` for the differing parts.
|
|
|
|
|
|
|
|
|
|
## GumTree (Top-down, then Bottom-up)
|
|
|
|
|
(Jane Street also has patdiff, but that seems to be a line-oriented
|
|
|
|
|
diff with some whitespace/integer display polish. It doesn't
|
|
|
|
|
understand that e.g. whitespace in `"foo "` is meaningful).
|
|
|
|
|
|
|
|
|
|
[GumTree](https://github.com/GumTreeDiff/gumtree) can parse several
|
|
|
|
|
programming languages and then performs a tree-based diff, outputting
|
|
|
|
|
an HTML display.
|
|
|
|
|
## Autochrome (2017)
|
|
|
|
|
|
|
|
|
|
The GumTree algorithm is described in the associated paper
|
|
|
|
|
'Fine-grained and accurate source code differencing' by Falleri et al
|
|
|
|
|
([DOI](http://doi.acm.org/10.1145/2642937.2642982),
|
|
|
|
|
[PDF](https://hal.archives-ouvertes.fr/hal-01054552/document)). It
|
|
|
|
|
performs a greedy top-down search for identical subtrees, then
|
|
|
|
|
performs a bottom-up search to match up the rest.
|
|
|
|
|
Languages: Clojure
|
|
|
|
|
Parser: Custom, preserves comments
|
|
|
|
|
Algorithm: Dijkstra (previously A* search)
|
|
|
|
|
Output: HTML
|
|
|
|
|
|
|
|
|
|
(The academic literature has a lot of discussion of tree differencing,
|
|
|
|
|
[this blog
|
|
|
|
|
post](http://useless-factor.blogspot.com/2008/01/matching-diffing-and-merging-xml.html)
|
|
|
|
|
has a good overview of tree differencing focusing on XML.)
|
|
|
|
|
[Autochrome](https://fazzone.github.io/autochrome.html) parses Clojure
|
|
|
|
|
with a custom parser that preserves comments. Autochrome uses
|
|
|
|
|
Dijkstra's algorithm to compare syntax trees.
|
|
|
|
|
|
|
|
|
|
Autochrome's webpage includes worked examples of the algorithm and a
|
|
|
|
|
discussion of design tradeoffs. It's a really great resource for
|
|
|
|
|
understanding tree diffing techniques in general.
|
|
|
|
|
|
|
|
|
|
## graphtage (2020)
|
|
|
|
|
|
|
|
|
|
Languages: JSON, XML, HTML, YAML, plist, and CSS
|
|
|
|
|
Parser: json5, pyYAML, ignores comments
|
|
|
|
|
Algorithm: Levenshtein distance
|
|
|
|
|
Output: CLI colours
|
|
|
|
|
|
|
|
|
|
[graphtage](https://blog.trailofbits.com/2020/08/28/graphtage/)
|
|
|
|
|
compares structured data by parsing into a generic file format, then
|
|
|
|
|
displaying a diff. It even allows things like diffing JSON against
|
|
|
|
|
YAML.
|
|
|
|
|
|
|
|
|
|
As with json-diff, it does not consider `["foo"]` and `"foo"` to have
|
|
|
|
|
any similarities.
|
|
|
|
|
|
|
|
|
|
## Diffsitter (LCS on Leaves)
|
|
|
|
|
## Diffsitter (2020)
|
|
|
|
|
|
|
|
|
|
Parser: [Tree-sitter](https://tree-sitter.github.io/tree-sitter/)
|
|
|
|
|
Algorithm: Longest-common-subsequence
|
|
|
|
|
Output: CLI colours
|
|
|
|
|
|
|
|
|
|
[Diffsitter](https://github.com/afnanenayet/diffsitter) is another
|
|
|
|
|
tree-sitter based diff tool. It uses [LCS diffing on the leaves of the
|
|
|
|
|
syntax
|
|
|
|
|
tree](https://github.com/afnanenayet/diffsitter/blob/b0fd72612c6fcfdb8c061d3afa3bea2b0b754f33/src/ast.rs#L310-L313).
|
|
|
|
|
|
|
|
|
|
## sdiff (2021)
|
|
|
|
|
|
|
|
|
|
Languages: Scheme
|
|
|
|
|
Parser: Scheme's built-in `read`, ignores comments
|
|
|
|
|
Algorithm: MH-Diff from the Chawathe paper
|
|
|
|
|
Output: CLI colours
|
|
|
|
|
|
|
|
|
|
[Semantically meaningful S-expression diff: Tree-diff for lisp source
|
|
|
|
|
code](https://archive.fosdem.org/2021/schedule/event/sexpressiondiff/)
|
|
|
|
|
was presented at FOSDEM 2021.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|