Document the main tricky cases with tree diffs

pull/38/head 0.10
Wilfred Hughes 2021-09-24 23:32:34 +07:00
parent 3d12e56e0f
commit c9f85d806e
3 changed files with 225 additions and 0 deletions

@ -17,6 +17,10 @@ Removed legacy regex-based parsing backend.
Some additional runtime optimisations.
### Manual
Added a chapter on difficult cases for tree diff algorithms.
## 0.9
### Parsing

@ -5,6 +5,7 @@
- [Parsing](./parsing.md)
- [Upstream Parsers](./upstream_parsers.md)
- [Diffing](./diffing.md)
- [Tricky Cases](./tricky_cases.md)
- [Contributing](./contributing.md)
- [Alternative Projects](./alternative_projects.md)
- [Tree Diffing](./tree_diffing.md)

@ -0,0 +1,220 @@
# Tricky Cases
Tree diffing is challenging in some situations. This page demonstrates
difficult cases observed during development.
Not all of these cases work well in difftastic yet.
## Adding delimiters
```
;; Before
x
;; After
(x)
```
This is tricky because `x` has changed its depth in the tree, but `x`
itself is unchanged.
Not all tree diff algorithms handle this case. It is also challenging
to display this case clearly: we want to highlight the changed
delimiters, but not their content. This is challenging in larger
expressions.
## Changing Delimiters
```
;; Before
(x)
;; After
[x]
```
As with the wrapping case, we want to highlight the delimiters rather
than the `x`.
## Expanding Delimiters
```
;; Before
(x) y
;; After
(x y)
```
In this case, we want to highlight `y`. Highlighting the delimiters
could make `x` look changed.
## Contracting Delimiters
```
;; Before
(x y)
;; After
(x) y
```
This should be highlighted similar to the expanding delimiter case.
## Reordering Within A List
```
;; Before
(x y)
;; After
(y x)
```
We want to highlight the list contents and not the delimiters.
## Middle Insertions
```
// Before
foo(bar(123));
// After
foo(extra(bar(123)));
```
We want to consider both `foo` and `bar` to be unchanged. This case is
challenging for diffing algorithms that do a bottom-up then top-down
matching of trees.
## Sliders
Sliders are a common problem in text based diffs, where lines are
matched in a confusing way.
They typically look like this. The diff has to arbitrarily choose a
delimiter, and it chooses the wrong one.
```
+ }
+
+ function foo () {
}
```
git-diff has some heuristics to reduce the risk of this (e.g. the
"patience diff"), but it can still occur.
There's a similar problem in tree diffs.
```
;; Before
A B
C D
;; After
A B
A B
C D
```
Ideally we'd prefer marking contiguous nodes as novel, so we highlight
`A B` rather than `B\nA`. From the perspective of a
longest-common-subsequence algorithm, these two choices are
equivalent.
## Minimising Depth Changes
```
// Before
if true {
foo(123);
}
foo(456);
// After
foo(789);
```
Do we consider `foo(123)` or `foo(456)` to match with `foo(789)`?
Difftastic prefers `foo(456)` by preferring nodes at the same nesting depth.
## Replacements
```
// Before
function foo(x) { return x + 1; }
// After
function bar(y) { baz(y); }
```
In this example, we've deleted a function and written a completely
different one. A tree-based diff could match up the `function` and the
outer delimiters, resulting in a confusing display showing lots of
small changes.
As with sliders, the replacement problem can also occur in textual
line-based diffs. Line-diffs struggle if there are a small number of
common lines. The more precise, granular behaviour of tree diffs makes
this problem much more common though.
## Similar Comments
```
// Before
/* The quick brown fox. */
foobar();
// After
/* The slow brown fox. */
foobaz();
```
`foobar` and `foobaz` are completely different, and should never be
matched up. For comments, we'd rather match up comments that are
textually similar.
## Multiline Comments
```
// Before
/* Hello
* World. */
// After
if (x) {
/* Hello
* World. */
}
```
The inner content of these two comments is technically different. We
want to treat them as identical however.
## Reflowing Doc Comments
Block comments have prefixes that aren't meaningful.
```
// Before
/* The quick brown fox jumps
* over the lazy dog. */
// After
/* The quick brown fox immediately
* jumps over the lazy dog. */
```
The inner content has changed from `jumps * over` to `immediately *
jumps over`. However, the `*` is decorative and we don't care that
it's moved.
## Invalid Syntax
There's no guarantee that the input we're given is valid syntax. Even
if the code is valid, it might use syntax that isn't supported by the
parser.
Tree-sitter provided explicit error nodes, and difftastic treats them
as atoms so it can run the same tree diff algorithm regardless.