Expand related projects

pull/1/head
Wilfred Hughes 2018-12-30 01:36:48 +07:00
parent 1e23284e42
commit db04937663
1 changed files with 83 additions and 3 deletions

@ -2,7 +2,87 @@
A language-aware word-level diff.
## Related Projects
## Other Diff Techniques
* http://fazzone.github.io/autochrome.html
* https://github.com/janestreet/patdiff
There are a bunch of other ways of diffing text files. I summarise
them here, along with example invocations.
### Myers' diff algorithm
This is the default diff algorithm in GNU diff and git diff. It finds
the longest common subsequence (LCS) and is used on a line-by-line basis.
There's a
[great introduction
here](https://blog.jcoglan.com/2017/02/12/the-myers-diff-algorithm-part-1/)
and the original paper is [An O(ND) Difference Algorithm and Its
Variations, Myers
1986](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.6927).
```
# Modern diff supports colour, but see also
# https://www.colordiff.org/
$ diff --color=always -u css_before.css css_after.css
```
Note that GNU diff [originally used the Hunt-McIlroy
algorithm](http://fabiensanglard.net/git_code_review/diff.php)).
### Patience Diff
Myer's diff has a problem with sliders:
```diff
if (!$smtp_server) {
+ $smtp_server = $repo->config('sendemail.smtpserver');
+}
+if (!$smtp_server) {
foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
if (-x $_) {
$smtp_server = $_;
```
Instead of:
```diff
+if (!$smtp_server) {
+ $smtp_server = $repo->config('sendemail.smtpserver');
+}
if (!$smtp_server) {
foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
if (-x $_) {
```
Git has a `--indent-heuristic` that [was added to reduce the
likelihood of making a bad
choice](https://github.com/git/git/commit/433860f3d0beb0c6f205290bd16cda413148f098). There's
a [corpus of test files](https://github.com/mhagger/diff-slider-tools)
where the ideal diff has been chosen by a human, to test different
heuristics.
The [patience diff
algorithm](https://bramcohen.livejournal.com/73318.html) is an LCS
algorithm that aims to do a better job with sliders. It produces great
results by [doing more
work](https://stackoverflow.com/questions/40133534/is-gits-implementation-of-the-patience-diff-algorithm-correct/40159510#40159510).
```
# Original behaviour
$ git diff --no-indent-heuristic --no-index css_before.css css_after.css
# As of git 2.11, this heuristic is enabled by default.
$ git diff --indent-heuristic --no-index css_before.css css_after.css
# Patience algorithm does a better a job in this example.
$ git diff --patience --no-index css_before.css css_after.css
```
### Histogram Diff
Git 1.7.7+ also has a [histogram
algorithm](https://stackoverflow.com/a/32367597/509706), which aims to
produce better results than Myers' algorithm but without [the slowdown
of the patience algorithm](https://github.com/git/git/commit/85551232b56e763ecfcc7222e0858bac4e962c80).
```
# Inferior to patience on this example file.
$ git diff --histogram --no-index css_before.css css_after.css
```