If we skip some nodes inside a list whose delimiters are unchanged, we
need to mark the outer list as unchanged.
Split ChangeState::Unchanged into UnchangedNode and UnchangedDelimiter
to make this clearer, and add a test.
This is a small speedup (up to 6% reduction in instructions) and makes
the graph logic easier to reason about.
In principle this can change diffing results, but all of the sample
files are unaffected.
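
As a rough illustration of the split described above, here is a minimal sketch. Only the names UnchangedNode and UnchangedDelimiter come from this message; the Novel variant and the children_need_visiting helper are hypothetical and stand in for whatever the real code does.

```rust
/// Hypothetical sketch: only `UnchangedNode` and `UnchangedDelimiter`
/// are named in the commit message. `Novel` is an assumed placeholder.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ChangeState {
    /// The whole node, including any children, is unchanged.
    UnchangedNode,
    /// Only the delimiters of a list are unchanged; its children are
    /// classified separately.
    UnchangedDelimiter,
    /// Assumed catch-all for nodes that were not matched.
    Novel,
}

/// Hypothetical helper: decide whether the children of a node still
/// need to be inspected after the node itself has been classified.
fn children_need_visiting(state: ChangeState) -> bool {
    match state {
        ChangeState::UnchangedNode => false,
        ChangeState::UnchangedDelimiter | ChangeState::Novel => true,
    }
}
```

Keeping the two cases as separate variants means a match has to handle each one explicitly, which is the kind of clarity the split is aiming for.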
This logic was intended to solve the problem of a small number of
nodes being matched up in a very large expression. We can do this as
cleanup after diffing, which should be faster and more effective
(see #162).
After we've aligned lines based on diff results, we have intermediate
lines that we need to align somehow. Previously, we'd just take them
in order, aligning the first on the LHS with the first on the RHS and
so on.
If the intermediate lines start or end with a sequence of blank lines,
prefer aligning the blank lines. If we have both, arbitrarily choose
the ending blank lines.
This has produced better results in many of the sample files, although
in the case of slow_before.rs we've just changed from a leading blank
line alignment to a trailing blank line alignment.
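
The alignment preference above could be sketched roughly as follows. This is not the actual implementation: the function names, the use of string slices for lines, and the rule that both sides must end in a blank line are assumptions made for illustration.

```rust
/// Count the blank lines at the end of `lines`.
fn trailing_blanks(lines: &[&str]) -> usize {
    lines.iter().rev().take_while(|l| l.trim().is_empty()).count()
}

/// Pair up intermediate lines from both sides (hypothetical helper).
/// Each pair holds an index into `lhs` and an index into `rhs`;
/// `None` means that side has no line at this row.
fn align_intermediate(lhs: &[&str], rhs: &[&str]) -> Vec<(Option<usize>, Option<usize>)> {
    let longest = lhs.len().max(rhs.len());
    let mut pairs = Vec::with_capacity(longest);

    // Assumption: we only align from the end when both sides end
    // with at least one blank line.
    if trailing_blanks(lhs) > 0 && trailing_blanks(rhs) > 0 {
        // Align from the end, so the padding goes at the front of
        // the shorter side.
        let lhs_pad = longest - lhs.len();
        let rhs_pad = longest - rhs.len();
        for i in 0..longest {
            pairs.push((i.checked_sub(lhs_pad), i.checked_sub(rhs_pad)));
        }
    } else {
        // Align in order from the start. This also covers leading
        // blank lines, since they pair up with each other naturally.
        for i in 0..longest {
            let l = if i < lhs.len() { Some(i) } else { None };
            let r = if i < rhs.len() { Some(i) } else { None };
            pairs.push((l, r));
        }
    }
    pairs
}
```

Checking the trailing case first mirrors the arbitrary choice described above: when both leading and trailing blank lines are present, the trailing ones win.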
We should split lines based on their codepoint length, so all our
lengths are on codepoint boundaries. We can then safely index by byte position.
All the positions are measured in bytes, not code points. Tweak
function names to make this explicit.
Fixes #149
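
To illustrate counting in code points while storing positions in bytes, here is a minimal sketch. The function name and signature are hypothetical and not taken from the codebase.

```rust
/// Split `line` into pieces of at most `max_chars` code points and
/// return each piece as a (start, end) byte range. Every boundary
/// falls on a code point boundary, so slicing `line` with these
/// ranges cannot panic. (Hypothetical helper.)
fn split_by_codepoints(line: &str, max_chars: usize) -> Vec<(usize, usize)> {
    assert!(max_chars > 0, "max_chars must be positive");

    let mut ranges = Vec::new();
    let mut start = 0; // byte offset where the current piece begins
    let mut count = 0; // code points seen in the current piece

    // char_indices yields the byte offset of each code point.
    for (byte_idx, _) in line.char_indices() {
        if count == max_chars {
            ranges.push((start, byte_idx));
            start = byte_idx;
            count = 0;
        }
        count += 1;
    }

    // Emit the final piece. An empty line gives one empty range.
    if start < line.len() || ranges.is_empty() {
        ranges.push((start, line.len()));
    }
    ranges
}
```

With a line such as "héllo" and a limit of two code points, the first range covers three bytes because 'é' is two bytes in UTF-8. That is why the byte-based function names need to be explicit.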
Previously we were assuming that the first/last line pairs in a hunk
contained the earliest/latest lines on both sides. This isn't true
when there are no common items between the lines.
This fixes some display issues in load_before/after.js, and adds a
new integration test that is smaller and easier to eyeball.
Fixes #133
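
The hunk range fix can be sketched by scanning every pair instead of trusting the first and last. The pair representation and the function names below are assumptions for illustration only.

```rust
/// Smallest and largest value yielded by `lines`, if any.
fn min_max(lines: impl Iterator<Item = usize>) -> Option<(usize, usize)> {
    lines.fold(None, |acc, n| match acc {
        None => Some((n, n)),
        Some((lo, hi)) => Some((lo.min(n), hi.max(n))),
    })
}

/// Earliest and latest line number on each side of a hunk
/// (hypothetical helper). Each pair holds an optional LHS line and
/// an optional RHS line. Every pair is inspected, because the first
/// and last pairs are not guaranteed to hold the extremes.
fn hunk_ranges(
    pairs: &[(Option<usize>, Option<usize>)],
) -> (Option<(usize, usize)>, Option<(usize, usize)>) {
    let lhs = min_max(pairs.iter().filter_map(|(l, _)| *l));
    let rhs = min_max(pairs.iter().filter_map(|(_, r)| *r));
    (lhs, rhs)
}
```

For a hunk whose first pair has LHS line 5 and whose second pair has LHS line 3, this reports 3 as the earliest LHS line, where reading only the first pair would have reported 5.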
Previously we'd get tripped up by cases where choosing equal
delimiters would be considered the same as entering each delimiter
separately, making diffs worse.
Fixes #147
The diff crate has a great ergonomic API, but it doesn't implement
Myers' algorithm and performs badly on large inputs.
https://github.com/utkarshkukreti/diff.rs/issues/1
Now that we have a wrapper wu_diff that provides a similar API,
replace the remaining call sites to diff::slice(). These are
relatively cold, so this is a small performance improvement (1%
instruction reduction).
This further improves performance on large text files. On the sample
files in #153, this improves performance from 99B instructions to 29B
instructions on my machine.
For large files, tree-sitter syntax highlighting is much more
expensive than the parse itself. We spend most of the runtime
advancing the tree-sitter query cursor.
This doesn't affect runtime of normal usage, but it helps debugging
and makes flamegraphs more readable.
Spotted in #153
In #153 a user reported difftastic never terminated on a 140,000-line
file. This was due to the diff crate using a very large amount of time
and memory.
The diff crate does not use Myers' algorithm, which has a
divide-and-conquer approach using snakes:
https://blog.jcoglan.com/2017/03/22/myers-diff-in-linear-space-theory/
wu-diff does implement Myers' algorithm and performs much better on
these large files.
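
For readers unfamiliar with the snake idea, here is a minimal sketch of the basic greedy form of Myers' algorithm. It computes only the length of the shortest edit script (insertions plus deletions). It is not the linear-space divide-and-conquer variant described in the linked post, and it is not the code used by wu-diff; the function name is hypothetical.

```rust
/// Number of insertions and deletions needed to turn `a` into `b`,
/// using the greedy form of Myers' algorithm. (Hypothetical helper.)
fn myers_edit_distance<T: PartialEq>(a: &[T], b: &[T]) -> usize {
    let n = a.len() as isize;
    let m = b.len() as isize;
    let max = n + m;
    let offset = max; // shifts diagonal k into a non-negative index

    // v[k + offset] is the furthest x reached so far on diagonal
    // k = x - y.
    let mut v = vec![0isize; (2 * max + 2) as usize];

    for d in 0..=max {
        let mut k = -d;
        while k <= d {
            let idx = (k + offset) as usize;

            // Step down from diagonal k + 1 (an insertion) or right
            // from diagonal k - 1 (a deletion), whichever gets
            // further along.
            let mut x = if k == -d || (k != d && v[idx - 1] < v[idx + 1]) {
                v[idx + 1]
            } else {
                v[idx - 1] + 1
            };
            let mut y = x - k;

            // Follow the snake: consume matching items for free.
            while x < n && y < m && a[x as usize] == b[y as usize] {
                x += 1;
                y += 1;
            }
            v[idx] = x;

            if x >= n && y >= m {
                return d as usize;
            }
            k += 2;
        }
    }
    max as usize
}
```

The outer loop stops at the first d that reaches the end of both inputs, so the running time depends on the number of differences as well as the input sizes.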