This is a very tiny perf hit, but allows us to pass newly allocated
strings to new_atom(), which will be necessary for normalising
case-insensitive languages.
We rely on being able to split lines and rejoin them to obtain the
original string. `str::lines()` in the Rust stdlib does not have this
property.
This was causing crashes in word-diffing on textual diffing, where
code paths differed on the number of lines they thought a string had.
This was broken in 8b842387a1.
Fixes#688.
This isn't strictly necessary since difftastic is a binary-only
crate. However, it improves compiler warnings (see next commit) and
potentially helps future changes to make difftastic available as a
library.
This corresponds to:
$ cargo +nightly fmt -- --config group_imports=StdExternalCrate
Since this option is only available on nightly, I'm not adding a
rustfmt.toml to enforce this, just doing it as a one-off run.
This is important when comparing short string literals. This change
has improved several cases in sample_files/ but I've added a new
example that made the previous unwanted behaviour much more obvious.
This is a 4% reduction in instructions for typing_before.ml, but a
0.2% increase instructions for slow_before.rs. This seems like a win
overall, and it also keeps the codebase more consistent and simpler.
If a string is replaced with another, apply subword highlighting
similar to how we handle replaced comments.
Co-authored-by: Wilfred Hughes <me@wilfred.me.uk>
Inside text files, it seems to be better to be conservative and
consider abc123def as one word rather than three.
This is noticeable when looking at changes to the compare.expected
file, which contains hashes. 123c456 and 345c789 don't really have a
`c` in common, so subword highlighting is ugly.
The contiguous penalty was an attempt to fix the slider problem:
// Old
A B
C D
// New
A B
A B
C D
// Unwanted diff
A +B+
+A+ B
C D
However, it doesn't make sense for Dijkstra, which is stateless. The
best route from vertex X is independent of how we got to vertex X.
This worked by dumb luck: in some circumstances we terminate early
rather than fully executing Dijkstra's algorithm. This cost tweak
improved results on a few test files. However, the post-processing
slider logic is a proper, general solution. This was added much later.
There's no reason to keep the contiguous penalty now. It's confusing,
and makes adding new edge costs with consistent 'X costs more than Y'
behaviours more difficult.
Performance is essentially neutral: a small decrease in
typing_before.ml, a small increase in slow_before.rs.
This makes constructing hunks harder to reason about.
This change doesn't affect output, but helps when debugging, as it
makes multiline atoms much less common.
This substantially improves performance on text files where there are
few lines in common.
For example, 10,000 line files with no lines in common is more than 10x
faster (8.5 seconds to 0.49 seconds on my machine), and
sample_files/huge_cpp_before.cpp is nearly 2% faster.
Fixes the case mentioned by @quackenbush in #236.
This is inspired by the heuristics discussions at
https://github.com/mitsuhiko/similar/issues/15
Previously we highlighted changed whitespace, which led to ugly
results if the number of words changed (there was a different number
of whitespace characters so some were highlighted).
Also treat _ and - as word constituents, as it produces nicer results
when people write example CLI invocations in comments.