This is important when comparing short string literals. This change
has improved several cases in sample_files/ but I've added a new
example that made the previous unwanted behaviour much more obvious.
This saves us searching the hash map twice. This is a modest
performance improvement: a 4% reduction in instructions on
slow_before.rs, and a 1% reduction on typing_before.ml.
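As an illustration of the double-lookup pattern, here is a minimal
sketch using std's HashMap. The function names and the word-counting
task are hypothetical and are not taken from this codebase; they only
contrast a contains_key-then-update approach with the entry API, which
hashes and probes once per key.

```rust
use std::collections::HashMap;

/// Hypothetical example: counts words with two lookups per existing key
/// (`contains_key`, then `get_mut`).
fn count_two_lookups(words: &[&str]) -> HashMap<String, usize> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for w in words {
        if counts.contains_key(*w) {
            // Second search of the map for the same key.
            *counts.get_mut(*w).unwrap() += 1;
        } else {
            counts.insert(w.to_string(), 1);
        }
    }
    counts
}

/// Hypothetical example: the same counting with a single lookup per key,
/// using the entry API.
fn count_one_lookup(words: &[&str]) -> HashMap<String, usize> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for w in words {
        // `entry` finds the slot once; `or_insert` fills it if vacant.
        *counts.entry(w.to_string()).or_insert(0) += 1;
    }
    counts
}
```

Both functions return the same map. Note that the std entry API needs
an owned key up front, which is the allocation that hashbrown's
.entry_ref() is designed to avoid.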
This is a 4% reduction in instructions for typing_before.ml, but a
0.2% increase in instructions for slow_before.rs. This seems like a win
overall, and it also keeps the codebase simpler and more consistent.
This was intended to allow usage of .entry_ref(), but it's already a
performance win without using that API! It's around a 9% reduction in
instructions in slow_before.rs, and 2% reduction in typing_before.ml.
If a string is replaced with another, apply subword highlighting
similar to how we handle replaced comments.
Co-authored-by: Wilfred Hughes <me@wilfred.me.uk>
Inside text files, it seems to be better to be conservative and
consider abc123def as one word rather than three.
This is noticeable when looking at changes to the compare.expected
file, which contains hashes. 123c456 and 345c789 don't really have a
`c` in common, so subword highlighting is ugly.
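A minimal sketch of the conservative splitting described above,
assuming a word is any run of letters and digits and every other
character is its own token. split_words is a hypothetical helper
written for illustration, not the function used in this codebase.

```rust
/// Hypothetical helper: split text into tokens, keeping a run of letters
/// and digits together as one word. Every other character becomes a
/// single-character token.
fn split_words(s: &str) -> Vec<&str> {
    let mut words = Vec::new();
    // Byte offset where the current alphanumeric run started, if any.
    let mut start: Option<usize> = None;

    for (i, c) in s.char_indices() {
        if c.is_alphanumeric() {
            if start.is_none() {
                start = Some(i);
            }
        } else {
            // A non-alphanumeric character ends the current run.
            if let Some(st) = start.take() {
                words.push(&s[st..i]);
            }
            words.push(&s[i..i + c.len_utf8()]);
        }
    }
    // Flush a run that reaches the end of the input.
    if let Some(st) = start {
        words.push(&s[st..]);
    }
    words
}
```

With this rule, abc123def is a single token, and 123c456 and 345c789
are compared as whole words rather than sharing a `c` subword.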
The contiguous penalty was an attempt to fix the slider problem:
// Old
A B
C D
// New
A B
A B
C D
// Unwanted diff
A +B+
+A+ B
C D
However, it doesn't make sense for Dijkstra, which is stateless. The
best route from vertex X is independent of how we got to vertex X.
The penalty only worked by dumb luck: in some circumstances we
terminate early rather than fully executing Dijkstra's algorithm. This
cost tweak improved results on a few test files. However, the
post-processing slider logic, which was added much later, is a proper,
general solution.
There's no reason to keep the contiguous penalty now. It's confusing,
and makes adding new edge costs with consistent 'X costs more than Y'
behaviours more difficult.
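To illustrate why a path-dependent penalty does not fit, here is a
textbook Dijkstra sketch over a small adjacency list. It is not the
graph representation used in this codebase; the function name and the
edge layout are assumptions for illustration. The point is that each
edge cost is looked up from the edge alone, so the algorithm has no
place to record how a vertex was reached.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Illustrative Dijkstra: shortest distance from `start` to every vertex.
/// `edges[v]` lists `(neighbour, cost)` pairs. Unreachable vertices are `None`.
fn dijkstra(edges: &[Vec<(usize, u64)>], start: usize) -> Vec<Option<u64>> {
    let mut dist: Vec<Option<u64>> = vec![None; edges.len()];
    // Min-heap of (distance so far, vertex), via `Reverse`.
    let mut heap = BinaryHeap::new();

    dist[start] = Some(0);
    heap.push(Reverse((0u64, start)));

    while let Some(Reverse((d, v))) = heap.pop() {
        // Skip stale heap entries that were superseded by a shorter route.
        if dist[v].map_or(false, |best| d > best) {
            continue;
        }
        for &(next, cost) in &edges[v] {
            // The cost depends only on the edge v -> next, never on the
            // route that was taken to reach v.
            let candidate = d + cost;
            if dist[next].map_or(true, |best| candidate < best) {
                dist[next] = Some(candidate);
                heap.push(Reverse((candidate, next)));
            }
        }
    }
    dist
}
```

A cost that depends on the previous edge would need to be encoded in
the vertex itself (for example, a separate vertex per contiguous
state), which is the extra state this commit removes.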
Performance is essentially neutral: a small decrease in
typing_before.ml, a small increase in slow_before.rs.
This ensures that choosing an unchanged non-punctuation atom with some
novel atoms is better than choosing punctuation and some changed
comments. This produces better results in general; see
comma_and_comment_after.js for an example.
This will be more noticeable after the next commit, where costs of
novel atoms are in a smaller range of values.
This is harder to reason about, and 2e6666041f did not include a
motivating test case.
Removing contiguous status is a minor perf improvement (2% reduction
in instructions), makes the code simpler, and does not significantly
affect diffing results.
Of the two sample files that have changed, the erlang_before.erl file
has improved and nest_before.rs is neutral.
This is equivalent (increased cost on unchanged nodes vs decreased
cost on changed nodes), but easier to reason about.
Previously we had multiple notions of changed atoms: NovelAtomLHS,
NovelAtomRHS, and ReplacedComment. We want to consider punctuation as
less desirable even when e.g. comments are replaced.