Show the hunk count and detected language in a dimmed style. This
information is less important than the diff content itself, so this
change makes the important information more prominent.
First part of #544
Previously we didn't check the state of children, which was an
oversight from the original implementation. As a result, we fixed
nested sliders in fewer situations.
Fixes#535
This is important when comparing short string literals. This change
has improved several cases in sample_files/ but I've added a new
example that made the previous unwanted behaviour much more obvious.
If a string is replaced with another, apply subword highlighting
similar to how we handle replaced comments.
Co-authored-by: Wilfred Hughes <me@wilfred.me.uk>
Inside text files, it seems to be better to be conservative and
consider abc123def as one word rather than three.
This is noticeable when looking at changes to the compare.expected
file, which contains hashes. 123c456 and 345c789 don't really have a
`c` in common, so subword highlighting is ugly.
The contiguous penalty was an attempt to fix the slider problem:
// Old
A B
C D
// New
A B
A B
C D
// Unwanted diff
A +B+
+A+ B
C D
However, it doesn't make sense for Dijkstra, which is stateless. The
best route from vertex X is independent of how we got to vertex X.
This worked by dumb luck: in some circumstances we terminate early
rather than fully executing Dijkstra's algorithm. This cost tweak
improved results on a few test files. However, the post-processing
slider logic is a proper, general solution. This was added much later.
There's no reason to keep the contiguous penalty now. It's confusing,
and makes adding new edge costs with consistent 'X costs more than Y'
behaviours more difficult.
Performance is essentially neutral: a small decrease in
typing_before.ml, a small increase in slow_before.rs.
This is harder to reason about, and
2e6666041f did not include a motivating
test case.
Removing contiguous status is a minor perf improvement (2% reduction
in instructions), makes the code simpler, and does not significantly
affect diffing results.
Of the two sample files that have changed, the erlang_before.erl file
has improved and nest_before.rs is neutral.
It appears that the GitHub Actions runner is returning the glob path
results in a different ordering than the ordering obtained when locally
running the compare_all.sh script. This difference in the ordering
causes CI to fail due to differences to the generated expectation file.
This also seems to have been an issue in previous PRs---the solution
here is likely to sort the output before processing or figure out what
shell options cause the difference in glob ordering, and explicitly set
those in the shell script to eliminate the difference (or prevent the
script from inheriting anything but shell defaults).
For now, try reordering the output by hand to match the ordering the
GitHub Action runner likely expects.
This substantially improves performance on text files where there are
few lines in common.
For example, 10,000 line files with no lines in common is more than 10x
faster (8.5 seconds to 0.49 seconds on my machine), and
sample_files/huge_cpp_before.cpp is nearly 2% faster.
Fixes the case mentioned by @quackenbush in #236.
This is inspired by the heuristics discussions at
https://github.com/mitsuhiko/similar/issues/15
Currently it contains a nested string node, even though it's a fixed
set of known types. This was preventing us from applying good syntax
highlighting.
This was particularly noticeable with `string`, which wasn't
previously highlighted as a type.
This allows given nodes (configurable per-language, using tree-sitter's
query syntax) to be re-parsed as other languages. The canonical example
is CSS or JavaScript inside HTML, which normally would be a single token
but now can get the full range of syntax highlighting and tree diffing.
The config sets this up for only two languages: HTML (contains CSS or
JavaScript in <script> or <style> tags; we don't support style="" or
onclick="" etc. at this point), and Makefiles (contains Bash in
$(shell ...) commands). The latter is fairly obscure; the big win is
in the former.
It would be nice to also have this support for PHP; however, the HTML
parser seems to be a bit confused when asked to parse the partial HTML
blocks we get if we just mark the "text" blocks as HTML, so for this
to work well, probably the PHP blocks should be parsed as sub-languages
of HTML instead of vice versa.
Also, as a minor quibble, there should be support for bash in Perl's
backticks (similar to in Makefiles), but the tree-sitter Perl parser
does not support backticks at all (it goes into error recovery).
There may have been languages that I've missed, e.g. some languages
might have nodes that contain e.g. SQL.
Fixes#382. Potentially relevant to #376.
Previously we highlighted changed whitespace, which led to ugly
results if the number of words changed (there was a different number
of whitespace characters so some were highlighted).
Also treat _ and - as word constituents, as it produces nicer results
when people write example CLI invocations in comments.