This substantially improves performance on text files where there are
few lines in common.
For example, 10,000 line files with no lines in common is more than 10x
faster (8.5 seconds to 0.49 seconds on my machine), and
sample_files/huge_cpp_before.cpp is nearly 2% faster.
Fixes the case mentioned by @quackenbush in #236.
This is inspired by the heuristics discussions at
https://github.com/mitsuhiko/similar/issues/15
Currently it contains a nested string node, even though it's a fixed
set of known types. This was preventing us from applying good syntax
highlighting.
This was particularly noticeable with `string`, which wasn't
previously highlighted as a type.
This allows given nodes (configurable per-language, using tree-sitter's
query syntax) to be re-parsed as other languages. The canonical example
is CSS or JavaScript inside HTML, which normally would be a single token
but now can get the full range of syntax highlighting and tree diffing.
The config sets this up for only two languages: HTML (contains CSS or
JavaScript in <script> or <style> tags; we don't support style="" or
onclick="" etc. at this point), and Makefiles (contains Bash in
$(shell ...) commands). The latter is fairly obscure; the big win is
in the former.
It would be nice to also have this support for PHP; however, the HTML
parser seems to be a bit confused when asked to parse the partial HTML
blocks we get if we just mark the "text" blocks as HTML, so for this
to work well, probably the PHP blocks should be parsed as sub-languages
of HTML instead of vice versa.
Also, as a minor quibble, there should be support for bash in Perl's
backticks (similar to in Makefiles), but the tree-sitter Perl parser
does not support backticks at all (it goes into error recovery).
There may have been languages that I've missed, e.g. some languages
might have nodes that contain e.g. SQL.
Fixes#382. Potentially relevant to #376.
Previously we highlighted changed whitespace, which led to ugly
results if the number of words changed (there was a different number
of whitespace characters so some were highlighted).
Also treat _ and - as word constituents, as it produces nicer results
when people write example CLI invocations in comments.
@QuarticCat observed that popping delimiters is unnecessary, and saw a
speedup in PR #401. This reduces the number of nodes in typical graphs
by ~20%, reducing runtime and memory usage.
This works because there is only one thing we can do at the end of a
list: pop the delimiter. The syntax node on the other side does not
give us more options, we have at most one. Popping all the delimiters
as soon as possible is equivalent, and produces the same graph route.
This change has also slightly changed the output of
samples_files/slow_after.rs, producing a better (more minimal)
diff. This is probably luck, due to the path-dependent nature of the
route solving logic, but it's a positive sign.
A huge thanks to @QuarticCat for their contributions, this is a huge
speedup.
Co-authored-by: QuarticCat <QuarticCat@pm.me>
This helps skimming the results when multiple files are changed with
multiple hunks. It makes the file changing more prominent than just
going from e.g. 5/5 to 1/10.
Fixes#400
Acked-by: Wilfred Hughes <me@wilfred.me.uk>
QML is a UI language, and its syntax is basically JSON-like structure
+ JavaScript. The tree-sitter parser is named after the upstream grammar
file qmljs.g, but the canonical language name is QML. So I choose Qml as
the Language enum.
https://doc.qt.io/qt-6/qmlapplications.html
Previously we fixed sliders in each 'possibly changed' region. This
meant that we couldn't fix sliders that needed to move outside the
region. The most common case was code of the form `foo, bar, baz`
where `, baz` was unchanged but we wanted to slide to `,`.
We now call `fix_all_sliders` for the toplevel tree on both
sides. This required some minor changes to the slider logic, as the
unchanged/novel regions could occur at any level of the tree.
(It was probably also the case that we were missing slider
opportunities previously, because we terminated as soon as we found an
outer slider for the nested case.)
This change has no performance impact, probably because tree diffing
is vastly more expensive (O(N^2)) than sliders (O(N)).
Fixes#327
This produces substantially better diff results, and fixes the 'last
item in the list shown as changed' problem.
This can produce slower diffing. typing_before.ml takes 10% more
instructions and slow_before.rs takes 110% more instructions.
This is a more traditional graph representation. It is slightly easier
to reason about, and it's clearer that graph node creation time
dominates graphs exploration.
This is a slight performance regression, but it enables better
exploration of parethesis nesting (see next commit). typing_before.ml
has regressed from 3.75B instructions to 3.85B instructions and
slow_before.rs has regressed from 1.73B instructions to 2.15B
instructions.
This change has also made the diff output for slow_before.rs slightly
worse (note the `lhs` variable is now claimed as changed in more
cases). It's not clear why, but presumably means that the node visit
order has changed slightly.
Closes#324
This removes the need to special-case Perl, and is necessary for
CMake (which has nodes bracket_comment and line_comment that aren't
marked as 'extra').