If we have thousands of syntax nodes on both sides, we can end
up attempting to preallocate a very large hashmap.
In #542, a user hit an issue with two JSON files where the LHS had
33,000 syntax nodes and the RHS had 34,000 nodes, so we'd attempt to
preallocate a hashmap of capacity 1,122,000,000. This required
allocating 70,866,960,400 bytes (roughly 66 GiB).
Impose a sensible limit on the hashmap.
Fixes #542
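
A minimal sketch of the capping idea, assuming the capacity estimate is the product of the two node counts (which matches 33,000 x 34,000 = 1,122,000,000 above). The constant `MAX_PREALLOC`, the function names and the key/value types are hypothetical, not the names or the limit used in the actual change:

```rust
use std::collections::HashMap;

/// Hypothetical upper bound on preallocated entries. The real limit
/// chosen for the fix is not stated in this message.
const MAX_PREALLOC: usize = 1 << 20;

/// Estimate a capacity from the node counts on both sides, capped so
/// that huge inputs cannot trigger a multi-gigabyte preallocation.
/// saturating_mul avoids overflow on very large counts.
fn capped_capacity(lhs_nodes: usize, rhs_nodes: usize) -> usize {
    lhs_nodes.saturating_mul(rhs_nodes).min(MAX_PREALLOC)
}

/// Build a map with the capped capacity. The map still grows on
/// demand if more entries are needed; only the upfront cost is bounded.
fn new_seen_map(lhs_nodes: usize, rhs_nodes: usize) -> HashMap<(usize, usize), u64> {
    HashMap::with_capacity(capped_capacity(lhs_nodes, rhs_nodes))
}
```

With this shape, small inputs still get an exact preallocation, while the JSON files from #542 would be clamped to the cap.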
Show the hunk count and detected language in a dimmed style. This
information is less important than the diff content itself, so this
change makes the important information more prominent.
First part of #544
Difftastic is generally conservative about MSRV, and will only
increase the version when there is a compelling reason (e.g. major
performance improvement, important bug fix in a dependency).
This version increase will enable us to upgrade crossterm to 0.26, which
has better detection of terminal width on Windows.
I've also clarified MSRV details for other dependencies that cannot
currently be upgraded.
I've observed PDF files that have sufficiently large headers that they
were detected as text, which wasn't helpful.
Also improve logging to report how many invalid bytes were found.
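
As an illustration only, counting invalid bytes could look like the sketch below. It assumes "invalid" means bytes that are not valid UTF-8, and the threshold parameter `max_invalid` is hypothetical. The function names are made up for this example and are not difftastic's:

```rust
/// Count the bytes in `bytes` that cannot be decoded as UTF-8.
fn count_invalid_bytes(mut bytes: &[u8]) -> usize {
    let mut invalid = 0;
    loop {
        match std::str::from_utf8(bytes) {
            Ok(_) => return invalid,
            Err(e) => {
                // error_len() is None when the input ends in the middle
                // of a multi-byte sequence; count the whole tail then.
                let bad = e
                    .error_len()
                    .unwrap_or(bytes.len() - e.valid_up_to());
                invalid += bad;
                // Skip the valid prefix and the invalid bytes, then retry.
                bytes = &bytes[e.valid_up_to() + bad..];
            }
        }
    }
}

/// Treat content as binary when it has more than `max_invalid`
/// undecodable bytes (hypothetical threshold).
fn looks_binary(bytes: &[u8], max_invalid: usize) -> bool {
    count_invalid_bytes(bytes) > max_invalid
}
```

Returning the count rather than a bare boolean is what makes it possible to log how many invalid bytes were seen.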
Previously we didn't check the state of children, which was an
oversight from the original implementation. As a result, we fixed
nested sliders in fewer situations.
Fixes #535
The contiguous penalty was an attempt to fix the slider problem:
// Old
A B
C D
// New
A B
A B
C D
// Unwanted diff
A +B+
+A+ B
C D
However, it doesn't make sense for Dijkstra, which is stateless. The
best route from vertex X is independent of how we got to vertex X.
This worked by dumb luck: in some circumstances we terminate early
rather than fully executing Dijkstra's algorithm. This cost tweak
improved results on a few test files. However, the post-processing
slider logic is a proper, general solution. This was added much later.
There's no reason to keep the contiguous penalty now. It's confusing,
and makes adding new edge costs with consistent 'X costs more than Y'
behaviours more difficult.
Performance is essentially neutral: a small decrease in
typing_before.ml, a small increase in slow_before.rs.
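
To illustrate why a path-dependent penalty does not fit, here is a textbook Dijkstra sketch on a toy graph. It is not difftastic's graph code, and `Graph` and `shortest_cost` are names invented for this example. Each edge cost depends only on the current vertex and the neighbour, so there is nowhere to express "this edge costs more because of the edge we took before it":

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

/// Toy graph: vertex id -> list of (neighbour, edge cost).
type Graph = HashMap<u32, Vec<(u32, u64)>>;

/// Cost of the cheapest route from `start` to `goal`, if one exists.
fn shortest_cost(graph: &Graph, start: u32, goal: u32) -> Option<u64> {
    let mut best: HashMap<u32, u64> = HashMap::new();
    let mut heap = BinaryHeap::new();
    best.insert(start, 0);
    heap.push(Reverse((0u64, start)));

    while let Some(Reverse((cost, v))) = heap.pop() {
        if v == goal {
            return Some(cost);
        }
        // Skip stale heap entries that were superseded by a cheaper route.
        if cost > *best.get(&v).unwrap_or(&u64::MAX) {
            continue;
        }
        for &(next, edge_cost) in graph.get(&v).into_iter().flatten() {
            // The edge cost is a function of (v, next) only, never of
            // how we arrived at v.
            let next_cost = cost + edge_cost;
            if next_cost < *best.get(&next).unwrap_or(&u64::MAX) {
                best.insert(next, next_cost);
                heap.push(Reverse((next_cost, next)));
            }
        }
    }
    None
}
```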
Summary: Preparing to publish a new version of https://crates.io/crates/tree-sitter-erlang to bring in the recent changes for OTP 26 support. Also fix the ELP GitHub CI.
Reviewed By: michalmuskala, perehonchuk
Differential Revision: D46796302
fbshipit-source-id: 8320d63a5d8b3aa6829992864bf641fdea735ca5
Line numbers may be less than .max_line(), as .max_line() trims
whitespace. Ensure pad_after() is robust to this, and add a test.
I could only reproduce the crash in inline display mode, but in
principle this could be an issue in all modes.
Fixes #452
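
A sketch of padding that tolerates a mismatch between a line number and the width derived from .max_line(). The signature below is hypothetical and the real pad_after() may take different arguments. The point is only that `saturating_sub` yields zero padding instead of underflowing when the line number is wider than expected:

```rust
/// Number of decimal digits needed to print `n`.
fn digit_width(n: usize) -> usize {
    n.to_string().len()
}

/// Hypothetical stand-in for pad_after(): the spaces to append after
/// `line_num` so that it fills a column sized for `max_line`.
fn pad_after(line_num: usize, max_line: usize) -> String {
    let column_width = digit_width(max_line);
    let used = digit_width(line_num);
    // A plain `column_width - used` would panic in debug builds when
    // `used` is larger; saturating_sub clamps the result to zero.
    " ".repeat(column_width.saturating_sub(used))
}
```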
This substantially improves performance on text files where there are
few lines in common.
For example, diffing 10,000 line files with no lines in common is more than 10x
faster (8.5 seconds to 0.49 seconds on my machine), and
sample_files/huge_cpp_before.cpp is nearly 2% faster.
Fixes the case mentioned by @quackenbush in #236.
This is inspired by the heuristics discussions at
https://github.com/mitsuhiko/similar/issues/15
Currently it contains a nested string node, even though it's a fixed
set of known types. This was preventing us from applying good syntax
highlighting.
This was particularly noticeable with `string`, which wasn't
previously highlighted as a type.
This allows given nodes (configurable per-language, using tree-sitter's
query syntax) to be re-parsed as other languages. The canonical example
is CSS or JavaScript inside HTML, which normally would be a single token
but now can get the full range of syntax highlighting and tree diffing.
The config sets this up for only two languages: HTML (contains CSS or
JavaScript in <script> or <style> tags; we don't support style="" or
onclick="" etc. at this point), and Makefiles (contains Bash in
$(shell ...) commands). The latter is fairly obscure; the big win is
in the former.
It would be nice to also have this support for PHP; however, the HTML
parser seems to be a bit confused when asked to parse the partial HTML
blocks we get if we just mark the "text" blocks as HTML, so for this
to work well, probably the PHP blocks should be parsed as sub-languages
of HTML instead of vice versa.
Also, as a minor quibble, there should be support for bash in Perl's
backticks (similar to in Makefiles), but the tree-sitter Perl parser
does not support backticks at all (it goes into error recovery).
There may be languages that I've missed: some languages might have
nodes that contain SQL, for example.
Fixes #382. Potentially relevant to #376.
Previously we highlighted changed whitespace, which led to ugly
results if the number of words changed (there was a different number
of whitespace characters so some were highlighted).
Also treat _ and - as word constituents, as it produces nicer results
when people write example CLI invocations in comments.
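
A rough sketch of this kind of word splitting, under the assumption that whitespace is simply dropped from the token list so it can never be highlighted. `is_word_char` and `split_words` are illustrative names, not the functions in the codebase:

```rust
/// Characters that belong to a word: alphanumerics plus `_` and `-`.
fn is_word_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_' || c == '-'
}

/// Split `s` into words and single punctuation characters.
/// Whitespace separates tokens but is not itself returned.
fn split_words(s: &str) -> Vec<&str> {
    let mut words = Vec::new();
    let mut start: Option<usize> = None;

    for (i, c) in s.char_indices() {
        if is_word_char(c) {
            if start.is_none() {
                start = Some(i);
            }
        } else {
            // A non-word character ends the current word, if any.
            if let Some(st) = start.take() {
                words.push(&s[st..i]);
            }
            // Punctuation becomes its own token; whitespace is skipped.
            if !c.is_whitespace() {
                words.push(&s[i..i + c.len_utf8()]);
            }
        }
    }
    if let Some(st) = start {
        words.push(&s[st..]);
    }
    words
}
```

With these rules a flag such as `--skip-unchanged` in a comment stays a single token rather than being broken into three words and several dashes.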
@QuarticCat observed that popping delimiters is unnecessary, and saw a
speedup in PR #401. This reduces the number of nodes in typical graphs
by ~20%, reducing runtime and memory usage.
This works because there is only one thing we can do at the end of a
list: pop the delimiter. The syntax node on the other side does not
give us more options: we have at most one. Popping all the delimiters
as soon as possible is equivalent, and produces the same graph route.
This change has also slightly changed the output of
sample_files/slow_after.rs, producing a better (more minimal)
diff. This is probably luck, due to the path-dependent nature of the
route solving logic, but it's a positive sign.
A huge thanks to @QuarticCat for their contributions. This is a major
speedup.
Co-authored-by: QuarticCat <QuarticCat@pm.me>
QML is a UI language, and its syntax is basically JSON-like structure
+ JavaScript. The tree-sitter parser is named after the upstream grammar
file qmljs.g, but the canonical language name is QML, so I chose Qml as
the Language enum variant.
https://doc.qt.io/qt-6/qmlapplications.html
This was previously fixed in cb900c3463 (see commit message), but
broken in #341.
Instead, use both term_size and terminal_size, to maximise our chances
that we can detect the width. Also comment the code with the relevant
terminal_size issue.
Fixes #346
This produces substantially better diff results, and fixes the 'last
item in the list shown as changed' problem.
This can produce slower diffing. typing_before.ml takes 10% more
instructions and slow_before.rs takes 110% more instructions.