Ensure size_hint never exceeds graph_limit

If we have thousands of syntax nodes on both sides, we can end
up attempting to preallocate a very large hashmap.

In #542, a user hit an issue with two JSON files where the LHS had
33,000 syntax nodes and the RHS had 34,000 nodes, so we'd attempt to
preallocate a hashmap of capacity 1,122,000,000. This required
allocating 70,866,960,400 bytes (roughly 66 GiB).
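
The figures above can be checked with a little arithmetic. The sketch below is not part of the commit: it assumes a hashbrown-style table layout (bucket count rounded up to a power of two after applying a 7/8 load factor, one control byte per bucket plus a 16-byte trailing group) and a 32-byte entry. `estimated_table_bytes` is a hypothetical helper introduced only for illustration. Under those assumptions, the numbers match the allocation size quoted above.

```rust
/// Hypothetical helper: estimate the bytes needed for a hash table,
/// under an ASSUMED hashbrown-style layout (not stated in the commit).
fn estimated_table_bytes(capacity: u64, entry_size: u64) -> u64 {
    // Assumed: bucket count is the next power of two at or above capacity * 8 / 7.
    let buckets = (capacity * 8 / 7).next_power_of_two();
    // Assumed: one control byte per bucket, plus a 16-byte trailing group.
    buckets * entry_size + buckets + 16
}

fn main() {
    let lhs_node_count: u64 = 33_000;
    let rhs_node_count: u64 = 34_000;

    // The uncapped size hint is the product of the two node counts.
    let size_hint = lhs_node_count * rhs_node_count;
    println!("{}", size_hint); // prints 1122000000

    // Assumed 32-byte entries (an inference, not taken from the source).
    println!("{}", estimated_table_bytes(size_hint, 32)); // prints 70866960400
}
```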

Cap the size hint at graph_limit, since we never visit more than
graph_limit nodes anyway.

Fixes #542
try_similar_lib
Wilfred Hughes 2023-08-04 17:19:27 +07:00
parent c937f819a1
commit 892d4fdb58
2 changed files with 11 additions and 1 deletion

@@ -7,6 +7,11 @@ prominent.
 Improved syntax highlighting for Java built-in types.
+### Diffing
+Fixed an issue with runaway memory usage when the two input
+files had a large number of differences.
 ## 0.49 (released 26th July 2023)
 ### Parsing

@@ -205,7 +205,12 @@ pub fn mark_syntax<'a>(
     // graph whose size is roughly quadratic. Use this as a size hint,
     // so we don't spend too much time re-hashing and expanding the
     // predecessors hashmap.
-    let size_hint = lhs_node_count * rhs_node_count;
+    //
+    // Cap this number to the graph limit, so we don't try to allocate
+    // an absurdly large (i.e. greater than physical memory) hashmap
+    // when there is a large number of nodes. We'll never visit more
+    // than graph_limit nodes.
+    let size_hint = std::cmp::min(lhs_node_count * rhs_node_count, graph_limit);
     let start = Vertex::new(lhs_syntax, rhs_syntax);
     let vertex_arena = Bump::new();
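
The effect of the cap can be sketched in isolation. The following is a minimal standalone example, not the project's code: `capped_size_hint` is a hypothetical helper, the `graph_limit` value of 3,000,000 is assumed purely for illustration, and the map's key and value types are placeholders. It also uses `saturating_mul`, which the commit does not, to show one way of guarding the multiplication itself against overflow.

```rust
use std::collections::HashMap;

/// Hypothetical helper: product of the node counts, capped at `graph_limit`.
/// `saturating_mul` is an addition for this sketch; the commit multiplies directly.
fn capped_size_hint(lhs_node_count: usize, rhs_node_count: usize, graph_limit: usize) -> usize {
    lhs_node_count
        .saturating_mul(rhs_node_count)
        .min(graph_limit)
}

fn main() {
    // Assumed limit for illustration only.
    let graph_limit = 3_000_000;

    // With the node counts from the issue, the hint is clamped to the limit
    // instead of growing to 1,122,000,000.
    let size_hint = capped_size_hint(33_000, 34_000, graph_limit);
    println!("size_hint = {}", size_hint); // prints size_hint = 3000000

    // Placeholder key/value types; the real map stores graph vertices.
    let predecessors: HashMap<u64, u64> = HashMap::with_capacity(size_hint);
    assert!(predecessors.capacity() >= size_hint);
}
```

For small inputs the product is below the limit, so the hint is unchanged and preallocation behaves as before.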