51 Commits (7f5c11c0758c87cd0d4989fa2eed09fb06dc1c74)

Author SHA1 Message Date
Wilfred Hughes 7f5c11c075 cargo fmt 2024-04-28 16:40:00 +07:00
Wilfred Hughes 8655a9464e Fix unwanted duplicate node in existing vec
Broken in previous commit. This is now only a performance win of a few
percentage points, but it's still a net improvement.
2024-04-28 16:35:40 +07:00
Wilfred Hughes d15d593708 Move to smallvec for seen vertices
This is a surprisingly large perf win. On my Thinkpad:

typing_before/after.ml:
before: 3.038B instructions
after:  2.870B instructions

slow_before/after.rs:
before: 2.381B instructions
after:  1.260B instructions (!)
2024-04-28 16:16:47 +07:00
Steinar H. Gunderson 302570591f Make Stack be allocated on the arena.
This fixes another memory leak, and also removes the need for
refcounting the Stack objects and the Node objects they point to.
2024-04-28 15:46:23 +07:00
Steinar H. Gunderson 4fb1478817 Fix memory leak in neighbours array.
Vertex is allocated on the arena, so it is never dropped;
therefore it cannot contain a Vec allocated on the regular heap
without leaking memory. Replace the Vec with a slice allocated
on the arena, which seems to fix most of the leaks. (Some may
remain; I haven't checked fully.) It should also be slightly
more memory-efficient.

It's not clear that we actually need the RefCell instead of
just putting Option directly into the structure, but I've
let it stay.

This issue was probably introduced in a71d6118cf.
2024-04-28 15:46:23 +07:00
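
The leak described in 4fb1478817 can be illustrated with a minimal sketch using only the standard library. It assumes an arena that frees its memory in bulk without running destructors, as the commit message describes; `ManuallyDrop` stands in for that arena here, and `HeapOwned` and `destructor_runs` are hypothetical names, not difftastic code.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;

/// Stand-in for a heap-owning field such as a `Vec`: it records when its
/// destructor runs. Hypothetical type, for illustration only.
pub struct HeapOwned<'a> {
    dropped: &'a Cell<u32>,
}

impl Drop for HeapOwned<'_> {
    fn drop(&mut self) {
        self.dropped.set(self.dropped.get() + 1);
    }
}

/// Returns how many destructors ran for
/// (a normally dropped value, a value whose drop is skipped).
pub fn destructor_runs() -> (u32, u32) {
    let normal = Cell::new(0);
    let skipped = Cell::new(0);
    {
        let _a = HeapOwned { dropped: &normal };
        // Assumption: `ManuallyDrop` mimics an arena that never calls
        // `drop` on the values stored in it.
        let _b = ManuallyDrop::new(HeapOwned { dropped: &skipped });
    }
    (normal.get(), skipped.get())
}
```

If the skipped value owned a real `Vec`, its heap buffer would never be freed, which is why the commit moves the data into an arena-allocated slice instead.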
Wilfred Hughes 93ae0e91db Fix typos 2024-03-12 23:08:39 +07:00
Wilfred Hughes 3d29dc1228 Silence some clippy lints 2024-03-11 22:26:30 +07:00
Wilfred Hughes cac80e992a Avoid `res` locals in favour of more meaningful names 2023-11-28 13:27:27 +07:00
Wilfred Hughes 1dbcd08a90 cargo fmt 2023-11-19 13:10:41 +07:00
Wilfred Hughes f2b3b34bec Use pub(crate) everywhere for visibility
This isn't strictly necessary since difftastic is a binary-only
crate. However, it improves compiler warnings (see next commit) and
potentially helps future changes to make difftastic available as a
library.
2023-11-18 16:46:13 +07:00
Wilfred Hughes 27b14ae4c7 Clarify probably_punctuation 2023-11-11 11:14:49 +07:00
Wilfred Hughes 1e7866b64e Do word diffing on text too 2023-09-12 13:03:27 +07:00
Wilfred Hughes 243a4a5f48 Group imports consistently
This corresponds to:

$ cargo +nightly fmt -- --config group_imports=StdExternalCrate

Since this option is only available on nightly, I'm not adding a
rustfmt.toml to enforce this, just doing it as a one-off run.
2023-09-12 12:32:51 +07:00
Wilfred Hughes 8731a1b908 Fix rustdoc warnings 2023-09-12 12:21:43 +07:00
Wilfred Hughes 11f457b5f9 Fix typo 2023-08-16 21:20:17 +07:00
Wilfred Hughes c2b7042b80 Do subword highlighting in more cases
This is useful when two strings differ substantially but share a
common part, e.g. the same ending.
2023-07-10 21:26:24 +07:00
Wilfred Hughes 4aca79f220 Use the raw_entry_mut API on hashbrown::HashMap
This saves us searching the hash map twice. This is a modest
performance improvement: an instruction count reduction of 4% on
slow_before.rs, and 1% reduction on typing_before.ml.
2023-07-09 22:49:37 +07:00
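
The idea of searching the map only once can be sketched with the standard library. This is not the hashbrown `raw_entry_mut` API used in 4aca79f220; it uses `std::collections::HashMap::entry` as a stand-in for the same single-lookup pattern, and `intern_twice` and `intern_once` are hypothetical names.

```rust
use std::collections::HashMap;

/// Two lookups on a miss: one for `get`, another for `insert`.
pub fn intern_twice(map: &mut HashMap<String, usize>, key: &str) -> usize {
    if let Some(&id) = map.get(key) {
        return id;
    }
    let id = map.len();
    map.insert(key.to_owned(), id);
    id
}

/// One lookup: `entry` finds the slot once and either reads it or fills it.
pub fn intern_once(map: &mut HashMap<String, usize>, key: &str) -> usize {
    let next_id = map.len();
    *map.entry(key.to_owned()).or_insert(next_id)
}
```

Note that the standard `entry` API needs an owned key even when the key is already present, so this sketch trades a second lookup for an allocation.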
Wilfred Hughes d9911e0b49 Move DftHashMap to a separate file 2023-07-09 15:37:51 +07:00
Wilfred Hughes f2456a12b2 Use hashbrown for the alloc_if_new data
This was intended to allow usage of .entry_ref(), but it's already a
performance win without using that API! It's around a 9% reduction in
instructions in slow_before.rs, and 2% reduction in typing_before.ml.
2023-07-09 11:11:03 +07:00
Wilfred Hughes 2607d17d73 Fix spelling in comment 2023-07-08 17:16:14 +07:00
Zhenge Chen ffd49d523a Detect replaced strings
If a string is replaced with another, apply subword highlighting
similar to how we handle replaced comments.

Co-authored-by: Wilfred Hughes <me@wilfred.me.uk>
2023-07-08 17:16:06 +07:00
Wilfred Hughes f86ba13abf Increase punctuation cost to 200 2023-07-08 14:59:47 +07:00
Wilfred Hughes 495dbe5b14 Improve comments in Edge::cost 2023-07-08 14:53:33 +07:00
Wilfred Hughes 53855e415e Reduce copying further in set_neighbours
This saves a remarkable 8.5% of instructions on slow_before.rs.
2023-07-07 23:37:16 +07:00
Wilfred Hughes a180fd6d24 Don't return the neighbours inside get_set_neighbours
This caused unnecessary closing, costing 0.2% instructions in some
cases, and also made the code less readable.
2023-07-07 23:29:51 +07:00
Wilfred Hughes c07e640b24 Remove contiguous penalty
The contiguous penalty was an attempt to fix the slider problem:

// Old
A B
C D

// New
A B
A B
C D

// Unwanted diff
A +B+
+A+ B
C D

However, it doesn't make sense for Dijkstra, which is stateless. The
best route from vertex X is independent of how we got to vertex X.

This worked by dumb luck: in some circumstances we terminate early
rather than fully executing Dijkstra's algorithm. This cost tweak
improved results on a few test files. However, the post-processing
slider logic is a proper, general solution. This was added much later.

There's no reason to keep the contiguous penalty now. It's confusing,
and makes adding new edge costs with consistent 'X costs more than Y'
behaviours more difficult.

Performance is essentially neutral: a small decrease in
typing_before.ml, a small increase in slow_before.rs.
2023-07-06 08:37:02 +07:00
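
The point in c07e640b24 that the best route from vertex X is independent of how we got to X can be illustrated with a minimal Dijkstra sketch. The graph shape, vertex numbering and the name `shortest_costs` are hypothetical and not taken from difftastic.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Cheapest cost from `start` to every vertex of a small adjacency-list graph.
/// `edges[v]` lists `(neighbour, cost)` pairs. Hypothetical helper.
pub fn shortest_costs(edges: &[Vec<(usize, u32)>], start: usize) -> Vec<Option<u32>> {
    let mut best: Vec<Option<u32>> = vec![None; edges.len()];
    let mut heap = BinaryHeap::new();
    heap.push(Reverse((0u32, start)));

    while let Some(Reverse((cost, v))) = heap.pop() {
        // The first time a vertex is popped, its cost is final.
        if best[v].is_some() {
            continue;
        }
        best[v] = Some(cost);

        // The edge cost depends only on (v, next), never on how we reached v.
        for &(next, edge_cost) in &edges[v] {
            if best[next].is_none() {
                heap.push(Reverse((cost + edge_cost, next)));
            }
        }
    }
    best
}
```

A penalty that depends on the previous edge has no place in this loop unless the previous edge is made part of the vertex itself, which is why such a cost only had an effect when the search stopped early.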
Wilfred Hughes 31df177881 Increase the punctuation penalty
This ensures that choosing an unchanged non-punctuation atom with some
novel atoms is better than choosing punctuation and some changed
comments. This produces better results in general, see
comma_and_comment_after.js for an example.

This will be more noticeable after the next commit, where costs of
novel atoms are in a smaller range of values.
2023-07-06 08:16:24 +07:00
Wilfred Hughes c3016eca4a Add TODO 2023-07-06 08:14:03 +07:00
Wilfred Hughes 43c24047b4 Don't track contiguous status on novel delimiter edges
This is harder to reason about, and 2e6666041f did not include a
motivating test case.

Removing contiguous status is a minor perf improvement (2% reduction
in instructions), makes the code simpler, and does not significantly
affect diffing results.

Of the two sample files that have changed, the erlang_before.erl file
has improved and nest_before.rs is neutral.
2023-07-04 23:53:16 +07:00
Wilfred Hughes 1e4d1828c7 Store probably_punctuation on unchanged edges
This is equivalent (increased cost on unchanged nodes vs decreased
cost on changed nodes), but easier to reason about.

Previously we had multiple notions of changed atoms: NovelAtomLHS,
NovelAtomRHS, and ReplacedComment. We want to consider punctuation as
less desirable even when e.g. comments are replaced.
2023-07-03 19:48:31 +07:00
Wilfred Hughes c405b58327 Fix cost for ReplacedComment
This needs to be 2x novel nodes, or we prefer it far too often.
2023-07-02 23:12:31 +07:00
Wilfred Hughes 8d44e91a06 Improve lifetime names 2023-04-22 15:25:45 +07:00
Wilfred Hughes 29d87a6ac4 Adding TODO 2023-01-08 22:06:58 +07:00
Wilfred Hughes c310fb34f9 Use u32 for edge cost
This is performance neutral (both runtime and memory size) but the
code is slightly more readable as there are fewer conversions.
2023-01-08 21:34:49 +07:00
Wilfred Hughes 00ecf36a22 Pop delimiters immediately, rather than having ExitDelimiter* edges
@QuarticCat observed that popping delimiters is unnecessary, and saw a
speedup in PR #401. This reduces the number of nodes in typical graphs
by ~20%, reducing runtime and memory usage.

This works because there is only one thing we can do at the end of a
list: pop the delimiter. The syntax node on the other side does not
give us more options, we have at most one. Popping all the delimiters
as soon as possible is equivalent, and produces the same graph route.

This change has also slightly changed the output of
samples_files/slow_after.rs, producing a better (more minimal)
diff. This is probably luck, due to the path-dependent nature of the
route solving logic, but it's a positive sign.

A huge thanks to @QuarticCat for their contributions; this is a huge
speedup.

Co-authored-by: QuarticCat <QuarticCat@pm.me>
2022-12-28 02:00:09 +07:00
Wilfred Hughes 57d1f6d449 Reserve the vec inside allocate_if_new
Pushing to this vec was showing 2.5% of total compute time in profiles.
2022-12-28 00:30:25 +07:00
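
Why reserving helps can be sketched with the standard library alone. The helper name `count_capacity_changes` is hypothetical; it counts capacity changes while pushing, as a proxy for reallocations.

```rust
/// Push `n` items into `v` and count how many times its capacity changed,
/// which is a proxy for how many reallocations happened.
pub fn count_capacity_changes(mut v: Vec<u32>, n: u32) -> usize {
    let mut changes = 0;
    let mut cap = v.capacity();
    for i in 0..n {
        v.push(i);
        if v.capacity() != cap {
            changes += 1;
            cap = v.capacity();
        }
    }
    changes
}
```

Passing `Vec::with_capacity(1000)` and pushing 1000 items reports no capacity changes, while starting from `Vec::new()` reports at least one, since each growth step may copy the existing elements.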
Wilfred Hughes 923989d1a8 clippy fixes 2022-11-03 22:18:56 +07:00
QuarticCat cd5ba54752 Reduce number of branches of Vertex::eq 2022-10-06 22:33:47 +07:00
QuarticCat 887dec7645 Remove field can_pop_either from Vertex 2022-10-06 22:31:48 +07:00
QuarticCat 7a8044696e Simplify push_{lhs,rhs}_delimiter 2022-10-06 22:31:38 +07:00
QuarticCat 3b0edb43a1 Change a RefCell in Vertex to Cell 2022-09-28 05:56:53 +07:00
QuarticCat 2c6972c1b2 Fix more clippy warnings 2022-09-28 05:47:34 +07:00
QuarticCat d48ee2dfdb Use a faster stack impl 2022-09-28 04:08:42 +07:00
Wilfred Hughes c602503dec Treat . as punctuation
Closes #388
2022-09-21 21:39:07 +07:00
Wilfred Hughes fe5ef8757d Give novel punctuation a lower edge cost
We'd rather see an unchanged variable name than an unchanged comma.

Fixes #366
2022-09-09 09:47:53 +07:00
Wilfred Hughes c957818514 Explore two graph nodes for each parenthesis position
This produces substantially better diff results, and fixes the 'last
item in the list shown as changed' problem.

This can produce slower diffing. typing_before.ml takes 10% more
instructions and slow_before.rs takes 110% more instructions.
2022-08-21 16:34:17 +07:00
Wilfred Hughes a71d6118cf Store predecessors and neighbours as mutable fields in graph nodes
This is a more traditional graph representation. It is slightly easier
to reason about, and it's clearer that graph node creation time
dominates graph exploration.

This is a slight performance regression, but it enables better
exploration of parenthesis nesting (see next commit). typing_before.ml
has regressed from 3.75B instructions to 3.85B instructions and
slow_before.rs has regressed from 1.73B instructions to 2.15B
instructions.

This change has also made the diff output for slow_before.rs slightly
worse (note the `lhs` variable is now claimed as changed in more
cases). It's not clear why, but presumably means that the node visit
order has changed slightly.

Closes #324
2022-08-21 16:25:54 +07:00
Wilfred Hughes 51ddcef393 Make clippy happier 2022-07-03 11:20:44 +07:00
Wilfred Hughes d4285bed7c Move more files into diff/ 2022-05-25 09:31:12 +07:00
Wilfred Hughes c5fe152f25 Define a parse submodule 2022-05-25 09:28:12 +07:00