65 Commits (117d20c5274b6ed95f728a744ef6954738e7557d)

Author SHA1 Message Date
Wilfred Hughes 117d20c527 Add doc comment 2025-10-04 17:14:09 +07:00
Wilfred Hughes cabe203465 Improve doc comment 2025-07-30 09:40:25 +07:00
Wilfred Hughes ba45a40f71 Elide lifetimes in more places
Versions of clippy after the MSRV complain about these, and it's fine
on our current Rust version too.
2025-03-18 00:27:11 +07:00
Wilfred Hughes d8b715bd5b Rename myers_diff to LCS diff as it's not actually Myers algorithm 2025-03-09 23:55:08 +07:00
Wilfred Hughes ca9b7da43f Run cargo fmt 2025-03-06 23:03:40 +07:00
Wilfred Hughes 8953c55cf8 Pass String to new_atom
This is a very tiny perf hit, but allows us to pass newly allocated
strings to new_atom(), which will be necessary for normalising
case-insensitive languages.
2025-02-23 20:08:45 +07:00
Wilfred Hughes 649c557708 Fix some clippy lints 2024-12-19 21:29:31 +07:00
Wilfred Hughes 39e645832e Fix compilation on older Rust versions 2024-11-15 22:08:15 +07:00
Wilfred Hughes d5b1e26d70 Add a debug helper for syntax tree as DOT 2024-11-14 22:55:00 +07:00
Wilfred Hughes 819a672df8 Clarify content ID in debug output on Syntax 2024-11-15 00:03:30 +07:00
Wilfred Hughes 549cb483fe Fix crash due to trailing newlines in string nodes at EOF
Fixes #782
2024-11-15 00:03:30 +07:00
Andreas Deininger 5ecf3c1eb2 Bump GitHub action workflows to their latest versions 2024-09-11 21:22:59 +07:00
Wilfred Hughes 0973998de2 Clarify enum variant NovelLinePart and expand doc comments 2024-07-30 15:33:37 +07:00
Wilfred Hughes 92fa3fb3de Ensure files with no common content are aligned 2024-07-20 23:43:04 +07:00
Wilfred Hughes ffe27c575e Ensure line splitting distinguishes "foo" and "foo\n"
We rely on being able to split lines and rejoin them to obtain the
original string. `str::lines()` in the Rust stdlib does not have this
property.

This was causing crashes in word diffing during textual diffing, where
code paths disagreed on the number of lines they thought a string had.

This was broken in 8b842387a1.

Fixes #688.
2024-07-20 16:09:44 +07:00
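The round-trip property described in ffe27c575e can be illustrated with a minimal sketch. `split_on_newlines` and `std_line_count` are hypothetical helpers written for this illustration, not difftastic's actual functions; the only assumption is that splitting on `'\n'` is an acceptable stand-in for the real line splitter.

```rust
/// Hypothetical helper (not difftastic's actual implementation): split on
/// '\n' so that joining the pieces with "\n" reproduces the input exactly.
fn split_on_newlines(s: &str) -> Vec<&str> {
    s.split('\n').collect()
}

/// Count lines the way `str::lines()` does. A trailing newline is dropped,
/// so "foo" and "foo\n" both report one line.
fn std_line_count(s: &str) -> usize {
    s.lines().count()
}
```

With these helpers, `std_line_count("foo")` and `std_line_count("foo\n")` are both 1, whereas `split_on_newlines("foo\n")` yields `["foo", ""]`, and joining that with `"\n"` gives back `"foo\n"`.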
Wilfred Hughes 03d1f9bf26 Lint against .to_string() on String 2024-05-07 08:39:07 +07:00
Wilfred Hughes 5e38261b77 cargo fmt 2024-02-29 00:56:16 +07:00
Wilfred Hughes 7e8f928926 Add doc comments 2024-02-29 00:10:52 +07:00
Wilfred Hughes cac80e992a Avoid `res` locals in favour of more meaningful names 2023-11-28 13:27:27 +07:00
Wilfred Hughes 569f0038d1 Always filter blank lines at start and end in positions
Fixes #595
2023-11-28 12:35:28 +07:00
Wilfred Hughes d89d057345 Clarify parameter name 2023-11-28 11:57:11 +07:00
Wilfred Hughes e96c9463a0 Fix typo 2023-11-28 11:15:11 +07:00
Wilfred Hughes 1ec868e1df Update to latest line-numbers 2023-11-19 13:11:07 +07:00
Wilfred Hughes f2b3b34bec Use pub(crate) everywhere for visibility
This isn't strictly necessary since difftastic is a binary-only
crate. However, it improves compiler warnings (see next commit) and
potentially helps future changes to make difftastic available as a
library.
2023-11-18 16:46:13 +07:00
Wilfred Hughes 60d0f61cbd Define a separate words module 2023-11-18 16:46:13 +07:00
Wilfred Hughes 6dd0c70767 Add TODO 2023-09-12 13:05:05 +07:00
Wilfred Hughes 1e7866b64e Do word diffing on text too 2023-09-12 13:03:27 +07:00
Wilfred Hughes 243a4a5f48 Group imports consistently
This corresponds to:

$ cargo +nightly fmt -- --config group_imports=StdExternalCrate

Since this option is only available on nightly, I'm not adding a
rustfmt.toml to enforce this, just doing it as a one-off run.
2023-09-12 12:32:51 +07:00
Wilfred Hughes b78ba2da4b Use type names from line_numbers directly 2023-08-26 20:36:07 +07:00
Wilfred Hughes 41c9165c79 Use my line_numbers crate for newline position calculations 2023-08-26 16:25:32 +07:00
Wilfred Hughes f6ceb2aefd Update unit test for new subword highlighting heuristic 2023-07-12 12:48:45 +07:00
Wilfred Hughes a814e01d22 Improve word diffing heuristic and add another sample file 2023-07-12 12:12:32 +07:00
Wilfred Hughes 1d3b6836ef Handle multiline atoms more accurately in split_atom_words 2023-07-12 11:49:39 +07:00
Wilfred Hughes 5824322244 Require some common words to do subword highlighting
This is important when comparing short string literals. This change
has improved several cases in sample_files/ but I've added a new
example that made the previous unwanted behaviour much more obvious.
2023-07-10 09:03:21 +07:00
Wilfred Hughes 8eb949eb02 Use DftHashMap everywhere
This is a 4% reduction in instructions for typing_before.ml, but a
0.2% increase in instructions for slow_before.rs. This seems like a win
overall, and it also keeps the codebase more consistent and simpler.
2023-07-09 15:41:01 +07:00
Wilfred Hughes 27f59c0b3a Don't treat - as a word constituent
This produces slightly better results with some string replacements.
2023-07-08 17:16:14 +07:00
Zhenge Chen ffd49d523a Detect replaced strings
If a string is replaced with another, apply subword highlighting
similar to how we handle replaced comments.

Co-authored-by: Wilfred Hughes <me@wilfred.me.uk>
2023-07-08 17:16:06 +07:00
Wilfred Hughes 87d27c5598 Only split numbers inside comments
Inside text files, it seems to be better to be conservative and
consider abc123def as one word rather than three.

This is noticeable when looking at changes to the compare.expected
file, which contains hashes. 123c456 and 345c789 don't really have a
`c` in common, so subword highlighting is ugly.
2023-07-07 08:40:06 +07:00
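The distinction drawn in 87d27c5598 (split digit runs from letter runs in comments, but not in plain text) can be sketched as follows. This is an illustrative splitter under stated assumptions, not difftastic's actual heuristic: `split_words`, `Kind`, and the `split_numbers` flag are hypothetical names, and treating `_` as a word character is an assumption.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Kind {
    Word,
    Digit,
    Other,
}

/// Classify a character. When `split_numbers` is false, digits count as
/// ordinary word characters, so "abc123def" stays in one piece.
fn kind(c: char, split_numbers: bool) -> Kind {
    if c.is_alphabetic() || c == '_' {
        Kind::Word
    } else if c.is_numeric() {
        if split_numbers { Kind::Digit } else { Kind::Word }
    } else {
        Kind::Other
    }
}

/// Hypothetical splitter: consecutive characters of the same kind form one
/// word, except `Other` characters, which are always emitted one at a time.
fn split_words(s: &str, split_numbers: bool) -> Vec<&str> {
    let mut words = Vec::new();
    let mut start = 0;
    let mut prev: Option<Kind> = None;
    for (idx, c) in s.char_indices() {
        let k = kind(c, split_numbers);
        if let Some(p) = prev {
            if p != k || k == Kind::Other {
                words.push(&s[start..idx]);
                start = idx;
            }
        }
        prev = Some(k);
    }
    if start < s.len() {
        words.push(&s[start..]);
    }
    words
}
```

Under this sketch, `split_words("123c456", false)` returns the single word `"123c456"`, while `split_words("31T23", true)` returns `["31", "T", "23"]`, which matches the timestamp case mentioned in 3730580ca3.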
Wilfred Hughes c07e640b24 Remove contiguous penalty
The contiguous penalty was an attempt to fix the slider problem:

// Old
A B
C D

// New
A B
A B
C D

// Unwanted diff
A +B+
+A+ B
C D

However, it doesn't make sense for Dijkstra, which is stateless. The
best route from vertex X is independent of how we got to vertex X.

This worked by dumb luck: in some circumstances we terminate early
rather than fully executing Dijkstra's algorithm. This cost tweak
improved results on a few test files. However, the post-processing
slider logic is a proper, general solution. This was added much later.

There's no reason to keep the contiguous penalty now. It's confusing,
and makes adding new edge costs with consistent 'X costs more than Y'
behaviours more difficult.

Performance is essentially neutral: a small decrease in
typing_before.ml, a small increase in slow_before.rs.
2023-07-06 08:37:02 +07:00
Wilfred Hughes 3730580ca3 Improve word splitting heuristics
This is particularly noticeable when diffing comments with timestamps
2000-12-31T23:59:59 where we don't want 31T23 to be a single word.
2023-06-29 08:33:30 +07:00
Wilfred Hughes 87f19f5e10 Don't include trailing newlines in comment nodes
This makes constructing hunks harder to reason about.

This change doesn't affect output, but helps when debugging, as it
makes multiline atoms much less common.
2023-04-30 09:51:39 +07:00
Wilfred Hughes d521b29c9e set_prev_sibling should always recurse 2023-04-20 08:42:10 +07:00
Wilfred Hughes 8b842387a1 Don't clean trailing newline before diffing
Difftastic should take the user's input as-is, or it risks performing
an incorrect diff in both textual and syntactic diffing.

Fixes #499
2023-03-30 08:46:11 +07:00
Wilfred Hughes c9105ca0ba cargo fmt 2023-01-15 15:49:24 +07:00
Wilfred Hughes a488efd63b Add highlighting for ignored syntactic elements
This finishes --ignore-comment support.

Fixes #449.
2023-01-15 14:49:46 +07:00
Wilfred Hughes 0e3c57c64a Skip unique items before computing Myers diff on text
This substantially improves performance on text files where there are
few lines in common.

For example, diffing 10,000-line files with no lines in common is more
than 10x faster (8.5 seconds to 0.49 seconds on my machine), and
sample_files/huge_cpp_before.cpp is nearly 2% faster.

Fixes the case mentioned by @quackenbush in #236.

This is inspired by the heuristics discussions at
https://github.com/mitsuhiko/similar/issues/15
2023-01-15 11:38:02 +07:00
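The idea behind 0e3c57c64a can be sketched roughly as follows: a line that never occurs on the other side cannot be part of any common subsequence, so it can be set aside as novel before the expensive diff runs. `drop_unique` and `lcs_len` are hypothetical names for this illustration, and the textbook LCS table below merely stands in for the real diff algorithm; none of this is difftastic's actual code.

```rust
use std::collections::HashSet;

/// Keep only the items of `items` that also occur somewhere in `other`.
/// Anything dropped here is certainly novel and needs no further diffing.
fn drop_unique<'a>(items: &[&'a str], other: &[&str]) -> Vec<&'a str> {
    let seen: HashSet<&str> = other.iter().copied().collect();
    items
        .iter()
        .copied()
        .filter(|item| seen.contains(item))
        .collect()
}

/// Textbook O(n*m) longest-common-subsequence length, used here only to
/// show that filtering does not change the result.
fn lcs_len(lhs: &[&str], rhs: &[&str]) -> usize {
    let mut table = vec![vec![0usize; rhs.len() + 1]; lhs.len() + 1];
    for (i, l) in lhs.iter().enumerate() {
        for (j, r) in rhs.iter().enumerate() {
            table[i + 1][j + 1] = if l == r {
                table[i][j] + 1
            } else {
                table[i][j + 1].max(table[i + 1][j])
            };
        }
    }
    table[lhs.len()][rhs.len()]
}
```

For two inputs with no lines in common, both filtered lists are empty, so the quadratic step has nothing left to do. That is consistent with the large speedup reported above, although the real implementation may differ in its details.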
Wilfred Hughes 8a799af0ff cargo fmt 2023-01-06 18:18:37 +07:00
Wilfred Hughes d8d4b8c003 Add is_all_whitespace helper function 2023-01-06 08:36:54 +07:00
Wilfred Hughes 0fc1842595 Improve word highlighting heuristics in comments
Previously we highlighted changed whitespace, which led to ugly
results if the number of words changed (there was a different number
of whitespace characters so some were highlighted).

Also treat _ and - as word constituents, as it produces nicer results
when people write example CLI invocations in comments.
2023-01-02 16:56:31 +07:00
QuarticCat 2c6972c1b2 Fix more clippy warnings 2022-09-28 05:47:34 +07:00