Commit Graph

53 Commits (f1c69d3b92593414bb9a8b14183b6f06d6a32966)

Author SHA1 Message Date
Wilfred Hughes f1c69d3b92 WIP don't use StringIgnoringNewline due to #755 2024-09-29 21:46:01 +07:00
Wilfred Hughes 1ac95534fe Don't push empty positions when diffing lines 2024-07-30 16:16:34 +07:00
Wilfred Hughes 86612798ad Try ignoring trailing newlines in line-based differ 2024-07-30 16:09:40 +07:00
Wilfred Hughes 0973998de2 Clarify enum variant NovelLinePart and expand doc comments 2024-07-30 15:33:37 +07:00
Wilfred Hughes c2f4b1f2ee Update tests and changelog for 1e8be4558b 2024-07-21 11:15:54 +07:00
Wilfred Hughes 92fa3fb3de Ensure files with no common content are aligned 2024-07-20 23:43:04 +07:00
Wilfred Hughes 3be8e80fe7 Fix issue with later lines not having positions during diffing 2024-03-19 00:25:18 +07:00
Wilfred Hughes 53298e4240 Set a length limit on lines when doing a word diff
See #653
2024-02-29 00:54:55 +07:00
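The commit above caps line length before attempting a word-level diff (see #653 for the motivation). A minimal sketch of such a guard follows. The threshold `MAX_WORD_DIFF_LINE_LEN`, the `LineDiffStrategy` enum and `choose_strategy` are hypothetical names for illustration, and the limit of 1000 bytes is an assumption, not the value difftastic uses:

```rust
/// Hypothetical cap in bytes; the real limit is not stated in this log.
const MAX_WORD_DIFF_LINE_LEN: usize = 1000;

/// How a pair of changed lines should be presented (illustrative only).
#[derive(Debug, PartialEq)]
enum LineDiffStrategy {
    /// Diff the two lines word by word.
    WordLevel,
    /// Skip the word diff and mark each whole line as changed.
    WholeLine,
}

/// Fall back to whole-line changes when either line is too long, so a
/// single huge line cannot make the word-level diff arbitrarily expensive.
fn choose_strategy(lhs: &str, rhs: &str) -> LineDiffStrategy {
    if lhs.len() > MAX_WORD_DIFF_LINE_LEN || rhs.len() > MAX_WORD_DIFF_LINE_LEN {
        LineDiffStrategy::WholeLine
    } else {
        LineDiffStrategy::WordLevel
    }
}
```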
Wilfred Hughes cac80e992a Avoid `res` locals in favour of more meaningful names 2023-11-28 13:27:27 +07:00
Wilfred Hughes 1ec868e1df Update to latest line-numbers 2023-11-19 13:11:07 +07:00
Wilfred Hughes fe62cf4cf5 Don't ignore novel blank lines
Fixes #575
2023-11-18 17:27:41 +07:00
Wilfred Hughes f2b3b34bec Use pub(crate) everywhere for visibility
This isn't strictly necessary since difftastic is a binary-only
crate. However, it improves compiler warnings (see next commit) and
potentially helps future changes to make difftastic available as a
library.
2023-11-18 16:46:13 +07:00
Wilfred Hughes 60d0f61cbd Define a separate words module 2023-11-18 16:46:13 +07:00
Wilfred Hughes 243a4a5f48 Group imports consistently
This corresponds to:

$ cargo +nightly fmt -- --config group_imports=StdExternalCrate

Since this option is only available on nightly, I'm not adding a
rustfmt.toml to enforce this, just doing it as a one-off run.
2023-09-12 12:32:51 +07:00
Wilfred Hughes b78ba2da4b Use type names from line_numbers directly 2023-08-26 20:36:07 +07:00
Wilfred Hughes 41c9165c79 Use my line_numbers crate for newline position calculations 2023-08-26 16:25:32 +07:00
Wilfred Hughes f3b02f7b47 cargo fmt 2023-01-15 11:43:09 +07:00
Wilfred Hughes 0e3c57c64a Skip unique items before computing Myers' diff on text
This substantially improves performance on text files where there are
few lines in common.

For example, diffing two 10,000-line files with no lines in common is
more than 10x faster (8.5 seconds down to 0.49 seconds on my machine),
and sample_files/huge_cpp_before.cpp is nearly 2% faster.

Fixes the case mentioned by @quackenbush in #236.

This is inspired by the heuristics discussions at
https://github.com/mitsuhiko/similar/issues/15
2023-01-15 11:38:02 +07:00
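The heuristic in the commit above rests on one observation: an item that occurs on only one side cannot belong to any common subsequence, so it is novel no matter what the diff algorithm decides, and it can be dropped before running the comparatively expensive Myers' diff. A minimal sketch, assuming the caller diffs only the kept items and maps results back through the returned indices; `retain_common` is a hypothetical helper, not difftastic's actual function:

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Return the indices of items in `lhs` that also occur in `rhs`, and vice
/// versa. Items not listed can be reported as novel immediately; only the
/// kept items need to go through the full diff.
fn retain_common<T: Eq + Hash>(lhs: &[T], rhs: &[T]) -> (Vec<usize>, Vec<usize>) {
    let lhs_items: HashSet<&T> = lhs.iter().collect();
    let rhs_items: HashSet<&T> = rhs.iter().collect();

    // Keep an LHS item only if an equal item exists somewhere on the RHS.
    let lhs_kept = lhs
        .iter()
        .enumerate()
        .filter(|(_, item)| rhs_items.contains(item))
        .map(|(i, _)| i)
        .collect();
    // And symmetrically for the RHS.
    let rhs_kept = rhs
        .iter()
        .enumerate()
        .filter(|(_, item)| lhs_items.contains(item))
        .map(|(i, _)| i)
        .collect();

    (lhs_kept, rhs_kept)
}
```

When the two files share almost nothing, the kept sequences are tiny, which is why this helps most in exactly the "no lines in common" case described above.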
Wilfred Hughes c08eefb14a Move slice_by_hash to myers_diff and add unit tests 2023-01-15 11:03:31 +07:00
Wilfred Hughes c5fe152f25 Define a parse submodule 2022-05-25 09:28:12 +07:00
Wilfred Hughes 373d7d9d81 Define a diff submodule 2022-05-24 09:33:47 +07:00
Wilfred Hughes 5703f75568 Don't assume that lines end with newlines
Previously we would crash if the last line in a file had no trailing
newline and ended with a multibyte character.

Closes #217
2022-03-30 22:42:17 +07:00
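The commit above does not say exactly how the crash arose. One common way to hit it in Rust is to drop a line's terminator by slicing off the last byte unconditionally: on a final line with no newline that ends in a multibyte character, the slice boundary falls inside the character and the slice panics. A minimal sketch of a safer approach, assuming lines are split on `'\n'` only; `line_contents` is a hypothetical helper:

```rust
/// Return each line's content without its trailing '\n', without assuming
/// that the final line has one.
///
/// A naive `&line[..line.len() - 1]` would remove the last *byte* even when
/// it is not a newline, and panics if that byte is part of a multibyte
/// character such as 'é'. `strip_suffix` only removes an actual '\n'.
fn line_contents(src: &str) -> Vec<&str> {
    src.split_inclusive('\n')
        .map(|line| line.strip_suffix('\n').unwrap_or(line))
        .collect()
}
```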
Wilfred Hughes 59ee169ddc Remove leftover debug logging 2022-03-24 21:27:27 +07:00
Wilfred Hughes 503c8b26ec Fix tests broken in 9e32e2e08
The changes in context.rs were intentional, the changes in
line_parser.rs were a result of bad stash merging. Revert the
line_parser.rs changes.
2022-03-21 23:55:32 +07:00
Wilfred Hughes 9e32e2e08e Ensure matched lines includes blanks at the ends of the file
Fixes #163
2022-03-20 22:31:32 +07:00
Wilfred Hughes 4563648a9f Clarify ChangeKind helper method name 2022-03-20 15:17:32 +07:00
Wilfred Hughes f4f12003cb Fix minor clippy lints 2022-03-16 22:42:31 +07:00
Wilfred Hughes 6210921104 Use Myers' diff for word-level diffing too
This further improves performance on large text files. On the sample
files in #153, this improves performance from 99B instructions to 29B
instructions on my machine.
2022-03-12 12:19:57 +07:00
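For reference, the core of Myers' algorithm is a greedy search over diagonals `k = x - y`: for each edit cost `d`, it records the furthest point reachable on every diagonal, then follows "snakes" of equal items for free. The sketch below computes only the length of the shortest edit script (insertions plus deletions) and omits the trace-back that recovers the actual edits. It is a standalone illustration of the technique, not the API of wu-diff or of difftastic's myers_diff module:

```rust
/// Length of the shortest edit script turning `a` into `b`, using the
/// forward greedy pass of Myers' O(ND) algorithm.
fn myers_edit_distance<T: PartialEq>(a: &[T], b: &[T]) -> usize {
    let n = a.len() as isize;
    let m = b.len() as isize;
    let max = (n + m) as usize;
    // Diagonals range over -max..=max; shift by `offset` so that
    // k - 1 and k + 1 are always valid indices into `v`.
    let offset = max as isize + 1;
    // v[k + offset] = furthest x reached so far on diagonal k.
    let mut v = vec![0isize; 2 * max + 3];

    for d in 0..=max as isize {
        let mut k = -d;
        while k <= d {
            let idx = (k + offset) as usize;
            // Either step down from diagonal k + 1 (an insertion) or
            // step right from diagonal k - 1 (a deletion), whichever
            // reaches further.
            let mut x = if k == -d || (k != d && v[idx - 1] < v[idx + 1]) {
                v[idx + 1]
            } else {
                v[idx - 1] + 1
            };
            let mut y = x - k;
            // Follow the snake: matching items cost nothing.
            while x < n && y < m && a[x as usize] == b[y as usize] {
                x += 1;
                y += 1;
            }
            v[idx] = x;
            if x >= n && y >= m {
                return d as usize;
            }
            k += 2;
        }
    }
    max
}
```

For example, the classic pair `ABCABBA` and `CBABAC` has a longest common subsequence of length 4, so the shortest edit script has 7 + 6 - 2 * 4 = 5 edits.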
Wilfred Hughes 5d8af55231 Prefer myers_diff types 2022-03-12 12:17:26 +07:00
Wilfred Hughes edee567e61 Factor out a myers_diff module 2022-03-12 12:15:59 +07:00
Wilfred Hughes afb1b369f4 Switch to wu-diff for textual diffing
In #153 a user reported difftastic never terminated on a 140,000
file. This was due to the diff crate using a very large amount of time
and memory.

The diff crate does not use Myers' algorithm, which has a
divide-and-conquer approach using snakes:

https://blog.jcoglan.com/2017/03/22/myers-diff-in-linear-space-theory/

wu-diff does implement Myers' algorithm and performs much better on
these large files.
2022-03-10 23:12:25 +07:00
Wilfred Hughes 88a0c10c9d Adding TODO 2022-02-09 00:07:54 +07:00
Wilfred Hughes 9c71d95755 Don't allocate strings in split_words()
It's faster, especially in large textual diffs (3% on my test file).
2022-02-09 00:04:53 +07:00
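The allocation saving in the commit above comes from returning borrowed `&str` slices of the input instead of building a new `String` for every word. A minimal sketch, assuming (hypothetically) that a word is a run of alphanumeric characters and every other character, including whitespace, is its own one-character token; difftastic's real splitting rules may differ:

```rust
/// Split `s` into words without allocating a String per word: each word is
/// a slice borrowed from `s`. Runs of alphanumeric characters form a single
/// word; any other character becomes a word on its own.
fn split_words(s: &str) -> Vec<&str> {
    let mut words = Vec::new();
    // Byte offset where the current alphanumeric run started, if any.
    let mut word_start: Option<usize> = None;

    for (idx, c) in s.char_indices() {
        if c.is_alphanumeric() {
            if word_start.is_none() {
                word_start = Some(idx);
            }
        } else {
            // Close any pending alphanumeric run.
            if let Some(start) = word_start.take() {
                words.push(&s[start..idx]);
            }
            // Slice by the char's UTF-8 length so multibyte chars stay whole.
            words.push(&s[idx..idx + c.len_utf8()]);
        }
    }
    if let Some(start) = word_start {
        words.push(&s[start..]);
    }
    words
}
```

Only the outer `Vec` is allocated; the cost per word is a pointer and a length.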
Wilfred Hughes ab8e8e4485 Treat NovelLinePart as a change
Previously we'd treat it as unchanged, leading to incorrect text diffs
when words were added on a single side.

Fixes #122
2022-02-07 20:37:01 +07:00
Wilfred Hughes 908c2509d6 Rename UnchangedLinePart to NovelLinePart to reflect its usage
It's used in larger novel atoms (big comments and big string
literals).
2022-02-07 20:25:06 +07:00
Wilfred Hughes 72034f141d Clarify MatchKind name for novel words 2022-02-07 20:24:01 +07:00
Wilfred Hughes 1d2f08ca75 Improve internal docs 2022-02-07 20:21:39 +07:00
Wilfred Hughes 5186bffe83 Improve efficiency of line-based diffing
On my large JSON test file, this is a 2% reduction in time.

Test file is package-lock.json from
91b378e1fa
2022-02-06 23:53:35 +07:00
Wilfred Hughes f3faf3ebaf Reuse string slices rather than allocating new ones
This is a very small performance win on large textual diffs (0.2%).
2022-02-06 23:18:37 +07:00
Wilfred Hughes d0ce2baf14 Rename MatchKind variants to not assume comments 2022-02-06 16:50:57 +07:00
Wilfred Hughes 6a056e3630 Ensure textual diffs aren't highlighted as comments 2022-02-06 16:48:12 +07:00
Wilfred Hughes de89caadb3 Don't consider newlines to be words in the line parser
This causes us to match up unrelated lines, and doesn't make sense for
a line parser.

Improves #90.
2022-01-22 17:56:12 +07:00
Wilfred Hughes ae4efa4082 Add unit tests for line parser 2022-01-16 21:45:47 +07:00
Wilfred Hughes 027856d707 Adding a line-based textual differ that ignores trees 2022-01-05 09:39:19 +07:00
Wilfred Hughes 3df7bb57e1 Add basic syntax highlighting for keywords and operators
Helps with #32
2021-10-03 15:23:27 +07:00
Wilfred Hughes 60150a3826 Make change private on SyntaxInfo 2021-09-21 22:42:24 +07:00
Wilfred Hughes d7b0c917c1 Remove regex parser 2021-09-19 12:17:25 +07:00
Wilfred Hughes 058b9e3c12 Expand module docs 2021-09-14 00:07:28 +07:00
Wilfred Hughes e0075300b3 Add TODO on word-level diff costs 2021-09-11 23:04:23 +07:00
Wilfred Hughes 98564e9ba8 Add TODO for line-based perf 2021-09-11 21:15:54 +07:00