Wilfred Hughes
f1c69d3b92
WIP don't use StringIgnoringNewline due to #755
2024-09-29 21:46:01 +07:00
Wilfred Hughes
1ac95534fe
Don't push empty positions when diffing lines
2024-07-30 16:16:34 +07:00
Wilfred Hughes
86612798ad
Try ignoring trailing newlines in line-based differ
2024-07-30 16:09:40 +07:00
Wilfred Hughes
0973998de2
Clarify enum variant NovelLinePart and expand doc comments
2024-07-30 15:33:37 +07:00
Wilfred Hughes
c2f4b1f2ee
Update tests and changelog for 1e8be4558b
2024-07-21 11:15:54 +07:00
Wilfred Hughes
92fa3fb3de
Ensure files with no common content are aligned
2024-07-20 23:43:04 +07:00
Wilfred Hughes
3be8e80fe7
Fix issue with later lines not having positions during diffing
2024-03-19 00:25:18 +07:00
Wilfred Hughes
53298e4240
Set a length limit on lines when doing a word diff
...
See #653
2024-02-29 00:54:55 +07:00
Wilfred Hughes
cac80e992a
Avoid `res` locals in favour of more meaningful names
2023-11-28 13:27:27 +07:00
Wilfred Hughes
1ec868e1df
Update to latest line-numbers
2023-11-19 13:11:07 +07:00
Wilfred Hughes
fe62cf4cf5
Don't ignore novel blank lines
...
Fixes #575
2023-11-18 17:27:41 +07:00
Wilfred Hughes
f2b3b34bec
Use pub(crate) everywhere for visibility
...
This isn't strictly necessary since difftastic is a binary-only
crate. However, it improves compiler warnings (see next commit) and
potentially helps future changes to make difftastic available as a
library.
2023-11-18 16:46:13 +07:00
Wilfred Hughes
60d0f61cbd
Define a separate words module
2023-11-18 16:46:13 +07:00
Wilfred Hughes
243a4a5f48
Group imports consistently
...
This corresponds to:
$ cargo +nightly fmt -- --config group_imports=StdExternalCrate
Since this option is only available on nightly, I'm not adding a
rustfmt.toml to enforce this, just doing it as a one-off run.
2023-09-12 12:32:51 +07:00
Wilfred Hughes
b78ba2da4b
Use type names from line_numbers directly
2023-08-26 20:36:07 +07:00
Wilfred Hughes
41c9165c79
Use my line_numbers crate for newline position calculations
2023-08-26 16:25:32 +07:00
Wilfred Hughes
f3b02f7b47
cargo fmt
2023-01-15 11:43:09 +07:00
Wilfred Hughes
0e3c57c64a
Skip unique items before computing Myers' diff on text
...
This substantially improves performance on text files where there are
few lines in common.
For example, diffing 10,000-line files with no lines in common is more
than 10x faster (8.5 seconds down to 0.49 seconds on my machine), and
sample_files/huge_cpp_before.cpp is nearly 2% faster.
Fixes the case mentioned by @quackenbush in #236.
This is inspired by the heuristics discussions at
https://github.com/mitsuhiko/similar/issues/15
2023-01-15 11:38:02 +07:00
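
A minimal sketch of the idea behind the commit above, not the actual difftastic code: before running Myers' diff, find the lines that also occur on the other side, since a line that appears on only one side can never be part of a match. The function name `shared_line_indices` and the use of plain string slices are assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical helper: return the indices of `lines` whose content also
/// occurs somewhere in `other`. Lines not returned here are unique to one
/// side, so they can be marked as novel without being passed to the
/// (more expensive) diff algorithm.
fn shared_line_indices<'a>(lines: &[&'a str], other: &HashSet<&'a str>) -> Vec<usize> {
    let mut kept = Vec::new();
    for (i, line) in lines.iter().enumerate() {
        if other.contains(line) {
            kept.push(i);
        }
    }
    kept
}
```

When the two files share no lines at all, both index lists are empty and the diff step has nothing left to do, which is consistent with the large speedup reported for that case.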
Wilfred Hughes
c08eefb14a
Move slice_by_hash to myers_diff and add unit tests
2023-01-15 11:03:31 +07:00
Wilfred Hughes
c5fe152f25
Define a parse submodule
2022-05-25 09:28:12 +07:00
Wilfred Hughes
373d7d9d81
Define a diff submodule
2022-05-24 09:33:47 +07:00
Wilfred Hughes
5703f75568
Don't assume that lines end with newlines
...
Previously we would crash if the last line in a file had no trailing
newline and ended with a multibyte character.
Closes #217
2022-03-30 22:42:17 +07:00
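
To illustrate the class of bug described above, here is a small sketch (an assumption about the general shape of the problem, not the code that was changed) that computes byte spans for each line without assuming a trailing newline. The name `line_spans` is hypothetical. Because the final span ends at `s.len()`, slicing never lands inside a multibyte character.

```rust
/// Hypothetical helper: return (start, end) byte offsets for each line in
/// `s`, excluding the newline itself. The last line is included even when
/// the input does not end with '\n'.
fn line_spans(s: &str) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    let mut start = 0;
    for (i, b) in s.bytes().enumerate() {
        if b == b'\n' {
            spans.push((start, i));
            start = i + 1;
        }
    }
    // Final line with no trailing newline: end at the string length rather
    // than assuming one more byte for a newline.
    if start < s.len() {
        spans.push((start, s.len()));
    }
    spans
}
```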
Wilfred Hughes
59ee169ddc
Remove leftover debug logging
2022-03-24 21:27:27 +07:00
Wilfred Hughes
503c8b26ec
Fix tests broken in 9e32e2e08
...
The changes in context.rs were intentional, but the changes in
line_parser.rs were a result of bad stash merging. Revert the
line_parser.rs changes.
2022-03-21 23:55:32 +07:00
Wilfred Hughes
9e32e2e08e
Ensure matched lines includes blanks at the ends of the file
...
Fixes #163
2022-03-20 22:31:32 +07:00
Wilfred Hughes
4563648a9f
Clarify ChangeKind helper method name
2022-03-20 15:17:32 +07:00
Wilfred Hughes
f4f12003cb
Fix minor clippy lints
2022-03-16 22:42:31 +07:00
Wilfred Hughes
6210921104
Use Myers' diff for word-level diffing too
...
This further improves performance on large text files. On the sample
files in #153, this reduces the cost from 99B instructions to 29B
instructions on my machine.
2022-03-12 12:19:57 +07:00
Wilfred Hughes
5d8af55231
Prefer myers_diff types
2022-03-12 12:17:26 +07:00
Wilfred Hughes
edee567e61
Factor out a myers_diff module
2022-03-12 12:15:59 +07:00
Wilfred Hughes
afb1b369f4
Switch to wu-diff for textual diffing
...
In #153 a user reported that difftastic never terminated on a
140,000-line file. This was due to the diff crate using a very large
amount of time and memory.
The diff crate does not use Myers' algorithm, which has a
divide-and-conquer approach using snakes:
https://blog.jcoglan.com/2017/03/22/myers-diff-in-linear-space-theory/
wu-diff does implement Myers' algorithm and performs much better on
these large files.
2022-03-10 23:12:25 +07:00
Wilfred Hughes
88a0c10c9d
Adding TODO
2022-02-09 00:07:54 +07:00
Wilfred Hughes
9c71d95755
Don't allocate strings in split_words()
...
It's faster, especially in large textual diffs (3% on my test file).
2022-02-09 00:04:53 +07:00
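
A sketch of what allocation-free word splitting can look like, assuming a simple rule where runs of alphanumeric characters form words and every other character is its own token. This is not the real `split_words()`; the name `split_words_sketch` and the tokenisation rule are assumptions. The point is that each returned item is a `&str` slice borrowed from the input, so no `String` is allocated per word.

```rust
/// Hypothetical sketch: split `s` into borrowed slices. Alphanumeric runs
/// become one slice each; any other character becomes a one-character slice.
fn split_words_sketch(s: &str) -> Vec<&str> {
    let mut words = Vec::new();
    let mut word_start: Option<usize> = None;

    for (i, c) in s.char_indices() {
        if c.is_alphanumeric() {
            if word_start.is_none() {
                word_start = Some(i);
            }
        } else {
            // Close any word in progress, then emit the separator itself.
            if let Some(start) = word_start.take() {
                words.push(&s[start..i]);
            }
            words.push(&s[i..i + c.len_utf8()]);
        }
    }
    if let Some(start) = word_start {
        words.push(&s[start..]);
    }
    words
}
```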
Wilfred Hughes
ab8e8e4485
Treat NovelLinePart as a change
...
Previously we'd treat it as unchanged, leading to incorrect text diffs
when words were added on a single side.
Fixes #122
2022-02-07 20:37:01 +07:00
Wilfred Hughes
908c2509d6
Rename UnchangedLinePart to NovelLinePart to reflect its usage
...
It's used in larger novel atoms (big comments and big string
literals).
2022-02-07 20:25:06 +07:00
Wilfred Hughes
72034f141d
Clarify MatchKind name for novel words
2022-02-07 20:24:01 +07:00
Wilfred Hughes
1d2f08ca75
Improve internal docs
2022-02-07 20:21:39 +07:00
Wilfred Hughes
5186bffe83
Improve efficiency of line-based diffing
...
On my large JSON test file, this is a 2% reduction in time.
Test file is package-lock.json from
91b378e1fa
2022-02-06 23:53:35 +07:00
Wilfred Hughes
f3faf3ebaf
Reuse string slices rather than allocating new ones
...
This is a very small performance win on large textual diffs (0.2%).
2022-02-06 23:18:37 +07:00
Wilfred Hughes
d0ce2baf14
Rename MatchKind variants to not assume comments
2022-02-06 16:50:57 +07:00
Wilfred Hughes
6a056e3630
Ensure textual diffs aren't highlighted as comments
2022-02-06 16:48:12 +07:00
Wilfred Hughes
de89caadb3
Don't consider newlines to be words in the line parser
...
This causes us to match up unrelated lines, and doesn't make sense for
a line parser.
Improves #90.
2022-01-22 17:56:12 +07:00
Wilfred Hughes
ae4efa4082
Add unit tests for line parser
2022-01-16 21:45:47 +07:00
Wilfred Hughes
027856d707
Adding a line-based textual differ that ignores trees
2022-01-05 09:39:19 +07:00
Wilfred Hughes
3df7bb57e1
Add basic syntax highlighting for keywords and operators
...
Helps with #32
2021-10-03 15:23:27 +07:00
Wilfred Hughes
60150a3826
Make change private on SyntaxInfo
2021-09-21 22:42:24 +07:00
Wilfred Hughes
d7b0c917c1
Remove regex parser
2021-09-19 12:17:25 +07:00
Wilfred Hughes
058b9e3c12
Expand module docs
2021-09-14 00:07:28 +07:00
Wilfred Hughes
e0075300b3
Add TODO on word-level diff costs
2021-09-11 23:04:23 +07:00
Wilfred Hughes
98564e9ba8
Add TODO for line-based perf
2021-09-11 21:15:54 +07:00