Commit Graph

8317 Commits (c15ba1e75ae6de31661d525dbbb386ac21094ca0)
 

Author SHA1 Message Date
Wilfred Hughes b2229d66a8 Always display all lines in a hunk
Previously we were assuming that the first/last line pairs in a hunk
contained the earliest/latest lines on both sides. This isn't true
when there are no common items between the lines.

This fixes some display issues in load_before/after.js, and includes a
new integration test that is smaller and easier to eyeball.

Fixes #133
2022-03-15 00:11:30 +07:00
Wilfred Hughes 6d9dc8322f Tweak motto 2022-03-13 22:14:03 +07:00
Wilfred Hughes 8c469e3d36 Ensure slider correction sets the opposite node on both sides
Fixes #154
2022-03-13 21:23:58 +07:00
Wilfred Hughes 19d11951ce Update changelog for c0d1faae6 2022-03-13 20:26:59 +07:00
Wilfred Hughes 7e8bf37bbc Use NonZeroU32 for unique IDs
This is a modest memory saving (reducing instruction counts by 3%
too), and it's nice having a distinct type for IDs.
2022-03-13 18:41:39 +07:00
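
Note on 7e8bf37bbc: a minimal sketch of why a non-zero integer type can save memory for IDs. `NodeId` and `IdGenerator` are hypothetical names for illustration, not the types used in the repository. Because zero is never a valid value, `Option<NodeId>` can use it to represent `None` and needs no extra tag.

```rust
use std::num::NonZeroU32;

/// Hypothetical ID newtype (an assumption for illustration only).
/// `repr(transparent)` keeps the layout identical to `NonZeroU32`.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct NodeId(NonZeroU32);

impl NodeId {
    /// The underlying integer, always >= 1.
    pub fn get(self) -> u32 {
        self.0.get()
    }
}

/// Hypothetical generator that hands out IDs starting at 1.
pub struct IdGenerator {
    next: u32,
}

impl IdGenerator {
    pub fn new() -> Self {
        IdGenerator { next: 1 }
    }

    /// Returns `None` once the counter has wrapped around to zero.
    pub fn next_id(&mut self) -> Option<NodeId> {
        let id = NonZeroU32::new(self.next)?;
        self.next = self.next.wrapping_add(1);
        Some(NodeId(id))
    }
}
```

Comparing `std::mem::size_of::<Option<NodeId>>()` with `std::mem::size_of::<Option<u32>>()` shows the difference: the former is 4 bytes, while the latter needs room for a separate discriminant.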
Wilfred Hughes 0eb9f82aed Include node ID when printing ChangeKind 2022-03-13 18:28:33 +07:00
Wilfred Hughes c0d1faae63 Keep exploring the graph even when we find matched delimiters
Previously we'd get tripped up by cases where choosing equal
delimiters would be considered the same as entering each delimiter
separately, making diffs worse.

Fixes #147
2022-03-13 15:41:54 +07:00
Wilfred Hughes bd6a1cbdc6 Store can_pop_either in Vertex 2022-03-13 13:56:18 +07:00
Wilfred Hughes 85fc2c6340 cargo fmt 2022-03-13 13:16:40 +07:00
Wilfred Hughes d416ee289f Improve debug printing
Clarify that we're printing count, not node IDs.
2022-03-13 12:43:50 +07:00
Wilfred Hughes 615c18d63e Clarify cost choices more 2022-03-12 18:03:26 +07:00
Wilfred Hughes 20aa06db5f Remove obsolete comment 2022-03-12 18:01:08 +07:00
Wilfred Hughes fdf5c2f88e Improve bug report template wording
Closes #138
2022-03-12 17:44:54 +07:00
Wilfred Hughes 50a3eb958e Update issue templates 2022-03-12 17:37:14 +07:00
Wilfred Hughes 9439a932e6 Optimise hunk search within matched_lines
This reduces the instruction count from 29B to 22B for the sample
file in #153.
2022-03-12 14:40:49 +07:00
Wilfred Hughes dad463daf5 Use Myers' diff everywhere
The diff crate has a great ergonomic API, but it doesn't implement
Myers' algorithm and performs badly on large inputs.

https://github.com/utkarshkukreti/diff.rs/issues/1

Now that we have a wrapper wu_diff that provides a similar API,
replace the remaining call sites to diff::slice(). These are
relatively cold, so this is a small performance improvement (1%
instruction reduction).
2022-03-12 12:29:34 +07:00
Wilfred Hughes dbf088bd25 cargo fmt 2022-03-12 12:22:38 +07:00
Wilfred Hughes 6210921104 Use Myers' diff for word-level diffing too
This further improves performance on large text files. On the sample
files in #153, this improves performance from 99B instructions to 29B
instructions on my machine.
2022-03-12 12:19:57 +07:00
Wilfred Hughes 5d8af55231 Prefer myers_diff types 2022-03-12 12:17:26 +07:00
Wilfred Hughes edee567e61 Factor out a myers_diff module 2022-03-12 12:15:59 +07:00
Wilfred Hughes dba68d1d2a Don't run syntax highlighting when dumping the tree-sitter output
For large files, tree-sitter syntax highlighting is much more
expensive than the parse itself. We spend most of the runtime
advancing the tree-sitter query cursor.

This doesn't affect runtime of normal usage, but it helps debugging
and makes flamegraphs more readable.

Spotted in #153
2022-03-12 11:53:39 +07:00
Wilfred Hughes 2f8ccd94da Add sample file showing slow perf on large adjacent lists
Part of #156.
2022-03-12 11:28:28 +07:00
Wilfred Hughes afb1b369f4 Switch to wu-diff for textual diffing
In #153 a user reported difftastic never terminated on a 140,000
line file. This was due to the diff crate using a very large amount of
time and memory.

The diff crate does not use Myers' algorithm, which has a
divide-and-conquer approach using snakes:

https://blog.jcoglan.com/2017/03/22/myers-diff-in-linear-space-theory/

wu-diff does implement Myers' algorithm and performs much better on
these large files.
2022-03-10 23:12:25 +07:00
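
Note on afb1b369f4: a minimal sketch of the greedy form of Myers' algorithm, which extends "snakes" (runs of matching items) along diagonals of the edit graph. It only computes the length of the shortest edit script (insertions plus deletions). It is not the linear-space divide-and-conquer refinement described in the linked post, and it is not the wu-diff API; `shortest_edit_len` is a hypothetical name.

```rust
/// Number of insertions plus deletions needed to turn `a` into `b`,
/// found with the greedy O((N+M)D) variant of Myers' algorithm.
pub fn shortest_edit_len<T: PartialEq>(a: &[T], b: &[T]) -> usize {
    let n = a.len() as isize;
    let m = b.len() as isize;
    let max = n + m;
    if max == 0 {
        return 0;
    }
    // v[k + offset] holds the furthest x reached so far on diagonal k,
    // where a diagonal is the set of points with x - y == k.
    let offset = max;
    let mut v = vec![0isize; (2 * max + 1) as usize];

    for d in 0..=max {
        let mut k = -d;
        while k <= d {
            let idx = (k + offset) as usize;
            // Pick the better neighbouring diagonal to step from.
            let mut x = if k == -d || (k != d && v[idx - 1] < v[idx + 1]) {
                v[idx + 1] // step down: take an item from `b` (insertion)
            } else {
                v[idx - 1] + 1 // step right: drop an item from `a` (deletion)
            };
            let mut y = x - k;
            // Follow the snake: matching items cost nothing.
            while x < n && y < m && a[x as usize] == b[y as usize] {
                x += 1;
                y += 1;
            }
            v[idx] = x;
            if x >= n && y >= m {
                return d as usize;
            }
            k += 2;
        }
    }
    max as usize
}
```

The cost depends on the number of differences D rather than on the product of the input lengths, which is why this family of algorithms copes well with large, mostly similar files.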
Wilfred Hughes e8d9ffa61c Remove unwanted debugging 2022-03-10 22:59:28 +07:00
Wilfred Hughes c81911be51 Allow the unchanged tree threshold to be changed with an env var
This is helpful when debugging production use cases. Fixes #155
2022-03-10 21:15:57 +07:00
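
Note on c81911be51: a minimal sketch of reading a numeric threshold from an environment variable with a fallback default. The function names, the variable name passed in, and the default value are assumptions for illustration; the commit message does not state the real ones.

```rust
use std::env;

/// Parse an optional raw string into a threshold, falling back to
/// `default` when the value is missing or is not a valid integer.
pub fn parse_threshold(raw: Option<&str>, default: usize) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .unwrap_or(default)
}

/// Look up `var_name` in the process environment (hypothetical helper).
pub fn threshold_from_env(var_name: &str, default: usize) -> usize {
    parse_threshold(env::var(var_name).ok().as_deref(), default)
}
```

Keeping the parsing separate from the environment lookup makes the fallback behaviour easy to test without mutating process state.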
Wilfred Hughes a3a2bfb317 Roll version 2022-03-10 00:13:26 +07:00
Wilfred Hughes 918b6a1ba3 Don't consider Hack files with <?hh headers and .php extension as PHP 2022-03-10 00:03:57 +07:00
Wilfred Hughes a10e91e1cf Add symlinks for building PHP parser 2022-03-09 23:55:18 +07:00
Wilfred Hughes ed0bde6b91 Adding support for PHP 2022-03-09 23:52:31 +07:00
Wilfred Hughes 51a4da6d7c Add 'vendor/tree-sitter-php/' from commit '0ce134234214427b6aeb2735e93a307881c6cd6f'
git-subtree-dir: vendor/tree-sitter-php
git-subtree-mainline: 020983cd85
git-subtree-split: 0ce1342342
2022-03-09 23:36:23 +07:00
Wilfred Hughes 020983cd85 Merge branch 'split_unchanged_regions'
This fixes #116
2022-03-09 23:12:02 +07:00
Wilfred Hughes 0927a2f9e6 Update changelog 2022-03-09 23:11:04 +07:00
Wilfred Hughes d8c16e561d Update regression tests 2022-03-09 23:04:44 +07:00
Wilfred Hughes f0e90b2aea Improve test naming and ensure they're exercising the relevant parts 2022-03-09 23:02:16 +07:00
Wilfred Hughes cc384bfa6d Add shrinking and get tests passing 2022-03-09 22:45:53 +07:00
Wilfred Hughes 80b9762216 Set threshold and document 2022-03-09 09:40:53 +07:00
Wilfred Hughes d1c060ca17 Make skip unchanged logic less aggressive 2022-03-09 09:40:53 +07:00
Wilfred Hughes f39721d792 Update most of the integration tests 2022-03-09 09:40:53 +07:00
Wilfred Hughes 1895df8f3b Allow unchanged nodes to be checked recursively, and hook up to main.rs 2022-03-09 09:40:53 +07:00
Wilfred Hughes fb7d2d7c86 Get basic examples working with impl 2022-03-09 09:40:53 +07:00
Wilfred Hughes e2c65e5743 unique ID and plumb in 2022-03-09 09:40:53 +07:00
Wilfred Hughes 08f8568eeb Mark, add tests 2022-03-09 09:40:53 +07:00
Wilfred Hughes 63588d2c61 WIP unchanged regions 2022-03-09 09:40:53 +07:00
Wilfred Hughes becb79f861 Clarify logging to distinguish graph nodes from syntax nodes 2022-03-08 22:11:54 +07:00
Wilfred Hughes ad77bba451 Add debug logging for the number of nodes diffed 2022-03-08 22:09:20 +07:00
Wilfred Hughes 17ff2bc07e < and > are delimiters in Rust and C++ 2022-03-08 20:42:57 +07:00
Wilfred Hughes 88ae00bd88 Use depth difference as a heuristic when comparing equal nodes
This reverts commit 7544874a55. It turns
out there are cases where this is still necessary (see new sample
file). It's also performance neutral.

This bug became more obvious with the recent 'skip unchanged'
optimisations. The optimisation changed the number of preceding nodes and
exposed this bug more often.
2022-03-08 09:44:55 +07:00
Wilfred Hughes 6f65bbbbb0 Merge branch 'delim_type'
Introduce a new type EnteredDelimiter that tracks entering/leaving
list nodes. The PopEither and PopBoth cases reflect the choices more
accurately than a 2-tuple of options.

This is a performance hit (slow_before.rs runtime has increased by
49%) but it's important for diff correctness.

Fixes #147
2022-03-06 21:39:15 +07:00
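
Note on 6f65bbbbb0: a rough sketch of how an enum with PopEither and PopBoth cases can describe the choices more precisely than a 2-tuple of options. Only the names EnteredDelimiter, PopEither and PopBoth come from the commit message. The field layout, the use of `u32` IDs, and the helper functions are assumptions for illustration, not the repository's definitions.

```rust
/// Hypothetical shape: one stack entry per "way of entering" delimiters.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum EnteredDelimiter {
    /// Delimiters entered separately on each side; either side may be left
    /// independently of the other.
    PopEither { lhs: Vec<u32>, rhs: Vec<u32> },
    /// A matched pair entered together; both sides must be left together.
    PopBoth(u32, u32),
}

/// Enter a matched pair of delimiters.
pub fn push_both(stack: &mut Vec<EnteredDelimiter>, lhs: u32, rhs: u32) {
    stack.push(EnteredDelimiter::PopBoth(lhs, rhs));
}

/// Enter a delimiter on the left-hand side only.
pub fn push_lhs(stack: &mut Vec<EnteredDelimiter>, id: u32) {
    match stack.last_mut() {
        Some(EnteredDelimiter::PopEither { lhs, .. }) => lhs.push(id),
        _ => stack.push(EnteredDelimiter::PopEither { lhs: vec![id], rhs: vec![] }),
    }
}

/// Leave a matched pair; only allowed when it is the innermost entry.
pub fn pop_both(stack: &mut Vec<EnteredDelimiter>) -> Option<(u32, u32)> {
    let pair = match stack.last() {
        Some(EnteredDelimiter::PopBoth(l, r)) => (*l, *r),
        _ => return None,
    };
    stack.pop();
    Some(pair)
}

/// Leave the innermost left-hand delimiter that was entered separately.
pub fn pop_lhs(stack: &mut Vec<EnteredDelimiter>) -> Option<u32> {
    let (id, now_empty) = match stack.last_mut() {
        Some(EnteredDelimiter::PopEither { lhs, rhs }) => {
            let id = lhs.pop()?;
            (id, lhs.is_empty() && rhs.is_empty())
        }
        _ => return None,
    };
    if now_empty {
        stack.pop();
    }
    Some(id)
}
```

With a plain pair of options, "entered together" and "entered separately" look the same. Here a matched pair cannot be left while a separately entered delimiter is still open inside it.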
Wilfred Hughes 1a85fb1271 Fix up tests, add doc comments 2022-03-06 21:34:36 +07:00
Wilfred Hughes c229cfb6bb Import Stack rather than qualifying name 2022-03-06 21:07:26 +07:00