Commit Graph

240 Commits (master)

Author SHA1 Message Date
Wilfred Hughes b78f7d447b Always replace tabs, even in single-column display
Fixes #617
2024-03-04 21:38:23 +07:00
Wilfred Hughes 7ddd8879b1 Merge branch 'scheme-support' of github.com:kutsurak/difftastic into kutsurak-scheme-support 2024-03-04 08:46:59 +07:00
Wilfred Hughes 53298e4240 Set a length limit on lines when doing a word diff
See #653
2024-02-29 00:54:55 +07:00
Brandon Maier e6b27caf06 Add support for devicetrees 2024-02-28 17:11:53 +07:00
Wilfred Hughes 7a00339977 Merge pull request #634 from evanrichter/smali
Smali language support
2024-02-19 11:57:54 +07:00
Wilfred Hughes 4fef3f7b00 Update regression test for QML parser 2024-02-15 23:40:56 +07:00
Wilfred Hughes e70224bb22 Update regression tests for new JS/TS parsers 2024-02-15 08:30:57 +07:00
Panagiotis Koutsourakis 67ada1ccd4 Add support for Scheme 2024-02-12 10:20:57 +07:00
Evan Richter d106c979ee add smali language support 2024-02-01 15:41:14 +07:00
Mikhail Brinchuk 297fa952c2 Merge branch 'Wilfred:master' into f# 2024-01-29 11:45:31 +07:00
Wilfred Hughes a3731067b2 Merge pull request #618 from arbrauns/tree-sitter-vhdl
Add tree-sitter-vhdl
2024-01-28 12:04:53 +07:00
Wilfred Hughes 6c4310e33b Remove macro from Objective-C sample file so it parses fully 2024-01-28 11:02:58 +07:00
Mikhail Brinchuk 9a894d5369 Added a regression test for F# 2024-01-28 12:36:13 +07:00
Armin Brauns c5638750d6 Add tree-sitter-vhdl 2024-01-09 09:23:51 +07:00
Wilfred Hughes 5c67ed08a9 Update integration test for new tab width 2024-01-07 19:26:07 +07:00
Wilfred Hughes db86b28a28 Add support for Objective-C
Closes #600

Co-authored-by: Nick Moore <nick@pilotmoon.com>
2024-01-07 12:50:19 +07:00
Wilfred Hughes db0c150f61 Report permission changes
Fixes #605
2023-12-30 11:20:00 +07:00
Wilfred Hughes 001279a2e1 Update expected output test 2023-12-08 21:51:29 +07:00
Rodolphe Blancho e18b5d0712 Merge branch 'master' into feature/salesforce_apex_support 2023-12-05 12:31:59 +07:00
Wilfred Hughes 569f0038d1 Always filter blank lines at start and end in positions
Fixes #595
2023-11-28 12:35:28 +07:00
Wilfred Hughes 8a58fb76ab Update regression test for SCSS capitalisation change 2023-11-25 01:20:31 +07:00
Wilfred Hughes fe62cf4cf5 Don't ignore novel blank lines
Fixes #575
2023-11-18 17:27:41 +07:00
Wilfred Hughes 778a6bee9a Flatten nullable types in Kotlin
Workaround for #589 and #411
2023-10-26 08:56:37 +07:00
Wilfred Hughes 81714c17ce Merge pull request #573 from brneor/scss
Add Scss parser
2023-10-11 08:57:58 +07:00
Rodolphe Blancho 05d78ca741 add support for Salesforce Apex
Apex Language documentation:
https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/apex_dev_guide.htm

Uses https://github.com/aheber/tree-sitter-sfapex
2023-10-06 11:08:02 +07:00
Wilfred Hughes 992437db1d Show the language name when parsing fails 2023-09-28 00:33:49 +07:00
Breno Reis fffbc17adc add regression test for SCSS 2023-09-20 14:38:35 +07:00
Wilfred Hughes 8ebd6317d1 Add XML test file 2023-09-12 13:05:15 +07:00
Wilfred Hughes 1e7866b64e Do word diffing on text too 2023-09-12 13:03:27 +07:00
Wilfred Hughes d56f775f31 Highlight constructors consistently with type names 2023-09-03 01:30:22 +07:00
Wilfred Hughes fac4f3082b Update snapshot tests for new Rust parser 2023-08-18 23:07:54 +07:00
Wilfred Hughes 0af76db498 Update regression tests for humansize file formatting 2023-08-13 09:10:16 +07:00
Wilfred Hughes 757c297412 Adjust header style
Show the hunk count and detected language in a dimmed style. This
information is less important than the diff content itself, so this
change makes the important information more prominent.

First part of #544
2023-07-31 08:35:27 +07:00
Wilfred Hughes 797af40ae8 Improve Java highlighting 2023-07-27 08:33:38 +07:00
Wilfred Hughes 7caaaf7fcf Handle nested sliders correctly when preferring the outer delimiter
Previously we didn't check the state of children, which was an
oversight from the original implementation. As a result, we fixed
nested sliders in fewer situations.

Fixes #535
2023-07-14 08:49:55 +07:00
Wilfred Hughes 5606c04261 Treat qualified modules and variables as atoms in Haskell 2023-07-12 12:34:39 +07:00
Wilfred Hughes a814e01d22 Improve word diffing heuristic and add another sample file 2023-07-12 12:12:32 +07:00
Wilfred Hughes c2b7042b80 Do subword highlighting in more cases
This is useful when two strings differ substantially but still share
some text, e.g. the same ending.
2023-07-10 21:26:24 +07:00
Wilfred Hughes 5824322244 Require some common words to do subword highlighting
This is important when comparing short string literals. This change
has improved several cases in sample_files/ but I've added a new
example that made the previous unwanted behaviour much more obvious.
2023-07-10 09:03:21 +07:00
Wilfred Hughes 27f59c0b3a Don't treat - as a word constituent
This produces slightly better results with some string replacements.
2023-07-08 17:16:14 +07:00
Zhenge Chen ffd49d523a Detect replaced strings
If a string is replaced with another, apply subword highlighting
similar to how we handle replaced comments.

Co-authored-by: Wilfred Hughes <me@wilfred.me.uk>
2023-07-08 17:16:06 +07:00
Wilfred Hughes 87d27c5598 Only split numbers inside comments
Inside text files, it seems to be better to be conservative and
consider abc123def as one word rather than three.

This is noticeable when looking at changes to the compare.expected
file, which contains hashes. 123c456 and 345c789 don't really have a
`c` in common, so subword highlighting is ugly.
2023-07-07 08:40:06 +07:00
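
The two splitting modes described above can be sketched in Python. This is an illustration only, not difftastic's implementation: the helper name `split_words`, its `split_numbers` flag, and the ASCII-only character classes are assumptions made for the example.

```python
import re


def split_words(text, split_numbers):
    """Split text into word and non-word tokens (hypothetical helper).

    With split_numbers=True, runs of letters and runs of digits become
    separate words, so "abc123def" yields three words. With
    split_numbers=False, letters and digits stay together as one word.
    Every other character is its own token. ASCII only, for brevity.
    """
    if split_numbers:
        pattern = r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9]"
    else:
        pattern = r"[A-Za-z0-9]+|[^A-Za-z0-9]"
    return re.findall(pattern, text)


# Conservative mode keeps a hash-like string as a single word.
print(split_words("123c456", split_numbers=False))
# Splitting mode separates the digits from the letter.
print(split_words("123c456", split_numbers=True))
```

In the conservative mode, 123c456 and 345c789 are two different whole words, so no lone `c` is reported as shared between them.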
Wilfred Hughes c07e640b24 Remove contiguous penalty
The contiguous penalty was an attempt to fix the slider problem:

// Old
A B
C D

// New
A B
A B
C D

// Unwanted diff
A +B+
+A+ B
C D

However, it doesn't make sense for Dijkstra, which is stateless. The
best route from vertex X is independent of how we got to vertex X.

This worked by dumb luck: in some circumstances we terminate early
rather than fully executing Dijkstra's algorithm. This cost tweak
improved results on a few test files. However, the post-processing
slider logic is a proper, general solution. This was added much later.

There's no reason to keep the contiguous penalty now. It's confusing,
and makes adding new edge costs with consistent 'X costs more than Y'
behaviours more difficult.

Performance is essentially neutral: a small decrease in
typing_before.ml, a small increase in slow_before.rs.
2023-07-06 08:37:02 +07:00
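
The claim that the best route from a vertex does not depend on how that vertex was reached can be seen in a textbook version of Dijkstra's algorithm. The sketch below is generic Python, not difftastic's graph code; the function name `dijkstra` and the toy graph are assumptions for illustration. Each edge cost depends only on the edge itself, which leaves no place for a penalty based on the previous edge.

```python
import heapq


def dijkstra(graph, start):
    """Return the cheapest known cost from start to every reachable vertex.

    graph maps a vertex to a list of (neighbour, cost) pairs. The cost of
    an edge is fixed: it does not depend on the path taken to reach it.
    """
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # Stale heap entry; a cheaper route to u was found.
        for v, cost in graph.get(u, []):
            candidate = d + cost
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


# Toy graph (hypothetical): two routes reach "b", only the cheaper one counts.
toy = {"s": [("a", 1), ("b", 4)], "a": [("b", 2)], "b": [("t", 1)]}
print(dijkstra(toy, "s"))
```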
Wilfred Hughes 43c24047b4 Don't track contiguous status on novel delimiter edges
This is harder to reason about, and 2e6666041f did not include a
motivating test case.

Removing contiguous status is a minor perf improvement (2% reduction
in instructions), makes the code simpler, and does not significantly
affect diffing results.

Of the two sample files that have changed, the erlang_before.erl file
has improved and nest_before.rs is neutral.
2023-07-04 23:53:16 +07:00
Wilfred Hughes c405b58327 Fix cost for ReplacedComment
This needs to be 2x novel nodes, or we prefer it far too often.
2023-07-02 23:12:31 +07:00
Wilfred Hughes 3730580ca3 Improve word splitting heuristics
This is particularly noticeable when diffing comments with timestamps
2000-12-31T23:59:59 where we don't want 31T23 to be a single word.
2023-06-29 08:33:30 +07:00
Wilfred Hughes 8b842387a1 Don't clean trailing newline before diffing
Difftastic should take the user's input as-is, or it risks performing
an incorrect diff in both textual and syntactic diffing.

Fixes #499
2023-03-30 08:46:11 +07:00
Wilfred Hughes 0ec2f3a319 Add test case that reproduces #499 2023-03-24 23:19:13 +07:00
Wilfred Hughes 3263612150 Update expected output for latest QML grammar 2023-03-17 00:55:59 +07:00
Wilfred Hughes a0f7ed5e78 Set explicit locales for ordering globs in integration test 2023-03-15 15:47:27 +07:00
Karl Ding 4a861c376e Manually order expected sample comparison file
It appears that the GitHub Actions runner is returning the glob path
results in a different ordering than the ordering obtained when locally
running the compare_all.sh script. This difference in the ordering
causes CI to fail due to differences to the generated expectation file.

This also seems to have been an issue in previous PRs---the solution
here is likely to sort the output before processing or figure out what
shell options cause the difference in glob ordering, and explicitly set
those in the shell script to eliminate the difference (or prevent the
script from inheriting anything but shell defaults).

For now, try reordering the output by hand to match the ordering the
GitHub Action runner likely expects.
2023-03-14 21:46:40 +07:00
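
One way to make the ordering independent of the environment, sketched here in Python rather than in the compare_all.sh shell script itself, is to sort the paths explicitly by their bytes, which matches the ordering of the C locale. The helper name `stable_order` and the file names are made up for the example.

```python
def stable_order(paths):
    """Sort paths by their UTF-8 bytes, ignoring the current locale.

    Hypothetical helper: byte-wise ordering gives the same result on
    every machine, unlike locale-aware collation.
    """
    return sorted(paths, key=lambda p: p.encode("utf-8"))


# Uppercase letters sort before lowercase ones in byte order.
print(stable_order(["b_before.rs", "B_before.rs", "a_before.rs"]))
```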
Karl Ding 6b2947b4e9 Add Ada "Hello World" sample file 2023-03-14 21:46:40 +07:00
Wilfred Hughes 045d6a2c58 Treat Newick and Racket as lisps 2023-03-03 00:23:11 +07:00
Wilfred Hughes 9556cd978e Merge branch 'delehef/master' 2023-02-21 08:46:07 +07:00
Franklin Delehelle 21ded51e90 Add newick example files 2023-02-21 08:45:49 +07:00
6cdh 5e659e2d98 add racket highlight 2023-02-21 08:30:00 +07:00
6cdh 3c418168a4 fix bug in here_string 2023-02-12 16:45:38 +07:00
6cdh 2fade3e0bf fix here string 2023-02-12 13:57:45 +07:00
6cdh fe756905bf added Racket support 2023-02-12 13:39:58 +07:00
Wilfred Hughes 96fc044e6d Improve syntax highlighting
`@include` and `@exception` are both used for highlighting keywords in
several languages.
2023-02-10 08:50:43 +07:00
Wilfred Hughes 0f1d323bf3 Disable lua-match? highlighting predicates
This fixes inaccurate type highlighting in Scala.

See #310
2023-02-10 08:45:41 +07:00
Wilfred Hughes 5db907f393 Merge pull request #481 from hugo-vrijswijk/master
Update tree-sitter-scala
2023-02-08 17:52:46 +07:00
Wilfred Hughes df6f4618b6 Update expected output for 62da0d56cc 2023-02-08 17:51:28 +07:00
Wilfred Hughes 18b46812c0 Update C++ expected output for 63cf71641a 2023-02-08 17:49:05 +07:00
Hugo van Rijswijk afad33c71e update compare.expected 2023-02-08 11:16:19 +07:00
Wilfred Hughes 7d5afd78dc Respect --error-limit when parsing
Next step for #472
2023-02-04 22:29:18 +07:00
Wilfred Hughes 5ed4bac8a5 Add support for R
Fixes #470
2023-01-26 08:50:00 +07:00
Wilfred Hughes 9ce60140ce Update screenshot and ensure sample files match 2023-01-22 20:19:36 +07:00
Wilfred Hughes 0e3c57c64a Skip unique items before computing Myers diff on text
This substantially improves performance on text files where there are
few lines in common.

For example, diffing 10,000 line files with no lines in common is more than 10x
faster (8.5 seconds to 0.49 seconds on my machine), and
sample_files/huge_cpp_before.cpp is nearly 2% faster.

Fixes the case mentioned by @quackenbush in #236.

This is inspired by the heuristics discussions at
https://github.com/mitsuhiko/similar/issues/15
2023-01-15 11:38:02 +07:00
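
The idea of skipping unique items can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from this commit: the function name `diff_skipping_unique` is hypothetical, and Python's `difflib.SequenceMatcher` stands in for a Myers diff. Lines that occur on only one side cannot match anything, so they are reported as changed straight away and only the remaining lines are handed to the diff.

```python
from difflib import SequenceMatcher


def diff_skipping_unique(lhs, rhs):
    """Return (removed_lhs_indexes, added_rhs_indexes) for two line lists."""
    lhs_set, rhs_set = set(lhs), set(rhs)

    # Lines present on one side only are changed by definition.
    removed = [i for i, line in enumerate(lhs) if line not in rhs_set]
    added = [j for j, line in enumerate(rhs) if line not in lhs_set]

    # Only lines that occur on both sides need the expensive diff.
    lhs_kept = [(i, line) for i, line in enumerate(lhs) if line in rhs_set]
    rhs_kept = [(j, line) for j, line in enumerate(rhs) if line in lhs_set]

    matcher = SequenceMatcher(
        None,
        [line for _, line in lhs_kept],
        [line for _, line in rhs_kept],
        autojunk=False,
    )
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Map positions in the filtered lists back to original indexes.
            removed.extend(lhs_kept[k][0] for k in range(i1, i2))
            added.extend(rhs_kept[k][0] for k in range(j1, j2))
    return sorted(removed), sorted(added)


print(diff_skipping_unique(["a", "x", "b", "c"], ["a", "b", "y", "c"]))
```

When the two files share no lines at all, both filtered lists are empty and the diff step has nothing to do, which is the case where the commit reports the largest speedup.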
Wilfred Hughes efec759504 Only set language_used after a full syntactic diff
This fixes cases where the language is detected but the file hits the
byte limit.

Fixes #462.
2023-01-14 12:52:08 +07:00
Wilfred Hughes 63a3bf0c91 Ensure we use the correct config for sublanguage parsing
Otherwise we get the wrong node names for atoms.
2023-01-08 22:24:43 +07:00
Wilfred Hughes 8ed4fbccfa Treat colour values (e.g. `#FFF`) as atoms in CSS 2023-01-08 22:22:46 +07:00
Wilfred Hughes 34967f588d Treat predefined_type as an atom in TypeScript
Currently it contains a nested string node, even though it's a fixed
set of known types. This was preventing us from applying good syntax
highlighting.

This was particularly noticeable with `string`, which wasn't
previously highlighted as a type.
2023-01-07 22:43:50 +07:00
Wilfred Hughes cd87796552 Treat doctype nodes as atoms in HTML
The tree-sitter parser doesn't include the text after DOCTYPE in the
inner tag.
2023-01-03 08:40:39 +07:00
Steinar H. Gunderson 9133918dd4 Support parsing of sub-languages.
This allows given nodes (configurable per-language, using tree-sitter's
query syntax) to be re-parsed as other languages. The canonical example
is CSS or JavaScript inside HTML, which normally would be a single token
but now can get the full range of syntax highlighting and tree diffing.

The config sets this up for only two languages: HTML (contains CSS or
JavaScript in <script> or <style> tags; we don't support style="" or
onclick="" etc. at this point), and Makefiles (contains Bash in
$(shell ...) commands). The latter is fairly obscure; the big win is
in the former.

It would be nice to also have this support for PHP; however, the HTML
parser seems to be a bit confused when asked to parse the partial HTML
blocks we get if we just mark the "text" blocks as HTML, so for this
to work well, probably the PHP blocks should be parsed as sub-languages
of HTML instead of vice versa.

Also, as a minor quibble, there should be support for bash in Perl's
backticks (similar to in Makefiles), but the tree-sitter Perl parser
does not support backticks at all (it goes into error recovery).

There may have been languages that I've missed, e.g. some languages
might have nodes that contain e.g. SQL.

Fixes #382. Potentially relevant to #376.
2023-01-03 08:31:48 +07:00
Wilfred Hughes 0fc1842595 Improve word highlighting heuristics in comments
Previously we highlighted changed whitespace, which led to ugly
results if the number of words changed (there was a different number
of whitespace characters so some were highlighted).

Also treat _ and - as word constituents, as it produces nicer results
when people write example CLI invocations in comments.
2023-01-02 16:56:31 +07:00
Wilfred Hughes e8e5ca8e47 Replace tabs during display, so parsing sees the original source
Fixes #350
2023-01-01 22:44:47 +07:00
Wilfred Hughes 00ecf36a22 Pop delimiters immediately, rather than having ExitDelimiter* edges
@QuarticCat observed that popping delimiters is unnecessary, and saw a
speedup in PR #401. This reduces the number of nodes in typical graphs
by ~20%, reducing runtime and memory usage.

This works because there is only one thing we can do at the end of a
list: pop the delimiter. The syntax node on the other side does not
give us more options, we have at most one. Popping all the delimiters
as soon as possible is equivalent, and produces the same graph route.

This change has also slightly changed the output of
sample_files/slow_after.rs, producing a better (more minimal)
diff. This is probably luck, due to the path-dependent nature of the
route solving logic, but it's a positive sign.

A huge thanks to @QuarticCat for their contributions, this is a huge
speedup.

Co-authored-by: QuarticCat <QuarticCat@pm.me>
2022-12-28 02:00:09 +07:00
Wilfred Hughes afc78e976d Document Erlang support and add test
Fixes #394
2022-12-15 23:30:45 +07:00
rhirano0715 436edb2ab4 Only add colour to the first hunk header
This helps skimming the results when multiple files are changed with
multiple hunks. It makes the file changing more prominent than just
going from e.g. 5/5 to 1/10.

Fixes #400

Acked-by: Wilfred Hughes <me@wilfred.me.uk>
2022-12-01 09:38:36 +07:00
Wilfred Hughes 2e7c90c472 Ensure line wrapping uses the same length on both sides
Closes #421
2022-11-13 00:35:06 +07:00
Wilfred Hughes 28c3b0ef5d Tweak line number styling to make it more distinct from content
Dim line numbers for unchanged lines, and make changed lines bold (in
addition to the existing red/green colours).

Closes #384
2022-10-28 20:34:36 +07:00
Wilfred Hughes 2a3346e338 Use apply_line_number_color consistently on LHS and RHS
Previously we missed a case on the LHS.
2022-10-28 20:18:17 +07:00
Wilfred Hughes 490787fe28 Factor out line number styling 2022-10-28 19:07:51 +07:00
Wilfred Hughes b9d44ae65f Treat error nodes as atoms
Fixes #408
2022-10-15 22:50:08 +07:00
Wilfred Hughes 39bd04002c Merge pull request #369 from esawady/hare
Add Hare support
2022-09-15 09:33:07 +07:00
Wilfred Hughes cafd672cc8 Don't underline all changes in plaintext files
Fixes #371
2022-09-15 09:30:16 +07:00
Ember Sawady 7ed685ae52 Add support for Hare 2022-09-13 23:34:16 +07:00
Wilfred Hughes 3c51f58d8e Add Pascal support
Fixes #365
2022-09-13 00:05:23 +07:00
Wilfred Hughes f155a27522 Underline changed words in comments
This makes them easier to spot in larger changes.

Fixes #328
2022-09-10 15:54:04 +07:00
Yuya Nishihara 84f0b25fb6 Add support for QML
QML is a UI language, and its syntax is basically JSON-like structure
+ JavaScript. The tree-sitter parser is named after the upstream grammar
file qmljs.g, but the canonical language name is QML. So I choose Qml as
the Language enum.

https://doc.qt.io/qt-6/qmlapplications.html
2022-09-10 11:38:35 +07:00
Wilfred Hughes fe5ef8757d Give novel punctuation a lower edge cost
We'd rather see an unchanged variable name than an unchanged comma.

Fixes #366
2022-09-09 09:47:53 +07:00
Wilfred Hughes b104c4be10 Fix sliders in a single global pass
Previously we fixed sliders in each 'possibly changed' region. This
meant that we couldn't fix sliders that needed to move outside the
region. The most common case was code of the form `foo, bar, baz`
where `, baz` was unchanged but we wanted to slide to `,`.

We now call `fix_all_sliders` for the toplevel tree on both
sides. This required some minor changes to the slider logic, as the
unchanged/novel regions could occur at any level of the tree.

(It was probably also the case that we were missing slider
opportunities previously, because we terminated as soon as we found an
outer slider for the nested case.)

This change has no performance impact, probably because tree diffing
is vastly more expensive (O(N^2)) than sliders (O(N)).

Fixes #327
2022-09-02 18:10:09 +07:00
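
A slider is a novel region that can be shifted left or right without changing what the diff means. The sketch below is a simplification on a flat token list, whereas the commit above works on syntax trees; the function name `equivalent_placements` is hypothetical. It lists every position a novel range can slide to, which is the set a slider fix has to choose from.

```python
def equivalent_placements(tokens, start, end):
    """List all (start, end) ranges equivalent to the novel range given.

    tokens is the new side of the diff and tokens[start:end] is marked
    novel, with start < end. Removing any returned range from tokens
    leaves the same old token sequence.
    """
    placements = [(start, end)]

    # Slide left while the token entering the range on the left equals
    # the token leaving it on the right.
    s, e = start, end
    while s > 0 and tokens[s - 1] == tokens[e - 1]:
        s, e = s - 1, e - 1
        placements.insert(0, (s, e))

    # Slide right while the token leaving the range on the left equals
    # the token entering it on the right.
    s, e = start, end
    while e < len(tokens) and tokens[s] == tokens[e]:
        s, e = s + 1, e + 1
        placements.append((s, e))

    return placements


# Old: foo, baz   New: foo, bar, baz
# The novel part can be shown as ", bar" or as "bar ,".
print(equivalent_placements(["foo", ",", "bar", ",", "baz"], 1, 3))
```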
Wilfred Hughes eabefd5612 Factor out language name pretty-printing 2022-09-02 11:56:51 +07:00
Wilfred Hughes b1b3756fa7 Attempt to detect and decode UTF-16 files too
Closes #345
2022-08-28 15:38:57 +07:00
Wilfred Hughes 09334030ab Fix incorrect line number being used in side-by-side display
Fixes #334
2022-08-22 09:34:34 +07:00
Wilfred Hughes 026b2674d0 Update expected output
The previous two commits were done in a branch that was rebased, so
the integration tests missed 5fe6d551d.
2022-08-21 21:36:00 +07:00
Wilfred Hughes c957818514 Explore two graph nodes for each parenthesis position
This produces substantially better diff results, and fixes the 'last
item in the list shown as changed' problem.

This can produce slower diffing. typing_before.ml takes 10% more
instructions and slow_before.rs takes 110% more instructions.
2022-08-21 16:34:17 +07:00
Wilfred Hughes a71d6118cf Store predecessors and neighbours as mutable fields in graph nodes
This is a more traditional graph representation. It is slightly easier
to reason about, and it's clearer that graph node creation time
dominates graph exploration.

This is a slight performance regression, but it enables better
exploration of parenthesis nesting (see next commit). typing_before.ml
has regressed from 3.75B instructions to 3.85B instructions and
slow_before.rs has regressed from 1.73B instructions to 2.15B
instructions.

This change has also made the diff output for slow_before.rs slightly
worse (note the `lhs` variable is now claimed as changed in more
cases). It's not clear why, but presumably means that the node visit
order has changed slightly.

Closes #324
2022-08-21 16:25:54 +07:00
Wilfred Hughes 58c8f47298 Also consider highlights.scm when marking nodes as comments
This removes the need to special-case Perl, and is necessary for
CMake (which has nodes bracket_comment and line_comment that aren't
marked as 'extra').
2022-08-20 18:28:07 +07:00