difftastic

Commit Graph

Author	SHA1	Message	Date
Wilfred Hughes	797af40ae8	Improve Java highlighting	2023-07-27 08:33:38 +07:00
Wilfred Hughes	7caaaf7fcf	Handle nested sliders correctly when preferring the outer delimiter Previously we didn't check the state of children, which was an oversight from the original implementation. As a result, we fixed nested sliders in fewer situations. Fixes #535	2023-07-14 08:49:55 +07:00
Wilfred Hughes	5606c04261	Treat qualified modules and variables as atoms in Haskell	2023-07-12 12:34:39 +07:00
Wilfred Hughes	a814e01d22	Improve word diffing heuristic and add another sample file	2023-07-12 12:12:32 +07:00
Wilfred Hughes	c2b7042b80	Do subword highlighting in more cases This is useful when two strings substantially differ, but have the same e.g. end.	2023-07-10 21:26:24 +07:00
Wilfred Hughes	5824322244	Require some common words to do subword highlighting This is important when comparing short string literals. This change has improved several cases in sample_files/ but I've added a new example that made the previous unwanted behaviour much more obvious.	2023-07-10 09:03:21 +07:00
Wilfred Hughes	27f59c0b3a	Don't treat - as a word constituent This produces slightly better results with some string replacements.	2023-07-08 17:16:14 +07:00
Zhenge Chen	ffd49d523a	Detect replaced strings If a string is replaced with another, apply subword highlighting similar to how we handle replaced comments. Co-authored-by: Wilfred Hughes <me@wilfred.me.uk>	2023-07-08 17:16:06 +07:00
Wilfred Hughes	87d27c5598	Only split numbers inside comments Inside text files, it seems to be better to be conservative and consider abc123def as one word rather than three. This is noticeable when looking at changes to the compare.expected file, which contains hashes. 123c456 and 345c789 don't really have a `c` in common, so subword highlighting is ugly.	2023-07-07 08:40:06 +07:00
Wilfred Hughes	c07e640b24	Remove contiguous penalty The contiguous penalty was an attempt to fix the slider problem: // Old A B C D // New A B A B C D // Unwanted diff A +B+ +A+ B C D However, it doesn't make sense for Dijkstra, which is stateless. The best route from vertex X is independent of how we got to vertex X. This worked by dumb luck: in some circumstances we terminate early rather than fully executing Dijkstra's algorithm. This cost tweak improved results on a few test files. However, the post-processing slider logic is a proper, general solution. This was added much later. There's no reason to keep the contiguous penalty now. It's confusing, and makes adding new edge costs with consistent 'X costs more than Y' behaviours more difficult. Performance is essentially neutral: a small decrease in typing_before.ml, a small increase in slow_before.rs.	2023-07-06 08:37:02 +07:00
Wilfred Hughes	43c24047b4	Don't track contiguous status on novel delimiter edges This is harder to reason about, and `2e6666041f` did not include a motivating test case. Removing contiguous status is a minor perf improvement (2% reduction in instructions), makes the code simpler, and does not significantly affect diffing results. Of the two sample files that have changed, the erlang_before.erl file has improved and nest_before.rs is neutral.	2023-07-04 23:53:16 +07:00
Wilfred Hughes	c405b58327	Fix cost for ReplacedComment This needs to be 2x novel nodes, or we prefer it far too often.	2023-07-02 23:12:31 +07:00
Wilfred Hughes	3730580ca3	Improve word splitting heuristics This is particularly noticeable when diffing comments with timestamps 2000-12-31T23:59:59 where we don't want 31T23 to be a single word.	2023-06-29 08:33:30 +07:00
Wilfred Hughes	8b842387a1	Don't clean trailing newline before diffing Difftastic should take the user's input as-is, or it risks performing an incorrect diff in both textual and syntactic diffing. Fixes #499	2023-03-30 08:46:11 +07:00
Wilfred Hughes	0ec2f3a319	Add test case that reproduces #499	2023-03-24 23:19:13 +07:00
Wilfred Hughes	3263612150	Update expected output for latest QML grammar	2023-03-17 00:55:59 +07:00
Wilfred Hughes	7fbe0d6c2f	Improve UTF-16 detection heuristics and add test	2023-03-16 00:31:58 +07:00
Wilfred Hughes	3e4df7d7dd	Add CLI test for the example in #433	2023-03-16 00:13:41 +07:00
Wilfred Hughes	a0f7ed5e78	Set explicit locales for ordering globs in integration test	2023-03-15 15:47:27 +07:00
Karl Ding	4a861c376e	Manually order expected sample comparison file It appears that the GitHub Actions runner is returning the glob path results in a different ordering than the ordering obtained when locally running the compare_all.sh script. This difference in the ordering causes CI to fail due to differences to the generated expectation file. This also seems to have been an issue in previous PRs---the solution here is likely to sort the output before processing or figure out what shell options cause the difference in glob ordering, and explicitly set those in the shell script to eliminate the difference (or prevent the script from inheriting anything but shell defaults). For now, try reordering the output by hand to match the ordering the GitHub Action runner likely expects.	2023-03-14 21:46:40 +07:00
Karl Ding	6b2947b4e9	Add Ada "Hello World" sample file	2023-03-14 21:46:40 +07:00
Wilfred Hughes	2d1a2c906e	Count errors on the root node too Fixes #377	2023-03-03 00:25:41 +07:00
Wilfred Hughes	045d6a2c58	Treat Newick and Racket as lisps	2023-03-03 00:23:11 +07:00
Wilfred Hughes	03985066f5	Treat Makefile text as atoms Improves another case identified in #476	2023-03-02 23:52:01 +07:00
Wilfred Hughes	9556cd978e	Merge branch 'delehef/master'	2023-02-21 08:46:07 +07:00
Franklin Delehelle	21ded51e90	Add newick example files	2023-02-21 08:45:49 +07:00
6cdh	5e659e2d98	add racket highlight	2023-02-21 08:30:00 +07:00
6cdh	3c418168a4	fix bug in here_string	2023-02-12 16:45:38 +07:00
6cdh	2fade3e0bf	fix here string	2023-02-12 13:57:45 +07:00
6cdh	fe756905bf	added Racket support	2023-02-12 13:39:58 +07:00
Wilfred Hughes	96fc044e6d	Improve syntax highlighting `@include` and `@exception` are both used for highlighting keywords in several languages.	2023-02-10 08:50:43 +07:00
Wilfred Hughes	0f1d323bf3	Disable lua-match? highlighting predicates This fixes inaccurate type highlighting in Scala. See #310	2023-02-10 08:45:41 +07:00
Wilfred Hughes	5db907f393	Merge pull request #481 from hugo-vrijswijk/master Update tree-sitter-scala	2023-02-08 17:52:46 +07:00
Wilfred Hughes	df6f4618b6	Update expected output for `62da0d56cc`	2023-02-08 17:51:28 +07:00
Wilfred Hughes	18b46812c0	Update C++ expected output for `63cf71641a`	2023-02-08 17:49:05 +07:00
Hugo van Rijswijk	afad33c71e	update compare.expected	2023-02-08 11:16:19 +07:00
Wilfred Hughes	7d5afd78dc	Respect --error-limit when parsing Next step for #472	2023-02-04 22:29:18 +07:00
Wilfred Hughes	5ed4bac8a5	Add support for R Fixes #470	2023-01-26 08:50:00 +07:00
Wilfred Hughes	9ce60140ce	Update screenshot and ensure sample files match	2023-01-22 20:19:36 +07:00
Wilfred Hughes	0e3c57c64a	Skip unique items before computing Myers diff on text This substantially improves performance on text files where there are few lines in common. For example, 10,000 line files with no lines in common is more than 10x faster (8.5 seconds to 0.49 seconds on my machine), and sample_files/huge_cpp_before.cpp is nearly 2% faster. Fixes the case mentioned by @quackenbush in #236. This is inspired by the heuristics discussions at https://github.com/mitsuhiko/similar/issues/15	2023-01-15 11:38:02 +07:00
Wilfred Hughes	efec759504	Only set language_used after a full syntactic diff This fixes cases where the language is detected but the file hits the byte limit. Fixes #462.	2023-01-14 12:52:08 +07:00
Wilfred Hughes	63a3bf0c91	Ensure we use the correct config for sublanguage parsing Otherwise get the wrong node names for atoms.	2023-01-08 22:24:43 +07:00
Wilfred Hughes	8ed4fbccfa	Treat colour values (e.g. `#FFF`) as atoms in CSS	2023-01-08 22:22:46 +07:00
Wilfred Hughes	34967f588d	Treat predefined_type as an atom in TypeScript Currently it contains a nested string node, even though it's a fixed set of known types. This was preventing us from applying good syntax highlighting. This was particularly noticeable with `string`, which wasn't previously highlighted as a type.	2023-01-07 22:43:50 +07:00
Wilfred Hughes	cd87796552	Treat doctype nodes as atoms in HTML The tree-sitter parser doesn't include the text after DOCTYPE in the inner tag.	2023-01-03 08:40:39 +07:00
Steinar H. Gunderson	9133918dd4	Support parsing of sub-languages. This allows given nodes (configurable per-language, using tree-sitter's query syntax) to be re-parsed as other languages. The canonical example is CSS or JavaScript inside HTML, which normally would be a single token but now can get the full range of syntax highlighting and tree diffing. The config sets this up for only two languages: HTML (contains CSS or JavaScript in <script> or <style> tags; we don't support style="" or onclick="" etc. at this point), and Makefiles (contains Bash in $(shell ...) commands). The latter is fairly obscure; the big win is in the former. It would be nice to also have this support for PHP; however, the HTML parser seems to be a bit confused when asked to parse the partial HTML blocks we get if we just mark the "text" blocks as HTML, so for this to work well, probably the PHP blocks should be parsed as sub-languages of HTML instead of vice versa. Also, as a minor quibble, there should be support for bash in Perl's backticks (similar to in Makefiles), but the tree-sitter Perl parser does not support backticks at all (it goes into error recovery). There may have been languages that I've missed, e.g. some languages might have nodes that contain e.g. SQL. Fixes #382. Potentially relevant to #376.	2023-01-03 08:31:48 +07:00
Wilfred Hughes	0fc1842595	Improve word highlighting heuristics in comments Previously we highlighted changed whitespace, which led to ugly results if the number of words changed (there was a different number of whitespace characters so some were highlighted). Also treat _ and - as word constituents, as it produces nicer results when people write example CLI invocations in comments.	2023-01-02 16:56:31 +07:00
Wilfred Hughes	e8e5ca8e47	Replace tabs during display, so parsing sees the original source Fixes #350	2023-01-01 22:44:47 +07:00
Wilfred Hughes	00ecf36a22	Pop delimiters immediately, rather than having ExitDelimiter* edges @QuarticCat observed that popping delimiters is unnecessary, and saw a speedup in PR #401. This reduces the number of nodes in typical graphs by ~20%, reducing runtime and memory usage. This works because there is only one thing we can do at the end of a list: pop the delimiter. The syntax node on the other side does not give us more options, we have at most one. Popping all the delimiters as soon as possible is equivalent, and produces the same graph route. This change has also slightly changed the output of sample_files/slow_after.rs, producing a better (more minimal) diff. This is probably luck, due to the path-dependent nature of the route solving logic, but it's a positive sign. A huge thanks to @QuarticCat for their contributions, this is a huge speedup. Co-authored-by: QuarticCat <QuarticCat@pm.me>	2022-12-28 02:00:09 +07:00
Wilfred Hughes	afc78e976d	Document Erlang support and add test Fixes #394	2022-12-15 23:30:45 +07:00
rhirano0715	436edb2ab4	Only add colour to the first hunk header This helps skimming the results when multiple files are changed with multiple hunks. It makes the file changing more prominent than just going from e.g. 5/5 to 1/10. Fixes #400 Acked-by: Wilfred Hughes <me@wilfred.me.uk>	2022-12-01 09:38:36 +07:00
Wilfred Hughes	2e7c90c472	Ensure line wrapping uses the same length on both sides Closes #421	2022-11-13 00:35:06 +07:00
Wilfred Hughes	28c3b0ef5d	Tweak line number styling to make it more distinct from content Dim line numbers for unchanged lines, and make changed lines bold (in addition to the existing red/green colours). Closes #384	2022-10-28 20:34:36 +07:00
Wilfred Hughes	2a3346e338	Use apply_line_number_color consistently on LHS and RHS Previously we missed a case on the LHS.	2022-10-28 20:18:17 +07:00
Wilfred Hughes	490787fe28	Factor out line number styling	2022-10-28 19:07:51 +07:00
Wilfred Hughes	b9d44ae65f	Treat error nodes as atoms Fixes #408	2022-10-15 22:50:08 +07:00
Wilfred Hughes	39bd04002c	Merge pull request #369 from esawady/hare Add Hare support	2022-09-15 09:33:07 +07:00
Wilfred Hughes	cafd672cc8	Don't underline all changes in plaintext files Fixes #371	2022-09-15 09:30:16 +07:00
Ember Sawady	7ed685ae52	Add support for Hare	2022-09-13 23:34:16 +07:00
Wilfred Hughes	3c51f58d8e	Add Pascal support Fixes #365	2022-09-13 00:05:23 +07:00
Wilfred Hughes	f155a27522	Underline changed words in comments This makes them easier to spot in larger changes. Fixes #328	2022-09-10 15:54:04 +07:00
Yuya Nishihara	84f0b25fb6	Add support for QML QML is a UI language, and its syntax is basically JSON-like structure + JavaScript. The tree-sitter parser is named after the upstream grammar file qmljs.g, but the canonical language name is QML. So I choose Qml as the Language enum. https://doc.qt.io/qt-6/qmlapplications.html	2022-09-10 11:38:35 +07:00
Wilfred Hughes	fe5ef8757d	Give novel punctuation a lower edge cost We'd rather see an unchanged variable name than an unchanged comma. Fixes #366	2022-09-09 09:47:53 +07:00
Yuya Nishihara	cc2d354768	Unset LC_ALL and LC_COLLATE to stabilize regression test output I set LC_COLLATE=C in ~/.profile, which appears to change the glob order. LC_ALL would also affect that, so let's unset both.	2022-09-08 22:37:27 +07:00
Wilfred Hughes	b104c4be10	Fix sliders in a single global pass Previously we fixed sliders in each 'possibly changed' region. This meant that we couldn't fix sliders that needed to move outside the region. The most common case was code of the form `foo, bar, baz` where `, baz` was unchanged but we wanted to slide to `,`. We now call `fix_all_sliders` for the toplevel tree on both sides. This required some minor changes to the slider logic, as the unchanged/novel regions could occur at any level of the tree. (It was probably also the case that we were missing slider opportunities previously, because we terminated as soon as we found an outer slider for the nested case.) This change has no performance impact, probably because tree diffing is vastly more expensive (O(N^2)) than sliders (O(N)). Fixes #327	2022-09-02 18:10:09 +07:00
Wilfred Hughes	eabefd5612	Factor out language name pretty-printing	2022-09-02 11:56:51 +07:00
Wilfred Hughes	b31e3c78c1	Add sample files missing from `b1b3756fa7`	2022-09-01 09:21:25 +07:00
Wilfred Hughes	b1b3756fa7	Attempt to detect and decode UTF-16 files too Closes #345	2022-08-28 15:38:57 +07:00
Wilfred Hughes	09334030ab	Fix incorrect line number being used in side-by-side display Fixes #334	2022-08-22 09:34:34 +07:00
Wilfred Hughes	026b2674d0	Update expected output The previous two commits were done in a branch that was rebased, so the integration tests missed `5fe6d551d`.	2022-08-21 21:36:00 +07:00
Wilfred Hughes	c957818514	Explore two graph nodes for each parenthesis position This produces substantially better diff results, and fixes the 'last item in the list shown as changed' problem. This can produce slower diffing. typing_before.ml takes 10% more instructions and slow_before.rs takes 110% more instructions.	2022-08-21 16:34:17 +07:00
Wilfred Hughes	a71d6118cf	Store predecessors and neighbours as mutable fields in graph nodes This is a more traditional graph representation. It is slightly easier to reason about, and it's clearer that graph node creation time dominates graph exploration. This is a slight performance regression, but it enables better exploration of parenthesis nesting (see next commit). typing_before.ml has regressed from 3.75B instructions to 3.85B instructions and slow_before.rs has regressed from 1.73B instructions to 2.15B instructions. This change has also made the diff output for slow_before.rs slightly worse (note the `lhs` variable is now claimed as changed in more cases). It's not clear why, but presumably means that the node visit order has changed slightly. Closes #324	2022-08-21 16:25:54 +07:00
Wilfred Hughes	58c8f47298	Also consider highlights.scm when marking nodes as comments This removes the need to special-case Perl, and is necessary for CMake (which has nodes bracket_comment and line_comment that aren't marked as 'extra').	2022-08-20 18:28:07 +07:00
Wilfred Hughes	01cce54978	Fix path display when called from git with two arguments Fixes #332	2022-08-18 23:00:13 +07:00
Wilfred Hughes	0dce9fcec5	Update regression tests following `38c6718c86`	2022-07-11 22:14:12 +07:00
Wilfred Hughes	7e34d7073b	Update regression tests for new JSON upstream highlighting	2022-07-10 23:36:11 +07:00
Wilfred Hughes	c5a5555862	Update Gleam parser	2022-07-10 22:58:50 +07:00
Wilfred Hughes	d05a3d9373	Add Julia sample files	2022-07-04 19:57:00 +07:00
Wilfred Hughes	975ff6eedd	Update regression tests now that @conditional is highlighted This is mostly `if` keywords in various positions now being highlighted as keywords.	2022-07-04 19:54:00 +07:00
Wilfred Hughes	719654d462	Merge pull request #301 from lilydjwg/master use unicode-width to align CJK characters	2022-07-04 15:07:25 +07:00
Wilfred Hughes	156c701459	Add large files from #293 for test This file pair exposed a bunch of perf issues, so it's useful to keep it around.	2022-07-03 22:18:30 +07:00
Wilfred Hughes	2d43075841	HTML: include doublequotes in attribute atoms	2022-07-03 22:11:25 +07:00
lilydjwg	0648b0a6cf	add sample_files for Chinese (CJK fullwidth characters) One of the other three "expected" updates is caused by a fullwidth emoji and others are removal of colored empty strings.	2022-07-04 12:01:49 +07:00
Benjamin Manns	d131ae1d35	Add HTML parser	2022-07-01 12:23:20 +07:00
Wilfred Hughes	3eada5b9b0	Prefer outer delimiter in lisps	2022-05-11 11:54:02 +07:00
Wilfred Hughes	ca1dbbc264	Update expected output file for `902c30f6c`	2022-05-11 11:45:48 +07:00
Wilfred Hughes	1a6c5b8e7f	Display rename information when before and after paths are different	2022-05-08 11:52:42 +07:00
Wilfred Hughes	2d8e1cf180	Merge pull request #279 from Xuanwo/fix_bad_padding fix: Bad padding of column numbers at the end of files	2022-05-07 11:27:47 +07:00
cherryblossom	defc084637	Add Elvish support Add support for [Elvish](https://elv.sh).	2022-05-07 20:12:43 +07:00
Xuanwo	5cfe53820b	chore: Update compare expected Signed-off-by: Xuanwo <github@xuanwo.io>	2022-05-03 14:45:25 +07:00
Wilfred Hughes	03c5d78650	Treat perl regexes as atoms too	2022-04-29 18:28:01 +07:00
Wilfred Hughes	3bb5933163	Ensure Perl comments are treated as atoms with an atom kind of comment	2022-04-29 18:23:31 +07:00
Wilfred Hughes	e1cbdc1478	Allow users to override the tab width Fixes #274	2022-04-28 20:47:04 +07:00
Wilfred Hughes	62e5b21d53	Merge remote-tracking branch 'cherryblossom/swift'	2022-04-28 09:12:54 +07:00
Wilfred Hughes	f98f2a8aca	Fix directory diffing when files were only present on one side This particularly helps usage with mercurial when files are added or removed. Fixes #272	2022-04-27 21:46:46 +07:00
cherryblossom	b87d6c99f7	Add Swift support	2022-04-26 17:08:23 +07:00
Wilfred Hughes	a9af73d944	Add a second file to the test directory	2022-04-24 20:13:17 +07:00
Wilfred Hughes	b2320b29d5	Merge pull request #264 from Xuanwo/hcl feat: Add HCL support	2022-04-24 08:48:34 +07:00
Xuanwo	d4c3d114dc	fix: Add atoms for hcl Signed-off-by: Xuanwo <github@xuanwo.io>	2022-04-24 15:57:51 +07:00
Wilfred Hughes	f91357b729	Update perl regression test	2022-04-23 11:33:01 +07:00
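
Note on commit 0e3c57c64a ("Skip unique items before computing Myers diff on text"): the idea described in that message can be sketched as follows. This is a minimal illustration only, not difftastic's implementation (which is written in Rust). The function names `shared_lines` and `matched_lines` are hypothetical, and Python's `difflib.SequenceMatcher` stands in for a Myers diff. Lines that occur on only one side can never be matched, so they are set aside before the expensive comparison runs.

```python
import difflib


def shared_lines(lines, other):
    """Keep (original_index, line) pairs for lines that also occur in `other`."""
    other_set = set(other)
    return [(i, line) for i, line in enumerate(lines) if line in other_set]


def matched_lines(lhs, rhs):
    """Return (lhs_index, rhs_index) pairs for lines treated as unchanged.

    Lines unique to one side are skipped up front, so the matcher only
    sees lines that could possibly pair up.
    """
    lhs_shared = shared_lines(lhs, rhs)
    rhs_shared = shared_lines(rhs, lhs)

    # Stand-in for a Myers diff: difflib's matcher over the reduced inputs.
    matcher = difflib.SequenceMatcher(
        None,
        [line for _, line in lhs_shared],
        [line for _, line in rhs_shared],
        autojunk=False,
    )

    pairs = []
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            # Map positions in the reduced lists back to original line indices.
            pairs.append((lhs_shared[block.a + k][0], rhs_shared[block.b + k][0]))
    return pairs


if __name__ == "__main__":
    before = ["a", "x", "b", "y", "c"]
    after = ["a", "p", "b", "q", "c"]
    print(matched_lines(before, after))
```

When the two files have no lines in common, both reduced lists are empty and the matcher has nothing to do, which is the case the commit message reports as more than 10x faster.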

275 Commits (3c62ff37c04b1f8a6c625a0c8ac11a04d284d939)
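
Note on commits 3730580ca3 and 87d27c5598 (word splitting): the behaviour those messages describe can be sketched roughly as below. This is an assumption-laden illustration, not the project's code. The function name `split_words` and the `split_numbers` flag are hypothetical, and the regular expressions are one possible reading of the messages: inside comments, digit runs and letter runs become separate words (so `31T23` is not a single word), while in plain text an alphanumeric run such as `abc123def` stays whole.

```python
import re

# Hypothetical patterns. Digits and letters are separate tokens when splitting
# numbers; otherwise any run of word characters is a single token.
# Whitespace runs and single punctuation characters are tokens in both modes.
SPLIT_NUMBERS = re.compile(r"\d+|[^\W\d]+|\s+|[^\w\s]")
KEEP_NUMBERS = re.compile(r"\w+|\s+|[^\w\s]")


def split_words(text, split_numbers):
    """Split `text` into word-like tokens for subword highlighting."""
    pattern = SPLIT_NUMBERS if split_numbers else KEEP_NUMBERS
    return pattern.findall(text)


if __name__ == "__main__":
    # Comment-style splitting: the timestamp breaks at every digit/letter boundary.
    print(split_words("2000-12-31T23:59:59", split_numbers=True))
    # Text-style splitting: a hash-like token stays in one piece.
    print(split_words("abc123def", split_numbers=False))
```

Under these assumptions the timestamp splits into `2000`, `-`, `12`, `-`, `31`, `T`, `23`, `:`, `59`, `:`, `59`, and `abc123def` is kept as one word when `split_numbers` is false.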