difftastic

Commit Graph

Author	SHA1	Message	Date
Wilfred Hughes	9134593a39	Add XML support Fixes #10	2023-09-08 23:43:20 +07:00
Wilfred Hughes	d56f775f31	Highlight constructors consistently with type names	2023-09-03 01:30:22 +07:00
Wilfred Hughes	a4ee2cf99e	cargo fmt	2023-08-26 21:41:41 +07:00
Wilfred Hughes	b78ba2da4b	Use type names from line_numbers directly	2023-08-26 20:36:07 +07:00
Wilfred Hughes	41c9165c79	Use my line_numbers crate for newline position calculations	2023-08-26 16:25:32 +07:00
Wilfred Hughes	ca44de78e1	Group overrides from the same language together No functional change, but makes --list-languages easier to read. Fixes #549	2023-08-25 08:22:28 +07:00
Wilfred Hughes	0db99d76c6	Allow a language override to include multiple globs	2023-08-24 08:47:59 +07:00
eth3lbert	b6d8ecbd4f	feat: display commit info in --version (#558) This improves --version output for #554.	2023-08-18 08:10:47 +07:00
Wilfred Hughes	803a3a673c	Improve variable names	2023-08-18 00:28:17 +07:00
Alex Krantz	11a96e5aec	Add JSON cli flag	2023-08-17 08:49:59 +07:00
Wilfred Hughes	11f457b5f9	Fix typo	2023-08-16 21:20:17 +07:00
Wilfred Hughes	191f42e9d5	Clippy fixes	2023-08-15 21:42:06 +07:00
Wilfred Hughes	6b1c82efdf	Prefer Option<&T> over &Option<T>	2023-08-15 21:37:41 +07:00
Wilfred Hughes	a43b9ae9eb	Dim the extra information section in hunks	2023-08-15 21:33:11 +07:00
Wilfred Hughes	e1f97e614f	Improve wording of conflict information Fixes #555	2023-08-15 17:52:02 +07:00
Wilfred Hughes	e0a1405453	Add the ability to parse conflict markers and diff the two files	2023-08-15 09:01:15 +07:00
Wilfred Hughes	f06e95ca02	Renamed `old_path` to `extra_info` and format it during option parsing This allows us to use this field for other purposes that aren't renames.	2023-08-14 08:41:42 +07:00
Wilfred Hughes	f1ba399504	Move local variable closer to first use	2023-08-14 08:27:42 +07:00
Wilfred Hughes	eeb2974967	Move option parsing before argument parsing This is useful for additional mode parsing that wants to access these options.	2023-08-13 21:34:42 +07:00
Wilfred Hughes	1c60f3efd3	Move content detection out of diff_file_content This makes the function useful in cases when we already have a string, not bytes.	2023-08-13 21:31:37 +07:00
Wilfred Hughes	3c702d0490	Use humansize for file size formatting	2023-08-12 22:34:11 +07:00
Wilfred Hughes	5f25bc0ebd	Rename information in header should only be shown on first hunk Fixes #553	2023-08-11 08:21:29 +07:00
Wilfred Hughes	a187d7a134	Improve rename styling It should use the heading with colour, consistent with other modes, and the header should come before rename information.	2023-08-08 08:53:33 +07:00
Wilfred Hughes	ba92a93f9b	Fix rustc warning on recent nightly	2023-08-04 23:31:31 +07:00
Wilfred Hughes	19cbf1d458	Implement some other useful traits on EqOnFirstItem These aren't immediately used, but they're handy for experimenting with the similar library which requires these.	2023-08-04 23:29:29 +07:00
Wilfred Hughes	892d4fdb58	Ensure size_hint never exceeds graph_limit If we have thousands of syntax nodes on both sides, we can end up attempting to preallocate a very large hashmap. In #542, a user hit an issue with two JSON files where the LHS had 33,000 syntax nodes and the RHS had 34,000 nodes, so we'd attempt to preallocate a hashmap of capacity 1,122,000,000. This required allocating 70,866,960,400 bytes (roughly 66 GiB). Impose a sensible limit on the hashmap. Fixes #542	2023-08-04 17:19:27 +07:00
Wilfred Hughes	c937f819a1	Log the number of bytes in the arena at the end of route finding	2023-08-04 17:04:23 +07:00
Wilfred Hughes	0c01c73398	Be consistent in lifetime names for Vertex	2023-08-03 08:32:16 +07:00
Wilfred Hughes	757c297412	Adjust header style Show the hunk count and detected language in a dimmed style. This information is less important than the diff content itself, so this change makes the important information more prominent. First part of #544	2023-07-31 08:35:27 +07:00
Wilfred Hughes	797af40ae8	Improve Java highlighting	2023-07-27 08:33:38 +07:00
Wilfred Hughes	4e9637c861	Check more bytes when detecting encoding I've observed PDF files that have sufficiently large headers that they were detected as text, which wasn't helpful. Also improve logging to report how many invalid bytes were found.	2023-07-21 08:34:41 +07:00
Wilfred Hughes	4f750ec359	Clarify how to find language names in argument help	2023-07-21 08:23:36 +07:00
Wilfred Hughes	685a2ef8d5	Merge remote-tracking branch 'grunweg/master'	2023-07-20 22:41:56 +07:00
Wilfred Hughes	7caaaf7fcf	Handle nested sliders correctly when preferring the outer delimiter Previously we didn't check the state of children, which was an oversight from the original implementation. As a result, we fixed nested sliders in fewer situations. Fixes #535	2023-07-14 08:49:55 +07:00
Wilfred Hughes	a5d3cb55b7	Treat constructors consistently with variables in Haskell atoms	2023-07-12 17:34:42 +07:00
Wilfred Hughes	8614910fe2	cargo fmt	2023-07-12 16:45:58 +07:00
Wilfred Hughes	f6ceb2aefd	Update unit test for new subword highlighting heuristic	2023-07-12 12:48:45 +07:00
Wilfred Hughes	5606c04261	Treat qualified modules and variables as atoms in Haskell	2023-07-12 12:34:39 +07:00
Wilfred Hughes	a814e01d22	Improve word diffing heuristic and add another sample file	2023-07-12 12:12:32 +07:00
Wilfred Hughes	1d3b6836ef	Handle multiline atoms more accurately in split_atom_words	2023-07-12 11:49:39 +07:00
Wilfred Hughes	c2b7042b80	Do subword highlighting in more cases This is useful when two strings substantially differ, but have the same e.g. end.	2023-07-10 21:26:24 +07:00
Wilfred Hughes	5824322244	Require some common words to do subword highlighting This is important when comparing short string literals. This change has improved several cases in sample_files/ but I've added a new example that made the previous unwanted behaviour much more obvious.	2023-07-10 09:03:21 +07:00
Wilfred Hughes	4aca79f220	Use the raw_entry_mut API on hashbrown::HashMap This saves us searching the hash map twice. This is a modest performance improvement: an instruction count reduction of 4% on slow_before.rs, and 1% reduction on typing_before.ml.	2023-07-09 22:49:37 +07:00
Wilfred Hughes	8eb949eb02	Use DftHashMap everywhere This is a 4% reduction in instructions for typing_before.ml, but a 0.2% increase in instructions for slow_before.rs. This seems like a win overall, and it also keeps the codebase more consistent and simpler.	2023-07-09 15:41:01 +07:00
Wilfred Hughes	d9911e0b49	Move DftHashMap to a separate file	2023-07-09 15:37:51 +07:00
Wilfred Hughes	f2456a12b2	Use hashbrown for the alloc_if_new data This was intended to allow usage of .entry_ref(), but it's already a performance win without using that API! It's around a 9% reduction in instructions in slow_before.rs, and 2% reduction in typing_before.ml.	2023-07-09 11:11:03 +07:00
Wilfred Hughes	27f59c0b3a	Don't treat - as a word constituent This produces slightly better results with some string replacements.	2023-07-08 17:16:14 +07:00
Wilfred Hughes	2607d17d73	Fix spelling in comment	2023-07-08 17:16:14 +07:00
Zhenge Chen	ffd49d523a	Detect replaced strings If a string is replaced with another, apply subword highlighting similar to how we handle replaced comments. Co-authored-by: Wilfred Hughes <me@wilfred.me.uk>	2023-07-08 17:16:06 +07:00
Wilfred Hughes	f86ba13abf	Increase punctuation cost to 200	2023-07-08 14:59:47 +07:00
Wilfred Hughes	495dbe5b14	Improve comments in Edge::cost	2023-07-08 14:53:33 +07:00
Wilfred Hughes	574fb5bd50	Fix clippy lint	2023-07-07 23:52:51 +07:00
Wilfred Hughes	53855e415e	Reduce copying further in set_neighbours This saves a remarkable 8.5% of instructions on slow_before.rs.	2023-07-07 23:37:16 +07:00
Wilfred Hughes	a180fd6d24	Don't return the neighbours inside get_set_neighbours This caused unnecessary cloning, costing 0.2% instructions in some cases, and also made the code less readable.	2023-07-07 23:29:51 +07:00
Wilfred Hughes	87d27c5598	Only split numbers inside comments Inside text files, it seems to be better to be conservative and consider abc123def as one word rather than three. This is noticeable when looking at changes to the compare.expected file, which contains hashes. 123c456 and 345c789 don't really have a `c` in common, so subword highlighting is ugly.	2023-07-07 08:40:06 +07:00
Wilfred Hughes	c07e640b24	Remove contiguous penalty The contiguous penalty was an attempt to fix the slider problem: // Old A B C D // New A B A B C D // Unwanted diff A +B+ +A+ B C D However, it doesn't make sense for Dijkstra, which is stateless. The best route from vertex X is independent of how we got to vertex X. This worked by dumb luck: in some circumstances we terminate early rather than fully executing Dijkstra's algorithm. This cost tweak improved results on a few test files. However, the post-processing slider logic is a proper, general solution. This was added much later. There's no reason to keep the contiguous penalty now. It's confusing, and makes adding new edge costs with consistent 'X costs more than Y' behaviours more difficult. Performance is essentially neutral: a small decrease in typing_before.ml, a small increase in slow_before.rs.	2023-07-06 08:37:02 +07:00
Wilfred Hughes	31df177881	Increase the punctuation penalty This ensures that choosing an unchanged non-punctuation atom with some novel atoms is better than choosing punctuation and some changed comments. This produces better results in general, see comma_and_comment_after.js for an example. This will be more noticeable after the next commit, where costs of novel atoms are in a smaller range of values.	2023-07-06 08:16:24 +07:00
Wilfred Hughes	c3016eca4a	Add TODO	2023-07-06 08:14:03 +07:00
Wilfred Hughes	43c24047b4	Don't track contiguous status on novel delimiter edges This is harder to reason about, and `2e6666041f` did not include a motivating test case. Removing contiguous status is a minor perf improvement (2% reduction in instructions), makes the code simpler, and does not significantly affect diffing results. Of the two sample files that have changed, the erlang_before.erl file has improved and nest_before.rs is neutral.	2023-07-04 23:53:16 +07:00
Wilfred Hughes	1e4d1828c7	Store probably_punctuation on unchanged edges This is equivalent (increased cost on unchanged nodes vs decreased cost on changed nodes), but easier to reason about. Previously we had multiple notions of changed atoms: NovelAtomLHS, NovelAtomRHS, and ReplacedComment. We want to consider punctuation as less desirable even when e.g. comments are replaced.	2023-07-03 19:48:31 +07:00
Wilfred Hughes	c405b58327	Fix cost for ReplacedComment This needs to be 2x novel nodes, or we prefer it far too often.	2023-07-02 23:12:31 +07:00
Wilfred Hughes	3730580ca3	Improve word splitting heuristics This is particularly noticeable when diffing comments with timestamps 2000-12-31T23:59:59 where we don't want 31T23 to be a single word.	2023-06-29 08:33:30 +07:00
Wilfred Hughes	9eb48ca661	Configure comments as atoms in latest scala parser	2023-06-12 22:28:01 +07:00
Wilfred Hughes	81ac9d167b	cargo fmt	2023-06-02 08:45:21 +07:00
Gao, Xiang	653660e92f	Treat CUDA as C++ (#522)	2023-05-27 12:30:46 +07:00
Wilfred Hughes	b0a3a0ada9	Use DFT_LOG as the internal logging environment variable Fixes #519	2023-05-21 23:33:34 +07:00
Wilfred Hughes	b6895d42e4	Document the --override option in CHANGELOG and --help This also enables users to disable language parsing. Closes #439 Closes #440	2023-05-15 23:04:20 +07:00
Wilfred Hughes	745253d4d9	cargo fmt	2023-05-15 21:49:07 +07:00
Wilfred Hughes	f302952748	Add unit test for overrides	2023-05-15 17:54:23 +07:00
Wilfred Hughes	325926eead	Update tests for overrides argument	2023-05-15 17:51:39 +07:00
Wilfred Hughes	943a9e91f1	Fully document --override	2023-05-15 08:53:34 +07:00
Wilfred Hughes	c9cabac517	Allow overriding language associations from numbered environment variables	2023-05-15 08:43:53 +07:00
Wilfred Hughes	b3a7bec430	Pass overrides to guess() and display them in --list-languages	2023-05-15 08:20:32 +07:00
Wilfred Hughes	0e2c167dda	Move LanguageOverride to guess_language	2023-05-14 21:49:59 +07:00
Wilfred Hughes	adff907aaa	Pass language_overrides to all modes	2023-05-14 21:46:54 +07:00
Wilfred Hughes	47a0690286	Initial support for parsing --override	2023-05-14 21:44:30 +07:00
Wilfred Hughes	84af470128	Remove --language	2023-05-14 15:57:43 +07:00
Wilfred Hughes	05a1b184ea	Use globbing to match file names in language detection	2023-05-14 15:50:56 +07:00
Wilfred Hughes	893b56b6b1	Consistent casing	2023-05-14 00:56:45 +07:00
Wilfred Hughes	e37a6b2087	Improve naming and ordering of JSX/TSX languages	2023-05-14 00:54:58 +07:00
Wilfred Hughes	4d85b5c15e	Prefer pattern matching and EnumIter for Language rather than lists	2023-05-13 23:46:18 +07:00
Wilfred Hughes	1f16d207f6	Clarify comment	2023-05-13 23:15:26 +07:00
Mike Grunweg	7984b7a59e	Audit node types with children: none should be treated as atoms.	2023-05-05 22:45:53 +07:00
Mike Grunweg	2c2461667b	Configure parsing and language detection.	2023-05-05 22:00:15 +07:00
Wilfred Hughes	87f19f5e10	Don't include trailing newlines in comment nodes This makes constructing hunks harder to reason about. This change doesn't affect output, but helps when debugging, as it makes multiline atoms much less common.	2023-04-30 09:51:39 +07:00
Wilfred Hughes	b5cc0787f4	Clarify debug printing of LineNumber	2023-04-30 09:29:08 +07:00
Wilfred Hughes	faaec9ad4c	Use the SyntaxId type explicitly in ChangeMap	2023-04-22 19:59:03 +07:00
Wilfred Hughes	8d44e91a06	Improve lifetime names	2023-04-22 15:25:45 +07:00
Wilfred Hughes	d521b29c9e	set_prev_sibling should always recurse	2023-04-20 08:42:10 +07:00
Wilfred Hughes	2074b97117	Fix clippy lint	2023-04-19 20:34:33 +07:00
Wilfred Hughes	18ef47e20d	Use a path separator constant that is available on Rust 1.57	2023-04-19 20:04:56 +07:00
Wilfred Hughes	f08e77a675	Prefer a single optional string for old_name when renames occur	2023-04-15 17:20:30 +07:00
Wilfred Hughes	6f9ccd9ec4	Use a single display_path field in the diff options struct	2023-04-15 17:19:56 +07:00
Wilfred Hughes	2224602ad5	Store file renames explicitly in Diff struct	2023-04-15 00:57:27 +07:00
Valentin	4296796053	Run cargo fmt	2023-04-06 09:44:38 +07:00
Valentin	b86d4dbf9e	Add solidity support	2023-04-05 11:12:19 +07:00
Wilfred Hughes	8c004be87b	Remove unnecessary helper function	2023-04-02 19:28:57 +07:00
Wilfred Hughes	cb9367c129	Remove tests that no longer apply after `8b842387a`	2023-03-31 08:19:23 +07:00
Wilfred Hughes	45121f6f6d	Fix clippy warnings	2023-03-31 08:15:18 +07:00
Wilfred Hughes	8b842387a1	Don't clean trailing newline before diffing Difftastic should take the user's input as-is, or it risks performing an incorrect diff in both textual and syntactic diffing. Fixes #499	2023-03-30 08:46:11 +07:00
Wilfred Hughes	713220613e	Support diffing directories where files are only in one side This was broken in `1e9f43768`. Add test. Fixes #500	2023-03-24 23:57:49 +07:00
Wilfred Hughes	1e9f437688	Remove --missing-as-empty	2023-03-17 08:40:21 +07:00
Wilfred Hughes	e4ff6d7d09	Update atom nodes for latest Java	2023-03-17 00:42:19 +07:00
Wilfred Hughes	69e511e638	Tweak UTF-16 heuristics to prefer files with a BOM	2023-03-16 00:46:24 +07:00
Wilfred Hughes	7fbe0d6c2f	Improve UTF-16 detection heuristics and add test	2023-03-16 00:31:58 +07:00
Wilfred Hughes	20ad284882	Add 'vendored_parsers/tree-sitter-clojure/' from commit '421546c2547c74d1d9a0d8c296c412071d37e7ca' Closes #448 git-subtree-dir: vendored_parsers/tree-sitter-clojure git-subtree-mainline: `ebfc043a4a` git-subtree-split: `421546c254`	2023-03-15 15:43:55 +07:00
Wilfred Hughes	a67be0f845	Treat quoted_keys as atoms in TOML	2023-03-15 15:12:33 +07:00
Wilfred Hughes	ec6e4665bc	Merge pull request #494 from karlding/add_ada_support Add Ada support	2023-03-15 10:25:46 +07:00
Karl Ding	01522a9d58	Remove alire.toml from LANG_FILE_NAMES Fix an incorrect assumption about how LANG_FILE_NAMES works. We want the alire.toml file to still be treated as TOML (and not Ada), so adding an entry here is not necessary for Ada support.	2023-03-15 01:09:48 +07:00
Wilfred Hughes	6ad77c620c	Don't highlight text in purple Closes #498	2023-03-14 23:47:17 +07:00
Karl Ding	d5ed2deb6e	Run formatter over changes	2023-03-14 21:46:40 +07:00
Karl Ding	5271f65f92	Add language support for Ada Implement support in difftastic for the Ada programming language using the treesitter grammar provided in 'briot/tree-sitter-ada'. Language detection depends on the following suffixes: * adb * ads * ada The presence of the alire TOML file (alire.toml) is also used as a heuristic.	2023-03-14 21:46:40 +07:00
Stavros Korokithakis	85b9493eaa	Recognize the Arduino extension as C++	2023-03-13 19:44:46 +07:00
Wilfred Hughes	c7bfc72529	Clarify --exit-code behaviour on plain text	2023-03-09 08:54:21 +07:00
Wilfred Hughes	ac35c6c047	Document the different options for --display Fixes #491	2023-03-09 08:54:21 +07:00
Wilfred Hughes	2d1a2c906e	Count errors on the root node too Fixes #377	2023-03-03 00:25:41 +07:00
Wilfred Hughes	045d6a2c58	Treat Newick and Racket as lisps	2023-03-03 00:23:11 +07:00
Wilfred Hughes	03985066f5	Treat Makefile text as atoms Improves another case identified in #476	2023-03-02 23:52:01 +07:00
Wilfred Hughes	54eb22ea98	Use FileFormat consistently everywhere	2023-03-02 22:25:14 +07:00
Wilfred Hughes	dae8c8a3b5	Update parameter names for type	2023-03-02 22:01:13 +07:00
Wilfred Hughes	133c05e46b	Define an explicit enum for file formats	2023-03-02 21:58:33 +07:00
Wilfred Hughes	c33d7f2520	Support --check-only on text files too	2023-03-02 08:50:45 +07:00
Wilfred Hughes	ca2902cee2	Add more doc comments to FileContent	2023-03-02 08:35:45 +07:00
Wilfred Hughes	e99b2ce27c	Represent byte limit and parse error limit as Result return types	2023-02-24 08:38:15 +07:00
Wilfred Hughes	9556cd978e	Merge branch 'delehef/master'	2023-02-21 08:46:07 +07:00
6cdh	5e659e2d98	add racket highlight	2023-02-21 08:30:00 +07:00
Franklin Delehelle	a0b9df0e29	Add support for Newick tree files	2023-02-16 15:52:16 +07:00
6cdh	2fade3e0bf	fix here string	2023-02-12 13:57:45 +07:00
6cdh	fe756905bf	added Racket support	2023-02-12 13:39:58 +07:00
Wilfred Hughes	96fc044e6d	Improve syntax highlighting `@include` and `@exception` are both used for highlighting keywords in several languages.	2023-02-10 08:50:43 +07:00
Wilfred Hughes	62da0d56cc	Fix error message	2023-02-05 22:43:58 +07:00
Wilfred Hughes	952d55ee22	Don't bother logging parse errors	2023-02-05 22:03:08 +07:00
Wilfred Hughes	96a80c4d21	Improve naming for parse error limit	2023-02-05 17:30:24 +07:00
Wilfred Hughes	63cf71641a	Display file size in the header if it's too big	2023-02-05 17:28:39 +07:00
Wilfred Hughes	7d5afd78dc	Respect --error-limit when parsing Next step for #472	2023-02-04 22:29:18 +07:00
Wilfred Hughes	7ec0c2a956	Remove unnecessary concat! calls	2023-02-04 22:02:28 +07:00
Wilfred Hughes	553a1ef231	Count error nodes when walking tree-sitter tree First steps of #472	2023-02-04 16:46:17 +07:00
Wilfred Hughes	f77730005e	Add clarifying comment	2023-02-04 14:09:30 +07:00
Wilfred Hughes	34f21c6d9f	Only run the gzip test on Linux CI This test fails when a MIME database isn't present, such as the Windows CI environment or minimal environments (reported in #478).	2023-02-04 11:14:46 +07:00
Wilfred Hughes	ed0f50fa68	Remove --error-limit to prepare for release	2023-02-02 22:22:23 +07:00
Wilfred Hughes	b6e756ed1a	Merge pull request #473 from Miksu82/master Fix file content type detection for small gzipped files	2023-02-02 22:18:59 +07:00
Wilfred Hughes	b78067a96b	Disable unfinished error_limit logic to prepare for release	2023-02-02 22:11:31 +07:00
Mika Ristimäki	926055634c	Remove application/octet-stream check when checking mime type	2023-02-02 11:08:24 +07:00
Niklas Vogel	abdef09bdd	Check arguments passed from Git for `/dev/null`	2023-02-01 17:53:16 +07:00
Mika Ristimäki	c573094ee5	Make sure gzip files are treated as binary	2023-02-01 11:01:26 +07:00
Mika Ristimäki	289632afcc	Fix formatting	2023-02-01 10:20:47 +07:00
Mika Ristimäki	bb81f0e524	Fix file content type detection for small gzipped files	2023-02-01 10:20:47 +07:00
Wilfred Hughes	92d21dc5c0	Silence another clippy lint	2023-01-30 08:35:24 +07:00
Wilfred Hughes	8e1503a11d	Add an --error-limit CLI option Starts #472.	2023-01-26 23:55:41 +07:00
Wilfred Hughes	cc3f266969	Treat `%in%` as a single atom in R	2023-01-26 22:40:10 +07:00
Wilfred Hughes	5ed4bac8a5	Add support for R Fixes #470	2023-01-26 08:50:00 +07:00
Wilfred Hughes	998c9e94ff	Fix crash on repeated, partially novel lists Fixes #469	2023-01-25 23:54:41 +07:00
Wilfred Hughes	12fe91560c	Add doc comments to slider functions	2023-01-25 23:17:42 +07:00
Wilfred Hughes	6321b8ece2	Consistently append newlines regardless of colour mode	2023-01-25 16:55:12 +07:00
Wilfred Hughes	3291da6be4	Allow colour to be configured with DFT_COLOR too In #468 it's also mentioned that this couldn't be configured with an environment variable.	2023-01-25 16:44:41 +07:00
Wilfred Hughes	4337fbbcd8	Show the default value for --color and --display Fixes #468	2023-01-25 16:40:12 +07:00
Wilfred Hughes	897305ed6b	Allow --ignore-comments to be used with --dump-syntax	2023-01-25 16:31:09 +07:00
Wilfred Hughes	9bdd1964b0	Improve doc comments	2023-01-23 08:47:52 +07:00
Wilfred Hughes	6c3dc6b69e	Fix formatting in comment	2023-01-23 08:38:40 +07:00
Wilfred Hughes	edb51ece86	Disable a clippy warning that we can't satisfy for Syntax	2023-01-22 20:03:44 +07:00
Wilfred Hughes	a37edbab8d	Silence some clippy warnings	2023-01-22 20:03:44 +07:00
Wilfred Hughes	fe68f43e93	Use crossterm for is_tty and terminal width This is actively maintained, handles stdout being directed, and seems to be tested more on Windows too (potentially improving #363).	2023-01-17 00:03:08 +07:00
Wilfred Hughes	dffc72d1ca	cargo fmt	2023-01-15 20:13:08 +07:00
Wilfred Hughes	daa7156a2c	Fix crash with --display=inline and trailing whitespace Line numbers may be less than .max_line(), as .max_line() trims whitespace. Ensure pad_after() is robust to this, and add a test. I could only reproduce the crash in inline display mode, but in principle this could be an issue in all modes. Fixes #452	2023-01-15 20:04:12 +07:00
Wilfred Hughes	e17b6e6109	Clarify wording	2023-01-15 15:56:37 +07:00
Wilfred Hughes	c9105ca0ba	cargo fmt	2023-01-15 15:49:24 +07:00
Wilfred Hughes	b8a2910c26	Reuse the tree-sitter tree for extracting comments This reduces instruction counts by 9% on some test files (e.g. sample_files/typing_before.ml).	2023-01-15 15:47:15 +07:00
Wilfred Hughes	2536fe7396	Factor out parsing to tree in main.rs	2023-01-15 15:42:44 +07:00
Wilfred Hughes	2d6ee94ac1	Split out to_tree and to_syntax functions in parser	2023-01-15 15:39:07 +07:00
Wilfred Hughes	a488efd63b	Add highlighting for ignored syntactic elements This finishes --ignore-comment support. Fixes #449.	2023-01-15 14:49:46 +07:00
Wilfred Hughes	f3b02f7b47	cargo fmt	2023-01-15 11:43:09 +07:00
Wilfred Hughes	0e3c57c64a	Skip unique items before computing Myers diff on text This substantially improves performance on text files where there are few lines in common. For example, diffing 10,000 line files with no lines in common is more than 10x faster (8.5 seconds to 0.49 seconds on my machine), and sample_files/huge_cpp_before.cpp is nearly 2% faster. Fixes the case mentioned by @quackenbush in #236. This is inspired by the heuristics discussions at https://github.com/mitsuhiko/similar/issues/15	2023-01-15 11:38:02 +07:00
Wilfred Hughes	c08eefb14a	Move slice_by_hash to myers_diff and add unit tests	2023-01-15 11:03:31 +07:00
Wilfred Hughes	dd92af3643	Add tests for myers_diff module	2023-01-15 10:55:14 +07:00
Wilfred Hughes	85e630aabc	cargo fmt	2023-01-14 12:52:53 +07:00
Wilfred Hughes	efec759504	Only set language_used after a full syntactic diff This fixes cases where the language is detected but the file hits the byte limit. Fixes #462.	2023-01-14 12:52:08 +07:00
Wilfred Hughes	6eed874362	Clarify field name	2023-01-14 12:30:54 +07:00
Wilfred Hughes	94b6117723	Define an --ignore-comments option and pass to parser Initial work for #449	2023-01-10 08:47:58 +07:00
Wilfred Hughes	08b3ff138f	Rename vendor/ directory Closes #453	2023-01-10 08:35:01 +07:00
Wilfred Hughes	1ad9789b38	clippy fixes	2023-01-10 00:45:07 +07:00
Wilfred Hughes	63a3bf0c91	Ensure we use the correct config for sublanguage parsing Otherwise get the wrong node names for atoms.	2023-01-08 22:24:43 +07:00
Wilfred Hughes	8ed4fbccfa	Treat colour values (e.g. `#FFF`) as atoms in CSS	2023-01-08 22:22:46 +07:00
Wilfred Hughes	29d87a6ac4	Adding TODO	2023-01-08 22:06:58 +07:00
Wilfred Hughes	c310fb34f9	Use u32 for edge cost This is performance neutral (both runtime and memory size) but the code is slightly more readable as there are fewer conversions.	2023-01-08 21:34:49 +07:00
Wilfred Hughes	34967f588d	Treat predefined_type as an atom in TypeScript Currently it contains a nested string node, even though it's a fixed set of known types. This was preventing us from applying good syntax highlighting. This was particularly noticeable with `string`, which wasn't previously highlighted as a type.	2023-01-07 22:43:50 +07:00
Wilfred Hughes	610a6e441d	Ensure that textual fallback diffing has a parse language of None Previously we still passed the parse language after exceeding the graph limit, leading to incorrect underline highlighting.	2023-01-06 19:07:05 +07:00
Wilfred Hughes	8a799af0ff	cargo fmt	2023-01-06 18:18:37 +07:00
Wilfred Hughes	9ae4eb17fd	Add test for is_all_whitespace	2023-01-06 18:16:01 +07:00
Wilfred Hughes	d8d4b8c003	Add is_all_whitespace helper function	2023-01-06 08:36:54 +07:00
Wilfred Hughes	cd87796552	Treat doctype nodes as atoms in HTML The tree-sitter parser doesn't include the text after DOCTYPE in the inner tag.	2023-01-03 08:40:39 +07:00
Steinar H. Gunderson	9133918dd4	Support parsing of sub-languages. This allows given nodes (configurable per-language, using tree-sitter's query syntax) to be re-parsed as other languages. The canonical example is CSS or JavaScript inside HTML, which normally would be a single token but now can get the full range of syntax highlighting and tree diffing. The config sets this up for only two languages: HTML (contains CSS or JavaScript in <script> or <style> tags; we don't support style="" or onclick="" etc. at this point), and Makefiles (contains Bash in $(shell ...) commands). The latter is fairly obscure; the big win is in the former. It would be nice to also have this support for PHP; however, the HTML parser seems to be a bit confused when asked to parse the partial HTML blocks we get if we just mark the "text" blocks as HTML, so for this to work well, probably the PHP blocks should be parsed as sub-languages of HTML instead of vice versa. Also, as a minor quibble, there should be support for bash in Perl's backticks (similar to in Makefiles), but the tree-sitter Perl parser does not support backticks at all (it goes into error recovery). There may have been languages that I've missed, e.g. some languages might have nodes that contain e.g. SQL. Fixes #382. Potentially relevant to #376.	2023-01-03 08:31:48 +07:00
Wilfred Hughes	0fc1842595	Improve word highlighting heuristics in comments Previously we highlighted changed whitespace, which led to ugly results if the number of words changed (there was a different number of whitespace characters so some were highlighted). Also treat _ and - as word constituents, as it produces nicer results when people write example CLI invocations in comments.	2023-01-02 16:56:31 +07:00
Wilfred Hughes	87dcfd2cca	Replace tabs in inline output too	2023-01-01 22:55:48 +07:00
Wilfred Hughes	e9eb4cd209	Always return the padding amount in split_string_by_width	2023-01-01 22:50:38 +07:00
Wilfred Hughes	e8e5ca8e47	Replace tabs during display, so parsing sees the original source Fixes #350	2023-01-01 22:44:47 +07:00
Wilfred Hughes	00ecf36a22	Pop delimiters immediately, rather than having ExitDelimiter* edges @QuarticCat observed that popping delimiters is unnecessary, and saw a speedup in PR #401. This reduces the number of nodes in typical graphs by ~20%, reducing runtime and memory usage. This works because there is only one thing we can do at the end of a list: pop the delimiter. The syntax node on the other side does not give us more options, we have at most one. Popping all the delimiters as soon as possible is equivalent, and produces the same graph route. This change has also slightly changed the output of sample_files/slow_after.rs, producing a better (more minimal) diff. This is probably luck, due to the path-dependent nature of the route solving logic, but it's a positive sign. A huge thanks to @QuarticCat for their contributions, this is a huge speedup. Co-authored-by: QuarticCat <QuarticCat@pm.me>	2022-12-28 02:00:09 +07:00
Wilfred Hughes	4bfdc7685c	Log vertex count at the info level This is less noisy and more useful than the path logging at the debug level.	2022-12-28 01:00:10 +07:00
Wilfred Hughes	57d1f6d449	Reserve the vec inside allocate_if_new Pushing to this vec was showing 2.5% of total compute time in profiles.	2022-12-28 00:30:25 +07:00
Wilfred Hughes	9745d06c87	Add docs for our stack implementation	2022-12-24 17:12:58 +07:00
Wilfred Hughes	3b37b9a12c	Expand symlinks before computing relative path for display paths Fixes #447	2022-12-22 22:48:27 +07:00
Wilfred Hughes	3766b944a8	Define a struct for DiffOptions	2022-12-22 09:11:59 +07:00
Wilfred Hughes	cadceb20b0	Show whole file names too with --list-languages	2022-12-19 09:33:53 +07:00
Wilfred Hughes	a21327ab13	Factor out a constant for language file associations	2022-12-19 09:23:46 +07:00
Wilfred Hughes	fda897b816	Don't parse rebar.lock files as Erlang The Erlang parser doesn't support this syntax apparently: https://github.com/WhatsApp/tree-sitter-erlang/issues/3	2022-12-19 01:00:56 +07:00
Wilfred Hughes	e0fcf2b84b	Add a --check-only flag Fixes #386	2022-12-18 23:55:22 +07:00
Wilfred Hughes	abd5e07654	Add a has_syntactic_changes field	2022-12-18 23:21:36 +07:00
Wilfred Hughes	b34cdabfdc	Prefer has_changes to has_same for clarity	2022-12-18 23:17:05 +07:00
Wilfred Hughes	a2f22cb17c	Only set the exit code if --exit-code is set This is important for usage with git log, which terminates on non-zero exit codes.	2022-12-18 23:11:18 +07:00
dannyfreeman	d99a3ba8a0	Separate rules for keywords leading with / char, inline them	2022-12-18 09:42:54 +07:00
Wilfred Hughes	b7cfff3f27	Silence some clippy lints	2022-12-18 00:33:27 +07:00
Wilfred Hughes	6a46237bb0	Set the exit code when changes are found Closes #285	2022-12-18 00:28:54 +07:00
Wilfred Hughes	7dd6bbd609	Implement Default on DisplayOptions	2022-12-17 23:49:18 +07:00
Wilfred Hughes	aa067e636b	Don't bother storing bytes of binary files	2022-12-17 23:44:48 +07:00
Wilfred Hughes	4b2e601de1	Store hunks in Summary	2022-12-17 23:42:20 +07:00
Wilfred Hughes	2cf27ec7cd	Display paths relative to cwd Fixes #444	2022-12-16 10:08:38 +07:00
Wilfred Hughes	bb2ae868d7	Add basic Erlang syntax highlighting Improves #394	2022-12-15 23:20:37 +07:00
dannyfreeman	1744342eb6	Reorder rules so that dependent rules come later	2022-12-15 11:33:42 +07:00
Wilfred Hughes	75299f79fd	Configure erlang delimiters	2022-12-14 23:58:15 +07:00
Wilfred Hughes	e6cec41e23	Add basic Erlang support Add tree-sitter and configure language detection. Helps with #394 Co-authored-by: Benedikt Reinartz <filmor@gmail.com>	2022-12-14 23:51:26 +07:00
dannyfreeman	e3940a0818	Remove named symbols for delimiters and keyword markers These can still be queried, but they don't appear in parse tree results. This makes the corpus tests a little simpler to maintain, and it's more in line with other markers in the grammar	2022-12-14 09:50:29 +07:00
dannyfreeman	b17f8407a0	Fix freeze on null byte char literals Tree-sitter does not like parsing empty chars, which is what a null byte registers as. This fixes that by forcing the *_NAMERSPACED_NAMES to require at least one character to be picked up. See https://github.com/tree-sitter/tree-sitter/issues/98 This commit was squashed This is the commit message 2: Don't require at least 1 char for namespaced symbol names. This allows for symbols like `this-one/` to be parsed without error. They occur frequently in leiningen templates, which are not valid clojure files, but are still common enough to account for. Note that clojure repls will not accept them as valid symbols. The reader throws an error. This is the commit message 3 Revert "Don't require at least 1 char for namespaced symbol names." The leiningen templates are not valid clojure code, so we won't make a special exception in the parsing for them, they can just be invalid. Parsing still works, we just get an error node in the `sym_lit` nodes when we fail to match a `sym_name`	2022-12-14 09:50:09 +07:00
dannyfreeman	fd5e743d2c	Remove dyanamic precedence, no longer needed to resolve conflicts Also fix regex typo, missing escape char	2022-12-14 09:49:59 +07:00
dannyfreeman	80a63ddb5e	Account for weird keywords that start with /	2022-12-14 09:49:45 +07:00
dannyfreeman	319e45c253	Don't use an alias for $.kwd_marker, reduce duplication	2022-12-14 09:49:21 +07:00
dannyfreeman	4eef1073f6	Tokenize keyword in namespace and name parts Adds runtime conflict resolution for keywords: when one keyword can match both keyword rules, treesitter will prefer using the _kwd_qualified rule over _kwd_unqualified	2022-12-14 09:49:13 +07:00
dannyfreeman	0a50ef8786	Not whitespace required after lone division symbol / Accounts for things like '(+ - * /) but creates individual symbol tokens for invalid clojure code like (/////) ;; 1 token per / /asdf ;; 2 tokens: / and asdf /asdf/hjkl ;; 2 token: / and asfd/hjkl ; Correct comment, squash this commit later	2022-12-14 09:49:01 +07:00
dannyfreeman	ce6290192a	POC: Tokenize symbol into namespace and name parts Attempts to address issue #21 and issue #28	2022-12-14 09:48:43 +07:00
Wilfred Hughes	c5985c88b2	Allow null bytes in UTF16 input files	2022-12-13 09:38:24 +07:00
Wilfred Hughes	7b31be8adb	Improve binary file detection heuristics Fixes #433	2022-12-08 10:29:35 +07:00
Wilfred Hughes	84a968cdbb	cargo fmt	2022-12-08 10:28:24 +07:00
Wilfred Hughes	4da26f6459	Derive equality and debug on ProbableFileKind	2022-12-08 10:04:25 +07:00
Wilfred Hughes	554fb18b7c	Fix interleaved output when diffing directories Fixes #437	2022-12-08 09:58:19 +07:00
rhirano0715	436edb2ab4	Only add colour to the first hunk header This helps skimming the results when multiple files are changed with multiple hunks. It makes the file changing more prominent than just going from e.g. 5/5 to 1/10. Fixes #400 Acked-by: Wilfred Hughes <me@wilfred.me.uk>	2022-12-01 09:38:36 +07:00
Wilfred Hughes	2e7c90c472	Ensure line wrapping uses the same length on both sides Closes #421	2022-11-13 00:35:06 +07:00
Wilfred Hughes	923989d1a8	clippy fixes	2022-11-03 22:18:56 +07:00
Wilfred Hughes	7f7b35441b	Ensure that inline display without color has newlines This was broken in `3147eb8e6a` when newline splitting was made consistent, and `2071517621` only fixed the inline case when color was enabled. Fixed #383	2022-10-28 23:42:52 +07:00
Wilfred Hughes	28c3b0ef5d	Tweak line number styling to make it more distinct from content Dim line numbers for unchanged lines, and make changed lines bold (in addition to the existing red/green colours). Closes #384	2022-10-28 20:34:36 +07:00
Wilfred Hughes	2a3346e338	Use apply_line_number_color consistently on LHS and RHS Previously we missed a case on the LHS.	2022-10-28 20:18:17 +07:00
Wilfred Hughes	4d8d2a2f9d	Fix a clippy lint	2022-10-28 19:39:44 +07:00
Wilfred Hughes	1d6c7923e3	Replace remaining is_lhs booleans with Side arguments	2022-10-28 19:27:46 +07:00
Wilfred Hughes	7ea4b96a41	Prefer Side over booleans in line number styling	2022-10-28 19:14:06 +07:00
Wilfred Hughes	490787fe28	Factor out line number styling	2022-10-28 19:07:51 +07:00
Wilfred Hughes	b9d44ae65f	Treat error nodes as atoms Fixes #408	2022-10-15 22:50:08 +07:00
Wilfred Hughes	02f1cca444	Display stdin CLI arguments as "(stdin)" This improves display for #389, and makes language detection use pattern matching on FileArgument rather than comparing literal strings.	2022-10-14 13:46:30 +07:00
Wilfred Hughes	b4ff28c75e	Fix side-by-side line length when colour is disabled Fixes #406 Looks like this was inadvertently broken in #301.	2022-10-14 13:15:17 +07:00
Wilfred Hughes	2a6eb7e4f8	Add Debug on SourceDimensions	2022-10-14 11:52:53 +07:00
Wilfred Hughes	b6ddd152d0	Add the ability to configure how many lines of context are shown See #242	2022-10-13 12:34:52 +07:00
QuarticCat	cd5ba54752	Reduce number of branches of Vertex::eq	2022-10-06 22:33:47 +07:00
QuarticCat	887dec7645	Remove field can_pop_either from Vertex	2022-10-06 22:31:48 +07:00
QuarticCat	7a8044696e	Simplify push_{lhs,rhs}_delimiter	2022-10-06 22:31:38 +07:00

1873 Commits (c73b18be77ca73e461a991bf8d30a8c5f95af597)