Commit Graph

1873 Commits (c73b18be77ca73e461a991bf8d30a8c5f95af597)

Author SHA1 Message Date
Wilfred Hughes 9134593a39 Add XML support
Fixes #10
2023-09-08 23:43:20 +07:00
Wilfred Hughes d56f775f31 Highlight constructors consistently with type names 2023-09-03 01:30:22 +07:00
Wilfred Hughes a4ee2cf99e cargo fmt 2023-08-26 21:41:41 +07:00
Wilfred Hughes b78ba2da4b Use type names from line_numbers directly 2023-08-26 20:36:07 +07:00
Wilfred Hughes 41c9165c79 Use my line_numbers crate for newline position calculations 2023-08-26 16:25:32 +07:00
Wilfred Hughes ca44de78e1 Group overrides from the same language together
No functional change, but makes --list-languages easier to read.

Fixes #549
2023-08-25 08:22:28 +07:00
Wilfred Hughes 0db99d76c6 Allow a language override to include multiple globs 2023-08-24 08:47:59 +07:00
eth3lbert b6d8ecbd4f
feat: display commit info in --version (#558)
This improves --version output for #554.
2023-08-18 08:10:47 +07:00
Wilfred Hughes 803a3a673c Improve variable names 2023-08-18 00:28:17 +07:00
Alex Krantz 11a96e5aec Add JSON cli flag 2023-08-17 08:49:59 +07:00
Wilfred Hughes 11f457b5f9 Fix typo 2023-08-16 21:20:17 +07:00
Wilfred Hughes 191f42e9d5 Clippy fixes 2023-08-15 21:42:06 +07:00
Wilfred Hughes 6b1c82efdf Prefer Option<&T> over &Option<T> 2023-08-15 21:37:41 +07:00
Wilfred Hughes a43b9ae9eb Dim the extra information section in hunks 2023-08-15 21:33:11 +07:00
Wilfred Hughes e1f97e614f Improve wording of conflict information
Fixes #555
2023-08-15 17:52:02 +07:00
Wilfred Hughes e0a1405453 Add the ability to parse conflict markers and diff the two files 2023-08-15 09:01:15 +07:00
Wilfred Hughes f06e95ca02 Renamed `old_path` to `extra_info` and format it during option parsing
This allows us to use this field for other purposes that aren't
renames.
2023-08-14 08:41:42 +07:00
Wilfred Hughes f1ba399504 Move local variable closer to first use 2023-08-14 08:27:42 +07:00
Wilfred Hughes eeb2974967 Move option parsing before argument parsing
This is useful for additional mode parsing that wants to access these
options.
2023-08-13 21:34:42 +07:00
Wilfred Hughes 1c60f3efd3 Move content detection out of diff_file_content
This makes the function useful in cases when we already have a string,
not bytes.
2023-08-13 21:31:37 +07:00
Wilfred Hughes 3c702d0490 Use humansize for file size formatting 2023-08-12 22:34:11 +07:00
Wilfred Hughes 5f25bc0ebd Rename information in header should only be shown on first hunk
Fixes #553
2023-08-11 08:21:29 +07:00
Wilfred Hughes a187d7a134 Improve rename styling
It should use the heading with colour, consistent with other modes,
and the header should come before rename information.
2023-08-08 08:53:33 +07:00
Wilfred Hughes ba92a93f9b Fix rustc warning on recent nightly 2023-08-04 23:31:31 +07:00
Wilfred Hughes 19cbf1d458 Implement some other useful traits on EqOnFirstItem
These aren't immediately used, but they're handy for experimenting
with the similar library which requires these.
2023-08-04 23:29:29 +07:00
Wilfred Hughes 892d4fdb58 Ensure size_hint never exceeds graph_limit
If we have thousands of syntax nodes on both sides, we can end
up attempting to preallocate a very large hashmap.

In #542, a user hit an issue with two JSON files where the LHS had
33,000 syntax nodes and the RHS had 34,000 nodes, so we'd attempt to
preallocate a hashmap of capacity 1,122,000,000. This required
allocating 70,866,960,400 bytes (roughly 66 GiB).

Impose a sensible limit on the hashmap.

Fixes #542
2023-08-04 17:19:27 +07:00
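The arithmetic in 892d4fdb58 can be made concrete with a minimal Rust sketch of capping a preallocation hint. The function name `capped_size_hint` and the limit value in the usage note are hypothetical; only the node counts come from the commit message, and this is not difftastic's actual code.

```rust
/// Hypothetical helper: estimate how many graph vertices we might need
/// room for, but never ask for more than `graph_limit` entries.
fn capped_size_hint(lhs_nodes: usize, rhs_nodes: usize, graph_limit: usize) -> usize {
    // saturating_mul avoids overflow on targets with a small usize.
    lhs_nodes.saturating_mul(rhs_nodes).min(graph_limit)
}
```

With the node counts from #542 and an assumed limit of 3,000,000, `capped_size_hint(33_000, 34_000, 3_000_000)` yields 3,000,000 rather than 1,122,000,000.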
Wilfred Hughes c937f819a1 Log the number of bytes in the arena at the end of route finding 2023-08-04 17:04:23 +07:00
Wilfred Hughes 0c01c73398 Be consistent in lifetime names for Vertex 2023-08-03 08:32:16 +07:00
Wilfred Hughes 757c297412 Adjust header style
Show the hunk count and detected language in a dimmed style. This
information is less important than the diff content itself, so this
change makes the important information more prominent.

First part of #544
2023-07-31 08:35:27 +07:00
Wilfred Hughes 797af40ae8 Improve Java highlighting 2023-07-27 08:33:38 +07:00
Wilfred Hughes 4e9637c861 Check more bytes when detecting encoding
I've observed PDF files that have sufficiently large headers that they
were detected as text, which wasn't helpful.

Also improve logging to report how many invalid bytes were found.
2023-07-21 08:34:41 +07:00
Wilfred Hughes 4f750ec359 Clarify how to find language names in argument help 2023-07-21 08:23:36 +07:00
Wilfred Hughes 685a2ef8d5 Merge remote-tracking branch 'grunweg/master' 2023-07-20 22:41:56 +07:00
Wilfred Hughes 7caaaf7fcf Handle nested sliders correctly when preferring the outer delimiter
Previously we didn't check the state of children, which was an
oversight from the original implementation. As a result, we fixed
nested sliders in fewer situations.

Fixes #535
2023-07-14 08:49:55 +07:00
Wilfred Hughes a5d3cb55b7 Treat constructors consistently with variables in Haskell atoms 2023-07-12 17:34:42 +07:00
Wilfred Hughes 8614910fe2 cargo fmt 2023-07-12 16:45:58 +07:00
Wilfred Hughes f6ceb2aefd Update unit test for new subword highlighting heuristic 2023-07-12 12:48:45 +07:00
Wilfred Hughes 5606c04261 Treat qualified modules and variables as atoms in Haskell 2023-07-12 12:34:39 +07:00
Wilfred Hughes a814e01d22 Improve word diffing heuristic and add another sample file 2023-07-12 12:12:32 +07:00
Wilfred Hughes 1d3b6836ef Handle multiline atoms more accurately in split_atom_words 2023-07-12 11:49:39 +07:00
Wilfred Hughes c2b7042b80 Do subword highlighting in more cases
This is useful when two strings substantially differ, but have the
same e.g. end.
2023-07-10 21:26:24 +07:00
Wilfred Hughes 5824322244 Require some common words to do subword highlighting
This is important when comparing short string literals. This change
has improved several cases in sample_files/ but I've added a new
example that made the previous unwanted behaviour much more obvious.
2023-07-10 09:03:21 +07:00
Wilfred Hughes 4aca79f220 Use the raw_entry_mut API on hashbrown::HashMap
This saves us searching the hash map twice. This is a modest
performance improvement: an instruction count reduction of 4% on
slow_before.rs, and 1% reduction on typing_before.ml.
2023-07-09 22:49:37 +07:00
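The "searching the hash map twice" problem in 4aca79f220 can be illustrated with the standard library alone. This sketch swaps in `std::collections::HashMap::entry` for hashbrown's `raw_entry_mut`, so it shows the same idea (one lookup instead of two) rather than the API the commit used. The function names `intern_twice` and `intern_once` are hypothetical.

```rust
use std::collections::HashMap;

/// Two searches: one for `get`, and a second one inside `insert`.
fn intern_twice(ids: &mut HashMap<String, usize>, key: &str) -> usize {
    if let Some(id) = ids.get(key) {
        return *id;
    }
    let id = ids.len();
    ids.insert(key.to_owned(), id);
    id
}

/// One search: the entry remembers where the key belongs in the table.
fn intern_once(ids: &mut HashMap<String, usize>, key: &str) -> usize {
    // Compute the candidate id before borrowing the map mutably.
    let next_id = ids.len();
    *ids.entry(key.to_owned()).or_insert(next_id)
}
```

One trade-off of this stand-in: the standard `entry` API needs an owned key, so `intern_once` allocates a `String` on every call, even when the key is already present.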
Wilfred Hughes 8eb949eb02 Use DftHashMap everywhere
This is a 4% reduction in instructions for typing_before.ml, but a
0.2% increase in instructions for slow_before.rs. This seems like a win
overall, and it also keeps the codebase more consistent and simpler.
2023-07-09 15:41:01 +07:00
Wilfred Hughes d9911e0b49 Move DftHashMap to a separate file 2023-07-09 15:37:51 +07:00
Wilfred Hughes f2456a12b2 Use hashbrown for the alloc_if_new data
This was intended to allow usage of .entry_ref(), but it's already a
performance win without using that API! It's around a 9% reduction in
instructions in slow_before.rs, and 2% reduction in typing_before.ml.
2023-07-09 11:11:03 +07:00
Wilfred Hughes 27f59c0b3a Don't treat - as a word constituent
This produces slightly better results with some string replacements.
2023-07-08 17:16:14 +07:00
Wilfred Hughes 2607d17d73 Fix spelling in comment 2023-07-08 17:16:14 +07:00
Zhenge Chen ffd49d523a Detect replaced strings
If a string is replaced with another, apply subword highlighting
similar to how we handle replaced comments.

Co-authored-by: Wilfred Hughes <me@wilfred.me.uk>
2023-07-08 17:16:06 +07:00
Wilfred Hughes f86ba13abf Increase punctuation cost to 200 2023-07-08 14:59:47 +07:00
Wilfred Hughes 495dbe5b14 Improve comments in Edge::cost 2023-07-08 14:53:33 +07:00
Wilfred Hughes 574fb5bd50 Fix clippy lint 2023-07-07 23:52:51 +07:00
Wilfred Hughes 53855e415e Reduce copying further in set_neighbours
This saves a remarkable 8.5% of instructions on slow_before.rs.
2023-07-07 23:37:16 +07:00
Wilfred Hughes a180fd6d24 Don't return the neighbours inside get_set_neighbours
This caused unnecessary cloning, costing 0.2% instructions in some
cases, and also made the code less readable.
2023-07-07 23:29:51 +07:00
Wilfred Hughes 87d27c5598 Only split numbers inside comments
Inside text files, it seems to be better to be conservative and
consider abc123def as one word rather than three.

This is noticeable when looking at changes to the compare.expected
file, which contains hashes. 123c456 and 345c789 don't really have a
`c` in common, so subword highlighting is ugly.
2023-07-07 08:40:06 +07:00
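The behaviour described in 87d27c5598 can be sketched as a word splitter with a switch for whether digits form their own words. This is an illustrative approximation, not difftastic's implementation: the names `CharKind`, `kind` and `split_words`, and the exact character classes, are assumptions.

```rust
#[derive(Clone, Copy, PartialEq)]
enum CharKind {
    Word,
    Digit,
    Other,
}

/// Classify a character. Digits only get their own class when
/// `split_numbers` is true (e.g. inside comments).
fn kind(c: char, split_numbers: bool) -> CharKind {
    if split_numbers && c.is_ascii_digit() {
        CharKind::Digit
    } else if c.is_alphanumeric() || c == '_' {
        CharKind::Word
    } else {
        CharKind::Other
    }
}

/// Split `s` into maximal runs of characters of the same kind.
fn split_words(s: &str, split_numbers: bool) -> Vec<&str> {
    let mut words = Vec::new();
    let mut start = 0;
    let mut prev: Option<CharKind> = None;
    for (i, c) in s.char_indices() {
        let k = kind(c, split_numbers);
        if let Some(p) = prev {
            if p != k {
                // The kind changed, so the previous run ends here.
                words.push(&s[start..i]);
                start = i;
            }
        }
        prev = Some(k);
    }
    if start < s.len() {
        words.push(&s[start..]);
    }
    words
}
```

Under these assumptions, `split_words("abc123def", false)` keeps the text as one word, while `split_words("abc123def", true)` gives three words: `abc`, `123` and `def`.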
Wilfred Hughes c07e640b24 Remove contiguous penalty
The contiguous penalty was an attempt to fix the slider problem:

// Old
A B
C D

// New
A B
A B
C D

// Unwanted diff
A +B+
+A+ B
C D

However, it doesn't make sense for Dijkstra, which is stateless. The
best route from vertex X is independent of how we got to vertex X.

This worked by dumb luck: in some circumstances we terminate early
rather than fully executing Dijkstra's algorithm. This cost tweak
improved results on a few test files. However, the post-processing
slider logic is a proper, general solution. This was added much later.

There's no reason to keep the contiguous penalty now. It's confusing,
and makes adding new edge costs with consistent 'X costs more than Y'
behaviours more difficult.

Performance is essentially neutral: a small decrease in
typing_before.ml, a small increase in slow_before.rs.
2023-07-06 08:37:02 +07:00
Wilfred Hughes 31df177881 Increase the punctuation penalty
This ensures that choosing an unchanged non-punctuation atom with some
novel atoms is better than choosing punctuation and some changed
comments. This produces better results in general, see
comma_and_comment_after.js for an example.

This will be more noticeable after the next commit, where costs of
novel atoms are in a smaller range of values.
2023-07-06 08:16:24 +07:00
Wilfred Hughes c3016eca4a Add TODO 2023-07-06 08:14:03 +07:00
Wilfred Hughes 43c24047b4 Don't track contiguous status on novel delimiter edges
This is harder to reason about, and
2e6666041f did not include a motivating
test case.

Removing contiguous status is a minor perf improvement (2% reduction
in instructions), makes the code simpler, and does not significantly
affect diffing results.

Of the two sample files that have changed, the erlang_before.erl file
has improved and nest_before.rs is neutral.
2023-07-04 23:53:16 +07:00
Wilfred Hughes 1e4d1828c7 Store probably_punctuation on unchanged edges
This is equivalent (increased cost on unchanged nodes vs decreased
cost on changed nodes), but easier to reason about.

Previously we have multiple notions of changed atoms: NovelAtomLHS,
NovelAtomRHS, and ReplacedComment. We want to consider punctuation as
less desirable even when e.g. comments are replaced.
2023-07-03 19:48:31 +07:00
Wilfred Hughes c405b58327 Fix cost for ReplacedComment
This needs to be 2x novel nodes, or we prefer it far too often.
2023-07-02 23:12:31 +07:00
Wilfred Hughes 3730580ca3 Improve word splitting heuristics
This is particularly noticeable when diffing comments with timestamps
2000-12-31T23:59:59 where we don't want 31T23 to be a single word.
2023-06-29 08:33:30 +07:00
Wilfred Hughes 9eb48ca661 Configure comments as atoms in latest scala parser 2023-06-12 22:28:01 +07:00
Wilfred Hughes 81ac9d167b cargo fmt 2023-06-02 08:45:21 +07:00
Gao, Xiang 653660e92f
Treat CUDA as C++ (#522) 2023-05-27 12:30:46 +07:00
Wilfred Hughes b0a3a0ada9 Use DFT_LOG as the internal logging environment variable
Fixes #519
2023-05-21 23:33:34 +07:00
Wilfred Hughes b6895d42e4 Document the --override option in CHANGELOG and --help
This also enables users to disable language parsing.
Closes #439
Closes #440
2023-05-15 23:04:20 +07:00
Wilfred Hughes 745253d4d9 cargo fmt 2023-05-15 21:49:07 +07:00
Wilfred Hughes f302952748 Add unit test for overrides 2023-05-15 17:54:23 +07:00
Wilfred Hughes 325926eead Update tests for overrides argument 2023-05-15 17:51:39 +07:00
Wilfred Hughes 943a9e91f1 Fully document --override 2023-05-15 08:53:34 +07:00
Wilfred Hughes c9cabac517 Allow overriding language associations from numbered environment variables 2023-05-15 08:43:53 +07:00
Wilfred Hughes b3a7bec430 Pass overrides to guess() and display them in --list-languages 2023-05-15 08:20:32 +07:00
Wilfred Hughes 0e2c167dda Move LanguageOverride to guess_language 2023-05-14 21:49:59 +07:00
Wilfred Hughes adff907aaa Pass language_overrides to all modes 2023-05-14 21:46:54 +07:00
Wilfred Hughes 47a0690286 Initial support for parsing --override 2023-05-14 21:44:30 +07:00
Wilfred Hughes 84af470128 Remove --language 2023-05-14 15:57:43 +07:00
Wilfred Hughes 05a1b184ea Use globbing to match file names in language detection 2023-05-14 15:50:56 +07:00
Wilfred Hughes 893b56b6b1 Consistent casing 2023-05-14 00:56:45 +07:00
Wilfred Hughes e37a6b2087 Improve naming and ordering of JSX/TSX languages 2023-05-14 00:54:58 +07:00
Wilfred Hughes 4d85b5c15e Prefer pattern matching and EnumIter for Language rather than lists 2023-05-13 23:46:18 +07:00
Wilfred Hughes 1f16d207f6 Clarify comment 2023-05-13 23:15:26 +07:00
Mike Grunweg 7984b7a59e Audit node types with children: none should be treated as atoms. 2023-05-05 22:45:53 +07:00
Mike Grunweg 2c2461667b Configure parsing and language detection. 2023-05-05 22:00:15 +07:00
Wilfred Hughes 87f19f5e10 Don't include trailing newlines in comment nodes
This makes constructing hunks harder to reason about.

This change doesn't affect output, but helps when debugging, as it
makes multiline atoms much less common.
2023-04-30 09:51:39 +07:00
Wilfred Hughes b5cc0787f4 Clarify debug printing of LineNumber 2023-04-30 09:29:08 +07:00
Wilfred Hughes faaec9ad4c Use the SyntaxId type explicitly in ChangeMap 2023-04-22 19:59:03 +07:00
Wilfred Hughes 8d44e91a06 Improve lifetime names 2023-04-22 15:25:45 +07:00
Wilfred Hughes d521b29c9e set_prev_sibling should always recurse 2023-04-20 08:42:10 +07:00
Wilfred Hughes 2074b97117 Fix clippy lint 2023-04-19 20:34:33 +07:00
Wilfred Hughes 18ef47e20d Use a path separator constant that is available on Rust 1.57 2023-04-19 20:04:56 +07:00
Wilfred Hughes f08e77a675 Prefer a single optional string for old_name when renames occur 2023-04-15 17:20:30 +07:00
Wilfred Hughes 6f9ccd9ec4 Use a single display_path field in the diff options struct 2023-04-15 17:19:56 +07:00
Wilfred Hughes 2224602ad5 Store file renames explicitly in Diff struct 2023-04-15 00:57:27 +07:00
Valentin 4296796053 Run cargo fmt 2023-04-06 09:44:38 +07:00
Valentin b86d4dbf9e Add solidity support 2023-04-05 11:12:19 +07:00
Wilfred Hughes 8c004be87b Remove unnecessary helper function 2023-04-02 19:28:57 +07:00
Wilfred Hughes cb9367c129 Remove tests that no longer apply after 8b842387a 2023-03-31 08:19:23 +07:00
Wilfred Hughes 45121f6f6d Fix clippy warnings 2023-03-31 08:15:18 +07:00
Wilfred Hughes 8b842387a1 Don't clean trailing newline before diffing
Difftastic should take the user's input as-is, or it risks performing
an incorrect diff in both textual and syntactic diffing.

Fixes #499
2023-03-30 08:46:11 +07:00
Wilfred Hughes 713220613e Support diffing directories where files are only in one side
This was broken in 1e9f43768. Add test.

Fixes #500
2023-03-24 23:57:49 +07:00
Wilfred Hughes 1e9f437688 Remove --missing-as-empty 2023-03-17 08:40:21 +07:00
Wilfred Hughes e4ff6d7d09 Update atom nodes for latest Java 2023-03-17 00:42:19 +07:00
Wilfred Hughes 69e511e638 Tweak UTF-16 heuristics to prefer files with a BOM 2023-03-16 00:46:24 +07:00
Wilfred Hughes 7fbe0d6c2f Improve UTF-16 detection heuristics and add test 2023-03-16 00:31:58 +07:00
Wilfred Hughes 20ad284882 Add 'vendored_parsers/tree-sitter-clojure/' from commit '421546c2547c74d1d9a0d8c296c412071d37e7ca'
Closes #448

git-subtree-dir: vendored_parsers/tree-sitter-clojure
git-subtree-mainline: ebfc043a4a
git-subtree-split: 421546c254
2023-03-15 15:43:55 +07:00
Wilfred Hughes a67be0f845 Treat quoted_keys as atoms in TOML 2023-03-15 15:12:33 +07:00
Wilfred Hughes ec6e4665bc
Merge pull request #494 from karlding/add_ada_support
Add Ada support
2023-03-15 10:25:46 +07:00
Karl Ding 01522a9d58 Remove alire.toml from LANG_FILE_NAMES
Fix an incorrect assumption about how LANG_FILE_NAMES works.

We want the alire.toml file to still be treated as TOML (and not Ada),
so adding an entry here is not necessary for Ada support.
2023-03-15 01:09:48 +07:00
Wilfred Hughes 6ad77c620c Don't highlight text in purple
Closes #498
2023-03-14 23:47:17 +07:00
Karl Ding d5ed2deb6e Run formatter over changes 2023-03-14 21:46:40 +07:00
Karl Ding 5271f65f92 Add language support for Ada
Implement support in difftastic for the Ada programming language
using the treesitter grammar provided in 'briot/tree-sitter-ada'.

Language detection depends on the following suffixes:

    * adb
    * ads
    * ada

The presence of the alire TOML file (alire.toml) is also used as
a heuristic.
2023-03-14 21:46:40 +07:00
Stavros Korokithakis 85b9493eaa
Recognize the Arduino extension as C++ 2023-03-13 19:44:46 +07:00
Wilfred Hughes c7bfc72529 Clarify --exit-code behaviour on plain text 2023-03-09 08:54:21 +07:00
Wilfred Hughes ac35c6c047 Document the different options for --display
Fixes #491
2023-03-09 08:54:21 +07:00
Wilfred Hughes 2d1a2c906e Count errors on the root node too
Fixes #377
2023-03-03 00:25:41 +07:00
Wilfred Hughes 045d6a2c58 Treat Newick and Racket as lisps 2023-03-03 00:23:11 +07:00
Wilfred Hughes 03985066f5 Treat Makefile text as atoms
Improves another case identified in #476
2023-03-02 23:52:01 +07:00
Wilfred Hughes 54eb22ea98 Use FileFormat consistently everywhere 2023-03-02 22:25:14 +07:00
Wilfred Hughes dae8c8a3b5 Update parameter names for type 2023-03-02 22:01:13 +07:00
Wilfred Hughes 133c05e46b Define an explicit enum for file formats 2023-03-02 21:58:33 +07:00
Wilfred Hughes c33d7f2520 Support --check-only on text files too 2023-03-02 08:50:45 +07:00
Wilfred Hughes ca2902cee2 Add more doc comments to FileContent 2023-03-02 08:35:45 +07:00
Wilfred Hughes e99b2ce27c Represent byte limit and parse error limit as Result return types 2023-02-24 08:38:15 +07:00
Wilfred Hughes 9556cd978e Merge branch 'delehef/master' 2023-02-21 08:46:07 +07:00
6cdh 5e659e2d98 add racket highlight 2023-02-21 08:30:00 +07:00
Franklin Delehelle a0b9df0e29
Add support for Newick tree files 2023-02-16 15:52:16 +07:00
6cdh 2fade3e0bf fix here string 2023-02-12 13:57:45 +07:00
6cdh fe756905bf added Racket support 2023-02-12 13:39:58 +07:00
Wilfred Hughes 96fc044e6d Improve syntax highlighting
`@include` and `@exception` are both used for highlighting keywords in
several languages.
2023-02-10 08:50:43 +07:00
Wilfred Hughes 62da0d56cc Fix error message 2023-02-05 22:43:58 +07:00
Wilfred Hughes 952d55ee22 Don't bother logging parse errors 2023-02-05 22:03:08 +07:00
Wilfred Hughes 96a80c4d21 Improve naming for parse error limit 2023-02-05 17:30:24 +07:00
Wilfred Hughes 63cf71641a Display file size in the header if it's too big 2023-02-05 17:28:39 +07:00
Wilfred Hughes 7d5afd78dc Respect --error-limit when parsing
Next step for #472
2023-02-04 22:29:18 +07:00
Wilfred Hughes 7ec0c2a956 Remove unnecessary concat! calls 2023-02-04 22:02:28 +07:00
Wilfred Hughes 553a1ef231 Count error nodes when walking tree-sitter tree
First steps of #472
2023-02-04 16:46:17 +07:00
Wilfred Hughes f77730005e Add clarifying comment 2023-02-04 14:09:30 +07:00
Wilfred Hughes 34f21c6d9f Only run the gzip test on Linux CI
This test fails when a MIME database isn't present, such as the
Windows CI environment or minimal environments (reported in #478).
2023-02-04 11:14:46 +07:00
Wilfred Hughes ed0f50fa68 Remove --error-limit to prepare for release 2023-02-02 22:22:23 +07:00
Wilfred Hughes b6e756ed1a
Merge pull request #473 from Miksu82/master
Fix file content type detection for small gzipped files
2023-02-02 22:18:59 +07:00
Wilfred Hughes b78067a96b Disable unfinished error_limit logic to prepare for release 2023-02-02 22:11:31 +07:00
Mika Ristimäki 926055634c Remove application/octet-stream check when checking mime type 2023-02-02 11:08:24 +07:00
Niklas Vogel abdef09bdd Check arguments passed from Git for `/dev/null` 2023-02-01 17:53:16 +07:00
Mika Ristimäki c573094ee5 Make sure gzip files are treated as binary 2023-02-01 11:01:26 +07:00
Mika Ristimäki 289632afcc Fix formatting 2023-02-01 10:20:47 +07:00
Mika Ristimäki bb81f0e524 Fix file content type detection for small gzipped files 2023-02-01 10:20:47 +07:00
Wilfred Hughes 92d21dc5c0 Silence another clippy lint 2023-01-30 08:35:24 +07:00
Wilfred Hughes 8e1503a11d Add an --error-limit CLI option
Starts #472.
2023-01-26 23:55:41 +07:00
Wilfred Hughes cc3f266969 Treat `%in%` as a single atom in R 2023-01-26 22:40:10 +07:00
Wilfred Hughes 5ed4bac8a5 Add support for R
Fixes #470
2023-01-26 08:50:00 +07:00
Wilfred Hughes 998c9e94ff Fix crash on repeated, partially novel lists
Fixes #469
2023-01-25 23:54:41 +07:00
Wilfred Hughes 12fe91560c Add doc comments to slider functions 2023-01-25 23:17:42 +07:00
Wilfred Hughes 6321b8ece2 Consistently append newlines regardless of colour mode 2023-01-25 16:55:12 +07:00
Wilfred Hughes 3291da6be4 Allow colour to be configured with DFT_COLOR too
In #468 it's also mentioned that this couldn't be configured with an
environment variable.
2023-01-25 16:44:41 +07:00
Wilfred Hughes 4337fbbcd8 Show the default value for --color and --display
Fixes #468
2023-01-25 16:40:12 +07:00
Wilfred Hughes 897305ed6b Allow --ignore-comments to be used with --dump-syntax 2023-01-25 16:31:09 +07:00
Wilfred Hughes 9bdd1964b0 Improve doc comments 2023-01-23 08:47:52 +07:00
Wilfred Hughes 6c3dc6b69e Fix formatting in comment 2023-01-23 08:38:40 +07:00
Wilfred Hughes edb51ece86 Disable a clippy warning that we can't satisfy for Syntax 2023-01-22 20:03:44 +07:00
Wilfred Hughes a37edbab8d Silence some clippy warnings 2023-01-22 20:03:44 +07:00
Wilfred Hughes fe68f43e93 Use crossterm for is_tty and terminal width
This is actively maintained, handles stdout being redirected, and seems
to be tested more on Windows too (potentially improving #363).
2023-01-17 00:03:08 +07:00
Wilfred Hughes dffc72d1ca cargo fmt 2023-01-15 20:13:08 +07:00
Wilfred Hughes daa7156a2c Fix crash with --display=inline and trailing whitespace
Line numbers may be less than .max_line(), as .max_line() trims
whitespace. Ensure pad_after() is robust to this, and add a test.

I could only reproduce the crash in inline display mode, but in
principle this could be an issue in all modes.

Fixes #452
2023-01-15 20:04:12 +07:00
Wilfred Hughes e17b6e6109 Clarify wording 2023-01-15 15:56:37 +07:00
Wilfred Hughes c9105ca0ba cargo fmt 2023-01-15 15:49:24 +07:00
Wilfred Hughes b8a2910c26 Reuse the tree-sitter tree for extracting comments
This reduces instruction counts by 9% on some test
files (e.g. sample_files/typing_before.ml).
2023-01-15 15:47:15 +07:00
Wilfred Hughes 2536fe7396 Factor out parsing to tree in main.rs 2023-01-15 15:42:44 +07:00
Wilfred Hughes 2d6ee94ac1 Split out to_tree and to_syntax functions in parser 2023-01-15 15:39:07 +07:00
Wilfred Hughes a488efd63b Add highlighting for ignored syntactic elements
This finishes --ignore-comment support.

Fixes #449.
2023-01-15 14:49:46 +07:00
Wilfred Hughes f3b02f7b47 cargo fmt 2023-01-15 11:43:09 +07:00
Wilfred Hughes 0e3c57c64a Skip unique items before computing Myers diff on text
This substantially improves performance on text files where there are
few lines in common.

For example, diffing 10,000 line files with no lines in common is more than 10x
faster (8.5 seconds to 0.49 seconds on my machine), and
sample_files/huge_cpp_before.cpp is nearly 2% faster.

Fixes the case mentioned by @quackenbush in #236.

This is inspired by the heuristics discussions at
https://github.com/mitsuhiko/similar/issues/15
2023-01-15 11:38:02 +07:00
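The general idea behind 0e3c57c64a can be sketched as follows: an item that occurs on only one side can never be matched, so it can be marked as changed up front and left out of the input to the diff algorithm. The function name `partition_unique` is hypothetical and difftastic's own implementation may differ.

```rust
use std::collections::HashSet;

/// Split the items of `lhs` into those that also occur somewhere in `rhs`
/// (kept with their original index, ready for a diff algorithm) and the
/// indices of items that occur only in `lhs` (known to be novel).
fn partition_unique<'a>(
    lhs: &[&'a str],
    rhs: &[&'a str],
) -> (Vec<(usize, &'a str)>, Vec<usize>) {
    let rhs_items: HashSet<&str> = rhs.iter().copied().collect();

    let mut shared = Vec::new();
    let mut unique = Vec::new();
    for (i, item) in lhs.iter().enumerate() {
        if rhs_items.contains(item) {
            shared.push((i, *item));
        } else {
            unique.push(i);
        }
    }
    (shared, unique)
}
```

Running the same partition with the arguments swapped handles the other side. When the two files have few lines in common, the `shared` lists are short, so the diff algorithm has far less work to do.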
Wilfred Hughes c08eefb14a Move slice_by_hash to myers_diff and add unit tests 2023-01-15 11:03:31 +07:00
Wilfred Hughes dd92af3643 Add tests for myers_diff module 2023-01-15 10:55:14 +07:00
Wilfred Hughes 85e630aabc cargo fmt 2023-01-14 12:52:53 +07:00
Wilfred Hughes efec759504 Only set language_used after a full syntactic diff
This fixes cases where the language is detected but the file hits the
byte limit.

Fixes #462.
2023-01-14 12:52:08 +07:00
Wilfred Hughes 6eed874362 Clarify field name 2023-01-14 12:30:54 +07:00
Wilfred Hughes 94b6117723 Define an --ignore-comments option and pass to parser
Initial work for #449
2023-01-10 08:47:58 +07:00
Wilfred Hughes 08b3ff138f Rename vendor/ directory
Closes #453
2023-01-10 08:35:01 +07:00
Wilfred Hughes 1ad9789b38 clippy fixes 2023-01-10 00:45:07 +07:00
Wilfred Hughes 63a3bf0c91 Ensure we use the correct config for sublanguage parsing
Otherwise we get the wrong node names for atoms.
2023-01-08 22:24:43 +07:00
Wilfred Hughes 8ed4fbccfa Treat colour values (e.g. `#FFF`) as atoms in CSS 2023-01-08 22:22:46 +07:00
Wilfred Hughes 29d87a6ac4 Adding TODO 2023-01-08 22:06:58 +07:00
Wilfred Hughes c310fb34f9 Use u32 for edge cost
This is performance neutral (both runtime and memory size) but the
code is slightly more readable as there are fewer conversions.
2023-01-08 21:34:49 +07:00
Wilfred Hughes 34967f588d Treat predefined_type as an atom in TypeScript
Currently it contains a nested string node, even though it's a fixed
set of known types. This was preventing us from applying good syntax
highlighting.

This was particularly noticeable with `string`, which wasn't
previously highlighted as a type.
2023-01-07 22:43:50 +07:00
Wilfred Hughes 610a6e441d Ensure that textual fallback diffing has a parse language of None
Previously we still passed the parse language after exceeding the
graph limit, leading to incorrect underline highlighting.
2023-01-06 19:07:05 +07:00
Wilfred Hughes 8a799af0ff cargo fmt 2023-01-06 18:18:37 +07:00
Wilfred Hughes 9ae4eb17fd Add test for is_all_whitespace 2023-01-06 18:16:01 +07:00
Wilfred Hughes d8d4b8c003 Add is_all_whitespace helper function 2023-01-06 08:36:54 +07:00
Wilfred Hughes cd87796552 Treat doctype nodes as atoms in HTML
The tree-sitter parser doesn't include the text after DOCTYPE in the
inner tag.
2023-01-03 08:40:39 +07:00
Steinar H. Gunderson 9133918dd4 Support parsing of sub-languages.
This allows given nodes (configurable per-language, using tree-sitter's
query syntax) to be re-parsed as other languages. The canonical example
is CSS or JavaScript inside HTML, which normally would be a single token
but now can get the full range of syntax highlighting and tree diffing.

The config sets this up for only two languages: HTML (contains CSS or
JavaScript in <script> or <style> tags; we don't support style="" or
onclick="" etc. at this point), and Makefiles (contains Bash in
$(shell ...) commands). The latter is fairly obscure; the big win is
in the former.

It would be nice to also have this support for PHP; however, the HTML
parser seems to be a bit confused when asked to parse the partial HTML
blocks we get if we just mark the "text" blocks as HTML, so for this
to work well, probably the PHP blocks should be parsed as sub-languages
of HTML instead of vice versa.

Also, as a minor quibble, there should be support for bash in Perl's
backticks (similar to in Makefiles), but the tree-sitter Perl parser
does not support backticks at all (it goes into error recovery).

There may have been languages that I've missed, e.g. some languages
might have nodes that contain e.g. SQL.

Fixes #382. Potentially relevant to #376.
2023-01-03 08:31:48 +07:00
Wilfred Hughes 0fc1842595 Improve word highlighting heuristics in comments
Previously we highlighted changed whitespace, which led to ugly
results if the number of words changed (there was a different number
of whitespace characters so some were highlighted).

Also treat _ and - as word constituents, as it produces nicer results
when people write example CLI invocations in comments.
2023-01-02 16:56:31 +07:00
Wilfred Hughes 87dcfd2cca Replace tabs in inline output too 2023-01-01 22:55:48 +07:00
Wilfred Hughes e9eb4cd209 Always return the padding amount in split_string_by_width 2023-01-01 22:50:38 +07:00
Wilfred Hughes e8e5ca8e47 Replace tabs during display, so parsing sees the original source
Fixes #350
2023-01-01 22:44:47 +07:00
Wilfred Hughes 00ecf36a22 Pop delimiters immediately, rather than having ExitDelimiter* edges
@QuarticCat observed that popping delimiters is unnecessary, and saw a
speedup in PR #401. This reduces the number of nodes in typical graphs
by ~20%, reducing runtime and memory usage.

This works because there is only one thing we can do at the end of a
list: pop the delimiter. The syntax node on the other side does not
give us more options, we have at most one. Popping all the delimiters
as soon as possible is equivalent, and produces the same graph route.

This change has also slightly changed the output of
sample_files/slow_after.rs, producing a better (more minimal)
diff. This is probably luck, due to the path-dependent nature of the
route solving logic, but it's a positive sign.

A huge thanks to @QuarticCat for their contributions, this is a huge
speedup.

Co-authored-by: QuarticCat <QuarticCat@pm.me>
2022-12-28 02:00:09 +07:00
Wilfred Hughes 4bfdc7685c Log vertex count at the info level
This is less noisy and more useful than the path logging at the debug
level.
2022-12-28 01:00:10 +07:00
Wilfred Hughes 57d1f6d449 Reserve the vec inside allocate_if_new
Pushing to this vec was showing 2.5% of total compute time in profiles.
2022-12-28 00:30:25 +07:00
Wilfred Hughes 9745d06c87 Add docs for our stack implementation 2022-12-24 17:12:58 +07:00
Wilfred Hughes 3b37b9a12c Expand symlinks before computing relative path for display paths
Fixes #447
2022-12-22 22:48:27 +07:00
Wilfred Hughes 3766b944a8 Define a struct for DiffOptions 2022-12-22 09:11:59 +07:00
Wilfred Hughes cadceb20b0 Show whole file names too with --list-languages 2022-12-19 09:33:53 +07:00
Wilfred Hughes a21327ab13 Factor out a constant for language file associations 2022-12-19 09:23:46 +07:00
Wilfred Hughes fda897b816 Don't parse rebar.lock files as Erlang
The Erlang parser doesn't support this syntax apparently:
https://github.com/WhatsApp/tree-sitter-erlang/issues/3
2022-12-19 01:00:56 +07:00
Wilfred Hughes e0fcf2b84b Add a --check-only flag
Fixes #386
2022-12-18 23:55:22 +07:00
Wilfred Hughes abd5e07654 Add a has_syntactic_changes field 2022-12-18 23:21:36 +07:00
Wilfred Hughes b34cdabfdc Prefer has_changes to has_same for clarity 2022-12-18 23:17:05 +07:00
Wilfred Hughes a2f22cb17c Only set the exit code if --exit-code is set
This is important for usage with git log, which terminates on non-zero
exit codes.
2022-12-18 23:11:18 +07:00
dannyfreeman d99a3ba8a0
Separate rules for keywords leading with / char, inline them 2022-12-18 09:42:54 +07:00
Wilfred Hughes b7cfff3f27 Silence some clippy lints 2022-12-18 00:33:27 +07:00
Wilfred Hughes 6a46237bb0 Set the exit code when changes are found
Closes #285
2022-12-18 00:28:54 +07:00
Wilfred Hughes 7dd6bbd609 Implement Default on DisplayOptions 2022-12-17 23:49:18 +07:00
Wilfred Hughes aa067e636b Don't bother storing bytes of binary files 2022-12-17 23:44:48 +07:00
Wilfred Hughes 4b2e601de1 Store hunks in Summary 2022-12-17 23:42:20 +07:00
Wilfred Hughes 2cf27ec7cd Display paths relative to cwd
Fixes #444
2022-12-16 10:08:38 +07:00
Wilfred Hughes bb2ae868d7 Add basic Erlang syntax highlighting
Improves #394
2022-12-15 23:20:37 +07:00
dannyfreeman 1744342eb6
Reorder rules so that dependent rules come later 2022-12-15 11:33:42 +07:00
Wilfred Hughes 75299f79fd Configure erlang delimiters 2022-12-14 23:58:15 +07:00
Wilfred Hughes e6cec41e23 Add basic Erlang support
Add tree-sitter and configure language detection. Helps with #394

Co-authored-by: Benedikt Reinartz <filmor@gmail.com>
2022-12-14 23:51:26 +07:00
dannyfreeman e3940a0818
Remove named symbols for delimiters and keyword markers
These can still be queried, but they don't appear in parse tree
results. This makes the corpus tests a little simpler to maintain,
and it's more in line with other markers in the grammar
2022-12-14 09:50:29 +07:00
dannyfreeman b17f8407a0
Fix freeze on null byte char literals
Tree-sitter does not like parsing empty chars, which is what a null
byte registers as.

This fixes that by forcing the *_NAMERSPACED_NAMES to require
at least one character to be picked up

See
https://github.com/tree-sitter/tree-sitter/issues/98

This commit was squashed

This is the commit message 2:

Don't require at least 1 char for namespaced symbol names.

This allows for symbols like `this-one/` to be parsed without error.
They occur frequently in leiningen templates, which are not valid
clojure files, but are still common enough to account for. Note that
clojure repls will not accept them as valid symbols. The reader throws
an error

This is the commit message 3

Revert "Don't require at least 1 char for namespaced symbol names."

The leiningen templates are not valid clojure code, so we won't make
a special exception in the parsing for them, they can just be invalid.
Parsing still works, we just get an error node in the `sym_lit` nodes
when we fail to match a `sym_name`
2022-12-14 09:50:09 +07:00
dannyfreeman fd5e743d2c
Remove dynamic precedence, no longer needed to resolve conflicts
Also fix regex typo, missing escape char
2022-12-14 09:49:59 +07:00
dannyfreeman 80a63ddb5e
Account for weird keywords that start with / 2022-12-14 09:49:45 +07:00
dannyfreeman 319e45c253
Don't use an alias for $.kwd_marker, reduce duplication 2022-12-14 09:49:21 +07:00
dannyfreeman 4eef1073f6
Tokenize keyword in namespace and name parts
Adds runtime conflict resolution for keywords: when one keyword can
match both keyword rules, tree-sitter will prefer using the
_kwd_qualified rule over _kwd_unqualified
2022-12-14 09:49:13 +07:00
dannyfreeman 0a50ef8786
No whitespace required after lone division symbol /
Accounts for things like '(+ - * /)

but creates individual symbol tokens for invalid clojure code like

(/////) ;; 1 token per /

/asdf ;; 2 tokens: / and asdf

/asdf/hjkl ;; 2 tokens: / and asdf/hjkl

; Correct comment, squash this commit later
2022-12-14 09:49:01 +07:00
dannyfreeman ce6290192a
POC: Tokenize symbol into namespace and name parts
Attempts to address issue #21 and issue #28
2022-12-14 09:48:43 +07:00
Wilfred Hughes c5985c88b2 Allow null bytes in UTF16 input files 2022-12-13 09:38:24 +07:00
Wilfred Hughes 7b31be8adb Improve binary file detection heuristics
Fixes #433
2022-12-08 10:29:35 +07:00
Wilfred Hughes 84a968cdbb cargo fmt 2022-12-08 10:28:24 +07:00
Wilfred Hughes 4da26f6459 Derive equality and debug on ProbableFileKind 2022-12-08 10:04:25 +07:00
Wilfred Hughes 554fb18b7c Fix interleaved output when diffing directories
Fixes #437
2022-12-08 09:58:19 +07:00
rhirano0715 436edb2ab4 Only add colour to the first hunk header
This helps skimming the results when multiple files are changed with
multiple hunks. It makes the file changing more prominent than just
going from e.g. 5/5 to 1/10.

Fixes #400

Acked-by: Wilfred Hughes <me@wilfred.me.uk>
2022-12-01 09:38:36 +07:00
Wilfred Hughes 2e7c90c472 Ensure line wrapping uses the same length on both sides
Closes #421
2022-11-13 00:35:06 +07:00
Wilfred Hughes 923989d1a8 clippy fixes 2022-11-03 22:18:56 +07:00
Wilfred Hughes 7f7b35441b Ensure that inline display without color has newlines
This was broken in 3147eb8e6a when
newline splitting was made consistent, and
2071517621 only fixed the inline case
when color was enabled.

Fixed #383
2022-10-28 23:42:52 +07:00
Wilfred Hughes 28c3b0ef5d Tweak line number styling to make it more distinct from content
Dim line numbers for unchanged lines, and make changed lines bold (in
addition to the existing red/green colours).

Closes #384
2022-10-28 20:34:36 +07:00
Wilfred Hughes 2a3346e338 Use apply_line_number_color consistently on LHS and RHS
Previously we missed a case on the LHS.
2022-10-28 20:18:17 +07:00
Wilfred Hughes 4d8d2a2f9d Fix a clippy lint 2022-10-28 19:39:44 +07:00
Wilfred Hughes 1d6c7923e3 Replace remaining is_lhs booleans with Side arguments 2022-10-28 19:27:46 +07:00
Wilfred Hughes 7ea4b96a41 Prefer Side over booleans in line number styling 2022-10-28 19:14:06 +07:00
Wilfred Hughes 490787fe28 Factor out line number styling 2022-10-28 19:07:51 +07:00
Wilfred Hughes b9d44ae65f Treat error nodes as atoms
Fixes #408
2022-10-15 22:50:08 +07:00
Wilfred Hughes 02f1cca444 Display stdin CLI arguments as "(stdin)"
This improves display for #389, and makes language detection use
pattern matching on FileArgument rather than comparing literal
strings.
2022-10-14 13:46:30 +07:00
Wilfred Hughes b4ff28c75e Fix side-by-side line length when colour is disabled
Fixes #406

Looks like this was inadvertently broken in #301.
2022-10-14 13:15:17 +07:00
Wilfred Hughes 2a6eb7e4f8 Add Debug on SourceDimensions 2022-10-14 11:52:53 +07:00
Wilfred Hughes b6ddd152d0 Add the ability to configure how many lines of context are shown
See #242
2022-10-13 12:34:52 +07:00
QuarticCat cd5ba54752 Reduce number of branches of Vertex::eq 2022-10-06 22:33:47 +07:00
QuarticCat 887dec7645 Remove field can_pop_either from Vertex 2022-10-06 22:31:48 +07:00
QuarticCat 7a8044696e Simplify push_{lhs,rhs}_delimiter 2022-10-06 22:31:38 +07:00