Commit Graph

15734 Commits (claude/reduce-binary-size-01BSCVzUBqZD4ZBiji5q5kh7)
 

Author SHA1 Message Date
Claude 7ee432f351
Document root causes of parser test failures
Analyzed why 9 parsers failed during size testing. The failures were NOT
caused by bugs but by fundamental cross-dependencies between parsers:

Cross-parser dependencies (highlight queries):
- C++ depends on C's HIGHLIGHT_QUERY
- TypeScript depends on JavaScript's HIGHLIGHT_QUERY
- QML depends on both JavaScript and TypeScript queries

Sub-language dependencies:
- HTML embeds JavaScript (for <script>) and CSS (for <style>)
- Make embeds Bash (for shell commands)

Multi-variant languages:
- OCaml provides both OCaml and OCamlInterface from one crate

This is VALUABLE information for feature flag design - dependent
parsers should be grouped together since they can't be removed
independently anyway.

Recommended feature bundles:
- Web: JavaScript + TypeScript + HTML + CSS (interdependent)
- C/C++: Must stay together
- Build: Bash + Make (linked)
2025-12-05 23:41:06 +07:00
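The dependencies listed in commit 7ee432f351 imply that enabling one parser can pull in others. A minimal sketch of computing which parsers must be enabled together, assuming a hypothetical `direct_deps` table filled in from the dependencies named in that message (the function names and lowercase identifiers are illustrative, not difftastic's actual code):

```rust
use std::collections::BTreeSet;

/// Direct dependencies named in the commit message.
/// Hypothetical table: the identifiers are chosen for illustration only.
fn direct_deps(parser: &str) -> &'static [&'static str] {
    match parser {
        "cpp" => &["c"],
        "typescript" => &["javascript"],
        "qml" => &["javascript", "typescript"],
        "html" => &["javascript", "css"],
        "make" => &["bash"],
        _ => &[],
    }
}

/// Returns the parser itself plus everything it transitively depends on.
fn required_parsers(parser: &'static str) -> BTreeSet<&'static str> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![parser];
    while let Some(p) = stack.pop() {
        // `insert` returns false if `p` was already visited.
        if seen.insert(p) {
            stack.extend(direct_deps(p).iter().copied());
        }
    }
    seen
}

fn main() {
    for parser in ["cpp", "qml", "html", "make"] {
        println!("{}: {:?}", parser, required_parsers(parser));
    }
}
```

A feature bundle would then need to contain at least the set returned for each parser it exposes, which is why C and C++ end up in the same bundle.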
Claude ce91512285
Complete comprehensive tree-sitter parser size analysis
Tested 43 of 52 parsers (82.7% coverage) to identify binary size
contributors. Replaced initial 7-parser analysis with full results.

MAJOR FINDING: Verilog parser alone accounts for 17.33 MB (15.5%)!

Top 10 largest parsers (56.97 MB total, 51% of binary):
1. Verilog: 17.33 MB - EXTREME outlier, nearly 3x the size of #2
2. C#: 6.06 MB
3. Julia: 5.98 MB
4. ObjC: 5.09 MB
5. F#: 4.90 MB
6. Kotlin: 3.88 MB
7. Haskell: 3.71 MB
8. C++: 3.68 MB
9. Swift: 3.18 MB
10. TypeScript: 3.16 MB

Key insights:
- Top 5 parsers = 39.4 MB (35% of binary)
- All 43 parsers = 74.1 MB (66% of binary)
- Making Verilog optional alone saves 15.5%
- Tiered feature flags could reduce binary to ~40-85 MB

Recommendations:
1. Immediate: Make Verilog optional (17 MB savings)
2. Short-term: Implement tiered feature system
3. Medium-term: Provide pre-built binaries for common configs

Complete data in all_parser_results.csv with detailed analysis
in PARSER_SIZE_ANALYSIS.md including methodology, insights, and
actionable recommendations for binary size optimization.
2025-12-05 00:49:37 +07:00
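The totals and percentages quoted in commit ce91512285 can be reproduced from the listed sizes. A small sketch, assuming the 112 MB total binary size mentioned in commit d84a6caa40 is the denominator used here as well (the commit does not restate it):

```rust
/// Sizes in MB of the ten largest parsers, copied from the commit message.
const TOP_10_MB: [(&str, f64); 10] = [
    ("Verilog", 17.33),
    ("C#", 6.06),
    ("Julia", 5.98),
    ("ObjC", 5.09),
    ("F#", 4.90),
    ("Kotlin", 3.88),
    ("Haskell", 3.71),
    ("C++", 3.68),
    ("Swift", 3.18),
    ("TypeScript", 3.16),
];

/// Total binary size in MB. Assumption: taken from the "112 MB binary"
/// figure in the earlier analysis commit.
const BINARY_MB: f64 = 112.0;

/// Share of the whole binary, in percent.
fn percent_of_binary(size_mb: f64) -> f64 {
    100.0 * size_mb / BINARY_MB
}

/// Sum of the sizes in a slice of (name, size) pairs.
fn total_mb(parsers: &[(&str, f64)]) -> f64 {
    parsers.iter().map(|(_, mb)| mb).sum()
}

fn main() {
    let top_10 = total_mb(&TOP_10_MB);
    let top_5 = total_mb(&TOP_10_MB[..5]);
    println!("top 10: {:.2} MB ({:.1}%)", top_10, percent_of_binary(top_10));
    println!("top 5:  {:.2} MB ({:.1}%)", top_5, percent_of_binary(top_5));
    // Ratio of the largest parser to the second largest.
    println!("Verilog / C#: {:.2}", TOP_10_MB[0].1 / TOP_10_MB[1].1);
}
```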
Claude d84a6caa40
Add tree-sitter parser size analysis
Systematically tested 7 representative parsers to identify which
contribute most to binary size. Key findings:

- C++ parser: 3.7 MB (largest contributor)
- TypeScript parser: 3.1 MB (second largest)
- PHP parser: 1.2 MB
- Top 3 parsers account for ~8 MB (~7% of 112 MB binary)

Other tested parsers (Python, Go, Rust, Java) have minimal impact
(<1 MB each). This suggests a few large parsers dominate the size.

The analysis includes recommendations for implementing optional
parser features using Cargo feature flags to allow users to build
with only needed language support.
2025-12-04 20:49:54 +07:00
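Commit d84a6caa40 recommends optional parsers behind Cargo feature flags. A rough sketch of what that gating might look like on the Rust side, assuming hypothetical feature names such as `parser-cpp` (these are not difftastic's real features, and the choice of which parsers stay always-on is also an assumption):

```rust
/// Lists the parsers compiled into this build.
/// Assumption: Cargo.toml would declare features named `parser-cpp`,
/// `parser-typescript` and `parser-php`; the names are made up for this sketch.
#[allow(unexpected_cfgs)] // the features are undeclared when compiled standalone
fn enabled_parsers() -> Vec<&'static str> {
    // Parsers measured at under 1 MB each are kept unconditionally here.
    let mut parsers = vec!["python", "go", "rust", "java"];
    // `cfg!` evaluates to a compile-time boolean for the given feature.
    if cfg!(feature = "parser-cpp") {
        parsers.push("cpp");
    }
    if cfg!(feature = "parser-typescript") {
        parsers.push("typescript");
    }
    if cfg!(feature = "parser-php") {
        parsers.push("php");
    }
    parsers
}

fn main() {
    println!("{:?}", enabled_parsers());
}
```

In a real crate the parser dependency itself would be marked optional and the code that references it placed under `#[cfg(feature = "...")]`, so that a disabled parser is not compiled or linked at all; `cfg!` is used here only to keep the sketch short.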
Wilfred Hughes cc064349ac Raw string literals should be atoms in Rust 2025-11-22 17:15:28 +07:00
Wilfred Hughes 45373568a4 Add comments to all justfile recipes 2025-11-22 17:13:51 +07:00
Igor Velkov f99af4968a Add metadata to allow fast creation of a .deb package with the "cargo deb" command 2025-11-20 00:48:51 +07:00
Wilfred Hughes 5936671d5f Fix grammar 2025-11-16 16:37:06 +07:00
Wilfred Hughes 83188358a3 0.67 is released, 0.66 was skipped 2025-11-16 16:34:23 +07:00
Wilfred Hughes 7d8175dccf Next release will be 0.67, skip 0.66
Release process was triggered too early.
2025-11-16 16:20:30 +07:00
Wilfred Hughes fd33557903 Clarify changelog wording 2025-11-16 16:19:22 +07:00
Antonin Delpeuch 4ea2b23203 Unvendor tree-sitter-elisp 2025-11-16 10:29:54 +07:00
Wilfred Hughes 5b28c34eea Fix incorrect display width calculation
Previously we assumed that line numbers always required 4 characters to
display (3 digits plus a space). This should be calculated properly
and was probably a placeholder value when testing the original
implementation.

Instead, use the actual width of the rendered line numbers.
2025-11-16 01:26:23 +07:00
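The fix in commit 5b28c34eea replaces a fixed 4-character assumption with the real rendered width. A minimal sketch of that idea, assuming the column is the decimal digits of the largest line number plus one trailing space (`line_number_column_width` is a hypothetical helper, not difftastic's API):

```rust
/// Columns needed to display line numbers up to `max_line`,
/// followed by a single space. Hypothetical helper for illustration.
fn line_number_column_width(max_line: usize) -> usize {
    // Number of decimal digits, plus one for the separating space.
    max_line.to_string().len() + 1
}

fn main() {
    for max_line in [7, 999, 1000, 12345] {
        println!("{} -> {}", max_line, line_number_column_width(max_line));
    }
}
```

Under this assumption the old constant of 4 is only right for files whose largest line number has exactly three digits.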
Wilfred Hughes 6b58a62668 Add another comment 2025-11-16 01:09:13 +07:00
Wilfred Hughes 811e256aa8 Update comment now we have accurate content widths 2025-11-16 01:04:23 +07:00
Antonin Delpeuch 0f2b1022f2 Unvendor tree-sitter-clojure 2025-11-15 14:10:32 +07:00
Wilfred Hughes 0a3f8c2f92 Merge adjacent spans to normalise output in regression tests 2025-11-15 02:40:43 +07:00
Wilfred Hughes 57bcd173a7 Fix a clippy warning on newer rust due to lint ordering 2025-11-15 02:24:47 +07:00
Wilfred Hughes 3943c1401a Don't consider - as a word character
This produced some unfortunate subword diffs when mixing words, numbers and
hyphens.

Fixes #918
2025-11-14 16:42:25 +07:00
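Commit 3943c1401a stops treating `-` as a word character. A simplified sketch of word splitting under that rule, assuming word characters are alphanumerics and `_` and that every other character becomes its own token (`split_words` and `is_word_char` are illustrative names, not the project's actual functions):

```rust
/// Assumption for this sketch: letters, digits and underscore form words.
/// A hyphen is deliberately NOT a word character.
fn is_word_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Splits `s` into runs of word characters and single non-word characters.
fn split_words(s: &str) -> Vec<&str> {
    let mut parts = Vec::new();
    let mut word_start: Option<usize> = None;
    for (i, c) in s.char_indices() {
        if is_word_char(c) {
            // Remember where the current word began.
            if word_start.is_none() {
                word_start = Some(i);
            }
        } else {
            // Close any open word, then emit the separator on its own.
            if let Some(start) = word_start.take() {
                parts.push(&s[start..i]);
            }
            parts.push(&s[i..i + c.len_utf8()]);
        }
    }
    if let Some(start) = word_start {
        parts.push(&s[start..]);
    }
    parts
}

fn main() {
    println!("{:?}", split_words("foo-bar 12"));
}
```

With the hyphen split out, a change from `foo-bar` to `foo-baz` would compare `bar` against `baz` instead of treating the whole hyphenated string as one word.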
Wilfred Hughes c2c562f579 Fix example in doc comment 2025-11-14 16:29:17 +07:00
Wilfred Hughes c2c74fe1f4 Clarify comment 2025-11-14 16:28:54 +07:00
Wilfred Hughes 3e567b001d More YAML file patterns
Fixes #913
2025-11-14 16:14:50 +07:00
Wilfred Hughes 683dbe5a1b Minor markdown fixes 2025-11-10 22:49:29 +07:00
Wilfred Hughes dd18b1d6cd Tweak variable name for MatchedPos values 2025-11-10 22:23:09 +07:00
adamnemecek d615490493
various refactorings (#909)
* ran `cargo clippy --fix -- -Wclippy::use_self`

* refactoring

* refactored counting functions to use `std::iter::successors`
2025-10-26 16:49:08 +07:00
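Commit d615490493 mentions refactoring counting functions to use `std::iter::successors`. The commit does not say which functions were changed, so the following is only a sketch of the pattern with a hypothetical `count_digits` function:

```rust
use std::iter::successors;

/// Counts the decimal digits of `n` by repeatedly dividing by 10.
/// Hypothetical example; not taken from the refactored code.
fn count_digits(n: u64) -> usize {
    // Yields n, n / 10, n / 100, ... and stops once a single digit remains.
    successors(Some(n), |&x| if x >= 10 { Some(x / 10) } else { None }).count()
}

fn main() {
    for n in [0, 9, 10, 12345] {
        println!("{} has {} digit(s)", n, count_digits(n));
    }
}
```

`successors` takes the first item and a closure producing the next item from the previous one, which replaces a hand-written `while` loop with a counter.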
Wilfred Hughes 5df84a4e92 Clarify language name 2025-10-26 16:55:28 +07:00
Wilfred Hughes 020dd7d8dd Define a separate type for content IDs 2025-10-25 01:13:48 +07:00
Wilfred Hughes 85c56e7a44 Update changelog for a96ed2de9 2025-10-23 20:00:18 +07:00
Wilfred Hughes 77ade2e765 Fix file name confusing compare script because it has multiple _1 2025-10-23 10:03:36 +07:00
Wilfred Hughes 3c2d70b80d Update expected output for hyphen subword fix 2025-10-23 10:01:58 +07:00
Wilfred Hughes ec230ffa10 Sort languages for easy skimming 2025-10-23 09:53:04 +07:00
Wilfred Hughes 84e9a9e673 Fix word splitting with hyphens
Fixes #908
2025-10-23 09:50:06 +07:00
Wilfred Hughes 648fe733ba Improve bash atoms
Fixes #903
2025-10-23 09:37:53 +07:00
Wilfred Hughes 117274ad6c Word wrap 2025-10-23 09:37:14 +07:00
Wilfred Hughes bd864b1468 Cross-compile Intel mac builds 2025-10-22 21:43:37 +07:00
Wilfred Hughes 9acb7e9a3a Add CI configuration for typos 2025-10-22 21:39:43 +07:00
Lee Dogeon 042009899a Fix typo in installation documentation 2025-10-22 19:04:07 +07:00
Wilfred Hughes ee51b89708 Heading tweak 2025-10-22 01:09:19 +07:00
Wilfred Hughes 415e3f87f7 Wording polish 2025-10-22 01:06:34 +07:00
Wilfred Hughes 6f47e787ab Format errors more consistently 2025-10-22 01:03:01 +07:00
Wilfred Hughes a96ed2de96 Improve binary change descriptions 2025-10-22 00:49:42 +07:00
Wilfred Hughes 711d399758 Remove old broken symlink 2025-10-21 18:48:22 +07:00
Wilfred Hughes 4ee35456d4 Prefer screenshot over screencast for git integration 2025-10-21 18:46:36 +07:00
Wilfred Hughes bee925adc5 Add more, but individually simpler, examples 2025-10-20 01:08:27 +07:00
Wilfred Hughes 86e31458fc Use colour to make errors more obvious 2025-10-20 00:47:53 +07:00
Wilfred Hughes ca95aaaa67 Minor wording 2025-10-20 00:39:19 +07:00
Wilfred Hughes a22b32d82d Clarify doc comment 2025-10-20 00:38:48 +07:00
Antonin Delpeuch ef5cd765ef Unvendor tree-sitter-qmljs 2025-10-18 09:57:38 +07:00
Wilfred Hughes 160e184933 Revert "Autodetect dark/light terminals"
This reverts commit bf335094b8.

Doing `git dlog -p` and waiting shows a bunch of

ESCESCESC

in the terminal, so I don't think terminal-colorsaurus is
quite ready for difftastic yet.
2025-10-17 00:27:44 +07:00
Antonin Delpeuch 2a65dd7e02
Migrate to tree-sitter-sequel (#905)
For #891.
2025-10-15 00:52:40 +07:00
Wilfred Hughes aca32ba1ac Add doc comments 2025-10-14 00:26:29 +07:00