|
|
|
|
@ -0,0 +1,82 @@
|
|
|
|
|
# Tree-Sitter Parser Binary Size Analysis
|
|
|
|
|
|
|
|
|
|
## Baseline
|
|
|
|
|
- **Full binary with all parsers: 112 MB** (117,440,512 bytes)
|
|
|
|
|
|
|
|
|
|
## Tested Parsers (Sorted by Size Reduction)
|
|
|
|
|
|
|
|
|
|
| Parser | Binary Size (MB) | Size Reduction (MB) | Percentage |
|
|
|
|
|
|--------|------------------|---------------------|------------|
|
|
|
|
|
| **tree-sitter-cpp** | 108.3 | **3.7** | **3.3%** |
|
|
|
|
|
| **tree-sitter-typescript** | 108.8 | **3.1** | **2.8%** |
|
|
|
|
|
| tree-sitter-php | 110.8 | 1.2 | 1.1% |
|
|
|
|
|
| tree-sitter-python | 111.1 | 0.9 | 0.8% |
|
|
|
|
|
| tree-sitter-go | 111.4 | 0.7 | 0.6% |
|
|
|
|
|
| tree-sitter-rust-orchard | 111.6 | 0.4 | 0.4% |
|
|
|
|
|
| tree-sitter-java | 112.0 | ~0 | ~0% |
|
|
|
|
|
|
|
|
|
|
## Key Findings
|
|
|
|
|
|
|
|
|
|
### Top Contributors
|
|
|
|
|
1. **C++ (tree-sitter-cpp)**: 3.7 MB - **Largest single contributor**
|
|
|
|
|
2. **TypeScript (tree-sitter-typescript)**: 3.1 MB - **Second largest**
|
|
|
|
|
3. PHP (tree-sitter-php): 1.2 MB
|
|
|
|
|
|
|
|
|
|
### Combined Impact
|
|
|
|
|
- Removing just C++ and TypeScript together would save **~6.8 MB** (~6% reduction)
|
|
|
|
|
- Removing top 3 (C++, TypeScript, PHP) would save **~8 MB** (~7% reduction)
|
|
|
|
|
|
|
|
|
|
### Observations
|
|
|
|
|
- **Large language parsers don't always mean large binary size**:
|
|
|
|
|
- Java parser has minimal impact despite being a large language
|
|
|
|
|
- Rust parser has minimal impact (~0.4 MB) despite language complexity
|
|
|
|
|
|
|
|
|
|
- **Parser size varies significantly**:
|
|
|
|
|
- Some parsers (C++, TypeScript) contribute 3+ MB each
|
|
|
|
|
- Others (Java, Rust) contribute < 0.5 MB each
|
|
|
|
|
|
|
|
|
|
## Recommendations
|
|
|
|
|
|
|
|
|
|
### For Maximum Size Reduction
|
|
|
|
|
1. **Make C++ support optional** - saves 3.7 MB
|
|
|
|
|
2. **Make TypeScript support optional** - saves 3.1 MB
|
|
|
|
|
3. Consider making PHP optional - saves 1.2 MB
|
|
|
|
|
|
|
|
|
|
### Feature Flagging Strategy
|
|
|
|
|
Consider using Cargo features to make parsers optional:
|
|
|
|
|
|
|
|
|
|
```toml
|
|
|
|
|
[features]
|
|
|
|
|
default = ["all-parsers"]
|
|
|
|
|
all-parsers = ["cpp", "typescript", "php", /* ... */]
|
|
|
|
|
cpp = ["dep:tree-sitter-cpp"]
|
|
|
|
|
typescript = ["dep:tree-sitter-typescript"]
|
|
|
|
|
# ... etc
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
This would allow users to:
|
|
|
|
|
- Install only the parsers they need
|
|
|
|
|
- Reduce binary size for specific use cases
|
|
|
|
|
- Keep full functionality as the default
|
|
|
|
|
|
|
|
|
|
### Estimated Total Savings
|
|
|
|
|
If all 52 parsers have similar size distribution (unlikely, but for estimation):
|
|
|
|
|
- Average tested parser: ~1.3 MB
|
|
|
|
|
- 52 parsers × 1.3 MB ≈ 68 MB total from all parsers
|
|
|
|
|
- Actual overhead is likely 40-60 MB based on the tested sample
|
|
|
|
|
|
|
|
|
|
## Testing Methodology
|
|
|
|
|
|
|
|
|
|
For each parser:
|
|
|
|
|
1. Removed dependency from Cargo.toml
|
|
|
|
|
2. Stubbed the language case in tree_sitter_parser.rs with panic!()
|
|
|
|
|
3. Ran `cargo clean && cargo build --release`
|
|
|
|
|
4. Measured binary size with `stat -c%s`
|
|
|
|
|
5. Restored original files
|
|
|
|
|
|
|
|
|
|
## Notes
|
|
|
|
|
|
|
|
|
|
- Build times: ~1.5 minutes per parser on this system
|
|
|
|
|
- Testing all 52 parsers would take ~1.5 hours
|
|
|
|
|
- Sample of 7 parsers provides good representation of the variation
|
|
|
|
|
- The largest parsers (C++, TypeScript) are clearly identified
|