manual: Update instructions to add a parser (#902)

* manual: Update instructions to add a parser

This changes the manual so that it doesn't encourage people to vendor parsers if they are available on crates.io.

For #891.

* Fix language inconsistency
Antonin Delpeuch 2025-10-14 01:03:19 +07:00 committed by GitHub
parent b0e331eb2f
commit b666424bbd
2 changed files with 59 additions and 40 deletions

@@ -10,60 +10,44 @@ parsers](https://tree-sitter.github.io/tree-sitter/#available-parsers).
 ## Add the source code
-Once you've found a parser, add it as a git subtree to
-`vendored_parsers/`. We'll use
-[tree-sitter-json](https://github.com/tree-sitter/tree-sitter-json) as
-an example.
+Ideally, the parser should be available as a Rust crate on crates.io.
+If that's the case, add it to `Cargo.toml` in the alphabetically sorted list
+of parser dependencies. For instance:
 ```
-$ git subtree add --prefix=vendored_parsers/tree-sitter-json https://github.com/tree-sitter/tree-sitter-json.git master
+tree-sitter-json = "0.24.8"
 ```
-## Configure the build
-Cargo does not allow packages to include subdirectories that contain a
-`Cargo.toml`. Add a symlink to the `src/` parser subdirectory.
-```
-$ cd vendored_parsers
-$ ln -s tree-sitter-json/src tree-sitter-json-src
-```
-You can now add the parser to build by including the directory in
-`build.rs`.
-```
-TreeSitterParser {
-    name: "tree-sitter-json",
-    src_dir: "vendored_parsers/tree-sitter-json-src",
-    extra_files: vec![],
-},
-```
-If your parser includes custom C or C++ files for lexing (e.g. a
-`scanner.cc`), add them to `extra_files`.
+Otherwise, it is possible to [vendor the parser in difftastic's source code](./parser_vendoring.md),
+but this should only be used as a last resort.
 ## Configure parsing
 Add an entry to `tree_sitter_parser.rs` for your language.
-```
+```rust
 Json => {
-    let language = unsafe { tree_sitter_json() };
+    let language_fn = tree_sitter_json::LANGUAGE;
+    let language = tree_sitter::Language::new(language_fn);
     TreeSitterConfig {
         language,
         atom_nodes: vec!["string"].into_iter().collect(),
         delimiter_tokens: vec![("{", "}"), ("[", "]")],
-        highlight_query: ts::Query::new(
-            language,
-            include_str!("../../vendored_parsers/highlights/json.scm"),
-        )
+        highlight_query: ts::Query::new(language, tree_sitter_json::HIGHLIGHTS_QUERY)
         .unwrap(),
         sub_languages: vec![],
     }
 }
 ```
+If the Rust crate does not include a `HIGHLIGHTS_QUERY`, then you need to include
+it from a file instead, with
+```
+include_str!("../../vendored_parsers/highlights/json.scm")
+```
+Many parser repositories include a highlights query in the repository without
+exposing it in the Rust crate. In that case you can include it as
+`vendored_parsers/highlights/json.scm` in the repository.
 `atom_nodes` is a list of tree-sitter node names that should be
 treated as atoms even though the nodes have children. This is common
 for things like string literals or interpolated strings, where the
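
Note on the hunk above (an illustration, not part of this commit): the new manual text asks for the crate to be added to `Cargo.toml` "in the alphabetically sorted list of parser dependencies". Assuming that list is sorted by plain byte order of the crate names, the slot for a new entry could be found roughly as follows. The helper `insertion_index` and the sample crate names are hypothetical and do not come from difftastic.

```rust
/// Hypothetical helper (not part of difftastic): returns the index at which
/// `new_dep` should be inserted so that `deps` stays sorted.
/// Assumes `deps` is already sorted in byte order.
fn insertion_index(deps: &[&str], new_dep: &str) -> usize {
    // `binary_search` yields Ok(i) when the name is already present and
    // Err(i) with the insertion point otherwise; either index keeps the order.
    deps.binary_search(&new_dep).unwrap_or_else(|i| i)
}
```

With existing entries `tree-sitter-bash`, `tree-sitter-c` and `tree-sitter-rust`, calling `insertion_index` with `tree-sitter-json` gives index 2, so the new line goes between `tree-sitter-c` and `tree-sitter-rust`.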

@@ -2,9 +2,44 @@
 ## Git Subtrees
-Tree-sitter parsers are sometimes packaged on npm, sometimes packaged
-on crates.io, and have different release frequencies. Difftastic uses
-git subtrees (not git submodules) to track parsers.
+Tree-sitter parsers are sometimes not packaged on crates.io. In that case, Difftastic uses
+git subtrees (not git submodules) to track them.
+## Vendoring a parser
+Once you've found the source repository for the parser, add it as a git subtree to
+`vendored_parsers/`. We'll use
+[tree-sitter-json](https://github.com/tree-sitter/tree-sitter-json) as
+an example.
+```
+$ git subtree add --prefix=vendored_parsers/tree-sitter-json https://github.com/tree-sitter/tree-sitter-json.git master
+```
+### Configure the build
+Cargo does not allow packages to include subdirectories that contain a
+`Cargo.toml`. Add a symlink to the `src/` parser subdirectory.
+```
+$ cd vendored_parsers
+$ ln -s tree-sitter-json/src tree-sitter-json-src
+```
+You can now add the parser to build by including the directory in
+`build.rs`.
+```
+TreeSitterParser {
+    name: "tree-sitter-json",
+    src_dir: "vendored_parsers/tree-sitter-json-src",
+    extra_files: vec![],
+},
+```
+If your parser includes custom C or C++ files for lexing (e.g. a
+`scanner.cc`), add them to `extra_files`.
 ## Updating a parser