manual: Update instructions to add a parser (#902)

* manual: Update instructions to add a parser

This changes the manual so that it doesn't encourage people to vendor parsers if they are available on crates.io.

For #891.

* Fix language inconsistency
Antonin Delpeuch 2025-10-14 01:03:19 +07:00 committed by GitHub
parent b0e331eb2f
commit b666424bbd
2 changed files with 59 additions and 40 deletions

@ -10,60 +10,44 @@ parsers](https://tree-sitter.github.io/tree-sitter/#available-parsers).
## Add the source code
Once you've found a parser, add it as a git subtree to
`vendored_parsers/`. We'll use
[tree-sitter-json](https://github.com/tree-sitter/tree-sitter-json) as
an example.
```
$ git subtree add --prefix=vendored_parsers/tree-sitter-json https://github.com/tree-sitter/tree-sitter-json.git master
```
## Configure the build
Cargo does not allow packages to include subdirectories that contain a
`Cargo.toml`. Add a symlink to the `src/` parser subdirectory.
```
$ cd vendored_parsers
$ ln -s tree-sitter-json/src tree-sitter-json-src
```
You can now add the parser to the build by including the directory in
`build.rs`.
Ideally, the parser should be available as a Rust crate on crates.io.
If that's the case, add it to `Cargo.toml` in the alphabetically sorted list
of parser dependencies. For instance:
```
TreeSitterParser {
name: "tree-sitter-json",
src_dir: "vendored_parsers/tree-sitter-json-src",
extra_files: vec![],
},
tree-sitter-json = "0.24.8"
```
If your parser includes custom C or C++ files for lexing (e.g. a
`scanner.cc`), add them to `extra_files`.
Otherwise, it is possible to [vendor the parser in difftastic's source code](./parser_vendoring.md),
but this should only be used as a last resort.
## Configure parsing
Add an entry to `tree_sitter_parser.rs` for your language.
```
```rust
Json => {
let language = unsafe { tree_sitter_json() };
let language_fn = tree_sitter_json::LANGUAGE;
let language = tree_sitter::Language::new(language_fn);
TreeSitterConfig {
language,
atom_nodes: vec!["string"].into_iter().collect(),
delimiter_tokens: vec![("{", "}"), ("[", "]")],
highlight_query: ts::Query::new(
language,
include_str!("../../vendored_parsers/highlights/json.scm"),
)
.unwrap(),
highlight_query: ts::Query::new(language, tree_sitter_json::HIGHLIGHTS_QUERY)
.unwrap(),
sub_languages: vec![],
}
}
```
If the Rust crate does not export a `HIGHLIGHTS_QUERY`, load the query
from a file instead:
```
include_str!("../../vendored_parsers/highlights/json.scm")
```
Many parser repositories ship a highlights query without exposing it in
the Rust crate. In that case, copy it into difftastic's repository as
`vendored_parsers/highlights/json.scm`.
`atom_nodes` is a list of tree-sitter node names that should be
treated as atoms even though the nodes have children. This is common
for things like string literals or interpolated strings, where the

@ -2,9 +2,44 @@
## Git Subtrees
Tree-sitter parsers are sometimes packaged on npm, sometimes packaged
on crates.io, and have different release frequencies. Difftastic uses
git subtrees (not git submodules) to track parsers.
Tree-sitter parsers are sometimes not packaged on crates.io. In that case, Difftastic uses
git subtrees (not git submodules) to track them.
## Vendoring a parser
Once you've found the source repository for the parser, add it as a git subtree to
`vendored_parsers/`. We'll use
[tree-sitter-json](https://github.com/tree-sitter/tree-sitter-json) as
an example.
```
$ git subtree add --prefix=vendored_parsers/tree-sitter-json https://github.com/tree-sitter/tree-sitter-json.git master
```
### Configure the build
Cargo does not allow packages to include subdirectories that contain a
`Cargo.toml`. Add a symlink to the `src/` parser subdirectory.
```
$ cd vendored_parsers
$ ln -s tree-sitter-json/src tree-sitter-json-src
```
You can now add the parser to the build by including the directory in
`build.rs`.
```
TreeSitterParser {
name: "tree-sitter-json",
src_dir: "vendored_parsers/tree-sitter-json-src",
extra_files: vec![],
},
```
If your parser includes custom C or C++ files for lexing (e.g. a
`scanner.cc`), add them to `extra_files`.
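As a rough illustration of what `extra_files` contributes, the sketch below mirrors the three fields shown above in a stand-in struct and lists the C/C++ sources such an entry would point at. The `source_files` helper and the `tree-sitter-example` parser name are hypothetical and exist only for this example; the real struct and build logic live in `build.rs` and may differ. It also assumes the generated parser sits at `parser.c` inside `src_dir`, which is where tree-sitter normally writes it.

```rust
/// Stand-in for the struct used in `build.rs` (assumption: the real
/// definition may have different field types or extra fields).
#[allow(dead_code)]
struct TreeSitterParser {
    name: &'static str,
    src_dir: &'static str,
    extra_files: Vec<&'static str>,
}

/// Hypothetical helper: the source files a build step would need to
/// compile for this parser. `parser.c` is assumed to always be present;
/// anything in `extra_files` (e.g. a custom scanner) is appended.
fn source_files(parser: &TreeSitterParser) -> Vec<String> {
    let mut files = vec![format!("{}/parser.c", parser.src_dir)];
    files.extend(
        parser
            .extra_files
            .iter()
            .map(|file| format!("{}/{}", parser.src_dir, file)),
    );
    files
}

/// Hypothetical entry for a parser that ships a C++ scanner.
fn example_entry() -> TreeSitterParser {
    TreeSitterParser {
        name: "tree-sitter-example",
        src_dir: "vendored_parsers/tree-sitter-example-src",
        extra_files: vec!["scanner.cc"],
    }
}
```

Calling `source_files(&example_entry())` yields the `parser.c` path followed by the `scanner.cc` path, both under the symlinked `-src` directory. For a parser without a custom scanner, such as the JSON example above, `extra_files` stays empty and only `parser.c` is listed.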
## Updating a parser