Merge commit 'be514eec2c86d560c18fab146e9298e21b8eab62'

pull/856/head
Wilfred Hughes 2025-07-02 23:16:05 +07:00
commit 092817a046
52 changed files with 15685 additions and 17074 deletions

@ -2,13 +2,123 @@
Bits may be missing and/or inaccurate :)
### Upcoming? ### Future?
* Update tree-sitter and friends to 0.19.5 or 0.20.x * Handle zero bytes?
* Add formatting docs and utilities * Decide about inline use (e.g. add some \_bare\_\* constructs? stop using?)
* Revise and enhance package.json scripts ([#41](https://github.com/sogaiu/tree-sitter-clojure/issues/41))
* Revise and update docs
* Add some \_bare\_\* constructs to inline ### v0.0.13 - 2024-05-15
* Features and Fixes
* Increase API number from 13 to 14
([#60](https://github.com/sogaiu/tree-sitter-clojure/issues/60))
* Remove Node and Rust Bindings
([#61](https://github.com/sogaiu/tree-sitter-clojure/issues/61))
* Update version info in package.json
* Docs
* What and why doc - update bindings info
### v0.0.12 - 2023-05-07
* Features and Fixes
* Loosen sym_val_lit definition
([#51](https://github.com/sogaiu/tree-sitter-clojure/issues/51))
* Handle metadata that is an evaling_lit
([#35](https://github.com/sogaiu/tree-sitter-clojure/issues/35),
[#46](https://github.com/sogaiu/tree-sitter-clojure/issues/46),
[#50](https://github.com/sogaiu/tree-sitter-clojure/issues/50))
* Handle construct used for ClojureDart's parameterized types
([#35](https://github.com/sogaiu/tree-sitter-clojure/issues/35),
[#44](https://github.com/sogaiu/tree-sitter-clojure/pull/44),
[#46](https://github.com/sogaiu/tree-sitter-clojure/issues/46))
* Generate parser.c and friends with tree-sitter 0.20.7 (ABI 13)
([#26](https://github.com/sogaiu/tree-sitter-clojure/pull/26),
[#34](https://github.com/sogaiu/tree-sitter-clojure/issues/34),
[#45](https://github.com/sogaiu/tree-sitter-clojure/issues/45))
* Docs
* README
* Add section on "what and why"
([#38](https://github.com/sogaiu/tree-sitter-clojure/issues/38))
* Add section pointing to other docs
([#47](https://github.com/sogaiu/tree-sitter-clojure/issues/47))
* Move resources list to own document
([#47](https://github.com/sogaiu/tree-sitter-clojure/issues/47))
* Remove npm-related descriptions
([#47](https://github.com/sogaiu/tree-sitter-clojure/issues/47))
* Use doc - mostly new users added
([#47](https://github.com/sogaiu/tree-sitter-clojure/issues/47))
* Scope doc - corrections and refinements
([#47](https://github.com/sogaiu/tree-sitter-clojure/issues/47))
* Testing doc - link and format updates
([#47](https://github.com/sogaiu/tree-sitter-clojure/issues/47))
* What and why doc - added
([#47](https://github.com/sogaiu/tree-sitter-clojure/issues/47))
* Limits doc - added
* notes.txt - removed
* Developer-related
* Improve maintainability of grammar.js
([#39](https://github.com/sogaiu/tree-sitter-clojure/issues/39),
[#40](https://github.com/sogaiu/tree-sitter-clojure/issues/40))
* Remove dependence on npm
([#36](https://github.com/sogaiu/tree-sitter-clojure/issues/36),
[#37](https://github.com/sogaiu/tree-sitter-clojure/issues/37),
[#38](https://github.com/sogaiu/tree-sitter-clojure/issues/38),
[#45](https://github.com/sogaiu/tree-sitter-clojure/issues/45))
* Cleanup package.json
([#34](https://github.com/sogaiu/tree-sitter-clojure/issues/34),
[#36](https://github.com/sogaiu/tree-sitter-clojure/issues/36),
[#37](https://github.com/sogaiu/tree-sitter-clojure/issues/37),
[#38](https://github.com/sogaiu/tree-sitter-clojure/issues/38),
[#45](https://github.com/sogaiu/tree-sitter-clojure/issues/45))
* Move corpus to test/corpus
* Most developer-bits moved to separate repository
([#35](https://github.com/sogaiu/tree-sitter-clojure/issues/35),
[#36](https://github.com/sogaiu/tree-sitter-clojure/issues/36),
[#39](https://github.com/sogaiu/tree-sitter-clojure/issues/39),
[#42](https://github.com/sogaiu/tree-sitter-clojure/issues/42),
[#43](https://github.com/sogaiu/tree-sitter-clojure/issues/43),
[#45](https://github.com/sogaiu/tree-sitter-clojure/issues/45),
[#46](https://github.com/sogaiu/tree-sitter-clojure/issues/46),
[#47](https://github.com/sogaiu/tree-sitter-clojure/issues/47))
* Credits
* borkdude
([#51](https://github.com/sogaiu/tree-sitter-clojure/issues/51))
* cgrand
([#44](https://github.com/sogaiu/tree-sitter-clojure/pull/44))
* dannyfreeman
([#26](https://github.com/sogaiu/tree-sitter-clojure/pull/26),
[#35](https://github.com/sogaiu/tree-sitter-clojure/issues/35),
[#37](https://github.com/sogaiu/tree-sitter-clojure/issues/37),
[#38](https://github.com/sogaiu/tree-sitter-clojure/issues/38),
[#39](https://github.com/sogaiu/tree-sitter-clojure/issues/39),
[#40](https://github.com/sogaiu/tree-sitter-clojure/issues/40),
[#41](https://github.com/sogaiu/tree-sitter-clojure/issues/41),
[#42](https://github.com/sogaiu/tree-sitter-clojure/issues/42),
[#43](https://github.com/sogaiu/tree-sitter-clojure/issues/43),
[#46](https://github.com/sogaiu/tree-sitter-clojure/issues/46),
[#48](https://github.com/sogaiu/tree-sitter-clojure/pull/48),
[#49](https://github.com/sogaiu/tree-sitter-clojure/issues/49),
[#51](https://github.com/sogaiu/tree-sitter-clojure/issues/51))
* dmiller
([#42](https://github.com/sogaiu/tree-sitter-clojure/issues/42))
* IGJoshua
([#35](https://github.com/sogaiu/tree-sitter-clojure/issues/35))
* NoahTheDuke
([#26](https://github.com/sogaiu/tree-sitter-clojure/pull/26),
[#35](https://github.com/sogaiu/tree-sitter-clojure/issues/35),
[#37](https://github.com/sogaiu/tree-sitter-clojure/issues/37),
[#38](https://github.com/sogaiu/tree-sitter-clojure/issues/38),
[#39](https://github.com/sogaiu/tree-sitter-clojure/issues/39),
[#40](https://github.com/sogaiu/tree-sitter-clojure/issues/40),
[#42](https://github.com/sogaiu/tree-sitter-clojure/issues/42))
* phronmophobic
([#35](https://github.com/sogaiu/tree-sitter-clojure/issues/35))
### v0.0.11 - 2023-01-22
* Update version info in package.json
### v0.0.10 - 2023-01-06

@ -1,25 +0,0 @@
[package]
name = "tree-sitter-clojure"
description = "clojure grammar for the tree-sitter parsing library"
version = "0.0.9"
keywords = ["incremental", "parsing", "clojure"]
categories = ["parsing", "text-editors"]
repository = "https://github.com/sogaiu/tree-sitter-clojure"
edition = "2018"
build = "bindings/rust/build.rs"
include = [
"bindings/rust/*",
"grammar.js",
"queries/*",
"src/*",
]
[lib]
path = "bindings/rust/lib.rs"
[dependencies]
tree-sitter = "0.19.3"
[build-dependencies]
cc = "1.0"

@ -1,156 +1,46 @@
# tree-sitter-clojure
## Notice A tree-sitter grammar for Clojure and ClojureScript
Although no major changes are anticipated at this point, there are no ## What the Repository Provides
guarantees. To get a heads-up before such changes occur, please
consider subscribing to the [Potential Changes Announcements
issue](https://github.com/sogaiu/tree-sitter-clojure/issues/33) to be
notified beforehand. The hope is that by communicating early enough
about these sorts of things, unnecessary breakage can be avoided
and/or mitigated.
## Status
tree-sitter-clojure has been:
* [Tested in various ways](doc/testing.md)
* [Used in some ways](doc/use.md)
* [Scoped for better behavior](doc/scope.md)
* [Brought about through cooperation](doc/credits.md)
## Prerequisites
Unfortunately, the short of it is that it may be a bit complicated depending on what you want to do.
* If you don't use any of the wasm-related functionality (e.g. previewing parse results in your web browser or you want to build a `.wasm` file for use in a plugin or extension), you probably just need:
* an appropriate version of node (I've tested with various versions >= 12, 14) and
* other typical development-related bits (e.g. git, appropriate c compiler, etc.)
* If you want wasm-related functionality, you get to have fun figuring out which version of [emsdk](https://emscripten.org/docs/getting_started/downloads.html#installation-instructions) currently works with tree-sitter. At the time of this writing, [this file](https://github.com/tree-sitter/tree-sitter/blob/master/cli/emscripten-version) indicates a version that might be appropriate. That may depend on precisely what the versions of other bits (e.g. tree-sitter-cli, web-tree-sitter, etc.) might be though, so if something doesn't work right away, you might consider trying [different versions that have been recorded](https://github.com/tree-sitter/tree-sitter/commits/master/emscripten-version).
Note that there may be an upside to using emsdk though -- it may figure out and arrange for an appropriate version of node, making a separate installation of node unnecessary. I don't use such a setup on a day-to-day basis, but it did work for me at least once.
## Fine Print
* The instructions below assume emsdk has been installed, but `emcc` (tool that can be used to compile to wasm) is not necessarily on one's `PATH`. If an appropriate `emcc` is on one's `PATH` (e.g. emscripten installed via homebrew), the emsdk steps (e.g. `source ~/src/emsdk/emsdk_env.sh`) below may be ignored.
* `node-gyp` (tool for compiling native addon modules for Node.js) may fail on machines upgraded to macos Catalina. [This document](https://github.com/nodejs/node-gyp/blob/master/macOS_Catalina.md) may help cope with such a situation.
## Initial Setup
Suppose typical development sources are stored under `~/src`.
### Short Version
```
# clone repository
cd ~/src
git clone https://github.com/sogaiu/tree-sitter-clojure
cd tree-sitter-clojure
# install tree-sitter-cli and dependencies, then build
npm ci
```
### Long Version
```
# clone repository
cd ~/src
git clone https://github.com/sogaiu/tree-sitter-clojure
cd tree-sitter-clojure
# ensure tree-sitter-cli is available as a dev dependency
npm install --save-dev --save-exact tree-sitter-cli
# create `src` and populate with tree-sitter `.c` goodness This repository provides some files used to create various artifacts
npx tree-sitter generate (e.g. dynamic libraries) used for handling Clojure and ClojureScript
source code via tree-sitter.
# populate `node_modules` with dependencies Please see the [what and why document](doc/what-and-why.md) for
npm install detailed information.
# create `build` and populate appropriately ## Potential Changes Announcements
npx node-gyp configure
# create `build/Release` and build `tree_sitter_clojure_binding.node` Changes may occur because:
npx node-gyp rebuild
```
## Grammar Development 1. There may be unanticipated important use cases we may want to
account for
2. The grammar depends on tree-sitter which remains in flux (and is
still pre 1.0)
3. It's possible we missed something or got something wrong about
Clojure and we might want to remedy that
Hack on grammar. To get a heads-up before such changes occur, please consider
subscribing to the [Potential Changes Announcements
``` issue](https://github.com/sogaiu/tree-sitter-clojure/issues/33) to be
# edit grammar.js using some editor notified beforehand.
# rebuild tree-sitter stuff
npx tree-sitter generate && \
npx node-gyp rebuild
```
Parse individual files.
```
# create and populate sample code file for parsing named `sample.clj`
# parse sample file
npx tree-sitter parse sample.clj
# if output has errors, figure out what's wrong
```
Interactively test in the browser (requires emsdk).
```
# prepare emsdk (specifically emcc) for building .wasm
source ~/src/emsdk/emsdk_env.sh
# build .wasm bits and invoke web-ui for interactive testing
npx tree-sitter build-wasm && \
npx tree-sitter web-ui
# in appropriate browser window, paste code in left pane
# examine results in right pane -- can even click on nodes
# if output has errors, figure out what's wrong
```
## Measure Performance
```
# single measurement
npx tree-sitter parse --time sample.clj
# multiple measurements with `multitime`
multitime -n10 -s1 npx tree-sitter parse --time --quiet sample.clj
```
## Build .wasm
Assuming emsdk is installed appropriately under `~/src/emsdk`.
``` Note that previously tagged versions may work fine depending on the
# prepare emsdk (specifically emcc) for use use case. See the [changelog](CHANGELOG.md) for details.
source ~/src/emsdk/emsdk_env.sh
# create `tree-sitter-clojure.wasm` ## Other Documents
npx tree-sitter build-wasm
```
## Resources There are some documents in the [`doc` directory](doc/) covering
topics such as:
* [Guide to your first Tree-sitter grammar](https://gist.github.com/Aerijo/df27228d70c633e088b0591b8857eeef) * [Scope](doc/scope.md)
* [sublime-clojure](https://github.com/tonsky/sublime-clojure) * [Limits](doc/limits.md)
* [syntax-highlighter](https://github.com/EvgeniyPeshkov/syntax-highlighter) * [Testing](doc/testing.md)
* [tree-sitter](http://tree-sitter.github.io/tree-sitter/) * [Uses](doc/use.md)
* [tree-sitter-clojure.oakmac](https://github.com/oakmac/tree-sitter-clojure)
* [tree-sitter-clojure.SergeevPavel](https://github.com/SergeevPavel/tree-sitter-clojure)
* [tree-sitter-clojure.Tavistock](https://github.com/Tavistock/tree-sitter-clojure)
* [vscode-tree-sitter](https://github.com/georgewfraser/vscode-tree-sitter)
* [web-tree-sitter API](https://github.com/tree-sitter/tree-sitter/blob/master/lib/binding_web/tree-sitter-web.d.ts)
## Acknowledgments
Please see the [credits](doc/credits.md).

@ -1,18 +0,0 @@
{
"targets": [
{
"target_name": "tree_sitter_clojure_binding",
"include_dirs": [
"<!(node -e \"require('nan')\")",
"src"
],
"sources": [
"src/parser.c",
"bindings/node/binding.cc"
],
"cflags_c": [
"-std=c99",
]
}
]
}

@ -1,28 +0,0 @@
#include "tree_sitter/parser.h"
#include <node.h>
#include "nan.h"
using namespace v8;
extern "C" TSLanguage * tree_sitter_clojure();
namespace {
NAN_METHOD(New) {}
void Init(Local<Object> exports, Local<Object> module) {
Local<FunctionTemplate> tpl = Nan::New<FunctionTemplate>(New);
tpl->SetClassName(Nan::New("Language").ToLocalChecked());
tpl->InstanceTemplate()->SetInternalFieldCount(1);
Local<Function> constructor = Nan::GetFunction(tpl).ToLocalChecked();
Local<Object> instance = constructor->NewInstance(Nan::GetCurrentContext()).ToLocalChecked();
Nan::SetInternalFieldPointer(instance, 0, tree_sitter_clojure());
Nan::Set(instance, Nan::New("name").ToLocalChecked(), Nan::New("clojure").ToLocalChecked());
Nan::Set(module, Nan::New("exports").ToLocalChecked(), instance);
}
NODE_MODULE(tree_sitter_clojure_binding, Init)
} // namespace

@ -1,19 +0,0 @@
try {
module.exports = require("../../build/Release/tree_sitter_clojure_binding");
} catch (error1) {
if (error1.code !== 'MODULE_NOT_FOUND') {
throw error1;
}
try {
module.exports = require("../../build/Debug/tree_sitter_clojure_binding");
} catch (error2) {
if (error2.code !== 'MODULE_NOT_FOUND') {
throw error2;
}
throw error1
}
}
try {
module.exports.nodeTypeInfo = require("../../src/node-types.json");
} catch (_) {}

@ -1,40 +0,0 @@
fn main() {
let src_dir = std::path::Path::new("src");
let mut c_config = cc::Build::new();
c_config.include(&src_dir);
c_config
.flag_if_supported("-Wno-unused-parameter")
.flag_if_supported("-Wno-unused-but-set-variable")
.flag_if_supported("-Wno-trigraphs");
let parser_path = src_dir.join("parser.c");
c_config.file(&parser_path);
// If your language uses an external scanner written in C,
// then include this block of code:
/*
let scanner_path = src_dir.join("scanner.c");
c_config.file(&scanner_path);
println!("cargo:rerun-if-changed={}", scanner_path.to_str().unwrap());
*/
println!("cargo:rerun-if-changed={}", parser_path.to_str().unwrap());
c_config.compile("parser");
// If your language uses an external scanner written in C++,
// then include this block of code:
/*
let mut cpp_config = cc::Build::new();
cpp_config.cpp(true);
cpp_config.include(&src_dir);
cpp_config
.flag_if_supported("-Wno-unused-parameter")
.flag_if_supported("-Wno-unused-but-set-variable");
let scanner_path = src_dir.join("scanner.cc");
cpp_config.file(&scanner_path);
println!("cargo:rerun-if-changed={}", scanner_path.to_str().unwrap());
cpp_config.compile("scanner");
*/
}

@ -1,52 +0,0 @@
//! This crate provides clojure language support for the [tree-sitter][] parsing library.
//!
//! Typically, you will use the [language][language func] function to add this language to a
//! tree-sitter [Parser][], and then use the parser to parse some code:
//!
//! ```
//! let code = "";
//! let mut parser = tree_sitter::Parser::new();
//! parser.set_language(tree_sitter_clojure::language()).expect("Error loading clojure grammar");
//! let tree = parser.parse(code, None).unwrap();
//! ```
//!
//! [Language]: https://docs.rs/tree-sitter/*/tree_sitter/struct.Language.html
//! [language func]: fn.language.html
//! [Parser]: https://docs.rs/tree-sitter/*/tree_sitter/struct.Parser.html
//! [tree-sitter]: https://tree-sitter.github.io/
use tree_sitter::Language;
extern "C" {
fn tree_sitter_clojure() -> Language;
}
/// Get the tree-sitter [Language][] for this grammar.
///
/// [Language]: https://docs.rs/tree-sitter/*/tree_sitter/struct.Language.html
pub fn language() -> Language {
unsafe { tree_sitter_clojure() }
}
/// The content of the [`node-types.json`][] file for this grammar.
///
/// [`node-types.json`]: https://tree-sitter.github.io/tree-sitter/using-parsers#static-node-types
pub const NODE_TYPES: &'static str = include_str!("../../src/node-types.json");
// Uncomment these to include any queries that this grammar contains
// pub const HIGHLIGHTS_QUERY: &'static str = include_str!("../../queries/highlights.scm");
// pub const INJECTIONS_QUERY: &'static str = include_str!("../../queries/injections.scm");
// pub const LOCALS_QUERY: &'static str = include_str!("../../queries/locals.scm");
// pub const TAGS_QUERY: &'static str = include_str!("../../queries/tags.scm");
#[cfg(test)]
mod tests {
#[test]
fn test_can_load_grammar() {
let mut parser = tree_sitter::Parser::new();
parser
.set_language(super::language())
.expect("Error loading clojure language");
}
}

@ -7,6 +7,7 @@ Many people were directly and indirectly involved in bringing about tree-sitter-
* alehatsman - nvim-treesitter and related discussion
* alexmiller - clojure-related inquiries and docs
* andrewchambers - discussion
* bbatsov - discussions and clojure-ts-mode
* bfredl - neovim and tree-sitter work
* borkdude - analyze-reify, babashka, clj-kondo, edamame, and more
* carocad - parcera and discussions
@ -14,6 +15,7 @@ Many people were directly and indirectly involved in bringing about tree-sitter-
* clojars - including everyone who has uploaded there
* CoenraadS - Bracket-Pair-Colorizer-2
* dannyfreeman - grammar.js enhancements and fixes, clojure-ts-mode and discussions
* dmiller - ClojureCLR consultation
* EvgeniyPeshkov - syntax-highlighter
* georgewfraser - vscode-tree-sitter
* gfredericks - test.check, generators, and discussions
@ -34,9 +36,11 @@ Many people were directly and indirectly involved in bringing about tree-sitter-
* p00f - nvim-ts-rainbow
* pedrorgirardi - discussions, vscode and tree-sitter-clojure bits
* PEZ - calva, vscode tips, and general discussion
* phronmophobic - dewey, discussion, and repository data
* pyrmont - review, error-spotting, fix, and discussions
* rewinfrey - helpful bits from tree-sitter-haskell
* richhickey - clojure, etc.
* rrudakov - discussions and clojure-ts-mode
* Saikyun - discussions
* seancorfield - clojure-related inquiries
* SergeevPavel - tree-sitter-clojure.SergeevPavel (fork of tree-sitter-clojure.Tavistock with further work)

@ -0,0 +1,28 @@
# Limits
The following items are known to not necessarily work:
* [Some template
files](https://github.com/sogaiu/tree-sitter-clojure/issues/42#issuecomment-1426727973) -
these are often not strictly speaking Clojure, though they look pretty close
* Other code that is not standard Clojure
[1](https://github.com/fjarri/clojure-scribble#basic-usage)
[2](https://github.com/dgrnbrg/piplin/blob/4c39386413d62ec9c2d679fa4c742313d97f75ef/src/piplin/mips.clj#L12)
because it uses functionality that modifies Clojure's reader behavior
in certain ways [1](https://github.com/jwymanm/chiara#the-syntax)
[2](https://github.com/dgrnbrg/piplin/blob/4c39386413d62ec9c2d679fa4c742313d97f75ef/src/piplin/types/bits.clj#L231-L251)
* Some older Clojure code - for example, `^` used to mean "wrap the
following thing in `(meta ...)`"
[1](https://github.com/clojure/clojure/blob/1.0.x/src/jvm/clojure/lang/LispReader.java#L71)
[2](https://github.com/clojure/clojure/blob/1.0.x/src/clj/clojure/zip.clj#L58)
* [ClojureCLR's pipe syntax for
symbols](https://github.com/sogaiu/tree-sitter-clojure/issues/35#issuecomment-1407320526)
([comment at #42](https://github.com/sogaiu/tree-sitter-clojure/issues/42#issuecomment-1450290140))
* [Files that contain one or more
zero-bytes](https://github.com/sogaiu/tree-sitter-clojure/issues/42#issuecomment-1430546851)
[1](https://github.com/santifa/clj-dbase/blob/a269ca62d529cf82cec7bffce2e38b71458c6087/src/clj_dbase/core.clj#L121)
[2](https://github.com/ont-app/vocabulary/blob/5929b9b1a16b07dc60f1012070da684e8f073326/resources/uri-escapes.edn) -
this might be a tree-sitter limitation
See [#42](https://github.com/sogaiu/tree-sitter-clojure/issues/42) for
more details.
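
Since files containing zero bytes are listed above as a known limit, one way to cope is to screen inputs before handing them to a parser. The following is only a minimal sketch and is not part of this repository; the function name `contains_zero_byte` and the chunk size are hypothetical choices made for illustration.

```python
from pathlib import Path


def contains_zero_byte(path, chunk_size=65536):
    """Return True if the file at `path` contains at least one zero byte.

    Hypothetical helper for illustration; it reads the file in chunks so
    large files need not be loaded into memory all at once.
    """
    with Path(path).open("rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # Reached end of file without seeing a zero byte.
                return False
            if b"\x00" in chunk:
                return True
```

A caller might skip, or separately report, any file for which this returns `True` rather than treating a parse failure on it as a grammar problem.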

@ -0,0 +1,17 @@
# Resources
Below is a list of resources related to tree-sitter and/or Clojure.
Some may be a bit dated at this point.
* [Guide to your first Tree-sitter
grammar](https://gist.github.com/Aerijo/df27228d70c633e088b0591b8857eeef)
* [sublime-clojure](https://github.com/tonsky/sublime-clojure)
* [syntax-highlighter](https://github.com/EvgeniyPeshkov/syntax-highlighter)
* [tree-sitter](http://tree-sitter.github.io/tree-sitter/)
* [tree-sitter-clojure.oakmac](https://github.com/oakmac/tree-sitter-clojure)
* [tree-sitter-clojure.SergeevPavel](https://github.com/SergeevPavel/tree-sitter-clojure)
* [tree-sitter-clojure.Tavistock](https://github.com/Tavistock/tree-sitter-clojure)
* [vscode-tree-sitter](https://github.com/georgewfraser/vscode-tree-sitter)
* [web-tree-sitter
API](https://github.com/tree-sitter/tree-sitter/blob/master/lib/binding_web/tree-sitter-web.d.ts)

@ -2,53 +2,100 @@
## TLDR
Only "primitives" (e.g. [symbols](https://github.com/sogaiu/tree-sitter-clojure/blob/c00293fb0cd5ce3a7005c0601e9b546c1ea73094/grammar.js#L280-L282), [lists](https://github.com/sogaiu/tree-sitter-clojure/blob/c00293fb0cd5ce3a7005c0601e9b546c1ea73094/grammar.js#L307-L309), etc.) Only "primitives"
are supported, i.e. no higher level constructs like `defn`. (e.g. [symbols](https://github.com/sogaiu/tree-sitter-clojure/blob/c00293fb0cd5ce3a7005c0601e9b546c1ea73094/grammar.js#L280-L282),
[lists](https://github.com/sogaiu/tree-sitter-clojure/blob/c00293fb0cd5ce3a7005c0601e9b546c1ea73094/grammar.js#L307-L309),
etc.) are supported, i.e. no higher level constructs like `defn`.
## The Details
### Why
For some background, Clojure (and other Lisps) have runtime extensible "syntax" via macros, but AFAIU tree-sitter's current design assumes a fixed syntax. For some background, Clojure (and other Lisps) have runtime extensible
"syntax" via macros, but AFAIU tree-sitter's current design assumes a
fixed syntax.
Keeping the above in mind, below are some of the factors that influenced the current stance on scope: Keeping the above in mind, below are some of the factors that
influenced the current stance on scope:
* Clojure has no language specification. This means it's unclear what to try to support in the grammar. For example, `defn` is defined in the `clojure.core` namespace, but then so are a lot of other things. * Clojure has no language specification. This means it's unclear what
* Each additional item added to the grammar increases the chance of a conflict which in turn may adversely impact correct parsing, but also makes the grammar harder to extend and maintain. In some cases this may lead to degraded performance (though it may be premature to be concerned about this point). to try to support in the grammar. For example, `defn` is defined in
the `clojure.core` namespace, but then so are a lot of other things
and `clojure.core` is not a small namespace.
### Alternatives * Each additional item added to the grammar tends to increase the
difficulty of getting the grammar to function correctly (or well
It is possible to [use tree-sitter-clojure as a base](https://github.com/tree-sitter/tree-sitter/issues/645) enough). In the event that an issue is discovered or a much desired
to add additional constructs to a "derived" grammar. For example, such a grammar feature surfaces, the more items there already are in the grammar,
might be specialized to look for "definitions". At least in [emacs-tree-sitter](https://github.com/ubolonton/emacs-tree-sitter), generally, the harder it may be to accommodate / adjust.
[it is technically possible to have multiple grammars be used on single buffer](https://github.com/ubolonton/emacs-tree-sitter/discussions/129#discussioncomment-502836):
> If you want 2 parse trees in the same buffer instead, you would need to define an advice for tree-sitter--do-parse, as well as additional buffer-local variables for the secondary grammar. * Handling more things might lead to degraded performance. Apart from
possibly that being a negative for end-user use, that might also
lead to more waiting time while testing across large samples of code
(which has been essential because of the lack of a specification).
Apparently it became possible in September of 2020 for [queries to match on any of a node's supertypes](https://github.com/tree-sitter/tree-sitter/pull/738). It may be possible to make a list supertype that is "composed of" `defn` and things that are not `defn`. [tree-sitter-clojure-def](https://github.com/sogaiu/tree-sitter-clojure-def) is an attempt at realizing this approach. ### Alternatives
However, depending on one's goals, it might make more sense to consider leveraging It is possible to [use tree-sitter-clojure as a
[clj-kondo's analysis capabilities](https://github.com/clj-kondo/clj-kondo/tree/master/analysis) as clj-kondo already understands Clojure pretty well. IIUC, base](https://github.com/tree-sitter/tree-sitter/issues/645) to add
[clojure-lsp does this](https://github.com/clojure-lsp/clojure-lsp/blob/14724457f0d553795dfe16317d3ee6c5fc97d4ba/deps.edn#L21). additional constructs to a "derived" grammar. For example, such a
grammar might be specialized to look for "definitions". At least in
[emacs-tree-sitter](https://github.com/ubolonton/emacs-tree-sitter),
[it is technically possible to have multiple grammars be used on
single
buffer](https://github.com/ubolonton/emacs-tree-sitter/discussions/129#discussioncomment-502836):
> If you want 2 parse trees in the same buffer instead, you would need
> to define an advice for tree-sitter--do-parse, as well as additional
> buffer-local variables for the secondary grammar.
Apparently it became possible in September of 2020 for [queries to
match on any of a node's
supertypes](https://github.com/tree-sitter/tree-sitter/pull/738). It
may be possible to make a list supertype that is "composed of" `defn`
and things that are not `defn`.
[tree-sitter-clojure-def](https://github.com/sogaiu/tree-sitter-clojure-def)
is an attempt at realizing this approach.
However, depending on one's goals, it might make more sense to
consider leveraging [clj-kondo's analysis
capabilities](https://github.com/clj-kondo/clj-kondo/tree/master/analysis)
as clj-kondo already understands Clojure pretty well. IIUC,
[clojure-lsp does
this](https://github.com/clojure-lsp/clojure-lsp/blob/14724457f0d553795dfe16317d3ee6c5fc97d4ba/deps.edn#L21).
### Miscellaneous Points
* Earlier attempts at adding `def` and friends resulted in unacceptably high error rates [1]. The tests were conducted against code from [Clojars](https://clojars.org/) (uncontrived code). FWIW, two of the previous tree-sitter-clojure attempts (by [oakmac](https://github.com/oakmac/tree-sitter-clojure) and * Earlier attempts at adding `def` and friends resulted in
[Tavistock](https://github.com/Tavistock/tree-sitter-clojure)) also had unacceptably high error rates [2] and they both attempted to support higher level constructs. unacceptably high error rates [1]. The tests were conducted against
code from [Clojars](https://clojars.org/) (uncontrived code) [2].
* For use cases like structural editing, it seems important to be able to distinguish between the following sorts of cases: * For use cases like structural editing, it seems important to be able
to distinguish between the following sorts of cases:
* `defn` used for defining a function, and
* [Using the symbol `defn` within a macro to construct code to define a function](https://github.com/Raynes/conch/blob/685f2c73138f376f2aa0623053dfdaba350a04f4/src/me/raynes/conch.clj#L251-L252) * [Using the symbol `defn` within a macro to construct code to
define a
function](https://github.com/Raynes/conch/blob/685f2c73138f376f2aa0623053dfdaba350a04f4/src/me/raynes/conch.clj#L251-L252)
AFAICT, the approach taken in tree-sitter-clojure-def does not make telling these sorts of things apart possible. AFAICT, the approach taken in tree-sitter-clojure-def does not
make telling these sorts of things apart possible.
* It doesn't seem possible to support all "defining" macros like `defsomething` * It doesn't seem possible to support all "defining" macros like
(e.g. https://github.com/redplanetlabs/specter/blob/efaf35558a2c0068f5b6a8ef1dbbd0912702bdbd/src/clj/com/rpl/specter.cljc#L57-L60) since a user's Clojure code can define these. `defsomething`
(e.g. https://github.com/redplanetlabs/specter/blob/efaf35558a2c0068f5b6a8ef1dbbd0912702bdbd/src/clj/com/rpl/specter.cljc#L57-L60)
since a user's Clojure code can define these.
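
To illustrate the last point, below is a rough sketch (it is not how tree-sitter-clojure works) that collects the head symbols of top-level forms beginning with `def` and separates those in a fixed list from those that are not. The regex heuristic and the names `DEF_HEAD`, `CORE_DEFS`, and `def_like_heads` are assumptions made for illustration; a real tool would need a proper reader.

```python
import re

# Heuristic: an opening paren at the start of a line followed by a symbol
# beginning with "def". This is NOT a Clojure reader; it ignores strings,
# comments, and nesting.
DEF_HEAD = re.compile(r"^\((def[^\s()\[\]{}]*)", re.MULTILINE)

# A small, deliberately incomplete sample of clojure.core defining forms.
CORE_DEFS = {"def", "defn", "defn-", "defmacro", "defmulti", "defmethod",
             "defonce", "defprotocol", "defrecord", "deftype"}


def def_like_heads(source):
    """Split def-like head symbols into (known, unknown) sorted lists."""
    heads = set(DEF_HEAD.findall(source))
    return sorted(heads & CORE_DEFS), sorted(heads - CORE_DEFS)


sample = """\
(defn f [x] x)
(defmacro defsomething [name & body] `(def ~name (do ~@body)))
(defsomething g 42)
"""

known, unknown = def_like_heads(sample)
print(known)    # heads that appear in the fixed sample list
print(unknown)  # user-defined heads that a fixed list cannot anticipate
```

Here `defsomething` only exists because the sample itself defines it, which is the kind of case a grammar with a fixed set of "defining" constructs could not anticipate.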
## Footnotes
* [1] Author's opinion :)
* [2] Author's opinion again :) * [2] Two of the previous tree-sitter-clojure attempts (by
[oakmac](https://github.com/oakmac/tree-sitter-clojure) and
[Tavistock](https://github.com/Tavistock/tree-sitter-clojure)) also
had unacceptably high error rates. The former of those two grammars
tried to handle higher level constructs and it had a notably higher
error rate. After trying to modify that grammar to address the error
rate unsuccessfully, it seemed like the two points were related. Note
though that this is just a suspicion.
## References

@ -2,161 +2,225 @@
## TLDR
[tree-sitter-clojure](https://github.com/sogaiu/tree-sitter-clojure) has been tested using a variety of methods. [tree-sitter-clojure](https://github.com/sogaiu/tree-sitter-clojure)
has been tested using a variety of methods.
_Note_: Current serious testing is done via the code and instructions
in the [ts-clojure](https://github.com/sogaiu/ts-clojure) repository.
The description below is left for historical purposes.
## The Details
This document will touch on some of those methods and why they were attempted: This document will touch on some of those methods and why they were
attempted:
1. Using corpus data from other tree-sitter-clojure attempts 1. Using corpus data from other tree-sitter-clojure attempts
2. Using Clojure source from [Clojars](https://clojars.org/) 2. Using Clojure source from [Clojars](https://clojars.org/)
3. Generative testing via [Hypothesis](https://github.com/HypothesisWorks/hypothesis) 3. Generative testing via
[Hypothesis](https://github.com/HypothesisWorks/hypothesis)
Other employed methods that won't be covered (in much, if any, detail) here: Other employed methods that won't be covered (in much, if any, detail)
here:
1. Sporadic manual invocations 1. Sporadic manual invocations
2. Using [tonsky's sublime-clojure](https://github.com/tonsky/sublime-clojure) test data 2. Using [tonsky's
3. Generative testing via [test.check](https://github.com/clojure/test.check/) sublime-clojure](https://github.com/tonsky/sublime-clojure) test
4. [Manual inspection of the grammar](https://github.com/sogaiu/tree-sitter-clojure/issues/3) data
3. Generative testing via
[test.check](https://github.com/clojure/test.check/)
4. [Manual inspection of the
grammar](https://github.com/sogaiu/tree-sitter-clojure/issues/3)
## Using corpus data from other tree-sitter-clojure attempts ## Using corpus data from other tree-sitter-clojure attempts
There were at least two previous attempts at implementing tree-sitter-clojure, There were at least two previous attempts at implementing
[one by oakmac](https://github.com/oakmac/tree-sitter-clojure) and [another by Tavistock](https://github.com/Tavistock/tree-sitter-clojure). Important things tree-sitter-clojure, [one by
were learned by trying to make these attempts work, but for reasons not covered oakmac](https://github.com/oakmac/tree-sitter-clojure) and [another by
here, a separate attempt was started. Tavistock](https://github.com/Tavistock/tree-sitter-clojure).
Important things were learned by trying to make these attempts work,
Both earlier attempts had [corpus](https://github.com/oakmac/tree-sitter-clojure/tree/master/corpus) [data](https://github.com/Tavistock/tree-sitter-clojure/tree/master/corpus) that could be adapted for testing. Consequently, but for reasons not covered here, a separate attempt was started.
[tsclj-tests-parser](https://gitlab.com/sogaiu/tsclj-tests-parser)
was created to extract [the relevant data as plain files](https://gitlab.com/sogaiu/tsclj-tests-parser/-/tree/master/test-files). These were in turn fed to Both earlier attempts had
tree-sitter's `parse` command using the tree-sitter-clojure grammar to check [corpus](https://github.com/oakmac/tree-sitter-clojure/tree/master/corpus)
for parsing errors. [data](https://github.com/Tavistock/tree-sitter-clojure/tree/master/corpus)
that could be adapted for testing. Consequently,
If changes are made to tree-sitter-clojure's grammar, this method can be used [tsclj-tests-parser](https://github.com/sogaiu/tsclj-tests-parser) was
to quickly check for some forms of undesirable breakage. (This could be taken created to extract [the relevant data as plain
a bit further by adapting the content as corpus data for tree-sitter-clojure.) files](https://github.com/sogaiu/tsclj-tests-parser/-/tree/master/test-files).
These were in turn fed to tree-sitter's `parse` command using the
tree-sitter-clojure grammar to check for parsing errors.
If changes are made to tree-sitter-clojure's grammar, this method can
be used to quickly check for some forms of undesirable breakage.
(This could be taken a bit further by adapting the content as corpus
data for tree-sitter-clojure.)
### But... ### But...
One issue with this approach is that it relies on manually identifying and One issue with this approach is that it relies on manually identifying
spelling out appropriate test cases, which in the case of Clojure, is and spelling out appropriate test cases, which in the case of Clojure,
complicated by the lack of a language specification. is complicated by the lack of a language specification.
Apart from detailed research, this was partially addressed by testing against Apart from detailed research, this was partially addressed by testing
a large sample of Clojure source code written by the community. against a large sample of Clojure source code written by the
community.
## Using Clojure source from Clojars ## Using Clojure source from Clojars
The most fruitful method of testing was working with Clojure source written The most fruitful method of testing was working with Clojure source
by humans for purposes other than for testing tree-sitter-clojure. written by humans for purposes other than for testing
tree-sitter-clojure.
### Where to get samples of Clojure source ### Where to get samples of Clojure source
Initially, repositories were cloned from a variety of locations, but before Initially, repositories were cloned from a variety of locations, but
long a decision was made to switch to using "release" jars from Clojars. before long a decision was made to switch to using "release" jars from
Clojars.
The latter decision was motivated by wanting source that was less likely to
be "broken" in various ways. Compared to "release" jar content from Clojars, The latter decision was motivated by wanting source that was less
the default branch of a repository seemed to have a higher probability of likely to be "broken" in various ways. Compared to "release" jar
"not quite working". Although the Clojars "release" idea was an improvement, content from Clojars, the default branch of a repository seemed to
weeding out inappropriate Clojure source was still necessary. have a higher probability of "not quite working". Although the
Clojars "release" idea was an improvement, weeding out inappropriate
A variety of approaches were used to come up with a specific list of jars from Clojure source was still necessary.
Clojars, but the most recent attempt is [gen-clru-list](https://gitlab.com/sogaiu/gen-clru-list). This is basically a [babashka](https://github.com/babashka/babashka) script that fetches [Clojars' feed.clj](https://github.com/clojars/clojars-web/wiki/Data#useful-extracts-from-the-poms), does some processing, and
writes out a list of urls. For reference, this approach currently yields a number A variety of approaches were used to come up with a specific list of
of urls in the neighborhood of 19,000. jars from Clojars, but the most recent attempt is
[gen-clru-list](https://github.com/sogaiu/gen-clru-list). This is
basically a [babashka](https://github.com/babashka/babashka) script
that fetches [Clojars'
feed.clj](https://github.com/clojars/clojars-web/wiki/Data#useful-extracts-from-the-poms),
does some processing, and writes out a list of urls. For reference,
this approach currently yields a number of urls in the neighborhood of
19,000.
### How to check retrieved Clojure samples ### How to check retrieved Clojure samples
The retrieved content was initially checked using [a-tsclj-checker](https://github.com/sogaiu/a-tsclj-checker) (an adaptation of The retrieved content was initially checked using
[a-tsclj-checker](https://github.com/sogaiu/a-tsclj-checker) (an
adaptation of
[analyze-reify](https://github.com/borkdude/analyze-reify)) which uses [analyze-reify](https://github.com/borkdude/analyze-reify)) which uses
[Rust bindings for tree-sitter](https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_rust) and tree-sitter-clojure to parse Clojure [Rust bindings for
source code. Notably, it can traverse directories and also operate on `.jar` tree-sitter](https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_rust)
files. and tree-sitter-clojure to parse Clojure source code. Notably, it can
traverse directories and also operate on `.jar` files.
Once an error is detected, it is easier to investigate if one has direct
access to the Clojure source file in question (as compared with rummaging Once an error is detected, it is easier to investigate if one has
around `.jar` files). Thus, it was decided to create a single directory tree direct access to the Clojure source file in question (as compared with
containing extracted data from all retrieved jars. On a side note, the rummaging around `.jar` files). Thus, it was decided to create a
single directory tree took less than 2 GB of disk space. single directory tree containing extracted data from all retrieved
jars. On a side note, the single directory tree took less than 2 GB
of disk space.
A less fancy, but easier to maintain (i.e. not written in Rust) tool -- A less fancy, but easier to maintain (i.e. not written in Rust) tool --
[ts-grammar-checker](https://gitlab.com/sogaiu/ts-grammar-checker) -- was [ts-grammar-checker](https://github.com/sogaiu/ts-grammar-checker) -- was
developed as an alternative to `a-tsclj-checker`. Strictly speaking, developed as an alternative to `a-tsclj-checker`. Strictly speaking,
`ts-grammar-checker` may not be necessary as one can probably employ `ts-grammar-checker` may not be necessary as one can probably employ
tree-sitter's `parse` command in combination with `find`, `xargs` and the like tree-sitter's `parse` command in combination with `find`, `xargs` and the like
if on some kind of \*nix. An example of a comparable invocation is: if on some kind of \*nix. An example of a comparable invocation is:
``` ```
find ~/src/clojars-cljish -type f -regex '.*\.clj[cs]?$' -print0 | xargs -0 npx tree-sitter parse --quiet > my-results.txt find ~/src/clojars-cljish -type f -regex '.*\.clj[cs]?$' -print0 | xargs -0 tree-sitter parse --quiet > my-results.txt
``` ```
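The file-selection half of that pipeline can also be sketched in Node.js. This is only an illustration, not part of `ts-grammar-checker` or `a-tsclj-checker`: the helper names `collectClojureFiles` and `parseInBatches` and the batch size are assumptions, and it presumes a `tree-sitter` executable on the `PATH` that can find the grammar, just as the command above does.

```javascript
// Sketch only: a Node.js take on the find | xargs pipeline above.
// collectClojureFiles and parseInBatches are hypothetical helper names.
const fs = require("fs");
const path = require("path");
const { spawnSync } = require("child_process");

// Same file-name test as the find -regex above: .clj, .cljc, or .cljs.
const CLJ_FILE = /\.clj[cs]?$/;

// Recursively collect regular files under `root` whose names match.
function collectClojureFiles(root, found = []) {
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    const full = path.join(root, entry.name);
    if (entry.isDirectory()) {
      collectClojureFiles(full, found);
    } else if (entry.isFile() && CLJ_FILE.test(entry.name)) {
      found.push(full);
    }
  }
  return found;
}

// Hand the files to `tree-sitter parse --quiet` in batches (roughly what
// xargs does) and return the combined standard output.
function parseInBatches(files, batchSize = 500) {
  let output = "";
  for (let i = 0; i < files.length; i += batchSize) {
    const batch = files.slice(i, i + batchSize);
    const result = spawnSync("tree-sitter", ["parse", "--quiet", ...batch],
                             { encoding: "utf8" });
    // stdout is null when the executable could not be started.
    output += result.stdout ?? "";
  }
  return output;
}
```

Something like `fs.writeFileSync("my-results.txt", parseInBatches(collectClojureFiles(dir)))` would then play the role of the shell redirection, where `dir` is whatever directory holds the extracted sources.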
`a-tsclj-checker` is the fastest tool but it has not been updated to the most `a-tsclj-checker` is the fastest tool but it has not been updated to
recent version of tree-sitter-clojure. `ts-grammar-checker` is not quite as the most recent version of tree-sitter-clojure. `ts-grammar-checker`
fast, but it can be easily adapted to work with other tree-sitter grammars (e.g. is not quite as fast, but it can be easily adapted to work with other
it's [used](https://gitlab.com/sogaiu/ts-grammar-checker/-/blob/master/janet-checker.janet) for [tree-sitter-janet-simple](https://github.com/sogaiu/tree-sitter-janet-simple) as well). However, it does not support accessing content tree-sitter grammars (e.g. it's
within `.jar` files. [used](https://github.com/sogaiu/ts-grammar-checker/-/blob/master/janet-checker.janet)
for
Across somewhat less than 150,000 files (.clj, .cljc, .cljs), `a-tsclj-checker` [tree-sitter-janet-simple](https://github.com/sogaiu/tree-sitter-janet-simple)
typically takes a little less than 30 seconds, while `ts-grammar-checker` as well). However, it does not support accessing content within
typically takes a bit more than 100 seconds (at least on the author's machine). `.jar` files.
In subjective terms, it hasn't felt terribly different because knowing there
is at least a 30 second wait, [one typically doesn't sit waiting at a prompt Across somewhat less than 150,000 files (.clj, .cljc, .cljs),
for execution completion](https://xkcd.com/303/). `a-tsclj-checker` typically takes a little less than 30 seconds, while
`ts-grammar-checker` typically takes a bit more than 100 seconds (at
least on the author's machine). In subjective terms, it hasn't felt
terribly different because knowing there is at least a 30 second wait,
[one typically doesn't sit waiting at a prompt for execution
completion](https://xkcd.com/303/).
For any files that parse with errors, it can be handy to apply For any files that parse with errors, it can be handy to apply
[clj-kondo](https://github.com/clj-kondo/clj-kondo). The specific details that [clj-kondo](https://github.com/clj-kondo/clj-kondo). The specific
`clj-kondo` reported were often helpful when examining individual files, but details that `clj-kondo` reported were often helpful when examining
that diagnostic information also provided a way to partition the files into individual files, but that diagnostic information also provided a way
groups. Subjectively it can feel more manageable to deal with 5 groups of files to partition the files into groups. Subjectively it can feel more
compared with 100 separate files (though it's true that the grouping does manageable to deal with 5 groups of files compared with 100 separate
not always turn out to be that meaningful). files (though it's true that the grouping does not always turn out to
be that meaningful).
An individual "suspect" file is typically viewed manually in an editor (usually
one that has `clj-kondo` support enabled) and examined for "issues". An individual "suspect" file is typically viewed manually in an editor
(usually one that has `clj-kondo` support enabled) and examined for
In practice, testing the grammar against appropriate Clojure source from Clojars "issues".
has been the most useful in finding issues with the grammar. The lack of a
specification for Clojure increased the difficulty of creating an appropriate In practice, testing the grammar against appropriate Clojure source
grammar, but having a large sample of code to test against helped to mitigate from Clojars has been the most useful in finding issues with the
this a bit. On more than one occasion some version of the grammar failed to grammar. The lack of a specification for Clojure increased the
parse some legitimate Clojure source and subsequent investigation revealed difficulty of creating an appropriate grammar, but having a large
that the grammar had not accounted for an uncommon and/or unanticipated usage. sample of code to test against helped to mitigate this a bit. On more
than one occasion some version of the grammar failed to parse some
legitimate Clojure source and subsequent investigation revealed that
the grammar had not accounted for an uncommon and/or unanticipated
usage.
### But... ### But...
This method has a significant weakness as there could be cases where This method has a significant weakness as there could be cases where
tree-sitter would parse successfully but the result could be inappropriate. tree-sitter would parse successfully but the result could be
For example, if the grammar definition was faulty, something which should inappropriate. For example, if the grammar definition was faulty,
be parsed as a symbol might end up parsed as a number with no error reported. something which should be parsed as a symbol might end up parsed as a
number with no error reported.
To partially address this issue, generative / property-based testing was To partially address this issue, generative / property-based testing
attempted. was attempted.
## Generative testing via Hypothesis ## Generative testing via Hypothesis
Initially, [some effort was made to use test.check](https://gist.github.com/sogaiu/c0d668d050b63e298ef63549e357f9d2). However, [an outstanding issue with test.check](https://github.com/clojure/test.check/blob/master/doc/growth-and-shrinking.md#unnecessary-bind) (aka TCHECK-112) seemed very likely to be relevant Initially, [some effort was made to use
for the types of tests being considered. Also, the approach used [libpython-clj](https://github.com/clj-python/libpython-clj) to call tree-sitter via [Python bindings for tree-sitter](https://github.com/tree-sitter/py-tree-sitter). Although invoking tree-sitter via Python worked, it was awkward to connect this with `test.check`. For the above reasons, the `test.check` + `libpython-clj` approach (neat as it was) was abandoned. test.check](https://gist.github.com/sogaiu/c0d668d050b63e298ef63549e357f9d2).
However, [an outstanding issue with
Interestingly, Python's Hypothesis doesn't suffer from test.check's ["long-standing Hard Problem"](https://clojure.atlassian.net/browse/TCHECK-112) so that was given a try. [prop-test-ts-clj](https://github.com/sogaiu/prop-test-ts-clj) and [hypothesis-grammar-clojure](https://github.com/sogaiu/hypothesis-grammar-clojure) are the resulting test.check](https://github.com/clojure/test.check/blob/master/doc/growth-and-shrinking.md#unnecessary-bind)
bits. (aka TCHECK-112) seemed very likely to be relevant for the types of
tests being considered. Also, the approach used
At least [one issue](https://github.com/sogaiu/tree-sitter-clojure/issues/7) was discovered and it also turned out that [libpython-clj](https://github.com/clj-python/libpython-clj) to call
[parcera](https://github.com/carocad/parcera) was [affected](https://github.com/carocad/parcera/issues/86). tree-sitter via [Python bindings for
tree-sitter](https://github.com/tree-sitter/py-tree-sitter). Although
The code was also adapted a bit to test [Calva](https://github.com/BetterThanTomorrow/calva). Some issues were discovered and [reported upstream](https://github.com/BetterThanTomorrow/calva/issues/802). invoking tree-sitter via Python worked, it was awkward to connect this
with `test.check`. For the above reasons, the `test.check` +
`libpython-clj` approach (neat as it was) was abandoned.
Interestingly, Python's Hypothesis doesn't suffer from test.check's
["long-standing Hard
Problem"](https://clojure.atlassian.net/browse/TCHECK-112) so that was
given a try.
[prop-test-ts-clj](https://github.com/sogaiu/prop-test-ts-clj) and
[hypothesis-grammar-clojure](https://github.com/sogaiu/hypothesis-grammar-clojure)
are the resulting bits.
At least [one
issue](https://github.com/sogaiu/tree-sitter-clojure/issues/7) was
discovered and it also turned out that
[parcera](https://github.com/carocad/parcera) was
[affected](https://github.com/carocad/parcera/issues/86).
The code was also adapted a bit to test
[Calva](https://github.com/BetterThanTomorrow/calva). Some issues
were discovered and [reported
upstream](https://github.com/BetterThanTomorrow/calva/issues/802).
### But... ### But...
A drawback of this approach is that details of the tree-sitter-clojure grammar A drawback of this approach is that details of the tree-sitter-clojure
became embedded in the tests. One consequence is that if grammar became embedded in the tests. One consequence is that if
tree-sitter-clojure's grammar changes, then the tests may need to be updated tree-sitter-clojure's grammar changes, then the tests may need to be
to reflect changes in the grammar (if there is an intent to continue to updated to reflect changes in the grammar (if there is an intent to
use them). continue to use them).
## Summary ## Summary
tree-sitter-clojure has been tested in a variety of ways attempting to address tree-sitter-clojure has been tested in a variety of ways attempting to
various real-world constraints (e.g. lack of a language specification, address various real-world constraints (e.g. lack of a language
limitations of tree-sitter's approach for a language with extensible syntax, specification, limitations of tree-sitter's approach for a language
etc.). AFAICT, for what it sets out to do, it seems to work pretty well so with extensible syntax, etc.). AFAICT, for what it sets out to do, it
far. seems to work pretty well so far.

@ -1,17 +1,25 @@
## Use Information ## Use Information
tree-sitter-clojure has been used in the following: tree-sitter-clojure has been used in or by the following:
* One of the supported languages in the [nvim-treesitter](https://github.com/nvim-treesitter/nvim-treesitter#supported-languages) plugin for * [clojure-ts-mode](https://github.com/clojure-emacs/clojure-ts-mode)
[neovim](https://github.com/neovim/neovim) where [tree-sitter support is still in the early stages](https://neovim.io/news/2021/07).
* One of the supported languages in [difftastic](https://github.com/Wilfred/difftastic) -- "an experimental diff tool that compares files based on their syntax". * [Cursorless](https://github.com/cursorless-dev/cursorless)
* One of the supported languages in [Cursorless](https://github.com/cursorless-dev/cursorless) -- "a spoken language for structural code editing, enabling developers to code by voice at speeds not possible with a keyboard". * [difftastic](https://github.com/Wilfred/difftastic)
* Exploring [alternative highlighting ideas](https://github.com/ubolonton/emacs-tree-sitter/issues/68) and [an early emacs user foray](https://ag91.github.io/blog/2021/06/22/how-(simple-is)-to-install-a-clojure-tree-sitter-grammar-and-use-it-from-emacs/), both via [emacs-tree-sitter](https://github.com/ubolonton/emacs-tree-sitter). * [Helix Editor](https://github.com/helix-editor/helix)
* Base of [tree-sitter-commonlisp](https://github.com/theHamsta/tree-sitter-commonlisp) * [nvim-treesitter](https://github.com/nvim-treesitter/nvim-treesitter)
* [Semgrep](https://github.com/returntocorp/semgrep)
* [tree-sitter-langs](https://github.com/emacs-tree-sitter/tree-sitter-langs)
* Exploring [alternative highlighting
ideas](https://github.com/ubolonton/emacs-tree-sitter/issues/68) and
[an early emacs user
foray](https://ag91.github.io/blog/2021/06/22/how-(simple-is)-to-install-a-clojure-tree-sitter-grammar-and-use-it-from-emacs/),
both via
[emacs-tree-sitter](https://github.com/ubolonton/emacs-tree-sitter).
* Older versions of the grammar were used to implement [Atom support](https://github.com/sogaiu/language-clojure/tree/tree-sitter-clojure) as well as a couple of [proof-of-concept](https://github.com/sogaiu/vscode-clojure-defs)
[VSCode extensions](https://github.com/sogaiu/vscode-clojure-colorizer). However, these have not been updated to use the most recent grammar.

@ -0,0 +1,101 @@
# What the Repository Provides and Why
This document describes what files and directories the repository
provides and associated reasoning. First it covers things which are
likely to remain in place for some time (except perhaps the `src`
directory). This is followed by a description of things that are more
likely to change or be removed.
One might be interested in this content out of academic curiosity but
more likely it might be because one is thinking of depending on the
repository in some way.
## What and Why
The order of the following files and directories is alphabetical and
not meant to reflect relative importance.
* `CHANGELOG.md` - this file contains a changelog.
* `COPYING.txt` - this file contains license information for the
repository.
* `grammar.js` - this file contains a grammar description and is used
in the process of generating parser source code that lives in `src`.
It's likely that this (or something comparable) will continue to be
provided assuming tree-sitter doesn't change the way it works.
* `package.json` - this file is needed by a
[component](https://github.com/cursorless-dev/vscode-parse-tree/) of
[Cursorless](https://www.cursorless.org/). It uses our grammar via
yarn and `package.json` seems to be essential.
* `queries` - this directory and the simple file it contains are
provided on request from
[`difftastic`](https://github.com/Wilfred/difftastic) folks. The
file it contains doesn't contain much and is not likely to be the
sort of thing one expects to be used in an editor.
* `README.md` - this file contains the repository's README content.
* `src` - this directory contains source files that are generated [1]
from `grammar.js`. The files are typically used to generate a
dynamic library / shared object that can be used by the tree-sitter
library to handle Clojure / ClojureScript source code. Although the
content of this directory is generated, the files are provided
because in practice, multiple parties have already become dependent
on them. There have been opinions voiced that this should not
remain so, but change in that direction has not been widespread. We
would prefer not to be hosting this directory and its content, but
are leaving it in place for the time being. See
[here](https://github.com/sogaiu/ts-questions/blob/master/questions/should-parser-source-be-committed/README.md)
for more on the topic if interested.
* `test/corpus` - this directory contains tree-sitter's corpus
test-related files.
## Other Content
The rest of the content of the repository is currently documentation
that lives in the `doc` directory.
## About Bindings
The repository does not host any bindings (e.g. for Rust, Node, or
other programming language).
They should be straight-forward to generate as long as one has a
suitable `tree-sitter` cli and the `grammar.js` file mentioned above.
Binding code used to be created by the `generate` subcommand, but this
appears to have [changed from version 0.24.0 of the `tree-sitter`
cli](https://github.com/tree-sitter/tree-sitter/releases/tag/v0.24.0):
> Move generation of grammar files to an init command ([#3694](https://github.com/tree-sitter/tree-sitter/pull/3694))
Note that "grammar files" here seems to refer to "bindings" files.
Further evidence in support of this change is [this
documentation](https://tree-sitter.github.io/tree-sitter/cli/init.html#binding-files):
> When you run tree-sitter init, the CLI will also generate a number
> of files in your repository that allow for your parser to be used
> from different language.
Which languages bindings files are generated for is affected by [the
`bindings` field in
`tree-sitter.json`](https://tree-sitter.github.io/tree-sitter/cli/init.html#the-bindings-field).
(It appears that omitting the field means "don't generate any
bindings".)
Probably it's better to consult the official documentation and/or ask
around about what the latest procedure is rather than rely on these
brief notes though.
## Footnotes
[1] If the grammar uses an external scanner, `src` may contain
non-generated files such as `scanner.c`, `scanner.cc`, etc. In the
current case, no external scanner is used and the `src` directory
content is entirely generated.

@ -3,6 +3,10 @@
// things. this is more or less in line with advice from tree-sitter // things. this is more or less in line with advice from tree-sitter
// folks. // folks.
function regex(...patts) {
return RegExp(patts.join(""));
}
// java.lang.Character.isWhitespace AND comma // java.lang.Character.isWhitespace AND comma
// //
// Space Separator (Zs) but NOT including (U+00A0, U+2007, U+202F) // Space Separator (Zs) but NOT including (U+00A0, U+2007, U+202F)
@ -31,29 +35,35 @@
// Unit Separator // Unit Separator
// U+001F // U+001F
const WHITESPACE_CHAR = const WHITESPACE_CHAR =
/[\f\n\r\t, \u000B\u001C\u001D\u001E\u001F\u2028\u2029\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2008\u2009\u200a\u205f\u3000]/; regex("[",
"\\f\\n\\r\\t, ",
"\\u000B\\u001C\\u001D\\u001E\\u001F",
"\\u2028\\u2029\\u1680",
"\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2008",
"\\u2009\\u200a\\u205f\\u3000",
"]");
const WHITESPACE = const WHITESPACE =
token(repeat1(WHITESPACE_CHAR)); token(repeat1(WHITESPACE_CHAR));
const COMMENT = const COMMENT =
token(/(;|#!).*\n?/); token(regex('(;|#!).*\n?'));
const DIGIT = const DIGIT =
/[0-9]/; regex('[0-9]');
const ALPHANUMERIC = const ALPHANUMERIC =
/[0-9a-zA-Z]/; regex('[0-9a-zA-Z]');
const HEX_DIGIT = const HEX_DIGIT =
/[0-9a-fA-F]/; regex('[0-9a-fA-F]');
const OCTAL_DIGIT = const OCTAL_DIGIT =
/[0-7]/; regex('[0-7]');
const HEX_NUMBER = const HEX_NUMBER =
seq("0", seq("0",
/[xX]/, regex('[xX]'),
repeat1(HEX_DIGIT), repeat1(HEX_DIGIT),
optional("N")); optional("N"));
@ -66,10 +76,9 @@ const OCTAL_NUMBER =
// XXX: not constraining portion after r/R // XXX: not constraining portion after r/R
const RADIX_NUMBER = const RADIX_NUMBER =
seq(repeat1(DIGIT), seq(repeat1(DIGIT),
/[rR]/, regex('[rR]'),
repeat1(ALPHANUMERIC)); repeat1(ALPHANUMERIC));
// XXX: not accounting for division by zero
const RATIO = const RATIO =
seq(repeat1(DIGIT), seq(repeat1(DIGIT),
"/", "/",
@ -79,17 +88,17 @@ const DOUBLE =
seq(repeat1(DIGIT), seq(repeat1(DIGIT),
optional(seq(".", optional(seq(".",
repeat(DIGIT))), repeat(DIGIT))),
optional(seq(/[eE]/, optional(seq(regex('[eE]'),
optional(/[+-]/), optional(regex('[+-]')),
repeat1(DIGIT))), repeat1(DIGIT))),
optional("M")); optional("M"));
const INTEGER = const INTEGER =
seq(repeat1(DIGIT), seq(repeat1(DIGIT),
optional(/[MN]/)); optional(regex('[MN]')));
const NUMBER = const NUMBER =
token(prec(10, seq(optional(/[+-]/), token(prec(10, seq(optional(regex('[+-]')),
choice(HEX_NUMBER, choice(HEX_NUMBER,
OCTAL_NUMBER, OCTAL_NUMBER,
RADIX_NUMBER, RADIX_NUMBER,
@ -105,13 +114,26 @@ const BOOLEAN =
'true')); 'true'));
const KEYWORD_HEAD = const KEYWORD_HEAD =
/[^\f\n\r\t ()\[\]{}"@~^;`\\,:/\u000B\u001C\u001D\u001E\u001F\u2028\u2029\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2008\u2009\u200a\u205f\u3000]/; regex("[^",
"\\f\\n\\r\\t ",
"()",
"\\[\\]",
"{}",
'"',
"@~^;`",
"\\\\",
",:/",
"\\u000B\\u001C\\u001D\\u001E\\u001F",
"\\u2028\\u2029\\u1680",
"\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2008",
"\\u2009\\u200a\\u205f\\u3000",
"]");
const KEYWORD_BODY = const KEYWORD_BODY =
choice(/[:']/, KEYWORD_HEAD); choice(regex("[:']"), KEYWORD_HEAD);
const KEYWORD_NAMESPACED_BODY = const KEYWORD_NAMESPACED_BODY =
token(repeat1(choice(/[:'\/]/, KEYWORD_HEAD))); token(repeat1(choice(regex("[:'/]"), KEYWORD_HEAD)));
const KEYWORD_NO_SIGIL = const KEYWORD_NO_SIGIL =
token(seq(KEYWORD_HEAD, token(seq(KEYWORD_HEAD,
@ -125,10 +147,10 @@ const AUTO_RESOLVE_MARK =
const STRING = const STRING =
token(seq('"', token(seq('"',
repeat(/[^"\\]/), repeat(regex('[^"\\\\]')),
repeat(seq("\\", repeat(seq("\\",
/./, regex("."),
repeat(/[^"\\]/))), repeat(regex('[^"\\\\]')))),
'"')); '"'));
// XXX: better to match \o378 as a single item // XXX: better to match \o378 as a single item
@ -137,9 +159,6 @@ const OCTAL_CHAR =
choice(seq(DIGIT, DIGIT, DIGIT), choice(seq(DIGIT, DIGIT, DIGIT),
seq(DIGIT, DIGIT), seq(DIGIT, DIGIT),
seq(DIGIT))); seq(DIGIT)));
// choice(seq(/[0-3]/, OCTAL_DIGIT, OCTAL_DIGIT),
// seq(OCTAL_DIGIT, OCTAL_DIGIT),
// seq(OCTAL_DIGIT)));
const NAMED_CHAR = const NAMED_CHAR =
choice("backspace", choice("backspace",
@ -165,7 +184,7 @@ const UNICODE =
// XXX: null is supposed to be usable but putting \x00 below // XXX: null is supposed to be usable but putting \x00 below
// does not seem to work // does not seem to work
const ANY_CHAR = const ANY_CHAR =
/.|\n/; regex('.|\n');
const CHARACTER = const CHARACTER =
token(seq("\\", token(seq("\\",
@ -175,18 +194,30 @@ const CHARACTER =
ANY_CHAR))); ANY_CHAR)));
const SYMBOL_HEAD = const SYMBOL_HEAD =
/[^\f\n\r\t \/()\[\]{}"@~^;`\\,:#'0-9\u000B\u001C\u001D\u001E\u001F\u2028\u2029\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2008\u2009\u200a\u205f\u3000]/; regex("[^",
"\\f\\n\\r\\t ",
"/",
"()\\[\\]{}",
'"',
"@~^;`",
"\\\\",
",:#'0-9",
"\\u000B\\u001C\\u001D\\u001E\\u001F",
"\\u2028\\u2029\\u1680",
"\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2008",
"\\u2009\\u200a\\u205f\\u3000",
"]");
const NS_DELIMITER = const NS_DELIMITER =
token("/"); token("/");
const SYMBOL_BODY = const SYMBOL_BODY =
choice(SYMBOL_HEAD, choice(SYMBOL_HEAD,
/[:#'0-9]/); regex("[:#'0-9]"));
const SYMBOL_NAMESPACED_NAME = const SYMBOL_NAMESPACED_NAME =
token(repeat1(choice(SYMBOL_HEAD, token(repeat1(choice(SYMBOL_HEAD,
/[\/:#'0-9]/))); regex("[/:#'0-9]"))));
// XXX: no attempt is made to enforce certain complex things, e.g. // XXX: no attempt is made to enforce certain complex things, e.g.
// //
@ -335,20 +366,12 @@ module.exports = grammar({
meta_lit: $ => meta_lit: $ =>
seq(field('marker', "^"), seq(field('marker', "^"),
repeat($._gap), repeat($._gap),
field('value', choice($.read_cond_lit, field('value', $._form)),
$.map_lit,
$.str_lit,
$.kwd_lit,
$.sym_lit))),
old_meta_lit: $ => old_meta_lit: $ =>
seq(field('marker', "#^"), seq(field('marker', "#^"),
repeat($._gap), repeat($._gap),
field('value', choice($.read_cond_lit, field('value', $._form)),
$.map_lit,
$.str_lit,
$.kwd_lit,
$.sym_lit))),
list_lit: $ => list_lit: $ =>
seq(repeat($._metadata_lit), seq(repeat($._metadata_lit),
@ -438,7 +461,7 @@ module.exports = grammar({
sym_val_lit: $ => sym_val_lit: $ =>
seq(field('marker', "##"), seq(field('marker', "##"),
repeat($._gap), repeat($._gap),
field('value', $.sym_lit)), field('value', $._form)),
evaling_lit: $ => evaling_lit: $ =>
seq(repeat($._metadata_lit), // ^:x #=(vector 1) seq(repeat($._metadata_lit), // ^:x #=(vector 1)

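The grammar.js changes above replace long regex literals with a `regex` helper that joins string fragments. As a rough check that the two styles describe the same pattern, the following sketch compares the old and new forms of `WHITESPACE_CHAR` on a handful of characters. The names `OLD_WHITESPACE_CHAR`, `NEW_WHITESPACE_CHAR`, `sameVerdict`, and `SAMPLES` are made up for this illustration and do not appear in the grammar.

```javascript
// Sketch only: mirrors the `regex` helper from the grammar.js diff above.
function regex(...patts) {
  return RegExp(patts.join(""));
}

// Old style: one long regex literal.
const OLD_WHITESPACE_CHAR =
  /[\f\n\r\t, \u000B\u001C\u001D\u001E\u001F\u2028\u2029\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2008\u2009\u200a\u205f\u3000]/;

// New style: the same character class assembled from string fragments.
// Backslashes are doubled because these are string literals.
const NEW_WHITESPACE_CHAR =
  regex("[",
        "\\f\\n\\r\\t, ",
        "\\u000B\\u001C\\u001D\\u001E\\u001F",
        "\\u2028\\u2029\\u1680",
        "\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2008",
        "\\u2009\\u200a\\u205f\\u3000",
        "]");

// Hypothetical helper: true when both patterns give the same verdict
// for every sample character.
function sameVerdict(samples) {
  return samples.every(
    (ch) => OLD_WHITESPACE_CHAR.test(ch) === NEW_WHITESPACE_CHAR.test(ch));
}

// Hypothetical samples: some whitespace, the comma, the three excluded
// space separators (U+00A0, U+2007, U+202F), and two ordinary characters.
const SAMPLES = [" ", ",", "\n", "\t", "\u000B", "\u2028", "\u3000",
                 "\u00A0", "\u2007", "\u202F", "a", "0"];

console.log(sameVerdict(SAMPLES));
```

One detail to keep in mind when reading the diff: in calls such as `regex('(;|#!).*\n?')` the backslash is not doubled, so the string handed to `RegExp` contains an actual newline character rather than the two-character escape. Both spellings match a newline, so the resulting pattern accepts the same input.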
@ -1,278 +0,0 @@
// NOTES
//
// - possibilities (may be as separate grammars?)
// - no fields (but likely that means metadata lives "outside")
// - retain whitespace and comments (for round-tripping)
// - clojure clr's pipe-escaping:
// https://github.com/clojure/clojure-clr/wiki/Specifying-types
//
// - traversal issues
// - use of fields (e.g. value, prefix, tag, metadata)
// - allows skipping certain nodes such as:
// - metadata
// - comment
// - discard-related
// - allows targeted navigation without having to know the
// node type (e.g. field value vs node type map)
// - limitations
// - a bit slower?
// - cannot use fields for things without names, e.g.
// - seq(...) cannot be the 2nd arg to field()
// - $._foo won't work unless it "resolves" to $.bar (non underscore)
// - for a given node, examine child nodes in reverse, that is,
// starting at the end and working backwards
//
// - probably won't do
// - support def, if, and other "primitives"
// - support for {{}} template constructs
//
// - testing
// - clj, cljc, cljs
// - what about edn?
// - approaches
// - "port" hand-written tests
// - oakmac (done)
// - Tavistock (done)
// - tonsky
// - generative testing for token testing (done via hypothesis and py-tree-sitter)
// - look for parsing errors across large sample (e.g. clojars) (done)
// - how to "package" testing facilities
// - currently each approach has its own project directory
//
// - debugging
// - npx tree-sitter parse filepath + look for ERROR in console output
// - npx tree-sitter parse --debug-graph filepath + view log.html
// - npx tree-sitter parse --debug filepath + view console output
//
// - loosening ideas:
// - allow ##Other (not just ##Inf, -##Inf, ##NaN)
// - allow # in keywords
// - allow ::/
// - don't handle "no repeating colons" in symbols and in non-leading
// portions of keywords (currently unimplemented anyway)
//
// - can strings have unicode escapes in them?
//
// - tree-sitter
// - parse subcommand
// - parse from stdin
// - recursively traverse multiple directories (globbing exists)
// - parsing within zips/jars
// - more flexible file type specification
// - custom parsing / processing per "file"
// - web-ui subcommand
// - didn't work when grammar used externals
// - file browsing + loading better than copy-paste
// - indicate error via color
// - jump to error
// - somehow searching for error doesn't seem to work sometimes
// - ~/.tree-sitter
// - bin
// - contains shared libraries for each grammar
// - parse command seems to install stuff here
// - config.json
// - parser-directories used to customize "scan" for grammars
// - theme used for highlight subcommand
// symbolPat from LispReader.java (for keywords and symbols?)
// "[:]?([\\D&&[^/]].*/)?(/|[\\D&&[^/]][^/]*)"
//
// https://clojure.org/reference/reader#_symbols
// 1. Symbols begin with a non-numeric char -- XXX: see 2 for limits?
// 2. Can contain alphanumeric chars and *, +, !, -, _, ', ?, <, > and =
// 3. / can be used once in the middle of a symbol (sep the ns from the name)
// 4. / by itself names the division function
// 5. . special meaning can be used >= 1 times in the middle of a symbol
// to designate a fully-qualified class name, e.g. java.util.BitSet,
// or in namespace names.
// 6. Symbols beginning or ending with . are reserved by Clojure
// 7. Symbols beginning or ending with : are reserved by Clojure
// 8. A symbol can contain one or more non-repeating ':'s
//
// missing
// 9. $, &, % -- in body and end of symbol
//
// undocumented
// -1a can be made a symbol, but reader will reject? repl rejects
// => number parsing takes priority?
// 'a can be made a symbol, but reader will reject? repl -> quote
//
// implied?
// doesn't start with ,
// doesn't start with '
// doesn't start with #
// doesn't start with `
// doesn't start with @
// doesn't start with ^
// doesn't start with \
// doesn't start with ;
// doesn't start with ~
// doesn't start with "
// doesn't start with ( )
// doesn't start with { }
// doesn't start with [ ]
//
// extra:
//
// is my-ns// valid?
//
// "Consistency of symbols between different readers/edn"
//
// foo// should be valid.
//
// 2014-09-16 clojure-dev google group alex miller
//
// https://groups.google.com/d/msg/clojure-dev/b09WvRR90Zc/c3zzMFqDsRYJ
//
// "CLJ-1238 Allow EdnReader to read foo// (matches LispReader behavior)"
//
// changelog for clojure 1.6
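The `symbolPat` quoted in the notes above uses Java's character-class intersection (`[\D&&[^/]]`), which a plain JavaScript `RegExp` does not support. The sketch below is a hand translation that writes that class as `[^\d/]` and anchors the pattern to the whole string; the anchoring and the `splitSymbol` name are assumptions made for illustration, and the extra checks the reader applies after matching are not modeled.

```javascript
// JavaScript rendering of symbolPat from the notes.
// Java's [\D&&[^/]] (non-digit and not "/") becomes [^\d/].
const symbolPat = /^[:]?([^\d/].*\/)?(\/|[^\d/][^/]*)$/;

// Split a symbol-like token into namespace and name parts.
// Returns null when the pattern does not match at all.
function splitSymbol(text) {
  const m = symbolPat.exec(text);
  if (m === null) {
    return null;
  }
  // Group 1 includes the trailing "/" delimiter; strip it.
  const ns = m[1] === undefined ? null : m[1].slice(0, -1);
  return { ns: ns, name: m[2] };
}
```

With this translation, `foo//` splits into namespace `foo` and name `/`, which lines up with the "foo// should be valid" discussion quoted above, while a token starting with a digit such as `1a` is rejected.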
//
// is # allowed as a constituent character in keywords?
//
// following points are reasoning based on edn docs
//
// "Bug in reader or repl? reading keyword :#abc"
//
// Symbols begin with a non-numeric character and can contain
// alphanumeric characters and . * + ! - _ ? $ % & =. If -, + or
// . are the first character, the second character must be
// non-numeric. Additionally, : # are allowed as constituent
// characters in symbols other than as the first character.
//
// 2013-05-02 clojure google group colin jones (fwd by dave sann)
//
// https://groups.google.com/d/msg/clojure/lK7juHxsPCc/TeYjxoW_3csJ
//
// Keywords are identifiers that typically designate
// themselves. They are semantically akin to enumeration
// values. Keywords follow the rules of symbols, except they can
// (and must) begin with :, e.g. :fred or :my/fred. If the target
// platform does not have a keyword type distinct from a symbol
// type, the same type can be used without conflict, since the
// mandatory leading : of keywords is disallowed for symbols.
//
// https://github.com/edn-format/edn#symbols
//
// https://clojure.org/reference/reader#_literals
// 0. Keywords are like symbols, except:
// 1. They can and must begin with a colon, e.g. :fred.
// ~~2. They cannot contain '.' in the name part, or name classes.~~
// 3. They can contain a namespace, :person/name, which may contain '.'s.
// 4. A keyword that begins with two colons is auto-resolved in the current
// namespace to a qualified keyword:
// - If the keyword is unqualified, the namespace will be the current
// namespace. In user, ::rect is read as :user/rect.
// - If the keyword is qualified, the namespace will be resolved using
// aliases in the current namespace. In a namespace where x is aliased
// to example, ::x/foo resolves to :example/foo.
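The auto-resolution rule in item 4 above can be sketched in a few lines. This is an illustration only: `resolveKeyword`, its parameters, and the alias `Map` are hypothetical names, and the grammar in this repository does not perform any resolution itself.

```javascript
// Resolve a "::" keyword the way item 4 describes.
//   text      - keyword source text, e.g. "::rect" or "::x/foo"
//   currentNs - name of the current namespace, e.g. "user"
//   aliases   - Map from alias to full namespace name
function resolveKeyword(text, currentNs, aliases) {
  if (!text.startsWith("::")) {
    // Not auto-resolved; leave as written.
    return text;
  }
  const body = text.slice(2);
  const slash = body.indexOf("/");
  if (slash === -1) {
    // Unqualified: use the current namespace.
    return ":" + currentNs + "/" + body;
  }
  // Qualified: look the prefix up in the alias table.
  const alias = body.slice(0, slash);
  const name = body.slice(slash + 1);
  if (!aliases.has(alias)) {
    throw new Error("unknown namespace alias: " + alias);
  }
  return ":" + aliases.get(alias) + "/" + name;
}
```

Using the examples from the reader reference quoted above, `resolveKeyword("::rect", "user", new Map())` gives `:user/rect`, and with `x` aliased to `example`, `::x/foo` gives `:example/foo`.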
//
// extra:
//
// :/ is a legal keyword(?):
//
// alexmiller: @gfredericks :/ is "open for the language to start
// interpreting" and not an invalid keyword so should be ok to generate.
// and cljs should fix it's weirdness. (#clojure-dev 2019-06-07)
//
// https://clojurians-log.clojureverse.org/clojure-dev/2019-06-07
//
// It is undefined/left for future expansion.
//
// Clojurescript's reading seems weird but given that this is undefined
// it's hard to say it's wrong. :)
//
// 2020-07-10 (or so) alexmiller
//
// https://ask.clojure.org/index.php/9427/clarify-the-position-of-as-a-keyword
// https://clojure.atlassian.net/browse/TCHECK-155
//
// . CAN be in the name part:
//
// "[Bug?] Keyword constraints not enforced"
//
// I think you've both misread "they cannot name classes" to be - "They
// cannot contain class names".
//
// The symbol String can name a class but the keyword :String can't,
// that's all I meant there.
//
// As far as '.', that restriction has been relaxed. I'll try to touch
// up the docs for the next release.
//
// 2008-11-25 clojure google group rich hickey
//
// https://groups.google.com/d/msg/clojure/CCuIp_bZ-ZM/THea7NF91Z4J
//
// Whether keywords can start with numbers:
//
// "puzzled by RuntimeException"
//
// we currently allow keywords starting with numbers and seem to have
// decided this is ok. I would like to get Rich to approve a change to
// the page and do so.
//
// 2014-04-25 clojure google group alex miller
//
// https://groups.google.com/forum/#!msg/clojure/XP1XAaDdKLY/kodfZTk8eeoJ
//
// From a discussion in #clojure, it emerged that while :foo/1 is
// currently not allowed, ::1 is.
//
// 2014-12-10 nicola mometto
//
// https://clojure.atlassian.net/browse/CLJ-1286
//
// "Clarify and align valid symbol and keyword rules for Clojure (and edn)"
//
// https://clojure.atlassian.net/browse/CLJ-1527
//
// consistency of symbols between different readers/edn
//
// https://groups.google.com/forum/#!topic/clojure-dev/b09WvRR90Zc
//
// :1 is accepted because it once accidentally worked and they
// don't like breaking existing code
//
// it was never meant to
//
// 2020-06-14 ish noisesmith on #clojure (slack)
//
// There are libraries out there that assume :1 works. They changed
// Clojure at one point in an alpha to disallow such keywords and it broke
// code so they decided to continue allowing them (even tho' they are
// not "legal").
//
// 2020-06-14 ish seancorfield on #clojure (slack)
//
// Whether # is allowed in a keyword:
//
// "Clarification on # as valid symbol character"
//
// this works now, but is not guaranteed to always be valid
//
// 2016-11-08 clojure google group alex miller
//
// https://groups.google.com/forum/#!topic/clojure/CwZHu1Eszbk
// https://clojure.org/reference/reader#_literals
// 1. Integers can be indefinitely long and will be read as Longs when
// in range and clojure.lang.BigInts otherwise.
// 2. Integers with an N suffix are always read as BigInts.
// 3. When possible, they can be specified in any base with radix from
// 2 to 36 (see Long.parseLong()); for example 2r101010, 8r52, 36r16,
// and 42 are all the same Long.
// 4. Floating point numbers are read as Doubles; with M suffix they are
// read as BigDecimals.
// 5. Ratios are supported, e.g. 22/7.
// intPat
// "([-+]?)(?:(0)|([1-9][0-9]*)|0[xX]([0-9A-Fa-f]+)|0([0-7]+)|([1-9][0-9]?)[rR]([0-9A-Za-z]+)|0[0-9]+)(N)?"
// 0[0-9]+ is for better errors -- thanks seancorfield and andyfingerhut
// ratioPat
// "([-+]?[0-9]+)/([0-9]+)"
// floatPat
// "([-+]?[0-9]+(\\.[0-9]*)?([eE][-+]?[0-9]+)?)(M)?"
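The three number patterns quoted above carry over to JavaScript almost unchanged. The sketch below anchors each one to the whole token and tries them in the order integer, float, ratio; both choices are assumptions about how the reader applies them, and `classifyNumber` is a hypothetical name. The `0[0-9]+` alternative captures no value group, so a token that only matches through it is reported as invalid, which is one reading of the "better errors" remark.

```javascript
// Patterns from the notes, anchored to the whole token.
const intPat =
  /^([-+]?)(?:(0)|([1-9][0-9]*)|0[xX]([0-9A-Fa-f]+)|0([0-7]+)|([1-9][0-9]?)[rR]([0-9A-Za-z]+)|0[0-9]+)(N)?$/;
const ratioPat = /^([-+]?[0-9]+)\/([0-9]+)$/;
const floatPat = /^([-+]?[0-9]+(\.[0-9]*)?([eE][-+]?[0-9]+)?)(M)?$/;

// Classify a token as "int", "float", "ratio", "invalid", or null.
function classifyNumber(text) {
  const m = intPat.exec(text);
  if (m !== null) {
    // Groups 2, 3, 4, 5 and 7 hold the digits for zero, decimal,
    // hex, octal and radix forms. If none participated, only the
    // catch-all 0[0-9]+ alternative matched (e.g. "08").
    const hasValue = [2, 3, 4, 5, 7].some((i) => m[i] !== undefined);
    return hasValue ? "int" : "invalid";
  }
  if (floatPat.test(text)) {
    return "float";
  }
  if (ratioPat.test(text)) {
    return "ratio";
  }
  return null;
}
```

For instance, `2r101010`, `8r52`, `36r16`, and `42` all classify as integers, `22/7` as a ratio, and `-1a` matches none of the patterns, consistent with the "undocumented" note that the reader rejects it.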

@ -1,19 +0,0 @@
{
"name": "tree-sitter-clojure",
"version": "0.0.11",
"lockfileVersion": 1,
"requires": true,
"dependencies": {
"nan": {
"version": "2.14.2",
"resolved": "https://registry.npmjs.org/nan/-/nan-2.14.2.tgz",
"integrity": "sha512-M2ufzIiINKCuDfBSAUr1vWQ+vuVcA9kqx8JJUsbQi6yf1uGRyb7HfpdfUr5qLXf3B/t8dPvcjhKMmlfnP47EzQ=="
},
"tree-sitter-cli": {
"version": "0.19.3",
"resolved": "https://registry.npmjs.org/tree-sitter-cli/-/tree-sitter-cli-0.19.3.tgz",
"integrity": "sha512-UlntGxLrlkQCKVrhm7guzfi+ovM4wDLVCCu3z5jmfDgFNoUoKa/23ddaQON5afD5jB9a02xv4N5MXJfCx+/mpw==",
"dev": true
}
}
}

@ -1,20 +1,7 @@
{ {
"name": "tree-sitter-clojure", "name": "tree-sitter-clojure",
"version": "0.0.11", "version": "0.0.13",
"description": "Clojure grammar for tree-sitter", "description": "Clojure grammar for tree-sitter",
"main": "bindings/node",
"scripts": {
"build": "npx tree-sitter generate && npx node-gyp build",
"test": "npx tree-sitter test"
},
"author": "",
"license": "",
"dependencies": {
"nan": "2.14.2"
},
"devDependencies": {
"tree-sitter-cli": "0.19.3"
},
"tree-sitter": [ "tree-sitter": [
{ {
"scope": "source.clojure", "scope": "source.clojure",

@ -484,7 +484,7 @@
"members": [ "members": [
{ {
"type": "PATTERN", "type": "PATTERN",
"value": "[:'\\/]" "value": "[:'/]"
}, },
{ {
"type": "PATTERN", "type": "PATTERN",
@ -605,7 +605,7 @@
"members": [ "members": [
{ {
"type": "PATTERN", "type": "PATTERN",
"value": "[:'\\/]" "value": "[:'/]"
}, },
{ {
"type": "PATTERN", "type": "PATTERN",
@ -935,7 +935,7 @@
"members": [ "members": [
{ {
"type": "PATTERN", "type": "PATTERN",
"value": "[^\\f\\n\\r\\t \\/()\\[\\]{}\"@~^;`\\\\,:#'0-9\\u000B\\u001C\\u001D\\u001E\\u001F\\u2028\\u2029\\u1680\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2008\\u2009\\u200a\\u205f\\u3000]" "value": "[^\\f\\n\\r\\t /()\\[\\]{}\"@~^;`\\\\,:#'0-9\\u000B\\u001C\\u001D\\u001E\\u001F\\u2028\\u2029\\u1680\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2008\\u2009\\u200a\\u205f\\u3000]"
}, },
{ {
"type": "REPEAT", "type": "REPEAT",
@ -944,7 +944,7 @@
"members": [ "members": [
{ {
"type": "PATTERN", "type": "PATTERN",
"value": "[^\\f\\n\\r\\t \\/()\\[\\]{}\"@~^;`\\\\,:#'0-9\\u000B\\u001C\\u001D\\u001E\\u001F\\u2028\\u2029\\u1680\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2008\\u2009\\u200a\\u205f\\u3000]" "value": "[^\\f\\n\\r\\t /()\\[\\]{}\"@~^;`\\\\,:#'0-9\\u000B\\u001C\\u001D\\u001E\\u001F\\u2028\\u2029\\u1680\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2008\\u2009\\u200a\\u205f\\u3000]"
}, },
{ {
"type": "PATTERN", "type": "PATTERN",
@ -985,11 +985,11 @@
"members": [ "members": [
{ {
"type": "PATTERN", "type": "PATTERN",
"value": "[^\\f\\n\\r\\t \\/()\\[\\]{}\"@~^;`\\\\,:#'0-9\\u000B\\u001C\\u001D\\u001E\\u001F\\u2028\\u2029\\u1680\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2008\\u2009\\u200a\\u205f\\u3000]" "value": "[^\\f\\n\\r\\t /()\\[\\]{}\"@~^;`\\\\,:#'0-9\\u000B\\u001C\\u001D\\u001E\\u001F\\u2028\\u2029\\u1680\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2008\\u2009\\u200a\\u205f\\u3000]"
}, },
{ {
"type": "PATTERN", "type": "PATTERN",
"value": "[\\/:#'0-9]" "value": "[/:#'0-9]"
} }
] ]
} }
@ -1024,7 +1024,7 @@
"members": [ "members": [
{ {
"type": "PATTERN", "type": "PATTERN",
"value": "[^\\f\\n\\r\\t \\/()\\[\\]{}\"@~^;`\\\\,:#'0-9\\u000B\\u001C\\u001D\\u001E\\u001F\\u2028\\u2029\\u1680\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2008\\u2009\\u200a\\u205f\\u3000]" "value": "[^\\f\\n\\r\\t /()\\[\\]{}\"@~^;`\\\\,:#'0-9\\u000B\\u001C\\u001D\\u001E\\u001F\\u2028\\u2029\\u1680\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2008\\u2009\\u200a\\u205f\\u3000]"
}, },
{ {
"type": "REPEAT", "type": "REPEAT",
@ -1033,7 +1033,7 @@
"members": [ "members": [
{ {
"type": "PATTERN", "type": "PATTERN",
"value": "[^\\f\\n\\r\\t \\/()\\[\\]{}\"@~^;`\\\\,:#'0-9\\u000B\\u001C\\u001D\\u001E\\u001F\\u2028\\u2029\\u1680\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2008\\u2009\\u200a\\u205f\\u3000]" "value": "[^\\f\\n\\r\\t /()\\[\\]{}\"@~^;`\\\\,:#'0-9\\u000B\\u001C\\u001D\\u001E\\u001F\\u2028\\u2029\\u1680\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2008\\u2009\\u200a\\u205f\\u3000]"
}, },
{ {
"type": "PATTERN", "type": "PATTERN",
@ -1114,29 +1114,8 @@
"type": "FIELD", "type": "FIELD",
"name": "value", "name": "value",
"content": { "content": {
"type": "CHOICE", "type": "SYMBOL",
"members": [ "name": "_form"
{
"type": "SYMBOL",
"name": "read_cond_lit"
},
{
"type": "SYMBOL",
"name": "map_lit"
},
{
"type": "SYMBOL",
"name": "str_lit"
},
{
"type": "SYMBOL",
"name": "kwd_lit"
},
{
"type": "SYMBOL",
"name": "sym_lit"
}
]
} }
} }
] ]
@ -1163,29 +1142,8 @@
"type": "FIELD", "type": "FIELD",
"name": "value", "name": "value",
"content": { "content": {
"type": "CHOICE", "type": "SYMBOL",
"members": [ "name": "_form"
{
"type": "SYMBOL",
"name": "read_cond_lit"
},
{
"type": "SYMBOL",
"name": "map_lit"
},
{
"type": "SYMBOL",
"name": "str_lit"
},
{
"type": "SYMBOL",
"name": "kwd_lit"
},
{
"type": "SYMBOL",
"name": "sym_lit"
}
]
} }
} }
] ]
@ -1684,7 +1642,7 @@
"name": "value", "name": "value",
"content": { "content": {
"type": "SYMBOL", "type": "SYMBOL",
"name": "sym_lit" "name": "_form"
} }
} }
] ]

@ -940,18 +940,70 @@
"multiple": false, "multiple": false,
"required": true, "required": true,
"types": [ "types": [
{
"type": "anon_fn_lit",
"named": true
},
{
"type": "bool_lit",
"named": true
},
{
"type": "char_lit",
"named": true
},
{
"type": "derefing_lit",
"named": true
},
{
"type": "evaling_lit",
"named": true
},
{ {
"type": "kwd_lit", "type": "kwd_lit",
"named": true "named": true
}, },
{
"type": "list_lit",
"named": true
},
{ {
"type": "map_lit", "type": "map_lit",
"named": true "named": true
}, },
{
"type": "nil_lit",
"named": true
},
{
"type": "ns_map_lit",
"named": true
},
{
"type": "num_lit",
"named": true
},
{
"type": "quoting_lit",
"named": true
},
{ {
"type": "read_cond_lit", "type": "read_cond_lit",
"named": true "named": true
}, },
{
"type": "regex_lit",
"named": true
},
{
"type": "set_lit",
"named": true
},
{
"type": "splicing_read_cond_lit",
"named": true
},
{ {
"type": "str_lit", "type": "str_lit",
"named": true "named": true
@ -959,6 +1011,34 @@
{ {
"type": "sym_lit", "type": "sym_lit",
"named": true "named": true
},
{
"type": "sym_val_lit",
"named": true
},
{
"type": "syn_quoting_lit",
"named": true
},
{
"type": "tagged_or_ctor_lit",
"named": true
},
{
"type": "unquote_splicing_lit",
"named": true
},
{
"type": "unquoting_lit",
"named": true
},
{
"type": "var_quoting_lit",
"named": true
},
{
"type": "vec_lit",
"named": true
} }
] ]
} }
@ -1186,18 +1266,70 @@
"multiple": false, "multiple": false,
"required": true, "required": true,
"types": [ "types": [
{
"type": "anon_fn_lit",
"named": true
},
{
"type": "bool_lit",
"named": true
},
{
"type": "char_lit",
"named": true
},
{
"type": "derefing_lit",
"named": true
},
{
"type": "evaling_lit",
"named": true
},
{ {
"type": "kwd_lit", "type": "kwd_lit",
"named": true "named": true
}, },
{
"type": "list_lit",
"named": true
},
{ {
"type": "map_lit", "type": "map_lit",
"named": true "named": true
}, },
{
"type": "nil_lit",
"named": true
},
{
"type": "ns_map_lit",
"named": true
},
{
"type": "num_lit",
"named": true
},
{
"type": "quoting_lit",
"named": true
},
{ {
"type": "read_cond_lit", "type": "read_cond_lit",
"named": true "named": true
}, },
{
"type": "regex_lit",
"named": true
},
{
"type": "set_lit",
"named": true
},
{
"type": "splicing_read_cond_lit",
"named": true
},
{ {
"type": "str_lit", "type": "str_lit",
"named": true "named": true
@ -1205,6 +1337,34 @@
{ {
"type": "sym_lit", "type": "sym_lit",
"named": true "named": true
},
{
"type": "sym_val_lit",
"named": true
},
{
"type": "syn_quoting_lit",
"named": true
},
{
"type": "tagged_or_ctor_lit",
"named": true
},
{
"type": "unquote_splicing_lit",
"named": true
},
{
"type": "unquoting_lit",
"named": true
},
{
"type": "var_quoting_lit",
"named": true
},
{
"type": "vec_lit",
"named": true
} }
] ]
} }
@ -2136,9 +2296,105 @@
"multiple": false, "multiple": false,
"required": true, "required": true,
"types": [ "types": [
{
"type": "anon_fn_lit",
"named": true
},
{
"type": "bool_lit",
"named": true
},
{
"type": "char_lit",
"named": true
},
{
"type": "derefing_lit",
"named": true
},
{
"type": "evaling_lit",
"named": true
},
{
"type": "kwd_lit",
"named": true
},
{
"type": "list_lit",
"named": true
},
{
"type": "map_lit",
"named": true
},
{
"type": "nil_lit",
"named": true
},
{
"type": "ns_map_lit",
"named": true
},
{
"type": "num_lit",
"named": true
},
{
"type": "quoting_lit",
"named": true
},
{
"type": "read_cond_lit",
"named": true
},
{
"type": "regex_lit",
"named": true
},
{
"type": "set_lit",
"named": true
},
{
"type": "splicing_read_cond_lit",
"named": true
},
{
"type": "str_lit",
"named": true
},
{ {
"type": "sym_lit", "type": "sym_lit",
"named": true "named": true
},
{
"type": "sym_val_lit",
"named": true
},
{
"type": "syn_quoting_lit",
"named": true
},
{
"type": "tagged_or_ctor_lit",
"named": true
},
{
"type": "unquote_splicing_lit",
"named": true
},
{
"type": "unquoting_lit",
"named": true
},
{
"type": "var_quoting_lit",
"named": true
},
{
"type": "vec_lit",
"named": true
} }
] ]
} }

File diff suppressed because it is too large

@ -13,9 +13,8 @@ extern "C" {
#define ts_builtin_sym_end 0 #define ts_builtin_sym_end 0
#define TREE_SITTER_SERIALIZATION_BUFFER_SIZE 1024 #define TREE_SITTER_SERIALIZATION_BUFFER_SIZE 1024
typedef uint16_t TSStateId;
#ifndef TREE_SITTER_API_H_ #ifndef TREE_SITTER_API_H_
typedef uint16_t TSStateId;
typedef uint16_t TSSymbol; typedef uint16_t TSSymbol;
typedef uint16_t TSFieldId; typedef uint16_t TSFieldId;
typedef struct TSLanguage TSLanguage; typedef struct TSLanguage TSLanguage;
@ -102,8 +101,8 @@ struct TSLanguage {
const uint16_t *small_parse_table; const uint16_t *small_parse_table;
const uint32_t *small_parse_table_map; const uint32_t *small_parse_table_map;
const TSParseActionEntry *parse_actions; const TSParseActionEntry *parse_actions;
const char **symbol_names; const char * const *symbol_names;
const char **field_names; const char * const *field_names;
const TSFieldMapSlice *field_map_slices; const TSFieldMapSlice *field_map_slices;
const TSFieldMapEntry *field_map_entries; const TSFieldMapEntry *field_map_entries;
const TSSymbolMetadata *symbol_metadata; const TSSymbolMetadata *symbol_metadata;
@ -123,15 +122,23 @@ struct TSLanguage {
unsigned (*serialize)(void *, char *); unsigned (*serialize)(void *, char *);
void (*deserialize)(void *, const char *, unsigned); void (*deserialize)(void *, const char *, unsigned);
} external_scanner; } external_scanner;
const TSStateId *primary_state_ids;
}; };
/* /*
* Lexer Macros * Lexer Macros
*/ */
#ifdef _MSC_VER
#define UNUSED __pragma(warning(suppress : 4101))
#else
#define UNUSED __attribute__((unused))
#endif
#define START_LEXER() \ #define START_LEXER() \
bool result = false; \ bool result = false; \
bool skip = false; \ bool skip = false; \
UNUSED \
bool eof = false; \ bool eof = false; \
int32_t lookahead; \ int32_t lookahead; \
goto start; \ goto start; \
@ -165,7 +172,7 @@ struct TSLanguage {
* Parse Table Macros * Parse Table Macros
*/ */
#define SMALL_STATE(id) id - LARGE_STATE_COUNT #define SMALL_STATE(id) ((id) - LARGE_STATE_COUNT)
#define STATE(id) id #define STATE(id) id
@ -175,7 +182,7 @@ struct TSLanguage {
{{ \ {{ \
.shift = { \ .shift = { \
.type = TSParseActionTypeShift, \ .type = TSParseActionTypeShift, \
.state = state_value \ .state = (state_value) \
} \ } \
}} }}
@ -183,7 +190,7 @@ struct TSLanguage {
{{ \ {{ \
.shift = { \ .shift = { \
.type = TSParseActionTypeShift, \ .type = TSParseActionTypeShift, \
.state = state_value, \ .state = (state_value), \
.repetition = true \ .repetition = true \
} \ } \
}} }}

@ -9,7 +9,8 @@ Symbol Metadata
(source (source
(vec_lit (vec_lit
(meta_lit (meta_lit
(sym_lit (sym_name))))) (sym_lit
(sym_name)))))
================================================================================ ================================================================================
Keyword Metadata Keyword Metadata
@ -22,7 +23,8 @@ Keyword Metadata
(source (source
(map_lit (map_lit
(meta_lit (meta_lit
(kwd_lit (kwd_name))))) (kwd_lit
(kwd_name)))))
================================================================================ ================================================================================
String Metadata String Metadata
@ -49,9 +51,11 @@ Map Metadata
(set_lit (set_lit
(meta_lit (meta_lit
(map_lit (map_lit
(kwd_lit (kwd_name)) (kwd_lit
(kwd_name))
(num_lit) (num_lit)
(kwd_lit (kwd_name)) (kwd_lit
(kwd_name))
(num_lit))))) (num_lit)))))
================================================================================ ================================================================================
@ -66,11 +70,14 @@ Reader Conditional Metadata
(vec_lit (vec_lit
(meta_lit (meta_lit
(read_cond_lit (read_cond_lit
(kwd_lit (kwd_name)) (kwd_lit
(kwd_name))
(str_lit) (str_lit)
(kwd_lit (kwd_name)) (kwd_lit
(kwd_name))
(str_lit) (str_lit)
(kwd_lit (kwd_name)) (kwd_lit
(kwd_name))
(str_lit))))) (str_lit)))))
================================================================================ ================================================================================
@ -84,8 +91,47 @@ Multiple Bits of Metadata
(source (source
(set_lit (set_lit
(meta_lit (meta_lit
(kwd_lit (kwd_name))) (kwd_lit
(kwd_name)))
(meta_lit (meta_lit
(kwd_lit (kwd_name))) (kwd_lit
(kwd_name)))
(meta_lit (meta_lit
(kwd_lit (kwd_name))))) (kwd_lit
(kwd_name)))))
================================================================================
Tagged Literal Metadata
================================================================================
^#/(data) thing
--------------------------------------------------------------------------------
(source
(sym_lit
(meta_lit
(tagged_or_ctor_lit
(sym_lit
(sym_name))
(list_lit
(sym_lit
(sym_name)))))
(sym_name)))
================================================================================
Evaling Literal Metadata
================================================================================
^#=(keyword "a") []
--------------------------------------------------------------------------------
(source
(vec_lit
(meta_lit
(evaling_lit
(list_lit
(sym_lit
(sym_name))
(str_lit))))))

@ -8,7 +8,8 @@ Inf
(source (source
(sym_val_lit (sym_val_lit
(sym_lit (sym_name)))) (sym_lit
(sym_name))))
================================================================================ ================================================================================
-Inf -Inf
@ -20,7 +21,8 @@ Inf
(source (source
(sym_val_lit (sym_val_lit
(sym_lit (sym_name)))) (sym_lit
(sym_name))))
================================================================================ ================================================================================
NaN NaN
@ -32,4 +34,22 @@ NaN
(source (source
(sym_val_lit (sym_val_lit
(sym_lit (sym_name)))) (sym_lit
(sym_name))))
================================================================================
Symbolic Value Literal with Evaling Literal
================================================================================
###=(identity NaN)
--------------------------------------------------------------------------------
(source
(sym_val_lit
(evaling_lit
(list_lit
(sym_lit
(sym_name))
(sym_lit
(sym_name))))))