difftastic/vendored_parsers/tree-sitter-clojure/doc/scope.md

104 lines
4.6 KiB
Markdown

# Scope of tree-sitter-clojure
## TLDR
Only "primitives"
(e.g. [symbols](https://github.com/sogaiu/tree-sitter-clojure/blob/c00293fb0cd5ce3a7005c0601e9b546c1ea73094/grammar.js#L280-L282),
[lists](https://github.com/sogaiu/tree-sitter-clojure/blob/c00293fb0cd5ce3a7005c0601e9b546c1ea73094/grammar.js#L307-L309),
etc.) are supported, i.e. no higher level constructs like `defn`.
## The Details
### Why
For some background, Clojure (and other Lisps) have runtime extensible
"syntax" via macros, but AFAIU tree-sitter's current design assumes a
fixed syntax.
Keeping the above in mind, below are some of the factors that
influenced the current stance on scope:
* Clojure has no language specification. This means it's unclear what
to try to support in the grammar. For example, `defn` is defined in
the `clojure.core` namespace, but then so are a lot of other things
and `clojure.core` is not a small namespace.
* Each additional item added to the grammar tends to increase the
difficulty of getting the grammar to function correctly (or well
enough). In the event that an issue is discovered or a much desired
feature surfaces, the more items there already are in the grammar,
generally, the harder it may be to accomodate / adjust.
* Handling more things might lead to degraded performance. Apart from
possibly that being a negative for end-user use, that might also
lead to more waiting time while testing across large samples of code
(which has been essential because of the lack of a specification).
### Alternatives
It is possible to [use tree-sitter-clojure as a
base](https://github.com/tree-sitter/tree-sitter/issues/645) to add
additional constructs to a "derived" grammar. For example, such a
grammar might be specialized to look for "definitions". At least in
[emacs-tree-sitter](https://github.com/ubolonton/emacs-tree-sitter),
[it is technically possibly to have multiple grammars be used on
single
buffer](https://github.com/ubolonton/emacs-tree-sitter/discussions/129#discussioncomment-502836):
> If you want 2 parse trees in the same buffer instead, you would need
> to define an advice for tree-sitter--do-parse, as well as additional
> buffer-local variables for the secondary grammar.
Apparently it became possible in September of 2020 for [queries to
match on any of a node's
supertypes](https://github.com/tree-sitter/tree-sitter/pull/738). It
may be possible to make a list supertype that is "composed of" `defn`
and things that are not `defn`.
[tree-sitter-clojure-def](https://github.com/sogaiu/tree-sitter-clojure-def)
is an attempt at realizing this apoproach.
However, depending on one's goals, it might make more sense to
consider leveraging [clj-kondo's analysis
capabilities](https://github.com/clj-kondo/clj-kondo/tree/master/analysis)
as clj-kondo already understands Clojure pretty well. IIUC,
[clojure-lsp does
this](https://github.com/clojure-lsp/clojure-lsp/blob/14724457f0d553795dfe16317d3ee6c5fc97d4ba/deps.edn#L21).
### Miscellaneous Points
* Earlier attempts at adding `def` and friends resulted in
unacceptably high error rates [1]. The tests were conducted against
code from [Clojars](https://clojars.org/) (uncontrived code) [2].
* For use cases like structural editing, it seems important to be able
to distinguish between the following sorts of cases:
* `defn` used for defining a function, and
* [Using the symbol `defn` within a macro to construct code to
define a
function](https://github.com/Raynes/conch/blob/685f2c73138f376f2aa0623053dfdaba350a04f4/src/me/raynes/conch.clj#L251-L252)
AFAICT, the approach taken in tree-sitter-clojure-def does not
make telling these sorts of things apart possible.
* It doesn't seem possible to support all "defining" macros like
`defsomething`
(e.g. https://github.com/redplanetlabs/specter/blob/efaf35558a2c0068f5b6a8ef1dbbd0912702bdbd/src/clj/com/rpl/specter.cljc#L57-L60)
since a user's Clojure code can define these.
## Footnotes
* [1] Author's opinion :)
* [2] Two of the previous tree-sitter-clojure attempts (by
[oakmac](https://github.com/oakmac/tree-sitter-clojure) and
[Tavistock](https://github.com/Tavistock/tree-sitter-clojure)) also
had unacceptably high error rates. The former of those two grammars
tried to handle higher level constructs and it had a notably higher
error rate. After trying to modify that grammar to address the error
rate unsuccessfully, it seemed like the two points were related. Note
though that this is just a suspicion.
## References
* https://www.reddit.com/r/Clojure/comments/fkc6uv/is_anyone_working_on_a_treesitter_parser_for/fksmf67/
* https://github.com/sogaiu/tree-sitter-clojure/issues/15#issuecomment-880729889