# Testing tree-sitter-clojure
## TLDR
[tree-sitter-clojure](https://github.com/sogaiu/tree-sitter-clojure) has been tested using a variety of methods.
## The Details
This document will touch on some of those methods and why they were attempted:
1. Using corpus data from other tree-sitter-clojure attempts
2. Using Clojure source from [Clojars](https://clojars.org/)
3. Generative testing via [Hypothesis](https://github.com/HypothesisWorks/hypothesis)

Other employed methods that won't be covered (in much, if any, detail) here:
1. Sporadic manual invocations
2. Using [tonsky's sublime-clojure](https://github.com/tonsky/sublime-clojure) test data
3. Generative testing via [test.check](https://github.com/clojure/test.check/)
4. [Manual inspection of the grammar](https://github.com/sogaiu/tree-sitter-clojure/issues/3)
## Using corpus data from other tree-sitter-clojure attempts
There were at least two previous attempts at implementing tree-sitter-clojure,
[one by oakmac](https://github.com/oakmac/tree-sitter-clojure) and [another by Tavistock](https://github.com/Tavistock/tree-sitter-clojure). Important things
were learned by trying to make these attempts work, but for reasons not covered
here, a separate attempt was started.

Both earlier attempts had corpus data ([oakmac's](https://github.com/oakmac/tree-sitter-clojure/tree/master/corpus) and [Tavistock's](https://github.com/Tavistock/tree-sitter-clojure/tree/master/corpus)) that could be adapted for testing. Consequently,
[tsclj-tests-parser](https://gitlab.com/sogaiu/tsclj-tests-parser)
was created to extract [the relevant data as plain files](https://gitlab.com/sogaiu/tsclj-tests-parser/-/tree/master/test-files). These files were in turn fed to
tree-sitter's `parse` command, using the tree-sitter-clojure grammar, to check
for parsing errors.
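
The sketch below shows roughly how a directory of such plain files could be fed to the parser. It is an illustration, not the script that was actually used. `check_dir` and the `PARSER` override are hypothetical, and the sketch assumes `tree-sitter parse` exits with a non-zero status when a file contains parse errors.

```shell
# Hypothetical helper: run a parser over every file directly inside DIR and
# list the files it rejects, followed by a count. PARSER defaults to
# tree-sitter's parse command, assumed to exit non-zero on parse errors;
# it can be overridden (e.g. with a stub when testing this function).
check_dir() {
  dir=$1
  parser=${PARSER:-"npx tree-sitter parse --quiet"}
  failed=0
  for f in "$dir"/*; do
    [ -f "$f" ] || continue          # skip subdirectories and an unmatched glob
    if ! $parser "$f" >/dev/null 2>&1 </dev/null; then
      echo "FAIL: $f"
      failed=$((failed + 1))
    fi
  done
  echo "failures: $failed"
  [ "$failed" -eq 0 ]                # exit status: 0 only if everything parsed
}
```

Running `check_dir test-files` from a tree-sitter-clojure checkout would print one `FAIL:` line per file with errors, followed by a count. The sketch assumes the checkout is where the CLI finds the grammar.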

If changes are made to tree-sitter-clojure's grammar, this method can be used
to quickly check for some forms of undesirable breakage. (This could be taken
a bit further by adapting the content as corpus data for tree-sitter-clojure.)
### But...
One issue with this approach is that it relies on manually identifying and
spelling out appropriate test cases, which, in the case of Clojure, is
complicated by the lack of a language specification.

Apart from detailed research, this was partially addressed by testing against
a large sample of Clojure source code written by the community.
## Using Clojure source from Clojars
The most fruitful method of testing was working with Clojure source written
by humans for purposes other than testing tree-sitter-clojure.
### Where to get samples of Clojure source
Initially, repositories were cloned from a variety of locations, but before
long a decision was made to switch to using "release" jars from Clojars.

The latter decision was motivated by wanting source that was less likely to
be "broken" in various ways. Compared to "release" jar content from Clojars,
the default branch of a repository seemed to have a higher probability of
"not quite working". Although the Clojars "release" idea was an improvement,
weeding out inappropriate Clojure source was still necessary.

A variety of approaches were used to come up with a specific list of jars from
Clojars; the most recent is [gen-clru-list](https://gitlab.com/sogaiu/gen-clru-list). This is basically a [babashka](https://github.com/babashka/babashka) script that fetches [Clojars' feed.clj](https://github.com/clojars/clojars-web/wiki/Data#useful-extracts-from-the-poms), does some processing, and
writes out a list of URLs. For reference, this approach currently yields
roughly 19,000 URLs.
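
Turning such a list into local jars takes little more than a download loop. The following is a hypothetical sketch and not part of gen-clru-list. It assumes one URL per line and a Maven-style repository layout (as used by Clojars). The helper names are made up for illustration.

```shell
# Hypothetical helpers. The list format (one URL per line) and the
# Maven-style URL layout are assumptions, not gen-clru-list's documented output.

# Map a jar URL to a relative path by dropping the scheme and host, e.g.
# https://repo.clojars.org/foo/bar/1.0.0/bar-1.0.0.jar -> foo/bar/1.0.0/bar-1.0.0.jar
jar_path_for_url() {
  printf '%s\n' "$1" | sed 's|^https\{0,1\}://[^/]*/||'
}

# Download every URL in LIST into DEST, mirroring the repository layout so
# that same-named jars from different groups don't collide. Requires curl.
fetch_all() {
  list=$1
  dest=$2
  while IFS= read -r url; do
    [ -n "$url" ] || continue          # ignore blank lines
    out=$dest/$(jar_path_for_url "$url")
    [ -f "$out" ] && continue          # already fetched, so reruns are cheap
    mkdir -p "$(dirname "$out")"
    curl --fail --silent --show-error --location --output "$out" "$url" \
      || { rm -f "$out"; echo "failed: $url" >&2; }
  done < "$list"
}
```

Already-downloaded jars are skipped, so an interrupted run of something like `fetch_all urls.txt ~/jars` (both names hypothetical) can simply be restarted.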
### How to check retrieved Clojure samples
The retrieved content was initially checked using [a-tsclj-checker](https://github.com/sogaiu/a-tsclj-checker) (an adaptation of
[analyze-reify](https://github.com/borkdude/analyze-reify)), which uses
[Rust bindings for tree-sitter](https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_rust) and tree-sitter-clojure to parse Clojure
source code. Notably, it can traverse directories and also operate on `.jar`
files.

Once an error is detected, it is easier to investigate if one has direct
access to the Clojure source file in question (as compared with rummaging
around in `.jar` files). Thus, it was decided to create a single directory
tree containing the extracted contents of all retrieved jars. On a side note,
this directory tree took up less than 2 GB of disk space.
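
One way to build such a tree is sketched below. This is a hypothetical helper, not the tooling actually used. It assumes the jars sit under one root directory and that Info-ZIP `unzip` is available. Each jar gets its own subdirectory so that identically named files from different jars don't overwrite each other.

```shell
# Hypothetical helpers: extract the Clojure sources of every jar found under
# ROOT into a single tree under DEST, one subdirectory per jar. Needs unzip.

# Map ROOT/foo/bar/1.0.0/bar-1.0.0.jar to foo/bar/1.0.0/bar-1.0.0
extract_dir_for_jar() {
  jar=$1
  root=$2
  rel=${jar#"$root"/}
  printf '%s\n' "${rel%.jar}"
}

extract_all() {
  root=$1
  dest=$2
  find "$root" -type f -name '*.jar' | while IFS= read -r jar; do
    out=$dest/$(extract_dir_for_jar "$jar" "$root")
    mkdir -p "$out"
    # Keep only .clj, .cljc and .cljs entries (unzip's * also matches "/").
    # unzip may exit non-zero when a pattern matches nothing, so its status
    # is ignored; stdin is closed so unzip can't consume the loop's jar list.
    unzip -q -o "$jar" '*.clj' '*.cljc' '*.cljs' -d "$out" </dev/null 2>/dev/null || true
  done
}
```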

A less fancy but easier to maintain (i.e. not written in Rust) tool,
[ts-grammar-checker](https://gitlab.com/sogaiu/ts-grammar-checker), was
developed as an alternative to `a-tsclj-checker`. Strictly speaking,
`ts-grammar-checker` may not be necessary, as one can probably employ
tree-sitter's `parse` command in combination with `find`, `xargs` and the like
if on some kind of \*nix. An example of a comparable invocation is:
```
find ~/src/clojars-cljish -type f \( -name '*.clj' -o -name '*.cljc' -o -name '*.cljs' \) -print0 |
  xargs -0 npx tree-sitter parse --quiet > my-results.txt
```
`a-tsclj-checker` is the faster tool, but it has not been updated to the most
recent version of tree-sitter-clojure. `ts-grammar-checker` is not quite as
fast, but it can easily be adapted to work with other tree-sitter grammars (e.g.
it's [used](https://gitlab.com/sogaiu/ts-grammar-checker/-/blob/master/janet-checker.janet) for [tree-sitter-janet-simple](https://github.com/sogaiu/tree-sitter-janet-simple) as well). However, it does not support accessing content
within `.jar` files.

Across somewhat fewer than 150,000 files (`.clj`, `.cljc`, `.cljs`), `a-tsclj-checker`
typically takes a little less than 30 seconds, while `ts-grammar-checker`
typically takes a bit more than 100 seconds (at least on the author's machine).
Subjectively, the difference hasn't felt that large. Knowing there is at least
a 30-second wait, [one typically doesn't sit at a prompt waiting for execution
to complete](https://xkcd.com/303/).

For any files that parse with errors, it can be handy to apply
[clj-kondo](https://github.com/clj-kondo/clj-kondo). The specific details that
`clj-kondo` reported were often helpful when examining individual files, and
that diagnostic information also provided a way to partition the files into
groups. Subjectively, it can feel more manageable to deal with 5 groups of files
than with 100 separate files (though admittedly the grouping does not always
turn out to be that meaningful).
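
A little text processing can approximate the grouping step. This sketch assumes `clj-kondo`'s default text output, where each finding looks like `path:row:col: level: message`. `group_files_by_message` is a hypothetical name. It counts how many distinct files report each message, which gives a rough first partition.

```shell
# Hypothetical helper: read clj-kondo's text output on stdin and print, for
# each distinct message, how many different files reported it (most common
# first). Assumes findings look like "path:row:col: level: message"; summary
# lines such as "linting took ..." don't match and are skipped.
group_files_by_message() {
  awk '
    match($0, /^[^:]+:[0-9]+:[0-9]+: (error|warning): /) {
      file = substr($0, 1, index($0, ":") - 1)   # text before the first colon
      msg = substr($0, RLENGTH + 1)              # text after the level and its colon
      if (!((msg, file) in seen)) {              # count each file once per message
        seen[msg, file] = 1
        count[msg]++
      }
    }
    END { for (m in count) printf "%d\t%s\n", count[m], m }
  ' | sort -rn
}
```

Something like `clj-kondo --lint a.clj b.clj | group_files_by_message` would then list the most widespread messages first. Messages that embed a specific name (an unresolved symbol, say) land in separate groups, so normalizing them first may give coarser, more useful groups.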

An individual "suspect" file is typically viewed manually in an editor (usually
one that has `clj-kondo` support enabled) and examined for "issues".

In practice, testing the grammar against appropriate Clojure source from Clojars
has been the most useful way of finding issues with the grammar. The lack of a
specification for Clojure made creating an appropriate grammar harder, but
having a large sample of code to test against helped to mitigate this a bit. On
more than one occasion, some version of the grammar failed to parse some
legitimate Clojure source, and subsequent investigation revealed that the
grammar had not accounted for an uncommon and/or unanticipated usage.
### But...
This method has a significant weakness: there could be cases where tree-sitter
parses successfully but the resulting tree is inappropriate. For example, if
the grammar definition were faulty, something that should be parsed as a
symbol might end up parsed as a number, with no error reported.

To partially address this issue, generative (property-based) testing was
attempted.
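
A cheap complement to asking "does it parse without errors?" is to check what a small input parses *as*. The sketch below is hypothetical and not part of the tree-sitter-clojure tooling. It makes three assumptions: that the grammar uses node names such as `sym_lit` and `num_lit`, that `tree-sitter parse FILE` prints the tree as an S-expression, and that a `PARSER` variable can replace the parser command so the function can be exercised without tree-sitter.

```shell
# Hypothetical spot check: does INPUT parse, without errors, into a tree that
# contains a node of type EXPECTED (e.g. sym_lit or num_lit, names assumed to
# match the tree-sitter-clojure grammar)? Prints a reason and returns 1 if not.
# PARSER defaults to tree-sitter's parse command; override it for testing.
expect_node() {
  input=$1
  expected=$2
  parser=${PARSER:-"npx tree-sitter parse"}
  dir=$(mktemp -d) || return 2
  # Use a .clj extension in case the CLI selects grammars by file type.
  printf '%s\n' "$input" > "$dir/input.clj"
  # The exit status is ignored; the printed tree is what gets inspected.
  tree=$($parser "$dir/input.clj" 2>&1 </dev/null) || true
  rm -rf "$dir"
  case $tree in
    *"(ERROR"* | *"(MISSING"*)
      echo "FAIL (parse error): $input"
      return 1 ;;
    *"($expected"*)
      return 0 ;;
    *)
      echo "FAIL (expected $expected): $input"
      return 1 ;;
  esac
}
```

For example, `expect_node -main sym_lit` and `expect_node -1 num_lit` probe the symbol/number boundary mentioned above. Checking only that the expected node appears *somewhere* is a weak test. Comparing the whole tree, as tree-sitter's corpus tests do, is stricter.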
## Generative testing via Hypothesis
Initially, [some effort was made to use test.check](https://gist.github.com/sogaiu/c0d668d050b63e298ef63549e357f9d2). However, [an outstanding issue with test.check](https://github.com/clojure/test.check/blob/master/doc/growth-and-shrinking.md#unnecessary-bind) (aka TCHECK-112) seemed very likely to be relevant
for the types of tests being considered. Also, the approach used [libpython-clj](https://github.com/clj-python/libpython-clj) to call tree-sitter via [Python bindings for tree-sitter](https://github.com/tree-sitter/py-tree-sitter). Although invoking tree-sitter via Python worked, it was awkward to connect this with `test.check`. For these reasons, the `test.check` + `libpython-clj` approach (neat as it was) was abandoned.

Interestingly, Python's Hypothesis doesn't suffer from test.check's ["long-standing Hard Problem"](https://clojure.atlassian.net/browse/TCHECK-112), so it was given a try. [prop-test-ts-clj](https://github.com/sogaiu/prop-test-ts-clj) and [hypothesis-grammar-clojure](https://github.com/sogaiu/hypothesis-grammar-clojure) are the resulting
bits.
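
Hypothesis handles generating inputs and shrinking failures to small counterexamples. The shell sketch below gives a rough feel for the underlying idea without Python. It generates random symbol-like strings and checks one property: each should parse, without errors, into a tree containing a `sym_lit`. This is a much cruder stand-in and not what prop-test-ts-clj does. It has no shrinking, and its alphabet is deliberately conservative: each string starts with a letter, so nothing can be read as a number. The `sym_lit` node name is assumed. `gen_symbols`, `check_symbols` and the `PARSER` override are hypothetical.

```shell
# Hypothetical generator: print N random symbol-like strings using SEED.
# The alphabet is an assumption, chosen so that every output should be a
# plain Clojure symbol (it starts with a letter and avoids : / . # etc.).
gen_symbols() {
  awk -v n="$1" -v seed="${2:-1}" 'BEGIN {
    srand(seed)
    first = "abcxyz"
    rest = first "0123456789*+!-_?<>="
    for (i = 0; i < n; i++) {
      s = substr(first, int(rand() * length(first)) + 1, 1)
      len = int(rand() * 6)              # 0 to 5 further characters
      for (j = 0; j < len; j++)
        s = s substr(rest, int(rand() * length(rest)) + 1, 1)
      print s
    }
  }'
}

# Check the property for N generated strings; print each counterexample and
# return non-zero if there were any. PARSER defaults to tree-sitter's parse
# command, whose S-expression output is assumed to name nodes like "(sym_lit".
check_symbols() {
  parser=${PARSER:-"npx tree-sitter parse"}
  dir=$(mktemp -d) || return 2
  gen_symbols "$1" "${2:-1}" > "$dir/cases"
  bad=0
  while IFS= read -r s; do
    printf '%s\n' "$s" > "$dir/input.clj"
    # The exit status is ignored; the printed tree is what gets inspected.
    tree=$($parser "$dir/input.clj" 2>&1 </dev/null) || true
    case $tree in
      *"(ERROR"* | *"(MISSING"*) ok=no ;;
      *"(sym_lit"*) ok=yes ;;
      *) ok=no ;;
    esac
    if [ "$ok" = no ]; then
      echo "counterexample: $s"
      bad=$((bad + 1))
    fi
  done < "$dir/cases"
  rm -rf "$dir"
  [ "$bad" -eq 0 ]
}
```

Passing the same seed again, e.g. `check_symbols 500 42`, regenerates the same inputs (with the same awk). This makes a failure reproducible even without shrinking.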

At least [one issue](https://github.com/sogaiu/tree-sitter-clojure/issues/7) was discovered, and it also turned out that
[parcera](https://github.com/carocad/parcera) was [affected](https://github.com/carocad/parcera/issues/86).

The code was also adapted a bit to test [Calva](https://github.com/BetterThanTomorrow/calva). Some issues were discovered and [reported upstream](https://github.com/BetterThanTomorrow/calva/issues/802).
### But...
A drawback of this approach is that details of the tree-sitter-clojure grammar
became embedded in the tests. One consequence is that if
tree-sitter-clojure's grammar changes, the tests may need to be updated to
match (if there is an intent to continue using them).
## Summary
tree-sitter-clojure has been tested in a variety of ways, attempting to address
various real-world constraints (e.g. lack of a language specification,
limitations of tree-sitter's approach for a language with extensible syntax,
etc.). AFAICT, for what it sets out to do, it seems to work pretty well so
far.