difftastic/vendored_parsers/tree-sitter-clojure/notes.txt

// NOTES
//
// - possibilities (maybe as separate grammars?)
//   - no fields (but likely that means metadata lives "outside")
//   - retain whitespace and comments (for round-tripping)
//   - clojure clr's pipe-escaping:
//       https://github.com/clojure/clojure-clr/wiki/Specifying-types
//
// - traversal issues
//   - use of fields (e.g. value, prefix, tag, metadata)
//     - allows skipping certain nodes such as:
//       - metadata
//       - comment
//       - discard-related
//     - allows targeted navigation without having to know the
//       node type (e.g. field value vs node type map)
//     - limitations
//       - a bit slower?
//       - cannot use fields for things without names, e.g.
//         - seq(...) cannot be the 2nd arg to field()
//         - $._foo won't work unless it "resolves" to $.bar (non underscore)
//   - for a given node, examine child nodes in reverse, that is,
//     starting at the end and working backwards
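//
//   - sketch (Python) of the two approaches above. Node is a stand-in
//     class, NOT the py-tree-sitter API; the node type names and the
//     "value" field layout are assumptions for illustration. Reading the
//     reverse scan as "find the value, because metadata comes first" is
//     also an assumption.

```python
from dataclasses import dataclass, field

# Hypothetical node types that a traversal usually wants to skip.
SKIPPABLE = {"metadata", "comment", "discard"}


@dataclass
class Node:
    """Minimal stand-in for a syntax-tree node (not a real tree-sitter node)."""
    type: str
    text: str = ""
    children: list = field(default_factory=list)
    fields: dict = field(default_factory=dict)  # field name -> child index

    def child_by_field_name(self, name):
        index = self.fields.get(name)
        return None if index is None else self.children[index]


def value_via_field(node):
    """Targeted navigation: no need to know the child's node type."""
    return node.child_by_field_name("value")


def value_via_reverse_scan(node):
    """No fields: walk children from the end, skipping metadata and friends."""
    for child in reversed(node.children):
        if child.type not in SKIPPABLE:
            return child
    return None
```

//     Both functions return the same child when the value is the last
//     non-skippable child; the reverse scan just needs the skip list.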
//
// - probably won't do
//   - support def, if, and other "primitives"
//   - support for {{}} template constructs
//
// - testing
//   - clj, cljc, cljs
//   - what about edn?
//   - approaches
//     - "port" hand-written tests
//       - oakmac (done)
//       - Tavistock (done)
//       - tonsky
//     - generative testing for token testing (done via hypothesis and py-tree-sitter)
//     - look for parsing errors across large sample (e.g. clojars) (done)
//   - how to "package" testing facilities
//     - currently each approach has its own project directory
//
// - debugging
//   - npx tree-sitter parse filepath + look for ERROR in console output
//   - npx tree-sitter parse --debug-graph filepath + view log.html
//   - npx tree-sitter parse --debug filepath + view console output
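//   - sketch (Python) of automating the first approach. It assumes the
//     parse output prints error nodes as "(ERROR [row, col] - [row, col]";
//     check that against the tree-sitter version in use. find_errors and
//     parse_file are hypothetical helper names.

```python
import re
import subprocess

# Assumed shape of an error node in the printed tree:
#   (ERROR [start_row, start_col] - [end_row, end_col]
ERROR_RE = re.compile(r"\(ERROR \[(\d+), (\d+)\] - \[(\d+), (\d+)\]")


def find_errors(parse_output):
    """Return ((start_row, start_col), (end_row, end_col)) per ERROR node."""
    return [
        ((int(m.group(1)), int(m.group(2))), (int(m.group(3)), int(m.group(4))))
        for m in ERROR_RE.finditer(parse_output)
    ]


def parse_file(path):
    """Run `npx tree-sitter parse path` and collect ERROR ranges from stdout."""
    result = subprocess.run(
        ["npx", "tree-sitter", "parse", path],
        capture_output=True,
        text=True,
    )
    return find_errors(result.stdout)
```

//     Looping parse_file over a directory of samples is one way to run
//     the "large sample" check mentioned under testing.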
//
// - loosening ideas:
//   - allow ##Other (not just ##Inf, ##-Inf, ##NaN)
//   - allow # in keywords
//   - allow ::/
//   - don't handle "no repeating colons" in symbols and in non-leading
//     portions of keywords (currently unimplemented anyway)
//
// - can strings have unicode escapes in them?
//
// - tree-sitter
//   - parse subcommand
//     - parse from stdin
//     - recursively traverse multiple directories (globbing exists)
//     - parsing within zips/jars
//     - more flexible file type specification
//     - custom parsing / processing per "file"
//   - web-ui subcommand
//     - didn't work when grammar used externals
//     - file browsing + loading better than copy-paste
//     - indicate error via color
//     - jump to error
//     - somehow searching for error doesn't seem to work sometimes
//   - ~/.tree-sitter
//     - bin
//       - contains shared libraries for each grammar
//       - parse command seems to install stuff here
//     - config.json
//       - parser-directories used to customize "scan" for grammars
//       - theme used for highlight subcommand

// symbolPat from LispReader.java (for keywords and symbols?)
//   "[:]?([\\D&&[^/]].*/)?(/|[\\D&&[^/]][^/]*)"
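//
// sketch (Python): a translation of symbolPat for experimenting with
// tokens. Java's class intersection [\D&&[^/]] (a non-digit that is also
// not /) is written as [^\d/]; re.ASCII is an assumption made to keep \d
// limited to 0-9. split_symbol is a hypothetical helper name. The real
// reader applies further checks after this pattern, so a match here does
// not mean the token is accepted.

```python
import re

# Whole-token match, mirroring Java's Matcher.matches().
SYMBOL_PAT = re.compile(r"[:]?([^\d/].*/)?(/|[^\d/][^/]*)", re.ASCII)


def split_symbol(token):
    """Return (ns_with_slash, name) if the whole token matches, else None."""
    m = SYMBOL_PAT.fullmatch(token)
    if m is None:
        return None
    return m.group(1), m.group(2)


print(split_symbol("my-ns/foo"))  # ('my-ns/', 'foo')
print(split_symbol("foo//"))      # ('foo/', '/')
print(split_symbol("/"))          # (None, '/')
print(split_symbol("1a"))         # None
print(split_symbol("-1a"))        # (None, '-1a'): the pattern alone allows it
```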
//
// https://clojure.org/reference/reader#_symbols
//   1. Symbols begin with a non-numeric char -- XXX: see 2 for limits?
//   2. Can contain alphanumeric chars and *, +, !, -, _, ', ?, <, > and =
//   3. / can be used once in the middle of a symbol (sep the ns from the name)
//   4. / by itself names the division function
//   5. . has special meaning; can be used 1+ times in the middle of a symbol
//        to designate a fully-qualified class name, e.g. java.util.BitSet,
//        or in namespace names.
//   6. Symbols beginning or ending with . are reserved by Clojure
//   7. Symbols beginning or ending with : are reserved by Clojure
//   8. A symbol can contain one or more non-repeating ':'s
//
// missing
//   9. $, &, % -- in body and end of symbol
//
// undocumented
//   -1a can be made a symbol, but reader will reject?  repl rejects
//     => number parsing takes priority?
//   'a can be made a symbol, but reader will reject?  repl -> quote
//
// implied?
//   doesn't start with ,
//   doesn't start with '
//   doesn't start with #
//   doesn't start with `
//   doesn't start with @
//   doesn't start with ^
//   doesn't start with \
//   doesn't start with ;
//   doesn't start with ~
//   doesn't start with "
//   doesn't start with ( )
//   doesn't start with { }
//   doesn't start with [ ]
//
// extra:
//
//   is my-ns// valid?
//
//     "Consistency of symbols between different readers/edn"
//
//     foo// should be valid.
//
//     2014-09-16 clojure-dev google group alex miller
//
//     https://groups.google.com/d/msg/clojure-dev/b09WvRR90Zc/c3zzMFqDsRYJ
//
//     "CLJ-1238 Allow EdnReader to read foo// (matches LispReader behavior)"
//
//     changelog for clojure 1.6
//
//   is # allowed as a constituent character in keywords?
//
//     following points are reasoning based on edn docs
//
//     "Bug in reader or repl? reading keyword :#abc"
//
//     Symbols begin with a non-numeric character and can contain
//     alphanumeric characters and . * + ! - _ ? $ % & =. If -, + or
//     . are the first character, the second character must be
//     non-numeric. Additionally, : # are allowed as constituent
//     characters in symbols other than as the first character.
//
//     2013-05-02 clojure google group colin jones (fwd by dave sann)
//
//     https://groups.google.com/d/msg/clojure/lK7juHxsPCc/TeYjxoW_3csJ
//
//     Keywords are identifiers that typically designate
//     themselves. They are semantically akin to enumeration
//     values. Keywords follow the rules of symbols, except they can
//     (and must) begin with :, e.g. :fred or :my/fred. If the target
//     platform does not have a keyword type distinct from a symbol
//     type, the same type can be used without conflict, since the
//     mandatory leading : of keywords is disallowed for symbols.
//
//     https://github.com/edn-format/edn#symbols
//
// https://clojure.org/reference/reader#_literals
//   0. Keywords are like symbols, except:
//   1. They can and must begin with a colon, e.g. :fred.
//   ~~2. They cannot contain '.' in the name part, or name classes.~~
//   3. They can contain a namespace, :person/name, which may contain '.'s.
//   4. A keyword that begins with two colons is auto-resolved in the current
//      namespace to a qualified keyword:
//      - If the keyword is unqualified, the namespace will be the current
//        namespace. In user, ::rect is read as :user/rect.
//      - If the keyword is qualified, the namespace will be resolved using
//        aliases in the current namespace. In a namespace where x is aliased
//        to example, ::x/foo resolves to :example/foo.
//
// extra:
//
//   :/ is a legal keyword(?):
//
//     alexmiller: @gfredericks :/ is "open for the language to start
//     interpreting" and not an invalid keyword so should be ok to generate.
//     and cljs should fix it's weirdness. (#clojure-dev 2019-06-07)
//
//     https://clojurians-log.clojureverse.org/clojure-dev/2019-06-07
//
//     It is undefined/left for future expansion.
//
//     Clojurescript's reading seems weird but given that this is undefined
//     it's hard to say it's wrong. :)
//
//     2020-07-10 (or so) alexmiller
//
//     https://ask.clojure.org/index.php/9427/clarify-the-position-of-as-a-keyword
//     https://clojure.atlassian.net/browse/TCHECK-155
//
//   . CAN be in the name part:
//
//     "[Bug?] Keyword constraints not enforced"
//
//     I think you've both misread "they cannot name classes" to be - "They
//     cannot contain class names".
//
//     The symbol String can name a class but the keyword :String can't,
//     that's all I meant there.
//
//     As far as '.', that restriction has been relaxed. I'll try to touch
//     up the docs for the next release.
//
//     2008-11-25 clojure google group rich hickey
//
//     https://groups.google.com/d/msg/clojure/CCuIp_bZ-ZM/THea7NF91Z4J
//
//   Whether keywords can start with numbers:
//
//     "puzzled by RuntimeException"
//
//     we currently allow keywords starting with numbers and seem to have
//     decided this is ok. I would like to get Rich to approve a change to
//     the page and do so.
//
//     2014-04-25 clojure google group alex miller
//
//     https://groups.google.com/forum/#!msg/clojure/XP1XAaDdKLY/kodfZTk8eeoJ
//
//     From a discussion in #clojure, it emerged that while :foo/1 is
//     currently not allowed, ::1 is.
//
//     2014-12-10 nicola mometto
//
//     https://clojure.atlassian.net/browse/CLJ-1286
//
//     "Clarify and align valid symbol and keyword rules for Clojure (and edn)"
//
//     https://clojure.atlassian.net/browse/CLJ-1527
//
//     consistency of symbols between different readers/edn
//
//     https://groups.google.com/forum/#!topic/clojure-dev/b09WvRR90Zc
//
//     :1 is accepted because it once accidentally worked and they
//     don't like breaking existing code
//
//     it was never meant to
//
//     2020-06-14 ish noisesmith on #clojure (slack)
//
//     There are libraries out there that assume :1 works.  They changed
//     Clojure at one point in an alpha to disallow such keywords and it broke
//     code so they decided to continue allowing them (even tho' they are
//     not "legal").
//
//     2020-06-14 ish seancorfield on #clojure (slack)
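//
//     sketch (Python): one possible reading of why :1 and ::1 get through
//     while :foo/1 does not, using a translation of symbolPat from
//     LispReader.java ([\D&&[^/]] written as [^\d/]). The leading [:]? is
//     optional, so the colon itself can serve as the non-digit first
//     character of the name. This only exercises the pattern, not the
//     rest of the reader; symbol_groups is a hypothetical helper name.

```python
import re

SYMBOL_PAT = re.compile(r"[:]?([^\d/].*/)?(/|[^\d/][^/]*)", re.ASCII)


def symbol_groups(token):
    """Return (group 1, group 2) of symbolPat for a whole-token match, else None."""
    m = SYMBOL_PAT.fullmatch(token)
    return None if m is None else (m.group(1), m.group(2))


print(symbol_groups(":1"))      # (None, ':1'): colon absorbed into the name
print(symbol_groups("::1"))     # (None, ':1')
print(symbol_groups(":foo/1"))  # None: the part after / starts with a digit
print(symbol_groups(":foo"))    # (None, 'foo')
```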
//
//   Whether # is allowed in a keyword:
//
//     "Clarification on # as valid symbol character"
//
//     this works now, but is not guaranteed to always be valid
//
//     2016-11-08 clojure google group alex miller
//
//     https://groups.google.com/forum/#!topic/clojure/CwZHu1Eszbk

// https://clojure.org/reference/reader#_literals
//   1. Integers can be indefinitely long and will be read as Longs when
//      in range and clojure.lang.BigInts otherwise.
//   2. Integers with an N suffix are always read as BigInts.
//   3. When possible, they can be specified in any base with radix from
//      2 to 36 (see Long.parseLong()); for example 2r101010, 8r52, 36r16,
//      and 42 are all the same Long.
//   4. Floating point numbers are read as Doubles; with M suffix they are
//      read as BigDecimals.
//   5. Ratios are supported, e.g. 22/7.

// intPat
//   "([-+]?)(?:(0)|([1-9][0-9]*)|0[xX]([0-9A-Fa-f]+)|0([0-7]+)|([1-9][0-9]?)[rR]([0-9A-Za-z]+)|0[0-9]+)(N)?"

// 0[0-9]+ is for better errors -- thanks seancorfield and andyfingerhut

// ratioPat
//   "([-+]?[0-9]+)/([0-9]+)"

// floatPat
//   "([-+]?[0-9]+(\\.[0-9]*)?([eE][-+]?[0-9]+)?)(M)?"
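
// sketch (Python): the three patterns above, tried in the order int,
// float, ratio (assumed to mirror the reader), to check the documented
// examples such as 2r101010, 8r52 and 36r16. read_number is a
// hypothetical helper; Python ints stand in for Long/BigInt, Decimal for
// BigDecimal and Fraction for ratios. A zero denominator is not guarded.

```python
import re
from decimal import Decimal
from fractions import Fraction

INT_PAT = re.compile(
    r"([-+]?)(?:(0)|([1-9][0-9]*)|0[xX]([0-9A-Fa-f]+)|0([0-7]+)"
    r"|([1-9][0-9]?)[rR]([0-9A-Za-z]+)|0[0-9]+)(N)?"
)
RATIO_PAT = re.compile(r"([-+]?[0-9]+)/([0-9]+)")
FLOAT_PAT = re.compile(r"([-+]?[0-9]+(\.[0-9]*)?([eE][-+]?[0-9]+)?)(M)?")


def read_number(token):
    """Return an int, float, Decimal or Fraction, or None if invalid."""
    m = INT_PAT.fullmatch(token)
    if m:
        sign, zero, dec, hexd, octd, radix, digits, _big = m.groups()
        if zero is not None:
            value = 0
        elif dec is not None:
            value = int(dec, 10)
        elif hexd is not None:
            value = int(hexd, 16)
        elif octd is not None:
            value = int(octd, 8)
        elif radix is not None:
            base = int(radix)
            if not 2 <= base <= 36:
                return None
            try:
                value = int(digits, base)
            except ValueError:  # digit not valid in this base
                return None
        else:
            # Only the 0[0-9]+ alternative matched (e.g. "08"): the pattern
            # accepts it so the caller can report an invalid number.
            return None
        return -value if sign == "-" else value
    m = FLOAT_PAT.fullmatch(token)
    if m:
        return Decimal(m.group(1)) if m.group(4) else float(m.group(1))
    m = RATIO_PAT.fullmatch(token)
    if m:
        return Fraction(int(m.group(1)), int(m.group(2)))
    return None


print([read_number(t) for t in ("2r101010", "8r52", "36r16", "42")])
# [42, 42, 42, 42]
```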