mirror of https://github.com/Wilfred/difftastic/
279 lines
10 KiB
Plaintext
279 lines
10 KiB
Plaintext
// NOTES
|
|
//
|
|
// - possibilities (may be as separate grammars?)
|
|
// - no fields (but likely that means metadata lives "outside")
|
|
// - retain whitespace and comments (for round-tripping)
|
|
// - clojure clr's pipe-escaping:
|
|
// https://github.com/clojure/clojure-clr/wiki/Specifying-types
|
|
//
|
|
// - traveral issues
|
|
// - use of fields (e.g. value, prefix, tag, metadata)
|
|
// - allows skipping certain nodes such as:
|
|
// - metadata
|
|
// - comment
|
|
// - discard-related
|
|
// - allows targeted navigation without having to know the
|
|
// node type (e.g. field value vs node type map)
|
|
// - limitations
|
|
// - a bit slower?
|
|
// - cannot use fields for things without names, e.g.
|
|
// - seq(...) cannot be the 2nd arg to field()
|
|
// - $._foo won't work unless it "resolves" to $.bar (non underscore)
|
|
// - for a given node, examine child nodes in reverse, that is,
|
|
// starting at the end and working backwards
|
|
//
|
|
// - probably won't do
|
|
// - support def, if, and other "primitives"
|
|
// - support for {{}} template constructs
|
|
//
|
|
// - testing
|
|
// - clj, cljc, cljs
|
|
// - what about edn?
|
|
// - approaches
|
|
// - "port" hand-written tests
|
|
// - oakmac (done)
|
|
// - Tavistock (done)
|
|
// - tonsky
|
|
// - generative testing for token testing (done via hypothesis and py-tree-sitter)
|
|
// - look for parsing errors across large sample (e.g. clojars) (done)
|
|
// - how to "package" testing facilities
|
|
// - currently each approach has its own project directory
|
|
//
|
|
// - debugging
|
|
// - npx tree-sitter parse filepath + look for ERROR in console output
|
|
// - npx tree-sitter parse --debug-graph filepath + view log.html
|
|
// - npx tree-sitter parse --debug filepath + view console output
|
|
//
|
|
// - loosening ideas:
|
|
// - allow ##Other (not just ##Inf, -##Inf, ##NaN)
|
|
// - allow # in keywords
|
|
// - allow ::/
|
|
// - don't handle "no repeating colons" in symbols and in non-leading
|
|
// portions of keywords (currently unimplemented anyway)
|
|
//
|
|
// - can strings have unicode escapes in them?
|
|
//
|
|
// - tree-sitter
|
|
// - parse subcommand
|
|
// - parse from stdin
|
|
// - recursively traverse multiple directories (globbing exists)
|
|
// - parsing within zips/jars
|
|
// - more flexible file type specification
|
|
// - custom parsing / processing per "file"
|
|
// - web-ui subcommand
|
|
// - didn't work when grammar used externals
|
|
// - file browsing + loading better than copy-paste
|
|
// - indiciate error via color
|
|
// - jump to error
|
|
// - somehow searching for error doesn't seem to work sometimes
|
|
// - ~/.tree-sitter
|
|
// - bin
|
|
// - contains shared libraries for each grammar
|
|
// - parse command seems to install stuff here
|
|
// - config.json
|
|
// - parser-directories used to customize "scan" for grammars
|
|
// - theme used for highlight subcommand
|
|
|
|
// symbolPat from LispReader.java (for keywords and symbols?)
|
|
// "[:]?([\\D&&[^/]].*/)?(/|[\\D&&[^/]][^/]*)"
|
|
//
|
|
// https://clojure.org/reference/reader#_symbols
|
|
// 1. Symbols begin with a non-numeric char -- XXX: see 2 for limits?
|
|
// 2. Can contain alphanumeric chars and *, +, !, -, _, ', ?, <, > and =
|
|
// 3. / can be used once in the middle of a symbol (sep the ns from the name)
|
|
// 4. / by itself names the division function
|
|
// 5. . special meaning can be used >= 1 times in the middle of a symbol
|
|
// to designate a fully-qualified class name, e.g. java.util.BitSet,
|
|
// or in namespace names.
|
|
// 6. Symbols beginning or ending with . are reserved by Clojure
|
|
// 7. Symbols beginning or ending with : are reserved by Clojure
|
|
// 8. A symbol can contain one or more non-repeating ':'s
|
|
//
|
|
// missing
|
|
// 9. $, &, % -- in body and end of symbol
|
|
//
|
|
// undocumented
|
|
// -1a can be made a symbol, but reader will reject? repl rejects
|
|
// => number parsing takes priority?
|
|
// 'a can be made a symbol, but reader will reject? repl -> quote
|
|
//
|
|
// implied?
|
|
// doesn't start with ,
|
|
// doesn't start with '
|
|
// doesn't start with #
|
|
// doesn't start with `
|
|
// doesn't start with @
|
|
// doesn't start with ^
|
|
// doesn't start with \
|
|
// doesn't start with ;
|
|
// doesn't start with ~
|
|
// doesn't start with "
|
|
// doesn't start with ( )
|
|
// doesn't start with { }
|
|
// doesn't start with [ ]
|
|
//
|
|
// extra:
|
|
//
|
|
// is my-ns// valid?
|
|
//
|
|
// "Consistency of symbols between different readers/edn"
|
|
//
|
|
// foo// should be valid.
|
|
//
|
|
// 2014-09-16 clojure-dev google group alex miller
|
|
//
|
|
// https://groups.google.com/d/msg/clojure-dev/b09WvRR90Zc/c3zzMFqDsRYJ
|
|
//
|
|
// "CLJ-1238 Allow EdnReader to read foo// (matches LispReader behavior)"
|
|
//
|
|
// changelog for clojure 1.6
|
|
//
|
|
// is # allowed as a constituent character in keywords?
|
|
//
|
|
// following points are reasoning based on edn docs
|
|
//
|
|
// "Bug in reader or repl? reading keyword :#abc"
|
|
//
|
|
// Symbols begin with a non-numeric character and can contain
|
|
// alphanumeric characters and . * + ! - _ ? $ % & =. If -, + or
|
|
// . are the first character, the second character must be
|
|
// non-numeric. Additionally, : # are allowed as constituent
|
|
// characters in symbols other than as the first character.
|
|
//
|
|
// 2013-05-02 clojure google group colin jones (fwd by dave sann)
|
|
//
|
|
// https://groups.google.com/d/msg/clojure/lK7juHxsPCc/TeYjxoW_3csJ
|
|
//
|
|
// Keywords are identifiers that typically designate
|
|
// themselves. They are semantically akin to enumeration
|
|
// values. Keywords follow the rules of symbols, except they can
|
|
// (and must) begin with :, e.g. :fred or :my/fred. If the target
|
|
// platform does not have a keyword type distinct from a symbol
|
|
// type, the same type can be used without conflict, since the
|
|
// mandatory leading : of keywords is disallowed for symbols.
|
|
//
|
|
// https://github.com/edn-format/edn#symbols
|
|
//
|
|
// https://clojure.org/reference/reader#_literals
|
|
// 0. Keywords are like symbols, except:
|
|
// 1. They can and must begin with a colon, e.g. :fred.
|
|
// ~~2. They cannot contain '.' in the name part, or name classes.~~
|
|
// 3. They can contain a namespace, :person/name, which may contain '.'s.
|
|
// 4. A keyword that begins with two colons is auto-resolved in the current
|
|
// namespace to a qualified keyword:
|
|
// - If the keyword is unqualified, the namespace will be the current
|
|
// namespace. In user, ::rect is read as :user/rect.
|
|
// - If the keyword is qualified, the namespace will be resolved using
|
|
// aliases in the current namespace. In a namespace where x is aliased
|
|
// to example, ::x/foo resolves to :example/foo.
|
|
//
|
|
// extra:
|
|
//
|
|
// :/ is a legal keyword(?):
|
|
//
|
|
// alexmiller: @gfredericks :/ is "open for the language to start
|
|
// interpreting" and not an invalid keyword so should be ok to generate.
|
|
// and cljs should fix it's weirdness. (#clojure-dev 2019-06-07)
|
|
//
|
|
// https://clojurians-log.clojureverse.org/clojure-dev/2019-06-07
|
|
//
|
|
// It is undefined/left for future expansion.
|
|
//
|
|
// Clojurescript's reading seems weird but given that this is undefined
|
|
// it's hard to say it's wrong. :)
|
|
//
|
|
// 2020-07-10 (or so) alexmiller
|
|
//
|
|
// https://ask.clojure.org/index.php/9427/clarify-the-position-of-as-a-keyword
|
|
// https://clojure.atlassian.net/browse/TCHECK-155
|
|
//
|
|
// . CAN be in the name part:
|
|
//
|
|
// "[Bug?] Keyword constraints not enforced"
|
|
//
|
|
// I think you've both misread "they cannot name classes" to be - "They
|
|
// cannot contain class names".
|
|
//
|
|
// The symbol String can name a class but the keyword :String can't,
|
|
// that's all I meant there.
|
|
//
|
|
// As far as '.', that restriction has been relaxed. I'll try to touch
|
|
// up the docs for the next release.
|
|
//
|
|
// 2008-11-25 clojure google group rich hickey
|
|
//
|
|
// https://groups.google.com/d/msg/clojure/CCuIp_bZ-ZM/THea7NF91Z4J
|
|
//
|
|
// Whether keywords can start with numbers:
|
|
//
|
|
// "puzzled by RuntimeException"
|
|
//
|
|
// we currently allow keywords starting with numbers and seem to have
|
|
// decided this is ok. I would like to get Rich to approve a change to
|
|
// the page and do so.
|
|
//
|
|
// 2014-04-25 clojure google group alex miller
|
|
//
|
|
// https://groups.google.com/forum/#!msg/clojure/XP1XAaDdKLY/kodfZTk8eeoJ
|
|
//
|
|
// From a discussion in #clojure, it emerged that while :foo/1 is
|
|
// currently not allowed, ::1 is.
|
|
//
|
|
// 2014-12-10 nicola mometto
|
|
//
|
|
// https://clojure.atlassian.net/browse/CLJ-1286
|
|
//
|
|
// "Clarify and align valid symbol and keyword rules for Clojure (and edn)"
|
|
//
|
|
// https://clojure.atlassian.net/browse/CLJ-1527
|
|
//
|
|
// consistency of symbols between different readers/edn
|
|
//
|
|
// https://groups.google.com/forum/#!topic/clojure-dev/b09WvRR90Zc
|
|
//
|
|
// :1 is accepted because it once accidentally worked and they
|
|
// don't like breaking existing code
|
|
//
|
|
// it was never meant to
|
|
//
|
|
// 2020-06-14 ish noisesmith on #clojure (slack)
|
|
//
|
|
// There are libraries out there that assume :1 works. They changed
|
|
// Clojure at one point in an alpha to disallow such keywords and it broke
|
|
// code so they decided to continue allowing them (even tho' they are
|
|
// not "legal").
|
|
//
|
|
// 2020-06-14 ish seancorfield on #clojure (slack)
|
|
//
|
|
// Whether # is allowed in a keyword:
|
|
//
|
|
// "Clarification on # as valid symbol character"
|
|
//
|
|
// this works now, but is not guaranteed to always be valid
|
|
//
|
|
// 2016-11-08 clojure google group alex miller
|
|
//
|
|
// https://groups.google.com/forum/#!topic/clojure/CwZHu1Eszbk
|
|
|
|
// https://clojure.org/reference/reader#_literals
|
|
// 1. Integers can be indefinitely long and will be read as Longs when
|
|
// in range and clojure.lang.BigInts otherwise.
|
|
// 2. Integers with an N suffix are always read as BigInts.
|
|
// 3. When possible, they can be specified in any base with radix from
|
|
// 2 to 36 (see Long.parseLong()); for example 2r101010, 8r52, 36r16,
|
|
// and 42 are all the same Long.
|
|
// 4. Floating point numbers are read as Doubles; with M suffix they are
|
|
// read as BigDecimals.
|
|
// 5. Ratios are supported, e.g. 22/7.
|
|
|
|
// intPat
|
|
// "([-+]?)(?:(0)|([1-9][0-9]*)|0[xX]([0-9A-Fa-f]+)|0([0-7]+)|([1-9][0-9]?)[rR]([0-9A-Za-z]+)|0[0-9]+)(N)?"
|
|
|
|
// 0[0-9]+ is for better errors -- thanks seancorfield and andyfingerhut
|
|
|
|
// ratioPat
|
|
// "([-+]?[0-9]+)/([0-9]+)"
|
|
|
|
// floatPat
|
|
// "([-+]?[0-9]+(\\.[0-9]*)?([eE][-+]?[0-9]+)?)(M)?"
|