* Fixed CRLF behavior for tests
* Add Windows tests to CI
* Removed comment about Windows tests from CI file
* Run tests on PRs
* Use Windows test script in CI
* Prefer element reference over method invocation
Ruby presents two ~syntactic~ sugarings that cannot be distinguished
unambiguously from syntax alone:
First, array elements can be referenced using a bracketed argument after
any amount of white space, so:
x.[](0)
x[0]
x [0]
are all equivalent.
Second, methods may be invoked with omitted parentheses, so:
f(y)
f y
are equivalent.
The ambiguity can be seen when the function argument is a literal array:
f [0]
At this point, there is no syntactic information that can distinguish
between element reference and procedural invocation.
This can be seen by running this program in irb:
irb(main):001:0> x = [0, 1, 2]
=> [0, 1, 2]
irb(main):002:0> x.[](0)
=> 0
irb(main):003:0> x [0]
=> 0
irb(main):004:0> def y(z)
irb(main):005:1> z
irb(main):006:1> end
=> :y
irb(main):007:0> y([0])
=> [0]
irb(main):008:0> y [0]
=> [0]
Previously, tree-sitter-ruby handled this ambiguity by presenting both
`x [0]` and `y [0]` as procedural invocation.
However, this causes a parse error as described in
tree-sitter/tree-sitter-ruby#146, when parsing
d.find { |x| a(x) } [b] == c
Here I add an optional, lower-precedence interpretation of `x [0]` as an
element reference.
Due to the construction of the grammar in this project, this
unfortunately causes problems when attempting to parse constructs like:
fun [0] do
  something
end
as the parser will eagerly consume `fun [0]` as the left-hand side of
a binary expression. To deal with this case, I explicitly add this
construct to the `call` production rule. Unfortunately I had to resort
to the GLR parser in order to resolve the ambiguity between these two
rules.
Finally, note that the tree obtained from the construct
z [0] == 0
is context-sensitive in Ruby. If `z` is a local variable (for example,
one holding an array), it is interpreted as
`binary ( reference ( identifier, integer ), integer )`. If `z` is
a method, it is interpreted as `call ( identifier, binary ( array
( integer ), integer ) )`. Since tree-sitter assumes the parsed language is
context-free, there's no good way for us to resolve this ambiguity. This
commit prefers the second, method-invocation, interpretation, which
appears to be more common within the test corpus.
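The two readings can be observed directly in MRI. The following is a minimal sketch, not part of the grammar change itself; the names `echo`, `reference_reading`, and `call_reading` are hypothetical and chosen only for illustration:

```ruby
# Hypothetical helper: returns its argument unchanged.
def echo(arg)
  arg
end

def reference_reading
  arr = [0, 1, 2]  # `arr` is a local variable here
  arr [0] == 0     # MRI reads this as (arr[0]) == 0
end

def call_reading
  echo [0] == 0    # `echo` is a method, so MRI reads this as echo([0] == 0)
end
```

Under these assumptions `reference_reading` compares the first element with `0`, while `call_reading` passes the result of `[0] == 0` to `echo`. The source text is identical in shape, which is why a context-free parser has to pick one reading.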
* Use external scanner logic to distinguish between arrays & subscripts
When an opening square bracket appears immediately after a callable
expression like "a" or "a.b", we must decide between two possible
interpretations of the bracket:
1. It could be part of an element reference, as in
`a[0] = true`.
2. Or it could be an array literal, passed as an argument, as in
`puts [1, 2, 3]`
If there is no preceding whitespace, the bracket should *always* be
treated as part of an element reference. This matches MRI's behavior.
If there *is* preceding whitespace, MRI makes its decision in a
context-sensitive way, based on whether the preceding expression
is a local variable or a method name.
This parser is not context-sensitive, so we instead will interpret
the bracket as part of an array literal whenever that is syntactically
valid, and interpret it as part of element reference otherwise. The
external scanner can use the validity of other expression tokens like
`string` to infer whether an array literal would be valid.
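The whitespace rule can be checked against MRI with a small sketch. The method `items` and its optional parameter are hypothetical, introduced only so that both spellings run without raising:

```ruby
# Hypothetical method with an optional argument, so both spellings are valid.
def items(arg = nil)
  arg.nil? ? [10, 20, 30] : [:called_with, arg]
end

def without_space
  items[0]   # no whitespace: element reference on the result of items()
end

def with_space
  items [0]  # whitespace before a method name: array literal argument, items([0])
end
```

Here `without_space` returns `10` and `with_space` returns `[:called_with, [0]]`. MRI knows that `items` is a method; this parser does not, so it applies the array-literal reading whenever that is syntactically valid, as described above.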
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Per irb:
irb(main):016:0> ?\xff
=> "\xFF"
irb(main):017:0> ?\u{024f}
=> "ɏ"
irb(main):018:0> ?\u024f
=> "ɏ"
Add these test cases and update the character-matching regex to accept them.
Fixes tree-sitter/tree-sitter-ruby#145.
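As a quick cross-check of the irb session, the three character-literal forms can be compared with equivalent strings. This is only a sketch; the method name `character_literals` is hypothetical:

```ruby
# Returns the three character literals from the irb session above.
def character_literals
  [?\xff, ?\u{024f}, ?\u024f]
end

hex_char, braced_char, plain_char = character_literals

hex_char.bytes          # the single byte 0xFF
braced_char.codepoints  # the single code point U+024F
plain_char == braced_char
```

Both Unicode spellings produce the same one-character string, and the hex form produces a one-byte string.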
Trailing commas in `for` loop variable lists (accepted by MRI) no longer
cause parser errors, e.g. in:
for x, in 1..10 do
  puts x
end
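For context, a sketch of how MRI treats the trailing comma at run time: the loop variable list behaves like the left-hand side of a multiple assignment (`x, = element`). The helper name `loop_values` is hypothetical:

```ruby
# Collects the value bound to `x` on each iteration of a
# `for` loop whose variable list has a trailing comma.
def loop_values(enumerable)
  values = []
  for x, in enumerable do
    values << x
  end
  values
end

loop_values(1..3)              # each integer is bound to x as-is
loop_values([[1, 2], [3, 4]])  # x takes the first element of each pair
```

With plain integers the comma makes no visible difference; with nested arrays only the first element of each is kept.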
For the following example, with multiple identifiers *not* surrounded by
parentheses, it was awkward to have multiple pattern nodes without an
overall node for the entire `key, value` pattern. Now, there is one.
for key, value in my_hash do
end
`foo.bar` and `foo.bar()` now result in the same shape of parse tree, where
previously the latter had an extra level of nesting with a call inside
the method_call. Now they are both just a call.