difftastic/vendor/tree-sitter-hack
Wilfred Hughes 22813808fe Add 'vendor/tree-sitter-hack/' from commit '4770eb21a36307c156cfd2555ddd8e10c304fdc3'
git-subtree-dir: vendor/tree-sitter-hack
git-subtree-mainline: 73c12d2b04
git-subtree-split: 4770eb21a3
2022-02-02 23:11:29 +07:00
..
.github Add 'vendor/tree-sitter-hack/' from commit '4770eb21a36307c156cfd2555ddd8e10c304fdc3' 2022-02-02 23:11:29 +07:00
.vscode Add 'vendor/tree-sitter-hack/' from commit '4770eb21a36307c156cfd2555ddd8e10c304fdc3' 2022-02-02 23:11:29 +07:00
bin Add 'vendor/tree-sitter-hack/' from commit '4770eb21a36307c156cfd2555ddd8e10c304fdc3' 2022-02-02 23:11:29 +07:00
bindings Add 'vendor/tree-sitter-hack/' from commit '4770eb21a36307c156cfd2555ddd8e10c304fdc3' 2022-02-02 23:11:29 +07:00
src Add 'vendor/tree-sitter-hack/' from commit '4770eb21a36307c156cfd2555ddd8e10c304fdc3' 2022-02-02 23:11:29 +07:00
test Add 'vendor/tree-sitter-hack/' from commit '4770eb21a36307c156cfd2555ddd8e10c304fdc3' 2022-02-02 23:11:29 +07:00
.clang-format Add 'vendor/tree-sitter-hack/' from commit '4770eb21a36307c156cfd2555ddd8e10c304fdc3' 2022-02-02 23:11:29 +07:00
.gitattributes Add 'vendor/tree-sitter-hack/' from commit '4770eb21a36307c156cfd2555ddd8e10c304fdc3' 2022-02-02 23:11:29 +07:00
.gitignore Add 'vendor/tree-sitter-hack/' from commit '4770eb21a36307c156cfd2555ddd8e10c304fdc3' 2022-02-02 23:11:29 +07:00
.npmignore Add 'vendor/tree-sitter-hack/' from commit '4770eb21a36307c156cfd2555ddd8e10c304fdc3' 2022-02-02 23:11:29 +07:00
.npmrc Add 'vendor/tree-sitter-hack/' from commit '4770eb21a36307c156cfd2555ddd8e10c304fdc3' 2022-02-02 23:11:29 +07:00
.prettierignore Add 'vendor/tree-sitter-hack/' from commit '4770eb21a36307c156cfd2555ddd8e10c304fdc3' 2022-02-02 23:11:29 +07:00
.prettierrc Add 'vendor/tree-sitter-hack/' from commit '4770eb21a36307c156cfd2555ddd8e10c304fdc3' 2022-02-02 23:11:29 +07:00
.rubocop.yml Add 'vendor/tree-sitter-hack/' from commit '4770eb21a36307c156cfd2555ddd8e10c304fdc3' 2022-02-02 23:11:29 +07:00
Cargo.toml Add 'vendor/tree-sitter-hack/' from commit '4770eb21a36307c156cfd2555ddd8e10c304fdc3' 2022-02-02 23:11:29 +07:00
LICENSE Add 'vendor/tree-sitter-hack/' from commit '4770eb21a36307c156cfd2555ddd8e10c304fdc3' 2022-02-02 23:11:29 +07:00
README.md Add 'vendor/tree-sitter-hack/' from commit '4770eb21a36307c156cfd2555ddd8e10c304fdc3' 2022-02-02 23:11:29 +07:00
binding.gyp Add 'vendor/tree-sitter-hack/' from commit '4770eb21a36307c156cfd2555ddd8e10c304fdc3' 2022-02-02 23:11:29 +07:00
diagram.png Add 'vendor/tree-sitter-hack/' from commit '4770eb21a36307c156cfd2555ddd8e10c304fdc3' 2022-02-02 23:11:29 +07:00
grammar.js Add 'vendor/tree-sitter-hack/' from commit '4770eb21a36307c156cfd2555ddd8e10c304fdc3' 2022-02-02 23:11:29 +07:00
package.json Add 'vendor/tree-sitter-hack/' from commit '4770eb21a36307c156cfd2555ddd8e10c304fdc3' 2022-02-02 23:11:29 +07:00

README.md

tree-sitter-hack

build

At Slack proactively securing our systems is a top priority. One way we achieve this is by automating the detection of vulnerabilities with static code analysis scanning. Although an abundance of tools exist for scanning the majority of programming languages, our codebase is overwhelmingly written in Hack - a language not widely used outside of Slack. Rather than building our own tool from scratch, we are extending the functionality of an open source static analysis tool, Semgrep, to be compatible with Hack. But how do we teach Semgrep the Hack programming language?

Like all human languages, programming languages have a structure to them known as grammar. Grammar rules are used to create a parser which converts source code into a concrete syntax tree (CST) which is a structural representation of the code. Tree-Sitter is a fast and robust library that can generate a CST from our Hack grammar rules. This CST has many use cases such as robust syntax highlighting, code folding, linting, etc. Most importantly, Semgrep uses this CST to understand Hack on a semantic level. This semantic understanding in conjunction with Semgrep rules can detect vulnerabilities in source code. This process is demonstrated by the following diagram.

tree-sitter-hack use in Semgrep

In summary, we use tree-sitter-hack to teach Semgrep the Hack programming language.

Installation

$ git clone https://github.com/slackhq/tree-sitter-hack
$ cd tree-sitter-hack
$ npm install

Usage

$ echo 'function main(): void { print "wyd, world\\n"; }' > script.hack
$ npx tree-sitter generate
$ npx tree-sitter parse script.hack
(script [0, 0] - [3, 0]
  (function_declaration [0, 0] - [2, 1]
    name: (identifier [0, 9] - [0, 13])
    (parameters [0, 13] - [0, 15])
    return_type: (primitive_type [0, 17] - [0, 21])
    body: (compound_statement [0, 22] - [2, 1]
      (expression_statement [1, 2] - [1, 23]
        (print_expression [1, 2] - [1, 22]
          (string [1, 8] - [1, 22]))))))

Testing

$ npx tree-sitter generate
$ bin/test-corpus

Scripts

bin/generate-parser

Wrapper around tree-sitter generate that skips parser generation if grammar.js hasn't changed since last run.

bin/generate-corpus

Unlike most other Tree-sitter projects, we breakout test cases into separate files (see test/cases). This is done so editors have an easier time syntax highlighting test cases. But also I find individual files easier to navigate than the corpus.txt files used by Tree-sitter.

We use bin/generate-corpus to generate the test/corpus/case1.txt from individual test/cases files so we can still use tree-sitter test.

bin/test-corpus

Run bin/generate-corpus and bin/generate-parser before running tree-sitter test.

bin/test-dir

Run bin/ts-errors on all files with .hack or .php extension in the given directory recursively.

$ ./bin/test-dir hhvm/hphp/hack/test
examples/hhvm/hphp/hack/test/error_formatting_highlighted/zero_width_syntax_err.php
(3,11)-(3,18) extends
examples/hhvm/hphp/hack/test/autocomplete/not_shape_key_string.php
(3,1)-(6,1) function foo(): string {\n  return "AUTO332\n}\n
(4,10)-(6,1) "AUTO332\n}\n
...

Note

npm doesn't allow packages with the word "hack" in their registry which is why the repo name does not match the package name.

Unfortunately, the word "hack" triggers our spam detection and can't be used in package names. We recommend choosing other keywords that highlight your package's functionality.

References

There's no published official Hacklang language spec so we have to make do.