mirror of https://github.com/Wilfred/difftastic/
create the manual-zh-CN folder, and organize the translation file structure
parent
53d44175cc
commit
44712123d3
@ -0,0 +1 @@
|
||||
book
|
||||
@ -0,0 +1,20 @@
|
||||
[book]
|
||||
authors = ["Wilfred Hughes"]
|
||||
language = "en"
|
||||
multilingual = false
|
||||
src = "src"
|
||||
title = "Difftastic Manual"
|
||||
description = "The official manual for difftastic, the syntactic differ"
|
||||
|
||||
[output.html]
|
||||
git-repository-url = "https://github.com/wilfred/difftastic"
|
||||
|
||||
[output.html.redirect]
|
||||
"/getting_started.html" = "./installation.html"
|
||||
"/upstream_parsers.html" = "/languages_supported.html"
|
||||
|
||||
[output.html.playground]
|
||||
copyable = false
|
||||
|
||||
[preprocessor.replace-version-placeholder]
|
||||
command = "./replace_version_placeholder.sh"
|
||||
@ -0,0 +1,5 @@
|
||||
#!/bin/bash
|
||||
|
||||
DFT_VERSION=$(cargo read-manifest | jq -r .version)
|
||||
|
||||
jq .[1] | jq '.sections[0].Chapter.content |= sub("DFT_VERSION_HERE"; "'$DFT_VERSION'")'
|
||||
@ -0,0 +1,17 @@
|
||||
# Summary
|
||||
|
||||
- [Introduction](./introduction.md)
|
||||
- [Installation](./installation.md)
|
||||
- [Usage](./usage.md)
|
||||
- [Git](./git.md)
|
||||
- [Mercurial](./mercurial.md)
|
||||
- [Languages Supported](./languages_supported.md)
|
||||
- [Internals: Parsing](./parsing.md)
|
||||
- [Internals: Diffing](./diffing.md)
|
||||
- [Tricky Cases](./tricky_cases.md)
|
||||
- [Contributing](./contributing.md)
|
||||
- [Parser Vendoring](./parser_vendoring.md)
|
||||
- [Adding A Parser](./adding_a_parser.md)
|
||||
- [Glossary](./glossary.md)
|
||||
- [Alternative Projects](./alternative_projects.md)
|
||||
- [Tree Diffing](./tree_diffing.md)
|
||||
@ -0,0 +1,136 @@
|
||||
# Adding A Parser
|
||||
|
||||
## Finding a parser
|
||||
|
||||
New parsers for difftastic must be reasonably complete and maintained.
|
||||
|
||||
There are many tree-sitter parsers available, and the tree-sitter
|
||||
website includes [a list of some well-known
|
||||
parsers](https://tree-sitter.github.io/tree-sitter/#available-parsers).
|
||||
|
||||
## Add the source code
|
||||
|
||||
Once you've found a parser, add it as a git subtree to
|
||||
`vendor/`. We'll use
|
||||
[tree-sitter-json](https://github.com/tree-sitter/tree-sitter-json) as
|
||||
an example.
|
||||
|
||||
```
|
||||
$ git subtree add --prefix=vendor/tree-sitter-json git@github.com:tree-sitter/tree-sitter-json.git master
|
||||
```
|
||||
|
||||
## Configure the build
|
||||
|
||||
Cargo does not allow packages to include subdirectories that contain a
|
||||
`Cargo.toml`. Add a symlink to the `src/` parser subdirectory.
|
||||
|
||||
```
|
||||
$ cd vendor
|
||||
$ ln -s tree-sitter-json/src tree-sitter-json-src
|
||||
```
|
||||
|
||||
You can now add the parser to the build by including the directory in
|
||||
`build.rs`.
|
||||
|
||||
```
TreeSitterParser {
    name: "tree-sitter-json",
    src_dir: "vendor/tree-sitter-json-src",
    extra_files: vec![],
},
```
|
||||
|
||||
If your parser includes custom C or C++ files for lexing (e.g. a
|
||||
`scanner.cc`), add them to `extra_files`.
|
||||
|
||||
## Configure parsing
|
||||
|
||||
Add an entry to `tree_sitter_parser.rs` for your language.
|
||||
|
||||
```
Json => {
    let language = unsafe { tree_sitter_json() };
    TreeSitterConfig {
        name: "JSON",
        language,
        atom_nodes: vec!["string"].into_iter().collect(),
        delimiter_tokens: vec![("{", "}"), ("[", "]")],
        highlight_query: ts::Query::new(
            language,
            include_str!("../vendor/highlights/json.scm"),
        )
        .unwrap(),
    }
}
```
|
||||
|
||||
`name` is the human-readable name shown in the UI.
|
||||
|
||||
`atom_nodes` is a list of tree-sitter node names that should be
|
||||
treated as atoms even though the nodes have children. This is common
|
||||
for things like string literals or interpolated strings, where the
|
||||
node might have children for the opening and closing quote.
|
||||
|
||||
If you don't set `atom_nodes`, you may notice added/removed content
|
||||
shown in white. This is usually a sign that a child node should have its
|
||||
parent treated as an atom.
|
||||
|
||||
`delimiter_tokens` are delimiters that difftastic stores on
|
||||
the enclosing list node. This allows difftastic to distinguish
|
||||
delimiter tokens from other punctuation in the language.
|
||||
|
||||
If you don't set `delimiter_tokens`, difftastic will consider the
|
||||
tokens in isolation, and may think that a `(` was added but the `)`
|
||||
was unchanged.
|
||||
|
||||
You can use `difft --dump-ts foo.json` to see the results of the
|
||||
tree-sitter parser, and `difft --dump-syntax foo.json` to confirm that
|
||||
you've set atoms and delimiters correctly.
|
||||
|
||||
## Configure sliders
|
||||
|
||||
Add an entry to `sliders.rs` for your language.
|
||||
|
||||
## Configure language detection
|
||||
|
||||
Update `from_extension` in `guess_language.rs` to detect your new
|
||||
language.
|
||||
|
||||
```
|
||||
"json" => Some(Json),
|
||||
```
|
||||
|
||||
There may also be file names or shebangs associated with your
|
||||
language. [GitHub's linguist
|
||||
definitions](https://github.com/github/linguist/blob/master/lib/linguist/languages.yml)
|
||||
is a useful source of common file extensions.
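
As a rough illustration of extension-based detection, the sketch below maps a file extension to a language. The `Language` enum and the `guess_from_path` helper are hypothetical names made up for this example; only `from_extension` and the `"json" => Some(Json)` arm come from the text above, and the real function in `guess_language.rs` may look quite different.

```rust
use std::path::Path;

// Hypothetical language enum, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Language {
    Json,
    Rust,
}

/// Guess a language from a file extension (without the leading dot).
fn from_extension(extension: &str) -> Option<Language> {
    match extension {
        "json" => Some(Language::Json),
        "rs" => Some(Language::Rust),
        _ => None,
    }
}

/// Hypothetical helper: look at the extension of a path, if it has one.
fn guess_from_path(path: &str) -> Option<Language> {
    let extension = Path::new(path).extension()?.to_str()?;
    from_extension(extension)
}

fn main() {
    println!("{:?}", guess_from_path("foo.json"));
    println!("{:?}", guess_from_path("README"));
}
```

A path without an extension falls through to `None`, which is where file-name or shebang based detection would have to take over.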
|
||||
|
||||
## Syntax highlighting (Optional)
|
||||
|
||||
To add syntax highlighting for your language, you'll also need a symlink
|
||||
to the `queries/highlights.scm` file, if available.
|
||||
|
||||
```
|
||||
$ cd vendor/highlights
|
||||
$ ln -s ../tree-sitter-json/queries/highlights.scm json.scm
|
||||
```
|
||||
|
||||
## Add a regression test
|
||||
|
||||
Finally, add a regression test for your language. This ensures that
|
||||
the output for your test file doesn't change unexpectedly.
|
||||
|
||||
Regression test files live in `sample_files/` and have the form
|
||||
`foo_before.abc` and `foo_after.abc`.
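
The naming convention can be made concrete with a small sketch. The `after_name` helper below is hypothetical and not part of difftastic; it only shows how a `foo_before.abc` file name pairs up with `foo_after.abc`.

```rust
/// Given `foo_before.abc`, return `foo_after.abc`.
/// Returns `None` if the name does not follow the convention.
fn after_name(before: &str) -> Option<String> {
    // Split off the extension, then require the `_before` suffix on the stem.
    let (stem, extension) = before.rsplit_once('.')?;
    let base = stem.strip_suffix("_before")?;
    Some(format!("{}_after.{}", base, extension))
}

fn main() {
    println!("{:?}", after_name("simple_before.json"));
    println!("{:?}", after_name("README.md"));
}
```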
|
||||
|
||||
```
|
||||
$ nano simple_before.json
|
||||
$ nano simple_after.json
|
||||
```
|
||||
|
||||
Run the regression test script and update the `.expected` file.
|
||||
|
||||
```
|
||||
$ ./sample_files/compare_all.sh
|
||||
$ cp sample_files/compare.result sample_files/compare.expected
|
||||
```
|
||||
@ -0,0 +1,5 @@
|
||||
# Alternative Projects
|
||||
|
||||
Many different tools exist for diffing files. This section of the
|
||||
manual discusses the design of other tools that have influenced
|
||||
difftastic.
|
||||
@ -0,0 +1,120 @@
|
||||
# Contributing
|
||||
|
||||
## Building
|
||||
|
||||
Install Rust with [rustup](https://rustup.rs/), then clone the code.
|
||||
|
||||
```
|
||||
$ git clone git@github.com:Wilfred/difftastic.git
|
||||
$ cd difftastic
|
||||
```
|
||||
|
||||
Difftastic uses [Cargo](https://doc.rust-lang.org/cargo/) for
|
||||
building.
|
||||
|
||||
```
|
||||
$ cargo build
|
||||
```
|
||||
|
||||
Debug builds are significantly slower than release builds. For files
|
||||
with more than fifty lines, it's usually worth using an optimised
|
||||
build.
|
||||
|
||||
```
|
||||
$ cargo build --release
|
||||
```
|
||||
|
||||
## Manual
|
||||
|
||||
This website is generated with
|
||||
[mdbook](https://github.com/rust-lang/mdBook/). mdbook can be
|
||||
installed with Cargo.
|
||||
|
||||
```
|
||||
$ cargo install mdbook
|
||||
```
|
||||
|
||||
You can then use the `mdbook` binary to build and serve the site
|
||||
locally.
|
||||
|
||||
```
|
||||
$ cd manual
|
||||
$ mdbook serve
|
||||
```
|
||||
|
||||
## API Documentation
|
||||
|
||||
You can browse the internal API documentation generated by rustdoc
|
||||
[here](https://difftastic.wilfred.me.uk/rustdoc/difft/).
|
||||
|
||||
Difftastic's internal docs are not available on docs.rs, as it [does
|
||||
not support binary crates today](https://difftastic.wilfred.me.uk/rustdoc/difft/).
|
||||
|
||||
## Testing
|
||||
|
||||
```
|
||||
$ cargo test
|
||||
```
|
||||
|
||||
There are also several files in `sample_files/` that you can use.
|
||||
|
||||
The best way to test difftastic is to look at history from a real
|
||||
project. Set `GIT_EXTERNAL_DIFF` to point to your current build.
|
||||
|
||||
For example, you can run difftastic on its own source code.
|
||||
|
||||
```
|
||||
$ GIT_EXTERNAL_DIFF=./target/release/difft git log -p --ext-diff -- src
|
||||
```
|
||||
|
||||
## Logging
|
||||
|
||||
Difftastic uses the `pretty_env_logger` library to log some additional
|
||||
debug information.
|
||||
|
||||
```
|
||||
$ RUST_LOG=debug cargo run sample_files/old.jsx sample_files/new.jsx
|
||||
```
|
||||
|
||||
See the [`env_logger`
|
||||
documentation](https://docs.rs/env_logger/0.9.0/env_logger/) for full details.
|
||||
|
||||
## Profiling
|
||||
|
||||
If you have a file that's particularly slow, you can use
|
||||
[cargo-flamegraph](https://github.com/flamegraph-rs/flamegraph) to see
|
||||
which functions are slow.
|
||||
|
||||
```
|
||||
$ CARGO_PROFILE_RELEASE_DEBUG=true cargo flamegraph --bin difft sample_files/slow_before.rs sample_files/slow_after.rs
|
||||
```
|
||||
|
||||
It's also worth looking at memory usage, as graph traversal bugs can
|
||||
lead to huge memory consumption.
|
||||
|
||||
```
|
||||
$ /usr/bin/time -v ./target/release/difft sample_files/slow_before.rs sample_files/slow_after.rs
|
||||
```
|
||||
|
||||
If timing measurements are noisy, Linux's `perf` tool will report
|
||||
instructions executed, which is more stable.
|
||||
|
||||
```
|
||||
$ perf stat ./target/release/difft sample_files/slow_before.rs sample_files/slow_after.rs
|
||||
$ perf stat ./target/release/difft sample_files/typing_old.ml sample_files/typing_new.ml
|
||||
```
|
||||
|
||||
Many more profiling techniques are discussed in [The Rust
|
||||
Performance Book](https://nnethercote.github.io/perf-book/).
|
||||
|
||||
## Releasing
|
||||
|
||||
Use Cargo to create a new release, and tag it in git. Difftastic has a
|
||||
helper script for this:
|
||||
|
||||
```
|
||||
$ ./scripts/release.sh
|
||||
```
|
||||
|
||||
You can now increment the version in Cargo.toml and add a new entry to
|
||||
CHANGELOG.md.
|
||||
@ -0,0 +1,100 @@
|
||||
# Diffing
|
||||
|
||||
Difftastic treats diff calculations as a route finding problem on a
|
||||
directed acyclic graph.
|
||||
|
||||
## Graph Representation
|
||||
|
||||
A vertex in the graph represents a position in two syntax trees.
|
||||
|
||||
The start vertex has both positions pointing to the first syntax node
|
||||
in both trees. The end vertex has both positions just
|
||||
after the last syntax node in both trees.
|
||||
|
||||
Consider comparing `A` with `X A`.
|
||||
|
||||
```
START
+---------------------+
| Left: A  Right: X A |
|       ^         ^   |
+---------------------+

END
+---------------------+
| Left: A  Right: X A |
|        ^           ^|
+---------------------+
```
|
||||
|
||||
From the start vertex, we have two options:
|
||||
|
||||
* we can mark the first syntax node on the left as novel, and advance
|
||||
to the next syntax node on the left (vertex 1 below), or
|
||||
* we can mark the first syntax node on the right as novel, and advance
|
||||
to the next syntax node on the right (vertex 2 below).
|
||||
|
||||
```
            START
            +---------------------+
            | Left: A  Right: X A |
            |       ^         ^   |
            +---------------------+
                   /       \
     Novel atom L /         \ Novel atom R
1                v       2   v
+---------------------+  +---------------------+
| Left: A  Right: X A |  | Left: A  Right: X A |
|        ^        ^   |  |       ^           ^ |
+---------------------+  +---------------------+
```
|
||||
|
||||
|
||||
Choosing "novel atom R" to vertex 2 will turn out to be the best
|
||||
choice. From vertex 2, we can see three routes to the end vertex.
|
||||
|
||||
```
            2
            +---------------------+
            | Left: A  Right: X A |
            |       ^           ^ |
            +---------------------+
                   /    |   \
     Novel atom L /     |    \ Novel atom R
                 v      |     v
+---------------------+ | +---------------------+
| Left: A  Right: X A | | | Left: A  Right: X A |
|        ^          ^ | | |       ^            ^|
+---------------------+ | +---------------------+
  |                     |                    |
  | Novel atom R        | Nodes match        | Novel atom L
  |                     |                    |
  |         END         v                    |
  |         +---------------------+          |
  +-------->| Left: A  Right: X A |<---------+
            |        ^           ^|
            +---------------------+
```
|
||||
|
||||
## Comparing Routes
|
||||
|
||||
We assign a cost to each edge. Marking a syntax node as novel is worse
|
||||
than finding a matching syntax node, so the "novel atom" edge has a
|
||||
higher cost than the "syntax nodes match" edge.
|
||||
|
||||
The best route is the lowest cost route from the start vertex to the
|
||||
end vertex.
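
To make the comparison concrete, here is a minimal sketch that sums edge costs along two of the routes for `A` versus `X A`. The `Edge` type and the numeric costs (1 for a match, 300 for a novel atom) are assumptions chosen for illustration; this manual does not state difftastic's real cost values.

```rust
// Illustrative edge kinds. The costs are made-up values that only
// preserve the ordering described above: novel is worse than matching.
#[derive(Debug, Clone, Copy)]
enum Edge {
    NodesMatch,
    NovelAtomLeft,
    NovelAtomRight,
}

impl Edge {
    fn cost(self) -> u32 {
        match self {
            Edge::NodesMatch => 1,
            Edge::NovelAtomLeft | Edge::NovelAtomRight => 300,
        }
    }
}

/// The cost of a route is the sum of its edge costs.
fn route_cost(route: &[Edge]) -> u32 {
    route.iter().map(|edge| edge.cost()).sum()
}

fn main() {
    // START -> vertex 2 (X is novel) -> END (the two A nodes match).
    let via_match = [Edge::NovelAtomRight, Edge::NodesMatch];
    // START -> vertex 2 -> A novel on the left -> A novel on the right.
    let all_novel = [Edge::NovelAtomRight, Edge::NovelAtomLeft, Edge::NovelAtomRight];
    println!("via match: {}", route_cost(&via_match));
    println!("all novel: {}", route_cost(&all_novel));
}
```

With these assumed costs the route through the "nodes match" edge is cheaper, which corresponds to reporting only `X` as novel.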
|
||||
|
||||
## Finding The Best Route
|
||||
|
||||
Difftastic uses Dijkstra's algorithm to find the best (i.e. lowest cost)
|
||||
route.
|
||||
|
||||
One big advantage of this algorithm is that we don't need to construct
|
||||
the graph in advance. Constructing the whole graph would require
|
||||
exponential memory relative to the number of syntax nodes. Instead,
|
||||
vertex neighbours are constructed as the graph is explored.
|
||||
|
||||
There are lots of resources explaining Dijkstra's algorithm online,
|
||||
but I particularly recommend the [graph search section of Red Blob
|
||||
Games](https://www.redblobgames.com/pathfinding/a-star/introduction.html#dijkstra).
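
The lazy construction of neighbours can be sketched as follows. This is a simplified model under stated assumptions, not difftastic's implementation: both inputs are flat sequences of atoms (no lists or delimiters), a vertex is a pair of indices, and the cost constants are invented for the example.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

const MATCH_COST: u32 = 1; // assumed value, for illustration only
const NOVEL_COST: u32 = 300; // assumed value, for illustration only

/// A vertex is a position in the left and right sequences.
type Vertex = (usize, usize);

/// Build the neighbours of a vertex on demand, rather than up front.
fn neighbours(lhs: &[&str], rhs: &[&str], (l, r): Vertex) -> Vec<(Vertex, u32)> {
    let mut out = Vec::new();
    if l < lhs.len() && r < rhs.len() && lhs[l] == rhs[r] {
        out.push(((l + 1, r + 1), MATCH_COST)); // nodes match
    }
    if l < lhs.len() {
        out.push(((l + 1, r), NOVEL_COST)); // novel atom on the left
    }
    if r < rhs.len() {
        out.push(((l, r + 1), NOVEL_COST)); // novel atom on the right
    }
    out
}

/// Dijkstra's algorithm from the start vertex to the end vertex.
fn lowest_cost(lhs: &[&str], rhs: &[&str]) -> u32 {
    let end = (lhs.len(), rhs.len());
    let mut best: HashMap<Vertex, u32> = HashMap::new();
    let mut queue = BinaryHeap::new();
    best.insert((0, 0), 0);
    queue.push(Reverse((0u32, (0usize, 0usize))));

    while let Some(Reverse((cost, vertex))) = queue.pop() {
        if vertex == end {
            return cost;
        }
        // Skip queue entries that were superseded by a cheaper route.
        if cost > best[&vertex] {
            continue;
        }
        for (next, edge_cost) in neighbours(lhs, rhs, vertex) {
            let next_cost = cost + edge_cost;
            if best.get(&next).map_or(true, |&known| next_cost < known) {
                best.insert(next, next_cost);
                queue.push(Reverse((next_cost, next)));
            }
        }
    }
    unreachable!("the end vertex is always reachable")
}

fn main() {
    // Comparing `A` with `X A`, as in the diagrams above.
    println!("{}", lowest_cost(&["A"], &["X", "A"]));
}
```

Only vertices that the search actually reaches are ever stored in `best`, which is the property the text describes.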
|
||||
@ -0,0 +1,70 @@
|
||||
# Git
|
||||
|
||||
Git [supports external diff
|
||||
tools](https://git-scm.com/docs/diff-config#Documentation/diff-config.txt-diffexternal). You
|
||||
can use `GIT_EXTERNAL_DIFF` for a one-off git command.
|
||||
|
||||
```
|
||||
$ GIT_EXTERNAL_DIFF=difft git diff
|
||||
$ GIT_EXTERNAL_DIFF=difft git log -p --ext-diff
|
||||
$ GIT_EXTERNAL_DIFF=difft git show e96a7241760319 --ext-diff
|
||||
```
|
||||
|
||||
If you want to use difftastic by default, use `git config`.
|
||||
|
||||
```
|
||||
# Set git configuration for the current repository.
|
||||
$ git config diff.external difft
|
||||
|
||||
# Set git configuration for all repositories.
|
||||
$ git config --global diff.external difft
|
||||
```
|
||||
|
||||
After running `git config`, `git diff` will use `difft`
|
||||
automatically. Other git commands require `--ext-diff` to use
|
||||
`diff.external`.
|
||||
|
||||
```
|
||||
$ git diff
|
||||
$ git log -p --ext-diff
|
||||
$ git show e96a7241760319 --ext-diff
|
||||
```
|
||||
|
||||
## git-difftool
|
||||
|
||||
[git difftool](https://git-scm.com/docs/git-difftool) is a git command
|
||||
for viewing the current changes with a different diff tool. It's
|
||||
useful if you want to use difftastic occasionally.
|
||||
|
||||
Add the
|
||||
following to your `.gitconfig` to use difftastic as your difftool.
|
||||
|
||||
```ini
|
||||
[diff]
|
||||
tool = difftastic
|
||||
|
||||
[difftool]
|
||||
prompt = false
|
||||
|
||||
[difftool "difftastic"]
|
||||
cmd = difft "$LOCAL" "$REMOTE"
|
||||
```
|
||||
|
||||
You can then run `git difftool` to see current changes with difftastic.
|
||||
|
||||
```
|
||||
$ git difftool
|
||||
```
|
||||
|
||||
We also recommend the following settings to get the best difftool
|
||||
experience.
|
||||
|
||||
```ini
|
||||
# Use a pager for large output, just like other git commands.
|
||||
[pager]
|
||||
difftool = true
|
||||
|
||||
# `git dft` is less to type than `git difftool`.
|
||||
[alias]
|
||||
dft = difftool
|
||||
```
|
||||
@ -0,0 +1,33 @@
|
||||
# Glossary
|
||||
|
||||
**Atom**: An atom is an item in difftastic's syntax tree structure
|
||||
that has no children. It represents things like literals, variable
|
||||
names, and comments. See also 'list'.
|
||||
|
||||
**Delimiter**: A paired piece of syntax. A list has an open delimiter
|
||||
and a close delimiter, such as `[` and `]`. Delimiters may not be
|
||||
punctuation (e.g. `begin` and `end`) and may be empty strings (e.g. infix
|
||||
syntax converted to difftastic's syntax tree).
|
||||
|
||||
**LHS**: Left-hand side. Difftastic compares two items, and LHS refers
|
||||
to the first item. See also 'RHS'.
|
||||
|
||||
**List**: A list is an item in difftastic's syntax tree structure that
|
||||
has an open delimiter, children, and a close delimiter. It represents
|
||||
things like expressions and function definitions. See also 'atom'.
|
||||
|
||||
**Novel**: An addition or a removal. Syntax is novel if it occurs
|
||||
in only one of the two items being compared.
|
||||
|
||||
**RHS**: Right-hand side. Difftastic compares two items, and RHS
|
||||
refers to the second item. See also 'LHS'.
|
||||
|
||||
**Root**: A syntax tree without a parent node. Roots represent
|
||||
top-level definitions in the file being diffed.
|
||||
|
||||
**Syntax node**: An item in difftastic's syntax tree structure. Either
|
||||
an atom or a list.
|
||||
|
||||
**Token**: A small piece of syntax tracked by difftastic (e.g. `$x`,
|
||||
`function` or `]`), for highlighting and aligned display. This is
|
||||
either an atom or a non-empty delimiter.
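
The relationship between these terms can be sketched in code. The `Syntax` enum and its field names below are simplified assumptions for illustration, not difftastic's actual types. The `tokens` function collects every atom plus every non-empty delimiter, following the definition of a token above.

```rust
// Simplified stand-in for a syntax node: either an atom or a list.
#[derive(Debug)]
enum Syntax {
    Atom(String),
    List {
        open: String,
        children: Vec<Syntax>,
        close: String,
    },
}

/// Collect tokens in source order: atoms and non-empty delimiters.
fn tokens(node: &Syntax, out: &mut Vec<String>) {
    match node {
        Syntax::Atom(content) => out.push(content.clone()),
        Syntax::List { open, children, close } => {
            if !open.is_empty() {
                out.push(open.clone());
            }
            for child in children {
                tokens(child, out);
            }
            if !close.is_empty() {
                out.push(close.clone());
            }
        }
    }
}

/// `[1, x]` wrapped in an outer list with empty delimiters.
fn example_tree() -> Syntax {
    Syntax::List {
        open: String::new(),
        children: vec![Syntax::List {
            open: "[".to_string(),
            children: vec![
                Syntax::Atom("1".to_string()),
                Syntax::Atom(",".to_string()),
                Syntax::Atom("x".to_string()),
            ],
            close: "]".to_string(),
        }],
        close: String::new(),
    }
}

fn main() {
    let mut out = Vec::new();
    tokens(&example_tree(), &mut out);
    println!("{:?}", out);
}
```

The outer list contributes no tokens of its own because both of its delimiters are empty strings.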
|
||||
@ -0,0 +1,64 @@
|
||||
# Installation
|
||||
|
||||
## Installing a binary
|
||||
|
||||
Difftastic [provides GitHub
|
||||
releases](https://github.com/Wilfred/difftastic/releases) with
|
||||
prebuilt binaries.
|
||||
|
||||
Packages are also available on the following platforms.
|
||||
|
||||
[Packaging status on Repology](https://repology.org/project/difftastic/versions)
|
||||
|
||||
|
||||
## Installing via Homebrew (on macOS or Linux)
|
||||
|
||||
Difftastic can be installed with [Homebrew](https://formulae.brew.sh/formula/difftastic) on macOS or Linux.
|
||||
|
||||
|
||||
```
|
||||
$ brew install difftastic
|
||||
```
|
||||
|
||||
## Installing from source
|
||||
|
||||
### Build Requirements
|
||||
|
||||
Difftastic is written in Rust, so you will need Rust installed. I
|
||||
recommend [rustup](https://rustup.rs/) to install Rust. Difftastic
|
||||
requires Rust version 1.57 or later.
|
||||
|
||||
You will also need a C++ compiler that supports C++14. If you're using
|
||||
GCC, you need at least version 8.
|
||||
|
||||
### Build
|
||||
|
||||
You can download and build [difftastic on
|
||||
crates.io](https://crates.io/crates/difftastic) with Cargo (which is
|
||||
part of Rust).
|
||||
|
||||
```
|
||||
$ cargo install difftastic
|
||||
```
|
||||
|
||||
Difftastic uses the `cc` crate for building C/C++ dependencies. This
|
||||
allows you to use environment variables `CC` and `CXX` to control the
|
||||
compiler used (see [the cc
|
||||
docs](https://github.com/alexcrichton/cc-rs#external-configuration-via-environment-variables)).
|
||||
|
||||
See [contributing](./contributing.md) for instructions on debug
|
||||
builds.
|
||||
|
||||
## (Optional) Install MIME Database
|
||||
|
||||
If a MIME database is available, difftastic will use it to detect
|
||||
binary files more accurately. This is the same database used by the
|
||||
`file` command, so you probably already have it.
|
||||
|
||||
The MIME database path is [specified in the XDG
|
||||
specification](https://specifications.freedesktop.org/shared-mime-info-spec/0.11/ar01s03.html). The
|
||||
database should be at one of the following paths:
|
||||
|
||||
* `/usr/share/mime/magic`
|
||||
* `/usr/local/share/mime/magic`
|
||||
* `$HOME/.local/share/mime/magic`
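
If you want to check which of these paths exist on your machine, a small sketch like the one below may help. The `mime_db_candidates` function is a hypothetical helper written for this example; it is not how difftastic itself locates the database.

```rust
use std::path::PathBuf;

/// Candidate MIME database paths, in the order listed above.
/// `home` is the value of `$HOME`, if it is set.
fn mime_db_candidates(home: Option<&str>) -> Vec<PathBuf> {
    let mut paths = vec![
        PathBuf::from("/usr/share/mime/magic"),
        PathBuf::from("/usr/local/share/mime/magic"),
    ];
    if let Some(home) = home {
        paths.push(PathBuf::from(home).join(".local/share/mime/magic"));
    }
    paths
}

fn main() {
    let home = std::env::var("HOME").ok();
    for path in mime_db_candidates(home.as_deref()) {
        let status = if path.exists() { "found" } else { "missing" };
        println!("{}: {}", path.display(), status);
    }
}
```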
|
||||
@ -0,0 +1,64 @@
|
||||
# Introduction
|
||||
|
||||
Difftastic is a structural diff tool that understands syntax. It
|
||||
supports [over 20 programming languages](./languages_supported.html)
|
||||
and when it works, it's *fantastic*.
|
||||
|
||||
Difftastic is open source software (MIT license) and [available on
|
||||
GitHub](https://github.com/wilfred/difftastic).
|
||||
|
||||
This copy of the manual describes version DFT_VERSION_HERE. The
|
||||
[changelog](https://github.com/Wilfred/difftastic/blob/master/CHANGELOG.md)
|
||||
records which features and bug fixes are in each version.
|
||||
|
||||
## Syntactic Diffing
|
||||
|
||||
Difftastic [detects the language](./usage.html#language-detection), parses the code, and then
|
||||
compares the syntax trees. Let's look at an example.
|
||||
|
||||
```
|
||||
// old.rs
|
||||
let ts_lang = guess(path, guess_src).map(tsp::from_language);
|
||||
```
|
||||
```
// new.rs
let ts_lang = language_override
    .or_else(|| guess(path, guess_src))
    .map(tsp::from_language);
```
|
||||
|
||||
<pre><code style="display:block">$ difft old.rs new.rs

1 <span style="background-color: PaleGreen">1</span> let ts_lang = <span style="background-color: PaleGreen">language_override</span>
. <span style="background-color: PaleGreen">2</span>     <span style="background-color: PaleGreen">.or_else(||</span> guess(path, guess_src)<span style="background-color: PaleGreen">)</span>
. 3     .map(tsp::from_language);
</code>
</pre>
|
||||
|
||||
Notice how difftastic recognises that `.map` is unchanged, even though
|
||||
it's now on a new line with whitespace.
|
||||
|
||||
A line-oriented diff does a much worse job here.
|
||||
|
||||
<pre><code style="display:block">$ diff -u old.rs new.rs

@@ -1 +1,3 @@
<span style="background-color: #fbbd98">-let ts_lang = guess(path, guess_src).map(tsp::from_language);</span>
<span style="background-color: PaleGreen">+let ts_lang = language_override
+    .or_else(|| guess(path, guess_src))
+    .map(tsp::from_language);</span>
</code>
</pre>
|
||||
|
||||
Some textual diff tools also highlight word changes (e.g. GitHub or
|
||||
git's `--word-diff`). They still don't understand the code
|
||||
though. Difftastic will always find matched delimiters: you can see
|
||||
the closing `)` from `or_else` has been highlighted.
|
||||
|
||||
## Fallback Textual Diffing
|
||||
|
||||
If input files are not in a format that difftastic understands, it
|
||||
uses a conventional line-oriented text diff with word highlighting.
|
||||
|
||||
Difftastic will also use textual diffing when given extremely large
|
||||
inputs.
|
||||
@ -0,0 +1,57 @@
|
||||
# Languages Supported
|
||||
|
||||
This page lists all the languages supported by difftastic. You can
|
||||
also view the languages supported in your current installed version
|
||||
with `difft --list-languages`.
|
||||
|
||||
## Programming Languages
|
||||
|
||||
| Language        | Parser Used |
|-----------------|-------------|
| Bash            | [tree-sitter/tree-sitter-bash](https://github.com/tree-sitter/tree-sitter-bash) |
| C               | [tree-sitter/tree-sitter-c](https://github.com/tree-sitter/tree-sitter-c) |
| C++             | [tree-sitter/tree-sitter-cpp](https://github.com/tree-sitter/tree-sitter-cpp) |
| C#              | [tree-sitter/tree-sitter-c-sharp](https://github.com/tree-sitter/tree-sitter-c-sharp) |
| Clojure         | [sogaiu/tree-sitter-clojure](https://github.com/sogaiu/tree-sitter-clojure) ([branched](https://github.com/sogaiu/tree-sitter-clojure/tree/issue-21)) |
| CMake           | [uyha/tree-sitter-cmake](https://github.com/uyha/tree-sitter-cmake) |
| Common Lisp     | [theHamsta/tree-sitter-commonlisp](https://github.com/theHamsta/tree-sitter-commonlisp) |
| Dart            | [UserNobody14/tree-sitter-dart](https://github.com/UserNobody14/tree-sitter-dart) |
| Elixir          | [elixir-lang/tree-sitter-elixir](https://github.com/elixir-lang/tree-sitter-elixir) |
| Elm             | [elm-tooling/tree-sitter-elm](https://github.com/elm-tooling/tree-sitter-elm) |
| Elvish          | [ckafi/tree-sitter-elvish](https://github.com/ckafi/tree-sitter-elvish) |
| Emacs Lisp      | [wilfred/tree-sitter-elisp](https://github.com/Wilfred/tree-sitter-elisp) |
| Gleam           | [gleam-lang/tree-sitter-gleam](https://github.com/gleam-lang/tree-sitter-gleam) |
| Go              | [tree-sitter/tree-sitter-go](https://github.com/tree-sitter/tree-sitter-go) |
| Hack            | [slackhq/tree-sitter-hack](https://github.com/slackhq/tree-sitter-hack) |
| Haskell         | [tree-sitter/tree-sitter-haskell](https://github.com/tree-sitter/tree-sitter-haskell) |
| Janet           | [sogaiu/tree-sitter-janet-simple](https://github.com/sogaiu/tree-sitter-janet-simple) |
| Java            | [tree-sitter/tree-sitter-java](https://github.com/tree-sitter/tree-sitter-java) |
| JavaScript, JSX | [tree-sitter/tree-sitter-javascript](https://github.com/tree-sitter/tree-sitter-javascript) |
| Julia           | [tree-sitter/tree-sitter-julia](https://github.com/tree-sitter/tree-sitter-julia) |
| Kotlin          | [fwcd/tree-sitter-kotlin](https://github.com/fwcd/tree-sitter-kotlin) |
| Lua             | [nvim-treesitter/tree-sitter-lua](https://github.com/nvim-treesitter/tree-sitter-lua) |
| Make            | [alemuller/tree-sitter-make](https://github.com/alemuller/tree-sitter-make) |
| Nix             | [cstrahan/tree-sitter-nix](https://github.com/cstrahan/tree-sitter-nix) |
| OCaml           | [tree-sitter/tree-sitter-ocaml](https://github.com/tree-sitter/tree-sitter-ocaml) |
| Perl            | [ganezdragon/tree-sitter-perl](https://github.com/ganezdragon/tree-sitter-perl) |
| PHP             | [tree-sitter/tree-sitter-php](https://github.com/tree-sitter/tree-sitter-php) |
| Python          | [tree-sitter/tree-sitter-python](https://github.com/tree-sitter/tree-sitter-python) |
| Ruby            | [tree-sitter/tree-sitter-ruby](https://github.com/tree-sitter/tree-sitter-ruby) |
| Rust            | [tree-sitter/tree-sitter-rust](https://github.com/tree-sitter/tree-sitter-rust) ([forked](https://github.com/Wilfred/tree-sitter-rust/tree/non_special_token)) |
| Scala           | [tree-sitter/tree-sitter-scala](https://github.com/tree-sitter/tree-sitter-scala) |
| SQL             | [m-novikov/tree-sitter-sql](https://github.com/m-novikov/tree-sitter-sql) |
| Swift           | [alex-pinkus/tree-sitter-swift](https://github.com/alex-pinkus/tree-sitter-swift) |
| TypeScript, TSX | [tree-sitter/tree-sitter-typescript](https://github.com/tree-sitter/tree-sitter-typescript) |
| Zig             | [maxxnino/tree-sitter-zig](https://github.com/maxxnino/tree-sitter-zig) |
|
||||
|
||||
## Structured Text Formats
|
||||
|
||||
| Language | Parser Used |
|----------|-------------|
| CSS      | [tree-sitter/tree-sitter-css](https://github.com/tree-sitter/tree-sitter-css) |
| HCL      | [MichaHoffmann/tree-sitter-hcl](https://github.com/MichaHoffmann/tree-sitter-hcl) |
| HTML     | [tree-sitter/tree-sitter-html](https://github.com/tree-sitter/tree-sitter-html) |
| JSON     | [tree-sitter/tree-sitter-json](https://github.com/tree-sitter/tree-sitter-json) |
| TOML     | [ikatyang/tree-sitter-toml](https://github.com/ikatyang/tree-sitter-toml) |
| YAML     | [ikatyang/tree-sitter-yaml](https://github.com/ikatyang/tree-sitter-yaml) |
|
||||
|
||||
@ -0,0 +1,38 @@
|
||||
# Mercurial
|
||||
|
||||
Mercurial [supports external diff
|
||||
tools](https://www.mercurial-scm.org/wiki/ExtdiffExtension) with the
|
||||
Extdiff extension. Enable it by adding an entry to `extensions` in
|
||||
your `.hgrc`.
|
||||
|
||||
```
|
||||
[extensions]
|
||||
extdiff =
|
||||
```
|
||||
|
||||
You can then run `hg extdiff -p difft` (assumes the `difft` binary is
|
||||
on your `$PATH`).
|
||||
|
||||
You can also define an alias to run difftastic with hg. Add the
|
||||
following to your `.hgrc` to run difftastic with `hg dft`.
|
||||
|
||||
```
|
||||
[extdiff]
|
||||
cmd.dft = difft
|
||||
opts.dft = --missing-as-empty
|
||||
```
|
||||
|
||||
## hg log -p
|
||||
|
||||
Mercurial does not have a way of changing the default diff tool, at
|
||||
least to the author's knowledge.
|
||||
|
||||
If you just want to view the diff of the most recent commit, you can
|
||||
use the following.
|
||||
|
||||
```
|
||||
GIT_PAGER_IN_USE=1 hg dft -r .^ -r . | less
|
||||
```
|
||||
|
||||
This is equivalent to `hg log -l 1 -p`, although it does not show the
|
||||
commit message.
|
||||
@ -0,0 +1,23 @@
|
||||
# Vendoring
|
||||
|
||||
## Git Subtrees
|
||||
|
||||
Tree-sitter parsers are sometimes packaged on npm, sometimes packaged
|
||||
on crates.io, and have different release frequencies. Difftastic uses
|
||||
git subtrees (not git submodules) to track parsers.
|
||||
|
||||
## Updating a parser
|
||||
|
||||
To update a parser, pull commits from the upstream git repository. For
|
||||
example, the following command will update the Java parser:
|
||||
|
||||
```
|
||||
$ git subtree pull --prefix=vendor/tree-sitter-java git@github.com:tree-sitter/tree-sitter-java.git master
|
||||
```
|
||||
|
||||
To see when each parser was last updated, use the following shell
|
||||
command:
|
||||
|
||||
```
|
||||
$ for d in $(git log | grep git-subtree-dir | tr -d ' ' | cut -d ":" -f2 | sort); do echo "$d"; git log --pretty=" %cs" -n 1 $d; done
|
||||
```
|
||||
@ -0,0 +1,97 @@
|
||||
# Parsing
|
||||
|
||||
Difftastic uses
|
||||
[tree-sitter](https://tree-sitter.github.io/tree-sitter/) to build a
|
||||
parse tree. The parse tree is then converted to a simpler tree which
|
||||
can be diffed.
|
||||
|
||||
## Parsing with Tree-sitter
|
||||
|
||||
Difftastic relies on tree-sitter to understand syntax. You can view
|
||||
the parse tree that tree-sitter produces using the `--dump-ts`
|
||||
flag.
|
||||
|
||||
```
$ difft --dump-ts sample_files/javascript_simple_before.js | head
program (0, 0) - (7, 0)
  comment (0, 0) - (0, 8) "// hello"
  expression_statement (1, 0) - (1, 6)
    call_expression (1, 0) - (1, 5)
      identifier (1, 0) - (1, 3) "foo"
      arguments (1, 3) - (1, 5)
        ( (1, 3) - (1, 4) "("
        ) (1, 4) - (1, 5) ")"
    ; (1, 5) - (1, 6) ";"
  expression_statement (2, 0) - (2, 6)
```
|
||||
|
||||
## Simplified Syntax
|
||||
|
||||
Difftastic converts the tree-sitter parse tree to a simplified syntax
|
||||
tree. The syntax tree is a uniform representation where everything is
|
||||
either an atom (e.g. integer literals, comments, variable names) or a
|
||||
list (consisting of the open delimiter, children and the close
|
||||
delimiter).
|
||||
|
||||
The flag `--dump-syntax` will display the syntax tree generated for a
|
||||
file.
|
||||
|
||||
```
$ difft --dump-syntax sample_files/before.js
[
    Atom id:1 {
        content: "// hello",
        position: "0:0-8",
    },
    List id:2 {
        open_content: "",
        open_position: "1:0-0",
        children: [
          ...
```
|
||||
|
||||
### Conversion Process
|
||||
|
||||
The simple representation of the difftastic parse tree makes diffing
|
||||
much easier. Converting the detailed tree-sitter parse tree is a
|
||||
recursive tree walk, treating tree-sitter leaf nodes as atoms. There
|
||||
are two exceptions.
|
||||
|
||||
(1) Tree-sitter parse trees sometimes include unwanted structure. Some
|
||||
grammars consider string literals to be a single token, whereas others
|
||||
treat strings as a complex structure where the delimiters are
|
||||
separate.
|
||||
|
||||
`tree_sitter_parser.rs` uses `atom_nodes` to mark specific tree-sitter
|
||||
node names as flat atoms even if the node has children.
|
||||
|
||||
(2) Tree-sitter parse trees include open and closing delimiters as
|
||||
tokens. A list `[1]` will have a parse tree that includes `[` and `]`
|
||||
as nodes.
|
||||
|
||||
```
$ echo '[1]' > example.js
$ difft --dump-ts example.js
program (0, 0) - (1, 0)
  expression_statement (0, 0) - (0, 3)
    array (0, 0) - (0, 3)
      [ (0, 0) - (0, 1) "["
      number (0, 1) - (0, 2) "1"
      ] (0, 2) - (0, 3) "]"
```
|
||||
|
||||
`tree_sitter_parser.rs` uses `open_delimiter_tokens` to ensure that
|
||||
`[` and `]` are used as delimiter content in the enclosing list,
|
||||
rather than converting them to atoms.
|
||||
|
||||
Difftastic can match up atoms that occur in different parts of the
|
||||
simplified syntax tree. If e.g. a `[` is treated as an atom,
|
||||
difftastic might match it with another `[` elsewhere. The resulting
|
||||
diff would be unbalanced, highlighting different numbers of open and
|
||||
close delimiters.
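
The conversion described in this section can be sketched as follows. This is an illustrative Python model, not difftastic's Rust code: `TsNode`, `convert`, and the contents of `ATOM_NODES` and `OPEN_DELIMITER_TOKENS` are assumptions, and the sketch only recognises delimiters that are the first and last children of a node.

```python
from dataclasses import dataclass, field


@dataclass
class TsNode:
    """Stand-in for a tree-sitter node: its kind, source text, and children."""
    kind: str
    text: str
    children: list = field(default_factory=list)


# Hypothetical per-language configuration.
ATOM_NODES = {"string"}  # kinds flattened to one atom even with children
OPEN_DELIMITER_TOKENS = {"[": "]", "(": ")", "{": "}"}


def convert(node):
    """Return ("atom", text) or ("list", open, children, close)."""
    if not node.children or node.kind in ATOM_NODES:
        return ("atom", node.text)

    first, last = node.children[0], node.children[-1]
    if len(node.children) >= 2 and OPEN_DELIMITER_TOKENS.get(first.text) == last.text:
        # The delimiter tokens become list content, not atoms.
        inner = [convert(child) for child in node.children[1:-1]]
        return ("list", first.text, inner, last.text)

    # No delimiters found: a list with empty delimiter content.
    return ("list", "", [convert(child) for child in node.children], "")


# The tree-sitter parse of `[1]`, minus the outer `program` node.
array = TsNode("array", "[1]", [TsNode("[", "["), TsNode("number", "1"), TsNode("]", "]")])
print(convert(array))
```

Keeping `[` and `]` inside the list node means they can only ever be matched as a pair, which avoids the unbalanced highlighting described above.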
|
||||
|
||||
### Lossy Syntax Trees
|
||||
|
||||
The simplified syntax tree only stores node content and node
|
||||
position. It does not store whitespace between nodes, and position is
|
||||
largely ignored during diffing.
|
||||
@ -0,0 +1,2 @@
|
||||
User-agent: *
|
||||
Allow: /
|
||||
@ -0,0 +1,120 @@
|
||||
# Tree Diffing
|
||||
|
||||
This page summarises some of the other tree diffing tools available.
|
||||
|
||||
If you're in a hurry, start by looking at Autochrome. It's extremely
|
||||
capable, and has an excellent description of the design.
|
||||
|
||||
If you're interested in a summary of the academic literature, [this
|
||||
blog
|
||||
post](http://useless-factor.blogspot.com/2008/01/matching-diffing-and-merging-xml.html)
|
||||
(and its [accompanying
|
||||
paper](http://useless-factor.blogspot.com/2008/01/matching-diffing-and-merging-xml.html)
|
||||
-- mirrored under a CC BY-NC license) are great resources.
|
||||
|
||||
## json-diff (2012)
|
||||
|
||||
Languages: JSON
|
||||
Algorithm: Pairwise comparison
|
||||
Output: CLI colours
|
||||
|
||||
[json-diff](https://github.com/andreyvit/json-diff) performs a
|
||||
structural diff of JSON files. It considers subtrees to be different
|
||||
if they don't match exactly, so e.g. `"foo"` and `["foo"]` are
|
||||
entirely different.
|
||||
|
||||
json-diff is also noteworthy for its extremely readable display of
|
||||
results.
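
The exact-match behaviour can be illustrated with a short Python sketch. This is not json-diff's actual code; `diff_values` and the `$.key` path notation are assumptions made for the example, and only objects are recursed into.

```python
_MISSING = object()  # marks a key that is absent on one side


def diff_values(before, after, path="$"):
    """Return (path, before, after) for every subtree that does not match exactly."""
    if before == after:
        return []
    if isinstance(before, dict) and isinstance(after, dict):
        changes = []
        for key in sorted(before.keys() | after.keys()):
            changes += diff_values(
                before.get(key, _MISSING), after.get(key, _MISSING), f"{path}.{key}"
            )
        return changes
    # Anything else is treated as a wholesale replacement.
    return [(path, before, after)]


print(diff_values({"a": "foo", "b": 1}, {"a": ["foo"], "b": 1}))
```

Here `"foo"` and `["foo"]` are reported as one complete replacement at `$.a`, even though the string itself is unchanged.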
|
||||
|
||||
## GumTree (2014)
|
||||
|
||||
Languages: [~10 programming
|
||||
languages](https://github.com/GumTreeDiff/gumtree/wiki/Languages)
|
||||
Parser: Several, including [srcML](https://www.srcml.org/)
|
||||
Algorithm: Top-down, then bottom-up
|
||||
Output: HTML, Swing GUI, or text
|
||||
|
||||
[GumTree](https://github.com/GumTreeDiff/gumtree) can parse several
|
||||
programming languages and then performs a tree-based diff, outputting
|
||||
an HTML display.
|
||||
|
||||
The GumTree algorithm is described in the associated paper
|
||||
'Fine-grained and accurate source code differencing' by Falleri et al
|
||||
([DOI](http://doi.acm.org/10.1145/2642937.2642982),
|
||||
[PDF](https://hal.archives-ouvertes.fr/hal-01054552/document)). It
|
||||
performs a greedy top-down search for identical subtrees, then
|
||||
performs a bottom-up search to match up the rest.
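
A rough Python sketch of the top-down phase only is given here. It is a simplification made for illustration, not the algorithm from the paper: trees are modelled as nested tuples `(label, *children)`, and `subtrees` and `top_down_matches` are hypothetical names. The bottom-up phase is omitted.

```python
from collections import Counter


def subtrees(tree):
    """Yield every subtree of a (label, *children) tuple, root first."""
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)


def top_down_matches(before, after):
    """Greedily match subtrees of `after` that appear verbatim in `before`."""
    unmatched = Counter(subtrees(before))
    matches = []

    def visit(node):
        if unmatched[node] > 0:
            # Identical subtree found: match it whole and do not descend.
            unmatched[node] -= 1
            matches.append(node)
            return
        for child in node[1:]:
            visit(child)

    visit(after)
    return matches


# foo(bar(123)) versus foo(extra(bar(123)))
before = ("call", ("foo",), ("call", ("bar",), ("123",)))
after = ("call", ("foo",), ("call", ("extra",), ("call", ("bar",), ("123",))))
print(top_down_matches(before, after))
```

In this example `foo` and the whole `bar(123)` call are matched as identical subtrees, leaving only `extra` and its enclosing call for a later phase to handle.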
|
||||
|
||||
## Tree Diff (2017)
|
||||
|
||||
Languages: S-expression data format
|
||||
Algorithm: A* search
|
||||
Output: Merged s-expression file
|
||||
|
||||
Tristan Hume wrote a tree diffing algorithm during his 2017 internship
|
||||
at Jane Street. The source code is not available, but [he has a blog
|
||||
post](https://thume.ca/2017/06/17/tree-diffing/) discussing the design
|
||||
in depth.
|
||||
|
||||
This project finds minimal diffs between s-expression files used as
|
||||
configuration by Jane Street. It uses A* search to find the minimal
|
||||
diff between them, and builds a new s-expression with a section marked
|
||||
with `:date-switch` for the differing parts.
|
||||
|
||||
(Jane Street also has patdiff, but that seems to be a line-oriented
|
||||
diff with some whitespace/integer display polish. It doesn't
|
||||
understand that e.g. whitespace in `"foo "` is meaningful).
|
||||
|
||||
## Autochrome (2017)
|
||||
|
||||
Languages: Clojure
|
||||
Parser: Custom, preserves comments
|
||||
Algorithm: Dijkstra (previously A* search)
|
||||
Output: HTML
|
||||
|
||||
[Autochrome](https://fazzone.github.io/autochrome.html) parses Clojure
|
||||
with a custom parser that preserves comments. Autochrome uses
|
||||
Dijkstra's algorithm to compare syntax trees.
|
||||
|
||||
Autochrome's webpage includes worked examples of the algorithm and a
|
||||
discussion of design tradeoffs. It's a really great resource for
|
||||
understanding tree diffing techniques in general.
|
||||
|
||||
## graphtage (2020)
|
||||
|
||||
Languages: JSON, XML, HTML, YAML, plist, and CSS
|
||||
Parser: json5, pyYAML, ignores comments
|
||||
Algorithm: Levenshtein distance
|
||||
Output: CLI colours
|
||||
|
||||
[graphtage](https://blog.trailofbits.com/2020/08/28/graphtage/)
|
||||
compares structured data by parsing into a generic file format, then
|
||||
displaying a diff. It even allows things like diffing JSON against
|
||||
YAML.
|
||||
|
||||
As with json-diff, it does not consider `["foo"]` and `"foo"` to have
|
||||
any similarities.
|
||||
|
||||
## Diffsitter (2020)
|
||||
|
||||
Parser: [Tree-sitter](https://tree-sitter.github.io/tree-sitter/)
|
||||
Algorithm: Longest-common-subsequence
|
||||
Output: CLI colours
|
||||
|
||||
[Diffsitter](https://github.com/afnanenayet/diffsitter) is another
|
||||
tree-sitter based diff tool. It uses [LCS diffing on the leaves of the
|
||||
syntax
|
||||
tree](https://github.com/afnanenayet/diffsitter/blob/b0fd72612c6fcfdb8c061d3afa3bea2b0b754f33/src/ast.rs#L310-L313).
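
The general idea of LCS diffing on leaves can be sketched in Python as follows. This is not Diffsitter's code: trees are modelled as nested lists of strings, and `leaves` and `lcs` are names chosen for the example.

```python
def leaves(tree):
    """Flatten a nested list of strings into its leaves, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def lcs(a, b):
    """Return one longest common subsequence of two lists."""
    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table to recover the subsequence itself.
    common, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return common


old = leaves(["foo", ["bar", "123"]])
new = leaves(["foo", ["extra", ["bar", "123"]]])
print(lcs(old, new))
```

Leaves of `new` that are missing from the common subsequence (here only `extra`) are the ones a leaf-based differ would report as added. Flattening discards nesting, so changes that only affect structure are not visible to this approach.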
|
||||
|
||||
## sdiff (2021)
|
||||
|
||||
Languages: Scheme
|
||||
Parser: Scheme's built-in `read`, ignores comments
|
||||
Algorithm: MH-Diff from the Chawathe paper
|
||||
Output: CLI colours
|
||||
|
||||
[Semantically meaningful S-expression diff: Tree-diff for lisp source
|
||||
code](https://archive.fosdem.org/2021/schedule/event/sexpressiondiff/)
|
||||
was presented at FOSDEM 2021.
|
||||
|
||||
|
||||
@ -0,0 +1,382 @@
|
||||
# Tricky Cases
|
||||
|
||||
Tree diffing is challenging in some situations. This page demonstrates
|
||||
difficult cases observed during development.
|
||||
|
||||
Not all of these cases work well in difftastic yet.
|
||||
|
||||
## Adding Delimiters
|
||||
|
||||
```
|
||||
;; Before
|
||||
x
|
||||
|
||||
;; After
|
||||
(x)
|
||||
```
|
||||
|
||||
Desired result: <code><span style="background-color: PaleGreen">(</span>x<span style="background-color: PaleGreen">)</span></code>
|
||||
|
||||
This is tricky because `x` has changed its depth in the tree, but `x`
|
||||
itself is unchanged.
|
||||
|
||||
Not all tree diff algorithms handle this case. It is also challenging
|
||||
to display this case clearly: we want to highlight the changed
|
||||
delimiters, but not their content. This is challenging in larger
|
||||
expressions.
|
||||
|
||||
## Changing Delimiters
|
||||
|
||||
```
|
||||
;; Before
|
||||
(x)
|
||||
|
||||
;; After
|
||||
[x]
|
||||
```
|
||||
|
||||
As with the wrapping case, we want to highlight the delimiters rather
|
||||
than the `x`.
|
||||
|
||||
## Expanding Delimiters
|
||||
|
||||
```
|
||||
;; Before
|
||||
(x) y
|
||||
|
||||
;; After
|
||||
(x y)
|
||||
```
|
||||
|
||||
Desired output: <code>(x <span style="background-color: PaleGreen">y</span>)</code>
|
||||
|
||||
In this case, we want to highlight `y`. Highlighting the delimiters
|
||||
could make `x` look changed.
|
||||
|
||||
## Contracting Delimiters
|
||||
|
||||
```
|
||||
;; Before
|
||||
(x y)
|
||||
|
||||
;; After
|
||||
(x) y
|
||||
```
|
||||
|
||||
This should be highlighted similarly to the expanding delimiter case.
|
||||
|
||||
## Disconnected Delimiters
|
||||
|
||||
```
|
||||
;; Before
|
||||
(foo (bar))
|
||||
|
||||
;; After
|
||||
(foo (novel) (bar))
|
||||
```
|
||||
|
||||
Desired result: <code>(foo <span style="background-color:PaleGreen">(novel)</span> (bar))</code>
|
||||
|
||||
It is easy to end up with
|
||||
<code>(foo (<span style="background-color:PaleGreen">novel</span>) <span style="background-color:PaleGreen">(</span>bar<span style="background-color:PaleGreen">)</span>)</code>,
|
||||
where a later pair of delimiters are chosen.
|
||||
|
||||
## Rewrapping Large Nodes
|
||||
|
||||
```
|
||||
;; Before
|
||||
[[foo]]
|
||||
(x y)
|
||||
|
||||
;; After
|
||||
([[foo]] x y)
|
||||
```
|
||||
|
||||
We want to highlight `[[foo]]` being moved inside the
|
||||
parentheses. However, a naive syntax differ prefers to consider a removal
|
||||
of `()` in the before and an addition of `()` in the after to be a more
|
||||
minimal diff.
|
||||
|
||||
(Reported as [issue 44](https://github.com/Wilfred/difftastic/issues/44).)
|
||||
|
||||
## Reordering Within A List
|
||||
|
||||
```
|
||||
;; Before
|
||||
(x y)
|
||||
|
||||
;; After
|
||||
(y x)
|
||||
```
|
||||
|
||||
Desired result: <code>(<span style="background-color: PaleGreen">y</span> <span style="background-color: PaleGreen">x</span>)</code>
|
||||
|
||||
We want to highlight the list contents and not the delimiters.
|
||||
|
||||
## Middle Insertions
|
||||
|
||||
```
|
||||
// Before
|
||||
foo(bar(123))
|
||||
|
||||
// After
|
||||
foo(extra(bar(123)))
|
||||
```
|
||||
|
||||
Desired result: <code>foo(<span style="background-color: PaleGreen">extra(</span>bar(123)<span style="background-color: PaleGreen">)</span>)</code>
|
||||
|
||||
We want to consider both `foo` and `bar` to be unchanged. This case is
|
||||
challenging for diffing algorithms that do a bottom-up then top-down
|
||||
matching of trees.
|
||||
|
||||
## Sliders (Flat)
|
||||
|
||||
Sliders are a common problem in text-based diffs, where lines are
|
||||
matched in a confusing way.
|
||||
|
||||
They typically look like this. The diff has to arbitrarily choose a
|
||||
line containing a delimiter, and it chooses the wrong one.
|
||||
|
||||
```
|
||||
+ }
|
||||
+
|
||||
+ function foo () {
|
||||
}
|
||||
```
|
||||
|
||||
git-diff has some heuristics to reduce the risk of this (e.g. the
|
||||
"patience diff"), but it can still occur.
|
||||
|
||||
There's a similar problem in tree diffs.
|
||||
|
||||
```
|
||||
;; Before
|
||||
A B
|
||||
C D
|
||||
|
||||
;; After
|
||||
A B
|
||||
A B
|
||||
C D
|
||||
```
|
||||
|
||||
Ideally we'd prefer marking contiguous nodes as novel, so we highlight
|
||||
`A B` rather than `B\nA`. From the perspective of a
|
||||
longest-common-subsequence algorithm, these two choices are
|
||||
equivalent.
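
The equivalence can be checked with a small Python sketch. It assumes the change is a pure insertion and treats each of `A`, `B`, `C`, `D` as one token; `minimal_insertions` is a hypothetical helper written for this example.

```python
from itertools import combinations

before = ["A", "B", "C", "D"]
after = ["A", "B", "A", "B", "C", "D"]


def minimal_insertions(before, after):
    """List every set of `after` positions whose removal yields `before`."""
    extra = len(after) - len(before)
    results = []
    for novel in combinations(range(len(after)), extra):
        kept = [tok for i, tok in enumerate(after) if i not in novel]
        if kept == before:
            results.append(novel)
    return results


print(minimal_insertions(before, after))  # [(0, 1), (1, 2), (2, 3)]
```

All three answers mark exactly two tokens as novel, so they are equally good for a longest-common-subsequence algorithm. Positions `(0, 1)` and `(2, 3)` each cover a whole `A B` line, whereas `(1, 2)` is the confusing `B\nA` choice that straddles two lines.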
|
||||
|
||||
## Sliders (Nested)
|
||||
|
||||
```
|
||||
// Before
|
||||
old1(old2)
|
||||
|
||||
// After
|
||||
old1(new1(old2))
|
||||
```
|
||||
|
||||
Should this be <code>old1(<span style="background-color: PaleGreen">new1(</span>old2<span style="background-color: PaleGreen">)</span>)</code> or
|
||||
<code>old1<span style="background-color: PaleGreen">(new1</span>(old2)<span style="background-color: PaleGreen">)</span></code>?
|
||||
|
||||
The correct answer depends on the language. Most languages want to
|
||||
prefer the inner delimiter, whereas Lisps and JSON prefer the outer
|
||||
delimiter.
|
||||
|
||||
## Minimising Depth Changes
|
||||
|
||||
```
|
||||
// Before
|
||||
if true {
|
||||
foo(123);
|
||||
}
|
||||
foo(456);
|
||||
|
||||
// After
|
||||
foo(789);
|
||||
```
|
||||
|
||||
Do we consider `foo(123)` or `foo(456)` to match with `foo(789)`?
|
||||
Difftastic prefers `foo(456)` by preferring nodes at the same nesting depth.
|
||||
|
||||
## Replacements With Minor Similarities
|
||||
|
||||
```
|
||||
// Before
|
||||
function foo(x) { return x + 1; }
|
||||
|
||||
// After
|
||||
function bar(y) { baz(y); }
|
||||
```
|
||||
|
||||
In this example, we've deleted a function and written a completely
|
||||
different one. A tree-based diff could match up the `function` and the
|
||||
outer delimiters, resulting in a confusing display showing lots of
|
||||
small changes.
|
||||
|
||||
As with sliders, the replacement problem can also occur in textual
|
||||
line-based diffs. Line-diffs struggle if there are a small number of
|
||||
common lines. The more precise, granular behaviour of tree diffs makes
|
||||
this problem much more common though.
|
||||
|
||||
## Matching Substrings In Comments
|
||||
|
||||
```
|
||||
// Before
|
||||
/* The quick brown fox. */
|
||||
foobar();
|
||||
|
||||
// After
|
||||
/* The slow brown fox. */
|
||||
foobaz();
|
||||
```
|
||||
|
||||
`foobar` and `foobaz` are completely different, and their common
|
||||
prefix `fooba` should not be matched up. However, matching common
|
||||
prefixes or suffixes for comments is desirable.
|
||||
|
||||
## Multiline Comments
|
||||
|
||||
```
|
||||
// Before
|
||||
/* Hello
|
||||
* World. */
|
||||
|
||||
// After
|
||||
if (x) {
|
||||
/* Hello
|
||||
* World. */
|
||||
}
|
||||
```
|
||||
|
||||
The inner content of these two comments is technically different. We
|
||||
want to treat them as identical however.
|
||||
|
||||
## Reflowing Doc Comments
|
||||
|
||||
Block comments have prefixes that aren't meaningful.
|
||||
|
||||
```
|
||||
// Before
|
||||
/* The quick brown fox jumps
|
||||
* over the lazy dog. */
|
||||
|
||||
// After
|
||||
/* The quick brown fox immediately
|
||||
* jumps over the lazy dog. */
|
||||
```
|
||||
|
||||
The inner content has changed from `jumps * over` to `immediately *
|
||||
jumps over`. However, the `*` is decorative and we don't care that
|
||||
it's moved.
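
One possible way to ignore the decorative prefix is sketched in Python here. It is an assumption for illustration, not how difftastic handles comments: `comment_words` and `novel_words` are hypothetical helpers, and the word comparison uses the standard library's `difflib`.

```python
import difflib
import re


def comment_words(comment):
    """Split a block comment into words, ignoring `/*`, `*/` and leading `*`."""
    body = comment.strip()
    if body.startswith("/*"):
        body = body[2:]
    if body.endswith("*/"):
        body = body[:-2]
    words = []
    for line in body.splitlines():
        # Drop a decorative `*` prefix at the start of a continuation line.
        words.extend(re.sub(r"^\s*\*\s?", "", line).split())
    return words


def novel_words(before, after):
    """Return the words that were inserted or replaced in `after`."""
    old, new = comment_words(before), comment_words(after)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    novel = []
    for tag, _, _, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            novel.extend(new[j1:j2])
    return novel


before = "/* The quick brown fox jumps\n * over the lazy dog. */"
after = "/* The quick brown fox immediately\n * jumps over the lazy dog. */"
print(novel_words(before, after))
```

With the prefixes removed, the only difference left between the two comments is the word `immediately`.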
|
||||
|
||||
## Small Changes To Large Strings
|
||||
|
||||
```
|
||||
// Before
|
||||
"""A very long string
|
||||
with lots of words about
|
||||
lots of stuff."""
|
||||
|
||||
// After
|
||||
"""A very long string
|
||||
with lots of NOVEL words about
|
||||
lots of stuff."""
|
||||
```
|
||||
|
||||
It would be correct to highlight the entire string literal as being
|
||||
removed and replaced with a new string literal. However, this makes it
|
||||
hard to see what's actually changed.
|
||||
|
||||
It's clear that variable names should be treated atomically, and
|
||||
comments are safe to show subword changes. It's not clear how to
|
||||
handle a small change in a 20 line string literal.
|
||||
|
||||
It's tempting to split strings on spaces and diff that, but users
|
||||
still want to know when whitespace changes inside strings. `" "` and
|
||||
`"  "` are not the same.
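
The difference can be demonstrated with a short Python sketch. Both function names are hypothetical; the second shows one way to split a string into words without discarding the whitespace between them.

```python
import re


def split_lossy(text):
    """Split on whitespace, discarding how much whitespace there was."""
    return text.split()


def split_keeping_whitespace(text):
    """Split into word and whitespace tokens; joining them restores the text."""
    return [token for token in re.split(r"(\s+)", text) if token]


one_space = "lots of stuff"
two_spaces = "lots of  stuff"

print(split_lossy(one_space) == split_lossy(two_spaces))
print(split_keeping_whitespace(two_spaces))
```

The lossy split considers the two strings identical, while the second tokenisation keeps `" "` and `"  "` as distinct tokens, so a word-level diff can still report the whitespace change.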
|
||||
|
||||
## Autoformatter Punctuation
|
||||
|
||||
```
|
||||
// Before
|
||||
foo("looooong", "also looooong");
|
||||
|
||||
// After
|
||||
foo(
|
||||
"looooong",
|
||||
"novel",
|
||||
"also looooong",
|
||||
);
|
||||
```
|
||||
|
||||
Autoformatters (e.g. [prettier](https://prettier.io/)) will sometimes
|
||||
add or remove punctuation when formatting. Commas and parentheses are
|
||||
the most common.
|
||||
|
||||
Syntactic diffing can ignore whitespace changes, but it has to assume
|
||||
punctuation is meaningful. This can lead to punctuation changes being
|
||||
highlighted, which may be quite far from the relevant content change.
|
||||
|
||||
## Novel Blank Lines
|
||||
|
||||
Blank lines are challenging for syntactic diffs. We are comparing
|
||||
syntactic tokens, so we don't see blank lines.
|
||||
|
||||
```
|
||||
// Before
|
||||
A
|
||||
B
|
||||
|
||||
// After
|
||||
A
|
||||
|
||||
B
|
||||
```
|
||||
|
||||
Generally we want syntactic diffing to ignore blank lines. In this
|
||||
first example, this should show no changes.
|
||||
|
||||
This is occasionally problematic, as it can hide accidental code
|
||||
reformatting.
|
||||
|
||||
```
|
||||
// Before
|
||||
A
|
||||
B
|
||||
|
||||
// After
|
||||
A
|
||||
X
|
||||
|
||||
Y
|
||||
B
|
||||
```
|
||||
|
||||
In this second example, we've inserted X and Y and a blank line. We
|
||||
want to highlight the blank line as an addition.
|
||||
|
||||
```
|
||||
// Before
|
||||
A
|
||||
|
||||
|
||||
B
|
||||
|
||||
// After
|
||||
A
|
||||
X
|
||||
B
|
||||
```
|
||||
|
||||
In this third example, the syntactic diffing only sees an
|
||||
addition. From the user's perspective, there has also been a removal
|
||||
of two blank lines.
|
||||
|
||||
## Invalid Syntax
|
||||
|
||||
There's no guarantee that the input we're given is valid syntax. Even
|
||||
if the code is valid, it might use syntax that isn't supported by the
|
||||
parser.
|
||||
|
||||
Tree-sitter provides explicit error nodes, and difftastic treats them
|
||||
as atoms so it can run the same tree diff algorithm regardless.
|
||||
@ -0,0 +1,45 @@
|
||||
# Usage
|
||||
|
||||
## Diffing Files
|
||||
|
||||
```
|
||||
$ difft sample_files/before.js sample_files/after.js
|
||||
```
|
||||
|
||||
## Diffing Directories
|
||||
|
||||
```
|
||||
$ difft sample_files/dir_before/ sample_files/dir_after/
|
||||
```
|
||||
|
||||
Difftastic will recursively walk the two directories, diffing files
|
||||
with the same name.
|
||||
|
||||
The `--skip-unchanged` option is useful when diffing directories that
|
||||
contain many unchanged files.
|
||||
|
||||
## Language Detection
|
||||
|
||||
Difftastic guesses the language used based on the file extension, file
|
||||
name, and the contents of the first lines.
|
||||
|
||||
You can override the language detection by passing the `--language`
|
||||
option. Difftastic will treat input files as if they had that
|
||||
extension, and ignore other language detection heuristics.
|
||||
|
||||
|
||||
```
|
||||
$ difft --language cpp before.c after.c
|
||||
```
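
The detection and override behaviour described in this section might look roughly like the Python sketch below. The lookup tables, the order in which the heuristics are tried, and the `guess_language` function are all assumptions for illustration; difftastic's real logic is written in Rust and covers many more languages.

```python
import os

# Hypothetical lookup tables; difftastic's real tables are far larger.
EXTENSIONS = {"c": "C", "cpp": "C++", "js": "JavaScript", "py": "Python"}
FILE_NAMES = {"Makefile": "Make"}
INTERPRETERS = {"node": "JavaScript", "python": "Python", "python3": "Python"}


def guess_language(path, first_line="", override=None):
    """Guess a language name, or return None if nothing matches."""
    if override is not None:
        # Behave as if the file had this extension and skip other heuristics.
        return EXTENSIONS.get(override)

    name = os.path.basename(path)
    extension = os.path.splitext(name)[1].lstrip(".")
    if extension in EXTENSIONS:
        return EXTENSIONS[extension]
    if name in FILE_NAMES:
        return FILE_NAMES[name]

    if first_line.startswith("#!"):
        words = first_line[2:].split()
        if words:
            interpreter = os.path.basename(words[0])
            if interpreter == "env" and len(words) > 1:
                interpreter = words[1]
            return INTERPRETERS.get(interpreter)
    return None


print(guess_language("before.c"))
print(guess_language("before.c", override="cpp"))
```

In this model, passing `override="cpp"` makes `before.c` resolve to C++ instead of C, mirroring the `--language cpp` example above.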
|
||||
|
||||
## Options
|
||||
|
||||
Difftastic includes a range of CLI configuration options; see `difft
|
||||
--help` for the full list.
|
||||
|
||||
Difftastic can also be configured with environment variables. These
|
||||
are also visible in `--help`.
|
||||
|
||||
For example, `DFT_BACKGROUND=light` is equivalent to
|
||||
`--background=light`. This is useful when using VCS tools like git,
|
||||
where you are not invoking the `difft` binary directly.
|
||||