4 LCS Algorithms
Wilfred Hughes edited this page 2025-03-20 23:42:07 +07:00
This file contains ambiguous Unicode characters!

This file contains ambiguous Unicode characters that may be confused with others in your current locale. If your use case is intentional and legitimate, you can safely ignore this warning. Use the Escape button to highlight these characters.

Finding a minimal set of changes between two files is a well-researched problem, often described as finding the 'longest common subsequence' (LCS) or 'shortest edit script'.

See also discussion here: https://github.com/mitsuhiko/similar/issues/44#issuecomment-1412178765

Wu Diff Algorithm (1990)

Wu, Sun, Udi Manber, Eugene Wimberly Myers and Webb Miller. "An O(NP) Sequence Comparison Algorithm." Inf. Process. Lett. 35 (1990): 317-323.

https://doi.org/10.1016/0020-0190(90)90035-V

This is the algorithm used in difftastic, and is also used in Subversion's diff.

Myers Diff Algorithm (1986)

Myers, Eugene Wimberly. "An O(ND) difference algorithm and its variations." Algorithmica 1 (1986): 251-266.

https://doi.org/10.1007/BF01840446

This algorithm is widely implemented, including in the diff crate. There's a great introduction here.

GNU diff has a bunch of heuristics to avoid pathological cases that can occur in a simple implementation. The imara-diff crate implements these in Rust.

Google's go-cmp also has an interesting variant that optimises for performance over minimal diffs.

HuntMcIlroy Algorithm (1976)

https://doi.org/10.1145/359581.359603

The original 1970s diff tool used the Hunt-McIlroy algorithm (sometimes called the HuntSzymanski algorithm).