mirror of https://github.com/Wilfred/difftastic/
Fix UTF-8 character boundary panic in style.rs (Issue #865)
This commit fixes a crash that occurred when difftastic tried to slice
strings at byte positions that fell in the middle of multi-byte UTF-8
characters.

Root Cause:
The crash was caused by a mismatch between how byte positions are
calculated (in the line-numbers crate) and how lines are split when
applying styles. Specifically:

1. LinePositions::from() always adds 1 byte for newlines, even for the
   last line if it doesn't end with a newline
2. split_on_newlines() strips \r characters from CRLF line endings,
   which LinePositions does not account for
3. This causes byte indices to be off, potentially landing in the
   middle of multi-byte UTF-8 characters

Solution:
Added UTF-8 boundary validation to substring_by_byte() that:

- Checks if slice indices are within string bounds
- Verifies indices fall on valid UTF-8 character boundaries
- Adjusts to the nearest previous valid boundary if not
- Prevents panics from invalid string slicing

This is a defensive fix that handles the symptom while the root cause
(in the line-numbers crate) can be addressed separately.

Tests:
Added comprehensive test cases for UTF-8 boundary handling with
multi-byte characters including 2-byte (ê) and 3-byte (世界) chars.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

claude/debug-issue-865-crash-011CUd9ut241YRW738w3K84r
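
The boundary validation described under "Solution" can be sketched roughly as below. This is a minimal illustration, not the code from the commit: the signature of substring_by_byte() is assumed here (the real function in style.rs may take different arguments), and clamp_to_char_boundary() is a hypothetical helper name introduced only for this sketch. It relies on the standard library's str::is_char_boundary().

```rust
/// Hypothetical helper (not from the commit): clamp `idx` to the string
/// length, then step backwards until it sits on a UTF-8 char boundary.
/// Index 0 is always a boundary, so the loop terminates.
fn clamp_to_char_boundary(s: &str, idx: usize) -> usize {
    let mut i = idx.min(s.len());
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Assumed signature: return the slice of `s` between two byte offsets,
/// adjusting each offset to the nearest previous valid boundary instead
/// of panicking when an offset is out of bounds or mid-character.
fn substring_by_byte(s: &str, start: usize, end: usize) -> &str {
    let end = clamp_to_char_boundary(s, end);
    // Keep start <= end so the range below is always valid.
    let start = clamp_to_char_boundary(s, start.min(end));
    &s[start..end]
}
```

With this sketch, substring_by_byte("aêb", 0, 2) yields "a" rather than panicking, because byte offset 2 falls inside the 2-byte ê and is moved back to offset 1. Rounding down, as the commit message describes, may drop a character from the styled span, which is why the message calls this a defensive fix rather than a correction of the offsets themselves.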
parent
d615490493
commit
32da88b88e