We should split lines based on their codepoint length, so all our
lengths are on codepoint boundaries. We can then safely index by byte position.
All the positions are measured in bytes, not code points. Tweak
function names to make this explicit.
Fixes#149