I read, via the FFI, some nasty untrusted binary sludge sent from who-knows-where.
Sometimes this sludge contains a possibly strange UTF-8 string I might want to display - maybe I want an Elvish localisation when 12.0 adds all that custom ligature support.
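The utf8 library is only concerned with encoding, so it doesn't keep text:add from choking on code points that decode fine but aren't real characters (surrogates, U+FFFE, U+FFFF), so I check for those myself: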
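local function validate(s)
  -- utf8.codes raises an error by itself on malformed byte sequences
  for p, c in utf8.codes(s) do
    -- reject UTF-16 surrogates and the noncharacters U+FFFE / U+FFFF
    if c >= 0xD800 and c <= 0xDFFF or c == 0xFFFE or c == 0xFFFF then
      error("invalid UTF-8 codepoint")
    end
  end
end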
utf8.codes already raises an error on overlong sequences and on code points above U+10FFFF, so that part is covered.
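
Since both utf8.codes and the explicit check signal failure by raising an error, a yes/no answer needs a pcall around the whole thing. Here is a minimal sketch of that, not a definitive implementation: is_displayable is a made-up name, and it assumes a Lua 5.3-style utf8 library that is either a global or loadable with require("utf8").

```lua
-- Assumption: utf8 is a global (Lua 5.3+) or a loadable module (e.g. under LuaJIT).
local utf8 = utf8 or require("utf8")

-- Same check as above, repeated so this sketch runs on its own.
local function validate(s)
  for _, c in utf8.codes(s) do
    if c >= 0xD800 and c <= 0xDFFF or c == 0xFFFE or c == 0xFFFF then
      error("invalid UTF-8 codepoint")
    end
  end
end

-- Hypothetical helper: true if s passes validate, false otherwise.
-- pcall catches both the error raised by utf8.codes on malformed bytes
-- and the one raised by validate on surrogates / U+FFFE / U+FFFF.
local function is_displayable(s)
  return (pcall(validate, s))
end
```

With that, the sludge can be filtered with something like `if is_displayable(s) then ... end` before it gets anywhere near the text object, instead of letting the error propagate.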