YounYokel wrote: Sun Feb 02, 2020 11:07 am
Uh... I didn't understand a word. Sorry :(
The point is that UTF-8 encodes each Unicode character as 1 to 4 bytes: ASCII characters take 1 byte, everything else takes 2 or more. So if you crop a string or take a substring by byte position, you can cut it in the middle of a character. For example:
-- ASCII characters (1 byte each)
abcdefgh
Taking bytes 1 to 4 of this string gives exactly 4 whole characters: abcd
However:
-- Non-ASCII character (an emoji)
🥰
🥰 (Unicode U+1F970) = \xf0\x9f\xa5\xb0 (4 bytes in UTF-8)
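Where do those 4 bytes come from? Below is a minimal sketch of the UTF-8 encoding rules for a single code point, written in Python only because it shows raw bytes easily (a Lua string holding 🥰 contains the same 4 bytes). `encode_code_point` is a hypothetical helper for illustration; in real Python code, the built-in `str.encode("utf-8")` does this for you. 🥰 is U+1F970, which is above U+FFFF and therefore needs the 4-byte form.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 (hypothetical helper)."""
    # Reject values outside Unicode and UTF-16 surrogates, which UTF-8 never encodes.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp:#x}")
    if cp < 0x80:     # 1 byte:  0xxxxxxx (plain ASCII)
        return bytes([cp])
    if cp < 0x800:    # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


print(encode_code_point(0x1F970))   # b'\xf0\x9f\xa5\xb0'
print(encode_code_point(ord("a")))  # b'a'
```

Every byte after the first one starts with the bits 10. These are the continuation bytes, and they are what gets lost when a cut lands inside a character.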
Taking byte 1 of 🥰 gives only \xf0, the first of its 4 bytes, which is not a character on its own.
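The result is a malformed string. Lua's own string functions (e.g. string.sub) work on bytes, so they return it without complaint, but anything that expects valid UTF-8, such as text rendering or the utf8 library functions, may show garbage or raise an error.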
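A minimal Python sketch of this situation (Python is used only to show the raw bytes; like Lua's string.sub, slicing a byte string never fails by itself). Python's strict UTF-8 decoder stands in here for "anything that expects valid UTF-8", and it rejects the lone first byte.

```python
data = "🥰".encode("utf-8")      # the 4 bytes b'\xf0\x9f\xa5\xb0'
first_byte = data[:1]            # byte slicing succeeds: b'\xf0'

error_reason = None
try:
    first_byte.decode("utf-8")   # strict decoding is Python's default
except UnicodeDecodeError as err:
    error_reason = err.reason    # the decoder rejects the incomplete sequence
print("malformed:", error_reason)
```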
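Never take substrings of UTF-8 text by raw byte positions; use only UTF-8-aware functions for any cropping or substring cutting. If your Lua version has it (5.3 or later), the standard utf8 library helps here: utf8.offset(s, n) returns the byte position where the n-th character starts, which you can then pass to string.sub.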
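If you only have raw bytes and must fit them into a byte budget, one option is to move the cut point back to the start of a character. Here is a minimal Python sketch, assuming the input is already valid UTF-8; `utf8_truncate` is a hypothetical name, not a library function. It relies on every UTF-8 continuation byte having the bit pattern 10xxxxxx.

```python
def utf8_truncate(data: bytes, max_bytes: int) -> bytes:
    """Cut valid UTF-8 `data` to at most `max_bytes` without splitting a character.

    Hypothetical helper for illustration; assumes `data` is valid UTF-8.
    """
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # data[end] is the first byte that would be dropped. While it is a
    # continuation byte (10xxxxxx), the cut falls inside a character,
    # so move the cut one byte to the left.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


s = "ab🥰cd".encode("utf-8")   # 1 + 1 + 4 + 1 + 1 = 8 bytes
print(utf8_truncate(s, 4))      # b'ab'  (the emoji does not fit whole)
print(utf8_truncate(s, 6))      # b'ab\xf0\x9f\xa5\xb0'
```

This keeps each code point whole. It does not keep multi-code-point sequences together (for example an emoji followed by a skin-tone modifier), so such a sequence can still be split between its code points.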
See this converter:
https://www.branah.com/unicode-converter
enter: 🥰
get (4 bytes): \xf0\x9f\xa5\xb0
remove the last byte (3 bytes left): \xf0\x9f\xa5
convert back: you no longer get 🥰, only a malformed sequence
If you accidentally cut off some of a character's continuation bytes (the bytes after its first byte), you get a malformed string. See also "Invalid byte sequences" on Wikipedia:
https://en.wikipedia.org/wiki/UTF-8#Inv ... _sequences
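To see what "malformed" means at the byte level, here is a simplified Python sketch that checks each lead byte against the number of continuation bytes that must follow it. Both function names are hypothetical, and the check is deliberately incomplete: a full validator (such as Python's own decoder) also rejects overlong encodings, surrogates and code points above U+10FFFF.

```python
from typing import Optional


def expected_length(lead: int) -> int:
    """Sequence length announced by a lead byte, or 0 if it cannot start one."""
    if lead < 0x80:
        return 1        # 0xxxxxxx: ASCII
    if lead >> 5 == 0b110:
        return 2        # 110xxxxx
    if lead >> 4 == 0b1110:
        return 3        # 1110xxxx
    if lead >> 3 == 0b11110:
        return 4        # 11110xxx
    return 0            # 10xxxxxx (stray continuation byte) or never valid


def first_bad_byte(data: bytes) -> Optional[int]:
    """Return the 0-based index where a structural error starts, or None."""
    i = 0
    while i < len(data):
        n = expected_length(data[i])
        if n == 0:
            return i
        tail = data[i + 1:i + n]
        # Too few bytes left, or a following byte is not a continuation byte.
        if len(tail) != n - 1 or any((b & 0xC0) != 0x80 for b in tail):
            return i
        i += n
    return None


print(first_bad_byte("🥰".encode("utf-8")))  # None
print(first_bad_byte(b"\xf0\x9f\xa5"))        # 0 (lead byte expects 3 more bytes)
print(first_bad_byte(b"ab\x9f"))              # 2 (continuation byte with no lead byte)
```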