Hey Embers,
TL;DR version:
I’d like to use Cinder’s toUtf16() function on a string like “ñêQRV”, but this throws an invalid UTF-8 exception. I’m guessing I have to re-encode those characters from their Unicode values, since the bytes in the string aren’t valid UTF-8 as they stand? Any tips/hints on doing this?
Long version:
I’m reading in text from MP3 ID3 tags, and apparently some of them use a (now) deprecated encoding: “UCS-2 encoded Unicode with BOM”, according to the Wikipedia page. I spent at least an hour trying to find the ‘right way’ to decode it before I gave up and settled for a solution provided by a kind soul on the GitHub page of the ID3 decoding library. I’m left with a std::string which occasionally contains what are apparently invalid UTF-8 sequences, because when I call toUtf16() on it I get an exception saying so.
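For reference, here is a minimal sketch of how I understand “UCS-2 with BOM” bytes could be turned into UTF-8 by hand. The name ucs2WithBomToUtf8 is made up (it is not part of Cinder or the ID3 library), and it assumes the raw frame bytes are available, that input without a BOM is little-endian, and that the text really is UCS-2 (no surrogate pairs).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not a Cinder or ID3-library function.
// Decodes raw "UCS-2 with BOM" bytes into a UTF-8 std::string.
// Assumptions: no surrogate pairs; little-endian if the BOM is missing.
std::string ucs2WithBomToUtf8( const std::string &bytes )
{
    std::size_t pos = 0;
    bool bigEndian = false;

    // Check for a byte order mark: FE FF = big-endian, FF FE = little-endian.
    if( bytes.size() >= 2 ) {
        unsigned char b0 = bytes[0], b1 = bytes[1];
        if( b0 == 0xFE && b1 == 0xFF ) { bigEndian = true; pos = 2; }
        else if( b0 == 0xFF && b1 == 0xFE ) { pos = 2; }
    }

    std::string utf8;
    for( ; pos + 1 < bytes.size(); pos += 2 ) {
        unsigned char b0 = bytes[pos], b1 = bytes[pos + 1];
        std::uint16_t cp = static_cast<std::uint16_t>(
            bigEndian ? ( ( b0 << 8 ) | b1 ) : ( ( b1 << 8 ) | b0 ) );

        if( cp == 0 )
            break; // treat a 16-bit zero as the string terminator

        if( cp < 0x80 ) {
            // 1-byte sequence (plain ASCII)
            utf8.push_back( static_cast<char>( cp ) );
        }
        else if( cp < 0x800 ) {
            // 2-byte sequence: 110xxxxx 10xxxxxx
            utf8.push_back( static_cast<char>( 0xC0 | ( cp >> 6 ) ) );
            utf8.push_back( static_cast<char>( 0x80 | ( cp & 0x3F ) ) );
        }
        else {
            // 3-byte sequence: 1110xxxx 10xxxxxx 10xxxxxx
            utf8.push_back( static_cast<char>( 0xE0 | ( cp >> 12 ) ) );
            utf8.push_back( static_cast<char>( 0x80 | ( ( cp >> 6 ) & 0x3F ) ) );
            utf8.push_back( static_cast<char>( 0x80 | ( cp & 0x3F ) ) );
        }
    }
    return utf8;
}
```

If something like this ran on the raw tag bytes, the result should already be valid UTF-8 by the time it reaches toUtf16().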
I know UTF-16 is generally to be avoided, as per the ever helpful @paul.houx, but since I only plan to use UTF-16 internally, and only on the Windows platform for now, I’ve opted to make an exception.
I use it primarily so I can remove a single character at a time from a string and be sure it is a ‘complete’ character rather than a portion of one.
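To make the “one complete character at a time” part concrete, this is a rough sketch of the same operation done directly on UTF-8, without going through UTF-16. popLastCodePoint is a made-up name, and it assumes the string is already valid UTF-8, which is exactly what I can’t guarantee yet.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: removes the last complete code point from a string
// that is assumed to hold valid UTF-8.
void popLastCodePoint( std::string &utf8 )
{
    if( utf8.empty() )
        return;

    std::size_t i = utf8.size() - 1;

    // Continuation bytes look like 10xxxxxx; walk back until we reach
    // the lead byte (or a plain ASCII byte) that starts the code point.
    while( i > 0 && ( static_cast<unsigned char>( utf8[i] ) & 0xC0 ) == 0x80 )
        --i;

    // Erase from the lead byte to the end of the string.
    utf8.erase( i );
}
```

Either way, this only guarantees whole code points: UTF-16 still uses surrogate pairs outside the BMP, and a combining accent is a separate code point in both encodings.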
So I’m wondering what to do. Perhaps convert the characters to proper UTF-8 using toUtf8()? But if that’s the approach, how do I do this when these characters are in a std::string to begin with?
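The closest I can picture is something like the sketch below, which only works if my guess is right that each offending byte is a single Latin-1 (ISO-8859-1) character. That is an assumption about my data, and latin1ToUtf8 is a made-up helper rather than anything Cinder provides.

```cpp
#include <string>

// Hypothetical helper, not part of Cinder.
// Assumption: every byte of `latin1` is one ISO-8859-1 character, so its
// value equals its Unicode code point (U+0000 to U+00FF).
std::string latin1ToUtf8( const std::string &latin1 )
{
    std::string utf8;
    utf8.reserve( latin1.size() * 2 ); // worst case: every byte doubles

    for( unsigned char c : latin1 ) {
        if( c < 0x80 ) {
            // ASCII is identical in Latin-1 and UTF-8.
            utf8.push_back( static_cast<char>( c ) );
        }
        else {
            // U+0080 to U+00FF become a 2-byte sequence: 110xxxxx 10xxxxxx
            utf8.push_back( static_cast<char>( 0xC0 | ( c >> 6 ) ) );
            utf8.push_back( static_cast<char>( 0x80 | ( c & 0x3F ) ) );
        }
    }
    return utf8;
}
```

With that, “ñ” stored as the single byte 0xF1 would become the two bytes 0xC3 0xB1. Note that running it on a string that is already UTF-8 would double-encode it.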
Thanks in advance,
Gazoo