UTF8 encoding and Cinder and cross-platform


#1

I’m conducting some internal testing with a project of mine, and when testing the application on non-dev machines I ran into a few issues, specifically when running the software on a Polish copy of Windows. The issue was fairly easy to track down: some of the std::strings being passed to ci::TextBox were not valid UTF8, which resulted in a utf8::invalid_utf8 exception being thrown.

Last time I bothered looking into text encoding I found it to be a pretty big hairy mess. My recent searches didn’t really lead me to any new conclusions. Some people seemed to be advocating using UTF8 throughout and unless I’m compelled otherwise, that’s what I’m sticking to. The two things that triggered these invalid UTF8 characters were as follows:

  1. BASS (the audio library) returns ANSI text for audio device names by default, it seems. A simple config setting makes it return UTF8-encoded text instead.

  2. Path strings returned via Boost aren’t UTF8 by default either. I found this helpful little page with some code to make them default to UTF8 as well:

http://www.boost.org/doc/libs/1_51_0/libs/locale/doc/html/default_encoding_under_windows.html

Hopefully that helps someone else.
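
For anyone chasing the same exception, here is a rough sketch of a check that could be run on a std::string before handing it to ci::TextBox, to find out which input is the offender. The function name isValidUtf8 is made up for this post and is not part of Cinder, BASS or Boost; it only uses the standard library and simply tests whether the bytes form well-formed UTF8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Cinder API): returns true if `s` is well-formed
// UTF-8, i.e. no stray continuation bytes, truncated sequences, overlong
// forms, surrogate code points, or values above U+10FFFF.
bool isValidUtf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;    // continuation bytes expected after the lead byte
        unsigned long cp = 0;     // code point being decoded
        unsigned long minCp = 0;  // smallest code point allowed for this length

        if (lead < 0x80)                { cp = lead;        extra = 0; minCp = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; minCp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; minCp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; minCp = 0x10000; }
        else return false;  // continuation byte or invalid byte in lead position

        if (extra > n - i - 1) return false;  // sequence runs past the end

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < minCp) return false;                     // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                  // outside Unicode range

        i += extra + 1;
    }
    return true;
}
```

A single-byte ANSI character such as a Polish "ł" (byte 0xB3 in Windows-1250) fails this check, while the same letter encoded as the two bytes 0xC5 0x82 passes, which matches the kind of failure described above.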

But I do also have a question about all of this malarkey. The BASS help file states that the UTF8 encoding only works on the Windows-based version of the library. At some point in the future I’d like to port the software to run on Macs as well, which makes me a bit worried about whether UTF8 across the board will work well there too.

So basically my question to y’all is whether anyone has any good practical advice, or knows of any good support for cross-platform text.

Gazoo


#2

Quick answer:

  • std::string can contain either ASCII text or UTF8 text; which encoding is expected depends on the consumer. Some functions expect ASCII, most notably older (C-) libraries. Some use UTF8.
  • When using string literals in your code, make sure to save your source file using the correct encoding. Visual Studio used to save as ASCII by default, not sure if this is still the case in VS2015. Use Save As... to specify UTF8 with or without BOM.
  • It may be better to store strings in a JSON or XML file and save the file as UTF8.
  • A lot of Windows-specific functions expect UTF16-encoded strings, stored in std::wstring or similar.
  • You can always convert from UTF16 to UTF8 and back, see the cinder/Unicode.h header. Some characters may not be compatible. This header works on all supported platforms.
  • If possible, avoid UTF16 and stick to either UTF8 (for compatibility) or UTF32 (if you really need it).
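
To give an idea of what such a conversion involves, here is a minimal standard-library-only sketch. It is not the code from cinder/Unicode.h, and the function names utf8ToUtf16Sketch and utf16ToUtf8Sketch are invented for illustration. It assumes the input is already valid and does no error reporting, so in a real project the Cinder header is the better choice.

```cpp
#include <cstddef>
#include <string>

// Illustrative only: decode UTF-8 into UTF-16. Assumes `in` is valid UTF-8.
std::u16string utf8ToUtf16Sketch(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 1;
        // The lead byte tells us how many bytes the sequence occupies.
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above U+FFFF need a surrogate pair in UTF-16.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Illustrative only: encode UTF-16 as UTF-8. Assumes `in` is valid UTF-16.
std::string utf16ToUtf8Sketch(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        // Combine a high/low surrogate pair back into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            const char32_t low = in[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Note that one character can take one to four bytes in UTF8 and one or two 16-bit units in UTF16, so size() on either string type counts code units, not characters.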

-Paul


#3

@paul.houx - Thanks for the prompt and helpful response.

I found that setting the encoding on all my source files requires me to individually. do. this. for. every. single. file.

I’m under the impression that as long as my hard-coded string literals don’t contain any funky characters, the executable shouldn’t have any issues treating them as UTF8. Or am I asking for trouble?
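
For what it’s worth, 7-bit ASCII is a subset of UTF8, so a string whose bytes are all below 0x80 is byte-for-byte the same in both encodings. A small sketch of how that assumption could be checked at runtime, using only the standard library (isPlainAscii is a name I made up, not something from Cinder):

```cpp
#include <string>

// Hypothetical helper: true if every byte in `s` is 7-bit ASCII (0x00-0x7F).
// Such a string is identical in ASCII and UTF-8, so it can be passed to
// UTF-8 consumers unchanged.
bool isPlainAscii(const std::string& s)
{
    for (unsigned char c : s) {
        if (c >= 0x80) return false;  // high bit set: not plain ASCII
    }
    return true;
}
```

Running literals through a check like this in a debug build would flag any "funky" character that slipped into a source file saved in the wrong encoding.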

The header you’ve linked to looks really useful if I ever need to convert between encodings.