Yet another UTF-8 Question - loading files with Cyrillic paths


#1

Hi guys,
I’m having an issue on Windows with file paths that contain Cyrillic characters. The file is a valid JPG (renaming it using only Latin characters works), but with Cyrillic in the path it fails to load. I’ve been bashing my head against a wall and I’m not sure how to get past this. I’ve been down a few Google holes and haven’t made much progress.

When the file name is
“4980434.jpg”
of course everything’s good,

but when it’s the cyrillic version:
“фото-4980434.jpg”
is when I get issues.

     |error  | cinder::app::AppBase::executeLaunch[137] Uncaught exception, type: class cinder::ImageIoExceptionFailedLoad, what : Could not create WIC Decoder from filename.

My test code is (super dirty ofc):

void App::fileDrop(FileDropEvent event) {
    gl::Texture2d::create(loadImage(event.getFile(0)));
}

I’ve tried quite a few conversions, such as the two below, but I’m kind of muddling my way through right now.

ci::fs::u8path(filepath.generic_string())
event.getFile(0).generic_u8string()

I was wondering if anyone’s encountered this before and resolved it?


#2

Ah, this is very interesting!
I loaded up a clean cinder, built the ImageFileBasic sample and ran it.

It seems that getting a path through getOpenFilePath() works as expected, and Cinder is able to load the same image absolutely fine, even with these characters.
Only the file-drop route is failing, so it must be something to do with how the FileDropEvent passes those paths through. Will continue to look.


#3

My current best advice is to look at some of Cinder’s conversion functions, i.e. to_utf8, to_utf16 etc. Generally I’ve had a lot of success with keeping things in one format or another, depending on what exactly was required. In some instances it was a tremendous benefit to use UTF-16, where each character is literally a whole character rather than being spread over several bytes. Other times it’s been helpful to just use the conversion functions to parse the characters properly.


#4

My advice would be to use UTF-8 for storage (files) and UTF-32 for internal use (std::u32string). The latter is much more convenient for parsing, shaping, etc. because (as @Gazoo almost correctly pointed out) it always uses 32 bits per code point. In comparison, UTF-16 does not; it may need two 16-bit units (a surrogate pair) to encode a single code point. Use UTF-16 only when converting to and from Windows OS functions, which all happen to use UTF-16 (or the legacy ANSI code-page variants, but let’s forget about those).
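To make the difference between the encodings concrete, here is a minimal sketch of decoding UTF-8 into a std::u32string and counting how many UTF-16 units a code point needs. The names decodeUtf8 and utf16UnitCount are made up for this illustration and are not part of Cinder; the decoder checks the basic byte structure only, not overlong forms or surrogate values.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a UTF-8 byte string into UTF-32 code points.
// Rejects bad lead bytes, bad continuation bytes and truncated input,
// but does not check for overlong encodings or encoded surrogates.
std::u32string decodeUtf8(const std::string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80)              { len = 1; cp = lead; }        // 0xxxxxxx
        else if ((lead >> 5) == 0x06) { len = 2; cp = lead & 0x1F; } // 110xxxxx
        else if ((lead >> 4) == 0x0E) { len = 3; cp = lead & 0x0F; } // 1110xxxx
        else if ((lead >> 3) == 0x1E) { len = 4; cp = lead & 0x07; } // 11110xxx
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::runtime_error("truncated UTF-8 sequence");

        // Each continuation byte (10xxxxxx) contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c >> 6) != 0x02)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical helper: number of 16-bit units UTF-16 needs for one code point.
// Code points above U+FFFF are stored as a surrogate pair.
std::size_t utf16UnitCount(char32_t cp) {
    return cp >= 0x10000 ? 2 : 1;
}
```

With this, the four letters of “фото” take 8 bytes in UTF-8 but decode to exactly 4 elements of the std::u32string, while something like U+1F600 needs 4 bytes in UTF-8, 2 units in UTF-16 and still only 1 element in UTF-32.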


#5

Thank you! I’ve tracked it down specifically to the MSW file-drop event implementation, which passes the paths through as std::string, and that seems to lose the information needed to represent the non-ASCII characters.
I have everything else working as expected; it’s just the FileDropEvent now.

Line 965 of AppImplMsw.cpp:
files.push_back( std::string( fileName ) );

The same file implements getOpenFilePath, which returns an fs::path directly rather than converting to std::string, and so works as expected.
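For anyone hitting the same thing, here is a rough sketch of the kind of conversion that avoids the loss, assuming the dropped file name is available as UTF-16 (which is what the wide-character Windows APIs hand back). The function utf16ToUtf8 is a hypothetical stand-in written out by hand so the example is self-contained; in real code Cinder’s own conversion helpers mentioned earlier in the thread would be the natural choice, and I have not checked what the drop handler actually receives.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert UTF-16 to UTF-8 without dropping characters.
// Surrogate pairs are combined into one code point before encoding.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 >= in.size())
                throw std::runtime_error("unpaired high surrogate");
            const char32_t low = in[i + 1];
            if (low < 0xDC00 || low > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }

        // Emit 1 to 4 UTF-8 bytes depending on the size of the code point.
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

The resulting UTF-8 string could then be turned into a path with something like the fs::u8path call from the first post. On Windows it may be simpler still to skip std::string entirely and construct the fs::path straight from the wide string, which is effectively what the getOpenFilePath route does.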