ImageSource and Cinder inter-API image conversion routines


#1

Hi all,

At the moment I’m trying to make a small bridge between libcinder and dlib. Everything is working except image conversion from dlib to Cinder. Let me elaborate a bit:

In dlib, images are either 2D arrays or matrices. The pixel type is templated and can be anything, including unsigned char, unsigned short, float, etc. Following the same methods used in the OpenCV3 block (and having a look at the openFrameworks implementation), I’ve made an ImageSource class as follows:

template<typename pixel_type>
    class ImageSourceDlib : public ci::ImageSource {
    public:
        
        typedef typename dlib::pixel_traits<pixel_type>::basic_pixel_type basic_pixel_type;
        
        ImageSourceDlib(const dlib::array2d<pixel_type>& in) : ci::ImageSource()
        {
            mWidth = dlib::num_columns(in);
            mHeight = dlib::num_rows(in);
            setColorModel(getDlibColorModel<pixel_type>());
            setChannelOrder(getDlibChannelOrder<pixel_type>());
            setDataType(getDlibDataType<pixel_type>());
            mRowBytes = (int32_t)dlib::width_step(in);
            mData = reinterpret_cast<const basic_pixel_type*>(dlib::image_data(in));
        }
        
        ImageSourceDlib(const dlib::matrix<pixel_type>& in) : ci::ImageSource()
        {
            mWidth = dlib::num_columns(in);
            mHeight = dlib::num_rows(in);
            setColorModel(getDlibColorModel<pixel_type>());
            setChannelOrder(getDlibChannelOrder<pixel_type>());
            setDataType(getDlibDataType<pixel_type>());
            mRowBytes = (int32_t)dlib::width_step(in);
            mData = reinterpret_cast<const basic_pixel_type*>(dlib::image_data(in));
        }
        
        const basic_pixel_type*		mData;
        int32_t                     mRowBytes;
        
    };

Here “basic_pixel_type” can be unsigned char, unsigned short or float. As for the load(ImageTargetRef) override, I have this:

void load( ci::ImageTargetRef target ) {
                ci::ImageSource::RowFunc func = setupRowFunc( target );
                
                const uint8_t *data = reinterpret_cast<const uint8_t*>(mData);
                for( int32_t row = 0; row < mHeight; ++row ) {
                    ((*this).*func)( target, row, data );
                    data += mRowBytes;
                }
            }

When dealing with all sorts of dlib images with unsigned char pixel type, everything works fine but when using float or unsigned short I get weird results like this:

Don’t mind pictures 4 and 5. These are because of dlib’s different color models (HSI and LAB). But as you can see in pictures 7 and 8, the results are very weird. 7 is float and 8 is unsigned short.

Can anyone help me out here? Am I doing something wrong in the load() method?
Thanks a lot guys


#2

Also if I change the load() method to this (change the data to const basic_pixel_type*):

void load( ci::ImageTargetRef target ) {
            // get a pointer to the ImageSource function appropriate for handling our data configuration
            ci::ImageSource::RowFunc func = setupRowFunc( target );
            
            const basic_pixel_type *data = mData;
            for( int32_t row = 0; row < mHeight; ++row ) {
                ((*this).*func)( target, row, data );
                data += mRowBytes;
            }
        }

I get this:

The only difference is that the unsigned short version is totally black now, not helping much unfortunately :slight_smile:


#3

I should also mention that the DataType, ColorModel and ChannelOrder are all correct when I debugged the code. unsigned short should be ci::ImageIo::DataType::UINT16, right?


#4

Maybe it has to do with endianness? Try reversing the order of the bytes for each unsigned short or float.
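
One quick way to test the endianness idea is to byte-swap a copy of the 16-bit data before handing it to the row function. The following is only a sketch: `swapBytes16` and `swapBuffer16` are hypothetical helper names, not part of Cinder or dlib, and the buffer is assumed to be tightly packed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Swap the two bytes of a 16-bit value (0x1234 becomes 0x3412).
inline std::uint16_t swapBytes16(std::uint16_t v)
{
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}

// Return a byte-swapped copy of a tightly packed buffer of 16-bit values.
// The source buffer is left untouched.
inline std::vector<std::uint16_t> swapBuffer16(const std::uint16_t* src, std::size_t count)
{
    assert(src != nullptr || count == 0);
    std::vector<std::uint16_t> out(count);
    for (std::size_t i = 0; i < count; ++i)
        out[i] = swapBytes16(src[i]);
    return out;
}
```

If the swapped copy looks no better than the original, endianness is probably not the cause.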


#5

Thanks Paul for the suggestion, it didn’t seem to cut it though.

Meanwhile I tried some variations of the previous code. I suspected the reinterpret_cast to be the culprit, and when I started digging in Cinder’s source, I found ImageFileTinyExr.cpp, which uses another method to load into the ImageTargetRef. I hacked around a little and arrived at this:

void load(ci::ImageTargetRef target) {
           ci::ImageSource::RowFunc rowFunc = setupRowFunc( target );
            
            const size_t numChannels = ci::ImageIo::channelOrderNumChannels(mChannelOrder);

            // load one interleaved row at a time
            if( getDataType() == ci::ImageIo::FLOAT32 ) {
                std::vector<uint8_t> rowData( mWidth * numChannels * sizeof(float), 0 );
                for( int32_t row = 0; row < mHeight; row++ ) {
                    for( int32_t col = 0; col < mWidth * sizeof(float); col++ ) {
                        for (int i=0; i<numChannels; i++){
                            rowData.at(col * numChannels + i) = (uint8_t) mData[row * mWidth + col / sizeof(float)];
                        }
                    }
                    
                    ((*this).*rowFunc)( target, row, rowData.data() );
                }
            }
            else if( getDataType() == ci::ImageIo::FLOAT16 || getDataType() == ci::ImageIo::UINT16 ){
                std::vector<uint8_t> rowData( mWidth * numChannels * sizeof(uint16_t), 0 );
                for( int32_t row = 0; row < mHeight; row++ ) {
                    for( int32_t col = 0; col < mWidth * sizeof(uint16_t); col++ ) {
                        for (int i=0; i<numChannels; i++){
                            rowData.at(col * numChannels + i) = (uint8_t) mData[row * mWidth + col / sizeof(uint16_t)];
                        }
                    }
                    
                    ((*this).*rowFunc)( target, row, rowData.data() );
                }
            }
            else {
                const uint8_t *data = reinterpret_cast<const uint8_t*>(mData);
                for( int32_t row = 0; row < mHeight; ++row ) {
                    ((*this).*rowFunc)( target, row, data );
                    data += mRowBytes;
                }
            }
        }

This gave me the following result:

The unsigned short version looks correct and the float version looks kind of inverted.
I still can’t figure out where the problem is, though. This was just trial and error on my side, so it could be pure luck, and this could be incorrect code that happens to yield correct results. Can you maybe see where the problem is?


#6

Yeah, that’s looking better. All I can think of now is that the channel order is different for the float data. Maybe it’s BGRA, or RGBA, or any other combination? I believe Cinder defaults to ARGB.


#7

I don’t think that’s it. In dlib the rgb, rgba and bgr formats have their own pixel_types and all range from 0-255 (unsigned chars); they’re actually the first three in my test. The last three are all grayscale and have only one channel. They are of types array2d<unsigned char>, array2d<float>, array2d<unsigned short>.
I’m also starting to suspect the conversion of the values. I remember reading in the dlib docs that the min and max of the value range can change depending on the value type. And as far as I understand, Cinder likes uint8_t, right? Also, the first 6 images are all based on unsigned chars, and dlib explicitly mentions the 0-255 range for them here, but the last two follow this rule:

  • min() == the minimum obtainable value of objects of type T
  • max() == the maximum obtainable value of objects of type T

#8

Questions regarding how Cinder works in terms of its ImageSource class:

1- When loading the rows into the ImageTarget, does the data have to be uint8_t only? Or does something like this make sense?

const float* mData;
mRowBytes = sizeof(float) * mWidth * mNumChannels;
for( int32_t row = 0; row < mHeight; ++row ) {
   ((*this).*func)( target, row, mData );
    mData += mRowBytes;
}

2- Does the numeric range differ between DataTypes, or are they all between 0-255? I guess this question also applies to the Surface/Channel pixel iterators. For instance, do the values of a Surface32f iterator span float’s full numerical range, or are they floats between 0-255? I also checked Channel.h and found that getData() returns reinterpret_cast<T*>( reinterpret_cast<unsigned char*>( mData + offset.x * mIncrement ) + offset.y * mRowBytes ). Why cast to unsigned char* and then back to T*? I can’t wrap my head around this easily.


#10

Regarding your second question: the reinterpret_cast results in an unsigned char*, making sure the subsequent addition of offset.y * mRowBytes is expressed in bytes, rather than anything else. The result of that is a pointer to a new address, containing the actual value, which is then cast to the correct type.
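
The same idea can be sketched outside of Cinder. `rowPtr` below is a hypothetical helper, not Cinder's API; it assumes the row stride is given in bytes and is a multiple of `sizeof(T)`. It shows why the offset is applied to a byte pointer before casting back to the pixel type:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Return a typed pointer to the start of row `y` in a buffer whose rows are
// `rowBytes` bytes apart. The offset is added to a byte pointer, so it is
// measured in bytes regardless of what T is.
template <typename T>
const T* rowPtr(const T* base, std::size_t y, std::size_t rowBytes)
{
    assert(rowBytes % sizeof(T) == 0);
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(base);
    return reinterpret_cast<const T*>(bytes + y * rowBytes);
}
```

By contrast, adding `rowBytes` directly to a `const float*` advances by `rowBytes` floats, i.e. `sizeof(float)` times too far, which is a plausible reason why a typed pointer incremented by mRowBytes reads the wrong memory.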

I believe float values are expected to be in the range [0…1], otherwise known as normalised values, but I could be wrong. HDR images (like the EXR format), for example, will have a far greater range of values.

As for your first question: no, I believe that mData can be any kind of data, as long as it is supported by the conversion routine in func. Best advice I can give you is to step through the code with the debugger to see what exactly is going on. Also use image source data that you know intimately, for example only floats with a value of 1.0f. Then you know whether or not the conversion is done correctly.

-Paul


#11

Thanks Paul for the piece of wisdom and for the advice. I worked with pre-defined values for some time and I think I finally got it right :slight_smile:

Here’s the explanation: I started by playing with known values to understand Cinder’s numeric ranges. For uint8_t we already knew that the range was 0-255. For floats, as Paul mentioned in his post, it’s a normalized range of 0.0f - 1.0f. For uint16_t the range was 0-65535. To be precise, for the uint types it is:

uint8_t -> ( std::numeric_limits<uint8_t>::min() , std::numeric_limits<uint8_t>::max() )

uint16_t -> ( std::numeric_limits<uint16_t>::min() , std::numeric_limits<uint16_t>::max() )

As for dlib’s range (see this issue for detailed answer from dlib’s Davis), I realized that for most use cases and also in my case (since I was loading a jpg in a float container) the ranges are 0-255.
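
The range conversion described here is a plain linear remap. As a minimal standalone sketch (the names `remap`, `toNormalizedFloat` and `toUint16` are hypothetical, and the 0-255 source range is an assumption that holds for this use case, not for dlib in general):

```cpp
#include <cassert>
#include <cstdint>

// Linearly map v from [inMin, inMax] to [outMin, outMax].
inline float remap(float v, float inMin, float inMax, float outMin, float outMax)
{
    assert(inMax != inMin);
    return outMin + (outMax - outMin) * ((v - inMin) / (inMax - inMin));
}

// Source value (assumed 0-255 by default) to a normalized float in 0.0f - 1.0f.
inline float toNormalizedFloat(float v, float srcMin = 0.0f, float srcMax = 255.0f)
{
    return remap(v, srcMin, srcMax, 0.0f, 1.0f);
}

// Source value (assumed 0-255 by default) to the full uint16_t range,
// clamped and rounded to the nearest integer.
inline std::uint16_t toUint16(float v, float srcMin = 0.0f, float srcMax = 255.0f)
{
    float scaled = remap(v, srcMin, srcMax, 0.0f, 65535.0f);
    if (scaled < 0.0f) scaled = 0.0f;
    if (scaled > 65535.0f) scaled = 65535.0f;
    return static_cast<std::uint16_t>(scaled + 0.5f);
}
```

For example, `toNormalizedFloat(127.5f)` gives 0.5f and `toUint16(255.0f)` gives 65535.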


#12

In the end I arrived at this code, which works:

void load( ci::ImageTargetRef target ) {
    // get a pointer to the ImageSource function appropriate for handling our data configuration
    ci::ImageSource::RowFunc func = setupRowFunc( target );
    const size_t numChannels = ci::ImageIo::channelOrderNumChannels(mChannelOrder);
    
    // separate the process for UINT8, UINT16 and FLOAT32 DataTypes
    if(getDataType() == ci::ImageIo::DataType::UINT8){
        for( int32_t row = 0; row < mHeight; ++row ) {
            ((*this).*func)( target, row, mData );
            mData += mRowBytes;
        }
    }
    else if (getDataType() == ci::ImageIo::DataType::FLOAT32) {
        std::vector<float> data(mWidth * mHeight * numChannels, 0);
        for(auto& val: data){
            val = ci::lmap<float>(*mData, mSourceValueMin, mSourceValueMax, 0.0f, 1.0f);
            mData++;
        }
                
        for( int32_t row = 0; row < mHeight; ++row ) {
            std::vector<float> rowData(data.begin() + mWidth * numChannels * row, data.begin() + mWidth * numChannels * (row+1));
            ((*this).*func)( target, row, rowData.data() );
        }
   }
   else if (getDataType() == ci::ImageIo::DataType::UINT16) {
            std::vector<uint16_t> data(mWidth * mHeight * numChannels, 0);
            for(auto& val: data){
                val = ci::lmap<uint16_t>(*mData, mSourceValueMin, mSourceValueMax, std::numeric_limits<uint16_t>::min(), std::numeric_limits<uint16_t>::max());
                mData++;
            }
                
            for( int32_t row = 0; row < mHeight; ++row ) {
                std::vector<uint16_t> rowData(data.begin() + mWidth * numChannels * row, data.begin() + mWidth * numChannels * (row+1));
                ((*this).*func)( target, row, rowData.data() );
            }
        }
   }

Here mSourceValueMin and mSourceValueMax are assumed to be 0 and 255 unless changed by the user. Also, I’m not sure that using vectors the way I did is the fastest route, but it works pretty well for my case.

The only piece of conversion left to tackle is the LAB and HSI color models. I’ve seen the CHAN_LAB_L, CHAN_LAB_A and CHAN_LAB_B definitions in Cinder’s ChannelType but still have to dig in to see how they will work for my case.
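
A possible variation, sketched here without Cinder so it stays self-contained: convert one row at a time into a single reusable buffer, step through the source by its byte stride, and leave the source pointer untouched so the conversion can be run more than once. `forEachNormalizedRow` and the `emitRow` callback are hypothetical names; `emitRow` merely stands in for the `((*this).*func)( target, row, ... )` call, and the code above, unlike this sketch, walks the whole image as if rows were tightly packed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Normalize a strided float image from [srcMin, srcMax] to [0, 1], one row at
// a time. `rowBytes` is the distance between row starts in bytes and may
// include padding. `emitRow` receives the row index and the converted row.
inline void forEachNormalizedRow(const float* src, std::size_t width, std::size_t height,
                                 std::size_t channels, std::size_t rowBytes,
                                 float srcMin, float srcMax,
                                 const std::function<void(std::size_t, const float*)>& emitRow)
{
    assert(srcMax != srcMin);
    assert(rowBytes >= width * channels * sizeof(float));

    const std::size_t rowElems = width * channels;
    std::vector<float> rowData(rowElems); // reused for every row

    // Walk the source with a byte pointer so the stride is applied in bytes.
    const unsigned char* rowStart = reinterpret_cast<const unsigned char*>(src);
    for (std::size_t row = 0; row < height; ++row) {
        const float* in = reinterpret_cast<const float*>(rowStart);
        for (std::size_t i = 0; i < rowElems; ++i)
            rowData[i] = (in[i] - srcMin) / (srcMax - srcMin);
        emitRow(row, rowData.data());
        rowStart += rowBytes;
    }
}
```

This keeps the temporary memory at one row instead of a full copy of the image plus a per-row copy. Whether it is measurably faster has not been tested here.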


#13

Thanks a lot Paul for guiding me through this problem. Cheers!