Easily Determined, Deterministically Destroyed Textures

Hello,

Firstly, thank you on behalf of myself and the silent majority to everyone who made Cinder and everyone who posts here. Very useful forum (and the last) and an excellent library.

I’m an old-style C/C++ programmer trying to drag myself into the current decade on the shared pointer thing. I’ve been deliberately avoiding it until now, and now it’s causing me some trouble. I’m not sure I like its apparent opaqueness - no, I am sure. I don’t. I prefer managing memory myself, but I will qualify that: it is not a fully educated opinion yet, and I may yet discover why not caring about memory usage appeals.

Anyway, back on point. My application does the following:

1.
Read a whole bunch of textures (encapsulated by the ImageViewer class) with the sole purpose of getting the texture width and height info. I do this for each file, like this:

bool ImageViewer::preLoad()
{
    try
    {
        cinder::ImageSourceRef imageSource = loadImage(loadAsset(mFilename));
        if (imageSource != nullptr)
        {
            mImageWidth = imageSource->getWidth();
            mImageHeight = imageSource->getHeight();
            return true;
        }        
    }
    catch (const Exception &e)
    {
        imageApp::mApplication->log("failed to create texture during pre load for IMAGE = " + mFilename + " (" + e.what() + ")");
        mImageWidth = 0;
        mImageHeight = 0;        
    }   
    return false;
}

I include the context above because of the following question:

a) Image Dimension Information
I can probably pull the image dimension bytes out of the file header totally independently of Cinder, but then I would lose support for other formats (or have to hard-code support for them), so this is not too desirable. Is the above the best way in Cinder to do such a task?

2.
Once the image sizes have been read, the Cinder app displays a 3D view where these images are loaded on demand based on what is visible. The load sends a request to the texture loader thread, which then does this:

...

try
{
    cinder::ImageSourceRef imageSource = loadImage(loadAssetAbsolute(textureLoaderRequest->mFilename), options);
    cinder::gl::Texture2dRef tex = gl::Texture2d::create(imageSource, mTextureFormat);

    // Lock this from the other thread to ensure multi-thread-safe syncing with OpenGL (just in case!)
    {
        std::lock_guard<std::mutex> lk(mOpenGLClientWaitSyncMutex);       // TODO: is this needed?

        // we need to wait on a fence before alerting the primary thread that the Texture is ready
        auto fence = gl::Sync::create();
        fence->clientWaitSync();

        // Identify the frame index in the call
        textureLoaderRequest->mContentViewer->setTexture(tex, requestFrameIndex);
    }
}
catch (const Exception &e)
{
    ...
}
...

I have the following areas of uncertainty:

a) OpenGL Fences
I must admit, I am still chewing over the subject of OpenGL fences (the fence->clientWaitSync() bit of code came from a sample done by Andrew - thanks!). At the moment there is only one texture loader thread, plus the main thread. If I had multiple texture loader threads, would I need to synchronise the fence->clientWaitSync() calls? (Or am I being daft here - would the clientWaitSync in each loader thread provide the necessary guards for multiple threads setting a texture to be drawn on the main thread?) I did some reading, added the above lock_guard and did a bit of testing, and it didn’t seem to affect anything (but that doesn’t usually mean anything, hence my question).

b) Any kind of Texture Unloading
This brings me to my main problem. After running for a bit and zooming in and out with thousands of largish textures on display, my poor, already suffering MacBook gets to its knees and keels over once the process’s Private Bytes grows larger than the pagefile, and the system doesn’t even have the courtesy to give me a blue screen! (Yes, I installed Windows on it because I discovered after I got it that macOS and I don’t get on much.) My pagefile ended up getting expanded to 30GB by Windows without it even telling me!

I am at a complete loss as to how to deterministically delete textures. I know the theory - the shared pointer will just “go away” when it falls out of scope - but I can’t see how this is a good thing (or even possible) if we are trying to manage the “going away” ourselves, without reshaping the code to suit the language feature. All I know is that the textures are not being deleted anywhere, and I think it may be because the ImageViewers are held within a std::vector<> that hangs around (by necessity) until the application ends. I don’t really want to restructure my code so that it aligns with shared pointer semantics - if such a thing is even possible. Ideally I am looking for a simple method that just releases the image memory, so that it is as symmetric as possible with my load function in the texture loader.

c) Smooth Texture Unloading
The end game would be to have another thread that performs the unloads to keep the UI smooth. My question is: if I create the texture on the loader thread, which is initialised as follows:

...
mTextureLoaderRequests = new ConcurrentCircularBuffer<TextureLoaderRequest*>(TEXTURE_LOADER_REQUEST_Q_SIZE);
mTextureLoaderBackgroundCtx = gl::Context::create(gl::context());
mTextureLoaderThread = shared_ptr<thread>(new thread(bind(&imageApp::textureLoaderThreadFn, this, mTextureLoaderBackgroundCtx)));
...

would the unloader thread need access to the context that created the texture, and therefore would I need to set up context sharing between the loader and unloader threads and their contexts? How would you tackle such a problem?

d) Bonus question (not directly Cinder, but resource related).
For applications with potentially large private memory usage that also need to run on low-memory hardware as well as hardware with lots of memory, some sort of scheme to detect maximum memory usage must be devised. What, if any, are the best-practice measures for an application to stop itself from allocating too much and making Windows (or any OS) constantly swap the pagefile until it exhausts itself and kills the system (as I described above)? Should we just be monitoring the Page File Size and Private Bytes allocated and literally stop allocating textures when Private Bytes grows to some arbitrary percentage of the Page File Size, as a last resort to avoid the system dying?

Apologies for the large post with lots of questions - I get carried away sometimes :slight_smile: - appreciate any and all feedback.

Thanks! Laythe

EDIT:

I just found out that I can do:

tex.reset();

to make the shared pointer “empty”. I have yet to benchmark it, but I guess this also results in a glDeleteTextures() call once the last reference is gone.

I should not reply to a post in the middle of the night (it’s 2AM here), so I’ll keep it brief and may revisit your question later.

The reason your memory is being flooded is indeed because you’re storing the ImageSourceRef in a std::vector<>. As long as the vector is holding on to it, the reference count of the shared pointer will be at least 1 and the object pointed to (the ImageSource) is never deleted.

I’d advise storing a weak pointer instead and then returning the shared pointer to the code that is actually using it. As soon as that code discards the pointer, the weak reference will expire and the image will be deleted from memory. The power of a system like this lies in the fact that multiple objects can share the same image; you’ll never load it twice - the cache will make sure of that. But the image will also be deleted automatically when no object is using it any longer.

// Our cache stores weak pointers, because it does not have ownership of the image.
// See my remark about hashing the key below.
std::map<ci::fs::path, std::weak_ptr<ImageSource>> mCache;

ci::ImageSourceRef getImage( const ci::fs::path &src )
{
    // Check if our image is in the cache already and return it if so.
    // Note: you may want to hash "src" first, instead of using the path as key.
    if( mCache.count( src ) ) {
        auto image = mCache.at( src ).lock();
        if( image ) return image;
    }

    // Load the image file.
    auto image = loadImage( loadFile( src ) );

    // Store a weak pointer in our cache.
    mCache.insert( std::make_pair( src, image ) );

    // Return the shared pointer.
    return image;
}

-Paul

P.S.: if you’re not using the ImageSource other than creating a gl::Texture from it, you don’t have to hold on to it. Just keep the gl::Texture and discard the ImageSource. The code above would have to be rewritten so that it holds textures instead of images, of course.
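
For example, a rough sketch of that texture-holding variant could look like this (the names mTextureCache and getTexture are just placeholders):

// A cache of weak pointers to textures; the cache itself never owns them.
std::map<ci::fs::path, std::weak_ptr<ci::gl::Texture2d>> mTextureCache;

ci::gl::Texture2dRef getTexture( const ci::fs::path &src )
{
    // Return the cached texture if someone is still holding on to it.
    auto it = mTextureCache.find( src );
    if( it != mTextureCache.end() ) {
        if( auto tex = it->second.lock() )
            return tex;
    }

    // Otherwise load the image, create the texture and discard the ImageSource right away.
    auto tex = ci::gl::Texture2d::create( loadImage( loadFile( src ) ) );

    // Store (or refresh) the weak pointer in the cache.
    mTextureCache[src] = tex;

    // Return the owning shared pointer to the caller.
    return tex;
}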

Good morning! :slight_smile:

So after a good night’s sleep, I’ve returned with a few more pointers. First of all, here’s Herb Sutter talking about pointers and modern style C++. The gist: use smart pointers (std::unique_ptr, std::shared_ptr) for owning pointers, only use raw pointers for function parameters if they are optional. Smart pointers, especially std::unique_ptr, add very little overhead but make your code much more manageable. Write clear code that has good performance, not optimal performance.

You should regard smart pointers as a proper way to describe what you mean to do with the data. It’s like a contract where you clearly state your intentions. “I want to share this data”, “only I control this data”, etc. By stating your intentions, you become less prone to leaks and bugs, because the compiler will help you avoid them. It also means that you may need to think a bit harder on what it is you’re doing, which is why some people don’t like smart pointers. Until they have an obscure bug in their code that takes days to solve and would have been prevented by using smart pointers.
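
As a toy illustration of stating intent (nothing Cinder-specific, just the three flavours):

#include <memory>

struct Image { /* pixels, dimensions, ... */ };

// "Only I own this" - sole ownership, freed automatically when the unique_ptr goes away.
std::unique_ptr<Image> makeScratchImage() { return std::make_unique<Image>(); }

// "We share ownership" - freed only when the last shared_ptr lets go.
std::shared_ptr<Image> makeSharedImage() { return std::make_shared<Image>(); }

// "I only want to look at it, and it may be absent" - a non-owning, optional raw pointer parameter.
void inspect( const Image *image );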

Oh, and promise me to never use auto_ptr.

On to question 1. You’re right that it’s probably much faster to simply open a binary stream and read the image file header to get the width and height. You’re also right that this requires you to know the file format, or be able to recognize it, which might be a lot of work. I don’t know if Cinder can help you out there, and your code is probably how many of us would obtain the width and height of an image. You may want to look for an image-loading C++ library that can do what you want.

Question 2a. I assume the lock_guard in your code also came from the sample code written by Andrew? I think it was there to synchronize access to a container shared among threads, but it has nothing to do with the fence. The fence itself is simply inserted into the GPU’s long “to-do list” and gets signaled when everything up to the fence has been completed. You could create a bunch of textures and then use a single fence to check if they have all been uploaded. But it’s perfectly fine to have several threads, each uploading a texture to the GPU and inserting its own fence to check if that texture has been uploaded. The GPU driver will handle the complexity. Note that glClientWaitSync will block the calling thread, so preferably don’t use it on your main thread.
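
So each loader thread can do its own thing, without a shared mutex. A sketch, reusing the names from your own snippet:

// Inside each texture loader thread (each thread has its own shared GL context current).
auto imageSource = loadImage( loadAssetAbsolute( textureLoaderRequest->mFilename ), options );
auto tex = gl::Texture2d::create( imageSource, mTextureFormat );

// Insert a fence after the upload commands and block *this* loader thread
// until the GPU has processed everything up to the fence.
auto fence = gl::Sync::create();
fence->clientWaitSync();

// Only now hand the texture over to the main thread for drawing.
textureLoaderRequest->mContentViewer->setTexture( tex, requestFrameIndex );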

2b has been answered already.

2c: first of all, your ConcurrentCircularBuffer is created on the heap, which isn’t necessary and adds complexity. Create it on the stack instead. Secondly, textures created in a shared context can be destroyed from any shared context; it does not have to be the exact context that created the texture. Last but not least: you should not worry about when a texture is deleted from memory. Let the shared pointer worry about that. When you create the gl::TextureRef, memory on the GPU is allocated and the data is then copied from CPU memory to GPU memory, after which you can delete the CPU memory (read: the ImageSource). Then, as long as the reference count of the gl::Texture is not zero, the texture will remain valid and usable. When it reaches zero, its destructor is called and the memory is freed. It’s that simple. If you’re still confused, read a bit more on how smart pointers work.
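
For the buffer, a sketch of the by-value version (keeping your names; "on the stack" here really means "as a plain member, no new/delete"):

#include "cinder/app/App.h"
#include "cinder/ConcurrentCircularBuffer.h"

class imageApp : public ci::app::App {
  public:
    imageApp()
        : mTextureLoaderRequests( TEXTURE_LOADER_REQUEST_Q_SIZE ) // constructed with the app
    {}

  private:
    // Owned by value: destroyed automatically when the app is destroyed.
    ci::ConcurrentCircularBuffer<TextureLoaderRequest *> mTextureLoaderRequests;
};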

2d: in general you should not have to worry about running out of memory, simply because most computers have so much of it. If you do run out of memory, I’d blame your application design; it simply consumes too many resources. As explained above, you don’t need to hold on to the image source data once you’ve created a texture from it.

If you do need to know if there is still memory available, then there are no cross-platform solutions and you’ll have to write platform specific code to query the state of the memory. The only cross-platform solution would be to try and allocate memory and deal with it when that fails. When it comes to GPU memory size and knowing if a texture can still fit inside it: there is no waterproof solution there. The OpenGL driver does not provide functions to measure the size of the VRAM and guessing it by keeping track of the textures you upload is, to put it mildly, inaccurate.
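
If you do go the platform-specific route, a rough Windows-only sketch could look like this (the 90% threshold is completely arbitrary):

#include <windows.h>

// Returns true when physical memory usage exceeds the given percentage (Windows only).
bool isMemoryLow( DWORD maxLoadPercent = 90 )
{
    MEMORYSTATUSEX status = {};
    status.dwLength = sizeof( status );
    if( ! GlobalMemoryStatusEx( &status ) )
        return false; // query failed; assume we're fine

    // dwMemoryLoad is the approximate percentage of physical memory currently in use.
    return status.dwMemoryLoad >= maxLoadPercent;
}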

Happy to help,

-Paul

Hi Paul,

Holy Mowly Batman, thanks for the very considered answers! People like you make this forum awesome. :slight_smile:

With regard to your first post: I am not storing the ImageSourceRefs anywhere outside that method, so as far as I understand, they should get cleaned up.

I do this:

...
cinder::ImageSourceRef imageSource = loadImage(loadAssetAbsolute(textureLoaderRequest->mFilename), options);
cinder::gl::Texture2dRef tex = gl::Texture2d::create(imageSource, mTextureFormat);
...

tex is then assigned and stored in my image container class (ImageViewer), whilst the image container class belongs to a vector (I hate to say it - but of raw pointers :)). I do not ever delete those image container objects; they always hang around by design until exit. What I was after was a way of just removing the texture, which it seems to be doing now.

I now have an unload() method which I can call in a symmetric manner to my load(), so that I can control the quantity of texture data allocated by the application at any given time.

void ImageViewer::do_unload(TextureLoadRequest *textureRequest)
{
    int useCount = mImage.use_count();
    if (useCount != 1)
    {
        imageApp::mApplication->log("WARNING: trying to free an image viewer that has more than 1 reference. go learn more smart pointers! (" + std::to_string(useCount)+")");
    }

    mImage.reset();
    mImage = nullptr;
}

Does all that sound sensible to you? Is this a frowned-upon misuse of shared pointers, given the explicit nature of the destruction?

It seems to work OK - I can see the Windows process Private memory for my application throttle itself and shift some of the work to the HD (by deallocating the image and re-triggering a load when appropriate), and it is in step with my anticipated behaviour based on how much punishment I deem fit to assign to the HD/memory (a bunch of parameters controlling the (de)allocations). The next step will be to get some platform-specific Windows memory stats and then use those to dictate what upper bounds to put on said parameters, so that with the same input specification (i.e. load 1,000,000 images) the application takes advantage of a high-end PC (it can allocate lots and lots, so the user can “fly around” fast without any loading artifacts) but does not grind to a halt on a low-end laptop (the user still gets a smooth UI thread; the images just take longer to appear). The other killer here is the actual drawing of the textures. My idea is to throttle the loading as described above and throttle the drawing by monitoring FPS.
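
Roughly, the throttle I have in mind looks something like this (just a sketch - the budget members and the queueNext*() helpers are placeholders for my pile of parameters):

// Sketch: only issue new load requests while we are under budget and the frame rate holds up.
void imageApp::updateTextureLoading()
{
    const size_t memoryBudgetBytes = mMaxTextureMemoryBytes; // to be derived from platform memory stats
    const float  minAcceptableFps  = 30.0f;

    const bool underMemoryBudget = ( mEstimatedTextureBytes < memoryBudgetBytes );
    const bool drawingKeepsUp    = ( getAverageFps() > minAcceptableFps );

    if( underMemoryBudget && drawingKeepsUp )
        queueNextLoadRequest();     // queue the next visible-but-unloaded viewer
    else
        queueNextUnloadRequest();   // over budget or struggling: unload an off-screen texture
}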

I am a bit confused about the following, however. As you alluded to, I thought that TextureRefs are GPU-only memory: when you create a texture, some code somewhere copies the image source data into GPU memory, leaving the TextureRef without the actual pixel data. This is not what I see in Process Explorer, however. I see the healthy throttling behaviour, but the actual Private Memory usage suggests that the TextureRefs do also store the pixel data (i.e. lots of textures: roughly 400MB usage; hardly any: roughly 20MB). I’d be grateful if you could clarify what is actually going on here. It may be other overhead on my part though - I need to confirm.

With regard to your second post: the lock_guard was me thinking that because one loader thread is syncing with “OpenGL”, then if we have another that wants to do the same, the two loader threads may need to negotiate a schedule between themselves (a mutex). But I realise now that that does not make sense in this context. Thanks, I will remove it.

Thanks again for your time!

P.S. Note to self: auto_ptr shall never be used!

Hi,

to comment on your code first: the use_count() method (or the equivalent unique()) should not be used in a multi-threaded environment, since it can’t be guaranteed that the shared pointer is still unique by the time the method returns - another thread might just have incremented the reference count. Another small comment: you don’t have to do mImage = nullptr; just calling reset() is enough.

With regard to the memory usage of textures: you’re right that a gl::Texture does not store image data in system (CPU) memory. It only talks to the GPU to allocate memory in device (GPU) memory, then uploads the data. It also makes sure to deallocate that memory once the gl::Texture is destroyed. The reason you may see a lot of process memory could be due to debugging, but it is more likely that your GPU uses a bit of system memory as well. This is especially the case for integrated GPUs like Intel’s. Other than that, I can’t think of any reason why you’re seeing so much memory.

As to your loading strategy: if it works for you, then that is all you need. But you still might give my solution a try. I have often used it for systems very similar to yours, where for example I had to display a detailed map of The Netherlands with the ability to zoom in, very similar to Google Maps. The map was made of tiles of 512x512 pixels. Thanks to the cache, I did not have to worry about releasing the memory; I just loaded the images and did something like if( ! mTile->isVisible() ) mTile->unloadTexture();, where unloadTexture simply did a mTexture.reset(). The coarsest level of the map always remained in memory, which was simply a matter of storing those textures in a std::vector at the start of the application. Of course, loading textures was always done through the cache system to make sure they were shared where possible.
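
In your case that pattern would boil down to something like this (a sketch; isVisible() and the getTexture() cache from my earlier post are the assumed pieces):

// Called whenever visibility changes (or simply every frame).
void ImageViewer::updateTexture()
{
    if( isVisible() ) {
        // The cache hands back the shared texture, loading it only if nobody holds it yet.
        if( ! mImage )
            mImage = getTexture( mFilename );
    }
    else {
        // Off screen: drop our reference; the texture is deleted once the last holder lets go.
        mImage.reset();
    }
}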

Happy to help,

-Paul

OK, thanks - I was concerned because I was seeing that log message in the output every so often.

I will probably implement this one way, then re-implement it in different ways to play around with the pros/cons of each, so I will certainly use your cache. Cheers.

I like the look of your application, in particular your border shadowing effect. Can I ask how you achieved this?

Cheers!

I’m not Paul (by a long shot), but since the border appears to be static, I would implement it as a black inner glow in Photoshop and dump it out as a PNG with alpha. Then you could just gl::draw() it above your scene (assuming a fixed window size), or as a 9-slice if resizable.


Would it not appear “stretched” when rendered, or is it just alpha values that the texture would contain? I thought that sort of thing might be done in shaders and generated in real time, so you can do arbitrarily shaped shadowing with dynamics? You would also have the opportunity to choose the performance hit by selecting what to shadow.

Oh, I misunderstood what you meant by “bordering” and was referring specifically to the shadowing at the edges of the screen. Assuming you’re talking about the drop shadows around the buttons and whatnot, there are a few different ways to go about it. If all your “shadow-casting” objects are rectangular, the pre-rendered shadow + 9-slice technique is still a very valid (and performant) option, but for arbitrary shadowing, one such method could be (in pseudo-code):

on draw
    bind a gl::Fbo the same size as the screen and clear it to ColorAf( 0, 0, 0, 0 )
    for each shadow caster
        draw object as solid black
    end
    unbind the gl::Fbo

    run multiple blur passes using Fbo ping pong, starting with the source Fbo
    (can usually afford to do this at half res)

    draw the final buffer's color texture, offset by some vec2 for shadow distance

    render the scene as normal
end

Obviously it’s a lot slower because you’re rendering all your geometry twice, not to mention that the multiple blur passes require multiple texture samples each, but the result is a proper dynamic drop shadow (which is not without its problems, like everything in graphics programming ;))
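
In actual Cinder terms, the skeleton might look roughly like this (just a sketch - mShadowFbo, blurShadowFbo(), drawShadowCasters() and drawScene() are placeholders you’d supply, and the blur passes themselves are left out):

void ShadowApp::draw()
{
    // 1. Render every shadow caster as solid black into an offscreen FBO.
    {
        ci::gl::ScopedFramebuffer fbo( mShadowFbo );
        ci::gl::ScopedViewport viewport( ci::ivec2( 0 ), mShadowFbo->getSize() );
        ci::gl::setMatricesWindow( mShadowFbo->getSize() );
        ci::gl::clear( ci::ColorA( 0, 0, 0, 0 ) );
        ci::gl::ScopedColor black( ci::Color::black() );
        drawShadowCasters();
    }

    // 2. Blur the FBO's contents (a few ping-pong passes between two FBOs with a blur shader).
    blurShadowFbo();

    // 3. Draw the blurred shadow texture, offset a little, then the real scene on top.
    ci::gl::clear();
    ci::gl::setMatricesWindow( getWindowSize() );
    {
        ci::gl::ScopedBlendAlpha blend;
        ci::gl::ScopedModelMatrix model;
        ci::gl::translate( 8.0f, 8.0f ); // shadow offset
        ci::gl::draw( mShadowFbo->getColorTexture() );
    }
    drawScene();
}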

*edit. Upon review, that pseudocode was kind of gibberish. Here’s a quick example of the technique, but by no means an optimal or even correct way of going about it.

Hi,

I believe the shadow around the borders was done with a simple vignetting trick in the fragment shader:

vec2 uv = vertTexCoord0.xy;
float vignette = 0.5 + 0.5 * 16.0 * uv.x * uv.y * ( 1.0 - uv.x ) * ( 1.0 - uv.y );
fragColor.rgb *= vignette;

See also:
https://www.shadertoy.com/view/4dfGzn

-Paul