Hi,
I’m doing a texture loader application and have hit a bit of a brick wall. This is possibly linked to my other question:
“Actual GPU memory usage vs glDeleteXXXX?”
But I’m using a PBO now, so I think it warrants a question of its own.
After further investigation (i.e. giving up :)), I went back to a single-threaded loader where I could load and unload images according to the camera frustum. I could observe the allocated video memory increase and decrease accordingly. For example, an image of 959x1440 took about 13MB of GPU memory: usage started at 30MB with nothing in view (presumably some system memory), rose to 43MB when the image was loaded and in view, and dropped back to around 30MB when it was unloaded and out of view again. This memory behavior was like clockwork.
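As a rough sanity check on the 13MB figure, here is a minimal sketch that estimates the raw pixel storage for one texture. It assumes an uncompressed 4-bytes-per-pixel format (e.g. RGBA8) and, for the second function, a full mipmap chain; the function names are hypothetical helpers, not Cinder or OpenGL API. It cannot account for driver-side padding, alignment, or bookkeeping, so it is only a lower bound.

```cpp
#include <cstddef>

// Bytes for the base level only, assuming tightly packed pixels.
constexpr std::size_t baseLevelBytes(std::size_t width, std::size_t height,
                                     std::size_t bytesPerPixel)
{
    return width * height * bytesPerPixel;
}

// Bytes for the base level plus every mip level down to 1x1.
// Each level halves both dimensions (rounding down, never below 1).
constexpr std::size_t mipChainBytes(std::size_t width, std::size_t height,
                                    std::size_t bytesPerPixel)
{
    std::size_t total = 0;
    while (true) {
        total += width * height * bytesPerPixel;
        if (width <= 1 && height <= 1)
            break;
        width  = width  > 1 ? width  / 2 : 1;
        height = height > 1 ? height / 2 : 1;
    }
    return total;
}
```

For 959x1440 at 4 bytes per pixel this gives 5,523,840 bytes for the base level and 7,361,272 bytes with a full mip chain. Both are well below the roughly 13MB observed, so the measured number likely includes memory that this estimate cannot see.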
This seemed fine and gave me a good reference, but being single-threaded it suffered, as expected, from unacceptable stuttering. So I tried various methods of moving the texture loading onto a second thread. All my methods were based on examples that I found (mainly from paul.houx - thanks paul) and on my reading about GL contexts etc.
All my attempts work from a visual point of view, but I cannot satisfy myself that they work from a memory usage point of view. I’m not sure whether I am right about being wrong, or I just have a wrong understanding. Either way – there are way too many wrongs in there!
From what I gather, the best way to do this is to use a PBO in a background thread, so I implemented that as per the pseudocode below:
    void setup()
    {
        mTextureLoaderRequests = new ConcurrentCircularBuffer<TextureLoadRequest*>(TEXTURE_LOADER_REQUEST_Q_SIZE);
        mTextureLoaderBackgroundCtx = gl::Context::create(gl::context());
        mPBO = gl::Pbo::create(GL_PIXEL_UNPACK_BUFFER, 5000 * 5000 * 4, nullptr, GL_STATIC_DRAW);
        mTextureLoaderThread = shared_ptr<thread>(new thread(bind(&imageApp::textureLoaderThreadFn, this, mTextureLoaderBackgroundCtx)));
    }

    void textureLoaderThreadFn(gl::ContextRef context)
    {
        ci::ThreadSetup threadSetup;
        context->makeCurrent();
        while (1)
        {
            TextureLoadRequest *textureLoaderRequest = nullptr;
            while (mTextureLoaderRequests->tryPopBack(&textureLoaderRequest))
            {
                // Load texture in CPU memory.
                auto surface = loadImage(loadFile(textureLoaderRequest->mFilename));
                // Upload to GPU using the Pbo.
                auto fmt = gl::Texture2d::Format().intermediatePbo(mPBO);
                auto texture = gl::Texture2d::create(surface, fmt); // TODO check size of PBO > size of texture
                // Create a fence so we know when the upload has finished.
                auto fence = gl::Sync::create();
                // read this may be needed - TODO: is this needed?
                glFlush();
                // Now check the fence.
                while (1)
                {
                    auto status = fence->clientWaitSync(GL_SYNC_FLUSH_COMMANDS_BIT, 0L);
                    if (status == GL_CONDITION_SATISFIED || status == GL_ALREADY_SIGNALED)
                        break;
                }
                // assign the texture so main thread can access it
                {
                    std::lock_guard<std::mutex> lk(mImageMutex);
                    mImage = texture;
                }
            }
        }
    }

    void draw()
    {
        std::lock_guard<std::mutex> lk(mImageMutex);
        gl::draw(mImage...)
    }
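On the fence wait in the loader thread: a minimal sketch of how the polling loop could distinguish "signaled", "still pending" and "failed", so that a GL_WAIT_FAILED result cannot spin forever. The status values mirror the OpenGL headers, but the `k*` constants, `FenceState`, `classifyWaitStatus` and `waitForUpload` are hypothetical names introduced only for illustration, and the wait call is passed in as a callable so the sketch compiles without a GL context. Per the OpenGL specification, passing GL_SYNC_FLUSH_COMMANDS_BIT makes the client wait perform the equivalent of a flush when the fence is not yet signaled, so the separate glFlush() is probably redundant in this particular loop.

```cpp
#include <cstdint>

// Status codes returned by glClientWaitSync. Values match the GL headers;
// they are redefined here (hypothetical names) so the sketch is standalone.
constexpr std::uint32_t kAlreadySignaled    = 0x911A; // GL_ALREADY_SIGNALED
constexpr std::uint32_t kTimeoutExpired     = 0x911B; // GL_TIMEOUT_EXPIRED
constexpr std::uint32_t kConditionSatisfied = 0x911C; // GL_CONDITION_SATISFIED
constexpr std::uint32_t kWaitFailed         = 0x911D; // GL_WAIT_FAILED

enum class FenceState { Signaled, Pending, Failed };

// Map a raw wait status onto the three outcomes the loader cares about.
constexpr FenceState classifyWaitStatus(std::uint32_t status)
{
    switch (status) {
        case kAlreadySignaled:
        case kConditionSatisfied: return FenceState::Signaled;
        case kTimeoutExpired:     return FenceState::Pending;
        default:                  return FenceState::Failed; // includes kWaitFailed
    }
}

// Call `wait` until the fence is signaled, the wait fails, or the attempt
// budget runs out (in which case Pending is returned).
template <typename WaitFn>
FenceState waitForUpload(WaitFn&& wait, int maxAttempts)
{
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        const FenceState state = classifyWaitStatus(wait());
        if (state != FenceState::Pending)
            return state;
    }
    return FenceState::Pending;
}

// Possible usage in the loader thread (assumes the `fence` from the pseudocode):
//   auto state = waitForUpload([&] {
//       return fence->clientWaitSync(GL_SYNC_FLUSH_COMMANDS_BIT, 1000000); // 1 ms in ns
//   }, 5000);
```

Using a non-zero timeout lets the driver block briefly instead of the thread burning a core while it polls with a timeout of 0.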
Visually, when I run this, the application feels super smooth and things load very fast. However, the memory usage does not behave as it does in the single-threaded version.
Multithreaded PBO vs Single Thread Memory Usage
If I initially run the application with 0 images loaded, then bring 1 image into view, record the memory usage, bring it out again and repeat, I get the following memory usage (for both the single-threaded and the multi-threaded PBO versions):
Multithreaded PBO OpenGL Contexts
For the multithreaded PBO case, in CodeXL (visual studio tool to debug GPU), I can also see the following structures for the GL contexts:
So starting with nothing in view:
0 allocations – 0 images loaded
Then bringing 1 image into view:
1 allocation – 1 image loaded
If I then move the image out of view again, so that the image is unloaded, the GL context tree looks as per 0 allocations – 0 images loaded.
I’ve set breakpoints on every texture creation in my application and none of them is hit for it, so Texture 1 (512x512, as seen in 0 allocations – 0 images loaded) is being created by something else, and I have no idea what.
Singlethreaded PBO OpenGL Contexts
For the single threaded case, in CodeXL (visual studio tool to debug GPU), I can also see the following structures for the GL contexts:
So starting with nothing in view:
0 allocations – 0 images loaded
Much cleaner and as expected, no textures are present.
1 allocation – 1 image loaded
As expected, 1 texture is present.
Conclusion
I can see in CodeXL that the PBO is 97MB in size, so I would expect a larger memory usage, but I cannot make sense of what I am seeing. It seems that the graphics driver is freeing GPU memory, but not in the amounts expected, and is allocating more than expected. Does my usage of the PBO and multi-threaded texture loading look ok? Am I correct to use the single-threaded version as a reference for memory usage?
I’ve also tried not using a PBO, but the memory misbehavior seems identical.
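For the PBO size and the "check size of PBO > size of texture" TODO in the loader, a minimal sketch of the arithmetic. It assumes tightly packed 4-bytes-per-pixel data (row alignment adds nothing at 4 bytes per pixel with the default unpack alignment); `stagingBytes`, `fitsInPbo` and `kPboBytes` are hypothetical names, not Cinder API.

```cpp
#include <cstddef>

// Bytes needed to stage one tightly packed image in a pixel unpack buffer.
constexpr std::size_t stagingBytes(std::size_t width, std::size_t height,
                                   std::size_t bytesPerPixel)
{
    return width * height * bytesPerPixel;
}

// True when the image can be copied into a PBO of `pboBytes` in one go.
constexpr bool fitsInPbo(std::size_t width, std::size_t height,
                         std::size_t bytesPerPixel, std::size_t pboBytes)
{
    return stagingBytes(width, height, bytesPerPixel) <= pboBytes;
}

// Capacity of the PBO allocated in setup(): 5000 * 5000 * 4 bytes.
constexpr std::size_t kPboBytes = stagingBytes(5000, 5000, 4);
```

The PBO as allocated is 100,000,000 bytes (about 95 MiB), which is in the same range as the 97MB that CodeXL reports, and it is allocated once regardless of how many images are loaded. The 959x1440 test image needs 5,523,840 bytes of it, so it fits comfortably; an image larger than 5000x5000 would not.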
Appreciate any help or pointers,
thanks! - Laythe