I am using a Texture3D to optimize my application’s drawing speed. Most of my drawing is a single instanced call with rectangles cropped from that texture.
I now would like to optimize updating the content in that texture. New images are loaded on the fly and slotted into their own z-layer of the 3d texture.
My first (and currently working) approach was simply to call Texture3d::update(surface, level) in a separate thread. This avoids blocking the CPU with the initial image load/transfer, which is great, but it can introduce some stuttering due to (I believe) the GPU's implicit synchronization behavior.
This update method is simple and easy to understand:
_texture->update(*surface, index);
I would like to move to a faster approach and am trying (at the recommendation of Ryan Bartley) to use a PBO and a glFenceSync object so I can upload the data without hitting the GPU's implicit synchronization. I have successfully written code that uploads to a PBO and uses a client-side fence wait to signal when that upload is complete, and this causes no hiccups.
This upload happens in code that is essentially as follows:
auto *pixels = static_cast<ColorA8u*>(_pixel_buffer->mapBufferRange(offset, imageByteSize, GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT));
// might need to correct for channel order (BGRA?)
std::memcpy(pixels, surface->getData(), surface->getRowBytes() * surface->getHeight());
_pixel_buffer->unmap();
// create a fence so CPU doesn't proceed on this thread until the mapped buffer writing is complete
auto fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
auto result = glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, 0);
auto ready = [&result] {
    return (result == GL_ALREADY_SIGNALED) || (result == GL_CONDITION_SATISFIED);
};
while (!ready()) {
    if (result == GL_WAIT_FAILED) {
        auto err = gl::getError();
        CI_LOG_E("GL error waiting on fence: " << gl::getErrorString(err));
        break; // stop polling so a failing wait cannot spin this thread forever
    }
    result = glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, 0);
}
glDeleteSync(fence); // release the sync object once we are done with it
That copies the data over to the GPU and seems to be working fine, although I have no way to inspect the result yet. Unfortunately, I am unsure how to make the Texture3d's data correspond to that in the Pbo.
To set the texture data, I tried passing my Pbo to Texture::Format::setIntermediatePbo(), but that only seems to be (optionally) used at initial texture construction time. There is no Texture3d::update(PboRef) method to try like there is for Texture2D. Furthermore, I think it would be ideal if I could just write directly to the texture's pixel data without going through another copy step on the GPU. That would avoid both doubling the amount of data on the GPU and adding more hidden synchronization points.
Is there some way I can just get the buffer of data used by my Texture3d, then map and write directly to it? Do I need a Pbo for that operation? If so, how might I tell the texture to only update from a portion of the Pbo? And (fingers crossed) will that texture update cause the initial synchronization hiccups that this whole thing is meant to avoid?
I have treated Vbos as 1-dimensional textures in the past, which gives me hope that I can do something similar for this data upload. I’m mostly not sure how to specify the buffers at the moment.
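To make the "update from a portion of the Pbo" question concrete, here is a minimal sketch of the bookkeeping as I understand it, not a confirmed Cinder API. In plain OpenGL, while a buffer is bound to GL_PIXEL_UNPACK_BUFFER, the last argument of glTexSubImage3D is read as a byte offset into that buffer rather than as a client pointer, so a single z-layer can be sourced from one slot of a larger Pbo. The names SliceUpload, imageByteSize, and makeSliceUpload are hypothetical helpers invented for this illustration, and the sketch assumes tightly packed RGBA8 images that all share one size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: the arguments one layer upload would hand to glTexSubImage3D.
struct SliceUpload {
    int xOffset, yOffset, zOffset;  // destination origin inside the 3D texture
    int width, height, depth;       // destination extent (depth is 1 for a single layer)
    std::size_t pboByteOffset;      // where the source pixels start inside the Pbo
};

// Bytes occupied by one tightly packed RGBA8 image (assumption: 4 bytes per pixel, no row padding).
inline std::size_t imageByteSize(int width, int height) {
    return static_cast<std::size_t>(width) * static_cast<std::size_t>(height) * 4u;
}

// Describe copying the image staged in Pbo slot `slot` into texture layer `layer`.
inline SliceUpload makeSliceUpload(int width, int height,
                                   int layer, int layerCount,
                                   int slot, int slotCount) {
    assert(width > 0 && height > 0);
    assert(layer >= 0 && layer < layerCount);
    assert(slot >= 0 && slot < slotCount);
    return SliceUpload{0, 0, layer, width, height, 1,
                       static_cast<std::size_t>(slot) * imageByteSize(width, height)};
}

// With a current GL context, the description would be consumed roughly like this
// (raw OpenGL calls; pboId and textureId are placeholder object names):
//
//   glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pboId);
//   glBindTexture(GL_TEXTURE_3D, textureId);
//   glTexSubImage3D(GL_TEXTURE_3D, 0, u.xOffset, u.yOffset, u.zOffset,
//                   u.width, u.height, u.depth, GL_RGBA, GL_UNSIGNED_BYTE,
//                   reinterpret_cast<const void*>(u.pboByteOffset));
//   glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
//   GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
```

If this is right, a fence created after the glTexSubImage3D call would signal once the GPU has finished reading from that slot, which is presumably the point at which the slot can be safely overwritten with the next image.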