How to speed up vbo's mapReplace()?

Hey all,

I’m working on an app where every few frames I have to upload point cloud data from the CPU to the GPU. I used the instanced teapots sample as a starting point and worked my way from there. I get the visual I want and can update the point cloud, but the issue is that mapReplace() is quite slow. I’m using the following code to do the mapReplace:

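void PointCloud::updateInstance()
{
    // Filler instance written to slots beyond the current point count.
    Ply::Polygon dummy{ quat(), ci::ColorAf::zero(), vec3(std::numeric_limits<float>::max()), vec3(0) };

    // Time only the map call itself, not the copy loop below.
    auto start = std::chrono::steady_clock::now();
    Ply::Polygon *data = static_cast<Ply::Polygon*>(m_vbo->mapReplace());
    auto end = std::chrono::steady_clock::now();
    std::cout << "replace took: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms\n";

    // Mapping can fail; don't write through a null pointer.
    if (!data)
        return;

    // Copy the current points, then pad the rest of the buffer with the filler.
    for (size_t i=0; i<POINTCLOUD_MAX; i++)
    {
        if (i < m_target->size())
            *data++ = (*m_target)[i];
        else
            *data++ = dummy;
    }
    m_vbo->unmap();
}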

Timing mapReplace like I do above gives values all over the place: sometimes it takes 0ms, often it takes 50ms. I don’t really care whether the mapReplace() completes within a frame, but I don’t want it to make my app lag like it does now. This made me wonder if there’s a way to ping-pong a VBO, or to do the upload in a separate thread? Here is where my knowledge ends though, and I’m not exactly sure where to start. Any tips, hints and especially code are much appreciated…!
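One caveat when reading those numbers: duration_cast to std::chrono::milliseconds truncates, so anything under 1ms prints as 0ms. A minimal sketch of a finer-grained timer follows; timeMicros is a hypothetical helper name, not part of Cinder. It only measures CPU-side wall-clock time around a call, so it shows how long the call blocked, not when the GPU finished with the buffer.

```cpp
#include <chrono>
#include <cstdint>
#include <utility>

// Hypothetical helper (not a Cinder API): runs `fn` once and returns the
// elapsed wall-clock time in microseconds, measured on a monotonic clock.
template <typename Fn>
std::int64_t timeMicros(Fn&& fn)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

For example, wrapping the m_vbo->mapReplace() call in a lambda passed to timeMicros would time the same call as above at microsecond resolution.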

For reference, I’m using an NVIDIA 3080 and a Ryzen 7 5800X. Ideally I’d like to upload data for around 300k particles, but that’s definitely an ideal and can be much lower if impossible. Each particle has a quat for rotation, a color, a vec3 for position and a vec3 for scale.
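To put a rough number on that, here is a back-of-the-envelope size calculation. InstanceData is a hypothetical stand-in for Ply::Polygon, assuming every field is stored as tightly packed 32-bit floats (4 for the quat, 4 for the color, 3 each for position and scale); the real struct may well differ.

```cpp
#include <cstddef>

// Hypothetical stand-in for Ply::Polygon; the real layout is an assumption.
struct InstanceData {
    float rotation[4]; // quat
    float color[4];    // RGBA color
    float position[3]; // vec3
    float scale[3];    // vec3
};

// 14 floats * 4 bytes = 56 bytes per particle.
static_assert(sizeof(InstanceData) == 56, "expected tightly packed floats");

constexpr std::size_t kParticleCount = 300000; // the 300k target
constexpr std::size_t kBytesPerUpload = kParticleCount * sizeof(InstanceData);

// 300,000 * 56 = 16,800,000 bytes, i.e. about 16.8 MB per full upload.
static_assert(kBytesPerUpload == 16800000, "size arithmetic");
```

Under those assumptions each full replace moves about 16.8 MB, which is a useful figure to keep in mind when deciding how often to upload.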

Many thanks!
Willem

The variance in timing certainly hints at unpredictable contention on the buffer, so multibuffering should solve that (assuming that is in fact what’s bottlenecking you). Keep in mind that, depending on your case, 2 buffers may not be enough to prevent this, so some kind of ring of offset read/write buffers might be the better choice.
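A minimal sketch of the ring idea, with the GL side left out: BufferRing is a hypothetical helper (not part of Cinder) that only does the slot bookkeeping. It assumes you keep N separate VBOs, plus whatever batch or VAO setup each one needs, and index them by the returned slot, writing to one while drawing from the most recently completed one.

```cpp
#include <cstddef>

// Hypothetical helper: tracks which of N buffers to write next and which
// one was filled most recently. It owns no GL objects.
template <std::size_t N>
class BufferRing {
    static_assert(N >= 2, "a ring needs at least two buffers");
public:
    // Slot to map and fill on the CPU for the next upload.
    std::size_t writeSlot() const { return m_write; }

    // False until the first upload has been completed.
    bool hasDrawSlot() const { return m_filled; }

    // Most recently completed slot; only meaningful if hasDrawSlot().
    std::size_t drawSlot() const { return (m_write + N - 1) % N; }

    // Call after unmapping the write slot: it becomes the draw slot and
    // the oldest buffer in the ring becomes the next write target.
    void finishWrite()
    {
        m_write = (m_write + 1) % N;
        m_filled = true;
    }

private:
    std::size_t m_write = 0;
    bool m_filled = false;
};
```

With the posted updateInstance(), that might mean calling mapReplace() on the VBO at writeSlot(), calling finishWrite() after unmap(), and drawing from the VBO at drawSlot(). Whether 2 or 3 buffers is enough depends on how many frames the driver keeps in flight, so treat N as something to experiment with.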

This article has good info about various buffer updating strategies, but NB I believe cinder’s mapReplace uses orphaning by definition so don’t chase that one too hard.

A

Thanks for the quick reply Lithium! I read through the link you posted. It makes sense, I’m just not sure how to apply the information without any concrete code examples…!

As a test I disabled my call to draw the batch, by which I simply mean I commented out m_batch->drawInstanced(POINTCLOUD_MAX);. The duration of the mapReplace call now drops to a consistent 16ms at 60fps, and 0ms at 30fps (though as you can see in the image, the FPS still dips every time I do the replace). I imagine this has to do with implicit synchronisation as described under “The problem” in the wiki article you linked?

30FPS trace: [image: 30fps]

60FPS trace: [image: 60fps]

EDIT: the rabbit hole goes a little deeper: I’m seeing these spikes even when drawing absolutely nothing in my app…!