Optimised CPU read back of GPU data

Hi,

I’m a bit out of my depth here (best way to be me thinks), but I am poking around looking for an optimization that could reduce GPU to CPU data transfer for my application.

I have an application that performs some modifications to vertex data in the GPU. Occasionally the CPU has to read back parts of the modified vertex data and then compute some parameters which then get passed back into the GPU shader via uniforms, forming a loop.

It takes too long to transfer all the vertex data back to the CPU and then sift through it on the CPU (millions of points), so I have a “hack” in place that reduces the workload to something usable, although not optimal.

What I do:

  1. CPU: read image
  2. CPU: generate 1 vertex per pixel, with Z based on colour information/filtering etc. (see the sketch after this list)
  3. CPU: transfer all vertex data to GPU
  4. GPU: transform feedback used to update GL_POINT vertex coords in real time, based on some uniform parameters set from the CPU.
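
(Roughly, steps 1-3 amount to something like the following - a simplified sketch assuming a Cinder Surface32f and a plain luminance-to-Z mapping; makeVertices and zScale are just illustrative names, the real code also filters on the colour data.)

std::vector<vec3> makeVertices( const Surface32f &image, float zScale )
{
    std::vector<vec3> vertices;
    vertices.reserve( image.getWidth() * image.getHeight() );
    for( int y = 0; y < image.getHeight(); ++y ) {
        for( int x = 0; x < image.getWidth(); ++x ) {
            ColorA c = image.getPixel( ivec2( x, y ) );
            float z = ( ( c.r + c.g + c.b ) / 3.0f ) * zScale;  // step 2: Z from colour
            vertices.emplace_back( (float)x, (float)y, z );     // 1 vertex per pixel
        }
    }
    return vertices;    // uploaded into a VBO afterwards (step 3)
}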

When I wish to read only a rectangular “section”, I use mapBufferRange to map all of the rows that the desired rect touches (bad diagram alert):

This is supposed to represent the image/set of vertices in the GPU. My “hack” involves having to read all the blue and red vertices, because I can only specify one contiguous range of data to read back.
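
In code the hack looks roughly like this (simplified; Particle, imageWidth and the rect* variables stand in for my actual names):

size_t rowStride   = imageWidth * sizeof( Particle );      // one full row of vertices
size_t firstOffset = rectY * rowStride;                    // start of the first touched row
size_t spanLength  = rectHeight * rowStride;               // whole rows: red AND blue
Particle *p = (Particle *)mParticleBuffer[0]->mapBufferRange( firstOffset, spanLength, GL_MAP_READ_BIT );
if( p ) {
    // walk the span, ignoring anything outside [rectX, rectX + rectWidth) within each row
    mParticleBuffer[0]->unmap();
}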

Does anyone know a clever way to efficiently get at the red, without the blue? (without having to issue a series of mapBufferRange calls)

Cheers!
Laythe

If you’re storing your vertex data in textures, you could have a look at PBOs.

Hi lithium.snepo. Please correct me as necessary, I’m talking at the edge of my knowledge here :slight_smile: I generate vertex x,y’s from the image x,y, compute a Z based on the colour on the CPU, then give it all to a vertex shader to modify. I’m a complete newbie at this really - due to a lack of graphics hardware to test on, I couldn’t progress my previous PBO investigations, so I backed them off (for now).

I suppose I could do what I currently do to generate the initial x,y,z points, but output them as a texture with the alpha channel unused, then pass that into the shader for use - but how would PBOs help in this scenario, in terms of reading said data back?
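
i.e. something along these lines, if I understand the idea correctly (a rough sketch; computeZ stands in for my existing colour-to-Z code):

Surface32f positions( image.getWidth(), image.getHeight(), true );      // RGBA float surface
for( int y = 0; y < image.getHeight(); ++y ) {
    for( int x = 0; x < image.getWidth(); ++x ) {
        float z = computeZ( image.getPixel( ivec2( x, y ) ) );          // colour -> Z, as before
        positions.setPixel( ivec2( x, y ), ColorA( (float)x, (float)y, z, 1.0f ) );  // alpha unused
    }
}
auto posTex = gl::Texture2d::create( positions, gl::Texture2d::Format().internalFormat( GL_RGBA32F ) );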

EDIT: it just occurred to me: did you mean that the vertex shader writes to a texture (or PBO?) and the CPU then configures read-back of said texture (via a PBO)? In that case this could still be too costly, as after the read-back the CPU would have to prepare another texture to feed back into the shader. Also, how could I get at the red instead of both the blue and red in the PBO, in order to optimise it?

I guess something like glMapRangeForARectSection(…) would be useful, if such a thing (or an equivalent) exists. Then again, this kind of optimisation may not be visible to the client app?
Cheers

Yes, you could store all your vertex information in 2 PBOs (then your mesh just needs to be a bunch of UV coordinates to sample the positional data from), which you can then ping pong. Have a look at the “Asynchronous read-back” example here
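
Roughly the idea behind it (not the example’s actual code - plain GL, with illustrative names like width/height): start this frame’s glReadPixels into one PBO and map the other one, which was filled last frame, so the CPU never stalls waiting for the transfer.

// once, at setup
GLuint pbo[2];
glGenBuffers( 2, pbo );
for( int i = 0; i < 2; ++i ) {
    glBindBuffer( GL_PIXEL_PACK_BUFFER, pbo[i] );
    glBufferData( GL_PIXEL_PACK_BUFFER, width * height * 4 * sizeof( float ), nullptr, GL_STREAM_READ );
}
int writeIdx = 0;

// each frame
int readIdx = 1 - writeIdx;
glBindBuffer( GL_PIXEL_PACK_BUFFER, pbo[writeIdx] );
glReadPixels( 0, 0, width, height, GL_RGBA, GL_FLOAT, nullptr );    // asynchronous copy into the PBO
glBindBuffer( GL_PIXEL_PACK_BUFFER, pbo[readIdx] );
if( auto *data = (float *)glMapBuffer( GL_PIXEL_PACK_BUFFER, GL_READ_ONLY ) ) {
    // consume last frame's pixels here
    glUnmapBuffer( GL_PIXEL_PACK_BUFFER );
}
glBindBuffer( GL_PIXEL_PACK_BUFFER, 0 );
writeIdx = readIdx;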

*edit
Without knowing your specific use case, you could always render into an Fbo and then call readPixels8u, which takes an Area to read from
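
Something like this, assuming the values you need end up in mFbo’s colour attachment (names are hypothetical):

Area brushArea( brushX - brushRadius, brushY - brushRadius,
                brushX + brushRadius, brushY + brushRadius );
Surface8u pixels = mFbo->readPixels8u( brushArea );     // only the rect under the brush is read back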

@lithium Thanks for the link and food for thought. The use case is that I render the image into a 3D world as GL_POINTS, coloured and offset in Z by an amount based on the colour info (and sized etc. according to distance). The user can then modify the vertex Z data with a mouse-cursor brush. The logic behind some of the brush application code needs to know the Z’s of the area under the mouse (the brush circle), e.g. min/max/average etc., so that the CPU can control the shader’s modification of the data by setting a series of uniforms that feed into the shader. So, for example, the user can say: I want all points under the cursor set to the average value. This could all probably be done entirely on the GPU, but the idea is that once I get the CPU-GPU “loop” working I can then expand the min/max/avg stuff to do interesting things on the CPU that would (probably) be cumbersome to do entirely on the GPU.

Hi,

Have you considered using compute shaders?

Hi balachandran_c,

I haven’t learnt about those yet, but from what I have gathered it is a different way of doing the same thing - I could be wrong, of course. I guess the logic would be similar - something like:

  1. Setup initial data in CPU. (x,y,z’s)
  2. GPU performs calcs on huge 1D array (vbo) based on initial position and some other parameters from CPU.
  3. CPU reads back either all or part of it in order to work out how to manipulate variables in the GPU.

Would my problem not still remain, because the 2D region representing the brush (the red) is actually multiple ranges within a single 1D buffer? What I’m really after is a way to get at multiple ranges within a buffer without issuing multiple requests from the CPU (one per row of the red).

Cheers!

Hi @laythe,

Please take a look at the compute shader sample under cinder/samples/_opengl/NVidiaComputeParticles. While it is possible that in your particular case it is absolutely necessary to move data to the CPU, you may not actually have to do it. The memory access model for compute shaders is far more flexible than that of fragment shaders, so you should be able to do the kind of ops you have mentioned entirely on the GPU. Even N-body interactions work okay for a few thousand bodies without any optimizations.
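
To give a flavour (a rough sketch, not the sample’s code): a compute shader can read and write your particle buffer directly as an SSBO, so a brush op like the LEVEL tool needs no CPU round trip at all. The struct layout, binding point and numParticles below are illustrative and would need to match your actual interleaved format.

auto levelCompute = gl::GlslProg::create( gl::GlslProg::Format().compute( CI_GLSL( 430,
    layout( local_size_x = 128 ) in;
    struct Particle { vec4 pos; vec4 color; vec4 homePos; vec4 homeColor; };
    layout( std430, binding = 0 ) buffer Particles { Particle particles[]; };
    uniform vec3  uMousePos;
    uniform float uBrushRadius;
    uniform float uLevelValue;
    void main() {
        uint i = gl_GlobalInvocationID.x;
        if( i >= particles.length() ) return;
        vec3 p = particles[i].pos.xyz;
        if( length( uMousePos.xy - p.xy ) < uBrushRadius )
            p.z = -uLevelValue;
        particles[i].pos.xyz = p;
    }
) ) );

// when the tool is applied:
gl::ScopedGlslProg scopedCompute( levelCompute );
levelCompute->uniform( "uMousePos", getPickedPoint() );
levelCompute->uniform( "uBrushRadius", getCurrentBrushRadius() );
levelCompute->uniform( "uLevelValue", mLevelValue );
glBindBufferBase( GL_SHADER_STORAGE_BUFFER, 0, mParticleBuffer[0]->getId() );
glDispatchCompute( ( numParticles + 127 ) / 128, 1, 1 );
glMemoryBarrier( GL_SHADER_STORAGE_BARRIER_BIT | GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT );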

Do keep us updated on your progress, it sounds interesting.

Bala.

Hi Bala,

Would that work on a macbook with win10 ? :slight_smile:

EDIT: crikey it does! be right back…

EDIT EDIT: Ps. can we change the name of that sample for noobs like me please? :slight_smile:
Cheers!

Hey Laythe,

Can you elaborate on what the brush operations are (the “min/max/avg stuff”, points 2 and 3 of your previous post, …)? It sounds to me like you should leave the vertex data alone and only manipulate textures/pixels. This would probably make it easier to port the logic to GLSL, and in the worst case, if you still need to read back to the CPU, you should be able to query a small area of pixels instead of having to map one of your buffers.

I might be missing a few details, but I would create a static gl::Batch (probably with a geom::Plane().subdivisions( pixelsSize ) as a source). Its vertex shader would take as input the different maps resulting from the painting operations to move the z component of your vertices:

in vec4 ciPosition;

uniform mat4 ciModelViewProjection;
uniform sampler2D uPaintMap;
uniform vec2 uPaintMapSize;

void main()
{
   vec3 position = ciPosition.xyz;
   vec4 paintSample = texture( uPaintMap, ciPosition.xy / uPaintMapSize );
   position.z += paintSample.x; // there are 3 other components you could
   // use to store other data. If you can make the logic that renders
   // your modified mesh fit in here, it will make your life easier
   // and your gpu happier. The idea is to have "ciPosition" be what
   // you call "initial data" and "paintSample" be whatever modification
   // the user has made to that vertex.
   gl_Position = ciModelViewProjection * vec4( position, 1.0f );
}

You could implement the brush operations on uPaintMap with a ping-ponging gl::Fbo:

ci::gl::FboRef mPaintMap[2];
size_t mPaintMapRead = 0;
size_t mPaintMapWrite = 1;

gl::ScopedMatrices scopedMatrices;
gl::ScopedViewport scopedViewport( mPaintMap[mPaintMapWrite]->getSize() );
// if your brush operations are only additive you can get rid of the ping-pong fbo
// and turn the following into a gl::ScopedBlendAdditive
gl::ScopedBlend scopedBlend( false );
gl::ScopedDepth scopedDepth( false );
gl::ScopedFrameBuffer scopedFbo( mPaintMap[mPaintMapWrite] );
gl::ScopedTextureBind scopedTexBind0( mPaintMap[mPaintMapRead]->getColorTexture(), 0 );
gl::ScopedGlslProg scopedGlslProg( mBrushGlsl );
gl::setMatricesWindow( mPaintMap[mPaintMapWrite]->getSize() );

mBrushGlsl->uniform( "uBrushPosition", mBrushPos );
mBrushGlsl->uniform( "uBrushRadius", mBrushRadius );
mBrushGlsl->uniform( "uPaintMapSize", vec2( mPaintMap[mPaintMapWrite]->getSize() ) );
gl::drawSolidRect( mPaintMap[mPaintMapWrite]->getBounds() );

std::swap( mPaintMapRead , mPaintMapWrite  );

From there it should be a matter of fitting the painting logic in mBrushGlsl fragment shader (again if I’m not missing anything):

uniform vec2 uBrushPosition;
uniform float uBrushRadius;
uniform sampler2D uPaintMap;
uniform vec2 uPaintMapSize;

out vec4 oColor;

void main()
{
   vec4 paintSample = texture( uPaintMap, gl_FragCoord.xy / uPaintMapSize );
   vec4 result = paintSample;
   
   if( length( uBrushPosition - gl_FragCoord.xy ) < uBrushRadius ) {
   // float brushAlpha = texture( uBrushMap, gl_FragCoord.xy / uPaintMapSize ).a;
   // if( brushAlpha > 0.0f ) {

       // sample neighbor texels, the current texel, or do any
       // other fancy operation you need. This fragment will only
       // be executed on user interaction, so you should probably
       // be able to do relatively expensive things here.

   }
   
   oColor = result;
}

I hope this helps!


Hi Simon,

I will give what you said a go, cheers!

I tried to answer in as much detail as possible below, so I added a 4. Please excuse the bit of a code dump and the general state of the code - learning graphics, not programming, an’ all :slight_smile:

2. GPU performs calcs on huge 1D array (vbo) based on initial position and some other parameters from CPU

(particleUpdate.vs vertex shader)


#version 150 core

uniform float  uElapsedSeconds;
uniform vec3   uMousePos;
uniform bool   uMouseDown;
uniform bool   uCTRLDown;
uniform int    uToolMode;   
uniform float  uBrushRadius;
uniform float  uBrushLevel;         // how much to offset at center of radius (mouse point)     
uniform float  uBrushFeather;   
uniform int    uBrushFeatherProfile;
uniform float  uLevelValue;
uniform float  uMaxVertexDepthZ;    
uniform float  uMinVertexDepthZ;
uniform bool    uDepthRenderMode;
uniform float   uDepthRenderModeMinVertexDepthZ;
uniform float   uDepthRenderModeMaxVertexDepthZ;
uniform float   uFlattenTargetValue;

in vec3   iPosition;
in vec4   iColor;
in vec3   iHomePosition;
in vec4   iHomeColor;

out vec3  position;
out vec4  color;
out vec3  homePosition;
out vec4  homeColor;

float easeInOutSine(float t) {
  return -0.5 * (cos(3.141592653589793 * t) - 1.0);
}
float easeInOutQuad(float t) {
  float p = 2.0 * t * t;
  return t < 0.5 ? p : -p + (4.0 * t) - 1.0;
}
float easeInOutCubic(float t) {
  return t < 0.5
    ? 4.0 * t * t * t
    : 0.5 * pow(2.0 * t - 2.0, 3.0) + 1.0;
}
float easeInOutQuart(float t) {
  return t < 0.5
    ? +8.0 * pow(t, 4.0)
    : -8.0 * pow(t - 1.0, 4.0) + 1.0;
}
float easeInOutQuint(float t) {
  return t < 0.5
    ? +16.0 * pow(t, 5.0)
    : -0.5 * pow(2.0 * t - 2.0, 5.0) + 1.0;
}
float easeInOutExpo(float t) {
  return t == 0.0 || t == 1.0
    ? t
    : t < 0.5
      ? +0.5 * pow(2.0, (20.0 * t) - 10.0)
      : -0.5 * pow(2.0, 10.0 - (t * 20.0)) + 1.0;
}
float easeInOutCirc(float t) {
  return t < 0.5
    ? 0.5 * (1.0 - sqrt(1.0 - 4.0 * t * t))
    : 0.5 * (sqrt((3.0 - 2.0 * t) * (2.0 * t - 1.0)) + 1.0);
}
float easeNone(float t) {
  return t;
}

float ease(float t) 
{
    if (uBrushFeatherProfile == 1)
        return easeInOutSine(t);
    else if (uBrushFeatherProfile == 2)
        return easeInOutQuad(t);
    else if (uBrushFeatherProfile == 3)
        return easeInOutCubic(t);
    else if (uBrushFeatherProfile == 4)
        return easeInOutQuart(t);
    else if (uBrushFeatherProfile == 5)
        return easeInOutQuint(t);
    else if (uBrushFeatherProfile == 6)
        return easeInOutExpo(t);
    else if (uBrushFeatherProfile == 7)
        return easeInOutCirc(t);
        
  // no easing
  return t;
}

void main()
{
    // distance from particle to mouse cursor (currently unused)
    float distance = length(uMousePos - iPosition.xyz);
    
    // distance from the mouse cursor to the vertex, measured in the Z=0 plane
    vec3 cameraAtZ0ToVertex = iPosition;
    cameraAtZ0ToVertex.z = 0;
    float distMouseAtZ0ToVertex = length(uMousePos - cameraAtZ0ToVertex.xyz);
    
    vec3 pos = iPosition;
    vec4 col = iColor;

    if (distMouseAtZ0ToVertex < uBrushRadius )
    {
        if(uMouseDown)
        {
            // SCULPT tool
            if (uToolMode == 0)     
            {           
                // Feather

                // ease produces a number between 0 and 1 indicating how far into the brush this vertex is.
                float totalFeatheredRadius = uBrushRadius-uBrushFeather;
                float vertexProportion = (uBrushRadius-distMouseAtZ0ToVertex) / totalFeatheredRadius;
                float deltaZ = uBrushLevel * ease(vertexProportion);

                if (uCTRLDown)
                {
                    // lower
                    pos.z += deltaZ;
                }
                else
                {
                    // raise
                    pos.z -= deltaZ;
                }
            }
            // LEVEL  tool
            else if(uToolMode == 1) 
            {
                pos.z = -uLevelValue;           // Z is going from - to + as you go into the screen
            }
            // RESET tool
            else if(uToolMode == 2) 
            {
                pos = iHomePosition;
            }   
            // FLATTEN tool
            else if(uToolMode == 3) 
            {   
                // Feather

                // ease produces a number between 0 and 1 indicating how far into the brush this vertex is.
                float totalFeatheredRadius = uBrushRadius-uBrushFeather;
                float vertexProportion = (uBrushRadius-distMouseAtZ0ToVertex) / totalFeatheredRadius;
                float deltaZ = uBrushLevel * ease(vertexProportion);

                // which direction are we going relative to the flattenTargetValue value?
                if (pos.z > uFlattenTargetValue)
                {
                    pos.z -= deltaZ;
                }
                
                if (pos.z < uFlattenTargetValue)
                {
                    pos.z += deltaZ;
                }
            }   
        }   
        
        // highlight in depth render mode
        if (uDepthRenderMode)
        {
            float Input = pos.z;
            float InputHigh = -uDepthRenderModeMaxVertexDepthZ;  
            float InputLow = -uDepthRenderModeMinVertexDepthZ;  
            float OutputHigh = 1;
            float OutputLow = 0;  
            float depthScale = ((Input - InputLow) / (InputHigh - InputLow)) * (OutputHigh - OutputLow) + OutputLow;

            if (depthScale > 0.8)   // clamp so we are always able to highlight brush
                depthScale = 0.8;

            col.x = depthScale + 0.2;
            col.y = depthScale;
            col.z = depthScale;
            col.w = 1.0;
        }
        // highlight normal colour
        else
        {
            float brushOuterBorderWidth = uBrushRadius/10.0;
            float brushInnerBorderWidth = uBrushRadius/20.0;
            float midpoint = uBrushRadius/2.0;
            float innerBorderMin =  midpoint-(brushInnerBorderWidth/2.0);
            float innerBorderMax =  midpoint+(brushInnerBorderWidth/2.0);

            // Outer Yellow Border 1/10 the brush size
            if (distMouseAtZ0ToVertex > uBrushRadius-brushOuterBorderWidth)
            {
                col.x = 0.7;
                col.y = 0.7;
                col.z = 0.0;
                col.a = 0.5;
            }
            // Inner border (half way where easing takes effect)
            else if (uBrushFeatherProfile != 0  &&      // only show when easing enabled
                    (distMouseAtZ0ToVertex > innerBorderMin) &&
                    (distMouseAtZ0ToVertex < innerBorderMax))
            {
                col.x = 0.0;
                col.y = 0.0;
                col.z = 0.7;
                col.a = 0.5;
            }
            else
            {
                col = iHomeColor;               // reset brush interior colour to original image colour
            }
        }
    }
    else
    {
        // only oscillate vertices if not under mouse brush
        // TODO: scale oscillations to distance
        float oFactor = 1.0;
        float phase = uElapsedSeconds;
        float offset = (pos.x + (pos.y * 0.05 )) * 6.5;
        pos.z += (sin(phase+offset ) * 0.2);

        // if outside brush radius, color is normal colour 
        if (uDepthRenderMode)
        {
            // ...unless in depth mode
            float Input = pos.z;
            float InputHigh = -uDepthRenderModeMaxVertexDepthZ;  
            float InputLow = -uDepthRenderModeMinVertexDepthZ; 
            float OutputHigh = 1;
            float OutputLow = 0;  
            float depthScale = ((Input - InputLow) / (InputHigh - InputLow)) * (OutputHigh - OutputLow) + OutputLow;

            col.x = depthScale;
            col.y = depthScale;
            col.z = depthScale;
            col.w = 1.0;
        }
        else
        {
            col = iHomeColor;       
        }
    }   
    position = pos;
    color = col;
    homePosition = iHomePosition;
    homeColor = iHomeColor;
}

3. CPU reads back either all or part of it in order to work out how to manipulate variables in the GPU.
(This happens when the user selects, e.g., the flatten tool.)


pickedPoint = the x,y intersection of a ray from the mouse position on the camera plane with the Z=0 plane
m3DImage = the object representing the content viewer (e.g. the image)
mSelectedVertexStats = the min/max/avg Z stats of the selected (brushed) vertices

    int br = getCurrentBrushRadius();
    int numBrushVertices = br*br;                
    vec2 pp = vec2(pickedPoint.x, pickedPoint.y) - vec2(m3DImage->getCurrentPosition().x, m3DImage->getCurrentPosition().y);
    
    if (pp.x < -br || pp.y < -br ||
        pp.x > m3DImage->getWidth() + br || pp.y > m3DImage->getHeight() + br)
    {
        mSelectedVertexStats.min = -1;
        mSelectedVertexStats.max = -1;
        mSelectedVertexStats.avg = -1;
        return;
    }

    float x = pp.x;
    float y = pp.y;
    if (x < 0) x = 0;
    if (y < 0) y = 0;
    if (x > m3DImage->getWidth()-br) x = m3DImage->getWidth()-br;
    if (y > m3DImage->getHeight()-br) y = m3DImage->getHeight()-br;
    int offset = (y*m3DImage->getWidth());
    int numVerticesToRead = m3DImage->getWidth()*br;

    Particle *particleData = (Particle*)mParticleBuffer[0]->mapBufferRange(offset * sizeof(Particle), numVerticesToRead * sizeof(Particle), GL_MAP_READ_BIT);
    if (particleData != nullptr)
    {
        // get the latest min/max from the data read
        float minValueZ = FLT_MAX;
        float maxValueZ = -FLT_MAX;
        double avg = 0;
        int numHighlightedVertices = 0;
        for (int vertexIndex = 0; vertexIndex < numVerticesToRead; vertexIndex++)
        {
            float height = particleData->currentPos.z;

            // discount vertices that are not within brush radius

            // get distance from vertex to mouse cursor
            float distance = length(vec2(pickedPoint.x , pickedPoint.y) - vec2(particleData->currentPos.x, particleData->currentPos.y));

            if (distance < mBrushRadiusStrokeStart)
            {
                avg += (double)height;

                if (height < minValueZ)
                {
                    minValueZ = height;
                }

                if (height > maxValueZ)
                {
                    maxValueZ = height;
                }
                numHighlightedVertices++;
            }
            particleData++;
        }

        mSelectedVertexStats.min = minValueZ;
        mSelectedVertexStats.max = maxValueZ;
        mSelectedVertexStats.avg = avg / numHighlightedVertices;
        mParticleBuffer[0]->unmap();
    }
}

4. I have the vertex and fragment shaders set up in code as follows.


mRenderProg = gl::GlslProg::create(gl::GlslProg::Format()
.vertex(CI_GLSL(150,
uniform mat4    ciModelViewProjection;
uniform mat4    ciModelView;
uniform float   iMaxDistance;
uniform float   iMinDistance;
uniform float   iMinPointScale;
uniform float   iMaxPointScale;
in vec4         ciPosition;
in vec4         ciColor;
out vec4        pixelColor;
void main(void) {
    vec3    vertex = vec3(ciModelView * ciPosition);
    float   dist = length(vertex);
    float pointScale = (1.0 - (dist / iMaxDistance));

    // calculate point size 
    float Input = dist;
    float InputHigh = iMaxDistance; 
    float InputLow = iMinDistance; 
    float OutputHigh = iMinPointScale;
    float OutputLow = iMaxPointScale;
    gl_PointSize = ((Input - InputLow) / (InputHigh - InputLow)) * (OutputHigh - OutputLow) + OutputLow;

    gl_Position = ciModelViewProjection * ciPosition;
    vec4 c = ciColor;
    pixelColor = c;
}
))

.fragment(CI_GLSL(150,
uniform bool    drawPointsRound;
uniform bool    drawPointsFade;
in vec4         pixelColor;
out vec4        oColor;
void main(void)
{
    // Make points circular if specified
    float pDist = -1;
    if (drawPointsRound)
    {
        vec2 coord = gl_PointCoord - vec2(0.5);
        pDist = length(coord);
        if (pDist > 0.5)
            discard;
    }
    oColor.rgb = pixelColor.rgb;
    // If the A is less than 1 then the vertex
    // shader has indicated that this vertex is selected
    if (pixelColor.a < 1)
    {
        // pass through the alpha value calculated by the vertex shader.
        // This is used for highlighting the brush
        oColor.a = pixelColor.a;
    }
    else
    {
        // non highlighted vertices - fade out GL point to border, if specified
        if (drawPointsFade)
        {
            // calculate distance to border of GL point if not already calculated above.
            if (pDist == -1)
            {
                vec2 coord = gl_PointCoord - vec2(0.5);
                pDist = length(coord);
            }
            oColor.a = 1 - pDist;
        }
        else
        {
            oColor.a = pixelColor.a;    // should be 1 as received from the vertex shader
        }
    }
}
)));

mUpdateProg = gl::GlslProg::create(gl::GlslProg::Format().vertex(loadAssetAbsolute("C:\\shaders\\particleUpdate.vs"))
                                   .feedbackFormat(GL_INTERLEAVED_ATTRIBS)
                                   .feedbackVaryings({ "position", "color", "homePosition", "homeColor" })
                                   .attribLocation("iPosition", 0)
                                   .attribLocation("iColor", 1)
                                   .attribLocation("iHomePosition", 2)
                                   .attribLocation("iHomeColor", 3));

mRenderProg called in the draw function:

    gl::ScopedGlslProg render(mRenderProg);
    mRenderProg->uniform("drawPointsRound", mDrawPointsRound);
    mRenderProg->uniform("drawPointsFade", mDrawPointsFade);        
    mRenderProg->uniform("iMaxDistance", mMaxDistance);
    mRenderProg->uniform("iMinDistance", mMinDistance);
    mRenderProg->uniform("iMinPointScale", mMinPointScale);
    mRenderProg->uniform("iMaxPointScale", mMaxPointScale);

mUpdateProg called in the update function:

    gl::ScopedGlslProg prog(mUpdateProg);
    gl::ScopedState rasterizer(GL_RASTERIZER_DISCARD, true);    // turn off fragment stage
    mUpdateProg->uniform("uElapsedSeconds", (float)getElapsedSeconds());
    mUpdateProg->uniform("uMousePos", getPickedPoint());
    mUpdateProg->uniform("uMouseDown", mMouseLeftDown);
    mUpdateProg->uniform("uCTRLDown", mCTRLDown);
    mUpdateProg->uniform("uToolMode", (int)mToolMode);        
    mUpdateProg->uniform("uBrushRadius", getCurrentBrushRadius());  // animated stroke
    mUpdateProg->uniform("uBrushLevel", (float)mBrushLevel);
    mUpdateProg->uniform("uBrushFeather", (float)mBrushFeather);
    mUpdateProg->uniform("uBrushFeatherProfile", (int)mBrushFeatherProfile);
    mUpdateProg->uniform("uLevelValue", mLevelValue);
    mUpdateProg->uniform("uDepthRenderMode", mDepthRenderMode);
    mUpdateProg->uniform("uDepthRenderModeMinVertexDepthZ", mDepthRenderModeMinVertexDepthZ);
    mUpdateProg->uniform("uDepthRenderModeMaxVertexDepthZ", mDepthRenderModeMaxVertexDepthZ);
    mUpdateProg->uniform("uFlattenTargetValue", mSelectedVertexStats.avg);

Cheers!