drawSolidRect raises GL_INVALID_OPERATION intermittently - threading issue


#1

Hello folks, merry Christmas!

I’ve been developing a simple Cinder app which mostly renders post-processing fragment shaders with multiple passes. Part of the workflow is hot reloading the shader every time I edit the file so I can iterate without restarting the app. I noticed that fairly intermittently the rendering would freeze when I edit a shader. After closer investigation, it seems that it’s a GL_INVALID_OPERATION error which happens after I call drawSolidRect(). With the multiple passes I am doing quite a lot of texture / FBO binding and unbinding, but not sure why this error is so intermittent.

Any idea what is happening? Is there a way to get better debug information? What I did is basically sprinkle getError() calls through my rendering code to see what call exactly was creating an issue and this is how I got to drawSolidRect().

Thanks!


#2

Adding a little more info just to make it easier to get help. Here is the method with the problematic call to drawSolidRect() line 25: https://gist.github.com/couleurs/60706c752c0d7cdbe8b1e839f2bca5d5

Thanks!


#3

If you’re on Windows you can create a debug context that will call you back or break when you do something iffy. You set this up in your renderer options, like so:

CINDER_APP( YourApp, RendererGl ( RendererGl::Options().debug().debugLog().debugBreak() ) )


#4

Interesting, thanks! Unfortunately I’m on Mac, no equivalent there?


#5

You can try running with the OpenGL Profiler attached and click “break on errors”. However the debugger is a piece of shit and rarely works properly. In fact I have to connect to my own machine as a remote debugging session to make it attach at all, but that’s the state of GL on mac. In other words, good luck. :wink:


#6

The only reliable way I’ve been able to trace back GL errors on Mac is to sprinkle CI_CHECK_GL() statements after every single GL call in your project until you get back to the culprit. Sometimes this takes adding them directly to the libcinder GL code too… yep, it’s a pain, and a shame Apple never bothered implementing the debug break features before ceasing OpenGL development. Lack of critical OpenGL features is the main reason why many of us stopped developing on OS X.
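To illustrate the idea of checking after every call and recording the call site, here is a minimal standalone sketch. It does not use Cinder or OpenGL: fakeGetError(), checkError(), CHECK_ERROR(), bindTexture() and drawArrays() are all hypothetical stand-ins, with fakeGetError() playing the role of the real error query. It only shows how a file/line macro narrows an error down to one call.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Stand-in for the real error query: returns the pending error code
// and clears it. 0 means "no error". (Hypothetical, not a GL call.)
static unsigned int gPendingError = 0;

unsigned int fakeGetError()
{
	unsigned int err = gPendingError;
	gPendingError = 0;
	return err;
}

// Collected reports, so the first failing call site is easy to find.
static std::vector<std::string> gErrorLog;

// Queries the error state and records where the check was placed.
void checkError( const char *file, int line )
{
	unsigned int err = fakeGetError();
	if( err != 0 ) {
		char buf[512];
		std::snprintf( buf, sizeof( buf ), "%s:%d error 0x%04X", file, line, err );
		gErrorLog.push_back( buf );
	}
}

// The macro captures the file and line of the call site.
#define CHECK_ERROR() checkError( __FILE__, __LINE__ )

// Hypothetical draw calls. The second one "fails" with 0x0502,
// the numeric value of GL_INVALID_OPERATION.
void bindTexture() {}
void drawArrays() { gPendingError = 0x0502; }
```

Placing CHECK_ERROR() after each call means the first log entry points at the line right after the offending call, which is how the search gets narrowed down one call at a time.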


#7

Thanks for the tips folks!

I did what you suggested and it seems the culprit is ctx->drawArrays( GL_TRIANGLE_STRIP, 0, 4 ) in drawSolidRect() in draw.cpp. Any idea why this would raise a GL_INVALID_OPERATION but only on a very very intermittent basis? Maybe a race condition somewhere?


#8

Can it be the hot reloading? Some threading issue maybe?


#9

Yeah that’s definitely a possibility since the issue happens very intermittently but always right after a hot reload. But as far as I know I’m not creating any thread myself manually. Any pointer on how to debug threading issues?


#11

I guess one thing that could be happening is a race condition where, in some cases, the render thread attempts to draw while the thread where the hot reloading happens is still recompiling the new shader or loading the new textures. Is there a way to pause the render thread until the hot reloading is done for sure? That would obviously create a moment of freeze, but I’d be ok with that if it ensures I avoid this race condition. One option would be to use a mutex and lock the render thread until the hot reload finishes, but blocking the render thread like that doesn’t sound like a great idea, as it could cause other problems?

Thanks for your help!


#12

You can try dispatchAsync, which executes your function on the main thread before the next update.


#13

I see. You mean that dispatchAsync guarantees that the function will finish executing before the next update is called? Do both update and draw run on the same thread, one after the other?


#14

I am actually a little confused after reading this: https://forum.libcinder.org/topic/separation-between-update-and-draw-cycles

It seems update and draw run sequentially on the main thread, so I’m not sure what dispatchAsync really does. In my case, the logic for the hot reloading happens as part of update, so everything should be ready to go by the time draw is called next which should eliminate the risk of race conditions? I’m clearly missing something. Thanks for your help!


#15

dispatchAsync lets you push some execution to the main thread. One of the things Cinder’s update cycle does is poll an io_service to see if there are any jobs to run. Since OpenGL is inherently single threaded (for all intents and purposes), this is an easy way to have a monitoring thread look for changes, and then push the actual loading back to the main thread to appease the GL gods (may they burn in hell).

e.g: (written inline, not tested)

void YourApp::yourMonitoringThreadsCallbackFunc ( )
{
	while ( true )
	{
		assert ( !app::isMainThread() ); // We're on a different thread
		if ( someVertexShaderFileChanged || someFragmentShaderFileChanged  )
		{
			dispatchAsync ( [=]
			{
				assert ( app::isMainThread() ); // We're on the main thread, so GL won't bitch
				yourShader = gl::GlslProg::create ( loadAsset ( someVertexShaderFile ), loadAsset ( someFragmentShaderFile ) );
			});
		}
	}
}
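
For anyone wondering what "poll a queue for jobs on the main thread" means in practice, here is a rough standalone sketch of the pattern in plain C++. It is not Cinder's implementation (Cinder uses an io_service, as mentioned above); MainThreadQueue, dispatch() and poll() are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical stand-in for the job queue an app polls once per update.
class MainThreadQueue {
  public:
	// Safe to call from any thread, e.g. a file-monitoring thread.
	void dispatch( std::function<void()> job )
	{
		std::lock_guard<std::mutex> lock( mMutex );
		mJobs.push( std::move( job ) );
	}

	// Call only from the main thread, once per update cycle.
	void poll()
	{
		// Take the pending jobs out under the lock, then run them
		// unlocked so a job may safely dispatch further work.
		std::queue<std::function<void()>> pending;
		{
			std::lock_guard<std::mutex> lock( mMutex );
			std::swap( pending, mJobs );
		}
		while( ! pending.empty() ) {
			pending.front()();
			pending.pop();
		}
	}

  private:
	std::mutex                        mMutex;
	std::queue<std::function<void()>> mJobs;
};
```

The monitoring thread would only ever call dispatch(), and the main loop would call poll() before update and draw. A queued job therefore never runs in the middle of a draw, and any GL work inside it happens on the thread that owns the context.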


#16

It seems from your comments that you are not using a separate thread for doing the hot-reloading part so not sure what is going wrong here since the update and draw functions are both called on the main thread which should have a valid GL context at all points.

In any case, file I/O is an ideal candidate for moving to a separate thread, and the path that @lithium.snepo described should be the way to go. But consider using the built-in FileWatcher instead, which does exactly that for you :slight_smile:


#17

Thanks @lithium.snepo @petros and @gabor_papp! I understand what dispatchAsync can be useful for now. Not quite sure what’s going wrong for me as I’m not doing anything off the main thread afaik, but will definitely look into FileWatcher. I was aware of Watchdog by @Simon but didn’t realize there was a built-in solution now.


#18

Could you share how you check for the shader changes? Or, if you can reproduce the crash in a minimal example and share it, we can help you find the cause of the crash.


#19

Thanks for your offer @gabor_papp! To check for shader changes I’m following the strategy described by @paul.houx in this post: https://forum.libcinder.org/topic/glsl-live-coding-in-cinder-0-9 Basically leveraging ci::fs::last_write_time(path) and using a File struct as described in the post.
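
For readers who don't want to follow the link, the polling idea can be sketched roughly like this. It is a standalone approximation using std::filesystem in place of ci::fs (an assumption made so it compiles without Cinder), and WatchedFile and hasChanged() are hypothetical names, not the File struct from the linked post.

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: remembers the last write time seen for one file.
struct WatchedFile {
	fs::path           path;
	fs::file_time_type lastSeen;

	explicit WatchedFile( const fs::path &p )
		: path( p ), lastSeen( fs::last_write_time( p ) ) {}

	// Returns true once per detected modification. Meant to be polled
	// from update().
	bool hasChanged()
	{
		std::error_code    ec;
		fs::file_time_type now = fs::last_write_time( path, ec );
		// If the timestamp cannot be read (e.g. the file is missing),
		// report "no change" instead of throwing.
		if( ec || now == lastSeen )
			return false;
		lastSeen = now;
		return true;
	}
};
```

The error_code overload is used so the check does not throw if the file is briefly missing, which can happen with editors that save by writing a temporary file and renaming it. Whether that applies to a given editor is an assumption worth checking.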

The code is public so I can just link to it. The updateShaders function is where I check for shader changes. It is called from the main App update method: https://github.com/couleurs/nightsea-live/blob/485487e2ee390b7e6ba7e4a8991ce2a63e1b2661/src/CouleursApp.cpp#L444

Then, if a shader change is detected, the GlslProg is recreated, as well as some FBOs and Textures (I’m using this multipass shader syntax where a shader change can potentially mean that new FBOs and Textures are needed, which is why it’s not enough to just recreate the GlslProg). I’m assuming that all these operations are synchronous so since all this chain of events is kicked off from the update method which runs on the main thread, I’m expecting that everything will always be ready to go by the next time draw is called. But it does seem there is a race condition somewhere that I’m missing and that is causing this intermittent GL_INVALID_OPERATION. Let me know if you see something suspicious by looking at the code. If it’s too messy to look at I’ll go ahead and create a more minimal example.

In the meantime I’ll go ahead and try using FileWatcher instead of my custom code. Always happy to use the awesome Cinder code instead of my own whenever possible :slight_smile:


#20

Does it help if you call updateShaders from drawScene instead of update?


#21

Hey folks, sorry for the delay I took a little break with the new year celebrations :slight_smile: A couple updates:

  • @gabor_papp I can still repro the issue when updateShaders is called from drawScene

  • It seems using FileWatcher instead of my custom code fixes the issue. There must be some threading code in there that gets rid of the race condition. This is a double win because it also means less custom code and more Cinder code yay! Thanks for suggesting that @petros

Thanks for your help everyone! If anyone has an insight on why my original approach was causing this intermittent issue don’t hesitate.