Shaders: alternatives to if statements and for loops

I’m still wrapping my head around parallel processing and optimizing some code. From what I understand, it’s generally good practice to avoid conditionals and branching in shaders: no “if” statements or “for” loops whenever possible.
My question is this: most fragment shader examples do something like

vec4 color = drawA();
if (color.w > 0.) color = drawB();

Assuming color.w will only ever be 0 or 1, would it be more efficient to do something like this?

vec4 color = drawA();
color = color.w * drawB();

The first method seems like it wouldn’t call drawB() as often, but it uses an if statement. The second one avoids the if statement, but it seems like it would have to compute more pixels and then erase them when multiplying by zero.

This leads to my next question. Let’s say that instead of

vec3 drawAll(vec2 uv, int num) {
   vec3 color = vec3(0.0);
   for (int i = 0; i < 3; i++) {
      if (i != num) continue;  // skip every index except num
      if (i == 0) color = drawA(uv);
      if (i == 1) color = drawB(uv);
      if (i == 2) color = drawC(uv);
   }
   return color;
}

What if I just offset the UV by the input num, so that each shape is drawn out of frame unless num matches its offset?

vec3 drawB(vec2 uv) {
   uv.y -= 1.;  // offset uv to match num
   // (shape drawing using uv omitted)
   return vec3(0.5);
}
vec3 drawC(vec2 uv) {
   uv.y -= 2.;  // offset uv to match num
   // (shape drawing using uv omitted)
   return vec3(1.0);
}

vec4 drawAll(vec2 uv, int num) {
   uv.y += float(num);  // offset uv to match draw offset
   vec4 color = drawUVMask();
   color.rgb += drawA(uv) * color.w;  // mult by color.w or use an if statement?
   color.rgb += drawB(uv) * color.w;
   color.rgb += drawC(uv) * color.w;
   return color;
}

This method would still have to mask the draw functions somehow, using one of the first two methods mentioned. I used the multiplication example just to show how these questions came about. Am I just worrying about conditionals too much?
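A rough C sketch of the masking idea, since GLSL needs a GPU host to run. drawA, drawB and drawC here are hypothetical stand-ins that return a constant grey level, and index_mask, draw_all_masked and draw_calls are illustrative names, not part of any library. The mask comes from a comparison instead of an if statement, but every draw function still runs for every pixel.

```c
#include <assert.h>

/* Counts how many draw functions actually ran (illustrative only). */
static int draw_calls = 0;

/* Hypothetical stand-ins for drawA/drawB/drawC: each returns a constant grey level. */
static float drawA(void) { draw_calls++; return 0.25f; }
static float drawB(void) { draw_calls++; return 0.5f; }
static float drawC(void) { draw_calls++; return 1.0f; }

/* 1.0f when i == num, else 0.0f, computed from a comparison rather than an if.
 * The GLSL equivalent would be float(i == num). */
static float index_mask(int i, int num) {
    return (float)(i == num);
}

/* Branchless selection: all three draw functions run, and the masks
 * zero out every result except the one whose index matches num. */
static float draw_all_masked(int num) {
    return drawA() * index_mask(0, num)
         + drawB() * index_mask(1, num)
         + drawC() * index_mask(2, num);
}
```

Whatever num is, draw_calls goes up by three per call, which is exactly the extra work the question is worried about.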

Hi Malfunkn,

Try it!


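A branchless version of your first example should probably use mix() rather than a plain multiplication, so that it matches the if statement when a.w is 0 or 1:

vec4 a = drawA();
vec4 color = mix(a, drawB(), a.w);

Multiplying by color.w instead zeroes the color wherever a.w is 0.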
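The difference between the variants can be checked on the CPU with a small C sketch. C has no vec4 or mix(), so the vec4 struct and mix4() below are hand-written stand-ins for the GLSL built-ins (GLSL’s mix(a, b, t) computes a * (1 - t) + b * t), and the select_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for GLSL's vec4 (C has no built-in vector types). */
typedef struct { float x, y, z, w; } vec4;

/* Stand-in for GLSL mix(a, b, t) = a * (1 - t) + b * t, per component. */
static vec4 mix4(vec4 a, vec4 b, float t) {
    vec4 r = {
        a.x * (1.0f - t) + b.x * t,
        a.y * (1.0f - t) + b.y * t,
        a.z * (1.0f - t) + b.z * t,
        a.w * (1.0f - t) + b.w * t,
    };
    return r;
}

/* Branching version: the first example from the question. */
static vec4 select_branch(vec4 a, vec4 b) {
    vec4 color = a;
    if (color.w > 0.0f) color = b;
    return color;
}

/* Branchless version: only equivalent to select_branch when a.w is exactly 0 or 1. */
static vec4 select_mix(vec4 a, vec4 b) {
    assert(a.w == 0.0f || a.w == 1.0f);
    return mix4(a, b, a.w);
}

/* Multiplication-only version from the question: zeroes the color when a.w == 0. */
static vec4 select_mul(vec4 a, vec4 b) {
    vec4 r = { a.w * b.x, a.w * b.y, a.w * b.z, a.w * b.w };
    return r;
}

static bool vec4_eq(vec4 p, vec4 q) {
    return p.x == q.x && p.y == q.y && p.z == q.z && p.w == q.w;
}
```

With a.w at 0 or 1 the branch and mix() versions agree, while the multiply-only version returns all zeros when a.w is 0.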

Shaders behave differently from programs run on a CPU, which is why their performance is sometimes counterintuitive. Conditionals can really hurt execution speed when neighboring fragments take different branches. The reason is that a shader program runs in parallel on many fragments (pixels) at the same time, and not only that: those fragments run in lock-step.

When a program is executed, a program counter keeps track of where in the code we are. On a CPU, each thread has its own program counter, so each thread can run its code independently of the others. On a GPU, however, a group of fragments (typically 32 or 64, depending on the hardware) effectively shares one program counter, so all the group’s fragments have to execute the same instruction at the same time. If one fragment wants to execute drawA() and the others drawB(), the group executes both in sequence, masking off the fragments that did not take each branch, and the unwanted results are simply thrown away (very similar to the mix statement in your example). So if each side of the if statement takes the same amount of time, a divergent group takes about twice as long as one that runs a single side, and the if statement doesn’t help at all compared with computing both. If every fragment in the group takes the same side, only that side is executed, so an if statement that is coherent across neighboring pixels can still save work.
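The lock-step behavior can be modelled with a toy cost function in C. This is only an illustration under made-up assumptions: a group of 4 lanes (real groups are larger), arbitrary costs COST_A and COST_B for drawA() and drawB(), and a simplified rule that the group pays for every branch at least one lane takes. GROUP_SIZE, COST_A, COST_B and group_cost are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: a small group for illustration; real hardware uses e.g. 32 or 64 lanes. */
#define GROUP_SIZE 4

/* Hypothetical per-branch costs, in arbitrary "instruction" units. */
#define COST_A 10
#define COST_B 10

/* Simulates one lock-step group running
 *     if (take_b[i]) color = drawB(); else color = drawA();
 * Every branch that at least one lane takes is executed for the whole group;
 * lanes that did not take it are masked off and their results discarded.
 * Returns the cost the whole group spends. */
static int group_cost(const bool take_b[GROUP_SIZE]) {
    bool any_a = false, any_b = false;
    for (int i = 0; i < GROUP_SIZE; i++) {
        if (take_b[i]) any_b = true;
        else any_a = true;
    }
    int cost = 0;
    if (any_a) cost += COST_A; /* group steps through drawA() */
    if (any_b) cost += COST_B; /* group steps through drawB() */
    return cost;
}
```

A group where all lanes agree pays for one branch only; a single disagreeing lane makes it pay for both.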

For loops require a jump instruction that sets the program counter back to the start of the loop, plus a counter update and a comparison on every iteration. That overhead can be significant, so compilers usually unroll loops with a constant iteration count and simply copy the inner code a few times. In HLSL you can even supply hints for the compiler, such as the [unroll] and [loop] attributes.
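As an illustration in C (a real shader compiler works on its own intermediate code, so this only shows the idea), a constant-count loop and the unrolled code a compiler might turn it into compute the same value. The unrolled form has no counter and no backward jump. The function names are hypothetical.

```c
#include <assert.h>

/* A loop with a constant trip count, as a shader might write it. */
static float sum_loop(const float v[3]) {
    float s = 0.0f;
    for (int i = 0; i < 3; i++) {
        s += v[i]; /* each iteration pays for the increment, compare and jump back */
    }
    return s;
}

/* What an unrolling compiler effectively produces: the body copied three times,
 * with no loop counter and no backward jump. */
static float sum_unrolled(const float v[3]) {
    float s = 0.0f;
    s += v[0];
    s += v[1];
    s += v[2];
    return s;
}
```

Because the additions happen in the same order, both versions return bit-identical results.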

The only way to know if changes to your code will result in better performance is to test it.