glDrawArraysInstanced max number of instances?

Hi,

I am using glDrawArraysInstanced on OS X Lion with the OpenGL 3.2 core profile, which works very well. I draw a lot of circles, using ARB_instanced_arrays (glVertexAttribDivisorARB) to pass a vec4 to the shader for each instance of the circle, where xyz is the position and w is the radius. I get a very good frame rate for up to 37364 instances, but as soon as I draw one more, it seems to fall back to software rendering at about 4 fps. If I batch them and call glDrawArraysInstanced twice, things work again. Is there any way to determine the maximum number of instances that can be drawn with one call to glDrawArraysInstanced, so that the circles can be batched automatically? Here is some code:

Vertex Shader


#version 150
uniform mat4 projection;
uniform mat4 transform;
in vec4 positionAndRadius;
in vec4 vertex;


void main()
{

    vec4 p = vec4(vertex.xyz * positionAndRadius.w, 1.0);
    p += vec4(positionAndRadius.xyz, 0.0);
    gl_Position = projection * transform * p;
}

This is basically how I set up the instanced vertex attribute pointer:

glVertexAttribPointer(loc, 4, GL_FLOAT, GL_FALSE, 4 * sizeof(float), (char *)0);
glEnableVertexAttribArray(loc);
glVertexAttribDivisorARB(loc, 1);

It displays correctly, so I am pretty sure that I am hitting a hardware limitation or something. Is there any way to determine the maximum instances that glDrawArraysInstanced can handle? If not, is there a rule of thumb that I could hardcode?

Thank you!

I’m curious to know what hardware you’re using that’s getting this effect.

But as to your actual question, there’s no query for a recommended instance count. And I haven’t heard of any rules of thumb beyond how many objects you should be rendering (minimum of a thousand or so) or how many vertices you should be rendering per instance (minimum of ~100 or so, and probably not more than ~1000).
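Since there is no query, one workaround is to pick a cap empirically and split the draw into batches, as the original post already does by hand with two calls. Here is a minimal sketch of the arithmetic only, in plain C with no GL calls. The helper names (batch_count, batch_first, batch_size) are made up for illustration, and the cap is something you would have to measure per machine; it is not a value OpenGL reports.

```c
#include <stddef.h>

/* Hypothetical helpers: split `total` instances into draw calls of at most
 * `max_per_call` instances each. The cap is an assumption you choose
 * empirically; OpenGL does not expose it. */

/* Number of draw calls needed (0 if there is nothing to draw or the cap is 0). */
static size_t batch_count(size_t total, size_t max_per_call)
{
    if (max_per_call == 0)
        return 0;
    return total / max_per_call + (total % max_per_call != 0 ? 1 : 0);
}

/* Index of the first instance covered by batch number `index`. */
static size_t batch_first(size_t index, size_t max_per_call)
{
    return index * max_per_call;
}

/* Number of instances in batch number `index` (0 if the index is out of range). */
static size_t batch_size(size_t total, size_t max_per_call, size_t index)
{
    size_t first = batch_first(index, max_per_call);
    if (first >= total)
        return 0;
    size_t remaining = total - first;
    return remaining < max_per_call ? remaining : max_per_call;
}
```

For each batch you would then issue one glDrawArraysInstanced call with batch_size as the instance count. Note that GL 3.2 has no base-instance parameter (glDrawArraysInstancedBaseInstance only arrived with GL 4.2 / ARB_base_instance), so the instanced attribute would have to be re-pointed with glVertexAttribPointer at a byte offset of batch_first * 4 * sizeof(float) before each call, assuming the vec4 layout from the first post.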

I’m also curious about some further experimentation. What happens if your instance data is smaller? Can you squeeze your positions into GL_SHORTs of some kind, just to see what the performance implication is? My guess (based on the oddball number of 37364 instances) is that the limit may be based on the size of your per-instance data.
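To make the GL_SHORT experiment concrete, here is a rough sketch of packing one float component into a signed 16-bit value on the CPU side. It assumes all positions and radii fit within a known range [-max_abs, max_abs]; that bound and the function name quantize_snorm16 are hypothetical and not from the original code.

```c
#include <stdint.h>

/* Hypothetical helper: map a float in [-max_abs, max_abs] to a signed
 * 16-bit integer in [-32767, 32767], clamping out-of-range input.
 * max_abs is an assumed bound on the scene's coordinates. */
static int16_t quantize_snorm16(float value, float max_abs)
{
    if (max_abs <= 0.0f)
        return 0;

    float n = value / max_abs;          /* normalize to roughly [-1, 1] */
    if (n > 1.0f)  n = 1.0f;            /* clamp so the result fits in 16 bits */
    if (n < -1.0f) n = -1.0f;

    float scaled = n * 32767.0f;
    /* Round half away from zero without needing libm. */
    return (int16_t)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
}
```

The attribute would then be specified with GL_SHORT and normalized set to GL_TRUE, with a stride of 4 * sizeof(int16_t), and the shader would multiply positionAndRadius by the same max_abs (passed in as a uniform, for example) to recover world units. This halves the per-instance data from 16 to 8 bytes at the cost of precision, which is enough to see whether the limit tracks the data size.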

Hi,
Thanks for the quick reply. I am on a 2010 MBP with an NVIDIA GeForce GT 330M 512 MB.

I had a similar suspicion, because before I used vec4s I actually passed a whole 4x4 matrix for each circle, and that hit exactly the same limit. Also, if I just upload the position and radius data and don’t call glDrawArraysInstanced at all, things don’t fall back to software, so I am fairly certain that it is closely tied to the number of instances I pass to glDrawArraysInstanced. This is very odd… Could it be related to some other VBO-related maximum that I am currently not seeing?

EDIT: I also just tried shorts, with exactly the same results.

It gets even more bizarre: for every instance count above 37364 and up to 37405, glDrawArraysInstanced just hangs at this point:

#0 0x00007fff8c50b67a in mach_msg_trap ()
#9 0x0000000101ea3208 in glDrawArraysInstanced_GL3Exec ()

As soon as I go above 37405, everything draws again, but it seems to be a software fallback (4 fps). So there are three stages: instance counts up to and including 37364 work as expected; counts above 37364 and up to 37405 seem to break the glDrawArraysInstanced implementation on OS X entirely; and everything above that seems to run in software. This smells like a bug to me. Any other ideas?

Thanks!