Type: Posts; User: peterfilm
could someone remind me again why we can't set the integer value of a sampler directly in the shader, in the same way we can specify the generic vertex attribute slot of an input, and the uniform...
i take some of that back: from testing, it would seem this assertion is only true on the fermi architecture. on older cards it's no faster at all.
also Dark Photon, sorry but the conclusion I've come to is contradictory to yours. Rather than instanced_arrays having only good performance on large batches, they do in fact give way better...
mmmyeah, i get what you're saying, but i was rather lazily referencing this 2004 nvidia paper......
only....i get a judder every 1 second....ffs....only when using instanced path.
thanks for all the replies.
interesting!...while i'd tried all the buffer submission methods under the sun (buffer streaming/orphaning, round robin of n buffers with either map or glBufferSubData,...
by the way mhagain....moderately measurable???
scene: 3.5 million triangles with very few instanceable batches (but some)
instanced: 709 batches = 13ms
glLoadMatrixf: 1035 batches = 6ms
7ms...
yes i'm after cleaner code if the required extensions exist, but i also understood from discussions i've read that uniform changes cause a pipeline flush while attribute changes don't (hence nvidia...
has anyone any idea why uploading a single matrix (using a VBO) and then using glDrawElementsInstanced() to draw a batch would be slower than calling glLoadMatrixf() and then calling...
rolling back to 276.52 fixes it. I'm not happy at all with the current state of nvidia's drivers.
i'm having this same problem and it doesn't seem to be fixed in the current 297.03 drivers for Quadro 4000/Windows 7.
Oh my god mate, I've tried everything. Push/pop server and client attribs, setting all client states to default including divisors, flush, finish, drawing quads outside the viewport, the list goes on....
As you can probably tell, I'm rather stressed by this. I've invested a lot of time in this instance renderer, and it all hinges on glDrawElementsInstanced
The default profile too....a customer support nightmare. No problem on the 5 times less expensive GeForce.
Not an issue of new and old, the immediate mode quad in my example was just for clarity, ironically. The problem's there whatever the non-instanced geometry submission path is (vao/vbo) except...
Thanks guys, but it's a genuine nvidia driver bug, stretching back to at least February last year. Basically a simple test case using glDrawElementsInstanced and then drawing an immediate mode quad...
is glVertexAttribDivisor() part of a vertex array object's state? can't seem to find confirmation in the spec, although it seems to be true on this nvidia driver.
I'm just checking because I'm...
yup, i know, this is a simple test - i found early on that it made no real difference to performance on the GPU but did on the CPU so I decided to leave it with true length on both implementations to...
#version 420 core
#ifdef GL_VERTEX_SHADER
in vec4 attrib_row1; // xyz=axisX, w=translationX
in vec4 attrib_row2; // xyz=axisY, w=translationY
in vec4 attrib_row3; // xyz=axisZ,...
well i asked for the limits on the quadro 4000, and got:-
GL_MAX_VERTEX_ATOMIC_COUNTERS: 16384
GL_MAX_GEOMETRY_ATOMIC_COUNTERS: 16384
GL_MAX_FRAGMENT_ATOMIC_COUNTERS: 16384
so i tried it, using...
Yes, that's what I was afraid of. The whole atomic counter stuff scared me, possible sync issues etc. And then aqnuep mentioned that you can only use atomic counters at fragment level......
But thanks...
no, i still issue a glDrawElementsInstanced() call for each lod once the queries return me the primCount for each lod.
I'm not using the indirect extension, which is what the original question was...
Intel Xeon Quad Core 2.66GHz, 8GB RAM, Windows 7 64-bit. Quadro 4000 2GB RAM, driver 296.88.
Forgive me for the fps metric, was in a hurry.
1.2ms is for a relatively small number of instances...
I'd gladly do that, but aqnuep has already done a splendid job of writing this stuff up on his blog.
it's got diagrams and everything! ignore the hi-z business for now....
here's some numbers:-
instances:-
26781
CPU culling/lod selection, with glMapBufferRange to pass results to GPU:-
590fps
GPU culling/lod selection, with vertex/geometry shader and...