
full screen quads : Instancing or GS ?

_blitz
02-11-2011, 08:38 AM
Hi,

If I want to sum cells from different grids on the GPU (enable alpha blending, disable depth test, and draw full screen quads), two ways of sending the quad vertices come to mind: either use instancing or duplicate the quad vertices in a Geometry Shader... Is one of the techniques better (i.e. faster) than the other?
In my case, I need 5 quads at most.
Thanks,

B
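
For readers following along, here is a rough CPU-side sketch of what the blended full screen quads are meant to compute. It assumes additive blending (a blend function of GL_ONE, GL_ONE, which the post does not state) and single-channel float grids; the function names `accumulate_grid` and `sum_grids` are made up for illustration.

```c
#include <stddef.h>

/* Adds src into dst cell by cell: dst = src * 1 + dst * 1.
   This mirrors what additive blending (assumed GL_ONE, GL_ONE)
   does per pixel when one full screen quad is drawn. */
void accumulate_grid(float *dst, const float *src, size_t cell_count) {
    for (size_t i = 0; i < cell_count; ++i)
        dst[i] += src[i];
}

/* Sums grid_count grids of cell_count cells each into dst.
   Each loop iteration plays the role of one full screen quad. */
void sum_grids(float *dst, const float *grids,
               size_t grid_count, size_t cell_count) {
    for (size_t i = 0; i < cell_count; ++i)
        dst[i] = 0.0f;  /* equivalent of clearing the render target */
    for (size_t g = 0; g < grid_count; ++g)
        accumulate_grid(dst, grids + g * cell_count, cell_count);
}
```

With 5 grids this is 5 passes over every cell, which is why the per-pixel work, not the handful of vertices, dominates the cost.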

ZbuffeR
02-11-2011, 09:06 AM
5 quads is a very small amount of geometry, so there is no use for a complex method. Your bottleneck will probably be the per pixel raster operations.

Do you need to change coordinates often, can you reuse same geometry for several quads, etc ?

_blitz
02-11-2011, 09:19 AM
You're absolutely right, just have to use an index buffer with the quad vertices... So simple I hadn't even thought about it ><

Thanks !

Dark Photon
02-11-2011, 07:25 PM
You don't even need an index array. Can just use glDrawArrays( GL_QUADS, ...). As ZbuffeR implied, the efficiency of the batch verts is likely irrelevant when you're eating so much GPU/fill with each quad.
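
A minimal sketch of the vertex data such a call would consume, assuming 2D positions in normalized device coordinates and a compatibility-profile context where GL_QUADS is available. The helper name `fill_fullscreen_quads` is hypothetical, and the GL calls are only mentioned in the comment since they need a live context.

```c
#include <stddef.h>

/* Writes quad_count full screen quads into out as (x, y) pairs in
   normalized device coordinates and returns the number of vertices
   written. out must have room for quad_count * 4 * 2 floats.
   The result could then be drawn without any index array, e.g. with
   glVertexPointer(2, GL_FLOAT, 0, out) followed by
   glDrawArrays(GL_QUADS, 0, 4 * quad_count). */
size_t fill_fullscreen_quads(float *out, size_t quad_count) {
    /* Counter-clockwise corners of a quad covering the whole viewport. */
    static const float corners[8] = {
        -1.0f, -1.0f,
         1.0f, -1.0f,
         1.0f,  1.0f,
        -1.0f,  1.0f
    };
    for (size_t q = 0; q < quad_count; ++q)
        for (size_t i = 0; i < 8; ++i)
            out[q * 8 + i] = corners[i];
    return quad_count * 4;
}
```

For 5 quads this is 20 vertices, or 160 bytes at 8 bytes per vertex.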

_blitz
02-12-2011, 11:26 AM
A quick calculation :

Solution 1 (vbo only)
Each full screen quad = 2 triangles, so for 5 quads:
vbo size = 5(quads) * 2(triangles) * 3(vertices) * sizeof(vertex)
My vertices are defined by a 2D position with floats, so sizeof(vertex) = 2 * 4 = 8 bytes
and finally sizeof(vbo) = 5 * 2 * 3 * 8 = 240 bytes

Solution 2(vbo + ibo)
vbo size = sizeof(quad) = 4(vtx) * 2(dimensions) * 4(sizeof(float)) = 32 bytes
ibo size = 6(indexes_per_quad_with_gl_triangles) * 5(quad_count) * 2(sizeof(ushort)) = 60 bytes
So in total I'd need 32 + 60 = 92 bytes

So with solution 2 I have less memory consumption, and cache efficiency.

Now I know that the vertex shader will definitely not be the bottleneck for my batch, but still, I prefer solution 2 ^^
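
The arithmetic above can be double-checked with a few lines of C. This is only a sketch: the `Vertex2D` struct and the function names are made up for illustration, and the byte counts assume the usual 4-byte float and 2-byte unsigned short.

```c
#include <stddef.h>

/* Hypothetical vertex layout: a 2D position stored as two floats. */
typedef struct { float x, y; } Vertex2D;

/* Solution 1: each quad is stored as 2 triangles with 3 unshared
   vertices each, all in a single VBO. */
size_t vbo_only_bytes(size_t quad_count) {
    return quad_count * 2 * 3 * sizeof(Vertex2D);
}

/* Solution 2: one shared 4-vertex quad in the VBO, plus 6 unsigned
   short indices per quad in the IBO (GL_TRIANGLES). */
size_t vbo_ibo_bytes(size_t quad_count) {
    size_t vbo = 4 * sizeof(Vertex2D);
    size_t ibo = quad_count * 6 * sizeof(unsigned short);
    return vbo + ibo;
}
```

Under those assumptions, `vbo_only_bytes(5)` gives 240 and `vbo_ibo_bytes(5)` gives 92, matching the figures above.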

ZbuffeR
02-12-2011, 11:37 AM
It would be great if you could post benchmark results comparing the two methods.

Alfonse Reinheart
02-12-2011, 11:53 AM
_blitz wrote: "So with solution 2 I have less memory consumption, and cache efficiency."

The time it took you to type even this sentence into the computer, let alone the rest of your post, is not worth the time you "saved". You could be rendering with immediate mode, using double-precision attributes, and it still wouldn't make a bit of difference as far as performance.

You have put far more thought into this subject than is warranted. That's why the 80/20 rule exists, and that's why you should always benchmark before you optimize.

_blitz
02-15-2011, 12:52 AM
Alfonse Reinheart wrote: "You could be rendering with immediate mode, using double-precision attributes, and it still wouldn't make a bit of difference as far as performance."

ZbuffeR wrote: "It would be great if you could post benchmark results comparing the two methods."

After benchmarking (Ubuntu 10.10 + Radeon 5650 @ Catalyst 11.11), it turns out there's no difference between VBO, VBO & IBO, or even immediate mode for 5 quads. Not really surprising though, given how few vertices are involved.

ZbuffeR
02-15-2011, 01:21 AM
Even better when you learn it by yourself, right ? :)

V-man
02-15-2011, 05:38 AM
I once spent 4 months trying to figure out what was the fastest way to shut down my program: destroying the context, making the context non-current, doing a process kill, pulling the plug from the computer, or cutting the power lines to the entire building. It cost 82 million dollars to find the best solution and now I forgot what it was.

BionicBytes
02-15-2011, 07:08 AM
@V-man; you're too funny man! LOL.

Dark Photon
02-15-2011, 07:36 PM
Thanks man, I needed that laugh. :D