
View Full Version : Bindless graphics with OpenGL 3.x



Vexator
09-06-2010, 02:05 PM
Hi! I read the previous threads on bindless graphics as well as NVIDIA's tutorial and presentation, but I still can't get it to work. glGetBufferParameterui64vNV() does return valid addresses, but my app freezes as soon as I attempt to render from a resident buffer. Here's a comparison between the code I used before and what it looks like now. Maybe you can spot the mistake. Thank you!

classic:

// setup index buffer
glGenBuffers( 1, &ib->id );
glBindBuffer( GL_ELEMENT_ARRAY_BUFFER, ib->id );
glBufferData( GL_ELEMENT_ARRAY_BUFFER, ib->size, &ib->indices[0], GL_STATIC_DRAW );

// setup vertex buffer
glGenBuffers( 1, &vb->id );
glBindBuffer( GL_ARRAY_BUFFER, vb->id );
glBufferData( GL_ARRAY_BUFFER, vb->size, &vb->vertices[0], GL_STATIC_DRAW );

// rendering
glBindBuffer( GL_ARRAY_BUFFER, vb->id );
glBindBuffer( GL_ELEMENT_ARRAY_BUFFER, ib->id );

glEnableVertexAttribArray( 0 );
glVertexAttribPointer( 0, 3, GL_FLOAT, false, sizeof(Vertex), (void *)0 );

glDrawArrays( vb->topology, 0, vertexCount );


bindless:

// setup index buffer
glGenBuffers( 1, &ib->id );
glBindBuffer( GL_ELEMENT_ARRAY_BUFFER, ib->id );
glBufferData( GL_ELEMENT_ARRAY_BUFFER, ib->size, &ib->indices[0], GL_STATIC_DRAW );
glGetBufferParameterui64vNV( GL_ELEMENT_ARRAY_BUFFER, GL_BUFFER_GPU_ADDRESS_NV, &ib->address );
glMakeBufferResidentNV( GL_ELEMENT_ARRAY_BUFFER, GL_READ_ONLY );

// setup vertex buffer
glGenBuffers( 1, &vb->id );
glBindBuffer( GL_ARRAY_BUFFER, vb->id );
glBufferData( GL_ARRAY_BUFFER, vb->size, &vb->vertices[0], GL_STATIC_DRAW );
glGetBufferParameterui64vNV( GL_ARRAY_BUFFER, GL_BUFFER_GPU_ADDRESS_NV, &vb->address );
glMakeBufferResidentNV( GL_ARRAY_BUFFER, GL_READ_ONLY );

// rendering
glEnableClientState( GL_ELEMENT_ARRAY_UNIFIED_NV );
glBufferAddressRangeNV( GL_ELEMENT_ARRAY_ADDRESS_NV, 0, ib->address, ib->indexCount*sizeof(uint) );

glEnableClientState( GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV );
glVertexAttribFormatNV( 0, 3, GL_FLOAT, false, sizeof(Vertex) );
glBufferAddressRangeNV( GL_VERTEX_ATTRIB_ARRAY_ADDRESS_NV, 0, vb->address, vb->indexCount*sizeof(Vertex) );

glDrawArrays( vb->topology, 0, vertexCount );

glDisableClientState( GL_ELEMENT_ARRAY_UNIFIED_NV );
glDisableClientState( GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV );

abolz
09-07-2010, 12:58 AM
Hi. I think you still need to call glEnableVertexAttribArray(0), and you should use glDraw[Range]Elements to source your indices from the currently enabled ELEMENT_ARRAY_BUFFER.

Dark Photon
09-07-2010, 05:33 AM
Yes. Also, it doesn't make sense that your code is creating/filling/enabling the Draw*Elements index list buffer (GL_ELEMENT_ARRAY_BUFFER / GL_ELEMENT_ARRAY_UNIFIED_NV) when you are drawing with DrawArrays, not Draw*Elements. Those should be disabled for this draw call.

And abolz is right too about still needing to enable the vtx attrib array. I missed seeing that one.
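Putting both corrections together, the resident-buffer render path might look something like this. This is only a sketch (it assumes glDrawElements is the intended draw call, that attrib 0 is the only enabled array, and that the vertex buffer has a vb->vertexCount member, which the original code doesn't show; it used vb->indexCount there, which is one of the suspect values):

// Sketch only -- requires a context with NV_vertex_buffer_unified_memory
// and the buffers already made resident as in the setup code above.
glEnableClientState( GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV );
glEnableClientState( GL_ELEMENT_ARRAY_UNIFIED_NV );

glEnableVertexAttribArray( 0 );   // this was missing in the bindless version
glVertexAttribFormatNV( 0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex) );
glBufferAddressRangeNV( GL_VERTEX_ATTRIB_ARRAY_ADDRESS_NV, 0,
                        vb->address, vb->vertexCount * sizeof(Vertex) );
glBufferAddressRangeNV( GL_ELEMENT_ARRAY_ADDRESS_NV, 0,
                        ib->address, ib->indexCount * sizeof(GLuint) );

// Source indices from the resident element array instead of glDrawArrays():
glDrawElements( vb->topology, ib->indexCount, GL_UNSIGNED_INT, 0 );

glDisableVertexAttribArray( 0 );
glDisableClientState( GL_ELEMENT_ARRAY_UNIFIED_NV );
glDisableClientState( GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV );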

Beyond that, your basic approach looks good. If abolz's tips don't get you going, my guess is you have a math error in the pointer start/size values you're providing to the driver. For instance, I'm suspicious of sizeof(Vertex), vb->indexCount*sizeof(Vertex), and vertexCount. Does sizeof(Vertex) == 12? And does vb->indexCount == vertexCount?

As always, I fall back to a really simple case when I have trouble and scale up from there. You might try drawing one triangle from hard-coded data first. Failing that, post a short GLUT test program.

Dark Photon
09-07-2010, 05:45 AM
BTW, while we're back talking about bindless and batch performance, one really cool thing you can do with bindless (not rocket science) is combine it with the "Streaming VBO" approach Rob Barris describes here (http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=273141#Post273141). But allow reuse (in other words, take advantage of temporal coherence in your scene from frame to frame). Then subsequent reuse batch dispatches launch with NVIDIA display list performance.

This is an easy way to give formerly client-arrays code display-list performance, but without having to allocate dedicated VBO memory on the GPU for each batch (and all the memory allocation/fragmentation mess that entails).

Essentially it just uses the streaming VBO as an L1 cache on the GPU, with bindless giving you the "crazy-fast launch" when the batch is already resident on the GPU.

For more details on the technique Rob described, see:
* VBOs strangely slow? (OpenGL.org thread, 2/23/10) (http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=273141#Post273141)
* Buffer Object Streaming (OpenGL.org Wiki) (http://www.opengl.org/wiki/Buffer_Object_Streaming)

Vexator
09-07-2010, 11:24 AM
Sorry, I pasted the wrong line; I'm actually using glDrawElements(). Thanks for the input, you two, I'll have a look!