Re: Official Bindless Graphics feedback thread
Quote:
Originally Posted by laanwj
Speaking about CUDA, it would be great if you could share a memory space with a CUDA program and just swap pointers between GL and back :) It would make interoperability super cheap. Or is this already possible?
I doubt it. GL_NV_shader_buffer_load says:
18) Is the address space per-context, per-share-group, or global?
RESOLVED: It is per-share-group. Using addresses from one share group in another share group will cause undefined results.
So the pointer could be shared across contexts in the same share group (but in that case you could already share OpenGL buffer IDs to much the same effect). However, as far as I know, CUDA does not actually share a context with OpenGL, meaning the two can use different virtual address spaces, so a pointer from one would not necessarily point to the same thing in the other (?).
I also hope(d) this was possible. For some reason the OpenGL interoperability in CUDA is very slow (has this been fixed/addressed? I haven't checked since CUDA 2.0, I believe). Judging from the performance, it appeared that these buffers were actually being copied to a different address space; it would be nice to be able to directly share global device memory between CUDA/OpenCL and OpenGL through pointers, without any copying.
Re: Official Bindless Graphics feedback thread
>EnableClientState(VERTEX_ATTRIB_ARRAY_UNIFIED_NV) ;
>EnableClientState has been deprecated by OpenGL 3.0 and removed from OpenGL 3.1.
>If I use VertexAttribFormatNV and VertexAttribIFormatNV instead of VertexFormatNV, I think this code isn't necessary. Right?
I think there's some confusion here. VERTEX_ATTRIB_ARRAY_UNIFIED_NV is the "on switch" for using GPU addresses for any vertex attributes, so it's still necessary. VertexAttribFormatNV sets the same format state as VertexAttribPointer; it doesn't do anything else implicitly.
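To make the split between the "on switch", the format, and the address concrete, here is a rough sketch of the setup order. It is not lifted from a real application: the BindlessGL table, setup_bindless_attrib, and the logging stand-ins are hypothetical names made up for illustration, and the GL types and token values are re-declared locally (as I read them in the extension specs) so the snippet compiles without GL headers. Real code would take them from glext.h and load the entry points through the usual GetProcAddress mechanism.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for GL types and tokens so this compiles without GL headers.
   Real code gets them from <GL/gl.h> and <GL/glext.h>; verify the values there. */
typedef unsigned int  GLenum;
typedef unsigned int  GLuint;
typedef int           GLint;
typedef int           GLsizei;
typedef unsigned char GLboolean;
typedef ptrdiff_t     GLsizeiptr;
typedef uint64_t      GLuint64EXT;

#define GL_FALSE                          0
#define GL_FLOAT                          0x1406
#define GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV 0x8F1E
#define GL_VERTEX_ATTRIB_ARRAY_ADDRESS_NV 0x8F20

/* Hypothetical table of entry points (normally loaded via GetProcAddress). */
typedef struct {
    void (*EnableClientState)(GLenum cap);
    void (*EnableVertexAttribArray)(GLuint index);
    void (*VertexAttribFormatNV)(GLuint index, GLint size, GLenum type,
                                 GLboolean normalized, GLsizei stride);
    void (*BufferAddressRangeNV)(GLenum pname, GLuint index,
                                 GLuint64EXT address, GLsizeiptr length);
} BindlessGL;

/* Point a 3-float attribute at the GPU address of a resident buffer. */
void setup_bindless_attrib(const BindlessGL *gl, GLuint index,
                           GLuint64EXT address, GLsizeiptr length,
                           GLsizei stride)
{
    assert(gl != NULL);
    /* The "on switch": attributes are fetched from GPU addresses from now on. */
    gl->EnableClientState(GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV);
    gl->EnableVertexAttribArray(index);
    /* Format only: the same format state VertexAttribPointer would set. */
    gl->VertexAttribFormatNV(index, 3, GL_FLOAT, GL_FALSE, stride);
    /* Where the data lives; this replaces the buffer bind plus offset. */
    gl->BufferAddressRangeNV(GL_VERTEX_ATTRIB_ARRAY_ADDRESS_NV, index,
                             address, length);
}

/* Logging stand-ins, so the call order can be inspected without a context. */
static char trace[16];
static int  trace_len = 0;
static void note(char c) { if (trace_len < 15) trace[trace_len++] = c; }

static void t_enable_client_state(GLenum cap)
{ note(cap == GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV ? 'U' : '?'); }
static void t_enable_vertex_attrib_array(GLuint index)
{ (void)index; note('E'); }
static void t_vertex_attrib_format(GLuint index, GLint size, GLenum type,
                                   GLboolean normalized, GLsizei stride)
{ (void)index; (void)size; (void)type; (void)normalized; (void)stride; note('F'); }
static void t_buffer_address_range(GLenum pname, GLuint index,
                                   GLuint64EXT address, GLsizeiptr length)
{ (void)index; (void)address; (void)length;
  note(pname == GL_VERTEX_ATTRIB_ARRAY_ADDRESS_NV ? 'A' : '?'); }

const BindlessGL trace_gl = {
    t_enable_client_state, t_enable_vertex_attrib_array,
    t_vertex_attrib_format, t_buffer_address_range
};
```

Calling setup_bindless_attrib(&trace_gl, 0, address, length, stride) records the four calls in order. The point is just that the enable and the address range are separate steps from the format call, which is why dropping the enable doesn't work.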
Regarding deprecation, this sounds like a missing interaction. It would make sense for Enable to accept the new tokens, similar to how primitive restart was incorporated into GL3.1. But I don't think it works that way today.
> And I'm wondering if we could put all vertex attributes in an uniform buffer, instead of assigning them by VertexAttrib*?
Based on your code snippet, I think you're asking if you can put the pointer to an interleaved vertex buffer in a uniform. Yes, that should work.
Re: Official Bindless Graphics feedback thread
Quote:
Originally Posted by jeffb
> And I'm wondering if we could put all vertex attributes in an uniform buffer, instead of assigning them by VertexAttrib*?
Based on your code snippet, I think you're asking if you can put the pointer to an interleaved vertex buffer in a uniform. Yes, that should work.
The bindless API does look quite nice, but it seems you've missed a trick here. There's no need for the client to call all these methods to set up the layout of the buffer. LangFox touches on this point.
The simplest process is:
1. make a buffer and upload your vertex information
2. get a pointer to this buffer and update the vertex program (see LangFox's program above)
3. Draw n vertices and let the vertex program deal with retrieving the data.
All I can see that is missing is a DrawVertices(type, count) method and a DrawInstancedVertices(type, count, instances) method.
It seems you've only considered how direct access to GPU memory would help vertex arrays, but when you have direct access, you don't need vertex arrays.
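For what it's worth, here is an untested sketch of step 2 as I picture it. The shader string is my own guess at the shape of such a vertex program (LangFox's actual program may well differ), and PointerGL, publish_vertex_pointer, and the fake_* stand-ins are hypothetical names for illustration only. GL types and token values are re-declared locally so it compiles without GL headers; real code would use glext.h and load the NV entry points normally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for GL types and tokens; real code takes them from glext.h. */
typedef unsigned int GLenum;
typedef int          GLint;
typedef uint64_t     GLuint64EXT;

#define GL_ARRAY_BUFFER          0x8892
#define GL_READ_ONLY             0x88B8
#define GL_BUFFER_GPU_ADDRESS_NV 0x8F1D

/* Guess at the vertex program: it fetches its own data through a pointer
   uniform, indexed by gl_VertexID, with no vertex attributes declared. */
const char *const vertex_shader_src =
    "#version 130\n"
    "#extension GL_NV_shader_buffer_load : require\n"
    "struct Vertex { vec4 position; };\n"
    "uniform Vertex *vertices; /* GPU address of the vertex buffer */\n"
    "void main() {\n"
    "    gl_Position = vertices[gl_VertexID].position;\n"
    "}\n";

/* Hypothetical table of entry points (normally loaded via GetProcAddress). */
typedef struct {
    void (*MakeBufferResidentNV)(GLenum target, GLenum access);
    void (*GetBufferParameterui64vNV)(GLenum target, GLenum pname,
                                      GLuint64EXT *params);
    void (*Uniformui64NV)(GLint location, GLuint64EXT value);
} PointerGL;

/* Step 2. Assumes step 1 is done (buffer filled and bound to
   GL_ARRAY_BUFFER) and that the program owning `location` is in use. */
GLuint64EXT publish_vertex_pointer(const PointerGL *gl, GLint location)
{
    GLuint64EXT address = 0;
    assert(gl != NULL);
    /* Shaders may only dereference addresses of resident buffers. */
    gl->MakeBufferResidentNV(GL_ARRAY_BUFFER, GL_READ_ONLY);
    gl->GetBufferParameterui64vNV(GL_ARRAY_BUFFER, GL_BUFFER_GPU_ADDRESS_NV,
                                  &address);
    /* Hand the address to the vertex program as a pointer uniform. */
    gl->Uniformui64NV(location, address);
    return address;
}

/* Fake driver, only so the flow can be exercised without a GL context. */
static int         fake_resident = 0;
static GLuint64EXT fake_uniform_value = 0;

static void fake_make_resident(GLenum target, GLenum access)
{ (void)target; (void)access; fake_resident = 1; }
static void fake_get_parameter(GLenum target, GLenum pname, GLuint64EXT *params)
{ (void)target; *params = (pname == GL_BUFFER_GPU_ADDRESS_NV) ? 0xABCD00u : 0u; }
static void fake_uniform(GLint location, GLuint64EXT value)
{ (void)location; fake_uniform_value = value; }

const PointerGL fake_gl = {
    fake_make_resident, fake_get_parameter, fake_uniform
};
```

Step 3 is deliberately left out: it is exactly the attribute-free "draw n vertices" call that I'm saying is missing.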
Regards
elFarto
Re: Official Bindless Graphics feedback thread
To help ensure that bindless never dies and hopefully gets merged into the core in some form... ;)
Just wanted to follow up here and mention that merely by using NV_vertex_buffer_unified_memory, I've seen draw-time reductions of as much as 40% (about half of that comes from wiring the vertex attribs and index arrays directly to GPU memory rather than fetching through bound VBO handles, and the other half from getting rid of the now-needless VBO buffer binds). This is on slow CPUs as well as fast ones with twice the cache, and on real-world database content, not contrived test cases.
Others have seen ~50% draw time reductions (2X speedup).
So in case it's not already "in the cards", please do add bindless->ARB/core to the list and/or bump it up in priority. It's free performance for the vendors, more complex content we can put in front of users (good for us and the GPU vendors), and increased demand for the latest graphics cards. Not to mention it increases the appeal of developing in OpenGL.