So what about Cg support? Is there any? I saw a post on this earlier, but it didn't make sense.
```
// enable vertex address use
EnableClientState(VERTEX_ATTRIB_ARRAY_UNIFIED_NV);
```
EnableClientState was deprecated in OpenGL 3.0 and removed in OpenGL 3.1.
If I use VertexAttribFormatNV and VertexAttribIFormatNV instead of VertexFormatNV, I think this code isn't necessary. Right?
Is anyone from NVIDIA still reading this topic?
Originally posted by Overmind
I would really like to know the answer to my question...
And I'm wondering whether we could put all vertex attributes in a uniform buffer instead of assigning them with VertexAttrib*.
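```glsl
#version 150
#extension GL_NV_shader_buffer_load : require

uniform mat4 g_mat4_modelViewProjection;

struct VERTEX
{
    vec2 texcoord;
    vec4 position;
};

// GPU address of the vertex array, set from the application with glUniformui64NV
uniform VERTEX *vertex;

out vec2 s_vec2_texcoord;

void main()
{
    s_vec2_texcoord = vertex[gl_VertexID].texcoord;
    gl_Position = g_mat4_modelViewProjection * vertex[gl_VertexID].position;
}
```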
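The shader above reads `vertex[gl_VertexID]` through a raw pointer, so the array the application writes into the buffer has to use exactly the stride and member offsets the shader assumes. A minimal C sketch of a matching CPU-side struct, assuming the shader side aligns `vec2` to 8 bytes and `vec4` to 16 bytes (an assumption here; check the NV_shader_buffer_load specification for the layout rules it actually uses). The names `vertex_t` and `vertex_byte_offset` are hypothetical:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */

/* CPU-side mirror of the GLSL struct
 *   struct VERTEX { vec2 texcoord; vec4 position; };
 * ASSUMPTION: vec2 is 8-byte aligned and vec4 is 16-byte aligned on the
 * shader side, so 8 bytes of padding follow texcoord. */
typedef struct {
    float texcoord[2];                /* bytes 0..7                    */
                                      /* bytes 8..15: implicit padding */
    _Alignas(16) float position[4];   /* bytes 16..31                  */
} vertex_t;

/* Fail the build if the layout drifts from the assumed one. */
static_assert(offsetof(vertex_t, texcoord) == 0,  "texcoord offset");
static_assert(offsetof(vertex_t, position) == 16, "position offset");
static_assert(sizeof(vertex_t) == 32,             "array stride");

/* Byte offset of element i in the buffer, i.e. where vertex[i] starts. */
size_t vertex_byte_offset(size_t i) {
    return i * sizeof(vertex_t);
}
```

If the real alignment rules differ, only the `_Alignas` and the expected values in the static assertions need to change; a mismatch then fails at compile time instead of producing garbled vertices.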
1) What kind of memory may a pointer to a UBO point to: constant memory, global memory, or both?
2) If it's global memory, how can I be 100% sure that the memory reads are coalesced (to use the CUDA term)?
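For question 2, a hedged suggestion borrowed from CUDA practice rather than from the GL extension specs: when one vertex is processed per invocation, an interleaved array makes neighboring invocations read `position` values a full struct stride apart, whereas a structure-of-arrays layout (one tightly packed array per attribute) makes them read adjacent addresses, which is the pattern CUDA's coalescing rules reward. Nothing in the extensions promises that such loads are actually coalesced. A minimal C sketch of the CPU-side conversion; `vertex_aos_t` and `aos_to_soa` are hypothetical names:

```c
#include <stddef.h>

/* Hypothetical interleaved (array-of-structures) vertex, mirroring the
 * shader's VERTEX struct on the CPU side. */
typedef struct {
    float texcoord[2];
    float position[4];
} vertex_aos_t;

/* Split n interleaved vertices into two tightly packed arrays
 * (structure of arrays): texcoords[2*i .. 2*i+1] and
 * positions[4*i .. 4*i+3]. Consecutive vertices then sit at consecutive
 * addresses within each array. */
void aos_to_soa(const vertex_aos_t *in, size_t n,
                float *texcoords, float *positions) {
    for (size_t i = 0; i < n; ++i) {
        for (int k = 0; k < 2; ++k)
            texcoords[2 * i + k] = in[i].texcoord[k];
        for (int k = 0; k < 4; ++k)
            positions[4 * i + k] = in[i].position[k];
    }
}
```

On the shader side, the single `VERTEX *` pointer would then be replaced by one pointer per attribute, for example `uniform vec2 *texcoords;` and `uniform vec4 *positions;` (again hypothetical names), both indexed by `gl_VertexID`.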
About bindless rendering: I can't reproduce any of the speed-ups NVIDIA mentioned.
I've made a simple program that issues thousands of draw calls per frame, using either normal VBOs or the bindless extension, and I can't get any speed-up from the extension. I've tried:
```
- render loop
  - change material A
  - render submesh A 1000 times
  - change material B
  - render submesh B 1000 times
  ...
```
and
```
- render loop
  - render 1000 times:
    - change material A
    - render submesh A
    - change material B
    - render submesh B
    ...
```
I've tried rendering a few different meshes, or only one a zillion times. I've tried large meshes and small, simple ones...
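The two loop orders above submit the same number of draws but differ in how many material changes they issue. A small C sketch that counts both for two submeshes; `change_material` and `draw_submesh` are hypothetical stubs that only increment counters, standing in for the real GL calls:

```c
#include <stddef.h>

/* Counters standing in for real GL state changes and draw calls. */
static size_t material_changes, draw_calls;

static void change_material(int m) { (void)m; ++material_changes; }
static void draw_submesh(int s)    { (void)s; ++draw_calls; }

void reset_counters(void) { material_changes = draw_calls = 0; }

/* First order: change material once, then draw its submesh `reps` times. */
void frame_batched(int submeshes, int reps) {
    for (int s = 0; s < submeshes; ++s) {
        change_material(s);
        for (int r = 0; r < reps; ++r)
            draw_submesh(s);
    }
}

/* Second order: repeat the whole material/submesh sequence `reps` times. */
void frame_interleaved(int submeshes, int reps) {
    for (int r = 0; r < reps; ++r) {
        for (int s = 0; s < submeshes; ++s) {
            change_material(s);
            draw_submesh(s);
        }
    }
}
```

With two submeshes and 1000 repetitions, both orders submit 2000 draws per frame; the first issues 2 material changes, the second 2000.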
In all cases there was no performance difference at all. It seems the performance gain NVIDIA claims is greatly overstated, or I'm doing something wrong, or something is wrong with my hardware (8600GT). Does anyone have a simple GL example program that shows an actual, impressive speed-up?
Of course, this API makes some things (somewhat) more convenient and allows complex data structures in shaders, but the promised speed-up for simple draw calls is nowhere to be found.
With "a few different meshes or only one", everything effectively stays in L1 cache. The whole point is the case with so many VBOs that the driver's lookups no longer fit in L2.
Quote: "but the promised speed-up with simple draw calls is nowhere to be found"
Identify your current bottleneck.
I assume bindless doesn't help fill rate at all (or only slightly), so try rendering to a small window, e.g. 320x240.
Thanks; I'll try with many different meshes, just to fill up GPU memory a bit. I have some other theories, though:
- My graphics card is too slow: the CPU has no problem keeping it busy, even with slow draw calls. Maybe I should try another card.
- I only looked at rendering time (FPS). I didn't look at CPU usage or profile the draw calls. Maybe I should do that instead of looking at rendering performance.
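Timing the CPU cost of the draw-call loop itself, rather than frames per second, targets the driver overhead that bindless is meant to reduce, which matters when the GPU is already the bottleneck. A minimal, portable C sketch using the standard `clock()` function; `submit_fn` and `count_calls` are hypothetical, and in a real test the callback would issue one frame's worth of draw calls:

```c
#include <time.h>

/* Hypothetical callback that issues one frame's draw calls. */
typedef void (*submit_fn)(void *user);

/* Average processor time, in milliseconds, consumed while running
 * submit(user) `frames` times. */
double avg_submit_cpu_ms(submit_fn submit, void *user, int frames) {
    if (frames <= 0)
        return 0.0;
    clock_t start = clock();
    for (int f = 0; f < frames; ++f)
        submit(user);
    clock_t end = clock();
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC / frames;
}

/* Stand-in for a real frame: only counts how often it was called. */
void count_calls(void *user) {
    ++*(long *)user;
}
```

Because GL commands are normally queued rather than executed immediately, this mostly reflects CPU-side submission cost, so comparing it between the plain-VBO and bindless paths is a more direct test than FPS. Note that with MSVC's C runtime, `clock()` returns elapsed wall-clock time rather than CPU time.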
The thing is, I'd like to use this extension in an existing rendering engine (Ogre3D), but I first want to see it actually gain something before I bother with the details.
On another note, I really like this new interface. It finally makes it possible to work with plain pointers on the GPU, à la CUDA.
Speaking of CUDA, it would be great if you could share a memory space with a CUDA program and just pass pointers back and forth between GL and CUDA. It would make interoperability super cheap. Or is this already possible?