Fences



mikeman
05-03-2009, 06:26 AM
Hello. I am currently making a renderer that uses both Direct3D and OpenGL. I came across the GL_NV_fence extension, which seems interesting and could boost performance. My questions are:

1) Why doesn't ATI have an extension for it? Did they just not bother, or does the hardware not support it?
2) Does anyone know if Direct3D (9/10) has anything equivalent to nv_fence?

Thanks.

dletozeun
05-03-2009, 06:44 AM
Maybe because it does not boost performance all that much. Could you explain what performance gain you expect from this extension? As far as I understand, that is not its purpose.

If you want to explain to us what you have in mind using this extension, we might help you to find out a more portable solution.

mikeman
05-03-2009, 06:49 AM
Well, I've just heard, from someone pretty knowledgeable, that using nv_vertex_array_range and nv_fence is actually faster than using VBO, and I guess I was wondering if that was actually true...

wizard
05-03-2009, 06:54 AM
Say for instance that you have a PBO packing operation going on and don't want to map before the operation has executed. A fence gives you the ability to do just that. The general idea is to minimize CPU stalls.
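
To make the idea concrete, here is a toy sketch of polling a fence instead of blocking. It is only a simulation: FakeGpu and all of its methods are made-up names, not a GL binding. In real code the corresponding NV_fence calls would be glSetFenceNV and glTestFenceNV.

```python
from collections import deque


class FakeGpu:
    """Toy stand-in for a GPU command queue (hypothetical, not a GL binding)."""

    def __init__(self):
        self.pending = deque()  # commands submitted but not yet executed
        self.submitted = 0      # running count of submitted commands
        self.completed = 0      # running count of executed commands

    def submit(self, name):
        self.pending.append(name)
        self.submitted += 1

    def set_fence(self):
        # Roughly what glSetFenceNV does: mark the current point in the stream.
        return self.submitted

    def test_fence(self, fence):
        # Roughly what glTestFenceNV does: a non-blocking "has the GPU passed it?"
        return self.completed >= fence

    def tick(self):
        # Pretend the GPU executes one queued command.
        if self.pending:
            self.pending.popleft()
            self.completed += 1


def read_back_without_stalling(gpu):
    gpu.submit("draw")
    gpu.submit("read_pixels_into_pbo")
    fence = gpu.set_fence()

    cpu_work_done = 0
    while not gpu.test_fence(fence):
        cpu_work_done += 1  # useful CPU work instead of sitting in a blocking map
        gpu.tick()          # in reality the GPU advances on its own
    # Only now would the real code map the PBO; the map should not stall.
    return cpu_work_done
```

The point of the loop is that the CPU never waits inside the driver; it only asks whether the work is done and carries on if it is not.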

dletozeun
05-03-2009, 07:11 AM
>> using nv_vertex_array_range and nv_fence is actually faster than using VBO

It does not make sense to me.

>> Say for instance that you have a PBO packing operation going on and don't want to map before the operation has executed. A fence gives you the ability to do just that. The general idea is to minimize CPU stalls.

That makes sense. :) But this can be done through multithreading.

mikeman
05-03-2009, 07:42 AM
>>It does not make sense to me.

Hm... then is it somehow possible that using nv_vertex_array_range instead of ARB_vertex_buffer_object gives you a performance boost? Because that's what I've been told.

zeoverlord
05-03-2009, 10:25 AM
Maybe compared to immediate mode and old vertex arrays, but not VBOs. They were designed to do away with the bottlenecks that nv_vertex_array_range tried to fix, so they are way faster.
Technically speaking, nv_fence will never speed things up unless you're doing something that requires a glFinish.

Brolingstanz
05-03-2009, 02:05 PM
>> Does anyone know if Direct3D(9/10) has anything equivalent to nv_fence?

Not really. There's an event query to test wholesale completion of command buffer processing, and there's a flag to Map() in d3d10 to avoid mapping a buffer with pending rendering operations (kinda nice).

As things become massively parallel going forward, I'm not sure how the big picture will develop in this area or how useful any of this will be 1 or 2 hw generations from now. If, say, we get DLs back as specialized command buffers, how would fences figure into that yard? Anyhoo, I'm too busy contemplating the fullness of bindless graphics to pursue this further...

Ysaneya
05-03-2009, 03:12 PM
>> Well, I've just heard, from someone pretty knowledgeable, that using nv_vertex_array_range and nv_fence is actually faster than using VBO, and I guess I was wondering if that was actually true...

Yeah that was true... back in 2003 or so when the VBO extension was introduced. Using nv_vertex_array_range today is asking for trouble. It's dinosaur-programming.

Y.

wizard
05-03-2009, 03:52 PM
>> using nv_vertex_array_range and nv_fence is actually faster than using VBO

>> It does not make sense to me.

>> Say for instance that you have a PBO packing operation going on and don't want to map before the operation has executed. A fence gives you the ability to do just that. The general idea is to minimize CPU stalls.

>> That makes sense. :) But this can be done through multithreading.

Multithreading yes, but you'd be using two contexts just for being able to issue the blocking Map on the other thread. It's usually better to have just one thread doing the GL work.

qzm
05-03-2009, 10:57 PM
>> Technically speaking, nv_fence will never speed things up unless you're doing something that requires a glFinish.

Of course not. But if you NEED to do a glFinish, or use any of the functionality that implicitly does one, without blocking the CPU, then without fences you are stuffed.

THIS is why it improves things a lot for some applications.

wizard
05-04-2009, 02:14 AM
>> Technically speaking, nv_fence will never speed things up unless you're doing something that requires a glFinish.

>> Of course not. But if you NEED to do a glFinish, or use any of the functionality that implicitly does one, without blocking the CPU, then without fences you are stuffed.

>> THIS is why it improves things a lot for some applications.

We implemented our video capturing system by letting one thread issue requests for PBO packs from a round-robin queue of framebuffers and then read the data from the PBO into the video writer. Because we have a single thread doing the OGL stuff, not using fences gives huge performance penalties due to the blocking map operation. A fence gives us the option of letting the other thread just poll for completion and never block the one and only GL thread.
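
A rough sketch of what such a round-robin readback loop might look like is below. It is a pure simulation of one possible reading of the setup above: capture, gpu_latency and num_pbos are hypothetical names, and "the fence has signalled" is modelled as "enough frames have passed". No GL calls are made.

```python
from collections import deque


def capture(num_frames, gpu_latency=2, num_pbos=3):
    """Simulate a round-robin PBO readback queue guarded by fences.

    Each readback is assumed to finish gpu_latency frames after it is
    issued; that stands in for a non-blocking fence test succeeding.
    """
    in_flight = deque()  # (frame_index, frame_at_which_fence_signals)
    written = []         # frames handed to the video writer, in order
    dropped = []         # frames skipped because every PBO was busy

    for frame in range(num_frames):
        # Poll the oldest fences; never wait on one.
        while in_flight and in_flight[0][1] <= frame:
            written.append(in_flight.popleft()[0])

        if len(in_flight) < num_pbos:
            # Issue the async readback and set a fence right behind it.
            in_flight.append((frame, frame + gpu_latency))
        else:
            # No free PBO: skip the frame rather than stall the GL thread.
            dropped.append(frame)

    # At shutdown a blocking wait is harmless, so drain what is left.
    while in_flight:
        written.append(in_flight.popleft()[0])
    return written, dropped
```

With enough PBOs to cover the latency, nothing is dropped and nothing blocks. With too few (say num_pbos=1 and a latency of 2 frames), this sketch drops frames instead of stalling; a real capture system might prefer to grow the queue.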

Jackis
05-04-2009, 03:15 AM
Well, actually, 2 years ago on NV40 hardware, in some cases old VAR with fences was 5-15% faster than VBO. I can confirm this!
But that was the case in only a limited number of situations, and the code supporting VAR was a horrible monster, so we removed it without any regrets and fully switched to VBO.
Don't use VAR, please! :)

dletozeun
05-04-2009, 03:42 AM
>> Multithreading yes, but you'd be using two contexts just for being able to issue the blocking Map on the other thread. It's usually better to have just one thread doing the GL work.

Not necessarily. You create only one context, map the buffer in that one, and give the returned address to the second thread, which will update the buffer data and notify the main thread when it's done.
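
As a minimal sketch of that hand-off, assuming nothing about the real renderer: a bytearray stands in for the pointer you would get from glMapBuffer, and update_buffer / fill_on_worker are made-up names. The worker thread never touches GL; it only writes through the mapped memory and signals when it is finished.

```python
import threading


def fill_on_worker(mapped, done):
    # Worker thread: no GL context needed, it only writes into the mapping.
    for i in range(len(mapped)):
        mapped[i] = i % 256
    done.set()  # notify the GL thread that the data is in place


def update_buffer(size):
    # GL thread: in real code this would be the glMapBuffer call.
    mapped = bytearray(size)
    done = threading.Event()

    worker = threading.Thread(target=fill_on_worker, args=(mapped, done))
    worker.start()

    # The GL thread is free to issue unrelated commands here.

    done.wait()
    worker.join()
    # In real code the GL thread would call glUnmapBuffer at this point.
    return bytes(mapped)
```

Note this covers uploading data into a buffer. It does not remove the wait when mapping a buffer the GPU is still writing to.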

wizard
05-04-2009, 06:44 AM
>> Multithreading yes, but you'd be using two contexts just for being able to issue the blocking Map on the other thread. It's usually better to have just one thread doing the GL work.

>> Not necessarily. You create only one context, map the buffer in that one, and give the returned address to the second thread, which will update the buffer data and notify the main thread when it's done.

We might be talking about different things here. You seem to be talking about updating a buffer and I'm talking about reading the contents of a GPU filled buffer (for instance through read pixels).

dletozeun
05-04-2009, 08:16 AM
Yes, you are right. Though for packing operations like glReadPixels, the main point of PBOs is asynchronous reads/writes, so you do not need glFinish calls or fences: glReadPixels will return immediately. But I must admit that once you want to map the buffer, you are stuck. :)

wizard
05-04-2009, 08:51 AM
>> Yes, you are right. Though for packing operations like glReadPixels, the main point of PBOs is asynchronous reads/writes, so you do not need glFinish calls or fences: glReadPixels will return immediately. But I must admit that once you want to map the buffer, you are stuck. :)

Yeah. The problem here is that read pixels is asynchronous when you have a pack buffer bound and that a fence can basically tell you when all the framebuffer modifying operations have finished and you can expect data in the PBO soon.