Inconsistent OpenGL implementations

One of the major headaches of developing OpenGL applications is dealing with inconsistencies between the OpenGL drivers of different vendors. Please give me some advice on how to solve some of my driver-related issues.

I started developing my OpenGL app on an ATI card. I guess I was spoiled by its OpenGL implementation and built some of my application's basic assumptions solely on experience with ATI's card. Then I moved on to test the app on an NVIDIA card and found that one of my assumptions (about the pixel ownership test) isn't universally the same. On further testing on an Intel card, I found another early assumption of mine (about the scissor test) doesn't behave the same either. In the end, I had to fall back on the Microsoft OpenGL implementation, which is not accelerated, simply defeating the purpose of taking advantage of the hardware acceleration offered by recent advances in graphics cards. Is this the norm in OpenGL app development? In my experience, the ATI and Microsoft OpenGL drivers are closely consistent; NVIDIA is less so, and Intel is worse. It's very frustrating to see a solution developed on one OpenGL driver not work as expected on another.

The following are the two problems that give me the most headaches and for which I haven't found reasonable solutions:

  1. The GL_BACK buffer becomes undefined. One of my early assumptions about OpenGL was that the GL_BACK framebuffer could serve as a backing store for my windows (it's a multi-window app). This assumption holds under ATI: for some reason, ATI never trashes the GL_BACK framebuffer and always draws to it, even if the window is off-screen. As a result, I was able to use it as a backing store for any expose events; instead of redrawing the exposed area, I simply blitted it by swapping buffers. However, this approach no longer works under the NVIDIA and Intel OpenGL drivers. On NVIDIA cards, the GL_BACK framebuffer becomes undefined when (a) the window is partially off-screen or (b) it's partially obscured by other windows. On Intel cards, the GL_BACK buffer appears to be trashed once SwapBuffers() is called.

So I ended up developing an alternative approach to my backing-store needs, namely doing my own backing store. To do that, I need a reliable buffer to draw to and to capture my backing store from. I can't really rely on the GL_BACK buffer because of the different implementations of the pixel ownership test. The only choice left is a pbuffer. The problem with pbuffers is that they're a separate off-screen surface with slow pixel transfers to other buffers, unless you use a certain WGL extension which doesn't exist on Intel cards; simply put, there are too many intermediate steps, defeating the purpose of a backing store. All I need is a magic PFD_XXX flag that would make all OpenGL drivers behave like ATI's: always draw to the GL_BACK buffer even if the window is obscured or off-screen, and never trash it unless told to.

  2. The scissor test doesn't work as I expected on Intel cards. I use the scissor test to cut down the redraw area, and this technique works on ATI and NVIDIA cards, but it doesn't behave correctly on Intel cards. When you perform a scissored clear on a given area, the area within the scissor box gets cleared by the glClear() call, but what happens to the area outside the scissor boundary? I expect it to remain unchanged, but on Intel cards it gets trashed. What are my options here?

I greatly appreciate anyone who can help.

Thanks

-Sharp

  1. The OpenGL spec does not dictate what happens to the back buffer after a buffer swap, so from that point of view none of the implementations is wrong. What you could try, however, is WGL_ARB_pixel_format, which gives you the option of specifying the swap mode yourself.
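For example, a pixel format with copy-on-swap behaviour could be requested roughly like this. This is only a sketch: it assumes the wglext.h header is available and that a dummy context is already current so the extension entry point can be fetched, and no driver is required to actually expose a format with this swap method:

```c
#include <windows.h>
#include <GL/gl.h>
#include "wglext.h"   /* WGL_* tokens and PFNWGLCHOOSEPIXELFORMATARBPROC */

/* Returns a pixel format index with copy-on-swap behaviour, or 0 if
   the extension or such a format is unavailable. */
int choose_copy_swap_format(HDC hdc)
{
    PFNWGLCHOOSEPIXELFORMATARBPROC wglChoosePixelFormatARB =
        (PFNWGLCHOOSEPIXELFORMATARBPROC)wglGetProcAddress("wglChoosePixelFormatARB");
    if (!wglChoosePixelFormatARB)
        return 0;                      /* extension not supported */

    const int attribs[] = {
        WGL_DRAW_TO_WINDOW_ARB, GL_TRUE,
        WGL_SUPPORT_OPENGL_ARB, GL_TRUE,
        WGL_DOUBLE_BUFFER_ARB,  GL_TRUE,
        WGL_SWAP_METHOD_ARB,    WGL_SWAP_COPY_ARB,  /* ask for blit-on-swap */
        WGL_COLOR_BITS_ARB,     24,
        0                                           /* terminator */
    };
    int  format = 0;
    UINT count  = 0;
    if (!wglChoosePixelFormatARB(hdc, attribs, NULL, 1, &format, &count) || count == 0)
        return 0;                      /* no matching format */
    return format;
}
```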

Personally, though, I wouldn’t even dream of relying on the buffer swapping behaviour of a driver. There are other ways to store intermediate images. Pbuffers are just one of them – you could also use a texture rectangle, or look into the WGL_ARB_buffer_region extension.

  2. This would indicate that Intel's driver is in clear violation of the spec. Your options are to either work around this in your code, or to complain to Intel and hope that they fix their drivers.

Thanks, Tom.

I am a big fan of www.delphi3d.net . It's a very good site; even though I don't program in Delphi, I've learned a lot of techniques from there.

You're saying there's a way in the WGL_ARB_pixel_format extension that gives me the option to dictate the swap mode. Will the various OpenGL drivers listen? Excellent. I'll look into it.

The WGL_ARB_buffer_region and texture rectangle techniques are good ways to back up intermediate images. I already use both of them in my app. The problem is that the original image has garbage in it due to the failed pixel ownership test on NVIDIA and Intel cards when windows overlap or go off-screen. A pbuffer seems to be the only choice, since it will always pass the pixel ownership test, but the WGL_ARB_buffer_region and texture rectangle techniques wouldn't work with a pbuffer for transferring intermediate images into the main buffers, would they?

As for the scissor test, how do I work around this problem? Is there an alternative way in OpenGL to achieve the same result, like using the stencil buffer?

I thought glClear ignored the scissor region?

This is one of my pet peeve questions. :wink:

Pixel ownership is pretty well defined in the OpenGL spec. It says:
“The first test is to determine if the pixel at location (xw, yw) in the framebuffer is currently owned by the GL (more precisely, by this GL context). If it is not, the window system decides the fate of the incoming fragment. Possible results are that the fragment is discarded or that some subset of the subsequent per-fragment operations are applied to the fragment. This test allows the window system to control the GL’s behavior, for instance, when a GL window is obscured.”

A pixel is represented by all of its buffer bits, which means your assumption that the back buffer (or depth or stencil buffer) is unclipped, or that there is any memory available to render to for overlapped or partly off-screen windows, is simply wrong.
Reading from or writing to those areas results in undefined data.
This is especially true on OpenGL implementations which share off-screen buffers among all applications (workstation-class boards do). You can't overlap two OpenGL windows and expect both to have unclipped buffers to render to.
You need to program repaint event handlers as you would for single-buffered (or plain GDI) applications. A newly exposed area must be redrawn into all of the pixel's buffers. How you do that correctly is your decision.

Selecting pixel formats with different swap behaviours via the PIXELFORMATDESCRIPTOR dwFlags values PFD_SWAP_EXCHANGE and PFD_SWAP_COPY won't change anything about the pixel ownership test. It just means that if you select a PFD_SWAP_COPY format, it will never exchange the front and back buffers but will blit back to front on SwapBuffers, nothing more. Exchange formats define that the back buffer is undefined after a swap. You should never rely on back buffer contents after a swap.

The only way to render into a guaranteed unclipped buffer is to use pbuffers or Frame Buffer Objects (FBOs).
Pbuffers can be read from directly with the make-current-read extension:
http://oss.sgi.com/projects/ogl-sample/registry/ARB/wgl_make_current_read.txt

But I would recommend you use FBOs, that’s the better alternative and doesn’t need multiple contexts. For example you could directly render-to-texture with an FBO and redraw your screen with a textured quad.
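As a sketch of that render-to-texture approach (assuming the EXT_framebuffer_object entry points are already loaded, width/height are the window size, and error handling is omitted):

```c
/* Create a texture and attach it as the FBO's color buffer.
   Drawing into the FBO is never clipped by other windows. */
GLuint fbo, tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

glGenFramebuffersEXT(1, &fbo);
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                          GL_TEXTURE_2D, tex, 0);

/* ... render the scene into the FBO here ... */

/* On an expose event: bind the window's framebuffer again and
   redraw the exposed area with a quad textured with 'tex'. */
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);
glBindTexture(GL_TEXTURE_2D, tex);
```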

knackered, glScissor is one of the few calls which affect glClear. If that's not happening on Intel's implementation, it's grossly broken. Update the driver and file a bug with Intel if it persists.

No. glClear() works with glScissor().
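For reference, a minimal scissored clear looks like this (a sketch assuming a current GL context): only the 64x64 region is cleared, and pixels outside it must remain untouched per the spec.

```c
glEnable(GL_SCISSOR_TEST);
glScissor(10, 10, 64, 64);       /* x, y, width, height in window coords */
glClearColor(0.0f, 0.0f, 0.0f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glDisable(GL_SCISSOR_TEST);      /* stop clipping subsequent rendering */
```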

By the way, the Intel OpenGL driver's scissor test is fine. It behaves correctly; it was my mistake to think it was the culprit.

SwapBuffers() is my problem. Intel implements the cheapest swap method, PFD_SWAP_EXCHANGE: it exchanges the contents of the back and front buffers on the SwapBuffers() call.

Also, I looked into the WGL_ARB_pixel_format extension. The swap methods are just hints and might not be honored by a driver.

Relic,

You are absolutely right about the spec on the pixel ownership test. But the spec sucks; it's a bad spec in my opinion. Thanks to ATI's implementation, I as an application developer have to do less work, because it takes care of the pixel ownership issue in a developer-friendly way. It also takes care of the swap behavior.

Just think about it: how much code I have to write to support the backing store on my own (I already did), how much additional video memory I have to consume, and how many additional steps I have to perform to save/restore the backing store, where the driver guys could do all of that more easily and efficiently with the back buffer. Under ATI's implementation, I don't do any of it. For NVIDIA and Intel cards, I have to add more code, use more resources, get more bugs, and of course get slower performance. Repeat this 100 times for 100 different companies; it all adds up.

Be fair, there are really only four vendors, and you only need one workaround (pbuffers, or possibly FBOs). And they would argue that for the vast majority of applications, their back buffer mechanism is the most efficient in terms of resources: think of NVIDIA's unified depth buffer on the Quadros. You shouldn't really rely on ATI retaining its current mechanism either; the next hardware revision may lead them to change it for some optimization, and they won't be violating the spec. The only person violating the spec here is you, sharp.

Oh, I checked the spec, and yes, you're all correct: glClear clears only the scissor region if scissoring is enabled.

But the spec sucks; it's a bad spec in my opinion. Thanks to ATI's implementation, I as an application developer have to do less work, because it takes care of the pixel ownership issue in a developer-friendly way.
Why is it a bad spec? Because it doesn’t mesh perfectly with your needs?

The vast majority of OpenGL users do not have your needs. They don't need the presence of the data to be guaranteed. More importantly, by leaving the behavior undefined, the ARB has allowed driver developers to make swapping as fast as their hardware can make it. It gives driver developers the flexibility to implement the fastest swap mechanism for their hardware, rather than slaving them to one person's particular needs.

Originally posted by sharp:
This assumption works under ATI.
Until you enable multisampling …

Until you enable multisampling …
And now, yet another reason not to rely on undefined behavior…

I want to stress one thing here: inconsistent driver implementations of the OpenGL spec among ICD vendors. Please don't tell me this has never bothered you, in the past or even now.

When I started my OpenGL development, people warned me about driver issues, but I said it's 10+ years after the birth of OpenGL; it shouldn't be much of an issue by now. Well, guess what: I'm running into the same hassles other people ran into in the past. Nevertheless, I like OpenGL and just wish it would become more unified over time. 3D graphics is the future of the GUI (Qt4, Windows Vista, etc.).

I agree that the pixel ownership and swap behavior problems I'm complaining about are really non-issues for game development. However, for engineering/visualization multi-window applications, those issues are a big thing. I dug up an old GL extension: GL_Autodesk_valid_back_buffer_hint. I can't find the spec, but from the name of this extension, I guess it's about the importance of valid content in the back buffer.

I've never owned a professional/workstation type of graphics card (FireGL, Quadro, etc.). My issues are probably already taken care of on those types of cards, since they tend to be geared toward my type of needs, like overlays, backing store, and sprite animation. But the point is that you can do the same stuff on consumer-type graphics cards now, without those expensive cards.

On the implementation side, I want to clarify a few things based on my experiments:

  1. ATI and Intel seem to be on the same page in dealing with the pixel ownership test: off-screen or obscured window areas retain valid pixel data after rendering. NVIDIA's don't.

  2. Intel doesn't do PFD_SWAP_COPY as the buffer swap method; it only does PFD_SWAP_EXCHANGE. Both ATI and NVIDIA do PFD_SWAP_COPY and retain valid back buffer contents.

  3. Both the WGL_ARB_buffer_region and texture rectangle techniques work with pbuffers, and can therefore be used to copy intermediate images from a pbuffer into the main buffers.

  4. WGL_SWAP_METHOD_ARB is just a hint; I couldn't make the drivers obey it. The same is true of PFD_SWAP_METHOD.

I want to stress one thing here: inconsistent driver implementations of the OpenGL spec among ICD vendors. Please don't tell me this has never bothered you, in the past or even now.
There’s a big difference between “the spec said X, but the driver developers weren’t paying attention,” and “the spec said undefined, and the driver developers are doing different things for the undefined behavior.”

Undefined means undefined. It doesn’t mean, “Well, do what I would like it to anyway.” It means that the drivers can do anything, including contradictory things, for that behavior. If ATI’s driver wants to flip-flop between swap methods (copy vs. exchange), and at random intervals invalidate unowned pixels (filling them with garbage), I don’t care, as long as it doesn’t affect performance or OpenGL-defined behavior.

While I’m all for drivers following the specs better, this particular example is not an example of behavior that needs to be “fixed”.

  1. ATI and Intel seem to be on the same page in dealing with the pixel ownership test: off-screen or obscured window areas retain valid pixel data after rendering. NVIDIA's don't.
    I don't think you're quite understanding the whole concept of "undefined" behavior. It means that it is spec-legal for a driver to do whatever it wants. More specifically, you should not rely on a particular implementation's handling of undefined behavior.

The proper way to code what you want, relying only on defined behavior (and therefore correctly), is to have only one path. This path renders your stuff to an off-screen buffer, which you then copy to the back buffer. You should not be sitting around trying to figure out how each driver implements undefined behavior.

  4. WGL_SWAP_METHOD_ARB is just a hint; I couldn't make the drivers obey it. The same is true of PFD_SWAP_METHOD.
    Even if these were mandatory rather than hints, it wouldn't matter. The pixel ownership test is still relevant. The graphics card doesn't have to actually render to unowned regions; I could imagine an implementation putting an implicit scissor box or something around unowned regions, so that nothing gets rendered into them. Such an implementation is perfectly valid. As such, forcing a card to use a copy swap rather than an exchange swap doesn't matter.

Originally posted by sharp:
2. Intel doesn't do PFD_SWAP_COPY as the buffer swap method; it only does PFD_SWAP_EXCHANGE. Both ATI and NVIDIA do PFD_SWAP_COPY and retain valid back buffer contents.
The only swap behavior supported on ATI the last time I checked was WGL_SWAP_UNDEFINED_ARB. The driver will choose what swap behavior is the most appropriate given the situation. If you’re running in a window with no AA, then it will be copied. If you enable AA (which the end user may do behind the back of your app) you’ll get black. If you’re in fullscreen you’ll get the previous frame’s contents, or in case triple buffering is enabled in the control panel you get the contents of two frames ago.

This is by design in OpenGL. If you need a particular swap behavior, you’ll have to specify it when selecting a pixel format using WGL_ARB_pixel_format. But no implementation is forced to support any pixel formats with a particular swap behavior so you may end up without any matching pixel formats.

As for the pixel ownership test: personally, I would have preferred it if this were specified to apply only to the front buffer, and the back buffer were guaranteed to be updated. But unfortunately that's not what the spec says, so you can't rely on that.

Originally posted by Humus:
[quote]Originally posted by sharp:
2. Intel doesn't do PFD_SWAP_COPY as the buffer swap method; it only does PFD_SWAP_EXCHANGE. Both ATI and NVIDIA do PFD_SWAP_COPY and retain valid back buffer contents.
The only swap behavior supported on ATI the last time I checked was WGL_SWAP_UNDEFINED_ARB. The driver will choose what swap behavior is the most appropriate given the situation. If you’re running in a window with no AA, then it will be copied. If you enable AA (which the end user may do behind the back of your app) you’ll get black. If you’re in fullscreen you’ll get the previous frame’s contents, or in case triple buffering is enabled in the control panel you get the contents of two frames ago.
[/QUOTE]Sorry for the thread hijack, but regarding what you said about fullscreen and AA enabled: I have a current issue where glCopyTexSubImage gets the previous frame's data when running ATI+fullscreen+AA. The fullscreen window is not covered or overlapped by any other window.

Is that expected? I was assuming it was a bug. (Quite an annoying bug as well)

I also have the same problem with the inconsistency of the swap buffers operation (but I don't blame anyone :slight_smile: ), and I have read several threads about this; mostly people suggest the pbuffer workaround. But if I really want PFD_SWAP_COPY behavior, isn't it easier to get it just by calling glCopyPixels (with ReadBuffer=GL_BACK and DrawBuffer=GL_FRONT)? Or is there any problem with that (like worse performance or whatever)? The only thing I can think of is that the back buffer can be invalidated by overlapping windows, but that's generally not a problem, because the OS tells you which area was invalidated, so you can easily redraw it.

And the second question associated with this: do I have to call glFinish before glCopyPixels or not?
( I have tested it on my hardware and I don't have to call glFinish, but I'm not sure if I can count on that... )

Originally posted by sqrt[-1]:
[b]Sorry for the thread hijack, but regarding what you said about fullscreen and AA enabled: I have a current issue where glCopyTexSubImage gets the previous frame's data when running ATI+fullscreen+AA. The fullscreen window is not covered or overlapped by any other window.

Is that expected? I was assuming it was a bug. (Quite an annoying bug as well)[/b]
If you call glCopyTexSubImage after the swap, then yes, that’s expected behavior. If you call it before the swap, you of course get the current back buffer contents (unless of course you have called glReadBuffer(GL_FRONT)).

Originally posted by Trahern:
But if I really want to do PFD_SWAP_COPY operation, isnt it easier to do this just by calling glCopyPixels ( with ReadBuffer=GL_BACK and DrawBuffer=GL_FRONT )? Or is there any problem with that ( like worse performance or whatever )…the only thing I can think about is that backbuffer can be invalidated by overlaping windows but its generaly not a problem becouse OS tells you what area was invalidated so you can easily redraw it.
Should work. I don’t know what performance you’ll get though.
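A sketch of that glCopyPixels "manual swap" might look like this (assuming a current double-buffered context and GL 1.4 for glWindowPos2i; on older GL, set the raster position via an orthographic projection and glRasterPos2i instead):

```c
/* Blit the back buffer to the front buffer instead of SwapBuffers(),
   emulating PFD_SWAP_COPY behaviour. */
void manual_swap(int width, int height)
{
    glReadBuffer(GL_BACK);
    glDrawBuffer(GL_FRONT);
    glDisable(GL_DEPTH_TEST);             /* don't depth-test the blit fragments */
    glWindowPos2i(0, 0);                  /* raster position at the window origin */
    glCopyPixels(0, 0, width, height, GL_COLOR);
    glFlush();                            /* push the front-buffer write out */
    glDrawBuffer(GL_BACK);                /* restore the draw buffer */
}
```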

Originally posted by Trahern:
And the second question associated with this: do I have to call glFinish before glCopyPixels or not?
( I have tested it on my hardware and I don't have to call glFinish, but I'm not sure if I can count on that... )

You never have to call glFinish() or glFlush(). Any operation that requires sync will sync up automatically. If you know you’ll have to do an operation that’s going to require a sync, such as copying or reading back pixels, you may get better performance though if you call glFlush() right after submitting the last draw call and then try to do meaningful tasks on the CPU for a while before attempting the read-back/copy. Or you can [ab]use occlusion queries to get better control.
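The pattern described here could be sketched as follows (render_frame(), do_cpu_work(), and the pixels buffer are hypothetical placeholders for your own code):

```c
/* Overlap CPU work with GPU rendering before a read-back. */
render_frame();                          /* submit the frame's last draw calls */
glFlush();                               /* start the GPU working now, don't wait */
do_cpu_work();                           /* meaningful CPU tasks in the meantime */
glReadPixels(0, 0, width, height,        /* implicit sync: blocks only until the */
             GL_RGBA, GL_UNSIGNED_BYTE,  /* GPU has finished rendering           */
             pixels);
```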

Thanks for the answer, Humus.
However, I have run into one problem with my glCopyPixels workaround, and it's VSync. As far as I know, OpenGL performs VSync only in the SwapBuffers call (on Win32), and in my solution I don't use it. So does anyone know if it's possible to synchronize my app even without calling SwapBuffers?
If not, I will probably have to do the pbuffer/FBO workaround...

Trahern wrote:
So does anyone know if it's possible to synchronize my app even without calling SwapBuffers?
I suspect you are confusing synchronizations here.

Vsync (vblank synchronization) is only there to avoid “tearing” during SwapBuffers.

The synchronization Humus wrote about is the synchronization where the server needs to finish, e.g., a drawing job before it can return the intended result from a buffer modified by a previously dispatched operation.