OT: about GL + drivers + OS

WARNING: This is an incredibly OT post.

I am wondering how the GL API works: how a driver is implemented, and what the role of the OS is in all of this.

Someone once said this:

“D3D does kernel mode switches much more frequently than OpenGL. Last I looked, kernel mode switches are expensive. If OpenGL is ever made to enter the kernel for each buffer set-up call, that’ll certainly slow it down tremendously.”

What does the above mean exactly? Does “kernel mode switch” mean switching to ring 0?

Take NVIDIA's implementation, for example: they have a file called nvoglnt.dll. I'm guessing that nvoglnt.dll is a standard DLL which manages the GL contexts that exist in the system, and that its role is to send commands to the graphics card and basically "run GL".

What does it do to accomplish that? Does it call functions in some .sys file?

What about D3D? What exactly does it do that is different from GL?

I would like to know more, such as: how is memory managed on the video card? Is the driver 100% responsible for uploading what the GPU needs? Is it responsible for managing AGP memory as well?

At the risk of talking out of my ass, with the very limited knowledge I have of the internals of the GL driver model on Windows …

So yes, a kernel mode switch would be a switch to a CPU mode that has access to the hardware, such as ring 0. I don't know whether the other rings have direct access to the hardware. Knowing that a crashing driver may freeze the whole computer such that the OS can't do anything about it, I guess it is indeed running in ring 0 (stated according to the disclaimer above).

[This message has been edited by Humus (edited 08-14-2003).]

First, some 386+ background: the CPU can work in several "execution" modes, dubbed ring 0 to ring 3, with ring 0 being the most privileged mode of execution, where any instruction will be executed "as is". In the other modes, "privileged" instructions (port accesses, memory accesses as described by the memory descriptor tables, etc.) can cause faults, which can be trapped by a ring 0 routine and fixed in a proper way.

Windows only uses ring 0 (for the kernel & drivers) and ring 3 (for apps). A call to a function implemented inside a driver (Escape/ExtEscape calls, OS service interrupts, IOCTLs…) implies a context switch from app (ring 3) to kernel (ring 0).
Actually, for video drivers there's a third player in the game: the miniport driver, which is normally in charge of implementing hardware interrupt handling and IOCTLs.

In win32, the ICD is a lonesome cowboy (as opposed to the MCD model, where you only had to implement some hooks): the OS provides the minimum help (mainly window-tracking callbacks which hook into some display driver code, access to GART mapping routines, and some other kernel-level utility functions, e.g. for synchronisation), and the ICD implements the rest of the functionality (every glXXX call and then the wgl functions), normally in userspace.

The role of opengl32.dll is to act as a bridge between the application and the ICD. The ICD exports a big table of functions to opengl32 (at least one entry for each OpenGL function - this is a slight simplification I'm making here), so the only thing opengl32 has to do is jump to the adequate function from that table when the app issues a glXXXX call.
This approach somewhat detaches the OpenGL app from the ICD being used, but it brings more problems than it solves (we have to stick to OpenGL 1.1 until Microsoft decides to upgrade - if ever).
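
To make the dispatch idea concrete, here's a minimal sketch (in C, and emphatically not the real opengl32.dll source - the table layout and names are my own simplification) of what that trampoline looks like:

```c
#include <windows.h>
#include <GL/gl.h>

typedef void (APIENTRY *PFN_GLVERTEX3F)(GLfloat, GLfloat, GLfloat);

/* One slot per OpenGL entry point, filled in from the table the ICD
   hands back when it is loaded / a context is created. */
struct IcdDispatch {
    PFN_GLVERTEX3F Vertex3f;
    /* ... and so on for every gl function ... */
};

static struct IcdDispatch g_icd;   /* populated at ICD load time */

/* The exported glVertex3f is then pure forwarding: */
void APIENTRY glVertex3f(GLfloat x, GLfloat y, GLfloat z)
{
    g_icd.Vertex3f(x, y, z);
}
```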

Calling the ICD a driver (Installable Client Driver) is quite misleading, as the ICD is just a DLL running in userspace (same as the directx runtime supplied by MS - not the directx driver supplied by IHVs).

As I said, some of that functionality is implemented in the display driver and called from the ICD via driver escapes (and thus executed in ring 0), but most of it is implemented in userspace.

Obviously this means that at initialisation time, via a call to a display driver function, the ICD requests several resources to be made available from userspace (DMA buffers, AGP memory mappings, graphics chip registers…). This is something Microsoft doesn't like at all, because it's a risk from both a security and a stability standpoint: any application could sweep the whole memory space, hit hardware registers and crash the system.

There are safer ways of implementing the ICD, but they are normally slower: for example, not mapping DMA buffers/hardware registers into userspace, so that the ICD has to build some kind of command stream in a normal memory buffer and issue a driver Escape call to a routine (implemented in the driver) which copies/translates the stream of commands into the DMA buffer.
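
As a hedged illustration of that safer path (the escape code and the command encoding below are invented - real escape numbers are vendor-private):

```c
#include <windows.h>

#define ESCAPE_SUBMIT_COMMANDS 0x7001  /* hypothetical escape code */

static BYTE  g_cmdBuf[64 * 1024];      /* userspace staging buffer */
static DWORD g_cmdBytes;               /* bytes queued so far      */

/* Hand the batched commands to the kernel-mode display driver,
   which validates and copies them into the real DMA buffer.
   One ring transition covers a whole batch of commands. */
static void FlushCommands(HDC hdc)
{
    if (g_cmdBytes == 0)
        return;
    ExtEscape(hdc, ESCAPE_SUBMIT_COMMANDS,
              (int)g_cmdBytes, (LPCSTR)g_cmdBuf, 0, NULL);
    g_cmdBytes = 0;
}
```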

Regarding directx vs. opengl, directx has a userspace dll (the directx runtime, written by Microsoft and used regardless of which graphics card you have), and then the IHVs provide a directx driver.
This approach limits the way IHVs can implement things (all the IHV's code is executed in ring 0), but it makes things safer (the driver doesn't need to map things into userspace, as it won't be able to access them from there) and more integrated in the system (for example, it will be hard for you to make XP's window transparency capabilities work over an OpenGL window, but they work - or should work - flawlessly over a directx one).
The drawback is that because the directx driver is "really" a driver, any programming error there is going to bring the whole system down; in addition, kernel debugging is harder than userspace debugging. These are some of the reasons why you normally see more hangs using directx than opengl.

I don't have much knowledge of the unix side of things, but I think the following is accurate: on linux/unix you have the vanilla GLX client/server model (which is too slow to be used on PCs), DRI implementations (Direct Rendering Infrastructure - bypasses the GLX network protocol), and IHVs' self-brewed equivalents of DRI (I think nvidia's "closed source" driver falls into this category).

[This message has been edited by evanGLizr (edited 08-14-2003).]

Yes, I sort-of meant “switching to ring 0” when I said “switch to kernel mode”, except it’s not just the cost of getting to ring 0 (which I think is “just” a pipeline flush to enforce that pending memory accesses take exceptions in the right protection level) but also all the OS overhead of protecting argument areas and OS stacks from each other, dispatching to the appropriate driver, etc.

Meanwhile, calling into a DLL which is loaded into the application memory space is very cheap. Good hardware allows you to map the command pipeline memory for a single hardware execution thread into the address space of the application, so you don’t have to switch to write that particular memory; you only have to switch once you’ve filled up your pipe and need to bang the “go” register.
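
Speculatively, in code (every name here is illustrative - this is the shape of the idea, not a real API):

```c
#include <windows.h>
#include <stdint.h>

#define ESCAPE_KICK 0x7002        /* hypothetical escape code */

typedef struct {
    volatile uint32_t *fifo;      /* command ring mapped into our process */
    uint32_t head, size;
} GpuChannel;

/* Queuing work is just memory writes from ring 3 - no mode switch. */
static void EmitCommand(GpuChannel *ch, uint32_t cmd)
{
    ch->fifo[ch->head] = cmd;
    ch->head = (ch->head + 1) % ch->size;
}

/* Only when the pipe fills up (or we must flush) do we pay for the
   kernel transition that bangs the hardware's "go" register. */
static void Kick(GpuChannel *ch, HDC hdc)
{
    ExtEscape(hdc, ESCAPE_KICK, sizeof(ch->head),
              (LPCSTR)&ch->head, 0, NULL);
}
```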

I'm sure the details are a little different (and vary between hardware), but I have reason to believe that this is the gist of the difference. I'd be very interested if vendors could elaborate, of course :)

Originally posted by jwatte:
Yes, I sort-of meant “switching to ring 0” when I said “switch to kernel mode”

OT, but … V-man == jwatte?

[This message has been edited by Humus (edited 08-15-2003).]

Doh!

Now you got me ;)

OT, but … V-man == jwatte?
Not unless V-man was quoting himself in the original post and asking someone to clarify a statement he had made. Maybe they are the same person, but with multiple personalities. Hey, V-man and jwatte, is either one of you by any chance a member of a fight club?

BTW, I have nothing whatsoever to contribute to this thread. Carry on.

Heh … got it now.
I missed that the original post actually had a quote.

Sorry about the confusion. I just didn't want to name any names when quoting.

From what evanGLizr said about there being a third player in the game - the miniport driver - I gather that the IHV's GL DLL calls functions inside the miniport driver, which finally takes care of the communication with the graphics card.

How does the switch between ring 0 and ring 3 occur exactly? Does the OS know that you are trying to call a function inside a driver (SYS or VXD) and set the CPU's registers, or …?

Originally posted by evanGLizr:
Obviously this means that at initialisation time, via a call to a display driver function, the ICD requests several resources to be made available from userspace (DMA buffers, AGP memory mappings, graphics chip registers…). This is something Microsoft doesn't like at all, because it's a risk from both a security and a stability standpoint: any application could sweep the whole memory space, hit hardware registers and crash the system.

Why does the ICD ask for DMA buffers, hardware registers, graphics hw registers?

AGP memory is different, since we know the famous NV function that gives us access to it.
Actually, I don't know if I've ever heard an explanation for that one. How come we can write to and read from AGP memory? Shouldn't it cause a fault?

Also, I believe it should be possible to replace opengl32.dll.
If all it does is load the IHV's DLL and blindly route functions, anyone can write opengl32.dll.
Brian Paul should have put some effort into this.

One more thing:
When we install the video drivers, does the MS Direct3D DLL get overwritten?

So in essence, for D3D:

myAPP -> ihv_D3D.dll -> port.SYS -> hardware

For GL:

myAPP -> opengl32.dll -> ihv_gl.dll -> port.SYS -> hardware

Originally posted by V-man:
From what evanGLizr said about there being a third player in the game - the miniport driver - I gather that the IHV's GL DLL calls functions inside the miniport driver, which finally takes care of the communication with the graphics card.

No, the miniport contains support routines for the display driver, not for the ICD. It also contains the hardware interrupt handling routines and, I think, the hardware initialisation routines as well, called by Windows' hardware config layer at OS startup.

The display driver normally accesses the miniport routines through IOCTLs (Input/Output Control calls).

Anyway, I just mentioned the miniport for completeness; almost all the opengl-related routines implemented in the display driver don't need calls into the miniport.


How does the switch between ring 0 and ring 3 occur exactly? Does the OS know that you are trying to call a function inside a driver (SYS or VXD) and set the CPU's registers, or …?

One of the ways of doing ring switching (besides causing a fault and thus triggering the fault servicing routine) is by calling an interrupt (much in the way you called interrupt 10h in old DOS to access VGA BIOS routines, for example). This and other ways of ring switching are described in Intel's x86 documentation.

Nevertheless, in Windows you never call the system interrupt directly; you use the Escape or ExtEscape win32 API functions, which allow you to issue escapes to the display driver.
Those driver escapes get serviced by the display driver in its escape servicing routine.
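
For instance, this is how a userspace program can probe the display driver via ExtEscape (QUERYESCSUPPORT is a real, documented escape that asks whether a given escape code is supported; the vendor escape code being queried is made up):

```c
#include <windows.h>
#include <stdio.h>

#define MY_VENDOR_ESCAPE 0x7100   /* hypothetical vendor escape code */

int main(void)
{
    HDC hdc = GetDC(NULL);        /* DC for the primary display */
    int query = MY_VENDOR_ESCAPE;

    /* Ask the display driver whether it services this escape.
       The call itself goes ring 3 -> ring 0 -> display driver. */
    int supported = ExtEscape(hdc, QUERYESCSUPPORT,
                              sizeof(query), (LPCSTR)&query, 0, NULL);

    printf("escape 0x%x supported: %s\n",
           MY_VENDOR_ESCAPE, supported > 0 ? "yes" : "no");
    ReleaseDC(NULL, hdc);
    return 0;
}
```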


Why does the ICD ask for DMA buffers, hardware registers, graphics hw registers?

The way you can control a given piece of hardware is by one of:

  • accessing its registers (via out/in instructions).
  • accessing memory areas which are mapped to hardware registers.
  • putting commands inside a buffer and telling the hardware somehow to consume them asynchronously from the CPU (DMA transfer mechanism).

So you need one (or more) of those three resources to be able to access any hardware. You need to map them to be accessible from the ICD, from the display driver, or from both. In my previous post I explained why you need them to be accessible from the ICD (performance).
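
A toy sketch of the second option (memory-mapped registers), assuming the driver has already mapped the register aperture into our address space - the offset and the semantics are invented:

```c
#include <stdint.h>

#define REG_FIFO_PUT  (0x0040 / 4)   /* hypothetical register offset */

/* `regs` points at the user-mapped register aperture; a plain store
   is routed by the chipset straight to the graphics hardware. */
static void WriteReg(volatile uint32_t *regs, uint32_t reg, uint32_t val)
{
    regs[reg] = val;
}

/* e.g. tell the chip how far the command FIFO has been filled: */
static void UpdatePut(volatile uint32_t *regs, uint32_t putOffset)
{
    WriteReg(regs, REG_FIFO_PUT, putOffset);
}
```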


AGP memory is different, since we know the famous NV function that gives us access to it.
Actually, I don't know if I've ever heard an explanation for that one. How come we can write to and read from AGP memory?

Hmm, there's a lot of confusion regarding AGP memory. AGP memory is just a piece of (normally uncached) system memory (RAM) that is mapped suitably through the GART (Graphics Address Remapping Table - see Intel's link for more information).

Reading AGP memory doesn't cause a fault; it's just slow, because it's uncached.
The reason why it's normally allocated as uncached is that the AGP protocol doesn't snoop any of the CPU caches (this is different from how the PCI protocol works), so either you flush your caches after filling the AGP memory, or you just map that memory uncached and off you go.

Actually, it's not totally true that the memory is mapped uncached: if available, it's normally mapped as "write-combined", which means that the CPU will buffer writes to sequential addresses and send the data to memory in bursts (which acts as if you had a tiny cache). That's why writing to sequential addresses of AGP memory is fast, but reading, or writing to random addresses, is slow.
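
In code, the consequence of that write-combining behaviour looks like this (`agp` stands for a pointer the driver has mapped for us; the functions are just for illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* Fast: strictly ascending stores get merged into memory bursts
   by the CPU's write-combining buffers. */
void upload(volatile uint32_t *agp, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        agp[i] = src[i];
}

/* Slow: every read goes all the way to RAM (uncached), and scattered
   writes defeat the combining buffers. */
uint32_t readback(volatile uint32_t *agp, size_t i)
{
    return agp[i];
}
```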


[On AGP accesses]
Shouldn't it cause a fault?

I fail to see why it should cause a fault. If a piece of memory is mapped to be accessible from a user application, it won’t cause a fault.


Also, I believe it should be possible to replace opengl32.dll.
If all it does is load the IHV's DLL and blindly route functions, anyone can write opengl32.dll.
Brian Paul should have put some effort into this.

Sure, anyone can write opengl32.dll, and there are several "utilities" based on that fact (gltrace, wallhacks, etc.).
I guess the biggest problem here would be an IP one: AFAIK, since the Fahrenheit fiasco, Microsoft is the only one with the rights to implement OpenGL on Windows.
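
The gltrace trick, roughly (a replacement opengl32.dll that logs and forwards to the real one; the file name and the single wrapped function are just to show the shape of it):

```c
#include <windows.h>
#include <stdio.h>
#include <GL/gl.h>

typedef void (APIENTRY *PFN_GLCLEAR)(GLbitfield);
static PFN_GLCLEAR real_glClear;

/* Assumes the genuine system DLL was copied next to us under
   another name before our impostor got installed. */
static void LoadReal(void)
{
    HMODULE real = LoadLibraryA("opengl32.real.dll");
    real_glClear = (PFN_GLCLEAR)GetProcAddress(real, "glClear");
}

void APIENTRY glClear(GLbitfield mask)
{
    if (!real_glClear)
        LoadReal();
    fprintf(stderr, "glClear(0x%x)\n", mask);  /* trace */
    real_glClear(mask);                        /* forward */
}
```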


One more thing:
When we install the video drivers, does the MS Direct3D DLL get overwritten?

The driver is independent from the directx dll.

So in essence, for D3D:

myAPP -> ihv_D3D.dll -> port.SYS -> hardware

Not exactly, it would be something like:

app -> MS directx layer -> IHV directx driver.

That's assuming the given function call the app used forces a call into the directx driver (I think MS's directx dll normally batches calls and only calls the driver for DrawPrimitive routines and the like).

For GL:

myAPP -> opengl32.dll -> ihv_gl.dll -> port.SYS -> hardware

Again, it would be something like

app -> opengl32.dll -> IHV ICD -> IHV display driver

And that's assuming that the given OpenGL function requires a driver escape to be executed; otherwise (most of the time, even for rendering routines) the function call may just end at the IHV ICD.

[This message has been edited by evanGLizr (edited 08-18-2003).]

Originally posted by evanGLizr:

I guess the biggest problem here would be an IP one: AFAIK, since the Fahrenheit fiasco, Microsoft is the only one with the rights to implement OpenGL on Windows.

There would be consistency issues. What if you had an NV card in your machine, then pulled it out and threw in an ATI card? If they used different versions of opengl32.dll, a lot of things would break. So all vendors would have to replace opengl32.dll as part of their own drivers.

But if they did that, they probably wouldn’t pass WHQL.

There has to be a layer between the OS and the driver that is stable and consistent across different hardware and drivers. The unfortunate part is that that layer is owned by MS.

Originally posted by PK:
[Replacing opengl32.dll]
There would be consistency issues.
[…]
So all vendors would have to replace opengl32.dll for their own drivers.

Obviously I didn't mean that any IHV could replace opengl32.dll unilaterally (although that would be fun!). I was thinking more along the lines of having some ARB member do that (for example, Intel already has an OpenGL extension SDK/helper).
I think there was some talk about that at the time of the OpenGL 2.0 proposals.


But if they did that, they probably wouldn’t pass WHQL.

Funny thing is that AFAIK there’s no WHQL test for OpenGL. The only one I know of is a mode-change stress test, which I think belongs to an optional set of tests.
And I don’t think they care to run SGI’s OpenGL conformance tests for WHQL (although I could be wrong).


There has to be a layer between the OS and the driver that is stable and consistent across different hardware and drivers. The unfortunate part is that that layer is owned by MS.

Technically speaking, I don't think you need that layer at all; that's one of the benefits of using dynamic libraries.

You can implement the whole ICD as if it were opengl32.dll and you would be fine. Each vendor could supply their own version of opengl32.dll and nothing would break, unless you were trying to run OpenGL 1.4 apps with a 1.1 dll, of course (but that's the same problem as with any other program requiring updated versions of dlls - mfc/comctl32, anyone?).

The case where it would break is when someone codes the app in a way that assumes a given implementation of opengl32.dll (importing functions by ordinal, for example).
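
To illustrate (ordinal 42 is arbitrary; nothing here is specific to any real opengl32.dll build):

```c
#include <windows.h>

typedef void (WINAPI *PFN)(void);

void example(void)
{
    HMODULE gl = LoadLibraryA("opengl32.dll");

    /* Robust: import by name works against any conforming build. */
    PFN byName = (PFN)GetProcAddress(gl, "glFlush");

    /* Fragile: the ordinal layout is an implementation detail of
       one particular DLL and may change in a replacement. */
    PFN byOrdinal = (PFN)GetProcAddress(gl, MAKEINTRESOURCEA(42));

    (void)byName;
    (void)byOrdinal;
}
```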

I see two cases where the ICD model (splitting opengl32 from the driver) is relevant:

  • When an app wants to be able to work with, say, 1.1 and 1.4 implementations, using 1.4 functions where available. In this case having a common opengl32.dll with 1.4 exports (and using the current ICD indirection mechanism) will allow the app to startup and determine at runtime which rendering path to use.
  • In multiboard && multivendor && multimonitor scenarios, where you need a common opengl32.dll but several drivers. I don’t think this scenario is very usual.

[This message has been edited by evanGLizr (edited 08-18-2003).]

I fail to see why it should cause a fault. If a piece of memory is mapped to be accessible from a user application, it won’t cause a fault.

Well, what does being mapped do, exactly?
If in our app we ask for AGP memory, then I suppose this is like using malloc or new, so the AGP memory now belongs to our process and no other process can access it.

From what I have learned, when a process does a memory access, the CPU checks whether the address belongs to the process (in protected mode).
I have forgotten the details of the steps.

I know those Intel pages, but they are not enough to learn system programming from.

Doing a search for ExtEscape, the only things I found were people trying to get extra info about their printers, and one page saying that RivaTuner uses it.

I looked at atioglxx.dll and nvoglnt.dll.
They both export a set of Drv* functions.
ATI exports all GL 1.1 functions, while NV doesn't.
I will try to create an opengl32.dll (out of interest), but it will take time.

gltrace and other such DLLs basically bind to the system's opengl32.dll, so they aren't replacements.

being mapped means being mapped into the appropriate process space.

since 386 days, x86 cpus hold per-task page tables that effectively remap what's where (virtual addresses to physical ones).

i'm pretty certain everyone here's noticed that most .exe's under win32 load at the 4 MB address (0x00400000 and above); this is not the physical (contiguous) address, it's the virtual address (it's really a mapped view-of-file)… since every process (task in x86 speak) has its own page table, each process/task can live in the illusion of being alone.

in the context of agp memory here, win32 maps the agp aperture (window, if you will) into the requesting process's space by remapping that process's virtual-to-physical pages, so that its page table now has access to agp memory.
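
a quick way to see the virtual-address point for yourself (the module handle of a win32 exe is its base address):

```c
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* classically 0x00400000 for .exe's, regardless of where the
       pages really live in physical RAM */
    printf("exe base: %p\n", (void *)GetModuleHandle(NULL));
    return 0;
}
```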

In multiboard && multivendor && multimonitor scenarios

style

[This message has been edited by mattc (edited 08-21-2003).]

woh! kewl. ok, I'm a newbie, plain language is in effect.

what would I use as a universal interface for OpenGL version x, for programming?

i.e. #include ~.h

I am so not in the know about ARBs and extensions and so forth, so I need a good starting point besides running through all the posts trying to make sense of them.

I'm trying to stay as broad-based as possible across vendors/OSes, and I know the specs pretty much keep everything standard; what provides a common/standard starting point for OpenGL that still gives access to all the fast hardware rendering capabilities?

Do I need to use wgl.h and then include a common opengl32.dll, from which I could probably do an inventory of all the extensions and functionality, and from which I could then build an interface?

ciaran,

please start your own thread instead of posting here. Your question is totally unrelated to this topic.

Originally posted by evanGLizr:
- In multiboard && multivendor && multimonitor scenarios, where you need a common opengl32.dll but several drivers. I don’t think this scenario is very usual.
cautiously raises hand
I have Catalysts, Dets and Kyro drivers installed on my machine. Whenever I want to check something out, I simply swap cards. Shutdown, one out, one in, reboot, done.

Vendor specific opengl32.dlls would make this impossible, I’d have to reinstall drivers each and every time.

“Vendor specific opengl32.dlls would make this impossible, I’d have to reinstall drivers each and every time”

No, it wouldn't cause a problem for you. The OS detects the card and installs the right driver for you.

Not that this thread is about creating vendor-specific opengl32.dlls.

Anyway, I tried some stuff, and I have to say it's not as easy as I thought it would be. There is a complex interaction between gdi32.dll, opengl32.dll and the vendor's DLL.

I created my own DLL and it sort of works, but I've just touched the tip of the iceberg. The problem is I don't know how to use the Drv* functions.
I have no idea what the return values are supposed to be, and ditto for the parameters.

I had to guess and, luckily, nothing crashed, but a certain function kept returning 0, which I think means failure.
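
For the record, here's roughly how I'm poking at it - the Drv* names below are what those DLLs actually export, but the signature of DrvGetProcAddress is my guess, not documentation:

```c
#include <windows.h>

/* guessed prototype - resolves extension/vendor entry points */
typedef PROC (APIENTRY *PFN_DRVGETPROCADDRESS)(LPCSTR name);

static HMODULE               g_icd;
static PFN_DRVGETPROCADDRESS g_DrvGetProcAddress;

/* load the vendor ICD by file name, e.g. "nvoglnt.dll" */
static BOOL LoadIcd(const char *path)
{
    g_icd = LoadLibraryA(path);
    if (!g_icd)
        return FALSE;

    g_DrvGetProcAddress = (PFN_DRVGETPROCADDRESS)
        GetProcAddress(g_icd, "DrvGetProcAddress");

    /* other exports to resolve the same way: DrvSetPixelFormat,
       DrvCreateContext, DrvSetContext, DrvSwapBuffers, ... */
    return g_DrvGetProcAddress != NULL;
}
```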

Not easy at all!