
Thread: OpenGL and how the driver works?


  1. #1 cippyboy (Junior Member Newbie)


    I have been a graphics programmer for quite some time now (OpenGL and DirectX), but I have never quite understood some of the more intricate details of rendering, or how they are implemented in hardware and drivers.

    1) My first question, to whoever knows how a driver works (or should work), is about draw call consistency. I realize it exists; I'm just curious whether the official specs say anything about it, because I haven't read anything of the sort anywhere. What do I mean? Take a draw call: I send vertices, they end up as pixels. But say the hardware has 1000 streams and only the last 50 pixels are left to process. That means 50 threads working in parallel with 950 idling, because there are no more pixels, right? Does the GPU wait until all the work from one draw call finishes before beginning a new one? Or can it start a new draw call and process new vertices, even pixels, before the previous one is entirely finished? Those 950 streams could theoretically run even two small draw calls before the last one finishes, if the pixel shader of the previous call is complex enough. If it does wait, that would help explain why sending more data in one batch (on newer hardware) is faster than sending a dozen smaller batches.

    2) My second question actually comes a bit before that: do pixel shaders get invoked only after all vertices are processed? Say we again have 1000 streams and 48 vertices in the pipe, with 952 streams idling. Can those 952 streams process pixels from vertices that are already done, or do they all wait until every vertex has been processed? The pipeline is described as in-order across stages, but I haven't seen anything saying that pixels can't be processed as soon as rasterization of earlier primitives is complete.

    3) The case of new hardware architectures: if we already have unified architectures that can do any type of computation in parallel, does current-gen hardware (HD 7xxx and GTX 7xx) still have fixed hardware dedicated to, say, rasterization? Or alpha testing, or logical operations on a framebuffer? For example, back when DirectX10-level hardware arrived and everyone was saying it had unified architectures, I would have assumed that moving to DirectX11/GL4 features would not require new hardware. Why? Because you could implement tessellation as a shader stage just like the other three stages; it doesn't actually require new processor operations. I know DX11 also introduced bit shifting and a few other things, but I don't see why tessellation needed new hardware on a unified architecture.

    4) Do drivers work in client/server mode, or just user/kernel/device mode? I think the second option is true, but I can't be 100% sure. I initially thought it was the first, because the GL specs talk about a "client" and a "server", so I pictured something like this: right after my GPU boots there is an operating system on it, technically a second computer running one program, the driver, and when I send commands to the GPU it's just like PC networking, sending messages into a socket, getting them out at the other end, doing the processing, and sending the results back. I couldn't imagine true parallelism happening any other way. Then I read some driver code from Mesa and saw that there's actually a ton of CPU code in the driver that doesn't look like it's dealing with sockets, and then some DirectX driver API where they've even standardized GPU command buffers.

    What I really wanted with 4), a few years back, was a way to know when (or whether) a GPU command had finished. I now realize there are synchronization APIs in DirectX10+/GL3+ that deal with exactly that.
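
    For reference, a minimal sketch of what the GL3+ route looks like (fence sync objects). This assumes a GL 3.2+ context is current, the draw parameters are placeholders, and error handling is omitted:

        /* Issue some work, then drop a fence into the command stream. */
        glDrawArrays(GL_TRIANGLES, 0, 36);
        GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

        /* Later: wait (with a timeout, in nanoseconds) until everything queued
         * before the fence has finished on the GPU. GL_SYNC_FLUSH_COMMANDS_BIT
         * makes sure the fence itself gets submitted, so this cannot hang. */
        GLenum status = glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT,
                                         16 * 1000 * 1000);
        if (status == GL_ALREADY_SIGNALED || status == GL_CONDITION_SATISFIED) {
            /* All commands up to the fence have completed. */
        }
        glDeleteSync(fence);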

  2. #2 Senior Member OpenGL Guru
    As for #1 and #2, the OpenGL spec neither knows nor cares about these implementation details. All OpenGL says is that the visual results you get will be equivalent to what you would get if everything processed exactly in the order you provide the commands to the GL. How the implementation achieves this is irrelevant.

    However, it is not unreasonable to assume that drivers and GPUs are not stupid. If there are processing resources available and there is work to be done, it is reasonable to assume that drivers will try to allocate those resources to that work whenever possible. That is the point of the unified shader architecture, after all. GPUs have various means of ensuring that everything comes out the other end in order, so they can overlap and reorder quite a bit of work internally while processing the various rendering calls.

    Having shader execution units sit idle doesn't sell hardware, so you can expect GPUs to do whatever they can to make sure available work gets done on them. Your responsibility is to provide that work so the driver can hand it off to the appropriate processing elements as needed.

    If we already have unified architectures that can do any type of computation in parallel
    That's not what "unified architectures" means. A unified shader architecture simply means that there are no longer dedicated vertex or fragment processing units. There are just shader processors, which can be allocated "dynamically" as needed to any of the available shader stages, based on the current workload.

    The processing stage still has to exist in the hardware in order for shaders to be allocated to it; you can't just make up new pipeline stages without hardware changes.

    A unified shader architecture does not mean that the entire pipeline is handled via shaders.

    still have fixed hardware dedicated to, say, rasterization? Or alpha testing, or logical operations on a framebuffer?
    That depends on the various hardware. But it's safe to assume that rasterization is still fixed function, as are logic ops.
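
    A small illustrative sketch of the kind of fixed-function framebuffer operation meant here, assuming a normalized/integer color buffer (where logic ops apply): logic ops are driven purely by state, not by any shader stage.

        /* Framebuffer logic ops: enabled and configured as state, not shader code. */
        glEnable(GL_COLOR_LOGIC_OP);
        glLogicOp(GL_XOR);      /* destination = source XOR destination */
        /* ... draw ... */
        glDisable(GL_COLOR_LOGIC_OP);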

    I initially thought it was the first, because the GL specs talk about a "client" and a "server"
    The specification is quite clear about these terms, as it stops to define them. And they don't (necessarily) have anything to do with networking:

    Quote Originally Posted by From the Spec
    The model for interpretation of GL commands is client-server. That is, a program (the client) issues commands, and these commands are interpreted and processed by the GL (the server). The server may or may not operate on the same computer or in the same address space as the client.

  3. #3 cippyboy (Junior Member Newbie)
    Quote Originally Posted by Alfonse Reinheart

    The processing stage still has to exist in the hardware in order for shaders to be allocated to it; you can't just make up new pipeline stages without hardware changes.
    Thanks for the snappy response and the insightful answers, but this part bothers me a bit. You're basically saying there are dedicated hardware bits for each shader stage. So on DirectX11-level hardware, if I'm not using geometry or tessellation shaders, am I taking a small (but not zero) toll on performance, since data still has to pass (unchanged) through those units?

    I originally thought the shader stages were just stages controlled by the driver, which takes all the data and sends work commands for vertices/triangles/pixels to the shader processors. Going from that to tessellation would just mean telling it to do some intermediate work between the geometry and pixel stages, and if you don't use that stage, everything works exactly as it did on DX9/10-level hardware. If the hardware really does dictate the shader stages, then not using a stage implies either a performance hit, since part of the pipeline goes unused and has to memcpy the data through, or the hardware bits that control that stage just spin idle?

  4. #4 Senior Member OpenGL Guru
    You're basically saying there are dedicated hardware bits for each shader stage. So on DirectX11-level hardware, if I'm not using geometry or tessellation shaders, am I taking a small (but not zero) toll on performance, since data still has to pass (unchanged) through those units?
    Even if the hardware had to pass triangles through null tessellation and geometry stages, your putting a shader there would not make it faster. So any "toll on performance" that you might have is going to be there as a function of the hardware's design; making an explicit passthrough shader would not remove it.

    Which is exactly why hardware doesn't do that. If you don't use a tessellation evaluation shader, your primitives don't go through tessellation. If you don't use the geometry shader, your primitives aren't processed by it.

    The optional stages are optional; that doesn't stop them from being explicit, discrete pieces of hardware. The processing elements themselves aren't, but the hardware built around those stages very much is.

    Geometry shaders are attached to a primitive assembly unit; that's what converts the stream of vertices provided into primitives. The GS is also hooked up to some form of buffer, where the output vertices and primitive data go to be processed by other hardware.

    Not to mention that the tessellation primitive generator is entirely fixed function. The tessellation control shader feeds data directly to the primitive generator.
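
    A rough sketch of how the API reflects this; vs, fs, tcs and tes below are assumed to be already-compiled shader objects, and the draw parameters are placeholders:

        /* A plain VS+FS program: no tessellation or geometry stage is involved. */
        GLuint simple = glCreateProgram();
        glAttachShader(simple, vs);
        glAttachShader(simple, fs);
        glLinkProgram(simple);
        glUseProgram(simple);
        glDrawArrays(GL_TRIANGLES, 0, 3);        /* ordinary triangles, no patches */

        /* Tessellation only enters the picture when TCS/TES are attached and
         * patches are drawn; only then is the primitive generator involved. */
        GLuint tessellated = glCreateProgram();
        glAttachShader(tessellated, vs);
        glAttachShader(tessellated, tcs);
        glAttachShader(tessellated, tes);
        glAttachShader(tessellated, fs);
        glLinkProgram(tessellated);
        glUseProgram(tessellated);
        glPatchParameteri(GL_PATCH_VERTICES, 3);
        glDrawArrays(GL_PATCHES, 0, 3);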

  5. #5 Member Regular Contributor
    Quote Originally Posted by cippyboy
    4) Do drivers work in client/server mode, or just user/kernel/device mode? I think the second option is true, but I can't be 100% sure. I initially thought it was the first, because the GL specs talk about a "client" and a "server", so I pictured something like this: right after my GPU boots there is an operating system on it, technically a second computer running one program, the driver, and when I send commands to the GPU it's just like PC networking, sending messages into a socket, getting them out at the other end, doing the processing, and sending the results back.
    With X11, it may literally be client and server communicating via TCP/IP networking. The way that X works is that the X server has exclusive access to the video hardware (as well as the mouse and keyboard). Clients (i.e. GUI applications) connect to the X server and send it requests (e.g. to create, destroy or manipulate windows, draw on windows, etc), and the X server sends back requested information, error messages, and events.

    This is the environment for which OpenGL was originally designed (SGI made Unix-based workstations). OpenGL is implemented as an X extension (GLX), i.e. as a set of additional commands which can be sent to an X server implementing the GLX extension. This is why OpenGL commands don't return status codes, why glFlush() and glFinish() exist, etc. Most implementations also support "direct rendering" as an optimisation for the case where the client happens to be running on the same system as the X server to which it connects. This allows the client to perform most operations by talking directly to the video driver without having to go through the X server.
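
    A small sketch of how a client can check which path it got; dpy and ctx are assumed to come from XOpenDisplay() and glXCreateContext():

        #include <GL/glx.h>
        #include <stdio.h>

        void report_rendering_path(Display *dpy, GLXContext ctx)
        {
            if (glXIsDirect(dpy, ctx))
                printf("Direct rendering: talking to the driver, bypassing the X server.\n");
            else
                printf("Indirect rendering: GL commands go over the GLX wire protocol.\n");
        }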

    This design turned out to be useful even on other systems (e.g. Windows). Although the PCI(e) bus is much faster (higher bandwidth, lower latency) than a network connection or even a local socket, modern video hardware is so fast that even the local bus can be a bottleneck, and the need for immunity from network latency resulted in an API which is inherently suited to pipelining, which in turn facilitates large-scale concurrency.

    So while Windows systems don't implement OpenGL using a literal "server" process, the API is such that they could do so. E.g. you can't get pointers to internal data, any access to client memory occurs at well-defined points (typically, any function which accepts a pointer reads the data before it returns, except for client-side vertex arrays which are read before the draw call returns), and the buffer-mapping protocol is designed not to require hardware-level (MMU-like) mapping. So considering that the client and server might be separated by a network connection sometimes helps clarify how certain commands will interact.
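
    For instance, a minimal sketch of the "read before the call returns" rule for buffer uploads (assuming a buffer object is already bound to GL_ARRAY_BUFFER):

        #include <stdlib.h>

        /* glBufferData consumes the client memory before returning, so the
         * client may modify or free the array immediately afterwards; the GL
         * "server" already has its own copy, wherever that happens to live. */
        float *verts = malloc(9 * sizeof(float));
        /* ... fill verts ... */
        glBufferData(GL_ARRAY_BUFFER, 9 * sizeof(float), verts, GL_STATIC_DRAW);
        free(verts);   /* safe: the data was read at call time */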

  6. #6 Senior Member OpenGL Guru
    why glFlush() and glFinish() exist
    Those exist for more reasons than just supporting a networked API.
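
    One non-networking example, sketched under the assumption of two contexts (or threads) sharing a fence sync object:

        /* Context A: record a fence, then flush so the fence is actually
         * submitted to the GPU rather than sitting in a command queue. */
        GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
        glFlush();

        /* Context B: make the GPU wait for A's work without stalling the CPU.
         * Without A's glFlush(), this could wait on a fence that never arrives. */
        glWaitSync(fence, 0, GL_TIMEOUT_IGNORED);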

  7. #7 Advanced Member Frequent Contributor
    Looking over the documents of times of old, GLX was really an afterthought in terms of GL (also, OpenGL was the successor of a proprietary SGI 3D API; I think the name was IRIS GL).

    I strongly suspect that the reason many GL calls did not return status codes in the past was not for the sake of X or network transparency, but for the sake of buffering and pipelining.

    Additionally, the GL server running on a separate machine from the GL client is largely a thing of the past. To give an idea why, here are some reasons:
    1. Buffer objects. Before buffer objects, vertex data was transmitted over the wire at each call in typed form, which allowed conversion between the endianness of the client and the host at each draw call. Buffer objects are raw bytes, so the necessary conversion cannot be known until draw time (see the sketch after this list). There are workarounds, admittedly: insist that client and server have exactly the same endianness, or have the GL implementation detect the client's endianness and rely on the GPU itself being able to handle data in a different byte order.
    2. The vast majority of core GL calls do not have a GLX protocol. There are unofficial bits from NVIDIA for many API entry points, but they are just that, unofficial, and they do not by any stretch of the imagination cover all of the GL calls in the core profile.
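
    A sketch of why point 1 bites (numBytes and rawBytes are placeholders): the upload call sees only untyped bytes, and the component type is declared much later, at attribute-setup time.

        /* Upload: the GL sees only a byte blob, with no element type attached. */
        glBufferData(GL_ARRAY_BUFFER, numBytes, rawBytes, GL_STATIC_DRAW);

        /* Only here does the GL learn that those bytes are 32-bit floats,
         * long after any chance to byte-swap them in transit has passed. */
        glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, (void *)0);
        glEnableVertexAttribArray(0);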


    If there is one thing I wish I could do, it would be to utterly eliminate the incorrect notion that X's network transparency is a good idea.

  8. #8 Dark Photon (Senior Member OpenGL Guru)
    Quote Originally Posted by kRogue
    If there is one thing I wish I could do, it would be to utterly eliminate the incorrect notion that X's network transparency is a good idea.
    If you don't use it, I can see why you might say that. If you do (as I do), it's "very" useful. It blows away the mentality that, to do something graphical on a machine (run GUIs, etc.), you need to go sit at that box. Desktop mirroring plus virtual machines is a lame attempt to give you a capability that X has had for decades.

    That said, it's less useful for OpenGL, for the reasons described.

  9. #9 Member Regular Contributor
    Quote Originally Posted by Dark Photon
    That said, it's less useful for OpenGL, for the reasons described.
    It's less useful for OpenGL only because the wire protocol hasn't kept up to date with recent progress. Shaders are there, but buffers seem to be a sticking point.

  10. #10 Senior Member OpenGL Guru
    Strictly speaking, this isn't specific to GLX. The same issues would apply to using a graphics card in a system whose CPU has a different byte order to the GPU.
    Actually no. The OpenGL standard requires that, if the client writes a string of bytes as a "GLuint", then the server must interpret those bytes as a proper "GLuint". So whatever bit fiddling that the server needs to do must be built into whatever processes the server uses to read that memory.

    FWIW, I have trouble understanding why there seems so little interest in exploiting one of the features which really sets OpenGL apart from DirectX.
    Because:

    1: It requires having more than one computer.

    2: Doing so requires being on X11, which in practice means Linux/Unix.

    3: It relies on the asymmetric computing situation, where your local terminal is weak and a central server has all the processing power. This situation becomes less valid every day. Between GLES 3.0-capable smart phones and Intel's 4.1-class integrated GPUs, the chance of not being able to execute OpenGL code locally is very low.

    It's very difficult to exploit this feature unless it's explicitly part of your application's design requirements. It may differentiate OpenGL from Direct3D, but it's such a niche thing that very few people ever have a bona fide need for it. It's nice for when you need it, but you can't say that it's a pressing need for most OpenGL users.
