Part of the Khronos Group
OpenGL.org


Thread: Work groups with priorities


  1. #1
    Junior Member Newbie
    Join Date: Jan 2008 · Posts: 13

    Work groups with priorities

    Problem:
    Currently, you have no way of telling or hinting to the GL (or the driver) which of your commands are important (real-time critical) and which ones are not. You want to keep the command queue full, but at the same time you do not want to compete for resources (GPU time, memory, bandwidth) with time-critical tasks.

    Examples:
    1. You are running a complicated physical simulation (in an old ping-pong pixel shader, in a compute shader, or as an OpenCL kernel) which is calculated at 10Hz. The renderer runs at typical 60Hz, limited by vsync, and interpolates the simulation results over several frames. The simulation takes considerable time (say, 5-10ms), but it suffices if the result is ready 5 to 6 frames (= 83 to 100ms) in the future.
    2. You are calculating a histogram of the previous frame to do tonemapping. The calculation could start as soon as the frame is available as texture (at the same time as tonemapping/displaying it) and could execute while the GPU is not doing anything (such as during vsync), but it should not compete with tonemapping/blitting the previous frame or delay swapping buffers.
    3. You are doing a non-trivial amount of render-to-texture (say, to display a "page of text" out of a book in an e-reader, or in a game). The frame rate should be constant, as it would be disturbing to see all other animations "freeze" for a moment when one opens a book. On the other hand, nobody would notice if 2-3 frames passed before the book is opened or a page is flipped -- as long as everything stays "smooth".
    4. Your worker thread has just finished loading a texture from disk into a mapped buffer object. Now you would like to use it (next frame). So you unmap the buffer and call glTexImage to allocate storage and define the texture's contents. You want to do this early to give the driver a chance to asynchronously upload the data, but you do not want to compete with the rendering (frame time budget!) for PCIe bandwidth or GPU memory. You certainly do not want to stall for half a millisecond while the GL or driver is doing memory allocator work (and maybe even kick a texture that is still needed later this frame!) to make room for the new texture.
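    For reference, example 4 corresponds to the standard pixel-unpack-buffer upload path sketched below (size, width, height and the worker-thread logic are placeholders). Note that nowhere in this sequence can you say "do this when there is spare time":

```c
GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
glBufferData(GL_PIXEL_UNPACK_BUFFER, size, NULL, GL_STREAM_DRAW);

/* the worker thread fills the mapping with the file contents */
void *ptr = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
/* ... read file into ptr ... */
glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);

/* with a PBO bound, the data argument is an offset into the buffer;
   the call may return immediately and upload asynchronously -- or stall */
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, (const void *)0);
```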


    You have no way of telling the GL that you don't need the physics immediately. You have no way of telling the GL to start calculating the histogram but not to compete with the rendering -- or worse, wait for histogram calculation to complete before swapping buffers. You have no way of telling the GL to allocate and upload the texture whenever there is time (i.e. generally as soon as possible), but not at the cost of something that must finish this frame.

    Yes, swapping buffers likely won't be delayed by "unimportant" tasks since most implementations render 2-3 frames ahead anyway, so there is no clean-cut end of frame. But still, you cannot be certain of this implementation detail, and you do not even have a way of hinting as to what's intended.
    The driver must assume that anything you pass to the GL (or... CL) is equally important, and that anything you submit should be ready as soon as possible. At the same time, you want to push as many tasks to the GL as fast as you can, so as to prevent the GPU from going idle.

    With some luck, the driver is smart enough (or lucky enough) to get it just right, but ideally you would be able to hint it, so it can do a much better job.

    Proposal:
    Commands submitted to the GL are grouped into work groups (name it differently if you like). There is a single default workgroup with "normal" priority to accommodate programs that are not workgroup-aware.
    A subset of commands can be enclosed in a different workgroup with a different (lower) priority using a begin/end command pair (say, glBeginGroup(GLenum priority); and glEndGroup();). Implementations that are unwilling to implement the feature simply treat the begin/end calls as no-ops.

    (As a more complicated alternative, one could consider "workgroup objects" much like query objects or buffer objects. This would allow querying the workgroup's status and/or synchronizing with its completion, and one might change a workgroup's priority at a later time, or even cancel the entire workgroup. However, the already present synchronization mechanisms in OpenGL are actually entirely sufficient, and it's questionable whether changing priorities and cancelling workgroups are really advantageous features. They might add more complexity than they are worth.)

    An elaborate system of priorities (with dozens or hundreds of levels, as offered by operating systems) is needlessly complex and has no real advantage -- a simple system with fewer than half a dozen possible levels, maybe only 2 or 3, would be more than enough.

    For example:
    GL_PRIORITY_NORMAL --> default, want to see this ready as soon as possible
    GL_PRIORITY_END_OF_FRAME --> not immediately important, best start when done with this frame (or when main task is stalled)
    GL_PRIORITY_NEXT_FRAME --> don't care if this is ready now or the next frame (or in 2 frames), but still want result in "finite time"
    GL_PRIORITY_LOW --> rather than going idle, process this task -- otherwise do a higher priority one

    Ideally, there'd be interop between GL and CL for this, too.
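    To make the proposal concrete, usage might look like the following sketch. glBeginGroup, glEndGroup and the GL_PRIORITY_* tokens are hypothetical; nothing here exists in any GL version:

```c
/* physics step for a frame several vsyncs away: low urgency */
glBeginGroup(GL_PRIORITY_NEXT_FRAME);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, physicsBuffer);
glDispatchCompute(numBodies / 64, 1, 1);
glEndGroup();

/* back in the default workgroup: normal priority,
   competes for the GPU as usual */
glDrawArrays(GL_TRIANGLES, 0, vertexCount);
```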

  2. #2
    Junior Member Newbie
    Join Date: Jan 2008 · Posts: 13
    An alternative (probably easier to implement -- but not what I would prefer, since it would require shared contexts, extra synchronization, more resources) might be the ability to mark an entire context as "lower priority".

    That way, one could for example do lengthy tasks such as complicated simulations or render-to-texture in the lower priority context without competing for the GPU with the time-critical render thread. While the context belonging to the render thread is stalled or waiting on vsync, the lower priority context takes over. Eventually, the result will be available, and can be used.
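    For what it's worth, something close to this exists in the embedded world: the EGL_IMG_context_priority extension lets you request a context priority at creation time. A sketch, assuming display, config and mainContext already exist and the extension has been checked for:

```c
static const EGLint attribs[] = {
    EGL_CONTEXT_CLIENT_VERSION, 2,
    EGL_CONTEXT_PRIORITY_LEVEL_IMG, EGL_CONTEXT_PRIORITY_LOW_IMG,
    EGL_NONE
};
/* shares objects with the render thread's context, but hints the
   scheduler that this context's work may be deferred */
EGLContext lowPrio = eglCreateContext(display, config, mainContext, attribs);
```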

    As a side thought: Having lengthy tasks in a separate work queue with lower priority might be interesting for other things as well (think GPGPU). As long as a normal-priority lightweight task keeps performing little or no work and swapping buffers at 60fps, WDDM will not be inclined to kill the display driver because it is "not responding", if a compute task takes, say, 45 seconds.
    Last edited by thomas.d; 05-24-2013 at 03:16 AM.

  3. #3
    Intern Contributor
    Join Date: Jul 2006 · Posts: 72
    Interesting. I would call that multithreading, as there is a sense of concurrency. No idea how one would implement such a thing - looks like a synchronization can of worms to me :/. How does it report back to the primary (or some other) thread when stuff is done?

    Quote Originally Posted by thomas.d
    ...
    A subset of commands can be enclosed in a different workgroup with a different (lower) priority using a begin/end command pair (say, glBeginGroup(GLenum priority); and glEndGroup();). Implementations that are unwilling to implement the feature simply treat the begin/end function calls as no-op.
    ...
    I do not like this. Just having a glSetThread(thread) would be preferable. "thread" would be an object that contains the priority. "0" => default thread.
    * no need to specify what commands are eligible.
    * no need to repeatedly restate priority whenever something is added to thread.
    * no need to know whether some "workgroup" is open.
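    A sketch of what that might look like; glSetThread, glCreateThread and the GLthread type are hypothetical:

```c
GLthread bg = glCreateThread(GL_PRIORITY_LOW); /* object carries the priority */

glSetThread(bg);                /* subsequent commands queue on 'bg' */
glDispatchCompute(1024, 1, 1);
glSetThread(0);                 /* back to the default thread */
```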

  4. #4
    Junior Member Newbie
    Join Date: Jan 2008 · Posts: 13
    It's not about threading, though. You can already do threading just fine using shared contexts; what you cannot do is do it in a "healthy" way that doesn't negatively affect you.

    Changing thread priorities wouldn't be any good. It is not the thread that submits commands that should run at lower priority; that would likely cause the pipeline to run empty, resulting in inferior performance. Nor is it the thread that processes items on the work queue, and there shouldn't normally be more than one such thread anyway. That would make the driver a lot more complicated (and bring threading issues) and isn't even guaranteed to work any better. Playing with thread priorities in a driver will most likely cause havoc. What if the lower-priority thread is starved by a high-priority application thread?

    What should happen is that the driver thread picks up commands from the work queue as usual, but anything it encounters inside a "low priority" section, it just stores on its "not immediately important" list and continues processing "normal priority" tasks. After all, the contract says "this isn't needed immediately right now". Eventually, there will be a stall or the queue will be empty, so the driver will process some of the lower priority tasks in the meantime.
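    The queueing rule described above can be modelled on the CPU in a few lines. This is only a toy illustration under a simplifying assumption (low-priority work runs only once the normal queue is empty); a real driver would also fall back to the low queue while the normal queue is blocked on a fence:

```c
#include <assert.h>
#include <stddef.h>

/* Tiny CPU-side model of the scheduling policy: drain the
   normal-priority queue first, and only pick up low-priority work
   when the normal queue has nothing left (e.g. after SwapBuffers). */

enum { MAXQ = 64 };

typedef struct {
    int items[MAXQ];
    size_t head, tail;
} queue_t;

static void q_push(queue_t *q, int v)  { q->items[q->tail++] = v; }
static int  q_empty(const queue_t *q)  { return q->head == q->tail; }
static int  q_pop(queue_t *q)          { return q->items[q->head++]; }

/* Runs until both queues are empty; records the execution order in
   'order' and returns the number of commands executed. */
static size_t run_scheduler(queue_t *normal, queue_t *low, int *order)
{
    size_t n = 0;
    while (!q_empty(normal) || !q_empty(low)) {
        if (!q_empty(normal))
            order[n++] = q_pop(normal); /* normal work always wins */
        else
            order[n++] = q_pop(low);    /* GPU would otherwise idle */
    }
    return n;
}
```

    With commands 1 and 2 on the normal queue and 100 on the low queue, the model executes 1, 2, 100: the low-priority command runs only after the normal queue has drained.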

    Now what happens if you need a result from a low priority calculation? Say you are rendering your physics for the next frame (as "low priority" so it doesn't disturb your present frame), and then during the next frame bind the buffer with the result to read from it? Problem? No.
    The same happens as always. Data is not ready, so the command blocks. Now that the "normal priority" queue is blocked, the GPU is free and the lower queue is worked off -- your half-finished physics calculations are finalized. As the result is ready, the "normal priority" queue is unblocked, and since it's higher priority, it takes over the GPU again (you've maybe already queued the calculations for another 2-3 frames in the future, but they won't impact you).

    The GPU never goes idle, but also you never have less than 100% power available for your main task when it matters.
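    Note that the blocking behaviour described here is exactly what today's sync objects already give you for ordering -- just without the priority part, since the GL treats both workloads as equally urgent. A sketch with standard ARB_sync calls:

```c
/* queue the physics dispatch and drop a fence behind it */
glDispatchCompute(numBodies / 64, 1, 1);
GLsync physicsDone = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

/* ... render and present the current frame ... */

/* next frame, before reading the physics result: server-side wait */
glWaitSync(physicsDone, 0, GL_TIMEOUT_IGNORED);
glDeleteSync(physicsDone);
```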

    Now of course there exists the theoretical issue that a programmer might submit totally nonsensical sequences of commands/priorities. You could bind a texture with low priority (even for no good reason, maybe just because it's possible), and immediately draw from it with normal priority. What would probably happen is that the GL would draw without a texture bound since the low priority task isn't executed (resulting in a black screen), and later bind a texture, which is good for nothing.
    But honestly, this theoretical issue is not really a problem. If someone tries to play stupid, that's just bad luck for them. You cannot and should not expect from the GL that submitting nonsense produces something useful. It doesn't do that as it is, either.
    Also, this theoretical issue could be avoided altogether, even in theory, with the "per context" priority approach and shared contexts.

  5. #5
    Senior Member OpenGL Guru
    Join Date: May 2009 · Posts: 4,948
    It's not about threading, though.
    But it is. You want one set of GPU commands to automatically interrupt another set of GPU commands if it has a higher priority. That's the very definition of preemptive multitasking. They may not be CPU threads, but they are still separate paths of execution which all operate in the same memory space on the device.

    That's a thread.

    Eventually, there will be a stall or the queue will be empty, so the driver will process some of the lower priority tasks in the mean time.
    Why will there be stalls or an empty queue? Generally speaking, that is considered a bad thing in a high-performance graphics rendering system. You should be doing everything in your power to ensure that this doesn't happen, that the GPU always has stuff to be doing.

    Which means, if you do your job right, those low priority tasks will never execute. Well, not until you force them to via priority inversion. At that point, you may as well have done the priority inversion manually: execute the "unimportant" commands when you need their results.

    You've gained nothing in this case.

    You could bind a texture with low priority (even for no good reason, maybe just because it's possible), and immediately draw from it with normal priority. What would probably happen is that the GL would draw without a texture bound since the low priority task isn't executed (resulting in a black screen), and later bind a texture, which is good for nothing.
    That makes no sense. Binding a texture, or indeed, setting any actual state, is not a "command". It may translate into some GPU operation, but in all likelihood, it's purely a CPU-side construct.

    Furthermore, this goes against exactly what you said earlier about priority inversion: a later "command" which is dependent on the execution of an earlier "command" will still have to wait on the execution of the earlier "command". If you consider binding a texture (or any other state change) to be just another "command", then later operations that depend on the execution of that command must wait. It's no different from using a buffer that some low-priority command renders to.

    This is part of the reason why a separate context for command priorities makes far more sense. This way, it's clear exactly what depends on what. It's the state of objects that can cause priority inversions, not random CPU state like object bindings and such. OpenGL already has a well-defined model for when changes to shared state become visible to other contexts.

    But honestly, this theoretical issue is not really a problem. If someone tries to play stupid, that's just bad luck for them. You cannot and should not expect from the GL that submitting nonsense produces something useful. It doesn't do that as it is, either.
    First, there's nothing nonsensical about that. You bind a texture, then you render with it.

    Second, it's a poor API that encourages abuse of itself. And your suggested API makes it far easier to screw it up than to do it right.

  6. #6
    Advanced Member Frequent Contributor
    Join Date: Dec 2007 · Location: Hungary · Posts: 985
    The only way I can imagine such a thing happening is in the form of assigning priorities to contexts, then doing multithreading + multicontext rendering.

    Suppose, e.g., you have two groups one after the other and the first has lower priority than the second. What state will the second run in? It can happen that the low-priority task runs first, and then the state set by it will affect the second task. However, it may happen the other way around, too. This would practically make all GL state that could potentially be affected by any of the tasks undefined, which makes the whole feature useless.
    Disclaimer: This is my personal profile. Whatever I write here is my personal opinion and none of my statements or speculations are anyhow related to my employer and as such should not be treated as accurate or valid and in no case should those be considered to represent the opinions of my employer.
    Technical Blog: http://www.rastergrid.com/blog/

  7. #7
    Junior Member Newbie
    Join Date: Jan 2008 · Posts: 13
    That's the very definition of preemptive multitasking. They may not be CPU threads, but they are still separate paths of execution which all operate in the same memory space on the device.
    In a very general way, one could probably see it as "threads". Though I would not like to think of it as "preemptive multitasking". Rather, it's some form of cooperative multitasking. The "main thread" (default) will only yield when it blocks/stalls (e.g. waiting on a result) or when it has "nothing to do" (e.g. at the end of the frame). The lower priority "thread" will use what GPU time is left over, when it's available. It may stop "pulling" new compute tasks from the queue if a condition makes the "main thread" ready, but it certainly wouldn't interrupt a running workgroup (that would be insanely expensive!).
    Since there's a well-defined point in time (end of frame) when the "main thread" chooses to yield (by calling SwapBuffers, or explicitly if you want), there's not much risk of a background task never running.

    That makes no sense. Binding a texture, or indeed, setting any actual state, is not a "command". It may translate into some GPU operation, but in all likelihood, it's purely a CPU-side construct.
    Gee, I knew this would come up...
    What does it matter for the example whether binding a texture to a texture unit (note that I'm not talking about glBindTexture) is or is not a GPU operation? (Besides, it necessarily is, only not necessarily immediately.) Regardless of whether it adjusts state or runs a compute task, it is still a "command". Even a command that merely adjusts a selector (assuming that OpenGL will still not be direct state access in the next revision) is a "command". They all go into one queue.
    And indeed, the point of my example was that it makes no sense to do for example such a thing as putting a command on which another one depends into another priority block, because it will very obviously not work in a meaningful way. But again, if you do deliberately stupid stuff, then undefined results will come out. Unix has very successfully worked by the "type shit in, and shit comes out" principle for 45 years. It's not a problem.

    Now you're inevitably going to object that this cannot work anyway because, for example, selectors may be changed and afterwards you have no way of knowing what was what. There are at least three ways to address this. Either, since by definition lower priority tasks are not related to the main queue (not directly, anyway), every subgroup could start with default values. Or, the present state could be "snapshotted", for example using a copy-on-write mechanism. Developers are careful not to do too many state changes anyway, and you won't have a hundred different priority blocks in a frame either, so the overhead should be tolerable. Or, just screw selectors altogether and go DSA, which a lot of people would welcome anyway.
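    The DSA option, for illustration: with EXT_direct_state_access (later standardized in OpenGL 4.5), a command names the object it touches and no selector state is involved, so the snapshotting problem largely disappears. GL 4.5 style, assuming pixels points at valid image data:

```c
GLuint tex;
glCreateTextures(GL_TEXTURE_2D, 1, &tex);
glTextureStorage2D(tex, 1, GL_RGBA8, 256, 256);
glTextureSubImage2D(tex, 0, 0, 0, 256, 256,
                    GL_RGBA, GL_UNSIGNED_BYTE, pixels);
glBindTextureUnit(0, tex);   /* no glActiveTexture selector needed */
```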

    But yes, the "per context priority" solution would conceptually be easier to implement, that's for sure. And even that would already be a big win (even though it's not my preferred solution).

    Why will there be stalls or an empty queue? Generally speaking, that is considered a bad thing in a high-performance graphics rendering system. You should be doing everything in your power to ensure that this doesn't happen, that the GPU always has stuff to be doing.

    Which means, if you do your job right,
    The problem is, as it stands, OpenGL does not allow you to do your job right. Not in a reliable way, anyway.

    Certainly, you can ensure that the pipeline never stalls and never runs empty. Nothing easier than that. But then you'll inevitably have different tasks competing for the GPU, and not all of them must be ready at the same time. With some luck, there's enough horsepower, so nobody notices. With some luck, some driver hack kicks in (application profile, prerendering, EXT_swap_control_tear, ...). Or maybe not.

    But doing it properly in a somewhat predictable, reliable way is an entirely different story.

    For example, you have the choice of submitting a physics simulation for the next frame at the beginning of a frame, somewhere in the middle, at the end before SwapBuffers, or at the end after SwapBuffers. Or, you can do it in a second, shared context (or in CL). These are all your options. Which do you choose?

    The first two will result in the simulation competing with drawing the current frame, and with some "luck" will cause you to just miss the frame time by 0.1ms, causing your frame rate to drop from 60 to 30. Now you wish you had submitted it later.

    Submitting just before swapping buffers will reduce this competition, but is likely to cause the pipeline to run empty, and it may still cause you to miss the frame time. The driver has no way of knowing (other than by application-specific driver hacks, or by prerendering 3 frames, none of which you can rely on) that you actually want to swap buffers immediately -- it must assume that whatever you submit still belongs to the same frame.

    Submitting the physics after SwapBuffers, on the other hand, may work or may not work as intended. Again, you have no way of knowing, and it's nothing you could rely on. The driver might let you render ahead 1-3 frames or not. You might be blocked inside SwapBuffers for 3-4ms during which the GPU will be idle. Now you wish you had submitted it earlier (because now, running the computations would be "free").

    Using a shared context has none of the waiting-on-SwapBuffers problems, but it competes with the other context all the time, much like submitting the computation early or in the middle. The driver cannot know (without an application-profile hack) that one is more important than the other because it will exceed its frame time budget.

  8. #8
    Senior Member OpenGL Guru
    Join Date: May 2009 · Posts: 4,948
    Though I would not like to think of it as "preemptive multitasking". Rather, it's some form of cooperative multitasking. The "main thread" (default) will only yield when it blocks/stalls (e.g. waiting on a result) or when it has "nothing to do" (e.g. at the end of the frame). The lower priority "thread" will use what GPU time is left over, when it's available. It may stop "pulling" new compute tasks from the queue if a condition makes the "main thread" ready, but it certainly wouldn't interrupt a running workgroup (that would be insanely expensive!).
    That's still preemptive multitasking, because the lower priority thread's actions must be preempted by the execution of the high priority thread. You said so yourself: an example you gave read, "it suffices if the result is ready 5 to 6 frames". In order to make that work, your low-priority commands must be interrupted by the main thread several times. That is, when whatever "end of the frame" wait time (see below for a discussion of this fiction) finishes, the "main thread" has to take over again. This involves a full-fledged task switch.

    Whether you think of it that way or not, that's preemptive multitasking. There's no cooperation happening there, because it is not at all clear when, during the execution of the low priority task, that the high priority task can start taking over again.

    Since there's a well-defined point in time (end of frame) when the "main thread" chooses to yield (by calling SwapBuffers, or explicitly if you want), there's not much risk of a background task never running.
    You seem to have this idea that, when you call `SwapBuffers`, the GPU just stops or something. That later commands will have to wait to execute until some pre-determined time after this, possibly related to vsync. You can always turn vsync off, you know. And even if you can't, that's what multiple buffering is for: so that you don't have to wait for the vsync.

    GPUs already do this stuff for you. Really, there is no "end of frame" waiting time where GPUs are stalled. There is no "well-defined point in time" "where the 'main thread' chooses to yield". GPUs don't stall just because you call SwapBuffers.

    And indeed, the point of my example was that it makes no sense to do for example such a thing as putting a command on which another one depends into another priority block, because it will very obviously not work in a meaningful way.
    And my point was that the example you gave just before that specifically stated that it would work. Allow me to quote you:

    Quote Originally Posted by You
    Now what happens if you need a result from a low priority calculation? Say you are rendering your physics for the next frame (as "low priority" so it doesn't disturb your present frame), and then during the next frame bind the buffer with the result to read from it? Problem? No.
    The same happens as always. Data is not ready, so the command blocks. Now that the "normal priority" queue is blocked, the GPU is free and the lower queue is worked off -- your half-finished physics calculations are finalized. As the result is ready, the "normal priority" queue is unblocked, and since it's higher priority, it takes over the GPU again (you've maybe already queued the calculations for another 2-3 frames in the future, but they won't impact you).
    You say that all OpenGL functions are commands, all commands go into a queue, and the "normal priority" queue will block if it tries to access data from the "low priority" queue that is not yet ready. Then why should calling `glBindTexture` in a low-priority thread then accessing it in a high-priority one behave any differently from rendering to a buffer object in a low-priority command and then accessing it in a high-priority one? They are, by your own logic, the same thing: data set by a low-priority command that is being accessed by the high-priority one, but is not yet ready.

    So why does one work and the other not work?

    You can't have it both ways. You can't have the products of low-priority tasks be accessible just like they would have been without priority, yet have `glBindTexture` in a low-priority task somehow not be accessible from a high-priority task. It doesn't make sense. Either everything works under the "as if it were specified in order" rule (which is where the whole "Data is not ready, so the command blocks" comes from), or everything does not. You can't pick and choose arbitrarily.

    Or if you want to pick and choose arbitrarily, you need to explain exactly what you want to be picked and chosen.

    Submitting just before swapping buffers will reduce this competition, but is likely to cause the pipeline to run empty, and it may still cause you to miss the frame time. The driver has no way of knowing (other than by application-specific driver hacks, or by prerendering 3 frames, none of which you can rely on) that you actually want to swap buffers immediately -- it must assume that whatever you submit still belongs to the same frame.
    I don't understand what this means. Presumably, your "physics simulation" isn't rendering to the framebuffer or reading from it. So the driver already knows that these commands have no relation to any of your prior rendering commands. So, assuming that there is time to complete both tasks, why exactly would this cause you to miss the frame time?
