OpenGL Next via OpenCL

One of the reasons I was ok with what GL3 eventually became was that OpenGL did not need the distraction of a major refactoring of the existing graphics abstraction on the cusp of that abstraction becoming obsolete.

I’ll go out on a limb and say that we’re well past the point of diminishing returns in trying to make the existing OpenGL hardware abstraction support first-class tessellation, order-independent transparency, global illumination, micropolygon rendering, or virtualized texturing. Even adding seemingly simple things like texture arrays and geometry shaders takes years.

CUDA, OpenCL, and DirectX Compute all illustrate that the GPU is really coming into its own as a general purpose computing device. What’s being mostly ignored is that graphics should be the killer app for the compute mode of these devices.

CUDA and OpenCL essentially ignore graphics, OpenGL ignores compute, and DirectX Compute tries to bury a “general purpose” mode into the existing graphics abstraction. None of these seem to be a good fit for taking graphics forward in new and interesting ways on modern GPUs.

The only effort right now that may be sniffing in the right direction is the Larrabee Native Interface, because it changes the focus to a general compute device that can function efficiently as a GPU. But obviously that interface will not be an open standard, making it a non-starter for most developers.

I propose that the most forward-looking and interesting direction for OpenGL Next is to define its function and implementation fully in terms of OpenCL.

Said another way, if OpenGL Next cannot be efficiently implemented atop OpenCL, then I think Khronos will have missed a golden opportunity to set the right direction for open, portable graphics in the age of the GPGPU.

I’d like to see this too; it’s only a matter of time before the “wheel of reincarnation” comes full-circle and the work done by specialized graphics hardware is once more folded back into the CPU.

I imagine it will be a number of years before this approach is competitive enough to warrant writing commercial applications against such an API, but I’d certainly try using it for a personal project or two.

From the scant preliminary outline given in a SIGGRAPH PDF, I gather that GL and CL actually share resources, as the design is aimed at making interoperation between GL and CL efficient. It looks as though we end up with two different languages (GLSL and CL’s C99 plus extensions), but the upshot is that we get an offline compiler. The feeling here is definitely more GPGPU than graphics.

Browsing the DX11 presentations from Gamefest, I was struck by the new features in HLSL, interfaces and classes in particular - looks a lot like Cg (“subroutines”, the dynamic-linking function thing). And aside from the now-familiar song and dance on tessellation, I was wowed by the new read-write buffers/textures, the so-called “unordered resource views” (then I had a good chuckle over the fact that Effects are back in D3DX). Looks like the compute shader in DX is an offshoot of the pixel shader, intended initially to lend itself to graphics post-processing tasks.

With that, I admit I was fit to be tied over the next wave of technology around the corner, but you’re absolutely right in the long view - that’s exactly where it seems to be going. Seems like a great opportunity to get a jump start on an appealing inevitability.

I didn’t realize the presentations were up already - nice!

It looks like one can use the compute shader separately from the graphics pipeline (Dispatch()), but the pipeline has been updated to emit more general structures for use with the compute shader. :)

EDIT: RWTexture2D !!! I am excited.

Anywho, back on topic now… ;)

And in case it’s not clear, I think this direction for OpenGL Next shouldn’t make any effort to be backward compatible. People have certain expectations in naming though, so perhaps a name like OpenCL-G (the graphics library for OpenCL) would avoid compatibility battles from the get-go.

Ultimately OpenCL-G should be the very modern, clean interface that LP proponents were craving, but the basis for that clean interface should come from a GPGPU abstraction, not a DX9-class-GPU one.

The reason I think this direction is important is because it seems that graphics experts are ignoring this new upstart “compute mode” when their primary focus should be on making it the Right Way to do graphics.

Today Compute is not a superset of Graphics; Compute is what needs to change to address that. Graphics experts, with the goal of making OpenCL-G, need to be a major driving force in OpenCL.

It worries me that Intel seems to be the only company openly advocating this direction. Where are the other Khronos members?
Do they have a different vision, and if so, what is it?

There seems to be much going on at id concerning this direction. For others that haven’t seen it: Possibly relevant link.

This discussion, in my mind, emphasizes how the ARB missed the boat when the GL2.0 rewrite was being considered many years ago. Now, given the lack of LP, the ideal API will have to wait until a theoretical OpenCL-G comes about. However, the potential advantages of that approach are many.

You should understand that there is pretty significant overlap in company and individual staff participation between the CL and GL working groups. There’s no “Chinese wall” between the two that I can perceive…

It will probably be more interesting to talk about how CL could stand alone as a rendering facility once there are some 1.0 implementations actually out there to play with, so people can see what it can or can’t do at that point. I would expect to continue to see GL evolve and track hardware (and interoperate with CL, very important).

Hi Rob,

I do understand that; however, observe that CUDA comes from the company with the best OpenGL implementation available, and it’s clearly not a viable replacement for OpenGL. So overlap alone isn’t sufficient for expecting good things to “just happen”. It also must be a goal, and it’s not a trivial goal that will just fall out of the process by accident.

Trying to set a major “killer app” for CL only after 1.0 ships would be a tragic missed opportunity. Wait until 1.0 comes out (when’s that?), with solid implementations (when’s that?), and people have experience and recommendations? Sounds like a recipe for years of delay. It may just mean that Khronos cannot innovate; that they are really best at standardizing innovation from NVIDIA, Intel, etc. That’s not exactly a criticism - I understand it’s hard for innovators to give up their first-mover advantage or to telegraph their strategies.

CL/GL interop is not a bad idea if you want two completely separate things that will remain separate (like CL and GL3 and below). CL and CL-G interop is built in by definition.
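
For concreteness, today’s separate-API interop path looks roughly like the host-side sketch below. It’s only a sketch, assuming the cl_context and command queue were created with the GL-sharing properties for the current GL context; update_vbo is a hypothetical kernel that rewrites the shared vertex data in place.

```c
/* Minimal sketch of CL/GL buffer sharing, assuming ctx/queue were created
 * with GL sharing enabled and vbo is an existing GL buffer object.
 * update_vbo is a hypothetical kernel taking the buffer as argument 0.   */
#include <GL/gl.h>
#include <CL/cl.h>
#include <CL/cl_gl.h>

void run_cl_pass_on_gl_vbo(cl_context ctx, cl_command_queue queue,
                           cl_kernel update_vbo, GLuint vbo,
                           size_t vertex_count)
{
    cl_int err = CL_SUCCESS;

    /* Wrap the GL buffer object in a cl_mem; storage is shared, not copied. */
    cl_mem shared = clCreateFromGLBuffer(ctx, CL_MEM_READ_WRITE, vbo, &err);

    /* GL must be finished with the buffer before CL may touch it. */
    glFinish();
    clEnqueueAcquireGLObjects(queue, 1, &shared, 0, NULL, NULL);

    /* Run the (hypothetical) kernel over the shared vertex data. */
    clSetKernelArg(update_vbo, 0, sizeof(cl_mem), &shared);
    clEnqueueNDRangeKernel(queue, update_vbo, 1, NULL,
                           &vertex_count, NULL, 0, NULL, NULL);

    /* Hand the buffer back to GL before drawing from it. */
    clEnqueueReleaseGLObjects(queue, 1, &shared, 0, NULL, NULL);
    clFinish(queue);

    clReleaseMemObject(shared);
}
```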

Thanks -
Cass

An OpenCL-G would be an analog to what a Larrabee API would provide, right? I couldn’t find the information about the OpenCL working group, but is Intel even on it? It seems we might be in for a whole bunch of API choices in the future for compute/graphics.

Arguably, current hardware has some limitations which make a fully general-purpose GPU API a complex problem, and which keep the current graphics-specific interfaces necessary for efficiency.

(1.) The lack of any ability to start new tasks on the GPU without CPU involvement. Sure, it might be possible to use a shader program to modify/write GPU-side command buffers. One would have to abstract the common functionality of all GPUs into an API which could be called from within a shader. Some sick person (like myself) might actually enjoy doing this.

(2.) IMO a primary limitation of CUDA is the lack of a bandwidth-efficient way to do general scatter of small values (with cacheable locality), and of atomic operations on small scatter/gather values. This is basically what the PTX .surf or surface cache seems to have been designed for (functionality which either hasn’t been exposed in CUDA or perhaps isn’t yet present in the hardware).

Of course the GPU has this type of functionality exposed via the ROP/OM. To my knowledge there is no way to emulate Z-buffer-like functionality efficiently in CUDA. However, one can currently make general scatter efficient as long as the scatter is done in multiples of half-warp-sized objects (minimum 64 bytes/object).

Seems like most current CUDA parallel solutions involve bandwidth-bound scans/sorts of the entire data set, or scatter via gather, or just bandwidth-wasting scatter. This isn’t a good solution for the tremendous size of graphics data sets. Take the simple case of trying to write out 2M z-buffered points per frame into a frame buffer: something which doesn’t need any special raster hardware and is trivial and not bandwidth bound in DX/GL, but is a nightmare to do in CUDA.
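
To make that pain concrete, here is a rough sketch of the usual compute-side workaround: emulate the z-test with an atomic min on a depth/color value packed into one 32-bit word, one scattered atomic per point. It is written in OpenCL C rather than CUDA, assumes 32-bit global integer atomics and a global work size equal to the point count, and all names and the packing scheme are illustrative.

```c
/* Rough sketch (not the poster's code): z-buffered point splatting via
 * atomic min. Assumes 32-bit global integer atomics; global work size
 * equals the number of points. Packing 16-bit depth into the high half
 * means atomic_min keeps the nearest point at each pixel.              */
__kernel void splat_points(__global const float4 *pos,     /* x, y in pixels; z in [0,1] */
                           __global const uint  *color,    /* 16-bit color index per point */
                           __global uint *depth_color,     /* one packed uint per pixel */
                           const uint width,
                           const uint height)
{
    size_t i = get_global_id(0);
    float4 p = pos[i];

    int x = (int)p.x;
    int y = (int)p.y;
    if (x < 0 || x >= (int)width || y < 0 || y >= (int)height)
        return;

    uint depth  = (uint)(clamp(p.z, 0.0f, 1.0f) * 65535.0f);
    uint packed = (depth << 16) | (color[i] & 0xFFFFu);

    /* One scattered, uncoalesced atomic per point: exactly the traffic
     * the ROP/OM absorbs for free when DX/GL renders point primitives. */
    atomic_min(&depth_color[y * (int)width + x], packed);
}
```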

I still find the graphics APIs better for GPGPU, even with the triangle-setup-bound issues: often I use the vertex shader for all computation (which bypasses the lack of ALU efficiency for pixel primitives in the fragment shader), then use the fragment shader just to scatter the results, with the depth test used to gather the maximum or minimum result in case of a scatter collision.

So while I would really like a fully general-purpose GPU API for current hardware, how exactly would one currently do that without losing the efficiencies of both graphics and compute?

Hi Timothy,

Current hardware has definitely not addressed the problem of an efficient general purpose API for graphics. And it won’t unless that’s an actual goal. It is pretty clear that Larrabee is intended to be an offering very much in that direction though.

Given where we are today, would you rather see the next 2 years spent adding tessellation or order-independent transparency or sparse voxel octree traversal or virtualized texturing to the current graphics pipeline (assume you can only pick one), or would you rather see it spent making the GPU programmable enough to support those things efficiently through clever general-purpose programs?

IMO the latter path is inevitable. It’s just a question of how long it will take Khronos to realize it, and which member companies capitalize on it as first movers.

Thanks -
Cass

Current hardware has definitely not addressed the problem of an efficient general purpose API for graphics.

That’s because the concept of a “general purpose API for graphics” is an oxymoron. An API designed for graphics is, by definition, not general purpose. Even if you used Larrabee or OpenCL to write a new graphics API, it would still be a graphics API, and thus not particularly useful for general-purpose computation.

Furthermore, hardware doesn’t define APIs; software does (drivers).

Given where we are today, would you rather see the next 2 years spent adding tessellation or order-independent transparency or sparse voxel octree traversal or virtualized texturing to the current graphics pipeline (assume you can only pick one), or would you rather see it spent making the GPU programmable enough to support those things efficiently through clever general-purpose programs?

Well, since the latter is not going to happen in 2 years, I’ll take the former.

I know Intel is hot on their Larrabee thing, but I just don’t buy that it’s going to be that great of a GPU. And if nVidia and ATi don’t go along that path (simply providing OpenCL support to those who want it), then nobody’s going to want to use it. And if nobody uses its power, what’s so good about it?

Let’s not make this a semantics argument, Korval. Clearly you know what was meant was a “general purpose API that supports graphics”. Back in 1992 OpenGL was formulated to be the thinnest possible veneer over a hardware abstraction - the OpenGL Machine - that SGI was planning to build. If you think software “defined” that API, you’re simply wrong. I suspect you know this and were just engaging in semantic jousting though.

Intel is not the only company that’s hot on general purpose programmability. All the major players have horses in this race and what we hear publicly is only what those companies are ready to reveal. As a software developer, I think it’s useful to be proactive about what you want - especially if what you’re hearing publicly isn’t tracking what you want.

I can tell you where I don’t see a lot of action though: revolutionizing OpenGL or D3D. What’s the point? All the easy stuff has been done. Seriously. The next cool things just don’t fit into the old abstraction. If you disagree with me, please see the last 2 years. Even GL3’s “revolution” was just going to be a refactoring of the DX9 abstraction. Yawn. (I’m not saying it isn’t a crufty old API in need of refactoring, but where’s the business case? The people that want a new, clean API shouldn’t insist on the termination of a decade old, stable, useful, and above all lucrative industry standard. If you don’t care about backward compatibility, just make a brand new 3D API.)

In the end, top software developers will vote with the code they write. My prediction is that the most generally programmable architectures will offer the richest experience to consumers because it will offer developers greater ability to innovate and differentiate.

There are a lot of interesting and exciting problems in real-time graphics left to solve. I just don’t see OpenGL 4.0 or D3D 11 as the sole tool for solving them. The OpenGL/D3D abstraction will remain an important tool, but the toolbox is in dire need of other tools. There are only so many things you can make with just a hammer. I’m pretty sure I’m not the only graphics software developer that feels that way.

I can’t imagine that Korval would argue for argument’s sake… no way ;)

As a software developer, I think it’s useful to be proactive about what you want - especially if what you’re hearing publicly isn’t tracking what you want.

Couldn’t agree more. It’s the age-old battle of the status quo (and in this case, status quo ante as well), from which many a famous heretic has emerged.

It’s the age-old battle of the status quo (and in this case, status quo ante as well), from which many a famous heretic has emerged.

Alternatively, one can see it as the age-old battle of needless change vs. right tool for the right job.

It is true that you can only make so many things with just a hammer. But there’s no better instrument for driving a nail.

To me, OpenGL and OpenCL have different, though related, purposes. Looking at the GPU even as a generic processing resource, what I see are APIs designed for specific patterns of use for that resource.

OpenCL is not a general-purpose accessor to a GPU. It is designed for small-ish tasks that operate on fairly finite datasets. That is, you do something with a specific set of data, and get certain other data in return. These are tasks with very well defined inputs and outputs.

OpenGL is designed around rendering triangles. Everything about it is built around that concept. Now, maybe graphics libraries in general need modification to best use a “generic processing resource”.

You could modify OpenCL to actually do rendering tasks. You could give it specialized access to texture units and textures. You could give it specialized access to the framebuffer (to take advantage of hardware features like Hi-Z and so forth). But what would you gain from that? Is OpenCL better at being a graphics API, even with those features, than OpenGL? I would say no.
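
For what it’s worth, CL already exposes a limited form of that texture access on devices that support image objects, and implementations typically route image reads through the texture hardware. A rough, illustrative kernel with made-up names:

```c
/* Rough sketch: sampling an image object with a linear-filtering sampler,
 * assuming the device supports images. All names are illustrative.       */
const sampler_t linear_sampler = CLK_NORMALIZED_COORDS_TRUE |
                                 CLK_ADDRESS_CLAMP_TO_EDGE  |
                                 CLK_FILTER_LINEAR;

__kernel void sample_texture(__read_only image2d_t tex,
                             __global float4 *out,
                             const uint width,
                             const uint height)
{
    uint x = get_global_id(0);
    uint y = get_global_id(1);
    if (x >= width || y >= height)
        return;

    /* Normalized texture coordinates at the pixel center. */
    float2 uv = (float2)((x + 0.5f) / width, (y + 0.5f) / height);
    out[y * width + x] = read_imagef(tex, linear_sampler, uv);
}
```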

You’re essentially wanting to throw the hammer away and use a shoe. You can walk in a shoe. And yes, you can use a shoe to drive a nail, but you used to have a hammer.

To me, the best of both worlds is having them interoperate. OpenCL is very good at general-purpose computational tasks. It would be quite good at doing, for example, vertex generation in complex ways. As good as the tessellation shaders might be, it would probably be better overall to use OpenCL for something as complicated as certain subdivision surface algorithms. Especially with iterative algorithms and complex data structures.
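
To make the kind of task meant here concrete, a rough OpenCL C sketch of generating vertices on the device: evaluating a bicubic Bézier patch onto an n×n grid, writing into a vertex buffer (which could be a GL VBO acquired through the sharing path shown earlier). This is only an illustration, not a subdivision-surface implementation; names and data layout are made up, and n is assumed to be at least 2.

```c
/* Rough sketch of device-side vertex generation: evaluate a 4x4 bicubic
 * Bezier patch on an n x n grid (n >= 2). The output buffer could be a
 * GL VBO shared with CL. Names and data layout are illustrative.       */
float4 bezier1d(float4 p0, float4 p1, float4 p2, float4 p3, float t)
{
    float u = 1.0f - t;
    return u*u*u*p0 + 3.0f*u*u*t*p1 + 3.0f*u*t*t*p2 + t*t*t*p3;
}

__kernel void eval_patch(__global const float4 *ctrl,   /* 16 control points, row-major */
                         __global float4 *verts,        /* n*n output vertices */
                         const uint n)
{
    uint i = get_global_id(0);   /* column */
    uint j = get_global_id(1);   /* row    */
    if (i >= n || j >= n)
        return;

    float u = (float)i / (float)(n - 1);
    float v = (float)j / (float)(n - 1);

    /* Reduce each row of control points along u, then the column along v. */
    float4 row[4];
    for (int r = 0; r < 4; ++r)
        row[r] = bezier1d(ctrl[r*4+0], ctrl[r*4+1], ctrl[r*4+2], ctrl[r*4+3], u);

    verts[j * n + i] = bezier1d(row[0], row[1], row[2], row[3], v);
}
```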

Why haven’t OpenGL or D3D been revolutionized in the last two years? Because they’re mature tools. We still use hammers despite the fact that the basic technology has been around for centuries. Mature tools don’t need revolutions; they do their jobs very well. OpenGL and D3D could do with some restructuring, but even the structure of their shaders exists for the purpose of optimization. Even with Larrabee, a completely general-purpose processor, the fundamental structure of shader stages will be vital towards optimizing the shaders that Larrabee uses.

I’m not arguing for the sake of arguing. What I want to see is not one gigantic, monolithic interface to “many core” processing resources. I want to see lots of interfaces, each specifically designed for a purpose. OpenCL is a start. I also want to see one that is higher level, possibly involving functional programming constructs that can use more memory (better optimizing) and so forth. I want to see OpenAL or some other audio system designed specifically for these kinds of processing resources. And so on.

I don’t know about you, but I really like how DX11 incorporated the compute shader into HLSL.

Personally I’d rather have a single “interface” for graphics, AI, physics, etc. Something sorta like what we already have on the CPU, only faster. Essentially I’m talking about a single “language” with multiple applications. Not exactly sure what Cass has in mind, but superficially that’d be a great foot forward for OpenCL-G IMHO. And I agree, strenuously, that innovation would soar in this more general context.

To my mind, the question is when, not if. I submit that when we finally move away from some of the last canned, graphics-specific stages in hardware, something like this becomes not only viable but imminent. Stuff like triangle rasterization could be layered onto this context if needed (though it seems likely that the triangle’s days as a rendering primitive are numbered). So the “graphics API” needn’t really be a fully fledged API at all, but rather an extension to a more general context, an API within an API, so to speak.

The thing that’s weird about DX11 compute (or adding a GLSL compute shader to OpenGL) to my mind is that it enshrines this notion of a split personality device, graphics and compute, into the 3D graphics abstraction. That’s about as natural as adding a physics shader to OpenGL or D3D. It does force the interop question, which is a plus, but then it makes you do general purpose programming in a “3D graphics shading language”. Odd choice, and definitely not one you’d choose if designing from first principles. It feels much more like a “tack-on” solution.

If you start from the other direction, general purpose that can implement graphics efficiently, you naturally want to start with a toolchain much more like what we have in the CPU world today. C/C++ is the natural first language, but the toolchain shouldn’t inhibit others or combinations as appropriate.

I agree with Korval that OpenGL and D3D are mature tools. That’s exactly why it’s fundamentally difficult to revolutionize them now, and why so much effort is going into compute and into APIs (like PhysX) that can live on top of compute.

The real innovation over the next few years is going to be in compute because it is the low hanging fruit - where massive GPU horsepower can be brought to bear on a lot of under-served markets. I want graphics people to be driving compute to solve graphics problems efficiently. Not just conventional OpenGL-style graphics, which are essentially a solved problem, but problems like global illumination, micropolygon rendering, true (virtualized on-demand) procedural texture generation, order-independent transparency, procedural geometry, parallel scene traversal, parallel render targets. Most of these problems do not lend themselves to a simple ASIC implementation that works for everybody.

There’s just a lot of stuff worth doing that can’t be done with a hammer alone.

To me mature means stable, reliable, useful, maybe even essential, but also boring. There’s not a whole lot more you can do with stdlib that’s going to set the world on fire. ;)

So rather than polish the hammer for another few years and miss the compute party, I think graphics people should be focusing on what new rendering tools OpenCL-G could enable us to make. Empowering software developers to be graphics tool makers is an important transition for our industry. Some smart people will profit from it; everybody will benefit from it.

The compute party is happening with or without our involvement. But things can only be better if there’s an OpenCL-G contingent at the party.

If you think of it as being really “general”, then without doubt you’re right. But, as I’m understanding DX compute, it keeps its target audience in mind. Games. That’s not so general. Maybe for game programming it is a natural approach to embed GPGPU into the graphics system? (I’m no game programmer, so I’m just speculating here.)

Furthermore, GPGPU has this second G for "graphics". For me this abbreviation is a contradiction in terms. As long as GP is run on something that you call a GPU, it will be associated with graphics. Seen this way, DX compute seems to be a coherent development. OK, that's semantics again, but maybe semantics are important here?

CatDog

It’s a very subjective question. I don’t especially like it, but then that’s just my personal opinion.

Certainly it will be the most convenient adoption path for some developers simply by virtue of being in DirectX.

Judging from the Gamefest talks, my understanding of MS’s “vision” for the compute shader is that games will latch onto it first for fullscreen post-processing effects, and then they want to grow it from there.

So there is every chance that once MS has got people used to doing compute work like this, they will pull out another, more general-purpose API, based on the work going on with DX11 compute and refinements of it, to fill that gap. (That’s my guesswork, btw.)

So, yeah, the compute shader is very much aimed at games right now. One of the examples given at Gamefest, iirc, was using a compute shader to calculate the exposure value for an HDR scene. Typically this is done by reducing an image down in size over multiple passes; the compute shader exploits work groups and local cache to do it all in one pass over the data and write out a single value.
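
For anyone who hasn’t seen the talk, the idea is a standard parallel reduction. Below is a rough sketch of the same thing in OpenCL C (the Gamefest example was an HLSL compute shader): each work-group reduces its chunk of the image in local memory, and a tiny second pass (or the CPU) sums the per-group partials into the final exposure value. It assumes a power-of-two work-group size; the luminance metric and names are illustrative.

```c
/* Rough OpenCL C sketch of the single-pass-over-the-data reduction
 * described above (the original example was an HLSL compute shader).
 * Assumes a power-of-two work-group size; "scratch" is local memory
 * sized to one float per work-item.                                  */
__kernel void reduce_luminance(__global const float4 *hdr,    /* RGBA float pixels */
                               __global float *partial_sums,  /* one per work-group */
                               __local float *scratch,
                               const uint pixel_count)
{
    size_t gid = get_global_id(0);
    size_t lid = get_local_id(0);

    /* Per-pixel log-luminance term (0 for padding work-items). */
    float lum = 0.0f;
    if (gid < pixel_count) {
        float4 c = hdr[gid];
        float y = dot(c, (float4)(0.2126f, 0.7152f, 0.0722f, 0.0f));
        lum = log(y + 1e-4f);
    }
    scratch[lid] = lum;
    barrier(CLK_LOCAL_MEM_FENCE);

    /* Tree reduction in local memory. */
    for (size_t s = get_local_size(0) / 2; s > 0; s >>= 1) {
        if (lid < s)
            scratch[lid] += scratch[lid + s];
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    /* One partial sum per work-group; a tiny second pass (or the CPU)
     * combines these into the final log-average exposure value.       */
    if (lid == 0)
        partial_sums[get_group_id(0)] = scratch[0];
}
```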