direct_state_access reloaded

It seems that many people liked EXT_direct_state_access, but unfortunately it’s gone in OpenGL 3.2, and in 3.1 with the forward-compatible bit.

The extension itself is written against 2.1. For 3.2 the question would be which parts are deprecated by design…

Maybe leave the current version as is and start a revised version for 3.2, as ARB or anything else.

Removing the functions for deprecated functionality should be easy. More problematic are the texture functions: they still have the border argument. Because that functionality is deprecated, it should be removed. (This requires new function names; an ARB suffix would solve that.)

One interesting point is that there is nothing here that can’t be emulated by non-driver code. Sure, it is a little bit slower, but the code is easier to maintain. And if the IHVs implement it natively later, the speed may improve as well…

I think the right way to get that working would be:

  1. rework the extension paper
  2. write a program that writes the emulation code
  3. hope that it will become a real extension written against 3.x

This extension was the single good thing a year back, and I hope it will be picked up for core-profile OpenGL 3.x.

And maybe even OpenGL 3.2 core!
I’m really looking forward to seeing this extension updated!

It’s not really worth directly converting the direct_state_access extension. You would still be asking the driver to do lots of extra int -> object lookups. It would be much better to have a variation of this extension with opaque pointer types.

Here’s a mock up of a fictional direct_buffer_objects extension:

typedef void* Buffer;

Buffer CreateBuffer(int size, void *data, enum usage);

GenBuffers(int count, Buffer *buffer);
BufferData(Buffer buffer, int size, const void *data, enum usage);
BufferSubData(Buffer buffer, int offset, int size, const void *data);
DeleteBuffers(int count, Buffer *buffer);
FlushMappedBufferRange(Buffer buffer, int offset, int size);
GetBufferParameter<T>(Buffer buffer, enum param, T *data);
void* GetBufferPointer(Buffer buffer, enum param);
GetBufferSubData(Buffer buffer, int offset, int size, void *data);
void* MapBuffer(Buffer, enum access);
void* MapBufferRange(Buffer, int offset, int size, enum access);
UnmapBuffer(Buffer buffer);
CopyBufferSubData(Buffer src, Buffer dest, int srcoffset, int destoffset, int size);

BindBuffer(enum target, Buffer buffer); //target cannot be array_buffer, element_array_buffer, copy_{read,write}_buffer
BindBufferRange(enum target, uint index, Buffer buffer, int offset, int size);
BindBufferBase(enum target, uint index, Buffer buffer);

//vertex arrays
VertexAttrib(int index, int size, enum type, boolean normalized, Buffer buffer, int offset, int stride);
VertexAttribI(int index, int size, enum type, Buffer buffer, int offset, int stride);

ElementPointer(int index, enum type, Buffer buffer, int offset, int size);//if size == 0, use whole buffer
ElementPointerRange(int index, enum type, int start, int end, Buffer buffer, int offset, int size);

DrawElements(enum mode, int instances, int elementIndex, int baseVertex);
MultiDrawElements(enum *mode, int *instances, int *elementIndex, int *baseVertex);

//texture buffers
TexBuffer(Buffer buffer, enum internalformat);
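To make the intent of the mock-up concrete, here is a hedged usage sketch. This is pseudocode against the fictional API above; `CreateBuffer`, `VertexAttrib`, `ElementPointer` and `DrawElements` are elFarto’s invented names, not any real extension:

```
/* Pseudocode against the fictional direct_buffer_objects API above. */
Buffer vbo = CreateBuffer(sizeof(vertices), vertices, STATIC_DRAW);
Buffer ibo = CreateBuffer(sizeof(indices),  indices,  STATIC_DRAW);

VertexAttrib(0, 3, FLOAT, FALSE, vbo, 0, 3 * sizeof(float));
ElementPointer(0, UNSIGNED_SHORT, ibo, 0, 0);  /* size 0: whole buffer */
DrawElements(TRIANGLES, 1, 0, 0);              /* 1 instance, no base vertex */

DeleteBuffers(1, &vbo);
DeleteBuffers(1, &ibo);
```

Note that no bind is needed anywhere: every call names its buffer directly via the opaque handle.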

BindBuffer is unfortunately still there due to texturing methods still using it.

The ElementPointer and [Multi]DrawElements are new, in an attempt to stem the explosion of Draw combinations.

Regards
elFarto

Sorry, I do not understand all the hype about ints vs pointers.
What is the slowdown of the lookup? On a 32-bit OS the object number generated by glGen* could be a pointer (at least in OpenGL 3.x) right now. Nobody said that the number must be a monotonically increasing integer starting from 1.

What stops driver implementers from using the pointer-compression methods that are used in the 64-bit Java VM (32-bit pointers accessing 32 GB of RAM)?

I think the ARB should focus on getting rid of the binding mechanism. Replacing ints with pointers is not the future.

What stops driver implementers from using the pointer-compression methods that are used in the 64-bit Java VM (32-bit pointers accessing 32 GB of RAM)?

The simple fact that OpenGL 2.1 and earlier allowed you to specify your own object IDs. You could say glBindTexture(GL_TEXTURE_2D, 0xDEADBEEF) and the driver would have to keep working.

Using opaque pointers is a huge step forward in this regard and a hypothetical ARB DSA extension is the best way to implement this (break compatibility only once instead of breaking it with DSA and breaking it again with opaque types).

Huge ++ to elFarto’s suggestion. I’d prefer some extra type safety in there (different types for vertex buffer and texture buffer), but otherwise 100% agreed.

I think part of the problem is that you used to be able to tell OpenGL what id/name you wanted to use for your object. Perhaps the driver programmers haven’t gotten round to changing it, perhaps it’s a lot of work for them. Without looking at the driver’s code it’s impossible to say.

On your second point, nothing stops them; it’s just not a very nice solution (you can’t store an arbitrary 64-bit pointer in 32 bits). My 64-bit XP can use 128 GB of physical memory, which needs 37 bits to address; assuming 8-byte alignment, you could pack that into 34 bits, which is still 2 short.

It’s not so much making them pointers, it’s making them opaque types to help make sure you don’t pass the wrong type of object into a function. And if you’re going to change the type system, you might as well make the types wide enough to hold a pointer so no tricks are needed to store one.

You could, but they are technically interchangeable, so you can bind one as the other. R2VB is an example of when you really do want to do this: use a PBO as a VBO.

I’ve got some other APIs reworked as well (Image, FBO, VAO); I’ll try and get round to finishing the rest of them.

Regards
elFarto

Hmm, 2.1 is the most recent spec?
FYI, 3.0 deprecated application-defined object names. End of story.

Of course opaque objects are mostly better, but there are far bigger problems than this.

Of course there are bigger problems (textures/samplers being one, IMHO), but that doesn’t mean we don’t want this fixed :slight_smile:

Regards
elFarto

What is the slowdown of the lookup? On a 32-bit OS the object number generated by glGen* could be a pointer (at least in OpenGL 3.x) right now. Nobody said that the number must be a monotonically increasing integer starting from 1.

What is the slowdown? Have you looked at NVIDIA’s bindless graphics performance statistics? NVIDIA took out most of the object-based stuff and gained substantial performance increases. Now, not all of this is due to int-to-pointer conversions. But some of it is.

Essentially, the problem is cache performance. Because your application does rendering and then other stuff every frame, by the time it gets to the render loop, the cache has lost all of the rendering information. Therefore, every graphics memory access, every one, is a cache miss. And every access of an object is a cache miss.

You might think this is insignificant overall; you’re wrong. Most of the other low-hanging fruit in performance has already been picked. This is what is left.

Taking out the int-to-pointer conversion won’t eliminate all of the memory accesses the way bindless graphics does. But it’ll take away some of them, and that’s better than nothing.

Well, I am very well aware of that paper. But this is about something totally different.
They extract the GPU address of a GL object, and then the app supplies this address in the render loop instead of the object name. So they totally skip the object name -> object data -> GPU address path.

But if you make objects pointers instead of object names, you won’t get all this excellent speedup. You still have to do the object data -> GPU address resolution. The low-hanging fruit would be to make the object name a pointer on a 32-bit OS. Then everybody could see what the speedup is. BTW, driver developers should do this benchmark.

So instead of making a new “pointer” API I would rather go for a new “bindless” API. If a programmer has to change their implementation, then there must be a really good reason.
I do not want to rewrite my code for a pointer API and then again N months later for a bindless API.

But if you make objects pointers instead of object names, you won’t get all this excellent speedup.

I know. I said, “not all of this is due to int-to-pointer conversions”.

The point is that you get something.

I do not want to rewrite my code for a pointer API and then again N months later for a bindless API.

That’s not going to happen. For many reasons.

Bindless graphics is a horrible API. It breaks the basic model of vertex shaders (let alone uniforms), requiring you to write shaders specifically for it. It is also incredibly low-level, which makes widespread implementation difficult if not impossible.

A better solution is to identify places where driver developers have to do lots of verification of things and eliminate them.

For example, part of bindless graphics is the ability to “lock” a buffer object. The reason is that OpenGL implementations can and will move that buffer object around. The only way to get a consistent buffer object pointer address is to tell the implementation not to do that anymore.

Rendering with a VAO normally requires querying every attached buffer object to get their current GPU address pointer, and uploading those buffer objects if they are not in the GPU. This adds a lot onto the overhead and cache issues of rendering.

If however, you could lock a VAO, thus telling OpenGL that it shouldn’t move the buffers attached to it around (as well as preventing you from being able to delete those buffers), then the implementation is free to not bother to query the location of each buffer object in the VAO at render time. Instead, it can just build a GPU-ready sequence of commands to start the rendering.

Locking the VAO would also cause it to become immutable; attempts to modify or delete it (or its attached objects) would fail.

You can even combine locking with getting pointer names. Locking a VAO would mean you get a pointer back, which would be used in pointer APIs. While the VAO is locked, you cannot bind the integer name at all; you must use the pointer API.
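A declaration sketch of that lock-plus-pointer idea might look like the following. These are hypothetical names invented here, not part of any real or proposed extension:

```
/* Hypothetical declarations for the VAO-locking idea above. */
void*  LockVertexArray(uint vao);      /* VAO becomes immutable; returns the
                                          pointer name used for rendering */
void   DrawLockedElements(void *vao, enum mode, int count, int offset);
void   UnlockVertexArray(void *vao);   /* integer name becomes usable again */
```

While locked, the implementation could pre-build the GPU-ready command sequence once, since none of the attached buffers can move or be deleted.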

Hmm, 2.1 is the most recent spec?
FYI, 3.0 deprecated application-defined object names. End of story.

FYI, 3.2 introduces the compatibility profile which brings them back. End of story?

FYI, 3.2 introduces the compatibility profile which brings them back.

You don’t have to use the compatibility profile, and implementations aren’t even required to support it. The default when using the new context-creation API is a core profile.

Yeah, but who’s still on one of those :wink:

BTW, for others reading, the unstated NVidia bindless graphics assertion being discussed is here, pg. 4-5.

So instead of making a new “pointer” API I would rather go for a new “bindless” API. If a programmer has to change their implementation, then there must be a really good reason.

Up to 7X speedup is a pretty damn good reason, if you’re selling commercial products built on OpenGL. Content sells. Only academicians can afford to adopt the purist theoretical-perfection argument, since it’s about ease of learning, not performance.

Though long-term, you’re probably right. If devs get widespread speed-ups through this technique, a general API retrofit would probably be best. Still, it’s great that OpenGL supports adding prototype features like this through an extension mechanism, so developers can take them for a spin and provide feedback before OpenGL bets the farm on them. And kudos to NVidia for doing so. Big plus for GL over D3D.

I do not want to rewrite my code for a pointer API and then again N months later for a bindless API.

In a perfect world I don’t either. Then again, back to (commercial) reality, where it’s adapt or die.

Yeah, but who’s still on one of those :wink:

The vast majority. Perfectly enough for benchmarking the potential benefit.

Yes, it is. Go for it. But I also agree with Alfonse Reinheart that the NV bindless API is a little bit too low-level. Please be aware that object pointers are not the same as a bindless API. I simply do not believe that object-name-to-object-pointer resolution is the bottleneck due to cache misses. It can be done without memory indirection at all, e.g. as an index into a vector of objects.

It’s extremely likely that the OpenGL drivers use a hashtable, rather than an array/vector. Vectors don’t make good data structures for this purpose, especially if you need to support the user specifying their own handles (which ATI and NVIDIA’s drivers do).

Regards
elFarto

if you need to support the user specifying their own handles (which ATI and NVIDIA’s drivers do).

they don’t need to (in OpenGL 3.0+)

Application-generated object names - the names of all object types, such as
buffer, query, and texture objects, must be generated using the corresponding
Gen* commands. Trying to bind an object name not returned by a Gen*
command will result in an INVALID_OPERATION error. This behavior is already
the case for framebuffer, renderbuffer, and vertex array objects.

they don’t need to (in OpenGL 3.0+)

But since the same driver has to support both, they can’t assume that.

They are not stupid. They can have two modes: pre-3.0 and 3.0+. A decent junior programmer must be able to do that in C++. This is typical low-hanging fruit: replacing one module to get rid of the cache misses incurred when translating object names to pointers.
Do not change an API just because you think you cannot make a better implementation of it. Otherwise you will only be rewriting your code all the time. You may think the new code will be better, until you realize you are in the same sh*t. Designing APIs and making compatible implementations is hard.