batch rendering/render state management

what about render state management?

Something like glBeginBatchRender() and glEndBatchRender(), where all OpenGL commands between begin and end are collected, the state switches are optimized (sorting by shaders, textures …), and the actual rendering starts at glEndBatchRender().

So you don’t have to build your own render state sorting system. One issue would be front-to-back ordering. But with multipass algorithms where you fill the z-buffer once (for example with shadow volumes) this is no problem. And maybe some hint system could be implemented where the user could influence the driver’s sorting by saying some sections are more important than others:

glBeginBatchRender();
glPriorityHint(0);
glPriorityHint(12);
// render this section before the other ones
// with lower priority
glPriorityHint(0);
glEndBatchRender();

[This message has been edited by valoh (edited 03-18-2004).]

This is contrary to the immediate mode design of OpenGL, which says that commands are always executed in the order in which they are received.

A hint system like you propose also suggests that implementors are free to ignore it, in which case different drivers could render vastly different things when given the exact same sequence of commands.

– Tom

Originally posted by Tom Nuydens:
This is contrary to the immediate mode design of OpenGL, which says that commands are always executed in the order in which they are received.

Yes, but only for the section between glBeginBatchRender() and glEndBatchRender(). What would be the problem with that?

It’s always suggested to sort your rendering to minimize state changes, so why not provide API functionality for that?

[b]
A hint system like you propose also suggests that implementors are free to ignore it, in which case different drivers could render vastly different things when given the exact same sequence of commands.

– Tom[/b]
OK, hint is perhaps the wrong term. The priority setting for the different sections should be mandatory. No command with lower priority should be executed before all commands with higher priority have been executed. So let’s call it glRenderPriority(int) rather than glPriorityHint(int).

I really don’t see the use of it. If you have real integer values for the batches you want to render, why not just sort them yourself?

Letting the driver cache and sort them afterwards cannot be any good. First, the driver needs to cache all the data you send to it, doubling the memory usage and possibly causing an out-of-memory error, and the allocations alone lower the speed. And if your ‘object’ is calculated in real time (with millions and millions of polygons), you impose a maximum number of polygons you can render, instead of the current system, which lets an unlimited number of polygons pass through the render pipeline.

The priority setting wouldn’t be the normal case. It is only for when you have constraints, like rough front-to-back rendering, opaque before transparent triangles, and so on. If you don’t care, everything is sorted by the driver. The priority setting would be used to create sub-batches, which are rendered according to their priority setting.

OK, with glBegin(primitive)/glEnd() calls this would really make no sense. But with vertex_buffer_object or vertex arrays, no huge amounts of data need to be cached.

You don’t think it makes sense to have API support for batching/render state sorting?
State sorting is a commonly suggested technique, so why not support it in the API?
Are the state sort scenarios so application dependent that it would prevent API support?

You don’t think it makes sense to have API support for batching/render state sorting?
OpenGL is a low-level API. As such, there are things it should do and things it shouldn’t. Reordering rendering commands is one of the things it shouldn’t.

Originally posted by Korval:
OpenGL is a low-level API. As such, there are things it should do and things it shouldn’t. Reordering rendering commands is one of the things it shouldn’t.
And you think OpenGL should remain on the low level where it is now? You don’t have to use these “high-level” features.
A shading language with a built-in compiler is also not so low-level.

IMO it wouldn’t be a disadvantage to raise the level of abstraction. And I don’t mind whether that goes into the core of OpenGL, GLU, GLUT, or some other related standard API, but I think that to stay competitive with Direct3D (apart from the OS dependency) it wouldn’t be a bad idea to offer more high-level functionality.

So I’m more interested in the technical arguments against such batching / render state sorting.

BTW, regarding low-level: what do you think of the effect file concept in DirectX? I haven’t used effect files yet, but a more data-driven render state configuration seems promising.

And you think OpenGL should remain on the low level where it is now? You don’t have to use these “high-level” features.
It’s not a question of what I use. It’s a question of what driver developers should spend their time on. Nowadays, they have to write a full-fledged compiler into their drivers. At least, however, there is some minor argument for that, because they have all the information necessary to optimize this. However, now you’re asking for them to reorder object rendering for you.

Why not just ask them for the Unreal 2k4 engine while you’re at it? Object reordering is clearly an issue for the higher level to deal with. OpenGL is a low level library and it should remain that way. Other code can be built on top of it.

A shading language with a built-in compiler is also not so low-level.
If you do a forum search, you’ll find I wasn’t particularly happy about that one either. At least, not about having a full high-level shader compiler in the driver.

I think that to stay competitive with Direct3D
D3D isn’t moving in this direction either. If anything, D3D is lower level than OpenGL. Just look at VBO vs. vertex buffers. VBO is much higher level; the API lets the driver do virtually anything behind the scenes. D3D, however, forces the user to deal with some of the memory management issues.

So I’m more interested in the technical arguments against such batching / render state sorting.
Oddly, the answer to that question is one of the issues :wink: There is no technical issue, because the hardware is not involved. It wouldn’t be done in hardware; if implemented, it would be a purely software solution. Every frame, the driver would build up some render list, reorder it, and send it using the OpenGL API.

This is a prime candidate for an add-on library; not GL itself.

OpenGL drivers create subbatches anyway. Every time you change state and issue the next draw command, you get a new batch. State machines can do that.

State sorting is a commonly suggested technique, so why not support it in the API?
State sorting is effective, because it reduces redundant state changes.
I.e.
1) bind Texture A, render
2) bind Texture A again, render
3) bind Texture B, render
4) bind Texture A again, render

1=>2 would be just stupid. If you can’t be bothered to write something like

if (current_texture!=desired_texture) ...

, you deserve punishment.
2=>3 is a good reason for a new batch, okay.
2=>3=>4 is bad luck. I assume this is what you’re trying to get rid of here.
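
A minimal sketch of the redundancy check for the 1=>2 case, assuming a hypothetical TextureBinder wrapper and a counting stub (driver_bind_texture) standing in for the real glBindTexture call so that it runs without a GL context:

```cpp
#include <vector>

// Stand-in for glBindTexture (assumption, not a GL call).
// It only counts how many binds would actually reach the driver.
static int g_driver_binds = 0;
static void driver_bind_texture(unsigned texture) {
    (void)texture;
    ++g_driver_binds;
}

// Hypothetical helper: remembers the last bound texture and skips repeats.
struct TextureBinder {
    bool has_current = false;
    unsigned current = 0;

    void bind(unsigned desired) {
        if (has_current && current == desired) return;  // redundant bind, skip it
        driver_bind_texture(desired);
        current = desired;
        has_current = true;
    }
};

// Replays a sequence of texture ids and returns how many binds were issued.
int count_binds(const std::vector<unsigned>& sequence) {
    g_driver_binds = 0;
    TextureBinder binder;
    for (unsigned tex : sequence) binder.bind(tex);
    return g_driver_binds;
}
```

With the A, A, B, A sequence from the list above, this filter removes the second bind of A but still issues three binds; only reordering the draws gets it down to two.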

To do this properly, you need to
a) “hold back” a batch for some time, to be able to check it against later batches and merge them.
b) compare state vectors. If they are equal, corresponding batches can be merged. If there are only small differences between two state vectors, the corresponding batches can be moved closer to each other (hello, Mr travelling salesman).

(a) means added latency, lots of bandwidth, and a bigger memory footprint.
(b) is a much bigger problem for an OpenGL driver than it is for a scenegraph. GL state is a lot of stuff. Scenegraph state isn’t.
A scenegraph also has its geometry data hanging around in memory already (it submits it to the GL, after all). Why make extra copies?

It can be done, sure, and it can be made consistent. But it can never be as efficient as a well implemented home-grown solution that doesn’t need to handle everything.

Let me get on my little soapbox here.
glDrawRangeElements was an extension and became core. IMO it’s a good addition. Why? Because it can do things that can’t be layered on top of GL. Range couldn’t be specified on the old glDrawElements interface, so the only purpose of the extension (potential for more efficiency) would have been lost in a layered approach.
Can your suggestion be layered on top of GL? Yes, it can be. I don’t see any problems with that.
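
A rough sketch of what such a layer might look like, assuming a hypothetical DrawItem record that carries only a priority, a shader id and a texture id (real GL state is much larger, which is exactly the cost discussed above). Priority is treated as a hard ordering constraint and state is only used to break ties:

```cpp
#include <algorithm>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical record for one queued draw call; all field names are assumptions.
struct DrawItem {
    int priority;      // higher priority is drawn first
    unsigned shader;   // program id
    unsigned texture;  // texture id
    int id;            // identifies the draw, for illustration only
};

// Orders a batch: priority first (descending), then shader, then texture.
// stable_sort keeps submission order among draws with identical keys.
void sort_batch(std::vector<DrawItem>& batch) {
    std::stable_sort(batch.begin(), batch.end(),
        [](const DrawItem& a, const DrawItem& b) {
            if (a.priority != b.priority) return a.priority > b.priority;
            return std::tie(a.shader, a.texture) < std::tie(b.shader, b.texture);
        });
}

// Counts how often shader or texture differs from the previous draw.
int count_state_changes(const std::vector<DrawItem>& batch) {
    int changes = 0;
    for (std::size_t i = 1; i < batch.size(); ++i) {
        if (batch[i].shader != batch[i - 1].shader) ++changes;
        if (batch[i].texture != batch[i - 1].texture) ++changes;
    }
    return changes;
}
```

The application would fill the vector during the frame, call sort_batch once, and then issue the real GL calls in the resulting order. A plain lexicographic sort like this is only a heuristic; it does not attempt the travelling-salesman style minimization of partial state differences.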

Originally posted by valoh:

OK, with glBegin(primitive)/glEnd() calls this would really make no sense. But with vertex_buffer_object or vertex arrays, no huge amounts of data need to be cached.

You are allowed to change the content of a vertex array or VBO directly after you have drawn it, so the driver really would need to cache all that data somehow.
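
To illustrate the point without a GL context, here is a small sketch using a hypothetical DeferredDraw record. It keeps both a pointer to the application's array and a copy taken at submission time, so the two behaviours can be compared after the application overwrites its data:

```cpp
#include <vector>

// Hypothetical deferred draw record; a real driver would hold far more state.
struct DeferredDraw {
    const std::vector<float>* live;  // points at the application's array
    std::vector<float> snapshot;     // copy taken when the draw was submitted
};

// Queues a draw, keeping both a pointer and a copy of the vertex data.
DeferredDraw submit(const std::vector<float>& vertices) {
    return DeferredDraw{&vertices, vertices};
}

// What a deferred renderer would read if it kept only the pointer.
float first_vertex_without_copy(const DeferredDraw& d) { return (*d.live)[0]; }

// What it reads if it cached the data when the draw was issued.
float first_vertex_with_copy(const DeferredDraw& d) { return d.snapshot[0]; }
```

If the application writes to its array between submit and the deferred execution, only the snapshot still holds what was meant to be drawn, and that copy is the extra memory and bandwidth cost.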

You are probably right, and it’s not a good idea for OpenGL.

Nevertheless, I think this would be a very useful feature for some kind of (standard) library. The problem I see there is that you have to replicate all the render state and wrap the OpenGL interface in some way, which makes it a very heavyweight library with many constraints on your program. So my first (naive?) idea was to integrate it somehow with OpenGL.

And you are right that this isn’t a feature with direct hardware support, but I think a useful graphics API should also provide higher-level features. And the level of abstraction rises over the years.

btw: glutSolidTeapot isn’t very low-level. Ok, glut is not OpenGL but it is closely related.

Perhaps I should instead suggest: provide a (better) high-level standard API closely related to OpenGL :wink:

Yes, I know you can implement all of this yourself, but does it make sense? And I think this has nothing to do with “please provide Unreal 2k4 Engine features” but more with why everybody should reinvent the wheel. OK, I can use some huge existing scene graph library, but unfortunately I then have to use the complete system. I think render state sorting is an orthogonal feature which depends only on render state and render order, and I want to use it without some other scene graph system. And since everybody says to sort your states for speed, and this is a somewhat application-independent procedure, I wonder whether a general API solution or support would be possible and wanted. Well, wanted obviously not :slight_smile:

Of course for almost every problem a home-grown solution could be faster than a more general solution, but still most people prefer some general system and high-level over low-level.