
GL3 and matrices



V-man
11-14-2007, 04:18 AM
Sorry, I can't resist.

Will the standard matrices be eliminated? Will there still be
glMatrixMode();
glFrustum();
glOrtho();
glTranslate();
glRotate();
glScale();

or do we just make our own uniforms in the shader?
Of course, what about ftransform()?

Groovounet
11-14-2007, 06:40 AM
They said at the last OpenGL BOF: "No more matrices, and no more matrix stacks." That's part of the fixed pipeline, so there's no reason to keep it. Our own uniforms are the right way to go.

No more ftransform as well.
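
For illustration, a minimal sketch of what "our own uniforms" means in practice, using the GL 2.x entry points (an active GL 2.x context is assumed, and the names u_modelViewProj and a_position are made up for this example):

/* Vertex shader: the matrix is an ordinary user-declared uniform. */
static const char *vs_src =
    "uniform mat4 u_modelViewProj;\n"
    "attribute vec4 a_position;\n"
    "void main()\n"
    "{\n"
    "    /* was: gl_Position = ftransform(); */\n"
    "    gl_Position = u_modelViewProj * a_position;\n"
    "}\n";

/* At draw time, upload the matrix you computed yourself on the CPU. */
void set_mvp(GLuint program, const GLfloat mvp[16])
{
    GLint loc = glGetUniformLocation(program, "u_modelViewProj");
    glUseProgram(program);
    glUniformMatrix4fv(loc, 1, GL_FALSE, mvp); /* column-major, as GL expects */
}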

knackered
11-14-2007, 07:08 AM
I'm not entirely sure I agree with that decision. Removing something as fundamental as the modelview/projection matrices and ftransform() from the hardware has got to be a bad thing. They can remove everything else (light parameters etc.), but this one thing seems to be a backward step.

Jan
11-14-2007, 08:17 AM
I agree with knackered. Also, I don't remember them stating it that radically, so I'm not sure ALL matrix stuff will be gone.

You will at least need a viewport and depth-range. So why not tell the API the modelview and projection matrices as well? Also, if the new and shiny display lists are actually supposed to be able to cull geometry, the API needs the modelview and projection matrices and the vertex shader needs to use ftransform. Otherwise the driver cannot do any culling.

I am pretty sure that ftransform will stay. And matrices will stay in some form as well. Stacks will be gone, sure, no problem. Matrix operations like glTranslate, glRotate, etc. will be gone too. Maybe even glMultMatrix and glLoadIdentity, but doing EVERYTHING through shaders and uniforms would be, IMO, a step backwards.

Jan.

Trenki
11-14-2007, 09:00 AM
I agree that viewport and depth-range have to stay, but the modelview and projection matrices are in the end just uniform parameters and IMHO should therefore not get any special treatment.

Display lists? I was hoping they would go away too. Are you suggesting they will stay in a modified version?


Overmind
11-14-2007, 09:21 AM
It would be easy to provide glu functions for all the old matrix functions. Take this together with the implicitly defined "default" uniform block they were talking about a while ago, and you have the same semantics and the same syntax (plus an extra 'u' ;) ) as before, without bothering the spec.
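
As a rough illustration of such a glu-style helper, here is a sketch of a CPU-side replacement for glTranslatef; the mat4 type and the name gluxTranslatef are invented for this example, and the resulting matrix would end up in a uniform rather than in driver state:

#include <string.h>

typedef struct { float m[16]; } mat4;   /* column-major, like OpenGL */

void mat4_identity(mat4 *r) {
    static const float I[16] = {1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1};
    memcpy(r->m, I, sizeof I);
}

void mat4_mul(mat4 *r, const mat4 *a, const mat4 *b) {  /* r = a * b */
    mat4 t;
    for (int c = 0; c < 4; ++c)
        for (int row = 0; row < 4; ++row) {
            t.m[c*4 + row] = 0.0f;
            for (int k = 0; k < 4; ++k)
                t.m[c*4 + row] += a->m[k*4 + row] * b->m[c*4 + k];
        }
    *r = t;
}

void gluxTranslatef(mat4 *m, float x, float y, float z) {
    mat4 t;
    mat4_identity(&t);
    t.m[12] = x; t.m[13] = y; t.m[14] = z;   /* translation lives in the 4th column */
    mat4_mul(m, m, &t);                      /* post-multiply, like glTranslatef did */
}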

Zengar
11-14-2007, 09:24 AM
Some geometry-based display lists are rumoured to stay in the new API.

Personally, I see no reason for keeping the matrix stack, although knackered's and Jan's arguments do sound plausible.

Roderic (Ingenu)
11-14-2007, 10:51 AM
I'd rather go 100% matrix-free and have everything done through shader uniforms.
I doubt we use those functions much anymore anyway...

knackered
11-14-2007, 12:19 PM
We would kiss goodbye to culling of 'geometry lists' and any form of optimised transform hardware.
I'm all for a programmable pipeline, but something as fundamental as this should be isolated for possible hardware acceleration opportunities.
And what do you mean "I was hoping display lists would go away"? What harm have they ever done to you? They seem like an eminently sensible acceleration opportunity, they always did and they always will.
Fundamentalists, the whole bally lot of you.

Brolingstanz
11-14-2007, 12:27 PM
hear, hear.

Korval
11-14-2007, 12:34 PM
Removing something as fundamental as the modelview/projection matrices and ftransform() from the hardware has got to be a bad thing.

Why?

We're going 100% glslang here. As such, what's the benefit of having matrix stacks?

And what about those of us who, for example, like the concept of a specific camera matrix and the concept of world space? The only reason to provide fixed-function matrix stacks is if you expect the context or something in the rendering system to actually look at them and infer something from it.

And that means that users of GL 3.0 will gain some benefit in adopting the FF-matrix stack in lieu of a set of matrices that they might otherwise prefer.


optimised transform hardware

There isn't any; that's all gone. Once hardware went all shader and dumped the FF stuff, it was all removed.

knackered
11-14-2007, 01:30 PM
No, the 3dlabs Realizm was one of the first SM2.0-capable cards, but I know for sure it had dedicated transform circuitry which kicked in when the shader hit ftransform. I was told that the vertex was transformed in parallel with the vertex shader being executed.
Granted, they're out of the workstation business now, but it at least shows that they appreciated the savings they could make by doing something 99.9% of vertex shaders will do completely in parallel with the rest of the shader's execution - and the API made it possible. You're going to remove that for no other reason than as part of a clean-up operation.
Let's not throw the baby out with the bath water.
BTW, I'm not advocating the stack, or even the separate matrices, just the ability to declare a single user-defined mat4 uniform as the VertexTransform uniform and the ftransform() function. That's all I'd suggest.

Actually forget it, after reading back what I've written, the argument to keep it is pretty lame.

Humus
11-14-2007, 02:42 PM
So why not tell the API the modelview and projection matrices as well?

The modelview and projection matrices are good concepts in the future too, but rendering is in no way tied to such concepts, so the API doesn't have to know your application uses them. A lot of rendering uses neither a modelview nor a projection matrix. And for the cases where you do, it's often better to use a fused MVP matrix, but not always (for instance when rendering lots of low-poly objects with different modelview matrices). The developer is most likely in a better position than the driver to know which will be faster for his work.
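
A small sketch of the trade-off described here (loc_mvp, loc_proj, loc_mv and multiply4x4 are placeholders for the application's own uniform locations and its own 4x4 helper):

/* Option A: fuse modelview and projection once on the CPU; one uniform
 * update per object and one matrix multiply per vertex in the shader. */
void upload_fused(GLint loc_mvp, const GLfloat proj[16], const GLfloat mv[16])
{
    GLfloat mvp[16];
    multiply4x4(mvp, proj, mv);                     /* assumed CPU-side helper */
    glUniformMatrix4fv(loc_mvp, 1, GL_FALSE, mvp);
}

/* Option B: keep them separate, e.g. for many low-poly objects sharing one
 * projection; only the small modelview update changes per object, and the
 * shader does u_proj * (u_mv * a_position) per vertex instead. */
void upload_separate(GLint loc_proj, GLint loc_mv,
                     const GLfloat proj[16], const GLfloat mv[16])
{
    glUniformMatrix4fv(loc_proj, 1, GL_FALSE, proj);  /* typically once per frame */
    glUniformMatrix4fv(loc_mv,   1, GL_FALSE, mv);    /* once per object          */
}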

Humus
11-14-2007, 02:46 PM
And what do you mean "I was hoping display lists would go away"? What harm have they ever done to you?

Adding overhead to every single GL function. With display lists in the API every GL entry point will have to include a check to see if we're currently assembling a display list.

dorbie
11-14-2007, 03:40 PM
I also have concerns that the elimination of matrix operations is a mistake, especially for newer developers and for code portability/sharing between libs in general; OpenGL ES 2.0 has the same design in this respect. (I don't really care if your opinion is that you can write it in 20 minutes with a strong cup of coffee, Mr Trevett.)

As for display lists, they were always more abused than used. With DrawElements (etc.) and VBOs being the preferred path, the right thing is the elimination of display lists; it is a win for development and validation of drivers, support, etc. Clearly the price is worth it. Yes, some of us have to port procedural, non-optimal rendering code that could trivially build a display list for reuse, but when we're using the driver layer as an allocator and data formatter maybe we deserve and need the exercise. The lack of a trivial glBegin/glEnd-style immediate mode for trivial stuff is related and at times more annoying, but the benefits are even more obvious in terms of complexity, performance and validation; again, well worth it.

In general, going to an entirely programmable system has issues for state-driven rendering paradigms (ingrained in a lot of higher-level rendering apps & middleware) due to the need for on-the-fly compilation for trivial 'discovered' state changes or complex state-driven shaders, but this is the nature of the beast. The benefits outweigh the problems.

These changes do make OpenGL more difficult to use, especially for a beginner, but they also make it more difficult to abuse and significantly easier to deliver robust, tested drivers. As a whole it is worth the pain, and we can almost certainly expect to see matrix helper functions in a new glu lib.

knackered
11-14-2007, 04:06 PM
Adding overhead to every single GL function. With display lists in the API every GL entry point will have to include a check to see if we're currently assembling a display list.
Well, I only meant the geometry type of display list. Of course you'd bin the state display lists.
Anyway, surely the gl functions have a jump block? In other words, as soon as you hit a glNewList() you'd swap the function pointer table to one that feeds the dlist compiler.
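
A purely hypothetical sketch of that "jump block" idea in driver-internal terms; the types and table here stand in for whatever an implementation actually uses:

typedef unsigned int GLenum;   /* stand-ins for the real GL typedefs */
typedef int          GLsizei;
typedef unsigned int GLuint;

typedef struct {
    void (*DrawElements)(GLenum mode, GLsizei count, GLenum type, const void *indices);
    /* ...one pointer per GL entry point... */
} DispatchTable;

static DispatchTable immediate_table;   /* executes commands directly          */
static DispatchTable dlist_table;       /* records commands into the open list */
static DispatchTable *current = &immediate_table;

void glDrawElements(GLenum mode, GLsizei count, GLenum type, const void *indices)
{
    current->DrawElements(mode, count, type, indices);  /* one indirect call, no if */
}

void glNewList(GLuint list, GLenum mode)
{
    current = &dlist_table;   /* from here on, calls feed the display-list compiler */
    /* ...remember 'list' and 'mode' until glEndList()... */
}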

dorbie, it's not (just:)) about laziness it's about giving the driver the opportunity to optimise the geometry for its particular implementation. It may want to interleave, or it may want to upcast int8's to int32's because it knows they'll be faster on this particular hardware revision. It may wish to cull.

Korval
11-14-2007, 04:40 PM
Anyway, surely the gl functions have a jump block?

The user might have cached that function pointer, so the "jump block" would have to be on the other side of every function call. As such, it's still overhead; just not a whole lot.

In any case, the point is totally moot because GL 3.0's API for geometry display lists will be object based like everything else. So creating a display list would involve creating a display list template object, filling out the appropriate parameters (the VAO and draw call you use with it), and then building the object. It's one entrypoint for building the list, maybe one for deleting it, one for binding a display list to the VAO slot in the context, and one for rendering with a display list instead of a VAO.
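
Restated as a purely hypothetical header, just to make the shape of such an API concrete; none of these names exist anywhere, they only mirror the handful of entry points Korval describes:

#include <GL/gl.h>

/* Hypothetical only -- not part of any spec or driver. */
GLuint glCreateGeometryListTemplate(void);                 /* the template object      */
void   glGeometryListSource(GLuint tmpl, GLuint vao,
                            GLenum mode, GLsizei count,
                            GLenum indexType);             /* the VAO + the draw call  */
GLuint glBuildGeometryList(GLuint tmpl);                   /* build the list           */
void   glDeleteGeometryList(GLuint list);                  /* delete it                */
void   glBindGeometryList(GLuint list);                    /* bind to the VAO slot     */
void   glDrawGeometryList(GLuint list);                    /* render with it instead
                                                              of a plain VAO           */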

dorbie
11-14-2007, 09:21 PM
dorbie, it's not (just:)) about laziness it's about giving the driver the opportunity to optimise the geometry for its particular implementation. It may want to interleave, or it may want to upcast int8's to int32's because it knows they'll be faster on this particular hardware revision. It may wish to cull.

The driver is not going to tristrip for index caching. You will still have to do the hard stuff yourself for most if not all of the gains. The driver cannot know the shader (well, it might, I suppose, but that's not always the case), and even so you have arbitrary attribute intent and arbitrary shader code, so it can't cast (it has no knowledge of intent; I guess promotion would be OK, but I'm sure it's pointless). As for data packing... I dunno. I think the front ends of all these pipelines need to support contiguous and strided batched prefetch, and they are particularly fast from VBOs in VRAM (we're inevitably talking about non-volatile data here); they're very good at this, it's what they do. I think it may be a dated argument.

Then consider the development cost, not just to IHVs but to everyone, in terms of features and driver quality... If you want fast, stick some big VBOs in VRAM as indexed, cache-coherent tristrips; once you do that I'm skeptical there's anything more to be gained that's worth the expense and effort, unless you've done something asinine like use non-interleaved strided data, in which case you get what you deserve. Finally, is display list support really worth it if there is only a marginal benefit to be gained in an app anyway?

I think we should relegate them to the past.

P.S. On culling you have a point, but only a fool would rely on display list culling for optimization; again, IMHO you get what you deserve, and this is firmly in the display-list-abuse category and a complete misunderstanding of their correct usage. Although in the benchmarketing stakes this has probably happened at times, and I know MUCH worse has.

Ask me over a beer which company did a bound box test on display lists and used it to simply drop half of the lines drawn to cheat on the Viewperf CDRS benchmark because they knew they were near degenerate. It's a sleazy business..... Not the culling you had in mind.

Korval
11-14-2007, 10:43 PM
The driver is not going to tristrip for index caching.

nVidia's drivers do.

The only problem with display lists is that you cannot guarantee that an implementation will make it faster.

But you can guarantee that it won't get slower.


to everyone in terms of features and driver quality

At a bare minimum, they will just take the VAO, copy the data out of the bound buffers into an identical VAO. It'd take maybe 4 hours to implement.

Trenki
11-15-2007, 04:14 AM
Well, display lists complicate things for the driver writer. If you have them each function has to check if it should be recorded into a display list. Then the display list stuff has to be stored somewhere and the driver would also need to be able to play them back. Life for the driver writer would be much easier if they didn't exist.


Groovounet
11-15-2007, 07:29 AM
Display lists are probably the best way to get high performance, but I agree they are an issue for driver writers (has anyone not complained about drivers?) and they also complicate things for the programmer. For a small sample, display lists are just great, but with a large amount of source code... it gets a bit more complicated. A solution would have been to limit their use to geometry, but what could they do that VBOs couldn't in that case?

Maybe in the future display lists will make their comeback, but right now I just want an OpenGL 3 spec in hand for the future!

Any bets on a GL3 spec for SIGGRAPH 2008?

Korval
11-15-2007, 02:02 PM
If you have them each function has to check if it should be recorded into a display list. Then the display list stuff has to be stored somewhere and the driver would also need to be able to play them back. Life for the driver writer would be much easier if they didn't exist.

Please pay attention to the thread. Geometry-only display lists do not work that way. They do not record anything. You do not build them by pretending to run the rendering system.

V-man
11-15-2007, 06:19 PM
If you have them each function has to check if it should be recorded into a display list. Then the display list stuff has to be stored somewhere and the driver would also need to be able to play them back. Life for the driver writer would be much easier if they didn't exist.

Please pay attention to the thread. Geometry-only display lists do not work that way. They do not record anything. You do not build them by pretending to run the rendering system.

I'm confused. If you create some GL object and just feed it your vertices and the driver optimizes it, how is this different from a static VBO?

Someone said that the nVidia driver optimizes display lists (because it does scene culling). It should be able to do the same for a static VBO.

Geometry-only display lists seem useless to me.

Korval
11-15-2007, 08:20 PM
If you create some GL object and just feed it your vertices and the driver optimizes it, how is this different from a static VBO?

Because, while a static VBO will likely be in video memory, the arrangement of vertices and elements is exactly and only what you asked it to be. Which means that it does not have to conform to what the hardware would like it to be.


Someone said that the nVidia driver optimizes display lists (because it does scene culling).

No, nVidia's display list optimizations include culling; they aren't only culling. They also include proper stripping (for the specific hardware caches of the chip) and so forth.

dorbie
11-16-2007, 06:12 PM
@korval... you're saying NVIDIA *drivers* will tristrip and re-index calls sent to a display list for vertex cache optimization?! I don't believe you; where did you read that? It wouldn't even have a rational vertex set to begin with. At a minimum, known weaknesses in nvtristrip make this a nasty proposition. If you're saying it will regurgitate anything indexed & stripped that you send it, just from fast memory, then that's a no-brainer and misses my point.

@Groovounet... best by what standard? The best way is to write good dispatch code from well-ordered data in fast memory. Building a display list has a cost in terms of time, support and, most of all, driver complexity. It is also an abused feature.

Culling of display lists does make a lot of sense for idiotic apps in a fixed-function pipeline, BUT with vertex shaders it's a different ballgame. You cannot cull until you have at least the positional vertex transformation (for affine transformations), and for any decent app even the legacy functionality should be redundant. So this would need all sorts of analysis just in your shader compiler to see if it's even possible, THEN you have to split your shader to realize a partial win; good luck with that.

Jan
11-17-2007, 06:49 AM
If you use ftransform in your shader, the driver can flag that shader as "using ftransform" upon creation. When you have bound that shader to your pipeline and you render some display list, the driver knows that it can cull the object, as long as it also knows the modelview and projection matrices.

This is what is necessary to enable such a feature. I don't say it should be done, but I always see it as a good thing if an API at least exposes such possibilities for drivers to optimize things.

Even apps that use many fancy shaders will very often still do some "standard" rendering. So why remove the modelview and projection matrices and ftransform, which are not only matrices/functions but also carry some very fundamental semantics? Sure, I like a lean and mean API too. But I really don't like restricting my own and driver writers' possibilities for optimizations just for the sake of removing redundancy. You are not only removing redundancy this way, you are also removing some meta-information that can be used reasonably.

Jan.

Ysaneya
11-17-2007, 10:34 AM
We would kiss goodbye to culling of 'geometry lists' and any form of optimised transform hardware.

And IMO that's not a bad thing. Culling in display lists is heresy to me. It benefits the "lazy" programmers that create their whole scene with display lists without any form of scene graph, but any moderately advanced engine will perform its own visibility/culling processing, so the work is done twice and you actually waste performance.

Y.

Brolingstanz
11-17-2007, 11:27 AM
Incidentally, in one of the GDC 2007 presentations, one of the features mentioned for d3d10's future is a command-buffer object, which smacks to me of a DL of sorts... "Commands stored as replayable macros" ... "Fast resubmission of common command set."

Hmmm........ seems reasonable.

Humus
11-17-2007, 03:53 PM
Culling in display lists is heresy to me.

I agree. There are also side effects such as if you do any form of profiling with hardware counters it'll register less geometry submitted than was actually done because the driver threw away some stuff.

Xmas
11-17-2007, 07:49 PM
I agree. There are also side effects such as if you do any form of profiling with hardware counters it'll register less geometry submitted than was actually done because the driver threw away some stuff.
So what? A renderer with early-Z will count much less shaded fragments than one that exactly follows the theoretical OpenGL pipeline. Yet you will certainly agree that it's a perfectly valid optimization.

dorbie
11-17-2007, 09:21 PM
I agree. There are also side effects such as if you do any form of profiling with hardware counters it'll register less geometry submitted than was actually done because the driver threw away some stuff.
So what? A renderer with early-Z will count much less shaded fragments than one that exactly follows the theoretical OpenGL pipeline. Yet you will certainly agree that it's a perfectly valid optimization.

I agree it's a valid optimization (unlike the abusive example I gave) but it helps the worst of developers and only adds overhead to the best since it should never succeed in culling anything, it would be redundant. If your app relies on this you have issues as a developer. I have also pointed out the potential for problems doing this with more advanced vertex shaders.

Xmas
11-18-2007, 04:02 AM
I agree it's a valid optimization (unlike the abusive example I gave) but it helps the worst of developers and only adds overhead to the best since it should never succeed in culling anything, it would be redundant. If your app relies on this you have issues as a developer. I have also pointed out the potential for problems doing this with more advanced vertex shaders.
Some developers may not have the luxury of time to fully optimize their application. With more advanced vertex shaders culling can just be disabled.

But back to topic, you neither need ftransform nor gl_Vertex and gl_ModelViewProjectionMatrix to have the compiler check that gl_Position is the result of a uniform mat4 * attribute vec4 operation.

Humus
11-18-2007, 02:05 PM
So what? A renderer with early-Z will count much less shaded fragments than one that exactly follows the theoretical OpenGL pipeline. Yet you will certainly agree that it's a perfectly valid optimization.

Sure, but the hardware counts culled fragments as well, so the developer will have the full picture of what's going on.

Xmas
11-18-2007, 02:32 PM
Sure, but the hardware counts culled fragments as well, so the developer will have the full picture of what's going on.
There's nothing preventing the driver from counting the culled triangles.

knackered
11-18-2007, 05:08 PM
It seems the general opinion in this thread is to remove API opportunities for the hardware to perform optimisations simply because they could be performed by the application, and therefore on the CPU. This contradicts the mantras of recent times that more graphics tasks should be off-loaded to the GPU. With geometry display lists the hardware has the opportunity to perform both frustum and occlusion culling. Without geometry display lists it is virtually impossible for the hardware to do this.

Jan
11-18-2007, 05:27 PM
There are many small programs or tools that are just hacked together to get something done, especially in the academic field. Sure, we CAN optimize all our programs. But being forced to optimize every single pi**-program just because there is NO optimization whatsoever done by the driver will be a major pain. For example, I have an editor that displays many small 3D gui-objects to manipulate selected items, each consisting of a few lines or triangles. With display lists I can make sure that I render each of them with just one draw call, instead of a bunch of glBegin/... calls. When I render like 50 of them, because the user currently has 50 objects selected, it would be nice if the driver at least did basic frustum culling.

Tools are often written by people who are not so much into the details of OpenGL and optimizations. Having at least basic optimizations, especially for such small stuff, would IMO be a good thing. One should not forget that OpenGL 3 is not ONLY intended for game programming, where a company needs to go the extra mile of optimization; it is also meant for many academic and other semi-professional purposes.

With the few state objects that D3D10 (and OpenGL 3) uses, I really don't see the point of a "command-buffer" anymore.

Jan.

Ysaneya
11-18-2007, 05:28 PM
This contradicts the mantras of recent times that more graphics tasks should be off-loaded to the GPU. With geometry display lists the hardware has the opportunity to perform both frustum and occlusion culling. Without geometry display lists it is virtually impossible for the hardware to do this.

Yes, but unfortunately that culling isn't performed by the GPU but by the CPU, hence it's done twice, wasting CPU cycles. Maybe you don't care? Fine, but I do.

I could live with a hint you could set, specifying whether DLs should do culling or not. But doing culling when I already do it... just no.

Y.

Lord crc
11-18-2007, 05:52 PM
I thought the point of OpenGL 3 was to reduce the complexity of the drivers. As such, I find it strange that display lists should be included, as they've always appeared to be a rather complex beast - not the concept, but the implementation. They modify the behavior of a rather large bunch of calls, which makes them more error-prone.

Why couldn't display lists be implemented in something like GLU? Apart from culling, from what I can see, most other optimizations could be done by such a library. It would keep the drivers clean, and people who want the ease of display lists would still be able to use them. It should also make the performance of display lists more consistent between systems.

Korval
11-18-2007, 11:29 PM
With geometry display lists the hardware has the opportunity to perform both frustum and occlusion culling.

Actually, no.

Geometry display lists imply that all that is being stored is the pre-T&L geometry itself. That is, it could be usable with any vertex shader that accepts the inputs that the pre-T&L geometry provides.

Frustum culling requires the implementation to know how the vertex shader will transform the vertices, which by GL 3.0 standards is now entirely arbitrary. Thus, no frustum culling is possible.

By "occlusion culling," I assume that you mean performing occlusion queries on some bounding region and then checking later to see if that object was visible before rendering the actual geometry. The problem there, once again, is the arbitrary T&L. The implementation cannot even tell what the input positional data is, let alone build a bounding volume that is guaranteed to encompass the post-T&L region.

No, the advantage of geometry display lists is in giving the driver the opportunity to rearrange your vertex data into a form most appropriate for rendering. For example, Humus mentioned in a previous thread the idea of making some vertex attributes accessible from a texture rather than a buffer object, to more effectively use parallelism. Well, the driver knows best when to do this, and the only information it needs to know is covered by the GL 3.0 Vertex Array Object (the shader can be patched to get its vertex data from a different place. It's a quick patch). Thus, a geometry display list from well-built drivers will be able to parse your vertex data and split it into textures and so forth and more optimally use the hardware for maximum vertex throughput.

The only way for geometry display lists to be able to perform any kind of culling would require the reinstatement of fixed-function T&L.


I thought the point of OpenGL 3 was to reduce the complexity of the drivers.

Does nobody read the thread? How many times has it been mentioned how stupidly simple it is to implement geometry display lists? Let me make it abundantly clear for you all:

Driver complexity is not a valid issue here!


Why couldn't display lists be implemented in something like GLU?

Because the purpose of geometry display lists is to get optimal performance for a particular piece of hardware. You cannot achieve that with hardware-independent code. And GL implementations cannot alter GLU.

Lord crc
11-19-2007, 12:44 AM
Driver complexity is not a valid issue here!

With "simplifying driver development" being one of the stated goals for OpenGL 3, how can it not be a valid issue?


How many times has it been mentioned how stupidly simple it is to implement geometry display lists?


Because the purpose of geometry display lists is to get optimal performance for a particular piece of hardware.

So a display list MIGHT give you optimal performance. Or should the driver just not create the display list object if it cannot deliver a (more) optimal version?

Personally I would prefer the core API to give me predictable performance between platforms. As it is now (pre gl3), that is not the case.

Xmas
11-19-2007, 04:29 AM
Frustum culling requires the implementation to know how the vertex shader will transform the vertices, which by GL 3.0 standards is now entirely arbitrary. Thus, no frustum culling is possible.

[...]

The only way for geometry display lists to be able to perform any kind of culling would require the reinstatement of fixed-function T&L.
To quote yourself: "Please pay attention to the thread."

It is entirely possible for the compiler to check whether the vertex shader calculates gl_Position as uniform mat4 * attribute vec4 (or similar). If that's the case mark the uniform as "MVP matrix" and the attribute as "vertex position". No fixed function required at all.

knackered
11-19-2007, 04:45 AM
Yes, but unfortunately that culling isn't performed by the GPU but by the CPU, hence it's done twice, wasting CPU cycles. Maybe you don't care? Fine, but I do.
so you're only arguing against frustum culling in geometry display lists, and not arguing against geometry display lists themselves?
if so, and if you've been using display lists in recent years on nvidia hardware, you've already been paying the price twice - and yet nvidia display lists enable maximum hardware spec throughput and have done for years. We're not talking about a proposed feature, this feature has been in successful use for some considerable time.
Also, the fact that you recognise the cost of frustum culling in terms of cycles should indicate to you how much your application would gain from offloading the task to the GPU. Some engineering scenes I've dealt with have cost 40% in the cull traversal alone.

Korval
11-19-2007, 01:28 PM
With "simplifying driver development" being one of the stated goals for OpenGL 3, how can it not be a valid issue?

It's not a valid issue because it has been shown several times that implementing geometry display lists doesn't make drivers more complex.


So a display list MIGHT give you optimal performance. Or should the driver just not create the display list object if it cannot deliver a (more) optimal version?

The bare minimum I would expect of a geometry display list implementation is to simply store the VAO and draw call(s?) internally, and simply regurgitate them upon command. That is, if the driver can't/won't do better than your VAO and buffers, then it will simply use your stuff directly.

It takes maybe 4 hours to code.


Personally I would prefer the core API to give me predictable performance between platforms.

Nobody's forcing you to use display lists.

Lord crc
11-19-2007, 03:32 PM
It's not a valid issue because it has been shown several times that implementing geometry display lists doesn't make drivers more complex.

I guess we're talking about different kinds of complexity then, so never mind.


The bare minimum I would expect of a geometry display list implementation is to simply store the VAO and draw call(s?) internally, and simply regurgitate them upon command. That is, if the driver can't/won't do better than your VAO and buffers, then it will simply use your stuff directly.

If it is guaranteed that GDL's would be the fastest alternative (as in no other alternative, including extensions, would be faster), then I guess it would be nice to have them. However if that isn't the case, then imho I just don't quite see the point of having them in the core API.

pudman
11-19-2007, 03:34 PM
It takes maybe 4 hours to code.

I estimate more like 3.5 hours. At least, that's the rate at which an nVidia developer would do it in the G8x GL3.0 driver. ATI I estimate 3.7 hours. It would be more like 3.4 hours without the AMD merger.

Sorry, I can't help myself. Simplicity or speed of coding the implementation plays little part in deciding an architecture. It's definitely not a reason to leave in a 'feature'.

That said, I have no opinion on this 'feature', only on subjective coding times.

knackered
11-19-2007, 05:20 PM
If it is guaranteed that GDL's would be the fastest alternative (as in no other alternative, including extensions, would be faster), then I guess it would be nice to have them. However if that isn't the case, then imho I just don't quite see the point of having them in the core API.
You have no guarantee of anything in OpenGL, just as you don't in D3D. You have only your own benchmarking to go on. If you thought otherwise, then you're naive. What you would have is a guarantee that they would be no slower than if you manually set up the objects and called the draw commands yourself.
If you seriously can't see the benefit of having this light-weight semantic in the API after all the points that have been made in this thread, then there's not much else to say.

Humus
11-19-2007, 06:25 PM
There are many small programs or tools that are just hacked together to get something done, especially in the academic field. Sure, we CAN optimize all our programs. But being forced to optimize every single pi**-program just because there is NO optimization whatsoever done by the driver will be a major pain.

But the driver is not the right place to put this. It's better put in a middleware layer. I'm sure there are plenty of open source libraries that you could use.

knackered
11-19-2007, 07:23 PM
They need to be in the ICD to benefit from any hardware acceleration or hardware specific optimisation.

Humus
11-20-2007, 02:06 PM
If you want hardware assisted culling, I think we should come up with a proper API for that instead of expecting automagic action under the hood for display lists under limited conditions. Predicated rendering is an example of a proper hardware assisted form of culling. I'm open to other ideas.

knackered
11-20-2007, 03:56 PM
Aside from not wanting the hardware to be given the opportunity to cull, have you any objections to the other reason for geometry display lists? I.e. giving the IHV the opportunity to optimise the mesh for that specific piece of hardware, automagically so to speak?

MZ
11-20-2007, 06:58 PM
from older GL3 thread:

If you want something more high-level, get an unreal engine license.

Ysaneya
11-21-2007, 03:29 AM
aside from not wanting the hardware to be given the opportunity to cull

Come on, nobody said the hardware shouldn't be able to cull. We are arguing that display lists isn't the right place for that.

knackered
11-21-2007, 04:18 AM
I think the culling has become a bit of a distraction for you all. Forget about the culling - it's just something you could get as an added bonus to geometry display lists, not their main reason for existing.

Xmas
11-21-2007, 04:08 PM
Come on, nobody said the hardware shouldn't be able to cull. We are arguing that display lists isn't the right place for that.
Well, why not? Every implementation does a lot of optimizations behind your back, why is this one particularly harmful?

knackered
11-21-2007, 04:31 PM
For me there just isn't an argument against them. It's such a simple thing to add, has so many possible benefits for prototyping (culling) and full blown apps (buffer re-formatting), and has already been proven to provide incredible performance on nvidia hardware. If the API is supposed to be a true abstraction of current and future hardware, then you have to accept that renderers use buffer objects primarily to render 'meshes' which in turn should be given their own level of abstraction - so long as it doesn't add unnecessary complexity to an implementation, which with the new object API it simply won't.

Ysaneya
11-21-2007, 05:28 PM
Come on, nobody said the hardware shouldn't be able to cull. We are arguing that display lists isn't the right place for that.
Well, why not? Every implementation does a lot of optimizations behind your back, why is this one particularly harmful?

Because it's not optimizing anything, in my case it's actually slowing me down (spending cpu cycles on something I already do).

Y.

knackered
11-21-2007, 05:44 PM
But,but,but that's an implementation detail. You have no control over what an implementation does, you just benchmark. If something slows you down on your test hardware, don't use it - if you get a boost, do use it. The same with every other opengl feature. Nobody's forcing you to use it - whether it's there or not, using it is your choice.
Why should everyone else pay the price of reduced acceleration slots just because you have some vague paranoia about the driver possibly expending some CPU cycles on some implementation detail?

Xmas
11-22-2007, 05:04 AM
Because it's not optimizing anything, in my case it's actually slowing me down (spending cpu cycles on something I already do).
If you do your own culling you're probably not using display lists. It's an optimization for the common case. And it's likely that any culling used for display lists is inexpensive.

Overmind
11-22-2007, 05:21 AM
Could anyone please explain to me what can be culled in display lists? Because at display list creation time, the modelview matrix is usually unknown, so how could the driver possibly know what to cull?

Xmas
11-22-2007, 05:49 AM
The driver can generate bounding volumes at display list creation time. Then at draw time it can transform the bounding volumes and check whether they are completely outside the view frustum.
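
A hypothetical driver-side sketch of that check, assuming the compiler has already identified which uniform acts as the MVP matrix (as discussed earlier in the thread); the list is skipped only if all eight corners of its bounding box land outside the same clip plane, which is conservative and correct for a convex box. All names are invented for illustration:

typedef struct { float min[3], max[3]; } AABB;   /* built at list creation time */

static void to_clip(const float mvp[16], const float p[3], float clip[4])
{
    for (int i = 0; i < 4; ++i)   /* column-major: clip = MVP * (p, 1) */
        clip[i] = mvp[0*4+i]*p[0] + mvp[1*4+i]*p[1] + mvp[2*4+i]*p[2] + mvp[3*4+i];
}

int list_outside_frustum(const AABB *box, const float mvp[16])
{
    int out_lo[3] = {0,0,0}, out_hi[3] = {0,0,0};  /* per axis: x, y, z */
    for (int c = 0; c < 8; ++c) {
        float p[3] = { (c & 1) ? box->max[0] : box->min[0],
                       (c & 2) ? box->max[1] : box->min[1],
                       (c & 4) ? box->max[2] : box->min[2] };
        float clip[4];
        to_clip(mvp, p, clip);
        for (int a = 0; a < 3; ++a) {
            if (clip[a] < -clip[3]) out_lo[a]++;   /* outside the "low" plane  */
            if (clip[a] >  clip[3]) out_hi[a]++;   /* outside the "high" plane */
        }
    }
    for (int a = 0; a < 3; ++a)
        if (out_lo[a] == 8 || out_hi[a] == 8)      /* all corners past one plane */
            return 1;
    return 0;
}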

Ysaneya
11-22-2007, 11:02 AM
But,but,but that's an implementation detail. You have no control over what an implementation does, you just benchmark. If something slows you down on your test hardware, don't use it - if you get a boost, do use it. The same with every other opengl feature. Nobody's forcing you to use it - whether it's there or not, using it is your choice.

The problem is, it's not an OpenGL feature. It's an Nvidia-implementation "side effect". You cannot rely on it, and you have no way to disable it. I wouldn't complain if there was a glHint or if it was clearly specified in the GL specs that DLs had to have a culling option. At the moment, I feel like it's mixing apples and oranges. It should stick to pure geometry optimizations IMO.

Y.

Xmas
11-22-2007, 11:50 AM
The problem is, it's not an OpenGL feature. It's an Nvidia-implementation "side effect". You cannot rely on it, and you have no way to disable it. I wouldn't complain if there was a glHint or if it was clearly specified in the GL specs that DLs had to have a culling option. At the moment, I feel like it's mixing apples and oranges. It should stick to pure geometry optimizations IMO.
A similar thing could be said about early Z and stencil optimizations. You cannot rely on it, and you have no way to disable it. On some implementations an early Z pass is a total waste of time, while on others it can help performance.

Korval
11-22-2007, 12:06 PM
A similar thing could be said about early Z and stencil optimizations.

There is a difference, though. Early-Z and such are hardware features that you know will exist on modern hardware.

Display list culling is not a hardware feature; it's a driver thing. Furthermore, it is an nVidia driver-only thing.

Xmas
11-22-2007, 03:55 PM
There is a difference, though. Early-Z and such are hardware features that you know will exist on modern hardware.

Display list culling is not a hardware feature; it's a driver thing. Furthermore, it is an nVidia driver-only thing.
Early-Z may stop working on some GPUs under certain conditions, but not on others. It's something you have to know about the implementations if you want to rely on it. Same thing for display list culling. And in the end it doesn't matter much whether it's hardware or software, it's an implementation detail.

And the only way you can tell whether a pre-Z pass is worthwhile is by benchmarking it.

knackered
11-22-2007, 05:10 PM
Still nothing new about GL3. Almost December now.

Korval
11-22-2007, 07:50 PM
Early-Z may stop working on some GPUs under certain conditions, but not on others. It's something you have to know about the implementations if you want to rely on it. Same thing for display list culling. And in the end it doesn't matter much whether it's hardware or software, it's an implementation detail.

Here's the difference.

The likelihood of your GL application being run on a non-early-z implementation is remote, according to Valve's most recent Steam survey.

The likelihood of your GL application being run on a non-display-list-culling implementation is much greater; approximately 50%.

One of them exists because just about every scan-line renderer needs it for performance reasons. I would expect Intel's Larrabee or whatever they call it to likewise provide it or something similar.

The other exists primarily because one IHV decided to spend some time to make it work.

Therefore, it is reasonable to rely on early-z behavior and not on display list culling if you're interested in cross-platform portability.


Still nothing new about GL3. Almost December now.

I'm guessing we will hear something one way or another by the end of the month.

I just wish the ARB had the same commitment as the C++0x team. Both of them know that they're very late getting these features to us, but the C++0x guys have set a firm deadline (end of 2009), and are willing to make sacrifices where necessary to achieve this.

Xmas
11-23-2007, 04:00 AM
Here's the difference.

The likelihood of your GL application being run on a non-early-z implementation is remote, according to Valve's most recent Steam survey.

The likelihood of your GL application being run on a non-display-list-culling implementation is much greater; approximately 50%.
Well, that's now. At some point in the past the percentage of early-Z hardware was low, and while you did not rely on it you still used it for a performance boost on certain hardware. That's the same way you can use display list culling.

Overmind
11-23-2007, 06:27 AM
I just wish the ARB had the same commitment as the C++0x team. Both of them know that they're very late getting these features to us, but the C++0x guys have set a firm deadline (end of 2009), and are willing to make sacrifices where necessary to achieve this.

If I could choose, I'd rather have them extend the deadline than make sacrifices.

The API cleanup is so long overdue, one or two additional months don't really matter. But design errors do matter, because once the spec is out and implemented, we're stuck with it.

Overmind
11-23-2007, 06:34 AM
That's the same way you can use display list culling.

The difference is that display list culling is something that you can do yourself. Any sensible engine culls geometry. This is something you have to do yourself anyway, because the driver doesn't have any information about your scene graph. Just throwing the whole scene at the driver isn't going to work, no matter if it culls display lists or not.

So if the driver culls display lists, it's going to do something you have already done yourself, so there is no performance boost. And even if you haven't, at least you *could* have done it.

It could even be done by a utility library if you're too lazy to implement it, so it really has no business being in the driver.

On the other hand, early Z is something you cannot do yourself.

Jan
11-23-2007, 06:39 AM
1) What makes you SO sure ATI does not use display-list culling, too? Their performance is much worse than nVidia's but maybe they DO use basic (bounding-sphere) culling (but nothing else).

2) Display lists in OpenGL 2 are quite a complex beast, given that you can do all those state-changes in them. Quite a good reason for ATI (or anyone, actually) to not bother optimizing them. OpenGL 3 display lists will be so damned simple (geometry only, period). So what makes you so damned sure ATI won't do culling for them?

You are expecting a lot of laziness from ATI, although they are actually doing it exactly the right way: they optimize things that make sense. Optimizing OpenGL 2.x display lists (beyond culling them) just makes no sense. Including state-changes into them was just a bad design-decision from the start.

Whether some optimization is done on the CPU or in hardware is not that important. Whether it makes sense depends on the speed increase.

Early-z is also a wicked beast. It can break just by having a "wrong" driver version installed, or you use some state configuration and suddenly it doesn't work. Relying on this "implementation detail" is advised by vendors to get optimal performance, though it is nowhere mentioned in the specs, and thus the behaviour is defined by the implementation. On some hardware you get early-z, hi-z, early-stencil and what not; on other hardware you get this but not that, and it's impossible to really know at runtime if the exact hardware is not determined. Relying on that feature when it is not available can be quite a performance issue.

And what would you say, if vendors would implement frustum culling for vertex-buffers / display lists in HARDWARE? Would you be against it? Certainly you would be in favor of it. Removing modelview and projection matrices and the ftransform-function from the API would remove the semantics that would allow such possible optimizations. Such an API would not be forward looking, at all.

Making everything consistent is nice, but it would not yield an improvement in every aspect.

Jan.

Xmas
11-23-2007, 08:20 AM
The difference is that display list culling is something that you can do yourself. Any sensible engine culls geometry.
As already stated in this thread some projects may not have the luxury of time to do full optimization, and don't use an "engine". If you do geometry culling you likely don't use display lists. I'd guess the purpose of this optimization is mainly to make unoptimized applications faster.


It could even be done by a utility library if you're too lazy to implement it, so it really has no business being in the driver.
Then you can't use a display list any more since they're static. A utility library can't just remove part of a display list.


Including state-changes into them was just a bad design-decision from the start.
Pure state lists would make a lot of sense.


Removing modelview and projection matrices and the ftransform-function from the API would remove the semantics that would allow such possible optimizations.
No, it doesn't. Please read the thread.

Jan
11-23-2007, 08:59 AM
1) Don't tell me to read the thread. Just because not every second post is by me doesn't mean I don't follow it carefully.

2) Yes, of course there are ways to "determine" the modelview-projection matrix, but that would be a big effort to implement. And then, we all know OpenGL 3 is supposed to REDUCE driver complexity. Reducing complexity makes it imperative to give the driver all the possibilities to easily optimize things. Removing things in a way that forces the driver to do complex analysis just to enable some basic optimization means COMPLICATING the driver.

Jan.

Jan
11-23-2007, 09:03 AM
Pure state lists would make a lot of sense.


Maybe, but that's not what we are talking about.

Xmas
11-23-2007, 12:23 PM
2) Yes, of course there are ways to "determine" the modelview-projection matrix, but that would be a big effort to implement. And then, we all know OpenGL 3 is supposed to REDUCE driver complexity. Reducing complexity makes it imperative to give the driver all the possibilities to easily optimize things. Removing things in a way that forces the driver to do complex analysis just to enable some basic optimization means COMPLICATING the driver.
You're suggesting there's a huge difference between checking that the final unconditional assignment to gl_Position in a vertex shader is gl_Position = ftransform(), and checking whether the assignment is gl_Position = <uniform matrix> * <attribute vector>.
I disagree. Both checks are quite simple for a compiler.


Maybe, but that's not what we are talking about.
I know, what I'm saying is that the usefulness of pure "state display lists" might have influenced the initial design decision.

Overmind
11-23-2007, 12:41 PM
Whether some optimization is done on the CPU or in hardware is not that important. Whether it makes sense depends on the speed increase.

I never said it's important whether the optimization is done on the CPU or GPU. What's important is whether it's something you cannot do yourself (like early-z), or something that's just optimizing the "lazy developer" case.

Culling in display lists does not increase the speed of a well written engine, because a well written engine will never try to draw a display list that would be culled.


And what would you say, if vendors would implement frustum culling for vertex-buffers / display lists in HARDWARE? Would you be against it?

Yes, I would still be against it, because it would still be a wasted feature. But if they manage to do it at zero cost, I would not care.

Xmas
11-23-2007, 12:56 PM
Yes, I would still be against it, because it would still be a wasted feature.
If you had an implementation that didn't require depth sorting for early-Z efficiency, would you say that's a wasted feature just because many well-written engines do rough front-to-back sorting?

knackered
11-23-2007, 01:25 PM
GDLs would be a point for acceleration, whether GPU or CPU. Hardware abstractions should abstract every part that could feasibly offer an opportunity to accelerate. There's no argument against them.

Ysaneya
11-23-2007, 01:26 PM
As already stated in this thread some projects may not have the luxury of time to do full optimization, and don't use an "engine". If you do geometry culling you likely don't use display lists. I'd guess the purpose of this optimization is mainly to make unoptimized applications faster.

You should append to your last sentence "and make optimized applications slower". Because that's exactly what's happening in reality.

Xmas
11-23-2007, 01:48 PM
You should append to your last sentence "and make optimized applications slower". Because that's exactly what's happening in reality.
Optimizations very often are tradeoffs. If Nvidia wants to sacrifice performance in some cases for more performance in others, that's their call. And I honestly don't think the culling cost is significant.

knackered
11-23-2007, 01:55 PM
I get close to hardware-spec performance with a display-listed, object-heavy (8000+ objects) engineering scene, fully in frustum, on nvidia hardware. If it is costing me significant cycles, I'm not noticing it.

Humus
11-24-2007, 01:57 PM
I'd guess the purpose of this optimization is mainly to make unoptimized applications faster.

I generally don't think conveniences belong to the API. Just like it's not the driver's task to load .jpeg files for you, it's not its task to cull geometry. If you don't want to write your own jpeg loader, you use any suitable library. If you don't want to write your own culling, you use a suitable library. OpenGL should remain a low-level API and I believe the main reason why DX is taking over at an alarming rate is that OpenGL way too often has gone the route of trying to appeal to the hobby coders at the expense of bloating the API making things harder for large scale applications and drivers.

Xmas
11-24-2007, 04:11 PM
I generally don't think conveniences belong to the API. Just like it's not the driver's task to load .jpeg files for you, it's not its task to cull geometry.
And I'd argue that it's not a convenience but a performance optimization. No matter how fast your program runs, you still have to load your textures from a file. But you don't have to cull geometry, the only reason to do that is to get higher performance.


OpenGL should remain a low-level API and I believe the main reason why DX is taking over at an alarming rate is that OpenGL way too often has gone the route of trying to appeal to the hobby coders at the expense of bloating the API making things harder for large scale applications and drivers.
I believe it's mostly because Microsoft isn't afraid of a fresh start while OpenGL carries way too much baggage around (and I think it's only that old baggage you could describe as "trying to appeal to the hobby coders").

Anyway, we're talking about an implementation detail here, not about the API at all.

Korval
11-24-2007, 05:45 PM
Anyway, we're talking about an implementation detail here, not about the API at all.

Actually, if you're talking about culling, you are talking about the API.

Geometry display lists can't auto-cull because culling is based on the shader, which is not part of geometry display lists (hence the term "geometry display list"). If you want auto-culling, you would need to change how geometry display lists would work, adding more than just geometric information.

plasmonster
11-24-2007, 06:07 PM
On the original topic and for the record, I would not miss the matrix stacks and associated utility functions, nor would I miss any of the built-in shader variables exposed in GLSL. I wouldn't object to their presence either, so long as they don't get in the way (whatever that means).

Xmas
11-25-2007, 04:59 AM
Actually, if you're talking about culling, you are talking about the API.

Geometry display lists can't auto-cull because culling is based on the shader, which is not part of geometry display lists (hence the term "geometry display list"). If you want auto-culling, you would need to change how geometry display lists would work, adding more than just geometric information.
Actually, I wasn't talking about "geometry display lists" but an existing implementation that culls geometry in display lists, without changing the API.

And even if you limit display lists to geometry I don't see a reason why you shouldn't be able to cull any more.

Overmind
11-25-2007, 06:56 AM
If you had an implementation that didn't require depth sorting for early-Z efficiency, would you say that's a wasted feature just because many well-written engines do rough front-to-back sorting?

That's different. When I want better front-to-back sorting, I have to invest CPU cycles. Early-Z is saving me the work.

Culling is something I have to do anyway, because only I have the necessary information to do it efficiently. The driver knows nothing about my scene graph. I can cull a much larger amount of geometry in much less CPU time than any driver ever could.

So automatic culling in display lists does not save me CPU cycles the way early-Z does, because if I use fewer CPU cycles for culling, the driver will have to use many more, and it will do a worse job at culling. So I'd be spending more CPU cycles for worse geometry culling.

With early Z I can just forget about depth sorting and save a lot of CPU cycles, and get a much better job done just at the cost of a Z only pass at the beginning. So I have used less CPU cycles for better fragment culling.

If the hardware really evolves in a direction that would allow hardware culling, the proper way to expose it would be to make an interface that would allow me to supply sufficient information to make culling efficient. Behind the back culling in display lists is not the way to do this.


There's no argument against them.

I'm not arguing against geometry only display lists. I'm arguing against optimizations that the application developer could have done himself.


You should append to your last sentence "and make optimized applications slower".

I phrased my sentence exactly as it is for a purpose. I don't think display list culling actually makes optimized applications slower.

I just think that driver developers should spend their time on things that actually make sense, like good optimizing GLSL compilers.

Korval
11-25-2007, 09:15 AM
And even if you limit display lists to geometry I don't see a reason why you shouldn't be able to cull any more.

Because there's no way to even identify the position in a set of geometry without any fixed-functionality, let alone determine how the position will be T&L'd. The latter needs access to the shader, which the display list (being geometry only) does not have.

Humus
11-25-2007, 11:35 AM
I believe it's mostly because Microsoft isn't afraid of a fresh start while OpenGL carries way too much baggage around (and I think it's only that old baggage you could describe as "trying to appeal to the hobby coders").

You're right about fresh starts for sure. OpenGL had a chance to do a fresh start with OpenGL 2.0, but they opted not to do it, something we've suffered greatly for. But there's still the conveniences that are always baked into the API. Like all those built-in variables in GLSL. They should never have been there. Just give me plain uniforms and I'll upload the constants I need. If I need any conveniences, I can write them myself on top of the API.

Korval
11-25-2007, 12:47 PM
Like all those built-in variables in GLSL. They should never have been there.

Not having them in glslang would have made an already crazy and confusing API even more so.

Remember, the API is supposed to be able to work with or without glslang. Which means that fixed-function state is a part of the API, and new functionality should not simply cast it off because it can. It would too rigidly separate the use of glslang from the entire rest of the API; it effectively would have made it two separate APIs.

It was a good choice for glslang to have these built-ins when it was a part of the GL 2.x API. It is not a good choice for glslang as part of a new GL API, which is why they're going away.

Humus
11-25-2007, 04:48 PM
Well, I disagree. DX never added any such conveniences, despite preserving fixed function all the way up to and including DX9. It didn't hurt the API, and to be honest I can't remember many complaints about this state of affairs on DX forums and mailing lists. I would say it was a good thing. Mixing shaders and fixed function was never a good idea anyway.

Komat
11-25-2007, 06:28 PM
Not having them in glslang would have made an already crazy and confusing API even more so.

And having them made the real world an even bigger minefield, with drivers forgetting to update them when the fixed-function state changed, uploading them incorrectly in some cases, or even failing compilation after a driver update because of a mismatch in supported tracked state between Nvidia's assembly-level compiler and its GLSL compiler.

Xmas
11-25-2007, 06:37 PM
If you had an implementation that didn't require depth sorting for early-Z efficiency, would you say that's a wasted feature just because many well-written engines do rough front-to-back sorting?
That's different. When I want better front-to-back sorting, I have to invest CPU cycles. Early-Z is saving me the work.
Without early-Z, sorting geometry front-to-back is a lot less useful, as you only save framebuffer bandwidth but no shader cycles.

To make good use of early-Z you either need to sort front-to-back or do a Z-only pass. Yet there are implementations where both are a complete waste of time. Would you say that's a wasted feature?
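
A minimal sketch of that Z-only pass in GL 2.x terms (draw_scene_depth_only and draw_scene_shaded are placeholders for the application's own submission code, and a current GL context is assumed):

extern void draw_scene_depth_only(void);   /* cheapest possible shaders      */
extern void draw_scene_shaded(void);       /* the expensive fragment shading */

void render_with_z_prepass(void)
{
    /* Pass 1: lay down depth only, no color writes. */
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_TRUE);
    glDepthFunc(GL_LESS);
    draw_scene_depth_only();

    /* Pass 2: shade only what survived; where the hardware has early-Z,
     * hidden fragments are rejected before the fragment shader runs. */
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_FALSE);      /* depth buffer is already correct */
    glDepthFunc(GL_LEQUAL);
    draw_scene_shaded();
}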


Because there's no way to even identify the position in a set of geometry without any fixed-functionality, let alone determine how the position will be T&L'd. The latter needs access to the shader, which the display list (being geometry only) does not have.
The driver could take different approaches, e.g. creating bounding volumes just for the first vertex attribute, or for all of them. The compiler can determine whether the vertex shader calculates gl_Position = uniform mat4 * attribute vec4. So all that's required at draw time is checking the attribute binding, looking up the bounding volumes and testing them using the uniform matrix.

Yes, it won't always work. Maybe it isn't worth the effort. But it's possible.

plasmonster
11-25-2007, 07:22 PM
It was a good choice for glslang to have these built-ins when it was a part of the GL 2.x API. It is not a good choice for glslang as part of a new GL API, which is why they're going away.

Agreed. It makes sense to maintain some semblance of continuity between the legacy fixed-function and new programmable models, even if only to ease transitions.

I didn't know the GLSL built-ins were going away. :)

sqrt[-1]
11-25-2007, 08:26 PM
I didn't know the GLSL built-ins were going away. :)



Well, you can still use them (apparently), but they are just exposed as uniform variables you have to supply yourself.

tfpsly
11-25-2007, 11:30 PM
The driver can generate bounding volumes at display list creation time. Then at draw time it can transform the bounding volumes and check whether they are completely outside the view frustum.

And nVidia's hardware does. The funny thing is it seems to do that on VAOs too, or at least the D3D ones: I worked on a game where the nv driver would crash because part of a mesh's VB and IB were not initialized. Setting them to 0 fixed that: looks like someone was reading every vertex of the VB to compute a bbox (and assuming the part not used in the draw primitive was still valid).

plasmonster
11-25-2007, 11:51 PM
I didn't know the GLSL built-ins were going away. :)

Well, you can still use them (apparently), but they are just exposed as uniform variables you have to supply yourself.

Nice.

CatDog
11-26-2007, 03:35 AM
Well, you can still use them (apparently), but they are just exposed as uniform variables you have to supply yourself.

Does this mean that, for example, glGetUniformLocation will work for "gl_..." uniforms?

sqrt[-1]
11-26-2007, 05:58 AM
Well, I know very little about how it will work, but I recall it will be something like that. (Except you set uniforms in blocks?)

From this post:
http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=229374&fpart=1

# GLSL related changes:
* - Legacy gl_* GLSL state variables are accessible through a common block.


Note that I have no contacts with the ARB or anything; I only know what has been publicly released.

CatDog
11-26-2007, 06:14 AM
* - Legacy gl_* GLSL state variables are accessible through a common block.

Yes, I saw this. But to be honest, I don't understand it. What is meant by a 'common block' in this case?

Overmind
11-26-2007, 06:39 AM
As I understood it, uniform blocks are a feature that enables you to define common uniforms that can be supplied to the shader as a buffer object, and that can be shared between different shaders.

Think of it as global state variables. Not something specific to a shader, but specific to the rendered object or scene or whatever, like for example light parameters, or the matrices.

With the GL 2 API (and without using built-in uniforms), you had to set the value separately for each program.
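
To make that concrete, here is a sketch of what such a shared block might look like. No GL3 spec existed at the time of this thread, so the block syntax and the entry points below (glGetUniformBlockIndex, glUniformBlockBinding, glBindBufferBase, GL_UNIFORM_BUFFER) are assumptions about how the feature could be exposed, not a published API:

/* GLSL side -- any shader declaring the same block sees the same values: */
const char *block_decl =
    "uniform Transforms {\n"
    "    mat4 modelview;\n"
    "    mat4 projection;\n"
    "};\n";

/* C side -- the values live in one buffer object, bound once to a binding
 * point, instead of being uploaded separately into every program: */
void bind_transforms(GLuint program, GLuint transforms_buffer)
{
    GLuint index = glGetUniformBlockIndex(program, "Transforms");
    glUniformBlockBinding(program, index, 0);               /* block -> binding point 0 */
    glBindBufferBase(GL_UNIFORM_BUFFER, 0, transforms_buffer);
}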