
View Full Version : OpenGL Siggraph BOF



Korval
08-02-2006, 08:37 PM
Is there any information online about what was said at this talk? Are there program notes or PDF slides or something available for us to look at?

smileyj
08-03-2006, 12:54 AM
Plus, the whole OpenGL community would benefit from access to a podcast, if anyone recorded the event (audio or video).

Or if you were at the BOF, how about blogging and posting a link here, posting your impressions here, linking to press articles here, etc.

BlackBox
08-03-2006, 09:53 AM
I third that... we poor souls also need info. :)

elFarto
08-03-2006, 02:23 PM
Clicky! (http://www.khronos.org/developers/library/siggraph2006/)

Korval
08-03-2006, 08:27 PM
OK. Well, now I'm kinda glad that I didn't attend SIGGRAPH; I'd hate to fly cross-country for that...

Skimpy is probably a word I'd use to describe it. Disconcerting is another.

The GL 3.0 proposal is still being shown off as a joint ATi/nVidia thing rather than a fully-backed effort of the ARB/Khronos. I don't like what that suggests with regard to how far along the 3.0 process is. By this point, I would have expected the ARB/Khronos to have accepted it and be working on refinement. Obviously an actual specification is still a ways off, but the slides told us nothing about what 3.0 looks like. We know more about the new object model than we do about 3.0 (though I suspect the object model is a part of 3.0). The 3.0 presentation seemed more like form than substance: a repetition of 3.0's direction, rather than any real information.

As for the new object model itself, it mostly seems reasonable. Though I'm concerned about 2 things:

1: Object Overload. There are a lot of objects being defined here, and I'm not sure that they're the best way to go about doing this sort of thing.

For example, take program objects. OK, you have a fully linked program. In order to use it, you need a uniform object that is created based on this program. Fair enough. But you also need a Vertex Array object based on this program, so that you can bind your buffers to it. What's the draw call going to look like?

glDraw(FBO, VAO, UniformObj, ProgramObj)

It seems a bit verbose. Here's what I mean.

Having the uniform object being separate from the program object makes no sense. The uniform object was not only created from the program object, it is almost certainly incompatible with any other program object. While the vertex array object is more likely to be cross-program compatible, is it really necessary to do that?

I suggest instead that the linked program object produce one or more "program instance objects". Such an object holds on to the uniform and attribute state for that program. That way, your draw call only has one object to worry about (the linked program can be stored off as a global resource that people only touch to create instances).

The only problem this creates is that the instance is fully dependent on the original program. If that program object is deleted, the instance that referenced it becomes invalid. And that's probably not good. But it would make the whole thing more reasonable to the user.

2: What is it with these "attribute objects"? It seems to me that they're just fancy ways of passing lots of parameters to object creation functions so that the whole immutability thing that you're wanting (for good reason) works out. As well as providing extensibility.

The problem is that the code just looks hideous. I mean, you create this object (on the heap, no less) for the sole purpose of passing it to an object creation function and then deleting it immediately afterwards. There's got to be a better way to do this, right?

BTW, if buffer object parameters are no longer going to be hints, will you be specifying more clearly the actual behavior that an implementation is expected to provide when you use a particular set of hints?

Humus
08-03-2006, 08:51 PM
Originally posted by Korval:
OK. Well, now I'm kinda glad that I didn't attend SIGGRAPH; I'd hate to fly cross-country for that...
Well, I wouldn't fly for just the OpenGL BOF, but the conference was awesome IMO. :)


Originally posted by Korval:
The GL 3.0 proposal is still being shown off as a joint ATi/nVidia thing rather than a fully-backed effort of the ARB/Khronos. I don't like what that suggests with regard to how far along the 3.0 process is.
The question was raised at the BOF about when an OpenGL 3.0 spec was expected to be ready, and the answer was that they are targeting Siggraph 2007. If they can pull that off, and with Vista delayed to 2007 and probably slow adoption of DX10, I think OpenGL could be in good shape. Of course, some of the ideas for GL3.0 were proposed already for GL2.0 (like the L&M profile), and that didn't turn out like what it was supposed to be in the end.


Originally posted by Korval:
Having the uniform object being separate from the program object makes no sense.
Actually it does. It's essentially the equivalent of constant buffers in DX10. You want to be able to share constants between shaders, and you want to be able to create all constants at load time and avoid setting constants at runtime, so you just bind the appropriate constant buffers instead of uploading constants from system memory like you do in DX9 and, to some extent, in GLSL.
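Humus's description maps onto a small plain-C analogy (every name here is hypothetical, not proposed GL API): a constant buffer lives independently of any one program, and "binding" it to a shader stage is a pointer swap rather than a re-upload, so one update to the buffer is visible everywhere it is bound.

```c
#include <assert.h>
#include <stddef.h>

/* A constant buffer is just a block of values that lives on its own,
 * independent of any one program (hypothetical analogy, not GL API). */
typedef struct {
    float data[16];
} ConstantBuffer;

/* Each "shader stage" exposes a few binding slots. */
#define MAX_SLOTS 4
typedef struct {
    const ConstantBuffer *slot[MAX_SLOTS];
} ShaderStage;

/* Binding is a pointer swap, not a copy of the values. */
void bind_buffer(ShaderStage *stage, int slot, const ConstantBuffer *cb) {
    stage->slot[slot] = cb;
}

/* What a "shader" would see when it reads a constant. */
float read_constant(const ShaderStage *stage, int slot, int index) {
    return stage->slot[slot]->data[index];
}
```

Because both stages reference the same buffer, updating the buffer once updates what every bound shader sees, which is the sharing Humus describes.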

Korval
08-03-2006, 09:34 PM
so you just bind the appropriate constant buffers
You can bind multiple uniform objects when you render? Why didn't they say so?

I mean, once you have that (which, I have to say, can't be anything less than a pain to implement. You have to deal with not having certain uniforms being used in a program and so forth. A simple draw call is going to take a lot more time now), the idea makes a lot more sense. If you could only use one uniform object per render call, then it makes far less sense to have the uniform object be fundamentally separate from the program.

Komat
08-03-2006, 11:52 PM
Originally posted by Korval:
You can bind multiple uniform objects when you render?
DX10 hardware must be capable of binding 16 constant buffers (each containing up to 4096 4-component values) simultaneously to a single shader stage (vertex, geometry, pixel).



I mean, once you have that (which, I have to say, can't be anything less than a pain to implement. You have to deal with not having certain uniforms being used in a program and so forth.
In DX10 the shader is responsible for defining the layout of the constant buffer in a way similar to C structs; there is no fully automatic assignment like in current GLSL. I suppose that the addition of uniform objects in OpenGL will be done similarly to what DX10 does. This way there is no problem if a shader does not use some variable from the buffer, as long as it defines the same structure for the buffer.
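Komat's point about explicit layout can be sketched in plain C (the struct and field names are hypothetical): when the buffer layout is declared up front like a C struct, every variable's offset is fixed at compile time, so no per-draw name matching is needed and a shader that ignores a field still agrees on where everything lives.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant-buffer layout, declared like a C struct.
 * With the usual 4-byte float, offsets are 0, 64 and 128 bytes. */
typedef struct {
    float modelview[16];   /* offset 0 */
    float projection[16];  /* offset 16 * sizeof(float) */
    float fog_density;     /* offset 32 * sizeof(float) */
} PerFrameConstants;

/* A shader that never reads fog_density still compiles against the same
 * layout, so the same buffer can back it with no runtime name lookup. */
```

The compile-time offsets are exactly what lets two different shaders share one buffer without any string-based fixups at bind time.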

Michael Gold
08-04-2006, 12:08 AM
Korval,

I am sorry that you are disappointed by the slides. The fact is we had a lot of information and only barely covered it in the two-hour session. There was a lot more said than covered by the slides, and there was more we could have said, had time allowed. In fact we only had 20 minutes total to cover GL 3.0.

Some of your assumptions and conclusions are incorrect. The ARB does not vote to approve an idea. We refine the idea until we have a spec, and then we vote on the spec. Meanwhile, considerable effort has been expended by the member groups (not just NVIDIA/ATI) on the portions that have been completed.

Do not assume that a draw call will accept a list of objects. We have published no such API. I don't like your proposed API any more than you do. :)

An overriding consideration of this design is to provide the highest performance rendering by getting the driver and CPU out of the way. These ideas come from years of driver development and seeing how apps use the API and where the common bottlenecks occur.

Attribute objects are the best solution we found for atomic creation of immutable objects. A simple data structure might have been more obvious, but it's not extensible by multiple vendors. An attribute object is an opaque data structure. If you have a better idea, by all means please share. You do not necessarily have to destroy an attribute object immediately after object creation; if you wish to create many similar objects you can reuse the attribute object. Obviously that will be more efficient.

I believe those who attended had a better understanding of the proposal than one can glean from simply reading the slides. I'm happy to answer specific questions (as time allows; I'd rather focus on writing the spec).

smileyj
08-04-2006, 12:59 AM
Originally posted by Michael Gold:
...we had a lot of information and only barely covered it in the two hour session. There was a lot more said than covered by the slides, and there was more we could have said, had time allowed.
...
I believe those who attended had a better understanding of the proposal than one can glean from simply reading the slides. I'm happy to answer specific questions (as time allows; I'd rather focus on writing the spec).
Suggestion: audio podcasts hosted on the OpenGL site would solve these problems. You could make presentations as long as they need to be to get all the information across.

The simplest approach would be to have each presenter record an audio file (e.g. mp3) that people could download separately to accompany each of the existing slide sets from OpenGL.org. Sync to the slides by adding a remark like, "Slide 18" every time the presenter moves to a new slide. Quick, easy and the entire OpenGL community has something almost as good as being at the BOF (or any other presentation speakers provide).

I appreciate the ARB and BOF contributors are very busy, but it improves everyone's situation to make good information widely available.

Think of all the time you'll save posting articles to clear up misunderstandings from lack of information. Time that could be going into developing the OpenGL specs. :-)

Jan
08-04-2006, 02:48 AM
The new object model seems to be going in the same direction as D3D10. I think that's a good idea. Additionally, the typical OpenGL style of setting attributes through a function call with an enum to specify the attribute is kept. In my opinion that's the best thing one could be doing. In D3D10 you fill out a struct, which is not extensible in any way. Arguing about efficiency here is stupid, because that's all done at creation time, and one is talking about a few hundred objects that are created once per app, not per frame or something.

All in all, I like what I have seen.

What I'd like to add is: don't forget "future" hardware. There was only one slide mentioning what cool stuff is coming in the near future. I hope that OpenGL 3.0 will incorporate these things from the ground up and work perfectly with them. I think this is important. If you only take the functionality of 2.1 and create a new API that fits it perfectly, it might not be flexible enough for functionality that's coming with the next hardware generation. Of course it CAN be added through extensions, but I hope it will be an integral part of the new API and thus will fit perfectly into it.

And something else: what about extensions? Will the extension mechanism stay as it is now, or will it be changed / made easier? I recall there was a discussion a long time ago about making it easier to check for extensions and to get the entry points, etc. Is it possible to make it easier at all? Or will the new "ecosystem" simply include one standard library like GLEW?

Jan.

Michael Gold
08-04-2006, 11:53 AM
Originally posted by smileyj:
Suggestion: audio podcasts hosted on the OpenGL site would solve these problems. You could make presentations as long as they need to be to get all the information across.
This is a good suggestion, and I wish someone had recorded the event for this purpose. But please bear in mind, our intention was not to provide a comprehensive tutorial of an in-flux specification, but to give a taste of what is to come. :)

That being said, I would like to give a comprehensive tutorial when the time is right.

Michael Gold
08-04-2006, 12:12 PM
Jan,

I'm glad you understand our vision and the tradeoffs we are making w.r.t. object creation. The key point is that objects are used a lot more often than they are created, so optimizing for use is better than optimizing for creation.

We are definitely considering future extensibility, as you observe with attribute objects. In particular, the next round of extensions is already on our mind.

There is actually a whole slide set about upcoming functionality: the presentation is titled "NVIDIA_-_OpenGL_upcoming_features.ppt". I want to make clear, since someone expressed confusion on this point, that the features described in this presentation are not tied to GL 3.0 and will be available sooner.

Good question about the extension mechanism. We're still tied to opengl32.dll on Windows and all the limitations that implies. I believe the extension loading libraries remain a good solution for now; if you have suggestions how we can improve it from the implementation side, I'd love to hear them.

Jan
08-04-2006, 01:11 PM
Actually I have no complaints about the extension mechanism, as long as there are libraries doing the dirty work for me :D

Maybe one "suggestion" would be to not only write the spec for extensions, but also to have one standard form for an XML file (or something similar), where the technical details are filled in for each extension. Then one could create a tool that automatically parses those files and generates C++/Java/whatever code to get access to those extensions.

I think one of the free libraries already does that. However, it would be more convenient to have a standardized file format, so that those tools would be reliable.

Also, once written, the tool could be run by the same guy uploading the extension-spec, so that we could get the updated library along with the new spec immediately.

Jan.

Jon Leech (oddhack)
08-04-2006, 09:50 PM
Originally posted by Jan:
Maybe one "suggestion" would be to not only write the spec for extensions, but also to have one standard form for an XML file (or something similar), where the technical details are filled in for each extension. Then one could create a tool that automatically parses those files and generates C++/Java/whatever code to get access to those extensions.
Hi Jan,

This actually is planned. We are revamping the entire extension registry to use up-to-date file formats and tools, and to cover other APIs in Khronos like OpenGL ES and OpenVG. Part of that will be moving the "specfiles" which define function signatures, enum values, GLX protocol, etc. from a custom flat file format to a suitable (yet to be defined) XML schema.

We already use the specfiles to generate a variety of stuff (GLX protocol spec, glext.h, GLX wrapper code in the OpenGL SI) and this will make it easier for other people to do the same (although I am not sure why people go to extremes like parsing glext.h to construct their wrapper libraries, when they could just use the same Perl scripts used to generate glext.h in the first place :confused: - the specfiles and scripts have always been available on oss.sgi.com).

BTW, if anyone knows of existing schema/DTDs which would be a good starting point for this, I'd love to get pointers to them. I am not actually looking forward to writing my own DTD from scratch :eek: - although I suppose creating the XSL to turn them into headers will be a good exercise in learning functional programming.

Korval
08-04-2006, 10:11 PM
DX10 hardware must be capable of binding 16 constant buffers (each containing up to 4096 4-component values) simultaneously to a single shader stage (vertex, geometry, pixel)
This is about OpenGL, not Direct3D. The new object model and 3.0 L&M had better be usable on non-D3D10-grade hardware.


Arguing about efficiency here is stupid, because that's all done at creation time, and one is talking about a few hundred objects that are created once per app, not per frame or something.
Efficiency (on several levels) is a big justification for the new object model to begin with, so I would respectfully suggest that it is a perfectly valid point for discussion.

It seems to me that the underlying assumption, that building these objects is only a creation-time concern, is one that is, perhaps, becoming less valid as graphics applications progress. In the not-too-distant future, it would not be unreasonable to have an engine where different kinds of uniform objects are being swapped into and out of render objects fairly frequently. This would be to accommodate a situation where, in one case, an object is being affected by two shadows, but later is only being affected by one, and you don't change the program when you do it (conditional branching taking care of the problem). Maybe you're in 3 lights one frame and in 2 the next. And, since the days of one object being rendered by one shader are, for the most part, thankfully behind us, you're talking about shuffling around a lot of uniform objects. Maybe you have to create them on the spur of the moment, or maybe you have a library of them around.

It doesn't matter; what matters is the cost of having a "render object" (one that stores the program and all of the uniforms that it uses, among other things) change which uniforms it uses: in effect, the cost of deleting that render object and creating a new one. This involves a lot of iterating through all of the uniforms and doing a great many string compares, seeing which elements of the uniform are valid for the shader and which ones aren't, then formatting an appropriate piece of memory (potentially in hardware space) for such data. Not to mention the other overhead in creating a new render object (validating your vertex array, etc).

I'm concerned that you're moving the inefficiency to another location, one that in the near future may be where developers increasingly want to go.


The ARB does not vote to approve an idea. We refine the idea until we have a spec, and then we vote on the spec.
But the ARB does process ideas and turn them into specs.

The slides themselves strongly implied that the L&M work was primarily an nVidia&ATi thing: "These are the opinions of two ARB member companies and may not reflect the opinions of the ARB as a whole." This wasn't attached to the beginning of the .ppt for the new object model or the superbuffers work. So it does draw a distinction between GL 3.0 L&M and the rest of the work that the actual ARB is doing.

The quote makes it sound very much like ATi and nVidia are off doing their own thing, and then sometime later the ARB will come along and work on what they return. That nobody else in the ARB is as of yet involved with the process of creating perhaps the most important OpenGL version ever.

It also sounds very much like there is some significant possibility of this proposal becoming nothing, much the same way the 3DLabs GL2.0 proposal became mostly nothing. After all, if the ARB were fully behind this effort, then it seems that such a disclaimer would be unnecessary.

I don't like the implication that this L&M GL proposal may be just another big tease like the 3DLabs proposal was, and that dissent in the ARB/Khronos will eventually cause it to just evaporate and we'll still be stuck with what we had before. What does it mean for this proposal to not "reflect the opinions of the ARB as a whole" when the others lack such a disclaimer?


that the features described in this presentation are not tied to GL 3.0 and will be available sooner.
Then it seems that we have 3 potentially competing concepts, which is creating a degree of confusion, particularly considering that they are all conceptually interrelated.

* GL 3.0 LM
* New features
* New object model

It's the interrelation that makes the statement of these being separate confusing. After all, L&M pretty much relies on the new object model; it would be silly to add a new object model after creating a new API.

And it would be redundant to add new features (like a new kind of shader, etc) to the old object model just so that you can come along with the new object model and redefine how that feature works (the reasoning behind not promoting ARB_async_object to 2.1).

And the new object model itself is meaningless without the deprecation implied by GL 3.0 L&M; otherwise, you would always be adding new features to both profiles. Plus, the new object model comes with bunches of new entrypoints and so forth, functions that are useless to the old way of doing things (and vice versa). Which implies a degree of deprecation.

Maybe the next newsletter could go into explaining how these three are different and related.


have one standard form for an XML file (or something similar), where the technical details are filled in for each extension. Then one could create a tool that automatically parses those files and generates C++/Java/whatever code to get access to those extensions.
Something of that nature already exists for specifications. The format for a spec is fairly strict, so it is easy enough to parse through them and build a list of entrypoints and enumerants.

Though as an XML'o'phile and a middling XSLT user, I wouldn't mind seeing DocBook being used for GL specifications. Certainly, you can use the various attributes over standard DocBook elements to make it work. And you could publish them as .pdfs as well as html or basic text.

Michael Gold
08-04-2006, 11:02 PM
Korval,

I feel really good about the current roadmap, the progress we're making within the ARB and the spirit of cooperation we are enjoying while working toward a common goal. If you wish to read between the lines and look for trouble where none exists, this is your right; but I'm not going to waste effort trying to convince you otherwise.

Jon Leech (oddhack)
08-04-2006, 11:15 PM
Originally posted by Michael Gold:
Korval,

I feel really good about the current roadmap, the progress we're making within the ARB and the spirit of cooperation we are enjoying while working toward a common goal.
Ditto. Really what's been happening is that the ARB is fully occupied with getting the new object model, GLSL updates, FBO updates, etc. complete, so the 3.0 process just hasn't taken up too much of our time yet. But I think I'm safe in saying that the ARB as a whole is comfortable with the general direction of the 3.0 proposals from ATI and NVIDIA.

Once the object model is worked through and applied to different classes of objects as outlined in Michael's BOF slides, combined with using OpenGL ES 2.0 as a starting point for the L&M profile, there probably isn't that much more to do. We can talk for a long time about exactly what needs to be put back into ES 2.0 to create the L&M profile, and maybe we will, but that's not defining new functionality.

Finally, it is really, really important that ATI and NVIDIA are on the same page on this, because they collectively represent essentially the entire market for cutting-edge graphics hardware now that 3Dlabs is out of the desktop graphics business (not to discount Intel, who is a very important hardware vendor and the winner by market share last time I looked - but they seem to consciously avoid high-end discrete graphics products, and that's where new features show up first). What has most often held up progress in the ARB in the past is disagreement between the major hardware vendors.

Jon (speaking solely for myself)

elFarto
08-05-2006, 02:58 AM
Hi

One little thing I couldn't quite figure out is the relationship between GLSL samplers, image objects and sampler objects.

Could you (Mr Gold) possibly write a short piece of example code to show the relationship of the above, perhaps showing how to 'bind' an image object to a GLSL sampler?

Regards
elFarto

Komat
08-05-2006, 04:07 AM
Originally posted by Korval:

DX10 hardware must be capable of binding 16 constant buffers (each containing up to 4096 4-component values) simultaneously to a single shader stage (vertex, geometry, pixel)
This is about OpenGL, not Direct3D. The new object model and 3.0 L&M had better be usable on non-D3D10-grade hardware.
I was not talking about what kind of hardware the model should support. I simply mentioned that DX10 hardware is required to have that capability. Because that capability would be in the hardware, it is very likely that it will be exposed through the new model. On older hardware the API may expose a hard limit on the number of buffers, or it may merge buffers with an associated performance hit (probably small if the layout of the buffers that can be bound to individual "bind points" is known beforehand), whichever they choose.

Michael Gold
08-05-2006, 08:18 AM
Originally posted by elFarto:
One little thing I couldn't quite figure out is the relationship between GLSL samplers, image objects and sampler objects.

Could you (Mr Gold) possibly write a short piece of example code to show the relationship of the above, perhaps showing how to 'bind' an image object to a GLSL sampler?
No guarantee this won't change, but the current thinking is roughly as follows. This code assumes a few self-documenting utility functions as hinted in the BOF slides.


GLimage image = gluCreateImage2D(format, width, height, levels);
GLsampler sampler = gluCreateSampler2D(GL_LINEAR, GL_LINEAR_MIPMAP_LINEAR, GL_CLAMP_TO_EDGE, GL_CLAMP_TO_EDGE);
GLshader shaders[2];
shaders[0] = gluCreateShader(GL_VERTEX_SHADER, vshader);
shaders[1] = gluCreateShader(GL_FRAGMENT_SHADER, fshader);
GLprogram program = gluCreateProgram(2, shaders);
GLuniformBlock uniformBlock = gluCreateUniformBlock(program);
GLint location = glGetUniformLocation(uniformBlock, "mySampler2D");
glUniformSampler(uniformBlock, location, image, sampler);
glBindProgramObjects(program, 1, &uniformBlock);
DrawSomething();

elFarto
08-05-2006, 11:59 AM
Originally posted by Michael Gold: *snip*
Thank you very much.

Regards
elFarto

Korval
08-05-2006, 01:51 PM
But I think I'm safe in saying that the ARB as a whole is comfortable with the general direction of the 3.0 proposals from ATI and NVIDIA.
Fair enough.

My question now is this: what is L&M that isn't the new object model?

The more I look at the new object model APIs, the more it seems to me that they are the new API. They are L&M, in effect if not in fact. Here's why.

The new object model effectively deprecates (and suggests layering) everything that doesn't use it. You can't use old-style objects with new-style ones, so there's a specific pressure to use the new style exclusively. And, of course, there are the "minor" new features like (multiple) uniform buffers, the changes to VBO behavior, etc. Oh, and the obvious performance advantages from making objects immutable.

This sounds suspiciously like what L&M was supposed to provide. According to the slides:


On older HW the API may expose a hard limit on the number of buffers or it may merge buffers with an associated performance hit (probably small if the layout of the buffers that can be bound to individual "bind points" is known beforehand), whichever they choose.
That's not good. People are going to want to actually use that functionality. They aren't going to want to query a limit and see that it can only bind 1 uniform object; they're going to want to design their code around having multiple uniform objects. I, for one, don't want to have to write multiple paths just to set uniforms efficiently.

The API does need to expose future hardware, of course, but it should not do so in such a way as to suggest renderer design elements that work against performance in current hardware. Particularly so if there are easy ways to avoid it. It's not as bad as requiring Vista in order to use the API, but it has similar effects. Either the feature goes unused for a good 3 years, or developers have to write multiple paths for performance reasons, or developers just accept slower performance on current cards.

It is possible to provide for multiple uniform buffers and so forth efficiently in current hardware. All it requires is a creation-time connection between an array of uniform objects and the program that this array will be used for. That would be when the mapping from each program uniform to the uniform in the individual blocks would be made.

Using Michael's example code, my suggestion would be:


GLimage image = gluCreateImage2D(format, width, height, levels);
GLsampler sampler = gluCreateSampler2D(GL_LINEAR, GL_LINEAR_MIPMAP_LINEAR, GL_CLAMP_TO_EDGE, GL_CLAMP_TO_EDGE);
GLshader shaders[2];
shaders[0] = gluCreateShader(GL_VERTEX_SHADER, vshader);
shaders[1] = gluCreateShader(GL_FRAGMENT_SHADER, fshader);
GLprogram program = gluCreateProgram(2, shaders);
GLuniformBlock uniformBlock = gluCreateUniformBlock(program);
GLint location = glGetUniformLocation(uniformBlock, "mySampler2D");
glUniformSampler(uniformBlock, location, image, sampler);
GLprogramInstance instance = glCreateProgramInstance(program, 1, &uniformBlock, &vertexArrayBlock);
glDrawInstanceToFBO(frame_buffer_object, instance);
The glCreateProgramInstance call would formally bind the list of uniforms and the vertex array (possibly a list of vertex arrays) to the program. This would be a creation-time thing, so the cost is only incurred once. Oh, and it could use one of those attribute objects instead of a straight function call, for extensibility's sake.

Binding a uniform block (particularly multiple uniform blocks) to a program is a CPU-intensive operation. It's going to involve a lot of string copies and so forth. Sure, an implementation can speed things along for the simple case of one block, one program and the block was created from the program (if the driver can detect that). But for the general case of several blocks, which is something that engine designers will want to incorporate at the engine level, this is heavy.

Obviously the spec is still in flux, but I would certainly appreciate it if they would consider making the binding of uniform arrays to programs a create-time thing rather than a runtime one (assuming it isn't already). You should still be able to change the data in the uniforms as normal. It's the string mapping overhead that I want to make sure never/rarely turns up at runtime.

Jon Leech (oddhack)
08-05-2006, 02:29 PM
Originally posted by Korval:
My question now is this: what is L&M that isn't the new object model?
Removing redundant and obsolete paths. The entire fixed-function pipeline and all the machinery and state that goes with it. Immediate-mode geometry specification. And so on.

That stuff is to be supported by a compatibility layer running (conceptually, at least) above the L&M profile. So older apps that require the backwards compatibility that OpenGL has always offered would still work, but using the entire "3.0" API stack. New apps written to the L&M profile would not need the compatibility layer.

Jon

Komat
08-05-2006, 03:43 PM
That's not good. People are going to want to actually use that functionality. They aren't going to want to query a limit and see that it can only bind 1 uniform object; they're going to want to design their code around having multiple uniform objects.
I would be happier with a fixed limit I can query than with the driver doing some "clever" things behind my back (e.g. software emulation of GLSL shaders if a hidden limit is hit).

By the time the new API is reasonably usable (finished API and solid driver support), I would likely primarily target DX10-level features with a fallback for DX9 hardware, in which case I would have to write several code paths anyway. Or, if that happens really soon, I would target DX9-level hardware, for which a single optimal code path would be sufficient.

Actually, on DX9-level hardware I would probably create only one uniform block shared by all shaders of the same type (vertex, pixel) and managed by me, like the global environment of the ARB_*_program extensions, which is something I greatly miss in GLSL shaders.

Of course your priorities may be different.



Binding a uniform block (particularly multiple uniform blocks) to a program is a CPU-intensive operation. It's going to involve a lot of string copies and so forth.
There is no need for any string manipulation during block binding. The driver can create a global string_name->some_id mapping table during shader compilation or during uniform creation, and use the strings only at API entry points. The required fixups can then be stored in a few fixed tables and hashes created at shader/block creation time.

If another block with the same format is bound in place of an existing one, which is likely the most common use case, no fixup beyond a copy of the uniform values (on DX9 hardware) should be necessary.

DX10 avoids this overhead by exposing explicit constant buffer slots with an explicit variable layout within each buffer, so the final storage location of each variable is known at shader compile time. That would also make it easy to merge constant buffers on DX9-level hardware.
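This table-driven scheme can be sketched in plain C. Everything below is invented for illustration (no 3.0 API exists); the point is only that string comparisons happen once, at creation time, while bind-time work is a handful of table-driven copies:

```c
#include <assert.h>
#include <string.h>

/* One entry per uniform; the name is consulted only at creation time. */
typedef struct {
    const char *name;   /* resolved once, at creation */
    int offset;         /* register offset, fixed thereafter */
    int count;          /* number of 4-float registers */
} UniformSlot;

typedef struct {
    UniformSlot slots[16];
    int num_slots;
} UniformLayout;

/* Creation-time path: the only place a string compare ever happens.
 * Returns the register offset for a name, or -1 if unknown. */
int layout_find(const UniformLayout *l, const char *name) {
    for (int i = 0; i < l->num_slots; i++)
        if (strcmp(l->slots[i].name, name) == 0)
            return l->slots[i].offset;
    return -1;
}

/* Bind-time path: copy block storage into the register bank using only
 * the precomputed offsets -- a few memcpys, no searching, no strings. */
void bind_block(float *register_bank, const float *block_storage,
                const UniformLayout *l) {
    for (int i = 0; i < l->num_slots; i++) {
        const UniformSlot *s = &l->slots[i];
        memcpy(register_bank + s->offset * 4,
               block_storage + s->offset * 4,
               (size_t)s->count * 4 * sizeof(float));
    }
}
```

On DX10-style hardware the copy would disappear entirely (the bind just associates a buffer with a slot), but even the DX9-style copy above touches no names.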

Jan
08-05-2006, 04:31 PM
It seems that the new object model is the biggest and most important change in OpenGL 3.0. However, I think one shouldn't read too much between the lines. It's not the time to ask for all the details, because so far we can only guess at how the final API will look, and I don't think we should bother the guys working on it with too many complicated questions.

I, for one, am very happy that ATI and nVidia have taken this on. It seems that some big change is finally happening. And we all know that the guys at the ARB and Khronos are very skilled, so I don't think we need to be worried.

So, let the guys work (and let them enjoy their weekends ;) ). We can discuss the details when we actually have a spec.

Jan.

knackered
08-06-2006, 07:13 AM
This isn't MSN Messenger, they don't have to keep replying. I'm finding this discussion really interesting, but have nothing to contribute.

Michael Gold
08-06-2006, 10:10 AM
Must... reply... can't... resist...

On the question of uniform blocks: efficiency is the whole point. The layout of uniforms is fixed at program creation, so there are no string operations required at bind.

Flexibility is also a goal. You may create custom uniform blocks in order to share uniforms between programs, and/or swap a subset of uniforms without modifying the rest. This must be done prior to program creation in order to retain efficiency.

Binding will fail if the uniform block(s) is/are not compatible with the program.

Jon Leech (oddhack)
08-06-2006, 09:45 PM
Originally posted by Michael Gold:
Must... reply... can't... resist...Apparently this is MSN Messenger, then... :eek:

Hampel
08-07-2006, 03:42 AM
Are there any anticipated dates for releasing the first 3.0 LM proposal/specification, and when should we expect the first driver implementations of it from NVidia and ATI?

mrbill
08-07-2006, 08:00 AM
Originally posted by Hampel:
Are there any anticipated dates for releasing the first 3.0 LM proposal/specification...Not on the slides, but stated at the BOF - Siggraph 2007 is the goal.

-mr. bill

Korval
08-07-2006, 02:22 PM
The layout of uniforms is fixed at program creation, so there are no string operations required at bind.Now I'm really confused...

As I understand it, in hardware, uniforms are just a flat "array" of registers, numbered 0 through N-1. When you link a program, it assigns each uniform variable name to one or more uniform registers. So, the mat4 declared with the name "localToWorld" gets, say, uniforms 0-3.

However, a second program may declare a mat4 uniform with the same name, but because the order of stuff may be different, it gets uniforms 6-9.

When you build your custom uniform block, you say that it has a mat4 named "localToWorld". If you bind this uniform block to both programs (as it is a shared uniform), it can't have the "localToWorld" matrix in both hardware uniforms 0-3 and 6-9. So it seems like one of two things needs to happen.

One: at bind time, you determine the layout of where each uniform defined in the uniform blocks gets assigned, based on the program. So you do a lot of searching. You find the mat4 that has been named "localToWorld". This isn't onerous, but it isn't free either.

Two: at bind time, you patch the program, defining the layout based on the uniform blocks. So you walk into the program and move all of the 0-3 to 6-9, or wherever the uniform blocks say things get laid out. But if programs are stored in GPU memory, this can't be a quick operation.

So, what exactly am I missing here that makes object binding not slower than it could be?


Binding will fail if the uniform block(s) is/are not compatible with the program.How is compatibility defined?

BTW, as a matter of interest, how do you deal with uniforms that are structs? Since the program defines what the structs are, do you not need the linked program to build that uniform block?

Additionally, it might be a good idea for the linked program, instead of directly creating a default uniform object, to create a mutable attribute object from which the default uniform object is then created. That way, you can edit the attribute object (removing shared uniforms, for example, assuming things in an attribute object can be removed) before creating the per-instance uniform block.

One last thing: format objects.

This is something I didn't notice on my first reading, but you were talking about objects for things like image formats, right? GL_RGB8, etc? Presumably this exists so that you can ask for an available image format that corresponds to some set of parameters, rather than just say, "Give me an RGB image of some kind."

OK, one thing after the last: display list objects.

I'm thinking that, with the concept of geometry-only display lists as well as vertex array objects, what you really want is just a "derived" class of vertex array object: an object that is totally compatible with VAOs but has a different method of creation (rather than with buffers and so forth). That sounds like a really good idea.

This sounds like extension territory, though; it's really complicated and is something that probably shouldn't hold up the new object model.

[ edit Because I keep coming up with stuff based on the new object model ]

Something just occurred to me. Because all images are alike, is it therefore possible/reasonable to take a "renderbuffer" (an image created from a format that, I guess, suggests being a render target as its primary function?) and bind it as a texture to a sampler? Will there be combinations of these bindings that don't work, such as binding a depth sampler to a non-depth texture, or just the wrong format to an image?

Komat
08-08-2006, 01:35 AM
One, you, at bind time, determine the layout of where each uniform as defined in the uniform blocks get assigned based on the program. So, you do a lot of searching. You find the mat4 that has been named "localToWorld". This isn't onerous, but it isn't free either.
From what Michael said, I assume that when you create a shader object with uniform blocks, you will need to specify the format of those blocks by providing the attribute objects used to create them, or existing instances of those blocks, or something similar. This way, program compilation can determine that the content of a given block (or part of it) will be stored starting at a specific offset in the register array (or in a specific buffer slot on DX10 hardware). That information never changes after the program has been created, so it can easily be stored in a table (for DX9 hardware, the table entry for a buffer with format id X and shader Y might be something like: copy range (Z-W) from the buffer into the constant array starting at offset V). On DX10 hardware the bind will associate the buffer with a slot; on DX9 hardware it will copy part of the buffer's content to the specified location, without needing to search for anything.



But if programs are stored in GPU memory, this can't be a quick operation.
If program modification is required for some reason, there is no need to do it directly on the GPU. The driver always has a system-memory copy of the program binary; it can do all modifications in system memory and upload the entire new program to the GPU.





Binding will fail if the uniform block(s) is/are not compatible with the program.How is compatibility defined?
Probably by the attribute object/uniform block having the same format as the one used during program creation.

Michael Gold
08-08-2006, 08:42 AM
Originally posted by Korval:
So, what exactly am I missing here that makes object binding not slower than it could be?You are making a lot of assumptions. First off, the model you have described is not the only possible implementation.

The layout of the uniform block is known at the time the program is compiled, so the program can hard-code the relative offset of each uniform. Since the program is immutable and will only work with a uniform block of this layout, the code never needs to change. So our task at bind time simply becomes: put the uniform block where the program can find it.

If the hardware works as you describe, the driver simply copies the uniform block into the register bank. No searching is required, the uniform names are long forgotten at this point, everything is already in the proper order.
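Michael's alternative can be sketched too. In this invented-for-illustration model, the compiled program refers to uniforms by relative offset from the start of its block, so binding a compatible block is just repointing a base pointer: no patching, no copying, no names.

```c
#include <assert.h>

/* Sketch only: structures and offsets are invented for illustration. */
typedef struct {
    const float *base;   /* where the currently bound uniform block lives */
} ProgramState;

/* Relative offsets "hard-coded" into the program at compile time. */
enum { OFFSET_LOCAL_TO_WORLD = 0, OFFSET_LIGHT_DIR = 16 };

/* Bind: put the uniform block where the program can find it. */
void bind_uniform_block(ProgramState *p, const float *block) {
    p->base = block;
}

/* The "shader" reads through base + relative offset. */
float read_light_dir_x(const ProgramState *p) {
    return p->base[OFFSET_LIGHT_DIR];
}
```

Rebinding a block of the same layout is O(1) regardless of how many uniforms it contains, which is why the layout must be frozen at program creation.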


How is compatibility defined?The uniform block must match the exact layout expected by the program. For simplicity's sake, let's assume you need to bind the original block used at program creation, or a clone of that object.

BTW, as a matter of interest, how do you deal with uniforms that are structs? Since the program defines what the structs are, do you not need the linked program to build that uniform block?This is no different from any other data type; you need to create the uniform block from the program, or you need to create a uniform block which matches the data types expected by the program.


Additionally, it might be a good idea for the linked program, instead of directly creating a default uniform object, to create a mutable attribute object from which the default uniform object is then created. That way, you can edit the attribute object (removing shared uniforms, for example) before creating the per-instance uniform block.Problem is, we don't want the layout of the uniform block to change after the program is linked, for the reason described above.

I'm not prepared to talk about format objects or display lists at this time.


Something just occurred to me. Because all images are alike, it is therefore possible/reasonable to take a "renderbuffer" (an image created from a format that, I guess, suggests being a render target as its primary function?) and bind it as a texture to a sampler?An image may be used as a render target, or a texture, or both. This usage must be specified at creation time and will be strictly enforced. This is important because the implementation may make storage decisions based on usage, and we can do a better job if we don't have to guess. :)

Korval
08-08-2006, 03:55 PM
The layout of the uniform block is known at the time the program is compiled, so the program can hard-code the relative offset of each uniform. Since the program is immutable and will only work with a uniform block of this layout, the code never needs to change. So our task at bind time simply becomes: put the uniform block where the program can find it.
This makes sense, but what about uniform block sharing? I'm getting the impression from some of the things you've said and from careful reading of the slide on uniforms that the way to do it is like this.

You build the shared uniform block before building the programs themselves. You then pass this shared uniform block (or blocks) to the linking function when you're creating the program.

At which point, you have made a binding (no pun intended) contract with the programs that these particular uniform objects (or, as you say, a clone of those objects) will be bound whenever the program itself is to be used. When you use the program to generate its uniform block, what it generates are all the uniforms that are not satisfied by those uniforms used at creation time.

This mechanism seems like it solves all of the problems I described. Hardware that doesn't natively support the construct would need a few block mem copies, but that's hardly onerous.
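If that reading is right, client code might look something like the following sketch. Every type and entry point below is invented (no 3.0 API has been published); it only illustrates the creation-time contract:

```c
/* All names here are hypothetical -- nothing like this has been published. */
GLblock shared   = CreateUniformBlock(sharedAttribs);  /* e.g. view/projection */

/* Passing the shared block at link time fixes its layout in the program;
 * the program now expects this block (or a clone) whenever it is used.   */
GLprogram prog   = CreateProgram(vtxShader, frgShader, &shared, 1);

/* The program-generated block covers only the uniforms *not* satisfied
 * by the blocks supplied at creation time.                              */
GLblock instance = CreateDefaultUniformBlock(prog);

/* At use: bind both; binding fails if a layout is incompatible. */
BindUniformBlocks(prog, shared, instance);
```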


This usage must be specified at creation time and will be strictly enforced.Hurray for strict enforcement of "hints"!

BTW, if the hoped-for timescale for the 3.0 API is SIGGRAPH '07, what's the timescale for the new object model?

Michael Gold
08-08-2006, 06:28 PM
Your understanding is basically correct, modulo some minor details to be ironed out.

The new object model is integral to 3.0. We may roll out some of the new objects as extensions to 2.x before the final spec is complete; this will give us an opportunity to prove the functionality before the interface becomes "immutable". :)

Hampel
08-09-2006, 02:40 AM
Will 3.0 still support immediate-mode rendering? Or something like the proposed "pseudo instancing"?

Jan
08-09-2006, 03:26 AM
The slides mention that immediate mode will be available through a separate layer that works on top of 3.0's Lean & Mean layer. So it won't be "natively" supported, but it will be supported.

In fact, that's essentially what drivers do today anyway, if I am not completely mistaken.

I don't know about instancing, but I do hope that feature will be available.

Jan.

psyduck
11-03-2006, 04:09 PM
Do not assume that a draw call will accept a list of objects. We have published no such API. I don't like your proposed API any more than you do. :) Can you guys show us how a draw call would look in this new object model? Or maybe give us some hints?

Cheers on your great work.

Daniel

psyduck
11-03-2006, 04:53 PM
May I just add a question.

I've seen a lot of specifications and patterns on creating data on this new object model. But almost none on manipulating the data itself.

Although most objects will have an immutable structure, their contents are still mutable, right? If so, are there any patterns for manipulating OpenGL data?

Cheers.
Daniel

Jon Leech (oddhack)
11-03-2006, 06:34 PM
Originally posted by psyduck:

Although most objects will have an immutable structure, their contents are still mutable, right? If so, are there any patterns for manipulating OpenGL data?It depends on the type of object. Data objects usually contain large lumps of data, such as images and buffers, whose contents are mutable (although their size probably cannot change). Think of calls like BufferData and TexImage. In some cases there will be type-specific methods, like SignalSync, which explicitly changes the status of a sync object.

But most types of objects are not mutable; their contents are specified with an attribute object, which has a set of interfaces for defining (key, value) pairs for various combinations of key and value types. The general pattern is:

- create attrib object on the client side, with default values for the object type to be created
- modify attrib values as needed
- pass attrib object to the GL server to create the "real" object
- use the real object
- the attrib object may be reused to create more objects, and its attrib values may be changed at any time
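As a sketch, that pattern might look like the following; every name below is invented for illustration, since the actual interfaces were not published:

```c
/* Hypothetical entry points and tokens throughout. */
GLattrib attribs = CreateAttribObject(IMAGE_OBJECT);  /* client-side, defaults set */
SetAttribi(attribs, IMAGE_WIDTH,  256);               /* modify (key, value) pairs */
SetAttribi(attribs, IMAGE_HEIGHT, 256);

GLimage imgA = CreateImage(attribs);  /* server creates the immutable "real" object */

/* The attrib object stays mutable and reusable for further creations. */
SetAttribi(attribs, IMAGE_WIDTH, 512);
GLimage imgB = CreateImage(attribs);
```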

psyduck
11-04-2006, 10:06 AM
Thanks for the help!

I've been thinking about the new object model, immutable stuff, and another question is bothering me.

Pixel Buffer objects.

If I'm not mistaken, in the current OpenGL specification it's possible to render to a Pixel Buffer Object and then use it as Vertex Data without the need of a redundant copy operation.

Are there any ideas on how one would do such a thing in this immutable object paradigm? I mean, how would one change the interpretation of a buffer object without a redundant copy operation?

Cheers, and thanks again!
Daniel

Korval
11-04-2006, 10:27 PM
If I'm not mistaken, in the current OpenGL specification it's possible to render to a Pixel Buffer Object and then use it as Vertex Data without the need of a redundant copy operation.No, that's not possible. PBOs are copy targets, not render targets.

psyduck
11-05-2006, 07:03 AM
Well... This is what I meant.
From the ARB_pixel_buffer_object specification:

"There are a several approaches to improve graphics performance with PBOs. Some of the most interesting approaches are:
(...)
- Render to vertex array: The application can use a fragment program to render some image into one of its buffers, then read this image out into a buffer object via glReadPixels. Then, it can use this buffer object as a source of vertex data."

About mutability, from the same document:
"3) Can a given buffer be used for both vertex and pixel data?

RESOLVED: YES. All buffers can be used with all buffer bindings, in whatever combinations the application finds useful. Consider yourself warned, however, by the following issue. (...)"
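For reference, the quoted technique in GL 2.x terms. This is a sketch, not a complete program: it assumes a current context, something already rendered to the read framebuffer, and a buffer object `buf` large enough to hold the pixels.

```c
/* Read the framebuffer into the buffer object (a server-side copy). */
glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, buf);
glReadPixels(0, 0, width, height, GL_RGBA, GL_FLOAT, (void *)0);
glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, 0);

/* Reinterpret the same storage as vertex data -- no client-side copy. */
glBindBuffer(GL_ARRAY_BUFFER_ARB, buf);
glVertexPointer(4, GL_FLOAT, 0, (void *)0);
glEnableClientState(GL_VERTEX_ARRAY);
```

The data never round-trips through client memory; the one glReadPixels copy stays on the server side, which is exactly what the spec's "render to vertex array" example describes.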

Cheers!

Overmind
11-06-2006, 12:23 AM
There is no difference between a pixel buffer object and a vertex buffer object. You don't need to change the object just to use it at different binding points.

It may be necessary to state the expected use at creation time. Then you would be able to say either vertex or pixel, or both, and you wouldn't be able to change it later.