
Pipeline Newsletter Volume 4



Brolingstanz
06-20-2007, 01:05 PM
http://opengl.org/pipeline/

lpMapBuffer -- oh yeah!

Overmind
06-20-2007, 03:17 PM
void lpDrawElements(LPenum mode, LPsizei *count,
                    LPsizeiptr *indices,
                    LPsizei primCount,
                    LPsizei instanceCount)

Hmm... No indices from buffer objects?

Korval
06-20-2007, 03:27 PM
Everything about this document is satisfying. Except one thing.

No date is mentioned. Should I be concerned about that? I mean, it all seems pretty solid at this point (except for the misc. state stuff). Are you still aiming for a summer or SIGGRAPH release?

Commentary and minor questions:

Longs Peak is interesting. First, it seems, from the example given, that the notion of a default framebuffer went away. So, how does one actually draw to the screen? Is that a non-GL function (like wgl/glx/agl)? Do we have to do our own buffering?

Texture rectangle types seem to have gone away. Will hardware that supported them but not NPOTs be able to handle NPOTs with no mip levels?

I find it interesting that programs are still not instanced. If you want to use a program object over multiple objects, you have to re-bind all the attached objects. Should we expect program object copy operations to be cheap (i.e., not storing multiple copies of the compiled shader, etc.)?

I also recall, from the last Newsletter, that it was possible to create separate program objects for each shader type, and bind them to the context separately. Is that still doable? Because it's not mentioned on the diagram that multiple programs can be attached.

The new lpMapBuffer turns me on...

BTW, how is non-serialized map buffer access not incredibly dangerous? Or is this one of those, "Our developers are big boys now. They can take care of themselves," kind of things?

What I want to know more about:

Format objects. These have been danced around for far too long. I want specific knowledge of what they can do.


Hmm... No indices from buffer objects?
I assume if one is bound to the VAO, then the "LPsizeiptr" will act as an integer offset to start from. Otherwise, it is a pointer.
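For comparison, GL 2.1's glDrawElements already has exactly that dual interpretation once a buffer is bound to GL_ELEMENT_ARRAY_BUFFER; a minimal sketch (ibo, indexCount and clientIndices are assumed to be set up elsewhere):

/* With an element-array buffer bound, the indices argument is read as a
   byte offset into that buffer; with no buffer bound, it is a client pointer. */
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT,
               (const GLvoid *)0);              /* byte offset 0 into ibo */

glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);
glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT,
               clientIndices);                  /* ordinary client-memory pointer */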

k_szczech
06-20-2007, 03:31 PM
First of all, let's not start a discussion here, so we don't delay the ARB at their work :D

Since I'm an addict I cannot resist putting in some comments. Even on the new function prefix, hehe ;) I'll just put comments as I read through. When I'm done I'm gonna drop dead. I'm a dead man walking anyway (MOV ECX, 6 \ REP CALL get_up_6_30_AM_go_to_work_and_return_home_by_11_00_PM).
If you plan to read this post, be advised: frustration ahead (don't worry, I have 2 weeks off in July).

1. "gl"/"lp" prefix - If that would be up to me it would be "og", but it's not up to me :) ogBindBuffer = "Oh, Goodness! Bind buffer". Yeah, luckily it's not up to me.

2. Debug context - interesting. My guess is that it's gonna be (a little?) vendor-specific, which I think is good. Just a guess though.

3. "Good progress was made defining what the draw calls will look like. We decided to keep it simple" - I wanna KISS (http://en.wikipedia.org/wiki/KISS_principle) you guys. As Einstein said: "Everything should be made as simple as possible, but no simpler".

4. No comments on buffer objects. No need to :)

5. Same for object management

6. glDrawArrays - ha! Multiple TRIANGLE_STRIPs with one draw call. I wonder if the "first" and "count" arrays could also become buffers stored on the GPU. That would be nice. Also, the primitive type could be an array to allow mixing TRIANGLES and TRIANGLE_STRIP in one call. I bet some geometry optimization libraries could take advantage of that. Of course there is no strong need for such a feature in LP, methinks. Mt Evans maybe.

7. "We'll continue to show you details of Longs Peak in future issues of OpenGL Pipeline". I'm gonna measure time in a new unit: "2 OpenGL Pipeline Newsletters ago" sounds fashionable... Ok, seroiusly, this sounded like LP will be out after at least two more newsletters (6 months?). Perhaps that's why this fragment got my attention.

Sorry for the frustrated tone of my post. Ok, time to drop dead... Err... I mean go to bed.

Rob Barris
06-20-2007, 03:43 PM
The "lp" prefix is a placeholder, and will not be used in the final spec.

k_szczech
06-20-2007, 03:49 PM
The "lp" prefix is a placeholder, and will not be used in the final spec.Yes, I know. It's stated clearily in the newsletter. I just felt in mood to put something into that place. As I said - let's not start discussion over it here. I think comments are welcome on everything, but discussion should rather focus on important things.

k_szczech
06-20-2007, 03:54 PM
Hmm... No indices from buffer objects?
When I looked at glDrawArrays I thought it's nice that we can define multiple areas of the vertex array to draw in just one call.
It would be nice if we could do exactly the same thing with index arrays. It would be useful for frustum culling, for example.

Ok, I swear I'm really going to bed now. I wanna be capable of thinking at work tomorrow :)

Rob Barris
06-20-2007, 04:09 PM
You will be able to source indices from buffer objects in LP.

Keep in mind, the Pipeline newsletter submissions were written and submitted some weeks ago. There has been steady progress since that time on the LP spec effort.

knackered
06-20-2007, 04:34 PM
Originally posted by Korval:
Longs Peak is interesting. First, it seems, from the example given, that the notion of a default framebuffer went away. So, how does one actually draw to the screen? Is that a non-GL function (like wgl/glx/agl)? Do we have to do our own buffering?
Noticed this on page 8:

Create a framebuffer object to render to. This is the fully general form for offscreen rendering, but there will be a way to bind a window-system provided drawable as a framebuffer object, or as the color image of an FBO, as well.
So presumably that means there'll be a wglGetImageObject(HDC) call somewhere. I just wonder how that's going to work with back buffers and quad buffered stereo. Maybe they'll have wglGetBackImageObject(HDC), wglGetFrontImageObject(HDC), wglGetBackLeftImageObject(HDC) and wglGetBackRightImageObject(HDC).

It's about time you could map a range of a vertex buffer; it's been in D3D for years.

The whole object model does look very nice indeed now I see it written down.

Corrail
06-20-2007, 04:43 PM
void lpDrawArrays(LPenum mode, LPint *first,
LPint *count, LPsizei primCount,
LPsizei instanceCount)
...
Finally, instanceCount is used for geometry instancing; the entire set of ranges will be drawn instanceCount times, each time specifying an instance ID available to the vertex shader, starting at 0 and ending at instanceCount-1.
Wouldn't it be nice to specify the instance ID on my own? Something like

void lpDrawArrays(LPenum mode, LPint *first,
                  LPint *count, LPsizei primCount,
                  LPint *instanceIDs,
                  LPsizei instanceCount)

If that is possible I can store all objects of a similar type (for example trees) in one VAO and use instancing for rendering. I have only got one big uniform object with an array of transform matrices, colors, etc. which I have to bind. Then some culling algorithm checks which of these are visible, and those instance IDs are passed to my modified function. Then in the vertex shader I have all the information to render all visible trees. With this method I only need one simple draw call.

I don't know Longs Peak/the object model very well, but it's just an idea...
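A rough sketch of how the proposed signature might be used (everything here is hypothetical - the signature is the suggestion above, and treeIsVisible(), first, count and primCount are assumed application data):

/* Hypothetical: gather the IDs of visible trees, then issue a single
   instanced draw covering only those instances. */
LPint   visibleIDs[MAX_TREES];
LPsizei visibleCount = 0;
for (LPint i = 0; i < treeCount; ++i)
    if (treeIsVisible(i))                    /* application-side frustum test */
        visibleIDs[visibleCount++] = i;

lpDrawArrays(LP_TRIANGLES, first, count, primCount,
             visibleIDs, visibleCount);      /* proposed extra parameters */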

Jan
06-20-2007, 04:45 PM
"As I said - let's not start discussion over it here."
Well, this is a discussion board...

I demand that the next newsletter be posted at 10 AM European time, not American; this is just too late.

The only thing that comes to mind right now is that draw calls do not include an "offset" parameter for the indices (an offset added to each index, not the thing the "first" parameter is used for). As discussed in several long threads, which I leave to other people to find links to.
Just want to make sure, it is not forgotten.


I am also wondering whether rectangle textures might be gone for good. I am not sure whether I think this is a good thing - maybe it is - but I do find them useful, even with NPOT textures. So, as long as hardware can be made faster with rectangle textures than with NPOT textures, I'd like them to stay (including non-normalized texture coordinates).

I like the idea of removing the framebuffer and using FBOs throughout the pipeline. The question of how we present the result on screen is of course a valid one.

I find the "Non-serialized access" weird. If i am forced to make sure i don't do anything wrong, using semaphores and sync-objects, doesn't that come down to the same stuff, the driver needs to do? I am pretty sure only very few people would use this fragile feature. Not with partial and whole buffer invalidation, which seems to be a good idea.

None of the ideas for shader/program objects I have read/used so far have convinced me. The ARB_xxx_program system was a mess, the current GLSL system is not good, and the way I understand the LP system it does not seem to really change. Also, I fear I need to link or create new program objects every time I only want to bind a different texture to the shader/pipeline.

Well, some interesting information, nothing that surprised me (though that's certainly a good thing).

Jan.

knackered
06-20-2007, 04:48 PM
I've never seen the point in texturerect. I must be missing something. For me they just clutter up GLSL.

Jan
06-20-2007, 04:49 PM
About Corrail's idea to specify one's own instance IDs: That's GREAT !!!

Rob Barris
06-20-2007, 04:59 PM
Non-serialized access is roughly equivalent to the non-blocking MapBuffer extension on OS X OpenGL (flush buffer range).

It turns out to be straightforward to use if adopted as part of a "write once" policy; if for example you set up a large VBO, map and write the first N KB, then draw that, then map and write the next N KB, and draw that, etc - having this option means the second map need not wait for the drawing of the first batch to complete, giving you concurrency. But note, in this example each block of data was only written to once and so there is no risk of a scheduling hazard.

Using the non-serialized option is much more difficult in situations where you might overwrite a section of the buffer with previously specified data in favor of new data, and if you are not handy with fences, should definitely be avoided.

Note that you can still potentially get concurrency (avoiding a block condition in MapBuffer) if you specify write-only, invalidate-range and explicit-flushing, because this gives the driver an "out" to optionally provide an efficient scratch buffer for your writes, which it can deliver into the specified buffer range later, based on the flush mapped data calls you must make after writing when explicit flush is enabled.

(why/how is this different from BufferSubData? because it allows you to maintain any representation for source data that you like, and to be able to uncompress directly to the destination using your own code - BufferSubData would not allow this, the source data must be in a copyable form)
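A minimal sketch of the write-once streaming pattern described above, with every identifier a placeholder (the real LP entry points and flag names have not been published, so all of this is assumed):

/* Hypothetical: stream a large VBO in fixed-size chunks, writing each byte
   exactly once, so the non-serialized map never races pending draws. */
LPsizeiptr chunk = 256 * 1024;
for (LPsizeiptr offset = 0; offset + chunk <= bufferSize; offset += chunk) {
    void *dst = lpMapBufferRange(vbo, offset, chunk,
                                 LP_MAP_WRITE_ONLY |
                                 LP_MAP_INVALIDATE_RANGE |
                                 LP_MAP_NON_SERIALIZED);  /* placeholder flags */
    fillVertices(dst, offset, chunk);   /* application decompresses/copies here */
    lpUnmapBuffer(vbo);
    drawChunk(offset, chunk);           /* draw only the range just written */
}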

Korval
06-20-2007, 05:12 PM
If that is possible I can store all objects of a similar type (for example trees) in one VAO and use instancing for rendering.
Um, you can do that currently (well, currently as LP stands). What you can't do is specify directly what the ID will be. I don't really see a problem with that. Is an increasing counter not good enough to index into an array?

I mean, in the typical case, the contents of that array are being updated every frame. So, you're going to have to fill out the buffer anyway; you may as well fill it out in order.

The thing I like about this is that it suggests that this functionality will be properly emulated transparently for non-DX10 hardware. That way, we don't have to worry about it. Worst-case, it acts like a number of repeated draw calls that modify a uniform value, which would be what you would have done in the first place.


Just want to make sure, it is not forgotten.
It probably hasn't been forgotten so much as made unnecessary.

If VAOs do all the validation and so forth up-front, then there is an open question as to the need for such a thing. After all, the impetus for the call is performance based on the current GL API. It's entirely possible that simply making a VAO for each of the things you want to render and swapping them in/out as needed will offer all the performance benefits of the parameter.


The ARB_xxx_program system was a mess, the current GLSL system is not good, and the way I understand the LP system it does not seem to really change.
What is the specific problem?


I've never seen the point in texturerect.
The main point was to have NPOTs before we had NPOT hardware. It's rather important to be able to at least expose something of that hardware, such that we can create unmipped textures on hardware that supported rects but not generalized NPOTs.


It turns out to be straightforward to use if adopted as part of a "write once" policy
Sounds like the perfect way to implement a GL 2.1 wrapper ;)

k_szczech
06-20-2007, 11:56 PM
Instance IDs - we usually have to specify some instance-specific data anyway (like a transformation matrix), but it could be useful if that data is static and we just want to skip some instances (frustum culling). It would give some extra performance in this case.
But what if someone is drawing a bunch of particles and doesn't want to do frustum culling for every one of them individually (that would be crazy)? Then such an array of instance IDs is unnecessary and introduces additional performance cost.
So if there is such functionality, then I should be able to pass NULL as a pointer to this array, so my instances will receive IDs automatically.


Also, I fear I need to link or create new program objects every time I only want to bind a different texture to the shader/pipeline.
Why would one have to link or create a new program object when binding another texture? It doesn't make any sense to me.

elFarto
06-21-2007, 01:55 AM
At last, it's been released (of course I've been reading it since before its release :D ).

Everything looks good, apart from no spec :(

Judging by the comments in this thread, it seems like we could use an 'lpSuperDrawArrays' and 'lpSuperDrawElements':


void lpSuperDrawArrays(LPsizei instanceCount, LPsizeiptr *instanceIDs,
                       LPsizei primCount,
                       LPenum *mode, LPint *first,
                       LPint *count, LPint *offset)

void lpSuperDrawElements(LPsizei instanceCount, LPsizeiptr *instanceIDs,
                         LPsizei primCount,
                         LPenum *mode, LPsizei *count,
                         LPsizeiptr *indices, LPsizei *offset)

Where offset is the value to add to each index read from the buffer. instanceIDs[i] can be NULL if you want to draw all instances of primitive i, and instanceIDs can be NULL if you want to draw all instances of all primitives. Same for offset.

I also rearranged the parameters; they seem to make more sense to me like this, i.e. draw 1 instance of 2 primitives.

Of course this raises the question, why can't all these options + all the objects that are bound to the context (fbo/vbo/program objects/etc..) be wrapped up into a 'Draw Object'? Then the draw command is just lpDraw(drawObject);

Regards
elFarto

Jan
06-21-2007, 02:44 AM
When you also introduce a "drawobject", that would mean you need to have an awful lot of drawobjects. I am thinking about my octree, which has over 20 thousand nodes, and through culling there can be many different combinations of parts of the array I want to render. So, doing this on-the-fly is, in my opinion, the way to go. Otherwise, in this scenario, I would be creating and deleting hundreds of drawobjects every frame (or even worse, I would need to reuse them, for performance reasons ... if they would be mutable at all).

"Why would one have to link or create new program object when binding another texture? It doesn't make any sense to me."

I have to admit that I haven't fully understood what the program object contains and which parts can be changed without relinking it. However, it sounds like, when changing some buffer object (that contains uniforms), the shader needs to be validated (linked?), because that buffer might have a different layout. I hope it is not that way, because changing a buffer object is a very common operation.

Jan.

Komat
06-21-2007, 03:17 AM
Originally posted by Jan:
However, it sounds like, when changing some buffer-object (that contains uniforms), the shader needs to be validated (linked?), because that buffer might have a different layout.
If the buffer has a different layout, you will almost certainly need a different program object, because the shaders must be updated to match the layout; so the layout will likely be part of the immutable attachment properties of the program object.

It is possible to modify the attachment point of an existing program object to reference a different buffer with the same layout, so when swapping multiple buffers with the same layout, you can use a single program object.
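If it does end up working that way, usage might look roughly like this (all names are hypothetical placeholders for whatever the final attachment API turns out to be):

/* Hypothetical: one program object, two uniform buffers with the same layout;
   only the buffer attachment is repointed between draws, no relink needed. */
lpProgramAttachBuffer(program, LP_UNIFORM_BLOCK_0, uniformsA);
lpDrawArrays(LP_TRIANGLES, first, count, primCount, 1);   /* draws with uniformsA */

lpProgramAttachBuffer(program, LP_UNIFORM_BLOCK_0, uniformsB);
lpDrawArrays(LP_TRIANGLES, first, count, primCount, 1);   /* same program, new data */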

skynet
06-21-2007, 03:57 AM
To me, VAOs look a bit scary. If it really means I'd have to create and fill out a specific VAO whenever I want to draw a certain range out of one or more buffer objects, it could be even worse than Jan's 20,000 statically allocated VAOs.

I'm using a number (ca. 100) of buffers of medium size (ca. 4 MB) to upload geometry on the fly. Because of this, each piece of geometry might end up in a different buffer at a different position each time it gets uploaded again. Sometimes even the attribute count differs between geometries (so they don't have a homogeneous layout inside one buffer).

This means: every time geometry is uploaded, I'd have to recreate a corresponding VAO, and for each uploaded piece of geometry (several thousands) I'd have to keep a VAO around.
Of course, at draw time I'd save a lot of calls with this model!

I just want to make sure, that LP is ready to create/delete/recreate/keep several thousand VAOs :-)


If a VAO really means an immutable combination of:
buffer(s), generic attribute ID(s), offset(s), size(s), type(s) and stride(s),
it would sound logical to me to include the index buffer in it as well... and we would wind up with what elFarto called a "DrawObject".

What just comes to mind: often you need to render the same object, but with different attributes. Imagine, for instance, a z-fill pass, where you'd need only the position (no normals, no texcoords). Would that mean I'd have to create another VAO for each object just for this pass?
Of course, if the shader is not using the additional attributes, it should not hurt correctness, but since the 'normal' VAO also specifies normals and texcoords, the driver believes I'm accessing buffers which in fact are not accessed. The driver then possibly blocks those buffer objects unnecessarily.

Anyway, looks like VAOs need some more detailed explanation :)

knackered
06-21-2007, 06:52 AM
Well, it looks like they're just like D3D vertex declarations...
http://msdn2.microsoft.com/en-us/library/bb206335.aspx

elFarto
06-21-2007, 06:57 AM
skynet, from my understanding of it, and judging by the diagram, the buffer objects used by the VAO are mutable, the rest isn't.

Jan, looking at it that way draw objects are certainly not a good idea and I can see now why they've done it how they have.

Regarding the program objects, they have (according to the diagram) the following attachment points:
vertex program, fragment program, buffer object (used for the uniforms), image objects, and texture filter objects (probably set in one go, see this (http://www.opengl.org/discussion_boards/ubb/ultimatebb.php?ubb=get_topic;f=3;t=014645;p=1#000020), the glUniformSampler line).

Regards
elFarto

Brolingstanz
06-21-2007, 10:34 AM
A couple of impressions and an exclamation ...

1. Increment of ref count when used -- Personally I'd prefer managing object lifetimes myself, without having to second guess the API.

2. Default framebuffer -- I don't mind supplying my own default framebuffer, in fact I'd prefer it that way.

3. Debug context -- oh yeah!

Komat
06-21-2007, 11:11 AM
Originally posted by bonehead:

1. Increment of ref count when used -- Personally I'd prefer managing object lifetimes myself, without having to second guess the API.
Any OGL specification which creates objects in the current OGL version specifies what happens when you delete an object which is currently bound somewhere. The driver then needs to correctly implement that behavior, which might have some performance and implementation cost for handling this very special case which almost never happens.

Such a situation cannot happen with the reference counting approach, because the object has at least one reference as long as it is bound somewhere, so neither the specification nor the driver needs a special case to handle it.

Korval
06-21-2007, 11:15 AM
When you also introduce a "drawobject", that would mean you need to have an awful lot of drawobjects. I am thinking about my octree, which has over 20 thousand nodes, and through culling there can be many different combinations of parts of the array I want to render. So, doing this on-the-fly is, in my opinion, the way to go.
Why? What's wrong with having 20,000 VAOs? Odds are that you have 20,000 C++ objects, one for each node, anyway. It's not that much more memory for the implementation. A few pointers, some offsets, a couple of stride parameters. Hell, you're going to have to store that information yourself anyway. Just let the driver do its job and stay out of it.

VAOs aren't really on-chip memory; they're client memory allocated and stored in the implementation. They contain references to objects and some state data for them.

So if you can allocate 20,000 C++ objects, why can't GL?


However, it sounds like, when changing some buffer object (that contains uniforms), the shader needs to be validated (linked?), because that buffer might have a different layout.
Why would it? The program doesn't change just because some uniforms changed, nor does the buffer need to be validated or have its layout changed.

What it does mean is that nonsensical nVidia "optimization" where they recompile the shader if you change certain uniform values goes away. But as far as I'm concerned, that's the way it should be.


Would that mean I'd have to create another VAO for each object just for this pass?
Why not? You need to have a different program object and set of blending parameters (maybe) anyway, so what's one more object? From an API standpoint, I prefer that it have an entirely separate VAO, just so that it matches with its entirely separate program and separate blend parameters.

We are talking about an object that takes up, maybe, 32 bytes per vertex attribute. And that's worst-case; it's probably more like 16 (offset, stride, BO-pointer, and an enum/switch/bitfield for the format [int, short, etc]).
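Purely to illustrate that estimate, a hypothetical per-attribute record (nothing from the spec, just the fields listed above):

/* Hypothetical record an implementation might keep per vertex attribute:
   on a 32-bit build this is 16 bytes, well within the estimate above. */
typedef struct {
    LPuint     buffer;   /* handle/pointer to the backing buffer object   */
    LPsizeiptr offset;   /* byte offset of the first element              */
    LPsizei    stride;   /* bytes between consecutive elements            */
    LPuint     format;   /* packed type/size/normalized flags (enum bits) */
} AttribRecord;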


Personally I'd prefer managing object lifetimes myself, without having to second guess the API.
Those objects are owned by the server, so no, you won't be doing that. You aren't doing it in 2.1, and you won't be in the future.

knackered
06-21-2007, 12:41 PM
Originally posted by elFarto:
Of course this raises the question, why can't all these options + all the objects that are bound to the context (fbo/vbo/program objects/etc..) be wrapped up into a 'Draw Object'? Then the draw command is just lpDraw(drawObject);
Really, what use would that be?
The common case is state sorted; only an idiot would package everything up like that and just draw in an arbitrary order. I can't imagine a scenario where I would actually use the proposed drawobject, except perhaps in some weird prototype app.
In any case, this stuff would probably be folded into the display list object, when they get round to it.

Korval
06-21-2007, 01:02 PM
Suggestion to the ARB, on the "per-sample operation" object:

One object for all of these parameters is wrong.

From the user's perspective, blending and, say, depth testing are two different settings that are set from two different places. A user's object would know how it blends, so it should have its blend parameters/object/etc. In that way, it is similar to a program object.

But why would the object decide how the depth test works? How depth testing happens is not really something the object needs to be aware of. Currently, that's a sort of "set and forget" parameter. You set it globally, and change it very infrequently. Certainly not on a per-user-object basis.

From the user's point of view, lumping the depth test in with the blend functions is asking for trouble. It makes it hard to change the parameter globally, as you have to go around and change it in all of the objects that render with it.

elFarto
06-21-2007, 01:44 PM
Originally posted by knackered:
Really, what use would that be?
None, as I'm starting to see. Just seeing all the shiny new objects, I was wondering why drawing didn't have an object, and it makes sense that it doesn't.

Regards
elFarto

V-man
06-21-2007, 05:40 PM
Originally posted by Korval:
But why would the object decide how the depth test works? How depth testing happens is not really something the object needs to be aware of. Currently, that's a sort of "set and forget" parameter. You set it globally, and change it very infrequently. Certainly not on a per-user-object basis.
What the heck?
I haven't had the time to read this stuff but hope it's going to be sensible. The whole idea of a new GL was to clean out the clutter and be a thin layer. The state machine is a beautiful thing.

Or maybe they want to design it in such a way that a fixed amount of data is sent to the GPU every time a draw call is made.

Brolingstanz
06-21-2007, 07:20 PM
One object for all of these parameters is wrong.
Hasenpfeffer.

Korval
06-21-2007, 07:53 PM
Or maybe they want to design it in such a way that a fixed amount of data is sent to the GPU every time a draw call is made.
They're trying to make it so that a future "blend shader" can easily be dropped into place, without the API cruft that accrued when GL went with shaders.

Jon Leech (oddhack)
06-21-2007, 09:53 PM
The only thing that comes to mind right now is that draw calls do not include an "offset" parameter for the indices (an offset added to each index, not the thing the "first" parameter is used for).
As currently defined, you specify an offset when attaching a buffer to the VAO (in other words, it is a mutable VAO attribute). I didn't have room to go very deeply into the individual object attributes and behaviors in that article.

arekkusu
06-22-2007, 01:12 AM
Originally posted by skynet:
Anyway, looks like VAOs need some more detailed explanation :)
Read the spec shipping since 2002 (http://www.opengl.org/registry/specs/APPLE/vertex_array_object.txt).

skynet
06-22-2007, 01:44 AM
The APPLE_vertex_array_object spec almost certainly doesn't explain how VAOs are expected to work in LP. And I don't think that they are in any way similar, except maybe the idea "let's put the array bindings and enables into an easily bindable object".

elFarto
06-22-2007, 01:55 AM
One thing I've just remembered. In volume 3 of the pipeline, there is a piece of example code for creating an image object. There are 2 lines in particular I'm interested in:

GLtemplate template = glCreateTemplate(GL_IMAGE_OBJECT);
...
GLbuffer image = glCreateImage(template);

Is it your intention to have a glCreate{Image,Buffer,Sampler,Sandwich,...} function for every object type? This will increase the number of functions.

Is it possible to have a glCreateObject(GLtemplate template); function for all object types instead?

Actually I've just realised this is because the return type can then be checked by the compiler. Perhaps a macro for this:

#define glCreateImage(t) ((GLbuffer) glCreateObject(t))
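...or, spelled out for a couple more object types (sketch only; the generic glCreateObject and the GLsampler handle type are the hypothetical forms being discussed):

/* Hypothetical: one generic creation entry point, with thin typed macros
   to recover some compile-time type checking. */
#define glCreateImage(t)   ((GLbuffer)  glCreateObject(t))   /* as above */
#define glCreateSampler(t) ((GLsampler) glCreateObject(t))   /* GLsampler is hypothetical */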

Regards
elFarto

Zengar
06-22-2007, 02:44 AM
Actually I would prefer syntax like

lpCreateObject(template, obj_type)

and

lpBindObject(obj, obj_type)

to having N versions for each object type

elFarto
06-22-2007, 03:09 AM
Originally posted by Zengar:
Actually I would prefer syntax like

lpCreateObject(template, obj_type)

and

lpBindObject(obj, obj_type)

to having N versions for each object type
Passing the object type to create object would be redundant. You've already specified it in the create template; the driver can just stick the type in the GLtemplate structure and use it when you call createObject. But you would need the #defines to gain some form of type safety with this method.

I do like your lpBindObject though. In the pipeline article, they say the context is like a container, so it stands to reason it could have attachment points just like the other containers. Making these attachment points more extensible can only be a good thing. E.g.:


lpBindObject(LP_VERTEX_ARRAY, vao);
lpBindObject(LP_PROGRAM_OBJECT, program);
lpBindObject(LP_FRAMEBUFFER, fbo);
lpBindObject(LP_SAMPLE_OPS, sampleops);
lpBindObject(LP_MISC, misc);

Regards
elFarto

Jan
06-22-2007, 04:15 AM
I do agree that one object for sample parameters is a bit clumsy. D3D10 distinguishes 4 or 5 pipeline stages and therefore has 4 or 5 such parameter blocks. And I think they, too, handle all the blending stuff as a separate stage.

If one would later on drop in a blend shader, this would make even more sense, because then you just ignore the whole blend state, which is now part of some other state. It would be more modular.


I don't think there should be one lpCreateObject and one lpBindObject. This restricts you very much. That would mean all objects would need to be described using a GLtemplate. Additionally, your compiler can't aid you with any kind of type checking.

I think having several functions lpCreateImage, lpCreateShader, ... is the better way to go. There is only a handful of types, so the growth in necessary functions is not an issue. It doesn't grow exponentially; there are no such dependencies. Having a separate create function means you can have separate template structures. That makes code more readable and drivers easier to implement, compilers can check types AND you can extend it much better. If there is a new type, the extension just adds a create and a bind function, that's it.

Doing it by declaring different enums that you pass to the function is more or less the same thing, just less flexible.

Jan.

Ido_Ilan
06-22-2007, 05:30 AM
Please don't flame me, but do we really need Longs Peak?
To me it seems that OpenGL 2.1 is just a little messed up from 20 years of evolution; why not remove the redundant calls/types... (like OpenGL ES 2.0)?
I love the way OpenGL is organized; the current state machine is very powerful and elegant (it does have limitations for debugging/understanding).
To me Longs Peak seems like some DirectX version.
Yes, it will be easier for driver writers and have less overhead, but all this can be achieved with current OpenGL going the way ES went.

I know Korval, knackered and the rest will flame me, but still, do we need it?
Ido

Demirug
06-22-2007, 05:40 AM
Originally posted by Jan:
I do agree that one object for sample parameters is a bit clumsy. D3D10 distinguishes 4 or 5 pipeline stages and therefore has 4 or 5 such parameter blocks. And I think they, too, handle all the blending stuff as a separate stage.
If you want to compare it with D3D10, you have to count 3 state objects: Rasterizer, Blend and DepthStencil.

The two other D3D10 state objects are the input layout and the sampler. The first one can be somewhat compared with the vertex array object, but it stores only the vertex layout and no references to the buffers.

The sampler state could be compared with a texture filter object.

k_szczech
06-22-2007, 06:32 AM
do we really need Longs Peak?
Yes.
It's true that some things that will be introduced in Longs Peak could be introduced into GL 2.x.
However, there are enough reasons to release a new API: performance, driver development issues - you can read about it in the first newsletter.
Also, new features could turn out to be against the OpenGL 1.x/2.x specs.

Besides, we're now in the age of programmable functionality. We must pay with some blood and tears to keep up. In my case it's gonna be rather joy and happiness, but I'm willing to pay :D

knackered
06-22-2007, 07:33 AM
Originally posted by Ido Ilan:
Please don't flame me, but do we really need Longs Peak?
To me it seems that OpenGL 2.1 is just a little messed up from 20 years of evolution; why not remove the redundant calls/types... (like OpenGL ES 2.0)?
I love the way OpenGL is organized; the current state machine is very powerful and elegant (it does have limitations for debugging/understanding).
To me Longs Peak seems like some DirectX version.
Yes, it will be easier for driver writers and have less overhead, but all this can be achieved with current OpenGL going the way ES went.

I know Korval, knackered and the rest will flame me, but still, do we need it?
Ido
Most of OpenGL is centred around legacy fixed functionality, so most of OpenGL is an anachronism.

Brolingstanz
06-22-2007, 11:15 AM
To me Longs Peak seems like some DirectX version.
If you're referring to D3D10, I don't see anything wrong with that ;-)

Besides, it comes as no surprise to me that new APIs targeting today's and tomorrow's hardware should look pretty similar.

knackered
06-22-2007, 04:19 PM
I doubt Microsoft is putting the same thought into the long term. They'll happily change the whole API in a couple of years. We can laugh, but at least D3D10 is out and being used.

V-man
06-22-2007, 05:47 PM
If it is similar, it is better since the thousands of new games that will come out soon could easily be ported to Mac and Linux. I'm pretty sure no one will switch from DX to GL on Windows itself.

BTW, MS has no choice but to obsolete their own software. They are a business.

Ido_Ilan
06-22-2007, 11:56 PM
If it is similar, it is better since the thousands of new games that will come out soon could easily be ported to Mac and Linux. I'm pretty sure no one will switch from DX to GL on Windows itself.
Even worse, who will use the new API? Current academics are using the old fixed pipeline or switching to DirectX (see many recent GPU-related graphics papers), games are using DirectX; maybe the workstation market will switch, but it is a field that changes very slowly.
People are used to the current API; if they need to learn a new one, what will make them learn OpenGL over DirectX?
I fear that the new API will be the downfall of OpenGL instead of its future.

V-man
06-23-2007, 01:44 AM
You would have to go to those academics and do a little market research. If it's gaming related, then it makes sense to use DX. You have to serve your audience.

If it's some GPGPU thing, from what I understand these guys want GLdouble support and some specific hw features. They typically use GL and shaders are important to them.

Ido_Ilan
06-23-2007, 01:54 AM
..If it's gaming related, then it makes sense to use DX. You have to serve your audience.
Why does it make sense? The gaming world is the engine that pushes the graphics industry; if OpenGL is of no use in that area it will die slowly. Why should someone who is interested in graphics learn OpenGL if most companies use DirectX? People don't care if it's OpenGL or DirectX as long as it looks good and runs smoothly.
OpenGL will turn into a niche market (smaller than today) and I don't believe that alternative OSes are much of a factor in choosing to learn OpenGL in the future.

Jan
06-23-2007, 04:58 AM
Well, that alternative "Linux" OS might not be that important for gaming in the near future. However, Apple has just started to get onto the same track MS did in 1995: they want their OS to become important for gaming. I am not a Mac user, mostly because Macs are no gaming machines today, but I do hope that their strategy is successful, because OpenGL is THE one and only 3D API for Macs and it would then become very important for game developers. Who would want to use D3D, if one can develop with OpenGL for EVERY platform, except the 360 (and maybe the Wii, I don't know)?

In my opinion OpenGL does have the potential to become very strong again, if Apple is successful in bringing mainstream gaming to the Mac. However, for that to happen, we need an API which is designed with the hardware in mind that we have today and will possibly have in the next 5 to 10 years. In this market, which is one of the fastest growing ones, it is a pain to work with an API that was designed 15 years ago. And it is getting worse with every new hardware generation.

Longs Peak does not even force you to do the API switch in one go. You can just create a new context, and start using it, while still using your old code, so you can slowly adapt to it.

In the end it will be EASIER to learn and use Longs Peak efficiently than it is today to learn OpenGL. Let's face it, OpenGL is a mess. Granted, you can still get started with it in a few hours, but MASTERING it is more like black magic, because there are so many ways to do things, and so many unknown and unintuitive pitfalls.

With Longs Peak there will hopefully be only a few ways to do things, and those will be the "right" ways. For starters it will be more difficult, when you only look at Longs Peak, because stuff like immediate mode and display lists will be gone. But the idea is to build an "ecosystem" around OpenGL that will provide such things in utility libraries. So, instead of glBegin/glEnd you might be writing gluBegin/gluEnd with Longs Peak, and the utility library will encapsulate all that stuff for you, so that you can get started as fast as today.

I don't doubt that within half a year after Longs Peak is out, we will have libraries like glu, glut, glew and SDL for it that will make everyone's lives very easy. And I am pretty sure that 90% of today's OpenGL programmers will happily adapt to it, because of easier use, less maintenance, fewer extensions to work with and, very important, better and faster drivers.

And I don't doubt that the ARB will not disappoint us with the final specification.

Jan.

k_szczech
06-23-2007, 06:26 AM
mostly because Macs are no gaming machines today
Have you seen the recent presentation of the id Tech 5 engine? John Carmack presented his engine on a Macintosh. It's a very good platform for games, so it's only a question of market actually.
Remember that Playstation 3 means OpenGL.
So now you have 3 alternatives:
#1 - DirectX 9:
Xbox 360 + Windows XP/Vista
shader model 3.0
#2 - OpenGL:
PS3 + Windows XP/Vista + Macintosh (+ Linux)
shader model 3.0/4.0
#3 - DirectX 10:
Windows Vista
shader model 4.0

#2 gives you a similar market to #1. PS3 instead of Xbox - I think that's a disadvantage, but you get Macintosh as compensation.
But the thing is that #2 also gives you what #3 does.
So you either develop in both DX9 and DX10, or with OpenGL (with SM4.0 features used if available).
Now if we could only make ATI expose an OpenGL API on the Xbox, but I bet we can't. Microsoft has surely taken care of it (by means of technology and law).

knackered
06-23-2007, 07:03 AM
In the end it doesn't really matter which API you use, does it? It's currently a piece of cake to have a renderer support GL and D3D, and pretty necessary. Just as it doesn't matter which OS you use; the differences are trivial. Currently the only significant platform considerations are the shader language you use and the number of CPU cores, and how best to utilise them.

k_szczech
06-23-2007, 07:42 AM
In the end it doesn't really matter which API you use, does it?
From a programmer's point of view it doesn't matter. It's the platform that enforces a given API upon you. And it's the market that enforces the platform.

My choice today would be DX9, DX10 and OpenGL only for PS3. My choice after the release of Longs Peak / Mt. Evans would be: DX9 only for Xbox and OpenGL for all others.

Korval
06-23-2007, 09:27 AM
OpenGL will turn into a niche market (smaller than today) and I don't believe that alternative OSes are much of a factor in choosing to learn OpenGL in the future.
The problem with your argument is simply this:

Things for OpenGL are not going to get better by not radically redesigning the API. Minor alterations and removal of old functionality aren't going to entice anyone who is thinking of using D3D to use GL instead.

OpenGL was designed a long time ago. Many of the basic assumptions that were made in good faith back then just aren't valid today. Simply poking at it isn't going to improve it at this point. A redesign allows the ARB to create an API that is fast, easy to implement (fewer bugs; maybe ATi implementations will stop sucking ;) ), and easy to use.

Quite frankly, OpenGL's biggest problem isn't the API per se: it's its famous unreliability.

Game developers really want their game to work on ATi, nVidia, and Intel hardware. Unfortunately, only nVidia's drivers are actually good at OpenGL. The others are somewhere between buggy and horrible in terms of implementation quality. Meanwhile, their D3D implementations are all pretty damned good (though the Vista versions are still lacking). Which would you choose?

LP is promising because it simplifies implementations. It makes it easier for implementers to write their code by lessening the requirements on the implementation.

Jan
06-23-2007, 10:20 AM
k_szczech: You didn't read my post carefully. I was actually referring to exactly that presentation.

Macs are no gaming machines today in the sense that there is no gaming market for them. The hardware is ok, but not high-end. If we are lucky, Macs will be opened up in the near future, so that I can build my own PC but install OS X instead of Vista (with the gfx card I like).

But unless there are games that are produced for Macs, MacOS is not an option for gamers.

Jan.

elFarto
06-23-2007, 11:34 AM
Originally posted by Korval:
Quite frankly, OpenGL's biggest problem isn't the API per se: it's its famous unreliability.

Game developers really want their game to work on ATi, nVidia, and Intel hardware. Unfortunately, only nVidia's drivers are actually good at OpenGL. The others are somewhere between buggy and horrible in terms of implementation quality. Meanwhile, their D3D implementations are all pretty damned good (though the Vista versions are still lacking).
I think something that would help considerably in this regard is a reference implementation and/or conformance tests. A 'one true implementation' to compare programs/drivers to, so there is some sort of consistency in the OpenGL world.

An OpenGL32.dll reference impl. that I could drop into my program to see if it's my code or the driver screwing up would be great.

Regards
elFarto

ZbuffeR
06-23-2007, 01:02 PM
@elfarto: Mesa3D ... almost

k_szczech
06-23-2007, 01:53 PM
OpenGL really lacked conformance tests for versions 2.0 and 2.1. Neither the GeForce 7 nor the Radeon X1800 is OpenGL 2.0 compliant, but both report it.
I want my programs to talk to OpenGL server like this:
-Do you support this feature?
-Yes.
Not like this:
-Do you support this feature?
-Yes.
-Ok, let's see if it works...

We have to wait and see what requirements Longs Peak will put on implementations, but I'm hoping this issue will be addressed.
I know it's easy to ask for that and it's much more difficult to do something about it. The hardware is already on the market, and if it doesn't support FP16 filtering then what can we do about it? Remove it from the specs? Limit Longs Peak to DX10-class hardware only? Of course not.

Korval
06-23-2007, 07:15 PM
But unless there are games that are produced for Macs, MacOS is not an option for gamers.
You mean like World of Warcraft? Or the upcoming StarCraft 2?


I think something that would help considerably in this regard is a reference implementation and/or conformance tests.
How would that help? Do you think that ATi and Intel aren't aware of the miles of bugs that are in their drivers?

ATi isn't going to throw off their monthly driver release schedule simply because their GL implementation doesn't pass conformance. The best way to ensure that drivers are more reliable is to make it easier for developers to make them reliable.

tfpsly
06-23-2007, 07:32 PM
Originally posted by Jan:
Who would want to use D3D, if one can develop with OpenGL for EVERY platform, except the 360 (and maybe the Wii, I don't know)?
The Wii can be seen as just an overclocked GameCube. It has an API that looks like OpenGL's fixed pipeline, so many calls look the same after an s/gl/ngc/ regexp ;) The main differences I saw are in the multitexturing, where you set one operation for the whole set of TUs (like "TU0*(TU1*0.5)+TU2"), instead of one operation per TU.

On the other hand there's also the PS3 out there, which is a third area I don't want to care about. I don't know about their GPU API, but creating an engine for the PS3 will require quite a lot of specific coding on the CPU (PPE + several SPEs, very slow memory meaning nearly no use of virtual functions...) :)

V-man
06-24-2007, 01:04 AM
Originally posted by Jan:
Longs Peak does not even force you to do the API switch in one go. You can just create a new context, and start using it, while still using your old code, so you can slowly adapt to it.
If you make a Longs Peak context, then the old GL functions are not available.
If you make a GL 2.1 context, then LP functions are not available.

Jan
06-24-2007, 02:52 AM
V-Man: From how I understood the previous newsletters, you can create an additional LP context in your existing GL app and then you can use both. Both contexts are separated; there will only be a few functions that accept objects from the other context, for example you could attach a texture from the old context in an RTT operation that you do with the new context. So you can use the new functions for some effects, but use the result in your existing engine.
At least that is how I understood it. Whether that will actually work as expected is a completely different question.

Jan.

Brolingstanz
06-24-2007, 07:08 AM
Yup, that's my take on it too. Should make the transition less painful for large projects.

But personally, I'm going all in. Clean sweep, baby! Wahoooo!

Nothing beats a fresh start :-)

knackered
06-24-2007, 11:23 AM
From what I understood you can create a legacy GL context within LP, which would then emulate the legacy GL using the LP API. It wouldn't make much sense the other way round.

tamlin
06-24-2007, 12:34 PM
While I haven't read all of the newsletter yet, I have some comments on some stuff I encountered.

Under "Buffer Object Improvements" the word "may" is used way too much for my liking - perhaps especially under "Non-serialized access:".
The word "may" unfortunately have completely different meaning in english vs. legalese. OpenGL is a software contract (which would be legalese-ish), but the words here are english. I'd prefer if a more formal, perhaps RFC-like, language was used for those parts - to keep the language english, but keep the function specifications more formal and unambigous.

If the specification for any function can't be, well, specific, it should be reworded or pulled. If the intent is clear, and it can be reworded to remove any shadow of doubt, it should be done - but only after that's done should it be reconsidered. For now, I consider this part void.


I find the "const" keyword nowehere. Not that I think ARB missed this vital C language construct - I just wanted to point it out, as I've seen larger and more widespread libraries mess this up (f.ex. MS for the longest time thought, and still thinks (!), it should have write-access to your in-system-memory indices, if one is to trust their API).


Another thing about buffers: partial invalidation. For this to be efficient, it'd have to be aligned not only to the source platform's physical page size (or greater - in the case of Win32 you map on at least 64KB boundaries, whether or not you have 4KB pages), but also to the destination platform's alignment requirements. Will LP provide functions, or enums, to query alignment requirements/enforcement?
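For what it's worth, the application-side arithmetic is trivial once the alignment is known; a sketch assuming a queried or hard-coded value (the 4096 and the flushRange() helper are made up):

/* Hypothetical: widen a dirty byte range outward to an alignment boundary
   before flushing/invalidating it. */
size_t align = 4096;                                   /* e.g. page size     */
size_t begin = dirtyOffset & ~(align - 1);             /* round start down   */
size_t end   = (dirtyOffset + dirtySize + align - 1)
               & ~(align - 1);                         /* round end up       */
flushRange(begin, end - begin);                        /* placeholder call   */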


If one idea is to have (the ability to have) everything in objects, and to be able to hand only object handles to the functions, wouldn't it make sense to have an array [a 2-dimensional one] with every row holding a number of object handles (including vertex and index handles), and then have a call to "just draw this batch" that could eventually be evaluated entirely on the GPU?

I'm thinking of something like:
name = CreateArrayName();
AddArrayType(name, VERTEX3);
AddArrayType(name, NORMAL);
...
Create, either in system memory or in a buffer on the GPU, an array of [N][M], handing it off to the GPU, and it can then switch programs/states/textures/<whatever> as efficiently as it possibly can - especially if it can run many of these tasks in parallel (and let's face it, we can't predict the future, but we can look at history to say parallelism will grow, as current sequential speeds can't be strained much more).

Initially I'd expect such a thing to run mostly on the CPU, and mostly sequentially, but while we're designing a new API anyway, why not think of what may be possible or even the norm tomorrow.

The last part was just an idea, but I think it may have use and performance improving impact.

knackered
06-24-2007, 01:14 PM
Originally posted by tamlin:
I find the "const" keyword nowehere. Not that I think ARB missed this vital C language construct - I just wanted to point it out, as I've seen larger and more widespread libraries mess this up (f.ex. MS for the longest time thought, and still thinks (!), it should have write-access to your in-system-memory indices, if one is to trust their API)."const" is not a keyword in C. It is a C++ type modifier.


Originally posted by tamlin:
Another thing about buffers: partial invalidation. For this to be efficient, it'd have to be aligned not only to the source platform's physical page size (or greater - in the case of Win32 you map on at least 64KB boundaries, whether or not you have 4KB pages), but also to the destination platform's alignment requirements. Will LP provide functions, or enums, to query alignment requirements/enforcement?
No need, the implementation can simply do the alignment itself and copy more than was flagged modified if necessary. It's an implementation detail that should be abstracted away from the user.


Originally posted by tamlin:
If one idea is to have (the ability to have) everything in objects, and to be able to hand only object handles to the functions, wouldn't it make sense to have an array [a 2-dimensional one] with every row holding a number of object handles (including vertex and index handles), and then have a call to "just draw this batch" that could eventually be evaluated entirely on the GPU?
You mean like a display list object? They're working on it.

Overmind
06-24-2007, 01:43 PM
"const" is not a keyword in C. It is a C++ type modifier."const" was not a keyword in the original K&R C, but it is in the ANSI C89 standard. I think it's safe to assume all compilers adhere to a 18 year old standard, especially for such a widely used feature...


Under "Buffer Object Improvements" the word "may" is used way too much for my likingThe newsletter is not the spec. I'm sure in the spec they will formulate everything as unambiguous as possible ;)

tamlin
06-24-2007, 01:43 PM
Originally posted by knackered:
"const" is not a keyword in C. It is a C++ type modifier.
Rubbish, and it has been for at least 18 years.


[About buffer alignment] No need, the implementation can simply do the alignment itself and copy more than was flagged modified if necessary. It's an implementation detail that should be abstracted away from the user.
See, there the efficiency took a real hit. This area was for the really hard-core, down-to-the-metal uses, from what I read. If my reading was correct, this was for the ones willing and able to go low, really low-level. As such, I'd expect both host CPU, bus, and target CPU and GPU alignment requirements to be able to be met without implementation intervention (which would by necessity decrease performance).

As noted, if you want to upload a subset and don't care about alignment, BufferSub* is already there.

Perhaps I read too much into the performance thinking? Perhaps I didn't. Let's leave that for a comment from the ARB.

<snip>

You mean like a display list object?
I couldn't have said it better myself (as is obvious! :) ). Yes, almost exactly like a display list of objects (though with a constant number of objects per list entry - to allow for simplified array traversal).

They're working on it.
Excellent!

tamlin
06-24-2007, 02:05 PM
Originally posted by Overmind:
The newsletter is not the spec.
I'd hope all of us involved think that's obvious. :)

Still, as the newsletter about this area did leave so much to interpretation due to this seemingly innocent three-letter word, I wanted it to be known to the ARB too.

History has (or should have) taught us that ambiguous wording has created incompatible implementations. I'd rather flag non-problems at the design stage than have to file bugs after implementation.

Rob Barris
06-24-2007, 04:05 PM
Tamlin, can you construct a hypothetical situation where two implementations might in fact be incompatible, where a correctly written program generates correct results on one but not the other?

I can see how the latitude expressed in the article leaves freedom to the LP implementor on a number of levels, but IMO that can lead to variance in levels of performance, not in correctness. If there's a specific issue that's been missed so far, let's examine it in more detail here.

edit - in case it wasn't clear I'm asking about the "mays" in the description of the new buffer object functionality, with respect to non-serialized access (or any other usage).

knackered
06-24-2007, 05:01 PM
Well I never, const is a C keyword.

tamlin
06-24-2007, 06:08 PM
Rob,

With current wording, I can't, as it leaves too much room for interpretation. Not implementation freedom, but interpretation. Let me elaborate on what I especially opposed - the wording for non-serialized access:

"When this option is engaged, lpMapBuffer may not block if there is pending drawing activity on the buffer of interest".

This can be read as "shall not block" or "will not block" (is forbidden to), "is not intended to block, but is allowed to" or "usually blocks, but is allowed to not block". Any of these behaviours are, AFAICT, valid interpretations both from an implementor's and a user's POV.

If I now write a program with realtime demands (in this area) that expects the "will not block" behaviour, but the implementation interpreted "I'm allowed to block", that difference in interpretation of "may" can and/or will break my programs expected behaviour.


"Access may be granted without consideration for any such concurrent activity".

Again, "may" can mean "will" or "shall", "is allowed to" or "is allowed to not" grant access.

In any case it's so vague it basically reads "You can't depend on behaviour". The result of that would be (to me) it's so "shaky" one should really stay away from it. Quite the opposite of what I expect the ARB's intentions are with committing time to designing it.

Anyway, this was, as previously noted, not an API spec but an article, and I think I may (pun intended) have pushed this too far already. I expect the specification to be unambiguous, and hope we get a chance to look at the final API draft before it's carved in stone.

Rob Barris
06-24-2007, 09:30 PM
A few things to keep in mind here -

a) you are absolutely right, that sentence in the article could have been written a lot better. Instead of saying "When this option is engaged, lpMapBuffer may not block if there is pending drawing activity on the buffer of interest" - an improved phrasing would be "This option can eliminate the need for lpMapBuffer to block, if activity is pending on the buffer". Note that an implementation may have any number of private reasons to block on this call, that's not something that can be legislated away by the spec.

b) the spec makes no performance or timing guarantees. It is intended to specify behaviors and outcomes for correctly written apps. For this reason, discovering that some implementations run slower than others (for whatever reason, possibly including blocking when you don't want it) doesn't indicate a nonconforming implementation or bug - it is what it is, some implementations will be more aggressive than others. The flexibility in the language allows for that range of aggressiveness.

c) simply put, some drivers may not implement non-serialized access, and for some workloads that ask for it, they will not run as fast - this is not a violation of the spec or a conformance failure - the apps will still run and generate correct results. The only kind of app that will generate different results are those that are not correctly scheduling/fencing their accesses in conjunction with the unserialized option, and that's an app bug.

d) I don't personally believe that high performing OpenGL apps are successfully written or delivered without some level of testing on the target configurations. That testing process should highlight any performance hot spots or issues. If you find that your app benefits greatly from non-serialized access on one vendor's GL but suffers on another that is blocking more often, then you have every right to have a conversation with that vendor about the performance issues you are running into and what your options are. IMO this is not much different than using VBO or VAR today, there is a spectrum of implementations out there with varying performance characteristics.

A key issue here is that not every vendor has the same set of constraints or audience of developers to work with - and not all vendors will approach the task of implementing LP with the same level of aggressiveness w.r.t. performance. So there was a choice, to require true non blocking behavior on all conforming implementations, or to provide flexibility in implementation whereby an implementor could choose how far to go with it and still conform to spec.

It would be nice if something like the OpenGL spec could offer performance guarantees but at present this is not the case. The intent here was not to stick with a lowest-common-denominator approach (for example, not having the option at all), but to provide more performance headroom for aggressive implementations.

I'd also point out that the strict write-only, explicit flush, and invalidate-range options - used independently of non-serialized access - also open up a range of usages that wasn't possible before and in a pretty efficient way. So if the correctness and testing cost of developing code using the unserialized-access option is too high to bear, it may make perfect sense for an author to avoid it.

Michael Gold
06-24-2007, 09:33 PM
tamlin,

You are correct that the word "may" needs to be used carefully. In fact this very issue came up last week during an internal spec review.

We'll try our best to get the wording right in the final spec. If an ambiguity slips through, it will be neither the first time nor the last. Don't despair; spec bugs can be fixed.

With respect to alignment - please consider that this functionality has been reviewed in great detail by individuals from a variety of companies who are familiar with the capabilities of their respective hardware. Let us worry about making our implementations efficient. More interesting feedback would be whether the described behavior is useful and complete.

Brolingstanz
06-25-2007, 12:04 AM
I think feedback on API design is complicated by the fact that other than plotting colored pixels on the screen, the rest is mostly about efficiency, and few have the interest, expertise and insider's knowledge to make a holistic assessment of what's required or even makes sense within the scope of LP.

On that note, one thing I'd be interested in hearing about is whether there will be an API to validate shader inputs (VAOs) against a particular vertex shader in advance, analogous to the InputLayout and VS signature pairing in d3d10, or if that's not really a serious performance concern in the current design for LP/ME. Not that I particularly care one way or the other, it's just that I and others may have to factor abstractions around this sort of detail.

Jan
06-25-2007, 01:25 AM
I understand that the spec describes functionality and not performance. However, the feature of non-serialized access is all about performance. No one intends to use it just for fun, but only as a low-level, down-and-dirty way to squeeze the last bit of speed out of the GPU.

As such, it just does not make sense to allow the driver to implement it no better - or even worse - than the other options. However, of course a spec cannot force a driver writer to optimize some feature, especially not if it is a - possibly - rarely used feature.

So my suggestion is this: the app should be able to query the driver whether this feature is "good". And how to do that? Well, why not put this into an extension? The driver already needs to implement all the other ways to handle arrays, so why not make this one optional? If the extension is supported, one can expect that it is at least as fast as all the others, and usually better. If it is not supported, just use the default path.

This would remove the burden on the application writer to test several graphics cards from different vendors and then hardcode "if it's NV, disable it; if it's ATI, enable it, except for the mobile GPUs, ...".

I don't want features in the core API, that will not be supported well on all hardware (again).

Another idea: I'd like to be able to query in more detail what hardware my app is running on.
For example:
Vendor: NV / ATI / INTEL (unique, not changing with every driver release!)

GPU Architecture: Geforce 6 / 7 / 8 ... (the basic architecture, not detailed)

GPU Model: Geforce 8600 GTS SSE2 3DNow! .... (the stuff that is in there today, usually)

Hardware acceleration: true / false

GPU Memory: x MB (yes i know those discussions...)

Driver Name: Forceware

Driver Version: 1.2.3 (only a number, no text in here)

This way, IF anyone would ever want to use a feature based on the hardware one is running on, it will make our lives much easier to distinguish between them.
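For illustration, here is a rough sketch of how such queries might look - every token and entry point here is invented, nothing of the sort exists in the LP proposal, it just shows the shape of what I mean:

const char *vendor = lpGetString(LP_VENDOR_ID);        // "NV", "ATI", "INTEL", ...
const char *arch   = lpGetString(LP_GPU_ARCHITECTURE); // "GeForce 6", "GeForce 8", ...
LPboolean hwAccel  = lpGetBoolean(LP_HARDWARE_ACCELERATED);
LPint vramMB;
lpGetIntegerv(LP_GPU_MEMORY_MB, &vramMB);
LPint driverVersion[3];
lpGetIntegerv(LP_DRIVER_VERSION, driverVersion);       // e.g. {1, 2, 3}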

Jan.

Komat
06-25-2007, 01:51 AM
One thing I consider important is the ability to retrieve/set objects which can take a long time to generate (most notably compiled shaders) as some driver-dependent blob, so the application can store them on disk and avoid the compilation cost during the next run unless the hw or driver changes (in which case the driver will reject the blob and the application will regenerate the object in the ordinary way).

Is something like that planned for LP?
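To make the idea concrete, a hypothetical sketch - none of these entry points exist in the LP proposal, they only illustrate the retrieve/set pattern I mean:

LPsizeiptr size;
lpGetProgramBinarySize(program, &size);      // hypothetical
void *blob = malloc(size);
lpGetProgramBinary(program, size, blob);     // hypothetical; the app writes the blob to disk

// on the next run:
if (!lpProgramBinary(program, size, blob))   // hypothetical; fails if the hw or driver changed
    compileFromSourceAsUsual();              // fall back to the ordinary compile path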

Michael Gold
06-25-2007, 08:49 AM
We are discussing solutions to the problem you describe. It's unlikely to be solved for Longs Peak because of the schedule pressure, but this first release is just the beginning. :D

Korval
06-25-2007, 11:55 AM
It's unlikely to be solved for Longs Peak because of the schedule pressureAhh, what a perfect segue into matters of scheduling.

Like, when will we see LP released? Are you guys still on-track for a summer release (presumably in time for SIGGRAPH), or is it being pushed back into September?

Also, is there any indication from IHVs how long it will take to start seeing beta implementations (I have no faith that initial implementations will be anything more than beta quality) of LP?

Michael Gold
06-25-2007, 01:25 PM
For the answers to these and other questions, please come to the OpenGL BoF at SIGGRAPH. :)

elFarto
06-25-2007, 01:33 PM
Originally posted by Michael Gold:
For the answers to these and other questions, please come to the OpenGL BoF at SIGGRAPH. :) ...or wait for the presentations to be made available :D


Originally posted by Korval:
...when will we see LP released?I'm taking 2 to 1 odds on them releasing it at SIGGRAPH, 5 to 1 odds that NVIDIA will have an implementation, and 10 to 1 on ATI having an implementation, but maybe I'm just dreaming... :D

Regards
elFarto

tamlin
06-25-2007, 02:47 PM
(could someone lend me their skills to keep posts short? :) )

Rob, while your b) is true, blocking vs. non-blocking can be make-or-break depending on whether user and implementation agree on interpretation. This particular feature is a performance promise, iff the user interprets the wording as "shall not block". In that case the user "knows" that s/he can write data at around bus speed (whether the bus is local PCIe or Token Ring :) ).

Michael, thank you for ACKing the "may" issue.

As for usability - being able to map/flush sub-ranges of a buffer is something I consider potentially very useful (from a performance POV, obviously), instead of having to deal with multiple buffers.

I might want to request consideration for the feature others have requested - rebased indices, so that one could use f.ex. one index buffer, and one vertex buffer containing many "frames" of some (on-CPU calculated) animation where at time-of-use one could say "index 0 refers to vertex[4711], normals[5472], colors[0]". I don't know how useful it'd be in reality, but I can indeed see uses for it. A precomputed "path" for a cannon-tower on a tank turning. A "walking" or "running" sequence for a character...

However (didn't you expect it :) ), for sub-mapping/flushing to be truly useful I think I'd have to *know* about the alignment restrictions, given the article states "This option allows an application to assume complete responsibility for scheduling buffer accesses". The only piece of software that can tell me about (optimal) buffer alignment is the implementation.

Consider the following case if I didn't know about mapping alignment requirements:

I create a large buffer. Let's say I only use it for geometry data. I write a batch of vertices that ends in the middle of a "page" (*). I "flush" (perhaps even unmap is still a requirement?) this range to tell the implementation "you go ahead, I'm done with this range", and issue a draw call. I then merrily continue to fill the buffer starting just after my previous end position (mapping that range first, if required) - that starts in the middle of the last "page" of the previously "flushed" area.

If this buffer (memory area) is truly "mapped" over a bus (PCI-ish), it means that either the implementation needs to take a private copy of this last page and place it somewhere else in the card's memory (performance hit, not to mention requirement to fiddle with the GPU for this non-sequential "jumping around" in physical on-card memory when reading what should be sequential memory), or it needs to map the whole last "page" of the previous batch as writable again into my process' address space - thereby giving me write-access to the data I already told it "I'm done with this" and allowing me to potentially interleave (bad) writes to an area the GPU is busy reading.

An even worse scenario would be something like:
- batch 1 writing "pages" 0-1.5
- batch 2 writing "pages" 3.5-5
- batch 3 writing "pages" 1.5-3.5
as it could require both "page" 1 and 3 be mapped for the third batch, while at the same time the GPU will be reading them (both).

I suspect this is the "room for programs to screw up" I read between the lines, but I think it can be improved to prevent this - while still providing maximum possible speed - by the simple addition of the following:

Had I on the other hand been able to query the implementation about alignment requirements, I could "line up" my next write to the next "page" boundary and start writing in a fresh "page".
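In code, the kind of thing I mean (LP_MAP_ALIGNMENT is a purely hypothetical query; nothing like it exists today):

LPint align;
lpGetIntegerv(LP_MAP_ALIGNMENT, &align);                             // hypothetical
LPsizeiptr next = (prevEnd + align - 1) & ~(LPsizeiptr)(align - 1);  // round up to the next "page" boundary
// start filling the next batch at 'next' instead of directly at prevEnd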

I therefore consider this alignment information vital for "proper" (as-fast-as-possible, which seems to be the stated goal) use of this feature, so that it can be used without creating either full or partial stalls at any level. That is, assuming I haven't misunderstood something in this short analysis.


As for the problem of not being able to save a binary blob of a compiled program; why not simply reserve some space towards the beginning of the blob for the implementation to play with, say 8 or 16 bytes (heck, save it like a PASCAL string, prepending the info required to verify compatibility with a byte telling how large the "private" data is), where it can save e.g. the PCI ID and/or driver version? Such a small change shouldn't take more than a few minutes to implement (for each vendor), it would allow freedom of implementation (256 bytes is likely more than enough to verify compatibility), and it would be a user-mode-side thing only with no need to send this verification data over the bus. Compare it to prepending a TCP packet with an IP header if you like (even though this header could, if you really need it, be variable size). That way vendors can verify compatibility of the current hardware with the pre-compiled blob and simply report success/failure, and in case of failure I need to recompile the program. It seems so easy that I'm starting to fear I'm missing something obvious. Am I?
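Just to sketch the layout I have in mind (field sizes picked arbitrarily; nothing here is proposed API, the app only round-trips the bytes and never interprets them):

typedef struct {
    unsigned char privateSize;       // how large the vendor-private area actually is
    unsigned char privateData[255];  // e.g. PCI ID, driver version - opaque to the app
    // ... the compiled program data follows ...
} ProgramBlobHeader;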


(*) I used the word "page" in the loosest sense. For an implementation on NT-based systems this alignment would be a "section" (64KB alignment), for a GLX implementation it'd likely be a host page. For e.g. Linux or FreeBSD with local h/w, I haven't got a clue what they use as alignment for mapping h/w to virtual memory. :)


Oh, I almost forgot:
"We'll try our best to get the wording right in the final spec. If an ambiguity slips through, it will be neither the first time nor the last."

I know, that's one of the reasons I brought it up. :) Will we get a chance to have a look at a "release candidate" of the spec before it's carved in stone? More eyeballs and such...

Korval
06-25-2007, 03:35 PM
For the answers to these and other questions, please come to the OpenGL BoF at SIGGRAPH.Why not have someone set up a microphone and record the presentation, compress it in an MP3, and put it online for people to download?

Korval
06-25-2007, 03:52 PM
rebased indices, so that one could use f.ex. one index buffer, and one vertex buffer containing many "frames" of some (on-CPU calculated) animation where at time-of-use one could say "index 0 refers to vertex[4711], normals[5472], colors[0]".That's not what they asked for. What was requested was a parameter to the "lpDraw*" functions that takes an integer offset to be applied to all indices before indexing into the various arrays.
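In other words, something along the lines of the following (signature adapted from the newsletter's lpDrawElements; the trailing parameter is the hypothetical addition):

void lpDrawElements(LPenum mode, LPsizei *count,
                    LPsizeiptr *indices,
                    LPsizei primCount,
                    LPsizei instanceCount,
                    LPint baseVertex);   // added to every index before the attribute fetch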


while at the same time the GPU will be reading them (both).If you map memory for writing that the GPU is reading from, you incur a stall (unless you map it using the all-purpose Get-out-of-jail-free card of "non-serialized access"). That's what mapping means.

Now, because the writing range you specify is in bytes, not pages, all the GPU needs to worry about is whether or not the bytes you're mapping match bytes that it has been told to read from. So even if Batch 3 is writing to page 2 after Batch 1 started reading from it, the GPU doesn't need to worry unless the address range for Batch 3 is actually in the middle of Batch 1.

That is, you never need to know about pages; that's the responsibility of the implementation. The only thing you need to make sure of is that you never write outside the bounds that you mapped.
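To put that in code, a sketch of the usage pattern (lpMapBufferRange and the flag names are my guesses at the entry points the article describes):

void *ptr = lpMapBufferRange(buf, batchOffset, batchSize,
                             LP_MAP_WRITE_BIT | LP_MAP_FLUSH_EXPLICIT_BIT);
memcpy(ptr, batchData, batchSize);                      // stay strictly inside the mapped byte range
lpFlushMappedBufferRange(buf, batchOffset, batchSize);  // tell the GL exactly which bytes changed
lpUnmapBuffer(buf);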

Komat
06-25-2007, 03:53 PM
Originally posted by tamlin:

As for the problem of not being able to save a binary blob of a compiled program;

It seems so easy that I'm starting to fear I'm missing something obvious. Am I?
I do not think that there is a technical problem with implementing such functionality. I think it is more about the need to decide the best way to integrate the retrieve/set mechanism into the API, how it will interact with the rest of the API in various situations, which objects and what parts of them are blobbable, and so on. This needs to be well thought out because it might be part of the API for a long time.

Brolingstanz
06-25-2007, 03:58 PM
Why not have someone set up a microphone and record the presentation, compress it in an MP3, and put it online for people to download?I second that. Or go live!

BTW, what happened with the pod-cast poll?

MZ
06-25-2007, 04:18 PM
Originally posted by knackered:
well it looks like they're just like d3d vertex declarations...
http://msdn2.microsoft.com/en-us/library/bb206335.aspx Yes, provided that the VAO also has some sort of equivalent of D3D Vertex Streams. It's an essential part. The newsletter doesn't give any hints about it, unforunately.

Originally posted by Jon Leech (oddhack):

The only thing, that comes to mind right now, is that drawcalls do not include an "offset" parameter for the indices (offset added to each index, not the thing the "first" parameter is used for).As currently defined, you specify an offset when attaching a buffer to the VAO (IOTW, it is a mutable VAO attribute). I didn't have room to go very deeply into the individual object attributes and behaviors in that article.

What kind of entities serve as the buffer attachment points in a VAO?

In particular, are they more like array names in today's OGL?
Or are they like Vertex Streams in D3D9 Vertex Declaration?

-----

Explanation for those readers who are less familiar with The Dark Side:

The "Vertex Stream" in D3D is a fancy name for a simple thing: a subset of active vertex attributes. When you are defining a Vertex Declaration in D3D, each listed vertex attribute is being assigned a number. Those attribs to which you have given the same number, together comprise a Vertex Stream. Note that in the most common usage, you only have one stream per Vertex Declaraion, which is equivalent of using single interleaved array in GL. Of course, multiple Vertex Streams do have their uses, just like non-interleaved arrays in GL.

In D3D9, you bind a Vertex Buffer to a Vertex Stream (of the currently bound Vertex Declaration). Also, an offset is provided by the user to indicate where in the buffer the stream data starts. Note that with that functionality, another D3D9 feature - the infamous "base index" - is redundant, because instead of changing the base index you could just re-bind the VB at a different offset. It's a trade-off of one API call for one API call - unlike in GL, where we'd have to re-bind each attribute in a separate call.
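For the GL folks, a minimal D3D9 sketch of what that looks like - two streams, with stream 0 re-basable at an offset in a single call (the device and the buffer pointers pVBGeom/pVBTex are assumed to exist already):

D3DVERTEXELEMENT9 elems[] = {
    { 0,  0, D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0 },
    { 0, 12, D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_NORMAL,   0 },
    { 1,  0, D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 0 },
    D3DDECL_END()
};
IDirect3DVertexDeclaration9 *decl = NULL;
device->CreateVertexDeclaration(elems, &decl);        // 'device' is the IDirect3DDevice9*
device->SetVertexDeclaration(decl);
device->SetStreamSource(0, pVBGeom, baseOffset, 24);  // stream 0: position+normal, re-based in one call
device->SetStreamSource(1, pVBTex,  0, 8);            // stream 1: texcoords, left alone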

Also, D3D9 instancing works by assigning "frequencies" or "dividers" to Vertex Streams. This is an example of case where it is necessary to have more than one stream.

In D3D10, they did change some related terminology, but the Vertex Stream is still there.

I personally think the concept of the Vertex Stream makes a hell of a lot of sense. Many things can be said about D3D, but in this part, I think, they just got it right. If the new VAO doesn't get an equivalent of D3D Vertex Streams, I predict the resurrection of "I wanna base-index extension" threads...

Korval
06-25-2007, 05:04 PM
Note that with that functionality, another D3D9 feature - the infamous "base index", is redundant, because instead of changing the base index you could just re-bind the VB at different offset.No, that's not how it works. If the impetus for the index offset feature was simply the number of API calls, it wouldn't be an issue. It's not the API for offsetting a buffer-bind point; it's the internal stuff that the implementation needs to do to make it work.

The general argument is that, whenever you change an attribute pointer, you need to do some validation work to make sure that the pointer, stride, etc work. And presumably, this work is non-trivial. Thus the purpose of the offset is to avoid rebinding buffers.

Vertex streams don't change this.


Also, D3D9 instancing works by assigning "frequencies" or "dividers" to Vertex Streams. This is an example of case where it is necessary to have more than one stream.A feature that is effectively dead. Nowadays, particularly with Longs Peak, the expectation is that you will use the instancing feature of the API. It will pass a number, 0 through n-1, where n is the number of instances, to your vertex shader. From there, you will figure out what you need to do.

It's up to the implementation to decide how to make this work. If it can do D3D9-style instancing, then it will create a buffer with numbers 0-n-1, and the shader compiler will turn the built-in variable quietly into an attribute. If it works like D3D10 instancing, then it will natively handle this case. Otherwise, it will simply issue multiple draw calls, changing the uniform under the hood for each call in the most efficient way the implementation can.
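For instance, the "multiple draw calls" fallback could look roughly like this (lpUniform1i stands in for whatever the real uniform-update call turns out to be; nothing here is actual LP API):

for (LPsizei i = 0; i < instanceCount; ++i) {
    lpUniform1i(program, instanceIndexLocation, i);  // hypothetical; feeds the built-in the shader reads
    lpDrawElements(mode, &count, &indices, 1, 1);    // one instance per call
}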


I personally think the concept of Vertex Stream makes hell a lot of sense.In what way?

It is a perfectly meaningless concept. It gives no actual benefit, except for the kind of older hardware that actually had vertex streams. It's exactly the kind of thing an abstraction API should abstract.

I can almost guarantee that LP will not expose streams to the user. From the user's perspective, they're just a layer of bureaucracy that provides no actual benefit that cannot be achieved in some other, more effective, way.

tamlin
06-25-2007, 05:06 PM
Korval,

About rebasing indices; OK, I stand corrected. Then let *me* add this request now. :)

While it seems vertices and normals indeed should go hand-in-hand (and therefore it could be seen as reasonable that rebasing indices for one rebases the other pointer too), I see no reason for this to be true for either texture coordinates or colors, just to name two. Quite the opposite: I see much reason to be able to keep e.g. the texcoords or the vertex colors the same over a surface while deforming the surface.

Just think about (a precomputed animation) deforming any surface. The vertices move, and therefore the normals change. But is it equally obvious that the texture coords change, or the colors change? I don't think so. And in the name of consistency - as I've only singled out two of four attributes here - would there be any harm in a design where all attribute "pointers" can be rebased? What about user-specified arrays (objects)?

Imagine a 100 frame animation with 10000 vertices (I just pulled those numbers), where you for simplicity had saved 1M verts and 1M normals in two LP array objects. What if I then had 1 color, a few texture-coords, and 3 other arrays for each vertex, but they could remain static for the whole animation.

What's the point in forcing allocation and upload of 100 identical copies into 100 times as large buffers, when a single copy in each buffer [EDIT: was "a single buffer"] would suffice - if I only could rebase indices per-array?

Imagine 1 color (4 bytes), 3 texture coords (36 bytes) and let's say 3 user arrays of 3 floats each (36 bytes) for each vertex. That's 72 bytes/vertex. Now compare 72*10000 = 720k, vs 72*1e6=72 million bytes (not counting overhead). Let's round off and say we compare 720KB vs 72MB.

Sure, the vertex coords+normals would require 24MB, but why add 72MB on top of that if 720KB could suffice?

Again, I don't know. Maybe I'm just dreaming up scenarios noone would ever use. Then again, maybe someone would...


Komat, while I think I understand your concern, isn't there (going to be) just a single entry point to upload an already compiled program blob (of a specific type?), and isn't that entry point going to return success/failure?

If there were/are also other ways to upload already compiled programs I too would be wary, but are there?

I agree it needs to be well designed, as it would stay with us for (hopefully) 25+ years (just as OpenGL 1.0 can still be used, but perhaps 1.1 is the display of "1.0 was wrong, we didn't think it through enough" - just to prove the point).

psyduck
06-25-2007, 05:26 PM
According to Pipeline Newsletter 4, Image data will be defined using lpImageData[123]D, and that's a very bad idea IMHO.

I propose using a single lpImageData signature. Since the image dimension is part of the Image Format object, it is redundant to specify the dimension again. By changing offset, width, height, depth to LPint* offsets and LPint* sizes, we can have one function only.

Another point: I added an 'index' parameter to specify which cubemap face (or array element) we are dealing with. I believe this would be somewhat similar to the 'target' parameter in OpenGL 2.1.

This is the result:



void lpImageData( LPimage image,
LPint index, // An integer, or: CUBE_POS_X, CUBE_NEG_X, CUBE_POS_Y, ...
LPint miplevel,
LPint *offsets,
LPint *sizes,
LPenum format,
LPenum type,
void* data )

It's important to note that 'index' would be zero in most cases (except for cubemaps and arrays).

This would be cooler, more elegant, generic, lean & mean, KISS, whatever, IMHO.

If you guys find it absolutely necessary to specify a dimension on ImageData calls, I believe it's a better idea to add a 'dimension' parameter instead of providing 3 (or more?) different functions:



void lpImageData( LPimage image,
LPenum dimension, // 1D, 2D, 3D, ... could be an integer instead. It's another option.
LPint index,
LPint miplevel,
LPint *offsets,
LPint *sizes,
LPenum format,
LPenum type,
void* data )

Well, this is not really cool but would do the trick.

Gimme feedback please.

Best regards,
Daniel

tamlin
06-25-2007, 05:28 PM
Michael Gold asked about API completeness.

One thing I just came to think of... Now, I haven't really thought this through, so it may be irrelevant, but is there a (planned) facility to ask the implementation whether a specific range in a mapped (but flushed) buffer is completed (used and no longer needed by the implementation)? Also, is there a way to wait for it to be completed?

For comparison, as most here are familiar with the Win32 API, think of it like TryEnterCriticalSection and EnterCriticalSection, except that this query would take a buffer name, an offset and a size.
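Something like this - purely hypothetical entry points, just to show the shape of the idea:

LPboolean lpTestBufferRange(LPbuffer buffer, LPintptr offset, LPsizeiptr size);  // returns immediately, like TryEnterCriticalSection
void      lpWaitBufferRange(LPbuffer buffer, LPintptr offset, LPsizeiptr size);  // blocks until the range is no longer in use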

An added bonus using this approach (buffer name + stuff) as opposed to virtual-address+size, is that one wouldn't have to apply 64-bit pointers to the API.

Consider it a brainstorming idea.

EDIT: This would only be defined behaviour if querying/waiting from a single thread. Multiple threads waiting for such a resource would invoke undefined behaviour.

Korval
06-25-2007, 06:34 PM
What if I then had 1 color, a few texture-coords, and 3 other arrays for each vertex, but they could remain static for the whole animation.So what if you did?

LP doesn't care one way or another.

Michael said you can change the offsets in a live VAO for any particular bound array. Isn't that enough? I mean, I assume that being able to alter the offset means that it will be reasonably performant, so I don't see what the problem is.

And if you need to actually change one of the buffer objects, then make a new VAO. They're small, light-weight, and it is expected that an application will be creating thousands of them.


According to Pipeline Newsletter 4, Image data will be defined using lpImageData[123]D, and that's a very bad idea IMHO.IMNSHO, it's a much worse idea to cross-post.

tamlin
06-25-2007, 06:52 PM
Korval wrote:
Michael said you can change the offsets in a live VAO for any particular bound array.Oki. I must have missed that (perhaps it wasn't in this thread). As it provides the functionality, I'm cool with that.

Rob Barris
06-25-2007, 09:23 PM
Originally posted by tamlin:
Michael Gold asked about API completeness.

One thing I just came to think of... Now, I haven't really thought this through, why it may be irellevant, but is there a (planned) facility to ask the implementation if a specific range in a mapped (but flushed) buffer is completed (used and won't be needed anymore by the implementation). Also, is there a way to wait for it to be completed.
Not in the buffer object API; but just as you can use fences under GL2.x to sort out these kinds of issues, you can use sync objects under LP to meet the same goal.
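The GL2.x pattern, for reference (shown with NV_fence; APPLE_fence is analogous, and an LP sync object would serve the same role):

GLuint fence;
glGenFencesNV(1, &fence);
// ... issue the draw calls that read from the flushed range ...
glSetFenceNV(fence, GL_ALL_COMPLETED_NV);
// ... later, before rewriting that same range:
if (!glTestFenceNV(fence))       // still pending?
    glFinishFenceNV(fence);      // block until the GPU is done with it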

Can you describe some usage patterns you would be likely to employ in real world code?

Jan
06-26-2007, 08:36 AM
After reading the newsletter again, I stumbled upon this piece:



When an image is bound to an FBO attachment, the format object used to create the image and the format object associated with the attachment point must be the same format object or validation fails. This somewhat draconian constraint greatly simplifies and speeds validation.
Well, I can live with the fact that the format object needs to be the same. However, I think this should be an implementation detail that is handled behind the scenes, not something a developer needs to worry about.

If I need to pass the same format object when I set up the FBO and the texture to render to, this makes my code much more complicated. In the end I will simply write some layer that handles format-object creation: for each format object to be created it calculates a hash value and checks whether such a format object has already been created. If so, it returns the same handle (thus the same object).
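Roughly like this (lpCreateFormat, the LP-style type names, and the key I hash on are placeholders for whatever the real format-object creation API takes):

typedef struct { LPuint key; LPformat handle; } FormatCacheEntry;  // key = hash of the format description
static FormatCacheEntry cache[64];
static int cacheCount = 0;

LPformat getSharedFormat(LPuint key)
{
    for (int i = 0; i < cacheCount; ++i)
        if (cache[i].key == key)
            return cache[i].handle;          // hand back the identical object
    LPformat f = lpCreateFormat(/* ... */);  // placeholder creation call
    cache[cacheCount].key = key;
    cache[cacheCount].handle = f;
    ++cacheCount;
    return f;
}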

Now, since we all agree that object creation is allowed to be a bit slower but object usage should be as fast as possible, this approach is OK. However, I don't know why I should care about this. If the driver wants to speed up validation by only accepting identical format objects (not merely "equal" ones) then, in my opinion, the driver should take care to actually return a handle to the same object if the app requests an object that is equal to some earlier created one. It should be easy to implement and it would reduce the burden on the application writer.

How the driver internally does the validation is not my responsibility. I think the spec should say



When an image is bound to an FBO attachment, the format object used to create the image and the format object associated with the attachment point must define the same format or validation fails.
Since the objects will be reference-counted anyway, returning handles to the same (immutable) object several times should not introduce any problems.

Jan.

k_szczech
06-26-2007, 09:50 AM
Another two thoughts for some future version of OpenGL. Although the second one could be added right now.

------- #1 -------
A debug context was mentioned in this newsletter. I started to wonder if there will be some kind of "pure hardware" context - something that would guarantee that I won't hit a software fallback. Or some kind of "performance" context where I can hit software emulation, but not with much performance impact.
For full emulation I would use Mesa anyway, because an "emulation" context would still be vendor-specific (NVIDIA - no ATI extensions emulated and vice versa).
Better yet - instead of emulation just buy the cheapest GPU from the generation you're interested in. It would still run faster than emulation :)

Note that it's not easy to define a "pure hw" context. For example, the Radeon X800 emulates gl_FragCoord. It's an emulation, but it's also pure hw emulation. On the other hand it's not as precise as a built-in gl_FragCoord.

------- #2 -------
My second thought is on the GL_RENDERER thing. I have no idea how it's going to be in LP, but I think it would be good to have it "reversed".
Instead of asking for the renderer, you give a GL_RENDERER string and receive an answer whether it's compatible. Such a string would be rather general ("GeForce 6" or "NV40" for example - no "6800 LE" or "9800 Pro" thing).
Well, the classic GL_RENDERER is of course still required - you have to name a renderer when you display it to the user in a combobox of available renderers.
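Something like this, where lpIsRendererCompatible is an invented name used only to illustrate:

if (lpIsRendererCompatible("GeForce 6") || lpIsRendererCompatible("NV40"))
{
    // enable the codepath tuned for that GPU family
}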

In general, I think we should not ask for the driver version and other details separately. It should all be put in one long string so you can put that information into a log file or crash report. It could include the driver version, release date and other stuff.

Korval
06-26-2007, 11:13 AM
When an image is bound to an FBO attachment, the format object used to create the image and the format object associated with the attachment point must define the same format or validation fails.I agree. Indeed, I agreed to the point of assuming that this was what the newsletter was saying. I didn't realize that it meant literally the same format object pointer.

MZ
06-26-2007, 11:41 AM
Originally posted by Korval:
Note that with that functionality, another D3D9 feature - the infamous "base index", is redundant, because instead of changing the base index you could just re-bind the VB at different offset.No, that's not how it works. If the impetus for the index offset feature was simply the number of API calls, it wouldn't be an issue. It's not the API for offsetting a buffer-bind point; it's the internal stuff that the implementation needs to do to make it work.

The general argument is that, whenever you change an attribute pointer, you need to do some validation work to make sure that the pointer, stride, etc work. And presumably, this work is non-trivial. Thus the purpose of the offset is to avoid rebinding buffers.Changing the binding offset, in a single API call, adds a value to a group of pointers.
Changing the base index, in a single API call, adds a value to a group of pointers.

In a situation with only a single Vertex Stream, these two actions are interchangeable, and that was the sole point of the text you've quoted. Your speculations about how big the implicit difference in the validation work is don't bring anything meaningful or relevant.


Originally posted by Korval:

Also, D3D9 instancing works by assigning "frequencies" or "dividers" to Vertex Streams. This is an example of case where it is necessary to have more than one stream.A feature that is effective dead. Nowadays, particularly with Longs Peak, (...) I haven't postulated the idea you're trying to dismiss here.


Originally posted by Korval:

I personally think the concept of Vertex Stream makes hell a lot of sense.In what way?

It is a perfectly meaningless concept. It gives no actual benefit, except for the kind of older hardware that actually had vertex streams. It's exactly the kind of thing an abstraction API should abstract.

I can almost guarantee that LP will not expose streams to the user. From the user's perspective, they're just a layer of bureaucracy that provides no actual benefit that cannot be achieved in some other, more effective, way. You are talking such rubbish that I can almost guarantee you have a false (if any) understanding of the concept you are commenting on. Vertex Streams are not related to any "older hardware", because they are not a hardware feature at all. Vertex Streams are pure API logic, naturally reflecting the way we use vertex data.

If you consider all the kinds of data which we associate with active vertex attributes, you can distinguish two categories:

In one category we have: vertex buffer handle, vertex buffer offset, vertex stride, vertex frequency divider.
They are almost always used in such a way that multiple vertex attribs are given the same value.

In the other category we have: data type, data offset, semantics (and several other D3D idioms).
For them, in contrast, such value sharing wouldn't make sense.

Let's focus on the first category. Some of those shared properties also happen to be mutable. And when we want to change any of them, it's obviously preferable to do it in a single call, for all the vertex attribs in the group that share the property, at once. In order to be able to do so, you need to be able to identify the group. A Vertex Stream is such an identifier.

Without this "meaningless" concept, you'd have to set the property for each vertex attrib in a separate call. At least with this part you should be familiar, since that's what we do in GL today, using batches of glXXXXArrayPointer calls.

I recommend you learn a bit about how the related parts of D3D9/10 work, and try to imagine the consequences for the API if you removed the Vertex Stream "burden" from it.

knackered
06-26-2007, 11:49 AM
I believe it meant the same format object handle, otherwise they wouldn't use the word draconian.
I agree that it would make the driver faster and simpler, and I should be able to take advantage of that - after all, FBO's and renderable textures are closely tied in my renderer anyway, so it would be no problem for me to supply the exact handle.

Korval
06-26-2007, 12:14 PM
In situation with only single Vertex Stream, these above actions are interchangable, and that was the sole point of the text you've quoted. Your speculations about how big is the implicit difference in the validation work, don't bring anything meaningful or relevant.No, it does bring something meaningful and relevant: performance.

Maybe I'm not a lazy programmer, but as long as it's fast, I don't care if I make 1 function call or 7 to change the base offset for a bunch of VAO parameters. It is no different to me one way or another.

The impetus for the feature as described (an index offset in the glDraw* call) was performance, not convenience. If there is no longer a performance concern, then it's merely a matter of API convenience.


I haven't postulated the idea you're trying to dismiss here.Yeah, I was jumping ahead. See, the only real-world use for the D3D implementation of "frequencies" and "divisors" is for instancing. Since we can do instancing in a much better way now, there's no point to the feature. It's a feature in need of an application, and until one shows up, it is 100% irrelevant.


Vertex Streams are pure API logic, naturally reflecting the way how we use vertex data.So you admit that this feature is nothing more than syntactic sugar? Then how can you possibly describe it as "an essential part?"


In one cathegory we have: vertex buffer handle, vertex buffer offset, vertex stride, vertex frequency divider.
They are almost always used in such way that multiple vertex attribs are given the same value.Maybe the way you work. For someone who may not want to interleave some of his data (possibly for memory/packing concerns, possibly for others), vertex streams are merely a pain in the butt. If you have 6 attributes each in their own buffer, OpenGL makes it work no differently from having all 6 attributes in one buffer. D3D makes you go through some vertex stream nonsense to make this work.

And, as pointed out beforehand, the construct is entirely meaningless from a performance or functionality standpoint. So, without any overriding need for the feature, I don't see the point in having it.

Michael Gold
06-26-2007, 02:13 PM
Originally posted by Jan:
Well, i can live with the fact, that the format object needs to be the same. However, i think this should be an implementation detail, that is handled behind the scene and not something a developer needs to worry about.Why would you ever need multiple copies of the same format? An application-level format cache is a good idea if you really can't structure the code otherwise. We could even provide such a cache in a layered utility library, if the need is common.

The API is optimized for peak efficiency. Optimizing for apps with complex object management semantics adds overhead for apps which don't need this level of assistance. We could implement a bunch of caches under the covers, but then we could just stick with the old state machine model, too.

We debated this very point and reached the conclusion described in the newsletter. If you feel strongly that we made a mistake, I'd love to hear your reasoning.

Komat
06-26-2007, 03:02 PM
Originally posted by tamlin:

Komat, while I think I understand your concern, isn't there (to be) but a single entry point to upload an already compiled program blob (of a specific type?), and is not that entry point to return success/failure?
It might be. However, there might be hidden issues we know nothing about which depend on how different parts of the API were designed. For example, depending on how the program object is associated with the layout of a uniform buffer, the ability to unblob a program object before any uniform buffer has been created might be an issue. This issue might be small or even nonexistent (e.g. solved by one sentence in the specification); however, someone still needs to think about the implications such a feature will have, to ensure that there is no room for undefined situations or non-obvious interactions, and that takes some time.

Komat
06-26-2007, 03:53 PM
Originally posted by Korval:
For someone who may not want to interleave some of his data (possibly for memory/packing concerns, possibly for others) vertex streams are merely a pain in the butt. If you have 6 attributes each in their own buffer, OpenGL makes it work no differently from having 6 attributes in separate buffers. D3D makes you go through some vertex stream nonsense to make this work.
Well, to me it seems that the Vertex Streams in DX are in principle very similar to the buffer attachment points in the VAO. Both are simply points to which VBs/VBOs are attached so they can be referenced by vertex elements/arrays.

There might be differences in which parameters (e.g. stride) are per stream/attachment and which are per vertex element/array, in how the association between attachment and array is created (a user-defined value in DX; I do not know what in LP), or in whether the binding is global or per-VAO state; however, these are only implementation decisions.

Korval
06-26-2007, 05:09 PM
Well to me it seems that the Vertex streams in DX are in principe very similar to the buffer attachment points in VAO.The principal difference is that you can have multiple streams for a single rendering, but you can't have multiple VAOs. A single VAO encompasses all vertex buffer binding state; you can only bind one VAO to the render context.

In essence, vertex streams live in the middle-ground between what gets attached to the context and the raw buffer objects. OpenGL doesn't have any middle-ground.

knackered
06-26-2007, 05:38 PM
Originally posted by Michael Gold:
Why would you ever need multiple copies of the same format? An application-level format cache is a good idea if you really can't structure the code otherwise. We could even provide such a cache in a layered utility library, if the need is common.You'll get no argument from me, it's an ideal candidate for the utility layer. It's the whole point of the object model, you plug objects together.

Korval
06-26-2007, 05:53 PM
On reference counting:

Is there a way to manually bump the reference count for an object? To tell the system that, "Hey, I'm 'copying' this object, so I will release it twice." Or, for us to do that, do we have to wrap our objects in reference counts ourselves?

Komat
06-26-2007, 06:03 PM
Originally posted by Korval:
The principle difference is that you can have multiple streams for a single rendering, but you can't have multiple VAOs. A single VAO encompasses all vertex buffer binding state; you can only bind one VAO to the render context.
As far as I understand the pipeline newsletter:

A single VAO is composed of the definitions of the vertex arrays and the definitions of the attachment points to which buffers with vertex data are bound. Each vertex array internally has some form of reference to the attachment point to which the buffer its data will be read from is bound.

In DX the definitions of vertex arrays (using OGL terminology) and attachment points are separate parts. The definition of the vertex arrays is encompassed in the VertexDeclaration object. There is no special object to encompass attachment points; instead there are 16 global slots called vertex streams to which individual buffers with vertex data are bound. Each vertex array has an integer which references the corresponding vertex stream slot (aka the attachment point from the VAO) to which the buffer its data will be read from is bound.

While it is true that the vertex stream slots contain some information which in the VAO is part of the definition of a vertex array (e.g. stride), the entire structure is very similar, and the combination of the currently bound VertexDeclaration object and the current contents of the vertex stream slots gives the same information (modulo instancing-specific DX stuff) as the single bound VAO.

Rob Barris
06-26-2007, 09:39 PM
Originally posted by Korval:
Yeah, I was jumping ahead. See, the only real-world use for the D3D implementation of "frequencies" and "divisors" is for instancing. Since we can do instancing in a much better way now, there's no point to the feature. It's a feature in need of an application, and until one shows up, it is 100% irrelevant.
Without giving away too much detail, I'd like to point out that the stream frequency divisor style of instancing is heavily leveraged by Blizzard's Starcraft II engine, and the newer DX10-style instancing is not a one-to-one substitute for what can be accomplished with the VSFD technique.

Korval
06-26-2007, 10:09 PM
Without giving away too much detail, I'd like to point out that the stream frequency divisor style of instancing is heavily leveraged by Blizzard's Starcraft II engine, and the newer DX10-style instancing is not a one-to-one substitute for what can be accomplished with the VSFD technique.If this is true, then without asking too much, is it planned for Longs Peak? Not the vertex stream stuff; D3D can keep that. But the actual useful functionality of frequencies/divisors?

And if it's not, then I think Lucy's got some 'splaining to do...

Jon Leech (oddhack)
06-26-2007, 10:55 PM
Originally posted by Korval:
On reference counting:

Is there a way to manually bump the reference count for an object? To tell the system that, "Hey, I'm 'copying' this object, so I will release it twice." Or, for us to do that, do we have to wrap our objects in reference counts ourselves? No. Reference counting isn't intended as a concept the developer should be working with directly, rather it's a way to describe the behind-the-scenes work the driver needs to do to ensure objects live as long as they need to.

It's possible that folks have constructed an incorrect view of handles, based on some other comments I've seen in this thread. The model we're working with is that there is only one handle, ever, for an object. You get that handle when you create it. If the object is shared, the handle value can meaningfully be shared with other contexts. Once you release the handle, there will never again be an explicit way to refer to that object - but if it's attached somewhere else, the underlying object storage won't be released so long as that's the case.

Ref counting is simply a convenient as-if mechanism to describe this behavior: the handle creates one ref on an object, each attachment or binding creates another ref. Implementations could do something other than ref counting if so inclined, like some form of GC.

bobvodka
06-26-2007, 11:11 PM
When I read the newsletter I figured it was a case of: the reference counting means you can 'delete' an object, but if it's bound the driver will know and will intelligently deal with deleting it when it is no longer bound.

I seem to recall some 'issues' surrounding when things become invalid when GLSL objects were added to GL2.0, with respect to deleting, which I guess made things hard for the driver writers.

Just to throw my two cents in: I'm liking how it's stacking up thus far. Amusingly, it seems that Korval and I are on the same wavelength when it comes to stuff to ask about, as early in the thread he covered some of the things I was considering.

Keep up the good work, looking forward to the release of this thing and to using it :D

MZ
06-27-2007, 09:44 AM
Originally posted by Korval:

Well to me it seems that the Vertex streams in DX are in principe very similar to the buffer attachment points in VAO.The principle difference is that you can have multiple streams for a single rendering, but you can't have multiple VAOs. A single VAO encompasses all vertex buffer binding state; you can only bind one VAO to the render context.
What on earth made you think that with vertex streams you'd need to bind multiple VAOs to the context?

Korval, you don't understand how vertex streams work. Every single time you write a comment about them, you're spreading another bit of misinformation. Please, just stop. The MSDN is two clicks away.

Korval
06-27-2007, 10:37 AM
What on earth made you think that with vertex streams you'd need to bind multiple VAOs to the context?I didn't say that you would need to. I was saying that a vertex stream is not conceptually similar to a VAO, because you can have multiple streams, yet only one VAO. Specifically, I'm saying that VAOs are bigger than streams; they encompass streams.

Now, granted, I did misinterpret the statement I was responding to, as he was equating streams with VAO attachment points, not VAOs themselves.

However, even that is incorrect, because streams can represent multiple shader attributes (position, normal, etc), while VAOs (presumably) operate on a per-attribute basis. That is, you bind to a specific named attribute. If multiple attributes share the same buffer, then you bind each attribute to that buffer individually.

Thus, VAOs probably don't have buffer binding points at all; they have attribute binding points, and you bind buffers to them. Which is different from the stream methodology.


Korval, you dont't understand how vertex streams work.See, you keep saying this, yet you have failed to actually provide contradictory information to anything I have said. If you have some specific claim that I have made that is in error, then feel free to provide information to the contrary.

Komat
06-27-2007, 01:14 PM
Originally posted by Korval:

However, even that is incorrect, because streams can represent multiple shader attributes (position, normal, etc), while VAOs (presumably) operate on a per-attribute basis. That is, you bind to a specific named attributeImho, having a buffer binding point per attribute might increase the cost of binding a different buffer to the VAO for hw which requires streams at the hw level, or which benefits from knowing that some data are interleaved in a single buffer. In that case a single bind might change the situation from the interleaved case to the separate-buffers case and thus force the driver to reevaluate how to present the buffers to the hw. Because of this I assumed that the "buffer/stream level" might be exposed by the VAO; however, I might be wrong.

I think that because we are both guessing (with all those mights, probablys and presumablys) about how the VAOs work, it will be better if we wait for more official information.

Overmind
06-27-2007, 02:50 PM
the newer DX10-style instancing is not a one-to-one substitute for what can be accomplished with the VSFD techniqueOnly if you assume that a geometry shader with a loop reading from a texture buffer object is slower than an indexed draw operation ;)

The fact that Starcraft II makes heavy use of this feature only means that Blizzard targets hardware where this is actually the case. It's understandable they don't want to do anything that pushes the minimum requirements to a GF8800 or beyond.

That doesn't mean that frequency dividers make any sense in future hardware.

Korval
06-27-2007, 03:00 PM
hw which requires streams on hw levelWatch out; MZ might get all "MZ-SMASH!" on you for suggesting that streams actually are a property of certain hardware rather than just syntactic sugar ;)

That being said, I find the pipeline paper... misleading and internally inconsistent on this, and related, points.

For example, the section describing the general classification of objects states that "container objects", which include program objects and VAOs, have immutable attachment properties, but mutable attachments. That is, it suggests that, for VAOs, you can change buffer pointers, but you can't change what attribute it refers to, the stride, offset, and other interpretive properties.

And yet, on this very forum, we've been told that you can change buffer offsets in live VAOs.

Additionally, the program object section states that there is no incremental relinking. Yet, the container object section specifically cites the case of changing a shader attachment on a program object, thus suggesting that incremental relinking is possible.

So, I suppose we simply can't say one way or another. However, the standard GL paradigm has been to abstract hardware requirements like vertex streams. And it would certainly be possible (or much easier) if the attachment properties were immutable. Further, GL has never exposed vertex streams, and I'm not sure they have too much reason to do so if the hardware limitation can be properly abstracted.


Only if you assume that a geometry shader with a loop reading from a texture buffer object is slower than an indexed draw operationThat, given the current geometry shader performance, almost certainly is slower.


That doesn't mean that frequency dividers make any sense in future hardware.Personally, I'd rather see a more generalized form of them. As it stands currently, the only thing you can do with them is instancing. Expose greater control over the index (not necessarily a program, but a more complex math operation), and you could do something with that that doesn't involve rendering instances.

tamlin
06-27-2007, 08:52 PM
Rob, as I noted I hadn't thought it through, and after you mentioned it can be solved by fences... I think I'm cool. No, I didn't have any particular scenario in mind, just a gut feeling that "this could possibly be of use for the API to be _complete_, even if unused most of the time".

I'll leave it up to the ARB to decide, unless someone else comes up with a convincing argument (like wanting to start at buffer offset 0 again, as they did last frame, without wanting to use a fence (for some reason)...).


Jan, I sort of agree. Exactly the same format object is possibly too much to ask for. Comparison for equivalence, yes - the very same object, no. It's like requiring a pointer to the very same C++ object describing the format, even though the other object I handed it describes the very same format and memcmp returns 0 - to *require* you to *obey* the flyweight design pattern (ref: GoF).

Sure, the flyweight pattern is good, but an API that forces me to use it and can't even collate equivalent objects??? F-u-c-k that!

On the other hand, lp is seemingly *all* about performance, and comparing 64 bytes vs. comparing 4 bytes can *sometimes* make a quite measurable difference (read on). I don't know how it would be measurable within thousands of other calls, but still... It *is* at least 8 CPU clock cycles of difference (just to point out the absurdity of even questioning this).


Michael (Gold), peak efficiency would require lp to use data types and alignment not familiar to many. In this case, for a format, we're talking about a data type that seriously can't be larger than 64 bytes internally (if it is, something is wrong) vs. a 4-byte (or, on 64-bit systems, more likely 8-byte) datatype. Yeah, sure, comparing a register already loaded from memory to another 4- or 8-byte entity is faster than comparing 16 or 8 of 'em. Uhhh, OK. So? Is this such a time-critical area that it needs to be *this* heavily optimized (in a context where some other commands actually need to round-trip to the server)? This last question is a "real" one; I want an answer.


Komat, I don't dig it. The binary program blob is to be available at two moments: 1) when you ask for it (to save it, possibly even to disk) and 2) when you load it (from disk is the issue here).

If you load it from disk and the implementation tells you "Bad BLOB", you need to recompile your program, right? So what is, or rather what can be, the problem? Call me thick, but I simply don't get it. It has AFAICT *nothing* to do with previously bound stuff or anything. To me it's as simple as "BLOB fails => recompile".

If there are hidden issues, I think this is the time we should ask the ARB to make them non-hidden. Let the ARB say something about it and make it clear. If they stay silent...

Jan
06-28-2007, 01:27 AM
I too agree that comparing 4 bytes or 64 does not make any difference, especially since this is a rarely used feature (compared to all the work that is then done with the FBO).

And I suggested a solution: either drivers do the real check at runtime, or they do the check at creation time and actually hand me the same handle.

I can do this myself. However, I'd like to have my texture-loading code and my FBO creation code separate and still be able to attach every texture to an FBO. Also, I won't have ONE format object for a given format, since I will create a format object when loading the texture and then destroy it (I don't need it any longer if I only use the texture for rendering, right?).

Anyway, even if I need to store it, syncing my FBO code with my texture-management code can become messy. What if I unload a texture? I need to destroy the format object I created for that texture. Oh wait, it might still be in use by an FBO or another texture. What can I do to prevent destroying my handle? Well, what about reference counting? Great idea, but hey, doesn't the LP API do that internally, too?

It turns out that what the LP API saves in extra computation needs to be managed by the host application instead. And there it can become more computationally intensive, because of this draconian restriction. (It will still be a piece of cake, but the point stays valid.)

I am all for efficiency, but don't forget, 90% of the time is spent in 10% of the code. Creating a format-object in the LP API is not part of those 10% and it is not even a very complicated task. I don't believe this is about efficiency, this is more about ease of implementation. Can't blame you for that, but at this point it goes a bit too far.


Jan.

Komat
06-28-2007, 02:46 AM
Originally posted by tamlin:

If you load it from disk and the implementation tells you "Bad BLOB", you need to recompile your program, right? So what is, or rather what can be, the problem? Call me thick, but I simply don't get it. It has AFAICT *nothing* to do with previously bound stuff or anything. To me it's a simple as "BLOB fails => recompile".
One example of what I was talking about: based on discussion in different threads, it appears that during creation of a program object you will need to specify uniform buffers (so the layout is known), and for the created object you will only be allowed to bind buffers which have some strong relationship to the original buffers used in object creation (e.g. are clones of them). If you create the object in the ordinary way, the buffers must already exist, because you need them for the creation call. For the unblobbing case, the API either needs to ensure that the relationship is recreated even if you unblob the object into a freshly created context and create the uniform buffer later, or it has to disallow such behavior, or it has to force the user to specify the correct uniform buffers during the unblobbing call. I am not saying that this is a technical problem. What I am saying is that someone needs to decide what the best way of doing this is, based on how the rest of the API is designed.

The main idea that I tried to express is that even if the thing appears to be simple, someone needs time to think about it to determine if that is really so and what other mechanisms might be influenced by it.

Overmind
06-28-2007, 02:54 AM
That, given the current geometry shader performance, almost certainly is slower.
On current hardware, but I'm pretty sure this will change soon ;)

Otherwise the new instancing would be pretty useless for anything more than billboards.

Michael Gold
06-28-2007, 06:05 AM
Well, you guys are making a lot of assumptions about format objects - they contain more information than you might think. As we expect the size and complexity of formats to increase over time (even Mt Evans formats will be larger than Longs Peak's), we'd like to constrain the cost of comparison. A shallow compare has a fixed cost, whereas a deep compare has a variable cost.
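To illustrate the difference (with an invented LPformatDesc layout - the real internal representation may look quite different, and will only grow):

#include <string.h>

typedef struct {
    int  baseFormat, type, components;
    int  samples, flags;
    char reserved[48];       /* room for future (Mt Evans) attributes */
} LPformatDesc;

/* Shallow compare: fixed cost, independent of how large formats become. */
static int formatsMatchShallow(unsigned int handleA, unsigned int handleB)
{
    return handleA == handleB;
}

/* Deep compare: cost grows with the size of the format description. */
static int formatsMatchDeep(const LPformatDesc *a, const LPformatDesc *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}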

EXT_framebuffer_object exposed a lot of deficiencies in the GL2 object model, and this was a large source of inspiration for Longs Peak. We actually considered making the attachments to FBOs immutable, but decided to allow the flexibility of swapping "like" attachments, with the condition that such a modification was extremely lightweight. If you populate the attachments of an FBO exactly once (effectively treating them as immutable after setup), the cost of comparing formats affects only initialization. If you plan to swap FBO attachments with any frequency however, the cost of comparing formats affects runtime. We don't like to make assumptions about the way you're going to use the API. We want all usage patterns to be efficient.

If you're right and we're wrong, we have the ability to relax the format compatibility rules later. But if we're right and you're wrong, and we start with your preferred design, we can't fix it. We're stuck. As we don't enjoy breaking backward compatibility, please forgive us for erring on the side of caution.

Jan: memcmp is not rocket science. This has nothing to do with implementation complexity, and everything to do with performance.

tamlin: If there are a lot of required round trips in the API, we made a big mistake. I'm only aware of one, and it's shadowed by an expensive operation. If you know something that I don't, please share. And in the future, please bear in mind that the "F" word lends no strength to your technical points.

knackered
06-28-2007, 06:14 AM
I'm beginning to see how GL got into the mess it's in today. Its role is to abstract the hardware, not to perform application-level duties like caching for you.

Jan
06-28-2007, 08:12 AM
Well, as I said, I can live with the other concept, too. I only wanted to point out that IF the driver does not really save much by this restriction, one should rethink whether pushing responsibility out of the API into the application really saves anything in the end.

Michael: Yes, we are making many assumptions in general. That is because we don't know better. It is usually quite clear how high-level tasks are implemented, but the closer it gets to the low-level stuff, the less we actually know (well, Korval seems to know it all).

So all we can give you is input from a user's perspective. And some of our ideas/criticisms are, of course, not really practical to implement. We don't demand anything; we only share our view on all this. We all agree that you guys are doing a great job.

Jan.

V-man
06-28-2007, 09:01 AM
Originally posted by Michael Gold:
We don't like to make assumptions about the way you're going to use the API. We want all usage patterns to be efficient.
Yes, but somehow, somewhere, someone will code something that will expose a bug. In the case of FBO, the call sequence had an effect on ATI drivers. Someone said they needed to attach a texture before attaching a depth buffer.

I know you said "efficient", which is a different matter.

I hope LP will be so simple that even a baby will be able to make a driver for it.

knackered
06-28-2007, 09:06 AM
Well, they've already explained they're introducing an object model, which means you work with objects, not parameters, anymore. It seems entirely reasonable and correct to insist that the object used to create the attachment point is the same one used to create the object to be attached. If I want to render into a whole bunch of textures each frame, this is going to save me a lot of comparison work in the driver.
If you want something more high-level, get an Unreal Engine license.

Korval
06-28-2007, 10:33 AM
What if I unload a texture? I need to destroy the format object I created for that texture.
Ahh, and therein lies your problem: you're creating a format object for your texture. Presumably one for each texture.

Format objects should be resources in and of themselves, not merely part of image creation. You know beforehand which internal texture formats you're going to use, so you should be able to create and manage them "globally". At the very least, you can have your texture files name a texture format, and then have texture format description files that describe (in some terms) what the texture internal format means.
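In code, the kind of global management I mean might look like this (lpCreateFormatFromDescription and the description lookup are invented stand-ins, purely for illustration):

#include <string.h>

typedef unsigned int LPformat;                                   /* hypothetical handle type */
extern LPformat lpCreateFormatFromDescription(const char *desc); /* hypothetical entry point */
extern const char *lookupFormatDescription(const char *name);    /* app-side: reads the format description file */

#define MAX_CACHED_FORMATS 32
static struct { char name[32]; LPformat fmt; } g_cache[MAX_CACHED_FORMATS];
static int g_cacheCount = 0;

LPformat getFormatByName(const char *name)
{
    int i;
    for (i = 0; i < g_cacheCount; ++i)
        if (strcmp(g_cache[i].name, name) == 0)
            return g_cache[i].fmt;           /* reuse: 1000 textures, one format object */

    if (g_cacheCount == MAX_CACHED_FORMATS)
        return 0;                            /* cache full; handle however you like */

    strncpy(g_cache[g_cacheCount].name, name, sizeof g_cache[0].name - 1);
    g_cache[g_cacheCount].name[sizeof g_cache[0].name - 1] = '\0';
    g_cache[g_cacheCount].fmt =
        lpCreateFormatFromDescription(lookupFormatDescription(name));
    return g_cache[g_cacheCount++].fmt;
}

Textures then just say "I use format 'rgba8_srgb'" (or whatever name you pick) and never create format objects themselves.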


Otherwise the new instancing would be pretty useless for anything more than billboards.
Why?

You don't need geometry shaders to make instancing useful; vertex shaders can do the transformations based on an array of uniforms or a texture or whatever.
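Something along these lines, for example (old-school GLSL; exactly how the per-instance index reaches the shader under Longs Peak - built-in ID or replicated attribute - is still an open question, so treat that part as a guess):

static const char *instancedVertexShader =
    "uniform mat4 instanceXform[64];                           \n"
    "attribute float instanceIndex;   /* per-instance value */ \n"
    "void main()                                               \n"
    "{                                                         \n"
    "    mat4 m = instanceXform[int(instanceIndex)];           \n"
    "    gl_Position = gl_ModelViewProjectionMatrix * (m * gl_Vertex); \n"
    "}                                                         \n";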


Yes, but somehow, somewhere, someone will code something that will expose a bug. In the case of FBO, the call sequence had an effect on ATI drivers. Someone said they needed to attach a texture before attaching a depth buffer.
That's an implementation bug; they're going to happen. The best the API can do is make the implementation easier to write, so that there are fewer bugs to begin with.

k_szczech
06-28-2007, 11:01 AM
please forgive us for erring on the side of caution.
From what I've seen so far, you're doing a good job.
I have one idea, although it requires some consideration (speaking of caution). We could allow passing identical format objects, but the specs would say that this can introduce some undefined performance cost, while passing the same handle would ensure maximum performance.

By the way - I have (yet another) prefix suggestion: "ogl" :D (yeah, I know it's way too long)

elFarto
06-28-2007, 11:37 AM
Originally posted by k_szczech:
We could allow passing identical format objects, but the specs would say that this can introduce some undefined performance cost, while passing the same handle would ensure maximum performance.
Seems like a good compromise.


Originally posted by k_szczech:
By the way - I have (yet another) prefix suggestion: "ogl" :D (yeah, I know it's way too long)
I think that's the only prefix they could reasonably use.

Regards
elFarto

knackered
06-28-2007, 12:34 PM
Originally posted by k_szczech:
I have one idea, although it requires some consideration (speaking of caution). We could allow passing identical format objects, but the specs would say that this can introduce some undefined performance cost, while passing the same handle would ensure maximum performance.
No, that's the thin end of the wedge. Make a rule and stick by it; don't have hidden performance degradation, just make the call fail. It's that kind of ambiguity I don't want in the new API. The simpler, cleaner and more obvious the API, the better.
Plus it adds extra burden on implementations that just shouldn't be necessary.
All this kind of high level stuff is ideal for the utility libraries that can be built on top of a clean API.
Just get over it.

k_szczech
06-28-2007, 01:23 PM
I wouldn't call that ambiguity.
Consider for a moment that we choose to change the specs to say "identical object", not "the same object".
The implementation could simply assume an object is identical if its handle is identical (note that this is not ambiguity - it's an optimization inside the driver). Using the same object would just be a performance hint.

Perhaps I shouldn't have written "undefined performance problems". It sounded like some slow path. It would actually be the normal path (with a format object comparison - that's what has to be done if the specs don't require passing the same object). Passing the same object would be the fast path (something like early Z out).


Just get over it.
No problem. As far as I'm concerned, we could stick to the requirement that you must pass the same object. I just wanted to point out that we can actually have both: less strict requirements and high performance (achieved by optimization). Although this optimization is probably not something that should be described in the specs, so that was my error, I guess.

knackered
06-28-2007, 02:40 PM
I agree the alternative you describe doesn't have any drawbacks (except that it requires the driver to keep a format object cache in system memory), but as I say, it's the thin end of the wedge. As soon as you start putting caveats and special cases into the API, no matter how innocent it seems, you start to get bloat. I'd much rather these things get moved into the higher-level utility libraries that will inevitably get written to automate some of this object creation stuff.

k_szczech
06-28-2007, 03:57 PM
As soon as you start putting caveats and special cases into the API, no matter how innocent it seems, you start to get bloat.
That's so damn true :)

On the other hand, we're not talking about a special case here, but about a choice between handle comparison and full object comparison. Currently we have handle comparison in the specs, mainly for performance reasons, so I wanted to point out that the other solution also gives the same performance in the same case.
As for "special cases" themselves - vendors will probably put their own special cases in implementations anyway (FP16 blending/filtering, vertex textures, gl_ClipVertex, etc).

Korval
06-28-2007, 05:03 PM
I wanted to point out that the other solution also gives the same performance in the same case.
Right, but it does two things:

1: It forces driver developers to implement the other case, which may not be a simple memcmp.

2: It allows the user to take a non-optimal path, and the only documentation that this is sub-optimal is buried in a very detailed, very technical, very dense specification.

#2 is the major issue for me; the best way to prevent stupid code from being written is to make it impossible to write. Just like the code with the most up-to-date documentation is the code that is self-documenting.

Not that #1 isn't important, of course.

Overmind
06-29-2007, 04:09 AM
You don't need geometry shaders to make instancing useful; vertex shaders can do the transformations based on an array of uniforms or a texture or whatever.
True, if you have a constant vertex count and topology. This just makes my original argument against frequency dividers stronger. You just need a fast vertex texture lookup to emulate them with the new instancing mechanism...
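Roughly like this (one texel of per-instance data per instance; the RGBA = offset + scale layout is just an example, not a proposal):

static const char *vtfInstancingShader =
    "uniform sampler2D instanceData;   /* 1 x N float texture, one texel per instance */ \n"
    "uniform float instanceCount;                                                        \n"
    "attribute float instanceIndex;                                                      \n"
    "void main()                                                                         \n"
    "{                                                                                   \n"
    "    vec4 d = texture2DLod(instanceData,                                             \n"
    "                          vec2((instanceIndex + 0.5) / instanceCount, 0.5), 0.0);   \n"
    "    vec4 pos = vec4(gl_Vertex.xyz * d.w + d.xyz, 1.0);   /* scale + offset */       \n"
    "    gl_Position = gl_ModelViewProjectionMatrix * pos;                               \n"
    "}                                                                                   \n";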

Rob Barris
06-29-2007, 08:45 AM
Originally posted by Overmind:

You don't need geometry shaders to make instancing useful; vertex shaders can do the transformations based on an array of uniforms or a texture or whatever.
True, if you have a constant vertex count and topology. This just makes my original argument against frequency dividers stronger. You just need a fast vertex texture lookup to emulate them with the new instancing mechanism...
Thinking about this technique for a second: you would need a per-instance ID and a vertex texture; which class(es) of hardware do you find both of those features on?

Michael Gold
06-29-2007, 09:27 AM
Originally posted by Korval:
2: It allows the user to take a non-optimal path, and the only documentation that this is sub-optimal is buried in a very detailed, very technical, very dense specification.

#2 is the major issue for me; the best way to prevent stupid code from being written is to make it impossible to write. Just like the code with the most up-to-date documentation is the code that is self-documenting.
You've hit the nail on the head. To the extent possible, we'd like to eliminate obvious pitfalls by designing them out of the spec.

Jan's comment about creating a format per image is one such pitfall. If you have 1000 images with equivalent formats, you only need one format object, not 1000. The compatibility requirement at FBO attachment helps to a degree but doesn't solve the problem entirely: not all images are attached as render buffers.

Practically speaking, it's difficult to prevent this kind of application inefficiency without incurring overhead at creation time (e.g. the driver checks a creation request against all existing objects and warns of redundancy). This is a non-starter for a production driver. I can imagine a debug utility which implements such a mechanism, however.
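Such a utility could be as simple as this (a deliberately simplified, invented format description; the linear scan over every object created so far is exactly why a production driver would never do it):

#include <stdio.h>
#include <string.h>

typedef struct { int baseFormat, type, components, samples, flags; } LPformatDesc;  /* simplified */

#define MAX_TRACKED 1024
static LPformatDesc g_created[MAX_TRACKED];
static int g_createdCount = 0;

/* Called by the debug layer on every format creation request. */
void debugCheckFormatRequest(const LPformatDesc *req)
{
    int i;
    for (i = 0; i < g_createdCount; ++i) {
        if (memcmp(&g_created[i], req, sizeof *req) == 0) {
            fprintf(stderr, "debug layer: format request duplicates object #%d; "
                            "consider reusing it\n", i);
            break;
        }
    }
    if (g_createdCount < MAX_TRACKED)
        g_created[g_createdCount++] = *req;
}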

bobvodka
06-29-2007, 09:27 AM
DX9 GeForce products; the ID value can be emulated via an attribute, and the GF cards can do vertex texture fetch...

knackered
06-29-2007, 09:44 AM
and geometry shaders.

Korval
06-29-2007, 10:42 AM
DX9 GeForce products; the ID value can be emulated via an attribute, and the GF cards can do vertex texture fetch...
Right. Exactly the same class of hardware that exposes frequency dividers.

Though Overmind is right about the performance of vertex texturing being rather lacking in non-G80 cards.

Rob Barris
06-29-2007, 11:16 AM
Originally posted by Korval:
Though Overmind is right about the performance of vertex texturing being rather lacking in non-G80 cards.
Can you elaborate on that?

Korval
06-29-2007, 11:27 AM
Can you elaborate on that?
Not really. It's general knowledge that, while an NV40 card can do vertex texturing, it's generally not a good idea if you like your vertex shaders to be, ya know, fast.

Now, to be entirely fair to nVidia, I heard this about the GeForce 6xxx line. I don't recall any information one way or another about the GeForce 7xxx line, so maybe it's actually usable there?

AlexN
06-29-2007, 02:14 PM
Vertex texturing is also slow on the 7xxx cards; the latency of the texture fetch isn't hidden, and it's pretty difficult to find enough work to do after the fetch to hide it yourself... G80 cards texture at full speed in all geometry stages, I believe.

Rob Barris
06-29-2007, 02:37 PM
Any insights on these vertex texturing issues on AMD GPUs?

Korval
06-29-2007, 03:07 PM
Any insights on these vertex texturing issues on AMD GPUs?
As I recall, ATi cards didn't support vertex texturing before the R600. And the R600, like the G80, uses a unified shader architecture, so the same texture units are used in all stages. As such, I imagine they perform reasonably well.

Of course, part of the NV40 problem is that you can only vertex texture from floating-point textures, which hurts performance-wise.

bobvodka
06-29-2007, 05:57 PM
Just to confirm: no ATI card before the R600 can do vertex texture fetches.

As I recall, the reasoning was that it would be too slow to be useful/worthwhile for the transistor cost, so they didn't implement it. (This became the stuff of many flame wars in certain quarters over the lack of VTF vs. the SM3.0 spec...)