Longs Peak Object Model



k_szczech
02-23-2007, 03:23 PM
--------------------------------------------
OK, the first of my observations is rather trivial (a warm-up?).

I'm used to 2 different naming styles:
1. this_is_some_name
2. thisIsSomeName / ThisIsSomeName
I prefer the second one, but I think I'll have some problems getting used to this:

glTemplateAttrib<name type>_<value type>

I think I would prefer the '_' character to be removed and <value type> to start with a capital letter. We could probably even get away with not using a capital letter in <value type>, because it's obvious that 'ti' means 'template-integer' and not 'template index-NOTHING', but in case of a future name conflict a capital letter is fine with me.
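For illustration, here is the one call spelled out later in this thread, written in the three styles (the exact argument list is just a guess based on the later snippets; 'tmpl' and 'textureFormat' are placeholders):

/* Illustration only - argument list guessed from snippets later in the thread. */
glTemplateAttribt_o(tmpl, GL_FORMAT, textureFormat);  /* newsletter style: <name type>_<value type> */
glTemplateAttribtO (tmpl, GL_FORMAT, textureFormat);  /* underscore removed, capitalized value type */
glTemplateAttribto (tmpl, GL_FORMAT, textureFormat);  /* underscore removed, lowercase value type   */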

--------------------------------------------
Second thing that is on my mind is generalization.

We create a texture starting with:

glCreateTemplate(GL_IMAGE_OBJECT)

This defines what set of attributes this object template will have. Now imagine I want to render to a vertex buffer. Wouldn't it be nice to have something like this:

glCreateTemplate(GL_IMAGE_OBJECT | GL_VERTEX_ARRAY_OBJECT)

Such an object would have the attributes of both a texture and a vertex array, and could be used as either. It would be up to the programmer to ensure that the texels of the texture overlap with the vertices in the vertex array (it could actually be limited to 4 components + power-of-two sizes - other combinations would generate an error upon creation of such a combined object).
Of course there is a problem - it's best to use interleaved arrays for vertex data. But for non-interleaved arrays I think this could be implemented on existing hardware.

For interleaved arrays it would require interleaved textures or multi-component textures (not possible to sample such a texture, and when rendering to it, it occupies a few MRTs) - that would require a great deal of flexibility from the hardware, so I don't consider it very reasonable - for more complex cases, geometry shaders or vertex texture fetch are the way to go.

Such object combination also allows using glImageData2D on a vertex array organized as a grid, therefore allowing heightmaps, say, to be passed from the CPU on the fly - especially useful for extremely large terrain data streamed from system memory or HDD (only the edges need to be updated when moving).

My point here is - textures / arrays / uniform sets (like the environment uniforms mentioned) - they're all actually interpretations of some memory area on the GPU - so should we allow combining different interpretations? (hehe, "render-to-uniform-array"?)

I think that would require one more parameter to glTemplateAttrib:
glTemplateAttribt_o(template, GL_IMAGE_OBJECT, GL_FORMAT, _texture_format);
glTemplateAttribt_o(template, GL_VERTEX_ARRAY_OBJECT, GL_FORMAT, vertex_array_format);

Yes, I know it can be done with geometry shaders or even vertex shaders (the VTF I mentioned before), but overlapping objects shouldn't be very difficult to implement and would be easier to use than geometry shaders. It could also be faster, because you don't need additional shaders when rendering - there is only the cost of updating.
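To make the idea concrete, a purely hypothetical sketch - none of these enums or calls beyond the ones quoted above are real Longs Peak API; the combined enum is exactly the speculation being proposed here:

/* Hypothetical sketch of the combined-template idea; every name follows the
   speculative calls above, not any published Longs Peak API. */
GLtemplate tmpl = glCreateTemplate(GL_IMAGE_OBJECT | GL_VERTEX_ARRAY_OBJECT);
glTemplateAttribt_o(tmpl, GL_IMAGE_OBJECT,        GL_FORMAT, texture_format);
glTemplateAttribt_o(tmpl, GL_VERTEX_ARRAY_OBJECT, GL_FORMAT, vertex_array_format);
/* ...create one object from this template, render into it as an image,
   then source the same memory as a vertex array... */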

Anyone out there thinks this makes sense or is it just me that should get some sleep?

Nah, I guess I'm a bit overworked lately, but I'll post it anyway :D

Korval
02-23-2007, 07:04 PM
This defines what set of attributes this object template will have. Now imagine I want to render to a vertex buffer. Wouldn't it be nice to have something like this:

You're making some fairly strong assumptions. Particularly:

1: That it doesn't work that way already.
2: That even if it doesn't, you can't bind an image buffer object as a buffer.

For example, take this line:


Such object combination also allows using glImageData2D on a vertex array organized as a grid, therefore allowing heightmaps, say, to be passed from the CPU on the fly - especially useful for extremely large terrain data streamed from system memory or HDD (only the edges need to be updated when moving).

Um, we already have that ability.

Jan
02-24-2007, 12:02 AM
About the naming: really, I don't care that much. Whether it starts with a capital letter or an underscore is, in my opinion, nothing that needs to be discussed. If the Khronos group ("the ARB" was much easier to type!) decided that this is good, so it will be.

About the generalization:
The template that you generate there is certainly just a struct created in the driver, which you fill out afterwards. It only hides the actual struct, to remain extensible. What you want is not to generate a template/struct that can be a combined thing (vertex/image...); rather, it's the object generated from the parameters passed through that template that you want to be a combination.

As Korval already pointed out, you assume that this isn't possible at the moment. I highly doubt that, because certain combinations might be very useful. However, the template is not what you want to merge. Instead you will most likely have to generate a completely different template, one intended to create a combined object. The advantage would be that you can only use templates for what they are intended, so you cannot generate useless combinations that the driver doesn't know how to interpret.

I'm sure the guys have thought this out very carefully. The small example was just a general introduction to how it looks. We cannot derive the full functionality from it.

Jan.

Zengar
02-24-2007, 01:14 AM
As far as I understand, image objects are derived from buffer objects (they made that rather clear in the article), so you should be able to use them as such. Also, the creation routine returns the GLbuffer type.

k_szczech
02-24-2007, 03:19 AM
2: That even if it doesn't, you can't bind an image buffer object as a buffer.

If I bind a 4-component image as a buffer, then what kind of vertex data does it represent? vertex2D + texcoord2D? This is why you need to use more than one template for an object, but after some thought I think you could create an object with one template and then assign other templates to it, enabling additional properties. So you're probably right - it could be implemented already.

elFarto
02-24-2007, 04:10 AM
It seems like the object model is pretty damn good, but there is a lot of information we don't have, which makes it hard for us (me at least) to gauge whether the design is good.

Maybe it's because we've never seen an example of VBOs in the new model - any chance of that for the next issue of the Pipeline? :)


Originally posted by k_szczech:
If I bind a 4-component image as a buffer, then what kind of vertex data does it represent? vertex2D + texcoord2D?

In one of the slides from the BOF, I think, it was hinted that VBOs would get a state object. Presumably this would contain the layout information for the VBO. I could quite easily see this in the new model:

glBindVertexBuffer(GLbuffer, GLVBOState);

Regards
elFarto

zeoverlord
02-24-2007, 07:09 AM
Originally posted by Korval:

Such object combination also allows using glImageData2D on a vertex array organized as a grid, therefore allowing heightmaps, say, to be passed from the CPU on the fly - especially useful for extremely large terrain data streamed from system memory or HDD (only the edges need to be updated when moving).

Um, we already have that ability.

True. Regarding this, it would be nice to have something like SUN_mesh_array make it into core sometime this millennium.

Korval
02-24-2007, 09:42 AM
If the Khronos group ("the ARB" was much easier to type!)

They're still called the ARB. The full name of the Khronos subgroup is the "OpenGL ARB Working Group".

Korval
02-25-2007, 11:05 AM
Maybe it's because we've never seen an example of VBOs in the new model - any chance of that for the next issue of the Pipeline?

Actually, what I would like to see in the next issue is a fully-functional example. That is, taking the entire pipeline from the creation of the rendering context to the rendering of an object. I would like to see all the steps that are necessary to render something under the new object model. It's obviously going to be more complex than standard GL, but I'd like to see what the whole thing looks like from beginning to end.

MZ
02-26-2007, 09:00 AM
Originally posted by Korval:
It's obviously going to be more complex than standard GL

More complex than the current state? Which part exactly?

Roderic (Ingenu)
02-26-2007, 10:13 AM
It's pretty straightforward; it's very much like D3D10, except that instead of using structs to define state blocks you're creating an object and initializing it with function calls.
Sure, it's somewhat more verbose, but it's also extensible (unlike D3D10), since you can add new enums/states to an existing state block through extensions.

So far so good. I also don't like the underscore in the new naming convention - I'll get over it for sure, but if it were changed to the current convention it would be just as good.

Basically we have buffers and state blocks that tell the hardware how to interpret the buffer bound to a given entry point. (Right?)

Are we going to have state blocks for everything just like D3D10 ?
(rasterizer, blending,...)

Korval
02-26-2007, 10:32 AM
More complex than the current state?

More complex in the sense that it isn't as straightforward to get something running as in standard GL, but it's better overall for getting something real running.

Jon Leech (oddhack)
02-26-2007, 02:38 PM
Originally posted by Korval:
Actually, what I would like to see in the next issue is a fully-functional example. That is, taking the entire pipeline from the creation of the rendering context to the rendering of an object.

I plan to do that for the next issue. There are still enough details being worked out that we can't do it today - format objects, exactly what the drawing calls will look like and how VBOs will be constructed, how some remaining bits of non-programmable state will be represented, and a few others.

As far as the "ti_o" naming convention, we're open to something better. This is just the least objectionable idea we've had so far.

On the multipurpose templates idea: basically the Longs Peak object model is a shallow tree. Buffer objects contain unformatted data; image buffers contain formatted data. There are parameters that affect use of buffers. For example, you say up front whether an image buffer can be used as a texture, as a renderbuffer, or both, which allows the driver to make intelligent decisions. But we are not planning to allow multiple inheritance :-) - so templates describe exactly and only the set of attributes that make sense for the type of object a template corresponds to. This also lets us do a certain amount of type- and range-checking on attributes on the client side, although some checks (combinations of attributes, resource limits) still can't happen until actually creating an object on the server.

Really templates are just a generalization of attrib lists (the old { NAME, value, NAME, value } sort of thing you see in GLX and EGL). We started off trying to use attrib lists, but they weren't flexible enough. Actually we used to call them "attribute objects", but the resulting naming scheme wasn't great. Michael Gold came up with "template", which we like much more.
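For concreteness, a side-by-side sketch: the GLX call below is real API, while the template attribute name and its value are placeholders rather than published Longs Peak enums, and 'dpy' is an already-open X display.

/* The old attrib-list idiom (real GLX 1.3 API): */
static const int fbAttribs[] = {
    GLX_RED_SIZE,      8,
    GLX_DEPTH_SIZE,   24,
    GLX_DOUBLEBUFFER,  True,
    None                          /* terminator */
};
int numConfigs;
GLXFBConfig *configs = glXChooseFBConfig(dpy, DefaultScreen(dpy), fbAttribs, &numConfigs);

/* The template formulation sketched earlier in this thread (attribute name and
   value are placeholders, not real Longs Peak enums): */
GLtemplate tmpl = glCreateTemplate(GL_IMAGE_OBJECT);
glTemplateAttribt_o(tmpl, GL_FORMAT, image_format);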

ector
02-27-2007, 01:28 AM
This object model seems to be almost exactly the same as D3D10, but with C syntax. Very good, it's a great design and the C syntax doesn't obscure it much, and can easily be wrapped if desired.

But how is it possible that the ARB is about 2 years behind Microsoft in implementing the exact same thing? To me, it seems like something in the entire process is really wrong somewhere...

Zengar
02-27-2007, 01:51 AM
What about vertex arrays? Are they implemented as objects too? Will the old client-pointer vertex array remain (I hope not :-)?

I had the idea of using buffer objects for data transfer, with the ability to format the data in them. In this model, all data would be stored in buffer objects, which could then be bound to image/array objects. This would also allow the same buffer to be reused for more than one object, probably with a different format. Did you consider such a design?

Overmind
02-27-2007, 01:57 AM
This object model seems to be almost exactly the same as D3D10, but with C syntax.

Do you have some insight that we don't have? Because from the few bits we've seen I doubt you could come to the conclusion that it's "the exact same thing".

Ok, both models are object oriented. True, both models contain roughly the same kinds of objects, but hey, both are meant for the same hardware.

But the similarity ends here. From what I've seen, the GL model seems much more flexible and extensible, while the D3D10 model seems more like the usual "we completely redesign it anyway next version, so don't bother too much with extensibility".

soconne
02-27-2007, 03:48 AM
The only thing I hope they FINALLY implement is the ability to re-index indices sent to the GPU. This would be AWESOME. I believe there was a thread a couple of months ago about all the features we wanted. They need to read that thread.

ector
02-27-2007, 06:25 AM
But the similarity ends here. From what I've seen, the GL model seems much more flexible and extensible, while the D3D10 model seems more like the usual "we completely redesign it anyway next version, so don't bother too much with extensibility".

Well, the major difference (OO vs. binding) between D3D and OpenGL has been eliminated. The new policy of "create immutable, do not modify" is also the same as D3D10 (because it's the right thing to do, of course).

The only obvious major difference left is the additional flexibility of the parameter lists. Nothing stops D3D10 from simply adding a new COM interface, where all the DESC structs are extended with new stuff, making a transition to the next subversion easy.

Still, nothing explains why D3D10 has been around for a year or so already, even though hw just arrived, and Longs Peak isn't expected to arrive for quite a while yet. Why didn't the OpenGL development start earlier?

Michael Gold
02-27-2007, 07:12 AM
Hi ector,

Any similarity between this object model and DX10 is purely coincidental. I have personally never seen D3D docs or code, not DX10 nor DX3 or anything in between. However, I do have twelve years of OpenGL implementation experience on which to base my ideas, and that doesn't even count the considerable experience of the rest of the ARB.

The guiding principles for this design include runtime efficiency, flexibility and extensibility. Knowledge of upcoming hardware features played a minor role, but bear in mind that Longs Peak is intended to be implementable on shader model three hardware and newer. We feel this gives developers a larger target audience than had we designed the API around a single generation of hardware which is only just hitting the market (and is only available for a single operating system).

Why did it happen now instead of two years ago? You could just as easily ask why it didn't happen four or six years ago. We are, in effect, breaking backward compatibility for the first time in the 15 year history of OpenGL. This is not a task we undertake lightly, as the burden on application vendors is, in some cases, considerable.

In hindsight I wish we had modernized OpenGL in the 2.0 timeframe. That was the original intent but the time wasn't right, for various reasons I will not address here.

Our goal is not to copy DX10 or any other API. Our goal is to build on many years of experience on both sides of the interface to deliver a forward looking, efficient graphics standard which we can all enjoy for years to come.

Michael Gold
02-27-2007, 07:18 AM
Originally posted by Jon Leech (oddhack):
Michael Gold came up with "template", which we like much more.

Jon is being modest. I'm not sure I was the first person to suggest the name "template", but the concept was actually Jon's idea, which solved a difficult dilemma: how to atomically specify all immutable properties required for object creation while remaining extensible.

k_szczech
02-27-2007, 07:30 AM
It seems my second assumption - that using one object for two purposes is not possible - was true. From another post:


Does this mean that direct rendering to vertex buffers (...) just works?
Not yet...

So I believe this question is still open. Should the OpenGL API allow overlapping objects of different types, either by assigning multiple templates to one object or by creating mixed templates?
It shouldn't be difficult to implement:
1. Create image2D template
2. Create image object
3. Create vertex array template
4. Assign to the same object
When assigning a template to an existing object, the only limitation is the size of the object described by the new template, which cannot occupy more memory than was previously allocated (this could be relaxed in the future).
If you define a vertex3D array over an RGBA8 image object, then that's your problem.

There is one problem - data alignment. An RGB image can be stored as RGB or RGBA depending on the hardware. So it could simply be stated that 4-component objects are guaranteed to overlap correctly, or it could be required that the driver ensures a 1-element-to-1-element mapping and returns an error if it is unable to do so.

elFarto
02-27-2007, 08:29 AM
Why limit yourself to having the vertex array template assigned to one image object? Just have a vertex array 'layout' object. This object would simply state which fields there are and in what order (which is pretty much a struct in C).

Then, when you come to use a buffer as a vertex buffer, you just bind your buffer along with your layout object and you're done. No need for extra associations between the objects.

As for data alignment, this would/should be taken care of by the format object, which should state whether you want the hardware to store the data in a particular format ("thou must only store R, G and B, and they must be stored as floats").
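A rough sketch of how that might look, reusing the glBindVertexBuffer(GLbuffer, GLVBOState) idea from earlier in the thread - every call and enum below is invented for illustration:

/* Entirely hypothetical - these calls and enums are invented to illustrate the
   'layout object' idea, building on names suggested earlier in the thread. */
GLtemplate t = glCreateTemplate(GL_VERTEX_LAYOUT_OBJECT);
glTemplateAttribt_o(t, GL_LAYOUT_FIELD0, position_format);  /* 3 floats */
glTemplateAttribt_o(t, GL_LAYOUT_FIELD1, normal_format);    /* 3 floats */
glTemplateAttribt_o(t, GL_LAYOUT_FIELD2, texcoord_format);  /* 2 floats */
GLVBOState layout = glCreateVertexLayout(t);

/* At draw time: buffer + interpretation in a single bind, with no per-array
   pointer calls and no association stored on the buffer itself. */
glBindVertexBuffer(buffer, layout);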

Regards
elFarto

Michael Gold
02-27-2007, 08:50 AM
Render to vertex array is a difficult problem in general, as the memory layout for rendering often does not match the memory layout for vertex pulling. Image pitch is also an issue; if you are rendering to a 2D image and wish to turn the buffer into a 1D array, it assumes no padding is required for row alignment. Implementing RTVA on such hardware often requires an implicit copy, which is why PBO readback is a decent solution - the copy is under user control, and can be accelerated without a stall.

A better long-term solution is to use vertex texturing. Admittedly slow on older hardware, it's probably the most efficient path for new hardware, particularly where shader hardware is unified between execution units: if it's fast enough to pull multiple texels per fragment, it's certainly fast enough to pull multiple texels per vertex.

Regarding the suggestion of multiple templates per object: this violates the principle of object immutability. All structural and usage properties must be specified atomically at object creation. Only data may be changed thereafter.

k_szczech
02-27-2007, 10:22 AM
this violates the principle of object immutability

I agree.
I simply consider objects in OpenGL to be nothing more than a list of attributes and assigned values, and you can append to a list at any time... Perhaps I should give up all that low-level thinking and leave it to the professionals :D

Korval
02-27-2007, 10:42 AM
BTW, outside of a generally cleaner API, can we expect Longs Peak to provide any additional features that GL 2.1 doesn't, or is that all in Mt Evans?


A better long-term solution is to use vertex texturing.

Even with unified shader architectures, I don't see this being nearly as efficient as actual vertex arrays. After all, vertex array data is pulled through its own cache (a cache specifically designed for this purpose), and everything about the vertex pipeline is ultimately designed around the fast transfer of this kind of data. Doing texture accesses may eventually be faster than a copy-to-array + render operation, but it won't be as fast as rendering directly to the array.

elFarto
02-27-2007, 10:58 AM
Originally posted by Korval:
I don't see this being nearly as efficient as actual vertex arrays.

All Michael is saying is that an implicit r2vb is unlikely to happen because it's not straightforward to implement (and that is one of the points behind the new API). I assume this means we'd still be able to implement r2vb the way it's currently done (with a user-specified copy)?

Regards
elFarto

Zengar
02-27-2007, 11:06 AM
There is still something I don't understand... As you said, r2vb won't work directly. But the Pipeline states



An image buffer is nothing more than a buffer object, which we all know from OpenGL 2.1, coupled with a format to describe the data. In other words, an image object is a formatted buffer object and is treated as a subclass of buffer objects.
According to the quote, this implies, as "we all know from OpenGL 2.1", the ability to render to an RGBA32F image and bind it as a vertex array buffer. It is possible now using VBO and PBO. Or will buffer subclasses be treated differently from actual buffers?

Korval
02-27-2007, 11:13 AM
The API is available for implementing it. What isn't is the backend hardware to actually implement it.

Zengar
02-27-2007, 11:42 AM
Well, then the backend hardware should just do an internal copy, transformation, whatever... This works with PBOs, so I expect all hardware that supports PBOs to be able to bind textures as vertex arrays.

Michael Gold
02-27-2007, 01:06 PM
And as soon as it does, we'll add API support. :D

Seriously, we're trying to get out of the business of making false promises with APIs which don't work as expected.

Korval
02-27-2007, 02:12 PM
Seriously, we're trying to get out of the business of making false promises with APIs which don't work as expected.

Hmm... good point. Best to allow for the eventual possibility but wait to expose it until it's real.

Rob Barris
02-27-2007, 08:24 PM
One of the topics that Michael and I have been discussing at some length (along with the other promoters and contributors in the WG) involves proposed enhancements to the buffer object mechanism.

If any one of you could propose a "top 5 list" of things you either like or dislike about the way VBO works now in OpenGL, could you take a few moments to type it up? Now's a great time to possibly spot any lingering issues that have been overlooked to date.

(Alas I am one of those people that has read a bit of Direct3D code, so I have my own biases, which I am leaving out of this post)

Rob

Rob Barris
02-27-2007, 08:28 PM
Originally posted by Michael Gold:
And as soon as it does, we'll add API support. :D
Hardware provides things like the vertex stream frequency divider:

http://msdn2.microsoft.com/en-us/library/bb173349.aspx

http://msdn2.microsoft.com/en-us/library/bb174602.aspx

Has the topic of hardware assisted instancing come up previously? After VBO enhancements I think this is the next one on my mental hit list.

I would think the exposure of instancing-related hardware capability could be considered complete, if we could express all four techniques in the second link, using OpenGL.

Rob

edit - snarkiness removed

elFarto
02-28-2007, 01:32 AM
The only extension I can think of to buffer objects would be a 'Geometry Buffer Object', which would be like a display list. It would be created from a VBO, the layout of the VBO and the primitive type the VBO is in (GL_TRIANGLES, GL_QUADS, etc.).

This would then allow the driver/hardware to organise the buffer into the most efficient layout. Once it's been created it can't be edited, it just becomes a blob. So you'd end up with a draw command like:

glDrawGBO(gbo);

and maybe:

glDrawInstancedGBO(gbo, 10, ...); // add whatever instancing parameters are needed

Regards
elFarto

elFarto
02-28-2007, 01:41 AM
One more idea I had was a 'Draw Object'. This object would consist of all of the components that could be bound to be drawn, but neatly wrapped up into an object. E.g.

GLtemplate template = glCreateTemplate(GL_DRAW_OBJECT);

glTemplateAttribt_o(template, GL_FRAMEBUFFER, fbo);
glTemplateAttribt_o(template, GL_VERTEXBUFFER, vbo);
//...

GLdrawobject drawObj = glCreateDrawObject(template);

glDraw(drawObj);

I'm not completely sold on the idea; I'm not sure how difficult it would be, or if it would be of any advantage. But it's an idea nonetheless.

Regards
elFarto

Overmind
02-28-2007, 02:30 AM
Just something I don't really understand in the discussion about render to vertex array:

What do we need render to vertex array for, when we can just capture the output of the geometry and vertex shader?

From the pipeline:

Finally, capturing the output from the vertex or geometry shader in a buffer object offers an incredibly powerful mechanism for processing data with the GPU's programmable execution units without the overhead and complications of rasterization.

It seems a much more straightforward way to do what we are now doing with render to vertex array. Just generate the vertex data in a geometry program and store it in a buffer. No need to solve all the problems that result from rasterization, data format conversion and so on ;)

V-man
02-28-2007, 02:35 AM
Originally posted by Rob Barris:
If any one of you could propose a "top 5 list" of things you either like or dislike about the way VBO works now in OpenGL, [...]

I don't like enabling arrays.

Enable vertex array
Enable normal array
Enable texcoord0
bind client texcoord1
Enable texcoord1
bind client texcoord2
Enable texcoord2
bind client texcoord3
Enable texcoord3
bind client texcoord0

Can't all this be collapsed into 1 call?

I store my data in a few formats: 4, I think.
So I have to enable/disable some arrays.
A single call would be more elegant.

Overmind
02-28-2007, 03:21 AM
I think it's pretty obvious, but here it is anyway:

The whole semantics of binding is annoying.

glBindBuffer has no immediate effect; the current array is only affected when gl*Pointer is called again.

It should be the other way round: I'd like to specify the format (the equivalent of the current gl*Pointer) just once at creation time. Then, at usage time, a single glBindBuffer should be enough.
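In current GL 2.1 terms, the gotcha looks like this (a minimal sketch; the buffer names and vertexCount are placeholders):

/* Current GL 2.1 behavior being complained about: the bind by itself changes
   nothing for vertex pulling - only the gl*Pointer call latches the buffer. */
glBindBuffer(GL_ARRAY_BUFFER, positionVBO);
glVertexPointer(3, GL_FLOAT, 0, (const GLvoid *)0);  /* NOW positions come from positionVBO */
glEnableClientState(GL_VERTEX_ARRAY);

glBindBuffer(GL_ARRAY_BUFFER, otherVBO);             /* no effect on the position array...  */
glDrawArrays(GL_TRIANGLES, 0, vertexCount);          /* ...still pulls from positionVBO     */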

Michael Gold
02-28-2007, 04:58 AM
Originally posted by Rob Barris:

Originally posted by Michael Gold:
And as soon as it does, we'll add API support. :D
Hardware provides things like the vertex stream frequency divider:

Let me qualify my remark. We add core API support for hardware functionality which has long-term utility. Short-term hacks with no future do not belong in the core. For better or worse, another API reinvents itself every two years, and throws away features that didn't work as expected. The OpenGL philosophy differs; we don't like breaking backward compatibility, and Longs Peak marks the first and only major discontinuity in fifteen years. We don't wish to make this a habit. Experimental functionality belongs in experimental extensions (e.g. NVX_instanced_arrays).

I would think the exposure of instancing-related hardware capability could be considered complete, if we could express all four techniques in the second link, using OpenGL.

Just for you Rob, I clicked the link and have forever tainted myself. Yikes! I'm so glad I work on OpenGL. :D

I can't say that I fully understand the terminology on that page, so let me describe the instancing options we have considered for OpenGL.

1) Frequency divisor instancing. This is what we attempted in NVX_instanced_arrays. Feedback we received was that this fixed-function approach is extremely limiting and inflexible. Performance was disappointing; there was a small upside in some cases but a larger downside in others. We consider this a dead end and pulled the extension.

2) Pseudo-instancing. Vertex arrays are enabled for the per-vertex values, and the draw call is invoked in a loop, setting per-instance values with immediate-mode calls. This is reasonably efficient and requires no API support (a sketch follows below).

3) Shader instancing. The vertex shader has access to a vertexID and an instanceID which can be used to fetch per-vertex or per-instance values from a vertex texture or a buffer object (PaBO or TexBO), giving maximum flexibility and excellent performance on SM4 hardware. This technique can be emulated on SM3 hardware by explicitly passing a vertexID and instanceID to the shader, although this is less efficient. This seems like the best long-term technique and is available today.
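A minimal sketch of option 2 in today's GL 2.0 terms; the attribute slot, loop bounds and data names are placeholders:

/* Pseudo-instancing sketch: per-vertex data comes from a VBO-backed array,
   per-instance data is set with an immediate-mode attribute call each iteration.
   'meshVBO', 'vertexCount', 'numInstances' and 'instancePos' are placeholders. */
glBindBuffer(GL_ARRAY_BUFFER, meshVBO);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, (const GLvoid *)0);
glEnableVertexAttribArray(0);

for (int i = 0; i < numInstances; ++i) {
    glVertexAttrib4fv(4, instancePos[i]);        /* per-instance value on generic attrib 4 */
    glDrawArrays(GL_TRIANGLES, 0, vertexCount);  /* one draw call per instance             */
}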

Michael Gold
02-28-2007, 05:06 AM
Originally posted by Overmind:
The whole semantics of binding is annoying.

Originally posted by V-man:
I don't like enabling arrays.

Agreed on both counts.

Can't all this be collapsed into 1 call?

Yes! While details are still being discussed, we plan to offer a VAO, or vertex array object, which encapsulates the many existing pieces of VA state into a single object, thus collapsing a large number of discrete calls into one.

Question for the community: would anyone be terribly upset if we dropped support for client arrays and required all vertex attribs to live in buffer objects? At a minimum we'd like to drop support for a mix of client and server arrays in a single state vector; better still is to drop support for client arrays altogether. We need feedback on these ideas.

RigidBody
02-28-2007, 05:18 AM
michael,

I've made an app for Linux systems which has to run over the network (it runs locally too, but it still needs network capability).

The first release used display lists, which showed good performance.

I added VBO support to improve performance; it worked better, but of course only when run locally. VBO obviously works only with a direct GL context, and a remote computer cannot create a direct GL context on a local computer/X server.

Then I read somewhere that display lists will not be supported in future OpenGL, so I tried vertex arrays. The performance was worse than bad. Apparently the vertex array has to be sent from the remote to the local computer for each frame (a display list, I guess, is stored on the local computer's X server - so only a simple glCallList has to be sent over the network, which minimizes network traffic).

Is network capability taken into account at all for further OpenGL development?

elFarto
02-28-2007, 05:37 AM
Originally posted by Michael Gold:
would anyone be terribly upset if we dropped support for client arrays and required all vertex attribs to live in buffer objects?

I'm guessing the only reason people use them is because they're there.

As long as it's simple to update the buffer with new data (the only reason I see to use client-side arrays is if you're changing the data every frame), I see no problem in removing client-side arrays.


Originally posted by RigidBody:
...VBO obviously works only with a direct GL context...

I believe VBOs and PBOs lack GLX support. There is some information on this in the PBO extension document. I really can't see them dropping support for GLX anytime soon.

Regards
elFarto

Michael Gold
02-28-2007, 06:00 AM
VBO obviously works only with a direct GL context

VBOs are intended to work on indirect renderers. As I've been toiling primarily under Windows for the last decade, I'm not familiar with the GLX limitations to which our flatulent friend alludes.

As long as it's simple to update the buffer with new data, I see no problem in removing client-side arrays.

Is BufferSubData easy enough? Rob and I have been debating other mechanisms for efficient updates on a direct renderer, but it's not clear what will come of this.
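For reference, the update path in question as it stands in GL 1.5/2.1 (buffer name, sizes and data are placeholders):

/* Allocate once, then replace a sub-range each time the data changes.
   'vbo', 'totalSize', 'offsetBytes', 'updateSize' and 'newData' are placeholders. */
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, totalSize, NULL, GL_DYNAMIC_DRAW);          /* initial allocation */
/* ...later, per update... */
glBufferSubData(GL_ARRAY_BUFFER, offsetBytes, updateSize, newData);       /* copy new data to the server */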

elFarto
02-28-2007, 06:15 AM
Originally posted by Michael Gold:
VBO obviously works only with a direct GL context

VBOs are intended to work on indirect renderers. As I've been toiling primarily under Windows for the last decade, I'm not familiar with the GLX limitations to which our flatulent friend alludes.


ARB_pixel_buffer_object specification, GLX Protocol section:
ARB_vertex_buffer_object has similar issues and lacks specified GLX protocol for its functionality.

I take this to imply that no GLX protocol has been designed for these two extensions, and therefore they cannot be used on a remote connection.


Originally posted by Michael Gold:

As long as it's simple to update the buffer with new data, I see no problem in removing client-side arrays.

Is BufferSubData easy enough?

Yep :D

Regards
elFarto

Zengar
02-28-2007, 06:31 AM
Client side arrays should vanish :-) I use .NET, and they really get in the way...

Overmind
02-28-2007, 06:36 AM
I don't think client side arrays are necessary. The data has to be sent to the server anyway, and forcing the application writer to do it explicitly prevents nasty performance surprises.


Is BufferSubData easy enough?

For streaming data, MapBuffer would be nice, too ;)

At least a write-only MapBuffer; I can see there might be performance trouble with read-write mappings.

Of course, for an indirect renderer the copy still has to happen, but the API would still work (just allocate a temporary buffer on Map and copy on Unmap).
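In current GL 1.5/2.1 terms, the write-only path being asked for looks roughly like this (buffer and data names are placeholders; the orphaning call is a common idiom, not a requirement):

/* Write-only mapping for streaming. 'streamVBO', 'bufSize' and 'srcVerts' are placeholders. */
glBindBuffer(GL_ARRAY_BUFFER, streamVBO);
glBufferData(GL_ARRAY_BUFFER, bufSize, NULL, GL_STREAM_DRAW);   /* orphan old storage to avoid stalling on in-flight draws */
void *dst = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
if (dst) {
    memcpy(dst, srcVerts, bufSize);                             /* write, never read, the mapped memory */
    glUnmapBuffer(GL_ARRAY_BUFFER);                             /* may return GL_FALSE if the data store was lost */
}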

RigidBody
02-28-2007, 06:38 AM
Originally posted by elFarto:
I take this to imply that no GLX protocol has been designed for these two extensions, and therefore they cannot be used on a remote connection.

I doubt that the problem is incompleteness of the GLX protocol. I rather think that a VBO needs access to the graphics hardware, and for security reasons the X server denies hardware access from a remote computer. Remember, Linux is a safe OS ;)

Anyway, is there any chance there will be a way to store static vertex data on the server, so that it has to be transferred only once over the network? This is possible with a display list, but rumour has it that there will be no display lists anymore.

Cyranose
02-28-2007, 06:45 AM
Michael, dropping client arrays is good. The only reason(s) I keep them now is as a fallback for the less "compliant" HW+drivers out there. Making the new GL spec narrower should theoretically help those other companies make better drivers. :)

But also, there's a big win for MP/bus safe programming when we keep our app copy of vertex data separate from the render-side copy. An explicit Map+BufferSubData is a good sync mechanism, better for me than VAR (fast as it may be).

RigidBody
02-28-2007, 06:51 AM
Originally posted by Michael Gold:
VBOs are intended to work on indirect renderers. As I've been toiling primarily under Windows for the last decade, I'm not familiar with the GLX limitations to which our flatulent friend alludes.

Intended - indeed, and I hope the intention will result in a working solution ;) I took a look at the GL_ARB_vertex_buffer_object spec in the extension registry, and it says:


How does indirect rendering work?

It is not currently specified, but the basic planned outline is as follows. [...]

Well, at least there's some hope that VBOs will be implemented for indirect rendering.

Michael Gold
02-28-2007, 07:03 AM
Originally posted by RigidBody:
Anyway, is there any chance there will be a way to store static vertex data on the server, so that it has to be transferred only once over the network? This is possible with a display list, but rumour has it that there will be no display lists anymore.

This is how buffer objects are intended to work. BufferSubData obviously copies the data to the server. MapBuffer might not work so well, however. A write-only MapSubBuffer sounds like a good idea to me, but on an indirect renderer this is no different from BufferSubData.

It's not clear what will happen with display lists. They may or may not go away, but in any case they won't work like they do today.

ccbrianf
02-28-2007, 07:10 AM
Originally posted by Michael Gold:
A write-only MapSubBuffer sounds like a good idea to me, but on an indirect renderer this is no different from BufferSubData.

Edit: Sorry, you are correct for an indirect renderer.

BufferSubData requires an extra copy by the CPU to system memory that a write-only mapping does not. Memory writes are expensive when processing large amounts of data and should be optimized away when possible. A direct write is a real performance saving.

Michael Gold
02-28-2007, 07:52 AM
A write-only mapping may provide a piece of uncached, write-combined system memory from which the GPU may DMA directly to a video memory buffer object. A direct write by the CPU across the bus to video memory is often much slower than performing this DMA. It depends greatly on the CPU, GPU and chipset.

Zengar
02-28-2007, 07:53 AM
What about immediate mode? Will it remain? What I like a lot is the attribute-registers paradigm - will it stay? If you plan on leaving immediate mode in the API, I would suggest an additional command for vertex submission, not like it is now with glVertex(...) or attribute 0.

I would like to see immediate mode in Longs Peak, to be honest; it should put no strain on driver developers but can be very handy. What I always liked about GL was its verbosity combined with ease of use, with immediate mode and attribute registers being an important part of that.

Michael Gold
02-28-2007, 08:01 AM
Immediate mode remains an open question. While it's clearly easy to use, it has not been the performance path for quite a few years. I could imagine a client library which emulates immediate mode and builds batches into vertex arrays. Would people find this objectionable? You get to keep your ease of use, and the driver need not try optimizing the unoptimizable. One ramification of this might be an upper limit on the number of vertices allowed between begin/end.

One goal of this cleanup is to eliminate gratuitous flexibility. When we provide too many mechanisms for accomplishing the same task, it's less clear to developers which path to choose, and it's more difficult for implementors to optimize all the paths.

Rob Barris
02-28-2007, 08:46 AM
Originally posted by Michael Gold:
3) Shader instancing. The vertex shader has access to a vertexID and an instanceID which can be used to fetch per-vertex or per-instance values from a vertex texture or a buffer object (PaBO or TexBO), giving maximum flexibility and excellent performance on SM4 hardware. This technique can be emulated on SM3 hardware by explicitly passing a vertexID and instanceID to the shader, although this is less efficient. This seems like the best long-term technique and is available today.

This is encouraging, but as you can imagine I'm most interested in accelerating older SM3 hardware, which may offer the frequency divisor but not the SM4 capabilities.

Do current (SM4) and future hardware designs still retain the frequency divisor circuitry for upward compatibility?

k_szczech
02-28-2007, 08:50 AM
we can just capture the output of the geometry and vertex shader

Let's assume I have a water surface based on a simple heightmap. Now I want to add a circular wave to it.
If I can render to a vertex array, then I simply render one round shape with a triangle strip and blend it into the texture which is also my vertex array.

Currently we have to copy the square portion of the texture that has been modified into the vertex array. Kind of a waste of memory bandwidth, especially when the circle becomes large (remember that it's empty inside).

Another option is to use VTF in the vertex shader. We can send a 2D vertex array and fetch the third coordinate from a texture - a bit slower than just using a 3D vertex array, I think, but still OK.

The option you suggest also requires VTF, but only once - during the update. To implement it, however, one has to build a list of the polygons in the water surface affected by that update and render only those polygons. You could render one quad covering the affected area and use a geometry shader to produce more polygons, intercept the output and write it back to the vertex array in the proper location. It will work, but it's a rather complex approach to a simple problem.

That's why my opinion is that direct rendering to a vertex array is a good thing. There are ways around it (like VTF), but none of them beats direct render to VA (VTF comes very close).

I think you'll agree that the only thing standing against R2VA is hardware limitations. I believe overlapping an RGBA32F texture with a 4D vertex array is not a problem as long as the texture's width is a power of two (perhaps in the future it could also be 'at least 2', 'at least 4' or so). I can see why such a solution shouldn't be in the core API, but perhaps NVIDIA and ATI could support an extension that defines a new template type allowing just this one format of data to be overlapped. If for some reason (data alignment, texture stored as blocks rather than rows/columns, wrong row pitch or anything else) creating such an object isn't possible, return GL_INVALID_VALUE. Even if future GPUs no longer supported this extension, that would be exactly what we have now - use the extension if available or fall back to the core API.

So perhaps this is no longer a "Longs Peak" discussion, but a suggestion for an extension.


would anyone be terribly upset if we dropped support for client arrays and required all vertex attribs to live in buffer objects?

The only mixing I'm doing right now is having all vertex attributes in a VBO and passing the index array from the client using glDrawElements - it's useful for frustum culling. I was also using client-side arrays for meshes that get updated every frame, but in that case I could just upload the array once every frame, which is more or less what was actually happening anyway. No, I won't be upset :)

Korval
02-28-2007, 09:45 AM
If any one of you could propose a "top 5 list" of things you either like or dislike about the way VBO works now in OpenGL, could you take a few moments to type it up? Now's a great time to possibly spot any lingering issues that have been overlooked to date.

Sure.

1: Buffer Object hints. I understand the idea behind hints for behavior specification. However, for constructs as performance critical as buffer objects, a greater degree of specificity is paramount. The spec tries to explain what the hints expect from the user with regard to how it will be used, but even so, it's hard to get consistent cross-platform behavior with anything that isn't, "I only intend to draw from this buffer."

2: Lack of offset functionality. This has been often discussed here, but in several different ways, so let me be clear about what I'm asking for. When you call glDraw* (or its Longs Peak equivalent) to draw with the currently bound vertex arrays (or its Longs Peak equivalent), you should be able to pass an offset that is added to each index before fetching from the buffer object. D3D has this, so it should be no problem for GL to implement it too.

3: Vertex Attribute 0 and shaders. Not fully virtualizing the concept of attribute 0 was required in order to support glBegin/End and vertex shaders. That is, if you're not using the standard vertex attributes, you have to specify that one of the generic ones is attribute 0. We don't need that in Longs Peak, and I want the vestiges of it gone.

4: Map/Unmap. The idea of buffer mapping not working out (ie, the data was not uploaded) is not good. It makes the entire concept seem incredibly unreliable. If it can't be relied upon, it shouldn't be in the API, and if it can, then it should not be able to fail.

5: There is no 5.


would anyone be terribly upset if we dropped support for client arrays and required all vertex attribs to live in buffer objects?

I'd be terribly upset if you didn't. There should be no need for such an API under Longs Peak.


I think you'll agree that the only thing standing against R2VA is hardware limitations.

I don't think you're understanding what capturing the result of pre-rasterization data means.

It gives you render-to-vertex-array, but without the problem of having to actually pretend a generic buffer object is an image (or vice-versa). You can write any vertex data you want to it. Through the use of geometry shaders, you can generate extra vertices. You can access textures. You have every tool you need to do what you want.

And the best part: it actually exists. Unlike render-to-vertex-array, which doesn't exist.

Rob Barris
02-28-2007, 09:59 AM
Can you elaborate on what kind of buffer hints exactly? (#1 on your list)

Jon Leech (oddhack)
02-28-2007, 10:10 AM
Originally posted by elFarto:
I believe VBOs and PBOs lack GLX support. There is some information on this in the PBO extension document.

You are correct, GLX protocol has not been defined for them yet. It is tricky, not least because of the mixed client+server VBO array case. Ian Romanick at IBM has been working on this for Mesa and X.org, and has at least reached the point of a working prototype (last I pinged him about it was about 6 months ago). When he's comfortable that their implementation is working robustly, the ARB will sign off on the protocol. All the X.org/DRI-based drivers should pick this up as soon as it goes into a public X.org distribution, and I'd expect NVIDIA will also support it quickly at that point.

So basically, people who need VBOs in indirect rendering cases should make their case over on the X.org and/or Mesa developer lists.

barthold
02-28-2007, 10:21 AM
Originally posted by RigidBody:
Then I read somewhere that display lists will not be supported in future OpenGL,

Please don't assume that. We're not done with the display list design. Display lists do have utility, especially when rendering multiple primitives with only a few vertices per primitive in one glCallList() call. Nothing in OpenGL beats the efficiency of rendering such batches.

Some ideas thrown around (and again, this is not decided upon at all yet) for display lists:

1) Limit display lists to only store geometry data. I.e. no more storing GL state commands in a display list.
2) Get rid of the COMPILE_AND_EXECUTE mode. Only have the COMPILE mode.
3) Require that a display list stores a complete primitive. For example, in GL 2 you can write the following code:

glBegin()
glNewList()
<bunch of vertex data>
glEnd()
glEndList()

Where the glBegin is outside of the display list. The idea is that the glBegin() is required to be inside the display list as well, effectively fixing the primitive type.
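In GL 2.1 terms, the restricted form would look like the following - a sketch of the proposal, not decided API, with 'list' being a name obtained from glGenLists and the vertex data elided:

/* Sketch of the proposed rules in today's terms: COMPILE only (idea #2), and the
   Begin/End pair captured entirely inside the list (idea #3). */
glNewList(list, GL_COMPILE);
    glBegin(GL_TRIANGLES);        /* primitive type is fixed at compile time */
        /* ...vertex data... */
    glEnd();
glEndList();

glCallList(list);                 /* execution is always a separate step */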

Barthold

Michael Gold
02-28-2007, 10:23 AM
Originally posted by Rob Barris:
Do current (SM4) and future hardware designs still retain the frequency divisor circuitry for upward compatibility?

Naturally, it's completely different from the SM3 implementation. This kind of "oops!" in API design is exactly what we don't want to copy. The SM3 design wasn't very useful, and the SM4 design is uninteresting. In truth, NVX_instanced_arrays was implemented in software for GeForce 7xxx, anticipating the SM4 functionality. In the end it proved lackluster enough that we killed it. Despite the man-months I personally invested in writing and implementing the spec, I wasn't sorry to see it go.

Rob Barris
02-28-2007, 10:29 AM
The SM3 design (stream freq divisor) is currently providing a 4:1 reduction in CPU-sourced particle system bandwidth (streamed verts) for a game we are working on, and from that POV it's very useful.

k_szczech
02-28-2007, 10:47 AM
I don't think you're understanding what capturing the result of pre-rasterization data means.

GL_NV_transform_feedback? Just guessing... :D
Just think of the example I mentioned. How much coding do you need to get it working this way?

OK, I won't insist. I'm still convinced it would be better if we had it, but I also agree it's little gain over what we have now (but still a gain :) ).

Zengar
02-28-2007, 10:55 AM
I might say that my "top 5" list overlaps with Korval's :-) What I would like to add: please remove selection and feedback mode, but I guess you will anyway :-)

k_szczech
02-28-2007, 11:12 AM
1) Limit display lists to only store geometry data. I.e. no more storing GL state commands in a display list.
2) Get rid of the COMPILE_AND_EXECUTE mode. Only have the COMPILE mode.
3) Require that a display list stores a complete primitive.

#2 and #3 seem obvious.
As for #1, I think you're in for some tough decisions. I'm for requiring glBegin/glEnd to be paired inside a list and disallowing glCallList inside a glBegin/glEnd block. As for attributes - sometimes you want to have glMultiTexCoord(GL_TEXTURE0...) in the list, but glMultiTexCoord(GL_TEXTURE1...) outside the list. And there is also current state. Let's say I'm rendering an object that has a few polygons that use blending - should I be forced to make two display lists out of it?

Korval
02-28-2007, 11:54 AM
Can you elaborate on what kind of buffer hints exactly? (#1 on your list)

Um, all of them? You know, STREAM/STATIC/DYNAMIC_DRAW/READ/COPY_ARB?

I want the hints to be requirements, for them to actually mean something strict to the API.

As it stands now, you can use a buffer set to STATIC_DRAW_ARB just like you would one set to STREAM_READ_ARB. They are usage hints, rather than something firm.

The main problem I have is not knowing which usage hint to use for data that may not change often, but does occasionally. Like if you're building a streaming terrain engine, where each sector is the same size. According to the spec, it's not STATIC_DRAW, because you're going to occasionally (at most, once every 3-5 seconds) change the data. But DYNAMIC_DRAW doesn't have the performance characteristics of STATIC_DRAW, even in this situation.

God only knows what the right usage hint for a buffer meant to be used for occasional geometry readback is.

The need to have to fiddle with hints to get the right performance, and to have to do so cross-platform, is the problem.
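For reference, the choice being described, in GL 1.5/2.1 terms - one of these per allocation; 'size' and 'sectorData' are placeholders:

/* The usage hints in question - same call, different promise, and the spec
   doesn't say which promise fits "changes once every few seconds". */
glBufferData(GL_ARRAY_BUFFER, size, sectorData, GL_STATIC_DRAW);   /* specify once, draw many times         */
glBufferData(GL_ARRAY_BUFFER, size, sectorData, GL_DYNAMIC_DRAW);  /* respecify repeatedly, draw many times */
glBufferData(GL_ARRAY_BUFFER, size, sectorData, GL_STREAM_DRAW);   /* specify once, draw a few times        */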


As for #1, I think you're in for some tough decisions.

Considering that Longs Peak is all about making GL state commands go away entirely (or almost entirely), I don't see the problem. States become objects, which in GL 2.1 parlance are tiny display lists. A geometry-only display list would just be an alternative to using VBOs.

Michael Gold
02-28-2007, 12:06 PM
Originally posted by Korval:
I want the hints to be requirements, for them to actually mean something strict to the API.

Indeed. The wrong hint is worse than no hint.

k_szczech
02-28-2007, 12:21 PM
I hate to be a pain in the... But it doesn't matter what we use to change state.
Consider the example I mentioned - an object with a few polygons requiring alpha blending to be enabled. It's a waste to have blending enabled on all polygons. If a display list can only represent vertex data, then we have to split this object into two display lists.
It's true that when using a VBO you have to make two draw calls in such a case. But why should this also be true for display lists? Note the programming issues - when your object can be drawn without changing state, and later you modify it so that a state change is required, you have to modify your code to manage two display lists.
I believe state changes are something that can be optimized, too. AFAIK display lists can also perform frustum culling - why do it twice for one object?
This is why I think that #1 requires a bit more consideration than #2 and #3, but I'm not saying it's a bad idea.

Korval
02-28-2007, 12:49 PM
But why should this also be true for display lists?

Because the purpose of the Longs Peak display list should not be to represent a bunch of GL function calls. It should be a new construct, designed as a mechanism for allowing a driver to optimize a group of vertices that will be used to render something. Attempting to do too much was what made the old display list a failure.

The old purpose of a display list is no longer needed, with the liberal use of objects.


I believe state changes are something that can be optimized, too.

Well, I'm willing to listen to the ARB when they say that it is not necessary ;)


AFAIK display lists can also perform frustum culling - why do it twice for one object?

With the (presumed) removal of the per-vertex state that display list culling relied upon, any Longs Peak display list will also not have frustum culling.

k_szczech
02-28-2007, 01:19 PM
Well, perhaps I'm outdated here :) One of the reasons I think state changes could be optimized in a display list is that after you change state you can end up with something that cannot run in hardware, and the driver must verify that. Perhaps the new rendering contexts will have stricter requirements in this regard. Guess I'll just wait and see.
Thanks for your patience.

Michael Gold
02-28-2007, 01:22 PM
The new object model attempts to address state changes in a manner more efficient than display lists ever could.

Flavious
02-28-2007, 01:35 PM
Question for the community: would anyone be terribly upset if we dropped support for client arrays and required all vertex attribs to live in buffer objects?

I'd be delighted.

And on a side note, I'm really very fond of the VertexAttrib* API in 2.0. I think once we're rid of that pesky attribute 0 business, we'll be home free on that front (it's actually pretty easy to get around, but pesky nonetheless :) ).

Jan
02-28-2007, 03:07 PM
* Of course VBOs should be easier to bind, as stated before.

* I would be happy to see client-side arrays go away, at last.

* I would also like to see display lists become something much more focused/simpler, for ONE special purpose. And please, no state-changes inside them.

* In my opinion the shader-based instancing method is best.

* I really want to be able to pass an index-offset to a draw-call (as discussed in other threads).

* For VBOs I think glBufferSubData (or similar) usually works very well, but mapping the buffer for write-only access would be very useful indeed. Maybe with an additional hint that the whole existing content of the buffer can be discarded, and only what will be written needs to exist in the buffer afterwards.

* Better hints for VBOs would be great.

* I'd like to see a "black box VBO". For typical data, I don't mind what the layout of the vertex array is. I just want to pass my position/normal/color/... data to the GPU; I don't need to control in which order they are stored and, most of all, I don't want to think about how to arrange/interleave data to make it fit well into memory. If possible, let the GPU decide on that.

* I'd like to see a trimmed-down list of texture formats. And please, only formats that REALLY make sense. And I'd like to see more useful compressed texture formats that are not vendor-specific.

* Also, I'd like to see compressed vertex arrays (normals), if that makes sense.

* In my opinion the stream-out method, as D3D10 features it, is the best method for R2VB; I don't like vertex textures, and rendering to an image/vertex hybrid format is just ugly. And admit it, it will never be supported on all hardware equally.

* I DON'T want to see immediate mode in GL anymore. In my opinion immediate mode should be put into GLU or something similar. I DO want to be able to use immediate mode - it is just the most practical thing about OpenGL - but I think it should be "outsourced".

* I'd like to have a completely new and easy way to set up my OpenGL window; it should be more or less the same on every OS, encapsulating all the nasty bits inside a DLL.

That's just from the top of my head, there's certainly much more.

I appreciate that the community is asked for feedback; that's how it should be.

Jan.

Korval
02-28-2007, 03:27 PM
I'd like to see a "black box VBO". For typical data, i don't mind how the layout of the vertex-array is. I just want to pass my position/normal/color/.. data to the GPU, i don't need to control in which order they are stored and most of all, i don't want to think about how to arrange/interleave data, to make it fit well into memory. If possible, let the GPU decide on that.
Wouldn't that be, pretty much, what a geometry-only display list is?

Not that I'm against it, mind you.

V-man
02-28-2007, 06:04 PM
Originally posted by Overmind:
I think it's pretty obvious, but here it is anyway:

The whole semantics of binding is annoying.

glBindBuffer has no immediate effect, the current array is only affected when gl*Pointer is called again.

It should be the other way round, I'd like to specify the format (the equivalent of the current gl*Pointer) just once at creation time. Then on usage a single glBindBuffer should be enough. We have to be able to offset into the array since I'm using 16 bit indices, so those gl***Pointer calls become necessary; however, I prefer a single call rather than 9 or 10.
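A minimal sketch of what that could look like - every name here is invented for illustration, it's not from any actual proposal:

/* hypothetical API: describe the layout once, at creation time */
GLuint fmt = glCreateVertexFormat();                   /* invented */
glVertexFormatAttrib(fmt, 0, 3, GL_FLOAT,  0, 32);     /* position */
glVertexFormatAttrib(fmt, 1, 3, GL_FLOAT, 12, 32);     /* normal   */
glVertexFormatAttrib(fmt, 2, 2, GL_FLOAT, 24, 32);     /* texcoord */

GLuint vbo = glCreateFormattedBuffer(fmt, size, data); /* invented */

/* per draw: one bind instead of 9 or 10 gl*Pointer calls */
glBindFormattedBuffer(vbo);                            /* invented */
glDrawElements(GL_TRIANGLES, count, GL_UNSIGNED_SHORT, indices);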

Michael Gold
02-28-2007, 07:24 PM
Why is it important to offset into an array? Not saying we won't support it but I'm curious. If you are storing multiple batches of data in a single buffer, is this because of the cost of switching buffers? If switching was fast, would offsetting still be important?

Korval
02-28-2007, 08:48 PM
Why is it important to offset into an array?
Some research:

Thread 1 (http://www.opengl.org/discussion_boards/ubb/ultimatebb.php?ubb=get_topic;f=3;t=014831;p=2) (pages 2&3)
Thread 2 (http://www.opengl.org/discussion_boards/ubb/ultimatebb.php?ubb=get_topic;f=3;t=012219)
Thread 3 (http://www.opengl.org/discussion_boards/ubb/ultimatebb.php?ubb=get_topic;f=3;t=014659)

Pretty much every conceivable thing about this particular feature has been said.

We don't need this functionality if you can guarantee the following from all implementations:

1: Small VBOs have a negligible cost (memory-wise)

2: Switching from one set of buffer objects to another set for rendering (if rendering the same vertex attributes) has a truly negligible performance cost compared to doing an offset. That is, the binding time is comparable to having an offset.

Otherwise there's a valid argument for having the feature.

Furthermore, we've never gotten a straight answer from any IHV on this issue: D3D has had it for quite some time. Why the hesitancy to expose it through GL? I mean, the hardware must already have the ability to pull it off one way or another.

Flavious
02-28-2007, 08:56 PM
If switching was fast, would offsetting still be important?
I think we'd all rather have fast switching.

Let's treat the disease, not the symptoms ;)

Cheers

Flavious
02-28-2007, 09:01 PM
D3D has had it for quite some time.
Call overhead in D3D is a bit heftier than it is in OpenGL.

Cheers

Flavious
02-28-2007, 09:14 PM
We have to be able to offset into the array since I'm using 16 bit indices, so those gl***Pointer calls become necessary; however, I prefer a single call rather than 9 or 10.
Would the driver simply iterate over some internal vertex attribute format and bind streams, or would it be something else entirely? A vertex format object would seem to fit the bill in the former case.

Though I think in terms of setting attribute pointers, the VertexAttrib API is awesome, despite the (meager) overhead of a few calls.

But I agree that in the case of APIs, less is more :)

Cheers

RigidBody
02-28-2007, 11:25 PM
Originally posted by barthold:
We're not done with the display list design.
Nice to hear that. Display lists are just great. You can accelerate existing code by just inserting two commands.

btw: i have never used GL_COMPILE_AND_EXECUTE ;)

Jan
03-01-2007, 12:40 AM
About index-offsetting: It is all said in the threads Korval gave links to. But even if switching buffers were really really fast (which i wouldn't mind ;-) ), it would mean that i needed to slice my data into several hundred buffers, just to reuse the same indices on different parts of the data.

However, slicing my data and putting it into many buffers has two drawbacks:
* I can no longer operate on ALL the data at once; i need to iterate over many buffers just to do one basic operation on all of it.
* I need to manage many buffers inside the application, in contrast to just having one buffer and calculating the offset when needed.

So, even if switching buffers was really fast, with this approach you would still increase the workload of the CPU AND possibly make things more complicated for the programmer.


In the end, it would be best if we just had this option; it would solve some important issues and it doesn't introduce new ones, as slicing the data would.

And, as said many times before: No one ever told us what the problem with this feature is. Everyone just asks "do you really need this??", no one says whether there are any issues regarding it. If there are issues with it, one could have an additional draw-command for it, so that the hardware really only has to reset the offset when the user actually requested it, using a dedicated draw-command.

Jan.

Jan
03-01-2007, 12:47 AM
Originally posted by RigidBody:

btw: i have never used GL_COMPILE_AND_EXECUTE ;)
Me neither.

elFarto
03-01-2007, 12:55 AM
Originally posted by Jan:
I'd like to see a "black box VBO".
You mean, like the 'GBO' method I stated earlier :D . To create a GBO you'd need a VBO, a VAO (Vertex Array Object, see Michael's last post on the first page) and the type of polygon it is (GL_TRIANGLE, GL_QUAD, etc...).

This covers (what I believe) is one of the more common usages of VBOs and display lists, static geometry.

Regards
elFarto

k_szczech
03-01-2007, 02:13 AM
About that "black box VBO" - I think it's not that different from what we have now. Look at textures - you specify internal format and format of data you pass from the client side.
With VBO you only specify format of data on client side, which you must do so the driver knows how to interpret data you pass.

Compressed data in vertex arrays? Good idea, but it's already possible with vertex textures. Although compressed vertex arrays would be more efficient, I think.

V-man
03-01-2007, 05:15 AM
Originally posted by Michael Gold:
Why is it important to offset into an array? Not saying we won't support it but I'm curious. If you are storing multiple batches of data in a single buffer, is this because of the cost of switching buffers? If switching was fast, would offsetting still be important?
Yes, multiple objects in one VBO. All those objects would have the same vertex format.
In parallel, the same happens to the indices.
It reduces GL calls greatly.
I think everyone has objects with 100 vertices or some such low number.

I don't know what Doom 3 and other big engines do. I guess Doom 3 is quite old now.
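For reference, this is roughly how I pack it today with GL 2.x (sketch; assumes a 32-byte interleaved vertex and a small per-mesh record holding byte offsets into the shared buffers; client-state enables omitted). Note that the indices stored in the IBO must already be rebased so that index 0 refers to the vertex at vertex_offset - which is exactly the annoyance an index offset would remove:

#define BUFFER_OFFSET(i) ((const GLvoid*)((const char*)NULL + (i)))

/* one big VBO + one big index buffer shared by many small meshes */
glBindBuffer(GL_ARRAY_BUFFER, big_vbo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, big_ibo);

glVertexPointer(3, GL_FLOAT, 32, BUFFER_OFFSET(mesh->vertex_offset));
glNormalPointer(   GL_FLOAT, 32, BUFFER_OFFSET(mesh->vertex_offset + 12));

glDrawElements(GL_TRIANGLES, mesh->index_count, GL_UNSIGNED_SHORT,
               BUFFER_OFFSET(mesh->index_offset));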

Michael Gold
03-01-2007, 05:28 AM
Originally posted by Jan:
About index-offsetting: It is all said in the threads Korval gave links to. But even if switching buffers would be really really fast (which i wouldn't mind ;-) ), it then would mean, that i needed to slice my data into several hundreds of buffers, just to reuse the same indices on different parts of the data.
OK, I think two different concepts are being confused here. One is the ability to offset the base of a VBO (which you can do today with the *Pointer calls). The other is the ability to add an offset to the index values. As I understand it, the former is being used to emulate the latter, is that correct?

V-man
03-01-2007, 05:33 AM
Originally posted by elFarto:

Originally posted by Jan:
I'd like to see a "black box VBO".
You mean, like the 'GBO' method I stated earlier :D . To create a GBO you'd need a VBO, a VAO (Vertex Array Object, see Michael's last post on the first page) and the type of polygon it is (GL_TRIANGLE, GL_QUAD, etc...).

This covers (what I believe) is one of the more common usages of VBOs and display lists, static geometry.

Regards
elFarto
Black box VBO would help for static geometry, but for people who want to stream, it is a problem.
The same can be said for people who want to play back video using glTexSubImage2D.

The easiest solution is to know what format and alignment the GPU wants.
OK, we all know BGRA8 is the fastest for streaming textures.

What's a GBO? This is a geometry shader thing?

Michael Gold
03-01-2007, 05:34 AM
Originally posted by V-man:
In parallel, the same happens to the indices.
To be clear, people are asking for an offset to the indices in the element array, not the base of the arrays, correct? If you can offset the indices, there is no need to offset the array base also: The index offset is used for DrawElements, and DrawArrays has an explicit start parameter. I suppose the state could be defined as offsetting the start of DrawArrays too.

So let me rephrase my question. If we exposed an ElementBase(), is there a need to offset the base of a VBO?

Please understand that I am not suggesting we don't want to do this; I simply wish to understand what is required to accomplish common usage. There are some downsides to offsetting the array base (e.g. hardware alignment constraints).
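To make sure we're talking about the same thing, a hypothetical usage sketch (ElementBase is not a real entry point, just shorthand for the idea under discussion; the mesh_* values stand in for whatever bookkeeping the app keeps):

/* hypothetical: the offset is added to every index fetched from the IBO */
glBindBuffer(GL_ARRAY_BUFFER, big_vbo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, big_ibo);
glElementBase(mesh_first_vertex);           /* e.g. 40000 */
glDrawElements(GL_TRIANGLES, mesh_index_count, GL_UNSIGNED_SHORT,
               (const GLvoid*)0 /* byte offset of this mesh's indices */);
glElementBase(0);                           /* back to the default */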

elFarto
03-01-2007, 05:54 AM
Originally posted by V-man:
What's a GBO? This is a geometry shader thing?
A 'Geometry Buffer Object', it's not actually in OpenGL, it's just an idea I had that I posted on the first page of this thread.

Regards
elFarto

skynet
03-01-2007, 07:01 AM
So let me rephrase my question. If we exposed an ElementBase(), is there a need to offset the base of a VBO?
Yes, you still need the base offset into the VBO. Otherwise it would be almost impossible to store multiple geometries (all differently structured) in one VBO. ElementBase() is only suitable if _all_ vertices inside the VBO have the same structure. Also, how would you set up interleaved data if there's no buffer offset?
Just give us both, the offset and the base element :-)

In our case we use a fixed number of VBOs with fixed size to manage a whole lot of dynamically uploaded geometry. Since each chunk of geometry might have a different layout, we make heavy use of the buffer offset.
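As a concrete example, setting up one interleaved chunk that lives at byte offset 'base' inside a pooled VBO looks roughly like this today (layout assumed for illustration: position/normal/texcoord, 32 bytes per vertex; client-state enables omitted):

glBindBuffer(GL_ARRAY_BUFFER, pool_vbo);
glVertexPointer  (3, GL_FLOAT, 32, (const char*)NULL + base +  0);
glNormalPointer  (   GL_FLOAT, 32, (const char*)NULL + base + 12);
glTexCoordPointer(2, GL_FLOAT, 32, (const char*)NULL + base + 24);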

elFarto
03-01-2007, 07:04 AM
Originally posted by skynet:
In our case we use a fixed number of VBOs with fixed size to manage a whole lot of dynamically uploaded geometry.Why do you not use a single VBO for each uploaded piece of geometry?

Regards
elFarto

Overmind
03-01-2007, 07:09 AM
If we exposed an ElementBase(), is there a need to offset the base of a VBO?
I find the idea of using anything other than zero as the base pointer of a VBO very strange. I think what everyone really wants is an offset to the indices.

But I still have not understood why everyone wants this. The number of GL calls can't possibly be an argument, as changing the offset needs a GL call and binding a different buffer also needs a GL call. Difficulties of managing multiple buffers also sound strange, because the offsets have to be managed somehow, too.

So it really comes down to the following question: If you can switch all active arrays to another VBO with a single call, and this call has the same performance as the hypothetical glSetBaseIndex, would you still need a base index?

And of course the equivalent question for the implementors: Would you be able to provide a method for switching to another VBO with the same format that has the same (or even better) performance as setting a base index?

Overmind
03-01-2007, 07:13 AM
Too slow ;)


Otherwise it would be almost impossible to store multiple geometries (all differently structured) into one VBO.
Not almost, it would be impossible, and that's good. A single VBO should have one format. As I understand it, you are using VBOs like VARs. Memory management is the responsibility of the driver, not the application.

Michael Gold
03-01-2007, 07:20 AM
Originally posted by skynet:
Also, how would you set up interleaved data if there's no buffer offset?
Duh. Note to self: coffee before posting.

No promises on ElementBase, but I'll see what I can do.

Overmind
03-01-2007, 07:28 AM
Duh. Note to self: coffee before posting.
Hehe. Same here :p


Also, how would you set up interleaved data if there's no buffer offset?
I'd say that's part of the format description and therefore should be immutable, but of course you are correct that it's needed.

skynet
03-01-2007, 07:43 AM
I doubt that the OpenGL memory manager would be happy if I started to regularly create and delete several thousand VBOs (of a different size each time). Fragmentation would occur quickly. The other suggestion (using only one huge VBO) doesn't work either... I doubt it would be possible to allocate a single 400 MB VBO :-)

What I'd like to see (in this context, where video RAM is only used as a temporary geometry cache, not long-term storage of geometry) would be to disallow the driver from making shadow copies of the VBO in system memory, even if this would mean dealing with "lost" buffers (which in my case would just be a "cache miss" that I have to cope with already).

Korval
03-01-2007, 09:29 AM
To be clear, people are asking for an offset to the indices in the element array, not the base of the arrays, correct?
I thought I spelled it out about as simply as I could:


When you call glDraw* (or its Longs Peak equivalent) to draw with the currently bound vertex arrays (or its Longs Peak equivalent), you should be able to pass an offset that is added to each index before fetching from the buffer object.
If specified in terms of standard OpenGL 2.1 lingo (i.e., relative to glArrayElement), it would be the following:

glDrawElementsOffset(GLenum mode, GLsizei count, GLenum type, const GLvoid *indices, GLsizei offset);

Functions as glDrawElements, except the array index fetched from the index array will be incremented by 'offset' before being passed to glArrayElement.

That is the functionality being requested.
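Usage would then be trivial; e.g. two meshes that share the same 16-bit index list but live at different places in the vertex buffer (sketch, using the hypothetical entry point above):

/* mesh A occupies vertices [0, N), mesh B occupies [40000, 40000+N) */
glDrawElementsOffset(GL_TRIANGLES, count, GL_UNSIGNED_SHORT, 0, 0);
glDrawElementsOffset(GL_TRIANGLES, count, GL_UNSIGNED_SHORT, 0, 40000);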

Cyranose
03-01-2007, 12:19 PM
Michael, FWIW, I agree with Korval on the usefulness of offsetting. And I'd love bind overhead to be minimized in any event. But I want to take a step back to look at the bigger picture. Maybe this will help.

No matter how fast binding is, I'll always find cases where lots of semi-static objects have common state and should ideally be "compiled" together into bigger vertex arrays. Practically speaking, right now, that means allocating them out of one or more VBO-heaps and even pre-transforming vertices to factor out matrix changes (not as necessary now with some VP magic). The problems always come in when these objects change detail, appear, disappear, etc.. In the case of Second Life, that was a major PITA and the cause of much benchmarking.

What I'd love this new "object" approach to do is abstract and optimize the heap stuff away.

If I have N objects of identical state and vertex layout (but with varying or even dynamic length), there should ideally be a way for me to specify each of those objects as if it was in its own buffer-space (i.e., indices local to the start of the object), but yet pool these into big enough batches for the big win.

I think that's the goal for a lot of us, though most of us are used to doing it the hard way. If you can't tell, I've written the heap thing like three times and I'm kind of done with that. :)

What I think this implies is another concept: lighter-weight than VBO but broader too, in that vertices and indices would be co-managed as "geometry," and where the driver can do the offsetting based on actual runtime allocation within bigger buffers.

So for example, we'd first bind a VBO, which might ideally contain a master template for its full memory/vertex layout (not like DX if possible). As long as the binding API allows me to select which actual attribs (bitmask, list...) I want to enable & use at any given time, that should be flexible enough.

Then, after the VBO is bound, we'd ideally issue the glDrawLWBO(handle) style calls, changing state in-between as needed. The LWBO/GBO would ideally encapsulate the begin/end, index-offset stuff to keep the indices local to the LWBO/GBO verts -- this avoids us having to know where a LWBO/GBO winds up within the bigger buffer. Ideally, each LWBO/GBO can also be resized, with all that implies. And there's nothing wrong with having only one (or zero if some people feel the need) LWBOs per buffer to act like the current method.

It wouldn't kill things if the indices were automatically transformed on definition to make this work. But if there's a built-in index+offset, all the better.

[edit: the reason why I say "no matter how fast binding is" is that there are often times where I'll render the same data once as a monolithic buffer and other times I'll change state in between bits of geometry -- the bigger buffer so far has always been faster.]

Jan
03-01-2007, 01:27 PM
What i would need is to add an offset to each index, when rendering. Just the way Korval explained it nice and clearly.

Of course you can "emulate" this using the gl*Pointer way. I wouldn't need to actually have the base-pointer changed. I don't know why one would need that. Mixing vertex-layouts in one VBO is a disgusting idea, and i don't know why one would want to do that. Since i would need to respecify all the gl*Pointer stuff when i want to change the vertex-format, i could just as well switch to another VBO. But that's only my opinion, maybe someone has good reasons to do this, but i cannot imagine any.


What it comes down to, and was mentioned several times before: One wants to pass indices to the GPU and they should be interpreted relative to some address in the VBO, instead of absolute (or relative to 0). The intention is not to reduce the number of drawcalls but simply to reduce complexity/overhead on application-side. It has nothing to do with increasing rendering speed, but to support programmers doing complex tasks.

Jan.

tranders
03-01-2007, 02:00 PM
I'll put in my two cents ...

1) Display lists are my bread and butter and shouldn't change drastically from what they are today (e.g., attribute stacking, transform stacking, mode blocks, etc.) -- record once, play many

2) I don't tend to use VBO's or ClientArrays, but forcing interleaved data for vertex, color, normal, texture, etc. really balloons the contiguous size of storing a complex model -- this will put a strain on memory management trying to locate contiguous blocks of memory. It can also cause cache thrashing when manipulating only one attribute.

3) While immediate mode is not a performance winner by any means, it is still a heck of a lot easier to use when managing dynamic data and makes it trivial to do quick and dirty tasks that only require a small amount of graphics (e.g., overlays, HUD displays, etc.)

4) Please do not implement anything like was mentioned in one response that suggested the concept of a "lost" buffer. Automatic backing is the one aspect that sets OpenGL apart from D3D in terms of simplicity of managing data -- Vista UAC kills all D3D9 devices but OpenGL display lists survive through the secure desktop transition

5) Not that this is part of Longs Peak (or any other proposal -- I really haven't looked), but please don't get rid of the fixed function pipeline like D3D10 has done

6) I know graphic card vendors would like to see OpenGL and D3D migrate to a similar API construct, but from a legacy point of view OpenGL is tons easier to implement because of the wide range of utilitarian functionality (e.g., matrix stacking, generalized display lists, decent integration between the fixed function pipeline and shaders, etc.) -- I would hate for OpenGL to drop the aspects that have set it apart from D3D

-- Tim

Zengar
03-01-2007, 02:15 PM
Tim, I must disagree: the fixed-function pipeline MUST go away. It adds unnecessary API and spec complication and is often misleading. Current hardware has no fixed pipeline anymore and there is no reason for one to be. To be honest, how are you going to model a fixed function pipeline that would suit modern hw?

knackered
03-01-2007, 02:44 PM
tranders, it sounds like the layered mode will be your next API. What's being discussed here is the new low level OpenGL API, which should be a very shallow abstraction of the new generation of hardware that couldn't have even been imagined when GL1.0 was initially designed. Your fixed function, and full state capturing display lists will be external library functions which use the new low-level API to emulate the old fixed function pipeline for you, by creating and updating uniforms whenever you call glLight or whatever (basically doing the stuff that probably most other contributors to this forum have spent the last 3 or 4 years building into our shader-oriented renderers).
If you want your software to 'just work' in the same way it always has, then that's what the layered mode will be for. As far as I know, the layered mode will be written once, probably distributed through the same channels as GLUT and glext, and used by all GL3.0 compliant drivers whenever you link to opengl32.dll or whatever.

Michael Gold
03-01-2007, 04:16 PM
Hi Tim,

The current API will still be available and existing applications will run without modification. Using Longs Peak will require special setup. That being said,

1) Display lists are almost certain to change. There are way too many problems with them today - and in general, they may be convenient but they don't buy anything in performance when used as you describe. And they are a tremendous burden to the implementation. The exception, geometry-only dlists, will likely remain in some form.

2) Getting rid of client arrays doesn't mean data must be interleaved. It just means you need to put the data into buffer objects - all data in one, or a buffer object per attrib, or anywhere in between.

3) Immediate mode is very problematic too... but fortunately can be layered fairly easily. Since it's not a performance path anyway, layering it should not be a problem, I think. Do you agree?

4) Actually I am thinking that it should be optional; if you want backing store you can set this as an attribute. Otherwise you should be prepared to re-create any lost data. Some apps really don't want the GL consuming memory to back up textures, etc.

5) Fixed function is likely history, but again it can be layered. Fixed function hardware disappeared three generations ago. The driver is generating programs on your behalf to emulate the legacy interface. We're mindful of the fact that many people still prefer fixed function, and we're weighing our options.

6) I hope we can retain most of the advantages of OpenGL while updating the API. Certainly this is a consideration... but the API has enough inefficiencies that it's time for a cleanup. I'll disagree that it's easy to implement - we believe Longs Peak will be easier.

Feedback is good, even if it does not agree with the masses. :) We're really trying to please a large and diverse community of developers here.

k_szczech
03-01-2007, 04:20 PM
tranders:
1) I use it occasionally for non-crucial portions of code that need cheap optimizations. I'd like to have display lists in more or less the same form they are now, but I don't care if it will be in glu or in some other 3rd party library. I don't even care if it offers any optimizations except for limiting data/command transfer.

2) My opinion: bigger blocks => less blocks => less memory fragmentation

3) I use immediate mode for UI and debugging - as many other people do. I don't need it in the core API though. So again this should be in glu or layered and I believe that's what's gonna happen.

4) Don't worry - I don't think it's gonna happen. I was always wondering what the idea behind these lost surfaces was. Even if Windows suddenly needs some memory on the GPU it should kindly ask the driver to free some resources so it can make a backup copy on the fly. Unless I pull the plug, the driver crashes or the GPU performs some trashy operation this shouldn't happen. Is Windows ignoring the fact that the driver is there or something?

5) Layered.

6) With lots of old stuff being layered and the core API becoming "lean and mean" it could turn out that OpenGL will be preferred by vendors to implement (if it isn't already :) )

Korval
03-01-2007, 05:46 PM
from a legacy point of view OpenGL is tons easier to implement
No, you mean easier to use. It's a nightmare for someone to implement (as in write an OpenGL implementation).

BTW, you can pretty much forget about the entirety of your list happening in Longs Peak. LP is about getting rid of basically every feature you asked for. Well, except for #4. More on that below.


tranders, it sounds like the layered mode will be your next API.
Actually, I don't think there's going to be a "layered mode" anymore. At least, not one written by the ARB or in some way official. I think there's just going to be GL 2.1 (which drivers already have), which will be supported for a time, but not improved upon, and there will be GL 3.0: Longs Peak and Mt Evans and other Colorado mountains. Eventually, 2.1 support in drivers will either be discarded entirely or unofficially layered over LP, and that is at the choice of IHVs (actually, I imagine that 2.1 is right now being layered on top of a pseudo-LP in ATi's new rewrite of OpenGL).

In short, if you want new features, you need to get on board with LP.


4) Actually I am thinking that it should be optional; if you want backing store you can set this as an attribute. Otherwise you should be prepared to re-create any lost data. Some apps really don't want the GL consuming memory to back up textures, etc.
Hey, hold on there a second.

DirectX can get away with it because it tells you when this stuff goes away. Unless OpenGL is going to have some kind of mechanism to inform the user that their videomemory (and who knows what this entails?) went away, we need to keep the backing store.

And if you are going to lose it, I definitely want a parameter to turn it back on (or, better yet, you should send a parameter if you want it off).


I was always wondering what the idea behind these lost surfaces was. Even if Windows suddenly needs some memory on the GPU it should kindly ask the driver to free some resources so it can make a backup copy on the fly.
Windows doesn't (or, at least, pre-Vista. I know nothing about this in Vista) treat video memory as something that gets allocated and deallocated. An application (and its drivers) have control of the memory until the user context-switches away. At which point, the contents of the memory are no longer guaranteed. That way, Windows doesn't have to deal with how a driver allocates video memory, whether it leaks it, etc; it just has its way with the memory, and there's nothing you can do about it.

FYI: I'm not endorsing this. Just explaining it.

Jan
03-02-2007, 01:28 AM
Don't even think of allowing data to get lost! If so, you will lose me too.
D3D10 finally removed this nuisance and you are thinking about introducing it??!!! You should be ashamed of yourself.

The only thing i can accept would be to tell the driver "this texture/vbo/whatever is streamed to the GPU every frame, so don't create a backup copy". Something like what the VBO STREAM hint presumably means. But that's it!

Windows Vista virtualizes the GPU's memory, just like RAM, so under Vista it shouldn't be a problem anyway.

Jan.

V-man
03-02-2007, 03:01 AM
Originally posted by Michael Gold:

Originally posted by V-man:
In parallel, the same happens to the indices.
To be clear, people are asking for an offset to the indices in the element array, not the base of the arrays, correct? If you can offset the indices, there is no need to offset the array base also: The index offset is used for DrawElements, and DrawArrays has an explicit start parameter. I suppose the state could be defined as offsetting the start of DrawArrays too.

So let me rephrase my question. If we exposed an ElementBase(), is there a need to offset the base of a VBO?

Please understand that I am not suggesting we don't want to do this; I simply wish to understand what is required to accomplish common usage. There are some downsides to offsetting the array base (e.g. hardware alignment constraints).
If ElementBase is exposed, no, there is no need to offset into the vertex array, as long as the element base accepts a 32-bit unsigned value.
I think that's what current GPUs are capable of doing anyway.

elFarto
03-02-2007, 03:24 AM
Originally posted by Jan:
Don't even think of allowing data to get lost! If so, you will lose me too.
D3D10 finally removed this nuisance and you are thinking about introducing it??!!! You should be ashamed of yourself.
From what Michael is saying, it'll be optional. Best of both worlds. Although I think it should default to having the backing store enabled rather than disabled. Otherwise this could trip a lot of people up, and enabled is what most people will want.

Also, by explicitly disabling it you're basically saying "It's ok, I don't need the safety net, I know what I'm doing".

Regards
elFarto

Michael Gold
03-02-2007, 07:31 AM
Originally posted by Jan:
Don't even think of allowing data to get lost! If so, you will lose me too.
D3D10 finally removed this nuisance and you are thinking about introducing it??!!! You should be ashamed of yourself.
You may be surprised how many requests I have heard from developers asking that the driver not keep a backing store of textures and buffer objects. Will I lose you by offering this as an option? I don't understand your reaction.

With respect to DX10, please remember that OpenGL runs on other operating systems besides Windows Vista.

Overmind
03-02-2007, 07:41 AM
I think that many people requesting this feature don't understand that the data may actually get lost and has to be reuploaded. Many people seem to think that the backing store is only for those readback commands.

That being said, I think it can't hurt to offer this as optional feature, as long as it doesn't complicate the API for people who are not going to use it. And that shouldn't be a problem.

Jan
03-02-2007, 11:16 AM
In a few situations it might be a good idea to tell the driver, that it shouldn't bother, when the data gets lost. Therefore, as an option, sure, allow it.

But it should really only be a seldom used option, something the user has to explicitly request. The fact that OpenGL doesn't have to deal with lost surfaces is a BIG plus. Talk to any programmer working with D3D and after a few minutes he will tell you about the nightmare of lost surfaces.

I just want to make sure that an "optional feature" doesn't suddenly become default behavior.

Jan.

LarsMiddendorf
03-02-2007, 12:20 PM
Agreed.
For a fullscreen application, the device is only seldom lost. You don't want to reserve e.g. 512MB of system memory for textures all the time if the application can reload them easily from disk.

Flavious
03-02-2007, 12:40 PM
Yes, if this is made an option (disabled by default), I think everyone wins (except perhaps the people that have to implement it ;) ).

Korval
03-02-2007, 01:26 PM
You don't want to reserve e.g. 512MB of system memory for textures all the time if the application can reload them easily from disk.
And what if the application doesn't know what textures it has loaded (as in, it didn't bother to store that information somewhere and rebuilding it would take time)?

I agree that having a flag is the best option. But the flag should default to having the backing store.

And I still say that if GL doesn't tell us when something has been lost, we shouldn't be able to turn off the backing store to begin with. There should be some kind of (fast performing) query that can be made to ask whether stuff should be re-uploaded, and it should be part of the OpenGL API, not an external thing.

Being able to ask when uploading is needed is the only reasonable way to turn off the backing store.

Flavious
03-02-2007, 02:12 PM
And I still say that if GL doesn't tell us when something has been lost, we shouldn't be able to turn off the backing store to begin with.
That's a great point. And I agree that the API, whatever it is, should be transparent to those not using this feature, and of course convenient for those that do.

What would this API look like? My first thought was to check GetError (or something new) for something like RESOURCE_LOST upon binding textures, programs and so on (probably bogus, but well meaning). But this way you wouldn't need two paths for an enable/disable (i.e. it would always return "NO_ERROR" when disabled), and it's the kind of call we routinely make anyway. (Just a humble offering.)

What about granularity at the individual object level: is there a way to specify, say at object creation time, whether an object should be "managed" in this way?
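Purely speculative, but something along these lines (neither RESOURCE_LOST nor the creation attribute exist anywhere; I'm just sketching the shape of it):

glBindTexture(GL_TEXTURE_2D, tex);
if (glGetError() == GL_RESOURCE_LOST) {      /* hypothetical error code */
    reload_texture_from_disk(tex);           /* app-side recovery */
}

/* hypothetical per-object opt-out of the backing store, set at creation */
glObjectParameteri(tex, GL_BACKING_STORE, GL_FALSE);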

Cheers

k_szczech
03-02-2007, 02:18 PM
And what if the application doesn't know what textures it has loaded
Then it's just not properly implemented, and it should use the default mode.

The question is - should the application be informed that a resource is being removed (not sure if the operating system will be kind enough to guarantee this), or should it be informed that a resource it tries to use is no longer available?
I think the second option seems ok, because even if you know some resource is being removed you usually take no action. It makes no sense to reload that resource now, since it has just been removed, and probably for a good reason. You'll probably want to reload it shortly before the next usage. So you can just test if a resource you intend to use is available by 'using' it.

The 'surface lost' approach can actually be used to reduce stalls in the game when resources need to be reloaded - if you're in control you could, for example, keep all textures in system memory yourself, but without the top 2 mipmap levels. In case of a lost surface you quickly upload that 'low-quality' texture and initiate a full texture reload in the background or at idle time.
By not keeping a backup of the two top-level mipmaps you need 16 times less memory for the backup :)

elFarto
03-02-2007, 02:27 PM
Originally posted by Flavious:
What would this API look like?
Dare I say it, a callback system?

Regards
elFarto

k_szczech
03-02-2007, 02:51 PM
a callback system
It crossed my mind...
Note that OpenGL has a client-server architecture - it's unusual for the server to perform actions on its own client.

Just imagine that you could get a callback informing you that texture <n> is no longer available, but you have already sent rendering commands to the server.

To be honest, it could be that the same problem exists if we want to return an error. At the time we issue rendering commands the resource may still be available... Ok, I think I've reached the point where I should stop thinking about it ;)

Flavious
03-02-2007, 04:02 PM
I agree, and that's why I have my doubts ;)

But my guess is that since OpenGL already has the ability to manage resources, the driver could wait until the command buffer is completely drained or otherwise synchronized with some desirable state before raising a flag of some kind (e.g. it might defer releasing the resource(s) until all commands have completed, or when it's safe to do so).

Since this would be optional, I'd imagine there would be a bit of overlap internally. I'm just guessing tho.

k_szczech
03-02-2007, 04:45 PM
How to take advantage of resource management is a discussion for the community. How to implement it is not. I don't believe that drivers on various platforms will have such control over the operating system.

Korval
03-02-2007, 04:47 PM
What would this API look like?
Why not simply "glAreMyLosableObjectsOK()" or something like that? If it returns GL_FALSE, that means you will need to re-upload any and all objects that you designated as being "losable".
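In other words, something like this at the top of each frame (the query name is just my strawman; everything else is whatever bookkeeping the app already does):

if (!glAreMyLosableObjectsOK()) {            /* hypothetical query */
    /* everything flagged as "losable" may be gone; re-upload it */
    for (int i = 0; i < num_losable_textures; ++i)
        upload_texture(losable_textures[i]); /* app-side helper */
}
render_frame();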

k_szczech
03-03-2007, 07:49 AM
That would put a requirement on the OpenGL implementation that after you issue this command, no object can be removed from GPU memory for a while. For how long?
Well, you could explicitly state: "don't remove objects now until I'm done rendering" - but I think the operating system may not like it when the driver holds on to a resource, so it will require... keeping a copy of all resources in system memory - just the way we have it now.

Also keep in mind that OpenGL server and client work asynchronously.

A callback would be simplest to use. It's like "I'm performing an operation you issued a while ago, but I'm missing this resource". I just don't see how a callback fits into the client-server architecture.
As I said, I try not to think about it - I'm sure that even if the API for handling this is not very "convenient" (I'm not saying it won't be), I trust it will be "sufficient", and we don't have to instruct anyone from the ARB how to do it :)

V-man
03-03-2007, 07:57 AM
Could someone explain to me this lost surface problem of Windows? If the user changes the display mode, do textures and other things get lost? What other cases cause a loss?

What's the problem? It can't refresh the DRAM for a few milliseconds and all memory cells become zero?

k_szczech
03-03-2007, 08:23 AM
Well, stability of the operating system and its user interface is more important than stability of running applications. If the operating system needs some memory on the GPU to draw its desktop it may take that memory and inform the driver of the event.
It could ask the driver to free some memory, but that would mean the response time of the system depends on the response time of the driver.
Well, Windows takes this to the next level ;) - you're likely to get 'surface lost' when the user locks his computer. For example, the "Switch user" option in Windows XP.

k_szczech
03-03-2007, 08:34 AM
I'm concerned about one thing - some objects, like textures and vertex arrays, are easily uploadable, but shaders require compilation/linking. So I think it should be possible to download a compiled shader in a binary, driver-friendly form from the GPU and store it for maximum upload efficiency when restoring lost objects.
The API would only define the means of uploading and downloading such data. Interpretation would be purely platform dependent. This has been requested for shaders already, but it could also prove useful for other objects: no need to respecify vertex array or texture format/dimensions - it's all in the data block. There would be one common upload/download API for all types of objects, since no interpretation would be defined.

I believe that this is actually implemented on all existing hardware - it's just not exposed by the driver, which manages swapping in and out by itself, so an extension to OpenGL 2.x would also be nice.
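Something like this, where every name is invented just to show the shape of it - the blob would be completely opaque and driver-specific:

/* download the driver-native form once, right after linking */
GLsizei len = 0;
glGetObjectDataSize(program, &len);              /* invented call */
void *blob = malloc(len);                        /* <stdlib.h> */
glGetObjectData(program, len, blob);             /* invented call */

/* ...much later, after the object was lost... */
glObjectData(program, len, blob);                /* restore, no recompile */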

Komat
03-03-2007, 10:23 AM
Originally posted by k_szczech:
That would put requirement on OpenGL implementation, that after you issue this command, no object can be removed from GPU memory for a whle.
No. The simplest behavior would be that as long as an object is in such a state, all rendering commands using that resource would produce reasonably undefined results (e.g. ignored, bad color, no crash or lockup), and commands that make the data of a lost object visible to the application would report a special error.

Actually even in current OGL there are at least two things that might become lost. The pbuffer memory can be lost if it is not bound to a texture. Buffer object content can be lost if it is memory mapped (using the MapBuffer call) when such an event happens.

Overmind
03-03-2007, 11:26 AM
I'm still not really sure why we would need this on the PC. I can see this would be useful on embedded platforms, but that's what GL ES is for.

Everyone seems to get lost in the discussion of *how* this should be done. Looking back, this whole discussion was started by a single post requesting this feature, without any explanation of why we would need it. From there on everyone seems to be concerned with how to do it, but no one bothered to ask why :p

I think we should take a step back and reconsider the whole thing.

I would appreciate it if the people who would actually use this feature would tell us why they need it. I'm concerned this whole discussion creates the impression that this feature is required by many developers just because everyone seems to be discussing it. But somehow I have the feeling that most people here just take part in the discussion, while no one actually needs it.

Komat
03-03-2007, 12:07 PM
Originally posted by Overmind:

Looking back in this discussion, this whole discussion was started by a single post requesting this feature, without any explanation why we would need this. From there on everyone seems to be concerned how to do it, but noone bothered to ask why :p
LarsMiddendorf mentioned why. With the memory size of the latest cards reaching 1 GB, the amount of system memory necessary to store backup copies of high-resolution textures is somewhat impractical when those textures can often be easily reloaded from the original application data.

Michael Gold
03-03-2007, 12:31 PM
On certain platforms (e.g. WindowsXP, Linux, maybe others) events can occur asynchronously which cause the contents of video memory to be lost. This includes mode switches, power save mode, and system standby/hibernation. I believe DX's "fullscreen exclusive" mode also boots all other client allocations out of vidmem. It's impractical, and in some cases impossible, for the driver to back up application data in vidmem (e.g. textures, renderbuffers, buffer objects) when these events occur. The safest solution is for the driver to keep a shadow copy in virtual memory, but this may be undesirable for various reasons (e.g. when working with large data sets, the application may require all available memory). If the application is prepared to recreate the lost data when these events occur, there is no reason the GL should burn virtual memory creating shadow copies. What is being discussed is the possibility of allowing applications to inform the GL which objects are allowed to be "lost" when this occurs.

I don't wish to debate the implementation here, although you are all free to discuss whatever you wish. And to those with grave concerns about this topic:

1) If we offer this option it will be optional. That's what "option" means. :p
2) We only consider this because of developer requests.
3) Obviously there would be some way for applications to detect when buffers need to be reloaded - it would be a pretty useless interface otherwise.

I'm going to duck out of this thread now. It's been fun, but I need to get back to work, and GDC is looming.

Thanks everyone for the great feedback! We'll talk again soon.

Overmind
03-03-2007, 12:34 PM
Thanks for all the information, and keep up the good work ;)

k_szczech
03-03-2007, 04:28 PM
I'm still not really sure why we would need this on the PC
Well, there's no separate OpenGL API for the PC :) Embedded systems can suffer greatly from memory consumption. Also, I would gladly use it to decrease application stalls when a resource needs to be recreated on the GPU:
Without manual texture recovery:
1. texture is required but is lost
2. driver recreates texture from backup
With manual texture recovery:
1. texture is required but is lost
2. application sends 4x downsized 'emergency' copy of texture
3. application initiates full texture recovery in background
This is the approach I would like to implement.
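Step 2 could be as cheap as respecifying only the small mipmap levels from the system-memory backup and telling GL to ignore the missing top levels for now (sketch; assumes the backup keeps levels 2..max_level of an RGBA8 texture):

glBindTexture(GL_TEXTURE_2D, tex);
for (int lvl = 2; lvl <= max_level; ++lvl)
    glTexImage2D(GL_TEXTURE_2D, lvl, GL_RGBA8,
                 width >> lvl, height >> lvl, 0,
                 GL_BGRA, GL_UNSIGNED_BYTE, backup[lvl]);
/* render from level 2 until the full-resolution levels are restored */
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_BASE_LEVEL, 2);
/* later, in idle time: upload levels 0 and 1 and set BASE_LEVEL back to 0 */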


No. The simplest behavior would be that as long as an object is in such a state, all rendering commands using that resource would produce reasonably undefined results (e.g. ignored, bad color, no crash or lockup)
So we accept bad rendering results and restore lost objects during the next frame? That's not acceptable if I'm using the GPU for physics.
This is why I think you use some resources first, and then check errors (as you described), and not use glAreMyLosableObjectsOK before rendering as Korval suggested. Such a function is not of much use if it doesn't put that impossible requirement I mentioned on OpenGL. That was my point.

Perhaps we should give up the discussion on implementation details. It's really not necessary.
As Overmind pointed out - more important is whether we need such a feature and what for, and that question has already been answered.
I think we all trust the ARB enough to leave it up to them :)

The only thing I'm curious about is how we are going to upload resources back to the GPU. It seems a bit wrong to use glShaderSource / glCompileShader / glLinkProgram just to restore one shader :]
Again - I trust we will be given effective means to deal with it and I don't care (much) about API details. I'm just curious.

Komat
03-03-2007, 05:47 PM
Originally posted by k_szczech:
So we accept bad rendering results and restore lost objects during next frame? That's not acceptable if I'm using GPU for physics.
In the model I suggested (which is compatible with the glAreMyLosableObjectsOK) you have two choices:

If you allow that some resources can become lost, you have to accept the consequence that in the frame in which that happens, rendering using those resources might be incorrect. Which is fine for normal visualization.

If you cannot allow that to happen for some operation (e.g. physics), you will have to leave all resources used by such an operation in memory-backed mode.



The only thing I'm curious about is how are we going to upload resources back to GPU. I see it a bit wrong to use glShaderSource / glCompileShader / glLinkProgram just to restore one shader :]
I assume that the compiled shaders are reasonably small, so there would be no advantage in having the ability to lose them.

Korval
03-03-2007, 05:48 PM
So we accept bad rendering results and restore lost objects during the next frame?
Which is why the option is there. If you want/need to guarantee accuracy at all times, you have to have GL keep a backing store.


I see it a bit wrong to use glShaderSource / glCompileShader / glLinkProgram just to restore one shader
Maybe shaders won't have that option. They're not particularly big objects, after all. Or maybe there's an option, but you won't opt in to it for shader objects.

Jan
03-04-2007, 12:39 AM
To improve response times, one could specify a hint at texture-creation time to tell the driver that it would be ok to upload and use the small mipmap levels first when it has to restore a texture.

For the driver this should be pretty easy to accomplish. For an application developer this is a complex task that requires many additional hours to implement, especially when you already have a fully functional resource-management system that was not developed with such stuff in mind.

Certainly textures and vertex buffers are the only resources taking up so much memory that not storing them in RAM might be worth considering.

For textures one could tell the driver to only store the first n mipmaps. If something gets lost, it can restore at least n mipmaps and the user is informed that all other mipmaps are lost, so he can decide when to upload the high-res mipmaps.

For vertex-buffers i have no idea. When such a buffer is lost, you usually need to fully restore it, unless it is a LOD mesh.

In my opinion, if this option makes it into LP, it should be restricted to data that can be "easily" recovered. Shaders would certainly be a big problem, because they represent a part of the rendering pipeline; they are not data that is operated upon. If i do RTVA (or stream-out) and the shader is lost at that particular moment, it's possible that for the rest of the app's lifetime some data i generated on the GPU is corrupted. GPGPU applications would also suffer. In general, it just doesn't make sense for shaders to be able to get lost.

However, i don't think this is such an interesting feature, at all.

Jan.

V-man
03-04-2007, 02:19 AM
Originally posted by Komat:
Actually even in current OGL there are at least two things that might become lost. The pbuffer memory can be lost if it is not bound to a texture. Buffer object content can be lost if it is memory mapped (using the MapBuffer call) when such an event happens.
The pbuffer is a dead end.

It's normal for MapBuffer to do that, since it might give you a new memory location while the original is being sourced by the GPU. So your buffer is not really lost.

Komat
03-04-2007, 02:33 AM
Originally posted by V-man:

It's normal for MapBuffer to do that, since it might give you a new memory location while the original is being sourced by the GPU. So your buffer is not really lost.
This is not what I am talking about.

From VBO specification:


UnmapBufferARB returns TRUE unless data values in the buffer's data store have become corrupted during the period that the buffer was mapped. Such corruption can be the result of a screen resolution change or other window-system-dependent event that causes system heaps such as those for high-performance graphics memory to be discarded. GL implementations must guarantee that such corruption can occur only during the periods that a buffer's data store is mapped. If such corruption has occurred, UnmapBufferARB returns FALSE, and the contents of the buffer's data store become undefined.
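So any app that maps buffers already has to handle a "lost" store today, along these lines (minimal sketch; fill_vertices and size stand in for whatever the app does, and the NULL checks on the map pointer are omitted):

void *p = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
fill_vertices(p);
if (glUnmapBuffer(GL_ARRAY_BUFFER) == GL_FALSE) {
    /* store was corrupted (mode switch etc.): respecify and refill */
    glBufferData(GL_ARRAY_BUFFER, size, NULL, GL_STATIC_DRAW);
    p = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
    fill_vertices(p);
    glUnmapBuffer(GL_ARRAY_BUFFER);
}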

santyhamer
03-05-2007, 06:48 PM
I like the changes made in Longs Peak but... hmmmm... somebody mentioned the possibility of adding an offline shader compiler like Direct3D has... but I can't see this in Longs Peak... perhaps it will be in Mt Evans then?

Flavious
03-05-2007, 07:21 PM
Quite right. Such a construct would be very welcome indeed.

Cheers

Dark Photon
03-06-2007, 04:00 AM
Second that. Compiling and linking GLSL shaders on NVidia cards at run-time is pretty slow, as you'd expect of a high level language with a full optimizer under the hood.

I also don't like having to pre-render with all the shaders to force shader optimization (like you have to do with textures to force them onto the card). We currently need to do both to avoid frame skippage in areas with lots of materials.
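For what it's worth, the warm-up can be as dumb as one degenerate draw per program/texture combination during load (sketch; the zero-area triangle rasterizes nothing but forces the driver to finish compiling and uploading):

glUseProgram(prog);
glBindTexture(GL_TEXTURE_2D, tex);
glBegin(GL_TRIANGLES);
glVertex3f(0.0f, 0.0f, 0.0f);
glVertex3f(0.0f, 0.0f, 0.0f);
glVertex3f(0.0f, 0.0f, 0.0f);
glEnd();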

tranders
03-06-2007, 04:24 AM
Layered support for legacy OpenGL will be OK as long as I don't walk into the office one morning with 10,000 support calls saying they just loaded the latest driver and their performance dropped 50%. IMO it will be critical to the success of LP to retain existing performance levels side-by-side (much like D3D9 and D3D10 do on Vista).

A couple of questions about the loss of fixed-function rendering:

Will that affect stylized wireframe?
Will we lose the automatics in GLSL?

Tim

knackered
03-06-2007, 05:13 AM
No reason why performance should drop - you're effectively using layered mode now.

Overmind
03-06-2007, 06:03 AM
Will we lose the automatics in GLSL?
What's an automatic? Do you mean built-in attributes, varyings and uniforms? Then the answer is probably yes, but don't quote me on that ;)

tranders
03-06-2007, 07:54 AM
Originally posted by knackered:
No reason why performance should drop - you're effectively using layered mode now.
Anytime someone changes the underlying architecture to favor one methodology of programming (shader based) over another (fixed function), there is reason to be concerned for the least favored approach.

tranders
03-06-2007, 08:09 AM
Originally posted by Overmind:

Will we lose the automatics in GLSL?
What's an automatic? Do you mean built-in attributes, varyings and uniforms? Then the answer is probably yes, but don't quote me on that ;)
Yes, I was referring to built-in attributes, etc. From my point of view, the primary reason I did not select a shader-based approach for D3D was the lack of access to built-in propagation of transforms, lights, fixed-function colors, etc. While shaders are great for advanced style renderings, they are a burden for the generic, fixed-function smooth shaded or hidden surface displays that are common for CAD applications. If GLSL under LP has similar requirements for me to manage my own stacks and states and propagate that to the shader, then the expense of moving to a pure LP implementation will be huge. For a Windows-only [fixed-function] application, it would be simpler to migrate to D3D10 since LP wouldn't necessarily offer any significant advantage.

knackered
03-06-2007, 09:22 AM
You'll lose cross-platform compatibility, correct quad-buffered stereo, frame locking/swap groups, and a clean API with a specification you can reference when you need to be certain of correct behaviours.
You would gain absolutely nothing by using D3D10 over LP, except an unhealthy heavy dependency on Microsoft Windows.

Korval
03-06-2007, 09:42 AM
Anytime someone changes the underlying architecture to favor one methodology of programming (shader based) over another (fixed function), there is reason to be concerned for the least favored approach.
Well, considering that most hardware already doesn't have a fixed function pipeline at all, you're already using the least favored approach. The only difference is that your API is too convoluted to let you know what the most favored approach is. Which is why Longs Peak exists.


I also don't like having to pre-render with all the shaders to force shader optimization (like you have to do with textures to force them onto the card).
The OpenGL 2.1 specification says nothing about having to use a shader to optimize it. That's something that a particular vendor does, and that behavior will only change if the vendor wishes it to.

tranders
03-06-2007, 11:00 AM
Originally posted by knackered:
You'll lose cross-platform compatibility, correct quad-buffered stereo, frame locking/swap groups, and a clean API with a specification you can reference when you need to be certain of correct behaviours.
You would gain absolutely nothing by using D3D10 over LP, except an unhealthy heavy dependency on Microsoft Windows.
As stated -- "a Windows-only application" -- means that cross-platform compatibility is a moot feature. D3D10 also implies Windows Vista, which does not support windowed quad-buffered stereo or proper vsync, along with a laundry list of other nice features supported on XP. As for a specification to reference for behavioral correctness, I think the shader-only concept begins to obfuscate that by shifting correctness to the shader algorithms and not so much to the pipeline feeding them the raw data (i.e., LP).

Don't get me wrong I am a very strong proponent of OpenGL but current business pressures require at least some level of support for D3D (i.e., Vista) and I can only split my resources so much.

Tim

knackered
03-06-2007, 11:21 AM
There's full OpenGL support in Vista.
Just in case you're interested, in my area of the market I don't know of one single customer that is even entertaining the idea of migrating to Vista. I've never known that situation before - people were quick to move from NT4 to W2k, and subsequently W2k to XP, but there's absolutely no talk of moving from XP to Vista. These are big name companies I'm talking about.
Maybe this isn't the case with your customers.

Zengar
03-06-2007, 12:20 PM
All the fixed-function features you like so much, like the matrix stack (emulated in software by the driver) and state changes, can be easily emulated by your application. It will probably take less than a full work day to have a well-working application layer. What you are talking about is a graphics library that provides a specific model. What we generally need is a graphics library providing a general model, and this is what Longs Peak is about: it will allow you to do everything you want with maximal efficiency. To summarize, no one forces you to use it. For your task, you should use the legacy GL, which will work just as fine. I am sure that legacy GL emulation libraries will appear soon, so I am sure that at least GL 2.0 will be supported on all Longs Peak implementations.

tranders
03-06-2007, 04:34 PM
Originally posted by knackered:
There's full OpenGL support in Vista.
Just in case you're interested, in my area of the market I don't know of one single customer that is even entertaining the idea of migrating to Vista. I've never known that situation before - people were quick to move from NT4 to W2k, and subsequently W2k to XP, but there's absolutely no talk of moving from XP to Vista. These are big name companies I'm talking about.
Maybe this isn't the case with your customers.
Full support depends on your baseline. You need to read the release notes from NVIDIA or the recent Pipeline regarding the limitations and changes of OpenGL in Vista - windowed quad-buffered stereo and overlay planes being two of those. ATI has yet to release a driver for the FireGL line of graphics cards, so the jury is still out.

I don't recommend to any of my customers that they switch to Vista, but the marketing machine is spreading enough FUD in the business arena that the switch to Vista is either inevitable or unavoidable per IT mandates among corporate users. Vista is not a significant problem for full-screen applications, although for multi-windowed CAD applications there are several. FWIW, we still have customers running currently supported versions of our application on W2K.

tranders
03-06-2007, 04:52 PM
Originally posted by Zengar:
All the fixed-function features you like so much, like the matrix stack (emulated in software by the driver) and state changes, can be easily emulated by your application. It will probably take less than a full work day to have a well-working application layer. What you are talking about is a graphics library that provides a specific model. What we generally need is a graphics library providing a general model, and that is what Longs Peak is about: it will allow you to do everything you want with maximal efficiency. To summarize: no one forces you to use it. For your task, you can use the legacy GL, which will work just as well. I am sure that legacy-GL emulation libraries will appear soon, so at least GL 2.0 should be supported on top of all Longs Peak implementations.
Implementing a state engine may be trivial, but integrating it into a program that has been in constant development since 1994 will take a little more than a day. FWIW, I've had to do a similar thing for D3D, but only because I had to.

Forcing everyone to write a shader (OK -- I'm not being "forced", but the coercion is pretty stiff) is not unlike forcing everyone to write new code in assembly because there's just too much overhead in these old programming languages that have been out there for decades.

Sorry, but for new applications or games that are already geared towards running on shaders, LP is great; for professional CAD applications it's just not that simple.

Michael Gold
03-06-2007, 06:18 PM
I'm trying not to get sucked back into this thread but the recent discussion needs to be addressed.

I predict that native implementations of GL 2.x will co-exist with Longs Peak for the foreseeable future. Layering will occur at some point, but we can't afford to cripple 100% of existing applications when we roll out a new API. That being said, it's not clear that GL2 will continue being enhanced once Longs Peak ships.

There is a compatibility plan for allowing migration between the old and new interfaces. We don't want to force developers to rewrite code which has been in development for the last 14 years. For example, some CAD apps are using GL for their user interface and have no desire to touch that code. Perfectly understandable -- existing code will continue to run, but you might choose to write a new pipeline in LP for the actual modeling. They can co-exist in the same application but will likely require a MakeCurrent to transition between the interfaces.

Fixed function really is not that interesting to support in the core; as has been previously stated, we've just shipped the third generation of hardware which lacks fixed-function support -- the driver is generating programs under the covers. Nonetheless, enough people have expressed concerns that we'll want to ensure that it's easily layered, if we do in fact remove it.

I'm going back into my cave now.

tranders
03-07-2007, 04:20 AM
Thanks Michael!

Side by side support will give us all some time to migrate without the fear of "missing the boat".

Mars_999
03-14-2007, 01:11 PM
Ok, I started a thread on this elsewhere but deleted it. I just got done looking at the GDC '07 slides, and I like the single call for setting up objects vs. many calls to set up width, height, format, etc. I find it messy where one call is clean. I haven't read through the 4 pages because I just found this thread. From what I saw they had two versions -- are they still trying to decide, or has that been done already?

Korval
03-14-2007, 02:55 PM
I find it messy where one call is clean.
Tough. The ARB has explained why they're doing it this way numerous times. Indeed, we've known about this API (the use of attributes/templates for building immutable objects) for about 9 months or so. It's better for all involved.

If you don't like the API, feel free to wrap it in whatever syntactical sugar you like.


So from what I saw they had two versions, are they still trying to decide?
What you saw was a set of OpenGL functions, and a single GLU function. 'U' as in "utility". As in, not really a part of OpenGL. It's like any other GLU function; it does nothing more than what the user could do with the API.
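
As an illustration of the "syntactic sugar" point, here is a hypothetical convenience wrapper in the spirit of that GLU helper. None of these entry points, types or enums are final or published -- GLtemplate, glCreateTemplate, glTemplateAttribt_i and the GL_* names below are presumed placeholders for whatever Longs Peak actually ships:

/* Hypothetical wrapper -- every LP call and enum here is a placeholder. */
GLtemplate my_create_image_template(GLenum format, GLint width, GLint height)
{
    GLtemplate t = glCreateTemplate(GL_IMAGE_OBJECT);   /* presumed LP entry point */
    glTemplateAttribt_i(t, GL_FORMAT, format);           /* presumed attribute calls */
    glTemplateAttribt_i(t, GL_WIDTH,  width);
    glTemplateAttribt_i(t, GL_HEIGHT, height);
    return t;   /* the caller then creates the immutable object from the template */
}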

Flavious
03-14-2007, 03:20 PM
Great slides!

Here's a link, for convenience:
http://www.khronos.org/library/detail/gdc_2007_opengl_21_and_beyond/

The transform feedback extension looks very promising indeed (the terrain subdivision example is very interesting).
http://www.khronos.org/developers/library/gdc_2007/OpenGL//New-OpenGL-Features.pdf

Korval
03-14-2007, 04:09 PM
Actually, the slides on Longs Peak offer a few more details than the newsletter.

Specifically, as it relates to "vertex array objects". They apparently encapsulate a specific set of buffer object & offset bindings. The specific set of attachments is immutable, as are the properties (data format, stride, etc), but you can swap buffer objects and offsets in and out.

It's also interesting to see how much Khronos has changed the ARB. Before, the ARB basically met 4 times a year, and little more than discussions happened in between. Now, you've got 5 working groups that are all meeting weekly or more frequently.

It's things like that that make me think that Longs Peak will actually be out this summer, and that Mt Evans will only be 3 months behind it. Two years ago, there'd be no way a spec as unfinished as Longs Peak would be out inside a year.

barthold
03-14-2007, 05:14 PM
Originally posted by Korval:
It's also interesting to see how much Khronos has changed the ARB. Before, the ARB basically met 4 times a year, and little more than discussions happened in between. Now, you've got 5 working groups that are all meeting weekly or more frequently.

It's things like that that make me think that Longs Peak will actually be out this summer, and that Mt Evans will only be 3 months behind it. Two years ago, there'd be no way a spec as unfinished as Longs Peak would be out inside a year.
Hi Korval,

Thanks for bringing this up. It's a good segue into a bit of history on how we got to where we are today.

At the ARB meeting in September 2005 (before the Khronos merger, which was September 2006), several members came to the ARB stating that we needed to do more, otherwise OpenGL was going to diminish in importance rapidly. "OpenGL is in danger" is what one member stated. By the December 2005 ARB meeting, ATI and NVIDIA had come up with the first stab at what are now called OpenGL Longs Peak and Mount Evans. At that time, the need for more of an ecosystem focus was clear too; hence the ecosystem working group. Since then the ARB has been working hard not only on OpenGL Longs Peak and Mount Evans, but also on the SDK, the newsletter, manpages, a new version of GLSL (1.20), OpenGL 2.1, FBO multisample and FBO blit, and probably some stuff I already forgot.

The merger with Khronos was beneficial to the ARB for various reasons:

1. It broke down an IP barrier. Before, we were not able to talk to the OpenGL ES designers, for example, because of IP issues. That is no longer the case. We now have good cross-participation from the ES group in the ARB, and vice versa.
2. We get great marketing support. Khronos has money to pay a professional organization (the Gold Standards Group) to organize trade shows, set up the f2f meetings, do the layout for the OpenGL Pipeline, do press releases, etc etc.
3. We can leverage conformance work that the OpenGL ES group has already done, and paid for.
4. It pays for infrastructure -- this site, for example!
5. Leverage the knowledge in the OpenGL ES, Collada and Collada FX groups to help define glFX. You saw the announcement, I hope! See http://www.khronos.org/developers/library/gdc_2007/News//Introducing-glFX.pdf .

The ARB was meeting 4 times a week exactly a year ago, versus 5 times a week today. As the chair, one thing I am trying to do is focus the ARB more on deadlines and schedules. This means we cannot operate in a mode of "we'll work on it till we get it right" without regard for market reality. Another positive factor is that the ARB members are serious about making this happen, and are contributing time and manpower.

Ok, back to work for me!

Barthold

bobvodka
03-15-2007, 07:45 PM
Originally posted by Korval:
Specifically, as it relates to "vertex array objects". They apparently encapsulate a specific set of buffer object & offset bindings. The specific set of attachments is immutable, as are the properties (data format, stride, etc), but you can swap buffer objects and offsets in and out.
Yes, I was very happy to see that in the slides.

When talking with the hordes of D3D programmers I know, one of their OpenGL hates was the amount of work it took to bind things; bringing that down to a function call or two should shut them up ;)
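
For context, this is roughly the per-draw boilerplate in today's GL 2.x (real API; the buffer names, attribute locations and interleaved layout are made up for the example, and a current GL context is assumed) -- the kind of thing a pre-built vertex array object would collapse into a bind or two:

/* GL 2.x per-draw vertex setup for an interleaved position/normal/texcoord buffer. */
glBindBuffer(GL_ARRAY_BUFFER, vbo);                         /* 'vbo' created elsewhere */
glEnableVertexAttribArray(0);                               /* position: 3 floats at offset 0  */
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 32, (const void*)0);
glEnableVertexAttribArray(1);                               /* normal:   3 floats at offset 12 */
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 32, (const void*)12);
glEnableVertexAttribArray(2);                               /* texcoord: 2 floats at offset 24 */
glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, 32, (const void*)24);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);                 /* 'ibo' created elsewhere */
glDrawElements(GL_TRIANGLES, index_count, GL_UNSIGNED_SHORT, 0);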

soconne
03-29-2007, 10:41 PM
Has anybody mentioned the ability to access the current framebuffer pixel? I don't know WHY they haven't optimized the architecture to take advantage of this. I know it was mentioned in the latest OpenGL specs but discarded for performance reasons. Well then CHANGE the architecture gosh darnit!

nystep
03-29-2007, 11:08 PM
About the future of GPUs in general, I wonder:

- Is there any plan to let the user also choose the way pixels are processed, or is that left to the hardware forever?

- It is possible to do really fast radial blurs (single pass, single texel read) if you can choose the way you process the data, but these techniques are still confined to software rendering. The introduction of things like CUDA might solve the issue, though I haven't looked deeply into it yet.

- Is there any plan for a generic extension that would let CUDA-style computation be accessed in a vendor-neutral way, in the GLSL fashion? I guess it's still mainly vendor-dependent, but maybe the R600 will change that (or at least I hope so).

Korval
03-29-2007, 11:27 PM
Has anybody mentioned the ability to access the current framebuffer pixel?
Not going to happen.

And if it were to happen, it wouldn't happen in Longs Peak itself. LP does not add features; it merely changes the API to something much more reasonable.

Overmind
03-30-2007, 02:46 AM
Well then CHANGE the architecture gosh darnit!
There were a lot of discussions on this board about this. And in all of them someone explained why it is not so easy to change the architecture to allow reading the current framebuffer pixel without taking massive performance hits even when the feature is not used.

I think we don't need to reiterate all the reasons here. Just believe us when we tell you it's not possible to do it (yet). Or read those other threads.

knackered
03-30-2007, 04:02 AM
you have the ability to render into a texture and sample any 'framebuffer' pixel you want.
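
A rough sketch of that approach using EXT_framebuffer_object (real API as of GL 2.x; color_tex, draw_scene and draw_fullscreen_pass are assumed to exist, with color_tex an allocated RGBA texture of the framebuffer's size):

/* Pass 1: render the scene into a texture instead of the window framebuffer. */
GLuint fbo;
glGenFramebuffersEXT(1, &fbo);
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                          GL_TEXTURE_2D, color_tex, 0);
if (glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT) == GL_FRAMEBUFFER_COMPLETE_EXT)
    draw_scene();

/* Pass 2: back to the window; the shader can now sample any "framebuffer" pixel. */
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);
glBindTexture(GL_TEXTURE_2D, color_tex);
draw_fullscreen_pass();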

michagl
03-31-2007, 12:24 AM
Just some silly cosmetic nitpicking (this is all new to me -- no new OpenGL programming for a year or two).

Where can I find a good general LP faq?

What's with the name? Sounds proprietary or something??? edit: Code name perhaps?

And why on earth use naming conventions like "glTemplateAttribt_i" rather than something more conventional like "glTemplateAttributei", or probably slightly better, "glTemplateAttribi"?

My apologies to whoever made this decision, but "glTemplateAttribt_i" is just ugly... and, most of all, logically arbitrary in every way.

EDIT: Would it make any sense to prefix the functions with "lp" rather than "gl"? OpenGL has a long legacy of compatibility, and I really don't think bleeding-edge performance will be that important a few years down the road. This Longs Peak business, while progressive, looks like something that could phase out just as fast as hardware comes and goes. It just seems like a logical proposal, especially with people suggesting "outsourcing" the current API to a large extent, if not entirely.

Jan
03-31-2007, 01:55 AM
There is no FAQ, since this is all in development at the moment. If you want to know more about it, read the ARB newsletters and the discussions on this board. There are also a few GDC slides; links are in the threads on this board.

It has all been discussed to death already; if you want answers to your questions, just read all that stuff.

Jan.

Flavious
03-31-2007, 03:37 AM
Yes, I'm afraid the discussion of the underscore ran into overtime, and I'm partly to blame for that. My apologies.

I think that in the final analysis, it's a small price to pay for forward motion. And I've come to realize that my personal dislike of the underscore is largely groundless and irrational.

Cheers

elFarto
03-31-2007, 04:59 AM
Originally posted by Flavious:
I think that in the final analysis, it's a small price to pay for forward motion. And I've come to realize that my personal dislike of the underscore is largely groundless and irrational.
Besides, C always has #define, so we can fix it later :D
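
For instance, a purely local rename along these lines would do it (the proposed entry point is taken from the naming discussion above; the wrapper name is just an example):

/* Local alias for the proposed entry point -- cosmetic only. */
#define glTemplateAttribi(tmpl, name, value)  glTemplateAttribt_i((tmpl), (name), (value))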

Regards
elFarto

yooyo
03-31-2007, 03:50 PM
@Michael Gold:
Regarding 'resource or device lost' signals, I would prefer that the driver keep all given resources at all costs. Imagine heavy GPGPU work that runs a simulation on the hardware for 10 hours and doesn't keep all previous iterations; the application cannot restore its state at the 'lost' point.

The only reasonable alternative is for the app to check every frame for a 'resource or device lost' event; if it happens, the app must have a chance to read back some data from the GL server.
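
Something like the following per-frame check is what this amounts to. Note that this is entirely hypothetical -- no such query has been published for Longs Peak, so GL_DEVICE_LOST_HYPOTHETICAL and the two app-side helpers are placeholders:

/* Hypothetical per-frame device-lost check; the enum and helpers are placeholders. */
void end_of_frame(void)
{
    GLint lost = GL_FALSE;
    glGetIntegerv(GL_DEVICE_LOST_HYPOTHETICAL, &lost);   /* placeholder query */
    if (lost) {
        read_back_simulation_buffers();   /* app-side: copy critical results to CPU memory */
        recreate_gpu_resources();         /* app-side: rebuild whatever the server dropped */
    }
}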

Korval
03-31-2007, 10:08 PM
Regarding 'resource or device lost' signals, I would prefer that the driver keep all given resources at all costs.
Then create all your objects with the setting that keeps them that way.

In the end, they're only providing the option for it. If you don't want it, don't use it.

Personally, I don't know what would make me want to take this option, but it's there...

Zengar
04-01-2007, 03:16 AM
Well, I can imagine (particularly for games) some situations where resources can be easily recreated and the extra memory and CPU time used by the driver isn't welcome. But this is an interesting question: if data on the device can really be lost at any time, how are resources like FBOs stored? Will the driver copy the framebuffer contents each time it is rendered to?

Korval
04-01-2007, 10:47 AM
how are resources like FBOs stored?
Well, the actual framebuffer object, the thing that references the textures and renderbuffers, is purely client-side. So it can't get lost.


Will the driver copy the framebuffer contents each time it is rendered to?
Hmm, that's a good question.

Eosie
04-01-2007, 10:27 PM
Originally posted by Korval:

Will the driver copy the framebuffer contents each time it is rendered to?
Hmm, that's a good question.
The driver could save the command streams used for rendering to FBOs and, after restoring all other resources, simply replay those command streams to restore the contents of the FBOs. I guess that's how it is implemented right now.

Korval
04-01-2007, 10:58 PM
The driver could save the command streams used for rendering to FBOs and, after restoring all other resources, simply replay those command streams to restore the contents of the FBOs.
That seems unlikely -- especially if you used multiple, dependent command streams (ping-ponging), or if the stream relies on data that no longer exists (a texture was deleted). There are a number of cases that would cause this mechanism to break down.

I imagine that the image is either lost or locally stored with a DMA transfer (likely provoked by a buffer swap). And since it's a downstream transfer, it wouldn't be something a user would notice performance-wise (unless the user was using a lot of downstream bandwidth). Or maybe the transfer happens when you unbind the image from the rendering context (that is, the texture can still be lost while it is bound).

yooyo
04-02-2007, 05:56 AM
Before the OS takes all resources from the driver, it should notify the driver about that event. The driver can then save its state in virtual memory and release all resources, and the OS can do whatever it wants with the VRAM.

Taking resources without notification is kind of an outrage, isn't it? :)

Zengar
04-02-2007, 08:30 AM
Ok, this is a reasonable explanation... but are you sure it actually works that way? My main concern is whether such driver behaviour (storing all data in RAM and not allowing it to be lost) actually reduces run-time performance.

yooyo
04-02-2007, 12:02 PM
AFAIK, current OpenGL implementations on Windows save duplicates of all uploaded textures in system memory (or in virtual memory). Unlike D3D (where the app must respond to a D3DERR_DEVICELOST event), an OpenGL app doesn't have to bother with this. There is one case where losing resources may cause trouble: when an app maps a buffer (PBO or VBO), the pointer into video memory may not survive a resolution or color-depth change.

Anyway, on older hardware, when the app uploads textures the driver converts them to a native pixel format. If the desktop is in 16-bit mode, textures are converted from 24-bit to 16-bit. In this case, changing the desktop color depth may cause trouble for the OpenGL driver. This is the reason JC reloads all textures in Quake 3/4 and Doom 3 when the user changes resolution or color depth.
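
The mapped-buffer case above is the one to be careful with: the pointer is only valid between map and unmap, and glUnmapBuffer reports whether the data store was lost in the meantime. A short sketch with real GL 2.1 PBO calls (pbo, pixels, pixel_bytes and resubmit_pixels are assumed to exist):

/* Keep the mapping short-lived; check the unmap result for a lost data store. */
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
void *ptr = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
if (ptr) {
    memcpy(ptr, pixels, pixel_bytes);                       /* fill while mapped */
    if (glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER) == GL_FALSE) {
        /* GL_FALSE: the buffer store was corrupted (e.g. a mode switch) -- resubmit. */
        resubmit_pixels();
    }
}
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);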

Chris Lux
04-03-2007, 02:12 AM
Originally posted by Michael Gold:
Question for the community: would anyone be terribly upset if we dropped support for client arrays and required all vertex attribs to live in buffer objects?
No, not sorry at all. Get rid of them...

With buffer objects and no immediate mode in the core API, I think it is best to kill off client arrays, so there is only one unified way to store geometry, which makes things simpler.

And my vote also goes to a VAO extension.
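
For anyone still on client arrays, the move to that unified path is small. A sketch with plain GL 1.5/2.0 buffer-object calls (vertices is assumed to be an ordinary application array of packed positions):

/* Upload a client-side array into a buffer object once... */
GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices, GL_STATIC_DRAW);

/* ...then attribute pointers take a byte offset into the buffer
   instead of a raw client pointer. */
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, (const void*)0);
glEnableVertexAttribArray(0);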

V-man
04-03-2007, 02:37 AM
Originally posted by yooyo:
AFAIK, current OpenGL implementations on Windows save duplicates of all uploaded textures in system memory (or in virtual memory). Unlike D3D (where the app must respond to a D3DERR_DEVICELOST event), an OpenGL app doesn't have to bother with this. There is one case where losing resources may cause trouble: when an app maps a buffer (PBO or VBO), the pointer into video memory may not survive a resolution or color-depth change.
In D3D, it depends on how you created the resource. You can place it in the managed pool (D3DPOOL_MANAGED) so you won't have device-lost issues.

I think some things cannot be placed in the managed pool, like render targets and dynamic vertex buffers.
It's obvious that there would be a performance issue if the driver had to back everything up to RAM every time a texture was updated.
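
For reference, the D3D9 distinction looks like this (real API; 'device' is assumed to be a valid IDirect3DDevice9*, and error handling is omitted):

// A managed texture is backed by a system-memory copy and survives a lost device;
// a render target must live in the default pool and be recreated after Reset().
IDirect3DTexture9 *diffuse = NULL, *rt = NULL;

device->CreateTexture(512, 512, 0, 0, D3DFMT_A8R8G8B8,
                      D3DPOOL_MANAGED, &diffuse, NULL);   // survives device loss

device->CreateTexture(512, 512, 1, D3DUSAGE_RENDERTARGET, D3DFMT_A8R8G8B8,
                      D3DPOOL_DEFAULT, &rt, NULL);        // release + recreate on reset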