OpenGL 3 Updates

Khronos_webmaster
10-30-2007, 08:56 AM
You understandably want to know where the OpenGL 3 specification is. I have good news and some bad news. First the bad news. Obviously, the specification isn't out yet. The OpenGL ARB found, after careful review, some unresolved issues that we want addressed before we feel comfortable releasing a specification. The good news is that we've greatly improved the specification since Siggraph 2007, added some functionality, and fleshed out a whole lot of details. None of the issues we found is earth-shattering in nature, but we did want them discussed and resolved to make absolutely sure we are on the right path, and we are. Rest assured we are working as hard as we can on getting the specification done. The ARB meets 5 times a week, and has done so for the last two months, to get this out to you as soon as possible. Getting a solid specification put together will also help us with the follow-ons to OpenGL 3: OpenGL Longs Peak Reloaded and Mount Evans. We don't want to spend time fixing mistakes made in haste.

Here's a list of OpenGL 3 features and changes that we decided on since Siggraph 2007:
- State objects can be partially mutable, depending on the type of the state object. These state objects can still be shared across contexts. This helps in reducing the number of state objects needed in order to control your rendering pipeline. For example, the alpha test reference value is a candidate to be mutable.
- We set a *minimum* bar required for texturing and rendering. This includes:
  - 16 bit floating point support is now a requirement for textures and renderbuffers. Supporting texture filtering and blending is still optional for these formats.
  - S3TC is a required texture compression format.
  - Interleaved depth/stencil is a required format for FBO rendering.
  - At least one GL3-capable visual or pixel format must be exported which supports front-buffered rendering.
- OpenGL 3 will not have support for the GL_DOUBLE token. This means it will not be possible to send double precision vertex data to OpenGL.
- A format object has to be specified per texture attachment when a Program Environment Object is created. This helps minimize the shader re-compiles the driver might have to do when it discovers that the combination of shader and texture formats isn't natively supported by the hardware.
- GL 3 will only cache one error, and that is the oldest error that occurred.
- The OpenGL pipeline will be in a valid state once a context is created. Various default objects, created as part of the context creation, will have reasonable default values. These values are such that a simple polygon will be drawn into the window system provided drawable without having to provide a vertex array object, vertex shader or fragment shader.
- GLSL related changes:
  - GLSL 1.30 will support a #include mechanism. The actual shader source for the #include is stored in a new type of object, a "text buffer" object. A text buffer object also has a name property, which matches the string name specified in a #include directive.
  - Legacy gl_* GLSL state variables are accessible through a common block.
More details will follow soon in an upcoming OpenGL Pipeline newsletter.

Barthold Lichtenbelt
OpenGL ARB Working Group chair

Zengar
10-30-2007, 09:07 AM
To make it short, we shouldn't expect the spec until around the new year, and the first implementations even later... Still, it is good that they are trying to make a solid product. I guess I can wait.

Mars_999
10-30-2007, 09:18 AM
Hey, it's better late than never! And if it takes a bit longer to get something that is done correctly, then yes, let's wait.

Why no double vertex data? Is this for speed reasons only? I see that with DX10.1 the precision of float data will have to be tighter on that hardware... Will this be good enough for everyone? From games to scientific purposes? I am not clear on the #include directive: is this a header file where your shaders are coded?

e.g.
#include "lightingVS.xxx"
#include "lightingFS.xxx"
#include "lightingGS.xxx"

thanks for the update!!!

Zengar
10-30-2007, 09:28 AM
Double data never made sense, as there are no implementations that actually use it. Something like this should be an extension. So IMHO it is a good move.

As far as I understood, the include works just like in C (it copy-pastes text from another file), only that it works with named buffers instead of files. A very elegant solution!

barthold
10-30-2007, 09:37 AM
Remember, OpenGL 3 will be implementable on currently shipping hardware. The vast majority of that HW does not natively support double precision vertex data. In GL 2, where you can send double vertex data to the GL, that data will likely get converted to floats by the driver or HW. Future hardware might fully support double vertex data, and then support for it can be added back in a future OpenGL 3.x version.

Barthold

Y-tension
10-30-2007, 10:15 AM
I'm sure you're doing the best you can and surely these things require time. We'll keep using the already great feature set exposed by gl 2.1(plus extensions of course).
Everyone is on their toes to use the new api so make it as good as you can!

bobvodka
10-30-2007, 10:21 AM
Thanks for the update, it's good to know what is going on with it :)

Korval
10-30-2007, 10:34 AM
16 bit floating point support is now a requirement for textures and renderbuffers. Supporting texture filtering and blending is still optional for these formats.

Remind me of something. Do R300 (Radeon 9xxx) and NV30 (GeForce FX's) support 16-bit float textures/renderbuffers? I was hoping that these cards would be the minimum GL 3.0 compatible hardware.

Rob Barris
10-30-2007, 10:34 AM
The #include mechanism is intended to make partitioning of shader sources easier, but in GL3 is not going to provide a means to do either a callback or a real access to the file system.

What it does let you do is submit individual buffers of text and attach names to them separately, and then make reference to those named text buffers from another section of text using the #include statement. However, at compile time you must be able to provide all of the buffers to GL3 that can be referenced for that compile job; this allows more than one buffer to have the same name while still letting the application disambiguate everything at compile time.

edit: why do it this way? It keeps all the work to do #include resolution on the server side. It is potentially not as flexible as a callback mechanism, but as we know callbacks do not translate well to a client/server model.
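For readers who want a concrete picture: since the GL3 entry points themselves aren't public yet, here is a small client-side sketch in plain C of the same copy-paste semantics, resolving #include "name" against a table of named source strings before handing the result to glShaderSource. Every name in it (TextBuffer, g_buffers, resolve_includes, "lighting.glsl") is made up for illustration; GL3 would do this resolution on the server side instead.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Client-side stand-in for GL3 "text buffer" objects: each entry pairs a
   name with a chunk of GLSL source. All names here are made up. */
typedef struct { const char *name; const char *text; } TextBuffer;

static const TextBuffer g_buffers[] = {
    { "lighting.glsl",
      "vec3 lambert(vec3 n, vec3 l, vec3 c) { return c * max(dot(n, l), 0.0); }\n" },
};

/* Expand lines of the form   #include "name"   by splicing in the buffer
   registered under that name, exactly like a C-preprocessor copy-paste.
   Assumes each buffer is included at most once; returns a malloc'd string. */
static char *resolve_includes(const char *src)
{
    size_t cap = strlen(src) + 1;
    size_t i;
    for (i = 0; i < sizeof(g_buffers) / sizeof(g_buffers[0]); ++i)
        cap += strlen(g_buffers[i].text);

    char *out = malloc(cap);
    out[0] = '\0';

    const char *line = src;
    while (*line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) + 1 : strlen(line);
        char name[128];

        if (sscanf(line, " #include \"%127[^\"]\"", name) == 1) {
            for (i = 0; i < sizeof(g_buffers) / sizeof(g_buffers[0]); ++i)
                if (strcmp(g_buffers[i].name, name) == 0)
                    strcat(out, g_buffers[i].text);   /* splice the named buffer in */
        } else {
            strncat(out, line, len);                  /* ordinary line: copy through */
        }
        line += len;
    }
    return out;   /* hand the expanded string to glShaderSource as usual */
}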

Stephen A
10-30-2007, 10:36 AM
16 bit floating point support is now a requirement for textures and renderbuffers. Supporting texture filtering and blending is still optional for these formats.

Remind me of something. Do R300 (Radeon 9xxx) and NV30 (GeForce FX's) support 16-bit float textures/renderbuffers? I was hoping that these cards would be the minimum GL 3.0 compatible hardware.
I'm 99% sure that R300 supports 16-bit float textures, but without filtering or blending. I never had a NV30, so I don't know about that.

Rob Barris
10-30-2007, 10:46 AM
By the way, for the next Pipeline there is going to be a segment about "GL2 to GL3 migration". I'd like to hear about specific areas of developer curiosity so we can try to cover those in some detail where possible; we could conceivably even air some of them briefly here and then in more detail in the Pipeline piece.

plasmonster
10-30-2007, 10:54 AM
That'd be great. I'm sure we'd all like to see some sample code and get a sneak peek at some of the more radical changes we're in for.

Thanks for the update and the good news!

k_szczech
10-30-2007, 10:54 AM
FP16:
GeForce FX, Radeon 9, Radeon X - no filtering, no blending
Radeon X1 - blending supported
GeForce 6, 7, 8, Radeon X2 - filtering and blending supported

So basically, if you implement HDR you have to provide one implementation for GeForce 6/7/8 and Radeon X2 and a separate implementation for Radeon X1.

In my game I perform multiple tests at the beginning to determine if FP16 blending/filtering is supported, if it runs reasonably fast, and if it crashes or not. Then I know which HDR implementation to use, if any.

This has been discussed many times, so I don't want to go off topic here. I hope life will be easier for us in future. Right now it's "test if it really works before you use it".
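For reference, a trimmed-down version of that kind of startup probe, written against the GL 2.x extensions that exist today (ARB_texture_float plus EXT_framebuffer_object, loaded here via GLEW). It only checks that an FP16 color attachment gives a complete FBO; the filtering/blending checks still need an actual draw-and-readback test on top, since GL2 has no query for them.

#include <GL/glew.h>   /* assumes GLEW (or equivalent) has loaded the extensions */

/* Returns non-zero if an RGBA16F texture can be used as an FBO color attachment. */
int fp16_render_target_usable(int w, int h)
{
    GLuint tex, fbo;
    GLenum status;

    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F_ARB, w, h, 0,
                 GL_RGBA, GL_FLOAT, NULL);

    glGenFramebuffersEXT(1, &fbo);
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
    glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                              GL_TEXTURE_2D, tex, 0);

    status = glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT);

    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);
    glDeleteFramebuffersEXT(1, &fbo);
    glDeleteTextures(1, &tex);

    return status == GL_FRAMEBUFFER_COMPLETE_EXT;
}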

bobvodka
10-30-2007, 11:00 AM
Yes, some way to ask what level of FP support is available would be vital imo.

ector
10-30-2007, 11:05 AM
State objects can be partially mutable, depending on the type of the state object. These state objects can still be shared across contexts. This helps in reducing the number of state objects needed in order to control your rendering pipeline. For example, the alpha test reference value is a candidate to be mutable.


Shouldn't alpha test be removed? You can easily simulate all possible alpha test modes using texkill functionality in the fragment shader.

Korval
10-30-2007, 11:05 AM
Yes, some way to ask what level of FP support is available would be vitial imo.

Asking questions. That's so DirectX ;)

In GL 3.0, you simply try to make a format object of the appropriate type. Part of the format object is a specification of what you intend to do with the images you create from it (bilinear filtering, anisotropic, etc). If, when you create this format, GL says, NO, well, you can't do it.


Shouldn't alpha test be removed?

No. It's much faster to have real alpha test than to do it in shaders. Plus, alpha test can be per render target (eventually). Plus, you can easily mix and match alpha test without having to write shaders for it.
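For anyone comparing the two paths, this is roughly what they look like in GL 2.x terms; the shader is held in a C string purely for illustration.

#include <GL/glew.h>

/* Shader-side version of the cut-off, expressed as an explicit discard. */
static const GLchar *alpha_kill_fs =
    "uniform sampler2D tex;\n"
    "void main() {\n"
    "    vec4 c = texture2D(tex, gl_TexCoord[0].xy);\n"
    "    if (c.a <= 0.5) discard;\n"
    "    gl_FragColor = c;\n"
    "}\n";

/* Fixed-function path: fragments with alpha <= 0.5 are rejected by dedicated
   per-fragment hardware, with no extra shader instructions. */
static void enable_fixed_function_alpha_test(void)
{
    glEnable(GL_ALPHA_TEST);
    glAlphaFunc(GL_GREATER, 0.5f);
}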

k_szczech
10-30-2007, 11:06 AM
It's not just that.
Try writing a vertex shader that's compatible with clip planes and works on all hardware supporting GLSL.
On GeForce you need to write to gl_ClipVertex, otherwise it won't work. On some Radeons you can't write to gl_ClipVertex - it causes software mode. :mad: I'm using #ifdef for that, as sketched below.
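Roughly how that #ifdef trick can be wired up; the vendor-string check is just one way to drive it, and the WRITE_CLIPVERTEX macro name is made up.

#include <GL/glew.h>
#include <string.h>

static const GLchar *clip_vs =
    "void main() {\n"
    "    vec4 eyePos = gl_ModelViewMatrix * gl_Vertex;\n"
    "#ifdef WRITE_CLIPVERTEX\n"
    "    gl_ClipVertex = eyePos;   /* needed on GeForce for user clip planes */\n"
    "#endif\n"
    "    gl_Position = gl_ProjectionMatrix * eyePos;\n"
    "}\n";

/* Prepend the #define only on NVIDIA drivers, so Radeons never see the
   gl_ClipVertex write that would push them into software mode. */
GLuint build_clip_plane_vs(void)
{
    const GLchar *sources[2];
    GLsizei count = 0;
    const char *vendor = (const char *)glGetString(GL_VENDOR);

    if (vendor && strstr(vendor, "NVIDIA"))
        sources[count++] = "#define WRITE_CLIPVERTEX 1\n";
    sources[count++] = clip_vs;

    GLuint vs = glCreateShader(GL_VERTEX_SHADER);
    glShaderSource(vs, count, sources, NULL);
    glCompileShader(vs);
    return vs;   /* compile status / info log checks omitted for brevity */
}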

Doh! You got me talking off-topic again ;)

Rob Barris
10-30-2007, 11:06 AM
FP16:
GeForce FX, Radeon 9, Radeon X - no filtering, no blending
Radeon X1 - blending supported
GeForce 6, 7, 8, Radeon X2 - filtering and blending supported

So basically if you implement HDR you have to provide one implementation for GeForce 6/7/8 and Radeon X2 and one separate implementation for Radeon X1.

In my game I perform multiple tests at beginning to determine if FP16 blending/filtering is supported, if it runs reasonably fast and if it crashes or not. Then I know which HDR implementation to use if any.

This has been discussed many times, so I don't want to go off topic here. I hope life will be easier for us in future. Right now it's "test if it really works before you use it".

GL3 will make this easier to assess by way of the format object mechanism as Korval noted. However the per-GPU-type outcomes you list above aren't likely to change, because the root issue is in the hardware.

k_szczech
10-30-2007, 11:09 AM
Yes, format objects are something I await with great anticipation. I hope they will be a sufficient solution not only for the supported/unsupported problem, but also for the software/hardware problem :)

Stephen A
10-30-2007, 11:09 AM
This has been discussed many times, so I don't want to go off topic here. I hope life will be easier for us in future. Right now it's "test if it really works before you use it".

Even worse, sometimes EXT variants work better than ARB ones or vice versa, and there really is no way to test at runtime which implementation works reliably. Format objects FTW!

Anyway, some possible discussion topics regarding migration issues:

1) How will a GL3 render context be created? Assuming that creation will still go through the wgl/glx/agl/cgl API's, how will one be able to distinguish between GL2 and GL3 contexts?

2) Regarding multisampling, will there be any changes to the (context creation/format query/context destruction/final context creation) round trip?

3) What are the changes in VBO creation, binding and drawing?

4) Which functions require the consumer to keep data on client memory? (For example, in GL2 vertex arrays are the responsibility of the consumer, which can be problematic in garbage collected environments - VBO's don't suffer from this issue).

5) This is possibly what I'm most interested in: Will the new .spec files be available, to ease creation of bindings for other programming languages? Any chance for the wgl/glx specs?

Rob Barris
10-30-2007, 11:12 AM
I can answer Stephen A's question #4 right here, there is no client side vertex storage in GL3. All vertex data goes in buffer objects.

glDesktop
10-30-2007, 11:15 AM
I would like to know if we have something in OpenGL 3 that replaces "wglSwapIntervalEXT".

It would be very good to have a simple cross platform solution.

Much appreciated.
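For reference, this is how it is done today on Windows with WGL_EXT_swap_control; the cross-platform part is exactly what is missing, since GLX uses glXSwapIntervalSGI instead. The PFNWGLSWAPINTERVALEXTPROC typedef is written out here rather than taken from wglext.h.

#include <windows.h>
#include <GL/gl.h>

typedef BOOL (WINAPI *PFNWGLSWAPINTERVALEXTPROC)(int interval);

/* Fetch the WGL_EXT_swap_control entry point at runtime and enable vsync.
   Returns 0 if the extension isn't exposed by the driver. */
static int enable_vsync(void)
{
    PFNWGLSWAPINTERVALEXTPROC wglSwapIntervalEXT =
        (PFNWGLSWAPINTERVALEXTPROC)wglGetProcAddress("wglSwapIntervalEXT");
    if (!wglSwapIntervalEXT)
        return 0;                 /* extension not available */
    return wglSwapIntervalEXT(1); /* swap at most once per vertical retrace */
}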

ector
10-30-2007, 11:16 AM
In GL 3.0, you simply try to make a format object of the appropriate type. Part of the format object is a specification of what you intend to do with the images you create from it (bilinear filtering, anisotropic, etc). If, when you create this format, GL says, NO, well, you can't do it.

Seems like a slower, more annoying version of asking questions to me.

Can I do it this way? No.
Can I do it this way? No.
Can I do it this way? No.
Can I do it this way? No.
Can I do it this way? Yes, and you just did it.
vs
What does the hardware support? This and that.
Okay, I do it that way. Fine.




Shouldn't alpha test be removed?

No. It's much faster to have real alpha test than to do it in shaders. Plus, alpha test can be per render target (eventually). Plus, you can easily mix and match alpha test without having to write shaders for it.

How is it faster? The card has to run the entire shader anyway to get the alpha value to test on.

And what's stopping the hardware vendors from extending shaders to support per-target texkill? It's easier to add stuff to shader versions than to add new APIs.

bobvodka
10-30-2007, 11:18 AM
Yes, some way to ask what level of FP support is available would be vitial imo.

Asking questions. That's so DirectX ;)

In GL 3.0, you simply try to make a format object of the appropriate type. Part of the format object is a specification of what you intend to do with the images you create from it (bilinear filtering, anisotropic, etc). If, when you create this format, GL says, NO, well, you can't do it.

heh, my bad... for some reason I just totally blanked on the new system o.O

Korval
10-30-2007, 11:45 AM
Seems like a slower, more annoying version of asking questions to me.

Nonsense. It's much more flexible.

By taking fully formed formats and returning yes/no, you have the ability to forbid things that are not easily quantifiable.

Take simple texture size. Yes, there's a maximum size, but a 4096x4096 128-bit float texture takes up 256MB. You can expose a question like "what's the maximum texture size I can create?", but if the card only has 256MB, you've just told a lie if you said 4096. It's the combination of the size and the format that makes it impossible.

The GL method is much better, since you can check specific features that you know may be turned down ahead of time.


How is it faster? The card has to run the entire shader anyway to get the alpha value to test on.

Because alpha test is part of the hardware.


And what's stopping the hardware vendors from extending shaders to support per-target texkill?

Because it's redundant in the face of alpha test?

You may as well say that you should do depth testing in shaders simply because you can. That alpha test hardware is there, it's fast, and it doesn't require a shader conditional followed by a discard. There's no reason not to expose it.

Trenki
10-30-2007, 12:44 PM
- Legacy gl_* GLSL state variables are accessible through a common block.


Why is that? I thought the legacy stuff like modelview and projection matrices, light parameters etc. would go away!
I would prefer if these would disappear.

[ www.trenki.net (http://www.trenki.net) | vector_math (3d math library) (http://www.trenki.net/content/view/16/36/) | software renderer (http://www.trenki.net/content/view/18/38/) ]

JoeDoe
10-30-2007, 01:07 PM
As for me, I would like a simple drawing mechanism via glBegin/glVertex2(3)/glEnd. It's a very convenient way of drawing, for example, a fullscreen quad with a texture on it. I do not like the idea that I must create a vertex buffer, fill it with the appropriate vertex data, and store this object as a member in my classes. Or, for example, if I need to change the size of the fullscreen quad, I must re-fill the buffer again and so on...

Don't kill the convenient glBegin/glEnd approach; keep it for convenience's sake alone!

PkK
10-30-2007, 01:53 PM
- S3TC is a required texture compression format


I think that's a really bad idea if you want to release OpenGL 3 before 2017.
S3TC is patented. It would be impossible to implement OpenGL 3 without a patent licence.
Example: this means that Mesa can't do OpenGL 3, and thus all the Linux drivers based on it won't support OpenGL 3.

If OpenGL 3 requires S3TC, the free software community will have to create its own 3D API instead of using OpenGL.

Philipp

Overmind
10-30-2007, 01:55 PM
I think it's a bit late for suggesting big changes, the spec is supposed to be nearly finished ;)


It's the combination of the size and the format that makes it impossible.

Sometimes it's even worse: it's not only the format, but the combination of format and usage. For example, some hardware might support 16 bit float textures, but only without filtering and mipmapping. Now, if you simply had a single flag "supports 16 bit float", you could not answer yes or no, because in certain specific cases the implementation does actually support 16 bit float.

You can't possibly enumerate all allowed combinations, so the approach of just trying to create the object and catching the error is the only sensible way this works. Everything else will either restrict the use more than necessary, or force software emulation.

Just look at the current situation with NPOT. Some hardware doesn't support full NPOT, but NPOT with some usage restrictions works. Now the vendors can choose: either not expose NPOT at all, or expose NPOT and provide a software fallback for when the (unspecifiable) conditions are not satisfied.

Stephen A
10-30-2007, 01:55 PM
As for me, I would like simple drawing mechanism via glBegin/glVertex2(3)/glEnd. It's very convenient way for drawing, for example, fullscreen quad with texture on it. I do not like an idea, that I must create a vertex buffer, fill it with an appropriate vertex data, and store this object as member in my classes. Or, for example, if I need to change a size of fullscreen quad, I must re-fill the buffer again and so on...

Don't kill convenient glBegin/glEnd approach, only for convenience!
It's not just convenience. GL2 has at least 5 ways of uploading vertex data (immediate mode, vertex arrays, display lists, compiled vertex arrays, vertex buffer objects), which is bad for driver writers and confusing for end-users.

Even in the case you mention, I'm not sure that declaring a VBO is less convenient. Immediate mode needs 10 function calls to define a textured quad, while a VBO needs a simple data array, 3 function calls to create it (generate a handle, bind it, upload data) and 1 to draw it. Plus, it's much more efficient - it seems like a win-win situation to me.
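To make that concrete, a minimal GL 2.x sketch of the VBO path for a textured fullscreen quad; resizing it later is a single glBufferSubData call rather than a full refill.

#include <GL/glew.h>

/* Interleaved x, y, s, t for a fullscreen quad in normalized device coordinates. */
static const GLfloat quad[] = {
    -1.f, -1.f, 0.f, 0.f,
     1.f, -1.f, 1.f, 0.f,
     1.f,  1.f, 1.f, 1.f,
    -1.f,  1.f, 0.f, 1.f,
};

GLuint create_quad_vbo(void)
{
    GLuint vbo;
    glGenBuffers(1, &vbo);                                             /* 1: handle */
    glBindBuffer(GL_ARRAY_BUFFER, vbo);                                /* 2: bind   */
    glBufferData(GL_ARRAY_BUFFER, sizeof(quad), quad, GL_STATIC_DRAW); /* 3: upload */
    return vbo;
}

void draw_quad_vbo(GLuint vbo)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glVertexPointer(2, GL_FLOAT, 4 * sizeof(GLfloat), (const void *)0);
    glTexCoordPointer(2, GL_FLOAT, 4 * sizeof(GLfloat), (const void *)(2 * sizeof(GLfloat)));
    glDrawArrays(GL_QUADS, 0, 4);
    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
}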

PkK
10-30-2007, 02:04 PM
As for me, I would like simple drawing mechanism via glBegin/glVertex2(3)/glEnd. It's very convenient way for drawing, for example, fullscreen quad with texture on it. I do not like an idea, that I must create a vertex buffer, fill it with an appropriate vertex data, and store this object as member in my classes. Or, for example, if I need to change a size of fullscreen quad, I must re-fill the buffer again and so on...

Don't kill convenient glBegin/glEnd approach, only for convenience!

There's no place for that in OpenGL any more. Convenience stuff can go into GLU (and I hope it will instead of just disappearing).

Philipp

Korval
10-30-2007, 02:14 PM
I thought the legacy stuff like modelview and projection matrices, light parameters etc. would go away!

They are. But glslang compilers can still recognize these bits of state, should the user of legacy glslang code wish to use that legacy state.

What there won't be is actual context state for these. You have to provide the data as uniforms if you want to use them.

Humus
10-30-2007, 03:43 PM
No. It's much faster to have real alpha test than to do it in shaders.

No, it's faster to kill fragments in the shader. I'm not aware of any reason why alpha test should be faster. Alpha test forces the shader to be executed in full. With discard it can be optimized to various degrees. If you issue the discard at the top of the shader, the hardware can potentially early-out. I believe, for instance, the GeForce 8000 series does this. ATI cards do run the entire shader; however, at least all texture lookups are killed for dead fragments, which saves bandwidth. That's supported in the X1000 series and up IIRC (might be supported in the X800 generation too, don't remember exactly).

It's worth noting that DX10 has removed alpha test. I think it's a good candidate for removal from OpenGL as well.

Humus
10-30-2007, 03:58 PM
Because it's redundant in the face of alpha test?

Alpha test is the redundant feature. It's less flexible and offers no advantage over shader-controlled fragment kills. With alpha test you burn an output channel that could be used for something useful. Alpha test is a legacy from the fixed function days.


You may as well say that you should do depth testing in shaders simply because you can.

Except that kills z-optimizations and is incompatible with multisampling. There are no particular "alpha-test optimizations" that the hardware does, it's rather the case that alpha-test is more of a headache for the hardware.


That alpha test hardware is there, it's fast, and it doesn't require a shader conditional followed by a discard. There's no reason not to expose it.

The conditional could be removed if OpenGL exposed a clip() equivalent that's available in DX, if the compiler isn't smart enough to convert your conditional to a clip() under the hood anyway. Btw, I disagree with the "fast" part. It's slow and provided for compatibility with legacy applications. It's a good candidate for removal from hardware as well.

ZbuffeR
10-30-2007, 04:03 PM
- S3TC is a required texture compression format


I think that's a really bad idea if you want to release OpenGL 3 before 2017.
S3TC is patented. It would be impossible to implement OpenGL without a patent licence.
Example: This means that Mesa can't go do OpenGL 3 and thus all the Linux drivers based on it won't support OpenGL 3.

If OpenGL 3 requires S3TC the free software community will have to create it's own 3D API instead of using OpenGL.

Philipp

Indeed, S3TC is a strange requirement. Who is actually using it, by the way? Honest question.

ector
10-30-2007, 04:23 PM
Indeed, S3TC is a strange requirement. Who is actually using it by the way ? I am sincere.

Every major game out there for the last 5 years. The performance increase it offers over uncompressed textures is pretty dramatic, and as a bonus consumes less memory. The slight quality loss is very often worth it.

Korval
10-30-2007, 05:04 PM
Indeed, S3TC is a strange requirement.

Not really.

Take a good look at this forum. How many developers mistakenly regard extensions as something to be avoided where possible? They look at extensions as conditionally supported, even those that are commonly supported by virtually all implementations.

S3TC is basic functionality, whether patented or not. It is expected of any hardware, and not having it be a core feature suggests to people that it is not widely available. This is not the image that GL 3.0 needs to project.


Every major game out there for the last 5 years.

Oh, I think longer than that.

S3TC started showing up around the transition to the GeForce 2. And it was such a popular idea that Quake 3 was patched to allow for it. Even today, most games use S3TC where they can. Plus, S3TC artifacts aren't nearly as noticeable on higher-resolution textures.

Korval
10-30-2007, 05:31 PM
A format object has to be specified per texture attachment when a Program Environment Object is created. This helps minimize the shader re-compiles the driver might have to do when it discovers that the combination of shader and texture formats isn't natively supported by the hardware.

Hmmm.

Now, I understand perfectly well why this is done. Instances of programs (I really wish you would call them instances rather than "environment objects") may provoke recompilation based on what format of texture you use. And you don't want to allow for the possibility of provoking recompilation during times that aren't conceptually setup (like using a program instance rather than simply building one).

That being said... there's got to be a better way. The problem here is simple: I have to create an entirely new instance if I want to switch a texture from RGBA32 to, say, S3TC. The problem is that things that don't make a difference with regard to program recompilation are being swept up with things that do.

Maybe you could have it so that attaching a texture object/sampler pair to a program instance slot can fail. That way, you can mix and match formats so long as it isn't "too different", which is implementation defined.

V-man
10-30-2007, 05:43 PM
Because it's redundant in the face of alpha test?

Alpha test is the redundant feature. It's less flexible and offers no advantage of shader controlled fragment kills. With alpha test you burn an output channel that could be used for something useful. Alpha test is a legacy from the fixed function days.


You may as well say that you should do depth testing in shaders simply because you can.

Except that kills z-optimizations and is incompatible with multisampling. There are no particular "alpha-test optimizations" that the hardware does, it's rather the case that alpha-test is more of a headache for the hardware.


That alpha test hardware is there, it's fast, and it doesn't require a shader conditional followed by a discard. There's no reason not to expose it.

The conditional could be removed if OpenGL exposed a clip() equivalent that's available in DX, if the compiler isn't smart enough to convert your conditional to a clip() under the hood anyway. Btw, I disagree with the "fast" part. It's slow and provided for compatibility with legacy applications. It's a good candidate for removal from hardware as well.

Alpha test and fragment kill are not the same.
With alpha test, you have control over the function (AlphaFunc) and can provide a reference alpha.

With discard, it depends on the GPU, but in the first spec that came out (ARB_fragment_shader), a fragment was killed if a component was negative.

Is alpha func actually done in hw, or is it simulated with shaders?

praetor_alpha
10-30-2007, 08:02 PM
Why is DOUBLE being dropped? We know that future hardware will be able to support it, regardless. Isn't OpenGL supposed to be a forward looking standard?

Will EXT_framebuffer_object be core?

Nevertheless, I am looking forward to GL3, and see how much it parallels OpenGL ES 2.

Toji
10-30-2007, 09:47 PM
I don't have much to say about the arguments over alphatesting/texture formats/etc (though I will say that I think glVertex3f() and crew need to go away. No point in having them anymore) but I would like to give a big thank-you to the guys at Khronos for not leaving us in the dark any longer! Even if it has been delayed it's great to know that it's only because you're making it better!

PkK
10-30-2007, 11:52 PM
Indeed, S3TC is a strange requirement. Who is actually using it by the way ? I am sincere.

Every major game out there for the last 5 years. The performance increase it offers over uncompressed textures is pretty dramatic, and as a bonus consumes less memory. The slight quality loss is very often worth it.

If you want the performance increase and save graphics memory you can use a generic compressed format and let the card handle the details. No need to explicitly require S3TC support.
Recent research has created compression schemes which give better quality than S3TC, but don't need more memory (ETC2 is an example).

Philipp
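GL has actually had that mechanism since ARB_texture_compression: request a generic compressed internal format and then query which concrete format the driver picked, roughly like this (the trade-offs raised in the next few posts still apply).

#include <stdio.h>
#include <GL/glew.h>

/* Upload raw RGBA, ask the driver to compress it however it sees fit,
   then report whether it compressed and which concrete format it chose. */
GLuint upload_generic_compressed(const void *rgba, int w, int h)
{
    GLuint tex;
    GLint compressed = GL_FALSE, internal = 0;

    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGBA, w, h, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, rgba);

    glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_COMPRESSED, &compressed);
    glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_INTERNAL_FORMAT, &internal);
    printf("compressed: %d, internal format: 0x%04X\n", compressed, (unsigned)internal);
    return tex;
}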

Stephen A
10-31-2007, 12:52 AM
S3TC *is* a strange requirement because, as was said previously, the format is patented. To me, the decision sounds more like politics getting in the way. Mesa3d will probably work around the issue by substituting S3TC with a different compression scheme behind the scenes.

As PkK said, the best solution would be to support compressed formats and let the driver choose the best implementation (S3TC, ETC2 etc), according to the hardware's capabilities.

bobvodka
10-31-2007, 02:16 AM
Why is DOUBLE being dropped? We know that future hardware will be able to support it, regardless. Isn't OpenGL supposed to be a forward looking standard?


The problem with leaving double in, however, is that it looks like the hardware can do it, and right now afaik no hardware can (not even the G92-based chips, and probably not the new AMD ones either when they appear). That puts us a good year away from it being useful, and in the meantime it would just confuse matters for an API which is meant to be closer to how the hardware works.

Double will be back I'm sure, just not for a year at least, during which time we are due at least 2 more updates to GL (Longs Peak Reloaded and Mt. Evans) - plenty of time to add it back in.



Will EXT framebuffer object be core?


FBO like functionality is at the core of GL3 for rendering.

sqrt[-1]
10-31-2007, 03:46 AM
As PkK said, the best solution would be to support compressed formats and let the driver choose the best implementation (S3TC, ETC2 etc), according to the hardware's capabilities.

That would be a fatal choice. It would mean you would have to ship your app with all textures uncompressed and compress at load/install time.

Do you have any idea how long it takes to do quality S3TC compression? On our current app it would take hours. It is second only to shader pre-compiling (but that is another discussion).

Also, each compression scheme has different trade-offs - you would have to specify stuff like "is this a color map?", "is this a normal map?", "is this a height map?" to enable the driver to select an appropriate compression format.
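Which is why the usual pipeline is offline compression plus a direct upload of the pre-baked blocks with glCompressedTexImage2D; a minimal DXT5 upload, assuming EXT_texture_compression_s3tc is present, looks like this.

#include <GL/glew.h>

/* Upload mip level 0 of an offline-compressed DXT5 texture.
   DXT5 stores 16 bytes per 4x4 block. */
GLuint upload_dxt5(const void *blocks, int w, int h)
{
    GLuint tex;
    GLsizei size = ((w + 3) / 4) * ((h + 3) / 4) * 16;

    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glCompressedTexImage2D(GL_TEXTURE_2D, 0,
                           GL_COMPRESSED_RGBA_S3TC_DXT5_EXT,
                           w, h, 0, size, blocks);
    return tex;
}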

Stephen A
10-31-2007, 05:15 AM
As PkK said, the best solution would be to support compressed formats and let the driver choose the best implementation (S3TC, ETC2 etc), according to the hardware's capabilities.

That would be a fatal choice. That would mean you would have have to ship your app with all textures uncompressed and compress at load/install time.
Good point.


Also each compression scheme has different trade offs - you would have to specify stuff like "is this a color map?" "is this a normal map"? "is this a height map?" to enable the driver to select a appropriate compression format.
I am not sure what you are trying to say. Decisions like selecting the correct format for normal maps, height maps, font textures etc are simply unavoidable.

Edit: Your name is breaking forum quotes ;)

Jan
10-31-2007, 05:34 AM
I like alpha tests and had already feared they might be removed. Good to know they will stay.


What I'd like to know more about is the "default state" as mentioned above. It said you can "render a simple polygon without specifying vertex buffers...". I would like to know more about what the default state will allow. E.g. is simple texturing (not multi-texturing) available, too? (I doubt it.)

Also, I'd like to know everything about context creation (multisampling, adaptive antialiasing, etc.) and how the "drawable" interacts with the window system. Using FBOs all the time and then only "presenting" the result to the window system is what I would like to do in the future.

And I'd like to know what general information one can query from the system. For example gfx memory, vendor, renderer and, of course, whether the extension system has been modified.

A list of which current extensions will be "core" in 3.0 would be nice. In general, a "minimal requirements" listing.


For the PR department:
I think for OpenGL 3 to get a bit more excitement outside of these forums, it would be great if there were a new logo (maybe the old one, but revamped). It should be available in several formats and at low to very high resolutions, for people to use it on websites, in games (option menus...), etc.

A small video featuring an animated OpenGL logo would be very cool, so that game developers can play it at program startup (like the nVidia, ATI, etc. logos).

This would allow people to show off that they are using OpenGL 3, and thus it would be more visible to the public.


Jan.

Chris Lux
10-31-2007, 06:21 AM
Also, i'd like to know everything about context-creation (multisampling, adaptive antialiasing, etc.) and how the "drawable" interacts with the window-system. Using FBOs all the time and then only "presenting" the result to the window-system is what i would like to do in the future.
This too is what interests me the most. I also think that framebuffer creation and handling should be passed to the GL, with only the result blitted onto the OS surface.


For the PR department:
I think for OpenGL 3 to get a bit more excitement, outside of these forums, it would be great, if there will be a new logo (maybe the old one, but revamped). It should be available in several formats and at low to very high resolutions, for people to use it on websites, in games (option-menus...) etc.

A small video featuring an animated OpenGL logo would be very cool, so that game-developers can play it at program-startup (like the nVidia, ATI, etc. logos).

This would allow people to show off, that they are using OpenGL 3 and thus it will be more present to the public.
great idea! :D

k_szczech
10-31-2007, 08:27 AM
A small video featuring an animated OpenGL logo
Maybe we should start a contest for GL3 static and animated logo in a separate thread then?
Now let me think... Should we present OpenGL as something bright and futuristic (bright future?) or something dark and powerful... ;)

Korval
10-31-2007, 08:53 AM
What i'd like to know more about is the "default state" as mentioned above.

I don't think the whole "default state" thing exists to be usable. That is, you aren't expected to actually use it in any real application. It is there to make sure that the context is viable from GL startup.

The only piece of default state that you might find useful is the default framebuffer object.

Jan
10-31-2007, 09:08 AM
k_szczech: Yes, the idea to make it into a contest came to my mind, too. Depends on the resources the ARB can dedicate to such a task. Though, it might be difficult to find skillful artists among programmers...

Korval: I agree, it will certainly be that way. But i'd still like to hear the details from the ARB in the next pipeline newsletter.

Jan.

Zengar
10-31-2007, 10:05 AM
Also each compression scheme has different trade offs - you would have to specify stuff like "is this a color map?" "is this a normal map"? "is this a height map?" to enable the driver to select a appropriate compression format.
I am not sure what you are trying to say. Decisions like selecting the correct format for normal maps, height maps, font textures etc are simply unavoidable.


I think what he wanted to say is that the compression algorithm may vary depending on data interpretation.

MZ
10-31-2007, 10:28 AM
Yes, some way to ask what level of FP support is available would be vitial imo.

Asking questions. That's so DirectX ;)

The DirectX way looks great on paper, but real D3D games resort to checking card/vendor IDs and using them to pick graphics settings from a prepackaged database. Hell, even Microsoft's own games do it, so we can take it as a testament to how "future proof" the DirectX way is.

Stephen A
10-31-2007, 10:29 AM
Also each compression scheme has different trade offs - you would have to specify stuff like "is this a color map?" "is this a normal map"? "is this a height map?" to enable the driver to select a appropriate compression format.
I am not sure what you are trying to say. Decisions like selecting the correct format for normal maps, height maps, font textures etc are simply unavoidable.

I think what he wanted to say is that the compression algorithm may vary depending on data interpretation.

Yes, that's why decisions like this are unavoidable :) At some point, you'll have to decide which compression format to use (if any) for each asset. I don't see how S3TC (as a baseline format) will affect that decision.

Korval
10-31-2007, 10:50 AM
DirectX way looks great on paper

No it doesn't. As I pointed out in the next paragraph.

Further, I was making a joke ;)

Zengar
10-31-2007, 11:20 AM
@Stephen A: S3TC is basically a decompression format, not a compression algorithm. Compression quality depends on the compression algorithm (and that may vary depending on the data type). I would gladly give a real-world example, but my experience with S3TC is rather limited. But imagine that the quality of r and g is more important to you than the quality of b. You can actually write a compression algorithm that works this way, but you can't count on automatic driver compression being your best choice.

santyhamer
10-31-2007, 01:20 PM
OpenGL 3 will not have support for the GL_DOUBLE token

That's not a problem really. Double precision could be added using a future extension.



S3TC is a required texture compression format

I still prefer to pass raw RGBA data and let the driver compress with the best format available (given some hints). With things like CUDA the texture compression can be done very fast, so the loading speed won't suffer too much... just one bad thing: the artists could not preview the final texture quality... I think that is where S3TC can help.



16 bit floating point support is now a requirement for textures and renderbuffers. Supporting texture filtering and blending is still optional for these formats.

Ok, but please add some function like DX has to know if blending/filtering is supported for a specific texture format. Not sure if OGL3 is going to use the software fallback in case something is not supported (I heard not, which is good... better to emit an error... and add a debug layer + emulation layer like DX).

I can't wait to see the pipeline news!

Korval
10-31-2007, 02:34 PM
Ok, but please add some function like DX has to know if blending/filtering is supported for a specific texture format.

Did you actually read the thread? Or anything that's been written about GL 3.0? We've gone over this a lot, and it's been confirmed by the ARB as to how this will work.

Humus
10-31-2007, 03:24 PM
Alpha test and fragment kill is not the same.
With alpha func, you have control over the function (AlphaFunc) and can provide a reference alpha.

It's not the same, but everything you can do with alpha test can be done with discard, and vice versa.


With discard, it depends on the GPU, but the first spec that came out (ARB_fragment_shader), a fragment was killed if a component was negative.

What depends on the GPU? Fragment kills are well defined.


Is alpha func actually a hw or is it simulated with shaders?

Still hardware, but it wouldn't surprise me if it would be removed in say DX11 or DX12 hardware.

Humus
10-31-2007, 03:27 PM
But imagine, that the quality of r and g is more important to you than the quality of b. You can actually write a compression algorithm that would work this way, but you can't hope on automatic driver compression to be your best choice.

And in fact, you can do this already in Compressonator. You can specify the weighting between the channels. I've only needed to use that once, but boosting importance of blue removed the banding in a blue gradient for one texture.

sqrt[-1]
10-31-2007, 04:15 PM
I am not sure what you are trying to say. Decisions like selecting the correct format for normal maps, height maps, font textures etc are simply unavoidable.



What I meant was that if you are letting the driver decide the compression, it would need lots more info to decide what format to use. Say you have an alpha channel that is only black or white: you could use a DXT1A-type format, but a driver cannot make these decisions - only you know how the textures will be used and what trade-offs can be made.

So yes, a decision on a compression format still needs to be made, but it should be your decision, not the driver's.

Korval
10-31-2007, 04:42 PM
It's not the same, but everything you can do with alpha test can be done with discard, and vice versa.

Not true. The thing about an alpha test is that it is an alpha test. That is, it tests alpha and determines pass-fail. Which means if the fragment passes, this is the alpha that will be used by the per-sample operations. Period.

Discard can be used on an arbitrary condition. So if you don't discard, the alpha can be whatever you want it to be.


Still hardware, but it wouldn't surprise me if it would be removed in say DX11 or DX12 hardware.

You'll see alpha-test go away when you see depth tests go away. IE: never. It's minuscule hardware for a good performance gain.

pudman
10-31-2007, 07:27 PM
That is, it tests alpha and determines pass-fail. Which means if the fragment passes, this is the alpha that will be used by the per-sample operations. Period.

Would it make sense to support GLSL semantics for a construct such as this? I mean, would there be times when programmatically you'd want some value (other than alpha) to "be used by the per-sample operations" based on a condition (or strictly discard)?

Why not expand the alpha test to be more generic, in line with the programmatic model?


It's minuscule hardware for a good performance gain.

But have there been other "minuscule hardware" improvements that, as a whole, if removed would simplify chip design?

Maybe this would be in line with a Mount Evans improvement? Getting rid of even more fixed functionality?

tranders
10-31-2007, 07:53 PM
Seems like a slower, more annoying version of asking questions to me.

Nonsense. It's much more flexible.

By taking fully formed formats and returning yes/no, you have the ability to forbid things that are not easily quantifiable.

Hit-and-miss algorithms are highly inefficient and are definitely NOT flexible. GL3 should provide sufficient query capabilities (prior to actually attempting to create an object) so an application can pre-select a broad range of attributes that will more than likely succeed, before committing any graphics resources.

I would only hope that the return is more than just yes/no and actually tells the caller why the format can't be created (e.g., "Insufficient graphics memory", etc.).

Korval
10-31-2007, 08:51 PM
Hit and miss algorithms are highly inefficient and are definitely NOT flexible.

I don't particularly know what you mean by "inefficient" (we are talking about start-up code here), but I disproved your "not flexible" claim by showing you a place where object creation is clearly more explicit at exposing limitations than simple queries are.


GL3 should provide sufficient query capabilities (prior to actually attempting to create an object) so an application can pre-select a broad range of attributes that more than likely will succeed prior to beginning any committal of graphics resources.

First, a format object is not a "committal of graphics resources" by any reasonable measure. It's not even an object the actual GPU recognizes.

Second, even if you could query things, what would you do with that information? I mean, your format object creation code still needs to have fallbacks (as I described previously), so you may as well make format object creation your form of querying.


I would only hope that the return is more than just yes/no and actually tells the caller why the format can't be created (e.g., "Insufficient graphics memory", etc.)

After thinking about it for a while, I came to a certain conclusion: it just doesn't matter.

What exactly would you do with information like "the texture size is too big, considering the other format parameters?" I mean, is your format creation code going to be a tree of format possibilities? Where if it doesn't work one certain way, you try something else compared to if it didn't work a different way?

No. Odds are, it's going to be linear. A is the primary format. If A fails, try B. If B fails, try C. If C fails, there aren't any more fallbacks, so assert or throw an exception or something. It's really that simple.

So unless you can explain, with reasonable code, exactly what you intend to do with such information, I would have to say that it is superfluous. It's also the reason why FBO has a dozen different completeness failure codes for well-defined reasons, but only one catch-all for implementation-specific ones.

Komat
11-01-2007, 01:42 AM
You'll see alpha-test go away when you see depth tests go away. IE: never. It's minuscule hardware for a good performance gain.
It is already gone from the DX10 API. As Humus pointed out, the discard operation has both greater flexibility and, even on some current implementations, performance better than or equal to the alpha test. There is no reason for the hw to waste transistors on a dedicated alpha testing unit when the driver can append a test&discard equivalent to the shader and get a possible optimization bonus from that.

Zengar
11-01-2007, 01:49 AM
Korval, you can't compare depth test and alpha test. Alpha test is a hack and absolutely superfluous on GPUs with fast discard. Actually, I won't be surprised if the newest cards emulate it with shaders...

Eric Lengyel
11-01-2007, 02:43 AM
- Interleaved depth/stencil is a required format for FBO rendering


Thank you! That's right, don't let the slackers at a certain company (that shall go unnamed, but we all know who I'm talking about) skimp on FBOs any more because depth/stencil was left out of the original extension spec. I still have to support pbuffers because of them. >-(

tranders
11-01-2007, 10:06 AM
I don't particularly know what you mean by "inefficient" (we are talking about start-up code here), but I clearly disproved your "not flexible" thing by showing you a place where object creation clearly proved more explicit at showing limitations than simple queries.
Being explicit does not imply efficiency.

I don't know how many different options are available to the format creation request, but the total number of possible failures is N-factorial (n!). A linear search over a permuted range of options is in no way efficient -- no matter how you look at it. Also, it is not always possible to perform these operations at startup.


What exactly would you do with information like "the texture size is too big, considering the other format parameters?" I mean, is your format creation code going to be a tree of format possibilities? Where if it doesn't work one certain way, you try something else compared to if it didn't work a different way?
If I know beforehand that a particular filtering mode is not supported, why should I attempt to create a format only to have it fail? That would imply a decision tree. Very efficient. I should be able to determine from the API what my limits are without having to discover them manually or infer them from some bizarre series of failures. While fallbacks are required for any implementation, they should be the exception - not the rule.

Programming by exception is NOT exceptional programming.

Korval
11-01-2007, 10:28 AM
It is already gone from DX10 api.

Well that's good for them. Personally, I would rather that IHVs not have to deduce that I'm doing an alpha test from my glslang code. These are people who, after all, cannot even write a decent C-style language compiler after 2.5 years.

I don't trust them to detect that I'm doing an alpha test at the end and modify the pipeline accordingly.


A linear search over a permuted range of options is in no way efficient

So here's an idea: don't do that!

I mean, it's not like you don't have images on-disc in a particular format that you were planning to use this format with. You know exactly what this particular format needs to do. You know what your minimum requirements are for that format. So most of that range of options is totally useless for creating the format of interest.

Not only is a "linear search over a permuted range of options" inefficient, it is a broken algorithm. It may well create a format that you cannot use with those images.

So instead you lay out a series of format choices, from the least likely to succeed to the minimum. More than likely, you only have 1 fallback, but I could see 2 or 3 in some cases. And you try them. If the minimum fails, then the format can't be created and you exit.

For example, let's say you have a format that you intend to use for RGBA floating-point render targets. But it needs to support framebuffer blending. So, the minimum format you can accept is FP16 with blending. The first one you try (if you want the precision and can accept the performance loss) is FP32 with blending. If that fails, then you try FP16. If that fails, you raise an error.

Rather than saying, "I want to implement my format creation algorithm like this," you should be looking at GL 3.0 and using that to inform how your format creation algorithm will work.
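In GL 2.x terms that linear fallback chain falls out of EXT_framebuffer_object quite naturally: walk a short preference list and take the first internal format that yields a complete FBO. A sketch, where fbo_complete_with() is a made-up helper standing in for the attach-and-check probe sketched earlier in the thread:

#include <GL/glew.h>

/* Made-up helper: attaches a texture of the given internal format to a scratch
   FBO and returns non-zero if glCheckFramebufferStatusEXT reports completeness. */
extern int fbo_complete_with(GLenum internal_format, int w, int h);

/* Preference order for an HDR render target: FP32, then FP16, then plain RGBA8. */
static const GLenum hdr_formats[] = { GL_RGBA32F_ARB, GL_RGBA16F_ARB, GL_RGBA8 };

/* Returns the first internal format that yields a complete FBO, or 0 if none do. */
GLenum pick_hdr_format(int w, int h)
{
    int i;
    for (i = 0; i < (int)(sizeof(hdr_formats) / sizeof(hdr_formats[0])); ++i)
        if (fbo_complete_with(hdr_formats[i], w, h))
            return hdr_formats[i];
    return 0;   /* no acceptable fallback: report the failure and bail out */
}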


Also, it is not always possible to perform these operations at startup.

Maybe not, but it would only ever be so in a program like Maya, Max or some other tool where texture formats were controlled specifically by the user. And in that case, you would simply tell the user that the format is not available.


If I know before hand that a particular filtering mode is not supported

You misunderstood the question.

What you said was, "I would only hope that the return is more than just yes/no and actually tells the caller why the format can't be created." That is, if a format object fails creation, you can query the system as to a specific reason why it failed creation.

Therefore, the reason for failure will not be as simple as, "Filtering mode X is not supported." It will be "Filtering mode X is not supported when you use data format X of pixel size Y, etc". In short, it will not be something you can easily correct. It can't even be an enumeration; it would have to be some kind of string you would need to either parse or look at.

Komat
11-01-2007, 11:37 AM
Personally, I would rather that IHVs not have to deduce that I'm doing an alpha test from my glslang code. These are people who, after all, cannot even write a decent C-style language compiler after 2.5 years.

Why would they need to do something like that if discard is not worse than the alpha test? Imho it is more likely that they will modify your shader to add a discard to emulate the alpha test on hw where this might result in a performance gain, so you should be wary of using the alpha test :-)

Korval
11-01-2007, 12:12 PM
Why would they need to do something like that if the discard is not worse that the alpha test?

But it is. Discard takes shader time; at least one opcode's worth. Alpha test is free; it happens alongside the depth test. On hardware that doesn't support actually halting on discard, this will be slower than alpha test.

Overmind
11-01-2007, 12:14 PM
If I know before hand that a particular filtering mode is not supported, why should I attempt to create a format only to have it fail?

Because the format object contains all the information that is needed to determine whether the format is supported. And the other way round, the driver may need all the information from the format object to be able to accurately determine if the format is supported.

That's exactly the purpose of format objects, to provide the driver with all information it needs *before* actually rendering, so it can determine if (and how) it can be done once at the beginning, and not every time something is rendered.

Given these facts, it seems bad design to be required to feed the information to the driver twice, once to query, and once to actually create the object.


GL3 should provide sufficient query capabilities

GL3 does provide sufficient query capabilities. Format objects are the query capability of GL3. Anything less would not be sufficient.

You said it yourself, there are too many combinations to ever enumerate all possibilities, so how would your query look like?

The only way I see how this query could look is to feed every single format *and* usage parameter to GL and ask "can this be done?". And that's exactly how it's going to be, with the additional bonus that you get a format object if it can be done. After all, who would ask the system if something can be done if he doesn't intend to actually do it?

Everything less than that will either produce false positives or false negatives because of insufficient information. The driver would have to assume something it doesn't know. If it assumes too much, it leads to false positives, which in turn lead to software fallbacks or undocumented failures. Both are unacceptable. If it is conservative in its assumptions, it leads to false negatives, so not all features of the hardware can be used, which is also bad.

Note that I don't claim this is the most efficient solution. I claim it's the only solution that's actually working.

Please, tell me a better alternative to query these things, *and* show me that it will never produce false positives or false negatives, on any hardware (current and future).

Overmind
11-01-2007, 12:27 PM
Why would they need to do something like that if the discard is not worse that the alpha test?

But it is. Discard takes shader time; at least one opcode's worth. Alpha test is free; it happens alongside the depth test.


it's faster to kill fragments in the shader. I'm not aware of any reason why alpha test should be faster.

I'm going to believe Humus on this one, at least on ATI hardware ;)


On hardware that doesn't support actually halting on discard, this will be slower than alpha test.

Yes, and that's probably the reason why it's still in GL3, because GL3 is supposed to support this generation of hardware. But on current hardware, alpha test is not needed anymore.

It's bad timing for GL3; wait a year or two and we could get rid of the alpha test, too. But there is always going to be some feature that's nearly obsolete, but not quite. So it's always going to be bad timing for a major API revision, so let's just leave it in.

Even if alpha test vanishes from hardware, it should be trivial to emulate in a shader, so no real harm is done. And in another decade or two, we can make another API revision removing all this legacy stuff like textures and triangles :D

Korval
11-01-2007, 12:39 PM
we can make another API revision removing all this legacy stuff like textures and triangles

No need for an API revision. Alpha test lives in a certain state object. Just create an extension that exposes a different object (that can go in the same place in the context) that doesn't have an alpha test field.

Humus
11-01-2007, 03:08 PM
Not true. The thing about an alpha test is that it is an alpha test. That is, it tests alpha and determines pass-fail. Which means if the fragment passes, this is the alpha that will be used by the per-sample operations. Period.

This is trivial to implement with discard:

gl_FragColor = ...;
if (gl_FragColor.a < ref) discard;


Discard can be used on an arbitrary condition. So if don't discard, the alpha can be whatever you want it to be.

Well, you can implement any function you want, though, just by computing a boolean result into alpha. But yeah, you'll waste a channel with alpha test.


It's minuscule hardware for a good performance gain.

Could you elaborate on where you expect this performance gain to come from? Both ATI and Nvidia have advised developers not to use alpha test for ages. Not that discard is friendly to the pipeline either; these fragment discard features are unfortunately a necessary evil, but at least discard can early-out the shader.

Humus
11-01-2007, 03:12 PM
Thank you! That's right, don't let the slackers at a certain company (that shall go unnamed, but we all know who I'm talking about) skimp on FBOs any more because depth/stencil was left out of the original extension spec. I still have to support pbuffers because of them. >-(

I assume you're talking about ATI, but GL_EXT_packed_depth_stencil is supported already. At least it's in my extension list. Haven't tried to use it myself though.

Humus
11-01-2007, 03:17 PM
Personally, I would rather that IHVs not have to deduce that I'm doing an alpha test from my glslang code.

Why would an IHV care if your test is equivalent to an alpha test?


I don't trust them to detect that I'm doing an alpha test at the end and modify the pipeline accordingly.

No IHV would ever try to convert a discard into an alpha test. However, an IHV might very well try to pull the test as early in the shader as possible, something that's not possible using the alpha test.
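
For illustration, "pulling the test early" amounts to something like this (placeholder names; how much it actually buys you depends on how the hardware handles discard):

uniform sampler2D coverageMap;
uniform sampler2D detailMap;
varying vec2 texCoord;

void main()
{
    // cheap fetch and test first, so rejected fragments skip the expensive part
    float coverage = texture2D(coverageMap, texCoord).a;
    if (coverage < 0.5)
        discard;

    // expensive work only runs for surviving fragments
    vec4 color = texture2D(detailMap, texCoord);
    gl_FragColor = vec4(color.rgb, coverage);
}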

Humus
11-01-2007, 03:26 PM
But it is. Discard takes shader time; at least one opcode's worth.

By the same logic dynamic branching should not be used because the branch logic may take an opcode or two, so we should just select the result at the end of the shader instead! I'm sure you can come up with a very simple example shader where alpha test would be an entire instruction faster, but I can come up with examples where discard is an order of magnitude faster than alpha test.


Alpha test is free

No, it isn't. Always disable it for everything you draw, and draw your alpha-tested passes as late in the frame as possible.


On hardware that doesn't support actually halting on discard, this will be slower than alpha test.

All modern hardware has at least some form of optimization for discard. In 99% of the cases discard is the faster option.

Korval
11-01-2007, 04:05 PM
This is trivial to implement with discard:

Did you read what I was responding to? This: "It's not the same, but everything you can do with alpha test can be done with discard, and vice versa."

My point is that alpha test cannot emulate the functionality of discard, not the other way around.


Both ATI and Nvidia have advised developers not to use alpha test for ages.

Since when? Oh, I've heard them suggest not using alpha test for reasons of visual quality. But I have never heard them favor discard over alpha test.


Why would an IHV care if your test is equivalent to an alpha test?

So that they could remove those shader opcodes and replace them with an alpha test.


By the same logic dynamic branching should not be used because the branch logic may take an opcode or two, so we should just select the result at the end of the shader instead!

That's a compiler issue; how to deal with conditional branches in shaders. Alpha test is a per-sample operation.


I can come up with examples where discard is an order of magnitude faster than alpha test.

Not without making reference to hardware that has dynamic branching and early termination support.

Some of us actually still care about R300/R400 class hardware. It's going to be around a while, and having support for it would be nice. Also, they're a lot more sensitive to performance due to number of opcodes.


No, always disable it for everything you draw, and draw your alpha test passes as late in the frame as possible.

I didn't say that you should leave it on all the time, but it certainly isn't as expensive as a shader opcode.

glDesktop
11-01-2007, 04:11 PM
I would like to know if there is something in OpenGL 3 that replaces "wglSwapIntervalEXT".

It would be very good to have a simple cross platform solution.
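
For reference, this is roughly the platform-specific code I'd like to replace (assuming the usual WGL_EXT_swap_control and GLX_SGI_swap_control entry points are present; error handling omitted):

// Windows, WGL_EXT_swap_control
typedef BOOL (APIENTRY *SWAPINTERVALEXTPROC)(int interval);
SWAPINTERVALEXTPROC SwapIntervalEXT =
    (SWAPINTERVALEXTPROC)wglGetProcAddress("wglSwapIntervalEXT");
if (SwapIntervalEXT)
    SwapIntervalEXT(1);        // 1 = wait for vsync, 0 = don't wait

// X11, GLX_SGI_swap_control
typedef int (*SWAPINTERVALSGIPROC)(int interval);
SWAPINTERVALSGIPROC SwapIntervalSGI =
    (SWAPINTERVALSGIPROC)glXGetProcAddress((const GLubyte *)"glXSwapIntervalSGI");
if (SwapIntervalSGI)
    SwapIntervalSGI(1);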

Much appreciated.

knackered
11-01-2007, 05:09 PM
I see there's no mention of binary blobs again. I think you're underestimating their importance. I'd rather wait another month if it means getting binary blobs. I'd also rather you had spent time spec'ing the binary blobs than coming up with this text buffer nonsense to support #includes. I already support includes in my renderer; it's a trivial bit of search & replace - took 20 minutes to implement, and I'm sure that included making a cup of tea.
Not that I'm not grateful for your OpenGL charity work.
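
For what it's worth, the whole thing is barely more than this (a rough sketch of the idea, not my production code; no include guards or error handling):

#include <fstream>
#include <string>

// naive recursive #include expansion for shader source
std::string LoadShaderSource(const std::string &filename)
{
    std::ifstream file(filename.c_str());
    std::string line, result;
    while (std::getline(file, line))
    {
        if (line.compare(0, 8, "#include") == 0)
        {
            // splice in the named file instead of the directive
            std::string::size_type first = line.find('"');
            std::string::size_type last  = line.rfind('"');
            result += LoadShaderSource(line.substr(first + 1, last - first - 1));
        }
        else
        {
            result += line + "\n";
        }
    }
    return result;
}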

Jan
11-01-2007, 06:15 PM
ATI/nVidia HAVE advised for AGES not to use alpha test, if you can avoid it. There are plenty of papers mentioning it.

This is in contrast to the days of software renderers (and the very early HW implementations), when alpha test was advised to be used as often as possible, to kill as many fragments as early as possible. There were no shaders at that time, of course.

Discard is also advised against, of course.

However, so far I have not seen a single paper that said to prefer discard over alpha test, if you actually need the functionality. I always got the impression that discard is by far the most evil per-fragment operation.

But if the ARB decides to include alpha test in GL3, I believe they do have good (enough) reasons. And, as knackered pointed out, there are actually much more important things.

Jan.

Korval
11-01-2007, 06:25 PM
I see there's no mention of binary blobs again.

And... why would they?

They made it clear that such functionality was being considered for the post-3.0/pre-Mt. Evans refresh. It might be nice to hear some confirmation that they intend to go forward with such functionality, but they haven't provided anything more on any of the other post-3.0/pre-Mt. Evans features either.

Eric Lengyel
11-01-2007, 08:03 PM
Thank you! That's right, don't let the slackers at a certain company (that shall go unnamed, but we all know who I'm talking about) skimp on FBOs any more because depth/stencil was left out of the original extension spec. I still have to support pbuffers because of them. >-(

I assume you're talking about ATI, but GL_EXT_packed_depth_stencil is supported already. At least it's in my extension list. Haven't tried to use it myself though.

90% of my customers are still running on Windows XP. The GL_EXT_packed_depth_stencil extension is not supported in the ATI driver (7.10) under Windows XP, so I must fall back to pbuffers.

The extension does show up under Windows Vista, but it did not function correctly for a long time, and I again had to fall back to pbuffers. I just checked, and it does appear to be working correctly under Vista now, but at a 50% performance penalty compared to XP on the same machine.

Will ATI's OGL3 driver support XP?

sqrt[-1]
11-01-2007, 08:29 PM
Will ATI's OGL3 driver support XP?


You do realize that Humus no longer works for ATI right? (We need to get another OpenGL forum member as a mole for ATI)

V-man
11-02-2007, 02:27 AM
I would like to know if there is something in OpenGL 3 that replaces "wglSwapIntervalEXT".

It would be very good to have a simple cross platform solution.

Much appreciated.


I'm not sure if that will ever happen. I don't know about every OS out there, but each offers different capabilities.
Even SwapBuffers probably won't ever be core in GL.

knackered
11-02-2007, 03:39 AM
And... why would they?
because they have suddenly become high priority for me, and looking at recent threads, they have for lots of others going the unified shader route.
...and it should be a doddle, as it's already in ES.

tranders
11-02-2007, 05:25 AM
Not only is a "linear search over a permuted range of options" inefficient, it is a broken algorithm. It may well create a format that you cannot use with those images.
If I understand correctly, you first create a format object and then use that format object to bind to an image object. There is no guarantee that the bind will succeed either (e.g., unsupported format, insufficient resources). Then I have to delete the format and try one with a lesser demand on the system. Same boat, different paddle.


Maybe not, but it would only ever be so in a program like Maya, Max or some other tool where texture formats were controlled specifically by the user. And in that case, you would simply tell the user that the format is not available.
I don't program for commercial entertainment. Everything I do is pretty much controlled by the demands of my users and I don't have the luxury of telling them that something is not supported. I have to provide an alternative -- period.


You misunderstood the question.

What you said was, "I would only hope that the return is more than just yes/no and actually tells the caller why the format can't be created." That is, if a format object fails creation, you can query the system as to a specific reason why it failed creation..
Once again, an inefficient algorithm to determine what went wrong after the fact.

Let me try this -- no, unsupported filtering.
Let me try this -- no, unsupported filtering.
What about this? -- no, unsupported depth.
Ok, surely you support this? -- yes!
Whew! I finally found something you support.

I don't like APIs that prevent me from knowing about specific limitations unless I try to create an object and wait for a failure. I don't create rendering contexts this way, so why should I be expected to do this for images? Even creating a format object doesn't guarantee that a particular size image will successfully bind. And please don't claim that a creation routine is a query routine -- it's not.

I agree with the OP for this sub-thread.

tranders
11-02-2007, 05:42 AM
Note that I don't claim this is the most efficient solution. I claim it's the only solution that's actually working.

Please, tell me a better alternative to query these things, *and* show me that it will never produce false positives or false negatives, on any hardware (current and future).
So we continue to wait for an inefficient solution.

FWIW, if I know that a particular filtering mode is not supported at all for a particular graphics card, why should I even bother attempting to create a format object with that filtering mode? THAT is a better alternative. A failure for one combination does not provide any insight into whether another combination will or will not work -- only that that combination has failed. I understand that there can be a huge number of possible combinations but only relatively few supported combinations. If you don't understand the difference between n! and (n-1)!, then you can't possibly understand the problem.

Future proofing has nothing to do with having or not having a decent query system.

l_belev
11-02-2007, 07:06 AM
Seems like a slower, more annoying version of asking questions to me.

Can I do it this way? No.
Can I do it this way? No.
Can I do it this way? No.
Can I do it this way? No.
Can I do it this way? Yes, and you just did it.
vs
What does the hardware support? This and that.
Okay, I do it that way. Fine.


DirectX's approach is to add parallel APIs only for the questions.
These query APIs come with their own set of tokens, structures, etc., which somewhat correspond to, but are still very different from, the main APIs - great complexity for the user.
These parallel APIs require appropriate updates every time the main (functional) APIs are changed - very cumbersome and error-prone.
In practice Microsoft fails to keep the query APIs up to date and adequate. There are many essential questions that one might want to ask which the API can't answer.

This approach is apparently very user-unfriendly, because in practice there are very few DirectX applications that actually use these query APIs.

About whether it is slower or faster: this is irrelevant, because generally one queries the hardware capabilities only once, during application startup.
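
To illustrate what those parallel APIs look like (from memory, so treat the exact parameters as approximate): asking D3D9 whether a filterable FP16 texture format exists goes through a separate interface with its own enums, completely apart from resource creation:

IDirect3D9 *d3d = Direct3DCreate9(D3D_SDK_VERSION);
HRESULT hr = d3d->CheckDeviceFormat(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL,
                                    D3DFMT_X8R8G8B8,        // adapter/display format
                                    D3DUSAGE_QUERY_FILTER,  // "can this format be filtered?"
                                    D3DRTYPE_TEXTURE,
                                    D3DFMT_A16B16G16R16F);  // format being asked about
bool fp16FilteringSupported = SUCCEEDED(hr);
d3d->Release();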

Korval
11-02-2007, 10:31 AM
If I understand correctly, you first create a format object and then use that format object to bind to an image object. There is no guarantee that the bind will succeed either (e.g., unsupported format, insufficient resources). Then I have to delete the format and try one with a lesser demand on the system. Same boat, different paddle.

That's not how it works.

In order to create an image in the first place, you need to have a format object. If you cannot create a particular image object for implementation-defined reasons (as opposed to programmer error), then you either use a fallback format object (unlikely, since your code was probably depending on something intrinsic to that format) or you throw an error.

As for binding, the way all image attachments (framebuffer objects and program environment objects) work now is that as part of the creation of those attachment objects, you specify a format object that all images that get bound to it must use. So, the only reason an image bind to an FBO or PEO would ever fail is because the system could not load all of the images into video memory at the same time. Any format incompatibilities would have been detected as a failure to create the FBO or PEO.


Everything I do is pretty much controlled by the demands of my users and I don't have the luxury of telling them that something is not supported. I have to provide an alternative -- period.

Well, tough. The preponderance of OpenGL users would rather have a "query" mechanism that actually works instead of giving false positives or false negatives. The GL 3.0 version of your program will be more complicated. For the rest of us, our lives will be dramatically simplified.


Once again, an inefficient algorithm to determine what went wrong after the fact.

Please stop harping about this "inefficiency". Nobody cares about minor inefficiency in startup code.


Even creating a format object doesn't guarantee that a particular size image will successfully bind.

... so? An image failing to bind to an FBO/PEO for implementation-dependent reasons (as opposed to using the wrong format object, which is a program error) is about the least likely thing to happen in a program. I imagine that most code wouldn't even catch the error.

Take a 256MB card. In order to fail to bind a VAO/PEO/FBO set of state, the sum total of all buffer objects and image objects used by all three stages must be > 256MB (probably a bit less than 256, but never mind that now). You could do it with a small number of 4096x4096x32-bit textures.

But consider this: what would D3D do? What would OpenGL 2.1 do? I don't know D3D, but I doubt it has any mechanism other than, "Hey, that didn't work." GL 2.1 would give rise to an error on rendering. So GL 3.0 isn't particularly better in this rare circumstance.

Xmas
11-02-2007, 10:44 AM
But it is. Discard takes shader time; at least one opcode's worth. Alpha test is free; it happens alongside the depth test. On hardware that doesn't support actually halting on discard, this will be slower than alpha test.
The cost of a single scalar comparison is getting insignificant given current hardware and shader complexity developments. It's also dwarfed by the impact fragment masking has on early Z optimizations.

Both DX10 and OpenGL ES 2.0 dropped alpha test. Both were heavily influenced by IHVs. There's no reason to believe alpha test has a future as a fixed-function hardware feature.



because they have suddenly become high priority for me, and looking at recent threads, they have for lots of others going the unified shader route.
...and it should be a doddle, as it's already in ES.
OpenGL ES supports loading binary shaders created by an offline compiler. There's no API for retrieving a binary blob from the driver yet, though. The latter is arguably more important if the number of hardware variations is high.
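
For reference, the ES 2.0 half looks roughly like this; the binary format token and the blob itself come from a vendor's offline compiler, so they are placeholders here:

// OpenGL ES 2.0: loading a precompiled shader binary
GLuint shader = glCreateShader(GL_FRAGMENT_SHADER);
glShaderBinary(1, &shader, VENDOR_BINARY_FORMAT, blobData, blobSize);
// there is no glGetShaderBinary - the "save a blob from the driver" half is missing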

Overmind
11-02-2007, 02:00 PM
FWIW, if I know that a particular filtering mode is not supported at all for a particular graphics card, why should I even bother attempting to create a format object with that filtering mode? THAT is a better alternative.

Ok, and how exactly do you find out that information? I mean detailed enough so none of the problems I described in my previous post appear?

You're right, you may have to test a lot of combinations if you want to know exactly what combinations are allowed. I never said that's not true.

Let's just assume there was an API that lets you query which filter modes are supported, and another API that lets you query which formats are supported. Then let's assume an implementation supports filtering mode A only with format B, and filtering mode C with format D, but not A with D or B with C.

When I query support for format B, filtering A and filtering C respectively, what should the result be? Yes/Yes/Yes? That would be a lie, because if I try to use format B with filtering C it would fail. No/No/No? That would be a lie, too, because format B with filtering A would have worked. If there are X possible combinations, you may have to test X combinations in the worst case, it's as simple as that.

Please, tell me a concrete suggestion how a simpler query scheme than format objects is possible. You just say "that's not good", but you fail to provide a better alternative.
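
To make the intended usage concrete, here is a sketch of the format-object approach (the entry point and tokens below are invented for illustration only - nothing like this has been published yet):

// HYPOTHETICAL names - not part of any released spec
GLformat fmt = 0;
const GLenum preferred[] = { FMT_RGBA16F_FILTER_BLEND, FMT_RGBA16F_NEAREST, FMT_RGBA8 };
for (int i = 0; i < 3 && fmt == 0; ++i)
    fmt = glCreateImageFormat(preferred[i]);  // 0 means "this exact combination is unsupported"
if (fmt == 0)
    runFallbackPath();                        // application-defined fallback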


If you don't understand the difference between n! and (n-1)!, then you can't possibly understand the problem.

I wonder if you understand what you're talking about. n! is not the number of combinations, it's the number of permutations. The number of possible combinations of n properties with m values each is m^n, and that's usually a lot less than n! (but still too much to test everything).

Please, when trying to be smart and telling other people how incompetent they are, at least check you are correct, otherwise it could get embarrassing.


Future proofing has nothing to do with having or not having a decent query system.

"Future proofing" has everything to do with it. Any system that does not easily extend to future hardware is bad, because OpenGL should continue to work with future hardware. The old system of querying some arbitrary limits was not future proof, and we all know the result.

And again, please enlighten us with your genius idea about a decent query system. If you don't have a better idea, you have no right to complain.

knackered
11-02-2007, 02:05 PM
ok, just this once....


glCreateProgramObject(...)
glCreateShaderObject(...)
glShaderSource(...)
glCompileShader(...)
glAttachObject(...)
glLinkProgram(...)
glUseProgramObject(...)
glGetProgramObjectBinary(void *ptr, uint32 *size)
FILE* blobFile=fopen("blob.blob", "wb");
fwrite(ptr, 1, *size, blobFile);
fclose(blobFile);

there you go, free of charge - and no tea needed.

Humus
11-02-2007, 03:29 PM
My point is that alpha test cannot emulate the functionality of discard, not the other way around.

Fine, I was talking on an algorithmic level. If your algorithm uses one, it can be adapted to the other, but it may require more work to get around the limitations of alpha test.


So that they could remove those shader opcodes and replace them with an alpha test.

Which would be counterproductive in the vast majority of the cases.


Alpha test is a per-sample operation.

Alpha test is a per-fragment operation.


Not without making reference to hardware that has dynamic branching and early termination support.

Which modern hardware has to various degrees. You can't ignore this any more than you can ignore early-z and hi-z optimizations, even though it's irrelevant from a spec point of view.


Some of us actually still care about R300/R400 class hardware. It's going to be around a while, and having support for it would be nice. Also, they're a lot more sensitive to performance due to number of opcodes.

Well, I'm reasonably confident (it was a while since I checked this) that the R400 generation kills the texture lookups for dead quads. Performance-wise we're talking about saving one opcode at best. Bloating a new API with legacy functionality for that seems like a waste far beyond what's normally accepted for supporting legacy hardware.


I didn't say that you should leave it on all the time, but it certainly isn't as expensive as a shader opcode.

It's typically far more expensive than a shader opcode.

Humus
11-02-2007, 03:38 PM
I always got the impression that discard is by far the most evil per-fragment operation.

The most evil operation is depth output. It not only kills both hi-z and early-z, it also kills z-compression and thus negatively affects the performance of later rendering passes to the same screen area. Alpha test and discard are equally evil to the pipeline, but not as bad as depth output. They work fine with hi-z, but not with early-z unless you disable depth and stencil writes. Generally dynamic branching is preferable if you have decent coherency and you don't necessarily have to kill writes to the color, depth and stencil buffers.
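
To illustrate the worst case: as soon as a fragment shader writes gl_FragDepth, the final depth isn't known until the shader has run, so the hardware can't reject the fragment beforehand. A minimal example:

void main()
{
    gl_FragColor = vec4(1.0);
    // writing depth from the shader defeats hi-z/early-z and hurts z-compression
    gl_FragDepth = gl_FragCoord.z * 0.5 + 0.25;   // arbitrary remapping, just an example
}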

Humus
11-02-2007, 03:49 PM
Will ATI's OGL3 driver support XP?

I don't have any up-to-date plans as I'm not working there anymore, but the GL driver used in Vista (which is a total from-scratch rewrite) works in XP too for all hardware R300 and up. However, it's not been publicly exposed there yet. I don't know when it will be. I believe the latest Linux drivers are using that driver, though. I believe the reason it's not exposed in XP is that the legacy driver is more compatible with current games and applications. However, the R600 generation uses that driver for XP too, and a few select applications use the new driver on XP as well.

Anyway, I take for granted that there will be no GL3 work done to the legacy driver, so for GL3 support it'll be in the new driver. I don't know what the plans are, but it would surprise me if ATI didn't expose GL3 on XP.

Korval
11-02-2007, 04:19 PM
You can't ignore this any more than you can ignore early-z and hi-z optimizations

Yes, but all hardware targeted by GL 3.0 will have early-z and some form of coarser z-culling. Not all 3.0 targeted hardware has dynamic branching.


the R400 generation kills the texture lookups for dead quads.

Say what? Since when? If this is true, why isn't this information well known?


Bloating a new API with legacy functionality for that seem like a waste far beyond what's normally accepted for supporting legacy hardware.

It's hardly bloat. At the very least, it isn't significant bloat.


Generally dynamic branching is preferable if you have decent coherency

And are dealing with hardware that has dynamic branching support, rather than the R300/R400/NV30, which have to execute both sides of the branch and pick one or the other at the end.

tranders
11-02-2007, 04:42 PM
I wonder if you understand what you're talking about. n! is not the number of combinations, it's the number of permutations. The number of possible combinations of n properties with m values each is m^n, and that's usually a lot less than n! (but still too much to test everything).
You are correct. For yes/no values where order is not important, the total number of possible combinations is 2^n. However, you can't dispute that eliminating one feature cuts the number of combinations to test in half.


And again, please enlighten us with your genious idea about a decent query system. If you don't have a better idea, you have no right to complain.
There's nothing genius about telling a developer that a particular feature is not supported:

Format_t *ptFormat = 0;

if (unsigned long ulFeatures = GetSupportedFeatures())
{
    if (!ptFormat && (GL_SUPPORTS_FEATURE_X & ulFeatures))
    {
        // Try to create an X capable format ...
    }
    if (!ptFormat && (GL_SUPPORTS_FEATURE_Y & ulFeatures))
    {
        // No luck so far, try for a Y capable format ...
    }
}

if (!ptFormat)
{
    // Do this in software
}

This is a very broad brush query and does not eliminate the need for fallback logic because of incompatible or unavailable feature combinations -- it simply narrows the search for a valid format.

As for complaints that D3D or GL2 "lie" about a particular "feature", or that the query information is out of date: the feature bits are provided by the driver. Format creation success offers no additional security or benefit; it's just that the developer is forced to discover the lack of functionality. Even with a simple query for supported features, the application still has to create the format and handle all exceptions and failures - there is no argument from me on that.

I will do what I have to do but I don't have to like it. Harping over.

-- peace

Korval
11-02-2007, 06:02 PM
it simply narrows the search for a valid format.

That makes sense, but only when you're looking at a format in a vacuum. When you look at it as a tool for solving a problem, for fulfilling a need, it doesn't make as much sense.

If you have a need for a floating-point image, a fixed-point one will not work. That's why you decided you needed a floating-point one. If you need a 4-channel render target, an unrenderable format will not work. And so on.

Which means that, in the event of failing to create the preferred format, the number of usable alternatives is quite small. Indeed, it is something that you want to have explicit control over, not start blindly enumerating all possible formats. Most important of all, what makes the format usable is the intended use, which generally cannot be boiled down to an enumeration. So if you really need a 2048x2048x128fp format that supports being a render target, supports blending, and supports filtering, your method would be useless, because it would return a format that fills only some of those requirements.

Your pseudo-code is broken in precisely this way: it is ignorant of need.

If your code needs feature X, not getting it should mean a hard break. It should not mean, "Keep trying until you return something". It should stop, return an error, throw an exception, anything but return something valid.

Lindley
11-02-2007, 06:05 PM
^I don't see how that's any different in terms of complexity. In order to significantly reduce it that way, you'd have to assume that a given feature is "always available" or "always missing". That simply isn't the case when you've got things like blending available for float16s but not float32s.

A better option would be to use format objects, but allow them to be specified incrementally, so that the addition of one particular aspect can be detected as the point of failure. In fact, a means could even be devised to suggest other values for existing features which *would* allow the failed feature to be supported. This latter part would necessarily be limited, and perhaps need to be user-requested (since the programmer would have a good idea what they'd be willing to compromise on).

Granted, this does somewhat break the "object won't exist at all if it isn't complete" notion, but ways around that could be devised. Perhaps put the above functionality in a factory class that only outputs a finished format object when it has a complete set of valid features?

knackered
11-02-2007, 07:13 PM
bang goes any chance of getting any feedback from the original poster on any of these questions. As usual it's descended into an 11 page dick waving contest.

Korval
11-02-2007, 07:17 PM
As usual it's descended into an 11 page dick waving contest.

Right. Because it's not a discussion of the features you care about.

knackered
11-02-2007, 07:29 PM
It's not the subject matter I resent, it's the protagonists. I wanted to hear something from the horse's mouth while, for one split second, we had his attention. Carry on, it's too late now.

ector
11-02-2007, 08:16 PM
Sorry for fanning the flames before, I just find it completely ridiculous that D3D10 is approaching 2 years, and there's not even a working alpha version of OpenGL 3.0. And seeing it not learning the lessons of D3D10 (cutting old junk like alpha test, specifying a reasonable minimum level of functionality) makes it even worse. I know that the goals are different, but why is Microsoft so much more efficient at defining standards? It doesn't make sense.

I hope I don't start another flame thread with this :p

Lindley
11-02-2007, 09:28 PM
Microsoft doesn't define standards. They just write code that everyone conforms to despite its non-standard-ness.

Simon Arbon
11-02-2007, 10:03 PM
What I most want to see in Pipeline 005 is a discussion on how best to set up a multi-threaded application for OpenGL 3.

e.g. Say that I want to generate a shadow map and do a z-only pass from the camera view: two completely separate operations except that they share a vertex buffer.

Is it most efficient to:
1. Have two separate threads with separate rendering contexts perform both operations simultaneously.
2. Have one thread do the shadow map, then switch rendering contexts and do the z-pass afterwards.
3. Have one thread and one rendering context and attach different objects to it when we switch to the z-pass.

ector
11-02-2007, 10:50 PM
Microsoft doesn't define standards. They just write code that everyone conforms to despite its non-standard-ness.

What's worth more? A solid de-facto standard with working implementations, or a perfect formal academic standard with no implementations?

Korval
11-03-2007, 12:30 AM
I just find it completely ridiculous that D3D10 is approaching 2 years, and there's not even a working alpha version of OpenGL 3.0.

Um, why does that surprise you in any way? The effort to create GL 3.0 didn't even start until almost 2 years ago.

Further, D3D 10 did not exist before Vista. Oh, it was some nice paper, but it wasn't a real thing until developers could actually use it. So the 2 year timeline itself is stretching things.


specifying a reasonable minimum level of functionality

Um, did you read the original post? It does specify a minimum level of functionality. Just not one that says, "Only G80 and R600 cards may apply." Because that would be incredibly stupid. For obvious reasons.

And Microsoft (and software developers) are encountering some of those obvious reasons right now.


why is Microsoft so much more efficient at defining standards?

Because Microsoft is one entity. It doesn't take long for one entity to agree to something. The more people you get making the decision, the more likely you are to make the right one (to a degree), but the longer it will take.


1. Have two separate threads with separate rendering contexts perform both operations simultaneously.

Unless you're on a multi-GPU system (and they specifically allow for it), I can guarantee that this is a bad idea. And by bad, I mean "probably non-functional". One GPU means one renderer, no matter how many contexts or CPU threads you create.

Jalf_
11-03-2007, 04:45 AM
I just find it completely ridiculous that D3D10 is approaching 2 years
As far as I can see, it's approaching 9 months or so. DX10 wasn't really available until what, january, february?



and there's not even a working alpha version of OpenGL 3.0
Well, GL3.0 isn't meant to compete with DX10 as such, is it?
That sounds more like Mt.Evans to me.
Shouldn't 3.0 be seen more as a major clean-up effort than as a "let's focus exclusively on G80 and later cards" API like DX10?
DX10 isn't very successful with that either. It'll probably become so, in a few years, but now? How many DX10-exclusive games have you seen? How many are under development?


And seeing it not learning the lessons of D3D10 (cutting old junk like alpha test, specifying a reasonable minimum level of functionality) makes it even worse.
Isn't the lesson from DX10 that cutting *all* old junk is a terrible idea that means no one will rely on your API for the next 4 years?
You have to strike a balance. DX10 removes support for everything before G80, which makes it unviable to use now, tomorrow, or next year. On the other hand, they have DX9 as a fallback, so overall, DX is doing just fine. But 10 in particular isn't much good at the moment.

GL tries to strike a different balance, one that doesn't require you to use a parallel legacy API for the foreseeable future. The goal with GL3.0 is that you should be able to switch to it and use it exclusively.
Not switch to it for the 3% of the userbase who has the hardware for it, and fall back to GL2.1 for everyone else.


Microsoft doesn't define standards. They just write code that everyone conforms to despite its non-standard-ness.
How so? DirectX looks like a standard too. It's defined by Microsoft, and it specifies a standard behavior that GPU's conform to. Doesn't that make it a standard?

Looking past all the MS-bashing, DirectX does have its advantages. It's a shame that there are zealots on both sides who refuse to even consider what's going on with the "opposite" standard.
Wouldn't it be nice if OpenGL evolved as quickly as DX? (but still made the "right" decisions of course)
Wouldn't it be nice to have access to tools like PIX for OpenGL development? That in particular, would rock.

Demirug
11-03-2007, 05:30 AM
Just for completeness.

The first working tech preview for Direct3D 10 was part of the December 2005 SDK. It worked with the Vista December CTP. The following SDKs were updated to be in sync with the following Vista versions. The first RTM version was December 2006.

Before this there was a very long planning phase.

Lindley
11-03-2007, 07:12 AM
How so? DirectX looks like a standard too. It's defined by Microsoft, and it specifies a standard behavior that GPU's conform to. Doesn't that make it a standard?

That makes it an API. It's pretty much the definition of an API, actually.

A standard is, in a historical sense, a means of ensuring that competing systems are capable of working together. OpenGL is a standard because there are lots of compatible implementations; DirectX isn't because there's really only one per version. Or if it is a "standard", it's a pointless one for that reason.

If Microsoft wants to make their own APIs, that's fine. But they do so with little regard to how others' implementations will interact with theirs. Hence the difficulty with creating all-browsers web pages; MS "helpfully" provides new programming options not in the standard, and suddenly all the standards-compliant browsers can't display some pages. It's called embrace-and-extend, and it's a very bad thing as far as standards go.



What's worth more? A solid de-facto standard with working implementations, or a perfect formal academic standard with no implementations?

Leading question which assumes those are the only two options.

De facto standards are an inevitability; the QWERTY keyboard, for instance. But as long as there's reasonably high-profile competition to a given idea, you don't have a de facto standard. You have something that may try to pass itself off as such, but that's not the same thing.

Jalf_
11-03-2007, 07:58 AM
A standard is, in a historical sense, a means of ensuring that competing systems are capable of working together. OpenGL is a standard because there are lots of compatible implementations; DirectX isn't because there's really only one per version. Or if it is a "standard", it's a pointless one for that reason.

I disagree with that definition.
True, there's only one implementation of DirectX (which is its main weakness). But there are multiple implementations of the driver backend, so perhaps the difference isn't *that* big after all?
I'd say that DirectX is a standard for Windows development. It's not an *open* standard, true, not everyone can make their own implementations of it, but it's still a standard for the more limited scope that is "3d stuff on Windows". It's an API as well, yes, but it's a de facto standard that's hard to ignore.


Hence the difficulty with creating all-browsers web pages; MS "helpfully" provides new programming options not in the standard, and suddenly all the standards-compliant browsers can't display some pages. It's called embrace-and-extend, and it's a very bad thing as far as standards go.
True, but I don't see the relevance here. (And in the browser case, it doesn't help that the standards themselves are so utterly crappy ;)) But they're not exactly doing this with OpenGL. They just try to ignore that instead.

Lindley
11-03-2007, 08:11 AM
Oh, that was just an example. No real relevance to this particular case.

tranders
11-03-2007, 09:21 AM
Your pseudo-code is broken in precisely this way: it is ignorant of need.

If your code needs feature X, not getting it should mean a hard break. It should not mean, "Keep trying until you return something". It should stop, return an error, throw an exception, anything but return something valid.
It is precisely knowledgeable of need - the need to efficiently determine the most capable supported format without blindly testing a linear list of acceptable results. The algorithm clearly does NOT indicate a REQUIREMENT of feature X so the rest of your response is moot. I still have to create the format object and handle potential failures, but I would like to do it more intelligently.

Cgor_Cyrosly
11-03-2007, 10:07 AM
Will array textures and the FBO functions become part of the core of GL 3.0?

Lindley
11-03-2007, 10:11 AM
The algorithm clearly does NOT indicate a REQUIREMENT of feature X so the rest of your response is moot.

I think you misunderstood. It's the *problem* that indicates a requirement of Feature X. Hence, you'll be dealing with a fairly small number of acceptable combinations in any event, so testing all of them isn't a major efficiency concern.

I do think there's a better way to do it, but yours isn't it.

Brolingstanz
11-03-2007, 10:28 AM
I think Mount Evans is going to require DX10 class hardware, so there will certainly be a level feature set when that wheels around the bend.

GL3 isn't going to address any of this... it's just a recasting of GL2, a GL2++, if you will, a GL2 on steroids :-)

Korval
11-03-2007, 11:04 AM
Will array textures and the FBO functions become part of the core of GL 3.0?

Array textures are part of Mt Evans. FBOs are part of 3.0.

santyhamer
11-03-2007, 06:16 PM
Btw, does anybody know if Intel is going to be an important part of Evans? Is Khronos trying to change things for Larrabee/G45? Just curious...

Cgor_Cyrosly
11-03-2007, 09:57 PM
Will array textures and the FBO functions become part of the core of GL 3.0?


Array textures are part of Mt Evans. FBOs are part of 3.0.


Thanks, Korval!
Does the scope of GL3.0's FBO functionality include layered rendering, like the extension functions "glFramebufferTextureLayerEXT/glFramebufferTextureFaceEXT"? It looks like array textures must be supported before that can be used.
Also, about the "text buffer object" of GLSL 1.30: I think the idea is that when the shader code is compiled, the compiled assembly or machine code is put in a character buffer before execution and is loaded from that buffer at runtime (so the shader source can be compiled at any time and used and shared by other programs at the same time)?
thanks

davej
11-04-2007, 04:58 AM
I disagree with that definition.
True, there's only one implementation of DirectX (which is its main weakness). But there are multiple implementations of the driver backend, so perhaps the difference isn't *that* big after all?
I'd say that DirectX is a standard for Windows development. It's not an *open* standard, true, not everyone can make their own implementations of it, but it's still a standard for the more limited scope that is "3d stuff on Windows". It's an API as well, yes, but it's a de facto standard that's hard to ignore.
I disagree with your definition of standard.

DOC, XLS and PPT files are de facto standards as there are independent implementations of them - that is not something that applies to DirectX [1]. The fact that there is a well defined interface for IHVs to write backends to does not make it a standard either - they are not usable without Microsoft's implementation of the rest of DirectX. It's an API that's hard to ignore but it's not a standard.

Microsoft may call it a standard but they call applications cross platform because they run on different versions of Windows. ;)

[1] The Wine project has made some efforts but they are about emulating Windows rather than providing DirectX as an API for Linux applications. They also aren't complete implementations.

Vexator
11-04-2007, 05:15 AM
So far FBOs can have multiple color attachments, but only one depth attachment. Will GL3 support FBOs with multiple depth attachments?

Lindley
11-04-2007, 06:07 AM
What use do you have in mind for multiple depth attachments?

MZ
11-04-2007, 07:46 AM
Two sided depth test, for example. This would allow doing depth peeling without totally disabling all early-Z optimizations (as it happens currently, when we emulate the missing second depth test by shadow map + discard). Obviously, this would require new HW.
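
For anyone who hasn't seen it, the emulation looks roughly like this (placeholder names; the previously peeled layer's depth is bound as an ordinary depth texture):

uniform sampler2D prevDepthLayer;   // depth of the last peeled layer
uniform vec2 viewportSize;

void main()
{
    float prevDepth = texture2D(prevDepthLayer, gl_FragCoord.xy / viewportSize).r;
    if (gl_FragCoord.z <= prevDepth)
        discard;                             // already peeled; this is what hurts early-z today
    gl_FragColor = vec4(1.0, 0.0, 0.0, 0.5); // real shading of the newly exposed layer goes here
}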

Lindley
11-04-2007, 07:57 AM
Well, I'd be in favor of anything that gave the user more control over early-z, so sure.

Cgor_Cyrosly
11-04-2007, 08:34 AM
Will creating shaders from assembly code be supported in GL3.0?
I very much hope it can be used directly in GLSL 1.3, like this:
asm{...}
or asm_vertex{...}
asm_geometry{...}
asm_fragment{...}
or asm.geometry("shader code")
thanks!!!
If it worked like this: FragmentShader[id]={"code"},
it would be easier to create and manage multiple shaders and to traverse multiple passes through the id in one file

Xmas
11-04-2007, 09:54 AM
So far FBOs can have multiple color attachments, but only one depth attachment. Will GL3 support FBOs with multiple depth attachments?
When (and if) hardware comes around that supports this there's still time to create an extension.


Will creating shaders from assembly code be supported in GL3.0?
No, GLSL reserves the "asm" keyword "for future use" but it won't define an assembly language.

Cgor_Cyrosly
11-04-2007, 11:25 AM
Will creating shaders from assembly code be supported in GL3.0?

No, GLSL reserves the "asm" keyword "for future use" but it won't define an assembly language.
Thanks, Xmas, but what about support for "GL_NV_gpu_program4"? If that's available, how do we use assembly shaders in GL3 programs? Can glGenProgramsARB, ... still be used?

Zengar
11-04-2007, 12:30 PM
I am sure that Nvidia will still be exposing their asm extensions in GL3.0, unless they change their approach to shader compilation (currently they compile GLSL to assembly first). Still, there is no good reason to use assembly in the first place (unless you need the extra G8x features).

Korval
11-04-2007, 12:30 PM
but supporting for "GL_NV_gpu_program4"?

As you may have noticed, this is an nVidia extension. All it does is expose the functionality of EXT_gpu_shader4 to nVidia's assembly-language program extensions. This is not a standard extension, and will not become one in the near future.
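
For the practical half of the question: the assembly extensions are layered on the old ARB program entry points, so (assuming the driver exposes them) loading one looks something like this; the NV_gpu_program4 variants reuse the same entry points with a "!!NVfp4.0" header. Whether any of this survives into GL3 is exactly the open question.

// loading an assembly fragment program through the ARB program API
const char *src =
    "!!ARBfp1.0\n"
    "MOV result.color, fragment.color;\n"
    "END\n";

GLuint prog;
glGenProgramsARB(1, &prog);
glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, prog);
glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                   (GLsizei)strlen(src), src);
glEnable(GL_FRAGMENT_PROGRAM_ARB);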

Jan
11-04-2007, 12:57 PM
Let's hope so! I fear that shortly after GL3's finalization nVidia starts littering the extension-registry again!

Of course, nVidia has defined many good extensions, but it also introduced some useless ones and sometimes they could just have worked a bit more with the ARB to release it directly as an EXT or ARB extension.

Jan.

pudman
11-04-2007, 02:33 PM
I fear that shortly after GL3's finalization nVidia starts littering the extension-registry again!

This is a case where I can see there always being an extension that exposes vendor-GPU-specific behavior, such as 'asm'. (Vendor-specific) assembly code will never work across all GPUs and so can never be standard. That's why there's an NV extension, and I would expect that to continue in the future.


they could just have worked a bit more with the ARB to release it directly as an EXT or ARB extension.

I don't believe the ARB works at the same pace as hardware features are implemented and so preliminary versions of the extensions will typically be vendor branded.

tranders
11-04-2007, 04:24 PM
I think you misunderstood. It's the *problem* that indicates a requirement of Feature X. Hence, you'll be dealing with a fairly small number of acceptable combinations in any event, so testing all of them isn't a major efficiency concern.

I do think there's a better way to do it, but yours isn't it.
I wrote the problem (well, technically ector posed the problem) and the *problem* is exactly NOT (and never has been) a requirement for any specific feature. The *problem* is being forced to search through all acceptable formats in order of preference until a supported format is found. For me, that may mean that every format (including no format) is "acceptable" but not all formats are "supported". It's a sure bet that most vendors will support only a subset of valid formats and not all vendors will support the same subset. You cannot discount my requirements simply because they are not the same as yours.

Any mechanism that avoids a known failure condition is an improvement. If you have a better way, then please show us some real-world code

Humus
11-04-2007, 05:01 PM
Yes, but all hardware targeted by GL 3.0 will have early-z and some form of coarser z-culling. Not all 3.0 targeted hardware has dynamic branching.

But all 3.0 targeted hardware supports discard, hence alpha test is not necessary. OpenGL specifications have traditionally been very forward-looking. OpenGL 1.0 was written when hardware barely existed. I think it's perfectly acceptable to have a minor performance decrease for a bunch of legacy hardware to allow a clean API that does things the performant way for current and future hardware. Just like we don't need loads of ways to submit vertex data to the API, we don't need several ways to kill fragments either.

Zengar
11-04-2007, 06:24 PM
tranders, I understand your point of view, but it is important to see that the search space (number of possibilities) is greatly limited. How many different code paths will you have implemented? Three? Five? A query API is not useful in such a scenario, because it would just duplicate the 'format approach', complicating the drivers and the applications.

For example, when dealing with screen resolution a 'format approach' is unfavourable. You can never know what resolution will be possible on the client's hardware - but your program can (almost) always adapt to it - this is a case for a query API. But with 3D graphics, you cannot account for things that are not coded in your application - this makes the query API obsolete. Basically, at software creation time you have to consider what is possible and design rendering paths accordingly... but you will never come up with a large number of possibilities. The basic assumption here is: you can't design for an unknown implementation - or, you have to use the features of the present hardware. This makes the query API obsolete, because - if the question is put that way - you only care whether the hardware can do it, not what it can do.

Of course, there are some exceptions... One I can think of is the processing power of future hardware. If you use some sort of iteration-based approximation, future hardware may be able to provide better quality than current hardware just by increasing the iteration count. This is where queries can be useful. Still, such queries have nothing to do with formats.

Roderic (Ingenu)
11-05-2007, 01:04 AM
But all 3.0 targeted hardware supports discard, hence alpha test is not necessary. OpenGL specifications have traditionally been very forward-looking. OpenGL 1.0 was written when hardware barely existed. I think it's perfectly acceptable to have a minor performance decrease for a bunch of legacy hardware to allow a clean API that does things the performant way for current and future hardware. Just like we don't need loads of ways to submit vertex data to the API, we don't need several ways to kill fragments either.

That's Mount Evans; Longs Peak is about targeting NV4x+ and R3xx+.
I would think that if they left alpha test in, it's because they know of some hardware that works that way (instead of using discard).

Timothy Farrar
11-05-2007, 07:22 AM
I am sure that Nvidia will still be exposing their asm extensions in GL3.0, unless they change their approach to shader compilation (currently they compile GLSL to assembly first). Still, there is no good reason to use assembly in the first place (unless you need the extra G8x features).

Actually regardless of using G8x features assembly is very useful, for optimization. Sometimes the compilers generate code which is not optimal for the GPU or do not expose base functionality (such as the ARB pack/unpack opcodes).

pudman
11-05-2007, 09:14 AM
or do not expose base functionality (such as the ARB pack/unpack opcodes).

I believe that matches the caveat by Zengar:


(unless you need the extra G8x features)

As for the compilers not generating optimal code, I've seen plenty of complaints on this forum about the quality of the compilers. Exposing the code as asm could allow hand optimization but in this case it doesn't get to the root of the problem: non-optimal compilers.

Jan
11-05-2007, 09:25 AM
Asm languages might in fact be of some use, though I would be much happier if, in the future, GLSL were capable of every feature, i.e. if nVidia and ATI (also) added their features to GLSL, instead of (only) exposing a completely different language.

However, many other extensions have been added to OpenGL that became obsolete pretty quickly. I hope this won't be the case with OpenGL 3.0 too quickly. Even if the API might be future-proof, this counts for nothing, if vendors impatiently add their features. I have nothing against experimental extensions, that are clearly not for use in commercial apps and thus can be removed later. I simply don't like to have an extension, that is for a few months the de facto standard and then is hard to get rid off, since people already use it frequently.

The NV/EXT/ARB_texture_rectangle is a good example for a mess, that could have been prevented, if the vendors and the ARB had worked a bit more in conjunction.

Jan.

Xmas
11-05-2007, 02:00 PM
As for the compilers not generating optimal code, I've seen plenty of complaints on this forum about the quality of the compilers. Exposing the code as asm could allow hand optimization but in this case it doesn't get to the root of the problem: non-optimal compilers.
Hand optimizing only works well if the assembly language is close to the hardware. An ARB-defined language can't fit that bill because the architectures are too different.

The reality is that assembly shaders go through pretty much the same optimization steps in the compiler as high level shaders.

Timothy Farrar
11-05-2007, 02:18 PM
As for the compilers not generating optimal code, I've seen plenty of complaints on this forum about the quality of the compilers. Exposing the code as asm could allow hand optimization but in this case it doesn't get to the root of the problem: non-optimal compilers.

Actually the root of the problem is that compilers will never (in my life time) always generate optimal code (not on GPUs or CPUs). You are mapping a general C like language to hardware which exposes some very un-C like functionality. And for some of us, the performance advantage of hand optimization is very important. For example, just an hour ago I hand optimized an originally 32 instruction fragment shader (compiled GLSL code) into a 14 instruction fragment shader (hand assembly).

Timothy Farrar
11-05-2007, 03:17 PM
Let's hope so! I fear that shortly after GL3's finalization nVidia starts littering the extension-registry again!

Of course, nVidia has defined many good extensions, but it also introduced some useless ones and sometimes they could just have worked a bit more with the ARB to release it directly as an EXT or ARB extension.

Jan.

Actually, NVidia's "extension littering" is crucial; without it GL would be way behind the driver support provided by DirectX. At least this way some of us GL developers can hope to compete with DirectX, at least on NVidia's GPUs!

IMHO, the fundamental problem facing GL is that the GPUs are changing way too fast for proper driver support to be written by other vendors (ATI and Apple). The hope of being future proof is a pipe-dream. Think about what is going to change when AMD, Intel and hopefully NVidia release hybrid CPU/GPU architectures.

What GL really needs is quick approval for common vendor functionality exposed through the API, and drivers to back up this support. Without this, GL developers who have real-world market constraints (i.e. have to compete with DirectX-based programs currently making use of the full caps of the hardware) are going to be at a severe disadvantage. Not to mention, up-to-date GL drivers would do tremendous things for GL in general: more developers, more interest, and more products shipping with GL support, which will ultimately allow vendors to allocate enough resources to keep the GL drivers competitive with DirectX!

I think GL3 will be great if it improves batch performance and simplifies things such that vendors can easily keep drivers up to date (i.e. removing old fixed functionality, deprecated features, etc).

What I am most concerned with is the possibility that GL3 is going to put vendors farther behind in supporting the hardware. What I would rather see is GL2.1/SM4.0 support working now through current extensions! Take ATI/AMD for example. From all that I can gather their GL drivers for the new R600 chip don't even expose hardware texture fetch to complete a basic SM3.0 pipeline, let alone the SM4.0 pipeline. Please correct me if this is wrong (as I don't have the card to test). Just guessing here, but with them "open sourcing" the Linux/GL driver responsibility, there is going to be a tremendous lag between when GL3 gets finished and when the drivers get some ability to use simple SM4.0 features (and that SM4.0 support isn't going to happen in GL2). And as for Apple: they made a huge step with Leopard, but still only have very basic SM4.0 support; same story there.

Well looks like 2008 is going to be an interesting year!

Korval
11-05-2007, 03:29 PM
What GL really needs is quick approval for common vendor functionality exposed through the API, and drivers to back up this support.

No, what OpenGL 3.0 needs (after a spec) is just better drivers. That's going to drive adoption far better than anything else.


improves batch performance

Batch performance is already equal to, if not better than, D3D's.


From all that I can gather their GL drivers for the new R600 chip don't even expose hardware texture fetch to complete a basic SM3.0 pipeline, let alone the SM4.0 pipeline.

I'm certainly not aware of it. As far as I know, vertex texturing on ATi hardware has been available since they started making vertex texturing work. Which was the X1xxx line, I believe.

pudman
11-05-2007, 03:53 PM
For example, just an hour ago I hand optimized an originally 32 instruction fragment shader (compiled GLSL code) into a 14 instruction fragment shader (hand assembly).

I've followed your blog so I know how capable you are in the asm arena (well, it is just a blog so I have to take your word (and screenshots) at face value).

In the case you just optimized, in your opinion, would it have been something the compiler could have optimized for? I'd be interested in knowing the details.

Cgor_Cyrosly
11-05-2007, 05:17 PM
However, many other extensions have been added to OpenGL that became obsolete pretty quickly. I hope this won't be the case with OpenGL 3.0 too quickly. Even if the API might be future-proof, this counts for nothing, if vendors impatiently add their features. I have nothing against experimental extensions, that are clearly not for use in commercial apps and thus can be removed later. I simply don't like to have an extension, that is for a few months the de facto standard and then is hard to get rid off, since people already use it frequently.

It seems it would be best to develop a standard assembly language for GPU shader programming, for GLSL or for all GPU languages (apart from CUDA's virtual assembly, "PTX"). Sometimes we want to optimize our shaders as well as we can and to understand the underlying hardware. Programmers sometimes need to go back to the source, and at that point they cannot rely too much on the compiler.
The biggest benefit of assembly language is that we know what we are doing, and our programs get faster for it.
Even if it's for some unknown reason, or merely a personal hobby, or artistic intuition.

Timothy Farrar
11-05-2007, 08:59 PM
In the case you just optimized, in your opinion, would it have been something the compiler could have optimized for? I'd be interested in knowing the details.

Yes and no.

First off I'm not talking about trivial cases here. For complex stuff, until you see the assembly, you don't exactly know what is and isn't efficient in terms of the GLSL language (other than benchmarking many different versions of an algorithm). What seems like simple and elegant GLSL can compile to a nightmare of GPU assembly. You can easily gain an advantage by knowing how many and what assembly opcodes get produced by the compiler. A stupid example of this is assuming something like atan() would have the same or similar performance as sin() and cos(). Had atan() been fast, like sin() (4 op cycles in G8x), then perhaps sin(atan(x)) would make sense instead of using x/sqrt(1+x^2). Another example would be to see what the compiler generates for the sign() function (lots of code including conversion to integer on the G8x). So seemingly simple GLSL functions can have hidden performance impact.
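
To make that example concrete, the identity being exploited is sin(atan(x)) = x/sqrt(1+x^2), which in GLSL is just:

// cheap replacement for sin(atan(x))
float sinAtan(float x)
{
    return x * inversesqrt(1.0 + x * x);
}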

Another issue (at least on the G8x) is register usage. At some point using too many registers limits the ability of the hardware to thread the shader. Using feedback from the assembly output can be a good way to optimize even GLSL code and rework the code to ensure register usage doesn't limit threading. Or just simply write in assembly with the knowledge of exactly the register limit (in some cases changing an algorithm to fit the machine).

Another limitation of the compiler (for the G8x) right now is simply making use of predicated (conditional) execution. In my experience the compiler tends to emit a lot of "set on" opcodes and "if" branching when it would be much easier to set condition codes as the result of a previous ALU op and then use a conditional ALU destination mask (with no branching). The compiler also doesn't do a great job of generating code that mixes conditional code updates, clamping and saturation (this doesn't map well onto C-like languages). Branch optimization is critical for GPU performance.

In my case the optimizations were primarily a result of changing the algorithm to better match the hardware (as a result of seeing elegant GLSL code compile into a mess of assembly), branch removal/recoding, and smart predicated execution.

Of course after doing the assembly optimization I could probably go back and write a much better GLSL version of the algorithm! So my 50% is only the result of the first optimization (which required knowledge of the output assembly to get to quickly).

We all know that in many cases the compiler is going to write code which performs just about the same as hand written assembly (and the compilers are only going to get better!), and in some cases hand assembly is going to beat the compiler.

However, having an exposed assembly layer (such as gcc -S for CPU programming, and a GLSL or Cg to GPU assembly conversion tool) is essential in quickly working out the best path to the hardware, even if you don't program in assembly. The assembly extension specs alone give tremendous insight into the hardware which with only high-level languages would be a black box.

It is the mix of having the high level GLSL for quick prototyping as well as production code for non-speed critical stuff, and assembly to fine tune the critical code which provides the optimal solution.

Korval
11-05-2007, 10:26 PM
GPU assembly

There's one problem: there's no such thing.

Cgor_Cyrosly
11-05-2007, 11:09 PM
GPU assembly

There's one problem: there's no such thing.

Ha-ha, yes. The assembly shader languages only expose part of the GPU's functionality, and CUDA's PTX assembly isn't true assembly either, but I hope something can be developed that truly exposes the GPU instruction set.

bobvodka
11-05-2007, 11:32 PM
I'm certainly not aware of it. As far as I know, vertex texturing on ATi hardware has been available since they started making vertex texturing work. Which was the X1xxx line, I believe.

nope, the R600 chips are the first to allow VTF; the previous chips were SM3.0 compatible, however the spec never said that you had to provide any formats as usable for VTF. So, the hardware says 'yes I can do it' but doesn't supply any valid formats because, physically, it can't; ATI made the call that render-to-vertex buffer was a better idea than a "slow" and format-limited VTF system.

The rights and wrongs of this have been debated back and forward between ATI and NV lovers for some time now so please no repeat here :)

bobvodka
11-05-2007, 11:36 PM
Just guessing here, that with them "open sourcing" the Linux/GL driver responsibility, there is going to be a tremendous lag between when GL3 gets finished and the drivers get some ability to use simple SM4.0 features (and that SM4.0 support isn't going to happen in GL2).

Not the case; AMD are still going to work on Linux drivers into the future, however they are providing the technical details for 3rd parties to develop their own drivers so that a pure open source driver can be developed (not one with a binary element to it as it is now). However, atm I believe there are only the specs for bootstrapping and minor 2D drawing (no hardware acceleration iirc) for 2 chips out there; they plan to release the rest over time not all in one go, so I suspect it'll be a few years before we see hardware 3D in the open source drivers...

Timothy Farrar
11-05-2007, 11:56 PM
GPU assembly

There's one problem: there's no such thing.

ARB_fragment_program, ARB_vertex_program, and now NV_gpu_program4 for the G8x?

Sure it doesn't completely match the hardware, but it is very close (and used as an intermediate step when compiling GLSL on the G8x NV drivers). Of course it is hard to tell how close with the latest generation of cards, beyond the obvious DIV being a RCP and MUL, etc.

Someone on the public NVidia dev board just confirmed my hunch that the pack/unpack opcodes are now macros (instead of direct opcodes) on the G8x which end up as a few integer instructions.

I'm going to guess that conditional write mask, conditional code update, negate, saturation, and clamping are all still per opcode modifiers (at least on the G8x) so they still match the NV_gpu_program4 spec and thus usage doesn't add opcode cycle costs. Of course I could be very wrong here, haven't profiled these directly yet. Anyone know about this?

Matthias
11-06-2007, 12:44 AM
ok, just this once....


glCreateProgramObject(...)
glCreateShaderObject(...)
glShaderSource(...)
glCompileShader(...)
glAttachObject(...)
glLinkProgram(...)
glUseProgramObject(...)
glGetProgramObjectBinary(void *ptr, uint32 *size)
FILE* blobFile=fopen("blob.blob", "wb");
fwrite(ptr, 1, *size, blobFile);
fclose(blobFile);

there you go, free of charge - and no tea needed.

I really vote for such a solution too. It is really a major PITA for us that loading huge files from the network is actually faster than compiling all those shaders again and again.
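
Something like this on the load side is what I have in mind. To be clear, glProgramObjectBinary does not exist; it is just a made-up counterpart to the hypothetical glGetProgramObjectBinary above, and a real driver would have to be allowed to reject a stale blob (say, after a driver update) and fall back to a full recompile:

/* purely hypothetical load path to match the save sketch above */
FILE* blobFile = fopen("blob.blob", "rb");
fseek(blobFile, 0, SEEK_END);
long blobSize = ftell(blobFile);
fseek(blobFile, 0, SEEK_SET);

void* blob = malloc(blobSize);
fread(blob, 1, blobSize, blobFile);
fclose(blobFile);

/* hypothetical entry point: hand the driver back its own blob */
if (!glProgramObjectBinary(program, blob, (uint32)blobSize))
{
    /* blob rejected: fall back to glShaderSource/glCompileShader/glLinkProgram */
}
free(blob);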

Querying capabilities for hardware information would be next on my list; I'd like to know things like total memory, used memory, available memory ...

Just my 2 cents

Matthias

Eric Lengyel
11-06-2007, 01:19 AM
The assembly shader languages only expose part of the GPU's functionality, and CUDA's PTX assembly isn't true assembly either, but I hope something can be developed that truly exposes the GPU instruction set.

On Nvidia hardware, the ARB vertex program and fragment program extensions (plus later NV extensions) are extremely close to the native assembly used by the GF6 and GF7. The PTX assembly in Cuda is extremely close to the native assembly used by the GF8.

pudman
11-06-2007, 09:49 AM
GPU assembly

There's one problem: there's no such thing.

I think this goes back to where this subthread started: lots of vendor-specific extensions.

Sure there's no general "GPU assembly". The hardware changes too fast for there to be a standard. That's why those extensions exist. To have a reasonable version of the assembly for (a) given chipset(s).

I'm sure you understood what we meant. "no such thing" is, in the general sense, incorrect.

Timothy Farrar
11-06-2007, 11:02 AM
GPU assembly

There's one problem: there's no such thing.

I'm wrong, looks like Korval's comment is correct on the newer GPUs!

Just looked through the PTX ISA doc, and there are some important differences between the NV_gpu_program assembly and PTX. While predicated execution and saturation clamping (0-1 only) are per opcode modifiers, the others (negate, absolute value, and predicate update) are not. Looks like the only way to update a predicate is through a "set if" instruction. So I would guess that the G8x GLSL compiler's fast path for predicated execution is simply to output branches in the assembly source (which would explain what I was seeing before), which later get compiled away as it does another transform to G8x machine code ... will have to do more testing to know.


In the case you just optimized, in your opinion, would it have been something the compiler could have optimized for? I'd be interested in knowing the details.

As for the compiler issue, after seeing the PTX guide I now believe that if a programmer knew what code patterns compiled into optimal predicated execution, you could perhaps come close to fully optimizing in the higher level language (GLSL). It also looks like Cg 2.0 is going to provide the other high level tools needed for this (like floatToRawIntBits and intBitsToFloat, etc.), and of course you can access Cg built-ins from inside GLSL code with the G8x drivers...

Still, I wouldn't have been able to quickly optimize my code without seeing the "virtual" assembly output as an intermediate stage of GLSL compilation. Given that vendors will probably provide separate tools for this, it seems as if having a modern update to the ARB shader program (assembly) extensions in GL3 is neither important nor a good idea (hardware instruction sets are changing too fast and are too divergent).

Thanks everyone in helping me see some of the errors in my reasoning!

GL3 path seems right on!

Cgor_Cyrosly
11-06-2007, 08:32 PM
it seems as if having a modern update to the ARB shader program (assembly) extensions in GL3 is neither important nor a good idea (hardware instruction sets are changing too fast and are too divergent).
CPU instruction sets change fast too, but there are still many people who use assembly.

Lindley
11-06-2007, 10:04 PM
Here's a question. If OGL3 is going to be so object-oriented, how will it fit with C, which doesn't *really* support objects all that well?

Zengar
11-07-2007, 12:08 AM
Lindley, I can't imagine you're serious. What do you mean, *doesn't support objects*? Don't mix up objects as a language-implemented data type with object-oriented programming as a concept. An object (as most commonly understood) is just a record that contains method pointers. Such a structure is implementable in any language that supports function types.

P.S. Still, in terms of GL3, "object orientation" means a different approach from the one in current GL (a state machine). Actually, you can already call GL2 shaders and programs "objects".

Jan
11-07-2007, 02:48 AM
OpenGL 2 has always been object oriented. For example textures are objects. Display lists are objects etc. Only the more recent things like framebuffers and shaders were implemented in a way that seemed to please the ARB more than the previous approach.

Only the PIPELINE has been configured with a state-machine. OpenGL 3 will do this a bit differently, but i think that there will still be some kind of small state-machine left. Otherwise it would be too much state to pass to each draw-call.

Jan.

Lindley
11-07-2007, 07:13 AM
I was asking in terms of what the syntax would look like. I'd imagine ARB wouldn't want it to vary too much between languages.

Zengar
11-07-2007, 07:27 AM
You can find examples of syntax in the pipeline newsletters (like here: http://www.opengl.org/pipeline/article/vol003_4/ ). It won't be much different from current one.

Korval
11-07-2007, 11:24 AM
OpenGL 2 has always been object oriented.

Aside:

Object orientation does not mean, "It uses objects." Object orientation is about the coupling of data and the code that directly affects that data, the binding of information and interface. It also refers to the notion of encapsulation, data hiding, and most importantly, polymorphism. If you're not doing all of these, it isn't object oriented.

OpenGL, either 2.x or 3.0, doesn't use enough of these to qualify.


It won't be much different from current one.

Actually, it's quite a bit different.

GL 2.x and all prior versions shoehorned the concept of objects onto its state-based design through the concept of binding. This provided a notion of a global "current object" that all state-based commands pertaining to that object's state could modify. Thus, the basic API would be no different when using objects than when not using objects.

GL 3.0 is an object-based (note: this is separate from object-oriented) system. Binding exists for the sole purpose of rendering; to bind an object is to express the intent to use it for rendering in the immediate future. Functions that change the state of an object take that object as a parameter. That way, an implementation does not have to guess as to why you bound that object to the context.

This seems like a subtle difference, but ultimately, it's pretty substantial. It's the single most important change, the one that necessitated many other changes to the API to the point where GL 3.0 became an entirely new API.
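
To make the difference concrete, here is a rough sketch; the GL 3 entry point names below are invented for illustration, since the real ones aren't public yet:

/* GL 2.x: state commands act on whatever is bound, so the driver can't tell
   "bound in order to edit it" from "bound in order to draw with it". */
glBindTexture(GL_TEXTURE_2D, tex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);

/* GL 3 style (hypothetical names): the object is an explicit parameter of the
   state-setting call, and binding only ever means "I intend to render with this". */
glTextureParameteri(tex, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glBindTextureToContext(ctx, unit, tex);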

Humus
11-07-2007, 02:30 PM
ARB_fragment_program, ARB_vertex_program, and now NV_gpu_program4 for the G8x?

These extensions are very generic and at least on ATI cards they pass through the same optimizer as the GLSL ones, and DX shaders too for that matter. I'd say they aren't very close to hardware at all. The danger with high level languages is that they hide too much, so even an experienced developer could write bad code, because it's a black box. I don't think an assembly language is very useful, but insight into what happens under the hood is indeed very useful. For anyone with an ATI card I really recommend the GPU Shader Analyzer (http://ati.amd.com/developer/gpusa/index.html), it's the performance tool I've used the most by far. There you get to view the GPU assembly code (as in the real hardware instructions and not ARB_fp or something like that), and it supports GLSL and ARB_fp/ARB_vp shaders. Definitely recommended!

Humus
11-07-2007, 02:56 PM
To further elaborate, one important difference between ARB_fp and what the hardware really does is that ARB_fp doesn't take the mini-alu into account that all R300-R580 hardware is equipped with, nor does it take the general pipeline configuration into account, like the 3+1 setup of all R300-R580 that allows it to schedule scalar instructions in parallel with vectors, or the fact that some common constants can be applied to arguments. To take a random example, this shader may seem like it should generate three instructions or so:


varying vec3 var;

void main(){
gl_FragColor = vec4(vec3(dot(var - 0.5, var)), var.y * 0.25);
}


In fact, as you can see with GPU ShaderAnalyzer, it evaluates to a single ALU instruction slot:


0 alu 00 pre: srcp.rgb = 1.0-2.0*r00.rgb
0 alu 00 rgb: out0.rgb = dp3(r00.rgb, neg(srcp.rgb))/2 sem_wait
alpha: out0.a = mad(r00.g, 1.0, 0.0)/4 last


The first line uses the mini-alu to compute 1.0 - 2.0 * var (which is -2 * (var - 0.5)). This is negated and divided by two together with the dot-product to get the right result. In parallel on the scalar unit var.y * 0.25 is computed.

Cgor_Cyrosly
11-07-2007, 09:52 PM
How long until GL 3.0 is released? A week, two weeks, or a month? Can you disclose more details on the GL 3.0 API and GLSL 1.30 functionality that has already been determined? For example, how program parameter buffers or geometry shader functions will be realized in the GL API and GLSL.


GLbuffer image = glCreateImage(template);
glImageData2D(image,0,0,0,256,256,GL_RGBA,GL_UNSIGNED_BYTE,data);

Does that mean glGenTextures, glBindTexture, glTexImage2D and all the other texture functions of the current version of GL will be changed? And why is it called glCreateImage rather than glCreateTexBuffer?

Roderic (Ingenu)
11-08-2007, 01:39 AM
I've asked for a draft to have a clue of how things will work and write preliminary implementation/design in existing code...
But nothing came of it.

dor00
11-08-2007, 03:21 AM
Any idea where I can get even alpha/beta builds? I really want to use OpenGL3 right now...

Zengar
11-08-2007, 03:39 AM
You haven't read the thread, have you? There isn't even a spec yet; what "alpha builds" are you talking about?

dor00
11-08-2007, 04:00 AM
You haven't read the thread, have you? There isn't even a spec yet; what "alpha builds" are you talking about?

Oh, sorry..

For the webmaster, can we have some updates on that topic please?

Korval
11-08-2007, 11:33 AM
Does that mean glGenTextures, glBindTexture, glTexImage2D and all the other texture functions of the current version of GL will be changed?

You clearly need to actually read the last 4 newsletters. Coupled with the info from SIGGRAPH, this should bring you up to speed with the publicly available information, which answers 80% of your questions.


I've asked for a draft to have a clue of how things will work and write preliminary implementation/design in existing code...
But nothing came of it.

Did you really expect them to give you a draft of their GL 3.0 specification?

Rick Yorgason
11-08-2007, 08:12 PM
5) This is possibly what I'm most interested in: Will the new .spec files be available, to ease creation of bindings for other programming languages? Any chance for the wgl/glx specs?
I've also been interested in that. I prefer to work in C++, and language bindings are always depressingly incomplete. It would be even better if we had some sort of Khronos-sponsored binding, so that we didn't have to worry about its long-term suitability (or rather its lack thereof).

However, I've never heard of these .spec files you're talking about. Are they a solution for easing language bindings? Can you provide a link or something?

Korval
11-08-2007, 09:56 PM
I prefer to work in C++, and language bindings are always depressingly incomplete.

Huh?

Last I checked, both GLee and GLEW worked just fine and were quite complete. Or are you simply talking about using the bare .lib that only loads GL 1.1?


However, I've never heard of these .spec files you're talking about.

They're just the specification in a format that's reasonably parseable. That way, you could automate the generation of header files, and potentially loading code.
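
As a rough illustration of what "reasonably parseable" buys you, here is a toy C++ snippet that just lists entry points from a gl.spec-style file. It assumes a simplified shape (a Name(arg1, arg2) line at column zero followed by indented attribute lines); the real .spec files carry more fields (typemaps, categories and so on) that an actual header generator would need.

#include <cctype>
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char** argv)
{
    std::ifstream spec(argc > 1 ? argv[1] : "gl.spec");
    std::string line;
    while (std::getline(spec, line))
    {
        // Skip blank lines, comments and indented attribute lines.
        if (line.empty() || line[0] == '#' || std::isspace((unsigned char)line[0]))
            continue;
        std::string::size_type open = line.find('(');
        std::string::size_type close = line.find(')');
        if (open == std::string::npos || close == std::string::npos || close < open)
            continue;   // not a function entry
        std::cout << "gl" << line.substr(0, open)
                  << "(" << line.substr(open + 1, close - open - 1) << ")\n";
    }
    return 0;
}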

Rick Yorgason
11-08-2007, 11:56 PM
Huh?

Last I checked, both GLee and GLEW worked just fine and were quite complete. Or are you simply talking about using the bare .lib that only loads GL 1.1?
Sorry, I guess I wasn't clear; I'm not talking about extensions at all. I'm talking about using C++ wrappers for OpenGL's C API. In one of the newsletters, they mention that they're going with a C API, even though it can be kind of verbose, for ease of implementation, and that they expect language bindings to pop up pretty quickly. However, unless the ARB plans for these bindings, I expect they'll be sub-par.

They're just the specification in a format that's reasonably parseable. That way, you could automate the generation of header files, and potentially loading code.
I was sort of hoping for an example of this being done successfully. It seems like there would be some pitfalls in making such language bindings automatically, but if it could be done, that would be great.

However, we've already figured out that we're talking about different things. I could easily imagine how it could be successfully done for extensions. APIs are a bit more difficult.

Korval
11-09-2007, 12:53 AM
However, unless the ARB plans for these bindings, I expect they'll be sub-par.

... so?

I mean, worst case is that you use the C API directly, right? It's not like you're losing any functionality here.

Right now, the ARB is 1.5 months late on the spec, with no spec date looming in the immediate future. Wanting the ARB to write up a C++ wrapper over the C API at this point is a little too forward thinking.

Roderic (Ingenu)
11-09-2007, 02:45 AM
I've asked for a draft to have a clue of how things will work and write preliminary implementation/design in existing code...
But nothing came of it.

Did you really expect them to give you a draft of their GL 3.0 specification?

Yes it was very optimistic of me to ask that the community get a -draft- of *Open*GL 3.0 since we didn't get the promised specifications.
At least we could plan/design with a fairly good idea of how things will work, because at this stage, I somewhat doubt there will be drastic changes.
(And anyway it would be a draft, so we would know it's subject to change.)
Also we (software developers) could discuss the draft on the forums to give Khronos some feedback about it...

Cgor_Cyrosly
11-09-2007, 04:06 AM
with no spec date looming in the immediate future.

Do you mean there will be no specs for future versions of OpenGL after 3.0, or that GL 3.0 will be the last version of the graphics API with a standard spec?

Stephen A
11-09-2007, 06:41 AM
I was sort of hoping for an example of this being done successfully. It seems like there would be some pitfalls in making such language bindings automatically, but if it could be done, that would be great.

However, we've already figured out that we're talking about different things. I could easily imagine how it could be successfully done for extensions. APIs are a bit more difficult.
The main difficulty is the age of the .spec files. There is a lot of stale cruft in there, plus a few genuine mistakes, but all in all they are pretty parseable, if you stick to a straight port of the C API. For more advanced functionality (actual enumerations, function overloads) things become more interesting, but the result is much more pleasing. A code snippet from the C# bindings included in OpenTK (http://opentk.sourceforge.net):


GL.Clear(ClearBufferMask.ColorBufferBit |
ClearBufferMask.DepthBufferBit);

GL.MatrixMode(MatrixMode.Modelview);
GL.LoadIdentity();
Glu.LookAt(0.0, 5.0, 5.0,
0.0, 0.0, 0.0,
0.0, 1.0, 0.0);

GL.BindBuffer(Version15.ArrayBuffer, vertex_buffer_object);
GL.VertexPointer(3, VertexPointerType.Float, 0, IntPtr.Zero);
GL.BindBuffer(Version15.ElementArrayBuffer, element_buffer_object);

GL.DrawElements(BeginMode.Triangles, shape.Indices.Length,
All.UnsignedInt, IntPtr.Zero);

(yes, these are automatically generated, source code available (https://sourceforge.net/projects/opentk/))

Bindings like these are possible for almost all high-level languages, C++ included, but we do need the .spec files to generate them. This is why I was asking if the specs will be publicly available in their 'raw' form.

Korval
11-09-2007, 11:57 AM
The future version of OpenGL will be no specs after 3.0 or the GL3.0 will become the latest version of the Graphics API with standard specs?

I think you misunderstood what I was saying.

I was saying that the most recent info from the ARB did not give us even a semi-firm date for when to expect the GL 3.0 specification. We can guess by the end of the year, but that's only a guess.

There will be post 3.0 specifications as well; the ARB still has some post 3.0 plans.

Rick Yorgason
11-09-2007, 06:11 PM
GL.Clear(ClearBufferMask.ColorBufferBit |
ClearBufferMask.DepthBufferBit);

GL.MatrixMode(MatrixMode.Modelview);
GL.LoadIdentity();
Glu.LookAt(0.0, 5.0, 5.0,
0.0, 0.0, 0.0,
0.0, 1.0, 0.0);

GL.BindBuffer(Version15.ArrayBuffer, vertex_buffer_object);
GL.VertexPointer(3, VertexPointerType.Float, 0, IntPtr.Zero);
GL.BindBuffer(Version15.ElementArrayBuffer, element_buffer_object);

GL.DrawElements(BeginMode.Triangles, shape.Indices.Length,
All.UnsignedInt, IntPtr.Zero);

Ah, that's still very procedural. Given the object-oriented nature of GL3, I can imagine a much nicer interface for OOP languages. Thanks for the link, though, after reading through one of those .spec files, it's easier to imagine how such a thing could be done (although it would probably require some modification to the .spec format).

Right now, the ARB is 1.5 months late on the spec, with no spec date looming in the immediate future. Wanting the ARB to write up a C++ wrapper over the C API at this point is a little too forward thinking.
Don't get me wrong, I'd rather have the C API now rather than waiting for them to plan support for language bindings. But if, perchance, they have given thought to standardized language bindings, that would be a great boon.

Cgor_Cyrosly
11-09-2007, 09:59 PM
they have given thought to standardized language bindings, that would be a great boon.

Oh, yes! It would be a very good idea to develop an OpenGL-specific language for GL 3.0, used only by GL.

Stephen A
11-10-2007, 01:04 AM
GL.Clear(ClearBufferMask.ColorBufferBit |
ClearBufferMask.DepthBufferBit);

GL.MatrixMode(MatrixMode.Modelview);
GL.LoadIdentity();
Glu.LookAt(0.0, 5.0, 5.0,
0.0, 0.0, 0.0,
0.0, 1.0, 0.0);

GL.BindBuffer(Version15.ArrayBuffer, vertex_buffer_object);
GL.VertexPointer(3, VertexPointerType.Float, 0, IntPtr.Zero);
GL.BindBuffer(Version15.ElementArrayBuffer, element_buffer_object);

GL.DrawElements(BeginMode.Triangles, shape.Indices.Length,
All.UnsignedInt, IntPtr.Zero);

Ah, that's still very procedural. Given the object-oriented nature of GL3, I can imagine a much nicer interface for OOP languages. Thanks for the link, though, after reading through one of those .spec files, it's easier to imagine how such a thing could be done (although it would probably require some modification to the .spec format).
Yes, the OpenGL 2.1 API is very procedural - the code sample above is about the limit of what can be automatically generated. To go further you will need to intervene manually, either by modifying the .spec files to contain more data, or by writing an OOP wrapper by hand.

Problem is, this is far from a trivial task: the OpenGL 2.1 specs contain 1528 functions (588 of which are core IIRC), and that's disregarding Glx/Glu/Wgl/Agl. If you limit the scope to the most useful functions (let's say VBO's, Shaders, matrices and whatever else you find useful) it would indeed be possible to write such a wrapper (I think most non-trivial projects do this anyway), but 1528 functions? Not a chance! :)

I don't expect the GL3 specs to be any different in this regard. I am not sure that the object-based API will lend itself well to high-level OO languages , because of issues regarding type-safety, or the lack thereof - will it be possible to 'cast' an OpenGL object to a different one? Will OpenGL functions be type-safe in that each will only accept one kind of object, or will a function be able to work with many different kinds?

The impression I got from the newsletters is that the latter is true, which is about the worst possible interface for a high-level language to express. To illustrate this with a GL2 example, consider the glTexParameter family of functions. The second parameter (GL_TEXTURE_MIN_FILTER, GL_TEXTURE_MAG_FILTER, etc) changes the allowed type of the third parameter. From OpenTK:


GL.TexParameter(TextureTarget.Texture2d, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear);
GL.TexParameter(TextureTarget.Texture2d, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear);
GL.TexParameter(TextureTarget.Texture2d, TextureParameterName.TextureWrapS, (int)TextureWrapMode.ClampToEdge);

Notice the necessary cast to int. I don't know any high-level language that allows this type of polymorphism - this idiom can only be expressed in non-typesafe C (where all enums are ints). If the same thing happens in GL3, where one function can take many different objects, each with a different set of possible enums... well this will be a very difficult function to express in OO terms.

My hope for the GL3 specs is that functions will always take objects as their first (or last) parameter, no exceptions! If this is true, and if the typemaps clearly declare which types are objects (e.g. "GLobject" like the "GLenum" of the current specs), then it will be possible to autogenerate a quite OO version of the API. Otherwise, we will be stuck to the current procedural model until someone writes a OO wrapper by hand.

(I don't suppose we could have some input from the source here..?)
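
In the meantime, this is the sort of thing I'd hope a generator could emit if that object-first convention holds. Every name below (types, entry point, everything except the GL_TEXTURE_MIN_FILTER and GL_LINEAR/GL_NEAREST token values) is made up purely to show the shape of the generated wrapper, not the real GL3 API:

// Hypothetical generator output, assuming GL3 entry points take the object first.
typedef unsigned int GLtexture;                     // stand-in for the real object type
enum TextureMinFilter { MinFilterNearest = 0x2600, MinFilterLinear = 0x2601 };

// The raw C entry point the wrapper forwards to (hypothetical signature).
extern "C" void glTextureParameteri(GLtexture tex, unsigned int pname, int value);

class Texture
{
public:
    explicit Texture(GLtexture handle) : handle_(handle) {}

    // One thin inline forwarder per .spec entry; the enum parameter type gives
    // the type safety the int-based C call can't.
    void SetMinFilter(TextureMinFilter filter)
    {
        glTextureParameteri(handle_, 0x2801 /* GL_TEXTURE_MIN_FILTER */, filter);
    }

private:
    GLtexture handle_;
};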

Simon Arbon
11-11-2007, 05:56 AM
Notice the necessary cast to int. I don't know any high-level language that allows this type of polymorphism - this idiom can only be expressed in non-typesafe C (where all enums are ints). If the same thing happens in GL3, where one function can take many different objects, each with a different set of possible enums... well this will be a very difficult function to express in OO terms.

Object pascal can do this in most cases;
If i was statically linking to a DLL i could do:
procedure glTexParameter( Target: enumTextureTarget; PName: enumTextureFilter; Filter: enumTextureFilter ); external 'test.dll' name 'glTexParameter'; overload;
procedure glTexParameter( Target: enumTextureTarget; PName: enumTextureWrap; WrapMode: enumTextureWrapMode ); external 'test.dll' name 'glTexParameter'; overload;

If i was using a real object in a separate DLL then i could define:

TTextureObject = class
procedure glTexParameter( Target: enumTextureTarget; PName: enumTextureFilter; Filter: enumTextureFilter ); overload; virtual; abstract;
procedure glTexParameter( Target: enumTextureTarget; PName: enumTextureWrap; WrapMode: enumTextureWrapMode ); overload; virtual; abstract;
end;


Unfortunately, the only case where overload won't work is when you have to use wglGetProcAddress to get the entry point dynamically.
In this case the different versions need different names, but you can still have several strongly typed calls going to the same OpenGL entry point.


My hope for the GL3 specs is that functions will always take objects as their first (or last) parameter, no exceptions! If this is true, and if the typemaps clearly declare which types are objects (e.g. "GLobject" like the "GLenum" of the current specs), then it will be possible to autogenerate a quite OO version of the API. Otherwise, we will be stuck to the current procedural model until someone writes a OO wrapper by hand.
If by "Wrapper" you mean creating your own objects which contains duplicate calls and which call the OpenGL functions with the 'Real' OpenGL object, then yes, you could do this, but it adds an extra layer to every single call which makes it less efficient.
Hence anybody after the highest possible performance will simply ignore the wrapper.

However, it would not be difficult to add real OO support to the OpenGL API, without changing the existing C compatible API in any way.
1) The OO language expects to find a virtual method table, but this is just a list of entry point addresses, so all we are doing is taking the existing list of entry points in the driver (which we normally read with wglGetProcAddress) and organising them by object.
2) All OpenGL objects will need to be actual address pointers rather than any kind of table index.
3) The very first value in each OpenGL object needs to be a pointer to the above entry point list (VMT).
4) The include file just needs to contain a class definition for each OpenGL object type, which consists of virtual abstract method calls in the same order as the entry points in the VMT.
(ie. like in the above code block example)

If i now do:
TextureObject.lpTexParameter( GL_Texture_2d, GL_Texture_Min_Filter, GL_Linear );
the compiler generates code that pushes the TextureObject to the stack followed by the parameter list, gets the first dword from the object, then does an indirect call to the entry point at the 'TexParameter' offset in the VMT.
The effect is exactly the same as a C style call that had the OpenGL object as its first parameter, so either style of language can be used with the same API with no extra overhead.
An added advantage is that you don't need hundreds of wglGetProcAddress calls, just one per object type for the "Create" call; the rest of the entry points are read directly from the VMT in the driver when needed.

Zengar
11-11-2007, 06:54 AM
Sorry, it won't work. You forget that there are languages that behave differently (like Java or various functional languages). Also, even languages with a similar object structure can have different binary interfaces. I am very sure that C++ and ObjectPascal are not binary compatible, for example (not that I have looked into it, but if I am not mistaken Delphi passes the self pointer via a register and not on the stack).

Brolingstanz
11-11-2007, 06:58 AM
@spec files

I seem to recall a thread in which Jon Leech mentioned an eventual house cleaning for the spec files, when time permitted. Though for the life of me I can't recall the thread's whereabouts.

Stephen A
11-11-2007, 02:38 PM
If by "Wrapper" you mean creating your own objects which contains duplicate calls and which call the OpenGL functions with the 'Real' OpenGL object, then yes, you could do this, but it adds an extra layer to every single call which makes it less efficient.
Hence anybody after the highest possible performance will simply ignore the wrapper.
Anyone after the highest possible performance probably won't be using Java, C# or Object Pascal in the first place. :) AFAIK, almost every 'serious' project wraps OpenGL/D3D calls in higher-level abstractions, so this hit isn't *that* great.

The real performance cost originates from the overhead of code interop between GC-managed platforms (Java, .Net, etc) and unmanaged code: a native OpenGL call takes about 2-3ns on my machine (2.8GHz Core2), while one going through .Net interop needs about 25ns - a huge difference (which I'm trying to reduce for Tao/OpenTK). However, even this overhead may not actually make a difference in the grand scale of things (how many OpenGL functions are you going to call per second?). In any case, the cost of a simple OO wrapper with proper inlining would be minuscule compared to this.

Simon Arbon
11-11-2007, 09:34 PM
I am very sure that C++ and ObjectPascal are not binary compatible, for example (not that I have looked into it, but if I am not mistaken Delphi passes the self pointer via a register and not on the stack).

Passing in registers is the "Default" behaviour, but any calling convention can be specified.
In my examples I should have used "virtual; stdcall; abstract;" so the calling convention matches OpenGL.
Borland C++ and Delphi (Pascal) are directly compatible in how they call methods of objects; other C++ compilers could be different, but I would not expect them to be so completely different that this couldn't be made to work.
Even if the structure of the VMT was different, I would expect all of them to still use the first location in their objects as a pointer to the class/VMT.
All that is really essential for this to work is that the first location (zero offset) in the OpenGL objects is a 32-bit variable that can be written & read; then each language's OpenGL include file can set up its own VMT/class and put a pointer to it in the object.
I would be interested in hearing from anyone who knows how other brands of C++ (& other OO languages) construct their VMTs.


Anyone after the highest possible performance probably won't be using Java, C# or Object Pascal in the first place.
The real performance cost originates from the overhead of code interop between GC-managed platforms (Java, .Net, etc) and unmanaged code

I totally agree that .Net and some high level OO languages can have bad performance, and I would never use them for a high-performance product.
However, C++ and Delphi virtual method calls are just an indirect read followed by an indirect jump, compared to the indirect call you already pay for OpenGL calls made via a function variable.
For normal static functions the only difference is the use of a "self" pointer to address a block of memory, so the overall performance difference between C and C++/Delphi is virtually nil.

Stephen A
11-11-2007, 11:47 PM
This still sounds dangerous. I'm not saying that it wouldn't work, just that there seem to be too many ways for this to break down:
* A 32-bit variable wouldn't work on x86_64 machines.
* Amd64 uses a register-based calling convention, not stack based.
* There are architectures other than x86 and x86_64 where OpenGL must run.
* Any change to the compiler could potentially break this down. The compiler writers don't have the obligation to keep the internal structures intact, nor should they.

While it does sound interesting, it's unlikely something like this will ever happen. OpenGL itself is platform-agnostic, and the easiest (only?) way to do that is to stick to the C ABI.

And a small nitpick:

However, C++ and Delphi virtual method calls are just an indirect read followed by an indirect jump, compared to the indirect call you already pay for OpenGL calls made via a function variable.
For normal static functions the only difference is the use of a "self" pointer to address a block of memory, so the overall performance difference between C and C++/Delphi is virtually nil.
This is true for .Net too, and I assume the same for Java. The performance delta doesn't come from virtual function calls, but rather from the use of a garbage collector and a JIT compiler.

Simon Arbon
11-12-2007, 12:33 AM
* A 32-bit variable wouldn't work on x86_64 machines.
* Amd64 uses a register-based calling convention, not stack based.
* There are architectures other than x86 and x86_64 where OpenGL must run.

Yes, I was assuming an x86_32 driver in my examples; x86_64 needs a completely different driver and compiler anyway, so using a 64-bit pointer instead is not a problem.
The spec could just say "The OpenGL object handle is a pointer to an object that begins with a pointer in the native machine format".
For other architectures, there would always be a specific driver written for any given processor family/operating system, so there could be minor implementation differences to suit them.
It all really depends on what the various versions of C++ actually do to implement objects on various hardware.
If they are all similar then this would at least work for C++ if not for more unusual languages.
The advantages of using strongly typed object oriented languages are too much to ignore, only people who like to release 100MB bug-fix service packs still use C for large projects.

Zengar
11-12-2007, 01:58 AM
Visual C++ and Delphi are not binary compatible (OOP-wise), and FPC and Delphi aren't binary compatible either. The only way that will work is using interfaces (like DX does), as these are defined to be plain structs with method pointers (and map directly to the interface construct in ObjectPascal).

Simon Arbon
11-12-2007, 05:42 PM
The ObjectPascal "interface" behaves as a separate object inside of the real object, and begins with a VMT pointer just like a normal object.
The code in the client exe that calls the method is exactly the same for interfaces as it is for virtual methods.
The only difference in the exe is the added calls to addref/release whenever you change an interface pointer.

In other languages that support an "interface" method call that works this way (and they have to, to support DX), this would be completely compatible with a Delphi virtual method call.
Both use a pointer to a pointer to a struct of method pointers.
The only extra thing that would need to be done is to ensure that the first 3 methods are reserved for compiler-generated QueryInterface/AddRef/Release calls (and because we don't need them they can just return without doing anything).
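
Spelled out as plain structs (valid C and C++), the binary layout being described is just this; the texture example and all names are illustrative only, the point being the pointer-to-table-of-function-pointers shape that a C caller and a Delphi/C++ "interface" caller both end up using:

typedef struct GLTextureVtbl
{
    /* First three slots reserved for COM-style plumbing; a GL driver that
       doesn't need reference counting could make these no-ops. */
    long (*QueryInterface)(void* self, const void* iid, void** out);
    long (*AddRef)(void* self);
    long (*Release)(void* self);

    /* Real entry points follow, in a fixed, documented order. */
    void (*TexParameter)(void* self, int pname, int value);
    void (*TexImage2D)(void* self, int level, int w, int h, const void* data);
} GLTextureVtbl;

typedef struct GLTexture
{
    const GLTextureVtbl* vtbl;   /* must be the very first field */
    /* ...driver-private data follows... */
} GLTexture;

/* Every caller ends up doing the same thing: */
static void set_min_filter(GLTexture* tex)
{
    tex->vtbl->TexParameter(tex, 0x2801 /* GL_TEXTURE_MIN_FILTER */, 0x2601 /* GL_LINEAR */);
}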

Overmind
11-13-2007, 02:53 PM
The advantages of using strongly typed object oriented languages are too much to ignore, only people who like to release 100MB bug-fix service packs still use C for large projects.

The advantages of dynamically typed languages are not to be underestimated. The advantages of functional programming languages are also too much to ignore. And I think I don't even need to mention the huge advantages of logical programming languages. Yet still there are many programmers out there that quite successfully manage to ignore all that ;)

Every programming paradigm has advantages and disadvantages. It would be foolish to ignore them all and go just into a single direction. Sometimes I have the impression that object oriented programmers are not even aware that there are other languages out there.

Zengar
11-13-2007, 03:20 PM
Yes, it's true. I am really amused by the fact that many researchers still use Fortran for scientific computations while there exists a functional language (Clean) that is faster at math than C (according to http://shootout.alioth.debian.org/ ).

ZbuffeR
11-13-2007, 03:56 PM
Zengar:
I didn't follow : you make fun of bananas and then compare apples to oranges ? :)
When you have a huge computer code to simulate a nuclear power plant, one that took 25+ years in the making, with a battery of tests, not switching the programming language seems sensible to me. And even more so when Fortran is very capable in parallel programming.

For new projects it is more debatable, but in most cases, having good experience and tools in a language to do a correct implementation is better than a few % difference in calculation time.

Edit: Where do you see Clean better for math than C here ?
http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=gcc&lang2=clean

Zengar
11-13-2007, 04:56 PM
You're right... seems like they posted new scores. Clean used to be number one in the list. And probably you are right about the other thing too, I don't know much about scientific programming. Still, I believe that a functional programming language is much better for scientific calculations than a procedural one. You have much faster development times, shorter code and fewer errors (plus natural parallelisation). I brought up Fortran as an example because it is a 50-year-old language designed for punch cards and it is still alive (even if they removed the old constructs in the newest Fortran).

I just wanted to point out that programmers generally tend to be very conservative. There are lots of myths floating around like "Java is slow", "garbage collection is slow", "you can't write games with Pascal" etc.

Korval
11-13-2007, 05:54 PM
There are lots of myths floating around like "Java is slow", "garbage collection is slow", "you can't write games with Pascal" etc.

The thing about these "myths" is that they're true. Garbage Collection is slower than managing the memory yourself. Java is slower than C++, much more so if the word "GUI" matters to your code. And while I'm sure you could write games in Pascal, why would you want to?

Brolingstanz
11-13-2007, 06:14 PM
For high performance games for the PC and consoles in particular C/C++ probably makes the most sense, partly because you have little choice ;-).

You can, as many game engines do, create a custom configuration/scripting language to sit on top of your game code, which can offer far greater flexibility than features hardwired into your game's logic/plumbing, which can be a real boon in many scenarios. Much more work and maintenance, but well justified if extreme customization is paramount.

P.S. My this thread has meandered... :-)

Brolingstanz
11-13-2007, 06:29 PM
Still, I believe that a functional programming language is much better for scientific calculations than a procedural one.

What do you make of LINQ and the new features in .NET 3.5? I haven't played with it yet but I'm pretty jazzed about VS08's release later this month.

This stuff makes me really wish I could do my entire engine in C#. But I'm holding out, hoping C++09 will be the answer to my prayers.

Jan
11-13-2007, 07:10 PM
Haha, no it certainly won't. If you want to do game-engines, like me, you are stuck with C++ for all eternity and C++0x won't be much of an improvement.

Well, to make this clear, I like C++ and I never use anything else (for the kind of programming tasks that C++ is suited for). But sometimes, when you look at C# or especially D, it makes you feel, well, "stuck" is what comes closest, I think. Problem is, C++ is old, and since things like templates were actually invented by C++, they are also not very well designed. And compilation times are just outrageous. Now, since there are not only C APIs out there but also some important libraries that are C++ only, you can't just choose any other language.

Also, as a competitive "systems programming" language there is, at the moment, only D, I think (C# is not for systems programming), but it still needs several years before it maybe gets enough momentum. Actually I fear that the incredible amount of C++ code out there makes it pretty much impossible that we will see any kind of successor to it soon that is not binary compatible (which is kind of impossible), and thus any transition will be a slow and painful process that might not even start in the next 10+ years.

Jan.

Simon Arbon
11-13-2007, 10:47 PM
Just use whatever is appropriate for the task, i use a combination of languages, ObjectPascal for most of the procedural stuff, a logical language for AI, with a few small assembly routines thrown in if i really need them.

while I'm sure you could write games in Pascal, why would you want to?

1. Object Pascal can do anything C++ or C can do.
2. Every C/C++ API header file gets translated into Pascal very quickly (by www.delphi-jedi.org (http://www.delphi-jedi.org/) and others).
3. The compiled code executes just as fast as C (as long as you don't make methods virtual when you don't need to, which is a common mistake).
4. Silly mistakes like invalid typecasting or using an invalid GL constant are detected at compilation.
5. It is easier to find bugs because the code is more explicit; you don't have the problem of a single missing character producing valid code that does something completely different to what you intended.
6. It is much easier to read & understand someone else's code.

Really though, the choice of language is more of a personal choice; most people simply use the language that they are most comfortable with.

As for execution speed, if the compiler is good enough and you don't overuse the more complex features when you don't need to, then there should be no difference.
Most speed differences are usually due to the quality of the library routines supplied with it, not the actual language.

A program that runs the first time and has no bugs can be written in any language; it's just a matter of using proper software engineering techniques, but it's a lot easier to do with Pascal than with C because all of the typos are caught by the compiler.

Garbage Collection is slower than managing the memory yourself

Quite true, fortunately my compiler lets you replace the memory manager they supply with your own.

Zengar
11-13-2007, 11:42 PM
@modus: It is true that for super-performance 3D engines there is little choice besides C++. But that is not the point. Not everyone needs this kind of performance. If we take Java as an example -- it is a perfect platform for implementing MMORPGs. You have pretty decent performance (about 80% of a C++ analogue), multiplatform support out of the box (with Web Start), and a rich network API.

LINQ seems nice, and surely can make database (and not only) programming easier.

P.S. Garbage collection is slower only in a few scenarios (if you need only a few big chunks of memory). When you are dynamically allocating and freeing lots of objects, nothing beats a garbage collector. Of course, collecting itself may be a minor slowdown, but I've never noticed any when working with Java or .NET

P.P.S. The tradeoff is programming efficiency vs. performance. There was a presentation from Epic Games about the future of game programming, where they analyse the UT3 engine and Gears of War and propose some language features they find useful (quote: "would gladly trade 10% performance for 10% programming efficiency"). The language described in the end is somewhat similar to OCaml. An interesting read, if you haven't seen it before (http://www.cs.princeton.edu/~dpw/popl/06/Tim-POPL.ppt )

Simon Arbon
11-13-2007, 11:57 PM
P.S. My this thread has meandered... :-)
Yes, perhaps we should get back to "Things we would like explained in the next Pipeline newsletter".

It must turn up soon; after all, it's the "Summer" edition and it's going to be 35C (95F) all this week, so summer is here at last.
Just in time for my birthday... :D

Korval
11-14-2007, 01:24 AM
it is a perfect platform for implementing MMORPGs.

No, it is not.

On the client side, memory performance is crucial. MMO's already recommend 1GB of RAM on machines. Try doing them in a GC environment, and you're looking at 1.5-2GB, which is pushing the Win32-bit limit. Now you're in danger of running out of memory due to fragmenting the virtual address space. Even if you're not, you're still taking up lots more memory to do basically the same job. You cut your userbase down by doing so.

On the server side, you can't accept a 20% performance penalty. Servers are already choked as is; you can't afford to make them slower just because you want to use a newer language. 20% speed reduction means that a 50 man raid can only be a 40 man raid.


Garbage collection is slower only in a few scenarios (if you need only a few big chunks of memory). When you are dynamically allocating and freeing lots of objects, nothing beats a garbage collector.

At what? Not leaking memory? Because the price you pay for not calling "free" at the moment you need it is that "free" will be called "later". When, you can never say. Games like to have consistent performance, and that's where GC runs into big problems.


where they analyse the UT3 engine and Gears of War and propose some language features they find useful (quote: "would gladly trade 10% performance for 10% programming efficiency").

Isn't Epic being sued for the quality of their engine? I'm not sure I'd take their advice.


Things we would like explained in the next Pipeline newsletter

We know what we would like explained: GL 3.0. Either finished, or an explanation of why it isn't.

Rick Yorgason
11-14-2007, 03:11 AM
Man, my intent wasn't to start one of these holy wars (http://www.catb.org/jargon/html/H/holy-wars.html).

Stephen A
11-14-2007, 04:29 AM
Man, my intent wasn't to start one of these holy wars.
Too late now... But can you think of a better way to keep ourselves occupied until OpenGL3 is ready? :)

@Korval:

On the client side, memory performance is crucial. MMO's already recommend 1GB of RAM on machines. Try doing them in a GC environment, and you're looking at 1.5-2GB, which is pushing the Win32-bit limit.
So you are saying that the GC magically increases memory usage? Sorry for expressing it like this, but that's nonsense.


On the server side, you can't accept a 20% performance penalty.
Yeah, that's why all server applications are written in x86 assembly. No wait... :)
(in reality, it's all Java, with .Net, Ruby, Python to a lesser extent)


At what? Not leaking memory? Because the price you pay for not calling "free" at the moment you need it is that "free" will be called "later". When, you can never say. Games like to have consistent performance, and that's where GC runs into big problems.
Agreed. That's one of the biggest issues people have with XNA and related toolkits, which you can only eliminate by manually taking care of memory management (using object pools for example). This strips away one of the biggest advantages of GC languages, but such is the price of performance. You still benefit from faster memory allocations though.

However, it is possible to perform deterministic object allocation/de-allocation in .Net (and presumably other GC environments), using the Disposable pattern. The differences aren't as large as you make them sound.


Isn't Epic being sued for the quality of their engine? I'm not sure I'd take their advice.
Yeah, I'm sure you know better than the Epic team ;)

(please, Khronos, get OpenGL3 out of the door before things start becoming violent here! :) )

Overmind
11-14-2007, 05:05 AM
Garbage Collection is slower than managing the memory yourself.

This statement is not true.

Ok, there may be usage patterns where manual memory management is faster. But there are also usage patterns where garbage collection is much faster.

With manual allocation, the cost is proportional to the number of allocations and deallocations. With copy collection, the cost is proportional to the number of living objects. Allocation is extremely cheap (a single pointer increment), and deallocation is free.

So if you have a lot of allocations of small, short living objects, garbage collection is a clear winner.
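
For anyone who hasn't seen it spelled out, "allocation is a single pointer increment" means roughly this. The snippet is a grossly simplified nursery from a copying collector; real collectors add alignment, object headers, write barriers and, of course, the actual copying phase.

#include <cstddef>
#include <cstdint>

struct Nursery
{
    std::uint8_t* base;
    std::uint8_t* bump;   // next free byte
    std::uint8_t* limit;  // end of the nursery

    void* allocate(std::size_t bytes)
    {
        if (bump + bytes > limit)
            return nullptr;       // would trigger a minor collection in a real runtime
        void* result = bump;
        bump += bytes;            // the entire cost of an allocation
        return result;
    }
    // There is no free(): dead objects are simply not copied out when the
    // nursery is evacuated, which is why deallocation is "free".
};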

Constructing a test case where garbage collection is twice as fast as manual memory management is not so hard, and this test case is not so exotic that I would bet on it not appearing in real world applications.

That being said, games or 3D engines usually don't have the memory usage patterns necessary to make copying garbage collection more efficient than e.g. reference counting. But I didn't want to leave such an absolute statement uncommented when it's false in general ;)

Xmas
11-14-2007, 06:30 AM
Please, this isn't a direction a thread called "OpenGL 3 Updates" should take. Just stop here.

Jan
11-14-2007, 08:32 AM
Allocating and deallocating memory is always slow, no matter whether it is GCed or not. So you are always well advised to use things like free lists, which might be the fastest possible solution. Even in garbage-collected languages no one prevents you from doing it, and even there you will benefit from its use.

The difference is that in non-GCed languages you need to handle deallocation EVERYWHERE. That's a big source of memory leaks, and that makes your software unreliable. Experienced programmers will use RAII a lot (e.g. using the STL) and thus code becomes pretty reliable. Still, a huge part of my engine is only responsible for resource management, including proper deallocation and detection of ill usage patterns, e.g. forgotten objects. With GC at least some of that code could be shortened quite a bit.

In the end, what it comes down to is that with GC you can usually still do all the optimizations that you can do in C++ that actually make sense. However, for all the small stuff you can be sure that you don't need to worry: there will be no memory leaks that make your app crash after it runs for x days (e.g. servers). That's most certainly a good reason why many people prefer C# and Java over C++ for such tasks. No matter how good your programmers are, when you have software running for weeks or months, it still needs to be reliable.

And usually we have enough to worry about already. I wouldn't mind getting a garbage collector in the C++ core language. Though that won't happen.

Jan.

Timothy Farrar
11-14-2007, 10:48 AM
Please, this isn't a direction a thread called "OpenGL 3 Updates" should take. Just stop here.

Yep, time to change the subject.

Will GL3 support multi-sample textures (FBO only has multi-sample renderbuffers)?

This is something D3D 10 has and GL should have as well!

Having the ability to read samples of the multi-sample FBO is tremendously useful. Obvious example is the Stencil Routed A Buffer paper for order independent transparency.

Zengar
11-14-2007, 11:35 AM
No, it won't. It will basically cover the OpenGL 2.0 features. Multisample textures are likely to be introduced in the subsequent releases or as extension.

Korval
11-14-2007, 12:32 PM
No, it won't. It will basically cover the OpenGL 2.0 features. Multisample textures are likely to be introduced in the subsequent releases or as extension.

I wouldn't be too sure about that.

The concepts of RenderBuffers and Textures are defined by the Format object. That is, you set a usage parameter on the Format object, which acts as something of a suggestion as to how you intend to use it (as a render target or a render source or both).

Obviously, format objects will need to be able to create render targets that can be multisampled; this will get them up to speed with FBO renderbuffers. However, since it's just a format object parameter, it's entirely possible to express in GL 3.0 the desire to create an image that will be multisampled and be used as a render source (texture).

From there, it is up to the IHVs to actually make this format object work for the hardware that it can work on.
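
A sketch of what that could look like; every entry point and token below is invented, since the real GL3 names aren't public, and the point is only that "multisampled" and "usable as a render source" become attributes that get validated when the format object is created:

/* hypothetical GL3-style pseudocode */
GLformat fmt = glCreateFormat(GL_RGBA8);
glFormatAttribi(fmt, GL_FORMAT_SAMPLES, 4);
glFormatAttribi(fmt, GL_FORMAT_USAGE, GL_RENDER_TARGET | GL_RENDER_SOURCE);

GLimage img = glCreateImage(fmt, 256, 256);
if (img == 0)
{
    /* hardware that can't sample a multisampled image (GFFX-class, pre-G8x)
       would simply refuse to create it, so the application can tell up front */
}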

Zengar
11-14-2007, 12:40 PM
Yes, of course, but this is also possible in the current OpenGL (with an appropriate extension). I understood the question as "will this feature be supported out of the box?", and the answer is most probably no. GL3.0 has to be fully implementable on GFFX (or so we were told), and only G8x (ATI2xxx) and up support multisample lookup.

Brolingstanz
11-14-2007, 12:44 PM
Yes maybe in the "Reloaded" time frame, but by Mount Evans for sure.

The folks calling the shots on this thing are aware of what the hardware can do, and Mount Evans has a minimum hardware requirement, thus I think it's safe to say we'll see those features in Mount Evans.

It's going to be one hell of a mountain.

Brolingstanz
11-14-2007, 12:49 PM
Btw, I agree with what you said earlier, Zengar. Honestly I couldn't give a rats ass what language people use... just wanted to point out the console thing as something of an impediment at this point. But who knows what the future holds... I'd be willing to bet we'll see managed languages on those boxes too at some point down the road (sure would be nice anyway IMHO).

Fitz
11-14-2007, 01:19 PM
Since the initial release of OGL 3.0 won't include a lot of the newer extensions, will Nvidia and the others release 3.0-specific extensions to enable those features, or will we just have to wait however long it takes Khronos to get Mount Evans completed (I'm guessing a realllly long time..)?

Brolingstanz
11-14-2007, 01:25 PM
I would think so.

They have a history of being quick to the gate with new hw features via extensions. After all, it's good for business.

Korval
11-14-2007, 01:28 PM
I understood the question as "will this feature be supported out of the box?", and the answer is most probably no. GL3.0 has to be fully implementable on GFFX (or so we were told), and only G8x (ATI2xxx) and up support multisample lookup.

So, if GL 3.0 has the expressiveness to specify, "make an image format for a multisample image that I intend to look up from", why would an IHV not allow a program to use it if the hardware is available?

Just because GL 3.0 has a minimum hardware requirement doesn't mean that it has maximum requirements. GL 3.0 won't require implementations to provide this, but it won't stop them either. Maybe Mt Evans will be where they actually require it, but you ought to be able to at least use it in 3.0.

Plus, unlike GL 2.1, you can actually test to see if you can do it.


Since the initial release of OGL 3.0 won't include a lot of the newer extensions, will NVIDIA and the other vendors release 3.0-specific extensions to enable those features, or will we just have to wait however long it takes Khronos to get Mount Evans completed

I seriously doubt the IHVs will create stop-gap extensions for DX10 features. There's no point when those extensions will be deprecated in a few months.


I'm guessing a realllly long time..

It won't be that long. GL 3.0 is taking a while because it's an entirely new API; getting it right for future extensibility is hard.

Most of Mt Evans' features will drop into GL 3.0 without even needing API changes.

Zengar
11-14-2007, 02:08 PM
Because you will need new GLSL functions to read from such textures. You will need something like 'texture2Dms(sampler, coords, sample)'.

Cgor_Cyrosly
11-15-2007, 05:21 AM
When using a shader program in GL 3.0, will it look something like this:


GLcontext ctx;
ctx = glCreateContext();
program = glCreateProgram(ctx);
vs = glCreateShader(GL_VERTEX_SHADER);
gs = glCreateShader(GL_GEOMETRY_SHADER);
fs = glCreateShader(GL_FRAGMENT_SHADER);
glAttachShader(program, vs); // attached to program, so contained in ctx
glAttachShader(program, gs);
glAttachShader(program, fs);
... // create other GL objects for the current GL context
glBindContext(ctx); // bind ctx as the current active context for rendering
renderScene();
...

or

vs = ...
gs = ...
fs = ...
program = glCreateProgram();
glBindProgram(program); // bind the program object as part of the current context
...

And when using multiple passes:


// create shaders for pass 0
program[0] = glCreateProgram();
// attach shaders to program[0]
...
// create shaders for pass n
program[n] = glCreateProgram();
// attach shaders to program[n]
...
glBindProgram(program[0]);
...
glBindProgram(program[n]);
// bind program objects program[0]..program[n] to the current context

Is that right?

Korval
11-15-2007, 02:00 PM
When using a shader program in GL 3.0, will it look something like this:

The newsletters and other publicly available materials will answer your question about object creation.

Rick Yorgason
11-16-2007, 12:43 AM
And usually we have enough to worry about already. I wouldn't mind getting a garbage collector in the C++ core language. Though that won't happen.
No, GC won't happen in the C++ core language, and there are lots of good reasons for that; even C++0x only plans to make the changes necessary to remove some of the limits faced by third-party GC implementations. However, it doesn't need to be in the core; if you want to use GC in C++, there are good implementations you can pick from.

The biggest problem with GC in games is that there are lots of resources you have to deal with that aren't memory, like files and network streams. In GC languages you don't know when an object will be destroyed, so you need to explicitly catch exceptions and release those other kinds of resources yourself. This is why the "finally" block is so much more useful in Java than it would be in C++. (Admittedly, C# has improved this somewhat with "using" blocks, but as solutions go, that's about as good as the C++ approach of using smart pointers for memory management.)
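
For what it's worth, here is a minimal C++ sketch of the smart-pointer/RAII approach mentioned above; ScopedFile and writeLog are made-up names, not from any real library:

#include <cstdio>
#include <stdexcept>

// Tiny RAII wrapper: the destructor runs when the object goes out of scope,
// so the file is closed even if an exception unwinds the stack.
struct ScopedFile
{
    ScopedFile(const char* path, const char* mode) : handle(std::fopen(path, mode)) {}
    ~ScopedFile() { if (handle) std::fclose(handle); }
    std::FILE* handle;
private:
    ScopedFile(const ScopedFile&);            // non-copyable
    ScopedFile& operator=(const ScopedFile&);
};

void writeLog()
{
    ScopedFile log("render.log", "w");
    if (!log.handle)
        throw std::runtime_error("could not open log file");
    std::fputs("frame rendered\n", log.handle);
} // log is closed here, deterministically -- no "finally" block needed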

Groovounet
11-21-2007, 06:00 PM
What choice do we have for alpha testing if we don't use alpha test or discard? ... I think it remains very useful and I use it too. It may limit fragment output consistency on the hardware side, but I can't see a reason why alpha test would be worse than depth test...

Korval
11-21-2007, 07:12 PM
What choice do we have for alpha testing if we don't use alpha test or discard?

Um, none? I don't know what it is you're asking here, but if you don't have Alpha Test or Discard, you pretty much can't do it. You might be able to pull off some kind of depth write trick, but that's about it...

santyhamer
11-25-2007, 11:19 PM
Tick tock tick tock!
4 days for the spec
this crystal ball told me!

plasmonster
11-26-2007, 12:00 AM
I just asked my magic eight-ball. It said "It is decidedly so", after some deliberation.

Zengar
11-26-2007, 04:16 AM
My spirit will haunt you if it is not true...

cass
11-26-2007, 03:34 PM
NV3x supported neither blending nor filtering of fp16 surfaces.

NV4x was the first architecture that did, and it supported both, but without MSAA for the render targets.

Cgor_Cyrosly
11-26-2007, 10:16 PM
Will GL3.0 be released at Christmas?

dor00
11-28-2007, 05:42 AM
2 days left..

elFarto
11-28-2007, 07:33 AM
My spirit will haunt you if it is not true...
My spirit will do more than haunt if he's telling porkies. :)

Cgor_Cyrosly
11-28-2007, 08:20 AM
2 days left

Really? That is so exciting!

santyhamer
11-28-2007, 07:45 PM
2 days left

Really? That is so exciting!
Well, I was just reasoning... but logic says it must be before or after the Christmas vacation, so... :p

bobvodka
11-29-2007, 11:08 PM
If they get all of December off, then sign me up for a job working there!

dor00
11-30-2007, 07:05 PM
The time is 1 Dec 2007, no news yet...

Zengar
11-30-2007, 08:19 PM
It is also Saturday :) I would at least wait till Monday.

FaceOnMars
12-01-2007, 03:35 PM
I'm starting to see a lot of ghosts...

FaceOnMars
12-01-2007, 03:41 PM
About the object model of the new API, I guess it will follow the path of OpenSL ES... It has the potential to be faster, but also to be a lot more verbose; on the other hand, making a C++/Delphi OOP wrapper around it will be a lot easier than with the state-machine model.

Jalf_
12-03-2007, 01:01 AM
And usually we have enough to worry about already. I wouldn't mind getting a garbage collector in the C++ core language. Though that won't happen.
No, GC won't happen in the C++ core language, and there are lots of good reasons for that; even C++0x only plans to make the changes necessary to remove some of the limits faced by third-party GC implementations.
Actually C++0x *does* include a garbage collector. :)

Jan
12-03-2007, 03:36 AM
Reference please?

Zengar
12-03-2007, 03:46 AM
http://en.wikipedia.org/wiki/C++0x#Transparent_Garbage_Collection

P.S. So what about GL3.0 specs? :)

MORB
12-03-2007, 04:39 AM
http://en.wikipedia.org/wiki/C++0x#Transparent_Garbage_Collection

P.S. So what about GL3.0 specs? :)

It seems it won't make it into the next C++ standard, however; they will continue working on it:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2432.html

Korval
12-03-2007, 12:39 PM
The basic idea for C++0x is that they're going to add a few semantic changes that will make it easier to implement GC in C++. The actual GC feature itself will not be part of C++0x.

Also, there is a clear desire among the C++ working group to get C++ on a 3-5 year cycle of updates and releases, rather than the 10 year cycle they're on now. So even if GC doesn't hit C++0x, they're looking for it to hit C++13 or C++15.

Kazade
12-03-2007, 02:36 PM
Does anyone have any further estimate of when the OpenGL 3.0 specification will be released? Will it be released before Christmas?

Jan
12-03-2007, 02:45 PM
Does anyone have any further estimate of when the OpenGL 3.0 specification will be released? Will it be released before Christmas?

Nope, we know as much as you do.

The "short" summary in Wikipedia about C++0x makes me wonder, whether they just decide to put a feature in there, or whether they are sure it is easily implementable. Some of those features are quite difficult for compilers, i think. Combined with the already extremely complicated syntax, that makes C++ Compilers so hard to implement, i fear it will take several years, before we get compliant compilers. Just as it was with the last C++ standard...

Jan.

bobvodka
12-03-2007, 09:22 PM
The approach with C++0x has been different from before: a fair few of the things being added already exist in some form in some compilers; I know GCC already has some of the features added experimentally.

Jalf_
12-04-2007, 07:38 AM
About the C++ garbage collector thing:

http://herbsutter.spaces.live.com/?_c11_BlogPart_BlogPart=blogview&_c=BlogPart&partqs=amonth%3d11%26ayear%3d2007

Guess I was wrong. In a 2006 post, he said the garbage collector was in, but looks like they had to back down a bit. Oh well...

Don't Disturb
12-04-2007, 11:42 AM
ok, just this once....


glCreateProgramObject(...)
glCreateShaderObject(...)
glShaderSource(...)
glCompileShader(...)
glAttachObject(...)
glLinkProgram(...)
glUseProgramObject(...)
glGetProgramObjectBinary(void *ptr, uint32 *size)  // proposed, wished-for entry point
FILE* blobFile = fopen("blob.blob", "wb");          // binary mode
fwrite(ptr, 1, *size, blobFile);                    // size is returned through the pointer
fclose(blobFile);

there you go, free of charge - and no tea needed.
Yes, something like this is really, really needed. GLSL is unusable for any significant number of shaders right now because compilation/linking is so slow.
An initial compilation when the user installs the app, and a recompilation whenever they update their drivers would be fine, but not every time they run the app!
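
For completeness, the read-back half of that idea might look like the sketch below. glLoadProgramObjectBinary and recompileFromSource are made up purely to illustrate the intended workflow (compile once at install time, reload the blob on later runs, and fall back to GLSL source only when the driver changes):

FILE* blobFile = fopen("blob.blob", "rb");
if (blobFile)
{
    fseek(blobFile, 0, SEEK_END);
    long blobSize = ftell(blobFile);
    fseek(blobFile, 0, SEEK_SET);
    void* blob = malloc(blobSize);
    fread(blob, 1, blobSize, blobFile);
    fclose(blobFile);
    // hypothetical entry point: returns false if the blob no longer matches the driver
    if (!glLoadProgramObjectBinary(program, blob, blobSize))
        recompileFromSource(); // hypothetical helper: rebuild from GLSL source and re-cache
    free(blob);
}
else
{
    recompileFromSource(); // no cached blob yet, e.g. first run after install
}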

Seth Hoffert
12-05-2007, 02:23 PM
I am in desperate need of a new newsletter :( .

bobvodka
12-05-2007, 07:19 PM
Not gonna happen pre-GL3, I feel, which is pushing it for Xmas, which means we probably won't see stable drivers before Easter at best.