New GL spec means: new begging.



kRogue
03-15-2010, 03:33 PM
Just to be the first one to beg, now that the GL 4.0 and 3.3 specs are out, I would like to add these requests:

1) separate shader objects (especially since there are now so many shaders running around)

2) DSA (or at least a subset of DSA for manipulating textures and buffer objects)

3) GL_NV_texture_barrier

4) limited blending for integer texture targets

Groovounet
03-15-2010, 06:48 PM
- Agreed especially for DSA and texture barrier. Number 1!

- A debug profile!

- I think that with the step up from OpenGL 3.3, separate shader objects could be fine now.

- #include for GLSL, shader binaries.

More begging to come.

Stephen A
03-16-2010, 02:51 AM
To join the chorus:

- DSA for all non-deprecated functions. I don't care about having DSA on every function from 1.0 to 4.0; cut it down and only add DSA for core 3.3+/4.0+ functions to the spec.

- Shader binaries.

- Debug profile.

And that's it, more or less.

Jan
03-16-2010, 05:04 AM
- Multi-threaded rendering, i.e. command buffers, or something like that.

Jan
03-16-2010, 05:29 AM
- Environment (global) uniforms for GLSL.

skynet
03-16-2010, 06:12 AM
- Environment (global) uniforms for GLSL.
Do you need more than UBO's for that?

Jan
03-16-2010, 06:26 AM
It would be nice to have that on older hardware, because UBOs aren't supported everywhere. Also it would be nice to be able to make just any single uniform an env-variable, without the need to create a struct for everything. That makes it far easier to have people plug in data through scripts and such, which the core engine does not know anything about.

For example, if my game-play requires a "time-of-day" variable, a level-script could just set that variable and all depending shaders would get that value, without further ado.
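(For reference, a rough sketch of the UBO route skynet mentions, just to show what the "struct for everything" workaround looks like today; the names and the binding point are made up for illustration.)

// every shader that needs it declares:  layout(std140) uniform Env { float time_of_day; };
GLuint env_ubo;
glGenBuffers(1, &env_ubo);
glBindBuffer(GL_UNIFORM_BUFFER, env_ubo);
glBufferData(GL_UNIFORM_BUFFER, sizeof(float), NULL, GL_DYNAMIC_DRAW);
glBindBufferBase(GL_UNIFORM_BUFFER, 0, env_ubo);              // bound once to binding point 0

// once per program that uses the block:
glUniformBlockBinding(program, glGetUniformBlockIndex(program, "Env"), 0);

// the level script then only touches the buffer, never the individual programs:
float time_of_day = 13.5f;
glBindBuffer(GL_UNIFORM_BUFFER, env_ubo);
glBufferSubData(GL_UNIFORM_BUFFER, 0, sizeof(float), &time_of_day);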

Jan.

Groovounet
03-16-2010, 07:13 AM
- Generalizing the GL_ARB_explicit_attrib_location extension to uniform variables (GL_ARB_explicit_uniform_location) and to what we used to call varying variables in GL_EXT_separate_shader_objects. This would fix the only trouble I have with that extension.

- Allowing blocks for vertex shader inputs and fragment shader outputs, and allowing a location to be assigned to these blocks.

kRogue
03-16-2010, 08:16 AM
Some thoughts:
1) Explicit uniform location... that is going to be messy once arrays, matrices and such come into play, as they usually take more than one slot... also, what keeps going through my head: if you have lots of uniforms common to many shaders, weren't UBOs supposed to handle that? Can you give a use case for it?

2) I think binary shaders are a bit of a red herring... over in GLES2 land there are binary shaders in some form on some platforms... the Tegra implementation of binary shaders is particularly painful, as the binary shader also depends on GL state (blending, masking, etc.); packing up all the possibilities would likely make the blob too big... but over in driver land, the driver just saves those compile jobs it needs... maybe something less "final" than binary shaders, a pre-compiled blob that is not complete but has done the bulk of the work? Admittedly, for binary shaders to actually work well, the shader source code will somehow need to be in there too, for an old application to work on new hardware years from now... or, for that matter, when the GLSL compiler is fixed/improved.

Groovounet
03-16-2010, 08:58 AM
Explicit location is the door to a really nice implementation of SEMANTICS. No more GetUniformLocation / BindUniformLocation, GetAttribLocation / BindAttribLocation... let's say: fewer of those. I agree that with variables taking multiple locations there is an issue. With blocks, and maybe a sizeof(block), it could be solved.
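(For illustration, a small GLSL sketch of what such generalized explicit locations could look like; the attribute form is what GL_ARB_explicit_attrib_location already allows in 3.3, while the uniform and varying forms are hypothetical extrapolations of it.)

// already possible (GL_ARB_explicit_attrib_location, GL 3.3):
layout(location = 0) in vec3 position;

// hypothetical extrapolations being asked for here:
layout(location = 2) uniform vec4 light_positions[4];   // would presumably consume locations 2..5
layout(location = 0) out vec4 vertex_color;              // matched by location, not by name,
                                                         // to the fragment shader input at location 0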

ZbuffeR
03-16-2010, 09:08 AM
About binary shaders, the idea is not to ship binary shaders, but to offer a mechanism where an application can send a bunch of GLSL shaders in source code form to the driver, and get back the compiled blobs to reuse at a later time, to reduce application start time. Of course, as soon as the driver or hardware changes, the binary blob should be refused, and the app will have to re-feed the original GLSL source code.

There were some pretty interesting discussions and use cases about this on these forums; I will try to dig up a link.

Point 2 on this post (http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=262474&Searchpage=1&Main=48047&Words=binary+%2Bblob&Search=true#Post262474)

This post too (http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=266278&Searchpage=1&Main=51540&Words=binary+%2Bblob&Search=true#Post266278)

Jan
03-16-2010, 01:05 PM
You found 2 (two!) posts about that topic? Weren't there like a gazillion threads about "binary shader blobs" by now? And every two weeks someone needs to explain the idea again, and again, and again...

Anyway, i'd like to have that too.

Jan.

ZbuffeR
03-16-2010, 01:13 PM
Well I found 2, you found none ... :)

kRogue
03-16-2010, 01:22 PM
Giggles, "Point 2 of this post", that post was mine.

At any rate, I have seen the binary shader thing thrown around a LOT. What goes through my head is the idea of a binary blob as a hint to be used optionally... now the horrible, icky things soon come into play: deployment. Chances are one will need a binary blob for each major generation of each major GPU architecture. On PC, that right now means 6 blobs: (GL2 cards, GL3 cards, GL4 cards) x (nVidia or AMD). [I don't even consider Intel anymore at this point.] But the plot thickens: OS and driver version.

One can go for this: application does not ship those blobs but rather makes them at first run and then re-uses them so we get things like:

glHintedCompileShader(const char *GLSLcode, int blob_length, GLubyte *binary_blob);

and maybe a query or something:

glGetInteger(GL_USED_BINARY_BLOB, &return_value);

and lastly glGetBinaryBlob.

It is feasible to do, and in most cases only the first run of the application will get a "full" compile... the sticky bit of the idea above is that it assumes that the "binary blob" does not depend at all on GL state (who knows what the driver does as some GL state changes; see my comments on Tegra)...

But the idea of shipping static binary blobs to cut down on startup time I don't see as being so feasible with constantly evolving hardware and drivers... it is kind of feasible in the embedded world, but only with an incredible amount of care.

Groovounet
03-16-2010, 04:37 PM
Oh god! And I wasn't aware of the binary blob idea!

In my mind it was way simpler: at first launch, you build the shaders from source, get and save the binary, and load it directly at the next launch. The binary would only be compatible with a specific driver version. A function to check binary validity would be used to check whether a shader rebuild is required.
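(For what it's worth, the API surface such a scheme needs is small. A sketch, using the names that later appeared in GL_ARB_get_program_binary; treat this as illustrative of the flow rather than as core GL of today.)

// first launch: build from source, then grab the blob
glProgramParameteri(prog, GL_PROGRAM_BINARY_RETRIEVABLE_HINT, GL_TRUE);
glLinkProgram(prog);
GLint blob_len = 0;
glGetProgramiv(prog, GL_PROGRAM_BINARY_LENGTH, &blob_len);
void *blob = malloc(blob_len);
GLenum blob_format;
glGetProgramBinary(prog, blob_len, NULL, &blob_format, blob);
// ... write blob_format and blob to disk ...

// next launch: try the cached blob, rebuild from GLSL if the driver refuses it
glProgramBinary(prog, blob_format, blob, blob_len);
GLint ok = GL_FALSE;
glGetProgramiv(prog, GL_LINK_STATUS, &ok);
if (!ok) {
    // driver or hardware changed: fall back to compiling the original source
}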

Alfonse Reinheart
03-16-2010, 04:55 PM
it is kind of feasible in the embedded world, but only with an incredible amount of care.

That kind of thing works in the embedded world because you have near-total control over the system you're running on. You get to say what your chipset is, what the driver version for that chipset is, etc. So shipping binary blobs is fine.

BBs for desktop OpenGL should not be interchangeable. We want them to solve a specific problem: shader compilation startup time. The best way to do this is to just have them generate a block of data that may be able to be loaded later as a program.

In any case, BBs are secondary. We now have five different shader stages; separation of programs into stages is becoming very necessary, if you want to do anything with geometry or the two tessellation stages. And we know that separation isn't a hardware issue; it's 100% just a bad decision made by the ARB 5 years ago.

Fix it!

Rob Barris
03-16-2010, 06:55 PM
Well, who can argue with a polite request like that.

Simon Arbon
03-16-2010, 10:49 PM
At any rate, I have seen the binary shader thing thrown around a LOT. What goes through my head is the idea of a binary blob as a hint to be used optionally... now the horrible, icky things soon come into play: deployment. Chances are one will need a binary blob for each major generation of each major GPU architecture. On PC, that right now means 6 blobs: (GL2 cards, GL3 cards, GL4 cards) x (nVidia or AMD). [I don't even consider Intel anymore at this point.] But the plot thickens: OS and driver version.
One can go for this: application does not ship those blobs but rather makes them at first run
Actually both ideas have been suggested by different people and are listed separately on the User Wish List (http://www.opengl.org/wiki/Proposals)

GLSL shader precompilation
Description: Having an off-line tool that compiles GLSL into something that you can feed into any implementation to create programs.
This something would likely be a form of the ARB assembly, PTX (http://www.nvidia.com/content/CUDA-ptx_isa_1.4.pdf) or LLVM (http://llvm.org/) .
Benefit: Presumably, the compile/link time of this precompiled format would be lower than that of GLSL itself.
Some people also want this to make it harder for people to copy their GLSL code.

Compiled Shader Caching
Description: The ability to store compiled shaders in some format, so that subsequent executions of the programs will not require a full compile/link step. Or at least, will not require it unless drivers have changed.
Benefit: Improve program initialization/level loading (if shaders are part of level data).

The most comprehensive discussion on binary blobs started here:
http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=244045#Post244045

kRogue
03-17-2010, 03:01 AM
There is another way to cut down on shader startup time, and several folks have already commented that they do it:

Use nVidia Cg compiler with -oglsl to get GL-asm.

Naturally, this is not going to fly too well when you want to use more advanced features on a non-nVidia card as GL-asm is not even part of the GL spec, and the asm extension itself has not been updated (but nVidia has made lots of NV_ extensions for it).

A potentially more middle ground might be:
1) update GL-asm interface parallel to GL features
2) allow for application to get GL-asm code
3) ability to send GL-asm instead of GLSL to GL for shaders

Though this is not so great either, as writing the GL-asm spec is not going to be fun (and we would find that each new GL release has 4 docs: GL core, GL compatibility, GLSL and the (new) GL-asm). Worse, there is an epic chance that the driver has to apply some magic to the GL-asm too, and what makes for good GL-asm might heavily depend on the GPU.

The idea of sending some intermediate byte code is appealing though, but what the byte code needs to store might also depend on GPU architecture.. and even then the driver will need to "compile" the byte code... one can argue that the GL-asm idea above is just one form of this too.

Ilian Dinev
03-17-2010, 03:07 AM
Or like someone mentioned long ago: the driver could simply store a 50MB hashtable of glsl program srccode and its binary; no extra specs necessary. Internal binary format for the given driver, update of driver invalidates that cache.

kRogue
03-17-2010, 03:17 AM
Well, who can argue with a polite request like that.


That comment made me laugh, it made my day :D

At any rate, considering the new layout thingy added to GLSL, it does not seem like a stretch to add it for vertex shader outputs, etc. I suppose that might even be the plan.

Though... it does make me a little nervous: some GPUs are vector-centric, some are scalar-centric and some are just funky. Additionally, for some implementations of GL, having 2 vec2 interpolators costs the same as one vec4 interpolator, whereas for others it does not...

Still, I'd love to have separate shader objects mixed with a little DSA syntax to state which program you are setting the uniform on.
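(For the record, a sketch of what that mix could look like; the names follow the separate-shader-objects direction with program pipelines plus DSA-style glProgramUniform*, so treat them as illustrative rather than as existing core GL. vs_source and fs_source are assumed to hold the GLSL text.)

GLuint vs_prog = glCreateShaderProgramv(GL_VERTEX_SHADER,   1, &vs_source);
GLuint fs_prog = glCreateShaderProgramv(GL_FRAGMENT_SHADER, 1, &fs_source);

GLuint pipeline;
glGenProgramPipelines(1, &pipeline);
glUseProgramStages(pipeline, GL_VERTEX_SHADER_BIT,   vs_prog);
glUseProgramStages(pipeline, GL_FRAGMENT_SHADER_BIT, fs_prog);
glBindProgramPipeline(pipeline);

// DSA-style uniform setting: say which program you mean, no glUseProgram needed
glProgramUniform1f(fs_prog, glGetUniformLocation(fs_prog, "exposure"), 1.5f);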

kRogue
03-17-2010, 03:22 AM
Or like someone mentioned long ago: the driver could simply store a 50MB hashtable of glsl program srccode and its binary; no extra specs necessary. Internal binary format for the given driver, update of driver invalidates that cache.


This I guess is okay-ish for beefy desktops, but for smaller things it is not really feasible... there are small embedded devices that do GL 2.x and GL 3.x (not just GLES1 and GLES2).

ZbuffeR
03-17-2010, 04:21 AM
Indeed, the driver-side cache seems very fragile.
A given application will have a hard time tuning the cache size: if it is too small, that defeats the purpose.

That could be a plus provided transparently by some drivers, but it would be impossible to rely on it.

Groovounet
03-18-2010, 10:49 AM
Completing my request for a generalized "explicit location" / semantics-oriented design and separate shader programs based on it: it would be great to have a glTransformFeedbackVaryings function that takes locations instead of (stupid) names. glTransformFeedbackVaryingsNV, in fact... I never really understood where this glTransformFeedbackVaryings-with-names came from, but I would be pleased not to have to put it straight onto my own personal deprecated-functions list.

By the way I would like to say that I love the work done on transform feedback for OpenGL 4.0. I quite think that a lot of it could find a place in OpenGL 3.4 but I might be wrong.

V-man
03-22-2010, 07:33 AM
Actually both ideas have been suggested by different people and are listed separately on the User Wish List


[censored], there is a user wish list now? That's great!

For binary shaders? Why hasn't this been made available until now? Last time I heard, there was discussion at a Khronos meeting about it.

Chris Lux
03-30-2010, 02:41 AM
i think there should be the possibility to set defines/macros during the compile of a shader.

maybe extend the glCompileShader function to take another string as second parameter in the usual compiler parameter form.

for example:

glCompileShader(_gl_shader_obj, "-D_MY_MACRO=3 -D_MY_SECOND_MACRO=2");

currently this can only be achieved by parsing the shader source for the #version statement and pasting the custom definitions into the source string. maybe this could help custom shader instantiation alongside the subroutines.
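(for reference, a sketch of the workaround described above; it leans on the fact that glShaderSource accepts an array of strings, so the #version line, the injected defines and the body can be passed as separate pieces. the names are illustrative and the body must not contain its own #version line.)

const char *pieces[3] = {
    "#version 150\n",                                  /* version must come first             */
    "#define MY_MACRO 3\n#define MY_SECOND_MACRO 2\n", /* injected definitions                */
    shader_body                                        /* the rest of the source, no #version */
};
glShaderSource(_gl_shader_obj, 3, pieces, NULL);
glCompileShader(_gl_shader_obj);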

Groovounet
04-19-2010, 09:29 PM
Move up this good thread with my little OpenGL wish-list:
OpenGL 3.4 / 4.1: Expectations / Wish-list (DSA!) (http://www.g-truc.net/post-0279.html)
OpenGL 3.4 / 4.1: Expectations / Wish-list (major) (http://www.g-truc.net/post-0280.html)
OpenGL 3.4 / 4.1: Expectations / Wish-list (others) (http://www.g-truc.net/post-0281.html)
OpenGL 3.4 / 4.1: Expectations / Wish-list (summary) (http://www.g-truc.net/post-0282.html)

Alfonse Reinheart
04-19-2010, 11:06 PM
OpenGL 3.4 / 4.1: Expectations / Wish-list (DSA!)

Oh Jesus Christ. "Direct State Access (DSA) is the most wanted feature for OpenGL 3.4 and OpenGL 4.1 by most serious OpenGL developers." I call total BS on this statement. And considering that it's the very first sentence, that does not bode well for the rest.

The presence or absence of DSA doesn't prevent you from getting your work done. The only thing it might prevent is a reasonable multithreaded implementation. Important, yes. But far from overwhelming. The simple fact is this: except for the multithreading issue (which I admit isn't something one should easily gloss over, but it does require additional API support), what are you going to do with DSA that you could not do without it?

Nothing. It is pure convenience.

When there are real problems and needs with regard to the API, convenience is a luxury.


Currently, at link time the compiler needs to link the output variables of the vertex shader to the input variables of the fragment shader using the variables' string names. With explicit varying locations, this task would disappear: the vertex shader output variable with location 0 would automatically communicate with the fragment shader input variable with location 0.

So you get to name things by different names in different stages. Again: pure convenience; the types still have to match. Personally, I prefer names. They have an actual meaning, unlike a number which doesn't.

And having the same variable defined with different names in different places is pretty much the definition of "confused logic".


VAO alternative idea with explicit binding points and layout sharing:

That's a pretty terrible idea. The offsets are baked into the format, and you can only bind the buffer object's base. Because the bindless "handles" are simply GLuint64's, you can "bind" any arbitrary number. This allows you to put multiple meshes with the same format in the same buffer, and simply "bind" a different set of addresses for each attribute in each mesh. To make this work, you need to be able to use something like glBindBufferRange.

More important than that, by NVIDIA's admission, the reason bindless gets its performance advantage is from cache issues. Cache issues specifically related to having to read data from buffer objects. Making the user bind buffer objects more frequently is not the way to solve this problem. The performance benefits come from locking the buffer, not from rearranging commands.

Groovounet
04-20-2010, 03:22 AM
**** ! Whouhouuu ! ****

I love physics, strong action, strong reaction!
Next time I will say REQUIRED by ALL and observe how you react.
If you look at the OpenGL announcement thread, DSA comes up in a lot of places. That night I talked to a friend about the announcement, and he reacted first with a "damn it, again no DSA!" and only then cared about the details. How rational is this wish? But people don't make wishes for rational reasons, otherwise there would be no wars. Anyway, it's the number 1 item in my wish list!

For the second part, I didn't say that it was the solution; it was just an introduction to the problems bindless graphics solves. I saw that statement about cache pollution in the spec. Moreover, I don't think that BindBuffer alone pollutes the L2 cache; all these calls must be involved. I probably haven't been accurate enough here, thanks!

PS: For a change, I would enjoy seeing you post your own wish list, just to swap roles and for once be the guy who suggests instead of always being the critic. Don't get me wrong, I have nothing against critics at all; it would just be nice to see you perform on the other side. :p

Dark Photon
04-20-2010, 06:21 AM
PS: For a change, I would enjoy seeing you post your own wish list, just to swap roles and for once be the guy who suggests instead of always being the critic. ... Don't get me wrong, I have nothing against critics at all; it would just be nice to see you perform on the other side. :p
Agreed! Way too much vitriol.

To support and refine your "critic" reference, there's nothing wrong with constructive criticism and suggestion at all (helps folks grow, encouraging, gives them something to think about)! But too much of what I'm seeing from him is just flat bitter-old-man destructive criticism (ranting, demotivating, egotistical [my-way-is-the-only-way], and sometimes even totally false).

Chris Lux
04-20-2010, 11:15 AM
Move up this good thread with my little OpenGL wish-list:
OpenGL 3.4 / 4.1: Expectations / Wish-list (DSA!) (http://www.g-truc.net/post-0279.html)
i am totally with you on that. one thing i wonder would fit into the DSA scope: initializing/updating textures from PBOs.

there are only the glTextureImageXD functions. what would be really helpful would be something like:

glTextureSubImage_data_from_unpack_buffer_offset(uint texture, uint unpack_buffer, enum target, int level, int xoffset, int yoffset, sizei width, sizei height, enum format, enum type, intptr offset);

i am not good at function names, but i hope you get the idea. because with the current DSA extension we would still have to actually bind the unpack buffer and then do the texture update.
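(for comparison, a sketch of what this takes today with the EXT_direct_state_access entry point: the texture no longer needs to be bound, but the unpack buffer still does, which is exactly the complaint. texture, unpack_buffer, width, height and offset are assumed to exist.)

glBindBuffer(GL_PIXEL_UNPACK_BUFFER, unpack_buffer);
glTextureSubImage2DEXT(texture, GL_TEXTURE_2D, 0,   /* texture, target, level */
                       0, 0, width, height,         /* xoffset, yoffset, w, h */
                       GL_RGBA, GL_UNSIGNED_BYTE,
                       (const GLvoid*)offset);      /* offset into the PBO    */
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);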

Alfonse Reinheart
04-20-2010, 12:50 PM
For a change, I would enjoy to see you posting your wish list just to change role and be for once the guy that suggest instead of just being always the critic one.

I have. Look at the announcement thread.

1: Separate shader.

2: Whatever lowers vertex submission overhead.

3: Multithreading.

There's simply not much more that OpenGL actually needs at this point. Even the vertex submission thing is just a "would like to have," since OpenGL is already quite good and better than D3D at this.


But too much of what I'm seeing from him is just flat bitter-old-man destructive criticism

The ARB has limited time and resources. That means prioritization is important. That means that they should work on features that will actually make OpenGL programs more capable before working on niceties and API cleanup.

DSA doesn't make OpenGL more capable. DSA only matters as a potential component of full-on multithreading support. There is nothing you can do with DSA that you can't currently do without it, so I see no need for it. And I consider it constructive to say that, as it pushes things in a good direction, rather than a useless one.

I like DSA. But I'm not going to let people act like it is anything more than syntactic sugar. I like sugar, but it shouldn't get in the way of the actual meat.

Groovounet
04-20-2010, 01:08 PM
From what you say, if DSA helps multithreading (how?), then DSA is the meat.

On that side, separate shaders are actually really similar to DSA. There is nothing we can't do without separate shaders: duplicate the shaders and there you go.

Anyway, I didn't even say that DSA or separate shaders are the most important features to have; I said they are numbers 1 and 2 in my wish-list.

Following your 'meat' idea, the main thing left is RWTexture and RWBuffer, and at least RWTexture is really on its way to OpenGL 4.1, hopefully RWBuffer too. Which I put in the category "expectations".

Alfonse Reinheart
04-20-2010, 01:58 PM
From what you say, if DSA help multithreading (how?), so DSA is the meat.

No, multithreading is meat. DSA is merely a means to that end.

And the reason DSA is needed is because of state issues. If you confine your state to one thread, it is easy to be responsible for the current state of everything. However, if you have multiple threads that can change the current state, this is a real problem.

Being able to have a worker thread that can do operations on an object without having to bind that object to the context (and thus mess up context state) is important to making multithreading work.


On that side the separate shader is actually really similar to DSA. There is nothing we can't do without separate shader. Duplicate shader and you go.

Duplicating the shader takes up precious compile time. The more fully linked programs there are, the longer it takes to load. This is non-trivial for a serious application.


Following your 'meat' idea, the main thing left is RWTexture and RWBuffer and at least RWTexture is really on its way to OpenGL 4.1 and hopefully RWBuffer too. Which I put in the category "expectations".

Which is part of the argument that your priorities are wrong. These provide actual, tangible functionality to rendering. So they should be higher priority than other things.

CrazyButcher
04-20-2010, 02:12 PM
great work on that wishlist there Groovounet, I second your last 3.4/4.1 list.

Personally I would prefer the separate shader (explicit attribs) stuff first, as I miss that most from working with Cg, which also means transform feedback is more generic to setup (doesn't need to know about the shaders to come) and parameters don't need to be updated for every linked pair combo.

And then DSA... it just makes it more natural to write modern renderers this way, multi-threading... and great that you took the attention to detail and point out issues within DSA.

hopefully DSA for core stuff will pave the way for the overhauled display/command list thing. Having the includes for GLSL would add a lot of convenience, yes one could write their own yadda yadda, it's just nicer this way.


Which is part of the argument that your priorities are wrong. These provide actual, tangible functionality to rendering. So they should be higher priority than other things.

your reasoning here is that exposing new things should have highest priority, however one can also reason that DSA would improve the situation independent of a hardware feature for all general opengl development on "current main stream" hardware (ogl 3.x). And I would think that has some serious importance as well. If you want to play with the "latest" you will have the EXT and vendor stuff to get going and ARB stuff will follow and not be fundamentally different. But DSA simply exposes a new form of quality to OpenGL in total, the earlier that comes, the better for all of us.

Alfonse Reinheart
04-20-2010, 02:27 PM
one can also reason that DSA would improve the situation independent of a hardware feature for all general opengl development on "current main stream" hardware (ogl 3.x).

No, it wouldn't. It might make them slightly more intuitive to use, but it would not make them better hardware features. It doesn't let you do things you couldn't before.

Jan
04-20-2010, 02:55 PM
OpenGL is an API, as such its core idea is not only to provide a feature-set, but also to make it pretty. Otherwise we could really just push opcodes to the GPU (or whatever the lowest level of programming those darn things is).

In the beginning OpenGL focused strongly on providing an interface that is actually NICE. It's simply that over time this style of programming became more in the way, than it was helping.

In recent years there have been so many new features, that changed these APIs so radically, that OpenGL simply had trouble keeping up. This seems to be mostly over now with 4.0. We really have all the important features, now the focus should shift back to making the interface more usable.

DSA has two benefits: First, it makes the API more streamlined and easier to use correctly in bigger projects. Second, it should be generally better for multithreaded use. Both from a performance and a correctness standpoint.

Depending on HOW multithreaded rendering will be added, it MIGHT be beneficial to first add DSA (as core). Maybe it would be good to explicitly disallow any glBind* (i.e. selector) calls in a thread. But i can only speculate here; the ARB will certainly find a good solution.


@Alfonse: Take a look at the first few sentences of your posts. They usually start with something like "No it's not.", "No, that's wrong.", etc. You are regularly writing posts where you "state facts". Now, you know your stuff, that's true, but many things are still up to debate, especially things to come. Often you ARE right, but not always. But even in those cases people consider the way you write your posts kinda rude. Nobody likes people who claim to know it all.

Jan.

Groovounet
04-20-2010, 03:31 PM
At the OpenGL code level you are right, Alfonse, but at the software design level DSA, just like separate shader programs, is a breath of fresh air!

When I was working on a 10-year-old OpenGL codebase (I let you imagine the mess!), how many times was I confronted with a black box that perturbed my OpenGL state (and contexts, actually, sometimes...), left wondering: "How the heck am I going to make this work?!" (and reaching for a gun hidden in my desk, just in case).

Honestly, I did some really not-pretty things, everything done in very inefficient processing and development ways. What do you think of using the 3ds Max SDK and stealing the OpenGL context with glGetCurrentContext to render my entire scene? What do you think of 2 OpenGL threads using 2 different contexts working in parallel on the same rendering code path but with different quality settings, all of this working on 4 viewports with 4 different surfaces? Well, that was the context I had to deal with. Awful, stupid and weird, but that's the kind of stuff you find out there, and there is a point where releasing is necessary.
DSA fixes the problem of tracking states and objects. It doesn't even make the question relevant anymore. A black box? Who cares, with DSA? (Well, the black box had better use DSA too!)

Another point you seem to miss, Alfonse: even if image load and store give me a ****whahouuu**** effect, concretely, beyond trying them out tomorrow, I don't think I am going to use them for a while. I can't wait for papers and documentation about their use, but well, it's a big deal! DSA and separate shader programs are the kind of features that, as soon as they are released, I can use and... enjoy! In a day or two, a complete, complex piece of software can be entirely updated. To really take advantage of them? Maybe not instantly, but soon: it's an alternative way of thinking about and designing OpenGL software.

So YES it would!

Alfonse Reinheart
04-20-2010, 03:52 PM
First, it makes the API more streamlined and easier to use correctly in bigger projects.

More streamlined, perhaps. But easier to use in large projects? Not really. DSA makes it easier in more decentralized projects, where it is difficult to enforce coding standards rigorously. But large projects are not necessarily decentralized.


Second, it should be generally better for multithreaded use. Both from a performance and a correctness standpoint.

My question is simply one of why you want it. Do you want DSA for the sake of API cleanliness and whatever? Or do you really just want multithreading, and see DSA as one of the necessary components of that?

This isn't an academic distinction; this is important for the ARB to know. Because if you're just asking for DSA, they may ignore the multithreading stuff or postpone it to later. Whereas if you're asking for multithreading, then the ARB knows that they need DSA, but also some other things (possibly encapsulating the last few bits of context state in objects, etc).

Groovounet
04-20-2010, 04:28 PM
First, it makes the API more streamlined and easier to use correctly in bigger projects.

More streamlined, perhaps. But easier to use in large projects? Not really. DSA makes it easier in more decentralized projects, where it is difficult to enforce coding standards rigorously. But large projects are not necessarily decentralized.


I can't disagree more. The bigger the project gets, the more the number of 'cases' increases and the harder the state combinations become to track. With DSA you don't have to track state around the parts that edit objects.

What do you call a decentralized project?




Second, it should be generally better for multithreaded use. Both from a performance and a correctness standpoint.

My question is simply one of why you want it. Do you want DSA for the sake of API cleanliness and whatever? Or do you really just want multithreading, and see DSA as one of the necessary components of that?

This isn't an academic distinction; this is important for the ARB to know. Because if you're just asking for DSA, they may ignore the multithreading stuff or postpone it to later. Whereas if you're asking for multithreading, then the ARB knows that they need DSA, but also some other things (possibly encapsulating the last few bits of context state in objects, etc).


The ARB likes to do things step by step, and if DSA is necessary for multithreading, then chances are that OpenGL 4.1 will come with DSA and the rest later on. This would be especially true if they also need to encapsulate the last remaining states in objects (there are actually a lot left!). If we just have to wait 6 more months... well, I won't die.

By the way, I'm really into some kind of immutable state objects to replace the use of display lists for this purpose... I'm not sure I still believe it might happen.

Stephen A
04-20-2010, 04:36 PM
I can't disagree more. The bigger the project gets, the more the number of 'cases' increases and the harder the state combinations become to track. With DSA you don't have to track state around the parts that edit objects.

Absolutely correct. The larger the project the higher the chance of non-deterministic interactions between its various components in a bind-to-edit API. This is especially true when integrating 3rd party components (a necessity in large projects), which enforce their own standards in state handling. You soon come to a point where you have to sacrifice performance to either save/restore state (glGet*) or aggressively reset state after every operation.

DSA resolves this issue cleanly and thoroughly. This is not an academic consideration, this is something that affects the daily work of many (if not most) OpenGL developers.

kRogue
04-30-2010, 03:34 AM
Um, this DSA + multithreading point is bogus. A given GL context can only be current in one _thread_ at a time. If you have multiple threads and you wish to change the state of GL objects, then you create additional GL contexts in the same share group. As such, the GL state issues that DSA deals with are NOT an issue, since each thread has its own context and each context has its own latch/bind state. What DSA does nicely for you is let you build layered stuff more easily.

Groovounet
04-30-2010, 08:22 AM
I guess it's not 'bogus' in the context of multithreaded OpenGL drivers. For example, if a thread wants to update a texture parameter, it doesn't have to check whether the currently bound texture is in use by another thread. It's less thread sync.

One thread could manage the draw calls, and several others the various object updates for heavy tasks like on-the-fly DXTC compression.

Well, like I said... I guess!

skynet
05-02-2010, 03:52 AM
Groovounet, the point is, you _cannot_ do that glTexParameter call in a second thread, unless you create an extra context for that thread. OpenGL allows only one thread to talk to a context at any time.

kRogue is right, DSA doesn't help make OpenGL more 'threadsafe'. It helps where several pieces of code that don't necessarily know about each other need to work together without polluting each other's assumed state.

Jan
05-02-2010, 04:28 AM
DSA itself won't make OpenGL threadable, but it is very much essential as the foundation.

Take a look at the D3D11 spec. For multithreaded use you create EXTRA contexts per thread there, but those are not real contexts, they are only for command queuing. Their command queues are then executed on the main context.

This is a very clever idea and i'd like to see the same thing in OpenGL. However, without DSA it would be difficult to put meaningful commands in those queues, because you simply don't know what state OpenGL is in when the queue is executed, so you would need to set each and every state that might affect your operations to proper values.

In non-multithreaded use you usually have a good idea about the current states, or you have your own abstraction that caches those states, so you can avoid redundant calls, so DSA is not THAT important here, it is simply more convenient.
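(A minimal sketch of the kind of state-caching abstraction meant here, for one selector; purely illustrative.)

static GLuint current_tex2d = 0;

static void BindTexture2DCached(GLuint tex)
{
    if (tex != current_tex2d) {      /* skip redundant binds */
        glBindTexture(GL_TEXTURE_2D, tex);
        current_tex2d = tex;
    }
}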

Jan.

Groovounet
05-03-2010, 01:21 PM
I implied that each thread has an OpenGL context as... it's how it works with OpenGL.

However, my guess doesn't even require multithreaded software; I was speaking about multithreaded drivers, meaning drivers that use multiple threads even if the software is single-threaded, and for that purpose DSA should help.

And for multithreaded software, I agree with both of you.

Think of it as if we were writing the OpenGL driver and deciding where we need to add mutexes and how probable it is that a mutex is already locked. With bind-and-edit it's highly probable, because the mutexes are on binding points. With DSA the mutexes are on objects, so as long as multiple threads don't work on the same object... it should be OK.

Alfonse Reinheart
05-03-2010, 07:28 PM
I would hope in such cases that they aren't using locked mutexes of any kind. Well, I would hope that nothing in the OpenGL spec forces them to have to use them except for the most important synchronization points.

Lockless multithreading is a skill that any multithreaded programmer should learn.

Gedolo
05-29-2010, 09:31 AM
As a programmer I want DSA more because of the advantages in functionality (the groundwork for multithreading it lays) than for the API convenience.

I first want a good foundation that allows the upsides of DSA to shine. I don't care when! Even if it's not in the following version (4.1?) of OpenGL, that's okay. I want it to ENABLE what it is ABLE to do. Being able to multithread without locks, true multithreading (a performance enhancement), is worth much more than the typing convenience.

Please don't rush it.
We really need those foundations to be solid, future-proof.

Please do this the right way, thanks for reading.

Gedolo
05-29-2010, 09:41 AM
About the binary shaders.

If that is done, make it so that the blob always has the shader source code in there, in its current form.

That gives you a fallback solution, and it helps when all the combinations are a little too big to ship as binaries. It probably has other advantages and solves problems that aren't mentioned here as well.

This makes it very robust and problem-free compared to a binary-only shader approach.

V-man
06-05-2010, 06:35 PM
However, my guess doesn't even require multithreaded software; I was speaking about multithreaded drivers, meaning drivers that use multiple threads even if the software is single-threaded, and for that purpose DSA should help.

And what do these so-called multithreaded drivers do exactly?
Are they working in the background trying to emulate a feature that is not directly supported by the GPU?
I hope not because I thought GL 3.0 was about moving closer to the hardware and making the drivers simpler and thus perform better.

glTexParameter? That should be dead. What happened to sampler objects?

Groovounet
06-06-2010, 11:29 AM
glTexParameter* is still useful for a couple of things: texture swizzle, lod base / bias, base and max level.

Multithreaded drivers are just taking advantage of multicore CPUs like any other software. Even when we have a proper multithreading API, the drivers will remain multithreaded.

Groovounet
06-07-2010, 06:44 AM
Just to fix a mistake: lod base / bias is sampler state, so texture swizzle and base and max level are the only glTexParameter* left.

base and max level are definitely image-based parameters, but I can imagine texture swizzle becoming a sampler parameter as well.

V-man
06-08-2010, 08:50 AM
Ok, so there is now glIsSampler, glGetSamplerParameter, glGenSamplers, glBindSampler, glDeleteSamplers,
AND THE VERY important glSamplerParameter.
GL_TEXTURE_LOD_BIAS, GL_TEXTURE_MIN_LOD and GL_TEXTURE_MAX_LOD are part of the sampler.

The swizzle stuff is not part of the sampler for some reason, but if you ask me, it should be.
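(For reference, a quick sketch of how the split looks in GL 3.3 right now: sampling state goes through the sampler object, swizzle still goes through glTexParameter. The texture name is assumed to exist already.)

// sampler state: filtering, wrap, LOD clamp/bias
GLuint sampler;
glGenSamplers(1, &sampler);
glSamplerParameteri(sampler, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
glSamplerParameterf(sampler, GL_TEXTURE_LOD_BIAS, -0.5f);
glBindSampler(0, sampler);                       // texture unit 0

// texture state: swizzle (and base/max level) still go through glTexParameter
GLint swizzle[4] = { GL_RED, GL_RED, GL_RED, GL_GREEN };
glBindTexture(GL_TEXTURE_2D, texture);
glTexParameteriv(GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_RGBA, swizzle);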

skynet
06-08-2010, 10:02 AM
To me it seems swizzling belongs to the texture image, because it is meant to make the shader agnostic of the actual layout of the texture (image data).
So, for instance, you can present the shader a GL_RED encoded grey-texture as RGB. Without swizzling, you'd have to specifically write a shader that 'knows' that it needs to duplicate the R channel into G and B when it accesses that specific grey texture.

Turning semantics around: putting swizzling in the shader (i.e. sampler) would always swizzle in a shader-specific way, independent of the real layout of the bound texture. This is not needed, you can do that already:

my_swizzled_color = texture2D(texture, coords).rrrg;

Groovounet
06-08-2010, 01:54 PM
Yes sorry the lod stuff is sampler stuff.

I was actually really surprised that the swizzle stuff remains a glTexParameter (and posted something about that).

skynet, I must say I find your view on shader independence very convincing. One thing remains: what if we want to present a single texture in two different ways? (bound twice)

Chances are that the texture cache will do its job if we fetch at the same coordinates, so the 2 samplings wouldn't be that much more expensive. Also, why not different texcoords for both textures?
Hmm, for multi-resolution noise...? (I'm not saying I really believe in this scenario!)

Alfonse Reinheart
06-08-2010, 02:04 PM
One thing remain, what if we want to present a single texture with two different ways? (bound twice)

What would make you want to do this? Specifically, in such a way that it happens behind the shader's back.

Gedolo
06-16-2010, 06:48 AM
Here is something I don't want to see going wrong.

Recently on Geeks3d there was an article about some new Nvidia extensions. They exposed memory usage and size.

Now an ARB version might be coming to OpenGL in the future.

And, I can't say this enough, use the correct PREFIX with the correct numbers.
What I mean is:
kB = 1000 B NOT 1024B
For the powers of two we have KiB, MiB and others with an 'i' in the middle. Please use the correct ones on the official extensions. Make it possible to choose the powers of two or the decimal way. But return the RIGHT COMBINATION.
Everybody is already beginning to switch: Apple with Snow Leopard, Ubuntu is talking about doing it in the 10.10 release.
Only Windows is kinda slow, nothing surprising.

Thing is, the right combinations could save a lot of headaches.
Especially the prefixes with the i's in them.

http://en.wikipedia.org/wiki/Binary_prefix

Dark Photon
06-16-2010, 07:03 AM
And, I can't say this enough, use the correct PREFIX with the correct numbers.
What I mean is:
kB = 1000 B NOT 1024B
For the powers of two we have KiB, MiB
This is crap, propagated by hard drive and other digital storage manufacturers.

kilobyte = kb, KB, kB was and is 1024 bytes (2^10).
megabyte = mb, MB was and is (2^20) bytes.
gigabyte = gb, GB was and is (2^30) bytes.

These conventions started because computers are based on binary transistors and thus powers of two. They were hijacked and mutilated (into 10^3, 10^6, 10^9, etc.) for hard disk storage capacities by hard drive manufacturers who found they could be used to imply that their disk drives contained more space than they actually did (the bigger the disk, the bigger the lie; the lie being absolutely huge nowadays). And a byte would be 10 bits (10^1) in their world if that helped their scheme (which it didn't, so they didn't)... I still remember the controversy on this back when PC hard disks were around 340MB.

http://en.wikipedia.org/wiki/Binary_prefix#Legal_disputes


and others with an 'i' in the middle.
I refuse to be pushed off to this mebibyte crap.

There are true megabytes (2^20), and there are fake "hard disk" megabytes used by lying cheats (10^6). Question how a byte is defined with such characters...

Jan
06-16-2010, 07:46 AM
1) I agree with Dark Photon entirely.
2) Haven't seen such a pointless and idiotic "THIS IS THE MOST IMPORTANT THING EVER!"-suggestion for OpenGL in a loong time.

Jan.

Groovounet
06-16-2010, 07:57 AM
OO

I follow Dark Photon and Jan! No need to bring more (edit: SUPER) crap in OpenGL!

Alfonse Reinheart
06-16-2010, 11:51 AM
And, I can't say this enough, use the correct PREFIX with the correct numbers.

Any such extension really has no right to be returning any units. It should be returning integers of a known unit, not strings with units attached.

Furthermore, it's a suffix, not a prefix. Units go at the end, not the beginning.


This is crap, propagated by hard drive and other digital storage manufacturers.

No, it is "crap" propagated by people who know what the metric system is.

Long before even the difference engine existed, there was the metric system. It defined what the prefixes "kilo", "mega", "giga", etc meant.

Programmers do not get to redefine the metric system just because it is convenient for them. Meters, liters, grams, mols, lightyears, etc, all conform to the metric system: when you put "kilo" in front of them, it means 1000 of them. Bytes don't get an exemption just because it is convenient. Computer "kilobytes" are not 1000, therefore they are not kilobytes, and no amount of programmer inertia, inflexibility and whining will change this fact.

Personally, I don't like KiB, MiB, and such; the names seem silly and difficult to pronounce. But I do know and understand why they exist. And my personal discomfort with new terminology does not in any way weaken the argument for their existence.

Simon Arbon
06-16-2010, 10:27 PM
Programmers do not get to redefine the metric system just because it is convenient for them.
Bits and bytes were never part of the metric system.
IBM and others were using k=1024 when talking about bytes of core memory in the early 60's; then the memory chip manufacturers standardised on this usage, making it accepted practice for DECADES.
KiB & MiB were not even proposed until 1995, and were not adopted by the IEC and NIST until 1999.
Some standards authorities like JEDEC still don't accept the change, and I have never seen a magazine article, reference manual or advertisement that uses KiB/MiB for computer memory.
Even the hard disk manufacturers still use M=2^20 when referring to their cache memory.


kilobyte = kb, KB, kB was and is 1024 bytes (2^10).
megabyte = mb, MB was and is (2^20) bytes.
That's simply not true.
Capital B has always meant Byte, lowercase b has always meant bit, and lowercase m has always meant 1/1000th.
Using mb for MegaByte started when magazine journalists got new word processors with built-in spell checkers that helpfully detected unusual capitalisation and automatically changed it to lower case when they hit the space bar.
Some of them didn't notice that whenever they typed MB it magically changed to mb, and once a few magazines had published this same mistake people started to assume it must be correct.

Jan
06-17-2010, 04:17 AM
That's all nice and stuff, but whenever a piece of code tells you a size of memory, that simply has to be done in Bytes. Period.

Everything else is for the convenience of the person sitting at the PC.

Jan.

Dark Photon
06-17-2010, 05:10 AM
kilobyte = kb, KB, kB was and is 1024 bytes (2^10).
megabyte = mb, MB was and is (2^20) bytes.
That's simply not true.
Capital B has always meant Byte, lowercase b has always meant bit.
Yeah, thought about that after the fact. True, in some contexts b and B are distinguished as bit and byte (particularly memory sticks/chips), though that's not universal, as you point out, so you have to use context to determine what 'b' is. While I personally always use a capital B for byte in written text, I have seen others (not using desktop publishing software, just a raw text editor) use mb for MB.

kRogue
06-17-2010, 06:20 AM
I can't believe this thread has degenerated, so I will add to it:

http://xkcd.com/394/

enjoy.

Simon Arbon
06-17-2010, 07:15 PM
Ahhh - So those RAM chips with an extra parity bit per byte should be in KBa :D
Then there are all those old computers with 5, 6, 7, 8, 9, or 16 bit bytes, so shouldn't we be using KiO (Kibi-Octet) to avoid all confusion :sorrow:
If the IEC really want people to change to a less confusing notation then they really need to come up with something that's easier to say and doesn't sound so harsh.

Gedolo
06-22-2010, 08:43 AM
Shall I make a separate thread for this junk?

Totally agree that it should only return whole numbers without prefixes.

And Alfonse Reinheart, I said prefix because it's in front of the unit.

I'm NOT speaking about UNITS at all!

It is about MISUSE of K, M, G and so on.

For the people who don't know yet:
Just because computers are binary doesn't mean you can't have 10 bits or 100 bytes. It is not more complicated for the logical gates of the computer to divide by 1000 than it is to divide by 1024.

It's not that Windows reports it this way, that you have to follow that. windows != computer

Stop mixing up the number of combinations that you can have with a number of bits with the number of bits itself. It's really dumb and annoying.

Most people think it's 1000^x and wrongfully assume so anyway.

It's the programmers who are misusing these things, and it has to stop. Nobody likes this "K = 1000 almost everywhere, but oh, if it's with bytes or bits we want our special circle's meaning to be special" or something. I hate %!#!=dumb\ this crap!!


The article is:
http://www.geeks3d.com/20100531/programm...sage-in-opengl/ (http://www.geeks3d.com/20100531/programming-tips-how-to-know-the-graphics-memory-size-and-usage-in-opengl/)

Then why didn't you come up with something better/different?


If the IEC really want people to change to a less confusing notation then they really need to come up with something thats easier to say and doesn't sound so harsh.

Does anybody want to give a reasonable example?

Oh wait, k=1024-people are not able to come up with a better alternative. Or not? But why wouldn't you have said it then, when it mattered?

You didn't care when that organization was looking for good names. Now that they have made a new system, you're all bashing it. I find the names rather strange to pronounce, but that won't stop me from using them.

@Dark Photon

Please learn some more about computers before blabbering such utter nonsense.

Actually hard drive makers put on the box that they mean G = 1000^3.

If you compare the raw full numbers (not the abbreviated ones) in Windows, you'll see they match up.

I see that you don't actually understand that although computers work in binary, the number of bytes does NOT HAVE to be a power of two. Those conventions started after sizes grew into the thousands, long after computers were invented. The first computers just showed numbers without prefixes. Programmers were a bit lazy, and also had to do memory mappings from RAM to HDD. They used 1024 to be able to calculate more with whole numbers, which was easier at the time, and the difference wasn't significant. Now, with TB, the difference is getting really big in comparison with the amount (around 10 percent).
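(To put a number on that 10 percent: 1 TiB = 2^40 bytes = 1,099,511,627,776 bytes, versus 1 TB = 10^12 bytes, so the binary unit is about 9.95% larger than the decimal one.)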


@Everybody:
Mac is using them in 10.6 for hard drive space, the Linux kernel has been using them since 2001, and now distributions and userspace programs are adopting this system. https://wiki.ubuntu.com/UnitsPolicy

JEDEC is, in its newest revisions, adding notes recommending the new system.

Jan
06-22-2010, 03:46 PM
TLDR, but oh boy...

"It is not more complicated for the logical gates of the computer to divide by 1000 than it is to divide by 1024. "

Yeah, well, compare the cycles for a DIV and a SHIFT, you'll see.

Gedolo
06-27-2010, 03:17 AM
@Jan

OMG!! x cycles more?!?
Now the performance of my Nvidia Tesla supercomputing cluster is totally ruined! lol


Now serious, if you want to have the best performance.

Remove the graphic interface from your operating system.
Have you any idea how many CPU cycles that uses? Also stop using a graphical browser and media player. We can read books, right?
It'll save a lot of cpu cycles.

For these few calculations, it's no big deal.
Or is this the kind of stuff that knocks your CPU off its socks? If so, it's best to consider a technical problem (hardware- or software-related), or upgrading.

What about implementing division by a power of two as an actual division?
That probably happens in not-so-well-optimized operating systems.
Not going to name names just yet (hint: Microsoft).

Frankly, in a good cpu, a division and a shift instruction should both only cost one cycle.

Ilian Dinev
06-27-2010, 09:05 AM
Wow, someone needs to go study some microelectronics and assembly, urgently.
A victim of the Java/script generation, I guess?

ZbuffeR
06-27-2010, 01:09 PM
How dare you mix Java and Javascript in the same sentence !
:)
(I mean, as this thread is already way off topic, why not continue a bit more...)

Jan
06-27-2010, 01:23 PM
@Gedolo: I wasn't saying that it MATTERS (nowadays). I was simply telling you that you are wrong.

Now please continue with your blabbering (i'm on vacation for the next two weeks, i don't care).

Jan.

Gedolo
07-01-2010, 07:32 AM
Big blunder by me on page 7.
It was late; I was very tired.



Now something serious again.
This article: http://www.geeks3d.com/20100629/test-opengl-geometry-instancing-geforce-gtx-480-vs-radeon-hd-5870/

Talks about geometry instancing.
The demo discussed in the article has a few modes.
In one mode there is a limitation on the number of objects per draw call that can be rendered.
It would be handy for that person if there were a way to query the maximum amount on the current GPU. That way the code could be written to be more robust, scalable, compatible and better performing.

Gedolo
07-01-2010, 07:32 AM
Quote:

F5 key: geometry instancing: it's real hardware instancing (HW GI). There is one source of geometry (a mesh) and rendering is done in batches of 64 instances per draw call. Actually, on NVIDIA hardware 400 instances can be rendered with one draw call, but that does not work on ATI due to the limitation on the number of vertex uniforms. 64 instances per batch work fine on both ATI and NVIDIA. The transformation matrix is computed on the GPU and per-batch data is passed via uniform arrays: there is a uniform array of vec4 for positions and another vec4 array for rotations. OpenGL rendering uses the glDrawElementsInstancedARB() function. The GL_ARB_draw_instanced extension is required. HW GI allows drastically reducing the number of draw calls: for the 20,000-asteroid belt, we have 20000/64 = 313 draw calls instead of 20,000.
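(A sketch of the kind of query being asked for: the per-call limit in this demo comes from the vertex uniform budget, which can already be read back and used to size the batches. The reserved and per-instance counts, instance_count and index_count are illustrative assumptions.)

GLint max_components = 0;
glGetIntegerv(GL_MAX_VERTEX_UNIFORM_COMPONENTS, &max_components);

const GLint reserved     = 64;  // components assumed kept for matrices and other uniforms
const GLint per_instance = 8;   // one vec4 position + one vec4 rotation per instance
GLint batch = (max_components - reserved) / per_instance;

for (GLint first = 0; first < instance_count; first += batch) {
    GLint n = (instance_count - first < batch) ? (instance_count - first) : batch;
    // ... upload the n positions/rotations for this batch via glUniform4fv ...
    glDrawElementsInstancedARB(GL_TRIANGLES, index_count, GL_UNSIGNED_INT, 0, n);
}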

Gedolo
07-05-2010, 11:15 AM
Maybe it's best, for functions that return memory sizes, to only return whole numbers in bytes or bits.
Or would it be better to only do bits? That way it's exactly the right size, with maximum accuracy down to the bit.