id's "Rage" engine, Tech5 is OpenGL based



dorbie
09-21-2007, 12:34 PM
Contrary to earlier reports the current Tech5 code is OpenGL based.

http://www.beyond3d.com/content/news/487

Carmack is open to a port to D3D based on secondary factors just as he is open to a port to OpenGL 3.

Just thought I'd share since false information has been widely circulated on this issue including on these forums.

Jan
09-21-2007, 01:59 PM
Doesn't surprise me a bit. If OpenGL 3.0 wasn't on the horizon, I would switch to D3D, too. So he's basically doing the same thing most of us do: wait and hope for a great new API. BTW, does anyone know whether the ARB will be able to keep their promise about having "something" ready by the end of September?

Jan.

Korval
09-21-2007, 02:12 PM
BTW, does anyone know whether the ARB will be able to keep their promise about having "something" ready by the end of September?
They didn't say something; they said the spec.

And no, I think they've been pretty quiet on the subject. But, then again, there's a newsletter due out in 9 days that will shed light on the subject, one way or another.

Note to ARB: we will (happily) accept the GL 3.0 specification in lieu of a vol 5 newsletter.

Zengar
09-21-2007, 03:02 PM
Will there be a party? :)

Korval
09-21-2007, 03:45 PM
According to the BOF presentation at SIGGRAPH, the ARB should have finalized all open issues by the end of August. Which means at this point, it's all about wording and actually writing the spec.

dorbie
09-21-2007, 04:23 PM
Jan, there's some irony to your post in the context of a situation where people were previously saying he's already committed to D3D. He's not saying he'll switch to D3D; he's saying he's keeping his options open, and secondary factors will influence the final determination.

I've seen Carmack say elsewhere that DX10 doesn't change much from his perspective and Tech 5 will be targeting DX 9 class hardware.

On the spec front I really don't see D3D 10 pressuring a rush to OpenGL 3, especially after recent developments, but the sooner the better.

Roderic (Ingenu)
09-22-2007, 01:22 AM
http://slashdot.org/comments.pl?sid=302231&cid=20671657



The PC version is still OpenGL, but it is possible that could change before release. The actual API code is not very large, and the vertex / fragment code can be easily translated between cg/hlsl/glsl as necessary. I am going to at least consider OpenGL 3.0 as a target, if Nvidia, ATI, and Intel all have decent support. There really won't be any performance difference between GL 2.0 / GL 3.0 / D3D, so the api decision will be based on secondary factors, of which inertia is one.

Jan
09-22-2007, 03:30 AM
dorbie: True. I only interpret it this way: "D3D9 is very good, it works well on the PC and the 360. OpenGL 2.x is a mess, but it is currently still good enough. That could change before release (especially since ATI and nVidia might not work on their OpenGL drivers much if even an id engine might not use it anymore). However, if OpenGL 3.0 is as good as expected AND THERE ARE GOOD ENOUGH DRIVERS, id Tech 5 will most certainly use OpenGL 3.0 (or one might be able to select the render API on the PC)."

I do think that OpenGL 3.0 needs to be out soon, because it will take some time to get stable and fast drivers. Currently D3D10 is no threat, so we've still got some time, but the situation will change. (In this business, usually sooner rather than later.)

Jan.

dorbie
09-22-2007, 03:41 AM
Jan, in that case I don't think "interpret" means what you think it means. :-)

V-man
09-22-2007, 01:21 PM
I can't believe he listed Intel.

Jan, what are you talking about? He is aiming for multiplatform support, so id has no choice but to support 3 APIs.


D3D on the Xbox platform, LibGCM on the PS3 and OpenGL on the Mac
Add to that Linux: GL.
On Windows, they will likely just offer GL.

Jan
09-22-2007, 03:25 PM
I was only talking about the windows-version. Nothing else. On windows it can be OpenGL 2.x, 3.0 or D3D9 (or even two of em). And the question is, which one it will be, because if it ain't OpenGL, it is bad news for us.

Jan.

zeoverlord
09-22-2007, 04:27 PM
Originally posted by Jan:
And the question is, which one it will be, because if it ain't OpenGL, it is bad news for us.
It will more than likely be OpenGL for Windows, the reason being that D3D for the Xbox 360 and Windows is not exactly 100% compatible (at least not in the way you use it) and will have to be worked on to make it work well.
The Mac version, meanwhile, is almost identical to the Linux and Windows variants.

Also, seeing as it is already running with GL on the Windows platform, it's a fair bet that it will stay that way, unless OpenGL 3 comes out in time and proves to have a better texture management system from his viewpoint, in which case he might consider switching to that.
I do see him trying it out just to see if it can be done.

knackered
09-23-2007, 05:20 PM
Unless we get faster GL implementations on Vista, I think OpenGL is living on borrowed time on Windows - but considering the tight integration between D3D and Aero, I don't think OpenGL can ever regain the edge it once had. We're already getting customers wanting us to deliver our software on Vista setups, and I can imagine Carmack is thinking more than a year ahead, unlike us. GL on Vista is not running at acceptable speeds.

Korval
09-23-2007, 07:22 PM
GL on Vista is not running at acceptable speeds.
Is it?

The most I've heard is that GL running windowed in Vista isn't running at acceptable speeds, and even that's somewhat debatable depending on the specific driver. GL running fullscreen in Vista seems to be running reasonably well (considering that Vista itself can impose a small performance penalty).

Now, I do recall a post or something from nVidia about how their drivers under the XP driver model used to perform a "flush" operation frequently that was fast under that driver model but slow under the Vista model. That would be a problem, but one I would expect to be corrected as IHVs produce more mature Vista drivers.

In any case, I suspect that the nature of GL 3.0's context framebuffer (which is the primary interface point between D3D and Aero) will allow it to behave similarly to a regular D3D renderer. As for other driver-dependent behavior, I suspect that the nature of GL 3.0's API, in terms of how you render with it and so forth, would allow drivers to be more streamlined in such things.

knackered
09-24-2007, 01:07 PM
Well then, I must be imagining things, because I haven't read what you've read; I've only practical experience of the problem.

dorbie
09-24-2007, 08:06 PM
Just as well that Vista is tanking then, if it's too slow you can always downgrade with Microsoft's new downgrade path (but of course there's NOOOOOTHING wrong with Vista, no sirree).

As for performance ... even D3D takes a hit on Vista, and drivers can do a lot under the covers; it boils down to priorities and breadth of support. I still think OpenGL gets enough attention, and the multiple API support required is actually less divergent than it used to be. OpenGL 3 will help further.

Overmind
09-25-2007, 01:57 AM
Well then, I must be imagining things, because I haven't read what you've read; I've only practical experience of the problem.
Just curious: have you compared GL/XP to GL/Vista, or have you compared D3D/Vista to GL/Vista? If the latter, have you done a similar comparison of D3D/XP to GL/XP, to eliminate user code differences?

knackered
09-25-2007, 02:50 AM
The former.

tranders
09-26-2007, 06:33 PM
D3D apps have no advantage over OpenGL apps when running in a window under Aero. Just because Aero uses D3D[9Ex] does not mean that D3D apps will run better or faster.

As far as the driver model goes on Vista, there's not a lot that can be done to help either API out from an application's point of view. The driver is given an off-screen texture to render into and the desktop (DWM) manages when that buffer gets presented to the user. That's how blurred title bars, Flip3D, etc. are handled. Frame rates are limited by the application's frame rate plus the DWM's frame rate.

V-man
09-27-2007, 02:59 AM
This has a SPECviewperf benchmark on a Quadro FX 5500 (of course, drivers have changed since then):
http://www.opengl.org/pipeline/article/vol003_9/


http://www.tomshardware.com/2007/01/29/xp-vs-vista/
Look at pages 6 and 7: a SPECviewperf benchmark on an ATI X1900XTX with Catalyst 7.1.
This bench is very old, but it's as though there is no acceleration at all.

I would love to see these benches repeated with new drivers.

Relic
09-27-2007, 03:18 AM
Yellow press at its best.
Catalyst drivers before 7.2 didn't contain an OpenGL ICD implementation under Vista.
This article was comparing HW OpenGL under XP with MS SW OpenGL under Vista. Pure FUD.

tranders
09-27-2007, 05:21 AM
I've been running with acceleration on NVIDIA HW since last year. ATI was very late releasing drivers (at least for FireGL HW (Q2 07)). There is a hit in performance (5%-15%) for windowed apps, but it's workable. The biggest problem, however, is interop with the Aero DWM -- just about any 2D write operation to the window will corrupt the DWM composite buffer.

Relic
09-27-2007, 08:09 AM
The biggest problem, however, is interop with the Aero DWM -- just about any 2D write operation to the window will corrupt the DWM composite buffer.
That's a Microsoft Vista "feature", documented here:
http://www.opengl.org/pipeline/article/vol003_7/

GDI surfaces are separate from 3D surfaces and you can't mix them anymore under Vista.
Aero only stays on for pixelformats with the new PFD_SUPPORT_COMPOSITION flag set, and that flag and PFD_SUPPORT_GDI are mutually exclusive.
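(For illustration, a Vista-friendly pixel format request might look like the sketch below. Only the flags line is the point; the other descriptor fields are arbitrary example values, and the #define is a fallback for older SDK headers.)

    // Sketch: request composition support and deliberately omit PFD_SUPPORT_GDI,
    // since the two flags are mutually exclusive under the Vista DWM.
    #include <windows.h>

    #ifndef PFD_SUPPORT_COMPOSITION
    #define PFD_SUPPORT_COMPOSITION 0x00008000 // value from Vista-era SDK headers
    #endif

    int chooseCompositionFormat(HDC hdc)
    {
        PIXELFORMATDESCRIPTOR pfd = { sizeof(pfd), 1 };
        pfd.dwFlags = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL |
                      PFD_DOUBLEBUFFER | PFD_SUPPORT_COMPOSITION; // no PFD_SUPPORT_GDI
        pfd.iPixelType = PFD_TYPE_RGBA;
        pfd.cColorBits = 32;
        pfd.cDepthBits = 24;
        return ChoosePixelFormat(hdc, &pfd); // 0 means no suitable format was found
    }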

tranders
09-27-2007, 11:30 AM
This was documented by Microsoft way before that pipeline article and I've known about the problem since the earliest beta releases. Just because it's documented doesn't make it any more palatable and doesn't eliminate the problem. Even if your application has an absolutely pure 3D pipeline and can typically interoperate with the DWM, an external application (e.g., Spy++ Finder Tool) can corrupt your composited image. Constant frame rate applications will see a glitch, static frame applications will need to manually refresh the window (or figure out a way to circumvent the corruption).

dorbie
09-27-2007, 10:01 PM
What will happen to DirectX 10 performance on Vista when this happens (C|NET):

http://www.news.com/8301-10784_3-9785337-7.html?tag=nefd.only



While Vista was originally touted by Microsoft as the operating system savior we've all been waiting for, it has turned out to be one of the biggest blunders in technology


The time is up. Microsoft must abandon Vista and move on. It's the company's only chance at redemption.
Ouch!

When I said Vista was the new Windows Me I was wrong; surprisingly, it's shaping up to be worse.

IMHO it makes a lot of the issues surrounding 3D on Vista irrelevant. The bigger issue is in fact that you can't get D3D 10 on XP. My question is how likely is a reversal on D3D 10 availability on XP? Is OpenGL going to be the only game in town for the Windows mass market for next gen 3D features?

Korval
09-27-2007, 10:25 PM
Is OpenGL going to be the only game in town for the Windows mass market for next gen 3D features?
No. Because despite whatever nonsense articles get written (saying that MacOSX is "hot on its tail" is laughable, as is decrying the OS for assisting DRM when the alternative was not being able to play DRM'd movies at all), Vista is in fact the future of Windows. Whether it takes one or two Service Packs, whether it takes a year or two for drivers to mature into stability (not Microsoft's fault), people will eventually switch.

There was another kernel and Windows OS when ME came out (not that ME forced XP; the NT kernel was always going to be Windows' future); Microsoft has no alternative but to live on the Vista codebase and expand upon it.

I suspect that game developers will either accept cross developing for D3D9 and D3D10 or just switch to OpenGL 3.0. The deciding factor will likely be how fast and on the ball IHVs are with GL 3.0 implementations. If the spec ships soon, and nVidia, ATi, and Intel can get good drivers out within 2 months of that (not just beta crap, and certainly nothing that remotely resembles anything ATi has farted out as far as GL support goes), I suspect developers will be inclined to switch to GL 3.0.

Though I suspect the ARB will need to quickly resolve one of OpenGL's remaining annoying problems (http://www.opengl.org/discussion_boards/ubb/ultimatebb.php?ubb=get_topic;f=3;t=015415) . That is, precompiling shaders. It's one of the last real warts left in the language, and is a source of frustration for developers who need lots of shaders.

dorbie
09-27-2007, 11:05 PM
It seems there's a burgeoning industry out there that disagrees with you and Microsoft on that score.

I hope you're right about the move to OpenGL 3. D3D 9 with fragment-based alternatives will do just fine for all the displacement stuff that gets promoted as D3D 10's highest-visibility trick. The only surprise is that stuff actually ships (or screenshots are published) without a fragment-based displacement path in D3D 9; it's downright perplexing. With the possible exception of the Cascades demo I have not seen any D3D 10 software that really justifies the API, and then of course it's not clear that Cascades wouldn't be better off in software and fragment shaders, with broader support from a developer's standpoint (and a market, i.e. XP).

Agreement on precompiled shaders is non-trivial; compilers and underlying implementations are currently free to differ profoundly. It's a great idea until you have multiple companies years into their respective investments in their shader optimizations, intermediate representations and instruction sets.

ZbuffeR
09-28-2007, 07:11 AM
On the precompiled shaders topic: compiling shaders as implementation-dependent binary blobs at first launch and then reusing them for subsequent runs does not seem unsolvable to me.
If driver or hardware changes, it would recompile the blob.
It would still allow profound implementation differences, and would help for 99% of the time.

Korval
09-28-2007, 10:33 AM
ZbuffeR, that's actually a pretty good idea: have the binary blob actually store the program text itself. That way, it just doesn't matter. So long as each IHV gets a unique identifier for their own binary portion (a GUID should suffice), and the textual format is rigidly specified, I don't see the problem.

knackered
09-28-2007, 10:47 AM
For me, this could be transparently implemented by the driver using the source code as a huge primary key in a database that links to a previously built binary blob. Even a dbase query like that would be infinitely faster than this compile/link thing (like a ms, if that).
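(To make the idea concrete: a toy sketch of that lookup, written as ordinary app-side C++ purely for illustration. The hash, the in-memory map and the compile callback are all assumptions; a real driver would key on the full source, handle collisions and persist the table to disk.)

    #include <cstdint>
    #include <string>
    #include <unordered_map>
    #include <vector>

    using Blob = std::vector<uint8_t>;

    // 64-bit FNV-1a over the shader source, used as the cache key.
    static uint64_t hashSource(const std::string& s) {
        uint64_t h = 0xcbf29ce484222325ull;
        for (unsigned char c : s) { h ^= c; h *= 0x100000001b3ull; }
        return h;
    }

    struct ShaderCache {
        std::unordered_map<uint64_t, Blob> entries; // persisted across runs in a real driver

        const Blob& getOrCompile(const std::string& source,
                                 Blob (*compile)(const std::string&)) {
            uint64_t key = hashSource(source);
            auto it = entries.find(key);
            if (it == entries.end())
                it = entries.emplace(key, compile(source)).first; // slow path: real compile/link
            return it->second;                                    // fast path: the "ms, if that" query
        }
    };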

Korval
09-28-2007, 10:54 AM
For me, this could be transparently implemented by the driver using the source code as a huge primary key in a database that links to a previously built binary blob.
Um, no. I do not believe that drivers should randomly start storing databases on my hard disk.

Particularly since I develop shaders, and will therefore have thousands of entries (many of which are non-functional).

ZbuffeR
09-28-2007, 12:13 PM
This mechanism would be best implemented explicitly, with API entry points to retrieve / upload the binary blob (and only if desired).
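(A sketch of what such entry points might look like; the names, signatures and failure semantics below are invented for illustration and are not part of any spec or extension.)

    #include <GL/gl.h>

    // Hypothetical API: retrieve an opaque, implementation-specific blob...
    GLsizei   glGetProgramBlobSize(GLuint program);
    void      glGetProgramBlob(GLuint program, GLsizei bufSize, GLsizei* length, void* blob);
    // ...and hand it back later; assume it returns GL_FALSE if the blob no longer
    // matches the driver/hardware, in which case the app recompiles from source.
    GLboolean glProgramBlob(GLuint program, const void* blob, GLsizei length);

    void loadProgram(GLuint program, const void* cachedBlob, GLsizei cachedSize,
                     const char* vsSource, const char* fsSource)
    {
        if (cachedBlob == 0 || !glProgramBlob(program, cachedBlob, cachedSize)) {
            compileAndLinkFromSource(program, vsSource, fsSource); // app helper, slow path
            // then glGetProgramBlob(...) and write the result to disk for the next run
        }
    }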

yooyo
09-28-2007, 02:34 PM
Precompiled shaders would be difficult because a new driver version could optimize the shader code better.

Maybe something like encryption... The app provides a key to the driver, and the driver can decode the encrypted shader and compile it. A separate app can encode all shaders into binary files using common encryption procedures.

Or even better: the app can provide a decoder callback to the driver. When the driver compiles a shader, it calls back into the app to decode the binary shader into an ASCII string, then compiles it.

It is very hard to strike a compromise between IP protection, compatibility and performance.

Zengar
09-28-2007, 03:13 PM
Actually, something like what knackered suggested (caching compiled shaders) would be the simplest solution. Just let the driver do it. It could also clean the cache from time to time, deleting rarely used entries. The cache should be stored somewhere in the user's home directory.

Korval
09-28-2007, 03:19 PM
Precompiled shaders would be difficult because a new driver version could optimize the shader code better.
Yes, and if that's the case, the driver can detect this by comparing its driver revision number to the one it stored in its binary data. That's an implementation detail best left up to IHVs.

Really, putting the text in the actual binary blob was the basic sticking point, from my perspective. By doing that, you ensure that any implementation can always recompile the program, which makes binary blobs interoperable across implementations.

Oh, and another reason not to do it transparently: transparent behavior cannot be specified by the GL spec. It can't say, "Oh, btw, you should create some database somewhere so that, between invocations of the compiler, you can check to see if a program is set up exactly as a prior one and load the shader from there." It can only specify behavior whose results the user can detect on screen. Performance optimizations have always been un-specifiable.

This way, implementers are forced to give us a back-door. The only potential problem is that it isn't guaranteed to be faster, since an implementation could store nothing but the program in the binary blob.

Looking at it from a GL 3.0 perspective, it is exactly like requesting a binary form of a program template object.
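(For concreteness, a sketch of the kind of layout being described; the field names and sizes are assumptions, the point is simply a well-defined header plus the original source, followed by opaque vendor data.)

    #include <cstdint>

    struct ProgramBlobHeader {
        uint8_t  implementationGuid[16]; // identifies the GL implementation that wrote the blob
        uint32_t driverRevision;         // lets that implementation reject its own stale binaries
        uint32_t sourceLength;           // the GLSL source text follows the header...
        uint32_t binaryLength;           // ...then binaryLength bytes of vendor-specific data
    };
    // Any implementation can fall back to recompiling the embedded source;
    // only the implementation whose GUID matches ever looks at the opaque tail.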

knackered
09-28-2007, 04:50 PM
Performance optimizations have always been un-specifiable? That's the whole point of OpenGL: it specifies correct behaviour, and the implementation is free to optimise as it sees fit, including whatever method it chooses to speed up shader build times. I'm not suggesting the cache be part of the OpenGL specification, I'm just suggesting the IHVs do something like this to work around the otherwise elegant design of GLSL without cluttering up the spec with some hideous vendor-specific binary blob crap that, up until now, hasn't been required in any of the other OpenGL mechanisms.
Honestly, if Nvidia give me a method to retrieve the binary data they send to the card after linkage, and a method for uploading a previously generated blob, I could write this functionality into their driver within a couple of hours.

Korval
09-28-2007, 06:08 PM
I'm just suggesting the IHVs do something like this to work around the otherwise elegant design of GLSL without cluttering up the spec with some hideous vendor-specific binary blob crap that, up until now, hasn't been required in any of the other OpenGL mechanisms.
Consider this.

Are you willing to trust the performance of your application to people who, thus far, seem incapable of even implementing a functioning C-like language compiler? Regardless of the fact that IHVs have every reason to want to release solid, stable drivers, they seem absolutely incapable of doing so. Even if you only look at the program compiling/linking part, they're all over the place.

Basically, if you're wrong in your trust, and you go ahead with a plan to implement a 2,000 shader application, then your app takes 10 minutes to load every time.

I don't feel comfortable relying on them to get the job done at this point. The only way to guarantee this (or as close as it gets) is to have something that can be written into an OpenGL spec.

It may not be the prettiest way of doing it, but if it's in the spec, then they have to implement it. Even if they implement it crappily, it's still better than what we have now.

knackered
09-28-2007, 07:02 PM
I don't get your point; I have to trust them about performance every day, about every single part of the spec they implement. There's nothing in the spec about performance, just behaviour. In any case, I'll still pre-warm my shaders at start-up, so the worst-case scenario is still restricted to start-up speed.
As for reliability, this cache system won't add to their problems; it's simply a bolt-on to the input and output of their driver code.
If they implement the cache system, then the problem goes away without going down a potentially nasty API-change route. If NVidia implement it, ATI will then have no choice but to implement it too, otherwise people will complain about ATI being slower than NVidia. Intel will follow, as usual.

Komat
09-29-2007, 02:30 AM
Originally posted by Korval:

Really, putting the text in the actual binary blob was the basic sticking point, from my perspective. By doing that, you ensure that any implementation can always recompile the program, which makes binary blobs interoperable across implementations.
I do not like this "embedded source" idea. Unblobing is presumably a fast operation, while compilation from source code is a very slow one.

With them being completely separate, you can choose when to do each of them. For example, silently load the blobs as part of program startup, knowing that the time will be reasonable, or compile the programs at a more appropriate point (e.g. before entering the level or during the intro movie, so the user at least gets the "in menu" experience quickly after starting the program) while displaying a notification to the user that this might take some time.

With the "embedded source", any blob load might potentially be as slow as compiling the shader, and you lose this control.

Unblobing and compilation also have different semantics. When I successfully unblob the blob, I know that it matched the hw and I am done with it. When I compile a shader, I know that I need to store the resulting blob. With the driver recompiling the shader based on source from the blob, the unblobing API would need to report that event to the application so a new blob can be stored for the shader. While this is not a technical problem, it might indicate that completely separate compilation and unblobing is the cleaner approach.

Komat
09-29-2007, 02:35 AM
Originally posted by knackered:
I don't get your point; I have to trust them about performance every day, about every single part of the spec they implement.
And I already got burned by expecting that uniform variables are really variables which can be freely changed, and not something that will cause shader reoptimization.

Jan
09-29-2007, 04:58 AM
I agree with every word Komat said. That's exactly how I'd like the API to work and look.

Jan.

V-man
09-29-2007, 09:56 AM
Originally posted by knackered:
I don't get your point; I have to trust them about performance every day, about every single part of the spec they implement. There's nothing in the spec about performance, just behaviour. In any case, I'll still pre-warm my shaders at start-up, so the worst-case scenario is still restricted to start-up speed.
As for reliability, this cache system won't add to their problems; it's simply a bolt-on to the input and output of their driver code.
If they implement the cache system, then the problem goes away without going down a potentially nasty API-change route. If NVidia implement it, ATI will then have no choice but to implement it too, otherwise people will complain about ATI being slower than NVidia. Intel will follow, as usual.
Strangely enough, it was someone from ATI (Evan Hart I think) who posted this idea: "Scan the text your app sends to GL and pick the blob from the database".

That's nuts if you ask me.

It's better to do what D3D + its tools are doing.
IMO, it's better for the API to offer solutions and using it is the developer's responsibility.
If the developer wants to screw himself with slow compiler time, that's fine.
If the developer wants to screw himself with the issues that come with a precompiled thing, that's fine.

Korval
09-29-2007, 11:32 AM
I do not like this "embedded source" idea. Unblobing is presumably a fast operation, while compilation from source code is a very slow one.
Yes, but no slower than what you would have asked for originally.

The problem is that you cannot force a driver to use a blob. The driver decides whether the blob will be functional or not. Presumably the program you're trying to load is important, so you're going to give them that string whether it comes from the blob or from your code.

At least if the string is in the blob, it's guaranteed to always work, so there's no need for multiple program compilation paths.


With the "embedded source" any blob load might be potentially as slow as compiling the shader and you lost this control.Without the embedding of the source, any blob can simply fail to load, and thus will force you to pass in the string and compile the normal way.

Remember, if IHVs want to get lazy, they still can in your way. They can make the "binary" blob simply the string, and do the compiling all the time. There is nothing you can do to stop IHV laziness; the best you can do is offer them reasonable avenues for optimization.

Tin Whisker
09-29-2007, 12:14 PM
I have a couple questions that are pestering me.

1) How does the bottom line factor into IHV decisions regarding OpenGL, as far as time and resources invested in ARB participation, API design, and any marketing?

2) How do OS specifics factor into API design? I realize of course that the ideal is OS neutrality within the API itself, but given the practical concerns of real-world implementations (DMA scheduling, bus bandwidth, etc.), are there any issues that might guide the API design in ways that are less than obvious? For example, an obvious one is the introduction of PBOs, which as I understand it is predicated on the availability and efficiency of asynchronous DMA transfers.

To frame this in the current topic, I'm wondering if there might be more subtle side effects of OS internals involved in the overall API design going forward. In particular, how might Vista's new driver model influence design considerations, now and in the future?
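(As a side note on the PBO example mentioned above, the usual pattern looks roughly like the sketch below. It assumes headers or a loader exposing the GL 2.1 entry points, and GL_BGRA/GL_UNSIGNED_BYTE is just an example format; the point is that the upload sources from the bound buffer rather than client memory, which is what lets the driver schedule the DMA asynchronously.)

    #include <GL/gl.h>
    #include <cstring>

    void uploadViaPBO(GLuint pbo, GLuint tex, const void* pixels,
                      GLsizei width, GLsizei height, GLsizeiptr size)
    {
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
        glBufferData(GL_PIXEL_UNPACK_BUFFER, size, 0, GL_STREAM_DRAW); // orphan old storage
        if (void* dst = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY)) {
            std::memcpy(dst, pixels, (size_t)size);                    // CPU copy into driver memory
            glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
        }
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                        GL_BGRA, GL_UNSIGNED_BYTE, (const void*)0);    // offset into the bound PBO
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
    }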

Komat
09-29-2007, 04:57 PM
Originally posted by Korval:
Without the embedding of the source, any blob can simply fail to load, and thus will force you to pass in the string and compile the normal way.
This is exactly what I want. If it cannot be loaded fast, then the load should fail. This is similar to GLSL shaders, for which it would be better to fail compilation than to switch to sw emulation.

Yes, if I need that shader, I will need to compile it explicitly. The difference is that in that case I can decide when to compile it (or temporarily replace it with a lower quality shader until asynchronous compilation completes). For an example of what I am thinking about, see the following pseudocode utilizing asynchronous compilation.
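(The pseudocode itself did not survive in this archive; the sketch below is a reconstruction of the kind of flow being described, with every entry point and type invented for illustration.)

    #include <cstddef>

    // Hypothetical declarations standing in for the blob / async-compile API being argued for.
    bool glProgramBlob(unsigned program, const void* blob, std::size_t size); // false if blob is stale
    void glCompileProgramAsync(unsigned program, const char* source);         // compile on a driver thread
    bool glProgramCompileDone(unsigned program);                              // poll for completion

    struct Material {
        unsigned    program, fallbackProgram;
        const void* blob;   std::size_t blobSize;
        const char* source; bool        ready;
    };

    void loadMaterial(Material& m)
    {
        if (m.blob && glProgramBlob(m.program, m.blob, m.blobSize)) {
            m.ready = true;                             // fast path: blob matched this driver/hw
        } else {
            glCompileProgramAsync(m.program, m.source); // slow path runs in the background...
            m.ready = false;                            // ...while fallbackProgram is used to render
        }
    }

    void perFrameUpdate(Material& m)
    {
        if (!m.ready && glProgramCompileDone(m.program)) {
            m.ready = true;                             // swap in the real shader, re-save its blob
        }
    }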



At least if the string is in the blob, it's guaranteed to always work, so there's no need for multiple program compilation paths.
You still need code which generates the blobs from source code. Because during development it is useful when the program can operate without the blobs so you can easily modify the shaders, chances are that you will already have that path anyway.



Remember, if IHVs want to get lazy, they still can in your way. They can make the "binary" blob simply the string, and do the compiling all the time.
Yes, they can; however, I hope that they will not go that way in this case. Optimizing too much for some case is one thing; going directly against the spirit of the blobs is another.

Korval
09-29-2007, 07:14 PM
You still need code which generates the blobs from source code. Because during development it is useful when the program can operate without the blobs so you can easily modify the shaders, chances are that you will already have that path anyway.
Since the string part of the blob format will be well defined (so that one implementation can read the string from another implementation), it is quite reasonable for a developer to work exclusively with blobs. All they need to do is use an IHV GUID that nobody is using.

Komat
09-29-2007, 08:16 PM
Originally posted by Korval:
Since the string part of the blob format will be well defined (so that one implementation can read the string from another implementation), it is quite reasonable for a developer to work exclusively with blobs.
Imho the only reasonable use of blobs is as an optimized loading format.

For all other uses they are vastly inferior even to the current GLSL API, because they do not have support for linking multiple shader fragments nor the compilation error reporting facilities, and they also have a binary-based structure.

The purpose of the API should be to provide services which you cannot implement yourself. In this case that is fast loading of a GPU-native format using opaque blobs. Regeneration of the blob when it becomes incompatible can be easily handled by the application using the "existing" shader API, so it should be part of some helper library and not part of the API itself.

davej
09-30-2007, 06:50 AM
Originally posted by yooyo:
Maybe something like encryption... The app provides a key to the driver, and the driver can decode the encrypted shader and compile it. A separate app can encode all shaders into binary files using common encryption procedures.

Or even better: the app can provide a decoder callback to the driver. When the driver compiles a shader, it calls back into the app to decode the binary shader into an ASCII string, then compiles it.
Encrypting shaders in either of the ways you describe won't actually provide much protection against a determined hacker - a simple GL wrapper library will enable getting at the plain-text shader in both cases. That's also ignoring the fact that, because your app would never run on it otherwise, Mesa will provide an open source implementation of the decoder - so a hacker could get at your shader with little more effort than setting a breakpoint.

If a developer doesn't want to distribute their shaders as plain text they can store an encrypted version on disc and decode it before passing it to the API. Anything beyond that will be more awkward to use than it will be for hackers to bypass.


As for storing compiled shaders, I think leaving caching blobs to the driver is a bad idea.

The fastest way for an application to load multiple blobs will be to load one file, which contains all the blobs, into memory in one go and call glProgramBLOB, or whatever, passing different blob pointers as it needs to from there. Having the driver mess about with multiple files, or even a single file acting as a database of varying numbers of blobs, will be slower.

Caching of blobs should be left up to applications because they know their usage patterns and will be best able to pick a scheme which suits them.
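(A sketch of that load path, assuming the hypothetical glProgramBlob entry point and an invented on-disk layout of a count, an index, then the packed blobs.)

    #include <cstdint>
    #include <cstdio>
    #include <cstring>
    #include <vector>

    bool glProgramBlob(unsigned program, const void* blob, uint32_t size); // hypothetical

    struct BlobIndexEntry { uint32_t programId, offset, size; };

    void loadAllBlobs(const char* path)
    {
        std::FILE* f = std::fopen(path, "rb");
        if (!f) return;                              // first run: no cache yet, compile from source
        std::fseek(f, 0, SEEK_END);
        long total = std::ftell(f);
        std::fseek(f, 0, SEEK_SET);
        std::vector<unsigned char> file(total > 0 ? (size_t)total : 0);
        std::fread(file.data(), 1, file.size(), f);  // one sequential read of the whole cache
        std::fclose(f);
        if (file.size() < sizeof(uint32_t)) return;

        uint32_t count;
        std::memcpy(&count, file.data(), sizeof(count));
        const BlobIndexEntry* index =
            reinterpret_cast<const BlobIndexEntry*>(file.data() + sizeof(uint32_t));
        for (uint32_t i = 0; i < count; ++i)         // hand each blob straight to the driver
            glProgramBlob(index[i].programId, file.data() + index[i].offset, index[i].size);
    }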

knackered
09-30-2007, 07:34 AM
I don't understand why there's a general dislike of the transparent driver caching idea. Nobody's actually given a reason why it's a bad idea.

That's nuts if you ask me.
As for storing compiled shaders, I think leaving caching blobs to the driver is a bad idea.
You just keep saying that you'd like more control over managing vendor-specific blobs, for some weird reason. Next you'll be asking for control over how it searches for a matching pixel format, giving it its own extension and a chapter in the Red Book.
The caching system would just work, all the time, no matter what, without anyone having to do any compiler switch nonsense (which is effectively what you're talking about).
Give reasons against it.

davej
09-30-2007, 09:46 AM
Originally posted by knackered:
I don't understand why there's a general dislike of the transparent driver caching idea. Nobody's actually given a reason why it's a bad idea.

As for storing compiled shaders, I think leaving caching blobs to the driver is a bad idea.You just keep saying that you'd like more control over managing vendor-specific blobs, for some weird reason.
<straw man snipped>
Give reasons against it.
Nobody's said that the driver caching blobs is the worst idea ever, we just don't think it is the best.

For most invocations of most applications, shaders will have been compiled and their blobs cached somewhere from a previous run. This should therefore be the behaviour to optimise for.

If a driver stores blobs as separate files, then it will have the overhead, for each blob, of finding the file on disc, opening it, loading the data and closing the file. Separate files are more likely to be spread about the disc leading to more disc head movement. Optimisations in the file system may help but all of this will ultimately slow down the loading of blobs. If the driver implements some sort of database in a single file to reduce disc access overhead, then it will have to manage the complexity of adding and deleting files and the corresponding fragmentation in the database.

An application can, at install, first run or "driver changed I've got to do some reconfiguration" time, store all the blobs into one, sequentially written file. This will give the best chance of that application's blobs being written to disc in a minimally fragmented file. When it comes to reload the blobs, this will be done by opening only one file and, because it will be a sequential read of a minimally fragmented file, loaded into memory quickly. The application then has all the blobs ready to pass to the 'attach program blob' API as needed.

Leaving blob caching to the driver will undoubtedly be less work for application developers and may allow things like a new driver recompiling all the cached shaders as part of the driver installation but ultimately, applications loading blobs themselves in the manner described above will be faster for most invocations of most applications.

That is why I think it should be left to applications.

Korval
09-30-2007, 01:02 PM
I don't understand why there's a general dislike of the transparent driver caching idea. Nobody's actually given a reason why it's a bad idea.
If you disregard the actual reasons given thus far, then yes, nobody's given an actual reason. However, if you actually pay attention to the responses to the idea:

Korval: The only way to guarantee this (or as close as it gets) is to have something that can be written into an OpenGL spec.

Komat: And I already got burned by expecting that the uniform variables are really variables which can be freely changed and not something that will cause shader reoptimalization.

Davej: Caching of blobs should be left up to applications because they know their usage patterns and will be best able to pick a scheme which suits them.

So there have been plenty of reasons given against it. And the only argument for it is that it is transparent to the application writer.

bobvodka
09-30-2007, 04:21 PM
There is also the matter of when the driver cleans up the cache. If you are keying on the shader then, unless I'm mistaken, a small change in a shader means another key, another entry and more disk space gone.

When I'm working on shaders I tend to go through many revisions; with this transparent caching it seems they will all be left lying about until such time as I reinstall Windows, and most of them will be taking up disk space for no good reason.

The same applies to games; if I uninstall a game I expect it all to vanish, not to have files cached by the driver lying around which might never get used again.

Finally, if the driver regenerates this cache on a driver change (as required), it's going to add more time to installing the driver (which frankly is long enough as it is already) and end up doing redundant work if, again, most of those cached shaders are unused.

yooyo
09-30-2007, 05:25 PM
This discussion has become off-topic... but anyway...

Plain text or binary blob? If you deliver plain text with your app, you can't protect your IP. Developers can encrypt shaders, but that is an "easy to hack" solution. Maybe we can obfuscate the shaders?

What if you deliver a binary blob? What is this binary blob anyway? Is it the compiled shader returned (in binary form) from the driver? We have different vendors, each with several hardware classes; the same shader can be compiled in different ways depending on the underlying hw & driver. Even more, if the user changes IQ settings in a game, this will probably trigger changes in shaders and force recompilation. If an app uses exactly the same shader but with different params and context, it can be compiled differently... same shader, different results.

So this binary blob must be some bytecode in a common form supported by all vendors. I'm afraid this is never going to happen. Even more, such bytecode can be reversed back to plain text (if a little obfuscated).

If the driver has an internal binary shader database, the app still has to provide the original shader for compilation. The driver can cache binary blobs in the app folder for later use. This could be implemented in the driver control panel... a simple check-button in the UI. This would speed up game startup on the second run.

Some hw vendors use dirty tricks & shader replacement by detecting the executable filename, so they already have precompiled shaders.

In the end, compilation speedup can be achieved by caching shaders on the local disk. The application should not have to be aware of this. IP can be protected to some degree, but it is not safe enough (like any other digital protection).

JC's engines don't use too many shaders, so Doom 3, Q4, ETQW, etc. have quite fast startup.

Brolingstanz
09-30-2007, 06:58 PM
It's better to do what D3D + its tools are doing.
Yes, that is good.

Zengar
09-30-2007, 07:08 PM
I see only one argument against driver based caching, and it is:


Davej: Caching of blobs should be left up to applications because they know their usage patterns and will be best able to pick a scheme which suits them.
But even this is not so important. The driver can detect basic usage patterns easily. It can maintain usage statistics for each shader and cache it only if it is a) long enough, b) used often enough, c) add your choice.

Also, the extension may introduce some driver hints (like glSetShaderParameter(shader, GL_CACHE_SHADER_EX, GL_TRUE)).

Of course, there may be some special cases... Your application generates thousands of shaders on the fly and uses them only once? Well, you probably won't be caching them yourself anyway. Still, if by some small chance some particular shader code is generated more often than others, a driver with the simplest usage statistics will notice it immediately and put it into the cache.

I imagine something like this: the driver creates a file per application, somewhere under the user's data directory. This file contains the cached shaders. Each time the rendering context is deleted, the driver evaluates the statistics and updates the cache. The implementation should be simple enough; I am sure that a more or less proficient programmer could code such a system in several days.

@bobvodka: The reinstallation of drivers won't take more time; the uninstall utility just deletes the cache files.

And of course, Korval is right about having to specify this behaviour in the spec. Still, I don't see why the OpenGL spec can't mention saving data to the user's hard drive.

I like such an approach very much, because it is very transparent. It "just allows" new functionality, even for applications that aren't aware of it. And as mentioned, some fine-grained control over caching could be provided via extensions. Maintaining the blobs manually destroys the abstraction.

Korval
09-30-2007, 08:12 PM
So this binary blob must be some bytecode in a common form supported by all vendors.
You're misunderstanding the conversation.

The debate is between three alternatives:

1: Let the IHVs optimize program compilation by hoping and praying that they will implement some kind of caching system that compiles a particular program once and then uploads the compiled program when it detects you are trying to compile the same program again.

2: Add an extension that provides the ability to retrieve a "binary blob" from a program object. This blob will contain the program in an IHV-specific format of an indeterminate nature. The only part of the format that is cross-platform is a multi-byte header identifying the GL implementation that created it. If you load this blob into a separate implementation, it will simply fail. And even the implementation that created it can decide not to load it again (suggesting that it can optimize the string version better now, or something).

3: Same as 2, except that the binary blob must also contain the string form of the shader in addition to the implementation identification code and the implementation's binary data. The string will be stored in a well-defined format. This allows a program to use blobs built from any implementation in any other, because worst-case, loading it simply provokes a recompilation.

We are not suggesting a new shader language, whether textual or binary.


The driver can detect basic usage patterns easily.
Actually, no: pre-3.0 OpenGL has shown that detecting usage patterns sucks. The detection either gets them wrong or gets them kinda right or whatever. It's never as good as it would be if the driver and the code established a real contractual obligation that was enforceable in some way. That's why 3.0 abandons anything that requires such detection in favor of a more rigid approach.

It's also the reason why 3.0 won't have hints. Or won't use them for much.


The implementation should be simple enough; I am sure that a more or less proficient programmer could code such a system in several days.
Yes, and implementing a functioning glslang compiler should be simple enough that 2 years ought to be enough. And yet both nVidia and ATi have failed miserably at it.

IHVs have violated the trust relationship enough that trusting them on shader stuff is just stupid. I'd much prefer a contract written in spec-language that you can at least verify is being honored.

Also, this doesn't answer Komat's issue. That is, what if you can work around the long compile somehow? What if you can compile a subset of shaders that take less time initially, just to get up and running, and then compile others in a separate thread? Your method would make it impossible to tell if recompiling is going to take place, so it would be impossible to work around the long compiles in the event of a forced recompilation.

Now, I don't necessarily agree with the point (mainly since I don't plan to explore such a solution), but I can't really find fault with it, as it would make a potentially important kind of solution impossible to implement for little reason.


the uninstall utility just deletes the cache files.
Immediately causing every GL program that relied upon that cache to suddenly take 10 minutes to start up when it used to take 20 seconds. This is a pretty strong argument against this.

I'd rather not put that kind of basic application performance in the hands of people who have been shown, time and again, incapable of being able to write a C compiler.


Still, I don't see why the OpenGL spec can't mention saving data to the user's hard drive.
Because OpenGL specifies behavior, not optimizations. What you're talking about is an optimization.

The GL spec can only say what will happen to the internal state (contents of images and framebuffers, etc) of a context when you call certain API functions. It certainly cannot state what will happen to information when the context is destroyed and a new one created perhaps days later. "Behavior" is that which you can detect has happened in the state by looking at the state.

You cannot detect that something has been put into a cache unless the cache object itself is a GL object that you can talk to. You cannot detect that some file has been put somewhere, etc. In short, this is not behavior.

It's also the reason that GL doesn't specify windowing system dependent stuff, even if it could in an OS-neutral way. It is simply outside of its domain.

sqrt[-1]
10-01-2007, 12:49 AM
I think I have made my opinion on this matter known previously:
http://www.opengl.org/discussion_boards/ubb/ultimatebb.php?ubb=get_topic;f=7;t=000626#000012

And I can still say I prefer the D3D approach. (In fact I seem to recall that Nvidia engineers were pushing for this when GLSL was first proposed, the argument being that they would have to optimize for D3D anyway and they hated adding 1MB+ of bloat to their code.)

To reiterate another post of mine:

I still think an intermediate representation is a good idea even if the compile times are the same (which I highly doubt).

- Easy to see when the compiler does "dumb things".
- Don't have to worry about code parse bugs. (These should not happen, but do)
- Dead code/Code folding optimizations can take as much time as needed.
- Don't have to worry about spec violations from different vendors (or even changing between driver versions)
- Easier for driver writers to support. (probably)

I know a lot of these problems do not exist in "theory" but in practice I believe compiling at runtime is adding a huge surface area for failures.

Simon Arbon
10-01-2007, 01:37 AM
Korval, My vote goes to:

2: Add an extension that provides the ability to retrieve a "binary blob" from a program object.
I agree with pretty much everything Komat has said in this topic in support of this option.
I don't want the source in the blob because:
a) 99% of the time you don't need it, so it's a waste of time & memory loading it
b) I want the choice to do the recompilation in the background or at a later time

As for encrypting shaders (yooyo), any scheme to pass encrypted code to the driver will either need a decoder in the driver that will be hacked in no time, or will pass unencrypted data where it can be intercepted.
Much better to just publish your fabulous shaders in GPU Gems so we can all admire your work.

Korval
10-01-2007, 02:16 AM
I still think an intermediate representation is a good idea even if the compile times are the same.
Whether it is or not is entirely irrelevant. Because it's not going to happen.

Continuing to extol the virtues of such a system is meaningless posturing in the face of this fact. Back when the debate was going on, all of these things were brought before the ARB. They considered them and rejected them in favor of the advantages of glslang.

So, we are where we are. Proposing "solutions" that aren't going to be implemented is useless. The one thing that the 3 outlined possibilities have in common is that they could all happen.

knackered
10-01-2007, 02:16 AM
Fair enough, I understand davej's objection.
I also understand sqrt's objection.
The d3d way it should be.

yooyo
10-01-2007, 02:55 AM
@Simon Arbon:
I don't care about encryption. I'll probably never use it, but some people think that it's necessary to protect their IP by making shaders inaccessible. That is the reason why I mentioned encryption.

Zengar
10-01-2007, 03:10 AM
Of course, a simple intermediate representation would be even better.

P.S. Can anyone summarize the D3D approach here?

davej
10-01-2007, 03:41 AM
Notwithstanding my comments about the application being the best place to make decisions on compiled shader caching, it would still be useful to have this facility for developers who would like things cached for them but are not concerned about wringing the maximum performance from the system.

But does a cache have to be implemented in the driver? A library can implement shader caching on top of any get/set program blob APIs the driver provides, and would require little effort by a developer to use. Even if it wasn't standardised and implemented as part of glu, I'm sure someone would be generous enough to release their implementation like the various extension wranglers are, or maybe one of the IHVs could do it. This would provide the best of both worlds from the developer point of view and would not require extra complexity added to drivers.

The question then becomes: are there optimisations the driver could do if it was managing the cache that it wouldn't be able to do if it wasn't - I'd imagine it might save some memory copies, but it would really need a driver writer to tell. And if there are, do these outweigh a highly optimised application just using the blob APIs?

Any thoughts?

knackered
10-01-2007, 11:08 AM
Yes, a generic caching system can be built on a blob API, and if the blob API were available then I'd be all for it. But currently we have no such API, so driver caching is our only option at the moment for getting our compile times down.
I still don't like the idea of the blob API existing at all, but it's conceivable that we may pay a disk-thrashing price if the driver caching thing became the final solution to the problem.
I don't know, I'd be interested to hear from someone on the ARB about this.

Demirug
10-01-2007, 11:18 AM
Originally posted by Zengar:
P.S. Can anyone summarize the D3D approach here?
Direct3D uses a two-step solution for shaders:

1. Step: The HLSL shader is compiled to hardware-independent bytecode. You can specify the shader model that the compiler should target; this information is included in the bytecode, too. The compiler is part of the DirectX SDK. It can be found in the Direct3D extension library and as a command-line tool.

2. Step: The bytecode is passed to the driver. The driver is responsible for translating it into a platform-dependent shader. The driver has to report the supported shader models and must accept any shader that is compiled for such a model.
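(To make the two steps concrete, a D3D9-flavoured sketch: step 1 is done offline with the SDK's command-line compiler, e.g. "fxc /T ps_3_0 /E main /Fo lighting.pso lighting.hlsl", and step 2 hands the resulting bytecode to the driver at run time. The file name and helper function below are illustrative.)

    #include <d3d9.h>
    #include <vector>

    IDirect3DPixelShader9* createFromBytecode(IDirect3DDevice9* device,
                                              const std::vector<unsigned char>& bytecode)
    {
        IDirect3DPixelShader9* shader = 0;
        // The driver must accept any bytecode valid for a shader model it reports.
        if (FAILED(device->CreatePixelShader(
                reinterpret_cast<const DWORD*>(&bytecode[0]), &shader)))
            return 0;
        return shader;
    }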

Zengar
10-01-2007, 11:24 AM
Sounds nice to me :) One just has to design the bytecode in a way that allows it to be analysed/compiled very fast.

Overmind
10-01-2007, 11:41 AM
One just has to design the bytecode in a way that allows it to be analysed/compiled very fast.
Don't forget that this bytecode must not throw away any information that may be used to optimize. And with that I mean not only information that is useful for optimization on current hardware, but also information that may be used to optimize for the new GPU with *insert random uber-feature here* support.

The D3D bytecode does not accomplish this. For example if you compile for a shader model that has no dynamic branching, your shader will never benefit from dynamic branching, no matter how good the actual hardware is.

knackered
10-01-2007, 01:32 PM
I just want my GLSL shaders to build quicker. That's all I care about; it's becoming an issue now that I've moved my renderer completely away from fixed function.

Komat
10-01-2007, 03:34 PM
Originally posted by Overmind:
Don't forget that this bytecode must not throw away any information that may be used to optimize. And with that I mean not only information that is useful for optimization on current hardware, but also information that may be used to optimize for the new GPU with *insert random uber-feature here* support.
Only if you need the theoretically best performance on future hw. For some people, good performance on current hw, fast loading time and reliable compilation are more important than theoretically optimal use of any hw, including future hw. If it runs well on current hw, future hw will likely have the horsepower to handle less-than-optimal shaders at perfectly sufficient speed.

Additionally, a bytecode format which does not lose any optimization information has lost the feature it was selected for in the first place: the speed of loading (and, in the real world, also improved compilation reliability). Only a format which does a significant portion of the work at precompile time can have that feature, and people who wish to use such a format have likely found the features gained to be worth the features they lose.



The D3D bytecode does not accomplish this. For example if you compile for a shader model that has no dynamic branching, your shader will never benefit from dynamic branching, no matter how good the actual hardware is.
The idea behind D3D bytecode is that you can precompile shaders for various targets (including the degree of preference for dynamic branching, or additional capabilities of the target hw) and select the best of them for the current hw. The goal of D3D bytecode is not to get a perfect match for any hw ever created; reliability across hw is more important.

Korval
10-01-2007, 04:35 PM
The driver has to report the supported shader models and must accept any shader that is compiled for such a model.
Too bad the second half of that is a bald-faced lie.

It isn't too hard to construct a shader that agrees with the D3D definition of a shader from a particular shader model (in instruction count) and yet fails to load on one or more actual pieces of hardware. All you need to do is figure out which opcodes expand to multiple machine opcodes on certain hardware, and then make a shader that fits within a shader model's instruction count but uses a lot of these expanding opcodes.

At least OpenGL's shader systems acknowledge the reality that you cannot guarantee that a shader of any form will compile on all hardware. I prefer the cold reality of a situation to a pretty fantasy that might bite me later on with certain hardware.


One just has to design the bytecode in a way that allows it to be analysed/compiled very fast.
And, of course, specify any number of different "levels" of functionality, based entirely on specific revisions of pre-existing hardware, effectively making one shader language for every IHV's revision.

Remember: the D3D's model is predicated on the believability of the aforementioned lie: that any shader that passed HLSL compile for a particular shader model will load perfectly into any piece of hardware that expresses conformance with that arbitrarily defined shader model.

Once you step out of the realm of fantasy and return to reality, you find that attempting to define a set of "cross-platform" shader models is basically nonsense. Shader models, practically by definition, are not cross-platform. They give weight to implementations that basically implement them directly into hardware, and make it more difficult for IHVs to create cleverer implementations (see Intel's upcoming graphics processor, which I guarantee does not implement D3D shaders the way that ATi/nVidia do).

I'd also like to remind people that it's not going to happen. So continuing to wish for it is a waste of everyone's time.

V-man
10-01-2007, 05:24 PM
Originally posted by Overmind:
The D3D bytecode does not accomplish this. For example if you compile for a shader model that has no dynamic branching, your shader will never benefit from dynamic branching, no matter how good the actual hardware is.
I don't know for sure if that's how they do their compiling, but that would be my guess.

For the case of GLSL, it already supports such language features and the byte code can support them too.

Other types of optimizations are easy to do using the bytecode itself:
- MUL and ADD becoming MAD
- MUL and ADD and ADD becoming MADD
- etc.

Higher-level functions like cross, normalize, reflect, refract can be kept as opcodes in case they become available in future GPUs.
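(A toy sketch of the kind of peephole pass being described: fuse a MUL whose result immediately feeds the next ADD into a single MAD. The opcode set and instruction layout are invented; a real pass would also check that the intermediate temporary is dead afterwards.)

    #include <cstddef>
    #include <vector>

    enum Op { MUL, ADD, MAD /* , ... */ };
    struct Inst { Op op; int dst, src0, src1, src2; };

    void fuseMulAdd(std::vector<Inst>& code)
    {
        std::vector<Inst> out;
        for (std::size_t i = 0; i < code.size(); ++i) {
            if (i + 1 < code.size() && code[i].op == MUL && code[i + 1].op == ADD &&
                code[i + 1].src0 == code[i].dst) {              // ADD consumes the MUL result
                Inst mad = { MAD, code[i + 1].dst,
                             code[i].src0, code[i].src1,        // a * b
                             code[i + 1].src1 };                // + c
                out.push_back(mad);
                ++i;                                            // skip the fused ADD
            } else {
                out.push_back(code[i]);
            }
        }
        code.swap(out);
    }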

V-man
10-01-2007, 05:43 PM
It isn't too hard to construct a shader that agrees with the D3D definition of a shader from a particular shader model (in instruction count) and yet fails to load on one or more actual pieces of hardware. All you need to do is figure out which opcodes expand to multiple machine opcodes on certain hardware, and then make a shader that fits within a shader model's instruction count but uses a lot of these expanding opcodes.
There are two things to remember:
- instruction count
- instruction slot

Example: the cross product instruction.
Either this can be done as a single instruction, or the GPU doesn't support it and needs to expand it. The expansion is the "instruction slot" count.

I think the "instruction slot" count is the maximum number across the board (whatever GPUs are released by ATI, nVidia, Intel and others).

The D3DX compiler actually refuses shaders that would work on your GPU, since your GPU supports some of those instructions as "single slot".

Korval
10-01-2007, 07:04 PM
I think the "instruction slot" count is the maximum number across the boardHow could they possibly know, when Intel hasn't even released their new Larrobe-based GPU that's built around x86?


The D3DX compiler actually refuses shaders that would work on your GPU, since your GPU supports some of those instructions as "single slot".
And this is good? A shader that would be perfectly functional on this GPU not being able to be used because of the architecture of the compiling sequence?

This is providing plenty of evidence for glslang. At least IHV compilers will eventually get better; the D3D shader pipeline won't.

sqrt[-1]
10-01-2007, 07:51 PM
Originally posted by Korval:

And this is good? A shader that would be perfectly functional on this GPU not being able to be used because of the architecture of the compiling sequence?
I think it is a non-issue on modern hardware as the limits are so high, you have to write some seriously big programs to hit these limits.

Anyway, if it becomes an issue, just don't include an instruction count check in the spec. (either load or fail to load)

Or perhaps include a query for limits like they did to support texture indirection counts for ATI.

Komat
10-02-2007, 12:24 AM
Originally posted by Korval:
At least OpenGL's shader systems acknowledge the reality that you cannot guarantee that a shader of any form will compile on all hardware. I prefer the cold reality of a situation to a pretty fantasy that might bite me later on with certain hardware.
I do not consider behaviour ranging from shader compilation failure to running in sw emulation (without API supported detection of that fact) to be "acknowledging the reality" significantly better than the D3D model.

If you take advantage of the OGL capability to use the hw to the limit, you are risking that the shader will not work on different hw with a similar level of functionality, or even on a newer driver for the same hw. The D3D model, which keeps you from using the hw to the limit, reduces the chance that you will run into this problem.



How could they possibly know, when Intel hasn't even released their new Larrobe-based GPU that's built around x86?
I think the idea is that the limits were set for hw existing at that time, with the expectation that future, more advanced hw would meet that requirement easily by having bigger instruction limits, or at least because it was designed with that requirement in mind.

Komat
10-02-2007, 12:43 AM
Originally posted by Korval:

Shader models, practically by definition, are not cross-platform. They give weight to implementations that basically implement them directly into hardware
Favoring one implementation does not prevent them from being cross-platform, as long as the program compiled for that model runs on more than one piece of hw, which is the purpose of the shader models.

Even with a shader system which does not by itself favor one implementation, you will still likely favor one implementation by the way in which you write the shader (e.g. amount of texture sampling vs. amount of calculations).

Korval
10-02-2007, 01:28 AM
you will still likely favor one implementation by the way in which you write the shaderYes, but then it's me and my choice. I should absolutely not have the choice forced on me, and certainly not by the whims of some 3rd party application. Nor should the ARB give preferential treatment to certain implementations over others; they should all have reasonable chances at optimization for their respective hardware.

And, as a reminder, not gonna happen.

yooyo
10-02-2007, 04:49 AM
knackered hit the nail on the head:


I just want my GLSL shaders to build quicker.

dorbie
10-02-2007, 04:31 PM
Yep, good posts by knackered,

AFAIK drivers already use hash-keyed caches for shader compilation, especially for the implementation/optimization of fixed-function state machines on programmable hardware.

There are multiple ideas being proposed here.

1) caching of compiled results across runs

2) precompilation/shader binaries

3) backwards compatibility for shader binaries (the reverse of 2)

1) Caching compiled shaders is a significant step away from fixed-state caching; the hash lookup is much more complex.... I don't think it's untoward of a driver to cache these things across runs, and it doesn't always have to be cached on disk or even in system memory.

2) There's a spectrum of options here: does the developer pre-compile, can you pre-compile at installation, can you pre-compile and cache on the fly? Should it be left explicit and up to the application with only a glGetShaderBinary()? That could actually be used by an application to implement all these schemes, but it could also be abused and lead to incompatibility or slow code-paths on alternate hardware.

3) Some of the options in 2 probably mean that you have to support backwards compatibility, but benchmarketing is already abused and this would make it worse. Should there be a standard intermediate representation used for shader portability, and how low-level should it be?

I know the instinct of driver developers is to leave this to developers and keep the interface simple, so just loading and querying shader binaries leaves you with plenty of options. If those binaries are low-level and non-portable you'd have to use the interface very carefully, potentially even on different cards from the same vendor. But it would solve a lot of the griping here at the expense of installation or 'initialization' time each time some sort of graphics card or driver validation failed.

Frankly I think most options lead to a mess and an implementation nightmare on the desktop except localized binary building, but even that would be prone to support issues. Perhaps it could be strictly enforced with a local card ID & driver check.

Specifically perhaps you should not be allowed to run a shader on one card compiled for a different card or driver even if it is probably compatible. i.e. no binary shipments, all shaders are invalid if anything changes, the application MUST check a shader load attempt and recompile if a binary load fails, AND explicitly save that binary to disk if they think they need it again. This would solve the turnaround issue and eliminate needless recompilation which is the main performance issue, and it still leaves ascii shaders on the primary code path without complicating things.

Maybe that's a bit tricky to use, but if you can't figure this out you shouldn't be messing around with binary shaders.
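
To make that contract concrete, here is a minimal sketch of the application side. glGetShaderBinary() does not exist today, so the two driver hooks below (tryLoadProgramBlob / getProgramBlob) are purely hypothetical stand-ins for whatever entry points such an extension would define:

#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical driver hooks (NOT real GL entry points).
unsigned tryLoadProgramBlob(const std::vector<char>& blob); // returns 0 if the driver rejects it
std::vector<char> getProgramBlob(unsigned program);         // valid after a successful link

// Assumed to exist in the application already: plain GLSL compile + link.
unsigned compileAndLinkFromSource(const std::string& vs, const std::string& fs);

// Load-or-recompile: try the cached blob first; if anything changed (card,
// driver, source), fall back to a full compile and refresh the cache file.
unsigned getProgram(const std::string& cachePath,
                    const std::string& vs, const std::string& fs)
{
    std::ifstream in(cachePath, std::ios::binary);
    if (in) {
        std::vector<char> blob((std::istreambuf_iterator<char>(in)),
                                std::istreambuf_iterator<char>());
        if (unsigned prog = tryLoadProgramBlob(blob))
            return prog;                                    // fast path, no compilation
    }
    unsigned prog = compileAndLinkFromSource(vs, fs);       // slow path
    std::vector<char> blob = getProgramBlob(prog);
    std::ofstream out(cachePath, std::ios::binary);
    out.write(blob.data(), static_cast<std::streamsize>(blob.size()));
    return prog;
}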

Simon Arbon
10-03-2007, 12:26 AM
A typical compiler does the following:

1) Parse the source text and produce a tokenised binary representation.
2) Convert the linear code into a parallel syntax tree.
3) Do general optimisations (constant evaluation, redundant code elimination)
4) Fit the code to the destination processor (assign variables to registers)
5) Generate machine code for each subtree
6) Serialise and optimise code (reorder to improve pipelining)

Step 1) gives us a binary representation of source which saves some space and removes symbol names, but gives insignificant compile time savings.

The syntax tree is complex, can be larger than the source, and will often change format whenever the compiler changes its optimisation schemes.
Using this as a binary representation would limit future compiler improvements.

Using an "intermediate" binary format is in effect compiling for a non-existant "virtual" GPU
and then having to recompile the intermediate code to fit it to the actual GPU.
All this accomplishes is limiting your capabilities to those of the virtual processor instead of using your full GPU capabilities.
The Direct3D idea of creating a new intermediate language for every 2 or 3 real GPU's is just plain silly, they would have been better-off doing a separate profile for every real GPU.

The only format that makes sense is storing the actual GPU machine code as a blob, along with a hardware identifier and a driver/compiler version number.


Dorbie: Specifically perhaps you should not be allowed to run a shader on one card compiled for a different card or driver even if it is probably compatible. i.e. no binary shipments, all shaders are invalid if anything changes, the application MUST check a shader load attempt and recompile if a binary load fails, AND explicitly save that binary to disk if they think they need it again. Yes, most definitely; even if it's just a driver change there may have been improvements to its optimisation methods, so the driver should decide to either accept the blob or tell you to recompile.
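
One possible on-disk layout for such a blob, sketched with made-up field names; the only point is that the header carries enough identification for the driver to reject a stale blob outright rather than try to patch it:

#include <cstdint>

// Illustrative cached-shader header; every field name here is an assumption.
struct ShaderBlobHeader {
    uint32_t magic;          // file identification / format revision
    uint32_t vendorId;       // PCI vendor of the GPU the code was built for
    uint32_t deviceId;       // exact chip revision
    uint32_t driverVersion;  // compiler/driver build that produced the code
    uint32_t codeSize;       // size in bytes of the machine-code payload that follows
};
// On load the driver compares vendorId/deviceId/driverVersion with the running
// system; any mismatch means "recompile from source", never "accept anyway".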

Demirug
10-03-2007, 12:34 AM
I don't like the idea of hardware/driver-version-dependent binary shaders as the only solution. Sure, they would be better than nothing, but I don't think that this is the way to go. From the game developer's view we need a hardware-independent binary shader format. Forcing the users to wait at the first start (or after a driver update) until the shader cache is rebuilt is not that good in a market where you have consoles as competitors. I know that some DX games do this too, but as one of the big future aims for PC gaming is to remove the installation step altogether, anything that slows down getting into the game should be avoided.

If we take a look at other famous byte code systems like Java byte code or the MSIL byte code used by .Net, we can see that you can even write large programs. Shader programs are much shorter. I haven't done anything with Java byte code yet, but as I have some knowledge of MSIL I can say it would not be that hard to compile GLSL (or any other shader language) to such a byte code system. Converting such byte code to platform-specific code is a common job. And it would be much faster than using GLSL shaders over and over again. Most of the hard work is already done at this point: syntax checking, validation, dead code removal, ...

If you add "call tokens" to the byte code and use them for the more complex functions (anything beyond simple math and logic) you don't lose the semantics of the shader. This way you can even add additional libraries with new functions later or as an extension. Another good thing about having hardware-independent binary shaders is that you can easily write tools like debuggers around such a system.

Simon Arbon
10-03-2007, 01:09 AM
Demirug: Most of the hard work is already done at this point: syntax checking, validation, dead code removal.Unfortunately not; the hardest part is fitting to the hardware limitations and optimising for all of the odd quirks of a particular processor.
You will save some time, but not enough to be worth the trouble.

Java byte code is mainly used to decrease the download time; it's only fast to compile because the just-in-time compilers don't optimise the code they produce.

Looking through the MSIL byte-code documentation gave me a headache; I have never seen a more confusing mess made of what started out as quite a good idea.

Zengar
10-03-2007, 04:18 AM
Originally posted by Simon Arbon:

The only format that makes sense is storing the actual GPU machine code as a blob, along with a hardware idenfifier and a driver/compiler version number.
No, this is bad, because if you store the GPU code, you can store only programs, and not individual shaders. This limits the usefulness of compiled shaders greatly. In short, compiled shaders must be stored prior to the linkage step.

Basically, the slowest part of a compiler is the parser. If one chooses the intermediate representation wisely (it should be some sort of control/dataflow graph with static assignment), the recompilation would be lightning fast. Also, shaders are very small and differ from CPU programs - the main difference being the complete absence of side effects. Actually, a GPU shader is a pure function. So, the experience of functional language compilers may be very helpful here.

P.S. @Simon Arbon: Just-in-time compilers do optimize their code in very complex ways. The HotSpot compiler actually does some crazy things like dynamic reoptimisation and aggressive inlining. Modern Java is almost on par with C in speed. Still, the drawback of Java and .NET bytecode is that they are based on a stack machine - which is easy to interpret and a pain to compile.

P.P.S. You could also look at the LLVM project (http://llvm.org/) for more inspiration.

sqrt[-1]
10-03-2007, 04:44 AM
While compile times are a big concern for me, they are a secondary concern to the problems of compiler quirks.

I don't think I have ever moved some source code from one compiler to another without having to fix minor "quirks" or spec violations in the old code. Now I agree that GLSL is a lot simpler than a lot of other languages, but even the driver writers for OpenGL 2.1 can't seem to always reject bad code.

In fact the GLSL spec says the only requirement is to accept valid code (detection/rejection of bad code is optional).

You would think a solution to this would simply be to have a common parser implementation shared by each driver. 3Dlabs attempted this for OpenGL 2.0, but only ATI seems to use it. Nvidia have their own Cg-based compiler (that was known for a long time to accept bad code) and even Mesa wrote their own? (I think they found too many bugs in the reference implementation?)

I am afraid that most programmers are not good enough to program to a spec without relying on the compiler catching the occasional spec violation. (I include myself in this)

Komat
10-03-2007, 04:58 AM
Originally posted by Demirug:
If we take a look at other famous byte code systems like Java byte code or the MSIL byte code used by .Net, we can see that you can even write large programs. Shader programs are much shorter. I haven't done anything with Java byte code yet, but as I have some knowledge of MSIL I can say it would not be that hard to compile GLSL (or any other shader language) to such a byte code system.
As far as I know, even MSIL code, which is compiled to a very flexible CPU architecture, already loses HW features which are accessible when you directly compile for the target CPU - namely the SSE vector instructions. This is exactly the same reason which is presented here against D3D assembly.

Simon Arbon
10-03-2007, 05:44 AM
@Zengar: An interesting link, i will be looking at this in detail.
The superoptimising Pascal-to-x86 compiler I use for my final builds spends 5% of its time in the parser and 95% of its time in the x86 optimiser.
Of course the x86 is a horribly complex processor to optimise fully, and you would expect a GPU to be a much simpler RISC-style processor, so yes, it would have a much faster back end.

You also seem to be talking about using some sort of syntax tree/dataflow graph as the intermediate language rather than tokenised source or a pseudo-machine code.
In the post you quoted, my only objection to using this was that it would tend to lock the compiler into that specific representation and would not allow future improvements in compilation technology that depended on a different representation.
If a structured intermediate representation could be shown to be flexible enough to support future optimisation developments, to be no larger than the source code, and to provide a significant speed boost, then i would happily support it.

if you store the GPU code, you can store only programs, not individual shaders.This is true only because of the way the manufacturers have implemented their current compilers.
It should be possible to store the vertex, geometry & fragment shaders individually, and it would be nice if we could compile and store bits of unlinked shader, but I would not be worried if I can't, as I always use the same vertex shader with a given fragment shader anyway.

Demirug
10-03-2007, 06:38 AM
Simon Arbon: Looking through the MSIL byte-code documentation gave me a headache; I have never seen a more confusing mess made of what started out as quite a good idea.It has given me fewer headaches than the Intel x86 programmer's manuals. But that's a personal thing for sure. Anyway, it was only an example. For the specific domain of shader programs a specific byte code specification would be better.


Komat: As far as I know, even MSIL code, which is compiled to a very flexible CPU architecture, already loses HW features which are accessible when you directly compile for the target CPU - namely the SSE vector instructions. This is exactly the same reason which is presented here against D3D assembly.Yes, there are no MSIL tokens for SSE instructions. But there is nothing that stops you from doing vectorisation as part of the JIT process. If you take a "normal" C compiler the vectorisation is done in the last pass too. You can only generate SSE code directly by using an inline assembler or, if supported, special intrinsics.

If your shader byte code supports call instructions and custom libraries, it would be possible for every IHV to provide libraries with special functions that only their driver will understand. You will lose hardware independence by using such functions, but you will have the choice. Coming back to the MSIL example, it would be possible to write a vector function library and a special CLR that translates these calls to native SSE code.

Korval
10-03-2007, 10:37 AM
This limits the usefulness of compiled shaders greatly. In short, compiled shaders must be stored prior to the linkage step.Shaders are useless; alone, they can do nothing. If you can cache the programs that they link to, you don't need to build the shaders at all. So there's no point in storing them.

Now, when it's time to compile from glslang, yes, shaders have value. But if you're not wanting to compile from scratch, you have no need for shaders; all you want is a program.

Further, if you're going to have an intermediate language as you suggest, it most certainly will be just a form of the linked program, not of individual shaders. After all, as we've pointed out in this thread, shader compilation doesn't do nearly as much as one would like.


Yes, there are no MSIL tokens for SSE instructions. But there is nothing that stops you from doing vectorisation as part of the JIT process.The point is this: what happens if you vectorize wrong?

SSE vectorization is, for example, very different from the kind of vectorization that ATi/nVidia use. Which means Intel's upcoming GPU will have a very different kind of architecture.

It's no different from asking an AltiVec-based G5 to execute x86 instructions. Sure, it can do it, and you could recompile the x86 assembly to G5/AltiVec assembly, but the resulting code would never be quite as fast as if it had been built from source. Inlining, for example, has already happened, so you're stuck with whatever inlining scheme works best for x86.

The expression (where M and V are matrices):

M⁻¹ * V * M

will be optimal for G5s and x86 in very different ways. The G5's AltiVec is a bit more powerful and flexible than x86's SSE or 3DNow, so it will do things one way, while the right answer for x86 will be significantly different. Furthermore, you cannot convert from one to the other optimally with just the assembly; you need the semantics of the original expression before you can do that.

So any intermediate language would either be good for some platforms and bad for others or would be bad for all platforms compared to glslang.

tamlin
10-07-2007, 01:59 PM
To elaborate on what Korval wrote (and taking it quite a few steps further, I hope);

for shader/program optimization (intended for specific hardware) to even be decent, it requires either

1) a higher-level representation for the drivers to optimize to their (abilities of, but that's just QOI) optimal representation, or

2) all implementations to have disassemblers for every other compiled-down-to "language" for ALL other GPUs, be able to take that disassembly, apply SSA and who knows what other analysis is needed, decompile it to generic source, recompile it to the target instruction set, and finally optimize it to the target machine code.

I don't think it takes a genius to see what would work in the long term here. Does it?

It really takes no more thinking than "What would you prefer; C or IA32 machine code, if you were to make some code run on any of a number of CPU's?". As it seems the plain C code case is out of the picture (as that'd make it "source code", and with pissing contests and secrecy we can't expect that) there's no way to fix that flaw. What remains is another compiled representation - though still ISA-independent.

That said, I disagree with Korval's conclusion that "any intermediate language would either be good for some platforms and bad for others". A good (read: proper) intermediate representation would benefit all - even if some implementations had non-optimal optimizers to turn this into optimal code for their hardware. Same crap happens every day with all compilers, compiler vendors, CPUs, ISAs, platforms and... (fill in what's missing)

<rantish>
Let's just hope AMD releasing specs (claimed - we have seen NO proof yet) turns this game around, making closed-source compilers show their true face and allowing for really caring optimizations - not just "optimize for our latest card to look like we're not another 'Quack'".</rantish>

Brolingstanz
10-07-2007, 02:17 PM
<thinking aloud amongst friends>
yup an IL is good for everyone, except the IHVs... more work for them, less work for us. All things being equal, I tend to come down on my side, but by the same token I don't blame them for coming down on theirs.

Perhaps a more staggered, incremental approach for GLSL is in order, custom tailored to a few generations of hardware, with a minor face lift every few years or so rather than a complete overhaul every decade.

I see nothing in the GLSL spec that expressly prohibits an IL.

You want vendor independence, use DX. You want platform independence, use GL. It would be nice to have both with GL... certainly seems doable to me. Maybe I missed something.

</thinking aloud amongst friends>

Korval
10-07-2007, 04:24 PM
It really takes no more thinking than "What would you prefer; C or IA32 machine code, if you were to make some code run on any of a number of CPU's?". As it seems the plain C code case is out of the picture (as that'd make it "source code", and with pissing contests and secrecy we can't expect that) there's no way to fix that flaw. What remains is another compiled representation - though still ISA-independent.If your goal is to make your code run optimally anywhere, then your goal is not the secrecy of the source code. You can have multiple goals, but one has to have primacy. And if the goal is to be able to compile for any arbitrary architecture into an optimal form, then you're not going to achieve secrecy.

And if that's your goal, then you need a language that preserves high-level constructs. It must preserve function calls, high-level looping constructs (not merely a jump function), structs, etc.


A good (read: proper) intermediate representation would benefit all - even if some implementations had non-optimal optimizers to turn this into optimal code for their hardware.Perhaps, but such a thing is really just glslang. Maybe an easier-to-parse form of it (high-level assembly), but that's all the savings you're going to get.

The problem of interest in this thread is that programs with thousands of glslang shaders take minutes, in some cases hours, to start. We want to find a way that can decrease this time.

The intermediate language issue only came up as a possible solution to this problem, and it is only useful as a solution to that problem. An intermediate language that is high enough level to remain optimal in virtually all conditions will require approximately the same compile time as glslang. This no longer makes it a solution to the problem, and thus it becomes useless.

Which is the point. You can decrease compile time (on some hardware) by making a low-level shader language. But in doing so, you make it so that only certain hardware can compile the shader optimally.

In general, I would say that readback of a fully IHV-dependent binary blob is the 100% safest way to achieve faster compile times. While the initial compile will be slow, you can pretend that it is program install time. The concern is that later changes (swapping graphics cards, etc) will force a lengthy compile, and even a driver download can cause the system to want to recompile the shader.

The modification of this method to hold textual glslang code in addition to IHV-dependent data is an alternative that exists primarily to make the binary blobs IHV-neutral. That is, a blob written by driver X can be read in driver Y without problems, though it may recompile the shader. The only problem there is that it is not possible to know if the shader will actually be recompiled or whether it will be loaded precompiled.

The third alternative is to hope that IHVs will cache compiled versions of our shaders and load them when our string matches with one in the cache. The primary disadvantage is that it isn't really a solution, as it does not rely on GL spec-defined behavior (since spec-defined behavior begins and ends with a render context).

There are two things that these three alternatives have in common that the intermediate language one doesn't. First, they retain glslang's advantage with regard to optimization. Second, and most importantly, they have a chance of actually being implemented. The ARB is not going to create a brand new shader language, and they certainly are not going to go the full D3D route with shader model nonsense.

So the intermediate language "alternative" is simply idle fantasy.

knackered
10-07-2007, 06:22 PM
I simply want my shaders to build faster, therefore I'll take the unspec'd driver caching idea for now. They could even extend it so each IHV keeps an online-cache, the driver uploads the shader text to the server, the cache checks the hardware/driver config, scans the appropriate database, returns the binary blob. I'm going insane.

Zengar
10-07-2007, 10:57 PM
Originally posted by Korval:
The expression (where M and V are matrices):

M⁻¹ * V * M
... will be optimal in any IL that has matrix operations and types as part of the standard library.

And while it is true that GLSL is ultimately a high-level IL, it is not so optimal from the compiler's point of view (C clones are parser-unfriendly). Shaders are very small compared to real-world programs. I really believe that if one could skip parsing and basic analysis, folding them into the IL, it would speed up the whole thing.

P.S. I have a strong suspicion that Nvidia's drivers first compile to assembly via their Cg compiler and then compile that assembly again.

Korval
10-07-2007, 11:47 PM
will be optimal in any IL that has matrix operations and types as part of the standard library.If you transmute this into a form of assembly language, you get these opcodes, at the highest possible assembly-like level:

tempMat t1;
InvertMat t1, M;
MultMat t1, t1, V;
MultMat t1, t1, M;

I really believe that if one could skip parsing and basic analysis, folding them into the IL, it would speed up the whole thing. Sure. As I pointed out, you would get some performance improvement from it (though how much is entirely up for debate). However, as I also pointed out, it's not going to happen.

Zengar
10-08-2007, 12:11 AM
I was not speaking of assembly languages. Nor do I believe assembly languages are a good IR - because they separate code and data (variable mutability), which makes the dataflow difficult to analyse.


Instead of

Originally posted by Korval:



tempMat t1;
InvertMat t1, M;
MultMat t1, t1, V;
MultMat t1, t1, M;

I propose something like

@1 : Mat = InvertMat(M)
@2 : Mat = MultMat(@1, V)
@3 : Mat = MultMat(@2, M)

Here variables are assigned only once, so the compiler can create precise data/control-flow graphs - actually, the IL code would define a graph.
And if the compiler knows an optimisation (rewriting) rule of the form

MultMat(MultMat(InvertMat(X), Y), X) -> SomeOptimisation(X, Y) where X, Y are Mat

it can very effectively apply it by replacing the old part of the graph with the new one. The semantics won't change (referential transparency).


Even more, this would allow some optimisations (like dead code or common subexpression elimination) to be performed before feeding the back-end compiler - in the shader-to-IL conversion step.

Such a representation is very powerful; the only real "problem" is branching, but it may be solved in a rather efficient manner.
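
A rough sketch of that kind of graph rewrite, with the node kinds and the "SomeOptimisation" replacement kept as placeholders; the single-assignment property is what lets the pattern match compare operands by node identity:

#include <memory>
#include <string>

// Single-assignment expression node: a node *is* the value it computes.
struct Expr {
    std::string op;                 // "Var", "InvertMat", "MultMat", ...
    std::shared_ptr<Expr> a, b;     // operands (null for leaves)
    std::string name;               // leaf name, e.g. "M" or "V"
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr var(const std::string& n) {
    return std::make_shared<Expr>(Expr{"Var", nullptr, nullptr, n});
}
ExprPtr node(const std::string& op, ExprPtr a, ExprPtr b = nullptr) {
    return std::make_shared<Expr>(Expr{op, a, b, ""});
}

// Rewrite rule: MultMat(MultMat(InvertMat(X), Y), X) -> SomeOptimisation(X, Y),
// where "SomeOptimisation" stands in for whatever cheaper form the backend knows.
ExprPtr rewrite(ExprPtr e) {
    if (!e) return e;
    e->a = rewrite(e->a);
    e->b = rewrite(e->b);
    if (e->op == "MultMat" && e->a && e->a->op == "MultMat" &&
        e->a->a && e->a->a->op == "InvertMat" &&
        e->a->a->a == e->b)                    // same node => same value (SSA)
        return node("SomeOptimisation", e->b, e->a->b);
    return e;
}

// Usage: with M and V shared leaves,
//   auto M = var("M"); auto V = var("V");
//   auto optimised = rewrite(node("MultMat", node("MultMat", node("InvertMat", M), V), M));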

sqrt[-1]
10-08-2007, 12:38 AM
Zengar's suggestion coupled with user controlled binary dumps and I would be 100% happy.

Compile to "Zengar IL" to remove dead code, do all pre-processing and basic code folding. (solves most of my concerns about shoddy driver code parsing)

User controlled binary dumps would be great for loading speed.

PaladinOfKaos
10-08-2007, 09:03 AM
An IL shouldn't be necessary - I just want an asynchronous compile (so I can start a shader compile, then start loading other assets associated with that material), and I want the "linking" stage to be as minimal as possible for the hardware - it should just be a validation that VS output matches FS input on DX10-class hardware.

Binary dumps might be nice, but I'd rather handle the caching myself. Compile the first time it's used, and tag the dump with a hash of the renderer string. Heck, the first compile could be done during the install - there's lots of CPU idle time during that period of disk access.
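
A quick sketch of deriving such a tag: hash the real GL query strings plus the shader source, and use the result to name the cached dump (the dump mechanism itself is still hypothetical):

#include <GL/gl.h>
#include <cstdint>
#include <sstream>
#include <string>

// Tiny FNV-1a hash; any stable hash would do.
static uint64_t fnv1a(const std::string& s) {
    uint64_t h = 1469598103934665603ull;
    for (size_t i = 0; i < s.size(); ++i) {
        h ^= static_cast<unsigned char>(s[i]);
        h *= 1099511628211ull;
    }
    return h;
}

// Cache key that becomes invalid whenever the GPU, the driver or the source changes.
std::string shaderCacheKey(const std::string& shaderSource) {
    const char* renderer = reinterpret_cast<const char*>(glGetString(GL_RENDERER));
    const char* version  = reinterpret_cast<const char*>(glGetString(GL_VERSION));
    std::ostringstream key;
    key << std::hex
        << fnv1a(renderer ? renderer : "") << '-'
        << fnv1a(version  ? version  : "") << '-'
        << fnv1a(shaderSource);
    return key.str();   // e.g. used as the file name of the cached binary dump
}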

Korval
10-08-2007, 10:28 AM
I propose something likeThat's really no different from the assembly form I mentioned. Indeed, it's (slightly) more involved to parse, and the goal was to minimize parse time.


I just want an asynchronous compileWell, in GL 3.0, all object creation, except for templates, is asynchronous.


I want the "linking" stage to be as minimal as possible for the hardwareWell that's never going to happen. Even in regular C/C++ code, linking involves optimizations like inlining, global optimizations, etc. This only becomes more important when a couple of opcodes of wasted space can mean running acceptably and not (or not running at all).

PaladinOfKaos
10-08-2007, 12:33 PM
Originally posted by Korval:
Well that's never going to happen. Even in regular C/C++ code, linking involves optimizations like inlining, global optimizations, etc. This only becomes more important when a couple of opcodes of wasted space can mean running acceptably and not (or not running at all). Do any drivers actually do constant-propagation across shader stages? And is it really worth the hassle of not being able to easily mix'n'match various vertex and fragment shaders?

Zengar
10-08-2007, 12:47 PM
Originally posted by Korval:
That's really no different from the assembly form I mentioned. Indeed, it's (slightly) more involved to parse, and the goal was to minimize parse time.
Hm, no, not at all. First, my suggestion was not an assembly but a graph - there are no variables, only expressions. In other words, code = the data it produces. Second - it is not about the text form; as a matter of fact, forget the text form. The IL as I visualise it is a graph and should be stored in a compiler-friendly format - as a graph, for example ;) Thus, parsing and semantic analysis won't be needed, because of the graph's nature.

But of course, this is just theoretical talk :) If the ARB and/or hardware vendors don't want to make such a step I can't force them :) We know that IT is very slow in adopting new technologies - we still use programming languages that became morally obsolete several decades ago.

Korval
10-08-2007, 01:22 PM
And is it really worth the hassle of not being able to easily mix'n'match various vertex and fragment shaders?First of all, you're talking about two entirely different things.

To be more formal, in glslang shaders are linked into programs. Under GL 2.1, there is a single program that covers all programmable logic from the beginning of the pipeline to the end.

The first fact is (mostly) orthogonal to the second. This is proven by the fact that under GL 3.0, linked programs can serve as only part of the programmable pipeline. That is, you can have a separate vertex program that you bind to a context, and then a fragment program that you compiled/linked separately bound to the same context.

So, if what you're interested in is being able to use multiple vertex programs with the same fragment program, then you will have it in GL 3.0. But it has nothing to do with the actual concept of shaders and program linking. You will still have to compile shaders and link them into programs in 3.0; you will simply have the option of having a program cover only select pieces of the programmable pipe, instead of all or nothing.

And the time it takes for linking will only be affected to the extent that you will be linking perhaps fewer programs if you take advantage of the ability to have separate vertex, fragment, and potentially geometry programs.


First, my suggestion was not an assembly but a graph - there are no variables, but only expressions.You're talking about functional programming.

Of course, a GPU is not a functional programming device, nor does it follow the functional programming model. GPUs are imperative. Which means that any conversion from that model into an imperative model will take some time. Typically, it's going to require turning the functional graph into a sequence of imperative steps.

Which will almost certainly be slower than parsing and processing C. And at least in C, it is obvious how to do things like register allocation, etc. These are solved problems in compiler design. Converting a functional programming expression graph into a sequence of imperative steps is a non-trivial exercise.


The IL as I visualise it is a graph and should be stored in a compiler-friendly format - as a graph for exampleAnd graphs have a standard format. Right. :rolleyes:


we still use programming languages that became morally obsolete several decades ago."Morally obsolete"? Programs have morals now?

Brolingstanz
10-08-2007, 01:24 PM
The way I see it the vendors already have the frameworks in place to handle the IL that DX compilers give them, which is probably followed by a simple, fast, source-to-source translation.

For GL this would mean each IHV would have to produce some set of tools for offline compilation, something I'm sure they would relish ;-)

A 3rd party compiler would have to be trusted, else there would need to be a validation step during the IL hand-off, which might necessitate a complete recompilation or something equally stinky thus making the whole proposition quite ridiculous. You can't just hand the card a steaming pile in a trusted environment, could make the OS and the vendors look bad.

I think if this ever happens it happens by the hands of the IHVs, which means probably never as Korval has been hinting at with such subtlety :p

It would be quite cool nonetheless.

Zengar
10-08-2007, 01:41 PM
Uh, ok, I don't see the point of turning this thread into a discussion about programming languages. Still, if someone understood what I was talking about, I will count myself lucky.

Demirug
10-08-2007, 01:43 PM
Korval:
Well that's never going to happen. Even in regular C/C++ code, linking involves optimizations like inlining, global optimizations, etc. This only becomes more important when a couple of opcodes of wasted space can mean running acceptably and not (or not running at all).But on most modern OSes C/C++ supports dynamic linking. And link-time code generation is an advanced feature that was integrated only recently. Most of the time optimizations are only done at module level.

I am sorry that I am making a comparison to Direct3D again. Since Direct3D 8 there is only dynamic linking. You simply set the shaders and the runtime/driver links them on the fly. There are some interesting things a driver can do here if you have a multicore system.

knackered
10-08-2007, 02:26 PM
My hair's falling out.

Korval
10-08-2007, 02:34 PM
Still, if someone understood what I was talking about, I will count myself lucky.I understood what you're talking about. It is simply not useful towards the goal of making shader compilation/program linking faster, for the reasons outlined above.


But on most modern OSes C/C++ supports dynamic linking.Yes, and there are performance penalties for using it. Every function call across DLL boundaries dereferences a function pointer.

Granted, it is a small penalty, but the point is made.


Since Direct3D 8 there is only dynamic linking.I think you are confusing the same two concepts I described above.

D3D doesn't do any linking of any kind. There is one shader string that is converted into one program object (to use glslang terminology) that is bound to a single component (vertex, fragment, geometry, etc).

Glslang allows for shader linking. That is, you compile one or more shader strings into a shader object. You then link one or more of these shader objects into a program object.

GL 2.1 says that only one program object can be bound to a context, and therefore it must contain all of the code for all programmable components. That is, a program must cover vertex, fragment, and geometry entrypoints; if it does not, then no program can cover them at the same time in a context.

GL 3.0 allows multiple programs to be bound. That is, an independent program can cover vertex only, vertex and fragment, fragment and geometry, or any combination thereof. You can bind multiple programs so long as their coverage does not overlap.

This last part is not linking. It has nothing to do with linking. Linking is the process of taking compiled shader objects (analogous to .obj files) and combining them into a single program object. How program objects interact with the context is entirely irrelevant to how linking works.

So, in summary, this is linking (in pseudo-code):


ShaderObject obj1 = glCompileShader(&strSomeString, 1);
ShaderObject obj2 = glCompileShader(&strOtherString, 1);
ProgramObject progObj = glLinkPrograms(obj1, obj2);

This is independent binding of vertex/fragment/etc. programs:


ProgramObject vertexProgram = CreateVertexProgram();
ProgramObject fragmentProgram = CreateFragmentProgram();
glBindToContext(GL_VERTEX_PROGRAM_BIT, vertexProgram);
glBindToContext(GL_FRAGMENT_PROGRAM_BIT, fragmentProgram);

GL 3.0 will allow this.

Any questions?

Lindley
10-08-2007, 04:24 PM
That looks similar, at least on an API level, to what Cg does already.

I suppose it might do the glLinkProgram step "under the hood", so to speak.

Hampel
10-09-2007, 12:34 AM
@Korval: what is the advantage (in OGL 3.0) of having the possibility to combine/attach several (compiled) shaders into a single program object? Why not treat each shader as a separate program? For now, you only have 3-4 different program types (vertex, geometry & fragment, and in the near future probably blending).

Overmind
10-09-2007, 02:39 AM
Which will almost certainly be slower than parsing and processing C. And at least in C, it is obvious how to do things like register allocation, etc. These are solved problems in compiler design.Most compilers translate the source to a functional intermediate representation (e.g. a control flow graph in SSA form), then optimize, allocate registers, and transform back to an imperative representation for code generation.

A functional graph-based representation is actually easier to compile than an imperative bytecode. But that being said, the translation from an imperative bytecode to SSA form takes only a small fraction of the compile process, so it really isn't worth it.

MZ
10-09-2007, 09:50 AM
Originally posted by Korval:
You're talking about functional programming.

Of course, a GPU is not a functional programming device, nor does it follow the functional programming model. GPUs are imperative. Which means that any conversion from that model into an imperative model will take some time. Typically, it's going to require turning the functional graph into a sequence of imperative steps.

Which will almost certainly be slower than parsing and processing C. And at least in C, it is obvious how to do things like register allocation, etc. These are solved problems in compiler design. Converting a functional programming expression graph into a sequence of imperative steps is a non-trivial exercise.Just for the sake of correctness, none of the above is true.

Korval
10-09-2007, 12:20 PM
what is the advantage (in OGL 3.0) of having the possibility to combine/attach several (compiled) shaders into a single program object?Each shader doesn't have to be a complete piece of a programmable component. It's the same reason you don't necessarily put all of your C/C++ code in one file.

A shader can be a set of utility functions that are included in every program you link. You then let the linker's dead-code elimination remove the stuff it doesn't need. And, in a reasonable implementation of this construct, you won't need to do the compilation work multiple times.
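
A minimal sketch of that usage pattern with the standard GL 2.0 entry points (the helper names in the GLSL strings are invented, and error checking is omitted):

#include <GL/glew.h>   // any loader exposing the GL 2.0 shader API will do

// Shared "library" shader: no main(), just helpers attached to many programs.
static const char* libSrc =
    "vec3 toLinear(vec3 c) { return c * c; }\n"
    "float luminance(vec3 c) { return dot(c, vec3(0.299, 0.587, 0.114)); }\n";

static const char* fragSrc =
    "uniform sampler2D tex;\n"
    "vec3 toLinear(vec3 c);\n"   // prototype; the definition lives in libSrc
    "void main() {\n"
    "    vec3 c = toLinear(texture2D(tex, gl_TexCoord[0].xy).rgb);\n"
    "    gl_FragColor = vec4(c, 1.0);\n"
    "}\n";

static GLuint compile(GLenum type, const char* src) {
    GLuint s = glCreateShader(type);
    glShaderSource(s, 1, &src, 0);
    glCompileShader(s);
    return s;
}

GLuint buildProgram(GLuint vertexShader) {
    GLuint lib  = compile(GL_FRAGMENT_SHADER, libSrc);  // compiled once, attachable anywhere
    GLuint frag = compile(GL_FRAGMENT_SHADER, fragSrc);
    GLuint prog = glCreateProgram();
    glAttachShader(prog, vertexShader);
    glAttachShader(prog, lib);
    glAttachShader(prog, frag);
    glLinkProgram(prog);   // link resolves toLinear(); unused helpers can be dropped here
    return prog;
}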


But that being said, the translation from an imperative bytecode to SSA form takes only a small fraction of the compile process, so it really isn't worth it.That kinda begs a question.

If we have the following compiler stages:

1) Compile source into SSA
2) Perform dead-code removal/inlining
3) Convert SSA to machine code

Where is the glslang compiler performance going? I mean, is compiling a C-like language into SSA-form really that time consuming? Even my fairly old computer can compile a several-thousand line .cpp file in less than a second, and it's doing optimizations, inlining, and all sorts of other stuff. Obviously if I start instantiating a bunch of templates, it takes longer, but straight C++ is pretty fast in terms of compilation. And glslang is much simpler.

Is it the dead-code removal?

Or is it simply that IHVs haven't prioritized the performance of their glslang compilers? I mean, we all know about nVidia's silly "double-compile" in their glslang implementation. But is it simply that all IHVs have one part-timer working on their glslang compiler, such that after 2-3 years they still don't have decent implementations or compiler performance?

Maybe we should find some way of putting pressure on the IHVs. I mean, GL 3.0 has to do that, since glslang is required to do anything. But GL 3.0 adoption will be impacted by the quality of 3.0 implementations. And the quality will be impacted by the adoption. Etc.


Just for the sake of correctness, none of the above is true.Such a fine argument you have made :rolleyes:

If what you say is true, then GPUs are in fact functional programming devices. Also, doing register allocation in C-style parsing is not a solved problem. And what Zengar was talking about is not functional programming.

Zengar
10-09-2007, 12:48 PM
Actually I was not going to comment on it, but...


Originally posted by Korval:

If what you say is true, then GPU's are in fact functional programming devices.
GPU shaders are more "functional" than imperative. A shader is a pure function with no side effects at all. Modern GPUs are of course imperative, but we were not talking about GPUs, we were talking about graphics (and shading languages).


Originally posted by Korval:

Also, doing register allocation in C-style parsing is not a solved problem.
No idea what parsing of C-style code has to do with register allocation. But yes, register allocation is a much-researched topic and is, as a matter of fact, very effective with the SSA-like form I was talking about.


Originally posted by Korval:

And what Zengar was talking about is not functional programming. No it was not. I was talking about declarative (at least something like it) programming. A feature that actually marks functional programming is the presence of higher-order functions.

Overmind
10-10-2007, 04:33 AM
Where is the glslang compiler performance going?I can't say for sure for the GLSL compiler, but generally most of the compile time is spent on various optimizations.

For example register allocation can be a very expensive operation, depending on how many registers you have and how "optimal" the solution should be. Also, there may be other optimizations going on, like arithmetic transformations, various forms of code motion, or transformation of a sequence of simple operations into more complex operations (vectorization, ...).

I don't know if current GLSL compilers are doing any of these, but all of this can be very costly.


Even my fairly old computer can compile a several-thousand line .cpp file in less than a second, and it's doing optimizations, inlining, and all sorts of other stuff.Personally I never experienced any problem with shader compilation time, but I can see that when compiling some thousand shaders every millisecond becomes important. Especially when you can't load all shaders at application startup, and can't accept any variation in framerate (e.g. VR applications, where users get sick when you have too much lag :p ).

knackered
10-13-2007, 03:32 PM
I was just wondering, and seeing as though this topic has morphed into a discussion of GLSL compilers, could I get a general consensus on something?
Can we completely rely on the 3 main GLSL compilers (nv,ati,intel) removing functions/variables/uniforms/varyings that don't contribute to the final output?
Is there something in the spec, or is it something anyone's observed to always be true in practice?

Jan
10-13-2007, 04:07 PM
If i learned anything important about OpenGL in the past years, it is not to rely on ANYTHING and not to be surprised about the weirdest behavior. I fear that with such a complex part of OpenGL (GLSL) all your hopes are in vain.

Jan.

knackered
10-13-2007, 06:49 PM
thanks for your reply, but no - I know the nvidia driver definitely removes uniforms that don't contribute to the final output (because you can't retrieve their location in your app), but I was wondering if anyone had maybe benchmarked a shader that calls a hugely complex function that doesn't contribute to the output, and noticed no difference between a build with or without it. I'd like some feedback for hardware other than nvidia, to be honest. If needs be, I'll have to run another pre-processing pass over my shader code to remove such things (variables and functions that either aren't used or don't contribute to the output in some other way). I don't want to write that pre-processing step, but if I have to I will.
P.S. I know you can't rely on anything, which is why you should go off the mean average approach adopted by the 3 main IHV's. It's all you can do. I think it would be madness for a driver not to do this optimisation - if you've gone to the trouble of writing a compiler, this would be a trivial thing to add.
Having said all that, it's occurred to me that they must work backwards from the outputs as they compile to assembly, so it would be a natural side-effect of this process to eliminate dead code and variables.

Lindley
10-14-2007, 08:21 AM
I assume you're using generic shader code which will be compiled to a number of more specific shaders, where various portions may or may not be used?

I've heard that shaders can understand #ifdefs, although I haven't tested that myself. And it might only be Cg. You'd have to experiment. If so, that would be the easiest way to get guarantees.

knackered
10-14-2007, 11:36 AM
Yes I know about #ifdef's, and yes they can be used in GLSL along with #if's, #elseif's, and #define's.
There are various reasons why #ifdef's will be a bit impractical for what I have planned - and what I have planned will be very, very cool.

Korval
10-14-2007, 12:32 PM
Can we completely rely on the 3 main GLSL compilers (nv,ati,intel) removing functions/variables/uniforms/varyings that don't contribute to the final output?Well, we can probably assume they will. With all the inlining and so forth that compilers have to do, it'd be pretty hard for them to include functions that aren't called.

Now, what constitutes "contribute to the final output" is probably going to be something of a problem. If you include a function you don't call, I'd be willing to assume that it will be culled. But if you include a function that you call, but through whatever logic you use doesn't actually contribute to the output, I imagine it will not be culled.

knackered
10-14-2007, 02:46 PM
Good point - everything's inlined on all hardware up till now, isn't it? Even if they aren't, if I don't call them, all they're costing is memory and enable speed, I suppose... what am I worried about?

There's not going to be any attrib/variable/uniform driven conditionals, I'm pretty sure I'll be avoiding them for a good while yet.
But maybe yes it would be a problem if some getScale() function always returns 1.0 because of a #define - I can't realistically expect the compiler to optimise out a statement that multiplies by the result of a function that always returns 1.0. Or could I?
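
For reference, the case in question looks like the snippet below (whether a given driver folds the multiply away after inlining is exactly the implementation-dependent part; the names here are made up):

static const char* fragSrc =
    "#define BAKED_SCALE 1.0\n"
    "uniform sampler2D tex;\n"
    "varying vec2 uv;\n"
    "float getScale() { return BAKED_SCALE; }\n"
    "void main() {\n"
    "    // after inlining this is 'colour * 1.0'; a folding compiler reduces it to 'colour'\n"
    "    gl_FragColor = texture2D(tex, uv) * getScale();\n"
    "}\n";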

Jan
10-14-2007, 03:57 PM
What i meant earlier (not to rely on anything) was targeted at your question about a consensus. I am pretty sure, that ATI's and nVidia's compilers do a lot to optimize the hell out of the shaders. However, i am also sure, that there are big differences in how well they do it. For example nVidia's implementation might optimize one shader very well, whereas ATI might optimize another shader much better. You can expect many optimizations, you just cannot expect all compilers to optimize the same piece of shader code equally well.

A good compiler, especially one that inlines everything anyway, should have no trouble removing your getScale function. In the end, it will just replace your function call with the function's body, and in the next step it will detect that you multiply by 1.0 and remove that, too.

So, yes, you can expect such optimizations. But as long as ATI and nVidia don't release some papers about what their compilers are actually capable of, and thus TELL you what you are allowed to expect, i wouldn't count on it.

I don't have any experience with Intel. Their drivers might suck, but since they do sell their own C++ compiler, they actually might have a very good GLSL compiler, too. Might.

Jan.

knackered
10-14-2007, 04:47 PM
Mmm, yes, there should be something in the spec about this basic optimisation stuff, because it affects how you write a shader framework. I'm not talking about using half-floats or collapsing statements into one or anything, just the redundant code removal should be a requirement.

sqrt[-1]
10-14-2007, 08:46 PM
Arr yes, stuff that a nice intermediate language spec would provide. (Korval in 3...2...1..)

I have actually seen the HLSL compiler do some really nice optimizations.

For example (pseudo-code):

vec4 GetNormalMap()
{
//Texture lookup
}

vec4 SomeFunction(float a)
{
...
...
...
return GetNormalMap() * a;
}

vec4 main()
{
return SomeFunction(1)+
SomeFunction(2);
}

The HLSL compiler will not insert the "SomeFunction" texture lookup code twice.
It seems that if an operation only involves uniforms/varyings/texture samplers, and if you call it more than once, the HLSL compiler will cache the result from the first call (when possible) and use it in later calls.

Yes I realize you could do these optimizations yourself, but it typically makes the code look a lot more ugly. (especially if your shaders are huge)

Also in reply to the previous example, the HLSL compiler is smart enough to realise that a

vec4 GetNormalMap()
{
return vec4(1.0);
}

ultimately results in a no-op, and does not insert the code.