Are low level shaders better?
08-21-2006, 08:11 AM
Do you find that low level shaders perform better than GLSL? In some cases, they can give 2 times higher FPS but I haven't done much testing. I would like to hear from you.
Also, I suspect that when fixed function and a shader do the same thing, fixed function might give better performance, since it's tuned in the drivers.
08-21-2006, 08:25 AM
About the first question - that truly depends on many, many cases.
As for me, I'm not used to dealing with GLSL. I use Cg for offline compilation and ARB_fragment_program/ARB_vertex_program with the nVidia OPTION string to load exactly the shader that is most appropriate for the platform currently running. Any optimization can then be done separately, if that shader really is the bottleneck.
But I don't think GLSL can compile better than I can write the same low-level shader, just because of my experience with it. Sometimes you know there is an instruction that the compiler does not take into account (XPD, DPH, SSG and so on).
08-21-2006, 09:53 AM
One of the things that I've found, which is specific to NVidia, is that I can't noticeably beat the GLSL compiler. I can tie with it, but that's all I've managed to do. Here's what I'm doing so you can get a better picture of how the GPU is being used.
Usually, I have 8 textures bound for normal light rendering. This includes 4 textures from the material (bump, normal, diffuse, specular), 1 shadow buffer, 1 light mask, 1 projected texture, and 1 normalization cubemap. Most lighting models are fairly simple, but my most complex one compiles to about 80 instructions. Most of my tests were with the more complex ones, and my test was simply an assembly fragment program using ARB_precision_hint_fastest and a GLSL shader using the half types exposed by the NVidia GLSL compiler. After some pretty exhaustive tests, I found that I couldn't beat out the GLSL compiler. The best I did was tie it. However, I will say that if I dumped out the assembly, I would sometimes get more instructions from the GLSL shader. However, more instructions doesn't translate directly into worse performance. My guess is that the output was simply tailored to NVidia's best performing instruction usage.
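To give a feel for what the half types and the fastest-precision hint trade away, here is a minimal sketch that rounds a few values through a 16-bit float. It assumes IEEE binary16 (Python's `struct` format `e`) as a stand-in for the GPU's half type; the helper name `to_half` is made up for illustration, and actual hardware precision may differ.

```python
import struct

def to_half(x):
    """Round a Python float to the nearest 16-bit float and back (assumed stand-in for 'half')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

for value in (0.5, 0.1, 1.0 / 3.0, 100.3):
    h = to_half(value)
    print(f"{value!r:>22} -> {h!r:<22} error {abs(h - value):.2e}")
```

Values with a short binary expansion such as 0.5 survive unchanged, while others pick up a small rounding error, which is usually acceptable for colors and normalized vectors.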
Also, I have yet to do any performance tests on ATI hardware, but they seem very confident in their GLSL compiler. That said, I'll probably see similar results on ATI cards.
All this said, I now believe GLSL is the better option. The reason is that the assembly shaders I've written were hand optimized for the card that was in my machine at the time (NVidia). I would bet that an ATI card would run those shaders slower than if they were hand tuned for ATI. However, GLSL takes care of this and compiles down to the best shader for the given architecture (at least in theory, as well as in practice from what I've seen thus far). That said, I'm a firm believer in GLSL now.
One thing to note, I will generally prototype shaders in assembly. This lets me see where my clock cycles are going and lets me define how a shader should work in a way that is friendly for the target hardware. Then I do a straightforward port to GLSL, dump out the assembly, and compare it with my prototype. This lets me catch performance issues that wouldn't have been obvious otherwise. For instance, a matrix-vector multiply translates either into 4 DP4s or into a transpose (several MOVs) followed by 4 DP4s. Seeing these things in assembly is a great way to catch these problems (I've solved things like this by transposing the matrix on the CPU, then reversing the order of the operands in the matrix-vector multiply in the GLSL shader).
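The transpose trick relies on the identity M * v = v * M^T. A minimal sketch of that identity, with made-up helper names and an arbitrary example matrix (which operand order needs the transpose in a real shader depends on how the matrix is stored, so treat the direction as an assumption):

```python
def dot4(a, b):
    """One DP4: a four-component dot product."""
    return sum(x * y for x, y in zip(a, b))

def transpose(m):
    """Swap rows and columns; done once on the CPU."""
    return [list(col) for col in zip(*m)]

def mat_times_vec(m, v):
    """Column-vector convention: result[i] = dot(row i of m, v), i.e. four DP4s."""
    return [dot4(row, v) for row in m]

def vec_times_mat(v, m):
    """Row-vector convention: result[j] = dot(v, column j of m)."""
    return [dot4(v, col) for col in zip(*m)]

M = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0, 12.0],
     [0.0, 0.0, 0.0, 1.0]]
v = [1.0, 2.0, 3.0, 1.0]

M_T = transpose(M)
# Same result either way, so the shader can use whichever order avoids the extra MOVs.
assert mat_times_vec(M, v) == vec_times_mat(v, M_T)
print(mat_times_vec(M, v))
```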
08-21-2006, 10:24 AM
Originally posted by V-man:
Also, I suspect that when fixed function and a shader do the same thing, fixed function might give better performance, since it's tuned in the drivers.
Fixed function is often the same thing as shaders, only it's hand tuned for the specific card, like it is when you compile it with GLSL, only slightly better.
So today there is no real difference between fixed function, assembly shaders and GLSL if they do the same thing.
Generally speaking, fixed function, assembly, and GLSL all use the same underlying hardware on most modern GPU implementations today.
What's different is the software path that generates the underlying machine-specific microcode.
Typically fixed function is *very* well optimized. Assembly and GLSL tend to be in the same ball park today as their levels of abstraction are not all that different. Especially if you're writing "equivalent" shaders in both (that is, relatively simple).
Over time, as shaders and shading language features get more complex, I expect the cost of compilation and linking to become the bigger factor.
The language you choose should probably be more about portability and content creation tool integration. Rely on offline compilation to boil it down to efficient code.
08-21-2006, 02:07 PM
Do you mean that even an assembly fragment program is "tuned up" after loading? I'm not talking about precision_hint_fastest; I mean tuning in a more complex way, like replacing instructions and so on.
Yes, even the assembly profiles do register allocation and various other optimizations to avoid hazards, maximize parallelism, and generally improve perf.
GPU shader microarchitectures are too different (today) and perf matters too much to assume that any portable shader description can be translated into executable code as-is with no optimizations.
I think offline tools and an opaque binary shader loading interface like OpenGL ES has is the closest you'll get to knowing exactly what the compiled shader microcode really looks like.
08-21-2006, 07:11 PM
If GLSL and assembly shaders gave identical and ideal performance, I would be ok with it, but losing even a bit due to GLSL sucks because there is no room to spare.
I like portability but I don't like the idea of some extra MOV
About the first question - that truly depends on many, many cases.
I know. I think the loss could be anywhere from 0% to PLENTY.
With NV you might be able to just use Cg and make use of their NV extensions but ATI is another story.
08-22-2006, 12:26 AM
Originally posted by cass:
Assembly and GLSL tend to be in the same ball park today as their levels of abstraction are not all that different.
This is true, and frequently ignored. ARBfp doesn't match the underlying hardware any better than GLSL. The difference is that ARBfp is a simpler interface, thus had shorter time to market.
08-22-2006, 02:49 AM
Plus, there's just no beating high level shaders for prototyping. I wouldn't go back to the ARB*p stuff if you paid me. But if in the end you determine that there's a significant difference in performance, you could always revert to another form before you ship. I imagine higher level optimizations are going to make a far bigger difference, in the days that follow.
Originally posted by Humus:
Originally posted by cass:
Assembly and GLSL tend to be in the same ball park today as their levels of abstraction are not all that different.
This is true, and frequently ignored. ARBfp doesn't match the underlying hardware any better than GLSL. The difference is that ARBfp is a simpler interface, thus had shorter time to market.
Note, this is why I advocate simpler interfaces for programmable hardware. Improves time to market, and lets the software layer address the language aesthetics and tools integration issues.
This investing in a software layer above the driver isn't necessarily an easy transition for OpenGL to make, but it's a worthwhile one, I think.
08-26-2006, 12:53 PM
Here is one example
you may need glew32.dll, glut32.dll
R9700, Cat 6.8, assembly, 400FPS
R9700, Cat 6.8, GLSL, 200FPS
and I recently changed the GLSL part to use glVertexAttrib and now it's even worse
R9700, Cat 6.8, GLSL, 40FPS
Can it get any worse?
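FPS figures are easier to compare as time per frame, since frame time scales linearly with the work done. A small sketch using only the three numbers reported above (the helper name is made up):

```python
def frame_time_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps

# FPS figures as reported for the R9700 with Catalyst 6.8.
results = {
    "assembly": 400.0,
    "GLSL": 200.0,
    "GLSL + glVertexAttrib": 40.0,
}

for name, fps in results.items():
    print(f"{name}: {frame_time_ms(fps):.1f} ms per frame")
```

Seen this way, the GLSL path costs 2.5 ms more per frame than assembly, and the glVertexAttrib variant costs a further 20 ms on top of that.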
08-26-2006, 05:25 PM
ARBfp: 11 ALU, 4 TEX
GLSL: 10 ALU, 4 TEX
ARBfp: 3 ALU
GLSL: 3 ALU
GLSL comes out as the winner. If there's a performance issue here it lies elsewhere. Also worth noting is that writing good code will always be more important than language. You can cut 3 instructions from both the ARBfp and GLSL code of Glass by using an interpolator instead of gl_FragCoord.xy * InvTex0Dimensions.
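The interpolator suggestion works because a screen-space coordinate is linear across the screen, so a value computed at the vertices and interpolated matches the per-fragment multiply. A minimal one-dimensional sketch, assuming a full-screen quad, non-perspective interpolation, and a made-up width of 640 pixels:

```python
def uv_per_fragment(x, inv_width):
    """Per-fragment math, in the spirit of gl_FragCoord.x * InvTex0Dimensions.x."""
    return (x + 0.5) * inv_width  # pixel centers sit at x + 0.5

def uv_interpolated(x, width, uv_left, uv_right):
    """What an interpolator delivers: a linear blend of per-vertex values."""
    t = (x + 0.5) / width
    return uv_left + t * (uv_right - uv_left)

width = 640  # hypothetical render target width
for x in range(width):
    a = uv_per_fragment(x, 1.0 / width)
    b = uv_interpolated(x, width, 0.0, 1.0)
    assert abs(a - b) < 1e-9  # same coordinate, no per-fragment multiply needed
```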
08-26-2006, 10:11 PM
For the ASM version
Program native instructions = 15
11 ALU and 4 TEX
Program native instructions = 3
So we are in agreement. How do you know about GLSL?
I know the code sucks because it uses immediate mode and I have glGetError everywhere, but that shouldn't be a problem.
I even changed from using a mat3 to mat4 in the glass VS
08-26-2006, 11:08 PM
I know the code sucks because it uses immediate mode and I have glGetError everywhere, but that shouldn't be a problem.
And exactly why not?
I'm not an IHV, so I can't be sure, but I wouldn't be surprised if little-to-no effort was expended to make immediate mode and glslang shaders work well together. Take the 10 minutes to change over to VBOs just to make sure.
08-27-2006, 03:46 PM
Originally posted by V-man:
How do you know about GLSL?
Using an internal tool.
Originally posted by V-man:
I even changed from using a mat3 to mat4 in the glass VS
I see no point in doing that. In the worst case that could mean another instruction for the last line. In your case though this seems to be optimized away since you're just adding on a zero in the end. I tried changing it back to mat3 and changing some vectors that really are scalars back to scalars. It didn't make any difference in this shader though, 34 instructions in both cases.
08-27-2006, 11:41 PM
I see no point in doing that. In the worst case that could mean another instruction for the last line. In your case though this seems to be optimized away since you're just adding on a zero in the end. I tried changing it back to mat3 and changing some vectors that really are scalars back to scalars. It didn't make any difference in this shader though, 34 instructions in both cases.
When I changed from mat3 to mat4, I was thinking in terms of what happens in the driver.
If the driver has to upload vec4, then it has to expand the matrix to a 4x4 anyway.
If it can optimize that out, then it's good because it has to also look at the FS.
I would like to switch to VBOs and, who knows when, get my complicated renderer working in my real app. I suspect that generic vertex attrib (GVA) sucks in some way.
Tracing Doom 3 shows that they do use GVA and the non-GVA in parallel.
-Not sure why they use glBlendFunc(GL_ONE,GL_ZERO);
Isn't that like disabling blending?
-glDrawElements but not glDrawRangeElements
glDrawElements(..,. UNSIGNED_INT,.....); only
-Consecutive calls to
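On the glBlendFunc(GL_ONE, GL_ZERO) question: with the default additive blend equation the result is src * 1 + dst * 0, which is just the source color, and GL_ONE/GL_ZERO are also the initial blend factors. A small sketch of that arithmetic, with made-up colors and clamping ignored:

```python
def blend(src, dst, src_factor, dst_factor):
    """Additive blend equation: result = src * src_factor + dst * dst_factor."""
    return tuple(s * src_factor + d * dst_factor for s, d in zip(src, dst))

ONE, ZERO = 1.0, 0.0  # stand-ins for GL_ONE and GL_ZERO

src = (0.2, 0.4, 0.6, 0.5)  # incoming fragment color (RGBA), made up
dst = (0.9, 0.1, 0.3, 1.0)  # color already in the framebuffer, made up

# The framebuffer contents drop out entirely, as if blending were disabled.
assert blend(src, dst, ONE, ZERO) == src
```

So the call most likely resets the blend state to its default rather than producing a visible effect.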
08-28-2006, 12:20 PM
It won't expand to 4x4, but possibly to 4x3, depending on how you see it. It will of course use three different constants, but the last component may be used for some other scalar. I don't really see any benefit to use 4x4 when you can use 3x3 on the driver side either.
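A rough sketch of that packing, assuming four-component constant registers: each row of the 3x3 matrix takes the xyz of one register, and the spare w slot can carry an unrelated scalar. The function names and values are made up for illustration.

```python
def pack_mat3(rows, extras=(0.0, 0.0, 0.0)):
    """Pack a 3x3 matrix into three 4-component registers; w holds an unrelated scalar."""
    return [list(row) + [extra] for row, extra in zip(rows, extras)]

def dp3(reg, v):
    """DP3 only reads xyz, so whatever sits in w does not affect the result."""
    return sum(reg[i] * v[i] for i in range(3))

def transform(regs, v):
    """3x3 transform as three DP3s against the packed registers."""
    return [dp3(reg, v) for reg in regs]

rows = [[1.0, 0.0, 0.0],
        [0.0, 2.0, 0.0],
        [0.0, 0.0, 3.0]]
regs = pack_mat3(rows, extras=(0.5, 0.25, 8.0))  # three scalars ride along in w

print(regs)
print(transform(regs, [1.0, 1.0, 1.0]))
```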
As for generic vertex attribs, that should not be a problem. For immediate mode I'm not sure how optimized that is, but for vertex arrays it should be fast, unless you're using a format that's not natively supported (like 3 * GL_UNSIGNED_BYTE).
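If a three-byte attribute does turn out to be the slow case, one common workaround is to pad it to four bytes on the CPU before uploading. A minimal sketch, assuming that 4 * GL_UNSIGNED_BYTE is natively supported on the target hardware; the function name is made up:

```python
def pad_rgb_to_rgba(data, alpha=255):
    """Expand tightly packed 3-byte attributes to 4 bytes each by appending a pad byte."""
    if len(data) % 3 != 0:
        raise ValueError("expected a whole number of 3-byte elements")
    out = bytearray()
    for i in range(0, len(data), 3):
        out += data[i:i + 3]
        out.append(alpha)  # fourth component, also keeps each element 4-byte aligned
    return bytes(out)

colors = bytes([255, 0, 0,   0, 255, 0])  # two RGB colors, made up
padded = pad_rgb_to_rgba(colors)
print(list(padded))
```

The padded array costs a third more memory, which is usually a fair price for staying on the fast path.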
10-01-2007, 10:27 AM
Well, more than a year has passed by since this conversation took place. What are the general opinions nowadays about this topic?
ASM vs GLSL performance.
10-04-2007, 10:54 AM
Try it in the shader analyser:
(Nvidia has a similar tool, I think)
Normally, GLSL will be compiled to the same microcode as an equivalent ARB_FP program. If not, you can usually optimize your GLSL code until it is.
I decided to switch to GLSL...
10-19-2007, 02:13 PM
Whatever suits you. But I hope there will be continued support for lower level languages.
For the use we make of shaders in our engine, the "assembly" languages are absolutely essential, crucial, fundamental, vital. If we were stuck with GLSL our engine would run at half the speed and do half the things it does. In our case, the asm stuff is vastly superior but due to the structure of the language and not how it compiles. Sorry that I can't disclose more.
10-19-2007, 04:24 PM
how do you know your engine would run at half the speed?
I can well believe it though, I still can't write a GLSL shader that's as fast as the 'fixed function' path (which is apparently a shader as well). But I really can't be bothered going back to vp/fp asm.
I assume you generate your shaders procedurally and maybe do your own optimizations? If so, I think it could be done just as well with GLSL. Whether GLSL compilers are good enough to make it equally fast is another question.
10-20-2007, 04:23 AM
Well, I am of course exaggerating, for dramatic effect and to satisfy my lousy sense of humour.
It's the particular use of procedurally generated shaders that gives us the performance boost. It could be done in GLSL but I wouldn't want to use it because it's more cumbersome, slower at compiling (done on the fly, on demand) and structurally not at all as well suited at least to my implementation.
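For readers who haven't seen the technique, here is a purely hypothetical sketch of generating an ARB fragment program on demand. It is not the engine described above, whose details are not disclosed; the function name and the feature choices (texture count, color modulation) are invented for illustration.

```python
def build_fragment_program(num_textures, modulate_by_color=True):
    """Emit ARB_fragment_program text that multiplies together num_textures 2D textures."""
    if num_textures < 1:
        raise ValueError("need at least one texture")
    lines = ["!!ARBfp1.0", "TEMP acc, texel;"]
    lines.append("TEX acc, fragment.texcoord[0], texture[0], 2D;")
    for unit in range(1, num_textures):
        lines.append(f"TEX texel, fragment.texcoord[{unit}], texture[{unit}], 2D;")
        lines.append("MUL acc, acc, texel;")
    if modulate_by_color:
        # Only emitted when the material actually needs it, so no wasted instruction.
        lines.append("MUL acc, acc, fragment.color;")
    lines.append("MOV result.color, acc;")
    lines.append("END")
    return "\n".join(lines)

print(build_fragment_program(2))
```

Because the program is plain text assembled from short fragments, building one per material on the fly is cheap on the application side, and only the instructions a given material needs end up in it.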
We still have preloaded shaders (I would get rid of them) that could just as well be GLSL but I personally find vp/fp more intuitive because I find them better aimed at the purpose. From my experience, I would say high level language features should stay the hell away from shaders until GFX hardware is about a hundred times as fast as it is today, then they would either allow us to do whatever we will be doing then or else simply still waste performance.
The language you use affects what optimisations you can make. I have written the same functions in 3 considerably different languages and found that I completely restructure everything depending on the instructions/operators I have available. I still occasionally write multi-page blocks of CPU asm!
Cass said above that fixed function is *very* well optimised and I assume that is beyond our means as developers. Annoying as sometimes I would change only so little...
I'm hoping OGL3 will be much more adequate than these languages but I haven't taken the time to really get informed.
10-20-2007, 08:09 AM
Of course madoc, you are going to be missing out on any structural optimisations made in future hardware. It's much easier for an optimiser to figure out what you're trying to do with a higher level language.
But for the here and now, you probably don't care a great deal.
10-20-2007, 11:55 AM
My experience is that no IHV has ever had the faintest clue what *I* want to do, damn it. They seem to know what Carmack wants to do, but I want to do something else :) .
The thing is, HLSLs will just give you some high level function, you use that and hope that someone hideously smart implemented some compiler magic that makes that your best choice.
With a low level language, as you are implementing, you notice things like: Hmm.. what if I use this instruction instead, and by using these components of this register there, and keeping that result here... Hey! I just saved 4 instructions!
You do that by clever use of language features that go beyond its intended purpose; high level languages are all purpose and no features.
Of course, if the language you are using has nothing to do with what's going on at the hardware level then you achieve nothing more than deluding yourself, but to this day my profiling of vp/fp has shown this is obviously not the case (at least not yet). Given the future as I foresee it, I don't see vp/fp becoming quite so obsolete in terms of likeness to the hardware; in fact, GLSL would be just as obsolete in those terms.
Oh, I do realise the at least potential advantages of HLSLs, don't mistake me. But so far I have always had better luck optimising myself and vp/fp fulfill my needs and have other qualities that suit me well.