ARB_Fragment_Shader



Zak McKrakem
09-20-2002, 10:42 AM
It has been approved, as you can see on www.opengl.org:
http://oss.sgi.com/projects/ogl-sample/registry/ARB/fragment_program.txt

Sorry, the topic should be ARB_Fragment_Program

[This message has been edited by Zak McKrakem (edited 09-20-2002).]

Asgard
09-20-2002, 12:11 PM
Now that was quick. And I just wrote in another post that it'll take some more time to get the spec done. Good job! Now let's read through that monster ;-)

IT
09-20-2002, 12:15 PM
Very well then. Good weekend reading.

PH
09-20-2002, 01:11 PM
That's great. I noticed another new extension:
http://oss.sgi.com/projects/ogl-sample/registry/ATI/text_fragment_shader.txt

Is this extension for the 8500? I haven't looked at it yet, but maybe ATI can answer faster than I can read :)

Asgard
09-20-2002, 01:29 PM
Is this extension for the 8500?

Yes, it's basically a redefinition of ATI_fragment_shader with a textual instead of a procedural interface.

Dan82181
09-20-2002, 01:34 PM
ATI_text_fragment_shader is just ATI_fragment_shader using strings, a la NV_vertex_program/ARB_vertex_program. It doesn't appear to add any new functionality or abilities, though :(

Dan

davepermen
09-20-2002, 01:41 PM
Originally posted by Dan82181:
ATI_text_fragment_shader is just ATI_fragment_shader using strings, a la NV_vertex_program/ARB_vertex_program. It doesn't appear to add any new functionality or abilities, though :(

Dan

think about it. it's in hw, built from transistors. how could they add functionality if they made the hw just that way? .. only nvidia can do this, but imho i prefer the full hw right from the beginning :D

Dan82181
09-20-2002, 01:45 PM
Originally posted by davepermen:
think about it. it's in hw, built from transistors. how could they add functionality if they made the hw just that way? .. only nvidia can do this, but imho i prefer the full hw right from the beginning :D



Wishful thinking. I've been trying to squeeze a couple of things in and was just hoping it could do maybe 3 or 4 passes, or possibly more ops per pass depending on which ops you are running. Oh well, looks like I gotta wait till the car is paid off before I get a 9700.

Dan

IT
09-20-2002, 01:53 PM
Well, hopefully 9700 drivers exposing ARB_fragment_program come out before I have my car paid off. :-) Just kidding ATI, I know you're working hard at it. Trust me, before I get my car paid off, we'll be looking at 50 GHz Pentium 20s or something.

Just browsing the docs, it looks like you can do fog/atmospheric stuff in the fragment program, so that's cool.

LaBasX2
09-20-2002, 02:09 PM
Very good news!

With ARB_vertex_program and ARB_fragment_program, graphics coding is slowly becoming fun again :)

PH
09-20-2002, 02:37 PM
Originally posted by davepermen:
think about it. it's in hw, built from transistors. how could they add functionality if they made the hw just that way? .. only nvidia can do this, but imho i prefer the full hw right from the beginning :D



Actually, there's some room for a bit more. You can't write to depth like you can in D3D. I would like to see that added for OpenGL (the hardware supports it).

Korval
09-20-2002, 02:59 PM
Actually, there's some room for a bit more. You can't write to depth like you can in D3D. I would like to see that added for OpenGL

That's what ARB_fragment_program is for ;)
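
For what it's worth, here is a minimal sketch in C of what that looks like: an ARB_fragment_program string that writes result.depth, loaded through the shared ARB program entry points. The program body is just an illustration (taking the replacement depth from texcoord set 1 is an arbitrary choice), and it assumes a current context, GL headers with extension prototypes, and <string.h>.

    /* Sketch: load and enable a fragment program that writes result.depth. */
    static const char depth_fp[] =
        "!!ARBfp1.0\n"
        "TEMP base;\n"
        "TEX base, fragment.texcoord[0], texture[0], 2D;\n"
        "MUL result.color, base, fragment.color;\n"
        /* replace the fragment's depth with a value interpolated in set 1 */
        "MOV result.depth.z, fragment.texcoord[1].x;\n"
        "END\n";

    void setup_depth_writing_program(void)
    {
        GLuint prog;
        glGenProgramsARB(1, &prog);
        glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, prog);
        glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                           (GLsizei)strlen(depth_fp), depth_fp);
        glEnable(GL_FRAGMENT_PROGRAM_ARB);
    }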

PH
09-20-2002, 03:06 PM
Yes, but that extension won't be supported on 8500. I don't understand why ATI doesn't expose that last bit of 8500 functionality in OpenGL.

MZ
09-20-2002, 04:07 PM
I'm very glad to see ATI_text_fragment_shader. Thanks to this, a bit of the mess in GL has been cleaned up.
However, the extension could be slightly better if it handled texture targets the same way ARB_fragment_program does (ignoring target priorities).

It would be great if NVIDIA made a similar ext for their RC and RC+TS ...


[This message has been edited by MZ (edited 09-20-2002).]

PH
09-20-2002, 04:56 PM
I had that thought too, about NV doing something similar for their extensions. I like the fact that ATI's new extension uses the ARB functions.

Edit: And while we're at it, how about an ARB_vertex_array_object extension ? That would clean up another bit of mess.

[This message has been edited by PH (edited 09-20-2002).]

jwatte
09-20-2002, 06:41 PM
Honestly, I would prefer an ARB_vertex_array_range extension, which includes a requirement for fences as well. Maybe with some supported number of allocated memory chunks and vertex array ranges that's greater than one :-)

Honestly, I know more about how I stream and write my data than the driver does. The D3D model of using the same buffer with NOOVERWRITE and switching between different buffers with DISCARD is not nearly as clean -- in my humble opinion, of course :-)

Humus
09-20-2002, 08:21 PM
Whooha! Thumbs up for the ARB! :)

Now if we could get an ARB_vertex_array_object/ARB_vertex_array_range extension too, it would be awesome; best, IMO, would be to pack ATI_vertex_array_object and ATI_map_object_buffer together and produce an ARB version of them. :)

Can't wait to get my 9700; hopefully ARB_fragment_program will be supported quite soon. The future looks bright :D

Ozzy
09-20-2002, 09:22 PM
Originally posted by PH:
And while we're at it, how about an ARB_vertex_array_object extension ? That would clean up another bit of mess.



Oooh no.. today's implementations on ATI boards are far behind VAR. Btw, SiS & Matrox are using the VAO ext for the Xabre & Parhelia boards respectively. But I don't know more about that. Tests? :)

Humus
09-21-2002, 01:19 AM
The important thing is that we get a good interface; whether a certain implementation gives you satisfying performance or not isn't part of the equation.

knackered
09-21-2002, 04:02 AM
The dark days are coming to an end - about time too.

cass
09-21-2002, 05:19 AM
ATI and NVIDIA are working on a unified extension that should have all the desirable properties of VAO and VAR. The interface will look more like ATI's VAO.

We all realize that it's much better for everybody if we can provide a common API for this sooner rather than later.

Thanks -
Cass

PH
09-21-2002, 05:21 AM
I knew it, Cass :). Well, at least I guessed you guys were working hard to make things nice and easy for us.

Cab
09-21-2002, 06:19 AM
Originally posted by cass:

ATI and NVIDIA are working on a unified extension that should have all the desirable properties of VAO and VAR. The interface will look more like ATI's VAO.

We all realize that it's much better for everybody if we can provide a common API for this sooner rather than later.

Thanks -
Cass

This is some of the best OpenGL news I have heard in years. I hope it will be an ARB extension.
ARB_fragment_program being released as the hw becomes available (and not one or two years later) is more really good news, IMHO.
It is good to see that when different hardware companies, after building their own graphics hardware, sit down together to define a common interface to make developers' lives easier, they reach good solutions. I think that is what the ARB is for. ;)

Humus
09-21-2002, 06:31 AM
Originally posted by cass:

ATI and NVIDIA are working on a unified extension that should have all the desirable properties of VAO and VAR. The interface will look more like ATI's VAO.

We all realize that it's much better for everybody if we can provide a common API for this sooner rather than later.

Thanks -
Cass

Oh my lord, the future of OpenGL looks really bright :)
Another thumbs up.

Asgard
09-21-2002, 06:32 AM
Originally posted by Cab:
It is good to see that when different hardware companies, after building their own graphics hardware, sit down together to define a common interface to make developers' lives easier, they reach good solutions. I think that is what the ARB is for. ;)

Woohoo! Way to go, ATI and NVIDIA. Finally we'll have a unified interface to VAO/VAR.
Also good to see that OpenGL finally seems to be taking the fast lane in overtaking DirectX when it comes to feature up-to-dateness :)

NitroGL
09-21-2002, 11:40 AM
Originally posted by Asgard:
Woohoo! Way to go, ATI and NVIDIA. Finally we'll have a unified interface to VAO/VAR.
Also good to see that OpenGL finally seems to be taking the fast lane in overtaking DirectX when it comes to feature up-to-dateness :)

It's about time they started working together, too ;)

zed
09-21-2002, 11:58 AM
a big thanks from me to all the guys working at the graphics companies 3dlabs/ati/nvidia/sgi (and others).
keep it going :)

Ozzy
09-21-2002, 10:03 PM
Originally posted by Humus:
The important thing is that we get a good interface; whether a certain implementation gives you satisfying performance or not isn't part of the equation.

Well, it is an important part of mine.

Frankly, to reach a point B from A, you can always go there on foot (CVA) or use your dadcycle (VAO), but I definitely prefer my motorbike (VAR)! ;)

vshader
09-21-2002, 11:05 PM
i think the point is that even if the interface looks like VAO, nVIDIA's implementation (for example) needn't perform any worse than VAR - though i would prefer it if the new spec allowed the app to handle synchronization like VAR does, if it wants to.

Korval
09-22-2002, 02:22 AM
VAR, as nice as it is, isn't very OpenGL-like. Rather than hiding things (like synchronization issues) from you, it exposes them and expects you to do the right thing. It also allows you to directly allocate and touch implementation memory, something else OpenGL doesn't normally let you do.

vshader
09-22-2002, 03:26 AM
that's why it would be good if getting a direct pointer was an option, rather than the default.

still, it's good to have it in there if you need it. and i think the GL2 guys have the right idea by extending and generalizing the NV_FENCE concept to let the app do better synchronization. with the CPU and GPU working in parallel, it's just like multi-threaded programming, and you NEED app-controlled sync, because the app knows what it's trying to do and the driver can only guess. sometimes you gotta allow these things for higher performance.. but like i said, it shouldn't be the only (or even the default) way to do it.

jwatte
09-22-2002, 07:02 AM
> i think the point is that even if the
> interface looks like VAO, nVIDIA's
> implementation (for example) needn't
> perform any worse than VAR

Yes, it does. VAO needs to implicitly set a fence (or something similar) every time you use the object. With VAR, I can tell it when to synchronize, and when not to. Synchronization costs a bit of overhead (last I heard, each fence would temporarily bubble the pipe).

PH
09-22-2002, 07:05 AM
Yes, it does. VAO needs to implicitly set a fence (or something similar) every time you use the object. With VAR, I can tell it when to synchronize, and when not to. Synchronization costs a bit of overhead (last I heard, each fence would temporarily bubble the pipe).


Isn't the fence only needed when you write data? For example, if you use the MOB extension to get a pointer, I assume it would issue a flush (or something).

cass
09-22-2002, 07:42 AM
There can be funny tricks with buffer renaming that allow you to hide synchronization. For what it's worth, there will be a way to get a pointer to write your vertex data directly into. It's not entirely clear just what that will look like right now. Whatever it is, I don't think it'll look like VAR where you have one pointer for all time. The driver needs the ability to move stuff around and give you a new pointer from a potentially different type of memory.

We're going to be very careful not to take away any capabilities you have today. If you want to use VAR, it will continue to be there too -- there will just be a portable alternative with (hopefully) equal or better flexibility and performance characteristics.

Thanks -
Cass
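
To illustrate the "buffer renaming" trick Cass mentions, here is a purely conceptual C sketch of what a driver might do internally; gpu_still_reading(), alloc_block() and retire_later() are hypothetical stand-ins, and this is not any vendor's actual code:

    /* Conceptual buffer renaming: map_for_write() never stalls on the GPU.
       If the GPU is still reading the old storage, hand the app a fresh
       block and quietly retire the old one once the GPU is done with it. */
    #include <stddef.h>

    typedef struct Block { void *mem; } Block;
    typedef struct Buffer { Block *current; size_t size; } Buffer;

    extern int    gpu_still_reading(Block *b);   /* hypothetical fence query  */
    extern Block *alloc_block(size_t size);      /* hypothetical allocator    */
    extern void   retire_later(Block *b);        /* freed after GPU finishes  */

    void *map_for_write(Buffer *buf)
    {
        if (gpu_still_reading(buf->current)) {
            /* Rename: swap in new storage instead of waiting. */
            retire_later(buf->current);
            buf->current = alloc_block(buf->size);
        }
        return buf->current->mem;   /* app writes here without a sync stall */
    }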

Mazy
09-22-2002, 12:06 PM
Any idea what timeframe we are talking about? Maybe the same drivers that expose ARB_fragment_program? And maybe before the release of NV30?

vshader
09-22-2002, 12:42 PM
ain't gonna be no ARB_fragment_program without no NV30.

btw, following ATI's lead, how about GL_NV_text_textureshader_registercombiners (using the new ARB program interface)?

henryj
09-22-2002, 12:44 PM
ARB_vertex_array_object???

Us Mac guys have had this for ages... well, a couple of weeks anyway :)

OneSadCookie
09-22-2002, 01:06 PM
Well, APPLE_vertex_array_object, APPLE_fence and APPLE_vertex_array_range, available on both NVidia and ATI hardware, at any rate...

Now the question is, when are the PC drivers going to support the best OpenGL extension ever -- APPLE_client_storage (http://developer.apple.com/opengl/extensions/apple_client_storage.html) ?

[This message has been edited by OneSadCookie (edited 09-22-2002).]

Cab
09-22-2002, 01:09 PM
I like VAR. It gives me the flexibility I need. But it is also true that it is not the OGL style, and its limitation of 'just one array of memory' limits its possible extensibility (for example, to have things in different types of memory). It is not easy to use for inexperienced programmers, as you can constantly see in this forum (you have to create a small memory 'manager' for the allocated buffer and possibly 'separate' the buffer into two chunks for static and dynamic objects, ...). It will not be easy to use for a programmer learning OGL and doing small tests and apps, and it will not be useful for small demos without a previous 'framework'.

I understand that the VAO approach is a little more limited (you can simulate current VAO with current VAR), but it is easier; you can use it quickly in a small demo or app. With direct pointers you can do things VAR-style, and it should not be slower than VAR. It is something similar to D3D vertex buffers, and with flags like DISCARD (the previous content), STATIC, DYNAMIC, ... it leaves the driver room for optimization (like the buffer renaming Cass mentioned, or storing static objects in video memory if desired). You must notice that D3D vertex buffers needed the evolution from D3D 5, 6 and 7 (with small changes in 8) to reach their current state, but now 'they' have something I find useful. Of course, this is just my opinion.

Anyway, any effort towards a common API for transferring geometry to the GPU in an efficient way is very welcome to me. I'm really a little tired of the lack of this feature and of dealing with VAR, VAO and CVA (I dropped support for CVA some months ago). I have the same feedback from other OGL programmers, some of them doing things in D3D just because of the lack of this feature. Another 'big' missing feature, in my opinion, is a common extension for fragment shaders on HW like GF3, GF4, Radeon 8500, P10, Parhelia, ... But it is true that this limitation only applies to this HW and seems to be solved on future HW with ARB_fragment_program, while the absence of ARB_VAO (or similar) is a limitation for past, current and future HW.



[This message has been edited by Cab (edited 09-22-2002).]
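
For reference, a small C sketch of the VAO-style usage being discussed, built on the existing ATI_vertex_array_object and ATI_map_object_buffer entry points; the buffer size, vertex layout and the draw_with_vao() wrapper are arbitrary choices for illustration, and it assumes the extension function pointers are already set up:

    /* VAO-style buffer management a la ATI_vertex_array_object +
       ATI_map_object_buffer.  Sketch only -- no error checking. */
    void draw_with_vao(const GLfloat *vertices, GLsizei vertex_count)
    {
        static GLuint vb = 0;
        const GLsizei bytes = (GLsizei)(vertex_count * 3 * sizeof(GLfloat));

        if (vb == 0)   /* create once: a dynamic buffer object, initially empty */
            vb = glNewObjectBufferATI(bytes, NULL, GL_DYNAMIC_ATI);

        /* D3D-DISCARD-style refill of the whole buffer ... */
        glUpdateObjectBufferATI(vb, 0, bytes, vertices, GL_DISCARD_ATI);

        /* ... or, equivalently, write through a mapped pointer:
           GLfloat *p = (GLfloat *)glMapObjectBufferATI(vb);
           ... fill p ...
           glUnmapObjectBufferATI(vb);                                   */

        /* Bind the buffer as the vertex array source and draw. */
        glEnableClientState(GL_VERTEX_ARRAY);
        glArrayObjectATI(GL_VERTEX_ARRAY, 3, GL_FLOAT, 0, vb, 0);
        glDrawArrays(GL_TRIANGLES, 0, vertex_count);
    }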

jwatte
09-22-2002, 02:34 PM
> Isn't the fence only needed when you write
> data ? For example, if you use the MOB
> extension to get a pointer, I assume it
> would issue a flush ( or something ).

That would make it totally useless (well, almost) for streaming data, which is exactly what MOB is FOR.

The problem is that you only want to flush up to the point the buffer was last used, or some point "soon after" that. You don't want to flush up to the present. The only sane way of enforcing this in a driver is with some sort of testable condition for whether each buffer has completed (*). With VAR, I can make the granularity/fencepost trade-off myself, and I make it such that I won't be testing or setting a fence more than once per frame during normal load.

(*) The driver COULD keep some amount of memory around for buffer rename space, and only set fences when I've gone through that amount of memory of newly mapped buffers. However, this means that the driver, not me, has to decide how much memory I'll go through per frame, which it's unfortunately less well suited to doing than I am. Although I'm sure they'll do the best they can and get within spitting distance -- personally, I go fill-rate bound on the cards that need help anyway, so it's probably no big deal. Still, these are the reasons I like the approach of VAR better than that of MOB.
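
A rough C sketch of the pattern jwatte is describing: one fence per frame over a chunked VAR allocation. The NV_vertex_array_range / NV_fence / wglAllocateMemoryNV calls are the real entry points (Windows assumed), but the chunk count, sizes, the fill_dynamic_vertices() helper and the vertex layout are made-up illustrations:

    /* One VAR allocation split into per-frame chunks; a single NV_fence per
       frame guards re-use of a chunk.  Sketch only -- error checking omitted. */
    #define NUM_CHUNKS      2                 /* double-buffer the dynamic region */
    #define CHUNK_BYTES     (512 * 1024)
    #define VERTS_PER_FRAME 300               /* arbitrary                        */

    extern void fill_dynamic_vertices(GLubyte *dst);   /* hypothetical helper */

    static GLubyte *var_mem;
    static GLuint   fence[NUM_CHUNKS];
    static int      frame;

    void init_var(void)
    {
        /* readFreq=0, writeFreq=0, priority=0.5 typically yields AGP memory. */
        var_mem = (GLubyte *)wglAllocateMemoryNV(NUM_CHUNKS * CHUNK_BYTES,
                                                 0.0f, 0.0f, 0.5f);
        glVertexArrayRangeNV(NUM_CHUNKS * CHUNK_BYTES, var_mem);
        glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);
        glEnableClientState(GL_VERTEX_ARRAY);
        glGenFencesNV(NUM_CHUNKS, fence);
    }

    void draw_frame(void)
    {
        int      chunk = frame++ % NUM_CHUNKS;
        GLubyte *dst   = var_mem + chunk * CHUNK_BYTES;

        /* Wait only if the GPU might still be reading this chunk from
           NUM_CHUNKS frames ago -- the app-controlled granularity trade-off. */
        if (frame > NUM_CHUNKS)
            glFinishFenceNV(fence[chunk]);

        fill_dynamic_vertices(dst);
        glFlushVertexArrayRangeNV();          /* spec requires a flush after CPU writes */
        glVertexPointer(3, GL_FLOAT, 0, dst);
        glDrawArrays(GL_TRIANGLES, 0, VERTS_PER_FRAME);

        glSetFenceNV(fence[chunk], GL_ALL_COMPLETED_NV);
    }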

Humus
09-23-2002, 12:50 AM
Originally posted by vshader:
btw, following ATI's lead, how about GL_NV_text_textureshader_registercombiners (using the new ARB program interface)?

I think that's a good idea too. That would allow me to clean up a little in my framework.

davepermen
09-23-2002, 02:00 AM
that's about the reverse of the cg way:
first, make a good base,
then make as much as possible work on that base, even where there is no full support..

i hope nvidia will create some GL_NV_DX_text_pixel_shader1_3.. would be very cool..

oh, and a GL_NV_register_combiners would be cool, too.. for gf1 and gf2, ya know..

they could do that easily; internally they'd only need to call nvparse :D

looking forward to a bright future..

MZ
09-23-2002, 06:20 AM
oh please, no nvparse syntax, just good old asm :p

davepermen
09-23-2002, 08:18 AM
Originally posted by MZ:
oh please, no nvparse syntax, just good old asm :p

a) i prefer the syntax for the register combiners as it is more natural to the combiners, and it's clearer how to use them exactly (rc hw design sucks :D)

b) GL_NV_DX_text_pixel_shader1_3 was the name i gave it.. guess what syntax i would like for gf3 and gf4?!? :D yes, there is a DX in there, and yes, there is pixel shader 1.3 in there.. :D

fresh
09-23-2002, 10:23 AM
THANK YOU GOD

vshader
09-24-2002, 04:18 AM
Originally posted by davepermen:
GL_NV_DX_text_pixel_shader1_3 was the name i gave it.. guess what syntax i would like for gf3 and gf4?!? :D yes, there is a DX in there, and yes, there is pixel shader 1.3 in there.. :D

hate to burst your bubble, but considering microsoft's IP claim on ARBvp (which looks nothing like DX vertex shaders now except a few of the ops and their 3-letter codes), do you think they're gonna allow GL_NV_DX_text_pixel_shader_1_3?

i agree it's a good idea to combine the texture shaders and rc's in one program, but i can't see MS allowing an OpenGL implementation of one of the best things going for DX8.

afterthought: although, didn't NV co-design the DX8 vs and ps languages with MS? still, i don't think they could do it without MS licensing...

MZ
09-24-2002, 05:10 AM
vshader,
If what you said were true, then MS would "disallow" both ATI_text_fragment_shader and ATI_fragment_shader as well, since they are based on DX PS 1.4.

vshader
09-24-2002, 06:53 AM
they are based on the same technology, but they don't use the same syntax... there's no "SampleMap" in ps1.4.

but really, i don't know.

i'm just guessing that if MS can claim IP on ARBvp, then you'd have to think they could do so (and more) on an OpenGL extension that was just a copy of the DX8 ps1.3 spec.

edit: and btw, isn't it more that ps1.4 is based on ATI technology? again i'm guessing, but ps1.0 - 1.3 were basically written around the GF3 spec, and i assumed that lobbying from ATI got ps1.4 in DX8.1 so the API wasn't so biased towards nVIDIA.

i'd be really interested if anyone knows the full story behind that ... how much was written to the hardware, rather than hardware being made to the spec?

[This message has been edited by vshader (edited 09-24-2002).]

Dan82181
09-24-2002, 08:35 AM
ATI's ATI_fragment_shader adds some capabilities to "fragment programs" that aren't possible in DX8.1/PS1.4.

While ATI_fragment_shader is missing the capability of depth buffer output, there could be a good reason for it. Looking at the OpenGL 'machine', and taking what I know about optimizing fragment generation, I think this is where you run into a problem. I think ATI's "HyperZ" may run afoul of a "depth fragment program" in OpenGL; I think the color (RGBA) portion of fragments isn't generated until after they pass the scissor, stencil, and depth tests. Just my speculation; maybe Evan or Jason or some other ATI guys/gals could shed more light.

But the great thing about ATI_fragment_shader is the source register modifiers, which aren't available with DX8.1/PS1.4. The 2X_BIT_ATI, BIAS_BIT_ATI, COMP_BIT_ATI, and NEGATE_BIT_ATI are really nice features to have. Like expanding a range-compressed vector ([0,1] -> [-1,1]). Normally (DX8.1/PS1.4), this would require an entire shader op just to blow it up. In OpenGL (ATI_fragment_shader), just use GL_2X_BIT_ATI|GL_BIAS_BIT_ATI as the source modifier, and you have your expanded vector (you could even add GL_NEGATE_BIT_ATI if you needed the opposite direction too). And you can even do that for all 3 registers in a 3-register op :D , and still have destination modifiers :D :D :D

While it would be nice to have the depth capabilities, I think I'd rather have the source register modifiers, seeing as they add much more value (to me at least) to fragment generation.

Dan
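
A concrete C sketch of the trick Dan describes, using the ATI_fragment_shader calls: a DOT3 between two range-compressed normal maps, with the [0,1] -> [-1,1] expansion folded into the source modifiers instead of costing extra ops. The register and texture-unit assignments are arbitrary here:

    /* Dot product of two range-compressed normal maps on an 8500-class part.
       The 2X|BIAS source modifiers expand [0,1] texture values to [-1,1]
       "for free", instead of burning an extra op per normal. */
    GLuint make_dot3_shader(void)
    {
        GLuint shader = glGenFragmentShadersATI(1);
        glBindFragmentShaderATI(shader);
        glBeginFragmentShaderATI();

        /* Sample the two normal maps into r0 and r1. */
        glSampleMapATI(GL_REG_0_ATI, GL_TEXTURE0_ARB, GL_SWIZZLE_STR_ATI);
        glSampleMapATI(GL_REG_1_ATI, GL_TEXTURE1_ARB, GL_SWIZZLE_STR_ATI);

        /* r0 = saturate( (2*r0 - 1) . (2*r1 - 1) ); the expansion comes
           from the GL_2X_BIT_ATI | GL_BIAS_BIT_ATI source modifiers.  */
        glColorFragmentOp2ATI(GL_DOT3_ATI,
                              GL_REG_0_ATI, GL_NONE, GL_SATURATE_BIT_ATI,
                              GL_REG_0_ATI, GL_NONE, GL_2X_BIT_ATI | GL_BIAS_BIT_ATI,
                              GL_REG_1_ATI, GL_NONE, GL_2X_BIT_ATI | GL_BIAS_BIT_ATI);

        glEndFragmentShaderATI();
        return shader;
    }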

fresh
09-24-2002, 05:14 PM
Wouldn't having a depth fragment output totally ruin the early z test / hyperz optimizations? Maybe the shader compiler can determine whether or not the program is writing to the depth component and optionally enable/disable the z test optimizations.

sqrt[-1]
09-24-2002, 08:05 PM
Originally posted by Dan82181:

But the great thing about ATI_fragment_shader is the source register modifiers, which aren't available with DX8.1/PS1.4. The 2X_BIT_ATI, BIAS_BIT_ATI, COMP_BIT_ATI, and NEGATE_BIT_ATI are really nice features to have. Like expanding a range-compressed vector ([0,1] -> [-1,1]). Normally (DX8.1/PS1.4), this would require an entire shader op just to blow it up. In OpenGL (ATI_fragment_shader), just use GL_2X_BIT_ATI|GL_BIAS_BIT_ATI as the source modifier, and you have your expanded vector (you could even add GL_NEGATE_BIT_ATI if you needed the opposite direction too). And you can even do that for all 3 registers in a 3-register op :D , and still have destination modifiers :D :D :D

While it would be nice to have the depth capabilities, I think I'd rather have the source register modifiers, seeing as they add much more value (to me at least) to fragment generation.

Dan

Uh.... I am almost 100% certain D3D PS1.4 includes all the source and destination modifiers that you mentioned.

davepermen
09-25-2002, 07:33 AM
Originally posted by fresh:
Wouldn't having a depth fragment output totally ruin the early z test / hyperz optimizations? Maybe the shader compiler can determine whether or not the program is writing to the depth component and optionally enable/disable the z test optimizations.



you answered yourself.. :D

Dan82181
09-25-2002, 11:59 AM
Originally posted by sqrt[-1]:
Uh.... I am almost 100% certain D3D PS1.4 includes all the source and destination modifiers that you mentioned.

I know DX has the destination modifier, but the only source modifier I've ever seen has been the negate modifier; I've never seen the bias, comp, or 2x modifiers, so I assumed they don't exist. In several DX-PS examples I've seen in the past, I've always seen people use a MAD op to expand range-compressed vectors (hence my speculation about their non-existence). That's not to say they don't exist, I've just never seen them in all examples I've looked through in the past, so I could very well be wrong. Anyone here know for sure?!

Dan

davepermen
09-25-2002, 12:08 PM
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dx8_vb/directx_vb/Graphics/Reference/Shader/Pixel/Modifiers/SourceRegisterModifiers.asp

Asgard
09-25-2002, 12:08 PM
Originally posted by Dan82181:
That's not to say they don't exist, I've just never seen them in all examples I've looked through in the past, so I could very well be wrong. Anyone here know for sure?!

They exist in almost all pixel shader versions. _x2 only in ps version 1.4.

For reference, see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dx8_c/directx_cpp/Graphics/Reference/Shader/Pixel/Modifiers/Modifiers.asp

zeckensack
09-25-2002, 01:34 PM
Originally posted by fresh:
Wouldn't having a depth fragment output totally ruin the early z test / hyperz optimizations? Maybe the shader compiler can determine whether or not the program is writing to the depth component and optionally enable/disable the z test optimizations.

I don't think the R200 has any of these early discard features. I may be wrong though.

The Geforce4Tis do have them, and they too can write depth. I guess it's just as you said, that function is disabled if the shader modifies depth.

Dan82181
09-25-2002, 02:28 PM
Originally posted by Asgard:
They exist in almost all pixel shader versions. _x2 only in ps version 1.4.


Well, I guess since PS1.4 came with DX8.1, those examples I saw must have been DX8.0 pixel shaders, which would explain why I never saw a 2x modifier (nor had I seen the comp before) and why extra ops had to be used to expand normals. Thanks!


I also noticed limitations in the DX spec on combining modifiers, notably with the invert modifier and with regard to constants. Those alone are two things I've had to do several times. Definitely makes me glad I don't use DX :D



Originally posted by zeckensack:
I don't think the R200 has any of these early discard features. I may be wrong though.
The Geforce4Tis do have them, and they too can write depth. I guess it's just as you said, that function is disabled if the shader modifies depth.


Pulling the data straight off of ATI's website:
http://www.ati.com/products/pc/radeon8500le/faq.html




Q25: What is HYPER Z™ II?
A25: Z-buffer data is a primary consumer of graphics memory bandwidth, which is the performance bottleneck of most graphics applications. Hence, any reduction in Z-buffer memory bandwidth consumption will result in performance dividends. HYPER Z™ II is a technology that makes Z-buffer bandwidth consumption more efficient by implementing the following memory architecture features:
1) Fast Z clear
2) Z compression
3) Hierarchical Z
HYPER Z™ II is second-generation technology, while other competing technologies have only been introduced for the first time. This results in a more robust and efficient implementation.


Q26: Other graphics manufacturers are claiming new memory bandwidth saving techniques. How does this compare to HYPER Z™ II?

A26: Like HYPER Z™ II, other graphics manufacturer optimizes memory bandwidth. Both HYPER Z™ II and competitor’s solution offer lossless Z-buffer compression. Both technologies attempt to discard polygons that are occluded by other polygons (a process called “occlusion culling”). In this respect, HYPER Z™ II is far superior. HYPER Z™ II saves the GPU from rendering over 14 billion pixels per second, while, it is estimated competitor’s only discards 3.2 billion. Fast Z clear has no counterpart in competitor’s architecture.


I'm guessing that some sort of "early Z out" is present.

Now, I don't claim to be an expert, but considering when the 8500 came out (more specifically, the timeframe in which the chip was being developed), it would seem like a rather smart move by a graphics chip company not to go through the trouble of caring about the value of the pixel being written to the color buffer before it gets sent to the scissor, stencil, or depth testing units (note about the alpha test below). You would just be eating away at any and all performance you have. What confuses me is this diagram...

http://www.ati.com/developer/sdk/RadeonSDK/Html/Info/RadeonPixelPipeline.html
(keep in mind that it was for the original Radeon/Radeon 7500)

You would think that the color value of the pixel is determined before the scissor test starts. I'm thinking that when a fragment is generated by a "fragment program", the alpha test either gets skipped or is moved to after the depth test and before the alpha blend (if that's possible). You certainly wouldn't want to go through the trouble of running pixels through a large, complex "fragment program" only to have them killed by the scissor, stencil, or depth test. There may be a reason why a "depth fragment program" is possible under DX but not under OpenGL. Since I don't work for ATI, I couldn't tell you for sure what's going on in the chip or the drivers, or if it is an IP thing with MS.

Dan

[This message has been edited by Dan82181 (edited 09-25-2002).]

vshader
09-25-2002, 05:33 PM
you are talking about deferred shading - not doing shading calcs until the framebuffer contents are determined.

only the Kyro board does this AFAIK. the ATI board has the early Z optimization, but i'm sure fragments are shaded before they get alpha/stencil/scissor etc. tested - that's why the pipeline diagram looks like that.

think about it - if it didn't, it would have to store, for each framebuffer fragment, enough state info so it could go back and apply the fragment shader or texture environment or whatever when you call glSwapBuffers() - that's when the framebuffer contents have finally been determined. that's a lot of extra data per fragment ...

marketing material for the Kyro board has a bit of info on deferred shading. it gets around the extra memory prob by only doing small tiles of the framebuffer at a time ... i think. i'm a bit hazy on the whole thing.

[This message has been edited by vshader (edited 09-25-2002).]

Korval
09-25-2002, 06:22 PM
i'm sure fragments are shaded before they get alpha/stencil/scissor etc. tested

Why? Only the alpha test is guaranteed to have anything to do with the output of the per-fragment operations. It is easy enough to move the depth, stencil, and scissor tests to the beginning of the fragment pipe. That way, if the tests fail, you don't try to fetch a texture (or 4) or run a complicated fragment program.

The only time you have to (or even should) run any of these tests after the fragment stages is if those programs are going to change the results of the test. As long as the program doesn't write to the depth (or alter the depth value), then there's no need to put the depth test after fragment processing.


if it didn't, it would have to store, for each framebuffer fragment, enough state info so it could go back and apply the fragment shader or texture environment or whatever when you call glSwapBuffers()

Um, no. Observe:

OK, you're scan-converting a triangle. You get to a pixel. Now, you have fragment information. The thing is, you also have all the info you need to do the depth, stencil, and scissor tests. You may as well do those now. Once you're done, if the pixel wasn't culled, you go ahead and apply the fragment information to compute the color. Then, based on the alpha test and blend mode, you apply this color. Then you go on to the next pixel. There's no need to retain the fragment information until swap buffers is called.
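
Korval's ordering, as a plain C sketch; this is conceptual pseudocode of the per-pixel decision, with made-up types and helper functions, not real driver code:

    /* Per-pixel work during scan conversion, with the cheap tests hoisted
       in front of fragment shading.  Everything here is a stand-in. */
    typedef struct { int x, y; float z; /* plus interpolants */ } Fragment;
    typedef struct { float r, g, b, a; } Color;
    typedef struct Framebuffer Framebuffer;   /* opaque here */

    extern int   scissor_test(int x, int y);
    extern int   stencil_test(Framebuffer *fb, int x, int y);
    extern int   depth_test(Framebuffer *fb, int x, int y, float z);
    extern int   alpha_test(float a);
    extern Color shade(Fragment frag);        /* textures + fragment program */
    extern Color blend(Framebuffer *fb, int x, int y, Color src);
    extern void  write_color(Framebuffer *fb, int x, int y, Color c);

    void process_pixel(Fragment frag, Framebuffer *fb)
    {
        /* Early outs: none of these depend on the shaded color, as long as
           the fragment program doesn't replace the depth value. */
        if (!scissor_test(frag.x, frag.y))           return;
        if (!stencil_test(fb, frag.x, frag.y))       return;
        if (!depth_test(fb, frag.x, frag.y, frag.z)) return;

        /* Only now pay for texture fetches / the fragment program. */
        Color c = shade(frag);

        /* The alpha test must come after shading -- it depends on the result. */
        if (!alpha_test(c.a))                        return;

        write_color(fb, frag.x, frag.y, blend(fb, frag.x, frag.y, c));
    }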

vshader
09-25-2002, 07:54 PM
yes Korval, you're very right and what i said was kinda dumb... you gain less by doing the tests as the fragments come, rather than waiting till the buffer is finalised, compared to the deferred shading algorithm... but don't you think that if the cards did it that way, the marketing spiel would trumpet it like they do the Z optimizations? i dunno, i just think it's strange for ATI to publish a pipeline diagram that makes the system look less efficient than it is... all their 9700 diagrams have alpha and stencil tests after the frag programs.

Dan82181
09-25-2002, 09:48 PM
Well, I think I found something about the 8500's pixel pipeline. Have a look at this file...

http://www.ati.com/technology/hardware/pdf/smartshader.pdf

It shows that both fixed-function and programmable pixel shaders occur before fog blending and "visibility testing". I am going to go out on a limb here and assume that "visibility testing" includes our scissor, alpha, stencil, and depth tests. To be honest, I don't really see that being true when reading what ATI said about the HyperZ technology. Common sense would tell a person to design both the chip and the driver to perform the scissor, stencil, and depth tests before any other fragment generation (duh!). If a depth test after fragment generation is needed, you could leave the depth value in a register/cache and perform another depth compare after the color calculation. That really wouldn't be that expensive, considering that you aren't having to fetch the value a second time (at least I don't think it would be). The only way I can see us getting a definite answer about this is to wait until someone from ATI pokes their head in and lets us know what's going on.

Dan

t0y
09-25-2002, 10:21 PM
I might be jumping to the wrong conclusions here, but isn't HyperZ (and its variants) a "crude" (not exact) method of early pixel discarding, and thus not part of those tests at all?

My Radeon R100 has this capability, but there were many drivers that showed artifacts because it wasn't working properly (maybe it's a hardware thing and it's currently disabled, I don't know).

Since it works together with Z compression (or whatever it's called), it may be sharing some of the extra data structures to keep a rough estimate of the minimum Z value of a particular area of the depth buffer.

Humus
09-26-2002, 01:08 AM
What HyperZ does is cull 8x8 blocks depending on what depth they will have. It's easy to store a min and max value for each block, calculate the min and max for the incoming tile first, discard the whole tile if you know everything will fail, and disable depth reads if you know everything will pass.
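
A small C sketch of that idea; the 8x8 block granularity comes from Humus' description, but the data layout and the three-way result are just assumptions for illustration, not ATI's actual implementation:

    /* Hierarchical Z in miniature: one min/max depth pair per 8x8 block.
       Compare an incoming tile's depth range against the stored range and
       decide, per block, whether to reject it outright, accept it without
       per-pixel depth reads, or fall back to the normal depth test.
       A GL_LESS-style depth function is assumed (nearer fragments pass). */
    typedef struct { float zmin, zmax; } ZBlock;

    typedef enum {
        TILE_REJECT,             /* cull all 64 pixels, no Z reads          */
        TILE_ACCEPT_NO_READS,    /* all pass; skip per-pixel Z reads        */
        TILE_NEEDS_DEPTH_TEST    /* ranges overlap; do the real test        */
    } TileResult;

    TileResult classify_tile(const ZBlock *block, float tile_zmin, float tile_zmax)
    {
        if (tile_zmin >= block->zmax)   /* everything behind what's stored   */
            return TILE_REJECT;
        if (tile_zmax <  block->zmin)   /* everything in front of it         */
            return TILE_ACCEPT_NO_READS;
        return TILE_NEEDS_DEPTH_TEST;
    }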