Weird dFdx dFdy on ATI

Hi!

I’m using dFdx() and dFdy() to produce face normals in the fragment shader (I can’t use a geometry shader, and I don’t have vertex normals). It works fine on nVidia.
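For reference, a minimal sketch of what I am doing (variable names like vWorldPos are placeholders, not my actual code):

```glsl
varying vec3 vWorldPos; // position interpolated from the vertex shader

void main()
{
    // Screen-space tangents of the interpolated position; for a planar
    // triangle their cross product is the (unnormalized) face normal.
    vec3 dx = dFdx(vWorldPos);
    vec3 dy = dFdy(vWorldPos);
    vec3 faceNormal = normalize(cross(dx, dy));

    // Simple headlight diffuse term, just to visualize the normal.
    float diffuse = max(dot(faceNormal, vec3(0.0, 0.0, 1.0)), 0.0);
    gl_FragColor = vec4(vec3(diffuse), 1.0);
}
```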

On ATI… it’s a different story :-)
As soon as I render into a texture (using an FBO), the normals come out flipped. When rendering into a pbuffer or the normal window, everything is OK. Rendering into a texture is the only difference (otherwise same states, same shader, etc.).

I suspect that somehow, when rendering into a texture, the driver changes its fill convention from bottom-up to top-down. That flips the sign of the dFdy vector I get, which in turn flips my normals around (??).

Can anyone confirm this?
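If it really is such a y-flip, a possible workaround I could try (untested sketch; uDerivYSign is a made-up uniform that the application would set) is:

```glsl
// Hypothetical workaround, untested: the application sets uDerivYSign
// to -1.0 when rendering into an FBO on affected drivers, 1.0 otherwise.
uniform float uDerivYSign;
varying vec3 vWorldPos;

void main()
{
    vec3 dx = dFdx(vWorldPos);
    vec3 dy = uDerivYSign * dFdy(vWorldPos); // undo the suspected flip
    vec3 faceNormal = normalize(cross(dx, dy));
    gl_FragColor = vec4(faceNormal * 0.5 + 0.5, 1.0); // visualize
}
```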

Also, on another card (FireGL 5600) I get black “2x2-pixel acne” on some polygons. The dFdx()/dFdy() values seem to be screwed up there. Is this a driver bug or a hardware limitation?

thanks.

I think these functions should be used very sparingly… because they are very expensive.
Maybe when you use an FBO on ATI cards, the differences are computed backward, whereas they are computed forward when you use a pbuffer or the window framebuffer.
But it would be very strange…
If you just want face normals, why don’t you simply precompute them per vertex in your application?

“I think these functions should be used very sparingly… because they are very expensive.”

Why? The derivatives should be “naturally” available for varyings anyway… I have not encountered a dramatic performance drop when using them.

“If you just want face normals, why don’t you simply precompute them per vertex in your application?”

True flat shading (which is what I intended) is only possible when you effectively make each vertex unique. That would bloat our dataset enormously and drop vertex reuse between faces to zero :-/

Ok, assuming these functions are computationally cheap.
You said that you want flat shading, yet you compute a normal per fragment?? Either that is strange, or there is something I still don’t understand. I don’t think the problem is due to your shading method.

Yes, it may sound like overkill. But geometry shaders are not available on our target hardware, so computing the face normals in the fragment shader was the only way I could think of. It may sound awkward, but in the end you get nicely per-pixel flat-shaded polygons :-)

Ok, I think I understand. By “geometry shader”, you mean “vertex shader”?
I have an ATI graphics card (Mobility Radeon X1600). Maybe I can test your code, if you want and if it has not already been done on this type of card…

“but in the end you get nicely per-pixel flat-shaded polygons :-)”

I love it! :-D

No, he means geometry shader. The derivative functions are pretty fast on modern hardware.

By the way, you could use the geometry shader to compute the normal, since you have access to all vertices of each triangle (and thus its plane equation). But of course, that won’t work on ATI…
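A rough sketch of what that could look like with GL_EXT_geometry_shader4 (untested; the names vWorldPos and gFaceNormal are made up, and the input/output primitive types plus the maximum output vertex count are configured from the application via glProgramParameteriEXT):

```glsl
#version 120
#extension GL_EXT_geometry_shader4 : enable

varying in vec3 vWorldPos[3];  // world-space positions from the vertex shader
varying out vec3 gFaceNormal;  // flat face normal for the fragment shader

void main()
{
    // All three vertices are visible here, so the face normal can be
    // computed once per triangle instead of once per fragment.
    vec3 n = normalize(cross(vWorldPos[1] - vWorldPos[0],
                             vWorldPos[2] - vWorldPos[0]));

    for (int i = 0; i < 3; ++i) {
        gFaceNormal = n; // constant across the triangle -> flat shading
        gl_Position = gl_PositionIn[i];
        EmitVertex();
    }
    EndPrimitive();
}
```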

Why would they be “very” expensive? They are pretty much free; the derivatives already exist (for mipmapping and friends). The hardware does not have to do any extra fancy calculations, it just looks at the difference in values between adjacent fragments (they are processed in parallel and in lockstep).
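Conceptually (a simplification, not actual hardware code), it boils down to this for every 2x2 block:

```glsl
// Fragments are processed in 2x2 quads:
//
//     v(x, y)     v(x+1, y)
//     v(x, y+1)   v(x+1, y+1)
//
// For any per-fragment value v, the derivatives are approximated as
// plain differences between quad neighbors, which are computed in
// lockstep anyway:
//
//     dFdx(v) ~= v(x+1, y) - v(x, y)
//     dFdy(v) ~= v(x, y+1) - v(x, y)
```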

Zengar> Ok, I have looked up what a geometry shader is; it does not seem to be supported with GLSL yet, but it is with Cg.
NeARAZ> You are right. I thought that finding the adjacent fragments would take time. But I still wonder how it works when derivatives are computed on the border of a triangle. It would need adjacent triangles too, or maybe it is undefined (it is with flat shading). That is why I don’t like these functions very much…

It is, with the appropriate extension. GL_EXT_geometry_shader4, if I am not mistaken.

Basically, all pixels (fragments) are rasterized in 2x2 blocks (at least). Near the borders of a triangle they are still processed as 2x2 blocks, with some fragments falling outside the triangle. Those fragments are simply discarded rather than written, but everything is still computed, and thus derivatives are available.

This (that fragments are rasterized in 2x2 blocks) is the reason why drawing very small triangles (e.g. pixel-sized ones) is inefficient in terms of fillrate: the pixel pipelines process many more fragments than are actually written.

And of course, the reason why derivatives are not available inside dynamic branches is that adjacent pixels can take different code paths there, so they are no longer executing in lockstep.
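In practice that means taking the derivatives before branching, so they stay well defined. A sketch (my own example; uShade is a hypothetical uniform):

```glsl
varying vec3 vWorldPos;
uniform bool uShade; // hypothetical toggle set by the application

void main()
{
    // Take the derivatives *outside* any dynamic branch, where all four
    // fragments of the quad are guaranteed to execute in lockstep.
    vec3 dx = dFdx(vWorldPos);
    vec3 dy = dFdy(vWorldPos);

    vec3 color = vec3(1.0);
    if (uShade) {
        // Calling dFdx/dFdy *here* instead would give undefined results,
        // because neighboring fragments may have taken the other path.
        vec3 n = normalize(cross(dx, dy));
        color = vec3(max(n.z, 0.0));
    }
    gl_FragColor = vec4(color, 1.0);
}
```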

Thank you very much NeARAZ, now it is perfectly clear. :-)

That’s a good question!

How are derivatives computed for a triangle that covers only a single fragment?

I assume the 2x2 lockstep also computes fragments outside the triangle, which in some cases should lead to strange border artifacts if the derivatives are computed from variables that are not well defined outside the “extrapolated” triangle.

But I am very much hoping that adjacent triangles share 2x2 blocks;
otherwise, small triangles that do not rasterize exactly on the 2x2 grid would be a big fragment performance hit.

“But I am very much hoping that adjacent triangles share 2x2 blocks”

They most certainly don’t.

  1. The gfx card usually does not have any adjacency information.
  2. Triangles are rendered in “random” order: when triangles x and y are adjacent, that doesn’t mean they are close together in the vertex array, so by the time rendering of triangle y starts, triangle x could have been finished for ages.
  3. 2x2 blocks are a unit executed per triangle, not per screen region (difficult to phrase). Just because two triangles get rasterized close to each other ON SCREEN doesn’t mean they are actually processed “close to each other” IN HARDWARE.

“otherwise, small triangles that do not rasterize exactly on the 2x2 grid would be a big fragment performance hit”

Yeah, you got that point right: rendering many small triangles DOES lead to many wasted cycles.

Jan.

The graphics card does have adjacency information if you render triangle/quad strips, and those are rendered in a well-defined order.

I don’t know how gfx hardware works internally,
but it should be possible:

If you take a pool of destination 2x2 blocks,
wait until triangles fill them up,
and issue to the fragment shader pipeline

  1. only full 2x2 blocks,
  2. or partially filled 2x2 blocks, to free some up when the pool is depleted,

that would be an easy way to share some 2x2 blocks among small triangles.
If you render the triangle strips in a “snake” order (which you should already do anyway to exploit the texture cache),
the gfx card should be able to reach more than 80% efficiency.

“The graphics card does have adjacency information if you render triangle/quad strips, and those are rendered in a well-defined order”

You are right that at least partial adjacency information could be present and exploited. However, AFAIK this information is not used by any hardware; if it were, the IHVs would have told us about it.

About the rest: many papers state that rendering many small triangles is bad for the 2x2-block design precisely because blocks are NOT shared across polygon borders. Of course, I am not a hardware engineer, so I only know the publicly available information; if anyone has more in-depth knowledge, I would really like to hear about it.

“the gfx card should be able to reach more than 80% efficiency”

My personal advice: don’t start an argument with “I don’t know how gfx hardware works internally” and finish it with absolute figures that you pulled out of nowhere; it makes you not credible. I can agree with many things you said, but I hate it when people make up stuff that is not provable.

Jan.
