My work is not significantly different from that presented last year by Ray, Cavin and Lévy. I did my work independently but on low speed in my spare time, so I was overtaken by them, hence this will probably not be accepted anywhere as a research paper.
Still, it ought to be of interest to the community. It is a more direct approach that has very much better performance and is ready for use now, so I decided to post it here.
My demo should run on any GLSL-capable hardware, even older generation and budget cards. Please tell me if you have problems running it, and I’ll try to fix it.
If you have any comments, feel free to post them here, or email me directly at the address in the article.
Ah, my fault, sorry. With the inversesqrt it does run and the result is correct, but still only with 0.1 FPS.
I assume with
“My demo should run on any GLSL-capable hardware, even older generation and budget cards.”
you meant that it does run on older cards, but not necessarily fast. Or should it run faster?
Sorry about the rsqrt(), my Nvidia driver constantly tricks me into writing non-standard GLSL. I’ll change that ASAP. Sorry for the inconvenience.
Strange about that 0.1 fps on the Radeon 9700 Mobility. Perhaps that card does not support the automatic derivatives? Please try the alternate fragment shader “fragment_shader_noAA.frag” (just rename it to “fragment_shader.frag” and edit my silly rsqrt() mistake). Does that run faster?
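For readers unfamiliar with the “automatic derivatives” mentioned above: in GLSL these are the built-ins dFdx(), dFdy() and fwidth(), which report how fast a value changes from one pixel to the next. The following is only a minimal CPU-side sketch in Python of how such derivatives are commonly used to anti-alias a procedural edge. It is not the demo's shader; the circle `field`, the finite-difference `fwidth` and the `pixel` step are all hypothetical stand-ins chosen for illustration.

```python
import math

def field(x, y):
    """Hypothetical procedural shape: positive inside a circle of radius 0.5."""
    return 0.5 - math.hypot(x, y)

def fwidth(f, x, y, pixel):
    """Finite-difference stand-in for GLSL fwidth(): |df/dx| + |df/dy| per pixel."""
    dfdx = f(x + pixel, y) - f(x, y)
    dfdy = f(x, y + pixel) - f(x, y)
    return abs(dfdx) + abs(dfdy)

def shade_no_aa(f, x, y):
    """Hard threshold: every pixel is fully inside or fully outside."""
    return 1.0 if f(x, y) > 0.0 else 0.0

def shade_aa(f, x, y, pixel):
    """Blend across the edge over roughly one pixel, using the filter width."""
    w = fwidth(f, x, y, pixel)
    if w == 0.0:
        # The field is locally constant, so there is no edge to smooth.
        return shade_no_aa(f, x, y)
    t = f(x, y) / w + 0.5
    return min(1.0, max(0.0, t))
```

Without derivative support the filter width `w` is not available, which is why a noAA variant has to fall back to the hard threshold.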
I would be very interested in knowing of any other problems, and some performance numbers, with ATI hardware. This should run fine on ATI 9xxx and Nvidia 5xxx series cards, but I have only been able to test it on Nvidia hardware.
With the other shader it runs smoothly at over 100 FPS. And it does have a colored background, which I don’t have with the first shader. Is that intentional?
Strange about that 0.1 fps on the Radeon 9700 Mobility. Perhaps that card does not support the automatic derivatives?
I’m pretty sure that R300 class hardware doesn’t have those opcodes natively. Since Humus got good performance on his R520 class hardware, I assume that they do. I don’t know about R400 hardware, but since it was mostly a performance upgrade of R300’s, I doubt it.
runs well on a nv3x
“My work is not significantly different from that presented last year by Ray, Cavin and Lévy. I did my work independently but on low speed in my spare time, so I was overtaken by them, hence this will probably not be accepted anywhere as a research paper.”
im pretty sure i saw something very similar 3-4 years ago (on flipcode IIRC) though obviously not using glsl
Originally posted by zed: runs well on a nv3x
NV3x (slow) pixel shader 2.a with derivative support. RD3xx as far as I know is 2.b, which does not have it.
I’ve bookmarked the whole url (is this expected to remain persistent?) and I’ll try out the demo soon.
Just as a reference, it reminded me of a quite different algorithm from Loop-Blinn about curve rendering (which is in fact referenced in the paper). It addresses a rather different problem, but I wanted to point it out here.
Ported to Mac and run on 17" iMac Core Duo (Radeon X1600).
The default shader gets 350-850fps depending on how many pixels are being shaded. It’s just a black and white image.
The noAA shader gets about the same frame-rate (maybe 400-1000) but what was black with the previous shader is black/blue checked, and what was white with the previous shader is white/yellow checked, at 45 degrees to the other check pattern.
On a 6600GT the performance is as follows:
980-510fps with AA, minor imperfections on closeups (quite hard to spot).
1050-670fps with AA off but it blurs quite a bit.
As expected, running the shader eats only a bit of GPU in distant views (roughly 40fps) but a lot on closeups (500fps).
> im pretty sure i saw something very similar 3-4 years ago (on flipcode IIRC) though obviously not using glsl
If you can find the reference, please e-mail me! I would be very interested indeed in seeing it.
As people have pointed out, the “_noAA” version has a checkerboard pattern in the blue channel to visualise the texels. (The code for that is in the other shader as well, but it is commented out.) The texture resolution is only 32x32 pixels, and as you can see, even the straight lines are not at right angles to the texel borders.
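As a rough illustration of the texel checkerboard described above, here is a small Python sketch of the idea. Only the 32x32 resolution comes from the post; the function names and the exact way the pattern is mixed into the blue channel are assumptions, picked so that black areas come out black/blue and white areas white/yellow, as reported earlier in the thread.

```python
import math

TEX_SIZE = 32  # texture resolution stated in the post

def texel_checker(u, v, size=TEX_SIZE):
    """Return 0.0 or 1.0 so that neighbouring texels alternate."""
    tx = math.floor(u * size)  # texel column for texture coordinate u
    ty = math.floor(v * size)  # texel row for texture coordinate v
    return float((tx + ty) % 2)

def shade(u, v, inside):
    """Black/white shape colour with the blue channel toggled on odd texels.

    How the demo actually combines the two is an assumption here.
    """
    base = 1.0 if inside else 0.0
    blue = abs(base - texel_checker(u, v))
    return (base, base, blue)
```

For example, `shade(0.04, 0.0, False)` falls in an odd texel and gives pure blue, while `shade(0.04, 0.0, True)` gives yellow.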
The demo in isolation does not really look like much, but please understand that it is intended as an illustration to the article, not the other way around.
Thanks for the info on ATI hardware not supporting derivatives. Is there some other way of doing AA in procedural shaders on those chipsets?
NV3x (slow) pixel shader 2.a with derivative support. RD3xx as far as I know is 2.b, which does not have it.
it may be slow but it still runs ok for this, ie ~100-400fps
sorry StefanG (you’re the fella who made the noise shader aye, sweet), no further info. it may have only been 2 years ago, my mind’s a bit hazy. i had a search, but now that flipcode is no more there’s not much to go on.
Originally posted by StefanG: Is there some other way of doing AA in procedural shaders on those chipsets?
There’s a chapter called “Fast Filter-Width Estimates with Texture Maps” in the first GPU Gems book. It describes how to calculate derivatives with clever texture sampling.