I got tired of waiting for hardware noise() to
appear on Nvidia and ATI cards, so I sat down
and wrote my own implementation of 2D and 3D
Perlin noise in GLSL.
(NOTE: code updated December 1, 2004 to correct a small bug.)
It’s not super fast on my sluggish GeForceFX 5600XT,
but it ought to be a lot faster on high-end ATI and
Nvidia cards. Post your performance numbers here if you like,
stating which card you tried it on. I’m very interested.
To get a fair performance figure, please press “W”
when the program starts to switch from a teapot
to a sphere; the teapot itself takes considerable
time to render with some cards and drivers.
(It’s done the lazy GLUT way with glEvalMesh.)
The supporting C code was written and compiled
for Windows, but it should be quite simple to
port it to any other operating system.
(I used GLFW, which can be found on SourceForge
for Windows, MacOS and various flavors of Unix.)
It works fine here (Wildcat Realizm 100), but your GLSL program has a few compilation errors (it does not conform to the spec).
First one: the frac function does not exist in GLSL; it is fract.
Second one: you are using integers in floating-point expressions, and that implicit promotion is not allowed, so this causes errors.
This is the line: return ttt*(t*(t*6-15)+10);
and it should be: return ttt*(t*(t*6.0-15.0)+10.0);
It is difficult to write spec-conformant programs with the NV compiler.
Ooooh. That sounds nice. Kind of makes you
wonder whether 3DLabs cards really have hardware
noise, or if they have implemented it as a shader
procedure that gets auto-included by the compiler.
For comparison, I get 28 fps with the sphere on
my Nvidia GeForce 5600XT. Not as much fun at all.
Wow, thanks for all the good news, everyone!
I really need better hardware to do my GLSL
development. This is great. No need to wait
for hardware noise, I just need a better card!
The software emulation fallback for the ATI 9600
is probably due to too many dependent texture
lookups. For 3D noise I compute eight gradients,
each of which requires two texture lookups, one
dependent on the other.
You could try the 2D noise instead and see
whether that fits within the hardware limits.
Each frame of the sphere animation has about
70,000 pixels of noise, so 1000 fps means over
70 M samples of noise per second. This is more
than one order of magnitude better than software
noise with Ken Perlin’s fixed-point algorithm on
a fast CPU.
I guess there is headroom to do 4D noise as well.
Twice as much work per pixel. But I think I’ll
look into doing simplex noise instead; it’s
faster for higher dimensions.
Well, I have to say that my Realizm card is a pre-production board (I got it before there were any production boards), so it is not representative of Realizm cards. It has a lower clock frequency than ‘real’ production boards, so it is slower than final cards. Soon I’ll receive a production board and then I can report real speeds.
Originally posted by StefanG:
you’re on a 3DLabs card, so you have native
hardware noise functions. Lucky bastard.
I’ve tried replacing your noise function with the noise1 from GLSL, and on my Realizm 200 it’s slower than your implementation.
Originally posted by Ffelagund: Well, I have to say that my Realizm card is a pre-production board (I got it before there were any production boards), so it is not representative of Realizm cards. It has a lower clock frequency than ‘real’ production boards, so it is slower than final cards. Soon I’ll receive a production board and then I can report real speeds.
My Realizm 200 is a final board and it’s (a little bit) slower than your preproduction Realizm 100
Do you know the clock frequency of your card and of retail cards?
Sorry, I don’t know those details. But I think a Realizm 200 has the same clock speed as a Realizm 100. (As far as I know, the differences between the 100 and the 200 are that the 200 has genlock/framelock and more VRAM, but nothing about the clock.)
I disabled the secondary display.
The teapot stayed at 134 fps, but the sphere jumped clear up to 1230 fps.
What is so different between these two models? I assume they both have a similar triangle count, and roughly the same number of pixels are being processed.
I then reduced the refresh rate to 72 Hz and then 60 Hz and the render rates did not change at all.
The improvement from disabling the second output made me think that video refresh was interfering with rendering, but lowering the refresh rate had no effect. This doesn’t add up in my brain.
I suspect the teapot is CPU limited rather than GPU limited, hence, no performance change when I change the video load.
But I don’t understand why changing the refresh rate does not have any impact on performance. My default setting of dual screens at 85 Hz refresh and 1280x1024 resolution requires the video card to read and display 2 x 1280 x 1024 x 85 = 222,822,400 pixels per second. At 3 bytes each (for 24-bit color), this is about 668 MB/sec. That uses up a good portion of the GPU memory bandwidth.
By dropping to one display I reduce that load by 50%, or 334 MB/sec.
If I then drop the refresh rate to 60 Hz, the video load drops to 235 MB/sec, but the rendering rate does not improve. I guess this suggests that with one display, video refresh is no longer a factor in rendering performance.
I just tested dual monitors at 60 Hz and it ran at the same rate as 85 Hz. I expected that case to run a little bit faster. I guess the ‘overhead’ of video is more complex than simply computing the number of bytes read per field.