I got tired of waiting for hardware noise() to
appear on Nvidia and ATI cards, so I sat down
and wrote my own implementation of 2D and 3D
Perlin noise in GLSL.
(NOTE: code updated December 1, 2004 to correct a small bug.)
It’s not super fast on my sluggish GeForceFX 5600XT,
but it ought to be a lot faster on high-end ATI and
Nvidia cards. Post your performance numbers here if you like,
stating which card you tried it on. I’m very interested.
To get a fair performance figure, please press “W”
when the program starts to switch from a teapot
to a sphere; the teapot itself takes considerable
time to render with some cards and drivers.
(It’s done the lazy GLUT way with glEvalMesh.)
The supporting C code was written and compiled
for Windows, but it should be quite simple to
port it to any other operating system.
(I used GLFW, which can be found on SourceForge
for Windows, MacOS and various flavors of Unix.)
It works fine here (Wildcat Realizm 100), but your GLSL program has a few compilation errors (it does not conform to the spec).
First one: the frac function does not exist in GLSL; it is fract.
Second one: you are using integers in floating-point expressions, and that implicit promotion is not allowed, so this causes errors.
This is the line: return ttt*(t*(t*6-15)+10);
and it should be: return ttt*(t*(t*6.0-15.0)+10.0);
It is difficult to write spec-conformant programs with the NV compiler.
Ooooh. That sounds nice. Kind of makes you
wonder whether 3DLabs cards really have hardware
noise, or if they have implemented it as a shader
procedure that gets auto-included by the compiler.
For comparison, I get 28 fps with the sphere on
my Nvidia GeForce 5600XT. Not as much fun at all.
Wow, thanks for all the good news, everyone!
I really need better hardware to do my GLSL
development. This is great. No need to wait
for hardware noise, I just need a better card!
The software emulation fallback for the ATI 9600
is probably due to too many dependent texture
lookups. For 3D noise I compute eight gradients,
each of which requires two texture lookups, one
dependent on the other.
You could try the 2D noise instead and see
whether that fits within the hardware limits.
Each frame of the sphere animation has about
70,000 pixels of noise, so 1000 fps means over
70 M samples of noise per second. This is more
than one order of magnitude better than software
noise with Ken Perlin’s fixed-point algorithm on
a fast CPU.
I guess there is headroom to do 4D noise as well.
Twice as much work per pixel. But I think I’ll
look into doing simplex noise instead; it’s
faster for higher dimensions.
Well, I have to say that my Realizm card is a pre-production board (I got it before there were any production boards), so it is not representative of Realizm cards. It has a lower clock frequency than ‘real’ production boards, so it is slower than final cards. Soon I’ll receive a production board and then I can report real speeds.
Originally posted by StefanG:
you’re on a 3DLabs card, so you have native
hardware noise functions. Lucky bastard.
I’ve tried replacing your noise function with the noise1 from GLSL, and on my Realizm 200 it’s slower than your implementation.
Originally posted by Ffelagund: Well, I have to say that my Realizm card is a pre-production board (I got it before there were any production boards), so it is not representative of Realizm cards. It has a lower clock frequency than ‘real’ production boards, so it is slower than final cards. Soon I’ll receive a production board and then I can report real speeds.
My Realizm 200 is a final board and it’s (a little bit) slower than your preproduction Realizm 100
Do you know the clock frequency of your card and of retail cards?
Sorry, I don’t know those details. But I think a Realizm 200 has the same clock speed as a Realizm 100. (As far as I know, the differences between the 100 and the 200 are that the 200 has genlock/framelock and more VRAM, but nothing about the clock.)
I disabled the secondary display.
The teapot stayed at 134 fps, but the sphere jumped clear up to 1230 fps.
What is so different between these two models? I assume they both have a similar triangle count, and roughly the same number of pixels are being processed.
I then reduced the refresh rate to 72 Hz and then 60 Hz and the render rates did not change at all.
The improvement from disabling the second output made me think that video refresh was interfering with rendering, but lowering the refresh rate had no effect. This doesn’t add up in my brain.
I suspect the teapot is CPU limited rather than GPU limited, hence, no performance change when I change the video load.
But I don’t understand why changing the refresh rate does not have any impact on performance. My default setting of dual screens at 85 Hz refresh and 1280x1024 resolution requires the video card to read and display 2 x 1280 x 1024 x 85 = 222,822,400 pixels per second. At 3 bytes each (for 24-bit color), this is about 668 MB/sec. That uses up a good portion of the GPU memory bandwidth.
By dropping to one display I reduce that load by 50%, or 334 MB/sec.
If I then drop the refresh rate to 60 Hz, the video load drops to 235 MB/sec, but the rendering rate does not improve. I guess this suggests that with one display, video refresh is no longer a factor in rendering performance.
I just tested dual monitors at 60 Hz and it ran at the same rate as 85 Hz. I expected that case to run a little bit faster. I guess the ‘overhead’ of video is more complex than simply computing the number of bytes read per field.