Perlin noise implementation in GLSL



StefanG
11-25-2004, 07:49 AM
I got tired of waiting for hardware noise() to
appear on Nvidia and ATI cards, so I sat down
and wrote my own implementation of 2D and 3D
Perlin noise in GLSL.

http://www.itn.liu.se/~stegu/TNM022-2004/source/GLSL-noise.zip

(NOTE: code updated December 1, 2004 to correct a small bug.)

It's not super fast on my sluggish GeForceFX 5600XT,
but it ought to be a lot faster on high-end ATI and
Nvidia cards. Post your performance numbers here if you like,
stating which card you tried it on. I'm very interested.

To get a fair performance figure, please press "W"
when the program starts to switch from the teapot
to a sphere; the teapot itself takes considerable
time to render with some cards and drivers.
(It's done the lazy GLUT way with glEvalMesh)

The supporting C code was written and compiled
for Windows, but it should be quite simple to
port it to any other operating system.
(I used GLFW, which can be found on SourceForge
for Windows, MacOS and various flavors of Unix.)

Stefan Gustavson

Ffelagund
11-25-2004, 11:04 AM
It works fine here (Wildcat Realizm 100), but your GLSL program has a few compilation errors (it does not conform to the spec).
First one: the frac function does not exist in GLSL; it is called fract.
Second one: you are using integers in floating-point expressions, and automatic promotion is not allowed, so this causes errors.
This is the line:
return t*t*t*(t*(t*6-15)+10);
and it should be
return t*t*t*(t*(t*6.0-15.0)+10.0);
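
For completeness, wrapped up as a function with float literals it would look something like this (just a sketch; the function name is illustrative and may be named differently in the actual shader):

float fade(float t)
{
    // Perlin's improved interpolant: 6t^5 - 15t^4 + 10t^3
    return t*t*t*(t*(t*6.0 - 15.0) + 10.0);
}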

It is difficult writing spec-conforming programs with the NV compiler ;)

Apart from that, it is pretty fast here :)

StefanG
11-25-2004, 11:13 AM
OK, thanks, fixed that and updated the archive.
The Nvidia compiler tolerates quite a lot of
things that go against (or at least beyond)
the spec.

Exactly what does "pretty fast" mean? Plus,
you're on a 3DLabs card, so you have native
hardware noise functions. Lucky bastard. :)

Ffelagund
11-25-2004, 11:23 AM
pretty fast means: sphere 150 fps and about 75 with the teapot :p

StefanG
11-25-2004, 11:49 AM
Ooooh. That sounds nice. Kind of makes you
wonder whether 3DLabs cards really have hardware
noise, or if they have implemented it as a shader
procedure that gets auto-included by the compiler.

For comparison, I get 28 fps with the sphere on
my Nvidia GeForce 5600XT. Not as much fun at all.

Stefan G

WyZ
11-25-2004, 01:19 PM
Hi,

I get 150 fps with the teapot, and 830 fps with the sphere on a Radeon X800 XT PE. My PC is an Athlon64 3500+.

Very nice, indeed! :)

ZbuffeR
11-25-2004, 02:45 PM
After forcing vsync off, my Geforce 6800 LE did :
145 fps with teapot
460 fps with sphere

Pretty nice stuff indeed.

WyZ
11-25-2004, 04:38 PM
I just tried it on a Radeon 9600 Pro, and it runs in software (0.1 fps) :(

I will try to make it run in hardware. I will keep you posted.

Cheers!

def
11-26-2004, 01:52 AM
Here are my results:

GeForceFX 5900:
---------------
Teapot 105 fps
Sphere 200 fps

QuadroFX 4000:
--------------
Teapot 130 fps
Sphere 940 fps

GeForceFX 6800GT(Ultra):
------------------------
Teapot 140 fps
Sphere 1060 fps

Interesting results...
(the famous "Best Performance" question)

StefanG
11-26-2004, 05:27 AM
Wow, thanks for all the good news, everyone!
I *really* need better hardware to do my GLSL
development. This is great. No need to wait
for hardware noise, I just need a better card!

The software emulation fallback for the ATI 9600
is probably due to too many dependent texture
lookups. For 3D noise I compute eight gradients,
each of which requires two texture lookups, one
dependent on the other.
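
In rough outline, each of those gradient fetches looks something like this (a simplified sketch only; the texture names, texture size and scaling here are illustrative, not copied from the actual shader in the zip):

uniform sampler2D permTexture; // permutation values (illustrative name)
uniform sampler2D gradTexture; // gradient directions (illustrative name)

// One of the eight corner contributions for 3D noise
float gradcontrib(vec3 Pi, vec3 Pf)
{
    // First lookup: permute the integer lattice coordinates
    float perm = texture2D(permTexture, Pi.xy / 256.0).a;
    // Second lookup, dependent on the first: fetch a gradient for this corner
    vec3 grad = texture2D(gradTexture, vec2(perm, Pi.z / 256.0)).rgb * 4.0 - 1.0;
    // Extrapolate: gradient dotted with the offset from the corner
    return dot(grad, Pf);
}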

You could try the 2D noise instead and see
whether that fits within the hardware limits.

Stefan

StefanG
11-26-2004, 05:54 AM
Each frame of the sphere animation has about
70,000 pixels of noise, so 1000 fps means over
70 M samples of noise per second. This is more
than one order of magnitude better than software
noise with Ken Perlin's fixed-point algorithm on
a fast CPU.

I guess there is headroom to do 4D noise as well.
Twice as much work per pixel. But I think I'll
look into doing simplex noise instead; it's
faster for higher dimensions.

Ffelagund
11-26-2004, 06:13 AM
Well, I have to say that my Realizm card is a pre-production board (I got it before there were any production boards), so it is not representative of Realizm cards. It has a lower clock frequency than 'real' production boards, so it is slower than the final cards. Soon I'll receive a production board and then I can talk about real speeds :)

Zeross
11-26-2004, 06:22 AM
Originally posted by StefanG:
you're on a 3DLabs card, so you have native
hardware noise functions. Lucky bastard. :)
I've tried replacing your noise function with the noise1 from GLSL and on my Realizm 200 it's slower than your implementation ;)
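
The swap itself is trivial, by the way; it is just a matter of calling the built-in function instead of the texture-based one, something along the lines of (v being whatever coordinate the shader already feeds to the noise function):

float n = noise1(v * 4.0); // built-in GLSL noise1(), returns a value in [-1, 1]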

Zeross
11-26-2004, 06:31 AM
Originally posted by Ffelagund:
Well, I have to say that my Realizm card is a pre-production board (I got it before there were any production boards), so it is not representative of Realizm cards. It has a lower clock frequency than 'real' production boards, so it is slower than the final cards. Soon I'll receive a production board and then I can talk about real speeds :)
My Realizm 200 is a final board and it's (a little bit) slower than your preproduction Realizm 100 :confused:

Do you know the clock frequency of your card and of retail cards ?

Sorry for the OT ;)

Ffelagund
11-26-2004, 09:44 AM
Sorry, I don't know those details. But I think that a Realizm 200 has the same clock speed as a Realizm 100. (As far as I know, the differences between the 100 and the 200 are that the 200 has genlock/framelock and more VRAM, but nothing about the clock.)

hdg
11-29-2004, 08:41 AM
This is interesting. I consistently get lower performance than others using a GeForceFX 6800 Ultra.

I got:
134 fps on the teapot
577 fps on the sphere

The teapot number is almost as high as the previous post, but the sphere number is significantly lower.

I have a dual Opteron 246 and use Forceware 70.41. I also have dual screens, each at 1280x1024 at 85 Hz.

Any idea why my system is so slow?

Aeluned
11-29-2004, 12:49 PM
hdg,

I've run into a similar situation before.
Do you see any improvement if you configure your system for 1 monitor?

p.s.: 140fps/980fps 6800GT Dual Xeon.

hdg
11-30-2004, 08:56 AM
That's curious...

I disabled the secondary display.
The teapot stayed at 134 fps, but the sphere jumped clear up to 1230 fps.

What is so different between these two models? I assume they both have a similar number of triangles in their meshes and that roughly the same number of pixels are being processed. What is so different?

I then reduced the refresh rate to 72 Hz and then 60 Hz and the render rates did not change at all.

The improvement from disabling the second output made me think that video refresh was interfering with rendering, but lowering the refresh rate had no effect. This doesn't add up in my brain.

Aeluned
11-30-2004, 09:16 AM
I don't have the technical explanation for this,
but I've found that disabling the secondary display improves performance.

Naturally there must be some overhead in configuring the video card to output to dual displays.

hdg
11-30-2004, 10:03 AM
I suspect the teapot is CPU limited rather than GPU limited, hence, no performance change when I change the video load.

But I don't understand why changing the refresh rate does not have any impact on performance. My default setting of dual screens at 85 Hz refresh and 1280x1024 resolution requires the video card to read and display 2*1280*1024*85 = 222,822,400 pixels per second. At 3 bytes each (for 24-bit color), this is about 668 Mbytes/sec. That uses up a good portion of the GPU memory bandwidth.

By dropping to one display I reduce that load by 50%, or 334 MB/sec.

If I then drop the refresh rate to 60 Hz, the video load drops to 235 MB/sec, but the rendering rate does not improve. I guess this suggests that with one display the video interference is no longer a factor on rendering performance.

I just tested dual monitors at 60 Hz and it ran at the same rate as 85 Hz. I expected that case to run a little bit faster. I guess the 'overhead' of video is more complex than simply computing the number of bytes read per field.

StefanG
11-30-2004, 11:39 AM
The video output bandwidth, i.e. for moving pixel data from the front buffer to the display, is probably separate from the rendering bandwidth, i.e. writing pixels to the back buffer. At least that's what I would expect from good, high performance double buffer hardware.

On the cards I have tried, configuring the desktop for a simple vertical or horizontal span over two identical monitors gives comparable performance to a single display, while using the NVidia feature DualView can give me a drop in performance to about 50%, at least if I run at higher resolutions. My guess is that the DualView driver renders the screen image to a separate pixel buffer and then copies the pixel data to the actual display buffer.

Regarding the teapot vs sphere performance:

If you have a look at my code, you can see that I was very lazy when I wrote the teapot rendering code. I just cut and pasted the GLUT code, which uses glEvalMesh() to render quads from a bicubic patch description. Unfortunately, the API entry glEvalMesh() is not hardware accelerated at all on most existing OpenGL implementations, so the teapot model is CPU limited. I tried creating a display list for the teapot, but the display list captures the glEvalMesh() call as such; it doesn't expand it into separate hardware-accelerated triangles.

Zeross
11-30-2004, 11:53 AM
Originally posted by Aeluned:
I don't have the technical explanation for this,
but I've found that disabling the secondary display improves performance.

Naturally there must be some overhead in configuring the video card to output to dual displays.
I've seen the same thing. With Doom 3 and some OpenGL demos from nVidia like Dawn or Dusk the frame rate on my system is really low, but if I disable the secondary display it's multiplied by 10.

Haven't seen the same thing with DirectX applications.

hdg
11-30-2004, 01:19 PM
Originally posted by StefanG:
The video output bandwidth, i.e. for moving pixel data from the front buffer to the display, is probably separate from the rendering bandwidth, i.e. writing pixels to the back buffer. At least that's what I would expect from good, high performance double buffer hardware.

On the cards I have tried, configuring the desktop for a simple vertical or horizontal span over two identical monitors gives comparable performance to a single display, while using the NVidia feature DualView can give me a drop in performance to about 50%, at least if I run at higher resolutions. My guess is that the DualView driver renders the screen image to a separate pixel buffer and then copies the pixel data to the actual display buffer.

"high performance double buffer hardware" went away several years ago. Video cards now have a unified memory which stores front and back buffers AND textures (along with other stuff) all in the same memory. Thus, the display refresh may impact rendering performance if the GPU is memory bandwidth limited.

I tried running in horizontal span mode and the sphere render rate went up to 1314 fps! That is better than with a single display.

This is really weird. I hope it is just a driver bug so that it can be fixed. Dualview and Horizontal Span both display two screens at 1280x1024 (in my case). Dualview renders to two independent windows of 1280x1024 each, while Horizontal Span renders to a single 2560x1024 window. The same number of total pixels, just different window settings. I would think that a fragment-program-limited application should run at about the same rate in either mode. But it doesn't, so your theory of nVidia doing a pixel copy seems to make sense, although I can't imagine why they would do that.

StefanG
12-01-2004, 03:44 AM
Just to let you know:

The noise shader code in the zip file has been updated somewhat, in response to a comment. I made a silly mistake and sampled the texture right at the edge between texels, which gave some visual glitches. The correct half-texel offset is in there now.
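
If you have copied the old shader, the change is essentially a half-texel offset on the lookup coordinates, something like this (the defines, the texture name and the 256x256 texture size are only for illustration; adjust to whatever your lookup texture actually uses):

#define ONE 0.00390625      // 1.0 / 256.0, one texel
#define ONEHALF 0.001953125 // 0.5 / 256.0, half a texel

uniform sampler2D permTexture; // illustrative name

// Inside the noise function: sample at texel centers instead of texel edges
vec2 uv = Pi.xy * ONE + ONEHALF; // Pi = integer lattice coordinate
float perm = texture2D(permTexture, uv).a;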

Also, the gradient texture used only the least significant bits and was basically black in RGB, which is a potential problem if texture compression kicks in and fudges small texel-to-texel differences. I scaled the values up to make them more robust. Nobody has reported a problem with this, but it seemed safer.

I also rewrote the comment header in the fragment shader to include a reference to your encouraging benchmarks.

V-man
12-01-2004, 03:10 PM
Originally posted by hdg:
"high performance double buffer hardware" went away several years ago. Video cards now have a unified memory which stores front and back buffers AND textures (along with other stuff) all in the same memory. Thus, the display refresh may impact rendering performance if the GPU is memory bandwidth limited.
It should not affect it much, since current memory has bandwidth in the gigabytes/sec.

jwatte
12-01-2004, 06:32 PM
I think 3dlabs still sells hardware with separate framebuffer and texture memory (and the texture memory is fully virtualized, so they page in only what's needed).

Anyway, the point is good: the Radeon 9700 Pro came out over two years ago, and had 20 GB/s memory bandwidth at the time. It's doubled since then, more or less (GF 6800 Ultra is rated at 35 GB/s).