GLSL noise fail? Not necessarily!



StefanG
03-14-2011, 07:33 AM
I read this somewhat disheartening summary in the slides from the recent GDC presentation by Bill Licea-Kane:

"Noise - Fail!"

However, that does not need to be the case any longer. Recent development by Ian McEwan at Ashima Art has given us a new take on hardware-friendly noise in GLSL:

https://github.com/ashima/webgl-noise

It might not seem like much, but his algorithm has all the hardware-friendly properties you want, some of which my old GLSL simplex noise demo was missing. In summary, it's fast, it's a simple include (no dependencies on texture data or uniform arrays), it runs in GLSL 1.20 and up (OpenGL 2.1, WebGL) and it scales well to massively parallel execution because there are no memory access bottlenecks.

Concerning this, I would like to get in touch with some people in the Khronos GLSL workgroup. I was last involved in this around 2003-2004, and my contact list is badly outdated. Are any of the good people in the GLSL WG reading this? My email address is "stegu@itn.liu.se", if you want to keep this private. Just please respond, as I think this is great news.

/Stefan Gustavson

randall
03-14-2011, 11:11 AM
Great stuff. I will try it. Thanks for the info.

StefanG
03-14-2011, 05:45 PM
For those who want an easy to run demo:

http://www.itn.liu.se/~stegu/simplexnoise/GLSL-noise-ashima.zip

With the default window size, the sphere covers about 70K pixels, so multiply the frame rate by 70,000 to get the number of noise samples per second. On my ATI Radeon HD 4850, I get 5700 FPS, which translates to about 400 Msamples/s. Whee!

Windows and Linux compatible source code. (Untested on Linux, but it should compile and run without changes.) Windows binary (.exe) supplied for your convenience. Uses only OpenGL 2.1 and GLSL 1.20, so it should compile under MacOS X 10.5 as well, if you either run it from the command line or create an application bundle and change the file paths for the shader files to point to the right place, e.g. "../../../GLSL-ashimanoise.frag" instead of "GLSL-ashimanoise.frag". You also need the library GLFW to compile the demo yourself (see www.glfw.org).

trinitrotoluene
03-14-2011, 06:39 PM
I tried the Windows binary on Linux through Wine and it ran great. The only "problem" is that the compiler issues a warning for the fragment shader: WARNING: 0:252: warning(#288) Divide by zero error during constant folding.



vec4 ip = 1.0 / vec4(pParam.w*pParam.w*pParam.w,
                     pParam.w*pParam.w,
                     pParam.w, 0.);


Of course, changing the last parameter of the vec4 to any number other than 0 removes the warning and does not modify the noise texture on the sphere.

StefanG
03-15-2011, 02:17 AM
Good catch. Of course the 0. should be 1., although it does not really come into play in the calculations. I'll make sure to tell Ian and ask him to update his code as well.
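For reference, the corrected line in the shader then reads as follows (only the last component changes; pParam is the uniform already used throughout the shader):

vec4 ip = 1.0 / vec4(pParam.w*pParam.w*pParam.w,
                     pParam.w*pParam.w,
                     pParam.w, 1.);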

randall
03-15-2011, 05:09 AM
When I run the GLSL-ashimanoise demo I get a 'Fragment shader compile error:' message, but the program runs OK.
(Windows 7, Geforce GTX 470).

StefanG
03-15-2011, 06:12 AM
That is probably the bug mentioned above. I have now fixed that in the demo. I have also cleaned up the GLSL code a little.

My own C code was also cleaned up. It's still a hack, but it's not quite as ugly anymore.

remdul
03-15-2011, 08:22 AM
Nice work.


"Noise - Fail!"
So the "return 0.0" wasn't due to IP issues but plain old laziness? Interesting twist.

StefanG
03-15-2011, 10:00 AM
Because of my long-standing interest in noise, I have had some insight into the painful and drawn-out process of implementing noise() in GLSL. I would venture a guess that the problems have not been primarily due to licensing or patent issues, but rather to the lack of a good enough candidate, and a resulting fear of premature standardization.

A noise() function that gets implemented as part of GLSL needs to be very hardware friendly. Previous attempts have been lacking in at least some respects. Ian's code removes two memory accesses in a very clever way, by introducing a permutation polynomial and creating an elegant mapping from an integer to a 2D, 3D or 4D gradient. This is the first time I have seen a clear candidate for a standard noise() function that both runs well as stand-alone shader code *and* scales well to a massively parallel hardware implementation.

Also, a standard noise() implementation will need to remain reasonably stable over time. You can't expect people to create real time shaders using one version of noise() only to have them look different when a slightly better but different version shows up in the next generation of hardware.

In short, a standard noise() needs to be hardware *and* software friendly, to enable an efficient implementation in silicon but also allow for a shader fallback with good performance. A standard also needs to be good enough to keep around for a long time. This code delivers on both counts, I think.

ZbuffeR
03-15-2011, 10:47 AM
Very interesting.

one version of noise() only to have them look different when a slightly better but different version shows up
To me this is the biggest problem.
Indeed, I saw some problems in RenderMan-compliant pipelines because noise() was implemented differently.

Having a 'custom noise done in GLSL code' makes your shader deterministic and more portable.

StefanG
03-15-2011, 12:53 PM
Having a 'custom noise done in GLSL code' makes your shader deterministic and more portable.

Agreed, and this is what this code does well, right now. The fact that it also manages with GLSL 1.20 amazes me, because that means direct portability across the board: OpenGL, WebGL, OpenGL ES.

For me, the ideal situation would be to have a choice between a "shader noise", where you had complete control and perfect repeatability across platforms, and a ten times faster "hardware noise" that came at about the same cost as a texture lookup, to use for shaders with lots of noise components. In software shading, it is both common and useful to have dozens of noise components in a shader.

(Anybody from the Khronos GLSL workgroup reading this?)

Alfonse Reinheart
03-15-2011, 01:19 PM
a ten times faster "hardware noise" that came at about the same cost as a texture lookup, to use for shaders with lots of noise components.

I seriously doubt that IHVs are going to start building hardware components for doing noise computations. Hardware makers have been moving away from dedicated hardware functionality for some time, to the point where texture unit hardware on some platforms doesn't even do filtering logic anymore.


Anybody from the Khronos GLSL workgroup reading this?

And what if they are? The ARB only controls the specification, and the specification already has noise functions and explains what they should do. The ARB cannot make implementers implement the noise function with any particular algorithm; the IHVs have to do that themselves.

StefanG
03-15-2011, 05:11 PM
The reason I want to get in touch with the GLSL workgroup was mentioned in my original post. I had a long and interesting discussion with them a few years back on hardware-friendly noise algorithms. That time, it stumbled partly on not having a good enough algorithm to recommend as the standard, partly on not having enough processing power in a typical low end GPU. Circumstances have now changed, and I would like them to at least consider reopening the discussion. The GLSL specification is currently way too unspecific on what to implement for noise(), and nobody seems to want to take the first step. I think a recommendation and a reference implementation would do some good towards making that happen.

You are of course right in saying that nobody can force HW manufacturers to support noise in GLSL, but at least they can be handed a well proven algorithm that is recommended as the standard, and get a reference implementation in GLSL code with good performance that could be silently included by the driver when a call to noise() is found in a shader. The market could take it from there. If people start using noise for visual complexity, hardware will be designed to speed it up. Seeing how tremendously important noise is in software shading, I think it could be pretty useful for hardware shading as well.

You are probably right in counting out a custom hardware "noise unit", at least for the near future, but one way forward from here would simply be to allow a shift in balance between texture lookup bandwidth and ALU speed for upcoming generations of GPU hardware. Procedural texturing means you no longer have to scale the texture access bandwidth with the number of GPU execution units, because many execution units can be put to good use without using any texture bandwidth at all.

(Edit: checking my mail logs, it seems I actually had that noise discussion with the OpenGL ES workgroup, not the GLSL workgroup. Sorry for any confusion. My point still stands: now is a good time to discuss this in more detail.)

Alfonse Reinheart
03-15-2011, 06:17 PM
I think a recommendation and a reference implementation would do some good towards making that happen.

Neither of which is appropriate for the OpenGL specification. Saying that algorithm X must be used limits GL implementations, and the OpenGL spec does its best to avoid that kind of specificity. The most the spec says is that the results should be no different than if algorithm X were used. This is why the spec is generally lax about what anisotropic filtering really does, and why it leaves a lot of leeway for multisampling implementations.

I'm all for the IHVs using this for their noise function (though they'll have to work out how compatible it is with the Artistic License 2.0). It would be great if they would actually go and implement the noise functions rather than have them return zero. But I'm not in favor of the ARB putting in the spec itself that this algorithm must be used for noise functions. That sets a dangerous precedent, and also may run afoul of IP.


The market could take it from there.

That's something I don't understand. Sure, noise-based textures can be useful in certain circumstances. But noise is not going to be the basis of most textures in, for example, games. So even if you had a single-cycle noise function, it's still not going to produce results that are as good for most cases as a well-drawn texture.


Procedural texturing means you no longer have to scale the texture access bandwidth with the number of GPU execution units, because many execution units can be put to good use without using any texture bandwidth at all.

It also means that you have to do anisotropic filtering yourself. For many textures (diffuse maps and such), I'm not sure I consider that a good tradeoff.

Now, one thing that could make it something of a good tradeoff is the current and upcoming series of on-CPU GPUs (Intel's various "bridges" and AMD's Fusion). Texture memory bandwidth takes a significant hit, so the best way to compensate is to increase shader complexity to compensate. Also, these really impact deferred rendering.

StefanG
03-16-2011, 02:26 AM
You seem to underestimate the utility of procedural shading.
True, I am a procedural geek and may be biased the other way, but you can't fully emulate a procedural pattern with a drawn texture. (The reverse is also true - they solve different problems.) Procedural patterns give you unlimited resolution, infinite texture size without tiling, arbitrary non-repeating pattern animation, enormous flexibility and variation without redrawing a bitmap, analytic derivatives to simplify and improve anisotropic filtering, and a very compact means for expressing visually complex patterns. Noise is seldom used by itself, but it is a very good complement to bitmap textures and more regular procedural patterns like contours and periodic patterns. Turbulent phenomena like water, smoke, fire, stone, dirt and mountains can also be done better if cheap procedural noise is in the toolbox. And you can save a lot on storage and texture memory bandwidth if you use it right. (I could go on, but you get the picture.)

Noise is a fundamental and very heavily used part of software shading, and the visual complexity of offline rendered SFX is very much due to procedural noise. Software shaders use drawn textures too, but procedural methods are very popular as an alternative and a complement.

All I'm saying is nothing has happened for a decade now, so perhaps the spec needs a more clear pointer to what should actually be implemented for the currently broken part of GLSL that is noise(). No vendor is implementing it to spec. I would like that to change, and I would be willing to spend significant amounts of work on it.

Regarding your concerns for IP problems, this is software, so patents are not a concern. If the current license is unsuitable, copyrights can be renegotiated (the author is very much in the loop here) or worked around by a re-implementation. There are no trade secrets, because the code is published openly. The underlying math is not protectable. "Noise" is a generic word and not a registered trademark. What problems do you see? And why should we avoid discussing how to make things better just because there might be IP concerns? If we are so afraid of taking a step forward, we will never get anywhere.

Alfonse Reinheart
03-16-2011, 03:50 AM
infinite texture size without tiling

Admittedly, I'm not exactly fully versed on noise functions, but don't the various different methods of computation become less stable as you get farther from the origin? It would be interesting to see how far you can go from the origin with this noise function before it starts not returning good results.


All I'm saying is nothing has happened for a decade now, so perhaps the spec needs a more clear pointer to what should actually be implemented for the currently broken part of GLSL that is noise(). No vendor is implementing it to spec.

True, but the reason they're not doing it is not that the specification is wrong, bad or poorly specified. Everyone knows what the noise functions should do, and the spec provides a reasonable description of this while allowing IHVs the freedom to implement different algorithms.

They initially didn't implement them because older hardware was flat-out incapable of it. Even using this algorithm, I don't think there are enough ALU instructions on a Radeon 9700 or even a GeForce FX to actually make it work. Now on actually good hardware, it's more a matter of nobody caring. Work is spent on things people use, not things people might use.

Also, GLSL is not a decade old. It's only been around a half-decade.

To be honest, I would go so far as to say that "noise" is not something that IHVs should be providing at all. It's just too high-level. In general, you want your noise-based images to be cross-platform. And the OpenGL spec will not (and should not) guarantee a specific noise implementation.

remdul
03-16-2011, 05:24 AM
You seem to underestimate the utility of procedural shading.[...]
One should add that, if the noise functions had actually been implemented from the start, they would likely be widely used. We'd be discussing the need for consistency across hardware/vendors instead, because noise would be so important to us.


don't the various different methods of computation become less stable as you get farther from the origin
Even if they do, it would still be at much larger scales than a noise lookup image would currently allow.


concerns for IP problems
Ken Perlin's patent probably covers this as it is pretty broad, and I'm sure he specifically registered it so that others could implement noise functionality freely. My earlier comment was only half serious; I don't think IP is an issue.

StefanG
03-16-2011, 07:00 AM
This is kind of a chicken and egg problem. Before noise is available nobody will use it, and there is no way of knowing for certain what people will do with it if it is made available. However, looking at its vast popularity with the RenderMan crowd, it seems like a pretty good idea to just go ahead and implement it. Recommending an algorithm and providing a reference implementation with good performance is a good start. Even if Khronos might not be the entity to formally decide on such matters, it is now time to open up the discussion. (Which is exactly what we are doing now, by the way.) My concern here is that classic Perlin noise was never standardized, and I have seen a lot of the problems caused by that. We could spare the GLSL crowd from repeating the same mistakes.

I was unclear on that "decade of not having noise in hardware". GLSL is not yet a decade old, but shader-capable hardware was introduced in 2002. I have been doing hardware accelerated procedural textures since then.

Your remark on "infinite size" is certainly correct. The size of the useful support domain ultimately depends on a floating point precision or fixed point range, but that is the case even for vertex position data and ordinary interpolated texture coordinates, so I was equating "floating point precision limited" to "infinite" to get a point across without going into a lot of detail. My apologies if I came across as bending the truth or being sloppy.

Alfonse Reinheart
03-16-2011, 12:17 PM
One should add that, if the noise functions were actually implemented from the start, it would likely be widely used.

Noise simply was not practical until relatively recent hardware (GL 3.x level). Even if the hardware could have done it before, it would have taken up most of your available ALUs, killing performance. And even on modern hardware, you'd need a fairly beefy GPU to be able to use it freely without dropping performance.


This is kind of a chicken and egg problem. Before noise is available nobody will use it

I disagree with that to an extent. If someone wanted to use noise, they have been free to implement it, whether with this algorithm or with another. And while this algorithm is certainly less resource-intensive than previous examples, it's still going to compile down to a lot of ALUs.

Therefore, use of it is primarily governed by performance. If this algorithm spurs people to use more noise, it will only be because it is faster than previous ones.

StefanG
03-16-2011, 05:18 PM
If this algorithm spurs people to use more noise, it will only be because it is faster than previous ones.

Not quite. Please read my original post. First and foremost, this version is a lot more convenient to use, as it is a pure computation without lookup tables. You just include a piece of code in your shader and call a function - no textures to create and load, no uniform arrays to initialize. This is a big improvement over previous versions. It is a true novelty and what I would consider the key feature. The algorithm is actually somewhat slower than my old demo on current mid-range hardware, but it scales better to the massive parallelism in today's high-end hardware, where memory bandwidth is a bottleneck.


you'd need a fairly beefy GPU to be able to use it freely without dropping performance.
Please be reasonable in your demands on a noise algorithm. Noise can be very useful even if it competes for resources with other rendering tasks. It simply makes some things look better, and it can be worth the effort. Hardware rendering is mostly a tradeoff between quality and speed, and procedural shading is not a magic exception. Noise is available as one possible tool when building a shader, but of course it requires some resources.

I agree that until now, we have not quite seen the levels of GPU performance where you could allow routine use of procedural noise, but the situation is improving rapidly, and memory is becoming the bottleneck, further adding to the benefits of procedural shading.

Before you criticize the algorithm for requiring too much ALU resources to be useful, please look at the code. The number of computations required is not as huge as you may think. Benchmarking this particular implementation on a GeForce GTX560, I clocked it to around 500 million 3D noise samples per second, with no texture resources being used. That gives plenty of headroom for other more traditional shading tasks as well, don't you think?

I stand firmly by my opinion that procedural shading is a smart thing to do in many situations, and that using it more would create a slightly different and easier path forward for future GPU hardware. Texture bandwidth could become less of a problem.

Alfonse Reinheart
03-16-2011, 06:06 PM
First and foremost, this version is a lot more convenient to use, as it is a pure computation without lookup tables. You just include a piece of code in your shader and call a function - no textures to create and load, no uniform arrays to initialize. This is a big improvement over previous versions.

If you're making a high-performance application, it's going to be inconvenient to you in many ways. The overhead of setting up a texture or uniform array will be negligible compared to the general issues of managing a high-performance rendering engine.

Or, to put it another way, the inconvenience of using textures or uniform arrays or whatever is not the reason why noise functions have not gained widespread use in shaders. Performance is the reason.


Please be reasonable in your demands on a noise algorithm. Noise can be very useful even if it competes for resources with other rendering tasks. It simply makes some things look better, and it can be worth the effort. Hardware rendering is mostly a tradeoff between quality and speed, and procedural shading is not a magic exception. Noise is available as one possible tool when building a shader, but of course it requires some resources.

All I'm saying is that the resources/performance it requires is not yet paid for by the quality improvements. Not for applications that need every GPU cycle they can get.


Before you criticize the algorithm for requiring too much ALU resources to be useful, please look at the code. The number of computations required is not as huge as you may think. Benchmarking this particular implementation on a GeForce GTX560, I clocked it to around 500 million 3D noise samples per second, with no texture resources being used. That gives plenty of headroom for other more traditional shading tasks as well, don't you think?

Let's take your 500 million samples per second number. Divide that by 60 frames per second; you get 8.3 million samples per frame. Divide that by a quite common 1920x1080 resolution, and you get 4 samples per image pixel. It's even worse if you go up to 2560x1600, where you drop to two samples per pixel.

That pretty much requires deferred rendering now, since you can't afford to have more than 4x overdraw. It also means that you don't have the resources to do much anisotropic filtering, so you're going to get quite a bit of aliasing in your texture.

And this doesn't even take into account processor resources dedicated to other things, like lighting, vertex processing, and so forth. So in order to use even 1 noise sample per image pixel, you have to sacrifice 25% of the hardware's shader resources.

The GTX 560 is upper-midgrade hardware; most graphics hardware is considerably slower. Obviously, graphics hardware gets faster all the time, but the performance from noise simply isn't there yet. Not unless you focus solely on the high end.

So I stand by my statement: "you'd need a fairly beefy GPU to be able to use it freely without dropping performance."

kRogue
03-17-2011, 03:22 PM
Let's do an operation count on the posted GLSL simplexNoise2 function, which I quote here with some #ifdefs removed, reflecting the way I'd use it:



float taylorInvSqrt(float r)
{
  return ( 0.83666002653408 + 0.7*0.85373472095314 - 0.85373472095314 * r );
}

float permute(float x0, vec3 p) {
  float x1 = mod(x0 * p.y, p.x);
  return floor( mod( (x1 + p.z) * x0, p.x ));
}
vec2 permute(vec2 x0, vec3 p) {
  vec2 x1 = mod(x0 * p.y, p.x);
  return floor( mod( (x1 + p.z) * x0, p.x ));
}
vec3 permute(vec3 x0, vec3 p) {
  vec3 x1 = mod(x0 * p.y, p.x);
  return floor( mod( (x1 + p.z) * x0, p.x ));
}
vec4 permute(vec4 x0, vec3 p) {
  vec4 x1 = mod(x0 * p.y, p.x);
  return floor( mod( (x1 + p.z) * x0, p.x ));
}

float simplexNoise2(vec2 v)
{
  const vec2 C = vec2(0.211324865405187134,  // (3.0-sqrt(3.0))/6.;
                      0.366025403784438597); // 0.5*(sqrt(3.0)-1.);
  const vec3 D = vec3( 0., 0.5, 2.0) * 3.14159265358979312;

  // First corner
  vec2 i  = floor(v + dot(v, C.yy) );
  vec2 x0 = v - i + dot(i, C.xx);

  // Other corners
  vec2 i1 = (x0.x > x0.y) ? vec2(1., 0.) : vec2(0., 1.);

  // x0 = x0 - 0. + 0. * C
  vec2 x1 = x0 - i1 + 1. * C.xx;
  vec2 x2 = x0 - 1. + 2. * C.xx;

  // Permutations
  i = mod(i, pParam.x);
  vec3 p = permute( permute(
             i.y + vec3(0., i1.y, 1. ), pParam.xyz)
           + i.x + vec3(0., i1.x, 1. ), pParam.xyz);

  // ( N points uniformly over a line, mapped onto a diamond.)
  vec3 x = fract(p / pParam.w);
  vec3 h = 0.5 - abs(x);

  vec3 sx = vec3(lessThan(x, D.xxx)) * 2. - 1.;
  vec3 sh = vec3(lessThan(h, D.xxx));

  vec3 a0 = x + sx*sh;
  vec2 p0 = vec2(a0.x, h.x);
  vec2 p1 = vec2(a0.y, h.y);
  vec2 p2 = vec2(a0.z, h.z);

#ifdef NORMALISE_GRADIENTS
  p0 *= taylorInvSqrt(dot(p0, p0));
  p1 *= taylorInvSqrt(dot(p1, p1));
  p2 *= taylorInvSqrt(dot(p2, p2));
#endif

  vec3 g = 2.0 * vec3( dot(p0, x0), dot(p1, x1), dot(p2, x2) );

  // mix
  vec3 m = max(0.5 - vec3(dot(x0, x0), dot(x1, x1), dot(x2, x2)), 0.);
  m = m*m;
  return 1.66666 * 70. * dot(m*m, g);
}


Let's add up the operation counts, and then see how worthy it is (without having NORMALISE_GRADIENTS defined):
9 dots, 5 mods, 4 floors, 1 fract, and quite a few vec2 adds and multiplies (counting them all is too much).
Notice that there are no "nasty expensive transcendental operations". The thing is, this is likely fast enough to run on embedded hardware such as ARM Mali, NVIDIA Tegra and PowerVR... not the bottom end of those, but the "high end" of each. I am not talking 60 Hz kind of performance, but bearable nevertheless. For these gizmos, bandwidth is at times horribly limited, much less bandwidth than on desktop. Almost all of them have a unified memory model and the caches are not insanely huge, so a texture lookup has a lot of latency. On these gizmos, just enabling mipmap filtering gives a dramatic improvement in performance. Really freaking huge. To make this happier on some of those GPUs, some massaging to use mediump (and even lowp) will help as well (Mali does not even support highp in the fragment shader anyway).
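For readers not familiar with GLSL ES, a minimal sketch of the kind of precision massaging meant here is below. Whether mediump (or lowp) is actually sufficient for all the intermediate values in this noise code is an assumption that would need testing on the target GPU, not something established in this thread:

#ifdef GL_ES
precision mediump float; // default float precision for the whole fragment shader
#endif
// Individual declarations can still request a different precision, e.g.:
// lowp vec3 cheapWeights;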

I think some of it can be optimized a touch more in the case where NORMALISE_GRADIENTS is not defined... it looks like (though I have not taken the time to do it) that:



vec3 a0 = x + sx*sh;
vec2 p0 = vec2(a0.x,h.x);
vec2 p1 = vec2(a0.y,h.y);
vec2 p2 = vec2(a0.z,h.z);
vec3 g = 2.0 * vec3( dot(p0, x0), dot(p1, x1), dot(p2, x2) );


the calculation of g can be jazzed up to look like MADs rather than dots (which for some hardware gives a significant performance increase when one drops to mediump).
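As a rough algebraic illustration only (not necessarily the refactoring Ian later committed), the three dots can be expanded into component-wise multiply-adds over the vec3s from the excerpt above:

vec3 g = 2.0 * (a0 * vec3(x0.x, x1.x, x2.x)
              + h  * vec3(x0.y, x1.y, x2.y));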

At any rate, I think this is pretty spiffy and I intend to try it out at work on PowerVR and Mali soon.

StefanG
03-19-2011, 05:04 AM
When you qualify your arguments like that, I agree with you. Thanks for the patience and constructive thought you put into discussing this. Given the other recent comment on low-end hardware with bad memory bandwidth, I think we can conclude that noise is sometimes useful, but of course it depends on the situation at hand. My opinion is that procedural shading for real-time rendering is finally a real possibility, and my bet is that it is likely to become more interesting in the near future.

BTW, your numbers become less depressing if you consider that a typical scene need not use noise for every surface on the screen. It's a special effect, not a universal tool. Yet.

lasHG
03-19-2011, 04:34 PM
Really a nice noise.
I would like to discuss some major speed improvements with the original author but I can't find any contact information.

--las/Mercury

EDIT:
http://research.mercury-labs.org/noise.glsl (only touched simplexNoise3 so far - search for "LAS_OPT" for the changes)

IanMc
03-19-2011, 08:27 PM
I'm feeling like I need to get some documentation done!


to calculate g can be jazzed up to look like a MAD's rather than dot's (which for some hardware gives a significant performance increases when one drops to mediump).
I've re-factored noise2D to do this, and the opcode count is the same for the non-normalised version, and a few % smaller for the normalised one. I'll push a LESS_DOTS version to the repository as soon as I have made sure I didn't break something else in the process!


I would like to discuss some major speed improvements with the original author but I can't find any contact information.

I hope not that major! Drop me a PM.

lasHG
03-19-2011, 09:00 PM
Done. :)
With the proposed changes I get at least ~4900 fps on my GTS 450 compared to ~4500 fps without the changes.

EDIT:
Just for the discussion: A noise provided by the hardware and the GLSL specification would be a really great thing, especially a 4D noise (try to store that as a texture...).

IanMc
03-19-2011, 09:21 PM
http://research.mercury-labs.org/noise.glsl (only touched simplexNoise3 so far - search for "LAS_OPT" for the changes)

Ah! The step() function. I could use some general advice on this: when I did my initial comparison of built-in functions, step() always produced worse code (via NVIDIA's Cg compiler, and ANGLE to HLSL), so I didn't use it. Should I ignore this and believe/hope/trust that real drivers do some magic optimization that will always make intrinsics better?

I like the use of floor() instead of lessThan(). neat.

The third optimization you do isn't an optimization, because it changed the behavior of the code: by removing one of the mod() operations per permutation, the maximum size of a permutation ring before precision aberrations occur drops from 2896 to 203 (and the number of suitable finite rings inside these sizes from 1138 to just 78). This might be acceptable for the 2D case, but probably isn't for the 3D case, and doesn't work for the 4D case. I can send you a more detailed explanation if you like.
Is it worth adding an option to enable this anyway?

lasHG
03-19-2011, 09:37 PM
The third optimization you do isn't an optimization, because it changed the behavior of the code: by removing one of the mod() operations per permutation, the maximum size of a permutation ring before precision aberrations occur drops from 2896 to 203 (and the number of suitable finite rings inside these sizes from 1138 to just 78). This might be acceptable for the 2D case, but probably isn't for the 3D case, and doesn't work for the 4D case. I can send you a more detailed explanation if you like.
Is it worth adding an option to enable this anyway?

I was not really sure whether it would change the behavior; a more detailed explanation would be greatly appreciated.
Even if it changes the behavior, it comes with a great speedup, especially if you call the function more than just once in your shader - for some things I am currently working on, it will be called more than 100 times per shader invocation.
Maybe a "DIRTY_TRICKS" define wouldn't be that bad at all. ;)

StefanG
03-21-2011, 05:53 AM
Maybe a "DIRTY_TRICKS" define wouldn't be that bad at all.

Or just make it a separate function altogether. This is software, and versions of noise focusing on different problems (performance, support domain, statistical properties) should probably be allowed to differ quite significantly, even having different names and different maintainers. There's no real point in writing an "uber-shader" that tries to do it all through configuration options. With too many #ifdefs you lose readability and maintainability of the code, and that would be counterproductive.

PkK
03-21-2011, 12:38 PM
If this were under a more permissive license, it could make its way into Mesa, and thus into virtually all free drivers, resulting e.g. in a good noise implementation being available to most GNU/Linux users. And once it's there, the non-free drivers would probably want to catch up.

Philipp

IanMc
03-21-2011, 02:39 PM
(though they'll have to work out how compatible it is with the Artistic License 2.0)

If this was under a more permissive license
The purpose of the license we used was to allow anyone to use and modify it, while initially maintaining some coherence for contributions and bug fixes.

Also, as with all startups we have to walk a fine line between the desire to selflessly distribute 'cool code' and the interests of our investors.

That said, we can always create custom licenses, for specific applications. Drop me an email.

grimdel
03-21-2011, 05:02 PM
It's nice to see that you've noticed that Perlin's "improved noise" paper forgot to normalize its gradients. I have seen the error migrate into various implementations meant to replace the original algorithm.

And if I'm reading your shader correctly, you calculate your gradients w/ the same algorithm. If so, then there is an optimization you can use.

Because of the way the gradients are calculated, the gradients have a uniform magnitude (1 or sqrt(1) for 2D noise, sqrt(2) for 3D noise, sqrt(3) for 4D noise).

In your implementation, you normalize each gradient before calculating the final weighted sum.

Because the output of a noise function is a sum of the weighted gradients, and the gradient magnitudes are constant, you can replace the code that normalizes each gradient by dividing the weighted sum by the sqrt() magnitude.

And since the sqrt() magnitude is constant for each noise function, you can hardcode the 1/sqrt() normalizer as a constant and multiply the weighted sum (instead of dividing).
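A minimal sketch of that idea, with illustrative names rather than anything taken from the actual shader, assuming 3D gradients of constant length sqrt(2):

// m: falloff weights, g: dot products of the *unnormalized* gradients
// with the corner offsets. The per-gradient normalization is folded into
// one constant scale of the weighted sum.
float scaledNoiseSum(vec3 m, vec3 g)
{
    const float invGradLen = 0.7071067811865476; // 1.0/sqrt(2.0)
    return invGradLen * dot(m * m, g);
}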

StefanG
03-22-2011, 01:55 AM
And if I'm reading your shader correctly, you calculate your gradients w/ the same algorithm. If so, then there is an optimization you can use.


No, the gradients are computed in a totally different manner here. It's a more clever method in several ways, but the gradients end up with different lengths.

Besides, the normalization of Perlin's original gradients does not really matter at all. What matters is that they are all the same length. A constant scaling of the gradients translates to a constant scaling of the final noise value, and it is cheaper to do all the scaling at once at the end, when the final noise value is returned and a multiplication is required anyway to make the value fit nicely in the range [-1,1]. Not scaling the gradients is not an error in Perlin's original implementation; it is a deliberate and smart design choice to speed things up. The scalar multiplication with a vector of only ones and zeroes was originally performed in software as a summed selection, not as a dot product. Several floating-point multiplications were saved that way, and that used to make a big difference back in the day.
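For illustration, here is the gradient step from Perlin's published "improved noise" reference implementation, translated from the public Java reference into GLSL-style code (this is not the gradient scheme used in Ian's code): the "dot product" with the (+1/-1, +1/-1, 0)-style gradients is just a summed selection with sign flips, no multiplications at all.

float perlinGrad(int hash, float x, float y, float z)
{
    int h = int(mod(float(hash), 16.0));                 // low 4 bits of the hash
    float u = (h < 8) ? x : y;
    float v = (h < 4) ? y : ((h == 12 || h == 14) ? x : z);
    float su = (mod(float(h), 2.0) == 0.0) ? u : -u;     // sign from bit 0
    float sv = (mod(float(h / 2), 2.0) == 0.0) ? v : -v; // sign from bit 1
    return su + sv;
}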

StefanG
03-22-2011, 02:09 AM
Speaking of inclusion in Mesa: how good are the AMD hardware drivers for Mesa these days?
I have not been following the Mesa development for some time, but I notice it is still stuck at OpenGL 2.1. This noise version is compatible with GLSL 1.20, so it could still be a good fit.
Inclusion in Mesa requires an MIT license, though.

StefanG
03-22-2011, 07:02 AM
I just wrote a quick automatic benchmark for the platforms I have available to me. The program runs for 15 seconds and reports the performance to a logfile. Feel free to post your results here.
Windows benchmark (http://www.itn.liu.se/~stegu/simplexnoise/GLSL-noise-bench-Win32.zip)
MacOS X benchmark (http://www.itn.liu.se/~stegu/simplexnoise/GLSL-noise-bench-MacOSX.zip)
Linux benchmark (http://www.itn.liu.se/~stegu/simplexnoise/GLSL-noise-bench-Linux.zip)
The Windows archive contains a precompiled EXE file. The other two platforms will require a "make", and possibly an installation of GLFW (www.glfw.org) if you don't have it already. You may also need to edit the Makefile to suit your particular installation.

Note that the benchmark runs at a very high frame rate on most GPUs, so it makes a big difference if you turn off any desktop compositor you may have running. For Windows 7, switching to fullscreen rendering gave me a 50% performance boost, which means that I saw these very encouraging numbers on my low cost GeForce GTX 260. (4D noise in particular might receive some optimization soon, but I wouldn't expect any huge speedups.)

GL vendor: NVIDIA Corporation
GL renderer: GeForce GTX 260/PCI/SSE2
GL version: 3.2.0
Framebuffer size: 1920 x 1200 pixels

2D simplex noise, version 2011-03-22, 1552.3 Msamples/s
3D simplex noise, version 2011-03-22, 752.4 Msamples/s
4D simplex noise, version 2011-03-22, 429.7 Msamples/s

trinitrotoluene
03-22-2011, 10:15 AM
Result without antialiasing on a radeon 5870 with stock settings


GL vendor: ATI Technologies Inc.
GL renderer: ATI Radeon HD 5800 Series
GL version: 4.1.10524 Compatibility Profile Context
Framebuffer size: 1920 x 1080 pixels fullscreen
2D simplex noise, version 2011-03-21, 6914.1 Msamples/s
3D simplex noise, version 2011-03-21, 3837.2 Msamples/s
4D simplex noise, version 2011-03-21, 2427.0 Msamples/s

With antialiasing 24x


GL vendor: ATI Technologies Inc.
GL renderer: ATI Radeon HD 5800 Series
GL version: 4.1.10524 Compatibility Profile Context
Framebuffer size: 1920 x 1080 pixels fullscreen
2D simplex noise, version 2011-03-21, 1838.5 Msamples/s
3D simplex noise, version 2011-03-21, 1519.3 Msamples/s
4D simplex noise, version 2011-03-21, 1235.4 Msamples/s

PkK
03-22-2011, 12:27 PM
Speaking of inclusion in Mesa: how good are the AMD hardware drivers for Mesa these days?

As usual they're great, and sometimes even better than the official ones for old hardware, but still struggling on newer hardware.


I have not been following the Mesa development for some time, but I notice it is still stuck at OpenGL 2.1.

New GL features are added slowly over time, but there are still a few GL 3 ones missing. See http://cgit.freedesktop.org/mesa/mesa/plain/docs/GL3.txt for details.

Philipp

ZbuffeR
03-22-2011, 01:16 PM
Vista SP2:

GL vendor: NVIDIA Corporation
GL renderer: GeForce GTX 275/PCI/SSE2
GL version: 3.3.0

2D simplex noise, version 2011-03-21, 2037.3 Msamples/s
3D simplex noise, version 2011-03-21, 962.2 Msamples/s
4D simplex noise, version 2011-03-21, 653.6 Msamples/s

Default window size, is there a way to convince the .exe to run fullscreen ?

trinitrotoluene
03-22-2011, 01:39 PM
Default window size, is there a way to convince the .exe to run fullscreen ?

To run fullscreen, I have modified one line in the source code because I did not see any command line option accepted by the program.



glfwOpenWindow(1920, 1080, 8,8,8,8, 32,0, GLFW_FULLSCREEN)


My results were with Ubuntu 10.10.

StefanG
03-22-2011, 02:28 PM
Default window size, is there a way to convince the .exe to run fullscreen ?
Apart from editing the source and recompiling, not right now.
I really should change that, but still, it's nice to see some benchmarks from high-end cards. The result from the MacBook Pro I am running right now is not quite as impressive:


GL vendor: NVIDIA Corporation
GL renderer: NVIDIA GeForce 9400M OpenGL Engine
GL version: 2.1 NVIDIA-1.6.18

2D simplex noise, version 2011-03-21, 197.3 Msamples/s
3D simplex noise, version 2011-03-21, 82.1 Msamples/s
4D simplex noise, version 2011-03-21, 38.8 Msamples/s

StefanG
03-22-2011, 02:52 PM
Benchmark result for ATI Radeon HD 4850:

GL vendor: ATI Technologies Inc.
GL renderer: ATI Radeon HD 4800 Series
GL version: 3.3.10428 Compatibility Profile Context

2D simplex noise, version 2011-03-21, 2455.1 Msamples/s
3D simplex noise, version 2011-03-21, 1413.6 Msamples/s
4D simplex noise, version 2011-03-21, 870.9 Msamples/s

PkK
03-22-2011, 02:52 PM
My over three year old laptop with integrated Intel graphics:



GL vendor: Tungsten Graphics, Inc
GL renderer: Mesa DRI Intel(R) 965GM GEM 20100330 DEVELOPMENT
GL version: 2.1 Mesa 7.10

2D simplex noise, version 2011-03-21, 23.1 Msamples/s
3D simplex noise, version 2011-03-21, 15.4 Msamples/s
4D simplex noise, version 2011-03-21, 9.0 Msamples/s

PkK
03-22-2011, 03:12 PM
And here's the results for software rendering on an Intel(R) Core(TM)2 Duo CPU T9300 @ 2.50GHz:



GL vendor: Mesa Project
GL renderer: Software Rasterizer
GL version: 2.1 Mesa 7.10

2D simplex noise, version 2011-03-21, 0.3 Msamples/s
3D simplex noise, version 2011-03-21, 0.2 Msamples/s
4D simplex noise, version 2011-03-21, 0.2 Msamples/s

IanMc
03-24-2011, 04:04 PM
We've added Stefan's benchmarks (and Stefan) to the repository and included a number of suggestions and fixes from earlier in the thread.
http://github.com/ashima/webgl-noise

StefanG
03-26-2011, 01:57 AM
The code in the Github repository provided by Ian has now been updated rather a lot, and I no longer recommend using my pre-packaged zip archives posted above. They still work, but I will not be updating them on a regular basis. Please use the Github repo to get the latest versions of the shaders and the benchmarking application.

And for those of you wondering, yes, there is a write-up coming. We're working on it right now. In the meantime, feel free to ask any questions here.

JoshKlint
03-26-2011, 02:27 PM
Nice work!

mbentrup
03-27-2011, 03:40 AM
The invSqrt Taylor approximation uses the value 0.83666002653408, which is the square root of 0.7. Shouldn't this be the inverse square root of 0.7, i.e. 1.195228609?

StefanG
03-27-2011, 07:58 AM
Yes, of course. Good catch, thanks! It has only a minor effect on the final result, but it does make a difference. This change will be committed to the repository as soon as I have determined the corresponding scaling of the final values.

EDIT: Github repository updated.

IanMc
03-27-2011, 06:10 PM
We've changed the licence to the MIT License.
:)

Ffelagund
03-28-2011, 11:39 AM
Results on a MacBook, not very impressive, but this week I will post the results on a GTX580 :)

GL vendor: NVIDIA Corporation
GL renderer: NVIDIA GeForce 9400M OpenGL Engine
GL version: 2.1 NVIDIA-1.6.26
Desktop size: 1280 x 800 pixels

2D simplex noise, version 2011-03-25, 134.4 Msamples/s
3D simplex noise, version 2011-03-25, 61.7 Msamples/s
4D simplex noise, version 2011-03-25, 25.9 Msamples/s

StefanG
04-01-2011, 02:15 AM
Your benchmark is 25% lower than mine on the same hardware and software (MacBook Pro, GF9400M, MacOS X, 1280x800 fullscreen). Did you run the demo on a single screen? Mirroring, or just having a second display active, tends to slow down the display subsystem rather a lot on MacOS X. An earlier post on page 4 of this thread contains my results.

Ffelagund
04-01-2011, 02:40 AM
Yes, I ran it on a single screen, same resolution, same gfx, OS: MacOS X Snow Leopard, but my MacBook is not the "Pro" version. Perhaps there are slight differences in CPU speed. I will try again, making sure there aren't any background applications that could slow down the system.

sysrpl
04-02-2011, 09:40 PM
On my Giaida N20 nettop (http://www.newegg.com/Product/Product.aspx?Item=N82E16856176006) with...

Ubuntu 10.10 32 bit
Intel Atom D525 (1.8 GHz, dual core)
NVIDIA ION2 with 512MB Graphics

Hooked into my gaming/browsing TV


GL vendor: NVIDIA Corporation
GL renderer: GeForce 210/PCI/SSE2
GL version: 3.3.0 NVIDIA 260.19.06
Desktop size: 1280 x 720 pixels

2D simplex noise, version 2011-03-25, 133.1 Msamples/s
3D simplex noise, version 2011-03-25, 64.3 Msamples/s
4D simplex noise, version 2011-03-25, 36.0 Msamples/s

Even though the console shows this as the first output line, the results looked as expected:
"Fragment shader compile error:"

StefanG
04-04-2011, 06:53 AM
For some reason, that "Fragment shader compile error:" shows up on many platforms, although the error message you get when you ask what went wrong is an empty string. On some systems I have tried, the "error" reported is even "Shader successfully compiled", so I think the notion of when to signal an error is kind of hazy to many GLSL compilers.

trinitrotoluene
04-04-2011, 07:37 AM
I have not read the recent source code of the program and this is only a suggestion, but the program should not rely on the info log to detect shader compile failure; it should rely on the compile status.



GLint is_compiled;
glGetShaderiv(theShader, GL_COMPILE_STATUS, &is_compiled);

if(is_compiled != GL_TRUE)
{
    cout << "Fragment shader compile error: " << shaderLog << endl;
}
else
{
    cout << "Fragment shader compile success: " << shaderLog << endl;
}


Oops, I should have read the code before posting, because the compile status check is already done in the noisebench.c file.

mr_rg
04-05-2011, 08:20 AM
His permutation function maps 0 to 0. Otherwise it is good.
I do not understand the replacement for the gradient table though... how does that work?

StefanG
04-06-2011, 05:34 AM
His permutation function maps 0 to 0. Otherwise it is good.


That 0->0 mapping is not a problem. It is perfectly alright for a permutation to have one or even several fixed points that map to themselves, as long as they do not appear in a too regular pattern.

The permutation is a permutation polynomial: permute(x) is computed as (34*x^2 + x) mod 289.
This is one of the two neat and original ideas in Ian's implementation. (The other one is the clever generation of gradients.)
You can read about permutation polynomials on Wikipedia (http://en.wikipedia.org/wiki/Permutation_polynomial).
It is not a new idea in mathematics, it is just new for this application. A proper journal article on this noise implementation is on its way, but please have patience.
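As a minimal sketch, the scalar form of that permutation can be written directly from the formula above (the actual library evaluates it on whole vectors at once):

float permute289(float x)
{
    // permute(x) = (34*x^2 + x) mod 289 = ((34*x + 1) * x) mod 289
    return mod((34.0 * x + 1.0) * x, 289.0);
}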

StefanG
04-07-2011, 10:00 AM
The github repository has now been updated with some slight speedups, code cleanups and classic Perlin noise in regular and periodic versions.

2D simplex noise is now only about 20 mult and add operations (including five dot operations), one division, three mod, two floor and one each of step, max, fract and abs.

I get 1.5 billion 2D noise samples per second on my relatively measly Nvidia GTX260. An ATI HD5870 spits out 5 billion samples per second.

StefanG
04-09-2011, 01:00 AM
The 2D simplex noise was just optimized some more. I replaced a division with a multiplication and removed one multiplication and one addition by introducing two more constants. The speedup I see on my system (ATI HD4850) is about 5%.

The level of hand feeding you need to do to optimize GLSL code reminds me of C compilers from the early 1990's.

Dark Photon
04-10-2011, 07:50 PM
...The speedup I see on my system (ATI HD4850) is about 5%. The level of hand feeding you need to do to optimize GLSL code reminds me of C compilers from the early 1990's.
I'm curious if this was your general GLSL experience with ATI, NVidia, and Intel drivers, or just regarding ATI drivers in particular.

With NVidia, I've been amazed at how much complexity/infrastructure you can stack on top, and yet how aggressively and effectively it throws things away and transforms the code into something very efficient.

StefanG
04-11-2011, 01:17 AM
I was speaking of ATI drivers in particular, where constant expressions don't seem to be identified and collapsed properly. The other thing I noticed, that replacing a division by a constant with a multiplication by the inverse of the constant makes a difference, is something that would perhaps be considered an aggressive optimization (because it changes the exact value of the result somewhat), and I may have been expecting too much there. GLSL is compiled on-the-fly, after all.
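An illustrative sketch of the division-to-multiplication substitution mentioned above (the constant and the exact line changed in the repository are placeholders here, not the real ones):

vec3 fractOfScaled(vec3 p)
{
    // Instead of   fract(p / SOME_CONSTANT)   per sample,
    // multiply by a reciprocal that folds to a compile-time constant:
    const float INV_SOME_CONSTANT = 1.0 / 7.0; // hypothetical constant
    return fract(p * INV_SOME_CONSTANT);
}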

I have absolutely no experience with Intel GPUs.

StefanG
04-11-2011, 01:25 AM
The wiki on the Github repository now links to a rewritten cross platform benchmark with a side by side comparison of my old GLSL noise implementation (which was texture bandwidth limited and used lots of texture lookups) with the new computational version.

Github repository wiki (https://github.com/ashima/webgl-noise/wiki)

Bottom line: my old version is still twice as fast, because there is a lot of texture bandwidth on a modern GPU, but the new version scales better with massive parallelism, and it mixes well into a shader that is already texture bandwidth limited. It may even come almost for free when combined with a texture intensive shader with untapped ALU resources.

StefanG
04-20-2011, 03:08 PM
Just a quick update: yesterday I wrote some cellular noise functions ("Worley noise") of various flavors for GLSL, using the same pseudo-random permutation method as the Perlin noise implementations that started this thread. It turned out well, and these functions share the advantages of the Perlin noise functions: no arrays or textures, GLSL 1.20 compatible and fast enough to be considered for actual use. The code is still a bit raw and needs some more attention to detail, and the brief writeup probably needs a spellcheck, but I'll do that in the next few days.

We'll see if this ends up on the same Github repository as the Perlin noise, or if it's going to be kept separate. In any case, here's an early release of the GLSL shader functions without any supporting CPU code:

Cellular noise in GLSL (http://www.itn.liu.se/~stegu/GLSL-cellular/)

If you need a framework to test it, you can edit the C program I wrote for benchmarking Perlin noise, available from the Github repository (http://github.com/ashima/webgl-noise).

StefanG
06-01-2011, 02:12 PM
Wow. 12,000 views and still ticking for this post. It will be fun to see what people make with this!

I sent in a suggestion for a talk at Siggraph, but the reviewers rejected it, which I more or less expected. However, I put quite some effort into creating a fun visual demo using a few of my own and Ian's noise functions, and you might find it useful, educational or just fun to watch:

http://www.itn.liu.se/~stegu/gpunoise/

Note that this particular talk was rejected and will not be featured at Siggraph, so please ignore the references to an oral presentation in the one-page PDF.

You can still find me at Siggraph at our accepted talk "Next generation Image Based Lighting by HDR Video" if you want to meet me there. That talk is about what I really do for a living - I do noise mainly for fun.

ZbuffeR
06-01-2011, 03:37 PM
Great, I really like both the "fire" shader and the one on the floor :)

StefanG
06-05-2011, 11:47 AM
The one on the floor ("flow noise") actually uses a previously unpublished version of 2D noise with rotating gradients and analytic derivatives. I ran into problems extending it nicely to 3D (I need to get rid of a couple of lookup tables in my software version), but I should at least do simplex-noise-with-derivative for 2-D, 3-D and 4-D. Those should be straightforward ports from my software versions.

A few different variations on cellular noise and the simplex noise with derivatives should end up in the Github repo eventually, but my daytime job has put this on the backburner for a while. I hope to get my act together soon on this.

MarkN
07-06-2011, 08:38 AM
I was very interested to see these new developments of simplex noise shaders, particularly because of the removal of the texture look-ups. I have recently been looking at procedural noise functions for an application in Physics (not graphics) and am especially interested in their spatial frequency power spectrum, which very quickly throws up artefacts! Some of these artefacts appear in both old (texture lookup based) and new versions of the shader, some I have found fixes for and others not. Based only on the 3d versions of noise generation, briefly these are:

1. Discontinuities at simplex boundaries, seemingly because the contribution from the opposite vertices has not decayed completely to zero. Fixed by replacing the constant "0.6-..." with "0.51-..." in the code in both versions. (I'm sure someone who knows the simplex geometry can find the exact constant required).

2. Floating point rounding errors when far from the origin (I think someone alluded to this in an earlier post) that can also cause artefacts at simplex boundaries. I fixed these to a degree in the original code by reordering some of the cell skewing calculations, but can't yet see where to do so in the new code, though the same issue seems to be there.

3. Randomness: The original code produces some residual structure in its power spectrum when averaged over many noise screens. I could remove that (make it look smooth when averaged over several noise screens) by redefining the w (or alpha) values in the lookup texture using an independent random number generator from Matlab.

4. Pattern repeats seem very regular in certain directions in the new code. I don't understand how the permutation polynomial works, nor the significance of the constants (289,34,1,7). Can the pitch of the repeat be increased by changing the constants? The repeats are even visible by eye on relatively fine noise screens!

I hope this is useful in some way and am looking forward to the full write up of these techniques...

kaffiene
07-07-2011, 07:23 PM
Would anyone be able to extend this to provide a version that outputs a gradient vector as well as the noise value? This would very useful for generating surface normals.

StefanG
10-22-2011, 01:13 PM
I have not been keeping track of this thread for a while, so I am sorry for being so very late to respond. Your points are all very valid, and I think you should not use these new functions if you want isotropic and statistically well-behaved results. The permutation polynomial was chosen for its simplicity, not for its good permutation properties. I do not have enough math skills to evaluate the quality of that permutation from a theoretical standpoint, but I suspect it has many flaws if you look at it closely enough, and that there are many candidates for better choices. The motivation for everything in the current code is that it works, it looks OK and it is fast. Using it for anything other than pattern generation for visuals is out of bounds for the design spec, so to speak.

On the subject of computing the gradient in addition to the noise value, I have code in C to do just that. I have not yet ported it to GLSL except for the 2D case, but it is a reasonably simple matter to do it for 3D and 4D as well. Right now, I have no time to do that, but 2D GLSL code to get you started is in the demo I linked to above:

http://www.itn.liu.se/~stegu/gpunoise/
(Look at the function "srdnoise" in the "flownoise2" shader)

And the 2D, 3D and 4D versions in C are here:

http://www.itn.liu.se/~stegu/aqsis/DSOs/DSOnoises.html
(Look at the functions in the file "sdnoise1234.c")