Mandelbrot in fragment shaders?

I’m just idly wondering which implementation of a Mandelbrot set renderer would run faster:

  • do the actual exponentiation and test (with KIL) in a big unrolled loop

  • do the successive exponentiation using an exponentiated-value look-up texture

I suppose the first solution would be more precise, as the exponentiation look-up texture necessarily quantizes the input values to a resolution of 1/1024 or so.

Thoughts?

Could you explain where you’re getting the exponential?

The equation is z = z^2 + c (iterated). I suppose you could exponentiate each side and (since it’d be a complex exponent) do the math in terms of sines and cosines, but would that help anything?
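For reference, squaring z = x + iy just expands to (x^2 - y^2) + (2xy)i, so one iteration is only a handful of multiplies and adds. A minimal sketch of that step in GLSL-style code (the helper name mandelStep is mine, not from any shader in this thread):

    // One Mandelbrot iteration, z = z^2 + c, with z and c packed as vec2(real, imag)
    vec2 mandelStep(vec2 z, vec2 c)
    {
        return vec2(z.x * z.x - z.y * z.y + c.x,  // real part of z^2 + c
                    2.0 * z.x * z.y + c.y);       // imaginary part of z^2 + c
    }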

I think the fastest way would be to use an unrolled loop and pack in as many iterations as you can. What would you be looking up in a texture that wouldn’t just be a pre-computation of the part of the set the texture covers?

– Zeno

z^2 could be in a look-up texture.
Thus, it would be a texture lookup replacing a complex multiply. Anyway, it was just an idle thought – I’m sure someone will implement it soon enough :)
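If someone did build it, the table would presumably be a 2D texture indexed by the quantized real and imaginary parts of z, storing the two components of z^2. A rough sketch of what the fetch might look like (the texture name, the [-2, 2] input range, and the bias/scale are all placeholders of mine, nothing that actually exists):

    // Hypothetical z^2 lookup: the table is assumed to be baked over z in [-2, 2],
    // with Re(z^2) in [-4, 4] and Im(z^2) in [-8, 8] stored as (value + 8.0) / 16.0.
    uniform sampler2D squareTable;

    vec2 complexSquareLUT(vec2 z)
    {
        vec2 texcoord = (z + 2.0) * 0.25;                // map [-2, 2] to [0, 1] texture space
        vec2 zsq = texture2D(squareTable, texcoord).rg;  // quantized fetch of z^2
        return zsq * 16.0 - 8.0;                         // undo the bias/scale used when baking
    }

The quantization mentioned above is exactly the downside: with a 1024x1024 table over [-2, 2] the inputs are rounded to steps of about 4/1024, which falls apart quickly as you zoom in.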

Originally posted by jwatte:
z^2 could be in a look-up texture.
Thus, it would be a texture lookup replacing a complex multiply. Anyway, it was just an idle thought – I’m sure someone will implement it soon enough :)

The Mandelbrot shader is one of the sample shaders in 3Dlabs’ OpenGL 2.0 HLSL proposal. You can see it running on a Wildcat VP; unfortunately, there are no screenshots available :/.

// Based on a RenderMan shader by Michael Rivero
uniform int maxIterations;
varying vec2 uv;

void main (void)
{
    float tmpval;
    int iter;
    float tempreal, tempimag, Creal, Cimag;
    float r2;
    vec2 pos = fract(uv);
    float real = (pos.s * 3.0) - 2.0;
    float imag = (pos.t * 3.0) - 1.5;
    Creal = real;
    Cimag = imag;
    for (iter = 0; iter < maxIterations; iter++)
    {
        // z = z^2 + c
        tempreal = real;
        tempimag = imag;
        real = (tempreal * tempreal) - (tempimag * tempimag);
        imag = 2.0 * tempreal * tempimag;
        real += Creal;
        imag += Cimag;
        r2 = (real * real) + (imag * imag);
        if (r2 >= 4.0)
            break;
    }
    // Base the color on the number of iterations
    vec4 color;
    if (r2 < 4.0)
        color = vec4(0.0, 0.0, 0.0, 1.0); // black: the point never escaped
    else
    {
        tmpval = fract(float(iter) / 10.0);
        color = vec4(tmpval, tmpval, tmpval, 1.0);
    }
    gl_FragColor = color;
}

I don’t think you will get any faster by doing z^2 with a lookup texture; on the contrary, it will be quite a bit slower.


We’ve implemented a mandelbrot shader on the Wildcat VP using OGL2’s fragment shader capabilities. Here’s the source.

//
// mandel.vert: Vertex shader for drawing the Mandelbrot set
//
// author: Dave Baldwin, Steve Koren
// based on a shader by Michael Rivero
//
// Copyright © 2002: 3Dlabs, Inc.
//

varying float lightIntensity;
varying vec3 Position;
uniform vec3 LightPosition;

const float specularContribution = 0.2;
const float diffuseContribution = (1.0 - specularContribution);

void main(void) {
    vec4 pos = gl_ModelViewMatrix * gl_Vertex;
    Position = vec3(gl_Vertex);
    vec3 tnorm = normalize(gl_NormalMatrix * gl_Normal);
    vec3 lightVec = normalize(LightPosition - vec3(pos));
    vec3 reflectVec = reflect(lightVec, tnorm);
    vec3 viewVec = normalize(vec3(pos));

    // raise spec to the 16th power by repeated squaring (specular exponent of 16)
    float spec = clamp(dot(reflectVec, viewVec), 0.0, 1.0);
    spec = spec * spec;
    spec = spec * spec;
    spec = spec * spec;
    spec = spec * spec;

    lightIntensity = abs(diffuseContribution * dot(lightVec, tnorm) +
                         specularContribution * spec);

    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
}

//
// mandel.frag: Fragment shader for drawing the Mandelbrot set
//
// author: Dave Baldwin, Steve Koren
// based on a shader by Michael Rivero
//
// Copyright © 2002: 3Dlabs, Inc.
//

varying vec3 Position;
varying float lightIntensity;

uniform float maxIterations;
uniform float zoom;
uniform float xCenter;
uniform float yCenter;

void main (void)
{
    vec3 pos = fract(Position);
    float real = ((pos.x - 0.5) * zoom) - xCenter;
    float imag = ((pos.y - 0.5) * zoom) - yCenter;
    float Creal = real;
    float Cimag = imag;

    float r2 = 0.0;
    float iter;

    for (iter = 0.0; iter < maxIterations && r2 < 4.0; ++iter) {
        float tempreal = real;

        real = (tempreal * tempreal) - (imag * imag) + Creal;
        imag = 2.0 * tempreal * imag + Cimag;
        r2   = (real * real) + (imag * imag);
    }

    // Base the color on the number of iterations
    vec3 color;

    if (r2 < 4.0) {
        color = vec3(1.2, 0.0, 0.0);
    } else {
        float tmpval = fract(iter * 0.05);
        vec3 color1 = vec3(0.5, 0.0, 1.5);
        vec3 color2 = vec3(0.0, 1.5, 0.0);
        color = mix(color1, color2, tmpval);
    }

    color = clamp(color * lightIntensity, 0.0, 1.0);

    gl_FragColor = vec4(color, 1.0);
}

Barthold

I knew it must’ve been done :)

However, regarding this quote:

“I don’t think you will get any faster by doing z^2 using a lookup texture, all the contrary, it will be quite slower.”

Why is that? One z^2 look-up would replace 7 fragment shading instructions (if I count right). What are the relative latencies involved?

z^2 is just z*z. Hardly difficult for even register combiners (or texture_env, for that matter), let alone more powerful shaders.

Originally posted by jwatte:
I knew it must’ve been done :)

However, regarding this quote:
Why is that? One z^2 look-up would replace 7 fragment shading instructions (if I count right). What are the relative latencies involved?

The problem is that the texture lookup is generated and consumed by the fragment shader. It’s not like a “fixed pipeline” texture lookup, which can be requested by an early stage of the graphics pipeline and consumed in a later stage. Unless you are able to do work in parallel with the texture lookup, this normally means quite a heavy latency (implementation dependent).

Note that even if the texture data is already in the texture cache (which may not be the case for textures used as arbitrary lookup tables), you still have to request the data; the request has to flow through intermediate units to arrive at the servicing unit (the texture cache) and then go back to the originator in the fragment shader. Another solution would be to have a heavily pipelined fragment shader execution unit, so the texture can be requested in one stage of execution and consumed in a later stage, but I don’t think you can fit that many stages between the one and the other (especially without inserting bubbles to avoid RAW data hazards).

And this is why we need high level shader compilers now! I really do not want to have to wrap my brain around “data hazards” when all I really want to do is write a shader :)

Astonishing to see what shaders could look like if we had OpenGL 2 already (and astonishing, too, how much more advanced the Wildcat hardware must be compared to what the NV30 GL extensions suggest about that hardware). Does anyone know (or have a good guess) how loops are executed on the Wildcat? Are they unrolled? And, just out of curiosity, if somebody knows exactly: what’s the limit for maxIterations?
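My own naive guess, in case nobody with Wildcat details answers: if the hardware doesn’t do real data-dependent branching, the compiler could unroll the loop up to some fixed bound and turn the early exit into a mask, so the extra iterations become harmless. A purely speculative sketch of what a branch-free version of the mandel.frag loop could look like (names mirror that shader; the bound of 32 is arbitrary):

    // Speculative branch-free rewrite of the iteration loop: run a fixed,
    // fully unrollable number of steps and freeze z once it has escaped.
    vec2 z = vec2(Creal, Cimag);   // the posted shaders also start the iteration at c
    float escaped = 0.0;
    float count = 0.0;
    for (int i = 0; i < 32; i++)
    {
        vec2 znew = vec2(z.x * z.x - z.y * z.y + Creal,
                         2.0 * z.x * z.y + Cimag);
        z = mix(znew, z, escaped);                      // stop updating once escaped
        count += 1.0 - escaped;                         // count only the live iterations
        escaped = max(escaped, step(4.0, dot(z, z)));   // r2 >= 4 means the point escaped
    }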