Reduce bandwidth: glNormalPointer / glTexCoordPointer?

Hi guys,

I’m trying to reduce memory bandwidth consumption, as the system I’m working on only has shared memory (iPhone 3GS).

I’ve noticed that glNormalPointer and glTexCoordPointer can take GL_BYTE and GL_SHORT, which would reduce the bandwidth by a factor of 4 if I could use GL_BYTE.

BUT, as normals are normalized, their components are almost always fractional values of the form 0.XXX, never whole numbers. Same thing for the texture coordinates.

What is the expected behavior if I send normals in the range 0-255? Will they be normalized, or do I need to activate this via glEnable(GL_NORMALIZE)?

What is the expected behavior if I send normals in the range 0-255?

Yes, they will be normalized. All of the gl*Pointer calls, with the exception of the generic glVertexAttribPointer, will normalize any integer data.

[edit]

Wait, you’re wondering if the actual vector you get back will be normalized. No, it will not be. And glEnable(GL_NORMALIZE) will only help if you’re using the fixed-function pipeline.

Not quite true. See Table 2.5 in the spec (repeated in various extensions). Of the legacy attribute set functions, NormalPointer, ColorPointer, and SecondaryColorPointer will auto-normalize fixed-point. The others (e.g. TexCoordPointer) won’t.

To avoid confusion with these legacy APIs, use VertexAttribPointer and the corresponding enables instead, where you can specify the normalization policy.
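For example, a minimal sketch assuming GL ES 2.0; the attribute index and array names are illustrative, not from this thread:

/* Sketch: explicit normalization policy with the generic attribute API.
   ATTR_NORMAL and packedNormals are hypothetical names. */
enum { ATTR_NORMAL = 1 };   /* assumed bound to the shader's normal attribute */

GLbyte packedNormals[] = { 0, 127, 0,     /* +Y unit normal in fixed point */
                           127, 0, 0 };   /* +X unit normal */

glEnableVertexAttribArray(ATTR_NORMAL);
glVertexAttribPointer(ATTR_NORMAL,
                      3,          /* x, y, z */
                      GL_BYTE,    /* 1 byte per component instead of 4 */
                      GL_TRUE,    /* normalized: -128..127 maps to -1..1 */
                      0,          /* tightly packed */
                      packedNormals);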

Wait, you’re wondering if the actual vector you get back will be normalized. No, it will not be. And glEnable(GL_NORMALIZE) will only help if you’re using the fixed-function pipeline.

Good point. Still, normalizing the vector in float and then mapping to fixed point yields results plenty good enough for diffuse surfaces. Maybe for selected speculars too.

So to summarize:

For the fixed pipeline (2G, 3G), I can use glNormalPointer, pass the normal coordinates as GL_BYTE ranging from 0-255, and they should be normalized without using glEnable(GL_NORMALIZE).

For the programmable pipeline (3GS), I can either use glVertexAttribPointer with normalization enabled or normalize myself in my shader (I suspect the former to be a bit more efficient).

Right?

Watch for the signs. I have not done this with normals yet, but I do compact all my texture coords and my colours.

Colours are easy, but texture coords are signed data; you have to be aware of that, and it can lead to some unexpected results.

Also be aware of how much data you are uploading. There are times when the conversion is not worth the bandwidth saving for low vertex counts.

A few other things…

  1. On the newer GPU it is worth putting data in VBOs, and on the older GPU it performs the same either way. So overall it’s better to always work with VBOs.

  2. Interleave your data. It’s detailed in the OpenGLES section in the iPhone Dev area. That is a big performance boost. (A sketch illustrating both points follows this list.)
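As a rough illustration of both points; the struct layout, vertices, and vertexCount are assumptions for the sketch, not from this thread:

#include <stddef.h>   /* offsetof */

/* One interleaved, packed vertex: 16 bytes instead of 32 for the same
   attributes stored as floats. */
typedef struct {
    GLshort position[3];  /* fixed-point position: never auto-normalized,
                             so compensate with a modelview scale */
    GLshort uv[2];        /* fixed-point texture coordinates */
    GLbyte  normal[3];    /* unit normal mapped to -128..127 */
    GLbyte  pad[3];       /* keep each vertex a 4-byte multiple */
} PackedVertex;

GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, vertexCount * sizeof(PackedVertex),
             vertices, GL_STATIC_DRAW);

/* With a VBO bound, the pointer arguments become byte offsets into it. */
glVertexPointer(3, GL_SHORT, sizeof(PackedVertex),
                (void *)offsetof(PackedVertex, position));
glTexCoordPointer(2, GL_SHORT, sizeof(PackedVertex),
                  (void *)offsetof(PackedVertex, uv));
glNormalPointer(GL_BYTE, sizeof(PackedVertex),
                (void *)offsetof(PackedVertex, normal));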

For the fixed pipeline (2G, 3G), I can use glNormalPointer, pass the normal coordinates as GL_BYTE ranging from 0-255, and they should be normalized without using glEnable(GL_NORMALIZE).

For the programmable pipeline (3GS), I can either use glVertexAttribPointer with normalization enabled or normalize myself in my shader (I suspect the former to be a bit more efficient).

Right?

No. The misunderstanding was my fault, as I didn’t understand what kind of normalization you were talking about. Normalization has two different meanings in different contexts.

A component of a vertex attribute (e.g. the normal’s X component) can be passed in normalized. This means the data is stored as an integer, but will be converted automatically to a float in the range [0, 1] (or [-1, 1] for signed integer values).

An attribute (or any other vector) can be normal, meaning that it is a unit vector.

Neither of these has anything to do with the other.

glNormalPointer, if you use GL_BYTE values, will have its data normalized in the first way. That is, the input integers on the range [-128, 127] will be converted into floats on the range [-1, 1].

This will not guarantee that the resulting normal vector is a unit vector. To do that, you must glEnable(GL_NORMALIZE) if you’re in fixed-function, or just normalize it in your shader if you’re not.
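To make the two meanings concrete, a minimal sketch; byte_to_float and normalize3 are illustrative helpers, and the exact conversion formula varies slightly between spec versions:

#include <math.h>

/* Fixed-point normalization: a per-component conversion. Older GL specs
   map a signed byte c to (2c+1)/255; newer ones use c/127 clamped.
   Either way, -128..127 lands on (roughly) -1..1. */
float byte_to_float(GLbyte c)
{
    return (2.0f * c + 1.0f) / 255.0f;   /* 127 -> 1.0, -128 -> -1.0 */
}

/* Vector length normalization is something else entirely: rescale to
   unit length, e.g. (3,0,4) -> (0.6, 0, 0.8). */
void normalize3(float v[3])
{
    float len = sqrtf(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
    v[0] /= len;  v[1] /= len;  v[2] /= len;
}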

To pin some names on these, just to avoid misunderstanding:

A component of a vertex attribute (the normal’s X component) can be passed in normalized.

Termed fixed-point normalization. e.g. -128…127 -> -1…1, 0…255 -> 0…1, etc.

An attribute (or any other vector) can be normal, meaning that it is a unit vector.

Termed vector length normalization. e.g. (0,5,0) -> (0,1,0)

Alfonse probably clarified this for you, but just to be clear: glNormalPointer automatically enables fixed-point normalization internally for fixed-point types. For instance, feed it GL_BYTE data containing a -128, and that’ll be mapped to -1. It won’t do vector length normalization.

For the programmable pipeline (3GS), I can either use glVertexAttribPointer with normalization enabled

This is fixed-point normalization. E.g. -128…127 -> -1…1

…or normalize myself in my shader (I suspect the former to be a bit more efficient).

No, you’re likely talking vector length normalization now.

Just to be clear, when writing your GL_BYTE normal data, represent them in your program first in float3 form. Then normalize them, which’ll result in components in the -1…1 range. Then map these to fixed-point (-128…127) signed chars. Then store them as GL_BYTE arrays. If you use glNormalPointer to load them into the hardware (or glVertexAttribPointer with the normalize flag set), then you’ll automagically get -1…1 component values in your shader.
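A sketch of that packing step; pack_normal is an illustrative name, and mapping to the symmetric range -127…127 (rather than reaching -128) is one common choice that keeps the encoding symmetric around zero:

#include <math.h>

void pack_normal(const float n[3], GLbyte out[3])
{
    /* 1. vector-length-normalize in float */
    float len = sqrtf(n[0]*n[0] + n[1]*n[1] + n[2]*n[2]);

    /* 2. map each -1..1 component to fixed point and store as bytes */
    for (int i = 0; i < 3; ++i)
        out[i] = (GLbyte)lroundf(n[i] / len * 127.0f);
}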

However, due to rounding errors (which are always there, for fixed or floating point), your vector lengths will be close to, but not precisely, 1 in the shader (you lost some precision when you went to fixed point). You probably won’t even care about those small errors if your surfaces are mostly diffuse. But try adding a normalize to the shader to see if there’s any visible difference; if so, leave the normalize in there. This of course doesn’t recover the precision lost going from float to fixed point; it just restores your vectors to unit length.

Thank you guys for this, it really, really helped me understand better.
Dark Photon: your distinction between “fixed-point normalization” and “vector length normalization” definitely helped make things vivid.

So, I started to “pack” the data, expecting a tremendous increase in performance:

But on the 2G and 3G (fixed pipeline, glTexCoordPointer), the graphical result is messed up.

The shader version is using GL_SHORT:

/* Programmable pipeline: GL_TRUE requests fixed-point normalization. */
glVertexAttribPointer(currentShader->vars[SHADER_ATT_UV], 2, GL_SHORT, GL_TRUE, sizeof(vertex_t), currentMesh->vertexArray[0].text);

The fixed pipeline is using:

/* Fixed pipeline: glTexCoordPointer has no normalization flag. */
glTexCoordPointer(2, GL_SHORT, sizeof(vertex_t), currentMesh->vertexArray[0].text);

And the result is… omfg:

Any thoughts? Maybe I am supposed to modify the texture matrix in order to perform the “fixed-point normalization”?

Performance: You’re not rendering enough to matter. You’re only rendering 2626 polygons; that alone isn’t going to be vertex bound. You’re also clearly v-sync’d; you need to turn off v-sync when you’re measuring performance.

As for the rest, I have no idea why there’s a visual issue. Are you using different shaders?

I don’t think I can disable v-sync on iPhone, but I’ll dig into it.
2626 polys is not indecent, but it’s starting to be a lot for the iPhone, especially if you consider that texture coordinates, normals, and tangents have to be sent for the 3GS version.

The messed-up result is when I am using the fixed pipeline, with the same data.

Yes, you need to use the texture matrix; multiply the texcoords by 1.0/32767.
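For instance, something like this on the fixed pipeline; a sketch, assuming the shorts were produced by multiplying the float UVs by 32767:

/* glTexCoordPointer does no fixed-point normalization (see the Table 2.5
   discussion above), so undo the scale in the texture matrix. */
glMatrixMode(GL_TEXTURE);
glLoadIdentity();
glScalef(1.0f / 32767.0f, 1.0f / 32767.0f, 1.0f);
glMatrixMode(GL_MODELVIEW);   /* back to the usual matrix mode */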

No, you can’t disable vsync. You can call glFlush() instead of -presentRenderbuffer to get unsynced timing.
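A rough timing sketch under that approach; drawScene, timeOneFrameMs, and now_ms are placeholder names, not from this thread:

#include <sys/time.h>

extern void drawScene(void);   /* placeholder for your draw calls */

static double now_ms(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
}

double timeOneFrameMs(void)
{
    double t0 = now_ms();
    drawScene();
    glFlush();   /* instead of -presentRenderbuffer, while profiling */
    return now_ms() - t0;
}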

Thanks for all your help guys, here are the results:

The hardware probably only works on floating-point data. I would assume it would all get converted to floats. Maybe you are best to just leave everything as floats.

The hardware probably only works on floating-point data. I would assume it would all get converted to floats. Maybe you are best to just leave everything as floats.

Attributes are floats (unless they’re specifically integral attributes). But specifying normalized integers allows you to save memory. Floats are 32-bit (unless you’re using half-floats, which is still relatively new for attributes); shorts are 16, and bytes are 8. Why send a color as 4 floats if 4 unsigned normalized bytes will do?

Hardware has been doing automatic conversions from normalized integers to floats since the days of the GeForce 256, and even it was fast at this (for certain formats). It’s best to do whatever takes up the least amount of room in your buffer object.
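For instance, a sketch; the colors array is illustrative:

/* Four unsigned normalized bytes per color: 4 bytes per vertex instead
   of 16. 255 maps to 1.0, 0 maps to 0.0. */
GLubyte colors[] = { 255,   0, 0, 255,    /* opaque red */
                       0, 255, 0, 128 };  /* half-transparent green */

glEnableClientState(GL_COLOR_ARRAY);
glColorPointer(4, GL_UNSIGNED_BYTE, 0, colors);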

Because the hardware most of the time works on floats! If you send arrays of bytes, the driver then has to allocate more space in memory somewhere and convert the data. It really depends what the hardware is expecting.

Look at these benchmarks for VBOs:
http://www.sci.utah.edu/~bavoil/opengl/vbo/data_types/

Generally, if you use float for everything, you can’t go too wrong. Yes, you get a speed boost using bytes over floats for colour, but look at the performance penalty for using bytes for the normals.

If you send arrays of bytes, the driver then has to allocate more space in memory somewhere and convert the data.

No, it doesn’t. It does the conversion internally as it loads the attributes. The conversion is free; it costs no performance.

Look at these benchmarks for VBOs

That benchmark is ancient. It doesn’t even test generic attributes.

That doesn’t invalidate the benchmark. Got any counter-data?

That doesn’t invalidate the benchmark.

Yes, it does. A benchmark for a GeForce 6800 is no more valid for a GeForce 8800 than a benchmark for a GeForce 256.

And again, it doesn’t test generic attributes. So even if you were to accept that it had meaning for more modern hardware, all you could say is how it behaves with glNormalPointer and so forth.

So this is what the GF7x00 tweaks nVidia was talking about must have been. On the GF7x00 and above, all of the attribute types benchmarked above are of identical speed with VBOs (while being just a bit uniformly slower than DLs). I didn’t keep the benchmark numbers to show.

Anyway, the OP is talking about an iPhone app. The GeForce and Radeon train of thought isn’t helping him, imho; the iPhone GPU is a Dreamcast-style tile renderer which caches submitted data. The PowerVR docs should be looked at.

A lot of this stuff (vertex ordering, normalisation, best practices, etc.) is discussed in great depth on both the Apple iPhone Developer forums and the http://www.imgtec.com/ forums.

Also, it’s worth noting that complete GLES 1.x and GLES 2.0 development packs, with copious examples, are available for Linux, OS X, and the PC in the developer area of the http://www.imgtec.com/ website.