glDrawElements with independent vertex, normal and texCoord indices

yannoo · July 10, 2001, 10:41pm

Is there an OpenGL command that can draw geometry when colors, normals, texCoords and vertices use separate indices, but with only one call per vertex?

Something like glDrawIndexedEXT(vertexIndex, normalIndex, texCoordIndex, colorIndex), for example…

OK, I can always decompose it into 4 OpenGL commands, for example:

glColor4fv(&colors[colorIndex]);
glNormal3fv(&normals[normalIndex]);
glTexCoord2fv(&texCoords[texcoordIndex]);
glVertex3fv(&vertices[verticeIndex]);

but I think it is a real waste of CPU time to use 4 function calls per vertex…

@+
Cyclone

imported_john · July 11, 2001, 3:54pm

You’re saying, in other words, that:

push param1 
push param2
push param3
call func
push param1 
push param2
push param3
call func
push param1 
push param2
push param3
call func

is significantly slower than

push param1 
push param2
push param3
push param4 
push param5
push param6
push param7 
push param8
push param9
call func

…

cheers,
John

yannoo · July 11, 2001, 9:20pm

Jumps and function calls are among the slowest instructions on the 80x86/Pentium architecture…

Well, for example, can you explain to me why glVertexPointer and the glDrawElements family of commands exist in the OpenGL API if the CPU overhead of a function call is so small?

And, more importantly, this form of “indexed 3D data” is commonly used because the indirection minimizes the space needed to store 3D data and maps directly to the Wavefront .obj format and a lot of other 3D file formats…

@+
Cyclone

imported_john · July 11, 2001, 10:51pm

Hello,

Branches and calls have historically always been painful because they mess with the pipeline. But that’s why CPUs have branch prediction (and feed instructions through following calls) these days, to keep the pipeline stoked.

They might be particularly bad on Intel chips, but since when has OpenGL been an Intel-based API? (Yes, yes, I know, the majority of consumer cards, malarkey, yada yada, etc.)

And, finally, ‘they’ use vertex pointers primarily for DMA, not necessarily to avoid function calls.

BTW, I am putting forth these arguments not necessarily because I disagree, but because being a sheep is just too sucky. =) Also, it isn’t true to say that every procedure call will really, REALLY soak performance.

cheers,
John

yannoo · July 12, 2001, 1:55am

OK, function calls aren’t really the biggest bottleneck in the OpenGL API.

But my real problem is that I find I always have to create a LOT of new vertices for nothing…

For example, a “good textured cube” is for me only 8 SHARED vertex coordinates, 8 SHARED colors, 4 SHARED texture coordinates, 6 SHARED normals and 6 faces, where each property of each vertex can be independently indexed. It is certainly not 24 vertices of the form {x,y,z, r,g,b, i,j,k} that are NOT SHARED between the 6 faces of the cube…

For me, in this cube example, the OpenGL vertex array mechanism forces me to allocate 3x more memory than necessary…
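A back-of-the-envelope check of the cube figures can be sketched as follows. It assumes float attributes and a configurable index size, neither of which the post specifies, and the type and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a mesh: how many distinct values each
   attribute has, and how many face corners reference them. */
typedef struct {
    size_t positions;   /* distinct xyz triples */
    size_t colors;      /* distinct rgb triples */
    size_t texcoords;   /* distinct st pairs    */
    size_t normals;     /* distinct ijk triples */
    size_t corners;     /* face corners to draw */
} MeshCounts;

/* Bytes needed when every attribute has its own index list
   (assumption: float data, index_size bytes per index). */
size_t bytes_multi_indexed(const MeshCounts *m, size_t index_size)
{
    assert(index_size > 0);
    size_t data = (m->positions * 3 + m->colors * 3 +
                   m->texcoords * 2 + m->normals * 3) * sizeof(float);
    size_t indices = m->corners * 4 * index_size;  /* 4 indices per corner */
    return data + indices;
}

/* Bytes needed when every corner becomes a full, unshared vertex,
   plus one index per corner. */
size_t bytes_replicated(const MeshCounts *m, size_t index_size)
{
    assert(index_size > 0);
    size_t floats_per_vertex = 3 + 3 + 2 + 3;
    return m->corners * floats_per_vertex * sizeof(float)
         + m->corners * index_size;
}
```

With the cube's counts (8, 8, 4, 6 and 24 corners), 4-byte floats and one-byte indices, this gives 392 bytes against 1080, a factor of about 2.8, which is in line with the "3x" estimate.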

@+
Cyclone

mcraighead · July 12, 2001, 8:39am

Immediate mode really is a slow API. Don’t use it in production code. Replicate vertices if you need to, that’s not a big deal.

Matt

yannoo · July 12, 2001, 10:01pm

Why do you say that the OpenGL API is slow???

Personally, I find this immediate mode API very fast: my current “Cuboule” demo/test runs at more than 50 fps at 1600x1200, 32 bits, with my GeForce2 Ultra on a 1 GHz Athlon with XFree86 4.0.3 and the latest NVIDIA drivers
=> for me, it’s a really fast 3D API

OK, this demo doesn’t yet use multi-texture extensions such as bump mapping or multi-pass algorithms such as reflections, but I’m certain that the OpenGL 1.2 API coupled with the GeForce2 extensions and a little experience really has the potential to implement these in my demo without sacrificing the real-time frame rate.

No, my real problem is more theoretical and concerns the size of the 3D data storage: I am beginning to work with some .obj files that can have more than 30K vertices, and the de-indexing of the 3D data that the OpenGL vertex array mechanism requires can frequently generate more than 65K vertices, which the current NVIDIA OpenGL/GLX implementation for Linux cannot handle directly (it is limited to unsigned short indices, i.e. 65K vertices, if my memory is good).
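The de-indexing step mentioned here can be sketched roughly as below: each .obj-style corner carries one index per attribute, and every distinct combination becomes one vertex of the single-index array. The `Corner` type and both function names are hypothetical, and the linear search is only there to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

/* One face corner as stored in an .obj-style file: a separate index
   per attribute. (Hypothetical type, for illustration only.) */
typedef struct {
    unsigned position;
    unsigned normal;
    unsigned texcoord;
} Corner;

/* De-index `count` corners.
   unique[]  receives each distinct (position, normal, texcoord) triple once;
   indices[] receives, for every corner, the slot of its triple in unique[].
   Both arrays must have room for `count` entries.
   Returns the number of unified vertices produced. */
size_t deindex(const Corner *corners, size_t count,
               Corner *unique, unsigned *indices)
{
    size_t n_unique = 0;
    assert(corners && unique && indices);
    for (size_t c = 0; c < count; c++) {
        size_t u;
        /* Linear search; a hash table would be the usual choice
           for meshes with tens of thousands of corners. */
        for (u = 0; u < n_unique; u++) {
            if (unique[u].position == corners[c].position &&
                unique[u].normal   == corners[c].normal &&
                unique[u].texcoord == corners[c].texcoord)
                break;
        }
        if (u == n_unique)
            unique[n_unique++] = corners[c];
        indices[c] = (unsigned)u;
    }
    return n_unique;
}

/* True when the result can still be addressed with 16-bit indices
   (GL_UNSIGNED_SHORT covers vertex numbers 0..65535). */
int fits_ushort(size_t n_unique)
{
    return n_unique <= 65536;
}
```

A position shared by faces with different normals or texture coordinates ends up in several unified vertices, which is how a 30K-position mesh can grow past the 16-bit index range.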

@+
Cyclone

V-man · July 14, 2001, 3:08am

If you are not seeing a difference between immediate mode and indexed mode, then go ahead and use immediate.

Typically, immediate is slower.

Besides, I think this has been suggested before.

V-man

Dirk · July 20, 2001, 2:29pm

> Immediate mode really is a slow API. Don’t use it in production code.
> Replicate vertices if you need to, that’s not a big deal.

It might not be for static objects, but for dynamic stuff (e.g. progressive meshes) it can hurt to have to know which vertices are really identical and need to be updated, too. Furthermore, the convenience of being able to manipulate a color array without having to worry about which faces/vertices are actually influenced is useful, too.

I’d like to propose something like interleaved indices, where you have a bunch of indices for every vertex. Then you either select one of a number of predefined formats (a la Interleaved Arrays) or give a mapping from index to attribute (i.e. index 0:Vertex|Color, 1:Normal etc.). The former is simpler to implement, the latter more flexible and extensible.

Memory access will be worse than InterleavedArrays, but should be very comparable to separate arrays. I can’t really judge how much additional work the driver has to do to support this, or if current chips can do this at all, but I do like the idea and think it shouldn’t be too bad.
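To make the "mapping from index to attribute" variant concrete, here is a minimal CPU-side sketch. Everything in it (the attribute bits, `IndexFormat`, `resolve`) is hypothetical and does not correspond to any existing OpenGL interface; it only shows how an interleaved index stream could be decoded.

```c
#include <assert.h>
#include <stddef.h>

/* Attribute bits; one index slot may drive several attributes at once,
   as in the "0:Vertex|Color, 1:Normal" mapping. */
enum { ATTR_VERTEX = 1, ATTR_COLOR = 2, ATTR_NORMAL = 4, ATTR_TEXCOORD = 8 };
#define MAX_SLOTS 4

typedef struct {
    size_t   slots;             /* indices stored per vertex       */
    unsigned attrs[MAX_SLOTS];  /* attribute bits driven by slot k */
} IndexFormat;

/* Which element of each attribute array one vertex resolves to. */
typedef struct {
    unsigned vertex, color, normal, texcoord;
} Resolved;

/* Resolve vertex number v of an interleaved index stream: the stream
   holds fmt->slots indices for vertex 0, then for vertex 1, and so on.
   Attributes not named by any slot keep index 0. */
Resolved resolve(const IndexFormat *fmt, const unsigned *stream, size_t v)
{
    Resolved r = {0, 0, 0, 0};
    assert(fmt->slots <= MAX_SLOTS);
    for (size_t k = 0; k < fmt->slots; k++) {
        unsigned idx = stream[v * fmt->slots + k];
        if (fmt->attrs[k] & ATTR_VERTEX)   r.vertex   = idx;
        if (fmt->attrs[k] & ATTR_COLOR)    r.color    = idx;
        if (fmt->attrs[k] & ATTR_NORMAL)   r.normal   = idx;
        if (fmt->attrs[k] & ATTR_TEXCOORD) r.texcoord = idx;
    }
    return r;
}
```

The predefined-format variant would amount to a fixed table of such `IndexFormat` values selected by an enum.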

Comments?

kieranatwork · August 16, 2001, 4:45am

Please can we have the glDrawElementsEXT? It’s a much nicer solution than duplicating vertices…which flies in the face of the original idea behind indexed arrays.
When I’m deforming my geometries using dynamics, I have to apply the change I make to 1 vertex to a whole bunch of others because they really should be the same vertex…it’s a pain in the arse, and slows things down.
Why can’t we have this extension?
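The bookkeeping described here looks roughly like the following sketch. It assumes the application keeps, for every replicated vertex, the index of the shared point it was built from; the `source` array and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* Move one logical point of the mesh. Because the vertex array holds a
   separate copy of the position for every (normal, texcoord) combination,
   each copy whose source index matches has to be rewritten.
   source[i] is the shared position index that replicated vertex i was
   built from. Returns the number of copies touched. */
size_t move_point(Vec3 *replicated, const unsigned *source, size_t count,
                  unsigned point, Vec3 new_pos)
{
    size_t touched = 0;
    assert(replicated && source);
    for (size_t i = 0; i < count; i++) {
        if (source[i] == point) {
            replicated[i] = new_pos;
            touched++;
        }
    }
    return touched;
}
```

With per-attribute indices the same deformation would be a single write into the shared position array.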

yannoo · August 17, 2001, 5:17am

typedef struct{

GLint	 size;
GLenum   type;
GLsizei	 stride;
const GLubyte *pointer;
void    (*func)(const GLvoid *);
GLuint  *index;

}GLArrayNEW;

GLArrayNEW colorArray;
GLArrayNEW normalArray;
GLArrayNEW vertexArray;
GLArrayNEW texcoordArray;

void glVertexPointerNEW(GLint size, GLenum type, GLsizei stride, const GLvoid *pointer){

vertexArray.size = size;
vertexArray.type = type;
vertexArray.stride = stride;
vertexArray.pointer = pointer;

/* glVertex*v exists only for 2, 3 or 4 components of type short, int, float (or double) */
vertexArray.func = NULL;

switch(type){

	case GL_FLOAT :
	switch(size){
		case 2 : vertexArray.func = glVertex2fv; break;
		case 3 : vertexArray.func = glVertex3fv; break;
		case 4 : vertexArray.func = glVertex4fv; break;
	}
	break;

	case GL_INT  :
	switch(size){
		case 2 : vertexArray.func = glVertex2iv; break;
		case 3 : vertexArray.func = glVertex3iv; break;
		case 4 : vertexArray.func = glVertex4iv; break;
	}
	break;

	case GL_SHORT  :
	switch(size){
		case 2 : vertexArray.func = glVertex2sv; break;
		case 3 : vertexArray.func = glVertex3sv; break;
		case 4 : vertexArray.func = glVertex4sv; break;
	}
	break;

	default :
		  break;
}

glVertexPointer(size, type, stride, pointer);

}

void glNormalPointerNEW(GLint size, GLenum type, GLsizei stride, const GLvoid *pointer){

normalArray.size = size;	/* always 3 for normals, kept for symmetry */
normalArray.type = type;
normalArray.stride = stride;
normalArray.pointer = pointer;

switch(type){

	case GL_FLOAT :		normalArray.func = glNormal3fv; break;
	case GL_INT  :		normalArray.func = glNormal3iv; break;
	case GL_SHORT  :	normalArray.func = glNormal3sv; break;
	case GL_BYTE  :		normalArray.func = glNormal3bv; break;

	default : 		normalArray.func = NULL; break;
}

/* glNormalPointer has no size argument */
glNormalPointer(type, stride, pointer);

}

void glColorPointerNEW(GLint size, GLenum type, GLsizei stride, const GLvoid *pointer){

colorArray.size = size;
colorArray.type = type;
colorArray.stride = stride;
colorArray.pointer = pointer;

/* glColor*v takes 3 or 4 components; signed and unsigned types use different entry points */
colorArray.func = NULL;

switch(type){

	case GL_FLOAT :
	switch(size){
		case 3 : colorArray.func = glColor3fv; break;
		case 4 : colorArray.func = glColor4fv; break;
	}
	break;

	case GL_UNSIGNED_INT :
	switch(size){
		case 3 : colorArray.func = glColor3uiv; break;
		case 4 : colorArray.func = glColor4uiv; break;
	}
	break;

	case GL_INT  :
	switch(size){
		case 3 : colorArray.func = glColor3iv; break;
		case 4 : colorArray.func = glColor4iv; break;
	}
	break;

	case GL_UNSIGNED_SHORT :
	switch(size){
		case 3 : colorArray.func = glColor3usv; break;
		case 4 : colorArray.func = glColor4usv; break;
	}
	break;

	case GL_SHORT  :
	switch(size){
		case 3 : colorArray.func = glColor3sv; break;
		case 4 : colorArray.func = glColor4sv; break;
	}
	break;

	case GL_UNSIGNED_BYTE :
	switch(size){
		case 3 : colorArray.func = glColor3ubv; break;
		case 4 : colorArray.func = glColor4ubv; break;
	}
	break;

	case GL_BYTE  :
	switch(size){
		case 3 : colorArray.func = glColor3bv; break;
		case 4 : colorArray.func = glColor4bv; break;
	}
	break;

	default :
		  break;
}

glColorPointer(size, type, stride, pointer);

}

void glTexCoordPointerNEW(GLint size, GLenum type, GLsizei stride, const GLvoid *pointer){

texcoordArray.size = size;
texcoordArray.type = type;
texcoordArray.stride = stride;
texcoordArray.pointer = pointer;

/* glTexCoord*v exists for 1 to 4 components of type short, int, float (or double) */
texcoordArray.func = NULL;

switch(type){

	case GL_FLOAT :
	switch(size){
		case 1 : texcoordArray.func = glTexCoord1fv; break;
		case 2 : texcoordArray.func = glTexCoord2fv; break;
		case 3 : texcoordArray.func = glTexCoord3fv; break;
		case 4 : texcoordArray.func = glTexCoord4fv; break;
	}
	break;

	case GL_INT  :
	switch(size){
		case 1 : texcoordArray.func = glTexCoord1iv; break;
		case 2 : texcoordArray.func = glTexCoord2iv; break;
		case 3 : texcoordArray.func = glTexCoord3iv; break;
		case 4 : texcoordArray.func = glTexCoord4iv; break;
	}
	break;

	case GL_SHORT  :
	switch(size){
		case 1 : texcoordArray.func = glTexCoord1sv; break;
		case 2 : texcoordArray.func = glTexCoord2sv; break;
		case 3 : texcoordArray.func = glTexCoord3sv; break;
		case 4 : texcoordArray.func = glTexCoord4sv; break;
	}
	break;

	default :
		  break;
}

glTexCoordPointer(size, type, stride, pointer);

}

void glVertexIndexiNEW(GLuint *indices){

vertexArray.index = indices;

}

void glNormalIndexiNEW(GLuint *indices){

normalArray.index = indices;

}

void glColorIndexiNEW(GLuint *indices){

colorArray.index = indices;

}

void glTexcoordIndexiNEW(GLuint *indices){

texcoordArray.index = indices;

}

void glArrayElementNEW(GLuint i){

/* stride must be the real distance in bytes between two elements:
   a stride of 0 (tightly packed) is not handled here and disables the array */

if( texcoordArray.func && texcoordArray.index && texcoordArray.stride )
	texcoordArray.func(texcoordArray.pointer+(texcoordArray.index[i]*texcoordArray.stride));

if( colorArray.func && colorArray.index && colorArray.stride )
	colorArray.func(colorArray.pointer+(colorArray.index[i]*colorArray.stride));

if( normalArray.func && normalArray.index && normalArray.stride )
	normalArray.func(normalArray.pointer+(normalArray.index[i]*normalArray.stride));

/* the vertex goes last, because it is the call that completes the vertex */
if( vertexArray.func && vertexArray.index && vertexArray.stride )
	vertexArray.func(vertexArray.pointer+(vertexArray.index[i]*vertexArray.stride));
else
	glArrayElement(i);

}

void glDrawArraysNEW(GLenum mode, GLint first, GLsizei count){

glBegin(mode);
while(count > 0){
	glArrayElementNEW(first);
	first++;
	count--;
}
glEnd();

}

void glDrawElementsEXT(GLenum mode, GLsizei count, GLenum type, const GLvoid *indices){

GLsizei i;
const GLubyte  *pubyte;
const GLushort *pushort;
const GLuint   *puint;

glBegin(mode);
switch(type){
	case GL_UNSIGNED_BYTE : pubyte = indices;
	for(i=0;i<count;i++)
		glArrayElementNEW(*pubyte++);
	break;

	case GL_UNSIGNED_SHORT : pushort = indices;
	for(i=0;i<count;i++)
		glArrayElementNEW(*pushort++);
	break;

	case GL_UNSIGNED_INT : puint = indices;
	for(i=0;i<count;i++)
		glArrayElementNEW(*puint++);
	break;
}
glEnd();

}

Perhaps a starting point for this extension?

yannoo · August 17, 2001, 6:07am

I made an error in the glVertexPointerNEW() function:

You have to read

“vertexArray.func = glVertex…; break;”

and not

“vertexArray.func = glTexcoord…; break;”

But I think you have already corrected this yourself.

@+
Cyclone

santyhamer · August 19, 2001, 12:45am

Great idea!!! I have the same problem:
I use pure triangle lists instead of indexed, shared triangles. This is because 3ds Max has “smoothing groups”. Each triangle face has 3 vertex normals (for making sharp, accentuated edges), 3 texture coordinates (for using UVW Unwrap, face-map…) and 3 indices to vertex positions. The problem is that by using pure triangle lists (I think the ONLY valid method for preserving smoothing groups and accentuated sharp edges), I am replicating the vertex positions 3 times… and then I CAN’T use the vertex cache of the GeForce… So WE really NEED this special form of indexing you propose.

zander76 · August 13, 2003, 6:47am

Hello All

Has this been suggested for OpenGL? Does anybody know where this stands? I am currently trying to find a solution to the same problem.

Ben

zander76 · August 13, 2003, 6:52am

Originally posted by cyclone:
> Why do you say that the OpenGL API is slow???
>
> Personally, I find this immediate mode API very fast: my current “Cuboule” demo/test runs at more than 50 fps at 1600x1200, 32 bits, with my GeForce2 Ultra on a 1 GHz Athlon with XFree86 4.0.3 and the latest NVIDIA drivers
> => for me, it’s a really fast 3D API
>
> OK, this demo doesn’t yet use multi-texture extensions such as bump mapping or multi-pass algorithms such as reflections, but I’m certain that the OpenGL 1.2 API coupled with the GeForce2 extensions and a little experience really has the potential to implement these in my demo without sacrificing the real-time frame rate.
>
> No, my real problem is more theoretical and concerns the size of the 3D data storage: I am beginning to work with some .obj files that can have more than 30K vertices, and the de-indexing of the 3D data that the OpenGL vertex array mechanism requires can frequently generate more than 65K vertices, which the current NVIDIA OpenGL/GLX implementation for Linux cannot handle directly (it is limited to unsigned short indices, i.e. 65K vertices, if my memory is good).

Hello, actually immediate mode will never reach the speed of a vertex buffer or a display list, for a few reasons.

Display lists give the driver writers a chance to optimize: remove duplicate state changes, change the format to better match the card, you name it.

As for vertex buffers, current video cards run most efficiently when reaching around 200 executions per call (I can’t remember the name for this term offhand). With immediate mode it’s a one-to-one relationship: one call and one vertex being processed.

The reason for this is, first, that it gives the driver writer control over the loop and where it runs; more notably, the CPU can be used to process other work at the same time. Also, video cards have multiple pipelines to process vertices.

The last thing to consider: a box has 8 vertices, but to render it in immediate mode you will have to make 24 calls to glVertex3f.

Later

zander76 · August 13, 2003, 8:03am

Hello Everybody

Has anybody figured out how this new extension should work? Here is my guess.

// Set the vertex pointer
glVertexPointerNEW(3, GL_FLOAT, 0, vertexList);

// Set the texture coordinate list
glTexCoordPointerNEW(2, GL_FLOAT, 0, textureCoord);

// Now here comes the tricky part
glDrawElementsEXT(GL_TRIANGLES, Count, GL_UNSIGNED_SHORT, pVertexIndices);

Do I have to call glDrawElementsEXT() for the texture coordinates as well? If so, what is the proper order? Should the texture coords go first, or the vertices?

OR

Do I set the indices first? Something like this.

// Set the vertex pointer
glVertexPointerNEW(3, GL_FLOAT, 0, vertexList);

// Set the texture coordinate list
glTexCoordPointerNEW(2, GL_FLOAT, 0, textureCoord);

// Set the index lists
glVertexIndexNEW(pVertexIndicies);
glTexcoordIndexNEW(pTexIndicies);

// So what is the last option for here then
glDrawElementsEXT(GL_TRIANGLES, Count, GL_UNSIGNED_SHORT, pVertexIndices);

As you can see, I have set the vertex index twice. That does not quite seem right to me.

Thanks, Ben


vincoof · August 13, 2003, 9:58am

One of the goals of the vertex array extension (yes, it is an extension) is to feed grouped vertex data to the pipeline. If the vertex array, color array, normal array, or whatever array each has to look up a different index table, performance will drop significantly, possibly down to immediate-mode levels.

Moreover, interleaved arrays become really pointless with independent index array lookups.

So I tend to agree with what Matt said: replicate vertices. It will be significantly faster and does not waste too much memory (especially since the new index arrays also use memory). It’s also more efficient to cache vertices this way, as it is guaranteed that vertex, normal, texcoord, etc. are always grouped.

zeckensack · August 13, 2003, 12:47pm

Don’t want to spoil the party, but the classic box example is about the only thing that significantly benefits. It’s the pathological corner case.

Then we have index traffic. An int index per color is twice the fetch bandwidth compared to just a color per vertex. That may be reduced by caches, approaching equal bandwidth. Ditto for fog coords.
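The arithmetic behind the "twice the fetch bandwidth" remark can be written out as a small sketch. It assumes a 4-byte RGBA color and a 4-byte int index, and it ignores cache hits entirely; the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes fetched per vertex for one attribute stored directly in the
   vertex stream. */
size_t bytes_direct(size_t attr_size)
{
    return attr_size;
}

/* Bytes fetched per vertex when the attribute is reached through its own
   index: the index itself plus the attribute, with no cache hits assumed. */
size_t bytes_indexed(size_t attr_size, size_t index_size)
{
    assert(index_size > 0);
    return index_size + attr_size;
}
```

For a 4-byte color, `bytes_indexed(4, 4)` is 8 against 4 bytes direct, hence the factor of two. For a 12-byte float position the overhead drops to one third (16 against 12 bytes), which is why positions, normals and texcoords come out better.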

Normals, tex coords and positions will benefit more, but I’m tempted to say that there’s nowhere near enough benefit to be had in real meshes to justify this kind of (hardware!) complexity.

Who primarily renders cubes anyway?

zander76 · August 13, 2003, 7:46pm

Hello everybody.

I have been trying to implement this extension. It would be really nice to submit it to the review board. Anyway, I ran into a problem.

// This does not work
vertexArray.func = glVertex3fv;

The declaration uses GLvoid* as its parameter while glVertex3fv uses const GLfloat*. My compiler will not compile this. Does anybody know how to cast the parameter of glVertex3fv, or how to change the declaration of the original function?
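One way around the mismatch, sketched here without OpenGL so that it runs anywhere, is a small forwarding function ("thunk") per typed entry point. `fake_Vertex3fv` is a stand-in for glVertex3fv, and `ElementFunc`, `ArrayDesc` and `submit` are hypothetical names. Casting the function pointer itself, e.g. `(void (*)(const GLvoid *))glVertex3fv`, also compiles, but calling a function through a pointer of an incompatible type is undefined behavior in C, even if it tends to work on common platforms.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a typed entry point such as glVertex3fv(const GLfloat *).
   It only records its argument so the sketch runs without OpenGL. */
static const float *last_vertex3fv;
static void fake_Vertex3fv(const float *v) { last_vertex3fv = v; }

/* The generic signature stored in the array descriptor. */
typedef void (*ElementFunc)(const void *);

/* Thunk: has exactly the generic signature and forwards to the typed
   function, so no function-pointer cast is needed. */
static void thunk_Vertex3fv(const void *p)
{
    fake_Vertex3fv((const float *)p);
}

typedef struct {
    const unsigned char *pointer;   /* start of the attribute data */
    size_t               stride;    /* bytes between elements      */
    ElementFunc          func;      /* how to submit one element   */
} ArrayDesc;

/* Submit element i through the stored function pointer. */
void submit(const ArrayDesc *a, size_t i)
{
    assert(a->func != NULL);
    a->func(a->pointer + i * a->stride);
}
```

The cost is one extra call per element, which is small next to the per-vertex work the sketch in this thread already does.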

Ben

Cyranose · August 14, 2003, 9:31am

Originally posted by cyclone:
> Is there an OpenGL command that can draw geometry when colors, normals, texCoords and vertices use separate indices, but with only one call per vertex?

Ignoring the tangential issues here, the original question was whether it’s useful to have separate indices for the various vertex components when using vertex arrays.

Unfortunately, it is (and not just for the cube example), though I don’t think it would be easy to even simulate unless such an extension converts multi-indexed data to single-indexed data by duplicating on the fly (yuck, no go).

The desire for such a feature (for me) is not to avoid duplicating colors or texcoords in memory. It’s mainly to avoid duplicating vertices where two vertices at the same point in space have different attributes, such as different normals, different texcoords, or different colors.

The problem comes in when trying to optimize vertex caching. Two nearly identical vertices can’t be cached as one. This shows up wherever you need discontinuities in normals (e.g., hard edges), colors, or texcoords.

But under the hood, it’s likely that the vertex cache stores not only the post-transformed vertex (and normal) but the other post-transformed post-lit parameters too, all keyed by the original index value. So there’s a real question as to whether such an extension could reasonably be supported in HW without multiple caches and lots of extra complexity. And it doesn’t make much sense for vertex programs, where you’d want to cache the computed VP outputs, not the separate inputs.

Anyway, back to the original question. If you have a small number of colors, normals, and/or tex coords and want to re-use those with separate indices, you can simulate this “extension” with a simple vertex program. Load your arrays (colors, texcoords, normals) into VP constants (up to the limits, say 96 total vec4s for 1st gen HW) and send down vertices with XYZ and color only. In the VP, use those color R,G,B values to do one or more table lookups for the real colors, texcoords, and normals stored in the registers, and emit the combined result.
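A CPU-side stand-in for that lookup might look like the sketch below. The table sizes, the type names and the choice that R selects the color, G the normal and B the texcoord are all assumptions made for illustration; a real vertex program would do the same thing with relative addressing into its constant registers.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z, w; } Vec4;

/* Constant tables, playing the role of vertex program constant registers.
   The size of 8 entries each is arbitrary. */
typedef struct {
    Vec4 colors[8];
    Vec4 normals[8];
    Vec4 texcoords[8];
} ConstantTables;

/* What is sent per vertex: a position plus three small table indices
   packed into the r, g, b bytes of the color attribute. */
typedef struct {
    float         x, y, z;
    unsigned char r, g, b;
} PackedVertex;

/* What the vertex program would emit. */
typedef struct {
    Vec4 position, color, normal, texcoord;
} ShadedVertex;

/* Per-vertex lookup (assumed mapping: r selects the color,
   g the normal, b the texcoord). */
ShadedVertex expand(const ConstantTables *t, PackedVertex v)
{
    ShadedVertex out;
    assert(v.r < 8 && v.g < 8 && v.b < 8);
    out.position = (Vec4){ v.x, v.y, v.z, 1.0f };
    out.color    = t->colors[v.r];
    out.normal   = t->normals[v.g];
    out.texcoord = t->texcoords[v.b];
    return out;
}
```

If the color is sent as normalized bytes, the program would first have to scale the [0,1] values back to an integer range before using them as table addresses.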

This, of course, does not solve the cache re-use issue either. But I assume (or at least hope) vertices are cached post VP execution.

Avi
