Problem with VBOs



Count Duckula
09-25-2006, 05:42 AM
Hello all,

I'm learning how to use VBOs but I have a problem. I have a model that is about 10k triangles and I can render it using vertex arrays or VBOs. With vertex arrays I get about 75fps but if I switch to VBOs I get about 0.003fps... so I must be doing something wrong.

My code looks like this



void InitializeVBOs()
{
    int
        totalVerticesSize = 0,
        totalNormalsSize = 0,
        totalTxCoordsSize = 0,
        numTextures = 0,
        numVertices = 0,
        totalVBOSize = 0;

    if(
        ( m_VBOsSupported ) &&
        ( m_pGeometry[ 0 ] != NULL ) &&
        ( m_pGeometry[ 0 ]->Get_NumVertices() > 0 )
    )
    {
        numVertices = m_pGeometry[ 0 ]->Get_NumVertices();
        numTextures = m_pGeometry[ 0 ]->GetNumTextures();

        totalVerticesSize = ( sizeof(float) * 3 ) * numVertices; // for vertices
        totalNormalsSize = ( sizeof(float) * 3 ) * numVertices; // for normals
        totalTxCoordsSize =
            ( sizeof(float) * 3 ) * numVertices * numTextures; // for tx coords
        totalVBOSize = totalVerticesSize + totalNormalsSize + totalTxCoordsSize;

        glGenBuffersARB( 1, &m_GPUVertexBufferID );
        glGenBuffersARB( 1, &m_GPUIndexBufferID );

        // Bind the vertex data vbo.
        glBindBufferARB( GL_ARRAY_BUFFER_ARB, m_GPUVertexBufferID );

        // Create a data store big enough for vertices, normals, and texture coords.
        glBufferDataARB( GL_ARRAY_BUFFER_ARB, totalVBOSize, NULL, GL_STATIC_DRAW_ARB );

        // Load vertices
        glBufferSubDataARB(
            GL_ARRAY_BUFFER_ARB,        // target
            0,                          // start from
            totalVerticesSize,          // total bytes of data
            m_pGeometry[0]->m_vertices  // data
        );

        // Load normals
        glBufferSubDataARB(
            GL_ARRAY_BUFFER_ARB,             // target
            totalVerticesSize,               // start from
            totalNormalsSize,                // total bytes of data
            m_pGeometry[0]->m_vertexNormals  // data
        );

        // Load texture coordinates
        glBufferSubDataARB(
            GL_ARRAY_BUFFER_ARB,                   // target
            totalVerticesSize + totalNormalsSize,  // start from
            totalTxCoordsSize,                     // total bytes of data
            m_pGeometry[0]->m_txCoords             // data
        );

        // Load the indices
        glBindBufferARB( GL_ELEMENT_ARRAY_BUFFER_ARB, m_GPUIndexBufferID );

        glBufferDataARB(
            GL_ELEMENT_ARRAY_BUFFER,                // target
            m_numIndices * sizeof(unsigned short),  // total size
            m_pGeometry[0]->m_indices,              // data
            GL_STATIC_DRAW_ARB
        );
    }
}

// ---

void Render()
{
    int
        totalVerticesSize = 0,
        totalNormalsSize = 0,
        numvertices = 0,
        numtextures = 0;

    //...
    //...

    if( m_pGeometry[0] != NULL )
    {
        numvertices = m_pGeometry[0]->Get_NumVertices();
        numtextures = m_pGeometry[0]->GetNumTextures();
        totalVerticesSize = ( sizeof(float) * 3 ) * numvertices;
        totalNormalsSize = ( sizeof(float) * 3 ) * numvertices;

        // If VBOs are supported and we have valid buffers
        if(
            m_VBOsSupported &&
            ( m_GPUVertexBufferID > 0 ) &&
            ( m_GPUIndexBufferID > 0 )
        )
        {
            // Bind the data buffer
            glBindBufferARB( GL_ARRAY_BUFFER_ARB, m_GPUVertexBufferID );

            // Bind the element buffer
            glBindBufferARB( GL_ELEMENT_ARRAY_BUFFER_ARB, m_GPUIndexBufferID );

            // Setup pointers

            glVertexPointer(
                3,         // coords
                GL_FLOAT,  // floats
                0,         // stride is 0, vertices are contiguous
                0          // offset is 0, vertices are at the start of the buffer
            );

            glNormalPointer(
                GL_FLOAT,  // floats
                0,         // stride is 0, normals are contiguous
                BUFFER_OFFSET( totalVerticesSize ) // offset is (total size of vertices)
            );

            /*
            glTexCoordPointer(
                2,              // 2 coordinates
                GL_FLOAT,       // floats
                sizeof(float),  // stride is one float
                BUFFER_OFFSET( totalVerticesSize + totalNormalsSize ) // offset into the buffer
            );
            */
        }
        else
        {
            // SET VERTEX AND NORMAL POINTER

            glVertexPointer(3, GL_FLOAT, 0, m_pGeometry[0]->m_vertices);
            glNormalPointer(GL_FLOAT, 0, m_pGeometry[0]->m_vertexNormals);
            /*glTexCoordPointer( 3, GL_FLOAT, sizeof(float), m_pGeometry[0]->m_txCoords ); */

        }

        groupIndices = m_pGeometry[0]->Get_NumIndices();

        if( m_VBOsSupported )
        {
            start_time = timeGetTime() / 1000.0f;

            glDrawElements(
                GL_TRIANGLES,
                groupIndices,
                GL_UNSIGNED_SHORT,
                0
            );

            end_time = timeGetTime() / 1000.0f;
            sprintf(tmpStr, "VBO CALL = %f", end_time - start_time);
            MessageBox(NULL, tmpStr, "VBO CALL", MB_OK );

            glBindBufferARB( GL_ARRAY_BUFFER_ARB, 0 );
            glBindBufferARB( GL_ELEMENT_ARRAY_BUFFER_ARB, 0 );
        }
        else
        {
            start_time = timeGetTime() / 1000.0f;

            glDrawElements(
                GL_TRIANGLES,
                groupIndices,
                GL_UNSIGNED_SHORT,
                m_pGeometry[0]->m_indices
            );

            end_time = timeGetTime() / 1000.0f;
            sprintf(tmpStr, "VA CALL = %f", end_time - start_time);
            MessageBox(NULL, tmpStr, "VA CALL", MB_OK );
        }

    }
}

With the message box calls I could see the problem seems to be with glDrawElements. For VBOs, the call takes about 1.3 seconds.

Any ideas what could be wrong ?? :(

Also, I downloaded a demo from Delphi3D that renders a terrain using VBOs. The demo renders about 2 million triangles and I get about 9fps. The only difference is that the demo uses an interleaved array for vertices/colors.
Is there anything wrong with the way I set up the data in the VBOs?

Thanks for any help!

Komat
09-25-2006, 07:36 AM
EDIT: Because the slowdown is so big, it is possible that the driver is dropping to a software fallback because of hardware limitations (initially I assumed this might be caused by bad interaction with the GPU caches, but the number of triangles is low). IMHO the most likely reason is the size of the offsets inside the VBO for the normal and texture coordinate pointers (either they are too big, or the difference between the individual inputs is too big). Try using a separate VBO for each part of the vertex, or interleave them in a single VBO.
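
As a rough sketch of the second suggestion (interleaving in a single VBO), the CPU-side repacking could look something like the following, assuming positions and normals are stored as tightly packed float triples. The function name interleave and its parameters are made up for illustration and are not taken from the posted code:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the posted code): merges two planar arrays of
// float triples into one array laid out as x y z nx ny nz per vertex.
std::vector<float> interleave(const std::vector<float>& positions,
                              const std::vector<float>& normals)
{
    std::vector<float> out;

    // Both inputs must describe the same number of 3-component vectors.
    if (positions.size() != normals.size() || positions.size() % 3 != 0)
        return out; // mismatched input: return an empty buffer

    const std::size_t vertexCount = positions.size() / 3;
    out.reserve(vertexCount * 6);

    for (std::size_t v = 0; v < vertexCount; ++v) {
        // Position of vertex v, followed directly by its normal.
        for (std::size_t c = 0; c < 3; ++c) out.push_back(positions[v * 3 + c]);
        for (std::size_t c = 0; c < 3; ++c) out.push_back(normals[v * 3 + c]);
    }
    return out;
}
```

The resulting array could then be uploaded with a single glBufferDataARB call, with a stride of six floats for both the vertex and the normal pointer.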

Count Duckula
09-25-2006, 11:08 AM
Thanks for your reply Komat, I tried using a different VBO for normals and another for TexCoords but the result is still the same. If I interleave them in a single VBO, how can I use multitexturing? Would I need a separate interleaved VBO for each texture?

Komat
09-25-2006, 12:23 PM
Originally posted by Count Duckula:
Thanks for your reply Komat, I tried using a different VBO for normals and another for TexCoords but the result is still the same.

That is strange. What happens if you use only the position array without normals or texture coordinates?



If I interleave them in a single VBO how can I use multitexturing ? would I need a separate interleaved VBO for each texture ?

No. I did not mean to use the InterleavedArrays API, which is old and limited; you can interleave manually. The VBO will contain an array of structures similar to the following:

struct VertexStruct {
    float x, y, z ;
    float nx, ny, nz ;
    float u0, v0 ;
    float u1, v1 ;
    float u2, v2 ;
};

Then you set the array pointers so that the stride is sizeof( VertexStruct ) and the offset of each array is the offset of the corresponding field within this structure (e.g. the offset of u2 for the third texture coordinate array).
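
A minimal sketch of how that stride and those offsets could be computed, assuming the struct has no padding (which holds for a struct made only of floats on common compilers). The helper names vertexStride, positionOffset, normalOffset and texCoordOffset are made up for illustration:

```cpp
#include <cstddef>

// Same layout as the structure described above.
struct VertexStruct {
    float x, y, z;
    float nx, ny, nz;
    float u0, v0;
    float u1, v1;
    float u2, v2;
};

// Hypothetical helpers: byte stride between consecutive vertices and byte
// offsets of each attribute from the start of a vertex.
constexpr std::size_t vertexStride()   { return sizeof(VertexStruct); }
constexpr std::size_t positionOffset() { return offsetof(VertexStruct, x); }
constexpr std::size_t normalOffset()   { return offsetof(VertexStruct, nx); }

// Offset of the texture coordinate pair for texture unit 0, 1 or 2.
// Each unit stores two floats (u, v) directly after the previous unit.
inline std::size_t texCoordOffset(int unit)
{
    return offsetof(VertexStruct, u0)
         + static_cast<std::size_t>(unit) * 2 * sizeof(float);
}
```

With the VBO bound, these values would be passed as the stride and (cast to a pointer) as the last argument of glVertexPointer, glNormalPointer and glTexCoordPointer. For multitexturing, glClientActiveTextureARB selects the texture unit before each glTexCoordPointer call, so one interleaved VBO can feed all three units.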

Count Duckula
09-25-2006, 02:27 PM
Originally posted by Komat:
That is strange. What happens if you use only the position array without normals or texture coordinates?

I've tried it and it shows the model but with flat color since normals are missing but the speed is the same.


No. I did not mean to use the InterleavedArrays API, which is old and limited; you can interleave manually. The VBO will contain an array of structures similar to the following:

struct VertexStruct {
    float x, y, z ;
    float nx, ny, nz ;
    float u0, v0 ;
    float u1, v1 ;
    float u2, v2 ;
};

Then you set the array pointers so that the stride is sizeof( VertexStruct ) and the offset of each array is the offset of the corresponding field within this structure (e.g. the offset of u2 for the third texture coordinate array).

Ahh, ok :) , I hadn't understood that part. I'll try it as well.

Komat
09-25-2006, 10:15 PM
Originally posted by Count Duckula:
I've tried it and it shows the model but with flat color since normals are missing but the speed is the same.

By "the same speed", do you mean that even with only the position array it is slow with VBOs and fast without them?

Count Duckula
09-26-2006, 04:01 AM
Originally posted by Komat:
By "the same speed", do you mean that even with only the position array it is slow with VBOs and fast without them?

Yep :(

Komat
09-26-2006, 06:34 AM
Which graphics card do you have?

Komat
09-26-2006, 08:21 AM
It is also possible that something else, unrelated to VBOs, is causing software vertex processing. Because the VBOs are likely to be stored in video memory, this might cause an additional performance hit for the software emulation.

Count Duckula
09-26-2006, 10:22 AM
I have an ATI Mobility Radeon 7500 and I've just tried it on a GeForceFx 5500 and it's very different. On the 5500 I get almost the same speed with VA (160fps windowed, 190fps fs) and VBOs (180fps windowed, 200fps fs). I had thought it would run faster on one of those cards but maybe it's fillrate limited or it's just the pc that's not a high end one hehe..

However, on the 7500, it seems like the driver doesn't like glDrawElements. I had a look at other VBO code on NeHe, basically what it does is duplicate the vertices so it doesn't use indices. I did the same for testing, so I use glDrawArrays instead of glDrawElements and it's better, it's about 8fps, but the VA is still about 70fps.

One thing though... the NeHe code runs at about 60fps, and my duplicated vertices code runs at 8fps, so I might still be doing something stupid hehe.

09-26-2006, 11:02 AM
can you post an app that people can test for themselves (including source)?

songho
09-26-2006, 12:09 PM
I believe the Radeon 7500 supports up to OpenGL v1.3, and VBOs only in software mode. As far as I know, you need a Radeon 9600 or higher video card to run VBOs in hardware mode.

Komat
09-26-2006, 12:09 PM
Originally posted by Count Duckula:

However, on the 7500, it seems like the driver doesn't like glDrawElements. I had a look at other VBO code on NeHe, basically what it does is duplicate the vertices so it doesn't use indices. I did the same for testing, so I use glDrawArrays instead of glDrawElements and it's better, it's about 8fps, but the VA is still about 70fps.

The increase in speed when you change to the non-indexed draw might be explained by the driver reading the video memory sequentially instead of using random access based on the indices.
The Radeon 7500 is an old card with limited vertex processing capabilities, so it is quite possible that you are using some vertex feature that is not hw accelerated. Are you using texture matrices, texgens, clip planes, polygon offsets, double sided lighting, separate specular or something similar that is not used by that NeHe tutorial?

Komat
09-26-2006, 12:16 PM
Originally posted by songho:
I believe the Radeon 7500 supports up to OpenGL v1.3, and VBOs only in software mode. As far as I know, you need a Radeon 9600 or higher video card to run VBOs in hardware mode.

The Radeon 8500 definitely can run VBOs in hw mode too. For the R7500 the limitation might only come from the driver itself, because it can always store the VBO in system memory like ordinary vertex arrays if the hw does not support reading them from a different place. However, the NeHe VBO code runs fast.

songho
09-26-2006, 05:14 PM
Komat,
Yes, Radeon 8500 is older than 9xxx cards, but it is faster than some newer generation cards, for example, 8500 is faster than 9200. (Here is a naming confusion again.)

What I found was that VBO performance is very poor if glGetString(GL_VERSION) reports 1.3 on Radeon cards. (I believe it results from a hardware limitation.) Does your 8500 report v1.3 or v1.5 (or v2.0) on Windows?
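
A small sketch of how such a version check could be done on the string returned by glGetString(GL_VERSION), which starts with "major.minor". The helper names parseGLVersion and versionAtLeast are made up for illustration, and treating a version below 1.5 as a sign of slow VBOs is only the heuristic described above, not a documented rule:

```cpp
#include <cstdio>

// Hypothetical helper: extracts major and minor from a GL_VERSION-style
// string such as "1.3.1234 vendor text". Returns false if parsing fails.
bool parseGLVersion(const char* versionString, int& major, int& minor)
{
    if (versionString == nullptr)
        return false;
    return std::sscanf(versionString, "%d.%d", &major, &minor) == 2;
}

// True when the reported version is at least wantMajor.wantMinor.
bool versionAtLeast(const char* versionString, int wantMajor, int wantMinor)
{
    int major = 0, minor = 0;
    if (!parseGLVersion(versionString, major, minor))
        return false;
    return major > wantMajor || (major == wantMajor && minor >= wantMinor);
}
```

In an application this might be called as versionAtLeast(reinterpret_cast<const char*>(glGetString(GL_VERSION)), 1, 5) once a GL context is current, falling back to plain vertex arrays when it returns false.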

Count Duckula
09-26-2006, 05:32 PM
Thanks for all the replies :)


can you post an app that people can test for themselves (including source)?

Got swamped at work today but I'll try to upload it tomorrow =)



I believe the Radeon 7500 supports up to OpenGL v1.3, and VBOs only in software mode. As far as I know, you need a Radeon 9600 or higher video card to run VBOs in hardware mode.

Ahhh good point, I didn't know that... I checked it (with glGetString and GLee as well) and it's 1.3 indeed.



The increase in speed when you change to the non-indexed draw might be explained by the driver reading the video memory sequentially instead of using random access based on the indices.

Yep, that makes sense.



Are you using texture matrices, texgens, clip planes, polygon offsets, double sided lighting, separate specular or something similar that is not used by that NeHe tutorial?

Not really hehe, I was planning to incorporate that code in another app that uses more features.

Count Duckula
09-27-2006, 06:44 AM
I think I found the problem... I went from ~8 to ~200fps on the 7500 when commenting out these lines:


//glHint(GL_PERSPECTIVE_CORRECTION_HINT, GL_NICEST);
//glHint(GL_LINE_SMOOTH_HINT, GL_NICEST);
//glHint(GL_POLYGON_SMOOTH_HINT, GL_NICEST);

//glLightfv(GL_LIGHT0, GL_POSITION, defLight_position);
//glLightfv(GL_LIGHT0, GL_SPECULAR, defLight_specular);
//glLightfv(GL_LIGHT0, GL_AMBIENT, defLight_ambient);
glLightfv(GL_LIGHT0, GL_DIFFUSE, defLight_diffuse);

Although it would be good to know why...

jide
09-27-2006, 07:24 AM
Theoretically this should not be the reason, because rendering is faster when you use 'normal' vertex arrays. Light calculations do slow things down, but they are not directly linked to VBOs.

What do you get if you use VBOs without indices?

Also, please post the full source; I guess that will make it easier.

Komat
09-27-2006, 09:01 AM
Originally posted by Count Duckula:
Although it would be good to know why...

It is a combination of two things:
1) One or more of those lines (probably one of the glHints) forced the driver to do software vertex processing instead of using the hw vertex processing unit, because the hw does not support that particular feature.

2) The VBO content was very likely stored in video memory or other uncached memory so that the graphics card has fast access to it. Reading from such memory with the CPU is very slow, especially if special care is not taken.

Because the driver was emulating vertex processing on the CPU and was reading from memory optimized for GPU access, the result was slow. Without the VBOs, the driver was reading data from cached memory that is optimized for CPU access, and the performance was significantly better. When you commented out those lines, vertex processing reverted to the hw path, which has no problem reading from the memory in which the VBO was stored.

jide
09-28-2006, 09:19 AM
I really don't understand why a glHint could contribute to such a drop. Isn't a glHint only a hint for the driver, not an obligation to execute? In this case, can this be considered a driver bug, or should I accept it as normal behavior?

Count Duckula
09-28-2006, 10:05 AM
Originally posted by jide:
...
What do you get if you use VBOs without indices?

Also, please post the full source; I guess that will make it easier.

Not sure but I think after I removed the lines the speed was almost the same as using indices.

Anyway, if anyone is interested here are the links to source and exe. The VBO code is in the ABTNode class (InitializeVBOs, Render..)

Source (1.7MB) (http://www.3dgloom.net/code/abtvbo_27092006_v1.4_source.rar)
Exe (1.2MB) (http://www.3dgloom.net/code/abtvbo_27092006_v1.4_exe.rar)

It's an app I coded last year but I had never used VBOs hehe...


Originally posted by Komat:

It is a combination of two things:
1) One or more of those lines (probably one of the glHints) forced the driver to do software vertex processing instead of using the hw vertex processing unit, because the hw does not support that particular feature.

2) The VBO content was very likely stored in video memory or other uncached memory so that the graphics card has fast access to it. Reading from such memory with the CPU is very slow, especially if special care is not taken.

Because the driver was emulating vertex processing on the CPU and was reading from memory optimized for GPU access, the result was slow. Without the VBOs, the driver was reading data from cached memory that is optimized for CPU access, and the performance was significantly better. When you commented out those lines, vertex processing reverted to the hw path, which has no problem reading from the memory in which the VBO was stored.

Thanks for that explanation :)

Komat
09-28-2006, 10:46 AM
Originally posted by jide:
I really don't understand why a glHint could contribute to such a drop. Isn't a glHint only a hint for the driver, not an obligation to execute?

Yes, it is not an obligation. The driver was asked to use the highest quality option and it probably decided to provide the highest quality regardless of the cost. That is perfectly valid behaviour, and maybe it was beneficial for some old pre-VBO application.