VBO via PBO

Hi everyone,

I am trying to use a VBO to render a set of points, and I need to update the positions of the points every frame. I compute the new coordinates on the GPU; when that finishes, I have a texture containing the new points. To render them I use a VBO, which I fill using the PBO approach.

My problem is that the VBO update + draw is very slow (or at least I think so). My dataset has 1024x1024 points with XYZW coordinates stored as 32-bit floats. I define the VBO as:
glBufferData(GL_PIXEL_PACK_BUFFER_EXT, size()*4*sizeof(float), NULL, GL_DYNAMIC_DRAW);
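For scale, the buffer size here is size() * 4 * sizeof(float). A small sketch of the arithmetic, assuming size() returns 1024*1024 (one point per texel) and that sizeof(float) is 4; the helper names are hypothetical:

```c
#include <stddef.h>

/* Assumption: one XYZW point per texel of a 1024x1024 texture. */
enum { GRID_W = 1024, GRID_H = 1024, COMPONENTS = 4 };

/* Number of points, i.e. what size() is assumed to return. */
size_t point_count(void) { return (size_t)GRID_W * GRID_H; }

/* Bytes passed to glBufferData: size() * 4 * sizeof(float). */
size_t vbo_bytes(void) { return point_count() * COMPONENTS * sizeof(float); }

/* Distance between consecutive vertices when glVertexPointer is called with
   size 4, type GL_FLOAT and stride 0 (tightly packed). */
size_t vertex_stride(void) { return COMPONENTS * sizeof(float); }
```

So every update moves 16,777,216 bytes (16 MiB), with a 16-byte stride per vertex.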

To draw the points I use this procedure:
glEnableClientState( GL_VERTEX_ARRAY );
glBindBuffer( GL_ARRAY_BUFFER, vbo );
glVertexPointer( 4, GL_FLOAT, 0, (char *) NULL );

glDrawArrays( GL_POINTS, 0, size());

When I draw the points without updating the coordinates, the framerate isn't bad (60 fps), but when I update the VBO values it drops to 1 fps. I read a whitepaper saying that glVertexPointer is very expensive and should be called only once per VBO, but I call it every frame.
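A quick sanity check on these numbers: 60 fps is about 16.7 ms per frame and 1 fps is 1000 ms, so the update adds roughly 983 ms per frame. If all of that extra time went into moving the 16 MiB buffer, the effective rate would be only about 16 MiB/s, far below what a copy that stays on the GPU should reach, which hints that the data may be taking a slow path. A hypothetical sketch of that estimate, assuming nothing else changes between the two measurements:

```c
/* Milliseconds spent per frame at a given frame rate. */
double frame_ms(double fps) { return 1000.0 / fps; }

/* Implied transfer rate in MiB/s if the extra per-frame time is spent
   entirely on moving `bytes` (assumes nothing else changed between the
   two measurements). */
double implied_mib_per_s(double bytes, double fps_without, double fps_with) {
    double extra_s = (frame_ms(fps_with) - frame_ms(fps_without)) / 1000.0;
    return (bytes / (1024.0 * 1024.0)) / extra_s;
}
```

With 16777216 bytes, 60 fps without the update and 1 fps with it, implied_mib_per_s returns roughly 16.3.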

Does anyone know why updating the VBO is so slow? The new points are already on the GPU in a texture, so I assumed the transfer to the VBO would not be this slow.

Any ideas?

Thanks!

How do you update the VBO? It is possible that the driver copies the data without the assistance of the GPU.

I update my VBO like this:

glBindBuffer(GL_PIXEL_PACK_BUFFER_EXT, vbo);

// read the vertex data back from framebuffer
glReadBuffer(GL_COLOR_ATTACHMENT0_EXT);
glReadPixels(0, 0, m_width, m_height,
GL_RGBA, GL_FLOAT, 0); // with a pack buffer bound, this is a byte offset into it, not a CPU pointer

glBindBuffer(GL_PIXEL_PACK_BUFFER_EXT, 0);
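The reason this PBO-as-VBO trick works is that the pixel layout written by glReadPixels lines up with the vertex layout glDrawArrays reads. While a buffer is bound to GL_PIXEL_PACK_BUFFER_EXT, the last argument of glReadPixels is a byte offset into that buffer, not a client-memory pointer. The sketch below models both facts on the CPU; BUFFER_OFFSET is a common helper macro, not part of the GL API, and the other function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Turns a byte offset into the pointer type GL expects when a buffer
   object is bound (for glReadPixels, glVertexPointer, ...). */
#define BUFFER_OFFSET(bytes) ((void *)(uintptr_t)(bytes))

/* glReadPixels with GL_RGBA/GL_FLOAT writes rows starting at the read
   origin, tightly packed: each row is width * 16 bytes, already a multiple
   of the default GL_PACK_ALIGNMENT of 4. Texel (x, y) lands here. */
size_t texel_byte_offset(size_t x, size_t y, size_t width) {
    return (y * width + x) * 4 * sizeof(float);
}

/* With glVertexPointer(4, GL_FLOAT, 0, BUFFER_OFFSET(0)), vertex i starts
   at byte i * 16, so texel (x, y) becomes vertex y * width + x. */
size_t vertex_index(size_t x, size_t y, size_t width) {
    return y * width + x;
}
```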

I have the new data in an FBO. How can I tell whether the texture format is not supported and the work is being done by the driver?


Unfortunately, there is no way to determine that, except for noticing “hmm, this is too slow”.

I have not used PBOs myself, so my knowledge of possible issues with them is limited; I can only give you some ideas to try:

The example in the PBO specification uses the GL_BGRA format for glReadPixels, so you could try that.

It seems that you are rendering the data into an FBO; try using the ordinary back buffer instead.

Are you using renderbuffers or bound textures for the FBO rendering? Try switching to the other one.
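One caveat with the GL_BGRA suggestion: it changes the component order written into the buffer, while glVertexPointer(4, GL_FLOAT, ...) always reads x, y, z, w in that order. With GL_BGRA the shader would have to write z into R and x into B, otherwise the points come out with x and z swapped. A CPU-side sketch of the two orders, using a hypothetical pack_texel helper and assuming the shader writes x, y, z, w into R, G, B, A:

```c
/* One texel of the float render target; assumed to hold x, y, z, w in
   R, G, B, A. */
typedef struct { float r, g, b, a; } Texel;

/* Models the component order glReadPixels writes to memory:
   GL_RGBA -> R, G, B, A and GL_BGRA -> B, G, R, A. */
void pack_texel(Texel t, int use_bgra, float out[4]) {
    if (use_bgra) {
        out[0] = t.b; out[1] = t.g; out[2] = t.r; out[3] = t.a;
    } else {
        out[0] = t.r; out[1] = t.g; out[2] = t.b; out[3] = t.a;
    }
}
```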

Thanks Komat!

I will try these options, but I have some questions:

If I use the back buffer, will the values be clamped to 1.0? The values are coordinates, so I don’t want them restricted to the range [0, 1].

I use a texture for my FBO. I know this question is trivial, but I don’t know exactly what it means to use renderbuffers in an FBO. What is the difference between a texture and a renderbuffer?

This sample does what you are talking about. Try the STREAM_COPY hint instead, since you really want to copy from GPU->GPU, not draw from CPU->GPU.
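The usage hint is built from two parts: how often the contents change (STREAM: specified once and used a few times; STATIC: specified once and used many times; DYNAMIC: respecified repeatedly and used many times) and who writes and reads them (DRAW: the application writes, GL reads; READ: GL writes, the application reads; COPY: GL writes and GL reads). A buffer filled by glReadPixels every frame and then drawn is written and read by GL, hence *_COPY. A hypothetical helper that encodes this table; the enum values are the standard ones, redefined only so the sketch compiles without GL headers:

```c
#ifndef GL_STREAM_DRAW
#define GL_STREAM_DRAW  0x88E0
#define GL_STREAM_READ  0x88E1
#define GL_STREAM_COPY  0x88E2
#define GL_STATIC_DRAW  0x88E4
#define GL_STATIC_READ  0x88E5
#define GL_STATIC_COPY  0x88E6
#define GL_DYNAMIC_DRAW 0x88E8
#define GL_DYNAMIC_READ 0x88E9
#define GL_DYNAMIC_COPY 0x88EA
#endif

typedef enum { FREQ_STREAM, FREQ_STATIC, FREQ_DYNAMIC } Frequency;

/* Picks the usage hint from the update frequency and from whether GL
   (rather than the application) writes and reads the buffer. */
unsigned usage_hint(Frequency f, int written_by_gl, int read_by_gl) {
    static const unsigned hints[3][3] = {
        { GL_STREAM_DRAW,  GL_STREAM_READ,  GL_STREAM_COPY  },
        { GL_STATIC_DRAW,  GL_STATIC_READ,  GL_STATIC_COPY  },
        { GL_DYNAMIC_DRAW, GL_DYNAMIC_READ, GL_DYNAMIC_COPY },
    };
    /* column 0: app -> GL (DRAW), 1: GL -> app (READ), 2: GL -> GL (COPY) */
    int access = written_by_gl ? (read_by_gl ? 2 : 1) : 0;
    return hints[f][access];
}
```

Keep in mind the hint only guides where the driver places the buffer; it does not guarantee a faster path.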

Originally posted by speed:

If I use the back buffer, will the values be clamped to 1.0?

Yes, although with pbuffers and the appropriate extensions it might be possible to create a context that stores floating-point values.
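To see why that matters for coordinate data, here is a sketch of what an 8-bit-per-channel back buffer does to a value: it is clamped to [0, 1] and then quantized to 256 levels. The RGBA8 format is an assumption (the actual back-buffer format depends on the pixel format requested), and store_unorm8 is a hypothetical name:

```c
/* Models storing one component in an 8-bit unsigned-normalized colour
   buffer: clamp to [0, 1], then round to one of 256 levels.
   A float render target such as GL_RGBA32F_ARB skips both steps. */
float store_unorm8(float v) {
    if (v < 0.0f) v = 0.0f;
    if (v > 1.0f) v = 1.0f;
    /* v is non-negative here, so adding 0.5 and truncating rounds it. */
    return (float)(int)(v * 255.0f + 0.5f) / 255.0f;
}
```

A coordinate of 2.5 comes back as 1.0, -0.3 as 0.0, and even 0.5 becomes 128/255.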


I use a texture for my FBO. I know this question is trivial, but I don’t know exactly what it means to use renderbuffers in an FBO. What is the difference between a texture and a renderbuffer?

Renderbuffers are similar to the ordinary back buffer. You cannot texture from renderbuffers, and they do not have the various power-of-two size limitations. Internally, they may use a different memory layout (optimized for rendering into the renderbuffer, similar to the ordinary back buffer) than textures (optimized for sampling from the texture). It is possible that glReadPixels is optimized only for the back-buffer layout.
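The memory-layout point can be made concrete: if a surface is stored in a tiled order, returning it in row order means reordering every texel. The sketch compares row-major addressing with a Z-order (Morton) curve, chosen here purely as an example of a non-linear layout; it is not a claim about what any particular driver or GPU uses, and the function names are hypothetical:

```c
#include <stdint.h>

/* Row-major layout: rows stored one after another, the order glReadPixels
   hands back to the application. */
uint32_t linear_index(uint32_t x, uint32_t y, uint32_t width) {
    return y * width + x;
}

/* Example non-linear layout (Z-order): bits of x and y are interleaved so
   texels that are close in 2D tend to be close in memory. */
uint32_t morton_index(uint16_t x, uint16_t y) {
    uint32_t result = 0;
    for (int bit = 0; bit < 16; ++bit) {
        result |= ((uint32_t)((x >> bit) & 1u)) << (2 * bit);
        result |= ((uint32_t)((y >> bit) & 1u)) << (2 * bit + 1);
    }
    return result;
}
```

For example, texel (3, 5) sits at index 23 in a row-major 4-wide image but at index 39 in Z-order.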

If you are using an FBO to render to a texture, make sure you unbind it when you are done rendering to it; don’t just set the draw buffer back to the back buffer. If you only do the latter, the framerate will be very low; if the FBO is actually unbound, the framerate will be very high. Hope this helps.

I tried all of your suggestions, but I get the same results.

@arekkusu:
I changed the hint to STREAM_COPY and compared the sample’s code with mine; they do the same things. I can’t run the sample because I don’t have an Apple machine, so I can’t benchmark it. Have you run it? How many fps does it get?

@brtnrdr:
Do you mean that I should call ‘BindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0)’ after my rendering? I already do that.

@Komat:
I can’t use the back buffer because I need full float precision without clamping. That is why I use an FBO with GL_TEXTURE_RECTANGLE_ARB and GL_RGBA32F_ARB.

My application is very similar to a particle-system renderer. Is anyone here working with particles?

In each iteration I compute the new positions of the particles, then sort them so they render correctly, and finally read them into a VBO and render it with DrawArrays.

My dataset currently has 1024x1024 particles and I get 2 fps. Any suggestions?
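A CPU reference for the sorting step can be handy for checking a GPU sort’s output on a small dataset. The sketch assumes the sort exists for back-to-front alpha blending and orders particles by their distance along the view direction; the eye and dir parameters and the function names are hypothetical, and qsort stands in for whatever sort actually runs on the GPU:

```c
#include <stdlib.h>

typedef struct { float x, y, z, w; } Particle;

/* Particle paired with its depth along the view direction. */
typedef struct { float depth; Particle p; } Keyed;

/* qsort comparator: larger depth (farther away) sorts first. */
static int farther_first(const void *a, const void *b) {
    float da = ((const Keyed *)a)->depth, db = ((const Keyed *)b)->depth;
    return (da < db) - (da > db);
}

/* Sorts particles back to front for a camera at eye looking along the
   (assumed normalized) direction dir. Returns 0 on success, -1 if the
   scratch allocation fails. */
int sort_back_to_front(Particle *ps, size_t n, const float eye[3], const float dir[3]) {
    if (n < 2) return 0;
    Keyed *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL) return -1;
    for (size_t i = 0; i < n; ++i) {
        tmp[i].p = ps[i];
        tmp[i].depth = (ps[i].x - eye[0]) * dir[0]
                     + (ps[i].y - eye[1]) * dir[1]
                     + (ps[i].z - eye[2]) * dir[2];
    }
    qsort(tmp, n, sizeof *tmp, farther_first);
    for (size_t i = 0; i < n; ++i) ps[i] = tmp[i].p;
    free(tmp);
    return 0;
}
```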

So sorry…

Right after I wrote that post I noticed that I had not disabled the sort process. Now I get 70 fps, and my bottleneck is the sort.

Which GPU sorting method do you use? I use the bitonic sort approach of Kipfer and Westermann. Do you know of anything more efficient?
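For reference, a bitonic sorting network on n = 2^k keys needs k(k+1)/2 compare-exchange passes, so 1024x1024 = 2^20 particles take 210 passes per frame, which likely explains why the sort dominates. The CPU sketch below is a plain textbook bitonic sort, not Kipfer and Westermann’s GPU implementation; it shows the network structure and is mainly useful for validating GPU results on small inputs:

```c
#include <stddef.h>

/* Compare-exchange passes a bitonic network needs for n = 2^k keys.
   In a straightforward GPU mapping each pass is one rendering pass. */
unsigned bitonic_pass_count(unsigned k) { return k * (k + 1) / 2; }

/* In-place ascending bitonic sort; n must be a power of two. Each (k, j)
   iteration is one pass, and all compare-exchanges inside a pass are
   independent, which is what makes the network fit fragment programs. */
void bitonic_sort(float *a, size_t n) {
    for (size_t k = 2; k <= n; k <<= 1) {
        for (size_t j = k >> 1; j > 0; j >>= 1) {
            for (size_t i = 0; i < n; ++i) {
                size_t partner = i ^ j;
                if (partner > i) {
                    int ascending = (i & k) == 0;
                    /* swap when the pair is out of order for this block */
                    if ((a[i] > a[partner]) == ascending) {
                        float t = a[i];
                        a[i] = a[partner];
                        a[partner] = t;
                    }
                }
            }
        }
    }
}
```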