Non-power-of-two texture loading is slow

shivmitra · July 5, 2011, 12:48am

I'm trying to grab an image frame from each of two different videos and render both in an OpenGL window to get a combined-video effect. For rendering I'm using gluBuild2DMipmaps, since on Windows it is the only call I've found that accepts NPOT (non-power-of-two) textures. It is awfully slow (around 60 ms). I then tried glTexImage2D with gluScaleImage, but that wasn't much of an improvement either. I can't make the video POT, so what can I do to render faster?

I came across the following text on the NeHe website: http://nehe.gamedev.net/data/lessons/lesson.asp?lesson=35

Text:
Kevin Rogers Adds: I just wanted to point out another important reason to use glTexSubImage2D. Not only is it faster on many OpenGL implementations, but the target area does not need to be a power of 2. This is especially handy for video playback since the typical dimensions for a frame are rarely powers of 2 (often something like 320 x 200). This gives you the flexibility to play the video stream at its original aspect, rather than distorting / clipping each frame to fit your texture dimensions.

It’s important to note that you can NOT update a texture if you have not created the texture in the first place! We create the texture in the Initialize() code!

I also wanted to mention… If you planned to use more than one texture in your project, make sure you bind the texture you want to update. If you don’t bind the texture you may end up updating textures you didn’t want updated!
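To make the quoted idea concrete: one way to read it is to allocate a texture whose sides are the next powers of two above the frame size, update only the frame-sized rectangle with glTexSubImage2D, and draw with texture coordinates that stop at the frame's edge. The sketch below only does the arithmetic for that, with no OpenGL calls. The names `next_pow2`, `PaddedTexture` and `pad_for_frame` are hypothetical helpers made up for illustration, not anything from NeHe or OpenGL.

```c
#include <assert.h>

/* Smallest power of two that is >= n.
 * Assumes 1 <= n <= 2^31 so the shift cannot overflow. */
static unsigned next_pow2(unsigned n)
{
    assert(n >= 1u && n <= 0x80000000u);
    unsigned p = 1u;
    while (p < n)
        p <<= 1;
    return p;
}

/* Hypothetical description of a padded texture holding one video frame. */
typedef struct {
    unsigned tex_w, tex_h; /* allocated power-of-two texture size        */
    float    max_s, max_t; /* texture coordinates of the frame's corner  */
} PaddedTexture;

/* Compute the padded size and the texture coordinates that cover
 * only the frame_w x frame_h region in the texture's corner. */
static PaddedTexture pad_for_frame(unsigned frame_w, unsigned frame_h)
{
    PaddedTexture p;
    p.tex_w = next_pow2(frame_w);
    p.tex_h = next_pow2(frame_h);
    p.max_s = (float)frame_w / (float)p.tex_w;
    p.max_t = (float)frame_h / (float)p.tex_h;
    return p;
}
```

For the 320 x 200 frame mentioned in the quote this gives a 512 x 256 texture, with the quad's texture coordinates running from (0, 0) to (0.625, 0.78125) instead of (1, 1). The unused border costs some video memory but avoids rescaling every frame.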

mhagain · July 5, 2011, 2:19am

Much of the info on NeHe is hopelessly outdated nowadays.

glTexImage2D needs to respecify the entire texture, which may include reallocating video RAM for it, reallocating system RAM for a system memory backup copy, moving other objects around to make room, texture swapping, etc. Slow for run-time usage. (Some drivers may have a fast path if they can detect that only the data changes, but that would be very implementation-dependent and shouldn’t be relied on.)

Your initial creation can be done with a NULL data parameter, which just specifies the texture object but does not initialize any data. You must call glTexSubImage2D to specify the data for it before you can use it.

glTexSubImage2D updates an already specified texture in-place. No reallocation, no messing, all that happens is data gets transferred from CPU to GPU. Faster (but see the next paragraph).

If the texture is in use (i.e. something is being drawn with it) at the time you do the update, your driver will need to stall the pipeline and wait until drawing has completed before the update can be done. Similarly if an update is needed before you draw with the texture your driver will need to stall and wait until the update has fully completed before it can draw. Slow. Techniques such as double-buffering or using a PBO can help with this.
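As a rough illustration of the double-buffering idea, the sketch below shows only the slot rotation: each frame is uploaded into one of two slots while the other slot, filled one frame earlier, is the one being drawn. It contains no OpenGL calls; a slot here could stand for a texture object or for a PBO, and `SLOT_COUNT`, `FrameSlots` and `slots_for_frame` are hypothetical names chosen for this example. Note the assumption that displaying each frame one frame late is acceptable.

```c
#include <assert.h>

enum { SLOT_COUNT = 2 }; /* two textures (or two PBOs) used alternately */

typedef struct {
    int upload_slot; /* slot that receives this frame's pixel data        */
    int draw_slot;   /* slot to draw from, or -1 if nothing is ready yet  */
} FrameSlots;

/* Pick the slots for a given frame number.
 * Upload goes to frame_index % 2; drawing uses the slot that was
 * filled on the previous frame, so the two never coincide. */
static FrameSlots slots_for_frame(unsigned frame_index)
{
    FrameSlots s;
    s.upload_slot = (int)(frame_index % SLOT_COUNT);
    s.draw_slot   = (frame_index == 0u)
                  ? -1
                  : (int)((frame_index - 1u) % SLOT_COUNT);
    assert(s.upload_slot != s.draw_slot);
    return s;
}
```

In a render loop the upload step (for example a glTexSubImage2D call) would target `upload_slot`, and the draw step would bind `draw_slot`, skipping the draw on the very first frame. Whether this actually removes the stall depends on the driver, so it is worth measuring.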

Getting data to the GPU from the CPU as fast as possible is another primary bottleneck. Using formats such as GL_RGB will not help you here as the internal representation will (most commonly, unless you have strange hardware) be a 32-bit format. Your driver will need to convert from GL_RGB to this 32-bit format as part of the upload process. On some drivers this can be hellishly slow. GL_BGRA is typically the fastest format, and on some hardware a type of GL_UNSIGNED_INT_8_8_8_8_REV makes things even faster. I’ve personally seen texture upload performance go up to 30 or 40 times faster by making these simple changes.
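If the video decoder can only hand over 24-bit RGB, one option is to do the expansion to 32 bits yourself rather than leaving it to the driver. The following is a minimal sketch of that conversion, assuming tightly packed rows with no padding; `rgb_to_bgra` is a hypothetical helper name, and whether doing this on the CPU beats the driver's own conversion is something to measure on the target machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand tightly packed 3-byte R,G,B pixels into 4-byte pixels stored
 * in memory as B,G,R,A, with alpha forced to fully opaque.
 * bgra must have room for pixel_count * 4 bytes. */
static void rgb_to_bgra(const uint8_t *rgb, uint8_t *bgra, size_t pixel_count)
{
    assert(rgb != NULL && bgra != NULL);
    for (size_t i = 0; i < pixel_count; ++i) {
        bgra[4 * i + 0] = rgb[3 * i + 2]; /* blue  */
        bgra[4 * i + 1] = rgb[3 * i + 1]; /* green */
        bgra[4 * i + 2] = rgb[3 * i + 0]; /* red   */
        bgra[4 * i + 3] = 0xFF;           /* alpha */
    }
}
```

The resulting buffer is laid out for an upload with format GL_BGRA and type GL_UNSIGNED_BYTE; on a little-endian machine the same bytes also match GL_UNSIGNED_INT_8_8_8_8_REV. Better still is to ask the decoder for BGRA output directly, if it offers that, so no extra pass over the frame is needed.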

Think that covers more or less everything.

shivmitra · July 5, 2011, 2:56am

Thanks.
I will try to implement these in my program.

shivmitra · July 5, 2011, 3:44am

Can you point me to a tutorial or code snippet that uses a PBO for textures? I searched on Google but couldn't find anything I understood.

mobeen · July 5, 2011, 4:55am

http://www.songho.ca/opengl/gl_pbo.html
Also have a look through the wiki page: http://www.opengl.org/wiki/Pixel_Buffer_Object

mhagain · July 5, 2011, 6:11am

The songho page is pretty good, I second that recommendation, plus it provides a sample app that you can run and verify the performance difference for yourself.