glTexImage2D PBO Uploads take a lot of CPU time?

I’m doing some GPGPU computing using OpenGL on an NVIDIA Quadro 2000M card.

While profiling, I noticed that my glTexImage2D calls for texture uploads from PBOs take 20% of the CPU time, which is as much as the time spent copying all the data into the PBOs.

glTexImage2D(GL_TEXTURE_2D, 0, 1, width_, height_, 0, GL_RED, GL_UNSIGNED_BYTE, NULL);

where the pixel store settings are:

glPixelStorei(GL_UNPACK_ALIGNMENT, 8);
glPixelStorei(GL_PACK_ALIGNMENT, 8);

I don’t understand why this is taking so much CPU time. Shouldn’t it just start the asynchronous DMA transfer to the GPU?

Have you tried alignments other than eight bytes?
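
To see what the alignment changes, here is a small sketch of how GL_UNPACK_ALIGNMENT affects the row stride the driver expects in the source data. It assumes single-byte components (as with GL_RED / GL_UNSIGNED_BYTE) and GL_UNPACK_ROW_LENGTH left at 0; the function name unpack_row_stride is made up for illustration and is not part of OpenGL.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not an OpenGL call): returns the number of bytes
 * between the starts of consecutive rows that the GL expects when reading
 * pixel data with the given GL_UNPACK_ALIGNMENT.
 *
 * Assumption: components are one byte each (e.g. GL_UNSIGNED_BYTE), so
 * row_bytes = width * components_per_pixel, and GL_UNPACK_ROW_LENGTH is 0.
 */
size_t unpack_row_stride(size_t row_bytes, size_t alignment)
{
    /* OpenGL only accepts these four alignment values. */
    assert(alignment == 1 || alignment == 2 ||
           alignment == 4 || alignment == 8);

    /* Round the row size up to the next multiple of the alignment. */
    return (row_bytes + alignment - 1) / alignment * alignment;
}
```

For a GL_RED / GL_UNSIGNED_BYTE image that is 1001 pixels wide, an alignment of 8 gives a stride of 1008 bytes, so the data in the PBO has to be laid out with 7 padding bytes per row. With an alignment of 1 the rows are tightly packed. If the width is already a multiple of 8, all four alignments give the same layout.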

If you know the texture size before you upload, it is better to create the texture once with glTexImage2D and then use a PBO together with glTexSubImage2D to update the texture data. Another issue may be GL_RED. I think it is not an accelerated (GPU-native) format, so the driver may have to convert it on the fly to something the GPU can use. Check this: http://developer.nvidia.com/content/nvidia-opengl-texture-formats