glCopyTexSubImage2D slowness on Intel

Dear forum,

I’m having performance problems with glCopyTexSubImage2D on Intel graphics chips, and I can’t quite figure out why.

I recently retired my usage of pbuffers (since they aren’t supported on Intel hardware) and chose instead to render my off-screen stuff before I re-clear the screen and render my real scene, capturing what I rendered “off-screen” to a texture with glCopyTexSubImage2D.

On the nVidia and ATi cards I’ve been able to try with, glCopyTexSubImage2D takes around 0.05-0.10 ms, which I think is reasonable, but on Intel hardware it tends to take 50 ms – three orders of magnitude more!

My first thought was that the framebuffer and texture formats didn’t match so that it had to do some expensive conversion, but that doesn’t seem to be the case. My framebuffer has 8 bits of red, green, blue and alpha, and I initialize the texture with glTexImage2D, using GL_RGBA for internalformat.

Has anyone else encountered this problem, or otherwise know what it is?

AFAIK, Intel GPUs use shared RAM for texture storage etc., so the kind of operations you are doing will require lots of copying from client to server and back again…

Are you implying that it cannot be done on Intel hardware? In that case, how do people actually do off-screen rendering in a way that works on Intel?

If it is supported (I hope), you should use framebuffer objects. The way you are doing it is by far the slowest! :slight_smile: This is pretty normal, since you have to wait between the two passes and, in addition, clear the framebuffer in between. If you swap buffers between the two passes it may be faster, but I have never tried that and it looks very experimental, especially on Intel hardware! :slight_smile:

Unfortunately, neither FBOs nor pbuffers seem to be supported on Intel hardware.

Might I trouble you to explain a bit further why the way I’m doing it is slow, because I don’t really understand. According to my measurements, clearing the framebuffer hardly even takes any measurable time, and the only thing that actually takes much time (on Intel hardware, that is) is just the call to glCopyTexSubImage2D, which – correct me if I’m wrong – I would have to call even if I used pbuffers or FBOs, no?
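One thing worth ruling out when timing individual GL calls: the driver queues commands and executes them later, so the call where the time shows up is not necessarily the call that costs it. A copy from the framebuffer may simply be the point where the driver waits for all the earlier rendering to finish. Whether that explains the 50 ms here is not established in this thread. Below is a minimal sketch of a timing helper that drains the queue before starting the clock and again before stopping it. The names `time_isolated_ms`, `now_ms`, `fake_finish` and `fake_copy` are made up for illustration; in a real program `sync` would be a small wrapper that calls glFinish and `op` a wrapper around the glCopyTexSubImage2D call.

```c
#include <time.h>

/* Hypothetical helper: wall-clock time in milliseconds (C11 timespec_get). */
static double now_ms(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
}

/* Times `op` in isolation.
   `sync` runs before the clock starts, so work queued earlier is not
   charged to `op`, and again before the clock stops, so `op` itself has
   actually completed. In a GL program `sync` would wrap glFinish. */
double time_isolated_ms(void (*sync)(void), void (*op)(void))
{
    double start;

    sync();
    start = now_ms();
    op();
    sync();
    return now_ms() - start;
}

/* Stand-ins so the sketch runs without a GL context. */
static int sync_calls = 0;
static int op_calls = 0;
static void fake_finish(void) { sync_calls++; }
static void fake_copy(void)   { op_calls++; }
```

If the copy still takes around 50 ms when measured this way, the cost really is in the copy; if the time moves to the first `sync`, it belonged to the rendering that came before.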

Without pbuffer or FBO support, you are right, it is hard to do better.
I was implying that this method is slow because you have to perform several renderings sequentially. With pbuffers or FBOs, which use additional buffers, it can be done asynchronously and thus faster.
I thought you were using glCopyTexSubImage2D to save your first-pass data, am I right? Actually, with an FBO you render directly into a texture and you don’t have to perform any costly texture copy operations.

With pbuffers or FBOs, which use additional buffers, it can be done asynchronously and thus faster.

Huh. Can the card do that? I didn’t know that, but that certainly sounds useful. Thanks for the tip! It is unfortunate that I cannot use it here. :slight_smile:

Actually, with an FBO you render directly into a texture and you don’t have to perform any costly texture copy operations.

That also sounds useful. I thought they worked like pbuffers in that I’d still have to copy the rendered data into a texture. Again, it is unfortunate that I cannot use FBOs. :slight_smile:

To get back to the original issue, though: I cannot imagine that it’s supposed to take 50 ms, even for an Intel card, to copy the framebuffer into a texture. Does anyone have an idea what I might be doing wrong?

FBO is mostly not supported on Intel, but pbuffers should be! Try checking the extensions again. On the other hand, glCopyTexSubImage2D is not (or should not be) that slow; besides, it allows antialiasing in the off-screen buffer, which is hardly supported on Intel via FBO anyway. Back to the problem: glCopy has always performed reasonably in my situations, so there is nothing left to use except glReadBuffer or glCopy, the first one being extremely slow. You might try updating the driver, since honestly it is the only way out. I had too many problems, crashes and incorrect display that only driver updates could fix. Intel drivers suck on OpenGL, nothing new here.

Unfortunately, I am only all too sure that pbuffers aren’t supported. I know it’s hard to believe, but here are all the extensions that an Intel 945GM reports on Windows Vista:

GL_ARB_depth_texture, GL_ARB_multitexture, GL_ARB_point_parameters, GL_ARB_shadow, GL_ARB_texture_border_clamp, GL_ARB_texture_compression, GL_ARB_texture_cube_map, GL_ARB_texture_env_add, GL_ARB_texture_env_combine, GL_ARB_texture_env_dot3, GL_ARB_texture_env_crossbar, GL_ARB_transpose_matrix, GL_ARB_vertex_buffer_object, GL_ARB_window_pos, GL_EXT_abgr, GL_EXT_bgra, GL_EXT_blend_color, GL_EXT_blend_func_separate, GL_EXT_blend_minmax, GL_EXT_blend_subtract, GL_EXT_clip_volume_hint, GL_EXT_compiled_vertex_array, GL_EXT_cull_vertex, GL_EXT_draw_range_elements, GL_EXT_fog_coord, GL_EXT_multi_draw_arrays, GL_EXT_packed_pixels, GL_EXT_rescale_normal, GL_EXT_secondary_color, GL_EXT_separate_specular_color, GL_EXT_shadow_funcs, GL_EXT_stencil_two_side, GL_EXT_stencil_wrap, GL_EXT_texture_compression_s3tc, GL_EXT_texture_env_add, GL_EXT_texture_env_combine, GL_EXT_texture_lod_bias, GL_EXT_texture_filter_anisotropic, GL_EXT_texture3D, GL_3DFX_texture_compression_FXT1, GL_IBM_texture_mirrored_repeat, GL_NV_blend_square, GL_NV_texgen_reflection, GL_SGIS_generate_mipmap, GL_SGIS_texture_edge_clamp, GL_SGIS_texture_lod, GL_WIN_swap_hint

I’ve also checked on a 945G, a 915GM and a 855GM on Windows XP, as well as a 945GM on Linux with X.org, and none of them report the pixel_buffer extension.
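For anyone repeating this check in code, a plain substring search over the extension string is not quite enough, because one extension name can be a prefix of another (GL_EXT_texture_env_add and GL_EXT_texture_env_combine both appear in the list above). Here is a minimal sketch of a whole-token match, assuming the space-separated string that glGetString(GL_EXTENSIONS) returns; the function name `has_gl_extension` is made up for illustration. Note also that on Windows the pbuffer extension is a WGL extension (WGL_ARB_pbuffer), and WGL extensions are reported through a separate WGL extension string, so a check like this would have to be run against that string as well.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` appears as a complete space-separated token in
   `ext_list`, 0 otherwise. `ext_list` is assumed to have the format of
   glGetString(GL_EXTENSIONS): names separated by single spaces. */
int has_gl_extension(const char *ext_list, const char *name)
{
    size_t name_len;
    const char *p;

    /* Reject missing input, empty names, and names containing spaces. */
    if (ext_list == NULL || name == NULL || *name == '\0' ||
        strchr(name, ' ') != NULL)
        return 0;

    name_len = strlen(name);
    p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* The hit only counts if it is bounded by spaces or string ends. */
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token = (p[name_len] == ' ') || (p[name_len] == '\0');

        if (starts_token && ends_token)
            return 1;
        p += name_len; /* partial hit inside a longer name, keep looking */
    }
    return 0;
}
```

With a current GL context, the call would look like has_gl_extension((const char *)glGetString(GL_EXTENSIONS), "GL_EXT_framebuffer_object").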

This is just a software limitation. The same 945 supports both pbuffers and FBOs (and GLSL too) on the Mac OS X driver.

Just a guess, try GL_BGRA_EXT instead of GL_RGBA as internal texture format for glTexImage2D - maybe the driver is performing some slow conversion internally.