OpenGL for old software

I’ve been migrating some old software over to new hardware and have run into some difficulties. Minimizing changes to the way the software works is the top priority, so most of the work has focused on emulating the old hardware, allowing the software to think it’s still running on it.

My OpenGL issue is in the implementation of the old memory-mapped display. The old graphics system consisted of a monochrome 200x300 LED display that could cycle between two buffers in memory. Each bit in a buffer corresponds to one pixel, which is either on or off. The display could be cycled between the two buffers at a software-controlled rate, which was used for effects such as making text flash on screen.
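For readers who want something concrete, here is a minimal sketch of how such a pair of one-bit-per-pixel buffers might be modelled in C. The row-major, most-significant-bit-first packing and every name here (`MonoDisplay`, `mono_set_pixel`, `mono_get_pixel`, `mono_flip`) are assumptions for illustration; the real hardware layout is not described in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { DISPLAY_W = 200, DISPLAY_H = 300 };
/* 200 bits per row = 25 bytes. Packing order is an assumption. */
enum { ROW_BYTES = DISPLAY_W / 8, BUFFER_BYTES = ROW_BYTES * DISPLAY_H };

typedef struct {
    uint8_t bits[2][BUFFER_BYTES]; /* the two display buffers */
    int visible;                   /* index of the buffer currently shown */
} MonoDisplay;

/* Turn one pixel on or off in the chosen buffer. */
static void mono_set_pixel(MonoDisplay *d, int buffer, int x, int y, int on)
{
    assert(buffer == 0 || buffer == 1);
    assert(x >= 0 && x < DISPLAY_W && y >= 0 && y < DISPLAY_H);
    uint8_t *byte = &d->bits[buffer][y * ROW_BYTES + x / 8];
    uint8_t mask = (uint8_t)(0x80u >> (x % 8));
    if (on)
        *byte |= mask;
    else
        *byte &= (uint8_t)~mask;
}

/* Read one pixel back as 0 or 1. */
static int mono_get_pixel(const MonoDisplay *d, int buffer, int x, int y)
{
    assert(buffer == 0 || buffer == 1);
    assert(x >= 0 && x < DISPLAY_W && y >= 0 && y < DISPLAY_H);
    return (d->bits[buffer][y * ROW_BYTES + x / 8] >> (7 - x % 8)) & 1;
}

/* Show the other buffer; neither buffer's contents change. */
static void mono_flip(MonoDisplay *d)
{
    d->visible ^= 1;
}
```

The point of the sketch is that a flip only changes which buffer is shown, which is the behaviour that needs to be reproduced on the new hardware.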

This functionality is almost identical to SwapBuffers() in a double-buffered scheme. The problem is that SwapBuffers() leaves the back buffer undefined instead of preserving both buffers and actually flipping between them. Otherwise this would have been the solution to my problem.

The old software is multi-threaded, with up to 4 threads manipulating bits in both of the display buffers to draw lines, rectangles, text, etc. This part of the software cannot be changed and it must continue to be able to draw from multiple threads.

I’ve already integrated OpenGL into the software to the point where I have 4 threads, each with an OpenGL context bound to the display. The problem is that I don’t have two buffers to work with; all rendering is currently going to GL_FRONT.

The newer hardware is limited too: glxinfo indicates that I do not have FBOs, PBOs, stereo buffers, or auxiliary buffers.

My solution thus far has been to create two textures which are shared between all 4 threads and represent the two buffers, each of size 256x512 to accommodate a 200x300 area. Instead of writing to bits in a buffer, the software has been modified to write to pixels in a texture using glTexSubImage2D. The new hardware takes the texture and stretches it to a 768x1024 display. I will likely end up using double buffering to minimize tearing and flicker.
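As a rough illustration of the padding arithmetic, the sketch below rounds each dimension up to the next power of two and computes the largest texture coordinates that still fall inside the 200x300 area. The helper names (`next_pow2`, `PaddedTexture`, `padded_texture_for`) are made up for this example.

```c
#include <assert.h>

/* Smallest power of two that is >= n. */
static unsigned next_pow2(unsigned n)
{
    assert(n > 0);
    unsigned p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

typedef struct {
    unsigned tex_w, tex_h; /* padded texture size in texels */
    float s_max, t_max;    /* texture coordinates of the used area's far corner */
} PaddedTexture;

/* Compute the padded size and the texture coordinates of the used region. */
static PaddedTexture padded_texture_for(unsigned w, unsigned h)
{
    PaddedTexture p;
    p.tex_w = next_pow2(w);
    p.tex_h = next_pow2(h);
    p.s_max = (float)w / (float)p.tex_w;
    p.t_max = (float)h / (float)p.tex_h;
    return p;
}
```

For 200x300 this gives a 256x512 texture, and the fullscreen quad would use texture coordinates from (0, 0) to (0.78125, 0.5859375) so that the unused padding is never sampled.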

One problem I have with this method is that turning a pixel on is not instantaneous like it would be with the old hardware. The display doesn’t update until the texture is redrawn. I can overcome this by redrawing the texture repeatedly during its ‘cycle time’. But I have some concerns over the performance cost of trying to quickly refresh a full screen texture. The newer hardware is a very low end machine but it at least has direct rendering.

The second problem is that I’m not sure if this is the best way to do it. I’ve heard glTexSubImage2D can be sluggish in comparison with rendering primitives, and I need to keep CPU usage to a minimum while still getting responsive graphics.

In summary, I need to be able to render to two buffers from multiple threads and be able to flip what is currently seen on the screen between the two buffers. The contents of the buffers must remain intact.
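The flipping half of that requirement is mostly bookkeeping. Below is a minimal sketch of choosing which buffer to show from a millisecond clock; `visible_buffer` and its parameters are hypothetical, and it assumes both buffers are shown for the same length of time.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 0 or 1: the buffer to show at time now_ms, given that cycling
 * started at start_ms and each buffer stays visible for period_ms.
 */
static int visible_buffer(uint32_t start_ms, uint32_t now_ms, uint32_t period_ms)
{
    assert(period_ms > 0);
    /* Unsigned subtraction stays correct across counter wrap-around. */
    uint32_t elapsed = now_ms - start_ms;
    return (int)((elapsed / period_ms) & 1u);
}
```

A screen thread could call this once per redraw and bind whichever texture (or pick whichever buffer region) the result selects, leaving both buffers' contents untouched.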

Is there a better/more efficient/faster way of doing this?

Perhaps OpenGL isn’t suited to this? Framebuffer work should, oddly enough, be handled using a framebuffer rather than a polygon rendering pipeline.

This is currently running on both Windows and Linux. OpenGL was chosen for portability. Polygons are well suited to this task since they allow me to avoid having to specify each pixel on the 768x1024 display. The inability to truly swap between buffers is the problem, and I would prefer to resolve it with OpenGL.

What’s wrong with SDL’s framebuffers? I believe SDL is pretty cross platform and far better suited to this task.

I believe SDL was avoided from the start due to not being thread safe and adding thread protection around SDL in the old software wasn’t feasible.

Do you see a framebuffer approach having better performance than the two texture solution I mentioned previously? The solution works, I’d just like to be sure I haven’t overlooked a faster method.

Well, as GL is meant to have its textures uploaded once and reused often, it doesn’t like their data being changed constantly; it’s not optimised for it. With SDL, its framebuffer is memory mapped.

If there’s a potential thread safety problem and you’ve got a nicely working GL implementation, don’t bother; if it ain’t broke, don’t fix it, as they say.

On Vista/7/OS X/Compiz, the window manager is double buffered and v-synced anyway so you won’t be able to eradicate the delay issue even if you switched to a framebuffer.

How do these two methods compare in performance?

Method 1: glCopyPixels
Initialize
-Designate two 200x300 areas on GL_BACK to act as two buffers
Drawing threads
-Draw to GL_BACK using polygons to update the buffers
Screen thread
-Alternate between the two buffers by setting raster position
-glCopyPixels from GL_BACK to GL_FRONT, also upscaling from 200x300 to 768x1024 using glPixelZoom

Method 2: textures
Initialize
-Create two 200x300 textures to act as two buffers
Drawing threads
-Draw to the textures by generating pixel data and glTexSubImage2D
Screen thread
-Alternate between the two buffers with glBindTexture
-Draw texture to GL_FRONT, stretching it to fullscreen

I can see some performance benefit in sending vertices instead of pixel data to update the buffers in video memory, but I’m uncertain how glCopyPixels+glPixelZoom would compare to drawing a fullscreen texture when updating the screen.

Have you profiled your application to see if the items you have concerns about are actually causing any problems for you? It’s perfectly possible to run fullscreen video with OpenGL and without any performance issues, so the concerns you have may be totally unnecessary.

Performance of glTexSubImage2D is dependent on a lot of factors, but the single most important one is to pick a format and type for your parameters that match what is actually supported in hardware as closely as possible. What this means is that you should avoid formats like GL_RGB (despite the fact that you might think it “saves memory”) and use GL_BGRA instead. Your graphics driver will convert all formats to 4-component BGRA anyway, and you can save a lot of work in terms of driver overhead by just giving it the data in the format it wants to begin with.

GL_UNSIGNED_INT_8_8_8_8_REV can be 30 times faster than GL_UNSIGNED_BYTE on certain hardware (Intel) and is supported by everything that’s less than 10 or so years old, so use that in combination with GL_BGRA for a nice performance increase.

Make sure that the format and type you pick match those used in your original glTexImage2D call too, otherwise the driver may need to do more conversion for you, which will most likely happen in software and slow things down.
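To make the format/type pairing concrete: with format GL_BGRA and type GL_UNSIGNED_INT_8_8_8_8_REV, each pixel is one 32-bit integer with blue in the low byte and alpha in the high byte (0xAARRGGBB). A small sketch of packing such a value follows; `pack_bgra8888_rev` is a hypothetical helper name, not part of OpenGL.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack one pixel for upload with format GL_BGRA and type
 * GL_UNSIGNED_INT_8_8_8_8_REV. Bit layout: A 31..24, R 23..16,
 * G 15..8, B 7..0.
 */
static uint32_t pack_bgra8888_rev(unsigned r, unsigned g, unsigned b, unsigned a)
{
    assert(r <= 255 && g <= 255 && b <= 255 && a <= 255);
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}
```

For a monochrome display only two such values are ever needed (one for "on", one for "off"), so they can be computed once and reused for every pixel written.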

Don’t attempt to update a texture/draw it/update again/draw again in the same frame as you will hit synchronisation problems (the draw must fully complete before the update can happen). Instead buffer all updates into system memory first, then use a single glTexSubImage2D call followed by a single draw per frame.
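One way to batch updates, sketched here under assumptions, is to record a bounding rectangle of everything the drawing code touched and upload only that region once per frame. `DirtyRect`, `dirty_reset` and `dirty_add` are illustrative names, and with several drawing threads the calls would need to be protected by whatever lock guards the buffer itself.

```c
#include <assert.h>

typedef struct {
    int x0, y0; /* inclusive top-left corner */
    int x1, y1; /* exclusive bottom-right corner */
    int dirty;  /* 0 when nothing has changed since the last upload */
} DirtyRect;

/* Call after each upload to start accumulating again. */
static void dirty_reset(DirtyRect *d)
{
    d->x0 = d->y0 = d->x1 = d->y1 = 0;
    d->dirty = 0;
}

/* Grow the rectangle so that it also covers the w-by-h area at (x, y). */
static void dirty_add(DirtyRect *d, int x, int y, int w, int h)
{
    assert(w > 0 && h > 0);
    if (!d->dirty) {
        d->x0 = x;
        d->y0 = y;
        d->x1 = x + w;
        d->y1 = y + h;
        d->dirty = 1;
        return;
    }
    if (x < d->x0) d->x0 = x;
    if (y < d->y0) d->y0 = y;
    if (x + w > d->x1) d->x1 = x + w;
    if (y + h > d->y1) d->y1 = y + h;
}
```

At frame time, if `dirty` is set, the simplest use is probably to upload the full rows from `y0` to `y1` in a single glTexSubImage2D call and then reset the rectangle; uploading a narrower sub-rectangle would additionally need the unpack row-length pixel-store state to be set.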

Drawing directly to the front buffer is Evil on newer consumer hardware; the OpenGL spec requires it so it’s possible but you shouldn’t do it. Use a double-buffered context instead and it will just make your life a lot easier.

This approach should allow you to use a single texture backed up by a single system memory buffer (i.e. a simple static array). Updates are done directly to the system memory buffer, and the texture is only updated and drawn to screen before each SwapBuffers call. You should be able to retain 99% of your original application code unmodified this way, with only the routine that actually puts the end result on-screen needing to be changed. It should easily get 60 FPS on even low end hardware too.
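A minimal sketch of the conversion step this implies, assuming the legacy code keeps writing to a 200x300 one-bit-per-pixel buffer (row-major, most significant bit first, which is a guess about the old layout) and a separate routine expands it to 32-bit pixels just before upload. The names `expand_mono_to_bgra`, `PIXEL_ON` and `PIXEL_OFF` are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SRC_W = 200, SRC_H = 300, SRC_ROW_BYTES = SRC_W / 8 };

/* 0xAARRGGBB values, matching GL_BGRA + GL_UNSIGNED_INT_8_8_8_8_REV. */
#define PIXEL_ON  0xFFFFFFFFu /* opaque white */
#define PIXEL_OFF 0xFF000000u /* opaque black */

/*
 * Expand the 1-bit source buffer into one 32-bit pixel per source bit.
 * pixels must have room for at least SRC_W * SRC_H entries.
 */
static void expand_mono_to_bgra(const uint8_t *bits, uint32_t *pixels,
                                size_t pixel_count)
{
    assert(pixel_count >= (size_t)SRC_W * SRC_H);
    for (int y = 0; y < SRC_H; ++y) {
        for (int x = 0; x < SRC_W; ++x) {
            int on = (bits[y * SRC_ROW_BYTES + x / 8] >> (7 - x % 8)) & 1;
            pixels[(size_t)y * SRC_W + x] = on ? PIXEL_ON : PIXEL_OFF;
        }
    }
}
```

The resulting array could then be handed to a single `glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 200, 300, GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, pixels)` call per frame, followed by one textured quad and the buffer swap. Flipping between the two legacy buffers then reduces to choosing which source buffer gets expanded.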

I’m unable to profile at this time since the system won’t be put under load until most of the bugs are taken care of. It may do just fine, but as a beginning OpenGL programmer, I want to be sure I haven’t completely overlooked a better solution. Also, I would like to minimize resource usage to allow room for future growth, since the software/hardware will be in use for the next ~10 years.

I’ve attached the glxinfo for the system. Based on the glxinfo I had assumed I should be using GL_RGBA for my textures and the pixel data I generate for them. Should I be using GL_BGRA anyway?

 
name of display: :0.0
display: :0  screen: 0
direct rendering: Yes
server glx vendor string: SGI
server glx version string: 1.2
server glx extensions:
    GLX_ARB_multisample, GLX_EXT_import_context, GLX_EXT_texture_from_pixmap, 
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_MESA_copy_sub_buffer, 
    GLX_OML_swap_method, GLX_SGI_make_current_read, GLX_SGI_swap_control, 
    GLX_SGIS_multisample, GLX_SGIX_fbconfig, GLX_SGIX_visual_select_group
client glx vendor string: SGI
client glx version string: 1.4
client glx extensions:
    GLX_ARB_get_proc_address, GLX_ARB_multisample, GLX_EXT_import_context, 
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_MESA_allocate_memory, 
    GLX_MESA_copy_sub_buffer, GLX_MESA_swap_control, 
    GLX_MESA_swap_frame_usage, GLX_OML_swap_method, GLX_OML_sync_control, 
    GLX_SGI_make_current_read, GLX_SGI_swap_control, GLX_SGI_video_sync, 
    GLX_SGIS_multisample, GLX_SGIX_fbconfig, GLX_SGIX_pbuffer, 
    GLX_SGIX_visual_select_group, GLX_EXT_texture_from_pixmap
GLX version: 1.2
GLX extensions:
    GLX_ARB_get_proc_address, GLX_ARB_multisample, GLX_EXT_import_context, 
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_MESA_allocate_memory, 
    GLX_MESA_copy_sub_buffer, GLX_MESA_swap_control, 
    GLX_MESA_swap_frame_usage, GLX_OML_swap_method, GLX_SGI_make_current_read, 
    GLX_SGI_swap_control, GLX_SGI_video_sync, GLX_SGIS_multisample, 
    GLX_SGIX_fbconfig, GLX_SGIX_visual_select_group
OpenGL vendor string: Tungsten Graphics, Inc
OpenGL renderer string: Mesa DRI Intel(R) 852GM/855GM 20061017 x86/MMX/SSE2
OpenGL version string: 1.3 Mesa 7.0.4
OpenGL extensions:
    GL_ARB_imaging, GL_ARB_multisample, GL_ARB_multitexture, 
    GL_ARB_point_parameters, GL_ARB_texture_border_clamp, 
    GL_ARB_texture_compression, GL_ARB_texture_cube_map, 
    GL_ARB_texture_env_add, GL_ARB_texture_env_combine, 
    GL_ARB_texture_env_crossbar, GL_ARB_texture_env_dot3, 
    GL_ARB_texture_mirrored_repeat, GL_ARB_texture_rectangle, 
    GL_ARB_transpose_matrix, GL_ARB_vertex_buffer_object, 
    GL_ARB_vertex_program, GL_ARB_window_pos, GL_EXT_abgr, GL_EXT_bgra, 
    GL_EXT_blend_color, GL_EXT_blend_equation_separate, 
    GL_EXT_blend_func_separate, GL_EXT_blend_minmax, GL_EXT_blend_subtract, 
    GL_EXT_clip_volume_hint, GL_EXT_cull_vertex, GL_EXT_compiled_vertex_array, 
    GL_EXT_convolution, GL_EXT_copy_texture, GL_EXT_draw_range_elements, 
    GL_EXT_fog_coord, GL_EXT_histogram, GL_EXT_multi_draw_arrays, 
    GL_EXT_packed_pixels, GL_EXT_point_parameters, GL_EXT_polygon_offset, 
    GL_EXT_rescale_normal, GL_EXT_secondary_color, 
    GL_EXT_separate_specular_color, GL_EXT_stencil_wrap, GL_EXT_subtexture, 
    GL_EXT_texture, GL_EXT_texture3D, GL_EXT_texture_edge_clamp, 
    GL_EXT_texture_env_add, GL_EXT_texture_env_combine, 
    GL_EXT_texture_env_dot3, GL_EXT_texture_filter_anisotropic, 
    GL_EXT_texture_lod_bias, GL_EXT_texture_object, GL_EXT_texture_rectangle, 
    GL_EXT_vertex_array, GL_3DFX_texture_compression_FXT1, 
    GL_APPLE_client_storage, GL_APPLE_packed_pixels, 
    GL_ATI_blend_equation_separate, GL_IBM_rasterpos_clip, 
    GL_IBM_texture_mirrored_repeat, GL_INGR_blend_func_separate, 
    GL_MESA_pack_invert, GL_MESA_ycbcr_texture, GL_MESA_window_pos, 
    GL_NV_blend_square, GL_NV_light_max_exponent, GL_NV_texture_rectangle, 
    GL_NV_texgen_reflection, GL_NV_vertex_program, GL_NV_vertex_program1_1, 
    GL_OES_read_format, GL_SGI_color_matrix, GL_SGI_color_table, 
    GL_SGIS_generate_mipmap, GL_SGIS_texture_border_clamp, 
    GL_SGIS_texture_edge_clamp, GL_SGIS_texture_lod, GL_SUN_multi_draw_arrays

Vis  Vis   Visual Trans  buff lev render DB ste  r   g   b   a  aux dep ste  accum buffers  MS   MS
 ID Depth   Type  parent size el   type     reo sz  sz  sz  sz  buf th  ncl  r   g   b   a  num bufs
----------------------------------------------------------------------------------------------------
0x23 16 TrueColor    0     16  0  rgba   1   0   5   6   5   0   0   16  0   0   0   0   0   0   0
0x24 16 TrueColor    0     16  0  rgba   0   0   5   6   5   0   0   16  0   0   0   0   0   0   0
0x25 16 TrueColor    0     16  0  rgba   1   0   5   6   5   0   0   16  8   0   0   0   0   0   0
0x26 16 TrueColor    0     16  0  rgba   0   0   5   6   5   0   0   16  8   0   0   0   0   0   0
0x27 16 TrueColor    0     16  0  rgba   1   0   5   6   5   0   0   16  0  16  16  16   0   0   0
0x28 16 TrueColor    0     16  0  rgba   0   0   5   6   5   0   0   16  0  16  16  16   0   0   0
0x29 16 TrueColor    0     16  0  rgba   1   0   5   6   5   0   0   16  8  16  16  16   0   0   0
0x2a 16 TrueColor    0     16  0  rgba   0   0   5   6   5   0   0   16  8  16  16  16   0   0   0
0x2b 16 DirectColor  0     16  0  rgba   1   0   5   6   5   0   0   16  0   0   0   0   0   0   0
0x2c 16 DirectColor  0     16  0  rgba   0   0   5   6   5   0   0   16  0   0   0   0   0   0   0
0x2d 16 DirectColor  0     16  0  rgba   1   0   5   6   5   0   0   16  8   0   0   0   0   0   0
0x2e 16 DirectColor  0     16  0  rgba   0   0   5   6   5   0   0   16  8   0   0   0   0   0   0
0x2f 16 DirectColor  0     16  0  rgba   1   0   5   6   5   0   0   16  0  16  16  16   0   0   0
0x30 16 DirectColor  0     16  0  rgba   0   0   5   6   5   0   0   16  0  16  16  16   0   0   0
0x31 16 DirectColor  0     16  0  rgba   1   0   5   6   5   0   0   16  8  16  16  16   0   0   0
0x32 16 DirectColor  0     16  0  rgba   0   0   5   6   5   0   0   16  8  16  16  16   0   0   0
0x83 32 TrueColor    0     32  0  rgba   0   0   8   8   8   8   0    0  0   0   0   0   0   0   0