Separate rendering thread for better performance?

Nighthawk · January 8, 2007, 7:29am

Hi,

I am currently redesigning the rendering engine of a medical simulator.

The target platform has a multi-core CPU. We need to reach 30 fps while leaving as much time as possible for the simulation.

Currently the main loop looks like this:

simulate world(time variable)
render world(15ms)
swap buffers(2ms)

I would like to render while the simulation is running, so the main loop would look like this:

simulate world
copy world
wait for last rendering to finish
start in separate thread: render the world copy and swap buffers
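
The proposed loop can be sketched in C++ roughly as follows. This is only a minimal sketch under stated assumptions: `World`, `simulate`, `render_and_swap` and `run_pipelined` are hypothetical stand-ins, not the simulator's real code, and `std::async` is used for brevity. A real OpenGL renderer would more likely keep one persistent rendering thread, since a GL context can be current on only one thread at a time.

```cpp
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the simulator's own types and functions.
struct World {
    std::vector<float> vertices;
    int step = 0;
};

// Placeholder simulation step: advances the step counter and moves the vertices.
void simulate(World& w) {
    ++w.step;
    for (float& v : w.vertices) v += 1.0f;
}

// Placeholder for the real draw calls plus the buffer swap.
// Returns the simulation step that was drawn.
int render_and_swap(const World& snapshot) {
    return snapshot.step;
}

// Runs `frames` iterations of the pipelined loop and returns the
// simulation steps that were rendered, in order.
std::vector<int> run_pipelined(World& world, int frames) {
    std::vector<int> rendered;
    std::future<int> pending;  // rendering of the previous frame, if any
    for (int i = 0; i < frames; ++i) {
        simulate(world);         // overlaps with rendering of the previous frame
        World snapshot = world;  // copy world
        if (pending.valid()) {
            rendered.push_back(pending.get());  // wait for last rendering to finish
        }
        // Start rendering the copy in a separate thread.
        pending = std::async(std::launch::async,
                             [snap = std::move(snapshot)] { return render_and_swap(snap); });
    }
    if (pending.valid()) rendered.push_back(pending.get());  // drain the last frame
    return rendered;
}
```

With this layout the simulation of frame N+1 runs while frame N is being drawn, so the displayed image lags the simulation by one frame.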

Will this approach improve performance? Has anyone tried something similar?

Current drivers from ATI/Nvidia are supposed to use multithreading internally. How well does it work in practice?

Thanks for your comments…

zeoverlord · January 8, 2007, 9:45am

Yes, it can improve performance; how much depends a little on how fast the simulation is.

Nighthawk · January 8, 2007, 10:47am

The simulation will take as long as possible;-)
So currently up to 15ms per frame (to reach the 30fps).

If more time is available for simulation, we increase the simulation quality (mesh resolution etc.).

yooyo · January 8, 2007, 2:29pm

Yes… use threads to render the calculated data at a higher frame rate. For example, the simulation may run at 10fps while rendering runs at 60fps. This means the app will render the same frame 6 times, but you get the chance to change the camera or, more generally, give better UI feedback. Something like:

simulation thread:
while (!bQuit)
{
 simulate();
 lock_exchange_buffer();
 copy_solution_to_exchange_buffer();
 unlock_exchange_buffer();
}

rendering thread:
while (!bQuit)
{
 update_camera_etc();
 lock_exchange_buffer();
 render_solution_from_exchange_buffer();
 unlock_exchange_buffer();
 SwapBuffers();
}
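
One way to sketch the exchange buffer in C++ is shown below. It is an illustration rather than a definitive implementation: the class `ExchangeBuffer` and its `publish`/`fetch` methods are hypothetical names. It is also a variation on the loops above, because the rendering thread copies the solution out under the lock and then draws from its private copy, so the lock is not held for the duration of the draw calls.

```cpp
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical exchange buffer shared by the simulation and rendering threads.
class ExchangeBuffer {
public:
    // Simulation thread: hand over a finished solution.
    void publish(std::vector<float> solution) {
        std::lock_guard<std::mutex> lock(mutex_);
        data_ = std::move(solution);
        ++version_;
    }

    // Rendering thread: copy the latest solution into `out`.
    // Returns a counter that increases each time a new solution is published,
    // so the caller can tell whether the data changed since the last fetch.
    unsigned long fetch(std::vector<float>& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        out = data_;
        return version_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<float> data_;
    unsigned long version_ = 0;
};
```

The rendering loop can call `fetch` every frame and keep redrawing the same data with an updated camera until the returned counter changes.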

jide · January 8, 2007, 3:11pm

It can or it cannot.

This depends on many factors. Mainly, how the synchronization between threads is done will affect the result. In tests I made years ago (on a single CPU only), rendering was almost twice as slow.

And this will depend on how your sim works. If the whole sim loop needs to finish before you can start rendering (because you can’t know everything to draw until the end of the loop, for example), then the rendering thread will spend much of its time waiting for the sim thread (this might not be a real problem in your situation). But if the rendering thread can start rendering as soon as rendering data is provided, then less time will be lost.
This is all about balance: threads should not spend time waiting for each other, and one thread should not run much faster than the other (mainly, the sim thread must not provide data too fast).

What comes to mind is that you could take a ‘snapshot’ of your sim data from your rendering thread as soon as it has finished the previous rendering. But the results will depend on how fast you can take those snapshots. And you might not be able to do it, depending on how your sim works.

Here is a thread that discusses this:

http://www.opengl.org/discussion_boards/ubb/ultimatebb.php?ubb=get_topic;f=3;t=014902#000016

Hope that helps.

Nighthawk · January 9, 2007, 8:11am

@yooyo:
We are considering this approach for a future project where we need a guaranteed 60fps.

In the current case it makes little sense since our camera is mostly fixed;-)

It is a view through a microscope, with tissue manipulated by instruments to give you an idea.
(Should have said that earlier, sorry)

@jide:
Twice as slow, ouch.

Just an idea:
Did you try to increase rendering thread priority?
Maybe this helps to keep the GPU fed with rendering commands. Most of the time, the rendering thread should be idle waiting for the GPU to finish drawing etc.

I don’t think the “snapshot” idea would work well here. The sim should have finished before rendering starts, since most geometry is dynamic (with triangles being added/deleted).

Anyway, thanks for the link… My idea was the same as in the Apple doc, but at application level instead of driver level.

I see one problem with driver-level multithreading: you get no feedback on whether the driver has to wait for a particular OpenGL command or not. For example, glReadPixels() will wait for all pending rendering to finish before returning. But what about making a pbuffer current or uploading a texture? In theory it is possible to do the real work in another thread, but does it really happen?

I think I will try the application rendering thread. Another question:
When copying the world for the rendering thread, would you
a.) upload the vertex data directly to the VBOs (blocks the sim thread longer)
b.) copy it to another memory location and upload it in the rendering thread? (more overhead)

Thanks for your input so far…

jide · January 9, 2007, 10:20am

If your sim needs to finish its work before rendering can start, then I’d suggest that the sim thread send a message to the rendering thread, telling it that it can render now. This is the best choice from what I know: a message that wakes up the rendering thread, which then sleeps again once all rendering calls have been made.
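
Such a wake-up message could be sketched with a condition variable, as below. The names `FrameSignal`, `notify_frame_ready`, `wait_for_frame` and `demo_one_frame` are hypothetical, and the sketch covers only the wake-up itself, not the hand-over of the data to draw.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-way "you can render now" signal from sim to renderer.
class FrameSignal {
public:
    // Sim thread: mark a frame as ready and wake the rendering thread.
    void notify_frame_ready() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ready_ = true;
        }
        cv_.notify_one();
    }

    // Rendering thread: sleep until a frame is ready, then consume the flag.
    void wait_for_frame() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return ready_; });  // predicate guards against spurious wake-ups
        ready_ = false;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool ready_ = false;
};

// Small usage example: the renderer sleeps until the sim thread signals once.
// Returns 1 if the "frame" was drawn.
int demo_one_frame() {
    FrameSignal signal;
    int drawn = 0;
    std::thread renderer([&] {
        signal.wait_for_frame();
        drawn = 1;  // placeholder for the real rendering calls
    });
    signal.notify_frame_ready();  // sim thread has finished its work
    renderer.join();
    return drawn;
}
```

Because the flag is set under the mutex, the signal is not lost if the sim thread notifies before the rendering thread has started waiting.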

I was under Linux, and still am. Increasing the priority wouldn’t help since X runs at low priority (the default user one). I tried to run X at real-time priority but faced several issues. I haven’t tried it since. I’m just waiting to have an SMP machine of my own.

The results I had were to be expected since I was on a single CPU, so only a portion of the work was done before the thread was switched. The threads didn’t have time to finish before the OS scheduler switched them out.


If the driver has to wait for a command to finish, then the command won’t return until the GPU finishes its work. Otherwise, commands stay pending until some action tells the driver to complete the work.
Parallelism is quite special, especially between GPU and CPU.

Your last question is just what I was talking about in my previous post. What I suggest is simply that the sim thread write the updated data to a new memory location (keeping the initial data unaltered). Obviously (hopefully, in fact), the rendering thread will have less work to do than the sim thread (you’ll have to check that point), so the rendering thread will be free to use the data as soon as the sim thread has finished.
In the thread I linked, one guy said it’s possible to do that work without any synchronization at all. But in my view this really depends on how fast the threads are running. If your rendering thread can process the data as fast as or faster than the sim thread updates it, then this will help greatly. But if it is the other way around, then you’ll get bad things in the rendering.
So using dynamic VBOs might help.

For you, I guess this depends on how good the simulation needs to be. If the rendering only shows how the sim behaves, then don’t block the sim thread. Otherwise, block it.

Hope that helps.