Multi-threading render?



Exoide
01-18-2010, 11:09 AM
Hi fellows!

I want to know if it's possible to render a frame using two parallel threads.

In my project I'm not using display lists because of the memory they consume given the number of objects I'm rendering (about 99500, with a total of 1350000 vertices), so I keep a copy of the vertices once they're transformed and use them to create the frame. Then I render the objects one by one, iterating over the array of objects.

It's OK so far, but I want to increase the rendering speed if possible, and I think threads are the last chance I have.

What I have in mind is creating two threads, each starting at one half of the object array, so the render can be faster.

I've heard that Microsoft incorporated something like this in DirectX 11.

Now (in case the answer to the first question is yes) the question is:

Is the amount of work needed to achieve it justified by the increase in rendering speed?


Thank you.

Rosario Leonardi
01-18-2010, 01:06 PM
On Windows you can share objects between OpenGL contexts and use one context per thread, but you can't share the framebuffer, framebuffer objects or pbuffers. So the answer is no; also, the driver is typically single-threaded, so this won't give you any advantage.

DirectX can record commands into a "display list" and then execute these lists on the main rendering thread.

Is your problem that your application is CPU limited?
Sending 99500 draw commands is not a good idea. Did you apply frustum culling first (you can do it on another thread)? You can also pack all the objects with the same shader parameters into a single render call.

Exoide
01-18-2010, 01:26 PM
Hi Rosario,


On Windows you can share objects between OpenGL contexts and use one context per thread, but you can't share the framebuffer, framebuffer objects or pbuffers. So the answer is no; also, the driver is typically single-threaded, so this won't give you any advantage.


OK. Then I'll abort the idea of multi-threading.


Is your problem that your application is CPU limited?

No. Actually my application is single-threaded, so CPU time is not a limitation so far.

My problem (and maybe it is not a problem) is when I render the scene.


Sending 99500 draw commands is not a good idea

I'm not sending 99500 commands; I'm sending 99500 objects, which comes to a little more than 1.35 million commands, and it takes about 4 seconds to render.

What I want is to reduce that time to around 0.5 seconds.


Did you apply frustum culling first (you can do it on another thread)? You can also pack all the objects with the same shader parameters into a single render call.

No, I'm not applying frustum culling because I don't know if it makes sense: I'm drawing a viewport and I need to render the model every time the user pans it, and I don't know what the user's next move in the panning command will be.

Do you have any idea how to boost rendering performance in this case?

Thank you for your help.

aqnuep
01-18-2010, 04:20 PM
On Windows you can share objects between OpenGL contexts and use one context per thread, but you can't share the framebuffer, framebuffer objects or pbuffers.

Yes, you cannot share the framebuffer, but you can share textures attached to a framebuffer object, so in theory it is possible.


So the answer is no; also, the driver is typically single-threaded, so this won't give you any advantage.

It can actually provide some benefit sometimes, but in this particular use case I don't think so. For example, you could render shadow maps in another thread, which might improve performance.

You would do better to reduce the number of batches by using some more advanced techniques.

Things that might provide you better performance if used wisely:
GL_EXT_multi_draw_arrays
GL_ARB_vertex_array_object
GL_EXT_texture_array
etc.

If you give me more details, maybe I can give you better advice.

Rosario Leonardi
01-19-2010, 01:54 AM
.... with a total of 1350000 vertices)..

....is a total of a little more than 1.350 million's commands...
Ehm.. you are not using immediate mode, are you? I hope not.


No, I'm not applying frustum culling because I don't know if it makes sense: I'm drawing a viewport and I need to render the model every time the user pans it, and I don't know what the user's next move in the panning command will be.

You don't have to care about the NEXT pan command; frustum culling is done for each frame, once you already know the camera pose.
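
A minimal sketch of that per-frame test, for illustration only. It assumes each object has a bounding sphere computed once at load time and that the view volume is given as six planes with normals pointing inwards; the names Plane, Sphere, isOutsideFrustum, collectVisible and makeBoxFrustum are hypothetical and do not come from the thread. The box-shaped volume corresponds to an orthographic viewport; a perspective camera would need its six planes derived from the projection matrix instead.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Plane a*x + b*y + c*z + d = 0 with a unit normal (a, b, c) pointing
// towards the inside of the view volume.
struct Plane { float a, b, c, d; };

// Hypothetical per-object bounding sphere, computed once at load time.
struct Sphere { float x, y, z, radius; };

// True when the sphere lies completely behind at least one plane.
bool isOutsideFrustum(const Sphere& s, const std::array<Plane, 6>& frustum) {
    for (const Plane& p : frustum) {
        const float dist = p.a * s.x + p.b * s.y + p.c * s.z + p.d;
        if (dist < -s.radius) return true;
    }
    return false;
}

// Runs once per frame, after the camera pose for that frame is known.
// Returns the indices of the objects that still have to be drawn.
std::vector<std::size_t> collectVisible(const std::vector<Sphere>& bounds,
                                        const std::array<Plane, 6>& frustum) {
    std::vector<std::size_t> visible;
    visible.reserve(bounds.size());
    for (std::size_t i = 0; i < bounds.size(); ++i) {
        if (!isOutsideFrustum(bounds[i], frustum)) visible.push_back(i);
    }
    return visible;
}

// Axis-aligned box view volume, as used by an orthographic viewport.
std::array<Plane, 6> makeBoxFrustum(float left, float right, float bottom,
                                    float top, float zMin, float zMax) {
    assert(left < right && bottom < top && zMin < zMax);
    return {{
        { 1.0f, 0.0f, 0.0f, -left},   {-1.0f, 0.0f, 0.0f, right},
        { 0.0f, 1.0f, 0.0f, -bottom}, { 0.0f, -1.0f, 0.0f, top},
        { 0.0f, 0.0f, 1.0f, -zMin},   { 0.0f, 0.0f, -1.0f, zMax},
    }};
}
```

Because the test only reads the bounding spheres and the planes, it does not touch OpenGL at all, which is why it can run on another thread while the main thread keeps the GL context.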

Brolingstanz
01-19-2010, 03:04 AM
Intel's TBB (Threading Building Blocks) project may be worth a look.

Exoide
01-19-2010, 06:01 AM
Hi aqnuep,


Yes, you cannot share the framebuffer, but you can share textures attached to a framebuffer object, so in theory it is possible.

I was thinking yesterday about what you and Rosario said and I think it's better to render in the main thread.


You would do better to reduce the number of batches by using some more advanced techniques.

Things that might provide you better performance if used wisely:
GL_EXT_multi_draw_arrays
GL_ARB_vertex_array_object
GL_EXT_texture_array
etc.

Of course, if these extensions let me reduce the number of batches I'll use them. I'll read about them one by one in detail to see how I can integrate them into my viewport.


If you give me more details, maybe I can give you better advice.

Tell me what else you need to know to help me. I'm a newbie in OpenGL and I don't know what would be useful to you.

Thank you for your help.

Exoide
01-19-2010, 06:21 AM
Hi Rosario,


Ehm.. you are not using immediate mode, are you? I hope not.

Yes, I am. I tried display lists before but, as I said, with this many objects they are not convenient: the app uses 3 times the amount of memory used by immediate mode, in fact I'm talking about 620 MB of RAM. So I tried something different. I apply all the transformations to every object up front and keep a copy of each object's transformed vertices, so I don't need glPushMatrix() or glPopMatrix(); the model-view matrix is always the identity. Incredibly, it works almost as quickly as display lists: the difference (comparing times with a profiler) is about 0.5 seconds. Weighing memory use against speed, I chose immediate mode, which only emits glVertex3dv() commands to the driver because no transformations are needed to render the objects.
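
A minimal sketch of the pre-transform step described above, under stated assumptions: vertices are packed as x, y, z doubles (matching the glVertex3dv() calls), the model matrix is affine and stored column-major the way OpenGL stores matrices, and the names Mat4 and bakeTransform are hypothetical rather than taken from the actual project.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major 4x4 matrix, the layout OpenGL uses.
using Mat4 = std::array<double, 16>;

// Applies an affine model matrix to a packed x, y, z array and returns the
// transformed copy, so the object can later be drawn with an identity
// model-view matrix. Assumes w = 1 and a bottom row of (0, 0, 0, 1).
std::vector<double> bakeTransform(const Mat4& m, const std::vector<double>& xyz) {
    assert(xyz.size() % 3 == 0);
    std::vector<double> out(xyz.size());
    for (std::size_t i = 0; i < xyz.size(); i += 3) {
        const double x = xyz[i], y = xyz[i + 1], z = xyz[i + 2];
        out[i]     = m[0] * x + m[4] * y + m[8]  * z + m[12];
        out[i + 1] = m[1] * x + m[5] * y + m[9]  * z + m[13];
        out[i + 2] = m[2] * x + m[6] * y + m[10] * z + m[14];
    }
    return out;
}
```

The baked array is exactly the kind of data that can be uploaded once into a buffer object instead of being re-sent vertex by vertex every frame.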


You don't have to care about the NEXT pan command; frustum culling is done for each frame, once you already know the camera pose.

Do you mean using frustum culling for the current frame? Something like keeping a separate list of the objects that will be rendered in the current frame?

If so, it sounds like a good idea, because it would speed up operations like drawing drastically.

Thank you very much for your help.

Exoide
01-19-2010, 06:27 AM
Hi Brolingstanz,


Intel's TBB (Threading Building Blocks) project may be worth a look.

It sounds good too, but I'm currently doing the project in C#. I'll check whether there's a port of the library for the .NET framework.

Thank you for your tip.

Rosario Leonardi
01-19-2010, 07:46 AM
With immediate mode your video card is basically sleeping while it waits for the glEnd(). This is what I mean by CPU limited.
Let's recap..
You have a lot of objects with few polygons per object, for a total of more than 1 million vertices (not very much for a modern GPU).
You have computed the final/world coordinates of the vertices and you are sending the triangles like this:


glBegin(GL_TRIANGLES);
// one glVertex call per index: polyCount * 3 driver calls for each object
for (int i = 0; i < polyCount * 3; i++)
    glVertex3fv(vertexList[polyList[i]]);
glEnd();

where vertexList is an array of about 1 million vertices x 3 floats.
First solution: try display lists again.
A display list like that will take
sizeof(float) * 3 * numVertex plus a few bytes for the glBegin/glEnd encoding; this should be 12 * numVertex, about 16 MB.
620 MB is too much; I have rendered larger meshes with display lists without problems.
Note: for Chuck Norris's sake, never use GL_COMPILE_AND_EXECUTE when you create a new display list.
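
The estimate above can be checked with a few lines of arithmetic. This is only a sketch of the raw vertex payload, assuming tightly packed x, y, z components and ignoring whatever bookkeeping the driver adds; the helper name vertexPayloadBytes is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Raw size of a tightly packed x, y, z vertex array, in bytes.
// Driver-side overhead (command encoding, alignment) is not included.
inline std::size_t vertexPayloadBytes(std::size_t vertexCount,
                                      std::size_t componentSize) {
    assert(componentSize > 0);
    return vertexCount * 3 * componentSize;
}

// With the 1350000 vertices mentioned in the first post:
//   vertexPayloadBytes(1350000, sizeof(float))  is 16200000 bytes, about 16 MB
//   vertexPayloadBytes(1350000, sizeof(double)) is 32400000 bytes, about 32 MB
// (on platforms where float is 4 bytes and double is 8 bytes)
```

The double figure is relevant here because the vertices are currently sent with glVertex3dv(); either way the raw data is far below 620 MB.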
Second solution: use VBO!! (or VAO)
Read this:
http://www.songho.ca/opengl/gl_vbo.html
Then.. place all your vertices in a VBO, place the indices in another VBO and render like this:


frustumCulling(objectList);
bind and set up the VBOs // the tutorial explains how
for each (o in objectList)
{
    // skip objects that are completely outside the view volume
    if (!o.isOutsideFrustum())
        glDrawElements(GL_TRIANGLES, o.count, GL_UNSIGNED_INT, o.bufferOffset);
}
unbind the VBOs // if you want

I hope the pseudocode is clear. In this case you have the commands to set up the VBOs (4~10 commands) and then 69K glDrawElements calls.

In this case you can also use glMultiDrawElements, as aqnuep suggested.
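
A rough sketch of how the visible objects could be gathered for a single glMultiDrawElements call, for illustration only. The DrawRange and MultiDrawBatch structures and the buildBatch function are hypothetical; the sketch assumes all objects share one index buffer of 32-bit indices and that the culling pass has already filled in the visible flag. The code only prepares the two arrays on the CPU, so it needs no GL context.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-object record describing where the object's indices
// live inside the shared index buffer.
struct DrawRange {
    std::int32_t indexCount;  // number of indices, 3 per triangle
    std::size_t  firstIndex;  // position of the first index, in elements
    bool         visible;     // result of this frame's frustum test
};

// The two parallel arrays that glMultiDrawElements expects.
struct MultiDrawBatch {
    std::vector<std::int32_t> counts;       // one index count per draw
    std::vector<const void*>  byteOffsets;  // one buffer offset per draw
};

MultiDrawBatch buildBatch(const std::vector<DrawRange>& objects) {
    MultiDrawBatch batch;
    for (const DrawRange& o : objects) {
        if (!o.visible) continue;
        assert(o.indexCount > 0);
        batch.counts.push_back(o.indexCount);
        // With an element array buffer bound, each "pointer" is really a
        // byte offset into that buffer.
        const std::uintptr_t offset = o.firstIndex * sizeof(std::uint32_t);
        batch.byteOffsets.push_back(reinterpret_cast<const void*>(offset));
    }
    return batch;
}

// With the VBOs bound, the draw would then look roughly like this
// (older headers may declare the pointer argument slightly differently):
//   glMultiDrawElements(GL_TRIANGLES, batch.counts.data(), GL_UNSIGNED_INT,
//                       batch.byteOffsets.data(),
//                       static_cast<GLsizei>(batch.counts.size()));
```

This replaces the per-object loop of glDrawElements calls with one call per frame, at the cost of rebuilding two small arrays whenever the set of visible objects changes.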

Exoide
01-19-2010, 08:45 AM
Hi Rosario,

I understood your idea, and the tutorial is very interesting; I'll read it carefully.

Thank you once again for your help and your time.