
High Poly Object rendered very slow



miujin
04-26-2011, 02:01 AM
Hi,

I'm trying to write a viewer for 3D scan data, but when I load a high-resolution scan, rendering is very slow.

I'm using one VBO for the complete scan, one FBO with multiple render targets (2 at the moment), and shaders for lighting and picking. I don't know where the bottleneck is. Or is it simply not possible to render 2 million triangles in real time without optimization techniques such as level-of-detail objects or special storage structures in the VBO (fans/strips)?

My question is: what are the main bottlenecks, so that I can check my program for them? Or is there a special OpenGL state which I can activate with glEnable(GL_FAST_RENDERING) :-D

From what I've read, I think the problem could be:
- no level-of-detail objects
- too much data for one VBO
- too many API calls when using the VBO
- a bad structure in the vertex array

Before I change too much code, I'd like your opinion on whether I'm on the right track or whether I've missed an important step.

ZbuffeR
04-26-2011, 02:19 AM
Have you read this : http://www.opengl.org/wiki/Performance ?
The reduce-the-size-of-window test is simple to do and will help find the potential bottlenecks.

By the way, 2 million * 60 per second is quite a lot. What hardware do you have, and how many milliseconds per frame does it currently take to render?

miujin
04-26-2011, 03:04 AM
I've read the article.

When I reduce the size of the window there is no performance boost.

I know that 2 million triangles are a lot, but I can display the object in real time with programs like Geomagic. My test system has a GeForce GTX 275. A colleague has written a program with DirectX and XNA and has no problem displaying 2 million triangles.
For the 2 million triangle scan my program needs about 500 ms per frame.
I think either there is a huge bottleneck somewhere or XNA/Geomagic use some optimization techniques.

ZbuffeR
04-26-2011, 03:33 AM
Indeed such a card should be able to do 10x better.
Show your code :)
Did you try to look at your mesh with http://meshlab.sourceforge.net/ ?

miujin
04-26-2011, 04:36 AM
At the moment I'm working with a mesh of 1,280,000 triangles. My program renders it at 1-2 fps; MeshLab renders it at 7.5 fps.

Here are the important parts of my code:
code (http://codepad.org/JG8taZ4r)

And here are my shader programs:
shader (http://codepad.org/MHVRFN6X)

_arts_
04-26-2011, 04:48 AM
Why do you enable/disable attrib arrays for each VBO?

Did you try using display lists to compare the rendering time?

Also, might it be slow because you're using Java?

miujin
04-26-2011, 04:58 AM
> Why do you enable/disable attrib arrays for each VBO?

Because I want every VBO to be able to use different attrib arrays; for example, one VBO uses only the vertex and color arrays and another uses the normal array (I left that out of the posted code because it would make it too complex).

> Did you try using display lists to compare the rendering time?

I've tried display lists, but the result wasn't any better.

> Also, might it be slow because you're using Java?

I'm using C#, but I don't think that using a VM language like Java or C# is the problem, because my colleague also uses C#.

_arts_
04-26-2011, 05:05 AM
So set the vertex pointer (and the other array pointers) each time after you bind a new VBO.

miujin
04-26-2011, 05:33 AM
???
Do you mean that I have to set the vertex, normal etc. pointers after

GL.BindBuffer(BufferTarget.ArrayBuffer, vbo.V_ID);

in the DrawVBO method?

I set all the pointers after creating the VBO in AddRenderObject; this happens at line 112.

mhagain
04-26-2011, 05:40 AM
Why not change your shaders to simple pass-through shaders and see if that has any effect on performance? At the very least it will help a lot with isolating potential causes of performance bottlenecks here.

Agreed that the EnableClientState/DisableClientState calls should go outside the for loop, but I don't think they're going to be a huge issue here (although that largely depends on how many times you're going through the loop).

I'm a little suspicious about that "if" in the fragment shader too; that could be moved to the vertex shader or possibly removed altogether which wouldn't hurt either way. Your IAmbient calculation could also move to the vertex shader or preferably become a uniform - that's unnecessary overhead.

_arts_
04-26-2011, 05:44 AM
> ???
> Do you mean that I have to set the vertex, normal etc. pointers after
>
> GL.BindBuffer(BufferTarget.ArrayBuffer, vbo.V_ID);
>
> in the DrawVBO method?
>
> I set all the pointers after creating the VBO in AddRenderObject; this happens at line 112.

Yes, definitely. The vertex pointer and the other array pointers stay attached to the VBO that was bound when they were set. Doing it the way you do results in unknown behaviour:



The array's buffer binding is set when the array pointer is specified. Using the vertex array as an example, this is when VertexPointer is called. At that time, the current array buffer binding is used for the vertex array. The current array buffer binding is set by calling BindBufferARB with a <target> of ARRAY_BUFFER_ARB. Changing the current array buffer binding does not affect the bindings used by already established arrays.

BindBufferARB(ARRAY_BUFFER_ARB, 1);
VertexPointer(...); // vertex array data points to buffer 1
BindBufferARB(ARRAY_BUFFER_ARB, 2); // vertex array data still points to buffer 1


cf: http://www.opengl.org/registry/specs/ARB/vertex_buffer_object.txt
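
To make the quoted rule concrete, here is a toy model of it in Python. It is not real OpenGL: the class ToyGL and its method names are made up for illustration and only mimic the two pieces of state the spec excerpt talks about (the current array buffer binding, and the buffer captured when the pointer is specified).

```python
# Toy model of the state described in the spec excerpt; this is NOT real OpenGL.
class ToyGL:
    def __init__(self):
        self.array_buffer = 0         # current ARRAY_BUFFER binding
        self.vertex_array_buffer = 0  # buffer captured by the last vertex_pointer()

    def bind_buffer(self, buffer_id):
        # Changing the binding does not touch already established arrays.
        self.array_buffer = buffer_id

    def vertex_pointer(self):
        # The vertex array captures whatever buffer is bound right now.
        self.vertex_array_buffer = self.array_buffer

    def draw_arrays(self):
        # Drawing reads from the captured buffer, not from the current binding.
        return self.vertex_array_buffer


gl = ToyGL()
gl.bind_buffer(1)
gl.vertex_pointer()
gl.bind_buffer(2)
source_without_respecify = gl.draw_arrays()  # still reads buffer 1

gl.vertex_pointer()                          # pointer set again after the bind
source_after_respecify = gl.draw_arrays()    # now reads buffer 2
```

In this model, binding a second VBO without calling the pointer function again keeps drawing from the first one, which is why the pointers have to be set after each bind when several VBOs are drawn.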

McLeary
04-26-2011, 11:13 AM
> Why not change your shaders to simple pass-through shaders and see if that has any effect on performance? At the very least it will help a lot with isolating potential causes of performance bottlenecks here.

My suggestion is to actually disable the shaders and see what happens.
Another test I would try is to use a small mesh for test purposes, like the Stanford Bunny or Dragon. Another thing is to disable vertical sync in order to let OpenGL render as many frames as it can. Even with such a low frame rate, I would try disabling vsync to be sure it is not a bottleneck.

miujin
04-27-2011, 02:03 AM
When I disable the shaders, the program runs a little faster, but not fast enough. The scene still stutters.
For testing I'm using some smaller meshes (40,000-400,000 triangles), and with those the program is fast enough.
Disabling VSync also has no effect.

dorbie
04-27-2011, 04:39 PM
You're sending full floats for everything, including RGBA twice!

There is no indication that you're using indices in any way. You're just using DrawArrays.

If this is a poly soup model (triangles, not tristrips) you're even worse off because the primitive type is wasteful. In your case it's data dependent (that's a bad thing, because some data will be really slow in your software and faster in better software).

Tristrip will get you 3X performance, indexes exploiting cache coherence can easily double that or more on big meshes. Not sending a whole load of data per vertex you don't need as full float will give you a further possible bandwidth boost but it depends at that time where you're bottlenecked.

Currently you have very basic "get it on the screen" code. You can easily get 3x - 6X (or more) the performance through indexed cache coherent tristrips and data packing improvements.

Index it, rationalize the verts and send it through nvtristrip then re-sort and re-index for VBO access order and render it with drawelements. Then clean up your packed vertex data to reduce vertex in-memory size.
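
The "index it" step can be sketched roughly as follows. This is a minimal illustration in Python rather than the poster's C# code; the function name build_index_buffer is hypothetical, and it assumes that duplicate vertices are bit-identical (for real scan data you may need a tolerance, and every attribute, not just the position, has to match before two vertices can be merged).

```python
def build_index_buffer(triangle_soup):
    """Turn a flat vertex list (3 entries per triangle) into unique vertices plus indices."""
    unique_vertices = []
    index_of = {}   # vertex tuple -> position in unique_vertices
    indices = []
    for vertex in triangle_soup:
        key = tuple(vertex)
        if key not in index_of:
            index_of[key] = len(unique_vertices)
            unique_vertices.append(key)
        indices.append(index_of[key])
    return unique_vertices, indices


# Two triangles forming a quad; they share the edge (0,0,0)-(1,1,0).
soup = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0),
    (0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0),
]
vertices, indices = build_index_buffer(soup)
```

The six soup vertices collapse to four unique ones, and the index list is what would be handed to DrawElements. Stripping and cache-friendly reordering are separate steps on top of this.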

dorbie
04-27-2011, 04:44 PM
Most of the suggestions in this thread miss the core problem with your graphics code. You WILL get major improvements implementing my suggestions. Fiddling around elsewhere will have only a marginal impact on performance. Unfortunately restructuring to incorporate an indexed tristripper takes a bit of work and incurs a startup cost for optimizing the data.

Let us know what your results are.

dorbie
04-27-2011, 04:52 PM
P.S. it has been observed that simply sorting indexed triangles to exploit cache residence can have a massive performance boost and may even be faster than tristripping due to fewer ordering constraints. You could index, sort for adjacency (vertex order cache residency) and sequencing and draw as indexed triangles. This is all the more likely to work for you because your per-vertex data payload is massive compared to the index overhead.

YMMV from platform to platform though depending on implementation, indexed tri strips may still be preferred on some targets.
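
A rough way to see the effect of triangle order on cache residence is to simulate a small vertex cache and count misses per triangle. The sketch below is in Python and makes several assumptions that are not from this thread: a FIFO cache of 16 entries (real hardware differs), a regular grid as the test mesh, and the hypothetical helper names acmr and grid_indices.

```python
import random
from collections import deque


def acmr(indices, cache_size=16):
    """Average cache miss ratio: vertex transforms per triangle with a FIFO cache."""
    cache = deque(maxlen=cache_size)
    misses = 0
    for index in indices:
        if index not in cache:
            misses += 1
            cache.append(index)  # the oldest entry drops out when the cache is full
    return misses / (len(indices) // 3)


def grid_indices(columns, rows):
    """Indexed triangles for a regular grid, emitted quad by quad, row by row."""
    indices = []
    for r in range(rows):
        for c in range(columns):
            a = r * (columns + 1) + c   # top-left corner of the quad
            b = a + 1                   # top-right
            d = a + columns + 1         # bottom-left
            e = d + 1                   # bottom-right
            indices += [a, b, e, a, e, d]
    return indices


ordered = grid_indices(64, 64)

# Same triangles, but in random order, so neighbours are rarely drawn together.
triangles = [ordered[i:i + 3] for i in range(0, len(ordered), 3)]
random.Random(0).shuffle(triangles)
shuffled = [index for triangle in triangles for index in triangle]

acmr_ordered = acmr(ordered)
acmr_shuffled = acmr(shuffled)
```

With this model the row-by-row order needs roughly one vertex transform per triangle, while the shuffled order approaches three, which is the same cost as non-indexed triangles. The absolute numbers depend on the assumed cache, so treat them as an illustration only.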

dorbie
04-27-2011, 05:08 PM
P.P.S. "BeginMode.Triangles" suggests you're drawing this as triangles (I first assumed this was data driven but it's probably a C++ wrapper definition of GL_TRIANGLES), so this confirms that the code is doing at least 3X the work it needs to and a lot worse as described above. All advice above still stands, this just reinforces the observation w.r.t. primitive type.
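
The "3X" figure follows from simple counting, sketched here in Python. The function name vertices_submitted is made up for illustration, and the strip case assumes the best case of one unbroken strip, which real meshes rarely allow.

```python
def vertices_submitted(triangle_count, mode):
    """Vertices that have to be fetched and transformed for one draw call."""
    if mode == "triangles":      # independent triangles with DrawArrays, nothing shared
        return 3 * triangle_count
    if mode == "single_strip":   # triangle strip, best case: one unbroken strip
        return triangle_count + 2
    raise ValueError(mode)


soup = vertices_submitted(1_280_000, "triangles")
strip = vertices_submitted(1_280_000, "single_strip")
ratio = soup / strip
```

For the 1,280,000 triangle mesh mentioned earlier, the ratio comes out just under 3. Since a real mesh breaks into many strips, 3X is an upper bound for stripping alone.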

miujin
04-28-2011, 05:16 AM
Thank you for the answer.
So the problem is the structure of my VBOs, which is what I had assumed, too. I'll now try to optimize the structure and tell you the result.

dorbie
04-28-2011, 11:55 AM
> Thank you for the answer.
> So the problem is the structure of my VBOs, which is what I had assumed, too. I'll now try to optimize the structure and tell you the result.

Not how I'd word it.

The problem is your primitive type with DrawArrays rather than DrawElements with cache coherent indices and all the implications that has for the way you need to structure your data to make the latter work. You will need to dispatch with DrawElements and tristrip with a cache coherent tri-stripper like nvtristrip. Secondarily, your per-vertex payload is unnecessarily large, storing 2 vec4 floats purely for color, i.e. 32 bytes per vertex just for color.

VBOs are simply a graphics storage mechanism that hints at graphics residence and/or non volatility to the driver after dispatch. Useful and important, but I think OK in your code.
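
To put numbers on the per-vertex payload, here is a small Python sketch using the standard struct module. Only the two float RGBA colors (32 bytes) come from this thread; the rest of the layout (3 floats for position, 3 for normal) and the idea of packing each color into four unsigned bytes are assumptions for illustration.

```python
import struct

# Assumed layout: position (3 floats), normal (3 floats), two RGBA colors.
FLOAT_COLORS = "<3f3f4f4f"  # both colors as four 32-bit floats each
BYTE_COLORS = "<3f3f4B4B"   # both colors as four unsigned bytes each


def buffer_bytes(vertex_format, vertex_count):
    """Size in bytes of a tightly packed vertex buffer."""
    return struct.calcsize(vertex_format) * vertex_count


color_bytes_float = struct.calcsize("<4f4f")   # the two float colors alone
color_bytes_packed = struct.calcsize("<4B4B")  # the same two colors as bytes

vertex_count = 3 * 1_280_000  # non-indexed triangles: three vertices each
size_float = buffer_bytes(FLOAT_COLORS, vertex_count)
size_packed = buffer_bytes(BYTE_COLORS, vertex_count)
```

Under these assumptions the colors shrink from 32 to 8 bytes per vertex, and the whole vertex from 56 to 32 bytes, before any savings from indexing are counted.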

miujin
04-29-2011, 04:26 AM
I've now implemented an algorithm that deletes duplicate vertices and creates an index buffer, and the program runs really fast. But now my picking doesn't work anymore. I'm using color picking, so I have to render every triangle in a unique color. I used the secondary color for this information, but now I can only hold one color per vertex. So how can I render every triangle in a unique color with an index array?
I could do it in the shader with some bit shifting, or with a 1D texture that holds the index coded as an RGBA value.
Or is there a better way?

Alfonse Reinheart
04-29-2011, 04:43 AM
> But now my picking doesn't work anymore. I'm using color picking, so I have to render every triangle in a unique color.

Then you need to have an array of colors that matches the array of positions, such that each triangle gets a separate color when rendered. Then turn off interpolation, so that each triangle uses only that color. This array would only be used for picking.

skynet
04-29-2011, 05:37 AM
> Then you need to have an array of colors that matches the array of positions, such that each triangle gets a separate color when rendered. Then turn off interpolation, so that each triangle uses only that color. This array would only be used for picking.

That would oppose his efforts to remove duplicate vertices, and it makes it _really_ awkward to create an index/vertex arrangement which makes sure that the provoking vertex of each triangle is not shared by other triangles.

Maybe using gl_PrimitiveID in the fragment shader would already help him?!
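
Whichever way the per-triangle ID reaches the fragment shader (gl_PrimitiveID where the GLSL version exposes it, or a lookup), the bit shifting mentioned above amounts to splitting the ID into four bytes and reassembling it after reading the pixel back. A minimal Python sketch of that arithmetic, assuming an 8-bit-per-channel RGBA render target and the hypothetical helper names id_to_rgba and rgba_to_id:

```python
def id_to_rgba(triangle_id):
    """Split a 32-bit ID into four 8-bit channels (R holds the lowest byte)."""
    if not 0 <= triangle_id < 2**32:
        raise ValueError("ID does not fit into 8-bit RGBA")
    return (triangle_id & 0xFF,
            (triangle_id >> 8) & 0xFF,
            (triangle_id >> 16) & 0xFF,
            (triangle_id >> 24) & 0xFF)


def rgba_to_id(rgba):
    """Reassemble the ID from a pixel read back from the picking buffer."""
    r, g, b, a = rgba
    return r | (g << 8) | (b << 16) | (a << 24)


last_triangle = 1_280_000 - 1
pixel = id_to_rgba(last_triangle)
picked = rgba_to_id(pixel)
```

One thing to watch in practice: ID 0 encodes as all zeros, which is indistinguishable from a buffer cleared to black, so an offset of 1 or a different clear color is a common workaround.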

dorbie
06-05-2011, 06:47 AM
Even with flat shading, when you use indexing that shares vertices, triangles will share a provoking vertex and therefore a color.

You could have two draw modes, one for rendering and one for picking.

The other way is to know the complete set of triangles provoked by each colored vertex, so that when you get a pick result you can narrow it down to a very few triangles and then draw only those triangles (with unique colors) to complete your pick.
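
The lookup for that second approach could be built once from the index buffer, roughly like this. It is a Python sketch with a made-up function name, and it assumes the provoking vertex is the last corner of each independent triangle; check which convention your setup actually uses.

```python
from collections import defaultdict


def triangles_by_provoking_vertex(indices, provoking_corner=2):
    """Map each vertex index to the triangles for which it is the provoking vertex."""
    candidates = defaultdict(list)
    for triangle_id in range(len(indices) // 3):
        vertex = indices[3 * triangle_id + provoking_corner]
        candidates[vertex].append(triangle_id)
    return candidates


# Three triangles; the first two end on vertex 2, so they get the same flat color.
indices = [0, 1, 2,
           3, 1, 2,
           0, 2, 3]
candidates = triangles_by_provoking_vertex(indices)
second_pass = candidates[2]  # triangles to redraw with unique colors after picking vertex 2
```

The first pick pass returns a vertex color; the second pass then only has to draw the handful of triangles listed for that vertex.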

RefleX
06-07-2011, 12:47 AM
Don't know if this has been mentioned but you can optimize your index buffer for the post transform cache (http://www.opengl.org/wiki/Post_Transform_Cache) using this library (http://code.google.com/p/vcacne/). I've also read that you can restrict your vertex buffer size and split it up to work best on certain hardware.