Drawing millions of objects



AndrewC
12-05-2010, 09:45 PM
Hey all,

I've just started coding in OpenGL, since I found that software rendering is just too slow for what I need.

Basically I'd like to render a scene with millions of objects; specifically, tiny quads. There could potentially be 10s or 100s of millions. With my initial software-only implementation, I ran into performance issues. I implemented a simple OpenGL-based renderer, but I noticed that:

1) Loading is even slower than the software implementation
2) There is little performance improvement when panning and zooming

I'm wondering if there are some basic improvements I can try to speed things up. All I'm doing now is reading the data into an array at startup, and in my paint routine, just looping through each point and plotting it in the scene. Here's my pseudocode:


Init function:
    Load data into memory

Render function:
    Clear screen
    Load identity matrix
    Perform transformations
    foreach point in collection
        set appropriate colour
        plot point using Vertex2
    Swap buffers

All the quads are usually quite sparse, although there may be cases when they are close together.
I'm using C# with OpenTK.

Does anyone have any suggestions on how I can speed things up?

Arkaid D.
12-05-2010, 11:36 PM
First: are these quads all the same (i.e. is each quad similar in shape to the others)? If so, consider creating a display list for a single quad and then calling it, instead of using immediate mode. ( http://nehe.gamedev.net/data/lessons/lesson.asp?lesson=12 )

Next, calculate the shape and position of the view frustum and clip out unnecessary quads. ( http://www.lighthouse3d.com/opengl/viewfrustum/index.php?intro )

You can also clip out quads that are too far away from the camera and would look too small / insignificant on screen.
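
For a flat 2D view, the frustum test reduces to a rectangle test. A minimal sketch in C, assuming the view is described by a centre point and a zoom factor in pixels per world unit (the names visible_rect and point_visible are made up for illustration and are not part of OpenGL):

```c
#include <stdbool.h>

/* Axis-aligned rectangle in world (data) coordinates. */
typedef struct { float min_x, min_y, max_x, max_y; } Rect;

/* World-space rectangle currently on screen. Assumes the view is centred
   on (center_x, center_y) and zoom is given in pixels per world unit. */
Rect visible_rect(float center_x, float center_y, float zoom,
                  int viewport_w, int viewport_h)
{
    float half_w = 0.5f * (float)viewport_w / zoom;
    float half_h = 0.5f * (float)viewport_h / zoom;
    Rect r = { center_x - half_w, center_y - half_h,
               center_x + half_w, center_y + half_h };
    return r;
}

/* True if (x, y) lies inside the rectangle grown by margin on every side.
   The margin keeps quads whose centre is just off screen. */
bool point_visible(Rect r, float x, float y, float margin)
{
    return x >= r.min_x - margin && x <= r.max_x + margin &&
           y >= r.min_y - margin && y <= r.max_y + margin;
}
```

Testing every point on the CPU each frame is itself expensive with millions of points, so in practice a test like this would more likely be applied to whole tiles or blocks of points than to individual ones.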

aqnuep
12-06-2010, 01:24 AM
I would recommend you not to use immediate mode (i.e. vertex2 and stuff). Rather put your mesh data to a VBO and possibly use geometry instancing. These could help a lot.
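
To make the VBO suggestion a bit more concrete: the data that goes into a vertex buffer is just a tightly packed array that is handed to the driver once. A rough sketch in C of building such an array on the CPU, assuming each input point carries a position and an index into a 256-entry RGB palette (SourcePoint, PackedVertex and pack_points are hypothetical names for illustration; the actual upload and draw calls are only mentioned in comments because they need a GL context):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical input record: position plus an index into a colour palette. */
typedef struct { float x, y; uint8_t palette_index; } SourcePoint;

/* Interleaved vertex as it would sit in a buffer: typically 12 bytes. */
typedef struct { float x, y; uint8_t r, g, b, a; } PackedVertex;

/* Expands palette indices to RGBA and returns a newly allocated array of
   count vertices (caller frees), or NULL if allocation fails.
   palette_rgb is assumed to hold 256 * 3 bytes laid out as R, G, B. */
PackedVertex *pack_points(const SourcePoint *pts, size_t count,
                          const uint8_t *palette_rgb)
{
    PackedVertex *out = malloc(count * sizeof *out);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < count; ++i) {
        const uint8_t *rgb = palette_rgb + 3 * (size_t)pts[i].palette_index;
        out[i].x = pts[i].x;
        out[i].y = pts[i].y;
        out[i].r = rgb[0];
        out[i].g = rgb[1];
        out[i].b = rgb[2];
        out[i].a = 255;  /* fully opaque */
    }
    /* With a GL context, this array would be uploaded once with
       glBufferData and then drawn each frame with a single glDrawArrays
       call, instead of one glColor/glVertex pair per point. */
    return out;
}
```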

AndrewC
12-06-2010, 07:41 PM
First: are these quads all the same (i.e. is each quad similar in shape to the others)? If so, consider creating a display list for a single quad and then calling it, instead of using immediate mode. ( http://nehe.gamedev.net/data/lessons/lesson.asp?lesson=12 )

Yes, all quads are identical in shape, but not colour. Looks like display lists are exactly what I need. So the basic premise is, I create the quad once, it gets stored in (the graphics card's?) memory, and whenever I wanna draw it, the card already knows how?



Next, calculate the shape and position of the view frustum and clip out unnecessary quads. ( http://www.lighthouse3d.com/opengl/viewfrustum/index.php?intro )

I'm drawing on a 2-dimensional surface, and they're all on the same surface, so there's never any chance that some will be clipped by the frustum unless the user pans/zooms.


You can also clip out quads that are too far away from the camera and would look too small / insignificant on screen.
All quads will be at a constant size regardless of zoom level.


I would recommend you not to use immediate mode (i.e. vertex2 and stuff). Rather put your mesh data to a VBO and possibly use geometry instancing. These could help a lot.

How is a VBO different from a display list? It sounds like the same thing-- objects are created in the graphics card's memory and called up when necessary.

Alfonse Reinheart
12-06-2010, 07:56 PM
Yes, all quads are identical in shape, but not colour. Looks like display lists are exactly what I need. So the basic premise is, I create the quad once, it gets stored in (the graphics card's?) memory, and whenever I wanna draw it, the card already knows how?

... Not the way you mean to do it.

If you're going to use a display list for this, then you put all of your quads in it. Not one. You do not want to have millions of glCallList calls with calls to glColor between them.

If your field of quads is static (neither the position nor the color of any of the quads changes), and no other material parameters (shader state, etc) changes between different quads, then you simply put them all in a single display list and call glCallList once.

AndrewC
12-06-2010, 08:39 PM
... Not the way you mean to do it.

If you're going to use a display list for this, then you put all of your quads in it. Not one. You do not want to have millions of glCallList calls with calls to glColor between them.

If your field of quads is static (neither the position nor the color of any of the quads changes), and no other material parameters (shader state, etc) changes between different quads, then you simply put them all in a single display list and call glCallList once.

Oh, damn.
The quads are all indeed static, but not the same colour (shade.) I'm using an 8-bit colour palette for a maximum of 256 colours, so I could make 256 display lists, one for each colour, right?

[edit]
Actually after re-reading that tutorial, it doesn't seem like I need to do that. I can just create a display list which contains the instructions on how to draw one quad, and then before calling the list just specify a colour.

Alfonse Reinheart
12-06-2010, 09:19 PM
The quads are all indeed static, but not the same colour (shade.)

What do you mean by that? The color of the quad is part of its data. The 4 vertex positions and 4 vertex colors make up its data. Does this data change? Does the color of a quad at a given position change?

If not, then just put all of the quads in a single display list.


Actually after re-reading that tutorial, it doesn't seem like I need to do that. I can just create a display list which contains the instructions on how to draw one quad, and then before calling the list just specify a colour.

This is specifically what you don't want to do.

AndrewC
12-07-2010, 06:12 AM
Ok, I put all the points in the display list as suggested.

Panning performance is now improved, but now zooming suffers. This is because every time I zoom, I have to rebuild the display list and space the quads farther apart or closer together, depending on how the user has zoomed.

In a test case with 3.7 million quads, it takes nearly 4 seconds to rebuild the list.

Memory usage is also much higher than direct mode (as expected) since in addition to locally storing all the points from the file, I also need to store the display list.

With one set of test data, in direct mode the application uses ~93MB. With a display list, it uses ~425MB, although usage peaked at ~700MB.

This seems awfully high; I wonder how point cloud viewing software works? It can display tens of millions of points or more with relative ease. Here's an example: http://www.youtube.com/watch?v=vwrJxBrni8M

With my meager 3.7 million point (quad) dataset, if I jump up to hundreds of millions like in that video, even the most powerful computer will choke.
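
As a back-of-the-envelope check on these numbers, the raw vertex data can be estimated directly. The sketch below assumes 12 bytes per vertex (two floats for position plus four colour bytes); that layout is an assumption, and what a driver actually stores for a display list is implementation-dependent and may well be larger. The function name vertex_data_bytes is made up for illustration:

```c
#include <stdint.h>

/* Raw size in bytes of the vertex data for a set of primitives.
   The caller supplies the assumed per-vertex size. */
uint64_t vertex_data_bytes(uint64_t primitive_count,
                           uint32_t vertices_per_primitive,
                           uint32_t bytes_per_vertex)
{
    return primitive_count * vertices_per_primitive * bytes_per_vertex;
}
```

Under that assumption, 3.7 million quads at 4 vertices each come to 177,600,000 bytes (about 178 MB), while the same data stored as one vertex per point is 44,400,000 bytes (about 44 MB). At 100 million points, one vertex each, the figure is 1,200,000,000 bytes (about 1.2 GB).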

ZbuffeR
12-07-2010, 07:49 AM
but now zooming suffers. This is because every time I zoom, I have to rebuild the display list and space the quads farther apart or closer together, depending on how the user has zoomed.

In a test case with 3.7 million quads, it takes nearly 4 seconds to rebuild the list.
No wonder it does. Why do you need to rebuild the list at all? Just change the GL_MODELVIEW_MATRIX, no need to rebuild.
Anyway, if you prefer having more control over the data optimization, and less compile time, go for a VBO.

mhagain
12-07-2010, 09:52 AM
I'm using an 8-bit colour palette for a maximum of 256 colours
Does this mean that you're running in color index mode? If so, you should be aware that it's very unlikely to be hardware accelerated, and you should run in RGBA mode instead.

_arts_
12-07-2010, 12:43 PM
Alfonse Reinheart, I don't agree. Maybe I'm wrong, but making a single display list with millions of quads is a big mess. Making a single display list that stores one quad, then rendering it billions of times and setting its colour before each call (inside one or several for loops), looks far better to me.

ZbuffeR
12-07-2010, 01:10 PM
Yes you are wrong. DL is interesting when doing few GL calls, keeping a maximum of stuff near the GPU or at least in the driver.
Doing a GL call for each quad is a complete waste.

mhagain
12-07-2010, 03:40 PM
Alfonse is very right here. Every OpenGL call incurs a CPU overhead, which may be small or may be large, depending on factors such as what the call is, how full the command buffer currently is, how good (or bad) your driver is, and so on. The fewer of these calls you make, the better your application will run. One call will always be better than even thousands of calls, never mind millions. Otherwise your program will be badly CPU-bound, your GPU will not be performing efficiently, and you will never get good performance out of your program.

AndrewC
12-07-2010, 06:03 PM
No wonder it does. Why do you need to rebuild the list at all? Just change the GL_MODELVIEW_MATRIX, no need to rebuild.
Because I want to keep the same quad size regardless of zoom level, and all quads are stored in a single display list. If I have a single quad in the display list and call it that many times, it's just as slow as rendering directly.


Anyway, if you prefer having more control over the data optimization, and less compile time, go for a VBO.
I've read a bit about VBOs but I don't quite understand them. How are they different from display lists?



I'm using an 8-bit colour palette for a maximum of 256 colours
Does this mean that you're running in color index mode? If so, you should be aware that it's very unlikely to be hardware accelerated, and you should run in RGBA mode instead.
I'm unfamiliar with how to change the palette mode, so I'm running with whatever is the default (I'm guessing RGBA.)


Alfonse Reinheart, I don't agree. Maybe I'm wrong, but making a single display list with millions of quads is a big mess. Making a single display list that stores one quad, then rendering it billions of times and setting its colour before each call (inside one or several for loops), looks far better to me.

Yes you are wrong. DL is interesting when doing few GL calls, keeping a maximum of stuff near the GPU or at least in the driver.
Doing a GL call for each quad is a complete waste.

Alfonse is very right here. Every OpenGL call incurs a CPU overhead, which may be small or may be large, depending on factors such as what the call is, how full the command buffer currently is, how good (or bad) your driver is, and so on. The fewer of these calls you make, the better your application will run. One call will always be better than even thousands of calls, never mind millions. Otherwise your program will be badly CPU-bound, your GPU will not be performing efficiently, and you will never get good performance out of your program.
I found that putting all points into a single display list is MUCH faster than making a display list with just one quad and calling it millions of times.


Ok some more info:
I'm basically making a point cloud viewing tool, albeit a very primitive one. So I'm going to need some kind of optimizations in order to be able to display tens of millions of points at once. The reason I've been using quads is because point size can change, so unless I can tell OpenGL that a point is 5x5 pixels, I need to use a quad instead.

[edit]
Jeez I feel like an idiot. Apparently there IS a way to change the point size-- glPointSize. Why didn't I think to check this before? Anyways, I'm gonna try experimenting with this to see what kind of results I get. If anyone has any optimization techniques, I'd love to hear them.

_arts_
12-08-2010, 02:47 AM
ZbuffeR and mhagain, thank you for letting me know about this.

ZbuffeR
12-08-2010, 05:41 AM
The reason I've been using quads is because point size can change, so unless I can tell OpenGL that a point is 5x5 pixels, I need to use a quad instead.
You can do that in the vertex shader : send 4 vertices at the same position, and 'inflate' them in realtime according to your need.
It will still work with a display list or a static VBO.
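
The 'inflate' step is simple arithmetic: each of the 4 vertices is pushed away from the shared centre by half the desired on-screen size, converted to world units by dividing by the zoom. A minimal sketch of that calculation in C, written as a CPU function so it can be checked in isolation (in a real vertex shader the same maths would run per vertex on the GPU; inflate_corner, the corner numbering and the pixels-per-world-unit zoom convention are assumptions for illustration):

```c
typedef struct { float x, y; } Vec2;

/* Returns the position of one corner of a square that is size_px pixels
   wide on screen, centred on center. corner is 0..3, counter-clockwise
   from the bottom-left. zoom is in pixels per world unit, so the square
   shrinks in world units as the user zooms in and stays the same size
   on screen. */
Vec2 inflate_corner(Vec2 center, int corner, float size_px, float zoom)
{
    static const float sign_x[4] = { -1.0f,  1.0f, 1.0f, -1.0f };
    static const float sign_y[4] = { -1.0f, -1.0f, 1.0f,  1.0f };
    float half_world = 0.5f * size_px / zoom;
    Vec2 p = { center.x + sign_x[corner & 3] * half_world,
               center.y + sign_y[corner & 3] * half_world };
    return p;
}
```

Because the zoom only enters through one division, the stored data never changes when the user zooms; only the zoom value passed in does.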

glPointSize may do the job, just be aware it has limitations.

Dark Photon
12-08-2010, 05:49 AM
glPointSize may do the job, just be aware it has limitations.
Right. For instance, if you want to batch points with different sizes together in the same draw call, you can do that using gl_PointSize (GLSL) with GL_VERTEX_PROGRAM_POINT_SIZE. You can make the points bigger that way too (IIRC).

david_f_knight
12-08-2010, 10:21 AM
Because I want to keep the same quad size regardless of zoom level....
This can also be easily implemented with the geometry shader; all you need then is a single vertex for each quad.

AndrewC
12-09-2010, 05:13 PM
What are the limitations of glPointSize? I just tried a quick test and it seems to do exactly what I need.

Also, could someone explain (or point me to a place that explains) vertex shaders, pixel shaders, and geometry shaders? These concepts are completely foreign to me.

Dark Photon
12-09-2010, 07:33 PM
Also, could someone explain (or point me to a place that explains) vertex shaders, pixel shaders, and geometry shaders? These concepts are completely foreign to me.

This gives it to you in context:

http://www.opengl.org/wiki/Rendering_Pipeline_Overview

But basically think of it this way: when you give the GPU, say, a triangle to draw, it has to do some work to get that triangle on-screen. The work that needs to be done per vertex (a triangle having 3 vertices) gets run in a "vertex shader". And the work that needs to be done per pixel (or in general, sample) filled by the triangle runs in a "fragment shader" (D3D calls this a "pixel shader"). Those are the two main staple shaders. You can specify what happens in each of these stages by writing your own shaders.

The geometry shader is quite a bit more recent, and isn't nearly as commonly used, or as useful. It essentially lets you do "basic" dynamic generation of primitives on the GPU. For instance, you issue a draw call on the CPU with some POINTS primitives, and it turns those into quads (emitted as triangle strips) on the GPU. If you have a geometry shader in your program, it runs between the vertex shader and the fragment shader.

Zenja
12-16-2010, 02:53 AM
I'm surprised no one has suggested you use Point Sprites. They were specifically designed to allow you to draw a ridiculous number of textured quads with the same dimensions, and you only need to pass a single position to the pipeline.

mhagain
12-16-2010, 03:33 AM
The primary limitation of glPointSize, point sprites, and point parameters that is relevant here is that they have an implementation-dependent maximum size. This means that the requirement to be able to zoom will most likely not be satisfied correctly. Otherwise they would be a fine solution.
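
One way to cope with that limit is to check the requested size against the supported range at startup and fall back to real quads when it does not fit. A hedged sketch in C: the range itself would come from the driver (for example by querying GL_ALIASED_POINT_SIZE_RANGE with glGetFloatv), which is not done here because it needs a GL context, and DrawMode and choose_draw_mode are made-up names for illustration:

```c
/* How the application would submit each data point. */
typedef enum { DRAW_AS_POINTS, DRAW_AS_QUADS } DrawMode;

/* Picks points when the wanted on-screen size fits the range reported by
   the implementation, otherwise falls back to quads. The min and max
   values are assumed to have been queried from the driver beforehand. */
DrawMode choose_draw_mode(float wanted_px,
                          float min_supported_px, float max_supported_px)
{
    if (wanted_px >= min_supported_px && wanted_px <= max_supported_px)
        return DRAW_AS_POINTS;
    return DRAW_AS_QUADS;
}
```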

aqnuep
12-16-2010, 04:58 AM
Agreed, the point size limitation can be an issue, but it depends on the target platform. ATI has supported point sizes up to 8192 for several GPU generations now, and NVIDIA supports it as well in its most recent GPU generations.

Besides that, you can still generate point sprites using geometry shaders if that is an option. This solution does not suffer from any size limitation and the 1:4 vertex input-output ratio for geometry shaders is usually optimized in hardware (at least it is on ATI GPUs according to their programming guide).

david_f_knight
12-16-2010, 11:44 AM
... the 1:4 vertex input-output ratio for geometry shaders is usually optimized in hardware (at least it is on ATI GPUs according to their programming guide).
What ATI programming guide is that? You wouldn't happen to have a URL for it, by any chance?

aqnuep
12-16-2010, 11:53 AM
Sorry, I always suppose that everybody reads the same stuff as me :)
Here is the link:

http://www.amddevcentral.com/media/gpu_assets/ATI_Radeon_HD_2000_programming_guide.pdf

This is for the HD2000 series GPUs, however it is valid also for 3000 and 4000 series and mostly applies to later generations as well.

This is what it says:


When the geometry shader does 1:1 input/output, i.e. does no amplification or reduction, the shader can run with similar performance characteristics as a regular vertex shader, and similarly since the geometry shader is expected to be commonly used for replacing point sprites (by expanding point primitives to a triangle strip with two triangles) there is also special hardware for taking care of the 1:4 case.