Performance Issue: Drawing many primitives (>1M)

Hi all,

I’m developing a 2D real-time data visualization application. It shows data using OpenGL primitive types (e.g. GL_POINTS, GL_LINE_STRIP…).
I tried a VBO, but since I do not know the number of points in advance, repeatedly re-creating vertex buffers becomes inefficient. I also thought that drawing 1M GL_POINTS in a 1000px-900px screen does not make sense.

Is there a technique for detecting whether a pixel has already been drawn? If there is, I could skip many of the points, since I won’t be able to show them on screen anyway.

To sum up, I need some advice on drawing large amounts of primitive data. Any help would be appreciated.

Thanks.

[QUOTE=alicana;1265862]Hi all,

I’m developing 2D real-time data visualization application. My application shows data using OpenGL primitive types (e.g. GL_POINTS, GL_LINE_STRIP…).
I tried VBO, but, since I do not know number of point data, creating vertex buffers repeteadly become inefficient. Also, I thought, drawing 1M GL_POINTS in 1000px-900px screen does not make sense.

Is there any reducing technique for calculating if same pixel to be drawn already? If there is I could ignore many of them, because already I’ll not be able to show them on screen.

To sum up, I actually need some advices to draw big primitive type data. Any help would be appreciated.

Thanks.[/QUOTE]
I’m afraid your description is a bit vague. How many primitives do you draw at a time? For a single frame you may not need to draw as many as 1M vertices; a shader driven by uniforms can calculate them automatically.

Do you mean not drawing duplicate points, or not storing duplicate points in the VBO?
Using the depth test will increase display speed, as will an index list, but neither reduces the number of vertices.

I tried VBO, but, since I do not know number of point data, creating vertex buffers repeteadly become inefficient.

Generally speaking, you would allocate a buffer that is “big enough” to handle some maximum amount of data. You don’t allocate based on what your variable data currently is; you allocate based on what it will ever be.

And if you go over the limit… do whatever should happen when you run out of memory.

Also, I thought, drawing 1M GL_POINTS in 1000px-900px screen does not make sense.

First, what resolution are you rendering to that only has 1000 pixels? Even a 100x100 image exceeds that.

More importantly… what does it matter? GPUs are very good at rendering stuff. And a million points, all rendered in a single draw call, is nothing to them. It’s when you try to render that in multiple draw calls that you have a problem.

Performance-wise, I’d be far more concerned about the cost of uploading a million points than about the cost of drawing them. Proper streaming techniques help there.

Don’t optimize before you have profiled. Before bothering to “optimize” this, you should make certain that it is your performance bottleneck.

Because odds are good that it isn’t.

[QUOTE=reader1;1265863]Im afraid your description may be vague, How many primitives will you draw a time? for a one frame you may be not draw as many vertices as 1M, uniform shader can calculate them automatically.

are you meaning draw no the same points or store no the same points in vbo?
if you use depth test, that will increase display speed, as well as index list, but you cant reduce vertices number.[/QUOTE]

Assume that an external application sends point data sequentially. Each point structure contains an ID, an (X,Y) position, a color and a point size.
The data set can reach 2 million points. Obviously, it is not feasible to draw 2M points with glVertex2f() between glBegin()/glEnd(), so I tried a VBO with an initial capacity of 2M.

But the VBO approach does not seem to let me add data (resize the capacity), update it or remove it. (Maybe a VBO can be updated, but I don’t know how?)

Then I thought that it is pointless to draw 2M points in a window, since most of them cannot be shown. (Am I still right here?)

My follow-up questions are:

  1. What would you do if you had a large amount of data to draw, with points being added and removed sequentially?
  2. Is a fragment shader or vertex shader vital in this situation?

I really need some advice.

The attribute arrays must be large enough to hold the data, but they can also be larger. The number of primitives drawn is determined by the “count” parameter to the draw call (glDrawArrays etc), not the size of the attribute array.

So, allocate a buffer which is “large enough”. If you ever find that it isn’t actually large enough, allocate a new, larger buffer.
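A minimal sketch of the "large enough, otherwise reallocate" policy, assuming the capacity is doubled whenever it runs out (the doubling factor is an assumption, not something stated in this thread). `grow_capacity` is a hypothetical helper; the OpenGL calls that would re-create the buffer are only indicated in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns a capacity (in points) that can hold
 * `needed` points. The capacity doubles until it is large enough, so the
 * buffer is re-created only occasionally instead of on every append. */
size_t grow_capacity(size_t capacity, size_t needed)
{
    if (capacity == 0)
        capacity = 1;
    while (capacity < needed) {
        assert(capacity <= SIZE_MAX / 2);  /* guard against overflow */
        capacity *= 2;
    }
    return capacity;
}

/* When the result differs from the old capacity, the buffer store would be
 * re-created and refilled from a CPU-side copy, for example:
 *   glBufferData(GL_ARRAY_BUFFER, new_capacity * bytes_per_point, NULL, GL_DYNAMIC_DRAW);
 *   glBufferSubData(GL_ARRAY_BUFFER, 0, count * bytes_per_point, cpu_copy);
 */
```

The "count" passed to the draw call stays the number of points actually stored, not the capacity.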

If rendering order doesn’t matter, you can add a point by appending it to the end of the array and incrementing the variable holding the number of points, and can remove a point by copying the data for the last point over the data for the removed point then decrementing the variable holding the number of points.
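The order-independent add/remove scheme described above could be sketched on a CPU-side copy of the array roughly as follows. The `Point2f` layout and the `PointArray` type are assumptions made for illustration; uploading the changed elements to the buffer object is left out.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } Point2f;  /* hypothetical vertex layout */

typedef struct {
    Point2f *data;      /* CPU-side copy of the attribute array */
    size_t   count;     /* live points; this is the "count" given to glDrawArrays */
    size_t   capacity;  /* allocated size, may be larger than count */
} PointArray;

/* Append a point at the end. Returns 1 on success, 0 if the array is full. */
int point_array_add(PointArray *a, Point2f p)
{
    if (a->count == a->capacity)
        return 0;
    a->data[a->count++] = p;
    return 1;
}

/* Remove point i by copying the last point over it, then shrinking the
 * count. Rendering order is not preserved. */
void point_array_remove(PointArray *a, size_t i)
{
    assert(i < a->count);
    a->data[i] = a->data[a->count - 1];
    a->count--;
}
```

Both operations touch at most one element, so only that element needs to be re-uploaded.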

If rendering order does matter, then you can use the above approach in conjunction with vertex indices (glDrawElements rather than glDrawArrays), so that you only need to order the indices rather than all of the attribute data. It may also help to segment the data by using e.g. glMultiDrawArrays (if each segment contains some free space, you only need to move other elements within the segments which are modified). Inserting or removing multiple elements at once is likely to be more efficient than operating one at a time (typically, you sort the modifications so that the memory can be accessed sequentially rather than randomly).

Shaders are not necessarily vital. From the few details you’ve provided, it doesn’t seem that shaders would even be useful.

Shaders can sometimes be useful for reducing the size of the data by eliminating redundancy, but that doesn’t appear to be the issue here.

But, VBO approach does not allow me to add (resizing capacity), update or removing data from itself. (maybe VBO can be updated but I don’t know?)

Data in a buffer object can be updated, for example with glBufferSubData or by mapping the buffer. Again, streaming techniques will be of benefit to you.
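One possible way to keep such updates cheap is to track which range of points changed since the last upload and send only that range. This is a sketch under assumptions: `DirtyRange` and its functions are hypothetical names, and the actual upload (a real `glBufferSubData` call) appears only in a comment because it needs a GL context.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [first, last) of point indices modified since the last upload. */
typedef struct {
    size_t first;
    size_t last;
    int    dirty;
} DirtyRange;

/* Record that the point at `index` was modified. */
void dirty_mark(DirtyRange *r, size_t index)
{
    if (!r->dirty) {
        r->first = index;
        r->last  = index + 1;
        r->dirty = 1;
    } else {
        if (index < r->first)    r->first = index;
        if (index + 1 > r->last) r->last  = index + 1;
    }
}

/* Compute the byte offset and size of the modified range and reset it.
 * Returns 0 if nothing changed. The caller would then upload with:
 *   glBufferSubData(GL_ARRAY_BUFFER, offset, size, (const char *)cpu_copy + offset);
 */
int dirty_flush(DirtyRange *r, size_t bytes_per_point, size_t *offset, size_t *size)
{
    assert(bytes_per_point > 0);
    if (!r->dirty)
        return 0;
    *offset = r->first * bytes_per_point;
    *size   = (r->last - r->first) * bytes_per_point;
    r->dirty = 0;
    return 1;
}
```

A single range is the simplest choice; if modifications are scattered across the whole buffer, it degenerates into re-uploading nearly everything.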

Then, I thought that, insignificant to draw 2M point in a window. Most of them will not able to be shown (am I still right here?)

I feel that you are far too concerned about what doesn’t get shown without having a profiling test to prove that what isn’t being shown is a performance problem. Do not optimize without proof.

What would you do if you have big number of data to be drawn, that are added-removed sequentially?

I would draw them, then measure the performance of the rendering. If the performance is adequate, I’m done. If not, profile it to find out where the performance is a problem.

  2. Is Fragment shader or vertex shader is vital in this situation ?

Vital? No. But I don’t consider them optional regardless of the circumstance.

Also, there are cases where they could conceivably ease your performance burden, for certain kinds of performance issues. For example, if memory transfer is a problem, you can use shaders to reduce your data size. Instead of giving each vertex an independent color, you could have a palette of colors (in a UBO, not a texture), and your per-vertex data only contains an unsigned byte per vertex containing that color’s index. This requires that the number of individual colors in the dataset is relatively small. And even then, it’s not clear without testing it if that would help.

But testing things like that is easier if you’re already using shaders than if you need to suddenly introduce them.
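The CPU-side half of the palette idea above might look like the sketch below: each distinct packed RGBA color gets an index that fits in one unsigned byte, so the per-vertex color shrinks from 4 bytes to 1. The `Palette` type, the 256-entry limit and the linear search are assumptions chosen for brevity; the shader that looks the index up in a UBO is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PALETTE_MAX 256  /* an index must fit in one unsigned byte */

typedef struct {
    uint32_t colors[PALETTE_MAX];  /* packed RGBA values, uploaded once (e.g. to a UBO) */
    size_t   count;
} Palette;

/* Return the palette index of `rgba`, adding the color if it is new.
 * Returns -1 when the palette is full, i.e. the data set has too many
 * distinct colors for this approach. */
int palette_index(Palette *p, uint32_t rgba)
{
    assert(p->count <= PALETTE_MAX);
    for (size_t i = 0; i < p->count; i++)
        if (p->colors[i] == rgba)
            return (int)i;
    if (p->count == PALETTE_MAX)
        return -1;
    p->colors[p->count] = rgba;
    return (int)p->count++;
}
```

Whether the smaller vertex format actually helps still has to be measured, as noted above.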

[QUOTE=Alfonse Reinheart;1265864]
First, what resolution are you rendering to that only has 1000 pixels? Even a 100x100 image exceeds that.
.[/QUOTE]
Ridiculous. What resolution did you see that made you read it as 1000 pixels?
Is that a typo? Are you kidding? You should change your glasses, or magnify the letter M, to see that it is not on the order of K.

More importantly… what does it matter? GPUs are very good at rendering stuff. And a million points, all rendered in a single draw call, is nothing to them. It’s when you try to render that in multiple draw calls that you have a problem.

Your English is perfect, but it heads in a different direction. I would like to hear more about the GPU as well.

[QUOTE=alicana;1265865]Assume that an application, which is placed outside, sends point data sequentially. Point structure contains ID, (X,Y) position, color and point size information.
Also, this data can be reach 2 million. Obviously, its impossible that drawing 2M glVertex2f() between glBegin()/glEnd(). So, I tried to use VBO with initial capacity by giving 2M.
[/QUOTE]
Holding 2M, 20M or even 200M points in memory is not a problem for today’s devices.
glBegin()/glEnd() is really inefficient. Instead you can use a VBO, as you said, whose contents can be added to, deleted and modified.
In fact, you could use several VBOs to hold your large data set flexibly.

But, VBO approach does not allow me to add (resizing capacity), update or removing data from itself. (maybe VBO can be updated but I don’t know?)

Holding 2M, 20M or even 200M points in memory is not a problem for today’s devices.
glBegin()/glEnd() and display lists are really inefficient or inconvenient. Instead you can use a VBO, whose contents can be added to, deleted and modified.
See glBufferData(), glMapBuffer(), etc.

Then, I thought that, insignificant to draw 2M point in a window. Most of them will not able to be shown (am I still right here?)

My following question is now:

  1. What would you do if you have big number of data to be drawn, that are added-removed sequentially?
    I really need some advice

You certainly cannot show 2M points in a window; either the calculation is off or they are not all in one window (or frame). Even at 4K resolution you cannot.
Suppose you fill the whole screen with primitives: the number of visible points will be far less than 300k at most.
But you have to understand that drawing points and showing them on screen are not the same thing.
For example, if you have two triangles and a vertex of one is covered by the other, you certainly cannot drop that vertex to reduce storage. Otherwise OpenGL cannot construct that triangle: you would be left with only two points, or get a different shape, or get a warning.

  2. Is Fragment shader or vertex shader is vital in this situation ?

Of course not; I agree with the posters above. If you are not dealing with rendering effects, ignore shaders.

First of all, thank you all for the informative posts. I’m now implementing the VBO approach with a buffer that grows as needed. You all gave me very useful ideas.

I realized something. Let’s say I have allocated a buffer for 2 million points:

==> 2 * 10^6 (data objects) * 2 (x, y position) * SIZEOF_DOUBLE (position value type)

I did not put a single value into the buffer, yet the following code still tries to draw:


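    // Bind the buffer object that holds the vertex data.
    glBindBuffer(GL2.GL_ARRAY_BUFFER, vertexBufferIndices);
    glEnableClientState(GL2.GL_VERTEX_ARRAY);
    // 2 components per vertex, read with a stride of 3 doubles.
    glVertexPointer(2, GL2.GL_DOUBLE, 3 * Buffers.SIZEOF_DOUBLE, 0);
    // The third argument is the number of vertices to draw.
    glDrawArrays(GL2.GL_POINTS, 0, vertices[0]);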

This is expected behaviour, isn’t it? Give me some courage :)

I read some other users’ benchmark tests of VBO vs immediate mode, and they concluded there is no extreme difference.

http://stackoverflow.com/questions/430555/when-are-vbos-faster-than-simple-opengl-primitives-glbegin

http://www.gamedev.net/topic/574242-vbo–immediate-benchmark/

Never pass GL_DOUBLE to glVertexPointer like this. You cannot use doubles without using shaders. And you cannot use doubles natively without using glVertexAttribLPointer.

What this call tells OpenGL is that your vertex data are doubles. But the input into the vertex processing is a float. Therefore, the driver must convert your data from double to float. Almost certainly on the CPU.

That’s bad.

You need to be sending floats, not doubles. So convert your vertex data to floats yourself.
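A minimal sketch of doing that conversion yourself before the upload. `doubles_to_floats` is a hypothetical helper name, and the arrays are assumed to hold tightly packed x, y pairs.

```c
#include <assert.h>
#include <stddef.h>

/* Convert n double values (e.g. interleaved x, y positions) to floats.
 * The float array is what gets uploaded to the buffer object and is then
 * described with glVertexPointer(2, GL_FLOAT, 0, 0). */
void doubles_to_floats(const double *src, float *dst, size_t n)
{
    assert(n == 0 || (src != NULL && dst != NULL));
    for (size_t i = 0; i < n; i++)
        dst[i] = (float)src[i];
}
```

With 4-byte floats, 2 million x, y pairs take 2,000,000 * 2 * 4 = 16,000,000 bytes, half of what the same data needs as 8-byte doubles.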

[QUOTE=alicana;1265877]I read some other users benchmark tests vbo vs immidiate mode they concluded there is no extreme difference.

http://stackoverflow.com/questions/430555/when-are-vbos-faster-than-simple-opengl-primitives-glbegin
[/quote]

I would also hope that you read the first answer to that question that explains the difference. Profiling, at the end of the day, is not a simple matter.

For fully dynamic data (i.e that changes each time it’s used) immediate mode can be just as fast as using buffer objects, but this depends on the type of objects you’re drawing and other factors.

Buffer objects, in common with client-side vertex arrays, can allow you to concatenate primitives, so instead of many draw calls with different primitive types you can submit an entire object in a single draw call. That can be a performance win.

With immediate mode this is slow:

    for (int i = 0; i < numpoints; i++)
    {
        glBegin (GL_POINTS);
        glVertex3fv (points[i]);
        glEnd ();
    }

But this is fast:

    glBegin (GL_POINTS);

    for (int i = 0; i < numpoints; i++)
        glVertex3fv (points[i]);

    glEnd ();

Your bottleneck may be elsewhere. Particularly if drawing millions of points, you’re more likely to be bottlenecked on fillrate than on vertex submission. So whether you use immediate mode or buffer objects is going to be quite irrelevant: the decision of which to use will depend on whether you’re purist about only using core context calls, or using GL ES, or even personal preference.

Consider drawing your data set differently. Rather than drawing points you could write your data into a big array and then glDrawPixels it: that would also work.

Get rid of the double. Seriously. You don’t need it. This is something that beginners seem to pick up from awful awful awful tutorials (like NeHe) (and seems to me to be particularly prevalent among those using Java for some odd reason, but that may be just an anecdotal observation). Use floats instead; they’re more than good enough for the vast majority of data. Using doubles, worst case is that your entire per-vertex pipeline will drop back to software emulation. Best case is that it’ll run in hardware but slower. Only use doubles if you absolutely, positively, 100% know that you need them, and for 2D data constrained to the bounds of a window, you absolutely, positively, 100% don’t need them. If the data set is arriving as doubles, then convert to and store as floats before drawing. I’ll say this again because it needs to be emphasised: for the kind of drawing you’re doing, you don’t need to use doubles.

OK, I’m convinced to use float instead of double.

By the way, you talked about fillrate and I found this quote:

Scene complexity can be increased by overdrawing, which happens when “an object is drawn to the frame buffer, and then another object (such as a wall) is drawn on top of it, covering it up. The time spent drawing the first object was wasted because it isn’t visible.” When a sequence of scenes is extremely complex (many pixels have to be drawn for each scene), the frame rate for the sequence may drop. When designing graphics intensive applications, one can determine whether the application is fillrate-limited by seeing if the frame rate increases dramatically when the application runs at a lower resolution or in a smaller window. [2]

My main question in this thread was whether I can reduce the number of points that will be drawn.

Example:

  1. I have a 900 x 700 pixel application window
  2. I set the ortho projection to xMin = 0, xMax = 9000, yMin = 0, yMax = 28000

So 900 pixels of width represent 9000 ortho units, i.e. 10 units per pixel horizontally (and 28000 / 700 = 40 units per pixel vertically). Therefore many glVertex calls may end up on the same pixel on screen (???)

First, if that is right so far, can I calculate it? Or do I need DPI values etc.? (E.g. are glVertex2f(1, 3200) and glVertex2f(5, 3200) drawn onto the same pixel?)

Second, does OpenGL have a technique for this? (I ask because I saw that the VBO did not make a significant difference.)
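For what it is worth, the mapping from ortho coordinates to a pixel depends only on the ortho bounds and the viewport size in pixels, not on DPI. Below is a rough CPU-side sketch of that calculation together with a simple occupancy filter that lets only the first point per pixel through. It assumes the viewport covers the whole window and that points are one pixel in size, and it ignores color, so a later point with a different color would be dropped too. `View`, `to_pixel` and `first_hit` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    double xmin, xmax, ymin, ymax;  /* ortho bounds */
    int    width, height;           /* viewport size in pixels */
} View;

/* Map an ortho-space position to a pixel. Returns 0 if it is off-screen. */
int to_pixel(const View *v, double x, double y, int *px, int *py)
{
    assert(v->width > 0 && v->height > 0);
    assert(v->xmax > v->xmin && v->ymax > v->ymin);
    if (x < v->xmin || x >= v->xmax || y < v->ymin || y >= v->ymax)
        return 0;
    *px = (int)((x - v->xmin) / (v->xmax - v->xmin) * v->width);
    *py = (int)((y - v->ymin) / (v->ymax - v->ymin) * v->height);
    /* Clamp in case rounding pushed a value onto the far edge. */
    if (*px >= v->width)  *px = v->width - 1;
    if (*py >= v->height) *py = v->height - 1;
    return 1;
}

/* `seen` holds width * height bytes and must be zeroed at the start of each
 * frame. Returns 1 the first time a pixel is hit, 0 for later hits on the
 * same pixel and for off-screen points. */
int first_hit(const View *v, uint8_t *seen, double x, double y)
{
    int px, py;
    if (!to_pixel(v, x, y, &px, &py))
        return 0;
    size_t i = (size_t)py * (size_t)v->width + (size_t)px;
    if (seen[i])
        return 0;
    seen[i] = 1;
    return 1;
}
```

With the 900 x 700 window and the ortho bounds from the example, x = 1 and x = 5 at the same y fall into the same pixel column, so the second point would be skipped. Whether this filter costs less than simply drawing every point is something only profiling can answer.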