glMultiDrawElements or glPrimitiveRestartIndex?

From what I understand they are basically the same. The wiki confirmed my impression, so I don’t see what the advantage of one over the other would be. Is it just the memory management overhead in glMultiDrawElements? Should I prefer glPrimitiveRestartIndex for that reason?
I actually might prefer glMultiDrawElements. It looks “cleaner”

Well, consider this: glMultiDrawElements uses a client-side array that specifies offsets into the buffer object for rendering. Primitive restart-based rendering is all done on the GPU. It’s entirely possible that multi-draw is done with a loop right on the CPU and therefore doesn’t buy you any performance gains you couldn’t get yourself.

Also, primitive restart works in combination with all other rendering methods. There’s no glMultiDrawRangeElements, for example, nor is there glMultiDrawElementsInstanced.
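To make that concrete, here is a minimal sketch (plain C; the helper name and index data are hypothetical) of how two separate index batches can be merged into a single restart-based draw:

```c
#include <stddef.h>
#include <stdint.h>

#define RESTART_INDEX 0xFFFFu  /* value you would pass to glPrimitiveRestartIndex */

/* Concatenate two index lists into one buffer, separated by the restart
 * index, so a single glDrawElements call can replace two draws (or a
 * glMultiDrawElements over both).  Returns the merged index count. */
size_t merge_with_restart(const uint16_t *a, size_t na,
                          const uint16_t *b, size_t nb,
                          uint16_t *out)
{
    size_t i, n = 0;
    for (i = 0; i < na; ++i) out[n++] = a[i];
    out[n++] = RESTART_INDEX;  /* primitive break, no extra draw call */
    for (i = 0; i < nb; ++i) out[n++] = b[i];
    return n;
}
```

On the GL side you would then enable GL_PRIMITIVE_RESTART, call glPrimitiveRestartIndex(0xFFFF), and issue one glDrawElements over the merged buffer.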

Thanks for the answer

Really? Isn’t the purpose of methods like this to send a batch to your dedicated hardware? If it is like you say glMultiDrawElements is basically useless. Why would I need something like that?

I’m not sure what you mean. If I understand correctly I can do both of these with glMultiDrawElements. Do you mean it’s easier to do that with glPrimitiveRestartIndex?

EDIT: actually maybe I can’t do the instanced part that easily

Really? Isn’t the purpose of methods like this to send a batch to your dedicated hardware? If it is like you say glMultiDrawElements is basically useless. Why would I need something like that?

Because, at some point in the past, it sounded like a good idea.

Also, that doesn’t mean that there’s a guarantee of zero performance improvement with its use. There are some things that drivers need to do (verify that buffer objects exist and are on the GPU, get their pointers, etc) that would have to be done with multiple repeated calls.

As with anything else, if performance is important, you should test and see what happens.

If I understand correctly I can do both of these with glMultiDrawElements.

Not according to the quick reference card. There are only 3 multi-draw functions: MultiDrawArrays, MultiDrawElements, and MultiDrawElementsBaseVertex.

Or I can use glPrimitiveRestartIndex :slight_smile:
I could stand some extra memory management, and I’m not even sure it’s needed. But if this is the price I have to pay I’ll go straight to primitive restart. At this point I don’t see a real advantage in using glMultiDrawElements. Why isn’t it deprecated?

I know that. I always do my research :slight_smile:
What I mean is that I could get the same result as a hypothetical glMultiDrawRangeElements using glMultiDrawElements, since I can specify my range there. Doing instanced drawing seems a bit more problematic though.

However, I would say this is now more of academic interest, as glPrimitiveRestartIndex seems the way to go.
I would just be curious to know why glMultiDrawElements is still in the specification.

Actually just using GL_TRIANGLES as the primitive type would be what I’d recommend. No need for any special case handling, multiple complex chains of triangles can be grouped together, you can keep the same vertexes in the same order as before, and you still get all the benefits of indexing.

I’ve seen a few discussions about using triangles or strips. In the end, as far as I understand, the problem is just maximizing vertex-cache hits. At the moment I’m favoring strips, but I’ll have to look at this issue in detail later. I already have a question I want to ask about that :slight_smile:
Indeed this is a point for triangles. You don’t need restarting.

This topic is probably exhausted, but I want to add some lines from my personal experience:

  1. glDrawElements using primitive restart is certainly faster than glMultiDrawElements. The reason is not the execution speed on the GPU (there the difference is only a few percent), but the way drivers interpret the glMultiDrawElements command. I don’t know exactly what happens in drivers, but it seems that they execute glDrawElements in a loop. (I think Alfonse also stated that several years ago, but I can’t find that post.) The reason I believe this happens is that I got a 400% speedup in my code when I switched to primitive restart. But that speedup was only in CPU time! GPU time stays almost the same. So, primitive restart may unload your CPU, but it won’t make your drawing faster. Please don’t misunderstand what I said; unloading the CPU is a very useful effect of the primitive restart approach.

  2. Although indexing enables more flexible use of the GL_TRIANGLES primitive, GL_TRIANGLE_STRIP is not for the recycle bin yet. For many regular structures (like regular grids), GL_TRIANGLE_STRIP enables more compact index buffers. For example, for an NxN regular grid

  • GL_TRIANGLES requires 3*2*(N-1)*(N-1) = 6(N^2-2N+1) ≈ 6N^2 indices
  • GL_TRIANGLE_STRIP requires 2N(N-1) = 2N^2-2N ≈ 2N^2 indices
  • GL_TRIANGLE_STRIP with the cache optimization proposed by Ignacio Castaño requires 2N^2 – 2N + floor(N/(cacheSize-1))*2N indices
    It would be interesting to discuss the efficiency of the vertex post-transform cache, but at first glance it seems that new cards are less sensitive to this optimization (I need to carry out some experiments in order to prove or reject this premise). But the point is that strips consume about a third of the memory of triangles for a regular-grid representation.
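The index-count arithmetic above is easy to sanity-check with a small sketch (plain C; the function names are hypothetical):

```c
/* Index counts for drawing an NxN-vertex grid ((N-1)x(N-1) quads). */

/* GL_TRIANGLES: 2 triangles per quad, 3 indices each -> 6(N-1)^2. */
unsigned long tri_indices(unsigned long n)
{
    return 6 * (n - 1) * (n - 1);
}

/* GL_TRIANGLE_STRIP: N-1 strips of 2N indices each -> 2N(N-1).
 * (Joining them with primitive restart adds N-2 more indices.) */
unsigned long strip_indices(unsigned long n)
{
    return 2 * n * (n - 1);
}
```

For N = 128 that is 96,774 indices with triangles versus 32,512 with strips, roughly the 3x difference mentioned above.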

Obviously the topic wasn’t really exhausted :slight_smile:

Both points are interesting, but I find the second one especially interesting, for the simple reason that I have never heard of this strips with cache optimization. I did know the name of Ignacio Castano though. I believe he wrote some stripifying code.
My first thought is that indeed strips are more convenient, because you need less data. It’s the reason why I use them. However, I found this thread, where there are some points in favor of triangles. I still have to clearly understand what it says though.
From what I can see, the point is basically maximizing cache hits. It would be ideal to know the capabilities of your cache, so you could optimize your strips/triangles for the hardware you have. But I’m afraid it’s not so easy.
Another point is that you might have to do the stripification every time before the rendering, overloading the CPU, but this is not necessary if you have a fixed set of objects. In that case you need to do that only once, maybe even offline.

The first point is for me a useful confirmation that multidraw has to go into the trashcan. Utterly useless method.

I missed that thread. :slight_smile:
Now it is several months old so I’ll not add new comments.

mhagain slightly overemphasizes the influence of the vertex post-transform cache by taking only vertices into account. Currently I’m trying to remove vertex attributes from my renderings entirely. In that case indices are the only data streamed to the GPU (along with textures). Having three times less data is significant there. When reading the thread you quoted, read carefully what Daniel (aqnuep) and Dark Photon said.

You can read more about Ignacio’s method here: Optimal Grid rendering. But I get almost the same rendering speed with simple primitive restart. That’s why I said I have to carry out more experiments. The vertex shaders I used were pretty simple.

Some more of my thoughts on this matter.

I’m not too concerned about memory and/or bandwidth overhead for indexes. One would need to be using really huge amounts of indexes for this to become a dominant factor. I tend to prefer 16-bit indexes myself, so the impact is halved anyway.

The requirement of 6n^2 indexes for GL_TRIANGLES needs to take into account that you’ll most commonly get far superior vertex reuse, with a theoretical best-case reuse of double that which you’ll get with GL_TRIANGLE_STRIP (and consequently half the memory/bandwidth for vertexes). That’s assuming best-case for both; real-world will be somewhere in between. Let’s go nuts. At one extreme I’ve seen meshes where the original stripification was not that great, and which, when converted to plain indexed triangles, needed as little as 10% of the vertexes as before. And bouncing back again, worst-case with plain triangles will get you no better than with strips, but with the additional overhead of those extra indexes.

As vertexes are always going to be more heavyweight than indexes, this can all add up to accepting additional overhead in one area in exchange for reduced overhead in another, which can translate into huge performance gains if the use case is appropriate; but it’s something you need to measure on your own datasets before making a decision either way. You do need to balance both sides of this equation (more indexes in exchange for fewer vertexes), otherwise you’re just working from incorrect data.
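That balance can be written down as a toy cost model (plain C; the function name is hypothetical, and the real cost depends on caches and the driver):

```c
/* Toy bandwidth model for "more indexes in exchange for fewer vertexes":
 * bytes moved = indices fetched * index size
 *             + vertices actually transformed * vertex size.
 * How many vertices get transformed depends on cache behaviour, so it
 * is left as a parameter rather than guessed. */
unsigned long traffic_bytes(unsigned long num_indices,
                            unsigned long bytes_per_index,
                            unsigned long transformed_vertices,
                            unsigned long bytes_per_vertex)
{
    return num_indices * bytes_per_index
         + transformed_vertices * bytes_per_vertex;
}
```

With 16-bit indices and, say, 32-byte vertices, a layout that triples the index count but halves the number of transformed vertices still comes out well ahead, which is exactly the trade-off being described.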

The point about vertex caches becoming less relevant on newer hardware is a fair one and needs further discussion, testing, benchmarking and experimenting.

As always - profile, profile, profile. And do it before putting in the work. If vertex submission or the vertex pipeline are not bottlenecks in your program, none of this will do you any good. That can’t be stressed enough.

Summary is that neither approach is inherently better than the other all of the time. For the most common types of meshes you’ll be using, plain triangles with indexes are more likely to outperform strips (with or without indexes) on bandwidth and vertex reuse terms, so they are what I’d personally recommend in the general case. You can always find something that reverses that however, so it’s up to you whether or not you want to add special-case code for that something. And if your program’s bottleneck is actually elsewhere, you can completely ignore all of this. :wink:

I tend to prefer 16-bit indexes myself so impact is halved anyway.

Do you prefer this to the extent that you perform multiple draw calls (possibly with BaseVertex)? That is, if you have a model with more than 2^16 vertices in it, do you issue multiple draw calls to render it, or just a single draw call?
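One common way to keep 16-bit indices for a large mesh is to split it into batches and rebase each batch’s indices for glDrawElementsBaseVertex. A hedged sketch (plain C; the helper name is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Rebase one batch of 32-bit indices onto a 16-bit index buffer for
 * glDrawElementsBaseVertex: subtract the smallest index in the batch
 * and return it as the basevertex.  The caller must ensure the batch's
 * index range spans fewer than 65536 vertices. */
uint32_t rebase_batch_u16(const uint32_t *in, size_t count, uint16_t *out)
{
    uint32_t base = in[0];
    size_t i;
    for (i = 1; i < count; ++i)
        if (in[i] < base) base = in[i];
    for (i = 0; i < count; ++i)
        out[i] = (uint16_t)(in[i] - base);
    return base;  /* pass this as the basevertex argument */
}
```

Each batch then becomes one glDrawElementsBaseVertex(GL_TRIANGLES, count, GL_UNSIGNED_SHORT, offset, base) call.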

The requirement of 6n^2 indexes for GL_TRIANGLES needs to take into account that you’ll most commonly get far superior vertex reuse, with a theoretical best case reuse of double that which you’ll get with GL_TRIANGLE_STRIP (and consequently half the memory/bandwidth for vertexes).

This is the statement that is in dispute.

Forsyth’s calculations dealt with index reuse, not vertex reuse. That is, the number of indices specified by the user that get reused in the space of a draw call. What he didn’t take into account was that strips have fewer indices by their very nature, and therefore will naturally have less index reuse. This makes his numeric comparisons less useful for determining which one works better.

That being said, the following is true: given an arbitrary mesh, it is easier to implement an algorithm to get good vertex reuse with a triangle-based algorithm than a strip-based one. Just writing the code to do it is easier in the triangle case.

Neither. I prefer it to the extent that it’s more widely supported in hardware.

Neither. I prefer it to the extent that it’s more widely supported in hardware.

I was asking what you do if you’re presented with a mesh that has more than 2^16 vertices in it.

32-bit indices are widely supported in hardware, and have been for quite some time. They’re certainly slower in much hardware, but they’ve been supported for a long time.

Where did you encounter hardware without support for 32bit indices?

I completely agree with you!

For an arbitrary complex mesh, TRIANGLES are probably more flexible than STRIPS for achieving more vertex-cache hits. I just said that there are special cases, like regular grids, where STRIPS are much better. So we agree about this too. :slight_smile:

I also agree with you that if the bottleneck is not in the VS stage, all this is totally irrelevant. That’s why I said I need more experiments with very complex vertex shaders in order to estimate the efficiency of the vertex cache.

The only disagreement is in the statement that STRIPS could require more vertices (or I misunderstood the point of “more indexes in exchange for less vertexes” ). The number of vertices is exactly the same in both cases.

Please consider the case of rendering a mesh that doesn’t have any vertex attributes. In that case, the indices are the only data required. The position of each vertex can be generated from its ID, and other data can be read from textures. In this case using STRIPS, where possible, could reduce the memory footprint of the mesh. Of course, for an arbitrary mesh it would probably be very hard to achieve good vertex-cache utilization. Much harder than with triangles. The point of my previous post is that TRIANGLE_STRIPS shouldn’t be rejected a priori in favor of TRIANGLES. :wink:
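As a sketch of that attribute-less approach, here is the plain-C analogue of what the vertex shader would compute from gl_VertexID (the helper name is hypothetical):

```c
/* Attribute-less grid rendering: derive the grid coordinate from the
 * vertex ID alone, with n being the grid width in vertices.  No vertex
 * buffer is needed; only the index buffer (or nothing at all with
 * glDrawArrays) is streamed. */
void grid_pos_from_id(unsigned id, unsigned n, unsigned *x, unsigned *y)
{
    *x = id % n;  /* column */
    *y = id / n;  /* row    */
}
```

In GLSL the same two lines would read `int x = gl_VertexID % N; int y = gl_VertexID / N;`, with the actual position built from (x, y) and any per-vertex data fetched from a texture.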

Please consider the case of rendering a mesh that doesn’t have any vertex attributes. In that case, the indices are the only data required.

Technically, you don’t even need that; you can (theoretically) just call glDrawArrays with some number of vertices. Note that I have not seen this tested, nor have I tested it myself, so I can’t be sure that this actually works. But core GL 3.x and above should be able to do it.

I’m doing that on a regular basis, and it works.
I’m using glDrawElements, though, because of its ability to control vertex-cache reuse through index reordering. Also, glDrawArrays implies duplication of the same vertices, which requires some gymnastics with the ID that can be skipped entirely with indexed drawing (like glDrawElements).

Agreed. “Indexed” TRIANGLE_STRIPS have some marginal utility, under some limited circumstances.

However, I think we may agree (as I mentioned in another thread) that “non-indexed” TRIANGLE_STRIPS should be avoided. ATVR (average transform-to-vertex ratio) up near 2.0 – nowhere near the 1.0 you can get with indexed TRIANGLEs.
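The “ATVR up near 2.0” figure can be illustrated with a toy calculation for the N×N grid case (plain C; the function name is hypothetical, and the ratio is scaled by 100 to keep the arithmetic integral):

```c
/* ATVR (average transform-to-vertex ratio) = vertices transformed per
 * unique vertex, scaled by 100.  Non-indexed strips over an NxN grid
 * submit 2N(N-1) vertices for N*N unique ones, so without indexing
 * every submitted vertex is transformed and the ratio approaches 2.0
 * as N grows. */
unsigned long nonindexed_strip_atvr_x100(unsigned long n)
{
    return (2 * n * (n - 1) * 100) / (n * n);
}
```

Indexed triangles, by contrast, let the post-transform cache bring the ratio down toward 1.0 when the index order is cache-friendly.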

Okay, I have something I wanted to ask later, but I’ll take advantage of this thread to do that now. So:

  1. can someone explain when or why triangles are better than strips? I can’t imagine a situation where the layout is so complex that strips miss the cache so much.
    Actually, how big can a cache be?

  2. would it be possible and reasonable to optimize the process for a specific cache? OpenGL doesn’t have a way to get information about the cache, does it?

Thanks