unknown-sized uniform struct arrays?

BenFoppa
09-10-2014, 12:07 AM
It seems like what is and isn't valid for a uniform array has a whole lot of caveats, so:

In OpenGL 3.3, can I have a uniform variable-sized array of structs in my vertex shader? What are the constraints? How do I go about declaring and using it? How do I go about loading data into it? The end goal is that the shader has no inputs, and uses gl_VertexID to infer the index into the uniform array (e.g. if using 36 vertices to construct a cube from triangles, the index would be gl_VertexID/36).

If that's not possible, how else can I go about creating a variable-sized buffer of data to be indexed from within the shader?

Thanks!

mhagain
09-10-2014, 01:58 AM
Any reason to not just use a vertex buffer for this?

If you actually look at what you want to achieve, you've got:

- An array of structures, where,
- You can't declare the size of it up-front, and,
- You want to index it using gl_VertexID.

This is exactly what vertex buffers do.

BenFoppa
09-10-2014, 09:26 AM
I couldn't find the right way to use them, i.e. without uploading extra data into VRAM to indicate where to index to. How do I index into a VBO from GLSL? I should clarify that I'm not using gl_VertexID as the index directly, but doing some modifications. Here's the main() I was hoping to write for my vertex shader (assuming we do a naive 12 triangles = 36 vertices to draw a cube):



void main() {
    // 36 vertices (12 triangles) per cube
    int block_id = gl_VertexID / 36;   // which block this vertex belongs to
    int vertex_id = gl_VertexID % 36;  // which vertex within that block
    texture_position = texture_positions[vertex_id];
    normal = normals[vertex_id / 6];   // 6 vertices per face
    world_position = block_data[block_id].position;
    type = block_data[block_id].type;

    gl_Position = projection_matrix * vec4(world_position, 1.0);
}

Cornix
09-10-2014, 09:29 AM
You can always use textures like an array.

mhagain
09-10-2014, 10:05 AM
OK, from reading your second post I surmise that you're trying to save memory on duplicated data by fetching attributes from storage other than the main vertex buffer. Is this correct? If so I'd advise that you initially code with a full fat vertex containing the duplicated data and determine if it actually is a problem before you attempt this kind of pre-emptive optimization. You may find that (while you do save memory) fetching vertex data in this manner completely blows your cache locality and ends up being substantially slower overall (memory isn't everything).

BenFoppa
09-10-2014, 12:06 PM
Yeah, that's what I'm trying to do.

Good point! It's worth mentioning that this isn't *exactly* preemptive - I started reducing VRAM usage because I ran out due to intensely duplicated data (e.g. vertex normals and texture positions), and FPS tanked as a result (or at least, FPS tanked when I increased the buffer size, and I assume it had to do with memory usage, since it doesn't tank anymore). Although we might be at the point where reducing it further might not be worth it.

It seems like the easiest solution is to make the index into the uniform array an input to the vertex shader, but then I'll have buffers filled with only continuous runs of N identical indices (N=36 here, but would be reduced with triangle strips, for example). There's *got* to be a cache-friendly way to reduce that - fetching an identical element from a VBO is slower than not fetching anything at all.


Cornix wrote: "You can always use textures like an array."

I was hoping to avoid this - it seems a lot sketchier. Although it makes me realize that I'll have a fixed capacity no matter what, so maybe I can just fix the capacity for the uniform array as well. So can one bulk-load an array of structs, or does every member have to be loaded individually?

Edit: ^ the answer is "individually" (http://stackoverflow.com/questions/16739993/passing-custom-type-struct-uniform-from-qt-to-glsl-using-qglshaderprogram), so maybe using arrays of the members is a better way to go.
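For reference, the "individually" answer applies to plain uniforms set with glUniform*. Uniform blocks (core since OpenGL 3.1) are backed by a buffer object, so a fixed-capacity array of structs can be uploaded in one glBufferData/glBufferSubData call. A minimal sketch, assuming hypothetical names (BlockData, Block, MAX_BLOCKS, block_type) that are not from the thread:

```glsl
#version 330 core

// Hypothetical capacity. With 16 bytes per element this is 16 KiB,
// the smallest GL_MAX_UNIFORM_BLOCK_SIZE an implementation may report.
const int MAX_BLOCKS = 1024;

struct Block {
    vec3 position;  // world-space position of the block
    int  type;      // block type id
};

// std140 gives a layout the application can rely on. Each element is
// expected to take 16 bytes here (the int fills the padding after the
// vec3); verify with glGetActiveUniformsiv(..., GL_UNIFORM_OFFSET, ...).
layout(std140) uniform BlockData {
    Block block_data[MAX_BLOCKS];
};

uniform mat4 projection_matrix;

// Integer outputs must use flat interpolation.
flat out int block_type;

void main() {
    int block_id = gl_VertexID / 36;  // 36 vertices per cube, as in the thread
    Block b = block_data[block_id];
    block_type = b.type;
    // Per-corner offset omitted, as in the original main().
    gl_Position = projection_matrix * vec4(b.position, 1.0);
}
```

The array size still has to be a compile-time constant, so this gives a fixed capacity rather than a truly unsized array. On the host side the block is attached with glGetUniformBlockIndex, glUniformBlockBinding and glBindBufferBase(GL_UNIFORM_BUFFER, ...).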

Edit 2: Okay, there's an upper bound to how big a uniform array can be. Maybe a texture is the most straightforward solution after all.
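One way to do the texture approach in OpenGL 3.3 is a buffer texture: a buffer object exposed to the shader as a one-dimensional array of texels and read with texelFetch, with no filtering or normalized coordinates involved. The sketch below is only an illustration; the sampler name block_texels and the packing (one RGBA32F texel per block, type stored in w) are assumptions, not something from the thread:

```glsl
#version 330 core

// Assumed layout: one RGBA32F texel per block,
// xyz = world position, w = block type stored as a float.
uniform samplerBuffer block_texels;
uniform mat4 projection_matrix;

// Integer outputs must use flat interpolation.
flat out int block_type;

void main() {
    int block_id = gl_VertexID / 36;  // 36 vertices per cube, as in the thread
    // Integer-indexed fetch straight from the buffer's data store.
    vec4 texel = texelFetch(block_texels, block_id);
    block_type = int(texel.w);
    // Per-corner offset omitted, as in the original main().
    gl_Position = projection_matrix * vec4(texel.xyz, 1.0);
}
```

On the host side the buffer is attached with glTexBuffer(GL_TEXTURE_BUFFER, GL_RGBA32F, buffer). The capacity is bounded by GL_MAX_TEXTURE_BUFFER_SIZE, which is at least 65536 texels, and the shader does not need to know the size at compile time.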

mhagain
09-11-2014, 01:44 AM
If you've got so much data that you run out of video memory, there are other ways of dealing with the situation.

From the mention of blocks it looks like you're writing a Minecraft clone/voxel engine, so that can make things really easy.

If you want normals, and if you're drawing blocks so each vertex of each face has the same normal, then instead of including them in your vertex data you calculate them in a geometry shader.

If you don't care about block rotation you can use instancing. Here you store the full fat vertex for only a single block and then for each block you draw the data is reduced to 4 floats: position and size (if you're not drawing cubes you'll need 6 floats per block).

On the other hand if performance falls off a cliff you shouldn't just assume it's due to memory usage. You may be doing something else that's triggering a slow path in your driver, maybe even a software emulated path.
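The instancing suggestion above might look roughly like this in a 3.30 vertex shader. The attribute names and locations are hypothetical, and the sketch assumes cubes described by a position and a single size:

```glsl
#version 330 core

// Per-vertex data, stored once for a single unit cube.
layout(location = 0) in vec3 corner;

// Per-instance data, 4 floats per block: xyz = position, w = size.
// Assumes the host called glVertexAttribDivisor(1, 1) for this attribute.
layout(location = 1) in vec4 block_pos_size;

uniform mat4 projection_matrix;

void main() {
    // Scale the unit-cube corner by the block size, then move it into place.
    vec3 world_position = block_pos_size.xyz + corner * block_pos_size.w;
    gl_Position = projection_matrix * vec4(world_position, 1.0);
}
```

This would be drawn with glDrawArraysInstanced(GL_TRIANGLES, 0, 36, block_count), so the 36 cube vertices live in the buffer once and each extra block costs only its 4 floats.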

BenFoppa
09-11-2014, 03:48 PM
What's the advantage of calculating normals in a geometry shader over in the vertex shader? Aren't geometry shaders less performant and less well-supported? That's what I'm doing now - the vertex shader I posted above just gets the normal from a uniform array, which I assume is faster than branching, though I haven't investigated extensively (I'd really hope it's faster than branching, since branching definitely shouldn't be the most performant option). Such small performance differences in the vertex shader probably aren't going to be a bottleneck either way, I suppose. At any rate, I want to apply a similar technique for the rest of the duplicated block data, and there should absolutely be a performant way to do that.

Instancing is definitely on the list of things to do, but I'm not intent on limiting myself to cubes; slopes will be included in the blocks in the future, which greatly reduces the opportunity for, and advantage of, instancing.

As for the performance tanking, I suspect VRAM usage because simply increasing the buffer capacity was enough to cause the dramatic FPS reduction as well.

Edit: You're right, at this point, it's probably worth just throwing the block data into a VBO and having an index buffer that looks like (0,0,0,...0,1,1,1,...1,2,2,2,...) . It bothers me immensely, but it seems like optimizing further is going to take much more work than it's worth at this point, and that getting an "optimal" result (i.e. one that takes full advantage of the assumptions we can make because of my use case, like not even bothering to fetch the block type again for the next vertex of a block) might require hacking around OpenGL altogether using something like OpenCL. Again, WAY more work than it's worth for the yield it would bring at this point. Plus, there's an upper bound on how many things I should even be *trying* to render; good optimizations at the game logic level might completely eliminate my risk of maxing out VRAM, who knows?

mhagain
09-11-2014, 06:15 PM
BenFoppa wrote: "What's the advantage of calculating normals in a geometry shader over in the vertex shader?"

It's a tradeoff.

Let's assume that you're correct and that having a large vertex size is the root cause. Calculating them in a geometry shader will enable you to shave 3 floats (12 bytes) off the size of each vertex, which may be enough to eliminate your performance loss. You'll lose some from the GS stage for sure, but the point is that the amount you lose from the GS may not be as much as that which you lose from the fatter vertex format. Say you lose 50% from the fatter vertex, but only 10% or even 25% from the GS. That's a net gain by comparison.

Sometimes you get to choose the lesser of two evils.

BenFoppa
09-11-2014, 06:32 PM
I'm still kind of confused, sorry - where are these bytes being saved? I still have to eventually come up with 12 bytes for the normal vector, and those bytes are still getting passed to each invocation of the fragment shader. If I calculate them in the vertex shader, where do they go that they wouldn't go if I calculated them in the geometry shader?

mhagain
09-12-2014, 01:54 AM
OK, I think I understand the part that's confusing you.

Right now you're seeing yourself as having two options. One is to add the 12 bytes to your vertex format that you define using glVertexAttribPointer (or glNormalPointer). That makes your vertex buffer bigger.

The second is to do a lookup on another buffer and fetch the normals from that. This is what you mean when you say "calculate them in the vertex shader".

The GS method is actually neither of these.

With the GS method you don't make your vertex buffer bigger and you don't need to do the lookup. Because a GS can operate on an entire triangle, and because you probably want per-face normals rather than per-vertex normals, when I say "calculate" I mean actually "calculate"; i.e. with mathematics.

No normals in your input vertex format, no array lookup. Instead you do:


vec3 Normal = normalize (cross (gl_PositionIn[1].xyz - gl_PositionIn[0].xyz, gl_PositionIn[2].xyz - gl_PositionIn[0].xyz));

You do this once only at the start of your GS code, then emit the 3 vertices in your triangle (just because a GS can emit additional vertices it doesn't mean that it has to) with this normal tagging along.

No extra memory usage.
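For completeness, here is a rough sketch of a whole pass-through geometry shader built around that calculation, written for GLSL 3.30 core. The gl_PositionIn[] spelling above comes from EXT_geometry_shader4; in core 3.30 the same value is gl_in[i].gl_Position. The input name vs_world_position is an assumption: if the vertex shader has already applied the projection matrix to gl_Position, a cross product of those positions would not give a world-space normal, so the untransformed position is passed separately here.

```glsl
#version 330 core

layout(triangles) in;
layout(triangle_strip, max_vertices = 3) out;

// Assumption: the vertex shader writes the unprojected position to an
// output of this name, in addition to gl_Position.
in vec3 vs_world_position[];

// One normal per face, so no interpolation is needed.
flat out vec3 normal;

void main() {
    // Two edges of the triangle; their cross product is the face normal.
    vec3 e1 = vs_world_position[1] - vs_world_position[0];
    vec3 e2 = vs_world_position[2] - vs_world_position[0];
    vec3 face_normal = normalize(cross(e1, e2));

    // Re-emit the same three vertices with the normal tagging along.
    // Outputs become undefined after EmitVertex(), so set them each time.
    for (int i = 0; i < 3; ++i) {
        normal = face_normal;
        gl_Position = gl_in[i].gl_Position;
        EmitVertex();
    }
    EndPrimitive();
}
```

The direction of the normal depends on the winding order of the triangle, so the two edge vectors may need to be swapped for a mesh wound the other way.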

BenFoppa
09-12-2014, 11:13 AM
Heh. I COMPLETELY forgot that the GS could operate on things that weren't individual vertices. That makes perfect sense now, thanks a bunch!