[QUOTE=Alfonse Reinheart;1253153]OK, let’s just cut to the chase. Go read this and implement one of those streaming strategies.
I didn’t say it was impossible. I said that recent functionality allows you to make it impossible. And since that functionality exists to make using them faster, that’s a strong hint that you shouldn’t be doing it in the first place.
The ability to resize the storage for a buffer object has nothing to do with how you use it.
Uniform blocks must be of a specific size. Therefore, whatever buffer object you use for them must be at least that size. It could be bigger, but it can’t be smaller.
Where? I said that ARB_buffer_storage/GL 4.4 allows you to allocate buffers that cannot be reallocated. And that means that it was a mistake for OpenGL to let you reallocate them to begin with. So you should never do it.
No it doesn’t. It copies the specific data to the GPU eventually.
Consider this. If you map the buffer, generate your light data every frame into that pointer, and unmap it, the worst-case scenario is that the driver will have to DMA-copy the data from the mapped pointer into the buffer object. It will do that at a time of its choosing, but sometime before you do anything that reads from that data. The best-case scenario is that you’re writing directly to the buffer object’s storage. This is much more likely if you use GL_MAP_INVALIDATE_BUFFER_BIT to invalidate the buffer (since you’re overwriting all of its contents).
If you use BufferSubData, you must generate your data into an array of your own, and you give that to BufferSubData. Worst-case, BufferSubData must then copy that array into temporary memory, and later DMA-copy that into the buffer. The reason why is quite simple. If the buffer is currently in use (is going to be read by GL commands that you have already issued that haven’t executed yet), then it can’t simply overwrite that data. The OpenGL memory model doesn’t allow later commands to affect earlier ones. So the implementation must delay the actual DMA-copy into the buffer storage until that storage is no longer in use. And since BufferSubData cannot assume that the pointer it was given will still be around after BufferSubData returns, it must copy that data into temporary memory and DMA from that into the buffer later.
So worst-case with BufferSubData is that there are two temporary buffers. You had to generate your lighting data into one temporary buffer, and OpenGL had to copy it into another temporary buffer.
Best case with BufferSubData is that it is able to do the DMA immediately. But that almost never happens. Why? Because DMAs aren’t instantaneous; they’re asynchronous operations. Also, DMAs typically can’t happen directly from client memory. So most implementations of BufferSubData are still going to have to copy your data into some temporary, DMA-able memory, and then DMA it up to the GPU.
With mapped pointers, odds are very good that, if the pointer you get isn’t actually the buffer, it’s at least memory that’s DMA-ready. So the worst-case scenario for mapping is equal to the best case scenario for BufferSubData.
So yes, if performance is a concern (and at this point, it shouldn’t be; stop prematurely optimizing stuff), mapping will at worst be only as bad as BufferSubData, and can be a good deal faster.[/QUOTE]
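If I’m reading your worst-case/best-case comparison right, the two upload paths look roughly like this. This is just a sketch of my understanding, assuming a GL 3.x context; Light, MAX_LIGHTS, lightUBO, and generateLight are placeholder names of mine, not anything from your post:

[CODE]
/* Sketch of the two upload paths as I understand them (GL 3.x headers
 * assumed; Light, MAX_LIGHTS, lightUBO, and generateLight are placeholders). */
enum { MAX_LIGHTS = 100 };
typedef struct { float position[4]; float color[4]; } Light;
extern Light generateLight(int i);   /* hypothetical per-light generator */

/* Path A: map the buffer and generate the data straight into the pointer. */
static void uploadMapped(GLuint lightUBO, int numLights)
{
    glBindBuffer(GL_UNIFORM_BUFFER, lightUBO);
    Light *dst = (Light *)glMapBufferRange(GL_UNIFORM_BUFFER, 0,
                     sizeof(Light) * MAX_LIGHTS,
                     GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
    for (int i = 0; i < numLights; ++i)
        dst[i] = generateLight(i);
    glUnmapBuffer(GL_UNIFORM_BUFFER);
}

/* Path B: generate into my own array first (first temporary copy), then
 * hand it to BufferSubData, which may copy it again into DMA-able memory. */
static void uploadSubData(GLuint lightUBO, int numLights)
{
    Light scratch[MAX_LIGHTS];
    for (int i = 0; i < numLights; ++i)
        scratch[i] = generateLight(i);
    glBindBuffer(GL_UNIFORM_BUFFER, lightUBO);
    glBufferSubData(GL_UNIFORM_BUFFER, 0, sizeof(Light) * numLights, scratch);
}
[/CODE]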
From what you’re telling me, mapped pointers are better if I’m rewriting the storage. So if I only allocate once, I should allocate room for 100 lights during the initialization phase using glBufferData with a null pointer. Then, every frame, I should use a mapped pointer to overwrite the data for lights 0 to current_number_of_lights.
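In code, I picture that allocate-once plan like this (again a sketch with my placeholder names):

[CODE]
/* Initialization: allocate storage for 100 lights once; NULL means the
 * contents start unspecified. */
glBindBuffer(GL_UNIFORM_BUFFER, lightUBO);
glBufferData(GL_UNIFORM_BUFFER, sizeof(Light) * 100, NULL, GL_STREAM_DRAW);

/* Every frame: overwrite only lights 0 .. current_number_of_lights - 1. */
glBindBuffer(GL_UNIFORM_BUFFER, lightUBO);
Light *ptr = (Light *)glMapBufferRange(GL_UNIFORM_BUFFER, 0,
                 sizeof(Light) * current_number_of_lights, GL_MAP_WRITE_BIT);
/* ... generate light data into ptr[0 .. current_number_of_lights - 1] ... */
glUnmapBuffer(GL_UNIFORM_BUFFER);
[/CODE]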
What about using glBufferData with a null pointer every frame, just before mapping, as described in the streaming-techniques link you posted? Will that reallocate? (I’m under the impression that glBufferData always reallocates.) Or will it be more efficient, since it tells the driver that you don’t really care about the previous contents? Am I confusing buffer allocation with uniform block allocation? After reading that link, it seems that calling glBufferData with the same size as the initial allocation and a null pointer will basically be faster, since I will either be filling a new buffer or the old one (if it’s not in use).
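That is, something like this every frame, using the same size as the initial allocation (sketch only):

[CODE]
/* "Orphan" the buffer: same size, NULL data. If the old storage is still
 * in use by pending GL commands, the driver can hand back a fresh block
 * instead of stalling. */
glBindBuffer(GL_UNIFORM_BUFFER, lightUBO);
glBufferData(GL_UNIFORM_BUFFER, sizeof(Light) * 100, NULL, GL_STREAM_DRAW);

/* Then map and write as before; the re-specified storage can't be in use yet. */
Light *ptr = (Light *)glMapBufferRange(GL_UNIFORM_BUFFER, 0,
                 sizeof(Light) * current_number_of_lights, GL_MAP_WRITE_BIT);
/* ... write light data ... */
glUnmapBuffer(GL_UNIFORM_BUFFER);
[/CODE]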
Also, should I use glMapBufferRange with GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT | GL_MAP_INVALIDATE_RANGE_BIT | GL_MAP_UNSYNCHRONIZED_BIT? From that link, using GL_MAP_INVALIDATE_RANGE_BIT would be an optimization since I’m only writing and never reading. And GL_MAP_UNSYNCHRONIZED_BIT should work since I only generate data into the buffer before I actually render. Am I right?
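Concretely, I mean this combination (sketch):

[CODE]
/* The flag combination I'm asking about: write-only access, discard the
 * old contents, and skip the driver's synchronization on map. */
GLbitfield access = GL_MAP_WRITE_BIT
                  | GL_MAP_INVALIDATE_BUFFER_BIT
                  | GL_MAP_INVALIDATE_RANGE_BIT
                  | GL_MAP_UNSYNCHRONIZED_BIT;
Light *ptr = (Light *)glMapBufferRange(GL_UNIFORM_BUFFER, 0,
                 sizeof(Light) * current_number_of_lights, access);
[/CODE]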
And why should I stop prematurely optimizing? I must admit I’m a perfectionist, but isn’t optimization a good thing?
Sorry, I know it’s a lot of questions, but this isn’t just about optimization; optimizing is my own way of understanding things thoroughly. I don’t want to be someone who just comes here and asks people to fix stuff; I want to understand so I can teach others as well. In any case, you’ve already helped A LOT with my understanding of this, and I thank you for that.