GLSL struct packing / texture buffers



sleap
06-15-2011, 03:38 AM
Hi,
There seems to be very little information on this, or maybe I'm looking in the wrong places. It's now possible to declare a struct in GLSL and an array of that type. This array can map to global video memory - a texture buffer object for example. However, I'm finding it slower to use an array of structs than a big chunk of floats that I index manually.

eg.

//particle_header.h
struct Particle
{
    vec4 position;
    vec4 velocity;
};

//shader.frag
#include "particle_header.h" //inserted in shader loading code
Particle* particles;

void main()
{
    int index = ...
    particles[index].position += ...
    discard;
}

This data can also be accessed CPU side via glMapBuffer:

#include "particle_header.h"
...
// glMapBuffer returns void*, so C++ needs an explicit cast
Particle* particles = static_cast<Particle*>(glMapBuffer(...));
particles[123].position = ...

I really like the idea of this functionality but as mentioned I have found it much slower than say "vec4* particles;". Additionally, GLSL packs the struct differently from g++. I've attempted to pad the struct manually, but I'd like a cleaner way to do it. At the very least some definite rules that GLSL follows to pack structs.
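One way to keep the CPU and GPU views of a shared struct in sync is to let the C++ compiler check the layout. This is a minimal sketch, assuming the GPU side aligns a vec4 to 16 bytes (as the std140 uniform-block rules do); the CPU-side `vec4` type is hypothetical, and the real rules for whatever extension you use should be checked against its spec.

```cpp
#include <cstddef>  // offsetof

// Hypothetical CPU-side stand-in for GLSL's vec4.
// alignas(16) ASSUMES the GPU aligns vec4 members to 16 bytes.
struct alignas(16) vec4 {
    float x, y, z, w;
};

// Same member order as the GLSL declaration in particle_header.h.
struct Particle {
    vec4 position;
    vec4 velocity;
};

// These fail the build if the C++ layout drifts from the assumed GPU layout.
static_assert(sizeof(vec4) == 16, "vec4 must be 16 bytes");
static_assert(offsetof(Particle, position) == 0, "position must be at offset 0");
static_assert(offsetof(Particle, velocity) == 16, "velocity must be at offset 16");
static_assert(sizeof(Particle) == 32, "Particle stride must be 32 bytes");
```

With only vec4 members the two layouts agree trivially; the asserts earn their keep once a vec3 or a lone float enters the struct, because that is where GPU packing rules and g++ tend to disagree.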

Some indication is given here:
http://msdn.microsoft.com/en-us/library/bb509632(v=vs.85).aspx

Is there a standard way to use structs in GLSL which is both modular (easy to access from both GPU and CPU) and fast?

Thanks in advance

Alfonse Reinheart
06-15-2011, 04:59 AM
It's now possible to declare a struct in GLSL and an array of that type.

That's always been possible, since before the 2.0 days.


This array can map to global video memory - a texture buffer object for example.

No, it can't. Buffer textures are textures; they're accessed with texture fetch functions such as texelFetch. Those return vec4s, not arbitrary structs.

What you're thinking about are uniform buffer objects.
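For the buffer-texture route, a struct has to be reassembled one vec4 at a time, which is essentially the "big chunk of floats" approach from the first post. A minimal CPU-side sketch of the indexing, assuming an RGBA32F buffer texture so that each texel is one vec4 and a two-vec4 Particle spans two consecutive texels; `texelIndex` and `fetch` are hypothetical names, with `fetch` standing in for GLSL's texelFetch(samplerBuffer, int).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// One RGBA32F texel == one vec4 (ASSUMED internal format).
using Texel = std::array<float, 4>;

constexpr int kTexelsPerParticle = 2;  // position + velocity
constexpr int kPositionTexel = 0;      // member order inside one particle
constexpr int kVelocityTexel = 1;

// The integer index a shader would pass to texelFetch for one struct member.
constexpr int texelIndex(int particle, int member) {
    return particle * kTexelsPerParticle + member;
}

// Stand-in for texelFetch: returns one vec4 from the flat texel array.
Texel fetch(const std::vector<Texel>& texels, int index) {
    assert(index >= 0 && static_cast<std::size_t>(index) < texels.size());
    return texels[static_cast<std::size_t>(index)];
}
```

In GLSL the velocity of particle `i` would then be read as `texelFetch(tex, i * 2 + 1)`; writing it back needs a different mechanism, since buffer textures are read-only through texelFetch.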


#include "particle_header.h" //inserted in shader loading code

Is this Cg, or are you running with #extension GL_ARB_shading_language_include?


Particle* particles;

This is not valid GLSL syntax, unless you're using NVIDIA's extension for bindless graphics.


I really like the idea of this functionality but as mentioned I have found it much slower than say "vec4* particles;"

... what? That's still using a pointer, which is bindless graphics. It's the same concept.


At the very least some definite rules that GLSL follows to pack structs.

That's what uniform buffers are for. You're using NVIDIA's bindless graphics, so look up how their extension does it.
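The definite rules for uniform buffers are the std140 layout rules. Below is a minimal sketch of those rules for a few scalar and vector types, useful for checking hand-padded C++ structs; the helper names are hypothetical, arrays, matrices and nested structs are deliberately left out, and these are the UBO rules, not necessarily the ones GL_NV_shader_buffer_load applies to pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Subset of GLSL member types handled by this sketch.
enum class GlslType { Float, Vec2, Vec3, Vec4 };

// std140 base alignment in bytes.
std::size_t baseAlignment(GlslType t) {
    switch (t) {
        case GlslType::Float: return 4;
        case GlslType::Vec2:  return 8;
        case GlslType::Vec3:  return 16;  // vec3 aligns like a vec4...
        case GlslType::Vec4:  return 16;
    }
    return 0;
}

// Bytes actually occupied by the member.
std::size_t byteSize(GlslType t) {
    switch (t) {
        case GlslType::Float: return 4;
        case GlslType::Vec2:  return 8;
        case GlslType::Vec3:  return 12;  // ...but only occupies 12 bytes
        case GlslType::Vec4:  return 16;
    }
    return 0;
}

std::size_t alignUp(std::size_t offset, std::size_t alignment) {
    assert(alignment != 0);
    return (offset + alignment - 1) / alignment * alignment;
}

// Fills `offsets` with each member's std140 offset and returns the struct
// size, rounded up to 16 bytes because std140 gives structs vec4 alignment.
std::size_t std140Layout(const std::vector<GlslType>& members,
                         std::vector<std::size_t>& offsets) {
    offsets.clear();
    std::size_t offset = 0;
    for (GlslType m : members) {
        offset = alignUp(offset, baseAlignment(m));
        offsets.push_back(offset);
        offset += byteSize(m);
    }
    return alignUp(offset, 16);
}
```

For example, `{ vec3 a; float b; }` packs `b` into the fourth slot at offset 12 (size 16), while `{ float a; vec3 b; }` pushes `b` to offset 16 (size 32), which is exactly the kind of difference a naive g++ struct misses.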


Some indication is given here:
http://msdn.microsoft.com/en-us/library/bb509632(v=vs.85).aspx


No, those are the rules for D3D's equivalent of uniform buffers (constant buffers).


Is there a standard way to use structs in GLSL which is both modular (easy to access from both GPU and CPU) and fast?

First of all, you should never be mapping buffers to read from them like that. Second, UBOs are your best bet for speed, but they have relatively small limits on their byte size (65536 bytes on some platforms). Anything else will be slower.
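To put the UBO size limit in perspective for the particle struct in this thread: assuming two 16-byte vec4 members (a 32-byte stride), a 65536-byte block holds 2048 particles, and a block at the GL-guaranteed minimum of 16384 bytes holds 512. A minimal sketch of that arithmetic; the function name is hypothetical, and on a real context the block size would come from `glGetIntegerv(GL_MAX_UNIFORM_BLOCK_SIZE, ...)`.

```cpp
#include <cstddef>

// ASSUMED GPU-side stride: two vec4 members, 16 bytes each.
constexpr std::size_t kParticleStride = 2 * 16;

// How many whole particles fit in one uniform block of the given size.
constexpr std::size_t maxParticlesPerBlock(std::size_t maxBlockSize) {
    return maxBlockSize / kParticleStride;
}
```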

sleap
06-29-2011, 04:48 AM
Thanks for the reply! Indeed, I am using NVIDIA's GL_NV_shader_buffer_load. As I understand it, I'm allocating memory as a texture buffer and then getting a pointer to that memory with NVIDIA's extension.

The #include is handled by my own parsing code, which inserts the header. Yes, struct arrays were possible, but not globally shared ones, which is what I want. This is essentially GPGPU without having to interface with CUDA or OpenCL, since the end result is still GL rendering. For that purpose I don't think UBOs are what I want.

I guess my question should have been: Where can I find the struct aligning/packing rules for nvidia's bindless graphics?
The answer is here: http://developer.download.nvidia.com/opengl/specs/GL_NV_shader_buffer_load.txt
(no idea how I missed it as I'm pretty sure I searched the page)

I have never had to align structs in C++, but it would be useful to be able to access arbitrary structs from either end. A compiler option to switch on NVIDIA's GLSL struct alignment would be amazing.

There still remains the problem of access speed. It may be a bug on my part (more likely), but perhaps the overhead of packing the data outweighs the cache advantages of interleaving on current hardware?
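One way to test the interleaving question is to keep the same data in both layouts, interleaved (array of structs) and separate (struct of arrays), and time each on the GPU. A minimal sketch of the CPU-side conversion, assuming one buffer per member would be uploaded for the separate layout; `ParticlesSoA` and `toSoA` are hypothetical names, and this says nothing about which layout will actually be faster.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec4 = std::array<float, 4>;

// Interleaved layout, matching particle_header.h.
struct Particle {
    Vec4 position;
    Vec4 velocity;
};

// Separate layout: each member is stored contiguously.
struct ParticlesSoA {
    std::vector<Vec4> positions;
    std::vector<Vec4> velocities;
};

// Splits interleaved particles into one array per member.
ParticlesSoA toSoA(const std::vector<Particle>& aos) {
    ParticlesSoA soa;
    soa.positions.reserve(aos.size());
    soa.velocities.reserve(aos.size());
    for (const Particle& p : aos) {
        soa.positions.push_back(p.position);
        soa.velocities.push_back(p.velocity);
    }
    assert(soa.positions.size() == aos.size());
    return soa;
}
```

With the separate layout, a shader that only updates positions touches half as many bytes per particle; with the interleaved one, a shader that reads both members gets them from neighbouring addresses. Which effect wins is a measurement, not something the layout alone decides.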