Part of the Khronos Group
OpenGL.org


Thread: SSBO and VBO help

  1. #11
    Senior Member OpenGL Guru Dark Photon's Avatar
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    4,475
    Quote Originally Posted by CaptainSnugglebottom View Post
Most importantly, since glMapBuffer is so much easier to use, I was wondering whether I can use glMapBuffer at the same time, for both the VBO and the SSBO.
    You should be able to, yes.

However, after you get your correctness bugs fixed and start looking at performance, I think you may find (as I have) that you get much better performance by avoiding glMapBuffer(). Unless you orphan the buffer before you call it, it is likely to cause internal synchronization in the GL driver (read: your CPU draw thread is prevented from running while the GL driver and GPU catch up with the work you've fed them so far).

    Better to look at using PERSISTENT+COHERENT mapped buffers, or UNSYNCHRONIZED buffer mapping in combination with buffer orphaning. For details, see Buffer Object Streaming in the wiki. At least familiarize yourself with how to orphan a buffer, and try it when you hit odd buffer update stalls.
    Last edited by Dark Photon; 02-14-2018 at 05:09 AM.

  2. #12
    Thank you all for the answers. I have started repackaging my structs to avoid using vec3s.

The major issue was incorrect attribute specification (I buffered an integer as a float for object indexing); fixing that, together with redoing my structs, solved the problem, and I can actually render things properly.

    However, as I was adding stuff to the structs, I actually stumbled upon another issue that might have been preventing my program from working.

    This is the struct that I used in C++ and its GLSL counterpart the last time the code worked:

    Code :
struct graphicsObjectData {
	float	posVec[3];		// 12	12
	float	rotVec[3];		// 12	24
	float	sclVec[3];		// 12	36
	int		textureLayer;	// 4	40
};

    Code :
struct objectData {
	vec2 posXY;			// 8	8
	vec2 posZ_rotX;		// 8	16
	vec2 rotYZ;			// 8	24
	vec2 sclXY;			// 8	32
	float sclZ;			// 4	36
	int textureLayer;	// 4	40
};

    And the data was uploaded using glBufferSubData:

    Code :
glBindBuffer(GL_SHADER_STORAGE_BUFFER, this->objectSSBO);
 
objectDataBuffer.posVec[0] = object->posVec[0];
objectDataBuffer.posVec[1] = object->posVec[1];
objectDataBuffer.posVec[2] = 1.0 - graphics2DDepthChange*(*objectCount); // change if the triangles are outside the camera limit
objectDataBuffer.rotVec[0] = object->rotVec[0];
objectDataBuffer.rotVec[1] = object->rotVec[1];
objectDataBuffer.rotVec[2] = object->rotVec[2];
objectDataBuffer.sclVec[0] = object->scaleVec[0];
objectDataBuffer.sclVec[1] = object->scaleVec[1];
objectDataBuffer.sclVec[2] = object->scaleVec[2];
objectDataBuffer.textureLayer = object->textureLayer;
 
glBufferSubData(GL_SHADER_STORAGE_BUFFER, sizeof(graphicsObjectData)*objectIndex,
                sizeof(graphicsObjectData), &objectDataBuffer);

    ... however the moment I add another integer to the struct, the problem of objects being messed up returns. The new structs from C++ and GLSL are:

    Code :
struct graphicsObjectData {
	float	posVec[3];		// 12	12
	float	rotVec[3];		// 12	24
	float	sclVec[3];		// 12	36
	int		textureLayer;	// 4	40
	int		drawEnable;		// 4	44
};

    Code :
struct objectData {
	vec2 posXY;			// 8	8
	vec2 posZ_rotX;		// 8	16
	vec2 rotYZ;			// 8	24
	vec2 sclXY;			// 8	32
	float sclZ;			// 4	36
	int textureLayer;	// 4	40
	int drawEnable;		// 4	44
};

Since this is the only difference between the working state and the broken one, I think adding two integers in a row causes some sort of offset shift in the struct layout, if not in GLSL then definitely in C++. I was wondering whether there's a solution for this.
    Last edited by CaptainSnugglebottom; 02-14-2018 at 04:25 PM.

  3. #13
    Senior Member OpenGL Guru
    Join Date
    Jun 2013
    Posts
    2,925
    Quote Originally Posted by CaptainSnugglebottom View Post
    Thank you all for the answers. I have started repackaging my structs to avoid using vec3s.
    Unless the amount of memory involved is significant, I'd suggest the opposite: add padding fields to the C++ structure so that you can just use vec3s in the GLSL structures. Or move the int/float fields to occupy what would otherwise be padding.

    Quote Originally Posted by CaptainSnugglebottom View Post
Since this is the only difference between the working state and the broken one, I think adding two integers in a row causes some sort of offset shift in the struct layout, if not in GLSL then definitely in C++. I was wondering whether there's a solution for this.
    What is sizeof(graphicsObjectData)? On a 64-bit system, it's conceivable that C++ is rounding the size to a multiple of 8 (implementations are free to add any amount of padding anywhere other than before the first member).

  4. #14
    Unless the amount of memory involved is significant, I'd suggest the opposite: add padding fields to the C++ structure so that you can just use vec3s in the GLSL structures. Or move the int/float fields to occupy what would otherwise be padding.

    What is sizeof(graphicsObjectData)? On a 64-bit system, it's conceivable that C++ is rounding the size to a multiple of 8 (implementations are free to add any amount of padding anywhere other than before the first member).
I actually realized what the problem was, based on this Stack Overflow page. Not the first time GLSL's weird packing rules have caused issues. I think I'm starting to dislike them.

sizeof(graphicsObjectData) is actually what it's supposed to be; right now it's 44 bytes, at least with Visual Studio. GCC might be doing it differently.

Right now I've decided to separate the floats so I define each float component individually, but I was wondering whether building vec3(posX, posY, posZ) in the shader actually costs any performance.

I will give padding a go though, just for practice. I am planning to draw up to 10 million objects, and even then I'd only lose a few mebibytes, which shouldn't be a problem for an SSBO (which is why I decided to use one).

Edit: On the other hand, if I have 10+ million extra bytes to send to the GPU due to padding, it might slow things down quite a bit.
    Last edited by CaptainSnugglebottom; 02-14-2018 at 05:40 PM.

  5. #15
    Senior Member OpenGL Guru
    Join Date
    Jun 2013
    Posts
    2,925
    Quote Originally Posted by CaptainSnugglebottom View Post
    I actually realized what the problem was, based on this Stack Overflow page.
    I'm not sure why I overlooked that in my previous post. If a structure contains vec2s, the structure itself needs to be aligned to a multiple of 8, which means that its size will be a multiple of 8.

    Quote Originally Posted by CaptainSnugglebottom View Post
Not the first time GLSL's weird packing rules have caused issues. I think I'm starting to dislike them.
It really isn't that weird. GLSL vectors are first-class objects, so they need to be aligned to a multiple of their size (i.e. a vec2 needs 8-byte alignment). Note that SSE imposes a similar requirement, so you should try to use generous alignment on float arrays where practical (modern compilers will try to use SSE vectorisation where possible; if you want the code to work on non-SSE CPUs, you have to request that explicitly).

    Quote Originally Posted by CaptainSnugglebottom View Post
Right now I've decided to separate the floats so I define each float component individually, but I was wondering whether building vec3(posX, posY, posZ) in the shader actually costs any performance.
    Depending upon the GPU architecture, it may improve it. On GPUs where everything is a vec4, putting all of the components in the same vec4 will be more efficient for operations which use all of the components. But I have no idea whether that's applicable to any GPU which supports SSBOs.

    Quote Originally Posted by CaptainSnugglebottom View Post
    I am planning to draw up to 10 million objects.
    In which case, packing may be worthwhile. But I'd suggest first trying reordering so that the scalar fields use the "spare" components.

  6. #16
    Hmm, well I finished the thing, and SSBO + VBO only provides a 2x performance increase over calling draw for each object separately. For 10000 sprites, with texturing and alpha channels, I get around 3 FPS (up from 1-1.5 FPS). Kind of a let down, lol.

    I wonder how else I can change the pipeline to get a decent performance.

Would instancing be possible for different geometries within the same draw call? That is, could I give each object its own VBO (the VAO is the same for all objects) and pass the VBO pointers, instead of going through each object and loading its data into a single big VBO?
    Last edited by CaptainSnugglebottom; 02-16-2018 at 09:58 AM.

  7. #17
    Senior Member OpenGL Lord
    Join Date
    May 2009
    Posts
    6,031
    For 10000 sprites, with texturing and alpha channels, I get around 3 FPS (up from 1-1.5 FPS).
    You can get better performance than that with immediate mode rendering. So clearly, you're doing something wrong. This would be performance I might expect if you rendered each sprite with a separate shader or something.

    Odds are good you're hitting a software path somehow.

  8. #18
I thought the performance remained low because I keep re-writing the SSBO/VBO with the same object data, which remains constant. If I find a way to avoid that, the performance should improve. Right now I am looking at multi-threading the SSBO/VBO filling. I can have both the SSBO and VBO mapped at the same time (to answer my earlier question), so technically I can split loading 10,000 objects into loading 2,500 objects in each of 4 threads. I think glMapBuffer will help prevent OpenGL throwing an out-of-context error.


    Also I technically draw things twice, due to the nature of my renderer, and stage 4 requires its own FBO since OpenGL doesn't handle feedback.

    Also I redraw things a couple of times after that, which I can remove completely if I use my brain and move things around.

But compared to the 1:1 rendering, keeping all other things the same, the performance should still be better. I am not sure where else the drop could come from.

  9. #19
    Hmm, I think I know what you meant now. Guess I will try SubDating everything now. Can't make it worse, that's for sure.

    [Attachment: PerformanceResults.jpg, 19.7 KB]

EDIT: The workload is now spread out a bit, but overall performance is just as poor. Weird. Also, apparently glBufferSubData is faster on the SSBO than on the VBO (the VBO's takes 36%+).
    Last edited by CaptainSnugglebottom; 02-16-2018 at 06:17 PM.

  10. #20
    Senior Member OpenGL Lord
    Join Date
    May 2009
    Posts
    6,031
    Have you done any actual performance testing or profiling to determine where your bottleneck is? Because you seem to be assuming that uploading 10,000 sprites is your bottleneck, which seems decidedly unlikely. If you really can't transfer more than a few hundred kilobytes per second to your GPU, then your card has serious problems. It seems far more likely that you're screwing something else up.
