Thread: SSBO and VBO help

  1. #21
    Senior Member OpenGL Lord
    Join Date
    May 2009
    Posts
    6,008
    Have you done any actual performance testing or profiling to determine where your bottleneck is? Because you seem to be assuming that uploading 10,000 sprites is your bottleneck, which seems decidedly unlikely. If you really can't transfer more than a few hundred kilobytes per second to your GPU, then your card has serious problems. It seems far more likely that you're screwing something else up.

  2. #22
    I did as much performance testing as Visual Studio allows. It claims that most of my time is spent in nvoglv64.dll and gdi32.dll, which is everything OpenGL-related. I would like to know what other methods I could use to find the issue.

    Rendering is still a big part of it, though: if I disable texturing, the frame rate jumps from 2-3 FPS to 23-27 FPS for 10000 objects. I should also probably mention that I run it on a 2012 laptop.

    If you really can't transfer more than a few hundred kilobytes per second to your GPU, then your card has serious problems
    Actually, it's almost 4.7 megabytes: 108 bytes of object data × 10000 objects for the SSBO (1.08 MB), plus 60 bytes of vertex data × 6 vertices × 10000 objects for the VBO (3.6 MB).
    Last edited by CaptainSnugglebottom; 02-16-2018 at 08:45 PM.

  3. #23
    Member Regular Contributor
    Join Date
    May 2016
    Posts
    465
    As far as I can tell, you are using the mapped buffer pointer to update the buffer in a for-loop, which means the buffer is mapped the whole time. Alternatively, you can build a local CPU-side buffer (std::vector, ::reserve(buffersize)), set the data in that buffer, and then upload it in one go, either with glBufferSubData() or glMapBufferRange() (or via buffer streaming if the previous data isn't relevant anymore).
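
    For example (a rough sketch; the vertex struct and helper names are placeholders, not your actual types):

    Code :
    	struct Vertex { float position[3]; float uv[2]; };   // illustrative layout
     
    	std::vector<Vertex> cpuBuffer;                       // needs <vector>
    	cpuBuffer.reserve(vertexCount);                      // allocate once, reuse every frame
     
    	for (size_t i = 0; i < vertexCount; ++i)
    		cpuBuffer.push_back(makeVertex(i));              // hypothetical fill function
     
    	// one upload instead of many small writes through a mapped pointer
    	glBindBuffer(GL_ARRAY_BUFFER, vbo);
    	glBufferSubData(GL_ARRAY_BUFFER, 0, cpuBuffer.size() * sizeof(Vertex), cpuBuffer.data());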

  4. #24
    Senior Member OpenGL Guru Dark Photon's Avatar
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    4,394
    Batch Rendering using SSBO

    SSBO will contain an array of structs that contains all data for each object
    struct objectVarsData {
    float posVec[3];
    float rotVec[3];
    ...
    SSBO + VBO only provides a 2x performance increase over calling draw for each object separately. For 10000 sprites, with texturing and alpha channels, I get around 3 FPS (up from 1-1.5 FPS).

    ...I thought the performance remained low due to me re-writing the SSBO/VBO with the same object data that remains constant.

    ...Also I technically draw things twice, due to the nature of my renderer, and stage 4 requires its own FBO since OpenGL doesn't handle feedback loops.

    ...Also I redraw things a couple of times after that, which I can remove completely if I use my brain and move things around.
    This whole thread is wandering around in the weeds.

    You took something that sounds really simple, you made it a lot more complex, and you still have very poor performance to show for it.

    Rather than try to optimize your much-more-complex tech approach...

    I'd suggest you ignore your tech approach for a second, pop back up to the top level, and tell us what you are trying to accomplish. What's the big picture? Are you just drawing a bunch of point sprites (quads) with texturing and alpha? Is it more complicated than that? If so, how? Then sketch out your original (non-SSBO/non-VBO/etc.) implementation for us (show some code snippets). Also, tell us what GPU/driver/OS you are targeting, the number of sprites you're aiming to render, and at what target frame time. You're more likely to get good performance in the end with this route.

    I'd recommend that you first understand clearly why your original (non-SSBO/VBO/etc.) implementation is slow, and what you need to change (minimally) to remove its primary bottlenecks and net you good performance. Folks here can help you with that.

    I did as much performance testing as Visual Studio allows.
    It claims that most of my time is spent in nvoglv64.dll and gdi32.dll, which is everything OpenGL-related.
    OK, so you're GL-driver (CPU) and/or GPU performance-limited, which means that to get better performance you need to change how you're using OpenGL to drive the GPU.

    There are other ways to profile GPU-based apps than running the MSVS Profiler on them. For instance, having "feature toggles" in your app where you can switch on/off various pieces of your draw loop for debugging can be useful for isolating how much frame time each feature takes.
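
    For example, something as simple as this (illustrative flags and helper names, not your code):

    Code :
    	// Flip these at runtime (keypresses, a config file, ...) and compare
    	// frame times to isolate what each feature costs.
    	bool enableTexturing = true;
    	bool enableAlphaPass = true;
    	bool enableDeferred  = true;
     
    	if (enableTexturing) bindMaterialTextures();     // hypothetical helpers
    	drawSolidObjects();
    	if (enableDeferred)  runDeferredShadingPass();
    	if (enableAlphaPass) drawTransparentObjects();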

    Would instancing be possible for different geometries within the same draw call?
    Please explain what's different about the geometries. Do these sprites you're rendering have different numbers of vertices (e.g. != 4)?
    Last edited by Dark Photon; 02-17-2018 at 08:10 AM.

  5. #25
    I'd suggest you ignore your tech approach for a second, pop back up to the top level, and tell us what you are trying to accomplish. What's the big picture? Are you just drawing a bunch of point sprites (quads) with texturing and alpha? Is it more complicated than that? If so, how? Then sketch out your original (non-SSBO/non-VBO/etc.) implementation for us (show some code snippets). Also, tell us what GPU/driver/OS you are targeting, the number of sprites you're aiming to render, and at what target frame time. You're more likely to get good performance in the end with this route.
    For educational purposes, I am building a multi-purpose engine. The idea is to be able to support triangle-based geometry, lines, and points. Right now I am doing the 2D rendering pipeline, where I assume all triangle-based shapes are flat and ordered (since there's usually some sort of hierarchy to 2D graphics, with the most important objects on top). In both the 2D and 3D pipelines, I will be using (already have, but it's disabled) the deferred shading technique as a way to optimize shading operations. Since deferred shading inherently does not work well with transparent objects, I had to separate operations into 4 stages:

    Stage 1: Render solids (alpha == 1) in FBO1
    Stage 2: Do deferred shading, save to FBO2
    Stage 3: Render alphas, using the pre-rendered depth buffer from stage 1 to discard all fragments covered by non-transparent objects. Each alpha fragment is rendered with shading applied.
    Stage 4: Render the layer's output to SceneFBO

    This is done for each layer, with the results from each layer rendered on top of each other in stage 4. Also, for each object I render control geometry to its own output, where each object has its own unique application-wide control value. After all layers are rendered, the pixel under the mouse is read back to trigger a flag in whatever object the mouse is pointing at.
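
    Roughly, that readback amounts to something like this (an illustrative sketch rather than my actual code; it assumes the control values land in an integer color attachment):

    Code :
    	// read the control value of the object under the mouse, assuming the
    	// control geometry was rendered into a GL_R32UI color attachment
    	glBindFramebuffer(GL_READ_FRAMEBUFFER, controlFBO);   // hypothetical FBO name
    	glReadBuffer(GL_COLOR_ATTACHMENT0);
     
    	GLuint objectID = 0;
    	glReadPixels(mouseX, mouseY, 1, 1, GL_RED_INTEGER, GL_UNSIGNED_INT, &objectID);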

    Beyond that, I can't really show you code snippets because I wouldn't know where to start. It's about 5000 lines of object-oriented code right now.

    Most of the things I mentioned I implemented by issuing a draw call for each object. So right now I am trying to learn something new while fixing the performance issue. I am targeting Windows, but I try to use cross-platform libraries in case I need to run my engine on a Linux machine.

    OK, so you're GL-driver (CPU) and/or GPU performance-limited, which means that to get better performance you need to change how you're using OpenGL to drive the GPU.

    There are other ways to profile GPU-based apps than running the MSVS Profiler on them. For instance, having "feature toggles" in your app where you can switch on/off various pieces of your draw loop for debugging can be useful for isolating how much frame time each feature takes.
    I absolutely understand that drawing 10000+ objects by constantly uploading object data into buffers is insanity. It has bad design written all over it, which is why I will be implementing batch rendering next.

    Please explain what's different about the geometries. Do these sprites you're rendering have different numbers of vertices (e.g. != 4)?
    Yes exactly.

    Right now I am thinking about making prototypes of each object type that contain their own geometry data and VBO location for batch rendering. That way, when I make an object, I can use instanced rendering to draw the vertex data from the predefined VBO. That should remove any need to update the vertex data at all, which is 77% of the data I upload to the video card every frame right now.
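
    Something along these lines (a sketch of the idea with illustrative names; the per-object data would still come from the SSBO, indexed in the shader via gl_InstanceID):

    Code :
    	// hypothetical prototype record: where this shape's vertices live
    	// in a static, pre-filled shared VBO
    	struct Prototype {
    		GLint   firstVertex;   // offset into the shared VBO
    		GLsizei vertexCount;   // each prototype can have a different count
    	};
     
    	// per frame: vertex data is never re-uploaded, only per-object data
    	for (const Prototype &p : prototypes) {
    		GLsizei instances = instanceCountFor(p);   // hypothetical lookup
    		glDrawArraysInstanced(GL_TRIANGLES, p.firstVertex, p.vertexCount, instances);
    	}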

  6. #26
    Member Regular Contributor
    Join Date
    May 2016
    Posts
    465
    I suggest you start reading about rendering techniques ("OpenGL Superbible", "OpenGL Programming Guide", and other books/articles by NVIDIA and so on).

  7. #27
    Senior Member OpenGL Lord
    Join Date
    May 2009
    Posts
    6,008
    I am building a multi-purpose engine.
    It should be noted that "performance" and "multi-purpose" don't go together. Imposing limitations on your scene is what allows you to be able to make optimizations. The more options you give to the user, the fewer options you leave for optimization.

    In both the 2D and 3D pipelines, I will be using (already have, but it's disabled) the deferred shading technique as a way to optimize shading operations.
    ... why would you need to use deferred shading for 2D rendering? I could understand needing deferred shading if you're rendering billboards or something, but most 2D sprite rendering doesn't even use lighting.

    Right now I am thinking about making prototypes of each object type that contain their own geometry data and VBO location for batch rendering. That way, when I make an object, I can use instanced rendering to draw the vertex data from the predefined VBO. That should remove any need to update the vertex data at all, which is 77% of the data I upload to the video card every frame right now.
    Until you have positively identified the bottleneck, you should not be making those kinds of decisions. After all, what good does it do to reduce your data uploads by 77% if data uploading is not what's causing your performance problem?

    The best way to figure this out is to reduce everything down to just the OpenGL stuff. Rip out your entire engine (or just open up a new OpenGL project), and rebuild just the sequence of operations needed to produce the output. The best way to do that is to get an OpenGL trace tool, have it spit out a log of OpenGL commands, and then put those commands in your new application.

    From there, start profiling. Use timer queries to figure out how long operations on the GPU are taking. Pull things out and see if it improves performance. Start figuring out what is causing your problem.
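
    For example, a basic timer query around a block of draw calls looks roughly like this (a sketch; a real setup would buffer queries a frame or two deep so that reading the result doesn't stall the pipeline):

    Code :
    	GLuint query;
    	glGenQueries(1, &query);
     
    	glBeginQuery(GL_TIME_ELAPSED, query);
    	// ... the draw calls you want to measure ...
    	glEndQuery(GL_TIME_ELAPSED);
     
    	// GL_QUERY_RESULT blocks until the GPU has finished the measured work
    	GLuint64 elapsedNs = 0;
    	glGetQueryObjectui64v(query, GL_QUERY_RESULT, &elapsedNs);
    	printf("GPU time: %.3f ms\n", elapsedNs / 1.0e6);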

    Only when you know what the problem is can you actually solve it.

  8. #28
    It should be noted that "performance" and "multi-purpose" don't go together. Imposing limitations on your scene is what allows you to be able to make optimizations. The more options you give to the user, the fewer options you leave for optimization.
    I know that, but I was hoping to get something more than 3 FPS for something basic like 10k sprites. The question is how to achieve that, and that's why I wanted to try batch rendering.

    ... why would you need to use deferred shading for 2D rendering? I could understand needing deferred shading if you're rendering billboards or something, but most 2D sprite rendering doesn't even use lighting.
    I must be nuts, but I want to try something not a lot of people do.

    Until you have positively identified the bottleneck, you should not be making those kinds of decisions. After all, what good does it do to reduce your data uploads by 77% if data uploading is not what's causing your performance problem?
    I played around with the "comment out" tool, and it appears that most of my performance loss is in the fragment shaders, which use too many if statements. Rendering a plain color with the alpha pass disabled gives me 15+ FPS for 10000 objects. For 1000 objects, the frame rate goes from 30 FPS to 130 FPS if I disable everything. So the main reason for my slowdown is the uber-shader, but due to the nature of what I want to do, I guess I cannot change that.

    Still, though, that does not mean I should not be looking into other things. Drawing just 2000 triangles at 130 FPS (with all effects off) is still not enough.
    Last edited by CaptainSnugglebottom; 02-17-2018 at 08:49 PM.

  9. #29
    Senior Member OpenGL Lord
    Join Date
    May 2009
    Posts
    6,008
    Oh and stop reporting performance as "FPS". Performance is best measured in actual frame time.
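
    For reference: frame time (ms) = 1000 / FPS, so 3 FPS is about 333 ms per frame, 30 FPS is about 33 ms, and 130 FPS is under 8 ms. That's also why FPS deltas mislead: going from 30 to 130 FPS saves about 26 ms per frame, while going from 3 to 4 FPS saves about 83 ms.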

  10. #30

    Persistent Buffers

    Hello again,


    I have stumbled upon a Steam game-dev conference that featured a presentation on modern techniques for vertex data streaming, in particular a method that utilizes persistently mapped buffers. I implemented the solution the presenter proposed, replacing my SSBO and VBO with persistent ones.


    Initialization:
    Code :
    	// Object SSBO
    	glGenBuffers(1, &(this->objectSSBO));
    	glBindBuffer(GL_SHADER_STORAGE_BUFFER, this->objectSSBO);
     
    	glBufferStorage(GL_SHADER_STORAGE_BUFFER, graphics2DMaximumSSBOSize_Byte * 3, NULL, GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT);
     
    	this->objectSSBOAddrStart = (graphics2DObjectData *) glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, graphics2DMaximumSSBOSize_Byte * 3, GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT);
     
    	glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
     
    	// Object VBO + VAO
    	glGenBuffers(1, &(this->objectVBO));
    	glBindBuffer(GL_ARRAY_BUFFER, this->objectVBO);
     
    	glGenVertexArrays(1, &(this->objectVAO));
    	glBindVertexArray(this->objectVAO);
    	glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(graphics2DObjectVertexData), (GLvoid*)offsetof(graphics2DObjectVertexData, position));
    	glEnableVertexAttribArray(0);
    	glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, sizeof(graphics2DObjectVertexData), (GLvoid*)offsetof(graphics2DObjectVertexData, uvCoordinates));
    	glEnableVertexAttribArray(1);
    	glVertexAttribIPointer(2, 1, GL_UNSIGNED_INT, sizeof(graphics2DObjectVertexData), (GLvoid*)offsetof(graphics2DObjectVertexData, objectIndex));
    	glEnableVertexAttribArray(2);
     
    	glBufferStorage(GL_ARRAY_BUFFER, graphics2DMaximumVBOSize_Byte * 3, NULL, GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT);
     
    	this->objectVBOAddrStart = (graphics2DObjectVertexData *) glMapBufferRange(GL_ARRAY_BUFFER, 0, graphics2DMaximumVBOSize_Byte * 3, GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT);
     
    	glBindBuffer(GL_ARRAY_BUFFER, 0);

    Synchronization:
    Code :
    	// Waiting for the buffer section we are about to overwrite
    	GLenum waitStatus = GL_UNSIGNALED;
    	if (this->subSceneSync) {
    		while ((waitStatus != GL_ALREADY_SIGNALED) && (waitStatus != GL_CONDITION_SATISFIED))
    		{
    			waitStatus = glClientWaitSync(this->subSceneSync, GL_SYNC_FLUSH_COMMANDS_BIT, 1);
    		}
    	}
     
    	this->objectVBOAddr = this->objectVBOAddrStart + this->currentBuffer * graphics2DMaximumVBOSize_Byte;
    	this->objectSSBOAddr = this->objectSSBOAddrStart + this->currentBuffer * graphics2DMaximumSSBOSize_Byte;
     
    	/////////////////////////////////////
    	//    FETCH AND RENDER HERE
    	/////////////////////////////////////
     
    	this->currentBuffer = (this->currentBuffer + 1) % 3;
     
    	// Locking the buffer
    	if (this->subSceneSync) glDeleteSync(this->subSceneSync);
    	this->subSceneSync = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

    Rendering:
    Code :
    	glBindFramebuffer(GL_FRAMEBUFFER, this->subSceneFBO1);
    	glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
    	glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
     
    	glUseProgram(graphics2DStage1ObjectShader);
     
    	glActiveTexture(GL_TEXTURE0);
    	glBindTexture(GL_TEXTURE_2D, this->textureAsset->colorMapID);
    	glUniform1i(graphics2DStage1ObjectColorMapLocation, 0);
     
    	glActiveTexture(GL_TEXTURE1);
    	glBindTexture(GL_TEXTURE_2D, this->textureAsset->normalMapID);
    	glUniform1i(graphics2DStage1ObjectNormalMapLocation, 1);
     
    	glActiveTexture(GL_TEXTURE2);
    	glBindTexture(GL_TEXTURE_2D, this->textureAsset->specularMapID);
    	glUniform1i(graphics2DStage1ObjectSpecularMapLocation, 2);
     
    	glActiveTexture(GL_TEXTURE3);
    	glBindTexture(GL_TEXTURE_2D, this->textureAsset->lightMapID);
    	glUniform1i(graphics2DStage1ObjectLightMapLocation, 3);
     
    	glEnable(GL_DEPTH_TEST);
    	glDepthMask(GL_TRUE);
    	glDisable(GL_BLEND);
     
    	// Binding SSBO
    	glBindBuffer(GL_SHADER_STORAGE_BUFFER, this->objectSSBO);
    	glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, this->objectSSBO);
     
    	// Binding VBO and drawing this frame's section of the buffer
    	glBindVertexArray(this->objectVAO);
    	glBindBuffer(GL_ARRAY_BUFFER, this->objectVBO);
    	glDrawArrays(GL_TRIANGLES, graphics2DMaximumVerteces * this->currentBuffer, vertexIndex);


    These are the only major changes from the last working version of my thing.

    However, nvoglv64.dll crashes during SSBO data filling for the very first object. The addresses seem to be good, and all the buffer switching is proper as well. My video card does support the ARB_buffer_storage extension. Is there anything else I can check to make sure everything is in working order?

    The program also crashes with just the VBO being persistent, but then it actually makes it past frame 1, so I don't think having 2 persistent buffers is the issue.

    Suggestions are appreciated.
    Last edited by CaptainSnugglebottom; 02-18-2018 at 10:33 PM.
