Thread: Problem with glReadPixels using FBO

  1. #11
    Member Regular Contributor
    Join Date
    Jul 2012
    Posts
    460
    Since you seem to target games, and as Dark Photon mentioned, it is now very common for games to be rendered directly into FBOs, with a blit used to draw the image to the screen. Many of the effects and other techniques you'll want to add will benefit a lot from this. And since you'll render directly into an FBO, you'll have direct access to the depth texture and the color texture (and e.g. normals...). It will also allow you to render to FP textures, and will make things easier if you move to deferred rendering later.

    So consider this.
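
    To make that concrete, here's a minimal sketch of the FBO-with-depth-texture setup and the final blit. This assumes a GL 3.0+ context; the sizes and variable names are just placeholders:

    Code :
    GLuint fbo, colorTex, depthTex;
    const int width = 1280, height = 720;   // placeholder size

    // color texture
    glGenTextures(1, &colorTex);
    glBindTexture(GL_TEXTURE_2D, colorTex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

    // depth texture (can be sampled later for culling/effects)
    glGenTextures(1, &depthTex);
    glBindTexture(GL_TEXTURE_2D, depthTex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT24, width, height, 0, GL_DEPTH_COMPONENT, GL_FLOAT, NULL);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, colorTex, 0);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, depthTex, 0);
    // check completeness before using it:
    // glCheckFramebufferStatus(GL_FRAMEBUFFER) should return GL_FRAMEBUFFER_COMPLETE

    // ... render the scene into the FBO ...

    // blit the result to the default framebuffer
    glBindFramebuffer(GL_READ_FRAMEBUFFER, fbo);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);
    glBlitFramebuffer(0, 0, width, height, 0, 0, width, height, GL_COLOR_BUFFER_BIT, GL_NEAREST);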

  2. #12
    Intern Contributor
    Join Date
    Nov 2017
    Posts
    79
    Thank you guys for the help!

    I'm just trying to replicate the CryEngine solution: https://www.gamedev.net/articles/pro...chnique-r4103/

    Photon, your suggestion sounds good:

    Quote Originally Posted by Dark Photon View Post
    0) Render scene to FBO with depth buffer backed by a depth texture
    1) Reproject depth texture on GPU to generate another depth texture
    2) Use depth texture for culling, etc.

    But how can I cull meshes on the GPU side?


    On the CPU side it looks clear: check visible pixels while rasterizing (I'm still not sure about that).
    How can I keep the relation between the current mesh and its visibility on the GPU side?
    Last edited by nimelord; 01-10-2018 at 02:26 PM.

  3. #13
    Senior Member OpenGL Guru Dark Photon
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    4,399
    Quote Originally Posted by nimelord View Post
    But how can I cull meshes on the GPU side?
    There are several types of culling: frustum culling, occlusion culling, and backface culling. We'll ignore the last one here, since it's typically sub-object and the pipeline can apply it pretty efficiently.

    And to facilitate this, let's just take an example. Suppose we have 1000 instances of some object we want to render at different points in our scene. And we want to frustum cull and occlusion cull them on the GPU.

    So we start with a list of instances, each with its own bounding sphere (which we can pass down into the shader). For each instance (which we blast-render to the GPU in an instanced draw call), in the shader we can test its bounding sphere against the frustum planes to determine whether it's definitely outside the view frustum or not. If it is, we throw the instance away. If not, then we serialize this instance into a list (via transform feedback) to render with later (...with an indirect draw call).
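    As a rough sketch of that test (names and data layout are made up here; the conditional emit lives in a geometry shader, since transform feedback then only captures the instances that were actually emitted):

    Code :
    // GLSL geometry shader, embedded as a C++ string: one input point per
    // instance carrying its bounding sphere; emit it only if the sphere is not
    // completely outside any of the six (normalized, world-space) frustum planes.
    const char* cullGS = R"(
    #version 330 core
    layout(points) in;
    layout(points, max_vertices = 1) out;

    in  vec4 SphereVS[];     // xyz = sphere center, w = radius
    out vec4 VisibleSphere;  // captured via transform feedback

    uniform vec4 FrustumPlanes[6];

    void main() {
        vec4 s = SphereVS[0];
        for (int i = 0; i < 6; ++i)
            if (dot(FrustumPlanes[i].xyz, s.xyz) + FrustumPlanes[i].w < -s.w)
                return;        // fully outside this plane: cull, emit nothing
        VisibleSphere = s;     // potentially visible: serialize it out
        EmitVertex();
        EndPrimitive();
    }
    )";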

    If you also want to occlusion cull (sounds like you're interested in this), then you have a number of options. You can do occlusion query tests against the depth buffer using bounding primitives for each instance, and then conditional render each, though for a lot of instances that could get pretty expensive if done in the usual way. Another option is to pre-generate a MIPmap of your depth map and then in a shader you can perform a conservative occlusion test by reading 1-4 samples out of the appropriate level of your depth MIPmap which cover your object. The nice thing about that approach is you don't need to rasterize a bounding primitive for each instance, and you know immediately in the shader (after the depth texture lookups) whether you're going to kill off the instance or not. You can even combine this into the same shader that does frustum culling above and then only serialize out the list of instances that pass both 1) the frustum-cull and 2) the occlusion-cull (which you can then render with an indirect draw call).
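    A sketch of that conservative Hi-Z test (illustrative only; this assumes the depth MIPmap was reduced with max(), so each texel holds the farthest depth underneath it, and the names are made up):

    Code :
    // GLSL snippet of the conservative Hi-Z occlusion test (illustrative only):
    // uv          = projected bounding-sphere center in [0,1] texture space
    // sizePixels  = projected sphere diameter in pixels
    // sphereDepth = nearest window-space depth of the sphere, in [0,1]
    float lod = ceil(log2(max(sizePixels, 1.0) / 2.0));  // 4 fetches cover the sphere
    vec4 d;
    d.x = textureLod(HiZMap, uv, lod).r;
    d.y = textureLodOffset(HiZMap, uv, lod, ivec2(1, 0)).r;
    d.z = textureLodOffset(HiZMap, uv, lod, ivec2(0, 1)).r;
    d.w = textureLodOffset(HiZMap, uv, lod, ivec2(1, 1)).r;
    float maxDepth = max(max(d.x, d.y), max(d.z, d.w));
    bool occluded = sphereDepth > maxDepth;  // cull only if definitely behind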

    Before you go to this trouble though...

    Honestly, I'd first recommend making sure you have a fair amount of frame time you can potentially reclaim with culling (of some type) before you add any of this.

    First start by doing on-CPU frustum culling. Then compare the draw time needed to render the scene without this culling applied, against the time needed to render the scene "with" this culling applied. Don't count the cost of actually doing the culling, though. If you don't see much difference, don't bother with frustum culling. If you see a fairly big difference, definitely implement per-object or per-instance frustum culling, on the CPU and/or the GPU.

    After doing this, do the same test for occlusion culling. That is, compare draw time w/o occlusion culling applied to draw time w/ occlusion culling having been applied, and don't count the time needed to do the occlusion culling (yet). No big difference? Dump occlusion culling. Big difference? Consider implementing it in some form.
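
    One way to take those draw-time measurements is with GL timer queries (a sketch; requires GL 3.3 or ARB_timer_query):

    Code :
    GLuint timerQuery;
    glGenQueries(1, &timerQuery);

    glBeginQuery(GL_TIME_ELAPSED, timerQuery);
    // ... issue only the draw calls you want to measure ...
    glEndQuery(GL_TIME_ELAPSED);

    // note: reading the result waits for the GPU to finish; fine for profiling
    GLuint64 elapsedNs = 0;
    glGetQueryObjectui64v(timerQuery, GL_QUERY_RESULT, &elapsedNs);
    double elapsedMs = elapsedNs / 1.0e6;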
    Last edited by Dark Photon; 01-10-2018 at 05:51 PM.

  4. #14
    Intern Contributor
    Join Date
    Nov 2017
    Posts
    79
    Quote Originally Posted by Dark Photon View Post
    So we start with a list of instances, each with its own bounding sphere (which we can pass down into the shader). For each instance (which we blast-render to the GPU in an instanced draw call), in the shader we can test its bounding sphere against the frustum planes to determine whether it's definitely outside the view frustum or not. If it is, we throw the instance away. If not, then we serialize this instance into a list (via transform feedback) to render with later (...with an indirect draw call).
    Do you mean this loop in the pipeline? (I didn't know about such a possibility):
    [Attached image: gl_pipeline.jpg (OpenGL pipeline diagram)]

    Quote Originally Posted by Dark Photon View Post
    Then compare the draw time needed to render the scene without this culling applied, against the time needed to render the scene "with" this culling applied.
    My overall target is a massive open forest with ruined buildings.
    Right now I have a small test scene with boxes instead of the designed trees.
    The raw solution without culling produces 30 FPS for the small test scene, and for a medium scene the FPS falls dramatically.
    I implemented frustum culling on the CPU side (I didn't know about the pipeline loop before your previous message),
    and it sped the rendering up from 30 FPS to 120 FPS (x4, a good result I think).
    But that is not enough for my target; it definitely needs occlusion culling as well.


    So I have to implement occlusion culling,
    and I will research the technical details of the solution you described here.
    Thank you very much.



    PS:
    In my culling process I apply frustum culling for the shadow maps too.
    For rendering the shadow map I use the meshes already filtered for the camera, with a correction for "shadow visible in scene" visibility,
    and I intersect that set with the frustum-culling result from the light's point of view.
    So I reuse the scene's filtered result when culling from the shadow point of view.
    Can I do that kind of reuse in the GPU variant?
    Last edited by nimelord; 01-11-2018 at 01:01 AM.

  5. #15
    Senior Member OpenGL Guru Dark Photon
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    4,399
    Quote Originally Posted by nimelord View Post
    Do you mean this loop in the pipeline? (I didn't know about such a possibility):
    No, that's the old, deprecated GL_FEEDBACK mode that was present in very early OpenGL versions.

    I'm talking about GL_TRANSFORM_FEEDBACK. GL_FEEDBACK is somewhat similar in concept. However, it captures all of the data in a CPU memory buffer, whereas GL_TRANSFORM_FEEDBACK lets you capture the data in a buffer object on the GPU, which is better for efficiency (you want the data on the GPU anyway, not all the way back across the bus in CPU memory). See this wiki page for details: Transform Feedback (OpenGL Wiki).
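
    A minimal sketch of that GL 3.x transform feedback flow (buffer/program names are placeholders; the program is assumed to emit one point per surviving instance, as sketched earlier):

    Code :
    // at link time: declare which shader output to capture
    const char* varyings[] = { "VisibleSphere" };
    glTransformFeedbackVaryings(cullProg, 1, varyings, GL_INTERLEAVED_ATTRIBS);
    glLinkProgram(cullProg);

    // per frame: capture the surviving instances into a GPU buffer object
    glEnable(GL_RASTERIZER_DISCARD);              // we only want the captured data
    glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, visibleBuf);
    glBeginQuery(GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN, countQuery);
    glBeginTransformFeedback(GL_POINTS);
    glDrawArrays(GL_POINTS, 0, instanceCount);    // one point per candidate instance
    glEndTransformFeedback();
    glEndQuery(GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN);
    glDisable(GL_RASTERIZER_DISCARD);

    // how many instances survived culling?
    GLuint survivors = 0;
    glGetQueryObjectuiv(countQuery, GL_QUERY_RESULT, &survivors);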

    For a glimpse of where this fits within the OpenGL rendering pipeline, see the 4 or so "Transform Feedback" mentions in the middle of the OpenGL Pipeline map here: OpenGL 4.4 Pipeline Map (thanks to Patrick Cozzi for hosting it).

    The raw solution without culling produces 30 FPS for the small test scene, and for a medium scene the FPS falls dramatically.
    I implemented frustum culling on the CPU side (I didn't know about the pipeline loop before your previous message),
    and it sped the rendering up from 30 FPS to 120 FPS (x4, a good result I think).
    But that is not enough for my target.
    I'd recommend you compare benchmarks using frame time (e.g. milliseconds or seconds), rather than FPS. There are lots of reasons, but this blog post sums them up pretty well: Performance (Humus). For instance, FPS isn't very useful for profiling the individual consumers of your frame time (which you need to do to optimize your frame processing). Also, up-front I'd suggest that you come up with the maximum frame time you can spend doing everything in a frame. Once you get your worst case to comfortably fit within it, you're done.
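
    The conversion itself is trivial, e.g.:

    Code :
    // frame time in milliseconds = 1000 / FPS:
    //  30 FPS -> 1000.0 / 30.0  = 33.3 ms/frame
    // 120 FPS -> 1000.0 / 120.0 =  8.3 ms/frame
    double fps = 120.0;                 // example value
    double frameTimeMs = 1000.0 / fps;  // = 8.3 ms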

    In my culling process I apply frustum culling for the shadow maps too.
    For rendering the shadow map I use the meshes already filtered for the camera, with a correction for "shadow visible in scene" visibility,
    and I intersect that set with the frustum-culling result from the light's point of view.
    So I reuse the scene's filtered result when culling from the shadow point of view.
    Can I do that kind of reuse in the GPU variant?
    Sure. However, if you really are starved for performance, you don't want to end up sending much more down the pipe than you minimally have to. You should structure your scene and your rendering such that it is very, very cheap to cull away geometry which isn't in the frustum you're rendering. I'd recommend coarse-grained culling on the CPU and then, if you need it, more fine-grained culling on the GPU via a transform feedback method. For instance, there's no sense in having the GPU cull out geometry per primitive or per object instance when a whole object (or group of objects) is not even close to within the view frustum ... assuming you can cull that away quickly.
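
    For the coarse-grained CPU side, the classic test is the same sphere-vs-planes check done on the host. A sketch (the Plane/Sphere types are made up, and the six planes are assumed to have been extracted from the view-projection matrix and normalized):

    Code :
    struct Plane  { float a, b, c, d; };   // normalized: a*x + b*y + c*z + d = 0
    struct Sphere { float x, y, z, r; };

    // returns false if the sphere is completely outside any frustum plane
    bool sphereInFrustum(const Plane planes[6], const Sphere& s) {
        for (int i = 0; i < 6; ++i) {
            float dist = planes[i].a * s.x + planes[i].b * s.y
                       + planes[i].c * s.z + planes[i].d;
            if (dist < -s.r)
                return false;   // definitely outside
        }
        return true;            // potentially visible
    }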
    Last edited by Dark Photon; 01-13-2018 at 05:23 PM.

  6. #16
    Intern Contributor
    Join Date
    Nov 2017
    Posts
    79
    Quote Originally Posted by Dark Photon View Post
    I'd recommend you compare benchmarks using frame time (e.g. milliseconds or seconds), rather than FPS.... Once you get your worst case to comfortably fit within it, you're done.
    Good advice.

    For the small scene: 8.35 ms vs 33.74 ms, i.e. the same x4 result.


    Thank you a lot!

  7. #17
    Intern Contributor
    Join Date
    Nov 2017
    Posts
    79
    Quote Originally Posted by Dark Photon View Post
    Another option is to pre-generate a MIPmap of your depth map and then in a shader you can perform a conservative occlusion test by reading 1-4 samples out of the appropriate level of your depth MIPmap which cover your object. The nice thing about that approach is you don't need to rasterize a bounding primitive for each instance, and you know immediately in the shader (after the depth texture lookups) whether you're going to kill off the instance or not.

    It seems I found a good document describing the technique you meant.

    I'll just leave it here for other people searching for a solution: http://rastergrid.com/blog/2010/10/h...usion-culling/

  8. #18
    Intern Contributor
    Join Date
    Nov 2017
    Posts
    79

    Reprojection of depth map on the GPU side.

    I'm trying to reproject the depth map from the previous camera position to the current camera position.


    These are the steps I think I should take:
    1) At the end of the render cycle, save the 'projection view matrix' and the depth buffer to a texture.
    2) At the start of the next render cycle, restore world positions using the inverted 'projection view matrix' from the previous frame, and project them for the current camera position.
    3) Use the resulting data for something....

    How can I do the reprojection on the GPU side?
    I mean, I don't have positions for a vertex shader; I just have the depth texture and two 'projection view matrices' for the previous and current frames.


    Thanks for any answers.

  9. #19
    Senior Member OpenGL Guru Dark Photon
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    4,399
    There's no need to start a new thread here as this is just a continuation of the same topic.

    Realizing that this technique is going to leave you with artifacts due to using one-frame-late occlusion data...

    The first of these two URLs on "Coverage Buffer Occlusion Culling" describes one way to handle this (see reconstructPos):
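
    The gist of reconstructPos is the standard unproject/reproject round trip. Here is a sketch of that math (not the article's actual code; the uniform names are assumed):

    Code :
    // GLSL fragment shader sketch, embedded as a C++ string: reproject one
    // depth sample from the previous frame into the current frame.
    const char* reprojectFS = R"(
    #version 330 core
    uniform sampler2D PrevDepth;   // last frame's depth, in [0,1]
    uniform mat4 InvPrevViewProj;  // inverse of last frame's view-projection
    uniform mat4 CurrViewProj;     // this frame's view-projection
    in  vec2 UV;                   // [0,1] screen coords of this sample
    out vec4 Result;

    void main() {
        float z = texture(PrevDepth, UV).r;
        // back to last frame's NDC, then to world space
        vec4 ndc   = vec4(UV * 2.0 - 1.0, z * 2.0 - 1.0, 1.0);
        vec4 world = InvPrevViewProj * ndc;
        world /= world.w;
        // forward into this frame's clip space
        vec4 clip    = CurrViewProj * world;
        vec3 currNdc = clip.xyz / clip.w;
        Result = vec4(currNdc * 0.5 + 0.5, 1.0); // new uv (xy) and depth (z)
    }
    )";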

    Last edited by Dark Photon; 01-22-2018 at 05:52 AM.

  10. #20
    Intern Contributor
    Join Date
    Nov 2017
    Posts
    79
    Quote Originally Posted by Dark Photon View Post
    There's no need to start a new thread here as this is just a continuation of the same topic.
    Ok.

    I'm trying to understand this one: http://rastergrid.com/blog/2010/10/h...usion-culling/

    The full source code is here: http://rastergrid.com/blog/downloads/mountains-demo/


    Code :
    void MountainsDemo::renderScene(float dtime) {
     
    	this->drawCallCount = 0;
     
    	// update camera data to uniform buffer
    	this->transform.ModelViewMatrix = mat4(1.0f);
    	this->transform.ModelViewMatrix = rotate(this->transform.ModelViewMatrix, this->camera.rotation.x, vec3(1.0f, 0.0f, 0.0f));
    	this->transform.ModelViewMatrix = rotate(this->transform.ModelViewMatrix, this->camera.rotation.y, vec3(0.0f, 1.0f, 0.0f));
    	this->transform.ModelViewMatrix = rotate(this->transform.ModelViewMatrix, this->camera.rotation.z, vec3(0.0f, 0.0f, 1.0f));
    	this->transform.ModelViewMatrix = translate(this->transform.ModelViewMatrix, -this->camera.position);
    	this->transform.MVPMatrix = this->transform.ProjectionMatrix * this->transform.ModelViewMatrix;
    	glBindBuffer(GL_UNIFORM_BUFFER, this->transformUB);
    	glBufferSubData(GL_UNIFORM_BUFFER, 0, sizeof(this->transform), &this->transform);
     
    	// bind offscreen framebuffer
    	glBindFramebuffer(GL_FRAMEBUFFER, this->framebuffer);
    	glClear(GL_DEPTH_BUFFER_BIT);
     
    	// draw terrain
    	glUseProgram(this->terrainPO);
     
    	glBindVertexArray(this->terrainVA);
     
    	glActiveTexture(GL_TEXTURE0);
    	glBindTexture(GL_TEXTURE_2D, this->heightmap);
    	glActiveTexture(GL_TEXTURE1);
    	glBindTexture(GL_TEXTURE_2D, this->terrainTex);
    	glActiveTexture(GL_TEXTURE2);
    	glBindTexture(GL_TEXTURE_2D, this->detailTex);
     
    	bool visible[7][7] = { false };
    	this->visibleBlocks = 0;
    	// terrain elements will be drawn only in a 7x7 grid around the camera
    	float x = roundf(-this->camera.position.x / TERRAIN_OBJECT_SIZE);
    	float z = roundf(-this->camera.position.z / TERRAIN_OBJECT_SIZE);
    	for (int i=-3; i<=3; i++)
    		for (int j=-3; j<=3; j++)
    			// perform view frustum culling for the terrain elements
    			if ( cullTerrain( vec4( TERRAIN_OBJECT_SIZE*(i-x), 0.f, TERRAIN_OBJECT_SIZE*(j-z), 1.f ) ) ) {
    				glUniform2f(glGetUniformLocation(this->terrainPO, "Offset"), TERRAIN_OBJECT_SIZE*(i-x), TERRAIN_OBJECT_SIZE*(j-z));
    				glDrawElements(terrainDraw.prim_type, terrainDraw.indexCount, GL_UNSIGNED_INT, (void*)terrainDraw.indexOffset);
    				this->drawCallCount++;
    				// store visibility so we can use it during the tree instance rendering
    				visible[i+3][j+3] = true;
    				this->visibleBlocks++;
    			}
     
    	// create Hi-Z map if necessary
    	if ( this->cullMode == HI_Z_OCCLUSION_CULL ) {
    		glUseProgram(this->hizPO);
    		// disable color buffer as we will render only a depth image
    		glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    		glActiveTexture(GL_TEXTURE0);
    		glBindTexture(GL_TEXTURE_2D, this->depthTex);
    		// we have to disable depth testing but allow depth writes
    		glDepthFunc(GL_ALWAYS);
    		// calculate the number of mipmap levels for NPOT texture
    		int numLevels = 1 + (int)floorf(log2f(fmaxf(SCREEN_WIDTH, SCREEN_HEIGHT)));
    		int currentWidth = SCREEN_WIDTH;
    		int currentHeight = SCREEN_HEIGHT;
    		for (int i=1; i<numLevels; i++) {
    			glUniform2i(glGetUniformLocation(this->hizPO, "LastMipSize"), currentWidth, currentHeight);
    			// calculate next viewport size
    			currentWidth /= 2;
    			currentHeight /= 2;
    			// ensure that the viewport size is always at least 1x1
    			currentWidth = currentWidth > 0 ? currentWidth : 1;
    			currentHeight = currentHeight > 0 ? currentHeight : 1;
    			glViewport(0, 0, currentWidth, currentHeight);
    			// bind next level for rendering but first restrict fetches only to previous level
    			glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_BASE_LEVEL, i-1);
    			glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, i-1);
    			glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, this->depthTex, i);
    			// dummy draw command as the full screen quad is generated completely by a geometry shader
    			glDrawArrays(GL_POINTS, 0, 1);
    			this->drawCallCount++;
    		}
    		// reset mipmap level range for the depth image
    		glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_BASE_LEVEL, 0);
    		glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, numLevels-1);
    		// reset the framebuffer configuration
    		glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, this->colorTex, 0);
    		glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, this->depthTex, 0);
    		// reenable color buffer writes, reset viewport and reenable depth test
    		glDepthFunc(GL_LEQUAL);
    		glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    		glViewport(0, 0, SCREEN_WIDTH, SCREEN_HEIGHT);
    	}
     
    	if ( !this->showDepthTex ) {
    		// render tree instances and apply culling
    		glUseProgram(this->cullPO);
    		glUniformSubroutinesuiv(GL_VERTEX_SHADER, 1, &this->subIndexVS[this->cullMode]);
    		glUniformSubroutinesuiv(GL_GEOMETRY_SHADER, 1, &this->subIndexGS[this->LODMode ? 1 : 0]);
     
    		glEnable(GL_RASTERIZER_DISCARD);
    		glBindVertexArray(this->cullVA);
     
    		for (int i=0; i<NUM_LOD; i++)
    			glBeginQueryIndexed(GL_PRIMITIVES_GENERATED, i, this->cullQuery[i]);
     
    		glBeginTransformFeedback(GL_POINTS);
    		for (int i=-3; i<=3; i++)
    			for (int j=-3; j<=3; j++)
    				if ( visible[i+3][j+3] ) {
    					glUniform2f(glGetUniformLocation(this->cullPO, "Offset"), TERRAIN_OBJECT_SIZE*(i-x), TERRAIN_OBJECT_SIZE*(j-z));
    					glDrawArrays(GL_POINTS, 0, this->instanceCount);
    					this->drawCallCount++;
    				}
    		glEndTransformFeedback();
     
    		for (int i=0; i<NUM_LOD; i++)
    			glEndQueryIndexed(GL_PRIMITIVES_GENERATED, i);
     
    		glDisable(GL_RASTERIZER_DISCARD);
     
    		glBindVertexArray(this->terrainVA);
    		// draw skybox
    		glUseProgram(this->skyboxPO);
    		glActiveTexture(GL_TEXTURE0);
    		glBindTexture(GL_TEXTURE_2D_ARRAY, this->skyboxTex);
    		// dummy draw command as the skybox itself is generated completely by a geometry shader
    		glDrawArrays(GL_POINTS, 0, 1);
    		this->drawCallCount++;
     
    		// draw trees
    		glUseProgram(this->treePO);
    		glActiveTexture(GL_TEXTURE0);
    		glBindTexture(GL_TEXTURE_2D_ARRAY, this->treeTex);
    		glActiveTexture(GL_TEXTURE1);
    		glBindTexture(GL_TEXTURE_2D, this->terrainTex);
     
    		// get the number of instances from the query object
    		for (int i=0; i<NUM_LOD; i++) {
    			if ( this->showLODColor ) {
    				switch ( i ) {
    				case 0: glUniform4f(glGetUniformLocation(this->treePO, "ColorMask"), 1.0, 0.0, 0.0, 1.0); break;
    				case 1: glUniform4f(glGetUniformLocation(this->treePO, "ColorMask"), 0.0, 1.0, 0.0, 1.0); break;
    				case 2: glUniform4f(glGetUniformLocation(this->treePO, "ColorMask"), 0.0, 0.0, 1.0, 1.0); break;
    				}
    			}
    			glBindVertexArray(this->treeVA[i]);
    			glGetQueryObjectiv(this->cullQuery[i], GL_QUERY_RESULT, &this->visibleTrees[i]);
    			if ( this->visibleTrees[i] > 0 ) {
    				// draw the trees
    				glDrawElementsInstanced(treeDraw[i].prim_type, treeDraw[i].indexCount, GL_UNSIGNED_INT, (void*)(treeDraw[i].indexOffset*sizeof(uint)), this->visibleTrees[i]);
    				this->drawCallCount++;
    			}
    		}
    		if ( this->showLODColor ) {
    			glUniform4f(glGetUniformLocation(this->treePO, "ColorMask"), 1.0, 1.0, 1.0, 1.0);
    		}
    	}
     
    	// bind default framebuffer and render post processing
    	glBindFramebuffer(GL_FRAMEBUFFER, 0);
    	glUseProgram(this->postPO);
     
    	// visualize depth buffer texture if needed
    	if ( this->showDepthTex ) {
    		glUseProgram(this->depthPO);
    		glUniform1f(glGetUniformLocation(this->depthPO, "LOD"), this->LOD);
    	}
     
    	glActiveTexture(GL_TEXTURE0);
    	glBindTexture(GL_TEXTURE_2D, this->colorTex);
    	glActiveTexture(GL_TEXTURE1);
    	glBindTexture(GL_TEXTURE_2D, this->depthTex);
    	glDisable(GL_DEPTH_TEST);
    	// dummy draw command as the full screen quad is generated completely by a geometry shader
    	glDrawArrays(GL_POINTS, 0, 1);
    	this->drawCallCount++;
    	glEnable(GL_DEPTH_TEST);
     
    	GLenum glError;
    	if ((glError = glGetError()) != GL_NO_ERROR) {
    		cout << "Warning: OpenGL error code: " << glError << endl;
    	}
     
    }

    Where is the rendering of the first depth map?
    It must be ready before the mipmap is built, I think.
