Problem with glReadPixels using FBO



nimelord
01-08-2018, 11:57 AM
Hi, All!

I have a problem with the glReadPixels function: the result is empty.


My code:



// initialization
depthMapFBO = glGenBuffers();
glBindBuffer(GL_PIXEL_PACK_BUFFER, depthMapFBO);
glBufferData(GL_PIXEL_PACK_BUFFER, display.getWidth() * display.getHeight() * 4, GL_DYNAMIC_DRAW);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
....

// later, once the scene has been rendered
glBindBuffer(GL_PIXEL_PACK_BUFFER, depthMapFBO);
ByteBuffer buffer = BufferUtils.createByteBuffer(display.getWidth() * display.getHeight() * 4);
glReadPixels(0, 0, display.getWidth(), display.getHeight(), GL_DEPTH_COMPONENT, GL_FLOAT, buffer);
ByteBuffer glMapBuffer = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_WRITE, buffer); // is not null. it seems buffer initialized ok.
makeScreenShot(display.getWidth(), display.getHeight(), glMapBuffer);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);



But if I take the screenshot like this, everything works:


ByteBuffer buffer = BufferUtils.createByteBuffer(display.getWidth() * display.getHeight() * 4);
glReadPixels(0, 0, display.getWidth(), display.getHeight(), GL_DEPTH_COMPONENT, GL_FLOAT, buffer);
makeScreenShot(display.getWidth(), display.getHeight(), buffer);



What is wrong with the depthMapFBO variant?


Thanks for any answers!

GClements
01-08-2018, 04:09 PM
Whatever the problem is, it's probably in makeScreenShot(). In particular, I can't see how it would make sense to pass a function pointer (glMapBuffer) in one case and a ByteBuffer object in the other case.

Dark Photon
01-08-2018, 06:22 PM
My code:

If this is supposed to be C++, it's not valid, and it shouldn't even compile.

You should include GL prototypes and compile with a type-checking C++ compiler. That may help you clear up some of your problems.

nimelord
01-09-2018, 02:34 AM
If this is supposed to be C++, it's not valid, and it shouldn't even compile.

You should include GL prototypes and compile with a type-checking C++ compiler. That may help you clear up some of your problems.

It is not C++; it is Java, using LWJGL, a thin wrapper over the native OpenGL calls.
The GL function signatures are almost the same.

I have already implemented a scene with cascaded shadow mapping for the directional light, point-light shadows via a cube texture, and spot-light shadow maps. The only differences in signatures I have seen between C++ and LWJGL are the glGenBuffers signature and VBO references being int rather than GLuint.

I do not know why, but I have solved all my problems so far by reading this forum, where most of them had already been explained to other OpenGL beginners.
So this forum is a much more effective place for me to find solutions than others.

I'm sorry for the difference between your platform and mine.

Do you see any problems other than the syntax?
I'm sure I'm using the GL calls incorrectly, but none of the examples I could find helped me see where the problem is.

mhagain
01-09-2018, 06:12 AM
OK, quite a few problems with your code.

First of all, you are not actually using an FBO - you're using a PBO - a Pixel Buffer Object. That's not a code problem, it's terminology, but it's important to get it correct because if you go looking for help with FBOs you won't actually find any useful help.

Secondly, you're using the PBO incorrectly, in that you're setting up as if you wish to glReadPixels into a PBO, then you do the actual glReadPixels as if you were reading it into a system memory pointer, then you continue as if you were reading into a PBO.

The last argument of your glReadPixels call, if a buffer object is currently bound to GL_PIXEL_PACK_BUFFER, is not a system memory pointer but is instead interpreted as an offset into the buffer object's data store. So what you probably actually intend to use here is 0.

The buffer's data store may then be accessed via glMapBuffer or glGetBufferSubData.

nimelord
01-09-2018, 08:15 AM
The last argument of your glReadPixels call, if a buffer object is currently bound to GL_PIXEL_PACK_BUFFER, is not a system memory pointer but is instead interpreted as an offset into the buffer object's data store. So what you probably actually intend to use here is 0.


You are right, it is working fine now.
It was a silly mistake.

Thanks a lot!

mhagain
01-09-2018, 11:24 AM
The other thing to watch out for with glReadPixels is that if it needs to do a format conversion during the read it will kill your performance (not that glReadPixels is fast to begin with). We typically see this when people attempt to read GL_RGB or GL_BGR data, which will always require a format conversion.

With reading the depth buffer you might get better performance by first checking if you actually have a 32-bit floating point depth buffer to begin with, then adjusting the call to match the format you actually do have. If you really need it as floating point it can often be faster to convert in code yourself than it is to let GL do it for you.
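A minimal sketch of the "convert it yourself" idea, assuming the depth values were read back as GL_UNSIGNED_INT (normalized to the full 32-bit range) into a buffer in native byte order. The class and method names are hypothetical, and whether this is actually faster than letting GL convert to GL_FLOAT depends on the driver and the real depth format, so measure it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

/** Hypothetical helper, not part of LWJGL. */
final class DepthConvert {
    /** Converts unsigned normalized 32-bit depth values to floats in [0, 1]. */
    static float[] toFloat(ByteBuffer raw, int pixelCount) {
        // duplicate() so the caller's position and byte order are left untouched
        IntBuffer ints = raw.duplicate().order(ByteOrder.nativeOrder()).asIntBuffer();
        float[] out = new float[pixelCount];
        for (int i = 0; i < pixelCount; i++) {
            long unsigned = ints.get(i) & 0xFFFFFFFFL; // Java ints are signed
            out[i] = (float) (unsigned / 4294967295.0); // divide by 2^32 - 1
        }
        return out;
    }
}
```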

With PBOs the idea is not to read to a PBO then map and access it immediately after. Instead you should wait some arbitrary amount of time - one frame is good to start with - between the read and the map. This is to give the read sufficient time to complete asynchronously before you map. If you absolutely must have the data immediately then not using a PBO at all might be faster (but watch that format conversion).
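The delayed map described above can be sketched as a small ring of buffers. This is only the index bookkeeping, with the GL calls left out; `PboRing` and its methods are hypothetical names. In a real renderer `writeIndex()` would select the buffer bound to GL_PIXEL_PACK_BUFFER for this frame's glReadPixels (with offset 0 as the last argument), and `readIndex()` the buffer to map.

```java
/** Hypothetical helper: rotates N pixel-pack buffers so a map lags its read by N-1 frames. */
final class PboRing {
    private final int count;
    private long frame = 0;

    PboRing(int count) {
        if (count < 2) throw new IllegalArgumentException("need at least 2 buffers");
        this.count = count;
    }

    /** Index of the buffer to read pixels into during the current frame. */
    int writeIndex() {
        return (int) (frame % count);
    }

    /** Index of the oldest buffer in flight, or -1 while it has not been written yet. */
    int readIndex() {
        if (frame < count - 1) return -1;
        return (int) ((frame + 1) % count);
    }

    /** Call once per frame after the read and the map have been issued. */
    void endFrame() {
        frame++;
    }
}
```

With two buffers the data mapped in a frame is the data read in the previous frame, which is the "one frame" delay suggested as a starting point.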

I mention these because it seems possible that you're trying to optimize this, but going down the wrong route to do so.

nimelord
01-09-2018, 12:27 PM
I am trying to implement a 'coverage buffer' for occlusion culling.


I need to do several simple steps (as I learned earlier):

1 - copy the depth buffer of the currently rendered frame to a texture.
2 - render that texture to another low-resolution texture to downscale it.
3 - read the low-resolution texture back to the CPU side.
4 - reproject the depth map from the previous frame to the current one.
5 - rasterize bounding boxes against the resulting depth map to cull meshes.


Right now I'm trying to do the 1st step,
and I have begun to think that I'm doing the wrong thing.

GL_PIXEL_PACK_BUFFER is for moving data between the GPU and the CPU. That will be required later, not now.

It seems that for the 1st step I need glBlitFramebuffer.

Thank you!


PS: glBlitFramebuffer lets me skip #2.

Dark Photon
01-09-2018, 06:24 PM
1 - copy the depth buffer of the currently rendered frame to a texture.
2 - render that texture to another low-resolution texture to downscale it.
3 - read the low-resolution texture back to the CPU side.
4 - reproject the depth map from the previous frame to the current one.
5 - rasterize bounding boxes against the resulting depth map to cull meshes.
...
Right now I'm trying to do the 1st step,
and I have begun to think that I'm doing the wrong thing.
...
It seems that for the 1st step I need glBlitFramebuffer.

For #1, you can bypass this by targeting your scene rendering (step #0) to a Framebuffer Object (FBO), and back the depth buffer with a depth texture. Then you've already got it in a texture.

For #2, you can use glBlitFramebuffer to do the resize if you're not particular on how the depth values are chosen. Otherwise, do a custom render as you suggested.

That said, I suspect the whole reason you're doing #2 is so that #3 is faster, right? Be sure to time your result (particularly #1-#3) to make sure you're comfortable with the cost.

If not, another option to consider (which may very well be cheaper, and will allow you to skip #3 and possibly #2) is to just keep the depth texture on the GPU and use it full-res there. That is:

0) Render scene to FBO with depth buffer backed by a depth texture
1) Reproject depth texture on GPU to generate another depth texture
2) Use depth texture for culling, etc.

I wouldn't expect perfect results using last-frame's data to render the current frame. But feel free to give it a shot.

mhagain
01-09-2018, 08:36 PM
Yeah, whichever way you do it, getting the depth buffer from the GPU to the CPU is going to involve a pipeline stall. Size of the depth buffer is not going to be as important as the stall, so downsizing it is not going to be as much a performance optimization as you might think.

Silence
01-10-2018, 12:07 AM
Since you seem to be targeting games, and as Dark Photon mentioned, it is now very common for games to render directly into FBOs and then use a blit to draw the image on the screen. Many effects and other techniques you'll want to add will benefit a lot from this. Since you render directly into an FBO, you have direct access to the depth texture and color texture (and, e.g., normals). This also allows you to render to floating-point textures, and it will make things easier if you move to deferred rendering.

So consider this.

nimelord
01-10-2018, 07:20 AM
Thank you guys for the help!

I am just trying to reproduce the CryEngine solution: https://www.gamedev.net/articles/programming/graphics/coverage-buffer-as-main-occlusion-culling-technique-r4103/

Photon, your suggestion sounds good:



0) Render scene to FBO with depth buffer backed by a depth texture
1) Reproject depth texture on GPU to generate another depth texture
2) Use depth texture for culling, etc.


But how can I cull meshes on GPU side?


On the CPU side it looks clear: check for visible pixels while rasterizing (I'm still not sure about that).
How can I keep the relation between a mesh and its visibility on the GPU side?

Dark Photon
01-10-2018, 05:39 PM
But how can I cull meshes on GPU side?

There are several types of culling: frustum culling, occlusion culling, and backface culling. We'll ignore the latter since it's typically sub-object and the pipeline can apply that pretty efficiently.

And to facilitate this, let's just take an example. Suppose we have 1000 instances of some object we want to render at different points in our scene. And we want to frustum cull and occlusion cull them on the GPU.

So we start with a list of instances, each with its own bounding sphere (which we can pass down into the shader). For each instance (which we blast-render to the GPU in an instanced draw call), in the shader we can test its bounding sphere against the frustum planes to determine whether it's definitely outside the view frustum or not. If it is, we throw the instance away. If not, then we serialize this instance into a list (via transform feedback) to render with later (...with an indirect draw call).
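For reference, the bounding-sphere test itself is small. Below is a CPU-side sketch in Java of the same check the shader would run per instance. It assumes each frustum plane is given as {a, b, c, d} with a normalized normal pointing into the frustum; the class and method names are made up for illustration, and the transform feedback capture is not shown.

```java
import java.util.Arrays;

/** Hypothetical CPU version of the per-instance frustum test. */
final class FrustumCull {
    /**
     * planes: six rows {a, b, c, d}; a*x + b*y + c*z + d is the signed
     * distance of (x, y, z) from the plane, positive on the inside.
     */
    static boolean sphereOutside(float[][] planes, float cx, float cy, float cz, float radius) {
        for (float[] p : planes) {
            float dist = p[0] * cx + p[1] * cy + p[2] * cz + p[3];
            if (dist < -radius) return true; // entirely behind one plane
        }
        return false; // inside or intersecting: keep it (conservative)
    }

    /** Returns the indices of the instances that survive. spheres rows: {x, y, z, radius}. */
    static int[] visibleInstances(float[][] planes, float[][] spheres) {
        int[] kept = new int[spheres.length];
        int n = 0;
        for (int i = 0; i < spheres.length; i++) {
            float[] s = spheres[i];
            if (!sphereOutside(planes, s[0], s[1], s[2], s[3])) kept[n++] = i;
        }
        return Arrays.copyOf(kept, n);
    }
}
```

The returned index list plays the role of the serialized instance list; on the GPU that list would live in a buffer object instead.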

If you also want to occlusion cull (sounds like you're interested in this), then you have a number of options. You can do occlusion query tests against the depth buffer using bounding primitives for each instance, and then conditional render each, though for a lot of instances that could get pretty expensive if done in the usual way. Another option is to pre-generate a MIPmap of your depth map and then in a shader you can perform a conservative occlusion test by reading 1-4 samples out of the appropriate level of your depth MIPmap which cover your object. The nice thing about that approach is you don't need to rasterize a bounding primitive for each instance, and you know immediately in the shader (after the depth texture lookups) whether you're going to kill off the instance or not. You can even combine this into the same shader that does frustum culling above and then only serialize out the list of instances that pass both 1) the frustum-cull and 2) the occlusion-cull (which you can then render with an indirect draw call).
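A rough CPU-side sketch of the depth-MIPmap test, to make the idea concrete. It assumes a square, power-of-two depth image with 0 = near and 1 = far, a pyramid that stores the farthest depth of each 2x2 block, and an object described by its screen-space pixel rectangle (assumed to lie inside the image) plus its nearest depth. All names here are hypothetical; a shader version would do the same lookups with texture fetches.

```java
/** Hypothetical hierarchical-Z helper for illustration only. */
final class HiZ {
    /** Builds a max-depth pyramid from a square, power-of-two depth image (row-major). */
    static float[][] buildPyramid(float[] depth, int size) {
        int levels = Integer.numberOfTrailingZeros(size) + 1;
        float[][] pyr = new float[levels][];
        pyr[0] = depth;
        for (int level = 1, s = size / 2; level < levels; level++, s /= 2) {
            float[] src = pyr[level - 1];
            float[] dst = new float[s * s];
            int srcSize = s * 2;
            for (int y = 0; y < s; y++) {
                for (int x = 0; x < s; x++) {
                    int i = (2 * y) * srcSize + 2 * x;
                    // keep the farthest depth so the test stays conservative
                    dst[y * s + x] = Math.max(Math.max(src[i], src[i + 1]),
                                              Math.max(src[i + srcSize], src[i + srcSize + 1]));
                }
            }
            pyr[level] = dst;
        }
        return pyr;
    }

    /** True if an object covering pixels [x0..x1] x [y0..y1] with nearest depth zNear is surely hidden. */
    static boolean occluded(float[][] pyr, int size, int x0, int y0, int x1, int y1, float zNear) {
        int extent = Math.max(x1 - x0, y1 - y0) + 1;
        // ceil(log2(extent)): at this level the rectangle spans at most 2x2 texels
        int level = 32 - Integer.numberOfLeadingZeros(extent - 1);
        level = Math.min(level, pyr.length - 1);
        int s = size >> level;
        float farthest = 0f;
        for (int ty = y0 >> level; ty <= (y1 >> level); ty++) {
            for (int tx = x0 >> level; tx <= (x1 >> level); tx++) {
                farthest = Math.max(farthest, pyr[level][ty * s + tx]);
            }
        }
        return zNear > farthest;
    }
}
```

Because each texel holds the farthest depth of the area it covers, the test can only err towards "visible", never towards wrongly culling an object.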

Before you go to this trouble though...

Honestly, I'd first recommend making sure you have a fair amount of frame time you can potentially reclaim with culling (of some type) before you add any of this.

First start by doing on-CPU frustum-culling. Then compare the draw time needed to render the scene without this culling applied, against the time needed to render the scene "with" this culling applied. Don't count the cost of actually doing the culling though. If you don't see much difference, don't bother with frustum-culling. If you see a fairly big difference, definitely implement per-object or per-instance frustum culling, either on the CPU and/or the GPU.

After doing this, do the same test for occlusion-culling. That is, time draw time w/o occlusion culling applied, to draw time w/ occlusion culling having been applied, and don't count the time needed to do the occlusion culling (yet). No big difference? Dump occlusion culling. Big difference? Consider implementing it in some form.

nimelord
01-11-2018, 12:51 AM
So we start with a list of instances, each with its own bounding sphere (which we can pass down into the shader). For each instance (which we blast-render to the GPU in an instanced draw call), in the shader we can test its bounding sphere against the frustum planes to determine whether it's definitely outside the view frustum or not. If it is, we throw the instance away. If not, then we serialize this instance into a list (via transform feedback) to render with later (...with an indirect draw call).

Do you mean this loop in the pipeline? (I didn't know about such possibility):
[attached image 2595]



Then compare the draw time needed to render the scene without this culling applied, against the time needed to render the scene "with" this culling applied.

My general target is a massive open forest with ruined buildings.
For now I have a small test scene with boxes standing in for the designed trees.
The raw solution without culling gives 30 FPS for the small test scene, and for a medium scene the FPS falls dramatically.
I implemented frustum culling on the CPU side (I didn't know about the pipeline loop before your previous message).
It sped the rendering up from 30 FPS to 120 FPS (x4, a good result I think).
But it is not enough for my target. It definitely needs occlusion culling as well.


So I have to implement occlusion culling.
I will research the technical details of the solution you described here.
Thank you very much.



PS:
In my culling process I apply frustum culling to the shadow maps too.
To render a shadow map I start from the meshes already filtered for the camera, corrected for "shadow in scene" visibility.
Then I intersect that set with the frustum culling result from the light's point of view.
So I reuse the scene's filtered result when culling for the shadow's point of view.
Can I do the same kind of reuse in the GPU variant?

Dark Photon
01-13-2018, 05:18 PM
Do you mean this loop in the pipeline? (I didn't know about such possibility):

No, that's the old, deprecated GL_FEEDBACK mode that was present in very early OpenGL versions.

I'm talking about GL_TRANSFORM_FEEDBACK (https://www.khronos.org/opengl/wiki/Transform_Feedback). GL_FEEDBACK is somewhat similar in concept. However, it captures all of the data in a CPU memory buffer, whereas GL_TRANSFORM_FEEDBACK lets you capture the data in a buffer object on the GPU, which is better for efficiency (you want the data on the GPU anyway, not all the way back across the bus in CPU memory). See this wiki page for details: Transform Feedback (https://www.khronos.org/opengl/wiki/Transform_Feedback) (OpenGL Wiki).

For a glimpse of where this fits within the OpenGL rendering pipeline, see the 4 or so "Transform Feedback" mentions in the middle of the OpenGL Pipeline map here: OpenGL 4.4 Pipeline Map (https://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGL44PipelineMap.pdf) (thanks to Patrick Cozzi for hosting it).


The raw solution without culling gives 30 FPS for the small test scene, and for a medium scene the FPS falls dramatically.
I implemented frustum culling on the CPU side (I didn't know about the pipeline loop before your previous message).
It sped the rendering up from 30 FPS to 120 FPS (x4, a good result I think).
But it is not enough for my target.

I'd recommend you compare benchmarks using frame time (e.g. milliseconds or seconds), rather than in FPS. There are lots of reasons, but this blog post sums them up pretty well: Performance (http://www.humus.name/index.php?page=Comments&ID=279) (Humus). For instance, FPS isn't very useful for profiling the individual consumers of your frame time (which you need to do to optimize your frame processing). Also, up front I'd suggest that you come up with the maximum frame time you can spend doing everything in a frame. Once you get your worst case to comfortably fit within it, you're done.
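A small sketch of why frame time is the easier number to reason about: equal steps in FPS are not equal savings in time. The class and method names are only for illustration.

```java
/** Hypothetical helper for converting between FPS and frame time. */
final class FrameTime {
    /** Milliseconds spent per frame at a given frame rate. */
    static double msPerFrame(double fps) {
        return 1000.0 / fps;
    }

    /** Milliseconds of frame time reclaimed when going from one frame rate to another. */
    static double msSaved(double fromFps, double toFps) {
        return msPerFrame(fromFps) - msPerFrame(toFps);
    }
}
```

Going from 30 to 60 FPS reclaims about 16.7 ms per frame, while going from 60 to 90 FPS reclaims only about 5.6 ms, even though both are "+30 FPS". The 30 FPS and 120 FPS figures quoted above correspond to roughly 33.3 ms and 8.3 ms per frame.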


In my culling process I apply frustum culling to the shadow maps too.
To render a shadow map I start from the meshes already filtered for the camera, corrected for "shadow in scene" visibility.
Then I intersect that set with the frustum culling result from the light's point of view.
So I reuse the scene's filtered result when culling for the shadow's point of view.
Can I do the same kind of reuse in the GPU variant?

Sure. However, if you really are starved for performance, you don't want to end up sending much more down the pipe than you minimally have to. You should structure your scene and your rendering such that it is very, very cheap to cull away geometry which isn't in the frustum you're rendering. I'd recommend coarse-grained culling on the CPU and then, if you need it, more fine-grained culling on the GPU via a transform feedback method. For instance, there's no sense in having the GPU cull out geometry per primitive or per object instance when a whole object (or group of objects) is not even close to within the view frustum ... assuming you can cull that away quickly.

nimelord
01-16-2018, 03:25 AM
I'd recommend you compare benchmarks using frame time (e.g. milliseconds or seconds), rather than in FPS.... Once you get your worst case to comfortably fit within it, you're done.

Good advice.

For the small scene: 8.35 ms vs 33.74 ms. The same result, I mean x4 :)


Thanks a lot!

nimelord
01-19-2018, 06:23 AM
Another option is to pre-generate a MIPmap of your depth map and then in a shader you can perform a conservative occlusion test by reading 1-4 samples out of the appropriate level of your depth MIPmap which cover your object. The nice thing about that approach is you don't need to rasterize a bounding primitive for each instance, and you know immediately in the shader (after the depth texture lookups) whether you're going to kill off the instance or not.


It seems I found a good document describing the technique you meant.

I'll just leave it here for other people searching for a solution: http://rastergrid.com/blog/2010/10/hierarchical-z-map-based-occlusion-culling/

nimelord
01-22-2018, 04:45 AM
I am trying to reproject the depth map from the previous camera position to the current camera position.


These are the steps I should do for that:
1) At the end of the render cycle I save the projection-view matrix, and the depth buffer to a texture.
2) At the start of the next render cycle I restore world positions with the inverted projection-view matrix from the previous frame and project them for the current camera position.
3) Use the resulting data for something...

How can I do the reprojection on the GPU side?
I mean, I don't have positions for a vertex shader; I only have the depth texture and the two projection-view matrices for the previous and current frames.


Thanks for any answers.

Dark Photon
01-22-2018, 05:26 AM
There's no need to start a new thread here as this is just a continuation of the same topic.

Realizing that this technique is going to leave you with artifacts due to using one-frame-late occlusion data...

The first of these two URLs on "Coverage Buffer Occlusion Culling" describes one way to handle this (see reconstructPos):


Coverage Buffer as Main Occlusion Culling Technique (https://www.gamedev.net/articles/programming/graphics/coverage-buffer-as-main-occlusion-culling-technique-r4103/) (Gamedev.net)
GPU Driven Occlusion Culling in Life is Feudal (https://bazhenovc.github.io/blog/post/gpu-driven-occlusion-culling-slides-lif/) (bazhenovc)

nimelord
01-23-2018, 07:02 AM
There's no need to start a new thread here as this is just a continuation of the same topic.

Ok.

I am trying to understand this one: http://rastergrid.com/blog/2010/10/hierarchical-z-map-based-occlusion-culling/

Full source code here: http://rastergrid.com/blog/downloads/mountains-demo/




void MountainsDemo::renderScene(float dtime) {

    this->drawCallCount = 0;

    // update camera data to uniform buffer
    this->transform.ModelViewMatrix = mat4(1.0f);
    this->transform.ModelViewMatrix = rotate(this->transform.ModelViewMatrix, this->camera.rotation.x, vec3(1.0f, 0.0f, 0.0f));
    this->transform.ModelViewMatrix = rotate(this->transform.ModelViewMatrix, this->camera.rotation.y, vec3(0.0f, 1.0f, 0.0f));
    this->transform.ModelViewMatrix = rotate(this->transform.ModelViewMatrix, this->camera.rotation.z, vec3(0.0f, 0.0f, 1.0f));
    this->transform.ModelViewMatrix = translate(this->transform.ModelViewMatrix, -this->camera.position);
    this->transform.MVPMatrix = this->transform.ProjectionMatrix * this->transform.ModelViewMatrix;
    glBindBuffer(GL_UNIFORM_BUFFER, this->transformUB);
    glBufferSubData(GL_UNIFORM_BUFFER, 0, sizeof(this->transform), &this->transform);

    // bind offscreen framebuffer
    glBindFramebuffer(GL_FRAMEBUFFER, this->framebuffer);
    glClear(GL_DEPTH_BUFFER_BIT);

    // draw terrain
    glUseProgram(this->terrainPO);

    glBindVertexArray(this->terrainVA);

    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, this->heightmap);
    glActiveTexture(GL_TEXTURE1);
    glBindTexture(GL_TEXTURE_2D, this->terrainTex);
    glActiveTexture(GL_TEXTURE2);
    glBindTexture(GL_TEXTURE_2D, this->detailTex);

    bool visible[7][7] = { false };
    this->visibleBlocks = 0;
    // terrain elements will be drawn only in a 7x7 grid around the camera
    float x = roundf(-this->camera.position.x / TERRAIN_OBJECT_SIZE);
    float z = roundf(-this->camera.position.z / TERRAIN_OBJECT_SIZE);
    for (int i=-3; i<=3; i++)
        for (int j=-3; j<=3; j++)
            // perform view frustum culling for the terrain elements
            if ( cullTerrain( vec4( TERRAIN_OBJECT_SIZE*(i-x), 0.f, TERRAIN_OBJECT_SIZE*(j-z), 1.f ) ) ) {
                glUniform2f(glGetUniformLocation(this->terrainPO, "Offset"), TERRAIN_OBJECT_SIZE*(i-x), TERRAIN_OBJECT_SIZE*(j-z));
                glDrawElements(terrainDraw.prim_type, terrainDraw.indexCount, GL_UNSIGNED_INT, (void*)terrainDraw.indexOffset);
                this->drawCallCount++;
                // store visibility so we can use it during the tree instance rendering
                visible[i+3][j+3] = true;
                this->visibleBlocks++;
            }

    // create Hi-Z map if necessary
    if ( this->cullMode == HI_Z_OCCLUSION_CULL ) {
        glUseProgram(this->hizPO);
        // disable color buffer as we will render only a depth image
        glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
        glActiveTexture(GL_TEXTURE0);
        glBindTexture(GL_TEXTURE_2D, this->depthTex);
        // we have to disable depth testing but allow depth writes
        glDepthFunc(GL_ALWAYS);
        // calculate the number of mipmap levels for NPOT texture
        int numLevels = 1 + (int)floorf(log2f(fmaxf(SCREEN_WIDTH, SCREEN_HEIGHT)));
        int currentWidth = SCREEN_WIDTH;
        int currentHeight = SCREEN_HEIGHT;
        for (int i=1; i<numLevels; i++) {
            glUniform2i(glGetUniformLocation(this->hizPO, "LastMipSize"), currentWidth, currentHeight);
            // calculate next viewport size
            currentWidth /= 2;
            currentHeight /= 2;
            // ensure that the viewport size is always at least 1x1
            currentWidth = currentWidth > 0 ? currentWidth : 1;
            currentHeight = currentHeight > 0 ? currentHeight : 1;
            glViewport(0, 0, currentWidth, currentHeight);
            // bind next level for rendering but first restrict fetches only to previous level
            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_BASE_LEVEL, i-1);
            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, i-1);
            glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, this->depthTex, i);
            // dummy draw command as the full screen quad is generated completely by a geometry shader
            glDrawArrays(GL_POINTS, 0, 1);
            this->drawCallCount++;
        }
        // reset mipmap level range for the depth image
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_BASE_LEVEL, 0);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, numLevels-1);
        // reset the framebuffer configuration
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, this->colorTex, 0);
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, this->depthTex, 0);
        // reenable color buffer writes, reset viewport and reenable depth test
        glDepthFunc(GL_LEQUAL);
        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
        glViewport(0, 0, SCREEN_WIDTH, SCREEN_HEIGHT);
    }

    if ( !this->showDepthTex ) {
        // render tree instances and apply culling
        glUseProgram(this->cullPO);
        glUniformSubroutinesuiv(GL_VERTEX_SHADER, 1, &this->subIndexVS[this->cullMode]);
        glUniformSubroutinesuiv(GL_GEOMETRY_SHADER, 1, &this->subIndexGS[this->LODMode ? 1 : 0]);

        glEnable(GL_RASTERIZER_DISCARD);
        glBindVertexArray(this->cullVA);

        for (int i=0; i<NUM_LOD; i++)
            glBeginQueryIndexed(GL_PRIMITIVES_GENERATED, i, this->cullQuery[i]);

        glBeginTransformFeedback(GL_POINTS);
        for (int i=-3; i<=3; i++)
            for (int j=-3; j<=3; j++)
                if ( visible[i+3][j+3] ) {
                    glUniform2f(glGetUniformLocation(this->cullPO, "Offset"), TERRAIN_OBJECT_SIZE*(i-x), TERRAIN_OBJECT_SIZE*(j-z));
                    glDrawArrays(GL_POINTS, 0, this->instanceCount);
                    this->drawCallCount++;
                }
        glEndTransformFeedback();

        for (int i=0; i<NUM_LOD; i++)
            glEndQueryIndexed(GL_PRIMITIVES_GENERATED, i);

        glDisable(GL_RASTERIZER_DISCARD);

        glBindVertexArray(this->terrainVA);
        // draw skybox
        glUseProgram(this->skyboxPO);
        glActiveTexture(GL_TEXTURE0);
        glBindTexture(GL_TEXTURE_2D_ARRAY, this->skyboxTex);
        // dummy draw command as the skybox itself is generated completely by a geometry shader
        glDrawArrays(GL_POINTS, 0, 1);
        this->drawCallCount++;

        // draw trees
        glUseProgram(this->treePO);
        glActiveTexture(GL_TEXTURE0);
        glBindTexture(GL_TEXTURE_2D_ARRAY, this->treeTex);
        glActiveTexture(GL_TEXTURE1);
        glBindTexture(GL_TEXTURE_2D, this->terrainTex);

        // get the number of instances from the query object
        for (int i=0; i<NUM_LOD; i++) {
            if ( this->showLODColor ) {
                switch ( i ) {
                    case 0: glUniform4f(glGetUniformLocation(this->treePO, "ColorMask"), 1.0, 0.0, 0.0, 1.0); break;
                    case 1: glUniform4f(glGetUniformLocation(this->treePO, "ColorMask"), 0.0, 1.0, 0.0, 1.0); break;
                    case 2: glUniform4f(glGetUniformLocation(this->treePO, "ColorMask"), 0.0, 0.0, 1.0, 1.0); break;
                }
            }
            glBindVertexArray(this->treeVA[i]);
            glGetQueryObjectiv(this->cullQuery[i], GL_QUERY_RESULT, &this->visibleTrees[i]);
            if ( this->visibleTrees[i] > 0 ) {
                // draw the trees
                glDrawElementsInstanced(treeDraw[i].prim_type, treeDraw[i].indexCount, GL_UNSIGNED_INT, (void*)(treeDraw[i].indexOffset*sizeof(uint)), this->visibleTrees[i]);
                this->drawCallCount++;
            }
        }
        if ( this->showLODColor ) {
            glUniform4f(glGetUniformLocation(this->treePO, "ColorMask"), 1.0, 1.0, 1.0, 1.0);
        }
    }

    // bind default framebuffer and render post processing
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    glUseProgram(this->postPO);

    // visualize depth buffer texture if needed
    if ( this->showDepthTex ) {
        glUseProgram(this->depthPO);
        glUniform1f(glGetUniformLocation(this->depthPO, "LOD"), this->LOD);
    }

    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, this->colorTex);
    glActiveTexture(GL_TEXTURE1);
    glBindTexture(GL_TEXTURE_2D, this->depthTex);
    glDisable(GL_DEPTH_TEST);
    // dummy draw command as the full screen quad is generated completely by a geometry shader
    glDrawArrays(GL_POINTS, 0, 1);
    this->drawCallCount++;
    glEnable(GL_DEPTH_TEST);

    GLenum glError;
    if ((glError = glGetError()) != GL_NO_ERROR) {
        cout << "Warning: OpenGL error code: " << glError << endl;
    }

}


Where is the first depth map rendered?
I think it must be ready before the mipmap is built.

Dark Photon
01-24-2018, 07:29 PM
See MountainsDemo::renderScene() in mountains.cpp.

In particular, note that the FBO that's bound at the top of this function has by default slice 0 of this->depthTex set as the depth buffer (see glFramebufferTexture2D() calls in setupFramebuffer()). That base depth buffer is rendered under "// draw terrain". Then below that, the hierarchical Z map is created in render passes under "// create Hi-Z map if necessary" by iteratively reading from level i-1 of this->depthTex and writing to level i, where i = 1..numLevels-1.
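The level count and per-level viewport sizes used by that Hi-Z loop can be sketched on their own. This mirrors the arithmetic in the demo code (1 + floor(log2(max(width, height))) levels, halving with a clamp at 1), written in Java with hypothetical names.

```java
/** Hypothetical helper that mirrors the mip-size arithmetic of the Hi-Z loop. */
final class MipChain {
    /** Number of mip levels for a (possibly non-power-of-two) texture. */
    static int levelCount(int width, int height) {
        // 32 - numberOfLeadingZeros(n) equals 1 + floor(log2(n)) for n >= 1
        return 32 - Integer.numberOfLeadingZeros(Math.max(width, height));
    }

    /** Size {width, height} of every level: halve with integer division, never below 1. */
    static int[][] levelSizes(int width, int height) {
        int[][] sizes = new int[levelCount(width, height)][];
        int w = width;
        int h = height;
        for (int i = 0; i < sizes.length; i++) {
            sizes[i] = new int[] { w, h };
            w = Math.max(1, w / 2);
            h = Math.max(1, h / 2);
        }
        return sizes;
    }
}
```

For a 1024x768 target this gives 11 levels, ending in 2x1 and then 1x1.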

nimelord
01-25-2018, 07:11 AM
See MountainsDemo::renderScene() in mountains.cpp.

In particular, note that the FBO that's bound at the top of this function has by default slice 0 of this->depthTex set as the depth buffer (see glFramebufferTexture2D() calls in setupFramebuffer()). That base depth buffer is rendered under "// draw terrain". Then below that, the hierarchical Z map is created in render passes under "// create Hi-Z map if necessary" by iteratively reading from level i-1 of this->depthTex and writing to level i, where i = 1..numLevels-1.

Ok.

It looks like that implementation renders only the terrain to the depth texture.
So culling is done using only the terrain, am I right?

So if I have a huge, thick forest without mountains, rendering the depth map will be too expensive (it would itself require occlusion culling using the trees).

I think I should:
1) use the reprojected depth map from the previous frame (the one difference from the "mountains example")
2) build the depth mipmap (as in the "mountains example")
3) cull with transform feedback (as in the "mountains example")
4) profit :)


It seems that is what you tried to explain to me in https://www.opengl.org/discussion_boards/showthread.php/200335-Problem-with-glReadPixels-using-FBO?p=1290032&viewfull=1#post1290032

Thank you!

Dark Photon
01-25-2018, 05:58 PM
It looks like that implementation renders only the terrain to the depth texture.
So culling is done using only the terrain, am I right?

In the quick look I did yesterday, that's what it looked like to me.

nimelord
02-01-2018, 01:55 PM
I have a reprojection result:

[attached image 2656]

The top view is the original depth map.
The bottom one is the reprojected map, rotated around the Y axis by 0.2 degrees (a very small rotation).
The left front looks a little darker (closer).
But the right front is far too bright (it looks strange, I mean: behind the skybox).

Is this a correct result for reprojection, or am I doing something wrong?

nimelord
02-03-2018, 01:53 PM
Of course it was implemented incorrectly.
This is the correct variant, I hope :)

Reproject shader:


#version 330

in vec2 outTexCoord;
uniform sampler2DArray depthTexture;
uniform mat4 prevFrameInvertedProjViewMatrix;
uniform mat4 currFrameProjViewMatrix;

void main() {

    float depth = texture(depthTexture, vec3(outTexCoord, 0)).r;
    vec2 cspos = vec2(outTexCoord.x * 2 - 1, (1-outTexCoord.y) * 2 - 1);
    vec4 depthCoord = vec4(cspos, depth, 1.0);
    depthCoord = prevFrameInvertedProjViewMatrix * depthCoord;
    vec4 position = vec4(depthCoord.xyz / depthCoord.w, 1.0);
    vec4 projPosition = currFrameProjViewMatrix * position;
    gl_FragDepth = projPosition.z / projPosition.w;
}


It shows holes in the parts that were not visible in the previous frame.

Result:
[attached image 2657]
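For checking the math outside a shader, here is a minimal CPU-side sketch of reprojecting a single depth sample in Java. It assumes the default depth range and clip-space convention, where a stored depth d in [0, 1] corresponds to NDC z = 2d - 1, and it uses row-major matrices; all names are hypothetical. If the same assumption holds for the shader above, its `depth` value would need that remap before the inverse matrix is applied, and the result would need mapping back to [0, 1] before being written to gl_FragDepth.

```java
/** Hypothetical CPU reference for depth reprojection; not the thread author's shader. */
final class Reproject {
    /** Multiplies a row-major 4x4 matrix by a column vector. */
    static double[] mul(double[][] m, double[] v) {
        double[] r = new double[4];
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++)
                r[i] += m[i][j] * v[j];
        return r;
    }

    /** Inverts a 4x4 matrix by Gauss-Jordan elimination with partial pivoting. */
    static double[][] inverse(double[][] m) {
        double[][] a = new double[4][8];
        for (int i = 0; i < 4; i++) {
            System.arraycopy(m[i], 0, a[i], 0, 4);
            a[i][4 + i] = 1.0;
        }
        for (int col = 0; col < 4; col++) {
            int pivot = col;
            for (int r = col + 1; r < 4; r++)
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            if (a[pivot][col] == 0.0) throw new ArithmeticException("singular matrix");
            double[] t = a[col]; a[col] = a[pivot]; a[pivot] = t;
            double p = a[col][col];
            for (int j = 0; j < 8; j++) a[col][j] /= p;
            for (int r = 0; r < 4; r++) {
                if (r == col) continue;
                double f = a[r][col];
                for (int j = 0; j < 8; j++) a[r][j] -= f * a[col][j];
            }
        }
        double[][] inv = new double[4][4];
        for (int i = 0; i < 4; i++) System.arraycopy(a[i], 4, inv[i], 0, 4);
        return inv;
    }

    /** Standard OpenGL-style perspective projection (NDC z in [-1, 1]). */
    static double[][] perspective(double fovyRad, double aspect, double near, double far) {
        double f = 1.0 / Math.tan(fovyRad / 2.0);
        return new double[][] {
            { f / aspect, 0, 0, 0 },
            { 0, f, 0, 0 },
            { 0, 0, (far + near) / (near - far), 2 * far * near / (near - far) },
            { 0, 0, -1, 0 }
        };
    }

    /** Maps (u, v, depth), all in [0, 1], from the previous frame into the current frame. */
    static double[] reproject(double u, double v, double depth,
                              double[][] prevInvProjView, double[][] currProjView) {
        double[] ndc = { 2 * u - 1, 2 * v - 1, 2 * depth - 1, 1 }; // [0,1] -> [-1,1]
        double[] world = mul(prevInvProjView, ndc);
        double w = world[3];
        double[] pos = { world[0] / w, world[1] / w, world[2] / w, 1 };
        double[] clip = mul(currProjView, pos);
        return new double[] {
            clip[0] / clip[3] * 0.5 + 0.5, // back to [0,1]
            clip[1] / clip[3] * 0.5 + 0.5,
            clip[2] / clip[3] * 0.5 + 0.5
        };
    }
}
```

Note that the reprojected sample generally lands at a different (u, v) than it started from, so a full implementation has to write each sample at its new position; areas that receive no sample are the holes visible in the result.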