Heat map visualization shader help please

I don’t have a lot of shader programming experience, so I’m trying to do something that will expand my horizons.

Suppose you had a list of events at various world positions, each with an influence radius, and suppose you had a 3d model of an environment. I’m trying to figure out how best to take this world event data and render it in 3d in a form that illustrates the usual heat map functionality: a radial falloff of influence around each event, and an accumulation of stacked influences for overlapping events, ultimately resulting in a cool to warm color mapping based on the weight range.

I’m not sure if I’m totally trying the wrong stuff, but my current attempt at getting something working is passing an array of event world positions into the shader that I render the world with, and then I’m trying to figure out in my shader how to calculate the accurate world position of the fragment, so that I can calculate the accumulation of event ‘weights’ from all the events that should affect that particular pixel. I’m thinking I would need to write those accumulated results into a floating point FBO and then render the scene again, mapping the accumulated values in that buffer to the cold to hot color gradient I want to see.

Anyway, I’m spinning my wheels at this first step. I don’t really know how to calculate the world position of a fragment to use with the world positions of the events passed in through a uniform array. I’m trying a dead simple test setup in order to see this working: one event in the middle of my map, with the color blending from green to red for pixels within 500.0 world distance of the event, which should result in a visible color gradient 500 world units around the event position.

Anyone know how to calculate the world position of a fragment in the shader? My mesh doesn’t have UVs, as it is a simple colorized debugging model of the navigation mesh of a game level.

Any help appreciated, both on my current attempt and if there is a better or alternative way to do this.

Ok, so if I gather correctly, you want to take a 3D scalar intensity field generated by a set of 3D point emitters, sample it at the surfaces of the objects in your environment, and render that as a color value on the geometry, with the color value chosen from a 1D heat map gradient.

This sounds very similar to standard realtime rendering using point light sources (except for the heat map), so google that and you’ll get tons of hits with example code (fragment lighting, vertex lighting, etc.). For instance, one of many here.

Lots of ways to fry that fish, but passing in an array of point data for the emitters when rendering the scene is probably reasonable if you want the shader to dynamically sample/compute the field value at sample points every redraw and the number of point emitters isn’t huge. Of course, if you’re doing an interactive/realtime vis of this (e.g. with user trackball to reorient/zoom the model) and computing the value of the scalar field is expensive or otherwise impractical for realtime, then it’s probably better to precompute and possibly even presample the field before rendering …but for now, we’ll assume that computing the value of this scalar field is “cheap” and can be done at render time on the GPU, with either forward or deferred shading.

…and then I’m trying to figure out in my shader how to calculate the accurate world position of the fragment, so that I can calculate the accumulation of event ‘weights’ from all the events that should affect that particular pixel. …Anyway, I’m spinning my wheels at this first step.

Ok. Couple things that might be useful to you here:

First, computing a WORLD-space position in the shader. If you’re using a standard forward shading approach, where you’re computing the color/radiance of your fragments as you are rasterizing the original polygonal geometry for scene objects that you want colored (or lit), then you don’t need to compute the WORLD-space position of the fragment from complete scratch. You typically pass in an OBJECT-space position to your vertex shader, and if you also pass in a MODELING transform as a uniform, you can use that to take those positions to WORLD-space very simply (or more conventionally, pass in a MODELVIEW transform and take those positions to EYE-space). If you want this in the fragment shader, just pass WORLD (or EYE) space positions down to it via an interpolator. See the link I posted above for one example of this (search for ecPos).
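
In old-style GLSL, a minimal sketch of that might look like this (untested; "modelMat" is just a placeholder name for whatever uniform your app binds for the MODELING transform):

// Vertex shader sketch (forward shading): pass a WORLD-space position down
// to the fragment shader via an interpolator.
uniform mat4 modelMat;   // MODELING transform: OBJECT -> WORLD
varying vec3 worldPos;   // interpolated per-fragment WORLD-space position

void main()
{
    worldPos    = ( modelMat * gl_Vertex ).xyz;               // OBJECT -> WORLD
    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;   // OBJECT -> CLIP (required output)
}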

On the other hand, if you were applying this to your scene in a deferred shading/rendering style, where the geometry is pre-rasterized into multiple screen-size buffers and long since forgotten, then you’d need to recompute the full vec3 fragment position since you likely only have a depth value for the fragment saved off. For that, you could use something like this. No need if you’re using forward shading though.
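
If you do eventually need that, the usual approach is to unproject the stored depth back through the inverse PROJECTION matrix. A rough, untested sketch (depthTex, invProj, and screenSize are placeholder names your app would have to supply):

// Deferred-style position reconstruction sketch: recovers the EYE-space
// position of a fragment from a saved depth buffer. To get WORLD-space,
// you'd additionally multiply by the inverse VIEWING transform.
uniform sampler2D depthTex;   // depth buffer saved from the geometry pass
uniform mat4      invProj;    // inverse of the PROJECTION matrix
uniform vec2      screenSize; // viewport size in pixels

vec3 eyePosFromDepth()
{
    vec2  uv  = gl_FragCoord.xy / screenSize;     // window coords -> [0,1]
    float z   = texture2D( depthTex, uv ).r;      // stored depth in [0,1]
    vec4  ndc = vec4( uv, z, 1.0 ) * 2.0 - 1.0;   // [0,1] -> NDC [-1,1]
    vec4  eye = invProj * ndc;                    // unproject
    return eye.xyz / eye.w;                       // undo the perspective divide
}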

On choosing forward or deferred: Where you might want to use a deferred rendering style approach is when you have “a lot” of point light sources (hundreds, thousands) potentially influencing your 3D scalar (or vector) field, and you want to avoid the complete waste of considering them all at every fragment rendered. Basically, it lets you take advantage of spatial locality fairly easily to avoid lots of wasted compute cycles for this case. There are other optimization strategies for the “lots of point sources” case as well (clustered deferred/forward, etc.)

(Also, regarding WORLD-space positions: you typically don’t work in WORLD-space in your shader, as that places a practical limit on how big your “world” can be due to limited float precision; for this reason, EYE-space is often used for these computations instead, but if your world is small you don’t care.)

Second, there’s the question of at what “rate” you sample your 3D scalar field. You could compute it at each vertex of your geometry and interpolate the results across polygons. Or you could compute it at each fragment (typically per-pixel, but could be per sub-pixel sample if supersampling is enabled).

For the former, in the vertex shader you could just take the OBJECT-space positions, map them to WORLD (or EYE) space, sample or compute your scalar field value, map that through a 1D lookup texture to give you the heat map color, and then store that in a vec3 interpolator to be passed to the fragment shader (the GPU would automatically interpolate the color value across triangles for you). The fragment shader would then just use that interpolated color value as its output color value. For the latter, your vertex shader would be very simple: You’d just compute the WORLD (or EYE) space position and store that in an interpolator for passing to the fragment shader. Then in the fragment shader, you basically take that and do everything else described above.
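
A hedged sketch of that per-vertex variant (modelMat, heatRamp, and evalField are placeholder names, and the 1D ramp lookup in the vertex shader assumes hardware that supports vertex texture fetch):

// Per-vertex sampling sketch: the field is evaluated once per vertex and the
// resulting heat color is interpolated across each triangle by the GPU.
uniform mat4      modelMat;   // OBJECT -> WORLD
uniform sampler1D heatRamp;   // 1D cold-to-hot gradient texture
varying vec3      heatColor;  // interpolated color handed to the fragment shader

float evalField( vec3 p )
{
    // placeholder: your accumulation over all the events would go here
    return 0.0;
}

void main()
{
    vec3  wp    = ( modelMat * gl_Vertex ).xyz;   // vertex position in WORLD-space
    float w     = evalField( wp );                // scalar field value at the vertex
    heatColor   = texture1D( heatRamp, w ).rgb;   // weight -> heat map color
    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
}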

I’m thinking I would need to write those accumulated results into a floating point FBO and then render the scene again, mapping the accumulated values in that buffer to the cold to hot color gradient I want to see.

Not necessarily. If the cost of computing values for the 3D scalar field is relatively cheap, then you could do this computation and accumulation in registers within each fragment shader (or vertex shader) execution and not need an intermediary buffer. GPUs have a ton of power nowadays – you can get away with a lot here (that is, it may not look cheap but it could be “cheap enough”).

If the cost of computing values for the 3D scalar field was expensive purely because you have a lot of point sources and forward shading is too slow for this case, a deferred rendering technique might make realtime computation at the surfaces of your visible objects possible, if that was desirable. Here’s where you’d have FBO intermediaries which you can use to accumulate the total value of the field at the surfaces of your objects.

If however computing the values of the field was too expensive for realtime eval in any form, there are a number of ways you could precompute and potentially presample your 3D scalar field prior to rendering so that rendering is really realtime. For instance, you could precompute/presample the 3D scalar field to some resolution (on the CPU or GPU) and store it in a 3D texture (or 2D texture, for 2.5D scenarios), which would then be sampled and interpolated dynamically on the GPU during realtime rendering. Or, you could precompute/presample the field at the vertices of your geometry, store those values off on your geometry, and then rendering is almost mindless because all the work has been done. There are other options too, if sampling/computing your field values is too expensive for realtime.
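
For the 3D texture option, the render-time side could be as simple as this sketch (fieldTex, worldMin, and worldSize are placeholder names, and the baked field values are assumed to already be normalized to 0..1):

// Fragment shader sketch: sample a presampled scalar field stored in a 3D
// texture that covers the world bounds.
uniform sampler3D fieldTex;   // precomputed/presampled scalar field
uniform vec3      worldMin;   // WORLD-space min corner of the baked volume
uniform vec3      worldSize;  // WORLD-space extent of the baked volume
varying vec3      worldPos;   // WORLD-space position from the vertex shader

void main()
{
    vec3  uvw    = ( worldPos - worldMin ) / worldSize;   // WORLD -> [0,1]^3 texture coords
    float weight = texture3D( fieldTex, uvw ).r;          // hardware-interpolated field value
    gl_FragColor = vec4( vec3( weight ), 1.0 );           // or run it through your heat gradient
}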

My mesh doesn’t have UVs, as it is a simple colorized debugging model of the navigation mesh of a game level.

Hmm. Sounds like a physics-based environment. :)

Let me know if I didn’t completely answer your question.

Wow that’s a lot of information. I appreciate your taking the time to respond in such detail. Unfortunately my familiarity with all the terminology of graphics rendering is not all that advanced. I’m an A.I. programmer by trade, which is why I’m trying to set up visualization on a navigation mesh built for an A.I. bot.

Couple things. First, computing a WORLD-space position in the shader. If you’re using a standard forward shading approach, where you’re computing the color/radiance of your fragments as you are rasterizing the original polygonal geometry for scene objects that you want colored (or lit), then you don’t need to compute the WORLD-space position of the fragment from complete scratch. You typically pass in an OBJECT-space position to your vertex shader, and if you pass in a MODELING transform as a uniform, that’ll take those positions to WORLD-space very simply (or more conventionally, pass in a MODELVIEW transform and take those positions to EYE-space). If you want this in the fragment shader, just pass WORLD (or EYE) space positions down to it via an interpolator.

I’m sorry, but I’m still newbie enough to the graphics programming lingo that I don’t fully understand the big picture here. Specifically I’m still fuzzy on the various spaces that the different shader stages operate on. From what I’ve learned (and hopefully have a correct understanding), the vertex shader will calculate vertex positions and will interpolate the position value for the shader stage of that geometry. What space are those vertex positions in within the shader, model space? Are you saying that I could calculate the world-space position in the vertex shader into a varying variable, which I think will interpolate it such that the fragment shader would get the world position based on the interpolation? And are you saying that the depth based calculations like you linked to are for situations where you are beyond that point in the rendering pipeline? I’ve spent a day or so tinkering with some of those functions and haven’t been able to get them working, probably from my lack of understanding. I’ve been passing in a depth of


float depth = gl_FragCoord.z / gl_FragCoord.w;

Most of the uses appear to pull from the depth buffer, which I think I understand as being a deferred rendering implementation since it needs to reconstruct the position from the depth buffer and pixel only. It sounds like you are saying that isn’t necessary for my desired use case, as I would be doing the calculations directly in the fragment shader of the geometry I’m colorizing. Do I have that right?

I don’t think calculating the data at the vertex level is sufficiently detailed. With the mesh being such low poly for navigational purposes, the influence of the events may be only a small radius inside a larger polygon. I would like to be able to visualize the fine grain event distribution within a large room that may only be represented as a rectangle via 2 triangles.

Not to complicate things before I have the basics down, but the reason I mention a floating point FBO is that I also have a longer term goal of wanting the application to normalize the heat map automatically. For example, there could be hundreds of events that overlap a small area, and they must be allowed to accumulate arbitrarily to a large value. The hope would then be to somehow look at the maximum and minimum values and have the fragment shader use that range to perform its colorization. This is a stretch goal, and at first the range will probably be user defined via sliders or a simple GUI or something. Someone said in another forum this might be possible by rendering the entire map to a buffer, and then recursively rendering that frame buffer at half size with a shader that writes, as its pixel value, the min/max weight of the 4 corresponding pixels, effectively propagating the min/max weighting up to a 1x1 texture that I can do a getpixel or whatever on in order to get the weight range automatically from the rendered scene. It sounds rather complex and not something I want to worry about just yet.

Just to give more context, I’m trying to get my feet wet in some graphics and shader stuff in order to make a heat map visualizer for game analytics. Sorta like the ones you’ve probably seen in a game context or a variety of other contexts.

Here is an example of a map. It will generally be very low poly, probably only a few thousand triangles in most cases. In this case the entire level is less than 800 triangles, though it is one of the smaller ones. If you are familiar with the Team Fortress games, this is a navigation mesh from 2fort.

I’m not sure if this is a reasonable expectation, especially since I don’t really see any examples of people doing heatmaps in 3d realtime, but there will be ‘events’ in the thousands, maybe even tens of thousands for long game matches for events such as shots being fired and such. I think in the majority of cases the data will be visualized from a god’s-eye view where most of the data will be in play at a given time. Ideally I would like the viewer to be able to visualize the events in real time, essentially as they come in, such as if the viewer is connected to the game via network. If that is too infeasible, the alternative is to take a file dump of the data set, load and preprocess it, and then be able to fly around it to look at the information. I show the side view of the mesh to show that there is enough overlap in the map geometry that a 2d heatmap is far less useful than a 3d one, though far easier to implement.

From the research I have done and the people I have talked to, some of the implementation possibilities tread on deferred rendering and/or treating the events as lights somehow, but as there may be thousands of them, I get concerned whether those approaches are viable, since they may all be visible most of the time just by the nature of the visualization. When I learned about passing data into the shaders in the form of uniform vectors, I immediately thought that it would be easy to pass the relevant event data into the shader that way in the form of something like this.

uniform int		eventCount;
uniform vec4	events[32]; // this would get much bigger at some point obviously
uniform float       eventWeight[32]; // the strength of the event, additively added to other events of similar type to accumulate arbitrarily
uniform float       eventRadius[32]; // the world distance radius of the event, through which the weight reduces to 0

Then, knowing that world space information in the shader, I was hoping that my fragment shader could basically do something like this pseudocode


float fragmentWeightAccum = 0.0;
for( int i = 0; i < eventCount; i++ )
{
fragmentWeightAccum += CalculateEventEffectOnWorldPosition( fragmentWorldPos, i );
}

gl_FragColor = MapWeightingToColorGradient( fragmentWeightAccum ); // probably by mapping it to a user defined min/max weighting, maybe eventually a rendering trick could provide back the max weighting from all the event blending it performs on the GPU.

I think this is what you mean by calculating the values in the registers of the shader, and not requiring an FBO.

I’m not sure how scalable this would be in terms of event count performance falloff, but it seemed simple enough to try at least. In my day or two of trying to figure out how to get the world fragment position though I’ve been mostly confused by the various ‘spaces’ discussed in the various threads I’ve come across.

Again I appreciate your time and wisdom. I intend to share this publicly as an analytic viewer when it reaches a usable point.

grr, it won’t let me post an image URL of my test map.

Hey, no problem. We’ve all been there.

Here’s an overview. In particular, see the top diagram here:

OBJECT coordinates are typically what you feed the GPU (aka model coordinates). MODELVIEW is a product of two transforms: the MODELING transform, which takes OBJECT-space to WORLD-space, and VIEWING which takes WORLD-space to EYE-space.

The OpenGL Programming Guide has a good chapter named “Viewing” IIRC which describes the transforms if you want more detail. If you have a specific question, feel free to post.

From what I’ve learned (and hopefully have a correct understanding), the vertex shader will calculate vertex positions and will interpolate the position value for the shader stage of that geometry. What space are those vertex positions in within the shader, model space?

The pipeline is flexible, so there are lots of other options here, but most frequently you feed vertex positions into your vertex shader through a vertex attribute populated on the CPU, outside of the GPU program. You can put these input positions in whatever space you want (since you’re writing the vertex shader), but most commonly these are in the OBJECT-space of the model. Via a vertex shader output, the GPU needs to be provided these positions in CLIP-space (see diagram above), so all that’s strictly required is that you transform these input positions to clip space via the MODELING * VIEWING * PROJECTION matrix, aka ModelViewProj.

If you instead wanted to feed world-space positions into your vertex shader you could. In this case, your MODELING transform would just be the identity.

Now, for whatever shader you use to compute point event influences, you’re probably going to want these positions in WORLD or EYE space. And to get that, you just multiply your input OBJECT-space positions by the MODELING transform or MODELVIEW transform, respectively.

Are you saying that I could calculate the world-space position in the vertex shader into a varying variable, which I think will interpolate it such that the fragment shader would get the world position based on the interpolation? And are you saying that the depth based calculations like you linked to are for situations where you are beyond that point in the rendering pipeline?

Yes, and yes. You’ve got it.

…Most of the uses appear to pull from the depth buffer, which I think I understand as being a deferred rendering implementation since it needs to reconstruct the position from the depth buffer and pixel only. It sounds like you are saying that isn’t necessary for my desired use case, as I would be doing the calculations directly in the fragment shader of the geometry I’m colorizing. Do I have that right?

If evaluating your 3D scalar field is sufficiently cheap, yes, exactly.

I don’t think calculating the data at the vertex level is sufficiently detailed. With the mesh being such low poly for navigational purposes, the influence of the events may be only a small radius inside a larger polygon. I would like to be able to visualize the fine grain event distribution within a large room that may only be represented as a rectangle via 2 triangles.

Gotcha. So sampling at the vertices and interpolating is out.

Not to complicate things before I have the basics down, but the reason I mention a floating point FBO is that I also have a longer term goal of wanting the application to normalize the heat map automatically. For example, there could be hundreds of events that overlap a small area, and they must be allowed to accumulate arbitrarily to a large value. The hope would then be to somehow look at the maximum and minimum values and have the fragment shader use that range to perform its colorization. …there will be ‘events’ in the thousands, maybe even tens of thousands for long game matches for events such as shots being fired and such.

I see. That makes sense. Also, this mention of “hundreds” to “tens of thousands” of point influences really casts doubt on whether computing the scalar field directly in the shader while rasterizing the mesh is going to be fast enough at the fragment level.

I think in the majority of cases the data will be visualized from a god’s-eye view where most of the data will be in play at a given time. Ideally I would like the viewer to be able to visualize the events in real time, essentially as they come in, such as if the viewer is connected to the game via network. If that is too infeasible, the alternative is to take a file dump of the data set, load and preprocess it, and then be able to fly around it to look at the information. I show the side view of the mesh to show that there is enough overlap in the map geometry that a 2d heatmap is far less useful than a 3d one, though far easier to implement.

Just to clarify, is the heatmap you want to render only for values sampled on the 2D mesh (which itself is probably 2.5D)? Or do you actually want to render a volumetric field?

From the research I have done and the people I have talked to, some of the implementation possibilities tread on deferred rendering and/or treating the events as lights somehow, but as there may be thousands of them, I get concerned whether those approaches are viable, since they may all be visible most of the time just by the nature of the visualization.

It just depends on your needs (more on this below). Thing is, when you get into the hundreds or thousands of influences, you don’t want to be computing the influence of each of these for every single pixel on the screen if you don’t have to (when you’re aiming for realtime or at least interactive performance that is). Frequently if you have this many on the screen, the area of influence of most items is relatively small. Deferred just takes advantage of that to speed things up.

When I learned about passing data into the shaders in the form of uniform vectors, I immediately thought that it would be easy to pass the relevant event data into the shader that way in the form of something like this.

uniform int        eventCount;
uniform vec4    events[32]; // this would get much bigger at some point obviously
uniform float       eventWeight[32]; // the strength of the event, additively added to other events of similar type to accumulate arbitrarily
uniform float       eventRadius[32]; // the world distance radius of the event, through which the weight reduces to 0

That’s sure what I’d start with, just to get some first renderings up. The issue you run into is there’s a limit on the amount of uniform space you can pass into a shader, so at some point you end up needing to shift your tech approach a bit.

Then, knowing that world space information in the shader, I was hoping that my fragment shader could basically do something like this pseudocode

float fragmentWeightAccum = 0.0;
for( int i = 0; i < eventCount; i++ )
{
fragmentWeightAccum += CalculateEventEffectOnWorldPosition( fragmentWorldPos, i );
}

gl_FragColor = MapWeightingToColorGradient( fragmentWeightAccum ); // probably by mapping it to a user defined min/max weighting, maybe eventually a rendering trick could provide back the max weighting from all the event blending it performs on the GPU.

I think this is what you mean by calculating the values in the registers of the shader, and not requiring an FBO.

Yes, and that’s simplest to start with (I would). But with your goal of thousands to tens of thousands, you’ll probably have to shift approaches as you scale this up.

Yeah, sorry. New users can’t post images for the first few posts – helps thwart those annoying forum spammers that create dummy accounts to post their junk.

Just post the URL without the leading http://, possibly with a few spaces in it to get it posted, and I’ll tweak it to show.

Here are the images I wanted to post earlier, just to give an idea of my test area.

So hey, armed with a bit better understanding from your first post I managed to get something working rather quickly. I think I got hung up on search terms yesterday and was chasing the wrong type of implementation. Thanks again for putting me back on track.

Do you know what the max uniform limit is? Will this method scale up to thousands or tens of thousands of events? If not, would a reasonably performant alternative be to ‘encode’ the events into a large floating point RGBA texture? Unless I reduce the data I get, it may mean using 2 pixel values per event, as I am trying to have per-event x,y,z,radius,weightmin,weightmax. Maybe eventually put some sort of type identifier in as a filtering mechanism. That’s a less ideal use for a float value, but I guess it could work.

With 32 events.
http://i39.tinypic.com/coq52.jpg

You can manipulate the weight max in the parameter list to get a real-time colorization adjustment. Pretty useful for being able to visualize subtle areas of weighting that the large accumulations end up washing out. Would still be nice to be able to algorithmically figure out the min/max value within which to clamp the manual adjustment, or let it auto adjust.

I need to figure out how to get some basic white light shading in there so everything doesn’t look so uniform and flat. Know offhand a simple shading adjustment I can add to the shaders in order to essentially hard code a top down directional white light so I can tell depth and layers apart? Thanks.

Here are the shaders as-is

Vertex

uniform mat4 worldMat;

varying vec3 worldPosition; 

void main()
{
	worldPosition	= ( worldMat * gl_Vertex ).xyz;
	gl_Position		= gl_ModelViewProjectionMatrix * gl_Vertex;
}

// Fragment

varying vec3 worldPosition; 

const int MaxEvents = 32;
uniform int		eventCount;

uniform vec3	events[MaxEvents];
uniform float	eventRadius[MaxEvents];
uniform vec2	eventWeightRange[MaxEvents];

// color normalization
uniform float	weightMax;

void main()
{
	const float eventHeightMax = 16.0;

	float weightAccum = 0.0;
	for( int i = 0; i < eventCount; i++ ) 
	{
		// reject height differences so we dont project through much height variation as we do horizontally
		if ( abs( events[ i ].z - worldPosition.z ) < eventHeightMax )
		{
			// Compute distance between surface and event position 
			float dist = distance( events[ i ], worldPosition );
			if ( dist <= eventRadius[ i ] )
			{
				float distRatio = clamp( dist / eventRadius[ i ], 0.0, 1.0 );		
				weightAccum += mix( eventWeightRange[ i ].y, eventWeightRange[ i ].x, distRatio );
			}
		}
	}

	// the w component is the weight ratio, rather than alpha
	vec4 heatgradient[5];
	heatgradient[ 0 ] = vec4( 1.0, 0.0, 0.0, 1.0 ); // red
	heatgradient[ 1 ] = vec4( 1.0, 1.0, 0.0, 0.75 ); // yellow
	heatgradient[ 2 ] = vec4( 0.0, 1.0, 0.0, 0.50 ); // green
	heatgradient[ 3 ] = vec4( 0.0, 0.0, 1.0, 0.25 ); // blue
	heatgradient[ 4 ] = vec4( 1.0, 0.0, 1.0, 0.00 ); // magenta
	
	if ( weightAccum > 0.0 )
	{
		float weightRatio = clamp( weightAccum / weightMax, 0.0, 1.0 );
		
		vec4 col = vec4( 1 );
		for ( int i = 1; i < 5; ++i )
		{
			if ( weightRatio <= heatgradient[ i-1 ].w && weightRatio >= heatgradient[ i ].w )
			{
				float t = ( weightRatio - heatgradient[ i-1 ].w ) / ( heatgradient[ i ].w - heatgradient[ i-1 ].w );
				col = mix( heatgradient[ i-1 ], heatgradient[ i ], t );
			}
			//col = mix( col, heatgradient[ i ], smoothstep( col.w, heatgradient[ i ].w, weightRatio ));
		}
		
		gl_FragColor = col;
		gl_FragColor.a = 1.0;
	}
	else
	{
		gl_FragColor = vec4( 1.0 );
	}
	
}

Just to clarify, is the heatmap you want to render only for values sampled on the 2D mesh (which itself is probably 2.5D)? Or do you actually want to render a volumetric field?

No volumetric field, basically shading the floor polygons that players/AI walk on.

Yes, and that’s simplest to start with (I would). But with your goal of thousands to tens of thousands, you’ll probably have to shift approaches as you scale this up.

Is the more scalable approach deferred rendering, or are there other alternatives? I am guessing that the costly part of doing it this way, assuming I could get the larger data sets into the shader somehow, like storing them in a big float texture or something, is that each pixel of the rendered object will be looping sequentially through a potentially big data set in order to accumulate its weight information. Even though that data is cache friendly in how it is being searched, it is still touching a lot of data.

Perhaps I could set up some form of grid partitioning where the world space of the pixel could index into a texture and somehow get a far reduced set of data to go through.

Maybe a 2d texture, or a coarse 3d one, effectively treated as an occupancy grid mapped to the dimensions of the world, with all the events rendered into it as simple black or white, such that the heatmap shader can early-out of doing any search at all for pixels that have no event overlap.

Or perhaps there is some sort of trick where I can render simple quads that represent the events in a way that reduces the expensive fragment work to only areas where events exist, so there isn’t a bunch of pixels with no influence needing to run through a bunch of events only to end up with nothing.

It varies by card, but on a relatively recent GPU (GTX580), the max amount of ordinary uniform space for a frag shader is about 2048 32-bit floats (MAX_FRAGMENT_UNIFORM_COMPONENTS), so depending on the amount of space per point event, we’re talking dozens to hundreds assuming there aren’t any other big consumers of uniform space. Past that you could shift to storing in uniform buffer objects, where you get ~14 binding points each of which can hold ~64KB. And you can go to texture or image data past that to further exceed that limit (I would actually go straight to texture and skip UBOs myself).
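
When you do make that jump, the texture route can look roughly like this: two RGBA32F texels per event, along the lines of the x,y,z,radius / weightmin,weightmax packing you mentioned (eventTex and fetchEvent are placeholder names, and texelFetch needs “#version 130” or later at the top of the shader):

// Fragment shader sketch: pull packed event data out of a floating-point
// texture instead of uniform arrays. Assumed layout:
//   texel 2*i     = (x, y, z, radius)
//   texel 2*i + 1 = (weightMin, weightMax, unused, unused)
uniform sampler2D eventTex;     // GL_RGBA32F, width = 2 * eventCount, height = 1
uniform int       eventCount;   // loop bound, same as before

void fetchEvent( int i, out vec3 pos, out float radius, out vec2 weightRange )
{
    vec4 a = texelFetch( eventTex, ivec2( i * 2,     0 ), 0 );
    vec4 b = texelFetch( eventTex, ivec2( i * 2 + 1, 0 ), 0 );
    pos         = a.xyz;
    radius      = a.w;
    weightRange = b.xy;
}

For tens of thousands of events you’d likely wrap this into a multi-row 2D layout rather than one long row (texture widths top out at around 8K to 16K texels), but the idea is the same.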

But I suspect before you even get to that point with the space issue, you’ll find you want another approach just due to time consumption applying all these point sources to every fragment on the screen.

Will this method scale up to thousands or tens of thousands of events?

That depends on your frame rate requirements, target GPU, and complexity of your weight computation. But I’d push your existing technique as far as you can until you know you need a plan B. Then you “know” you need it.

Would still be nice to be able to algorithmically figure out the min/max value within which to clamp the manual adjustment, or let it auto adjust.

You can definitely do that as a post-process. Ping-pong reduction as you described before on the GPU, or for starters just do a CPU readback of the resulting accumulated weights and reduce there (i.e. compute min/max).
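
For reference, one pass of that ping-pong reduction could look something like this (each pass renders into a half-size float target, the very first pass just writes the accumulated weight into both R and G, and the textures want nearest filtering; prevLevel and prevSize are placeholder names):

// Min/max reduction sketch: each output pixel reads a 2x2 block from the
// previous (2x larger) level and writes (min, max) into its R and G channels.
uniform sampler2D prevLevel;  // float texture: R = min weight, G = max weight
uniform vec2      prevSize;   // size of the previous level in pixels

void main()
{
    // center of the top-left texel of the corresponding 2x2 block in prevLevel
    vec2 base = ( gl_FragCoord.xy - vec2( 0.5 ) ) * 2.0 + vec2( 0.5 );

    vec4 s0 = texture2D( prevLevel, ( base + vec2( 0.0, 0.0 ) ) / prevSize );
    vec4 s1 = texture2D( prevLevel, ( base + vec2( 1.0, 0.0 ) ) / prevSize );
    vec4 s2 = texture2D( prevLevel, ( base + vec2( 0.0, 1.0 ) ) / prevSize );
    vec4 s3 = texture2D( prevLevel, ( base + vec2( 1.0, 1.0 ) ) / prevSize );

    float lo = min( min( s0.r, s1.r ), min( s2.r, s3.r ) );
    float hi = max( max( s0.g, s1.g ), max( s2.g, s3.g ) );

    gl_FragColor = vec4( lo, hi, 0.0, 1.0 );
}

Repeat until you’re down to a 1x1 target, then read that single pixel back to get your min/max range.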

I need to figure out how to get some basic white light shading in there so everything doesn’t look so uniform and flat. Know offhand a simple shading adjustment I can add to the shaders in order to essentially hard code a top down directional white light so I can tell depth and layers apart?

Just mixing a dot( normal, lightvector ) term in there, attenuated to taste, will get you a long way.
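
Untested sketch, assuming your nav mesh has vertex normals to feed in. worldNormal is a varying you’d add, set in the vertex shader from gl_Normal and your worldMat:

// Fragment-shader side of a hard-coded top-down white directional light.
// Assumes the vertex shader also does something like:
//     worldNormal = normalize( ( worldMat * vec4( gl_Normal, 0.0 ) ).xyz );
varying vec3 worldNormal;

vec3 applyTopDownLight( vec3 color )
{
    const vec3 lightDir = vec3( 0.0, 0.0, 1.0 );   // direction toward an overhead light, assuming +Z is "up"
    float ndotl = max( dot( normalize( worldNormal ), lightDir ), 0.0 );
    float shade = 0.4 + 0.6 * ndotl;               // ambient + diffuse terms, tweak to taste
    return color * shade;
}

Then just scale your final color at the end of main(): gl_FragColor.rgb = applyTopDownLight( gl_FragColor.rgb );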

Ok so I moved the event data into a floating point texture so I could crank up the numbers. Couple oddities.

[attached screenshot]

I would have thought that colorization via a fragment shader would not be able to z-fight, but you can see some z fighting when zoomed out. Is there something that can cause z fighting like this with shader work?

Here is a closer view
[attached screenshot]

Secondly, it appears to be fill rate limited, as scaling the window down or up affects the performance significantly. I basically expect this due to the complexity of the shader at the moment, but the part I didn’t expect is that this performance is also reflected in CPU usage in task manager. I would have thought fill rate limitations would be on the GPU side. Even with the render calls being blocking calls I guess I would expect the program to basically block, and not be reflected in terms of CPU usage.

Hmm… Ok, just to verify, you’re still doing the frag shader loop over all events, and this is firing as you rasterize the poly surfaces of your level, right? Also, you don’t have nearly coincident surfaces do you?

Try pushing your near clip out and/or pulling your far clip in (glFrustum) – mainly the former. If you don’t see any changes in the artifact, it’s not z-fighting as in normal depth buffer fighting.

So the next question that arises is whether it’s a function of the weight computation algorithm you are using. For instance, if your level floors are exactly 8 meters apart, are you using the number 8 as a hard cut-off for the influence distance of an event on a fragment? If so, it could be that the fighting is actually there in your shader logic. For instance, instead of using a step function (e.g. step()), try smoothstep() or similar, which you can use to fade out the influence over a distance range.
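
For example, using the variable names from the shader you posted earlier, the hard height rejection could become a fade, something along these lines (the 0.75 start-of-fade factor is just an arbitrary choice):

// Sketch: soften the hard height cutoff so surfaces near the boundary fade
// out instead of popping. Drop-in replacement for the body of the event loop.
float dz    = abs( events[ i ].z - worldPosition.z );
float zFade = 1.0 - smoothstep( eventHeightMax * 0.75, eventHeightMax, dz );

if ( zFade > 0.0 )
{
    float dist = distance( events[ i ], worldPosition );
    if ( dist <= eventRadius[ i ] )
    {
        float distRatio = clamp( dist / eventRadius[ i ], 0.0, 1.0 );
        weightAccum += zFade * mix( eventWeightRange[ i ].y, eventWeightRange[ i ].x, distRatio );
    }
}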

In any case, try varying your weighting function to see if it has an influence over the artifact.

Some things to check:

  • Are you running sync-to-vblank (and with double-buffering), or are you free running? The latter of course will drive up your CPU. You want the former.
  • Do you have a glFinish() after your SwapBuffers call? You want this. Otherwise the GPU will read ahead and start queuing up subsequent frames.
  • Are you submitting your batches to the GPU fairly efficiently (i.e. minimizing state changes, not using immediate mode, etc.)?
  • Are you doing a fair amount of CPU app-side work in just submitting your mesh for rendering?
  • Are you running with a decent desktop GPU card, or running off an integrated GPU (GPU integrated into the CPU)? Obviously the former is likely to perform considerably better and without slamming your CPU.
  • Are you running with the Windows compositor enabled (Aero, DWM, or whatever name it’s masquerading under nowadays)? If so, try disabling that. It’s a waste of cycles. Old rumors were that fullscreening a 3D app would do this as well as lend your app use of vsync, but I don’t keep up with Microsoft annoyances like this.
  • Which GPU vendor/driver version are you running? glGetString() with GL_VENDOR, GL_RENDERER, and GL_VERSION may be useful here.
  • The way some GL drivers “sleep” until vsync can be configured, because some mechanisms result in very high CPU utilization (needless thread preemption, etc.). For instance, NVidia allows you to flip between usleep(0), sched_yield(), and busy wait (the latter two may yield high CPU utilization).

Hmm, I see the artifacts have something to do with my z height clamp, although I’m not sure why. The idea is to only accumulate events within a z height tolerance of 16. It’s a hard cutoff. I could see why artifacts might occur on surfaces that are right at 16 units apart from an entity, where maybe certain pixels are being rejected and others aren’t due to floating point issues or something? If I comment out the world z rejection it doesn’t show these artifacts. I’ll tinker with it some more.

if ( abs( eventInfo.z - worldPosition.z ) < eventHeightMax )

Vendor "NVIDIA Corporation"
Renderer "GeForce GTX 460 SE/PCIe/SSE2"
Version "4.3.0"

I’m running with vsync enabled. My app is doing nothing but rendering the world mesh from a VBO with the shader. It’s a pretty trivial draw loop. I don’t get why the CPU is getting so hammered by what is apparently fill rate, as it scales with the window size. I mean, I sort of expect poor performance for now, but not reflected in insane CPU usage. In the profiler all the CPU time appears to be going to the driver and thread-wait related functions. Where do you configure how it sleeps?

Hmm… Sounds like it may be how the driver is waiting on events. Either 1) waiting on space to open up in the GPU command buffer, or 2) waiting on vsync.

You might check the NVidia driver README.txt file for Windows (probably installed with your driver) to see what it says about configuring yield behavior. The Linux driver README says this:


11E. OPENGL YIELD BEHAVIOR
There are several cases where the NVIDIA OpenGL driver needs to wait for
external state to change before continuing. To avoid consuming too much CPU
time in these cases, the driver will sometimes yield so the kernel can
schedule other processes to run while the driver waits. For example, when
waiting for free space in a command buffer, if the free space has not become
available after a certain number of iterations, the driver will yield before
it continues to loop.

By default, the driver calls sched_yield() to do this. However, this can cause
the calling process to be scheduled out for a relatively long period of time
if there are other, same-priority processes competing for time on the CPU. One
example of this is when an OpenGL-based composite manager is moving and
repainting a window and the X server is trying to update the window as it
moves, which are both CPU-intensive operations.

You can use the __GL_YIELD environment variable to work around these
scheduling problems. This variable allows the user to specify what the driver
should do when it wants to yield. The possible values are:
    __GL_YIELD         Behavior
    ---------------    ------------------------------------------------------
    <unset>            By default, OpenGL will call sched_yield() to yield.
    "NOTHING"          OpenGL will never yield.
    "USLEEP"           OpenGL will call usleep(0) to yield.

There’s probably an analogous setting for Windows, but I don’t know what it is.

I have verified before that USLEEP can sometimes greatly decrease driver CPU utilization on Linux.

The above description also prompts the possibility that it might be the Windows compositor that’s eating your CPU. Might disable that. Full-screening your GL app might do that, but that’ll also drive up your fill.

To eliminate your code as a possible cause when diagnosing this CPU problem, I’d cook a simple GL app that just clears the screen to random colors every frame. Should be very easy on the fill situation in your app and largely result in the driver blocked waiting on vsync in the “no compositor” case. But with a compositor (Aero/DWM/etc.), there’s obviously more work to be done behind the scenes each time you redraw your window. Who knows what that costs…

Also, I vaguely remember hearing that Windows folks often want to disable NVidia’s “threaded optimization” to get rid of high CPU utilization.

Here’s a random websearch hit:
