Thread: Heat map visualization shader help please

  1. #11
    Dark Photon (Senior Member, OpenGL Guru)
    Quote Originally Posted by JSwigart:
    I would have thought that colorization via a fragment shader would not be able to z-fight, but you can see some z fighting when zoomed out. Is there something that can cause z fighting like this with shader work?
    Hmm... OK, just to verify: you're still doing the frag shader loop over all events, and this is firing as you rasterize the poly surfaces of your level, right? Also, you don't have nearly coincident surfaces, do you?

    Try pushing your near clip out and/or pulling your far clip in (glFrustum) -- mainly the former. If you don't see any changes in the artifact, it's not z-fighting as in normal depth buffer fighting.
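    For illustration only, here is a minimal C sketch of that kind of clip-plane adjustment (using gluPerspective instead of raw glFrustum, and assuming a current legacy-GL context; the function name and the fov/plane values are placeholders, not anything from this thread):

    Code :
    #include <GL/gl.h>
    #include <GL/glu.h>   /* gluPerspective */

    /* Depth precision is spent mostly near the near plane, so pushing
       zNear out (e.g. 0.01 -> 1.0) helps far more than pulling zFar in. */
    void set_projection(double aspect)
    {
        glMatrixMode(GL_PROJECTION);
        glLoadIdentity();
        gluPerspective(60.0, aspect, 1.0, 4096.0);   /* was, say, 0.01 .. 100000.0 */
        glMatrixMode(GL_MODELVIEW);
    }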

    So the next question is whether it's a function of the weight computation algorithm you are using. For instance, if your level floors are exactly 8 meters apart, are you using the number 8 as a hard cut-off for the influence distance of an event on a fragment? If so, it could be that the fighting is actually there in your shader logic. For instance, instead of using a step function (e.g. step()), try smoothstep() or similar, which you can use to fade out the influence over a distance range.

    In any case, try varying your weighting function to see if it has an influence over the artifact.
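    To make the step-versus-smoothstep difference concrete, here is a small C sketch of the two weighting styles (the names influenceRadius and fadeBand are placeholders; in the fragment shader you would use the GLSL step()/smoothstep() built-ins directly):

    Code :
    #include <math.h>

    /* Hard cutoff: the weight snaps from 1 to 0 exactly at influenceRadius,
       so fragments sitting right at that boundary can flip in and out. */
    float weight_hard(float dist, float influenceRadius)
    {
        return (dist < influenceRadius) ? 1.0f : 0.0f;   /* like 1.0 - step(r, d) */
    }

    /* Smooth falloff: the weight fades from 1 to 0 over
       [influenceRadius - fadeBand, influenceRadius]. */
    float weight_smooth(float dist, float influenceRadius, float fadeBand)
    {
        float t = (dist - (influenceRadius - fadeBand)) / fadeBand;
        t = fminf(fmaxf(t, 0.0f), 1.0f);
        float s = t * t * (3.0f - 2.0f * t);   /* same curve as smoothstep() */
        return 1.0f - s;
    }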

  2. #12
    Dark Photon (Senior Member, OpenGL Guru)
    Quote Originally Posted by JSwigart:
    Secondly, it appears to be fill rate limited, as scaling the window down or up affects the performance significantly. I basically expect this due to the complexity of the shader at the moment, but the part I didn't expect is that this performance is also reflected in CPU usage in task manager. I would have thought fill rate limitations would be on the GPU side. Even with the render calls being blocking calls I guess I would expect the program to basically block, and not be reflected in terms of CPU usage.
    Some things to check:

    - Are you running sync-to-vblank (and with double-buffering), or are you free running? The latter of course will drive up your CPU. You want the former.
    - Do you have a glFinish() after your SwapBuffers call? You want this; otherwise the driver will read ahead and start queuing up subsequent frames (see the sketch after this list).
    - Are you submitting your batches to the GPU fairly efficiently (i.e. minimizing state changes, not using immediate mode, etc.)?
    - Are you doing a fair amount of CPU app-side work in just submitting your mesh for rendering?
    - Are you running with a decent desktop GPU card, or running off an integrated GPU (GPU integrated into the CPU)? Obviously the former is likely to perform considerably better and without slamming your CPU.
    - Are you running with the Windows compositor enabled (Aero, DWM, or whatever name it's masquerading under nowadays)? If so, try disabling it; it's a waste of cycles. Old rumors were that full-screening a 3D app would do this as well as lend your app use of vsync, but I don't keep up with Microsoft annoyances like this.
    - Which GPU vendor/driver version are you running? glGetString() with GL_VENDOR, GL_RENDERER, and GL_VERSION may be useful here.
    - The way some GL drivers "sleep" until vsync can be configured, because some mechanisms result in very high CPU utilization (needless thread preemption, etc.). For instance, NVidia lets you flip between usleep(0), sched_yield(), and a busy wait (the latter two may yield high CPU utilization).
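    As a point of reference for the first two items and the glGetString() check, here is a minimal sketch of what that end-of-frame sequence could look like (GLFW is just my choice of windowing library here, and the function names are placeholders, not anything from this thread):

    Code :
    #include <stdio.h>
    #include <GLFW/glfw3.h>   /* also pulls in the GL headers */

    /* Print driver identification once at startup (requires a current context). */
    static void print_gl_info(void)
    {
        printf("Vendor   : %s\n", (const char *)glGetString(GL_VENDOR));
        printf("Renderer : %s\n", (const char *)glGetString(GL_RENDERER));
        printf("Version  : %s\n", (const char *)glGetString(GL_VERSION));
    }

    /* End-of-frame sequence: swap on vsync (glfwSwapInterval(1) set once after
       context creation), then block until the GPU has caught up so the driver
       can't queue several frames ahead. */
    static void end_frame(GLFWwindow *win)
    {
        glfwSwapBuffers(win);
        glFinish();
        glfwPollEvents();
    }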

  3. #13
    JSwigart (Junior Member)
    Hmm, I see the artifacts have something to do with my z height clamp, although I'm not sure why. The idea is to only accumulate events within a z height tolerance of 16. It's a hard cutoff. I could see why artifacts might occur on surfaces that are right at 16 units from an entity, where certain pixels are rejected and others aren't due to floating-point issues or something like that. If I comment out the world z rejection it doesn't show these artifacts. I'll tinker with it some more.

    Code :
    // hard cutoff: fragments sitting right at eventHeightMax can flip in and out of the accumulation
    if ( abs( eventInfo.z - worldPosition.z ) < eventHeightMax )

    Code :
    Vendor "NVIDIA Corporation"
    Renderer "GeForce GTX 460 SE/PCIe/SSE2"
    Version "4.3.0"

    I have vsync enabled. My app is doing nothing but rendering; the world mesh is drawn from a VBO with the shader. It's a pretty trivial draw loop. I don't get why the CPU is getting so hammered by what is apparently fill rate, since it scales with the window size. I mean, I sort of expect poor performance for now, but not reflected in insane CPU usage. In the profiler all the CPU time appears to be going towards the driver and thread-wait-related functions. Where do you configure how it sleeps?

  4. #14
    Dark Photon (Senior Member, OpenGL Guru)
    Hmm... Sounds like it may be how the driver is waiting on events. Either 1) waiting on space to open up in the GPU command buffer, or 2) waiting on vsync.

    You might check the NVidia driver README.txt file for Windows (probably installed with your driver) to see what it says about configuring yield behavior. The Linux driver README says this:

    Code :
    11E. OPENGL YIELD BEHAVIOR
    There are several cases where the NVIDIA OpenGL driver needs to wait for
    external state to change before continuing. To avoid consuming too much CPU
    time in these cases, the driver will sometimes yield so the kernel can
    schedule other processes to run while the driver waits. For example, when
    waiting for free space in a command buffer, if the free space has not become
    available after a certain number of iterations, the driver will yield before
    it continues to loop.
     
    By default, the driver calls sched_yield() to do this. However, this can cause
    the calling process to be scheduled out for a relatively long period of time
    if there are other, same-priority processes competing for time on the CPU. One
    example of this is when an OpenGL-based composite manager is moving and
    repainting a window and the X server is trying to update the window as it
    moves, which are both CPU-intensive operations.
     
    You can use the __GL_YIELD environment variable to work around these
    scheduling problems. This variable allows the user to specify what the driver
    should do when it wants to yield. The possible values are:
        __GL_YIELD         Behavior
        ---------------    ------------------------------------------------------
        <unset>            By default, OpenGL will call sched_yield() to yield.
        "NOTHING"          OpenGL will never yield.
        "USLEEP"           OpenGL will call usleep(0) to yield.

    There's probably an analogous setting for Windows, but I don't know what it is.

    I have verified before that USLEEP can sometimes greatly decrease driver CPU utilization on Linux.

    The above description also prompts the possibility that it might be the Windows compositor that's eating your CPU. You might disable that. Full-screening your GL app might disable it for you, but that'll also drive up your fill.

    To eliminate your code as a possible cause when diagnosing this CPU problem, I'd cook up a simple GL app that just clears the screen to random colors every frame (see the sketch below). It should be much easier on fill than your app, and in the "no compositor" case it should leave the driver mostly blocked waiting on vsync. But with a compositor (Aero/DWM/etc.), there's obviously more work to be done behind the scenes each time you redraw your window. Who knows what that costs...
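    A minimal sketch of such a test app, assuming GLFW for window/context creation (the library choice, window size, and title are mine, not anything from this thread):

    Code :
    #include <stdlib.h>
    #include <GLFW/glfw3.h>

    /* Do-nothing test app: clear to a random color every frame. With vsync on
       and no compositor, the CPU should spend nearly all its time blocked in
       SwapBuffers/glFinish waiting on vblank. */
    int main(void)
    {
        if (!glfwInit()) return 1;
        GLFWwindow *win = glfwCreateWindow(800, 600, "vsync CPU test", NULL, NULL);
        if (!win) { glfwTerminate(); return 1; }
        glfwMakeContextCurrent(win);
        glfwSwapInterval(1);   /* sync-to-vblank */

        while (!glfwWindowShouldClose(win)) {
            glClearColor((float)rand() / RAND_MAX,
                         (float)rand() / RAND_MAX,
                         (float)rand() / RAND_MAX, 1.0f);
            glClear(GL_COLOR_BUFFER_BIT);
            glfwSwapBuffers(win);
            glFinish();
            glfwPollEvents();
        }
        glfwTerminate();
        return 0;
    }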

  5. #15
    Dark Photon (Senior Member, OpenGL Guru)
    Also, I vaguely remember hearing that Windows folks often want to disable NVidia's "threaded optimization" setting to get rid of high CPU utilization.

    Here's a random websearch hit:

    * http://wiki.phoenixviewer.com/100_cpu
