Part of the Khronos Group
OpenGL.org

Thread: Bindless Stuff

  1. #21
    Senior Member OpenGL Guru
    Join Date
    May 2009
    Posts
    4,948

    Re: Bindless Stuff

    I get results like this:
    Please use code tags to make tables like that more legible.

  2. #22
    Senior Member OpenGL Guru Dark Photon's Avatar
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    3,126

    Re: Bindless Stuff

    Quote Originally Posted by Dan Bartlett
    It was on a Nvidia GeForce 9500 card...
    Here are my results when drawing fairly large triangles
    Thanks for the tests, but this test hardware and method make me suspect that you are very likely GPU limited much of the time (large triangles = lots of fill, and this is a slow card).

    Where you are going to see the most benefit from bindless is where you're "not" waiting on your GPU to get the work done. You're waiting on your CPU to pump the batches. That is, in cases where your GPU is fairly fast and your CPU/CPU memory is relatively slow, such that you just can't keep the GPU fed.

    Also as Alfonse pointed out, for the maximum benefit, you need to be rendering a lot of "different" batches from different buffers. This maximizes your chance for cache misses, which is where bindless shines. Also, don't render super large triangles. To maximize the bindless benefit, the goal is not to be GPU limited here.

    VAOs were in many cases the same speed as just using VBOs, possibly when limited by something else...
    Which strongly suggests your test program is not CPU/batch submit limited for those cases, which is where you're going to get the max speed-up from bindless.

    It was mostly to see if I could reach anywhere near the 7x speedup that was achieved in NVidia's test-case, and where/when bindless started to have an effect.
    To maximize bindless benefit, you want a fast GPU and a relatively slow CPU/CPU mem (e.g. slow memory clock, smaller memory caches, etc.) and batches that aren't super-huge (more CPU batch setup overhead). The benefit is going to be different for different hardware, but it shouldn't ever net you a loss.

    To make that test setup even uglier, run other threads on other cores that share the same CPU caches, so they push your data out of the cache and cause more cache misses. But just running enough different batches through one thread should do that too.
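
    The GPU-limited vs. CPU-limited argument above can be put as a toy model. This is only a sketch in Python, assuming the CPU submits batches while the GPU renders in perfect overlap, so a frame costs whichever side is slower. The function name and every number below are made up for illustration; none of them are measurements from this thread.

```python
def frame_ms(batches: int, cpu_us_per_batch: float, gpu_ms: float) -> float:
    """Toy model: frame time is the slower of CPU batch submission and GPU work."""
    cpu_ms = batches * cpu_us_per_batch / 1000.0
    return max(cpu_ms, gpu_ms)


# Hypothetical per-batch CPU costs (microseconds), NOT measured values.
CLASSIC_US = 5.0
BINDLESS_US = 1.0

# Case 1: few batches, big triangles, slow GPU -> GPU limited.
gpu_bound_before = frame_ms(2000, CLASSIC_US, gpu_ms=30.0)
gpu_bound_after = frame_ms(2000, BINDLESS_US, gpu_ms=30.0)

# Case 2: many small batches, fast GPU -> CPU (batch submit) limited.
cpu_bound_before = frame_ms(10000, CLASSIC_US, gpu_ms=8.0)
cpu_bound_after = frame_ms(10000, BINDLESS_US, gpu_ms=8.0)

print("GPU limited speedup:", gpu_bound_before / gpu_bound_after)
print("CPU limited speedup:", cpu_bound_before / cpu_bound_after)
```

    In this model, cheaper batches change nothing while the GPU is the bottleneck, and only pay off once batch submission is what you're waiting on.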

    Bindless + VAO combined seem to be faster than VBOs, but not as fast as bindless by itself
    That is my experience too. Don't stack VAOs on top of bindless -- you lose perf. Bindless gives you everything VAOs give you and more.

    This may be due to the expense of having a bazillion little VAOs floating around in the driver that are otherwise each causing cache misses when accessed. Dunno. But bindless apparently avoids this overhead by letting you store nearly all of the VAO state on your side in the data structures you store your batches in, which are already in the cache at that point anyway while you're submitting draw calls.

    Quote Originally Posted by Alfonse Reinheart
    FPS is not a useful measure; milliseconds is.
    Right (emphasis mine):

    * Performance (Humus)
    * The evils of fps

    Besides including irrelevant "cruft", FPS is the inverse of time, and thus varies non-linearly with time (which is one reason it's fairly useless). For instance, the performance difference between 80 and 90 fps is actually greater than (i.e. more impressive than) the performance difference between 125 fps and 150 fps. Why? Well, invert to seconds/frame and see. And if you have to invert to make sense out of this nonsense anyway, why use FPS at all? Just use milliseconds (ms).
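
    A quick sketch of that inversion, in Python purely for illustration (the helper names are made up here, not taken from the linked articles):

```python
def frame_time_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def saved_ms(fps_before: float, fps_after: float) -> float:
    """Frame time saved (in ms) when going from fps_before to fps_after."""
    return frame_time_ms(fps_before) - frame_time_ms(fps_after)


slow_pair = saved_ms(80, 90)    # 12.5 ms -> about 11.11 ms
fast_pair = saved_ms(125, 150)  # 8.0 ms  -> about 6.67 ms

print(f"80 -> 90 fps saves   {slow_pair:.2f} ms per frame")
print(f"125 -> 150 fps saves {fast_pair:.2f} ms per frame")
```

    The 10 fps gain at the low end saves slightly more time per frame than the 25 fps gain at the high end, which is exactly what the raw FPS numbers hide.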


  3. #23
    Senior Member OpenGL Guru Dark Photon's Avatar
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    3,126

    Re: Bindless Stuff

    Quote Originally Posted by DmitryM
    I don't get it. Who cares about objects with less than 100 vertices? Drawing many of them without any instancing method involved is not efficient from the start.
    Thing is, sometimes you want to draw 1000 little boxes, or 5000 little balls, all photocopies of each other (or slight munges). In those cases, instancing shines. (...if you don't care about culling efficiency.)

    But sometimes you really do want lots of varied content, and instancing is like hammering in a screw. It's not the right solution.

    You want cheaper batches. And that's what bindless gives you.

    It also avoids some of the contortions you end up doing to efficiently cull instances. Instances can really kill your perf through loss of frustum-culling granularity if you're not careful. Faster batches mean you can tolerate smaller instance groups, which means better frustum culling from the get-go.

  4. #24
    Senior Member OpenGL Guru
    Join Date
    May 2009
    Posts
    4,948

    Re: Bindless Stuff

    Instances can really kill your perf through loss of frustum-culling granularity if you're not careful.
    If by "careful" you mean "I frustum cull my instances", then yes.

    Instancing has nothing to do with frustum culling, unless you're only thinking of static instancing. In which case, you should say that.
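
    What "I frustum cull my instances" could look like, as a minimal sketch in Python rather than real GL code. It assumes each instance has a bounding sphere and that the frustum is given as inward-facing planes with unit normals; all names and the box-shaped stand-in "frustum" are hypothetical. The surviving indices are what you would write into the per-instance buffer before issuing the instanced draw call.

```python
from typing import List, Sequence, Tuple

Plane = Tuple[float, float, float, float]   # (nx, ny, nz, d); inside when n.p + d >= 0
Sphere = Tuple[float, float, float, float]  # (cx, cy, cz, radius)


def sphere_visible(sphere: Sphere, planes: Sequence[Plane]) -> bool:
    """Conservative test: reject only if the sphere is fully behind some plane."""
    cx, cy, cz, radius = sphere
    for nx, ny, nz, d in planes:
        if nx * cx + ny * cy + nz * cz + d < -radius:
            return False
    return True


def visible_instances(spheres: Sequence[Sphere], planes: Sequence[Plane]) -> List[int]:
    """Indices of the instances to draw this frame."""
    return [i for i, s in enumerate(spheres) if sphere_visible(s, planes)]


# Stand-in "frustum": an axis-aligned box from -10 to +10 on every axis.
box_planes = [
    (1.0, 0.0, 0.0, 10.0), (-1.0, 0.0, 0.0, 10.0),
    (0.0, 1.0, 0.0, 10.0), (0.0, -1.0, 0.0, 10.0),
    (0.0, 0.0, 1.0, 10.0), (0.0, 0.0, -1.0, 10.0),
]

instances = [
    (0.0, 0.0, 0.0, 1.0),    # well inside
    (20.0, 0.0, 0.0, 1.0),   # fully outside
    (10.5, 0.0, 0.0, 1.0),   # straddles the +x plane, so it is kept
]

visible = visible_instances(instances, box_planes)
print(visible)
```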

  5. #25
    Senior Member OpenGL Pro Aleksandar's Avatar
    Join Date
    Jul 2009
    Posts
    1,076

    Re: Bindless Stuff

    I'm glad that bindless has finally gotten this much attention (after one year of existence).

    Well, I don't like generic tests because they show nothing. If someone reports a 2x speed boost in a real application, then that deserves respect. Bindless can achieve that if there are thousands of VBOs, even on fast CPUs with enough cache.

    Before going deeper into analysis, it would be useful to clarify some facts about the test.

    First, is there a glFinish() call at the end of the drawing method? If there is no such call, then the results are not valid. I have a lot of experience with NVIDIA drivers on Windows, and my early tests (a few years ago) were not valid because of that.

    Second, what method (function) is used to measure the time? On Windows I suggest using QueryPerformanceCounter. (Don't laugh at me, I know about the bugs on some motherboards, but those are in the past, and even if you still have such a board, measuring small intervals rules out the error.)

    Third, it can be useful to find the bottleneck of the application to justify the frame rate. Currently I'm investigating debuggers/profilers for OpenGL in order to clean up my code. Those tools can really be useful. (By the way, I'm a little bit disappointed by Nexus. Or maybe I expected too much...)

    As for the units of the measured values, my opinion is that a pseudo frame rate is much better than ms for most readers. The pseudo frame rate is just the inverse of the number of seconds something takes, but the measuring interval ends before SwapBuffers (or a similar frame terminator), and any screen synchronization should be disabled.

  6. #26
    Senior Member OpenGL Guru knackered's Avatar
    Join Date
    Aug 2001
    Location
    UK
    Posts
    2,833

    Re: Bindless Stuff

    Alfonse, what is your problem? You appear to be attacking someone who is trying to help and, unless I missed a paypal debit, he's doing it for free. Measure your language or bugger off. I'm finding this useful. To everyone else, thank you for your efforts. I shall continue reading until I have something to contribute.

  7. #27
    Member Regular Contributor
    Join Date
    Aug 2008
    Posts
    433

    Re: Bindless Stuff

    Quote Originally Posted by Aleksandar
    First, is there a glFinish() call at the end of the drawing method? If there is no such call, then the results are not valid. I have a lot of experience with NVIDIA drivers on Windows, and my early tests (a few years ago) were not valid because of that.
    No, but performance is measured over many frames, so it shouldn't be needed, should it? (Well, maybe one glFinish() after the final frame, but the code always keeps a running total of the frame rate, so calling glFinish() after every frame would hurt performance.)

    Quote Originally Posted by Aleksandar
    Second, what method (function) is used to measure the time? On Windows I suggest using QueryPerformanceCounter. (Don't laugh at me, I know about the bugs on some motherboards, but those are in the past, and even if you still have such a board, measuring small intervals rules out the error.)
    It's using the built-in GLScene performance monitoring code; the relevant parts look like this:

    Code :
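    // Stripped-down render loop (excerpt from a method, not compilable on its own)
    if FFrameCount = 0 then
      QueryPerformanceCounter(FFirstPerfCounter);  // timestamp just before the first frame
    { ... render scene ... }
    if not (roNoSwapBuffers in ContextOptions) then
      RenderingContext.SwapBuffers;
    Inc(FFrameCount);
    QueryPerformanceCounter(perfCounter);
    Dec(perfCounter, FFirstPerfCounter);  // ticks elapsed since the first frame began

    // Average FPS = frames * (counter ticks per second) / elapsed ticks
    if perfCounter > 0 then
      FFramesPerSecond := (FFrameCount * vCounterFrequency) / perfCounter;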
    Code :
    TGLSceneBuffer = class(TGLUpdateAbleObject)
      ...
      public
        {: Current FramesPerSecond rendering speed.<p>
           You must keep the renderer busy to get accurate figures from this
           property.<br>
           This is an average value; to reset the counter, call
           ResetPerformanceMonitor. }
        property FramesPerSecond: Single read FFramesPerSecond;
        {: Resets the performance monitor and begins a new statistics set.<p>
           See FramesPerSecond. }
        procedure ResetPerformanceMonitor;
    end;
    Code :
    procedure TGLSceneBuffer.ResetPerformanceMonitor;
    begin
      FFramesPerSecond := 0;
      FFrameCount := 0;
      FFirstPerfCounter := 0;
    end;

    It basically just counts the frames rendered since the last performance monitor reset and divides that by the time elapsed from just before the first of those frames was rendered to just after the most recent one.

    Typically you'd just query "FramesPerSecond" every couple of seconds and reset the performance monitor straight afterwards with "ResetPerformanceMonitor()", so that the displayed frame rate stays responsive.

    Code :
    FPS := GLSceneViewer1.FramesPerSecond;
    GLSceneViewer1.ResetPerformanceMonitor();
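
    To make the arithmetic concrete, here is a small Python model of that running average. It is only a sketch mirroring the Delphi excerpt above, not GLScene's code: the class and method names are hypothetical, and the tick values are fed in by hand instead of coming from QueryPerformanceCounter.

```python
class PerfMonitor:
    """Running-average FPS since the last reset, driven by raw counter ticks."""

    def __init__(self, ticks_per_second: int) -> None:
        self.ticks_per_second = ticks_per_second
        self.reset()

    def reset(self) -> None:
        self.frame_count = 0
        self.first_tick = 0
        self.fps = 0.0

    def begin_frame(self, now_ticks: int) -> None:
        # Remember the timestamp taken just before the first frame after a reset.
        if self.frame_count == 0:
            self.first_tick = now_ticks

    def end_frame(self, now_ticks: int) -> None:
        self.frame_count += 1
        elapsed = now_ticks - self.first_tick
        if elapsed > 0:
            self.fps = self.frame_count * self.ticks_per_second / elapsed


monitor = PerfMonitor(ticks_per_second=1_000_000)

# Simulate 100 frames that each take 10,000 ticks (10 ms).
now = 0
for _ in range(100):
    monitor.begin_frame(now)
    now += 10_000
    monitor.end_frame(now)
fps_first_window = monitor.fps

# Query, reset, then simulate 50 slower frames of 20,000 ticks (20 ms) each.
monitor.reset()
for _ in range(50):
    monitor.begin_frame(now)
    now += 20_000
    monitor.end_frame(now)
fps_second_window = monitor.fps

print(fps_first_window, fps_second_window)
```

    Without the reset, the second window's slowdown would be averaged in with the first 100 frames, which is why the displayed value only stays responsive if you reset after each query.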

  8. #28
    Super Moderator Frequent Contributor Groovounet's Avatar
    Join Date
    Jul 2004
    Posts
    934

    Re: Bindless Stuff

    Shouldn't you use a timer query?
    http://www.opengl.org/registry/specs...imer_query.txt

    Great job anyway, Dan. It's complicated to get this sort of thing sorted out, but it's good to have some numbers.

    (I am still wondering how VAOs ended up in OpenGL 3. Who could ever see any good in this feature, designed this way? Oo)

  9. #29
    Senior Member OpenGL Guru
    Join Date
    May 2009
    Posts
    4,948

    Re: Bindless Stuff

    I am still wondering how VAOs ended up in OpenGL 3. Who could ever see any good in this feature, designed this way?
    Apple has had VAOs around for years. It seemed to work for them.

  10. #30
    Super Moderator Frequent Contributor Groovounet's Avatar
    Join Date
    Jul 2004
    Posts
    934

    Re: Bindless Stuff

    Define "work"?
