Thread: VBOs strangely slow?

  1. #11
    Advanced Member Frequent Contributor
    Join Date
    May 2001
    Posts
    566

    Re: VBOs strangely slow?

    Try not using VBOs, then.

    The question is, why do they push the use of a new feature if it's not implemented well?

    Conclusion: even with traditional glBegin/glEnd you can get outstanding performance, as long as you optimize algorithmically rather than at the instruction/pixel/hardware level.

    Personally, I only believe in the hardware rasterizer as a fast alternative to software rasterization. Other than that, try using a pure shader path and see whether it's slower or faster.

  2. #12
    Senior Member OpenGL Guru
    Join Date
    May 2009
    Posts
    4,948

    Re: VBOs strangely slow?

    "Try not using VBOs, then.

    The question is, why do they push the use of a new feature if it's not implemented well?

    Conclusion: even with traditional glBegin/glEnd you can get outstanding performance, as long as you optimize algorithmically rather than at the instruction/pixel/hardware level."

    Did you read the thread? He said, "In the end, glMapBuffer was (much) faster; preceding it with a glBufferData with a null pointer (discarding its contents), 10% faster than the vertex array." In short, VBOs worked better for him once he was using the correct API. So your "conclusion" is arrant nonsense.

    On topic, you should use glMapBufferRange if that extension is available. Using the invalidation flag, you don't even need the glBufferData(NULL) part.
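As a rough sketch of the two upload paths described here (a NULL glBufferData followed by glMapBuffer, versus glMapBufferRange with the invalidate flag), assuming GL 3.0 or ARB_map_buffer_range is available. So that it can run without a GL context, the entry points are passed in as function pointers, the way an extension loader would hand them out. The `BufferApi` struct, the `upload_*` helpers, the `fake_*` functions, and the local typedefs are illustrative stand-ins, not part of any library; the enum values are the ones found in the Khronos headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-ins for the GL typedefs and enums (values as in glext.h). */
typedef unsigned int  GLenum;
typedef unsigned int  GLbitfield;
typedef unsigned char GLboolean;
typedef ptrdiff_t     GLintptr;
typedef ptrdiff_t     GLsizeiptr;
#define GL_ARRAY_BUFFER              0x8892
#define GL_WRITE_ONLY                0x88B9
#define GL_STREAM_DRAW               0x88E0
#define GL_MAP_WRITE_BIT             0x0002
#define GL_MAP_INVALIDATE_BUFFER_BIT 0x0008

/* Hypothetical table of entry points; a loader would fill it in real code. */
typedef struct {
    void      (*BufferData)(GLenum, GLsizeiptr, const void *, GLenum);
    void     *(*MapBuffer)(GLenum, GLenum);
    void     *(*MapBufferRange)(GLenum, GLintptr, GLsizeiptr, GLbitfield);
    GLboolean (*UnmapBuffer)(GLenum);
} BufferApi;

/* Path A: discard the old store with a NULL glBufferData, then map it. */
int upload_orphan_map(const BufferApi *gl, const void *src, GLsizeiptr size) {
    gl->BufferData(GL_ARRAY_BUFFER, size, NULL, GL_STREAM_DRAW);
    void *dst = gl->MapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
    if (!dst) return 0;
    memcpy(dst, src, (size_t)size);
    return gl->UnmapBuffer(GL_ARRAY_BUFFER);
}

/* Path B: the invalidate flag discards the old contents instead.
 * The bound buffer must already have at least `size` bytes of storage. */
int upload_map_range(const BufferApi *gl, const void *src, GLsizeiptr size) {
    void *dst = gl->MapBufferRange(GL_ARRAY_BUFFER, 0, size,
                                   GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
    if (!dst) return 0;
    memcpy(dst, src, (size_t)size);
    return gl->UnmapBuffer(GL_ARRAY_BUFFER);
}

/* In-memory fake so the sketch runs without a context (illustration only). */
static unsigned char fake_store[64];
static int           fake_orphans;      /* NULL-data BufferData calls seen */
static GLbitfield    fake_last_access;  /* flags of the last MapBufferRange */

static void fake_BufferData(GLenum t, GLsizeiptr n, const void *p, GLenum u) {
    (void)t; (void)u;
    assert((size_t)n <= sizeof fake_store);
    if (p) memcpy(fake_store, p, (size_t)n); else fake_orphans++;
}
static void *fake_MapBuffer(GLenum t, GLenum a) {
    (void)t; (void)a;
    return fake_store;
}
static void *fake_MapBufferRange(GLenum t, GLintptr o, GLsizeiptr n, GLbitfield a) {
    (void)t; (void)n;
    fake_last_access = a;
    return fake_store + o;
}
static GLboolean fake_UnmapBuffer(GLenum t) { (void)t; return 1; }

static const BufferApi fake_gl = {
    fake_BufferData, fake_MapBuffer, fake_MapBufferRange, fake_UnmapBuffer
};
```

In a real program the table would point at glBufferData, glMapBuffer, glMapBufferRange, and glUnmapBuffer, with the VBO bound to GL_ARRAY_BUFFER before either helper is called. Which path is faster is exactly the kind of thing that should be measured per card and driver.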

  3. #13
    Advanced Member Frequent Contributor
    Join Date
    May 2001
    Posts
    566

    Re: VBOs strangely slow?

    "Try not using VBOs, then."

    If they're confusing, that is, because sometimes you change the order of commands and get different or unexpected results on mid-range hardware.

    "The question is, why do they push the use of a new feature if it's not implemented well?"

    Because people were talking about expecting a huge performance gain from using VBOs.

    "Conclusion: even with traditional glBegin/glEnd you can get outstanding performance, as long as you optimize algorithmically rather than at the instruction/pixel/hardware level."

    That is a better way to optimize software.

  4. #14
    Senior Member OpenGL Pro Ilian Dinev's Avatar
    Join Date
    Jan 2008
    Location
    Watford, UK
    Posts
    1,290

    Re: VBOs strangely slow?

    ....
    Try shoving 10 million tris to the gpu per frame at 60fps without VBOs, while needing flexibility that display-lists do not give (and you'd like to not waste VRAM for the different permutations required otherwise with DLs).

  5. #15
    Junior Member Newbie
    Join Date
    Feb 2010
    Posts
    1

    Re: VBOs strangely slow?

    Use GL_STATIC_DRAW and don't update your data pointer (via glBufferData or glBufferSubData) in your for loop. I would think these calls cause the geometry to be sent over to the graphics adapter every time.

  6. #16
    Junior Member Newbie
    Join Date
    Feb 2010
    Posts
    14

    Re: VBOs strangely slow?

    That's pretty much the idea. I don't actually update the array here (because I just want to benchmark transfer speed, not random-number generation), but the target program writes new data every frame.

    Well, mapping the buffer works very nicely.

  7. #17
    Senior Member OpenGL Guru Dark Photon's Avatar
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    3,209

    Re: VBOs strangely slow?

    Quote Originally Posted by Baughn
    In the end, glMapBuffer was (much) faster; preceding it with a glBufferData with a null pointer (discarding its contents), 10% faster than the vertex array.

    All's well that ends well? I guess, but it's still not obvious to me why the other way of using them is in this case /slower/.
    That's interesting. When I've tried Map vs. Sub, Sub was faster (with invalidate of course, so multiple buffers are in-flight in the driver [allegedly], and fixing the VBO max size -- no resizing).

    But yeah, pure VBOs are odd. You'd think they'd always be faster, but some of the time they're slower (most of the time on pre-SM4 cards). Unless you play the "Ouija board" correctly per card per driver rev.
    • Map vs. Sub.
    • Invalidate vs. not.
    • Sync vs. not.
    • Static vs. stream vs. dynamic.
    • Dynamic max VBO size or not.
    • Interleaved attributes vs. separate.
    • Multiple batches per VBO vs. not.
    • Mixing index and vertex arrays in one buffer or not.
    • Max VBO size X or Y.
    • 32-byte-aligned verts or not.
    • Ring of N buffers or one.
    • Vtx fmts X or Y for colors, normals, texcoords, etc.
    • Latency between upload and use X or Y.
    • Call glVertexPointer first, last, or in between.
    Heck, one of our devs even found it can be faster using CPU-side index list with VBO vertex attributes on some cards, when the index list changes frequently.
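Two of the knobs in the list above, "ring of N buffers or one" and "latency between upload and use", reduce to simple slot arithmetic. A minimal sketch, assuming one upload per frame and a fixed lag measured in frames; the function names are made up for illustration, and each slot would correspond to one VBO:

```c
#include <assert.h>

/* Slot (VBO index) the CPU writes into on a given frame. */
unsigned ring_write_slot(unsigned frame, unsigned ring_size) {
    assert(ring_size > 0);
    return frame % ring_size;
}

/* Slot the GPU draws from, `lag` frames behind the write.
 * Returns -1 while fewer than `lag` frames have been uploaded. */
int ring_draw_slot(unsigned frame, unsigned ring_size, unsigned lag) {
    /* With ring_size <= lag a slot would be overwritten before,
     * or in the same frame as, the draw that reads it. */
    assert(ring_size > lag);
    if (frame < lag) return -1;
    return (int)((frame - lag) % ring_size);
}
```

With a ring of 3 and a lag of 2, frame 4 writes slot 1 while drawing slot 2, so the upload and the draw never touch the same buffer in one frame. Whether that actually helps is, as the list suggests, something to benchmark per card and driver.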

    On pre-SM4 cards VBO perf used to be a total crapshoot, with it more likely to be slower than client arrays than faster, and that's without any dynamic VBO updates (you laugh, but we still have customers in the field with these and thus have to support them; these cards are only ~3yrs old and our customers use lots of GPUs). For recent gen cards, it's getting easier to be faster with VBOs, though still possible to find cases where VBOs lag. Batch setup seems more expensive with them than client arrays.

    VBO updates aside though, I will say I am pleased with VBO performance on recent cards, particularly using NVidia's bindless batch data extension. With that, I can get very near to the performance of their legendary display lists (it's ~2X slower without bindless). So no doubt NVidia display lists use bindless internally (of course). VBOs+bindless is definitely the future (unless they come up with something even faster).

    "The question is, why do they push the use of a new feature if it's not implemented well?"
    That's a very good question. VBOs would have been a much easier sell if they didn't positively suck when they were first introduced, which lasted for several generations of cards. They're still a Ouija board, but the Ouija board has gotten much smaller on recent cards.

    Another reason VBOs weren't such a slam-dunk sell is the vendors did not provide guidance to say specifically "this is how you get the fastest VBO performance on our cards: use permutation A,B,C,F,M,P,R". And when there was a tip dropped, if you tried it, half the time it was worse performance.

  8. #18
    Senior Member OpenGL Guru Dark Photon's Avatar
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    3,209

    Re: VBOs strangely slow?

    Quote Originally Posted by Baughn
    In the end, glMapBuffer was (much) faster; preceding it with a glBufferData with a null pointer (discarding its contents), 10% faster than the vertex array.
    Could you post your exact map code example in a follow-up? I think it'd be useful/informative for a number of folks to run all 3, and let you/us verify that everyone's seeing similar results on varying GPUs/vendors/drivers on exactly the same code.

  9. #19

    Re: VBOs strangely slow?

    Not sure if someone said it before, but what are your GL_MAX_ELEMENTS_VERTICES and GL_MAX_ELEMENTS_INDICES?

    Exceeding those limits will give bad results.
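GL_MAX_ELEMENTS_VERTICES and GL_MAX_ELEMENTS_INDICES are the recommended limits for glDrawRangeElements and can be read with glGetIntegerv. One way to stay under the index limit is to split a large triangle list into batches. A minimal sketch of that splitting, assuming GL_TRIANGLES (so every batch must be a multiple of 3 indices); the helper names are hypothetical, and the limit is passed in as a plain number so the code runs without a GL context:

```c
#include <assert.h>
#include <stddef.h>

/* Length of the next batch starting at index `done` out of `total` indices,
 * never more than `max_indices` and always a whole number of triangles
 * (assuming `total` itself is a multiple of 3). */
size_t next_batch(size_t done, size_t total, size_t max_indices) {
    assert(max_indices >= 3 && done <= total);
    size_t cap  = max_indices - max_indices % 3;  /* round down to a multiple of 3 */
    size_t left = total - done;
    return left < cap ? left : cap;
}

/* Number of draw calls needed to submit `total` indices in such batches. */
size_t batch_count(size_t total, size_t max_indices) {
    size_t n = 0;
    for (size_t done = 0; done < total; done += next_batch(done, total, max_indices))
        n++;
    return n;
}
```

For example, with a (made-up) limit of 4096 indices, 9000 indices go out as batches of 4095, 4095, and 810. This only bounds the index count; the vertex range each batch touches (end - start + 1) would need a separate check against GL_MAX_ELEMENTS_VERTICES.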

  10. #20
    Senior Member OpenGL Guru Dark Photon's Avatar
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    3,209

    Re: VBOs strangely slow?

    Just for kicks, and to come to some VBO upload performance conclusions on modern hardware (at least with this 8MB/upload example), I thought I'd take the original two permutations (same VBO sizes/contents/rendering), and try a few variations for comparison:

    1. 2.163s - Client arrays
    2. 2.801s - BufferData load/reload
    3. 2.876s - BufferData NULL, BufferSubData load
    4. 1.985s - BufferData NULL, MapBuffer load, UnmapBuffer
    5. 2.013s - glMapBufferRange MAP_INVALIDATE load, UnmapBuffer
    6. 2.078s - glMapBufferRange MAP_INVALIDATE load, UnmapBuffer with buffer load/use lag of 2

    Test setup:
    - NVidia GTX 285 GPU, Core i7 920 CPU, PCIe 2.0
    - NVidia 190.32 Linux drivers

    Option #3 used to be the fastest. But on modern hardware/drivers it's now the dead slowest, at least with this example.

    Also, options #4 and #5 can be made ~60ms faster (3%) merely by using fewer buffers (e.g. 1 instead of 3).

    It's interesting to note this nets an upload rate of ~2 GB/sec (6.4 GB/sec practical max PCIe2, 8.0 GB/sec theoretical max, 8.3 GB/sec theoretical max CPU mem).
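The 8.0 GB/sec theoretical figure can be reproduced with a little arithmetic, assuming the card sits in a PCIe 2.0 x16 slot (5 GT/s per lane, 8b/10b encoding). The lane count is an assumption here, since the post only says "PCIe 2.0". A small sketch, with hypothetical helper names, that also puts the measured ~2 GB/sec in proportion:

```c
#include <assert.h>

/* Peak payload bandwidth in bytes/s for a link that uses 8b/10b encoding. */
unsigned long long pcie_peak_bytes_per_sec(unsigned long long transfers_per_sec,
                                           unsigned lanes) {
    /* 8 payload bits per 10 line bits, then 8 bits per byte. */
    return transfers_per_sec * lanes * 8 / 10 / 8;
}

/* Achieved rate as a percentage of a reference rate, rounded to nearest. */
unsigned percent_of(double achieved, double reference) {
    assert(reference > 0.0);
    return (unsigned)(100.0 * achieved / reference + 0.5);
}
```

Under those assumptions, `pcie_peak_bytes_per_sec(5000000000ULL, 16)` gives 8,000,000,000 bytes/sec, and ~2 GB/sec works out to roughly 31% of the 6.4 GB/sec practical maximum or 25% of the theoretical one.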
