Thread: Optimal Streaming Strategy

  1. #11
    thokra, Senior Member OpenGL Pro (joined Apr 2010, Germany, 1,128 posts)
    Quote Originally Posted by Dark Photon
    Fairly rarely. The stream VBO is large enough to handle one frame of data plus a fair amount of extra, so orphaning doesn't happen often.
    Oh man. I was thinking: "Why would he need to make sure that everything is processed before explicitly syncing with SwapBuffers()?" What I didn't see, although it's fairly obvious, is that you have to orphan the buffer when it fills up, because it can do so in mid-frame and some portions of it may not have been processed yet when you try to write new data. That's why there's syncing (i.e. the absence of the UNSYNCHRONIZED bit) when you orphan the whole buffer. Damn...

    However, I have to agree with Alfonse on the async invalidation being a little contradictory. If you absolutely know that a well-defined range of the buffer will not be in use when mapping, why the need to invalidate? Why not simply do an async write mapping and overwrite its contents with new data that fits into said range? If you can't be sure, you need to synchronize anyway unless you want to risk corruption.

    I think you're getting at scenarios where you don't want to keep filling the buffer with new data if there's some place where clearly unused data resides - so you want to swap portions instead of risking an orphan if it's not absolutely necessary. But how can you be positive about definitely unused ranges unless you track whether data in the buffer has actually gone unused for some time? This implies three possible solutions:

    1. go Alfonse's way and can the whole thing, thus syncing by orphaning the complete buffer - this means data that was in use previously and will be in use afterwards has to be re-uploaded to the buffer
    2. go aqnuep's way and invalidate ranges synchronously, thus avoiding corruption of data still in flight - this may force the application to wait, which only makes sense if you've got other work to do. Just like with occlusion queries.
    3. employ an eviction strategy akin to common CPU caching and asynchronously replace unused ranges, orphaning only when really needed - this uses additional CPU cycles and memory because you have to remember which data hasn't been used for how many frames (a rough sketch follows below). There could be other schemes, however.

    Is that about it?
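    As a rough illustration of option 3, here is a minimal sketch of what such per-range usage tracking could look like. The StreamRange/RangeCache names and the frame-age threshold are made up for illustration; they are not from anything posted in this thread.

    [code]
    // Hypothetical bookkeeping for option 3: remember when each range of the
    // streaming VBO was last referenced and recycle ranges that have gone
    // unused for a few frames, instead of orphaning the whole buffer.
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct StreamRange {
        std::size_t   offset;        // byte offset into the streaming VBO
        std::size_t   size;          // byte length of the range
        std::uint64_t lastUsedFrame; // frame counter value when last drawn
    };

    class RangeCache {
    public:
        explicit RangeCache(std::uint64_t maxAge) : maxAge_(maxAge) {}

        void markUsed(std::size_t index, std::uint64_t frame) {
            ranges_[index].lastUsedFrame = frame;
        }

        // Returns the index of a range that hasn't been drawn for more than
        // maxAge_ frames (so the GPU is presumably done with it), or -1 if
        // none exists and the caller has to orphan the whole buffer instead.
        int findEvictable(std::uint64_t currentFrame) const {
            for (std::size_t i = 0; i < ranges_.size(); ++i)
                if (currentFrame - ranges_[i].lastUsedFrame > maxAge_)
                    return static_cast<int>(i);
            return -1;
        }

        std::vector<StreamRange> ranges_;

    private:
        std::uint64_t maxAge_;
    };
    [/code]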

    Quote Originally Posted by aqnuep
    But if you would orphan at the end with GL_MAP_INVALIDATE_*_BIT or with glBufferSubData then it could result in corruption.
    But only with GL_MAP_UNSYNCHRONIZED_BIT, right? Sorry for being obnoxious - just want to make sure I'm not missing something.
    Last edited by thokra; 02-07-2013 at 05:29 AM. Reason: small correction.

  2. #12
    Alfonse Reinheart, Senior Member OpenGL Guru (joined May 2009, 4,948 posts)
    I think you're misunderstanding the usage pattern here. It's a pattern of dumping stuff to a buffer and rendering that stuff. Then dumping more stuff and rendering it. Nothing is being saved; all of the data is transient, one-use only. It's for the kind of stuff you would have used immediate mode for back in the old days. GUI elements and such.

    None of this is static, and nobody really cares where it lives in memory. There are no objects, no "ranges", nada. Just sequences of vertex data. So you effectively build a ring buffer out of a buffer object.

    Each thing you draw takes up a certain amount of buffer object space. You map that space, write your data, and render with it. You now no longer care about that data, so long as OpenGL eventually gets it. You do it again with some more data, so you just slide over to the next unused space. Eventually you run out of buffer object space for new data, so you start over at the beginning.

    This "requires" buffer object orphaning, lest you make the driver actually check to see if that region of the buffer is in use. Because you don't want to be unsync mapping some part of the buffer that's still in use. By invalidating the buffer, you ensure that none of it is in use anymore.

  3. #13
    Member, Regular Contributor (joined Aug 2008, 450 posts)
    In addition to the excellent post by Rob Barris that explains it fairly thoroughly, here's a blog post by Fabian Giesen with some tips on using write combining (which will likely be in effect when filling the mapped buffer) efficiently: http://fgiesen.wordpress.com/2013/01...t-your-friend/
    Here's the summary:
    • If it's a dynamic constant buffer, dynamic vertex buffer or dynamic texture and mapped "write-only", it's probably write-combined.
    • Never read from write-combined memory.
    • Try to keep writes sequential. This is good style even when it's not strictly necessary. On processors with picky write-combining logic, you might also need to use volatile or some other way to cause the compiler not to reorder instructions.
    • Don't leave holes. Always write large, contiguous ranges.
    • Check the rules for your target architecture. There might be additional alignment and access width limitations.
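    As a small illustration of the "sequential, no holes, never read" advice, here's a sketch of filling a write-mapped buffer. The interleaved Vertex layout and the fillQuad name are just examples, not anything from the linked post.

    [code]
    // Filling a (likely write-combined) mapped range: write every byte of
    // the range once, front to back, and never read the mapping back.
    #include <GL/glew.h>

    struct Vertex { float pos[3]; float color[4]; };   // example layout

    void fillQuad(GLuint vbo, const Vertex corners[4])
    {
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        Vertex* v = static_cast<Vertex*>(
            glMapBufferRange(GL_ARRAY_BUFFER, 0, 4 * sizeof(Vertex),
                             GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT));

        for (int i = 0; i < 4; ++i)
            v[i] = corners[i];   // sequential, contiguous stores; no holes

        // Things to avoid with write-combined memory:
        //   float y = v[0].pos[1];    // reading the mapping back is very slow
        //   writing v[3] before v[0]  // scattered/backwards writes hurt too
        glUnmapBuffer(GL_ARRAY_BUFFER);
    }
    [/code]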

  4. #14
    thokra, Senior Member OpenGL Pro (joined Apr 2010, Germany, 1,128 posts)
    Then we obviously had different things in mind. Also, Dark Photon mentioned that he wanted to keep streamed data in a buffer as long as possible - I could have misunderstood his intentions though.

    My concern still stands, though: you have to upload data again and again even though it doesn't differ from the previous render call. Does that perform well?
    Last edited by thokra; 02-07-2013 at 05:16 AM.

  5. #15
    Alfonse Reinheart, Senior Member OpenGL Guru (joined May 2009, 4,948 posts)
    Quote Originally Posted by thokra
    You have to upload data again and again even though it doesn't differ from the previous render call. Does that perform well?
    Compared to what? Consistency of performance is often more important than just getting good performance sometimes and bad performance other times. This method makes performance consistent, regardless of whether data changes or not. So rather than getting a performance spike when data changes, you get solid, consistent performance all the time.

    If you're not making much use of that uploading bus for anything else, there's really no reason why you can't do this effectively. Yes, it depends on how much stuff you're doing this for, but it's rarely that much stuff.

  6. #16
    thokra, Senior Member OpenGL Pro (joined Apr 2010, Germany, 1,128 posts)
    Quote Originally Posted by Alfonse
    Compared to what?
    Compared to partial eviction. Sure, it's relative to the problem size. I imagine something like a flight simulator where you travel at high speed over densely crowded areas with a lot of varying geometry. In that case, not taking advantage of whatever coherency you theoretically have seems less wise than trying to swap out only parts of the buffer.

    EDIT: But you're right, for most applications such a high update frequency probably doesn't apply, and you can just draw your static stuff and the little dynamic remainder like you suggest. I'd like to see both approaches in both scenarios though - just out of curiosity.

  7. #17
    Dark Photon, Senior Member OpenGL Guru (joined Oct 2004, Druidia, 3,190 posts)
    Quote Originally Posted by Alfonse Reinheart
    Why would the driver "pre-fill" any of the buffer? If you're mapping for writing, you aren't supposed to read it. So it doesn't matter what's there.
    As a simple example to illustrate the point: suppose you map 20 bytes for write but only write 3 random bytes in that 20-byte range. True, you didn't read it. But what's the poor driver supposed to transfer to the GPU?

    That said, I'm not a driver developer.
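    In glMapBufferRange terms, the difference looks roughly like this; the offsets, sizes and values are made-up numbers for the example:

    [code]
    #include <GL/glew.h>

    // Map 20 bytes for write but only touch 3 of them. Without an
    // invalidate flag the driver must assume the other 17 bytes keep
    // their old values, so it may have to pre-fill the range before
    // handing out the pointer (or read it back afterwards).
    void sparseWriteExample(GLuint vbo)
    {
        glBindBuffer(GL_ARRAY_BUFFER, vbo);

        char* p = static_cast<char*>(
            glMapBufferRange(GL_ARRAY_BUFFER, 0, 20, GL_MAP_WRITE_BIT));
        p[3] = 0x2A; p[7] = 0x00; p[11] = 0x7F;   // only 3 of 20 bytes written
        glUnmapBuffer(GL_ARRAY_BUFFER);

        // With GL_MAP_INVALIDATE_RANGE_BIT the previous contents of the
        // range become undefined, so the driver can hand out scratch
        // memory and doesn't care about the bytes you skip.
        char* q = static_cast<char*>(
            glMapBufferRange(GL_ARRAY_BUFFER, 0, 20,
                             GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT));
        q[3] = 0x2A; q[7] = 0x00; q[11] = 0x7F;
        glUnmapBuffer(GL_ARRAY_BUFFER);
    }
    [/code]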
    Last edited by Dark Photon; 02-07-2013 at 06:01 AM.

  8. #18
    Dark Photon, Senior Member OpenGL Guru (joined Oct 2004, Druidia, 3,190 posts)
    Quote Originally Posted by Alfonse Reinheart
    I think you're misunderstanding the usage pattern here. It's a pattern of dumping stuff to a buffer and rendering that stuff. Then dumping more stuff and rendering it. Nothing is being saved; all of the data is transient, one-use only.
    Close, but not exactly. One "write" only. Multiple reads. Reuse the copy you put there as long as possible (i.e. until the next orphan).

  9. #19
    thokra, Senior Member OpenGL Pro (joined Apr 2010, Germany, 1,128 posts)
    Quote Originally Posted by Dark Photon
    Close, but not exactly. One "write" only. Multiple reads. Reuse the copy you put there as long as possible (i.e. until the next orphan).
    So I got it right.

    Maybe we should come up with a wiki article on this topic. Who wants to look up the "slow VBO" thread again and again?

  10. #20
    Dark Photon, Senior Member OpenGL Guru (joined Oct 2004, Druidia, 3,190 posts)
    Quote Originally Posted by thokra
    I imagine something like a flight simulator where you travel at high speed over densely crowded areas with a lot of varying geometry. In that case, not taking advantage of whatever coherency you theoretically have seems less wise than trying to swap out only parts of the buffer.
    I think there's still some misunderstanding left here. With the write-once, read-many streaming VBO approach I'm talking about, there is tons of reuse frame-after-frame-after-frame due to coherency because you're basically just re-blasting 99.5% of the data you've already uploaded to the streaming VBO in previous frames. The streaming VBO "is" the cache. This is especially advantageous if you can lock that VBO on the GPU and blast those batches with bindless. That's as cheap as it gets!

    Yes, having to orphan every so often and "re-seed the cache" is a disadvantage. But the alternative is a hole-filling/coalescing "garbage collection" scheme that adds complexity and is problematic for real-time performance. Or never re-use anything and re-upload everything every time you use it. It's up to you though.
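    A rough sketch of that write-once/read-many reuse, building on the streamAppend ring-buffer sketch earlier in the thread; the BatchCache bookkeeping and function names here are illustrative only, not code from this thread:

    [code]
    // Write-once / read-many streaming: each batch is uploaded once and its
    // offset remembered; later frames just re-issue the draw from the same
    // range. Only when the buffer wraps (orphan) is the cache cleared and
    // everything "re-seeded".
    #include <GL/glew.h>
    #include <unordered_map>

    GLintptr streamAppend(const void* data, GLsizeiptr bytes); // ring-buffer sketch above

    struct CachedBatch { GLintptr offset; GLsizei vertexCount; };

    static std::unordered_map<int, CachedBatch> batchCache;    // key: batch id

    void drawBatch(int id, const void* verts, GLsizei count, GLsizei vertexSize)
    {
        auto it = batchCache.find(id);
        if (it == batchCache.end()) {
            // Not in the streaming VBO yet: upload once, remember the offset.
            GLintptr offset = streamAppend(verts, count * vertexSize);
            it = batchCache.emplace(id, CachedBatch{offset, count}).first;
        }
        // Cheap path on every later frame: no upload, just point the
        // (already enabled) position attribute at the cached range and draw.
        glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, vertexSize,
                              reinterpret_cast<const void*>(it->second.offset));
        glDrawArrays(GL_TRIANGLES, 0, it->second.vertexCount);
    }

    void onOrphan()            // call whenever streamAppend wraps/orphans
    {
        batchCache.clear();    // everything gets re-uploaded on next use
    }
    [/code]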
