
Thread: The ARB announced OpenGL 3.0 and GLSL 1.30 today

  1. #581
    Member Regular Contributor (Joined Apr 2006, Irvine CA, 299 posts)

    Re: The ARB announced OpenGL 3.0 and GLSL 1.30 today

    You can guarantee safety without fences if you constrain your access patterns to a "write once" model - write batches using an ascending cursor/offset, and orphan the buffer once you hit the end (glBufferData(..., NULL)), at which point you wind the cursor back to zero and proceed again. This lets you pack many disjoint, variable-sized batches of data into one healthy-sized VBO without any unneeded blocking/waiting.

    If you have repeated partial-update needs, then you have to use something like an explicit fence plus unsynchronized mapping, or just go back to BufferSubData. The value of MapBufferRange combined with fences is higher if you have many more things crammed into the buffer object.
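
    A minimal sketch of that ascending-cursor pattern, assuming GL 3.0's MapBufferRange; buf, BUF_SIZE, data and size are placeholders the application would supply, and GL headers are assumed to be included:

    Code:
        #include <string.h>  /* memcpy */

        #define BUF_SIZE (4 * 1024 * 1024)  /* app-chosen streaming VBO size */

        /* append one batch to the streaming VBO, orphaning when full */
        void write_batch(GLuint buf, const void *data, GLsizeiptr size)
        {
            static GLintptr cursor = 0;

            glBindBuffer(GL_ARRAY_BUFFER, buf);
            if (cursor + size > BUF_SIZE) {
                /* orphan: in-flight draws keep the old storage, we get
                   fresh memory without blocking, then rewind to zero */
                glBufferData(GL_ARRAY_BUFFER, BUF_SIZE, NULL, GL_STREAM_DRAW);
                cursor = 0;
            }
            /* unsynchronized is safe because every byte of this buffer
               generation is written exactly once, before any draw reads it */
            void *dst = glMapBufferRange(GL_ARRAY_BUFFER, cursor, size,
                                         GL_MAP_WRITE_BIT |
                                         GL_MAP_UNSYNCHRONIZED_BIT);
            memcpy(dst, data, size);
            glUnmapBuffer(GL_ARRAY_BUFFER);
            /* issue the draw sourcing byte offset 'cursor', then advance */
            cursor += size;
        }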

  2. #582
    Junior Member Newbie (Joined Sep 2004, 11 posts)

    Re: The ARB announced OpenGL 3.0 and GLSL 1.30 today

    It seems that AreTexturesResident() and PrioritizeTextures() are on the deprecated list.

    I wonder why this is the case. Possible explanations I can see are:
    - textures are always resident in the future; when I try to allocate a new texture and it does not fit onto the GPU, an out-of-memory error is generated
    - a memory management extension will come along that allows you to achieve the same functionality yourself
    - it will simply go away, with no way to achieve similar functionality

    Or did I miss something here? Since no one else seems to be complaining, that could also be true.
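
    For reference, the two calls in question, as they work today (texA and texB are assumed to be existing texture names):

    Code:
        GLuint    tex[2] = { texA, texB };
        GLboolean resident[2];
        GLclampf  priority[2] = { 1.0f, 0.5f };

        /* query whether the textures are currently resident in video memory */
        glAreTexturesResident(2, tex, resident);
        /* hint relative residency priorities, 0.0 (low) .. 1.0 (high) */
        glPrioritizeTextures(2, tex, priority);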

  3. #583
    arekkusu, Advanced Member Frequent Contributor (Joined Nov 2003, 783 posts)

    Re: The ARB announced OpenGL 3.0 and GLSL 1.30 today

    Quote Originally Posted by Michael Gold
    Don't want a deprecated feature to be removed? Speak up!
    I don't understand the rationale behind deprecating CLAMP_TO_BORDER and the constant border color.

    I completely understand deprecating textures with borders... but the constant border color?
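
    For context, this is the combination being discussed - a constant color returned for samples outside [0,1], with no border texels stored in the image itself (tex is assumed to be an existing texture):

    Code:
        GLfloat border[4] = { 0.0f, 0.0f, 0.0f, 0.0f };

        glBindTexture(GL_TEXTURE_2D, tex);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_BORDER);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_BORDER);
        /* the constant border color, sampled when coordinates clamp to border */
        glTexParameterfv(GL_TEXTURE_2D, GL_TEXTURE_BORDER_COLOR, border);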

  4. #584
    Junior Member Regular Contributor (Joined Feb 2000, Santa Clara, CA, 172 posts)

    Re: The ARB announced OpenGL 3.0 and GLSL 1.30 today

    CLAMP_TO_BORDER and TEXTURE_BORDER_COLOR are not deprecated. This was a spec bug which will be corrected in an updated 3.0 spec.

  5. #585
    Member Regular Contributor (Joined Mar 2005, 301 posts)

    Re: The ARB announced OpenGL 3.0 and GLSL 1.30 today

    Quote Originally Posted by Michael Gold
    In order to do this reliably, moving the data and remapping the PTEs need to happen atomically. I don't see a way to do this for arbitrarily large buffers.
    I agree, but I'd also throw in the sabot of SMP systems. As I see it, it would require a lock, effectively "starving" every CPU, then the switch, and finally a flush. AFAICT that would be both a performance and a logistical disaster (waiting to happen).

    So instead of this, can we try to find alternatives? Could a model where the actual mapping is abstracted from the app work? I'm just brainstorming a bit now...

    Imagine a "multimapped" (just-invented word) buffer. Instead of using it as a ring buffer (as suggested previously), which would have quite serious implications wrt page- and mapping-granularity, the user gets two (or more, as requested by the client) memory areas to play with - but backed by a single mapping (section). Each of them would only become ("legally") usable by the server after an "unmap" call - which wouldn't be a heavy unmap at all, but simply a synchronization flush, possibly with an additional revocation of write access on the memory. The client could then continue writing data into another of the "segments" it got, while the server processes the previous data, confident it's not being modified under its feet.

    Again, I was just brainstorming. I don't know how much time it could save (if it's feasible at all), but at least it could, if possible, allow a single mapping (kmode<->umode) to be "reused" and effectively persistent (from the client's POV).

    But then, if the system changed screen setup and the physical VRAM got reallocated/invalidated... Damn, but I still think it was a possibly neat idea.
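
    To make the brainstorm concrete, a purely hypothetical interface for such a buffer might look like this - none of these types or entry points exist in GL; they are invented here for illustration:

    Code:
        /* hypothetical "multimapped" buffer: one kmode<->umode mapping,
           several client-visible write windows inside it */
        typedef struct {
            void *segment[2];  /* write windows within the single mapping  */
            int   writable;    /* index the client may currently write to  */
        } MultiMap;

        /* lightweight "unmap" of one segment: no real unmap, just a sync
           flush (and possibly revoking write access on those pages); the
           server may then read that segment while the client keeps writing
           the other one - the mapping itself is never torn down, so it
           stays effectively persistent from the client's point of view */
        void mmFlushSegment(MultiMap *m, int seg);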

  6. #586
    Junior Member Regular Contributor (Joined Oct 2007, Madison, WI, 163 posts)

    Re: The ARB announced OpenGL 3.0 and GLSL 1.30 today

    Quote Originally Posted by Rob Barris
    You can guarantee safety without fences if you constrain your access patterns to a "write once" model - write batches using an ascending cursor/offset, and orphan the buffer once you hit the end (glBufferData(..., NULL)), at which point you wind the cursor back to zero and proceed again. This lets you pack many disjoint, variable-sized batches of data into one healthy-sized VBO without any unneeded blocking/waiting.
    I've thought about this issue some more this weekend, and think I might have found a better alternative when taking threading into consideration. And it turns out it would NOT require my previous suggestion. With the ascending-cursor scheme you have, say, a VBO allocated at a size that can hold at least 3 frames' worth of dynamic geometry. This buffer must be unmapped to be used for drawing, which I see as a problem for threading.

    However, what if one were to break this VBO into at least 3 (and perhaps more if necessary) separate VBOs and use the following scheme instead? Initialize all VBOs, and use MapBufferRange() to map the entire range of each with MAP_WRITE_BIT | MAP_INVALIDATE_BUFFER_BIT | MAP_FLUSH_EXPLICIT_BIT | MAP_UNSYNCHRONIZED_BIT. Then add these mapped VBOs to a FIFO queue. Draw frames using the following steps in the primary GL context thread (a rough code sketch follows at the end of this post):

    1. When the first worker thread finishes working on the current frame, remove a new mapped VBO from the FIFO queue, and start the finished worker threads on the next frame. No delay is necessary because the VBOs in the FIFO are already mapped; worker threads can begin inserting new dynamic geometry into the next VBO regardless of what GL or the GPU is doing.

    2. Block (or do something else and poll for completion) until all worker threads have finished the current frame.

    3. Use FlushMappedBufferRange() on the range(s) of the current VBO which got written by the worker threads. Then unmap the VBO.

    4. Do all drawing for the frame.

    5. Now remap the VBO using MapBufferRange() and the same access bits used in initialization. Add this mapped VBO back into the FIFO so that after, say, another 2 (or more if necessary) frames, it is used again.

    Michael, seems to me that this model would provide the best performance for an application which wanted to generate dynamic geometry using multiple threads (at the expense of giving the driver less flexibility to move memory).

    Is there a better way?
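
    A rough sketch of steps 1-5 above; fifo_pop/fifo_push, workers_fill/workers_wait and draw_frame are hypothetical application-side helpers, not GL calls:

    Code:
        typedef struct { GLuint vbo; void *ptr; } MappedVbo;

        #define BUF_SIZE (4 * 1024 * 1024)   /* per-VBO size, app-chosen */
        static const GLbitfield ACCESS =
            GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT |
            GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT;

        /* hypothetical app-side helpers (assumed, not GL) */
        MappedVbo  fifo_pop(void);
        void       fifo_push(MappedVbo v);
        void       workers_fill(void *dst, GLsizeiptr cap);
        GLsizeiptr workers_wait(void);           /* returns bytes written */
        void       draw_frame(GLuint vbo);

        void frame(void)
        {
            MappedVbo cur = fifo_pop();          /* 1. already mapped, no stall  */
            workers_fill(cur.ptr, BUF_SIZE);     /*    threads write geometry    */
            GLsizeiptr written = workers_wait(); /* 2. block/poll for completion */

            glBindBuffer(GL_ARRAY_BUFFER, cur.vbo);
            glFlushMappedBufferRange(GL_ARRAY_BUFFER, 0, written);  /* 3. */
            glUnmapBuffer(GL_ARRAY_BUFFER);

            draw_frame(cur.vbo);                 /* 4. all drawing for the frame */

            /* 5. remap with the same access bits; recycled ~2 frames later */
            cur.ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, BUF_SIZE, ACCESS);
            fifo_push(cur);
        }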

  7. #587
    Member Regular Contributor (Joined Apr 2006, Irvine CA, 299 posts)

    Re: The ARB announced OpenGL 3.0 and GLSL 1.30 today

    When you use the invalidate whole buffer bit, this is meant to detach/orphan the current storage so you do not need to do any of your own juggling of old buffers. You could be running 12 buffer-fulls ahead of the driver, the driver gets to keep track of which ones have actually been fully consumed and can make that storage available again for the next invalidated buffer. If the driver and hardware are infinitely fast, then you won't get that far ahead. If they are much slower, then at some point the driver will enforce command queue depth (or storage footprint) limiting and calls will start to take more time as perceived by the issuing thread.
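
    In code form, that means a single buffer name is enough - a minimal sketch, with vbo, BUF_SIZE and fill_geometry/draw_frame assumed to be application-side:

    Code:
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        /* invalidate detaches/orphans the old storage: prior draws keep
           reading the detached copy while we fill fresh memory, no waiting */
        void *p = glMapBufferRange(GL_ARRAY_BUFFER, 0, BUF_SIZE,
                                   GL_MAP_WRITE_BIT |
                                   GL_MAP_INVALIDATE_BUFFER_BIT);
        fill_geometry(p);
        glUnmapBuffer(GL_ARRAY_BUFFER);
        draw_frame(vbo);  /* the driver recycles the storage once consumed */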

  8. #588
    Junior Member Regular Contributor (Joined Oct 2007, Madison, WI, 163 posts)

    Re: The ARB announced OpenGL 3.0 and GLSL 1.30 today

    Quote Originally Posted by Rob Barris
    When you use the invalidate whole buffer bit, this is meant to detach/orphan the current storage so you do not need to do any of your own juggling of old buffers. You could be running 12 buffer-fulls ahead of the driver, the driver gets to keep track of which ones have actually been fully consumed and can make that storage available again for the next invalidated buffer. If the driver and hardware are infinitely fast, then you won't get that far ahead. If they are much slower, then at some point the driver will enforce command queue depth (or storage footprint) limiting and calls will start to take more time as perceived by the issuing thread.
    So should I remove the invalidate bit and rely only on the unsynchronized bit instead?

    Is my example a case of the programmer trying to outsmart the driver? One primary goal here is to try to force the driver to do nearly all buffer overhead at creation time instead of at map time, and also to force the driver to allocate the VBOs together at one time, to limit GPU memory fragmentation.

    Also, I should have noted in my previous post that one could modify the construct above to work with under a single frame of delay, by breaking it up into many more than 3 subframe/VBO buffers.

  9. #589
    Member Regular Contributor (Joined Apr 2006, Irvine CA, 299 posts)

    Re: The ARB announced OpenGL 3.0 and GLSL 1.30 today

    Well, that's an interesting question. If you can actually calculate an upper bound on the output - in terms of bytes of data needing to be written per frame - then you might be able to put something together where you never have to block.

    However, in a more general sense this is difficult to do (i.e. if you have no idea how much data will come down per frame) - unless you have fences. Usually you find out after a frame or two what the real demand is; if the driver is managing orphaned buffers efficiently, it should converge on a working-set size that eliminates the need for new allocations, and the application can remain oblivious to how many real buffers the driver had to juggle.

  10. #590
    Member Regular Contributor (Joined Nov 2003, Czech Republic, 317 posts)

    Re: The ARB announced OpenGL 3.0 and GLSL 1.30 today

    Now something easier ...

    Look at this function:
    wglShareLists
    The wglShareLists function enables multiple OpenGL rendering contexts to share a single display-list space.
    Display lists will be gone in OpenGL 3.0+.
    Any plans to rename this function or to introduce a new one?
    Imagine a newbie learning OpenGL 3.1: "Share what? Display lists? What's that?"
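
    For reference, the call as it stands (hdc is assumed to be a valid device context) - in practice it shares all shareable objects between the contexts, not just display lists:

    Code:
        HGLRC rc_main   = wglCreateContext(hdc);
        HGLRC rc_worker = wglCreateContext(hdc);

        /* after this, the contexts share textures, buffer objects,
           shaders... and, today, display lists */
        if (!wglShareLists(rc_main, rc_worker)) {
            /* sharing failed; the contexts remain independent */
        }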
