Thread: Official feedback on OpenGL 4.0 thread

  1. #101
    Senior Member OpenGL Guru
    Join Date
    May 2009
    Posts
    4,948

    Re: Official feedback on OpenGL 4.0 thread

    Actually, the concern is more what Groovounet was suggesting: that ATI might claim that the entire HD 5xxx line supports 4.0 and silently convert doubles to floats.

    Without conformance tests, what would there be to stop such a thing? Indeed, I seem to recall something similar happening at the 2.0 transition, when unrestricted NPOT support was required by the specification. Some hardware that couldn't actually support unrestricted NPOTs would still advertise GL 2.0, but silently break if you used NPOTs.

  2. #102
    Junior Member Regular Contributor
    Join Date
    Jul 2000
    Location
    Roseville, CA
    Posts
    159

    Re: Official feedback on OpenGL 4.0 thread

    Here's one more vote for DSA in the core.

    ARB_explicit_attrib_location is nice. Now if we only had ARB_explicit_uniform_location (including samplers) as well for all shader types, we would finally get all the capabilities back that we enjoyed with ASM shaders and Cg. Right now, it's silly to keep rebinding uniforms over many shaders in cases when they have the same value globally.
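
    As a concrete illustration of the rebinding cost (a minimal sketch only; "globalFog", "programs" and "fogParams" are illustrative names, and a GL 2.1+ context with a loader is assumed): because uniform storage is per-program, a value that is logically global still has to be pushed into every program that declares it.

    Code:
    // Push one logically global value into every program that uses it.
    for (int i = 0; i < numPrograms; ++i) {
        GLint loc = glGetUniformLocation(programs[i], "globalFog");
        if (loc != -1) {
            glUseProgram(programs[i]);       // must bind each program just to update it
            glUniform4fv(loc, 1, fogParams);
        }
    }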

  3. #103
    Member Regular Contributor
    Join Date
    Apr 2006
    Location
    Irvine CA
    Posts
    299

    Re: Official feedback on OpenGL 4.0 thread

    Quote Originally Posted by Eric Lengyel
    Here's one more vote for DSA in the core.

    ARB_explicit_attrib_location is nice. Now if we only had ARB_explicit_uniform_location (including samplers) as well for all shader types, we would finally get all the capabilities back that we enjoyed with ASM shaders and Cg. Right now, it's silly to keep rebinding uniforms over many shaders in cases when they have the same value globally.
    UBO can solve some of those problems if you have a common group of uniforms that can be stored together?
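
    For reference, a minimal sketch of that approach (block and variable names are illustrative; it requires ARB_uniform_buffer_object / GL 3.1): each shader declares the same named std140 block, and one buffer bound to a binding point feeds all of them, so updating the buffer once updates the values seen by every program.

    Code:
    // GLSL, in every shader that needs the shared values:
    //   layout(std140) uniform GlobalParams { vec4 fogParams; mat4 shadowMatrix; };

    // C++ side: one buffer, one binding point, wired up once per program at load time.
    GLuint ubo;
    glGenBuffers(1, &ubo);
    glBindBuffer(GL_UNIFORM_BUFFER, ubo);
    glBufferData(GL_UNIFORM_BUFFER, 20 * sizeof(float), NULL, GL_DYNAMIC_DRAW);
    glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo);

    GLuint blockIndex = glGetUniformBlockIndex(program, "GlobalParams");
    glUniformBlockBinding(program, blockIndex, 0);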

  4. #104
    Junior Member Regular Contributor
    Join Date
    Jul 2000
    Location
    Roseville, CA
    Posts
    159

    Re: Official feedback on OpenGL 4.0 thread

    Quote Originally Posted by Rob Barris
    UBO can solve some of those problems if you have a common group of uniforms that can be stored together?
    Maybe. But UBOs are not available across all hardware I need to support (SM3+), and using UBOs is not guaranteed to be the fastest path. Something with the same effect as glProgramEnvParameter*() from ASM shaders is what I'd really like to see for GLSL.
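
    For readers unfamiliar with the ASM model being referenced, a sketch of how environment parameters behave there (the index and values are illustrative): one call updates a target-wide slot that is visible to every program of that target, with no per-program binding or lookup.

    Code:
    // Shared across ALL fragment programs; read in the program text as program.env[0].
    GLfloat fog[4] = { 0.5f, 0.6f, 0.7f, 1.0f };
    glProgramEnvParameter4fvARB(GL_FRAGMENT_PROGRAM_ARB, 0, fog);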

  5. #105
    Senior Member OpenGL Guru
    Join Date
    May 2009
    Posts
    4,948

    Re: Official feedback on OpenGL 4.0 thread

    But UBOs are not available across all hardware I need to support (SM3+)
    Quite a bit of GL 2.1 hardware is coming to the end of IHV support. So even if they added an extension tomorrow, there would still be a lot of hardware out there that won't have access to it.

    Something with the same effect as glProgramEnvParameter*() from ASM shaders is what I'd really like to see for GLSL.
    Is that guaranteed to be the "fastest path?"

  6. #106
    Senior Member OpenGL Guru
    Join Date
    Dec 2000
    Location
    Reutlingen, Germany
    Posts
    2,042

    Re: Official feedback on OpenGL 4.0 thread

    I agree with Eric. Something that makes uniforms work like the environment variables from ARB_fragment_program would be extremely helpful.

    In my apps I actually treat all uniforms like env variables. If two shaders use a uniform with the same name, an update from the app currently changes both uniform values, no matter which of the two shaders (if any) is bound. Honestly, I don't see a reason why shaders should have their own local uniform values at all.

    Right now, everyone who treats uniforms as global program state that the shader can "query" by declaring a uniform with the given name has to do a lot of complicated management to get those values into the shaders and to prevent unnecessary updates.
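
    To make that bookkeeping concrete (a rough sketch only; every name here is illustrative, and the per-program location and version caches are assumed to be built at shader load time):

    Code:
    struct GlobalUniform { const char* name; float value[4]; unsigned version; };

    // Called whenever a program is bound: push only the globals that this program
    // uses and that have changed since it was last bound.
    void BindProgramAndFlushGlobals(GLuint prog, GlobalUniform* table, size_t count,
                                    GLint* locs, unsigned* seenVersion)
    {
        glUseProgram(prog);
        for (size_t i = 0; i < count; ++i) {
            if (locs[i] == -1 || seenVersion[i] == table[i].version)
                continue;                     // not used by this program, or already up to date
            glUniform4fv(locs[i], 1, table[i].value);
            seenVersion[i] = table[i].version;
        }
    }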

    Jan.
    GLIM - Immediate Mode Emulation for GL3

  7. #107
    Junior Member Newbie
    Join Date
    Mar 2010
    Posts
    1

    Re: Official feedback on OpenGL 4.0 thread

    Quote Originally Posted by Ilian Dinev
    Maybe the RHD57xx do support doubles, but in DX's nature it's not a mentionable/marketable feature?
    (disclaimer: I haven't checked any in-depth docs on 57xx)

    Edit: http://www.geeks3d.com/20091014/rade...point-support/

    Maybe there's a glGetIntegerv() call to check precision, just like gl_texture_multisample num-depth-samples in GL3.2 ?
    Can I add my voice to those seeking clarification on whether single-precision DX11 cards will be able to run OpenGL 4.0, and if not, whether headline features of OpenGL 4.0 and DX11 like tessellation will be unavailable on those cards?

    As far as I know, only the 5800-series cards can do double-precision floats.

    If the 57xx cards won't be OpenGL 4.0 compliant then that is a shame, because even if there is an easy vendor extension that can be called to get features like tessellation, it is a balls-up from a consumer-perception POV, and it would be a shame to return to the vendor-extension market fragmentation whose supposed obsolescence was, to my mind, one of the best outcomes of OpenGL 3/4 development.
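
    For what it's worth, an application can at least probe for double support at runtime (a sketch only; it assumes the driver honestly advertises GL_ARB_gpu_shader_fp64 as an extension string rather than silently demoting doubles, which is exactly the concern raised earlier in the thread):

    Code:
    #include <string.h>

    bool HasFP64()
    {
        GLint n = 0;
        glGetIntegerv(GL_NUM_EXTENSIONS, &n);
        for (GLint i = 0; i < n; ++i) {
            const char* ext = (const char*) glGetStringi(GL_EXTENSIONS, i);
            if (ext && strcmp(ext, "GL_ARB_gpu_shader_fp64") == 0)
                return true;
        }
        return false;
    }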

    p.s. congrats and thanks to the OpenGL development team.

  8. #108
    Junior Member Newbie
    Join Date
    Mar 2010
    Posts
    14

    Re: Official feedback on OpenGL 4.0 thread

    Quote Originally Posted by Alfonse
    You can already do that with geometry shaders and layered framebuffer objects.

    Before asking for something, you might want to make sure you don't already have it. ;)
    Alfonse, I am aware of geometry shaders + FBOs, but they don't solve my problem (nor are they the same as what I am suggesting, namely true parallel rasterization, which I do realize is pie in the sky). Consider the metrics of the situation.

    I have an algorithm that pre-renders low-quality parts of my scene from several view angles. Two examples would be cube-map-based environment mapping for fake reflectivity, or cascaded shadow maps. In both cases, depending on the view angle, it is quite possible that the batches and meshes for each "view" (six orthogonal frusta for the cube map, or perhaps four or more non-overlapping frusta for CSM) will be mostly unique and different. In other words, the scene graph subset submitted for the far shadow map and for the near one won't be the same.

    In this case:
    - I am submitting a lot of geometry. My tests with geometry shaders indicate that they slow down my throughput.
    - I am submitting the geometry and batch state changes in series, even though the work is actually independent per render target. That is, there is a 4x or 6x parallelization win I don't get.

    Furthermore, if the complexity of the shader is very low (CSM = depth only, probably not fill-rate limited! :-) and the GPU is quite powerful, then I don't expect to be able to drive the shading resources to their fullest potential by submitting lots of these simple batches. And there are a lot of them.

    Since my app is already often batch limited, it only gets worse having to hit the scene graph over and over.
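
    The serial pattern being described looks roughly like this (a sketch; CullAndDrawScene, faceFrustum and cubeTex are illustrative names): six independent passes that nevertheless all flow through the one submitting thread, one after the other.

    Code:
    for (int face = 0; face < 6; ++face) {
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                               GL_TEXTURE_CUBE_MAP_POSITIVE_X + face, cubeTex, 0);
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        CullAndDrawScene(faceFrustum[face]);   // per-face culling and batch submission, serially
    }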

    In the meantime, some of our users pick up SLI hardware, and then ATI comes out and demos a wall of video with the HD5000.

    Hence my pie in the sky dream: if the GPU companies want to find a way to sell more hardware, sell me more command processors and let me prepare my gajillions of shadow maps in parallel.

    Quote Originally Posted by Jan
    Actually that would be like having 6 contexts (though the GPU would still process everything in sequence, i guess).
    Right - if you were to code a CSM with threaded GL now, this is what would happen, and you might have _some_ benefit from preparing command buffers in parallel, if the driver does this well, and some loss, due to context switches.

    Of course i assume that Alfonse's suggestion actually solves the problem at hand. I don't see how truly multi-threading the GPU should be of value to anyone.
    It sort of does, but sort of does not. Geometry shaders + layered FBOs still require a serialized submit (and serialized rasterization) of multiple independent pre-rendered textures (dynamic environment maps, shadow maps, what have you). If the driver is threaded, you've gone from using 1/8th of your high-end machine to 2/8ths.

    @bsupnik: Correct, when people talk about "multi-threaded rendering" they are concerned with offloading CPU computations to different cores.
    Right...the problem is that with truly serialized rendering, we still have one producer and one consumer...that's not going to scale up to let me use the 8 cores (16 soon..joy) that some of our users have in their machines.

    To put it simply, if I have to prepare 8 shadow maps to render my scene, and the 8 shadow maps are CPU/batch bound*, and I have 8 cores and I'm only using one...well, that's the math.

    If i could create a command-list, i could take the whole complex piece of code, put it into another thread and only synchronize at one spot, such that the main-thread only takes the command-list and executes it, with no further computations to be done.
    Why can't you do that now? I tried this with X-Plane, although the benefit in parallel execution wasn't particularly large vs. the overhead costs. You can roll a poor man's display list / command queue by building a "command buffer" of some abstracted form of the GL output of your cull sequence...of course, this assumes that the actual work done by the cull loop in your app comes in some reasonably uniform manner that can be buffered easily as commands, and it also assumes that the actual "draw" is cheap except for driver overhead. :-)
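
    A rough sketch of what that abstraction can look like (nothing here is an existing GL API; the command set and names are illustrative): a cull/worker thread records simplified commands, and the thread that owns the GL context replays them with a single synchronization point.

    Code:
    #include <vector>

    enum CmdType { CMD_BIND_PROGRAM, CMD_BIND_TEXTURE, CMD_DRAW };
    struct Cmd { CmdType type; GLuint object; GLuint unit; GLsizei count; };

    std::vector<Cmd> recorded;                  // filled by the cull/worker thread

    void Replay(const std::vector<Cmd>& cmds)   // runs on the GL thread
    {
        for (size_t i = 0; i < cmds.size(); ++i) {
            const Cmd& c = cmds[i];
            switch (c.type) {
            case CMD_BIND_PROGRAM: glUseProgram(c.object); break;
            case CMD_BIND_TEXTURE: glActiveTexture(GL_TEXTURE0 + c.unit);
                                   glBindTexture(GL_TEXTURE_2D, c.object); break;
            case CMD_DRAW:         glDrawArrays(GL_TRIANGLES, 0, c.count); break;
            }
        }
    }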

    cheers
    Ben

  9. #109
    Senior Member OpenGL Pro Ilian Dinev
    Join Date
    Jan 2008
    Location
    Watford, UK
    Posts
    1,290

    Re: Official feedback on OpenGL 4.0 thread

    You have glMapBuffer, ARB_draw_indirect, PBOs, texture buffers and UBOs. Make several degenerate triangles to group objects by num_primitives and vertex shader (static, skinned, wind-bent, etc.), and do lots of instancing with double-referencing through indices into the data. This will skip the vertex cache (it's getting emulated on ATI cards anyway), but you can draw any current-style shadow-map scene in 16*3 draw calls [16 = log2(65535 primitives/mesh), 3 = static, skinned, wind-bent]. Each group of 48 draw calls is computed on whichever CPU core, written into the pre-mapped buffer objects. The main thread only unmaps and executes the draw calls.
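
    For reference, the core of that approach with ARB_draw_indirect looks something like this (a sketch; the command struct layout is from the extension spec, while the group loop and the variable names are illustrative). Worker threads write the draw parameters into a mapped GL_DRAW_INDIRECT_BUFFER; the GL thread only unmaps it and issues the draws.

    Code:
    struct DrawElementsIndirectCommand {
        GLuint count, primCount, firstIndex;
        GLint  baseVertex;
        GLuint reservedMustBeZero;
    };

    // ... worker threads have filled the mapped GL_DRAW_INDIRECT_BUFFER ...
    glUnmapBuffer(GL_DRAW_INDIRECT_BUFFER);
    for (int group = 0; group < numGroups; ++group)
        glDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_SHORT,
            (const GLvoid*)(group * sizeof(DrawElementsIndirectCommand)));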

    The GPU command stream is one. Wanting to magically push more and more through the old methods indefinitely won't work. So look at the HW caps, and think outside the box.

  10. #110
    Senior Member OpenGL Guru
    Join Date
    Dec 2000
    Location
    Reutlingen, Germany
    Posts
    2,042

    Re: Official feedback on OpenGL 4.0 thread

    @bsupnik: I think I cannot use DLs to do that, because many of the commands I use are not compiled into display lists. It is really a huge piece of rendering code that sets many different shaders and uniforms, binds textures, executes (instanced) draw calls, etc. AFAIK at least some of that would not be included in DLs. Additionally, I would need a second GL context and I would need to share all the data between them. I have no experience with that, but I think there are resources that are not easily shared.
    It would really be nice to create a "command buffer context" like in D3D11, one that can cache ALL commands in a buffer and does not need to share resources (because those resources are not really accessed in that context), and then execute that command buffer in the main context.

    Of course I could create such a command buffer manually, through my own abstraction layer, but that is a lot of work and I don't have time for it. Actually, if I had time to invest in that piece of code, I would rewrite it entirely and make it much more efficient from the ground up. But at the moment that's out of the question. For the future, a command buffer would still be nice to have; some things are simply hard to parallelize. Also, the driver could already do some work while building the command buffer, leaving less work for the main thread later.

    The simple conclusion is: we need to use multi-threading to improve our results, but OpenGL makes it very hard to use multi-threading for the rendering setup. That needs to change.

    Jan.
    GLIM - Immediate Mode Emulation for GL3
