#246790 - 10/06/08 02:48 PM
Re: GL 3 & D3D: The War Is Over
[Re: Timothy Farrar]
OpenGL Guru
 
Registered: 03/15/01
Posts: 3768
Do you really think nvidia (or ati for that matter) is going to bother making d3d-specific tessellation hardware? Do you really think they are going to waste silicon on something that might not be used that much in the beginning, rather than add more cores?
"Adding more cores" won't get them the performance needed for tessellation. Maybe they'll implement it using straight software, or maybe it'll be a partial software/specialized hardware thing. Who knows? DX is giving the IHVs the freedom to decide how to go.
In the spec for glsl there is some mention of blend-shader-like functionality (they rejected it because of some perceived performance issues for some unmentioned IHV). At that time the best hardware was the GF6xxx (or could even have been the FX series); hardware has evolved a little bit more than that today.
Yeah, and they still don't exist. As in, there is no hardware that can do that. Nor is there any in the near future.
Why is it that DX gets to add "hardware" features and not openGL?
Because they make API releases in a timely manner? Because they release an API that is widely used? Because they get with the IHVs and find out what the hardware is going to be able to do in the near future and thus implement it in their API? Because their position as market leader allows them to dictate to an extent what IHVs will put into their hardware? Possibly all of the above. OpenGL is not in a position to do these things. It probably never will be.
#246955 - 10/09/08 12:12 AM
Re: GL 3 & D3D: The War Is Over
[Re: Korval]
OpenGL Pro
   
Registered: 02/07/00
Posts: 1057
Loc: Morrisville, NC, USA
Programmable blenders have been difficult in the past in many architectures because GPUs have very deep pipelines, and one natural design is to tie blenders closely to the memory interface.
People have also asked to be able to perform raster operations in the shader program, but this is also tricky to do at full speed with 1) overlapping in-flight fragments and 2) multisample antialiasing.
These aren't insurmountable problems, but they're difficult at least, and for unclear benefit.
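To make the ordering problem concrete, here is a minimal Python sketch of a software "programmable blender". It is an illustration only: `resolve`, `over`, and `difference` are hypothetical names, fragments are simplified to a single intensity plus alpha, and nothing here reflects how any particular GPU implements blending. The point is that the blend is a read-modify-write on the destination pixel, so overlapping fragments have to be applied in submission order.

```python
def over(src, dst):
    """Classic 'source over' blend: src is (value, alpha), dst is the stored value."""
    value, alpha = src
    return value * alpha + dst * (1.0 - alpha)


def difference(src, dst):
    """A blend the classic fixed-function blend equations do not offer (alpha ignored)."""
    value, _alpha = src
    return abs(value - dst)


def resolve(fragments, blend, width, height, clear=0.0):
    """Apply fragments to a framebuffer strictly in submission order.

    fragments: iterable of (x, y, (value, alpha)).
    blend:     any function (src, dst) -> new dst; this is the "programmable" part.
    """
    framebuffer = [[clear] * width for _ in range(height)]
    for x, y, src in fragments:
        # Read-modify-write: the result depends on what was written before.
        framebuffer[y][x] = blend(src, framebuffer[y][x])
    return framebuffer


# Two overlapping fragments on the same pixel, submitted in both orders.
white_then_black = [(0, 0, (1.0, 0.5)), (0, 0, (0.0, 0.5))]
black_then_white = list(reversed(white_then_black))
```

With `over`, submitting the half-transparent white fragment first and the half-transparent black one second leaves 0.25 at that pixel on a black background, while the reverse order leaves 0.5. That order dependence is what forces overlapping in-flight fragments to be serialized per pixel.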
Tilers do offer a much more naturally extensible back end because they put the pixels in the cache, right next to the shader. Very low latency, very high bandwidth access to pixels has some really attractive properties.
Tilers have traditionally been unattractive because they place a heavier burden on the host CPU to bin and replay the command stream.
The interesting reason that tilers may be on the comeback is that a) a many-core GPU can do its own binning and b) a smart GPU can traverse a spatial data structure directly rather than making the CPU serialize what the GPU then needs to de-serialize.
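For readers unfamiliar with the "bin and replay" step, the following is a rough Python sketch of it, under stated assumptions: a 16-pixel tile size, triangles given as three (x, y) screen-space points, and conservative bounding-box binning (a triangle may land in a tile it does not actually cover). The names `bin_triangles` and `replay` are made up for this example and are not taken from any real driver or hardware.

```python
import math
from collections import defaultdict

TILE = 16  # assumed tile edge in pixels; real tile sizes vary by design


def bin_triangles(triangles, width, height, tile=TILE):
    """Map each tile (tx, ty) to the indices of triangles whose bounding box touches it."""
    bins = defaultdict(list)
    max_tx = (width - 1) // tile
    max_ty = (height - 1) // tile
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Skip triangles whose bounding box is entirely off screen.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue
        # Clamp the bounding box to the range of valid tile coordinates.
        tx0 = max(0, math.floor(min(xs)) // tile)
        ty0 = max(0, math.floor(min(ys)) // tile)
        tx1 = min(max_tx, math.floor(max(xs)) // tile)
        ty1 = min(max_ty, math.floor(max(ys)) // tile)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(index)  # submission order is preserved per tile
    return bins


def replay(bins, triangles):
    """Yield (tile, triangles_for_tile) one tile at a time, tiles in raster order."""
    for tile_xy in sorted(bins, key=lambda t: (t[1], t[0])):
        yield tile_xy, [triangles[i] for i in bins[tile_xy]]
```

In the traditional arrangement this loop runs on the host CPU for every frame, which is the burden mentioned above; the argument is that a many-core GPU could run the same kind of loop itself.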
If tilers are successful (which seems likely for any software-centric renderer), they will ease many aspects of programmability in the pipeline - like a reconfigurable pipeline, tessellation support, programmable blenders, alternative renderers like REYES or image order. The one exception for the near term is probably texture fetch, though I could imagine it becoming a separate (but distinct) coprocessor before too long.
The funny thing to me about all this is that graphics seems to be in the back seat on the future of GPUs. The graphics APIs have become relatively staid, and we're all looking with enthusiasm at what the compute APIs might do for us.
_________________________
Cass Everitt -- cass@xyzw.us
#246993 - 10/09/08 08:12 AM
Re: GL 3 & D3D: The War Is Over
[Re: Korval]
OpenGL Pro
   
Registered: 02/07/00
Posts: 1057
Loc: Morrisville, NC, USA
In desktop discrete and high-end consoles (users that care about 3D graphics at all), I think you can argue pretty convincingly that tilers have not been successful to date. In the low end and embedded markets, the value proposition is a lot more favorable to them than in desktop discrete.
Traditionally, anyway. Like I said though, all the current winds of change are much more favorable to a tiler than they have been in the past. Reconfigurable pipelines, programmable blending, hardware-accelerated non-traditional renderers, GPGPU: these make a many-autonomous-core-with-big-cache device attractive. And tiling is a no-brainer decision on such an architecture.
Dominance can be self-affirming. Immediate mode renderers were able to outpace tilers in the past due to different market and technology conditions. As a result, applications were written in such a way that tilers had difficulty with them.
The current innovation path with immediate mode renderers has pretty much stalled at DX9 capabilities, and the architectures themselves are not driving fundamental changes to the APIs. Changes in the technology landscape that Moore's law dictates are erasing many of the traditional benefits of an immediate mode pipeline tuned to maximize memory bandwidth efficiency and hide huge latencies.
The companies that recognize this trend are the ones that will be dominant in graphics 5 years from now. It may be the same companies that are there today, but it may not be.
_________________________
Cass Everitt -- cass@xyzw.us
#247018 - 10/09/08 11:58 AM
Re: GL 3 & D3D: The War Is Over
[Re: cass]
Regular Contributor
  
Registered: 10/31/07
Posts: 163
Loc: Madison, WI
The interesting reason that tilers may be on the comeback is that a) a many-core GPU can do its own binning and b) a smart GPU can traverse a spatial data structure directly rather than making the CPU serialize what the GPU then needs to de-serialize.
Yeah, but the long-term future of many-core might require a fundamental change in order to ensure good scalability. Core-to-core communication and latency easily become a bottleneck, where a single monolithic coherent cache simply doesn't scale. We would effectively get a bunch of cores with separate memory to which we would need to stream assets (i.e. slow, like texture streaming), based on the resources each core requires for display traversal, and an application which broadcasts a relatively small per-frame data stream to each core (i.e. positions of camera, objects, etc.). Then the cores do their own display traversal and update their tiles of the screen. Tiles get sent to something which builds the final output for the display (or displays), perhaps even with overlapping tiles to blend out artifacts of asset streaming being "just too late". Effectively the distributed raytracing model.
We are bound to end up in the extremely non-uniform memory access model (i.e. effectively a network between cores) as computation continues to scale. This is already starting to happen, for example AMD sticking to dual chip for the high end with only 5 GiB/s per direction of cross communication and another 5 GiB/s per direction of core communication with the host. Seems to me that there is going to be a limit in the not too distant future that NVidia will hit which makes a single monolithic chip not cost effective.
As for the short term, I don't see current GPUs being all that different from tilers (in fact with a pre-z pass they become somewhat like deferred tile-based renderers). It's just that we have a serialized setup and a semi-parallel binning happening in hardware, where the 2x2 fragment quads get binned in a fixed distribution across cores. Effectively each independent core gets a sequence of smaller tiles to process. The output merger (a cache) works in small tiles, accumulating 2x2 fragment quads into tiles and then doing per-tile global memory transactions. Seems to me that the primary difference between Larrabee and a GPU is tile size/granularity, details on swapping tiles in and out of cache, and binning. GPUs might also have an advantage here in framebuffer output latency in that they can begin processing right away in draw order and swap more tiles in and out of cache.
As for programmable blending and the near future of GPU hardware, if the CUDA PTX docs are any sign of NVidia's future, there is the .surf memory space, which is accessible via surface instructions with R/W access and is shareable per context. It's just that the .surf memory space hasn't been implemented yet. If this is in designs for future hardware, then we get a high latency (with respect to Larrabee) coherent cache on NVidia hardware as well, perhaps with free type conversion and blending.
IMO, in 2009 we get an early view of how everything will unfold: if we still have NVidia's single chip monsters and triangle setup is parallelized in hardware (i.e. scaling a little better with bandwidth and computation), then our current GL/DX graphics API should remain quite useful for a while...
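As a rough illustration of the idea of 2x2 fragment quads being binned in a fixed distribution across cores, here is a small Python sketch. Everything specific in it is an assumption made for the example: the 8-pixel tile size, the four cores, and the diagonal interleave `(tx + ty) % NUM_CORES` are hypothetical and not taken from any vendor documentation. It only shows the shape of the scheme: the owning core is a pure function of screen position, and each core accumulates quads per small tile before writing the tile out.

```python
from collections import defaultdict

TILE = 8       # hypothetical tile edge in pixels
NUM_CORES = 4  # hypothetical number of shader cores


def owner(px, py):
    """Return (core, tile) for the 2x2 quad whose top-left pixel is (px, py)."""
    tile_xy = (px // TILE, py // TILE)
    # Fixed, position-only mapping: no load balancing, no CPU involvement.
    core = (tile_xy[0] + tile_xy[1]) % NUM_CORES
    return core, tile_xy


def distribute(quads):
    """Group quads by owning core, then by tile, preserving draw order.

    quads: iterable of (px, py) top-left pixel coordinates of 2x2 quads.
    Returns {core: {tile: [quads...]}}; each inner list stands in for the
    per-tile accumulation done before a global memory transaction.
    """
    work = defaultdict(lambda: defaultdict(list))
    for px, py in quads:
        core, tile_xy = owner(px, py)
        work[core][tile_xy].append((px, py))
    return work
```

Because the mapping never changes, two quads that touch the same pixels always go to the same core, which keeps per-pixel ordering local to one core at the cost of possible load imbalance.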