instancing support and multiple render targets



michagl
04-29-2005, 02:05 PM
i've been trying to figure out what the 'instancing support' component of the Shader Model 3.0 specification means.

if it isn't obvious what i'm talking about look at the bottom of this document:

http://www.microsoft.com/whdc/winhec/partners/shadermodel30_NVIDIA.mspx

sorry, but also while i'm here i would like to know exactly what MRT is. i assumed this meant you could, say, set multiple colours in a frag shader and it would write them out to multiple frame buffers, but i really don't know.

it's easy to find buzzwords, but hard to find real, tangible info.

can i get a brief run down or document please?

sincerely,

michael

dorbie
04-29-2005, 03:09 PM
What could you possibly need to be added to the description:

"Instancing allows the programmer to store a single tree, and then several other vertex data streams to specify the per-instance color, height, branch size and so on. For instance, a single 1000-vertex tree model would contain the vertex positions and normals, and a 200-element vertex streams would contain positions, colors, heights, and branch length values. Instancing allows the programmer to submit a single draw call, which renders each of the 200 trees, using the same data for the basic tree shape, but then vary it through the per-instance streams."

If you interpret "tree model" literally as a model of a tree (the leafy kind), there's no jargon and it's quite easy to understand.
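Here is a minimal CPU-side model of what one instanced draw produces, to make the quoted description concrete. Every vertex of the shared tree mesh is paired with one element of the per-instance stream, and this happens for every instance. The `Instance` fields, the `shade` function standing in for a vertex shader, and the toy data are all hypothetical. On real hardware this pairing happens in vertex fetch, not in a Python loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    # One element of the per-instance stream (hypothetical fields).
    offset: tuple   # (x, y, z) world position of this tree
    height: float   # vertical scale factor
    color: tuple    # (r, g, b)


def shade(position, inst):
    """Toy 'vertex shader': scale the shared model by height, then translate."""
    x, y, z = position
    ox, oy, oz = inst.offset
    return (x + ox, y * inst.height + oy, z + oz), inst.color


def draw_instanced(mesh_positions, instances):
    """One 'draw call' that emits every mesh vertex once per instance."""
    out = []
    for inst in instances:           # per-instance stream advances once per copy
        for pos in mesh_positions:   # per-vertex stream is reused for every copy
            out.append(shade(pos, inst))
    return out


# A 3-vertex "tree" drawn at 2 places: 6 shaded vertices from one submission.
tree = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
forest = [Instance((10.0, 0.0, 0.0), 2.0, (0, 1, 0)),
          Instance((20.0, 0.0, 5.0), 0.5, (0, 0.5, 0))]
verts = draw_instanced(tree, forest)
```

With the quote's numbers (a 1000-vertex tree, 200 instances), the same structure shades 200 × 1000 = 200,000 vertices while the tree itself is stored only once.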

michagl
04-29-2005, 04:17 PM
sorry, was that in the document i referenced? i will have to give it another look... i thought it ended at the end of the vertex shader table.

still i'm a bit confused by that description and i read it at least three times just now...

of course i follow the gist of it, but the details leave me wanting.

i will try to pin it down with a few more reads...

but i would still very much like to know how this is implemented with opengl/shaders in a way that couldn't be done with 2.0 shaders.

sorry again if that quote was from the document i referenced. maybe my browser didn't get around to loading it... dial-up in the middle of nowhere.

michagl
04-29-2005, 04:32 PM
this instancing functionality looks awesome... can opengl not leverage this yet?

any demos of these trees available, directx or not?

i'm sure i could put this to serious use if i had the hardware.

what are the typical numbers on the instancing?

is there any hardware in there for parallel rendering? or does the hardware just replicate dispatches?

do the 6800 nvidia cards have any better instancing capabilities over the 6600 cards?

dorbie
04-29-2005, 06:04 PM
OpenGL has display lists, and you can change state between display lists and even use that to drive shader parameters, for example, but I don't think that's exactly analogous to the D3D stuff. I'm not sure though.

michagl
04-29-2005, 06:39 PM
are the d3d shader3.0 specifications purely hardware related, or does microsoft mix hardware and software specifications?

that is, is the instancing feature purely a hardware-side feature? and if so, is it just a matter of the opengl boards working out how they want to expose this new hardware functionality in the API that is holding this up?

also is the instancing really supported reasonably in current hardware, or is it just a goal sort of like vertex texturing?

Korval
04-29-2005, 08:26 PM
Instancing in D3D is designed primarily to work around a fundamental problem in the Direct3D architecture and API. Calling DrawPrimitive (the primary array drawing function, along with DrawIndexedPrimitive) is not a thing to take lightly if you are at all concerned about performance. Such a call goes directly to the hardware driver (unlike OpenGL), and hardware drivers run in kernel mode rather than user mode, which forces a CPU switch to Ring 0. This is not a cheap operation; according to nVidia, a 1GHz CPU can only do approximately 100,000 of them per second. That's not a lot if you're running at 75fps: roughly 1,300 calls per frame.

Through instancing, instead of drawing a mesh 1,000 times, you draw it using one DrawPrimitive call (and a massive index list). That way, you have saved 999 potential DrawPrimitive calls that can be spent on more interesting stuff.
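Korval's figures can be turned into a per-frame budget. The sketch below takes the quoted figure of roughly 100,000 transitions per second on a 1GHz CPU at face value (it is his number, not a measurement made here). It shows why drawing a mesh 1,000 separate times hurts: at 75fps those transitions alone would take about three quarters of each frame, while one instanced call costs almost nothing. The function names are illustrative only.

```python
def calls_per_frame(calls_per_second, fps):
    """Upper bound on draw calls per frame if each one costs a kernel transition."""
    return calls_per_second // fps


def frame_fraction_spent(draw_calls, calls_per_second, fps):
    """Fraction of one frame's time consumed by `draw_calls` transitions."""
    return draw_calls * fps / calls_per_second


budget = calls_per_frame(100_000, 75)                   # about 1,333 calls per frame
separate = frame_fraction_spent(1000, 100_000, 75)      # 1,000 separate draws
instanced = frame_fraction_spent(1, 100_000, 75)        # one instanced draw
```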

Here's the thing. The equivalent OpenGL function, glDrawElements, does not force an immediate switch to Ring 0. The GL driver will need to do one eventually, in order to feed the hardware FIFO. However, the GL driver decides when this is a good idea, not the external code. Additionally, if you have made several glDrawElements calls since the last time it had to feed the hardware FIFO, it can queue them up and drop them into the FIFO all at once. Effectively, OpenGL allows the GL driver to marshal calls to the GPU.

As a side note, glFlush is a way to control marshalling. It tells the GL implementation to block until it has actually placed all waiting commands into the hardware FIFO. If the hardware is slower than the caller, it can take a while for the hardware FIFO to be empty enough to need refilling.
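Here is a toy model of the difference Korval describes. It counts simulated kernel transitions and does not measure anything. It assumes a driver-side buffer that holds a fixed number of commands. The capacity of 256 is arbitrary; real drivers size buffers in bytes and also flush for other reasons. The D3D9 path is treated as one transition per call, as the post describes. `CommandBuffer` and its methods are hypothetical names.

```python
class CommandBuffer:
    """Toy driver that marshals calls into a buffer.

    Commands are appended in user mode. The expensive switch into the
    kernel-mode driver happens only when the buffer fills up or when flush()
    is called explicitly (the role glFlush plays in the post above).
    """

    def __init__(self, capacity):
        self.capacity = capacity  # commands per buffer (hypothetical unit)
        self.pending = []
        self.transitions = 0      # simulated Ring 0 switches

    def submit(self, command):
        self.pending.append(command)
        if len(self.pending) >= self.capacity:
            self.flush()

    def flush(self):
        if self.pending:
            self.transitions += 1  # one switch hands the whole batch over
            self.pending.clear()


def unbatched_transitions(num_calls):
    """The D3D9 model as described in the post: every draw call switches to Ring 0."""
    return num_calls


gl_like = CommandBuffer(capacity=256)
for i in range(1000):
    gl_like.submit(("draw_elements", i))
gl_like.flush()
# gl_like.transitions is now 4 (three full buffers plus the final flush),
# versus unbatched_transitions(1000) == 1000.
```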

So, clearly, OpenGL doesn't need instancing nearly as much as D3D. However, this is not to say that instancing wouldn't be a performance win on GL; glDrawElements is not free after all, and neither are state changes (another thing that instancing gets around by hiding them in vertex attributes). There's been some significant debate in this forum on this very subject, with no conclusions drawn either way. Personally, I think it should be exposed, since it might give a performance win, and it already exists in D3D drivers.

However, there is a strong suggestion, though no certifiable proof, that ATi's D3D instancing implementation is purely software based. According to some, it turns the single call into many separate calls. You still get the performance benefit of not having 999 switches to Ring 0 this way, since it is fully in the control of the hardware driver.

Obli
04-29-2005, 11:30 PM
Originally posted by Korval:
There's been some significant debate in this forum on this very subject, with no conclusions drawn either way. Personally, I think it should be exposed, since it might give a performance win, and it already exists in D3D drivers.

I read those discussions carefully, so maybe I can summarize what I've understood from them. It's basically what has already been said, in a more concise and less accurate fashion.
(Against instancing)
Since GL processes small batches much faster than D3D, the performance gain would be less noticeable.
(Pro instancing)
Even on old video cards (I guess the situation hasn't changed now), big batches take "proportionally" less time to render, sometimes by an order of magnitude.

Anyway, I took a glance at D3D's instancing interfaces and I'm not sure I like them too much. So I just hope they get some redesign before going into GL.

LarsMiddendorf
04-30-2005, 12:55 AM
There is a pdf describing pseudo instancing in the nvidia sdk. They use glMultiTexCoord to pass the Modelview Matrix which is faster than glUniform or changing the Matrix.

http://download.developer.nvidia.com/developer/SDK/Individual_Samples/DEMOS/OpenGL/src/glsl_pseudo_instancing/docs/glsl_pseudo_instancing.pdf
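Here is a minimal sketch of the pseudo-instancing idea. It assumes the per-instance transform is a 3×4 affine matrix whose three rows are sent as three 4-component texture coordinates. The exact layout, which texture units are used, and the helper names are assumptions for illustration, not taken from the NVIDIA paper. Because texture coordinates are current vertex state, setting them per instance is cheap, and the vertex shader rebuilds the transform with three dot products. The GL calls appear only in comments; the Python models the math.

```python
def pack_rows(matrix3x4):
    """Split a 3x4 affine transform into three 4-vectors, one per texcoord set."""
    return [tuple(row) for row in matrix3x4]


def transform(rows, position):
    """What the vertex shader would do: dot each row with (x, y, z, 1)."""
    p = (*position, 1.0)
    return tuple(sum(r * c for r, c in zip(row, p)) for row in rows)


def draw_pseudo_instanced(mesh, transforms):
    out = []
    for m in transforms:
        rows = pack_rows(m)  # in GL: three glMultiTexCoord4fv calls (assumed units)
        # in GL: one glDrawElements per instance, reusing the same vertex data
        out.extend(transform(rows, v) for v in mesh)
    return out


identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
shifted = [[1, 0, 0, 5], [0, 1, 0, 0], [0, 0, 1, -2]]  # translate by (5, 0, -2)
field = draw_pseudo_instanced([(1.0, 2.0, 3.0)], [identity, shifted])
```

This still issues one draw per instance. The saving Lars describes comes from replacing a glUniform or matrix-stack change with cheap attribute calls, not from removing the draw calls.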

michagl
04-30-2005, 10:59 AM
thanks for clearing this up everyone.

it's surprising how dunderheaded d3d is. it sounds as if it effectively has no queue / arbiter system. maybe some people get off on having a direct link to the hardware, but at the end of the day it is probably not worth it. why can't d3d have a state for enabling queueing?

as for instancing, i'm up for anything that can help bring the disparity between small batching and large batching to a proportional equilibrium.

but i agree that introducing instancing concepts into the opengl api will almost undoubtedly be quite a can of worms, and could probably use as much thought as possible. i have no problem waiting on it, but if real functionality exists in the hardware for any manner of instancing i think it should be exposed asap by third party disposable extensions if need be.

surely some sort of mutually exclusive cva/var/vbo arrangement could be worked out (that is, if there really is physical hardware out there for this stuff).

Korval
04-30-2005, 12:59 PM
Originally posted by michagl:
it sounds as if it effectively has no queue / arbiter system.

The technical term is "marshalling". And D3D doesn't have it because it can't.

One principal advantage of D3D is that Microsoft writes a whole lot of it, so IHVs don't have to write very much to get a functioning implementation. When you call a D3D function, you go into Microsoft's code, and it then calls the hardware driver if needed (switching to Ring 0) via an API known only to IHVs. The IHVs implement this API in their drivers.

This is why D3D drivers tend to be more stable than GL ones (particularly cross-hardware); implementing a D3D driver is far less complex than an OpenGL one.

As with most advantages, it has drawbacks. One of these is that the Microsoft part of D3D has no real idea of the hardware. When you call glDrawElements, you're calling directly into IHV code, so it knows exactly how to marshal these calls. When you call D3D's DrawPrimitive, it has no idea if the hardware can accept the call immediately or if it should set it aside for later. Since the IHV portion of D3D runs in Ring 0, there's no chance for the people who have access to this information to know how to correctly marshal calls.

So it's wrong to say that it was a boneheaded idea on Microsoft's part. One of DirectX's principal design goals is to make it relatively easy for IHVs to write implementations. In some cases, this inhibits performance. To them (for the moment, as they are actually correcting this in DX10), this is a reasonable tradeoff. And on some level I can't disagree, considering the sheer number of bugs we OpenGL developers have to deal with on a daily basis.


Originally posted by michagl:
but i agree that introducing instancing concepts into the opengl api will almost undoubtedly be quite a can of worms, and could probably use as much thought as possible.

I've kinda danced around what "instancing" actually is. Really, the functionality required for "instancing" to work is just a modification of how the vertex processor transforms a vertex index into a memory address to pull data from. Effectively, you give the vertex processor the ability to perform a simple transformation on the index (in D3D's case, a mod operation, I believe) before translating it into a memory address. So the GL API would simply be a way to set that mod value on a per-array level.

It is through the clever use of this functionality that one achieves the result of instancing.
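Korval's description can be sketched as a vertex fetch step with one extra index transformation per array. This CPU-side model makes several assumptions. The mesh is drawn as one run of indices 0 to N×V−1 (V vertices, N instances). Per-vertex arrays use `index % period`, the mod Korval mentions. Per-instance arrays use `index // period`, which is an assumption here and is not stated in the thread. `VertexArray`, `period` and `per_instance` are hypothetical names, not a GL or D3D API.

```python
from dataclasses import dataclass


@dataclass
class VertexArray:
    data: list                  # stand-in for a buffer in video memory
    period: int = 0             # 0: plain array; >0: transform the index first
    per_instance: bool = False  # hypothetical flag: divide instead of mod

    def fetch(self, index):
        """Translate a vertex index into an element, as a vertex fetch unit might."""
        if self.period:
            index = index // self.period if self.per_instance else index % self.period
        return self.data[index]


def run_draw(arrays, count):
    """Fetch every enabled array for indices 0..count-1 (one big draw)."""
    return [tuple(a.fetch(i) for a in arrays) for i in range(count)]


mesh = VertexArray(["v0", "v1", "v2"], period=3)                 # V = 3
colors = VertexArray(["red", "green"], period=3, per_instance=True)
stream = run_draw([mesh, colors], count=6)                       # N = 2 instances
# stream pairs v0..v2 with "red", then v0..v2 again with "green"
```

The draw itself knows nothing about instances; all of the instancing behaviour comes from the per-array index transformation. This matches Korval's point that the same mechanism could be used for things other than mesh instancing.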


Originally posted by michagl:
if real functionality exists in the hardware for any manner of instancing i think it should be exposed asap by third party disposable extensions if need be.

Unfortunately, nVidia doesn't think that it is necessary under OpenGL; they agree with the ARB's opinion that instancing performance can be mostly approximated by a combination of immediate mode calls and glDrawElements calls.

M/\dm/\n
04-30-2005, 01:41 PM
I don't know what your stance is, but I think it's bad to flood GL with 100 different vendor specific codepaths for things that gives like 1% performance increase and 10% more code, possibly bugs...

michagl
04-30-2005, 02:12 PM
thanks for all the input korval.

to me a proper opengl instancing api would be most likely a combination of software and hardware functionality.

the way i see it, from the high level api perspective, it would be equivalent to, say, passing a single set of vertex attributes which results in multiple 'instances' of the geometry being produced, perhaps with modulated non-vertex attributes such as transforms, textures, materials, shaders, etc.

i believe technically it would be best to describe opengl instancing as modulating everything but the per-vertex attributes (something has to remain constant, otherwise it isn't instancing).

you would set up everything that modulates the instancing prior to dispatching a glDrawElements command, and beyond that everything would be implementation dependent.

for instance, you could have an array of, say, 100 transforms in video memory. then the instancing infrastructure would pass the single DrawElements sequence bound to the 100 transforms, and the hardware would receive this and perform one hundred loops of the 'vbo' setup without consulting the driver, changing the transform with each iteration.

when the smoke clears you get a field of independently transformed daisies or whatever you want. and even though the daisy geometry itself is a relatively small batch, you don't lose anything to this, because the batches are all processed without driver intervention.

so this is basically how i would envision opengl instancing. the actual new API commands are a whole other problem, though, and i think they should be given careful consideration... but i do believe some framework like this will eventually be integrated into opengl, even if it might seem like too 'high-level' a task for a low-level api like opengl right now. eventually graphics will get so hairy that things that seem high-level now will be considered low-level in the future. i think this is inevitable.

of course with this sort of functionality it is easy to imagine parallel rendering pipelines... but even if no parallel capabilities exist, at the very least instancing could in many cases drastically eliminate redundant interaction between drivers and hardware.

Brolingstanz
04-30-2005, 02:23 PM
it's my understanding that the d3d9 runtime has a command buffer mechanism not unlike the GL's, in that a kernel mode transition is deferred until the buffer is full. this leaves me scratching my head though... i have always heard of the high cost of api calls in the d3d runtime as well.

software based instancing might not be that bad in the end, provided it can somehow sync with the buffer flush in a graceful way and all the necessary commands fit neatly in a single buffer (there would be only one mode transition per flush in this case). actually i'm having a hard time seeing the difference, except for any batch dispatch overhead associated with chunking the instance up 1000 times. i suspect the devil is in the details though.

regards,
bonehead
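Brolingstanz's point can be sketched by expanding one instanced draw in software and counting buffer hand-offs. The command names and the buffer capacity measured in commands are hypothetical. The sketch deliberately ignores the per-command cost inside the buffer, which is the "devil in the details" he mentions.

```python
import math


def expand_instanced_draw(index_count, instance_data):
    """Software instancing: turn one instanced call into per-instance commands."""
    commands = []
    for i, data in enumerate(instance_data):
        commands.append(("set_constants", i, data))     # per-instance shader constants
        commands.append(("draw_indexed", index_count))  # same mesh every time
    return commands


def transitions_needed(num_commands, buffer_capacity):
    """Kernel transitions if commands are handed over only when a buffer fills."""
    return math.ceil(num_commands / buffer_capacity)


cmds = expand_instanced_draw(index_count=3000, instance_data=range(1000))
# 2,000 commands: one transition if the buffer holds 4,096 commands, four if it holds 512.
```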

Korval
04-30-2005, 08:27 PM
Originally posted by M/\dm/\n:
I don't know what your stance is, but I think it's bad to flood GL with 100 different vendor specific codepaths for things that gives like 1% performance increase and 10% more code, possibly bugs...

I don't know who you're talking to, but we haven't discussed a vendor-specific extension. Plus, nobody's come forth with conclusive evidence that OpenGL would only get a 1% performance increase from it. Lastly, I submit that glslang has caused far more driver bugs than any 10 other extensions in OpenGL, let alone this comparatively trivial functionality.


Originally posted by michagl:
so this is basically how i would envision opengl instancing.

That is way too high level. The thing about D3D's functionality is that mesh instancing isn't the only thing it can be used for. Since they expose the mod operation and values directly, you can use it for more than just instancing behavior.

At the moment, I can't think of anything more to do with it, but that hardly means that there isn't anything out there. OpenGL is a low-level API, and should remain such.

Brolingstanz
05-01-2005, 11:40 AM
Originally posted by michagl:
for instance, you could have an array of, say, 100 transforms in video memory. then the instancing infrastructure would pass the single DrawElements sequence bound to the 100 transforms, and the hardware would receive this and perform one hundred loops of the 'vbo' setup without consulting the driver, changing the transform with each iteration.

the way it works in d3d is that you have 2 streams... one has the geometry to render, the other has the per-instance stuff. depending on whether you're using indices, the geometry channel has either complete copies of your geometry (mod by stride) or a single copy for the indexed version (mod by instance count). this is really just basic streaming... with a twist (or mod, i should say).

compliments,
bonehead

michagl
05-01-2005, 11:56 AM
it might seem a little high level in this day and age, but if it can be managed on the hardware, avoiding redundant driver/hardware communication and bookkeeping, it would be a worthwhile win.

what i proposed is really no different from what is achieved with the vbo interface or display lists, just a way of sort of mixing display list and vbo functionality in a well defined fashion that does not require changing client side states.

michagl
05-01-2005, 08:01 PM
you've posted in the wrong thread somehow boney.

V-man
05-02-2005, 08:05 AM
Far Cry has instancing support

http://www.driverheaven.net/showthread.php?s=&threadid=51500

I hear that the ATI X... cards have hw instancing support. All SM3 hw should as well.

AFAIK, no one has demonstrated that GL would not benefit from this.

More
http://www.humus.ca/index.php?page=3D&ID=52
http://www.idvinc.com/