Mapping of Vertex Formats to DMA Streams



Christian Schüler
06-04-2004, 02:47 AM
Hi,
recently I became curious about how hardware maps vertex formats and vertex streams. In DirectX, you usually have a few streams, each with a declared vertex format. In OpenGL, however, there are no vertex formats; it's all gl*Pointer() calls.

This raised the question of which scheme better matches how the hardware actually works. Are there some 8 or 16 DMA channels available to match the OpenGL model?

jwatte
06-04-2004, 08:01 AM
It depends on the hardware.

And, for most hardware, I wouldn't view it as a "DMA channel" so much as "cache lines" or something similar in the hardware. I.e., it's possible that data stride and aliasing matter as much as how scattered (or not) the data is.

Also, the driver is likely to be able to combine interleaved arrays into a single memory fetch. If you have a struct that contains your vertex, and your vertex array is an array of this struct, then all the Pointer() calls collapse down to a single "stream" anyway.
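To make the "collapse down to a single stream" point concrete, here is a minimal sketch of the kind of check a driver might do. It is only an illustration: AttribPointer and collapsesToSingleStream() are made-up names, not part of OpenGL or any driver, and the sketch assumes every attribute carries an explicit non-zero stride.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Interleaved vertex: position, normal, texcoord (8 floats, 32 bytes).
struct Vertex {
    float pos[3];
    float nrm[3];
    float uv[2];
};

// Hypothetical record of what one gl*Pointer() call tells the driver.
struct AttribPointer {
    const void* ptr;     // address of the first element
    std::size_t stride;  // bytes between consecutive elements (assumed non-zero)
    std::size_t size;    // bytes occupied by one element
};

// Assumed driver-side test: all attributes share one stride and the first
// element of each fits inside a single stride-sized record, so one fetch
// per vertex covers all of them.
bool collapsesToSingleStream(const std::vector<AttribPointer>& attribs) {
    if (attribs.empty()) return false;
    std::uintptr_t lo = UINTPTR_MAX;
    std::uintptr_t hi = 0;
    for (const AttribPointer& a : attribs) {
        if (a.stride != attribs[0].stride) return false;
        const std::uintptr_t p = reinterpret_cast<std::uintptr_t>(a.ptr);
        if (p < lo) lo = p;
        if (p + a.size > hi) hi = p + a.size;
    }
    return hi - lo <= attribs[0].stride;
}

// The three pointers an application would set for an array of Vertex.
std::vector<AttribPointer> interleavedPointers(const Vertex* v) {
    return {
        {v->pos, sizeof(Vertex), sizeof(v->pos)},
        {v->nrm, sizeof(Vertex), sizeof(v->nrm)},
        {v->uv,  sizeof(Vertex), sizeof(v->uv)},
    };
}
```

With an array of Vertex the three pointers pass the check; with separately allocated position and normal arrays they do not, which is the case where stride and scatter start to matter.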

Christian Schüler
06-05-2004, 04:17 AM
Thanks.

It seems I need to run some experiments again. Back in GF2 times, I found no difference in performance between interleaved and non-interleaved layouts for a 32-byte vertex format (pos/nrm/tex). I then went for non-interleaved (structure of arrays; basically each mesh component has its own VBO), as it is more flexible.

Would love to do this in DX with the same ease ;)

jwatte
06-05-2004, 11:11 AM
IDirect3DDevice9::SetStreamSource() not good enough for you? ;-)

Christian Schüler
06-06-2004, 11:26 PM
Of course it's doable, but you must check the caps to see how many streams the driver supports, and all your data must be in vertex buffers.

Then again, a GL equivalent of SetStreamSourceFreq() would be nice.

jwatte
06-07-2004, 08:09 AM
Or, even better, the rumored DX 9.1 update which allows you to specify repeat across the vertex buffer (i.e., only specify the shared vertices once). Yum!

Korval
06-07-2004, 10:42 AM
I'm not entirely sure what this means. Could you clarify it?

Christian Schüler
06-08-2004, 12:41 AM
It sounds like you place the vertices for multiple instances into a buffer once and set the repeat to the size of that buffer, while the transform data comes from another stream with reduced frequency. It would make a lot of sense.
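As a rough model of that idea (not the actual Direct3D API), the fetch can be pictured as two indices derived from one running vertex counter: the geometry stream wraps around, and the per-instance stream advances once per full pass. FetchIndices, fetchIndices() and expand() are hypothetical names, and positions are reduced to a single float to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Where the i-th vertex of the whole draw reads from in each stream.
struct FetchIndices {
    std::size_t geometry;  // index into the shared per-vertex stream
    std::size_t instance;  // index into the per-instance stream
};

// The geometry stream repeats every vertsPerInstance elements; the instance
// stream steps forward once per full cycle through the geometry.
// vertsPerInstance must be non-zero.
FetchIndices fetchIndices(std::size_t i, std::size_t vertsPerInstance) {
    return {i % vertsPerInstance, i / vertsPerInstance};
}

// Expand both streams into the flat vertex list the draw would produce.
// A 1D position plus a per-instance offset stands in for a full transform.
std::vector<float> expand(const std::vector<float>& geometry,
                          const std::vector<float>& offsets) {
    std::vector<float> out;
    const std::size_t total = geometry.size() * offsets.size();
    out.reserve(total);
    for (std::size_t i = 0; i < total; ++i) {
        const FetchIndices f = fetchIndices(i, geometry.size());
        out.push_back(geometry[f.geometry] + offsets[f.instance]);
    }
    return out;
}
```

For example, a three-vertex geometry stream and two per-instance offsets yield six output vertices, while only five values are actually stored.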

Obli
06-08-2004, 02:57 AM
I can see some cases in which it would help, but I'm not sure I've understood this correctly.
Does this mean that, say, the geometry array can be 400 elements wide while the texcoord[0] array can be 100 elements wide and repeated 4 times (something like a wrap-around)?

Adruab
06-08-2004, 08:34 AM
Yeah, that's the general idea. DX is targeting it more at instancing, i.e., putting the geometry for a highly replicated object in one buffer, putting the transform and other per-object data in another stream, and having it step through the per-object data only once per full cycle through the geometry data. Effectively, this removes the overhead of all the state changes required for rendering individual objects. There are other applications, as you mention, though they aren't as widely applicable (not sure why you'd want to replicate 100 texture coords :p ).

I think what the guy mentioning repeat is talking about is just that: being able to tell the driver that you're duplicating information in a table. Most of the time when building an indexed mesh, you have to duplicate position information if the corresponding vertex on a different poly has a different texture coordinate, normal, etc. (as when moving from 3DS to interleaved arrays; this also comes up in discussions of texture seam problems).

It could lead to a reduction in memory if you only had to specify duplication indices for those vertices after the table of unique ones. Still, I don't think you'd gain much performance, if any, due to restrictions on how vertex data must be organized (not nice for direct streaming...), unless we had much bigger vertex caches (reusable stream output...) and could specify the calculations on the different streams as separable (use already transformed positions with separately calculated normals and such). Ooooh, if that were possible, I'd buy into this any day :) .
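The duplication described above can be sketched as follows. This is an illustration only, assuming the source mesh stores a separate position index and texcoord index per polygon corner (as formats such as Wavefront OBJ do); Corner, UnifiedMesh and unify() are hypothetical names. Each unique (position, texcoord) pair becomes one vertex in the single-index mesh, so a position on a texture seam ends up stored more than once.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// One polygon corner in the source mesh: separate indices per attribute.
struct Corner {
    unsigned pos;  // index into the position table
    unsigned uv;   // index into the texcoord table
};

// Single-index mesh as the hardware wants it.
struct UnifiedMesh {
    std::vector<Corner> vertices;   // unique (pos, uv) combinations
    std::vector<unsigned> indices;  // one entry per input corner
};

// Collapse per-attribute indices into one index per vertex. A position that
// appears with two different texcoords is emitted as two vertices.
UnifiedMesh unify(const std::vector<Corner>& corners) {
    UnifiedMesh mesh;
    std::map<std::pair<unsigned, unsigned>, unsigned> seen;
    for (const Corner& c : corners) {
        const std::pair<unsigned, unsigned> key(c.pos, c.uv);
        std::map<std::pair<unsigned, unsigned>, unsigned>::iterator it = seen.find(key);
        if (it == seen.end()) {
            const unsigned idx = static_cast<unsigned>(mesh.vertices.size());
            it = seen.insert(std::make_pair(key, idx)).first;
            mesh.vertices.push_back(c);
        }
        mesh.indices.push_back(it->second);
    }
    return mesh;
}
```

Two triangles that share an edge but disagree on the texcoord at one shared position come out with five vertices for four unique positions, which is the memory cost the "repeat" idea would try to avoid.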