I’m having trouble moving my code from an ATI Radeon 9000 card to my new GeForce FX 5600. I was unable to determine the size of the vertex cache; my simple test shows that there’s no post-T&L caching at all, but that’s absurd, isn’t it? As a result, vertex processing speed is amazingly slow: I can push only 11 Mtri/sec, against 30 Mtri/sec on my old Radeon. Does anyone have similar difficulties?
How exactly do you feed the vertices to the gfx card?
I’m assuming you use indexed vertices with glDrawRangeElements, maybe even with a VBO. In that case everything should work fine.
Also, you have to be sure that you are transform-limited, otherwise your test won’t make any sense.
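For clarity, this is the kind of path I mean (just a rough sketch, not code from the original post; it assumes GL_ARB_vertex_buffer_object is available and that verts, indices, numVerts and numIndices are filled in elsewhere):

/* static vertex data in a VBO, drawn indexed with glDrawRangeElements */
GLuint vbo;
glGenBuffersARB(1, &vbo);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, numVerts * 3 * sizeof(GLfloat),
                verts, GL_STATIC_DRAW_ARB);

glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, (const GLvoid *)0);  /* offset 0 into the VBO */

/* indices stay in a plain system-memory array, which the GeForce drivers
   of this era seem to prefer */
glDrawRangeElements(GL_TRIANGLES, 0, numVerts - 1,
                    numIndices, GL_UNSIGNED_SHORT, indices);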
Hardware T&L is not gone from any GeForce product. It’s ATI that did that, with the Radeon 7000 and the 9100 IGP…
Maybe you’re not vertex-transfer bound, but bound somewhere else? Maybe you’re off the fast path for some reason? For example, index buffers should be in system memory for GeForce cards.
Try running VTune (if you’re on an Intel CPU; otherwise, AMD’s profiler) and see where the hold-up is. Perhaps you’re spending time copying or converting data somewhere in the driver?
A few other things to check:
- the size of your vertex format, strange combinations of vertex formats, use of non-standard types (shorts, bytes?)
- alignment issues
- is the VBO static or dynamic? In the second case, are you mapping the buffer? Then be sure you don’t read from the mapped memory, only write to it, and only sequentially, without “holes” (see the sketch after this list)
- the size of the vertex buffer, and the number of vertices/indices rendered per call? Maybe it’s too high, or too low…?
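To make the mapping point concrete, here’s roughly what I mean by write-only, sequential filling of a dynamic VBO (a sketch only; dynamicVbo, bufSize, numVerts and srcVerts are placeholders):

glBindBufferARB(GL_ARRAY_BUFFER_ARB, dynamicVbo);

/* re-specify the store first so the driver can discard the old contents
   instead of stalling on it */
glBufferDataARB(GL_ARRAY_BUFFER_ARB, bufSize, NULL, GL_STREAM_DRAW_ARB);

GLfloat *dst = (GLfloat *)glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);
if (dst)
{
    for (int i = 0; i < numVerts; ++i)
    {
        /* write straight through, front to back; never read back from 'dst' */
        dst[i * 3 + 0] = srcVerts[i].x;
        dst[i * 3 + 1] = srcVerts[i].y;
        dst[i * 3 + 2] = srcVerts[i].z;
    }
    glUnmapBufferARB(GL_ARRAY_BUFFER_ARB);
}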
By the way, DIP is Direct3D-specific terminology; not everybody on these forums is familiar with it.
Y.
“I was unable to determine the size of the vertex cache”
I’m concerned about this. How is it that you are attempting to determine the size of the post-T&L cache? Perhaps you are using VBOs in a fashion that worked fast on ATi cards, but does something odd on nVidia cards.
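For comparison, the kind of probe I’d expect looks something like this (a sketch only; the index patterns and counts are made up, not taken from your test):

/* Render the same number of indexed triangles twice: once with indices that
   keep revisiting a small window of vertices (cache-friendly), once with
   strictly unique indices (cache-hostile).  Once you are transform-limited,
   a working post-T&L cache should make the first case much faster. */
for (int i = 0; i < numTris; ++i)
{
    friendly[i * 3 + 0] = (GLushort)( i      % windowSize);
    friendly[i * 3 + 1] = (GLushort)((i + 1) % windowSize);
    friendly[i * 3 + 2] = (GLushort)((i + 2) % windowSize);

    hostile[i * 3 + 0] = (GLushort)(i * 3 + 0);
    hostile[i * 3 + 1] = (GLushort)(i * 3 + 1);
    hostile[i * 3 + 2] = (GLushort)(i * 3 + 2);
}

/* time this... */
glDrawRangeElements(GL_TRIANGLES, 0, windowSize - 1,
                    numTris * 3, GL_UNSIGNED_SHORT, friendly);
/* ...then time the same call with 'hostile' and the full vertex range,
   and vary 'windowSize' to estimate where the cache stops helping */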
Also, you are using the most current FX drivers, right?
Lastly, I’m not certain that nVidia’s VBO implementation is as fast as using VAR yet. I seem to recall reading that somewhere on this forum, but I’m not entirely positive. If it turns out that VBOs currently aren’t as fast as VAR (which is as fast as the hardware can go), then you can ignore the result for now and wait until nVidia irons out their VBO implementation.
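If you do want to compare against VAR, a minimal setup is along these lines (a sketch from memory, not authoritative; it assumes the NV_vertex_array_range and wglAllocateMemoryNV entry points have already been fetched with wglGetProcAddress, and error handling is omitted):

/* ask for AGP memory; a priority of 1.0f would request video memory instead */
void *varMem = wglAllocateMemoryNV(varSize, 0.0f, 0.0f, 0.5f);
glVertexArrayRangeNV(varSize, varMem);
glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);

/* copy the vertices into the VAR memory, then point the arrays at it */
memcpy(varMem, verts, numVerts * 3 * sizeof(GLfloat));
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, varMem);

glDrawRangeElements(GL_TRIANGLES, 0, numVerts - 1,
                    numIndices, GL_UNSIGNED_SHORT, indices);

/* when done: */
glDisableClientState(GL_VERTEX_ARRAY_RANGE_NV);
wglFreeMemoryNV(varMem);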