TnL in OpenGL vs D3D

I have read a bunch of video card reviews that talked about how TnL GPUs are not important yet because most games do not use them. This got me thinking about how it is enabled. Please correct me if I am wrong here.

OpenGL TnL. This is all handled in the video card drivers. Code written last year will be TnL accelerated when run on a system with a TnL card; nothing special needs to be done to get the benefits.

Direct3D TnL. DX7 came with a TnL interface. If this interface is not requested, you don't get TnL acceleration. Since this interface did not exist before DX7, everything written before that (<= DX6) is never going to request it.
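As far as I can tell, the request happens when you create the device. A rough DX7 sketch (the function name createTnLDevice is mine, and all the DirectDraw setup is left out):

    #define INITGUID
    #include <ddraw.h>
    #include <d3d.h>

    // Sketch only: 'ddraw' and 'backBuffer' are assumed to be an already
    // initialised IDirectDraw7 and a rendering surface created from it
    // (cooperative level, surfaces and Z-buffer setup are omitted here).
    HRESULT createTnLDevice(LPDIRECTDRAW7 ddraw, LPDIRECTDRAWSURFACE7 backBuffer,
                            LPDIRECT3DDEVICE7 *deviceOut)
    {
        LPDIRECT3D7 d3d = NULL;
        if (FAILED(ddraw->QueryInterface(IID_IDirect3D7, (void **)&d3d)))
            return E_FAIL;

        // This is the part that matters: explicitly ask for the T&L HAL device.
        // If that fails (no hardware T&L), fall back to the plain HAL device,
        // where the CPU does the transform and lighting.
        HRESULT hr = d3d->CreateDevice(IID_IDirect3DTnLHalDevice, backBuffer, deviceOut);
        if (FAILED(hr))
            hr = d3d->CreateDevice(IID_IDirect3DHALDevice, backBuffer, deviceOut);

        d3d->Release();
        return hr;
    }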

If this is true, it is possible to write a D3D program now that doesn't use TnL acceleration, but OpenGL has to use it (if it is available). This leads to another question: why would you ever NOT want to use TnL acceleration?

Insights?

Yes, it's possible to write an application in D3D without T&L support, and sadly enough most of today's games don't use it. An OpenGL application will automatically use T if supported by hardware and driver, but L is optional; most games use lightmapping instead.
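To actually exercise the L part in OpenGL you have to turn on the fixed-function lighting yourself. A minimal sketch (the light values are just examples):

    #include <GL/gl.h>

    // Minimal fixed-function lighting setup. On a T&L card the driver can run
    // this on the GPU; without GL_LIGHTING enabled, only the transform stage
    // is used and lighting comes from per-vertex colors or lightmaps.
    void setupLighting(void)
    {
        const GLfloat lightPos[]     = { 0.0f, 10.0f, 10.0f, 1.0f };
        const GLfloat lightDiffuse[] = { 1.0f, 1.0f, 1.0f, 1.0f };

        glEnable(GL_LIGHTING);
        glEnable(GL_LIGHT0);
        glLightfv(GL_LIGHT0, GL_POSITION, lightPos);
        glLightfv(GL_LIGHT0, GL_DIFFUSE, lightDiffuse);
        glEnable(GL_NORMALIZE);   // needed if you scale your geometry

        // Remember to supply per-vertex normals (glNormal* or a normal array),
        // otherwise the lighting results will be wrong.
    }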

There are certain situations where T&L degrades performance, e.g. when you have high AGP traffic, or when the card is highly stressed while the CPU is not (a lot of textures and/or very high resolution). A GF/GF2 performs slightly better at 1600x1200x32 with T&L off in most current games.

Just in case I've missed something… the hardware transform mechanism seems to operate on the GeForce family only when using the VertexArrayRange extension. Could you confirm this point? When benchmarking with plain vertex arrays (the basic mechanism) on TNT2 boards and a GeForce DDR, the raster timing is about the same, roughly a 1% difference (with a simple test object; I am testing T&L, not fill rate!). Of course, perhaps I've made some mistakes, but it really seems that if you don't use NV proprietary extensions, transformations are done in software. Agree?
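For reference, by "basic mechanism" I mean plain vertex arrays, roughly like this (the array names are just placeholders for my test object):

    #include <GL/gl.h>

    // Plain (non-extension) vertex arrays: nothing in this code path selects
    // hardware or software transform; the driver decides where it runs.
    void drawObject(const GLfloat *verts, const GLfloat *norms,
                    const GLushort *indices, GLsizei indexCount)
    {
        glEnableClientState(GL_VERTEX_ARRAY);
        glEnableClientState(GL_NORMAL_ARRAY);
        glVertexPointer(3, GL_FLOAT, 0, verts);
        glNormalPointer(GL_FLOAT, 0, norms);

        glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, indices);

        glDisableClientState(GL_NORMAL_ARRAY);
        glDisableClientState(GL_VERTEX_ARRAY);
    }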

Hmm… I can't confirm nor deny that, since I don't own a GF.
I haven't heard about that before… seems strange to me…

I don't think this is the case, because I wrote a program that shows about 2000 triangles at once. On a P3-450 (Win2K) with a GeForce SDR it ran smoothly. Then I tried the exact same executable on a P3-667 (Win2K) with a Matrox G400 (low-end version) and it was jerky. This is what got me thinking about the TnL implementation in the first place.

I think the NV extensions just put the GPU under your control so you don’t have to rely solely on the drivers.

I agree with you Sylvain. From my experience the GPU is not used unless you use the vertex array extension.

From my own implementations, I saw little or no performance difference between normal vertex arrays and the vertex array range extension on a GF/GF2. The GL_NV_fence extension did little as well. I also used wglAllocateMemoryNV to store the arrays in video memory (not AGP memory!). The scene consisted of about 8,000+ textured and lit triangles drawn as tri-strips: approx. 203 FPS at 1024x768x32 with the vertex array range, and 200 without. Not much difference if you ask me, and hardly worth the time spent getting it to work. =)
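For the curious, my VAR setup looked roughly like this (a sketch from memory; the buffer size and error handling are simplified, and the tokens/typedefs normally come from glext.h/wglext.h):

    #include <windows.h>
    #include <GL/gl.h>

    #ifndef GL_VERTEX_ARRAY_RANGE_NV
    #define GL_VERTEX_ARRAY_RANGE_NV 0x851D
    #endif
    typedef void *(APIENTRY *PFNWGLALLOCATEMEMORYNVPROC)(GLsizei size, GLfloat readFreq,
                                                         GLfloat writeFreq, GLfloat priority);
    typedef void (APIENTRY *PFNGLVERTEXARRAYRANGENVPROC)(GLsizei length, const void *pointer);

    // Rough outline of setting up the vertex array range.
    void setupVAR(GLsizei bytes)
    {
        PFNWGLALLOCATEMEMORYNVPROC wglAllocateMemoryNV =
            (PFNWGLALLOCATEMEMORYNVPROC)wglGetProcAddress("wglAllocateMemoryNV");
        PFNGLVERTEXARRAYRANGENVPROC glVertexArrayRangeNV =
            (PFNGLVERTEXARRAYRANGENVPROC)wglGetProcAddress("glVertexArrayRangeNV");
        if (!wglAllocateMemoryNV || !glVertexArrayRangeNV)
            return;                       // extension not available

        // A priority near 1.0 asks for video memory; lower values tend to
        // give AGP memory instead.
        void *mem = wglAllocateMemoryNV(bytes, 0.0f, 0.0f, 1.0f);
        if (!mem)
            return;

        glVertexArrayRangeNV(bytes, mem);
        glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);

        // Then copy the vertex data into 'mem', point the vertex arrays at it,
        // and draw with glDrawElements as usual.
    }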

[This message has been edited by fenris (edited 09-10-2000).]

Under OpenGL you always get hardware T&L on GeForce if you’re letting OpenGL do the work.

It’s entirely possible to get the same frame rate with TNT & GeForce – particularly if you’re not doing anything except T&L in your app. What else does the CPU have to do? Also, if the application is fill-limited, then T&L performance is irrelevant, so be careful of that.

The vertex array range extension allows the GPU to pull the data directly, so it puts the smallest load possible on the CPU (which must still copy the array indices).

If VAR does not make things faster, then the T&L engine or fill is the bottleneck. When data transfer between the CPU and GPU is the bottleneck, then VAR can easily give 2X+ improvement.
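A quick way to tell which case you are in (just a rough sketch; drawScene and the timing code are placeholders): draw the same frame at full size and into a tiny viewport and compare the frame times.

    #include <GL/gl.h>

    // If the frame time barely changes between the two passes, you are not
    // fill-limited, so look at T&L or the CPU/AGP transfer instead.
    void compareFillLoad(int winWidth, int winHeight, void (*drawScene)(void))
    {
        // Full-resolution pass: fill cost is at its maximum here.
        glViewport(0, 0, winWidth, winHeight);
        drawScene();

        // Tiny pass: same vertex/T&L work, almost no fill work.
        glViewport(0, 0, winWidth / 8, winHeight / 8);
        drawScene();

        glViewport(0, 0, winWidth, winHeight);   // restore
    }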

Hope this helps …
Cass

[This message has been edited by cass (edited 09-10-2000).]

Fenris, an 8000-poly app isn't really pushing the T&L, more the fill rate. Try the same with more than 100,000 polys; I would expect to see a difference then.

I am really not sure that is really the case, Cass. It should be, but NVIDIA seems to have disabled the GPU when using the standard GL path… don't ask me why!! Anyway, try a 100,000-poly test just like Adrian suggested and you'll see…
Anyway, I'll make more tests concerning this issue and I'll post them here… thank you all for the replies & info!!

Sylvain,

I work for NVIDIA. We don’t require using NV_vertex_array_range or other vertex array extensions to get hardware T&L.

Thanks -
Cass

Hi there !

I agree with Cass (although I do not work at nVidia: I am just a registered developer!).

You do not need to use the vertex_array_range extension to enable T&L on the GeForce series…

On the other hand, it is advised (by nVidia!) to use specific data types if you want full T&L: use floats for vertices and ubytes for colors. This does not mean that you won't get HW T&L when using doubles or whatever; it just means that using floats and ubytes really shows the power of HW T&L. And believe me: I switched some of my apps from doubles to floats and the gain was impressive!
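For example, a layout along those lines (just a sketch; the array names are placeholders):

    #include <GL/gl.h>

    // GLfloat positions and GLubyte colours, as recommended.
    void drawFastFormat(const GLfloat *verts, const GLubyte *colors,
                        const GLushort *indices, GLsizei indexCount)
    {
        glEnableClientState(GL_VERTEX_ARRAY);
        glEnableClientState(GL_COLOR_ARRAY);

        glVertexPointer(3, GL_FLOAT, 0, verts);            // not GL_DOUBLE!
        glColorPointer(4, GL_UNSIGNED_BYTE, 0, colors);    // 4 ubytes per colour

        glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, indices);

        glDisableClientState(GL_COLOR_ARRAY);
        glDisableClientState(GL_VERTEX_ARRAY);
    }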

Another thing I found when using the GeForce (that was with the 3.xx Detonator series; what they call Detonator 3 now is the 6.xx series!): when using display lists, it was a lot better to use glNewList(GL_COMPILE) and then glCallList than to use glNewList(GL_COMPILE_AND_EXECUTE). I haven't checked this strange behaviour on the latest drivers…
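In other words, something like this (drawGeometry stands for whatever code actually emits the object):

    #include <GL/gl.h>

    // Compile the list once, then call it every frame, instead of using
    // GL_COMPILE_AND_EXECUTE while building it.
    GLuint buildList(void (*drawGeometry)(void))
    {
        GLuint list = glGenLists(1);

        glNewList(list, GL_COMPILE);      // compile only, no drawing yet
        drawGeometry();
        glEndList();

        return list;                      // later, each frame: glCallList(list);
    }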

Have fun!

Regards.

Eric

Thanks for clearing this up Cass.

Looks like you're famous:
http://www.nvnews.net/articles/cass_everitt.shtml