Only 4 multi-textures per pixel on FX 5200?

Hi,

I have a GeForce FX 5200, and the box says it does 16 textures per pixel. So why does OpenGL think it has only 4 textures?

I mean, the following code returns 4!

GLint nMaxTexUnit = 0;
glGetIntegerv(GL_MAX_TEXTURE_UNITS, &nMaxTexUnit);  /* 4 on this card */

You have to use NV_fragment_program or ARB_fragment_program to get access to the texture units beyond the first four. Nvidia decided to expose only 4 texture units to the legacy multitexturing pipeline.

Does the GF FX have more than 4 texture units?

Jan

You can access up to 16 separate textures in a fragment program on all GeForce FX GPUs. You can query this limit with glGetIntegerv and GL_MAX_TEXTURE_IMAGE_UNITS_ARB.

But to do so, these chips have to have 16 texture units, don’t they? The GF4 has four, so they have four times as many? Wow…

How do you bind textures to them if they are not accessible with ordinary multitexture commands? And how do you assign texture coordinates?

Jan

It’s all in the ARB_fragment_program spec.

You will note that there are only 8 channels of texture coordinates into the vertex pipe, and also between vertex and fragment. However, the fragment pipe allows you to use anything for texture coordinates, including dependent values that you calculate on your own.
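As a rough illustration of that last point, here is a minimal sketch of an ARB_fragment_program, kept in a C string, that samples image units 0 and 5 while using only coordinate set 0. The unit numbers, the 2D targets, and the names kDependentReadFp and dependent_read_fp_length are assumptions made up for this example. In a real program you would bind a texture to unit 5 with glActiveTextureARB(GL_TEXTURE0_ARB + 5) followed by glBindTexture, and load the string with glProgramStringARB.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical program: unit 0 holds a lookup texture whose result is used
   as the coordinate for a dependent read from unit 5. Only texcoord set 0
   is interpolated from the vertex stage. */
static const char kDependentReadFp[] =
    "!!ARBfp1.0\n"
    "TEMP lookup, texel;\n"
    "TEX lookup, fragment.texcoord[0], texture[0], 2D;\n"
    "TEX texel, lookup, texture[5], 2D;\n"
    "MOV result.color, texel;\n"
    "END\n";

/* Length in bytes, suitable for the len argument of glProgramStringARB
   (the terminating NUL is not counted). */
int dependent_read_fp_length(void)
{
    size_t len = strlen(kDependentReadFp);
    assert(len > 0);
    return (int)len;
}
```

Note that unit 5 never gets a coordinate set of its own here; the coordinate is computed inside the program.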

And does this now mean that the FX has 16 texture units (although you cannot use them without fragment programs), 16 of what the GF4 had only four of? Four times as many?

Then I think I have to buy one…

Jan

just confirming jwatte:

glGetIntegerv(GL_MAX_TEXTURE_IMAGE_UNITS_ARB,&n1);
returns 16, but

glGetIntegerv(GL_MAX_TEXTURE_COORDS_ARB,&n2);
returns 8.
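To put the three numbers from this thread (4, 8, 16) side by side, here is a small sketch that classifies a texture unit index by what you can do with it. It has no GL dependency: the TexLimits struct, the UnitKind enum, and classify_unit are hypothetical names for illustration, and in a real program the three fields would be filled in with the glGetIntegerv queries shown above.

```c
#include <assert.h>

/* The three limits discussed in this thread, as plain integers. */
typedef struct {
    int texture_units;   /* GL_MAX_TEXTURE_UNITS: legacy multitexture stages */
    int texture_coords;  /* GL_MAX_TEXTURE_COORDS_ARB: interpolated coordinate sets */
    int image_units;     /* GL_MAX_TEXTURE_IMAGE_UNITS_ARB: bindable textures */
} TexLimits;

typedef enum {
    UNIT_FIXED_FUNCTION,  /* usable with ordinary multitexture commands */
    UNIT_HAS_OWN_COORDS,  /* fragment program only, but has its own coordinate set */
    UNIT_IMAGE_ONLY,      /* fragment program only, must reuse or compute coordinates */
    UNIT_OUT_OF_RANGE
} UnitKind;

UnitKind classify_unit(const TexLimits *limits, int unit)
{
    assert(limits != 0);
    if (unit < 0 || unit >= limits->image_units) return UNIT_OUT_OF_RANGE;
    if (unit < limits->texture_units)  return UNIT_FIXED_FUNCTION;
    if (unit < limits->texture_coords) return UNIT_HAS_OWN_COORDS;
    return UNIT_IMAGE_ONLY;
}
```

With the values reported for the GeForce FX (4, 8, 16), units 0 to 3 are fixed function, units 4 to 7 have their own coordinates, and units 8 to 15 are image only.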

Originally posted by JanHH:
But to do so, these chips have to have 16 texture units, don’t they? The GF4 has four, so they have four times as many? Wow…
These chips don’t have as many physical samplers per pipeline, they instead go into “loopback” mode.

I’m not sure about the pipe organisation of NV34 atm, but GeForce 4 Ti is straight 4x2 (four pixel pipelines with two texture units each), i.e. if you use more than two bilinear-filtered textures, the fragments will recirculate on chip.

[This message has been edited by zeckensack (edited 12-26-2003).]

NV30 and NV35 are said to be 4x2
NV31 and NV34 are 4x1

I like to think of graphics cards as special-purpose CPUs. Texture fetch is “just” a memory access, with some magic filtering built-in. This is not unlike other domain-specific signal processors found in embedded systems.

Thus, it’s not so much that there are physical “texture samplers” as that there are instructions to fetch texture values, some number of execution units, and some number of texture memory access pipes. Depending on access locality, sequencing, and phase of the moon, you can make 5 fetches run really slowly, or you can make 8 fetches run reasonably fast.

Originally posted by Ostsol:
GeForce 4 Ti is straight 4x2, i.e. if you use more than two bilinear-filtered textures, the fragments will recirculate on chip.

I am trying to gain rendering speed in my volume rendering application by drawing n slices at once on a single rectangular polygon (if the card has n texture units), instead of drawing one slice per polygon.

Assuming I have a GF4 Ti, am I then not going to get close to a fourfold speedup this way? Maybe just a twofold speedup?

Originally posted by Jay2:
I am trying to gain rendering speed in my volume rendering application by drawing n slices at once on a single rectangular polygon (if the card has n texture units), instead of drawing one slice per polygon.

Assuming I have a GF4 Ti, am I then not going to get close to a fourfold speedup this way? Maybe just a twofold speedup?
Consider the alternatives. If the chip only allowed for dual texturing, you’d have to multipass, which involves blending and resending geometry. Blending is very expensive due to bandwidth requirements. On-chip loopback doesn’t consume any external bandwidth, just fillrate (which you’d need for the multipass solution as well).
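To make the multipass comparison concrete, here is a back-of-the-envelope sketch. It assumes a chip that can apply a fixed number of texture layers per rendering pass, and that every pass after the first has to blend into the framebuffer. The function names and the per-pass model are illustrative assumptions, not measurements of any real chip.

```c
#include <assert.h>

/* Rendering passes needed when at most layers_per_pass textures can be
   applied in a single pass (ceiling division). */
int passes_needed(int layers, int layers_per_pass)
{
    assert(layers > 0 && layers_per_pass > 0);
    return (layers + layers_per_pass - 1) / layers_per_pass;
}

/* Passes that must blend with what is already in the framebuffer: every
   pass except the first. In this model each of them resends the geometry
   and costs a framebuffer read plus a write per pixel. */
int blended_passes(int layers, int layers_per_pass)
{
    return passes_needed(layers, layers_per_pass) - 1;
}
```

For four layers, a dual-texturing-only chip gives passes_needed(4, 2) == 2 with one blended pass, while a chip that exposes four units through loopback gives passes_needed(4, 4) == 1 with no blended pass at all.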

A hypothetical NV25 with four “real” texture units per pipe would be faster (clock for clock), no doubt. But there is no such chip - well, there’s Parhelia (4x4) …

Loopback designs are less complex than “real” silicon, and it’s likely easier to get better power characteristics and higher clock speeds out of them, which would somewhat offset the loss in texturing performance per clock.

You also get higher average unit utilization, so fewer transistors sit idle. With a x1 design, all units will be busy if you use textures at all. With a x2 design, you only reach peak utilization with 2, 4 (6, 8, etc.) texture layers.
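The utilization argument can be sketched with a little arithmetic. This assumes a simplified model in which each sampler does one bilinear fetch per cycle and a fragment loops back until all its layers are fetched; the function names are made up for this example.

```c
#include <assert.h>

/* Cycles a fragment spends in a pipe with tmus_per_pipe samplers,
   looping back until all layers have been fetched. */
int loopback_cycles(int layers, int tmus_per_pipe)
{
    assert(layers > 0 && tmus_per_pipe > 0);
    return (layers + tmus_per_pipe - 1) / tmus_per_pipe;
}

/* Share of sampler slots doing useful work, as an integer percentage
   (rounded down). */
int sampler_utilization_percent(int layers, int tmus_per_pipe)
{
    int slots = loopback_cycles(layers, tmus_per_pipe) * tmus_per_pipe;
    return (100 * layers) / slots;
}
```

Under this model, three layers on a x2 design take two cycles and keep the samplers 75% busy, while any layer count on a x1 design comes out at 100%.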

This makes loopback an attractive design choice. “Best bang for the buck”, so to speak.

Here’s a post from Nvidia that explains the whole tex unit, texcoord, and tex image unit difference on GF-FX class GPUs:
http://www.cgshaders.org/forums/viewtopic.php?t=1041&highlight=

Here’s the original post on OpenGL.org:
http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/009119.html

[This message has been edited by dorbie (edited 12-28-2003).]

Here’s a review by Tom’s Hardware. It lists all the chip numbers (I always get confused by all the NV#s) and it also lists the pipes. The numbers seem to be a bit different from what was mentioned above.
http://www6.tomshardware.com/graphic/20031229/index.html