1D texturing vs. 2D texturing

I was wondering which is more efficient, 1D or 2D. Well, I assume that 1D is quicker, but by how much? Is it fast enough to justify a state change per model? I’m not hoping for definitive answers here, just some discussion before I fire up a testbed for no reason.

I don’t think there’s any other difference in performance between the two on any hardware.

Sending one texture coordinate is less data than sending two texture coordinates. Of course, you could send a single coordinate to a 2D texture if you wanted.

Also, un-aligned vertex formats (sizes like 28, 30, 36 etc) are pretty poor for bus transactions, which like multiples of powers of two. Thus, shaving two bytes off a 32-byte vert to make it 30 probably isn’t a win.

Originally posted by jwatte:
Also, un-aligned vertex formats (sizes like 28, 30, 36 etc) are pretty poor for bus transactions, which like multiples of powers of two. Thus, shaving two bytes off a 32-byte vert to make it 30 probably isn’t a win.

Didn’t we already go over why this is not true? I seem to remember making a very convincing (and true) argument that the alignment ultimately doesn’t matter. Not only that, the final post in that thread shows that you agree with me, jwatte.

Here: http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/007250.html.

That link doesn’t work, unfortunately. Probably because of the trailing period.

I do recall just recently attending an ATI evangelism session where they recommended aligning/sizing vertices so that you only transfer one chunk per vert, and this resonated with my original belief.

Did I change my mind? Perhaps. Read on:

If the card has a “cache” only for the vertex currently being used, and transfers data in aligned 32-byte chunks, then it makes sense to think that 30 bytes of vertex size will perform slower than properly aligned 32-byte vertex size, because 14 vertex accesses out of 16 will fetch two chunks instead of one.

If the difference in size is much bigger (size is 20 in the example you quoted), or if the card fetch logic works differently than I postulate above, then throughput will vary in some other way. I guess I’d like to re-state my previous opinion with a big “it probably varies by card, and you should profile to make sure”.

(Is it just me or is opengl.org very slow today?)

Originally posted by jwatte:
(Is it just me or is opengl.org very slow today?)

It’s just you afaics.

jwatte,

I’d expect that data transfer, not # of cache lines accessed, is going to be the bottleneck.

Compare to non-interleaved vertices, where you might access, say, 5 cache lines per vertex. Yes, interleaved is faster, but it’s not that much faster…

  • Matt

Hmm.

I suppose some synthetic degenerate benchmark could be constructed and run on a variety of cards to find out one way or the other. I lean towards padding my 30-byte vertices out to 32, while keeping my 20-byte vertices at 20 bytes.

The question then becomes where to draw the line. Or whether to actually care: with proper AGP memory management, vertex transfer is seldom your limitation anyway.

It would be worth it to have such a benchmark. Have you tried it out?

Really, someone should make a super-duper benchmarker that uploads info onto a website.

V-man

You mean like this one?

How do you change the data alignment?

PS: yes, that is a nice one. I didn’t know it had accumulated that much data. There is a surprising number of people running on Microsoft GDI. Or is the software switching to software mode on purpose here?

V-man

No special trick; there’s no option to run in software, and you can download the sources to check. I have even more results sleeping on my hard drive, but I do not trust the graphs 100%. Some of the results are strange; I have to have a look into it. I mean, sometimes I see vertex arrays going at 50 MTris/sec. Which card is able to do that?!

Y.

Well, thanks for the answers and discussion. I’ll just go with 2D and sleep well on it.

The TNT2 can do something like 19 MTri per second!

(If you compile into a display list, and then don’t change the modelview matrix after compiling.)