1D texturing vs. 2D texturing



NordFenris
10-04-2002, 12:08 AM
I was wondering which is more efficient, 1D or 2D texturing. I assume that 1D is quicker, but by how much? Is it quick enough to justify a state change per model? I'm not hoping for definite answers here, just some discussion before I fire up a testbed for no reason. :)

Humus
10-04-2002, 12:19 AM
I don't think there's any difference in performance between the two on any hardware.

jwatte
10-04-2002, 05:06 PM
Sending one texture coordinate is less data than sending two texture coordinates. Of course, you can send a single coordinate to a 2D texture if you want to.

Also, un-aligned vertex formats (sizes like 28, 30, 36 etc) are pretty poor for bus transactions, which like multiples of powers of two. Thus, shaving two bytes off a 32-byte vert to make it 30 probably isn't a win.
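To make the sizes in this post concrete, here is a minimal sketch of how dropping one texture coordinate changes the size of an interleaved vertex. The layouts are assumptions for illustration only (float32 position, float32 normal, float32 texture coordinates); nobody in the thread states an actual vertex format.

```python
import struct

# Hypothetical interleaved layouts (assumed, not taken from the thread):
# float32 position (x, y, z) + float32 normal (x, y, z) + texture coordinates.
# "<" requests a packed layout with no implicit padding.
LAYOUT_2D = "<3f3f2f"  # two texture coordinates (s, t)
LAYOUT_1D = "<3f3f1f"  # one texture coordinate (s)


def vertex_size(layout):
    """Return the packed size in bytes of one vertex with this layout."""
    return struct.calcsize(layout)


def is_power_of_two(n):
    """True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


for name, layout in (("2D", LAYOUT_2D), ("1D", LAYOUT_1D)):
    size = vertex_size(layout)
    print(name, size, "bytes, power of two:", is_power_of_two(size))
```

With these assumed layouts the 2D vertex is 32 bytes and the 1D vertex is 28 bytes, which is one of the "un-aligned" sizes mentioned above.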

Korval
10-04-2002, 05:19 PM
Originally posted by jwatte:
Also, un-aligned vertex formats (sizes like 28, 30, 36 etc) are pretty poor for bus transactions, which like multiples of powers of two. Thus, shaving two bytes off a 32-byte vert to make it 30 probably isn't a win.

Didn't we already go over why this is not true? I seem to remember making a very convincing (and true) argument that the alignment ultimately doesn't matter. Not only that, the final post in that thread shows that you agree with me, jwatte.

Here: http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/007250.html.

jwatte
10-05-2002, 03:14 PM
That link doesn't work, unfortunately. Probably because of the trailing period.

I do recall just recently attending an ATI evangelism session where they recommended aligning/sizing vertices, so that you only transferred one chunk per vert, and this resonated with the original belief deep inside me.

Did I change my mind? Perhaps. Read on:

If the card has a "cache" only for the vertex currently being used, and transfers data in aligned 32-byte chunks, then it makes sense to think that 30 bytes of vertex size will perform slower than properly aligned 32-byte vertex size, because 14 vertex accesses out of 16 will fetch two chunks instead of one.
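The "14 out of 16" figure can be checked with a short sketch. It only models the fetch behaviour postulated in this post (tightly packed vertices starting at offset 0, whole aligned 32-byte chunks, no caching across vertices); the function name is made up for illustration, and real hardware may behave differently.

```python
from math import gcd


def straddle_count(vertex_size, chunk_size=32):
    """Count vertices that cross a chunk boundary in one repeating period.

    Assumes tightly packed vertices starting at offset 0 and a fetch unit
    that reads only whole, aligned chunks.
    Returns (straddling_vertices, vertices_in_period).
    """
    # The byte pattern repeats every lcm(vertex_size, chunk_size) bytes.
    period_bytes = vertex_size * chunk_size // gcd(vertex_size, chunk_size)
    vertices = period_bytes // vertex_size
    straddling = 0
    for i in range(vertices):
        first_chunk = (i * vertex_size) // chunk_size
        last_chunk = (i * vertex_size + vertex_size - 1) // chunk_size
        if last_chunk != first_chunk:
            straddling += 1
    return straddling, vertices


print(straddle_count(30))  # 30-byte vertices
print(straddle_count(32))  # 32-byte vertices
print(straddle_count(20))  # 20-byte vertices
```

Under this model, 30-byte vertices straddle a boundary in 14 of every 16 accesses, 32-byte vertices never do, and 20-byte vertices do so in 4 of every 8.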

If the difference in size is much bigger (size is 20 in the example you quoted), or if the card fetch logic works differently than I postulate above, then throughput will vary in some other way. I guess I'd like to re-state my previous opinion with a big "it probably varies by card, and you should profile to make sure".

(Is it just me or is opengl.org very slow today?)

Humus
10-05-2002, 04:20 PM
Originally posted by jwatte:
(Is it just me or is opengl.org very slow today?)

It's just you afaics.

mcraighead
10-05-2002, 06:16 PM
jwatte,

I'd expect that data transfer, not # of cache lines accessed, is going to be the bottleneck.

Compare to non-interleaved vertices, where you might access, say, 5 cache lines per vertex. Yes, interleaved is faster, but it's not _that_ much faster...

- Matt

jwatte
10-06-2002, 08:50 AM
Hmm.

I suppose some synthetic degenerate benchmark could be constructed and run on a variety of cards to find out one way or the other. I lean towards padding my 30 byte vertices out to 32, although keeping my 20 byte vertices at 20 bytes.

The question then becomes where to draw the line. Or whether to actually care -- with proper AGP memory management, vertex transfer is seldom your limitation anyway.
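One way to look at where to draw the line is the fraction of extra data that padding adds per vertex. The sketch below is only arithmetic under an assumed 32-byte chunk size (the figure used earlier in the thread); the helper names are hypothetical and it says nothing about how any particular card actually fetches vertices.

```python
def padded_size(vertex_size, chunk_size=32):
    """Round vertex_size up to the next multiple of chunk_size."""
    return -(-vertex_size // chunk_size) * chunk_size


def padding_overhead(vertex_size, chunk_size=32):
    """Fraction of extra bytes per vertex after padding to a chunk multiple."""
    extra = padded_size(vertex_size, chunk_size) - vertex_size
    return extra / vertex_size


for size in (30, 20):
    print(size, "->", padded_size(size),
          "bytes, overhead {:.1%}".format(padding_overhead(size)))
```

Padding 30 bytes to 32 costs about 7% more data per vertex, while padding 20 bytes to 32 costs 60% more, which is consistent with padding the former and leaving the latter alone if raw transfer volume is the concern.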

V-man
10-06-2002, 12:56 PM
It would be worth it to have such a benchmark. Have you tried it out?

Really, someone should make a super-duper benchmarker that uploads info onto a website.

V-man

harsman
10-07-2002, 01:22 AM
You mean like this one (http://www.fl-tw.com/opengl/GeomBench/) ?

V-man
10-07-2002, 10:26 AM
How do you change the data alignment?

PS: yes, that is a nice one. I didn't know it had accumulated that much data. There is a surprising number of people running on Microsoft GDI. Or is the program switching to software mode on purpose here?

V-man

Ysaneya
10-07-2002, 11:36 PM
No special trick, there's no option to run in software; you can download the sources to check. I have even more results sleeping on my hard drive, but I do not trust the graphs 100%. Some of the results are strange, and I have to look into it. I mean, sometimes I see vertex arrays going at 50 MTris/sec. Which card is able to do that?!

Y.

NordFenris
10-10-2002, 10:05 AM
Well, thanks for the emptying answers and discussion. :) I'll just go with 2D and sleep well on it. :)

jwatte
10-10-2002, 01:14 PM
The TNT2 can do something like 19 MTri per second!

((If you compile into a display list, and then don't change the modelview matrix after compiling.))