[OT] big structs performance issues

I know this is more of a C question, but this kind of optimization only matters in a very high-performance application (e.g. a graphics app), so if anyone knows, it's you. Anyway, say I have a structure that is going to be accessed a lot, so it makes sense to keep it as small as possible to make it more cache-friendly.

typedef struct fatstruct {
    int this[3];
    int that[3];
    /* ... */
} fatstruct_t;

I need to add two integers to this struct, flags and level, neither of which needs more than 8 bits. The performance of reading/writing level isn't so important, but for flags it is very important. What is the fastest way to do this?
a) Put them both in one int (a short might be slower due to alignment/integral-promotion issues) named, say, levelnflags. Then I can keep level in the low byte, which can be read like this:

fat.levelnflags&0xff

Writing is a little more complicated, but as I said, the performance of level isn't so important. Flags can be accessed through the whole int as usual, as long as the flag bits are kept above the low byte used by level:

fat.levelnflags

b) Keep them in two separate shorts, level and flags. This would be much cleaner and potentially faster. Memory alignment wouldn't be much of an issue if I put flags on a 4-byte boundary, but would reading or writing flags be slower due to integral promotion or something similar?
c) Maybe bit fields could help? I've never actually used them.
d) Any of the above.
e) None of the above.
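A minimal sketch of the three layouts, assuming level sits in the low byte and flag bits start at bit 8 (LEVEL_MASK, FLAG_SHIFT and the get/set helpers are hypothetical names). It shows the read-modify-write that option a needs whenever either field is written, which options b and c avoid in the source (c still does it, but the compiler generates it).

```c
/* (a) level and flags packed into one unsigned int.
   Assumed layout: level in bits 0-7, flags from bit 8 upward. */
#define LEVEL_MASK 0xFFu
#define FLAG_SHIFT 8u

typedef struct {
    unsigned int levelnflags;
} packed_t;

static unsigned int get_level(const packed_t *p) {
    return p->levelnflags & LEVEL_MASK;
}

static void set_level(packed_t *p, unsigned int level) {
    /* Clear the old level byte, then OR in the new one. */
    p->levelnflags = (p->levelnflags & ~LEVEL_MASK) | (level & LEVEL_MASK);
}

static unsigned int get_flags(const packed_t *p) {
    return p->levelnflags >> FLAG_SHIFT;
}

static void set_flags(packed_t *p, unsigned int flags) {
    /* Keep the level byte, replace everything above it. */
    p->levelnflags = (p->levelnflags & LEVEL_MASK) | (flags << FLAG_SHIFT);
}

/* (b) two separate shorts: plain loads and stores, no masking. */
typedef struct {
    unsigned short level;
    unsigned short flags;
} split_t;

/* (c) bit fields: the compiler emits the masking and shifting for you. */
typedef struct {
    unsigned int level : 8;
    unsigned int flags : 8;
} bitfield_t;
```

Bit-field layout (bit order, padding, the size of the allocation unit) is implementation-defined, so the exact machine code and sizeof for option c vary between compilers.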
So which one would be faster? The reason I'm making such a fuss about this is that this struct is the triangle struct in a ROAM renderer. Because of the heavy bookkeeping involved with ROAM surfaces, some parts of it tend to become I/O (memory) bound rather than CPU bound. For example, culling a tri involves reading and updating this flags byte for each culled tri. At 10,000 tris per frame and 60 fps, this may well take longer than the actual computations.
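For scale: 10,000 tris per frame at 60 fps is 600,000 read-modify-writes of flags per second. A sketch of such a cull pass, using option b's two-shorts layout; the tri_t fields, the TRI_CULLED bit and the single-plane test are hypothetical stand-ins for a real frustum test.

```c
#include <stddef.h>

#define TRI_CULLED 0x01u   /* hypothetical flag bit */

typedef struct {
    float center[3];       /* hypothetical bounding-sphere centre */
    float radius;
    unsigned short level;
    unsigned short flags;
} tri_t;

/* Stand-in cull test: outside if the bounding sphere lies entirely on the
   negative side of the plane x = 0. A real renderer tests all frustum planes. */
static int outside(const tri_t *t) {
    return t->center[0] + t->radius < 0.0f;
}

/* One cull pass: every tri's flags field is read and written, which is the
   per-tri memory traffic the question is concerned about.
   Returns the number of tris marked as culled. */
static size_t cull_tris(tri_t *tris, size_t n) {
    size_t culled = 0;
    for (size_t i = 0; i < n; ++i) {
        if (outside(&tris[i])) {
            tris[i].flags |= TRI_CULLED;
            ++culled;
        } else {
            tris[i].flags &= (unsigned short)~TRI_CULLED;
        }
    }
    return culled;
}
```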

[This message has been edited by zen (edited 06-28-2002).]

Mmm, I don’t think shaving the last bit off your data structures is going to help much in the long run.
Are you implementing your own stack with this ROAM method, or are you using recursive function calls?
If you’re using the recursive method, then the size of your structs is the least of your concerns…
Abandon ROAM - it’s pretty redundant these days.

Why would it help to implement my own stack? I can turn the recursion into iteration that way, but won't the hardware stack be faster?
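The two options being compared can be sketched with a hypothetical binary-triangle-tree node (btt_node) and a leaf count done both ways. The recursive version pays for a call frame per level; the iterative one pushes only node pointers onto a fixed-size array. Which is faster depends on the compiler and CPU, and nothing in this thread measures it.

```c
#include <stddef.h>

/* Hypothetical bintree node: a split gives a tri two children, leaves have none. */
typedef struct btt_node {
    struct btt_node *left, *right;
} btt_node;

/* Recursive: one call frame (return address, saved registers) per level. */
static size_t count_leaves_recursive(const btt_node *n) {
    if (n == NULL) return 0;
    if (n->left == NULL && n->right == NULL) return 1;
    return count_leaves_recursive(n->left) + count_leaves_recursive(n->right);
}

/* Iterative with an explicit stack of node pointers. MAX_DEPTH is an assumed
   bound: trees deeper than MAX_DEPTH - 1 levels would overflow this array. */
#define MAX_DEPTH 64

static size_t count_leaves_iterative(const btt_node *root) {
    const btt_node *stack[MAX_DEPTH];
    size_t top = 0, leaves = 0;
    if (root != NULL) stack[top++] = root;
    while (top > 0) {
        const btt_node *n = stack[--top];
        if (n->left == NULL && n->right == NULL) { ++leaves; continue; }
        if (n->right != NULL) stack[top++] = n->right;
        if (n->left != NULL)  stack[top++] = n->left;
    }
    return leaves;
}
```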
Anyway, I've done a couple of benchmarks, and it looks like this:
reading a short is faster than reading an int, almost twice as fast IIRC, which I'd expect since only half as many bytes are read.
Yet writing to an int is faster than writing to a short. In my case, though, packing level and flags into one int means some extra work when updating flags, which makes it a lot slower. These benchmarks were loops of 10,000,000 reads and writes, which isn't very realistic, but I think I can safely choose method b.
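A minimal sketch of the kind of loop benchmark described, assuming plain C with the standard clock() for timing; the variable and function names are hypothetical. The volatile qualifier keeps the compiler from hoisting or deleting the accesses, which is an easy way to get misleading numbers from loops like these.

```c
#include <time.h>

/* volatile forces every iteration to actually load or store. */
static volatile unsigned short s_field;
static volatile unsigned int   i_field;

static long long read_short_loop(long iters) {
    long long sum = 0;
    for (long i = 0; i < iters; ++i) sum += s_field;
    return sum;   /* returned so the loop is not dead code */
}

static long long read_int_loop(long iters) {
    long long sum = 0;
    for (long i = 0; i < iters; ++i) sum += i_field;
    return sum;
}

static void write_short_loop(long iters) {
    for (long i = 0; i < iters; ++i) s_field = (unsigned short)i;
}

static void write_int_loop(long iters) {
    for (long i = 0; i < iters; ++i) i_field = (unsigned int)i;
}

/* CPU seconds spent in one write loop. */
static double seconds_for(void (*loop)(long), long iters) {
    clock_t start = clock();
    loop(iters);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Because every iteration touches the same address, a loop like this most likely measures L1-resident loads and stores rather than the cache footprint of a whole triangle array, which is one reason such numbers may not carry over to the renderer.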
Regarding ROAM, it might be redundant for your average game's needs with a 1k x 1k or 2k x 2k height map, but try going to 4k x 4k or 8k x 8k and I don't think brute force or static LOD is going to do. ROAM, on the other hand, being frame-coherent, is pretty good for this kind of thing, especially with some improvements to output big tri chunks (say 128 tris) instead of single tris, which makes it much more hardware-friendly. Last time I checked, the author of ROAM was working on 40k x 40k height maps. I doubt these have much use in games, though, unless you want 1 mm resolution or something.
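To put rough numbers on the height-map sizes mentioned, a sketch assuming a regular grid with n samples per side (2^k + 1 is a common choice for bintree terrain), two triangles per cell and no LOD; full_res_tris is a hypothetical helper. For 1025, 4097 and 8193 samples this gives 2,097,152, 33,554,432 and 134,217,728 triangles, against the 10,000 tris per frame quoted earlier.

```c
/* Full-resolution triangle count: (n - 1)^2 cells, two triangles per cell. */
static long long full_res_tris(long long n) {
    return 2 * (n - 1) * (n - 1);
}
```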