On VBO mapped data alignment

I posted this already without success, so I'll ask again with some additional information.

I need to update some vertex data pretty often.
Depending on the render path I could use standard vertex arrays (VA) or VBOs.
I'll probably need to make this as fast as possible.
So, I really want to align the data to 128-bit boundaries for SIMD acceleration. I found a nice way to use all four components of the SIMD register, so I can simply use the vertex program to shuffle things on the GPU.
Now, for VA, I'll just make sure I allocate the arrays aligned. That won't take much effort.
As for mapped VBOs, do I really need to check their alignment? It would be lovely if drivers were nice enough to return a 128-bit aligned address.
Obviously, relying on this could be a recipe for a mess, but it would make things a lot easier, and I think GPUs have stricter alignment requirements than CPUs (I remember something to that effect from NV_vertex_array_range), so it could work anyway.
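
To make this concrete, here is roughly the kind of inner loop I have in mind (just a sketch; the function name and the 4-floats-per-vertex layout are only illustrative):

#include <stddef.h>
#include <xmmintrin.h>

/* Illustrative only: translate packed 4-float positions by a constant.
   _mm_load_ps/_mm_store_ps fault unless the pointer is 16-byte aligned,
   which is exactly why the alignment of the mapped VBO pointer matters. */
static void translate_positions_sse(float *positions, size_t vertex_count,
                                    const float delta[4])
{
    __m128 d = _mm_loadu_ps(delta);                 /* delta may be unaligned */
    for (size_t i = 0; i < vertex_count; ++i) {
        __m128 p = _mm_load_ps(positions + 4 * i);  /* needs 16-byte alignment */
        _mm_store_ps(positions + 4 * i, _mm_add_ps(p, d));
    }
}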

It would be lovely if drivers were nice enough to return a 128-bit aligned address.
The VBO spec very clearly says nothing about the alignment of the pointer returned when mapping a buffer, so you may not assume anything about it. Which means that if you want the data aligned to 128 bits, you will have to allocate 16 bytes more than you need, compute a pointer inside the mapped buffer that is aligned to 128 bits, and then set up your gl*Pointer calls with new offsets to account for that alignment.
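Something along these lines (a sketch only; it assumes the ARB_vertex_buffer_object entry points have already been loaded, and names like upload_aligned are mine):

#include <stddef.h>
#include <stdint.h>
#include <string.h>
/* GL headers and the ARB_vertex_buffer_object entry points are assumed
   to be available. */

#define ALIGNMENT 16   /* 128 bits */

/* Upload 'size' bytes of vertex data so that it starts on a 16-byte
   boundary inside the buffer, wherever glMapBufferARB happens to land.
   Returns the byte offset to pass to the gl*Pointer calls. */
static size_t upload_aligned(GLuint vbo, const void *data, size_t size)
{
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);

    /* Over-allocate by 16 bytes so an aligned block of 'size' bytes always fits. */
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, size + ALIGNMENT, NULL, GL_STREAM_DRAW_ARB);

    char *mapped = (char *)glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);
    size_t pad = (ALIGNMENT - ((uintptr_t)mapped % ALIGNMENT)) % ALIGNMENT;

    memcpy(mapped + pad, data, size);   /* or write it in place with SIMD code */
    glUnmapBufferARB(GL_ARRAY_BUFFER_ARB);

    return pad;   /* e.g. glVertexPointer(4, GL_FLOAT, 0, (const GLvoid *)pad); */
}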

This is a more wordy way of saying what the other post said: you should try not to care.

Besides the fact that it's not the same thing, I am interested in what drivers actually do. In theory, antialiasing on Win32 shouldn't be there at all, yet we all know how drivers manage to make it possible anyway.
What I meant to say in both posts is that I have known from the beginning what the spec says; I am asking whether there is a typical, “extended”, undocumented feature that could help here.
My apologies if I didn't make that explicit enough.

What I meant to say in both posts is that I have known from the beginning what the spec says; I am asking whether there is a typical, “extended”, undocumented feature that could help here.
If there is, it would be one that can change with even so much as a CPU or motherboard change. Some new technology comes out that replaces AGP (like PCIe), and the rules change. Even a driver update can do it (if the implementers decide to change how they handle mapped VBOs). Suddenly, existing code that was working just fine is now broken.

It would be exceedingly bad to rely on any kind of current driver behavior in this regard. It's only 16-byte alignment; it just means that your VBOs may waste 16 bytes of room. I would not be too concerned about the wasted memory if I were you.

In all reality, chances are that any VBO memory is more than 16-byte aligned, since it is likely to be DMA'ed. However, I wouldn't want to stake the continued operation of my code on that, especially considering that some implementations may give you system memory instead of driver memory when you map a buffer.

Originally posted by Korval:
If there is, it would be one that can change with even so much as a CPU or motherboard change. Some new technology comes out that replaces AGP (like PCIe), and the rules change. Even a driver update can do it (if the implementers decide to change how they handle mapped VBOs). Suddenly, existing code that was working just fine is now broken.
Which is exactly how FSAA is enabled on all Windows boxes… and no one seems to care, since the vendors made an easy way to get it.

Originally posted by Korval:
It’s only 16-byte alignment; it just means that your VBOs may waste 16 bytes of room. I would not be too concerned about the wasted memory if I were you.
In fact, I wasn’t concerned about this. The non-accelerated (in the sense of non-VBO) vertex path is already doing this.
I mentioned SIMD for a good reason.
If the memory is not aligned, then I have to pad it, and the machinery needed for that is somewhat awkward. I could work around it with some tricks, but this is new functionality being added and I don't want to spend too much time on the prototype.
Suppose the memory is mapped and it happens to be 128-bit aligned: I am lucky. Then I map it again and it needs 4 bytes of padding. The next time I map the buffer, it needs 12 padding bytes… Detecting this and working around it correctly requires some shuffling, at least.
For large data arrays this probably won't be a problem but, again, I just need to make it work right now.
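
Just to show what I mean by shuffling (a sketch, same ARB entry points as above, not real code from my prototype): because the returned address can be different every time, the padding, and therefore every gl*Pointer offset, has to be recomputed on each map instead of being set up once.

/* Per update: nothing computed from the mapped address can be cached. */
char *p = (char *)glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);
size_t pad = (16 - ((uintptr_t)p & 15)) & 15;   /* 0, 4, 8 or 12 this time around */

/* ... SIMD writes starting at p + pad ... */

glUnmapBufferARB(GL_ARRAY_BUFFER_ARB);
glVertexPointer(4, GL_FLOAT, 0, (const GLvoid *)pad);   /* offset re-specified every map */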

Originally posted by Korval:
In all reality, chances are that any VBO memory is more than 16 byte aligned, since it is likely to be DMA’ed.
This is exactly what I needed to know and what I suspected. Since I have access to a very limited range of renderers, I can't speculate too much about it.
Could someone try this alignment thing on a broad range of video cards?
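
If anyone wants to try, a quick test along these lines would already tell a lot (a sketch; report_map_alignment is just a name I made up, and again the ARB_vertex_buffer_object entry points are assumed to be loaded):

#include <stdio.h>
#include <stdint.h>

/* Map a scratch VBO a few times and report how the returned pointer is
   aligned on this particular card/driver. */
static void report_map_alignment(void)
{
    GLuint buf;
    glGenBuffersARB(1, &buf);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, buf);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, 1 << 20, NULL, GL_STREAM_DRAW_ARB);

    for (int i = 0; i < 4; ++i) {
        void *p = glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);
        printf("map %d: %p  mod 16 = %u  mod 64 = %u\n", i, p,
               (unsigned)((uintptr_t)p % 16), (unsigned)((uintptr_t)p % 64));
        glUnmapBufferARB(GL_ARRAY_BUFFER_ARB);
    }
    glDeleteBuffersARB(1, &buf);
}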

Originally posted by Korval:
Especially considering that some implementations may give you system memory instead of driver memory when you map a buffer.
I'm not aware of the problems this involves. I have always written to and read from the pointer returned by the map request in the normal way, and I've never experienced any problems.