Memory alignment

Hi there,

I know this is unrelated to OpenGL, but you’re a clever bunch and you’ll be able to point me in the right direction.

I’m looking for info on memory alignment, specifically the hows, whys, whens, etc.

I’ve been looking on google.com, but there’s not much there, so can anybody help?

Cheers

Allan

All ints and floats should be aligned on 4 bytes and all doubles on 8 bytes; that’s pretty much it. Whenever you fail to do this, most CPUs get slower, and on Intel CPUs handling unaligned data is pretty expensive.
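As a small illustration (a sketch, not code from this thread), the C++ below checks whether an address is aligned and shows the portable way to read or write a double at an address that may not be: copy it with std::memcpy rather than dereferencing a cast pointer, which is undefined behaviour on a misaligned address. The helper names is_aligned, load_double_unaligned and store_double_unaligned are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// True if the address p is a multiple of `alignment` bytes (alignment > 0).
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical helper: read a double from a possibly misaligned address.
// memcpy lets the compiler emit whatever load is legal on the target;
// *reinterpret_cast<const double*>(p) would be undefined behaviour here.
inline double load_double_unaligned(const unsigned char* p) {
    double value;
    std::memcpy(&value, p, sizeof value);
    return value;
}

// Hypothetical helper: write a double to a possibly misaligned address.
inline void store_double_unaligned(unsigned char* p, double value) {
    std::memcpy(p, &value, sizeof value);
}
```

To see what a particular compiler actually uses, check alignof(int), alignof(float) and alignof(double); on x86-64 compilers they are 4, 4 and 8, matching the rule of thumb above.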

If you want to get fancier, you should also try to put related data together in tight groups of 16 or 32 bytes, aligned on 16- or 32-byte boundaries. This improves cache usage a bit.
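A sketch of that idea, assuming a C++11 compiler for alignas (the struct and field names are hypothetical): fields that are always read together are packed into one 16-byte group, and the group itself is aligned on 16 bytes. As long as the cache line size is a multiple of 16 bytes, which is the usual case, one group then never straddles two lines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical "hot" data read together every frame: exactly 16 bytes,
// aligned on 16 bytes.
struct alignas(16) ParticleHot {
    float x, y, z;   // position
    float radius;    // used in the same loop as the position
};

// Hypothetical 32-byte group: the hot part plus data touched in the same update.
struct alignas(32) ParticleGroup {
    ParticleHot hot;     // 16 bytes
    float vx, vy, vz;    // velocity
    float mass;          // together with velocity: 16 more bytes
};

static_assert(sizeof(ParticleHot) == 16 && alignof(ParticleHot) == 16,
              "hot group should be one aligned 16-byte block");
static_assert(sizeof(ParticleGroup) == 32 && alignof(ParticleGroup) == 32,
              "group should be one aligned 32-byte block");

// Offset of p within its 16-byte block; 0 means 16-byte aligned.
inline std::size_t misalignment16(const void* p) {
    return static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(p) % 16);
}
```

For heap arrays of such types, C++17 operator new honours the extended alignment; older compilers need a platform-specific aligned allocator.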

If you go to Intel’s website, you can find a PDF about optimization on Intel CPUs. It’s mostly assembler, but it also has lots of information about alignment.

Mikael

Good question. Now that we have someone who obviously knows what he’s talking about, let me ask one of my own …

Suppose I have a vertex structure defined as in the following code, and that I manage groups of vertices in a simple array of such structs:

struct Vertex
{
    double x;
    double y;
    double z;
};

The sizeof() operator on that struct returns 24 (sizeof(Vertex) = 3 * sizeof(double) = 3 * 8 = 24). Would it be beneficial to pad it to 32 bytes, as in the following code? I couldn’t care less about the additional memory requirements; I have 512 MB of RAM.

struct Vertex
{
    double x;
    double y;
    double z;
    double padding;
};

Also, do you know whether the Visual C++ optimizer already does this for me?
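One way to see what the compiler does with these layouts is a set of compile-time checks. This is a sketch assuming a C++11 compiler (alignas, static_assert); the names VertexPadded and VertexAligned32 are hypothetical. On mainstream x86 and x64 compilers, Visual C++ included, the plain struct stays at 24 bytes: a compiler rounds a struct up only to a multiple of its own alignment (8 here), never to 32 on its own, so you have to add the pad member or request the alignment explicitly.

```cpp
#include <cassert>
#include <cstddef>

// The original layout: three doubles, no padding added by the compiler.
struct Vertex {
    double x, y, z;
};

// Explicit padding member, as in the question.
struct VertexPadded {
    double x, y, z;
    double padding;  // never read; only rounds the size up to 32 bytes
};

// Alternative: request 32-byte alignment and let the compiler add tail padding.
struct alignas(32) VertexAligned32 {
    double x, y, z;
};

static_assert(sizeof(Vertex) == 3 * sizeof(double), "3 * 8 = 24 bytes");
static_assert(sizeof(VertexPadded) == 32, "explicit pad gives 32 bytes");
static_assert(sizeof(VertexAligned32) == 32, "alignas(32) rounds 24 up to 32");
static_assert(offsetof(VertexAligned32, z) == 16, "members keep their offsets");
```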

[This message has been edited by Iceman (edited 12-14-2001).]

This depends on the access pattern. I assume you’re worried about arrays of this type; in that case, adding the pad isn’t going to help, and it uses extra memory.

If you have small groups of floats that you access fairly randomly, then it pays to align. If you have an array and you’re just reading through it sequentially, then it doesn’t pay.

If this data were accessed through indices and the triangles’ vertices jumped around the array, then it could pay to align. If it is indexed but most primitives get reasonably sequential access, it wouldn’t pay to align, and this is normally the case with indexed primitives: at least the first hit on each vertex is sequential.
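The arithmetic behind this can be checked with a small counting function. It is a sketch: count_straddling is a hypothetical helper, and it assumes the array starts on a cache-line boundary. It counts how many of the first n elements cross a line boundary. With 24-byte vertices and 64-byte lines, 2 of every 8 vertices straddle two lines; with 32-byte vertices none do. When the array is read sequentially, the second line touched by a straddling vertex is the line the next vertex needs anyway, which is why the pad buys little in that case.

```cpp
#include <cassert>
#include <cstddef>

// Count how many of the first n array elements (each elem_size bytes, laid
// out back to back from offset 0) span two cache lines of line_size bytes.
// Assumes the array itself starts on a line boundary.
std::size_t count_straddling(std::size_t elem_size, std::size_t line_size,
                             std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t first_byte = i * elem_size;
        std::size_t last_byte = first_byte + elem_size - 1;
        if (first_byte / line_size != last_byte / line_size) {
            ++count;  // element begins in one line and ends in the next
        }
    }
    return count;
}
```

For example, count_straddling(24, 64, 8) returns 2 and count_straddling(32, 64, 8) returns 0.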

[This message has been edited by dorbie (edited 12-14-2001).]


Also, to follow up: don’t use doubles. As nice as they are, most OpenGL implementations convert them internally to single-precision floats when doing computations anyway. Not to mention, they’re relatively slow for floating-point math.

If you use an extension like NV_VAR (NV_vertex_array_range) or ATI_VAO (ATI_vertex_array_object), where vertex data is stored in AGP or video memory and accessed directly by the hardware, then the question of padding becomes pretty hardware-specific. I would imagine, however, that since the hardware reads that memory uncached and at random, padding wouldn’t be necessary or beneficial.

Most of the time I use floats for that very reason. In my current project that is impossible, since the numbers are so large (Earth reference coordinates …).
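To see why large coordinates rule out float, look at the gap between adjacent representable values. This sketch uses std::nextafter; the helper names float_spacing_at and to_float are hypothetical. Near 6,378,137 (roughly the Earth’s equatorial radius in metres), adjacent floats are 0.5 apart, so a coordinate converted from double to float can only be stored to the nearest half metre.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Distance from x to the next larger representable float.
inline float float_spacing_at(float x) {
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// What a double coordinate becomes when it is stored as a float
// (rounded to the nearest representable float).
inline float to_float(double coordinate) {
    return static_cast<float>(coordinate);
}
```

For example, to_float(6378137.1) gives 6378137.0f and to_float(6378137.4) gives 6378137.5f.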

Assuming that I am using floats, the same question applies:

struct Vertex
{
    float x;
    float y;
    float z;
    float padding; // ???
};
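If the pad is added, the drawing code has to know about it. A sketch, assuming the padded layout above (vertex_offset and vertex_stride are hypothetical names): each vertex now starts 16 bytes after the previous one, so four vertices fill a 64-byte cache line exactly, and the stride argument of glVertexPointer must be sizeof(Vertex) rather than 0, because a stride of 0 tells OpenGL the x, y, z triples are tightly packed.

```cpp
#include <cassert>
#include <cstddef>

// The padded float vertex from the post: 16 bytes instead of 12.
struct Vertex {
    float x, y, z;
    float padding;  // unused; only rounds the size up to 16 bytes
};

static_assert(sizeof(Vertex) == 4 * sizeof(float), "expected no extra compiler padding");

// Byte offset of vertex i from the start of a Vertex array.
constexpr std::size_t vertex_offset(std::size_t i) {
    return i * sizeof(Vertex);
}

// Because of the pad, the array is no longer tightly packed xyz triples,
// so the stride passed to OpenGL must be the struct size, not 0:
//   glVertexPointer(3, GL_FLOAT, sizeof(Vertex), vertices);
constexpr std::size_t vertex_stride = sizeof(Vertex);
```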

In another project, I use compiled vertex arrays (the EXT_compiled_vertex_array extension), as in the code snippet below:

#include <windows.h>   // APIENTRY, wglGetProcAddress
#include <GL/gl.h>
#include <cstdlib>     // exit

namespace
{
    // The format of the functions we are going to use
    // (signatures from the EXT_compiled_vertex_array extension).
    typedef void (APIENTRY *PFNGLLOCKARRAYSEXTPROC) (GLint first, GLsizei count);
    typedef void (APIENTRY *PFNGLUNLOCKARRAYSEXTPROC) (void);

    // Our two function pointers.
    PFNGLLOCKARRAYSEXTPROC glLockArraysEXT = NULL;
    PFNGLUNLOCKARRAYSEXTPROC glUnlockArraysEXT = NULL;

    // Define a vertex structure
    struct Vertex3
    {
        float x, y, z;
    };

    // Define a 2D texture coordinate structure
    struct Vertex2
    {
        float u, v;
    };

    // Define the vertex array; it holds the vertices of each cube corner
    // (24 = 6 faces x 4 corners, matching the counts used below).
    Vertex3 Vertices[24] = { /* coords here */ };

    // Define the texture coordinates, positioned exactly as the vertices above.
    Vertex2 TextureCoords[24] = { /* coords here */ };

} // namespace

// END: Internal processing utilities
//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/// Post-graphics initialization hook.

void MultiTexturedObj::Init()
{
    // Init our lock function pointers.
    glLockArraysEXT = (PFNGLLOCKARRAYSEXTPROC) wglGetProcAddress("glLockArraysEXT");
    glUnlockArraysEXT = (PFNGLUNLOCKARRAYSEXTPROC) wglGetProcAddress("glUnlockArraysEXT");

    if (glLockArraysEXT == NULL || glUnlockArraysEXT == NULL)
    {
        MgWarning(
            "Failed retrieving vertex array lock/unlock function addresses",
            MG_TRACEPOINT
        );
        exit(1);
    }

    stateManager().enableClient(GL_VERTEX_ARRAY);
    stateManager().enableClient(GL_TEXTURE_COORD_ARRAY);
    // …
    // Stride 0: both arrays are tightly packed (3 and 2 floats per element).
    glVertexPointer(3, GL_FLOAT, 0, Vertices);
    glTexCoordPointer(2, GL_FLOAT, 0, TextureCoords);
    // Lock all 24 vertices so the driver may reuse them across both passes.
    glLockArraysEXT(0, 24);
}

// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/// Display hook.

void MultiTexturedObj::Render(MgGlStateManager& sMgr)
{
    // Enable face culling.
    sMgr.enable(GL_CULL_FACE);
    // Cull the front faces.
    glCullFace(GL_FRONT);
    // First pass.
    glDrawArrays(GL_QUADS, 0, 24);
    // Cull back polygons.
    glCullFace(GL_BACK);
    // Second pass.
    glDrawArrays(GL_QUADS, 0, 24);
}
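The arrays above pass a stride of 0, which tells OpenGL the components are tightly packed. That matches Vertex3 and Vertex2 only if the compiler adds no padding to them, which holds on common compilers because every member is a float. A sketch of making that assumption explicit at compile time (the structs mirror the ones in the post; the constant names are hypothetical):

```cpp
#include <cassert>
#include <cstddef>

struct Vertex3 { float x, y, z; };
struct Vertex2 { float u, v; };

// A stride of 0 in glVertexPointer / glTexCoordPointer means "tightly
// packed", which is only correct if these structs contain no padding.
static_assert(sizeof(Vertex3) == 3 * sizeof(float),
              "Vertex3 must be tightly packed for stride 0");
static_assert(sizeof(Vertex2) == 2 * sizeof(float),
              "Vertex2 must be tightly packed for stride 0");

// Effective distance in bytes between consecutive elements of each array.
constexpr std::size_t position_stride = sizeof(Vertex3);
constexpr std::size_t texcoord_stride = sizeof(Vertex2);
```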

I assume that since my code works fine, I do not have an alignment problem. However, I want to understand alignment too … hence these posts …

Thank you all for the explanations.

[This message has been edited by Iceman (edited 12-14-2001).]