Example:

GLSL:

Code :
layout (std140) uniform clientProjections {        
    vec3 clientEye;
    layout(row_major) mat4 clientModel;
    layout(row_major) mat4 clientMVP;
    layout(row_major) mat4 clientShadowMVP;
};

CLIENT:

Code :
glBufferSubData(GL_UNIFORM_BUFFER, uboElsOffset[2], sizeof(GLfloat) * 16, &clientMVP->m[0][0]);
YALOG.forceExit();

Fglrx silently ignores the row_major layout qualifier, so the shader reads these matrices as column-major (the GLSL default).
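To make the effect concrete, here is a minimal sketch of what happens when the same 16 floats are read with the other convention. The helper names atRowMajor and atColumnMajor are made up for illustration and are not part of my code base; the sketch only assumes a tightly packed 4x4 float matrix, which is how a mat4 is laid out under std140.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helpers, for illustration only.

// Element (row, col) when the 16 floats are interpreted as row-major,
// i.e. what the shader should see when row_major is honoured.
inline float atRowMajor(const std::array<float, 16>& m,
                        std::size_t row, std::size_t col) {
    assert(row < 4 && col < 4);
    return m[row * 4 + col];
}

// Element (row, col) when the same memory is interpreted as column-major,
// i.e. what the shader sees when the qualifier is ignored.
inline float atColumnMajor(const std::array<float, 16>& m,
                           std::size_t row, std::size_t col) {
    assert(row < 4 && col < 4);
    return m[col * 4 + row];
}
```

For any (row, col), atColumnMajor(m, row, col) equals atRowMajor(m, col, row), so the shader ends up working with the transpose of the uploaded matrix. That is why the workaround below transposes on the client.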


Workaround:

Code :
#elif __AMD__
layout (std140) uniform clientProjections {        
    vec3 clientEye;
    mat4 clientModel;
    mat4 clientMVP;
    mat4 clientShadowMVP;
};
#endif

Code :
if(YALOG.isAMD()) {
   glBufferSubData(GL_UNIFORM_BUFFER, uboElsOffset[2], sizeof(GLfloat) * 16, &(clientMVP.createTranspose())->m[0][0]);
}

I'm currently investigating the poor benchmark results on my 7980 test system. The workaround above creates noticeable CPU overhead.
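One thing that might be worth trying, if part of that overhead comes from createTranspose() building a new matrix object on every upload (an assumption, I have not profiled it): transpose straight into a small stack buffer and upload from there. A rough sketch follows. transposed4x4 is a hypothetical helper, not something from my matrix class, and it assumes the source is 16 contiguous floats in row-major order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns the transpose of a 4x4 matrix stored as
// 16 contiguous floats. The result lives on the stack, so no heap
// allocation is involved.
inline std::array<float, 16> transposed4x4(const float* src) {
    assert(src != nullptr);
    std::array<float, 16> dst{};
    for (std::size_t r = 0; r < 4; ++r) {
        for (std::size_t c = 0; c < 4; ++c) {
            dst[c * 4 + r] = src[r * 4 + c];  // swap row and column index
        }
    }
    return dst;
}
```

The returned array's data() pointer could then be passed to glBufferSubData in place of the createTranspose() result. Whether this actually helps with the numbers I'm seeing is untested.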