PDA

View Full Version : AMD Bugs in EXT_shader_image_load_store



oscarbg
10-28-2010, 09:55 AM
Hi I have patched excellent icare3d demo
http://blog.icare3d.org/2010/07/opengl-40-abuffer-v20-linked-lists-of.html
for AMD compatibilty the demo plus source code is here:
http://dl.dropbox.com/u/1416327/ABufferLinkedListAMD.rar
The problem in short is that altough I have patched to use only EXT_shader_image_load_store as even in tex mode used NV_shader_buffer_load and store

extensions the demo works ok still in NV hardware in AMD hardware still doesn't work..
The shaders compile fine and after linking it seems every GL function isn't working..

Also altough it work still I see two fixes I have done that can be bugs in AMD as Nvidia doesn't require:
1. layout specifier is the first thing to specify in image declarations for images
don't work on AMD (nvidia works):
coherent uniform layout(size1x32) uimage2D abufferPageIdxImg
this works:
layout(size1x32) coherent uniform uimage2D abufferPageIdxImg;

2.shader compiler says #version is the first thing to be used even before macros:
is this really needed per spec i.e. shouldn't macro preprocessing be done earlier so macro usage before #version will work correctly as Nvidia drivers do?


Please AMD boys monitoring this forums download and try to fix errors in AMD implementation..
Summarized changes are:

*Remove "inline" keywords in shaders as this even seems to don't work on NV drivers..
*Change every code casting a float to int from (int)(a) or (int) a -> int(a) that's the correct way..
*(AMD Bug?)layout specifier for images has to be used first so instead of

coherent uniform layout(size1x32) uimage2D abufferPageIdxImg
use
layout(size1x32) coherent uniform uimage2D abufferPageIdxImg;

*set to use all tex mode demo at start
i.e.:
int pABufferUseTextures=1;
int pSharedPoolUseTextures=1;
even
by default mode for tex had
int pABufferUseTextures=1;
int pSharedPoolUseTextures=0;

I have defined int pSharedPageUseTextures=1; for the remaining thing (curSharedPage) using nv buffer_load/store..
and change acordingly to use tex stores

same code changes:
setShadersGlobalMacro("CURSHAREDPAGE_USE_TEXTURES",pSharedPageUseTextures);

if(pSharedPageUseTextures)
{
if(!curSharedPageTexID)
glGenTextures(1, &curSharedPageTexID);
glActiveTexture(GL_TEXTURE5);
glBindTexture(GL_TEXTURE_BUFFER, curSharedPageTexID);
//Associate BO storage with the texture
glTexBuffer(GL_TEXTURE_BUFFER, GL_R32F, curSharedPageBuffID);
glBindImageTextureEXT(5, curSharedPageTexID, 0, false, 0, GL_READ_WRITE, GL_R32UI);
checkGLError ("curSharedPageTexID");
}
if(pSharedPageUseTextures)
{
glProgramUniform1iEXT(prog, glGetUniformLocation(prog, "curSharedPageImg"), 5);
}
else
{
glProgramUniformui64NV(prog, glGetUniformLocation(prog, "d_curSharedPage"), curSharedPageAddress);
}

Also removed use of shader buffer load and store functions even for buffer objects associated to texture buffers :
if(havebufferext)
{
glMakeBufferResidentNV(GL_TEXTURE_BUFFER, GL_READ_WRITE);
glGetBufferParameterui64vNV(GL_TEXTURE_BUFFER, GL_BUFFER_GPU_ADDRESS_NV, &sharedPageListAddress);
}


finally seems ok to use glut normal context creation in AMD so:
int glutf=0;
if(glutf)
{
//Init OpenGL 4.0 context
glutInitContextVersion (4, 0);
glutInitContextProfile(GLUT_CORE_PROFILE );
glutInitContextProfile( GLUT_COMPATIBILITY_PROFILE); //needed for glutBitmapCharacter
glutInitContextFlags (GLUT_FORWARD_COMPATIBLE ); //Can be uses for compatibility with openGL 2.x
}

finally another AMD bug can be shader compiler saying #version is the first thing to be used even before macros:
is this really needed per spec i.e. shouldn't macro preprocessing be done earlier so macro usage before #version usege work correctly as Nvidia drivers do?

see code for putting macros right after #version..

Forgot to add in the code for correct compatibility memorybarrierext i.e. change shader_buffer_load_store
glMemoryBarrierEXT(GL_SHADER_GLOBAL_ACCESS_BARRIER _BIT_NV);
by ALL_BARRIER_BITS_EXT
in image_load_store..
probably can tuned better using TEXTURE_UPDATE_BARRIER_BIT_EXT?
also
SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV 0x00000010
and others are:
VERTEX_ATTRIB_ARRAY_BARRIER_BIT_EXT 0x00000001
ELEMENT_ARRAY_BARRIER_BIT_EXT 0x00000002
UNIFORM_BARRIER_BIT_EXT 0x00000004
TEXTURE_FETCH_BARRIER_BIT_EXT 0x00000008

SHADER_IMAGE_ACCESS_BARRIER_BIT_EXT 0x00000020
COMMAND_BARRIER_BIT_EXT 0x00000040
PIXEL_BUFFER_BARRIER_BIT_EXT 0x00000080
TEXTURE_UPDATE_BARRIER_BIT_EXT 0x00000100
BUFFER_UPDATE_BARRIER_BIT_EXT 0x00000200
FRAMEBUFFER_BARRIER_BIT_EXT 0x00000400
TRANSFORM_FEEDBACK_BARRIER_BIT_EXT 0x00000800
ATOMIC_COUNTER_BARRIER_BIT_EXT 0x00001000
ALL_BARRIER_BITS_EXT 0xFFFFFFFF
also noting that probably texture barrier extension gets deprecated using instead of texturebarrier a
glMemoryBarrierEXT(TEXTURE_UPDATE_BARRIER_BIT_EXT)

Cyril
10-28-2010, 10:03 AM
Great job ! Thanks a lot !
That's sad it does not work yet on AMD, I would be interested to compare performances (especially for atomic ops) !

oscarbg
10-28-2010, 02:44 PM
Glad you liked but all merit is for you I have only patched thinks that AMD didn't seems to eat well.. :-)

frank li
10-29-2010, 01:08 AM
I'd like to answer some questions.

1. layout specifier is the first thing to specify in image declarations for images
don't work on AMD (nvidia works):
coherent uniform layout(size1x32) uimage2D abufferPageIdxImg
this works:
layout(size1x32) coherent uniform uimage2D abufferPageIdxImg;

According to ext_shader_image_load_store, In image variable declarations, the "coherent", "volatile", "restrict", and "const" qualifiers can be positioned anywhere in the declaration, either before or after the data type of the variable being qualified.

So it's legal to put coherent everywhere before the symbol.

But for layout, it's referred in GLSLlangSpec, the usage for it is described as
layout-qualifier interface-qualifier ; or
layout-qualifier interface-qualifier declaration ;

So it's illegal to put layout before "uniform" here.



2.shader compiler says #version is the first thing to be used even before macros:
is this really needed per spec i.e. shouldn't macro preprocessing be done earlier so macro usage before #version will work correctly as Nvidia drivers do?

Yes, the AMD driver followed it strictly.



Please AMD boys monitoring this forums download and try to fix errors in AMD implementation..
Summarized changes are:

*Remove "inline" keywords in shaders as this even seems to don't work on NV drivers..

We will remove the "inline" from reserved keyword soon for your convenience.



*Change every code casting a float to int from (int)(a) or (int) a -> int(a) that's the correct way..

In GLSL, the type cast doesn't include casting a float to int. It's actually a construtor to do it. The following are accepted constructors which is indicated in Chapter 5.4.1 in GLSLlangSpec4.10.6.
int(uint) // converts an unsigned integer to a signed integer
int(bool) // converts a Boolean value to an int
int(float) // converts a float value to an int
int(double) // converts a double value to a signed integer
uint(int) // converts a signed integer value to an unsigned integer
uint(bool) // converts a Boolean value to an unsigned integer
uint(float) // converts a float value to an unsigned integer
uint(double) // converts a double value to an unsigned integer
bool(int) // converts a signed integer value to a Boolean
bool(uint) // converts an unsigned integer value to a Boolean value
bool(float) // converts a float value to a Boolean
bool(double) // converts a double value to a Boolean
float(int) // converts a signed integer value to a float
float(uint) // converts an unsigned integer value to a float value
float(bool) // converts a Boolean value to a float
float(double)// converts a double value to a float
double(int) // converts a signed integer value to a double
double(uint) // converts an unsigned integer value to a double
double(bool) // converts a Boolean value to a double
double(float)// converts a float value to a double



*(AMD Bug?)layout specifier for images has to be used first so instead of
coherent uniform layout(size1x32) uimage2D abufferPageIdxImg
use
layout(size1x32) coherent uniform uimage2D abufferPageIdxImg;
It has been answered ;)

aqnuep
10-29-2010, 02:40 AM
1. layout specifier is the first thing to specify in image declarations for images
don't work on AMD (nvidia works):
coherent uniform layout(size1x32) uimage2D abufferPageIdxImg
this works:
layout(size1x32) coherent uniform uimage2D abufferPageIdxImg;

Actually if you check the specification of EXT_shader_image_load_store, you can see there that this is not valid:
coherent uniform layout(size1x32) uimage2D abufferPageIdxImg;

It is written there that the layout is first and the access hint can be either before or after the type name:
layout(size1x32) coherent uniform image2D abufferPageIdxImg;
layout(size1x32) uniform image2D coherent abufferPageIdxImg;

oscarbg
10-29-2010, 10:32 AM
Hi frank li,
thanks for your detailed clarifications so now I think i have converted the code to be glsl complaint.. but as I pointed even with all this fixes code fails to run AMD drivers and as you seem from AMD would be good to know you can reproduce the program don't work

oscarbg
10-29-2010, 10:33 AM
Hi aqnuep,
thanks for your response I'm lazy enough for not reading the specs well enough..

frank li
10-31-2010, 07:13 PM
Yes, I'm from AMD. I will try to run your program.

frank li
10-31-2010, 08:13 PM
Hi oscarbg,
There is a problem in your shaders so that the driver can't link the program.

You use "#include filename" in your shader, but "#include" is not even the reserved keyword in the current GLSL spec. So we can't handle the "include", can't find some variable and function definition and can't link the program.

Chris Lux
11-01-2010, 01:08 AM
You use "#include filename" in your shader, but "#include" is not even the reserved keyword in the current GLSL spec. So we can't handle the "include", can't find some variable and function definition and can't link the program.
OT: So then someone finally should implement the ARB_shading_language_include (http://www.opengl.org/registry/specs/ARB/shading_language_include.txt) extension! Come on! It is out there since march.

I am also looking at you Nvidia!

Cyril
11-02-2010, 08:47 AM
Hi frank li. Sorry the #include are preprocessed by my small shader manager and so the shaders need to be compiled inside the application.

frank li
11-02-2010, 07:17 PM
Hi Cyril,
Do you mean you will preprocess the "#include" in your shader manager and there will be no "#include" when the shaders are compiled?

Cyril
11-03-2010, 12:21 AM
Yes that's it