AMD Releases OpenGL 4.0 Drivers



Graham Sellers
03-25-2010, 10:57 AM
AMD has released a preview (beta) of our OpenGL 4.0 drivers for ATI Radeon HD 5000 series. It also includes support for OpenGL 3.3 on ATI Radeon HD and ATI FirePro Graphics Adapters. More information (including links to drivers) here: http://links.amd.com/OpenGL

Graham

Alfonse Reinheart
03-25-2010, 11:13 AM
Well... that was unexpected.

Rob Barris
03-25-2010, 11:38 AM
Well... that was unexpected.

OTOH your style is somewhat predictable !

Alfonse Reinheart
03-25-2010, 11:42 AM
For previous versions of OpenGL, it took them a good 3 months or so to have drivers out. So it's far from unreasonable to say that it is unexpected for them to have beta 3.3 and 4.0 drivers out in less than a month.

Groovounet
03-25-2010, 01:04 PM
I agree with "unexpected", but saying it with a smile wouldn't have killed anyone.

We have all noticed AMD's increased commitment over the past year or two. They had catching up to do, and this is proof that they have closed most of the gap.

A point of personal satisfaction for me: the release of OpenGL 4.0 drivers before nVidia. It's just that nVidia always has to say "We are first and so awesome". It might be true, and I think it is, but being humble is a nice attitude too. A lot of people would call that "education". Look what it does: it makes people angry all the time, like Alfonse. :p

Nice work! :D

Nice one, I like this: playing someone else's game and beating him.

Alfonse Reinheart
03-25-2010, 02:00 PM
A point of personal satisfaction for me: the release of OpenGL 4.0 drivers before nVidia.

To be fair though, it helps when you have 4.0 capable hardware actually out ;)

Anyway, I wasn't intending to be negative with regard to the release. Think of it as "cautiously optimistic."

I do my main development on NVIDIA cards because of stable OpenGL support, but the current state of NVIDIA cards has had me looking towards ATI's offerings. I would probably already have had an HD 4xxx or 5xxx card if I could count more on ATI's OpenGL implementation.

barthold
03-25-2010, 08:55 PM
Regardless of your preference for OpenGL vendor, this is good news for OpenGL. For the viability of the only cross-platform 3D graphics API, both across OSes as well as across a wide range of devices (phones through workstations), we need implementations rapidly following spec releases.

Plus, competition is good for all of you :-)

Congratulations to AMD on their OpenGL 4 driver release.

Barthold
(with my ARB hat on)

Heiko
03-25-2010, 11:29 PM
Very nice job AMD!

Groovounet
03-26-2010, 02:49 AM
Plus, competition is good for all of you :-)


I follow that! Beware because if you continue that way the Khronos Group is going to kill the competition ... with Direct3D!

OK OK, we are not there yet, but I like to think that with all the good work you have been doing it might happen.
Well, "I like", except that I hope Direct3D will remain a good competitor.

ZbuffeR
03-26-2010, 01:53 PM
Anyone with ideas/details about GL_AMD_conservative_depth ?

Ilian Dinev
03-26-2010, 03:14 PM
I bet it's the feature, where you hint/specify the maximum deviation of the gl_FragDepth (when you're specifying/overwriting it in the shader).
Useful to not-trash the whole Hi-Z culling compression.

CatAtWork
03-26-2010, 03:40 PM
So do the 5700 and lower 5000 series support double precision natively? I thought they did not.

Edit: A 5770 supports gpu_shader5 with these drivers, but is this emulated?

Groovounet
03-26-2010, 04:52 PM
Yes, this is emulation using two float values: super slow!

In these drivers it's not supposed to work yet...
Following OpenGL convention, the drivers should not report GL_ARB_gpu_shader_fp64 or an OpenGL 4.0 version.

Well... they're preview drivers...

Groovounet
03-26-2010, 06:26 PM
First test and problem:
I create an OpenGL 3.3 context.

I use:
glGetIntegerv(GL_MAJOR_VERSION, &MajorVersion);
glGetIntegerv(GL_MINOR_VERSION, &MinorVersion);

It returns 4.0 :p

(Same with 3.0 / 3.2)

Alfonse Reinheart
03-26-2010, 08:58 PM
This is 100% valid behavior. Read the WGL_ARB_create_context specification again. The current version (after ARB_create_context_profile was added) says that you can get the version you asked for or any higher version, so long as no functionality from the version you asked for was removed between the two. Nothing was removed between 3.3 and 4.0, so this is perfectly legitimate.

Groovounet
03-27-2010, 05:47 AM
Ah ok so at least when I request OpenGL 3.0 or 3.1 I should not get 4.0.

Thanks for the clarifications!

Alfonse Reinheart
03-27-2010, 01:42 PM
Ah ok so at least when I request OpenGL 3.0 or 3.1 I should not get 4.0.

No, you can still get that. What you will get is the 4.0 compatibility profile. And since the extension disallows you from asking for 3.0/3.1 + core, this remains perfectly legal behavior.

The idea is that you shouldn't care as long as you're getting at least what you ask for.
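
A minimal sketch of the "at least what you ask for" check described above, in C. The helper name gl_version_at_least is hypothetical (it is not part of OpenGL or WGL), and it only compares version numbers; it does not model the profile rules of the context-creation extension. The "got" values are assumed to come from the glGetIntegerv(GL_MAJOR_VERSION / GL_MINOR_VERSION) queries shown earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not an OpenGL or WGL function.
 * Returns true when the version reported by the context (got_*) is
 * greater than or equal to the version that was requested (want_*). */
bool gl_version_at_least(int got_major, int got_minor,
                         int want_major, int want_minor)
{
    /* Version components are never negative. */
    assert(got_major >= 0 && got_minor >= 0);
    assert(want_major >= 0 && want_minor >= 0);

    /* Compare major first; minor only matters when majors are equal. */
    if (got_major != want_major)
        return got_major > want_major;
    return got_minor >= want_minor;
}
```

With this check, requesting 3.3 and receiving 4.0 passes, while requesting 3.3 and receiving 3.2 does not.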

Groovounet
03-28-2010, 05:52 AM
Some more problems I encounter:
- Conditional rendering with GL_ANY_SAMPLES_PASSED => hard reset.
- When I was playing around with the GL_ARB_explicit_attrib_location extension (amazing, and please extend its features to uniforms and varyings), the OpenGL drivers just stopped working. No other software would run. The first frame was OK but generated an error, and then all the following frames generated errors everywhere. I tried reinstalling the drivers; it didn't work. I tried removing my nVidia card; it didn't work. I ended up installing Catalyst 10.3, which brought me back to normal, working OpenGL 3.2 drivers.

Alfonse Reinheart
03-28-2010, 12:37 PM
amazing, and please extend its features to uniforms and varyings

I can understand uniforms, but what good would varyings be? You don't have to query them, so there's no need to assign arbitrary numbers to them.


I tried removing my nVidia card.

You meant ATi, right?

Groovounet
03-28-2010, 06:01 PM
- Separate shader program and Transform feedback

- No I meant nVidia as I have both. In case of a conflict.

Alfonse Reinheart
03-28-2010, 06:10 PM
- Separate shader program and Transform feedback

Separate shader program doesn't exist yet except as a bad extension. Simply adding this in wouldn't be enough to make that extension better, as it would still have to deal with type conflicts.

As for transform feedback, you have a point.

Groovounet
03-28-2010, 06:19 PM
(As far as my tests went, it already works on nVidia for varyings :p (Whether nVidia's flexibility, where things always work even when they should not, is good or not... is another question!))

I think separate shader program with explicit varying locations would be a huge step forward for this extension. Does it solve everything? Maybe not.

Alfonse Reinheart
03-28-2010, 10:56 PM
I think separate shader program with explicit varying locations would be a huge step forward for this extension.

I think you misunderstand.

I want separation of shaders; I think it is the current low-hanging fruit in terms of OpenGL's deficiencies. Exactly how this gets implemented is up for debate.

If we have to assign numbers to inputs and outputs to allow them to combine together, so be it. But that should be defined in the proper extension, namely the shader separation extension. And it should only be done if it is necessary. If implementations can make shader separation work based on name rather than arbitrary numbers, I would rather have that.

Without the separation of shaders, having to define input/output locations is only useful in the context of transform feedback. Asking for applying indices to inputs and outputs so that you can have shader separation is putting the cart before the horse; we want shader separation, and if implementing it requires numbering inputs and outputs, then so be it.

Groovounet
03-29-2010, 02:43 AM
Yes, the cart and the horse come together.

Alfonse Reinheart
03-29-2010, 02:50 AM
But if all you need is a horse, why ask for a cart?

Groovounet
03-29-2010, 04:27 AM
What I need is a cart.

Alfonse Reinheart
03-29-2010, 10:47 AM
What I need is a cart.

Again, I ask the question, why? If separate shader doesn't need this, why do you? Besides transform feedback, of course.

Black Knight
03-30-2010, 01:25 PM
When I create a core profile with Catalyst 10.3, glVertexAttribPointer fails with GL_INVALID_OPERATION.
If I change the context creation attribs to this:
int attribs[] =
{
    WGL_CONTEXT_MAJOR_VERSION_ARB, major,
    WGL_CONTEXT_MINOR_VERSION_ARB, minor,
    WGL_CONTEXT_FLAGS_ARB, 0,
    WGL_CONTEXT_PROFILE_MASK_ARB, WGL_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB,
    0
};

everything works. I guess it's a driver bug in 10.3; I haven't tried it with the OGL 4.0 preview drivers yet.
The code is at http://glbase.codeplex.com if anyone wants to give it a try.

Alfonse Reinheart
03-30-2010, 01:36 PM
When I create a core profile with Catalyst 10.3, glVertexAttribPointer fails with GL_INVALID_OPERATION.

Did you create a VAO? 3.1 core and above requires having a bound VAO before gl*Pointer will work.

Dan Bartlett
03-30-2010, 02:01 PM
With an NVidia OpenGL 3.3 core context, you don't get a GL_INVALID_OPERATION when calling glVertexAttribPointer with no VAO bound.
Not sure whether it's just a mistake in the appendices that the default VAO is marked as removed, or whether you really need to create your own "default" VAO (one you bind at start + forget about).

There are several other reasons that glVertexAttribPointer can cause GL_INVALID_OPERATION (from the spec):
- size is BGRA and type is not UNSIGNED_BYTE, INT_2_10_10_10_REV or UNSIGNED_INT_2_10_10_10_REV;
- type is INT_2_10_10_10_REV or UNSIGNED_INT_2_10_10_10_REV, and size is neither 4 or BGRA;
- for VertexAttribPointer only, size is BGRA and normalized is FALSE;
- any of the *Pointer commands specifying the location and organization of vertex array data are called while zero is bound to the ARRAY_BUFFER buffer object binding point (see section 2.9.6), and the pointer argument is not NULL.
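
The conditions quoted above can be sketched as a small C predicate, which may help when checking arguments before a call. Everything here is illustrative: the enum names and values are stand-ins, not the real GL tokens, and the function attrib_pointer_invalid_operation is hypothetical. It covers only the four quoted conditions, not the missing-VAO case discussed earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for the enums named in the quote.
 * These are NOT the real GL token values. */
enum attrib_size { SIZE_1 = 1, SIZE_2, SIZE_3, SIZE_4, SIZE_BGRA };
enum attrib_type {
    TYPE_FLOAT,
    TYPE_UNSIGNED_BYTE,
    TYPE_INT_2_10_10_10_REV,
    TYPE_UNSIGNED_INT_2_10_10_10_REV
};

/* True for the two packed 2_10_10_10 types. */
static bool is_packed_type(enum attrib_type t)
{
    return t == TYPE_INT_2_10_10_10_REV ||
           t == TYPE_UNSIGNED_INT_2_10_10_10_REV;
}

/* Hypothetical check: returns true when one of the four quoted
 * conditions for INVALID_OPERATION applies. */
bool attrib_pointer_invalid_operation(enum attrib_size size,
                                      enum attrib_type type,
                                      bool normalized,
                                      bool array_buffer_bound,
                                      const void *pointer)
{
    assert(size >= SIZE_1 && size <= SIZE_BGRA);

    /* size is BGRA and type is neither UNSIGNED_BYTE nor a packed type */
    if (size == SIZE_BGRA && type != TYPE_UNSIGNED_BYTE && !is_packed_type(type))
        return true;
    /* packed type, and size is neither 4 nor BGRA */
    if (is_packed_type(type) && size != SIZE_4 && size != SIZE_BGRA)
        return true;
    /* size is BGRA and normalized is FALSE */
    if (size == SIZE_BGRA && !normalized)
        return true;
    /* no buffer bound to ARRAY_BUFFER, and pointer is not NULL */
    if (!array_buffer_bound && pointer != NULL)
        return true;
    return false;
}
```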

Alfonse Reinheart
03-30-2010, 02:12 PM
With an NVidia OpenGL 3.3 core context, you don't get a GL_INVALID_OPERATION when calling glVertexAttribPointer with no VAO bound.

That's off-spec behavior, and should probably be fixed in NVIDIA's drivers. The core specification removes the default VAO.

Groovounet
03-30-2010, 02:32 PM
I think nVidia loves off-spec behavior and it won't be fixed.

Black Knight
03-30-2010, 02:45 PM
I was drawing with glDrawArrays and I am not using a VAO. Should I create a VAO and use that to fix this? I'm going to give it a try.

Groovounet
03-30-2010, 02:51 PM
Yes, VAOs are required...

Black Knight
03-30-2010, 04:38 PM
Just adding
unsigned int vao = 0;
glGenVertexArrays(1,&vao);
glBindVertexArray(vao);

Fixed the problem thanks.

oscarbg
04-06-2010, 10:23 AM
Any hope of having the "precise" qualifier outside of the gpu_shader5 extension? It's not a GPU feature, only a compiler feature..
i.e. I want it not only for Fermi and Cypress GPUs but also on GT200, for example..
Without it, double precision emulation on D3D10 GPUs using
float-float approaches gets optimized away by the Nvidia compiler!
Example of code that gets optimized:

vec2 dblsgl_add (vec2 x, vec2 y)
{
    precise vec2 z;
    float t1, t2, e;

    t1 = x.y + y.y;
    e = t1 - x.y;
    t2 = ((y.y - e) + (x.y - (t1 - e))) + x.x + y.x;
    z.y = e = t1 + t2;
    z.x = t2 - (e - t1);
    return z;
}

vec2 dblsgl_mul (vec2 x, vec2 y)
{
    precise vec2 z;
    float up, vp, u1, u2, v1, v2, mh, ml;

    up = x.y * 4097.0;
    u1 = (x.y - up) + up;
    u2 = x.y - u1;
    vp = y.y * 4097.0;
    v1 = (y.y - vp) + vp;
    v2 = y.y - v1;
    //mh = __fmul_rn(x.y,y.y);
    mh = x.y*y.y;
    ml = (((u1 * v1 - mh) + u1 * v2) + u2 * v1) + u2 * v2;
    //ml = (fmul_rn(x.y,y.x) + __fmul_rn(x.x,y.y)) + ml;

    ml = (x.y*y.x + x.x*y.y) + ml;

    mh=mh;
    z.y = up = mh + ml;
    z.x = (mh - up) + ml;
    return z;
}
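
To illustrate why an optimizer can break this kind of code, here is a minimal CPU-side sketch in C of the error-free addition step that float-float arithmetic relies on (Knuth's two-sum). It is not the code from the post; the type float2_t and the function two_sum are illustrative names. In exact real arithmetic the error term always simplifies to zero, so a compiler that reorders or simplifies the expressions algebraically removes exactly the part that carries the extra precision. The sketch uses volatile as a blunt CPU-side substitute for what "precise" is meant to guarantee in GLSL.

```c
#include <assert.h>

/* Illustrative type: a value stored as the unevaluated sum hi + lo. */
typedef struct { float hi; float lo; } float2_t;

/* Knuth's two-sum: returns the rounded sum in hi and the rounding
 * error of that sum in lo, so that hi + lo equals a + b exactly
 * (barring overflow). NaN inputs are not handled. */
float2_t two_sum(float a, float b)
{
    assert(a == a && b == b); /* reject NaN */

    /* volatile forces each intermediate to be rounded to float and
     * keeps the compiler from simplifying the expressions. */
    volatile float s  = a + b;   /* rounded sum */
    volatile float bv = s - a;   /* the part of b that reached s */
    volatile float av = s - bv;  /* the part of a that reached s */

    float2_t r;
    r.hi = s;
    r.lo = (a - av) + (b - bv);  /* what rounding discarded */
    return r;
}
```

For example, adding 1.0f and 1e-8f gives hi == 1.0f, because the small term is below float precision at that magnitude, while lo keeps the 1e-8f that a plain float addition would lose.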

Groovounet
04-06-2010, 10:42 AM
I don't see why precise could not be supported by OpenGL 3 hardware, and it's actually one of the features of GLSL 4.0 that could be brought to GLSL 3.4.

However, "precise" has nothing to do with double floats. Double floats are part of GL_ARB_gpu_shader_fp64 and should be supported by Radeon HD 48** / 47** and GeForce GTX 2**.

oscarbg
04-06-2010, 07:40 PM
Sorry, I mean float-float approaches: using 2 floats to get near double precision..
Search Google for it..

oscarbg
04-06-2010, 07:48 PM
See the code: it is a float-float implementation for a Mandelbrot.
And the program:
http://dl.dropbox.com/u/1416327/mandeldouble.rar

oscarbg
04-06-2010, 07:54 PM
Sorry for posting so many times..
The executable above contains:
*a float-float implementation that uses GL_ARB_gpu_shader5 with the precise keyword to counter the aggressive Nvidia compiler
*an implementation with doubles that uses ARB_gpu_shader_fp64 and falls back to doublepAMD on Catalyst drivers without OGL 4.0
*a normal Mandelbrot implementation

On an AMD 5850 at 1920x1080 with the ATI GL 4.0 drivers
I obtain:
*13fps using the float-float approach
*50fps using doubles
*130fps using single precision
Note that the pre-GL 4.0 drivers using doublepAMD attain 36fps in double precision; with the GL 4.0 drivers, either doublepAMD or double attains 50fps.
You can deduce the Gflop/s from the GLSL code.. it's very high..

Groovounet
04-07-2010, 03:27 AM
So what's your point?

And please, delete all your duplicate posts, this is so ridiculous.

elFarto
04-12-2010, 01:39 AM
Anyone with ideas/details about GL_AMD_conservative_depth ?

I bet it's the feature, where you hint/specify the maximum deviation of the gl_FragDepth (when you're specifying/overwriting it in the shader).
Useful to not-trash the whole Hi-Z culling compression.

The spec (http://www.opengl.org/registry/specs/AMD/conservative_depth.txt) is now up.

Any chance of getting the spec for the GL_AMD_name_gen_delete extension?

Regards
elFarto

Alfonse Reinheart
04-12-2010, 10:25 AM
Interesting extension. It allows you to be able to take advantage of early depth testing as long as you follow certain rules in a shader.

oscarbg
04-14-2010, 02:38 PM
I have some requests/questions:
*Spec documentation for EXT_shader_atomic_counters?
*Is AMD going to ship updated 4.0 drivers with these multivendor extensions that are now shipping for Nvidia Fermi?
EXT_shader_image_load_store
EXT_vertex_attrib_64bit
Any timeline?

Chris Lux
04-17-2010, 06:11 AM
EXT_direct_state_access?

Groovounet
04-17-2010, 12:40 PM
Nothing on that yet... I believe that AMD is not really into DSA... :p

DSA DSA DSA! :p

soconne
04-20-2010, 08:18 PM
I agree, this is a MUCH needed feature!

Chris Lux
05-11-2010, 02:34 AM
Hi,
we just got our HD5870 to play with, but it really is depressing to get the ATI drivers to play nice. Some things i ran into:



glGetProgramiv(_gl_program_obj, GL_ACTIVE_UNIFORM_BLOCK_MAX_NAME_LENGTH, &act_uniform_max_len);


throws invalid enum error.

Sampler objects: No gl errors when trying to bind them to a unit, but when trying to access a 3d or 1d texture using a sampler object i get black in return; accessing the same texture bypassing the sampler object using texelFetch works fine.

Then when trying to clear the depth or depth_stencil attachment of an FBO using the following, nothing happens:



glBindFramebuffer(GL_DRAW_FRAMEBUFFER, buffer_id());
// glClearBufferfv(GL_DEPTH, 0, &in_clear_depth);
glClearBufferfi(GL_DEPTH_STENCIL, 0, in_clear_depth, in_clear_stencil);


(note even if only a depth attachment is bound to the FBO the last one should work just fine).

The following does the job, but i just dont want to use it:



glBindFramebuffer(GL_DRAW_FRAMEBUFFER, buffer_id());

glClearDepth(in_clear_depth);
glClearStencil(in_clear_stencil);
glClear(GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);


NOW the biggest annoyance: Loops in shaders, why in the hell does the following not work:

It is a gutted basic volume ray caster.


vec4 dst = vec4(0.0, 0.0, 0.0, 0.0);
vec4 src;

while (inside_volume) {
src = sampling_pos.rgb;
src.a = 0.001;

sampling_pos += ray_increment;

float omda_sa = (1.0 - dst.a) * src.a;
dst.rgb += omda_sa*src.rgb;
dst.a += omda_sa;
}
#endif
//vec4 volume_col = texture(volume_texture, ray_entry_position);
//vec4 color_map = texture(color_map_texture, volume_col.r);

out_color = dst;


I tried a for loop, a fixed loop count etc. The loop is just not evaluated.
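
As a reference for what the loop should produce, here is a minimal CPU-side sketch in C of the same front-to-back compositing update. The type rgba_t and the function composite_front_to_back are illustrative names, and the sketch assumes a constant source sample and a fixed step count rather than a real volume. With a constant src.a of 0.001, as in the gutted shader, dst.a after n steps equals 1 - 0.999^n, which is about 0.632 for n = 1000, so an output with zero alpha indicates the loop body never ran.

```c
#include <assert.h>

/* Illustrative RGBA color type. */
typedef struct { float r, g, b, a; } rgba_t;

/* Front-to-back compositing of the same source sample `steps` times,
 * using the update from the shader loop:
 *   omda_sa  = (1 - dst.a) * src.a
 *   dst.rgb += omda_sa * src.rgb
 *   dst.a   += omda_sa                                              */
rgba_t composite_front_to_back(rgba_t src, int steps)
{
    assert(steps >= 0);

    rgba_t dst = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int i = 0; i < steps; ++i) {
        /* "one minus destination alpha, times source alpha" */
        float omda_sa = (1.0f - dst.a) * src.a;
        dst.r += omda_sa * src.r;
        dst.g += omda_sa * src.g;
        dst.b += omda_sa * src.b;
        dst.a += omda_sa;
    }
    return dst;
}
```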

Frustrating!

Heiko
05-11-2010, 04:50 AM
So far I haven't seen any problems with loops on ATI hardware recently in glsl shaders (using both HD4870 and 5770). I'm using shader version 150 (if it makes any difference).

I assume you did not post the loop code you are actually trying to use because there is no way the value of `inside_volume' will change in your loop (so either the loop is never entered, or the shader can never break out from the loop).

Chris Lux
05-11-2010, 05:40 AM
Hi,
no that is just a gutted shader. I am on the GL4.0 beta driver and was using "#version 330 core".

I tried:



for (int lc = 0; lc < 1000; ++lc) {
....
}
out_color = dst;




int lc = 0;
while (true) {
++lc;
if (lc > 1000) break;
...
}
out_color = dst;




bool inside_volume = true;
while (inside_volume) {
...
if (dst.a > 0.9) inside_volume = false;
}
out_color = dst;

at no time the loop was executed.

what did work was:



for (int lc = 0; lc < 1000; ++lc) {
...
}
if (lc_c < 512) {
out_color = vec4(0.0, 0.0, 1.0, 1.0);
}
else {
out_color = vec4(1.0, 0.0, 0.0, 1.0);
}


result = red;

what did not:


vec4 dst = vec4(0.0);
for (int lc = 0; lc < 1000; ++lc) {
...
}
if (lc < 512) {
out_color = vec4(0.0, 0.0, 1.0, 1.0);
}
else {
out_color = vec4(dst, 1.0);
}


result = blue, i.e. not working.

-chris

Heiko
05-11-2010, 07:43 AM
Just did a quick test on my HD4870, the following both worked:



vec4 col = vec4(0.0);
int lc = 0;
while (true)
{
++lc;
if (lc > 1000)
break;

col += vec4(0.001);
}

frag_colour = col; // output is white as expected





vec4 col = vec4(0.0);
for (int i = 0; i < 1000; ++i)
{
col += vec4(0.001);
}

frag_colour = col; // output white as expected


In this case the GLSL compiler did not do any loop unrolling, it was significantly slower than just `frag_colour = vec4(1.0);'. So the loop seems to work properly. I'm not using the OpenGL preview driver though. However I did use that one a while ago and it worked fine with a much smaller loop (for loop, loop count 8, but much more complex). That worked fine for me (HD4870, HD5770, Linux and Windows).

Sure nothing else in the shader is incorrect? Or there is some other function in the shader that causes it not to compile properly? Any info in the shader info log?

If you don't use anything OpenGL 3.3 / OpenGL 4.0 specific you might try AMD Gpu Shader Analyzer to see if it compiles properly in there (but that currently only supports Catalyst 10.3 I think, which supports OpenGL 3.2).

Alfonse Reinheart
05-11-2010, 08:23 AM
result = blue, i.e. not working.

What is "lc_c" and where does it get computed?

Chris Lux
05-11-2010, 09:02 AM
What is "lc_c" and where does it get computed?
lc_c is a typo here in the forum and it got computed the time i pressed the wrong buttons. ;)

tomorrow morning i have time again with the card and will look into this one more time and post a complete shader.

Chris Lux
05-12-2010, 12:57 AM
So,
back with the HD5870 on the GL4 beta drivers.

Here is the shader code, reduced to the minimum:


// vertex shader ////////////////////////////////////////////////
#version 330 core

out vec3 ray_entry_position;

layout(std140, column_major) uniform;

uniform transform_matrices
{
mat4 mv_matrix;
mat4 mv_matrix_inverse;
mat4 mv_matrix_inverse_transpose;

mat4 p_matrix;
mat4 p_matrix_inverse;

mat4 mvp_matrix;
mat4 mvp_matrix_inverse;
} current_transform;

layout(location = 0) in vec3 in_position;

void main()
{
ray_entry_position = in_position;

gl_Position = current_transform.mvp_matrix * vec4(in_position, 1.0);
}

// fragment shader //////////////////////////////////////////////

#version 330 core

in vec3 ray_entry_position;

uniform sampler3D volume_texture;
uniform sampler1D color_map_texture;

uniform vec3 camera_location;
uniform float sampling_distance;
uniform vec3 max_bounds;

layout(location = 0) out vec4 out_color;

vec3 debug_col;

bool
inside_volume_bounds(const in vec3 sampling_position)
{
return ( all(greaterThanEqual(sampling_position, vec3(0.0)))
&& all(lessThanEqual(sampling_position, max_bounds)));
}

void main()
{
vec3 ray_increment = normalize(ray_entry_position - camera_location) * sampling_distance;
vec3 sampling_pos = ray_entry_position + ray_increment; // test, increment just to be sure we are in the volume

vec3 obj_to_tex = vec3(1.0) / max_bounds;

vec4 dst = vec4(0.0, 0.0, 0.0, 0.0);
vec4 src;

bool inside_volume = inside_volume_bounds(sampling_pos);

//unsigned int loop_c = 0u;
while (inside_volume) {
//loop_c += 1u;
// get sample
src = texture(volume_texture, sampling_pos * obj_to_tex);

src = texture(color_map_texture, src.r);

// increment ray
sampling_pos += ray_increment;
inside_volume = inside_volume_bounds(sampling_pos) && (dst.a < 0.99);
// compositing
float omda_sa = (1.0 - dst.a) * src.a;
dst.rgb += omda_sa*src.rgb;
dst.a += omda_sa;
}

//vec4 volume_col = texture(volume_texture, ray_entry_position * 0.5);
//vec4 color_map = texture(color_map_texture, volume_col.r);

//out_color = vec4(volume_col.rgb, 1.0);
//out_color = vec4(color_map.rgb, 1.0);
//out_color = vec4(ray_entry_position, 1.0);
out_color = vec4(sampling_pos, 1.0);
//out_color = dst;
}


First, here is the expected result when returning dst from the shader on an Nvidia board using their GL3.3/4 beta driver:
http://img245.imageshack.us/img245/7366/expectedresultnvidia.th.png (http://img245.imageshack.us/i/expectedresultnvidia.png/)

Now the first problem on ATi when using sampler objects:

Here is the result when returning:
out_color = vec4(color_map.rgb, 1.0);
http://img38.imageshack.us/img38/8236/atiusingsamplerobject.th.png (http://img38.imageshack.us/i/atiusingsamplerobject.png/)

Now the result when using _no_ sampler objects and setting the same state as the texture state:
http://img704.imageshack.us/img704/3835/atinotusingsamplerobjec.th.png (http://img704.imageshack.us/i/atinotusingsamplerobjec.png/)

So the sampler objects seem broken; i get 0 when trying to look into either texture (volume_texture, color_map_texture) when a sampler object is bound to their unit.

And now, here is the ray entry position returned:
out_color = vec4(ray_entry_position, 1.0);
http://img40.imageshack.us/img40/1241/atirayentrypos.th.png (http://img40.imageshack.us/i/atirayentrypos.png/)

and here the sample exit position:
out_color = vec4(sampling_pos, 1.0);
http://img15.imageshack.us/img15/4192/atisampleposatexit.th.png (http://img15.imageshack.us/i/atisampleposatexit.png/)

It is clear that the loop is not run.

So i tried the following loops:



bool inside_volume = true;
int loop_c = 0;
while (inside_volume) {
loop_c += 1;

src = texture(volume_texture, sampling_pos * obj_to_tex);
src = texture(color_map_texture, src.r);

sampling_pos += ray_increment;
inside_volume = (loop_c < 1000);
// compositing
float omda_sa = (1.0 - dst.a) * src.a;
dst.rgb += omda_sa*src.rgb;
dst.a += omda_sa;
}



for (int lc = 0; lc < 1000; ++lc) {
src = texture(volume_texture, sampling_pos * obj_to_tex);
src = texture(color_map_texture, src.r);

sampling_pos += ray_increment;

// compositing
float omda_sa = (1.0 - dst.a) * src.a;
dst.rgb += omda_sa*src.rgb;
dst.a += omda_sa;
}


No luck.

I will also send this to AMD to investigate.

-chris

Heiko
05-12-2010, 02:33 AM
I was just looking at your functions and I don't see anything too funny in your main code, except I would probably do something like this:



// vec4 src; // not necessary to be defined outside loop

bool inside_volume = inside_volume_bounds(sampling_pos);

//unsigned int loop_c = 0u;
while (inside_volume) {
//loop_c += 1u;
// get sample and store it in a separate variable,
// I called the variable `sample_val' instead of just
// `sample' because I think I recall using the variable
// name sample once and that caused the shader not to run
// apparently `sample' is a reserved keyword in
// ATI's glsl (it is not mentioned in GLSL spec)
// or perhaps I recall incorrect.. but just to be sure
float sample_val = texture(volume_texture, sampling_pos * obj_to_tex).r;

// define src overhere, perhaps the driver does not like
// writing to src and using it at the same time for the
// second lookup (it should work though...)
vec4 src = texture(color_map_texture, sample_val);

// increment ray
sampling_pos += ray_increment;
inside_volume = inside_volume_bounds(sampling_pos) && (dst.a < 0.99);
// compositing
float omda_sa = (1.0 - dst.a) * src.a;

//dst.rgb += omda_sa*src.rgb;
//dst.a += omda_sa;
// might compile into fewer instructions on ATI hardware
// (not sure though)
dst += omda_sa * vec4(src.rgb, 1.0);
}


With respect to your while loop you've tried, that still does not contain a breaking point. So that would never compile I guess (or it compiles but does not run properly).

edit: further, you might try using texelFetch instead of texture. However, texelFetch expects an ivec* as parameter and integer texture coordinates. Could be that the problems are with the texture lookup. I don't use much texture lookups for 3D textures, so maybe there is a problem with that which I am not aware of.
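
Since texelFetch takes integer texel coordinates, a normalized coordinate has to be converted first. A minimal sketch of that conversion in C for one axis, assuming coordinates in [0, 1] and clamping at the edges; the helper name normalized_to_texel is hypothetical. Note that texelFetch does no filtering, so the result is a single texel rather than an interpolated sample.

```c
#include <assert.h>

/* Hypothetical helper: maps a normalized texture coordinate to an
 * integer texel index in [0, size - 1] along one axis.
 * Coordinates outside [0, 1] are clamped to the nearest edge texel. */
int normalized_to_texel(float coord, int size)
{
    assert(size > 0);

    if (coord < 0.0f)
        coord = 0.0f;

    /* Truncation equals floor here because coord is non-negative. */
    int texel = (int)(coord * (float)size);

    /* coord == 1.0 would land one past the end, so clamp. */
    if (texel > size - 1)
        texel = size - 1;
    return texel;
}
```

For a 3D texture the same mapping would be applied per axis with the respective dimension.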

elFarto
05-19-2010, 05:54 AM
Any chance of getting the spec for the GL_AMD_name_gen_delete extension?
The spec (http://www.opengl.org/registry/specs/AMD/name_gen_delete.txt) for GL_AMD_name_gen_delete is now up.

Regards
elFarto

Jan
05-19-2010, 08:02 AM
Awesome! AMD seems to take OpenGL really seriously lately! Nice to see that they actually try to clean up the API and add much needed basic functionality, instead of cluttering it even more with half-baked vendor-specific features.

The name_gen_delete and debug_blablub extension should have been top-priority extensions for the ARB in GL 3.0. Nice to see AMD fix what the ARB failed to add.

Jan.

Groovounet
05-20-2010, 07:16 AM
A little thing in the GLSL log messages: tessellation control and evaluation shaders are referred to as hull and domain shaders.

Heiko
05-27-2010, 01:00 AM
The new Catalyst 10.5 now has OpenGL 3.3 and OpenGL 4.0 in the mainline driver (although I believe it isn't mentioned in the driver changelog).

OpenGL version number goes from 4.0.9826 to 4.0.9836 (with respect to the OpenGL 4.0 preview driver).

wien
05-30-2010, 07:03 AM
Anyone had any luck with sampler objects on Catalyst 10.5? I can get them to somewhat work. Setting filters, anisotropy and wrap modes works just fine, but the textures come out too light. It's almost as if the sRGB->linear transformation is applied twice or something. Maybe I'm missing something...

Without sampler objects (correct):
http://www.wien-systems.no/files/without-sampler-objects.png

With sampler objects:
http://www.wien-systems.no/files/with-sampler-objects.png

Chris Lux
11-22-2010, 08:44 AM
Hi,
we just got our HD5870 to play with, but it really is depressing to get the ATI drivers to play nice. Some things i ran into:

Then when trying to clear the depth or depth_stencil attachment of an FBO using the following, nothing happens:



glBindFramebuffer(GL_DRAW_FRAMEBUFFER, buffer_id());
// glClearBufferfv(GL_DEPTH, 0, &in_clear_depth);
glClearBufferfi(GL_DEPTH_STENCIL, 0, in_clear_depth, in_clear_stencil);


(note even if only a depth attachment is bound to the FBO the last one should work just fine).

The following does the job, but i just dont want to use it:



glBindFramebuffer(GL_DRAW_FRAMEBUFFER, buffer_id());

glClearDepth(in_clear_depth);
glClearStencil(in_clear_stencil);
glClear(GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);


Frustrating!
i have to quote myself here. i am back at our ATI machine using catalyst 10.11 drivers. the problem described above is still there. how in the hell can something like this be still there? glClearBufferfi is a fundamental OpenGL call. the most frustrating thing is, if i use another fbo directly before the clear operation of the fbo in question it works. i also set the stencil and depth mask before calling the glClearBufferfi function. it makes no sense that glClear works when glClearBufferfi does not under exact the same conditions!

ATI is _unusable_ when it comes to OpenGL development. Another problem i have to deal with here is some uniform where i get the following warning. When i set this uniform to 0 it works, when i set it to 1 glUniform1i crashes... yes crashes, no error no warning, a crash!

I am extremely frustrated! On some occasions going to ATI forces me to write OpenGL code more in line with the spec. BUT on a lot more occasions ATI just does not work...

Regards
-chris

Pierre Boudier
11-22-2010, 10:17 AM
hi chris,

I just noticed your report on ClearBuffer; I looked at the implementation, and it seems that DEPTH_STENCIL is indeed not handled; you should have got an invalid enum error though. As well, if you used GL_DEPTH, it seems that it should have worked too.
we will fix it asap.

if you find other issues, do not hesitate to contact us directly, since we might not catch your post here.

regards,

Pierre B.

Alfonse Reinheart
11-22-2010, 11:22 AM
glClearBufferfi is a fundamental OpenGL call. the most frustrating thing is, if i use another fbo directly before the clear operation of the fbo in question it works. i also set the stencil and depth mask before calling the glClearBufferfi function. it makes no sense that glClear works when glClearBufferfi does not under exact the same conditions!

Sure it does. glClearBufferfi is not a "fundamental OpenGL call." It is a relatively new entrypoint, added in the ARB_framebuffer_object/OpenGL 3.0 specifications. How many live OpenGL applications do you suppose actually use glClearBufferfi, especially when the existing glClear calls work just fine for 90+% of cases? And if no applications use an entrypoint, what's the chance of an IHV detecting a driver bug with it?

What we have here is the reason why conformance tests were invented. To force makers of an implementation to implement the entire specification, and to detect exactly when and where an implementation is deficient.

So long as OpenGL lacks a rigorous conformance test, these problems will persist.

Chris Lux
11-23-2010, 12:35 AM
Sure it does. glClearBufferfi is not a "fundamental OpenGL call." It is a relatively new entrypoint, added in the ARB_framebuffer_object/OpenGL 3.0 specifications. How many live OpenGL applications do you suppose actually use glClearBufferfi, especially when the existing glClear calls work just fine for 90+% of cases? And if no applications use an entrypoint, what's the chance of an IHV detecting a driver bug with it?
yes, glClear works just fine, but the more DSA approach of the glClearBufferX functions is very appealing. ARB_framebuffer_object/OpenGL 3.0 are out for over two years, it is a shame that no one at ATi detected it until now. it also shows how valued by developers ATi is when it comes to OpenGL development.


What we have here is the reason why conformance tests were invented. To force makers of an implementation to implement the entire specification, and to detect exactly when and where an implementation is deficient.

So long as OpenGL lacks a rigorous conformance test, these problems will persist.

if khronos did not listen until now i think they will never listen to implement a new conformance test suite. sadly.

Alfonse Reinheart
11-23-2010, 12:45 AM
it also shows how valued by developers ATi is when it comes to OpenGL development.

More likely it shows how often this function is used in actual shipping code. That being, not particularly often.


if khronos did not listen until now i think they will never listen to implement a new conformance test suite.

They say they're working on it. They know the need for it; it's simply a matter of resources. Khronos, and the ARB before it, is a volunteer organization. That doesn't lend them to being able to employ a couple of skilled OpenGL developers for a year to build a proper conformance suite for GL.