View Full Version : NVIDIA releases OpenGL 4.0 drivers



barthold
04-12-2010, 09:21 PM
NVIDIA is proud to announce the immediate availability of OpenGL 4 drivers for Linux as well as OpenGL 4 WHQL-certified drivers for Windows. Additionally, support for eight new extensions is provided:

* ARB_texture_compression_bptc – provides new texture compression formats for both fixed-point and high dynamic range floating-point texels.
* EXT_shader_image_load_store - allows GLSL- and assembly-based shaders to load from, store to, and perform atomic read-modify-write operations to texture images.
* EXT_vertex_attrib_64bit - provides OpenGL shading language support for vertex shader inputs with 64-bit floating-point components and OpenGL API support for specifying the value of those inputs.
* NV_vertex_attrib_integer_64bit - provides support for specifying vertex attributes with 64-bit integer components, analogous to the 64-bit floating point support added in EXT_vertex_attrib_64bit.
* NV_gpu_program5 - provides assembly programmability support for new hardware features provided by NVIDIA’s OpenGL 4.0-capable hardware in vertex, fragment, and geometry programs.
* NV_tessellation_program5 - provides assembly programmability support for tessellation control and evaluation programs.
* NV_gpu_shader5 - provides a superset of the features provided in ARB_gpu_shader5 and GLSL 4.00. This includes support for a full set of 8-, 16-, 32-, and 64-bit scalar and vector integer data types, and more. Additionally, it allows patches (as used in tessellation) to be passed on to the geometry shader, used as input to transform feedback, and rasterized as a set of control points.
* NV_shader_buffer_store – extends the bindless graphics capabilities of the NV_shader_buffer_load extension. This extension provides the ability to store to buffer object memory, and to perform atomic read-modify-write operations, using either GLSL- or assembly-based shaders.

The drivers and extension documentation can be downloaded from http://developer.nvidia.com/object/opengl_driver.html

Happy coding!

Barthold
(with my NVIDIA hat on)

Heiko
04-13-2010, 12:01 AM
Nice, AMD and nVidia going strong with OpenGL driver support! Now if only Intel could catch up and we could get some OpenGL 3.x / OpenGL 4.x for Mac OSX...

But, things are improving for OpenGL, keep up the good work!
(and I hope we'll see atomic read/modify/write back in OpenGL 4.1... I have hope :)).

elFarto
04-13-2010, 01:21 AM
NV_shader_buffer_store and EXT_shader_image_load_store look very nice.

EXT_shader_image_load_store even fulfils one of the proposals on the wiki, "write to specific samples within a shader".

Regards
elFarto

Groovounet
04-13-2010, 02:19 AM
*** Great work! ***

I'm surprised to see such fine granularity of int support while we don't have half support. Is half supposed to be directly supported in the hardware? Or maybe it was just on old chips that didn't have full support for single-float?

We might have some clues here about what's coming for OpenGL 3.4 and 4.1!

barthold
04-13-2010, 10:43 AM
With these drivers we also fixed the issues reported with our earlier OpenGL 3.3 drivers. If you were one of the bug reporters, I'd like to know if your issue is indeed now fixed.

Thanks,
Barthold
(with my NVIDIA hat on)

Prune
04-13-2010, 07:15 PM
ARB_texture_compression_bptc
On the site it's listed as only supported on OpenGL 4.0 hardware. Is there any technical reason for not making this available on 3.3 hardware? As an owner of a GTX285, I'm immensely disappointed.

pbrown
04-13-2010, 10:29 PM
ARB_texture_compression_bptc
On the site it's listed as only supported on OpenGL 4.0 hardware. Is there any technical reason for not making this available on 3.3 hardware?

Yes, these formats require decompression hardware that will generally only be found on OpenGL 4.0-capable hardware.

skynet
04-13-2010, 11:55 PM
If you were one of the bug reporters, I'd like to know if your issue is indeed now fixed.

I confirm that with 197.44, you can retrieve the uniform offset of all uniforms in a shared-layout uniform block, even if the uniforms themselves are not referenced by the shader.

Groovounet
04-14-2010, 02:46 AM
With these drivers we also fixed the issues reported with our earlier OpenGL 3.3 drivers. If you were one of the bug reporters, I'd like to know if your issue is indeed now fixed.

Thanks,
Barthold
(with my NVIDIA hat on)

Sampler object fixed too!

barthold
04-14-2010, 10:55 AM
Thanks guys for confirming the bug fixes.

Barthold
(with my NVIDIA hat on)

AlexN
04-14-2010, 11:08 AM
I didn't report this one, but it looks like 197.44 fixes a GLSL bug where breaking out of a for loop would increment the loop counter an additional time.
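For reference, the behavior a shader author would expect here can be sketched in plain C++, on the assumption that GLSL `for` loops follow the same C-style rule: `break` leaves the loop at once and the increment expression is skipped. The function name and its parameters are made up for this illustration.

```cpp
#include <cassert>

// Returns the value of the loop counter at the moment the loop exits.
// `stop_at` and `limit` are hypothetical parameters for this illustration.
int counter_after_break(int stop_at, int limit) {
    assert(limit >= 0);
    int i = 0;
    for (i = 0; i < limit; ++i) {
        if (i == stop_at) {
            break;  // leaves the loop immediately; ++i is not executed
        }
    }
    return i;  // stop_at if the break was taken, limit otherwise
}
```

A driver with the bug described above would behave as if the first case returned `stop_at + 1`.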

oscarbg
04-14-2010, 02:19 PM
197.44 doesn't expose GL_ARB_gpu_shader_fp64 on a GTX 275.
Is NVIDIA going to expose the double-precision extension GL_ARB_gpu_shader_fp64 in the future on GTX 280 cards, which support doubles in CUDA?
Also, when forcing it I get:
0(5) : warning C7547: extension GL_ARB_gpu_shader_fp64 not supported in profile gp4fp

CrazyButcher
04-15-2010, 02:18 AM
sweet, any ETA for Cg support on NV_gpu_program5 and related?

Prune
04-20-2010, 07:14 PM
Is EXT_direct_state_access orthogonal w.r.t. ARB_texture_multisample? I don't see a glTextureImage2DMultisampleEXT in glew...

Alfonse Reinheart
04-20-2010, 07:54 PM
You can't expect the extension to be updated for every extension that comes out. It has been updated somewhat since its initial release, but even that caused versioning problems with extension loaders.

In short, no. DSA does not have that function.

Chris Lux
04-30-2010, 05:30 AM
Hi,
while reworking some of my texture binding functionality i noticed a bug in the latest OpenGL 3.3 and 4.0 beta drivers (197.44) with binding sampler objects.

When running on a last-generation card (FX5800 in my case) and an OpenGL 3.3 context, the following way to bind a sampler object fails with GL_INVALID_VALUE:

glBindSampler(0, _sampler_id);

This is the exact way to do it according to the spec. Doing it like with texture objects works fine:

glBindSampler(GL_TEXTURE0, _sampler_id);

However, running the same code on a GTX 480 with the same driver, also in a 3.3 context, works as expected, and the second way correctly throws the invalid value error.
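The difference between the two calls is that glBindSampler takes a zero-based texture unit index, while glActiveTexture takes a GL_TEXTUREi enum. A minimal sketch of the conversion follows; the helper name is hypothetical, and the GL_TEXTURE0 value is repeated locally only so the snippet compiles without a GL header.

```cpp
#include <cassert>
#include <cstdint>

// Value of GL_TEXTURE0 as defined in the OpenGL headers, repeated here so
// the sketch does not need a GL header or a context.
const std::uint32_t kGlTexture0 = 0x84C0;

// Hypothetical helper: converts a GL_TEXTUREi enum into the zero-based
// unit index that glBindSampler expects as its first argument.
inline std::uint32_t sampler_unit_from_enum(std::uint32_t texture_enum) {
    assert(texture_enum >= kGlTexture0);
    return texture_enum - kGlTexture0;
}

// Intended use with a live context (not compiled here):
//   glActiveTexture(GL_TEXTURE0 + unit);   // enum
//   glBindSampler(unit, sampler_id);       // plain index
```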

Regards
-chris

Alfonse Reinheart
04-30-2010, 09:10 AM
Yeah, there's a thread (http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=274755#Post274755) about this. NVIDIA (and the ARB) is aware of the problem and will have a fix in their next driver revision.

Chris Lux
05-02-2010, 01:56 AM
ok, i did not see that thread. i think the info that it works both ways in nvidia drivers, depending on what hardware you run it on, is new ;).

pbrown
05-05-2010, 06:00 PM
ok, i did not see that thread. i think the info that it works both ways in nvidia drivers depending on what hardware you run it is new ;).

There shouldn't be any difference in BindSampler behavior on NVIDIA GPUs between hardware versions, GeForce/Quadro, or OpenGL context versions. The first 3.3 driver did erroneously accept TEXTURE0 as described in the thread linked by Alfonse.

My guess is that your FX5800 was actually running with the older OpenGL 3.3-only driver (197.15?). Maybe something like the following happened?

* plug in GeForce GTX 480
* install 197.44
* everything works as expected
* replace with Quadro FX 5800
* after reboot, Windows plug-n-play ends up using the 197.15 driver, which accepts only TEXTURE0

I have seen something like this happen to me in the past, though I'm not sure how such a situation arises.

Chris Lux
05-06-2010, 06:12 AM
There shouldn't be any difference in BindSampler behavior on NVIDIA GPUs between hardware versions, GeForce/Quadro, or OpenGL context versions. The first 3.3 driver did erroneously accept TEXTURE0 as described in the thread linked by Alfonse.

My guess is that your FX5800 was actually running with the older OpenGL 3.3-only driver (197.15?). Maybe something like the following happened?

* plug in GeForce GTX 480
* install 197.44
* everything works as expected
* replace with Quadro FX 5800
* after reboot, Windows plug-n-play ends up using the 197.15 driver, which accepts only TEXTURE0

I have seen something like this happen to me in the past, though I'm not sure how such a situation arises.
you are right, the machine i put the FX5800 in was actually running 197.15. it is not my main development machine so i did not catch that. thanks!

Chris Lux
05-06-2010, 11:19 PM
Are there any plans to support the GL_ARB_shading_language_include extension in the near future?

Prune
05-07-2010, 07:10 PM
Seems to me it's pretty trivial to implement such functionality in the text reading portion of your code. Why do you need an OpenGL extension for it?

Rob Barris
05-07-2010, 08:14 PM
Seems to me it's pretty trivial to implement such functionality in the text reading portion of your code. Why do you need an OpenGL extension for it?

The difficulty of implementation is low if your source doesn't nest includes, and particularly if your source doesn't bracket #includes with #ifs based on expressions. Bring those idioms into the mix and the problem grows in difficulty...

GLSL already has a preprocessor, the include feature just makes it more complete.
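To make the trade-off concrete, here is a rough sketch of the "do it in your text reading code" approach. It is not how any driver implements the extension. It handles nested includes and detects cycles, but it knows nothing about #if/#ifdef, so an #include inside a disabled conditional block is still expanded, which is exactly the case that makes a home-grown solution hard. All names are invented for the example, and sources live in an in-memory map instead of files.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>

typedef std::map<std::string, std::string> SourceMap;  // name -> source text

// Extracts NAME from a line of the form:  #include "NAME"
// Returns false for every other line (including `# include`, not handled).
bool parse_include(const std::string& line, std::string& name) {
    const std::string key = "#include";
    const std::size_t pos = line.find_first_not_of(" \t");
    if (pos == std::string::npos || line.compare(pos, key.size(), key) != 0)
        return false;
    const std::size_t open = line.find('"', pos + key.size());
    if (open == std::string::npos) return false;
    const std::size_t close = line.find('"', open + 1);
    if (close == std::string::npos) return false;
    name = line.substr(open + 1, close - open - 1);
    return true;
}

// Recursively replaces include lines with the referenced text.
// `active` holds the names currently being expanded, to detect cycles.
std::string expand(const std::string& name, const SourceMap& sources,
                   std::set<std::string>& active) {
    const SourceMap::const_iterator it = sources.find(name);
    if (it == sources.end())
        throw std::runtime_error("missing include: " + name);
    if (!active.insert(name).second)
        throw std::runtime_error("include cycle at: " + name);

    std::istringstream in(it->second);
    std::string line, out, inc;
    while (std::getline(in, line)) {
        if (parse_include(line, inc))
            out += expand(inc, sources, active);  // nested include
        else
            out += line + "\n";
    }
    active.erase(name);
    return out;
}

std::string expand_includes(const std::string& root, const SourceMap& sources) {
    assert(!root.empty());
    std::set<std::string> active;
    return expand(root, sources, active);
}
```

Handling conditionals correctly would mean evaluating preprocessor expressions, including GLSL built-in macros, at which point you are reimplementing the preprocessor the driver already has.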

Chris Lux
05-09-2010, 03:20 AM
I have a custom preprocessor based on boost::wave, but this way it is really complicated if glsl internal macros are used. I am asking this because i want to get rid of the custom preprocessor.

What i think is still missing is a way to get custom macros to the shader preprocessor (e.g. MY_CUSTOM_NUM_LIGHTS). I want to get away from fiddling with the shader source based on certain assumptions.

elFarto
05-09-2010, 12:23 PM
What i think is still missing is a way to get custom macros to the shader preprocessor (e.g. MY_CUSTOM_NUM_LIGHTS). I want to get away from fiddling with the shader source based on certain assumptions.
Can't you just declare:

#include "internal.h"

at the top of your shaders, then use glNamedStringARB to pass in your parameters? (i.e. generate an include file at runtime)
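A minimal sketch of that idea: build the text of the include file at runtime from name/value pairs, then hand it to the driver as a named string. The helper name is hypothetical, and the glNamedStringARB call is shown only as a comment because it needs a context that exposes GL_ARB_shading_language_include. As far as I understand the extension, named strings are registered under a path beginning with '/', so the "/internal.h" name and how it maps to the #include above are assumptions.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: turns name/value pairs into the text of a generated
// include file, one #define per line (sorted by name, since std::map is).
std::string make_define_block(const std::map<std::string, int>& defines) {
    std::string text;
    for (std::map<std::string, int>::const_iterator it = defines.begin();
         it != defines.end(); ++it) {
        assert(!it->first.empty());
        text += "#define " + it->first + " " + std::to_string(it->second) + "\n";
    }
    return text;
}

// Intended use with a live context (not compiled here; the path is an
// assumption, negative lengths mean "null-terminated"):
//   std::map<std::string, int> d;
//   d["MY_CUSTOM_NUM_LIGHTS"] = 4;
//   const std::string body = make_define_block(d);
//   glNamedStringARB(GL_SHADER_INCLUDE_ARB, -1, "/internal.h",
//                    -1, body.c_str());
```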

Regards
elFarto

Chris Lux
05-24-2010, 07:43 AM
http://blogs.nvidia.com/ntersect/2010/05/introducing-the-new-256-driver-release.html



OpenGL 4.0 – While we currently support OpenGL 4.0 in developer drivers, Release 256, brings full OpenGL 4.0 support to our unified consumer drivers. GeForce GTX 400 series customers can immediately take advantage of the tessellation support in OpenGL 4.0 by downloading Unigine’s latest release of their Heaven benchmark, version 2.1, which adds support for OpenGL 4.0 tessellation and 3D Vision technology. GeForce GTX 400 series GPUs are tessellation monsters, feed them highly tessellated objects and they’ll chew them up at an incredible speed.


very nice drivers.

but, as it always is after new releases, the requests: please support the GL_ARB_shading_language_include extension in the near future.

Chris Lux
06-02-2010, 12:08 PM
hi,
i found that the following calls crash on the latest OpenGL 4.0 drivers (257.15) using windows 7.



int num_comp_routines = 0;
scoped_array<int> comp_routines;
glGetActiveSubroutineUniformiv(_gl_program_obj, GL_FRAGMENT_SHADER, i, GL_NUM_COMPATIBLE_SUBROUTINES, &num_comp_routines);

if (0 < num_comp_routines) {
    comp_routines.reset(new int[num_comp_routines]);
    glGetActiveSubroutineUniformiv(_gl_program_obj, GL_FRAGMENT_SHADER, i, GL_COMPATIBLE_SUBROUTINES, comp_routines.get());
}



both calls to glGetActiveSubroutineUniformiv crash with an access violation in nvoglv64.dll, i tried a very large fixed number for the number of compatible routines to get around the first crash but the second also crashed...

-chris

Chris Lux
06-03-2010, 12:24 AM
ok,
after some trying to work around this issue i think subroutines are just broken in current nvidia drivers.

shader snippet:


subroutine vec4 generate_output_color(in vec2 cp_tc, in vec2 pp_tc, in vec4 cp_col, in vec4 pp_col, in float b);

subroutine uniform generate_output_color output_generator;

subroutine (generate_output_color)
vec4 output_blended_coordinate(in vec2 cp_tc, in vec2 pp_tc, in vec4 cp_col, in vec4 pp_col, in float b)
{
    return (mix(vec4(cp_tc, 0.0, 1.0), vec4(pp_tc, 0.0, 1.0), b));
}

subroutine (generate_output_color)
vec4 output_blended_color(in vec2 cp_tc, in vec2 pp_tc, in vec4 cp_col, in vec4 pp_col, in float b)
{
    return (mix(cp_col, pp_col, b));
}
...
return (output_generator(a, b, c, d, e));


as i said in my last post, i am unable to use the reflection api to retrieve all the information i need.

so i tried the direct way:


unsigned rl = glGetSubroutineIndex(_gl_program_obj, GL_FRAGMENT_SHADER, "output_blended_color");
rl = glGetSubroutineIndex(_gl_program_obj, GL_FRAGMENT_SHADER, "output_blended_coordinate");


in every case glGetSubroutineIndex returns complete garbage. and when trying to force an index (0 or 1) on the subroutine uniform (for which i easily get the location using glGetSubroutineUniformLocation), i get an invalid value gl error.

if someone got subroutines to work, please let me know if i did something wrong.

regards
-chris

Aleksandar
06-03-2010, 07:27 AM
Subroutines work just fine with 257.15 drivers on WinXP x32!

Indices returned by glGetSubroutineIndex are not 0 and 1. glGetSubroutineUniformLocation returns 0 if there is one subroutine uniform, and glGetSubroutineIndex-s return 2 and 1 respectively (reverse order compared to the order it is defined in the shader). Don't ask me why. I also have to figure it out. Your problem makes me curious to see what values are returned.

I'll install Win7 x64 on the same machine and tell you if there is any problem with x64 implementation of the drivers.

Aleksandar
06-04-2010, 11:30 AM
Chris, I think I have discovered what your problem is!

A few hours ago I installed Win7 x64, and surprisingly shader subroutines ... work perfectly. :)

I tried to reproduce symptoms like yours by changing the shader code and making intentional errors. After a while it happened. Then I checked your code again. In the code fragment you have posted there is no definition of a subroutine uniform variable. This kind of error should be reported by the GLSL compiler. Probably you have overlooked the error messages posted by the compiler.

If the shader is not compiled correctly, then it cannot be linked either. In that case the values retrieved by functions like glGetSubroutineIndex are undefined.

The only thing that does not match the spec is indexing in NV drivers. Instead of 0-based it is 1-based. Furthermore, it seems that some kind of stack is used for storing the functions' names, because the order of indices is inverted.
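Whatever numbering a driver uses, code that looks indices up by name instead of assuming 0, 1, 2, ... is not affected by it. A small sketch of that pattern follows; the lookup is injected as a callable so the snippet runs without a context, and in real code it would wrap glGetSubroutineIndex(program, GL_FRAGMENT_SHADER, name). The function and type names are made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Value of GL_INVALID_INDEX from the OpenGL headers; glGetSubroutineIndex
// returns it when the name is not an active subroutine.
const std::uint32_t kGlInvalidIndex = 0xFFFFFFFFu;

// Stand-in for a call such as
//   glGetSubroutineIndex(program, GL_FRAGMENT_SHADER, name)
typedef std::function<std::uint32_t(const char*)> IndexLookup;

// Hypothetical helper: resolves every subroutine name to the index the
// driver reports, without assuming a base or an ordering.
std::map<std::string, std::uint32_t> resolve_subroutines(
        const std::vector<std::string>& names, const IndexLookup& lookup) {
    assert(lookup);
    std::map<std::string, std::uint32_t> indices;
    for (std::size_t n = 0; n < names.size(); ++n) {
        const std::uint32_t index = lookup(names[n].c_str());
        if (index == kGlInvalidIndex)
            throw std::runtime_error("subroutine not active: " + names[n]);
        indices[names[n]] = index;
    }
    return indices;
}
```

The resolved value is what would then be written into the array passed to glUniformSubroutinesuiv, at the slot given by glGetSubroutineUniformLocation.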

Chris Lux
06-04-2010, 11:55 AM
hi,
thanks for the help. i am in contact with nvidia about this. the shader compiles and links without any errors or warnings.

I discovered that the functions crash when the program is not bound to the current state. According to the spec this should not be necessary for the reflection API. But I am still not able to use the GetActiveSubroutineName and -Index functions. I am currently not at my workstation. I will post a complete simple shader that shows these errors every time.

-chris

Chris Lux
06-05-2010, 12:55 AM
I tried to reproduce symptoms like yours by changing the shader code and making intentional errors. After a while it happened. Then I checked your code again. In the code fragment you have posted there is no definition of a subroutine uniform variable. This kind of error should be reported by the GLSL compiler. Probably you have overlooked the error messages posted by the compiler.
there is a subroutine uniform variable declared (line 3 in my above post).


The only thing that does not match the spec is indexing in NV drivers. Instead of 0-based it is 1-based. Furthermore, it seems that some kind of stack is used for storing the functions' names, because the order of indices is inverted.
you can find my test code in this posting. i am able to retrieve all subroutine uniform names and the count of subroutines in the shader stages. but when trying to retrieve the function names and indices, the x64 driver currently does not work correctly (error for indices confirmed by nvidia).

in [1] you can find my code to retrieve the active subroutines in the fragment shader. glGetActiveSubroutineName does not return something useful and throws an invalid value error. as you said the indices currently are 1-based instead of 0-based. so i tried passing i+1 to glGetActiveSubroutineName but the problem persists on my end.

in [2] i attempt to retrieve the compatible subroutines for a specific subroutine uniform. the indices returned from glGetActiveSubroutineUniformiv with GL_COMPATIBLE_SUBROUTINES are 1 and 2 in my test case. But again when passing these indices to glGetActiveSubroutineName i get the same behavior as before.

[1] retrieve active subroutines


int act_routines = 0;
int act_routine_max_len = 0;
char* temp_name = 0;
glapi.glGetProgramStageiv(_gl_program_obj, GL_FRAGMENT_SHADER,
                          GL_ACTIVE_SUBROUTINES, &act_routines);
glapi.glGetProgramStageiv(_gl_program_obj, GL_FRAGMENT_SHADER,
                          GL_ACTIVE_SUBROUTINE_MAX_LENGTH, &act_routine_max_len);
if (act_routine_max_len > 0) {
    temp_name = new char[act_routine_max_len + 1]; // reserve for null termination
}
for (int i = 0; i < act_routines; ++i) {
    std::string actual_routine_name;
    unsigned actual_routine_index = 0;

    int ret_size = 0;

    glapi.glGetActiveSubroutineName(_gl_program_obj, GL_FRAGMENT_SHADER,
                                    i, act_routine_max_len, &ret_size, temp_name);
    gl_assert(glapi, program::retrieve_uniform_information() after retrieving subroutine info);

    actual_routine_index =
        glapi.glGetSubroutineIndex(_gl_program_obj, GL_FRAGMENT_SHADER,
                                   temp_name);
    gl_assert(glapi, program::retrieve_uniform_information() after retrieving subroutine info);
}
delete [] temp_name;


[2] retrieve compatible subroutines


// i is the index of the current subroutine uniform
// compatible routines
glapi.glGetActiveSubroutineUniformiv(_gl_program_obj, GL_FRAGMENT_SHADER,
                                     i, GL_NUM_COMPATIBLE_SUBROUTINES, &num_comp_routines);

if (0 < num_comp_routines) {
    comp_routines.reset(new int[num_comp_routines]);
    glapi.glGetActiveSubroutineUniformiv(_gl_program_obj, GL_FRAGMENT_SHADER,
                                         i, GL_COMPATIBLE_SUBROUTINES, comp_routines.get());
}

for (int r = 0; r < num_comp_routines; ++r) {
    // here comp_routines contains 1 and 2
    glapi.glGetActiveSubroutineName(_gl_program_obj, GL_FRAGMENT_SHADER,
                                    comp_routines[r], max_act_routine_len, 0, temp_name.get());
}


vertex shader


#version 400 core

out vec3 normal;
out vec2 texture_coord;
out vec3 view_dir;

uniform mat4 projection_matrix;
uniform mat4 model_view_matrix;
uniform mat4 model_view_matrix_inverse_transpose;

layout(location = 0) in vec3 in_position;
layout(location = 1) in vec3 in_normal;
layout(location = 2) in vec2 in_texture_coord;

void main()
{
    normal = normalize(model_view_matrix_inverse_transpose * vec4(in_normal, 0.0)).xyz;
    view_dir = -normalize(model_view_matrix * vec4(in_position, 1.0)).xyz;
    texture_coord = in_texture_coord;

    gl_Position = projection_matrix * model_view_matrix * vec4(in_position, 1.0);
}


fragment shader


#version 400 core

in vec3 normal;
in vec2 texture_coord;
in vec3 view_dir;

uniform vec3 light_ambient;
uniform vec3 light_diffuse;
uniform vec3 light_specular;
uniform vec3 light_position;

uniform vec3 material_ambient;
uniform vec3 material_diffuse;
uniform vec3 material_specular;
uniform float material_shininess;
uniform float material_opacity;

layout(location = 0) out vec4 out_color;

subroutine vec3 color_me(in vec3 col);
subroutine uniform color_me color_sample;

subroutine (color_me)
vec3 phong_light(in vec3 col)
{
    vec4 res;
    vec3 n = normalize(normal);
    vec3 l = normalize(light_position); // assume parallel light!
    vec3 v = normalize(view_dir);
    vec3 h = normalize(l + v);

    return ( light_ambient * material_ambient
           + light_diffuse * col * max(0.0, dot(n, l))
           + light_specular * material_specular * pow(max(0.0, dot(n, h)), material_shininess));
}

subroutine (color_me)
vec3 const_color(in vec3 col)
{
    return (col);
}

void main()
{
    vec4 res;

    res.rgb = color_sample(material_diffuse);
    res.a = material_opacity;

    out_color = res;
}

Aleksandar
06-05-2010, 12:35 PM
I'm sorry, I have overlooked the declaration.
Yes, it looks fine.
You should change this line:

for (int i = 0; i < act_routines; ++i) {
with

for (int i = 1; i <= act_routines; ++i) {
because currently on NV the range is [1..GL_ACTIVE_SUBROUTINES].

I'll try to recompile my code (and yours too) with VS2008/2010 x64 to see if it works. I did try it on Win7 x64, but the application was compiled as 32-bit.

Chris Lux
06-05-2010, 01:22 PM
that is the point ;), the x64 part of the ICD has some bugs.

as I wrote, I tried i+1 as index with the same behavior...


-chris

Cyril
06-25-2010, 06:10 AM
Hi, I get an "undefined variable" error when using memoryBarrier() (from EXT_shader_image_load_store) in a fragment program with NVIDIA 257.29/258.49 drivers. Is this a known unimplemented feature in R256?
Other features (image load/store, atomics) work correctly.

Thanks

Ian Ameline
07-15-2010, 12:10 PM
I notice that on an 8600M with ForceWare 257.21 on Windows 7 64-bit, EXT_shader_image_load_store is not a supported extension. Is this correct?

We could sure use it.

Cyril
07-15-2010, 02:27 PM
Yes, EXT_shader_image_load_store is Fermi only, you need a GTX4xx to use it (or an ATI HD 5xxx, but not sure if it is supported in current drivers).
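Since support depends on the GPU generation, an application can check for the extension at runtime before relying on it. A minimal sketch, assuming the extension names have already been collected into a vector (on a 3.x/4.x context that would typically be done by calling glGetIntegerv(GL_NUM_EXTENSIONS, ...) and then glGetStringi(GL_EXTENSIONS, i) for each index); the helper name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: exact-match lookup in a list of extension names.
// Exact matching avoids the classic substring pitfall where a shorter name
// matches as a prefix of a longer one.
bool has_extension(const std::vector<std::string>& extensions,
                   const std::string& name) {
    assert(!name.empty());
    return std::find(extensions.begin(), extensions.end(), name)
           != extensions.end();
}

// Intended use (names carry the GL_ prefix in the driver's list):
//   if (has_extension(exts, "GL_EXT_shader_image_load_store")) { ... }
```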