NVIDIA releases OpenGL 4.4 beta drivers



Khronos_webmaster
07-22-2013, 07:27 AM
To coincide with the release of OpenGL 4.4, NVIDIA is pleased to announce that our OpenGL 4.4 beta drivers are available for immediate download for Windows and Linux from https://developer.nvidia.com/opengl-driver.
These drivers provide full OpenGL 4.4 and GLSL 4.40 functionality and implement all the ARB extensions released today. They also include a new extension called GL_NV_blend_equation_advanced, which adds many new advanced blending equations.
For OpenGL 3 capable hardware, these new extensions are provided:


GL_ARB_enhanced_layouts (OpenGL 4.4)
GL_ARB_multi_bind (OpenGL 4.4)
GL_ARB_texture_mirror_clamp_to_edge (OpenGL 4.4)
GL_ARB_texture_stencil8 (OpenGL 4.4)
GL_ARB_vertex_type_10f_11f_11f_rev (OpenGL 4.4)

For OpenGL 4 capable hardware, these new extensions are provided:


GL_ARB_buffer_storage (OpenGL 4.4)
GL_ARB_clear_texture (OpenGL 4.4)
GL_ARB_compute_variable_group_size
GL_ARB_indirect_parameters
GL_ARB_query_buffer_object (OpenGL 4.4)
GL_ARB_shader_draw_parameters
GL_ARB_shader_group_vote
GL_ARB_sparse_texture
GL_NV_blend_equation_advanced

For GeForce 6xx capable hardware, these new extensions are provided:



GL_ARB_bindless_texture
GL_ARB_seamless_cubemap_per_texture

For downloads and more information, please visit our OpenGL 4.4 page: https://developer.nvidia.com/opengl-driver.

nigels
07-22-2013, 09:37 AM
GLEW 1.10.0 is now available, including GL 4.4 support.
http://glew.sourceforge.net/

- Nigel

Alfonse Reinheart
07-22-2013, 10:11 AM
Is there a reason why ARB_buffer_storage isn't available on 3.x-class NVIDIA hardware? Or ARB_clear_texture for that matter? Is this just a beta thing, or is that how it's going to be when these release?

malexander
07-22-2013, 10:52 AM
Is there a reason why ARB_buffer_storage isn't available on 3.x-class NVIDIA hardware?

Possibly because of the coherent mapped mode? The immutable part seems straightforward enough to be implemented on even GL1.5 hardware, but persistent/coherent mapping likely requires hardware support. Too bad the features in ARB_buffer_storage aren't split into two extensions.

thokra
07-22-2013, 11:49 AM
ARB_clear_texture

That really should be trivial to implement - I assume, at least ...

Toni
07-23-2013, 02:55 AM
Is there any reason why GTX 660 is not supported? I bought one thinking it will work for several ogl 4.x generations... :(

Nowhere-01
07-23-2013, 03:24 AM
Is there any reason why GTX 660 is not supported? I bought one thinking it will work for several ogl 4.x generations... :(

it's most likely an editorial mistake. even very low-end 6xx's are supported.

RenderVirs
07-31-2013, 12:42 PM
Has anyone seen any issues with the beta driver and tessellation shaders?

When compiling separable shaders, the OpenGL 4.4 beta driver gives the following error (on both Windows and Linux):

Error C3008: unknown layout specifier 'patch'

Here are the structs it is complaining about:

Control shader:
patch out patchData
{
    int i;
    vec2 v;
} patchData_out;

Evaluation shader:
patch in patchData
{
    int i;
    vec2 v;
} patchData_in;

The shaders compile and run with OpenGL 4.3 drivers.

Piers Daniell
08-01-2013, 11:37 AM
Thanks for reporting this issue. I have reproduced the problem and identified the bug. We'll fix it ASAP and include the fix in a new beta driver. Unfortunately there is no workaround for the OpenGL 4.4 beta driver, other than removing the usage of "patch" until the bug is fixed.

oscarbg
08-04-2013, 01:31 PM
Thanks for reporting this issue. I have reproduced the problem and identified the bug. We'll fix it asap and include the fix in a new beta driver. Unfortunately there is no work around for the OpenGL 4.4 beta driver, other than to remove the usage of "patch" until the bug is fixed.
This bug is still present on OGL 4.4 beta drivers:
http://www.opengl.org/discussion_boards/showthread.php/181214-NV-bug-(tested-on-314-14)-using-storage-buffer-objects?p=1253473#post1253473
Also on a Fermi GPU I see

NV_bindless_multi_draw_indirect (http://www.opengl.org/registry/specs/NV/bindless_multi_draw_indirect.txt)

and that's a bug, as the bindless ext is Kepler only, right?

Piers Daniell
08-05-2013, 09:20 AM
This bug is still present on OGL 4.4 beta drivers:
http://www.opengl.org/discussion_boards/showthread.php/181214-NV-bug-(tested-on-314-14)-using-storage-buffer-objects?p=1253473#post1253473
Also on a Fermi GPU I see

NV_bindless_multi_draw_indirect (http://www.opengl.org/registry/specs/NV/bindless_multi_draw_indirect.txt)

and that's a bug, as the bindless ext is Kepler only, right?

Sorry, I was unaware of the compute compiler bug until now. I was able to reproduce it and will investigate the issue.

With regards to NV_bindless_multi_draw_indirect, that should work fine on Fermi. It's only "bindless texture" that requires Kepler.

Aleksandar
08-08-2013, 07:14 AM
326.29 does not allow swap interval control from code. While the WGL_EXT_swap_control extension is properly listed, (PFNWGLSWAPINTERVALEXTPROC)wglGetProcAddress("wglSwapIntervalEXT") returns NULL. Vsync can be turned on/off from the control panel, so the only problem is in returning the function address.
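
A minimal sketch in C of guarding against this case, for anyone who wants to fail gracefully rather than crash. The names has_extension, resolve_swap_interval, proc_loader and the two stand-in loaders are made up for illustration. In a real program the loader would wrap wglGetProcAddress, and the extension list would come from wglGetExtensionsStringEXT or wglGetExtensionsStringARB. It does not diagnose the driver problem; it only shows checking both the extension string and the returned pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic function-pointer type; cast to the real prototype before calling. */
typedef void (*generic_fn)(void);

/* Hypothetical loader callback. On Windows it would wrap wglGetProcAddress. */
typedef generic_fn (*proc_loader)(const char *name);

/* Returns 1 if `name` appears as a whole token in the space-separated
   extension list, 0 otherwise. A plain strstr() is not enough because one
   extension name can be a prefix of another (for example
   WGL_EXT_swap_control and WGL_EXT_swap_control_tear). */
int has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0)
        return 0;
    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == ext_list) || (p[-1] == ' ');
        int ends = (p[len] == ' ') || (p[len] == '\0');
        if (starts && ends)
            return 1;
        p += len;
    }
    return 0;
}

/* Asks the loader for wglSwapIntervalEXT only when the extension is
   advertised. The result is still NULL if the loader returns no entry
   point, which is the situation reported above. */
generic_fn resolve_swap_interval(const char *ext_list, proc_loader load)
{
    assert(load != NULL);
    if (!has_extension(ext_list, "WGL_EXT_swap_control"))
        return NULL;
    return load("wglSwapIntervalEXT");
}

/* Stand-in loaders so the sketch can be exercised without a GL context. */
void dummy_entry_point(void) {}
generic_fn loader_ok(const char *name) { (void)name; return dummy_entry_point; }
generic_fn loader_null(const char *name) { (void)name; return NULL; }
```

With the real loader, the returned pointer would be cast to PFNWGLSWAPINTERVALEXTPROC before the call; if it is NULL, the application can skip the call and leave vsync to the control panel setting.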

Chris Lux
08-10-2013, 10:59 AM
Hi,
I ran into a quite serious bug with the OpenGL 4.4 beta driver which I was able to reproduce with _all_ 326.x drivers available from the NV website.

I have a fragment shader which is doing some quite complex things (ray casting a height field). I have these things wrapped into a nice function, so the shader looks something like this (reduced to illustrate the issue):



#version 420 core

#extension GL_ARB_shading_language_include : require


#include </scm/data/horizon/ray_cast.glslh>
#include </scm/data/horizon/ray_cast/uniforms.glslh>


// input/output definitions ///////////////////////////////////////////////////////////////////////
in per_vertex {
[...]
} v_in;


// attribute layout definitions ///////////////////////////////////////////////////////////////////
layout(location = 0, index = 0) out vec4 out_color;
//layout(depth_any) out float gl_FragDepth;


// implementation /////////////////////////////////////////////////////////////////////////////////
void main()
{
    out_color.rgba = vec4(1.0, 0.0, 0.0, 1.0);

    ray_cast_something(ray_cast_result);

    //if (ray_cast_result._t > -1.0) {
    //    [...] output something from the result
    //}


    out_color.rgba = vec4(0.0, 0.0, 1.0, 1.0);
}



The problem is that where the ray hits a surface I get red, and blue everywhere else. With this shader code I should get blue everywhere.

The problem seems to be that the shader is terminating the ray casting function but is not returning control to the main function somehow. I know that the function is working because I write per-pixel feedback into texture images (image load store) and I can see the intersection results are fine. I tried to break down the ray casting function to get it to return some debug values. However everything I tried failed to get control back in the main function to write even just a plain color. To rule out blending or depth testing these options were disabled.

With driver versions up to the 326.x line, everything worked as expected.

Piers Daniell
08-12-2013, 10:52 AM
326.29 does not allow swap interval control from the code. While WGL_EXT_swap_control extension is properly listed, (PFNWGLSWAPINTERVALEXTPROC)wglGetProcAddress("wglSwapIntervalEXT") returns NULL. Vsync can be turned on/off from the control panel, so only problem is in returning function address.
I was not able to reproduce this. wglGetProcAddress("wglSwapIntervalEXT") seems to return a non-NULL pointer for me. What GPU are you using?

Piers Daniell
08-13-2013, 09:34 AM
We have published an updated version of the Windows OpenGL 4.4 beta drivers to version 326.58. You can download it from the usual place:
https://developer.nvidia.com/opengl-driver

This update fixes the following bugs reported in this forum thread:
Comment #8: Error C3008: unknown layout specifier 'patch'
Comment #10: Access to matrix member of an SSBO

Also fixed:
Uses of samplers in structs and the "uniform" keyword
Fix some bugs with ARB_sparse_texture on XP

Chris Lux
08-14-2013, 04:32 AM
We have published an updated version of the Windows OpenGL 4.4 beta drivers to version 326.58. You can download it from the usual place:
https://developer.nvidia.com/opengl-driver

This update fixes the following bugs reported in this forum thread:
Uses of samplers in structs and the "uniform" keyword

What does this refer to?

I made a lot of changes in my project to remove samplers from structs (strictly used as uniforms) after testing the OpenGL 4.4 beta driver. After reading the spec, it seemed that this had been clarified in the current GLSL spec compared to when I first implemented it, when samplers were somewhat allowed in uniform structs. Is this allowed again? Is this behavior conformant to the spec? It seems a bit of a grey area...

Piers Daniell
08-14-2013, 09:32 AM
What does this refer to?

I did a lot of changes in my project to remove samplers from structs (strictly used as uniform) after testing the OpenGL 4.4 beta driver. After reading the spec it seemed as that this was 'more' clarified in the current GLSL spec as when i implemented it first and samplers were somewhat allowed in uniform structs. Is this allowed again? Is this behavior conformant to the spec? It seems a bit of a grey area...
The OpenGL 4.4 beta driver had a regression where we became too restrictive with samplers in structs. Basically the spec, in the absence of bindless texture, only allows samplers to be declared as part of a uniform-qualified variable. However, there is no restriction that the sampler can't be in a struct which is declared as a uniform-qualified variable. The bug we fixed is that we forbade this by mistake, whereas previous drivers allowed it.

Here are some examples:
uniform sampler2D tex; // legal

struct S {
    uniform sampler2D tex; // illegal to put "uniform" in struct
};

struct S {
    sampler2D tex; // legal
};
S foo; // illegal - "samplers" must be uniform
uniform S bar; // legal - the sampler in S is now part of a uniform-qualified variable (the bug we fixed between 326.29 and 326.58 is to allow this)

uniform Buffer {
    sampler2D tex; // illegal - without bindless texture this is not allowed
};

Hope that helps.

Chris Lux
08-14-2013, 11:57 AM
The OpenGL 4.4 beta driver had a regression where we became too restrictive with samplers in structs. Basically the spec, in the absence of bindless texture, only allows samplers to be declared as part of a uniform-qualified variable. However, there is no restriction that the sampler can't be in a struct which is declared as a uniform-qualified variable. The bug we fixed is that we forbid this by mistake, where in previous drivers we allowed it.

Here are some examples:
uniform sampler2D tex; // legal

struct S {
uniform sampler2D tex; // illegal to put "uniform" in struct
};

struct S {
sampler2D tex; // legal
};
S foo; // illegal - "samplers" must be uniform
uniform S bar; // legal - the sampler in S is now part of a uniform-qualified variable (the bug we fixed between 326.29 and 326.58 is to allow this)

uniform Buffer {
sampler2D tex; // illegal - without bindless texture this is not allowed
};

Hope that helps.
Thanks for the clarification.

However, does the spec really allow this? I read the part about opaque and sampler types, and it seems quite picky about where and how these types can be declared. I mean, it is really useful, but is it legal according to the spec?

-chris

Piers Daniell
08-14-2013, 02:46 PM
Yes, the GLSL spec allows this. I confirmed with the spec editor. This sentence on page 27 of the GLSL 440 spec:
They [opaque values] can only be declared as function parameters or uniform-qualified variables.

doesn't just mean basic-type variables, it means any variable, including structs.

oscarbg
08-14-2013, 04:56 PM
Yes, the GLSL spec allows this. I confirmed with the spec editor. This sentence on page 27 of the GLSL 440 spec:
They [opaque values] can only be declared as function parameters or uniform-qualified variables.

doesn't just mean basic-type variables, it means any variable, including structs.
Thanks for this fast bug fixing!
Now some questions; I hope you can answer some of them.
I found references to NVX_shader_thread_group and NVX_shader_thread_shuffle, and on Twitter Pat Brown said he thought they were already implemented in the driver.
Are specs for these coming soon, together with one for the new WGL_NV_delay_before_swap extension?

The release notes of the new 325 drivers mention TXAA support in OpenGL. I obtained the TXAA SDK (v2.1) by contacting Lottes. Is TXAA on OpenGL already present in the Linux 325 driver, or in the Windows driver only? If Linux support is present I would like to test it, but my TXAA SDK libs aren't compiled for Linux.
Also, GL_NVX_nvenc_interop is now reported, but the current NVENC 2.0 SDK (June 2013 release) on the web doesn't yet mention support for NVENC encoding from OpenGL buffers/textures (only CUDA pointers and DX are supported at present). Any clue?

The latest Intel drivers seem to ship with new extensions:
GL_INTEL_fragment_shader_span_sharing
Lottes commented: "GL_INTEL_fragment_shader_span_sharing might be sharing across the threads in a 2x2 pixel quad?"
Eric Penner's "Shader Amortization using Pixel Quad Message Passing" (GPU Pro 2) discusses some use cases.
If Fermi/Kepler hardware supports this, is NV interested in implementing/exposing this functionality?
GL_INTEL_compute_shader_lane_shift
This seems similar to the new warp shuffle (SHFL) instruction, in PTX parlance, on Kepler. Might NV expose that as well?

thanks..

GClements
08-14-2013, 09:00 PM
However, does the spec really allow this. I read the part about opaque and sampler types and it seems to be quite picky about these types, where and how they can be declared.
Are you confusing "uniform block" with "struct"?

Opaque types cannot be used in uniform blocks, but I see no such prohibition on structs.

Chris Lux
08-15-2013, 02:45 AM
Yes, the GLSL spec allows this. I confirmed with the spec editor. This sentence on page 27 of the GLSL 440 spec:
They [opaque values] can only be declared as function parameters or uniform-qualified variables.

doesn't just mean basic-type variables, it means any variable, including structs.
very cool! thanks again for the clarification.


Are you confusing "uniform block" with "struct"?

Opaque types cannot be used in uniform blocks, but I see no such prohibition on structs.
nope, i meant structs that are used as uniforms. i use(d) them to pack samplers together and to instantiate multiple such structs with samplers inside.

i just installed the 326.58 and still get the error:
error C7554: OpenGL requires sampler variables to be explicitly declared as uniform

the case where i get the error is like this:


struct A {
    sampler2D a_sampler;
};

struct B {
    A a;
    sampler2D b_sampler;
    float something;
};

uniform B b;

it now complains about the a_sampler declaration.

Piers Daniell
08-19-2013, 03:20 PM
We have published an updated version of the Windows OpenGL 4.4 beta drivers to version 326.77. Linux updates will follow shortly. You can download them from the usual place:
https://developer.nvidia.com/opengl-driver

This update fixes the following bugs reported in this forum thread:
Comment #13: main() terminated prematurely
Comment #22: More problems with samplers in structs

Also fixed:
Unable to allocate a DEPTH_COMPONENT16 sparse texture
Rendering corruption with sparse depth textures
Fix system instability when using multiple sparse textures

Piers Daniell
08-20-2013, 09:28 AM
Thanks for this fast bug fixing!
Now some questions hope you can answer some of them:
I found references about NVX_shader_thread_group and NVX_shader_thread_shuffle and on twitter Pat Brown said that he thought were already implemented on driver.
Are specs coming soon jointly for also new WGL_NV_delay_before_swap extension?

Release notes of new 325 drivers mention TXAA support on OpenGL.. I obtained TXAA SDK (v2.1) contacting Lottes.. question is if TXAA on OpenGL is already present for Linux 325 driver or Windows driver only?.. if Linux support is present I would like to test but my TXAA SDK libs aren't compiled for Linux..
Also we get now GL_NVX_nvenc_interop extension reported but current NVENC 2.0 SDK June 2013 release on web doesn't mention support yet for NVENC encoding from OGL buffers/texes (only CUDA pointers and DX support presently).. any clue?

Latest Intel drivers seems to ship with new exts:
GL_INTEL_fragment_shader_span_sharing
Lottes commented "GL_INTEL_fragment_shader_span_sharing might be sharing across the threads in a 2x2 pixel quad?"
Eric Penner's “Shader Amortization using Pixel Quad Message Passing" (Gpu Pro 2) talks about some usage cases
In case Fermi/Kepler HW supports that is NV interested on implementing/exposing this functionality..
GL_INTEL_compute_shader_lane_shift
That also seems like new warp SHUFL intrustion in PTX parlance in Kepler might NV also expose that?

thanks..


Sorry for the delay in replying. The WGL_NV_delay_before_swap extension spec can be found here:
http://www.opengl.org/registry/specs/NV/wgl_delay_before_swap.txt

The NVX_shader_thread_group and NVX_shader_thread_shuffle extensions aren't quite ready, but we hope to release specs for these soon.

I don't know the status of any of the other specs you mention. I'll need to do a little research.

Piers Daniell
08-21-2013, 09:56 PM
I have posted a new Windows OpenGL 4.4 beta driver 326.84 to the usual place:
https://developer.nvidia.com/opengl-driver

This fixes an issue with CUDA/OpenCL and an issue with using lots of sparse textures.

Dark Photon
08-22-2013, 05:54 AM
BTW, what happened to the 325.05.04 Linux beta posted here? Was that revoked?

(I have it downloaded, but I see 325.05.03 is the latest listed there now.)

Piers Daniell
08-22-2013, 08:48 AM
I've fixed the webpage back to pointing to 325.05.04. Sorry about that. I hope to get a new revision posted next week.

Piers Daniell
08-29-2013, 04:13 PM
The OpenGL 4.4 beta drivers for both Windows and Linux have been updated. The new Windows version is 326.98 and the new Linux version is 325.05.13. The major fix in these new drivers is a regression in the functionality of atomics. New drivers can be found in the usual place:
https://developer.nvidia.com/opengl-driver

There is a known issue with layered rendering to sparse textures, notably cube maps, that we're currently investigating. A fix for this should be available next week.

gregory38
09-02-2013, 01:11 PM
Trying ARB_bindless_texture on Linux, I got the error below on some programs (only a different define, with no change to layout or binding). Linux driver 325.05.13.



glProgramUniformHandleui64vARB => GL_INVALID_OPERATION. Element is invalid

A more verbose error message would be welcome :) The spec only deals with invalid layout (or binding).

Here is the full shader. This one is fine. If I replace the initial "#define PS_ATST 1" with "#define PS_ATST 6", I get the above error!


#version 330 core
#extension GL_ARB_shading_language_420pack: require
#extension GL_ARB_separate_shader_objects: require
#extension GL_ARB_shader_image_load_store: require
#extension GL_ARB_bindless_texture: require
#define ENABLE_BINDLESS_TEX
#define FRAGMENT_SHADER 1
#define ps_main main
#define PS_FST 0
#define PS_WMS 0
#define PS_WMT 0
#define PS_FMT 0
#define PS_AEM 0
#define PS_TFX 0
#define PS_TCC 1
#define PS_ATST 1
#define PS_FOG 1
#define PS_CLR1 0
#define PS_FBA 0
#define PS_AOUT 0
#define PS_LTF 0
#define PS_COLCLIP 0
#define PS_DATE 0
#define PS_SPRITEHACK 0
#define PS_TCOFFSETHACK 0
#define PS_POINT_SAMPLER 0
#define PS_IIP 1
//#version 420 // Keep it for text editor detection

// note lerp => mix

#define FMT_32 0
#define FMT_24 1
#define FMT_16 2
#define FMT_PAL 4 /* flag bit */

// Not sure we have same issue on opengl. Doesn't work anyway on ATI card
// And I say this as an ATI user.
#define ATI_SUCKS 0

#ifndef VS_BPPZ
#define VS_BPPZ 0
#define VS_TME 1
#define VS_FST 1
#define VS_LOGZ 0
#endif

#ifndef PS_FST
#define PS_FST 0
#define PS_WMS 0
#define PS_WMT 0
#define PS_FMT FMT_32
#define PS_AEM 0
#define PS_TFX 0
#define PS_TCC 1
#define PS_ATST 1
#define PS_FOG 0
#define PS_CLR1 0
#define PS_FBA 0
#define PS_AOUT 0
#define PS_LTF 1
#define PS_COLCLIP 0
#define PS_DATE 0
#define PS_SPRITEHACK 0
#define PS_POINT_SAMPLER 0
#define PS_TCOFFSETHACK 0
#define PS_IIP 1
#endif

struct vertex
{
vec4 t;
vec4 c;
vec4 fc;
};

#ifdef FRAGMENT_SHADER

#if !GL_ES && __VERSION__ > 140

in SHADER
{
vec4 t;
vec4 c;
flat vec4 fc;
} PSin;

#define PSin_t (PSin.t)
#define PSin_c (PSin.c)
#define PSin_fc (PSin.fc)

#else

#ifdef DISABLE_SSO
in vec4 SHADERt;
in vec4 SHADERc;
flat in vec4 SHADERfc;
#else
layout(location = 0) in vec4 SHADERt;
layout(location = 1) in vec4 SHADERc;
flat layout(location = 2) in vec4 SHADERfc;
#endif
#define PSin_t SHADERt
#define PSin_c SHADERc
#define PSin_fc SHADERfc

#endif

// Same buffer but 2 colors for dual source blending
#if GL_ES
layout(location = 0) out vec4 SV_Target0;
#else
layout(location = 0, index = 0) out vec4 SV_Target0;
layout(location = 0, index = 1) out vec4 SV_Target1;
#endif

#ifdef ENABLE_BINDLESS_TEX
layout(bindless_sampler, binding = 0) uniform sampler2D TextureSampler;
layout(bindless_sampler, binding = 1) uniform sampler2D PaletteSampler;
#else
#ifdef DISABLE_GL42
uniform sampler2D TextureSampler;
uniform sampler2D PaletteSampler;
#else
layout(binding = 0) uniform sampler2D TextureSampler;
layout(binding = 1) uniform sampler2D PaletteSampler;
#endif
#endif

#ifndef DISABLE_GL42_image
#if PS_DATE > 0
// FIXME how to declare memory access
layout(r32i, binding = 2) coherent uniform iimage2D img_prim_min;
#endif
#else
// use basic stencil
#endif

#ifndef DISABLE_GL42_image
#if PS_DATE > 0
// origin_upper_left
layout(pixel_center_integer) in vec4 gl_FragCoord;
//in int gl_PrimitiveID;
#endif
#endif

#ifdef DISABLE_GL42
layout(std140) uniform cb21
#else
layout(std140, binding = 21) uniform cb21
#endif
{
vec3 FogColor;
float AREF;
vec4 HalfTexel;
vec4 WH;
vec4 MinMax;
vec2 MinF;
vec2 TA;
uvec4 MskFix;
vec4 TC_OffsetHack;
};

vec4 sample_c(vec2 uv)
{
// FIXME: check the issue on openGL
if (ATI_SUCKS == 1 && PS_POINT_SAMPLER == 1)
{
// Weird issue with ATI cards (happens on at least HD 4xxx and 5xxx),
// it looks like they add 127/128 of a texel to sampling coordinates
// occasionally causing point sampling to erroneously round up.
// I'm manually adjusting coordinates to the centre of texels here,
// though the centre is just paranoia, the top left corner works fine.
uv = (trunc(uv * WH.zw) + vec2(0.5, 0.5)) / WH.zw;
}

return texture(TextureSampler, uv);
}

vec4 sample_p(float u)
{
//FIXME do we need a 1D sampler. Big impact on opengl to find 1 dim
// So for the moment cheat with 0.0f dunno if it work
return texture(PaletteSampler, vec2(u, 0.0f));
}

#if 0
#else
vec4 wrapuv(vec4 uv)
{
vec4 uv_out = uv;

if(PS_WMS == PS_WMT)
{
if(PS_WMS == 2)
{
uv_out = clamp(uv, MinMax.xyxy, MinMax.zwzw);
}
else if(PS_WMS == 3)
{
uv_out = vec4((ivec4(uv * WH.xyxy) & ivec4(MskFix.xyxy)) | ivec4(MskFix.zwzw)) / WH.xyxy;
}
}
else
{
if(PS_WMS == 2)
{
uv_out.xz = clamp(uv.xz, MinMax.xx, MinMax.zz);
}
else if(PS_WMS == 3)
{
uv_out.xz = vec2((ivec2(uv.xz * WH.xx) & ivec2(MskFix.xx)) | ivec2(MskFix.zz)) / WH.xx;
}
if(PS_WMT == 2)
{
uv_out.yw = clamp(uv.yw, MinMax.yy, MinMax.ww);
}
else if(PS_WMT == 3)
{
uv_out.yw = vec2((ivec2(uv.yw * WH.yy) & ivec2(MskFix.yy)) | ivec2(MskFix.ww)) / WH.yy;
}
}

return uv_out;
}
#endif

#if 0
#else
vec2 clampuv(vec2 uv)
{
vec2 uv_out = uv;

if(PS_WMS == 2 && PS_WMT == 2)
{
uv_out = clamp(uv, MinF, MinMax.zw);
}
else if(PS_WMS == 2)
{
uv_out.x = clamp(uv.x, MinF.x, MinMax.z);
}
else if(PS_WMT == 2)
{
uv_out.y = clamp(uv.y, MinF.y, MinMax.w);
}

return uv_out;
}
#endif

mat4 sample_4c(vec4 uv)
{
mat4 c;

c[0] = sample_c(uv.xy);
c[1] = sample_c(uv.zy);
c[2] = sample_c(uv.xw);
c[3] = sample_c(uv.zw);

return c;
}

vec4 sample_4a(vec4 uv)
{
vec4 c;

// Dx used the alpha channel.
// Opengl is only 8 bits on red channel.
c.x = sample_c(uv.xy).r;
c.y = sample_c(uv.zy).r;
c.z = sample_c(uv.xw).r;
c.w = sample_c(uv.zw).r;

return c * 255.0/256.0 + 0.5/256.0;
}

mat4 sample_4p(vec4 u)
{
mat4 c;

c[0] = sample_p(u.x);
c[1] = sample_p(u.y);
c[2] = sample_p(u.z);
c[3] = sample_p(u.w);

return c;
}

vec4 sample_color(vec2 st, float q)
{
if(PS_FST == 0) st /= q;

if(PS_TCOFFSETHACK == 1) st += TC_OffsetHack.xy;

vec4 t;
mat4 c;
vec2 dd;

if (PS_LTF == 0 && PS_FMT <= FMT_16 && PS_WMS < 3 && PS_WMT < 3)
{
c[0] = sample_c(clampuv(st));
}
else
{
vec4 uv;

if(PS_LTF != 0)
{
uv = st.xyxy + HalfTexel;
dd = fract(uv.xy * WH.zw);
}
else
{
uv = st.xyxy;
}

uv = wrapuv(uv);

if((PS_FMT & FMT_PAL) != 0)
{
c = sample_4p(sample_4a(uv));
}
else
{
c = sample_4c(uv);
}
}

// PERF: see the impact of the expansion before/after the interpolation
for (int i = 0; i < 4; i++)
{
if((PS_FMT & ~FMT_PAL) == FMT_24)
{
// FIXME GLSL any only support bvec so try to mix it with notEqual
bvec3 rgb_check = notEqual( c[i].rgb, vec3(0.0f, 0.0f, 0.0f) );
c[i].a = ( (PS_AEM == 0) || any(rgb_check) ) ? TA.x : 0.0f;
}
else if((PS_FMT & ~FMT_PAL) == FMT_16)
{
// FIXME GLSL any only support bvec so try to mix it with notEqual
bvec3 rgb_check = notEqual( c[i].rgb, vec3(0.0f, 0.0f, 0.0f) );
c[i].a = c[i].a >= 0.5 ? TA.y : ( (PS_AEM == 0) || any(rgb_check) ) ? TA.x : 0.0f;
}
}

if(PS_LTF != 0)
{
t = mix(mix(c[0], c[1], dd.x), mix(c[2], c[3], dd.x), dd.y);
}
else
{
t = c[0];
}

return t;
}

#ifdef SUBROUTINE_GL40
#else
vec4 tfx(vec4 t, vec4 c)
{
vec4 c_out = c;
if(PS_TFX == 0)
{
if(PS_TCC != 0)
{
c_out = c * t * 255.0f / 128.0f;
}
else
{
c_out.rgb = c.rgb * t.rgb * 255.0f / 128.0f;
}
}
else if(PS_TFX == 1)
{
if(PS_TCC != 0)
{
c_out = t;
}
else
{
c_out.rgb = t.rgb;
}
}
else if(PS_TFX == 2)
{
c_out.rgb = c.rgb * t.rgb * 255.0f / 128.0f + c.a;

if(PS_TCC != 0)
{
c_out.a += t.a;
}
}
else if(PS_TFX == 3)
{
c_out.rgb = c.rgb * t.rgb * 255.0f / 128.0f + c.a;

if(PS_TCC != 0)
{
c_out.a = t.a;
}
}

return c_out;
}
#endif


#if 0
void datst()
{
#if PS_DATE > 0
float alpha = sample_rt(PSin_tp.xy).a;
float alpha0x80 = 128.0 / 255;

if (PS_DATE == 1 && alpha >= alpha0x80)
discard;
else if (PS_DATE == 2 && alpha < alpha0x80)
discard;
#endif
}
#endif

#ifdef SUBROUTINE_GL40
#else
void atst(vec4 c)
{
float a = trunc(c.a * 255.0 + 0.01);

if(PS_ATST == 0) // never
{
discard;
}
else if(PS_ATST == 1) // always
{
// nothing to do
}
else if(PS_ATST == 2 ) // l
{
if (PS_SPRITEHACK == 0)
if ((AREF - a - 0.5f) < 0.0f)
discard;
}
else if(PS_ATST == 3 ) // le
{
if ((AREF - a + 0.5f) < 0.0f)
discard;
}
else if(PS_ATST == 4) // e
{
if ((0.5f - abs(a - AREF)) < 0.0f)
discard;
}
else if(PS_ATST == 5) // ge
{
if ((a-AREF + 0.5f) < 0.0f)
discard;
}
else if(PS_ATST == 6) // g
{
if ((a-AREF - 0.5f) < 0.0f)
discard;
}
else if(PS_ATST == 7) // ne
{
if ((abs(a - AREF) - 0.5f) < 0.0f)
discard;
}
}
#endif

// Note layout stuff might require gl4.3
#ifdef SUBROUTINE_GL40
#else
void colclip(inout vec4 c)
{
if (PS_COLCLIP == 2)
{
c.rgb = 256.0f/255.0f - c.rgb;
}
if (PS_COLCLIP > 0)
{
// FIXME !!!!
//c.rgb *= c.rgb < 128./255;
bvec3 factor = bvec3(128.0f/255.0f, 128.0f/255.0f, 128.0f/255.0f);
c.rgb *= vec3(factor);
}
}
#endif

void fog(vec4 c, float f)
{
if(PS_FOG != 0)
{
c.rgb = mix(FogColor, c.rgb, f);
}
}

vec4 ps_color()
{
vec4 t = sample_color(PSin_t.xy, PSin_t.w);

vec4 zero = vec4(0.0f, 0.0f, 0.0f, 0.0f);
vec4 one = vec4(1.0f, 1.0f, 1.0f, 1.0f);
#if PS_IIP == 1
vec4 c = clamp(tfx(t, PSin_c), zero, one);
#else
vec4 c = clamp(tfx(t, PSin_fc), zero, one);
#endif

atst(c);

fog(c, PSin_t.z);

colclip(c);

if(PS_CLR1 != 0) // needed for Cd * (As/Ad/F + 1) blending modes
{
c.rgb = vec3(1.0f, 1.0f, 1.0f);
}

return c;
}

#if GL_ES
void ps_main()
{
vec4 c = ps_color();
c.a *= 2.0;
SV_Target0 = c;
}
#endif

#if !GL_ES
void ps_main()
{
#if PS_DATE == 3 && !defined(DISABLE_GL42_image)
int stencil_ceil = imageLoad(img_prim_min, ivec2(gl_FragCoord.xy));
// Note gl_PrimitiveID == stencil_ceil will be the primitive that will update
// the bad alpha value so we must keep it.

if (gl_PrimitiveID > stencil_ceil) {
discard;
}
#endif

vec4 c = ps_color();

float alpha = c.a * 2.0;

if(PS_AOUT != 0) // 16 bit output
{
float a = 128.0f / 255.0; // alpha output will be 0x80

c.a = (PS_FBA != 0) ? a : step(0.5, c.a) * a;
}
else if(PS_FBA != 0)
{
if(c.a < 0.5) c.a += 0.5;
}

// Get first primitive that will write a failing alpha value
#if PS_DATE == 1 && !defined(DISABLE_GL42_image)
// DATM == 0
// Pixel with alpha equal to 1 will fail
if (c.a > 127.5f / 255.0f) {
imageAtomicMin(img_prim_min, ivec2(gl_FragCoord.xy), gl_PrimitiveID);
}
//memoryBarrier();
#elif PS_DATE == 2 && !defined(DISABLE_GL42_image)
// DATM == 1
// Pixel with alpha equal to 0 will fail
if (c.a < 127.5f / 255.0f) {
imageAtomicMin(img_prim_min, ivec2(gl_FragCoord.xy), gl_PrimitiveID);
}
#endif


#if (PS_DATE == 2 || PS_DATE == 1) && !defined(DISABLE_GL42_image)
// Don't write anything on the framebuffer
// Note: you can't use discard because it will also drop
// image operation
#else
SV_Target0 = c;
SV_Target1 = vec4(alpha, alpha, alpha, alpha);
#endif

}
#endif // !GL_ES

#endif

Nowhere-01
09-02-2013, 10:54 PM
...

you know, you probably should've posted only the actual define combination that causes problems, and cut out all the irrelevant code. i highly doubt anyone will read through this garbage (i hope you understand why it is garbage and that there's some unavoidable reason for it to be like that, although i don't think it is possible to justify such a thing).

gregory38
09-05-2013, 02:37 PM
I will post a shorter shader when I get some free time.

Basically the only difference is an alpha test.
Good shader:


void atst(vec4 c)
{ }


Bad shader. AREF is a uniform.


void atst(vec4 c)
{
    float a = trunc(c.a * 255.0 + 0.01);
    if ((a-AREF - 0.5f) < 0.0f)
        discard;
}


In both cases the texture samplers are defined like this:


layout(bindless_sampler, binding = 0) uniform sampler2D TextureSampler;
layout(bindless_sampler, binding = 1) uniform sampler2D PaletteSampler;

Piers Daniell
09-09-2013, 10:06 AM
@gregory38, I wasn't able to reproduce the problem you reported. I was able to compile and link your fragment shader fine and glProgramUniformHandleui64vARB appears to work correctly, at least on "TextureSampler". I couldn't try it with "PaletteSampler" because it appears that uniform is never referenced with PS_ATST set to either 1 or 6.

Is there any chance you're not using the correct location value for "TextureSampler"? For me, if I compile with PS_ATST set to 6 then "TextureSampler" gets a location of "1" (and not 0). However, if you try to call glProgramUniformHandleui64vARB with a location of "0" then you'll get "Element is invalid".

Basically, if you change your program in any way, including something simple like changing the PS_ATST value, you need to query all the locations again, because they may have changed.
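
A rough sketch in C of that advice, assuming a callback shaped like glGetUniformLocation so it can be shown without a GL context. The names sampler_locations, refresh_sampler_locations, location_usable and fake_query are hypothetical. The locations returned by fake_query (0 for the PS_ATST 1 build, 1 for the PS_ATST 6 build, and -1 for the unreferenced "PaletteSampler") are assumptions based on the values mentioned above, not something a driver guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback with the shape of glGetUniformLocation: returns the
   location of `name` in `program`, or -1 if the uniform is not active. */
typedef int (*location_query)(unsigned int program, const char *name);

/* Cached locations for the two samplers declared in the shader. */
typedef struct {
    unsigned int program;   /* program the cache was filled from */
    int texture_sampler;
    int palette_sampler;
} sampler_locations;

/* Call after every successful link. Locations belong to one linked program
   and may differ as soon as any #define in the source changes. */
void refresh_sampler_locations(sampler_locations *cache,
                               unsigned int program,
                               location_query query)
{
    assert(cache != NULL && query != NULL);
    cache->program = program;
    cache->texture_sampler = query(program, "TextureSampler");
    cache->palette_sampler = query(program, "PaletteSampler");
}

/* Returns 1 only if the cache was filled from `program` and the location
   is active; anything else should not be passed to a uniform call. */
int location_usable(const sampler_locations *cache,
                    unsigned int program,
                    int location)
{
    assert(cache != NULL);
    return cache->program == program && location >= 0;
}

/* Stand-in query: program 1 plays the PS_ATST 1 build, program 2 the
   PS_ATST 6 build. The returned values are assumed for illustration. */
int fake_query(unsigned int program, const char *name)
{
    if (strcmp(name, "TextureSampler") == 0)
        return program == 2u ? 1 : 0;
    return -1; /* "PaletteSampler" is never referenced, so it is inactive */
}
```

In a real program the callback would simply forward to glGetUniformLocation, and the cached value would be the location argument passed to glProgramUniformHandleui64vARB instead of a hard-coded 0.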

gregory38
09-10-2013, 12:34 AM
Is there any chance you're not using the correct location value for "TextureSampler"? For me, if I compile with PS_ATST with 6 then "TextureSampler" gets a location of "1" (and not 0). However, if you try to
call glProgramUniformHandleui64vARB with a location of "0" then you'll get "Element is invalid".

OK, that's my issue: I used the same GL call, which expects a location of 0. I was wrongly expecting the location to follow the layout binding semantics. I guess "binding" is linked to the image unit, not the "bindless uniform". Is there any way to specify a default location with layout, or is it a limitation of the extension? It might be worth adding an error/warning that says "bindless sampler and binding layout property are incompatible".

gregory38
09-13-2013, 12:11 AM
Is there any way to specify a default location with layout, or is it a limitation of the extension?
I guess I can use location.

Piers Daniell
09-16-2013, 11:07 AM
The OpenGL 4.4 beta drivers for both Windows and Linux have been updated. The new Windows version is 327.24 and the new Linux version is 325.05.14. The major fix in these new drivers is to a problem with rendering to layered sparse textures and an issue with 3D sparse textures. New drivers can be found in the usual place:
https://developer.nvidia.com/opengl-driver

oscarbg
10-11-2013, 05:53 PM
Hi, can you provide some info on what's new since the last update in newer drivers like 327.44 and 331.40? Which has the most bug fixes, i.e. which is recommended for development?
Also, I see EGL is now supported on Linux, but only with GL ES and not yet full OpenGL. A Tegra Linux driver (334) seems to already support full OpenGL via EGL, so I assume it is coming soon to Linux in general. Will EGL be coming to Windows as well, so we can use EGL APIs on both Linux and Windows? Both Intel and AMD GPU drivers already have EGL support on Windows (but limited to GL ES).