PDA

View Full Version : GLSL crash.....Limits reached????



nashouston
10-01-2010, 02:24 AM
Hi everybody!!
I use glsl to evaluate nurbs surface..... but I don't know why
something don't work...For instance there is a function which do a binary search on a knot list to find a knot span...


int findSpan(int degree, float current_u, samplerBuffer KNOT_TYPE){

if(current_u < texelFetch(KNOT_TYPE, 0).x || current_u > texelFetch(KNOT_TYPE, textureSize(KNOT_TYPE)-1).x)
discard;

int n = textureSize(KNOT_TYPE) - degree - 2;

if( current_u == (texelFetch(KNOT_TYPE, n+1)).x)
return n;

int min=degree; int max=n+1;
int middle=(min+max)/2;

while (current_u < (texelFetch(KNOT_TYPE, middle)).x || current_u >= (texelFetch(KNOT_TYPE, middle+1)).x ){
if(current_u < (texelFetch(KNOT_TYPE, middle)).x)
max=middle;

if(current_u >= (texelFetch(KNOT_TYPE, middle)).x)
min=middle;

middle=(min+max)/2;
}
return middle;
}
The problem is the while loop which not terminated....this function works fine on CPU and always terminate whenever the entry value is comprised in the entire knot domain(in my code the entry value is always legal)...Maybe glsl don't support which statement well....I don't find any limitation in the specification...so need your help....
Thanks

kRogue
10-01-2010, 06:49 AM
my biggest suspicions:



if( current_u == (texelFetch(KNOT_TYPE, n+1)).x)


Secondly, what hardware/OS, etc?

Thirdly, how do you know it is the while loop that is not terminated?

Also are you 100% sure the contents of your samplerBuffer are set correctly?

One suggestion: change from using floats, to ints for current_u and samplerBuffer to isamplerBuffer (and change your C/C++ code too). That way round off issues are no longer an issue.

nashouston
10-01-2010, 08:39 AM
Thanks you KRogue for your quick reply....

Secondly, what hardware/OS, etc?
windows vista and nvidia 9700mgt



Thirdly, how do you know it is the while loop that is not terminated?

When I limited the number of iterations (For instance MAX_ITERATION = 4) the system don't crash....



Also are you 100% sure the contents of your samplerBuffer are set correctly?

yes absolutely...


One suggestion: change from using floats, to ints for current_u and samplerBuffer to isamplerBuffer (and change your C/C++ code too). That way round off issues are no longer an issue.


I can't use int knot values for nurbs evaluation..that is not precise at all...

Have anyone have another suggestion ??

kRogue
10-01-2010, 11:54 AM
I can't use int knot values for nurbs evaluation..that is not precise at all...


err... the idea is that an int I represents say I/2^16 (if you are using 32-bit ints). That will give really good precision, but still if you see it NOT crash when you limit the number of iterations to 4 or so it does make me suspicious of the driver and not the code. The other suspicion is how did you put in the max iteration in the code... it might be 4 worked because the GLSL compiler did some unrolling magic.

What is the format of your texture buffer object? Is it R32F? The using of "==" anywhere makes me beyond nervous in floating point arithmetic.. and the floating point arithmetic on the GPU may or may not be the same as on CPU.

Just to fill in the blanks, what exactly are you doing in that fragment shader blurb you posted? I see that you are hunting for a value's upper bound in the texture buffer object and returning where it is, but to what purpose? Can you post the rest of the shader?

I do not see anything wrong with the shader code.. but that shader looks to me quite untypical for fragment shaders, as it has, on the surface, a while loops whose number of iterations is not easily bounded (it should be log_2(textureSize(KNOT_TYPE)) hmmm.. just an idea an idea, put the max iterations back, but set to say 24... that is a ridiculous size.. also maybe do, though maybe I might be being silly:



while (current_u < (texelFetch(KNOT_TYPE, middle)).x ||
current_u >= (texelFetch(KNOT_TYPE, middle+1)).x )

to


while (min<max &amp;&amp;
(current_u < (texelFetch(KNOT_TYPE, middle)).x
|| current_u >= (texelFetch(KNOT_TYPE, middle+1)).x) ) )

nashouston
10-01-2010, 01:03 PM
what is the format of your texture buffer object? Is it R32F?

yes...


but to what purpose?

For nurbs raytracing on gpu (first find a span where lies the knot : paramU and paramV in my code, second compute basis function, third refine the parameter value using newton iteration, evaluate surface point and derivates)

this is all fragment shader :


#version 330
precision highp float;

layout(location=0, index=0) out vec4 finalColor;

//Initialisation
smooth in float paramU;
smooth in float paramV;
smooth in vec3 rayDir;
smooth in vec3 lightDir;

uniform mat2x4 LIGHT;
uniform vec4 CAMERA;
uniform samplerBuffer SURFACE_CTRL;
uniform samplerBuffer SURFACE_KNOTU;
uniform samplerBuffer SURFACE_KNOTV;
//uniform samplerBuffer TRIMMING_CTRL;
//uniform samplerBuffer TRIMMING_STRUCTURE;
uniform isamplerBuffer SURFACE_AUX;
//uniform isamplerBuffer TRIMMING_DEG;
//uniform samplerBuffer TRIMMING_AUX;
uniform sampler2D DIFFUSE_COLOR;

layout(std140)uniform matiere{
float ksh;
float kd;
float ks;
float kr;
};

layout(std140)uniform MatrixMVP{
mat4 MV;
mat4 MP;
};

const int DEG_MAX = 4;
const int DEG_CURVE_MAX = 3;

vec3 phongShade(vec3 N_DIR, vec3 L_DIR, vec3 V_DIR, vec3 Color, float Ka, float Kd, float Ks, vec3 lightCol, float shininess){

vec3 final_color = Color*Ka*lightCol; //ambient
//vec3 final_color = vec3(0.1,0.1,0.1);

float lambertTerm = dot(N_DIR,L_DIR); //lambert

if(lambertTerm > 0.0) {
final_color += Color*Kd*lambertTerm*lightCol; //diffuse

vec3 R = reflect(L_DIR,N_DIR);
float specular = pow(max(dot(R,V_DIR), 0.0), shininess);
final_color += Color*Ks*lightCol*(specular); //specular
}

return final_color;
}

/*int findSpan(int degree, float current_u, samplerBuffer KNOT_TYPE){

if(current_u < texelFetch(KNOT_TYPE, 0).x || current_u > texelFetch(KNOT_TYPE, textureSize(KNOT_TYPE)-1).x)
discard;

int n = textureSize(KNOT_TYPE) - degree - 2;

if( current_u == (texelFetch(KNOT_TYPE, n+1)).x)
return n;

for(int compt=degree; compt < textureSize(KNOT_TYPE); compt++){

if(texelFetch(KNOT_TYPE, compt).x > current_u)
return compt-1;

if(texelFetch(KNOT_TYPE, compt).x == current_u)
return compt;
}
return n;
}*/

int findSpan(int degree, float current_u, samplerBuffer KNOT_TYPE){

if(current_u < texelFetch(KNOT_TYPE, 0).x || current_u > texelFetch(KNOT_TYPE, textureSize(KNOT_TYPE)-1).x)
discard;

int n = textureSize(KNOT_TYPE) - degree - 2;

if( current_u == (texelFetch(KNOT_TYPE, n+1)).x)
return n;

int min=degree; int max=n+1;
int middle=(min+max)/2;
int CONTROL = 0;

while (current_u < (texelFetch(KNOT_TYPE, middle)).x || current_u >= (texelFetch(KNOT_TYPE, middle+1)).x ){
//if(CONTROL > 4)
//break;
if(current_u < (texelFetch(KNOT_TYPE, middle)).x)
max=middle;

if(current_u >= (texelFetch(KNOT_TYPE, middle)).x)
min=middle;

middle=(min+max)/2;
CONTROL++;
}
return middle;
}

float[2*DEG_MAX] basisDerFunc(int degree, int index, float uv, samplerBuffer KNOT_TYPE){

int n = 1;
int taille = degree+1;
float left[DEG_MAX];
float right[DEG_MAX];
float ndu[DEG_MAX*DEG_MAX];
float ar[2*DEG_MAX];
float res[2*DEG_MAX];

ndu[0] = 1.0;

for(int j=1; j<=degree; j++){
left[j]= uv - (texelFetch(KNOT_TYPE, (index+1-j))).x;
right[j]= (texelFetch(KNOT_TYPE, (index+j))).x - uv;
float saved=0.0;
float temp=0.0;

for(int r=0; r<j; r++){
ndu[(j*taille) + r] = right[r+1]+left[j-r];
temp = ndu[(r*taille)+(j-1)]/ndu[(j*taille)+r];
ndu[(r*taille)+j] = saved + (right[r+1]*temp);
saved = left[j-r]*temp;
}
ndu[(j*taille)+j]=saved;
}

for(int j=0; j<=degree; j++)
res[j] = ndu[(j*taille)+degree];

for(int r=0; r<=degree; r++){

int s1=0; int s2=1;
ar[0]=1.0;

for(int k=1; k<=n; k++){

float d = 0.0;
int rk = r-k; int pk = degree-k; int j1; int j2;

if(r>=k){
ar[s2*taille] = ar[s1*taille]/ndu[((pk+1)*taille)+rk];
d = ar[(s2*taille)] * ndu[(rk*taille)+pk];
}

if(rk>= -1) j1=1;
else j1= -rk;

if(r-1<=pk) j2=k-1;
else j2=degree-r;

for(int j=j1; j<=j2; j++){
ar[(s2*taille)+j] = (ar[(s1*taille)+j] - ar[(s1*taille)+j-1]) / ndu[((pk+1)*taille)+rk+j];
d += ar[(s2*taille)+j] * ndu[((rk+j)*taille)+pk];
}

if(r <= pk){
ar[(s2*taille)+k] = -ar[(s1*taille)+k-1] / ndu[((pk+1)*taille)+r];
d += ar[(s2*taille)+k] * ndu[(r*taille)+pk];
}
res[(k*taille)+r] = d ;
int j=s1; s1=s2; s2=j;
}
}
int r=degree;

for(int k=1; k<=n; k++){
for(int j=0; j<=degree; j++)
res[(k*taille)+j] *= r ;

r *= (degree-k);
}
return res;
}

vec3[4] evalDerSurface(int indu, int indv, int degu, int degv, int size_p_v, float Nu[2*DEG_MAX], float Nv[2*DEG_MAX]){

int taille1 = degu+1;
int taille2 = degv+1;
const int d = 1;
const int size_d = d+1;
vec3 SKL[size_d*size_d] = vec3[size_d*size_d](vec3(0.0), vec3(0.0), vec3(0.0), vec3(0.0));
{
vec4 Aders[size_d*size_d] = vec4[size_d*size_d](vec4(0.0), vec4(0.0), vec4(0.0), vec4(0.0));
float Bin[size_d*size_d] = float[size_d*size_d](0.0, 0.0, 0.0, 1.0);

{//Compute Aders and Wders
vec4 temp[DEG_MAX+1] = vec4[DEG_MAX+1](vec4(0.0), vec4(0.0), vec4(0.0), vec4(0.0), vec4(0.0));

int du = min(d,degu);
int dv = min(d,degv);

for(int k=0; k<=du; k++){
for(int s=0; s<=degv; s++)
for(int r=0; r<=degu; r++){
vec4 CUR_CTRL = MP * MV *(texelFetch(SURFACE_CTRL, ((indu-degu+r)*(size_p_v)+indv-degv+s)));
temp[s] = temp[s] + Nu[(k*taille1)+r] * CUR_CTRL;
}

int dd = min(d-k,dv);
for(int l=0; l<=dd; l++){
for(int s=0; s<=degv; s++)
Aders[k*size_d+l] = Aders[k*size_d+l] + Nv[(l*taille2)+s] * temp[s];
}
}
}

//Compute nurbs derivates
for(int k=0; k<=d; k++)
for(int l=0; l<=d-k; l++){
vec3 v = Aders[k*size_d+l].xyz;
for(int j=1; j<=l; j++)
v = v - Bin[l*size_d+j] * Aders[j].w * SKL[k*size_d+l-j];
for(int i=1; i<=k; i++){
v = v - Bin[k*size_d+i] * Aders[i*size_d].w * SKL[(k-i)*size_d+l];
vec3 v2 = vec3(0.0);
for(int j=1; j<=l; j++)
v2 = v2 + Bin[l*size_d+j] * Aders[i*size_d+j].w * SKL[(k-i)*size_d+l-j];
v = v -Bin[k*size_d+i] * v2;
}
SKL[k*size_d+l] = v / Aders[0].w;
}
}
return SKL;
//return glm::cross(SKL[1][0], SKL[0][1]);
}

/*int binarySearch(int begin, int end, float uu){

if(begin == end)
return begin;

int min=begin; int max=end;
int middle=(min+max)/2;

while (uu < texelFetch(TRIMMING_STRUCTURE, middle).x || uu >= texelFetch(TRIMMING_STRUCTURE, middle+1).x){

if(uu < texelFetch(TRIMMING_STRUCTURE, middle).x)
max=middle;
else
min=middle;

middle=(min+max)/2;
}
return middle;
}

float deCasteljau(int index, int cur_index, float t){

int DEG_CUR = texelFetch(TRIMMING_DEG,index).x;
float Q[DEG_CURVE_MAX+1];
for(int ii=cur_index; ii<=(cur_index+DEG_CUR); ii++)
Q[ii] = texelFetch(TRIMMING_CTRL,ii).y;
for(int k=1; k<=DEG_CUR; k++)
for(int i=0; i<=DEG_CUR-k; i++)
Q[i] = (1.0-t)*Q[i] + t*Q[i+1];
return Q[0];
}

float deCasteljauU(int index, int cur_index, float t){

int DEG_CUR = texelFetch(TRIMMING_DEG,index).x;
float Q[DEG_CURVE_MAX+1];
for(int ii=cur_index; ii<=(cur_index+DEG_CUR); ii++)
Q[ii] = texelFetch(TRIMMING_CTRL,ii).x;
for(int k=1; k<=DEG_CUR; k++)
for(int i=0; i<=DEG_CUR-k; i++)
Q[i] = (1.0-t)*Q[i] + t*Q[i+1];
return Q[0];
}

bool optimizedBisection(int index, float Ta, float Tb, float Ua, float Ub, float Va, float Vb, float Up, float Vp){

if(Up <= min(Ua, Ub))
return true;

if(Up >= max(Ua, Ub))
return false;

//Loacalise current ctrl points
int current_index = 0;
for(int ii=0; ii<index; ii++)
current_index += (texelFetch(TRIMMING_DEG, ii).x + 1);

float low;
float high;

if(Va-Vp < 0.0){
low = Ta; high = Tb;
}
else {
low = Tb; high = Ta;
}

float mid = low +(high-low)/2.0;
float f = 0.0;

while((mid != low &amp;&amp; mid != high) &amp;&amp; abs(mid) > 0){

float U_1 = deCasteljauU(index,current_index,low);
float U_2 = deCasteljauU(index,current_index,high);

if(Up <= min(U_1, U_2))
return true;

if(Up >= max(U_1, U_2))
return false;

f = deCasteljau(index,current_index,mid);
float fp = f - Vp;

if(fp < 0.0)
low = mid;
else if(fp > 0.0)
high = mid;
else
return true;

mid = low +(high-low)/2.0;
}
return false;
}

bool isTrimmed(float UU, float VV){
int sizeTotal = textureSize(TRIMMING_STRUCTURE);

int TOTAL_INTERSECTION = 0;

if((textureSize(SURFACE_AUX) - 3) <= 0)
return false;

//Trimming code
int NUM_V = texelFetch(SURFACE_AUX, 3).x;

//Test if VV is in the trimming region
if( VV <= texelFetch(TRIMMING_STRUCTURE, 0).x || VV >= texelFetch(TRIMMING_STRUCTURE, (NUM_V-1)).x)
return false;

int CUR_V_OFFSET = binarySearch(0, NUM_V-1, VV);

//Find index_begin and index_end for VV U-list and index_begin_next and index_end_next for next(VV) U-list
int CUR_U_BEGIN = texelFetch(SURFACE_AUX, 4+CUR_V_OFFSET).x;
int CUR_U_BEGIN_NEXT = texelFetch(SURFACE_AUX, 5+CUR_V_OFFSET).x;
int CUR_U_END = texelFetch(SURFACE_AUX, 5+CUR_V_OFFSET).x - 1;
int CUR_U_END_NEXT = ( CUR_U_BEGIN_NEXT == texelFetch(SURFACE_AUX,textureSize(SURFACE_AUX)-1).x ? sizeTotal-1 : (texelFetch(SURFACE_AUX, 6+CUR_V_OFFSET).x) - 1);

int CUR_U_OFFSET = binarySearch(CUR_U_BEGIN, CUR_U_END, UU);

int COUNT_LINK = 0;
int COUNT_LINK_NEXT = 0;
int CUR_U_OFFSET_NEXT;

for(int i=CUR_U_BEGIN; i<CUR_U_OFFSET; i++)
COUNT_LINK += int(texelFetch(TRIMMING_AUX, 3*(i-NUM_V)).x);

for(int ii=CUR_U_BEGIN_NEXT; ii<=CUR_U_END_NEXT; ii++){
COUNT_LINK_NEXT += int(texelFetch(TRIMMING_AUX, 3*(ii-NUM_V)).x);
if(COUNT_LINK_NEXT == COUNT_LINK){
CUR_U_OFFSET_NEXT = ii+1;
break;
}
}
int INDEX = int(texelFetch(TRIMMING_AUX, 3*(CUR_U_OFFSET_NEXT-NUM_V)+1).x);
float T_1 = texelFetch(TRIMMING_AUX, 3*(CUR_U_OFFSET-NUM_V)+2).x;
//float T_2 = texelFetch(TRIMMING_AUX, 3*(CUR_U_OFFSET_NEXT-NUM_V)+2).x;
float U_1 = texelFetch(TRIMMING_STRUCTURE, CUR_U_OFFSET).x;
float V_1 = texelFetch(TRIMMING_STRUCTURE, CUR_V_OFFSET).x;
float V_2 = texelFetch(TRIMMING_STRUCTURE, CUR_V_OFFSET+1).x;

int LOOP_COUNT = int(texelFetch(TRIMMING_AUX, 3*(CUR_U_OFFSET-NUM_V)).x);

for(int jj = 0; jj<LOOP_COUNT; jj++){
float U_2 = texelFetch(TRIMMING_STRUCTURE, CUR_U_OFFSET_NEXT+jj).x;
float T_2 = texelFetch(TRIMMING_AUX, 3*(CUR_U_OFFSET_NEXT-NUM_V+jj)+2).x;
if(optimizedBisection(INDEX, T_1, T_2, U_1, U_2, V_1, V_2, UU, VV)){
TOTAL_INTERSECTION += (LOOP_COUNT-jj);
}
}

for(int k=CUR_U_OFFSET+1; k<=CUR_U_END; k++)
TOTAL_INTERSECTION += int(texelFetch(TRIMMING_AUX, 3*(k-NUM_V)).x);

if(TOTAL_INTERSECTION % 2 == 0)
return false;
else
return true;
}*/

void main(void){

//Intersection
vec3 NORMAL_1 = ((abs(rayDir.x) > abs(rayDir.y) &amp;&amp; abs(rayDir.x) > abs(rayDir.z)) ? vec3(rayDir.y, -rayDir.x, 0.0) : vec3(0.0, rayDir.z, -rayDir.y));
vec3 NORMAL_2 = cross(NORMAL_1, rayDir.xyz);

float OFFSET_1 = -dot(NORMAL_1, CAMERA.xyz);
float OFFSET_2 = -dot(NORMAL_2, CAMERA.xyz);

//Auxiliary data
int DEG_U = texelFetch(SURFACE_AUX, 1).x;
int DEG_V = texelFetch(SURFACE_AUX, 2).x;
int SIZE_P_V = textureSize(SURFACE_KNOTV) - DEG_V - 1;

//Initialize parameters u v
vec2 CURRENT_UV = vec2(paramU, paramV);

vec3 EVAL[4];

//Newton Iteration (Max Iteration = 4)
/*for(int i=0; i<4; i++){

int INDEX_U = findSpan(DEG_U, CURRENT_UV[0], SURFACE_KNOTU);
int INDEX_V = findSpan(DEG_V, CURRENT_UV[1], SURFACE_KNOTV);
float BASIS_U[2*DEG_MAX] = basisDerFunc(DEG_U, INDEX_U, CURRENT_UV[0], SURFACE_KNOTU);
float BASIS_V[2*DEG_MAX] = basisDerFunc(DEG_V, INDEX_V, CURRENT_UV[1], SURFACE_KNOTV);
EVAL = evalDerSurface(INDEX_U, INDEX_V, DEG_U, DEG_V, SIZE_P_V, BASIS_U, BASIS_V);
mat2 JACOBIAN = mat2( dot(NORMAL_1, EVAL[2]), dot(NORMAL_2, EVAL[2]), dot(NORMAL_1, EVAL[1]), dot(NORMAL_2, EVAL[1]) );
vec2 NEW_FUNC = vec2( dot(NORMAL_1, EVAL[0])+OFFSET_1, dot(NORMAL_1, EVAL[2])+OFFSET_2 );

CURRENT_UV = CURRENT_UV - (inverse(JACOBIAN) * NEW_FUNC);
}*/

//if(isTrimmed(CURRENT_UV.x, CURRENT_UV.y))
//discard;
int INDEX_U = findSpan(DEG_U, CURRENT_UV[0], SURFACE_KNOTU);
int INDEX_V = findSpan(DEG_V, CURRENT_UV[1], SURFACE_KNOTV);
float BASIS_U[2*DEG_MAX] = basisDerFunc(DEG_U, INDEX_U, CURRENT_UV[0], SURFACE_KNOTU);
float BASIS_V[2*DEG_MAX] = basisDerFunc(DEG_V, INDEX_V, CURRENT_UV[1], SURFACE_KNOTV);
EVAL = evalDerSurface(INDEX_U, INDEX_V, DEG_U, DEG_V, SIZE_P_V, BASIS_U, BASIS_V);

vec4 EXACT_NORMAL = (transpose(inverse(MV))) * vec4(cross(EVAL[2],EVAL[1]),1.0);

vec2 TEXTURE_COORD = vec2(CURRENT_UV.x / texelFetch(SURFACE_KNOTU, textureSize(SURFACE_KNOTU)-1).x, CURRENT_UV.y / texelFetch(SURFACE_KNOTV, textureSize(SURFACE_KNOTV)-1).x);

vec4 DIFFUSE_COLOR = texture(DIFFUSE_COLOR, TEXTURE_COORD);

//finalColor = vec4(INDEX_U-5,0.0,0.0,0.0);
finalColor = vec4(phongShade(EXACT_NORMAL.xyz, lightDir, -rayDir, DIFFUSE_COLOR.xyz, 0.5, kd, ks, LIGHT[1].xyz, ksh) ,1.0);

gl_FragDepth = EVAL[0].z;
}


I have also a trimming code but it is in comment for the moment..

Thanks

nashouston
10-01-2010, 01:23 PM
For your information I used the paper available here
Ray Casting of nurbs surfaces (http://www.uni-weimar.de/cms/fileadmin/medien/vr/documents/publications/Ray_Casting_of_Trimmed_NURBS_Surfaces_on_the_GPU.p df)
But I use another trimming approach....

nashouston
10-02-2010, 01:43 PM
up......

kRogue
10-04-2010, 12:49 AM
Try this:

change:


while (current_u < (texelFetch(KNOT_TYPE, middle)).x || current_u >= (texelFetch(KNOT_TYPE, middle+1)).x ){
//if(CONTROL > 4)
//break;
if(current_u < (texelFetch(KNOT_TYPE, middle)).x)
max=middle;

if(current_u >= (texelFetch(KNOT_TYPE, middle)).x)
min=middle;

middle=(min+max)/2;
CONTROL++;
}
return middle;


to



for(int CONTROL=0; CONTROL<MAX_CONTROL; ++CONTROL)
{
if(current_u>=(texelFetch(KNOT_TYPE, middle)).x &amp;&amp; current_u < (texelFetch(KNOT_TYPE, middle+1)).x )
break;

if(min==max)
break;

if(current_u < (texelFetch(KNOT_TYPE, middle)).x)
max=middle;

if(current_u >= (texelFetch(KNOT_TYPE, middle)).x)
min=middle;

middle=(min+max)/2;
}


and define MAX_CONTROL up until you see the thing freezing on you... I _suspect_ the GLSL compiler will unrool the loop (somehow)... and since the number of iterations is no more than log_2(size of array), having MAX_CONTROL at around 24 or so is already quite excessive.

nashouston
10-08-2010, 02:13 AM
Thanks...I will try it....
But do you know how the while loop dont work ????