Strange segfault error when calling glDrawArrays



jonathanbyrn
01-05-2017, 07:34 AM
Hi,
I am debugging a piece of code and have hit a brick wall. The code takes a voxel mesh, renders it to the screen and outputs it as an obj file. It is reading the file successfully, outputting it as an obj successfully, creating the GLMesh successfully but then throwing a segfault after it calls glBindVertexArray(voxelMesh.VAO) and glDrawArrays(voxelMesh.mode, 0, num_voxel_vertices);

Glxgears works, I have made sure the mesh is being created correctly and that the correct values are being passed. The code also draws a set of axes using glBindVertexArray and glDrawArrays with no problem just before it draws the main mesh.

The brick wall is that the identical code works fine on another machine but crashes on mine which would indicate it is a problem with the libraries. How do I go about debugging a problem in the libraries????

I am running on Ubuntu 16.04 and have attached my glxinfo and apitrace of the error. Any advice on how to tackle this problem would be greatly appreciated.

https://dl.dropboxusercontent.com/u/3440275/glxinfo.txt
https://dl.dropboxusercontent.com/u/3440275/program.trace

Silence
01-06-2017, 04:54 AM
Some code maybe ?

A segmentation fault generally means that you are using a bad or unset pointer. Did you ensure glBindVertexArray is usable?

jonathanbyrn
01-06-2017, 05:09 AM
Thanks for getting back to me. Apologies, here is the offending code:


/* Draw the volume contents. */
glUniform1i(locs.useSingleColour, !g_usingColour);
glUniform1i(locs.useAmbientOcclusion, 1);
float singleColour[4] = {0.1f, 1.0f, 0.2f, 1.0f};
glUniform4fv(locs.singleColour, 1, singleColour);
glBindVertexArray(voxelMesh.VAO);
printf("voxsize:%d\n", getGLMeshSize(voxelMesh));
if (g_drawVoxelsWireframe) {
glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);
}

printf("drawvox num voxel %d\n", num_voxel_vertices);
printf("drawvox mode %d\n", voxelMesh.mode);
printf("drawvox VAO %u\n", voxelMesh.VAO);
printf("vox size%u\n", voxelMesh.mesh.positions.elementSize);
printf("vox length %u\n", voxelMesh.mesh.positions.length);


glDrawArrays(voxelMesh.mode, 0, num_voxel_vertices);

printf("drawvox finished\n");


Everything in voxelmesh is being populated (VAO, vertices, etc). I initially thought it was going outside the memory bounds but even if I set num_voxel_vertices to 1 it throws an error. So it could be a problem with the VAO or something else.

john_connor
01-06-2017, 05:30 AM
Usually you get "segmentation fault" errors if you haven't initialized the OpenGL functions after you created the GL context. Sometimes you get that error if your vertex array object isn't built correctly, e.g. you haven't set the glVertexAttribPointer(...) correctly, or the buffer object that MUST be bound at the time you call glVertexAttribPointer(...) doesn't exist or hasn't been bound before.

Make sure you have something like this:


GLuint myvertexarray = 0;
GLuint myvertexbuffer = 0;

glGenVertexArrays(1, &myvertexarray);
glGenBuffers(1, &myvertexbuffer);

glBindVertexArray(myvertexarray);
glBindBuffer(GL_ARRAY_BUFFER, myvertexbuffer);
glVertexAttribPointer(...);


the point is:
-- there has to be a buffer bound at GL_ARRAY_BUFFER
-- call glVertexAttribPointer(...) AFTER you have bound the buffer
-- make sure you put some data into the buffer at some time BEFORE calling glDrawArrays(...)
-- usually you want to shade your models yourself, so make sure you call glUseProgram(...) + glBindVertexArray(...) before calling glDrawArrays(...)

jonathanbyrn
01-06-2017, 06:46 AM
Hi, yes the vertex array was being generated in a constructor function:



glGenVertexArrays(1, &glmesh.VAO);
glBindVertexArray(glmesh.VAO);


I just want to clarify something. This is not a problem with the code. The exact same code works on someone else's machine. I have also tested it by creating a virtual machine, compiling it and running it on that. It ran perfectly!!!

The problem is either the GLEW library or the GLFW library, or some other thing that is being called but I have no idea how to debug it, any recommendations are appreciated!

Silence
01-08-2017, 02:47 AM
This does not mean that the code is good :) It can even mean that there's a bug in the code. It can also mean that the machine on which it is running does not support some functionality or extensions (which, from your glxinfo, is not the case), or that your program is not linked with the correct GL library (i.e. an old one). Or it might be because you use the functions in a different context from the one in which they were loaded.

Let us know how you create your context, how you initialize glew and so on.

Try also to print the value of the function pointers. Maybe one of them is NULL. If so, try to get its address by yourself (ie by using glXGetProcAddress under Unix).
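
A minimal sketch of the "print the function pointers" check, written as plain C so it does not depend on any GL header. The names GenericProc, NamedProc and reportMissingProcs are made up for illustration; the idea is only to list each entry point next to whether it is NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Generic function-pointer type; each GL entry point is cast to this. */
typedef void (*GenericProc)(void);

/* Hypothetical helper type: an entry point and the name to print for it. */
struct NamedProc {
    const char *name;
    GenericProc proc;
};

/* Prints one line per entry and returns how many entries are NULL. */
size_t reportMissingProcs(const struct NamedProc *procs, size_t count, FILE *out)
{
    assert(procs != NULL || count == 0);
    size_t missing = 0;
    for (size_t i = 0; i < count; i++) {
        if (procs[i].proc == NULL) {
            fprintf(out, "%s: NULL (not loaded)\n", procs[i].name);
            missing++;
        } else {
            fprintf(out, "%s: loaded\n", procs[i].name);
        }
    }
    return missing;
}
```

With GLEW the table would be filled after glewInit() with entries such as { "glBindVertexArray", (GenericProc)glBindVertexArray }. For looking an address up by hand, glXGetProcAddress is declared in <GL/glx.h>, and GLFW offers glfwGetProcAddress for the same purpose.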

jonathanbyrn
01-09-2017, 04:48 AM
Thank you very much for the suggestions, at least it gave me something to google! Apologies for not sending on all the code, I am still familiarising myself with the code base and the GL part is only a small part of the whole program. Here is what I have found so far:




bool initLibraries(const char *title, const int w, const int h)
{
    if (!glfwInit()) {
        fprintf(stderr, "Error: GLFW could not be initialised. Exiting.\n");
        return false;
    }

    g_window = glfwCreateWindow(w, h, title, NULL, NULL);
    if (!g_window) {
        printf("Error: A window could not be opened. Exiting.\n");
        glfwTerminate();
        return false;
    }
    g_windowWidth = w;
    g_windowHeight = h;

    glfwMakeContextCurrent(g_window);
    glfwSetCursorEnterCallback(g_window, cursorEnterCallback);

    GLenum result = glewInit();
    if (result != GLEW_OK) {
        printf("Error: GLEW could not be initialised.\n");
        printf("%s. Exiting.\n", glewGetErrorString(result));
        glfwTerminate();
        return false;
    }

    return true;
}

/*
* Initialise renderer
*/

if (!initLibraries("VOLA Renderer", (dualView) ? 1600 : 800, 800)) {
    svo_del(g_svo);
    return EXIT_FAILURE;
}

const char *fshader = "shaders/fragmentShader.glsl";
const char *vshader = "shaders/vshader.glsl";

char *shaderMessage = NULL;
GLuint fshdr = loadShader(GL_FRAGMENT_SHADER, fshader, &shaderMessage);
if (shaderMessage) {
    fprintf(stderr, "%s\n", shaderMessage);
    shaderMessage = NULL;
}
GLuint vshdr = loadShader(GL_VERTEX_SHADER, vshader, &shaderMessage);
if (shaderMessage) {
    fprintf(stderr, "%s\n", shaderMessage);
    shaderMessage = NULL;
}

/* Link the shaders. */
GLuint shader = glCreateProgram();
glAttachShader(shader, fshdr);
glAttachShader(shader, vshdr);
glLinkProgram(shader);
GLint stat;
glGetProgramiv(shader, GL_LINK_STATUS, &stat);
if (!stat) {
    GLint len;
    glGetProgramiv(shader, GL_INFO_LOG_LENGTH, &len);
    GLchar err[len];
    glGetProgramInfoLog(shader, len, &len, err);
    printf("%s\n", err);
    glfwTerminate();
    return EXIT_FAILURE;
}
glUseProgram(shader);

struct ShaderLocations locs = {
    .colours = glGetAttribLocation(shader, "colour"),
    .normals = glGetAttribLocation(shader, "normal"),
    .occlusions = glGetAttribLocation(shader, "occlusion"),
    .positions = glGetAttribLocation(shader, "position"),

    .useLighting = glGetUniformLocation(shader, "useLighting"),
    .useAmbientOcclusion = glGetUniformLocation(shader, "useAmbientOcclusion"),
    .useSingleColour = glGetUniformLocation(shader, "useSingleColour"),
    .passThrough = glGetUniformLocation(shader, "passThrough"),
    .singleColour = glGetUniformLocation(shader, "singleColour"),

    .cameraInverseT = glGetUniformLocation(shader, "cameraInverseT"),
    .cameraInverseRx = glGetUniformLocation(shader, "cameraInverseRx"),
    .cameraInverseRy = glGetUniformLocation(shader, "cameraInverseRy"),
    .cameraInverseRz = glGetUniformLocation(shader, "cameraInverseRz"),
    .perspectiveProjection = glGetUniformLocation(shader, "perspectiveProjection"),
    .modelMatrix = glGetUniformLocation(shader, "modelMatrix"),

    .lightPosition = glGetUniformLocation(shader, "lightPosition"),
    .lightIntensity = glGetUniformLocation(shader, "lightIntensity")
};

g_camera = cameraNew(0, 0, 0, 45.0f, -135.0f, 0, 25, 50, true, 1, 0);

/* Load the perspective projection matrix into the shaders. */
float perspectiveMat[16];
pers(perspectiveMat, 1.0f, 1000.0f, -1.0f, 1.0f, 1.0f, -1.0f);
glUniformMatrix4fv(locs.perspectiveProjection, 1, GL_TRUE, perspectiveMat);

/* ************************************************************************** */

/* Light position should be a function of the scene size and oblique
* to the voxel faces for good results */
const float lightX = g_svo->len / 3.1f;
const float lightY = 2000;
const float lightZ = g_svo->len / 2.1f;
const float lightIntensity = 2.5f;

glUniform4f(locs.lightPosition, lightX, lightY, lightZ, 1.0f);
glUniform1f(locs.lightIntensity, lightIntensity);

/* ************************************************************************** */

/* Main loop ******************************************************************/

glEnable(GL_SCISSOR_TEST);
glEnable(GL_DEPTH_TEST);
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
glPointSize(2.0f);

/* Used to track the time elapsed between frames. */
float prevt, currt;

/* Used to track cursor position. */
double cx, cy, cxprev, cyprev;
glfwGetCursorPos(g_window, &cxprev, &cyprev);

glUniform1i(locs.passThrough, 0);


And here is some of the code for drawing the mesh. It initially is drawing the axes, which works fine so it is capable of drawing a mesh, just not the main mesh!




/* Draw the X, Y and Z axes. */
if (g_drawAxes) {
    glUniform1i(locs.useLighting, 0);
    glUniform1i(locs.useAmbientOcclusion, 0);
    glUniform1i(locs.useSingleColour, 0);
    glBindVertexArray(axesMesh.VAO);
    printf("axesmesh VAO %u\n", axesMesh.VAO);
    glDrawArrays(axesMesh.mode, 0, 6);
}
glUniform1i(locs.useLighting, g_lightingEnabled);
if (g_useLevelOfDetail && g_showLevelOfDetail) {
    float LODColour[] = {0.0f, 1.0f, 0.0f, 1.0f};
    glUniform1i(locs.useSingleColour, 1);
    glUniform4fv(locs.singleColour, 1, LODColour);
    glBindVertexArray(LODGLMeshes[g_LOD].VAO);
    glDrawArrays(GL_TRIANGLES, 0, LODGLMeshes[g_LOD].mesh.positions.length / 3);
} else {
    /* Draw the volume contents. */
    glUniform1i(locs.useSingleColour, !g_usingColour);
    glUniform1i(locs.useAmbientOcclusion, 1);
    float singleColour[4] = {0.1f, 1.0f, 0.2f, 1.0f};
    glUniform4fv(locs.singleColour, 1, singleColour);
    glBindVertexArray(voxelMesh.VAO);
    printf("voxsize:%d\n", getGLMeshSize(voxelMesh));
    if (g_drawVoxelsWireframe) {
        glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);
    }

    printf("drawvox num voxel %d\n", num_voxel_vertices);
    printf("drawvox mode %d\n", voxelMesh.mode);
    printf("drawvox VAO %u\n", voxelMesh.VAO);
    printf("vox size%u\n", voxelMesh.mesh.positions.elementSize);
    printf("vox length %u\n", voxelMesh.mesh.positions.length);

    glDrawArrays(voxelMesh.mode, 0, num_voxel_vertices);

    printf("drawvox finished\n");



Regarding the libraries, I checked them using ldd and GLEW and GLFW are pointing to the libraries I installed. I compared them with the libraries linked on the virtual machine and spotted some differences:

linux-vdso.so.1 => (0x00007ffe537f6000)
libvola.so => /home/jonathan/data/Jonathan/programs/MvOLA/VOLA/build/libvola.so (0x00007f5f4a326000)
libGL.so.1 => /usr/lib/nvidia-367/libGL.so.1 (0x00007f5f4a065000)
libGLEW.so.1.13 => /usr/lib/x86_64-linux-gnu/libGLEW.so.1.13 (0x00007f5f49de1000)
libglfw.so.3 => /usr/lib/x86_64-linux-gnu/libglfw.so.3 (0x00007f5f49bcb000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f5f498c2000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5f494f8000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f5f48f60000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f5f48d5b000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f5f48638000)
libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f5f482fe000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f5f480e1000)
libXrandr.so.2 => /usr/lib/x86_64-linux-gnu/libXrandr.so.2 (0x00007f5f47ed5000)
libXinerama.so.1 => /usr/lib/x86_64-linux-gnu/libXinerama.so.1 (0x00007f5f47cd2000)
libXi.so.6 => /usr/lib/x86_64-linux-gnu/libXi.so.6 (0x00007f5f47ac2000)
libXxf86vm.so.1 => /usr/lib/x86_64-linux-gnu/libXxf86vm.so.1 (0x00007f5f478bb000)
libXcursor.so.1 => /usr/lib/x86_64-linux-gnu/libXcursor.so.1 (0x00007f5f476b1000)
/lib64/ld-linux-x86-64.so.2 (0x000055e2dc0ce000)
libXext.so.6 => /usr/lib/x86_64-linux-gnu/libXext.so.6 (0x00007f5f4749e000)
libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f5f4727c000)
libXrender.so.1 => /usr/lib/x86_64-linux-gnu/libXrender.so.1 (0x00007f5f47071000)
libXfixes.so.3 => /usr/lib/x86_64-linux-gnu/libXfixes.so.3 (0x00007f5f46e6b000)
libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f5f46c67000)
libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f5f46a60000)

These libraries were not linked on the virtual machine:

libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f5f49176000)
libGLX.so.0 => /usr/lib/nvidia-367/libGLX.so.0 (0x00007f5f48b2a000)
libGLdispatch.so.0 => /usr/lib/nvidia-367/libGLdispatch.so.0 (0x00007f5f48841000)

And these libraries were linked on the virtual machine but not on my main machine:

libexpat
libXdamage
libdrm

Finally, I tried calling glXGetProcAddress on the glDrawArrays function but I got an error saying it is not in scope. Is it in the GLEW and GLFW libraries, or do I need to include another library?

Thanks again for all the help!

Silence
01-09-2017, 05:05 AM
So it can render with VAO/VBO and glDrawArrays for your axis, but not for your mesh. So your function addresses are certainly good.

How is your voxelMesh created? Is it the same class (and therefore the same code) as for your axes? Are the VAO/VBO created only after the GL context has been created?

jonathanbyrn
01-09-2017, 06:11 AM
Good question. They both end up calling:


struct GLMesh GLMesh(struct Mesh mesh, GLint colourLocation,
                     GLint normalLocation, GLint occlusionLocation, GLint positionLocation,
                     GLenum mode)
{
    struct GLMesh glmesh;
    glmesh.mode = mode;
    glmesh.mesh = mesh;

    glGenVertexArrays(1, &glmesh.VAO);
    glBindVertexArray(glmesh.VAO);

    if (mesh.colours.array) {
        glGenBuffers(1, &glmesh.colourVBO);
        glBindBuffer(GL_ARRAY_BUFFER, glmesh.colourVBO);
        glBufferData(GL_ARRAY_BUFFER, mesh.colours.elementSize * mesh.colours.length, mesh.colours.array, GL_STATIC_DRAW);
        glVertexAttribPointer(colourLocation, 4, GL_FLOAT, GL_FALSE, 0, 0);
        glEnableVertexAttribArray(colourLocation);
    }

    if (mesh.normals.array) {
        glGenBuffers(1, &glmesh.normalVBO);
        glBindBuffer(GL_ARRAY_BUFFER, glmesh.normalVBO);
        glBufferData(GL_ARRAY_BUFFER, mesh.normals.elementSize * mesh.normals.length, mesh.normals.array, GL_STATIC_DRAW);
        glVertexAttribIPointer(normalLocation, 1, GL_UNSIGNED_BYTE, 0, 0);
        glEnableVertexAttribArray(normalLocation);
    }

    if (mesh.occlusions.array) {
        glGenBuffers(1, &glmesh.occlusionVBO);
        glBindBuffer(GL_ARRAY_BUFFER, glmesh.occlusionVBO);
        glBufferData(GL_ARRAY_BUFFER, mesh.occlusions.elementSize * mesh.occlusions.length, mesh.occlusions.array, GL_STATIC_DRAW);
        glVertexAttribIPointer(occlusionLocation, 1, GL_UNSIGNED_BYTE, 0, 0);
        glEnableVertexAttribArray(occlusionLocation);
    }

    glGenBuffers(1, &glmesh.positionVBO);
    glBindBuffer(GL_ARRAY_BUFFER, glmesh.positionVBO);
    glBufferData(GL_ARRAY_BUFFER, mesh.positions.elementSize * mesh.positions.length, mesh.positions.array, GL_STATIC_DRAW);
    glVertexAttribPointer(positionLocation, 3, GL_FLOAT, GL_FALSE, 0, 0);
    glEnableVertexAttribArray(positionLocation);

    glBindVertexArray(0);

    return glmesh;
}


This generates the VAO and binds it.

Both GLMesh calls are made after the initLibraries call and after the shaders, camera and lights have been set up, but before the main loop.

Dark Photon
01-09-2017, 04:23 PM
I am debugging a piece of code and have hit a brick wall. The code ... [throws] a segfault after it calls glBindVertexArray(voxelMesh.VAO) and glDrawArrays(voxelMesh.mode, 0, num_voxel_vertices); ...

I am running on Ubuntu 16.04...

As a first-step, I'd recommend running valgrind on your application to make sure that the memory handling on your side is clean.

It's easy. Just build with debug info (-g), and then instead of running "myapp" run "valgrind myapp". Let your application run. After it completes (whether it crashes or not), look through the output and find things like "invalid writes", "invalid reads", and such along with the full stacktrace for each. If it finds any, fix those up. See the docs (http://valgrind.org/docs/manual/mc-manual.html) for details.

That'll likely point you to the offending stack trace and provide details on the error. If not, run your app in gdb and print the stack trace ("where" command) -- that'll at least give you the stack trace.

If it's not in your app but inside a GL call, I'd suspect you are inadvertently asking OpenGL to read past the end of an array (e.g. a vertex attribute or index list array) in a draw call. You can ensure that it is that specific GL call that is causing the crash if you surround it with glFinish() calls. My suspicion is it's the glDrawArrays that's causing the crash (as the draw calls are where OpenGL does most of the work for queuing batches).
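
To rule out the "reading past the end of an array" case by arithmetic rather than by trial, something like the following could be compared against the count passed to glDrawArrays. This is only a sketch: maxDrawableVertices and drawRangeFits are hypothetical helper names, and the sizes are assumed to be the same values that were given to glBufferData and glVertexAttribPointer.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Largest vertex count one attribute buffer can supply.
 * bufferBytes:    size passed to glBufferData
 * components:     per-vertex component count (e.g. 3 for positions)
 * componentBytes: size of one component (e.g. sizeof(float))
 * strideBytes:    stride passed to glVertexAttribPointer (0 = tightly packed)
 */
size_t maxDrawableVertices(size_t bufferBytes, size_t components,
                           size_t componentBytes, size_t strideBytes)
{
    assert(components > 0 && componentBytes > 0);
    size_t vertexBytes = components * componentBytes;
    size_t step = strideBytes ? strideBytes : vertexBytes;
    if (bufferBytes < vertexBytes)
        return 0;
    /* The last vertex only needs vertexBytes, not a full stride. */
    return (bufferBytes - vertexBytes) / step + 1;
}

/* Returns 1 if drawing vertices [first, first + count) stays inside the buffer. */
int drawRangeFits(size_t first, size_t count, size_t maxVertices)
{
    return first <= maxVertices && count <= maxVertices - first;
}
```

For the position buffer in the posted GLMesh() code, bufferBytes would be mesh.positions.elementSize * mesh.positions.length, with 3 components of sizeof(float) and a stride of 0. Every enabled attribute needs the same check, and the smallest result is the limit. A crash with a count of 1 would not be explained by this, but it takes the overrun case off the table.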

jonathanbyrn
01-10-2017, 02:23 AM
This is great, now I have a lot more information about what is going on. I ran it with --read-var-info to get more information and it is an invalid read:



--7945-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0xf2
==7945== Invalid read of size 4
==7945== at 0x4053F4A: ??? (in /tmp/.glbzQOXg (deleted))
==7945== by 0xA3DF6E3: ??? (in /usr/lib/nvidia-367/libnvidia-glcore.so.367.57)
==7945== by 0xA3E48F7: ??? (in /usr/lib/nvidia-367/libnvidia-glcore.so.367.57)
==7945== by 0x9FCAA27: ??? (in /usr/lib/nvidia-367/libnvidia-glcore.so.367.57)
==7945== by 0x40647E: main (vola.c:887)
==7945== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==7945==
==7945==
==7945== Process terminating with default action of signal 11 (SIGSEGV)
==7945== Access not within mapped region at address 0x0
==7945== at 0x4053F4A: ??? (in /tmp/.glbzQOXg (deleted))
==7945== by 0xA3DF6E3: ??? (in /usr/lib/nvidia-367/libnvidia-glcore.so.367.57)
==7945== by 0xA3E48F7: ??? (in /usr/lib/nvidia-367/libnvidia-glcore.so.367.57)
==7945== by 0x9FCAA27: ??? (in /usr/lib/nvidia-367/libnvidia-glcore.so.367.57)
==7945== by 0x40647E: main (vola.c:887)
==7945== If you believe this happened as a result of a stack
==7945== overflow in your program's main thread (unlikely but
==7945== possible), you can try to increase the size of the
==7945== main thread stack using the --main-stacksize= flag.
==7945== The main thread stack size used in this run was 8388608.
==7945==
==7945== HEAP SUMMARY:
==7945== in use at exit: 5,658,658 bytes in 8,085 blocks
==7945== total heap usage: 12,496 allocs, 4,411 frees, 184,617,376 bytes allocated
==7945==
==7945== LEAK SUMMARY:
==7945== definitely lost: 560 bytes in 9 blocks
==7945== indirectly lost: 345,408 bytes in 9 blocks
==7945== possibly lost: 915,586 bytes in 3,610 blocks
==7945== still reachable: 4,397,104 bytes in 4,457 blocks
==7945== of which reachable via heuristic:
==7945== newarray : 16 bytes in 1 blocks
==7945== multipleinheritance: 104 bytes in 1 blocks


So it looks like a problem with a GL call in my NVIDIA library. The virtual machine was probably getting away with it as it was not calling the graphics card directly. One thing you mentioned was that it is probably going off the end of the array. I checked this by calling glDrawArrays(voxelMesh.mode, 0, num_voxel_vertices); with num_voxel_vertices set to 1. This also caused it to crash, suggesting there was nothing in the array, even though it has been generated and bound to a valid vertex array. If I call glDrawArrays(voxelMesh.mode, 0, 0); it works fine.

Dark Photon
01-10-2017, 05:31 AM
==7945== Invalid read of size 4
...
==7945== by 0xA3DF6E3: ??? (in /usr/lib/nvidia-367/libnvidia-glcore.so.367.57)
...
==7945== by 0x40647E: main (vola.c:887)
==7945== Address 0x0 is not stack'd, malloc'd or (recently) free'd

So it looks like a problem with a GL call in my nvidia library. ...

glDrawArrays(voxelMesh.mode, 0, num_voxel_vertices); with num_voxel_vertices set to 1. ... caused it to crash suggesting there was nothing in the array, even though it has been generated and bound to a valid vertex array. If I call glDrawArrays(voxelMesh.mode, 0, 0); it works fine.

Good info. Here are a few thoughts.

Based on lots of experience with the NVidia GL driver on Linux, I can tell you this is very unlikely to be an error in the NVidia driver (assuming it's installed properly) and thus is very likely to be a bug in the app code driving it. You can trigger crashes like this in a draw call (e.g. glDrawArrays) by providing a NULL pointer (as you're doing) to glVertexAttribPointer but yet not having an array buffer bound. I'm not saying that's it -- I see the code you posted -- but I suspect something like this is happening.

Is it possible your VAO state is being "messed up" post-creation and pre-use (or pre-reuse)? That's one possibility.

You can see if this is the case by querying the VAO state right before you draw with it. That is, right after glBindVertexArray() and right before glDrawArrays(), ask GL for the current VAO state. Then print that out and make sure it still looks right (see code below).

I'd pay particular attention to the values you see set for ENABLED, ARRAY_BUFFER_BINDING, and ARRAY_POINTER. What you "don't" want to see is a case where you have an enabled vertex attribute with ARRAY_POINTER set to 0 and ARRAY_BUFFER_BINDING also set to 0. That'll trigger a crash something like what you're seeing, as you're telling GL to go romping through CPU memory starting at address 0.

Here's some code to query the current vertex attribute state. You'll have to add the printfs (or whatever you want) to print the values in text form to the console or system log:


struct VertexAttrState
{
    GLint     enabled;
    GLint     normalized;
    GLint     isInteger;
    GLint     isLong;
    GLint     type;
    GLint     numComp;
    GLint     stride;
    GLint     divisor;
    GLvoid   *ptr;
    GLint     buffer;
    GLuint64  addr;
    GLuint64  len;
    GLint     binding;
    GLint     relOffset;
};

GLboolean bindlessArrays = glIsEnabled( GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV );
VertexAttrState genericAttr[16];

for ( int i = 0; i < 16; i++ )
{
    VertexAttrState &a = genericAttr[ i ];

    glGetVertexAttribiv       ( i, GL_VERTEX_ATTRIB_ARRAY_ENABLED       , &a.enabled    );
    glGetVertexAttribiv       ( i, GL_VERTEX_ATTRIB_ARRAY_SIZE          , &a.numComp    );
    glGetVertexAttribiv       ( i, GL_VERTEX_ATTRIB_ARRAY_STRIDE        , &a.stride     );
    glGetVertexAttribiv       ( i, GL_VERTEX_ATTRIB_ARRAY_TYPE          , &a.type       );
    glGetVertexAttribiv       ( i, GL_VERTEX_ATTRIB_ARRAY_NORMALIZED    , &a.normalized );
    glGetVertexAttribiv       ( i, GL_VERTEX_ATTRIB_ARRAY_INTEGER       , &a.isInteger  );
    glGetVertexAttribiv       ( i, GL_VERTEX_ATTRIB_ARRAY_LONG          , &a.isLong     );
    glGetVertexAttribiv       ( i, GL_VERTEX_ATTRIB_ARRAY_DIVISOR       , &a.divisor    );
    glGetVertexAttribPointerv ( i, GL_VERTEX_ATTRIB_ARRAY_POINTER       , &a.ptr        );
    glGetVertexAttribiv       ( i, GL_VERTEX_ATTRIB_ARRAY_BUFFER_BINDING, &a.buffer     );
    glGetVertexAttribiv       ( i, GL_VERTEX_ATTRIB_BINDING             , &a.binding    );
    glGetVertexAttribiv       ( i, GL_VERTEX_ATTRIB_RELATIVE_OFFSET     , &a.relOffset  );

    if ( glGetIntegerui64i_vNV )
    {
        glGetIntegerui64i_vNV ( GL_VERTEX_ATTRIB_ARRAY_ADDRESS_NV, i , &a.addr );
        glGetIntegerui64i_vNV ( GL_VERTEX_ATTRIB_ARRAY_LENGTH_NV , i , &a.len  );
    }
}
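
The printing and the "enabled, no buffer, NULL pointer" test described above could be sketched in plain C roughly as follows. AttrSummary, attrIsDangerous and reportAttrs are illustrative names only, and the struct is a cut-down copy of the three queried fields that matter for this check, so the snippet compiles without any GL header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical cut-down mirror of the queried per-attribute state. */
struct AttrSummary {
    int   enabled;  /* from GL_VERTEX_ATTRIB_ARRAY_ENABLED */
    int   buffer;   /* from GL_VERTEX_ATTRIB_ARRAY_BUFFER_BINDING */
    void *ptr;      /* from GL_VERTEX_ATTRIB_ARRAY_POINTER */
};

/* Enabled attribute with no buffer and a NULL pointer: GL would read from address 0. */
int attrIsDangerous(const struct AttrSummary *a)
{
    assert(a != NULL);
    return a->enabled && a->buffer == 0 && a->ptr == NULL;
}

/* Prints one line per attribute and returns the number of dangerous ones. */
size_t reportAttrs(const struct AttrSummary *attrs, size_t count, FILE *out)
{
    size_t bad = 0;
    for (size_t i = 0; i < count; i++) {
        int danger = attrIsDangerous(&attrs[i]);
        fprintf(out, "attr %zu enabled:%d buffer:%d ptr:%p%s\n", i,
                attrs[i].enabled, attrs[i].buffer, attrs[i].ptr,
                danger ? "  <-- enabled with no buffer and NULL pointer" : "");
        bad += (size_t)danger;
    }
    return bad;
}
```

A pointer value of 0 on its own is not a problem: with a buffer bound it is just a byte offset of 0 into that buffer. Only the combination with a buffer binding of 0 on an enabled attribute is the case to worry about.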


Another possibility that just occurred to me is you may not be determining the vertex attribute indices properly (i.e. arguments to GLMesh() function). That is, you may be setting up "some" vertex attributes properly, but just not the ones the shader is expecting you to.

Related: you can force OpenGL to assign a specific vertex attribute index to a specific vertex attribute in the shader in a few different ways. One is to set up this association pre-shader-link in your C++ code with glBindAttribLocation() (https://www.khronos.org/opengl/wiki/Vertex_Shader#Inputs). Another is to explicitly specify the index in a layout declaration (https://www.khronos.org/opengl/wiki/Vertex_Shader#Inputs) in the shader.
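
As a rough way to check the attribute indices before they are handed to GLMesh(), the locations could be validated like this. It is a sketch under assumptions: NamedLocation and reportBadLocations are invented names, and the locations are taken to be the values returned by glGetAttribLocation, which returns -1 when the name is not an active attribute in the linked program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical pairing of a shader attribute name with its queried location. */
struct NamedLocation {
    const char *name;
    int location;   /* value returned by glGetAttribLocation */
};

/* Reports locations that are negative or that repeat an earlier entry.
 * Returns the number of entries flagged. */
size_t reportBadLocations(const struct NamedLocation *locs, size_t count, FILE *out)
{
    assert(locs != NULL || count == 0);
    size_t bad = 0;
    for (size_t i = 0; i < count; i++) {
        if (locs[i].location < 0) {
            fprintf(out, "%s: not an active attribute (location %d)\n",
                    locs[i].name, locs[i].location);
            bad++;
            continue;
        }
        for (size_t j = 0; j < i; j++) {
            if (locs[j].location == locs[i].location) {
                fprintf(out, "%s: shares location %d with %s\n",
                        locs[i].name, locs[i].location, locs[j].name);
                bad++;
                break;
            }
        }
    }
    return bad;
}
```

In the posted code the entries would come from locs.colours, locs.normals, locs.occlusions and locs.positions. A location of -1 passed on to glVertexAttribPointer or glEnableVertexAttribArray is rejected with GL_INVALID_VALUE, so that attribute's setup would silently not happen.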

[[EDIT:]] Another possibility that occurred to me is your app could be mixing generic vertex attributes and the old fixed-function vertex attributes. If you're going to do that, you need to be careful. Search your app for occurrences of: glEnableClientState, glDisableClientState, glVertexPointer, glNormalPointer, glColorPointer, glTexCoordPointer, just to name a few. If you don't find them, you likely don't have this problem.

jonathanbyrn
01-10-2017, 09:14 AM
Good info. Here are a few thoughts.

Based on lots of experience with the NVidia GL driver on Linux, I can tell you this is very unlikely to be an error in the NVidia driver (assuming it's installed properly) and thus is very likely to be a bug in the app code driving it. You can trigger crashes like this in a draw call (e.g. glDrawArrays) by providing a NULL pointer (as you're doing) to glVertexAttribPointer but yet not having an array buffer bound. I'm not saying that's it -- I see the code you posted -- but I suspect something like this is happening.

Is it possible your VAO state is being "messed up" post-creation and pre-use (or pre-reuse)? That's one possibility.

You can see if this is the case by querying the VAO state right before you draw with it. That is, right after glBindVertexArray() and right before glDrawArrays(), ask GL for the current VAO state. Then print that out and make sure it still looks right (see code below).

I'd pay particular attention to the values you see set for ENABLED, ARRAY_BUFFER_BINDING, and ARRAY_POINTER. What you "don't" what to see is a case where you have an enabled vertex attribute with ARRAY_POINTER set to 0 and ARRAY_BUFFER_BINDING also set to 0. That'll trigger a crash something like what you're seeing, as you're telling GL to go romping through CPU memory starting at address 0.

Here's some code to query the current vertex attribute state. You'll have to add the printfs (or whatever you want) to print the values in text form to the console or system log:


struct VertexAttrState
{
GLint enabled ;
GLint normalized;
GLint isInteger ;
GLint isLong ;
GLint type ;
GLint numComp ;
GLint stride ;
GLint divisor ;
GLvoid *ptr ;
GLint buffer ;
GLuint64 addr ;
GLuint64 len ;
GLint binding ;
GLint relOffset ;
};

GLboolean bindlessArrays = glIsEnabled( GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV );
VertexAttrState genericAttr[16];

for ( int i = 0; i < 16; i++ )
{
VertexAttrState &a = genericAttr[ i ];

glGetVertexAttribiv ( i, GL_VERTEX_ATTRIB_ARRAY_ENABLED , &a.enabled );
glGetVertexAttribiv ( i, GL_VERTEX_ATTRIB_ARRAY_SIZE , &a.numComp );
glGetVertexAttribiv ( i, GL_VERTEX_ATTRIB_ARRAY_STRIDE , &a.stride );
glGetVertexAttribiv ( i, GL_VERTEX_ATTRIB_ARRAY_TYPE , &a.type );
glGetVertexAttribiv ( i, GL_VERTEX_ATTRIB_ARRAY_NORMALIZED , &a.normalized );
glGetVertexAttribiv ( i, GL_VERTEX_ATTRIB_ARRAY_INTEGER , &a.isInteger );
glGetVertexAttribiv ( i, GL_VERTEX_ATTRIB_ARRAY_LONG , &a.isLong );
glGetVertexAttribiv ( i, GL_VERTEX_ATTRIB_ARRAY_DIVISOR , &a.divisor );
glGetVertexAttribPointerv ( i, GL_VERTEX_ATTRIB_ARRAY_POINTER , &a.ptr );
glGetVertexAttribiv ( i, GL_VERTEX_ATTRIB_ARRAY_BUFFER_BINDING, &a.buffer );
glGetVertexAttribiv ( i, GL_VERTEX_ATTRIB_BINDING , &a.binding );
glGetVertexAttribiv ( i, GL_VERTEX_ATTRIB_RELATIVE_OFFSET , &a.relOffset );

if ( glGetIntegerui64i_vNV )
{
glGetIntegerui64i_vNV ( GL_VERTEX_ATTRIB_ARRAY_ADDRESS_NV, i , &a.addr );
glGetIntegerui64i_vNV ( GL_VERTEX_ATTRIB_ARRAY_LENGTH_NV , i , &a.len );
}
}


Another possibility that just occurred to me is you may not be determining the vertex attribute indices properly (i.e. arguments to GLMesh() function). That is, you may be setting up "some" vertex attributes properly, but just not the ones the shader is expecting you to.

Related: you can force OpenGL to assign a specific vertex attribute index to a specific vertex attribute in the shader in a few different ways. One is to set up this association pre-shader-link in your C++ code with glBindAttribLocation() (https://www.khronos.org/opengl/wiki/Vertex_Shader#Inputs). Another is to explicitly specify the index in a layout declaration (https://www.khronos.org/opengl/wiki/Vertex_Shader#Inputs) in the shader.

[[EDIT:]] Another possibility that occurred to me is your app could be mixing generic vertex attributes and the old fixed-function vertex attributes. If you're going to do that, you need to be careful. Search your app for occurrences of: glEnableClientState, glDisableClientState, glVertexPointer, glNormalPointer, glColorPointer, glTexCoordPointer, just to name a few. If you don't find them, you likely don't have this problem.


I tried printing the values but I am not sure if I did it correctly. Apologies for my lack of knowledge, but I have only been coding in C for a couple of days and it is 15 years since I studied it in college. Here are the results; the addresses are empty on all of them, which doesn't look good:



voxsize:511644
VAO 0 enabled:1
VAO 0 numComp:3
VAO 0 stride:0
VAO 0 type:5126
VAO 0 normalized:0
VAO 0 isInteger:0
VAO 0 isLong:0
VAO 0 divisor:0
VAO 0 ptr:0
VAO 0 buffer:6
VAO 0 binding:0
VAO 0 relOffset:0
VAO 0 addr:0
VAO 0 len:0
VAO 1 enabled:1
VAO 1 numComp:1
VAO 1 stride:0
VAO 1 type:5121
VAO 1 normalized:0
VAO 1 isInteger:1
VAO 1 isLong:0
VAO 1 divisor:0
VAO 1 ptr:0
VAO 1 buffer:4
VAO 1 binding:1
VAO 1 relOffset:0
VAO 1 addr:0
VAO 1 len:0
VAO 2 enabled:1
VAO 2 numComp:4
VAO 2 stride:0
VAO 2 type:5126
VAO 2 normalized:0
VAO 2 isInteger:0
VAO 2 isLong:0
VAO 2 divisor:0
VAO 2 ptr:0
VAO 2 buffer:3
VAO 2 binding:2
VAO 2 relOffset:0
VAO 2 addr:0
VAO 2 len:0
VAO 3 enabled:1
VAO 3 numComp:1
VAO 3 stride:0
VAO 3 type:5121
VAO 3 normalized:0
VAO 3 isInteger:1
VAO 3 isLong:0
VAO 3 divisor:0
VAO 3 ptr:0
VAO 3 buffer:5
VAO 3 binding:3
VAO 3 relOffset:0
VAO 3 addr:0
VAO 3 len:0
VAO 4 enabled:0
VAO 4 numComp:4
VAO 4 stride:0
VAO 4 type:5126
VAO 4 normalized:0
VAO 4 isInteger:0
VAO 4 isLong:0
VAO 4 divisor:0
VAO 4 ptr:0
VAO 4 buffer:0
VAO 4 binding:4
VAO 4 relOffset:0
VAO 4 addr:0
VAO 4 len:0
VAO 5 enabled:0
VAO 5 numComp:4
VAO 5 stride:0
VAO 5 type:5126
VAO 5 normalized:0
VAO 5 isInteger:0
VAO 5 isLong:0
VAO 5 divisor:0
VAO 5 ptr:0
VAO 5 buffer:0
VAO 5 binding:5
VAO 5 relOffset:0
VAO 5 addr:0
VAO 5 len:0
VAO 6 enabled:0
VAO 6 numComp:4
VAO 6 stride:0
VAO 6 type:5126
VAO 6 normalized:0
VAO 6 isInteger:0
VAO 6 isLong:0
VAO 6 divisor:0
VAO 6 ptr:0
VAO 6 buffer:0
VAO 6 binding:6
VAO 6 relOffset:0
VAO 6 addr:0
VAO 6 len:0
VAO 7 enabled:0
VAO 7 numComp:4
VAO 7 stride:0
VAO 7 type:5126
VAO 7 normalized:0
VAO 7 isInteger:0
VAO 7 isLong:0
VAO 7 divisor:0
VAO 7 ptr:0
VAO 7 buffer:0
VAO 7 binding:7
VAO 7 relOffset:0
VAO 7 addr:0
VAO 7 len:0
VAO 8 enabled:0
VAO 8 numComp:4
VAO 8 stride:0
VAO 8 type:5126
VAO 8 normalized:0
VAO 8 isInteger:0
VAO 8 isLong:0
VAO 8 divisor:0
VAO 8 ptr:0
VAO 8 buffer:0
VAO 8 binding:8
VAO 8 relOffset:0
VAO 8 addr:0
VAO 8 len:0
VAO 9 enabled:0
VAO 9 numComp:4
VAO 9 stride:0
VAO 9 type:5126
VAO 9 normalized:0
VAO 9 isInteger:0
VAO 9 isLong:0
VAO 9 divisor:0
VAO 9 ptr:0
VAO 9 buffer:0
VAO 9 binding:9
VAO 9 relOffset:0
VAO 9 addr:0
VAO 9 len:0
VAO 10 enabled:0
VAO 10 numComp:4
VAO 10 stride:0
VAO 10 type:5126
VAO 10 normalized:0
VAO 10 isInteger:0
VAO 10 isLong:0
VAO 10 divisor:0
VAO 10 ptr:0
VAO 10 buffer:0
VAO 10 binding:10
VAO 10 relOffset:0
VAO 10 addr:0
VAO 10 len:0
VAO 11 enabled:0
VAO 11 numComp:4
VAO 11 stride:0
VAO 11 type:5126
VAO 11 normalized:0
VAO 11 isInteger:0
VAO 11 isLong:0
VAO 11 divisor:0
VAO 11 ptr:0
VAO 11 buffer:0
VAO 11 binding:11
VAO 11 relOffset:0
VAO 11 addr:0
VAO 11 len:0
VAO 12 enabled:0
VAO 12 numComp:4
VAO 12 stride:0
VAO 12 type:5126
VAO 12 normalized:0
VAO 12 isInteger:0
VAO 12 isLong:0
VAO 12 divisor:0
VAO 12 ptr:0
VAO 12 buffer:0
VAO 12 binding:12
VAO 12 relOffset:0
VAO 12 addr:0
VAO 12 len:0
VAO 13 enabled:0
VAO 13 numComp:4
VAO 13 stride:0
VAO 13 type:5126
VAO 13 normalized:0
VAO 13 isInteger:0
VAO 13 isLong:0
VAO 13 divisor:0
VAO 13 ptr:0
VAO 13 buffer:0
VAO 13 binding:13
VAO 13 relOffset:0
VAO 13 addr:0
VAO 13 len:0
VAO 14 enabled:0
VAO 14 numComp:4
VAO 14 stride:0
VAO 14 type:5126
VAO 14 normalized:0
VAO 14 isInteger:0
VAO 14 isLong:0
VAO 14 divisor:0
VAO 14 ptr:0
VAO 14 buffer:0
VAO 14 binding:14
VAO 14 relOffset:0
VAO 14 addr:0
VAO 14 len:0
VAO 15 enabled:0
VAO 15 numComp:4
VAO 15 stride:0
VAO 15 type:5126
VAO 15 normalized:0
VAO 15 isInteger:0
VAO 15 isLong:0
VAO 15 divisor:0
VAO 15 ptr:0
VAO 15 buffer:0
VAO 15 binding:15
VAO 15 relOffset:0
VAO 15 addr:0
VAO 15 len:0


I will look into which vertex attributes are being used now.

GClements
01-10-2017, 11:01 AM
I tried printing the values but I am not sure if I did it correctly. Apologies for my lack of knowledge, but I have only been coding in C for a couple of days and it is 15 years since I studied it in college. Here are the results; the addresses are empty on all of them, which doesn't look good:

Only attributes 0-3 are enabled. All of those have a non-zero buffer, so the problem isn't caused by reading from invalid client memory.
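
That check can be automated once the queried values are in memory. The following is only a minimal sketch in plain C: the AttrDump struct and both function names are hypothetical (a trimmed-down stand-in for the state struct used to produce the dump), not part of the posted program. It counts enabled attributes that have neither a buffer binding nor a client pointer, and names the two type values seen in the dump (5126 is GL_FLOAT, 5121 is GL_UNSIGNED_BYTE).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down copy of the per-attribute state queried above. */
typedef struct {
    int         enabled;  /* GL_VERTEX_ATTRIB_ARRAY_ENABLED */
    int         type;     /* GL_VERTEX_ATTRIB_ARRAY_TYPE */
    int         buffer;   /* GL_VERTEX_ATTRIB_ARRAY_BUFFER_BINDING */
    const void *ptr;      /* GL_VERTEX_ATTRIB_ARRAY_POINTER */
} AttrDump;

/* Name the two type enums that appear in the dump (values from the GL headers). */
static const char *attr_type_name(int type)
{
    switch (type) {
    case 0x1401: return "GL_UNSIGNED_BYTE"; /* 5121 */
    case 0x1406: return "GL_FLOAT";         /* 5126 */
    default:     return "other";
    }
}

/* Count enabled attributes that have neither a buffer object nor a
   client-memory pointer; such an attribute would be fetched from address 0. */
static int count_unbacked_attrs(const AttrDump *attrs, int n)
{
    int bad = 0;
    assert(attrs != NULL);
    for (int i = 0; i < n; i++) {
        if (attrs[i].enabled && attrs[i].buffer == 0 && attrs[i].ptr == NULL)
            bad++;
    }
    return bad;
}
```

For the dump above the count would be 0, since attributes 0-3 are bound to buffers 6, 4, 3 and 5. Note that a non-zero buffer binding says nothing about how many bytes that buffer actually holds.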

Dark Photon
01-10-2017, 05:03 PM
Here are the results, the addresses are empty on all of them which doesn't look good

No, for this dump it looks like you're OK, as GClements said. All the enabled vertex attributes have a non-zero buffer binding.

Is this a dump right before a glDrawArrays call that crashes? Or could this be a dump for an earlier call to glDrawArrays that worked?

jonathanbyrn
01-11-2017, 02:19 AM
No, for this dump it looks like you're OK, as GClements said. All the enabled vertex attributes have a non-zero buffer binding.

Is this a dump right before a glDrawArrays call that crashes? Or could this be a dump for an earlier call to glDrawArrays that worked?

Nope, that is right before the glDrawArrays call that crashes the code. I checked how the vertices are being stored and it is standard GLuints.

Dark Photon
01-11-2017, 05:24 AM
Nope, that is right before the glDrawArrays call that crashes the code. I checked how the vertices are being stored and it is standard GLuints.

Ok. If you don't figure it out, I'd suggest you reproduce your problem in a short stand-alone test program that folks can inspect and run. Here's a shell to paste your code into:


//------------------------------------------------------------------------------
// Stand-alone GLUT Test Program Shell
//------------------------------------------------------------------------------

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <GL/glew.h>
#include <GL/glut.h>

//-----------------------------------------------------------------------

void checkGLError( const char hdr[] )
{
GLenum err = glGetError(); // glGetError() returns a GLenum; GL_NO_ERROR is 0
if( err )
{
fprintf(stderr, "ERROR %s: %s\n", hdr, gluErrorString(err));
exit(1);
}
}

//-----------------------------------------------------------------------

void init()
{
//------------------------------------------------------------
// Put your GL resource creation here
// (buffer objects, textures, etc.)
//------------------------------------------------------------
}

//-----------------------------------------------------------------------

void reshape( int width, int height )
{
glViewport(0, 0, width, height);
}

//-----------------------------------------------------------------------

void display()
{
static float angle = 0.0;

// Clear screen
int err=0;
glClearColor( 0.1f, 0.1f, 0.43f, 1.0f );
glClear( GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT );

// Load up PROJECTION and MODELVIEW
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(-2,2,-2,2,-2,2);

glMatrixMode(GL_MODELVIEW);
glLoadIdentity();

//------------------------------------------------------------
// Put your GL draw calls here.
//------------------------------------------------------------

// Swap
glutSwapBuffers();

// Cause display() to be called again.
glutPostRedisplay();
checkGLError( "End of display()" );
}

//-----------------------------------------------------------------------

void keyboard( unsigned char key, int x, int y )
{
switch (key)
{
case 27: // ESC quits
exit(0);
break;
}
}

int main( int argc, char** argv )
{
// Init GLUT
glutInit( &argc, argv );
glutInitDisplayMode( GLUT_RGBA | GLUT_DEPTH | GLUT_DOUBLE );
glutCreateWindow( argv[0] );

glutKeyboardFunc( keyboard );
glutDisplayFunc( display );
glutReshapeFunc( reshape );

glutReshapeWindow( 400,400 );

// Init GLEW
GLenum err = glewInit();
if ( err != GLEW_OK )
{
// Problem: glewInit failed, something is seriously wrong.
fprintf( stderr, "Error: %s\n", glewGetErrorString(err) );
exit(1);
}

printf( "GL_RENDERER = %s\n", glGetString( GL_RENDERER) );

init();

glutMainLoop();
return 0;
}

jonathanbyrn
01-11-2017, 06:13 AM
Ok. If you don't figure it out, I'd suggest you reproduce your problem in a short stand-alone test program that folks can inspect and run. Here's a shell to paste your code into:


...


I have started working back through the tutorials to get a better understanding of OpenGL. I will try to port the existing codebase into the test program without giving too much away while still being able to reproduce the error. Thanks for all the advice so far!

jonathanbyrn
01-12-2017, 08:37 AM
Hi,
I tried to create a trimmed-down program to recreate the bug but it has already started ballooning. The original is 30,000 LOC and I think at best I can trim it down to 2,000-3,000 lines and a couple of library imports. Is that okay or a waste of everyone's time?

Dark Photon
01-12-2017, 04:25 PM
Hi,
I tried to create a trimmed-down program to recreate the bug but it has already started ballooning. The original is 30,000 LOC and I think at best I can trim it down to 2,000-3,000 lines and a couple of library imports. Is that okay or a waste of everyone's time?

I think it'll be helpful to you, either by adding more and more code into a contained test program and noticing when your problem develops, or by whittling back your application code until the problem goes away.

As for the utility of getting feedback from others, the main thing is that it be easily compilable, runnable, and siftable by others. The shorter you can make it, the more likely folks will try it and help you find your problem.

I suspect as you pare back your application code and learn more, you'll probably get the relevant code down to a pretty small size and possibly figure it out yourself.

jonathanbyrn
01-17-2017, 04:55 AM
Hi,
So it took much longer than expected, but after much wailing and gnashing of teeth I managed to create a trimmed version of the code. I learned a lot about the code base and about how it uses OpenGL, but I still couldn't find the bug. I also learned how to create a stubbed version of C code, a new life skill! Initially I just output the vertices and drew them in a demo program, but that worked and told me nothing about the bug. Instead I started with the code base and skeletonised it down to a single instance where I could demo the bug. I left in the camera controls so that it is possible to look around the scene:
https://dl.dropboxusercontent.com/u/3440275/trimmed.zip
Once the code is built you can see it working by running:
./build/viewer
If you press s, the camera moves backwards and you can see the axes.
If you run:
./build/viewer -b test.bin
it will open a mesh which crashes on my machine but runs on every other machine I have tested it on.

Again thanks for all your help so far, it has been a good learning experience. I don't expect you to debug the code; just letting me know whether you can recreate the bug would be helpful.

Dark Photon
01-17-2017, 04:33 PM
I managed to create a trimmed version of the code. ...:
https://dl.dropboxusercontent.com/u/3440275/trimmed.zip

Great job! I had no problem fetching, building, and running it here.


Once the code is built you can see it working by running:
./build/viewer
If you press s, the camera moves backwards and you can see the axes.

Works fine here.


If you run:
./build/viewer -b test.bin
it will open a mesh which crashes on my machine but runs on every other machine I have tested it on.
...
I don't expect you to debug the code; just letting me know whether you can recreate the bug would be helpful.

Good news. It crashes on startup here as well. So it's not just something odd about your machine's setup.

Details:
* GPU: NVidia GeForce GTX 760
* GPU Driver: NVidia 370.28

To provide you more info, I went back, compiled the app with "-g" (adds debugging info), and ran it under valgrind (http://valgrind.org/)'s memcheck tool. Here's what I see:



==6514== Memcheck, a memory error detector
==6514== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==6514== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==6514== Command: build/viewer -b test.bin
==6514==
==6514== Conditional jump or move depends on uninitialised value(s)
==6514== at 0x4E43356: svo_set(svo*, unsigned long, unsigned long, unsigned long) (svo.c:682)
==6514== by 0x4E4287B: svo_aset(svo_acc*, unsigned long, unsigned long, unsigned long) (svo.c:423)
==6514== by 0x4E39E54: binvox(svo_acc*, unsigned char*, unsigned long, float*) (files.c:201)
==6514== by 0x4E39811: loadBinvox(svo*, char const*, float*) (files.c:59)
==6514== by 0x402CB8: main (viewer.c:263)
==6514==
==6514== Syscall param sendmsg(msg.msg_iov[0]) points to uninitialised byte(s)
==6514== at 0x5792060: __sendmsg_nocancel (in /lib64/libpthread-2.18.so)
==6514== by 0x97B65E6: ??? (in /usr/lib64/libGLX_nvidia.so.370.28)
==6514== by 0x97B2448: ??? (in /usr/lib64/libGLX_nvidia.so.370.28)
==6514== by 0x974AB0D: ??? (in /usr/lib64/libGLX_nvidia.so.370.28)
==6514== by 0xAC50165: ??? (in /usr/lib64/libnvidia-glcore.so.370.28)
==6514== by 0xAC48903: ??? (in /usr/lib64/libnvidia-glcore.so.370.28)
==6514== by 0xAC4A562: ??? (in /usr/lib64/libnvidia-glcore.so.370.28)
==6514== by 0xAC4AD37: ??? (in /usr/lib64/libnvidia-glcore.so.370.28)
==6514== by 0x979F8D9: ??? (in /usr/lib64/libGLX_nvidia.so.370.28)
==6514== by 0xAC46EC7: ??? (in /usr/lib64/libnvidia-glcore.so.370.28)
==6514== by 0xAC47B08: ??? (in /usr/lib64/libnvidia-glcore.so.370.28)
==6514== by 0x974ADB5: ??? (in /usr/lib64/libGLX_nvidia.so.370.28)
==6514== Address 0x7feffdfdc is on thread 1's stack
==6514==
Renderer
Volume dimensions: 64 * 64 * 64
Volume size = 10.03 KiB
Extracting mesh... ==6514== Thread 3:
==6514== Conditional jump or move depends on uninitialised value(s)
==6514== at 0x4E42326: svo_aget_colour(svo_acc*, unsigned long, unsigned long, unsigned long, Colour*) (svo.c:284)
==6514== by 0x4E424E6: FEMAGetSurroundingVoxels(svo_acc*, unsigned long, unsigned long, unsigned long, Voxel*) (svo.c:322)
==6514== by 0x4E3CF11: extractNoColour(svo_acc*, Region*) (render.c:687)
==6514== by 0x4E3E3D7: getMesh(void*) (render.c:1063)
==6514== by 0x578B0DA: start_thread (in /lib64/libpthread-2.18.so)
==6514== by 0x62A9E3C: clone (in /lib64/libc-2.18.so)
==6514==
==6514== Thread 2:
==6514== Conditional jump or move depends on uninitialised value(s)
==6514== at 0x4E42334: svo_aget_colour(svo_acc*, unsigned long, unsigned long, unsigned long, Colour*) (svo.c:284)
==6514== by 0x4E424E6: FEMAGetSurroundingVoxels(svo_acc*, unsigned long, unsigned long, unsigned long, Voxel*) (svo.c:322)
==6514== by 0x4E3CF11: extractNoColour(svo_acc*, Region*) (render.c:687)
==6514== by 0x4E3E3D7: getMesh(void*) (render.c:1063)
==6514== by 0x578B0DA: start_thread (in /lib64/libpthread-2.18.so)
==6514== by 0x62A9E3C: clone (in /lib64/libc-2.18.so)
==6514==
==6514== Conditional jump or move depends on uninitialised value(s)
==6514== at 0x4E42342: svo_aget_colour(svo_acc*, unsigned long, unsigned long, unsigned long, Colour*) (svo.c:284)
==6514== by 0x4E424E6: FEMAGetSurroundingVoxels(svo_acc*, unsigned long, unsigned long, unsigned long, Voxel*) (svo.c:322)
==6514== by 0x4E3CF11: extractNoColour(svo_acc*, Region*) (render.c:687)
==6514== by 0x4E3E3D7: getMesh(void*) (render.c:1063)
==6514== by 0x578B0DA: start_thread (in /lib64/libpthread-2.18.so)
==6514== by 0x62A9E3C: clone (in /lib64/libc-2.18.so)
==6514==
==6514== Thread 1:
==6514== Invalid read of size 4
==6514== at 0x405FF4A: ??? (in /var/tmp/.glqOjG41 (deleted))
==6514== by 0xAE01BB3: ??? (in /usr/lib64/libnvidia-glcore.so.370.28)
==6514== by 0xAE06DC7: ??? (in /usr/lib64/libnvidia-glcore.so.370.28)
==6514== by 0xA9E81C7: ??? (in /usr/lib64/libnvidia-glcore.so.370.28)
==6514== by 0x403CCF: main (viewer.c:501)
==6514== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==6514==
==6514==
==6514== Process terminating with default action of signal 11 (SIGSEGV)
==6514== Access not within mapped region at address 0x0
==6514== at 0x405FF4A: ??? (in /var/tmp/.glqOjG41 (deleted))
==6514== by 0xAE01BB3: ??? (in /usr/lib64/libnvidia-glcore.so.370.28)
==6514== by 0xAE06DC7: ??? (in /usr/lib64/libnvidia-glcore.so.370.28)
==6514== by 0xA9E81C7: ??? (in /usr/lib64/libnvidia-glcore.so.370.28)
==6514== by 0x403CCF: main (viewer.c:501)
==6514== If you believe this happened as a result of a stack
==6514== overflow in your program's main thread (unlikely but
==6514== possible), you can try to increase the size of the
==6514== main thread stack using the --main-stacksize= flag.
==6514== The main thread stack size used in this run was 8388608.
finished in 6.98 seconds
Voxel mesh size = 0.49 MiB
Controls:
Move camera: w, a, s, d
Rotate camera: h, j, k, l
Toggle grid: o
Toggle axes: x
Toggle wireframe: v
Toggle lighting: n
Toggle level of detail: t
Change level of detail: 0-9
Toggle flashlight: g
Exit: q
==6514==
==6514== HEAP SUMMARY:
==6514== in use at exit: 6,067,577 bytes in 8,628 blocks
==6514== total heap usage: 16,868 allocs, 8,240 frees, 226,016,506 bytes allocated
==6514==
==6514== LEAK SUMMARY:
==6514== definitely lost: 488 bytes in 8 blocks
==6514== indirectly lost: 201,120 bytes in 8 blocks
==6514== possibly lost: 1,331,997 bytes in 3,266 blocks
==6514== still reachable: 4,533,972 bytes in 5,346 blocks
==6514== suppressed: 0 bytes in 0 blocks
==6514== Rerun with --leak-check=full to see details of leaked memory
==6514==
==6514== For counts of detected and suppressed errors, rerun with: -v
==6514== Use --track-origins=yes to see where uninitialised values come from
==6514== ERROR SUMMARY: 1981 errors from 6 contexts (suppressed: 2 from 2)


As you can see, there are some uninitialized memory reads in your code, and then a bad read of address 0x0 (a NULL pointer) down in the NVidia driver inside the glDrawArrays() for your mesh ("Invalid read of size 4"), just as you said was happening. My suspicion is your app is providing this NULL pointer to GL to read, but I'll look into this as time permits.

Dark Photon
01-17-2017, 08:21 PM
Good news. It crashes on startup here as well. ... I'll look into this as time permits.

Ok, it looks like you need to do some length checking on your mesh vertex arrays to make sure they provide sufficient bytes to satisfy all vertices in the draw call.

Here's a hack (not a fix) to your program that will reveal one problem with it (the one that was causing the crash in the driver on the voxel mesh glDrawArrays draw call) and get it to at least come up:

In GLMesh(), make this change:



< if (mesh.colours.array) {
---
> if (mesh.colours.array && mesh.colours.length) {


You'll notice that the colors array has a non-null pointer but a 0 length. Oops. With the way your code is written, that ends up allocating a VBO containing 0 bytes, and the subsequent draw call instructs OpenGL (the GPU+driver) to go romping off the end of that 0-byte VBO to fetch colors for each vertex in that draw call.
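
A length check of the kind suggested above could look roughly like this. It is a minimal sketch in plain C, not the program's actual code: attr_bytes_needed and attr_buffer_ok are hypothetical helpers, and it assumes the caller knows how many bytes were uploaded to each attribute's VBO and how the attribute is laid out (component count, component size, stride, offset).

```c
#include <assert.h>
#include <stddef.h>

/* Bytes that a draw of `count` vertices reads from one attribute's buffer.
   A `stride` of 0 means tightly packed, as in glVertexAttribPointer. */
static size_t attr_bytes_needed(size_t count, size_t components,
                                size_t component_size, size_t stride,
                                size_t offset)
{
    size_t element = components * component_size;
    assert(element > 0);
    if (count == 0)
        return 0;
    if (stride == 0)
        stride = element;
    /* The last vertex starts at offset + (count - 1) * stride. */
    return offset + (count - 1) * stride + element;
}

/* Returns 1 when a buffer of `buffer_bytes` covers every vertex in the draw,
   0 otherwise. */
static int attr_buffer_ok(size_t buffer_bytes, size_t count,
                          size_t components, size_t component_size,
                          size_t stride, size_t offset)
{
    return buffer_bytes >= attr_bytes_needed(count, components,
                                             component_size, stride, offset);
}
```

As an illustration, for a draw of 1000 vertices with a tightly packed 4-component float attribute, attr_bytes_needed gives 16000 bytes, so a 0-byte colour buffer fails the check. GLMesh() could then leave that attribute disabled or report the mismatch rather than issue the draw call.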

jonathanbyrn
01-18-2017, 02:17 AM
Ok, it looks like you need to do some length checking on your mesh vertex arrays to make sure they provide sufficient bytes to satisfy all vertices in the draw call.

Here's a hack (not a fix) to your program that will reveal one problem with it (the one that was causing the crash in the driver on the voxel mesh glDrawArrays draw call) and get it to at least come up:

In GLMesh(), make this change:



< if (mesh.colours.array) {
---
> if (mesh.colours.array && mesh.colours.length) {


You'll notice that the colors array has a non-null pointer but a 0 length. Oops. With the way your code is written, that ends up allocating a VBO containing 0 bytes, and the subsequent draw call instructs OpenGL (the GPU+driver) to go romping off the end of that 0-byte VBO to fetch colors for each vertex in that draw call.

You hero! I was sure it had something to do with the vertices, but I was looking at the wrong array. Well, now I have enough info to work out what is going wrong. This has been one of the most helpful forums I have been on. When I started on this forum I had no tools to debug an OpenGL crash that was happening at runtime; now I have a couple of useful tools and approaches under my belt.
Thanks again for going above and beyond the call of duty!

Dark Photon
01-18-2017, 05:23 AM
Sure thing! Good luck with your project.