OpenGLES 2.0 Invalid Accesses / Complete Desktop Crash

I don’t even know what happened. I’ve created a veritable abomination. This devil code is completely beyond me. I’ve been digging through GDB and Valgrind for hours, but nothing is helping me figure out what I did wrong.

Anyways, on to the actual problem. I’ve only included the code that is run during initialization (which I’m pretty sure is the culprit) but I’ll post a copy of my whole codebase if it isn’t. What first caught my attention was that valgrind was reporting thousands of invalid write operations from OpenGL.

This is the first occurrence.


==22432== Invalid write of size 4
==22432==    at 0xA1FD38C: brw_emit_pipe_control (brw_pipe_control.c:197)
==22432==    by 0xA1FC060: brw_emit_select_pipeline (brw_misc_state.c:461)
==22432==    by 0xA1FC25E: brw_upload_invariant_state (brw_misc_state.c:545)
==22432==    by 0xA2034BB: brw_upload_initial_gpu_state (brw_state_upload.c:63)
==22432==    by 0xA2034BB: brw_init_state (brw_state_upload.c:186)
==22432==    by 0xA1F2923: brwCreateContext (brw_context.c:1017)
==22432==    by 0xA1B570B: driCreateContextAttribs (dri_util.c:479)
==22432==    by 0x7F1350B: ??? (in /usr/lib/x86_64-linux-gnu/libEGL_mesa.so.0.0.0)
==22432==    by 0x7F0BC2A: eglCreateContext (in /usr/lib/x86_64-linux-gnu/libEGL_mesa.so.0.0.0)
==22432==    by 0x113C50: glesSetupOpenGLES() (window.cpp:72)
==22432==    by 0x112EA0: main (main.cpp:69)
==22432==  Address 0x7fe384c99004 is not stack'd, malloc'd or (recently) free'd


==22390== Invalid write of size 4
==22390==    at 0xA448010: GEN75_3DSTATE_SBE_pack (gen75_pack.h:4652)
==22390==    by 0xA448010: blorp_emit_sf_config (blorp_genX_exec.h:665)
==22390==    by 0xA448010: blorp_emit_pipeline (blorp_genX_exec.h:1200)
==22390==    by 0xA448B7C: blorp_exec (blorp_genX_exec.h:1617)
==22390==    by 0xA448B7C: gen75_blorp_exec (genX_blorp_exec.c:284)
==22390==    by 0xA424FBC: blorp_fast_clear (blorp_clear.c:337)
==22390==    by 0xA1EA1F9: do_single_blorp_clear (brw_blorp.c:1267)
==22390==    by 0xA1EC1CF: brw_blorp_clear_color (brw_blorp.c:1333)
==22390==    by 0xA1EEEFB: brw_clear (brw_clear.c:297)
==22390==    by 0x113D97: glesSetupOpenGLES() (window.cpp:82)
==22390==    by 0x112EA0: main (main.cpp:69)
==22390==  Address 0x7f364a822300 is not stack'd, malloc'd or (recently) free'd

But wait, there’s more. If I run it without valgrind, the window pops up and then my entire desktop crashes on that screen. The HDMI port isn’t even outputting a signal anymore. It comes back if I log out or change the display settings, but it’s fairly catastrophic. Yet there are NO error codes or segfaults in the actual program. My best guess is that I somehow configured OpenGL to overflow a buffer and corrupt active memory. But I can’t for the life of me figure out why. Oh, I should also mention that it only crashes the desktop sometimes, regardless of code changes. Please help me!

glesCheckOpenGL() is just a macro that throws an error if glGetError() returns anything other than GL_NO_ERROR.
glesAssert() just checks the condition and throws if it is false.
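The post doesn't show these helpers, so here is a minimal sketch of what they might look like. Everything in it is an assumption: the gles_error definition, the glesErrorName and glesThrowOnError names, and the macro form in the trailing comment are hypothetical. The error code is passed in as a parameter so the logic can be exercised without a GL context; the numeric values are the ones defined in <GLES2/gl2.h>.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the gles_error type used in the post.
struct gles_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Throws when the condition is false, as glesAssert is described as doing.
inline void glesAssert(bool condition, const std::string &message) {
    if (!condition) throw gles_error(message);
}

// Maps a glGetError() result to a readable name. The numeric values match
// the constants defined in <GLES2/gl2.h>.
inline const char *glesErrorName(unsigned int code) {
    switch (code) {
        case 0x0000: return "GL_NO_ERROR";
        case 0x0500: return "GL_INVALID_ENUM";
        case 0x0501: return "GL_INVALID_VALUE";
        case 0x0502: return "GL_INVALID_OPERATION";
        case 0x0505: return "GL_OUT_OF_MEMORY";
        case 0x0506: return "GL_INVALID_FRAMEBUFFER_OPERATION";
        default:     return "unknown GL error";
    }
}

// Throws if the given error code is anything other than GL_NO_ERROR.
inline void glesThrowOnError(unsigned int code, const std::string &where) {
    if (code != 0x0000) {
        throw gles_error(where + ": " + glesErrorName(code));
    }
}

// With a context current, the macro could then be written as:
// #define glesCheckOpenGL(msg) glesThrowOnError(glGetError(), (msg))
```

Note that glGetError() is only meaningful once a context has been made current, so a check like this is best placed after eglMakeCurrent() has succeeded.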

GLES Initialization Routine


static EGLDisplay eglDisplay;
static EGLSurface eglSurface;
static EGLContext eglContext;

static EGLNativeWindowType eglNativeWindow;
static EGLNativeDisplayType eglNativeDisplay;

void glesSetupOpenGLES() {
    EGLBoolean result;
    EGLConfig config;
    EGLint majorVersion;
    EGLint minorVersion;
    EGLint contextAttribs[] = {EGL_CONTEXT_CLIENT_VERSION, 2, EGL_NONE,
                               EGL_NONE};

    result = eglBindAPI(EGL_OPENGL_ES_API);
    glesAssert(result != EGL_FALSE, "Cannot Bind API");

    glesCreateWindow(eglNativeWindow, eglNativeDisplay);

    eglDisplay = eglGetDisplay(eglNativeDisplay);
    if (eglDisplay == EGL_NO_DISPLAY) throw gles_error("Cannot Get Display");
    glesCheckOpenGL("");

    // Initialize EGL
    result = eglInitialize(eglDisplay, &majorVersion, &minorVersion);
    if (EGL_FALSE == result) throw gles_error("Cannot Initialize EGL");
    glesCheckOpenGL("");

    {
        EGLint numConfigs = 0;
        EGLint attribList[] = {
            EGL_RED_SIZE, 6,
            EGL_GREEN_SIZE, 6,
            EGL_BLUE_SIZE, 6,
            EGL_ALPHA_SIZE, 8,
            EGL_DEPTH_SIZE, 16, // You need this line for depth buffering to work
            EGL_SURFACE_TYPE, EGL_WINDOW_BIT,
            EGL_NONE};

        // Get configs
        result = eglGetConfigs(eglDisplay, NULL, 0, &numConfigs);
        glesAssert(result != EGL_FALSE, "Cannot Get Configs");
        // Choose config
        result =
            eglChooseConfig(eglDisplay, attribList, &config, 1, &numConfigs);
        if (EGL_FALSE == result) throw gles_error("Cannot Choose Config");
        if (numConfigs < 1) throw gles_error("No Valid Configurations");
    }
    glesCheckOpenGL("");

    // Create a surface
    eglSurface =
        eglCreateWindowSurface(eglDisplay, config, eglNativeWindow, NULL);
    if (eglSurface == EGL_NO_SURFACE)
        throw gles_error("Cannot Create Window Surface");
    glesCheckOpenGL("");

    // Create a GL context
    eglContext =
        eglCreateContext(eglDisplay, config, EGL_NO_CONTEXT, contextAttribs);
    if (eglContext == EGL_NO_CONTEXT) throw gles_error("Cannot Create Context");
    glesCheckOpenGL("");

    // Make the context current
    result = eglMakeCurrent(eglDisplay, eglSurface, eglSurface, eglContext);
    if (EGL_FALSE == result) throw gles_error("Cannot Assign Context");
    glesCheckOpenGL("");

    glClearColor(0.0f, 0.0f, 0.0f, 1.0f);
    glClear(GL_COLOR_BUFFER_BIT);
    glViewport(0, 0, 240, 320);
    glesCheckOpenGL("");
}

The window routine is separate because I have another file which creates a window on a Raspberry Pi without X11. The X11 version is really just so I can test and iterate on my desktop.

X11 Window Routine



// X11 related local variables
static Display *x_display = NULL;
static Atom s_wmDeleteMessage;

void glesCreateWindow(EGLNativeWindowType &eglNativeWindow, EGLNativeDisplayType &eglNativeDisplay)
{
    Window root;
    XSetWindowAttributes swa;
    XSetWindowAttributes xattr;
    Atom wm_state;
    XEvent xev;
    EGLConfig ecfg;
    EGLint num_config;
    Window win;

    /*
     * X11 native display initialization
     */

    x_display = XOpenDisplay(NULL);
    if (x_display == NULL)
        throw gles_error("Cannot Open X Display");

    root = DefaultRootWindow(x_display);

    swa.event_mask = ExposureMask | PointerMotionMask | KeyPressMask;
    win = XCreateWindow(
        x_display, root,
        0, 0, 240, 320, 0,
        CopyFromParent, InputOutput,
        CopyFromParent, CWEventMask,
        &swa);
    s_wmDeleteMessage = XInternAtom(x_display, "WM_DELETE_WINDOW", 0);
    XSetWMProtocols(x_display, win, &s_wmDeleteMessage, 1);

    xattr.override_redirect = 0;
    XChangeWindowAttributes(x_display, win, CWOverrideRedirect, &xattr);

    XWMHints hints;
    hints.input = 1;
    hints.flags = InputHint;
    XSetWMHints(x_display, win, &hints);

    XSizeHints shints;
    shints.min_width = 240;
    shints.min_height = 320;
    shints.min_aspect.x = 3;
    shints.min_aspect.y = 4;
    shints.max_aspect.x = 3;
    shints.max_aspect.y = 4;
    shints.flags = PMinSize | PAspect;
    XSetWMNormalHints(x_display, win, &shints);

    // make the window visible on the screen
    XMapWindow(x_display, win);
    XStoreName(x_display, win, "PrintrbotUI");

    // get identifiers for the provided atom name strings
    wm_state = XInternAtom(x_display, "_NET_WM_STATE", 0);

    memset(&xev, 0, sizeof(xev));
    xev.type = ClientMessage;
    xev.xclient.window = win;
    xev.xclient.message_type = wm_state;
    xev.xclient.format = 32;
    xev.xclient.data.l[0] = 1;
    xev.xclient.data.l[1] = 0;
    XSendEvent(
        x_display,
        DefaultRootWindow(x_display),
        0,
        SubstructureNotifyMask,
        &xev);

    eglNativeWindow = (EGLNativeWindowType)win;
    eglNativeDisplay = (EGLNativeDisplayType)x_display;
}

[QUOTE=legoabram;1291831]…I’ve been digging through GDB and Valgrind for hours, but nothing is helping me figure out what I did wrong.
… I’ve only included the code that is run during initialization (which I’m pretty sure is the culprit) but I’ll post a copy of my whole codebase if it isn’t.
What first caught my attention was that valgrind was reporting thousands of invalid write operations from OpenGL.[/QUOTE]

I love valgrind. It’s an awesome tool! I really miss it on Windows.

However, it doesn’t always know what it needs to be able to sanity check some memory accesses and code deep in graphics drivers. In fact, IIRC it is delivered with some default suppressions for graphics drivers to avoid these littering your log. I recall having to suppress valgrind logging suspected errors deep inside NVidia usermode GL driver code where it had things like self-modifying code. Not sure if that’s still a problem.

So just keep in mind when you see a valgrind log deep inside a graphics call that, while this “might” be due to you passing in bad memory or giving invalid commands to the graphics driver (so you should check your code at and just before that callstack carefully), it might instead just be something the driver is doing right that valgrind is misinterpreting.
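One way to check your code "at and just before that callstack" is to make every EGL failure say why it failed. EGL calls report their errors through eglGetError() rather than glGetError(), so a small name lookup helps here. This is only a sketch: eglErrorName and describeEglFailure are hypothetical helper names, and the numeric values are the ones defined in <EGL/egl.h>.

```cpp
#include <string>

// Maps an eglGetError() result to its symbolic name. The numeric values
// match the constants defined in <EGL/egl.h>.
inline const char *eglErrorName(int code) {
    switch (code) {
        case 0x3000: return "EGL_SUCCESS";
        case 0x3001: return "EGL_NOT_INITIALIZED";
        case 0x3002: return "EGL_BAD_ACCESS";
        case 0x3003: return "EGL_BAD_ALLOC";
        case 0x3004: return "EGL_BAD_ATTRIBUTE";
        case 0x3005: return "EGL_BAD_CONFIG";
        case 0x3006: return "EGL_BAD_CONTEXT";
        case 0x3007: return "EGL_BAD_CURRENT_SURFACE";
        case 0x3008: return "EGL_BAD_DISPLAY";
        case 0x3009: return "EGL_BAD_MATCH";
        case 0x300A: return "EGL_BAD_NATIVE_PIXMAP";
        case 0x300B: return "EGL_BAD_NATIVE_WINDOW";
        case 0x300C: return "EGL_BAD_PARAMETER";
        case 0x300D: return "EGL_BAD_SURFACE";
        case 0x300E: return "EGL_CONTEXT_LOST";
        default:     return "unknown EGL error";
    }
}

// Builds a message such as "eglCreateContext failed: EGL_BAD_CONFIG".
inline std::string describeEglFailure(const std::string &call, int code) {
    return call + " failed: " + eglErrorName(code);
}

// Possible use inside the init routine (requires <EGL/egl.h>):
// if (eglContext == EGL_NO_CONTEXT)
//     throw gles_error(describeEglFailure("eglCreateContext", eglGetError()));
```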

BTW, what exactly are you trying to do? Render an EGL+GLES app on top of Mesa drivers on an Intel GPU with DRI?

[QUOTE]But wait, there’s more. If I run it without valgrind, the window pops up and then my entire desktop crashes on that screen.[/QUOTE]

That is a bug in your graphics drivers or your desktop window manager. No matter what you do from a user-mode graphics app, you shouldn’t be able to crash more than just your app.

[QUOTE]I can’t for the life of me figure out why.[/QUOTE]

1. Disable most of your code.
2. Start with something that’s solid and works 100% of the time.
3. Gradually add in functionality until it breaks.
4. Dive deep once you know what’s triggering this driver bug.
Once you get to that point, post a short standalone test program for others to test on their graphics drivers and/or provide feedback on. I can test it on NVidia’s Linux drivers using their EGL+GLES support. You or I could also test it on Mesa’s EGL+GLES support but w/o the Intel DRI back-end.
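If it helps, the "gradually add in functionality" steps can be made mechanical by splitting initialization into named stages and running only the first few. A rough sketch, assuming you wrap each step (window creation, eglInitialize, context creation, the first clear, and so on) in its own callable; the Stage type and runStages function are hypothetical names, not part of any library.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// One named initialization step; the callable returns true on success.
using Stage = std::pair<std::string, std::function<bool()>>;

// Runs at most the first `limit` stages in order and returns the name of
// the last stage that completed, or an empty string if none did. Stops
// early if a stage reports failure.
inline std::string runStages(const std::vector<Stage> &stages,
                             std::size_t limit) {
    std::string lastCompleted;
    for (std::size_t i = 0; i < stages.size() && i < limit; ++i) {
        if (!stages[i].second()) break;
        lastCompleted = stages[i].first;
    }
    return lastCompleted;
}
```

Raise the limit by one per run (for example from a command-line argument) and note the first value at which the screen blanks; the stage added at that step is the one to dig into.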

That would make sense, and the Mesa driver doesn’t seem to have a strong reputation with valgrind. I’m working on a suppression file that should help isolate my program.

The target platform is a Raspberry Pi without a window manager. I’ve hacked an LCD onto the DPI ports. Ultimately, getting the program to run on Mesa is just for convenience’s sake: faster compiling, faster debugging, faster iteration, etc. I have to use OpenGLES 2 specifically because that is what’s available on the RPi without a window manager. All the weirdness I’ve been encountering just has me concerned that I’m doing something egregiously wrong that will bite me in the butt later.

Specifically, it’s just the graphics context that gets buggered. Every program still runs perfectly fine. It’s just that the screen turns off. My apologies on the word choice.

[QUOTE=Dark Photon;1291836]
Once you get to that point, post a short standalone test program for others to test on their graphics drivers and/or provide feedback on. I can test it on NVidia’s Linux drivers using their EGL+GLES support. You or I could also test it on Mesa’s EGL+GLES support but w/o the Intel DRI back-end.[/QUOTE]
I’ve isolated the context generation code into the following code block. It should make a window, load OpenGLES 2.0, clear to black, and close after 1 second. It compiles using

g++ whatever_you_feel_like.cpp -lX11 -lGLESv2 -lEGL

And yes, I know, this is not very clean code. I’ve been hacking at it for hours to see what changes. When I know that it works perfectly, I’ll tidy it up.



#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <string.h>
#include <X11/Xlib.h>
#include <X11/Xatom.h>
#include <X11/Xutil.h>
#include <sys/time.h>

static EGLDisplay eglDisplay;
static EGLSurface eglSurface;
static EGLContext eglContext;

static EGLNativeWindowType eglNativeWindow;
static EGLNativeDisplayType eglNativeDisplay;

// X11 related local variables
static Display *x_display = NULL;
static Atom s_wmDeleteMessage;

void glesCreateWindow(EGLNativeWindowType &eglNativeWindow, EGLNativeDisplayType &eglNativeDisplay)
{
    Window root;
    XSetWindowAttributes swa;
    XSetWindowAttributes xattr;
    Atom wm_state;
    XEvent xev;
    EGLConfig ecfg;
    EGLint num_config;
    Window win;

    /*
     * X11 native display initialization
     */

    x_display = XOpenDisplay(NULL);
    if (x_display == NULL)
        throw "Cannot Open X Display";

    root = DefaultRootWindow(x_display);

    swa.event_mask = ExposureMask | PointerMotionMask | KeyPressMask;
    win = XCreateWindow(
        x_display, root,
        0, 0, 240, 320, 0,
        CopyFromParent, InputOutput,
        CopyFromParent, CWEventMask,
        &swa);
    s_wmDeleteMessage = XInternAtom(x_display, "WM_DELETE_WINDOW", 0);
    XSetWMProtocols(x_display, win, &s_wmDeleteMessage, 1);

    xattr.override_redirect = 0;
    XChangeWindowAttributes(x_display, win, CWOverrideRedirect, &xattr);

    XWMHints hints;
    hints.input = 1;
    hints.flags = InputHint;
    XSetWMHints(x_display, win, &hints);

    XSizeHints shints;
    shints.min_width = 240;
    shints.min_height = 320;
    shints.min_aspect.x = 3;
    shints.min_aspect.y = 4;
    shints.max_aspect.x = 3;
    shints.max_aspect.y = 4;
    shints.flags = PMinSize | PAspect;
    XSetWMNormalHints(x_display, win, &shints);

    // make the window visible on the screen
    XMapWindow(x_display, win);
    XStoreName(x_display, win, "PrintrbotUI");

    // get identifiers for the provided atom name strings
    wm_state = XInternAtom(x_display, "_NET_WM_STATE", 0);

    memset(&xev, 0, sizeof(xev));
    xev.type = ClientMessage;
    xev.xclient.window = win;
    xev.xclient.message_type = wm_state;
    xev.xclient.format = 32;
    xev.xclient.data.l[0] = 1;
    xev.xclient.data.l[1] = 0;
    XSendEvent(
        x_display,
        DefaultRootWindow(x_display),
        0,
        SubstructureNotifyMask,
        &xev);

    eglNativeWindow = (EGLNativeWindowType)win;
    eglNativeDisplay = (EGLNativeDisplayType)x_display;
}


void glesSetupOpenGLES() {
    EGLBoolean result;
    EGLConfig config;
    EGLint majorVersion;
    EGLint minorVersion;
    EGLint contextAttribs[] = {EGL_CONTEXT_CLIENT_VERSION, 2, EGL_NONE,
                               EGL_NONE};

    result = eglBindAPI(EGL_OPENGL_ES_API);
    if (result == EGL_FALSE) throw "Cannot Bind API";

    glesCreateWindow(eglNativeWindow, eglNativeDisplay);

    eglDisplay = eglGetDisplay(eglNativeDisplay);
    if (eglDisplay == EGL_NO_DISPLAY) throw "Cannot Get Display";

    // Initialize EGL
    result = eglInitialize(eglDisplay, &majorVersion, &minorVersion);
    if (EGL_FALSE == result) throw "Cannot Initialize EGL";

    {
        EGLint numConfigs = 0;
        EGLint attribList[] = {
            EGL_RED_SIZE,
            6,
            EGL_GREEN_SIZE,
            6,
            EGL_BLUE_SIZE,
            6,
            EGL_ALPHA_SIZE,
            8,
            EGL_DEPTH_SIZE,
            16, // You need this line for depth buffering to work
            EGL_SURFACE_TYPE,
            EGL_WINDOW_BIT,
            EGL_NONE};

        // Get configs
        result = eglGetConfigs(eglDisplay, NULL, 0, &numConfigs);
        if (EGL_FALSE == result) throw "Cannot Get Configs";
        // Choose config
        result =
            eglChooseConfig(eglDisplay, attribList, &config, 1, &numConfigs);
        if (EGL_FALSE == result) throw "Cannot Choose Config";
        if (numConfigs < 1) throw "No Valid Configurations";
    }

    // Create a surface
    eglSurface =
        eglCreateWindowSurface(eglDisplay, config, eglNativeWindow, NULL);
    if (eglSurface == EGL_NO_SURFACE)
        throw "Cannot Create Window Surface";

    // Create a GL context
    eglContext =
        eglCreateContext(eglDisplay, config, EGL_NO_CONTEXT, contextAttribs);
    if (eglContext == EGL_NO_CONTEXT) throw "Cannot Create Context";

    // Make the context current
    result = eglMakeCurrent(eglDisplay, eglSurface, eglSurface, eglContext);
    if (EGL_FALSE == result) throw "Cannot Assign Context";

    glClearColor(0.0f, 0.0f, 0.0f, 1.0f);
    glClear(GL_COLOR_BUFFER_BIT);
    glViewport(0, 0, 240, 320);
}

#include <unistd.h>

int main(int argc, const char **argv) {
    glesSetupOpenGLES();
    sleep(1);
    return 0;
}

Thanks for your input!

I tried your program here using the EGL+GLES support in NVidia’s Linux graphics drivers, and it works fine without any modifications. Here’s the valgrind output for it:


> valgrind ./tst_gles
==4291== Memcheck, a memory error detector
==4291== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==4291== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==4291== Command: ./tst_gles
==4291== 
==4291== Conditional jump or move depends on uninitialised value(s)
==4291==    at 0xAE2EE62: ??? (in /usr/lib64/libnvidia-eglcore.so.390.67)
==4291==    by 0xAE2D180: ??? (in /usr/lib64/libnvidia-eglcore.so.390.67)
==4291==    by 0x71830C7: ??? (in /usr/lib64/libEGL_nvidia.so.390.67)
==4291==    by 0x7184A43: ??? (in /usr/lib64/libEGL_nvidia.so.390.67)
==4291==    by 0x40142B: glesSetupOpenGLES() (tst_gles.cpp:151)
==4291==    by 0x401504: main (tst_gles.cpp:166)
==4291== 
==4291== 
==4291== HEAP SUMMARY:
==4291==     in use at exit: 83,031 bytes in 60 blocks
==4291==   total heap usage: 2,431 allocs, 2,371 frees, 6,058,170 bytes allocated
==4291== 
==4291== LEAK SUMMARY:
==4291==    definitely lost: 24 bytes in 1 blocks
==4291==    indirectly lost: 272 bytes in 4 blocks
==4291==      possibly lost: 0 bytes in 0 blocks
==4291==    still reachable: 82,735 bytes in 55 blocks
==4291==         suppressed: 0 bytes in 0 blocks
==4291== Rerun with --leak-check=full to see details of leaked memory
==4291== 
==4291== For counts of detected and suppressed errors, rerun with: -v
==4291== Use --track-origins=yes to see where uninitialised values come from
==4291== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Now the black contents you rendered into your EGL window’s back buffer never actually get displayed on-screen, but this is only because you’re missing this statement at the end of glesSetupOpenGLES():


 eglSwapBuffers( eglDisplay, eglSurface );

With this addition, I see a black window.