
Thread: glDrawElements() blocking in CPU - takes 5ms to return

  1. #1
    Junior Member Regular Contributor
    Join Date
    Jul 2005
    Location
    Pennsylvania
    Posts
    103

    glDrawElements() blocking in CPU - takes 5ms to return

    I am having a strange issue where calling glDrawElements appears to block, stalling the CPU while it waits for the function to return. I placed a system timer immediately before and after my call to glDrawElements, and the function consistently takes between 5 and 7 milliseconds to return.
    First, this is very slow: I'm only rendering about 4000 vertices. Second, I thought the whole point of using VBOs was to get away from immediate mode, so that glDrawElements would just issue commands to the GPU and the CPU could continue working while the GPU works at its own pace. This does not appear to be happening at all.

    As I mentioned, my data is stored entirely in VBOs, which are initialized and filled once. I am using very simple shaders. The vertex shader is handed a 20-element array of mat4s as uniforms before each render, and the pre-filled VBOs are merely bound and the vertex attribute pointers set.
    The only thing I actually time is the call to glDrawElements, which, as I mentioned, takes about 5 to 7 ms.
    Here are the general specs for my setup. It is fairly modern and should be fine.

    System info:
    Laptop: Dell XPS L511Z
    Processor: Intel® Core™ i5-2410M CPU @ 2.30 GHz 2.30 GHz
    Installed Memory (RAM) 6.00 GB.
    System type: 64 bit operating system.
    Operating system: Windows 7.
    Graphics card info:
    Card type: Nvidia GeForce GT 525M
    Driver Version: 285.77
    DirectX support: 11
    CUDA Cores: 96
    Graphics clock: 600 MHz
    Processor clock: 1200 MHz
    Memory clock: 900 MHz (1800 MHz data rate)
    Memory interface: 128 bit
    Total available graphics memory: 3797 MB
    Dedicated video memory: 1024 MB DDR3
    System video memory: 0 MB
    Shared System Memory: 2773 MB
    Video BIOS version: 70.08.53.00.07
    IRQ: 16
    Bus: PCI Express x 16 Gen 2
    My application info:
    Win32 project written in C++ using Visual Studio 2010.
    Striving for proficiency...

  2. #2
    Member Regular Contributor
    Join Date
    Jan 2005
    Location
    USA
    Posts
    411
    Doesn't glDrawElements pass indices? Still seems very slow. You should be using an element buffer I think. I am not sure, but graphics hardware seems to be only able to manage one task at a time. So there could be interaction with other applications running on your computer.

    EDITED: You might also make sure that your indices are aligned. If you just allocated them with new (assuming C++) then you should be fine. Anyway, passing unaligned memory can be incredibly slow. Just a hunch.
    God have mercy on the soul that wanted hard decimal points and pure ctor conversion in GLSL.

  3. #3
    Senior Member OpenGL Pro
    Join Date
    Jan 2007
    Posts
    1,183
    glDrawElements doesn't just draw. Most modern GPU drivers operate in a "lazy mode": state changes, shader changes, texture changes, etc. are cached locally by the driver as they happen, then evaluated and validated when a draw call occurs. What you're getting in your timing is the result of all of this as well as the cost of the draw call itself. You can confirm this by issuing a second draw call immediately afterwards with the exact same params and timing that too; you're most likely going to find that it returns almost immediately.

    So based on that, the excessive time is going to be on account of something that happened before the glDrawElements call (but which the driver just stored up at the time it happened, and is only doing for real when the draw call is made), and the most likely looking suspect is that 20-element array of mat4s. Some info on how you're sending that to the driver will help in diagnosing further.

    It's also possible that the timing functions you're using are not accurate (e.g. you might be using something like GetTickCount which has very poor resolution) in which case wrong times are to be expected. That's what you should double-check first.
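
    The two checks suggested above (a high-resolution timer, and timing the same draw call twice in a row) could be sketched roughly as follows. This is only an illustration, not the OP's code: timeCallMs, timeTwice and DrawTimings are made-up names, and it assumes a compiler that ships std::chrono (on Visual Studio 2010, which does not, QueryPerformanceCounter would be the usual substitute).

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Returns the wall-clock time, in milliseconds, taken by one invocation of fn.
// steady_clock is monotonic; its resolution is implementation-defined.
template <typename Fn>
double timeCallMs(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    assert(stop >= start);  // guaranteed for a monotonic clock
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical result type for the "draw twice" experiment.
struct DrawTimings {
    double firstMs;   // may include deferred state validation by the driver
    double secondMs;  // identical call, no state changes in between
};

// Invokes the same draw callable twice, back to back, and times each call.
// Intended use (indexCount is a placeholder for the real index count):
//   timeTwice([&] { glDrawElements(GL_TRIANGLES, indexCount,
//                                  GL_UNSIGNED_SHORT, nullptr); });
template <typename DrawFn>
DrawTimings timeTwice(DrawFn&& draw) {
    DrawTimings t{};
    t.firstMs = timeCallMs(draw);
    t.secondMs = timeCallMs(draw);
    return t;
}
```

    If secondMs comes out far smaller than firstMs, that would point at work deferred from earlier state changes rather than at the draw itself. Note that this only measures CPU time spent inside the call, not how long the GPU takes.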

  4. #4
    Member Regular Contributor
    Join Date
    Jan 2005
    Location
    USA
    Posts
    411
    ^Yeah, that makes sense. I was just reading an article on Wikipedia yesterday (comparing D3D and OpenGL) that claimed D3D's weakness was an inability to buffer user-mode calls (before switching to kernel mode), presumably because the "IHV" layer Microsoft engineered does not allow for it. But the article says this only plagues D3D9 and was corrected for 10, I think. Still, ~5 ms is a long time. A whole frame at 60 fps is only about 16.7 ms.

    EDITED: 20-element array sounds like 5 matrices which seems like nothing. I've always assumed uploading all of the program registers at once would not be a big deal (unless your hardware supports a whole lot more than are normally required)

  5. #5
    Senior Member OpenGL Pro
    Join Date
    Jan 2007
    Posts
    1,183
    D3D9 and below actually do buffer user-mode calls; the claim that they don't is 100% a myth. The cost was in validation.

    I'm reading the 20-element array as being 20 matrices, which equates to 80 vec4s. Worst case is 80 glUniform4f calls here, but even 20 glUniformMatrix4fv calls can be quite heavy (especially if each one is also accompanied by a run-time glGetUniformLocation). Of course it can also be done with a buffer object (which - if careless - may involve a CPU/GPU sync but the time for that wouldn't be expected to be measured with glDrawElements) or even a single glUniformMatrix4fv call, so we need to know how the OP is setting these.

  6. #6
    Junior Member Regular Contributor Kopelrativ's Avatar
    Join Date
    Apr 2011
    Posts
    214
    As a complement to measuring CPU time, use a query to measure the time as reported by the GPU. But be careful about asking for the result, as it may stall the pipeline. One way is something like the following, where you fetch the result on the next iteration.

    Code :
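    // GL_TIME_ELAPSED results are reported in nanoseconds.
    GLuint result = 0;
    // Read back the previous iteration's measurement before reusing the query object.
    if (!firsttime) glGetQueryObjectuiv(fQuery, GL_QUERY_RESULT, &result);
    glBeginQuery(GL_TIME_ELAPSED, fQuery);
    glDraw...
    glEndQuery(GL_TIME_ELAPSED);
    firsttime = false;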
    You will have to do a glGenQueries(1, &fQuery); somewhere also.

  7. #7
    Member Regular Contributor
    Join Date
    Jan 2005
    Location
    USA
    Posts
    411
    ^mhagain, Direct3D9 lets you upload as many consecutive registers (of a class) as you need to. Which seems reasonable, as I would imagine the best approach would be to stream them all up in a block if possible. But I can also imagine the driver building a buffer and tagging each register in the upload for random access update. I recently programmed a lot with OpenGL ES (WebGL) and I don't remember there being an API for updating a block of registers (which would probably be very helpful for Javascript; as would bringing back display lists I think) but I did not really look. I guess OpenGL cannot do that then? Either way I don't think it would matter much beyond the unnecessary (presumably user mode) function calls.

  8. #8
    Senior Member OpenGL Pro
    Join Date
    Jan 2007
    Posts
    1,183
    Quote Originally Posted by michagl View Post
    ^mhagain, Direct3D9 lets you upload as many consecutive registers (of a class) as you need to. Which seems reasonable, as I would imagine the best approach would be to stream them all up in a block if possible. But I can also imagine the driver building a buffer and tagging each register in the upload for random access update. I recently programmed a lot with OpenGL ES (WebGL) and I don't remember there being an API for updating a block of registers (which would probably be very helpful for Javascript; as would bringing back display lists I think) but I did not really look. I guess OpenGL cannot do that then? Either way I don't think it would matter much beyond the unnecessary (presumably user mode) function calls.
    If they're declared in the GLSL as:
    Code :
    uniform mat4 matrixArray[20];
    They can be loaded in the C(++) code as:
    Code :
    matrixType matrixArray[20];   // matrixType: 16 contiguous floats per matrix
     
    // fill in data here
     
    glUniformMatrix4fv (uniformLocation, 20, GL_FALSE, (const GLfloat *) matrixArray);

    That should be the fastest way of loading them when using traditional uniforms.
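
    For what it's worth, a slightly fuller sketch of the same idea is below. It is only an illustration under some assumptions: Mat4, kMatrixCount and uploadMatrixArray are hypothetical names, the matrices are assumed to be tightly packed and column-major, and the GL entry point is passed in as a callable so the snippet compiles without GL headers. In a real program you would pass glUniformMatrix4fv itself.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical plain matrix type: 16 floats, column-major, which is the
// layout OpenGL expects when the transpose argument is GL_FALSE.
struct Mat4 {
    float m[16];
};
static_assert(sizeof(Mat4) == 16 * sizeof(float), "Mat4 must be tightly packed");

// Matches `uniform mat4 matrixArray[20];` in the shader.
constexpr std::size_t kMatrixCount = 20;

// Sends the whole array with a single call instead of one call per matrix.
// `upload` is any callable taking (location, count, transpose, value), the
// same parameter order as glUniformMatrix4fv.
template <typename UploadFn>
void uploadMatrixArray(UploadFn&& upload, int location,
                       const std::array<Mat4, kMatrixCount>& matrices) {
    // The location should be looked up once after linking, not every frame.
    assert(location >= 0);
    upload(location, static_cast<int>(matrices.size()),
           static_cast<unsigned char>(0) /* GL_FALSE */, &matrices[0].m[0]);
}
```

    With the GL headers available, the call would look like uploadMatrixArray(glUniformMatrix4fv, uniformLocation, matrices), where uniformLocation comes from a glGetUniformLocation call made once at startup.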

  9. #9
    Member Regular Contributor
    Join Date
    Jan 2005
    Location
    USA
    Posts
    411
    That's good to know. The WebGL API for glUniformMatrix4fv does not take a count argument, but it takes a typed array, which I assumed had to be 4x4 (16 elements) since that is the name of the API procedure.

    https://www.khronos.org/registry/webgl/specs/1.0/
    void uniformMatrix4fv(WebGLUniformLocation location, GLboolean transpose, Float32Array value);

    For the record, I am not sure the spec (above) even explains it, but from a little searching it sounds like you can pass an array whose size is a multiple of 16. I am not 100% positive, though, that you can select out the individual matrices for use in your script with the Float32Array spec. That may be a fundamental limitation of JavaScript.

    Sounds like I have some homework to do.

  10. #10
    Member Regular Contributor
    Join Date
    Jan 2005
    Location
    USA
    Posts
    411
    Quote Originally Posted by michagl View Post
    Doesn't glDrawElements pass indices? ... You might also make sure that your indices are aligned. If you just allocated them with new (assuming C++) then you should be fine. Anyway, passing unaligned memory can be incredibly slow. Just a hunch.
    Forgive me. I'd forgotten how glBindBuffer interacts with glDrawElements. Look into it if you've not heard of it. Otherwise disregard these comments. I cannot edit the original post for correctness at this point.
