Part of the Khronos Group
OpenGL.org

The Industry's Foundation for High Performance Graphics

from games to virtual reality, mobile phones to supercomputers

Results 1 to 3 of 3

Thread: PBO: mapped memory and OpenMP

  1. #1
    Junior Member Newbie
    Join Date
    May 2012
    Posts
    11

    PBO: mapped memory and OpenMP

    Hi All,

    i experienced an interesting issue when working with mapped buffers and multithreaded writes to these buffers, at least with the latest ATI drivers on windows ( others need to be verified). We have a multithreaded data producer ( unpacker ) that writes to mapped memory. when requesting read AND write memory, the call is 30x faster.


    Code :
    // alloc buffer
    glBindBuffer( GL_PIXEL_UNPACK_BUFFER, buffer_id );
    glBufferData(GL_PIXEL_UNPACK_BUFFER, size, NULL, GL_STREAM_DRAW);

    This scope takes 20ms to complete
    Code :
    {
    void* b = glMapBufferRange( GL_PIXEL_UNPACK_BUFFER, 0, size, GL_MAP_WRITE_BIT|GL_MAP_READ_BIT );
    copyTo( b, getRGBASize() ); // <- ca. 8 threads writing ca. 64MB data
    glUnmapBuffer( GL_PIXEL_UNPACK_BUFFER );
    }

    This scope takes 600ms to complete
    Code :
    {
    void* b = glMapBufferRange( GL_PIXEL_UNPACK_BUFFER, 0, size, GL_MAP_WRITE_BIT );
    copyTo( b, getRGBASize() ); // <- 8 threads writing ca. 64MB data
    glUnmapBuffer( GL_PIXEL_UNPACK_BUFFER );
    }

    Is it possible that we get video memory in the latter case and the threaded write trashes the memory access?

  2. #2
    Junior Member Newbie
    Join Date
    May 2012
    Posts
    11
    Reply to myself, go deeper into detail:

    Indeed, the problem is 'non-cacheable memory' and i found a 2008 forum entry from yooyo that mapped buffers should be accessed sequentially only. So i serialized the last producer step and the copy performance was there again:

    Code :
    #pragma omp parallel
    {
         char some_thread_private_memory_on_stack[];
         dataproducer_into_local_memory(some_thread_private_memory_on_stack);
     
         #pragma omp critical
         {
               memcpy( mapped_buffer_ptr, some_thread_private_memory_on_stack, sizeof() );
         }
    }

    If anyone can throw a light on 'non-cacheable memory', (what is it? where is it?) that would be great!

  3. #3
    Member Regular Contributor
    Join Date
    Aug 2008
    Posts
    457
    This article by Fabian Giesen about write-combined memory explains the issue & some guidelines on getting the best performance. Write-combined & non-cacheable memory are explained in section 3.3.3 of this article (What Every Programmer Should Know About Memory).

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •