PDA

View Full Version : PBO: mapped memory and OpenMP



henniman2
05-29-2014, 03:58 AM
Hi All,

i experienced an interesting issue when working with mapped buffers and multithreaded writes to these buffers, at least with the latest ATI drivers on windows ( others need to be verified). We have a multithreaded data producer ( unpacker ) that writes to mapped memory. when requesting read AND write memory, the call is 30x faster.




// alloc buffer
glBindBuffer( GL_PIXEL_UNPACK_BUFFER, buffer_id );
glBufferData(GL_PIXEL_UNPACK_BUFFER, size, NULL, GL_STREAM_DRAW);


This scope takes 20ms to complete

{
void* b = glMapBufferRange( GL_PIXEL_UNPACK_BUFFER, 0, size, GL_MAP_WRITE_BIT|GL_MAP_READ_BIT );
copyTo( b, getRGBASize() ); // <- ca. 8 threads writing ca. 64MB data
glUnmapBuffer( GL_PIXEL_UNPACK_BUFFER );
}

This scope takes 600ms to complete

{
void* b = glMapBufferRange( GL_PIXEL_UNPACK_BUFFER, 0, size, GL_MAP_WRITE_BIT );
copyTo( b, getRGBASize() ); // <- 8 threads writing ca. 64MB data
glUnmapBuffer( GL_PIXEL_UNPACK_BUFFER );
}

Is it possible that we get video memory in the latter case and the threaded write trashes the memory access?

henniman2
06-04-2014, 10:44 AM
Reply to myself, go deeper into detail:

Indeed, the problem is 'non-cacheable memory' and i found a 2008 forum entry from yooyo that mapped buffers should be accessed sequentially only. So i serialized the last producer step and the copy performance was there again:



#pragma omp parallel
{
char some_thread_private_memory_on_stack[];
dataproducer_into_local_memory(some_thread_private _memory_on_stack);

#pragma omp critical
{
memcpy( mapped_buffer_ptr, some_thread_private_memory_on_stack, sizeof() );
}
}

If anyone can throw a light on 'non-cacheable memory', (what is it? where is it?) that would be great!

Dan Bartlett
06-04-2014, 11:39 AM
This article by Fabian Giesen (http://fgiesen.wordpress.com/2013/01/29/write-combining-is-not-your-friend/) about write-combined memory explains the issue & some guidelines on getting the best performance. Write-combined & non-cacheable memory are explained in section 3.3.3 of this article (http://www.akkadia.org/drepper/cpumemory.pdf) (What Every Programmer Should Know About Memory).