Texture loading and PBO

I am writing a texture loader for my game engine, so I am looking for the fastest way to upload textures. But I have trouble understanding PBOs …

PBO locks:
As the PBO documentation says, PBO memory is not cacheable, so when we map a PBO again we do not necessarily get our data back. But in the documentation we can also read this:

Note that if GPU is still working with the buffer object, glMapBufferARB() will not return until GPU finishes its job with the corresponding buffer object. To avoid this stall(wait), call glBufferDataARB() with NULL pointer right before glMapBufferARB(). Then, OpenGL will discard the old buffer, and allocate new memory space for the buffer object.
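
For reference, the pattern that note describes could be sketched roughly as below. This is only an illustration, not code from the documentation: BufferApi and fill_pbo_orphaned are made-up names, and the function-pointer table stands in for the loaded glBindBuffer / glBufferData / glMapBuffer / glUnmapBuffer entry points (or their ARB variants) so that the snippet compiles without a GL context. The enum values are the ones from the OpenGL headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical table of buffer entry points. In a real program these would be
// the loaded glBindBuffer, glBufferData, glMapBuffer and glUnmapBuffer.
struct BufferApi {
    void (*BindBuffer)(unsigned target, unsigned buffer);
    void (*BufferData)(unsigned target, std::ptrdiff_t size, const void* data, unsigned usage);
    void* (*MapBuffer)(unsigned target, unsigned access);
    unsigned char (*UnmapBuffer)(unsigned target);
};

constexpr unsigned kPixelUnpackBuffer = 0x88EC; // GL_PIXEL_UNPACK_BUFFER
constexpr unsigned kStreamDraw        = 0x88E0; // GL_STREAM_DRAW
constexpr unsigned kWriteOnly         = 0x88B9; // GL_WRITE_ONLY

// Copies `size` bytes of decoded pixels into `pbo`.
// Returns false if the buffer could not be mapped or unmapped.
bool fill_pbo_orphaned(const BufferApi& gl, unsigned pbo,
                       const void* pixels, std::size_t size) {
    assert(pixels != nullptr && size > 0);
    gl.BindBuffer(kPixelUnpackBuffer, pbo);

    // Orphan the old storage: same size, NULL data. The driver is free to hand
    // back fresh memory while the GPU keeps reading from the old allocation.
    gl.BufferData(kPixelUnpackBuffer, static_cast<std::ptrdiff_t>(size),
                  nullptr, kStreamDraw);

    void* dst = gl.MapBuffer(kPixelUnpackBuffer, kWriteOnly);
    if (dst == nullptr) {
        gl.BindBuffer(kPixelUnpackBuffer, 0);
        return false;
    }
    std::memcpy(dst, pixels, size);

    const bool ok = gl.UnmapBuffer(kPixelUnpackBuffer) != 0;
    gl.BindBuffer(kPixelUnpackBuffer, 0);
    return ok;
}
```

To upload afterwards, the PBO would be bound to GL_PIXEL_UNPACK_BUFFER again before calling glTexSubImage2D; with a buffer bound there, the last argument of that call is interpreted as a byte offset into the buffer instead of a client pointer.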

If PBO memory is not cacheable, what is the point of OpenGL’s locking mechanism in the PBO mapping function? And if calling glBufferData gives us a new buffer, what is the point of creating multiple PBOs to stream a texture?

PBO allocation time:
Since reading (and decoding) a texture from disk is a slow operation, it seems like a good idea to use PBOs like this:

  1. Read and decode the first texture into PBO1 memory (CPU)
  2. Load texture1 from PBO1 (GPU) <- while -> reading and decoding texture2 into PBO2 (CPU)
  3. Load texture2 from PBO2 (GPU) <- while -> reading and decoding texture3 into PBO3 (CPU)
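
The ordering in the steps above can be sketched as a small loop. This is only an assumption about how it might be organised: StreamCallbacks and stream_textures are hypothetical names, and the two callbacks stand in for the real work (map the PBO and decode into it, then bind it and issue glTexImage2D or glTexSubImage2D). The overlap relies on the upload call returning before the copy has finished, which is up to the driver.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical hooks for the two halves of the work.
struct StreamCallbacks {
    // CPU side: map PBO `slot`, read and decode texture `index` into it, unmap.
    std::function<void(std::size_t slot, std::size_t index)> decode_into_pbo;
    // GPU side: bind PBO `slot` and start the texture upload from it.
    std::function<void(std::size_t slot, std::size_t index)> start_upload;
};

// Cycles through `pbo_count` buffers so that decoding texture i+1 targets a
// different PBO than the one texture i is still being uploaded from.
void stream_textures(std::size_t texture_count, std::size_t pbo_count,
                     const StreamCallbacks& cb) {
    assert(pbo_count > 0);
    assert(cb.decode_into_pbo && cb.start_upload);
    for (std::size_t i = 0; i < texture_count; ++i) {
        const std::size_t slot = i % pbo_count;
        cb.decode_into_pbo(slot, i); // the previous upload may still be in flight
        cb.start_upload(slot, i);    // expected to return without waiting for the copy
    }
}
```

With pbo_count set to 2 the loop alternates between two buffers; with one PBO per texture it matches the numbered steps exactly.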

My question is how to create the PBOs (the textures do not necessarily have the same size).
-> Is it a problem to create a PBO for each texture? How can I release the PBO without blocking the CPU?
-> Is it better to reuse PBOs where possible (a PBO pool)? How do I avoid blocking the CPU? With glBufferData?
-> Is it better to use one big PBO, large enough for the biggest texture, and call glBufferData(**,NULL) each time?
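
To make the pool option concrete, here is a minimal sketch of the bookkeeping only. Everything in it is hypothetical (PooledPbo, PboPool, the placeholder ids), and the GL calls are left as comments. It assumes the caller releases an entry only once the upload from it is known to be finished, for example by polling a sync object where ARB_sync / OpenGL 3.2 is available.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct PooledPbo {
    unsigned id;          // placeholder; a real pool would store the glGenBuffers name
    std::size_t capacity; // bytes allocated for this PBO
    bool in_use;          // true while the CPU is filling it or an upload is pending
};

class PboPool {
public:
    // Returns the index of the smallest free PBO that can hold `bytes`,
    // creating a new entry when none fits.
    std::size_t acquire(std::size_t bytes) {
        std::size_t best = entries_.size();
        for (std::size_t i = 0; i < entries_.size(); ++i) {
            const PooledPbo& e = entries_[i];
            if (e.in_use || e.capacity < bytes) continue;
            if (best == entries_.size() || e.capacity < entries_[best].capacity) best = i;
        }
        if (best == entries_.size()) {
            // A real pool would call glGenBuffers and
            // glBufferData(GL_PIXEL_UNPACK_BUFFER, bytes, NULL, GL_STREAM_DRAW) here.
            entries_.push_back({next_id_++, bytes, false});
        }
        entries_[best].in_use = true;
        return best;
    }

    // Call once the upload that reads from this PBO has completed.
    void release(std::size_t index) {
        assert(index < entries_.size());
        entries_[index].in_use = false;
    }

    std::size_t size() const { return entries_.size(); }

private:
    std::vector<PooledPbo> entries_;
    unsigned next_id_ = 1;
};
```

Because textures differ in size, picking the smallest buffer that fits keeps the large buffers available for large textures; whether this beats one big orphaned PBO would have to be measured.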

Thanks!

Check out this paper by NVIDIA on Optimizing Texture Transfers

Thanks!

This paper proposes using two PBOs alternately. That may be the fastest method.

But it doesn’t explain why… Can someone explain the internals of PBOs to me?