I'm doing morph animations on the GPU using CUDA. Each frame, I update the vertex buffer before rendering:

Code :
cudaGraphicsResourceSetMapFlags(cudaResource, cudaGraphicsMapFlagsWriteDiscard);
cudaError err = cudaGraphicsMapResources(1, &cudaResource, 0);
ASSERT(err == cudaSuccess);
// Update the vertex buffer (CUDA work on the mapped resource goes here)
err = cudaGraphicsUnmapResources(1, &cudaResource, 0);
ASSERT(err == cudaSuccess);

Afterwards, I render using glDrawRangeElements.
Using Nsight, I see that the glDrawRangeElements call stalls until the GPU actually begins to draw the same mesh.

[Attached image: nsight.jpg]

The lag is independent of the computation I'm doing. As long as the resource is mapped and unmapped, the lag is present.
I added cudaStreamSynchronize and cudaDeviceSynchronize to ensure the GPU is done, and I also double and triple buffered my vertex buffer, but it didn't change anything.
I get the lag only when I map the resource using CUDA; otherwise everything runs well.
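
For reference, here is a minimal sketch of the slot rotation that double or triple buffering implies, assuming one registered CUDA graphics resource per slot. `BufferRing` and its method names are hypothetical and only illustrate the idea; this is not the exact code from the application.

Code :
```cpp
#include <cstddef>

// Hypothetical helper (illustration only): rotates through N vertex buffer
// slots so that CUDA writes one slot while OpenGL draws from a different one.
template <std::size_t N>
class BufferRing {
public:
    // Slot that CUDA maps and writes during the current frame.
    std::size_t writeSlot() const { return frame_ % N; }

    // Slot that OpenGL draws during the current frame: the one written in
    // the previous frame. On the very first frame this slot has not been
    // written yet, so the caller should pre-fill it or skip the draw.
    std::size_t drawSlot() const { return (frame_ + N - 1) % N; }

    // Call once per frame, after the draw call has been issued.
    void advance() { ++frame_; }

private:
    std::size_t frame_ = 0;
};
```

With `BufferRing<3> ring;`, each frame would map the resource at `ring.writeSlot()`, draw from the buffer at `ring.drawSlot()`, then call `ring.advance()`. Even with this kind of rotation in place, the stall described above remained.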

I'm on Windows 7 with an NVIDIA GTX 480.
I've tried updating the drivers, changing the CUDA version (5.5 and 6.0) and swapping the GPU (GTX 680), but to no avail.

Any ideas or pointers would be greatly appreciated.