Synchronization of compute shader and CPU commands

I’m implementing a particle system, where the particles are initialized during each frame with CPU commands:


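	protected void initParticles()
	{
		//each particle is 9 floats: position and size (4), velocity (3), lifespan (1), age (1)
		int dataSize = (4 + 3 + 1 + 1) * Float.SIZE / 8;    //bytes per particle
		int buffSize = particleCount * dataSize;            //bytes in the whole buffer
		int newParticles = 10;                              //particles to spawn this frame
		
		for (int i = 0; i < particleCount; i++)
		{
			//a lifespan of 0 is treated as a free slot
			if (newParticles > 0 && dataBuff.get(i * 9 + 7) == 0)
			{
				dataBuff.put(i * 9 + 7, 1f);    //update lifespan
				
				newParticles--;
			}
		}
		
		dataBuff.rewind();
		
		//upload the entire CPU-side copy, replacing the buffer's current contents
		gl.glBindBuffer(GL4.GL_ARRAY_BUFFER, vboBuff.get(0));
		gl.glBufferSubData(GL4.GL_ARRAY_BUFFER, 0, buffSize, dataBuff);
	}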

The buffers are set up at the start:


		dataBuff = GLBuffers.newDirectFloatBuffer(buffSize);
		
		for (int i = 0; i < particleCount; i++)
		{
			dataBuff.put(new float[] {((float) Math.random() - 0.5f) * 20f, 
					10, //((float) Math.random() * 25) + 25f
					((float) Math.random() - 0.5f) * 20f, 
					0.02f});                        //position and size
			dataBuff.put(new float[] {0, -16, 0});  //velocity
			dataBuff.put(0f);                       //lifespan
			dataBuff.put(0f);                       //age
		}
        
		gl.glBufferData(...);

And then the compute shader is supposed to receive the updated data. This is the rendering code:


		initParticles();
		
		gl.glUseProgram(computeProgram);
		gl.glBindBufferBase(GL4.GL_SHADER_STORAGE_BUFFER, 0, vboBuff.get(0));
		gl.glDispatchCompute(1, 1, 1);
		
		gl.glUseProgram(viewProgram);
		
		setUniforms();
		
		gl.glActiveTexture(GL4.GL_TEXTURE0);
		texture.bind(gl);
		
		gl.glBindVertexArray(vaoBuff.get(0));
		gl.glDrawArraysInstanced(GL4.GL_TRIANGLE_STRIP, 0, 4, particleCount);

In my example, the compute shader is bound to an SSBO, which is also used as the VBO that the rendering program reads from. ‘initParticles()’ updates that buffer in each frame. Ten new particles should be generated per frame, yet it looks as though initParticles() doesn’t wait for the compute shader to finish first; it just keeps executing, and as a result all of the particles end up being generated. (I have tested the shader code, so it should be correct.) Any ideas about that?

There are a lot of things that are unclear in this code and your description of what you want to have happen.

You say that “there should be 10 new particles generated in each frame”. What do you mean by “new particles”? Do you want to overwrite the old particles (as your ‘initParticles’ function seems to be doing)? Or do you want to add particles to the existing set of particles?

Also, if you’re constantly updating the particle data from the CPU… what exactly is the compute shader for? It seems like your program is trying to reset the particle data after every compute shader operation, rather than providing some initial data on the first frame and letting the compute shader do per-frame computations on it. Because that’s what your CPU operation does; it overwrites the entire buffer every frame, rather than just updating part of it. Did you mean for it to do that?

Equally importantly, it is not clear what the relationship is between the compute shader operation and the rendering operation. Sure, the compute shader uses the SSBO, so it probably reads from it. But does it write to it as well, or does it write to something else? Does the rendering operation read from that same SSBO? Is that how the two operations communicate? Does it read from the buffer as an SSBO or as a VBO? Or, put another way, is the buffer you bound to the SSBO range the same buffer used by the VAO?


I’m now going to make some guesses as to what you are trying to do, and then respond to your code as though that’s what it actually did. I assume that your intent is to use your initParticles function to add new particles to the existing buffer, when new particles need to be added to the system. Your compute shader will operate on the buffer and update all particles, new and old. It will both read from and write to this buffer as an SSBO. Your rendering operation will then read from this buffer as a VBO, through that VAO you bind.

Given this intent, there is one major problem here. It’s the fact that your compute shader is working with this buffer as a shader storage buffer. SSBO operations use an incoherent memory model. This means that, when your compute shader operation writes to the SSBO, that write will not become visible to any subsequent operations (reads or writes) until you issue an appropriate glMemoryBarrier call between the two. For a rendering operation that reads the buffer as vertex attribute data, that means the GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT barrier, issued after glDispatchCompute and before the draw call.

Furthermore, when you start writing to that buffer in initParticles, any writes from the prior compute shader operation may not have completed and become visible. Thus they could happen after your next initParticles function executes. You’ll need to use the GL_BUFFER_UPDATE_BARRIER_BIT barrier to get that to work.
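
To make the ordering concrete, here is a rough sketch of one frame with a single barrier that carries both bits. It is not your code: FrameGl, RecordingGl, uploadParticles and drawParticles are hypothetical stand-ins (for the glBufferSubData call in initParticles and for the instanced draw) so that the sequence can run without a GL context. Only the two barrier constants and their values come from the OpenGL headers. In a real JOGL program the barrier line would be gl.glMemoryBarrier(...) on your GL4 object.

```java
import java.util.ArrayList;
import java.util.List;

final class FrameOrder {
    /** One frame: CPU upload, compute dispatch, barrier, then the draw. */
    static void renderFrame(FrameGl gl) {
        gl.uploadParticles();           // CPU write via glBufferSubData
        gl.glDispatchCompute(1, 1, 1);  // shader reads and writes the SSBO
        // Make the shader's writes visible to the vertex fetch of this frame's
        // draw and to the buffer update at the start of the next frame.
        gl.glMemoryBarrier(FrameGl.GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT
                | FrameGl.GL_BUFFER_UPDATE_BARRIER_BIT);
        gl.drawParticles();             // reads the same buffer as a VBO
    }

    public static void main(String[] args) {
        RecordingGl gl = new RecordingGl();
        renderFrame(gl);
        System.out.println(gl.calls);   // prints [upload, dispatch, barrier, draw]
    }
}

/** Hypothetical stand-in for the few GL calls made during the frame. */
interface FrameGl {
    // Bit values as defined in the OpenGL 4.x headers.
    int GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT = 0x00000001;
    int GL_BUFFER_UPDATE_BARRIER_BIT = 0x00000200;

    void uploadParticles();                       // hypothetical wrapper
    void glDispatchCompute(int x, int y, int z);
    void glMemoryBarrier(int barriers);
    void drawParticles();                         // hypothetical wrapper
}

/** Records the call order instead of talking to a driver. */
final class RecordingGl implements FrameGl {
    final List<String> calls = new ArrayList<>();
    int lastBarrierBits;

    @Override public void uploadParticles() { calls.add("upload"); }
    @Override public void glDispatchCompute(int x, int y, int z) { calls.add("dispatch"); }
    @Override public void glMemoryBarrier(int barriers) {
        lastBarrierBits = barriers;
        calls.add("barrier");
    }
    @Override public void drawParticles() { calls.add("draw"); }
}
```

The point of the sketch is only where the barrier sits: after the dispatch whose writes you care about, and before anything that consumes them.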

Sorry for not being clear.

[QUOTE=Alfonse Reinheart;1265593]
I’m now going to make some guesses as to what you are trying to do, and then respond to your code as though that’s what it actually did. I assume that your intent is to use your initParticles function to add new particles to the existing buffer, when new particles need to be added to the system. Your compute shader will operate on the buffer and update all particles, new and old. It will both read from and write to this buffer as an SSBO. Your rendering operation will then read from this buffer as a VBO, through that VAO you bind.[/QUOTE]

Yes. That’s what I’m doing.

[QUOTE=Alfonse Reinheart;1265593]
Furthermore, when you start writing to that buffer in initParticles, any writes from the prior compute shader operation may not have completed and become visible. Thus they could happen after your next initParticles function executes. You’ll need to use the GL_BUFFER_UPDATE_BARRIER_BIT barrier to get that to work.[/QUOTE]

What you said makes sense to me; I was thinking along the same lines. I tried calling gl.glMemoryBarrier(GL4.GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT) after (and even before) glDispatchCompute, but it didn’t work.

[QUOTE=Alfonse Reinheart;1265593]
Also, if you’re constantly updating the particle data from the CPU… what exactly is the compute shader for? It seems like your program is trying to reset the particle data after every compute shader operation, rather than providing some initial data on the first frame and letting the compute shader do per-frame computations on it. Because that’s what your CPU operation does; it overwrites the entire buffer every frame, rather than just updating part of it. Did you mean for it to do that?[/QUOTE]

Yes. I update the entire buffer with the CPU to initialize new particles. That’s because a certain number of particles are born (and die) in each frame, and this is a serial process that might not be handled efficiently by GPU parallel computing. The compute shader deals with the physical behavior of the particles (changing positions, velocities, etc.). I am also worried about the cost of updating the entire buffer, but I haven’t come up with a better option so far.
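
One possible way to avoid uploading the whole buffer is to record which slots were revived and upload only their lifespan floats. The sketch below is only an illustration under assumptions: ParticleSpawner and its method names are made up, the 9-float layout and lifespan index 7 are taken from the code above, and it assumes the CPU-side copy's lifespan values are up to date, which they would not be if the compute shader changes them.

```java
import java.nio.FloatBuffer;
import java.util.ArrayList;
import java.util.List;

final class ParticleSpawner {
    // Layout from the post: position and size (4), velocity (3), lifespan (1), age (1).
    static final int FLOATS_PER_PARTICLE = 4 + 3 + 1 + 1;
    static final int LIFESPAN_INDEX = 7;

    /** Revives up to maxNew dead particles in the CPU copy and returns their indices. */
    static List<Integer> spawn(FloatBuffer data, int particleCount, int maxNew) {
        List<Integer> spawned = new ArrayList<>();
        for (int i = 0; i < particleCount && spawned.size() < maxNew; i++) {
            int slot = i * FLOATS_PER_PARTICLE + LIFESPAN_INDEX;
            if (data.get(slot) == 0f) {   // lifespan 0 means the slot is free
                data.put(slot, 1f);       // absolute put, position is untouched
                spawned.add(i);
            }
        }
        return spawned;
    }

    /** Byte offset of a particle's lifespan float inside the buffer object. */
    static long lifespanByteOffset(int particleIndex) {
        return (long) (particleIndex * FLOATS_PER_PARTICLE + LIFESPAN_INDEX) * Float.BYTES;
    }

    public static void main(String[] args) {
        // Four particles, all lifespans initially 0.
        FloatBuffer data = FloatBuffer.allocate(4 * FLOATS_PER_PARTICLE);
        for (int i : spawn(data, 4, 2)) {
            System.out.println("particle " + i + " -> byte offset " + lifespanByteOffset(i));
        }
    }
}
```

Each returned index could then be uploaded with glBufferSubData(GL4.GL_ARRAY_BUFFER, lifespanByteOffset(i), Float.BYTES, ...) using a one-float buffer, which leaves the positions and velocities written by the compute shader untouched.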