How to handle the birth and death of particles without reading back from the buffer

I’ve implemented a simple particle system where each particle has a position, velocity, age and lifespan stored in different SSBOs.

Most of the data, including position, velocity and age, is updated in a compute shader. Emitting new particles is handled by the CPU, since it’s a serial process. But to track whether each particle is alive or dead, I have to read all the lifespan data back from the SSBO to the CPU (using glGetBufferSubData()), which I’m afraid stalls the GPU.

So I was wondering whether there is any way to avoid reading the data back from the GPU. I found a demo particle system made by AMD using DirectX, which uses a dead list and an alive list in the compute shader. But I have no idea how to achieve this with GLSL. Any ideas?

Demo code:


    ...
    
    // The dead list, so any particles that are retired this frame can be added to this list
    AppendStructuredBuffer<uint>			g_DeadListToAddTo		: register( u2 );
    
    // The alive list which gets built using this shader
    RWStructuredBuffer<float2>				g_IndexBuffer			: register( u3 );
    
    ...
    
    // Dead particles are added to the dead list for recycling
    if ( pb.m_Age <= 0.0f || killParticle )
    {
    	pb.m_Age = -1;
    	g_DeadListToAddTo.Append( id.x );
    }
    
    ...

When a particle “dies”, I assume that you want it to be removed from the array of existing particles. And when a new particle is added, you want it to be inserted into the array of existing particles.

So long as the order of the particles in the array is not relevant, I would suggest the following.

Your compute shader should have two working buffers, A and B (note: these may in fact be two regions of the same buffer object). On one frame, it reads the particles in A, does whatever processing is needed, and writes them to B. But it only writes a particle to B if the particle from A is still alive.

The way you pull that off is with atomic counters. The counter starts at 0, the beginning of B. When you want to write a particle, you perform an atomic increment, using the old value as the index into B for where that particle’s data gets written.
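
A minimal GLSL sketch of that pass, under a few assumptions that are not from the thread: the particle attributes are packed into one hypothetical `Particle` struct (the original post keeps them in separate SSBOs), age counts up towards lifespan, the dispatch covers the full capacity of the buffer, and last frame's live count sits in a small SSBO so the CPU never needs to read it. All names and binding points are illustrative.

```glsl
#version 430
layout(local_size_x = 128) in;

// Hypothetical particle layout; 48 bytes per element under std430.
struct Particle {
    vec4  position;   // xyz = position, w unused
    vec4  velocity;   // xyz = velocity, w unused
    float age;        // seconds since birth (assumed to count up)
    float lifespan;   // seconds until death
    vec2  pad;        // explicit padding to a 16-byte multiple
};

layout(std430, binding = 0) readonly  buffer SrcParticles { Particle src[]; };  // buffer A
layout(std430, binding = 1) writeonly buffer DstParticles { Particle dst[]; };  // buffer B

// Number of live particles in A, left on the GPU by the previous frame.
layout(std430, binding = 3) readonly buffer PrevCount { uint srcCount; };

// Must be reset to 0 before this dispatch.
layout(binding = 0, offset = 0) uniform atomic_uint aliveCount;

uniform float dt;

void main() {
    uint i = gl_GlobalInvocationID.x;
    if (i >= srcCount) return;           // dispatch is sized for capacity, not for srcCount

    Particle p = src[i];
    p.age += dt;
    if (p.age >= p.lifespan) return;     // dead: simply not copied to B

    p.position.xyz += p.velocity.xyz * dt;

    // atomicCounterIncrement returns the value before the increment,
    // so every surviving particle gets its own slot in B.
    uint slot = atomicCounterIncrement(aliveCount);
    dst[slot] = p;
}
```

After the dispatch, `aliveCount` holds the number of particles in B. Their order in B is not deterministic, which is why this only works when particle order does not matter.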

Once you’ve done your computations to generate B, you issue your memory barriers and use B to feed the rendering operation. You can even have your atomic counter be the count used to feed your indirect rendering.

On the next frame, you simply do the reverse: read from B and write to A, then render from A. A different memory barrier will be needed for B, since you’re reading B via SSBO rather than VBO. You could even have buffers C, D, etc if you think it’s necessary. Oh, and don’t forget to clear the atomic counter (probably using a different piece of memory).
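
One possible way to clear the counter and keep the old value on the GPU is a one-thread compute shader run at the start of each frame, after the previous frame's draw has been issued. This is only a sketch: the block names and bindings are made up, and it assumes the buffer that holds the counter is laid out like the argument structure of glDrawArraysIndirect (count, instanceCount, first, baseInstance), so the same buffer object can be bound as the atomic counter buffer during simulation and as GL_DRAW_INDIRECT_BUFFER for rendering.

```glsl
#version 430
layout(local_size_x = 1) in;

// Same field order as DrawArraysIndirectCommand.
layout(std430, binding = 2) buffer DrawCommand {
    uint count;          // vertex count = live particles; doubles as the atomic counter
    uint instanceCount;
    uint first;
    uint baseInstance;
} cmd;

// Hypothetical one-word buffer that tells the next simulation pass
// how many particles the source buffer contains.
layout(std430, binding = 3) writeonly buffer PrevCount { uint srcCount; };

void main() {
    srcCount          = cmd.count;  // keep last frame's live count on the GPU
    cmd.count         = 0u;         // clear the counter for this frame
    cmd.instanceCount = 1u;
    cmd.first         = 0u;
    cmd.baseInstance  = 0u;
}
```

On the host side, a glMemoryBarrier is still needed before dependent work: GL_COMMAND_BARRIER_BIT covers the indirect draw reading `cmd`, and GL_SHADER_STORAGE_BARRIER_BIT covers a later compute pass reading `srcCount`.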

Note that you should try to put some distance between the compute operation and the subsequent render operation. They can’t be concurrent, and the memory barrier will ensure that they are not. But if your render was immediately after the compute, then the render will stall the GPU until the compute operation is done.

Adding particles would be done in a separate pass, to avoid synchronization issues. If you have particles to add that frame, fill up a buffer object (call it X) with the new particles. Then, after having run the first compute operation, run a second that reads from X and copies the new particles into the particle array. It could even be the same compute shader, simply reading from a different buffer: the one filled in by the CPU with any new particles. You shouldn’t need barriers between these two compute operations, since they’re more-or-less the same. They should use the same atomic counter as well.
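
A sketch of what that second pass might look like. The `Particle` struct, the uniforms `newCount` and `capacity`, and the binding points are assumptions for illustration; the capacity check is an addition of this sketch, since the CPU no longer knows how many slots are free.

```glsl
#version 430
layout(local_size_x = 64) in;

// Hypothetical particle layout; 48 bytes per element under std430.
struct Particle {
    vec4  position;
    vec4  velocity;
    float age;
    float lifespan;
    vec2  pad;
};

layout(std430, binding = 4) readonly  buffer NewParticles { Particle fresh[]; };  // buffer X, filled by the CPU
layout(std430, binding = 1) writeonly buffer DstParticles { Particle dst[]; };    // this frame's output buffer

// The same counter the simulation pass incremented; it is NOT reset in between.
layout(binding = 0, offset = 0) uniform atomic_uint aliveCount;

uniform uint newCount;   // number of particles the CPU wrote into X this frame
uniform uint capacity;   // total number of slots in the destination buffer

void main() {
    uint i = gl_GlobalInvocationID.x;
    if (i >= newCount) return;

    // Continue numbering where the simulation pass stopped.
    uint slot = atomicCounterIncrement(aliveCount);
    if (slot >= capacity) {
        // Buffer is full: drop this particle and undo the reservation
        // so the counter never ends up above capacity.
        atomicCounterDecrement(aliveCount);
        return;
    }
    dst[slot] = fresh[i];
}
```

If the spawn rule is purely algorithmic, `fresh[i]` could be replaced by code that computes the new particle directly in the shader, so buffer X would not be needed at all.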

And emitting new particles is handled by CPU since it’s a serial process.

What exactly is “serial” about that process? I can understand if it’s based on data that would be inconvenient to put on the GPU. But that aside, I don’t see why a compute shader can’t be written that adds particles on its own, if the creation of new particles is governed purely by an algorithm.

Detailed answer. Thank you very much. Using the atomic counter is really a good choice.

“Serial” means that when you generate a certain number of new particles (say 30), you have to do it one by one rather than all at once, since you have to keep a count as each one is generated. From this perspective, the principle behind the atomic counter is the same (at least it should be), and its advantage compared to doing the work on the CPU is that reading back from the buffer can be avoided.

Actually, I read this in some material downloaded from the web, which said that tasks like emitting new particles should be handled by the CPU for the reason I mentioned above. Now I can replace it entirely with a compute shader and the atomic counter. It seems that material is out of date.

One more thing: is there any dynamic data structure in GLSL that is similar to “AppendStructuredBuffer” or “RWStructuredBuffer” in DirectX? Using a dead list and an alive list should be able to further optimize the performance.

One more thing: is there any dynamic data structure in GLSL that is similar to “AppendStructuredBuffer” or “RWStructuredBuffer” in DirectX?

Looking at the documentation, D3D is working at a higher level than OpenGL here. Those are both just SSBOs with an internal atomic counter (optional in the case of RWStructuredBuffer).
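
To make that concrete, here is a rough GLSL counterpart of the `g_DeadListToAddTo.Append( id.x )` call in the demo code: an SSBO whose first word is the counter, followed by the index array. The block names, bindings and the `particleCount` uniform are hypothetical. The counter is declared as `int` only so that a later consuming pass could detect underflow, and age is assumed to count down to zero as in the demo, with -1 marking a slot that is already on the list.

```glsl
#version 430
layout(local_size_x = 128) in;

// Hand-rolled stand-in for AppendStructuredBuffer<uint>:
// a counter plus an unsized array in the same buffer.
layout(std430, binding = 5) buffer DeadList {
    int  deadCount;       // number of valid entries in deadIndices
    uint deadIndices[];   // indices of particle slots free for reuse
};

// Ages kept in their own SSBO, as in the original post.
layout(std430, binding = 6) buffer Ages { float age[]; };

uniform uint  particleCount;  // total number of slots in the pool
uniform float dt;

void main() {
    uint i = gl_GlobalInvocationID.x;
    if (i >= particleCount) return;
    if (age[i] < 0.0) return;          // already retired and listed

    age[i] -= dt;
    if (age[i] <= 0.0) {
        age[i] = -1.0;                 // mark as retired
        // "Append": atomicAdd returns the old count, which is the slot to fill.
        int slot = atomicAdd(deadCount, 1);
        deadIndices[slot] = i;
    }
}
```

With this layout particles stay in fixed slots, and the dead list has to be seeded with every slot index when the pool is created.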

Using a dead list and an alive list should be able to further optimize the performance.

It’s unclear to me exactly why that would be the case. What performance issue would you avoid by keeping a list of dead particles?

Currently hard to say since I haven’t tried it.
What I’m thinking is that in the normal case I have to go through every particle to see whether it’s alive or dead when emitting new particles. But with a dead list, I only need to read a certain number of elements, which serve as indices of the particles that should be initialized.

[QUOTE=A_Shuang;1265648]Currently hard to say since I haven’t tried it.
What I’m thinking is that in the normal case I have to go through every particle to see whether it’s alive or dead when emitting new particles. But with a dead list, I only need to read a certain number of elements, which serve as indices of the particles that should be initialized.[/QUOTE]

But you have to go through every particle in order to compute where each particle is going. That’s where the determination is made as to whether it’s alive or dead. And if it’s dead, you simply don’t copy it into the list of living particles.

You know exactly how many particles were alive last frame; that’s the value from the last frame’s atomic counter.

[QUOTE=Alfonse Reinheart;1265649]
But you have to go through every particle in order to compute where each particle is going. That’s where the determination is made as to whether it’s alive or dead. And if it’s dead, you simply don’t copy it into the list of living particles.

You know exactly how many particles were alive last frame; that’s the value from the last frame’s atomic counter.[/QUOTE]

I read the relevant content from here: https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=5&cad=rja&uact=8&ved=0CDYQFjAE&url=http%3A%2F%2Ftwvideo01.ubm-us.net%2Fo1%2Fvault%2FGDC2014%2FPresentations%2FGareth_Thomas_Compute-based_GPU_Particle.pdf&ei=wi4qVbaPJJHlsASiuIAg&usg=AFQjCNGWzTA32dg1cHPi4yd8iFewjwZGfA&sig2=TTBrsF0JfuwGJ3wm4iiAig&bvm=bv.90491159,d.cWc
In this demo, two compute shaders are used to emit new particles, and simulate physical behaviors, respectively. By using a dead list and an alive list, the number of threads in the emit compute shader is minimized.
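
My understanding of how such an emit shader could look in GLSL, as a sketch only: one thread per particle to emit, each popping one free slot index off the dead list. This is not the demo's actual code. The block layout, bindings and uniforms are assumptions, and it relies on no other shader appending to the dead list while this dispatch runs.

```glsl
#version 430
layout(local_size_x = 64) in;

// Dead list: counter followed by indices of free particle slots.
layout(std430, binding = 5) buffer DeadList {
    int  deadCount;
    uint deadIndices[];
};

layout(std430, binding = 6) writeonly buffer Ages      { float age[]; };
layout(std430, binding = 7) writeonly buffer Positions { vec4 position[]; };

uniform uint  emitCount;    // particles requested this frame
uniform vec3  emitterPos;   // hypothetical spawn position
uniform float lifespan;     // initial age; assumed to count down to zero

void main() {
    if (gl_GlobalInvocationID.x >= emitCount) return;

    // "Consume": atomicAdd returns the old count, so the entry to take is old - 1.
    int top = atomicAdd(deadCount, -1) - 1;
    if (top < 0) {
        atomicAdd(deadCount, 1);   // list was empty: undo and emit nothing
        return;
    }

    uint id = deadIndices[top];
    position[id] = vec4(emitterPos, 1.0);
    age[id]      = lifespan;
    // Velocity and any other attributes would be initialized the same way.
}
```

The dispatch size depends only on `emitCount`, not on the size of the pool, which is where the saving in emit threads comes from.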