When a particle “dies”, I assume that you want it to be removed from the array of existing particles. And when a new particle is added, you want it to be inserted into the array of existing particles.
So long as the order of the particles in the array is not relevant, I would suggest the following.
Your compute shader should have two working buffers, A and B (note: these may in fact be two regions of the same buffer object). On one frame, it reads the particles in A, does whatever processing is needed, and writes them to B. But it only writes a particle to B if the particle from A is still alive.
The way you pull that off is with atomic counters. The counter starts at 0, the beginning of B. When you want to write a particle, you perform an atomic increment, using the old value as the index into B for where that particle’s data gets written.
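To make the counter trick concrete, here is a minimal CPU-side sketch in C++, not actual shader code. The `Particle` layout, the `compactAlive` name and the age/lifetime test are all assumptions made for illustration; `std::atomic` stands in for the GLSL atomic counter, and the loop runs sequentially where the GPU would run invocations concurrently.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical particle layout; the post does not specify one.
struct Particle {
    float age;
    float lifetime;
};

// CPU model of one compute pass. Each loop iteration plays the role of one
// shader invocation; on the GPU these run concurrently, which is why the
// slot has to be claimed with an atomic increment.
// Returns the number of live particles written to the front of B.
std::uint32_t compactAlive(const std::vector<Particle>& a,
                           std::vector<Particle>& b,
                           float dt) {
    b.resize(a.size());                     // B must be able to hold all of A
    std::atomic<std::uint32_t> counter{0};  // 0 is the beginning of B

    for (std::size_t i = 0; i < a.size(); ++i) {
        Particle p = a[i];
        p.age += dt;                        // stand-in for the real update
        if (p.age < p.lifetime) {
            // fetch_add returns the old value: this particle's index in B.
            const std::uint32_t slot = counter.fetch_add(1);
            assert(slot < b.size());
            b[slot] = p;
        }
    }
    return counter.load();
}
```

Dead particles never claim a slot, so the survivors end up packed at the front of B with no gaps. With real concurrent invocations the order of the survivors is not predictable, which is why this only works when particle order doesn't matter.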
Once you’ve done your computations to generate B, you issue your memory barriers and use B to feed the rendering operation. You can even have your atomic counter be the count used to feed your indirect rendering.
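As a rough illustration of the indirect-rendering idea: the command that `glDrawArraysIndirect` reads is four 32-bit unsigned integers with the vertex count first. So if the counter's storage sits at the start of that command, the compute pass effectively writes the draw's `count` for you. The sketch below only models the layout on the CPU; `resetCommand` is a made-up helper, and it assumes one vertex (a point) per particle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Field order matches the command that glDrawArraysIndirect reads from the
// bound indirect buffer.
struct DrawArraysIndirectCommand {
    std::uint32_t count;          // number of vertices; here, live particles
    std::uint32_t instanceCount;
    std::uint32_t first;
    std::uint32_t baseInstance;
};
static_assert(sizeof(DrawArraysIndirectCommand) == 16,
              "four tightly packed 32-bit uints");
static_assert(offsetof(DrawArraysIndirectCommand, count) == 0,
              "count sits at offset 0, where the counter would be bound");

// Hypothetical helper: the values to upload before the compute pass runs.
// count starts at 0 and is then incremented once per surviving particle.
DrawArraysIndirectCommand resetCommand() {
    return DrawArraysIndirectCommand{0u, 1u, 0u, 0u};
}
```

The upside is that the CPU never has to read the particle count back.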
On the next frame, you simply do the reverse: read from B and write to A, then render from A. A different memory barrier will be needed for B, since you’re reading B via SSBO rather than VBO. You could even have buffers C, D, etc. if you think it’s necessary. Oh, and don’t forget to clear the atomic counter before each compute pass (probably using a different piece of memory).
Note that you should try to put some distance between the compute operation and the subsequent render operation. They can’t run concurrently, and the memory barrier will ensure that they don’t. But if the render is issued immediately after the compute, it will stall the GPU until the compute operation is done.
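The frame-to-frame swap can be sketched the same way on the CPU. This is a C++ model under assumed names (`PingPongParticles`, `step`, `renderCount` and the `Particle` layout are all hypothetical), with `std::atomic` standing in for the atomic counter; in a real renderer the memory barriers would go between `step` and the draw.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical particle layout; the post does not specify one.
struct Particle {
    float age;
    float lifetime;
};

class PingPongParticles {
public:
    explicit PingPongParticles(std::vector<Particle> initial)
        : liveCount_(static_cast<std::uint32_t>(initial.size())) {
        buffers_[0] = std::move(initial);
        buffers_[1].resize(buffers_[0].size());  // both buffers share one capacity
    }

    // One frame: read the current buffer, write survivors to the other one.
    void step(float dt) {
        const std::vector<Particle>& src = buffers_[readIndex_];
        std::vector<Particle>& dst = buffers_[1 - readIndex_];

        counter_.store(0);  // clear the counter before every pass
        for (std::uint32_t i = 0; i < liveCount_; ++i) {
            Particle p = src[i];
            p.age += dt;  // stand-in for the real update
            if (p.age < p.lifetime) {
                const std::uint32_t slot = counter_.fetch_add(1);
                assert(slot < dst.size());
                dst[slot] = p;
            }
        }
        liveCount_ = counter_.load();
        readIndex_ = 1 - readIndex_;  // the buffer just written feeds rendering
    }

    int readIndex() const { return readIndex_; }
    std::uint32_t renderCount() const { return liveCount_; }
    const Particle* renderData() const { return buffers_[readIndex_].data(); }

private:
    std::array<std::vector<Particle>, 2> buffers_;
    std::atomic<std::uint32_t> counter_{0};
    std::uint32_t liveCount_ = 0;
    int readIndex_ = 0;
};
```

Only the first `renderCount()` entries of the current buffer are meaningful; anything past that is stale data from an earlier frame.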
Adding particles would be done in a separate pass, to avoid synchronization issues. If you have particles to add that frame, fill up a buffer object (call it X) with the new particles. Then, after having run the first compute operation, run a second that reads from X and copies the new particles into the particle array. It could even be the same compute shader, just reading from a different buffer: the one the CPU filled with the new particles. You shouldn’t need barriers between these two compute operations, since they’re more-or-less the same. They should use the same atomic counter as well.
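The append pass is the same counter trick with a different source buffer. Here is a CPU-side C++ sketch of it; `appendNew` and the `Particle` layout are hypothetical names, and dropping particles when the destination is full is my own assumption rather than anything the approach requires.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical particle layout; the post does not specify one.
struct Particle {
    float age;
    float lifetime;
};

// CPU model of the second pass: copy the new particles from X into the
// destination buffer, continuing from wherever the first pass left the
// shared counter. Returns the number of valid particles in b afterwards.
std::uint32_t appendNew(const std::vector<Particle>& x,
                        std::vector<Particle>& b,
                        std::atomic<std::uint32_t>& counter) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        // Same trick as before: the old counter value is the slot to write.
        const std::uint32_t slot = counter.fetch_add(1);
        if (slot < b.size()) {
            b[slot] = x[i];
        }
        // Otherwise b is full and this particle is dropped (an assumption).
    }
    // The counter can overshoot when b is full, so clamp before using it
    // as a particle count.
    return std::min<std::uint32_t>(counter.load(),
                                   static_cast<std::uint32_t>(b.size()));
}
```

Because both passes only ever claim slots through the one counter, the new particles land directly after the survivors without the two passes needing to know about each other.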
And emitting new particles is handled by the CPU since it’s a serial process.
What exactly is “serial” about that process? I can understand if it’s based on data that would be inconvenient to put on the GPU. But that aside, I don’t see why a compute shader can’t be written that adds particles on its own, if the creation of new particles is governed purely by an algorithm.