View Full Version : Slow particle system



shultays
01-02-2018, 12:00 PM
Hello everyone, this will be a vague question, and I would rather not ask one like this, but I am kind of desperate.

For my game, I created a particle system. On my PC it is quite fast: I can easily increase the number of particles to huge amounts and the game is still fast. However, on some PCs, people complain that the game slows down even with a trivial number of particles compared to what I use on my PC.

I am inexperienced in OpenGL and have no idea what the problem might be. I wish I had another PC to debug things on, but I don't. If you can find something weird with my system, please tell me.

It is basically a GL_DYNAMIC_DRAW buffer object that is filled as new particles are added, and all of its particles are rendered in a single call. When a buffer is full, the system fetches another one. Particles also have a lifetime, so once the system decides that the last particle added to a buffer has died, it marks that buffer as free and reuses it later. Buffers are shared between everything, so I don't think creating new buffers is an issue.

My buffers hold 256 quads each, and the buffer size depends on the vertex size (for example, a smoke particle vertex has 6 floats and 3 vec2s). Is this too big? Is there a performance hit on some hardware for using such big vertices?
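As a rough check on the sizes mentioned above, the arithmetic can be written out. This is only a sketch: it assumes 4-byte floats, tightly packed attributes, and that MAX_QUAD is 256 as the text says; the constant names are made up for illustration.

```cpp
#include <cstddef>

// Assumed layout from the post: 6 float attributes plus 3 vec2 attributes per vertex.
constexpr std::size_t kFloatsPerVertex = 6 + 3 * 2;                         // 12 floats
constexpr std::size_t kBytesPerVertex  = kFloatsPerVertex * sizeof(float);  // 48 bytes with 4-byte floats
constexpr std::size_t kVerticesPerQuad = 4;
constexpr std::size_t kQuadsPerBuffer  = 256;  // value of MAX_QUAD taken from the text

// Same product as the glBufferData size argument: attributeSize * 4 * MAX_QUAD.
constexpr std::size_t bufferBytes() {
    return kBytesPerVertex * kVerticesPerQuad * kQuadsPerBuffer;
}

static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");
```

Under those assumptions each buffer is 49152 bytes (48 KiB), so 150 of them come to roughly 7 MiB of vertex data.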

Here is how I create my buffers:


glGenBuffers(1, &bufferData.quadBuffer);
glBindBuffer(GL_ARRAY_BUFFER, bufferData.quadBuffer);
glBufferData(GL_ARRAY_BUFFER, particleTemplate->attributeSize * 4 * MAX_QUAD, NULL, GL_DYNAMIC_DRAW);

But as I said, these buffers are reused, and new ones are only created as the number of particles on screen increases.


Adding a particle is simply filling in the attributes of 4 vertices and passing them to the GPU like this:


glBindBuffer(GL_ARRAY_BUFFER, bufferData.quadBuffer);
glBufferSubData(GL_ARRAY_BUFFER, vertexSize * bufferData.count, vertexSize * particleNumToAdd, buff);

And finally, rendering sets some uniforms and then draws each buffer:



particleTemplate->shader->begin();
particleTemplate->shader->setUniform(particleTemplate->uCurrentTime, timer.getTime() - time);

cRenderableWithShader::render(isIdentity, mat, crop);

glActiveTexture(GL_TEXTURE0);

for (int i = textures.size() - 1; i >= 0; i--)
{
    glActiveTexture(GL_TEXTURE0 + i);
    textures[i]->bindTexture();
    particleTemplate->shader->setTexture(i, i);
}

particleTemplate->shader->setViewMatrix(game->getViewMatrix(alignment));

glEnableVertexAttribArray(0);
for (auto& bufferData : quadBuffers)
{
    glBindBuffer(GL_ARRAY_BUFFER, bufferData.quadBuffer);
    for (auto& attribute : particleTemplate->attributes)
    {
        particleTemplate->shader->bindAttribute(attribute.index, particleTemplate->attributeSize, attribute.begin);
    }
    glDrawArrays(GL_QUADS, 0, bufferData.count);
}
glDisableVertexAttribArray(0);
glDisable(GL_TEXTURE_2D);

This part probably has some problems. I probably don't need to enable and disable textures like that, but would it cause performance problems?


My shaders
https://github.com/shultays/bloodworks/blob/master/game/resources/particles/rocketSmoke/shader.vs
https://github.com/shultays/bloodworks/blob/master/game/resources/particles/rocketSmoke/shader.ps

My shaders are probably horrible. Would they make the game crawl on old hardware?

And here is my full particle code. It is a bit messy, though:
https://github.com/shultays/bloodworks/blob/96214ba99e38b1eb90ec8ae8e198af73a675c168/game/source/cParticle.h
https://github.com/shultays/bloodworks/blob/7c33dddcbe826620d0957c7451e20af2ad55a9e4/game/source/cParticle.cpp

cParticle::addParticleInternal creates a new particle and cParticle::render renders them.

Can you spot anything obvious that could cause problems? The reason it works on my PC cannot simply be that my PC is much faster. For example, I created a test scene that adds lots of particles so I could compare it to a regular game.

https://www.youtube.com/watch?v=VsReiuj05_Q

The first part of the video shows how the rocket smoke particles work in the actual game. The most you can get with rockets is about 100-150 buffers, and that makes people's PCs crawl. For the next scene I created a demo that renders 3500 separate buffers, and the FPS is still playable. Would there really be such a difference between old and new PCs?

I don't think it is CPU bound either. Other than the particles, there are not many calculations for rockets.

Silence
01-03-2018, 03:59 AM
Most probably you are making too many draw calls for this unknown hardware.

You could consider reducing the number of draw calls (i.e. keep a single VAO/VBO, use base vertex drawing, or use instancing).
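One possible way to keep a single VBO is to treat it as a ring of vertex slots, so that all live particles can be drawn with at most two glDrawArrays ranges instead of one call per 256-quad buffer. The sketch below only does the CPU-side bookkeeping. It is not taken from the thread's code: `ParticleRing` and `DrawRange` are hypothetical names, and it assumes particles of one template expire in the order they were added (same lifetime), and that capacity and all push/pop sizes are multiples of 4 so quads never straddle the wrap point.

```cpp
#include <cstddef>
#include <vector>

// Arguments for one glDrawArrays(mode, first, count) call.
struct DrawRange { std::size_t first; std::size_t count; };

// Hypothetical ring allocator over ONE large vertex buffer.
class ParticleRing {
public:
    explicit ParticleRing(std::size_t capacityVertices) : capacity_(capacityVertices) {}

    // Reserves n vertices after the newest ones; returns false if the ring is full.
    bool push(std::size_t n) {
        if (live_ + n > capacity_) return false;
        live_ += n;
        return true;
    }

    // Releases the n oldest vertices once their particles have expired.
    void pop(std::size_t n) {
        if (n > live_) n = live_;
        if (n == 0) return;
        tail_ = (tail_ + n) % capacity_;
        live_ -= n;
    }

    // At most two contiguous ranges cover every live vertex.
    std::vector<DrawRange> drawRanges() const {
        std::vector<DrawRange> ranges;
        if (live_ == 0) return ranges;
        const std::size_t untilEnd = capacity_ - tail_;
        if (live_ <= untilEnd) {
            ranges.push_back({tail_, live_});
        } else {
            ranges.push_back({tail_, untilEnd});       // from the tail to the end of the buffer
            ranges.push_back({0, live_ - untilEnd});   // wrapped part at the start
        }
        return ranges;
    }

private:
    std::size_t capacity_;
    std::size_t tail_ = 0;  // index of the oldest live vertex
    std::size_t live_ = 0;  // number of live vertices
};
```

Each returned range would map to one draw call on the bound buffer. An upload that wraps past the end of the buffer would need to be split into two writes in the same way.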

shultays
01-03-2018, 05:09 AM
Hmm, but if my analysis was correct, that was 100-150 extra draw calls when particles are on screen. I am pretty sure I make more than that for non-particle draws.

What is a reasonable number of draw calls if I want to target older hardware too?

Dark Photon
01-03-2018, 05:59 AM
I would try to correlate the performance (frame time consumption, in milliseconds) you're seeing with something about your particle system. For instance, the number of fragments touched, the number of vertices transformed, the number of buffers updated, the number of batches drawn, etc.

When testing, disable everything else so you can focus on particle system performance. Also (again, for testing only), at the beginning of each frame and after submitting the draws for all particles, do a glFinish() so you can get consistent frame timings (in ms) for all the time it takes to submit and execute your draw work.
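The timing procedure described above could look roughly like the following. This is a sketch, not a definitive profiler: `timeSectionMs` is a hypothetical helper, and the two callbacks stand in for glFinish() and for the particle draw submission so that the code compiles without a GL context.

```cpp
#include <chrono>
#include <functional>

// Returns the wall-clock milliseconds spent submitting AND executing one batch of work.
// `finish` stands in for glFinish(); `submit` stands in for the particle draw calls.
inline double timeSectionMs(const std::function<void()>& finish,
                            const std::function<void()>& submit) {
    finish();  // drain work queued earlier so it is not billed to this section
    const auto start = std::chrono::steady_clock::now();
    submit();
    finish();  // block until the submitted work has actually completed
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

In a real frame this might be called as `timeSectionMs([] { glFinish(); }, [&] { particles.render(); })`, where `particles.render()` is a placeholder for whatever issues the draws. The glFinish() calls stall the pipeline, so this is for testing only.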

When I hear particle system perf problem, the two most likely candidates that come to my mind (particularly on old hardware) are: 1) fill (fragments blended) 2) inefficient buffer updates (which can cause implicit synchronization). For #1, if you dynamically size your frustum to fit the window/viewport, try resizing your window and see what that does to performance.

shultays
01-03-2018, 01:25 PM
I finally found a friend to help debug things, and I timed things with him. Adding new particles is more problematic than rendering, but rendering gets slow as well.

I should probably delay appending vertices to the buffer until it is time to render. Currently each new particle calls glBufferSubData once to update the buffer. Batching the vertices together and uploading them in a single call should greatly improve the performance.
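The batching idea could be sketched like this: new quads are appended to a CPU-side vector and uploaded once per frame. This is a minimal sketch under assumptions, not the code from the repository: `StagedParticleBuffer` and `Upload` are hypothetical names, the upload callback stands in for a glBufferSubData call on the bound buffer, and the capacity check against the 256-quad limit is left out.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Stand-in for the real GL call, e.g.
//   glBufferSubData(GL_ARRAY_BUFFER, byteOffset, byteCount, data);
using Upload = std::function<void(std::size_t byteOffset, std::size_t byteCount, const float* data)>;

// Hypothetical CPU-side staging area for one particle buffer.
class StagedParticleBuffer {
public:
    StagedParticleBuffer(std::size_t floatsPerVertex, Upload upload)
        : floatsPerVertex_(floatsPerVertex), upload_(std::move(upload)) {}

    // Called whenever a particle spawns: copies 4 vertices into CPU memory only.
    void addQuad(const float* vertices) {
        pending_.insert(pending_.end(), vertices, vertices + 4 * floatsPerVertex_);
    }

    // Called once per frame before drawing: one upload for all pending quads.
    void flush() {
        if (pending_.empty()) return;
        const std::size_t byteOffset = uploadedFloats_ * sizeof(float);
        upload_(byteOffset, pending_.size() * sizeof(float), pending_.data());
        uploadedFloats_ += pending_.size();
        pending_.clear();
    }

    // Number of vertices already on the GPU side, usable as the draw count.
    std::size_t uploadedVertices() const { return uploadedFloats_ / floatsPerVertex_; }

private:
    std::size_t floatsPerVertex_;
    Upload upload_;
    std::vector<float> pending_;
    std::size_t uploadedFloats_ = 0;
};
```

With this shape, spawning many particles in one frame costs a single buffer update instead of one per particle.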

I am not sure how to improve rendering, but maybe fixing the particle-adding code will fix things.

OceanJeff40
01-07-2018, 12:06 PM
Your particles look so small. Have you considered adding them as plain points and using point sprites?

I have made some videos, and am continuing to make more, on particle systems (one of my favorite things to play with!) that use one point per particle. In the vertex shader, gl_PointSize is used to increase the size (it only yields a square), and gl_PointCoord is then used in the fragment shader to map across the expanded point (square).

Here's a link to my video channel:

https://www.youtube.com/channel/UCzx8alrxVELz5h1dfCdkdfg?view_as=subscriber

Here's one on instancing:

https://www.youtube.com/watch?v=BxkPiID_M9g

And here's one on using gl_PointCoord, there was an interesting fix to enable it, so I documented it in video:

https://www.youtube.com/watch?v=X1sCoPxJJW8

Also, if you want me to research something, just ask and I will give it a try. I'm focusing on a 2D top-down / 2D platformer combo right now, to "prove myself" and give myself a project to focus on continually, and then I will add a 3D release under my belt as a side project shortly thereafter (sometime this year).

Hope this helps,

Jeff