Hello everyone, this is going to be a vague question, and I would rather not ask it, but I am kind of desperate.

For my game, I created a particle system. On my PC it is quite fast: I can increase the number of particles to huge amounts and the game still runs smoothly. However, on some PCs, people complain that the game slows down even with a trivial number of particles compared to what I run on my PC.

I am inexperienced with OpenGL and have no idea what the problem might be. I wish I had another PC to debug on, but I don't. If you can spot something weird in my system, please tell me.

It is basically a set of GL_DYNAMIC_DRAW buffer objects. Each buffer is filled as new particles are added, and all of its particles are rendered in a single call. When a buffer is full, the system fetches another one. Particles also have a lifetime, so once the system decides that the last particle added to a buffer has died, it marks that buffer as free and reuses it later. Buffers are shared between everything, so I don't think creating new buffers is an issue.
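
To make the fill, free, and reuse cycle concrete, here is a simplified CPU-side sketch of the scheme described above. It is not the actual code from the repo: `QuadBuffer`, `BufferPool`, and all member names are hypothetical, and the GL calls are replaced by a plain counter so the logic can be read on its own.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one GL buffer object; in a real system `id`
// would be the name returned by glGenBuffers.
struct QuadBuffer {
    unsigned id = 0;             // placeholder for the GL buffer name
    std::size_t count = 0;       // quads written so far
    float lastDeathTime = 0.0f;  // time at which the newest particle expires
};

class BufferPool {
public:
    explicit BufferPool(std::size_t maxQuads) : maxQuads_(maxQuads) {}

    // Reuse a free buffer if one exists; otherwise create a new one.
    QuadBuffer acquire() {
        if (!free_.empty()) {
            QuadBuffer b = free_.back();
            free_.pop_back();
            b.count = 0;
            b.lastDeathTime = 0.0f;
            return b;
        }
        QuadBuffer b;
        b.id = ++created_;  // real code would call glGenBuffers/glBufferData here
        return b;
    }

    // A buffer is full once it holds maxQuads quads.
    bool isFull(const QuadBuffer& b) const { return b.count >= maxQuads_; }

    // Return the buffer to the pool once its newest particle has died.
    // Returns true if the buffer was released.
    bool releaseIfDead(const QuadBuffer& b, float now) {
        if (now < b.lastDeathTime) return false;
        free_.push_back(b);
        return true;
    }

    // Number of buffers ever created (as opposed to reused).
    std::size_t createdCount() const { return created_; }

private:
    std::size_t maxQuads_;
    unsigned created_ = 0;
    std::vector<QuadBuffer> free_;
};
```

With this kind of pool, the number of buffers created only grows when more particles are alive at once than the existing buffers can hold, which is the behaviour I am relying on.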

My buffers hold 256 quads each, and the buffer size depends on the vertex size (for example, the smoke particle has 6 floats and 3 vec2s). Is this too big? Is there a performance hit on some hardware for using such big vertices?
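
For a sense of scale, here is the arithmetic for the smoke particle as a small sketch. It assumes the 6 floats and 3 vec2 are per vertex, that a float is 4 bytes, and that each quad uses 4 vertices; the constant and function names are made up for illustration.

```cpp
#include <cstddef>

// Assumed layout for the smoke particle: 6 scalar floats plus 3 vec2 per vertex.
constexpr std::size_t kScalarFloats = 6;
constexpr std::size_t kVec2Count = 3;
constexpr std::size_t kVerticesPerQuad = 4;
constexpr std::size_t kMaxQuad = 256;  // quads per buffer, as stated above

// Bytes per vertex: each vec2 contributes two floats.
constexpr std::size_t vertexSizeBytes() {
    return (kScalarFloats + kVec2Count * 2) * sizeof(float);
}

// Bytes per buffer: vertex size * 4 vertices per quad * quads per buffer.
constexpr std::size_t bufferSizeBytes() {
    return vertexSizeBytes() * kVerticesPerQuad * kMaxQuad;
}
```

Under those assumptions a vertex is 48 bytes and one buffer is 49,152 bytes (48 KiB).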

Here is how I create my buffers:

Code :
glGenBuffers(1, &bufferData.quadBuffer);
glBindBuffer(GL_ARRAY_BUFFER, bufferData.quadBuffer);
glBufferData(GL_ARRAY_BUFFER, particleTemplate->attributeSize * 4 * MAX_QUAD, NULL, GL_DYNAMIC_DRAW);

But as I said, these buffers are reused, and new ones are only created as the number of particles on screen increases.


Adding a particle is simply filling in the attributes of 4 vertices and passing them to the GPU like this:

Code :
glBindBuffer(GL_ARRAY_BUFFER, bufferData.quadBuffer);
glBufferSubData(GL_ARRAY_BUFFER, vertexSize * bufferData.count, vertexSize * particleNumToAdd, buff);

And finally, rendering is setting some uniforms and then drawing each buffer:

Code :
particleTemplate->shader->begin();
particleTemplate->shader->setUniform(particleTemplate->uCurrentTime, timer.getTime() - time);
 
cRenderableWithShader::render(isIdentity, mat, crop);
 
glActiveTexture(GL_TEXTURE0);
 
for (int i = textures.size() - 1; i >= 0; i--)
{
	glActiveTexture(GL_TEXTURE0 + i);
	textures[i]->bindTexture();
	particleTemplate->shader->setTexture(i, i);
}
 
particleTemplate->shader->setViewMatrix(game->getViewMatrix(alignment));
 
glEnableVertexAttribArray(0);
for (auto& bufferData : quadBuffers)
{
	glBindBuffer(GL_ARRAY_BUFFER, bufferData.quadBuffer);
	for (auto& attribute : particleTemplate->attributes)
	{
		particleTemplate->shader->bindAttribute(attribute.index, particleTemplate->attributeSize, attribute.begin);
	}
	glDrawArrays(GL_QUADS, 0, bufferData.count);
}
glDisableVertexAttribArray(0);
glDisable(GL_TEXTURE_2D);

This part probably has some problems. I probably don't need to enable and disable textures like that, but would it cause performance problems?


My shaders
https://github.com/shultays/bloodwor...moke/shader.vs
https://github.com/shultays/bloodwor...moke/shader.ps

My shaders are probably horrible. Would they make the game crawl on old hardware?

And here is my full particle code; it is a bit messy, though:
https://github.com/shultays/bloodwor...ce/cParticle.h
https://github.com/shultays/bloodwor.../cParticle.cpp

cParticle::addParticleInternal creates a new particle and cParticle::render renders them

Can you spot anything obvious that could cause problems? The reason it works on my PC cannot simply be that my PC is much faster. For example, I created a test scene that adds lots of particles, to compare against the regular game.

https://www.youtube.com/watch?v=VsReiuj05_Q

In the first part of the video, you can see how the rocket smoke particles work in the actual game. The most you can get with rockets is around 100-150 buffers, and that makes people's PCs crawl. In the next scene, I created a demo that renders 3500 separate buffers, and the FPS is still playable. Would old and new PCs really differ that much?
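
To put those two scenes side by side, here is a rough upper-bound calculation. It assumes every buffer is completely full (256 quads) and that each buffer costs one draw call per frame; real buffers are often only partly filled, so the actual quad counts are lower. The names are hypothetical.

```cpp
#include <cstddef>

constexpr std::size_t kQuadsPerBuffer = 256;

// Upper bound on quads drawn per frame if every buffer were full.
constexpr std::size_t maxQuads(std::size_t bufferCount) {
    return bufferCount * kQuadsPerBuffer;
}

// One draw call is issued per buffer in the render loop.
constexpr std::size_t drawCalls(std::size_t bufferCount) {
    return bufferCount;
}
```

By that estimate the in-game case tops out at about 150 draw calls and 38,400 quads per frame, while the demo is at 3500 draw calls and up to 896,000 quads.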

I don't think it is CPU bound either; other than the particles, there are not many calculations for rockets.