It should work faster ...

Hello.

I ran into a problem while writing my particle engine. When I had written everything (it looks great, believe me) I wanted to test it, so I set up an explosion with 1000 particles. It looks nice, but the frame rate falls from 100 fps to 20. I don’t think the cause is an inefficient algorithm, because the program slows down only at the beginning of the explosion, that is, when all the particles are close to each other and blended together. Once they have spread across the whole screen, everything works just fine. Do you have any ideas for solving this problem? If I was not clear, just let me know. I can post the demo so you can see what it looks like.

Thanks in advance!

Sounds like a fillrate problem maybe related to memory read/write stalls and/or failing early z-rejects…

Originally posted by HS:
Sounds like a fillrate problem maybe related to memory read/write stalls and/or failing early z-rejects…

Well, I think it has something to do with blending the particles. When the computer has to draw 1000 small particles in the same place, it has to do the blending calculation for every one of them, am I right? But after a few of them the pixels in that place have already reached their maximum value, so the computer shouldn’t have to blend any more particles there. I’m not really sure about that, though…
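To make the saturation idea concrete, here is a minimal CPU-side sketch of additive blending on a single colour channel, assuming the usual `dst = clamp(dst + src * alpha)` behaviour of `glBlendFunc(GL_SRC_ALPHA, GL_ONE)`. The names `BlendResult` and `blendAdditive` are made up for illustration. It only models the arithmetic, and it assumes the hardware does not skip work for pixels that are already at their maximum.

```cpp
#include <algorithm>

// Outcome of blending `count` identical fragments onto one colour channel.
struct BlendResult {
    float value;     // final channel value, clamped to [0, 1]
    int   blendOps;  // number of read-modify-write operations performed
};

// Hypothetical helper: models additive blending, dst = min(1, dst + src * alpha).
BlendResult blendAdditive(float dst, float src, float alpha, int count)
{
    BlendResult r{dst, 0};
    for (int i = 0; i < count; ++i) {
        // Assumption: there is no "already saturated" shortcut, so every
        // fragment still reads the framebuffer, adds, clamps, and writes back.
        r.value = std::min(1.0f, r.value + src * alpha);
        ++r.blendOps;
    }
    return r;
}
```

Under that assumption, calling `blendAdditive(0.0f, 1.0f, 0.5f, 1000)` reaches the maximum value after two fragments but still counts 1000 blend operations, so saturation by itself would not save any work.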

Thanks!

Orzech

If you blend 1000 particles at (almost) the same place, there may be hardware limitations because the texture units get stalled by pending read/write operations to that part of the framebuffer (which would be an extreme case of a bandwidth limitation rather than a fillrate limitation).

Since I am not a hardware engineer, that may be complete nonsense, but it’s the only (sort of) logical explanation that comes to mind.

What 3d card are we talking about btw?

You are wrong in guessing that blending is the problem. Every particle is rendered, and every particle has a translucent alpha (so that it is blended with the background); that’s it. It does not matter at all where the particles are (the only exception is when particles are so close to the viewer that they nearly fill the whole screen, which is then a fillrate problem). I just implemented volumetric clouds, and 5 clouds with 5000 particles each still run at about 30 fps (with the ground also being drawn), where before I had about 60 fps (without any clouds). So something else REALLY has to be wrong. I would rather guess that the depth-sorting of the particles (do you even have that?) could be a problem. Do you have alpha test enabled? Or are your particles very large with only a small texture on them (a fillrate problem)?

Jan

Well, the theory about the texture unit getting stalled by always writing to the same part of the framebuffer might be true, but surely not so dramatically that the frame rate drops to 10 fps with only 1000 particles… but… what hardware are you using?

To test the blending theory, I changed my cloud engine to draw every particle at the same point… and, strangely, the performance does not suffer at all; it even gets a little faster (but only by about 1 fps). So there are 5 clouds with 5 particles each, which are all at the same point (for each cloud), and performance does NOT drop… so blending is certainly not the problem. Cache, rather?

Jan

My particles have depth sorting (taps myself on the shoulder) and I still get 30 fps…

Hello!
Thanks for your help, guys. Sorry for the delay, I haven’t had much time lately. I’ll try to describe my problem more precisely now.
Hardware: GeForce 2 MX 32 (if it’s important), Athlon 1200
Well, I don’t use alpha testing (what is it for, really?). Depth-sorting isn’t included in my program either (but from what I know, it wouldn’t make the program run faster). And finally, my textures are rather small, just 32x32.
Like I said, the problem appears only when the particles are close to each other. I get 90 fps once they have spread across the whole screen.

I’m waiting for your advice.

Great thanks for your help!!!

Orzech

Depth-sorting obviously makes things slower rather than faster.

Alpha test means that fragments whose alpha value is greater or smaller than a certain reference value are discarded from further processing, which makes things faster (and allows us to use z-buffering together with alpha blending). But this is rather useless for particle systems.
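As a rough illustration of what the alpha test decides, the sketch below mirrors the comparison made by `glAlphaFunc(GL_GREATER, ref)` on the CPU. The helper names `passesAlphaTest` and `countSurviving` are hypothetical and exist only for this example; in a real program the test is done by the hardware after `glEnable(GL_ALPHA_TEST)`.

```cpp
// Mirrors glAlphaFunc(GL_GREATER, ref): a fragment survives only if its
// alpha is strictly greater than the reference value.
bool passesAlphaTest(float alpha, float ref)
{
    return alpha > ref;
}

// Hypothetical helper: counts how many of `n` fragments would pass the test
// and therefore go on to the depth test and the blend stage.
int countSurviving(const float* alphas, int n, float ref)
{
    int survivors = 0;
    for (int i = 0; i < n; ++i) {
        if (passesAlphaTest(alphas[i], ref)) {
            ++survivors;
        }
    }
    return survivors;
}
```

For a soft particle texture most texels have a small but non-zero alpha, so a low reference value discards only a few of them, which fits the remark that the test helps little with particle systems.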

Are the particles closer to the viewer (which means larger) when they are close together? That would be a fillrate problem then.
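A back-of-the-envelope way to check the fillrate question is to count the fragments the card has to blend per frame. The sketch below assumes square, screen-aligned particles and ignores clipping; `blendedFragmentsPerFrame` and `fragmentsPerSecond` are hypothetical helpers, and the projected edge length in pixels is something you would have to measure or estimate for your own scene.

```cpp
// Fragments touched per frame by `count` square particles whose projected
// edge is `edgePixels` pixels long (assumption: no clipping, no culling).
long long blendedFragmentsPerFrame(int count, int edgePixels)
{
    return static_cast<long long>(count) * edgePixels * edgePixels;
}

// Fragments per second at a given frame rate. With blending enabled, each
// fragment costs a framebuffer read as well as a write.
double fragmentsPerSecond(int count, int edgePixels, double fps)
{
    return static_cast<double>(blendedFragmentsPerFrame(count, edgePixels)) * fps;
}
```

For example, 1000 particles that each cover 32x32 pixels give about one million blended fragments per frame, while the same particles at 256x256 pixels give about 65 million, so the on-screen size matters far more than the particle count.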

But maybe it is in fact a bandwidth problem, and that, together with your rather slow graphics board, causes the performance loss.

Jan

Originally posted by JanHH:
But maybe it is in fact a bandwidth problem, and that, together with your rather slow graphics board, causes the performance loss.

Jan

Hey, my graphics card works fine!
Still, I don’t know what causes the performance “jumps”. At first I thought my algorithm was just too slow, but there isn’t much computation in it. Hmmm…

Thanks

Orzech

I guess it would help to have a look at the code.

Originally posted by JanHH:
I guess it would help to have a look at the code.

OK. I’ll post the drawing part (I think that’s where the problem is), but not right now. I’ll try to do it today, though.

Thanks

Orzech

So here is my code. If you find anything “suspicious”, let me know.
To be honest, I’m now sure that it is a problem with my OpenGL
implementation. I don’t know precisely how OpenGL uses the framebuffer,
but it seems that repeatedly writing colors to the same position causes
a loss of performance. For example, if the IsRotating flag is set to true,
the program runs faster because the particles are rotating and they don’t accumulate.
I’m not sure if there is any simple way to make things better.

Here are the 3 basic functions which draw and update the particles,
move them, and create display lists for them.

<CODE>

void PARTICLE::SolveSimple(float dt)
{
    // One Euler integration step: update the velocity, then the position

    V   = V + a * dt;
    Pos = Pos + V * dt;
}

void PARTICLE::Live(float dt)
{
    // Drawing: position the particle, set its color, and call its display list

    glPushMatrix();

    glTranslatef(Pos.x, Pos.y, Pos.z);

    // Energy is used as the alpha value, so the particle fades as it dies
    GLfloat Kolor[] = {R, G, B, Energy};

    glColor4fv(Kolor);

    if (IsRotating)
    {
        glRotatef(CurRotX, 1, 0, 0);
        glRotatef(CurRotY, 0, 1, 0);
        glRotatef(CurRotZ, 0, 0, 1);

        CurRotX += RotX;
        CurRotY += RotY;
        CurRotZ += RotZ;
    }

    glCallList(ListID);

    glPopMatrix();

    // Updating current particle

    Energy -= DyingSpeed * dt;

    SolveSimple(dt);
}

GLuint CreateParticle(TEXTURE tekstura)
{
    // Creates a display list for a specified texture to use in particles

    GLuint lista;

    lista = glGenLists(1);

    // These calls run immediately; they are not recorded in the display list
    glLoadIdentity();

    glBindTexture(GL_TEXTURE_2D, tekstura);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE);

    glNewList(lista, GL_COMPILE);

    // Everything up to glEndList() is replayed on every glCallList()
    glDisable(GL_DEPTH_TEST);

    glBegin(GL_TRIANGLE_STRIP);
        glTexCoord2f(0, 1); glVertex2f(-1, 1);
        glTexCoord2f(0, 0); glVertex2f(-1, -1);
        glTexCoord2f(1, 1); glVertex2f(1, 1);
        glTexCoord2f(1, 0); glVertex2f(1, -1);
    glEnd();

    glEnable(GL_DEPTH_TEST);

    glEndList();

    return lista;
}

</CODE>
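For readers who want to try the update step from SolveSimple outside the engine, here is a self-contained sketch. It uses the same update order as the code above (velocity first, then position with the new velocity), which is often called semi-implicit Euler. `Vec3`, `Body`, and `stepSemiImplicitEuler` are stand-in names invented for this example, not the engine's own types.

```cpp
// Minimal stand-in vector type (assumption: the engine has its own).
struct Vec3 {
    float x, y, z;
};

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

// Hypothetical particle state: position, velocity, constant acceleration.
struct Body {
    Vec3 pos;
    Vec3 vel;
    Vec3 acc;
};

// One time step: the velocity is updated first, and the position then uses
// the already-updated velocity, as in SolveSimple.
void stepSemiImplicitEuler(Body& b, float dt)
{
    b.vel = b.vel + b.acc * dt;
    b.pos = b.pos + b.vel * dt;
}
```

Calling `stepSemiImplicitEuler` once per frame with the frame time as `dt` reproduces the motion; none of this touches OpenGL, so it cannot be the source of a slowdown that depends on where the particles are on screen.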

Thanks

See you,

Orzech

What I found “suspicious” is that you disable and enable the depth test inside the display list… so if you draw thousands of particles, you make these calls thousands of times, which does not make any sense. I would rather call it once and then draw all the particles, or better, not disable depth testing at all but make the depth buffer read-only with glDepthMask(GL_FALSE)… otherwise your particles will mess up the other things in your scene (if there are any). Also, I guess that your particle system will look better if you render the particles sorted in back-to-front order.
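The back-to-front ordering mentioned above can be sketched like this, assuming each particle has a position and the camera position is known. `SortableParticle`, `distanceSquared`, and `sortBackToFront` are hypothetical names for illustration only.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical minimal particle: only the position matters for sorting.
struct SortableParticle {
    float x, y, z;
};

// Squared distance to the camera; the square root is not needed for ordering.
float distanceSquared(const SortableParticle& p, float cx, float cy, float cz)
{
    const float dx = p.x - cx;
    const float dy = p.y - cy;
    const float dz = p.z - cz;
    return dx * dx + dy * dy + dz * dz;
}

// Sorts so that the particle farthest from the camera is drawn first.
void sortBackToFront(std::vector<SortableParticle>& particles,
                     float cx, float cy, float cz)
{
    std::sort(particles.begin(), particles.end(),
              [=](const SortableParticle& a, const SortableParticle& b) {
                  return distanceSquared(a, cx, cy, cz) >
                         distanceSquared(b, cx, cy, cz);
              });
}
```

Note that with the additive glBlendFunc(GL_SRC_ALPHA, GL_ONE) used in the posted code the result should be essentially independent of draw order, so the sort mainly pays off with blend functions such as GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA.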

But I really guess that the performance loss comes from what you said, so I think the only solution is to get a better graphics card. Also, there is an extension for rendering sprites more efficiently, NV_point_sprite, which is supported from the GeForce4 Ti 4200 on.

Originally posted by JanHH:
But I really guess that the performance loss comes from what you said, so I think the only solution is to get a better graphics card. Also, there is an extension for rendering sprites more efficiently, NV_point_sprite, which is supported from the GeForce4 Ti 4200 on.

Unfortunately, I’ll stay with my old gf2.

Thanks for your help.

Orzech