Hi all,
I am working on a cloth simulation and I have two versions:
- render to vertex buffer (R2VB), which uses a fragment shader for the Verlet integration, and
- transform feedback (TF), which uses a vertex shader for the Verlet integration.
I compared the performance of the two on a 2D mesh grid ranging from 64x64 to 2048x2048. For small meshes (64x64 and 128x128) TF is faster, by roughly 1.7-4x going by the numbers below, but from 256x256 upward R2VB is faster than TF, by roughly 1.1-1.6x. Here are the stats from my NVIDIA Quadro FX 5800. All times are in milliseconds per frame, measured with a timer query as detailed below.
+------------+-------------+---------------+
| Grid size  | R2VB (ms)   | TF (ms)       |
+------------+-------------+---------------+
|  64 x 64   | 0.370-0.376 |  0.088-0.090  |
+------------+-------------+---------------+
| 128 x 128  | 0.403-0.431 |  0.238-0.240  |
+------------+-------------+---------------+
| 256 x 256  | 0.713-0.758 |  0.804-0.806  |
+------------+-------------+---------------+
| 512 x 512  | 2.100-2.308 |  3.090-3.096  |
+------------+-------------+---------------+
|1024 x 1024 | 7.670-9.250 | 12.205-12.209 |
+------------+-------------+---------------+
|2048 x 2048 |31.800-32.39 | 48.240-48.560 |
+------------+-------------+---------------+
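To make the crossover easier to see, here is a minimal sketch in plain C (no OpenGL needed) that turns the table above into a TF/R2VB ratio and a cost per grid point. The struct and function names are made up for illustration, and using the midpoint of each reported range is my own assumption, not something the measurements dictate.

```c
#include <stdio.h>

/* One row of the timing table: the grid is n x n points, and each
   method has a reported range in milliseconds per frame. */
typedef struct {
    int    n;
    double r2vb_lo, r2vb_hi;
    double tf_lo,   tf_hi;
} TimingRow;

/* Values copied from the table above. */
const TimingRow timing_rows[] = {
    {   64,  0.370,  0.376,  0.088,  0.090 },
    {  128,  0.403,  0.431,  0.238,  0.240 },
    {  256,  0.713,  0.758,  0.804,  0.806 },
    {  512,  2.100,  2.308,  3.090,  3.096 },
    { 1024,  7.670,  9.250, 12.205, 12.209 },
    { 2048, 31.800, 32.390, 48.240, 48.560 },
};
#define TIMING_ROW_COUNT (sizeof timing_rows / sizeof timing_rows[0])

/* Assumption: represent each reported range by its midpoint. */
double midpoint(double lo, double hi)
{
    return 0.5 * (lo + hi);
}

/* Ratio of TF time to R2VB time: below 1 means TF is faster,
   above 1 means R2VB is faster. */
double tf_over_r2vb(const TimingRow *row)
{
    return midpoint(row->tf_lo, row->tf_hi) /
           midpoint(row->r2vb_lo, row->r2vb_hi);
}

/* Cost per grid point in nanoseconds for an n x n grid. */
double ns_per_point(double ms, int n)
{
    return ms * 1.0e6 / ((double)n * (double)n);
}

/* Print one summary line per grid size. */
void print_timing_summary(void)
{
    size_t i;
    for (i = 0; i < TIMING_ROW_COUNT; ++i) {
        const TimingRow *row = &timing_rows[i];
        double r2vb = midpoint(row->r2vb_lo, row->r2vb_hi);
        double tf   = midpoint(row->tf_lo, row->tf_hi);
        printf("%4d x %-4d  TF/R2VB = %.2f  R2VB %.1f ns/pt  TF %.1f ns/pt\n",
               row->n, row->n, tf_over_r2vb(row),
               ns_per_point(r2vb, row->n), ns_per_point(tf, row->n));
    }
}
```

Calling print_timing_summary() from a main() prints one line per grid size. Looking at the per-point cost rather than the raw frame time may help separate a fixed per-frame overhead from the cost that actually scales with the mesh.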
Is this the expected result, or am I doing something wrong in my timing calculation?
This is how I calculate the times:
- For R2VB:
    glBeginQuery(GL_TIME_ELAPSED, t_query);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fboID[writeID]);
    // bind verlet integration fragment shader
    // draw full screen quad
    glFlush();
    // read back the results into the VBO
    glBindFramebuffer(GL_READ_FRAMEBUFFER, fboID[readID]);
    glReadBuffer(GL_COLOR_ATTACHMENT0);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, vboID);
    glReadPixels(0, 0, texture_size_x, texture_size_y, GL_RGBA, GL_FLOAT, 0);
    glFlush();
    glFinish();
    glEndQuery(GL_TIME_ELAPSED);
    // get the elapsed time
    glGetQueryObjectui64v(t_query, GL_QUERY_RESULT, &elapsed_time);
- For TF:
    glBeginQuery(GL_TIME_ELAPSED, t_query);
    glBeginTransformFeedback(GL_POINTS);
    glDrawArrays(GL_POINTS, 0, total_points);
    glEndTransformFeedback();
    glFlush();
    glEndQuery(GL_TIME_ELAPSED);
    // get the elapsed time
    glGetQueryObjectui64v(t_query, GL_QUERY_RESULT, &elapsed_time);
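For completeness, GL_TIME_ELAPSED results come back in nanoseconds, so the value in elapsed_time has to be divided by 1e6 to get milliseconds. Below is a rough sketch of how the per-frame min-max ranges could be collected from those values. FrameTimeRange and the range_* functions are hypothetical helper names, and uint64_t stands in for GLuint64 so the snippet compiles without the GL headers.

```c
#include <stdint.h>
#include <stdio.h>

/* Running min/max of per-frame GPU times, in milliseconds. */
typedef struct {
    double   min_ms;
    double   max_ms;
    unsigned frames;
} FrameTimeRange;

/* GL_TIME_ELAPSED query results are reported in nanoseconds. */
double ns_to_ms(uint64_t elapsed_ns)
{
    return (double)elapsed_ns / 1.0e6;
}

/* Reset the range before the first frame. */
void range_init(FrameTimeRange *r)
{
    r->min_ms = 0.0;
    r->max_ms = 0.0;
    r->frames = 0;
}

/* Fold one frame's query result (in nanoseconds) into the range. */
void range_add(FrameTimeRange *r, uint64_t elapsed_ns)
{
    double ms = ns_to_ms(elapsed_ns);
    if (r->frames == 0 || ms < r->min_ms) r->min_ms = ms;
    if (r->frames == 0 || ms > r->max_ms) r->max_ms = ms;
    r->frames++;
}

/* Print the range in "min-max" form with three decimals. */
void range_print(const FrameTimeRange *r, const char *label)
{
    printf("%s: %.3f-%.3f ms over %u frames\n",
           label, r->min_ms, r->max_ms, r->frames);
}
```

In the render loop this would be used as range_add(&tf_range, elapsed_time) right after the glGetQueryObjectui64v call, with range_print called once at the end of the run.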
I was expecting TF to be the better option since it does not involve any readback, but for the larger grids the results show the opposite. Any ideas?