Shader complexity and glBegin/glEnd execution time

Hi.

I have a question about the relationship between shader complexity and the execution time of glBegin/glEnd.

I wrote a long GLSL shader program and a short GLSL shader program, and measured the rendering time.
As a result, glBegin/glEnd takes longer with the long shader than with the short shader.
I thought glBegin/glEnd only issues rendering commands to the GPU,
so I don't understand why the time spent in glBegin/glEnd should increase.

(The same tendency shows up when I use GL_VERTEX_ARRAY.
I use a GeForce 7800 GTX on Vine Linux 3.2, and a GeForce 6800 GT on Vine Linux 3.2 / Windows XP.)

Is this expected OpenGL behaviour, or am I doing something wrong?

Can you describe how you are measuring the time difference? (where you are placing your counter calls etc and what timer you are using?)

What leads you to believe that it is glBegin/glEnd that is using the CPU time?

How much geometry are you rendering?

What happens if you render a LOT more geometry (if you are testing with a trivial case)?

What happens if you use VBOs with the standard glDraw calls? (glBegin/glEnd is not recommended for heavy geometry throughput; see the rough sketch at the end of this post.)

Finally, can you describe why you are measuring this?
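
By using VBOs with the standard draw calls I mean roughly the following (untested sketch, just for illustration: the quad data is made up, and older drivers may only expose the ARB-suffixed buffer entry points):

/* Rough VBO sketch: upload the quad once, then draw it with glDrawArrays
   instead of glBegin/glEnd. Buffer entry points are core since GL 1.5. */
GLuint vbo;
const GLfloat quad[12] = {
	-1.0f, -1.0f, -0.5f,
	 1.0f, -1.0f, -0.5f,
	 1.0f,  1.0f, -0.5f,
	-1.0f,  1.0f, -0.5f
};

glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(quad), quad, GL_STATIC_DRAW);

glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, (const GLvoid *)0);	/* offset into the bound VBO */
glDrawArrays(GL_QUADS, 0, 4);

glDisableClientState(GL_VERTEX_ARRAY);
glBindBuffer(GL_ARRAY_BUFFER, 0);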

Thanks for the reply.

I measure the execution time as follows:

 
	glFinish();	/* make sure previously issued GL work is done before starting the timer */
#ifdef _WIN32
	QueryPerformanceCounter(&liBeforeDraw);	/* liFreq was obtained earlier with QueryPerformanceFrequency() */
#endif
#ifdef _LINUX
	gettimeofday(&tBeforeDraw, &tz);
#endif

#if USE_VERTEX_ARRAY == 0
	{
		/* immediate mode: one quad */
		glBegin(GL_QUADS);
		glVertex3f(-1.0f, -1.0f, -0.5f);	glVertex3f( 1.0f, -1.0f, -0.5f);
		glVertex3f( 1.0f,  1.0f, -0.5f);	glVertex3f(-1.0f,  1.0f, -0.5f);
		glEnd();
	}
#endif
#if USE_VERTEX_ARRAY == 1
	{
		/* client-side vertex array: the same quad via glDrawElements */
		float vp[12];
		GLubyte indices[] = { 0, 1, 2, 3 };
		vp[0]	= -1.0f;	vp[1]	= -1.0f;	vp[2]	= -0.5f;
		vp[3]	=  1.0f;	vp[4]	= -1.0f;	vp[5]	= -0.5f;
		vp[6]	=  1.0f;	vp[7]	=  1.0f;	vp[8]	= -0.5f;
		vp[9]	= -1.0f;	vp[10]	=  1.0f;	vp[11]	= -0.5f;
		glEnableClientState(GL_VERTEX_ARRAY);
		glVertexPointer(3, GL_FLOAT, 0, vp);
		glDrawElements(GL_QUADS, 4, GL_UNSIGNED_BYTE, indices);
	}
#endif

#ifdef _WIN32
	QueryPerformanceCounter(&liAfterDraw);
	dTmpSec		= (double)(liAfterDraw.QuadPart - liBeforeDraw.QuadPart)/(double)liFreq.QuadPart;
#endif
#ifdef _LINUX
	gettimeofday(&tAfterDraw, &tz);
	dTmpSec		= (tAfterDraw.tv_sec + (double)tAfterDraw.tv_usec*1.0e-6) - (tBeforeDraw.tv_sec + (double)tBeforeDraw.tv_usec*1.0e-6);
#endif
 

I believe that glBegin/glEnd (or glEnableClientState ~ glDrawElements) does not wait for the GPU to finish rendering, yet the call itself still takes several seconds.

On the other hand, because I also use an FBO and other features, isolating the problem is not easy.

What does seem certain is that the call to glBegin/glEnd itself takes several seconds (often 2~3 s, and over 10 s when the shader complexity and the render size are very large) when I use the long shader program (whose rendering takes about 10 s), even though these functions should only issue commands to the GPU.

The reason I want to measure these times is that I want to do some CPU work while the GPU is executing the shader program.
Right now the call to glBegin/glEnd takes 2~3 s, and the CPU could be doing useful work during those seconds.
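
What I am hoping for is roughly the following overlap (just a sketch of the idea; longShaderProgram and DoSomeCPUWork() are placeholders for my own program object and CPU-side job):

/* Intended overlap (sketch): the draw call should only queue work for the GPU,
   so the CPU can do other jobs until we explicitly synchronize. */
glUseProgram(longShaderProgram);	/* placeholder: my heavy GLSL program */

glBegin(GL_QUADS);			/* queue the quad */
glVertex3f(-1.0f, -1.0f, -0.5f);
glVertex3f( 1.0f, -1.0f, -0.5f);
glVertex3f( 1.0f,  1.0f, -0.5f);
glVertex3f(-1.0f,  1.0f, -0.5f);
glEnd();
glFlush();				/* make sure the commands are submitted to the GPU */

DoSomeCPUWork();			/* placeholder: CPU job running while the GPU shades */

glFinish();				/* synchronize before reading back the FBO result */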

This is just a guess; you will need an answer from an NVIDIA guy to know for sure…

Perhaps since you do a glFinish() just before the glBegin(), the OpenGL pipe is empty, so the driver starts executing the new commands straight away and that uses some CPU? (Wild guess.)

Just out of curiosity, how many uniforms are there in this “big” shader? What happens if you write an equally big shader with only a few uniforms?

Anyway, 2-3 seconds seems a very long time for that code to execute… (If NVIDIA does not reply here, you may have to email them directly.)
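
If your driver exposes GL_EXT_timer_query, you could also try a GPU timer query to separate the CPU time spent inside the call from the actual GPU execution time. Something along these lines (untested sketch; on Windows you still have to fetch the EXT entry point yourself):

/* Sketch: measure GPU-side execution time with GL_EXT_timer_query.
   Query objects are core since GL 1.5; the 64-bit readback is the EXT part. */
GLuint query;
GLuint64EXT gpuNanoseconds = 0;

glGenQueries(1, &query);
glBeginQuery(GL_TIME_ELAPSED_EXT, query);

/* ... the glBegin/glEnd or glDrawElements block being measured ... */

glEndQuery(GL_TIME_ELAPSED_EXT);
glGetQueryObjectui64vEXT(query, GL_QUERY_RESULT, &gpuNanoseconds);	/* blocks until the GPU has finished */
glDeleteQueries(1, &query);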

Hmmm, that looks like a problem I ran into some time ago.
I also tried to measure the time for OpenGL calls. The problem I found was the time measurement itself (only tested on Windows).
If I used only simple geometry (a handful of triangles or quads) between glBegin and glEnd, I couldn't get a valid measurement (why not, I don't know exactly…).
Try putting a bigger workload between glBegin and glEnd (I used up to 1 million triangles per second, but be warned that with that many vertices you may only be measuring CPU transfer time; I didn't test it with static VBOs).
I am not sure, but above roughly 1000 triangles you should get more valid timings.

Last week I tested my Linux system and found no GL_QUADS optimisation in the driver (it is still hardware accelerated, but there seems to be no optimisation, because the fps was a lot lower than with triangles in all cases; tests with glperf showed me exactly the same “problem”). OK, it is easy to use a wrapper from quads to triangles (see the sketch below), but it is not nice to get a bigger performance drop when using quads.
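
The wrapper I mean is nothing more than splitting every quad into two triangles before submission, roughly like this (untested sketch for non-indexed quad data; needs <stdlib.h> in addition to the GL headers):

/* Sketch: draw an array of quads as triangles by building an index list.
   quadVerts holds 4 consecutive xyz vertices per quad, numQuads quads in total. */
void DrawQuadsAsTriangles(const GLfloat *quadVerts, GLsizei numQuads)
{
	GLsizei i;
	GLuint *idx = (GLuint *)malloc(numQuads * 6 * sizeof(GLuint));
	if (!idx) return;

	for (i = 0; i < numQuads; ++i) {
		GLuint base = (GLuint)(i * 4);
		idx[i * 6 + 0] = base + 0;	/* first triangle  */
		idx[i * 6 + 1] = base + 1;
		idx[i * 6 + 2] = base + 2;
		idx[i * 6 + 3] = base + 0;	/* second triangle */
		idx[i * 6 + 4] = base + 2;
		idx[i * 6 + 5] = base + 3;
	}

	glEnableClientState(GL_VERTEX_ARRAY);
	glVertexPointer(3, GL_FLOAT, 0, quadVerts);
	glDrawElements(GL_TRIANGLES, numQuads * 6, GL_UNSIGNED_INT, idx);
	glDisableClientState(GL_VERTEX_ARRAY);

	free(idx);
}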

I'm using an Athlon XP 2600+, 1 GB of RAM and a Radeon 9800 Pro 256 MB.

Oops, sorry, I meant 1 million triangles per frame (not per second) :rolleyes:

Are you sure your shader (the long one) is being executed in hardware? Check the info log. Only software emulation can explain such delays.
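
You can query the info log after linking along these lines (sketch; program stands for your linked program object, and the NVIDIA log usually says so explicitly when a shader falls back to software; needs <stdio.h> and <stdlib.h>):

/* Sketch: dump the link status and info log to see whether the driver
   reports a software fallback for the long shader. */
GLint linked = 0;
GLint logLength = 0;

glGetProgramiv(program, GL_LINK_STATUS, &linked);
glGetProgramiv(program, GL_INFO_LOG_LENGTH, &logLength);

if (logLength > 1) {
	char *log = (char *)malloc(logLength);
	if (log) {
		glGetProgramInfoLog(program, logLength, NULL, log);
		printf("link status: %d\ninfo log:\n%s\n", linked, log);
		free(log);
	}
}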
