Speed of glCopyTexImage2D... Brace yourself :)

Well, prompted by my other thread about FBOs and AA, I’ve started to investigate the speed of glCopyTexImage2D more seriously. I’ve modified some code I found on the net and added some timing.

I’m hoping to verify that my results are right, and to gather some stats from other people’s machines.

My machine is:
P4 2.4B (533MHz FSB)
1GB RAM
Asus GeForce 6800GT 128MB (so slower RAM than a full GT)

Software:
Windows XP SP2
Nvidia 78.01 drivers
VSYNC, AA and AF are all OFF

If I run the program below at 1280x1024 (maximized, not fullscreen; I haven’t tried that), I get the following:

CopyTime: 0.001271 = 2815.384809MB/s, Frame: 0.003100
CopyTime: 0.001290 = 2773.318703MB/s, Frame: 0.003103
CopyTime: 0.001266 = 2827.193310MB/s, Frame: 0.003038
CopyTime: 0.001271 = 2814.765973MB/s, Frame: 0.003051
CopyTime: 0.001299 = 2754.233572MB/s, Frame: 0.003636
CopyTime: 0.001271 = 2814.765973MB/s, Frame: 0.003053
CopyTime: 0.001280 = 2794.498191MB/s, Frame: 0.003076
CopyTime: 0.001288 = 2778.734027MB/s, Frame: 0.003117
CopyTime: 0.001287 = 2780.544089MB/s, Frame: 0.003077
CopyTime: 0.001267 = 2823.453731MB/s, Frame: 0.003038
CopyTime: 0.001288 = 2778.734027MB/s, Frame: 0.003076
CopyTime: 0.001281 = 2792.670176MB/s, Frame: 0.003089
CopyTime: 0.002794 = 1280.334444MB/s, Frame: 0.004620
CopyTime: 0.001282 = 2791.452911MB/s, Frame: 0.003012
CopyTime: 0.001266 = 2826.569272MB/s, Frame: 0.003053
CopyTime: 0.001276 = 2804.288509MB/s, Frame: 0.003073
CopyTime: 0.001284 = 2787.200450MB/s, Frame: 0.003108
CopyTime: 0.001286 = 2781.147716MB/s, Frame: 0.003095
CopyTime: 0.001282 = 2791.452911MB/s, Frame: 0.003247
CopyTime: 0.001262 = 2834.075168MB/s, Frame: 0.002991
CopyTime: 0.001279 = 2798.161666MB/s, Frame: 0.003062
CopyTime: 0.001280 = 2795.718368MB/s, Frame: 0.003100
CopyTime: 0.001281 = 2792.061284MB/s, Frame: 0.003066
CopyTime: 0.001284 = 2786.593941MB/s, Frame: 0.003089

Here’s the source code:

// console.cpp : Defines the entry point for the console application.
//

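#include "stdafx.h"

#include <stdlib.h>     // for exit(); conventionally included before glut.h on Win32
#include <GL/glut.h>
#include <GL/glext.h>   // for GL_TEXTURE_RECTANGLE_ARB
#include <stdio.h>
#include <assert.h>
#include "TimeCounter.h"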



int imageWinWidth = 256;
int imageWinHeight = 256;

void reshape(int w, int h)
{
  glClearColor (0.0, 0.0, 0.0, 0.0);
  glViewport(0, 0, (GLsizei) w, (GLsizei) h);
  glMatrixMode(GL_PROJECTION);
  glLoadIdentity();

  glFrustum(0.0, 1.0,  1.0, 0.0,   1.0,   100.0);
  gluLookAt(0.0,0.0,0.0,  0.0, 0.0,  -1.0,   0.0, 1.0, 0.0);

  glMatrixMode(GL_MODELVIEW);

  glLoadIdentity();
  glutPostRedisplay();

  imageWinWidth = w;
  imageWinHeight = h;
}


void myIdle(void)
{
  glutPostRedisplay();
}

void keyboard (unsigned char key, int x, int y)
{
   switch (key) {
      case 27:
         exit(0);
         break;
      default:
         break;
   }
}

void MouseFunc( int button, int state, int x, int y)
{
  switch(button) {
    case GLUT_LEFT_BUTTON :
      break;
    case GLUT_RIGHT_BUTTON :
      break;
  }
}

unsigned int texture(0);

void render_redirect(void)
{
  TimeCounter wholeframe;

  // draw a simple scene (a thick white line on a blue background) into the
  // back buffer; it is copied into 'texture' below
  glClearColor(0.0, 0.0, 1.0, 1.0);
  glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
  glColor4f( 1.0, 1.0, 1.0, 1.0);
  glLineWidth(5.0);
  glBegin(GL_LINES);
    glColor4f( 1.0, 1.0, 1.0, 1.0);
    glVertex3f( 0.0, 0.0, -1.0);
    glVertex3f( 1.0, 1.0, -1.0);
  glEnd();

	if(!texture)
	{
		glGenTextures(1, &texture);
		glBindTexture(GL_TEXTURE_RECTANGLE_ARB, texture);
		glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
		glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
	}
	glBindTexture(GL_TEXTURE_RECTANGLE_ARB, texture);
	glEnable(GL_TEXTURE_RECTANGLE_ARB);
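	// drain the pipeline so the timed region starts with the GPU idle
	glFinish();
	TimeCounter copytime;
	// copy the window's color buffer into the rectangle texture;
	// glCopyTexImage2D respecifies the whole texture image on every call
	glCopyTexImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, GL_RGB, 0, 0, imageWinWidth, imageWinHeight, 0);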
	

	float fS(imageWinWidth), fT(imageWinHeight);
	float fW(1), fH(1);
	glBegin(GL_QUADS);
		glTexCoord2f(0,0);
		glVertex3f(0,0,-1);
		glTexCoord2f(fS,0);
		glVertex3f(fW,0,-1);
		glTexCoord2f(fS,fT);
		glVertex3f(fW,fH,-1);
		glTexCoord2f(0,fT);
		glVertex3f(0,fH,-1);
	glEnd();
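	// wait for the copy and the quad draw to finish; note that the timed
	// region therefore includes drawing the textured quad, not just the copy
	glFinish();
	copytime.Tick();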

	glDisable(GL_TEXTURE_RECTANGLE_ARB);

	wholeframe.Tick();
	// "MB/s" is really MiB/s: bytes copied (width * height * 3 for GL_RGB) / 2^20 / seconds
	printf("CopyTime: %f = %fMB/s, Frame: %f\n",
	       copytime.GetTimeStep(),
	       float(imageWinWidth * imageWinHeight * 3) / 1048576.0f / copytime.GetTimeStep(),
	       wholeframe.GetTimeStep());

	glutSwapBuffers();
}

int main(int argc, char* argv[])
{
   glutInit(&argc, argv);  // initialize GLUT before creating the window
   glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGBA | GLUT_DEPTH);
   glutInitWindowSize(imageWinWidth, imageWinHeight);
   glutCreateWindow("copytest");


   glutDisplayFunc(render_redirect);
   glutIdleFunc(myIdle);
   glutReshapeFunc(reshape);
   glutKeyboardFunc(keyboard);
   glutMouseFunc(MouseFunc);
   glutMainLoop();
   return 0;
}

And here’s TimeCounter’s source:

// TimeCounter.h
#pragma once

class TimeCounter
{
public:
	TimeCounter();

	void Reset();
	void Tick();

	float GetTotalTime();
	float GetTimeStep();

protected:

// time tracking variables
	float m_Frequency, m_LastStep;
	__int64 m_Start, m_StartFrame, m_Last;
};





// TimeCounter.cpp
#include "stdafx.h"
#include <windows.h>
#include "TimeCounter.h"

#pragma warning(disable : 4244) // disable __int64 conversion warning



TimeCounter::TimeCounter() :	m_Frequency(0), m_LastStep(0), m_Start(0),
								m_StartFrame(0), m_Last(0)
{
	// get the timer frequency
	__int64 frequency;

	if(!QueryPerformanceFrequency( (LARGE_INTEGER*) &frequency))
		return;

	// assign it to the float version, frequency will fit in a float easily!
	m_Frequency = frequency;

	// reset ourself
	Reset();
}

void TimeCounter::Reset()
{
	// record the start time; the first Tick() measures from here
	QueryPerformanceCounter((LARGE_INTEGER*) &m_Start);
	m_Last = m_StartFrame = m_Start;

	// no step has been measured yet
	m_LastStep = 0.0f;
}

void TimeCounter::Tick()
{
	// get the time
	QueryPerformanceCounter( (LARGE_INTEGER*) &m_Last);

	// Store the difference between the 2 values
	m_LastStep = (float)(m_Last-m_StartFrame)/m_Frequency;

	// save the current time as the start of the next time frame.
	m_StartFrame = m_Last;
}


float TimeCounter::GetTotalTime()
{
	// return the time so the app start is 0...
	return (float) ((m_Last-m_Start)/m_Frequency);
}

float TimeCounter::GetTimeStep()
{
	return m_LastStep;
}

Others, post your results here, with hardware & software configuration.

Here’s the source and binary, zipped up.

And here’s GLUT!

Note: My results above might be wrong, so feel free to correct me :slight_smile:

Even before trying your code, I can second your results. For me glCopyTexImage2D has been blazingly fast (practically unnoticeable) on the last few generations of NVIDIA cards. I don’t know about ATI cards.

And for comparison, some results from friends’ machines:

Athlon 3200+ @ 2.5GHz (~4100+)
GeForce 6800GT 256MB at 450MHz core / 1.12GHz memory:

 CopyTime: 0.000060 = 3136.283545MB/s, Frame: 0.001035 
 CopyTime: 0.000063 = 2982.954177MB/s, Frame: 0.001048 
 CopyTime: 0.000077 = 2440.598975MB/s, Frame: 0.001059 
 CopyTime: 0.000062 = 3036.944442MB/s, Frame: 0.000896 
 CopyTime: 0.000062 = 3036.944442MB/s, Frame: 0.001030 
 CopyTime: 0.000059 = 3165.871203MB/s, Frame: 0.000973 
 CopyTime: 0.000060 = 3121.696236MB/s, Frame: 0.001033 
 CopyTime: 0.000060 = 3151.008016MB/s, Frame: 0.001037 
 CopyTime: 0.000059 = 3180.875274MB/s, Frame: 0.001038 
 CopyTime: 0.000057 = 3273.974106MB/s, Frame: 0.001032 
 CopyTime: 0.000061 = 3050.748502MB/s, Frame: 0.001048 
 CopyTime: 0.000060 = 3136.283545MB/s, Frame: 0.001037 
 CopyTime: 0.000059 = 3180.875274MB/s, Frame: 0.001034 
 CopyTime: 0.000062 = 3023.264387MB/s, Frame: 0.001036 
 CopyTime: 0.000061 = 3092.924767MB/s, Frame: 0.001028 
 CopyTime: 0.000059 = 3165.871203MB/s, Frame: 0.001034 
 CopyTime: 0.000059 = 3196.022240MB/s, Frame: 0.001043 
 CopyTime: 0.000058 = 3258.081089MB/s, Frame: 0.001033 
 CopyTime: 0.000059 = 3165.871203MB/s, Frame: 0.001037 
 CopyTime: 0.000060 = 3151.008016MB/s, Frame: 0.001035 
 CopyTime: 0.000064 = 2930.850039MB/s, Frame: 0.001036

P4 2.4 @ 2.7
Radeon 9700 non-pro

CopyTime: 0.002092 = 1710.361292MB/s, Frame: 0.006626  
CopyTime: 0.002120 = 1687.598595MB/s, Frame: 0.006781  
CopyTime: 0.002077 = 1723.017092MB/s, Frame: 0.006611  
CopyTime: 0.002099 = 1704.670067MB/s, Frame: 0.006598  
CopyTime: 0.002082 = 1718.623887MB/s, Frame: 0.006698  
CopyTime: 0.002051 = 1744.372805MB/s, Frame: 0.006606  
CopyTime: 0.002056 = 1740.579625MB/s, Frame: 0.006615  
CopyTime: 0.002068 = 1730.233033MB/s, Frame: 0.006682  
CopyTime: 0.002071 = 1727.432620MB/s, Frame: 0.006641  
CopyTime: 0.002064 = 1733.511940MB/s, Frame: 0.006643  
CopyTime: 0.002089 = 1713.106585MB/s, Frame: 0.006668  
CopyTime: 0.002065 = 1732.808230MB/s, Frame: 0.006597  
CopyTime: 0.002036 = 1757.056583MB/s, Frame: 0.006621  
CopyTime: 0.002064 = 1733.746442MB/s, Frame: 0.006620  
CopyTime: 0.002040 = 1753.928482MB/s, Frame: 0.006628  
CopyTime: 0.002076 = 1723.712677MB/s, Frame: 0.006672  
CopyTime: 0.002071 = 1727.199629MB/s, Frame: 0.006589  
CopyTime: 0.002032 = 1761.164177MB/s, Frame: 0.006618  
CopyTime: 0.002068 = 1729.999481MB/s, Frame: 0.006564  
CopyTime: 0.002110 = 1695.866761MB/s, Frame: 0.006653  
CopyTime: 0.002081 = 1719.085315MB/s, Frame: 0.006639  
CopyTime: 0.002091 = 1710.818293MB/s, Frame: 0.006679  
CopyTime: 0.002069 = 1729.065124MB/s, Frame: 0.006591  
CopyTime: 0.002087 = 1714.712303MB/s, Frame: 0.006627

Edit: oops, someone had VSync on :slight_smile:

P4 3.0 GHz, GeForce FX 5900:

CopyTime: 0.000169 = 1106.895655MB/s, Frame: 0.000277
CopyTime: 0.000179 = 1047.929828MB/s, Frame: 0.000275
CopyTime: 0.000161 = 1165.319527MB/s, Frame: 0.000283
CopyTime: 0.000165 = 1134.351351MB/s, Frame: 0.000242
CopyTime: 0.000161 = 1165.223207MB/s, Frame: 0.000266
CopyTime: 0.000154 = 1214.847128MB/s, Frame: 0.000239
CopyTime: 0.000137 = 1367.099214MB/s, Frame: 0.000204
CopyTime: 0.000138 = 1355.711660MB/s, Frame: 0.000210
CopyTime: 0.000152 = 1236.981060MB/s, Frame: 0.000224
CopyTime: 0.000137 = 1365.152802MB/s, Frame: 0.000207

Dual Opteron 2.2 GHz, GeForce 6800:

CopyTime: 0.000088 = 2142.836372MB/s, Frame: 0.000153
CopyTime: 0.000087 = 2152.653784MB/s, Frame: 0.000342
CopyTime: 0.000093 = 2011.445562MB/s, Frame: 0.000168
CopyTime: 0.000087 = 2145.721117MB/s, Frame: 0.000149
CopyTime: 0.000086 = 2170.251028MB/s, Frame: 0.000179
CopyTime: 0.000090 = 2079.170257MB/s, Frame: 0.000231
CopyTime: 0.000086 = 2188.496808MB/s, Frame: 0.000171
CopyTime: 0.000088 = 2125.341136MB/s, Frame: 0.000353
CopyTime: 0.000082 = 2273.051161MB/s, Frame: 0.000183
CopyTime: 0.000103 = 1823.220921MB/s, Frame: 0.000202

Dual Opteron 2.2 GHz, GeForce 6800:
(no AA, no filtering, low quality)

CopyTime: 0.000059 = 3162.483138MB/s, Frame: 0.000115
CopyTime: 0.000059 = 3167.462546MB/s, Frame: 0.000113
CopyTime: 0.000061 = 3079.573005MB/s, Frame: 0.000115
CopyTime: 0.000059 = 3181.175663MB/s, Frame: 0.000113
CopyTime: 0.000059 = 3180.736289MB/s, Frame: 0.000113
CopyTime: 0.000059 = 3187.096253MB/s, Frame: 0.000115
CopyTime: 0.000059 = 3179.223361MB/s, Frame: 0.000113
CopyTime: 0.000061 = 3089.283524MB/s, Frame: 0.000115
CopyTime: 0.000059 = 3180.272507MB/s, Frame: 0.000113
CopyTime: 0.000059 = 3182.886018MB/s, Frame: 0.000113

Thanks def!

So it looks like I’m not crazy after all :slight_smile: These cards are really fast at copying back! Like I said in the other thread, I assume it’s mainly because this can be a video-memory-to-video-memory copy, so the driver isn’t forced to copy back to system RAM!

Interesting to see that older cards aren’t all that much slower: roughly half the speed.

Keep ’em coming :smiley: It would be good to see how mid-range cards are doing too, say 6600s or 5700s!

ChiefWiggum wrote:
Interesting to see that older cards aren’t all that much slower

Oh no? :slight_smile:

ATI 9250 (128-bit):
RGB texture:
256x256 roughly 220MB/s
512x512 roughly 370MB/s

RGBA texture:
256x256 roughly 290MB/s
512x512 roughly 465MB/s

To reduce per-call overhead and measure the actual copy speed more closely, I looped glCopyTexImage2D 40 times; for 256x256 the results fluctuated between 544 and 609MB/s. Copying to an RGBA texture, it climbed to 745-815MB/s.

Come to think of it, 815MB/s perhaps isn’t that shabby for such an old card.
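tamlin’s trick of timing 40 copies in one go can be sketched generically: read the clock once before and once after a loop of N calls, then divide by N, so fixed per-measurement costs are spread across many copies. This is a C++11 sketch, not tamlin’s code: average_seconds_per_call and memcpy_mib_per_second are hypothetical names, and because there is no GL context here, a plain memcpy stands in for the glCopyTexImage2D call. The numbers it produces are therefore system-memory bandwidth and are not comparable to the figures in this thread.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Average seconds per call of `op` over `iterations` back-to-back calls.
// `finish` runs once after the loop, before the clock is read; in a GL test
// it would be glFinish(), since GL calls can return before the GPU is done.
template <typename Op, typename Finish>
double average_seconds_per_call(int iterations, Op op, Finish finish)
{
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < iterations; ++i)
        op();
    finish();
    const auto end = Clock::now();
    return std::chrono::duration<double>(end - start).count() / iterations;
}

// Stand-in workload (assumption: no GL context available): a CPU memcpy of
// one 256x256 RGBA frame. In the real benchmark `op` would wrap the
// glCopyTexImage2D or glCopyTexSubImage2D call instead.
double memcpy_mib_per_second(int iterations)
{
    const std::size_t bytes = 256 * 256 * 4;
    std::vector<unsigned char> src(bytes, 0x7f), dst(bytes);
    const double seconds = average_seconds_per_call(
        iterations,
        [&] { std::memcpy(dst.data(), src.data(), bytes); },
        [] {});  // nothing asynchronous to wait for with memcpy
    return static_cast<double>(bytes) / (1024.0 * 1024.0) / seconds;
}
```

Putting the single finish step after the loop (rather than a glFinish() per copy) is what lets the fixed costs amortize; the trade-off is that you only get an average, not per-copy timings.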

This is copytexsubimage you’re talking about, I assume?
Also, how do these figures compare to FBO?

Pentium M 1.7GHz FX5200 :rolleyes:
~360 MB/s

Changing the glCopyTexImage2D call to

 glCopyTexSubImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, 0, 0, 0, 0, imageWinWidth, imageWinHeight); 

and adding

 glTexImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, GL_RGB, imageWinWidth, imageWinHeight, 0, GL_RGB, GL_UNSIGNED_BYTE, 0); 

when initializing the texture gives the same results!
Using the Sub variants (glTexSubImage2D and glCopyTexSubImage2D) used to be faster, but now the results (at least in this test) are equal!

kon

A paper I just finished might be of interest here:

fbo.pdf

Graphics cards have finally made the jump to ludicrous speed.

Here’s my X800 Pro (256MB) on an Athlon XP 2500+ with 1GB RAM.

CopyTime: 0.000135 = 1383.844700MB/s, Frame: 0.000282
CopyTime: 0.000134 = 1398.259730MB/s, Frame: 0.000280
CopyTime: 0.000135 = 1389.574958MB/s, Frame: 0.000281
CopyTime: 0.000136 = 1375.337527MB/s, Frame: 0.000284
CopyTime: 0.000136 = 1378.161508MB/s, Frame: 0.000284
CopyTime: 0.000136 = 1380.997257MB/s, Frame: 0.000279
CopyTime: 0.000135 = 1392.457920MB/s, Frame: 0.000281
CopyTime: 0.000134 = 1398.259730MB/s, Frame: 0.000282
CopyTime: 0.000135 = 1386.703909MB/s, Frame: 0.000282
CopyTime: 0.000155 = 1209.305698MB/s, Frame: 0.000305
CopyTime: 0.000136 = 1378.161508MB/s, Frame: 0.000283
CopyTime: 0.000139 = 1350.431936MB/s, Frame: 0.000286
CopyTime: 0.000143 = 1310.868550MB/s, Frame: 0.000290
CopyTime: 0.000135 = 1386.703909MB/s, Frame: 0.000282
CopyTime: 0.000134 = 1398.259730MB/s, Frame: 0.000273
CopyTime: 0.000136 = 1378.161508MB/s, Frame: 0.000282
CopyTime: 0.000140 = 1336.981436MB/s, Frame: 0.000287
CopyTime: 0.000134 = 1398.259730MB/s, Frame: 0.000281
CopyTime: 0.000136 = 1380.997257MB/s, Frame: 0.000282
CopyTime: 0.000135 = 1389.574958MB/s, Frame: 0.000244
CopyTime: 0.000136 = 1380.997257MB/s, Frame: 0.000283
CopyTime: 0.000139 = 1350.431936MB/s, Frame: 0.000300
CopyTime: 0.000135 = 1392.457920MB/s, Frame: 0.000265
CopyTime: 0.000135 = 1392.457920MB/s, Frame: 0.000281

kon wrote:
Using the sub-versions of glTexImage2D and glCopyTexImage2D used to be faster, but now the results (at least in this test) are equal!

Yeah, I’ve noticed for a while (at least with NVIDIA) that there’s no difference between the two, though I suppose the sub-image version is always the better bet.

GALAXY 6600GT / AMD Barton 2700+ / 1GB RAM

CopyTime: 0.001512 = 2362.034325MB/s, Frame: 0.013285
CopyTime: 0.001397 = 2556.193702MB/s, Frame: 0.009863
CopyTime: 0.001414 = 2525.883038MB/s, Frame: 0.011541
CopyTime: 0.001415 = 2523.389496MB/s, Frame: 0.011696
CopyTime: 0.001410 = 2531.887569MB/s, Frame: 0.011707
CopyTime: 0.001409 = 2533.393113MB/s, Frame: 0.011680
CopyTime: 0.001402 = 2547.024436MB/s, Frame: 0.011689
CopyTime: 0.001425 = 2505.089716MB/s, Frame: 0.011722
CopyTime: 0.001399 = 2553.129812MB/s, Frame: 0.011714
CopyTime: 0.001412 = 2528.381514MB/s, Frame: 0.011668
CopyTime: 0.001460 = 2445.182275MB/s, Frame: 0.011854
CopyTime: 0.001416 = 2521.398245MB/s, Frame: 0.011337
CopyTime: 0.001416 = 2521.895815MB/s, Frame: 0.011660
CopyTime: 0.001443 = 2475.013145MB/s, Frame: 0.011851
CopyTime: 0.001409 = 2534.397734MB/s, Frame: 0.011336
CopyTime: 0.001427 = 2502.637243MB/s, Frame: 0.011712
CopyTime: 0.001407 = 2537.416804MB/s, Frame: 0.011522
CopyTime: 0.001399 = 2551.600725MB/s, Frame: 0.011663
CopyTime: 0.001409 = 2533.895429MB/s, Frame: 0.011733
CopyTime: 0.001499 = 2381.398862MB/s, Frame: 0.012312
CopyTime: 0.001398 = 2554.150363MB/s, Frame: 0.010848
CopyTime: 0.001407 = 2537.920507MB/s, Frame: 0.011721
CopyTime: 0.001409 = 2534.900449MB/s, Frame: 0.011677
CopyTime: 0.001444 = 2472.619180MB/s, Frame: 0.011814
CopyTime: 0.001411 = 2530.383814MB/s, Frame: 0.011411
CopyTime: 0.001410 = 2531.887569MB/s, Frame: 0.011666
CopyTime: 0.001405 = 2540.442865MB/s, Frame: 0.011572
CopyTime: 0.001407 = 2538.424620MB/s, Frame: 0.011703
CopyTime: 0.001454 = 2456.461328MB/s, Frame: 0.011815
CopyTime: 0.001402 = 2547.531961MB/s, Frame: 0.011351
CopyTime: 0.001408 = 2536.409579MB/s, Frame: 0.011719
CopyTime: 0.001397 = 2556.193702MB/s, Frame: 0.011693
CopyTime: 0.001541 = 2316.651787MB/s, Frame: 0.013247
CopyTime: 0.001884 = 1894.880356MB/s, Frame: 0.010349
CopyTime: 0.001415 = 2523.389496MB/s, Frame: 0.011204
CopyTime: 0.001421 = 2513.464776MB/s, Frame: 0.011783
CopyTime: 0.001407 = 2536.913091MB/s, Frame: 0.011590
CopyTime: 0.001498 = 2384.064227MB/s, Frame: 0.013821
CopyTime: 0.001412 = 2528.381514MB/s, Frame: 0.009370
CopyTime: 0.001412 = 2527.881382MB/s, Frame: 0.011731
CopyTime: 0.001447 = 2467.844719MB/s, Frame: 0.011755

GeForce 6600 128MB, AGP 4x, P4 2.23GHz

CopyTime: 0.000222 = 845.295557MB/s, Frame: 0.011567
CopyTime: 0.000225 = 831.678691MB/s, Frame: 0.011567
CopyTime: 0.000217 = 862.679574MB/s, Frame: 0.011534
CopyTime: 0.000230 = 815.509926MB/s, Frame: 0.011600
CopyTime: 0.000221 = 847.430162MB/s, Frame: 0.011541
CopyTime: 0.000210 = 894.886274MB/s, Frame: 0.011554
CopyTime: 0.000234 = 802.828588MB/s, Frame: 0.011548
CopyTime: 0.000232 = 808.632171MB/s, Frame: 0.011536
CopyTime: 0.000221 = 848.501512MB/s, Frame: 0.011529
CopyTime: 0.000213 = 878.487806MB/s, Frame: 0.011556
CopyTime: 0.000230 = 815.509926MB/s, Frame: 0.011597
CopyTime: 0.000227 = 824.526651MB/s, Frame: 0.011541
CopyTime: 0.000205 = 914.393320MB/s, Frame: 0.011520
CopyTime: 0.000228 = 822.505734MB/s, Frame: 0.011632
CopyTime: 0.000276 = 679.316507MB/s, Frame: 0.011621

GeForce FX 5600, AGP 4x: approximately 300MB/s, but I do think (for a lot of other reasons) that my system is messed up.

knackered wrote:
This is copytexsubimage you’re talking about, I assume?
Also, how do these figures compare to FBO?

I have to ask, was it my post you referred to, or something else? I never really got the context of your questions.

If it was a follow-up to my post: no, it was glCopyTexImage2D, as in the code presented by ChiefWiggum. I did, however, consider extending and modifying this test a bit to get (hopefully) more interesting results (copy* bandwidth, texture creation + upload speed, latencies, …).

ChiefWiggum, I hope you don’t mind if I extend it a bit to test more areas?

knackered, if you have some simple FBO code in one or more scenarios to add, feel free to pitch in. :slight_smile:

Eh? I wasn’t directing my question at anyone in particular, just throwing it into the thread.
You can do the work of testing it against FBO, I don’t care. These kinds of threads aren’t interesting enough to me that I would do any coding/FTP’ing. Believe you me, if the speed of copyteximage were in my top 10 concerns I’d be a happy man.
Not very constructive, I know.

tamlin: No, of course not, why would I mind!

In fact, I was thinking of doing the same thing. There are so many “myths” going around about what never to do and what’s slow. Some were started by GDC presentations from around 7 years ago, which are hardly applicable these days, I’d say! It would be worth writing an app that benchmarks everything: setup speeds, operations such as changing shaders or certain states (those considered heavyweight changes), FBO speeds (though the FBO paper someone linked a bit further up is really good for that)…

I might pitch in when I find the time. It would be good to write it as a sort of demo: instead of pumping out values every frame, it could average them over, say, 500 frames.

Such benchmarking apps have been done time and time again. Whatever you do would merely be an update adding new extensions.
Dismissing all performance advice given over the last 6 years as myths merely because you’ve discovered that glCopyTexImage is now as fast as glCopyTexSubImage is foolish. Frankly, it just means that the driver now checks whether the new format/dimensions are the same as those of the last glCopyTexImage call on that texture object and, if they are, just replaces the data rather than going through a destroy/create cycle. That the driver now does this check says more about a flurry of badly written apps using glCopyTexImage, and about improved CPU speeds. It will also certainly be implementation-dependent, so you should still always use glCopyTexSubImage when replacing the pixel data of a texture, because it removes any ambiguity (“I’m replacing just the contents of the texture”).
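To make the driver shortcut described above concrete, here is a toy model of it. This is purely hypothetical: the struct and the model_* function names are made up, formats are plain ints standing in for GLenums such as GL_RGB, and no real driver is claimed to work this way. It only shows the logic knackered describes: a glCopyTexImage2D-style call reallocates only when the format or size changes, while a glCopyTexSubImage2D-style call never reallocates and simply fails if the region doesn’t fit.

```cpp
#include <cstddef>
#include <vector>

// Toy model of a texture object as a driver might track it (hypothetical).
struct TextureObject
{
    int internalFormat = 0;              // stands in for a GLenum such as GL_RGB
    int width = 0;
    int height = 0;
    std::vector<unsigned char> storage;  // stands in for video memory
    int allocations = 0;                 // destroy/create cycles performed
};

// Models glCopyTexImage2D with the shortcut: only reallocate when the format
// or dimensions differ from the current image.
void model_copy_tex_image(TextureObject& tex, int internalFormat,
                          int width, int height, int bytesPerPixel)
{
    const bool sameShape = !tex.storage.empty() &&
                           tex.internalFormat == internalFormat &&
                           tex.width == width && tex.height == height;
    if (!sameShape)
    {
        // destroy/create cycle: throw away the old storage, allocate new
        tex.storage.assign(static_cast<std::size_t>(width) * height * bytesPerPixel, 0);
        tex.internalFormat = internalFormat;
        tex.width = width;
        tex.height = height;
        ++tex.allocations;
    }
    // The pixel copy into tex.storage would happen here in either case.
}

// Models glCopyTexSubImage2D at offset (0, 0): it never reallocates, it only
// overwrites existing storage. Returns false where GL would report an error
// (no image defined yet, or the region does not fit).
bool model_copy_tex_sub_image(const TextureObject& tex, int width, int height)
{
    return !tex.storage.empty() && width <= tex.width && height <= tex.height;
}
```

Under this model, repeated glCopyTexImage2D calls at a fixed window size cost the same as glCopyTexSubImage2D, which would match kon’s result; whether a given driver actually takes the shortcut is exactly the implementation detail that makes the Sub call the unambiguous choice.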

knackered:

I never said to go and reject what’s been said over the past seven years, but it might not be a bad idea to test these things out and see how much of a speed difference they DO make relative to other things. I myself am using copytexsubimage in my app now, but was curious to see the speeds you can get with copyteximage too.

Either way, the weight given to the argument against using copyteximage AND copytexsubimage is a little high when you’re dealing with post-processing effects. Everyone’s struggling to get PBuffers or FBOs working and to draw into them, to save that extra bit of time, but then they think: OK, how the hell do I antialias this sucker… The whole reason I started this thread was that, in my other thread, I began testing the speed of copyteximage, because I figured it simply CAN’T be slower than doing your own supersampling into a PBuffer.

So all I’m saying is: it’s great to take the advice on board too, but it doesn’t hurt to check whether it actually applies to you :slight_smile: