Speed of glCopyTexImage2D... Brace yourself :)



ChiefWiggum
09-19-2005, 10:32 PM
Well, as prompted by my other thread about FBO and AA, I've started to more seriously investigate the speed of glCopyTexImage2D. I've modified some code I found on the net, and put some timing in.

I'm hoping to verify that my results are right, and to gather some stats from other people's machines.

My machine is:
P4 2.4B (533 MHz FSB)
1 GB RAM
Asus GeForce 6800GT 128 MB (so slower RAM than a full GT)

Software:
Windows XP SP2
Nvidia 78.01 drivers
VSYNC, AA and AF are all OFF

And if I run the program below at 1280x1024 (maximized, not fullscreen; I haven't tried that) I get the following:


CopyTime: 0.001271 = 2815.384809MB/s, Frame: 0.003100
CopyTime: 0.001290 = 2773.318703MB/s, Frame: 0.003103
CopyTime: 0.001266 = 2827.193310MB/s, Frame: 0.003038
CopyTime: 0.001271 = 2814.765973MB/s, Frame: 0.003051
CopyTime: 0.001299 = 2754.233572MB/s, Frame: 0.003636
CopyTime: 0.001271 = 2814.765973MB/s, Frame: 0.003053
CopyTime: 0.001280 = 2794.498191MB/s, Frame: 0.003076
CopyTime: 0.001288 = 2778.734027MB/s, Frame: 0.003117
CopyTime: 0.001287 = 2780.544089MB/s, Frame: 0.003077
CopyTime: 0.001267 = 2823.453731MB/s, Frame: 0.003038
CopyTime: 0.001288 = 2778.734027MB/s, Frame: 0.003076
CopyTime: 0.001281 = 2792.670176MB/s, Frame: 0.003089
CopyTime: 0.002794 = 1280.334444MB/s, Frame: 0.004620
CopyTime: 0.001282 = 2791.452911MB/s, Frame: 0.003012
CopyTime: 0.001266 = 2826.569272MB/s, Frame: 0.003053
CopyTime: 0.001276 = 2804.288509MB/s, Frame: 0.003073
CopyTime: 0.001284 = 2787.200450MB/s, Frame: 0.003108
CopyTime: 0.001286 = 2781.147716MB/s, Frame: 0.003095
CopyTime: 0.001282 = 2791.452911MB/s, Frame: 0.003247
CopyTime: 0.001262 = 2834.075168MB/s, Frame: 0.002991
CopyTime: 0.001279 = 2798.161666MB/s, Frame: 0.003062
CopyTime: 0.001280 = 2795.718368MB/s, Frame: 0.003100
CopyTime: 0.001281 = 2792.061284MB/s, Frame: 0.003066
CopyTime: 0.001284 = 2786.593941MB/s, Frame: 0.003089

Here's the source code:


// console.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"



#include <GL/glut.h>
#include <GL/glext.h>
#include <stdio.h>
#include <assert.h>
#include "TimeCounter.h"



int imageWinWidth = 256;
int imageWinHeight = 256;

void reshape(int w, int h)
{
    glClearColor (0.0, 0.0, 0.0, 0.0);
    glViewport(0, 0, (GLsizei) w, (GLsizei) h);
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();

    glFrustum(0.0, 1.0, 1.0, 0.0, 1.0, 100.0);
    gluLookAt(0.0,0.0,0.0, 0.0, 0.0, -1.0, 0.0, 1.0, 0.0);

    glMatrixMode(GL_MODELVIEW);

    glLoadIdentity();
    glutPostRedisplay();

    // remember the window size; it is the size of the region copied each frame
    imageWinWidth = w;
    imageWinHeight = h;
}


void myIdle(void)
{
    // redraw continuously so that a timing line is printed every frame
    glutPostRedisplay();
}

void keyboard (unsigned char key, int x, int y)
{
    switch (key) {
    case 27: // Esc quits
        exit(0);
        break;
    default:
        break;
    }
}

void MouseFunc( int button, int state, int x, int y)
{
    switch(button) {
    case GLUT_LEFT_BUTTON :
        break;
    case GLUT_RIGHT_BUTTON :
        break;
    }
}

unsigned int texture(0);

void render_redirect(void)
{
    TimeCounter wholeframe;

    // draw a scene into the back buffer; it is copied
    // into 'texture' further down
    glClearColor(0.0, 0.0, 1.0, 1.0);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glColor4f( 1.0, 1.0, 1.0, 1.0);
    glLineWidth(5.0);
    glBegin(GL_LINES);
    glColor4f( 1.0, 1.0, 1.0, 1.0);
    glVertex3f( 0.0, 0.0, -1.0);
    glVertex3f( 1.0, 1.0, -1.0);
    glEnd();

    // create the texture object on the first frame only
    if(!texture)
    {
        glGenTextures(1, &texture);
        glBindTexture(GL_TEXTURE_RECTANGLE_ARB, texture);
        glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    }
    glBindTexture(GL_TEXTURE_RECTANGLE_ARB, texture);
    glEnable(GL_TEXTURE_RECTANGLE_ARB);

    // make sure the scene has finished rendering before the timer starts
    glFinish();
    TimeCounter copytime;
    glCopyTexImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, GL_RGB, 0, 0, imageWinWidth, imageWinHeight, 0);


    // draw a quad with the copied texture (rectangle textures use
    // texel coordinates, hence fS/fT are the window size)
    float fS(imageWinWidth), fT(imageWinHeight);
    float fW(1), fH(1);
    glBegin(GL_QUADS);
    glTexCoord2f(0,0);
    glVertex3f(0,0,-1);
    glTexCoord2f(fS,0);
    glVertex3f(fW,0,-1);
    glTexCoord2f(fS,fT);
    glVertex3f(fW,fH,-1);
    glTexCoord2f(0,fT);
    glVertex3f(0,fH,-1);
    glEnd();

    // wait for the GPU; note that the measured "copy" time
    // also includes drawing the textured quad above
    glFinish();
    copytime.Tick();

    glDisable(GL_TEXTURE_RECTANGLE_ARB);

    wholeframe.Tick();
    // bandwidth assumes 3 bytes per pixel (GL_RGB) and 1 MB = 1048576 bytes
    printf("CopyTime: %f = %fMB/s, Frame: %f\n", copytime.GetTimeStep(), float(imageWinWidth*imageWinHeight*3)/1048576.0f*1.0f/copytime.GetTimeStep(), wholeframe.GetTimeStep());

    glutSwapBuffers();
}

int _tmain(int argc, _TCHAR* argv[])
{

    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGBA | GLUT_DEPTH);
    glutInitWindowSize(imageWinWidth, imageWinHeight);
    glutCreateWindow("copytest");


    glutDisplayFunc(render_redirect);
    glutIdleFunc(myIdle);
    glutReshapeFunc(reshape);
    glutKeyboardFunc(keyboard);
    glutMouseFunc(MouseFunc);
    glutMainLoop();
    return 0;
}

And here's TimeCounter's source:

class TimeCounter
{
public:
    TimeCounter();

    void Reset();
    void Tick();

    float GetTotalTime();
    float GetTimeStep();

protected:

    // time tracking variables
    float m_Frequency, m_LastStep;
    __int64 m_Start, m_StartFrame, m_Last;
};





#include "stdafx.h"
#include "windows.h"
#include "TimeCounter.h"

#pragma warning(disable : 4244) // disable __int64 conversion warning



TimeCounter::TimeCounter() : m_Frequency(0), m_LastStep(0), m_Start(0),
    m_StartFrame(0), m_Last(0)
{
    // get the timer frequency
    __int64 frequency;

    if(!QueryPerformanceFrequency( (LARGE_INTEGER*) &frequency))
        return;

    // assign it to the float version, frequency will fit in a float easily!
    m_Frequency = frequency;

    // reset ourself
    Reset();
}

void TimeCounter::Reset()
{
    // get the "last" time
    // QueryPerformanceCounter( (LARGE_INTEGER*) &m_Last);

    // we got the frequency, get a start time
    QueryPerformanceCounter((LARGE_INTEGER*) &m_Start);
    m_Last = m_StartFrame = m_Start;

    // no step has been measured yet (the old chained assignment
    // stored the raw counter value here)
    m_LastStep = 0;
}

void TimeCounter::Tick()
{
    // get the time
    QueryPerformanceCounter( (LARGE_INTEGER*) &m_Last);

    // Store the difference between the 2 values, in seconds
    m_LastStep = (float)(m_Last-m_StartFrame)/m_Frequency;

    // save the current time as the start of the next time frame.
    m_StartFrame = m_Last;
}


float TimeCounter::GetTotalTime()
{
    // return the time so the app start is 0...
    return (float) ((m_Last-m_Start)/m_Frequency);
}

float TimeCounter::GetTimeStep()
{
    return m_LastStep;
}

Others, post your results here, with hardware & software configuration.

Here's the source and binary zipped up (http://users.on.net/~pixelated/console.zip)

And here's glut! (http://www.xmission.com/~nate/glut.html)

Note: My results above might be wrong, so feel free to correct me :)

def
09-19-2005, 10:52 PM
Before I give your code a try, I can second your results. For me glCopyTexImage2D has been blazingly fast (unnoticeable, really) on the last few generations of NVIDIA cards. I don't know about ATI cards.

ChiefWiggum
09-19-2005, 10:56 PM
And for comparison, some results from friends' machines:

Athlon 3200+ @ 2.5ghz (~4100+)
Geforce 6800GT 256mb at 450core/1.12ghz mem:


CopyTime: 0.000060 = 3136.283545MB/s, Frame: 0.001035
CopyTime: 0.000063 = 2982.954177MB/s, Frame: 0.001048
CopyTime: 0.000077 = 2440.598975MB/s, Frame: 0.001059
CopyTime: 0.000062 = 3036.944442MB/s, Frame: 0.000896
CopyTime: 0.000062 = 3036.944442MB/s, Frame: 0.001030
CopyTime: 0.000059 = 3165.871203MB/s, Frame: 0.000973
CopyTime: 0.000060 = 3121.696236MB/s, Frame: 0.001033
CopyTime: 0.000060 = 3151.008016MB/s, Frame: 0.001037
CopyTime: 0.000059 = 3180.875274MB/s, Frame: 0.001038
CopyTime: 0.000057 = 3273.974106MB/s, Frame: 0.001032
CopyTime: 0.000061 = 3050.748502MB/s, Frame: 0.001048
CopyTime: 0.000060 = 3136.283545MB/s, Frame: 0.001037
CopyTime: 0.000059 = 3180.875274MB/s, Frame: 0.001034
CopyTime: 0.000062 = 3023.264387MB/s, Frame: 0.001036
CopyTime: 0.000061 = 3092.924767MB/s, Frame: 0.001028
CopyTime: 0.000059 = 3165.871203MB/s, Frame: 0.001034
CopyTime: 0.000059 = 3196.022240MB/s, Frame: 0.001043
CopyTime: 0.000058 = 3258.081089MB/s, Frame: 0.001033
CopyTime: 0.000059 = 3165.871203MB/s, Frame: 0.001037
CopyTime: 0.000060 = 3151.008016MB/s, Frame: 0.001035
CopyTime: 0.000064 = 2930.850039MB/s, Frame: 0.001036

P4 2.4 @ 2.7
Radeon 9700 non-pro


CopyTime: 0.002092 = 1710.361292MB/s, Frame: 0.006626
CopyTime: 0.002120 = 1687.598595MB/s, Frame: 0.006781
CopyTime: 0.002077 = 1723.017092MB/s, Frame: 0.006611
CopyTime: 0.002099 = 1704.670067MB/s, Frame: 0.006598
CopyTime: 0.002082 = 1718.623887MB/s, Frame: 0.006698
CopyTime: 0.002051 = 1744.372805MB/s, Frame: 0.006606
CopyTime: 0.002056 = 1740.579625MB/s, Frame: 0.006615
CopyTime: 0.002068 = 1730.233033MB/s, Frame: 0.006682
CopyTime: 0.002071 = 1727.432620MB/s, Frame: 0.006641
CopyTime: 0.002064 = 1733.511940MB/s, Frame: 0.006643
CopyTime: 0.002089 = 1713.106585MB/s, Frame: 0.006668
CopyTime: 0.002065 = 1732.808230MB/s, Frame: 0.006597
CopyTime: 0.002036 = 1757.056583MB/s, Frame: 0.006621
CopyTime: 0.002064 = 1733.746442MB/s, Frame: 0.006620
CopyTime: 0.002040 = 1753.928482MB/s, Frame: 0.006628
CopyTime: 0.002076 = 1723.712677MB/s, Frame: 0.006672
CopyTime: 0.002071 = 1727.199629MB/s, Frame: 0.006589
CopyTime: 0.002032 = 1761.164177MB/s, Frame: 0.006618
CopyTime: 0.002068 = 1729.999481MB/s, Frame: 0.006564
CopyTime: 0.002110 = 1695.866761MB/s, Frame: 0.006653
CopyTime: 0.002081 = 1719.085315MB/s, Frame: 0.006639
CopyTime: 0.002091 = 1710.818293MB/s, Frame: 0.006679
CopyTime: 0.002069 = 1729.065124MB/s, Frame: 0.006591
CopyTime: 0.002087 = 1714.712303MB/s, Frame: 0.006627

Edit: oops someone had vsync on :)

def
09-19-2005, 11:08 PM
P4 3.0 GHz GeForceFX5900:
-------------------------
CopyTime: 0.000169 = 1106.895655MB/s, Frame: 0.000277
CopyTime: 0.000179 = 1047.929828MB/s, Frame: 0.000275
CopyTime: 0.000161 = 1165.319527MB/s, Frame: 0.000283
CopyTime: 0.000165 = 1134.351351MB/s, Frame: 0.000242
CopyTime: 0.000161 = 1165.223207MB/s, Frame: 0.000266
CopyTime: 0.000154 = 1214.847128MB/s, Frame: 0.000239
CopyTime: 0.000137 = 1367.099214MB/s, Frame: 0.000204
CopyTime: 0.000138 = 1355.711660MB/s, Frame: 0.000210
CopyTime: 0.000152 = 1236.981060MB/s, Frame: 0.000224
CopyTime: 0.000137 = 1365.152802MB/s, Frame: 0.000207

Dual Opteron 2.2 GHz GeForce6800:
---------------------------------
CopyTime: 0.000088 = 2142.836372MB/s, Frame: 0.000153
CopyTime: 0.000087 = 2152.653784MB/s, Frame: 0.000342
CopyTime: 0.000093 = 2011.445562MB/s, Frame: 0.000168
CopyTime: 0.000087 = 2145.721117MB/s, Frame: 0.000149
CopyTime: 0.000086 = 2170.251028MB/s, Frame: 0.000179
CopyTime: 0.000090 = 2079.170257MB/s, Frame: 0.000231
CopyTime: 0.000086 = 2188.496808MB/s, Frame: 0.000171
CopyTime: 0.000088 = 2125.341136MB/s, Frame: 0.000353
CopyTime: 0.000082 = 2273.051161MB/s, Frame: 0.000183
CopyTime: 0.000103 = 1823.220921MB/s, Frame: 0.000202

Dual Opteron 2.2 GHz GeForce6800:
(no AA, no filtering, low quality)
----------------------------------
CopyTime: 0.000059 = 3162.483138MB/s, Frame: 0.000115
CopyTime: 0.000059 = 3167.462546MB/s, Frame: 0.000113
CopyTime: 0.000061 = 3079.573005MB/s, Frame: 0.000115
CopyTime: 0.000059 = 3181.175663MB/s, Frame: 0.000113
CopyTime: 0.000059 = 3180.736289MB/s, Frame: 0.000113
CopyTime: 0.000059 = 3187.096253MB/s, Frame: 0.000115
CopyTime: 0.000059 = 3179.223361MB/s, Frame: 0.000113
CopyTime: 0.000061 = 3089.283524MB/s, Frame: 0.000115
CopyTime: 0.000059 = 3180.272507MB/s, Frame: 0.000113
CopyTime: 0.000059 = 3182.886018MB/s, Frame: 0.000113

ChiefWiggum
09-19-2005, 11:27 PM
Thanks def!

So it looks like I'm not crazy after all :) These cards are really fast at doing copies back! Like I said in the other thread, I assume it's mainly because this can be a video mem->video mem copy; the driver isn't forced to copy back to system RAM!

Interesting to see that older cards aren't all that much slower, at around half the speed.

Keep 'em coming :D It would be good to see how mid-range cards are doing too, say 6600s or 5700s!

tamlin
09-20-2005, 12:50 AM
ChiefWiggum wrote:
Interesting to see that older cards aren't all that much slower

Oh no? :)

ATI 9250 (128-bit):
RGB texture:
256x256 roughly 220MB/s
512x512 roughly 370MB/s

RGBA texture:
256x256 roughly 290MB/s
512x512 roughly 465MB/s

To reduce per-call overhead and measure something closer to the actual copy speed, I looped CopyTexImage2D 40 times; for 256x256 the results fluctuated between 544 and 609MB/s. Copying to an RGBA texture, it climbed to 745-815MB/s.

Come to think of it, 815MB/s perhaps isn't that shabby for such an old card.
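
The looping approach can be sketched without any GL at all. This is only an illustration under assumptions, not the code tamlin ran: `averageSecondsPerCall` is a hypothetical helper, `op` stands in for the copy call, and `finish` stands in for a single glFinish()-style sync issued once before the clock is read.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: runs `op` `iterations` times, calls `finish` once
// (standing in for one glFinish-style sync), and returns the mean
// wall-clock seconds per call. Fixed per-measurement overhead is spread
// over all iterations instead of being charged to a single call.
double averageSecondsPerCall(const std::function<void()>& op,
                             const std::function<void()>& finish,
                             int iterations)
{
    if (iterations <= 0)
        return 0.0;

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        op();
    finish(); // wait for outstanding work before reading the clock
    const auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration<double>(stop - start).count() / iterations;
}
```

With 40 iterations, as in the post above, the cost of the sync and of reading the timer is divided by 40, which is one plausible reason the looped figures come out higher than the single-copy ones.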

knackered
09-20-2005, 03:02 AM
this is copytexsubimage you're talking about I assume?
also, how do these figures compare to fbo?

kon
09-20-2005, 03:04 AM
Pentium M 1.7GHz FX5200 :rolleyes:
~360 MB/s

Changing the glCopyTexImage2D call to

glCopyTexSubImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, 0, 0, 0, 0, imageWinWidth, imageWinHeight);

and adding

glTexImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, GL_RGB, imageWinWidth, imageWinHeight, 0, GL_RGB, GL_UNSIGNED_BYTE, 0);

when initializing the texture gives the same results!
Using the sub-versions of glTexImage2D and glCopyTexImage2D used to be faster, but now the results (at least in this test) are equal!

kon

Jay Cornwall
09-20-2005, 05:28 AM
A paper I just finished might be of interest here:

fbo.pdf (http://www.esuna.co.uk/~jay/papers/fbo.pdf)

dorbie
09-20-2005, 10:00 AM
Graphics cards have finally made the jump to ludicrous speed.

NitroGL
09-20-2005, 06:44 PM
Here's my X800 Pro (256MB) on an Athlon XP 2500+ with 1GB RAM.

CopyTime: 0.000135 = 1383.844700MB/s, Frame: 0.000282
CopyTime: 0.000134 = 1398.259730MB/s, Frame: 0.000280
CopyTime: 0.000135 = 1389.574958MB/s, Frame: 0.000281
CopyTime: 0.000136 = 1375.337527MB/s, Frame: 0.000284
CopyTime: 0.000136 = 1378.161508MB/s, Frame: 0.000284
CopyTime: 0.000136 = 1380.997257MB/s, Frame: 0.000279
CopyTime: 0.000135 = 1392.457920MB/s, Frame: 0.000281
CopyTime: 0.000134 = 1398.259730MB/s, Frame: 0.000282
CopyTime: 0.000135 = 1386.703909MB/s, Frame: 0.000282
CopyTime: 0.000155 = 1209.305698MB/s, Frame: 0.000305
CopyTime: 0.000136 = 1378.161508MB/s, Frame: 0.000283
CopyTime: 0.000139 = 1350.431936MB/s, Frame: 0.000286
CopyTime: 0.000143 = 1310.868550MB/s, Frame: 0.000290
CopyTime: 0.000135 = 1386.703909MB/s, Frame: 0.000282
CopyTime: 0.000134 = 1398.259730MB/s, Frame: 0.000273
CopyTime: 0.000136 = 1378.161508MB/s, Frame: 0.000282
CopyTime: 0.000140 = 1336.981436MB/s, Frame: 0.000287
CopyTime: 0.000134 = 1398.259730MB/s, Frame: 0.000281
CopyTime: 0.000136 = 1380.997257MB/s, Frame: 0.000282
CopyTime: 0.000135 = 1389.574958MB/s, Frame: 0.000244
CopyTime: 0.000136 = 1380.997257MB/s, Frame: 0.000283
CopyTime: 0.000139 = 1350.431936MB/s, Frame: 0.000300
CopyTime: 0.000135 = 1392.457920MB/s, Frame: 0.000265
CopyTime: 0.000135 = 1392.457920MB/s, Frame: 0.000281

zed
09-20-2005, 07:14 PM
kon wrote:
Using the sub-versions of glTexImage2D and glCopyTexImage2D used to be faster, but now the results (at least in this test) are equal!

Yeah, I've noticed for a while (at least with NVIDIA) that there's no difference between the two, though I suppose the sub-image version is always a better bet.

Nikolai Timofeev
09-20-2005, 08:38 PM
GALAXY 6600GT / AMD Barton 2700+ / 1 GB RAM

CopyTime: 0.001512 = 2362.034325MB/s, Frame: 0.013285
CopyTime: 0.001397 = 2556.193702MB/s, Frame: 0.009863
CopyTime: 0.001414 = 2525.883038MB/s, Frame: 0.011541
CopyTime: 0.001415 = 2523.389496MB/s, Frame: 0.011696
CopyTime: 0.001410 = 2531.887569MB/s, Frame: 0.011707
CopyTime: 0.001409 = 2533.393113MB/s, Frame: 0.011680
CopyTime: 0.001402 = 2547.024436MB/s, Frame: 0.011689
CopyTime: 0.001425 = 2505.089716MB/s, Frame: 0.011722
CopyTime: 0.001399 = 2553.129812MB/s, Frame: 0.011714
CopyTime: 0.001412 = 2528.381514MB/s, Frame: 0.011668
CopyTime: 0.001460 = 2445.182275MB/s, Frame: 0.011854
CopyTime: 0.001416 = 2521.398245MB/s, Frame: 0.011337
CopyTime: 0.001416 = 2521.895815MB/s, Frame: 0.011660
CopyTime: 0.001443 = 2475.013145MB/s, Frame: 0.011851
CopyTime: 0.001409 = 2534.397734MB/s, Frame: 0.011336
CopyTime: 0.001427 = 2502.637243MB/s, Frame: 0.011712
CopyTime: 0.001407 = 2537.416804MB/s, Frame: 0.011522
CopyTime: 0.001399 = 2551.600725MB/s, Frame: 0.011663
CopyTime: 0.001409 = 2533.895429MB/s, Frame: 0.011733
CopyTime: 0.001499 = 2381.398862MB/s, Frame: 0.012312
CopyTime: 0.001398 = 2554.150363MB/s, Frame: 0.010848
CopyTime: 0.001407 = 2537.920507MB/s, Frame: 0.011721
CopyTime: 0.001409 = 2534.900449MB/s, Frame: 0.011677
CopyTime: 0.001444 = 2472.619180MB/s, Frame: 0.011814
CopyTime: 0.001411 = 2530.383814MB/s, Frame: 0.011411
CopyTime: 0.001410 = 2531.887569MB/s, Frame: 0.011666
CopyTime: 0.001405 = 2540.442865MB/s, Frame: 0.011572
CopyTime: 0.001407 = 2538.424620MB/s, Frame: 0.011703
CopyTime: 0.001454 = 2456.461328MB/s, Frame: 0.011815
CopyTime: 0.001402 = 2547.531961MB/s, Frame: 0.011351
CopyTime: 0.001408 = 2536.409579MB/s, Frame: 0.011719
CopyTime: 0.001397 = 2556.193702MB/s, Frame: 0.011693
CopyTime: 0.001541 = 2316.651787MB/s, Frame: 0.013247
CopyTime: 0.001884 = 1894.880356MB/s, Frame: 0.010349
CopyTime: 0.001415 = 2523.389496MB/s, Frame: 0.011204
CopyTime: 0.001421 = 2513.464776MB/s, Frame: 0.011783
CopyTime: 0.001407 = 2536.913091MB/s, Frame: 0.011590
CopyTime: 0.001498 = 2384.064227MB/s, Frame: 0.013821
CopyTime: 0.001412 = 2528.381514MB/s, Frame: 0.009370
CopyTime: 0.001412 = 2527.881382MB/s, Frame: 0.011731
CopyTime: 0.001447 = 2467.844719MB/s, Frame: 0.011755

CrazyButcher
09-21-2005, 07:56 AM
Geforce 6600 128 AGP 4x P4 2,23

CopyTime: 0.000222 = 845.295557MB/s, Frame: 0.011567
CopyTime: 0.000225 = 831.678691MB/s, Frame: 0.011567
CopyTime: 0.000217 = 862.679574MB/s, Frame: 0.011534
CopyTime: 0.000230 = 815.509926MB/s, Frame: 0.011600
CopyTime: 0.000221 = 847.430162MB/s, Frame: 0.011541
CopyTime: 0.000210 = 894.886274MB/s, Frame: 0.011554
CopyTime: 0.000234 = 802.828588MB/s, Frame: 0.011548
CopyTime: 0.000232 = 808.632171MB/s, Frame: 0.011536
CopyTime: 0.000221 = 848.501512MB/s, Frame: 0.011529
CopyTime: 0.000213 = 878.487806MB/s, Frame: 0.011556
CopyTime: 0.000230 = 815.509926MB/s, Frame: 0.011597
CopyTime: 0.000227 = 824.526651MB/s, Frame: 0.011541
CopyTime: 0.000205 = 914.393320MB/s, Frame: 0.011520
CopyTime: 0.000228 = 822.505734MB/s, Frame: 0.011632
CopyTime: 0.000276 = 679.316507MB/s, Frame: 0.011621

Ysaneya
09-23-2005, 01:08 AM
GeForce FX 5600 AGP 4x, approximately 300 MB/s, but I do think (for a lot of other reasons) that my system is messed up.

tamlin
09-23-2005, 02:33 AM
knackered wrote:
this is copytexsubimage you're talking about I assume?
also, how do these figures compare to fbo?

I have to ask, was it my post you referred to, or something else? I never really got the context of your questions.

If it was a follow-up to my post; no, it was the copyteximage as in the code presented by ChiefWiggum. I did however consider extending and modifying this test a bit to get hopefully more interesting results (copy* bandwidth, texture creation+upload speed, latencies, ...).

ChiefWiggum, I hope you don't mind if I were to extend it a bit to test more areas?

knackered, if you have some simple FBO code in one or more scenarios to add, feel free to pitch in. :)

knackered
09-23-2005, 07:57 AM
eh? I wasn't directing my question at anyone in particular, just throwing it into the thread.
You can do the work of testing it against fbo, I don't care. These kind of threads aren't that interesting to me that I would do any coding/ftp'ing. Believe you and me, if the speed of copyteximage was in my top 10 of concerns I'd be a happy man.
Not very constructive, I know.

ChiefWiggum
09-25-2005, 01:36 AM
tamlin: no of course not, why would I mind!

In fact I was thinking of doing the same thing. There are so many "myths" going around about what never to do, and what's slow. Some were started from GDC presentations like 7 years ago, hardly applicable these days I'd say! It'd be worth writing an app that benchmarks everything: setup speeds, operations such as changing shaders, certain states (those considered heavyweight changes), FBO speeds (though that article about FBO that someone linked a bit further up is really good for that)...

I might pitch in when I find the time. Would be good to write it as a sort of demo, instead of it pumping out values it could average them over say 500 frames or something.
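
The averaging idea could look something like the sketch below. `FrameAverager` is a hypothetical class, not something from the posted program; the window size (500 in the suggestion above) is whatever the caller passes in.

```cpp
#include <cstddef>

// Hypothetical accumulator: collects per-frame samples (for example the
// value of copytime.GetTimeStep()) and reports their mean once a full
// window of samples has been gathered.
class FrameAverager
{
public:
    // a window size of 0 behaves like 1 (every sample is reported)
    explicit FrameAverager(std::size_t windowSize)
        : m_window(windowSize), m_count(0), m_sum(0.0) {}

    // Adds one sample. Returns true when the window is full; in that case
    // `mean` receives the average and the accumulator starts over.
    bool add(double sample, double& mean)
    {
        m_sum += sample;
        ++m_count;
        if (m_count < m_window)
            return false;

        mean = m_sum / double(m_count);
        m_sum = 0.0;
        m_count = 0;
        return true;
    }

private:
    std::size_t m_window;
    std::size_t m_count;
    double m_sum;
};
```

In the display callback this would replace the per-frame printf: call `add()` every frame and print only when it returns true, so one line is emitted per window instead of one per frame.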

knackered
09-25-2005, 01:57 AM
Such benchmarking apps have been done time and time again. Whatever you do would merely be an update adding new extensions.
Dismissing all performance advice given over the last 6 years as myths merely because you've discovered that glCopyTexImage is now as fast as glCopyTexSubImage is foolish. Frankly, it just means that the driver now checks whether the new format/dimensions are the same as in the last glCopyTexImage call on that texture object, and if they are, it just replaces the data rather than going through a destroy/create cycle. That the driver now does that check is more indicative of a flurry of badly written apps using glCopyTexImage, and of improved CPU speeds. It will also certainly be implementation dependent, so you should always still use glCopyTexSubImage when replacing the pixel data of a texture, because this removes any ambiguity (I'm replacing just the contents of the texture).
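
To make the described check concrete, here is a toy model of it. This is purely an assumption-laden illustration of the behaviour described above, not actual driver code: `ToyTexture` and `toyCopyTexImage` are made-up names, and a plain std::vector stands in for video memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy model only: a real driver manages GPU memory and is far more involved.
struct ToyTexture
{
    int format = 0;
    int width = 0;
    int height = 0;
    std::vector<unsigned char> storage; // stands in for video memory
    int allocations = 0;                // number of destroy/create cycles
};

// Models a glCopyTexImage-style redefinition: storage is recreated only
// when the format or dimensions differ from the previous call; otherwise
// the existing storage is overwritten in place, as a sub-image copy would.
// Assumes non-negative width, height and bytesPerPixel.
void toyCopyTexImage(ToyTexture& tex, int format, int width, int height,
                     int bytesPerPixel)
{
    const std::size_t needed = std::size_t(width) * std::size_t(height) *
                               std::size_t(bytesPerPixel);
    const bool reusable = tex.allocations > 0 && tex.format == format &&
                          tex.width == width && tex.height == height &&
                          tex.storage.size() == needed;
    if (!reusable)
    {
        tex.storage.assign(needed, 0); // the expensive destroy/create path
        tex.format = format;
        tex.width = width;
        tex.height = height;
        ++tex.allocations;
    }

    // the pixel copy itself; a constant fill stands in for framebuffer data
    std::fill(tex.storage.begin(), tex.storage.end(), 0xFF);
}
```

Calling it twice with the same format and size leaves `allocations` at 1, while changing the size bumps it to 2. Whether a given driver behaves this way is implementation dependent, which is the point made above.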

ChiefWiggum
09-25-2005, 03:56 AM
knackered:

I never said go and reject what was said for the past seven years, but it might not be a bad idea to test these things out, and see how much of a speed difference they DO make relative to other things. I myself am using copytexsubimage in my app now, but was curious to see the speeds you can get with copyteximage too.

Either way, the weight of the argument against using copyteximage AND copytexsubimage is a little high when you're dealing with post-processing effects. Everyone's struggling to get PBuffers or FBO working and draw into that, to save that extra bit of time, but then they think, OK, how the hell do I antialias this sucker... The whole reason I came up with this thread was that in my other thread I started testing the speeds of copyteximage (at first) because I figured it simply CAN'T be slower than doing your own supersampling into a PBuffer.

So all I'm saying is it's great to take the advice in too, but it doesn't hurt to check if it particularly applies to you :)

knackered
09-25-2005, 05:34 AM
Fair enough, but that intent wasn't clear from your post.