Thank you for the detailed reply, Aleksandar. I should have posted a more complete question.
For the first question, I tested the “normal” glTexImage2D() (not glTexSubImage2D(); do the two functions differ much?). Without a PBO, glTexImage2D() is performed on the CPU, and I got a speed of ~600 MB/s.
For the second item, could you tell me how to “overlap multiple texture transfers” for better performance?
For the third, here is the code snippet:
void func1()
{
    //-------------- timing code ------------------------------------------
    GLuint query;
    glGenQueries(1, &query);
    glBeginQuery(GL_TIME_ELAPSED, query);
    //---------------------------------------------------------------------

    // Upload: the code block under test
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo1);
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
    glBindTexture(GL_TEXTURE_2D, tex1);
    // pbo1 is still bound here, so the last argument is an offset into the PBO
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, tex1_width, tex1_height, GL_RED_INTEGER, GL_UNSIGNED_INT, 0); // where the data transfer occurs
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);

    //-------------- timing code ------------------------------------------
    // Yes, measuring like this stalls the whole process. But I am only testing the time taken to
    // upload from host to video memory, not the performance of the whole app.
    // Does measuring like this break the GPU execution sequence and so hurt the upload speed?
    glEndQuery(GL_TIME_ELAPSED);
    GLuint done = 0;
    while (done == 0)
    {
        glGetQueryObjectuiv(query, GL_QUERY_RESULT_AVAILABLE, &done);
    }
    GLuint elapsed_time; // nanoseconds
    glGetQueryObjectuiv(query, GL_QUERY_RESULT, &elapsed_time);
    glDeleteQueries(1, &query);
    float time_ms = elapsed_time / 1000000.0f; // ns -> ms
    LogTime(time_ms); // write time_ms to a log file
    //---------------------------------------------------------------------

    Render();

    // Remap the buffer
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo1);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, buffersize, 0, GL_STREAM_DRAW);
    pBufferData = (byte*)glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
    // pBufferData is a global variable; it is refilled in another thread that handles I/O.
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
}
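To relate the logged time to a transfer rate, the elapsed nanoseconds can be combined with the upload size. Below is a minimal sketch in plain C, assuming 4 bytes per texel (GL_RED_INTEGER with GL_UNSIGNED_INT) and 1 MB = 10^6 bytes. upload_bytes() and throughput_mb_per_s() are hypothetical helper names and are not part of the code above.

```c
#include <stdint.h>

/* Hypothetical helper: size in bytes of one upload, assuming a
   single-channel texture with 32-bit texels (4 bytes per texel). */
static uint64_t upload_bytes(uint32_t width, uint32_t height)
{
    return (uint64_t)width * (uint64_t)height * 4u;
}

/* Hypothetical helper: transfer rate in MB/s (1 MB = 10^6 bytes, an
   assumption) from a byte count and a GL_TIME_ELAPSED result in
   nanoseconds. Returns 0.0 when the elapsed time is zero. */
static double throughput_mb_per_s(uint64_t bytes, uint64_t elapsed_ns)
{
    if (elapsed_ns == 0) {
        return 0.0;
    }
    double megabytes = (double)bytes / 1.0e6;
    double seconds = (double)elapsed_ns / 1.0e9;
    return megabytes / seconds;
}
```

With the variables from func1(), this would be called as throughput_mb_per_s(upload_bytes(tex1_width, tex1_height), elapsed_time). Note that a 32-bit GLuint result wraps after roughly 4.29 seconds of nanoseconds, which is not a concern for a single upload but matters for longer measurements.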
Fourth, yes, the GTX570 and the GTX670 are in the same machine, and I tend to believe it is a driver issue.