As I have a need to write depth and color atomically (with depth testing active), I’ve come to the conclusion that GL_POINTS rendering is the way to go.
It looks like this:
struct Vertex
{
    float x, y, z;
};

Vertex* vbuf = NULL;

void
init_gl_stuff()
{
    <…>
    vbuf = (Vertex*)malloc(640*480*sizeof(Vertex));
    float x_step =  2.0f/640;
    float y_step = -2.0f/480;
    for (int x = 0; x < 640; ++x)
    for (int y = 0; y < 480; ++y)
    {
        vbuf[640*y+x].x = -1.0f + x_step*x + 0.5f*x_step;
        vbuf[640*y+x].y =  1.0f + y_step*y + 0.5f*y_step;
        vbuf[640*y+x].z =  0.0f;
    }
    <…>
}

void
pedestrian_blit()
{
    glClear(GL_COLOR_BUFFER_BIT|GL_DEPTH_BUFFER_BIT);
    glVertexPointer(3, GL_FLOAT, sizeof(Vertex), &(vbuf->x));
    // source color from disjoint color array, init code not shown
    glColorPointer(4, GL_UNSIGNED_BYTE, 4, shredder+offset);
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_COLOR_ARRAY);
    glDrawArrays(GL_POINTS, 0, 640*480);
}
This ‘blit’ is wrapped inside a glut display callback that also takes care of buffer swaps and does timing.
On to the issue:
This ‘blit’ replacement times in at ~16ms (640x480), for R200, R300 and Geforce 3. So far, so good.
If I plug it into a display list (which is impractical; I only did so to gauge what performance I can expect from static VBOs), the R300 tpf drops into the microsecond domain. Whoosh.
However, on my Geforce 3 the tpf increases to ~60ms. Does that mean that there’s no hw support for point rendering on Gf3? Is ‘pretesselation’ into pixel-sized quads the only way to get acceptable performance?
Or is it simply a driver glitch?
It’s hardly practical to disable VBO usage when the renderer string starts with “Geforce” … or is it?
(Dets 44.03, Geforce 3Ti200, Athlon XP 2400+, Win2k)