How to do interleaved row rendering?

My program needs an interleaved row rendering function, meaning it only needs to render rows 0, 2, 4, … or rows 1, 3, 5, …. My current method is to render the whole frame, copy the data to a texture, set the texture’s sample filter to GL_NEAREST, and draw it onto a half-height rectangle. But this is slow because it has to render the whole frame. Is there any way to do interleaved row rendering that performs better than whole-frame rendering?

Why not just squash the rendering to half-height, rather than redrawing it as a texture?

If you need to display rows 0,2,4 in one frame and then rows 1,3,5 in the next frame using the same texture, then what you are doing is correct.
You could possibly gain some speed by rendering directly to a texture (using an FBO).

If you need to display different images every frame, then you could do what Korval suggests. Just add an offset to the GL_PROJECTION matrix to switch between even and odd rows before you render a frame.

Originally posted by Korval:
Why not just squash the rendering to half-height, rather than redrawing it as a texture?
Because I need a whole-frame image, but the odd rows and even rows are transformed by different projections, so I need interleaved rendering.

I tried a “depth test” experiment in my program: before every render I clear the depth buffer with a 2x2 image whose first row is black and whose second row is white, so every pixel whose depth value is below 0.5 is rejected during rendering. But the performance gain is very small: it speeds things up only a little in a complex scene and does not help at all in a simple scene. Why?

An interleaved rendering method is described in ShaderX 4: 2.1 Interlaced Rendering by Oles V. Shishkovtsov. Maybe it can help you.

Originally posted by gybe:
An interleaved rendering method is described in ShaderX 4: 2.1 Interlaced Rendering by Oles V. Shishkovtsov. Maybe it can help you.
Thanks for your reply, but I don’t have the book at hand. Can you tell me what is described there?

I would suggest you use the stencil buffer. Do a first pass setting stencil to 1 in every other row. Then you can just render twice, with the stencil test testing for 0 in the first pass and 1 in the second pass.

If you don’t need stencil for anything else, the stencil fill pass can be done once at initialization.
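As a rough illustration of the stencil fill pass, here is a minimal sketch that only builds the row pattern on the CPU. The helper name `build_row_parity_mask` is hypothetical and not from the thread; it assumes one byte per pixel and marks odd rows with 1.

```c
#include <stdlib.h>

/* Hypothetical helper, not from the thread: builds a width*height byte mask
   holding 1 on odd rows and 0 on even rows. Returns NULL on bad input or
   allocation failure; the caller frees the result. */
unsigned char *build_row_parity_mask(int width, int height)
{
    unsigned char *mask;
    int x, y;

    if (width <= 0 || height <= 0)
        return NULL;
    mask = malloc((size_t)width * (size_t)height);
    if (mask == NULL)
        return NULL;
    for (y = 0; y < height; ++y)
        for (x = 0; x < width; ++x)
            mask[(size_t)y * (size_t)width + (size_t)x] = (unsigned char)(y & 1);
    return mask;
}
```

With a legacy OpenGL context, a mask like this could probably be uploaded once with glPixelStorei(GL_UNPACK_ALIGNMENT, 1) and glDrawPixels(width, height, GL_STENCIL_INDEX, GL_UNSIGNED_BYTE, mask). Each pass would then select its rows with glStencilFunc(GL_EQUAL, field, 1) and glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP), where field is 0 or 1.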

Originally posted by pango:
Because I need a whole-frame image, but the odd rows and even rows are transformed by different projections, so I need interleaved rendering.
You can probably tweak the projection matrix to shift the rendered result so that the correct lines are rendered.

Originally posted by Humus:
I would suggest you use the stencil buffer. Do a first pass setting stencil to 1 in every other row. Then you can just render twice, with the stencil test testing for 0 in the first pass and 1 in the second pass.

If you don’t need stencil for anything else, the stencil fill pass can be done once at initialization.
Thanks for your reply, but are you sure the “stencil test” is faster than the “depth test”?

Originally posted by Komat:
You can probably tweak the projection matrix to shift the rendered result so that the correct lines are rendered.
Thanks for your reply, but what does “tweak the projection matrix” mean? Do you mean I can write a vertex shader and apply the tweak there?

Tweaking the projection matrix just means adding the offset I mentioned before (an offset on the GL_PROJECTION matrix to switch between even and odd rows).

are you sure the “stencil test” is faster than the “depth test”?
That depends on whether your application needs the depth test for other purposes. If not, use the depth test: fill the depth buffer once with the proper values and leave it that way using glDepthMask(GL_FALSE), rather than clearing it every frame. The depth test itself is fast.
If you use depth testing to render 3D geometry (and I guess you do), then stencil could prove faster, since you will not have to fill the depth buffer with that 2-pixel image every frame. Fill the stencil buffer once and reuse it when rendering, just as Humus suggested.

Originally posted by pango:
Thanks for your reply, but what does “tweak the projection matrix” mean? Do you mean I can write a vertex shader and apply the tweak there?
I was thinking about applying a half-pixel offset to the screen-space coordinates, either by multiplying the projection matrix by a matrix with the appropriate bias or by an addition in the vertex shader. However, after some drawing with pencil and paper, I am not sure that this will work without some form of hardware support. For example, different mipmaps will be selected than in the unsquashed image, and textures will be sampled at different coordinates.

Originally posted by k_szczech:
Tweaking the projection matrix is just adding the offset I mentioned before:

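glMatrixMode(GL_PROJECTION);
glLoadIdentity();
/* The translation is applied after the projection, so it is in normalized
   device coordinates (-1..1 over the viewport). Half a pixel of a target with
   targetHeight rows is 1.0f / targetHeight; targetHeight is a placeholder name. */
if (evenFrame)
  glTranslatef(0.0f, 1.0f / targetHeight, 0.0f);
/* then set up the usual projection with glFrustum or gluPerspective */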

Note that this is for rendering half-height images.
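To make the size of that offset concrete, here is a small sketch of the arithmetic. The function name `field_ndc_offset` and its conventions are assumptions for illustration: rows are counted from the bottom as in OpenGL window coordinates, field 0 means rows 0, 2, 4, … and field 1 means rows 1, 3, 5, …. It only addresses where the samples land, not the mipmap and texture sampling differences raised earlier in the thread.

```c
/* Hypothetical helper: vertical offset, in normalized device coordinates,
   that moves the sample centres of a half-height render onto the rows of one
   field of a full frame with full_height rows.
   Assumes rows are counted from the bottom (OpenGL window coordinates),
   field 0 = rows 0,2,4,... and field 1 = rows 1,3,5,... */
float field_ndc_offset(int field, int full_height)
{
    /* Each half-height pixel is centred between two full-frame rows, so the
       image must move by half a full-frame row: 0.5 * (2 / full_height). */
    float half_row = 1.0f / (float)full_height;

    /* Shift the image up to land on the lower (even) row, down for the odd row. */
    return (field == 0) ? half_row : -half_row;
}
```

The result would be passed as the y argument of glTranslatef on the projection matrix, before glFrustum or gluPerspective is applied.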

Thanks for your reply, but I don’t think it helps me, because I need an interleaved, vertically compressed half-height image, meaning rows 0, 2, 4, …, not rows 0, 1, 2, …

It seems that you want to render interlaced video!? Stencil is the way! Fill every odd line with 1 in the stencil buffer. Later, enable the stencil test, set the stencil func to test for equality with 0 or 1, and render.

Originally posted by yooyo:
It seems that you want to render interlaced video!? Stencil is the way! Fill every odd line with 1 in the stencil buffer. Later, enable the stencil test, set the stencil func to test for equality with 0 or 1, and render.
Yes, what I want to do is render a scene and play it on a TV, so I must render it interlaced. I have written a stencil test version of my engine, but I found that its performance is no better than the whole-frame version. I want an interlaced rendering method that is quicker than rendering a whole frame. Is it possible?
I use a GF7800GT card. How is its “stencil test” performance?

Originally posted by pango:
I have written a stencil test version of my engine, but I found that its performance is no better than the whole-frame version. I want an interlaced rendering method that is quicker than rendering a whole frame. Is it possible?

The problem might be that the hardware does its fast stencil/depth rejection test on rectangular blocks of pixels and can reject a block only if all pixels within it would be rejected. Otherwise it has to process the pixels and reject the individual results, so the pixel shading work is not avoided. Because in your case every second row is present, the fast rejection will probably do nothing.
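A toy model of that block-based rejection, as a sketch only: it counts how many square tiles contain no passing pixel at all. The tile size, the function names, and the model itself are illustrative assumptions; the thread does not say how any particular GPU implements this.

```c
/* pass(x, y) returns nonzero when pixel (x, y) passes the stencil/depth test. */
typedef int (*pass_fn)(int x, int y);

/* Counts tiles of tile x tile pixels in which every pixel fails the test,
   i.e. tiles that a coarse rejection stage could skip entirely.
   Illustrative model only; real hardware differs. */
int count_rejected_tiles(int width, int height, int tile, pass_fn pass)
{
    int rejected = 0, tx, ty, x, y;

    if (tile <= 0)
        return 0;
    for (ty = 0; ty < height; ty += tile) {
        for (tx = 0; tx < width; tx += tile) {
            int any_pass = 0;
            for (y = ty; y < ty + tile && y < height && !any_pass; ++y)
                for (x = tx; x < tx + tile && x < width; ++x)
                    if (pass(x, y)) { any_pass = 1; break; }
            if (!any_pass)
                ++rejected;
        }
    }
    return rejected;
}

/* Interlaced mask: only even rows pass. */
int even_rows_pass(int x, int y) { (void)x; return (y % 2) == 0; }

/* Contiguous mask for comparison: only the first 32 rows pass. */
int first_32_rows_pass(int x, int y) { (void)x; return y < 32; }
```

On a 64x64 target with 8x8 tiles, both masks let half of the pixels through, but in this model the interlaced mask leaves no tile that can be skipped, while the contiguous mask lets half of the tiles be skipped.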

Originally posted by Komat:
The problem might be that the hardware does its fast stencil/depth rejection test on rectangular blocks of pixels and can reject a block only if all pixels within it would be rejected. Otherwise it has to process the pixels and reject the individual results, so the pixel shading work is not avoided. Because in your case every second row is present, the fast rejection will probably do nothing.
Oh, does what you said mean that I can’t find a way to make interlaced rendering faster? So all I can do is optimize my engine to reach double the fps?

Halve rendering height, as already said.

Whatever you do, consider the following issues:

  • Your engine MUST deliver a constant 50 or 60 fields per second (depending on PAL or NTSC).
  • TV out on today’s gaming cards reports 50 or 60 Hz.
  • When your render engine works in frame mode, the scanline converter chip on the card discards every even or odd row, depending on the current field status and order.
  • If you want to render in field mode, then you must interleave fields and deliver a constant 25 or 30 fps output. Either send each frame twice, or do careful timing in the app and call SwapBuffers every 20 or 16.666 ms.
  • Finally… when your app starts rendering, you can’t be sure whether it starts on an even or odd field. So… sometimes the user may need to push a “switch fields” button (if your app provides such a function).
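The field timing and the interleaving step from the list above can be sketched as follows. The function names are hypothetical, and the field layout (two half-height buffers with identical row size, even field holding rows 0, 2, 4, …) is an assumption for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Field period in milliseconds: 20 ms at 50 fields/s, about 16.67 ms at 60. */
double field_period_ms(int fields_per_second)
{
    return 1000.0 / (double)fields_per_second;
}

/* Weave two half-height fields into one full frame.
   even holds rows 0,2,4,... and odd holds rows 1,3,5,..., each with
   full_height / 2 rows of row_bytes bytes. full_height is assumed even. */
void weave_fields(unsigned char *frame, const unsigned char *even,
                  const unsigned char *odd, int full_height, size_t row_bytes)
{
    int y;

    for (y = 0; y < full_height; ++y) {
        const unsigned char *src = (y % 2 == 0) ? even : odd;
        memcpy(frame + (size_t)y * row_bytes,
               src + (size_t)(y / 2) * row_bytes, row_bytes);
    }
}
```

Which field ends up on which physical scanline still depends on the field order of the output, so a swap option remains useful.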

There is another option… render the interleaved frame in an offscreen buffer, apply an RGB -> YUV422 shader, grab the frame, and send it to the overlay mixer using DirectShow.
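For reference, the arithmetic such a conversion performs can be sketched on the CPU. This assumes the common integer approximation of the BT.601 studio-range matrix and the YUY2 byte order (Y0 U Y1 V); the thread specifies neither, and the function names are hypothetical.

```c
#include <stdint.h>

/* Integer approximation of BT.601 studio-range RGB -> YCbCr (assumed matrix).
   Inputs are 0..255; Y ends up in 16..235, U and V in 16..240. */
uint8_t rgb_to_y(int r, int g, int b)
{
    return (uint8_t)(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
}

/* The +32768 folds the usual +128 bias in before the shift, so the shifted
   value is never negative. */
uint8_t rgb_to_u(int r, int g, int b)
{
    return (uint8_t)((-38 * r - 74 * g + 112 * b + 128 + 32768) >> 8);
}

uint8_t rgb_to_v(int r, int g, int b)
{
    return (uint8_t)((112 * r - 94 * g - 18 * b + 128 + 32768) >> 8);
}

/* Packs two horizontally adjacent RGB pixels (rgb[0..2] and rgb[3..5]) into
   4 bytes of YUY2: Y0 U Y1 V. Chroma is taken from the average of both pixels. */
void rgb_pair_to_yuy2(const uint8_t rgb[6], uint8_t out[4])
{
    int r = (rgb[0] + rgb[3] + 1) / 2;
    int g = (rgb[1] + rgb[4] + 1) / 2;
    int b = (rgb[2] + rgb[5] + 1) / 2;

    out[0] = rgb_to_y(rgb[0], rgb[1], rgb[2]);
    out[1] = rgb_to_u(r, g, b);
    out[2] = rgb_to_y(rgb[3], rgb[4], rgb[5]);
    out[3] = rgb_to_v(r, g, b);
}
```

A fragment shader would do the same per pair of pixels, writing one packed output texel for every two input texels.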