Order in which fragments are processed (in a fragment program)

Does anyone know the order in which fragments are processed by a GL_NV_fragment_program? I ask because I want to use a value calculated in the fragment program in a subsequent equation in the same program, not for the same fragment but for a neighbour. I would like to do everything in one pass, rather than save the intermediate result to a texture and then do a second pass to get access to the offset fragments. Any recommendations?

As a related question: I am using the NV30 software emulator to run my OpenGL code, because I don't have an FX card; my GeForce3 does not support fragment programs. Can anyone tell me what the frame-rate penalty will be if I use two passes instead of one, assuming all else is equal and I simply split my fragment program instructions across two programs? In other words, will my program run twice as slow because I am doing two passes instead of one?

Thanks.

Fragments are processed individually. All registers are reset to 0 when the fragment program starts. There’s no way to communicate results to neighbouring fragments in the same pass. Sorry.
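
The usual workaround is to get the first pass's result into a texture; in the second pass the fragment program can then sample it at any offset you like. A rough sketch of what the second-pass program could look like (a snippet, not a complete program; assumes the usual GL headers and that the extension entry points are already set up, and that pass 1's result is bound to unit 0 as a 256x256 2D texture, so one texel step is 1/256):

    // Second-pass fragment program: fetch the right-hand neighbour's
    // result from the texture written in pass 1.
    static const char *fp =
        "!!FP1.0\n"
        "ADD R0, f[TEX0], {0.00390625, 0.0, 0.0, 0.0};\n"  // texcoord + one texel in x
        "TEX R1, R0, TEX0, 2D;\n"                          // sample pass-1 result there
        "MOV o[COLR], R1;\n"
        "END\n";

    GLuint prog;
    glGenProgramsNV(1, &prog);
    glBindProgramNV(GL_FRAGMENT_PROGRAM_NV, prog);
    glLoadProgramNV(GL_FRAGMENT_PROGRAM_NV, prog,
                    (GLsizei)strlen(fp), (const GLubyte *)fp);
    glEnable(GL_FRAGMENT_PROGRAM_NV);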

Originally posted by al_bob:
Fragments are processed individually. All registers are reset to 0 when the fragment program starts. There’s no way to communicate results to neighbouring fragments in the same pass. Sorry.

I guess it was wishful thinking to hope that it starts processing fragments at the bottom left, frag[0,0], then proceeds left to right (frag[1,0], frag[2,0]…frag[width,0]), then starts on the next row with frag[0,1], and so on. If I knew this, I could do some counting in my fragment program to keep track of the neighbours of interest.

Rendering today is so fast because it’s massively parallel. There is no “this one first, then that one”. There is no constant number of pixels in flight at the same time and there is no constant ordering between parallel pixel batches that could be exposed.
That’s a good thing.

Originally posted by zeckensack:
Rendering today is so fast because it’s massively parallel. There is no “this one first, then that one”. There is no constant number of pixels in flight at the same time and there is no constant ordering between parallel pixel batches that could be exposed.
That’s a good thing.

Should I just render my first pass to the frame buffer, copy it to a texture, and then do my second pass? Is there any faster way? In general, if I do multiple passes before I display, is it best to render to the frame buffer and copy to a texture?
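
In code, what I have in mind is roughly this (drawFirstPass, drawSecondPass, and intermediateTex are placeholders for my own code):

    // Pass 1: render the intermediate result into the back buffer.
    drawFirstPass();

    // Copy it into a pre-allocated texture.  glCopyTexSubImage2D reuses
    // the texture's existing storage instead of reallocating it each frame.
    glBindTexture(GL_TEXTURE_2D, intermediateTex);
    glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, width, height);

    // Pass 2: draw a full-screen quad with the second fragment program,
    // sampling intermediateTex to reach the neighbouring fragments.
    drawSecondPass();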

Thanks.

You could render to a floating-point PBuffer and use that as a texture for the second pass.
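
The WGL setup is a bit verbose. Very roughly, it looks like this (fixed-point RGBA shown; for float targets on NV30 you'd use the NV_float_buffer tokens and a texture-rectangle target instead; assumes the ARB pbuffer/render-texture entry points were already fetched with wglGetProcAddress, and drawFirstPass/drawSecondPass are placeholders):

    // Find a pixel format that can render to a pbuffer and be bound
    // as an RGBA texture.
    int fmtAttribs[] = {
        WGL_DRAW_TO_PBUFFER_ARB,      GL_TRUE,
        WGL_BIND_TO_TEXTURE_RGBA_ARB, GL_TRUE,
        0
    };
    int format; UINT numFormats;
    wglChoosePixelFormatARB(hDC, fmtAttribs, NULL, 1, &format, &numFormats);

    // Create the pbuffer and give it its own DC and context.
    int pbAttribs[] = {
        WGL_TEXTURE_FORMAT_ARB, WGL_TEXTURE_RGBA_ARB,
        WGL_TEXTURE_TARGET_ARB, WGL_TEXTURE_2D_ARB,
        0
    };
    HPBUFFERARB pbuf = wglCreatePbufferARB(hDC, format, 256, 256, pbAttribs);
    HDC   pbufDC = wglGetPbufferDCARB(pbuf);
    HGLRC pbufRC = wglCreateContext(pbufDC);
    wglShareLists(hRC, pbufRC);   // share textures/programs between contexts

    // Pass 1: render the intermediate result into the pbuffer.
    wglMakeCurrent(pbufDC, pbufRC);
    drawFirstPass();

    // Pass 2: back on the window's context, bind the pbuffer's colour
    // buffer as the current texture's image and render with it.
    wglMakeCurrent(hDC, hRC);
    glBindTexture(GL_TEXTURE_2D, pbufferTex);
    wglBindTexImageARB(pbuf, WGL_FRONT_LEFT_ARB);
    drawSecondPass();
    wglReleaseTexImageARB(pbuf, WGL_FRONT_LEFT_ARB);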

Originally posted by al_bob:
You could render to a floating-point PBuffer and use that as a texture for the second pass.

Can you please point me to documentation on how to use PBuffers?

Originally posted by sek:
Can you please point me to documentation on how to use PBuffers?

Here’s ATI’s paper on it:
http://www.ati.com/developer/ATIpbuffer.pdf

NVidia probably has their own on their website, too.

Mark Harris has a really nice C++ class that takes the pain out of using pbuffers:

http://www.cs.unc.edu/~harrism/misc/rendertexture.html
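
From memory of its README, usage looks something like this (check the download for the exact interface; the mode string picks the format and texture target):

    RenderTexture rt("rgba tex2D");   // mode string from memory; see the README
    rt.Initialize(256, 256);

    rt.BeginCapture();
    // ... render pass 1; output goes into the texture ...
    rt.EndCapture();

    rt.Bind();
    rt.EnableTextureTarget();
    // ... render pass 2, sampling the pass-1 result ...
    rt.DisableTextureTarget();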