Render to current (!) Texture on Nvidia

Martin Kraus
03-25-2005, 03:49 PM
Hi,
I'm currently doing some GPGPU stuff for which I would like to evaluate a separable kernel in the following fashion:
- evaluate the full kernel at one corner pixel of the image
- evaluate the rest of the first row by reading the previous result, subtracting the column that moved out of the kernel area, and adding the column that moved in
- proceed similarly for the other rows, but this time subtracting/adding rows...
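For reference, this is roughly what I mean as a CPU-side sketch (minimal C, made-up names, borders simply clamped). Note that every pixel after the first one reads the result just computed for a neighbouring pixel, which is exactly the dependency in question:

/* Sliding-window box sums over a single-channel w*h float image with a
 * (2r+1)x(2r+1) window. Borders are clamped to the edge for brevity. */
static float px(const float *img, int w, int h, int x, int y)
{
    if (x < 0) x = 0; if (x >= w) x = w - 1;   /* clamp to image edge */
    if (y < 0) y = 0; if (y >= h) y = h - 1;
    return img[y * w + x];
}

/* sum of the window column at x: pixels (x, y-r) .. (x, y+r) */
static float col_sum(const float *img, int w, int h, int x, int y, int r)
{
    float s = 0.0f;
    for (int dy = -r; dy <= r; ++dy) s += px(img, w, h, x, y + dy);
    return s;
}

/* sum of the window row at y: pixels (x-r, y) .. (x+r, y) */
static float row_sum(const float *img, int w, int h, int x, int y, int r)
{
    float s = 0.0f;
    for (int dx = -r; dx <= r; ++dx) s += px(img, w, h, x + dx, y);
    return s;
}

void box_sums(const float *img, float *out, int w, int h, int r)
{
    /* full kernel once, at the corner pixel */
    out[0] = 0.0f;
    for (int dx = -r; dx <= r; ++dx)
        out[0] += col_sum(img, w, h, dx, 0, r);

    /* rest of the first row: previous result, minus the column that
     * left the window, plus the column that entered it */
    for (int x = 1; x < w; ++x)
        out[x] = out[x - 1]
               - col_sum(img, w, h, x - r - 1, 0, r)
               + col_sum(img, w, h, x + r,     0, r);

    /* remaining rows: the result directly above, minus the row that
     * left the window, plus the row that entered it */
    for (int y = 1; y < h; ++y)
        for (int x = 0; x < w; ++x)
            out[y * w + x] = out[(y - 1) * w + x]
                           - row_sum(img, w, h, x, y - r - 1, r)
                           + row_sum(img, w, h, x, y + r,     r);
}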

Now, for this to work I'd need to read from the current render texture, and I'd need to make sure that the data I'm reading has already been written, so I would need to control which pixels are processed in parallel in the pixel shader and which are not.
I know that the results of what I'm trying to do are generally undefined, but I'd be very interested in any experiences with doing this or something similar, especially on NVIDIA hardware (6th generation preferably...).
To be precise: can I assume that pixels generated from the same primitive command (lines, tris, obviously not points) are the only ones that will be processed in parallel, and if not, is there any way to restrict parallelization?

Thanks in advance
Martin Kraus

P.S.: I know that I could just split the separable kernel into horizontal and vertical kernels and evaluate them sequentially, but that would mean doing two passes, something I would like to avoid.
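For clarity, the two-pass version would look roughly like this on the CPU (minimal C sketch, hypothetical names, edge-clamped borders). Each pass only reads the previous pass's buffer, never its own output, which is why it maps cleanly onto two render-to-texture passes:

/* Pass 1: horizontal 1-D sums into a temporary buffer.
 * Pass 2: vertical 1-D sums over that intermediate result. */
static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

void separable_box(const float *img, float *tmp, float *out,
                   int w, int h, int r)
{
    for (int y = 0; y < h; ++y)              /* pass 1: horizontal */
        for (int x = 0; x < w; ++x) {
            float s = 0.0f;
            for (int dx = -r; dx <= r; ++dx)
                s += img[y * w + clampi(x + dx, 0, w - 1)];
            tmp[y * w + x] = s;
        }

    for (int y = 0; y < h; ++y)              /* pass 2: vertical */
        for (int x = 0; x < w; ++x) {
            float s = 0.0f;
            for (int dy = -r; dy <= r; ++dy)
                s += tmp[clampi(y + dy, 0, h - 1) * w + x];
            out[y * w + x] = s;
        }
}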

Pete Warden
03-25-2005, 05:17 PM
I've experimented with similar ideas in the past. I have no inside knowledge of the hardware, but it appears that texture reads go through a cache.

This is a problem, because texture caches appear to only flush when they spot normal texture updating going on (uploading texture data, switching to a different texture, etc.). They don't flush between primitives, so there's no guarantee you'll read the data that the last primitive just wrote to the screen.

I have played with binding a different texture when I want to flush the cache. I never found a recipe that worked very well though.
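Roughly the kind of rebind I mean (illustrative only; assumes a current GL context, and as said above there's no guarantee it actually invalidates the texture cache):

#include <GL/gl.h>

/* 'dummy_tex' and 'render_tex' are hypothetical texture objects created
 * elsewhere; the idea is just to touch another texture between the draw
 * that writes the data and the draw that samples it. */
void try_flush_texture_cache(GLuint render_tex, GLuint dummy_tex)
{
    glBindTexture(GL_TEXTURE_2D, dummy_tex);   /* bind some other texture */
    glBindTexture(GL_TEXTURE_2D, render_tex);  /* rebind the one we sample from */
}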

It sounds like you're trying to port the fast software box blur algorithm? That was something I also looked at, but I ended up doing a repeated series of increasing-width 8x1 horizontal and vertical kernels to get a fast blur even for a large radius.

Pete

Korval
03-25-2005, 05:18 PM
Now, for this to work I'd need to read from the current render texture
Yeah, that's not going to be happening anytime soon. You're talking about either changing how a very fundamental piece of hardware functions, or relying on timing-based behavior that can differ even between different clock speeds of the same card family (GeForce 6600 vs. GeForce 6800), let alone different cards altogether.


To be precise: can I assume that pixels generated from the same primitive command (lines, tris, obviously not points) are the only ones that will be processed in parallel, and if not, is there any way to restrict parallelization?
It isn't (just) a question of parallelism; it's a question of hardware architecture and caching. Since the hardware is not designed to allow reading the current pixel through the texture unit, it is entirely possible, and quite likely, that if the pixel writes are cached (and they should be), the texture units do not have access to that cache. So you would need to flush the cache, an operation that probably doesn't happen even between primitives. In fact, the only operations I can think of that would force a cache flush are a buffer swap or binding a new destination buffer.

Martin Kraus
03-26-2005, 10:39 AM
Hi,
Hmm... it seems this is not feasible then. What I was looking for was a relatively lightweight operation to flush those caches, etc.
To be precise, I'm trying to accelerate an algorithm for creating disparity maps for stereo vision using a SAD or SSD kernel.
Seems like I'll have to go with the two-pass version then... :(
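For context, the matching cost I mean looks roughly like this as a brute-force CPU sketch (hypothetical names, single-channel float images, edge-clamped borders); the incremental window update described above is meant to avoid recomputing the whole window for every pixel and candidate disparity:

#include <float.h>
#include <math.h>

static float sample_px(const float *img, int w, int h, int x, int y)
{
    if (x < 0) x = 0; if (x >= w) x = w - 1;   /* clamp to image edge */
    if (y < 0) y = 0; if (y >= h) y = h - 1;
    return img[y * w + x];
}

/* For each pixel and each candidate disparity d, accumulate the sum of
 * absolute differences over a (2r+1)x(2r+1) window between the left
 * image and the right image shifted by d, and keep the cheapest d. */
void sad_disparity(const float *left, const float *right, float *disp,
                   int w, int h, int r, int max_d)
{
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float best_cost = FLT_MAX;
            int best_d = 0;
            for (int d = 0; d <= max_d; ++d) {
                float cost = 0.0f;
                for (int dy = -r; dy <= r; ++dy)
                    for (int dx = -r; dx <= r; ++dx)
                        cost += fabsf(sample_px(left,  w, h, x + dx,     y + dy)
                                    - sample_px(right, w, h, x + dx - d, y + dy));
                if (cost < best_cost) { best_cost = cost; best_d = d; }
            }
            disp[y * w + x] = (float)best_d;
        }
}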

simongreen
03-27-2005, 10:30 AM
I talked about this briefly at GDC this year. I wouldn't recommend depending on the behaviour of texturing from a buffer you're also rendering to. You can make it work by adding glFinish in the right places, but there's no guarantee this will work on future hardware.

You can get this to work reliably by ping-ponging between two buffers and copying back the changes, but the cost of this sometimes negates any benefit.
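As a rough illustration of the ping-pong-plus-copy idea (not from the slides; assumes a current GL context and a hypothetical draw_pass() that binds the source texture and draws a full-screen quad with the kernel shader):

#include <GL/gl.h>

extern void draw_pass(GLuint src_tex);   /* hypothetical: runs one kernel pass */

void run_passes(GLuint tex[2], int w, int h, int num_passes)
{
    int src = 0, dst = 1;
    for (int i = 0; i < num_passes; ++i) {
        draw_pass(tex[src]);                      /* read only from src */
        glBindTexture(GL_TEXTURE_2D, tex[dst]);
        glCopyTexSubImage2D(GL_TEXTURE_2D, 0,     /* copy the framebuffer */
                            0, 0, 0, 0, w, h);    /* result into dst */
        int t = src; src = dst; dst = t;          /* swap roles for next pass */
    }
}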

http://download.nvidia.com/developer/presentations/2005/GDC/OpenGL_Day/OpenGL_Image_Processing_Tricks.pdf