I am working on a volume renderer using view-aligned slices, 3D textures, and a fragment program. It works quite nicely, and I am now trying to optimize it, but it isn't clear to me how much the various operations cost.
Some ideas I've had:
I could decrease the number of slices and use the fragment program to take multiple texture samples along the viewing vector. This would reduce the number of fragments and polygons without changing the number of texture fetches, but would make the fragment program more complex.
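To check that taking several samples per slice leaves the image unchanged, here is a minimal CPU emulation of one pixel. It assumes premultiplied-alpha output blended with `glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA)`, and samples given as non-premultiplied grayscale `(color, alpha)` pairs ordered front to back; the names `over`, `fragment_program` and `render` are hypothetical. Because the over operator is associative, compositing k samples inside the fragment program and then blending the result should give the same pixel (up to rounding) as drawing k times as many slices. The fragment count drops by a factor of k, while the texture fetches (fragments × k) stay the same.

```python
# One pixel: samples along the view ray, index 0 = nearest to the viewer.
# Each sample is (color, alpha), NOT premultiplied; grayscale for brevity.

def over(dst, src):
    """Blend premultiplied src in front of premultiplied dst,
    like glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA)."""
    dc, da = dst
    sc, sa = src
    return (sc + (1.0 - sa) * dc, sa + (1.0 - sa) * da)

def fragment_program(slice_samples):
    """Hypothetical multi-sample fragment program: composites its k samples
    back to front and outputs one premultiplied (color, alpha)."""
    acc = (0.0, 0.0)
    for c, a in reversed(slice_samples):
        acc = over(acc, (c * a, a))
    return acc

def render(samples, k):
    """Draw len(samples) // k slices back to front, each fragment taking k samples.
    Returns the final premultiplied (color, alpha) and the number of fragments shaded."""
    assert len(samples) % k == 0
    framebuffer = (0.0, 0.0)  # cleared to transparent black
    fragments = 0
    # Slice starting at `start` covers samples[start:start + k]; farthest slice first.
    for start in reversed(range(0, len(samples), k)):
        framebuffer = over(framebuffer, fragment_program(samples[start:start + k]))
        fragments += 1
    return framebuffer, fragments
```

For example, with `samples = [(0.9, 0.2), (0.5, 0.4), (0.1, 0.3), (0.7, 0.5)]`, `render(samples, 1)` shades 4 fragments and `render(samples, 2)` shades 2, with the same color and alpha up to floating-point rounding.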
I could break up the volume into smaller cubes and vary the slice frequency per sub-volume, so that I only work hard near edges, where slice aliasing becomes obvious.
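One thing the sub-volume idea needs is opacity correction. If the transfer-function alpha was chosen for a reference sample spacing, a block sliced more coarsely will look too transparent (and a more finely sliced block too opaque) unless alpha is rescaled with the standard correction alpha' = 1 - (1 - alpha)^(spacing / ref_spacing). The sketch below assumes the transfer function returns non-premultiplied color, so only alpha needs adjusting; `corrected_alpha` and `segment_opacity` are illustrative names.

```python
def corrected_alpha(alpha, ref_spacing, spacing):
    """Rescale a per-sample opacity defined for samples `ref_spacing` apart
    so that samples `spacing` apart accumulate the same opacity per unit length."""
    return 1.0 - (1.0 - alpha) ** (spacing / ref_spacing)

def segment_opacity(alpha, n):
    """Opacity accumulated by n identical samples composited with 'over'."""
    return 1.0 - (1.0 - alpha) ** n
```

For example, a block sliced at twice the reference spacing would use `corrected_alpha(alpha, 1.0, 2.0)`, which matches `segment_opacity(alpha, 2)`, the opacity of two samples at the reference spacing.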
I could render once in near-to-far order with a low slice frequency, drawing only nearly opaque fragments, and then render again in far-to-near order with full alpha blending. The idea is that the depth buffer would let me avoid running the fragment program on fragments that will never be seen.
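The two-pass idea can be emulated for a single pixel to count how many fragment-program invocations it might save. This is a sketch under stated assumptions: sample i sits at depth i, pass 1 uses every `stride`-th sample with an alpha test at `threshold` and a LESS depth test, pass 2 uses all samples back to front with the same depth test and depth writes off, and a fragment that fails the depth test is assumed never to reach the fragment program. That last assumption holds only if the hardware performs the depth test before shading; on many GPUs this early rejection is disabled when, for example, the fragment program writes depth. `two_pass` and `composite_all` are hypothetical names.

```python
import math

def over(dst, src):
    """Premultiplied src in front of premultiplied dst (GL_ONE, GL_ONE_MINUS_SRC_ALPHA)."""
    dc, da = dst
    sc, sa = src
    return (sc + (1.0 - sa) * dc, sa + (1.0 - sa) * da)

def composite_all(samples):
    """Reference: every sample shaded and blended back to front."""
    fb = (0.0, 0.0)
    for c, a in reversed(samples):
        fb = over(fb, (c * a, a))
    return fb, len(samples)

def two_pass(samples, stride, threshold):
    """Returns the final premultiplied (color, alpha) and the fragment-program invocations."""
    depth = math.inf   # depth buffer cleared to "far"
    fb = (0.0, 0.0)    # framebuffer cleared to transparent black
    shaded = 0
    # Pass 1: coarse slices, near to far; alpha test keeps only nearly opaque fragments.
    for i in range(0, len(samples), stride):
        if not i < depth:      # fails depth test LESS: assumed rejected before shading
            continue
        shaded += 1
        c, a = samples[i]
        if a >= threshold:     # passes alpha test: writes color and depth
            fb = (c * a, a)
            depth = i
    # Pass 2: every sample, far to near, blended; depth test on, depth writes off.
    for i in reversed(range(len(samples))):
        if not i < depth:
            continue
        shaded += 1
        c, a = samples[i]
        fb = over(fb, (c * a, a))
    return fb, shaded
```

For example, with `samples = [(0.2, 0.1), (0.4, 0.2), (0.9, 0.99), (0.3, 0.5), (0.6, 0.5), (0.8, 0.5)]`, `two_pass(samples, 2, 0.95)` shades 4 fragments instead of the 6 shaded by `composite_all`, and the result differs only by the small contribution of the samples hidden behind the nearly opaque one. When no fragment passes the alpha test, pass 1 is pure overhead and pass 2 reproduces full compositing.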
I think there may be other ways to get more out of the parallelism of the texture hardware, but I am unclear about what's possible.



