Fragments for all

1. It would be great if we could change a fragment’s location by explicitly specifying new screen coordinates, or an offset relative to its original location:

relocateFrag(int x, int y)

relocateFragRelative(int dx, int dy)

2. It would be great if we could control the size of a fragment by specifying a scale/zoom factor in the fragment shader:

scaleFrag(float scaleFactor, int scaleMode, int antialiasingMode)

scaleMode specifies the shape of the pixels generated (circle, square, triangle…)

antialiasingMode specifies how the generated pixels are to be blended.

3. It would be great if we could read a fragment’s color at a given location in the fragment shader (see the sketch after the list):

readFrag()
readFrag(int x, int y)
readFragRelative(int dx, int dy)
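
To make the intent concrete, here is a purely illustrative sketch. None of these functions exist in GLSL; the names and signatures are just the ones proposed above, and the integer mode arguments are made up.

```glsl
// Illustrative only: relocateFragRelative, scaleFrag and readFragRelative do
// not exist in GLSL; they are the hypothetical functions proposed above.
#version 420
out vec4 fragColor;

void main()
{
    // 3) sample the already-shaded fragment one pixel to the left...
    vec4 left = readFragRelative(-1, 0);

    // ...and blend it with this fragment's own colour
    fragColor = 0.5 * (left + vec4(1.0, 0.0, 0.0, 1.0));

    // 2) blow this fragment up into a 3x3 round, blended splat
    scaleFrag(3.0, 0 /* circle */, 1 /* blended */);

    // 1) then shift the result 5 pixels right and 5 pixels down
    relocateFragRelative(5, 5);
}
```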

OpenGL 5.0, which is coming soon? :smiley:

You might be able to get this now on GL 4.2 with atomics for locking and image read/write. I guess the performance will not be great, and that might also be the reason why it’s not exposed in such simple functions as you suggest: the hardware assumes perfect parallelism of the fragments and gets a lot of its performance by exploiting this. Your ideas destroy that parallelism and introduce special problems (like fragment A reading fragment B and vice versa: you can’t order that…). Whenever you kill the parallelism (which you can already do in 4.2), you will get performance problems.
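
Roughly what I have in mind, as an untested sketch (all names below are mine), with the colour target bound to image unit 0 instead of being a regular attachment:

```glsl
// Rough, untested GL 4.2 sketch: the colour target is bound as image unit 0
// rather than a normal attachment, so the shader can read and scatter-write
// arbitrary pixels. Ordering between overlapping fragments is still undefined
// without extra synchronisation (image atomics / glMemoryBarrier), which is
// exactly the "fragment A reads fragment B and vice versa" problem.
#version 420

layout(binding = 0, rgba8) coherent uniform image2D color_buf;

void main()
{
    ivec2 here = ivec2(gl_FragCoord.xy);

    // ~ readFragRelative(0, -1): fetch whatever is currently one pixel below
    vec4 below = imageLoad(color_buf, here + ivec2(0, -1));

    // ~ relocateFragRelative(10, 0): write this fragment's result ten pixels
    // to the right of its own position
    imageStore(color_buf, here + ivec2(10, 0), below);
}
```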

That’s why we make suggestions here :) for things that are not in yet because of hardware limitations or performance issues. The idea is to lead hardware by designing for the future and being proactive, before someone else brings this feature into their API and claims victory, and OpenGL then has to follow the same path years later. :slight_smile:

I say add it in, let it affect performance, but give the user the option until the hardware is capable. Nobody is forced to use it. However, if implementing it is going to be problematic, which I doubt, then emulate it!
Don’t you guys think that several features already added hurt performance even though that’s how the hardware works? VBOs are one example. My point is that performance can always suffer from driver bugs or bad coding.
My belief is that excusing a feature from being implemented now because of “some hardware issues” is pure laziness on the implementer’s part.

> The idea is to lead hardware by designing for the future and being proactive

That never works. GLSL tried to design for the future and be proactive, but it sacrificed the present, screwing up OpenGL for years.

Just look at quad-buffer stereo. OpenGL has had the possibility for this from day 1, and yet neither NVIDIA nor AMD will expose it outside of high-end workstation cards. Thus, stereoscopic 3D relies on a number of driver-side hacks, where they figure out what your perspective matrix is and so forth.

Predicting the future only ends in tears. Especially when you don’t control the future.

> Don’t you guys think that several features already added hurt performance even though that’s how the hardware works? VBOs are one example.

… this is, I believe, the second time you’ve stated that buffer objects lower performance, while providing no evidence whatsoever for this. If you’re talking about streaming, or static buffers vs. NVIDIA’s display lists, that’s one thing. But since you patently refuse to qualify your statements, they just come off as so much nonsense.

Janika, killing parallelism will not only slow down current hardware, but also future hardware. If you give it up, you’ll end up with a bunch of independent cores that take up more die space than the ones we have now. So basically you would have a multicore CPU with hardware texture units and similar fixed-function parts: much more flexible, but also slower, because the die space you sacrifice for that flexibility would otherwise be used for more cores to speed up parallel work. To justify such a sacrifice we would need strong use cases that can’t be emulated on current hardware and that also don’t fit multicore CPUs.

> If you’re talking about streaming, or static buffers vs. NVIDIA’s display lists, that’s one thing. But since you patently refuse to qualify your statements, they just come off as so much nonsense.

Dynamic buffers vs. what I get from glBegin/glEnd immediate mode. It could be my video card, a Mobility Radeon X1600, but I assumed it’s not the card but the OpenGL drivers, since I had no problem running the same demo code with Direct3D 9. I had to do some “tricks” to match the performance of the old immediate-mode path. In particular, mapping/unmapping the buffer was what killed the frame rate.

menzel, I cannot picture the problem from a hardware designer’s perspective. If there’s a reference on how video cards work and how features map to silicon, that would be awesome. Otherwise I still see it as doable without having to break the current hardware’s parallelism.

2x2 pixels get grouped together during rasterization. This is needed to calculate dFdx/dFdy for texture coordinates (for mipmap selection and anisotropy). Drawing 1x1 points may therefore use only 1/4 of the available resources, or the hardware groups four 1x1 points together while calculating and then overburdens your ROPs to handle the scatter-write (or runs the ROP stage at 1/4 rate). Fortunately 1x1 points are rarely useful for visualization, so no one feels or minds the slowdown.
Primitives get bounding boxes, through which rasterization can often be skipped altogether if the Hi-Z/z-cull test fails. Scattering your fragments around disables this.
GPUs are starting to get multiple rasterizers, so moving a fragment outside its expected position could end up sending it to a place that the current rasterizer shouldn’t touch.
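
To make the first point concrete, a minimal sketch (`tex` and `uv` are placeholder names): both the explicit derivatives and the implicit mipmap/LOD selection inside texture() are differences between neighbouring fragments of the same 2x2 quad.

```glsl
// Minimal illustration of the 2x2 quad requirement: dFdx/dFdy and the implicit
// LOD selection inside texture() are computed as differences between
// neighbouring fragments of the same quad. A fragment moved to an arbitrary
// position would have no well-defined neighbours for this.
#version 420

layout(binding = 0) uniform sampler2D tex;  // placeholder sampler
in vec2 uv;                                 // placeholder interpolated coords
out vec4 fragColor;

void main()
{
    vec2 duvdx = dFdx(uv);  // difference to the horizontal quad neighbour
    vec2 duvdy = dFdy(uv);  // difference to the vertical quad neighbour

    // visualise the derivative magnitude on top of the normal texture lookup;
    // texture() itself derives its mipmap level / anisotropy from the same deltas
    fragColor = texture(tex, uv) + vec4(length(duvdx) + length(duvdy));
}
```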

Btw, I originally wanted something to let us use the rasterizer to generate geometry and stuff, but it can all be implemented with image load/store and atomic counters in a fragment shader nowadays, with not too much penalty. The idea of moving a fragment to arbitrary coordinates is a tiny subset of this. Still just as suboptimal, but at least it doesn’t involve adding more gates for functionality that other existing functionality can already emulate. HW vendors can then focus on improving the performance of the existing functionality rather than trying to add and optimise a rarely useful effect of it :slight_smile:
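
For example, something along these lines (a rough sketch, all names are mine; the atomic counter buffer and the buffer texture have to be created and sized for the worst case on the application side):

```glsl
// Rough sketch of "using the rasterizer to generate data" with GL 4.2 features:
// every covered fragment appends its window position to a buffer texture via an
// atomic counter instead of writing to the framebuffer. The appended points can
// then be drawn or processed in a later pass. All names here are made up.
#version 420

layout(binding = 0, offset = 0) uniform atomic_uint fragCount;
layout(binding = 0, rg32f) writeonly uniform imageBuffer fragPositions;

void main()
{
    // grab a unique slot in the output buffer...
    uint slot = atomicCounterIncrement(fragCount);

    // ...and scatter this fragment's window position into it
    imageStore(fragPositions, int(slot), vec4(gl_FragCoord.xy, 0.0, 0.0));
}
```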

Yeah, I see, that makes sense. Thanks a lot for all your feedback.