Programmability... the double-edged sword!!



Devulon
05-06-2002, 07:30 AM
After reading over the article (check the opengl.org home page) on the P10 and OpenGL 2.0 programs I am getting really concerned. I am a fan of low-level programming, primarily because of the control it gives you. No compiler or processor out there will ever be able to optimize better than you can. The higher the level of the programming language (and the larger the total instruction count), the harder it becomes for a compiler to generate really fast code. With x86, simple branch prediction has become a problem: while most of the time the prediction is OK, you can force cases where it will always come up wrong. With OpenGL 2.0 having something like 200 instructions, I see some real issues. The P10 is supposed to be all about parallelism, but with so many instructions I promise you can write a program (especially with loops and conditionals) that will severely stall that thing. Dependency between registers and operations is a serious problem. In lower-level code you can be very careful about these issues, but the higher you go the more you are at the mercy of the compiler.

The second thing that I don't like is all the damn transistors. All the different instructions have eaten a lot of real estate, although I know the 256-bit memory interface definitely ate most of it. I think I would have preferred the current level of vertex programs with, say, a dozen more instructions and the ability to pass arbitrary amounts of data in and out, over the enlarged instruction set. For example, I would like to be able to pass NURBS control points in and send tris out, which is currently not possible with NV programs or ATI's. It's now possible with 2.0, but at the same time I think I would have simply enjoyed having a couple of extra texture units and a few fewer instructions.

Full programmability sounds great on paper, but inside the silicon it gets messy and the potential for failure is as great as the potential for success. Programming is a balancing act best left to the programmer not the video card. I want to see the assembly that thing generates. I want the opcodes. And I want to write programs in hex. Then and only then will I feel like I am in control.

All I am saying is this: watch it, guys. OpenGL 2.0 may come back to bite you. You better start the super advanced forum right now. You're gonna need it.

Devulon

davepermen
05-06-2002, 07:46 AM
always seeing everything so black..

there is no standard x86-style instruction set for GPUs yet, but there is the intermediate language.. good GPUs will support this directly, and that way you will get direct assembler programmability..

and parallel programming was never easy (see the PS2)

take it easy..

oh, and compilers can be DAMN GOOD! :)

Devulon
05-06-2002, 09:00 AM
There are good compilers, I will give you that. But I would like to say that the number of good ones is a lot less than the number of bad ones. And of course, define good. Visual Studio (Microsoft's C++ compiler) is quite good. Very good, in fact. But then again the Intel compiler rips the crap out of it. But then again, who can afford Intel's compiler?

As to the parallelism, it's never easy, that's my whole point. This is something that becomes implementation dependent. I imagine that the same code will run at different speeds on different video cards with different drivers. The same way the exact same C code generates different assembly with different compilers on different platforms.

OpenGL 2.0 is a big leap. I would have preferred a big step. With a leap you really gotta watch your landing.

Devulon

LordKronos
05-06-2002, 09:38 AM
Originally posted by Devulon:
the number of good ones is a lot less than the number of bad ones

A statistic that's generally true of everything in life. The number of programmers that can out-optimize even a modestly decent compiler is a lot less than the number of programmers that would be put to shame by it.

dorbie
05-06-2002, 11:08 AM
You are used to low-level programming where you have a single target instruction set; with shader graphics that will not be true. The advantage of the language which you are missing here is the hardware abstraction it provides. The hardware manufacturers have different design philosophies, and if you get what you wish for, you will have to hand-tune your code for a plethora of graphics 'instruction set' platforms (at least the 2 or 3 most popular). I don't think there is a good option here. I like competition that stymies NVIDIA's plans to charge gamers $700+ for a video card, but I don't like that NVIDIA and ATI can't agree on extensions. Maybe the price of freedom is a bit of opacity beneath a shader compiler.

[This message has been edited by dorbie (edited 05-06-2002).]

Devulon
05-06-2002, 11:25 AM
Originally posted by dorbie:
You are used to low-level programming where you have a single target instruction set; with shader graphics that will not be true. The advantage of the language which you are missing here is the hardware abstraction it provides...

Anyone ever hear of DirectX? Its abstraction is so nice that the number of tris you can push with it sucks compared to OpenGL. There is a price to pay for hardware abstraction. I would like them to agree on an instruction set, not on a language. Add is add, I really don't care what you call it. But c = a + b; where a, b, c are vectors is not just addition. It's a sequence of operations: the obvious additions and a store, as well as the potential reads from memory/registers. It's a lot more complicated and hence there are a lot of variations on what actually occurs. The end result must always be the same, but the actual work the processor does can be quite different. I want to know what it does. That's what allows you to make stuff fast. add eax, eax is very explicit as to what I want the processor to do, but eax = eax + eax; clearly isn't. That abstraction will cost you.
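To make the point concrete, here is a rough sketch in plain C++ of everything that one-liner hides (just an illustration, not any particular GPU instruction set):

struct Vec4 { float x, y, z, w; };

Vec4 add(const Vec4& a, const Vec4& b)
{
    // What "c = a + b" hides: four loads per operand, four adds and four
    // stores, plus whatever register allocation and scheduling the
    // compiler decides on.
    Vec4 c;
    c.x = a.x + b.x;
    c.y = a.y + b.y;
    c.z = a.z + b.z;
    c.w = a.w + b.w;
    return c;
}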

Devulon

davepermen
05-06-2002, 11:43 AM
well, actually it depends on the hardware. hardware designed for the DX abstraction works well and fast.. NVIDIA's hardware is not that well designed for DX (but NVIDIA designed about 90% of DX, so I think this is rather stupid ;)) so they are not that fast on DX..

but hey?

you say that on every other piece of hardware we'll get a different speed?
well.. if that is not the case now, why is the whole world comparing fps in Quake3? ;)

I don't see your point.. it's not more or less abstract than before.. it's just different, cause now you get programmability you didn't have before..

and if you want to use extensions you have to abstract them in some way to support other hardware..

if you don't use extensions in GL there's no way you'll get the most out of your GPU


and the Intel compiler made my code slower than VC6! and VC7 is even faster than VC6.. so who needs that crappy compiler? ;)

(I would love to try VectorC some time.. but oh well.. money.. ;))

Lev
05-06-2002, 11:49 AM
I will gladly accept a performance loss for hardware abstraction and I think many developers will also. Low level programming is terrible in terms of productivity, and this is what business is about. Surely low level programmed code can be fast, but a wisely chosen algorithm hidden behind a high-level interface can be equally fast.

Low level programming at "standard" application level is a disease that should be fought.

Just my 0.02

-Lev

Jurjen Katsman
05-06-2002, 11:53 AM
Where is the claim that DX pushes fewer tris on NVIDIA hardware than GL coming from?

I would say this is entirely untrue. Actually using VAR in GL or using DX is pretty much exactly the same thing.

There are a tiny number of things that can be done directly in register combiners that can't be expressed very well in pixel shaders (because of abstraction/having to work on other hardware), but your comment makes it sound like that's not what you are talking about.

knackered
05-06-2002, 12:10 PM
I agree, locking vertex buffers leaves it open to the driver as to how it treats that vertex data - in the case of NVIDIA hardware, the driver probably copies the vertex data to AGP memory anyway, until the buffer is unlocked.
All in all - and this probably isn't relevant to this topic - I think OpenGL is becoming as messy as really early versions of Direct3D... extensions are MESSY, and are getting messier.

zed
05-06-2002, 12:36 PM
>>Where is the claim that DX pushes fewer tris on NVIDIA hardware than GL coming from?<<

I thought this was due to the overhead of COM for D3D.

>>I would say this is entirely untrue. Actually using VAR in GL or using DX is pretty much exactly the same thing.<<

funny how the NVIDIA demo meant to show off their own cards (BenMark5) is about 10-15% faster when you convert the D3D code to OpenGL code.
why?
I'm no hardware freak (I find it all very boring :)) but what actually causes the speed difference between OpenGL and D3D? is it, like I wrote above, the overhead of COM? or is it something else?

Korval
05-06-2002, 01:37 PM
I thought this was due to the overhead of COM for D3D.

D3D COM is not particularly slow. The function call overhead is likely to be an additional bit of function-pointer indirection, which happens to be the same as any OpenGL call (since the ICD is done as a dll, there's always a function-pointer indirection involved). D3D is done in-proc as a dll, so calls into it aren't particularly expensive. Certainly, I can't imagine the call overhead is enough to matter next to the actual hardware bottlenecks when the hardware is pushed.
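To illustrate the point, here is a minimal C++ sketch with made-up interfaces (not the real D3D or ICD headers): a COM-style call goes through the object's vtable, a GL call goes through the driver's dispatch table - either way it boils down to one indirect call.

#include <cstdio>

// COM-style: the call is resolved through the object's vtable
// (one pointer chase, then an indirect call).
struct IDevice {
    virtual void DrawPrimitive(int count) = 0;
    virtual ~IDevice() {}
};

struct Device : IDevice {
    void DrawPrimitive(int count) override { std::printf("device draws %d\n", count); }
};

// ICD-style: the API entry point forwards through a function pointer
// looked up in the driver's dispatch table.
struct DispatchTable {
    void (*DrawArrays)(int count);
};

DispatchTable g_icd = { [](int count) { std::printf("driver draws %d\n", count); } };

void glDrawArraysStub(int count) { g_icd.DrawArrays(count); }

int main()
{
    Device dev;
    IDevice* d3d = &dev;     // in real code this pointer comes from the runtime
    d3d->DrawPrimitive(3);   // vtable load + indirect call
    glDrawArraysStub(3);     // dispatch-table load + indirect call
}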

Yes, VAR is probably a bit faster than D3D, even with locked buffers (since you are managing the memory yourself).

Korval
05-06-2002, 01:54 PM
Programming is a balancing act best left to the programmer not the video card.

That's a pretty arrogant statement. Who's to say that the card doesn't know more about itself than you? Who's to say that you will be allowed to know enough about the hardware to even come close to beating their optimized version of the code? Remember, the actual functionality of their hardware is proprietary, as it should be. The general public, even the developers, will not be allowed access to the meat of their hardware. That's why they have good abstractions.


I want to see the assembly that thing generates. I want the opcodes. And I want to write programs in hex. Then and only then will I feel like I am in control.

Why do you want opcodes? You don't need them to do what you want. And your statement that the abstraction will cause an inevitable slowdown from the peak performance you could squeeze out if only you knew the internals of the hardware is born of a lack of trust. You aren't willing to trust programmers who have direct access to the hardware designers to write decent compilers.

It's not like vertex programs on nVidia cards aren't compiled into some form of microcode or something. You didn't think that it went through a simple assembler, did you?

Don't fear abstractions simply because you no longer have access to the bare hardware (which you shouldn't have to touch anyway). Instead, embrace them, for they give you access to a wide variety of hardware.

MikeC
05-06-2002, 02:08 PM
Originally posted by Korval:
D3D COM is not particularly slow. The function call overhead is likely to be an additional bit of function-pointer indirection, which happens to be the same as any OpenGL call (since the ICD is done as a dll, there's always a function-pointer indirection involved).

Nah. The cost of one layer of pointer indirection is pretty insignificant in the context of a function call. Matt Craighead has stated (in response to an earlier question along similar lines) that D3D's higher function call overhead is because DX runs in kernel mode. GL runs in userland, same as your app, so there's no expensive mode switch involved.

dorbie
05-06-2002, 02:18 PM
Devulon, there is nothing equivalent to shader compilation going on in D3D. This is a means to a single codebase which exploits hardware features. Don't confuse the two just because I used the phrase hardware abstraction. You should be aware that OpenGL 1.x offers hardware abstraction to the developer while exploiting the hardware acceleration of the platform. The fact that OpenGL offers hardware abstraction is a "GOOD THING", not a bad thing. It is inherently desirable and, when well designed, need not be a performance killer. To say that hardware abstraction in OpenGL 2.0 is bad simply because D3D also offers it is totally misguided. The PRIMARY OBJECTIVE of an API like OpenGL 1.x is hardware abstraction.

As for them agreeing, see the ARB notes. There are several problems beyond the competition and ideological differences; there is little understanding that for many developers, if it's not core it is irrelevant. Interesting features are too easily shunted into a marginalized extension for short-term gain by most ARB members.

The subtext is that a graphics architecture takes 2-3 years to implement, during which the parties try to jockey their extensions into a favourable position (either with M$ or through the ARB), while they are still working out how to expose unreleased functionality through an API. They look for common ground, but they have divergent hardware and don't want to tell the competition what they are up to, or even let on that they know what the competition is up to.

I think OpenGL 2.0 in the future, and some watered-down, lowest-common-denominator OpenGL 1.4 shader in the meantime, is the best we can expect.

The D3D way is worse of course, leaving all but one hardware developer to bang a square peg into a round hole or simply drop the feature.


[This message has been edited by dorbie (edited 05-06-2002).]

dorbie
05-06-2002, 02:53 PM
P.S. This is a best case scenario, the whole higher level shader thing could turn into a messy competition for developer hearts and minds. I'm not sure all the IHVs have bought into 3DLabs' spec. There are even worse things which could happen but I don't want to give you nightmares so I won't reveal what a certain large predatory monopolist may have to wield as a cudgel.

Won
05-06-2002, 02:56 PM
I would much rather deal with graphics at a higher level of abstraction than vertex program instructions, thank you very much. The cost of abstraction is extremely low, and it can buy you quite a bit. And there are situations where the compiler can do as good a job as a human (and sometimes better, depending on the human). Deal with it, because you have plenty of better things to worry about in the grand scheme of things (graphics and otherwise).

Extracting parallelism is difficult in the general case, yes. However, these shading languages encourage data-parallel constructs -- each vertex and fragment and pixel is operated upon pretty uniformly and independently. Extracting parallelism in this case is quite simple because you pretty much know the structure of the problem even before you compile. Consider the architecture of the new 3Dlabs chip.

-Won

V-man
05-06-2002, 03:04 PM
Originally posted by MikeC:
Nah. The cost of one layer of pointer indirection is pretty insignificant in the context of a function call. Matt Craighead has stated (in response to an earlier question along similar lines) that D3D's higher function call overhead is because DX runs in kernel mode. GL runs in userland, same as your app, so there's no expensive mode switch involved.

I heard that DX runs in kernel mode on NT and that's one of the reasons MS has hesitated to update DX for NT. The other probably being that NT is not popular among everyday users. I assumed that DX was in user mode on all the other versions of Windows.

Anyway, won't a switch occur between your app and the driver at some point anyway? So DX and GL must be on equal footing.

V-man

dorbie
05-06-2002, 03:08 PM
Won, you're right about the parallelism. It's inherently SIMD. You don't program a scanline algorithm or anything so corny, you effectively program an individual fragment, with data inputs which vary across the primitive. There is no issue with parallelism. The real killer is support for branch instructions and whether a branch instruction has some kind of combinatorial or multipass effect or whether the hardware has native branch support.
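A minimal C++ sketch of the model dorbie is describing (the names here are made up for illustration): you write the program for one fragment, and because no fragment depends on another, the hardware is free to run as many copies in parallel as it has pipes.

#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// The "program" the developer writes: one fragment in, one colour out.
// It only sees the interpolated inputs for its own fragment.
Vec3 shadeFragment(const Vec3& normal, const Vec3& lightDir)
{
    float ndotl = normal.x * lightDir.x + normal.y * lightDir.y + normal.z * lightDir.z;
    if (ndotl < 0.0f) ndotl = 0.0f;          // clamp negative lighting
    return Vec3{ ndotl, ndotl, ndotl };      // simple diffuse grey
}

// What the hardware conceptually does: apply the same program to every
// fragment of the primitive. Each iteration is independent, so a SIMD
// array can run them in parallel with no scheduling effort from the
// programmer.
void shadePrimitive(const std::vector<Vec3>& normals, const Vec3& lightDir,
                    std::vector<Vec3>& out)
{
    out.resize(normals.size());
    for (std::size_t i = 0; i < normals.size(); ++i)
        out[i] = shadeFragment(normals[i], lightDir);
}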

Devulon
05-06-2002, 04:46 PM
All of you have made some interesting points. And I would like to comment.

I still don't understand why we need to have hardware abstraction of addition and dot product. All I am really asking for is a unified instruction set. Everyone uses the same instructions, same register names, etc., etc. The individual instructions in the current level of shader programs really don't need to be abstracted. I can't imagine being able to make "function calls" and the like in my program that generates a color fragment.

As for my comment that the same code may run at different speeds: I was thinking back through the Intel processor line. It's not just C/C++ and other high-level languages that suffer from different compilers and different platforms. Look at plain 32-bit x86 assembly from the time the Pentium came out, through the PPro, up to today's P4. Explicit assembly code actually runs at different speeds on the different Intel processors. This is largely because of changes to the core. Yes, the add instruction still works the same and usually doesn't change the speed at which it executes, but the branch prediction and memory fetching/writing changed drastically between processors. The fact that the entire memory interface and the way branching is dealt with will be different on each video card makes me believe that there will be identical code that runs at different speeds.

For example, pick a piece of code that intentionally makes random memory accesses that you know are going to break cache lines. It sounds like the P10 will do a reasonably good job at preventing a major stall. I would think that any chipset with only a 128-bit memory bus will probably have a little more trouble. Just from the size of the bus I wanna guess that if the data is nowhere near the cache there is gonna be a trip out to memory, and I bet the 256-bit bus will be twice as fast. (Assuming the speed it runs at is the same, same latency, etc., etc.)

So we have this language that is abstracted, except you've already got me worrying about the memory issues between different types/sizes/speeds. I want to make sure that if I need to use, say, 129 bits, I am on a 256-bit bus so that I don't end up with a two-read stall versus one.

All I am trying to say is that it's nice not to have to worry about these things. But unless every maker (basically NVIDIA, ATI and 3Dlabs) of 3D graphics chips uses an almost identical core/memory bus/memory interface, I am really gonna be concerned about the little timing nuances and the like.

As long as everything isn't the same I will be concerned about the differences.

Devulon

dorbie
05-06-2002, 05:15 PM
OpenGL 1.4 will provide a unified instruction set and a way of calling it. Right now neither exists. For more capabilities which exploit arbitrary stages, resources, and branching across multiple platforms, you need to hide some of the implementation details. There are all sorts of things which might comprise an instruction in a low-level graphics library: for example, a LUT might supply an arbitrary function in a single operation, and a blend, alpha test or stencil op might be a branch. Do you want to have to implement that LUT using whatever strange hardware is available on 3 or more platforms, or do you want to just request the function or operation in a higher-level language and have the guys at NVIDIA, ATI and 3Dlabs implement it as best they see fit on their implementation?
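A minimal C++ sketch of the LUT idea (the function names are made up for illustration): bake any 1D function into a table once, and every per-fragment evaluation becomes a single indexed read - the same role a 1D texture lookup plays in a shader.

#include <cstdio>

const int kLutSize = 256;
float g_lut[kLutSize];

// Bake an arbitrary function f(x), x in [0,1], into the table.
void buildLut(float (*f)(float))
{
    for (int i = 0; i < kLutSize; ++i)
        g_lut[i] = f(i / float(kLutSize - 1));
}

// Per-fragment "evaluation" is now one clamp and one read.
float lookup(float x)
{
    if (x < 0.0f) x = 0.0f;
    if (x > 1.0f) x = 1.0f;
    return g_lut[int(x * (kLutSize - 1))];
}

// Example: a toon-shading ramp - hard steps instead of a smooth falloff.
float toonRamp(float x)
{
    return x < 0.3f ? 0.2f : (x < 0.7f ? 0.6f : 1.0f);
}

int main()
{
    buildLut(toonRamp);
    std::printf("%f %f %f\n", lookup(0.1f), lookup(0.5f), lookup(0.9f));
}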

You will get the basics in OpenGL 1.4, but there is a growing need for something more powerful that hides the implementation details.

[This message has been edited by dorbie (edited 05-06-2002).]

Jurjen Katsman
05-06-2002, 05:19 PM
Zed: I'm not sure what exactly is going on in BenMark5. From experience on machines with broken AGP I would conclude it keeps vertices in AGP memory, instead of in system memory. This might be because of the way BenMark creates vertex buffers, or because of a DX7 limitation, or because of something in NVIDIA's DX7 drivers.

Assuming this is indeed the case, it would make sense that a GL conversion using VAR and video memory would yield a 10-15% gain. Also, not using VAR I'm sure good things are possible, just using more CPU, but you're probably not measuring that.

Any COM overhead or anything else CPU-related wouldn't really matter in BenMark; it should be entirely GPU limited. Of course it's hard to really say anything about those two without actually looking at both pieces of code... so, well, basically I'm just rambling now ;)

As mentioned, DX runs mostly in kernel mode on 2k/XP, and GL in user mode. This should give GL an advantage when call overhead does become an issue. (Note however that DX will not do a mode switch every single call; it will just queue up commands and flush them after a while, but at a cost of course.)

Because of this expected cost I once converted a full game app from DX8 to GL using VAR.

Unfortunately I got pretty much exactly the same performance, even in cases where performance was dominated by thousands of small batches of polygons with a ton of renderstate changes. I'm still not quite sure why I didn't get a nice performance boost. It seems like GL should be able to exploit the fact that it's in user mode to at least create a decent performance gain.

dorbie
05-06-2002, 05:29 PM
There is another more abstract but possibly more powerful reason. Everyone knows that the biggest obstacle to fancy new hardware sales is the lack of new features being used in games, and the reason those features aren't used by games is the lack of market penetration by hardware which supports them. This is an obvious catch-22, and games like Doom3 are too rare to break the pattern. Allowing hardware developers to implement new features under a higher-level language, to accelerate existing code which can be of arbitrary complexity, helps break this impasse. Whether it works longer term depends on the design of the API.

Korval
05-06-2002, 08:28 PM
So we have this language that is abstracted, except you've already got me worrying about the memory issues between different types/sizes/speeds. I want to make sure that if I need to use, say, 129 bits, I am on a 256-bit bus so that I don't end up with a two-read stall versus one.

All I am trying to say is that it's nice not to have to worry about these things. But unless every maker (basically NVIDIA, ATI and 3Dlabs) of 3D graphics chips uses an almost identical core/memory bus/memory interface, I am really gonna be concerned about the little timing nuances and the like.

As long as everything isn't the same I will be concerned about the differences.

Why do the differences concern you so much? The days of cycle counting are coming to an end. If the same program runs in 4 cycles on one card and 5 on another, do you ultimately really care? Sure, maybe if you had access to (proprietary) documents on the hardware and the underlying microcode, you might be able to make the second one run in 3 cycles. Maybe. But then you have to hand-optimize for all of the possible hardware arrangements. Not to mention deal with different APIs that expose different ways to get at similar features. I'm willing to sacrifice 1 cycle on one card (or on all of them) to not have to do that, and be able to spend more time on the actual product.

In another case, maybe the implementation writers are actually competent in constructing their compilers and can optimize their code far better than you. Maybe in the current release, your code runs in 5 cycles, but with a new release, it runs in 4 or 3. Or maybe, just maybe, even if you had the documentation and the microcode access, you wouldn't be able to beat their compiler because their compiler (and the programmers that wrote it) are competent.

Your fears are grounded in the possibility that the driver writers suck, not in any actual facts. Don't presume that you are superior to the driver writers. In all likelihood, their optimized compiled code will beat your attempt at hand-optimized microcode, if for no other reason than that they have direct access to the hardware-building staff.

Devulon
05-07-2002, 02:15 AM
I completely agree with you Korval... almost.

First, I definitely don't think I am superior to the driver writers. I will give NVIDIA a lot of credit for their drivers. In my mind it always has been, and probably always will be, one of the biggest selling points for their cards. At times I think they have sacrificed small amounts of speed here and there, but all in the name of stability and compliance. NVIDIA drivers conform to the OpenGL spec to the letter and are stable. That's all I ever wanted in a driver.

As for caring that something takes 3 cycles on one card and 4 on another: I do care. Let's remember what we optimize with hand-coded assembly. We optimize the inner loop, the big function that takes 65 percent of the entire time spent during one frame. That is the place where a minor 2-cycle change adds up really fast. I truly don't know what the hardware will look like in the future, nor will I ever claim to. I'm just saying, for example: take some instruction, say a dot product with the result used in a conditional, and say this causes a stall of 1 cycle. Not bad, no biggie, except the card next to this one doesn't cause the 1-cycle stall. This shouldn't be a big deal unless these couple of lines of code are in the middle of the fragment shader. Now let's estimate how many times this 1 cycle gets wasted if we are trying to draw 250,000 triangles a frame.

Most stalls occur from a branch causing memory thrashing of some kind. The memory system is going to have to be balanced against the chip very carefully. The most minor of delays inside a function that is called 2 million times adds up really, really fast.
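To put a rough number on that worry (using the figures from this thread, plus an assumed 60 fps target, so treat it as an illustration rather than a measurement):

#include <cstdio>

int main()
{
    // Figures from the post above: a shader body executed roughly
    // 2 million times per frame, one wasted cycle per execution, and
    // an assumed target of 60 frames per second.
    const double executionsPerFrame = 2.0e6;
    const double wastedCyclesEach   = 1.0;
    const double framesPerSecond    = 60.0;

    double wastedPerSecond = executionsPerFrame * wastedCyclesEach * framesPerSecond;
    std::printf("wasted cycles per second: %.0f\n", wastedPerSecond);  // 120 million
}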

Let's just remember something: programmers write code the way they see it. The way the processor sees it is a completely different matter altogether.

Devulon

P.S. I hope I didn't get people too riled up with this post. I was just trying to get some posts going in a rather slow week. :)

ScottManDeath
05-07-2002, 04:00 AM
Originally posted by davepermen:


and the Intel compiler made my code slower than VC6! and VC7 is even faster than VC6.. so who needs that crappy compiler? ;)


Hi

this may be, but I had a different experience. I wrote a kind of learning vector quantization algorithm, which essentially takes a vector of dimension X (usually around 100), finds the distance to N other vectors of dimension X and then moves the nearest of those vectors a little bit towards the first.
I wrote the algorithm in standard C and then I tested the VC6 3DNow/SSE intrinsics. Depending on the CPU, different versions were benchmarked. And please take a seat: a 2000 MHz P4 took around 12 minutes for the benchmark (C code/SSE code), but a 650 MHz AMD took only 4 minutes for the benchmark (C code/3DNow)!! On my 1333 MHz AMD I saw a difference of around 60% between the C FPU code and the 3DNow code.
But then I tested the Intel compiler's evaluation version: it was able to bring the FPU code up to the speed of my 3DNow and SSE code.
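A rough sketch of the kind of inner loop being described (dimensions and names made up for illustration): the distance computation over ~100-float vectors is exactly the code an SSE/3DNow path, or a good auto-vectorizing compiler, can speed up.

#include <cfloat>
#include <cstddef>

// One learning step of a simple LVQ-style update: find the codebook
// vector nearest to the input, then pull it a little towards the input.
void lvqStep(const float* input, float* codebook, std::size_t numVectors,
             std::size_t dim, float learningRate)
{
    std::size_t best = 0;
    float bestDist = FLT_MAX;

    for (std::size_t v = 0; v < numVectors; ++v) {
        const float* c = codebook + v * dim;
        float dist = 0.0f;
        for (std::size_t i = 0; i < dim; ++i) {      // the hot loop, dim ~ 100
            float d = input[i] - c[i];
            dist += d * d;                           // squared Euclidean distance
        }
        if (dist < bestDist) { bestDist = dist; best = v; }
    }

    float* winner = codebook + best * dim;
    for (std::size_t i = 0; i < dim; ++i)            // move the winner towards the input
        winner[i] += learningRate * (input[i] - winner[i]);
}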

Perhaps my case is not representative, but I think you can see that a good compiler can result in much better performance.

bye
ScottManDeath

Tom Nuydens
05-07-2002, 04:07 AM
Devulon, with what you said in mind, do you also write different code for each type of CPU (i.e. Pentium2/3/4, Athlon, ...)? They all have different architectures, you know, so they will make the same code run at different speeds.

Let's face it: by the time you've finished writing optimal code for each CPU (or GPU), a new generation will have already appeared. Life's too short to program at such a low level these days.

-- Tom

Devulon
05-07-2002, 05:35 AM
The key difference is that most people aren't running all the different architectures. By that I mean no one is running a 386 or a Pentium or even a Pentium Pro. I write for the PIII/Athlon. The current crop of x86 processors (PIII/P4/Athlon) tend to have very similar memory latencies, and the cache hits/misses are quite predictable. Besides, for 99 percent of your code you really don't care anyway. The graphics cards/chips are all going to be quite different. Or at least I imagine they will be. But the base vertex/shader programs are really important since they get executed so many times. Hardware-based fixed-function T&L runs at pretty much the same speed on all graphics cards, because it is fixed function. Having programmability simply adds a layer of slight unpredictability.

Devulon

davepermen
05-07-2002, 06:10 AM
no, it doesn't..

and coding for AMD/Intel is not the problem.. coding for 500 MHz and for 2 GHz is more of a problem..

and that is how GPUs differ:

speed

not speed in some particular instruction..
there is not one with a fast DP3 and one with a slow DP3, cause the slow one would lose all the benches otherwise..

they will just be faster or slower than the other ones overall

oh, and developing for Intel/AMD means developing for different RAM speeds, different cache sizes, different access speeds for hard drives, different AGP read/write speeds and different GPUs.. if you want to optimize something to the last cycle you have to code it anew for every PC, or you take some interface, like GL, or DX for the GPU, to do the job for you..

you don't get more abstraction.. you get a nicer interface..

look at the specs: they define the set of "functions" the language will have.. this set of functions is in fact the set of instructions such a processor has to support.. each of those instructions should be very fast, and, at best, equally fast..

but there is no difference whether I write

ADD r0,r1,r2

or

r0 = r1 + r2

the second is just nicer

it's not more abstraction.. and the parser is open source, so you can even look at your "asm" output..

and hey.. the whole design of the P10, for example, is to make it easy for the developer.. the chip does the whole parallelism for you, you don't even really have to care about it.. it will take all the resources it can find.. no need to care..

and we don't need 100% of the GPU to have good graphics (getting 100% isn't possible anymore on new hardware anyway, as you can see if you take a look at VAR benches on NVIDIA hardware for example..)

Devulon
05-07-2002, 06:42 AM
I agree there really is only one way to do a DP3 instruction. I definitely expect that to be the same speed on all systems. I can tell you exactly how addition is done in a microprocessor. It's all the same regardless of who makes it. Anyone who has any understanding of processor design can show you the exact layout of transistors for just about every basic operation. The things that concern me most are branching and conditionals, and the fact that when you put multiple instructions together a dependency is formed. For example, say we are doing some simple cartoon shading: dot the normal with the light vector and use that as a lookup into a 1D texture map. And let's say I want to do 2 dots and 2 lookups. Forgive the pseudocode.

temp1 = normal1 . light;
result1 = lookup_texture1[ clamp(temp1) ];
temp2 = normal2 . light;
result2 = lookup_texture2[ clamp(temp2) ];

Let's say the dot product takes the same time (number of cycles) as the clamp and the lookup. The first lookup gets stalled waiting for the first dot product. That's unavoidable. What the "compiler"/processor must recognize is that the second dot product does not need to wait for the first clamp. The two are not dependent. What I don't want is the second dot product waiting for the first clamp, which is waiting for the first dot product. A better way to write this would perhaps be:

temp1 = normal1 . light;
temp2 = normal2 . light;
result1 = lookup_texture1[ clamp(temp1) ];
result2 = lookup_texture2[ clamp(temp2) ];

In this case the use of temp1 doesn't occur until after the second dot product, which helps it avoid stalling. The first lookup will probably still stall slightly waiting for the first dot product, but by the time the first lookup is done the second dot product should be ready to go, assuming the timings stated above.

Depending on a lot of things, the two should in the end run the same. But in reality they don't have to. Using the second form helps the compiler to see that the two dots/lookups are completely independent.

Anyone who has ever tried to write a software renderer fully understands the importance of timing within groups of instructions. Everyone knows that division takes a very long time, so don't do a division and then try to use the result right away. Find something else to do (that isn't dependent on the division) to fill the time. That is what parallelism is: being able to do other operations while the division is taking place.

Sorry for the rather vague, crappy reply here. It's kinda hard to explain all of this, but I am sure you get the idea, davepermen.

I don't know if these kinds of things are going to be an issue or not. It's something that, as a programmer, I may not have to worry about at all. I can only hope that the chip itself has a really nice decoding unit that fills the pipes as well as possible. Heck, this could just be a nightmare I am making for myself. Then again, maybe not.

Devulon

davepermen
05-07-2002, 07:37 AM
you know what?
there are tons of parallel pipelines in the chips, and why are they in?
to take care of exactly these, call them "irregularities"

it means if a texture lookup takes longer than expected, only that one pipeline stalls until it gets its texel, but meanwhile all the other pixels continue processing, and even if the whole thing then takes longer, because they are done in parallel the final output comes at the same speed (you can see this on GeForces and Radeons.. no texture versus several big textures is not really a speed drop, unless you add more texture-env stages, which is another topic)

same for branches.. even with branches, the pixels can be processed independently of each other, and so the parallelism stays the same..

just take the statement of my ex-girlfriend and true love:
TAKE IT EASY!
(different sense, but it works..)

davepermen
05-07-2002, 07:40 AM
Originally posted by Devulon:
I don't know if these kinds of things are going to be an issue or not. It's something that, as a programmer, I may not have to worry about at all. I can only hope that the chip itself has a really nice decoding unit that fills the pipes as well as possible. Heck, this could just be a nightmare I am making for myself. Then again, maybe not.

if you read more about the P10, you'll see that the whole power of the P10 is that IT DOES ALL THIS WORK FOR YOU

it fills the pipes as well as it can, and you don't have to worry about it..

dorbie
05-07-2002, 12:59 PM
Devulon, most people ARE running different architectures; you have to look at the graphics card in the system, not the processor instruction set, when it comes to shader compilers. Even chips from the same manufacturer require different code paths.

Devulon
05-07-2002, 02:30 PM
Maybe I am just sad. I feel like I am losing a friend. I used to get to do all the fancy math and stuff myself, and now the damn video card is stealing that away from me. There was a time when games all had crappy graphics. When a game looked good it was because you programmed it that way. The more the video card does, the more I feel like the purity of the do-it-yourself attitude is being lost. I definitely think video cards are heading in the right direction. I just miss the ways of old.

To hell with this I am going to write audio codecs from now on.

Have fun guys and don't hurt yourselves.

Devulon

dorbie
05-07-2002, 04:34 PM
There's lots of hard stuff left to do, even in graphics; you could focus on implementing great detail-preserving simplification, for example :-). Using OpenGL has never been more math-intensive, but then you never had to do the really low-level stuff with OpenGL anyway. I think the key is to shift your focus of attention to the next frontier, which constantly changes.

V-man
05-07-2002, 04:53 PM
>>>and hey.. the whole design of the P10, for example, is to make it easy for the developer.. the chip does the whole parallelism for you, you don't even really have to care about it.. it will take all the resources it can find.. no need to care..<<<

This isn't something new; pretty much all modern general-purpose processors have dynamic issue capability. It makes the die larger due to the increased complexity, but the result is that even crappy code can run nearly as fast as software-pipelined code.

With graphics cards, we don't need to care about the technical details of the processors, and that's the way it should be. For educational purposes, I would say otherwise.

I certainly agree with Devulon that part of our task is being taken away by the GPU. That's the way I felt when I first began using OpenGL, but for the sake of hardware acceleration, a nice clean abstract layer... well, I love GL now.

What about this fact -> we will all be pretty much coding similar special effects: per-pixel lighting, bump mapping, normal maps and so on. So we will end up with similar features, so why have programmable hardware for these? My first concern was that it would be slower than a hardwired circuit. Might as well have those "basics" good and ready.

V-man

davepermen
05-08-2002, 02:33 AM
well.. they are taken onto the GPU.. so what?
rasterizing got lost when it went into hardware, cause you can't code it yourself anymore..
but all the stuff now moving onto the GPU you still have to code just as you did before! just on the GPU, not on the CPU..
it's a second processor, nothing more and nothing less..

so what?

you don't lose a friend, your friend changed place, and got stronger.. much stronger..

zed
05-08-2002, 12:56 PM
I agree with dorbie (btw he's a star, read his name in the news the other day :)) + davepermen. graphics programming hasn't become any easier. yeah, getting a triangle onto the screen is easier than before, but now instead of a decal triangle that triangle is displaced, with per-pixel lighting + shadows + whatnot else. the bar keeps on getting higher

JD
05-08-2002, 05:22 PM
If you feel that you need control you might write a software renderer, in which case you're not limited by the graphics API. You might even use special 3D CPU instructions to speed up the app. Just because 3D in hardware is mainstream doesn't mean that you can't do 3D in software. It would be a good challenge for you, since I think you're looking for one.

Actually, I've now switched from DX's built-in lighting model to per-pixel lights and it's much more math-intensive, since I have to do everything myself. I don't think a higher-level interface to shaders takes anything away from the complexity, since now the burden of creativity lies on your shoulders. I like programmability in the GPU as well as a higher abstraction level, which allows for higher complexity in our apps.

davepermen
05-09-2002, 11:29 AM
uhm, well.. if you read carefully, this is _ALL_ programmable on the P10 (well, about all.. :))

and you know what? I don't care.. as I'll move to raytracing on the next hardware anyways.. and never look back.. :) (and for doing raytracing on rasterizing GPUs, you have to know your GPU, you have to know rasterizing, and you have to know raytracing down to the lowest levels.. so what? the experience is still there..)

V-man
05-09-2002, 04:53 PM
good point! I just hope things don't get to be too real-looking. Then it would kill that cartoony look I got used to in games.

V-man

dorbie
05-09-2002, 08:14 PM
Ahh the old ray tracing myth.

Ray tracing resolves surface visibility and reflection/refraction visibility. Most of the interest is in improved shading and lighting, and for that ray tracing doesn't bring much to the party.

deshfrudu
05-09-2002, 08:35 PM
Yup, ray tracing is way overrated IMO. Besides, we're memory bandwidth limited now and for the foreseeable future. I don't even want to think about the cache thrashing that goes on inside a ray tracer. PixelPlanes 5 stored the entire scene at each pixel to address this, IIRC. Ouch.

dorbie
05-09-2002, 08:58 PM
Pixel Planes 5 had very little memory per ALU. The implementation I'm familiar with did not raytrace, although it was just a programmable SIMD array of ALUs with a little memory each, so it might have been programmed another way in different implementations. In most implementations it evaluated 3 edge equations per triangle to see if it got a hit, then calculated Z (more edges for polys). All of the triangles were transformed to screen space, although this was done on a separate geometry engine; the Pixel Planes 5 I know just did z-buffer visibility and shading. The geometry transform also needed to bin the primitives into chip-sized screen regions for efficient processing.

Ultimately, after binning, each ALU array had to be sent the data for every primitive in its region, and each would first determine which was visible (2 tests: the edge equations and the z-buffer). It would use the limited available memory to parameterize its triangle info, like normal, color, texture coordinates, light position and texture ID. Without texture it would then do a lighting calculation; with texture, every texel of every texture would be sent to the SIMD array and each ALU would pull the texel it needed out of the stream based on texture ID and coordinates. Texture on PP5 was really a hack to do something it was never designed to do.

[This message has been edited by dorbie (edited 05-09-2002).]

davepermen
05-10-2002, 02:36 AM
Originally posted by deshfrudu:
Yup, ray tracing is way overrated IMO.

first: it solves all our problems (if we can trace enough rays, logically ;))
second: it is damn easy

try doing shadows on today's GPUs and you'll realize that it's somehow a much too complicated task..

try to get reflections and you have the same problem..

and as we get more and more of these effects, we get several renderings from several viewpoints per frame, which is technically about the same as raytracing..

raytracing is just so much easier to organise, cause it's a general solution.

about realism: sure, simple raytracing does not look much more accurate than current scenes; lighting models don't have to change in raytracing. all we get is accurate shadows and reflections and refractions, which we currently only get for planes.

but, about realism: most of you know the Monte Carlo method, and we all know it's slow. but it's only slow because it takes tons of rays which are all about the same (and in reality would be traced at the same time, in parallel). it's just perfect for a streaming processing unit in the design of future GPUs.. if we push raytracing, then in a few years we'll be at the first realtime Monte Carlo techniques, and you can't say Monte Carlo images don't look cool.

and for all the hardcore math fans who want to be cool: get the Metropolis raytracing method onto a streaming architecture, and we have the full global illumination problem solved for realtime apps in about 2 or 3 years!

then finally no one talks about per-pixel lighting and all that stuff; instead we can concentrate on good quality images, which means the artists have to design good worlds giving cool feelings, and we can finally move back to programming good games which are FUN..

then the good games come back. finally..

dorbie
05-10-2002, 04:18 AM
No it doesn't solve all our problems.

Solving the same problem with completely different methods does not mean you are using the same method, it means the opposite.

If you examine the problem of shading a surface for illumination, ray tracing is no help at all, unless you're talking about an unholy number of rays. You talk about ray tracing as if it magically calculates lighting. Once again, ray tracing does not do your lighting calculation. Even though this is not the dominant feature of a ray tracing algorithm, it still has to be done once you have your fragment and surface information.

It's at best unfair to say that somehow your fragment lighting problem goes away simply because you ray trace.


[This message has been edited by dorbie (edited 05-10-2002).]

PH
05-10-2002, 04:47 AM
Although I haven't done any raytracing I doubt it will be what everybody is doing in a few years. If I'm not mistaken, the complexity of ray tracing is far from (sub)linear in the number of rays. It would require a massively parallel processor to run in real-time. I'm sure hacks are possible but real raytracing in real-time is not likely to happen any time "soon".

dorbie
05-10-2002, 05:05 AM
Most of the smarts in ray tracers goes into speeding the traversal by minimizing the redundant ray-surface tests, usually there's some other structure which the rays traverse to quickly determine which surfaces must be tested.

davepermen
05-10-2002, 05:23 AM
Originally posted by dorbie:
No it doesn't solve all our problems.

Solving the same problem with completely different methods does not mean you are using the same method, it means the opposite.
well.. actually solving a problem that could only be approximated is.. yeah.. solving the problem. what else?


You talk about ray tracing as if it magically calculates lighting. Once again, ray tracing does not do your lighting calculation.
It's at best unfair to say that somehow your fragment lighting problem goes away simply because you ray trace.


grmbl, lost my text due to some copy-paste.. okay, once again.

the lighting equation is independent of the rendering method.
this is true, no argument there. but on the other side, one rendering method can make it easier to solve complex lighting equations than another..
for correct lighting we need a lot of samples per surface element, and in my eyes a random ray generator is the best approach there.. another approach is rendering a hemisphere for every pixel.. but that will take a VERY LONG TIME to do in realtime on a rasterizer.. I think raytracing is faster there (unless you can connect all the rasterizers into one huge render farm, okay..)

and the first step we want is 30 x 640 x 480 rays per second.. I think this is possible on the P10.. next would be more than that, to implement simple raytracing algos.. that means we first get Phong lighting working, with real reflections and refractions. that is what most people think raytracing is.. but we can push more and more rays with future hardware.. say it doubles every 6 months as it does currently.. then this means 60 fps with one ray per pixel in the middle of 2003, 120 fps by the end of 2003.. at 320x240 this already means 30 fps with 32 rays per pixel. that's enough for a very good realisation of the simple Phong stuff with interreflections and everything.
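The arithmetic here is just an exponential projection; a small C++ sketch of it (davepermen's base figure and his doubling assumption, so the numbers are speculation, not a prediction):

#include <cmath>
#include <cstdio>

int main()
{
    // Starting point from the post: ~30 fps at 640x480 with one ray per
    // pixel, and the assumption that the ray budget doubles every 6 months.
    const double baseRaysPerSecond = 30.0 * 640.0 * 480.0;   // ~9.2 million

    for (int months = 0; months <= 96; months += 12) {
        double budget = baseRaysPerSecond * std::pow(2.0, months / 6.0);
        // What that budget buys at 320x240, 30 fps:
        double raysPerPixel = budget / (30.0 * 320.0 * 240.0);
        std::printf("after %2d months: %.2e rays/s, %9.1f rays per pixel at 320x240, 30 fps\n",
                    months, budget, raysPerPixel);
    }
}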

this can already be enough to do simple demos with soft shadows..

these 32 rays per pixel will be 128 one year later, 512 at the end of 2005, 2k rays at the end of 2006, 8k in 2007, 32k in 2008, or 512k in 2010.

then we move back to higher res (we can do this earlier as well..) and we get 32768 rays per pixel at 640x480 with 30 fps.. that's in 8 years.. and that is ENOUGH for solving Monte Carlo..
and Monte Carlo is damn nice looking, you can't say that's not true..

ok, this is if everything works out well, true.. but tell me how to solve Monte Carlo with a rasterizer... per pixel, with bump maps and BRDFs and much more. tell me

I don't want to hype raytracing. I want to stop the hype around rasterizers.. they simply plot triangles. what we see today is damn good plotting of triangles, but it's getting quite difficult to make further steps. and all the preprocessing steps for the accurate lighting stuff are done with.. guess what? RAYS! (take a look at the DX papers which came out a few days ago, or other things..)

and if I can choose between a complicated setup for a fake, or a simpler setup for a solution, I'll choose the solution.. and getting correct shadows is much simpler with raytracing (it's simply the best example)

shadows = found_intersection_between(pos,light);

easier than shadow volumes, easier than setting up shadow cube maps.
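A small C++ sketch of that one-liner (sphere-only scene, made-up names, just to show the shape of the query): a shadow test in a ray tracer really is just "does any occluder sit between the point and the light?".

#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Sphere { Vec3 center; float radius; };

static Vec3 sub(Vec3 a, Vec3 b) { return Vec3{a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Is any sphere intersected by the segment from 'point' to 'light'?
// If yes, the point is in shadow -- no shadow volumes, no cube maps.
bool inShadow(Vec3 point, Vec3 light, const std::vector<Sphere>& scene)
{
    Vec3 dir = sub(light, point);
    float maxT = std::sqrt(dot(dir, dir));              // distance to the light
    dir = Vec3{dir.x / maxT, dir.y / maxT, dir.z / maxT};

    for (std::size_t i = 0; i < scene.size(); ++i) {
        Vec3 oc = sub(point, scene[i].center);
        float b = dot(oc, dir);
        float c = dot(oc, oc) - scene[i].radius * scene[i].radius;
        float disc = b * b - c;                          // quadratic discriminant
        if (disc < 0.0f) continue;                       // ray misses this sphere
        float t = -b - std::sqrt(disc);                  // nearest intersection distance
        if (t > 1e-4f && t < maxT) return true;          // occluder between point and light
    }
    return false;
}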

and for the first time I can easily do volumetric objects, like fog.. (I would need some glBegin(GL_TETRAHEDRON) for this.. glEnd())


and if you read more about the people trying to solve lighting problems, finding statistical approaches to solve all this stuff, building up huge camera arrays to solve the equations that can't be solved by brain anymore, then you would know they only work with rays.. putting all this into a rasterizer is quite stupid.. (I'm talking about universities; Stanford has lots of great papers..)

and raytracing is not more complex than rasterizing just because you need to know the whole scene.. for accurate refractions and reflections you simply re-render your scene several times, because you need that info there as well :) for accurate solutions you would need to re-render your scene for every pixel.. but you realize that at that point it's stupid to use a rasterizer ;)..


oh, and my view is not bound by a frustum with planes. I can see infinitely far, I can see with various view angles, but without a near plane. and I don't see lines as lines anymore, but projected as curves..

dorbie
05-10-2002, 06:22 AM
Your point about surface samples to do a proper BRDF (maybe even global illumination) is what I meant by an unholy number of rays, and it's still not the only way.

You're losing sight of the problem. The objective of changing to another system would be to improve performance for equivalent cost, not to conduct a science experiment; I'll leave that to the academics. It isn't practical to brute-force these problems for reasonably interactive situations, and the stuff you can do practically can be done better using other methods, particularly for unbounded problems.

If you want to Monte Carlo the incident global illumination on every fragment in hardware, go ahead, but I think that's trying to contrive a problem ray tracing can handle (slowly) instead of making the most of the available hardware. You've gone from the main objective of ray tracing and its most obvious strength (shadows and reflections) to the ostentatious treatment of fragment shading.

I look forward to seeing your results.

dorbie
05-10-2002, 06:23 AM
P.S. if your display surface is flat, your lines should be straight after projection, anything else is wrong.

PH
05-10-2002, 06:33 AM
Originally posted by davepermen:
...but we can push more and more rays with future hardware.. say it doubles every 6 months as it does currently..

That's exactly what won't happen unless you're dealing with a linear algorithm (especially using brute force like dorbie mentioned). I did a bit of searching and I found a lower bound of O(log n) for ray shooting, but with a space complexity of O(n^4). Like I said previously, I don't know much about raytracing, but the time/space complexity speaks for itself.

Maybe the future is more along the lines of multipass algorithms such as those suggested by Paul J. Diefenbach. He illustrates global illumination using multipass techniques.

Maj
05-10-2002, 06:53 AM
Originally posted by PH:
Although I haven't done any raytracing I doubt it will be what everybody is doing in a few years. If I'm not mistaken, the complexity of ray tracing is far from (sub)linear in the number of rays. It would require a massively parallel processor to run in real-time. I'm sure hacks are possible but real raytracing in real-time is not likely to happen any time "soon".

Soon enough (http://www.openrt.de/Publications/index.html) ? ;]

I recommend the first paper (http://graphics.cs.uni-sb.de/Publications/2001/InteractiveRenderingWithCoherentRayTracing.pdf) as an introduction. There are no great "hacks" as far as I can tell, apart from using a static, preprocessed scene.

Now, I'm not qualified to judge whether it can ever become practical for heavily dynamic interactive applications (games), but it's certainly working for some applications now, and on consumer hardware to boot.

davepermen
05-10-2002, 06:54 AM
well.. why bother evolving the hardware at all if you don't want accurate simulation of reality? there is no need. take a GF4, push tons of triangles and be happy.. I have been looking for 2 years now at ways to get accurate lighting in realtime for dynamic worlds, and the more I see, the more I have to say brute force with brains will be the only way to solve general situations. as long as we can't solve general situations, we can't render arbitrary scenes.. and that's what I want: the power to have every kind of world for games...

well, it's not really O(n^4)..

it's quite linear in the number of rays. sure, if you have several samples per point, and then for each of those samples again several samples per point, then yes, it grows damn fast.. but remember we don't really need that many samples there, just because the screen and our eye are not that accurate anyway..

well.. I need several parallel processors doing tasks for me (3 or 4), and they work in linear time on a simple stream of data. current rasterizers have 2 processors working on simple streams of data.

sure there is a big difference, but I think it's worth it, cause we're currently coming to the edge of what rasterizers can do. making the next big step for more accurate lighting and materials is nearly impossible, because of the local structure of rasterizers. I dunno how much more we can fake, but I dislike those fakes even now.. (take a look at the water on the GF3/4, it looks so faked.. nice, but faked anyway.. (I mean the env-bump reflections) at least they should have made a z-difference-dependent displacement factor, but no.. :))

well, no, the lines should not be straight if I capture my scene with a camera with lenses. and that's how it is. no. if I take a cam, go out, film a straight line, then I see a curve on my TV at home.

"..instead of making the most of the availible hardware.." well. i can't get quite much of my hardware anymore, its a gf2mx, and i pushed it to quite much work yet. but as i want to upgrade, i'm thinkin about how much i can get out of the hardware availible currently, and availible soon. and i know how much i could push out of a gf4, and this is simply not worthing it to buy.. funny embross-features, bether lighting shemes, okay. more accurate realisations of my equations on gf2, okay. and its faster, yeah.. but its not a real evolution somehow. and like that its not worthing that much money. i wait for the next gen, and they i push to the limits.

pushing to the limits means getting more out of it that it was thought to, not plotting tons of triangles.. that is _OLD_. now we want GOOD triangles. ACCURATE triangles. STUNNING triangles.

and yeah, i would love to have a pc doing all the stuff bruteforce so i don't need to solve/fake the stuff myself to get around its "slowiness"..

i look forward to seeing my results as well.


but there will be a time where we have to change our roots. the more early we step away from the current roots, the more easy the step will be.. (its yet too hard to simply change, because rastericers are so much faster currently..)

PH
05-10-2002, 07:13 AM
Thanks for the link Maj. That's very interesting. I've had a quick look and it doesn't appear to be applicable to dynamic scenes (though I have seen RTRT demos with simple moving objects). I can't imagine complex dynamic scenes in real-time being possible "soon" :) - I still think that there are better ways to achieve equivalent results that scale better with future hardware.

I'm certainly looking forward to seeing what can be done with RTRT (keep us posted, davepermen).

davepermen
05-10-2002, 07:38 AM
i just suggest remembering one thing.. rasterizing is not old (realtime rasterizing i mean http://www.opengl.org/discussion_boards/ubb/wink.gif). remember the time when we had 486s and could just barely play quake1 at 320x240.. and that with very simple lighting (and cool lightmaps) and very low amounts of everything..

and now? what do we have now? final fantasy the movie at about 12 to 25 fps on a gf4..

raytracing will soon be fast enough for a quake1-level scene at 320x240 on a pc.. but it will start with per-pixel lighting, everything shadowed, plus reflections/refractions.. where will it be once it has made the same steps we've made so far with rasterizers?

and even this very simple first "big game" of rtrt will have features in it which are not possible with a gf4 at all.. and not with future rasterizer algorithms either..


so what?

raytracing never had the opportunity to be pushed to the extreme by everyone, simply because of the initial power it needs for even a small scene..

the moment it gets that support and is pushed to the extreme, it will grow as exponentially as rasterizers have over the same span of time..

and at the time of quake1 nobody thought that what we have today would be possible this soon, did they?

dorbie
05-10-2002, 07:43 AM
That's not really the issue; the unbounded complexity of the model, cache coherency and the overarching traversal structure (and its associated processing) are. It's an inherently retained-mode architecture with considerable structural requirements like octrees or whatever. Insufficient distinction is being made between the various graphics problems; you're lumping them all together regardless of performance and other existing solutions. Ray tracing is the brute-force approach to graphics, and that's doubly true when you talk about Monte Carlo sampling for illumination, which even most hard-core ray tracing advocates don't consider practical.

When talking about ray tracing I think you should be clear on what problems you are trying to solve. Not all problems are the same, and just because ray tracing can solve one doesn't mean it's the best way.

Just because ray tracing will get faster doesn't mean other approaches won't outpace it. Why should I ray trace when there are faster, more practical approaches? Ray tracing for ray tracing's sake is not a good policy.

Heck, many people accelerate ray tracing today by getting first-hit information using hardware rasterization. With more complex programmable framebuffers and interpolators even more of that kind of thing will be possible. Imagine, for example, rasterizing various colors, world positions, normals etc. in a conventional rendering pass, perhaps even shading, and including a couple of destination alpha terms, say for a Fresnel reflection and/or refraction term. Then, once you know the information for your first hit and have your primary lighting done quickly, you could process the reflection or more complex illumination (with ray tracing if you insist).

I'm not saying do it one way, but be pragmatic about your method.

[This message has been edited by dorbie (edited 05-10-2002).]

PH
05-10-2002, 07:53 AM
Dave,
Well, Quake1 was very impressive. The difference is : improving the lower bound of an algorithm is a lot harder ( that's science ) than improving a constant ( low-level assembly hacks ). As seen in the papers ( link from Maj ), distributed raytracing is key to real-time performance ( 2fps is interactive, though hardly real-time ).

dorbie
05-10-2002, 08:49 AM
P.S.

Another point overlooked is that a ray tracing interface would probably be some sort of scene graph, unless you intend to implement your own optimized ray-database traverser. So folks espousing this need to think carefully about how they would expect to program such a system: how you give it your data and how you specify fragment shading. Would you expect to do your own math at the fragment and just request arbitrary recursive shading results from any old direction, for example, or let the hardware handle the whole thing (clearly you wouldn't be happy with that)? The implications for stalling a pipeline seem horrendous.

davepermen
05-10-2002, 10:43 AM
funny to post questions you don't care about, isn't it?

about cache problems: there are tons of cache problems with simple rasterizers and textures too, but i hardly see problems on my gf2mx. why? because the architecture itself is intelligent enough (and the p10 should be MUCH stronger on this point; we'll see if it is, but according to the info, yes)

there isn't really a scene graph needed; you "render the geometry into the world" and the raytracer captures the screen images from that.

but it's all open. do what you want. benchmark to see which approach is fastest and which is not.

i don't do any static stuff, because i hate it. i've never used glGenLists and glCallLists in my code. NEVER. even when i knew it would be faster, i never cared. so don't expect my raytracer to ever use any precompilation. others will, and logically they will be much faster. some will do fixed-function shading, some will give full access to the programmable shaders. there is no point in arguing about that; it's part of the implementation.

my approach is to code a fixed phong shader with bumpmaps and an arbitrary number of (for now) point light sources (cubemaps don't work that well for software engines, so no directional maps..)
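Something like the following is one way such a fixed shader can look in a software tracer (a minimal C++ sketch of the general idea, not the actual code described above; the normal is assumed to be already perturbed by the bump map before this is called, and all names are made up):

// Hypothetical per-pixel shading routine for a software raytracer:
// Phong with an arbitrary number of point lights.
#include <cmath>
#include <algorithm>

struct Vec3 { float x, y, z; };

static Vec3  add(Vec3 a, Vec3 b)    { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
static Vec3  sub(Vec3 a, Vec3 b)    { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3  scale(Vec3 a, float s) { return { a.x * s, a.y * s, a.z * s }; }
static float dot(Vec3 a, Vec3 b)    { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3  normalize(Vec3 a)      { return scale(a, 1.0f / std::sqrt(dot(a, a))); }

struct PointLight { Vec3 pos; Vec3 color; };

// p: surface point, n: shading normal, eye: camera position
Vec3 shadePhong(Vec3 p, Vec3 n, Vec3 eye, Vec3 albedo,
                const PointLight* lights, int numLights, float shininess)
{
    Vec3 result = { 0.0f, 0.0f, 0.0f };
    Vec3 v = normalize(sub(eye, p));                  // view vector
    for (int i = 0; i < numLights; ++i) {
        Vec3 l = normalize(sub(lights[i].pos, p));    // light vector
        float ndotl = dot(n, l);
        if (ndotl <= 0.0f) continue;                  // light is behind the surface
        Vec3 r = sub(scale(n, 2.0f * ndotl), l);      // reflect l about n
        float spec = std::pow(std::max(0.0f, dot(r, v)), shininess);
        Vec3 contrib = add(scale(albedo, ndotl), Vec3{ spec, spec, spec });
        result = add(result, Vec3{ contrib.x * lights[i].color.x,
                                   contrib.y * lights[i].color.y,
                                   contrib.z * lights[i].color.z });
    }
    return result;
}

Shadow and reflection rays would plug in around the light loop; the point is that the light count is just a loop bound, not a fixed-function limit.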

quake1 was very impressive. but well, when i play it now and look at halo on the xbox at the same time.. well.. the difference is brutal http://www.opengl.org/discussion_boards/ubb/wink.gif

i don't do assembler hacks; i'm too new to the whole environment, i was born with c++ so.. i know asm but i'm not that familiar with it. i always code in c/c++ and all the optimisations i do are algorithmic (plus simply writing only fast code http://www.opengl.org/discussion_boards/ubb/wink.gif). and well, my algorithm is only O(n*m) for now, where n is the number of rays and m the number of intersection tests needed per ray. n varies dynamically, and m depends on how you manage the scene, yes. if you have some kind of tree then you don't need to test that many http://www.opengl.org/discussion_boards/ubb/smile.gif
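For reference, that O(n*m) structure boils down to something like this (a minimal C++ sketch under the assumption of a flat triangle list; a spatial tree would simply shrink the m loop; all names are illustrative):

// Naive tracer core: n rays, m intersection candidates per ray.
#include <vector>
#include <limits>

struct Vec3 { float x, y, z; };
static Vec3  sub(Vec3 a, Vec3 b)   { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3  cross(Vec3 a, Vec3 b) { return { a.y * b.z - a.z * b.y,
                                              a.z * b.x - a.x * b.z,
                                              a.x * b.y - a.y * b.x }; }
static float dot(Vec3 a, Vec3 b)   { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Ray      { Vec3 o, d; };
struct Triangle { Vec3 v0, v1, v2; };
struct Hit      { float t; int triangle; };

// Moeller-Trumbore ray/triangle test; writes the hit distance into t.
static bool intersect(const Ray& r, const Triangle& tri, float& t)
{
    const float eps = 1e-6f;
    Vec3 e1 = sub(tri.v1, tri.v0), e2 = sub(tri.v2, tri.v0);
    Vec3 p = cross(r.d, e2);
    float det = dot(e1, p);
    if (det > -eps && det < eps) return false;        // ray parallel to triangle
    float inv = 1.0f / det;
    Vec3 s = sub(r.o, tri.v0);
    float u = dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return false;
    Vec3 q = cross(s, e1);
    float v = dot(r.d, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return false;
    t = dot(e2, q) * inv;
    return t > eps;
}

// The "m" loop: every candidate triangle is tested for this ray.
Hit traceClosest(const Ray& ray, const std::vector<Triangle>& scene)
{
    Hit best = { std::numeric_limits<float>::max(), -1 };
    for (size_t i = 0; i < scene.size(); ++i) {
        float t;
        if (intersect(ray, scene[i], t) && t < best.t)
            best = { t, static_cast<int>(i) };
    }
    return best;
}

// The "n" loop lives outside: one (or several) primary rays per pixel,
// each calling traceClosest, plus extra rays for shadows and reflections.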

Korval
05-10-2002, 11:10 AM
and the first step we want is 30x640x480 rays per second.. i think this is possible on p10

Um, the P10 is not a CPU. The P10 may be very programmable, but it is still a polygon scan converter, with scan conversion hardware and pixel pipelines and so forth. You can't make it do ray tracing.

davepermen
05-10-2002, 11:35 AM
Originally posted by Korval:
Um, the P10 is not a CPU. The P10 may be very programmable, but it is still a polygon scan converter, with scan conversion hardware and pixel pipelines and so forth. You can't make it do ray tracing.
http://tyrannen.starcraft3d.net/loprecisionraytracingonatiradeon8500.jpg

radeon8500 at raytracing

what can't i do?

Michael Steinberg
05-11-2002, 01:24 AM
Davepermen,
where are the cast shadows in that pic?
Also, how would raytraced shadows do soft shadows? Moving the light around and averaging?

davepermen
05-11-2002, 01:38 AM
okay, read the siggraph paper about it. why is it still so crude? mostly because the whole vector math happens with 8 bits per component, and they were happy just to get a raytraced first hit working. but they describe the algorithm for all the additional modes. and as you probably know, there is no difference between adding a shadow ray and adding a first-hit ray; it's just one more ray (say, an additional pass on the ati, i guess). i think you can see that it is _VERY_ low precision http://www.opengl.org/discussion_boards/ubb/wink.gif
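In code the point looks roughly like this (a minimal CPU-side C++ sketch with spheres for brevity; it is not the paper's GPU formulation, just an illustration that a shadow ray reuses the same intersection routine as a first-hit ray; all names are made up):

#include <cmath>

struct Vec3   { float x, y, z; };
struct Sphere { Vec3 c; float r; };

static Vec3  sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Ray/sphere test: exactly the routine a first-hit pass would use.
// 'd' is assumed normalized; returns true for a hit closer than maxT.
static bool hitsSphere(Vec3 o, Vec3 d, const Sphere& s, float maxT)
{
    Vec3 oc = sub(o, s.c);
    float b = dot(oc, d);
    float c = dot(oc, oc) - s.r * s.r;
    float disc = b * b - c;
    if (disc < 0.0f) return false;
    float t = -b - std::sqrt(disc);
    return t > 1e-4f && t < maxT;
}

// The "shadow ray": same test, fired from the hit point toward the light
// and cut off at the light's distance. Any hit means the point is shadowed.
bool inShadow(Vec3 p, Vec3 lightPos, const Sphere* occluders, int count)
{
    Vec3 d = sub(lightPos, p);
    float dist = std::sqrt(dot(d, d));
    Vec3 dir = { d.x / dist, d.y / dist, d.z / dist };
    for (int i = 0; i < count; ++i)
        if (hitsSphere(p, dir, occluders[i], dist))
            return true;
    return false;
}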

siggraph 2002 documents have it all..

dorbie
05-11-2002, 07:14 AM
The cache problems with rasterizing textures are handled well because with MIP mapping and screen-region-local rendering you get a lot of reuse and even sequential access. You are saying that because one class of problems has been solved, another is equally tractable!

Again you don't draw any distinction between the various problems you want to solve. As for using better than 8-bit ALUs, that buys you some raw compute power but doesn't resolve the other issues.

This is all slightly amusing because I suspect that you'd hate a real ray tracing system because of the development paradigm it would impose on you. You've ignored this and many other points I made.

It seems that for very complex databases a ray tracing approach makes most sense for the first hit, provided your memory system can store the entire scene, and you do indeed get some cache coherency there, but the paradigm is an inherently retained-mode system of rendering; those two things are a bad combination. Anything resembling a state change ends up happening at the fragment level, depending on intersection results. This isn't of course why you extol the virtues of ray tracing; all that other hard stuff you mention is a different class of problem altogether, with completely different issues. This is why it is important to draw some distinction between these problems.

It would also help if you were prepared to think about how you'd want to program a system like this, because it wouldn't be about implementing ray-plane intersections & bounds tests; most of the efficiency would come from minimizing the tests through some traversal of an abstract data representation. Would this be imposed by the hardware or would you write it?

A central feature of a programmable system like this would be the ability to write a shader which requested an incident color (radiosity) at the fragment level from an arbitrary vector traced into the database. So you have a rasterizer requesting (and possibly blocking on) ray-database intersections. Would this be cached in a really deep framebuffer and post-processed with alpha in a subsequent fragment pass, or done there and then, blocking on the results? One doesn't seem very programmable to me, or even as recursive as it needs to be; the other seems to have nasty implications for stalling the fragment processing. Discuss.

Remember that at the fragment level you're going to need your big database traversal, with an abstract structure to optimize and minimize the number of ray/primitive tests.

[This message has been edited by dorbie (edited 05-11-2002).]

dorbie
05-11-2002, 07:34 AM
P.S. No Dave, it's a very simple teapot; there's no scene or shading complexity, and the work to perform a trick like this is just going through the motions. I might use some arithmetic overloading of various shading operations to perform a ray-triangle intersect (and edge test) in hardware, but if the rays don't selectively traverse the database there's no point. A key issue is how this was implemented; showing every triangle to every fragment doesn't win you anything, so the devil is very much in the details here. To show a picture and say "see, it can be done" shows you either don't understand this or are deliberately throwing in a non sequitur.

I can see we're going to have a spate of "hardware ray tracing" on new hardware as it arrives. I can hardly wait, sigh.

[This message has been edited by dorbie (edited 05-11-2002).]

davepermen
05-11-2002, 08:15 AM
Originally posted by dorbie:
The cache problems with rasterizing textures are handled well because with MIP mapping and screen-region-local rendering you get a lot of reuse and even sequential access. You are saying that because one class of problems has been solved, another is equally tractable!
dependent texture reads are BY NO MEANS a lot of reuse and prefetch. learn why gpus can work that fast anyway (because while waiting they can just continue with the next fragment, maybe? because of the parallelism of tasks? uh..)



Again you don't draw any distinction between the various problems you want to solve. As for using better than 8-bit ALUs, that buys you some raw compute power but doesn't resolve the other issues.
well.. having floating point for pixel shading will solve a lot of problems for rasterizers as well. for raytracing it means that the distinction between pixel and vertex shaders drops, so they can be used interchangeably, with no rasterizer really needed in between.. that does not _solve_ anything, but it saves a lot of tweaking.

what it solves is easy to see: as long as you have to store all the geometry with 8 bits per component, you only have a 256x256x256 grid of positions to trace against.. not _that_ wonderful..


This is all slightly amusing because I suspect that you'd hate a real ray tracing system because of the development paradigm it would impose on you. You've ignored this and many other points I made.
hm? it's very funny to see that you don't really know how to get raytracing working on gpus, and that's all i'm talking about. on normal hardware (cpus) it's an easy task to get it working, and even if my tracers are not very fast (no assembler used, no sse, no 3dnow, no mmx or anything, which results in quite low speed on a p3 500) they work quite well.

on the cpu you'll get a lot of problems you don't get on a gpu, because of stalling and memory accesses etc. this is much less of a problem on gpus, because they are designed to manage those problems themselves. that's not a feature of the rasterizer, that's a feature of the hardware..


It seems that for very complex databases a ray tracing approach makes most sense for the first hit, provided your memory system can store the entire scene, and you do indeed get some cache coherency there, but the paradigm is an inherently retained-mode system of rendering; those two things are a bad combination. Anything resembling a state change ends up happening at the fragment level, depending on intersection results. This isn't of course why you extol the virtues of ray tracing; all that other hard stuff you mention is a different class of problem altogether, with completely different issues. This is why it is important to draw some distinction between these problems.
well.. about the state changing: from the gf3 on i can manage 256, 512 or 1024 (using these numbers to give you a hint: *HINT*) textures per pixel, just choosing which one i need. state changing?.. well.. the objects possibly use different material parameters, but the material structure will be the same (at least for the start), so it's a sort of fixed-function pipeline. i don't need more; i don't even need more on programmable rasterizers, i just use the programmability to set up a nicer base material..


It would also help if you were prepared to think about how you'd want to program a system like this, because it wouldn't be about implementing ray-plane intersections & bounds tests; most of the efficiency would come from minimizing the tests through some traversal of an abstract data representation. Would this be imposed by the hardware or would you write it?
the thread is about programmability on the next hardware generation.. guess what? i will program that next-gen hardware to do the job for me.. more or less.. how? UTFG.. use our lovely google and finally read the document which gives nice inspiration.. that is the name: "Ray Tracing on Programmable Graphics Hardware" and the filename (should be enough for google): "rtongfx.pdf"


A central feature of a programmable system like this would be the ability to write a shader which requested an incident color (radiosity) at the fragment level from an arbitrary vector traced into the database. So you have a rasterizer requesting (and possibly blocking on) ray-database intersections. Would this be cached in a really deep framebuffer and post-processed with alpha in a subsequent fragment pass, or done there and then, blocking on the results? One doesn't seem very programmable to me, or even as recursive as it needs to be; the other seems to have nasty implications for stalling the fragment processing. Discuss.
hm.. i'm a little irritated by the text (in this edit box i can't read it that well..)
i'll comment on that later..
but i know one thing.. i'm talking about a raytracer engine, not a raytracer api. i don't need programmability at the end, i need a raytracer..

on next-gen hardware it will be quite difficult to arrange everything to get it working, that's right.. but it will be the first time it is possible, and that's cool enough to push it. if hardware designers realise that pushing raytracers could mean money (at least for workstations, for example..) they will design hardware in a more "raytracer-friendly" way.. meaning the rasterizer part can be enabled/disabled..
we'll see..
i don't see big problems in most of the issues you raise; i know they _ARE_ there, but they are not major problems.. they are just used every time to bash raytracing and claim it will never be possible. but it is possible, quite well even, and this quite-well-running stuff is only on a p4 2GHz, not on hardware designed for raytracing..

dorbie
05-11-2002, 10:57 AM
Dave, the issue is not whether it can be done but whether it's worth doing. You can insult me by implying I don't know how to do this, but in reality some of the implementation decisions are arbitrary, and if you don't care about relative merits then what's the basis for evaluation? Ray tracing == good?

A lot of this is in line with my expectations, especially the multipass for reflections & storing vectors to the framebuffer, and the need for a higher-level traversal as the key component. I just don't see how you can take this paper as a basis to refute all of what I've said.

The most interesting part for me was the ray traversal of a voxel texture to do a dependent fetch of a geometry texture; I hadn't anticipated that, largely because the problems with it seem obvious. Maybe they can keep it on chip and make it go merely slow for modest models.

It's not clear how this relates to the ATI paper; I doubt the interesting part is in any way related.

This is not a discussion we'll agree on, I think, since you're advocating ray tracing rather than evaluating it, and have cited features that are beyond the merely impractical to bolster the case.

[This message has been edited by dorbie (edited 05-11-2002).]

davepermen
05-11-2002, 11:21 AM
yep. why this whole issue?
because raytracing is good.
i want a simple, correct, accurate representation of the real world. the only approach that _can_ get as close to it as we want, given enough horsepower, _IS_ raytracing, the _ONLY_ way. rasterizing _CAN'T_ do any of the more complex stuff without hacks, raytracing can. (and if you look at today's complex stuff, it all _works_ with rays until the results get merged into some textures so the rest can be done on rasterizers, but all the work before that has to be done with raytracing...)

we can't stay with the funky rasterizers forever; they don't solve anything more than plotting triangles on screen. what they can do with that is awesome, and i love shrek, but it does not mean we can do everything with it in general. we have to do special hacks. that's why we need a different shader for every stupid feature. we can't just have all the features available all the time.

raytracers are an easy and general solution; you don't even need linear algebra (i mean matrices) to use them, you can stick with quaternions, vectors, and calculus for the more complex stuff.

today, no, a raytracer can't do what a rasterizer can at the same speed. but just saying "hey, rasterizers are fully boosted now, raytracers are not" does not mean we have to drop raytracers forever. can't we push them anyway, making them faster and faster until they _CAN_ overtake rasterizers? and next-gen hardware is the first that gives us the possibility to power raytracing up with the help of the gpu, which is quite handy

just open your eyes; those fakes won't help you out that much longer.. funny little water ripples on reflections are nice, but if you want huge stormy waves as in "the perfect storm", you have huge problems..

if you want to do volumetric fog (a lot of people i've seen wanted to and failed), you have to hack around with your scene graph quite a lot..

global effects need global knowledge of the scene.. reflections need it (real ones as well as fakes), fog needs it, transparent objects need it (that's where we z-sort) and so on..

raytracers mean you don't have the problem of switching between planar reflections and cubemap rendering for different reflective objects
raytracers mean you don't have to z-sort anything at any point in the rendering
raytracers mean you _can_ have everything realtime

IF we get the hardware vendors to give some support. but as long as we all sit on geforces and love nvidia we can't take any step further (we didn't, even with the funky shaders they gave us.. which apparently are _NOT_ really programmable, texture shaders i mean.. and register combiners, i have enough of them on my gf2 to do whatever i want)

vertex programs are great, but in the end all i use them for is pushing through some interpolated point-to-point vectors so i can do the whole shading per pixel, because vertex shading is a hack we have to drop as well (why should we push thousands of triangles if one can represent a flat wall?)

and rasterizers are possibly easier for guys who don't like to sort their data, but i prefer having my stuff structured and clean, and then raytracing is _VERY EASY_ compared to rasterizing..

zed
05-11-2002, 11:49 AM
i feel this is a lost cause, but dave (i assume that's your first name), at one stage everybody in their graphics programming lives believes that raytracing is the holy grail, but after u discover a few of its limitations u normally change ya mind, eg raytracing doesnt tend to do GI very well (unless u pump a sh*t load of rays into the scene, which is not gonna be practical even if cards can do 1000x the tests)

to contradict myself http://www.opengl.org/discussion_boards/ubb/smile.gif i mentioned this before but i think opengl2.0 aint gonna have display lists (personally i dont use them in opengl1.X), but i can see a use for them in the future if the card (in virtual memory or whatever) could store the whole scene. think of the benefits: GI, reflections etc. then again theres the problem of how the card is gonna store that 1000km2 scene with hugely detailed trees etc. all the stuff im doing is moving towards this, its like imploding, the grand unified theory of CG. definitely i feel we're in a golden age of computer graphics http://www.opengl.org/discussion_boards/ubb/smile.gif

dorbie
05-11-2002, 12:05 PM
Ray tracing doesn't mean you can have everything real-time. Yes it's simple, it's one solution to many problems, the Swiss army knife with one blade, but for many problems it is brute force and slow. This is exactly why it matters that people generally don't differentiate between problems when they advocate ray tracing; I think it's important to do that.

As for sitting on GF and loving NVIDIA, what has that to do with anything? They are going FP if this paper is anything to go by, but they aren't doing it just so you can write a ray tracer in their fragment processor.

dorbie
05-11-2002, 12:15 PM
zed, if you look at the paper Dave cited, they assume that in future you will be able to store triangle geometry in a 3-component floating-point texture and index into it using a dependent texture read (after a ray traverse of a voxel grid to a triangle list, both held in textures), so your texture would effectively become your on-card geometry cache. Display lists would be redundant. That's the essence of their ray-database intersect. I assume you'd invoke the ray trace of the scene by drawing a single quad over the frustum with an interpolated 3-component ray vector which would be used by the fragment geometry engine.

A key issue, it seems, is the model size: you're fetching texture during a ray traversal of a regular grid. You must partition your model, and you have a lot of texture fetches striding through memory for each fragment, but you have to trade that number against the triangle intersect tests per voxel. That's an unhappy tradeoff, though there may be other structures that would work better for bigger, more detailed models. They claim they're compute limited rather than bandwidth limited, but how does strided fetch latency affect real performance? Very badly, I'd guess, unless you can keep the voxels on chip, which means keeping them small, which means more triangle tests. Maybe if you had a 1-bit texture and only fetched the triangle pointer from the bigger off-chip voxel list it would help, but they store a full pointer at every voxel in the paper. Maybe 3 voxel textures, chosen by ray orientation, to improve fetch coherency...? Maybe a vector-based prefetch...?
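For anyone who hasn't read the paper, the grid scheme it maps onto textures looks, on the CPU, roughly like this (a minimal C++ sketch; the structures and names are illustrative, not the paper's actual layout, and on the GPU the per-cell list index becomes a texture coordinate):

#include <vector>
#include <cmath>

struct Grid {
    int nx, ny, nz;                        // grid resolution
    float minX, minY, minZ;                // world-space origin of the grid
    float cell;                            // cell size (cubic cells for simplicity)
    std::vector<std::vector<int>> cells;   // triangle indices per cell

    const std::vector<int>& at(int x, int y, int z) const {
        return cells[(z * ny + y) * nx + x];
    }
};

// Walk the voxels pierced by a ray (3D DDA in the style of Amanatides & Woo)
// and hand each cell's triangle list to a callback until one reports a hit.
// Assumes the ray origin is already inside the grid bounds.
template <class TestCell>
bool walkGrid(const Grid& g, float ox, float oy, float oz,
              float dx, float dy, float dz, TestCell testCell)
{
    int ix = (int)((ox - g.minX) / g.cell);
    int iy = (int)((oy - g.minY) / g.cell);
    int iz = (int)((oz - g.minZ) / g.cell);

    int stepX = dx > 0 ? 1 : -1, stepY = dy > 0 ? 1 : -1, stepZ = dz > 0 ? 1 : -1;

    // Ray parameter of the next voxel boundary on each axis.
    auto firstT = [&](float o, float mn, int i, int step, float d) -> float {
        if (d == 0.0f) return 1e30f;
        float edge = mn + (i + (step > 0 ? 1 : 0)) * g.cell;
        return (edge - o) / d;
    };
    float tMaxX = firstT(ox, g.minX, ix, stepX, dx);
    float tMaxY = firstT(oy, g.minY, iy, stepY, dy);
    float tMaxZ = firstT(oz, g.minZ, iz, stepZ, dz);
    float tDeltaX = dx != 0.0f ? g.cell / std::fabs(dx) : 1e30f;
    float tDeltaY = dy != 0.0f ? g.cell / std::fabs(dy) : 1e30f;
    float tDeltaZ = dz != 0.0f ? g.cell / std::fabs(dz) : 1e30f;

    while (ix >= 0 && ix < g.nx && iy >= 0 && iy < g.ny && iz >= 0 && iz < g.nz) {
        if (testCell(g.at(ix, iy, iz)))    // ray/triangle tests happen in here
            return true;                   // hit found, stop walking the grid
        if (tMaxX < tMaxY && tMaxX < tMaxZ) { ix += stepX; tMaxX += tDeltaX; }
        else if (tMaxY < tMaxZ)             { iy += stepY; tMaxY += tDeltaY; }
        else                                { iz += stepZ; tMaxZ += tDeltaZ; }
    }
    return false;                          // left the grid without a hit
}

Every step of that loop is, on the hardware in the paper, another dependent texture fetch, which is where the strided-access worry above comes from.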

Anyway it's fascinating, and I'm not trying to pick holes in a great piece of work.


[This message has been edited by dorbie (edited 05-11-2002).]

davepermen
05-11-2002, 12:24 PM
well.. i see a lot of people around who thought a gf3 was the holy grail, and now they think the gf4 is the holy grail, and so they keep buying every one of them without looking left or right. currently the best solution for programmable hardware is shown by ati, and the next ones are following soon. but everyone sticks with texture shaders and register combiners, which are both very awkward to program (don't ask me, i know the rc's, i've studied them for over 2 years now and i know how powerful they are, but they are still awkward to use)

i don't see the point in your argument. yeah, rasterizers are pushed to the limits and very fast because of that, but hey, that should not make you close your eyes to the rest.

i think most of you are blinded by the fact that rasterizers are not very good at the stuff they actually do today.. that's why we start pushing extensions like render-to-texture, to render several views of the whole scene for every place we need a reflection, a refraction or soon lighting as well. this leads to rendering our scene up to several tens or hundreds of times (depending on what ya wanna do)

at this point the raytracer catches up, because it needs much less switching of memory targets and all that.. in fact, a raytracer can be written such that the tracer itself needs 10 floats of state, plus the screen buffer to draw into..

always remember that intersecting a ray with a scene of N triangles is O(log N) in a well-organized raytraced scene.. in a triangle soup it's O(N), for sure.. but hey, really.. today's hardware is fast enough to push a big triangle soup through for you, and you don't have to care.. AS LONG as you don't want to use the advanced features of the hardware.. if you want env-bumped surfaces everywhere and all that, you have to start organizing your scene again, culling like crazy and doing lod as well..

well, with a framebuffer that can store a whole ray you could even keep a polygon-pusher pipeline like before, simply drawing your triangles in a different way. you have to draw every triangle today anyway, so why not intersect each one with a bunch of rays instead of plotting a bunch of pixels? no big difference.. and for the additional reflections and refractions you push all the triangles through again.. this leads to one pass for the first visibility check, one pass for shadowing, one pass for the first reflection, so three passes, three times all your triangles drawn.. that is not _that_ much work really.. it does _not_ need a good scene graph, does not need intelligence, all it needs is a different pixel-shading setup..
well yes, it's bull****, but it runs quite fast.. it just needs hardware with a huge fill rate, so even a gf4 would be enough..
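The pass structure being described is roughly this on the host side (a stub sketch, not tied to any real API; the two helper functions stand in for binding a fragment program and streaming the triangle soup, and the actual per-pixel ray/triangle test is assumed to live in the fragment program):

#include <vector>

struct Triangle {};

// Stand-ins only: in a real implementation these would be OpenGL calls,
// and each named fragment program would intersect the per-pixel ray
// stored in the framebuffer against the incoming triangle.
void bindFragmentProgram(const char* /*name*/) {}
void drawAllTriangles(const std::vector<Triangle>& /*soup*/) {}

void renderFrame(const std::vector<Triangle>& soup)
{
    // Pass 1: primary visibility. The framebuffer ends up holding, per pixel,
    // the nearest hit (depth, normal, material) along the eye ray.
    bindFragmentProgram("intersect_primary_rays");
    drawAllTriangles(soup);

    // Pass 2: shadows. Rays now run from each stored hit point to the light;
    // any intersection marks that pixel as shadowed.
    bindFragmentProgram("intersect_shadow_rays");
    drawAllTriangles(soup);

    // Pass 3: first reflection bounce. Rays are the eye rays reflected at the
    // stored hits; the result is blended into the shaded image.
    bindFragmentProgram("intersect_reflection_rays");
    drawAllTriangles(soup);
}

int main()
{
    std::vector<Triangle> soup;   // plain triangle soup, no scene graph
    renderFrame(soup);
    return 0;
}

Each extra bounce is one more full pass over the soup, which is exactly the fill-rate bill mentioned above.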

wazzup? where is your motivation to start helping with the _R_evolution instead of following the evolution of the others?

john carmack pushes hardware vendors to build in features for him; all these new features came from some inspiration from him (and a few others).. that means if we get some raytracing architecture working with some crappy demo, then hardware vendors can get inspired as well and build in new extensions to do the same thing faster.. they did it with shadow maps, they did it with depth_clamp, they did it with texture shaders, so why not some raytracing helpers? (if you go to the nvidia page and look at a lot of old demos, they do stuff like 16-bit shadow mapping, environment-mapped bump mapping and much more, damn slow, but they did it.. before the hardware came.. this is a big *HINT* for everyone who needs a feature on the next gpus)

dorbie
05-11-2002, 12:45 PM
Dave, in reality there were people in the industry advocating floating-point framebuffers when Carmack was still writing his Doom engine and advocating perspective correction in software engines (I'm not saying I was one of them, I wasn't).

Carmack has been the most visible and recent of them. A few years ago it wasn't clear that game cards could ever justify extended range and precision framebuffers. Anyone who wrote even a simple emboss bumpmap algorithm instantly saw a need for more precision, and for signed arithmetic that doesn't clamp at 0 or 1 (I mean implemented it, as opposed to lifting it verbatim from someone's example code). Even with this revelation you'd typically have been late to the party.

I think Carmack and his advocacy deserve a lot of credit for giving hardware developers some faith that if they built it, developers would exploit it. But it didn't hurt that the hardware developers who left SGI had just spec'd out or worked on a piece of hardware that had floating-point framebuffer and fragment arithmetic support. Nor did it hurt that Carmack saw a presentation from Mark Peercy of SGI showing that with FP framebuffers and 'dependent texture' you could implement RenderMan with multipass on hardware. Although Carmack had been advocating ERP before then, I think he became bullish on real FP right around that time.


[This message has been edited by dorbie (edited 05-11-2002).]

davepermen
05-11-2002, 01:56 PM
i'm with you.. http://www.opengl.org/discussion_boards/ubb/smile.gif

when you look at the nvidia papers explaining the so-called new features, you see references to documents from 1960 and such.. well.. our "problems" are old, and the solutions, where they exist, were found long ago.. that's not the point, that's just plain fact.. (there are even new topics coming, but they are so complicated my brain crashes on them just reading the title http://www.opengl.org/discussion_boards/ubb/wink.gif)

all i want to say is: it's quite easy to change the future of hardware if you show off something cool. show it slow, show it in software, but show it. i'm now trying to get a simple gf3 or something to show a simple raytracer on it.. all i need is a lot of passes, so i hope to get a gf4ti4600 to benchmark it on that hardware..
but it will be a funny hack, nothing more.. as long as there is no floating point everywhere in the calculations, you can't represent a scene in a good way (mostly because of the 0-1 range, HATE IT HATE IT HATE IT http://www.opengl.org/discussion_boards/ubb/wink.gif)

i think i'll buy a new pc now.. with a nice fast amd cpu and a geforce4, though i hate both.. they are mainstream and i want to show what mainstream can already do, so i can't code for an ati radeon8500 at the moment, it's not mainstream hardware.. i hope the radeon10000 will be better, go for ati! then i'll buy such a card and work on this.. and and and.. (oh, and i think amd makes damn cool x86 cpus, but.. well.. x86 is **** imho. it works, but its design (mostly the fpu) is somehow stupid.. good that i have a compiler that does it for me, that's why i don't move to sse2 on the p4 http://www.opengl.org/discussion_boards/ubb/smile.gif)

and, well.. one day.. we'll see global illumination in realtime.. and no one will say they don't want it.. it will make games look more real than episode II, and it will actually help solve a whole lot of problems. say, the home enthusiast who wants to place his dolby digital surround sound the best way in his house can calculate how the sound flows. architects can do all their design in realtime and see how it looks the whole time; the remaining power can be used to calculate whether the building will be stable..
movies can be manipulated in realtime, news reports on tv won't tell the truth anymore, big brother is watching you.. the future is bright, go for it! http://www.opengl.org/discussion_boards/ubb/wink.gif

dorbie
05-11-2002, 06:08 PM
One last niggle Dave, you said:
"well no, the lines should not be straight if i capture my scene with a camera with lenses. and like that. no. if i take a cam, go out, film a straight line, then i see the curve on my tv at home."

You have a monitor or other display. It's flat and has a rectangular display region. It forms a projection with your eye looking at the monitor. If any straight line drawn on the monitor is rendered as curved then it is geometrically wrong. It has indisputably been rendered with the wrong projection. Lenses of all sorts, even wide angle lenses are all designed to produce straight lines and avoid what they call barrel or pincushion distortion, only some radical fisheye lenses have anything else as a design criteria. Zoom lenses tend to have very slight barelling when wide and pincushion when at full zoom because of design limits, this is considered an undesirable artifact. If you insist on reproducing this it can easily be done with render to texture and conventional graphics.
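A minimal sketch of that last suggestion, done as a CPU image remap for clarity (on hardware the same warp would be a render-to-texture pass followed by a warped draw; the distortion constant k is an arbitrary illustrative value, and the Image type is made up):

#include <vector>
#include <algorithm>

struct Image {
    int w, h;
    std::vector<unsigned char> rgb;                    // w * h * 3 bytes
    unsigned char* at(int x, int y) { return &rgb[(y * w + x) * 3]; }
};

// Render with the normal straight-line projection first, then warp.
// k > 0 bows straight lines the way a simple wide lens does.
Image distort(Image& src, float k)
{
    Image dst { src.w, src.h, std::vector<unsigned char>(src.rgb.size(), 0) };
    for (int y = 0; y < dst.h; ++y) {
        for (int x = 0; x < dst.w; ++x) {
            // normalized coordinates, image center at (0,0)
            float nx = (2.0f * x) / dst.w - 1.0f;
            float ny = (2.0f * y) / dst.h - 1.0f;
            float s = 1.0f + k * (nx * nx + ny * ny);  // radial scale factor
            int sx = (int)((nx * s + 1.0f) * 0.5f * src.w);
            int sy = (int)((ny * s + 1.0f) * 0.5f * src.h);
            if (sx >= 0 && sx < src.w && sy >= 0 && sy < src.h)
                std::copy(src.at(sx, sy), src.at(sx, sy) + 3, dst.at(x, y));
        }
    }
    return dst;
}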

[This message has been edited by dorbie (edited 05-11-2002).]

GeLeTo
05-12-2002, 12:00 AM
Most CGI movies nowadays are not raytraced!!! Most of them (all the Pixar ones, Final Fantasy...) are rendered with the PRMan renderer, which does not support raytracing yet. I think the PDI renderer (Antz, Shrek) does not support raytracing either. Raytracing anything that does not involve uber-realistic reflections and refractions is MUCH slower and gives you the same visual quality.

zed
05-12-2002, 11:54 AM
Originally posted by dorbie:
zed, if you look at the paper Dave cited, they assume that in future you will be able to store triangle geometry in a 3-component floating-point texture and index into it using a dependent texture read (after a ray traverse of a voxel grid to a triangle list, both held in textures), so your texture would effectively become your on-card geometry cache. Display lists would be redundant.
ok i understand, a nice method which i'd honestly never have thought of before http://www.opengl.org/discussion_boards/ubb/smile.gif
though by display lists i did mean something completely different from the current method

davepermen
05-12-2002, 12:08 PM
that's the deal. vertex arrays will be the same as textures, and quite soon (both store floating-point values; it's just a matter of the api giving you one generic buffer type to use instead of textures and vertex arrays and index arrays and all that stuff..)

hope that comes soon; it would give you the possibility to render into a vertex array, which means, for example, updating physics on the gpu.

dorbie
05-12-2002, 04:56 PM
Dave, the paper describes two implementations. The useful one assumes branching and looping in the fragment instructions. The other is multipass, with the branching done via the stencil buffer; that would be REALLY slow, not to mention brute force, with all fragments doing the full work even after an early hit.

I don't think you'll get the interesting programmability in the fragments in next-generation hardware. It's clear these guys have inside knowledge of future hardware and have let the cat out of the bag in their paper with the distinctions they draw.

The key question is: is their branching, looping fragment processor really on the drawing board or is it just on their wish list? I'm sure they're lobbying NVIDIA, who seem to have been very helpful so far, or did NVIDIA have it planned anyway?


[This message has been edited by dorbie (edited 05-12-2002).]

V-man
05-12-2002, 07:07 PM
Originally posted by GeLeTo:
Most CGI movies nowadays are not raytraced!!! Most of them (all the Pixar ones, Final Fantasy...) are rendered with the PRMan renderer, which does not support raytracing yet. I think the PDI renderer (Antz, Shrek) does not support raytracing either. Raytracing anything that does not involve uber-realistic reflections and refractions is MUCH slower and gives you the same visual quality.

Hmmm, I thought that all of these CGI films used raytracing in the end.

Modelling -> rendering some basic scenes with opengl -> raytracing to achieve the final product

I've heard many times that these guys have hundreds of SGIs running in parallel for months to generate a few hours of film.

I haven't dealt with raytracing yet, but from the images I've seen (since childhood), it appeared to be the holy grail. What is really important is the math for the lighting and also the detail of the models.

At least so I thought. The methods involving textures in some way to fake detail have impressed me, but they don't solve the general case. The general case requires extreme detail and heavy computation... there's no escape.

I have a question for the experts: why use textures at all? Why not subdivide a huge triangle that is to be textured into as many triangles as needed so that each receives a texel (a single color) and render like that?
Is bilinear or trilinear filtering really cheaper than transforming vertices?
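To put rough, purely illustrative numbers on that question (nothing here is measured, it is just counting):

#include <cstdio>

int main()
{
    // One quad textured with a 256x256 map, versus tessellating the quad
    // so that every texel becomes its own flat-colored piece of geometry.
    const long long texSize      = 256;
    const long long texels       = texSize * texSize;   // 65,536 colors
    const long long texturedTris = 2;                   // the quad itself
    const long long microTris    = texels * 2;          // ~2 triangles per texel

    std::printf("textured quad: %lld triangles, color from filtered fetches\n",
                texturedTris);
    std::printf("per-texel geometry: %lld triangles to transform and rasterize\n",
                microTris);
    return 0;
}

That is 131,072 triangles for one modest quad before any equivalent of bilinear blending, which hints at why filtering a texture tends to be the cheaper side of the trade.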


V-man

dorbie
05-12-2002, 08:09 PM
V-man, this is largely wrong.

For these kinds of things nobody uses OpenGL for production rendering. The modelling tools use it for interactive modelling. SGI used to dominate both tools and rendering, but that has changed. _Maybe_ there are those who accelerate their rendering with HW OpenGL, but I doubt this happens in most productions. Live broadcast stuff is the exception where OpenGL is used.

Generally they use SGI systems (typically desktop systems) for tools (Maya etc.) (again this is changing, lots of NT & others), and render-farm clusters running software rendering engines. The render farms are often heterogeneous, but it's usually whatever gives the best price/performance they can buy at the time and will run their software with minimum hassle. Some also use the workstations for rendering when they are idle or have a spare CPU. It's definitely true that many of the most popular (and fastest) renderers don't use ray tracing. They often don't use the renderers that come with the modelling packages but buy or write software that imports RenderMan scene descriptions and renders them.

This makes a lot of sense when you consider that many scenes are composited after rendering and are actually rendered as separate layers or composited with live action. Fire a ray out there and unless you have made an environment map it hits nothing, because there's nothing to hit.

Even the rendering engines which do ray trace have options to use shadow-buffer-style schemes rather than ray tracing for occlusion; it just makes more sense.

Stuff like leaves shadowing a character against a forest backdrop is usually a projected texture and almost never actually ray traced. For one thing they prefer more artistic control of the shadow, and adjusting the trees to get an effect would be impractical.

Disclaimer: I don't actually do this stuff, I know a few people who either do or did.

V-man
05-12-2002, 09:07 PM
Dorbie,

That is certainly possible (software rendering on multiprocessor systems), but having hw acceleration has some value. Also, OpenGL's design is excellent. What else do people want for a quick rendering? Perhaps OpenGL's extensions make things problematic, or these companies are running on old scalable (expensive) machines so they don't care for an upgrade??

Maya, 3D Studio and SoftImage all use OpenGL. I know one of them has a speedy raytrace mode for a quick preview.

V-man

dorbie
05-12-2002, 09:16 PM
V-man, HW acceleration might help, but generally that's not how things are done. Btw, I wasn't talking about accelerated rendering in general but about accelerating part of a rendering algorithm, for example the first hit in a ray tracer.

In future more will be possible, but right now the framebuffers lack the programmability and the precision. OpenGL's design may be excellent but it cannot currently support what is needed for production rendering.