View Full Version : // ;-p



Ozzy
05-04-2002, 02:56 AM
Hello,

I'm using GeForce boards and static data with the VAR (vertex array range) extension, and I'm trying to get the best performance with parallelisation in mind.

When I say parallelisation, I'm not thinking of multi-threading or anything like that; I just want to order things for the best CPU vs GPU efficiency! ;)

So in other words:
I'm displaying my scene with one frame of delay, so the pre-ordered primitives (MIPMAP_LEVEL->MATERIAL_IDS->LIGHT_IDS)
are ready for drawing, and with static data and VAR everything is located onboard, so the GPU should be able to work alone with its own memory.
After this filling sequence the CPU shouldn't deal with OpenGL again, except maybe for small state changes like fog and ambient and so on.. (is this a problem? could it break the // process?)
But anyway, the most important jobs for the CPU after this are:
inserting primitives into the ordering table for the next frame (game display),
game management (entity behaviour and so on),
music replay.

So what am I trying to do?
My goal is to make the GPU & CPU work independently, each on its own side.
For example, the more CPU time you use before sending the primitives that draw your scene, the less GPU time you'll get
to draw it before the end of the frame/vertical retrace period (please don't tell me to work with vsync off, thx),
simply because the GPU will only start its work after a bunch of cycles burned by the CPU!! ;))

Note: unfortunately I can't test this on a GeForce because I'm working on a TNT2 at the moment, and on it this kind of implementation is sequential: displaying primitives waits until drawing is complete! :(
So it changes nothing whether I send at the beginning of the frame or after the CPU has finished its other tasks.

What do you think of the technique? Is this the best method for GPU efficiency?

Moreover, I've noticed strange results with & without primitives ordered by LIGHT_IDS as the final sort key.
It looks like unordered prims (per light) are faster when the GPU does the lighting, while nicely ordered prims are faster on the software FPU/SIMD GL implementations...

funny eh? :)
Any idea?



[This message has been edited by Ozzy (edited 05-04-2002).]

jwatte
05-04-2002, 07:51 AM
What is your question?

It's up to the driver to give you all the parallelism your hardware can provide. As long as you don't read any data back, and stay on the hardware path (which is fairly wide these days) I've found nVIDIA hardware to be able to enqueue pretty much any operation and return from the driver quickly to let me get on with my CPU work.

If you use triple buffering, then there's no reason for the CPU to ever wait for the GPU, unless you get/read state back, or you render frames faster than they can be displayed.

V-man
05-05-2002, 08:54 PM
>>>>>>So what i'm trying to do?
My goal is to make the GPU & CPU work independently on their own side.
As an example, the more CPU time u use before sending primitives to draw your scene, the less GPU time you'll get
to draw it until the end of the frame/vertical retrace period (please don't tell me to work with vsync off, thx)
just bcoz GPU will start its work after a bunch of burned cycles by the CPU!! ;))<<<<<<

OK, but you need to know when to begin processing so that you stay in sync with the v-retrace. Someone asked a question like this a long time ago.

The timing table should look something like this, assuming you keep one copy of the data for the GPU side and one for the CPU side:

|<------------ 1/frequency seconds ------------>|
-------------------------------------------------
 issue    ||  process      ||  GPU completed
 opengl   ||  geometry, do ||  scene, flush,
 commands ||  AI on CPU    ||  swap
-------------------------------------------------

So you issue opengl commands, which go through the CPU; then you begin processing for the next cycle (next scene). Meanwhile the GPU is executing the opengl commands, both GPU and CPU finish just before the retrace, and the next cycle begins.

Hope the idea is doable.

V-man

[This message has been edited by V-man (edited 05-05-2002).]

jwatte
05-06-2002, 07:03 PM
You only waste CPU cycles waiting for the retrace, and GPU cycles sitting idle, if your card doesn't do triple buffering. With triple buffering, or with vsync turned off, you get all possible overlap between the GPU and the CPU.

The easiest answer to your question is: "turn off vsync in the control panel"

Ozzy
05-06-2002, 10:16 PM
Sorry jwatte, I *don't* want to turn vsync OFF because I want a nice & smooth refresh! ;)
Moreover, my scene already takes less than one frame to display. So now I'm asking myself, and other advanced coders who have faced this kind of problem: how can I get the best GPU performance during a vertical retrace period? (with vsync ON, of course).
The cherry on the cake would be to get the CPU & GPU to work separately on their queued tasks! ;) So I was wondering whether some internal GL state changes could force the card (GeForce, Radeon series) to finish drawing in a finite time (kind of blocking state changes), and whether there are limits on the number of queued items (display lists and so on) that would cause the CPU to wait until some slots are free in the queue..

But please.. every time it is the same answer with this "disable vsync"!! ;)) I don't want to get 900 FPS drawing a spinning cube here! ;) I want to draw a maximum of primitives in one frame, within the vertical retrace period, with the best & appropriate use of today's HW T&L architectures. ;)



[This message has been edited by Ozzy (edited 05-07-2002).]