How to enable SLI operation?

Hi,
As the nForce4 SLI motherboards are becoming available, I have a question:
When, if ever, will it be possible to enable SLI processing from a custom CAD
application or a GPGPU application? AFAIK, at the moment the driver decides which
applications can use SLI. I would also like to know if there are any application
design guides for SLI-enabled operation.
Thank you

Wait, so SLI only works on apps that have been “enabled” by the drivers? Are you sure? Can anyone confirm or deny this?

If it works on any app, does it matter if the app is windowed or fullscreen?

If it’s anything like the old Voodoo2 SLI system, then that’s bollox…any application will benefit.
I would imagine there would be choices in the display settings panel though…is your application vertex bound or fillrate bound, for instance? That would determine how the work is divided: interleaved scan lines (transforms would occur twice, but fillrate and shader performance would be doubled), or alternate frame rendering.

The driver includes profiles for applications that have been tested with SLI, but you can add new applications yourself.

Our GPU programming guide includes some advice on SLI (p.53):
http://developer.nvidia.com/object/gpu_programming_guide.html

For GPGPU applications you probably wouldn’t want to use SLI mode - you would treat the two cards as entirely separate devices, each with its own graphics context.

How would you create a separate GL context on each card?

Normal context creation goes something like this:

  1. Create a window using CreateWindow()
  2. Get a DC for that window using GetDC(hWnd)
  3. Choose a pixelformat and call SetPixelFormat() to set the pixel format for that DC.
  4. Call wglCreateContext(hDC) to create a render context.
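
In code, the whole sequence is roughly this (a minimal sketch, error checking omitted - note that nothing in it names a particular card):

```
#include <windows.h>
#include <GL/gl.h>

/* Step 1 (CreateWindow) is assumed to have happened already - hWnd is that window. */
HGLRC createContext(HWND hWnd)
{
    PIXELFORMATDESCRIPTOR pfd = { sizeof(PIXELFORMATDESCRIPTOR), 1 };
    pfd.dwFlags    = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER;
    pfd.iPixelType = PFD_TYPE_RGBA;
    pfd.cColorBits = 32;
    pfd.cDepthBits = 24;

    HDC hDC = GetDC(hWnd);                    /* 2. get a DC for the window   */
    int fmt = ChoosePixelFormat(hDC, &pfd);   /* 3. choose a pixel format...  */
    SetPixelFormat(hDC, fmt, &pfd);           /*    ...and set it on the DC   */
    HGLRC hRC = wglCreateContext(hDC);        /* 4. create the render context */
    wglMakeCurrent(hDC, hRC);
    return hRC;
}
```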

What steps in this procedure would change?

I don’t see any option/variable anywhere in typical OpenGL startup code to select which gfx device to use. Is there some Windows method to associate a particular DC with a particular device (gfx card)?

Originally posted by knackered:
If it’s anything like the old Voodoo2 SLI system, then that’s bollox…any application will benefit.
I would imagine there would be choices in the display settings panel though…is your application vertex bound or fillrate bound, for instance? That would determine how the work is divided: interleaved scan lines (transforms would occur twice, but fillrate and shader performance would be doubled), or alternate frame rendering.

It’s not like Voodoo SLI: the screen is divided into two portions, top and bottom (possibly with coverage changing from frame to frame based on load), rather than using the original SLI scheme. SLI I think stood for scan-line interleave; the NVIDIA scheme doesn’t interleave scan lines, so the label is marketing and probably more than a coincidence. In addition, the Voodoo was a simple Glide engine with basic MiniGL support. I’m not sure if that SLI even worked with readback.

I suspect (but don’t know, I’ve never tried the setup) that issues with NVIDIA and SLI may stem from dependency issues like copypixels. Sure, these could be solved with a lot of driver work, but maybe it just ain’t worth it, or it’s a work in progress: there would be stalls involved and complex data transfers, either via the host or through the SLI card interconnect. That could be a lot of work and may take time to get coverage or make robust. On the face of it, without framebuffer dependency issues it’s difficult to see why any application would have a problem utilizing an SLI implementation of either flavor.
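
To illustrate the kind of framebuffer dependency I mean, a feedback pattern along these lines (just a sketch - drawSceneUsingTexture stands in for whatever the app draws) makes frame n+1 depend on the framebuffer contents of frame n, which under alternate frame rendering would live on the other card:

```
#include <GL/gl.h>

extern void drawSceneUsingTexture(GLuint tex);   /* the app's rendering, assumed elsewhere */

/* Each frame samples a texture holding the previous frame's image, then copies
   this frame's result back into that texture for use next frame.
   feedbackTex is assumed to have been created with glTexImage2D at width x height. */
void renderFrameWithFeedback(GLuint feedbackTex, int width, int height)
{
    drawSceneUsingTexture(feedbackTex);

    glBindTexture(GL_TEXTURE_2D, feedbackTex);
    glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, width, height);
}
```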

P.S. On second thought, there’s an outside chance there are classes of dispatch not covered for broadcast to both cards, but that would really be sloppy driver coverage.

Maybe you could ask NVIDIA to characterize the types of rendering actions that aren’t compatible with SLI :slight_smile:

Thank you for the answers, and shame on me for not reading the GPU Programming Guide first.
The main drawback of SLI seems to be that the video memory of GPU 0 must contain the
same data as the video memory of GPU 1, thus limiting the amount of memory and bandwidth
available to the application. So for GPGPU with two cards it would be better not to enable SLI.
What about a new mode, besides the SFR and AFR modes, in which the video memory wouldn’t
have to be duplicated - only the OpenGL buffers? GPU 0’s memory would contain the vertex and
texture data for about 50% of the current frame, and GPU 1’s memory would contain the other ~50%.
GPU 0 would render its ~50% of frame n+1 while GPU 1 finishes the remaining work on frame n;
then GPU 0 would start on frame n+2 while GPU 1 finishes frame n+1. There would be some tricky
copying of buffers when GPU 1 takes over from GPU 0, but the amount of memory available for
geometry and textures would be increased.
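
Just to spell out the interleaving I have in mind, here’s a trivial program that only prints the proposed schedule (frame numbers are arbitrary):

```
#include <stdio.h>

/* Prints the proposed pipelining: GPU 0 starts each frame with its half of the
   scene data, GPU 1 finishes the previous frame with the other half. */
int main(void)
{
    int step;
    for (step = 1; step <= 4; ++step)
        printf("step %d: GPU 0 renders its ~50%% of frame %d | "
               "GPU 1 finishes and presents frame %d\n",
               step, step + 1, step);
    return 0;
}
```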

Oops, I hadn’t seen this either and missed Simon’s reply, sorry :-). Wow, it supports a frame-interleaved mode.

Thanks for the link. It seems pretty thorough - it even transports render targets from alternate frames across cards if you don’t follow their advice.

I’m very impressed; it looks like it’s very easy to use, with no unreasonable restrictions. Awesome job.

Hi,
I still have a few questions for Mr. Simon Green, if he doesn’t mind:

  1. Is there any publicly available specification for the data transfer between the two cards?
    I mean the latency and bandwidth of the SLI bridge connector.
  2. Would it be possible to have the main back-buffer rendering assigned to the first card and
    other helper jobs (like render-to-texture) to the second card?
  3. If two cards are not in SLI mode, but on an SLI motherboard, would they run at 16x or just
    at 8x (as I presume)? In the latter case, would the readback performance of two 8x cards
    be higher than that of a single 16x card?
  4. How will I know, when I create two rendering contexts (for two cards), if the rendering
    contexts are distributed to the two cards, and not just to the first one? I suppose that there
    should also be two threads, since there’s only one current rendering context per thread.

If some of the stuff above sounds silly, please feel free to say so. I’m pondering an
upgrade, and I would like to know more about SLI in case I go for it. Currently I have a
GFFX 5900 on an nForce2 mobo, and either I’ll get a GeForce 6800 GT (with a pre-fitted
AC Silencer) or make the big move to nForce4 with SLI.

The nForce4 supports four x1 PCI Express lanes, and either one x16 OR two x8 PCI Express lanes. Thus, with two cards, you get two x8 slots.

Hi jwatte,

It’s common knowledge that the nForce4 has 20 PCI-E lanes, and I agree that when two
cards are using the bus at the same time only 8x would be available for each. But since
the two PCI-E slots are physically 16x, it may be possible (maybe not on nForce4) to
dynamically allocate 16x to the one card that is uploading/downloading data, while the other
card is just crunching numbers without any bus activity. This dynamic allocation of PCI-E
lanes could be important for GPGPU applications (maybe not now, but for NV48 it will be).

If two cards are not in SLI mode, and have 8x bandwidth each, would it be possible to
transfer data between them in a ‘DMA-like’ mode, without going through the CPU memory?

Although for GPGPU purposes SLI should not be enabled, one obviously needs a SLI
mobo to run two cards. I found no mention of SLI in the GPGPU presentation from
Eurographics 2004. It would be nice to have an updated GPGPU presentation that
covers the usage of two cards (on a SLI mobo, but not in SLI mode).

Although for GPGPU purposes SLI should not be enabled, one obviously needs a SLI
mobo to run two cards. I found no mention of SLI in the GPGPU presentation from
Eurographics 2004. It would be nice to have an updated GPGPU presentation that
covers the usage of two cards (on a SLI mobo, but not in SLI mode).
I would very much like to see something like this for OpenGL. I’ve been looking through the old posts on this board and there doesn’t seem to be a way to set up a separate context on each card to do GPGPU computations. If this is true, I think it’s past time to address this with a WGL (or preferably a GL) extension that allows us to choose which device a context gets created on.
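
Something along these lines is what I have in mind - purely a strawman, all names are made up (hence the XXX suffix) and no such extension exists as far as I know:

```
#include <windows.h>
#include <GL/gl.h>

typedef void *HGPUDEVICE;   /* hypothetical handle for a physical GPU */

/* Hypothetical entry points: enumerate the GPUs in the system,
   then ask for a context on a specific one. */
BOOL  wglEnumGpuDevicesXXX(UINT index, HGPUDEVICE *device);
HGLRC wglCreateContextOnDeviceXXX(HDC hDC, HGPUDEVICE device);
```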

CPU chips are currently hitting a materials limit, so they’re moving to dual-core CPUs, and the next logical step is multi-core. I suspect GPUs will also start hitting this limit sometime in the near/mid future, so SLI and dual-core graphics cards might be the next natural evolution. I can’t find any information anywhere on using multiple graphics cards with OpenGL (except people who say it’s impossible).

It’s not impossible if the driver implements its own version of the multi-monitor architecture of MS Windows - such as nView from NVIDIA. Then, as far as WGL is concerned, there is only one display device - the primary one - and that is what the GL context will be attached to.
I expect that NVIDIA will use a similar approach for using SLI with OpenGL.
Always the best way, bypass Microsoft’s crap whenever possible.
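
If the driver does expose the cards as separate display devices (i.e. you’re not running a single spanning desktop), the obvious thing to try with the standard multi-monitor API is something like the sketch below - though whether the resulting context actually ends up accelerated on the second card is exactly the question:

```
#include <windows.h>
#include <GL/gl.h>

/* Sketch only, error handling mostly omitted. Assumes the second card shows up
   as its own \\.\DISPLAYn device. Creates a tiny window on that device's part
   of the virtual desktop and builds a GL context there; whether the ICD really
   accelerates it on that card is up to the driver. */
HGLRC createContextOnDisplay(DWORD deviceIndex, HINSTANCE hInst)
{
    DISPLAY_DEVICE dd = { sizeof(DISPLAY_DEVICE) };
    if (!EnumDisplayDevices(NULL, deviceIndex, &dd, 0))
        return NULL;                                   /* no such device */

    DEVMODE dm;
    ZeroMemory(&dm, sizeof(dm));
    dm.dmSize = sizeof(dm);
    EnumDisplaySettingsEx(dd.DeviceName, ENUM_CURRENT_SETTINGS, &dm, 0);

    /* Place the window on that device's portion of the virtual desktop. */
    HWND hWnd = CreateWindow(TEXT("STATIC"), TEXT("second card"), WS_POPUP,
                             dm.dmPosition.x, dm.dmPosition.y, 64, 64,
                             NULL, NULL, hInst, NULL);

    PIXELFORMATDESCRIPTOR pfd = { sizeof(PIXELFORMATDESCRIPTOR), 1 };
    pfd.dwFlags    = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER;
    pfd.iPixelType = PFD_TYPE_RGBA;
    pfd.cColorBits = 32;

    HDC hDC = GetDC(hWnd);
    SetPixelFormat(hDC, ChoosePixelFormat(hDC, &pfd), &pfd);
    return wglCreateContext(hDC);
}
```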

Well, it seems that there aren’t many people interested in this topic… :frowning:
I have read that a Quadro 4400 (bridged to PCI-E) has a readback capability of ~1 GB/s, while
a Quadro 1400 (native PCI-E) does ~2.4 GB/s. Could similar values be expected from a PCI-E
GeForce 6800U and a GeForce 6600 GT?
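
For what it’s worth, you can get a ballpark readback figure on your own card with a crude glReadPixels timing loop - a sketch along these lines (assumes a GL context is already current and the framebuffer is at least w x h; real numbers depend a lot on the pixel format and the driver):

```
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <GL/gl.h>

/* Crude readback benchmark: times repeated glReadPixels of a w x h RGBA block
   and returns MB/s. Assumes a GL context is current. */
double measureReadbackMBps(int w, int h, int iterations)
{
    unsigned char *buf = (unsigned char *)malloc((size_t)w * h * 4);
    int i;

    glFinish();                                   /* let pending rendering drain */
    clock_t start = clock();
    for (i = 0; i < iterations; ++i)
        glReadPixels(0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, buf);
    glFinish();
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        seconds = 1.0 / CLOCKS_PER_SEC;           /* avoid division by zero */

    free(buf);
    return ((double)w * h * 4 * iterations) / (1024.0 * 1024.0) / seconds;
}
```

Something like measureReadbackMBps(1024, 1024, 50) after rendering a frame should give a rough number.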

This is somewhat off-topic for this forum, but here goes:

  1. The bridge connector only transfers digital video data. There is no public spec for this. All other data transfer between the cards happens over the PCI-E bus.

  2. Splitting rendering tasks like this would certainly be possible, but there’s no public API for this currently.

  3. PCI-E performance depends entirely on your motherboard. My understanding is that most motherboards only support two 8x slots currently. But this is rarely a bottleneck, at least in game applications.

  4. You will know which rendering will go to which card because you specify the device when creating the GL context. In SLI mode you have no direct control. Yes, you will probably require a separate thread for each context.

Using multiple PCI-E graphics cards and treating them as separate devices is pretty bleeding-edge stuff at the moment. We hope to have a programming example showing how to do this kind of thing soon.

Thanks

-S.

Originally posted by Tzupy:
[b]Hi,
I still have a few questions for Mr. Simon Green, if he doesn’t mind:

  1. Is there any publicly available specification for the data transfer between the two cards?
    I mean the latency and bandwidth of the SLI bridge connector.
  2. Would it be possible to have the main back-buffer rendering assigned to the first card and
    other helper jobs (like render-to-texture) to the second card?
  3. If two cards are not in SLI mode, but on an SLI motherboard, would they run at 16x or just
    at 8x (as I presume)? In the latter case, would the readback performance of two 8x cards
    be higher than that of a single 16x card?
  4. How will I know, when I create two rendering contexts (for two cards), if the rendering
    contexts are distributed to the two cards, and not just to the first one? I suppose that there
    should also be two threads, since there’s only one current rendering context per thread.

If some of the stuff above sounds silly, please feel free to say so. I’m pondering an
upgrade, and I would like to know more about SLI in case I go for it. Currently I have a
GFFX 5900 on an nForce2 mobo, and either I’ll get a GeForce 6800 GT (with a pre-fitted
AC Silencer) or make the big move to nForce4 with SLI.[/b]

Mr. Simon Green, thank you very much!
I just ordered an Asus A8N-SLI :wink:

I just read that the fan on the chipset is spinning at 8,000 rpm, making a racket!!! :eek:
Holding back my A8N-SLI order until I read that someone has a solution for quieting it :confused:
Apparently the nForce4-SLI runs much hotter than the standard nForce4…

I noticed this page

http://www.hardocp.com/article.html?art=NzEx

and this page explains how to make profiles

http://www.hardocp.com/article.html?art=NzExLDM=

It turns out the noisy chipset fan can be replaced with the Zalman NB47J. But the Asus A8N-SLI is in such tight supply… :frowning:
I plan to get a passive Gigabyte 6600 GT (no noise, but it dumps heat inside the case) and then later a 6600 GT factory-fitted
with the AC Silencer (low noise, dumps heat outside the case).
Would it be a problem if the two 6600 GT cards didn’t run at exactly the same core/memory speeds?