Thread: Bad multi GPU performance scaling

  1. #1
    Junior Member Newbie
    Join Date
    Jun 2014
    Posts
    3

    Bad multi GPU performance scaling

    I'm having some trouble running my OpenGL renderer in a multi-GPU configuration. There are two Quadro graphics cards in my computer, with one monitor connected to each card. My renderer creates two windows, one on each monitor, with the correct GPU affinity. After that, two rendering threads are created, each with its own rendering context and a local copy of the data to render. There's no data sharing or synchronization between the threads, and no data sharing between the render contexts.

    The trouble is that there's almost no performance scaling and GPU utilization is below 50%. The framerate is exactly the same as when rendering on only one GPU.

    I can run this renderer in a one thread/one window configuration. In that case, utilization of the selected GPU is almost 100% and the frame time is exactly half of what it is in the situation above. Surprisingly, by running two instances of this renderer I can achieve perfect utilization of both GPUs.

    I have observed similar behavior on Windows 8.1 64-bit and on Linux, both running the latest NVIDIA drivers.

    - What do I have to do to achieve good scaling within a one process/multiple render threads configuration? Do I need a special driver profile for my app? Are there other conditions to meet?

    - Under Windows 8.1, it seems that GPU affinity is set correctly by default from the initial window position. I can verify via the NSight Performance Profiler that each thread is sending commands to a different GPU.
    - Under X11, there are two X server displays and Xinerama is disabled.

    Please, I would be grateful for any tips or suggestions.

  2. #2
    Advanced Member Frequent Contributor
    Join Date
    Apr 2003
    Posts
    666
    One thing that could help would be creating each context in the thread that uses it.

  3. #3
    Junior Member Newbie
    Join Date
    Jun 2014
    Posts
    3
    Quote Originally Posted by skynet View Post
    One thing that could help would be creating each context in the thread that uses it.
    Unfortunately, the contexts are already created from within their threads.

    Here's the simple code I'm using to set up and test my rendering. Maybe there are some errors I'm unable to see...

    Code :
    class window
    {
    public:
     
    	window(int x, int y, int width, int height, const char * title, int affinity = -1)
    	{
    		HINSTANCE hInstance = GetModuleHandle(NULL);
     
    // registration here
     
    		wnd = CreateWindow(title, title,
    			WS_CAPTION | WS_BORDER | WS_SIZEBOX | WS_SYSMENU | WS_MAXIMIZEBOX | WS_MINIMIZEBOX,
    			x, y, width, height,
    			NULL, NULL, hInstance, NULL);
    		if (!wnd)
    			throw std::runtime_error("CreateWindow failed!");
     
    		SetWindowLongPtr(wnd, GWLP_USERDATA, (LONG_PTR)this);
     
    		dc = GetDC(wnd);
     
    		PIXELFORMATDESCRIPTOR pfd =    
    		{
    			sizeof(PIXELFORMATDESCRIPTOR),         // Size Of This Pixel Format Descriptor
    			1,                                      // Version Number
    			PFD_DRAW_TO_WINDOW |                    // Format Must Support Window
    			PFD_SUPPORT_OPENGL |                    // Format Must Support OpenGL
    			PFD_DOUBLEBUFFER,                       // Must Support Double Buffering
    			PFD_TYPE_RGBA,                          // Request An RGBA Format
    			24,                                     // Select Our Color Depth
    			0, 0, 0, 0, 0, 0,                       // Color Bits Ignored
    			1,                                      // Alpha Buffer
    			0,                                      // Shift Bit Ignored
    			0,                                      // No Accumulation Buffer
    			0, 0, 0, 0,                             // Accumulation Bits Ignored
    			24,                                     // 24 Bit Z-Buffer (Depth Buffer)  
    			8,                                      // 8 Bit Stencil Buffer
    			0,                                      // No Auxiliary Buffer
    			PFD_MAIN_PLANE,                         // Main Drawing Layer
    			0,                                      // Reserved
    			0, 0, 0                                 // Layer Masks Ignored
    		};
     
    		int _pixelFormat = ChoosePixelFormat(dc, &pfd);
    		if (_pixelFormat == 0)
    			throw std::runtime_error("ChoosePixelFormat failed!");
     
    		if (SetPixelFormat(dc, _pixelFormat, &pfd) == FALSE)
    			throw std::runtime_error("SetPixelFormat failed!");
     
    		rc = wglCreateContext(dc);
     
    		wglMakeCurrent(dc, rc);
     
     		glewInit();
     
    		if ((WGLEW_NV_gpu_affinity) && (affinity != -1))
    		{
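    			HGPUNV  gpu;
    			// wglEnumGpusNV returns FALSE when no GPU exists at this index
    			if (!wglEnumGpusNV(affinity, &gpu))
    				throw std::runtime_error("wglEnumGpusNV failed!");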
     
    			HGPUNV gpu_list [] = { gpu, nullptr };
    			affinity_dc = wglCreateAffinityDCNV(&gpu_list[0]);
    			if (!affinity_dc)
    				throw std::runtime_error("wglCreateAffinityDCNV failed!");
     
    			int _pixelFormat = ChoosePixelFormat(affinity_dc, &pfd);
    			if (_pixelFormat == 0)
    				throw std::runtime_error("ChoosePixelFormat failed!");
     
    			if (SetPixelFormat(affinity_dc, _pixelFormat, &pfd) == FALSE)
    				throw std::runtime_error("SetPixelFormat failed!");
     
    			affinity_rc = wglCreateContext(affinity_dc);
    			if (!affinity_rc)
    				throw std::runtime_error("wglCreateContext failed!");
     
    			if (!wglMakeCurrent(dc, affinity_rc))
    				throw std::runtime_error("wglMakeCurrent failed!");
    		}
     
    		ShowWindow(wnd, SW_SHOW);
    		UpdateWindow(wnd);
    	}
     
     
    	template <typename F>
    	void run(F fn)
    	{
    		MSG msg;
     
    		bool done = false;
    		while (!done)
    		{
    			while (PeekMessage(&msg, wnd, 0, 0, PM_REMOVE))
    			{
    				if (msg.message == WM_QUIT)
    					done = true;
     
    				TranslateMessage(&msg);
    				DispatchMessage(&msg);
    			}
     
    			fn();
     
    			SwapBuffers(dc);
    		}
    	}
     
    	static LONG WINAPI MainWndProc(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
    	{
    		window * w = (window*) GetWindowLongPtr(hWnd, GWLP_USERDATA);
    		if ((!w) || (w->wnd != hWnd))
    			return (LONG) DefWindowProc(hWnd, uMsg, wParam, lParam);
     
    		switch (uMsg)
    		{
    		case WM_CREATE:
    			break;
    		case WM_PAINT:
    			break;
    		case WM_SIZE:
    			break;
    		case WM_CLOSE:
    			PostQuitMessage(0);
    			break;
    		case WM_DESTROY:
    			PostQuitMessage(0);
    			break;
    		}
    		return (LONG) DefWindowProc(hWnd, uMsg, wParam, lParam);
    	}
    ...
    };

    Code :
    void run(int x, int y, int w, int h, const char * title, int affinity = -1)
    {
    	try
    	{
    		window wnd(x, y, w, h, title, affinity);
    // create gbuffer fbo
    // create shaders/programs
    // load textures and meshes
    		wnd.run([&]()
    		{
    // render to gbuffer fbo
    // display result from gbuffer
    		});
    	}
    	catch (std::exception & e) { std::cout << e.what() << std::endl; }
    	catch(...) { std::cout << "Unknown exception!" << std::endl; }
    }
     
    int main(int argc, char * argv [])
    {
    	try
    	{
    		std::thread t1(run, 50, 50, 1024, 768, "win1", 0);
    		run(1920 + 50, 50, 1024, 768, "win2", 1);
    		t1.join();
    	}
    	catch (std::exception & e) { std::cout << e.what() << std::endl; }
    	catch (...) {  std::cout << "Unknown exception!" << std::endl; }
    }

    For testing purposes, there's no data upload during the render loop. There's just texture binding, vertex buffer binding, and glDrawArraysInstancedBaseInstance calls. Data for each draw call is sourced from a shader storage buffer using gl_BaseInstanceID.
    Last edited by TomSka; 07-01-2014 at 12:37 AM.

  4. #4
    Junior Member Regular Contributor
    Join Date
    Mar 2004
    Location
    Seattle, WA, USA
    Posts
    110
    Quote Originally Posted by TomSka View Post
    The trouble is that there's almost no performance scaling and GPU utilization is below 50%. The framerate is exactly the same as when rendering on only one GPU.

    I can run this renderer in a one thread/one window configuration. In that case, utilization of the selected GPU is almost 100% and the frame time is exactly half of what it is in the situation above. Surprisingly, by running two instances of this renderer I can achieve perfect utilization of both GPUs.
    This is the same behavior I observed six years ago while trying to use three NV Quadro GPUs in a single system. I reported it to NVIDIA, who mentioned something about their OpenGL driver serializing all work within a process (but, as you notice, not across different processes). They tracked the bug for a couple years, didn't fix it, and it sounds like this must still be a problem today. For what it's worth, their Direct3D driver doesn't have this limitation.

    At the time, their GPU affinity extension was being pushed alongside QuadroPlex systems and a paper on how amazing it was to scale across multiple GPUs, so it was pretty surprising to find out how that was a lie in practice and couldn't actually be obtained. $10k in GPUs just to try out a feature based on that advertising was a pricey mistake...

  5. #5
    Junior Member Newbie
    Join Date
    Jun 2014
    Posts
    3
    I have switched back to GLFW, and after creating two fullscreen windows it's finally working! I'm getting around 85% performance scaling (4.45ms for one window, 5.15ms for two windows/threads on two GPUs). I was expecting slightly better results, but it's better than nothing. Later I tested two older AMD 5870 cards, and there's almost 100% scaling.

    Funny thing: at the beginning I was using GLFW and fullscreen without much luck. But there might have been a bug, and both contexts and windows were created from within the main thread. Because of that, I switched to custom window-creation code but never tried to create a fullscreen window again.

    Thank you for your input! I was starting to think that it wasn't possible to get it working...
