User scripting on GPU at runtime: CUDA versus GLSL



twinbee
02-17-2011, 03:56 PM
Hi all,

For months I've been looking for a way to combine the speed of GPU processing with the ability to run arbitrary user code at runtime (otherwise known as scripting, or dynamic code generation, or reflection, which is the term I'll use from now on). CUDA and OpenCL are great languages, but unfortunately, they don't allow the kind of reflection I need. On the other hand, languages like Lua or C# *do* allow reflection, but alas, they don't support the GPU as yet.

Then I came across a program called 'Fragmentarium' which I later found out does use the GPU *and* allows reflection. It also happens to use this thing I've barely heard about called GLSL (yes I think that might have some relevance here! :) ) - and a quick peek at Wikipedia confirms the situation:

"GLSL shaders themselves are simply a set of strings that are passed to the hardware vendor's driver for compilation from within an application using the OpenGL API's entry points. Shaders can be created on the fly from within an application, or read-in as text files, but must be sent to the driver in the form of a string."

I want to program a raytracer with custom user 3D functions (and even custom user renderers/raytracers) at runtime. The functions could be as simple as: "x^2+y^2+z^2 < 1" to create a sphere, or something more sophisticated such as the Mandelbulb function (still only 10 lines of code or so). I would also like the large renderer code (<1000 lines) to use the GPU for fast rendering and for building the scene. It's all software based rendering - I don't want to use the GPU's inherent 3D capabilities.
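
To make the idea of a user-supplied 3D function concrete, here is a minimal CPU-only sketch in Python, not GPU code. It compiles a formula string at runtime and marches a ray through it with fixed steps. The helper names (`compile_shape`, `march`) and the step size are assumptions made for illustration, and the formula is written with `**` because `^` means XOR in Python.

```python
import math

def compile_shape(expr: str):
    """Compile a user formula in x, y, z into a callable inside-test."""
    code = compile(expr, "<user shape>", "eval")
    # Expose only a few math functions; this is NOT a security sandbox.
    env = {"__builtins__": {}, "sqrt": math.sqrt, "sin": math.sin, "cos": math.cos}
    return lambda x, y, z: bool(eval(code, env, {"x": x, "y": y, "z": z}))

def march(inside, origin, direction, step=0.01, max_dist=10.0):
    """Return the distance to the first sample inside the shape, or None."""
    norm = math.sqrt(sum(c * c for c in direction))
    unit = [c / norm for c in direction]
    for i in range(int(max_dist / step) + 1):
        t = i * step
        point = [o + t * c for o, c in zip(origin, unit)]
        if inside(*point):
            return t
    return None

# The unit sphere from the post, written in Python syntax.
sphere = compile_shape("x**2 + y**2 + z**2 < 1")
hit = march(sphere, origin=(0.0, 0.0, -3.0), direction=(0.0, 0.0, 1.0))
miss = march(sphere, origin=(0.0, 2.0, -3.0), direction=(0.0, 0.0, 1.0))
print(hit, miss)
```

The first ray starts three units in front of the sphere and should report a hit close to distance 2; the second passes two units above the centre and should report no hit. A GPU version would run the same loop once per pixel.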

Well, after asking around, no one has ever suggested GLSL as a solution before, but it seems very interesting. What I am most interested in is hearing the disadvantages compared to coding in, say, CUDA, as I hear GLSL allows fairly general programming constructs (while, for, local variables etc.). For example, can I create a raytracer using GLSL and have it running as fast as CUDA would allow? Specifically, I make use of CUDA's 2D spatial memory locality for my semi-random accesses (so that nearby areas around a pixel in a 2D picture are cached in memory), and also so-called "shared memory", which lets the programmer choose which lucky set of data gets special fast cached access (it has to fit in 16-64 KB). Using those I gain a 15x speedup over purely CPU-generated code. As I reckon some of you may know, both of these CUDA features exist to partially offset the cost of reaching out to the GPU's large device/global memory, which can be prohibitive.
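
As a rough illustration of what "2D spatial locality" means for memory addresses, the sketch below compares a plain row-major index with a Z-order (Morton) index. This is an assumption chosen for illustration only: the layouts GPUs actually use for textures are vendor-specific, and nothing here is taken from CUDA or GLSL documentation. The function names are hypothetical.

```python
def part1by1(n: int) -> int:
    """Spread the low 16 bits of n so a zero bit sits between each pair."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n

def morton_index(x: int, y: int) -> int:
    """Interleave the bits of x and y into a single Z-order address."""
    return part1by1(x) | (part1by1(y) << 1)

def row_major_index(x: int, y: int, width: int) -> int:
    """Conventional scanline address for comparison."""
    return y * width + x

# Address distance between a pixel and the one directly below it.
width = 1024
row_gap = row_major_index(0, 1, width) - row_major_index(0, 0, width)
morton_gap = morton_index(0, 1) - morton_index(0, 0)
print(row_gap, morton_gap)
```

In the row-major layout the pixel below is a full row away in memory, while in the Z-order layout it is only two elements away, so a small 2D neighbourhood is more likely to share a cache line.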

So does GLSL support these (2D spatial locality and shared memory) ?

And what are the other advantages and disadvantages of GLSL compared to coding in CUDA?

skynet
02-17-2011, 04:47 PM
Why not use OpenCL for your needs?
It gives you the best of both worlds: 'easy' GPGPU of CUDA and runtime-generable code like with GLSL.

Alfonse Reinheart
02-17-2011, 04:51 PM
otherwise known as scripting, or dynamic code generation, or reflection, which is the term I'll use from now on

Dynamic code generation and reflection are different things.

Reflection refers to the ability to query information about the nature of certain objects, and to refer to program constructs by string name. If C++ supported reflection, you would be able to construct an object by providing the string name of that object's type and some number of parameters.

Dynamic code generation refers to, well, exactly that: the ability to create new functions and/or language constructs directly from strings. That is, the ability to compile and use new code at runtime as a first-class feature of the language.
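
The distinction can be shown in a language that supports both. The following Python sketch is only an illustration of the two terms: the helper name `construct` and the generated function `half` are made up for the example.

```python
import importlib

# Reflection: refer to an existing type by its string name and inspect it.
def construct(module_name: str, type_name: str, *args):
    module = importlib.import_module(module_name)
    cls = getattr(module, type_name)  # look the class up by name
    return cls(*args)

value = construct("fractions", "Fraction", 3, 4)
print(type(value).__name__, value.numerator, value.denominator)

# Dynamic code generation: compile brand-new code from a string at runtime.
source = "def half(v):\n    return v / 2\n"
namespace = {}
exec(source, namespace)  # the function did not exist before this line
half = namespace["half"]
result = half(value)
print(result)
```

The first half only finds and uses something that already exists; the second half creates a function that was not part of the program when it started.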


Then I came across a program called 'Fragmentarium' which I later found out does use the GPU *and* allows reflection.

If you are referring to this (http://syntopia.github.com/Fragmentarium/), it provides neither dynamic code generation (GLSL shaders generating a string which is compiled into a new GLSL shader that the generating shader can call and/or use) nor reflection. Fragmentarium appears to be merely a GLSL IDE: a means to develop GLSL shaders.


It's all software based rendering - I don't want to use the GPU's inherent 3D capabilities.

Then you don't want to use GLSL. The OpenGL Shading Language (GLSL) is designed for use by OpenGL. And OpenGL is a renderer, first and foremost. While you can use OpenGL and GLSL to do generic computations, it's not a good interface for that.

Use OpenCL instead. Or if you're inclined to live in NVIDIA-land, stick with CUDA.

twinbee
02-18-2011, 02:41 AM
Thanks both...

I have considered OpenCL in the past, though I think the compilation of user code is a few seconds. Really, I would like below half a second. What would you say about that? It is still an option, though I came away from this below thread somewhat confused:
http://www.khronos.org/message_boards/viewtopic.php?f=37&t=3382

In hindsight, if I were to go the OpenCL route, then it would be 100% OpenCL, rather than a mix of CUDA and OpenCL as I initially wanted in that thread. If I did, then would I need to include a compiler (or compilers?) with my software, or would the compiler automatically be available on the end user's PC?


Reflection refers to the ability to query information about the nature of certain objects,
Reflection is also used to describe the actual alteration or addition of arbitrary code at runtime. Here are a couple of choice quotes from Wikipedia:

"Reflection can be used for observing and/or modifying program execution at runtime."
...and particularly...
"Evaluate a string as if it were a source code statement at runtime."

That second one seems to fit the bill of "dynamic code generation"; however I'll use 'DCG' in future posts as I guess it's even more specific.


If you are referring to this, it provides neither dynamic code generation (GLSL shaders generating a string which is compiled into a new GLSL shader that the generating shader can call and/or use) nor reflection.
Okay, I was stretching the definition there a bit, since it is only *apparently* generating code at runtime. I'm guessing in reality, the code is compiled behind the scenes - it does take a second or so anyway to 'build' the user's function code after all. However, my overall point was that the compiler is at least available to the end user automatically and is also fairly quick. On the other hand, one wouldn't be able to easily redistribute the CUDA compiler, or the Microsoft C++ compiler, as I've looked into those options in the past.


Then you don't want to use GLSL. The OpenGL Shading Language (GLSL) is designed for use by OpenGL.
The fact that GLSL is capable of compiling user generated code so easily makes it at least slightly tempting. I've also heard reports that it can be faster than even CUDA in some areas, say 2D histograms, at least for naive implementations. See: http://forums.nvidia.com/index.php?showtopic=159960 . Judging by that thread, it would appear CUDA has no equivalent to GLSL's use of the framebuffer? I won't be creating a histogram, but I will be using random memory access, or at least semi-local/random access.

I still want to know if 2D spatial locality is supported in GLSL. AFAIK, CUDA/OpenCL's "shared memory" (scatter) isn't available in GLSL.

nickels
02-18-2011, 10:18 AM
GLSL does not allow you to utilize shared/local memory (unless the drivers themselves are doing some of this under the covers).
This could cause a large slowdown.

Also, computations in GLSL are slaved to the rasterization pipeline, which works well for image processing shaders but might be terrible for other algorithms. CUDA (and probably OpenCL) are much more flexible in how they dispatch jobs.

Aleksandar
02-18-2011, 11:44 AM
In hindsight, if I were to go the OpenCL route, then it would be 100% OpenCL, rather than a mix of CUDA and OpenCL as I initially wanted in that thread.

Why would you mix OpenCL and CUDA? Both APIs have the same purpose. On NV hardware, OpenCL executes atop CUDA. That's why I haven't tried OpenCL ... so far. ( Should I mention that GLSL executes atop Cg? Oops! But I'm still using GLSL... ;) )


If I did, then would I need to include a compiler (or compilers?) with my software, or would the compiler automatically be available on the end user's PC?

The GLSL compiler is included in the GL drivers, and it does its job "just in time". For CUDA, you'll need to ship your application with the appropriate DLLs (at least the CUDA runtime, cudart.dll). I'm not using OpenCL, so I don't know what it needs. If the target machine uses an Intel graphics card, then you are in trouble in any case!


I won't be creating a histogram, but I will be using random memory access, or at least semi-local/random access.
Even if it is possible (I don't understand what exactly you need), it will probably be slower than with CUDA. There are NV-specific extensions that can help with those tasks by giving direct access to buffers (GL_NV_shader_buffer_load, GL_NV_vertex_buffer_unified_memory and GL_NV_shader_buffer_store), but they are vendor specific, and furthermore GL_NV_shader_buffer_store requires SM5 hardware.


Also, computations in GLSL are slaved to the rasterization pipeline, which works well for image processing shaders but might be terrible for other algorithms.
I have to disagree with this statement. Vertex/tessellation/geometry shaders can also be used for computation. In that case transformed vertices are captured with transform feedback.

Alfonse Reinheart
02-18-2011, 11:52 AM
In hindsight, if I were to go the OpenCL route, then it would be 100% OpenCL, rather than a mix of CUDA and OpenCL as I initially wanted in that thread

OpenCL and CUDA solve the same problem. They do the same things. Questions might be raised about the viability of different implementations, but the same functionality is provided by both. So unless you have a legacy CUDA codebase, I don't think there's much of a reason to need to combine them.


If I did, then would I need to include a compiler (or compilers?) with my software, or would the compiler automatically be available on the end user's PC?

Due to the intimate nature of GPGPU applications, how much they are tied to specific hardware, GPU-based APIs almost always allow you to compile your code on the machine in question.


The fact that GLSL is capable of compiling user generated code so easily makes it at least slightly tempting.

GLSL is not capable of compiling user generated code. OpenGL, which is defined by a C API, compiles and generates code. The GLSL language itself does not provide the ability to compile code.
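
In other words, the application builds the shader text and hands it to the OpenGL API. A minimal Python sketch of the first half of that, splicing a user formula into a fragment shader string, might look like this. The template, the `eye` uniform, the `rayDir` varying and the function name `build_fragment_source` are all hypothetical and not taken from Fragmentarium or any real program. GLSL has no `^` power operator, so the sphere test is written as `x*x + y*y + z*z < 1.0`.

```python
# Hypothetical template; doubled braces are literal braces for str.format.
FRAGMENT_TEMPLATE = """#version 120
uniform vec3 eye;
varying vec3 rayDir;

// User-supplied test: true when point p is inside the shape.
bool inside(vec3 p) {{
    float x = p.x, y = p.y, z = p.z;
    return {user_expr};
}}

void main() {{
    vec3 p = eye;
    vec3 delta = normalize(rayDir) * 0.01;
    float hit = 0.0;
    // Fixed-step march along the ray until the user shape is entered.
    for (int i = 0; i < 500; i++) {{
        if (inside(p)) {{ hit = 1.0; break; }}
        p += delta;
    }}
    gl_FragColor = vec4(vec3(hit), 1.0);
}}
"""

def build_fragment_source(user_expr: str) -> str:
    """Return complete GLSL source with the user's formula spliced in."""
    return FRAGMENT_TEMPLATE.format(user_expr=user_expr)

shader_source = build_fragment_source("x*x + y*y + z*z < 1.0")
print(shader_source)
```

The resulting string would then be passed to the driver through the OpenGL entry points `glShaderSource` and `glCompileShader`, which need a live OpenGL context and are not shown here.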

twinbee
02-20-2011, 03:33 PM
Why would you mix OpenCL and CUDA?
Well my original intention was to code in the slightly more mature CUDA language for an (arguable) speed increase and ease of use (sacrificing portability), and use OpenCL in addition only to provide compilation of the user's short code while the program's running. They advised me against that direction though, as you guys have.


Compiler for GLSL is included in GL drivers, and it does its job "just in time".
Interesting you should say that, as Fragmentarium takes a while to compile the user's tiny code (say a couple of seconds), and I thought JIT was almost instant due to the way JIT needs to work.


Due to the intimate nature of GPGPU applications, how much they are tied to specific hardware, GPU-based APIs almost always allow you to compile your code on the machine in question.
Right. For CUDA, I would need to learn the driver API apparently, and that's not an easy task (though probably not as bad as writing PTX). I'm not sure what the OpenCL equivalent would be, but it would be immaterial if it takes more than a couple of seconds to compile in any case.

Aleksandar
02-21-2011, 04:16 AM
Interesting you should say that, as Fragmentarium takes a while to compile the user's tiny code (say a couple of seconds), and I thought JIT was almost instant due to the way JIT needs to work.
I wanted to say that you don't have to precompile the shaders, and there is no separate GLSL compiler (except for OpenGL ES, where memory allocation is critical, and keeping the compiler/linker in memory all the time is not affordable). Although, with binary shaders you can do it off-line. GLSL is not interpreted, so GL doesn't require a classical JIT. GLSL compilation is slower than a JIT, because it compiles the code from source, not from bytecode (already precompiled in an environment-independent way). In any case, you need to spend some time compiling GLSL from source (unless you have static shaders that can be loaded as binaries).