
View Full Version : "gl-streaming" - Lightweight OpenGL ES command streaming framework



shodruk
10-23-2013, 06:45 AM
Hi,
I wrote an OpenGL ES command streaming framework for embedded systems - "gl-streaming".

It is intended to make it possible to execute OpenGL programs on an embedded system which has no GPU.

It uses a server-client execution model for OpenGL, similar in function to GLX, but it is completely independent of the X server and GLX, so it runs on an embedded system which doesn't support X and GLX.

The client executes an OpenGL program, but does not execute the OpenGL commands.
It simply sends the OpenGL commands to the server over the network, so the client system does not need to have a GPU.

The server receives the OpenGL commands, executes them, and displays the graphics on the monitor connected to the server system.

gl-streaming is...

"fast"
It runs at 60 frames per second.

"simple"
The source tarball is under 30 KB!

"lightweight"
The gl_server consumes only 2 MB of RAM!

"low latency"
Its performance is suitable for gaming.

source code & demo
https://github.com/shodruky-rhyammer/gl-streaming

If you are interested in this project, please post a comment.

Thank you.

Stephen A
10-23-2013, 09:44 AM
Looks extremely interesting!

The Little Body
10-23-2013, 02:31 PM
This looks really nice :)

What about the possibility of transferring the graphics output from the server side back to the client side via something like a glse_SwapBuffers() call at the end of each frame?
(so that each GPU-computed picture on the server side can be displayed on the screen of a client that has no GPU)

And what about regrouping set_server_adress/set_server_port and set_client_adress/set_client_port?


void set_server_address_port(server_context_t *c, char *addr, uint16_t port)
{
  strncpy(c->server_thread_arg.addr, addr, sizeof(c->server_thread_arg.addr) - 1);
  c->server_thread_arg.addr[sizeof(c->server_thread_arg.addr) - 1] = '\0'; /* strncpy may not null-terminate */
  c->server_thread_arg.port = port;
}

void set_client_address_port(server_context_t *c, char *addr, uint16_t port)
{
  strncpy(c->popper_thread_arg.addr, addr, sizeof(c->popper_thread_arg.addr) - 1);
  c->popper_thread_arg.addr[sizeof(c->popper_thread_arg.addr) - 1] = '\0';
  c->popper_thread_arg.port = port;
}

The Little Body
10-23-2013, 03:19 PM
About glse_SwapBuffers(), I have seen this in gl_client/main.c:



gls_cmd_get_context();
gc.screen_width = glsc_global.screen_width;
gc.screen_height = glsc_global.screen_height;
printf("width:%d height:%d\n", glsc_global.screen_width, glsc_global.screen_height);
init_gl(&gc);


=> perhaps we can add something like some minimalistic GLUT-like calls, to set a screen_width and screen_height adapted to the size of the client screen, for copying each GPU picture (computed on the server side) back to the client side?

shodruk
10-23-2013, 09:06 PM
Looks extremely interesting!

Thanks!
The gl_client program can run on an ordinary PC, so please try it if you have a Raspberry Pi. :)

shodruk
10-23-2013, 09:38 PM
This looks really nice :)

What about the possibility of transferring the graphics output from the server side back to the client side via something like a glse_SwapBuffers() call at the end of each frame?
(so that each GPU-computed picture on the server side can be displayed on the screen of a client that has no GPU)

And what about regrouping set_server_adress/set_server_port and set_client_adress/set_client_port?


void set_server_address_port(server_context_t *c, char *addr, uint16_t port)
{
  strncpy(c->server_thread_arg.addr, addr, sizeof(c->server_thread_arg.addr) - 1);
  c->server_thread_arg.addr[sizeof(c->server_thread_arg.addr) - 1] = '\0'; /* strncpy may not null-terminate */
  c->server_thread_arg.port = port;
}

void set_client_address_port(server_context_t *c, char *addr, uint16_t port)
{
  strncpy(c->popper_thread_arg.addr, addr, sizeof(c->popper_thread_arg.addr) - 1);
  c->popper_thread_arg.addr[sizeof(c->popper_thread_arg.addr) - 1] = '\0';
  c->popper_thread_arg.port = port;
}



Thanks!
That's a good simplification. I'll improve it.


=> perhaps we can add something like some minimalistic GLUT-like calls, to set a screen_width and screen_height adapted to the size of the client screen, for copying each GPU picture (computed on the server side) back to the client side?

It's possible to get the rendered image back via gl_client, like glGenBuffers (glclient.c) does for its data transfer.
But bandwidth may be a challenge. :)
I'll try to implement glReadPixels and check the performance.

The Little Body
10-24-2013, 12:13 PM
It's possible to get the rendered image back via gl_client, like glGenBuffers (glclient.c) does for its data transfer.
But bandwidth may be a challenge. :)
I'll try to implement glReadPixels and check the performance.


To minimize the amount of data needed to transmit the rendered image over the network, we can:

1) compress the rendered image on the server side
2) transmit the compressed image over the network
3) decompress the compressed image on the client side

For example, each image can be compressed to JPEG format if the client can handle this directly in hardware
(wavelet/filtering + Huffman compression can be employed instead if the client doesn't have the necessary hardware to handle JPEG pictures)

Or use MPEG, MJPEG or another motion-picture compression scheme if the client side has the necessary hardware to handle one of them
(on the PSP platform, we can use the MJPEG hardware decompression engine, for example)

shodruk
10-25-2013, 04:22 AM
To minimize the amount of data needed to transmit the rendered image over the network, we can:

1) compress the rendered image on the server side
2) transmit the compressed image over the network
3) decompress the compressed image on the client side

For example, each image can be compressed to JPEG format if the client can handle this directly in hardware
(wavelet/filtering + Huffman compression can be employed instead if the client doesn't have the necessary hardware to handle JPEG pictures)

Or use MPEG, MJPEG or another motion-picture compression scheme if the client side has the necessary hardware to handle one of them
(on the PSP platform, we can use the MJPEG hardware decompression engine, for example)

The Raspberry Pi has a hardware-accelerated H.264 decoder and encoder, so these could possibly be useful to reduce transfer bandwidth.
But an H.264 encoder usually introduces long latency, and reading data back from GPU memory is usually very slow.
Furthermore, H.264 decoding is a very heavy task for non-accelerated clients.
So MJPEG may be more efficient in some cases.

GClements
10-25-2013, 12:05 PM
For example, each image can be compressed to JPEG format if the client can handle this directly in hardware
JPEG compression is lossy, and something like gl-streaming can't automatically know whether this is acceptable. You'd need a glHint() (or a custom equivalent) to allow the application to control whether lossy compression can be used. Clearly, it shouldn't be used for depth or stencil buffers, or integer colour buffers. In general, JPEG compression is a poor fit for anything with hard edges (e.g. wireframe).


Or use MPEG, MJPEG or another motion-picture compression scheme
MJPEG is just a container for multiple distinct JPEG frames. It doesn't offer any additional compression over JPEG itself.
MPEG offers significant compression, but is also lossy, and requires that the frames are samples of a single animated image (also, the relative timing must be known for motion estimation to be useful). Consecutive calls to glReadPixels() don't have to use the same source rectangle, nor can you be sure that they even refer to the same "scene" (the client is free to use a framebuffer as a "scratch" area for e.g. generating texture data). IOW, consecutive calls to glReadPixels() don't necessarily constitute a single video stream.

The Little Body
10-25-2013, 01:33 PM
We can limit the compression to only the Huffman part, to have a lossless compression

And I think that an RGB[A] to YCbCr[A] color-space conversion, instead of compressing/decompressing directly in RGB[A] color space, can slightly improve the compression ratio of the Huffman part

Note that with a relatively gentle quantization step (one that only loses one or two bits of the standard 8-bit depth of each color component, for example), the Huffman compression can be much more efficient
(this loses a little precision, but I don't think it would really be distinguishable)

GClements
10-25-2013, 03:35 PM
You seem to be assuming that the pixel data only needs to be "visually" correct, but that isn't always the case. It's entirely possible that the data will be subject to additional processing which can magnify any errors. If you're going to add lossy compression (even if it's only "slightly" lossy), it needs to be optional.

The Little Body
10-25-2013, 06:11 PM
I also see in gl_client/glclient.c a lot of forwarded gl* functions with a systematically very similar first line of code:



gls_glBindBuffer_t *c = (gls_glBindBuffer_t *)(glsc_global.tmp_buf.buf + glsc_global.tmp_buf.ptr);
gls_glBlendFuncSeparate_t *c = (gls_glBlendFuncSeparate_t *)(glsc_global.tmp_buf.buf + glsc_global.tmp_buf.ptr);
...
gls_glDrawElements_t *c = (gls_glDrawElements_t *)(glsc_global.tmp_buf.buf + glsc_global.tmp_buf.ptr);
gls_glBindAttribLocation_t *c = (gls_glBindAttribLocation_t *)(glsc_global.tmp_buf.buf + glsc_global.tmp_buf.ptr);



=> I think this can be replaced with a more generic #define pattern like this:


#define GLS_POINTER(PTR, FUNC) gls_##FUNC##_t *PTR = (gls_##FUNC##_t *)(glsc_global.tmp_buf.buf + glsc_global.tmp_buf.ptr)


You can use it as the first line of many of your forwarded gl* functions:


GLS_POINTER(c, glBindBuffer);
GLS_POINTER(c, glBlendFuncSeparate);
...
GLS_POINTER(c, glDrawElements);
GLS_POINTER(c, glBindAttribLocation);


The same goes for the push_batch_command() calls, which can use this #define, for example:


#define GLS_PUSH(FUNC) push_batch_command(sizeof(gls_##FUNC##_t))


with the push_batch_command() calls generated by this #define:


GLS_PUSH(glBindBuffer);
...
GLS_PUSH(glUniform1f);
...
GLS_PUSH(command);


This makes for much more compact source code; for the forwarded glBindBuffer() call, for example:


GL_APICALL void GL_APIENTRY glBindBuffer (GLenum target, GLuint buffer)
{
  GLS_POINTER(c, glBindBuffer);
  c->cmd = GLSC_glBindBuffer;
  c->target = target;
  c->buffer = buffer;
  GLS_PUSH(glBindBuffer);
}

The Little Body
10-25-2013, 06:48 PM
You seem to be assuming that the pixel data only needs to be "visually" correct, but that isn't always the case. It's entirely possible that the data will be subject to additional processing which can magnify any errors. If you're going to add lossy compression (even if it's only "slightly" lossy), it needs to be optional.

By default, we could apply no compression at all for the transmission of the picture, and add one or more optional command-line argument(s) to select the type of compression between the client and the server, for example.

The Little Body
10-25-2013, 06:54 PM
@shodruk,

Can this work from one Linux box to another Linux box, so that I can run some tests?
(I don't have a Raspberry Pi :( )

The GLS_POINTER / GLS_PUSH scheme can be enhanced a little more to make the source code even more compact:
(but with no gain in the size or speed of the binary executable, because this only uses #define)


#define GLS_PTR_FUNC(PTR, FUNC) gls_##FUNC##_t *PTR = (gls_##FUNC##_t *)(glsc_global.tmp_buf.buf + glsc_global.tmp_buf.ptr); \
    PTR->cmd = GLSC_##FUNC

#define GLS_PUSH_FUNC(FUNC) push_batch_command(sizeof(gls_##FUNC##_t))


GL_APICALL void GL_APIENTRY glBindBuffer (GLenum target, GLuint buffer)
{
  GLS_PTR_FUNC(c, glBindBuffer);

  c->target = target;
  c->buffer = buffer;

  GLS_PUSH_FUNC(glBindBuffer);
}

shodruk
10-25-2013, 08:51 PM
You seem to be assuming that the pixel data only needs to be "visually" correct, but that isn't always the case. It's entirely possible that the data will be subject to additional processing which can magnify any errors. If you're going to add lossy compression (even if it's only "slightly" lossy), it needs to be optional.

That is an interesting point.
It gave me the idea to use this library not only for rendering but also for GPGPU.
Fortunately, it is theoretically possible to run gl_server on many hosts.
This would make it possible to build a GPGPU supercomputer using a number of Raspberry Pis! :surprise:

shodruk
10-25-2013, 08:53 PM
The Little Body,

Thanks!
I'm ashamed that I didn't use that macro.
Maybe I love copy&paste. :D

shodruk
10-25-2013, 09:22 PM
Can this work from one Linux box to another Linux box, so that I can run some tests?
(I don't have a Raspberry Pi :( )

gl_client runs on a normal PC, but gl_server doesn't, because this library is hard-bound to the RPi's headers and uses OpenGL ES 2.0 for now.
I'll make a PC & OpenGL version later.

The Little Body
10-26-2013, 06:16 AM
gl_client runs on a normal PC, but gl_server doesn't, because this library is hard-bound to the RPi's headers and uses OpenGL ES 2.0 for now.
I'll make a PC & OpenGL version later.

I have downloaded the gl-streaming-master.zip file
=> I am beginning to see how the gl_client/glclient.c and gl_client/glclient.h parts can be enhanced to be handled by platforms other than the RPi platform

Note that the GLS_PTR_FUNC / GLS_PUSH_FUNC scheme seems relatively similar to what is used in the GLX protocol, after seeing this at http://utah-glx.sourceforge.net/docs/render_buffer_interface.txt :)


__glx_Vertex3f( GLfloat x, GLfloat y, GLfloat z)
{
char* buffer=NULL;
__GLX_GET_RENDER_BUFFER(buffer, 70, 16, 0);
__GLX_PUT_float(buffer, x);
__GLX_PUT_float(buffer, y);
__GLX_PUT_float(buffer, z);
}

void __glx_DrawPixels(GLsizei width, GLsizei height,
GLenum format, GLenum type, const GLvoid*pixels)
{
char* buffer = NULL;
int s=GLX_image_size(width, height, format, type);
__GLX_GET_RENDER_BUFFER(buffer, 173, 40, s);
__GLX_PUT_PIXEL_DATA_ARGUMENTS;
__GLX_PUT_sizei(buffer, width);
__GLX_PUT_sizei(buffer, height);
__GLX_PUT_enum(buffer, format);
__GLX_PUT_enum(buffer, type);
__GLX_PUT_buffer(buffer, pixels, width, height, format, type, s);
}


But adding support for more client platforms is perhaps not a good idea, because of the big increase in source code that it implies :(
(where the client part is where the OpenGL commands are sent, and the server part is where they are executed)

The Little Body
10-26-2013, 07:26 AM
The use of gettimeofday() and get_diff_time() doesn't seem to give the best resolution/speed on all platforms :(

For the wait_for_data() call in gl_client/glclient.c, perhaps we can use clock_gettime() instead?
(as indicated in the Linux Clock section at http://tdistler.com/2010/06/27/high-performance-timing-on-linux-windows)

And/or use clock_nanosleep() instead of usleep()?
(http://pubs.opengroup.org/onlinepubs/000095399/functions/clock_nanosleep.html)

shodruk
10-26-2013, 10:45 PM
Note that the GLS_PTR_FUNC / GLS_PUSH_FUNC scheme seems relatively similar to what is used in the GLX protocol, after seeing this at http://utah-glx.sourceforge.net/docs/render_buffer_interface.txt :)

Honestly, this is the first time I have seen their code. :)
I used structs, they used macros, but there are some similarities between the codes, because the objective is similar.
Their code is very useful, but I love reinventing the wheel. :D


And/or use a clock_nanosleep() instead usleep() ?

Thanks. I'll consider using nanosleep().

shodruk
12-04-2013, 05:13 AM
I uploaded a new demo video.
(texture mapping demo)

http://youtu.be/y0eRwrwetcA

Also, I ran the gl_client program on a Qemu guest OS, and it ran fine at 60 FPS too!

EagleEye996
04-24-2014, 01:39 AM
Hi shodruk, yesterday I tried gl-streaming on a Raspberry Pi and on an i686 laptop, and it worked! I am interested in this project and I would like to use gl-streaming to run different programs. Could you please explain to me how I can do that, if possible?

shodruk
04-24-2014, 04:48 AM
Hi shodruk, yesterday I tried gl-streaming on a Raspberry Pi and on an i686 laptop, and it worked! I am interested in this project and I would like to use gl-streaming to run different programs. Could you please explain to me how I can do that, if possible?

Thanks!
You can write your own program by modifying gl-streaming/gl_client/sample1.c or sample2.c.
It's an almost normal OpenGL ES 2.0 program, except for the initialization method.
I haven't implemented all of the OpenGL ES 2.0 functions yet, so if you find missing functions, please let me know (with a priority).