Creating an OpenGL context for use with > 2.1 core

Hello,

I am trying to use OpenGL functionality newer than 2.1, specifically 3.1 and higher. I am not using the GLEW utility functions; instead I have loaded and mapped the function pointers myself. This is on a Windows 7 platform.

What I do in my code is first determine the maximum version supported by the graphics card (4.6 on my test system). Then, if the major version is greater than 2, I use wglCreateContextAttribsARB; if it is 2 or less, I use the traditional wglCreateContext. What I have been trying to do is start with 3.1 so that I can continue to use the traditional glBegin/glEnd with glVertex3f and such to get basic rendering without a shader.

The issue I am having is that nothing drawn between glBegin/glEnd shows up. I know that I got a valid rendering context from wglCreateContextAttribsARB because I am able to set the clear color with glClearColor and actually clear the screen to that color with glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

For the attribs list for wglCreateContextAttribsARB, I provide the following:
int TOpenGlRenderContext::FAttribList[] = {
    WGL_CONTEXT_MAJOR_VERSION_ARB, 3,
    WGL_CONTEXT_MINOR_VERSION_ARB, 1,
    0, 0, // No layers
    0, 0, // No flags
    WGL_CONTEXT_PROFILE_MASK_ARB, WGL_CONTEXT_CORE_PROFILE_BIT_ARB};

Now, I have tried several flavors of the attribute list, but none of them allow me to see anything rendered between glBegin/glEnd.

Am I missing something here?

If I create the context using the traditional wglCreateContext(hDC), then everything drawn between glBegin/glEnd is displayed as expected.

Thanks,

James

– UPDATE –
I am now able to render up to 3.1. However, after 3.1, nothing is rendered and only the glClear color is shown. Are there additional parameters in the attribute list that must be set in order to get glBegin(GL_TRIANGLES) … glVertex3f(…) … glEnd() to work with OpenGL 3.2 and higher without the use of a shader program? FYI, I have not attempted doing shaders yet as I am still learning what is needed to keep some of my legacy code.

3.1 is the version which removed all of the legacy features. 3.2 was the first version to support “core” and “compatibility” profiles.

If you want to use glBegin/glEnd, you need either a version <=3.0, or a version >=3.2 using the compatibility profile.

As for requesting a version: if your code can work with 2.1, then request 2.1. The implementation may return a context which supports a later version, but it won’t be 3.1 and will use the compatibility profile. If you need later features, then request at least 3.2 and include WGL_CONTEXT_PROFILE_MASK_ARB, WGL_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB in the attribute list.
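
As a minimal sketch (assuming hDC is your device context and wglCreateContextAttribsARB has already been obtained via wglGetProcAddress), the attribute list for a 3.2 compatibility context might look like this:

const int attribs[] = {
    WGL_CONTEXT_MAJOR_VERSION_ARB, 3,
    WGL_CONTEXT_MINOR_VERSION_ARB, 2,
    WGL_CONTEXT_PROFILE_MASK_ARB,  WGL_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB,
    0   // the attribute list must be terminated with 0
};
HGLRC hRC = wglCreateContextAttribsARB(hDC, NULL, attribs);
if (hRC)
    wglMakeCurrent(hDC, hRC);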

Also: if you need to use legacy features and want the code to work on MacOSX, you’re limited to OpenGL 2.1. MacOSX doesn’t support the compatibility profile, so you’re limited to the core profile if you need features introduced after 2.1.

Thanks,

I figured the issue out a few minutes ago. It does render with 3.1. As for >= 3.2, I had to set the compatibility flag as you mentioned. I will never be running this application on a Mac, so I don’t have to worry about that. My main target is Windows only.

The purpose of going to a higher supported version is to hopefully gain better performance than my existing application gets with OpenGL 1.2 as the default. Basically, I am reading in some VRML files generated from SolidWorks to display and manipulate a Fanuc six-axis robotic arm and parts of a machine conveyor system. Because the VRML files contain thousands of triangles, I first tried to use a display list for this. While the display list worked and the performance was good when moving my camera (“gluLookAt”), the memory usage in my application was nearly 1GByte, compared to around 100MByte for the ASCII files that describe the geometry. I then tried to just use data arrays and index tables built from the loaded files and abandon the display list. The memory usage went down to about 200MByte, but then the application had huge delays when trying to rotate the scene/camera.

This led me to thinking about using vertex arrays and possibly VBOs, which require OpenGL newer than 1.2. What I really need to know is how to get the best performance with several large objects built from VRML files, while using as little CPU memory as possible and still getting good response during rendering. Obviously the robotic arm will be getting updated angles from an actual robot, which will manipulate 6 matrices, along with 5 or 6 additional objects being moved in a single axis (translation matrix).

Any info on performance benefits from using higher versions of OpenGL is greatly appreciated, as it seems difficult to find information on this. For example, how much is done on the CPU side with the “out-of-box” setup of OpenGL 1.2 as compared to, say, OpenGL 3.1 or greater?

Thanks,

James

The biggest benefit is likely to be from using VBOs, which only requires OpenGL 1.5. That largely eliminates the CPU from the equation.
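
As a rough sketch of the idea (the interleaved Vertex struct, the vertices/indices arrays and their counts are placeholders for your own data), creating buffer objects once and drawing them with glDrawElements via the classic fixed-function arrays could look something like this:

// offsetof requires <stddef.h>.
struct Vertex { GLfloat position[3]; GLfloat normal[3]; GLubyte color[3]; };

// One-time setup: copy the vertex and index data into buffer objects (OpenGL 1.5).
GLuint vbo, ibo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, vertexCount * sizeof(Vertex), vertices, GL_STATIC_DRAW);
glGenBuffers(1, &ibo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, indexCount * sizeof(GLuint), indices, GL_STATIC_DRAW);

// Each frame: point the arrays at the buffers and issue a single draw call.
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_NORMAL_ARRAY);
glEnableClientState(GL_COLOR_ARRAY);
glVertexPointer(3, GL_FLOAT, sizeof(Vertex), (void*)offsetof(Vertex, position));
glNormalPointer(GL_FLOAT, sizeof(Vertex), (void*)offsetof(Vertex, normal));
glColorPointer(3, GL_UNSIGNED_BYTE, sizeof(Vertex), (void*)offsetof(Vertex, color));
glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, 0);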

If the model contains many copies of similar objects (in this case, “similar” meaning identical topology), instanced rendering may be of use, but that requires at least 3.3 (technically, instanced rendering was added in 3.1, but per-instance attributes weren’t added until 3.3, and it isn’t of much use without those).
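
For reference, a per-instance attribute in 3.3 is just an array whose divisor is set to 1; here instanceVbo, the offsets array and attribute location 3 are made-up names for illustration only:

// Per-instance data: one vec3 offset advanced once per instance (OpenGL 3.3).
glBindBuffer(GL_ARRAY_BUFFER, instanceVbo);
glBufferData(GL_ARRAY_BUFFER, instanceCount * 3 * sizeof(GLfloat), offsets, GL_STATIC_DRAW);
glEnableVertexAttribArray(3);
glVertexAttribPointer(3, 3, GL_FLOAT, GL_FALSE, 0, 0);
glVertexAttribDivisor(3, 1);   // advance this attribute per instance, not per vertex
glDrawElementsInstanced(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, 0, instanceCount);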

[QUOTE=GClements;1291740]The biggest benefit is likely to be from using VBOs, which only requires OpenGL 1.5. That largely eliminates the CPU from the equation.

If the model contains many copies of similar objects (in this case, “similar” meaning identical topology), instanced rendering may be of use, but that requires at least 3.3 (technically, instanced rendering was added in 3.1, but per-instance attributes weren’t added until 3.3, and it isn’t of much use without those).[/QUOTE]

How much is actually done on the CPU other than the function calls, and why are display lists faster yet consume enormous client/application memory if they are supposed to be server side, from what I have read? If a simple scene that includes perhaps 10 VRML files consumes nearly 1GByte of memory on a 32-bit platform, then I will be out of application memory before I can even complete my scene, and it really contains nothing yet.

If I discard the display list and do immediate rendering, then my memory usage drops to just what is required to store the vertices/normals/materials and the index tables for those, which is about 200MByte. However, the performance drops to where the application response is sluggish when the user is interacting with the scene (rotating the camera), as well as when my matrices change due to the robot arm moving on the machine.

What I find extremely interesting is that the actual VRML files that describe the scene are in ASCII character format. The total size of all the files in the scene is around 86MByte. If anything, I would expect my memory usage to be much less than 86MByte, since, for example, a vector uses 12 bytes for three floats, whereas the ASCII representation uses 2Bytes per character and a single coordinate takes about 33 characters including space and comma delimiters, which at 2Bytes per character becomes 66Bytes per vertex. You can see why I am concerned, since in reality I should only see memory usage of maybe 16MByte from the raw data.

Likewise, I wonder about instancing: I can’t really use per-instance rendering since there is no way to know which objects in these files are similar. Also, how would I deal with normal vectors and material colors when they are one per triangle rather than one per vertex? If I now have to store the material color and normal vector with every vertex, then memory really grows. I have not found how to set up vertex arrays that allow per-triangle access for the normals and material colors; every example I have seen has the material colors and normals per vertex only.

Thanks

The simple solution is to duplicate vertices, so every triangle has 3 unique vertices with the same colour and normal. A more efficient solution is to use flat shading so that the normal and colour are taken from the last vertex of each triangle. This can reduce the number of unique vertices by a factor of 3, although it typically still requires double the number of unique vertices compared to per-vertex colour/normals.

The overhead of per-triangle data is unavoidable, as a typical mesh has roughly twice as many triangles as vertices. The implementation is going to do something similar for constructing display lists, i.e. it will store the colour and normal for each vertex which you create, even if you only set it per-triangle.

For flat shading, you can avoid storing face normals and calculate them in a fragment shader using cross(dFdx(p),dFdy(p)), where p is the interpolated position (in any space affine to world space; the normal will be in the same space). You can’t get around providing per-triangle colours, although if adjacent triangles often have the same colour, you can take that into account when constructing the mesh to reduce the number of unique vertices.
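
As an illustration of the derivative trick (a sketch only; vPos, vColor and the fixed light direction are assumptions, not anything from your code), a fragment shader along these lines reconstructs the face normal without any stored normal data:

// Fragment shader source as a C string; requires GLSL 1.50 or later for the
// in/out syntax, and assumes vPos carries the interpolated eye-space position.
static const char *flatNormalFrag =
    "#version 150 compatibility\n"
    "in vec3 vPos;\n"
    "in vec3 vColor;\n"
    "out vec4 fragColor;\n"
    "void main() {\n"
    "    vec3 n = normalize(cross(dFdx(vPos), dFdy(vPos)));\n"  // face normal from derivatives
    "    float diff = max(dot(n, vec3(0.0, 0.0, 1.0)), 0.0);\n" // simple headlight-style diffuse
    "    fragColor = vec4(vColor * diff, 1.0);\n"
    "}\n";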

[QUOTE=GClements;1291745]The simple solution is to duplicate vertices, so every triangle has 3 unique vertices with the same colour and normal. A more efficient solution is to use flat shading so that the normal and colour are taken from the last vertex of each triangle. This can reduce the number of unique vertices by a factor of 3, although it typically still requires double the number of unique vertices compared to per-vertex colour/normals.

The overhead of per-triangle data is unavoidable, as a typical mesh has roughly twice as many triangles as vertices. The implementation is going to do something similar for constructing display lists, i.e. it will store the colour and normal for each vertex which you create, even if you only set it per-triangle.

For flat shading, you can avoid storing face normals and calculate them in a fragment shader using cross(dFdx(p),dFdy(p)), where p is the interpolated position (in any space affine to world space; the normal will be in the same space). You can’t get around providing per-triangle colours, although if adjacent triangles often have the same colour, you can take that into account when constructing the mesh to reduce the number of unique vertices.[/QUOTE]

Ok, seems I have some work to do if I want to improve my performance.

Let me ask this question: do you think 5,106,330 vertices would cause rendering performance to suffer such that you notice it when doing a scene/camera rotation? This amounts to 1,702,110 triangles. Currently I am using glBegin/glEnd with glVertex3f for the rendering.

Can I expect an increase in performance by going to vertex arrays and using glDrawElements?

Thanks

It depends upon the hardware. High-end video cards should handle that at 60 fps, low-end cards will struggle.

If you’re doing that every frame, it’s going to have a substantial performance cost. Using display lists will get rid of much of that, but it’s hard to say how much difference there will be between display lists and arrays.

I suspect that it will depend on the extent to which you can share vertices. Unshared vertices require 3 vertices per triangle, whereas a mesh with shared vertices approaches 2 triangles per vertex, so there’s potentially a 6:1 difference in vertex shader workload depending upon how well you can optimise the mesh.

My graphics card is an NVidia GTX960 with 2048 cores and 4GByte of ram. You are correct, using display lists makes a day and night difference in the performance but at a cost of 10x the memory, which in my case is almost 1GByte of CPU ram.

If I offload the vertex data into a VBO, then I can’t use a display list. I am wondering if the problem is that with glBegin/glEnd much of the work is done on the CPU, with each vertex being transmitted to the GPU individually. Maybe with vertex arrays the CPU will do a block transfer of memory to the GPU, and thus glDrawElements will require fewer CPU-to-GPU interactions. Since I don’t know exactly how OpenGL is performing its operations, i.e. what is on the CPU and what is on the GPU, it is difficult for me to determine which method will give me better performance without the use of a display list.

glBegin/glEnd requires more effort from the CPU, but only when you actually call those functions. If you’re putting the calls inside a display list, then that’s only happening when you create the list, not when you execute it. The implementation may be optimising the GPU-side data based upon the state at the point the list is executed, in which case it will need to store the raw data so that it can re-build the GPU-side state where necessary.

The main problem is that glBegin/glEnd isn’t a good fit for modern GPUs. They’re designed around vertex arrays, and anything else requires some kind of translation.

Display lists are simply a recorded sequence of commands. Their original purpose was to avoid the need to repeatedly send the same commands from the client application to the X server each frame (possibly over a network connection). They aren’t limited to vertex data, so there isn’t much that the implementation can do to optimise the general case.

[QUOTE=GClements;1291753]glBegin/glEnd requires more effort from the CPU, but only when you actually call those functions. If you’re putting the calls inside a display list, then that’s only happening when you create the list, not when you execute it. The implementation may be optimising the GPU-side data based upon the state at the point the list is executed, in which case it will need to store the raw data so that it can re-build the GPU-side state where necessary.

The main problem is that glBegin/glEnd isn’t a good fit for modern GPUs. They’re designed around vertex arrays, and anything else requires some kind of translation.

Display lists are simply a recorded sequence of commands. Their original purpose was to avoid the need to repeatedly send the same commands from the client application to the X server each frame (possibly over a network connection). They aren’t limited to vertex data, so there isn’t much that the implementation can do to optimise the general case.[/QUOTE]

Ok, so what I will be doing is to place my data into a VBO if the GPU will give me one. I have about 1.7 million triangles. Of course these triangles make up multiple objects. As all of this will be new to me as far as using shaders and VBOs/VAOs goes, I am going to need a bit of direction here. I will start a new message thread for this so that it has a good, noticeable subject. One last question before I switch to a new thread: if I am able to put most of this into a VBO, can I expect to see my CPU memory usage drop in a big way? All of my objects are static and will only need to be run through a transform matrix, which can be done on the GPU side. Then it comes down to basically a few glDraw* commands. Most of my state is maintained through my application. What happens if the GPU can’t allocate enough buffer space? Do I fall back to vertex arrays on the client side?

Thanks

You aren’t going to get close to exhausting video memory.

Even if you have no shared vertices (so 3 vertices per triangle), and you use 9 floats per vertex (which is excessive; colours only need to use bytes, normals don’t need a full 32-bit float per component), that still only works out at 1.7 * 3 * 9 * 4 = 183.6 MB for the vertices plus 1.7 * 3 * 4 = 20.4 MB for the indices, so ~200 MB in total.

Realistically, you only need 3 bytes for colour and 4 bytes (GL_INT_2_10_10_10_REV) for normals (2 bytes is usually sufficient, but requires a bit more work), which gets you down to 1.7 * 3 * 19 = 96.9 MB for the vertex data (or 1.7 * 3 * 20 = 102 MB if you want 4-byte alignment).

Sharing vertices will reduce the memory consumption further, and also reduce the vertex shader workload.
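
For what it's worth, packing a unit normal into that 4-byte layout is only a few shifts; this helper is a sketch (the name packNormal and the simple truncation are mine, not from any library):

// Pack a unit normal into GL_INT_2_10_10_10_REV: 10-bit signed x/y/z in bits
// 0-9, 10-19 and 20-29, with the 2-bit w field (bits 30-31) left at zero.
static GLuint packNormal(float x, float y, float z)
{
    GLint ix = (GLint)(x * 511.0f);   // signed 10-bit range is -512..511
    GLint iy = (GLint)(y * 511.0f);
    GLint iz = (GLint)(z * 511.0f);
    return ((GLuint)(ix & 0x3FF)) |
           ((GLuint)(iy & 0x3FF) << 10) |
           ((GLuint)(iz & 0x3FF) << 20);
}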

[QUOTE=GClements;1291761]You aren’t going to get close to exhausting video memory.

Even if you have no shared vertices (so 3 vertices per triangle), and you use 9 floats per vertex (which is excessive; colours only need to use bytes, normals don’t need a full 32-bit float per component), that still only works out at 1.7 * 3 * 9 * 4 = 183.6 MB for the vertices plus 1.7 * 3 * 4 = 20.4 MB for the indices, so ~200 MB in total.

Realistically, you only need 3 bytes for colour and 4 bytes (GL_INT_2_10_10_10_REV) for normals (2 bytes is usually sufficient, but requires a bit more work), which gets you down to 1.7 * 3 * 19 = 96.9 MB for the vertex data (or 1.7 * 3 * 20 = 102 MB if you want 4-byte alignment).

Sharing vertices will reduce the memory consumption further, and also reduce the vertex shader workload.[/QUOTE]

That is about in line with my calculations as well; however, from what I read, whether I use 1, 2, 3 or 4 elements per vertex, the GPU always allocates 4 (X,Y,Z,W), and the same was true for the colors. I was not aware the normal vectors could be reduced in size, but really, with one normal per triangle, only a third as many normals are needed as there are vertices. Currently my scene has about 1.7 million triangles, but it will grow to be much more once I finish. My overall goal is to have as little memory footprint on the CPU side as possible. In any event, I am going to give it a go and see what happens. I am just going to need help from the group to get one object working under my current setup.

Thanks for your information in this thread.

[QUOTE=GClements;1291761]You aren’t going to get close to exhausting video memory.

Even if you have no shared vertices (so 3 vertices per triangle), and you use 9 floats per vertex (which is excessive; colours only need to use bytes, normals don’t need a full 32-bit float per component), that still only works out at 1.7 * 3 * 9 * 4 = 183.6 MB for the vertices plus 1.7 * 3 * 4 = 20.4 MB for the indices, so ~200 MB in total.

Realistically, you only need 3 bytes for colour and 4 bytes (GL_INT_2_10_10_10_REV) for normals (2 bytes is usually sufficient, but requires a bit more work), which gets you down to 1.7 * 3 * 19 = 96.9 MB for the vertex data (or 1.7 * 3 * 20 = 102 MB if you want 4-byte alignment).

Sharing vertices will reduce the memory consumption further, and also reduce the vertex shader workload.[/QUOTE]

I haven’t read all the info yet on normal vectors, but does the GL automatically convert from float[3] to just four bytes for the entire normal (X,Y,Z) when the array is sent to the VBO?

Buffer objects contain exactly and only what you put into them. If you only put 3 floats into them, then they contain 3 floats. OpenGL doesn’t know that any particular buffer is supposed to contain vertices of a particular format until it comes time to render from them. So there is no way for OpenGL to automatically convert data to some other data within the buffer.

OpenGL can automatically convert the data when it reads it in vertex rendering. But this would be converting a 4-byte signed normalized normal value into a 3-float normal.
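
As a sketch of what that declaration looks like at draw time (PackedVertex, its normal member and attribute location 2 are illustrative assumptions, not required names):

// With normalized = GL_TRUE, the packed signed integers are converted to
// floats in [-1, 1] when the vertex is fetched; the buffer contents are untouched.
// offsetof requires <stddef.h>.
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glVertexAttribPointer(2, 4, GL_INT_2_10_10_10_REV, GL_TRUE,
                      sizeof(PackedVertex), (void*)offsetof(PackedVertex, normal));
glEnableVertexAttribArray(2);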