Thread-friendly GL_COMPILE through gl{Pause/Resume}List()

One of the reasons OpenGL isn’t very thread-friendly is that once you call glBegin or glNewList, you need to finish what you’re doing first.

Would it be difficult to implement some kind of context switch to put an incomplete display list on hold? I frequently find myself saying:

I need to render to the frame buffer, but I’m already in the middle of an important display list that will be used later.
How great would it be to call

glPauseList(name)

so immediate rendering can be done without waiting for the list to finish? When the extra task is finished, call something like

glResumeList(name)

I think this would be a huge performance boost when building display lists from a background worker.
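
Something like this is what I have in mind. glPauseList/glResumeList are of course made up, and the draw* functions are just placeholders:

glNewList(workerList, GL_COMPILE);
drawFirstHalfOfScene();            /* placeholder: expensive geometry */

glPauseList(workerList);           /* proposed: suspend the open list */
drawOverlayRightNow();             /* placeholder: immediate rendering to the framebuffer */
glResumeList(workerList);          /* proposed: pick up where we left off */

drawSecondHalfOfScene();           /* placeholder */
glEndList();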

I’m not exactly an expert on the pipeline, and I did a minimal search to see if this has already been suggested, so sorry if my concept is laughable.

John

I see no future in this approach if you consider that DLs will no longer be present in future GL versions (there is no reason to use display lists even now).

If you want to create a DL in the background, just create two contexts with shared lists in two different threads. You can use one thread to build the DL and another one to draw to the window.
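
A rough sketch of what I mean, assuming WGL on Windows (error checking omitted; hdcMain/hdcWorker are placeholder DCs, and the worker can use a DC for the same window or for a hidden one):

HGLRC mainCtx   = wglCreateContext(hdcMain);
HGLRC workerCtx = wglCreateContext(hdcWorker);
wglShareLists(mainCtx, workerCtx);        /* display lists/textures are now shared */

/* main thread */
wglMakeCurrent(hdcMain, mainCtx);         /* keeps drawing to the window as usual */

/* worker thread */
wglMakeCurrent(hdcWorker, workerCtx);
GLuint list = glGenLists(1);
glNewList(list, GL_COMPILE);
/* ... feed the expensive geometry here ... */
glEndList();
glFinish();                               /* make sure the list is complete before the
                                             main thread does glCallList(list) */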

WHAT!?!? Is there an article or link you can point me to about why anyone is eliminating display lists in the future? I’m shocked (well, not exactly), but I don’t follow your flat assertion that there is no reason for display lists. Especially the part about eliminating a harmless construct that gave OpenGL an edge on the slower systems of 10 years ago.

If you can tell me how I can render 100000 vertices per frame @60 fps using less than 1% of even a PII’s CPU time without using display lists, please let me know.

I never even thought of the contexts with shared lists. :slight_smile: Thanks for the suggestion.

That’s what ATI and NV are planning.
Read the part about “OpenGL: What’s Next”
http://www.gamedev.net/columns/events/gdc2006/article.asp?id=233

Zengar wasn’t clear. DLs, glBegin, and all the old stuff will be in a GLU-like library.
Core GL will just be the bare minimum, at the hw level.

As you might know, old graphics cards don’t get updated drivers, so they won’t see the change.
This stuff is likely for this year’s cards and above.

I’m sure some old Matrox card still doesn’t have VBO support. :slight_smile:

Maybe you missed the part in that article where they were talking about geometry-only display lists being part of the API, as opposed to layered.

But in any case, Zengar is right on one thing. The Longs Peak API will formally abandon all that glBegin/End nonsense. Object creation will be atomic, and context sharing is a feature of objects.

Well I definitely don’t get as much time as I’d like to keep up on the latest OpenGL technology lately.

This is the first I’ve heard of Longs Peak (out of it, I know). It sounds like they’re turning into DirectX. I thought the whole brilliance of OpenGL was that it’s not object-based. Oh well, as long as there’s a nice modern object-oriented API for it, I’ll be happy.

I have had a steady eye on nVidia’s CUDA development. I applied for the API, but I had no justification for it and didn’t get accepted. I think running the entire program on the GPU sounds like it has loads of potential.

I thought the whole brilliance of OpenGL was that it’s not object-based.
Yeah, you were wrong about that. So was SGI.

There is a difference between objects in the programming language and objects in the API. OpenGL has had texture objects since 1.1. Longs Peak is going to change the way objects are handled in the API (e.g., they will probably use a driver-assigned opaque pointer rather than a GLuint handle), and it looks like there is going to be a bewildering profusion of object types, all in the name of efficiency, but I’m pretty sure that these will not be objects in the C++ sense.
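
E.g. the familiar GLuint-handle style (width/height/pixels are placeholders):

GLuint tex;
glGenTextures(1, &tex);              /* GL hands back an integer name */
glBindTexture(GL_TEXTURE_2D, tex);   /* "select" the object into the context */
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, pixels);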

Originally posted by BoeroBoy:
If you can tell me how I can render 100000 vertices per frame @60 fps using less than 1% of even a PII’s CPU time without using display lists, please let me know.
Vertex Buffer Object :slight_smile:
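
A minimal sketch, assuming GL 1.5 / ARB_vertex_buffer_object; vertexData and numVerts are placeholders:

GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, numVerts * 3 * sizeof(GLfloat),
             vertexData, GL_STATIC_DRAW);           /* uploaded once, kept by the driver */

glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, (const GLvoid*)0);  /* offset into the bound VBO */
glDrawArrays(GL_TRIANGLES, 0, numVerts);            /* per-frame CPU cost: one call */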

As a matter of fact, I find the object-based approach more convenient for modern hardware (it is just more natural). The state-machine abstraction is very convenient, but objects as pluggable entities can make it very customizable.

I never had great luck with VBOs. Not as much as with display lists. I suppose it depends on your platform (QuadroFX in my case).

My background as a hobbyist was in QuickDraw3D using objects. The transition to OpenGL was such a relief. Oh well, as long as it optimizes bandwidth to the card…

I know there is a difference between language objects and API objects. I just hope there’s a strong correlation. Even standardized enums would make current OpenGL eons easier with autocompletion. As long as the naming doesn’t go anywhere near Microsoft and its crazy typedefs. I don’t want to waste time with LP_LONGSPEAK_JIGGLYJUG objects that are simply typedefs of other crazy typedefs (P_LS_JG*, etc.) that whittle down to (void*). The horror!

Well, that’s strange; VBOs are usually faster (or at least not slower) than display lists, and easier to manage…

And it’s true, DirectX always scared me off with its constant mnemonics :slight_smile: OpenGL just needs a major cleanup, because 10 years of extensions and hardware revolutions have made it a bit messy.

Originally posted by Korval:
Maybe you missed the part in that article where they were talking about geometry-only display lists being part of the API, as opposed to layered.

Yes, you are right. They were considering it. I think they should just flush them. One way of storing vertices is enough.
It would be better if they posted their reasons for that.

I think they should just flush them.
The reasoning behind geometry display lists is that only the driver can know what is truly the most optimal format for data. There’s no real way to communicate that information to the user. Therefore, it is a reasonable optimization for static geometry.
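
For static geometry it is as simple as compiling it once and letting the driver repack it however it likes; a sketch (the drawing calls are placeholders):

GLuint staticGeom = glGenLists(1);
glNewList(staticGeom, GL_COMPILE);
glBegin(GL_TRIANGLES);
/* ... glNormal3f / glTexCoord2f / glVertex3f for the whole mesh ... */
glEnd();
glEndList();

/* every frame */
glCallList(staticGeom);   /* the driver has already re-packed the data as it likes */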

That’s what I think the reason would be, but I prefer having a query of some kind. Some way to know what the preferred formats are, like a string returning “V3 V3N3T2 V3C4T” or something like that. For the moment, this info might be found in some PDF on the NV site.
Also something for alignment.

Because how else are we supposed to know what is preferred? Check a PDF to know which GPU wants what format? But then we have to figure out what GPU is in the user’s system.

I prefer to put what the GPU wants in a VBO myself.

“V3 V3N3T2 V3C4T”
Yes, but it’s not nearly as simple as that.

It can be questions like whether you can, or should, separate attributes into different VBOs, how many separate buffer objects you can use that way, what the data type of each attribute is (int, byte, float, etc.), and so on.

In modern hardware, basically any combination of vertex data is allowed at “regular” performance (understanding that the more stuff you send, the lower the performance). It’s not the combination anymore; it’s whether you have the right formats in X buffers, etc.
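
For instance, positions and normals sourced from two separate buffer objects (a sketch; posVBO, normVBO and numVerts are placeholders):

glBindBuffer(GL_ARRAY_BUFFER, posVBO);
glVertexPointer(3, GL_FLOAT, 0, (const GLvoid*)0);   /* positions come from posVBO */

glBindBuffer(GL_ARRAY_BUFFER, normVBO);
glNormalPointer(GL_FLOAT, 0, (const GLvoid*)0);      /* normals come from normVBO */

glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_NORMAL_ARRAY);
glDrawArrays(GL_TRIANGLES, 0, numVerts);             /* legal, but is it the fast path? */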

I think using different buffer objects doesn’t matter. If it does matter, find a way to let us know.
In my opinion, GL needs to tell us what the GPU can do. It’s the cleanest way to program our apps/game.

I can understand that every GPU is unique and GL tries to be agnostic. Unfortunately, many people want to use it and want to know what the GPU is capable of.

Originally posted by V-man:
I think using different buffer objects doesn’t matter. If it does matter, find a way to let us know.

If your program does not have a bottleneck in vertex fetching, the layout does not matter. Otherwise even the size of a single vertex in the buffer might matter (e.g. a 64-byte vertex might be better than a 50-byte vertex). Similarly, if you are not limited by vertex processing, effective use of the post-transform cache on hw without unified shaders is not important.
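
For illustration only (the exact numbers depend on your attributes):

typedef struct {
    float         pos[3];        /* 12 bytes */
    float         normal[3];     /* 12 bytes */
    float         texcoord[2];   /*  8 bytes */
    unsigned char color[4];      /*  4 bytes -> 36 bytes of real data */
    unsigned char pad[28];       /* pad to 64 bytes per vertex for the fetch hw */
} Vertex64;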


In my opinion, GL needs to tell us what the GPU can do. It’s the cleanest way to program our apps/game.

There are two levels of such knowledge.

One level is “this shader/feature is HW accelerated and I am this GPU”; this is the level the application really should know about to decide what to do (e.g. disable an effect on GPUs that cannot do it or are known to be slow at it).

The second level, “if you do this in such a way, it might be faster”, is too complex and hw-dependent to communicate to the application in a usable way without complex queries that might still be unable to express something specific to the target hw. Even DX9 with its millions of caps bits and queries does not have this. Such queries would only complicate the application, so it is best to leave that work to the driver (in the case of the geometry-only lists) or to IHV papers.

One thing that would be really useful for applications is the ability to store and later load the opaque content of driver-optimized objects (e.g. compiled shaders, geometry-only lists) without having to pay the optimization cost again until the hw or driver changes.
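
Something like this, with completely made-up entry points, is what I mean:

/* Completely hypothetical calls -- nothing like this exists in GL today. */
GLsizei blobSize = 0;
glGetOptimizedObjectSize(object, &blobSize);        /* invented */
void *blob = malloc(blobSize);
glGetOptimizedObjectData(object, blobSize, blob);   /* invented: save this blob to disk */

/* on a later run, as long as the driver/hw has not changed: */
glLoadOptimizedObjectData(object, blobSize, blob);  /* invented */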

If your program does not have a bottleneck in vertex fetching, the layout does not matter. Otherwise even the size of a single vertex in the buffer might matter (e.g. a 64-byte vertex might be better than a 50-byte vertex). Similarly, if you are not limited by vertex processing, effective use of the post-transform cache on hw without unified shaders is not important.
Yes, obviously, but the discussion is about what the driver should communicate to the app so that we, in the app, can make the best choice.

Korval was talking about sourcing positions from VBO 1 and normals from VBO 2, or something like that.
I think that’s too complicated to communicate to the app.
Something basic like “in the case of a single VBO for all your vertex, normal, and texcoord data, use V3N3T2 or V4N4T2”, etc.
There are cache-line-size implications.

They could even tell us what the memory alignment should be: glGetIntegerv(GL_MEM_ALIGN_VBO, …)
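
GL 1.1 already has names for such layouts in glInterleavedArrays, so a sketch of what I’d like (GL_MEM_ALIGN_VBO is my invented enum; vbo and numVerts are placeholders):

glBindBuffer(GL_ARRAY_BUFFER, vbo);
glInterleavedArrays(GL_T2F_N3F_V3F, 0, (const GLvoid*)0);  /* texcoord2/normal3/vertex3 */
glDrawArrays(GL_TRIANGLES, 0, numVerts);

/* and the alignment query I’d like to have (GL_MEM_ALIGN_VBO does not exist): */
GLint align;
glGetIntegerv(GL_MEM_ALIGN_VBO, &align);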

The second level, “if you do this in such a way, it might be faster”, is too complex and hw-dependent to…
Yes, but I think some basic info is sufficient.

There isn’t even any way to know if a GLSL shader will run in hw or in software. A simple query would be enough to tell us whether a shader fits in the “shader instruction cache” or not.
I don’t find “scanning the info log for the word software” a clean method.
It would be better if compilation just failed.
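
The hack in question looks roughly like this (assuming the driver even puts the word “software” in the log):

GLchar  log[4096];
GLsizei len = 0;
glLinkProgram(prog);
glGetProgramInfoLog(prog, sizeof(log), &len, log);
if (strstr(log, "software"))
    fprintf(stderr, "shader will fall back to software\n");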

Originally posted by V-man:
Korval was talking about sourcing positions from VBO 1 and normals from VBO 2, or something like that.
I think that’s too complicated to communicate to the app.
Something basic like “in the case of a single VBO for all your vertex, normal, and texcoord data, use V3N3T2 or V4N4T2”, etc.

The question is whether the gains from such a limited query capability would be worth the work necessary to implement it in the application. With driver-optimized lists, the driver has the possibility to optimize the data based on any knowledge it has, even if that knowledge is not communicable to the application.

Something basic like “in the case of a single VBO for all your vertex, normal, and texcoord data, use V3N3T2 or V4N4T2”, etc.
Yes, and what about those of us who have long since abandoned concepts like “vertices” and “normals” in favor of generic glslang attributes? How do we get information about vertex formats? What is a vertex format in that case?
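
With generic attributes it looks roughly like this; the attribute names, stride and offsets here are only placeholders:

GLint loc0 = glGetAttribLocation(prog, "attr0");   /* names are whatever the shader declares */
GLint loc1 = glGetAttribLocation(prog, "attr1");

glBindBuffer(GL_ARRAY_BUFFER, vbo);
glEnableVertexAttribArray(loc0);
glVertexAttribPointer(loc0, 3, GL_FLOAT, GL_FALSE, stride, (const GLvoid*)0);
glEnableVertexAttribArray(loc1);
glVertexAttribPointer(loc1, 4, GL_FLOAT, GL_FALSE, stride, (const GLvoid*)12);

glDrawArrays(GL_TRIANGLES, 0, numVerts);   /* GL never hears the words "vertex" or "normal" */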