Quote Originally Posted by malexander View Post
So it's not impossible, but certainly isn't trivial either. If you are thinking about Mac OSX as a potential platform and want to use modern GL features, you'll be faced with this problem. Otherwise, I wouldn't worry about the core profile at all, and instead gradually upgrade the parts of your application that will benefit from modern GL techniques (whether it be performance or new capabilities).

And behold - here lies the problem with it all! Yes, we want our code to work on Mac OS X and Intel hardware, but what the theoreticians completely overlook is that management also has a say in the matter, which results in the following constraints:

- no rewrite from the ground up
- no change of general program flow
- no time-consuming changes

Of course it's easy to say 'you should have done...' and make other smart-ass remarks, but they always fall wide of the mark. That's what some people seem to forget: the old legacy code exists, in some form it needs to continue to exist, and worse, it needs to be kept operable on more modern systems.

So here it goes:

Quote Originally Posted by mhagain View Post
No.

The point is that this isn't a GL3.x+ problem; this is a problem that goes all the way back to GL1.3 with the GL_ARB_vertex_buffer_object extension, so you've had more than ample time to get used to the idea of using buffer objects, and more than ample time to learn how to use them properly.
Yes, tell that to the people who made the mess more than 10 years ago. I'd fully agree that it was badly designed, but that's what I have to deal with, and no argument you make will make the code go away.
But it's complete bullshit anyway. glBegin/glEnd was a tried and true feature right up to GL 2.1, so whatever you are trying to say here misses the point. You are arguing from a theoretical standpoint, completely forgetting that what I have to deal with is code that actually exists and actually needs to be kept working.
Plus, the performance characteristics of the two methods are so totally different that there's simply no 1:1 transition, which is why the old code was never changed.

Quote Originally Posted by mhagain View Post
Talking about it as though it were a GL3.x+ problem and as if it were something new and horrible isn't helping your position. Howabout you try doing something constructive like dealing with the problem instead?
Of course it's a GL 3.x problem: that's when immediate mode was deprecated, and some driver makers decided to drop it without offering any equally performant replacement.


And now to the other person who doesn't seem to have a grasp on the maintenance of old legacy code...


Quote Originally Posted by thokra View Post
Astrom: My take is simple: no legacy GL in new code.

If you're forced to maintain a legacy code base, usually due to economical, time and compatibility constraints, by all means, keep the legacy well and clean. As mhagain already stated: there are core GL 4.4 features you can already use even in legacy code, the most prominent being plain vertex buffer objects.
Sorry, that doesn't work. 'Legacy' doesn't necessarily mean keeping the old feature set. What if you want to integrate some newer shader-based features but, for one reason or another, cannot afford a complete overhaul of your code base, be it for financial or time reasons? In that case you have to find a compromise.
So far the compromise has been the compatibility profile, but at my workplace everybody agrees that this is a stopgap measure at best and that, as soon as it's technically doable, we should migrate to a core profile so that we aren't locked to AMD and NVIDIA on Windows.


Quote Originally Posted by thokra View Post
I see you completely fail to see that it will be a huge or possibly massive investment of time anyway. The question is, do you invest the time in small steps, porting feature by feature, or do you go ahead and rewrite everything. Going from legacy to modern core OpenGL takes time and care - no doubt. Still, mhagain already proposed the first option - and he's right to do so IMO.
The orders are not to make a huge investment of time. And as I already said, mhagain's proposal has already been nixed. It can't be done. End of story. Too much work for no gain. We'd have to do months of work with no result in sight, and that's plainly not affordable.
So again: find a compromise that gets us where we want to be. (Yes, you read that correctly: the operative word is always 'compromise'...)




Quote Originally Posted by thokra View Post
Proof please. I'm not aware of any D3D10 feature that so massively kicks the GL's ass. Or am I simply not aware of something similar to persistently mapped buffers in D3D10? I thought the only thing giving you an advantage over is D3D10_MAP_WRITE_NO_OVERWRITE with a D3D10_MAP_WRITE_DISCARD at frame begin.
The problem seems to be that buffer mapping is a lot more efficient in D3D than in OpenGL 3.x. All I can tell you is that the buffer updates were killing us with GL but not in a D3D test setup.
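For reference, this is roughly the D3D pattern the quote alludes to: a minimal sketch assuming a D3D11-style dynamic vertex buffer (the buffer, sizes and variable names are placeholders, not our code). Append with NO_OVERWRITE, wrap around with DISCARD once the buffer is full.

// Hedged sketch, not our production code: stream vertices into a dynamic buffer.
D3D11_MAPPED_SUBRESOURCE mapped;
D3D11_MAP mode = (writeOffset + dataSize > bufferSize)
                     ? D3D11_MAP_WRITE_DISCARD        // wrap: driver hands back fresh storage
                     : D3D11_MAP_WRITE_NO_OVERWRITE;  // append: no sync with in-flight draws
if (mode == D3D11_MAP_WRITE_DISCARD)
    writeOffset = 0;
context->Map(vertexBuffer, 0, mode, 0, &mapped);
memcpy((char*)mapped.pData + writeOffset, vertexData, dataSize);
context->Unmap(vertexBuffer, 0);
// the draw call then sources the range [writeOffset, writeOffset + dataSize)
writeOffset += dataSize;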


Quote Originally Posted by thokra View Post
Yeah yeah, you mentioned that already - several times - in another thread. It's high time you tell us what frickin' exotic scenario you're talking about. Otherwise you'll simply stay in that magical position that no one here can disagree with because there isn't enough hard facts to do so. Cut the crap and get real.
I think I've said this countless times before: the code I have to deal with is sprinkled with immediate mode draw calls, one quad here, one triangle fan there, a triangle strip elsewhere. It's not exotic, it's just crufty, bad old code from another time, and the way it's all structured makes it very hard to optimize. Since I am not allowed to disclose more, you'll have to take my word for it that the only way to port this to a buffer-based setup is to upload each primitive's data separately, issue a draw call and move on. The code is inherently tied to such an approach (which was all well and good when it was written a long time ago).
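To give an idea (purely illustrative, not actual code from our app), the typical call site looks something like this, repeated thousands of times across the code base:

glColor4f(r, g, b, a);
glBegin(GL_TRIANGLE_FAN);
glTexCoord2f(0.0f, 0.0f); glVertex3f(x0, y0, z);
glTexCoord2f(1.0f, 0.0f); glVertex3f(x1, y0, z);
glTexCoord2f(1.0f, 1.0f); glVertex3f(x1, y1, z);
glTexCoord2f(0.0f, 1.0f); glVertex3f(x0, y1, z);
glEnd();
// ...a state change a few lines later, then the next 3-6 vertex primitive.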


Quote Originally Posted by thokra View Post
Microsoft did it. Maintaining backwards compatibility for over 20 years is ludicrous for something like OpenGL. D3D10/11 doesn't give a crap about the D3D9 API. The thing is, even if you only leverage the features that comply with the D3D9 feature subset still supported by D3D11, you still have to code against the D3D11 API. You can't even use the old D3D9 format descriptors. No way you're gonna have a D3D11 renderer and still write stuff similar to glEnableClientState(GL_FOG_COORD_ARRAY), to mention just one example that makes me want to jump out the window, while at the same time wanting to have kids with the GL 4.4 spec because of GL_ARB_buffer_storage.
... which ultimately is the reason we decided against porting to D3D. As soon as you need to move beyond the currently set-in-stone feature set, you are screwed. A new API every 3 or 4 years is deadly if you have to work with software that may exist for a decade or more and also needs to be kept reasonably up to date - not to mention that D3D11 requires Windows 7 or later, which causes problems if the software needs to run on an older system.


Quote Originally Posted by thokra View Post
And what are we gonna do anyway? Suppose there had been a compatibility break and we now were forced to either stay with GL3.0 at max OR start rewriting our code bases to use GL3.1+ core features - what would have been the alternative? Transition to D3D and a complete rewrite of everything? Also, where I work, we're supporting Win/Linux/Mac - go to D3D and you have to write another renderer if you want to keep Linux and Mac around.
What would have happened? Easy to answer: the code would have stayed as it was, limited to GL 2.1 features with no chance of ever being upgraded, and our bosses would be quaking in their boots over the old API eventually vanishing completely.


Quote Originally Posted by thokra View Post
IMO, you have to make sure that people have to adopt - see D3D. That's where the ARB failed - by letting us use the new stuff and the old crap side-by-side. I seriously doubt many companies would have been pissed off enough to leave their GL renderers behind.
And here you are forgetting something:
D3D is mainly used for entertainment software, which MUST keep up with current technology. A 5-year-old D3D9 engine won't cut it for a new product.
The same is not true for corporate software, which is often badly maintained, full of ancient cruft, and something a company's well-being relies on.
It's absolutely infeasible to go at this with an 'out with the old, in with the new' approach; management would balk at it. Again, the nice word 'compromise' has to be mentioned. And this is clearly where the compatibility profile comes in: lots of high-profile customers simply cannot afford to port their software to an entirely different way of working. The mere fact that a compatibility profile had to be established was a clear indicator that something was wrong with how the deprecation mechanism was used.

Quote Originally Posted by thokra View Post
And there is no substantial problem I know of that's solvable with GL2.1 but not with GL 3.1+ - if you have one, stop rambling and prove it with an example.
It's not about the inability to solve a problem but about the inability to redesign an existing solution without blowing it up. Face it, GL 3.x was completely missing an efficient method for small and frequent buffer updates, resulting in horrendous CPU-side primitive caching schemes and similar crutches to reduce the number of buffer uploads. I have written my share of those myself for other projects; all they did was cost a lot of time while providing absolutely no performance increase over immediate mode.
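For anyone who hasn't had the pleasure, here is a minimal sketch of the kind of crutch I mean (the names and the orphaning-via-glBufferData strategy are assumptions, not our actual scheme): small primitives get appended to a CPU-side scratch array and flushed whenever a state change forces a break.

#include <vector>

static std::vector<float> scratch;   // interleaved attributes, refilled every frame

void flushBatch()
{
    if (scratch.empty()) return;
    glBindBuffer(GL_ARRAY_BUFFER, batchBuffer);
    glBufferData(GL_ARRAY_BUFFER, scratch.size() * sizeof(float),
                 scratch.data(), GL_STREAM_DRAW);        // orphan + upload
    glDrawArrays(GL_TRIANGLES, 0, (GLsizei)(scratch.size() / floatsPerVertex));
    scratch.clear();
}
// Every state change calls flushBatch(), so with tiny primitives the batches stay tiny
// and you pay the bookkeeping on the CPU without ever amortizing the upload cost.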
And frankly, this was the ONLY thing sorely missing from GL 3.x.


Quote Originally Posted by thokra View Post
Name one feature you're missing from GL 3.1+ that forced you to rewrite your entire application. I'm very, very curious. If you're answer is gonna be what you repeatedly mentioned, i.e. immediate mode vertex attrib submission is king and everything else is not applicable or too slow (which is a hilarious observation in itself), I refer you to my earlier proposition.

See above: the inability to just put some data into a buffer without insane driver overhead. Yes, simply an efficient method to replace immediate mode draw calls. You may ignore this problem as much as you like, but that doesn't change the cold hard fact that our 'big app's' life depends on it.


Quote Originally Posted by thokra View Post
Liberally invoking draw calls? Since when is someone writing a real world application processing large vertex counts interested in liberally invoking draw calls? Please define liberally, and please state why you can't batch multiple liberal draw calls into one and source the attribs from a buffer object. Otherwise, this is just as vague as everything else you stated so far to defend immediate mode attrib submission.
Again: the code exists, the code needs to continue to exist, and this application's continued operation is one of the backbones of our company.
Again: It's very old, it's very crufty and today would be written in a different way.
Again: All of this doesn't eliminate the fact that I have to deal with the code as it was written more than a decade ago and liberally expanded over the years.

It's a simple question of economics - a rewrite would be too costly. There's no point discussing this. The decision has been made, and I have to deal with it and make do with what I can - which is merely picking out the immediate mode draw calls and replacing them with something that's compatible with a core profile and doesn't bog down performance.


Quote Originally Posted by thokra View Post
More than 15 years isn't enough? Seriously?
You cannot pull the rug out from under existing software in the vain hope that everyone can afford to take the time to reorganize all their data.

Quote Originally Posted by thokra View Post
See? That's what I'm talking about ... the code to do that, except for a few lines of code, is exactly the same. In fact, with persistent mapping, you have to do synchronization inside the draw loop yourself - a task that's non-trivial with non-trivial applications.
Huh? The point of persistent, coherent buffers was precisely to AVOID such schemes! Just write some data into a buffer, issue a draw call and move on, allowing a 1:1 translation of existing immediate mode code with no restructuring and none of the overhead of immediate mode's inefficient way of specifying vertex data.
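For the record, a minimal sketch of what I mean, assuming GL 4.4 / ARB_buffer_storage (the buffer size, names, cursor handling and the omitted per-frame fencing and attribute setup are simplifications, not our actual code):

// Setup, once:
GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
glBindBuffer(GL_ARRAY_BUFFER, streamBuffer);
glBufferStorage(GL_ARRAY_BUFFER, BUFFER_SIZE, NULL, flags);       // immutable storage
float* base = (float*)glMapBufferRange(GL_ARRAY_BUFFER, 0, BUFFER_SIZE, flags);

// Per former glBegin/glEnd block: write the handful of vertices, draw, advance.
memcpy(base + cursor, quadVertices, 4 * floatsPerVertex * sizeof(float));
glDrawArrays(GL_TRIANGLE_FAN, cursor / floatsPerVertex, 4);
cursor += 4 * floatsPerVertex;
// (A fence per frame is still needed so the GPU isn't reading what we overwrite next.)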

Quote Originally Posted by thokra View Post
Persistent mapping is an optimization and it doesn't make rewriting your write hundreds of times easier. You, however, continue to state this perverted notion that persistently mapped buffers are the only viable remedy for something that was previously only adequately solvable with immediate mode ... Have you ever had a look at the "approaching zero driver overhead" presentation from GDC14? Did you have a look at the code sample that transformed a non-persistent mapping to a persistent mapping? Your argument before was that you cannot replace immediate mode with anything else other than persistently mapped buffers. If you're so sure about what you're saying, please explain the supposedly huge difference between an async mapping implementation and a persistent mapping implementation - because you didn't say that async mapping was too slow because of implicit synching inside the driver or something (and that's AFAIK only reportedly so in case of NVIDIA drivers which really seem to hate MAP_UNSYNCHRONIZED), you said you couldn't do it at all.
The problem with a non-persistent mapping (using glMapBufferRange) is that each time I want to write data I have to map the buffer, write some data into it, unmap it again, and only then issue the draw call (since a draw call may not source from a non-persistently mapped buffer). And that process is SLOW!!! Sure, it's doable, but it's far from performant; it was significantly slower than immediate mode, to the point where it bogged down the app. Same for updating with glBuffer(Sub)Data. From day one of working with a core profile, my one and only gripe has been that a low-overhead buffer update mechanism was completely overlooked; everything was geared toward large static buffers, forgetting that not everything is large and static and that not all code is easily rewritten to keep data that way.
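Spelled out, the per-primitive path I'm describing looks roughly like this (illustrative names, not our code):

glBindBuffer(GL_ARRAY_BUFFER, scratchBuffer);
void* dst = glMapBufferRange(GL_ARRAY_BUFFER, 0, dataSize,
                             GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
memcpy(dst, vertexData, dataSize);            // a handful of vertices
glUnmapBuffer(GL_ARRAY_BUFFER);               // must unmap before drawing
glDrawArrays(GL_TRIANGLE_FAN, 0, vertexCount);
// Map + unmap + whatever the driver does behind the scenes, for every tiny primitive.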

That's the main reason I jumped at persistent, coherent buffers: with those, the code is actually FASTER than immediate mode, even on NVIDIA, where glBegin/glEnd is still fast.



Quote Originally Posted by thokra View Post
Again, there is nothing of importance you can't do with core GL 3.1+ that you can do with GL 2.1 - except for quads maybe. You have everything you need at your disposal to go from GL2.1 to core GL 3.0 - and everything you write then is still usable even if you then move directly to a GL 4.4 core context.
Aside from performance in a few border cases, one of which our app unfortunately depends on, sure, you can do everything with GL 3.x core. (And from what I've learned, all these border cases stem from the convenience of using immediate mode drawing as a simple 'draw something on the screen' function, which is why it's so heavily used in legacy code.)
The problem is that making it work may require a fairly extensive rewrite if you are dealing with legacy code from another generation. And it's precisely that extensive rewrite that corporate programmers often won't be able to take on.


Quote Originally Posted by thokra View Post
Even if it means a little more work, it's almost definitely solvable and never a worse solution. If I'm wrong, please correct me with concrete examples.

Yes, unless that 'little more work' you are talking about is considered too much by management, in which case all your theories fall flat on their faces with a loud 'thump'.




Quote Originally Posted by thokra View Post
Wrong again. Developers chose client side vertex arrays before VBOs because for amounts of data above a certain threshold, client side vertex arrays substantially improve transfer rates and substantially reduce draw call overhead. Plus, there is no way of rendering indexed geometry with immediate mode because you needed either an index array or, surprise, a buffer object holding indices.
Client-side vertex arrays - just like static vertex buffers - are nice when you can easily collect larger amounts of data. But they become close to useless if your primitives regularly consist of fewer than 10 vertices, are created dynamically on top of that, and are interleaved with frequent state changes that break a batch. Sure, you can keep collecting them, but then you have to collect the state along with them, and in the end you save no time versus glBegin/glEnd. The world doesn't consist entirely of 100+ vertex triangle strips.
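For comparison, the client-side vertex array path being praised here (legacy GL, purely illustrative names):

glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, cpuVertices);     // plain CPU memory, no buffer object
glDrawArrays(GL_TRIANGLE_STRIP, 0, vertexCount);
glDisableClientState(GL_VERTEX_ARRAY);
// Great when vertexCount is in the hundreds; pointless when it's 4 and the next
// primitive needs different state anyway.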


Quote Originally Posted by thokra View Post
Again, purely speculation - and stating the a buffer object supposedly performs better than immediate mode sometimes ... that's really something to behold. Unless the driver is heavily optimized to batch vertex attributes you submit and send the whole batch once you hit glEnd() or even uses some more refined optimizations, there is no way immediate mode submission can be faster than sourcing directly from GPU memory - not in theory and not in practice.
No speculation. You seem to operate from the assumption that once the data is in the buffer it will stay there. Yes, in that case buffers are clearly the way to go.
But believe it or not, there are usage scenarios where optimizing the path of the data into the buffer matters more than anything else. For a strictly CPU-bottlenecked app it doesn't matter one bit how much data you can draw with a single call; all that matters is finding the fastest way to get your data onto the GPU - and that's exactly my problem. Restructuring the code to allow better batching would add bookkeeping overhead that sits entirely on the CPU, where we are already at the limit and every small addition is felt immediately.



TL;DR, I know - to make it easier to digest, I'll post the summary separately.