No, only some of the problems with the D3D10 method were there at first. The state correlation issue, in an immediate mode API, is basically a minor inconvenience. Yes, IHVs have to deal with it, but so what? They probably had to deal with similar issues in OpenGL too. It’s something you handle at render-time. You’ll notice that the state change penalty is usually assessed when you next draw something, not when you actually change the state.
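To show where that cost lands, here is a minimal Python sketch of an immediate-mode driver that defers validation. Everything in it (ImmediateModeContext, set_state, draw, the state category names) is hypothetical and is not any real driver's or API's interface. It only assumes the common strategy of marking state dirty when it changes and reconciling it at the next draw.

```python
# Hypothetical model of an immediate-mode driver that defers state
# validation: setters only record state and mark it dirty; the cost of
# reconciling correlated state is paid at the next draw.

class ImmediateModeContext:
    def __init__(self):
        self.state = {}          # currently requested API state
        self.dirty = set()       # state categories changed since last draw
        self.validations = 0     # how many times the driver revalidated

    def set_state(self, category, value):
        # Cheap: nothing is sent to the "hardware" yet.
        self.state[category] = value
        self.dirty.add(category)

    def draw(self):
        # The state-change penalty lands here, not in set_state().
        if self.dirty:
            self._validate(frozenset(self.dirty))
            self.dirty.clear()
        return "draw"

    def _validate(self, changed):
        # Stand-in for reconciling interdependent state (for example, blend
        # state against the bound render target format) into hardware state.
        self.validations += 1


ctx = ImmediateModeContext()
ctx.set_state("blend", "additive")
ctx.set_state("depth", "less")
ctx.set_state("blend", "alpha")   # overwrites; still one pending validation
ctx.draw()
ctx.draw()                        # nothing dirty, so no revalidation
print(ctx.validations)            # 1
```

Three state changes cost one validation, and it happens inside the first draw, which is why the penalty shows up at draw time.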
The D3D10 method only shows most of its disadvantages when you try to build a command-queue-style API with it. For an immediate mode API, it's more or less equivalent to OpenGL in terms of IHV implementation and overall performance. But in a command queue API, you get all of the downsides of the OpenGL model with none of the upsides of the PSO model. It's basically the OpenGL model, except that you have to do more work.
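A rough sketch of the command-queue case, again in hypothetical Python rather than any real API. It assumes a driver that must correlate separately bound state objects into one block of hardware state (the stand-in correlate() function), and it compares doing that while recording draws against doing it once when a PSO is created. Both recorders end up with identical command lists; what differs is when the correlation work happens.

```python
# Hypothetical model: recording draws into a command buffer using
# (a) separate D3D10-style state objects or (b) one prebuilt PSO.

def correlate(states):
    # Stand-in for the costly step that turns several independent state
    # objects into a single block of hardware state.
    return ("hw",) + tuple(sorted(states.items()))

class SeparateStateRecorder:
    def __init__(self):
        self.bound = {}
        self.dirty = False
        self.commands = []
        self.record_time_compiles = 0

    def bind(self, kind, obj):
        self.bound[kind] = obj
        self.dirty = True

    def draw(self):
        # The final combination is only known once a draw is recorded, so
        # the driver must correlate the bound objects here, at record time.
        if self.dirty:
            self.commands.append(correlate(self.bound))
            self.record_time_compiles += 1
            self.dirty = False
        self.commands.append("draw")

class PsoRecorder:
    def __init__(self):
        self.commands = []
        self.record_time_compiles = 0  # correlation happened at PSO creation

    def bind_pso(self, pso):
        # pso was built by correlate() up front; recording just copies it.
        self.commands.append(pso)

    def draw(self):
        self.commands.append("draw")


a = SeparateStateRecorder()
for blend in ("opaque", "alpha", "opaque"):
    a.bind("blend", blend)
    a.bind("raster", "cull_back")
    a.draw()

opaque_pso = correlate({"blend": "opaque", "raster": "cull_back"})
alpha_pso = correlate({"blend": "alpha", "raster": "cull_back"})
b = PsoRecorder()
for pso in (opaque_pso, alpha_pso, opaque_pso):
    b.bind_pso(pso)
    b.draw()

print(a.record_time_compiles, b.record_time_compiles)  # 3 0
```

The separate-object recorder repeats the correlation during recording each time the combination changes; the PSO recorder did that work once, before recording began.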
So I would say that the D3D10 approach was not significantly better or worse overall than the OpenGL model for an immediate mode API. But in a command queue API, it's strictly worse.
Furthermore, NVIDIA has demonstrated a great willingness to use extensions to replace any part of the OpenGL API that they feel isn’t fast enough. If they believed that the D3D10 method was significantly superior to the OpenGL model for their hardware, wouldn’t they have introduced immutable state objects via some extension?
I didn’t see them come up with extensions to replace the blend state or viewport state with state objects. ARB_sampler_objects was a collaborative effort, with far more AMD people involved than NVIDIA ones. And NVIDIA has always been rather skittish on VAOs. So I see no evidence that NVIDIA was sold on the D3D10 method being better for their hardware.
You may be confusing love for D3D10 overall with love for any particular element of the API.
As for your notion of “herd” mentality on PSOs…
AMD’s Mantle was really the first of these next-gen APIs, and sadly there’s very little information readily available about it. However, there is some evidence that Mantle uses something rather like PSOs (the line about rolling shader stages into a “single object” is telling).
Apple Metal and D3D12 could be said to be taken from Mantle, as both were announced well after AMD's effort. However, it should be noted that there are significant differences here.
Metal in particular is clearly designed for mobile hardware; it’s not blindly following a “herd”. Specifically, their equivalent of a PSO doesn’t include one very important thing: framebuffers. Why?
Because render pipeline states (RPSs) are allowed to change within a command queue, but framebuffers cannot. This is done because changing framebuffers on most mobile hardware is a very, very costly operation. So they designed their API to force you to start a new queue (in Metal's own terms, a new render command encoder), which is clearly a heavy-weight operation, if you want to change framebuffers.
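The Metal rule described above can be modeled roughly as follows. This is a hypothetical Python sketch loosely inspired by Metal's render passes, not Metal's actual API: the class and method names are invented, and a framebuffer is just a label. The point is structural: there is deliberately no way to change the framebuffer inside a pass, only to end the pass and begin another.

```python
# Hypothetical model of a Metal-style rule: attachments are fixed when a
# pass begins; pipeline state may change freely inside it.

class RenderPass:
    def __init__(self, framebuffer):
        self.framebuffer = framebuffer   # fixed for the pass's lifetime
        self.pipeline = None
        self.draws = []
        self.ended = False
        # Note: deliberately no set_framebuffer() method.

    def set_pipeline(self, pipeline):
        if self.ended:
            raise RuntimeError("pass already ended")
        self.pipeline = pipeline         # allowed any number of times

    def draw(self):
        if self.ended:
            raise RuntimeError("pass already ended")
        self.draws.append((self.pipeline, self.framebuffer))

    def end(self):
        self.ended = True

class CommandBuffer:
    def __init__(self):
        self.passes = []

    def begin_pass(self, framebuffer):
        # The only way to change render targets: end the current pass and
        # begin another (on tile-based hardware, typically a tile flush).
        if self.passes and not self.passes[-1].ended:
            raise RuntimeError("end the current pass first")
        new_pass = RenderPass(framebuffer)
        self.passes.append(new_pass)
        return new_pass


cb = CommandBuffer()

gbuffer_pass = cb.begin_pass("gbuffer")
gbuffer_pass.set_pipeline("opaque")
gbuffer_pass.draw()
gbuffer_pass.set_pipeline("alpha_test")   # pipeline change: fine mid-pass
gbuffer_pass.draw()
gbuffer_pass.end()

# Rendering to a different target requires a whole new pass.
final_pass = cb.begin_pass("backbuffer")
final_pass.set_pipeline("composite")
final_pass.draw()
final_pass.end()

print(len(cb.passes))  # 2
```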
NV_command_list does something somewhat similar; you can't change the framebuffers themselves within a single token stream, nor can you change the images you're rendering to. But their PSO doesn't really capture the framebuffer; it captures the image formats and binding qualities, not the specific bound images. So you can use the same PSO with different sets of images, so long as those sets of images are all compatible, though you do have to use a new token stream (a.k.a. a command queue).
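Here is a hypothetical sketch of the NV_command_list-style arrangement described above: the PSO records attachment formats rather than specific images, so any image set with matching formats is compatible, yet switching to a different set still starts a new token stream. None of these names come from the actual extension; Image, Pso, compatible() and build_streams() are illustrative only, and the format strings are arbitrary labels.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Image:
    name: str
    fmt: str          # arbitrary format label in this sketch, e.g. "rgba8"

@dataclass(frozen=True)
class Pso:
    # Captures attachment *formats* only, never specific images.
    color_formats: tuple
    depth_format: str

def compatible(pso, colors, depth):
    # Any image set whose formats match the PSO's may be used with it.
    return (tuple(img.fmt for img in colors) == pso.color_formats
            and depth.fmt == pso.depth_format)

def build_streams(draws):
    # draws: list of (pso, colors, depth). A new token stream starts
    # whenever the bound images change, even if the PSO is reused.
    streams = []
    current = None
    for pso, colors, depth in draws:
        if not compatible(pso, colors, depth):
            raise ValueError("image formats do not match the PSO")
        images = (tuple(colors), depth)
        if images != current:
            streams.append([])
            current = images
        streams[-1].append(pso)
    return streams


pso = Pso(color_formats=("rgba8",), depth_format="d24s8")
scene = [Image("scene", "rgba8")]
reflection = [Image("reflection", "rgba8")]
depth0 = Image("depth0", "d24s8")
depth1 = Image("depth1", "d24s8")

streams = build_streams([
    (pso, scene, depth0),
    (pso, scene, depth0),
    (pso, reflection, depth1),  # same PSO, different images: new stream
])
print([len(s) for s in streams])  # [2, 1]
```

One PSO serves both image sets because their formats match; only the change of images forces the second stream.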
D3D12 by contrast appears to stick framebuffers entirely into the PSO’s state. And therefore, you can change framebuffers as often as you change any other PSO state.
So it seems clear that the details of these APIs differ. And that suggests careful thought, rather than succumbing to some form of "herd" mentality. Sure, they all use the PSO approach, but their differences suggest that they're not blindly applying the same idea.
Furthermore, while I don't trust NVIDIA to play nice with others (unless it serves their interests), there is one field of endeavor in which NVIDIA has proven themselves highly adept: making their stuff go as fast as possible. They are perfectly willing to make numerous changes to the API, whether via proprietary extensions or in tandem with others, that make their hardware perform beautifully. They make no compromises on this, and they do not succumb to "herd" mentality when it comes to performance (see their bindless graphics stuff as an example. Pointers in shaders?).
NV_command_list is all about performance. So if they adopted the immutable PSO approach to this extension, it is reasonable to assume that they have actual working knowledge of what’s faster on their hardware. Whether it’s faster on everyone’s hardware is up for debate.
Oh sure, it's possible that they're all following the same wrong idea from Mantle. But that would require that AMD got it wrong first; Apple copied it, changed it, and still got it wrong; and then NVIDIA copied it and kept it wrong.
It just seems rather unlikely.