Hardware vs. software - again

I gave some more thought to the matter of finding out about software rendering, and decided to start a new, uncluttered thread with these thoughts.

I see two different uses for finding out about software rendering: detection and prevention. Detection means finding out that software rendering took place. Prevention means finding out, before it happens, that software rendering will take place, so that the program can adapt and avoid it.

I will start with detection, which I currently see as more important, and which should be easier to implement. My own experience, and a lot of questions in the “advanced OpenGL” forum, testify that getting software rendering is quite common, yet it’s often not recognized, and even when we guess that this is the problem, there is no way to verify it. It would therefore be valuable to know that software rendering did take place. This is helpful not only for debugging and in-house testing; the test can be included in the software to allow a user who suffers from a case of “very slow rendering” to diagnose the problem and send a meaningful report to the developer (“your f*&^ing game is using software rendering on my expensive overclocked GeForce3”).

Detecting that software rendering has occurred could be done using a mechanism similar to glGetError, as suggested by Dodger in the original thread. That is, when software rendering takes place, a flag is set, and it can be read (and at the same time reset) by the program. I think that this should not be difficult to implement - the ICD can set this flag when it performs software rendering. The flag can’t be guaranteed to always be valid (by this I mean, you can’t really tell when it will be set), but it should be valid after a call to glFinish() or equivalent. (It’d be good to have other ways to guarantee its validity - perhaps something similar to NV_fence - to make it easier to find where within the frame software rendering happened.)

While I’m using the term “software rendering”, it actually covers several different operations. In the original message I used it for “triangle rasterisation”, but it could also mean “line rasterisation”, “buffer operations” and “vertex processing”. By software buffer operations I mean operations like copy-to-texture and use of the accumulation buffer that involve copying from card memory to system memory, and would not have done so in a full hardware implementation. By “vertex processing” I mean T&L and vertex shaders. (Are there other categories you can think of?) These different categories should be distinguishable in the result (as different flags, or bits within one flag), as in the sketch below.
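To make this concrete, here is a minimal sketch of what such a detection mechanism might look like. Everything in it is invented for illustration - the SW_*_BIT tokens and the glGetSoftwareFallbackMask entry point don’t exist in any real OpenGL implementation; the sketch only shows the glGetError-style read-and-reset semantics and the per-category bits:

    #include <stdio.h>
    #include <GL/gl.h>

    /* Hypothetical tokens: one bit per fallback category described above.
     * None of these exist in real OpenGL; they are purely illustrative. */
    #define SW_TRIANGLE_RASTER_BIT    0x0001
    #define SW_LINE_RASTER_BIT        0x0002
    #define SW_BUFFER_OP_BIT          0x0004  /* accum buffer, copies through system memory, ... */
    #define SW_VERTEX_PROCESSING_BIT  0x0008  /* T&L / vertex shaders run on the CPU */

    /* Imaginary entry point: returns the accumulated fallback bits and clears
     * them, the same way glGetError() returns and clears an error flag. */
    GLbitfield glGetSoftwareFallbackMask(void);

    void report_fallbacks(void)
    {
        GLbitfield mask;

        glFinish();  /* per the proposal, the mask is only guaranteed valid after a finish */
        mask = glGetSoftwareFallbackMask();

        if (mask & SW_TRIANGLE_RASTER_BIT)
            fprintf(stderr, "triangles were rasterised in software\n");
        if (mask & SW_VERTEX_PROCESSING_BIT)
            fprintf(stderr, "vertex processing fell back to the CPU\n");
    }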

Now to prevention. While detection allows finding out that software rendering has happened, it doesn’t make it easy to pinpoint the cause. Being able to query for a software rendering condition before it happens allows diagnosing the problem better, and possibly eliminating software rendering seamlessly at runtime by dropping some features. Running a short “prevention” style check after context creation can help identify the problematic conditions without having to render anything, which can be invaluable for testing when some features are used only in certain places in the program (such as only in certain scenes).

The main problem with this is that it’s more difficult to implement. There is no way to guarantee that software rendering will or will not take place, as it depends heavily on what the program is doing. However, there are cases where it is possible to say “a feature you have asked for will result in software rendering if you use it.” This is naturally only possible when the program has asked for a feature. For example, any OpenGL 1.2 context provides 3D textures. Since this feature is not specifically requested, it can’t be reported as being in software. Querying for EXT_texture3D, however, is a specific request, so the guarantee (that software rendering will be used for it) can be attached to it.

Such an indication that software rendering will happen (and I’m mainly talking about triangle rasterisation) can be made available both in mode selection (WGL) and as a function for querying extensions. In addition, a query to test whether the current state will lead to guaranteed software rendering can be made available. For example, if line smoothing is enabled, a query of “will drawing lines result in software rendering?” should return true if that is what line drawing will do. The result of this query should be interpreted as “yes” or “I don’t know”, not as “yes/no”.
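Again, just to make the semantics concrete, a sketch of how such a query might be used. The FallbackStatus type and the glWillFallBackToSoftware entry point are made up; the point is the “guaranteed software” vs. “don’t know” return value:

    #include <GL/gl.h>

    /* Imaginary query: FALLBACK_GUARANTEED means "the current state is
     * guaranteed to hit the software path for this primitive type";
     * FALLBACK_UNKNOWN means the driver can't or won't say, and should be
     * treated as "maybe fine". */
    typedef enum { FALLBACK_UNKNOWN = 0, FALLBACK_GUARANTEED = 1 } FallbackStatus;
    FallbackStatus glWillFallBackToSoftware(GLenum primitive_type);

    void choose_line_quality(void)
    {
        glEnable(GL_LINE_SMOOTH);
        if (glWillFallBackToSoftware(GL_LINES) == FALLBACK_GUARANTEED)
            glDisable(GL_LINE_SMOOTH);  /* drop the eye candy rather than rasterise lines in software */
    }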

While prevention is more difficult to implement than detection, it should still be quite possible. The “is guaranteed to be software rendering”, or “yes / don’t know” interpretation, allows the ICD programmers to not test conditions that may be impossible or just too complicated to test. Yet, this can provide the benefits of prevention in most cases, and in the cases when this is not possible, it would always be possible to know in retrospect that software rendering occurred, if detection is also available.

I’d appreciate comments, especially from Matt and others “in the know”.

My answer remains the same.

  • Matt

Can you please explain why? Your original answer was “there are a number of other practical reasons why this is a bad idea”. You never explained beyond that. Can you at least explain some of these practical reasons?

He’s probably not allowed to, as it would require giving us internal details about the OpenGL implementation.

Possibly, but I would have hoped that he could explain things in general enough terms. It’s frustrating not to know what’s so wrong in the things that seem pretty straightforward to me (like knowing that software triangle rasterisation occurred, or adding tests like “if this is 16 bit mode on a GeForce, and you asked for a stencil, we’ll let you know that you’ll be rendering in software”).

Reminds me of asking Intel what extensions were supported on the i810. After a lot of back and forth, I finally got the reply “this information is confidential”, and that I could ask about specific extensions and they could tell me yes or no. I asked them how confidential this could be, considering that I could just go to an i810-based PC and get the extension string (and that I asked only because I didn’t have access to one), but I still haven’t received a reply.
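(For the record, dumping the extension string yourself takes only a few lines of standard OpenGL once you have a current context; the has_extension helper name below is just mine:)

    #include <stdio.h>
    #include <string.h>
    #include <GL/gl.h>

    /* Requires a current GL context.  Prints the full extension string and
     * does a (naive) substring check for one extension name. */
    int has_extension(const char *name)
    {
        const char *ext = (const char *)glGetString(GL_EXTENSIONS);
        if (!ext)
            return 0;
        printf("GL_EXTENSIONS: %s\n", ext);
        return strstr(ext, name) != NULL;  /* a robust check would match whole tokens */
    }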

Okay, I realise that it’s not the same thing (it was just a nice anecdote), but I still hope that Matt could say something about this.

Yup, it’s frustrating not receiving a full answer, but I’m sure there are good technical (and not political) reasons.

So my suggestion is, why not do a kind of performance test yourself?
If a call like glIsThisDoneSoftware were available, I’m sure that some drivers would be made to lie.

Besides, “hardware acceleration” is not guaranteed to be faster than software. Imagine running on a CPU that can actually do the job quicker than the GPU. Maybe it doesn’t happen nowadays, but I remember having an S3 ViRGE and a P200 MMX. Software beat the crap out of the S3. The GL drivers were a complete mess!

V-man

I think that I answered these questions in the original thread, but I’ll answer them briefly again:

I probably will make a benchmark, but it will be more limited, take longer to reach conclusions, be more work to develop, and generally be an inferior solution (for both developer and user) compared to an API-based test.
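Something along these lines is what I have in mind: time a known workload, and treat an absurdly slow result as a sign of a software path. The draw_test_primitives helper and the 50 ms threshold are placeholders of my own; a real test would need a tuned workload and threshold per suspect feature:

    #include <time.h>
    #include <GL/gl.h>

    /* App-supplied: draws a fixed, known batch of the primitives/state
     * combination being tested. */
    extern void draw_test_primitives(void);

    /* Assumes a current context.  Returns non-zero if the workload took
     * suspiciously long, which usually (but not always) means software. */
    int looks_like_a_software_path(void)
    {
        clock_t start;
        double ms;

        glFinish();                     /* drain any pending work first */
        start = clock();
        draw_test_primitives();
        glFinish();                     /* wait until the rendering actually completes */
        ms = 1000.0 * (double)(clock() - start) / CLOCKS_PER_SEC;

        return ms > 50.0;               /* arbitrary threshold, tune per feature */
    }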

The difference in performance between graphics chips and CPUs is growing, so assuming this trend continues, I can’t foresee a time when CPU rendering will be faster than, or even close in speed to, hardware rendering. I don’t see a reason why the trend won’t continue.

BTW, AFAICR the ViRGE didn’t have an OpenGL driver. It’s the D3D drivers that were a mess. (I had a ViRGE.)

I read the other thread. Most people covered the basics, especially the fact that texture filtering is primarily what slows down software rendering. That’s pretty much the reason why a GeForce is not very useful for CAD apps, but is superb for games.

What you were asking for is knowing when software rendering takes place. Perhaps you should instead ask “what part of the API can be done by the hardware?”, and not worry about the cases where a software fallback will occur.

Besides, I think we all know more or less what will bring down FPS drastically. So you don’t need to make a complete benchmarking suite.

Don’t get me wrong. I would like to have this feature too. But the solution is not there and hasn’t been there from the beginning. So don’t get overexcited and write up new threads. This stuff will be read and considered, I’m sure.

V-man

Actually, the GeForce should be pretty useful for CAD apps, due to T&L. If it’s not useful, it may be because the CAD program is using AA lines, and the GeForce does them in software (which has been the case in the past, don’t know how it is now). Again a hardware vs. software thing.

It’s not true that the situation is the same as in the past. In the “old days”, before OpenGL 1.2 became common, if the hardware didn’t have a stencil buffer, the ICD wouldn’t have advertised one, and it was therefore obvious that stencil rendering wasn’t accelerated. Not a perfect solution for software detection, but a partial one. You also pretty much knew that if an extension was available, it exposed some hardware feature. These days that’s no longer true - anything you ask for, including extensions, can result in software rendering.
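You can still get the whole-format version of that information under WGL: the generic-format flags tell you whether a pixel format is handled by an ICD, an MCD or Microsoft’s software renderer. They just say nothing about per-feature fallbacks inside an accelerated format, which is exactly the gap I’m talking about:

    #include <windows.h>

    /* Whole-format check only: says whether this pixel format is driver-backed
     * at all, not whether individual features (stencil, AA lines, ...) will
     * fall back to software within it. */
    int pixel_format_is_accelerated(HDC hdc, int format)
    {
        PIXELFORMATDESCRIPTOR pfd;
        DescribePixelFormat(hdc, format, sizeof(pfd), &pfd);

        if (!(pfd.dwFlags & PFD_GENERIC_FORMAT))
            return 1;   /* ICD format: hardware driver */
        if (pfd.dwFlags & PFD_GENERIC_ACCELERATED)
            return 1;   /* MCD: generic format, but driver-accelerated */
        return 0;       /* Microsoft's generic software implementation */
    }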

The problem is that we don’t always know what results in software rendering. We typically develop on a specific platform (the GeForce family in 32 bit colour, most likely). We then move to another platform, suddenly get a slow frame rate, and it takes some time to find out that the reason is software rendering, since we don’t usually expect it. It’s true that experience helps, and that it’s not a truly serious problem, but still, an easy way to know you’re doing software rendering would be helpful.

The reason I started a new thread was mainly to return to discussing the main topic, since I thought that the original was getting a bit off topic, and since there were things I didn’t consider originally and thought I’d detail better in this post.

It’s a pity that what this comes down to (for me) is:

Cards like the GeForce2 give good performance, for the most part, for the CAD-like programs I work on.

The drivers for these cards are optimized for Quake III-like games.

I get annoying slowdowns occasionally because I’m not writing a game, so I fall off the fast path - but it’s not something I can just hand wave away, saying “Oh, I’ll just profile my code at app startup and just leave out the swap so the user doesn’t see it”.

I ain’t writin’ a game here, folks!

It’s doubly sad when the developers of Quadro 2-class systems (or even high-end “workstation” OpenGL hardware) give the same reasons:

“It’s slow? Well, you fell off the fast path”
-ok, how do I stay on it?
“Can’t tell you that. Just profile it at app startup”

What, me bitter?

This is something where we could help ourselves instead of bitching to the driver makers. All we need is a table like the one below (btw, the info in it is bogus, it’s just to show the idea):

                            radeon     | tnt1     | savage3
    3D textures             HW (1.23)  |          | SW
    antialiased lines       SW         | HW (6.5) |
    max texture size        ...        | ...      | ...
    etc.

The numbers are the driver version in which the functionality arrived in hardware. Blanks mean it’s not possible. NOTE: there will probably be a lot of little footnote numbers in the table, but some info is better than none.

Let’s help ourselves (I’m willing to find out the info about the Vanta).
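If people start filling such a table in, an app could fold it into something like this, keyed off the renderer string. The entries below are placeholders (like I said, the info is bogus) and the chip substrings are only examples of what renderer strings look like:

    #include <string.h>
    #include <GL/gl.h>

    /* Placeholder capability table -- the values are bogus, it only shows the
     * shape.  A real one would need per-driver-version footnotes. */
    struct chip_caps {
        const char *renderer_substring;   /* matched against glGetString(GL_RENDERER) */
        int hw_3d_textures;
        int hw_aa_lines;
    };

    static const struct chip_caps caps_table[] = {
        { "Radeon",   1, 0 },
        { "RIVA TNT", 0, 1 },
        { "Savage",   0, 0 },
    };

    static const struct chip_caps *lookup_caps(void)
    {
        const char *renderer = (const char *)glGetString(GL_RENDERER);
        size_t i;

        if (!renderer)
            return NULL;
        for (i = 0; i < sizeof(caps_table) / sizeof(caps_table[0]); ++i)
            if (strstr(renderer, caps_table[i].renderer_substring))
                return &caps_table[i];
        return NULL;  /* unknown chip: assume nothing */
    }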

ET3D: It’s not fair to say that this is a new problem. In fact, it was a much bigger problem several years ago, when large portions of the OpenGL API were completely unaccelerated on the majority of drivers.

I think there is also a lot of fundamental confusion about “HW vs. SW” and “fast vs. slow”. I would assert that you, the app developer, don’t actually care about HW vs. SW; you care about fast vs. slow. This is not an on-or-off thing, either, since there are different degrees of SW and many different degrees of performance. Furthermore, you’ll have a very hard time building a table, because it’s usually not just one feature that does it; it’s combinations of multiple features at the same time. And when you start asking for that kind of information, well, some of it is definitely not the kind we’re going to share.

If anyone has real problems with this in a real app, they should consult with our devrel people to figure out what they need to change in their app, or, what we need to change in our driver.

  • Matt

Matt, I agree that there were some real problems some years ago. Some cards didn’t have any OpenGL acceleration, or had just a “miniGL” driver for Quake-engine support. A lot of hardware didn’t support all OpenGL features and silently failed to use them, that is, there was no indication to the program that the feature wasn’t available. It may be argued that providing a software-only rendering path in these cases is preferable (although I’m not sure that it is). So I agree that what I said might have beautified reality a bit.

About “fast vs. slow”: yes, this is indeed what I care about, but it’s more than that. I think that I described it somewhere, but it may be buried in my long and convoluted first post in this thread, or in another thread, so I’ll explain it again briefly.

I do care first and foremost about speed, but I also care about potential speed. That is, I see a major difference between running on a chip that is too slow for my program’s basic requirements, and running on a chip that would be fast enough if I didn’t drop into software rendering. In the latter case, if I can drop some eye candy for good frame rates, I’d like to know this.

This feature is mainly a convenience. It’s a time saver for both developer and user. From the user’s point of view, it saves the need to tweak options manually, or to run benchmarks I provide, just in order to get basic performance or image quality (which also affects the initial perception of my software). From the developer’s point of view, it provides a means to detect sources of bad performance more easily, and to adjust to them at runtime when running on hardware that I haven’t tested with.

Even though in many cases it’s a combination of features that may cause a need to drop to software rendering, or it could be just a code path that is not yet optimised in the drivers, there are cases that are much more obvious, and could easily be tested for. Things like using stencil or accumulation buffers, or 3D textures, or antialiased lines - unavailable hardware features that can easily be listed in a table. I admit that I’m not a driver writer, and certainly don’t know the innards of your chips, but I would imagine that such features, that can be put in a table, can also be provided by the driver to a program through some mechanism.

I don’t claim that this is a complete solution, but I also don’t require a complete solution. Even if not all potential software rendering scenarios are detectable, it should be possible, I imagine, to cover a lot of the most common cases that would be interesting.