Two issues with VBO

I asked the first part of this some time ago on the beginners’ forum. After weeks of waiting I got no reply, so I’m posting it here.

1- About the (now not so) new NV whitepaper on VBO.
The file is named “Using-VBO.pdf” and it was mentioned on the opengl.org homepage, so I’m sure you know which paper I mean.
It says that VertexAttribPointer and similar calls such as VertexPointer or ColorPointer carry a very high performance penalty, and it suggests avoiding them as much as possible.
I still wonder how this could be possible, since these calls also pass some very useful information (the number of components, for example). Besides, avoiding these calls, even where possible, looks somewhat cumbersome to me.
Has anyone measured the “high performance hit” associated with these calls? Unless there are very good reasons to optimize (say, a performance loss above 30%), I would rather not do this kind of management. It looks pretty difficult and somewhat VAR-like to me.
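
For reference, avoiding the pointer calls "as much as possible" usually comes down to filtering redundant ones. Below is a minimal sketch in C, assuming a wrapper that remembers the last setup of each generic attribute. The names `AttribSetup`, `set_attrib_pointer`, `driver_set_pointer` and the `MAX_ATTRIBS` limit are hypothetical, and the stub only counts calls where a real wrapper would bind the buffer and call the GL pointer function. It is not taken from the whitepaper.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ATTRIBS 16  /* assumed limit, for illustration only */

/* Everything a pointer-style call specifies for one generic attribute. */
typedef struct {
    unsigned buffer;      /* buffer object bound when the pointer was set */
    int      components;  /* 1..4 */
    unsigned type;        /* GL type enum, e.g. 0x1406 for GL_FLOAT */
    int      stride;      /* bytes between consecutive elements */
    size_t   offset;      /* byte offset into the buffer */
    int      valid;       /* 0 until the first forwarded call */
} AttribSetup;

static AttribSetup cache[MAX_ATTRIBS];
static int pointer_calls = 0;  /* calls that actually reach the "driver" */

/* Hypothetical stand-in for the real bind + pointer calls. */
static void driver_set_pointer(unsigned index, const AttribSetup *s)
{
    (void)index;
    (void)s;
    ++pointer_calls;
}

/* Forwards the setup only if it differs from the cached one.
   Returns 1 if the call was forwarded, 0 if it was filtered out. */
static int set_attrib_pointer(unsigned index, unsigned buffer, int components,
                              unsigned type, int stride, size_t offset)
{
    assert(index < MAX_ATTRIBS);
    AttribSetup *c = &cache[index];

    if (c->valid && c->buffer == buffer && c->components == components &&
        c->type == type && c->stride == stride && c->offset == offset) {
        return 0;  /* identical setup already active: skip the costly call */
    }

    c->buffer = buffer;
    c->components = components;
    c->type = type;
    c->stride = stride;
    c->offset = offset;
    c->valid = 1;
    driver_set_pointer(index, c);
    return 1;
}
```

Whether this bookkeeping pays off depends on how expensive the real call is on a given driver, which is exactly the measurement asked about above.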

2- This is new, not mentioned in the previous post. It needs some background, however.
The vertex management API I’ve developed wraps most of the hassle of working with vertices. As an added plus, it always uses the fastest available memory (VBO if available, according to the buffer update flags).
A component I am developing is likely to create dozens (hundreds in the worst case) of small buffers. Since this component does not require high performance and is unlikely to be a bottleneck, I wanted to use standard arrays for it. The reason: choosing among a few objects is fast, while choosing among a ton of objects is much slower. I know for sure this was true for NV vertex programs (it was stated in a pretty old whitepaper), and I think it should still be true now and should also apply to other kinds of objects such as FPs and textures.
So, to avoid cluttering precious VBO management resources (which need to stay fast for the high-performance components), I wanted to go with standard arrays. The problem is that this requires mixing conventional arrays with VBOs.
In practice, that requires me to check whether a VBO has been enabled for an array and disable it. In pseudocode, it looks like this:

// This will be referred to as code (2a)
foreach vertexAttribute
    if(used)
        Bind buffer.
        Pass vertex attribute information (also take a look at (1) for this).
        Enable that array (generic attribute 0 is assumed to be always enabled).
Bind index buffer.
Draw all the vertices with current settings.
Unbind index buffer.
foreach vertexAttribute
    if(used)
        Bind buffer 0.
        Make sure the current array is unbound from the buffer (this requires a VertexAttribPointer call as far as I know).
        Disable attribute array (attribute 0 is never disabled).
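
To make the per-draw cost of (2a) concrete, here is a minimal sketch in C that runs the same sequence against a tiny simulated state and counts the state-changing calls. `GlState`, `draw_2a` and the helper functions are hypothetical stand-ins for the real bind, pointer and enable calls, and `MAX_ATTRIBS` is an assumed limit. It models only the bookkeeping, not actual GL behavior.

```c
#include <assert.h>

#define MAX_ATTRIBS 16  /* assumed limit, for illustration only */

typedef struct {
    unsigned array_buffer;                /* currently bound array buffer */
    unsigned index_buffer;                /* currently bound index buffer */
    unsigned attrib_buffer[MAX_ATTRIBS];  /* buffer latched by each pointer */
    int      attrib_enabled[MAX_ATTRIBS]; /* array enable flag per attribute */
    int      calls;                       /* state-changing calls issued */
} GlState;

/* Hypothetical stand-ins for the real GL entry points. */
static void bind_array(GlState *s, unsigned b) { s->array_buffer = b; s->calls++; }
static void bind_index(GlState *s, unsigned b) { s->index_buffer = b; s->calls++; }
/* A pointer call latches whatever array buffer is bound at that moment. */
static void set_pointer(GlState *s, int i) { s->attrib_buffer[i] = s->array_buffer; s->calls++; }
static void set_enabled(GlState *s, int i, int on) { s->attrib_enabled[i] = on; s->calls++; }

/* buffers[i] == 0 means attribute i is not used by this draw. */
static void draw_2a(GlState *s, const unsigned buffers[MAX_ATTRIBS],
                    unsigned index_buffer)
{
    assert(s != 0 && buffers != 0);

    for (int i = 0; i < MAX_ATTRIBS; ++i) {
        if (!buffers[i]) continue;
        bind_array(s, buffers[i]);
        set_pointer(s, i);
        if (i != 0) set_enabled(s, i, 1);  /* attribute 0 stays enabled */
    }

    bind_index(s, index_buffer);
    /* the draw call itself would go here */
    bind_index(s, 0);

    for (int i = 0; i < MAX_ATTRIBS; ++i) {
        if (!buffers[i]) continue;
        bind_array(s, 0);
        set_pointer(s, i);                 /* re-latch with buffer 0: detached */
        if (i != 0) set_enabled(s, i, 0);  /* attribute 0 is never disabled */
    }
}
```

In this model a draw that uses attribute 0 plus one other attribute already issues 12 state-changing calls around a single draw call.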

This looks somewhat cumbersome to me. Is there a better way to unbind the arrays? I was thinking about pushing and popping the client array state. Would that work? Do you have any suggestions?

Thank you!

EDIT: a personal comment, fixed the order of the pseudocode instructions, better layout and some (admittedly useless) instructions. Made it clearer that I use only generic attributes with VertexAttribPointer.

EDIT:
I realized there’s also another way to do (2), but I have mixed feelings about it. I would appreciate advice from someone who maintains a large code base.
Simply put, I realized I could just do:

// This will be referred to as code (2b)
Draw buffered things.
Disable buffering for all vertex attributes.
Draw unbuffered things.

While this could potentially be optimal, it has to be placed in exactly the right spots. Right now this is pretty easy, but it could become an issue in the future.
Another problem is that the external program has no concept of buffered and unbuffered arrays. One could specifically allocate a buffered array, but by default the provided interfaces do not allow that. Since this concept is hidden away, it would be conceptually really ugly to call a function just to disable vertex buffers.
Performance-wise, (2b) is far better in most cases. While I’m somewhat worried about my ability to maintain it correctly, I can’t imagine needing more than a few calls to it per frame.
The first method (2a) is hidden deep in the API and totally transparent to everything outside the API itself, but it adds so much overhead that I feel I can’t sleep at night knowing it’s so lame. Considering I could have hundreds of VBOs flying around, I feel it could be a real performance hit.
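
The difference in overhead can be estimated with a back-of-the-envelope count. The sketch below is in C and assumes that detaching one attribute costs three calls, as in the (2a) pseudocode (bind buffer 0, respecify the pointer, disable the array). The function names and the sample numbers in the usage note are hypothetical, not measurements.

```c
#include <assert.h>

/* Calls assumed to detach one attribute: bind buffer 0, respecify the
   pointer, disable the array. */
enum { CALLS_PER_DETACH = 3 };

/* (2a): every buffered draw detaches its own attributes afterwards. */
static long teardown_calls_2a(long draws_per_frame, long attribs_per_draw)
{
    assert(draws_per_frame >= 0 && attribs_per_draw >= 0);
    return draws_per_frame * attribs_per_draw * CALLS_PER_DETACH;
}

/* (2b): attributes are detached only when switching from buffered to
   unbuffered drawing, for every attribute that might still be attached. */
static long teardown_calls_2b(long switches_per_frame, long attribs_in_use)
{
    assert(switches_per_frame >= 0 && attribs_in_use >= 0);
    return switches_per_frame * attribs_in_use * CALLS_PER_DETACH;
}
```

With, say, 300 buffered draws of 4 attributes each, (2a) spends 3600 calls per frame on teardown alone, while (2b) with 2 switches over 16 attributes spends 96. The actual cost per call is driver-dependent and would still need to be measured.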

Besides those two differences, I tested both methods and they both seem to work correctly. I have not tested them thoroughly, but the tests went very smoothly and I don’t expect any kind of problem.

I understand it’s difficult to offer a suggestion without knowing the details, but I would like to get some feedback; maybe something will come to mind with some help.

[This message has been edited by Obli (edited 01-31-2004).]