Pipeline Newsletter Volume 4

http://opengl.org/pipeline/

lpMapBuffer – oh yeah!

void lpDrawElements(LPenum mode, LPsizei *count,
LPsizeiptr *indices,
LPsizei primCount,
LPsizei instanceCount)
Hmm… No indices from buffer objects?

Everything about this document is satisfying. Except one thing.

No date is mentioned. Should I be concerned about that? I mean, it all seems pretty solid at this point (except for the misc. state stuff). Are you still aiming for a summer or SIGGRAPH release?

Commentary and minor questions:

Longs Peak is interesting. First, it seems, from the example given, that the notion of a default framebuffer went away. So, how does one actually draw to the screen? Is that a non-GL function (like wgl/glx/agl)? Do we have to do our own buffering?

Texture rectangle types seem to have gone away. Will hardware that supported them but not NPOTs be able to handle NPOTs with no mip levels?

I find it interesting that programs are still not instanced. If you want to use a program object over multiple objects, you have to re-bind all the attached objects. Should we expect program object copy operations to be cheap (ie, not storing multiple copies of the compiled shader, etc)?

I also recall, from the last Newsletter, that it was possible to create separate program objects for each shader type, and bind them to the context separately. Is that still doable? Because it’s not mentioned on the diagram that multiple programs can be attached.

The new lpMapBuffer turns me on…

BTW, how is non-serialized map buffer access not incredibly dangerous? Or is this one of those, “Our developers are big boys now. They can take care of themselves,” kind of things?

What I want to know more about:

Format objects. These have been danced around for far too long. I want specific knowledge of what they can do.

Hmm… No indices from buffer objects?
I assume if one is bound to the VAO, then the “LPsizeiptr” will act as an integer offset to start from. Otherwise, it is a pointer.
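
A rough sketch of that guess, in plain C with no GL calls. The `BoundIndexBuffer` type and `resolve_indices` are made-up names for illustration, and the offset-versus-pointer rule is only the assumption stated above, not anything confirmed for LP:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an index buffer attached to a VAO; not an LP type. */
typedef struct {
    const uint16_t *storage;  /* buffer contents, or NULL if nothing is bound */
    size_t          byteSize; /* size of the buffer in bytes */
} BoundIndexBuffer;

/* Interpret the "indices" argument as guessed above: a byte offset into the
 * bound buffer if there is one, otherwise a plain client-memory pointer. */
const uint16_t *resolve_indices(const BoundIndexBuffer *bound, intptr_t indices)
{
    if (bound != NULL && bound->storage != NULL) {
        assert(indices >= 0);
        assert((size_t)indices <= bound->byteSize);
        assert((size_t)indices % sizeof(uint16_t) == 0); /* keep alignment */
        return bound->storage + (size_t)indices / sizeof(uint16_t);
    }
    return (const uint16_t *)indices; /* no buffer bound: treat as a pointer */
}
```

The same trick is what current GL does with the pointer argument of glDrawElements when an element array buffer is bound.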

First of all, let’s not start discussion here, so we don’t delay ARB at their work :smiley:

Since I’m an addict I cannot resist to put some comments. Even on new function prefix, hehe :wink: I’ll just put comments as I read through. When I’m done I’m gonna drop dead. I’m a dead man walking anyway (MOV ECX, 6 \ REP CALL get_up_6_30_AM_go_to_work_and_return_home_by_11_00_PM).
If you plan to read this post, be advised: frustration ahead (don’t worry, I have 2 weeks off in July).

  1. “gl”/“lp” prefix - If that would be up to me it would be “og”, but it’s not up to me :slight_smile: ogBindBuffer = “Oh, Goodness! Bind buffer”. Yeah, luckily it’s not up to me.

  2. Debug context - interesting. My guess is that it’s gonna be (a little?) vendor-specific, which I think is good. Just a guess though.

  3. “Good progress was made defining what the draw calls will look like. We decided to keep it simple” - I wanna KISS you guys. As Einstein said: “Everything should be made as simple as possible, but no simpler”.

  4. No comments on buffer objects. No need to :slight_smile:

  5. Same for object management

  6. glDrawArrays - ha! Multiple TRIANGLE_STRIP’s with one draw call. I wonder if “first” and “count” arrays could also become buffers stored on GPU. That would be nice. Also primitive type could be an array to allow mixing TRIANGLE and TRIANGLE_STRIP in one call. I bet some geometry optimization libraries could take advantage of that. Of course there is no strong need for such feature in LP, methinks. Mt Evans maybe.

  7. “We’ll continue to show you details of Longs Peak in future issues of OpenGL Pipeline”. I’m gonna measure time in a new unit: “2 OpenGL Pipeline Newsletters ago” sounds fashionable… Ok, seriously, this sounded like LP will be out after at least two more newsletters (6 months?). Perhaps that’s why this fragment got my attention.

Sorry for the frustrated tone of my post. Ok, time to drop dead… Err… I mean go to bed.

The “lp” prefix is a placeholder, and will not be used in the final spec.

The “lp” prefix is a placeholder, and will not be used in the final spec.
Yes, I know. It’s stated clearly in the newsletter. I just felt in the mood to put something into that place. As I said - let’s not start discussion over it here. I think comments are welcome on everything, but discussion should rather focus on important things.

Hmm… No indices from buffer objects?
When I looked at glDrawArrays I thought it’s nice that we can define multiple areas of a vertex array to draw in just one call.
It would be nice if we could do exactly the same thing with index arrays. It would be useful for frustum culling, for example.

Ok, I swear I’m really going to bed now. I wanna be capable of thinking at work tomorrow :slight_smile:

You will be able to source indices from buffer objects in LP.

Keep in mind, the Pipeline newsletter submissions were written and submitted some weeks ago. There has been steady progress since that time on the LP spec effort.

Originally posted by Korval:
Longs Peak is interesting. First, it seems, from the example given, that the notion of a default framebuffer went away. So, how does one actually draw to the screen? Is that a non-GL function (like wgl/glx/agl)? Do we have to do our own buffering?
Noticed this on page 8:-

Create a framebuffer object to render to. This is the fully general form for offscreen rendering, but there will be a way to bind a window-system provided drawable as a framebuffer object, or as the color image of an FBO, as well.

So presumably that means there’ll be a wglGetImageObject(HDC) call somewhere. I just wonder how that’s going to work with back buffers and quad buffered stereo. Maybe they’ll have wglGetBackImageObject(HDC), wglGetFrontImageObject(HDC), wglGetBackLeftImageObject(HDC) and wglGetBackRightImageObject(HDC).

It’s about time you could map a range of a vertex buffer, been in d3d for years.

The whole object model does look very nice indeed now I see it written down.

void lpDrawArrays(LPenum mode, LPint *first,
LPint *count, LPsizei primCount,
LPsizei instanceCount)

Finally, instanceCount is used for geometry instancing; the entire set of ranges will be drawn instanceCount times, each time specifying an instance ID available to the vertex shader, starting at 0 and ending at instanceCount-1.
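
To make the quoted semantics concrete, here is a small CPU-side sketch of the iteration order that paragraph describes. It is not LP code: `Invocation` and `expand_draw_arrays` are hypothetical names, and the order of vertices within a range is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* One vertex-shader invocation: which vertex, and which instance ID it sees. */
typedef struct {
    int vertex;
    int instanceID;
} Invocation;

/* Walk the ranges as described: the entire set of primCount ranges is drawn
 * once per instance, with instance IDs 0 .. instanceCount-1.
 * Writes at most outCapacity entries and returns the total number required. */
size_t expand_draw_arrays(const int *first, const int *count, size_t primCount,
                          size_t instanceCount,
                          Invocation *out, size_t outCapacity)
{
    size_t total = 0;
    assert(first != NULL && count != NULL);
    for (size_t instance = 0; instance < instanceCount; ++instance) {
        for (size_t p = 0; p < primCount; ++p) {
            for (int v = 0; v < count[p]; ++v) {
                if (out != NULL && total < outCapacity) {
                    out[total].vertex = first[p] + v;
                    out[total].instanceID = (int)instance;
                }
                ++total;
            }
        }
    }
    return total;
}
```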

Wouldn’t it be nice to specify the instance ID on my own? Something like

void lpDrawArrays(LPenum mode, LPint *first,
                  LPint *count, LPsizei primCount,
                  LPint *instanceIDs,
                  LPsizei instanceCount)

If that is possible I can store all objects of a similar type (for example trees) in one VAO and use instancing for rendering. I have only got one big uniform object with an array of transform matrices, colors, etc. which I have to bind. Then some culling algorithms check which of these are seen, and those instance IDs are passed via my modified function. Then in the vertex shader I have all the information to render all seen trees. With this method I only need one simple draw call.

I don’t know Longs Peak / the object model very well, but it’s just an idea…

“As I said - let’s not start discussion over it here.”
Well, this is a discussion board…

I demand, that the next newsletter is posted at 10 AM European time, not American, this is just too late.

The only thing, that comes to mind right now, is that drawcalls do not include an “offset” parameter for the indices (offset added to each index, not the thing the “first” parameter is used for). As discussed in several long threads, that i leave to other people to find links to.
Just want to make sure, it is not forgotten.
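
For anyone who missed those threads, this is all the requested parameter would do, written out on the CPU. A minimal sketch with a made-up function name; with the parameter in the draw call, the index data itself would stay untouched:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add a per-draw offset to every index read from the index array.
 * The results go to "out"; the source indices are not modified. */
void apply_index_offset(const uint16_t *indices, size_t count,
                        int32_t offset, uint32_t *out)
{
    assert(indices != NULL && out != NULL);
    for (size_t i = 0; i < count; ++i) {
        int64_t v = (int64_t)indices[i] + offset;
        assert(v >= 0); /* an offset must not push an index below zero */
        out[i] = (uint32_t)v;
    }
}
```

The point of having it in the API is that several meshes can share one vertex buffer while each keeps indices that start at zero.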

I also am wondering, whether rectangle textures might be gone for good. I am not sure, whether i think this is a good thing, maybe it is, but i do find them useful, even with NPOT textures, so, as long as hardware can be made faster with rectangle textures than with NPOT textures, i’d like them to stay (including non-normalized texture coordinates).

I like the idea, to remove the framebuffer and use FBOs throughout the pipeline. The question, how we present the result on screen is of course a valid one.

I find the “Non-serialized access” weird. If i am forced to make sure i don’t do anything wrong, using semaphores and sync-objects, doesn’t that come down to the same stuff, the driver needs to do? I am pretty sure only very few people would use this fragile feature. Not with partial and whole buffer invalidation, which seems to be a good idea.

None of the ideas for shader/program objects, i have read/used so far have convinced me. The ARB_xxx_program system was a mess, the current GLSL system is not good and the way i understand the lp system it seems not to really change. Also, i fear i need to link or create new program objects, every time i only want to bind a different texture to the shader / pipeline.

Well, some interesting information, nothing that surprised me (though that’s certainly a good thing).

Jan.

I’ve never seen the point in texturerect. I must be missing something. For me they just clutter up glsl.

About Corrail’s idea to specify ones own instance IDs: That’s GREAT !!!

Non-serialized access is roughly equivalent to the non-blocking MapBuffer extension on OS X OpenGL (flush buffer range).

It turns out to be straightforward to use if adopted as part of a “write once” policy; if for example you set up a large VBO, map and write the first N KB, then draw that, then map and write the next N KB, and draw that, etc - having this option means the second map need not wait for the drawing of the first batch to complete, giving you concurrency. But note, in this example each block of data was only written to once and so there is no risk of a scheduling hazard.
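
The bookkeeping for such a write-once policy can be tiny. Below is a sketch in plain C; `StreamBuffer`, `stream_reserve` and `STREAM_FULL` are illustrative names, and the actual map and draw calls are left out because their final LP form is not known:

```c
#include <assert.h>
#include <stddef.h>

#define STREAM_FULL ((size_t)-1)

typedef struct {
    size_t capacity; /* total size of the VBO in bytes */
    size_t cursor;   /* first byte that has never been handed out */
} StreamBuffer;

/* Reserve the next "bytes" bytes of the buffer and return their offset.
 * Every region is handed out exactly once, so a non-serialized map of it
 * cannot collide with a draw that is still reading earlier data.
 * Returns STREAM_FULL when the buffer is used up; at that point the caller
 * would have to invalidate the buffer or synchronize before reusing it. */
size_t stream_reserve(StreamBuffer *s, size_t bytes)
{
    assert(s != NULL && s->cursor <= s->capacity);
    if (bytes > s->capacity - s->cursor) {
        return STREAM_FULL;
    }
    size_t offset = s->cursor;
    s->cursor += bytes;
    return offset;
}
```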

Using the non-serialized option is much more difficult in situations where you might overwrite a section of the buffer that holds previously specified data with new data; if you are not handy with fences, it should definitely be avoided.

Note that you can still potentially get concurrency (avoiding a block condition in MapBuffer) if you specify write-only, invalidate-range and explicit-flushing, because this gives the driver an “out” to optionally provide an efficient scratch buffer for your writes, which it can deliver into the specified buffer range later, based on the flush mapped data calls you must make after writing when explicit flush is enabled.

(why/how is this different from BufferSubData? because it allows you to maintain any representation for source data that you like, and to be able to uncompress directly to the destination using your own code - BufferSubData would not allow this, the source data must be in a copyable form)
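
As an illustration of "uncompress directly to the destination", here is a toy run-length decoder that writes straight into whatever pointer it is given, which would be the mapped range. The RLE format and the function name are invented for this sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decode (runLength, value) byte pairs directly into dst.
 * Returns the number of bytes written, or 0 if the output would not fit
 * (in which case dst may already have been partially written). */
size_t rle_decode_into(const unsigned char *src, size_t srcLen,
                       unsigned char *dst, size_t dstCapacity)
{
    size_t written = 0;
    assert(src != NULL && dst != NULL && srcLen % 2 == 0);
    for (size_t i = 0; i < srcLen; i += 2) {
        size_t run = src[i];
        if (run > dstCapacity - written) {
            return 0; /* would run past the end of the mapped range */
        }
        memset(dst + written, src[i + 1], run);
        written += run;
    }
    return written;
}
```

With BufferSubData the same data would first have to be decoded into a temporary array and then copied a second time.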

If that is possible I can store all objects of a similar type (for example trees) in one VAO and use instancing for rendering.
Um, you can do that currently (well, currently as LP stands). What you can’t do is specify directly what the ID will be. I don’t really see a problem with that. Is an increasing counter not good enough to index into an array?

I mean, in the typical case, the contents of that array are being updated every frame. So, you’re going to have to fill out the buffer anyway; you may as well fill it out in order.
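
Something like the following is what I mean by filling it out in order. It is only a sketch: `TreeInstance` and `pack_visible` are hypothetical, and the culling result is assumed to arrive as one flag per instance:

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    float x, y, z; /* hypothetical per-instance data, e.g. a tree position */
} TreeInstance;

/* Copy the visible instances to the front of "packed", preserving order.
 * The return value is what would be passed as instanceCount, so the
 * automatic IDs 0 .. n-1 index straight into the packed array. */
size_t pack_visible(const TreeInstance *all, const unsigned char *visible,
                    size_t total, TreeInstance *packed)
{
    size_t n = 0;
    assert(all != NULL && visible != NULL && packed != NULL);
    for (size_t i = 0; i < total; ++i) {
        if (visible[i]) {
            packed[n++] = all[i];
        }
    }
    return n;
}
```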

The thing I like about this is that it suggests that this functionality will be properly emulated transparently for non-DX10 hardware. That way, we don’t have to worry about it. Worst-case, it acts like a number of repeated draw calls that modify a uniform value, which would be what you would have done in the first place.

Just want to make sure, it is not forgotten.
It probably hasn’t been forgotten so much as made unnecessary.

If VAOs do all the validation and so forth up-front, then there is an open question as to the need for such a thing. After all, the impetus for the call is performance based on the current GL API. It’s entirely possible that simply making a VAO for each of the things you want to render and swapping them in/out as needed will offer all the performance benefits of the parameter.

The ARB_xxx_program system was a mess, the current GLSL system is not good and the way i understand the lp system it seems not to really change.
What is the specific problem?

I’ve never seen the point in texturerect.
The main point was to have NPOTs before we had NPOT hardware. It’s rather important to be able to at least expose something of that hardware, such that we can create unmipped textures on hardware that supported rects and not generalized NPOTs.

It turns out to be straightforward to use if adopted as part of a “write once” policy
Sounds like the perfect way to implement a GL 2.1 wrapper :wink:

Instance IDs - we usually have to specify some instance-specific data anyway (like a transformation matrix), but it could be useful if that data is static and we just want to skip some instances (frustum culling). It would give some extra performance in this case.
But what if someone is drawing a bunch of particles and doesn’t want to do frustum culling for every one of them individually (that would be crazy)? Then such an array of instance IDs is unnecessary and introduces additional performance cost.
So if there is going to be such functionality, then I should be able to pass NULL as a pointer to this array, so my instances will receive IDs automatically.
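
In other words, the lookup would behave roughly like this. The function is hypothetical and just spells out the NULL convention proposed above:

```c
#include <assert.h>
#include <stddef.h>

/* Fill "out" with the ID each drawn instance would see in the vertex shader:
 * the caller-supplied ID if an array is given, otherwise 0, 1, 2, ... */
void resolve_instance_ids(const int *instanceIDs, size_t instanceCount, int *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < instanceCount; ++i) {
        out[i] = (instanceIDs != NULL) ? instanceIDs[i] : (int)i;
    }
}
```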

Also, i fear i need to link or create new program objects, every time i only want to bind a different texture to the shader / pipeline.
Why would one have to link or create new program object when binding another texture? It doesn’t make any sense to me.

At last, it’s been released (of course I’ve been reading it since before its release :smiley: ).

Everything looks good, apart from no spec :frowning:

Judging by the comments in this thread, it seems like we could use an ‘lpSuperDrawArrays’ and ‘lpSuperDrawElements’:

void lpSuperDrawArrays(LPsizei instanceCount, LPsizeiptr *instanceIDs,
                       LPsizei primCount, 
                       LPenum *mode, LPint *first,
                       LPint *count, LPint *offset)
void lpSuperDrawElements(LPsizei instanceCount, LPsizeiptr *instanceIDs,
                         LPsizei primCount,
                         LPenum *mode, LPsizei *count,
                         LPsizeiptr *indices, LPsizei *offset)

Where offset is the value to add to each index read from the buffer. instanceIDs[i] can be NULL if you want to draw all instances of primitive i, and instanceIDs can be NULL if you want to draw all instances of all primitives. Same for offset.

I also rearranged the parameters; they seem to make more sense to me in this order, i.e. “draw 1 instance of 2 primitives”.

Of course this raises the question, why can’t all these options + all the objects that are bound to the context (fbo/vbo/program objects/etc…) be wrapped up into a ‘Draw Object’? Then the draw command is just lpDraw(drawObject);

Regards
elFarto

When you also introduce a “drawobject”, that would mean you need to have an awful lot of drawobjects. I am thinking about my octree, which has over 20 thousand nodes and through culling there can be many different combinations of parts of the array i want to render. So, doing this on-the-fly is, in my opinion, the way to go. Otherwise, in this scenario, i would be creating and deleting hundreds of drawobjects every frame (or even worse, i would need to reuse them, for performance reasons … if they would be mutable, at all).

“Why would one have to link or create new program object when binding another texture? It doesn’t make any sense to me.”

I have to admit, that i haven’t fully understood what the program object contains and which parts can be changed, without relinking it. However, it sounds like, when changing some buffer-object (that contains uniforms), the shader needs to be validated (linked?), because that buffer might have a different layout. I hope it is not that way, because changing a buffer object is a very common operation.

Jan.

Originally posted by Jan:
However, it sounds like, when changing some buffer-object (that contains uniforms), the shader needs to be validated (linked?), because that buffer might have a different layout.

If the buffer has a different layout, you will almost certainly need a different program object, because the shaders must be updated to match the layout. So the layout will likely be part of the immutable attachment properties of the program object.

It is possible to modify the attachment point of an existing program object to reference a different buffer with the same layout, so when swapping between multiple buffers with the same layout you can use a single program object.
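
A toy model of that rule, to show what "same layout" buys you. All types and functions here are invented for illustration and say nothing about how LP will actually represent layouts:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FIELDS 8

/* Hypothetical description of how uniforms are laid out in a buffer. */
typedef struct {
    size_t fieldCount;
    size_t fieldOffsets[MAX_FIELDS];
    size_t totalSize;
} UniformLayout;

typedef struct {
    UniformLayout layout;
    const void   *data;
} UniformBuffer;

typedef struct {
    UniformLayout        expected; /* fixed when the program is created */
    const UniformBuffer *attached; /* may be swapped freely */
} Program;

int layouts_match(const UniformLayout *a, const UniformLayout *b)
{
    assert(a != NULL && b != NULL);
    assert(a->fieldCount <= MAX_FIELDS && b->fieldCount <= MAX_FIELDS);
    if (a->fieldCount != b->fieldCount || a->totalSize != b->totalSize) {
        return 0;
    }
    return memcmp(a->fieldOffsets, b->fieldOffsets,
                  a->fieldCount * sizeof(size_t)) == 0;
}

/* Swap the attached buffer. Returns 1 on success, or 0 (leaving the old
 * attachment in place) if the layout differs and a new program is needed. */
int program_attach(Program *program, const UniformBuffer *buffer)
{
    assert(program != NULL && buffer != NULL);
    if (!layouts_match(&program->expected, &buffer->layout)) {
        return 0;
    }
    program->attached = buffer;
    return 1;
}
```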

To me, VAOs look a bit scary. If it really means I’d have to create and fill out a specific VAO whenever I want to draw a certain range out of one or more buffer objects, it could be even worse than Jan’s 20000 statically allocated VAOs.

I’m using a number (ca. 100) of buffers of medium size (ca. 4 MB) to upload geometry on the fly. Because of this, each geometry might end up in a different buffer at a different position each time it gets uploaded again. Sometimes, even the attribute count differs between geometries (so they don’t have a homogeneous layout inside one buffer).

This means: every time geometry is uploaded, I’d have to recreate a corresponding VAO, and for each uploaded piece of geometry (several thousand) I’d have to keep a VAO around.
Of course, at draw time I’d save a lot of calls with this model!

I just want to make sure, that LP is ready to create/delete/recreate/keep several thousand VAOs :slight_smile:

If a VAO really means an immutable combination of:
buffer(s), generic attribute ID(s), offset(s), size(s), type(s) and stride(s)
it would sound logical to me to include the index buffer in it as well… and we would wind up with what elFarto called “DrawObject”.

What just comes to mind: often you need to render the same object, but with different attributes. Imagine, for instance, a z-fill pass, where you’d need only the position (no normals, no texcoords). Would that mean I’d have to create another VAO for each object just for this pass?
Of course, if the shader is not using the additional attributes, it should not hurt correctness, but since the ‘normal’ VAO also specifies normals and texcoords, the driver believes I’m accessing buffers which in fact are not accessed. The driver then possibly blocks those buffer objects unnecessarily.
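
To illustrate what such a second, position-only VAO would amount to, here is a sketch with invented types (`VaoDesc`, `AttribBinding`) that derives a reduced description from the full one. Whether LP lets you build VAOs this cheaply is exactly the open question:

```c
#include <assert.h>
#include <stddef.h>

enum { ATTR_POSITION = 1u << 0, ATTR_NORMAL = 1u << 1, ATTR_TEXCOORD = 1u << 2 };
#define ATTR_COUNT 3

typedef struct {
    int    bufferId; /* 0 means "no buffer referenced" in this sketch */
    size_t offset;
    size_t stride;
} AttribBinding;

typedef struct {
    unsigned      enabledMask;
    AttribBinding bindings[ATTR_COUNT];
} VaoDesc;

/* Build a description that keeps only the attributes in keepMask, so the
 * reduced VAO no longer references the buffers of the dropped attributes. */
VaoDesc vao_subset(const VaoDesc *full, unsigned keepMask)
{
    VaoDesc out;
    assert(full != NULL);
    out.enabledMask = full->enabledMask & keepMask;
    for (unsigned i = 0; i < ATTR_COUNT; ++i) {
        if (out.enabledMask & (1u << i)) {
            out.bindings[i] = full->bindings[i];
        } else {
            out.bindings[i].bufferId = 0;
            out.bindings[i].offset = 0;
            out.bindings[i].stride = 0;
        }
    }
    return out;
}
```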

Anyway, looks like VAOs need some more detailed explanation :slight_smile: