View Full Version : Vertex Array Error



Rodrix
05-04-2006, 07:30 PM
Hi guys! :)
Let's see if anyone can help me with this. I am stuck! :(

I am implementing vertex and color arrays for my point sprite particle system. The problem lies in the glDrawArrays call (I discovered this after a lot of debugging), which crashes with:

"Unhandled exception in ogl_particle_system.exe:
0xC0000005: Access Violation"

However, the problem only occurs when the number of particles is large: with 30000 particles it crashes, while with 3000 it works fine.

I would like to fix this so that I can use a large number of particles, but I don't know what's wrong.

My calls are:

GLsizei i=0;
while( pParticle )
{
    m_PVertexArray[i].r = pParticle->m_vColor.x; //Red
    m_PVertexArray[i].g = pParticle->m_vColor.y; //Green
    m_PVertexArray[i].b = pParticle->m_vColor.z; //Blue
    m_PVertexArray[i].a = pParticle->m_Alpha;    //Alpha

    m_PVertexArray[i].x = pParticle->m_vCurPos.x;
    m_PVertexArray[i].y = pParticle->m_vCurPos.y;
    m_PVertexArray[i].z = pParticle->m_vCurPos.z;

    i++;
    pParticle = pParticle->m_pNext;
}


// enable and specify pointers to vertex arrays
glEnableClientState(GL_COLOR_ARRAY);
glEnableClientState(GL_VERTEX_ARRAY);

glVertexPointer(3,                   //3 components per vertex (x,y,z)
                GL_FLOAT,
                sizeof(PointVertex),
                m_PVertexArray);
glColorPointer(4,                    //4 components per vertex (r,g,b,a)
               GL_FLOAT,
               sizeof(PointVertex),
               &m_PVertexArray[0].r); //Pointer to the first color

//THE PROBLEM OCCURS DURING THE EXECUTION OF THIS LINE WHEN i IS BIG
glDrawArrays(GL_POINTS,
             0,     //Starting at 0
             i+1);  //Rendering i+1 points

glDisableClientState(GL_VERTEX_ARRAY); // disable vertex arrays
glDisableClientState(GL_COLOR_ARRAY);
My point vertex structure is:

struct PointVertex
{
    GLfloat x,y,z;
    GLfloat r,g,b,a;
};

My point/vertex array is correctly allocated at the init of the program:

    m_PVertexArray = new PointVertex[m_dwMaxParticles];

where m_dwMaxParticles is the specified maximum (50000 in the example above).

What's going on!? Does glDrawArrays have a maximum limit for the size of the count parameter?! :confused:

Thanks so much in advance!!!

Cheers!
Rod

Relic
05-05-2006, 02:11 AM
There is only a theoretical limit for glDrawArrays on 32-bit systems: the range of the count parameter (2^31-1 for a GLsizei), and your system memory would run out long before you reach that.

There is one bug and some unsafe things in your code:
1.) After your while loop, "i" is already the count of vertices, so calling glDrawArrays with i+1 is wrong. If you have filled all m_dwMaxParticles entries, that reads one past the end of the array and might crash the way you've seen.
2.) The while loop should be while( pParticle && i < m_dwMaxParticles ) to avoid crashes in case you have more particles than the array can hold.
3.) Since the structure is x,y,z,r,g,b,a, I would initialize it in that order for optimal cache performance.
4.) The glVertexPointer address should be written the safer way you did it for colors: glVertexPointer(3, GL_FLOAT, sizeof(PointVertex), &m_PVertexArray[0].x);
5.) Check whether you really need to toggle the vertex array enable state all the time. Most of the time you will need the vertex array anyway if there is more drawing going on with arrays.
Also verify that no other array is enabled.

If that doesn't help you might have found a driver bug. Check which OpenGL implementation you use with glGetString(). If it's not Microsoft try updating your graphics driver.
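Folding Relic's points 1-4 into the original loop gives something like the following standalone sketch. The GL calls are left as comments so the counting logic can be checked without a live context; the Particle type here is a minimal stand-in for the one in the original post, and GLfloat is typedef'd locally for the same reason.

```cpp
#include <cassert>
#include <cstddef>

typedef float GLfloat;  // stand-in so the sketch builds without GL headers

struct PointVertex
{
    GLfloat x, y, z;
    GLfloat r, g, b, a;
};

// Minimal stand-in for the particle type in the original post.
struct Particle
{
    float cx, cy, cz;      // current position (m_vCurPos in the original)
    float r, g, b, alpha;  // color + alpha
    Particle *m_pNext;
};

// Fill the array in struct order (point 3), clamp to capacity (point 2),
// and return the count that glDrawArrays should receive (point 1: i, not i+1).
std::size_t FillParticleVertices(PointVertex *out, std::size_t maxVerts,
                                 const Particle *p)
{
    std::size_t i = 0;
    while (p && i < maxVerts)   // never write past the array's end
    {
        out[i].x = p->cx;
        out[i].y = p->cy;
        out[i].z = p->cz;
        out[i].r = p->r;
        out[i].g = p->g;
        out[i].b = p->b;
        out[i].a = p->alpha;
        ++i;
        p = p->m_pNext;
    }
    // With a GL context active, the draw would then be (point 4 included):
    //   glVertexPointer(3, GL_FLOAT, sizeof(PointVertex), &out[0].x);
    //   glColorPointer (4, GL_FLOAT, sizeof(PointVertex), &out[0].r);
    //   glDrawArrays(GL_POINTS, 0, (GLsizei)i);   // i, NOT i+1
    return i;
}
```

The returned count is exactly the number of vertices written, which is what the last argument of glDrawArrays expects.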

Rodrix
05-05-2006, 09:09 AM
Oh no..... :(
I fixed all the errors you found... but still nothing. Same error persists.

OK, this is weird. I found the maximum value I can use before it crashes: it works fine with 16623 particles, but with that number +1, it crashes!
The crash occurs on the same line glDrawArrays.

I am using an NVIDIA GeForce XFX5500 with the 2.01 driver... (just updated last week).

This is weird...
Any genius out there that can crack this puzzle?!

Thanks everyone!
Cheers
Rod

knackered
05-05-2006, 09:15 AM
In all honesty, I think you should give up programming.
If you're not going to give up, go to the beginners forum.

Rodrix
05-05-2006, 09:40 AM
Originally posted by knackered:
In all honesty, I think you should give up programming.
If you're not going to give up, go to the beginners forum.

OK, maybe I thought that this was an advanced question and it turned out not to be advanced, but why is so much flaming necessary?!

Man, for me programming is a hobby and I am no professional (like you mentioned you are), but how the heck should I know with certainty whether my question belongs in the Beginners or Advanced forum!? There are no criteria to determine that. When I think that the topic I am asking about is not for beginners, I post it in Advanced. I can also be mistaken. And I don't mind if someone answers, "Hey, you know, that question should go to the Beginners' forum since it's actually not advanced." That's OK, I don't mind receiving that answer, and I would reply just fine: "Sorry, I didn't know; I will make sure I don't ask questions on that topic in the advanced forum."

However, I don't think that your aggressive reply is appropriate for this forum.

Please don't answer back. I don't want this to be a flaming post. Thanks.

jide
05-05-2006, 12:08 PM
Also think about GL_MAX_ELEMENTS_VERTICES.
It is not advisable to submit more vertices than that limit.

jide
05-05-2006, 12:12 PM
Okay, even if it seems not such an advanced topic, don't read what some might write: they just live on their megalomaniacal vanity beach but never went to swim in the ocean. They would have liked to be moderators but failed, so they take their anger out on everyone who seems inferior to them.

knackered
05-05-2006, 03:10 PM
jide, I keep meaning to ask. It says you registered in this forum in July 2001, yet you continually ask quite basic questions about OpenGL - why is this? Is it like Memento, where you started off knowing everything but as time passes you've forgotten more and more?
Or were you in some kind of motorway accident, where some projectile took a chunk of your brain away, like what happened to poor old Gordon Kay?

Rodrix
05-05-2006, 03:55 PM
Knackered:
If you think I should give up programming, I think that you should definitely give up this forum, as your answers are completely inappropriate.

Moreover, I suggest that you join an anger management program, or invest in some psychological advice to understand the causes of your inappropriate and disrespectful attitude.

Wishing the best for you,

jide
05-05-2006, 05:27 PM
to the community: sorry.

to knackered:

And after ? What's the problem ?

No, it's more like Fight Club :-D And I can see you took 3 hours to find an 'appropriate' answer to my post, as you were online almost the whole time since I posted my last post.

You're always the same knackered, so it has been written you will never change, slave of some kind of a desperate destiny: GL isn't a goal for my life: may the quantum neuronal transmissions make you understand this point. But you might need some milk to try to understand this point.

I don't want to do the same as before. Here is not the place. So I won't turn around anymore.

EDIT: by the way, you have not understood Memento. Try to see it again and again so you might understand what his problem really is.

EDIT 2: Well, is this really the right place to guess what's happening to people? And what if I had had an accident? Would that make you happy? Feel better? What if I asked whether someone traumatized you when you were a child?
Come on. If you read all the posts carefully (and I'm pretty sure you do), then don't reply if they don't interest you or hurt you; go and be endearing to your girlfriend instead (if you have one --- and I don't wanna know if you have one !!!).

No, knackered (does it mean a worn-out thing??), you really disappoint me: no smart thoughts, no humanity, no evolution, no changes, no feelings (but bad ones), well, nothing...

I told some months ago I will ignore you. I must admit now this was the thing I HAD TO DO.

knackered
05-05-2006, 05:54 PM
My computer's always on, and I use firefox tabbed browsing pathologically, so as far as you're concerned I'm eternally reading your posts, jide.
It must be something about the translation, but your diagnosis of my 'condition' is always beautiful, jide. It almost makes me wish I were French....almost.
Rodrix, the code you posted is littered with basic programming mistakes, therefore it's safe to assume you're a beginner. If you're not a beginner, then it's a reasonable suggestion that you give up programming, as you're going to drive the people you work with insane debugging your crazy stuff. No anger, just logic.

jide
05-05-2006, 05:57 PM
Oh sh** a fanatic...

knackered
05-05-2006, 05:58 PM
Ok jide, just read your reply. I'm sorry for winding you up, really don't take me so seriously. I'm trying, really trying to be funny in a thread that I believe has no other value...at your expense, admittedly - but nevertheless, humour is my motive, not malice.

jide
05-05-2006, 06:40 PM
Humor?? Is that really humor?????? Try to put yourself in the place of the one who wrote the post instead of saying such absurd and inept things. You really need a mentor to make you understand the real values of life, knackered, and I wonder if I would accept being that one.

Not everybody is a "PRESIDENT OF A COMPANY AS YOU ARE", knackered. I already told you years ago that times are changing. We're no longer in the euphoria of the year 2000, you know.
But what I know is that I wouldn't like to be in your place now; I would prefer selling fruit at the weekend market instead...

And please, as for the other post, don't push me, please, please, please.

def
05-05-2006, 08:27 PM
Originally posted by knackered:
... really don't take me so seriously.
... humour is my motive, not malice.

Why are there so many people around who think they can excuse their arrogant behavior and offensive talk by feigning humor?
It's a cheap trick that only these people think will work to excuse their brainless comments after they realize they've been acting like jerks!

knackered
05-06-2006, 04:17 AM
def, it's just humour you don't understand. I take issue with your assertion that my comments were brainless; they were funny and rooted in truth.

def
05-06-2006, 05:22 AM
Nobody seems to understand your so-called "humour".

What does that make you:
Genius? Misunderstood? Lunatic? Dumb? Scatterbrained?

Take your pick...

zed
05-06-2006, 06:00 AM
c'mon u cant say that to knackered hes our mascot
whilst his comments are intended as humour (which is as plain as day to me that they are), if u feel not then its cause u have either
A/ different mentality
B/ different culture
whilst they are intended as humour, admitted sometimes theyre not funny but seeing above 'pened to poor old Gordon Kay' me - who?? search search, canadian actor no, hmmm 5 minutes later, hes the fella from 'ello 'ello, **** truly the low point of english humour ( even crap like it aint half hot mum, open all hours, r u being served, i rate higher ) knackered i must say im disapointed

i feel sorry for poor old relic.

knackered
05-06-2006, 06:03 AM
you speak for everyone now, def?
what does that make you?
say something funny, def - so I can understand what makes you laugh. No 'buffy the vampire' references though, I don't watch it so they'd be lost on me.

As far as jide is concerned, if you actually read this thread from the beginning you'll see jide initiated this exchange of insults. This is the only reason I took the michael out of him. I will happily do the same to you def if you persist in attacking me.

knackered
05-06-2006, 06:09 AM
yes zed, sorry about the gordon kay reference, it was misjudged. It's the fact that gordon was the star of so many of england's most appalling comedy exports that made his accident in the '87 storm so funny.
But if you're not a british person over, say, 25 then it would mean nothing to you.
Thank you for defending me, you're a man of good taste.
Now, it's a rare sunny day in england today, so I'm going to shut down my computer and go out to play.

zed
05-06-2006, 06:14 AM
didnt ello ello descend into happy days (bottom of the barrel) there at the end, where the actor/esse would come onscreen + the tv audience would applaud until the first adbreak.
not knocking the fonz though hes cool.
if i ever get had up on a charge for the court + thye majistrate asks me how do i plead i intend to
stick both my thumbs up and say a big long aye!!!
saddam should use this tried and tested technique.

magistra - "what do u say about gassing 1000s of you countrymen in basra'
saddam - (thumbs up) ayyyyyye

how can he lose!!,
hopefully the mods wont close this insightful topic

def
05-06-2006, 10:48 AM
Originally posted by knackered:
I will happily do the same to you def if you persist in attacking me.

I couldn't care less...

To Rodrix: If you are still reading this, please excuse everybody going off topic, and let us know what caused the errors. To me it seems the error lies somewhere other than the OpenGL code. Good luck!

Rodrix
05-07-2006, 07:55 AM
Thanks Def.
It's a shame that there are people like Knackered in this forum. I feel sorry for him. Let us all hope he gets some professional psychological help.

Anyways, the error was not caused by the C++ code. Jide was right (thanks! and thanks Relic too!): the problem was with GL_MAX_ELEMENTS_VERTICES. There is actually a limit on the number of vertices that glDrawElements can handle.
The recommended value for my GPU is GL_MAX_ELEMENTS_VERTICES = 4096. Using more than that number reduces performance, and using much more than that causes the 0xC0000005 Access Violation error.

knackered
05-07-2006, 08:39 AM
Originally posted by Rodrix:
I feel sorry for him.

Could have fooled me. Anyway, I don't want pity just because I'm confined to a wheelchair.


Originally posted by Rodrix:
The recommended value for my GPU is GL_MAX_ELEMENTS_VERTICES = 4096. Using more than that number reduces performance and using much more than that number causes the 0xC0000005: Access Violation error.

No, it should never cause an access violation. If that's what you've discovered, then you should report it as a driver bug... but only after you've thoroughly checked through your own code for other problems, such as the ones Relic picked up.

knackered
05-07-2006, 08:50 AM
zed, I honestly think there's a sitcom in the whole saddam imprisonment thing. It's just a matter of time.

Relic
05-08-2006, 09:18 AM
Originally posted by zed:
i feel sorry for poor old relic.

Me? That's all relative. :p
(It's not my problem, I was only the first answering.)

Rodrix, knackered is right about GL_MAX_ELEMENTS_VERTICES. That limit was introduced for glDrawRangeElements, and exceeding it must not crash.
The values you got are way below what current hardware can do; it should be more like 64k or 1M indices. I wouldn't use 4k batches if I could have bigger ones.
You might want to report that to NVIDIA if you're really sure it's not your code.
Strip your code down to the absolute minimum number of OpenGL calls required to reproduce the problem; sometimes that clears things up.
Also try whether the same thing runs using a pixel format from Microsoft's GDI Generic OpenGL implementation. If that crashes as well, it's probably not a driver issue.

Madoc
05-08-2006, 11:37 AM
Though I don't usually personally indulge in quite such ruthless sarcasm concerning people I don't know, I find Knackered absolutely hilarious and I recount his posts to my friends (and my mum), who also appreciate them immensely. There's really no reason to take offence, or to become offensive (particularly in such a tasteless manner). Based on the evidence provided here, I would vouch for Knackered's mental health; I would describe it as vigorous. He can certainly make valuable contributions to this forum (beyond the much appreciated humour) and I think you ought to be grateful that he does.

Rodrix
05-15-2006, 11:53 AM
Guys... I FOUND THE BUG! :)
This was the hardest bug I ever had to find.

I fixed the problem by calling

glDisableClientState(GL_NORMAL_ARRAY);
glDisableClientState(GL_TEXTURE_COORD_ARRAY);

before glDrawArrays.

For some unknown reason, the normal and texture coordinate arrays were enabled, and the driver tried to read them, causing the Access Violation when it couldn't read any further. Apparently both arrays pointed at some address in memory that could be read for up to 16623 items (using more than 16623 particles, it crashed).

However, the sample code I am using did not contain a single glEnableClientState call; therefore, how is it possible that I had to disable some client states in order to make the program work?

The program did contain a glDrawArrays with a glInterleavedArrays call using GL_T2F_V3F and an array of size 16. Yet there was no explicit glEnableClientState call for the texture and vertex arrays. So I commented out that code, and still the Access Violation error occurred. :confused:

Therefore the only solution was to add

    glDisableClientState(GL_NORMAL_ARRAY);
    glDisableClientState(GL_TEXTURE_COORD_ARRAY);

before my glDrawArrays call.

I know now how to fix the bug; however, I don't understand WHY it happened, given that there was no explicit glEnableClientState for texture coords and normals. Are these states on by default? (Or are they implicitly enabled by some other standard call?)

Thanks so much everyone.

P.S: Knackered, I don't mean to have any personal problem with you. However, please don't use that type of 'humour' with me as it deeply offends me. Thanks.

nickn
05-15-2006, 12:17 PM
The call to glInterleavedArrays enables those states (see the Red Book).

Rodrix
05-15-2006, 12:53 PM
...yeah, I supposed that...
But how come when I comment out that line they are still enabled?
Do the states persist even after you reinitialize OpenGL?
Thanks!

tamlin
05-15-2006, 02:51 PM
Directly after you initialize OpenGL, check what glIsEnabled(GL_NORMAL_ARRAY) and glIsEnabled(GL_TEXTURE_COORD_ARRAY) return. They should be disabled (as should all arrays) after initialization of OpenGL.

Just spread a thin layer of such checks over your code and bake at working temperature until a *crack* is heard from an exploding small bug. Remove it with a small surgical instrument (a +45 battleaxe is known to work), and enjoy the pretty meshes.
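tamlin's "thin layer of checks" might look something like the sketch below. To keep it runnable without a GL context, glIsEnabled is replaced by a stub backed by a fake state table (the enum values are the real GL ones); in a real program you would include the GL headers, delete the stub, and call the checker after initialization and before each draw.

```cpp
#include <cassert>
#include <cstdio>

// Stand-ins so the sketch runs without a GL context. The enum values
// match OpenGL's; the stubbed glIsEnabled reads a fake state table.
typedef unsigned int GLenum;
typedef unsigned char GLboolean;
const GLenum GL_VERTEX_ARRAY        = 0x8074;
const GLenum GL_NORMAL_ARRAY        = 0x8075;
const GLenum GL_COLOR_ARRAY         = 0x8076;
const GLenum GL_TEXTURE_COORD_ARRAY = 0x8078;

static bool g_fakeState[0x9000];  // stub only; delete in a real program
static GLboolean glIsEnabled(GLenum cap) { return g_fakeState[cap] ? 1 : 0; }

// Report every client array whose enable state differs from what the
// particle renderer expects (vertex + color on, everything else off).
int CheckClientState()
{
    const struct { GLenum cap; const char *name; bool expected; } caps[] = {
        { GL_VERTEX_ARRAY,        "GL_VERTEX_ARRAY",        true  },
        { GL_COLOR_ARRAY,         "GL_COLOR_ARRAY",         true  },
        { GL_NORMAL_ARRAY,        "GL_NORMAL_ARRAY",        false },
        { GL_TEXTURE_COORD_ARRAY, "GL_TEXTURE_COORD_ARRAY", false },
    };
    int surprises = 0;
    for (const auto &c : caps)
        if ((glIsEnabled(c.cap) != 0) != c.expected)
        {
            std::printf("unexpected client state: %s\n", c.name);
            ++surprises;
        }
    return surprises;
}
```

A checker like this would have caught the leaked GL_NORMAL_ARRAY / GL_TEXTURE_COORD_ARRAY enables from the glInterleavedArrays call immediately.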

Rodrix
05-15-2006, 02:58 PM
...should I add sugar? :-)

Rodrix
05-15-2006, 03:49 PM
Oh noooo guys!!!!!!!!!

All this time spent for nothing!
There is no significant improvement in performance in my program using Vertex Arrays!!!!!
I get almost the same FPS count (or ms) as in immediate mode :eek: (...

Should I go for VBO?
My program is meant for people who are not computer enthusiasts, who probably use the default GPUs that come with their computers. Is the VBO extension common on those types of GPUs?

Any comments, advices, or recommendations are welcome!
Thanks!
Rod

Humus
05-15-2006, 06:25 PM
If you did not see any improvement going from immediate mode to vertex arrays you're unlikely to see an improvement going to VBOs. Vertex processing is clearly not your bottleneck. I'd guess you're either fillrate limited or CPU limited. Both are pretty common with particle systems.

Rodrix
05-15-2006, 07:21 PM
Well, that is -in a particular way- good news, since I really didn't want to go with VBOs; Vertex Arrays were already a real pain!

Could you explain what is fillrate limited?
Thanks in advance!

P.S: Isn't VBO an extension that uploads the vertices into the buffer memory on the card, so that it is much faster than Vertex Arrays? NeHe says it multiplies FPS x3. How do you know so far in advance that it won't improve performance? (I want to learn :-)

V-man
05-15-2006, 10:48 PM
Using an interleaved vertex format is good, but you should also use a format that the hardware likes. Most (if not all) want ubyte color, not float.
Use
glColorPointer(4, GL_UNSIGNED_BYTE, ..., ...);

Even if you don't get an improvement, it's best to do so. VBOs are for putting the data in AGP memory or VRAM, or in the worst case system RAM. If you have PCI Express, system RAM or VRAM. The driver decides. Read the wiki.

VBO has been core since 1.5.
It's ridiculous to continue using glVertex.
There is a desire to kick non-VBO vertex arrays out of drivers.

http://www.gamedev.net/columns/events/gdc2006/article.asp?id=233


What features to consider layering:

* Immediate mode
* Current vertex state
* Non-VBO vertex arrays
* Vertex array enables (with shaders, this should be automatic)
* glArrayElement()
* glInterleavedArrays()
* glRect()
* Display lists
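V-man's ubyte-color suggestion amounts to an interleaved layout like the sketch below (the struct and variable names are illustrative, not from any real API). Three floats of position plus four color ubytes pack each vertex into exactly 16 bytes; the matching pointer calls are shown as comments since they need a live context.

```cpp
#include <cassert>
#include <cstddef>

typedef unsigned char GLubyte;  // stand-in so the sketch builds without GL headers

// 12 bytes of position + 4 bytes of RGBA color = one 16-byte interleaved vertex.
struct PackedVertex
{
    float   x, y, z;
    GLubyte r, g, b, a;  // 0..255; the GL normalizes these to 0..1
};

// With a GL context, an array v of PackedVertex would be described as:
//   glVertexPointer(3, GL_FLOAT,         sizeof(PackedVertex), &v[0].x);
//   glColorPointer (4, GL_UNSIGNED_BYTE, sizeof(PackedVertex), &v[0].r);
```

Compared with the thread's original all-float layout (28 bytes per vertex), this cuts the per-vertex size to 16 bytes, which is also a nicely aligned power of two.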

Humus
05-16-2006, 12:20 AM
Originally posted by Rodrix:
Could you explain what is fillrate limited?

Basically you're limited by the number of pixels rendered, rather than the number of vertices.


Originally posted by Rodrix:
P.S: Isn't VBO an extension that uploads the vertices into the buffer on the card, so that it is much faster than Vertex Arrays? NeHe says it multiplies FPS x3. How do you know so far in advance that it won't improve performance? (I want to learn :-)

Because it's not your bottleneck, as proven already by not seeing a performance increase going from immediate mode to vertex arrays. To illustrate, assume you have to build a hundred houses and eat one cookie. If you can eat the cookie twice as fast, it won't make you complete the total task noticeably faster, because building those houses is what really bogs you down, not the cookie eating. :)
So while VBOs may improve the speed at which the GPU can process vertices it won't speed you up when the biggest chunk of work for the GPU is to fill all those pixels.
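Humus's bottleneck argument can be put into a tiny toy model (all millisecond figures invented purely for illustration): in a pipelined renderer the slowest stage sets the frame time, so speeding up a stage that isn't the slowest changes nothing.

```cpp
#include <algorithm>
#include <cassert>

// Toy model: frame time is roughly the maximum of the per-stage times,
// because the stages overlap and the slowest one dominates.
double FrameMs(double fillMs, double vertexMs)
{
    return std::max(fillMs, vertexMs);
}
```

If filling pixels costs 20 ms and vertex submission 1 ms, tripling vertex speed leaves the frame at 20 ms; only when the vertex stage is the larger term would VBOs show up in the FPS counter.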

Humus
05-16-2006, 12:23 AM
Originally posted by V-man:
Using an interleaved vertex format is good, but you should also use format that the hw likes. Most (if not all) want ubyte color, not float.
Use
glColorPointer(4, GL_UNSIGNED_BYTE, ..., ...);
Well, floats for colors are fine, but ubytes may be slightly faster because they are smaller.

zed
05-16-2006, 01:00 AM
Nehe says it multiplies FPS x3. How do you know in advance that it won't improve performance? (I want to learn :-)

perhaps 3x in an optimal situation eg benchmark (but even then that sounds optimistic)
FWIW in my app (which has lots of geometry, higher than doom3/UT2004 etc)
going from immediate -> VA = ~300% increase
going from VA -> VBO = ~5% increase

Zulfiqar Malik
05-16-2006, 01:29 AM
Originally posted by Humus

Well, floats for colors are fine, but ubytes may be slightly faster because they are smaller.

ubytes are much, much slower on NV3x, NV4x, and R3xx hardware. I don't know whether that improved on R4xx, but they certainly crippled performance on older hardware. I stress tested with a few hundred million vertices/s, and I don't know for sure whether a smaller number would give better performance. It might, but that wouldn't be very scalable.

Jackis
05-16-2006, 07:05 AM
Zulfiqar Malik

During my own tests I've found that ubytes, passed as COLOR, are fast, about 30% faster than pure float; and with 3 floats and 4 ubytes the vertex is 16 bytes, which is a little better than a pure-float 28-byte vertex (for example, for star rendering).

There is a weird issue with ubytes as normals, texcoords and so on, but as color and vertex attributes (normalized ubytes) they are a miracle for saving memory and bandwidth.

Thanks,
Jackis

Zulfiqar Malik
05-16-2006, 08:10 AM
Hmmmm... what hardware did you use? I have tested ubytes with vertex position, normal and texcoord. My bad for not realizing that Humus was indeed talking about color pointers. I have not actually tested them with color, but why would they be any better? Does the driver handle them differently? If yes, then why?
I remember that while I was developing a terrain rendering system, my data (vertex position and normal) were both small enough to fit in bytes and ubytes respectively. But doing so gave such pathetic performance that I had to switch to shorts for position and floats for normals. I did eventually end up saving memory, because I packed more data into the floats (in the mantissa and exponent) than the normal alone, and that allowed me to geomorph terrain vertices in the vertex shader.
It also depends on the magnitude of the test. The throughput of my terrain rendering algorithm was around 55 MTris/s (using triangle lists, that equates to around 165 MVerts/s) on a GFX 5700 Ultra, textured (single texture) with per-vertex lighting from one directional light.

Zulfiqar Malik
05-16-2006, 08:12 AM
Ooops... I meant 165 MIndices/s. Vertices were reused and I can't remember the exact count, but it was fairly large, enough to be called a valid stress test.

V-man
05-16-2006, 08:33 AM
Originally posted by Zulfiqar Malik:
[...]

uint mycolor;

glColor4ubv(&mycolor); //Need to cast

RenderGeomWithVA or VBO (3_vertex + 2_texcoord)

Edit:
Also, if you are benchmarking you need to align your data. I think it was multiples of 32 bytes per vertex for ATI. I think it's also fine for NV, but I'm not sure what NV officially prefers.

tamlin
05-16-2006, 10:17 AM
Rodrix, before giving up on the arrays, you may want to try locking them before use (glLockArraysEXT). My own tests, though with a different access pattern and data, have in some cases shown over 50% speed improvement with compiled vertex arrays over plain vertex arrays.

Humus
05-16-2006, 04:43 PM
Originally posted by Jackis:
There is weird issue with ubytes as normal, texcoord and so on, but as color and vertex attributes (normalized ubytes) - it is a miracle for saving memory and bandwidth.

What's the issue? One issue I can see is that glTexCoordPointer() doesn't even accept bytes or ubytes. glNormalPointer() should accept bytes though (but not ubytes).

Humus
05-16-2006, 04:52 PM
Originally posted by Zulfiqar Malik:
Hmmmm ... what hardware did you use? I have tested ubytes with vertex position, normal and texcoord. My bad at not realizing that Humus was indeed talking about color pointers. I have not actually tested them with color, but why would they be any better? Does the driver handle them differently? If yes, then why?
I remember that while i was developing a terrain rendering system, my data (vertex position and normal) both were small enough to fit bytes and ubytes respectively. But doing so gave such pathetic performance that i had to switch to shorts for position and floats for normals.

For the programmable pipeline all these semantics like "color" and "normal" have little significance. You could just use glVertexAttribPointer() and pass all your data that way, which I would recommend, since this function takes all valid types, whereas the others have various restrictions on what types they accept, inherited from the fixed function pipeline.

If you're getting really low performance with ubytes, make sure you're using 4 ubytes and not 3.

Humus
05-16-2006, 05:09 PM
Originally posted by V-man:
Really???
Isn't color and secondary color preferred as ubyte?
I thought these were "native" and anything else was not. The only problem (as I thought) was that we had to pass in RGBA instead of the MS way of BGRA.

In the past that might have been the case, but in this age of shaders, everything that's native to one particular attribute can equally well be native to others. So if you can do floats for normals, you can certainly do them for colors too. There's no difference at the hardware level between loading a normal or a color into the shader. These semantics mean little for the programmable parts and are only relevant at the API level.

Zulfiqar Malik
05-17-2006, 12:40 AM
Originally posted by Humus

For the programmable pipeline all these semantics like "color" and "normal" has little significance.

That's what I thought. As for 4 ubytes, I don't quite remember whether I tried using 4, because alignment must have been on my mind :) back then. I will give it a shot soon.
But, keeping "personal tests" aside, can you tell me with certainty whether ubytes/bytes (aligned or non-aligned) are just as fast as, say, shorts and floats? I am not just talking about R5xx, but R4xx and R3xx (minimum).
Thanks, it's always good to get "first hand" information :) .

Rodrix
05-17-2006, 01:15 AM
This is becoming an interesting discussion.
I am now testing unsigned bytes for the color arrays. It took me more than an hour to change all my code (color fading, alpha fading, many features of my particles, etc.), but I want to know I am on the right track before I move on.


Humus said: If you're getting really low performance with ubytes, make sure you're using 4 ubytes and not 3.

Is this what you meant:


Rodrix, before giving up on the arrays, you may want to try to lock them before use (glLockArraysEXT). My own tests, though with different access pattern and data, has in cases displayed over 50% speed improvement with compiled over plain vertex arrays.

Tamlin, I didn't know that. I am checking on that too! Will give feedback as soon as I get it running. :) Thanks!

Humus
05-17-2006, 02:09 AM
Originally posted by Zulfiqar Malik:
But, keeping "personal tests" aside, can you tell me with certainty whether ubytes/bytes (aligned or non-aligned) are just as fast as, say, shorts and floats? I am not just talking about R5xx, but R4xx and R3xx (minimum).
Thanks, it's always good to get "first hand" information :) .

It should be at least as fast, as long as it's properly aligned and you use 4 components. Not too long ago I changed the font drawing in the ATI SDK framework to use ubytes for position, and it's just as fast on my laptop (Mobility 9700) as it was when I used floats.

Humus
05-17-2006, 02:11 AM
Originally posted by Rodrix:
Is this what you meant:

Yes

Dez
05-17-2006, 03:36 AM
Zulfiqar Malik:
Hmmmm ... what hardware did you use? I have tested ubytes with vertex position, normal...
Table 2.4 pp. 25 of the OpenGL 2.0 Specification:


Humus:
For the programmable pipeline all these semantics like "color" and "normal" has little significance.
In practice it can be significant, though. I remember that I used to get terrible performance when using bytes/ubytes with generic attribute 0, and that was on some rather recent hardware like NV40 or so. Also, I think I have read somewhere that some of the attributes can be interpolated with lower precision, or even lack some of the components (like secondary color's alpha). So I guess that in such cases the internal representation does matter, and specifying the attribute with a different data type can have a potential performance hit.

Jackis
05-17-2006, 06:51 AM
Hello, sorry for such a late reply, and for my English ))

I've experimented with integral data types only on nVidia hardware.
As Humus said, the specification's limitations don't allow us to use conventional vertex arrays with integral types in a simple manner, but using vertex attributes removes this restriction.

Let me say a few words, not about unsigned bytes, because that is simple, but about unsigned shorts, which are an alternative to the GLHalfNV data type on older nVidia hardware. This is my conversation on the nVidia dev forum; I hope posting it here is legal )))

=== Jackis
Hello!
Everybody knows that OpenGL allows us to bind various integer per-vertex attribs and it will treat them as floats in the vertex shader.
That is very useful for packing: for example, everybody stores per-vertex colors as unsigned bytes; some people even pack normals into bytes.
There is a parameter in the description of glVertexAttribPointer() called 'normalized'. It controls the treatment of integers: should they be mapped to the real [0..1] interval, or left as is.
But when I bind a short integer as an attribute, performance drops enormously. So it is clear that the nVidia driver does this remapping itself using CPU power, not on the GPU.
OK, I said, let's investigate this. And I actually found that GL_UNSIGNED_SHORT with normalization on/off is done in software, GL_SIGNED_SHORT with normalization on is also done in software, and ONLY GL_SIGNED_SHORT with normalization off is done on the GPU without any slow-downs!
So, does anybody have any advice? Can I believe in a happy future? I don't think it is so hard to implement, because UNSIGNED_BYTEs are mapped fast.
Thanks in advance!
---

=== Simon Green
I don't think we support shorts as a vertex type natively in hardware. Use bytes, floats or half-floats.
Next generation hardware may be more flexible in this regard, but I wouldn't count on it.
---

=== Jackis
Thanks, Simon!!!
Actually, you do support shorts )) But only signed and non-normalized, so I have to normalize them in the shader. Longs (32-bit integers) are not supported at all: neither signed nor unsigned, normalized or not )))
---

Rodrix
05-20-2006, 02:32 AM
Hey guys! Look what I found:
An article about the Quake3 engine that touches on what we were discussing: Quake 3d Engine Optimization (http://www.gamers.org/dEngine/quake3/johnc_glopt.html)

Color arrays are passed as unsigned bytes! And:

GL_VERTEX_ARRAY is always enabled, and each vertex will be four floats. The fourth float is just for padding purposes so that each vertex will exactly fill an aligned 16 byte block suitable for SIMD optimizations.

Is that true?! Do you recommend passing a stride of 4*sizeof(GLfloat) instead of 3*sizeof(GLfloat) for 3-component vertices?
Why would this speed things up!? I really don't understand that explanation...
Thanks so much in advance!
Cheers
Rod

jide
05-20-2006, 04:46 AM
CVAs (compiled vertex arrays) are no longer used on recent hardware.

It can speed things up depending on the graphics card. Memory alignment is the key phrase here. If you pass 4 32-bit values, each vertex starts on an aligned boundary whether the memory segment is 32, 64 or 128 bits wide. If you pass 3 values, each new vertex won't be aligned to a new segment. That's all.

Komat
05-20-2006, 05:12 AM
GL_VERTEX_ARRAY is always enabled, and each vertex will be four floats. The fourth float is just for padding purposes so that each vertex will exactly fill an aligned 16 byte block suitable for SIMD optimizations. Is that true?! Do you recommend passing 4*sizeof(GLfloat) instead of 3*sizeof(GLfloat) for 3-component vertices?
Why should this speed up!? I really don't understand that explanation...
Rod

At the time the Quake3 engine was new, most cards did not have HW acceleration of vertex transforms, so transforms were calculated on the CPU, ideally using SSE or similar AMD instructions. Many float SSE instructions are designed to operate on four floats at once, and there is a memory-access penalty if four floats are read from or written to an address that is not aligned to 16 bytes. This is what John Carmack is talking about.

V-man
05-20-2006, 11:19 AM
That's true. Q3 came out in 2000, I think.
CVAs are certainly archaic. Use VBOs if you want your data to live in VRAM, and tell GL that your VBO is static (GL_STATIC_DRAW).
Additionally, glDrawRangeElements is preferred over glDrawElements.
The vertex format can be xyz, but your entire vertex should be a multiple of 32 bytes.

I have a GDC paper, GDC2004_PracticalPerformanceAnalysis.pdf, and some other PDFs that mention this number.

Rodrix
05-22-2006, 08:01 PM
Originally posted by Humus:

Originally posted by Rodrix:
Could you explain what fillrate limited means?

Basically you're limited by the amount of pixels rendered, rather than the amount of vertices.

Humus, thanks for your replies.
The cookie metaphor was great ;)

I implemented the various suggested improvements (unsigned bytes for color, and glLockArrays), and now I am working on the fill-rate limitation, which appears to be the main bottleneck. Any advice on improving this?
- One idea, I guess, is to reduce the texture loading used throughout my program, which I am doing now.

Any other suggestions?

Thanks so much!
Cheers,
Rod

tamlin
05-23-2006, 01:24 AM
V-man wrote:

The vertex format can be xyz, but your entire vertex should be multiples of 32 bytes.

Perhaps this is just a slight misunderstanding, but that's not what the document says.

It says that if you're shuffling data over AGP, use "multiples of 32 byte sized vertices".

The way I read it: if you're on AGP (and now we're venturing way outside OpenGL, into hardware-specific optimizations for bus transactions, on a bus that's being phased out), you should submit your data in a form that maximizes the throughput of that particular bus. I.e. if you have only xyz in your vertices, you should submit them in batches that are multiples of 8 vertices (8*12 = 96 bytes = 3 bus transactions of 32 bytes each).

If you're supporting older (e.g. Radeon 7000, TNT2) or less specialized (e.g. Intel integrated 915, 925 or similar) hardware that performs TnL on the CPU, submitting data well aligned for the CPU will affect that stage of the pipeline. But aside from CPU- and system-RAM-specific behaviour, I think today's and yesterday's GPUs (back to the Radeon 9200 and GeForce... 1?) handle just about any 32-bit-aligned data the same (someone with insight here, feel free to chime in if this assumption is wrong).

What can matter is the alignment of the starting address (in system memory) of a submitted batch of data. With immediate calls (glVertex & co), the driver should handle this. Mapped buffers should already be page aligned (and on Windows, due to the way its memory manager works, I'm almost 100% sure you'll even get them 64 KB aligned).

I think that leaves only the "upload" style functions (e.g. BufferData), where source data could be mis-aligned from a cache-line, bus transfer, or even DMA perspective.

By that, I think I've left "off-topic" in the dust for this thread, so I'll stop here. Just to round off: I'm not saying alignment isn't still an issue, but something tells me it often isn't the AGP memory transaction requirements that are the issue anymore. :)