dynamic array?

I am making an ASE loader for my OpenGL program, but I need to create a dynamic two-dimensional array and set its size once I know how many vertices there are. Does anyone have suggestions on how to go about this?

Hello,

there are two ways to do this: 1) flatten the matrix into a single vector, or 2) use a pointer to a pointer.

ex:

const unsigned int w=123, h=456; /* whatever */
GLubyte *ptr;

/* alloc the vector (which is the size of the matrix) */

ptr=(GLubyte*)malloc(sizeof(GLubyte)*w*h);
assert(ptr);

/* zzap element (x, y) with something */

ptr[x+y*w]=0;

see how it works? the element in the first column, second row is the 124th element because we’re packing our matrix in row-major order (each row stored one after the other), like so:

ABCDE
FGHIJ
KLMNO

is stored like: ABCDEFGHIJKLMNO
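
for example, with that 5-wide grid, "H" sits at row 1, column 2 (counting from zero), so its flat index is x + y*w = 2 + 1*5 = 7, i.e. the 8th character of the flat string.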

anyway. the second method is slightly more involved:

const unsigned int w=123, h=456;
GLubyte **ptr;

/* alloc the rows */

ptr=(GLubyte**)malloc(sizeof(GLubyte*)*h);

/* alloc the set of columns for each row */

for(int t=0; t<h; t++)
ptr[t]=(GLubyte*)malloc(sizeof(GLubyte)*w);
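
when you’re done with it, both allocations have to be undone. a quick sketch of that (the first method above only needs a single free(ptr)):

for(int t=0; t<h; t++)
free(ptr[t]); /* free each row */
free(ptr); /* then free the table of row pointers */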

hope this helps.

cheers,
John

I developed a class that wraps dynamic arrays using the first method John described.
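
Just to illustrate the idea, a stripped-down sketch of such a wrapper (2-D only here, whereas the real class handles more dimensions) could look like this:

#include <cstddef>
#include <vector>

/* GLubyte is just an unsigned byte, so plain unsigned char stands in for it here */
class Matrix2D {
public:
    Matrix2D(std::size_t w, std::size_t h) : width(w), data(w * h) {}

    /* element (x, y) lives at flat index x + y*width, as in John's post */
    unsigned char &at(std::size_t x, std::size_t y) { return data[x + y * width]; }

private:
    std::size_t width;
    std::vector<unsigned char> data;
};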

This is probably the fastest method for allocating/deallocating memory, but I suspect the second one is faster when accessing elements in the array (because the first one implies a multiply and an add per access, while the second is a ‘direct’ access).

Best regards.

Eric

On x86, foo[a*b+c] is faster than bar[a][b] when foo is a type* and bar is a type**. This is because the x86 instruction set has an addressing mode explicitly for [a*b+c], and compilers are trained to recognize it and use it (and thus modern silicon optimizes for it). The best results are had when “b” is a power of 2.

Indeed, an x86 assembly trick is to use the LEA instruction to cram a multiply and two adds into one cycle with minimal pipeline delay.

And, rather than implement your own classes, first take a look at the standard library (such as <vector>) and only decide to roll your own if the standard is provably not up to the task.
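
Something like this, for instance (just a sketch, the sizes are arbitrary and unsigned char stands in for GLubyte so it compiles on its own):

#include <vector>

int main()
{
    const unsigned int w = 123, h = 456;

    /* a flat vector, indexed the same way as the malloc'd block earlier in the thread */
    std::vector<unsigned char> flat(w * h);
    flat[2 + 3 * w] = 0;    /* element (x=2, y=3) */

    /* or a vector of vectors, if you want the [y][x] syntax and don't mind the extra indirection */
    std::vector<std::vector<unsigned char> > grid(h, std::vector<unsigned char>(w));
    grid[3][2] = 0;         /* same element */

    return 0;
}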

Originally posted by bgl:
On x86, foo[a*b+c] is faster than bar[a][b] when foo is a type* and bar is a type**. This is because the x86 instruction set has an addressing mode explicitly for [a*b+c], and compilers are trained to recognize it and use it (and thus modern silicon optimizes for it). The best results are had when “b” is a power of 2.

Indeed, an x86 assembly trick is to use the LEA instruction to cram a multiply and two adds into one cycle with minimal pipeline delay.

Not quite correct…

> foo[a*b+c] is faster than bar[a][b]
In both cases performance should be the same.

> This is because the x86 instruction set has an addressing mode explicitly for [a*b+c]
Oh, really?

Well, x86 can calculate an address in this way:
displacement + Reg1 + (1/2/4/8)*Reg2
As you can see, this addressing mode is optimized for linear arrays with 1, 2, 4, or 8 bytes per item - Reg2 contains the index.
The LEA instruction has the same functionality.


I must say I haven’t looked at x86 assembler for a while… On the other hand, I am very used to the Motorola 68030 and the Motorola DSP 56001… On the former, you could use these pre-multipliers on a data register, and they were limited to 1, 2, 4, 8 as well (you would then use indirect access with this plus an address register…).

Anyway, what bgl said would only be valid for 2-dimensional arrays… My classes handle n-dimensional ones (although I have never gone further than n=3!).

Regards.

Eric

you know, compilers can optimize things like that…

Yes, the values for B must be 1, 2, 4 or 8.
Sorry if I confused anyone.

> > foo[a*b+c] is faster than bar[a][b]

> In both cases performance should be the same.

I specified that foo was a type*, whereas bar was a type**. In the first case, one index is calculated, and one pointer dereference is made. In the second, two indexes are calculated (although each calculation there MAY be simpler) and two pointer dereferences are made (which certainly makes the method slower).

Only if bar is typedef-ed to a multidimensional array will it be as fast as foo, and with typedefs you can’t have dynamically-sized arrays. This is one of the “issues” some scientific number-crunching people have with C: having to calculate their indexes themselves.
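
Concretely, the typedef version looks something like this (just a sketch, with unsigned char standing in for GLubyte): the column count is frozen at compile time, which is exactly the limitation mentioned above, even though the row count can still be chosen at run time.

typedef unsigned char Row[123];   /* row width fixed at compile time */

int main()
{
    Row *bar = new Row[456];      /* the row count CAN be chosen at run time */
    bar[3][2] = 0;                /* one index calculation, one dereference */
    delete [] bar;
    return 0;
}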