-
Junior Member
Regular Contributor
Accessing a texture2DArray, speed it up?
I have 16 textures in a texture array, and would like to know if there is any way to access the array once and get all 16 textures, so I can eliminate 16 calls to texture2DArray. As of now this is killing my FPS.
Thanks
-
Advanced Member
Frequent Contributor
Re: Accessing a texture2DArray, speed it up?
If your hardware supports packing/unpacking, you could pack your 8bit texture data into 32bit floating point textures, e.g. you can pack sixteen 8bit intensity values into each texel of one 32bit floating point RGBA texture.
-
Junior Member
Regular Contributor
Re: Accessing a texture2DArray, speed it up?
I'd be curious to know how much this helps, since the same amount of bandwidth would be utilized. Would the instruction count savings speed things up? I suppose it could help the cache...
-
Advanced Member
Frequent Contributor
Re: Accessing a texture2DArray, speed it up?
I recently used something similar in a paper of mine and like you already mentioned, texture caching improved performance a lot. Additionally, in case the texture is generated on the graphics card, you can write the results to a single frame buffer attachment rather than using multiple render targets which also improves performance.
-
Junior Member
Regular Contributor
Re: Accessing a texture2DArray, speed it up?
I see, good to know. It'd be cool to see packing functions exposed to GLSL through an extension, but for now I suppose one can use bitshifting operations when dealing with non-floats anyway.
-
Junior Member
Regular Contributor
Re: Accessing a texture2DArray, speed it up?
Ok, I'm not 100% sure on the packing idea. Are you saying take all 16 8bit RGB textures and just dump them into one RGB floating point texture? Then do some math on the value, since it will be greater than 1.0, to get to each texture? e.g. Red Channel 0-255, 256-511, 512-767? Then in the shader just take each range out to get each texture?
-
Advanced Member
Frequent Contributor
Re: Accessing a texture2DArray, speed it up?
Well, first you need to check out if your hardware supports packing/unpacking.
A common pixel location in the 16 8bit RGB textures takes up 16*3 = 48 bytes of data. One 32bit floating point RGBA texture has storage for 16 bytes per pixel. This means that you could pack the data for a single pixel in the 16-D array into 3 pixels of floating point textures. On the one hand, this way the number of texture accesses will be reduced from 16 to 3. On the other hand, you will have to insert some unpacking instructions after accessing the texture to get the 8bit values back.
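To make the arithmetic above concrete, here is a minimal C sketch of one possible layout. It assumes the 48 bytes are stored in texture-major order (texture 0 R, G, B, then texture 1, and so on); that ordering, and the names PackedSlot, packed_texels_per_pixel and locate, are made up for illustration and are not from any API.

```c
#include <assert.h>

enum {
    NUM_TEXTURES     = 16, /* source textures in the array */
    CHANNELS         = 3,  /* 8-bit R, G, B per source texel */
    BYTES_PER_FLOAT  = 4,  /* one 32-bit float holds four packed bytes */
    FLOATS_PER_TEXEL = 4,  /* R, G, B, A of the packed float texture */
    BYTES_PER_TEXEL  = BYTES_PER_FLOAT * FLOATS_PER_TEXEL /* 16 */
};

/* Hypothetical helper type: where one source byte ends up. */
typedef struct {
    int texel;     /* which packed texel to fetch (0..2) */
    int component; /* which float of that texel: 0=R .. 3=A */
    int byte;      /* which byte inside that float (0..3) */
} PackedSlot;

/* Number of RGBA32F texels needed per pixel location, rounded up. */
int packed_texels_per_pixel(void)
{
    int total_bytes = NUM_TEXTURES * CHANNELS; /* 16 * 3 = 48 */
    return (total_bytes + BYTES_PER_TEXEL - 1) / BYTES_PER_TEXEL;
}

/* Map (source texture, colour channel) to its slot in the packed data,
   assuming the texture-major byte order described above. */
PackedSlot locate(int texture, int channel)
{
    assert(texture >= 0 && texture < NUM_TEXTURES);
    assert(channel >= 0 && channel < CHANNELS);

    int offset = texture * CHANNELS + channel; /* byte offset 0..47 */
    PackedSlot slot;
    slot.texel     = offset / BYTES_PER_TEXEL;
    slot.component = (offset % BYTES_PER_TEXEL) / BYTES_PER_FLOAT;
    slot.byte      = offset % BYTES_PER_FLOAT;
    return slot;
}
```

With this layout the shader would fetch three texels per pixel, and locate(texture, channel) tells you which fetched float to unpack and which of its four bytes to read.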
-
Junior Member
Regular Contributor
Re: Accessing a texture2DArray, speed it up?
I have a GF8800 GTS. So I am guessing it supports it? So what do I need to look up? Is there some extension?
-
Advanced Member
Frequent Contributor
Re: Accessing a texture2DArray, speed it up?
A GF8800 GTS definitely supports packing/unpacking. The only problem is that these functions are currently not exposed in GLSL but you can access them using assembly as described here (search for 'pack') or using Cg.
-
Senior Member
OpenGL Pro
Re: Accessing a texture2DArray, speed it up?
You can access them through GLSL:
vec4 arg = unpack_4ubyte(value); // value is a float; each component of arg is always 0...1
float result = pack_4ubyte(arg); // arg is a vec4
To find the names of the other functions, I had to open cgc.exe in a hex editor and search for "unpack" (really, I couldn't find them in any documentation or online otherwise):
unpack_4ubytegamma
pack_4ubytegamma
unpack_4ubyte
pack_4ubyte
unpack_4byte
pack_4byte
unpack_2ushort
pack_2ushort
unpack_2half
pack_2half
They are nifty instructions that nVidia hardware supports. They work much like this C code, which reinterprets the same 32 bits through a different pointer type:
float value=1001.21;
char* ptr1 = (char*)&value;
float result = *(float*)ptr1;
They are described in http://www.nvidia.com/dev_content/nv...nt_program.txt
The instructions are: UP2H, UP2US, UP4B, UP4UB, PK2H, PK2US, PK4B, PK4UB
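For anyone who wants to prepare or check packed data on the CPU, here is a rough C emulation of the 4ubyte pair. It is only a sketch: the names emu_pack_4ubyte and emu_unpack_4ubyte are made up, and the byte order (x in the lowest 8 bits, w in the highest) and the round-to-nearest step are my assumptions about PK4UB/UP4UB, so check them against the spec before relying on them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CPU stand-in for pack_4ubyte / PK4UB.
   Assumption: x goes into the lowest byte, w into the highest. */
float emu_pack_4ubyte(const float rgba[4])
{
    assert(rgba != NULL);
    uint32_t bits = 0;
    for (int i = 0; i < 4; ++i) {
        float c = rgba[i];
        if (c < 0.0f) c = 0.0f;                        /* clamp to 0..1 */
        if (c > 1.0f) c = 1.0f;
        uint32_t byte = (uint32_t)(c * 255.0f + 0.5f); /* round to 0..255 */
        bits |= byte << (8 * i);
    }
    /* Reinterpret the 32 bits as a float without converting the value.
       Note: some byte combinations form NaN bit patterns, so anything
       that touches the float arithmetically may destroy the data. */
    float packed;
    memcpy(&packed, &bits, sizeof packed);
    return packed;
}

/* Hypothetical CPU stand-in for unpack_4ubyte / UP4UB:
   split the float's bit pattern into four bytes, each scaled to 0..1. */
void emu_unpack_4ubyte(float packed, float rgba[4])
{
    assert(rgba != NULL);
    uint32_t bits;
    memcpy(&bits, &packed, sizeof bits);
    for (int i = 0; i < 4; ++i)
        rgba[i] = (float)((bits >> (8 * i)) & 0xFFu) / 255.0f;
}
```

Packing {1.0, 0.0, 0.5, 0.25} and unpacking the result gives back the same values quantized to 8 bits, i.e. 255/255, 0/255, 128/255 and 64/255.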
If you pack all 16 8-bit textures into one float32 RGBA texture and use the above instructions to unpack, and your shader only needs _unfiltered_ color values from the 16 textures at the same coordinates, you will speed up the shader. (The values have to be unfiltered because filtering would blend the packed bit patterns as if they were ordinary floats.) Not because of the texture cache, but because of GDDR3 latency, I bet: switching GDDR addresses is a relatively slow operation, and it happens whenever a read starts from another texture or from a very different coordinate.
(ATi hardware will never support these nifty instructions, afaik)
I use these instructions/"functions" when drawing to multiple render targets, or when simulating MRT (whichever case performs faster, again due to GDDR3 latency). This also allows me to mix render-target formats, i.e. use an FBO's MRT#1 as RGBA8, MRT#2 as R32f, and MRT#3 as RG16half_float.