Dynamic AND compressed textures



Ysaneya
11-14-2005, 12:28 AM
I haven't tried it yet, but i'd like to gather some opinions on this:

I'm generating a set of textures on the GPU. These are generated once, and then reused for thousands of frames (so they're not dynamically updated every frame). Yesterday i discovered that i was using more than 500 MB of textures.. so i'd better compress them.

Now i'm wondering, we all know that decompression is hardware-accelerated.. but is compression accelerated too ?

Typically, i'll render my texture in the color buffer or in a PBuffer, and use glCopyTexSubImage2D to a texture created with a compressed internal format.
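In GL terms, the idea is roughly this (just a sketch, taking a 512^2 DXT1 target as an example):

/* sketch: create the target texture once, with a compressed internal format */
GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGB_S3TC_DXT1_EXT,
             512, 512, 0, GL_RGB, GL_UNSIGNED_BYTE, NULL);

/* ... render the procedural content into the color buffer ... */

/* then copy the framebuffer into the compressed texture */
glBindTexture(GL_TEXTURE_2D, tex);
glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, 512, 512);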

What i'd like to know is: will i lose performance by doing that ? Will the driver have to read back the color buffer, compress the texture on the CPU, and upload the result back ?

Y.

michael.bauer
11-14-2005, 02:02 AM
Hi Ysaneya,

The driver will most likely do a readback, followed by compression (in the driver) and an upload. You could do a test to find out, please let us know about your results.

Michael

Overmind
11-14-2005, 02:35 AM
But I think if you really do this only once every 1000 frames or so, this won't have a strong impact on performance...

Ysaneya
11-14-2005, 02:41 AM
Not that simple :)

Once generated, a texture isn't modified for a thousand frames, true. But there are a lot of textures in the system (many hundreds), which means the probability of generating a texture in any given frame is not that low.

If the driver has to read back the data, it will:
1. Kill the parallelism between the CPU/GPU for that frame
2. Require more bandwidth for that frame (GPU->CPU transfer, then CPU->GPU)
3. Require a lot of CPU power for that frame (textures are generally 512^2).

Which means i would not be surprised if generating a single texture took between 50 and 100 milliseconds. That's not acceptable if i want to keep a smooth and constant framerate.

Overmind
11-14-2005, 03:44 AM
When the hardware can't compress it for you, there's nothing you can do about the bandwidth. But perhaps you can improve parallelism by manually up-/downloading with PBOs.
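A rough, untested sketch of what I mean, assuming ARB_pixel_buffer_object is available:

/* bind a pack buffer so glReadPixels can return without stalling the CPU */
GLuint pbo;
glGenBuffersARB(1, &pbo);
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pbo);
glBufferDataARB(GL_PIXEL_PACK_BUFFER_ARB, 512 * 512 * 3, NULL, GL_STREAM_READ_ARB);
glReadPixels(0, 0, 512, 512, GL_RGB, GL_UNSIGNED_BYTE, (void*)0);

/* ... a frame or two later: map, compress on the CPU, upload ... */
void *pixels = glMapBufferARB(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY_ARB);
/* compress 'pixels' here, then glCompressedTexSubImage2DARB(...) */
glUnmapBufferARB(GL_PIXEL_PACK_BUFFER_ARB);
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);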

That's something you'll have to benchmark.

michael.bauer
11-14-2005, 04:46 AM
Ysaneya,

I forgot the simplest way: you could reduce the resolution instead of compressing. E.g. reducing the resolution by a factor of 2 needs only a quarter of the memory. And the resampling can be done efficiently by the GPU (e.g. just reduce the viewport you render to). If you do it right you won't see a great difference, but it depends on your application, of course. Please let us know your results.
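For example (just a sketch; draw_content() stands in for whatever generates your texture):

/* render at half resolution: a quarter of the memory */
glViewport(0, 0, 256, 256);   /* instead of 512x512 */
draw_content();               /* hypothetical: whatever draws the texture */
glBindTexture(GL_TEXTURE_2D, tex);
glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, 256, 256);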

Michael

pocketmoon
11-14-2005, 10:24 AM
Not what you want but related...

I did some tests in 2001 to update small sections of a large compressed bitmap using glGetCompressedTexImageARB and glCompressedTexSubImage2DARB.

I gave up because of buggy drivers.

I just reran the tests and it works!! Hurray!

Ysaneya
11-14-2005, 12:40 PM
Results of the first tests..

It works, but it's not very fast, which seems to confirm the read-back-then-compress-on-CPU scenario.

The framerate goes from 75 fps to 13 fps when i compress one 256^2 texture per frame, and to 3 fps when i use a 512^2.

If no texture is updated at all, the framerate is around 85 fps. Without texture compression and with 512^2 textures, i get slowdowns due to the half-gigabyte of textures being paged from/to video memory. With compressed textures the slowdowns disappear and i get a solid, constant 85 fps, which means decompression is working well (too bad compression is so slow, pretty much unusable for me then).

Y.

pocketmoon
11-14-2005, 01:40 PM
That's some slowdown. What's your vid card ?

I've tweaked my test to maintain a local 256x256x3 byte array which I use to create two compressed DXT1 textures, e.g.


glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGB_S3TC_DXT1_EXT, ...);

Each frame I update a chunk of this, re-create compressed texture 1, read back the raw compressed data and update compressed texture 2.

The reason for going round the houses is that originally I was testing the update of small chunks, i.e. compressed texture 1 was just 4x4, which I then TexSubImage'd into the large compressed texture.

I get >450fps updating the whole 256x256 compressed texture with a 6800GT...


Hmm. That's with a solid block of colour, which is quick to encode. If I put more varied data in the re-compress step I get about 200fps.





// bind the staging texture and let the driver compress the raw RGB data
glBindTexture(GL_TEXTURE_2D, texarray[0]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGB_S3TC_DXT1_EXT,
             d, d, 0, GL_RGB, GL_UNSIGNED_BYTE, tdata);

// read it back in compressed format (allocate the buffer on first use)
if (compsize == 0)
{
    glGetTexLevelParameteriv(GL_TEXTURE_2D, 0,
                             GL_TEXTURE_COMPRESSED_IMAGE_SIZE_ARB, &compsize);
    printf("Comp size %d\n", compsize);
    compdata = (unsigned char*)malloc(compsize * sizeof(unsigned char));
}

glGetCompressedTexImageARB(GL_TEXTURE_2D, 0, compdata);

// splice the raw DXT1 data into the second texture
glBindTexture(GL_TEXTURE_2D, texarray[1]);
glCompressedTexSubImage2DARB(GL_TEXTURE_2D, 0, 0, 0, 256, 256,
                             GL_COMPRESSED_RGB_S3TC_DXT1_EXT, compsize, compdata);

Ysaneya
11-15-2005, 01:46 AM
I'm a bit confused.. what are the two compressed textures used for ?

How do you perform the update of the 256x256 compressed texture ? Via glTexSubImage2D ?

Can you post the whole code ? Or maybe a link to the EXE ?

I'm using an ATI X850 XT on PCIe x16.

Thanks..

pocketmoon
11-15-2005, 03:53 AM
I'm confused too :) But...

The original idea was to create a high detail procedural texture on the CPU using layers of perlin noise, to use as a vast terrain texture.

The noise calcs are expensive, so I thought about gradually increasing the detail in the texture over a number of frames, e.g. adding an octave of noise each frame.

Also, I wanted the texture to be compressed to save space and also wanted to test the updating of small chunks of the texture based on distance from the viewer.

So, I create my texture data on the CPU to be updated over time with noise.

I create the main compressed terrain texture on vmem.

To replace a 32x32 chunk of this main texture, I first generate/update my 32x32 patch on the CPU 'uncompressed', then write this to a 32x32 compressed texture on the card.

I then read this back as 'raw' compressed data using glGetCompressedTexImageARB, and copy it into a subsection of the main compressed texture using glCompressedTexSubImage2DARB.

I'm basically using OpenGL to do the compression for me on the 32x32 chunk by going from local memory to texture. I then read this 'staging' texture back and update the second, main texture. I should try this with a copy-tex call, but from your description you'd be copying back anyway.
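In GL calls, the patch path looks roughly like this (names made up; the real code follows):

/* compress the 32x32 patch through the driver... */
glBindTexture(GL_TEXTURE_2D, stagingTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGB_S3TC_DXT1_EXT,
             32, 32, 0, GL_RGB, GL_UNSIGNED_BYTE, patch);

/* ...read the raw DXT1 blocks back... */
GLint size = 0;
glGetTexLevelParameteriv(GL_TEXTURE_2D, 0,
                         GL_TEXTURE_COMPRESSED_IMAGE_SIZE_ARB, &size);
glGetCompressedTexImageARB(GL_TEXTURE_2D, 0, blocks);

/* ...and splice them into the main texture at (xoff, yoff) */
glBindTexture(GL_TEXTURE_2D, mainTex);
glCompressedTexSubImage2DARB(GL_TEXTURE_2D, 0, xoff, yoff, 32, 32,
                             GL_COMPRESSED_RGB_S3TC_DXT1_EXT, size, blocks);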

I'll post the exe and source this evening.

Rob

Ysaneya
11-15-2005, 07:27 AM
Surprise surprise, i'm also doing terrain rendering. Everything works well except for the high amount of texture memory usage that i'd like to optimize using compressed textures.

So, i'm generating each texture on the GPU in the color buffer (so no compression here), and i just want to copy the result to a compressed texture.. without having to transfer the results back through the CPU.

I'm not sure I understand how it's possible that you got 200 fps compressing a 256x256 texture.. since that includes the bandwidth and the CPU costs. Maybe it's hardware-accelerated on NVidia cards, but not on ATIs ?

I'll try your EXE both on an NVidia and an ATI card to see the difference.

pocketmoon
11-15-2005, 10:58 AM
exe here (http://www.wavestate.com/saver/dxtupdate.exe)

"old code" excuse alert :)

#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <time.h>
#include <math.h>

#ifdef WIN32
# include <windows.h>
#endif

#include <GL/gl.h>
#include <GL/glext.h>
#include <GL/glut.h>

#define PI 3.141592654
#define DDD (-40.0)

double dist;
int ctest;

int g_update;

GLuint texarray[10];

unsigned char tdata [256][256][3];
unsigned char bdata [256][256][3];

PFNGLCOMPRESSEDTEXIMAGE2DARBPROC glCompressedTexImage2DARB = NULL;
PFNGLGETCOMPRESSEDTEXIMAGEARBPROC glGetCompressedTexImageARB = NULL;
PFNGLCOMPRESSEDTEXSUBIMAGE2DARBPROC glCompressedTexSubImage2DARB = NULL;

void myinit (void)
{
    int x,y;

    g_update = 0;

    glCompressedTexImage2DARB = ( PFNGLCOMPRESSEDTEXIMAGE2DARBPROC )
        wglGetProcAddress ( "glCompressedTexImage2DARB" );

    glGetCompressedTexImageARB = ( PFNGLGETCOMPRESSEDTEXIMAGEARBPROC )
        wglGetProcAddress ( "glGetCompressedTexImageARB" );

    glCompressedTexSubImage2DARB = ( PFNGLCOMPRESSEDTEXSUBIMAGE2DARBPROC )
        wglGetProcAddress ( "glCompressedTexSubImage2DARB" );

    if ( glCompressedTexImage2DARB == NULL

pocketmoon
11-15-2005, 11:03 AM
bugger! :rolleyes:

Code here :)

source code (http://www.wavestate.com/saver/dxtupdate.c)

Ysaneya
11-15-2005, 11:49 AM
I get around 100 fps, which means something is probably wrong in my code.

Still, i don't understand your logic behind the two textures texarray[0] and texarray[1]. Per frame you are:

1. Updating the texture buffer on the CPU.
2. Creating a compressed texture in texarray[0] (why call glTexImage2D instead of glTexSubImage2D ?).
At that point, your texture is compressed in video memory in texarray[0], right ?
3. Reading the compressed texture data back from video memory, into a preallocated system memory buffer.
4. Uploading this buffer as the data for the texarray[1] texture.
5. Rendering the quad with texarray[1].

I just fail to see the point of steps 3 and 4. Can't you simply forget the second texture, and render with texarray[0] ?

Y.

Ysaneya
11-15-2005, 11:55 AM
Just thinking about it.. an explanation would be that you're not really reading from video memory. By creating a compressed texture every frame, OpenGL probably keeps the compressed buffer in system memory. Which means that in step 3 you're not actually reading from video memory, but simply from system memory. Am i right ?

If i'm right, then your scenario doesn't match what i'm doing. You are generating your texture on the CPU, using OpenGL to compress it, and uploading it to the video card. I am generating this texture on the GPU, so i need to retrieve it back to the CPU first before i can compress it and upload it back. This probably explains the huge performance difference i see.

Y.

pocketmoon
11-15-2005, 01:14 PM
You're right. It's a duff idea, but hey, it was 2001 and I was playing with DXT :)

Anyway. Let's borrow one of paulsprojects' excellent demos from www.paulsprojects.net (http://www.paulsprojects.net)
Render To Texture Demo (http://www.paulsprojects.net/opengl/rtotex/rtotex.html)

And then modify Main.cpp thus

main.cpp (http://www.wavestate.com/saver/Main.cpp)

Running the demo, hit 9 to force a readback of the pbuffer and an immediate compress via glTexSubImage2D to a DXT1 texture (which is left as the bound texture for the final fullscreen quad). 0 takes you back to direct pbuffer rendering.
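The 9 path boils down to something like this (sketch; 'pixels' is a scratch buffer):

/* read the pbuffer back to system memory... */
glReadPixels(0, 0, 256, 256, GL_RGB, GL_UNSIGNED_BYTE, pixels);

/* ...and re-upload into the DXT1 texture; the driver compresses on upload */
glBindTexture(GL_TEXTURE_2D, dxt1Tex);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 256, 256, GL_RGB, GL_UNSIGNED_BYTE, pixels);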

I get a fps drop from 1300fps to 90 for a 256x256 texture.

512x512 gives a drop from 900 to 23

Ysaneya
11-15-2005, 01:51 PM
900 fps down to 75 fps at 256^2, and down to 20 fps at 512^2.

Y.

Ysaneya
11-15-2005, 02:13 PM
I replaced your code to render to the color buffer, and do a glCopyTexSubImage2D. The results are interesting:

- 900 fps down to 18 fps in 256^2,
- 900 fps down to 5 fps in 512^2.

Which seems to match the results i noticed in my first post.. so via glGetTexImage and glTexSubImage2D, it's more than 4 times faster!

Y.

pocketmoon
11-15-2005, 02:16 PM
Hi Y,

I couldn't get glCopyTexSubImage2D to work.
Could you post your change?

Ah! wglShareLists ...

FPS now drops:
1300 to 132 fps at 256^2
900 to 42 fps at 512^2

So for Nvidia, glCopyTexSubImage2D is substantially faster! The opposite of your ATI findings :)

The above demo doesn't generate mipmaps for the compressed texture... adding that slows things down further:
1300 to 100 fps
900 to 30 fps.
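One way to add that is automatic mipmap generation, e.g. (sketch; SGIS_generate_mipmap, core since GL 1.4):

/* every copy now re-compresses all the mip levels as well */
glBindTexture(GL_TEXTURE_2D, dxt1Tex);
glTexParameteri(GL_TEXTURE_2D, GL_GENERATE_MIPMAP, GL_TRUE);
glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, 512, 512);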

Cheers.

Rob

Mars_999
11-15-2005, 03:38 PM
Hi Ysaneya,

Could you try this with an FBO to see if that would help your performance, since pbuffers are nasty compared to FBOs? It would be interesting to find out... BTW do you have any screenshots of your terrain? I'd be interested in seeing them, with so much texture RAM being used.

Ysaneya
11-16-2005, 12:45 AM
Pocketmoon: that's good news, because it means texture compression becomes usable for me. Yippee! So i'll use glCopyTexSubImage2D on NVidia and glGetTexImage + glTexSubImage2D on ATI.. thank you :)

Mars_999: i'm not on the machine with the code right now, but if you want you can try it yourself (pocketmoon posted the link to the exe+src a few posts back).

Screens of my procedural terrain (that's a planet renderer):

http://fl-tw.com/Infinity/Media/Screenshots/planet_tex_new_4.jpg
http://www.fl-tw.com/Infinity/Media/Screenshots/planet_tex_new_13.jpg
http://fl-tw.com/Infinity/Media/Screenshots/planet_tex_new_17_med.jpg
http://fl-tw.com/Infinity/Media/Screenshots/planet_tex_new_18_med.jpg

The ground is lacking detail - that's not a limitation of the algorithm, i just haven't had the time to implement detail textures yet.

Y.

pocketmoon
11-16-2005, 01:44 AM
Originally posted by Ysaneya:
Screens of my procedural terrain (that's a planet renderer):

Doh! I'm kicking myself for not realising the link between this discussion and the one on http://www.fl-tw.com/Infinity about terrain texture usage :)

Glad to be of help!

Rob

knackered
11-16-2005, 03:29 AM
use a fragment shader to do the compression yourself.

pocketmoon
11-16-2005, 05:31 AM
Originally posted by knackered:
use a fragment shader to do the compression yourself.

Is it possible to render a 'self-compressed' texture to a buffer (via a shader) and then have OpenGL bind to that buffer as DXT?

the rest is implementable...

Find the min and max colour vectors in a 4x4 sample.
Derive the two interpolated values.
For each sample, determine the closest of the 4.
Merge them into 4 x 16-bit values per compressed block (two 5:6:5 colour values and 32 bits of lookup).
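On the CPU, the per-block work would look something like this (naive, untested sketch; quality won't match a real compressor):

typedef unsigned char u8;
typedef unsigned short u16;
typedef unsigned int u32;

static u16 pack565(const u8 c[3])
{
    return (u16)(((c[0] >> 3) << 11) | ((c[1] >> 2) << 5) | (c[2] >> 3));
}

// rgb: the 16 texels of a 4x4 block (row-major), block: 8 bytes of DXT1 out
void encode_dxt1_block(const u8 rgb[16][3], u8 block[8])
{
    u8 lo[3] = {255, 255, 255}, hi[3] = {0, 0, 0};
    u8 pal[4][3];
    u32 codes = 0;
    u16 c0, c1;
    int i, j, c;

    // 1. find the min and max colour vectors in the 4x4 sample
    for (i = 0; i < 16; ++i)
        for (c = 0; c < 3; ++c)
        {
            if (rgb[i][c] < lo[c]) lo[c] = rgb[i][c];
            if (rgb[i][c] > hi[c]) hi[c] = rgb[i][c];
        }

    // 2. derive the two interpolated values
    for (c = 0; c < 3; ++c)
    {
        pal[0][c] = hi[c];
        pal[1][c] = lo[c];
        pal[2][c] = (u8)((2 * hi[c] + lo[c]) / 3);
        pal[3][c] = (u8)((hi[c] + 2 * lo[c]) / 3);
    }

    // 3. for each sample, pick the closest of the 4 (a 2-bit code)
    for (i = 0; i < 16; ++i)
    {
        int best = 0, bestd = 1 << 30;
        for (j = 0; j < 4; ++j)
        {
            int d = 0;
            for (c = 0; c < 3; ++c)
            {
                int t = (int)rgb[i][c] - (int)pal[j][c];
                d += t * t;
            }
            if (d < bestd) { bestd = d; best = j; }
        }
        codes |= (u32)best << (2 * i);
    }

    // 4. two 5:6:5 colours + 32 bits of lookup, little-endian.
    // hi dominates lo componentwise, so c0 >= c1 and we stay in the opaque
    // 4-colour mode (the degenerate c0 == c1 case is ignored in this sketch)
    c0 = pack565(hi);
    c1 = pack565(lo);
    block[0] = (u8)(c0 & 0xFF); block[1] = (u8)(c0 >> 8);
    block[2] = (u8)(c1 & 0xFF); block[3] = (u8)(c1 >> 8);
    block[4] = (u8)(codes & 0xFF);
    block[5] = (u8)((codes >> 8) & 0xFF);
    block[6] = (u8)((codes >> 16) & 0xFF);
    block[7] = (u8)((codes >> 24) & 0xFF);
}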

Ysaneya
11-16-2005, 07:25 AM
I guess it could be done.. but with a complex and long (so slow) fragment shader. I'm not ready to trade memory for horrible performance.

Korval
11-16-2005, 10:09 AM
Originally posted by pocketmoon:
Is it possible to render a 'self-compressed' texture to a buffer (via a shader) and then have OpenGL bind to that buffer as DXT?

I seriously doubt it.

Without bitwise operations, I have no idea how you'd produce the 2 16-bit colors, let alone the 2-bit codes that tell how much of a linear blend to use.


Originally posted by Ysaneya:
with a complex and long (so slow) fragment shader. I'm not ready to trade memory for horrible performance.

No matter how long that shader is, it'd be faster than doing a memory transfer, running the compression on the CPU, and then re-uploading it again.

pocketmoon
11-16-2005, 10:24 AM
Well, we're into the realms of 'just for fun' now :)

Looks like a start was made here :

gpgpu (http://www.gpgpu.org/forums/viewtopic.php?t=1239&highlight=dxt1+dxt2+dxt3+dxt4+dxt5+s3tc)

knackered
11-16-2005, 12:29 PM
Originally posted by pocketmoon:
Is it possible to render a 'self-compressed' texture to a buffer (via a shader) and then have OpenGL bind to that buffer as DXT?

use a fragment shader to do the decompression yourself. ;)

Mars_999
11-16-2005, 12:59 PM
Ysaneya, I would like to, but my time is limited right now... Too much coding. :) :p

Ysaneya
11-16-2005, 01:28 PM
I think it's the same for everybody :)

SeskaPeel
11-18-2005, 04:20 AM
Ysaneya :
after looking at your pictures, I'm wondering how you did the sky background. I used the famous light scattering effect (published on www.ati.com/developer (http://www.ati.com/developer)) and got strange colors, though still kind of realistic: http://dev.succubus.fr/ .

Did you use a texture, or did you use the same technique as me, tweaked to get better results? If so, what were your tweaks?

SeskaPeel.

Ysaneya
11-18-2005, 06:21 AM
I'm using a tweaked version of sky scattering that supports a viewpoint at any altitude (even in space). It uses equations similar to the ones in the ATI paper, but it also adds a variable atmosphere density function (which requires a ray/sphere intersection test in the vertex shader) and a non-static sun color.
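The intersection itself is just the usual quadratic. In C it would look something like this (sketch; the sphere is the atmosphere shell centered on the planet's origin, and the ray direction is assumed normalized):

#include <math.h>

/* returns 0 on a miss, else writes the near/far hit distances along rd */
int ray_sphere(const float ro[3], const float rd[3], float radius,
               float *t_near, float *t_far)
{
    float b = ro[0]*rd[0] + ro[1]*rd[1] + ro[2]*rd[2];
    float c = ro[0]*ro[0] + ro[1]*ro[1] + ro[2]*ro[2] - radius * radius;
    float disc = b * b - c;
    if (disc < 0.0f)
        return 0;
    disc = sqrtf(disc);
    *t_near = -b - disc;
    *t_far  = -b + disc;
    return 1;
}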

But i must say, i can't really see your "strange colors". What you show in these screens looks pretty good to me.

SeskaPeel
11-20-2005, 07:36 AM
Yes, I'm using the same technique, from a Swedish guy IIRC, who improved Preetham's version.

Anyway ... my orange-ish colors look more Martian. Looking at yours, I have the feeling you get brighter colors, and that was precisely what I was missing (my blue was too dark, and I had to tweak it manually).

Maybe I made a mistake in the implementation of the shaders ... but the colors I get (even more so for the sun itself) seem messed up.

andras
12-13-2005, 11:48 AM
Originally posted by Ysaneya:
I replaced your code to render to the color buffer, and do a glCopyTexSubImage2D.

Umm, sorry if I'm asking the obvious here: do you mean you copy part of an uncompressed texture into part of a compressed texture? I didn't know you could do that! :)
And you say it's fast on nVidia? Holy crab! :)

Ysaneya
12-13-2005, 02:16 PM
No, that's actually copying a part of the color buffer into a texture that was created with a compressed format.

andras
12-13-2005, 03:53 PM
Originally posted by Ysaneya:
No, that's actually copying a part of the color buffer into a texture that was created with a compressed format.

Well, yeah, that's basically the same. You can bind any uncompressed texture to an FBO, and then use glCopyTexSubImage2D. Man, I'll have to try this :)
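i.e. something like this (sketch, assuming EXT_framebuffer_object; fbo/colorTex/compressedTex are made-up names):

/* render into the FBO's colour attachment... */
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                          GL_TEXTURE_2D, colorTex, 0);
/* ... draw ... */

/* ...then copy straight into the compressed texture */
glBindTexture(GL_TEXTURE_2D, compressedTex);
glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, 512, 512);
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);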

andras
12-14-2005, 06:09 AM
I have one question though: did you guys figure out if there's any performance penalty for updating arbitrarily sized subrectangles of a compressed texture? I would guess that, depending on the compression algorithm, it would be more efficient to update chunks that are aligned to the 4x4 blocks, or something...
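Something like this (untested guess) is what I have in mind for the aligned case:

/* expand the dirty rect to the enclosing 4x4 block grid, so the driver
   never has to read-modify-write partial DXT blocks */
void update_aligned(int x, int y, int w, int h)
{
    int x0 = x & ~3;
    int y0 = y & ~3;
    int w0 = ((x + w + 3) & ~3) - x0;
    int h0 = ((y + h + 3) & ~3) - y0;
    glCopyTexSubImage2D(GL_TEXTURE_2D, 0, x0, y0, x0, y0, w0, h0);
}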

Fastian
12-15-2005, 02:35 AM
Originally posted by Mars_999:
Hi Ysaneya,

Could you try this with an FBO to see if that would help your performance, since pbuffers are nasty compared to FBOs?

FBOs won't help with performance. They help in cleaning up code :) . I for one haven't seen much performance difference between FBOs and pbuffers. But FBOs are way better to work with.

tfpsly
12-15-2005, 04:19 AM
What exactly are you trying to do, Ysaneya ? We're still waiting for your GeForce3 soft shadows demo after 3 years ;)

Ysaneya
12-15-2005, 06:11 AM
Well, it was released two and a half years ago. It was published in ShaderX2: Shader Programming Tips and Tricks with DirectX 9.0. But i had to implement it in DirectX (argh!) because of the book title.. oh well.

Now i've moved on to a planetary engine. You can see more on my website:

http://www.fl-tw.com/Infinity

I'm doing per-pixel texture splatting/lighting, spherical geo-mipmapping and seamless space-to-ground landing.

andras
12-15-2005, 12:54 PM
Originally posted by Ysaneya:
I'm doing per-pixel texture splatting/lighting, spherical geo-mipmapping and seamless space-to-ground landing.

Looks awesome!! Can we get a demo pls? :o)