Dynamic AND compressed textures

I haven't tried it yet, but I'd like to gather some opinions on this:

I'm generating a set of textures on the GPU. These are generated once and then reused for thousands of frames (so they're not dynamically updated every frame). Yesterday I discovered that I was using more than 500 MB of textures… so I'd better compress them.

Now I'm wondering: we all know that decompression is hardware-accelerated… but is compression accelerated too?

Typically, I'll render my texture into the color buffer or into a pbuffer, and use glCopyTexSubImage2D to copy it into a texture created with a compressed internal format.

What I'd like to know is: will I lose performance by doing that? Will the driver have to read back the color buffer, compress the texture on the CPU, and upload the result again…?
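Roughly what I have in mind (a minimal sketch rather than my actual code; the 512x512 size and the DXT1 format are just placeholders):

// create the destination texture once, with a compressed internal format (no data yet)
GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGB_S3TC_DXT1_EXT, 512, 512, 0, GL_RGB, GL_UNSIGNED_BYTE, NULL);

// ... render the procedural texture into the color buffer / pbuffer ...

// copy the framebuffer contents into the compressed texture
glBindTexture(GL_TEXTURE_2D, tex);
glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, 512, 512);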

Y.

Hi Ysaneya,

the driver will most likely do a readback followed by compression (in the driver) and an upload. You could do a test to find out; please let us know about your results.

Michael

But I think if you really do this only once every 1000 frames or so, this won’t have a strong impact on performance…

Not that simple :)

Once generated, a texture isn't modified for a thousand frames, true. But there are a lot of textures in the system (many hundreds), which means the probability of generating a texture in any given frame is not that low.

If the driver has to read back the data, it will:

  1. Kill the parallelism between the CPU/GPU for that frame
  2. Require more bandwidth for that frame (GPU->CPU transfer, then CPU->GPU)
  3. Require a lot of CPU power for that frame (textures are generally 512^2).

Which means I would not be surprised if generating a single texture took between 50 and 100 milliseconds. That's not acceptable if I want to keep a smooth and constant framerate.

When the hardware can't compress it for you there's nothing you can do about the bandwidth. But perhaps you can improve parallelism by doing the downloads/uploads manually with PBOs.

That’s something you’ll have to benchmark.
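For what it's worth, a rough sketch of what the PBO route could look like (assuming ARB_pixel_buffer_object is available; readbackPBO and the 512x512 RGB size are made up):

// kick off a readback into a pixel-pack buffer, so glReadPixels can return without stalling the CPU
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, readbackPBO);
glBufferDataARB(GL_PIXEL_PACK_BUFFER_ARB, 512 * 512 * 3, NULL, GL_STREAM_READ_ARB);
glReadPixels(0, 0, 512, 512, GL_RGB, GL_UNSIGNED_BYTE, 0);

// a frame or two later: map the buffer, compress / re-upload, unmap
void* pixels = glMapBufferARB(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY_ARB);
if (pixels)
{
    // ... compress on the CPU, or hand the raw data to glTexSubImage2D ...
    glUnmapBufferARB(GL_PIXEL_PACK_BUFFER_ARB);
}
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);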

Ysaneya,

I forgot the simplest way: you could reduce the resolution instead of compressing. E.g. reducing the resolution by a factor of 2 needs only a quarter of the memory. And the resampling can be done efficiently by the GPU (e.g. just reduce the viewport you render into). If you do it right you won't see a great difference, but it depends on your application, of course. Please let us know your results.
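In other words, something like this (a trivial sketch; halving 512 to 256 is just an example):

// render the procedural texture at half resolution...
glViewport(0, 0, 256, 256);
// ... draw the procedural texture ...

// ...and copy the smaller result into a 256^2 texture: a quarter of the memory
glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, 256, 256);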

Michael

Not what you want but related…

I did some tests in 2001 to update small sections of a large compressed bitmap using glGetCompressedTexImageARB and glCompressedTexSubImage2DARB.

I gave up because of buggy drivers.

I just reran it and it works!! Hurray!

Results of the first tests…

It works, but it’s not very fast, which seems to confirm the read-back-then-compress-on-CPU scenario.

The framerate goes from 75 fps to 13 fps when I compress one 256^2 texture per frame, and to 3 fps when I use a 512^2 one.

If no texture is updated at all, the framerate is around 85 fps. Without texture compression and with 512^2 textures, I get slowdowns due to the half-gigabyte of textures being paged to/from video memory. With compressed textures, the slowdowns disappear and I get a solid, constant 85 fps, which means decompression is working well (too bad compression is so slow; it's pretty much unusable for me then).

Y.

That's some slowdown. What's your video card?

I’ve tweaked my test to maintain a local 256x256x3 byte array which I use to create two compressed DXT1 textures, e.g.

glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGB_S3TC_DXT1_EXT,...

Each frame I update a chunk of this, re-create compressed texture 1, read back the raw compressed data, and update compressed texture 2.

The reason for going round the houses is that originally I was testing the update of small chunks, i.e. compressed texture 1 was just a 4x4 patch which I then texsubimaged into the large compressed texture.

I get >450fps updating the whole 256x256 compressed texture with a 6800GT…

[edit]
Hmm. That's with a solid block of colour, which is quick to encode. If I put more varied data in the re-compress step I get about 200 fps.
[/edit]

 	

glBindTexture(GL_TEXTURE_2D, texarray[0]);

// let the driver compress the raw RGB data into DXT1
glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGB_S3TC_DXT1_EXT, d, d, 0, GL_RGB, GL_UNSIGNED_BYTE, tdata);

// read it back in compressed format (allocate the buffer once)
if (compsize == 0)
{
    glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_COMPRESSED_IMAGE_SIZE_ARB, &compsize);
    printf("Comp size %d\n", compsize);
    compdata = (unsigned char*)malloc(compsize * sizeof(unsigned char));
}

glGetCompressedTexImageARB(GL_TEXTURE_2D, 0, compdata);

glBindTexture(GL_TEXTURE_2D, texarray[1]);

glCompressedTexSubImage2DARB(GL_TEXTURE_2D, 0, 0, 0, 256, 256, GL_COMPRESSED_RGB_S3TC_DXT1_EXT, compsize, compdata);

I'm a bit confused… what are the two compressed textures used for?

How do you perform the update of the 256x256 compressed texture? Via glTexSubImage2D?

Can you post the whole code? Or maybe a link to the EXE?

I’m using an ATI X850 XT on PCI-X16.

Thanks…

I'm confused too :) But…

The original idea was to create a high-detail procedural texture on the CPU using layers of Perlin noise, to use as a vast terrain texture.

The noise calcs are expensive, so I thought about gradually increasing the detail in the texture over a number of frames, e.g. adding an octave of noise each frame.

Also, I wanted the texture to be compressed to save space, and I wanted to test updating small chunks of the texture based on distance from the viewer.

So, I create my texture data on the CPU to be updated over time with noise.

I create the main compressed terrain texture in video memory.

To replace a 32x32 chunk of this main texture, I first generate/update my 32x32 patch on the CPU 'uncompressed', then write it to a 32x32 compressed texture on the card.

I then read this back as 'raw' compressed data using glGetCompressedTexImageARB, and then copy it into a subsection of the compressed texture using glCompressedTexSubImage2DARB.

I'm basically using OpenGL to do the compression for me on the 32x32 chunk by going from local memory to texture. I then read this 'staging' texture back and update the second, main texture. I should try this with a copy-to-texture instead, but from your description you'd be copying back anyway.
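In code, the chunk path is roughly this (a sketch in the spirit of the snippet above; stagingTex, terrainTex, patchPixels, patchBlocks and the offsets are made-up names):

// 1. let the driver compress the 32x32 patch by uploading it into a small staging texture
glBindTexture(GL_TEXTURE_2D, stagingTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGB_S3TC_DXT1_EXT, 32, 32, 0, GL_RGB, GL_UNSIGNED_BYTE, patchPixels);

// 2. read the result back as raw DXT1 blocks
GLint patchSize = 0;
glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_COMPRESSED_IMAGE_SIZE_ARB, &patchSize);
glGetCompressedTexImageARB(GL_TEXTURE_2D, 0, patchBlocks);

// 3. splice the blocks into the main terrain texture at (xoff, yoff);
//    offsets and sizes must be multiples of the 4x4 DXT block size
glBindTexture(GL_TEXTURE_2D, terrainTex);
glCompressedTexSubImage2DARB(GL_TEXTURE_2D, 0, xoff, yoff, 32, 32, GL_COMPRESSED_RGB_S3TC_DXT1_EXT, patchSize, patchBlocks);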

I’ll post the exe and source this evening.

Rob

Surprise surprise, I'm also doing terrain rendering. Everything works well except for the high amount of texture memory usage, which I'd like to optimize using compressed textures.

So, I'm generating each texture on the GPU in the color buffer (so no compression there), and I just want to copy the result to a compressed texture… without having to transfer the results back through the CPU.

I'm not sure I understand how it's possible that you got 200 fps compressing a 256x256 texture… since that includes the bandwidth and the CPU costs. Maybe that's hardware-accelerated on NVIDIA cards, but not on ATI cards?

I’ll try your EXE both on an NVidia and an ATI card to see the difference.

exe here

“old code” excuse alert :)

#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <time.h>
#include <math.h>

#ifdef WIN32
#include <windows.h>
#endif

#include <GL/gl.h>
#include <GL/glext.h>
#include <GL/glut.h>

#define PI 3.141592654
#define DDD (-40.0)

double dist;
int ctest;

int g_update;

GLuint texarray[10];

unsigned char tdata[256][256][3];
unsigned char bdata[256][256][3];

PFNGLCOMPRESSEDTEXIMAGE2DARBPROC    glCompressedTexImage2DARB    = NULL;
PFNGLGETCOMPRESSEDTEXIMAGEARBPROC   glGetCompressedTexImageARB   = NULL;
PFNGLCOMPRESSEDTEXSUBIMAGE2DARBPROC glCompressedTexSubImage2DARB = NULL;

void myinit (void)
{
    int x, y;

    g_update = 0;

    // fetch the ARB_texture_compression entry points
    glCompressedTexImage2DARB = (PFNGLCOMPRESSEDTEXIMAGE2DARBPROC)
        wglGetProcAddress("glCompressedTexImage2DARB");

    glGetCompressedTexImageARB = (PFNGLGETCOMPRESSEDTEXIMAGEARBPROC)
        wglGetProcAddress("glGetCompressedTexImageARB");

    glCompressedTexSubImage2DARB = (PFNGLCOMPRESSEDTEXSUBIMAGE2DARBPROC)
        wglGetProcAddress("glCompressedTexSubImage2DARB");

    if ( glCompressedTexImage2DARB == NULL

bugger! :rolleyes:

Code here :)

source code

I get around 100 fps, which means something is probably wrong in my code.

Still, I don't understand the logic behind the two textures texarray[0] and texarray[1]. Per frame, you are:

  1. Updating the texture buffer on the CPU.
  2. Creating a compressed texture in texarray[0] (why call glTexImage2D instead of glTexSubImage2D?).
    At that point, the texture is compressed in video memory in texarray[0], right?
  3. Reading back the compressed texture data from video memory into a preallocated system memory buffer.
  4. Uploading this buffer as the data for the texarray[1] texture.
  5. Rendering the quad with texarray[1].

I just fail to see the point of steps 3 and 4. Can't you simply forget the second texture and render with texarray[0]?

Y.

Just thinking about it… an explanation would be that you're not really reading from video memory. Since you re-create the compressed texture every frame, OpenGL probably keeps the compressed buffer in system memory. Which means that in step 3, you're not actually reading from video memory, but simply from system memory. Am I right?

If I'm right, then your scenario doesn't match what I'm doing. You are generating your texture on the CPU, using OpenGL to compress it, and uploading it to the video card. I am generating the texture on the GPU, so I need to retrieve it back to the CPU first before I can compress it and upload it again. This probably explains the huge performance difference I see.

Y.

You're right. It's a duff idea, but hey, it was 2001 and I was playing with DXT :)

Anyway. Let's borrow one of paulsprojects' excellent demos from www.paulsprojects.net
Render To Texture Demo

And then modify Main.cpp thus

main.cpp

Running the demo, hit 9 to force a readback of the pbuffer and an immediate re-compress via glTexSubImage2D into a DXT1 texture (which is left as the bound texture for the final fullscreen quad). 0 takes you back to direct pbuffer rendering.
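The '9' path boils down to something like this (a paraphrased sketch rather than the exact demo code; pixels and dxt1Tex are placeholder names):

// read the pbuffer contents back into system memory...
glReadPixels(0, 0, 256, 256, GL_RGB, GL_UNSIGNED_BYTE, pixels);

// ...and let the driver re-compress them into the DXT1 texture used for the fullscreen quad
glBindTexture(GL_TEXTURE_2D, dxt1Tex);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 256, 256, GL_RGB, GL_UNSIGNED_BYTE, pixels);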

I get an fps drop from 1300 fps to 90 for a 256x256 texture.

512x512 gives a drop from 900 to 23

Here I get 900 fps down to 75 fps with a 256^2 texture, and down to 20 fps with 512^2.

Y.

I modified your code to render to the color buffer and do a glCopyTexSubImage2D instead. The results are interesting:

  • 900 fps down to 18 fps in 256^2,
  • 900 fps down to 5 fps in 512^2.

Which seems to match the results I noticed in my first post… so going through glGetTexImage and glTexSubImage2D is more than 4 times faster!

Y.