
View Full Version : Strange texture performance/behaviour



Ffelagund
01-09-2008, 02:43 AM
Hello,

I'm working on an application that needs to load a lot of texture data: around 40-50 4256x2848 RGB images in a typical worst-case scenario.

Well, all my life I grew up with the idea that GL drivers were smart enough that we didn't need to track video memory usage in our programs. I expected texture memory to be swapped between VRAM and disk/system memory when an application exceeds the available video memory, causing only a slowdown. That behaviour was fine with me until now, because it matches the GL spec (the driver tries to allocate texture objects in VRAM, that is, resident textures, and if it can't, it keeps them in main memory and swaps them to VRAM when needed, even if the process is slow).

With my current application (many 2D textures, tiled into 3D textures to avoid the maximum 2D texture size limitation), strange things start to happen when I load, say, 350 MB of texture data (on an 8500GT with 512 MB). Let me describe the scenario: suppose I load 16 textures and draw them on the canvas as screen-aligned quads arranged in a 4x4 matrix. I expected that exceeding the available video memory would only cause a horrible slowdown, but what actually happens is that textures jump from one position of the matrix to another, and some positions become invalid. It's as if the driver goes crazy and the internal structure that maps texture names to texture images gets completely messed up.

This problem appears when I reach the 'magic limit' of ~350 MB of texture memory; up to that point everything works fine.
I'm sure I could write a repro application and file a bug report with NVIDIA, but first I'd like to hear your opinions and experience on this matter (using huge amounts of texture memory).
I can work around the problem by loading low-resolution, grayscale versions of the images so I never reach the magic limit, but I don't want to put the application into production with this problem.

The other problem I'm having is this one.
If I load many textures without reaching the magic limit, performance drops a lot (to about 0.5 fps). That in itself isn't strange. The strange thing is that if I zoom the scene way out (it's currently a 2D scene; this part of the application is a kind of photo editor) and then restore the zoom to 1, performance goes way up, to about 30 fps. I suspect this could be cache related, but I'm not sure, because I can't figure out the real reason for this behaviour.

There is another scenario with the same problem. Photos have many markers placed over them, around 60-100. The markers are drawn correctly without performance drops (thanks to textured point sprites and vertex shaders :) ), but if I activate the flag that enables the markers' labels, performance drops again until some seconds pass (around 20) or I zoom the scene out and back in. Then the fps becomes usable again.

I checked the drawing loop to see whether some high-cost operation is performed in the first frames of the slowdown, and nothing strange happens. I also tested texture residence (the font texture and the photo textures) and the driver always reports them as resident, so I am really lost with these two problems. Any help will be welcome :)

Thanks,
Jacobo.

michael.bauer
01-09-2008, 03:31 AM
Hi,

I have seen problems with 3D textures on NVIDIA hardware, too. I had problems with a volume rendering application when using very large volume data (several hundred MB) on a Linux machine with the first GeForce 8 compatible drivers. When I exceeded a specific memory limit (800 MB on an 8800GTS), OpenGL output got extremely slow and sometimes the system crashed; in addition, the framebuffer got corrupted when the window was resized (it looked like a memory management error). I submitted a bug report but have not yet verified whether it is really fixed; you should perhaps try the latest drivers and see what happens ...

Jan
01-09-2008, 04:14 AM
There was a thread recently where someone from NVIDIA stated that their hardware does not support 3D textures larger than 512 pixels (in any dimension), but that the driver has a bug and does not throw an error.

Apart from that, having HUGE textures is a problem in general, and with 3D textures that's easy to run into.

Jan.

Ffelagund
01-09-2008, 04:22 AM
I have 4256x2848 2D images mapped as tiled 3D textures. I use MAX_3D_TEXTURE_SIZE to choose the tile size and nearest interpolation to create them. For instance, a typical 4256x2848 2D image becomes a 3D texture with 6 tiles, each tile being 1419x1424. So, are you saying that each tile must not exceed 512? That's not a problem for me, except that I'll waste a bit more memory on a few rows and columns that won't be used, but I don't care about that.
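
In rough form, the slicing is built like this (a simplified sketch, not our exact code; copy_tile() is a hypothetical helper that extracts one sub-rectangle of the source image):

#include <stdlib.h>
#include <GL/gl.h>
#include <GL/glext.h>

/* Hypothetical helper: copies the w x h sub-rectangle at (x,y) of the
 * source image into 'dst', padding at the right and bottom edges. */
extern void copy_tile(const GLubyte *src, int srcW, int srcH,
                      int x, int y, int w, int h, GLubyte *dst);

GLuint build_tiled_3d_texture(const GLubyte *pixels, int imgW, int imgH)
{
    GLint maxSize;
    glGetIntegerv(GL_MAX_3D_TEXTURE_SIZE, &maxSize);   /* 2048 on GeForce 8 */

    int tilesX = (imgW + maxSize - 1) / maxSize;       /* tiles per row     */
    int tilesY = (imgH + maxSize - 1) / maxSize;       /* tiles per column  */
    int tileW  = (imgW + tilesX - 1) / tilesX;         /* e.g. 1419         */
    int tileH  = (imgH + tilesY - 1) / tilesY;         /* e.g. 1424         */
    int depth  = tilesX * tilesY;                      /* e.g. 6 slices     */

    GLubyte *tile = malloc((size_t)tileW * tileH * 3);

    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_3D, tex);
    glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

    /* Allocate the whole 3D texture once, then fill it slice by slice. */
    glTexImage3D(GL_TEXTURE_3D, 0, GL_RGB8, tileW, tileH, depth, 0,
                 GL_RGB, GL_UNSIGNED_BYTE, NULL);

    for (int ty = 0; ty < tilesY; ++ty)
        for (int tx = 0; tx < tilesX; ++tx) {
            copy_tile(pixels, imgW, imgH, tx * tileW, ty * tileH,
                      tileW, tileH, tile);
            glTexSubImage3D(GL_TEXTURE_3D, 0, 0, 0, ty * tilesX + tx,
                            tileW, tileH, 1, GL_RGB, GL_UNSIGNED_BYTE, tile);
        }
    free(tile);
    return tex;
}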

Relic
01-09-2008, 04:37 AM
There was a thread recently where someone from NVIDIA stated that their hardware does not support 3D textures larger than 512 pixels (in any dimension), but that the driver has a bug and does not throw an error.


512 was up to GeForce 7.
GeForce 8 class HW has a max 3D texture size of 2048.

Relic
01-09-2008, 04:38 AM
I have 4256x2848 2D images mapped as tiled 3D textures. I use MAX_3D_TEXTURE_SIZE to choose the tile size and nearest interpolation to create them. For instance, a typical 4256x2848 2D image becomes a 3D texture with 6 tiles, each tile being 1419x1424. So, are you saying that each tile must not exceed 512? That's not a problem for me, except that I'll waste a bit more memory on a few rows and columns that won't be used, but I don't care about that.

Instead of 3D textures you should look into 2D texture arrays for your tiling.
Actually GeForce 8 class HW doesn't need to tile 4256x2848 at all, because the max 2D texture size is 8192 and it supports non-power-of-two textures.
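
For reference, a minimal sketch of the array approach (this assumes EXT_texture_array is available, i.e. GeForce 8 class hardware, and that the tile data has already been prepared):

GLuint build_texture_array(const GLubyte *const *tiles,
                           int tileW, int tileH, int layers)
{
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D_ARRAY_EXT, tex);
    glTexParameteri(GL_TEXTURE_2D_ARRAY_EXT, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D_ARRAY_EXT, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

    /* Same entry point as 3D textures, but the layers are independent:
     * there is no filtering across the depth dimension. */
    glTexImage3D(GL_TEXTURE_2D_ARRAY_EXT, 0, GL_RGB8,
                 tileW, tileH, layers, 0, GL_RGB, GL_UNSIGNED_BYTE, NULL);
    for (int i = 0; i < layers; ++i)
        glTexSubImage3D(GL_TEXTURE_2D_ARRAY_EXT, 0, 0, 0, i,
                        tileW, tileH, 1, GL_RGB, GL_UNSIGNED_BYTE, tiles[i]);
    return tex;
}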

Ffelagund
01-09-2008, 05:14 AM
Yes, I'm aware of texture arrays, but IIRC they are only available on the 8800GTX (correct me if I'm wrong), and we don't want to set the minimum requirements for the application to the current best card on the market. By the way, right now we are working with 4256x2848, but we plan to support _very_large_ images (23,000 x 23,000).

Jackis
01-09-2008, 05:36 AM
About performance drops: look here, this should help a bit:
http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=164629

Ffelagund
01-09-2008, 06:39 AM
Hmm, the registry hack described in that link seems to have solved the problems. I'll run some tests and post again with the results.

Many thanks :)

-NiCo-
01-09-2008, 07:17 AM
I have 4256x2848 2D images mapped as tiled 3D textures. I use MAX_3D_TEXTURE_SIZE to choose the tile size and nearest interpolation to create them. For instance, a typical 4256x2848 2D image becomes a 3D texture with 6 tiles, each tile being 1419x1424. So, are you saying that each tile must not exceed 512? That's not a problem for me, except that I'll waste a bit more memory on a few rows and columns that won't be used, but I don't care about that.

Actually, I believe you're already wasting memory. If 3D textures behave like 2D textures, you're wasting a lot of it, because the texture's memory footprint is padded with zeros up to the next power-of-two size in each dimension. You should use texture rectangles if you don't want to waste memory, but note that they don't support mipmapping.
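
For example, a texture rectangle for one photo could be created like this (sketch only; the hardware's rectangle size limit must allow 4256, and remember the texel coordinates are unnormalized, 0..width and 0..height):

GLuint build_rect_texture(const GLubyte *pixels, int w, int h)
{
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_RECTANGLE_ARB, tex);
    glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    /* No mipmap chain is allowed for rectangle textures. */
    glTexImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, GL_RGB8, w, h, 0,
                 GL_RGB, GL_UNSIGNED_BYTE, pixels);
    return tex;
}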

PS. Please correct me if I'm wrong, anyone :)

N.

Ffelagund
01-09-2008, 07:36 AM
I don't know the internals of the NPOT extension implementation, but I really hope that if I create a 513x513 texture, the driver won't create a 1024x1024 texture and fill the holes with zeroes. If it does, it could be a problem for our application, because it is very texture-memory intensive.

Our algorithm uses a best fit for the tiles, so only a few rows and columns are wasted (fewer than ~9, from the application programmer's point of view). So a comment from an NV person on how to waste the least amount of texture memory would be really appreciated :) (currently we only support NV cards for this project)

-NiCo-
01-09-2008, 07:44 AM
Well, here (http://developer.download.nvidia.com/GPU_Programming_Guide/GPU_Programming_Guide.pdf)'s one source of information that seems to confirm my suspicions. (Chapter 7.1.1-7.1.3)

N.

Ffelagund
01-09-2008, 08:07 AM
Thanks Nico :) I'll consider switching to texture rectangles. This could explain the "magic limit" I was talking about :)

Ffelagund
01-09-2008, 08:29 AM
By the way, the registry hack worked in all scenarios. The application now runs faster than ever :)

Many thanks,
Jacobo.

Seth Hoffert
01-09-2008, 09:08 AM
I hope this padding thing doesn't hold true for the 8800 line of cards. :(

EDIT: I could've sworn that one of the specs gave the formulas used for computing the mipmap levels of NPOT textures... why would the driver have to pad for NPOT with mipmaps, but not for NPOT without mipmaps? I don't understand.

EDIT2: I see now that they were talking about the texture rectangle thing... this is a real shame. I thought for sure it was possible to use the NPOT extension with mipmapping, and not have to suffer this padding nonsense... oh well.

EDIT3: Wait, is this padding only for getting it to the card? How is it that I'm able to still use normalized texture coordinates but with a potentially padded NPOT texture (with mipmaps) without accessing the padded portion? Seems odd.

I'm going to be performing some tests later to see if this is the case with the 8800. Perhaps timing the transfer of a 1025x1025 NPOT vs a 2048x2048 POT would be a good test.

jwatte
01-09-2008, 04:03 PM
Because your working set is larger than what the target hardware can hold, you need to do application-layer optimizations.

For example, most displays are 2000x1500 pixels or smaller, so you never need all the pixels of a 4000x2000 texture. If you know what geometry you're drawing, you can upload only the textures you know will be visible, and you can upload smaller versions, or sub-regions, of textures where you know either that they will be filtered or that only a region of the texture will be visible.

High-performance custom visualization software does these things. The driver writers claim that we "shouldn't have to," but clearly, we do. In fact, when you have more textures than can fit in main RAM, you HAVE to go this route, no matter what. Might as well start now.
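
As an illustration only, uploading just the visible sub-region of a big source image can be done with the unpack pixel-store state; a rough sketch (the region variables would come from your own view logic):

void upload_visible_region(GLuint tex, const GLubyte *pixels, int imgW,
                           int regionX, int regionY, int regionW, int regionH)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glPixelStorei(GL_UNPACK_ROW_LENGTH,  imgW);     /* stride of full image */
    glPixelStorei(GL_UNPACK_SKIP_PIXELS, regionX);  /* left edge of region  */
    glPixelStorei(GL_UNPACK_SKIP_ROWS,   regionY);  /* top edge of region   */
    glTexSubImage2D(GL_TEXTURE_2D, 0, regionX, regionY, regionW, regionH,
                    GL_RGB, GL_UNSIGNED_BYTE, pixels);
    glPixelStorei(GL_UNPACK_ROW_LENGTH,  0);        /* restore defaults */
    glPixelStorei(GL_UNPACK_SKIP_PIXELS, 0);
    glPixelStorei(GL_UNPACK_SKIP_ROWS,   0);
}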

Korval
01-09-2008, 05:17 PM
For example, most displays are 2000x1500 pixels or smaller, so you never need all the pixels of a 4000x2000 texture.

That depends entirely on how you use it. You may not need all of those pixels at any one time, but you do need them to be there.

jwatte
01-10-2008, 12:51 AM
You may not need all of those pixels at any one time, but you do need them to be there.


Yes -- my statement lacked the qualifier "at one time." In reality, you'll probably pre-fetch enough of the data for whatever the user is navigating (with some prediction) and make sure that's available in high-res. If the user can hyper-space-jump around the data, then drawing pink while you're demand-paging the data is likely a good idea, too. Or perhaps drawing a lower-resolution version of the texture, as long as you're in a situation where it doesn't matter whether the user temporarily sees less than the full-resolution image (e.g., not in medical imaging).

With 23,000x23,000 pictures, no card will do that in RAM anyway, so you might as well start writing your paging and partial mipmapping functions now.

Ffelagund
01-10-2008, 01:33 AM
With our current working images (4000x2000) we need all the pixels available at all times. We can't give the user low-resolution images to work with, because that would introduce big inaccuracies into the application's process. The workflow demands that all texture pixels be presented to the user with nearest interpolation, letting him mark pixels with subpixel accuracy.

So the user goes from a low-resolution view of the images (with all of them laid out in a rectangular arrangement, for instance, over the canvas) to a highly detailed view in a few seconds (zooming into an image to be able to select a point inside a single pixel with at least 0.001 subpixel precision).

The only route we could take is limiting the user's working set of pictures by layers: setting a limit on the number of images per layer and keeping loaded only the pictures that belong to the visible layer.

It's obvious that 23,000x23,000 pictures are too large for any current VRAM, but we haven't reached that point yet. We are designing the algorithms and techniques with an eye on this future scenario, but when we truly support the very high-res pictures (aerial pictures, actually) we won't be able to live with the current 2D texture limitations and will have to "tile the tiles" :)

P.S.: This is a photogrammetric application with metric-quality measurements, which is the reason for such large images and for the importance of subpixel precision and the fast interactivity the user needs.

-NiCo-
01-10-2008, 02:44 AM
Maybe there's also some lossless texture compression (http://www.google.be/search?hl=en&q=%22Support+for+advanced+lossless+compression+algorithms+for+color%2C+texture%22&btnG=Google+Search&meta=) you can use to save some memory. Haven't tried it myself but apparently it's supported.

N.

Ffelagund
01-10-2008, 02:57 AM
I performed a quick measurement to get at the truth about the padding issue, and here are my results:

3D Card: NV8500GT
Driver version: 169.21

I uploaded a texture with no mipmaps 10 times and measured how many seconds it took. I tried several texture sizes: 4095, 4096, 4097 and 8000.
What I wanted to know was whether the driver really pads NPOT textures with zeroes.

For a 4095x4095x3 bytes texture, I got a mean time (per texture upload) of 0.126444 seconds

For a 4096x4096x3 bytes texture, I got a mean time (per texture upload) of 0.12778 seconds (almost the same as the 4095 version)

For a 4097x4097x3 bytes texture, I got a mean time (per texture upload) of 0.128251 seconds. It is in the same range as the 4095-4096 versions, so if it were being padded with zeroes the time should be much bigger, because it would effectively be uploading an 8192x8192 texture.

For an 8000x8000x3 bytes texture, I got a mean time (per texture upload) of 0.489454 seconds, four times more than the 4097x4097 one. So my take is that (at least without mipmaps) NPOT textures are not padded with zeroes up to the next power-of-two size, at least not during the upload itself, but we can't know how much VRAM the texture actually uses.

After these results, my conclusion is that we don't know anything new :/ So I'll try some tests using the DX GetAvailableTextureMem call before and after uploading the textures, to see whether I can reach any conclusion.
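
For anyone who wants to repeat the measurement, a minimal version of such a timing loop could look like this (now_seconds() is a hypothetical wall-clock helper; glFinish() makes sure the uploads have completed before the timer is stopped):

extern double now_seconds(void);   /* hypothetical wall-clock helper */

double mean_upload_time(int w, int h, const GLubyte *pixels, int iterations)
{
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

    double t0 = now_seconds();
    for (int i = 0; i < iterations; ++i)
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB8, w, h, 0,
                     GL_RGB, GL_UNSIGNED_BYTE, pixels);   /* no mipmaps */
    glFinish();                     /* wait until the GPU has finished */
    double elapsed = now_seconds() - t0;

    glDeleteTextures(1, &tex);
    return elapsed / iterations;    /* mean seconds per upload */
}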

-NiCo-
01-10-2008, 03:20 AM
I think it's hard to track down what the driver is actually doing. It's possible that it is converting the array internally e.g. the RGBA texture format is internally stored as BGRA, so the driver will have to convert an RGBA array to a BGRA array before upload. Furthermore, it seems reasonable to assume that, if padding were true, there is no sense in uploading all 8192 rows for a 4097-row texture since the remaining rows are empty. Looks to me like it's hard to tell how much time is spent in the driver compared to actually uploading the texture.

Maybe it's better to check the upload rate of a texture rectangle of 4097x4096 with BGRA data against its 2D counterpart.

There's no reason at all to assume that anything I'm saying here is true. It's just that I have no idea why else they would mention texture padding in chapter 7.1.2 of the GPU Programming Guide.

Cheers,
N.

Relic
01-10-2008, 07:21 AM
With our current working images (4000x2000) we need all the pixels available at all times. We can't give the user low-resolution images to work with, because that would introduce big inaccuracies into the application's process. The workflow demands that all texture pixels be presented to the user with nearest interpolation, letting him mark pixels with subpixel accuracy.

The point was that not all of the original source pixels need to be resident at all times, just enough to fill the pixels on screen.
Check out the video on "Seadragon", which mentions at the beginning of the presentation that performance is NOT limited by the amount of data they view but only by the on-screen pixels!

http://www.youtube.com/watch?v=PKwTurQgiak

Cool stuff!
(The "PhotoSynth" technology seems to be even more interesting since it is 3D-ish.)

Ffelagund
01-10-2008, 02:14 PM
Well, our software is in some ways related to Photosynth, but oriented towards architecture and industry. Performance is good now (after the registry hack), and we've reached the number of high-res images the user could need to perform his tasks, so if I can convince my bosses, I'll try the "performance is NOT limited by the data they view but by the onscreen pixels only" technique. That video really impressed me, and I think I'm capable of doing something similar, so if my bosses consider it worth the effort, that way of handling texels will be in the application, I promise :)

skynet
01-10-2008, 03:48 PM
I'd be happy if someone would invent something like Seadragon for 3d-geometry :-)

Ffelagund
01-10-2008, 04:10 PM
Actually, getting a low-density point cloud from a set of photos of an object is not difficult (that's what Photosynth does), and it's not even computationally expensive. The problem is trying to get a very dense point cloud (a point for each picture pixel) and then triangulating it, because traditional computer vision algorithms are very error-prone in a non-user-assisted environment. But it is really possible with some relatively new techniques, reducing user intervention to a minimum.

jwatte
01-10-2008, 09:50 PM
so the driver will have to convert an RGBA array to a BGRA array before upload

I believe most modern hardware can do that kind of conversion in the DMA transfer engine, so the driver doesn't need to use the CPU for it.

Also, when I suggested using a low-res image, I meant only until the high-res data has been loaded into RAM; then re-paint with the high-res data. The main point is to provide a smooth, high frame rate for interaction, and then to make sure you only put detail where it matters.

Ffelagund
01-11-2008, 02:53 AM
I finally got the results I expected :) Thanks to the registry hack, the frame rate in the application is fast and stable, and thanks to -NiCo-, who pointed me to that NV PDF, memory consumption is now optimal. I can load the expected number of photos right up to the true video memory limit: by setting the 3D slice dimensions to a tuned power-of-two size (128-256) the memory waste is minimal, so I can confirm that NPOT textures are padded in memory.
Before setting the dimensions to a POT size, I was only able to fill memory with about 350 MB of texture data. Now, with POT-sized slices, I can fill much more, about 500 MB on a 512 MB card, so memory is used much better, and I can see a noticeable speed increase (maybe using 'small' slices benefits the cache?)
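
As a back-of-the-envelope check, assuming the driver really does round each NPOT dimension up to the next power of two (as the GPU Programming Guide suggests), the waste for the old 1419x1424x6 slicing would look roughly like this:

/* Sketch only: padded footprint if each NPOT dimension is rounded up. */
static int next_pot(int x) { int p = 1; while (p < x) p <<= 1; return p; }

static size_t padded_bytes(int w, int h, int slices, int bpp)
{
    return (size_t)next_pot(w) * next_pot(h) * slices * bpp;
}

/* Old slicing:  padded_bytes(1419, 1424, 6, 3) -> 2048*2048*6*3 ~= 72 MB,
 * versus ~35 MB of actual pixels (4256*2848*3), i.e. about 2x waste.
 * 128 or 256 slices are already powers of two, so next_pot() changes
 * nothing and the only waste is the partly filled border tiles.       */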

Thanks to all for your ideas :)

Ffelagund
01-12-2008, 03:30 AM
By the way, only for information, current Seadragon version is written with OpenGL :)

Timothy Farrar
01-12-2008, 11:52 AM
By the way, only for information, current Seadragon version is written with OpenGL :)

Yes ironic isn't it!

BTW, I don't see what is so new about Seadragon that they've stirred up such a patent storm. It is basically virtual mipmapping, which has been around forever. Seriously, Google Maps, meta texturing, even that tiny Flash gigapixel image viewer are all good examples of the concept already working, and not just as a prototype.