Rendering large amount of text.



M.Mortier
05-26-2004, 05:16 AM
Hello,

I'm a bit new to OpenGL (well, not really, but I've been having an on-and-off relationship with it for quite some time, as I don't make 3D applications very often). My apologies if this belongs in the Beginner's section.

But, I wanted to render large pieces of text on the screen. This appears to be a bit unnatural to do..
Here are two strategies (in descending order of naivety) I've tried, and their problems.

(The display lists I've used are "compiled" ones, of course..)

1) Init : Store each character in a display list (a textured quad).
Draw : Render text by calling the display lists for each character that is supposed to be drawn right now.
Problem : big slowdown from 5000+ letters on, due to the high pixel fill, I think..(I want to be able to zoom in and out of the text), since all the quads face the camera. Is that the reason why it slows down there?

2) Init : Store blocks of 1000 or so signs in display lists.
Draw : call the lists that effectively need to be drawn.
*No* improvement whatsoever.

3) A strategy I want to try, but I don't know how (I don't think it's that hard using buffers but your opinions on its efficiency would help me a lot in thinking straight).
Render blocks of text using method 2) or 1) to a buffer, and then store that buffer as a texture.
Then use it on big quads.

Is method 3) the most efficient way of doing this? It needs a lot of memory I think, storing all those rather big textures in VRAM..? (if I want to store them in display lists) Supposing I work with 1 MB+ files, I don't even think they'd fit.

Does anybody have any *better* ideas for me to try, or tell me what I'm doing wrong?

I do want to work in 3d and no rasters or anything.

Thank you lots for any help,

M. Mortier

yooyo
05-26-2004, 08:22 AM
Try putting the font in a texture. Then, for each letter, build the texture-mapping coordinates and render it as a quad.
It will be fast enough.

I have a few classes that can build a texture from any font in Windows, take care of kerning and spacing, and render strings. I can send them to you if you want.

yooyo

jwatte
05-26-2004, 09:30 AM
If you render lots of text, then in my opinion (based on doing a lot of this) you should use a native bitmap (GDI, libttf, whatever) and a native font renderer to draw the text, then upload that bitmap and use it as a texture containing all the text.

Typically, you can get away with a grayscale bitmap (8 bits per pixel) and uploading it as GL_ALPHA format, to save on texture memory.

fathom
05-26-2004, 10:49 PM
sounds like he's doing textured quads already.

are you sure your slow-down is fill related? how big are your quads? what card and os? you got a sample frame?

how are you generating your textures? many bitmap font systems tend to just assume a fixed width for fonts and not worry about the empty pixels. probably ends up taking 2 or 3 times as long per quad as needed because of this.

make sure that your quads are only big enough to get the letter on screen. figure that each letter in your texture is predominantly black/transparent. the less of this black/transparent texture that's dealt with, the better.

one thing that could probably help quite a bit would be to create low-res "outlines" that fit your font texture a little better.

take "A" for example. if you were to use a textured triangle that approximated the shape of the A instead of a quad that had bunches of useless space, you'd speed things up. of course, generating the shapes is not trivial.

evanGLizr
05-26-2004, 11:14 PM
Originally posted by M.Mortier:

But, I wanted to render large pieces of text on the screen. This appears to be a bit unnatural to do..

Some ideas:
a. Partially mentioned by jwatte and yourself, try to cache words: render full words to a texture (either with OpenGL or GDI) and reuse that texture.
b. Cull the letters/words that are not visible (are really all those 5000 letters visible on the screen or are you doing some kind of scrolling?).
c. Alpha test the letters (this will save you fill rate if you are fill-rate limited).

Using one display list per letter is overkill in my opinion. If anything, combine it with approach a) (cache words) by creating one display list of the whole word rather than of just a letter. If the text is dynamic, you probably want to use vertex arrays (VBOs, etc.) instead of display lists.

My suspicion is that you are CPU bound, rather than fill rate bound (how big are your letters? how many have you got *really visible* on the screen? are you alpha blending?).

M.Mortier
05-27-2004, 01:40 AM
Well,

Yes, I'm already using textured quads. Perhaps I was a little ambiguous.
I have a texture that contains the 256 characters, and I use offsets into that texture to draw each letter in a quad. I store these letters in display lists.
In method 2) I then group 5000 of those letters (and y-translations after each line too, of course) into another compiled display list. That doesn't improve anything though.

One of you said I should use words instead of letters. Let's assume I do that (I may as well split the text up into lines then), how can I then render one word in one quad, using only that texture? I think I'll need a quad per letter, either way to be able to set all coordinate points of the texture? Or can I effectively spread various parts of the 256char texture over one quad?

I have a PIV 1.8 with 1 GB RAM and a 64 MB GFIII card, so I don't think I'm CPU bound. I just think there are too many quads facing the camera? (I'm also using blending and smooth textures, which slows the whole thing down a bit, just a bit though.) 5000 quads slows everything down, that's for sure.
I don't have a screenshot cause I have to run 1 mile from my home to here to be able to be online. I'll bring it next time I do that.

Well, I think a native bitmap renderer that would render blocks of text to a texture would be a good idea. I'm just a little worried about texture size, but I guess that'll be all right.

The only thing I feel that is missing is some more optimizing. I mean, if I have the sentence "I am an ape", then for each "a" the offset to the 256character texture is the same. Why can't I use that to my advantage and cut texture size for the texture that represents the whole sentence?

M.Mortier
05-27-2004, 01:50 AM
And I am culling btw, I just wanted to be able to draw at least 5000 letters on screen (to zoom out of the text). The thing is, I want to make a dynamic reader kind of thing for large raw text documents, so you can mark certain things, zoom out massively, and then flow back to certain marked spots. It's an exercise in OpenGL more than anything else. I know it's a bit silly to do it in 3D, but still, there seemed to be so much "shared reference" optimization possible, no?

I'll try the bitmap render thing.. I actually had the weird idea in mind at first to put the text in an offscreen tk widget or s'thing and then throw that widget's graphics to a bitmap buffer (taking care of text wrapping etc while we're at it) - but that doesn't work since I need to store *all* the text somewhere, not just the part that is visible in one window.

JustHanging
05-27-2004, 07:06 AM
Hi,

Unless you're drawing letters on top of each other, there's no way you're fill limited. 5000 quads shouldn't choke a GF3 either, so it's probably something else. Please make sure you're not

1. Binding a texture before each letter
2. Using glTranslate to place each letter

What you should do is create vertex and texture coordinate arrays of the entire text (or blocks of it since you're culling) and draw it all using a single glDrawArrays call. You don't have to update the arrays unless the text changes.
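For illustration, here is a minimal sketch of building those two arrays once for a whole block of text. It is not code from this thread. It assumes the 256-character texture described earlier is a 16x16 grid in row-major order with row 0 at the top of the image, that glyphs are fixed width, and that character codes above 255 can simply be masked; the function name and parameters are made up for the example.

```python
from array import array


def build_text_arrays(text, glyph_w=1.0, glyph_h=1.0, cols=16, rows=16):
    """Return (vertices, texcoords, quad_count) for a block of text.

    Assumption: a cols x rows glyph atlas, codes 0..255 in row-major order,
    row 0 at the top of the image (v = 1.0), fixed-width glyphs.
    """
    vertices = array("f")    # x, y for each corner, 4 corners per glyph
    texcoords = array("f")   # u, v for each corner, same order as vertices
    du, dv = 1.0 / cols, 1.0 / rows
    pen_x, pen_y = 0.0, 0.0
    quads = 0
    for ch in text:
        if ch == "\n":
            # A new line moves the pen in the data; no glTranslate is needed.
            pen_x = 0.0
            pen_y -= glyph_h
            continue
        code = ord(ch) & 0xFF
        u0 = (code % cols) * du
        v1 = 1.0 - (code // cols) * dv
        u1, v0 = u0 + du, v1 - dv
        x0, y0 = pen_x, pen_y - glyph_h
        x1, y1 = pen_x + glyph_w, pen_y
        # Counter-clockwise quad: bottom-left, bottom-right, top-right, top-left.
        vertices.extend([x0, y0, x1, y0, x1, y1, x0, y1])
        texcoords.extend([u0, v0, u1, v0, u1, v1, u0, v1])
        pen_x += glyph_w
        quads += 1
    return vertices, texcoords, quads
```

With the font texture bound once, the two arrays would be handed to glVertexPointer and glTexCoordPointer and drawn with a single glDrawArrays(GL_QUADS, 0, 4 * quad_count). The arrays only need rebuilding when the text itself changes.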

Oh, one more thing, you actually could be fill-limited if you're zooming out without using mipmapping. An easy way to see if you're running out of fillrate is to make the window smaller and see if it improves the framerate.

-Ilkka

Madoc
05-27-2004, 08:22 AM
We have an app that draws far more than 5000 characters at once. We use tricks when they are distant and have good culling schemes but they're not even necessary on something like a GF3.

You can't possibly be fill limited, as JustHanging said, unless you're drawing those characters really big and overlaid many times. Excluding that, I wouldn't do more than make sure you're using a single-channel texture (i.e. only alpha) for the sake of efficiency.

From my experience, getting the text into biggish vertex arrays should be more than enough for that many characters, 10k unlit tris is not much for your GF3. Once using vertex arrays, VBO is always a good added bonus. I doubt display lists are a good choice, probably more so if you're filling them in immediate mode.

I haven't tried myself but it could be good to use words for the texture coordinate so you get smaller 16 byte aligned vertices (unless you're using 2D vertices, something we can't do, but smaller may still be better).

Edit:
I thought there might be some confusion about "words", which is intended as 16 bit values, of course. An apt thread for the ambiguity of the term :) .
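To make the size argument concrete, here is a small sketch of one possible 16-byte vertex: three 32-bit floats for position plus two 16-bit integers for the texture coordinates. The layout, the function name and the choice of storing texel indices are assumptions for illustration, not Madoc's actual format.

```python
import struct

# Hypothetical layout: 3 x float32 position + 2 x int16 texcoords = 16 bytes.
# The same vertex with float texcoords ("<5f") would take 20 bytes.
VERTEX_FORMAT = "<3f2h"


def pack_vertex(x, y, z, u, v, atlas_size=256):
    """Pack one vertex; u and v are in [0, 1] and are stored as texel indices."""
    ui = round(u * atlas_size)
    vi = round(v * atlas_size)
    return struct.pack(VERTEX_FORMAT, x, y, z, ui, vi)


if __name__ == "__main__":
    print(struct.calcsize(VERTEX_FORMAT), "bytes per vertex")
```

As far as I know, integer texture coordinates passed with GL_SHORT are not normalized by the fixed-function pipeline, so a layout like this would also need a texture matrix scale of 1/atlas_size to get back to the [0, 1] range.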

jwatte
05-27-2004, 06:02 PM
My suggestion has very little to do with textured quads -- there is only one quad per block of text. This solution (upload the text as a bitmap texture) is pretty much always the fastest, unless you're really, really low on texture memory (which you usually aren't).


Originally posted by evanGLizr:
Alpha test the letters (this will save you fill rate if you are fill-rate limited).

All modern cards implement alpha testing/blending very close to the memory controller, so if you touch ANY of the pixels in a "block" (4x4 pixels on some architectures, I think) then you pay for the entire block. Thus, alpha testing is only a win if you have wide swathes of transparent space that each is at least 7x7 pixels or more of transparency. That typically only happens on really large font sizes, which in turn mean that you can fit nowhere near 5,000 characters on screen...
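One way to read the 7x7 figure, assuming blocks are 4x4 pixels and aligned to a fixed grid: a transparent run has to be 2 * 4 - 1 = 7 pixels long before it is guaranteed to contain one whole block wherever it happens to start. The short check below verifies that in one dimension (the 2D case is the same test applied to each axis); the function name is made up for the example.

```python
def always_covers_aligned_block(run, block=4):
    """True if a run of `run` pixels contains a whole aligned block
    no matter where the run starts relative to the block grid."""
    for start in range(block):
        end = start + run                            # exclusive end of the run
        first_aligned = -(-start // block) * block   # round start up to the grid
        if first_aligned + block > end:
            return False
    return True


def smallest_guaranteed_run(block=4):
    """Smallest run length that always contains a whole aligned block."""
    run = 1
    while not always_covers_aligned_block(run, block):
        run += 1
    return run
```

A 6-pixel run starting one pixel past a block boundary straddles two blocks without covering either, which is why 6 is not enough.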

Korval
05-27-2004, 07:33 PM
Originally posted by jwatte:
This solution (upload the text as a bitmap texture) is pretty much always the fastest, unless you're really, really low on texture memory (which you usually aren't).

Not necessarily. If the text, for whatever reason, needs to be uploaded frequently (a score, a timer [with decimal second precision], etc) you will be better served rendering it directly.

In any case, dropping 5000 letter-sized quads, especially if you can put them in a vertex array, shouldn't be any real problem for any card.

evanGLizr
05-27-2004, 07:52 PM
Originally posted by jwatte:

Originally posted by evanGLizr:
Alpha test the letters (this will save you fill rate if you are fill-rate limited).

All modern cards implement alpha testing/blending very close to the memory controller, so if you touch ANY of the pixels in a "block" (4x4 pixels on some architectures, I think) then you pay for the entire block. Thus, alpha testing is only a win if you have wide swathes of transparent space that each is at least 7x7 pixels or more of transparency. That typically only happens on really large font sizes, which in turn mean that you can fit nowhere near 5,000 characters on screen...

True, but he mentions that he's also zooming into the text, plus I don't think having 4x4 empty aligned texels is so uncommon, especially if you handle your font in a non-proportional way (smallcaps, L, J, P, etc).

Edit: Not that I think that he's fill-rate limited anyway...

Madoc
05-27-2004, 10:57 PM
I have to say Jon Watte's suggestion is very interesting. Where you have a certain amount of static text it's clearly the most efficient way to render and it has the added bonus of a broad choice of fonts and hassle-free proportional fonts. I haven't actually heard this approach suggested before despite its simplicity.

Unfortunately, it wouldn't work easily for our app. Texture memory _is_ a problem and it would be complicated with the size of some of the fonts. Needless to say, our app doesn't display text in a conventional way, we would need many thousands of high-res blocks of text.

def
05-28-2004, 01:12 AM
Have you considered the display lists as the performance killer?
If you are just storing glTexCoord and glVertex calls you shouldn't see any improvement with display lists, but you might be experiencing the display list overhead accumulating with 5000 letters...

M.Mortier
05-28-2004, 04:16 AM
Well, you're right I wasn't fill limited at all..- changing the window size doesn't really do anything to the framerate. I suppose I'm cpu limited then like a lot of you said (although I'm not sure how that works when I'm putting everything in a compiled display list - and I'm not binding a texture before each letter either btw. I thought display lists like those were stored in the video hardware. But I was probably wrong then.)
I'm a bit puzzled by what's happening then.. Are you sure that when you change the window size the effect should be noticeable when you have fillrate problems? (I mean perhaps in some situations the effect isn't linear but logarithmic so it could be barely visible?)

Well thanks for all the advice, I'll try to combine everything and post back when it works - once I figure out how to store vertex/texture arrays in the 3d hardware instead of in the system, or find a free font renderer that works on multiple platforms..

Madoc
05-28-2004, 04:39 AM
M.Mortier, have you actually tried using vertex arrays (DrawArrays or DrawElements)? If not, do this first. If you then want to use video or agp memory for them, use VBO, which is really easy to use.
As def said, display lists themselves might be a problem. Your final display list is very likely doing 5000 glCallList and nothing else. Display lists are fundamentally a means of batching commands, don't expect them to do anything exceedingly clever.
I would guess that even just drawing everything in immediate mode will be faster. You could start by putting everything in one display list directly in immediate mode without the per character DLs.

Forget fill, really. Yes, if you shrink your viewport considerably and there's no (or hardly any) improvement you are _not_ fill limited.

jwatte
05-28-2004, 12:33 PM
Originally posted by Madoc:
we would need many thousands of high-res blocks of text

Do you draw them AT THE SAME TIME though?

You could set aside one big texture, or several smaller textures, for the text that's actually drawn during the current frame. You'd set aside more space for the formatted bitmaps in main RAM, which hopefully is more plentiful than VRAM.

Then, when rendering, you'd use an LRU cache of texture images; when you need to render a specific block of text, you see if it's already in a texture; if so, bind the texture, and put the texture first in the LRU list. If not, you take the texture last in the LRU list, and TexSubImage() your prepared data into it, and stick it first in the list.

Your LRU list should be bigger than the rendering needs of a single frame, for ideal frame rate :-)
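A minimal sketch of the LRU scheme described above, for illustration only. The class name, the `upload` callback and the use of plain integers as texture ids are all hypothetical; in a real renderer the callback would bind the texture and call glTexSubImage2D with the prepared bitmap. Note that this sketch keeps the most recently used entry at the end of the list rather than at the front, which is the same policy with the order flipped.

```python
from collections import OrderedDict


class TextureLRU:
    """Maps text blocks onto a fixed pool of textures, evicting the least recently used."""

    def __init__(self, texture_ids, upload):
        # Unique placeholder keys stand for textures that hold nothing yet.
        # Oldest entries sit at the front of the OrderedDict, newest at the end.
        self._slots = OrderedDict((object(), tex) for tex in texture_ids)
        self._upload = upload  # hypothetical callback: upload(texture_id, block_key)

    def get(self, block_key):
        """Return the texture holding block_key, uploading it first if needed."""
        if block_key in self._slots:
            self._slots.move_to_end(block_key)      # mark as most recently used
            return self._slots[block_key]
        _, tex = self._slots.popitem(last=False)    # evict the least recently used
        self._upload(tex, block_key)
        self._slots[block_key] = tex
        return tex
```

The pool passed to the constructor plays the role of the "LRU list bigger than a single frame needs": if it is smaller than the number of blocks drawn per frame, every frame evicts textures it is about to need again.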

Madoc
05-28-2004, 11:03 PM
I kind of assumed it was obvious that I didn't mean at the same time. The real problem is that we're already dealing with thousands of images that need to be swapped in and out and cleverly LoDed.
The other thing is that we're not just displaying text but a large 3D environment tapestried with it. It is actually possible to see it all at once and it has to be well mipmapped. Of course, we don't actually render the real text when it's small but we need detailed knowledge of the formatting to make a convincing fake.

I think it's an excellent method, but not for this application, it would hit on existing bottlenecks.

fathom
05-29-2004, 12:59 AM
m.mortier, just for grins, draw as wireframe or even disable the actual draw command altogether just to get a gauge of what system is affecting your slow-down. you might find something totally unrelated is messing things up.

if the wireframe is fast, then try disabling texturemapping. if it's fast without texturemapping enabled, then there might be some issue with your texture creation (like somebody mentioned resampling earlier)...

zeckensack
05-30-2004, 12:38 AM
Re texture memory of jwatte's model:
There's an upper bound, actually. You don't really need more texels than you have pixels in your viewport. If the view changes, you can discard whatever won't be visible anymore to make room for the new stuff.

Of course it's more clever to use a bit more space and do some caching. Say you have a 1600x1200 viewport and use ALPHA8 textures and settle for 3x "needed" plus full mipmaps, you end up with 3*1600*1200*1.3 bytes ~=7.5megs of texture memory. Peanuts ...
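The arithmetic above, spelled out as a small sketch (the function and parameter names are made up for the example). The post rounds the mipmap overhead to 1.3; a full mip chain adds one third, so 4/3 is the slightly more exact factor.

```python
def text_cache_bytes(width, height, bytes_per_texel=1, overcommit=3, mip_factor=1.3):
    """Texture memory for caching `overcommit` viewports' worth of text.

    bytes_per_texel=1 corresponds to an 8-bit alpha-only texture.
    """
    return width * height * bytes_per_texel * overcommit * mip_factor


if __name__ == "__main__":
    rounded = text_cache_bytes(1600, 1200)                  # about 7.49 million bytes
    exact = text_cache_bytes(1600, 1200, mip_factor=4 / 3)  # about 7.68 million bytes
    print(f"{rounded / 1e6:.2f} MB with 1.3, {exact / 1e6:.2f} MB with 4/3")
```

Either way the result stays well under 8 million bytes, a small fraction of a 64 MB card.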

Stephen_H
05-31-2004, 12:09 PM
Originally posted by jwatte:
Do you draw them AT THE SAME TIME though?

You could set aside one big texture, or several smaller textures, for the text that's actually drawn during the current frame. You'd set aside more space for the formatted bitmaps in main RAM, which hopefully is more plentiful than VRAM.

Then, when rendering, you'd use an LRU cache of texture images; when you need to render a specific block of text, you see if it's already in a texture; if so, bind the texture, and put the texture first in the LRU list. If not, you take the texture last in the LRU list, and TexSubImage() your prepared data into it, and stick it first in the list.

Your LRU list should be bigger than the rendering needs of a single frame, for ideal frame rate :-)

Just curious, regarding details of implementation...

Do you have any recommendations regarding schemes for packing multiple text strings into one texture? I imagine you could get quite complex with this... you basically have a "2D" texture cache. Packing a bunch of rectangular textures into a larger one optimally is NP-hard, so no polynomial-time algorithm is known for it... IIRC. Or do you prefer to keep each text string in a separate texture?

jwatte
06-01-2004, 12:43 PM
We keep each sub-texture power-of-two, and only pack single lines, so they're all the same height. You then end up with the problem of packing "words" or "lines" into a sequence of available lines, which is significantly easier.

And just because a problem is NP-complete to solve OPTIMALLY doesn't mean it's impossible, just that it's expensive if you want the optimal solution. But we don't shoot for optimal, just good enough.

IIRC, we split the texture into (height/fontheight) strips, each of which is managed as to what parts are drawn into. Then we do a first-fit scan of each strip/row to find one that has a hole big enough for the new piece of text we want to add.
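A rough sketch of the strip scheme described above, for illustration only: the texture is cut into strips one font height tall, and each new piece of text goes into the first hole that is wide enough. The class and method names are hypothetical, and the sketch leaves out the power-of-two sizing of sub-textures mentioned at the top of the post.

```python
class StripPacker:
    """First-fit packing of single-line text bitmaps into fixed-height strips."""

    def __init__(self, tex_width, tex_height, font_height):
        self.tex_width = tex_width
        self.font_height = font_height
        # For every strip: a list of occupied (x, width) spans, sorted by x.
        self.strips = [[] for _ in range(tex_height // font_height)]

    def allocate(self, width):
        """Return (x, y) of the first hole wide enough, or None if nothing fits."""
        if width <= 0 or width > self.tex_width:
            return None
        for row, spans in enumerate(self.strips):
            cursor = 0  # left edge of the hole currently being examined
            for index, (x, w) in enumerate(spans):
                if x - cursor >= width:
                    spans.insert(index, (cursor, width))
                    return cursor, row * self.font_height
                cursor = x + w
            if self.tex_width - cursor >= width:
                spans.append((cursor, width))
                return cursor, row * self.font_height
        return None

    def free(self, x, y):
        """Release the span that starts at x in the strip containing y."""
        spans = self.strips[y // self.font_height]
        spans[:] = [span for span in spans if span[0] != x]
```

This is "good enough" packing in the sense of the post: it never searches for the optimal arrangement, it just takes the first hole that fits, and an allocate() that returns None would be the cue to evict something.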

yooyo
06-01-2004, 02:44 PM
@M.Mortier


Problem : big slowdown from 5000+ letters on, due to the high pixel fill, I think..(I want to be able to zoom in and out of the text), since

Can you tell us your framerate? I just tested my font renderer. Here are some benchmark results:

1280x1024@32bpp
GF Ti 4800Se
CPU: P4 2.8Ghz

20000+ chars, Tahoma - Regular 26pt
86 FPS

20000+ chars, Tahoma - Regular 42pt
62 FPS

6000+ chars, Tahoma - Regular 42pt
152 FPS

All chars are visible!
All chars are rendered using immediate mode.
No display lists, no vertex arrays.

I can't believe that you have such a big slowdown. It may be something in your code.

yooyo

Stephen_H
06-02-2004, 03:57 AM
Originally posted by jwatte:
We keep each sub-texture power-of-two, and only pack single lines, so they're all the same height. You then end up with the problem of packing "words" or "lines" into a sequence of available lines, which is significantly easier.

And just because a problem is NP-complete to solve OPTIMALLY doesn't mean it's impossible, just that it's expensive if you want the optimal solution. But we don't shoot for optimal, just good enough.

IIRC, we split the texture into (height/fontheight) strips, each of which is managed as to what parts are drawn into. Then we do a first-fit scan of each strip/row to find one that has a hole big enough for the new piece of text we want to add.

Thanks! I was curious because I implemented something similar a while back also. I basically did things pretty much the same.

I split up a large texture into rows of fontHeight. Each row could only be divided in half, and then those half-pieces in half... etc, producing a kind of "binary-buddy" style tree. My hope was that by "allocating" chunks that were only powers of 2 long, I could reduce the fragmentation of the texture. For each sub chunk size (1, 2, 4, ... 1024, 2048) I kept a free list. If a chunk wasn't available of that size, I'd start splitting down the larger ones. When a chunk gets freed, I'd check if it can be joined with its neighbor.
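For readers who have not met the binary-buddy idea, here is a minimal sketch of it for a single row, under my own assumptions: the row width is a power of two, requests are rounded up to the next power of two, and there is one free list per chunk size. The names are hypothetical and this is not Stephen_H's code.

```python
class BuddyRow:
    """Binary-buddy allocator for one row of a texture (widths in texels)."""

    def __init__(self, row_width=2048):
        assert row_width & (row_width - 1) == 0, "row width must be a power of two"
        self.row_width = row_width
        self.free_lists = {row_width: {0}}  # chunk size -> set of free offsets
        self.allocated = {}                 # offset -> chunk size

    def allocate(self, width):
        """Return the x offset of a chunk at least `width` wide, or None."""
        if width <= 0 or width > self.row_width:
            return None
        size = 1
        while size < width:                 # round up to a power of two
            size *= 2
        candidate = size
        while candidate <= self.row_width and not self.free_lists.get(candidate):
            candidate *= 2                  # nothing free at this size, look at larger chunks
        if candidate > self.row_width:
            return None
        offset = min(self.free_lists[candidate])
        self.free_lists[candidate].remove(offset)
        while candidate > size:             # split; the upper half goes back on a free list
            candidate //= 2
            self.free_lists.setdefault(candidate, set()).add(offset + candidate)
        self.allocated[offset] = size
        return offset

    def free(self, offset):
        """Release a chunk and merge it with its buddy while the buddy is free."""
        size = self.allocated.pop(offset)
        while size < self.row_width:
            buddy = offset ^ size           # the neighbor this chunk was split from
            if buddy not in self.free_lists.get(size, set()):
                break
            self.free_lists[size].remove(buddy)
            offset = min(offset, buddy)
            size *= 2
        self.free_lists.setdefault(size, set()).add(offset)
```

The trade-off is the usual one for buddy allocators: rounding every string up to a power of two wastes some space inside each chunk, in exchange for cheap splitting and merging and less fragmentation between chunks.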