-
Direct access to texture memory & drawing buffers
I think it would be very useful to have an OpenGL command that returns a pointer to the address of the front and back buffers, as well as of specific textures.
This way you could alter the contents of those memory areas with DirectDraw on PC or QuickDraw on Mac without having to move memory blocks.
In the end, all of this is just video memory.
-
Re: Direct access to texture memory & drawing buffers
This is a bad idea, as it would lock hardware vendors into a certain memory mapping. Lost freedom => fewer optimization possibilities.
Direct memory access is going to be removed from DX too. From DX8 and on it won't be supported anymore.
-
Re: Direct access to texture memory & drawing buffers
I beg to disagree. For instance, let's
assume we want to texturemap HDTV video.
With OpenGL, the standard technique is
to use glTexSubImage2D to introduce the
HDTV image into the texture. Unfortunately,
the overhead of glTexSubImage2D can be
very large and strongly reduces
performance.
I think that we definitely need a "backdoor"
into the graphics memory as it can
drastically improve performance with
streaming textures or very large textures.
We're talking about an order of magnitude
speed difference between the standard
OpenGL path and direct texture access!
-
Re: Direct access to texture memory & drawing buffers
I'm with Humus on this one. Front buffer, MAYBE, though I suspect it'll raise merry hell with multi-frame pipelining. Backbuffer, no way. There was an interesting Carmack post on /. a while back, making a good argument for using blits for page swaps, and one of the reasons was that a linear address space was NOT necessarily the best format for a backbuffer.
I agree that a faster way of getting data in from other video sources would be useful, but this should be an API call hiding the implementation, not exposing a raw pointer. I'd be very surprised if the Khronos OpenML initiative didn't include something like this as an OpenGL ext.
-
Re: Direct access to texture memory & drawing buffers
I think you guys are right that this would open up an endless number of new things to take care of (memory mapping schemes, API implementations, etc.), but all I'm asking for is a way to know where in video memory my data is. I'm not asking for any particular ordering or structure of video memory, because I think that is the easiest part to deal with (pointer math).
This may be a feature that not everyone needs, but on the other hand it's very easy to implement and is not going to affect any other OpenGL functionality.
Just a pointer to the data, that's all I need!
In the same way that we search the internet for sample code and docs about advanced rendering techniques, we could search for information about the specific way a video card stores our data (if we need to).
Sure, this is not good for games or for other applications designed to run on a million different hardware configurations, but I'm sure there are a lot of people out there using OpenGL for very specific projects other than games, on very specific hardware configurations other than the standard consumer PC.
If the development of an API is always going to be tied to the needs of the average user, then progress is going to be very slow.
As KZ says, a "backdoor" is definitely needed.
-
Re: Direct access to texture memory & drawing buffers
There are a lot of very good reasons to not do this. The first and simplest is that it breaks the pipelined model for 3D -- once you start writing directly to video memory, you have to sync with the chip to make sure you don't write on top of memory it's using.
This feature has caused so many problems in D3D that DX8 has eliminated it.
- Matt
-
Re: Direct access to texture memory & drawing buffers
I think mcraighead's opinion is shared by most card vendors - unfortunately. I've
had e-mail exchanges with various vendors trying to convince them of the need for direct texture access - and didn't get anywhere except for one vendor who provided
the direct texture access we needed.
As a result, our application runs *much*
faster on an 18-month-old graphics card
compared to any consumer card currently
on the market.
Of course, SGI's O2 also provided direct
texture access via the dmbuffer mechanism
and I wouldn't be surprised if OpenML
proposed a similar extension.
Frankly, I'm surprised at the vendors' resistance to
direct texture access ("don't touch hardware, we'll do a better job") while
practice proves otherwise. At the same
time, vendors are introducing proprietary
OpenGL extensions that are very low-level and address specific hardware issues (e.g. NVIDIA's vertex-in-AGP-memory and fence extensions).
Also note that direct memory access
doesn't necessarily imply that vendors would
be locked into a specific memory layout.
Actually, it would be fairly simple to standardize a mechanism that would allow the application to inquire how the internal graphics memory layout is organized; the
application can then generate the texture
info in the vendor's proprietary memory
layout directly, thus bypassing the terribly
expensive glTexSubImage2D.
Come on, guys. Give me that memory pointer,
I'll put it to good use.
[This message has been edited by kz (edited 11-10-2000).]
-
Re: Direct access to texture memory & drawing buffers
The short answer is, "no".
The long answer...
The fact that DX8 has _eliminated_ this feature really casts doubt on it. It has caused an absolutely huge number of problems in D3D!
I see two major areas where people might be asking for this access:
- framebuffer access
- texture access
For framebuffer access, use DrawPixels and ReadPixels. Both are fast on GeForce. If it's not fast, we can optimize it; there is no theoretical reason it would need to be slow, certainly.
For texture access, we are working on ways that texture downloads can be cheaper. There is no inherent limitation that causes TexSubImage to offer poorer performance than directly writing to video memory. There _are_ Windows platform restrictions that make doing this correctly very difficult.
Once we offer video memory pointers to apps, Bad Things can happen quickly. We have to sync the hardware before any such pointer is usable, which kills performance. We have to take some kind of system-wide mutex so that apps don't stomp on top of each other if we decide to reorganize video memory.
We _cannot_ give you a pointer to the start of your framebuffer without taking a system-wide OS mutex. What happens if you move the window? The part of the framebuffer used by your app moves with it. That means we have to take a mutex that prevents any window events from occurring. In turn, this means that if you take the lock and never release it, the system will hang. Even if you take it for, say, a second, the system will suddenly become very unresponsive to input. NT does a better job than 9x here, but not good enough for us to trust apps. In fact, in certain ways, it is worse on NT, to the point where it may not be safe to do this at all.
Finally, direct writes to video memory are actually _not fast_ on most PC platforms today. In fact, this is the "Fast Writes" feature that some of you have heard about. Without that feature, writing to video memory directly is _much slower_ than writing to AGP and then pulling from AGP (which only the driver is in a position to do). Even where Fast Writes are implemented, in many cases there are motherboard and chipset bugs that break things pretty quickly. Also, they only work for sequential writes (just like AGP write combining), and apps that read from video memory directly (don't laugh, lots of old [and new] DirectX apps do this) get absolutely horrendous performance, since CPU readbacks over the AGP bus are absolutely disgustingly slow, and video memory is uncached.
The reason this works for vertex array range is that video memory vertices are best reserved for static vertex data. In fact, we specifically recommend AGP instead for dynamic data. Also, the synchronization hazards for vertex data are much simpler than those for framebuffer data -- vertex data is read only, and we have spun off the synchronization problem to the app (NV_fence), and there is no way that vertex data can get asynchronously relocated like framebuffer memory can.
There are genuine problems with the current situation, and we are working to solve them, but unfortunately there are platform limitations and there's only so much time in a day. Furthermore, none of these limitations are inherent OpenGL limitations.
Offering these pointers (either FB or texture) to apps opens up the biggest Pandora's Box in all of graphics programming. Microsoft did it, and they regretted that decision for years. I refuse to make that mistake again with OpenGL.
- Matt
-
Re: Direct access to texture memory & drawing buffers
Thanks for the response, Matt.
> The fact that DX8 has _eliminated_ this feature really casts doubt on it.
I never used DX8, but we shouldn't confuse "feature" and "implementation".
> For texture access, we are working on ways that texture downloads can be cheaper.
Promises, promises ;-) At least good to hear
that you are working on it.
> There is no inherent limitation that causes TexSubImage to offer poorer performance than directly writing to video memory. There _are_ Windows platform restrictions that make doing this correctly very difficult.
Typically, the user would load a texture
with linear memory layout using glTexSubImage2D; glTexSubImage2D typically
makes a copy of the texture and usually
does some reordering of texels to match the
internal memory layout. From a developer's
point of view, even with reduced overhead
for glTexSubImage2D it doesn't make sense
to first have to stream texture into main
memory and then have OpenGL copy/reformat
it again. I'd rather stream texture in the
proprietary memory layout directly onto the
card.
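To make the "reordering of texels" step concrete, here is a minimal sketch in C of converting a linear image into a tiled layout. The 4x4 tile size and the name linear_to_tiled are purely illustrative assumptions; real internal layouts are vendor-specific and nothing in this thread documents one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tile edge, in texels. Real hardware layouts are
   vendor-specific; 4x4 is only an illustration. */
#define TILE 4

/* Copy a linear (row-major) image of 32-bit texels into a layout made of
   TILE x TILE tiles. Tiles are stored row-major, and texels are stored
   row-major inside each tile. width and height must be multiples of TILE. */
void linear_to_tiled(const uint32_t *src, uint32_t *dst,
                     size_t width, size_t height)
{
    assert(width % TILE == 0 && height % TILE == 0);
    size_t tiles_per_row = width / TILE;

    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            /* Which tile the texel falls in, and where inside that tile. */
            size_t tile_index = (y / TILE) * tiles_per_row + (x / TILE);
            size_t in_tile    = (y % TILE) * TILE + (x % TILE);
            dst[tile_index * TILE * TILE + in_tile] = src[y * width + x];
        }
    }
}
```

If an application could produce this layout itself while decoding or streaming, the reorder pass inside the driver would in principle have nothing left to do, which is the saving being argued for here.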
> We have to take some kind of system-wide mutex so that apps don't stomp on top of each other if we decide to reorganize video memory.
What's wrong with a glTexLock() function
assuming a glBindTexture of a resident
texture?
> We _cannot_ give you a pointer to the start of your framebuffer without taking a system-wide OS mutex.
Well, not my problem ;-) I'm only interested
in texture.
> Finally, direct writes to video memory are actually _not fast_ on most PC platforms today.
I get very decent performance with AGP2x
without fast writes. Of course, it required
careful optimization (non-temporal writes)
as direct texture access is a very low-level
feature. But just like C++, by empowering
users you also give them the rope to
hang themselves with...
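For readers unfamiliar with non-temporal writes, the idea can be sketched roughly as below. This is an assumption-laden illustration, not the code used in the application described above: it uses SSE2 intrinsics as a stand-in (the post does not say which instructions were used), the name stream_copy is made up, and the destination here is ordinary memory rather than a mapped texture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy `bytes` bytes from src to dst using non-temporal (streaming) stores
   when SSE2 is available, dst is 16-byte aligned, and bytes is a multiple
   of 16. In every other case fall back to plain memcpy. */
void stream_copy(void *dst, const void *src, size_t bytes)
{
    assert(dst != NULL && src != NULL);
#if defined(__SSE2__)
    if (((uintptr_t)dst % 16 == 0) && (bytes % 16 == 0)) {
        __m128i *d = (__m128i *)dst;
        const __m128i *s = (const __m128i *)src;
        for (size_t i = 0; i < bytes / 16; ++i) {
            __m128i v = _mm_loadu_si128(s + i); /* source may be unaligned */
            _mm_stream_si128(d + i, v);         /* store that bypasses the cache */
        }
        _mm_sfence(); /* order the streamed stores before later stores */
        return;
    }
#endif
    memcpy(dst, src, bytes);
}
```

The point of the streaming store is that the written data is not pulled into the CPU cache, which suits a write-once destination. Writing sequentially, as this loop does, also matches the write-combining behaviour mentioned earlier in the thread.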
> Offering these pointers (either FB or texture) to apps opens up the biggest Pandora's Box in all of graphics programming. Microsoft did it, and they regretted that decision for years. I refuse to make that mistake again with OpenGL.
Hmmm. I think you should pay more attention
to your customers here. 30+ million triangles and fantastic fillrates simply don't
mean squat to me if I can't move texture
fast enough onto the card. The current
glTexSubImage2D speed is too low; either
optimize the driver much more, provide
alternative approaches (count me in for
beta test) or give me that pointer - your
competition could do it, why can't you? ;-)
-
Re: Direct access to texture memory & drawing buffers
Well, the number of reads and writes varies based on several factors, but the "traditional" TexSubImage data flow is as follows:
App reads data off disk
App writes data into buffer
Driver reads data out of buffer
Driver writes data into internal buffer
Driver reads data out of internal buffer
Driver sends data to HW (somehow or another)
Note that "Driver sends data to HW" may or may not involve direct writes to video memory. That's one way to implement it, but not the only way.
That internal buffer is a result of a Windows platform limitation. We'd like to eliminate it, but it may be a long-term goal. This would eliminate a read and write.
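As a rough way to see what that saving amounts to, the staged flow listed above can be modelled as a chain of counted copies. This is only a sketch: every buffer here, including the one standing in for the hardware, is plain memory, and the names traffic, traditional_path and no_internal_buffer_path are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Running totals of bytes read and written across all copy stages. */
typedef struct {
    size_t bytes_read;
    size_t bytes_written;
} traffic;

/* One stage of the flow: read n bytes from src and write them to dst. */
static void counted_copy(void *dst, const void *src, size_t n, traffic *t)
{
    memcpy(dst, src, n);
    t->bytes_read += n;
    t->bytes_written += n;
}

/* disk -> app buffer -> driver internal buffer -> "hardware" */
void traditional_path(const void *disk, void *app_buf, void *driver_buf,
                      void *hw, size_t n, traffic *t)
{
    counted_copy(app_buf, disk, n, t);       /* app reads disk, writes buffer */
    counted_copy(driver_buf, app_buf, n, t); /* driver copies to internal buffer */
    counted_copy(hw, driver_buf, n, t);      /* driver sends data to HW */
}

/* Same flow with the driver's internal buffer removed. */
void no_internal_buffer_path(const void *disk, void *app_buf,
                             void *hw, size_t n, traffic *t)
{
    counted_copy(app_buf, disk, n, t);
    counted_copy(hw, app_buf, n, t);
}
```

In this model the traditional path touches every byte three times in each direction, and dropping the internal buffer brings that down to two.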
You can already get noticeable speedups in many cases by matching your data format with the HW data format. Here are our optimal matchups, with each line giving the internal format followed by the matching type/format pair:
GL_RGB5: GL_UNSIGNED_SHORT_5_6_5/GL_RGB
GL_RGB8: GL_UNSIGNED_BYTE/GL_BGRA (avoid 3-byte data types, even at the cost of padding)
GL_RGBA4: GL_UNSIGNED_SHORT_4_4_4_4_REV/GL_BGRA
GL_RGBA8: GL_UNSIGNED_BYTE/GL_BGRA
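For application data that does not already match these pairs, a one-time conversion on the application side might look like the sketch below. The helper names rgb_to_bgra and pack_rgb565 are assumptions made for illustration; the byte and bit orders follow the OpenGL definitions of GL_BGRA with GL_UNSIGNED_BYTE and of GL_UNSIGNED_SHORT_5_6_5 with GL_RGB.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand tightly packed 3-byte RGB texels into 4-byte BGRA texels,
   filling the padding byte with opaque alpha. Output byte order per
   texel is B, G, R, A, as GL_BGRA / GL_UNSIGNED_BYTE expects. */
void rgb_to_bgra(const uint8_t *rgb, uint8_t *bgra, size_t texels)
{
    for (size_t i = 0; i < texels; ++i) {
        bgra[4 * i + 0] = rgb[3 * i + 2]; /* blue  */
        bgra[4 * i + 1] = rgb[3 * i + 1]; /* green */
        bgra[4 * i + 2] = rgb[3 * i + 0]; /* red   */
        bgra[4 * i + 3] = 0xFF;           /* padding / alpha */
    }
}

/* Pack 8-bit R, G, B into one 16-bit texel: red in the top 5 bits,
   green in the middle 6, blue in the low 5. */
uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```

The padded buffer would then be handed to glTexSubImage2D with GL_BGRA as the format and GL_UNSIGNED_BYTE as the type. Ideally the data is stored in the matching layout to begin with, so this pass never has to run per frame.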
Another obvious speedup is texture compression, if you can use it. It helps in every step of the download process, assuming of course that you have the texture stored in compressed format on disk.
Another likely speedup (I haven't ever tried it, but it _should_ work) is to use a file mapping and to pass us the pointer to the file mapping instead of reading from disk yourself. That saves a temporary buffer _and_ a copy.
So if you did that, and we got rid of our temporary buffer, the dataflow would be:
Driver reads data off disk
Driver sends data to HW
No inefficient extra copies at all! So it can be done without any pointers at all. We could even put in prefetches that would make sure we were overlapping IDE and graphics.
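The file-mapping idea might be sketched as follows. Note the assumptions: this uses POSIX mmap rather than the Win32 file-mapping calls this thread has in mind, the callback type upload_fn is hypothetical and stands in for the actual texture upload call, and sum_bytes is just a dummy consumer so the sketch can run without a GL context. Like the suggestion itself, it has not been measured.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical callback: in a real app this would wrap the texture upload. */
typedef void (*upload_fn)(const void *pixels, size_t bytes, void *user);

/* Map `path` read-only and hand the mapped bytes straight to `upload`,
   with no intermediate application buffer.
   Returns 0 on success and -1 on failure (including an empty file). */
int upload_from_file(const char *path, upload_fn upload, void *user)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    size_t bytes = (size_t)st.st_size;
    void *pixels = mmap(NULL, bytes, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (pixels == MAP_FAILED)
        return -1;

    upload(pixels, bytes, user); /* pages are faulted in as the consumer reads */
    munmap(pixels, bytes);
    return 0;
}

/* Dummy consumer for demonstration: adds up the bytes it is given. */
void sum_bytes(const void *pixels, size_t bytes, void *user)
{
    const unsigned char *p = pixels;
    unsigned long *total = user;
    for (size_t i = 0; i < bytes; ++i)
        *total += p[i];
}
```

With a callback that passes the mapped pointer to glTexSubImage2D, the application never copies the file into a buffer of its own; whether the driver can avoid its copy as well is a separate question.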
Now, if only we could get graphics to DMA directly from IDE... hmmm... 
[Actually, that might not be impossible, although it would require some heavy-duty kernel hacking, I'm guessing.]
- Matt