Type: Posts; User: robotech_er
Hi, Aleksandar, I have removed the loop, added a trivial rendering, and moved the measuring code so that it brackets exactly the glTexSubImage() call, like this:
...
The problem was partially resolved by reducing the number of write operations as much as possible. The relationship between performance and the number of write operations is not linear, at least on my GTX 570 it is not...
Thank you for the detailed reply, Aleksandar. I should have posted a more complete question.
For the first question, I have tested the "normal" glTexImage2D() (not glTexSubImage2D()). Do the two...
Thanks, aqnuep, I appreciate your help. I'll keep trying.
Thank you, aqnuep. It looks like I have a lot to learn.
Could you point me to some resources/links on how to build this kind of perfect synthetic test, or in other words, how to reach peak performance?...
Thanks for the reply.
Actually, the timings above exclude CPU time; they measure only the GPU time consumed by the upload. So I think it should be faster.
In this thread,...
Hi,
I upload textures using PBOs and get a transfer speed of ~3.5 GB/s on a GTX 570 and ~2.5 GB/s on a GTX 670. The timings are measured with ARB_timer_query.
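Bandwidth figures like these are normally obtained by dividing the bytes moved by the elapsed time the timer query reports (GL_TIME_ELAPSED results are in nanoseconds). A minimal sketch of that arithmetic, assuming "GB" means 10^9 bytes; the 4096x4096 RGBA8 texture is a hypothetical example size, not the poster's data.

```python
def texture_bytes(width: int, height: int, bytes_per_texel: int) -> int:
    """Size of one uncompressed 2D texture level in bytes."""
    return width * height * bytes_per_texel


def transfer_rate_gb_per_s(num_bytes: int, elapsed_ns: int) -> float:
    """Throughput in GB/s (1 GB = 1e9 bytes) from a GPU timer-query result in ns."""
    if elapsed_ns <= 0:
        raise ValueError("elapsed time must be positive")
    # 1 byte per nanosecond is exactly 1e9 bytes per second, i.e. 1 GB/s
    return num_bytes / elapsed_ns


def expected_upload_ms(num_bytes: int, rate_gb_per_s: float) -> float:
    """Time an upload should take at a given sustained rate, in milliseconds."""
    return num_bytes / (rate_gb_per_s * 1e9) * 1e3


if __name__ == "__main__":
    size = texture_bytes(4096, 4096, 4)  # hypothetical RGBA8 texture
    print(f"{size} bytes")
    print(f"{expected_upload_ms(size, 3.5):.2f} ms at 3.5 GB/s")
    print(f"{expected_upload_ms(size, 2.5):.2f} ms at 2.5 GB/s")
```

At 3.5 GB/s such a 64 MiB texture takes about 19.2 ms; comparing the measured elapsed time against this kind of estimate shows quickly whether an upload is bandwidth-bound.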
These speeds seem slow, since...
Sorry for not describing the question precisely.
The shader is an image decompression shader. Volume data is stored as integer textures.
An image is divided into many small patches, and these...
Thanks for the reply.
I am trying to design a volume renderer. The problem is that the amount of data is huge, over 80 GB. So I implemented a paging system to handle data access. This...
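A paging system like this is often organized as a fixed pool of GPU-resident slots (for example, bricks in a 3D texture atlas) plus a least-recently-used map from brick IDs to slots. The sketch below models only that bookkeeping; the class name, the brick-ID tuples, and LRU as the eviction policy are assumptions, not the poster's design.

```python
from collections import OrderedDict
from typing import Hashable, Optional, Tuple


class BrickCache:
    """Hypothetical LRU map from brick IDs to slots in a fixed-size GPU atlas."""

    def __init__(self, num_slots: int) -> None:
        if num_slots <= 0:
            raise ValueError("need at least one slot")
        self.num_slots = num_slots
        # brick -> slot, ordered from least to most recently used
        self.slots: "OrderedDict[Hashable, int]" = OrderedDict()

    def request(self, brick: Hashable) -> Tuple[int, bool, Optional[Hashable]]:
        """Return (slot, needs_upload, evicted_brick) for a brick used this frame."""
        if brick in self.slots:
            self.slots.move_to_end(brick)  # mark as most recently used
            return self.slots[brick], False, None
        evicted = None
        if len(self.slots) < self.num_slots:
            slot = len(self.slots)  # pool not full yet: take the next free slot
        else:
            evicted, slot = self.slots.popitem(last=False)  # reuse the LRU brick's slot
        self.slots[brick] = slot
        return slot, True, evicted
```

On a miss, the caller would read the brick from disk and upload it into the returned slot (for example with glTexSubImage3D sourced from a PBO); on a hit, no transfer is needed.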
The functions imageLoad() and imageStore() seem very slow. In my tests, imageLoad() is about 60% slower than texelFetch(). Are there any tricks that can speed up these two functions, in...
I also ran into this problem several days ago and spent several days optimizing the algorithm to minimize register usage. By chance I found that NVIDIA had updated the driver (301.24, beta) for...
Quote: "May I ask you how?"
Sorry, Aleksandar, I actually don't know how.
I read about this idea here: http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=285231
"In...
It seems that this "dual copy engine" has been disabled in the OpenGL driver for GeForce cards, but it is available in CUDA. I will try using CUDA to transfer the data. This feature is useful when...
I am looking for a way to load resources asynchronously. Many posts point me to this page: OpenGL PBO. From the CPU's point of view, this is a DMA method, since it does not consume CPU cycles. But the...
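A common way to keep the CPU and the transfer from waiting on each other is to cycle through several PBOs: while the GPU copies out of one buffer, the CPU fills the next. The helper below only computes which buffer plays which role in a given frame; the function name and ring size are hypothetical.

```python
from typing import Optional, Tuple


def pbo_roles(frame: int, ring_size: int) -> Tuple[int, Optional[int]]:
    """Return (cpu_fill_index, gpu_upload_index) for a ring of PBOs in a given frame.

    The CPU writes this frame's data into one buffer while the GPU copies from the
    buffer the CPU filled in the previous frame. With ring_size >= 2 the two roles
    never land on the same buffer. Frame 0 has nothing to upload yet.
    """
    if ring_size < 2:
        raise ValueError("a ring needs at least two buffers to overlap work")
    fill = frame % ring_size
    upload = None if frame == 0 else (frame - 1) % ring_size
    return fill, upload
```

In a real renderer each slot would also need a sync object (glFenceSync, then glClientWaitSync before the CPU overwrites it), since the ring only guarantees the buffers differ, not that the previous copy has finished.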
Thanks, thokra, you are right. A quad-tree node can be triangulated in various ways; I just used the most naive one, where one node needs up to 8 triangles.
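One common naive scheme (assumed here, not taken from the thread) fans triangles around the node's center: the four corners are always used, and an edge midpoint is inserted on each side whose neighbor is more finely subdivided, which avoids T-junction cracks. That gives 4 triangles at minimum and 8 at most. The vertex labels are hypothetical.

```python
from typing import List, Sequence, Tuple

Triangle = Tuple[str, str, str]


def triangulate_node(split_edges: Sequence[bool]) -> List[Triangle]:
    """Fan-triangulate one quad-tree node around its center vertex "C".

    split_edges[i] says whether edge i (0=south, 1=east, 2=north, 3=west) gets
    a midpoint vertex because the neighbor across it is more finely subdivided.
    """
    if len(split_edges) != 4:
        raise ValueError("expected one flag per edge")
    corners = ["SW", "SE", "NE", "NW"]  # counter-clockwise, seen from above
    midpoints = ["S", "E", "N", "W"]  # midpoint of the edge starting at corners[i]
    ring: List[str] = []
    for i in range(4):
        ring.append(corners[i])
        if split_edges[i]:
            ring.append(midpoints[i])
    # one triangle per consecutive pair of ring vertices, wrapping back to the start
    return [("C", ring[k], ring[(k + 1) % len(ring)]) for k in range(len(ring))]
```

The triangle count is simply 4 plus the number of split edges, which is where the "up to 8 triangles" per node comes from.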
Many thanks to all of you, especially Dark Photon and Aleksandar. I have learned a lot from your excellent comments.
One more thing confuses me. Before jumping into TINs, which seem a little...
Yes, it is somewhat old, but it can be used in the preprocessing stage, where the terrain blocks can be simplified as much as possible to reduce the number of vertices.
Hi, all. I am looking for an implementation of a TIN (Triangulated Irregular Network) algorithm, namely "Smooth view-dependent level-of-detail control and its application to terrain rendering", which...
Hi, Aleksandar. I have not read the papers you pointed me to yet; I will read them later.
I once implemented a ray-casting method from the paper "GPU ray-casting for scalable terrain...
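For reference, the core of most height-field ray casters is a march along the ray until it drops below the terrain. The sketch below is a plain fixed-step CPU version, not the method from the cited paper; the step size, the nearest-cell lookup, and the function name are simplifying assumptions.

```python
import math
from typing import Optional, Sequence


def raymarch_heightfield(
    heights: Sequence[Sequence[float]],
    origin: Sequence[float],
    direction: Sequence[float],
    step: float = 0.01,
    max_t: float = 100.0,
) -> Optional[float]:
    """Return the ray parameter t of the first sample at or below the terrain, or None.

    heights[z][x] holds the terrain height of grid cell (x, z); y is "up".
    t is measured in units of `direction`; with nearest-cell lookup and a fixed
    step, a hit is accurate to about one step.
    """
    ox, oy, oz = origin
    dx, dy, dz = direction
    depth, width = len(heights), len(heights[0])
    for i in range(int(max_t / step) + 1):
        t = i * step
        x, y, z = ox + t * dx, oy + t * dy, oz + t * dz
        cx, cz = math.floor(x), math.floor(z)
        if not (0 <= cx < width and 0 <= cz < depth):
            return None  # ray left the height field without a hit
        if y <= heights[cz][cx]:
            return t
    return None
```

A fixed step is the slow part on large terrains; hierarchical schemes use coarse per-region maxima to skip empty space before falling back to fine steps.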
@Photon, thanks very much for your detailed explanation. Approaches like TIN may be too complex for me; I will look into some methods that use regular grids.
Thanks for your replies; they help a lot.
Sorry for replying so late; I needed some time to learn more so that I could ask a more meaningful question. :)
@Dark Photon :
What I want to do is to...
I know that when rendering terrain, the terrain is usually split into equal-sized tiles, and a VBO of the same size is prepared. The tiles contain only height values. At runtime, this VBO is rendered and a vertex...
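With height-only tiles, the x/z grid positions and the index buffer can be built once and shared by every tile; a vertex shader then offsets the grid by the tile's origin and fetches the height. A sketch of building that shared grid, assuming an n x n vertex tile and two triangles per cell (the function name is hypothetical):

```python
from typing import List, Tuple


def build_tile_grid(n: int) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Build shared (x, z) grid positions and triangle indices for an n x n vertex tile.

    Positions are integer grid coordinates; a vertex shader would scale them,
    add the tile's world offset, and read y from the tile's height data.
    """
    if n < 2:
        raise ValueError("a tile needs at least 2 vertices per side")
    positions = [(x, z) for z in range(n) for x in range(n)]  # row-major: index = z*n + x
    indices: List[int] = []
    for z in range(n - 1):
        for x in range(n - 1):
            i0 = z * n + x  # corner (x, z)
            i1 = i0 + 1  # corner (x+1, z)
            i2 = i0 + n  # corner (x, z+1)
            i3 = i2 + 1  # corner (x+1, z+1)
            indices += [i0, i2, i1, i1, i2, i3]  # two triangles per cell
    return positions, indices
```

For a 65x65-vertex tile this yields 64 * 64 * 2 = 8192 triangles, uploaded once into a VBO/IBO pair and reused for every tile.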
Make a series of mipmaps of the original texture, and use a pixel shader to find the maximum of each 2x2 group of adjacent pixels in the finer mip level and write the result to the next coarser level; repeat this up...
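A CPU reference for the max-reduction described above can help validate the shader output level by level. This sketch assumes a square, power-of-two level 0; the function name is hypothetical.

```python
from typing import List


def build_max_mips(level0: List[List[float]]) -> List[List[List[float]]]:
    """CPU reference for a maximum-mipmap pyramid.

    Each texel of level k+1 is the max of the 2x2 block of texels below it in
    level k, which is what each pixel-shader pass computes on the GPU.
    """
    size = len(level0)
    if size == 0 or size & (size - 1) or any(len(row) != size for row in level0):
        raise ValueError("level 0 must be square with a power-of-two size")
    levels = [level0]
    while size > 1:
        prev = levels[-1]
        size //= 2
        levels.append([
            [max(prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                 prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1])
             for x in range(size)]
            for y in range(size)
        ])
    return levels
```

A pyramid like this lets a ray caster test a ray against the coarse maxima first and skip whole regions it cannot hit.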
Hi, everyone. On a GPU, each thread can use a certain number of registers, but using too many registers in a single thread hurts parallelism and so causes a performance loss (one example of this...
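A back-of-the-envelope way to see this trade-off is to compute how many warps fit on one multiprocessor for a given register count per thread. The defaults below are the per-SM limits for compute capability 2.0 (Fermi, e.g. the GTX 570): 32,768 registers and 48 resident warps. The model deliberately ignores register allocation granularity, shared memory, and block-size limits, so treat its result as an upper bound.

```python
def max_resident_warps(regs_per_thread: int,
                       regs_per_sm: int = 32768,
                       max_warps_per_sm: int = 48,
                       warp_size: int = 32) -> int:
    """Upper bound on resident warps per SM when registers are the only limit."""
    if regs_per_thread <= 0:
        raise ValueError("registers per thread must be positive")
    by_registers = regs_per_sm // (regs_per_thread * warp_size)
    return min(max_warps_per_sm, by_registers)


def occupancy(regs_per_thread: int, **limits: int) -> float:
    """Fraction of the SM's warp slots that can be filled (1.0 = full occupancy)."""
    max_warps = limits.get("max_warps_per_sm", 48)
    return max_resident_warps(regs_per_thread, **limits) / max_warps


if __name__ == "__main__":
    for regs in (20, 32, 63):
        print(regs, max_resident_warps(regs), round(occupancy(regs), 3))
```

In this model, going from 32 to 63 registers per thread halves the resident warps (32 to 16), leaving fewer warps to hide memory latency.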
Thanks, Onumis, I will check out CUDA interop later.