[QUOTE=likethe265;1257122]Can I understand as:
- CPU has its own limit on sending draw calls to the GPU over time. For example, during a 1/60 s time interval, only around 6000 draw calls can be executed, and moreover this is with the CPU doing nothing but sending OpenGL commands.[/QUOTE]
Not “nothing”. The CPU (via the driver) performs a lot of “prep work” to get ready for issuing draw calls, often termed validation. For instance: pushing needed buffers/textures onto the GPU which haven’t already been uploaded, pinning CPU-side memory buffers, activating new shader programs, uploading uniform blocks, binding textures, resolving block handle/offset pairs into GPU addresses, etc., along with performing all manner of “is this state valid” checks.
The amount of this work varies by driver, draw submission method, and the number of intervening state changes. Even with all of that held constant, it consumes different amounts of time depending on the specific CPU and memory speeds involved, so any “max N draw calls/frame” heuristic you might come up with is going to vary based on the system and the specifics of your test.
In any case, with smaller and smaller batches (the above factors held constant), you do hit a point where you are CPU bound submitting batches rather than GPU bound. You’d of course like to avoid this case, because you could be rendering more, or more interesting, content with that time.
[QUOTE=likethe265;1257122]- The intention of optimization of caching is to reduce the batch size and make the most use of GPU.[/QUOTE]
Not to reduce the batch size, no. Several types of caching were discussed above; I’ll assume you mean vertex cache optimization. The reason for optimizing for it is that fewer total vertices need to be transformed by the GPU to render your mesh.
Background: at any point in your frame, you’ll be bottlenecked on something. Previously we were discussing what would make you CPU bound (there the GPU is somewhat academic). However, in cases where you are GPU bound (what you want), you could be bottlenecked on a few things. One of these is vertex transform rate: there’s a limit to how fast GPUs can transform vertices, and if you feed the GPU properly, you can hit it. If you are vertex transform bound, and you can reduce the number of vertices the GPU has to transform to render your mesh, then you reduce the time required to render it. That’s the point of vertex cache optimization.
[QUOTE=likethe265;1257122]- GPU and CPU processing can be deemed as parallel because CPU can send the next batch when GPU is processing the last batch. But in most case, the GPU is almost hungry.[/QUOTE]
In an application written without regard for performance, perhaps. But once optimized, this is typically not the case: you figure out how to “keep the GPU busy” without wasting too much time on state changes or irrelevant content, while still meeting your frame rate requirements.