Android terrible frames, but only running 2 draw calls



tommohawkaction
07-12-2017, 05:45 AM
Hello, I really do not understand why I am getting really bad frame rates on Android.
I am using libGDX and rendering in 3D. I have two textures, not including the font texture, so basically 3.
The game on Android runs at about 50 fps, however it dips down to 20 fps, which makes the game feel really horrible... and this is with no game logic, only rendering.
On desktop with no vSync enabled I get 5500 fps, but that's with a GTX 1080 and an i7 6700K.

Here is a screen shot of the game running
http://i63.tinypic.com/1g1zzr.png

I render the whole map in one mesh (batch).
The main texture sheet is 256x256 and I use it as a sprite sheet for 16x16 textures.
The shader has point and directional lighting, with a max of 15 point lights evaluated per vertex.

I did have other entities running which are dynamic, but I stored them all in one batch. However, when I removed them the performance was only a tad better.

I really have no idea why my frame rate is dipping when there is not much going on. The phone I tested with was the Samsung Galaxy Ace 2 (it can run Dead Trigger and GTA). I also tried the Nexus 7, on which I get even worse frame rates, yet the Nexus 7 is the better device??

Render Code (Without uniforms)

mesh.bind(shader);
{
Gdx.gl.glDrawElements(GL20.GL_TRIANGLES, mesh.getNumIndices(), GL20.GL_UNSIGNED_SHORT, 0);
}
mesh.unbind(shader);


VERTEX

#ifdef GL_ES
precision mediump float;
precision mediump int;
#endif

attribute vec4 a_position;
attribute vec4 a_color;
attribute vec2 a_texCoord0;
attribute vec3 a_normal;

uniform mat4 u_projTrans;
uniform mat4 transformMatrix;

varying vec4 v_color;
varying vec2 v_texCoords;

uniform vec3 directionLight;
uniform vec4 directionLightColour;
uniform int useDirectionalLight;

const int AMOUNTOFLIGHTS = 15;

uniform vec3 pLightPosition[AMOUNTOFLIGHTS];
uniform vec4 pLightColour[AMOUNTOFLIGHTS];
uniform vec3 pAttenuation[AMOUNTOFLIGHTS];
uniform int pAmount;

varying vec3 lightDiffuse;

void main(){
v_color = a_color;
v_texCoords = a_texCoord0;

vec4 worldPos = transformMatrix * a_position;
vec3 unitNormal = normalize((transformMatrix * vec4(a_normal,0.0)).xyz);

lightDiffuse = vec3(0.0);

for(int i = 0;i < pAmount;++i){
vec3 lightDir = pLightPosition[i] - worldPos.xyz;
float distance = length(lightDir);
float attFactor = pAttenuation[i].x + (pAttenuation[i].y * distance) + (pAttenuation[i].z * distance * distance);
vec3 unitLightVector = normalize(lightDir);

float nDot = dot(unitNormal,unitLightVector);
float brightness = max(nDot,0.0) / attFactor;
vec3 light = brightness * pLightColour[i].rgb;
lightDiffuse += light;
}

if(useDirectionalLight == 1){
float nDot = dot(unitNormal,-directionLight);
float brightness = max(nDot,0.0);
vec3 diffuse = brightness * directionLightColour.rgb;
lightDiffuse += diffuse;
}

lightDiffuse = max(lightDiffuse,0.4);

gl_Position = u_projTrans * worldPos;
}

FRAGMENT

#ifdef GL_ES
precision mediump float;
precision mediump int;
#endif

varying vec4 v_color;
varying vec2 v_texCoords;

uniform sampler2D u_texture;

uniform vec4 fogColour;
uniform float fogThickness;

varying vec3 lightDiffuse;

void main(){
vec4 texColour = texture2D(u_texture, v_texCoords);
if(texColour.a < 0.5) discard;

vec4 outputFrag = vec4(lightDiffuse,1.0) * (v_color * texColour);

// FOG
float perspective_far = 12.0;
float fog_cord = (gl_FragCoord.z / gl_FragCoord.w) / perspective_far;
float fog = fog_cord * fogThickness;
gl_FragColor = mix(fogColour, outputFrag, clamp(1.0-fog,0.0, 1.0));

}

Batching Code

package com.hawk.tommo.game.graphics;

import java.nio.FloatBuffer;
import java.nio.ShortBuffer;

import com.badlogic.gdx.Gdx;
import com.badlogic.gdx.graphics.Color;
import com.badlogic.gdx.graphics.GL20;
import com.badlogic.gdx.graphics.Mesh;
import com.badlogic.gdx.graphics.VertexAttribute;
import com.badlogic.gdx.graphics.VertexAttributes.Usage;
import com.badlogic.gdx.graphics.g2d.TextureRegion;
import com.badlogic.gdx.graphics.glutils.ShaderProgram;
import com.badlogic.gdx.math.Matrix4;
import com.badlogic.gdx.math.Vector3;
import com.badlogic.gdx.utils.Disposable;
import com.hawk.tommo.game.entities.RenderableEntity;
import com.hawk.tommo.game.entities.RenderableModelEntity;

public class EntityBatchRenderer implements Disposable {

static final int SIZE = 1000;

static final int POSITIONCOMPONENT = 3;
static final int NORMALCOMPONENT = 3;
static final int TEXTURECOMPONENT = 2;
static final int COLORCOMPONENT = 1;

final static int components = POSITIONCOMPONENT + NORMALCOMPONENT + TEXTURECOMPONENT + COLORCOMPONENT;

private Mesh mesh;
private float[] vertices;
private short[] indices;
private int lastVerticesCount, lastIndicesCount, lastIndiceID;

public EntityBatchRenderer() {
lastVerticesCount = 0;
lastIndicesCount = 0;
vertices = new float[SIZE * components];
indices = new short[SIZE * 3];

mesh = new Mesh(false, vertices.length, indices.length,
new VertexAttribute(Usage.Position, POSITIONCOMPONENT, "a_position"),
new VertexAttribute(Usage.Normal, NORMALCOMPONENT, "a_normal"),
new VertexAttribute(Usage.TextureCoordinates, TEXTURECOMPONENT, "a_texCoord0"),
new VertexAttribute(Usage.ColorPacked, COLORCOMPONENT + 3, "a_color"));

}

static Vector3 tmp = new Vector3();

public void render(RenderableEntity entity) {
this.render(entity.getMesh(), entity.getTransform(), entity.getColor(), entity.getTextureRegion());
}

public void render(RenderableModelEntity entity) {
this.render(entity.getMesh(), entity.getTransform(), entity.getColor(), null);
}

public void render(Mesh mesh, Matrix4 transform, Color color, TextureRegion tex) {
FloatBuffer vBuffer = mesh.getVerticesBuffer();
ShortBuffer iBuffer = mesh.getIndicesBuffer();
int vLimit = vBuffer.limit();
int iLimit = iBuffer.limit();
if (lastVerticesCount + vLimit < vertices.length) {
float floatColor = (color != null) ? color.toFloatBits() : -1;

for (int i = lastVerticesCount; i < vLimit + lastVerticesCount; i += components) {
tmp.set(vBuffer.get(i - lastVerticesCount), vBuffer.get(i + 1 - lastVerticesCount),
vBuffer.get(i + 2 - lastVerticesCount));
if (transform != null)
tmp.mul(transform);
vertices[i] = tmp.x;
vertices[i + 1] = tmp.y;
vertices[i + 2] = tmp.z;
vertices[i + 3] = vBuffer.get(i + 3 - lastVerticesCount); // n
vertices[i + 4] = vBuffer.get(i + 4 - lastVerticesCount); // n
vertices[i + 5] = vBuffer.get(i + 5 - lastVerticesCount); // n
if (tex != null) {
vertices[i + 6] = vBuffer.get(i + 6 - lastVerticesCount) + tex.getU(); // t
vertices[i + 7] = vBuffer.get(i + 7 - lastVerticesCount) + tex.getV(); // t
} else {
vertices[i + 6] = vBuffer.get(i + 6 - lastVerticesCount); // t
vertices[i + 7] = vBuffer.get(i + 7 - lastVerticesCount); // t
}
if (floatColor != -1) {
vertices[i + 8] = floatColor;
} else {
vertices[i + 8] = vBuffer.get(i + 8 - lastVerticesCount); // t
}
}

int largest = Short.MIN_VALUE;
for (int i = lastIndicesCount; i < iLimit + lastIndicesCount; ++i) {
indices[i] = (short) (iBuffer.get(i - lastIndicesCount) + lastIndiceID);
largest = indices[i] > largest ? indices[i] : largest;
}

lastVerticesCount += vLimit;
lastIndicesCount += iLimit;

lastIndiceID = largest + 1;
} else {
Gdx.app.log("ERROR", "Can't render anymore, need bigger batch size!");
}
}

public void flush(ShaderProgram shader) {
mesh.setVertices(vertices);
mesh.setIndices(indices);
mesh.bind(shader);
{
Gdx.gl.glDrawElements(GL20.GL_TRIANGLES, lastIndicesCount, GL20.GL_UNSIGNED_SHORT, 0);
}
mesh.unbind(shader);

lastVerticesCount = 0;
lastIndicesCount = 0;
lastIndiceID = 0;
}

@Override
public void dispose() {
mesh.dispose();
}

}

Silence
07-12-2017, 06:06 AM
I'm not an expert at all on mobile development.
But I would strongly suggest removing conditions and loops from your shaders. They are well known to slow things down on desktop graphics cards, so it is most probably even worse on mobile GPUs.

Also:


Gdx.gl.glDrawElements(GL20.GL_TRIANGLES, mesh.getNumIndices(), GL20.GL_UNSIGNED_SHORT, 0);

How many faces do you render? What is the maximum expected by your driver?
Also, you might have a lot of occlusion, which could slow things down too.

tommohawkaction
07-12-2017, 06:19 AM
Hello, thanks for replying so quickly!
How would I do lighting if I had to remove loops?

As for faces, it depends: I mainly render billboards, so it's simple and fast, but sometimes I render a cube in there.
For the world I don't have any extra faces, only the faces in the world, for example:

http://i63.tinypic.com/2ce3k2o.png
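
On the question of lighting without a shader loop: one option for a static map (an assumption here, not something proposed in this thread) is to evaluate the same per-vertex formula once on the CPU and store the result in the vertex colour, so the vertex shader no longer loops over lights. The sketch below mirrors the attenuation and diffuse terms from the vertex shader above. `BakedLighting`, `PointLight` and `diffuse` are hypothetical names, and dynamic lights would still need another approach.

```java
public final class BakedLighting {

    /** Hypothetical point light: position, RGB colour, attenuation (constant, linear, quadratic). */
    public static final class PointLight {
        public final float[] position;
        public final float[] colour;
        public final float[] attenuation;

        public PointLight(float[] position, float[] colour, float[] attenuation) {
            this.position = position;
            this.colour = colour;
            this.attenuation = attenuation;
        }
    }

    /**
     * Returns the RGB diffuse term for one vertex, using the same formula as the
     * posted vertex shader. The normal is expected to be unit length.
     */
    public static float[] diffuse(float[] pos, float[] normal, PointLight[] lights, float ambientFloor) {
        float[] sum = new float[3];
        for (PointLight light : lights) {
            float dx = light.position[0] - pos[0];
            float dy = light.position[1] - pos[1];
            float dz = light.position[2] - pos[2];
            float dist = (float) Math.sqrt(dx * dx + dy * dy + dz * dz);
            if (dist == 0f) {
                continue; // light sits exactly on the vertex, so the direction is undefined
            }
            float att = light.attenuation[0]
                    + light.attenuation[1] * dist
                    + light.attenuation[2] * dist * dist;
            // dot(normal, normalize(lightDir))
            float nDot = (normal[0] * dx + normal[1] * dy + normal[2] * dz) / dist;
            float brightness = Math.max(nDot, 0f) / att;
            for (int c = 0; c < 3; c++) {
                sum[c] += brightness * light.colour[c];
            }
        }
        // Same ambient floor as max(lightDiffuse, 0.4) in the shader.
        for (int c = 0; c < 3; c++) {
            sum[c] = Math.max(sum[c], ambientFloor);
        }
        return sum;
    }
}
```

The returned values could be multiplied into the per-vertex colour when the map mesh is built, which leaves only the directional light and fog for the shaders.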

tommohawkaction
07-12-2017, 06:41 AM
It's really annoying. You might think 30 fps on Android is fine, but when I play my game it freezes and the player's head shoots into the sky, as I'm using head bobbing:
camera.position.y += bob_y * Gdx.graphics.getDeltaTime();
I really want to finish this game, as I never finish a game because of bad performance.

GClements
07-12-2017, 09:18 AM
The main things that are likely to improve your frame rate are:


Reduce the number of polygons rendered, i.e. don't render the entire map, only the portion that's reasonably likely to be visible. Perform broad-phase frustum culling and occlusion culling on the geometry you submit for rendering.

Render polygons in (approximately) front-to-back order, to benefit from early depth tests. If a block of fragments fail the depth test, there's no need to execute the fragment shader, which in turn means no texture lookups. Conversely, if you render from back to front, overdrawn fragments waste processing power.
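
The two points above can be sketched roughly as follows, assuming the map is split into chunks that each have their own mesh and a bounding sphere. `ChunkCulling`, `Chunk` and `visibleFrontToBack` are hypothetical names. The test here is deliberately crude: it only rejects chunks behind the camera or beyond a maximum distance, which is much less than full frustum or occlusion culling. The view direction is assumed to be unit length.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public final class ChunkCulling {

    /** Hypothetical map chunk: one piece of the world with a bounding sphere. */
    public static final class Chunk {
        public final float x, y, z, radius;

        public Chunk(float x, float y, float z, float radius) {
            this.x = x;
            this.y = y;
            this.z = z;
            this.radius = radius;
        }
    }

    /**
     * Returns the chunks that are neither entirely behind the camera nor entirely
     * beyond maxDistance along the view direction, sorted nearest first.
     */
    public static List<Chunk> visibleFrontToBack(List<Chunk> chunks,
                                                 float camX, float camY, float camZ,
                                                 float dirX, float dirY, float dirZ,
                                                 float maxDistance) {
        List<Chunk> visible = new ArrayList<>();
        for (Chunk c : chunks) {
            float dx = c.x - camX;
            float dy = c.y - camY;
            float dz = c.z - camZ;
            // Signed distance of the chunk centre along the view direction.
            float along = dx * dirX + dy * dirY + dz * dirZ;
            if (along < -c.radius) {
                continue; // entirely behind the camera
            }
            if (along > maxDistance + c.radius) {
                continue; // entirely beyond the far distance
            }
            visible.add(c);
        }
        // Nearest chunks first, so nearer geometry fills the depth buffer early.
        visible.sort(Comparator.comparingDouble((Chunk c) -> {
            double dx = c.x - camX;
            double dy = c.y - camY;
            double dz = c.z - camZ;
            return dx * dx + dy * dy + dz * dz;
        }));
        return visible;
    }
}
```

Each returned chunk would then be drawn in list order. Whether this helps depends on how much of the map is actually off screen or overdrawn.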

tommohawkaction
07-12-2017, 11:06 AM
Hello GClements, thanks for the reply. If I have the map already in a static mesh, how would I do the front-to-back culling? Also, when I get low fps my player's head bobbing goes insane, like going out of the map xD. Here's my code; what have I done wrong?


private float bob_y;
private float bob;

void headBobbing(boolean moved) {
if (moved) {
bob += Gdx.graphics.getDeltaTime() * ((isInLiquid()) ? 15 : 30);
bob_y = MathUtils.sin(bob) / 15f;
camera.position.y += bob_y * Gdx.graphics.getDeltaTime();
}
}

void breathing(boolean moved) {
if (!moved) {
bob += Gdx.graphics.getDeltaTime() * 5;
bob_y = MathUtils.sin(bob) / 50f;
camera.position.y += bob_y * Gdx.graphics.getDeltaTime();
}
}
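
One possible explanation, offered as an assumption rather than a diagnosis: the code above adds a small increment to `camera.position.y` every frame, so whatever an unusually long frame adds stays in the camera height and nothing pulls it back to the resting value. A minimal sketch of an alternative is to keep a fixed base height, compute the bob as an offset from it, and clamp the delta time. `HeadBob`, `baseY` and the 0.1 second clamp are hypothetical choices; the speeds and amplitudes are taken from the code above.

```java
public final class HeadBob {
    private static final float TWO_PI = (float) (2.0 * Math.PI);
    private static final float MAX_DELTA = 0.1f; // hypothetical clamp for long frames, in seconds

    private final float baseY; // resting eye height
    private float phase;       // accumulated bob phase, in radians

    public HeadBob(float baseY) {
        this.baseY = baseY;
    }

    /** Returns the absolute camera height for this frame. */
    public float update(float deltaSeconds, boolean moved, boolean inLiquid) {
        float dt = Math.min(deltaSeconds, MAX_DELTA);
        float speed = moved ? (inLiquid ? 15f : 30f) : 5f;
        float amplitude = moved ? 1f / 15f : 1f / 50f;

        phase += dt * speed;
        if (phase > TWO_PI) {
            phase -= TWO_PI; // keep the phase small so float precision does not degrade
        }
        // Offset from the base height, so the result can never drift away from it.
        return baseY + (float) Math.sin(phase) * amplitude;
    }
}
```

In the game loop this would be used as `camera.position.y = headBob.update(Gdx.graphics.getDeltaTime(), moved, isInLiquid());`, where `headBob` is a hypothetical field and the base height is updated wherever the player's height changes.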

Silence
07-12-2017, 11:26 AM
If i have the map already in a static mesh how would i do the front to back culling?

There are many algorithms known as space partitioning (https://en.wikipedia.org/wiki/Space_partitioning). BSP trees are well known and have been widely used since Doom 1, and still are. Quadtrees are easier to integrate and give very nice results. Many games use variants of the latter.
What GClements said is really important on all platforms, and especially on mobile, where GPUs are not as powerful as on desktops.

But if your map is very sparse and consists of few polygons, as we could deduce from your previous posts, then maybe your issue lies somewhere else.
30 fps might not be that bad; you might also be capped by vsync.

Also, it's not easy to understand what your intent was here:


public void render(Mesh mesh, Matrix4 transform, Color color, TextureRegion tex) {
FloatBuffer vBuffer = mesh.getVerticesBuffer();
ShortBuffer iBuffer = mesh.getIndicesBuffer();
int vLimit = vBuffer.limit();
int iLimit = iBuffer.limit();
if (lastVerticesCount + vLimit < vertices.length) {
float floatColor = (color != null) ? color.toFloatBits() : -1;

for (int i = lastVerticesCount; i < vLimit + lastVerticesCount; i += components) {
tmp.set(vBuffer.get(i - lastVerticesCount), vBuffer.get(i + 1 - lastVerticesCount),
vBuffer.get(i + 2 - lastVerticesCount));
if (transform != null)
tmp.mul(transform);
vertices[i] = tmp.x;
vertices[i + 1] = tmp.y;
vertices[i + 2] = tmp.z;
vertices[i + 3] = vBuffer.get(i + 3 - lastVerticesCount); // n
vertices[i + 4] = vBuffer.get(i + 4 - lastVerticesCount); // n
vertices[i + 5] = vBuffer.get(i + 5 - lastVerticesCount); // n
if (tex != null) {
vertices[i + 6] = vBuffer.get(i + 6 - lastVerticesCount) + tex.getU(); // t
vertices[i + 7] = vBuffer.get(i + 7 - lastVerticesCount) + tex.getV(); // t
} else {
vertices[i + 6] = vBuffer.get(i + 6 - lastVerticesCount); // t
vertices[i + 7] = vBuffer.get(i + 7 - lastVerticesCount); // t
}
if (floatColor != -1) {
vertices[i + 8] = floatColor;
} else {
vertices[i + 8] = vBuffer.get(i + 8 - lastVerticesCount); // t
}
}

int largest = Short.MIN_VALUE;
for (int i = lastIndicesCount; i < iLimit + lastIndicesCount; ++i) {
indices[i] = (short) (iBuffer.get(i - lastIndicesCount) + lastIndiceID);
largest = indices[i] > largest ? indices[i] : largest;
}

lastVerticesCount += vLimit;
lastIndicesCount += iLimit;

lastIndiceID = largest + 1;
} else {
Gdx.app.log("ERROR", "Can't render anymore, need bigger batch size!");
}
}

tommohawkaction
07-12-2017, 11:55 AM
Ah okay. What's going on there is my version of a batch: when I want to render an object I call that function, which copies the object's vertices into the main mesh at the next index. After all rendering has been done, I call flush, which draws the batch. Also, in the post above I mentioned my head bobbing; what's up with it? I really appreciate the support here.

Dark Photon
07-12-2017, 05:56 PM
On mobile GPUs and drivers, the two points GClements lists (reducing the number of polygons rendered, and rendering in approximately front-to-back order) are pretty far down the list. If the developer doesn't clearly understand how mobile GPUs function (hint: very differently from desktop GPUs), they're much more likely to trip over basic usage issues that trigger implicit synchronization and unbounded GPU memory growth in the GL driver. These are issues that you just won't see on a decent desktop GPU driver in most cases (though you might see a few exhibiting very minor effects if you're optimizing for GL driver read-ahead performance on a desktop GPU).

Issues like:


App blocking on buffer object updates, cutting your frame rate to 1/2 or 1/3 of the VSync rate.
Texture ghosting on texture subloads causing poor performance and significant additional GPU memory consumption, potentially terminating your app.
A number of operations including use of sync objects triggering full pipeline flushes, causing rendering artifacts and massive performance slowdowns.
Needless full framebuffer-sized reads/writes from/to DRAM, which is horribly slow.
etc.


OpenGL is an ok-to-sometimes-poor abstraction for desktop GPUs, but the abstraction (OpenGL ES) is even worse for mobile GPUs. Both require special TLC to influence driver internals that you can't directly control but must sometimes know about.

Vulkan is a much better abstraction for both classes of GPU. Unfortunately, it's much less approachable for beginners.

Dark Photon
07-12-2017, 06:53 PM
Hello, I really do not understand why I am getting really bad [frame rate] on Android ... The game on android runs at about 50 fps however it dips down to 20 fps which makes the game feel really horrible ... I render everything the whole map in one mesh (batch) ...

I really have no idea why my frames a dipping when there is nothing much going on,

Ok. What are you changing that causes the performance to dip?

In other words, steady-state (no movement, same GL commands sent to the driver each frame, same contents of VBOs each frame), it will likely have a steady-state frame rate. Is that 50 fps? What are you doing that causes it to dip to 20 fps?


the phone I tested with was the Samsung Galaxy Ace 2 (Can run Dead Trigger and Gta) I also tried the Nexus 7 which i get even worse frames, yet the Nexus 7 is better??

I was going to ask: which GPUs? This implicitly answers it: ARM Mali-400 MP (circa 2008; OpenGL ES 2.0) for the Samsung Galaxy Ace 2.

And either an Adreno 320 (circa 2012; OpenGL ES 3.0-capable) or NVidia Tegra 3 (circa 2012; OpenGL ES 2.0), depending on whether you have a Nexus 7 (2013) or Nexus 7 (2012), respectively. Which do you have?


The shader has point and directional light which has a max of 15 lights per vertex

To quickly nail down the cause of the performance problem, I'd temporarily replace your vertex shader with a very simple one that just computes and writes gl_Position, and replace your frag shader with a very simple one that just writes out red as the gl_FragColor.

Any difference in perf? If not, move on...
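
A minimal sketch of that swap, assuming the attribute and uniform names from the shaders posted at the top of the thread (`a_position`, `u_projTrans`, `transformMatrix`). `DebugShaders` is a hypothetical holder class; the GLSL is kept as plain Java strings so it can be handed to whatever compiles the shaders.

```java
public final class DebugShaders {

    /** Vertex shader that only transforms the position: no lighting, no varyings. */
    public static final String VERTEX =
              "attribute vec4 a_position;\n"
            + "uniform mat4 u_projTrans;\n"
            + "uniform mat4 transformMatrix;\n"
            + "void main() {\n"
            + "    gl_Position = u_projTrans * transformMatrix * a_position;\n"
            + "}\n";

    /** Fragment shader that writes solid red: no texture lookup, no discard, no fog. */
    public static final String FRAGMENT =
              "#ifdef GL_ES\n"
            + "precision mediump float;\n"
            + "#endif\n"
            + "void main() {\n"
            + "    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);\n"
            + "}\n";

    private DebugShaders() {
    }
}
```

With libGDX these could be compiled via `new ShaderProgram(DebugShaders.VERTEX, DebugShaders.FRAGMENT)` and checked with `isCompiled()` and `getLog()`. Uniforms that the trimmed shaders no longer declare (lights, fog) would need their set calls removed, or `ShaderProgram.pedantic` set to `false`, for the test run.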


I have two textures, not including the font texture so basically 3. ... The main texture sheet is 256x256 and I use it as a sprite sheet for 16x16 textures

Are these created and uploaded once before your render loop begins, and not changed in your render loop? If so, move on...


mesh = new Mesh(false, vertices.length, indices.length, ...);

What are the values of vertices.length and indices.length for your two meshes? If not insanely large, move on...

(You might be surprised how large these can be on mobile and still get good performance if you drive the GPU properly.)


Render Code (Without uniforms)

mesh.setVertices(vertices);
mesh.setIndices(indices);
mesh.bind(shader);
Gdx.gl.glDrawElements(GL20.GL_TRIANGLES, mesh.getNumIndices(), GL20.GL_UNSIGNED_SHORT, 0);
mesh.unbind(shader);


This is actually the first thing that caught my eye as a performance problem.

The "0" for the pointer argument suggests you are using a VBO as the source of your index buffer (and I suspect you're also using VBO(s) as the source of your vertex arrays). It also indicates that you are changing the contents of those VBOs every frame.

Doing it the way you're doing it, the performance is going to be very poor on mobile. I can explain why if you want (for now you probably don't care), but suffice it to say that dynamic VBOs are tricky to make fast on mobile GPUs.

For now, to help you get to the root of your performance issue(s):


Try creating and uploading to your VBOs on startup only, not every frame.
In your draw loop, just bind your VBO(s) and then issue your glDrawElements. Don't change the contents of the VBOs!
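
The two steps above can be sketched independently of any graphics library as an "upload only when changed" wrapper. `UploadOnceBatch` and `Uploader` are hypothetical names; `Uploader` stands in for whatever actually copies data to the GPU (in the posted code, the `mesh.setVertices(...)` and `mesh.setIndices(...)` calls inside `flush`).

```java
public final class UploadOnceBatch {

    /** Hypothetical stand-in for the code that copies geometry into the VBOs. */
    public interface Uploader {
        void upload(float[] vertices, short[] indices);
    }

    private final Uploader uploader;
    private float[] vertices = new float[0];
    private short[] indices = new short[0];
    private boolean dirty;

    public UploadOnceBatch(Uploader uploader) {
        this.uploader = uploader;
    }

    /** Call at startup, or on the rare occasions the geometry really changes. */
    public void setGeometry(float[] vertices, short[] indices) {
        this.vertices = vertices;
        this.indices = indices;
        this.dirty = true;
    }

    /** Call every frame before drawing; touches the GPU buffers only if the data changed. */
    public void prepareForDraw() {
        if (dirty) {
            uploader.upload(vertices, indices);
            dirty = false;
        }
    }

    /** Number of indices to pass to the draw call. */
    public int indexCount() {
        return indices.length;
    }
}
```

For the static map this means one upload at startup, after which each frame only binds the mesh and issues the draw call.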

How does your performance look before and after? If better, try adding back in dynamic updates by just:


Submitting your vertex and index data to the GPU via client arrays (i.e. CPU pointers) each frame rather than via server arrays (aka VBOs).

This is a fairly tiny change over what you were doing before. Just provide CPU pointers to glVertexAttribPointer() and glDrawElements() rather than offsets into the bound VBOs, and don't bind those VBOs (just bind 0 instead, which unbinds whatever buffer was previously bound).

How's the performance now?

Also, it occurs to me to ask: are you clearing all of your framebuffer buffers at the beginning of the frame, and are you invalidating/discarding framebuffer buffers at the end of frame that you don't need anymore once your frame is displayed (usually depth and/or stencil, if present)? That'll save you some significant DRAM bandwidth/perf, particularly if your framebuffer resolution is large.

Also, you're only calling glBindFramebuffer once at startup (if at all), not during your rendering loop, right?

tommohawkaction
07-13-2017, 02:42 AM
Thanks for the response, a lot to consider... will get to work