I've started working on a 2D tile rendering system. So far it draws 8x8 tiles on the screen. At 640x480 resolution, there can be up to 4941 tiles on screen at a time, and the number quickly increases at higher resolutions. I'm looking for an efficient way to render them all.
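The 4941 figure appears to correspond to 81 x 61 tiles, that is, 80 x 60 whole tiles plus one extra column and row for tiles that are partly visible while scrolling. A minimal sketch of that arithmetic follows; the function names are made up for illustration, and the result is an upper bound meant for sizing buffers:

```c
#include <assert.h>

/* Upper bound on the tiles that can be at least partly visible along one
   axis of `pixels` pixels, for any scroll offset. Exact when `pixels` is a
   multiple of `tile_size`. */
static int tiles_along_axis(int pixels, int tile_size)
{
    assert(pixels > 0 && tile_size > 0);
    /* ceil(pixels / tile_size) tiles when the grid is aligned to the edge,
       plus one more when it is scrolled by part of a tile. */
    return (pixels + tile_size - 1) / tile_size + 1;
}

/* Upper bound on visible tiles for a width x height viewport. */
static int max_visible_tiles(int width, int height, int tile_size)
{
    return tiles_along_axis(width, tile_size) * tiles_along_axis(height, tile_size);
}
```

For 640x480 with 8-pixel tiles this gives 81 * 61 = 4941, and for 1920x1080 it gives 241 * 136 = 32776.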
[Attached image: TileRenderTest.jpg]

Additionally, each tile will be a solid color. On the application side, I have a vec4[MAX_COLORS] storing each color and I send an int colorIndex to each instance.

I'm using instanced rendering right now, but several things about my approach make me wonder whether it's inefficient. I'm storing the colors in a uniform array in the vertex shader, which I realize may not be the best way. I'm just getting the hang of OpenGL, so I'm also not sure whether instanced rendering is the right method here. Here's the relevant code:

Constant data, structs:
Code :
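static float colors[MAX_COLORS][4];

// Quad for one tile, ordered for GL_TRIANGLE_STRIP
static float tileRect[] = {
	0.0f, 0.0f,
	0.0f, TILE_SIZE,
	TILE_SIZE, 0.0f,
	TILE_SIZE, TILE_SIZE
};

// Per-instance data: one position and one color index per tile
static struct {
	vec2 positions[MAX_TILES];
	int colorIndex[MAX_TILES];
} tiles;

static struct {
	GLuint vertexShader;
	GLuint fragShader;
	GLuint programID;
	GLuint vao;
} tileShader;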

I bind the rectangle, positions, and color index as vertex attributes:
Code :
// One buffer per binding point: quad vertices, tile positions, color indices
glVertexArrayVertexBuffer(tileShader.vao, 0, buffers.rect, 0, 2 * sizeof(float));
glVertexArrayVertexBuffer(tileShader.vao, 1, buffers.position, 0, 2 * sizeof(float));
glVertexArrayVertexBuffer(tileShader.vao, 2, buffers.colIdxs, 0, 1 * sizeof(int));
// The last argument is a relative offset in bytes (a GLuint), so 0 rather than NULL
glVertexArrayAttribFormat(tileShader.vao, 0, 2, GL_FLOAT, GL_FALSE, 0);
glVertexArrayAttribFormat(tileShader.vao, 1, 2, GL_FLOAT, GL_FALSE, 0);
glVertexArrayAttribIFormat(tileShader.vao, 2, 1, GL_INT, 0);
// Divisor 0 advances per vertex; divisor 1 advances once per instance
glVertexArrayBindingDivisor(tileShader.vao, 0, 0);
glVertexArrayBindingDivisor(tileShader.vao, 1, 1);
glVertexArrayBindingDivisor(tileShader.vao, 2, 1);
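For a sense of scale, the strides passed to the two per-instance bindings above imply a fixed number of bytes per tile. The following is a rough sketch of that arithmetic only; the function names are hypothetical and the byte counts assume 4-byte float and int:

```c
#include <assert.h>
#include <stddef.h>

/* Per-instance attribute bytes for one tile: a 2-float position (binding 1)
   plus one int color index (binding 2). */
static size_t bytes_per_instance(void)
{
    return 2 * sizeof(float) + 1 * sizeof(int);
}

/* Total per-instance bytes across both instance buffers. */
static size_t instance_data_bytes(size_t tile_count)
{
    /* Guard against overflow in the multiplication below. */
    assert(tile_count <= (size_t)-1 / bytes_per_instance());
    return tile_count * bytes_per_instance();
}
```

Under those assumptions each tile costs 12 bytes, so 20,000 tiles need 240,000 bytes of instance data in total.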

Store the colors and transform as uniforms:
Code :
GLint cl = glGetUniformLocation(tileShader.programID, "colors");
glUniform4fv(cl, MAX_COLORS, &colors[0][0]); // expects a const GLfloat*, not a pointer to the 2D array
GLint mv = glGetUniformLocation(tileShader.programID, "modelview");
glUniformMatrix4fv(mv, 1, GL_FALSE, worldTransform);

The render loop:
Code :
while (!glfwWindowShouldClose(window))
{
	glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, MAX_TILES);
}

Vertex shader: (OpenGL 4.5)
Code :
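#version 450 core
uniform vec4 colors[2]; // array size should match MAX_COLORS in the application
uniform mat4 modelview;
layout(location = 0) in vec2 iPos;    // quad corner, per vertex
layout(location = 1) in vec2 iOff;    // tile position, per instance
layout(location = 2) in int iColIdx;  // color index, per instance
out vec4 vCol;
void main()
{
    gl_Position = modelview * vec4(iPos + iOff, 0.0f, 1.0f);
    vCol = colors[iColIdx];
}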

Other approaches I've considered:
Drawing the tiles:
- Draw the static tiles as point sprites.
- Send each tile's color directly to its instance instead of a color index.
- Perhaps there's a way to merge all the static quads into one large quad and render just that?
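One way the "one large quad" idea might work, assuming the tiles sit on a regular grid: draw a single screen-sized quad and have the fragment shader look up the tile under each pixel in a grid of color indices (an integer texture, for instance). The C sketch below only illustrates that per-pixel lookup on the CPU. The function name, parameters, and row-major grid layout are all hypothetical and not part of the code above:

```c
#include <assert.h>

/* Index into a row-major grid of per-tile color indices for the tile under
   screen pixel (px, py), given a scroll offset in pixels.
   Assumes the scrolled coordinates are non-negative and inside the grid. */
static int tile_index_at_pixel(int px, int py, int scroll_x, int scroll_y,
                               int tile_size, int grid_width)
{
    assert(tile_size > 0 && grid_width > 0);
    int world_x = px + scroll_x;
    int world_y = py + scroll_y;
    assert(world_x >= 0 && world_y >= 0);

    int col = world_x / tile_size;   /* tile column under the pixel */
    int row = world_y / tile_size;   /* tile row under the pixel */
    assert(col < grid_width);

    return row * grid_width + col;
}
```

With this kind of lookup the vertex count stays at four regardless of resolution, and the per-tile data shrinks to one index per grid cell. Whether that is actually faster than instancing would need to be measured.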

For storing all the colors, I'm considering either a 1D texture array or an SSBO.
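If the palette went into a texture, one option would be to store it as 8-bit RGBA texels rather than floats, assuming 8 bits per channel is enough precision for solid tile colors. The sketch below shows only the CPU-side conversion. The function names are hypothetical, and the input is the flat float data of the colors array (that is, &colors[0][0]):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert one float channel to 8 bits, clamping to [0, 1].
   NaN is treated as 0. */
static uint8_t channel_to_byte(float c)
{
    if (!(c > 0.0f)) return 0;
    if (c >= 1.0f) return 255;
    return (uint8_t)(c * 255.0f + 0.5f);   /* round to nearest */
}

/* Pack `count` float RGBA colors (4 floats each) into tightly packed
   RGBA8 texels. `out` must have room for count * 4 bytes. */
static void pack_palette_rgba8(const float *rgba, size_t count, uint8_t *out)
{
    assert(rgba != NULL && out != NULL);
    for (size_t i = 0; i < count * 4; ++i)
        out[i] = channel_to_byte(rgba[i]);
}
```

The resulting bytes are laid out the way an upload with format GL_RGBA and type GL_UNSIGNED_BYTE expects. An SSBO could instead hold the original float data unchanged.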

So my question is, for upwards of 20,000 tiles (at higher resolutions), is this method sufficient, or is there a better way to handle this?