Improve Loading Times

I’m optimizing my content pipeline and wanted to know if there are any ways I can reduce loading times even further.

My current solution loads the following in 10 milliseconds (1/100th of a second):

  • mesh (100KB)
  • collision mask (200KB) (pre-cooked physx collision objects)
  • diffuse/colour map (683KB) (pre-compressed DXT1) (pre-generated mip maps) (1024x1024)
  • combined normal & specular map (683KB) (pre-compressed DXT1) (pre-generated mip maps) (1024x1024)
    NB: all resources are packed as one file to reduce seek times

What had the biggest impact on loading the data:

  • transfer speed of data from RAM to graphics memory (2.5ms per 1024x1024 texture; texture size has a huge impact here)
  • transfer speed of data from the hard drive to RAM (2ms)

My texture class has an LOD setting which reduces loading times by passing less data to the graphics card:
LOD (0) - 10ms (4MB usage) (1024x1024)
LOD (1) - 8ms (1MB usage) (512x512)
LOD (2) - 5ms (0.25MB usage) (256x256)
NB: usage (video memory) includes a mesh and two DXT1-compressed textures. The PhysX 3.0 collision mask is stored in RAM, so it is excluded from usage.

Compression considerations:

  • only viable if multi-threading and/or a commercial decompression library is used; otherwise compression can double the hard-drive-to-RAM loading time with the free, fast decompression libraries, and is even slower with the slower ones (see the sketch below).
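
A minimal sketch of the multi-threaded case, using std::thread from C++0x/C++11; ReadWholeFile and Decompress are placeholders standing in for the engine's own one-shot file read and for whatever decompression library is used, not real APIs:

#include <thread>
#include <vector>

std::vector<char> ReadWholeFile(const char* path);                    // placeholder: one-shot file read
void Decompress(const std::vector<char>& in, std::vector<char>& out); // placeholder: library call

void LoadCompressedAsset()
{
	std::vector<char> compressed = ReadWholeFile("assets.pak");	// I/O-bound
	std::vector<char> decompressed;

	// CPU-bound decompression runs on a worker thread...
	std::thread worker([&]() { Decompress(compressed, decompressed); });

	// ...so the loading thread is free to start reading the next file here.

	worker.join();
}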

New Research:
A new technique increases the precision of normal maps, allowing me to store them as DXT1 with far fewer artifacts.

(Image comparison.) Left: uncompressed. Middle: DXT1 compressed; notice the artifacts caused by compression. Right: DXT1 compression using the new technique; notice the reduction in artifacts.

If anyone knows of any additional ways I can improve the loader, please share your advice.

Hell, I will try to help you, so:
>- mesh (100KB)
How do you load the vertex buffer and index buffer for the mesh?
Is this a static mesh, or does it use skeletal animation?
You should load both buffers straight into video memory, without an extra memory copy in between, for example:

// assumes the buffers are already bound and allocated with glBufferData(..., NULL, ...)
size_t size = vertexDataSize;	// size of the vertex data stored in the file

void* data = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
file.Read(data, size);	// read straight into the mapped buffer, no intermediate copy
glUnmapBuffer(GL_ARRAY_BUFFER);

size = indexDataSize;	// size of the index data stored in the file
data = glMapBuffer(GL_ELEMENT_ARRAY_BUFFER, GL_WRITE_ONLY);
file.Read(data, size);
glUnmapBuffer(GL_ELEMENT_ARRAY_BUFFER);

>- collision mask (200KB) (pre-cooked physx collision objects)
How do you load this data?

You should store all non-graphics resources in one common address space in RAM; this helps reduce cache misses.
Try to use “in-place loading”:
http://www.gamasutra.com/view/feature/1565/fast_file_loading_pt_1.php
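
Roughly, in-place loading in the spirit of that article looks like the sketch below; LoadWholeFileIntoMemory and the blob layout are illustrative, not taken from the article:

// the file is written out in the exact layout of the final in-memory structures,
// so after one big read only the stored offsets are patched into pointers, with no parsing or copying
struct MeshBlob
{
	unsigned int vertexDataOffset;	// byte offset from the start of the blob
	unsigned int indexDataOffset;
	// ...the rest of the mesh data follows in the same allocation
};

char* LoadWholeFileIntoMemory(const char* path);	// hypothetical one-shot read

char*     blob     = LoadWholeFileIntoMemory("mesh.bin");
MeshBlob* mesh     = reinterpret_cast<MeshBlob*>(blob);
void*     vertices = blob + mesh->vertexDataOffset;	// pointer fix-up
void*     indices  = blob + mesh->indexDataOffset;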

The project I’m working on is not using any animated geometry. Here is the code I use to load my mesh.

cVAO(int size, int num, GLvoid *data, int size2, int num2, GLvoid *data2)
	: _numIndices(num2)
{
	glGenVertexArrays(1, &vao);
	glBindVertexArray(vao);

	// vertex buffer: size = bytes per vertex, num = vertex count
	glGenBuffers(1, &ao);
	glBindBuffer(GL_ARRAY_BUFFER, ao);
	glBufferData(GL_ARRAY_BUFFER, size*num, data, GL_STATIC_DRAW);

	// index buffer: size2 = bytes per index, num2 = index count
	glGenBuffers(1, &eao);
	glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, eao);
	glBufferData(GL_ELEMENT_ARRAY_BUFFER, size2*num2, data2, GL_STATIC_DRAW);
}
vao = new cVAO(sizeof(cMeshVertex), vertexCount, (void*)vertices, sizeof(c_count), indexCount, (void*)indices);

vao->enableVertexAttribute(0);
vao->vertexAtrribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(cMeshVertex), 0);
vao->enableVertexAttribute(1);
vao->vertexAtrribPointer(1, 2, GL_FLOAT, GL_FALSE, sizeof(cMeshVertex), (void*)((c_uint)vertices->texCoord-(c_uint)vertices) );
vao->enableVertexAttribute(2);
vao->vertexAtrribPointer(2, 3, GL_FLOAT, GL_FALSE, sizeof(cMeshVertex), (void*)((c_uint)vertices->normal-(c_uint)vertices) );

This is not a concern since it’s loaded by the PhysX API.

Seems like a good idea, will give it a look.

Here’s the sample code from the texture loader, which focuses on loading a compressed texture. “bin” is used to read from and write to a memory stream.

int mipmaps = bin.readInt();	// number of mip levels stored in the file
int w = width;
int h = height;

// the stored dimensions become those of the first mip level that is actually kept
if (lod)
{
	width >>= lod;
	height >>= lod;
}

for (int i = 0; i <= mipmaps; ++i)
{
	if (w == 0) w = 1;
	if (h == 0) h = 1;

	int size = bin.readInt();	// byte size of this mip level
	if (i >= lod)
	{
		bool assign_new_mem = false; // NEW: gives me an extra millisecond
		c_byte *img = bin.read(size, assign_new_mem); // do not allocate new memory, just return a pointer into bin's own data

		glCompressedTexImage2D(GL_TEXTURE_2D, i-lod, internalformat, w, h, 0, size, (GLvoid*)img);
	}
	else
		bin.get += size;	// skipped level: just advance the read cursor

	w >>= 1;
	h >>= 1;
}

Any ways to improve on that?

>vao = new cVAO(

Oh, why is there a “new” here?
You should use less dynamic memory allocation. If the VAO is contained in a C++ class, try to create it there without operator “new”.
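
For example, a minimal sketch, assuming cMesh owns the cVAO (the class layout here is illustrative, not the actual code from this thread):

class cMesh
{
public:
	cMesh(int size, int num, GLvoid* data, int size2, int num2, GLvoid* data2)
		: vao(size, num, data, size2, num2, data2)	// constructed in place, no heap allocation
	{
	}

private:
	cVAO vao;	// stored by value: created and destroyed together with cMesh
};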

>Here’s the sample code of the texture loader which focuses on loading a compressed texture. “bin” is used to read and write from a memory stream.

>int size = bin.readInt();

You should create “a header”.
The header should contain all the simple data of the mesh and materials: number of mips, dimensions of the textures, and so on. That way you can load all of this data in a single read, without lots of separate small reads from the file.
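
Something like the rough sketch below; the field names are only illustrative, not a real format, and file.Read is the same kind of raw read as in the earlier example:

#pragma pack(push, 1)
struct AssetHeader
{
	unsigned int vertexCount;
	unsigned int indexCount;
	unsigned int textureWidth;
	unsigned int textureHeight;
	unsigned int mipCount;
	unsigned int mipSizes[16];	// byte size of each stored mip level
};
#pragma pack(pop)

AssetHeader header;
file.Read(&header, sizeof(header));	// one read for all of the metadata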

bin - what does this class use internally?
What is “a memory stream”?
Do you load the whole file into memory?

Ok will do that. How much does this impact performance?

bin is the class cBinary, which manages a byte array. It can read and write data in the byte array.

Memory stream is dynamically allocated memory that I’m navigating through.

Files are loaded into memory as a byte array and bin is used to access data from it.

I could probably use precise placement of variables so that I can load my cTexture member variables faster, such as:

memcpy(this, bin.read(size), size);

this - pointer to cTexture class
size - size in bytes of the variable data being transferred from the byte array to the cTexture class
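
As a rough sketch of that idea (the struct and field names are illustrative, and it only works if the copied fields are plain-old-data, with no virtual functions or owning pointers):

struct cTextureData	// hypothetical POD block holding the serialized fields
{
	int width;
	int height;
	int mipmaps;
	int internalformat;
};

// inside the texture loader, assuming cTexture has a cTextureData member named data:
memcpy(&data, bin.read(sizeof(data)), sizeof(data));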

I’ve got another two classes which deal with reading and writing files; these are:
cBinWriter - easy way to write binary files
cFileStream - simple fstream wrapper

>Ok will do that. How much does this impact performance?

Out of curiosity, how do you plan on doing that? Removing the dynamic allocation? Presumably cVAO contains an OpenGL VAO object. If you’re RAII-wrapping it, then you’ll have problems with copying it (unless you are using C++0x move operations, or the horrible std::auto_ptr cheat, the latter of which you should never use).
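
For reference, a minimal sketch of what a move-only RAII wrapper around the GL handle could look like with C++0x move operations (illustrative, not the cVAO from this thread):

class GLVertexArray
{
public:
	GLVertexArray()  { glGenVertexArrays(1, &_vao); }
	~GLVertexArray() { if (_vao) glDeleteVertexArrays(1, &_vao); }

	// moving transfers ownership of the GL handle and leaves the source empty
	GLVertexArray(GLVertexArray&& other) : _vao(other._vao) { other._vao = 0; }
	GLVertexArray& operator=(GLVertexArray&& other)
	{
		if (this != &other)
		{
			if (_vao) glDeleteVertexArrays(1, &_vao);
			_vao = other._vao;
			other._vao = 0;
		}
		return *this;
	}

private:
	GLVertexArray(const GLVertexArray&);	// copying disabled (declared, never defined)
	GLVertexArray& operator=(const GLVertexArray&);

	GLuint _vao;
};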

Anyway, your disk access is your bottleneck; dynamic allocations are meaningless at this point. Don’t worry about it until it starts becoming an actual problem.

>cFileStream - simple fstream wrapper

Again, out of curiosity, why did you feel the need to wrap std::fstream? It’s a C++ standard library class; everyone has one.

The cVAO vao is located in a cMesh object, which is located in a cModel object, which means I usually pass a cMesh or cModel* around. But you’re right, I should just leave it as a pointer.

To my surprise it does run slightly faster when it doesn’t need to dynamically allocate large pieces of memory. It seems to read very fast from the hard drive, but I’m not sure whether clock() is just a poor way to measure hard-drive read speeds and is causing inaccuracy.

I use it mostly to read and write binary files, so it has additional features that speed the process up by making it easier. The cBinWriter is a much easier class to use, though, because it lets me write a binary string to a file or allocate it as dynamic data to use with cBinary, so I don’t make use of cFileStream any more. The reason I have these three classes for managing binary data is that I previously worked on a game maker. I also had to write classes that deal with archives, which allowed all your game resources to be stored as a single file, and without these classes it would have taken me forever to read and write binary files.

It’s a general rule with disk access and memory allocations that “lots of small accesses/allocations perform extremely poorly, very few large allocations/accesses perform well” (or at least as close to optimal as possible given the circumstances).

So if you’re reading files you’ll do better if you read the entire file into memory in one big chunk, then parse what you need out of the in-memory copy. If you’re allocating objects in memory you’ll do better if you allocate one huge chunk of memory (a virtual memory API is really useful here) and draw down from that as required.
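
As a minimal sketch of the first point, using only the standard library (cBinary in this thread would wrap the same idea):

#include <fstream>
#include <vector>

std::vector<char> ReadWholeFile(const char* path)
{
	std::ifstream file(path, std::ios::binary | std::ios::ate);	// open at the end to learn the size
	std::streamsize size = file.tellg();
	std::vector<char> buffer(size > 0 ? (size_t)size : 0);
	file.seekg(0, std::ios::beg);
	if (!buffer.empty())
		file.read(&buffer[0], size);	// one large read instead of many small ones
	return buffer;	// parse individual fields out of this in-memory copy afterwards
}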

A lot of the OO/C++ stuff can hide this behind various abstractions so you need to be sure that you know what the classes you’re using are doing behind the scenes before you make the decision to use them.

You’re never going to get 100% optimal performance on either, and the best you can do is choose what works best for you given your circumstances, constraints, and which bunch of tradeoffs are least objectionable to you.