Well, one of my hobbies is writing mesh exporters for 3D programs like 3ds Max, Maya, XSI, etc…
Working with both OpenGL and DirectX, I found a thing that drives me to despair… Imagine the following TEXTURED box:
As you can see, the cube has 8 vertex positions, but when you export it you get 24 different vertices… Why? Simple… the vertices share positions, but not normals nor UVs (notice the "Hi" texture applied to each face).
[img]http://www.santyesprogramadorynografista.net/cubeV2.png[/img]
For the front face, v2's texture coords are (1,1), while for the right face v2's texture coords are (0,1), so you need to DUPLICATE the vertex as:
pos=(1,1,0); UVs=(1,1)
pos=(1,1,0); UVs=(0,1)
This makes the mesh occupy more VRAM and also causes the shared vertex positions to be transformed twice or more, which degrades performance…
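To make the duplication concrete, here is a CPU-side sketch (in Python, with names I made up, not any real exporter's API) of what an exporter ends up doing: every unique (position, normal, UV) combination becomes its own final vertex, so the 8 cube positions fan out to 24 vertices:

```python
def build_vertex_buffer(corners):
    """corners: per-face-corner (pos_index, normal_index, uv_index) triples.
    Corners that share a position but differ in normal or UV become
    separate final vertices."""
    remap = {}      # attribute triple -> final vertex index
    vertices = []   # final unique vertex list
    indices = []    # draw indices into `vertices`
    for corner in corners:
        if corner not in remap:
            remap[corner] = len(vertices)
            vertices.append(corner)
        indices.append(remap[corner])
    return vertices, indices

# A cube: 6 quads over 8 positions; each face has its own normal index,
# and each face corner gets its own UV corner (0..3).
face_positions = [(0, 1, 2, 3), (4, 5, 6, 7), (0, 1, 5, 4),
                  (2, 3, 7, 6), (0, 3, 7, 4), (1, 2, 6, 5)]
corners = [(p, face, uv) for face, quad in enumerate(face_positions)
           for uv, p in enumerate(quad)]
vertices, indices = build_vertex_buffer(corners)
unique_positions = len({v[0] for v in vertices})
```

Only 8 distinct positions go in, but 24 vertices come out, and the GPU transforms each of them separately.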
Vertices are treated this way because graphics chips are very fast at applying linear algorithms without pointers, branching, random access, etc… But nowadays we can do some of these things on the GPU.
For the next shader generation this should change, and triangles should be processed in a different way. Imagine… in one call you do:
glPositionsArray(...) //Send all vertex positions
glNormalsArray(...) //Send all normals to GPU
glTextureCoordsArray(...) //Send all UVs to GPU
struct sINPUT_TRIANGLE
{
int idx; //Index of this triangle in the list
int vIndex[3]; //vertex indices referred to glPositionsArray
int normalIndex[3]; //normal indices referred to glNormalArray
int uvIndex[32][3]; //32 uv channels for multitexture
int adjIndex[3]; //indices of the 3 adjacent triangles that surround this one; -1 if none.
};
glTriangleIndices(sINPUT_TRIANGLE*, int nTris);
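The adjIndex field could be filled by the exporter (or driver) by matching shared edges. A minimal sketch, assuming triangles are given as position-index triples (the function name is mine):

```python
def compute_adjacency(tris):
    """For each triangle, find the neighbor triangle sharing each edge.
    Edge slot 0 is (v0,v1), slot 1 is (v1,v2), slot 2 is (v2,v0)."""
    edge_users = {}
    for ti, (a, b, c) in enumerate(tris):
        for slot, edge in enumerate(((a, b), (b, c), (c, a))):
            # Sort so (a,b) and (b,a) map to the same edge key
            edge_users.setdefault(tuple(sorted(edge)), []).append((ti, slot))
    adj = [[-1, -1, -1] for _ in tris]
    for users in edge_users.values():
        if len(users) == 2:  # exactly two triangles share this edge
            (t0, s0), (t1, s1) = users
            adj[t0][s0] = t1
            adj[t1][s1] = t0
    return adj

# Two triangles forming a quad share the edge (0, 2):
adjacency = compute_adjacency([(0, 1, 2), (0, 2, 3)])
```

For the quad above, triangle 0 sees triangle 1 across its third edge and vice versa; all other slots stay -1.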
struct sRASTER_PIXEL
{
vec3 pos;
vec2 uv[32];
vec3 norm;
...
vec4 customData[32]; //other user-defined extra data to be Gouraud/flat interpolated
}
struct sRASTER_TRIANGLE
{
sRASTER_PIXEL v[3];
}
Internally, the GPU does:
foreach ( Mesh mesh in vram.MeshesToDraw )
{
//Execute the user-defined topographic shader and draw resulting triangles
foreach ( sRASTER_TRIANGLE tri in unifiedShader::TopographicShade(mesh.Triangles) )
{
foreach ( sRASTER_PIXEL pix in renderer::drawTri(tri) )
{
unifiedShader::PixelShade(pix);
}
}
}
The user can define the following custom shader:
//User defined shader constants ( call glSetShaderConstant(string name, void* value) to set them)
mat4x4 g_objinvViewProjTM, g_objTM;
vec3 camDirNeg; //Camera direction negated for face culling
const vec3 inputPositions[]; //The GPU will put here the data from glPositionsArray
const vec3 inputNormals[]; //The GPU will put here the data from glNormalsArray
const vec2 inputUVs[32][]; //The GPU will put here the data from glTextureCoordsArray
sampler2D textureSampler[32]; //32 multitextures allowed
sRASTER_TRIANGLE[] unifiedShader::TopographicShade(sINPUT_TRIANGLE[] tris )
{
vec3[] triNormals;
vec3 vNormals[inputPositions.Count] = {0};
vec3 v20, v10, triN;
//Average normals using adjacency
foreach ( sINPUT_TRIANGLE t in tris )
{
v20 = normalize ( inputPositions[t.vIndex[2]] - inputPositions[t.vIndex[0]] );
v10 = normalize ( inputPositions[t.vIndex[1]] - inputPositions[t.vIndex[0]] );
triN = normalize(cross(v20,v10));
triNormals.Add(triN);
for ( int i=0; i<3; i++ )
{
vNormals[t.normalIndex[i]] += triN;
}
}
//Transform positions to clip space, normals to world space
vec4 tPositions[inputPositions.Count];
for ( int i=0; i<inputPositions.Count; i++ )
{
tPositions[i] = mul(g_objinvViewProjTM, inputPositions[i]);
}
vec3 tNormals[vNormals.Count];
for ( int i=0; i<vNormals.Count; i++ )
{
tNormals[i] = mul(g_objTM, vNormals[i]);
}
//Do back-face culling
sRASTER_TRIANGLE[] finalTriangleList;
sRASTER_TRIANGLE l_sTri;
foreach ( sINPUT_TRIANGLE t in tris )
{
if ( dot(triNormals[t.idx],camDirNeg) < 0.0 )
{
//Back-face found, skip it
continue;
}
//Output raster triangle
for ( int i=0; i<3; i++ )
{
l_sTri.v[i].pos = tPositions[t.vIndex[i]];
l_sTri.v[i].norm = tNormals[t.normalIndex[i]];
l_sTri.v[i].uv[0] = inputUVs[0][t.uvIndex[0][i]];
}
finalTriangleList.Add(l_sTri);
}
return finalTriangleList;
}
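The normal-averaging step above can be sanity-checked on the CPU. A Python sketch, with one deliberate difference that is my own assumption, not part of the proposal: I accumulate the unnormalized cross product so larger faces weigh more (the shader sketch normalizes the edge vectors first, which drops that area weighting):

```python
import math

def average_normals(positions, tris):
    """Smooth per-vertex normals: sum each face normal into every vertex
    the face touches, then normalize once at the end."""
    acc = [[0.0, 0.0, 0.0] for _ in positions]
    for a, b, c in tris:
        u = [positions[b][i] - positions[a][i] for i in range(3)]
        v = [positions[c][i] - positions[a][i] for i in range(3)]
        face = [u[1] * v[2] - u[2] * v[1],   # unnormalized cross(u, v),
                u[2] * v[0] - u[0] * v[2],   # so triangle area weights
                u[0] * v[1] - u[1] * v[0]]   # the contribution
        for vi in (a, b, c):
            for i in range(3):
                acc[vi][i] += face[i]
    result = []
    for n in acc:
        length = math.sqrt(sum(x * x for x in n)) or 1.0
        result.append(tuple(x / length for x in n))
    return result

# A single triangle in the XY plane: every vertex normal is +Z.
normals = average_normals([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(0, 1, 2)])
```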
vec4[6] unifiedShader::PixelShade ( sRASTER_PIXEL pix )
{
/*
Pixel shader like the ones we use at the moment, BUT with unlimited length, REAL early-out/break/continue/return branching,
constant/texture sample indexing allowed, etc... The unified shader lets us use ANY instruction... the GPU
will be like a BIG SIMD calculator, allowing normalize(), sincos(), constant[index],
for/do/while... with NO LIMIT at all (except pointers, perhaps).
Notice we return a vec4[6] color array, allowing us to write cube-map faces too and letting
us specify a FLT_INFINITE value if we don't want to write anything to one MRT.
*/
vec4 ret[6];
ret[0] = texture2D(textureSampler[0], pix.uv[0]); //color to multiple render target 0
ret[1] = ret[2] = ret[3] = ret[4] = ret[5] = FLT_INFINITE; //infinite = don't write
return ret;
}
Also, it would be GOOD to save the "raster triangles" that the "topographic shader" outputs in a VRAM cache for multipass algorithms, so there is no need to re-transform the vertices. For example:
int cachedTransformedTrianglesHandler = glDrawArrays(....); //this "shades" the triangles, draws the mesh using an indexed triangle list and "caches" the raster triangles in an internal VRAM buffer...
glRedrawMultipassArrays(cachedTransformedTrianglesHandler); //this re-uses the previously drawn raster triangles, skipping the need to re-transform all the vertices again...
glEraseCache(cachedTransformedTrianglesHandler);//free transformed triangles cache
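A toy CPU-side model of what that cache API could do (everything here is hypothetical: the handle, the dictionary, the transform counter; a real driver would keep the buffer in VRAM):

```python
class TriangleCache:
    """Caches transformed triangles per draw call so multipass
    rendering can skip re-transforming the vertices."""
    def __init__(self):
        self._next_handle = 1
        self._cache = {}
        self.transforms = 0   # counts vertex transforms, to show the savings

    def draw(self, positions, transform):
        """First pass: transform ("topographic shade"), then cache."""
        shaded = [transform(p) for p in positions]
        self.transforms += len(positions)
        handle = self._next_handle
        self._next_handle += 1
        self._cache[handle] = shaded
        return handle

    def redraw(self, handle):
        """Later passes: reuse the cached triangles, zero new transforms."""
        return self._cache[handle]

    def erase(self, handle):
        """Free the transformed-triangle cache for this handle."""
        del self._cache[handle]

cache = TriangleCache()
offset = lambda p: (p[0] + 1.0, p[1], p[2])   # stand-in for the real transform
h = cache.draw([(0, 0, 0), (1, 0, 0), (0, 1, 0)], offset)
first = cache.transforms   # 3 transforms for the first pass
cache.redraw(h)            # second pass: cache hit
cache.redraw(h)            # third pass: still no new transforms
second = cache.transforms
cache.erase(h)
```

The point the counter makes: extra passes cost zero vertex transforms once the first pass has been cached.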
Notice this way of storing/processing data is very efficient and customizable, skips the need to implement a post-transform vertex cache and allows tons of things we cannot do at the moment…
What do you think about all this?