Part of the Khronos Group
OpenGL.org

Thread: DrawXInstancedTransformFeedback

  1. #1
    Intern Contributor
    Join Date
    Apr 2010
    Posts
    68

    DrawXInstancedTransformFeedback

    Since ARB_transform_feedback_instanced, it is possible to draw multiple instances of transform feedback data without using a query and the resulting round trip from server to client. The primcount must be specified by the client, while the count is read from the transform feedback object. Being able to do it the other way round would be a nice addition, especially for instance cloud reduction algorithms: we know what we need to render (a mesh), but we don't know the number of instances for the current frame (because we're doing per-instance view frustum culling on the GPU, for example).

    So here's what I quickly came up with: two new instanced drawing functions that use the result stored in a transform feedback object as the primcount parameter.
    - DrawArraysInstancedTransformFeedback(enum mode, int first, sizei count, uint id);
    - DrawElementsInstancedTransformFeedback(enum mode, sizei count, enum type, const void* indices, uint id);

    I think their behaviour is self-explanatory, so I won't go into detail. The parameters are the same as those of the standard functions, except that the primcount parameter is replaced by the name of the TF object.



  2. #2
    Senior Member OpenGL Lord
    Join Date
    May 2009
    Posts
    5,394

    Re: DrawXInstancedTransformFeedback

    What you're asking for doesn't make sense. You want to use transform feedback to somehow produce a count of instances to render. How would that work? What would your shader have to look like to generate a count?

  3. #3
    Intern Contributor
    Join Date
    Apr 2010
    Posts
    68

    Re: DrawXInstancedTransformFeedback

    1/ You have an array of matrices; each matrix is an instance of a mesh.
    2/ Use transform feedback to perform culling in a geometry shader: you get another array of matrices. You also have the number of generated primitives stored in the transform feedback object.
    3/ Draw your meshes with instancing, using the culled matrix array as per-instance data (glVertexAttribDivisor) and the number of generated primitives stored in the transform feedback object as the primcount in one of the functions I suggested.

    Currently the only solution is to use a query to get the result. With my suggestion, we can do this asynchronously.
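
As a CPU-side reference for what steps 1/ to 3/ compute, here is a minimal sketch in C. It is not the GPU path discussed in the thread: `Mat4`, `VisibleFn`, `cull_instances` and `in_front_of_x_plane` are hypothetical names introduced only for illustration. The returned count plays the role of the primitive counter held by the transform feedback object, i.e. the value the proposed functions would use as primcount.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not part of the thread's code. */
typedef struct { float m[16]; } Mat4;            /* column-major instance matrix */
typedef int (*VisibleFn)(const Mat4 *instance);  /* nonzero if the instance is visible */

/* Copies every visible instance from src to dst, preserving order, and
 * returns how many were written. dst must have room for n matrices. */
size_t cull_instances(const Mat4 *src, size_t n, Mat4 *dst, VisibleFn visible)
{
    assert(src != NULL && dst != NULL && visible != NULL);
    size_t written = 0;
    for (size_t i = 0; i < n; ++i) {
        if (visible(&src[i])) {
            dst[written++] = src[i];
        }
    }
    return written;
}

/* Example predicate (an assumption, standing in for a real frustum test):
 * keep instances whose translation has a non-negative x component. */
int in_front_of_x_plane(const Mat4 *instance)
{
    return instance->m[12] >= 0.0f;
}
```

On the GPU, the point of the proposal is that this count never has to travel back to the client before the instanced draw is issued.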

  5. #5
    Advanced Member Frequent Contributor
    Join Date
    Dec 2007
    Location
    Hungary
    Posts
    989

    Re: DrawXInstancedTransformFeedback

    Quote Originally Posted by _blitz
    1/ You have an array of matrices; each matrix is an instance of a mesh.
    2/ Use transform feedback to perform culling in a geometry shader: you get another array of matrices. You also have the number of generated primitives stored in the transform feedback object.
    3/ Draw your meshes with instancing, using the culled matrix array as per-instance data (glVertexAttribDivisor) and the number of generated primitives stored in the transform feedback object as the primcount in one of the functions I suggested.

    Currently the only solution is to use a query to get the result. With my suggestion, we can do this asynchronously.
    That wouldn't work that way. I know it because I'm the author of the article you've linked. Transform feedback renders the captured data as the primitive type you specify. The problem is that the result of the transform feedback is the instance data buffer and you don't want to feed it back that way. You cannot even use indexed triangles (DrawElements*) this way.

    What we need in order to implement the algorithm you described, which I have also investigated, is the ability to take the num_instances field of an instanced draw command from a buffer object. That would naturally be an extension of the existing indirect drawing functionality: a MultiDrawElementsIndirect-style command that takes its num_instances parameter from a buffer filled earlier by the culling phase using atomic counters. I have already proposed such an idea to NVIDIA and AMD. AMD implemented part of the proposal via AMD_multi_draw_indirect; however, even though the latter provides MultiDrawElementsIndirect for executing multiple indirect draw commands, the num_instances parameter is still taken from the client side.
    Disclaimer: This is my personal profile. Whatever I write here is my personal opinion and none of my statements or speculations are anyhow related to my employer and as such should not be treated as accurate or valid and in no case should those be considered to represent the opinions of my employer.
    Technical Blog: http://www.rastergrid.com/blog/

  6. #6
    Intern Contributor
    Join Date
    Apr 2010
    Posts
    68

    Re: DrawXInstancedTransformFeedback

    Quote Originally Posted by aqnuep
    That wouldn't work that way. I know it because I'm the author of the article you've linked. Transform feedback renders the captured data as the primitive type you specify. The problem is that the result of the transform feedback is the instance data buffer and you don't want to feed it back that way.
    Actually I do want to feed it back that way.

    Quote Originally Posted by aqnuep
    What we need in order to implement the algorithm you described, which I have also investigated, is the ability to take the num_instances field of an instanced draw command from a buffer object. That would naturally be an extension of the existing indirect drawing functionality: a MultiDrawElementsIndirect-style command that takes its num_instances parameter from a buffer filled earlier by the culling phase using atomic counters. I have already proposed such an idea to NVIDIA and AMD. AMD implemented part of the proposal via AMD_multi_draw_indirect; however, even though the latter provides MultiDrawElementsIndirect for executing multiple indirect draw commands, the num_instances parameter is still taken from the client side.
    The solution you're talking about is not what I'm describing in my suggestion.

    To make it clear once and for all, here's the code I'd like to be able to write:
    Code :
    void init()
    {
    	glBindVertexArray(VERTEX_ARRAY_PER_INSTANCE_DATA);
    		glEnableVertexAttribArray(0); // per instance matrix column 0
    		glEnableVertexAttribArray(1); // per instance matrix column 1
    		glEnableVertexAttribArray(2); // per instance matrix column 2
    		glEnableVertexAttribArray(3); // per instance matrix column 3
    		glBindBuffer(GL_ARRAY_BUFFER, BUFFER_PER_INSTANCE_DATA);
    		glVertexAttribPointer(0, 4, GL_FLOAT, 0, sizeof(mat4), BUFFER_OFFSET(0));
    		glVertexAttribPointer(1, 4, GL_FLOAT, 0, sizeof(mat4), BUFFER_OFFSET(  sizeof(vec4)));
    		glVertexAttribPointer(2, 4, GL_FLOAT, 0, sizeof(mat4), BUFFER_OFFSET(2*sizeof(vec4)));
    		glVertexAttribPointer(3, 4, GL_FLOAT, 0, sizeof(mat4), BUFFER_OFFSET(3*sizeof(vec4)));
    	glBindVertexArray(VERTEX_ARRAY_RENDER);
    		glEnableVertexAttribArray(0); // vertex position of the instanced mesh
    		glEnableVertexAttribArray(1); // per instance matrix column 0
    		glEnableVertexAttribArray(2); // per instance matrix column 1
    		glEnableVertexAttribArray(3); // per instance matrix column 2
    		glEnableVertexAttribArray(4); // per instance matrix column 3
    		glBindBuffer(GL_ARRAY_BUFFER, BUFFER_MESH_VERTICES);
    		glVertexAttribPointer(0, 3, GL_FLOAT, 0, 0, BUFFER_OFFSET(0));
    		glBindBuffer(GL_ARRAY_BUFFER, BUFFER_PER_INSTANCE_DATA_CULLED);
    		// matrix columns go to attributes 1..4; attribute 0 is the mesh vertex position
    		glVertexAttribPointer(1, 4, GL_FLOAT, 0, sizeof(mat4), BUFFER_OFFSET(0));
    		glVertexAttribPointer(2, 4, GL_FLOAT, 0, sizeof(mat4), BUFFER_OFFSET(  sizeof(vec4)));
    		glVertexAttribPointer(3, 4, GL_FLOAT, 0, sizeof(mat4), BUFFER_OFFSET(2*sizeof(vec4)));
    		glVertexAttribPointer(4, 4, GL_FLOAT, 0, sizeof(mat4), BUFFER_OFFSET(3*sizeof(vec4)));
    		glVertexAttribDivisor(1, 1);
    		glVertexAttribDivisor(2, 1);
    		glVertexAttribDivisor(3, 1);
    		glVertexAttribDivisor(4, 1);
    		glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, BUFFER_MESH_INDEXES);
    	glBindVertexArray(0);
    }
     
    void cullPass()
    {
    	glUseProgram(PROGRAM_CULL);
    	glBindTransformFeedback(GL_TRANSFORM_FEEDBACK, TRANSFORM_FEEDBACK_CULL);
    	glBeginTransformFeedback(GL_POINTS);
    		glBindVertexArray(VERTEX_ARRAY_PER_INSTANCE_DATA);
    		glDrawArrays(GL_POINTS, 0, INSTANCE_COUNT);
    	glEndTransformFeedback();
    }
     
    void renderPass()
    {
    	glUseProgram(PROGRAM_RENDER);
    	glBindVertexArray(VERTEX_ARRAY_RENDER);
    	// NEW : use the tf's counter to specify primcount
    	glDrawElementsInstancedTransformFeedback( GL_TRIANGLES,
    	                                          mesh.count, 
    	                                          GL_UNSIGNED_SHORT, 
    	                                          BUFFER_OFFSET(0), 
    	                                          TRANSFORM_FEEDBACK_CULL );
    }
    And while I'm at it some shader code
    Cull:
    Code :
    #version 430 core
     
    // usual culling stuff
    uniform vec3 u_instanceBoxMin;
    uniform vec3 u_instanceBoxMax;
    layout(std140) uniform FrustumPlanes {
    	vec4 u_frustumPlanes[6]; // view frustum planes
    };
     
    /////////////////////////////////////////////////
    // Vertex Shader
    layout(location = 0) in vec4 i_perInstanceMeshMatrixCol0;
    layout(location = 1) in vec4 i_perInstanceMeshMatrixCol1;
    layout(location = 2) in vec4 i_perInstanceMeshMatrixCol2;
    layout(location = 3) in vec4 i_perInstanceMeshMatrixCol3;
     
    layout(location = 0) out vec4 o_perInstanceMeshMatrixCol0;
    layout(location = 1) out vec4 o_perInstanceMeshMatrixCol1;
    layout(location = 2) out vec4 o_perInstanceMeshMatrixCol2;
    layout(location = 3) out vec4 o_perInstanceMeshMatrixCol3;
    layout(location = 4) flat out int o_isVisible;
     
    void main()
    {
    	mat4 modelMatrix = mat4( i_perInstanceMeshMatrixCol0,
    	                         i_perInstanceMeshMatrixCol1,
    	                         i_perInstanceMeshMatrixCol2,
    	                         i_perInstanceMeshMatrixCol3 );
     
    	// set varyings
    	o_perInstanceMeshMatrixCol0 = i_perInstanceMeshMatrixCol0;
    	o_perInstanceMeshMatrixCol1 = i_perInstanceMeshMatrixCol1;
    	o_perInstanceMeshMatrixCol2 = i_perInstanceMeshMatrixCol2;
    	o_perInstanceMeshMatrixCol3 = i_perInstanceMeshMatrixCol3;
     
    	// compute AABB and test against view frustum planes
    	vec3 aabbVertices[8];
    	// ...
     
    	// if the AABB is intersecting or inside the view frustum
    	o_isVisible = 1;
    }
     
    /////////////////////////////////////////////////
    // Geom Shader
     
    layout(points) in;
    layout(location = 0) in vec4 i_perInstanceMeshMatrixCol0[1];
    layout(location = 1) in vec4 i_perInstanceMeshMatrixCol1[1];
    layout(location = 2) in vec4 i_perInstanceMeshMatrixCol2[1];
    layout(location = 3) in vec4 i_perInstanceMeshMatrixCol3[1];
    layout(location = 4) flat in int i_isVisible[1];
     
    layout(points, max_vertices = 1) out;
    layout(location = 0, stream = 0) out vec4 o_perInstanceMeshMatrixCol0;
    layout(location = 1, stream = 0) out vec4 o_perInstanceMeshMatrixCol1;
    layout(location = 2, stream = 0) out vec4 o_perInstanceMeshMatrixCol2;
    layout(location = 3, stream = 0) out vec4 o_perInstanceMeshMatrixCol3;
     
    void main()
    {
    	if(1 == i_isVisible[0])
    	{
    		o_perInstanceMeshMatrixCol0 = i_perInstanceMeshMatrixCol0[0];
    		o_perInstanceMeshMatrixCol1 = i_perInstanceMeshMatrixCol1[0];
    		o_perInstanceMeshMatrixCol2 = i_perInstanceMeshMatrixCol2[0];
    		o_perInstanceMeshMatrixCol3 = i_perInstanceMeshMatrixCol3[0];
     
    		EmitVertex();
    		EndPrimitive();
    	}
    }
    Render:
    Code :
    #version 430 core
     
    uniform mat4 u_viewMatrix;
    uniform mat4 u_projectionMatrix;
     
    /////////////////////////////////////////////////
    // Vertex shader
    layout(location = 0) in vec3 i_meshVertex;
    layout(location = 1) in vec4 i_perInstanceMeshMatrixCol0;
    layout(location = 2) in vec4 i_perInstanceMeshMatrixCol1;
    layout(location = 3) in vec4 i_perInstanceMeshMatrixCol2;
    layout(location = 4) in vec4 i_perInstanceMeshMatrixCol3;
     
    void main()
    {
    	mat4 modelMatrix = mat4( i_perInstanceMeshMatrixCol0,
    	                         i_perInstanceMeshMatrixCol1,
    	                         i_perInstanceMeshMatrixCol2,
    	                         i_perInstanceMeshMatrixCol3 );
    	mat4 modelViewProjection = u_projectionMatrix * (u_viewMatrix * modelMatrix);
    	gl_Position = modelViewProjection * vec4(i_meshVertex, 1.0);
    }
     
    /////////////////////////////////////////////////
    // Fragment shader
    layout(location = 0) out vec4 o_color;
     
    void main()
    {
    	o_color = vec4(1.0);
    }
    In the end, I'm suggesting that the counter of a transform feedback object be usable for something other than the number of vertices in a GL draw call, more specifically as the number of instances in an instanced rendering scenario. It seems feasible to me and would allow more asynchronous behaviour in instanced rendering algorithms.

    @aqnuep The multi draw arrays solution you talk about comes in handy when you have different geometry/meshes to instantiate. My scenario assumes that we're using multiple instances of one single mesh.
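
The frustum test that the cull shader leaves out (the `// ...` after `aabbVertices`) is not given in the thread. One common way to do such a test is sketched below in C, purely as an assumption about what it might look like: it takes an already computed world-space box rather than eight transformed corners, and it assumes each plane is stored as (a, b, c, d) with points inside the frustum satisfying a*x + b*y + c*z + d >= 0. `Vec3`, `Plane` and `aabb_intersects_frustum` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;
typedef struct { float a, b, c, d; } Plane; /* inside: a*x + b*y + c*z + d >= 0 */

/* Returns 1 if the box [bmin, bmax] is inside or intersects the frustum,
 * 0 if it lies entirely behind at least one plane. The test is conservative:
 * a box outside the frustum near a corner can still be reported as visible. */
int aabb_intersects_frustum(Vec3 bmin, Vec3 bmax, const Plane planes[6])
{
    assert(planes != NULL);
    for (int i = 0; i < 6; ++i) {
        const Plane *p = &planes[i];
        /* Pick the box corner that lies furthest along the plane normal. */
        float px = (p->a >= 0.0f) ? bmax.x : bmin.x;
        float py = (p->b >= 0.0f) ? bmax.y : bmin.y;
        float pz = (p->c >= 0.0f) ? bmax.z : bmin.z;
        /* If even that corner is behind the plane, the whole box is outside. */
        if (p->a * px + p->b * py + p->c * pz + p->d < 0.0f) {
            return 0;
        }
    }
    return 1;
}
```

The same loop translates almost line for line to GLSL, with `u_frustumPlanes` as the plane array.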

  7. #7
    Advanced Member Frequent Contributor
    Join Date
    Dec 2007
    Location
    Hungary
    Posts
    989

    Re: DrawXInstancedTransformFeedback

    Ah, now I know what you mean. But this is something that is already possible via ARB_draw_indirect and ARB_shader_atomic_counters.

    Since with indirect drawing the primcount parameter already comes from a buffer object, all you have to do is make the primcount field of the indirect draw command buffer the backing storage of the atomic counter and increment the atomic counter in the geometry shader.
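
To make the "primcount field of the indirect draw command" concrete, here is a small C sketch of the command layout as defined by ARB_draw_indirect. The struct fields follow the extension; `PRIMCOUNT_OFFSET` and `init_indirect_command` are hypothetical names added for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field order as given in ARB_draw_indirect: five tightly packed 32-bit words. */
typedef struct {
    uint32_t count;              /* number of indices per instance */
    uint32_t primCount;          /* number of instances: the field the counter writes */
    uint32_t firstIndex;
    int32_t  baseVertex;
    uint32_t reservedMustBeZero;
} DrawElementsIndirectCommand;

/* Byte offset of primCount inside one command. */
enum { PRIMCOUNT_OFFSET = offsetof(DrawElementsIndirectCommand, primCount) };

/* Hypothetical helper: prepares a command for one mesh with zero instances,
 * which is the state the buffer should be in before the culling pass runs. */
void init_indirect_command(DrawElementsIndirectCommand *cmd, uint32_t indexCount)
{
    assert(cmd != NULL);
    cmd->count = indexCount;
    cmd->primCount = 0u;         /* incremented on the GPU, once per visible instance */
    cmd->firstIndex = 0u;
    cmd->baseVertex = 0;
    cmd->reservedMustBeZero = 0u;
}
```

The atomic counter then has to alias the 4 bytes at `PRIMCOUNT_OFFSET` within the command, either through the `offset` layout qualifier in the shader or through the range passed to glBindBufferRange for the GL_ATOMIC_COUNTER_BUFFER binding.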

    Actually, you don't even need transform feedback or a geometry shader: you can do everything with ARB_shader_image_load_store by implementing an append buffer from a read/write image and an atomic counter. This is even more efficient than using a geometry shader and transform feedback, because geometry shaders must emit primitives in the same order as the input primitives were received. The hardware has to guarantee this ordering, and that has a negative effect on performance. Since we simply store an unordered array of instance data, we have no ordering requirements, so implementing the whole thing with an append buffer is faster.
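
The append-buffer idea can be modelled on the CPU with C11 atomics. This is only a sketch of the pattern, not GPU code: `AppendBuffer`, `append_instance` and `appended_count` are hypothetical names, and the capacity check is a safety net that the shader version would not need if the destination buffer is sized for the full instance count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct { float m[16]; } Mat4;   /* column-major instance matrix */

typedef struct {
    atomic_uint count;      /* plays the role of the atomic counter */
    Mat4       *slots;      /* plays the role of the read/write image */
    unsigned    capacity;   /* number of matrices that fit in slots */
} AppendBuffer;

/* Reserves the next free slot with an atomic increment and writes the
 * instance there. Returns 1 on success, 0 if the buffer is full.
 * Slots are handed out in arrival order, so output order is unspecified
 * when several threads append concurrently. */
int append_instance(AppendBuffer *buf, const Mat4 *instance)
{
    assert(buf != NULL && instance != NULL);
    unsigned slot = atomic_fetch_add(&buf->count, 1u);
    if (slot >= buf->capacity) {
        return 0;
    }
    buf->slots[slot] = *instance;
    return 1;
}

/* Number of valid entries, clamped because failed appends also bump count. */
unsigned appended_count(AppendBuffer *buf)
{
    unsigned n = atomic_load(&buf->count);
    return (n < buf->capacity) ? n : buf->capacity;
}
```

The value returned by `atomic_fetch_add` is the slot index, which is the same role the return value of `atomicCounterIncrement` plays in a shader.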

  8. #8
    Intern Contributor
    Join Date
    Apr 2010
    Posts
    68

    Re: DrawXInstancedTransformFeedback

    Quote Originally Posted by aqnuep
    Actually you don't even need transform feedback and geometry shader, but you can do everything using ARB_shader_image_load_store and implement an append buffer using a read/write image and an atomic counter.
    Very interesting! I'm going to try to lay out what you mean; would you mind telling me whether I understood you correctly?

    The vertex shader would look something like this (we actually only need a vertex stage):
    Code :
    #version 420 core
     
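    // counter backed by the primCount field of the indirect draw command;
    // offset = 4 assumes the command starts at byte 0 of the buffer bound to binding 0
    layout(binding = 0, offset = 4) uniform atomic_uint atomic_primCount; // number of instances
    // buffer texture receiving the culled matrices, 4 RGBA32F texels per instance
    layout(binding = 0, rgba32f) uniform imageBuffer image_perInstanceData;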
     
    // usual culling stuff
    uniform vec3 u_instanceBoxMin;
    uniform vec3 u_instanceBoxMax;
    layout(std140) uniform FrustumPlanes {
    	vec4 u_frustumPlanes[6]; // view frustum planes
    };
     
    layout(location = 0) in vec4 i_perInstanceMeshMatrixCol0;
    layout(location = 1) in vec4 i_perInstanceMeshMatrixCol1;
    layout(location = 2) in vec4 i_perInstanceMeshMatrixCol2;
    layout(location = 3) in vec4 i_perInstanceMeshMatrixCol3;
     
    void main()
    {
    	mat4 modelMatrix = mat4( i_perInstanceMeshMatrixCol0,
    	                         i_perInstanceMeshMatrixCol1,
    	                         i_perInstanceMeshMatrixCol2,
    	                         i_perInstanceMeshMatrixCol3 );
     
    	// compute AABB and test against view frustum planes
    	vec3 aabbVertices[8];
    	// ...
     
    	// placeholder for the result of the frustum test elided above
    	bool isVisible = true;

    	// if the AABB is visible
    	if(isVisible)
    	{
    		// reserve one slot (4 texels) in the append buffer
    		int perInstanceOffset = 4 * int(atomicCounterIncrement(atomic_primCount));
    		imageStore(image_perInstanceData, perInstanceOffset    , modelMatrix[0]);
    		imageStore(image_perInstanceData, perInstanceOffset + 1, modelMatrix[1]);
    		imageStore(image_perInstanceData, perInstanceOffset + 2, modelMatrix[2]);
    		imageStore(image_perInstanceData, perInstanceOffset + 3, modelMatrix[3]);
    	}
    	}
    }
    Where :
    - the atomic counter atomic_primCount is backed by the primCount field of a DRAW_INDIRECT_BUFFER (also bound as an ATOMIC_COUNTER_BUFFER)
    - and image_perInstanceData is my 'BUFFER_PER_INSTANCE_DATA_CULLED' (from my previous post), attached to a TEXTURE_BUFFER texture that is bound as an image.

  9. #9
    Advanced Member Frequent Contributor
    Join Date
    Dec 2007
    Location
    Hungary
    Posts
    989

    Re: DrawXInstancedTransformFeedback

    Yes, I meant exactly what you've presented.

    I had planned to update my Nature and Mountains demo to use this new technique as well, but I have been quite busy lately, and GL 4.2 drivers are not mature enough yet, so I thought there was no need to hurry.

  10. #10
    Intern Contributor
    Join Date
    Apr 2010
    Posts
    68

    Re: DrawXInstancedTransformFeedback

    Quote Originally Posted by aqnuep
    I had planned to update my Nature and Mountains demo to use this new technique as well, but I have been quite busy lately, and GL 4.2 drivers are not mature enough yet, so I thought there was no need to hurry.
    Yes, I'm pretty curious about the performance (writing to an image with synchronization doesn't sound very GPU-friendly; I guess I'll have to benchmark it to find out). I'm also looking forward to seeing your updated demo on your blog. Thanks for sharing the algorithm!
