glGetCommandHeaderNV(GL_NOP_COMMAND_NV, 4)
always seems to return zero, and no GL error is raised.
I take this to mean that I can pad all of my command structs with zeroed uint32s for alignment.
This helps because I like to use streaming stores when building the command lists in mapped buffers.
I use streaming stores for the write-combined memory that OpenGL tends to hand back for write-only buffer mappings.
Streaming stores might help for normal memory too, but I don't know.
I initialize the xmm/ymm register with:
__m128i _mm_setzero_si128()
__m256i _mm256_setzero_si256()
I build the command with:
__m128i _mm_insert_epi16 (__m128i a, int i, int imm8)
__m256i _mm256_insert_epi16 (__m256i a, __int16 i, const int index)
__m256i _mm256_insert_epi32 (__m256i a, __int32 i, const int index)
__m256i _mm256_insert_epi64 (__m256i a, __int64 i, const int index)
I store using:
void _mm_stream_si32 (int* mem_addr, int a)
void _mm_stream_si64 (__int64* mem_addr, __int64 a)
void _mm_stream_si128 (__m128i* mem_addr, __m128i a)
void _mm256_stream_si256 (__m256i * mem_addr, __m256i a)
I was using the 32-bit and 64-bit streaming stores at first, but I found that I could use the 128-bit and 256-bit aligned streaming stores by first zeroing the xmm/ymm register and then setting the needed fields. I know that this can leave a lot of NOP commands in the list, but that doesn’t seem to impact performance noticeably in my tests. A few NOPs here and there for padding appear to be fine performance-wise, but I would imagine that thousands or millions of them might be a bad thing.
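To make the approach above concrete, here is a minimal sketch of the 128-bit variant using only SSE2. The names emit_padded_command and finish_commands are made up for illustration, and the "header + one 32-bit payload" layout is a placeholder rather than a real NV_command_list token. It also assumes the NOP header really is zero, which is exactly the thing I am asking about.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_setzero_si128, _mm_insert_epi16, _mm_stream_si128 */
#include <stdint.h>

/* Hypothetical helper: fills one 16-byte slot with a 32-bit header and a
 * 32-bit payload. The remaining 8 bytes stay zero, which only reads as
 * two NOPs if the NOP header is in fact zero (assumption). */
static void emit_padded_command(void *slot, uint32_t header, uint32_t payload)
{
    /* _mm_stream_si128 requires a 16-byte aligned destination. */
    assert(((uintptr_t)slot & 15u) == 0);

    __m128i cmd = _mm_setzero_si128();

    /* SSE2 only has a 16-bit insert, so each 32-bit field goes in as two halves. */
    cmd = _mm_insert_epi16(cmd, (int)(header & 0xFFFFu), 0);
    cmd = _mm_insert_epi16(cmd, (int)(header >> 16), 1);
    cmd = _mm_insert_epi16(cmd, (int)(payload & 0xFFFFu), 2);
    cmd = _mm_insert_epi16(cmd, (int)(payload >> 16), 3);

    /* Non-temporal store: one full 16-byte write into the mapped buffer. */
    _mm_stream_si128((__m128i *)slot, cmd);
}

/* Hypothetical helper: call once after the last command of a batch so the
 * streaming stores are globally visible before the buffer is handed off. */
static void finish_commands(void)
{
    _mm_sfence();
}
```

The 256-bit version would look the same with the _mm256_* intrinsics listed above and a 32-byte aligned slot, but it needs AVX enabled at compile time.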
I have not noticed any difference when using
glCompileCommandListNV
to compile a tightly packed command list versus a command list where commands are 256-bit aligned and NOP/zero padded.
Also, since NOPs appear to be zero, I believe one can safely initialize a command memory buffer with zeros and it would just be lots of NOPs. Several memory allocation methods hand back zeroed memory (HeapAlloc(HEAP_ZERO_MEMORY), VirtualAlloc), and OpenGL buffers created with nullptr for data often come back zeroed in practice too, although the GL spec leaves their contents undefined, so I probably should not count on that one. It seems super convenient, almost like someone planned it that way ;).
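Until someone confirms the zero value, a defensive version of the "buffer full of NOPs" idea could look like the sketch below. fill_with_nops is a made-up helper name, and nop_header is assumed to be the value returned by glGetCommandHeaderNV(GL_NOP_COMMAND_NV, 4), queried once at startup. It also assumes a NOP is a single 32-bit header word, which is what the size of 4 in that call suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fills `count` 32-bit words with NOP commands.
 * nop_header is assumed to come from
 * glGetCommandHeaderNV(GL_NOP_COMMAND_NV, 4), queried once at startup. */
static void fill_with_nops(uint32_t *words, size_t count, uint32_t nop_header)
{
    assert(words != NULL || count == 0);

    if (nop_header == 0) {
        /* Fast path: the observed case, where a zeroed buffer is all NOPs. */
        memset(words, 0, count * sizeof *words);
        return;
    }

    /* Fallback in case a driver ever reports a non-zero NOP header. */
    for (size_t i = 0; i < count; ++i)
        words[i] = nop_header;
}
```

With this, the zero-padding trick stays the fast path, and a driver that reports a different NOP header only costs an explicit fill instead of producing a broken command list.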
Can someone officially comment on whether NOPs are ZERO, and if they will be ZERO forever? Also, please let me know if this is a fragile abuse of some corner-case, or if it is ok for standard practice.
Thanks