Thread: Scalar binary/intermediate shader code

  1. #31
    Junior Member Regular Contributor
    Join Date
    Aug 2006
    Posts
    230
    Quote Originally Posted by kRogue View Post
    If we look at the D3D HLSL compiler, its main issue is that the instruction set it compiles to does not represent the underlying hardware well enough. If it had saturate ops, the various input operations that come for free, and a more realistic understanding of the various sampler commands, then the D3D HLSL compiler would not do so many odd things.
    Not true, I got this bit of HLSL:
    Code :
    Output.Position = saturate((input.Position * -input.Position) + abs(input.Position));
    to compile to this asm:
    Code :
    mad_sat o0.xyzw, v0.xyzw, -v0.xyzw, |v0.xyzw|

    I'm not sure I've seen any instruction set that lets you have saturates on input though; abs and neg, sure, but sat and ssat only seem to be allowed on output.

    I certainly agree with you that the Mesa guys don't really know how to do compilers. They seem to do a lot of their optimizations on the GLSL tree IR, which is just awkward. The TGSI instruction set it's meant to get compiled to doesn't support things like predicates, or updating condition codes on any instruction. The fact that they have 3 IRs (Mesa, GLSL and TGSI) makes things even more confusing.

    Regards
    elFarto
    Last edited by elFarto; 07-04-2014 at 08:58 AM.

  2. #32
    Advanced Member Frequent Contributor
    Join Date
    Apr 2009
    Posts
    607
    Quote Originally Posted by elFarto View Post
    Not true, I got this bit of HLSL:
    Code :
    Output.Position = saturate((input.Position * -input.Position) + abs(input.Position));
    to compile to this asm:
    Code :
    mad_sat o0.xyzw, v0.xyzw, -v0.xyzw, |v0.xyzw|

    I'm not sure I've seen any instruction set that lets you have saturates on input though; abs and neg, sure, but sat and ssat only seem to be allowed on output.
    Looks like I should read everything on the internet with a huge grain of salt. It looks like the D3D HLSL compiler does know of sat, which I did not think it did. Learn something every day. However, just to make sure I do not have to learn something again: the D3D compiler bytecode is not scalar but vector? I know for D3D9 that is the case, but just to make sure...

    Sadly, no matter what Microsoft does, the current situation is that some hardware will not be happy: most want scalar, except for Intel's, which wants vec4 for everything but fragment. Though I am of the opinion that maybe for that case there should be an option for scalar or vector preference ... or just say the hell with it, make it all scalar and let a driver vectorize it.
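Just to illustrate why "make it all scalar and let a driver vectorize it" is cheap in one direction: splitting a vec4 op into per-component scalar ops is a purely local rewrite. A sketch, using a made-up (opcode, dest, sources) tuple format that is not any real IL:

```python
# Hypothetical vector IL instruction: (opcode, dest, sources), where each
# register operand is a (name, components) pair, e.g. ("v0", "xyzw").

def scalarize(instr):
    """Split one vec4 instruction into per-component scalar instructions.

    A driver for scalar hardware could run this over vector IL; a driver
    for vec4 hardware would keep the instruction as-is (or re-vectorize).
    """
    op, dest, srcs = instr
    dname, dcomps = dest
    out = []
    for i, comp in enumerate(dcomps):
        # Each source contributes its i-th selected component.
        scalar_srcs = [(sname, scomps[i]) for sname, scomps in srcs]
        out.append((op, (dname, comp), scalar_srcs))
    return out

mad = ("mad", ("o0", "xyzw"), [("v0", "xyzw"), ("v1", "xyzw"), ("v2", "xyzw")])
for s in scalarize(mad):
    print(s)
```

The reverse direction (re-vectorizing scalar code for vec4 hardware) is the genuinely hard part, which is exactly why the choice of which form the IL standardizes matters.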


    Quote Originally Posted by elFarto View Post
    I certainly agree with you that the Mesa guys don't really know how to do compilers. They seem to do a lot of their optimizations on the GLSL tree IR, which is just awkward. The TGSI instruction set it's meant to get compiled to doesn't support things like predicates, or updating condition codes on any instruction. The fact that they have 3 IRs (Mesa, GLSL and TGSI) makes things even more confusing.

    Regards
    Stephen

    TGSI is just for Gallium drivers. My take on it is that it is not meant to be the place to perform optimizations. For Gallium drivers, the Gallium state tracker sits between Mesa and the driver; it reduces the full API to a much simpler one. The shader API goes through TGSI: Gallium drivers are essentially fed TGSI for shaders. I lie a little bit there: there is an option to feed a Gallium driver LLVM, but AFAIK that LLVM is generated from TGSI anyway. The Mesa IR is not for GLSL, it is for the assembly interface.

  3. #33
    Junior Member Regular Contributor
    Join Date
    Aug 2006
    Posts
    230
    Quote Originally Posted by kRogue View Post
    However, just to make sure I do not have to learn something again, the D3D compiler bytecode is not scalar but vector?
    Yes, it's vector. Although, I guess you could treat it as scalar and just specify one element at a time.

    Quote Originally Posted by kRogue View Post
    The Mesa IR is not for GLSL, it is for the assembly interface.
    I know, I've been knee deep in it for the past week, attempting to implement NV_gpu_program{4,5}. It doesn't look like it's going to be possible without large changes to TGSI IR.

    Regards
    elFarto

  4. #34
    Advanced Member Frequent Contributor
    Join Date
    Apr 2009
    Posts
    607
    To implement NV_gpu_program4 or 5, one will need to first attack just Mesa (not Gallium) to update the assembly interface to accept everything those extensions add; lots of pain there. Then you need to update the Gallium state tracker to convert the (updated) Mesa IR to TGSI. Then the real pain begins, as TGSI is really not good enough anymore. It is fine-ish for D3D9-level feature sets mostly, but for features from NVIDIA's GeForce 8 series and up it just is not good enough. The situation is quite dire and at the same time really funny.

    If I had the authority, I'd say a wise thing to do would be for Gallium to dump TGSI and use NVIDIA's PTX format that their CUDA stack uses.
    Last edited by kRogue; 07-05-2014 at 03:12 AM.

  5. #35
    Member Regular Contributor
    Join Date
    Apr 2004
    Posts
    260
    My opinion is that the IL instruction set should not necessarily try to match any hardware, not even things that are common among all hardware.
    For example, I think it should not have any modifiers (things like saturate on output and negate/abs on input). Instead it should have plain, simple and pure instructions, e.g. a separate negate instruction instead of a negate modifier.
    The reason for this opinion is that it is very easy for driver IL-to-hw converters to merge a negate instruction into modifiers on the following instructions that use the negated register. Same with the other modifiers.
    But keeping the modifiers as separate instructions in the IL allows for a much simpler IL, which helps the frontend/optimization/whatever passes that work with the IL - now they don't have to worry about special cases and awkward rules and exceptions, e.g. having to remember that bitwise NOT and other unary operations are separate instructions but arithmetic negate and absolute value are input modifiers, and saturate is an output modifier, and complicate our algorithms to handle the mess, or probably have many separate algorithms to handle different cases and somehow tangle them together.
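For instance, such an IL-to-hw merge really is a small peephole pass. A sketch, with a hypothetical (opcode, dest, sources) tuple format and single-assignment temporaries assumed:

```python
def fold_negates(prog):
    """Fold a separate NEG instruction into a negate input modifier
    on later instructions that read the negated register.

    prog: list of (opcode, dest, [(reg, negate_flag), ...]).
    Assumes single assignment per temporary so the substitution is safe.
    """
    negated = {}           # NEG dest register -> its original source register
    out = []
    for op, dest, srcs in prog:
        if op == "neg":
            negated[dest] = srcs[0][0]   # remember the mapping, emit nothing
            continue
        # Rewrite each read of a negated temp as (orig_reg, negate=True),
        # flipping the flag if the operand was already negated.
        new_srcs = [(negated.get(r, r), neg ^ (r in negated))
                    for r, neg in srcs]
        out.append((op, dest, new_srcs))
    return out

prog = [("neg", "t0", [("v0", False)]),
        ("mad", "o0", [("v0", False), ("t0", False), ("v1", False)])]
print(fold_negates(prog))
# -> [('mad', 'o0', [('v0', False), ('v0', True), ('v1', False)])]
```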

    Sorry, by "IL" I meant the intermediate binary code we are talking about. I recently read AMD's intermediate language (IL) documentation and I carried the word over from there by inertia (I don't consider it very appropriate, because "language" sounds more like a high-level one, but it is assembly).
    Last edited by l_belev; 07-07-2014 at 08:03 AM.

  6. #36
    Junior Member Regular Contributor
    Join Date
    Dec 2009
    Posts
    241
    If Khronos defines a generic assembly language, should there also be a strict GLSL-to-assembly mapping in the spec, or should every vendor be able to adjust its GLSL-to-assembly compiler to produce assembly that maps optimally to its hardware?

  7. #37
    Member Regular Contributor
    Join Date
    Apr 2004
    Posts
    260
    This is how I imagine it should be:

    By no means should there be strict GLSL-to-assembly mapping requirements. The vendors are free to develop their own compilers if they feel they can do a better job at optimizing than the (hypothetical) standard reference compiler developed by Khronos.

    On the other hand, a shader given in the standard assembly language form should be able to run on any implementation (and have the same result) regardless of how the shader was produced - by the standard reference compiler, by a vendor's private compiler, converted from another shader assembly by a tool, or written by hand. After all, that's what the "standard" is about. This probably means there will be little-to-no incentive for the vendors to develop their own compiler front-ends, which is not bad. If they find a problem in the reference compiler, they had better contribute a fix to it rather than make a private branch. I think it will also be easier for them, as it will lift some burden and let them focus more on developing their hardware instead of re-inventing the wheel of writing compilers.

    Then again, I'm not a hardware vendor; they may have different opinions.
    And at the end of the day it's their opinion that matters. I am nobody, I just post some suggestions.
    Last edited by l_belev; 07-08-2014 at 02:23 AM.

  8. #38
    Junior Member Regular Contributor
    Join Date
    Aug 2006
    Posts
    230
    Quote Originally Posted by l_belev View Post
    ...for example, I think it should not have any modifiers (things like saturate on output and negate/abs on input). Instead it should have plain, simple and pure instructions, e.g. a separate negate instruction instead of a negate modifier.
    The reason for this opinion is that it is very easy for driver IL-to-hw converters to merge a negate instruction into modifiers on the following instructions that use the negated register...
    I'm not sure I buy that. It's surely going to be easier for a driver to de-optimize a MAD_SAT into a MAD and SAT than it is to combine a separate MAD/SAT into a single instruction. Combining the instructions means looking through all the usages of the destination register to see if its only use is in a SAT instruction (which it might not be). The same goes for a neg/abs on the inputs. It's easy for the assembler to see that an input operand needs a negation/abs and, if the hardware doesn't support it, emit a NEG/ABS before the MAD.
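To illustrate (with hypothetical three-address tuples, nothing like real TGSI): the split direction is purely local, while the merge needs a scan over later uses of the destination register:

```python
def split_mad_sat(instr):
    """Lower MAD_SAT to MAD + SAT via a temp -- purely local, no analysis."""
    op, dest, srcs = instr
    if op != "mad_sat":
        return [instr]
    tmp = "t_" + dest
    return [("mad", tmp, srcs), ("sat", dest, [tmp])]

def merge_mad_sat(prog):
    """Fuse MAD followed by SAT, but only when the MAD result's sole use
    is that SAT -- this is the whole-program use scan."""
    out = []
    i = 0
    while i < len(prog):
        op, dest, srcs = prog[i]
        nxt = prog[i + 1] if i + 1 < len(prog) else None
        # Count every later read of dest; fusing is only safe if the
        # SAT is the one and only consumer.
        uses = sum(dest in s for _, _, s in prog[i + 1:])
        if (op == "mad" and nxt and nxt[0] == "sat"
                and nxt[2] == [dest] and uses == 1):
            out.append(("mad_sat", nxt[1], srcs))
            i += 2
        else:
            out.append(prog[i])
            i += 1
    return out
```

With a second consumer of the MAD result, the fuse is correctly rejected, which is exactly the extra work the split direction never has to do.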

    Quote Originally Posted by mbentrup View Post
    If Khronos defines a generic assembly language, should there also be a strict GLSL to assembly mapping in the spec, or should every vendor be able to adjust his GLSL to assembly compiler to produce assembly that maps optimally to his hardware ?
    I have to agree with l_belev here: if the driver's got the GLSL, you might as well let it compile it as best it can.

    Regards
    elFarto

  9. #39
    Member Regular Contributor
    Join Date
    Apr 2004
    Posts
    260
    Sometimes the best way to do something is not the obvious way. Unfortunately, people often rush to do the obvious and fail to consider many important perspectives.

  10. #40
    Intern Contributor
    Join Date
    Mar 2014
    Posts
    65
    Quote Originally Posted by l_belev View Post
    sometimes the best way to do something is not the obvious way. unfortunately people often rush and do the obvious and fail to consider many important perspectives.

    Correct. But before defining 'the best way', we first need to define precisely what a binary shader format is supposed to achieve.

    I think your main concern is compilation time, right?
    If you ask me, it's completely pointless to define the specifics of a binary format unless we know what part of the compilation process is the bottleneck here. Correct me if I'm wrong, but from my experience I'd guess it's the optimization process, not turning human-readable source into an equivalent binary representation. If that's the case, I believe a binary format is utterly pointless, because the optimization results will be entirely different for different hardware (even different generations from the same manufacturer, as hardware evolves!), so with a fixed low-level binary representation you'd inevitably run into other problems later in the game when the driver developers have to sort out the mess - and those are problems I'm seriously worried about, because they affect everybody.
    So my conclusion is that for this case there is no better solution than the current method of doing a precompilation run and locally caching the generated binary.
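As a sketch of that caching approach: key the driver-generated binary by a hash of the source plus a driver identifier, so a driver update invalidates stale entries. The compile function below is a stand-in for the driver's real compiler (in GL, the retrieval side of this is played by glGetProgramBinary/glProgramBinary), and all names here are made up:

```python
import hashlib
import os

def cached_compile(source, compile_fn, cache_dir, driver_id="vendor-1.0"):
    """Return a compiled shader binary, reusing a disk cache when the
    same source was already built by the same driver version.

    compile_fn: stand-in for the driver's expensive optimizing compiler.
    """
    key = hashlib.sha256((driver_id + "\0" + source).encode()).hexdigest()
    path = os.path.join(cache_dir, key + ".bin")
    if os.path.exists(path):
        with open(path, "rb") as f:      # cache hit: skip compilation
            return f.read()
    binary = compile_fn(source)          # cache miss: do the slow build
    with open(path, "wb") as f:
        f.write(binary)
    return binary
```

On the second run with identical source and driver_id, the expensive compile is skipped entirely, which is the whole benefit being argued for here.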

    On the other hand, if your concern is not having to provide human-readable code, the binary output should be as close as possible to the source code, not even trying to create pseudo-assembly out of it, so that the optimizers have data that is as generic as possible to work with. Any concept of scalar vs. vectorized would be just plain wrong in this case.
