Name

NV_fragment_program

Name Strings

GL_NV_fragment_program

Contact

Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)
Mark J. Kilgard, NVIDIA Corporation (mjk 'at' nvidia.com)

Notice

Copyright NVIDIA Corporation, 2001-2002.

IP Status

NVIDIA Proprietary.

Status

Implemented in CineFX (NV30) Emulation driver, August 2002.
Shipping in Release 40 NVIDIA driver for CineFX hardware, January 2003.

Version

Last Modified Date: 2005/05/24
NVIDIA Revision: 73

Number

282

Dependencies

Written based on the wording of the OpenGL 1.2.1 specification and requires OpenGL 1.2.1.

Requires support for the ARB_multitexture extension with at least two texture units.

NV_vertex_program affects the definition of this extension. The only dependency is that both extensions use the same mechanisms for defining and binding programs.

NV_texture_shader trivially affects the definition of this extension.

NV_texture_rectangle trivially affects the definition of this extension.

ARB_texture_cube_map trivially affects the definition of this extension.

EXT_fog_coord trivially affects the definition of this extension.

NV_depth_clamp affects the definition of this extension.

ARB_depth_texture and SGIX_depth_texture affect the definition of this extension.

NV_float_buffer affects the definition of this extension.

ARB_vertex_program affects the definition of this extension.

ARB_fragment_program affects the definition of this extension.

Overview

OpenGL mandates a certain set of configurable per-fragment computations defining texture lookup, texture environment, color sum, and fog operations. Each of these areas provides a useful but limited set of fixed operations. For example, unextended OpenGL 1.2.1 provides only four texture environment modes, color sum, and three fog modes. Many OpenGL extensions have either improved existing functionality or introduced new configurable fragment operations.
While these extensions have enabled new and interesting rendering effects, the set of effects is limited by the set of special modes introduced by the extension. This lack of flexibility is in contrast to the high level of programmability of general-purpose CPUs and other (frequently software-based) shading languages.

The purpose of this extension is to expose to the OpenGL application writer an unprecedented degree of programmability in the computation of final fragment colors and depth values.

This extension provides a mechanism for defining fragment program instruction sequences for application-defined fragment programs. When in fragment program mode, a program is executed each time a fragment is produced by rasterization. The inputs for the program are the attributes (position, colors, texture coordinates) associated with the fragment and a set of constant registers. A fragment program can perform mathematical computations and texture lookups using arbitrary texture coordinates. The results of a fragment program are new color and depth values for the fragment.

This extension defines a programming model including a 4-component vector instruction set, 16- and 32-bit floating-point data types, and a relatively large set of temporary registers. The programming model also includes a condition code vector which can be used to mask register writes at run-time or kill fragments altogether. The syntax, program instructions, and general semantics are similar to those in the NV_vertex_program and NV_vertex_program2 extensions, which provide for the execution of an arbitrary program each time the GL receives a vertex.

The fragment program execution environment is designed for efficient hardware implementation and to support a wide variety of programs. By design, the entire set of existing fragment programs defined by existing OpenGL per-fragment computation extensions can be implemented using the extension's programming model.
The fragment program execution environment accesses textures via arbitrarily computed texture coordinates. As such, there is no necessary correspondence between the texture coordinates and texture maps previously lumped into a single "texture unit". This extension separates the notion of "texture coordinate sets" and "texture image units" (texture maps and associated parameters), allowing implementations with a different number of each. The initial implementation of this extension will support 8 texture coordinate sets and 16 texture image units.

Issues

What limitations exist in this extension?

RESOLVED: Very few. Programs cannot exceed a maximum program length (which is no less than 1024 instructions), and can use no more than 32-64 temporary registers. Programs cannot access more than one fragment attribute or program parameter (constant) per instruction, but can work around this restriction using temporaries. The number of textures that can be used by a program is limited to the number of texture image units provided by the implementation (16 in the initial implementation of this extension). These limits are fairly high.

Additionally, there is no limit on the total number of texture lookups that can be performed by a program. There is no limit on the length of a texture dependency chain -- one can write a program that performs over 1000 consecutive dependent texture lookups. There are no restrictions on dependencies between texture mapping instructions and arithmetic instructions. Texture lookups can be performed using arbitrarily computed texture coordinates. Applications can carry out their calculations with full 32-bit single precision, although two lower-precision modes are also available.

How does texture mapping work with fragment programs?

RESOLVED: This extension provides three instructions used to perform texture lookups.
The "TEX" instruction performs a lookup with the (s,t,r) values taken from an interpolated texture coordinate, an arbitrarily computed vector, or even a program constant. The "TXP" instruction performs a similar lookup, except that it uses the fourth component of the source vector to perform a perspective divide, using (s/q, t/q, r/q). In both cases, the GL will automatically compute partial derivatives used for filter and LOD selection. The "TXD" instruction operates like "TEX", except that it allows the program to explicitly specify two additional vectors containing the partial derivatives of the texture coordinate with respect to x and y window coordinates.

All three instructions write a filtered texel value to a temporary or output register. Other than the computation of texture coordinates and partial derivatives, texture lookups are not performed any differently in fragment program mode. In particular, any applicable LOD biases, wrap modes, minification and magnification filters, and anisotropic filtering controls are still applied in fragment program mode.

The results of the texture lookup are available to be used arbitrarily by subsequent fragment program instructions. Fragment programs are allowed to access any texture map arbitrarily many times.

Can fragment programs be used to compute depth values?

RESOLVED: Yes. A fragment program can perform arbitrary computations to compute a final value for the fragment, which it should write to the "z" component of the o[DEPR] register. The "z" value written should be in the range [0,1], regardless of the size of the depth buffer.

To assist in the computation of the final Z value, a fragment program can access the interpolated depth of the fragment (prior to any displacement) by reading the "z" component of the f[WPOS] attribute register.

How should near and far plane clipping work in fragment program mode if the current fragment program computes a depth value?
RESOLVED: Geometric clipping to the near and far clip planes should be disabled. Clipping should be done based on the depth values computed per-fragment. The rationale is that per-fragment depth displacement operations may effectively move portions of a primitive initially outside the clip volume inside, and vice versa.

Note that under the NV_depth_clamp extension, geometric clipping to the near and far clip planes is also disabled, and the fragment depth values are clamped to the depth range. If depth clamp mode is enabled when using a fragment program that computes a depth value, the computed depth value will be clamped to the depth range.

Should fragment programs be allowed to use multiple precisions for operands and operations?

RESOLVED: Yes. Low-precision operands are generally adequate for representing colors. Allowing low-precision registers also allows for a larger number of temporary registers (at lower precision). Low-precision operations also provide the opportunity for a higher level of performance.

Applications are free to use only high-precision operations or mix high- and low-precision operations as necessary.

What levels of precision are supported in arithmetic operations?

RESOLVED: Arithmetic operations can be performed at three different precisions. 32-bit floating point precision (fp32) uses the IEEE single-precision standard with a sign bit, 8 exponent bits, and 23 mantissa bits. 16-bit floating-point precision (fp16) uses a similar floating-point representation, but with 5 exponent bits and 10 mantissa bits. Additionally, many arithmetic operations can also be carried out at 12-bit fixed point precision (fx12), where values in the range [-2,+2) are represented as signed values with 10 fraction bits.

How should the precision with which operations are carried out be specified? Should we infer the precision from the types of the operands or result vectors? Or should it be an attribute of the instruction?
RESOLVED: Applications can optionally specify the precision of individual instructions by adding a suffix of "R", "H", or "X" to instruction names to select fp32, fp16, or fx12 precision, respectively. By default, instructions will be carried out using the precision of the destination register. Always inferring the precision from the operands has a number of issues. First, there are a number of operations (e.g., TEX/TXP/TXD) where the result type has little to no correspondence to the type of the operands. In these cases, precision suffixes are not supported. Second, one could have instructions automatically cast operands and compute results using the type of the highest precision operand or result. This behavior would be problematic since all fragment attribute registers and program parameters are kept at full precision, but full precision may not be needed by the operation.

The choice of precision level allows programs to trade off precision for potentially higher performance. Giving the program explicit control over the precision also allows it to dictate precision explicitly and eliminate any uncertainty over type casting.

For instructions whose specified precision is different than the precision of the operands or the result registers, how are the operations performed? How are the condition codes updated?

RESOLVED: Operations are performed with operands and results at the precision specified by the instruction. After the operation is complete, the result is converted to the precision of the destination register, after which the condition code is generated.

In an alternate approach, the condition code could be generated from the result. However, in some cases, the register contents would not match the condition code. In such cases, it may not be reliable to use the condition code to prevent division by zero or other special cases.

How does this extension interact with the ARB_multisample extension?
In the ARB_multisample extension, each fragment has multiple depth values. In this extension, a single interpolated depth value may be modified by a fragment program.

RESOLVED: The depth values for the extra samples are generated by computing partials of the computed depth value and using these partials to derive the depth values for each of the extra samples.

How does this extension interact with polygon offset? Both extensions modify fragment depth values.

RESOLVED: As in the base OpenGL spec, the depth offset generated by polygon offset is added during polygon rasterization. The depth value provided to programs in f[WPOS].z already includes polygon offset, if enabled. If the depth value is replaced by a fragment program, the polygon offset value will NOT be recomputed and added back after program execution.

This is probably not desirable for fragment programs that modify depth values since the partials used to generate the offset may not match the partials of the computed depth value. Polygon offset for filled polygons can be approximated in a fragment program using the depth partials obtained by the DDX and DDY instructions. This will not work properly for line- and point-mode polygons, since the partials used for offset are computed over the polygon, while the partials resulting from the DDX and DDY instructions are computed along the line (or are zero for point-mode polygons). In addition, separate treatment of points, line segments, and polygons is not possible in a fragment program.

Should depth component replacement be a property of the fragment program or a separate enable?

RESOLVED: It should be a program property. Using the output register notation simplifies matters: depth components are replaced if and only if the DEPR register is written to. This alleviates the application and driver burden of maintaining separate state.

How does this extension affect the handling of q texture coordinates in the OpenGL spec?
RESOLVED: Fragment programs are allowed to access an associated q texture coordinate, so this attribute must be produced by rasterization. In unextended OpenGL 1.2, the q coordinate is eliminated in the rasterization portions of the spec after dividing each of s, t, and r by it. This extension updates the specification to pass q coordinates through at least to conventional texture mapping. When fragment program mode is disabled, q coordinates will be eliminated there in an identical manner. This modification has the added benefit of simplifying the equations used for attribute interpolation.

How should clip w coordinates be handled by this extension?

RESOLVED: Fragment programs are allowed to access the reciprocal of the clip w coordinate, so this attribute must be produced by rasterization. The OpenGL 1.2 spec doesn't explicitly enumerate the attributes associated with the fragment, but we add treatment of the w clip coordinate in the appropriate locations.

The reciprocal of the clip w coordinate in traditional graphics hardware is produced by screen-space linear interpolation of the reciprocals of the clip w coordinates of the vertices. However, this spec says the clip w coordinate is produced by perspective-correct interpolation of the (non-reciprocated) clip w vertex coordinates. These two formulations turn out to be equivalent, and the latter is more convenient since the core OpenGL spec already contains formulas for perspective-correct interpolation of vertex attributes.

What is produced by the TEX/TXP/TXD instructions if the requested texture image is inconsistent?

RESOLVED: The result vector is specified to be (0,0,0,0). This behavior is consistent with the NV_texture_shader extension. Note that, as in NV_texture_shader, these instructions ignore the standard hierarchy of texture enables and programs can access textures that are not specifically "enabled".
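
The equivalence claimed in the clip w issue above can be checked numerically. The following C sketch is only an illustration and is not part of this extension: the function names and the two-vertex setup with a single screen-space parameter t are assumptions made for the example. It assumes the usual perspective-correct formula for an attribute f, ((1-t)*fa/wa + t*fb/wb) / ((1-t)/wa + t/wb), and applies it to w itself.

```c
/* Hypothetical helper: screen-space linear interpolation of the
 * reciprocals of the clip w coordinates of two vertices. */
double interp_recip_w_linear(double wa, double wb, double t)
{
    return (1.0 - t) / wa + t / wb;
}

/* Hypothetical helper: perspective-correct interpolation of an attribute
 * with per-vertex values fa and fb and clip w coordinates wa and wb. */
double interp_perspective(double fa, double fb,
                          double wa, double wb, double t)
{
    double num = (1.0 - t) * fa / wa + t * fb / wb;
    double den = (1.0 - t) / wa + t / wb;
    return num / den;
}

/* Absolute difference between the two formulations of 1/w.  With f = w
 * the numerator above collapses to (1-t) + t = 1, so the difference
 * should be zero up to floating-point rounding. */
double recip_w_mismatch(double wa, double wb, double t)
{
    double w = interp_perspective(wa, wb, wa, wb, t);
    double d = 1.0 / w - interp_recip_w_linear(wa, wb, t);
    return d < 0.0 ? -d : d;
}
```

Because the numerator reduces to 1 whenever the interpolated attribute is w itself, the perspective-correct result is exactly the reciprocal of the linearly interpolated 1/w, which is why the spec can reuse the existing attribute interpolation formulas.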
Should a minimum precision be specified for certain fragment attribute registers (in particular COL0, COL1) that may not be generated with full fp32 precision?

RESOLVED: No. It is expected that the precision of COL0/COL1 should generally be at least as high as that of the frame buffer.

Fragment color components (f[COL0] and f[COL1]) are generally low-precision fixed-point values in the range [0,1]. Is it possible to pass unclamped or high-precision color components to fragment programs?

RESOLVED: Yes, although you can't exactly call them "colors". High-precision per-vertex color values can be written into any unused texture coordinate set, either via a MultiTexCoord call or using a vertex program. These "texture coordinates" will be interpolated during rasterization, and can be used arbitrarily by a fragment program.

In particular, there is no requirement that per-fragment attributes called "texture coordinates" be used for texture mapping.

Should this specification guarantee that temporary registers are initialized to zero?

RESOLVED: Yes. This will allow for the modular construction of programs that accumulate results in registers. For example, per-fragment lighting may use MAD instructions to accumulate color contributions at each light. Without zero-initialization, the program would require an explicit MOV instruction to load 0 or the use of the MUL instruction for the first light.

Should this specification support Unicode program strings?

RESOLVED: Not necessary.

Programs defined by NV_vertex_program begin with "!!VP1.0". Should fragment programs have a similar identifier?

RESOLVED: Yes, "!!FP1.0", identifying the first revision of this fragment program language.

Should per-fragment attributes have equivalent integer names in the program language, as per-vertex attributes do in NV_vertex_program?

RESOLVED: No. In NV_vertex_program, "generic" vertex attributes could be specified directly by an application using only an attribute number.
Those numbers may have no necessary correlation with the conventional attribute names, although conventional vertex attributes are mapped to attribute numbers. However, conventional attributes are the only outputs of vertex programs and of rasterization. Therefore, there is no need for a similar input-by-number functionality for fragment programs.

Should we provide the ability to issue instructions that do not update temporary or output registers?

RESOLVED: Yes. Programs may issue instructions whose only purpose is to update the condition code register, and requiring such instructions to write to a temporary may require the use of an additional temporary and/or defeat possible program optimizations. We accomplish this by adding two write-only temporary pseudo-registers ("RC" and "HC") that can be specified as destination registers.

Do the packing and unpacking instructions in this extension make any sense?

RESOLVED: Yes. They are useful for packing and unpacking multiple components in a single channel of a floating-point frame buffer. For example, a 128-bit "RGBA" frame buffer could pack 16 8-bit quantities or 8 16-bit quantities, all of which could be used in later rasterization passes. See the NV_float_buffer extension for more information.

Should we provide a method for specifying a fp16 depth component output value?

RESOLVED: No. There is no good reason for supporting half-precision Z outputs. Even with 16-bit Z buffers, the 10-bit mantissa of the half-precision float is rather limiting. There would effectively be only 11 good bits in the back half of the Z buffer.

Should RequestResidentProgramsNV (or a new equivalent function) take a target? Dealing with working sets of different program types is a bit messy. Should we document some limitation if we get programs of different types?
RESOLVED: In retrospect, it may have been a good idea to attach a target to this command, but there isn't a good reason to mess with something that already works for vertex programs. The driver is responsible for ensuring consistent results when the program types specified are mixed.

What happens on data type conversions where the original value is not exactly representable in the new data type, either due to overflow or insufficient precision in the destination type?

RESOLVED: In case of overflow, the original value is clamped to +/-INF (fp16 or fp32) or the nearest representable value (fx12). In case of imprecision, the conversion either rounds or truncates to the nearest representable value.

Should this extension support IEEE-style denorms? For 32-bit IEEE floating point, denorms are numbers smaller in absolute value than 2^-126. For 16-bit floats used by this extension, denorms are numbers smaller in absolute value than 2^-14.

RESOLVED: For 32-bit data types, hardware support for denorms was considered too expensive relative to the benefit provided. Computational results that would otherwise produce denorms are flushed to zero. For 16-bit data types, hardware denorm support will be present. The expense of hardware denorm support is lower and the potential precision benefit is greater for 16-bit data types.

OpenGL provides a hierarchy of texture enables. The texture lookup operations in NV_texture_shader effectively override the texture enable hierarchy and select a specific texture to enable. What should be done by this extension?

RESOLVED: This extension will build upon NV_texture_shader and reduce the driver overhead of validating the texture enables. Texture lookups can be specified by instructions like "TEX H0, f[TEX2], TEX2, 3D", which would indicate to use texture coordinate set number 2 to do a lookup in the texture object bound to the TEXTURE_3D target in texture image unit 2.

Each texture unit can have only one "active" target.
Programs are not allowed to reference different texture targets in the same texture image unit. In the example above, any other texture instructions using texture image unit 2 must specify the 3D texture target.

What is the interaction with NV_register_combiners?

RESOLVED: Register combiners are not available when fragment programs are enabled.

Previous versions of this specification supported the notion of combiner programs, where the result of fragment program execution was a set of four "texture lookup" values that fed the register combiners.

For convenience, should we include pseudo-instructions not present in the hardware instruction set that are trivially implementable? For example, absolute value and subtract instructions could fall in this category. An "ABS R1,R0" instruction would be equivalent to "MAX R1,R0,-R0", and a "SUB R2,R0,R1" would be equivalent to "ADD R2,R0,-R1".

RESOLVED: In general, yes. A SUB instruction is provided for convenience. This extension does not provide a separate ABS instruction because it supports absolute value operations of each operand.

Should there be a '+' in the optionalSign portion of the grammar? There isn't one in the GL_NV_vertex_program spec.

RESOLVED: Yes, for orthogonality/readability. A '+' obviously adds no functionality. In NV_vertex_program, an optionalSign of "-" was always a negation operator. However, in fragment programs, it can also be used as a sign for a constant value.

Can the same fragment attribute register, program parameter register, or constants be used for multiple operands in the same instruction? If so, can it be used with different swizzle patterns?

RESOLVED: Yes and yes.

This extension allows different limits for the number of texture coordinate sets and the number of texture image units (i.e., texture maps and associated data). The state in ActiveTextureARB affects both coordinate sets (TexGen, matrix operations) and image units (TexParameter, TexEnv). How should we deal with this?
RESOLVED: Continue to use ActiveTextureARB and emit an INVALID_OPERATION if the active texture refers to an unsupported coordinate set/image unit. Other options included creating dummy (unusable) state for unsupported coordinate sets/image units and continuing to use ActiveTextureARB normally, or creating separate state and state-setting commands for coordinate sets and image units. Separate state is the cleanest solution, but would add more calls and potentially cause more programmer confusion. Dummy state would avoid additional error checks, but the demands of dummy state could grow if the number of texture image units and texture coordinate sets increases.

The current OpenGL spec is vague as to what state is affected by the active texture selector and has no distinction between coordinate-related and image-related state. The state tables could use a good clean-up in this area.

The LRP instruction is defined so that the result of "LRP R0, R0, R1, R2" is R0*R1+(1-R0)*R2. There are conflicting precedents here. The definition here matches the "lrp" instruction in the DirectX 8.0 pixel shader language. However, an equivalent RenderMan lerp operation would yield a result of (1-R0)*R1+R0*R2. Which ordering should be implemented?

RESOLVED: NVIDIA hardware implements the former operand ordering, and there is no good reason to specify a different ordering. To convert a "LRP" using the latter ordering to NV_fragment_program, swap the third and fourth arguments.

Should this extension provide tracking of matrices or any other state, similar to that provided in NV_vertex_program?

RESOLVED: No.

Should this extension provide global program parameters -- values shared between multiple fragment programs?

RESOLVED: No.

Should this extension provide program parameters specific to a program? If so, how?

RESOLVED: Yes. These parameters will be called "local parameters". This extension will provide both named and numbered local parameters.
Local parameters can be managed by the driver and eliminate the need for applications to manage a global name space.

Named local parameters work much like standard variable names in most programming languages. They are created using the "DECLARE" instruction within the fragment program itself. For example:

    DECLARE color = {1,0,0,1};

Named local parameters are used simply by referencing the variable name. They do not require the array syntax like the global parameters in the NV_vertex_program extension. They can be updated using the commands ProgramNamedParameter4[f,fv]NV.

Numbered local parameters are not declared. They are used by simply referencing an element of an array called "p". For example,

    MOV R0, p[12];

loads the value of numbered local parameter 12 into register R0. Numbered local parameters can be updated using the commands ProgramLocalParameter4[d,dv,f,fv]ARB.

The numbered local parameter APIs were added to this extension late in its development, and are provided for compatibility with the ARB_vertex_program extension, and what will likely be supported in ARB_fragment_program as well. Providing this mechanism allows programs to use the same mechanisms to set local parameters in both extensions.

Why are the APIs for setting named and numbered local parameters different?

RESOLVED: The named parameter API was created prior to ARB_vertex_program (and the possible future ARB_fragment_program) and uses conventions borrowed from NV_vertex_program. A slightly different API was chosen during the ARB standardization process; see the ARB_vertex_program specification for more details.

The named parameter API takes a program ID and a parameter name, and sets the parameter for the program with the specified ID. The specified program does not need to be bound (via BindProgramNV) in order to modify the values of its named parameters.
The numbered parameter API takes a program target enum (FRAGMENT_PROGRAM_NV) and a parameter number and modifies the corresponding numbered parameter of the currently bound program.

What should be the initial value of uninitialized local parameters?

RESOLVED: (0,0,0,0). This choice is somewhat arbitrary, but matches previous extensions (e.g., NV_vertex_program).

Should this extension support program parameter arrays?

RESOLVED: No hardware support is present. Note that from the point of view of a fragment program, a texture map can be used as a 1-, 2-, or 3-dimensional array of constants.

Should this extension support constants in fragment programs? If so, how?

RESOLVED: Yes. Scalar or vector constants can be defined inline (e.g., "1.0" or "{1,2,3,4}"). In addition, named constants are supported using the "DEFINE" instruction, which allows programmers to change the values of constants used in multiple instructions simply by changing the value assigned to the named constant.

Note that because this extension uses program strings, the floating-point value of any constants generated on the fly must be printed to the program string. An alternate method that avoids the need to print constants is to declare a named local program parameter and initialize it with the ProgramNamedParameter4[f,fv]NV calls.

Should named constants be allowed to be redefined?

RESOLVED: No. If you want to redefine the values of constants, you can create an equivalent named program parameter by changing the "DEFINE" keyword to "DECLARE".

Should functions used to update or query named local parameters take a zero-terminated string (as with most strings in the C programming language), or should they require an explicit string length? If the former, should we create a version of LoadProgramNV that does not require a string length?

RESOLVED: Stick with explicit string length. Strings that are defined as constants can have the length computed at compile-time.
Strings read from files will have the length known in advance. Programs that build strings at run-time likely also keep the length up to date. Passing an explicit length saves time, since the driver doesn't have to do a strlen().

What is the deal with the alpha of the secondary color?

RESOLVED: In unextended OpenGL 1.2, the alpha component of the secondary color is forced to 0.0. In the EXT_secondary_color extension, the alpha of the per-vertex secondary colors is defined to be 0.0. NV_vertex_program allows vertex programs to produce a per-vertex alpha component, but it is forced to zero for the purposes of the color sum. In the NV_register_combiners extension, the alpha component of the secondary color is undefined. What a mess.

In this extension, the alpha of the secondary color is well-defined and can be used normally. When in vertex program mode

Why are fragment program instructions involving f[FOGC] or f[TEX0] through f[TEX7] automatically carried out at full precision?

RESOLVED: This is an artifact of the way these interpolants are generated by the NVIDIA graphics hardware. If such instructions absolutely must be carried out at lower precision, the requirement can be met by first loading the interpolants into a temporary register.

With a different number of texture coordinate sets and texture image units, how many copies of each kind of texture state are there?

RESOLVED: The intention is that texture state be broken into three groups.

(1) There are MAX_TEXTURE_COORDS_NV copies of texture coordinate set state, which includes current texture coordinates, TexGen state, and texture matrices.

(2) There are MAX_TEXTURE_IMAGE_UNITS_NV copies of texture image unit state, which includes texture maps, texture parameters, and LOD bias parameters.

(3) There are MAX_TEXTURE_UNITS_ARB copies of legacy OpenGL texture unit state (e.g., texture enables, TexEnv blending state), all of which are unused when in fragment program mode.
It is not necessary that MAX_TEXTURE_UNITS_ARB be equal to the minimum of MAX_TEXTURE_COORDS_NV and MAX_TEXTURE_IMAGE_UNITS_NV -- implementations may choose not to extend fixed-function OpenGL texture mapping modes beyond a certain point.

The GLX protocol for LoadProgramNV (and ProgramNamedParameterNV) may end up with programs >64KB. This will overflow the limits of the GLX Render protocol, resulting in the need to use the RenderLarge path. This is an issue with vertex programs, also.

RESOLVED: Yes, it is.

Should textures used by fragment programs be declared? For example, "TEXTURE TEX3, 2D", indicating that the 2D texture should be used for all accesses to texture unit 3. The dimension could be dropped from the TEX family of instructions, and some of the compile-time error checking could be dropped.

RESOLVED: Maybe it should be, but for better or worse, it isn't.

It is not all that uncommon to have negative q values with projective texture mapping, but results are undefined if any q values are negative in this specification. Why?

RESOLVED: This restriction carries on a similar one in the initial OpenGL specification. The motivation for this restriction is that when interpolating, it is possible for a fragment to have an interpolated q coordinate at or near 0.0. Since the texture coordinates used for projective texture mapping are s/q, t/q, and r/q, this will result in a divide-by-zero error or significant numerical instability. Results will be inaccurate for such fragments.

Other than the numerical stability issue above, NVIDIA hardware should have no problems with negative q coordinates.

Should programs that replace depth have their own special program type, such as "!!FPD1.0" and "!!FPDC1.0"?

RESOLVED: No. If a program has an instruction that writes to o[DEPR], the final fragment depth value is taken from o[DEPR].z. Otherwise, the fragment's original depth value is used.

What fx12 value should NaN map to?

RESOLVED: For the lack of any better choice, 0.0.
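
The fx12 rules scattered through the issues above (values in [-2,+2) with 10 fraction bits, overflow clamped to the nearest representable value, NaN mapped to 0.0) can be pulled together in a short C sketch. This is an illustration under stated assumptions, not the hardware algorithm: the function and macro names are hypothetical, the raw code is modeled as a signed integer in [-2048, 2047], and rounding to nearest is chosen even though the spec would equally allow truncation.

```c
#include <math.h>   /* isnan */

#define FX12_SCALE   1024      /* 10 fraction bits: 1024 steps per unit */
#define FX12_MAX_RAW 2047      /* largest code, 2047/1024, just below +2 */
#define FX12_MIN_RAW (-2048)   /* smallest code, exactly -2 */

/* Hypothetical helper: convert a float to a raw fx12 code. */
int float_to_fx12(float x)
{
    float scaled;

    if (isnan(x))
        return 0;                           /* NaN maps to 0.0 */
    if (x >= (float)FX12_MAX_RAW / FX12_SCALE)
        return FX12_MAX_RAW;                /* overflow, including +INF */
    if (x <= (float)FX12_MIN_RAW / FX12_SCALE)
        return FX12_MIN_RAW;                /* overflow, including -INF */

    scaled = x * FX12_SCALE;
    /* Assumption: round to nearest.  Truncation is also permitted. */
    return (int)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Hypothetical helper: convert a raw fx12 code back to a float. */
float fx12_to_float(int raw)
{
    return (float)raw / FX12_SCALE;
}
```

Under this model, 1.0 becomes raw code 1024, any input at or above 2047/1024 saturates to 2047, and any input at or below -2 saturates to -2048.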
How are special-case encodings (-INF, +INF, -0.0, +0.0, NaN) handled for arithmetic and comparison operations?

RESOLVED: The special cases for all floating-point operations are designed to match the IEEE specification for floating-point numbers as closely as possible. The results produced by special cases should be enumerated in the sections of this spec describing the operations. There are some cases where the implemented fragment program behavior does not match IEEE conventions, and these cases should be noted in this specification.

How can condition codes be used to mask out register writes? How about killing fragments? What other things can you do?

RESOLVED: The following example computes a component-wise |R1-R2|:

    SUBC R0, R1, R2;      # "C" suffix means update condition code
    MOV  R0 (LT), -R0;    # Conditional write mask in parentheses

The first instruction computes a component-wise difference between R1 and R2, storing R1-R2 in register R0. The "C" suffix in the instruction means to update the condition code based on the sign of the result vector components. The second instruction inverts the sign of the components of R0. However, the "(LT)" portion says that the destination register should be updated only if the corresponding condition code component is LT (negative). This means that only those components of R0 that are negative have their sign inverted, which leaves |R1-R2| in R0.

To kill a fragment if the red (x) component of a texture lookup returns zero:

    TEXC R0, f[TEX0], TEX0, 2D;
    KIL EQ.x;

To kill based on the green (y) component, use "EQ.y" instead. To kill if any of the four components is zero, use "EQ.xyzw" or just "EQ".

Fragment programs do not support boolean expressions. These can generally be achieved using conditional write masks.

To evaluate the expression "(R0.x == 0) && (R1.x == 0)":

    MOVC RC.x, R0.x;
    MOVC RC.x (EQ), R1.x;

To evaluate the expression "(R0.x == 0) || (R1.x == 0)":

    MOVC RC.x, R0.x;
    MOVC RC.x (NE), R1.x;

In both cases, the x component of the condition code will contain "EQ" if and only if the condition is TRUE.
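The condition code idioms above can be checked with a small simulation. The following Python sketch is illustrative only and is not part of the extension: the helper names (cc_of, cc_passes, abs_diff, both_zero, either_zero) are hypothetical, and it assumes that a masked-out write also leaves the corresponding condition code component unchanged, which is what the two boolean examples rely on. Treating UN as passing the NE test is likewise an assumption.

```python
import math

def cc_of(value):
    """Condition code for one result component: LT, EQ, GT, or UN (NaN)."""
    if math.isnan(value):
        return "UN"
    if value < 0:
        return "LT"
    if value > 0:
        return "GT"
    return "EQ"

def cc_passes(cc, rule):
    """Evaluate a write-mask rule (EQ, GE, GT, LE, LT, NE, TR, FL) on one component."""
    return {
        "EQ": cc == "EQ",
        "NE": cc != "EQ",            # assumption: UN counts as "not equal"
        "LT": cc == "LT",
        "GT": cc == "GT",
        "LE": cc in ("LT", "EQ"),
        "GE": cc in ("GT", "EQ"),
        "TR": True,
        "FL": False,
    }[rule]

def abs_diff(r1, r2):
    """Component-wise |R1-R2| using the SUBC / MOV (LT) pair."""
    r0 = [a - b for a, b in zip(r1, r2)]       # SUBC R0, R1, R2;
    cc = [cc_of(v) for v in r0]                # "C" suffix updates the condition code
    return [-v if cc_passes(c, "LT") else v    # MOV R0 (LT), -R0;
            for v, c in zip(r0, cc)]

def both_zero(r0x, r1x):
    """(R0.x == 0) && (R1.x == 0)"""
    cc = cc_of(r0x)                            # MOVC RC.x, R0.x;
    if cc_passes(cc, "EQ"):                    # MOVC RC.x (EQ), R1.x;
        cc = cc_of(r1x)
    return cc == "EQ"

def either_zero(r0x, r1x):
    """(R0.x == 0) || (R1.x == 0)"""
    cc = cc_of(r0x)                            # MOVC RC.x, R0.x;
    if cc_passes(cc, "NE"):                    # MOVC RC.x (NE), R1.x;
        cc = cc_of(r1x)
    return cc == "EQ"
```

In the AND case the second MOVC only runs when the first test already produced EQ, so the final code is EQ only if both values are zero. In the OR case the second MOVC only runs when the first test failed, so an EQ from either instruction survives.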
How can fragment programs be used to implement non-standard texture filtering modes?

RESOLVED: As one example, consider a case where you want to do linear filtering in a 2D texture map, but only horizontally. To achieve this, first set the texture filtering mode to NEAREST. For a 16 x n texture, you might do something like:

    DEFINE halfTexel = { 0.03125, 0 };   # 1/32 (1/2 a texel)
    ADD R2, f[TEX0], -halfTexel;         # coords of left sample
    ADD R1, f[TEX0], +halfTexel;         # coords of right sample
    TEX R0, R2, TEX0, 2D;                # lookup left sample
    TEX R1, R1, TEX0, 2D;                # lookup right sample
    MUL R2.x, R2.x, 16;                  # scale X coords to texels
    FRC R2.x, R2.x;                      # get fraction, filter weight
    LRP R0, R2.x, R1, R0;                # blend samples based on weight

There are plenty of other interesting things that can be done.

Should this specification provide more examples?

RESOLVED: Yes, it should.

Is the OpenGL ARB working on a multi-vendor standard for fragment programmability? Will there be an ARB_fragment_program extension? If so, how will this extension interact with the ARB standard?

RESOLVED: Yes, as of July 2002, there was a multi-vendor working group and a draft specification. The ARB extension is expected to have several features not present in this extension, such as state tracking and global parameters (called "program environment parameters"). It will also likely lack certain features found in this extension.

Why does the HEMI mapping apply to the third component of signed HILO textures, but not to unsigned HILO textures?

RESOLVED: This behavior matches the behavior of NV_texture_shader (e.g., the DOT_PRODUCT_NV mode). The HEMI mapping will construct the third component of a unit vector whose first two components are encoded in the HILO texture.
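Returning to the horizontal-only filtering example earlier in this issues list, the arithmetic can be sanity-checked outside the GL. The Python sketch below is a hedged illustration, not an implementation of the extension: it models one row of a 16 x n texture as a list of scalars, the helper names (tex_nearest, horizontal_linear) are hypothetical, and clamping out-of-range lookups to the edge texel is an assumed wrap mode.

```python
import math

def tex_nearest(row, s):
    """NEAREST lookup: coordinate s in [0,1) selects texel floor(s * width)."""
    n = len(row)
    i = min(max(math.floor(s * n), 0), n - 1)   # assumption: clamp to edge
    return row[i]

def horizontal_linear(row, s):
    """Mirror the eight-instruction program, one instruction per line."""
    n = len(row)                         # 16 in the example
    half_texel = 0.5 / n                 # DEFINE halfTexel (1/32 when n == 16)
    left_s = s - half_texel              # ADD R2, f[TEX0], -halfTexel
    right_s = s + half_texel             # ADD R1, f[TEX0], +halfTexel
    left = tex_nearest(row, left_s)      # TEX R0, R2, TEX0, 2D
    right = tex_nearest(row, right_s)    # TEX R1, R1, TEX0, 2D
    x = left_s * n                       # MUL R2.x, R2.x, 16
    w = x - math.floor(x)                # FRC R2.x, R2.x
    return w * right + (1 - w) * left    # LRP R0, R2.x, R1, R0

# Texel i stores the value i, so interior results equal 16*s - 0.5.
ramp = [float(i) for i in range(16)]
```

At a texel center the weight is zero and the left sample is returned unchanged; halfway between two centers the two samples are averaged.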
New Procedures and Functions

    void ProgramNamedParameter4fNV(uint id, sizei len, const ubyte *name,
                                   float x, float y, float z, float w);
    void ProgramNamedParameter4dNV(uint id, sizei len, const ubyte *name,
                                   double x, double y, double z, double w);
    void ProgramNamedParameter4fvNV(uint id, sizei len, const ubyte *name,
                                    const float v[]);
    void ProgramNamedParameter4dvNV(uint id, sizei len, const ubyte *name,
                                    const double v[]);
    void GetProgramNamedParameterfvNV(uint id, sizei len, const ubyte *name,
                                      float *params);
    void GetProgramNamedParameterdvNV(uint id, sizei len, const ubyte *name,
                                      double *params);

    void ProgramLocalParameter4dARB(enum target, uint index,
                                    double x, double y, double z, double w);
    void ProgramLocalParameter4dvARB(enum target, uint index,
                                     const double *params);
    void ProgramLocalParameter4fARB(enum target, uint index,
                                    float x, float y, float z, float w);
    void ProgramLocalParameter4fvARB(enum target, uint index,
                                     const float *params);
    void GetProgramLocalParameterdvARB(enum target, uint index,
                                       double *params);
    void GetProgramLocalParameterfvARB(enum target, uint index,
                                       float *params);

New Tokens

Accepted by the cap parameter of Disable, Enable, and IsEnabled, by the pname parameter of GetBooleanv, GetIntegerv, GetFloatv, and GetDoublev, and by the target parameter of BindProgramNV, LoadProgramNV, ProgramLocalParameter4dARB, ProgramLocalParameter4dvARB, ProgramLocalParameter4fARB, ProgramLocalParameter4fvARB, GetProgramLocalParameterdvARB, and GetProgramLocalParameterfvARB:

    FRAGMENT_PROGRAM_NV                         0x8870

Accepted by the pname parameter of GetBooleanv, GetIntegerv, GetFloatv, and GetDoublev:

    MAX_TEXTURE_COORDS_NV                       0x8871
    MAX_TEXTURE_IMAGE_UNITS_NV                  0x8872
    FRAGMENT_PROGRAM_BINDING_NV                 0x8873
    MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV    0x8868

Accepted by the name parameter of GetString:

    PROGRAM_ERROR_STRING_NV                     0x8874

Additions to Chapter 2 of the OpenGL 1.2.1 Specification (OpenGL Operation)

Modify Section 2.11, Clipping (p.39)

(replace the first paragraph of the section, p.
39) Primitives are clipped to the clip volume. In clip coordinates, the view volume is defined by -w_c <= x_c <= w_c, -w_c <= y_c <= w_c, and -w_c <= z_c <= w_c. Clipping to the near and far clip planes is ignored if fragment program mode (section 3.11) or texture shaders (see NV_texture_shader specification) are enabled, if the current fragment program or texture shader computes per-fragment depth values. In this case, the view volume is defined by: -w_c <= x_c <= w_c and -w_c <= y_c <= w_c. Additions to Chapter 3 of the OpenGL 1.2.1 Specification (Rasterization) Modify Chapter 3 introduction (p. 57) (p.57, modify 1st paragraph) ... Figure 3.1 diagrams the rasterization process. The color value assigned to a fragment is initially determined by the rasterization operations (Sections 3.3 through 3.7) and modified by either the execution of the texturing, color sum, and fog operations as defined in Sections 3.8, 3.9, and 3.10, or of a fragment program defined in Section 3.11. The final depth value is initially determined by the rasterization operations and may be modified by a fragment program. note: Antialiasing Application is renumbered from Section 3.11 to Section 3.12. 
Modify Figure 3.1 (p.58)

                             Primitive Assembly
                                      |
              +-----------+-----------+-----------+-----------+
              |           |           |           |           |
              |           |           |         Pixel         |
            Point       Line       Polygon    Rectangle    Bitmap
           Raster-     Raster-     Raster-     Raster-     Raster-
           ization     ization     ization     ization     ization
              |           |           |           |           |
              +-----------+-----------+-----------+-----------+
                                      |
                                      |
                    +-----------------+-----------------+
                    |                 |                 |
              Conventional         Texture          Fragment
              Texture Fetch        Shaders          Programs
                    |                 |                 |
                    |  +--------------+                 |
                    |  |                                |
           TEXTURE_ o  o                                |
          SHADER_NV                                     |
             enable  o                                  |
                     |                                  |
              +-------------+                           |
              |             |                           |
        Conventional    Register                        |
           TexEnv       Combiners                       |
              |             |                           |
          Color Sum         |                           |
              |             |                           |
             Fog            |                           |
              |             |                           |
              |  +----------+                           |
              |  |                                      |
    REGISTER_ o  o                                      |
   COMBINERS_                                           |
    NV enable  o                                        |
               |                                        |
               +-----------------+       +--------------+
                                 |       |
                      FRAGMENT_  o       o
                       PROGRAM_
                      NV enable      o
                                     |
                                     |
                           Coverage Application
                                     |
                                     v
                          to fragment processing

Modify Section 3.3, Points (p.61)

All fragments produced in rasterizing a non-antialiased point are assigned the same associated data, which are those of the vertex corresponding to the point. (delete reference to divide by q).

If antialiasing is enabled, then ... The data associated with each fragment are otherwise the data associated with the point being rasterized. (delete reference to divide by q)

Modify Section 3.4.1, Basic Line Segment Rasterization (p.66)

(Note that t=0 at p_a and t=1 at p_b). The value of an associated datum f for the fragment, whether it be R, G, B, or A (in RGBA mode) or a color index (in color index mode), the s, t, r, or q texture coordinate, or the clip w coordinate (the depth value, window z, must be found using equation 3.3, below), is found as

            (1-t) * f_a / w_a  +  t * f_b / w_b
        f = ------------------------------------        (3.2)
                 (1-t) / w_a  +  t / w_b

where f_a and f_b are the data associated with the starting and ending endpoints of the segment, respectively; w_a and w_b are the clip w coordinates of the starting and ending endpoints of the segment, respectively. Note that linear interpolation would use

        f = (1-t) * f_a + t * f_b.                      (3.3)

...
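As a worked illustration of the difference between equations 3.2 and 3.3, the following Python sketch evaluates both for a single datum. It is not part of the specification; the function names are hypothetical and the endpoint values are made up for the example.

```python
def interp_perspective(t, f_a, f_b, w_a, w_b):
    """Equation 3.2: perspective-correct interpolation along a line segment."""
    numerator = (1 - t) * f_a / w_a + t * f_b / w_b
    denominator = (1 - t) / w_a + t / w_b
    return numerator / denominator

def interp_linear(t, f_a, f_b):
    """Equation 3.3: plain linear interpolation."""
    return (1 - t) * f_a + t * f_b

# Midpoint of a segment whose far endpoint has three times the clip w:
# the perspective-correct value is pulled toward the near endpoint.
mid_correct = interp_perspective(0.5, 0.0, 1.0, 1.0, 3.0)   # about 0.25
mid_linear = interp_linear(0.5, 0.0, 1.0)                   # 0.5
```

When w_a equals w_b the two equations agree, which is why the approximation is only harmful for primitives that span a range of clip w values.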
A GL implementation may choose to approximate equation 3.2 with 3.3, but this will normally lead to unacceptable distortion effects when interpolating texture coordinates or clip w coordinates.

Modify Section 3.5.1, Basic Polygon Rasterization (p.71)

Denote a datum at p_a, p_b, or p_c ... is given by

            a * f_a / w_a  +  b * f_b / w_b  +  c * f_c / w_c
        f = --------------------------------------------------        (3.4)
                    a / w_a  +  b / w_b  +  c / w_c

where w_a, w_b, and w_c are the clip w coordinates of p_a, p_b, and p_c, respectively. a, b, and c are the barycentric coordinates of the fragment for which the data are produced. a, b, and c must correspond precisely to the exact coordinates ... at the fragment's center.

Just as with line segment rasterization, equation 3.4 may be approximated by

        f = a * f_a + b * f_b + c * f_c;                              (3.5)

this may yield ... for texture coordinates or clip w coordinates.

Modify Section 3.6.4, Rasterization of Pixel Rectangles (p.100)

A fragment arising from a group ... are given by those associated with the current raster position. (delete reference to divide by q)

Modify Section 3.7, Bitmaps (p.111)

Otherwise, a rectangular array ... The associated data for each fragment are those associated with the current raster position. (delete reference to divide by q) Once the fragments have been produced ...

Modify Section 3.8, Texturing (p.112)

... an image at the location indicated by a fragment's texture coordinates to modify the fragment's primary RGBA color. Texturing does not affect the secondary color.

Texturing is specified only for RGBA mode; its use in color index mode is undefined.

Except when in fragment program mode (Section 3.11), the (s,t,r) texture coordinates used for texturing are the values s/q, t/q, and r/q, respectively, where s, t, r, and q are the texture coordinates associated with the fragment. When in fragment program mode, the (s,t,r) texture coordinates are specified by the program.
If q is less than or equal to zero, the results of texturing are undefined. Add new Section 3.11, Fragment Programs (p.140) Fragment program mode is enabled and disabled with the Enable and Disable commands using the symbolic constant FRAGMENT_PROGRAM_NV. When fragment program mode is enabled, standard and extended texturing, color sum, and fog application stages are ignored and a general purpose program is executed instead. A fragment program is a sequence of instructions that execute on a per-fragment basis. In fragment program mode, the currently bound fragment program is executed as each fragment is generated by the rasterization operations. Fragment programs execute a finite fixed sequence of instructions with no branching or looping, and operate independently from the processing of other fragments. Fragment programs are used to compute new color values to be associated with each fragment, and can optionally compute a new depth value for each fragment as well. Fragment program mode is not available in color index mode and is considered disabled, regardless of the state of FRAGMENT_PROGRAM_NV. When fragment program mode is enabled, texture shaders and register combiners (NV_texture_shader and NV_register_combiners extension) are disabled, regardless of the state of TEXTURE_SHADER_NV and REGISTER_COMBINERS_NV. Section 3.11.1, Fragment Program Registers Fragment programs operate on a set of program registers. Each program register is a 4-component vector, whose components are referred to as "x", "y", "z", and "w" respectively. The components of a fragment register are always referred to in this manner, regardless of the meaning of their contents. The four components of each fragment program register have one of two different representations: 32-bit floating-point (fp32) or 16-bit floating-point (fp16). More details on these representations can be found in Section 3.11.4.1. There are several different classes of program registers. 
Attribute registers (Table X.1) correspond to the fragment's associated data produced by rasterization. Temporary registers (Table X.2) hold intermediate results generated by the fragment program. Output registers (Table X.3) hold the final results of a fragment program. The single condition code register is used to mask writes to other registers or to determine if a fragment should be discarded.

Section 3.11.1.1, Fragment Program Attribute Registers

The fragment program attribute registers (Table X.1) hold the location of the fragment and the data associated with the fragment produced by rasterization.

    Fragment Attribute                                         Component
    Register Name       Description                            Interpretation
    --------------      -----------------------------------    --------------
    f[WPOS]             Position of the fragment center.       (x,y,z,1/w)
    f[COL0]             Interpolated primary color             (r,g,b,a)
    f[COL1]             Interpolated secondary color           (r,g,b,a)
    f[FOGC]             Interpolated fog distance/coord        (z,0,0,0)
    f[TEX0]             Texture coordinate (unit 0)            (s,t,r,q)
    f[TEX1]             Texture coordinate (unit 1)            (s,t,r,q)
    f[TEX2]             Texture coordinate (unit 2)            (s,t,r,q)
    f[TEX3]             Texture coordinate (unit 3)            (s,t,r,q)
    f[TEX4]             Texture coordinate (unit 4)            (s,t,r,q)
    f[TEX5]             Texture coordinate (unit 5)            (s,t,r,q)
    f[TEX6]             Texture coordinate (unit 6)            (s,t,r,q)
    f[TEX7]             Texture coordinate (unit 7)            (s,t,r,q)

    Table X.1: Fragment Attribute Registers. The component interpretation column describes the mapping of attribute values to register components. For example, the "x" component of f[COL0] holds the red color component, and the "x" component of f[TEX0] holds the "s" texture coordinate for texture unit 0. The entries "0" and "1" indicate that the attribute register components hold the constants 0 and 1, respectively.

f[WPOS].x and f[WPOS].y hold the (x,y) window coordinates of the fragment center, relative to the lower left corner of the window. f[WPOS].z holds the associated z window coordinate, normally in the range [0,1].
f[WPOS].w holds the reciprocal of the associated clip w coordinate.

f[COL0] and f[COL1] hold the associated RGBA primary and secondary colors of the fragment, respectively.

f[FOGC] holds the associated eye distance or fog coordinate normally used for fog computations.

f[TEX0] through f[TEX7] hold the associated texture coordinates for texture coordinate sets 0 through 7, respectively.

All attribute register components are treated as 32-bit floats. However, the components of primary and secondary colors (f[COL0] and f[COL1]) may be generated with reduced precision.

The contents of the fragment attribute registers may not be modified by a fragment program. In addition, each fragment program instruction can use at most one unique attribute register.

Section 3.11.1.2, Fragment Program Temporary Registers

The fragment temporary registers (Table X.2) hold intermediate values used during the execution of a fragment program. There are 96 temporary register names, but not all can be used simultaneously.

    Fragment Temporary
    Register Name       Description
    ------------------  -----------------------------------------------------
    R0-R31              Four 32-bit (fp32) floating point values (s.e8.m23)
    H0-H63              Four 16-bit (fp16) floating point values (s.e5.m10)

    Table X.2: Fragment Temporary Registers.

In addition to the normal temporary registers, there are two temporary pseudo-registers, "RC" and "HC". RC and HC are treated as unnumbered, write-only temporary registers. The components of RC have a fp32 data type; the components of HC have a fp16 data type. The sole purpose of these registers is to permit instructions to modify the condition code register (section 3.11.1.4) without overwriting the values in any temporary register.

Fragment program instructions can read and write temporary registers. There is no restriction on the number of temporary registers that can be accessed by any given instruction.

All temporary registers are initialized to (0,0,0,0) each time a fragment program executes.
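To give a feel for the precision difference between the R (fp32, s.e8.m23) and H (fp16, s.e5.m10) temporaries, the Python sketch below rounds a value through each storage width. It is only an approximation of the hardware behavior: it assumes the IEEE 754 binary32 and binary16 encodings provided by Python's struct module stand in for the fp32 and fp16 formats, and the helper names are hypothetical. Rounding and special-case handling on real hardware may differ.

```python
import struct

def to_fp32(x):
    """Round a Python float to the nearest 32-bit value (s.e8.m23)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_fp16(x):
    """Round a Python float to the nearest 16-bit value (s.e5.m10)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# A 10-bit mantissa resolves integers exactly only up to 2048,
# so 2049 cannot be held in an H register without rounding.
h_value = to_fp16(2049.0)
r_value = to_fp32(2049.0)

# Error introduced by storing 0.1 at each precision.
h_error = abs(to_fp16(0.1) - 0.1)
r_error = abs(to_fp32(0.1) - 0.1)
```

This is one reason to keep texture coordinates and other wide-range quantities in R registers and reserve H registers for values such as colors.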
Section 3.11.1.3, Fragment Program Output Registers

The fragment program output registers hold the final results of the fragment program. The possible final results of a fragment program are a high- or low-precision RGBA fragment color, and a fragment depth value.

    Output
    Register Name   Description
    -------------   -------------------------------------------------------
    o[COLR]         Final RGBA fragment color, fp32 format
    o[COLH]         Final RGBA fragment color, fp16 format
    o[DEPR]         Final fragment depth value, fp32 format

    Table X.3: Fragment Program Output Registers.

o[COLR] and o[COLH] specify the color of a fragment. These two registers are identical, except for the associated data type of the components. The R, G, B, and A components of the fragment color are taken from the x, y, z, and w components, respectively, of o[COLR] or o[COLH]. A fragment program will fail to load if it writes to both o[COLR] and o[COLH].

o[DEPR] can be used to replace the associated depth value of a fragment. The new depth value is taken from the z component of o[DEPR]. If a fragment program does not write to o[DEPR], the associated depth value is unmodified.

A fragment program will fail to load if it does not write to at least one output register.

The fragment program output registers may not be read by a fragment program, but may be written to multiple times.

The values of all fragment program output registers are initially undefined.

Section 3.11.1.4, Fragment Program Condition Code Register

The condition code register (CC) is a single four-component vector. Each component of this register is one of four enumerated values: GT (greater than), EQ (equal), LT (less than), or UN (unordered). The condition code register can be used to mask writes to fragment data register components or to terminate processing of a fragment altogether (via the KIL instruction).

Most fragment program instructions can optionally update the condition code register.
When a fragment program instruction updates the condition code register, a condition code component is set to LT if the corresponding component of the result vector is less than zero, EQ if it is equal to zero, GT if it is greater than zero, and UN if it is NaN (not a number).

The condition code register is initialized to a vector of EQ values each time a fragment program executes.

Section 3.11.2, Fragment Program Parameters

In addition to using the registers defined in Section 3.11.1, fragment programs may also use fragment program parameters in their computation. Fragment program parameters are constant during the execution of fragment programs, but some parameters may be modified outside the execution of a fragment program.

There are five different types of program parameters: embedded scalar constants, embedded vector constants, named constants, named local parameters, and numbered local parameters.

Embedded scalar constants are written as standard floating-point numbers with an optional sign designator ("+" or "-") and optional scientific notation (e.g., "E+06", meaning "times 10^6").

Embedded vector constants are written as a comma-separated array of one to four scalar constants, surrounded by braces (like a C/C++ array initializer). Vector constants are always treated as 4-component vectors: constants with fewer than four components are expanded to four components by filling missing y and z components with 0.0 and missing w components with 1.0. Thus, the vector constant "{2}" is equivalent to "{2,0,0,1}", "{3,4}" is equivalent to "{3,4,0,1}", and "{5,6,7}" is equivalent to "{5,6,7,1}".

Named constants allow fragment program instructions to define scalar or vector constants that can be referenced by name. Named constants are created using the DEFINE instruction:

    DEFINE pi = 3.1415926535;
    DEFINE color = {0.2, 0.5, 0.8, 1.0};

The DEFINE instruction associates a constant name with a scalar or vector constant value.
Subsequent fragment program instructions that use the constant name are equivalent to those using the corresponding constant value.

Named local parameters are similar to named vector constants, but their values can be modified after the program is loaded. Local parameters are created using the DECLARE instruction:

    DECLARE fog_color1;
    DECLARE fog_color2 = {0.3, 0.6, 0.9, 0.1};

The DECLARE instruction creates a 4-component vector associated with the local parameter name. Subsequent fragment program instructions referencing the local parameter name are processed as though the current value of the local parameter vector were specified instead of the parameter name. A DECLARE instruction can optionally specify an initial value for the local parameter, which can be either a scalar or vector constant. Scalar constants are expanded to 4-component vectors by replicating the scalar value in each component. The initial value of local parameters not initialized by the program is (0,0,0,0).

A named local parameter for a specific program can be updated using the calls ProgramNamedParameter4fNV or ProgramNamedParameter4fvNV (section 5.7). Named local parameters are accessible only by the program in which they are defined. Modifying a local parameter affects only the associated program and does not affect local parameters with the same name that are found in any other fragment program.

Numbered local parameters are similar to named local parameters, except that they are referred to by number and are not declared in fragment programs. Each fragment program object has an array of four-component floating-point vectors that can be used by the program. The number of vectors is given by the implementation-dependent constant MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV, and must be at least 64. Numbered local parameters are accessed by a fragment program as members of an array called "p".
For example, the instruction

    MOV R0, p[31];

copies the contents of numbered local parameter 31 into temporary register R0.

Constant and local parameter names can be arbitrary strings consisting of letters (upper or lower-case), numbers, underscores ("_"), and dollar signs ("$"). Keywords defined in the grammar (including instruction names) cannot be used as constant names, nor can strings that start with numbers, or strings that specify valid temporary register or texture numbers (e.g., "R0"-"R31", "H0"-"H63", "TEX0"-"TEX15"). A fragment program will fail to load if a DEFINE or DECLARE instruction specifies an invalid constant or local parameter name.

A fragment program will fail to load if an instruction contains a named parameter not specified in a previous DEFINE or DECLARE instruction. A fragment program will also fail to load if a DEFINE or DECLARE instruction attempts to re-define a named parameter specified in a previous DEFINE or DECLARE instruction.

The contents of the fragment program parameters may not be modified by a fragment program. In addition, each fragment program instruction can normally use at most one unique program parameter. The only exception to this rule is if all program parameter references specify named or embedded constants that taken together contain no more than four unique scalar values. For such instructions, the GL will automatically generate an equivalent instruction that references a single merged vector constant. This merging allows programs to specify instructions like the following:

    Instruction               Equivalent Instruction
    ---------------------     ---------------------------------------
    MAD R0, R1, 2, -1;        MAD R0, R1, {2,-1,0,0}.x, {2,-1,0,0}.y;
    ADD R0, {1,2,3,4}, 4;     ADD R0, {1,2,3,4}.xyzw, {1,2,3,4}.w;

Before counting the number of unique values, any named constants are first converted to the equivalent embedded constants.
When generating a combined vector constant, the GL does not perform swizzling, component selection, negation, or absolute value operations. The following instructions are invalid, as they contain more than four unique scalar values.

    Invalid Instructions
    -----------------------------------
    ADD R0, {1,2,3,4}, -4;
    ADD R0, {1,2,3,4}, |-4|;
    ADD R0, {1,2,3,4}, -{-1,-2,-3,-4};
    ADD R0, {1,2,3,4}, {4,5,6,7}.x;

Section 3.11.3, Fragment Program Specification

Fragment programs are specified as an array of ubytes. The array is a string of ASCII characters encoding the program.

The command LoadProgramNV loads a fragment program when the target parameter is FRAGMENT_PROGRAM_NV. The command BindProgramNV enables a fragment program for execution.

At program load time, the program is parsed into a set of tokens possibly separated by white space. Spaces, tabs, newlines, carriage returns, and comments are considered whitespace. Comments begin with the character "#" and are terminated by a newline, a carriage return, or the end of the program array. Fragment programs are case-sensitive -- upper and lower case letters are treated differently. The proper choice of case can be inferred from the grammar.

The Backus-Naur Form (BNF) grammar below specifies the syntactically valid sequences for fragment programs. The set of valid tokens can be inferred from the grammar. The token "" represents an empty string and is used to indicate optional rules. A program is invalid if it contains any undefined tokens or characters.
::= "END" ::= "!!FP1.0" ::= | ::= ";" | ";" | ";" ::= | | | | | | | ::= "," ::= "DDX" | "DDX_SAT" | "DDXR" | "DDXR_SAT" | "DDXH" | "DDXH_SAT" | "DDXC" | "DDXC_SAT" | "DDXRC" | "DDXRC_SAT" | "DDXHC" | "DDXHC_SAT" | "DDY" | "DDY_SAT" | "DDYR" | "DDYR_SAT" | "DDYH" | "DDYH_SAT" | "DDYC" | "DDYC_SAT" | "DDYRC" | "DDYRC_SAT" | "DDYHC" | "DDYHC_SAT" | "FLR" | "FLR_SAT" | "FLRR" | "FLRR_SAT" | "FLRH" | "FLRH_SAT" | "FLRX" | "FLRX_SAT" | "FLRC" | "FLRC_SAT" | "FLRRC" | "FLRRC_SAT" | "FLRHC" | "FLRHC_SAT" | "FLRXC" | "FLRXC_SAT" | "FRC" | "FRC_SAT" | "FRCR" | "FRCR_SAT" | "FRCH" | "FRCH_SAT" | "FRCX" | "FRCX_SAT" | "FRCC" | "FRCC_SAT" | "FRCRC" | "FRCRC_SAT" | "FRCHC" | "FRCHC_SAT" | "FRCXC" | "FRCXC_SAT" | "LIT" | "LIT_SAT" | "LITR" | "LITR_SAT" | "LITH" | "LITH_SAT" | "LITC" | "LITC_SAT" | "LITRC" | "LITRC_SAT" | "LITHC" | "LITHC_SAT" | "MOV" | "MOV_SAT" | "MOVR" | "MOVR_SAT" | "MOVH" | "MOVH_SAT" | "MOVX" | "MOVX_SAT" | "MOVC" | "MOVC_SAT" | "MOVRC" | "MOVRC_SAT" | "MOVHC" | "MOVHC_SAT" | "MOVXC" | "MOVXC_SAT" | "PK2H" | "PK2US" | "PK4B" | "PK4UB" ::= "," ::= "COS" | "COS_SAT" | "COSR" | "COSR_SAT" | "COSH" | "COSH_SAT" | "COSC" | "COSC_SAT" | "COSRC" | "COSRC_SAT" | "COSHC" | "COSHC_SAT" | "EX2" | "EX2_SAT" | "EX2R" | "EX2R_SAT" | "EX2H" | "EX2H_SAT" | "EX2C" | "EX2C_SAT" | "EX2RC" | "EX2RC_SAT" | "EX2HC" | "EX2HC_SAT" | "LG2" | "LG2_SAT" | "LG2R" | "LG2R_SAT" | "LG2H" | "LG2H_SAT" | "LG2C" | "LG2C_SAT" | "LG2RC" | "LG2RC_SAT" | "LG2HC" | "LG2HC_SAT" | "RCP" | "RCP_SAT" | "RCPR" | "RCPR_SAT" | "RCPH" | "RCPH_SAT" | "RCPC" | "RCPC_SAT" | "RCPRC" | "RCPRC_SAT" | "RCPHC" | "RCPHC_SAT" | "RSQ" | "RSQ_SAT" | "RSQR" | "RSQR_SAT" | "RSQH" | "RSQH_SAT" | "RSQC" | "RSQC_SAT" | "RSQRC" | "RSQRC_SAT" | "RSQHC" | "RSQHC_SAT" | "SIN" | "SIN_SAT" | "SINR" | "SINR_SAT" | "SINH" | "SINH_SAT" | "SINC" | "SINC_SAT" | "SINRC" | "SINRC_SAT" | "SINHC" | "SINHC_SAT" | "UP2H" | "UP2H_SAT" | "UP2HC" | "UP2HC_SAT" | "UP2US" | "UP2US_SAT" | "UP2USC" | "UP2USC_SAT" | "UP4B" | "UP4B_SAT" | "UP4BC" 
| "UP4BC_SAT" | "UP4UB" | "UP4UB_SAT" | "UP4UBC" | "UP4UBC_SAT" ::= "," "," ::= "POW" | "POW_SAT" | "POWR" | "POWR_SAT" | "POWH" | "POWH_SAT" | "POWC" | "POWC_SAT" | "POWRC" | "POWRC_SAT" | "POWHC" | "POWHC_SAT" ::= "," "," ::= "ADD" | "ADD_SAT" | "ADDR" | "ADDR_SAT" | "ADDH" | "ADDH_SAT" | "ADDX" | "ADDX_SAT" | "ADDC" | "ADDC_SAT" | "ADDRC" | "ADDRC_SAT" | "ADDHC" | "ADDHC_SAT" | "ADDXC" | "ADDXC_SAT" | "DP3" | "DP3_SAT" | "DP3R" | "DP3R_SAT" | "DP3H" | "DP3H_SAT" | "DP3X" | "DP3X_SAT" | "DP3C" | "DP3C_SAT" | "DP3RC" | "DP3RC_SAT" | "DP3HC" | "DP3HC_SAT" | "DP3XC" | "DP3XC_SAT" | "DP4" | "DP4_SAT" | "DP4R" | "DP4R_SAT" | "DP4H" | "DP4H_SAT" | "DP4X" | "DP4X_SAT" | "DP4C" | "DP4C_SAT" | "DP4RC" | "DP4RC_SAT" | "DP4HC" | "DP4HC_SAT" | "DP4XC" | "DP4XC_SAT" | "DST" | "DST_SAT" | "DSTR" | "DSTR_SAT" | "DSTH" | "DSTH_SAT" | "DSTC" | "DSTC_SAT" | "DSTRC" | "DSTRC_SAT" | "DSTHC" | "DSTHC_SAT" | "MAX" | "MAX_SAT" | "MAXR" | "MAXR_SAT" | "MAXH" | "MAXH_SAT" | "MAXX" | "MAXX_SAT" | "MAXC" | "MAXC_SAT" | "MAXRC" | "MAXRC_SAT" | "MAXHC" | "MAXHC_SAT" | "MAXXC" | "MAXXC_SAT" | "MIN" | "MIN_SAT" | "MINR" | "MINR_SAT" | "MINH" | "MINH_SAT" | "MINX" | "MINX_SAT" | "MINC" | "MINC_SAT" | "MINRC" | "MINRC_SAT" | "MINHC" | "MINHC_SAT" | "MINXC" | "MINXC_SAT" | "MUL" | "MUL_SAT" | "MULR" | "MULR_SAT" | "MULH" | "MULH_SAT" | "MULX" | "MULX_SAT" | "MULC" | "MULC_SAT" | "MULRC" | "MULRC_SAT" | "MULHC" | "MULHC_SAT" | "MULXC" | "MULXC_SAT" | "RFL" | "RFL_SAT" | "RFLR" | "RFLR_SAT" | "RFLH" | "RFLH_SAT" | "RFLC" | "RFLC_SAT" | "RFLRC" | "RFLRC_SAT" | "RFLHC" | "RFLHC_SAT" | "SEQ" | "SEQ_SAT" | "SEQR" | "SEQR_SAT" | "SEQH" | "SEQH_SAT" | "SEQX" | "SEQX_SAT" | "SEQC" | "SEQC_SAT" | "SEQRC" | "SEQRC_SAT" | "SEQHC" | "SEQHC_SAT" | "SEQXC" | "SEQXC_SAT" | "SFL" | "SFL_SAT" | "SFLR" | "SFLR_SAT" | "SFLH" | "SFLH_SAT" | "SFLX" | "SFLX_SAT" | "SFLC" | "SFLC_SAT" | "SFLRC" | "SFLRC_SAT" | "SFLHC" | "SFLHC_SAT" | "SFLXC" | "SFLXC_SAT" | "SGE" | "SGE_SAT" | "SGER" | "SGER_SAT" | "SGEH" | "SGEH_SAT" | 
"SGEX" | "SGEX_SAT" | "SGEC" | "SGEC_SAT" | "SGERC" | "SGERC_SAT" | "SGEHC" | "SGEHC_SAT" | "SGEXC" | "SGEXC_SAT" | "SGT" | "SGT_SAT" | "SGTR" | "SGTR_SAT" | "SGTH" | "SGTH_SAT" | "SGTX" | "SGTX_SAT" | "SGTC" | "SGTC_SAT" | "SGTRC" | "SGTRC_SAT" | "SGTHC" | "SGTHC_SAT" | "SGTXC" | "SGTXC_SAT" | "SLE" | "SLE_SAT" | "SLER" | "SLER_SAT" | "SLEH" | "SLEH_SAT" | "SLEX" | "SLEX_SAT" | "SLEC" | "SLEC_SAT" | "SLERC" | "SLERC_SAT" | "SLEHC" | "SLEHC_SAT" | "SLEXC" | "SLEXC_SAT" | "SLT" | "SLT_SAT" | "SLTR" | "SLTR_SAT" | "SLTH" | "SLTH_SAT" | "SLTX" | "SLTX_SAT" | "SLTC" | "SLTC_SAT" | "SLTRC" | "SLTRC_SAT" | "SLTHC" | "SLTHC_SAT" | "SLTXC" | "SLTXC_SAT" | "SNE" | "SNE_SAT" | "SNER" | "SNER_SAT" | "SNEH" | "SNEH_SAT" | "SNEX" | "SNEX_SAT" | "SNEC" | "SNEC_SAT" | "SNERC" | "SNERC_SAT" | "SNEHC" | "SNEHC_SAT" | "SNEXC" | "SNEXC_SAT" | "STR" | "STR_SAT" | "STRR" | "STRR_SAT" | "STRH" | "STRH_SAT" | "STRX" | "STRX_SAT" | "STRC" | "STRC_SAT" | "STRRC" | "STRRC_SAT" | "STRHC" | "STRHC_SAT" | "STRXC" | "STRXC_SAT" | "SUB" | "SUB_SAT" | "SUBR" | "SUBR_SAT" | "SUBH" | "SUBH_SAT" | "SUBX" | "SUBX_SAT" | "SUBC" | "SUBC_SAT" | "SUBRC" | "SUBRC_SAT" | "SUBHC" | "SUBHC_SAT" | "SUBXC" | "SUBXC_SAT" ::= "," "," "," ::= "MAD" | "MAD_SAT" | "MADR" | "MADR_SAT" | "MADH" | "MADH_SAT" | "MADX" | "MADX_SAT" | "MADC" | "MADC_SAT" | "MADRC" | "MADRC_SAT" | "MADHC" | "MADHC_SAT" | "MADXC" | "MADXC_SAT" | "LRP" | "LRP_SAT" | "LRPR" | "LRPR_SAT" | "LRPH" | "LRPH_SAT" | "LRPX" | "LRPX_SAT" | "LRPC" | "LRPC_SAT" | "LRPRC" | "LRPRC_SAT" | "LRPHC" | "LRPHC_SAT" | "LRPXC" | "LRPXC_SAT" | "X2D" | "X2D_SAT" | "X2DR" | "X2DR_SAT" | "X2DH" | "X2DH_SAT" | "X2DC" | "X2DC_SAT" | "X2DRC" | "X2DRC_SAT" | "X2DHC" | "X2DHC_SAT" ::= ::= "KIL" ::= "," "," ::= "TEX" | "TEX_SAT" | "TEXC" | "TEXC_SAT" | "TXP" | "TXP_SAT" | "TXPC" | "TXPC_SAT" ::= "," "," "," "," ::= "TXD" | "TXD_SAT" | "TXDC" | "TXDC_SAT" ::= | ::= "|" "|" ::= | | | | | ::= | ::= "|" "|" ::= | | | | | | | | | ::= ::= | | "RC" | "HC" ::= "(" ")" | "" ::= 
| ::= "EQ" | "GE" | "GT" | "LE" | "LT" | "NE" | "TR" | "FL" ::= "" | "." "x" | "." "y" | "." "x" "y" | "." "z" | "." "x" "z" | "." "y" "z" | "." "x" "y" "z" | "." "w" | "." "x" "w" | "." "y" "w" | "." "x" "y" "w" | "." "z" "w" | "." "x" "z" "w" | "." "y" "z" "w" | "." "x" "y" "z" "w" ::= | ::= "f" "[" "]" ::= "WPOS" | "COL0" | "COL1" | "FOGC" | "TEX0" | "TEX1" | "TEX2" | "TEX3" | "TEX4" | "TEX5" | "TEX6" | "TEX7" ::= | ::= "R0" | "R1" | "R2" | "R3" | "R4" | "R5" | "R6" | "R7" | "R8" | "R9" | "R10" | "R11" | "R12" | "R13" | "R14" | "R15" | "R16" | "R17" | "R18" | "R19" | "R20" | "R21" | "R22" | "R23" | "R24" | "R25" | "R26" | "R27" | "R28" | "R29" | "R30" | "R31" ::= "H0" | "H1" | "H2" | "H3" | "H4" | "H5" | "H6" | "H7" | "H8" | "H9" | "H10" | "H11" | "H12" | "H13" | "H14" | "H15" | "H16" | "H17" | "H18" | "H19" | "H20" | "H21" | "H22" | "H23" | "H24" | "H25" | "H26" | "H27" | "H28" | "H29" | "H30" | "H31" | "H32" | "H33" | "H34" | "H35" | "H36" | "H37" | "H38" | "H39" | "H40" | "H41" | "H42" | "H43" | "H44" | "H45" | "H46" | "H47" | "H48" | "H49" | "H50" | "H51" | "H52" | "H53" | "H54" | "H55" | "H56" | "H57" | "H58" | "H59" | "H60" | "H61" | "H62" | "H63" ::= "o" "[" "]" ::= "COLR" | "COLH" | "DEPR" ::= "p" "[" "]" ::= from 0 to MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV - 1 ::= "." ::= "" | "." 
::= "x" | "y" | "z" | "w" ::= "," ::= "TEX0" | "TEX1" | "TEX2" | "TEX3" | "TEX4" | "TEX5" | "TEX6" | "TEX7" | "TEX8" | "TEX9" | "TEX10" | "TEX11" | "TEX12" | "TEX13" | "TEX14" | "TEX15" ::= "1D" | "2D" | "3D" | "CUBE" | "RECT" ::= "DEFINE" "=" | "DEFINE" "=" ::= "DECLARE" ::= "" | "=" | "=" ::= "{" "}" | ::= | "," | "," "," | "," "," "," ::= | ::= ::= ((name of a scalar constant in a DEFINE instruction)) ::= ((name of a vector constant in a DEFINE instruction)) ::= ((name of a local parameter in a DECLARE instruction)) ::= "-" | "+" | "" ::= "-" | "+" | "" ::= see text below ::= see text below

The rule matches a sequence of one or more letters ("A" through "Z", "a" through "z", "_", and "$") and digits ("0" through "9"); the first character must be a letter. The underscore ("_") and dollar sign ("$") count as letters. Upper and lower case letters are different (names are case-sensitive).

The rule matches a floating-point constant consisting of an integer part, a decimal point, a fraction part, an "e" or "E", and an optionally signed integer exponent. The integer and fraction parts both consist of a sequence of one or more digits ("0" through "9"). Either the integer part or the fraction part (not both) may be missing; either the decimal point or the "e" (or "E") and the exponent (not both) may be missing.

A fragment program fails to load if it contains more than the maximum number of executable instructions. If ARB_fragment_program is supported, this limit is the value of MAX_PROGRAM_INSTRUCTIONS_ARB for the FRAGMENT_PROGRAM_ARB target. Otherwise, the limit is 1024. Executable instructions are those matching the rule in the grammar, and do not include DEFINE or DECLARE instructions.

A fragment program fails to load if its total temporary and output register count exceeds 64.
Each fp32 temporary or output register used by the program (R0-R31, o[COLR], and o[DEPR]) counts as two registers; each fp16 temporary or output register used by the program (H0-H63 and o[COLH]) counts as a single register.

A fragment program fails to load if any instruction sources more than one unique fragment attribute register. Instructions sourcing the same attribute register multiple times are acceptable.

A fragment program fails to load if any instruction sources more than one unique program parameter register. Instructions sourcing the same program parameter multiple times are acceptable.

A fragment program fails to load if multiple texture lookup instructions reference different targets for the same texture image unit.

A fragment program fails to load if it writes to both the o[COLR] and o[COLH] output registers.

The error INVALID_OPERATION is generated by LoadProgramNV if a fragment program fails to load because it is not syntactically correct or for one of the semantic restrictions listed above.

The error INVALID_OPERATION is generated by LoadProgramNV if a program is loaded for id when id is currently loaded with a program of a different target.

A successfully loaded fragment program is parsed into a sequence of instructions. Each instruction is identified by its tokenized name. The operation of these instructions when executed is defined in Sections 3.11.4 and 3.11.5.

Section 3.11.4, Fragment Program Operation

There are forty-five fragment program instructions. Fragment program instructions may have up to sixteen variants, formed by combining a suffix of "R", "H", or "X" to specify arithmetic precision (section 3.11.4.2), a suffix of "C" to allow an update of the condition code register (section 3.11.4.4), and a suffix of "_SAT" to clamp the result vector components to the range [0,1] (section 3.11.4.4).
For example, the sixteen forms of the "ADD" instruction are "ADD", "ADDR", "ADDH", "ADDX", "ADDC", "ADDRC", "ADDHC", "ADDXC", "ADD_SAT", "ADDR_SAT", "ADDH_SAT", "ADDX_SAT", "ADDC_SAT", "ADDRC_SAT", "ADDHC_SAT", and "ADDXC_SAT". Some mathematical instructions that support precision suffixes, typically those that involve complicated floating-point computations, do not support the "X" precision suffix.

The fragment program instructions and their respective input and output parameters are summarized in Table X.4.

    Instruction          Inputs  Output  Description
    -----------------    ------  ------  --------------------------------
    ADD[RHX][C][_SAT]    v,v     v       add
    COS[RH ][C][_SAT]    s       ssss    cosine
    DDX[RH ][C][_SAT]    v       v       derivative relative to x
    DDY[RH ][C][_SAT]    v       v       derivative relative to y
    DP3[RHX][C][_SAT]    v,v     ssss    3-component dot product
    DP4[RHX][C][_SAT]    v,v     ssss    4-component dot product
    DST[RH ][C][_SAT]    v,v     v       distance vector
    EX2[RH ][C][_SAT]    s       ssss    exponential base 2
    FLR[RHX][C][_SAT]    v       v       floor
    FRC[RHX][C][_SAT]    v       v       fraction
    KIL                  none    none    conditionally discard fragment
    LG2[RH ][C][_SAT]    s       ssss    logarithm base 2
    LIT[RH ][C][_SAT]    v       v       compute light coefficients
    LRP[RHX][C][_SAT]    v,v,v   v       linear interpolation
    MAD[RHX][C][_SAT]    v,v,v   v       multiply and add
    MAX[RHX][C][_SAT]    v,v     v       maximum
    MIN[RHX][C][_SAT]    v,v     v       minimum
    MOV[RHX][C][_SAT]    v       v       move
    MUL[RHX][C][_SAT]    v,v     v       multiply
    PK2H                 v       ssss    pack two 16-bit floats
    PK2US                v       ssss    pack two unsigned 16-bit scalars
    PK4B                 v       ssss    pack four signed 8-bit scalars
    PK4UB                v       ssss    pack four unsigned 8-bit scalars
    POW[RH ][C][_SAT]    s,s     ssss    exponentiation (x^y)
    RCP[RH ][C][_SAT]    s       ssss    reciprocal
    RFL[RH ][C][_SAT]    v,v     v       reflection vector
    RSQ[RH ][C][_SAT]    s       ssss    reciprocal square root
    SEQ[RHX][C][_SAT]    v,v     v       set on equal
    SFL[RHX][C][_SAT]    v,v     v       set on false
    SGE[RHX][C][_SAT]    v,v     v       set on greater than or equal
    SGT[RHX][C][_SAT]    v,v     v       set on greater than
    SIN[RH ][C][_SAT]    s       ssss    sine
    SLE[RHX][C][_SAT]    v,v     v       set on less than or equal
    SLT[RHX][C][_SAT]    v,v     v       set on less than
    SNE[RHX][C][_SAT]    v,v     v       set on not equal
    STR[RHX][C][_SAT]    v,v     v       set on true
    SUB[RHX][C][_SAT]    v,v     v       subtract
    TEX[C][_SAT]         v       v       texture lookup
    TXD[C][_SAT]         v,v,v   v       texture lookup w/partials
    TXP[C][_SAT]         v       v       projective texture lookup
    UP2H[C][_SAT]        s       v       unpack two 16-bit floats
    UP2US[C][_SAT]       s       v       unpack two unsigned 16-bit scalars
    UP4B[C][_SAT]        s       v       unpack four signed 8-bit scalars
    UP4UB[C][_SAT]       s       v       unpack four unsigned 8-bit scalars
    X2D[RH ][C][_SAT]    v,v,v   v       2D coordinate transformation

    Table X.4: Summary of fragment program instructions. "[RHX]" indicates an optional arithmetic precision suffix. "[C]" indicates an optional condition code update suffix. "[_SAT]" indicates an optional clamp of result vector components to [0,1]. "v" indicates a 4-component vector input or output, "s" indicates a scalar input, and "ssss" indicates a scalar output replicated across a 4-component vector.

Section 3.11.4.1: Fragment Program Storage Precision

Registers in fragment programs are stored in two different representations: 16-bit floating-point (fp16) and 32-bit floating-point (fp32). There is an additional 12-bit fixed-point representation (fx12) used only as an internal representation for instructions with the "X" precision qualifier.

In the 32-bit float (fp32) representation, each component is represented in floating-point with eight exponent and twenty-three mantissa bits, as in the standard IEEE single-precision format. If S represents the sign (0 or 1), E represents the exponent in the range [0,255], and M represents the mantissa in the range [0,2^23-1], then a fp32 float is decoded as:

    (-1)^S * 0.0,                        if E == 0,
    (-1)^S * 2^(E-127) * (1 + M/2^23),   if 0 < E < 255,
    (-1)^S * INF,                        if E == 255 and M == 0,
    NaN,                                 if E == 255 and M != 0.

INF (Infinity) is a special representation indicating numerical overflow. NaN (Not a Number) is a special representation indicating the result of illegal arithmetic operations, such as computing the square root or logarithm of a negative number.
Note that all normal fp32 values, zero, and INF have an associated sign. -0.0 and +0.0 are considered equivalent for the purposes of comparisons.

This representation is identical to the IEEE single-precision floating-point standard, except that no special representation is provided for denorms -- numbers in the range (-2^-126, +2^-126). All such numbers are flushed to zero.

In a 16-bit float (fp16) register, each component is represented similarly, except with only five exponent and ten mantissa bits. If S represents the sign (0 or 1), E represents the exponent in the range [0,31], and M represents the mantissa in the range [0,2^10-1], then an fp16 float is decoded as:

    (-1)^S * 0.0,                        if E == 0 and M == 0,
    (-1)^S * 2^-14 * M/2^10,             if E == 0 and M != 0,
    (-1)^S * 2^(E-15) * (1 + M/2^10),    if 0 < E < 31,
    (-1)^S * INF,                        if E == 31 and M == 0, or
    NaN,                                 if E == 31 and M != 0.

One important difference is that the fp16 representation, unlike fp32, supports denorms to maximize the limited precision of the 16-bit floating point encodings.

In the 12-bit fixed-point (fx12) format, numbers are represented as signed 12-bit two's complement integers with 10 fraction bits. The range of representable values is [-2048/1024, +2047/1024].

Section 3.11.4.2: Fragment Program Operation Precision

Fragment program instructions frequently perform mathematical operations. Such operations may be performed at one of three different precisions. Fragment programs can specify the precision of each instruction by using the precision suffix. If an instruction has a suffix of "R", calculations are carried out with 32-bit floating point operands and results. If an instruction has a suffix of "H", calculations are carried out using 16-bit floating point operands and results. If an instruction has a suffix of "X", calculations are carried out using 12-bit fixed point operands and results.
For example, the instruction "MULR" performs a 32-bit floating-point multiply, "MULH" performs a 16-bit floating-point multiply, and "MULX" performs a 12-bit fixed-point multiply. If no precision suffix is specified, calculations are carried out using the precision of the temporary register receiving the result. Fragment program instructions may source registers or constants whose precisions differ from the precision specified with the instruction. Instructions may also generate intermediate results with a different precision than that of the destination register. In these cases, the values sourced are converted to the precision specified by the instruction. When converting to fx12 format, -INF and any values less than -2048/1024 become -2048/1024. +INF, and any values greater than +2047/1024 become +2047/1024. NaN becomes 0. When converting to fp16 format, any values less than or equal to -2^16 are converted to -INF. Any values greater than or equal to +2^16 are converted to +INF. -INF, +INF, NaN, -0.0, and +0.0 are unchanged. Any other values that are not exactly representable in fp16 format are converted to one of the two nearest representable values. When converting to fp32 format, any values less than or equal to -2^128 are converted to -INF. Any values greater than or equal to +2^128 are converted to +INF. -INF, +INF, NaN, -0.0, and +0.0 are unchanged. Any other values that are not exactly representable in fp32 format are converted to one of the two nearest representable values. Fragment program instructions using the fragment attribute registers f[FOGC] or f[TEX0] through f[TEX7] will be carried out at full fp32 precision, regardless of the precision specified by the instruction. Section 3.11.4.3: Fragment Program Operands Except for KIL, fragment program instructions operate on either vector or scalar operands, indicated in the grammar (see section 3.11.3) by the rules and respectively. The basic set of scalar operands is defined by the grammar rule . 
Scalar operands can be scalar constants (embedded or named), or single components of vector constants, local parameters, or registers allowed by the rule. A vector component is selected by the rule, where the characters "x", "y", "z", and "w" select the x, y, z, and w components, respectively, of the vector. The basic set of vector operands is defined by the grammar rule . Vector operands can include vector constants, local parameters, or registers allowed by the rule. Basic vector operands can be swizzled according to the rule. In its most general form, the rule matches the pattern ".????" where each question mark is one of "x", "y", "z", or "w". For such patterns, the x, y, z, and w components of the operand are taken from the vector components named by the first, second, third, and fourth character of the pattern, respectively. For example, if the swizzle suffix is ".yzzx" and the specified source contains {2,8,9,0}, the swizzled operand used by the instruction is {8,9,9,2}. If the rule matches "", it is treated as though it were ".xyzw". Operands can optionally be negated according to the rule in or . If the matches "-", each value is negated. The absolute value of operands can be taken if the or rules match or . In this case, the absolute value of each component is taken. In addition, if the rule in or matches "-", the result is then negated. Instructions requiring vector operands can also use scalar operands in the case where the rule matches . In such cases, a 4-component vector is produced by replicating the scalar. After operands are loaded, they are converted to a data type corresponding to the operation precision specified in the fragment program instruction. The following pseudo-code spells out the operand generation process. "SrcT" and "InstT" refer to the data types of the specified register or constant and the instruction, respectively. "VecSrcT" and "VecInstT" refer to 4-component vectors of the corresponding type. 
"absolute" is TRUE if the operand matches the or rules, and FALSE otherwise. "negateBase" is TRUE if the rule in or matches "-" and FALSE otherwise. "negateAbs" is TRUE if the rule in or matches "-" and FALSE otherwise. The ".c***", ".*c**", ".**c*", ".***c" modifiers refer to the x, y, z, and w components obtained by the swizzle operation. TypeConvert() is assumed to convert a scalar of type SrcT to a scalar of type InstT using the type conversion process specified above. VecInstT VectorLoad(VecSrcT source) { VecSrcT srcVal; VecInstT convertedVal; srcVal.x = source.c***; srcVal.y = source.*c**; srcVal.z = source.**c*; srcVal.w = source.***c; if (negateBase) { srcVal.x = -srcVal.x; srcVal.y = -srcVal.y; srcVal.z = -srcVal.z; srcVal.w = -srcVal.w; } if (absolute) { srcVal.x = abs(srcVal.x); srcVal.y = abs(srcVal.y); srcVal.z = abs(srcVal.z); srcVal.w = abs(srcVal.w); } if (negateAbs) { srcVal.x = -srcVal.x; srcVal.y = -srcVal.y; srcVal.z = -srcVal.z; srcVal.w = -srcVal.w; } convertedVal.x = TypeConvert(srcVal.x); convertedVal.y = TypeConvert(srcVal.y); convertedVal.z = TypeConvert(srcVal.z); convertedVal.w = TypeConvert(srcVal.w); return convertedVal; } InstT ScalarLoad(VecSrcT source) { SrcT srcVal; InstT convertedVal; srcVal = source.c***; if (negateBase) { srcVal = -srcVal; } if (absolute) { srcVal = abs(srcVal); } if (negateAbs) { srcVal = -srcVal; } convertedVal = TypeConvert(srcVal); return convertedVal; } Section 3.11.4.4, Fragment Program Destination Register Update Each fragment program instruction, except for KIL, writes a 4-component result vector to a single temporary or output register. The four components of the result vector are first optionally clamped to the range [0,1]. The components will be clamped if and only if the result clamp suffix "_SAT" is present in the instruction name. The instruction "ADD_SAT" will clamp the results to [0,1]; the otherwise equivalent instruction "ADD" will not. 
Since the instruction may be carried out at a different precision than the destination register, the components of the results vector are then converted to the data type corresponding to the destination register.

Writes to individual components of the temporary register are controlled by two sets of enables: individual component write masks specified as part of the instruction and the optional condition code mask.

The component write mask is specified by the rule found in the rule. If the optional mask is "", all components are enabled. Otherwise, the optional mask names the individual components to enable. The characters "x", "y", "z", and "w" match the x, y, z, and w components respectively. For example, an optional mask of ".xzw" indicates that the x, z, and w components should be enabled for writing but the y component should not. The grammar requires that the destination register mask components must be listed in "xyzw" order.

The optional condition code mask is specified by the rule found in the rule. If matches "", all components are enabled. Otherwise, the condition code register is loaded and swizzled according to the swizzling specified by . Each component of the swizzled condition code is tested according to the rule given by . may have the values "EQ", "NE", "LT", "GE", "LE", or "GT", which mean to enable writes if the corresponding condition code field evaluates to equal, not equal, less than, greater than or equal, less than or equal, or greater than, respectively. Comparisons involving condition codes of "UN" (unordered) evaluate to true for "NE" and false otherwise. For example, if the condition code is (GT,LT,EQ,GT) and the condition code mask is "(NE.zyxw)", the swizzle operation will load (EQ,LT,GT,GT) and the mask will thus enable writes on the y, z, and w components. In addition, "TR" always enables writes and "FL" always disables writes, regardless of the condition code.
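The two sets of write enables described above can be sketched in Python as follows. This is an illustration only, not part of the extension: the function names are hypothetical, condition code fields are modeled as the strings "LT", "EQ", "GT", and "UN", and masks and swizzles are passed as strings without the leading ".".

```python
# Hypothetical helper names; condition code fields are plain strings.
COMPONENT_INDEX = {"x": 0, "y": 1, "z": 2, "w": 3}


def cc_rule_passes(rule, field):
    """Evaluate one condition code field against a mask rule.

    "UN" (unordered) passes only "NE", plus the unconditional "TR" and "".
    """
    if rule in ("TR", ""):
        return True
    if rule == "FL":
        return False
    if rule == "EQ":
        return field == "EQ"
    if rule == "NE":
        return field != "EQ"
    if rule == "LT":
        return field == "LT"
    if rule == "GE":
        return field in ("GT", "EQ")
    if rule == "LE":
        return field in ("LT", "EQ")
    if rule == "GT":
        return field == "GT"
    raise ValueError("unknown condition code rule: %r" % (rule,))


def cc_write_enables(cc, rule, swizzle="xyzw"):
    """Swizzle the condition code, then test each swizzled field."""
    swizzled = [cc[COMPONENT_INDEX[c]] for c in swizzle]
    return tuple(cc_rule_passes(rule, field) for field in swizzled)


def write_enables(write_mask, cc, rule="", swizzle="xyzw"):
    """A component is written only if both masks enable it.

    An empty write_mask string means all components are enabled.
    """
    names = write_mask if write_mask else "xyzw"
    comp = tuple(c in names for c in "xyzw")
    cond = cc_write_enables(cc, rule, swizzle)
    return tuple(a and b for a, b in zip(comp, cond))
```

For the example in the text, `cc_write_enables(("GT", "LT", "EQ", "GT"), "NE", "zyxw")` gives `(False, True, True, True)`, that is, writes enabled on y, z, and w. Combining it with a component write mask of ".xzw" leaves only z and w enabled.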
Each component of the destination register is updated with the result of the fragment program if and only if the component is enabled for writes by both the component write mask and the optional condition code mask. Otherwise, the component of the destination register remains unchanged.

A fragment program instruction can also optionally update the condition code register. The condition code is updated if the condition code register update suffix "C" is present in the instruction name. The instruction "ADDC" will update the condition code; the otherwise equivalent instruction "ADD" will not. If condition code updates are enabled, each component of the destination register enabled for writes is compared to zero. The corresponding component of the condition code is set to "LT", "EQ", or "GT", if the written component is less than, equal to, or greater than zero, respectively. Condition code components are set to "UN" if the written component is NaN. Note that values of -0.0 and +0.0 both evaluate to "EQ". If a component of the destination register is not enabled for writes, the corresponding condition code component is unchanged.

In the following example code,

    # R1=(-2, 0, 2, NaN)              R0                  CC
    MOVC R0, R1;               # ( -2,  0,   2, NaN) (LT,EQ,GT,UN)
    MOVC R0.xyz, R1.yzwx;      # (  0,  2, NaN, NaN) (EQ,GT,UN,UN)
    MOVC R0 (NE), R1.zywx;     # (  0,  0, NaN,  -2) (EQ,EQ,UN,LT)

the first instruction writes (-2,0,2,NaN) to R0 and updates the condition code to (LT,EQ,GT,UN). In the second instruction, only the "x", "y", and "z" components of R0 and the condition code are updated, so R0 ends up with (0,2,NaN,NaN) and the condition code ends up with (EQ,GT,UN,UN). In the third instruction, the condition code mask disables writes to the x component (its condition code field is "EQ"), so R0 ends up with (0,0,NaN,-2) and the condition code ends up with (EQ,EQ,UN,LT).

The following pseudocode illustrates the process of writing a result vector to the destination register.
In the example, "ccMaskRule" refers to the condition code mask rule given by (or "" if no rule is specified), "instrMask" refers to the component write mask given by the rule, "updatecc" is TRUE if condition code updates are enabled, and "clamp01" is TRUE if [0,1] result clamping is enabled. "destination" and "cc" refer to the register selected by and the condition code, respectively.

    boolean TestCC(CondCode field)
    {
        switch (ccMaskRule) {
        case "EQ":  return (field == "EQ");
        case "NE":  return (field != "EQ");
        case "LT":  return (field == "LT");
        case "GE":  return (field == "GT" || field == "EQ");
        case "LE":  return (field == "LT" || field == "EQ");
        case "GT":  return (field == "GT");
        case "TR":  return TRUE;
        case "FL":  return FALSE;
        case "":    return TRUE;
        }
    }

    enum GenerateCC(DstT value)
    {
        if (value == NaN) {
            return UN;
        } else if (value < 0) {
            return LT;
        } else if (value == 0) {
            return EQ;
        } else {
            return GT;
        }
    }

    void UpdateDestination(VecDstT destination, VecInstT result)
    {
        // Load the original destination register and condition code.
        VecDstT resultDst;
        VecDstT merged;
        VecCC   mergedCC;

        // Clamp the result vector components to [0,1], if requested.
        if (clamp01) {
            if (result.x < 0)      result.x = 0;
            else if (result.x > 1) result.x = 1;
            if (result.y < 0)      result.y = 0;
            else if (result.y > 1) result.y = 1;
            if (result.z < 0)      result.z = 0;
            else if (result.z > 1) result.z = 1;
            if (result.w < 0)      result.w = 0;
            else if (result.w > 1) result.w = 1;
        }

        // Convert the result to the type of the destination register.
        resultDst.x = TypeConvert(result.x);
        resultDst.y = TypeConvert(result.y);
        resultDst.z = TypeConvert(result.z);
        resultDst.w = TypeConvert(result.w);

        // Merge the converted result into the destination register, under
        // control of the compile- and run-time write masks.
        merged = destination;
        mergedCC = cc;
        if (instrMask.x && TestCC(cc.c***)) {
            merged.x = resultDst.x;
            if (updatecc) mergedCC.x = GenerateCC(resultDst.x);
        }
        if (instrMask.y && TestCC(cc.*c**)) {
            merged.y = resultDst.y;
            if (updatecc) mergedCC.y = GenerateCC(resultDst.y);
        }
        if (instrMask.z && TestCC(cc.**c*)) {
            merged.z = resultDst.z;
            if (updatecc) mergedCC.z = GenerateCC(resultDst.z);
        }
        if (instrMask.w && TestCC(cc.***c)) {
            merged.w = resultDst.w;
            if (updatecc) mergedCC.w = GenerateCC(resultDst.w);
        }

        // Write out the new destination register and condition code.
        destination = merged;
        cc = mergedCC;
    }

Section 3.11.5, Fragment Program Instruction Set

The following sections describe the instruction set available to fragment programs.

Section 3.11.5.1, ADD: Add

The ADD instruction performs a component-wise add of the two operands to yield a result vector.

    tmp0 = VectorLoad(op0);
    tmp1 = VectorLoad(op1);
    result.x = tmp0.x + tmp1.x;
    result.y = tmp0.y + tmp1.y;
    result.z = tmp0.z + tmp1.z;
    result.w = tmp0.w + tmp1.w;

The following special-case rules apply to addition:

    1. "A+B" is always equivalent to "B+A".
    2. NaN + x = NaN, for all x.
    3. +INF + x = +INF, for all x except NaN and -INF.
    4. -INF + x = -INF, for all x except NaN and +INF.
    5. +INF + -INF = NaN.
    6. -0.0 + x = x, for all x.
    7. +0.0 + x = x, for all x except -0.0.

Section 3.11.5.2, COS: Cosine

The COS instruction approximates the cosine of the angle specified by the scalar operand and replicates the approximation to all four components of the result vector. The angle is specified in radians and does not have to be in the range [0,2*PI].

    tmp = ScalarLoad(op0);
    result.x = ApproxCosine(tmp);
    result.y = ApproxCosine(tmp);
    result.z = ApproxCosine(tmp);
    result.w = ApproxCosine(tmp);

The approximation function ApproxCosine is accurate to at least 22 bits with an angle in the range [0,2*PI].

    | ApproxCosine(x) - cos(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI.
The error in the approximation will typically increase with the absolute value of the angle when the angle falls outside the range [0,2*PI].

The following special-case rules apply to cosine approximation:

    1. ApproxCosine(NaN) = NaN.
    2. ApproxCosine(+/-INF) = NaN.
    3. ApproxCosine(+/-0.0) = +1.0.

Section 3.11.5.3, DDX: Derivative Relative to X

The DDX instruction computes approximate partial derivatives of the four components of the single operand with respect to the X window coordinate to yield a result vector. The partial derivative is evaluated at the center of the pixel.

    f = VectorLoad(op0);
    result = ComputePartialX(f);

Note that the partial derivatives obtained by this instruction are approximate, and derivative-of-derivative instruction sequences may not yield accurate second derivatives.

For components with partial derivatives that overflow (including +/-INF inputs), the resulting partials may be encoded as large floating-point numbers instead of +/-INF.

Section 3.11.5.4, DDY: Derivative Relative to Y

The DDY instruction computes approximate partial derivatives of the four components of the single operand with respect to the Y window coordinate to yield a result vector. The partial derivative is evaluated at the center of the pixel.

    f = VectorLoad(op0);
    result = ComputePartialY(f);

Note that the partial derivatives obtained by this instruction are approximate, and derivative-of-derivative instruction sequences may not yield accurate second derivatives.

For components with partial derivatives that overflow (including +/-INF inputs), the resulting partials may be encoded as large floating-point numbers instead of +/-INF.

Section 3.11.5.5, DP3: 3-Component Dot Product

The DP3 instruction computes a three component dot product of the two operands (using the x, y, and z components) and replicates the dot product to all four components of the result vector.
    tmp0 = VectorLoad(op0);
    tmp1 = VectorLoad(op1);
    result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + (tmp0.z * tmp1.z);
    result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + (tmp0.z * tmp1.z);
    result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + (tmp0.z * tmp1.z);
    result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + (tmp0.z * tmp1.z);

Section 3.11.5.6, DP4: 4-Component Dot Product

The DP4 instruction computes a four component dot product of the two operands and replicates the dot product to all four components of the result vector.

    tmp0 = VectorLoad(op0);
    tmp1 = VectorLoad(op1);
    result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
               (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
    result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
               (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
    result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
               (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
    result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
               (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);

Section 3.11.5.7, DST: Distance Vector

The DST instruction computes a distance vector from two specially-formatted operands. The first operand should be of the form [NA, d^2, d^2, NA] and the second operand should be of the form [NA, 1/d, NA, 1/d], where NA values are not relevant to the calculation and d is a vector length. If both vectors satisfy these conditions, the result vector will be of the form [1.0, d, d^2, 1/d].

The exact behavior is specified in the following pseudo-code:

    tmp0 = VectorLoad(op0);
    tmp1 = VectorLoad(op1);
    result.x = 1.0;
    result.y = tmp0.y * tmp1.y;
    result.z = tmp0.z;
    result.w = tmp1.w;

Given an arbitrary vector, d^2 can be obtained using the DP3 instruction (using the same vector for both operands) and 1/d can be obtained from d^2 using the RSQ instruction.

This distance vector is useful for per-fragment light attenuation calculations: a DP3 operation involving the distance vector and an attenuation constants vector will yield the attenuation factor.
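The DP3 / RSQ / DST / DP3 sequence described above can be traced numerically with the following Python sketch. It is an illustration under stated assumptions rather than a definition: the function names are hypothetical, exact math.sqrt stands in for the approximate RSQ instruction, the NA components are filled with 0.0, and the light vector and the constants (k0, k1, k2) are made-up values.

```python
import math


def dp3(a, b):
    """3-component dot product, replicated to all four components."""
    d = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    return (d, d, d, d)


def rsq(s):
    """Exact reciprocal square root (the real instruction is approximate)."""
    r = 1.0 / math.sqrt(s)
    return (r, r, r, r)


def dst(op0, op1):
    """Distance vector, following the DST pseudo-code."""
    return (1.0, op0[1] * op1[1], op0[2], op1[3])


# Made-up light vector of length 5 (a 3-4-5 triangle); w is unused by dp3.
light_vec = (3.0, 4.0, 0.0, 0.0)

d2 = dp3(light_vec, light_vec)          # d^2 in every component
inv_d = rsq(d2[0])                      # 1/d in every component

op0 = (0.0, d2[1], d2[2], 0.0)          # [NA, d^2, d^2, NA]
op1 = (0.0, inv_d[1], 0.0, inv_d[3])    # [NA, 1/d, NA, 1/d]
dist = dst(op0, op1)                    # [1.0, d, d^2, 1/d]

# Made-up attenuation constants; dp3 yields k0 + k1*d + k2*d^2.
k = (1.0, 0.5, 0.01, 0.0)
atten_term = dp3(dist, k)[0]
```

Here `dist` comes out as approximately (1.0, 5.0, 25.0, 0.2), and `atten_term` evaluates k0 + k1*d + k2*d^2 for the chosen constants. How that value is then applied to the lighting result is left to the program.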
Section 3.11.5.8, EX2: Exponential Base 2

The EX2 instruction approximates 2 raised to the power of the scalar operand and replicates it to all four components of the result vector.

    tmp = ScalarLoad(op0);
    result.x = Approx2ToX(tmp);
    result.y = Approx2ToX(tmp);
    result.z = Approx2ToX(tmp);
    result.w = Approx2ToX(tmp);

The approximation function is accurate to at least 22 bits:

    | Approx2ToX(x) - 2^x | < 1.0 / 2^22, if 0.0 <= x < 1.0,

and, in general,

    | Approx2ToX(x) - 2^x | < (1.0 / 2^22) * (2^floor(x)).

The following special-case rules apply to exponential approximation:

    1. Approx2ToX(NaN) = NaN.
    2. Approx2ToX(-INF) = +0.0.
    3. Approx2ToX(+INF) = +INF.
    4. Approx2ToX(+/-0.0) = +1.0.

Section 3.11.5.9, FLR: Floor

The FLR instruction performs a component-wise floor operation on the operand to generate a result vector. The floor of a value is defined as the largest integer less than or equal to the value. The floor of 2.3 is 2.0; the floor of -3.6 is -4.0.

    tmp = VectorLoad(op0);
    result.x = floor(tmp.x);
    result.y = floor(tmp.y);
    result.z = floor(tmp.z);
    result.w = floor(tmp.w);

The following special-case rules apply to floor computation:

    1. floor(NaN) = NaN.
    2. floor(x) = x, for x equal to -0.0, +0.0, -INF, or +INF. In all cases, the sign of the result is equal to the sign of the operand.

Section 3.11.5.10, FRC: Fraction

The FRC instruction extracts the fractional portion of each component of the operand to generate a result vector. The fractional portion of a component is defined as the result after subtracting off the floor of the component (see FLR), and is always in the range [0.00, 1.00).

For negative values, the fractional portion is NOT the number written to the right of the decimal point -- the fractional portion of -1.7 is not 0.7 -- it is 0.3. 0.3 is produced by subtracting the floor of -1.7 (-2.0) from -1.7.
    tmp = VectorLoad(op0);
    result.x = tmp.x - floor(tmp.x);
    result.y = tmp.y - floor(tmp.y);
    result.z = tmp.z - floor(tmp.z);
    result.w = tmp.w - floor(tmp.w);

The following special-case rules, which can be derived from the rules for FLR and ADD, apply to fraction computation:

    1. fraction(NaN) = NaN.
    2. fraction(+/-INF) = NaN.
    3. fraction(+/-0.0) = +0.0.

Section 3.11.5.11, KIL: Conditionally Discard Fragment

The KIL instruction is unlike any other instruction in the instruction set. This instruction evaluates components of a swizzled condition code using a test expression identical to that used to evaluate condition code write masks (Section 3.11.4.4). If any condition code component evaluates to TRUE, the fragment is discarded. Otherwise, the instruction has no effect. The condition code components are specified, swizzled, and evaluated in the same manner as the condition code write mask.

    if (TestCC(cc.c***) || TestCC(cc.*c**) ||
        TestCC(cc.**c*) || TestCC(cc.***c)) {
        // Discard the fragment.
    } else {
        // Do nothing.
    }

If the fragment is discarded, it is treated as though it were not produced by rasterization. In particular, none of the per-fragment operations (such as stencil tests, blends, stencil, depth, or color buffer writes) are performed on the fragment.

Section 3.11.5.12, LG2: Logarithm Base 2

The LG2 instruction approximates the base 2 logarithm of the scalar operand and replicates it to all four components of the result vector.

    tmp = ScalarLoad(op0);
    result.x = ApproxLog2(tmp);
    result.y = ApproxLog2(tmp);
    result.z = ApproxLog2(tmp);
    result.w = ApproxLog2(tmp);

The approximation function is accurate to at least 22 bits:

    | ApproxLog2(x) - log_2(x) | < 1.0 / 2^22.

Note that for large values of x, there are not enough bits in the floating-point storage format to represent a result that precisely.

The following special-case rules apply to logarithm approximation:

    1. ApproxLog2(NaN) = NaN.
    2. ApproxLog2(+INF) = +INF.
    3. ApproxLog2(+/-0.0) = -INF.
    4. ApproxLog2(x) = NaN, -INF < x < -0.0.
    5. ApproxLog2(-INF) = NaN.

Section 3.11.5.13, LIT: Compute Light Coefficients

The LIT instruction accelerates per-fragment lighting by computing lighting coefficients for ambient, diffuse, and specular light contributions. The "x" component of the operand is assumed to hold a diffuse dot product (n dot VP_pli, as in the vertex lighting equations in Section 2.13.1). The "y" component of the operand is assumed to hold a specular dot product (n dot h_i). The "w" component of the operand is assumed to hold the specular exponent of the material (s_rm).

The "x" component of the result vector receives the value that should be multiplied by the ambient light/material product (always 1.0). The "y" component of the result vector receives the value that should be multiplied by the diffuse light/material product (n dot VP_pli). The "z" component of the result vector receives the value that should be multiplied by the specular light/material product (f_i * (n dot h_i) ^ s_rm). The "w" component of the result is the constant 1.0.

Negative diffuse and specular dot products are clamped to 0.0, as is done in the standard per-vertex lighting operations. In addition, if the diffuse dot product is zero or negative, the specular coefficient is forced to zero.

    tmp = VectorLoad(op0);
    if (tmp.x < 0) tmp.x = 0;
    if (tmp.y < 0) tmp.y = 0;
    result.x = 1.0;
    result.y = tmp.x;
    result.z = (tmp.x > 0) ? ApproxPower(tmp.y, tmp.w) : 0.0;
    result.w = 1.0;

The exponentiation approximation used to compute result.z is identical to that used in the POW instruction, including errors and the processing of any special cases.

Section 3.11.5.14, LRP: Linear Interpolation

The LRP instruction performs a component-wise linear interpolation to yield a result vector. It interpolates between the components of the second and third operands, using the first operand as a weight.
tmp0 = VectorLoad(op0); tmp1 = VectorLoad(op1); tmp2 = VectorLoad(op2); result.x = tmp0.x * tmp1.x + (1 - tmp0.x) * tmp2.x; result.y = tmp0.y * tmp1.y + (1 - tmp0.y) * tmp2.y; result.z = tmp0.z * tmp1.z + (1 - tmp0.z) * tmp2.z; result.w = tmp0.w * tmp1.w + (1 - tmp0.w) * tmp2.w; Section 3.11.5.15, MAD: Multiply and Add The MAD instruction performs a component-wise multiply of the first two operands, and then does a component-wise add of the product to the third operand to yield a result vector. tmp0 = VectorLoad(op0); tmp1 = VectorLoad(op1); tmp2 = VectorLoad(op2); result.x = tmp0.x * tmp1.x + tmp2.x; result.y = tmp0.y * tmp1.y + tmp2.y; result.z = tmp0.z * tmp1.z + tmp2.z; result.w = tmp0.w * tmp1.w + tmp2.w; Section 3.11.5.16, MAX: maximum The MAX instruction computes component-wise maximums of the values in the two operands to yield a result vector. tmp0 = VectorLoad(op0); tmp1 = VectorLoad(op1); result.x = max(tmp0.x, tmp1.x); result.y = max(tmp0.y, tmp1.y); result.z = max(tmp0.z, tmp1.z); result.w = max(tmp0.w, tmp1.w); The following special cases apply to the maximum operation: 1. max(A,B) is always equivalent to max(B,A). 2. max(NaN, x) == NaN, for all x. Section 3.11.5.17, MIN: minimum The MIN instruction computes component-wise minimums of the values in the two operands to yield a result vector. tmp0 = VectorLoad(op0); tmp1 = VectorLoad(op1); result.x = min(tmp0.x, tmp1.x); result.y = min(tmp0.y, tmp1.y); result.z = min(tmp0.z, tmp1.z); result.w = min(tmp0.w, tmp1.w); The following special cases apply to the minimum operation: 1. min(A,B) is always equivalent to min(B,A). 2. min(NaN, x) == NaN, for all x. Section 3.11.5.18, MOV: Move The MOV instruction copies the value of the operand to yield a result vector. result = VectorLoad(op0); Section 3.11.5.19, MUL: Multiply The MUL instruction performs a component-wise multiply of the two operands to yield a result vector.
tmp0 = VectorLoad(op0); tmp1 = VectorLoad(op1); result.x = tmp0.x * tmp1.x; result.y = tmp0.y * tmp1.y; result.z = tmp0.z * tmp1.z; result.w = tmp0.w * tmp1.w; The following special-case rules apply to multiplication: 1. "A*B" is always equivalent to "B*A". 2. NaN * x = NaN, for all x. 3. +/-0.0 * +/-INF = NaN. 4. +/-0.0 * x = +/-0.0, for all x except -INF, +INF, and NaN. The sign of the result is positive if the signs of the two operands match and negative otherwise. 5. +/-INF * x = +/-INF, for all x except -0.0, +0.0, and NaN. The sign of the result is positive if the signs of the two operands match and negative otherwise. 6. +1.0 * x = x, for all x. Section 3.11.5.20, PK2H: Pack Two 16-bit Floats The PK2H instruction converts the "x" and "y" components of the single operand into 16-bit floating-point format, packs the bit representation of these two floats into a 32-bit value, and replicates that value to all four components of the result vector. The PK2H instruction can be reversed by the UP2H instruction below. tmp0 = VectorLoad(op0); /* result obtained by combining raw bits of tmp0.x, tmp0.y */ result.x = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16); result.y = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16); result.z = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16); result.w = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16); The result must be written to a register with 32-bit components (an "R" register, o[COLR], or o[DEPR]). A fragment program will fail to load if any other register type is specified. Section 3.11.5.21, PK2US: Pack Two Unsigned 16-bit Scalars The PK2US instruction converts the "x" and "y" components of the single operand into a packed pair of 16-bit unsigned scalars. The scalars are represented in a bit pattern where all '0' bits correspond to 0.0 and all '1' bits correspond to 1.0. The bit representations of the two converted components are packed into a 32-bit value, and that value is replicated to all four components of the result vector.
The PK2US instruction can be reversed by the UP2US instruction below. tmp0 = VectorLoad(op0); if (tmp0.x < 0.0) tmp0.x = 0.0; if (tmp0.x > 1.0) tmp0.x = 1.0; if (tmp0.y < 0.0) tmp0.y = 0.0; if (tmp0.y > 1.0) tmp0.y = 1.0; us.x = round(65535.0 * tmp0.x); /* us is a ushort vector */ us.y = round(65535.0 * tmp0.y); /* result obtained by combining raw bits of us. */ result.x = ((us.x) | (us.y << 16)); result.y = ((us.x) | (us.y << 16)); result.z = ((us.x) | (us.y << 16)); result.w = ((us.x) | (us.y << 16)); The result must be written to a register with 32-bit components (an "R" register, o[COLR], or o[DEPR]). A fragment program will fail to load if any other register type is specified. Section 3.11.5.22, PK4B: Pack Four Signed 8-bit Scalars The PK4B instruction converts the four components of the single operand into 8-bit signed quantities. The signed quantities are represented in a bit pattern where all '0' bits corresponds to -128/127 and all '1' bits corresponds to +127/127. The bit representations of the four converted components are packed into a 32-bit value, and that value is replicated to all four components of the result vector. The PK4B instruction can be reversed by the UP4B instruction below. tmp0 = VectorLoad(op0); if (tmp0.x < -128/127) tmp0.x = -128/127; if (tmp0.y < -128/127) tmp0.y = -128/127; if (tmp0.z < -128/127) tmp0.z = -128/127; if (tmp0.w < -128/127) tmp0.w = -128/127; if (tmp0.x > +127/127) tmp0.x = +127/127; if (tmp0.y > +127/127) tmp0.y = +127/127; if (tmp0.z > +127/127) tmp0.z = +127/127; if (tmp0.w > +127/127) tmp0.w = +127/127; ub.x = round(127.0 * tmp0.x + 128.0); /* ub is a ubyte vector */ ub.y = round(127.0 * tmp0.y + 128.0); ub.z = round(127.0 * tmp0.z + 128.0); ub.w = round(127.0 * tmp0.w + 128.0); /* result obtained by combining raw bits of ub. 
*/ result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); The result must be written to a register with 32-bit components (an "R" register, o[COLR], or o[DEPR]). A fragment program will fail to load if any other register type is specified. Section 3.11.5.23, PK4UB: Pack Four Unsigned 8-bit Scalars The PK4UB instruction converts the four components of the single operand into a packed grouping of 8-bit unsigned scalars. The scalars are represented in a bit pattern where all '0' bits corresponds to 0.0 and all '1' bits corresponds to 1.0. The bit representations of the four converted components are packed into a 32-bit value, and that value is replicated to all four components of the result vector. The PK4UB instruction can be reversed by the UP4UB instruction below. tmp0 = VectorLoad(op0); if (tmp0.x < 0.0) tmp0.x = 0.0; if (tmp0.x > 1.0) tmp0.x = 1.0; if (tmp0.y < 0.0) tmp0.y = 0.0; if (tmp0.y > 1.0) tmp0.y = 1.0; if (tmp0.z < 0.0) tmp0.z = 0.0; if (tmp0.z > 1.0) tmp0.z = 1.0; if (tmp0.w < 0.0) tmp0.w = 0.0; if (tmp0.w > 1.0) tmp0.w = 1.0; ub.x = round(255.0 * tmp0.x); /* ub is a ubyte vector */ ub.y = round(255.0 * tmp0.y); ub.z = round(255.0 * tmp0.z); ub.w = round(255.0 * tmp0.w); /* result obtained by combining raw bits of ub. */ result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24)); The result must be written to a register with 32-bit components (an "R" register, o[COLR], or o[DEPR]). A fragment program will fail to load if any other register type is specified. 
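The PK4UB packing above and its inverse (UP4UB, Section 3.11.5.44) can be checked with a small software emulation. The following Python sketch is illustrative only and is not part of the extension: the function names pk4ub and up4ub are hypothetical, round() is assumed to round to nearest with ties rounded up (the pseudo-code does not state how ties are handled), and NaN inputs are not handled.

```python
import math


def _clamp01(value):
    # Clamp a component to [0, 1], as in the PK4UB pseudo-code.
    return min(max(value, 0.0), 1.0)


def _round_half_up(value):
    # Assumption: round to nearest, ties rounded up. The pseudo-code's
    # round() does not specify tie handling.
    return int(math.floor(value + 0.5))


def pk4ub(vec):
    """Emulate PK4UB: pack (x, y, z, w) into one 32-bit integer.

    The x component lands in the 8 least significant bits and w in the
    8 most significant bits. The real instruction replicates this value
    to all four result components; only the scalar is returned here.
    """
    ub = [_round_half_up(255.0 * _clamp01(c)) for c in vec]
    return ub[0] | (ub[1] << 8) | (ub[2] << 16) | (ub[3] << 24)


def up4ub(bits):
    """Emulate UP4UB: unpack a 32-bit integer into four floats in [0, 1]."""
    return tuple(((bits >> shift) & 0xFF) / 255.0 for shift in (0, 8, 16, 24))
```

Packing followed by unpacking returns the clamped, quantized operand, and unpacking followed by packing returns the original 32-bit pattern.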
Section 3.11.5.24, POW: Exponentiation The POW instruction approximates the value of the first scalar operand raised to the power of the second scalar operand and replicates it to all four components of the result vector. tmp0 = ScalarLoad(op0); tmp1 = ScalarLoad(op1); result.x = ApproxPower(tmp0, tmp1); result.y = ApproxPower(tmp0, tmp1); result.z = ApproxPower(tmp0, tmp1); result.w = ApproxPower(tmp0, tmp1); The exponentiation approximation function is defined in terms of the base 2 exponentiation and logarithm approximation operations in the EX2 and LG2 instructions, including errors and the processing of any special cases. In particular, ApproxPower(a,b) = ApproxExp2(b * ApproxLog2(a)). The following special-case rules, which can be derived from the rules in the LG2, MUL, and EX2 instructions, apply to exponentiation: 1. ApproxPower(x, y) = NaN, if x < -0.0. 2. ApproxPower(x, y) = NaN, if x or y is NaN. 3. ApproxPower(+/-0.0, +/-0.0) = NaN. 4. ApproxPower(+INF, +/-0.0) = NaN. 5. ApproxPower(+1.0, +/-INF) = NaN. 6. ApproxPower(+/-0.0, x) = +0.0, if x > +0.0. 7. ApproxPower(+/-0.0, x) = +INF, if x < -0.0. 8. ApproxPower(+1.0, x) = +1.0, if -INF < x < +INF. 9. ApproxPower(+INF, x) = +INF, if x > +0.0. 10. ApproxPower(+INF, x) = +0.0, if x < -0.0. 11. ApproxPower(x, +/-0.0) = +1.0, if +0.0 < x < +INF. 12. ApproxPower(x, +1.0) ~= x, if x >= +0.0. 13. ApproxPower(x, +INF) = +0.0, if -0.0 <= x < +1.0; +INF, if x > +1.0. 14. ApproxPower(x, -INF) = +INF, if -0.0 <= x < +1.0; +0.0, if x > +1.0. Note that 0^0 is defined here as NaN, since ApproxLog2(0) = -INF, and 0*(-INF) = NaN. In many other applications, including the standard C pow() function, 0^0 is defined as 1.0. This behavior can be emulated using additional instructions in much the same way that the pow() function is implemented on many CPUs. Note that a logarithm is involved even if the exponent is an integer. This means that any exponentiation with a negative base will produce NaN.
In contrast, it is possible in a "normal" mathematical formulation to raise negative numbers to integral powers (e.g., (-3)^2 == 9, and (-0.5)^-2 == 4). Section 3.11.5.25, RCP: Reciprocal The RCP instruction approximates the reciprocal of the scalar operand and replicates it to all four components of the result vector. tmp = ScalarLoad(op0); result.x = ApproxReciprocal(tmp); result.y = ApproxReciprocal(tmp); result.z = ApproxReciprocal(tmp); result.w = ApproxReciprocal(tmp); The approximation function is accurate to at least 22 bits: | ApproxReciprocal(x) - (1/x) | < 1.0 / 2^22, if 1.0 <= x < 2.0. The following special-case rules apply to reciprocation: 1. ApproxReciprocal(NaN) = NaN. 2. ApproxReciprocal(+INF) = +0.0. 3. ApproxReciprocal(-INF) = -0.0. 4. ApproxReciprocal(+0.0) = +INF. 5. ApproxReciprocal(-0.0) = -INF. Section 3.11.5.26, RFL: Reflection Vector The RFL instruction computes the reflection of the second vector operand (the "direction" vector) about the vector specified by the first vector operand (the "axis" vector). Both operands are treated as 3D vectors (the w components are ignored). The result vector is another 3D vector (the "reflected direction" vector). The length of the result vector, ignoring rounding errors, should equal that of the second operand. axis = VectorLoad(op0); direction = VectorLoad(op1); tmp.w = (axis.x * axis.x + axis.y * axis.y + axis.z * axis.z); tmp.x = (axis.x * direction.x + axis.y * direction.y + axis.z * direction.z); tmp.x = 2.0 * tmp.x; tmp.x = tmp.x / tmp.w; result.x = tmp.x * axis.x - direction.x; result.y = tmp.x * axis.y - direction.y; result.z = tmp.x * axis.z - direction.z; A fragment program will fail to load if the w component of the result is enabled in the component write mask (see the rule in the grammar). Section 3.11.5.27, RSQ: Reciprocal Square Root The RSQ instruction approximates the reciprocal of the square root of the scalar operand and replicates it to all four components of the result vector.
tmp = ScalarLoad(op0); result.x = ApproxRSQRT(tmp); result.y = ApproxRSQRT(tmp); result.z = ApproxRSQRT(tmp); result.w = ApproxRSQRT(tmp); The approximation function is accurate to at least 22 bits: | ApproxRSQRT(x) - (1/sqrt(x)) | < 1.0 / 2^22, if 1.0 <= x < 4.0. The following special-case rules apply to reciprocal square roots: 1. ApproxRSQRT(NaN) = NaN. 2. ApproxRSQRT(+INF) = +0.0. 3. ApproxRSQRT(-INF) = NaN. 4. ApproxRSQRT(+0.0) = +INF. 5. ApproxRSQRT(-0.0) = -INF. 6. ApproxRSQRT(x) = NaN, if -INF < x < -0.0. Section 3.11.5.28, SEQ: Set on Equal To The SEQ instruction performs a component-wise comparison of the two operands. Each component of the result vector is 1.0 if the corresponding component of the first operand is equal to that of the second, and 0.0 otherwise. tmp0 = VectorLoad(op0); tmp1 = VectorLoad(op1); result.x = (tmp0.x == tmp1.x) ? 1.0 : 0.0; result.y = (tmp0.y == tmp1.y) ? 1.0 : 0.0; result.z = (tmp0.z == tmp1.z) ? 1.0 : 0.0; result.w = (tmp0.w == tmp1.w) ? 1.0 : 0.0; The following special-case rules apply to SEQ: 1. (x == y) and (y == x) always produce the same result. 2. (NaN == x) is FALSE for all x, including NaN. 3. (+INF == +INF) and (-INF == -INF) are TRUE. 4. (-0.0 == +0.0) and (+0.0 == -0.0) are TRUE. Section 3.11.5.29, SFL: Set on False The SFL instruction is a degenerate case of the other "Set on" instructions that sets all components of the result vector to 0.0. result.x = 0.0; result.y = 0.0; result.z = 0.0; result.w = 0.0; Section 3.11.5.30, SGE: Set on Greater Than or Equal The SGE instruction performs a component-wise comparison of the two operands. Each component of the result vector is 1.0 if the corresponding component of the first operand is greater than or equal to that of the second, and 0.0 otherwise. tmp0 = VectorLoad(op0); tmp1 = VectorLoad(op1); result.x = (tmp0.x >= tmp1.x) ? 1.0 : 0.0; result.y = (tmp0.y >= tmp1.y) ? 1.0 : 0.0; result.z = (tmp0.z >= tmp1.z) ? 1.0 : 0.0; result.w = (tmp0.w >= tmp1.w) ?
1.0 : 0.0; The following special-case rules apply to SGE: 1. (NaN >= x) and (x >= NaN) are FALSE for all x. 2. (+INF >= +INF) and (-INF >= -INF) are TRUE. 3. (-0.0 >= +0.0) and (+0.0 >= -0.0) are TRUE. Section 3.11.5.31, SGT: Set on Greater Than The SGT instruction performs a component-wise comparison of the two operands. Each component of the result vector is 1.0 if the corresponding component of the first operand is greater than that of the second, and 0.0 otherwise. tmp0 = VectorLoad(op0); tmp1 = VectorLoad(op1); result.x = (tmp0.x > tmp1.x) ? 1.0 : 0.0; result.y = (tmp0.y > tmp1.y) ? 1.0 : 0.0; result.z = (tmp0.z > tmp1.z) ? 1.0 : 0.0; result.w = (tmp0.w > tmp1.w) ? 1.0 : 0.0; The following special-case rules apply to SGT: 1. (NaN > x) and (x > NaN) are FALSE for all x. 2. (-0.0 > +0.0) and (+0.0 > -0.0) are FALSE. Section 3.11.5.32, SIN: Sine The SIN instruction approximates the sine of the angle specified by the scalar operand and replicates it to all four components of the result vector. The angle is specified in radians and does not have to be in the range [0,2*PI]. tmp = ScalarLoad(op0); result.x = ApproxSine(tmp); result.y = ApproxSine(tmp); result.z = ApproxSine(tmp); result.w = ApproxSine(tmp); The approximation function is accurate to at least 22 bits with an angle in the range [0,2*PI]. | ApproxSine(x) - sin(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI. The error in the approximation will typically increase with the absolute value of the angle when the angle falls outside the range [0,2*PI]. The following special-case rules apply to sine approximation: 1. ApproxSine(NaN) = NaN. 2. ApproxSine(+/-INF) = NaN. 3. ApproxSine(+/-0.0) = +/-0.0. The sign of the result is equal to the sign of the single operand. Section 3.11.5.33, SLE: Set on Less Than or Equal The SLE instruction performs a component-wise comparison of the two operands.
Each component of the result vector is 1.0 if the corresponding component of the first operand is less than or equal to that of the second, and 0.0 otherwise. tmp0 = VectorLoad(op0); tmp1 = VectorLoad(op1); result.x = (tmp0.x <= tmp1.x) ? 1.0 : 0.0; result.y = (tmp0.y <= tmp1.y) ? 1.0 : 0.0; result.z = (tmp0.z <= tmp1.z) ? 1.0 : 0.0; result.w = (tmp0.w <= tmp1.w) ? 1.0 : 0.0; The following special-case rules apply to SLE: 1. (NaN <= x) and (x <= NaN) are FALSE for all x. 2. (+INF <= +INF) and (-INF <= -INF) are TRUE. 3. (-0.0 <= +0.0) and (+0.0 <= -0.0) are TRUE. Section 3.11.5.34, SLT: Set on Less Than The SLT instruction performs a component-wise comparison of the two operands. Each component of the result vector is 1.0 if the corresponding component of the first operand is less than that of the second, and 0.0 otherwise. tmp0 = VectorLoad(op0); tmp1 = VectorLoad(op1); result.x = (tmp0.x < tmp1.x) ? 1.0 : 0.0; result.y = (tmp0.y < tmp1.y) ? 1.0 : 0.0; result.z = (tmp0.z < tmp1.z) ? 1.0 : 0.0; result.w = (tmp0.w < tmp1.w) ? 1.0 : 0.0; The following special-case rules apply to SLT: 1. (NaN < x) and (x < NaN) are FALSE for all x. 2. (-0.0 < +0.0) and (+0.0 < -0.0) are FALSE. Section 3.11.5.35, SNE: Set on Not Equal The SNE instruction performs a component-wise comparison of the two operands. Each component of the result vector is 1.0 if the corresponding component of the first operand is not equal to that of the second, and 0.0 otherwise. tmp0 = VectorLoad(op0); tmp1 = VectorLoad(op1); result.x = (tmp0.x != tmp1.x) ? 1.0 : 0.0; result.y = (tmp0.y != tmp1.y) ? 1.0 : 0.0; result.z = (tmp0.z != tmp1.z) ? 1.0 : 0.0; result.w = (tmp0.w != tmp1.w) ? 1.0 : 0.0; The following special-case rules apply to SNE: 1. (x != y) and (y != x) always produce the same result. 2. (NaN != x) is TRUE for all x, including NaN. 3. (+INF != +INF) and (-INF != -INF) are FALSE. 4. (-0.0 != +0.0) and (+0.0 != -0.0) are FALSE, consistent with SEQ, which treats -0.0 and +0.0 as equal.
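The NaN, infinity, and signed-zero rules for the "Set on" comparison instructions match ordinary IEEE 754 comparison behavior, so they can be explored on a CPU. The following Python sketch is illustrative only and is not part of the extension: the helper _set_on and the lowercase function names are hypothetical, and operands are modeled as plain 4-tuples of Python floats rather than fragment program registers.

```python
import operator


def _set_on(compare, op0, op1):
    # Component-wise comparison producing 1.0 (TRUE) or 0.0 (FALSE).
    return tuple(1.0 if compare(a, b) else 0.0 for a, b in zip(op0, op1))


def seq(op0, op1):
    return _set_on(operator.eq, op0, op1)


def sne(op0, op1):
    return _set_on(operator.ne, op0, op1)


def slt(op0, op1):
    return _set_on(operator.lt, op0, op1)


def sle(op0, op1):
    return _set_on(operator.le, op0, op1)


def sgt(op0, op1):
    return _set_on(operator.gt, op0, op1)


def sge(op0, op1):
    return _set_on(operator.ge, op0, op1)
```

For example, with op0 = (NaN, -0.0, +INF, 1.0) and op1 = (NaN, +0.0, +INF, 2.0), seq yields 0.0 for the NaN component and 1.0 for the signed-zero and infinity components, while slt yields 1.0 only for the last component.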
Section 3.11.5.36, STR: Set on True The STR instruction is a degenerate case of the other "Set on" instructions that sets all components of the result vector to 1.0. result.x = 1.0; result.y = 1.0; result.z = 1.0; result.w = 1.0; Section 3.11.5.37, SUB: Subtract The SUB instruction performs a component-wise subtraction of the second operand from the first to yield a result vector. tmp0 = VectorLoad(op0); tmp1 = VectorLoad(op1); result.x = tmp0.x - tmp1.x; result.y = tmp0.y - tmp1.y; result.z = tmp0.z - tmp1.z; result.w = tmp0.w - tmp1.w; The SUB instruction is completely equivalent to an identical ADD instruction in which the negate operator on the second operand is reversed: 1. "SUB R0, R1, R2" is equivalent to "ADD R0, R1, -R2". 2. "SUB R0, R1, -R2" is equivalent to "ADD R0, R1, R2". 3. "SUB R0, R1, |R2|" is equivalent to "ADD R0, R1, -|R2|". 4. "SUB R0, R1, -|R2|" is equivalent to "ADD R0, R1, |R2|". Section 3.11.5.38, TEX: Texture Lookup The TEX instruction performs a filtered texture lookup using the texture target given by the instruction's texture target specifier, belonging to the texture image unit given by its texture image unit specifier. Target specifier values of "1D", "2D", "3D", "CUBE", and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D, TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively. The (s,t,r) texture coordinates used for the lookup are the x, y, and z components of the single operand. The texture lookup is performed as specified in Section 3.8. The LOD calculations in Section 3.8.5 are performed using an implementation dependent method to derive ds/dx, ds/dy, dt/dx, dt/dy, dr/dx, and dr/dy. The mapping of filtered texture components to the components of the result vector is dependent on the base internal format of the texture and is specified in Table X.5.

                                 Result Vector Components
      Base Internal Format       X      Y      Z      W
      --------------------       -----  -----  -----  -----
      ALPHA                      0.0    0.0    0.0    At
      LUMINANCE                  Lt     Lt     Lt     1.0
      LUMINANCE_ALPHA            Lt     Lt     Lt     At
      INTENSITY                  It     It     It     It
      RGB                        Rt     Gt     Bt     1.0
      RGBA                       Rt     Gt     Bt     At
      HILO_NV (signed)           HIt    LOt    HEMI   1.0
      HILO_NV (unsigned)         HIt    LOt    1.0    1.0
      DSDT_NV                    DSt    DTt    0.0    1.0
      DSDT_MAG_NV                DSt    DTt    MAGt   1.0
      DSDT_MAG_INTENSITY_NV      DSt    DTt    MAGt   It
      FLOAT_R_NV                 Rt     0.0    0.0    1.0
      FLOAT_RG_NV                Rt     Gt     0.0    1.0
      FLOAT_RGB_NV               Rt     Gt     Bt     1.0
      FLOAT_RGBA_NV              Rt     Gt     Bt     At

Table X.5: Mapping of filtered texel components to result vector components for the TEX instruction. 0.0 and 1.0 indicate that the corresponding constant value is written to the result vector. DEPTH_COMPONENT textures are treated as ALPHA, LUMINANCE, or INTENSITY, as specified in the texture's depth texture mode. For HILO_NV textures with signed components, "HEMI" is defined as sqrt(MAX(0, 1-(HIt^2+LOt^2))).

This instruction specifies a particular texture target, ignoring the standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D, TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended OpenGL. If the specified texture target has a consistent set of images, a lookup is performed. Otherwise, the result of the instruction is the vector (0,0,0,0). Although this instruction allows the selection of any texture target, a fragment program can not use more than one texture target for any given texture image unit. Section 3.11.5.39, TXD: Texture Lookup with Derivatives The TXD instruction performs a filtered texture lookup using the texture target given by the instruction's texture target specifier, belonging to the texture image unit given by its texture image unit specifier. Target specifier values of "1D", "2D", "3D", "CUBE", and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D, TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively. The (s,t,r) texture coordinates used for the lookup are the x, y, and z components of the first operand.
The partial derivatives in the X direction (ds/dx, dt/dx, dr/dx) are specified by the x, y, and z components of the second operand. The partial derivatives in the Y direction (ds/dy, dt/dy, dr/dy) are specified by the x, y, and z components of the third operand. The texture lookup is performed as specified in Section 3.8. The LOD calculations in Section 3.8.5 are performed using the specified partial derivatives. The mapping of filtered texture components to the components of the result vector is dependent on the base internal format of the texture and is specified in Table X.5. This instruction specifies a particular texture target, ignoring the standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D, TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended OpenGL. If the specified texture target has a consistent set of images, a lookup is performed. Otherwise, the result of the instruction is the vector (0,0,0,0). Although this instruction allows the selection of any texture target, a fragment program can not use more than one texture target for any given texture image unit. Section 3.11.5.40, TXP: Projective Texture Lookup The TXP instruction performs a filtered texture lookup using the texture target given by the instruction's texture target specifier, belonging to the texture image unit given by its texture image unit specifier. Target specifier values of "1D", "2D", "3D", "CUBE", and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D, TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively. For cube map textures, the (s,t,r) texture coordinates used for the lookup are given by x, y, and z, respectively. For all other textures, the (s,t,r) texture coordinates used for the lookup are given by x/w, y/w, and z/w, respectively, where x, y, z, and w are the corresponding components of the operand. The texture lookup is performed as specified in Section 3.8.
The LOD calculations in Section 3.8.5 are performed using an implementation dependent method to derive ds/dx, ds/dy, dt/dx, dt/dy, dr/dx, and dr/dy. The mapping of filtered texture components to the components of the result vector is dependent on the base internal format of the texture and is specified in Table X.5. This instruction specifies a particular texture target, ignoring the standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D, TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended OpenGL. If the specified texture target has a consistent set of images, a lookup is performed. Otherwise, the result of the instruction is the vector (0,0,0,0). Although this instruction allows the selection of any texture target, a fragment program can not use more than one texture target for any given texture image unit. Section 3.11.5.41, UP2H: Unpack Two 16-Bit Floats The UP2H instruction unpacks two 16-bit floats stored together in a 32-bit scalar operand. The first 16-bit float (stored in the 16 least significant bits) is written into the "x" and "z" components of the result vector; the second is written into the "y" and "w" components of the result vector. This operation undoes the type conversion and packing performed by the PK2H instruction. tmp = ScalarLoad(op0); result.x = (fp16) (RawBits(tmp) & 0xFFFF); result.y = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF); result.z = (fp16) (RawBits(tmp) & 0xFFFF); result.w = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF); Since the source operand must be a 32-bit scalar, a fragment program will fail to load if the operand is not obtained from a register with 32-bit components or from a program parameter. Section 3.11.5.42, UP2US: Unpack Two Unsigned 16-Bit Scalars The UP2US instruction unpacks two 16-bit unsigned values packed together in a 32-bit scalar operand. The unsigned quantities are encoded where a bit pattern of all '0' bits corresponds to 0.0 and a pattern of all '1' bits corresponds to 1.0. 
The "x" and "z" components of the result vector are obtained from the 16 least significant bits of the operand; the "y" and "w" components are obtained from the 16 most significant bits. This operation undoes the type conversion and packing performed by the PK2US instruction. tmp = ScalarLoad(op0); result.x = ((RawBits(tmp) >> 0) & 0xFFFF) / 65535.0; result.y = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0; result.z = ((RawBits(tmp) >> 0) & 0xFFFF) / 65535.0; result.w = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0; Since the source operand must be a 32-bit scalar, a fragment program will fail to load if the operand is not obtained from a register with 32-bit components or from a program parameter. Section 3.11.5.43, UP4B: Unpack Four Signed 8-Bit Values The UP4B instruction unpacks four 8-bit signed values packed together in a 32-bit scalar operand. The signed quantities are encoded where a bit pattern of all '0' bits corresponds to -128/127 and a pattern of all '1' bits corresponds to +127/127. The "x" component of the result vector is the converted value corresponding to the 8 least significant bits of the operand; the "w" component corresponds to the 8 most significant bits. This operation undoes the type conversion and packing performed by the PK4B instruction. tmp = ScalarLoad(op0); result.x = (((RawBits(tmp) >> 0) & 0xFF) - 128) / 127.0; result.y = (((RawBits(tmp) >> 8) & 0xFF) - 128) / 127.0; result.z = (((RawBits(tmp) >> 16) & 0xFF) - 128) / 127.0; result.w = (((RawBits(tmp) >> 24) & 0xFF) - 128) / 127.0; Since the source operand must be a 32-bit scalar, a fragment program will fail to load if the operand is not obtained from a register with 32-bit components or from a program parameter. Section 3.11.5.44, UP4UB: Unpack Four Unsigned 8-Bit Scalars The UP4UB instruction unpacks four 8-bit unsigned values packed together in a 32-bit scalar operand. 
The unsigned quantities are encoded where a bit pattern of all '0' bits corresponds to 0.0 and a pattern of all '1' bits corresponds to 1.0. The "x" component of the result vector is obtained from the 8 least significant bits of the operand; the "w" component is obtained from the 8 most significant bits. This operation undoes the type conversion and packing performed by the PK4UB instruction. tmp = ScalarLoad(op0); result.x = ((RawBits(tmp) >> 0) & 0xFF) / 255.0; result.y = ((RawBits(tmp) >> 8) & 0xFF) / 255.0; result.z = ((RawBits(tmp) >> 16) & 0xFF) / 255.0; result.w = ((RawBits(tmp) >> 24) & 0xFF) / 255.0; Since the source operand must be a 32-bit scalar, a fragment program will fail to load if the operand is not obtained from a register with 32-bit components or from a program parameter. Section 3.11.5.45, X2D: 2D Coordinate Transformation The X2D instruction multiplies the 2D offset vector specified by the "x" and "y" components of the second vector operand by the 2x2 matrix specified by the four components of the third vector operand, and adds the transformed offset vector to the 2D vector specified by the "x" and "y" components of the first vector operand. The first component of the sum is written to the "x" and "z" components of the result; the second component is written to the "y" and "w" components of the result. The X2D instruction can be used to displace texture coordinates in the same manner as the OFFSET_TEXTURE_2D_NV mode in the GL_NV_texture_shader extension. tmp0 = VectorLoad(op0); tmp1 = VectorLoad(op1); tmp2 = VectorLoad(op2); result.x = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y; result.y = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w; result.z = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y; result.w = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w; Section 3.11.6, Fragment Program Outputs Upon completion of fragment program execution, the output registers are used to replace the fragment's associated data. 
The RGBA color of the fragment is taken from the color output register used by the program (COLR or COLH). The R, G, B, and A color components are extracted from the "x", "y", "z", and "w" components, respectively, of the output register and are clamped to the range [0,1]. If the DEPR output register is written by the fragment program, the depth value of the fragment is taken from the z component of the DEPR output register. If depth clamping is enabled, the depth value is clamped to the range [min(n,f), max(n,f)], where n and f are the near and far depth range values. If depth clamping is disabled, the fragment is discarded if its depth value is outside the range [min(n,f), max(n,f)]. Section 3.11.7, Required Fragment Program State The state required for managing fragment programs consists of: a bit indicating whether or not fragment program mode is enabled; an unsigned integer naming the currently bound fragment program; and the state that must be maintained to indicate which integers are currently in use as fragment program names. Fragment program mode is initially disabled. The initial state of all 128 fragment program parameter registers is (0,0,0,0). The initial currently bound fragment program is zero. Each fragment program object consists of: an enumerant giving the program target (FRAGMENT_PROGRAM_NV); a boolean indicating whether the program is resident; an array of type ubyte containing the program string; an integer representing the length of the program string array; one four-component floating-point vector for each named local parameter in the program; and a set of MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV four-component floating-point vectors to hold numbered local parameters, each initially set to (0,0,0,0). Initially, no program objects exist.
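The output rules of Section 3.11.6 (color clamping, then depth clamping or discard) can be sketched in software. The following Python function is illustrative only and is not part of the extension: the name finalize_fragment and its parameters are hypothetical, and the sketch assumes the program wrote DEPR, so the depth argument is the z component of that register.

```python
def finalize_fragment(color, depth, near, far, depth_clamp_enabled):
    """Apply the Section 3.11.6 output rules to one fragment.

    color: (x, y, z, w) contents of the color output register.
    depth: z component of DEPR (assumed to have been written).
    near, far: the depth range values n and f.
    Returns (rgba, depth), or None if the fragment is discarded.
    """
    lo, hi = min(near, far), max(near, far)

    # R, G, B, A come from x, y, z, w and are clamped to [0, 1].
    rgba = tuple(min(max(c, 0.0), 1.0) for c in color)

    if depth_clamp_enabled:
        # Depth clamping enabled: clamp to [min(n,f), max(n,f)].
        depth = min(max(depth, lo), hi)
    elif not (lo <= depth <= hi):
        # Depth clamping disabled: out-of-range depth discards the fragment.
        return None

    return rgba, depth
```

Note that the range test uses min(n,f) and max(n,f), so a reversed depth range (n greater than f) behaves the same as the usual ordering.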
Additionally, the state required during the execution of a fragment program consists of: twelve 4-component floating-point fragment attribute registers, thirty-two 128-bit physical temporary registers, and a single 4-component condition code, whose components have one of four values (LT, EQ, GT, or UN). Each time a fragment program is executed, the fragment attribute registers are initialized with the fragment's location and associated data, all temporary register components are initialized to zero, and all condition code components are initialized to EQ. Renumber Section 3.11 to Section 3.12, Antialiasing Application (p.140). No changes to the text of the section. Additions to Chapter 4 of the OpenGL 1.2.1 Specification (Per-Fragment Operations and the Framebuffer) None Additions to Chapter 5 of the OpenGL 1.2.1 Specification (Special Functions) Add new section 5.7, Programs (after "Flush and Finish") Programs are specified as an array of ubytes used to control the operation of portions of the GL. The array is a string of ASCII characters encoding the program. The command LoadProgramNV(enum target, uint id, sizei len, const ubyte *program); loads a program. The target parameter specifies the type of program loaded and can be VERTEX_PROGRAM_NV, VERTEX_STATE_PROGRAM_NV, or FRAGMENT_PROGRAM_NV. VERTEX_PROGRAM_NV specifies a program to be executed in vertex program mode as each vertex is specified. VERTEX_STATE_PROGRAM_NV specifies a program to be run manually to update vertex state. FRAGMENT_PROGRAM_NV specifies a program to be executed in fragment program mode as each fragment is rasterized. Multiple programs can be loaded with different names. id names the program to load. The name space for programs is the set of positive integers (zero is reserved). The error INVALID_VALUE is generated by LoadProgramNV if a program is loaded with an id of zero.
The error INVALID_OPERATION is generated by LoadProgramNV if a program is loaded for an id that is currently loaded with a program of a different program target.

program is a pointer to an array of ubytes that represents the program being loaded. The length of the array in ubytes is indicated by len.

At program load time, the program is parsed into a set of tokens possibly separated by white space. Spaces, tabs, newlines, carriage returns, and comments are considered white space. Comments begin with the character "#" and are terminated by a newline, a carriage return, or the end of the program array. Tokens are processed in a case-sensitive manner: upper and lower-case letters are not considered equivalent.

Each program target has a corresponding Backus-Naur Form (BNF) grammar specifying the syntactically valid sequences for programs of the specified type. The set of valid tokens can be inferred from the grammar. The token "" represents an empty string and is used to indicate optional rules. A program is invalid if it contains any undefined tokens or characters.

The error INVALID_OPERATION is generated by LoadProgramNV if a program fails to load because it is not syntactically correct or fails to satisfy all of the semantic restrictions corresponding to the program target.

A successfully loaded program is parsed into a sequence of instructions. Each instruction is identified by its tokenized name. The operation of these instructions is specific to the program target and is defined elsewhere.

A successfully loaded program replaces the program previously assigned to the name specified by id. If the OUT_OF_MEMORY error is generated by LoadProgramNV, no change is made to the previous contents of the named program.

Querying the value of PROGRAM_ERROR_POSITION_NV returns a ubyte offset into the program string most recently passed to LoadProgramNV indicating the position of the first error, if any, in the program.
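The white space and comment rules above can be sketched as a small scanning helper. This is an illustration only: the function name is hypothetical, and a real loader would follow this step with tokenization against the BNF grammar for the program target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the offset of the first byte at or
 * after pos that is neither white space nor part of a comment, or
 * len if only white space and comments remain. */
size_t skip_whitespace(const unsigned char *program, size_t len, size_t pos)
{
    assert(program != NULL || len == 0);

    while (pos < len) {
        unsigned char c = program[pos];

        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            pos++;
        } else if (c == '#') {
            /* A comment runs to a newline, a carriage return, or the
             * end of the program array. */
            while (pos < len && program[pos] != '\n' && program[pos] != '\r')
                pos++;
        } else {
            break;   /* start of a token */
        }
    }
    return pos;
}
```

Because the program is passed as an array with an explicit length, the sketch never relies on a terminating NUL byte.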
If the program fails to load because of a semantic restriction that cannot be determined until the program is fully scanned, the error position will be len, the length of the program. If the program loads successfully, the value of PROGRAM_ERROR_POSITION_NV is assigned the value negative one.

For targets whose programs are executed automatically (e.g., vertex and fragment programs), there must be a current program. The current vertex program is executed automatically in vertex program mode as vertices are specified. The current fragment program is executed automatically in fragment program mode as fragments are generated by rasterization. Current programs for a program target are updated by

    BindProgramNV(enum target, uint id);

where target must be VERTEX_PROGRAM_NV or FRAGMENT_PROGRAM_NV. The error INVALID_OPERATION is generated by BindProgramNV if id names a program that has a type different than target (for example, if id names a vertex state program as described in section 2.14.4).

Binding to a nonexistent program id does not generate an error. In particular, binding to program id zero does not generate an error. However, because program zero cannot be loaded, program zero is always nonexistent. If a program id is successfully loaded with a new vertex program and id is also the currently bound vertex program, the new program is considered the currently bound vertex program.

The INVALID_OPERATION error is generated when both vertex program mode is enabled and Begin is called (or when a command that performs an implicit Begin is called) if the current vertex program is nonexistent or not valid. A vertex program may not be valid for reasons explained in section 2.14.5.

The INVALID_OPERATION error is generated when both fragment program mode is enabled and Begin, another GL command that performs an implicit Begin, or any other GL command that generates fragments is called, if the current fragment program is nonexistent or not valid.
A fragment program may be invalid for reasons explained in Section 3.11.3.

Programs are deleted by calling

    void DeleteProgramsNV(sizei n, const uint *ids);

ids contains n names of programs to be deleted. After a program is deleted, it becomes nonexistent, and its name is again unused. If a program that is currently bound is deleted, it is as though BindProgramNV has been executed with the same target as the deleted program and program zero. Unused names in ids are silently ignored, as is the value zero.

The command

    void GenProgramsNV(sizei n, uint *ids);

returns n currently unused program names in ids. These names are marked as used, for the purposes of GenProgramsNV only, but they become existent programs only when they are first loaded using LoadProgramNV.

An implementation may choose to establish a working set of programs on which binding and/or manual execution are performed with higher performance. A program that is currently part of this working set is said to be resident.

The command

    boolean AreProgramsResidentNV(sizei n, const uint *ids,
                                  boolean *residences);

returns TRUE if all of the n programs named in ids are resident, or if the implementation does not distinguish a working set. If at least one of the programs named in ids is not resident, then FALSE is returned, and the residence of each program is returned in residences. Otherwise the contents of residences are not changed. If any of the names in ids are nonexistent or zero, FALSE is returned, the error INVALID_VALUE is generated, and the contents of residences are indeterminate. The residence status of a single named program can also be queried by calling GetProgramivNV (Section 6.1.13) with id set to the name of the program and pname set to PROGRAM_RESIDENT_NV.

AreProgramsResidentNV indicates only whether a program is currently resident, not whether it could be made resident. An implementation may choose to make a program resident only on first use, for example.
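The return value and output rules of AreProgramsResidentNV are easy to misread, so the following C sketch models them against a hypothetical bookkeeping table. The types ProgramEntry and QueryError and the function are_programs_resident are assumptions made for illustration; a real implementation keeps this state inside the GL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping record for one program name. */
typedef struct {
    unsigned id;        /* program name, never zero */
    bool     exists;    /* has been loaded with LoadProgramNV */
    bool     resident;  /* currently part of the working set */
} ProgramEntry;

typedef enum { QUERY_NO_ERROR, QUERY_INVALID_VALUE } QueryError;

static const ProgramEntry *find_program(const ProgramEntry *table,
                                        size_t count, unsigned id)
{
    size_t i;
    for (i = 0; i < count; i++)
        if (table[i].id == id)
            return &table[i];
    return NULL;
}

/* Models the semantics described above: TRUE only if every named
 * program is resident; residences is written only when FALSE is
 * returned without an error. */
bool are_programs_resident(const ProgramEntry *table, size_t count,
                           size_t n, const unsigned *ids,
                           bool *residences, QueryError *err)
{
    size_t i;
    bool all_resident = true;

    assert(ids != NULL && residences != NULL && err != NULL);
    *err = QUERY_NO_ERROR;

    /* First pass: a zero or nonexistent name is an error. */
    for (i = 0; i < n; i++) {
        const ProgramEntry *p = find_program(table, count, ids[i]);
        if (ids[i] == 0 || p == NULL || !p->exists) {
            *err = QUERY_INVALID_VALUE;
            return false;
        }
        if (!p->resident)
            all_resident = false;
    }
    if (all_resident)
        return true;               /* residences left untouched */

    /* Second pass: report the residence of each program. */
    for (i = 0; i < n; i++)
        residences[i] = find_program(table, count, ids[i])->resident;
    return false;
}
```

In the error case the sketch happens to leave residences unmodified; the text above only says its contents are indeterminate, so callers should not rely on that.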
The client may guide the GL implementation in determining which programs should be resident by requesting a set of programs to make resident. The command

    void RequestResidentProgramsNV(sizei n, const uint *ids);

requests that the n programs named in ids should be made resident. While all the programs are not guaranteed to become resident, the implementation should make a best effort to make as many of the programs resident as possible. As a result of making the requested programs resident, program names not among the requested programs may become non-resident. Higher priority for residency should be given to programs listed earlier in the ids array. RequestResidentProgramsNV silently ignores attempts to make resident nonexistent program names or zero. AreProgramsResidentNV can be called after RequestResidentProgramsNV to determine which programs actually became resident.

The commands

    void ProgramNamedParameter4fNV(uint id, sizei len, const ubyte *name,
                                   float x, float y, float z, float w);
    void ProgramNamedParameter4dNV(uint id, sizei len, const ubyte *name,
                                   double x, double y, double z, double w);
    void ProgramNamedParameter4fvNV(uint id, sizei len, const ubyte *name,
                                    const float v[]);
    void ProgramNamedParameter4dvNV(uint id, sizei len, const ubyte *name,
                                    const double v[]);

specify a new value for the named program local parameter <name> belonging to the fragment program specified by <id>. <name> is a pointer to an array of ubytes holding the parameter name. <len> specifies the number of ubytes in the array given by <name>. The new x, y, z, and w components of the named local parameter are given by x, y, z, and w, respectively, for ProgramNamedParameter4fNV and ProgramNamedParameter4dNV, and by v[0], v[1], v[2], and v[3], respectively, for ProgramNamedParameter4fvNV and ProgramNamedParameter4dvNV. The error INVALID_OPERATION is generated if <id> specifies a nonexistent program or a program whose type does not support named local parameters.
The error INVALID_VALUE is generated if <name> does not specify the name of a local parameter in the program corresponding to <id>. The error INVALID_VALUE is also generated if <len> is zero.

The commands

    void ProgramLocalParameter4fARB(enum target, uint index,
                                    float x, float y, float z, float w);
    void ProgramLocalParameter4fvARB(enum target, uint index,
                                     const float *params);
    void ProgramLocalParameter4dARB(enum target, uint index,
                                    double x, double y, double z, double w);
    void ProgramLocalParameter4dvARB(enum target, uint index,
                                     const double *params);

update the values of the numbered program local parameter <index> belonging to the program object currently bound to <target>. For ProgramLocalParameter4fARB and ProgramLocalParameter4dARB, the four components of the parameter are updated with the values of <x>, <y>, <z>, and <w>, respectively. For ProgramLocalParameter4fvARB and ProgramLocalParameter4dvARB, the four components of the parameter are updated with the array of four values pointed to by <params>. The error INVALID_VALUE is generated if <index> is greater than or equal to the number of numbered program local parameters supported by <target>.

Additions to Chapter 6 of the OpenGL 1.2.1 Specification (State and State Requests)

Modify Section 6.1.11, Pointer and String Queries (p. 206)

(modify last paragraph, p. 206) ... The possible values for <name> are VENDOR, RENDERER, VERSION, EXTENSIONS, and PROGRAM_ERROR_STRING_NV.

(add after last paragraph of section, p. 207) Queries of PROGRAM_ERROR_STRING_NV return a pointer to an implementation-dependent program load error string. If the last call to LoadProgramNV failed to load a program, the returned string describes a reason that the program failed to load. Otherwise, a pointer to an empty string (containing only a terminator) is returned.

Rename and modify Section 6.1.13, Vertex and Fragment Program Queries (from GL_NV_fragment_program). Portions of this section pertaining to fragment programs are copied verbatim.
(insert after discussion of GetProgramParameter[fd]vNV)

The commands

    void GetProgramNamedParameterfvNV(uint id, sizei len,
                                      const ubyte *name, float *params);
    void GetProgramNamedParameterdvNV(uint id, sizei len,
                                      const ubyte *name, double *params);

obtain the current program named local parameter value for the parameter named <name> belonging to the program given by <id>. <name> is a pointer to an array of ubytes holding the parameter name. <len> specifies the number of ubytes in the array given by <name>. The error INVALID_OPERATION is generated if <id> specifies a nonexistent program or a program whose type does not support named local parameters. The error INVALID_VALUE is generated if <name> does not specify the name of a local parameter in the program corresponding to <id>. The error INVALID_VALUE is also generated if <len> is zero. Each named program local parameter is an array of four values.

The commands

    void GetProgramLocalParameterdvARB(enum target, uint index,
                                       double *params);
    void GetProgramLocalParameterfvARB(enum target, uint index,
                                       float *params);

obtain the current value for the numbered program local parameter <index> belonging to the program object currently bound to <target>, and place the information in the array <params>. The error INVALID_ENUM is generated if <target> specifies a nonexistent program target or a program target that does not support numbered program local parameters. The error INVALID_VALUE is generated if <index> is greater than or equal to the implementation-dependent number of supported numbered program local parameters for the program target.

When the program target type is FRAGMENT_PROGRAM_NV, each numbered program local parameter returned is an array of four values.

...

The command

    void GetProgramivNV(uint id, enum pname, int *params);

obtains program state named by pname for the program named id in the array params. pname must be one of PROGRAM_TARGET_NV, PROGRAM_LENGTH_NV, or PROGRAM_RESIDENT_NV. The error INVALID_OPERATION is generated if the program named id does not exist.
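The error rules for the named local parameter queries can be summarized with a short C sketch. Everything in it is hypothetical: ProgramModel, NamedParam, and get_named_parameter are illustration-only names, a NULL program pointer stands in for a nonexistent program id, and the order in which the checks are made is an assumption, since the text above does not rank the errors.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    NP_NO_ERROR, NP_INVALID_OPERATION, NP_INVALID_VALUE
} NamedParamError;

typedef struct {
    const char *name;       /* parameter name as declared in the program */
    float       value[4];
} NamedParam;

typedef struct {
    int         supports_named;  /* nonzero for fragment programs */
    NamedParam *params;
    size_t      param_count;
} ProgramModel;

/* Looks up a parameter by a length-delimited name and copies its four
 * components to out.  program == NULL models a nonexistent program. */
NamedParamError get_named_parameter(const ProgramModel *program,
                                    size_t len, const unsigned char *name,
                                    float out[4])
{
    size_t i;

    assert(name != NULL && out != NULL);

    if (program == NULL || !program->supports_named)
        return NP_INVALID_OPERATION;
    if (len == 0)
        return NP_INVALID_VALUE;

    for (i = 0; i < program->param_count; i++) {
        const char *candidate = program->params[i].name;

        /* name is delimited by len, not by a NUL terminator. */
        if (strlen(candidate) == len && memcmp(candidate, name, len) == 0) {
            memcpy(out, program->params[i].value,
                   sizeof program->params[i].value);
            return NP_NO_ERROR;
        }
    }
    return NP_INVALID_VALUE;   /* no local parameter with that name */
}
```

The comparison uses both the length and the bytes, so a caller may pass a pointer into a larger buffer as long as len covers only the parameter name.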
The command

    void GetProgramStringNV(uint id, enum pname, ubyte *program);

obtains the program string for program id. pname must be PROGRAM_STRING_NV. n ubytes are returned into the array program where n is the length of the program in ubytes. GetProgramivNV with PROGRAM_LENGTH_NV can be used to query the length of a program's string. The INVALID_OPERATION error is generated if the program named id does not exist.

...

The command

    boolean IsProgramNV(uint id);

returns TRUE if id is the name of a program object. If id is zero or is a non-zero value that is not the name of a program object, or if an error condition occurs, IsProgramNV returns FALSE. A name returned by GenProgramsNV but not yet loaded with a program is not the name of a program object.

Additions to Appendix F of the OpenGL 1.2.1 Specification (ARB Extensions)

Modify Section F.2.3 (Changes to Section 2.6), p.240

(modify last paragraph on p.240) ... Multiple sets of texture coordinates may be used to specify how multiple texture images are mapped onto a primitive. The number of texture coordinate sets supported is implementation dependent, but must be at least 1. The number of texture coordinate sets supported may be queried with the state MAX_TEXTURE_COORDS_NV.

Modify Section F.2.4 (Changes to Section 2.7), p.241

(modify the last paragraph on p.241, carrying over to p.243) Implementations may support more than one set of texture coordinates. The commands

    void MultiTexCoord{1234}{sifd}ARB(enum texture, T coords)
    void MultiTexCoord{1234}{sifd}vARB(enum texture, T coords)

take the coordinate set to be modified as the <texture> parameter. <texture> is a symbolic constant of the form TEXTUREi_ARB, indicating that texture coordinate set i is to be modified. The constants obey TEXTUREi_ARB = TEXTURE0_ARB + i (i is in the range 0 to k-1, where k is the implementation dependent number of texture units defined by MAX_TEXTURE_COORDS_NV).
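The relation TEXTUREi_ARB = TEXTURE0_ARB + i lends itself to a small worked example. The sketch below is a hypothetical helper, not a GL entry point; it uses the enumerant value 0x84C0 that ARB_multitexture assigns to TEXTURE0_ARB and takes k (for MultiTexCoord, the value of MAX_TEXTURE_COORDS_NV) as a parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TEXTURE0_ARB 0x84C0u   /* enumerant value from ARB_multitexture */

/* Hypothetical helper: maps a TEXTUREi_ARB enumerant to i.
 * Returns false if texture is outside TEXTURE0_ARB .. TEXTURE0_ARB+k-1. */
bool texture_enum_to_index(unsigned texture, unsigned k, unsigned *index_out)
{
    assert(index_out != NULL);

    if (texture < TEXTURE0_ARB || texture - TEXTURE0_ARB >= k)
        return false;
    *index_out = texture - TEXTURE0_ARB;
    return true;
}
```

For example, with k equal to 8 the enumerant 0x84C3 maps to coordinate set 3, while 0x84C8 is rejected.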
Modify Section F.2.5 (Changes to Section 2.8), p.243

(modify first and second paragraphs of section) ... The client may specify up to 5 plus the value of MAX_TEXTURE_COORDS_NV arrays; one each to store vertex coordinates...

In implementations which support more than one texture coordinate set, the command

    void ClientActiveTextureARB(enum texture)

is used to select the vertex array client state parameters to be modified by the TexCoordPointer command and the array affected by EnableClientState and DisableClientState with the parameter TEXTURE_COORD_ARRAY. This command sets the state variable CLIENT_ACTIVE_TEXTURE_ARB. Each texture coordinate set has a client state vector which is selected when this command is invoked. This state vector also includes the vertex array state. This command also selects the texture coordinate set state used for queries of client state.

(modify first paragraph on p.244) If the number of supported texture coordinate sets (the value of MAX_TEXTURE_COORDS_NV) is k, ...

Modify Section F.2.6 (Changes to Section 2.10.2), p.244

(modify first paragraph) For each texture coordinate set, a 4x4 matrix is applied to the corresponding texture coordinates...

(replace second and third paragraphs) The command

    void ActiveTextureARB(enum texture);

specifies the active texture unit selector, ACTIVE_TEXTURE_ARB. Each texture unit contains up to two distinct sub-units: a texture coordinate processing unit (consisting of a texture matrix stack and texture coordinate generation state) and a texture image unit (consisting of all the texture state defined in Section 3.8). In implementations with a different number of supported texture coordinate sets and texture image units, some texture units may consist of only one of the two sub-units.

The active texture unit selector specifies the texture unit accessed by commands involving texture coordinate processing.
Such commands include those accessing the current matrix stack (if MATRIX_MODE is TEXTURE), TexGen (Section 2.10.4), Enable/Disable (if any texture coordinate generation enum is selected), as well as queries of the current texture coordinates and current raster texture coordinates. If the texture unit number corresponding to the current value of ACTIVE_TEXTURE_ARB is greater than or equal to the implementation dependent constant MAX_TEXTURE_COORDS_NV, the error INVALID_OPERATION is generated by any such command.

The active texture unit selector also selects the texture unit accessed by commands involving texture image processing (Section 3.8). Such commands include all variants of TexEnv, TexParameter, and TexImage commands, BindTexture, Enable/Disable for any texture target (e.g., TEXTURE_2D), and queries of all such state. If the texture unit number corresponding to the current value of ACTIVE_TEXTURE_ARB is greater than or equal to the implementation dependent constant MAX_TEXTURE_IMAGE_UNITS_NV, the error INVALID_OPERATION is generated by any such command.

ActiveTextureARB generates the error INVALID_ENUM if an invalid <texture> is specified. <texture> is a symbolic constant of the form TEXTUREi_ARB, indicating that texture unit i is to be modified. The constants obey TEXTUREi_ARB = TEXTURE0_ARB + i (i is in the range 0 to k-1, where k is the larger of MAX_TEXTURE_COORDS_NV and MAX_TEXTURE_IMAGE_UNITS_NV).

For compatibility with old OpenGL specifications, the implementation dependent constant MAX_TEXTURE_UNITS_ARB specifies the number of conventional texture units supported by the implementation. Its value must be no larger than the minimum of MAX_TEXTURE_COORDS_NV and MAX_TEXTURE_IMAGE_UNITS_NV.

Modify Section F.2.12 (Changes to Section 3.8.10), p.249

(modify next-to-last paragraph) Texturing is enabled and disabled individually for each texture unit.
If texturing is disabled for one of the units, then the fragment resulting from the previous unit is passed unaltered to the following unit. Individual texture units beyond those specified by MAX_TEXTURE_UNITS_ARB may be incomplete and are always treated as disabled.

Modify Section F.2.15 (Changes to Section 6.1.2), p.251

(add to end of paragraph) Queries of texture state variables corresponding to texture coordinate processing unit (namely, TexGen state and enables, and matrices) will produce an INVALID_OPERATION error if the value of ACTIVE_TEXTURE_ARB is greater than or equal to MAX_TEXTURE_COORDS_NV. All other texture state queries will result in an INVALID_OPERATION error if the value of ACTIVE_TEXTURE_ARB is greater than or equal to MAX_TEXTURE_IMAGE_UNITS_NV.

Additions to the AGL/GLX/WGL Specifications

Program objects are shared between AGL/GLX/WGL rendering contexts if and only if the rendering contexts share display lists. No change is made to the AGL/GLX/WGL API.

Dependencies on GL_NV_vertex_program

If NV_vertex_program is supported, the description of LoadProgramNV in Section 2.14.1.7 (up to the BNF description of vertex programs) is deleted, as it is replaced by the contents of Section 5.7 in this specification. The general error descriptions in Section 2.14.1.7 common to Section 5.7 (like INVALID_OPERATION if the program fails to compile) should also be deleted. Section 2.14.1.8 should also be deleted. Section 6.1.13 is modified by this specification as described above.

Dependencies on NV_texture_shader

If NV_texture_shader is not supported, the comment about texture shaders being disabled in fragment program mode is not applicable.

Dependencies on NV_texture_rectangle

If NV_texture_rectangle is not supported, the references to "RECT" in the grammar rule and TEXTURE_RECTANGLE_NV are not applicable.
Dependencies on ARB_texture_cube_map

If ARB_texture_cube_map is not supported, the references to "CUBE" in the grammar rule and TEXTURE_CUBE_MAP_ARB are not applicable.

Dependencies on EXT_fog_coord

If EXT_fog_coord is not supported, references to "fog coordinate" in the definition of the "FOGC" fragment attribute register should be removed.

Dependencies on NV_depth_clamp

If NV_depth_clamp is not supported, section 3.11.6 is modified to remove discussion of the depth clamp enable and instead indicate that fragments with depth values outside [min(n,f), max(n,f)] are always discarded.

Dependencies on ARB_depth_texture and SGIX_depth_texture

If ARB_depth_texture is not supported, but SGIX_depth_texture is supported, the discussion of Table X.5 is modified to indicate that DEPTH_COMPONENT textures are treated as LUMINANCE.

If neither extension is supported, the discussion of DEPTH_COMPONENT textures in Table X.5 should be removed.

Dependencies on NV_float_buffer

If NV_float_buffer is not supported, references to FLOAT_R_NV, FLOAT_RG_NV, FLOAT_RGB_NV, and FLOAT_RGBA_NV internal texture formats in Table X.5 should be removed.

Dependencies on ARB_vertex_program

This extension does not have any explicit dependencies on ARB_vertex_program, but the APIs for setting and querying numbered local parameters (ProgramLocalParameter*ARB and GetProgramLocalParameter*ARB) were taken directly from that extension.

Dependencies on ARB_fragment_program

If ARB_fragment_program is not supported, the maximum number of executable instructions in any !!FP1.0 program is 1024.

If ARB_fragment_program is supported, the maximum number of executable instructions for an !!FP1.0 program is at least 1024, but can be larger. The limit can be queried by calling GetProgramivARB with <target> set to FRAGMENT_PROGRAM_ARB and <pname> set to MAX_PROGRAM_INSTRUCTIONS_ARB.

GLX Protocol

Most of the GLX protocol needed to implement this extension is described in the GL_NV_vertex_program extension specification and will not be repeated here.
The following two rendering commands are potentially large, and hence can be sent in a glXRender or glXRenderLarge request.

    ProgramNamedParameter4fvNV
        2           28+len+p        rendering command length
        2           4218            rendering command opcode
        4           CARD32          id
        4           CARD32          len
        4           FLOAT32         params[0]
        4           FLOAT32         params[1]
        4           FLOAT32         params[2]
        4           FLOAT32         params[3]
        len         LISTofCARD8     name
        p                           unused, p=pad(len)

    If the command is encoded in a glXRenderLarge request, the command opcode and command length fields above are expanded to 4 bytes each:

        4           32+len+p        rendering command length
        4           4218            rendering command opcode

    ProgramNamedParameter4dvNV
        2           44+len+p        rendering command length
        2           4219            rendering command opcode
        4           CARD32          id
        4           CARD32          len
        8           FLOAT64         params[0]
        8           FLOAT64         params[1]
        8           FLOAT64         params[2]
        8           FLOAT64         params[3]
        len         LISTofCARD8     name
        p                           unused, p=pad(len)

    If the command is encoded in a glXRenderLarge request, the command opcode and command length fields above are expanded to 4 bytes each:

        4           48+len+p        rendering command length
        4           4219            rendering command opcode

The remaining two commands are non-rendering commands.
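As a worked example of the length fields for the two rendering commands above, the following C sketch computes pad(len) and the resulting command length. The function names are hypothetical. The fixed sizes 28, 32, 44, and 48 are taken from the protocol descriptions above; pad(len) is assumed to follow the usual GLX convention of the number of bytes needed to round len up to a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>

/* pad(len): bytes needed to round len up to a multiple of 4. */
size_t glx_pad(size_t len)
{
    return (4 - (len & 3)) & 3;
}

/* Hypothetical helper: total rendering command length for a command
 * with fixed_bytes of header and parameter data followed by a name of
 * len bytes.  Use 28 or 32 for ProgramNamedParameter4fvNV and 44 or 48
 * for ProgramNamedParameter4dvNV (glXRender or glXRenderLarge). */
size_t glx_named_param_length(size_t fixed_bytes, size_t len)
{
    assert(fixed_bytes % 4 == 0);   /* fixed part is 4-byte aligned */
    return fixed_bytes + len + glx_pad(len);
}
```

For instance, a 5-byte parameter name needs 3 bytes of padding, so ProgramNamedParameter4fvNV sent in a glXRender request has a command length of 28 + 5 + 3 = 36 bytes.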
These commands are sent separately (i.e., not as part of a glXRender or glXRenderLarge request), using the glXVendorPrivateWithReply request:

    GetProgramNamedParameterfvNV
        1           CARD8           opcode (X assigned)
        1           17              GLX opcode (glXVendorPrivateWithReply)
        2           4+(len+p)/4     request length
        4           1310            vendor specific opcode
        4           GLX_CONTEXT_TAG context tag
        4           INT32           len
        len         LISTofCARD8     name
        p                           unused, p=pad(len)
      =>
        If the command succeeds, 4 floats are sent in the reply:

        1           1               reply
        1                           unused
        2           CARD16          sequence number
        4           4               reply length
        24                          unused
        16          LISTofFLOAT32   params

        Otherwise, an empty reply is sent, indicating that a GL error occurred:

        1           1               reply
        1                           unused
        2           CARD16          sequence number
        4           0               reply length
        24                          unused

    GetProgramNamedParameterdvNV
        1           CARD8           opcode (X assigned)
        1           17              GLX opcode (glXVendorPrivateWithReply)
        2           4+(len+p)/4     request length
        4           1311            vendor specific opcode
        4           GLX_CONTEXT_TAG context tag
        4           INT32           len
        len         LISTofCARD8     name
        p                           unused, p=pad(len)
      =>
        If the command succeeds, 4 doubles are sent in the reply:

        1           1               reply
        1                           unused
        2           CARD16          sequence number
        4           8               reply length
        24                          unused
        32          LISTofFLOAT64   params

        Otherwise, an empty reply is sent, indicating that a GL error occurred:

        1           1               reply
        1                           unused
        2           CARD16          sequence number
        4           0               reply length
        24                          unused

Errors

INVALID_OPERATION is generated by Begin, DrawPixels, Bitmap, CopyPixels, or a command that performs an implicit Begin if FRAGMENT_PROGRAM_NV is enabled and the currently bound fragment program does not exist.

INVALID_OPERATION is generated by ProgramNamedParameter4fNV, ProgramNamedParameter4dNV, ProgramNamedParameter4fvNV, ProgramNamedParameter4dvNV, GetProgramNamedParameterfvNV, or GetProgramNamedParameterdvNV if <id> specifies a nonexistent program or a program whose type does not support local parameters.

INVALID_VALUE is generated by ProgramNamedParameter4fNV, ProgramNamedParameter4dNV, ProgramNamedParameter4fvNV, ProgramNamedParameter4dvNV, GetProgramNamedParameterfvNV, or GetProgramNamedParameterdvNV if <len> is zero.
INVALID_VALUE is generated by ProgramNamedParameter4fNV, ProgramNamedParameter4dNV, ProgramNamedParameter4fvNV, ProgramNamedParameter4dvNV, GetProgramNamedParameterfvNV, or GetProgramNamedParameterdvNV if <name> does not specify the name of a local parameter in the program corresponding to <id>.

INVALID_OPERATION is generated by any command accessing texture coordinate processing state if the texture unit number corresponding to the current value of ACTIVE_TEXTURE_ARB is greater than or equal to the implementation dependent constant MAX_TEXTURE_COORDS_NV.

INVALID_OPERATION is generated by any command accessing texture image processing state if the texture unit number corresponding to the current value of ACTIVE_TEXTURE_ARB is greater than or equal to the implementation dependent constant MAX_TEXTURE_IMAGE_UNITS_NV.

(The following are error descriptions copied from GL_NV_vertex_program that apply to this extension as well. These modifications do not affect the behavior of that extension.)

INVALID_VALUE is generated by LoadProgramNV if id is zero.

INVALID_OPERATION is generated by LoadProgramNV if the program corresponding to id is currently loaded but has a program type different from that given by target.

INVALID_OPERATION is generated by LoadProgramNV if the program specified is syntactically incorrect for the program type specified by target. The value of PROGRAM_ERROR_POSITION_NV is still updated when this error is generated.

INVALID_OPERATION is generated by LoadProgramNV if the program specified fails to conform to any of the semantic restrictions imposed on programs of the type specified by target. The value of PROGRAM_ERROR_POSITION_NV is still updated when this error is generated.

INVALID_OPERATION is generated by BindProgramNV if target does not match the type of the program named by id.

INVALID_VALUE is generated by AreProgramsResidentNV if any of the queried programs are zero or do not exist.
INVALID_OPERATION is generated by GetProgramivNV or GetProgramStringNV if the program named id does not exist.

New State

    Get Value                    Type  Get Command  Initial Value  Description        Section  Attribute
    ---------------------------  ----  -----------  -------------  -----------------  -------  ---------
    FRAGMENT_PROGRAM_NV          B     IsEnabled    FALSE          fragment program   3.11     enable
                                                                   mode enable
    FRAGMENT_PROGRAM_BINDING_NV  Z+    GetIntegerv  0              bound fragment     5.7      -
                                                                   program

    Table X.6. New State Introduced by NV_fragment_program.

    Get Value                  Type    Get Command         Initial Value  Description          Section  Attribute
    -------------------------  ------  ------------------  -------------  -------------------  -------  ---------
    PROGRAM_ERROR_POSITION_NV  Z       GetIntegerv         -1             program error        5.7      -
                                                                          position
    PROGRAM_TARGET_NV          Z2      GetProgramivNV      0              program target       6.1.13   -
    PROGRAM_LENGTH_NV          Z+      GetProgramivNV      0              program length       6.1.13   -
    PROGRAM_RESIDENT_NV        Z2      GetProgramivNV      False          program residency    6.1.13   -
    PROGRAM_STRING_NV          ubxn    GetProgramStringNV  ""             program string       6.1.13   -
    -                          nxR4    GetProgramNamed-    (0,0,0,0)      named program local  5.7      -
                                       ParameterNV                        parameter value
    -                          64+xR4  GetProgramLocal-    (0,0,0,0)      numbered program     5.7      -
                                       ParameterARB                       local parameter

    Table X.7. Program Object State common to NV_vertex_program and NV_fragment_program.

    Get Value  Type    Get Command  Initial Value  Description                   Section   Attribute
    ---------  ------  -----------  -------------  ----------------------------  --------  ---------
    -          12xR4   -            fragment data  fragment attribute registers  3.11.1.1  -
    -          16xR4   -            (0,0,0,0)      fp32 temporary registers      3.11.1.2  -
    -          32xR4   -            (0,0,0,0)      fp16 temporary registers      3.11.1.2  -
    -          (Z_4)4  -            (EQ,EQ,EQ,EQ)  condition code register       3.11.1.4  -
                                                   address register

    Table X.8. Fragment Program Per-Fragment Execution State.
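The per-fragment execution state of Table X.8 and its initial values can be pictured with a small C sketch. The struct and function names are hypothetical, the register counts simply follow the table as printed, and the fp16 temporaries are stored as float for simplicity, which does not model their reduced precision.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CC_LT, CC_EQ, CC_GT, CC_UN } ConditionCode;

/* Hypothetical model of the state listed in Table X.8. */
typedef struct {
    float         attrib[12][4];     /* fragment attribute registers */
    float         temp_fp32[16][4];  /* fp32 temporary registers */
    float         temp_fp16[32][4];  /* fp16 temporaries, held as float */
    ConditionCode cc[4];             /* condition code register */
} FragmentExecState;

/* Prepares the state for one program execution.  attribs points to
 * 12 registers x 4 components of per-fragment data. */
void begin_fragment(FragmentExecState *s, const float *attribs)
{
    int i, c;

    assert(s != NULL && attribs != NULL);

    for (i = 0; i < 12; i++)
        for (c = 0; c < 4; c++)
            s->attrib[i][c] = attribs[i * 4 + c];

    for (i = 0; i < 16; i++)          /* temporaries start at zero */
        for (c = 0; c < 4; c++)
            s->temp_fp32[i][c] = 0.0f;
    for (i = 0; i < 32; i++)
        for (c = 0; c < 4; c++)
            s->temp_fp16[i][c] = 0.0f;

    for (c = 0; c < 4; c++)           /* condition codes start at EQ */
        s->cc[c] = CC_EQ;
}
```

Resetting the temporaries and condition codes on every call reflects the rule that no execution state carries over from one fragment to the next.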
New Implementation Dependent State

                                                    Minimum
    Get Value                   Type  Get Command   Value    Description         Section  Attribute
    --------------------------  ----  -----------   -------  ------------------  -------  ---------
    MAX_TEXTURE_COORDS_NV       Z+    GetIntegerv   2        number of texture   2.6      -
                                                             coordinate sets
                                                             supported
    MAX_TEXTURE_IMAGE_UNITS_NV  Z+    GetIntegerv   2        number of texture   2.10.2   -
                                                             image units
                                                             supported
    MAX_FRAGMENT_PROGRAM_       Z+    GetIntegerv   64       number of numbered  3.11.7   -
      LOCAL_PARAMETERS_NV                                    local parameters
                                                             supported

Revision History

    Rev.  Date      Author   Changes
    ----  --------  -------  --------------------------------------------
    73    05/23/05  pbrown   Fixed cut-and-paste error in the dependency
                             section where it said "NV_texture_rectangle"
                             instead of "ARB_texture_cube_map".

    72    05/16/04  pbrown   Documented that it's not possible to get
                             results from LG2 that are any more precise
                             than what is available in the fp32 storage
                             format.

    71    04/23/04  pbrown   Fixed incorrect example.

    70    03/20/03  pbrown   Made the instruction count limit for !!FP1.0
                             programs queryable instead of a hard-wired
                             value of 1024. The limit can be queried using
                             ARB_fragment_program mechanisms, and remains
                             1024 if ARB_fragment_program is unsupported.

    69    02/01/03  pbrown   Removed support for combiner fragment programs
                             (!!FCP1.0).

    68    01/08/03  pbrown   Correct spec language providing examples of
                             NaNs, such as sqrt(-1) or log(-1). Division by
                             zero produces an infinity, not a NaN.

    67    12/23/02  pbrown   Fix incorrect syntax of examples of "KIL"
                             instruction. The condition code test is not
                             parenthesized in KIL.

    66    10/31/02  pbrown   Cleaned up special cases of POW, including the
                             fact that "POW dst, 0, 0" produces NaN in this
                             spec, not 1.0.

    65    10/28/02  pbrown   Documented that signed HILO textures will have
                             the hemisphere remapping applied, but unsigned
                             textures will not.

    64    09/17/02  pbrown   Minor typo fixes.

    63    08/14/02  pbrown   Clarified the value of the "other" components
                             of f[FOGC].

    62    07/24/02  pbrown   Removed PK4UBG and UP4UBG instructions.
                             Simplified the implementation of the temporary
                             and output register limit for combiner
                             programs by counting all four o[TEXn]
                             registers against the limit, whether or not
                             they are written.

    61    07/19/02  pbrown   Renamed ProgramLocalParameter*NV to
                             ProgramNamedParameter*NV to eliminate naming
                             conflicts with ARB_vertex_program (and
                             presumably ARB_fragment_program). Added
                             support for numbered program local parameters
                             for compatibility with the ARB vertex program
                             extension (and upcoming ARB fragment program
                             extension), so it's possible to set local
                             parameters the same way in both extensions.
                             Eliminated the language describing "register
                             slots" and how the "H" and "R" registers
                             overlap. Instead, registers are guaranteed not
                             to overlap, and a semantic limit is added on
                             the number of temporaries and output registers
                             that can be used by a program. Eliminated the
                             requirement that non-combiner programs
                             actually write a color value; the only
                             requirement is that one output register be
                             written. When using fragment programs that use
                             depth replacement, there may not be a need to
                             compute color if color writes are currently
                             disabled. Cleaned up the issues section. Added
                             several examples of fragment program
                             operation. Cleaned up GLX protocol.

    59    07/07/02  pbrown   Minor clarifications of texture lookup
                             handling. Documented that DDX and DDY may not
                             always produce infinities.

    58    06/27/02  pbrown   Added clarification that instructions can use
                             the same attribute or parameter register more
                             than once. Added support for "X" precision on
                             the "set on" instructions. Removed "X"
                             precision support from DST.

    57    06/27/02  pbrown   Added missing table entries covering the use
                             of floating-point textures.

    56    06/27/02  pbrown   Modified the spec to indicate that depth
                             textures are treated as alpha, luminance, or
                             intensity according to the depth texture mode
                             in ARB_shadow.

    55    06/26/02  pbrown   Fixed the correct aliased register number and
                             "read-only" mappings for o[DEPR] in combiner
                             programs.

    54    06/05/02  pbrown   Fixed the spec to indicate that near and far
                             frustum clipping is disabled for depth
                             replacement programs. Fixed the spec to
                             indicate that the register combiners enable is
                             overridden for fragment programs (enabled for
                             combiner programs, disabled for color
                             programs).

    53    05/20/02  pbrown   Miscellaneous bug fixes for wording and
                             special-case handling errors.

    52    05/16/02  pbrown   Added "_SAT" suffix to clamp result vector
                             components to [0,1]. Fixed special case rules
                             for MUL instruction and the "UN" condition
                             code.

    50    04/19/02  pbrown   Added "$" as a legal character in an
                             identifier name. Added example for fixed and
                             conditional write masks and condition code
                             updates.

    49    04/16/02  pbrown   Added new query of PROGRAM_ERROR_STRING_NV to
                             return more detailed information on program
                             load failures.

    48    04/02/02  pbrown   Added missing enum value for the
                             FRAGMENT_PROGRAM_BINDING_NV query.

    47    03/15/02  pbrown   Fixed various typos, and an incorrect
                             description of the MAX operation.

    45    01/31/02  pbrown   Renamed the packing and unpacking opcodes to
                             more closely match OpenGL data type naming
                             conventions (PK2 becomes PK2H, PK16 becomes
                             PK2US, PK4 becomes PK4B, PKB becomes PK4UB).
                             Renamed "BEM" instruction to "X2D" to reflect
                             the fact that it does a 2D coordinate
                             transformation (not just a bump mapping
                             operation). Added PK4UBG and UP4UBG
                             instructions to support sRGB gamma correction
                             when packing and unpacking components.

    44    01/18/02  pbrown   Double the number of available temporaries (16
                             to 32 fp32 vectors). Add BEM (texture
                             coordinate offset), PKB/UPB (unsigned byte
                             packing), and PK16/UP16 (unsigned short
                             packing) instructions.

    43    01/04/02  pbrown   Documented special cases for comparisons,
                             including the handling of NaN in the SNE
                             instruction. Added automatic generation of a
                             third normal component for HILO textures.
                             Documented the restriction that RFL can't
                             write to the w component of the result.
                             Trivial fix of the special-cases for RCP.
                             Fixed minor typo on the TEX instruction.

    40    11/26/01  pbrown   Eliminated "X" precision specifier on those
                             instructions that do complicated math or don't
                             otherwise need it (e.g., "SGE"). Fixed special
                             case math on LG2 instruction. Eliminated
                             incorrectly specified exponent clamping on LIT
                             instruction. Fixed description and
                             special-case math on LIT/POW instructions.
                             Specified that combiner program outputs are
                             clamped to [-1,+1], not [+0,+1].

    39    11/16/01  pbrown   Added semantic restriction that PK2/PK4 must
                             write to a 32-bit register. Cleaned up the
                             converse restrictions on UP2/UP4, making sure
                             to allow UP2/UP4 from a program parameter. Fix
                             section numberings and a few typos.

    36    11/07/01  pbrown   Cleaned up explanation of the "negative q is
                             undefined" for texture mapping spec
                             restriction. Fixed a nit on the number of
                             condition code values (now 4 with UN -
                             unordered).

    35    10/29/01  pbrown   Add a SUB instruction for programmer
                             convenience. Moved unresolved issue list back
                             to the "Issues" section. Fix several minor
                             wording issues. Clarify register
                             combiners/texture shader/fragment program flow
                             control diagram.

    32    10/19/01  pbrown   Document the fragment program restriction that
                             instructions involving f[FOGC] and
                             f[TEX0-TEX7] are always carried out at fp32
                             precision.

    31    10/19/01  pbrown   Fixed incorrect description of encoding of
                             fp16 denorms.

    30    10/12/01  pbrown   Documented (0,0,0,0) local parameter
                             initialization. Disallow multiple defines of
                             the same token. Allow tokens that look like a
                             possible register or texture name, but have
                             numbers that are too big (e.g., "TEX24",
                             "R37"). Fixed up several grammar bugs.
                             Documented that LG2 and RSQ now do not
                             automatically take absolute values, plus new
                             math special cases.