Name
NV_fragment_program_option
Name Strings
GL_NV_fragment_program_option
Contact
Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)
Status
Shipping.
Version
Last Modified: 05/27/2005
NVIDIA Revision: 4
Number
303
Dependencies
ARB_fragment_program is required.
Overview
This extension provides additional fragment program functionality
to extend the standard ARB_fragment_program language and execution
environment. ARB programs wishing to use this added functionality
need only add:
OPTION NV_fragment_program;
to the beginning of their fragment programs.
The functionality provided by this extension, which is roughly
equivalent to that provided by the NV_fragment_program extension,
includes:
* increased control over precision in arithmetic computations and
storage,
* data-dependent conditional writemasks,
* an absolute value operator on scalar and swizzled operand loads,
* instructions to compute partial derivatives, and perform texture
lookups using specified partial derivatives,
* fully orthogonal "set on" instructions,
* instructions to compute reflection vector and perform a 2D
coordinate transform, and
* instructions to pack and unpack multiple quantities into a single
component.
Issues
Why is this a separate extension, rather than just an additional
feature of NV_fragment_program?
RESOLVED: The NV_fragment_program specification was complete
(with a published implementation) prior to the completion of
ARB_fragment_program. Future NVIDIA fragment program extensions
should contain extensions to the ARB_fragment_program execution
environment as a standard feature.
Should a similar option be provided to expose ARB_fragment_program
features not found in NV_fragment_program (e.g., state bindings,
certain "macro" instructions) under the NV_fragment_program
interface?
RESOLVED: No. Why not just write an ARB program?
The ARB_fragment_program spec has a minor grammar bug that requires
that inline scalar constants used as scalar operands include a
component selector. In other words, you have to say "11.0.x" to
use the constant "11.0". What should we do here?
RESOLVED: The NV_fragment_program_option grammar will correct
this problem, which should be fixed in future revisions to the
ARB language.
New Procedures and Functions
None.
New Tokens
None.
Additions to Chapter 2 of the OpenGL 1.2.1 Specification (OpenGL Operation)
None.
Additions to Chapter 3 of the OpenGL 1.2.1 Specification (Rasterization)
Modify Section 3.11.2 of ARB_fragment_program (Fragment Program
Grammar and Restrictions):
(mostly add to existing grammar rules, modify a few existing grammar
rules -- changes marked with "***")
::= "NV_fragment_program"
::= ::= "DDX"
| "DDY"
| "PK2H"
| "PK2US"
| "PK4B"
| "PK4UB"
::= "UP2H"
| "UP2US"
| "UP4B"
| "UP4UB"
::= "RFL"
| "SEQ"
| "SFL"
| "SGT"
| "SLE"
| "SNE"
| "STR"
::= "X2D"
::= "," ","
"," ","
::= "TXD"
::= ::= ::= "|" "|"
::= ::= "|" "|"
::= ::= ::= "TEMP" ::= "OUTPUT" "="
::= "SHORT"
| "LONG"
::=
(*** instead of )
::=
(*** instead of )
::= "(" ")"
::= ::= "EQ"
| "GE"
| "GT"
| "LE"
| "LT"
| "NE"
| "TR"
| "FL"
(modify language describing reserved keywords) The following strings
are reserved keywords and may not be used as identifiers:
ALIAS, ATTRIB, END, OPTION, OUTPUT, PARAM, TEMP, fragment,
program, result, state, and texture.
Additionally, all the instruction names (and variants) listed in
Table X.5 are reserved.
Modify Section 3.11.3.3, Fragment Program Temporaries
(replace second paragraph) Fragment program temporary variables
can be declared explicitly using the grammar
rule. Each such statement can declare one or more temporaries.
Temporary declarations can optionally specify a variable size,
using the grammar rule. Variables declared as "SHORT"
will be represented with at least 16 bits per component (5 bits of
exponent, 10 bits of mantissa). Variables declared as "LONG" will be
represented with at least 32 bits per component (8 bits of exponent,
23 bits of mantissa). Fragment program temporary variables cannot
be declared implicitly.
Modify Section 3.11.3.4, Fragment Program Results
(replace second paragraph) Fragment program result variables
can be declared explicitly using the grammar
rule, or implicitly using the grammar rule in an
executable instruction. Explicit result variable declaration can
optionally specify a variable size, using the grammar rule.
Variables declared as "SHORT" will be represented with at least 16
bits per component (5 bits of exponent, 10 bits of mantissa).
Variables declared as "LONG" will be represented with at least
32 bits per component (8 bits of exponent, 23 bits of mantissa).
Each fragment program result variable is bound to a fragment attribute
used in subsequent back-end processing. The set of fragment program
result variable bindings is given in Table X.3.
(add to the end of the section) A fragment program will fail to load if
it contains instructions writing to variables bound to the same result,
but declared with different sizes.
Add New Section 3.11.3.X, Condition Code Register (insert after
Section 3.11.3.4, Fragment Program Results)
The fragment program condition code register is a single
four-component vector. Each component of this register is one of four
enumerated values: GT (greater than), EQ (equal), LT (less than),
or UN (unordered). The condition code register can be used to mask
writes to registers and to evaluate conditional branches.
Most fragment program instructions can optionally update the condition
code register. When a fragment program instruction updates the
condition code register, a condition code component is set to LT if
the corresponding component of the result is less than zero, EQ if it
is equal to zero, GT if it is greater than zero, and UN if it is NaN
(not a number).
The condition code register is initialized to a vector of EQ values
each time a fragment program executes.
Modify Section 3.11.4, Fragment Program Execution Environment
(modify instruction table) There are fifty-two fragment program
instructions. Fragment program instructions may have up to sixteen
variants, including a suffix of "R", "H", or "X" to specify arithmetic
precision (section 3.11.4.X), a suffix of "C" to allow an update
of the condition code register (section 3.11.3.X), and a suffix of
"_SAT" to clamp the result vector components to the range [0,1]
(section 3.11.4.3). For example, the sixteen forms of the "ADD"
instruction are "ADD", "ADDR", "ADDH", "ADDX", "ADDC", "ADDRC",
"ADDHC", "ADDXC", "ADD_SAT", "ADDR_SAT", "ADDH_SAT", "ADDX_SAT",
"ADDC_SAT", "ADDRC_SAT", "ADDHC_SAT", and "ADDXC_SAT". The instructions
and their respective input and output parameters are summarized in
Table X.5.
        Modifiers
Instr.  R H X C S  Inputs  Output  Description
------  - - - - -  ------  ------  --------------------------------
ABS     X X X X X  v       v       absolute value
ADD     X X X X X  v,v     v       add
CMP     - - - - X  v,v,v   v       compare
COS     X X - X X  s       ssss    cosine with reduction to [-PI,PI]
DDX     X X - X X  v       v       partial derivative relative to X
DDY     X X - X X  v       v       partial derivative relative to Y
DP3     X X X X X  v,v     ssss    3-component dot product
DP4     X X X X X  v,v     ssss    4-component dot product
DPH     X X X X X  v,v     ssss    homogeneous dot product
DST     X X - X X  v,v     v       distance vector
EX2     X X - X X  s       ssss    exponential base 2
FLR     X X X X X  v       v       floor
FRC     X X X X X  v       v       fraction
KIL     - - - - -  v or c  -       kill fragment
LG2     X X - X X  s       ssss    logarithm base 2
LIT     X X - X X  v       v       compute light coefficients
LRP     X X X X X  v,v,v   v       linear interpolation
MAD     X X X X X  v,v,v   v       multiply and add
MAX     X X X X X  v,v     v       maximum
MIN     X X X X X  v,v     v       minimum
MOV     X X X X X  v       v       move
MUL     X X X X X  v,v     v       multiply
PK2H    - - - - -  v       ssss    pack two 16-bit floats
PK2US   - - - - -  v       ssss    pack two unsigned 16-bit scalars
PK4B    - - - - -  v       ssss    pack four signed 8-bit scalars
PK4UB   - - - - -  v       ssss    pack four unsigned 8-bit scalars
POW     X X - X X  s,s     ssss    exponentiate
RCP     X X - X X  s       ssss    reciprocal
RFL     X X - X X  v,v     v       reflection vector
RSQ     X X - X X  s       ssss    reciprocal square root
SCS     - - - - X  s       ss--    sine/cosine without reduction
SEQ     X X X X X  v,v     v       set on equal
SFL     X X X X X  v,v     v       set on false
SGE     X X X X X  v,v     v       set on greater than or equal
SGT     X X X X X  v,v     v       set on greater than
SIN     X X - X X  s       ssss    sine with reduction to [-PI,PI]
SLE     X X X X X  v,v     v       set on less than or equal
SLT     X X X X X  v,v     v       set on less than
SNE     X X X X X  v,v     v       set on not equal
STR     X X X X X  v,v     v       set on true
SUB     X X X X X  v,v     v       subtract
SWZ     - - - - X  v       v       extended swizzle
TEX     - - - X X  v       v       texture sample
TXB     - - - X X  v       v       texture sample with bias
TXD     - - - X X  v,v,v   v       texture sample w/partials
TXP     - - - X X  v       v       texture sample with projection
UP2H    - - - X X  s       v       unpack two 16-bit floats
UP2US   - - - X X  s       v       unpack two unsigned 16-bit scalars
UP4B    - - - X X  s       v       unpack four signed 8-bit scalars
UP4UB   - - - X X  s       v       unpack four unsigned 8-bit scalars
X2D     X X - X X  v,v,v   v       2D coordinate transformation
XPD     - - - - X  v,v     v       cross product
Table X.5: Summary of fragment program instructions. The columns
"R", "H", "X", "C", and "S" indicate whether the "R", "H", or "X"
precision modifiers, the C condition code update modifier, and the
"_SAT" saturation modifier, respectively, are supported for the
opcode. In the input/output columns, "v" indicates a floating-point
vector input or output, "s" indicates a floating-point scalar
input, "ssss" indicates a scalar output replicated across a
4-component result vector, "ss--" indicates two scalar outputs in
the first two components, and "c" indicates a condition code test.
Instructions described as "texture sample" also specify a texture
image unit identifier and a texture target.
Modify Section 3.11.4.1, Fragment Program Operands
(add prior to the discussion of negation) A component-wise absolute
value operation can optionally be performed on the operand if the operand
is surrounded with two "|" characters. For example, "|src|" indicates
that a component-wise absolute value operation should be performed on
the variable named "src". In terms of the grammar, this operation
is performed if the or grammar rules
match or , respectively.
(modify operand load pseudo-code) The following pseudo-code spells
out the operand generation process. In the example, "float" is a
floating-point scalar type, while "floatVec" is a four-component
vector. "source" refers to the register used for the operand,
matching the rule. "abs" is TRUE if an absolute value
operation should be performed on the operand ( or
rules). "negate" is TRUE if the rule
in or matches "-" and FALSE otherwise.
The ".c***", ".*c**", ".**c*", ".***c" modifiers refer to the x,
y, z, and w components obtained by the swizzle operation; the ".c"
modifier refers to the single component selected for a scalar load.
floatVec VectorLoad(floatVec source)
{
floatVec operand;
operand.x = source.c***;
operand.y = source.*c**;
operand.z = source.**c*;
operand.w = source.***c;
if (abs) {
operand.x = abs(operand.x);
operand.y = abs(operand.y);
operand.z = abs(operand.z);
operand.w = abs(operand.w);
}
if (negate) {
operand.x = -operand.x;
operand.y = -operand.y;
operand.z = -operand.z;
operand.w = -operand.w;
}
return operand;
}
float ScalarLoad(floatVec source)
{
float operand;
operand = source.c;
if (abs) {
operand = abs(operand);
}
if (negate) {
operand = -operand;
}
return operand;
}
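As a rough illustration, the operand-load pseudocode above can be rendered in C as follows. This is a sketch under assumptions: the floatVec struct, the encoding of a swizzle as an array of component indices, and the absFlag/negate parameters are hypothetical stand-ins for the state the grammar would provide. What it shows is the ordering: the absolute value is taken before the optional negation.

```c
#include <math.h>

/* Hypothetical four-component vector; v[0..3] hold x, y, z, w. */
typedef struct { float v[4]; } floatVec;

/* swizzle[i] names the source component (0 = x ... 3 = w) loaded into
 * operand component i.  The absolute value is applied first, then the
 * optional negation, as in VectorLoad(). */
static floatVec VectorLoad(floatVec source, const int swizzle[4],
                           int absFlag, int negate)
{
    floatVec operand;
    for (int i = 0; i < 4; i++) {
        float c = source.v[swizzle[i]];
        if (absFlag) c = fabsf(c);
        if (negate) c = -c;
        operand.v[i] = c;
    }
    return operand;
}

/* ScalarLoad() is the single-component case: select, abs, negate. */
static float ScalarLoad(floatVec source, int component, int absFlag,
                        int negate)
{
    float c = source.v[component];
    if (absFlag) c = fabsf(c);
    return negate ? -c : c;
}
```

For example, loading "-|src.wzyx|" from src = (1,-2,3,-4) yields (-4,-3,-2,-1).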
Add New Section 3.11.4.X, Fragment Program Operation Precision
(insert after Section 3.11.4.2, Fragment Program Parameter Arrays)
Fragment program implementations may be able to perform instructions
with different levels of arithmetic precision. The "R", "H", and
"X" opcode precision modifiers (Section 3.11.4) specify the minimum
precision used to perform arithmetic operations. Instructions with
an "R" precision modifier will be carried out at no less than
IEEE single-precision floating-point (8 bits of exponent, 23 bits
of mantissa). Instructions with an "H" precision modifier will
be carried out at no less than 16-bit floating-point precision (5
bits of exponent, 10 bits of mantissa). Instructions with an "X"
precision modifier will be carried out at no less than signed 12-bit
fixed-point precision (two's complement with 10 fraction bits).
If the result of a computation overflows the range of numbers
supported by the instruction precision, the result will be +/-INF
(infinity) for "R" and "H" precision, or -2048/1024 or +2047/1024 for
"X" precision.
If no precision modifier is specified, the instruction will be carried
out with at least as much precision as the destination variable.
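The "X" format can be modelled in C roughly as below. This is a minimal sketch that assumes round-to-nearest with ties away from zero; the text above fixes only the precision and the overflow values, not the rounding mode. The name QuantizeFx12 is hypothetical, and NaN inputs are outside the scope of this sketch.

```c
/* Quantize to a signed 12-bit two's complement value with 10 fraction
 * bits.  Representable range: -2048/1024 .. +2047/1024.
 * Assumption: round to nearest, ties away from zero.
 * NaN inputs are not handled by this sketch. */
static float QuantizeFx12(float value)
{
    double scaled = (double)value * 1024.0;

    /* Overflow saturates to the extreme representable values. */
    if (scaled <= -2048.0) return -2048.0f / 1024.0f;
    if (scaled >= 2047.0)  return 2047.0f / 1024.0f;

    long fixed = (scaled >= 0.0) ? (long)(scaled + 0.5)
                                 : -(long)(-scaled + 0.5);
    return (float)fixed / 1024.0f;
}
```

With this model, 0.3 becomes 307/1024 (about 0.2998), while 100.0 saturates to 2047/1024.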
Rewrite Section 3.11.4.3, Fragment Program Destination Register
Update
Most fragment program instructions write a 4-component result vector
to a single temporary or fragment result register. Writes to
individual components of the destination register are controlled
by individual component write masks specified as part of the
instruction.
The component write mask is specified by the rule
found in the rule. If the optional mask is "",
all components are enabled. Otherwise, the optional mask names
the individual components to enable. The characters "x", "y",
"z", and "w" match the x, y, z, and w components, respectively.
For example, an optional mask of ".xzw" indicates that the x, z,
and w components should be enabled for writing but the y component
should not. The grammar requires that the destination register mask
components must be listed in "xyzw" order.
The condition code write mask is specified by the rule found
in the rule. The condition code register is loaded and
swizzled according to the swizzle codes specified by .
Each component of the swizzled condition code is tested according to
the rule given by . may have the values
"EQ", "NE", "LT", "GE", "LE", or "GT", which mean to enable writes
if the corresponding condition code field evaluates to equal,
not equal, less than, greater than or equal, less than or equal,
or greater than, respectively. Comparisons involving condition
codes of "UN" (unordered) evaluate to true for "NE" and false
otherwise. For example, if the condition code is (GT,LT,EQ,GT)
and the condition code mask is "(NE.zyxw)", the swizzle operation
will load (EQ,LT,GT,GT), and the mask will thus enable writes on
the y, z, and w components. In addition, "TR" always enables writes
and "FL" always disables writes, regardless of the condition code.
If the condition code mask is empty, it is treated as "(TR)".
Each component of the destination register is updated with the result
of the fragment program instruction if and only if the component is
enabled for writes by both the component write mask and the condition
code write mask. Otherwise, the component of the destination register
remains unchanged.
A fragment program instruction can also optionally update the
condition code register. The condition code is updated if
the condition code register update suffix "C" is present in the
instruction. The instruction "ADDC" will update the condition code;
the otherwise equivalent instruction "ADD" will not. If condition
code updates are enabled, each component of the destination register
enabled for writes is compared to zero. The corresponding component
of the condition code is set to "LT", "EQ", or "GT", if the written
component is less than, equal to, or greater than zero, respectively.
Condition code components are set to "UN" if the written component is
NaN (not a number). Values of -0.0 and +0.0 both evaluate to "EQ".
If a component of the destination register is not enabled for writes,
the corresponding condition code component is also unchanged.
In the following example code,
# R1=(-2, 0, 2, NaN) R0 CC
MOVC R0, R1; # ( -2, 0, 2, NaN) (LT,EQ,GT,UN)
MOVC R0.xyz, R1.yzwx; # ( 0, 2, NaN, NaN) (EQ,GT,UN,UN)
MOVC R0 (NE), R1.zywx; # ( 0, 0, NaN, -2) (EQ,EQ,UN,LT)
the first instruction writes (-2,0,2,NaN) to R0 and updates the
condition code to (LT,EQ,GT,UN). In the second instruction, only the
"x", "y", and "z" components of R0 and the condition code are updated,
so R0 ends up with (0,2,NaN,NaN) and the condition code ends up with
(EQ,GT,UN,UN). In the third instruction, the condition code mask
disables writes to the x component (its condition code field is "EQ"),
so R0 ends up with (0,0,NaN,-2) and the condition code ends up with
(EQ,EQ,UN,LT).
The following pseudocode illustrates the process of writing a result
vector to the destination register. In the pseudocode, "instrMask"
refers to the component write mask given by the
rule. "ccMaskRule" refers to the condition code mask rule given
by and "updatecc" is TRUE if and only if condition code
updates are enabled. "result", "destination", and "cc" refer to
the result vector, the register selected by and the
condition code, respectively. Condition codes do not exist in the
VP1 execution environment.
boolean TestCC(CondCode field) {
switch (ccMaskRule) {
case "EQ": return (field == "EQ");
case "NE": return (field != "EQ");
case "LT": return (field == "LT");
case "GE": return (field == "GT" || field == "EQ");
case "LE": return (field == "LT" || field == "EQ");
case "GT": return (field == "GT");
case "TR": return TRUE;
case "FL": return FALSE;
case "": return TRUE;
}
}
CondCode GenerateCC(float value) {
if (isnan(value)) {
return UN;
} else if (value < 0) {
return LT;
} else if (value == 0) {
return EQ;
} else {
return GT;
}
}
void UpdateDestination(floatVec destination, floatVec result)
{
floatVec merged;
ccVec mergedCC;
// Merge the converted result into the destination register, under
// control of the compile- and run-time write masks.
merged = destination;
mergedCC = cc;
if (instrMask.x && TestCC(cc.c***)) {
merged.x = result.x;
if (updatecc) mergedCC.x = GenerateCC(result.x);
}
if (instrMask.y && TestCC(cc.*c**)) {
merged.y = result.y;
if (updatecc) mergedCC.y = GenerateCC(result.y);
}
if (instrMask.z && TestCC(cc.**c*)) {
merged.z = result.z;
if (updatecc) mergedCC.z = GenerateCC(result.z);
}
if (instrMask.w && TestCC(cc.***c)) {
merged.w = result.w;
if (updatecc) mergedCC.w = GenerateCC(result.w);
}
// Write out the new destination register and condition code.
destination = merged;
cc = mergedCC;
}
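For experimentation, the following C sketch implements TestCC, GenerateCC, and UpdateDestination and can replay the three MOVC instructions of the earlier example. The enum and struct names are hypothetical, the condition code swizzle is passed as an array of component indices, and an empty condition code mask is modelled by passing MASK_TR, as the text specifies.

```c
#include <math.h>
#include <stdbool.h>

/* Condition code values (Section 3.11.3.X). */
typedef enum { CC_GT, CC_EQ, CC_LT, CC_UN } CondCode;

/* Condition code mask rules; pass MASK_TR for an empty mask. */
typedef enum {
    MASK_EQ, MASK_NE, MASK_LT, MASK_GE, MASK_LE, MASK_GT, MASK_TR, MASK_FL
} CCMaskRule;

typedef struct { float v[4]; } floatVec;   /* x, y, z, w */
typedef struct { CondCode c[4]; } ccVec;   /* x, y, z, w */

static bool TestCC(CCMaskRule rule, CondCode field)
{
    switch (rule) {
    case MASK_EQ: return field == CC_EQ;
    case MASK_NE: return field != CC_EQ;   /* UN counts as "not equal" */
    case MASK_LT: return field == CC_LT;
    case MASK_GE: return field == CC_GT || field == CC_EQ;
    case MASK_LE: return field == CC_LT || field == CC_EQ;
    case MASK_GT: return field == CC_GT;
    case MASK_TR: return true;
    case MASK_FL: return false;
    }
    return true;
}

static CondCode GenerateCC(float value)
{
    if (isnan(value)) return CC_UN;
    if (value < 0.0f) return CC_LT;
    if (value == 0.0f) return CC_EQ;       /* both -0.0 and +0.0 */
    return CC_GT;
}

/* writeMask[i] is the component write mask; ccSwizzle[i] selects the
 * condition code component tested for destination component i. */
static void UpdateDestination(floatVec *dst, ccVec *cc, floatVec result,
                              const bool writeMask[4], CCMaskRule rule,
                              const int ccSwizzle[4], bool updateCC)
{
    floatVec merged = *dst;
    ccVec mergedCC = *cc;
    for (int i = 0; i < 4; i++) {
        /* Tests read the condition code as it was before this instruction. */
        if (writeMask[i] && TestCC(rule, cc->c[ccSwizzle[i]])) {
            merged.v[i] = result.v[i];
            if (updateCC) mergedCC.c[i] = GenerateCC(result.v[i]);
        }
    }
    *dst = merged;
    *cc = mergedCC;
}
```

Merging into copies and writing back at the end keeps every component test tied to the pre-instruction condition code, mirroring the pseudocode.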
Add to Section 3.11.4.5 of ARB_fragment_program (Fragment Program
Options):
Section 3.11.4.5.3, NV_fragment_program Option
If a fragment program specifies the "NV_fragment_program" option,
the grammar will be extended to support the features found in the
NV_fragment_program extension not present in the ARB_fragment_program
extension, including:
* the availability of the following instructions:
- DDX (partial derivative relative to X),
- DDY (partial derivative relative to Y),
- PK2H (pack as two half floats),
- PK2US (pack as two unsigned shorts),
- PK4B (pack as four signed bytes),
- PK4UB (pack as four unsigned bytes),
- RFL (reflection vector),
- SEQ (set on equal to),
- SFL (set on false),
- SGT (set on greater than),
- SLE (set on less than or equal to),
- SNE (set on not equal to),
- STR (set on true),
- TXD (texture lookup with computed partial derivatives),
- UP2H (unpack two half floats),
- UP2US (unpack two unsigned shorts),
- UP4B (unpack four signed bytes),
- UP4UB (unpack four unsigned bytes), and
- X2D (2D coordinate transformation),
* opcode precision suffixes "R", "H", and "X", to specify
the precision of arithmetic operations ("R" specifies 32-bit
floating-point computations, "H" specifies 16-bit floating-point
computations, and "X" specifies 12-bit signed fixed-point
computations with 10 fraction bits),
* the availability of the "SHORT" and "LONG" variable precision
keywords to control the size of a variable's components,
* a four-component condition code register to hold the sign of
result vector components (useful for comparisons),
* a condition code update opcode suffix "C", where the results of
the instruction are used to update the condition code register,
* a condition code write mask operator, where the condition code
register is swizzled and tested, and the test results are used
to mask register writes,
* an absolute value operator on scalar and swizzled source inputs.
The added functionality is identical to that provided by the
NV_fragment_program extension specification.
Modify Section 3.11.5, Fragment Program ALU Instruction Set
Section 3.11.5.30, DDX: Derivative Relative to X
The DDX instruction computes approximate partial derivatives of the
four components of the single operand with respect to the X window
coordinate to yield a result vector. The partial derivatives are
evaluated at the center of the pixel.
f = VectorLoad(op0);
result = ComputePartialX(f);
Note that the partial derivatives obtained by this instruction are
approximate, and derivative-of-derivative instruction sequences may
not yield accurate second derivatives.
Section 3.11.5.31, DDY: Derivative Relative to Y
The DDY instruction computes approximate partial derivatives of the
four components of the single operand with respect to the Y window
coordinate to yield a result vector. The partial derivatives are
evaluated at the center of the pixel.
f = VectorLoad(op0);
result = ComputePartialY(f);
Note that the partial derivatives obtained by this instruction are
approximate, and derivative-of-derivative instruction sequences may
not yield accurate second derivatives.
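The text leaves ComputePartialX and ComputePartialY unspecified. One common approach on GPUs, assumed here purely for illustration, is to shade pixels in 2x2 quads and difference neighbouring pixels within the quad. The Quad type and the ApproxDdx/ApproxDdy helpers are hypothetical and do not describe any particular implementation.

```c
/* Hypothetical 2x2 quad of an attribute f: f[row][col], where col
 * indexes horizontally adjacent pixels and row vertically adjacent ones. */
typedef struct { float f[2][2]; } Quad;

/* Assumed approximation: difference across the quad in X, shared by
 * both pixels of the given row. */
static float ApproxDdx(const Quad *q, int row)
{
    return q->f[row][1] - q->f[row][0];
}

/* Assumed approximation: difference across the quad in Y, shared by
 * both pixels of the given column. */
static float ApproxDdy(const Quad *q, int col)
{
    return q->f[1][col] - q->f[0][col];
}
```

Under this model the result is exact for attributes that vary linearly across the quad (f = 3x + 5y gives 3 and 5). Because both pixels of a row receive the same DDX value, applying DDX again yields zero, which illustrates why second derivatives obtained this way may be inaccurate.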
Section 3.11.5.32, PK2H: Pack Two 16-bit Floats
The PK2H instruction converts the "x" and "y" components of
the single operand into 16-bit floating-point format, packs the
bit representation of these two floats into a 32-bit value, and
replicates that value to all four components of the result vector.
The PK2H instruction can be reversed by the UP2H instruction below.
tmp0 = VectorLoad(op0);
/* result obtained by combining raw bits of tmp0.x, tmp0.y */
result.x = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
result.y = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
result.z = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
result.w = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
A fragment program will fail to load if it contains a PK2H instruction
that writes its results to a variable declared as "SHORT".
Section 3.11.5.33, PK2US: Pack Two Unsigned 16-bit Scalars
The PK2US instruction converts the "x" and "y" components of the
single operand into a packed pair of 16-bit unsigned scalars.
The scalars are represented in a bit pattern where all '0' bits
correspond to 0.0 and all '1' bits correspond to 1.0. The bit
representations of the two converted components are packed into a
32-bit value, and that value is replicated to all four components
of the result vector. The PK2US instruction can be reversed by the
UP2US instruction below.
tmp0 = VectorLoad(op0);
if (tmp0.x < 0.0) tmp0.x = 0.0;
if (tmp0.x > 1.0) tmp0.x = 1.0;
if (tmp0.y < 0.0) tmp0.y = 0.0;
if (tmp0.y > 1.0) tmp0.y = 1.0;
us.x = round(65535.0 * tmp0.x); /* us is a ushort vector */
us.y = round(65535.0 * tmp0.y);
/* result obtained by combining raw bits of us. */
result.x = ((us.x) | (us.y << 16));
result.y = ((us.x) | (us.y << 16));
result.z = ((us.x) | (us.y << 16));
result.w = ((us.x) | (us.y << 16));
A fragment program will fail to load if it contains a PK2US instruction
that writes its results to a variable declared as "SHORT".
Section 3.11.5.34, PK4B: Pack Four Signed 8-bit Scalars
The PK4B instruction converts the four components of the single
operand into 8-bit signed quantities. The signed quantities
are represented in a bit pattern where all '0' bits correspond
to -128/127 and all '1' bits correspond to +127/127. The bit
representations of the four converted components are packed into a
32-bit value, and that value is replicated to all four components
of the result vector. The PK4B instruction can be reversed by the
UP4B instruction below.
tmp0 = VectorLoad(op0);
if (tmp0.x < -128/127) tmp0.x = -128/127;
if (tmp0.y < -128/127) tmp0.y = -128/127;
if (tmp0.z < -128/127) tmp0.z = -128/127;
if (tmp0.w < -128/127) tmp0.w = -128/127;
if (tmp0.x > +127/127) tmp0.x = +127/127;
if (tmp0.y > +127/127) tmp0.y = +127/127;
if (tmp0.z > +127/127) tmp0.z = +127/127;
if (tmp0.w > +127/127) tmp0.w = +127/127;
ub.x = round(127.0 * tmp0.x + 128.0); /* ub is a ubyte vector */
ub.y = round(127.0 * tmp0.y + 128.0);
ub.z = round(127.0 * tmp0.z + 128.0);
ub.w = round(127.0 * tmp0.w + 128.0);
/* result obtained by combining raw bits of ub. */
result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
A fragment program will fail to load if it contains a PK4B instruction
that writes its results to a variable declared as "SHORT".
Section 3.11.5.35, PK4UB: Pack Four Unsigned 8-bit Scalars
The PK4UB instruction converts the four components of the single
operand into a packed grouping of 8-bit unsigned scalars. The scalars
are represented in a bit pattern where all '0' bits correspond to
0.0 and all '1' bits correspond to 1.0. The bit representations
of the four converted components are packed into a 32-bit value, and
that value is replicated to all four components of the result vector.
The PK4UB instruction can be reversed by the UP4UB instruction below.
tmp0 = VectorLoad(op0);
if (tmp0.x < 0.0) tmp0.x = 0.0;
if (tmp0.x > 1.0) tmp0.x = 1.0;
if (tmp0.y < 0.0) tmp0.y = 0.0;
if (tmp0.y > 1.0) tmp0.y = 1.0;
if (tmp0.z < 0.0) tmp0.z = 0.0;
if (tmp0.z > 1.0) tmp0.z = 1.0;
if (tmp0.w < 0.0) tmp0.w = 0.0;
if (tmp0.w > 1.0) tmp0.w = 1.0;
ub.x = round(255.0 * tmp0.x); /* ub is a ubyte vector */
ub.y = round(255.0 * tmp0.y);
ub.z = round(255.0 * tmp0.z);
ub.w = round(255.0 * tmp0.w);
/* result obtained by combining raw bits of ub. */
result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
A fragment program will fail to load if it contains a PK4UB
instruction that writes its results to a variable declared as
"SHORT".
Section 3.11.5.36, RFL: Reflection Vector
The RFL instruction computes the reflection of the second vector
operand (the "direction" vector) about the vector specified by the
first vector operand (the "axis" vector). Both operands are treated
as 3D vectors (the w components are ignored). The result vector is
another 3D vector (the "reflected direction" vector). The length
of the result vector, ignoring rounding errors, should equal that
of the second operand.
axis = VectorLoad(op0);
direction = VectorLoad(op1);
tmp.w = (axis.x * axis.x + axis.y * axis.y +
axis.z * axis.z);
tmp.x = (axis.x * direction.x + axis.y * direction.y +
axis.z * direction.z);
tmp.x = 2.0 * tmp.x;
tmp.x = tmp.x / tmp.w;
result.x = tmp.x * axis.x - direction.x;
result.y = tmp.x * axis.y - direction.y;
result.z = tmp.x * axis.z - direction.z;
A fragment program will fail to load if the w component of the result
is enabled in the component write mask.
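A direct C transcription of the RFL pseudocode, as a sketch; Vec3 and Rfl are hypothetical names. Because the dot product is divided by the squared length of the axis, the axis does not need to be normalized. A zero-length axis divides by zero and, under IEEE arithmetic, produces infinities or NaNs.

```c
/* Hypothetical 3D vector; RFL ignores the w components. */
typedef struct { float x, y, z; } Vec3;

/* Reflect dir about axis: 2 * dot(axis, dir) / dot(axis, axis) * axis - dir. */
static Vec3 Rfl(Vec3 axis, Vec3 dir)
{
    float axisLen2 = axis.x * axis.x + axis.y * axis.y + axis.z * axis.z;
    float scale = 2.0f * (axis.x * dir.x + axis.y * dir.y + axis.z * dir.z);
    scale = scale / axisLen2;
    Vec3 r = { scale * axis.x - dir.x,
               scale * axis.y - dir.y,
               scale * axis.z - dir.z };
    return r;
}
```

For instance, reflecting (1,0,1) about the unnormalized axis (0,0,2) gives (-1,0,1), which has the same length as the input direction.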
Section 3.11.5.37, SEQ: Set on Equal
The SEQ instruction performs a component-wise comparison of the
two operands. Each component of the result vector is 1.0 if the
corresponding component of the first operand is equal to that of
the second, and 0.0 otherwise.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x == tmp1.x) ? 1.0 : 0.0;
result.y = (tmp0.y == tmp1.y) ? 1.0 : 0.0;
result.z = (tmp0.z == tmp1.z) ? 1.0 : 0.0;
result.w = (tmp0.w == tmp1.w) ? 1.0 : 0.0;
Section 3.11.5.38, SFL: Set on False
The SFL instruction is a degenerate case of the other "Set on"
instructions that sets all components of the result vector to 0.0.
result.x = 0.0;
result.y = 0.0;
result.z = 0.0;
result.w = 0.0;
Section 3.11.5.39, SGT: Set on Greater Than
The SGT instruction performs a component-wise comparison of the
two operands. Each component of the result vector is 1.0 if the
corresponding component of the first operand is greater than that
of the second, and 0.0 otherwise.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x > tmp1.x) ? 1.0 : 0.0;
result.y = (tmp0.y > tmp1.y) ? 1.0 : 0.0;
result.z = (tmp0.z > tmp1.z) ? 1.0 : 0.0;
result.w = (tmp0.w > tmp1.w) ? 1.0 : 0.0;
Section 3.11.5.40, SLE: Set on Less Than or Equal
The SLE instruction performs a component-wise comparison of the
two operands. Each component of the result vector is 1.0 if the
corresponding component of the first operand is less than or equal
to that of the second, and 0.0 otherwise.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x <= tmp1.x) ? 1.0 : 0.0;
result.y = (tmp0.y <= tmp1.y) ? 1.0 : 0.0;
result.z = (tmp0.z <= tmp1.z) ? 1.0 : 0.0;
result.w = (tmp0.w <= tmp1.w) ? 1.0 : 0.0;
Section 3.11.5.41, SNE: Set on Not Equal
The SNE instruction performs a component-wise comparison of the
two operands. Each component of the result vector is 1.0 if the
corresponding component of the first operand is not equal to that
of the second, and 0.0 otherwise.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
result.x = (tmp0.x != tmp1.x) ? 1.0 : 0.0;
result.y = (tmp0.y != tmp1.y) ? 1.0 : 0.0;
result.z = (tmp0.z != tmp1.z) ? 1.0 : 0.0;
result.w = (tmp0.w != tmp1.w) ? 1.0 : 0.0;
Section 3.11.5.42, STR: Set on True
The STR instruction is a degenerate case of the other "Set on"
instructions that sets all components of the result vector to 1.0.
result.x = 1.0;
result.y = 1.0;
result.z = 1.0;
result.w = 1.0;
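With SEQ, SNE, SGT, and SLE added to the SGE and SLT instructions of ARB_fragment_program, plus the degenerate SFL and STR, every relation is available, so the whole "set on" family can be sketched as one C function. SetOp and SetOn are hypothetical names. The sketch uses C's comparison operators, so a NaN operand makes the SNE case return 1.0 and every other comparison 0.0; the pseudocode above does not spell out NaN handling, so treat that behavior as an assumption.

```c
#include <stdbool.h>

/* Hypothetical selector for the "set on" family. */
typedef enum { SET_EQ, SET_NE, SET_GT, SET_GE, SET_LT, SET_LE,
               SET_TR, SET_FL } SetOp;

/* Component-wise: result[i] = 1.0 if "a[i] op b[i]" holds, else 0.0. */
static void SetOn(SetOp op, const float a[4], const float b[4],
                  float result[4])
{
    for (int i = 0; i < 4; i++) {
        bool r = false;
        switch (op) {
        case SET_EQ: r = a[i] == b[i]; break;   /* SEQ */
        case SET_NE: r = a[i] != b[i]; break;   /* SNE */
        case SET_GT: r = a[i] >  b[i]; break;   /* SGT */
        case SET_GE: r = a[i] >= b[i]; break;   /* SGE */
        case SET_LT: r = a[i] <  b[i]; break;   /* SLT */
        case SET_LE: r = a[i] <= b[i]; break;   /* SLE */
        case SET_TR: r = true;         break;   /* STR */
        case SET_FL: r = false;        break;   /* SFL */
        }
        result[i] = r ? 1.0f : 0.0f;
    }
}
```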
Section 3.11.5.43, UP2H: Unpack Two 16-Bit Floats
The UP2H instruction unpacks two 16-bit floats stored together in
a 32-bit scalar operand. The first 16-bit float (stored in the 16
least significant bits) is written into the "x" and "z" components
of the result vector; the second is written into the "y" and "w"
components of the result vector.
This operation undoes the type conversion and packing performed by
the PK2H instruction.
tmp = ScalarLoad(op0);
result.x = (fp16) (RawBits(tmp) & 0xFFFF);
result.y = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);
result.z = (fp16) (RawBits(tmp) & 0xFFFF);
result.w = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);
A fragment program will fail to load if it contains a UP2H instruction
whose operand is a variable declared as "SHORT".
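A sketch of UP2H in C. RawBits is modelled with memcpy, and HalfToFloat decodes the 16-bit floating-point layout described earlier (1 sign bit, 5 exponent bits, 10 mantissa bits), assuming IEEE 754 binary16 conventions: exponent bias 15, subnormals for a zero exponent, and infinity/NaN for an all-ones exponent. All function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret bits between float and a 32-bit integer. */
static uint32_t RawBits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float FromBits(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Decode one 16-bit float (assumed binary16 layout). */
static float HalfToFloat(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;

    if (exp == 0) {
        /* Zero or subnormal: mant * 2^-24, exact in single precision. */
        float mag = (float)mant * (1.0f / 16777216.0f);
        return sign ? -mag : mag;
    }
    if (exp == 0x1F)                        /* infinity or NaN */
        return FromBits(sign | 0x7F800000u | (mant << 13));

    /* Normal: rebias exponent from 15 to 127, widen mantissa to 23 bits. */
    return FromBits(sign | ((exp + (127 - 15)) << 23) | (mant << 13));
}

/* UP2H: the low 16 bits go to x and z, the high 16 bits to y and w. */
static void Up2h(float packed, float result[4])
{
    uint32_t bits = RawBits(packed);
    result[0] = result[2] = HalfToFloat((uint16_t)(bits & 0xFFFFu));
    result[1] = result[3] = HalfToFloat((uint16_t)(bits >> 16));
}
```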
Section 3.11.5.44, UP2US: Unpack Two Unsigned 16-Bit Scalars
The UP2US instruction unpacks two 16-bit unsigned values packed
together in a 32-bit scalar operand. The unsigned quantities are
encoded where a bit pattern of all '0' bits corresponds to 0.0 and
a pattern of all '1' bits corresponds to 1.0. The "x" and "z"
components of the result vector are obtained from the 16 least
significant bits of the operand; the "y" and "w" components are
obtained from the 16 most significant bits.
This operation undoes the type conversion and packing performed by
the PK2US instruction.
tmp = ScalarLoad(op0);
result.x = ((RawBits(tmp) >> 0) & 0xFFFF) / 65535.0;
result.y = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;
result.z = ((RawBits(tmp) >> 0) & 0xFFFF) / 65535.0;
result.w = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;
A fragment program will fail to load if it contains a UP2US instruction
whose operand is a variable declared as "SHORT".
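PK2US and UP2US can be sketched together as a round trip in C. RawBits and FromBits reinterpret bits with memcpy. Rounding adds 0.5 to the non-negative scaled value (round half up); this tie rule is an assumption, since the pseudocode's round() does not specify one. The function names are hypothetical, and NaN inputs are not handled.

```c
#include <stdint.h>
#include <string.h>

static uint32_t RawBits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float FromBits(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

static float Clamp01(float x) { return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x); }

/* PK2US: clamp x and y to [0,1], scale to 0..65535, round, and pack
 * with x in the low 16 bits. */
static float Pk2us(float x, float y)
{
    uint32_t ux = (uint32_t)(65535.0f * Clamp01(x) + 0.5f);
    uint32_t uy = (uint32_t)(65535.0f * Clamp01(y) + 0.5f);
    return FromBits(ux | (uy << 16));
}

/* UP2US: low 16 bits -> x and z, high 16 bits -> y and w. */
static void Up2us(float packed, float result[4])
{
    uint32_t bits = RawBits(packed);
    result[0] = result[2] = (float)(bits & 0xFFFFu) / 65535.0f;
    result[1] = result[3] = (float)(bits >> 16) / 65535.0f;
}
```

After clamping, unpacking recovers each input to within half a quantization step (0.5/65535).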
Section 3.11.5.45, UP4B: Unpack Four Signed 8-Bit Values
The UP4B instruction unpacks four 8-bit signed values packed together
in a 32-bit scalar operand. The signed quantities are encoded where
a bit pattern of all '0' bits corresponds to -128/127 and a pattern
of all '1' bits corresponds to +127/127. The "x" component of the
result vector is the converted value corresponding to the 8 least
significant bits of the operand; the "w" component corresponds to
the 8 most significant bits.
This operation undoes the type conversion and packing performed by
the PK4B instruction.
tmp = ScalarLoad(op0);
result.x = (((RawBits(tmp) >> 0) & 0xFF) - 128) / 127.0;
result.y = (((RawBits(tmp) >> 8) & 0xFF) - 128) / 127.0;
result.z = (((RawBits(tmp) >> 16) & 0xFF) - 128) / 127.0;
result.w = (((RawBits(tmp) >> 24) & 0xFF) - 128) / 127.0;
A fragment program will fail to load if it contains a UP4B instruction
whose operand is a variable declared as "SHORT".
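The biased signed encoding shared by PK4B and UP4B can be sketched in C as follows. As in the pseudocode, 127*x + 128 maps the clamped range [-128/127, +127/127] onto 0..255. Adding 0.5 before truncation implements round half up, an assumed tie rule. Pk4b and Up4b are hypothetical names, and NaN inputs are not handled.

```c
#include <stdint.h>
#include <string.h>

static uint32_t RawBits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float FromBits(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* PK4B: clamp, bias to 0..255, and pack with x in the low byte. */
static float Pk4b(const float v[4])
{
    const float lo = -128.0f / 127.0f, hi = 1.0f;
    uint32_t bits = 0;
    for (int i = 0; i < 4; i++) {
        float c = v[i] < lo ? lo : (v[i] > hi ? hi : v[i]);
        uint32_t ub = (uint32_t)(127.0f * c + 128.0f + 0.5f);
        bits |= ub << (8 * i);
    }
    return FromBits(bits);
}

/* UP4B: remove the bias; x comes from the low byte, w from the high byte. */
static void Up4b(float packed, float result[4])
{
    uint32_t bits = RawBits(packed);
    for (int i = 0; i < 4; i++)
        result[i] = ((float)((bits >> (8 * i)) & 0xFFu) - 128.0f) / 127.0f;
}
```

Note that 0.0 encodes as 0x80 and -1.0 as 0x01, so the all-zero byte is reserved for -128/127.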
Section 3.11.5.46, UP4UB: Unpack Four Unsigned 8-Bit Scalars
The UP4UB instruction unpacks four 8-bit unsigned values packed
together in a 32-bit scalar operand. The unsigned quantities are
encoded where a bit pattern of all '0' bits corresponds to 0.0 and a
pattern of all '1' bits corresponds to 1.0. The "x" component of the
result vector is obtained from the 8 least significant bits of the
operand; the "w" component is obtained from the 8 most significant
bits.
This operation undoes the type conversion and packing performed by
the PK4UB instruction.
tmp = ScalarLoad(op0);
result.x = ((RawBits(tmp) >> 0) & 0xFF) / 255.0;
result.y = ((RawBits(tmp) >> 8) & 0xFF) / 255.0;
result.z = ((RawBits(tmp) >> 16) & 0xFF) / 255.0;
result.w = ((RawBits(tmp) >> 24) & 0xFF) / 255.0;
A fragment program will fail to load if it contains a UP4UB
instruction whose operand is a variable declared as "SHORT".
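The UP4UB pseudocode can be sketched in C the same way. As before, vec4, raw_bits(), unsigned_byte(), and up4ub() are hypothetical names, and a 32-bit IEEE float operand is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-component vector type, used only by this sketch. */
typedef struct { float x, y, z, w; } vec4;

/* Model of RawBits(): reinterpret the float's 32 bits as an integer. */
static uint32_t raw_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Map one byte to [0.0, 1.0]: 0x00 -> 0.0, 0xFF -> 1.0. */
static float unsigned_byte(uint32_t bits, unsigned shift)
{
    return (float)((bits >> shift) & 0xFFu) / 255.0f;
}

/* UP4UB: x comes from the least significant byte, w from the most. */
static vec4 up4ub(float op0)
{
    uint32_t bits = raw_bits(op0);
    vec4 r = { unsigned_byte(bits, 0), unsigned_byte(bits, 8),
               unsigned_byte(bits, 16), unsigned_byte(bits, 24) };
    return r;
}
```

With raw bits 0x3F80FF00, the bytes 0x00, 0xFF, 0x80, and 0x3F unpack to (0.0, 1.0, 128/255, 63/255).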
Section 3.11.5.47, X2D: 2D Coordinate Transformation
The X2D instruction multiplies the 2D offset vector specified by the
"x" and "y" components of the second vector operand by the 2x2 matrix
specified by the four components of the third vector operand, and adds
the transformed offset vector to the 2D vector specified by the "x"
and "y" components of the first vector operand. The first component
of the sum is written to the "x" and "z" components of the result;
the second component is written to the "y" and "w" components of
the result.
tmp0 = VectorLoad(op0);
tmp1 = VectorLoad(op1);
tmp2 = VectorLoad(op2);
result.x = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;
result.y = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;
result.z = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;
result.w = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;
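Read as matrix arithmetic, the X2D pseudocode computes op0.xy + M * op1.xy, where M is the 2x2 matrix whose first row is (op2.x, op2.y) and whose second row is (op2.z, op2.w). The C sketch below follows that reading; vec4 and x2d() are hypothetical names for this illustration.

```c
/* Hypothetical 4-component vector type, used only by this sketch. */
typedef struct { float x, y, z, w; } vec4;

/* X2D: offset b.xy is transformed by the matrix
       | m.x  m.y |
       | m.z  m.w |
   and added to a.xy. The 2D result is replicated into z and w. */
static vec4 x2d(vec4 a, vec4 b, vec4 m)
{
    float u = a.x + b.x * m.x + b.y * m.y;
    float v = a.y + b.x * m.z + b.y * m.w;
    vec4 r = { u, v, u, v };
    return r;
}
```

For example, with op0 = (1, 2, 0, 0), op1 = (3, 4, 0, 0), and op2 = (0, -1, 1, 0), a 90-degree counterclockwise rotation, the offset (3, 4) becomes (-4, 3) and the result is (-3, 5, -3, 5).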
Modify Section 3.11.6.4, KIL: Kill Fragment
Rather than mapping a coordinate set to a color, this function
prevents a fragment from receiving any future processing. If any
component of its source vector is negative, the processing of this
fragment will be discontinued and no further outputs to this fragment
will occur. Subsequent stages of the GL pipeline will be skipped
for this fragment.
A KIL instruction may be specified using either a vector operand
or a condition code test. If a vector operand is specified, the
following is performed:
tmp = VectorLoad(op0);
if ((tmp.x < 0) || (tmp.y < 0) ||
(tmp.z < 0) || (tmp.w < 0))
{
exit;
}
If a condition code is specified, the following is performed:
if (TestCC(rc.c***) || TestCC(rc.*c**) ||
TestCC(rc.**c*) || TestCC(rc.***c))
{
exit;
}
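The vector-operand form of KIL reduces to a single predicate, sketched in C below; vec4 and kil_discards() are hypothetical names. This sketch relies on C's IEEE comparisons, under which -0.0 and NaN do not compare less than zero and therefore do not discard the fragment; the pseudocode above does not itself spell out how those values are treated.

```c
#include <stdbool.h>

/* Hypothetical 4-component vector type, used only by this sketch. */
typedef struct { float x, y, z, w; } vec4;

/* True when KIL with a vector operand would discard the fragment,
   i.e. when any component is strictly less than zero. */
static bool kil_discards(vec4 t)
{
    return t.x < 0.0f || t.y < 0.0f || t.z < 0.0f || t.w < 0.0f;
}
```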
Add Section 3.11.6.5, TXD: Texture Lookup with Derivatives
The TXD instruction takes the first three components of its first
vector operand and maps them to s, t, and r. These coordinates are
used to sample from the specified texture target on the specified
texture image unit in a manner consistent with its parameters.
The level of detail is computed as specified in section 3.8.
In this calculation, ds/dx, dt/dx, and dr/dx are given by the x,
y, and z components, respectively, of the second vector operand.
ds/dy, dt/dy, and dr/dy are given by the x, y, and z components of
the third vector operand.
The resulting sample is mapped to RGBA as described in table 3.21
and written to the result vector.
tmp = VectorLoad(op0);
ddx = VectorLoad(op1);
ddy = VectorLoad(op2);
result = TextureSample(tmp.x, tmp.y, tmp.z, 0.0, ddx, ddy);
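For reference, substituting the TXD operands into the section 3.8 level-of-detail computation gives the relations below. Here 2^n, 2^m, and 2^l are the texture width, height, and depth; for a 2D target the w terms are zero, and any clamping the GL applies to lambda afterward is omitted.

```latex
\frac{\partial u}{\partial x} = 2^{n}\,\frac{\partial s}{\partial x},\quad
\frac{\partial v}{\partial x} = 2^{m}\,\frac{\partial t}{\partial x},\quad
\frac{\partial w}{\partial x} = 2^{l}\,\frac{\partial r}{\partial x}
\qquad (\text{likewise for } y)

\rho = \max\left\{
  \sqrt{\left(\tfrac{\partial u}{\partial x}\right)^{2}
      + \left(\tfrac{\partial v}{\partial x}\right)^{2}
      + \left(\tfrac{\partial w}{\partial x}\right)^{2}},\;
  \sqrt{\left(\tfrac{\partial u}{\partial y}\right)^{2}
      + \left(\tfrac{\partial v}{\partial y}\right)^{2}
      + \left(\tfrac{\partial w}{\partial y}\right)^{2}}
\right\},
\qquad \lambda = \log_{2}\rho
```

As a worked example, for a 256x256 2D texture (n = m = 8) with a second operand of (4/256, 0, 0) and a third operand of (0, 1/256, 0), du/dx = 4 and dv/dy = 1, so rho = max{4, 1} = 4 and lambda = 2, which selects around mipmap level 2 (64x64).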
Additions to Chapter 4 of the OpenGL 1.2.1 Specification (Per-Fragment
Operations and the Frame Buffer)
None.
Additions to Chapter 5 of the OpenGL 1.2.1 Specification (Special
Functions)
None.
Additions to Chapter 6 of the OpenGL 1.2.1 Specification (State and
State Requests)
None.
Additions to Appendix A of the OpenGL 1.2.1 Specification (Invariance)
None.
Additions to the AGL/GLX/WGL Specifications
None.
Dependencies on ARB_fragment_program
This specification is based on a modified version of the grammar
published in the ARB_fragment_program specification. This modified
grammar (see below) includes a few structural changes to better
accommodate new functionality from this and other extensions,
but should be functionally equivalent to the ARB_fragment_program
grammar.
::= "END"
::=