Name
INTEL_shader_integer_functions2
Name Strings
GL_INTEL_shader_integer_functions2
Contact
Ian Romanick
Contributors
Status
In progress
Version
Last Modification Date: 11/25/2019
Revision: 5
Number
OpenGL Extension #547
OpenGL ES Extension #323
Dependencies
This extension is written against the OpenGL 4.6 (Core Profile)
Specification.
This extension is written against Version 4.60 (Revision 03) of the OpenGL
Shading Language Specification.
GLSL 1.30 (OpenGL), GLSL ES 3.00 (OpenGL ES), or EXT_gpu_shader4 (OpenGL)
is required.
This extension interacts with ARB_gpu_shader_int64.
This extension interacts with AMD_gpu_shader_int16.
This extension interacts with OpenGL 4.6 and ARB_gl_spirv.
This extension interacts with EXT_shader_explicit_arithmetic_types.
Overview
OpenCL and other GPU programming environments provide a number of useful
functions operating on integer data. Many of these functions are
supported by specialized instructions on various GPUs. Correct GLSL
implementations for some of these functions are non-trivial. Recognizing
open-coded versions of these functions is often impractical. As a result,
potential performance improvements go unrealized.
This extension makes available a number of functions that have specialized
instruction support on Intel GPUs.
New Procedures and Functions
None
New Tokens
None
IP Status
No known IP claims.
Modifications to the OpenGL Shading Language Specification, Version 4.60
Including the following line in a shader can be used to control the
language features described in this extension:
#extension GL_INTEL_shader_integer_functions2 : <behavior>
where <behavior> is as specified in section 3.3.
New preprocessor #defines are added to the OpenGL Shading Language:
#define GL_INTEL_shader_integer_functions2 1
Additions to Chapter 8 of the OpenGL Shading Language Specification
(Built-in Functions)
Modify Section 8.8, Integer Functions
(add new rows after the existing "findMSB" table row, p. 161)
genUType countLeadingZeros(genUType value)
Returns the number of leading 0-bits, starting at the most significant bit,
in the binary representation of value. If value is zero, the size in bits
of the type of value or component type of value (if value is a vector) will
be returned.
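As a non-normative illustration, the behavior described above can be modeled on the host for a single 32-bit component. This is a minimal sketch in C, not the extension's implementation; the name count_leading_zeros32 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical reference model of countLeadingZeros for one 32-bit
 * component. Not part of the extension. */
uint32_t count_leading_zeros32(uint32_t value)
{
    uint32_t count = 0;

    /* Walk down from the most significant bit until a 1-bit is found
     * or all 32 bits have been examined. */
    for (uint32_t mask = UINT32_C(1) << 31;
         mask != 0 && (value & mask) == 0;
         mask >>= 1)
        count++;

    return count; /* 32 when value is zero */
}
```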
genUType countTrailingZeros(genUType value)
Returns the number of trailing 0-bits, starting at the least significant bit,
in the binary representation of value. If value is zero, the size in bits
of the type of value or component type of value (if value is a vector) will
be returned.
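A matching non-normative host-side sketch for the trailing count, again for a single 32-bit component. The name count_trailing_zeros32 is hypothetical and not part of this extension.

```c
#include <stdint.h>

/* Hypothetical reference model of countTrailingZeros for one 32-bit
 * component. Not part of the extension. */
uint32_t count_trailing_zeros32(uint32_t value)
{
    uint32_t count = 0;

    /* Walk up from the least significant bit until a 1-bit is found
     * or the mask has been shifted out of the 32-bit word. */
    for (uint32_t mask = 1;
         mask != 0 && (value & mask) == 0;
         mask <<= 1)
        count++;

    return count; /* 32 when value is zero */
}
```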
genUType absoluteDifference(genUType x, genUType y)
genUType absoluteDifference(genIType x, genIType y)
genU64Type absoluteDifference(genU64Type x, genU64Type y)
genU64Type absoluteDifference(genI64Type x, genI64Type y)
genU16Type absoluteDifference(genU16Type x, genU16Type y)
genU16Type absoluteDifference(genI16Type x, genI16Type y)
Returns |x - y| clamped to the range of the return type (instead of modulo
overflowing). Note: the return type of each of these functions is an
unsigned type of the same bit-size and vector element count.
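The unsigned return type matters most for the signed overloads, where the true difference can exceed the signed range. The sketch below is a non-normative C model for scalar 32-bit operands; the function names are hypothetical. It assumes that, for these widths, the mathematical result always fits the unsigned return type, so no explicit clamp is needed.

```c
#include <stdint.h>

/* Hypothetical model of absoluteDifference(uint, uint). */
uint32_t absolute_difference_u32(uint32_t x, uint32_t y)
{
    /* Always subtract the smaller value from the larger one. */
    return x > y ? x - y : y - x;
}

/* Hypothetical model of absoluteDifference(int, int). The result is
 * unsigned because |x - y| can be as large as 2^32 - 1. */
uint32_t absolute_difference_i32(int32_t x, int32_t y)
{
    /* Subtracting in unsigned arithmetic avoids signed overflow; with
     * the larger operand first, the wrapped result is the true
     * difference. */
    return x > y ? (uint32_t)x - (uint32_t)y
                 : (uint32_t)y - (uint32_t)x;
}
```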
genUType addSaturate(genUType x, genUType y)
genIType addSaturate(genIType x, genIType y)
genU64Type addSaturate(genU64Type x, genU64Type y)
genI64Type addSaturate(genI64Type x, genI64Type y)
genU16Type addSaturate(genU16Type x, genU16Type y)
genI16Type addSaturate(genI16Type x, genI16Type y)
Returns x + y clamped to the range of the type of x (instead of modulo
overflowing).
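To make the clamping concrete, here is a non-normative C sketch for scalar 32-bit operands. The function names are hypothetical, and the signed model uses a wider intermediate purely for clarity; hardware is not expected to work this way.

```c
#include <stdint.h>

/* Hypothetical model of addSaturate(uint, uint). */
uint32_t add_saturate_u32(uint32_t x, uint32_t y)
{
    uint32_t sum = x + y;               /* wraps modulo 2^32 */

    /* A wrapped sum is smaller than either operand. */
    return sum < x ? UINT32_MAX : sum;
}

/* Hypothetical model of addSaturate(int, int). */
int32_t add_saturate_i32(int32_t x, int32_t y)
{
    int64_t sum = (int64_t)x + (int64_t)y;  /* cannot overflow */

    if (sum > INT32_MAX)
        return INT32_MAX;
    if (sum < INT32_MIN)
        return INT32_MIN;
    return (int32_t)sum;
}
```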
genUType average(genUType x, genUType y)
genIType average(genIType x, genIType y)
genU64Type average(genU64Type x, genU64Type y)
genI64Type average(genI64Type x, genI64Type y)
genU16Type average(genU16Type x, genU16Type y)
genI16Type average(genI16Type x, genI16Type y)
Returns (x+y) >> 1. The intermediate sum does not modulo overflow.
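The requirement that the intermediate sum not overflow can be modeled by widening before the shift. The following non-normative C sketch covers scalar 32-bit operands; the function names are hypothetical. The signed model assumes the shift rounds toward negative infinity, as a GLSL right shift of a signed value does.

```c
#include <stdint.h>

/* Hypothetical model of average(uint, uint). */
uint32_t average_u32(uint32_t x, uint32_t y)
{
    /* The 64-bit sum holds all 33 bits of x + y. */
    return (uint32_t)(((uint64_t)x + (uint64_t)y) >> 1);
}

/* Hypothetical model of average(int, int). */
int32_t average_i32(int32_t x, int32_t y)
{
    int64_t sum = (int64_t)x + (int64_t)y;

    /* Clearing the low bit first makes the division exact, so the
     * result rounds toward negative infinity like an arithmetic
     * shift, without relying on C's implementation-defined >> for
     * negative values. */
    return (int32_t)((sum - (sum & 1)) / 2);
}
```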
genUType averageRounded(genUType x, genUType y)
genIType averageRounded(genIType x, genIType y)
genU64Type averageRounded(genU64Type x, genU64Type y)
genI64Type averageRounded(genI64Type x, genI64Type y)
genU16Type averageRounded(genU16Type x, genU16Type y)
genI16Type averageRounded(genI16Type x, genI16Type y)
Returns (x+y+1) >> 1. The intermediate sum does not modulo overflow.
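The rounded average can also be computed without a wider intermediate. The non-normative C sketch below uses the identity (x | y) - ((x ^ y) >> 1) for unsigned operands; this is a well-known bit trick chosen for illustration, not something this extension prescribes, and the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of averageRounded(uint, uint) that never forms
 * the 33-bit sum x + y + 1.
 *
 * Since x + y == (x ^ y) + 2 * (x & y) and x | y == (x & y) + (x ^ y),
 * rounding (x + y) / 2 upward gives (x | y) - ((x ^ y) >> 1). */
uint32_t average_rounded_u32(uint32_t x, uint32_t y)
{
    return (x | y) - ((x ^ y) >> 1);
}
```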
genUType subtractSaturate(genUType x, genUType y)
genIType subtractSaturate(genIType x, genIType y)
genU64Type subtractSaturate(genU64Type x, genU64Type y)
genI64Type subtractSaturate(genI64Type x, genI64Type y)
genU16Type subtractSaturate(genU16Type x, genU16Type y)
genI16Type subtractSaturate(genI16Type x, genI16Type y)
Returns x - y clamped to the range of the type of x (instead of modulo
overflowing).
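A non-normative C sketch of the saturating subtraction for scalar 32-bit operands follows. The function names are hypothetical, and the wider intermediate in the signed model is used only for clarity.

```c
#include <stdint.h>

/* Hypothetical model of subtractSaturate(uint, uint). */
uint32_t subtract_saturate_u32(uint32_t x, uint32_t y)
{
    /* An unsigned difference can only underflow, so clamp at zero. */
    return x > y ? x - y : 0u;
}

/* Hypothetical model of subtractSaturate(int, int). */
int32_t subtract_saturate_i32(int32_t x, int32_t y)
{
    int64_t diff = (int64_t)x - (int64_t)y;  /* cannot overflow */

    if (diff > INT32_MAX)
        return INT32_MAX;
    if (diff < INT32_MIN)
        return INT32_MIN;
    return (int32_t)diff;
}
```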
genUType multiply32x16(genUType x_32_bits, genUType y_16_bits)
genIType multiply32x16(genIType x_32_bits, genIType y_16_bits)
genUType multiply32x16(genUType x_32_bits, genU16Type y_16_bits)
genIType multiply32x16(genIType x_32_bits, genI16Type y_16_bits)
Returns x * y, where only the (possibly sign-extended) low 16-bits of y
are used. In cases where one of the signed operands is known to be in the
range [-2^15, (2^15)-1] or one of the unsigned operands is known to be in
the range [0, (2^16)-1], this may provide a higher performance multiply.
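The treatment of the y operand can be modeled as follows. This non-normative C sketch covers the overloads that take a 32-bit y; the function names are hypothetical. It assumes the product wraps modulo 2^32 like an ordinary GLSL integer multiply, which the text above does not state explicitly.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of multiply32x16(uint, uint): only the low 16
 * bits of y take part in the product. */
uint32_t multiply32x16_u32(uint32_t x, uint32_t y)
{
    return x * (y & 0xFFFFu);   /* assumed to wrap modulo 2^32 */
}

/* Hypothetical model of multiply32x16(int, int): the low 16 bits of y
 * are sign-extended before multiplying. */
int32_t multiply32x16_i32(int32_t x, int32_t y)
{
    int32_t lo = y & 0xFFFF;    /* low 16 bits, 0..65535 */
    if (lo & 0x8000)
        lo -= 0x10000;          /* sign-extend to -32768..32767 */

    /* Multiply in unsigned arithmetic so that wraparound is well
     * defined in C, then reinterpret the bits as a signed value. */
    uint32_t bits = (uint32_t)x * (uint32_t)lo;
    int32_t result;
    memcpy(&result, &bits, sizeof result);
    return result;
}
```

Note that in this model a signed y of 0xFFFF is treated as -1, because bits above the low 16 are discarded before sign extension.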
Interactions with OpenGL 4.6 and ARB_gl_spirv
If OpenGL 4.6 or ARB_gl_spirv is supported, then
SPV_INTEL_shader_integer_functions2 must also be supported.
The IntegerFunctions2INTEL capability is available whenever the
implementation supports INTEL_shader_integer_functions2.
Interactions with ARB_gpu_shader_int64 and EXT_shader_explicit_arithmetic_types_int64
If the shader enables only INTEL_shader_integer_functions2 but not
ARB_gpu_shader_int64 or EXT_shader_explicit_arithmetic_types_int64,
remove all function overloads that have either genU64Type or genI64Type
parameters.
Interactions with AMD_gpu_shader_int16 and EXT_shader_explicit_arithmetic_types_int16
If the shader enables only INTEL_shader_integer_functions2 but not
AMD_gpu_shader_int16 or EXT_shader_explicit_arithmetic_types_int16,
remove all function overloads that have either genU16Type or genI16Type
parameters.
Issues
1) What should this extension be called?
RESOLVED: There already exists a MESA_shader_integer_functions extension,
so this is called INTEL_shader_integer_functions2 to prevent confusion.
2) How does countLeadingZeros differ from findMSB?
RESOLVED: countLeadingZeros is only defined for unsigned types, and it is
equivalent to 32-(findMSB(x)+1). This corresponds to the clz() function in
OpenCL and the LZD (leading zero detection) instruction on Intel GPUs.
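The stated equivalence can be checked with a small non-normative host-side model. The C sketch below assumes the GLSL definition of findMSB for unsigned values, which returns -1 when the input is zero; both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of GLSL findMSB(uint): index of the most
 * significant 1-bit, or -1 if value is zero. */
int32_t find_msb_u32(uint32_t value)
{
    int32_t bit = -1;

    for (int32_t i = 0; i < 32; i++)
        if (value & (UINT32_C(1) << i))
            bit = i;            /* remember the highest set bit */

    return bit;
}

/* countLeadingZeros expressed through findMSB, as in issue 2. */
uint32_t clz_via_find_msb(uint32_t value)
{
    return (uint32_t)(32 - (find_msb_u32(value) + 1));
}
```

For a zero input, findMSB yields -1, so the expression evaluates to 32, matching the zero case in the function description.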
3) How does countTrailingZeros differ from findLSB?
RESOLVED: countTrailingZeros is equivalent to min(genUType(findLSB(x)),
32). This corresponds to the ctz() function in OpenCL.
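The min() in this equivalence handles the zero input. A non-normative C sketch, assuming the GLSL definition of findLSB, which returns -1 when the input is zero; both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of GLSL findLSB(uint): index of the least
 * significant 1-bit, or -1 if value is zero. */
int32_t find_lsb_u32(uint32_t value)
{
    for (int32_t i = 0; i < 32; i++)
        if (value & (UINT32_C(1) << i))
            return i;

    return -1;
}

/* countTrailingZeros expressed through findLSB, as in issue 3. */
uint32_t ctz_via_find_lsb(uint32_t value)
{
    /* Converting -1 to unsigned gives 0xFFFFFFFF, which the min()
     * then clamps to 32. */
    uint32_t lsb = (uint32_t)find_lsb_u32(value);

    return lsb < 32u ? lsb : 32u;
}
```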
4) Should 64-bit versions of countLeadingZeros and countTrailingZeros be
provided?
RESOLVED: NO. OpenCL has 64-bit versions of clz() and ctz(), but OpenGL
does not have 64-bit versions of findMSB() or findLSB() even when
ARB_gpu_shader_int64 is supported. The instructions used to implement
countLeadingZeros and countTrailingZeros do not natively support 64-bit
operands.
The implementation of 64-bit countLeadingZeros() would be 5 instructions,
and the implementation of 64-bit countTrailingZeros() would be 7
instructions. Neither of these is better than an application developer
could achieve in GLSL:
uint countLeadingZeros(uint64_t value)
{
    // v.x holds the low 32 bits, v.y the high 32 bits.
    uvec2 v = unpackUint2x32(value);

    return v.y == 0u
        ? 32u + countLeadingZeros(v.x) : countLeadingZeros(v.y);
}

uint countTrailingZeros(uint64_t value)
{
    // v.x holds the low 32 bits, v.y the high 32 bits.
    uvec2 v = unpackUint2x32(value);

    return v.x == 0u
        ? 32u + countTrailingZeros(v.y) : countTrailingZeros(v.x);
}
5) Should 64-bit versions of the arithmetic functions be provided?
RESOLVED: NO. Since recent generations of Intel GPUs have removed
hardware support for 64-bit integer arithmetic, there doesn't seem to be
much value in providing 64-bit arithmetic functions.
6) Should this extension include average()?
RESOLVED: YES. average() corresponds to hadd() in OpenCL, and
averageRounded() corresponds to rhadd() in OpenCL.
averageRounded() corresponds to the AVG instruction on Intel GPUs.
average(), on the other hand, does not correspond to a single instruction.
The signed and unsigned versions may have slightly different
implementations depending on the specific GPU. In the worst case, the
implementation is 4 instructions (e.g., averageRounded(x, y) - ((x ^ y) &
1)), and in the best case it is 3 instructions.
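The worst-case expression above can be checked with a non-normative host-side model. This C sketch covers only unsigned 32-bit operands; both function names are hypothetical, and the 64-bit intermediate stands in for the AVG instruction.

```c
#include <stdint.h>

/* Hypothetical stand-in for averageRounded(uint, uint). */
uint32_t avg_rounded_ref(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x + (uint64_t)y + 1u) >> 1);
}

/* average() derived from averageRounded(), as in issue 6. The two
 * results differ only when x + y is odd, which is exactly when the
 * low bits of x and y differ. */
uint32_t average_from_rounded(uint32_t x, uint32_t y)
{
    return avg_rounded_ref(x, y) - ((x ^ y) & 1u);
}
```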
Revision History
Rev  Date         Author    Changes
---  -----------  --------  ---------------------------------------------
1    04-Sep-2018  idr       Initial version.
2    19-Sep-2018  idr       Add interactions with AMD_gpu_shader_int16.
3    22-Jan-2019  idr       Add interactions with EXT_shader_explicit_arithmetic_types.
4    14-Nov-2019  idr       Resolve issue #1 and issue #5.
5    25-Nov-2019  idr       Fix a bunch of typos noticed by @cmarcelo.