© Copyright 2014-2019 The Khronos Group Inc. All Rights Reserved.
This specification is protected by copyright laws and contains material proprietary to the Khronos Group, Inc. It or any components may not be reproduced, republished, distributed, transmitted, displayed, broadcast, or otherwise exploited in any manner without the express prior written permission of Khronos Group. You may use this specification for implementing the functionality therein, without altering or removing any trademark, copyright or other notice from the specification, but the receipt or possession of this specification does not convey any rights to reproduce, disclose, or distribute its contents, or to manufacture, use, or sell anything that it may describe, in whole or in part.
Khronos Group grants express permission to any current Promoter, Contributor or Adopter member of Khronos to copy and redistribute UNMODIFIED versions of this specification in any fashion, provided that NO CHARGE is made for the specification and the latest available update of the specification for any version of the API is used whenever possible. Such distributed specification may be reformatted AS LONG AS the contents of the specification are not changed in any way. The specification may be incorporated into a product that is sold as long as such product includes significant independent work developed by the seller. A link to the current version of this specification on the Khronos Group website should be included whenever possible with specification distributions.
Khronos Group makes no, and expressly disclaims any, representations or warranties, express or implied, regarding this specification, including, without limitation, any implied warranties of merchantability or fitness for a particular purpose or noninfringement of any intellectual property. Khronos Group makes no, and expressly disclaims any, warranties, express or implied, regarding the correctness, accuracy, completeness, timeliness, and reliability of the specification. Under no circumstances will the Khronos Group, or any of its Promoters, Contributors or Members or their respective partners, officers, directors, employees, agents, or representatives be liable for any damages, whether direct, indirect, special or consequential damages for lost revenues, lost profits, or otherwise, arising from or in connection with these materials. Khronos, SYCL, SPIR, WebGL, EGL, COLLADA, StreamInput, OpenVX, OpenKCam, glTF, OpenKODE, OpenVG, OpenWF, OpenSL ES, OpenMAX, OpenMAX AL, OpenMAX IL and OpenMAX DL are trademarks and WebCL is a certification mark of the Khronos Group Inc. OpenCL is a trademark of Apple Inc. and OpenGL and OpenML are registered trademarks and the OpenGL ES and OpenGL SC logos are trademarks of Silicon Graphics International used under license by Khronos. All other product names, trademarks, and/or company names are used solely for identification and belong to their respective owners.
Contributors and Acknowledgments
-
Yaxun Liu, AMD
-
Brian Sumner, AMD
-
Marty Johnson, AMD
-
Mandana Baregheh, AMD
-
Andrew Richards, Codeplay
-
Ben Ashbaugh, Intel
-
Alexey Bader, Intel
-
Guy Benyei, Intel
-
Raun Krisch, Intel
-
Boaz Ouriel, Intel
-
Yuan Lin, NVIDIA
-
Lee Howes, Qualcomm
-
Chihong Zang, Qualcomm
-
Ben Gaster, Qualcomm
-
Jack Liu, Qualcomm
-
Ronan Keryell, Xilinx
1. Introduction
This is the specification of the OpenCL.std extended instruction set.
The library is imported into a SPIR-V module in the following manner:
<ext-inst-id> OpExtInstImport "OpenCL.std"
The library can only be imported when the Memory Model is set to OpenCL.
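As an illustration of what this import looks like in a binary module, the following sketch encodes the instruction into 32-bit words. It assumes the core SPIR-V conventions (opcode 11 for OpExtInstImport, the word count in the high 16 bits of the first word, literal strings nul-terminated and packed four octets per word with the first octet in the lowest-order bits). The function names and the result id are hypothetical and chosen only for the example.

```python
import struct

OP_EXT_INST_IMPORT = 11  # core SPIR-V opcode for OpExtInstImport


def encode_literal_string(text):
    """Pack a UTF-8 string, nul-terminated and zero-padded, into 32-bit words."""
    data = text.encode("utf-8") + b"\x00"
    data += b"\x00" * (-len(data) % 4)  # pad to a whole number of words
    return list(struct.unpack("<%dI" % (len(data) // 4), data))


def encode_ext_inst_import(result_id, name="OpenCL.std"):
    """Return the words of an OpExtInstImport instruction (illustrative only)."""
    name_words = encode_literal_string(name)
    word_count = 2 + len(name_words)  # header word + Result <id> + string words
    return [(word_count << 16) | OP_EXT_INST_IMPORT, result_id] + name_words


# Hypothetical result id 1 for the imported instruction set.
words = encode_ext_inst_import(result_id=1)
```

The Result <id> produced here is the value later supplied as the extended instructions set <id> of each OpExtInst.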
2. Binary Form
This section contains the semantics and exact form of execution of OpenCL extended instructions using the OpExtInst instruction.
In this section we use the following naming conventions:
-
void denotes an OpTypeVoid.
-
half, float and double denote an OpTypeFloat with a width of 16, 32 and 64 bits respectively.
-
i8, i16, i32 and i64 denote an OpTypeInt with a width of 8, 16, 32 and 64 bits respectively.
-
bool denotes an OpTypeBool.
-
size_t denotes an i32 when the Addressing Model is Physical32 and i64 when the Addressing Model is Physical64.
-
vector(n) denotes an OpTypeVector where n indicates the component count.
-
vector(n1, n2, …, ni) abbreviates vector(n1), vector(n2), … or vector(ni).
-
integer denotes i8, i16, i32 or i64.
-
floating-point denotes half, float or double.
-
pointer(storage) denotes an OpTypePointer which points to storage Storage Class.
-
pointer(constant) denotes an OpTypePointer with UniformConstant Storage Class.
-
pointer(generic) denotes an OpTypePointer with Generic Storage Class.
-
pointer(global) denotes an OpTypePointer with CrossWorkgroup Storage Class.
-
pointer(local) denotes an OpTypePointer with Workgroup Storage Class.
-
pointer(private) denotes an OpTypePointer with Function Storage Class.
-
pointer(s1, s2, …, si) abbreviates pointer(s1), pointer(s2), … or pointer(si).
-
image denotes all types of image memory objects (see the image encoding section).
-
sampler denotes a SPIR-V sampler object (see the sampler encoding section).
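The naming conventions above amount to a small lookup table. The sketch below restates two of them in executable form: the width of size_t for each Addressing Model, and the Storage Class behind each pointer(...) shorthand. The function and table names are hypothetical and not part of this specification.

```python
# Shorthand storage names from the list above -> SPIR-V Storage Class names.
POINTER_STORAGE_CLASS = {
    "constant": "UniformConstant",
    "generic": "Generic",
    "global": "CrossWorkgroup",
    "local": "Workgroup",
    "private": "Function",
}


def size_t_width(addressing_model):
    """Bit width of size_t for the given Addressing Model name."""
    widths = {"Physical32": 32, "Physical64": 64}
    if addressing_model not in widths:
        raise ValueError("size_t is only described for Physical32 and Physical64")
    return widths[addressing_model]


def expand_pointer(*storages):
    """Expand pointer(s1, s2, ...) into the Storage Classes it may refer to."""
    return [POINTER_STORAGE_CLASS[s] for s in storages]
```

For example, `expand_pointer("global", "local")` lists the Storage Classes accepted by an operand described as pointer(global, local).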
2.1. Math extended instructions
This section describes the list of external math instructions. The external math instructions are categorized into the following:
-
A list of instructions that have scalar or vector argument versions, and,
-
A list of instructions that only take scalar float arguments.
The vector versions of the math instructions operate component-wise. The description is per-component.
The math instructions are not affected by the prevailing rounding mode in the calling environment, and always return the same value as they would if called with the round to nearest even rounding mode.
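The component-wise behaviour described above can be pictured with a short sketch. It models vectors as Python lists and uses functions from Python's math module purely as stand-ins for a scalar operation; the helper name is hypothetical and no precision or rounding guarantees of this specification are modelled.

```python
import math


def apply_componentwise(op, *operands):
    """Apply a scalar operation to scalars, or per component to vectors."""
    if all(isinstance(v, (int, float)) for v in operands):
        return op(*operands)
    lengths = {len(v) for v in operands}
    if len(lengths) != 1:
        raise ValueError("all operands must have the same component count")
    # Component i of the result depends only on component i of each operand.
    return [op(*components) for components in zip(*operands)]
```

With this helper, `apply_componentwise(math.sqrt, [4.0, 9.0])` evaluates the scalar operation once per component.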
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
6 | 12 | <id> | Result <id> | extended instructions set <id> | 22 | <id>
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
6 | 12 | <id> | Result <id> | extended instructions set <id> | 40 | <id>
Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
7 | 12 | <id> | Result <id> | extended instructions set <id> | 48 | <id> | <id>
Result Type, x and y must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.
7 | 12 | <id> | Result <id> | extended instructions set <id> | 68 | <id> | <id>
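Each instruction layout in this section lists, in order, the word count, the opcode 12 (OpExtInst), the Result Type <id>, the Result <id>, the <id> of the imported instruction set, the instruction number, and the operands. The sketch below assembles those fields into 32-bit words, assuming the core SPIR-V convention that the first word holds the word count in its high 16 bits and the opcode in its low 16 bits. The function name and all ids are hypothetical.

```python
OP_EXT_INST = 12  # core SPIR-V opcode for OpExtInst


def encode_ext_inst(result_type, result_id, set_id, instruction, operands):
    """Return the 32-bit words of an OpExtInst instruction (illustrative only).

    All <id> arguments are plain integers; `instruction` is the number of the
    extended instruction within the imported set.
    """
    word_count = 5 + len(operands)  # five fixed words plus one per operand
    header = (word_count << 16) | OP_EXT_INST
    return [header, result_type, result_id, set_id, instruction] + list(operands)


# One-operand instruction number 22 with made-up ids.
words = encode_ext_inst(result_type=2, result_id=7, set_id=1,
                        instruction=22, operands=[5])
```

A one-operand instruction therefore occupies 6 words and a two-operand instruction 7 words, matching the word counts in the layouts above.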
2.2. Integer instructions
This section describes the list of integer instructions that take scalar or vector arguments. The vector versions of the integer instructions operate component-wise. The description is per-component.
2.3. Common instructions
This section describes the list of common instructions that take scalar or vector arguments. The vector versions of the common instructions operate component-wise. The description is per-component. The common instructions are implemented using the round to nearest even rounding mode.
2.4. Geometric instructions
This section describes the list of geometric instructions. In this section x, y, z and w denote the first, second, third and fourth components, respectively, of vectors with three and four components. The geometric instructions are implemented using the round to nearest even rounding mode.
Note: The geometric instructions can be implemented using contractions such as mad or fma.
2.5. Relational instructions
This section describes the list of relational instructions that take scalar or vector arguments. The vector versions of the relational instructions operate component-wise. The description is per-component.
2.6. Vector Data Load and Store instructions
This section describes the list of instructions that allow reading and writing of vector types from a pointer to memory.
vloadn
The computed address must be 8-bit aligned if p points to an i8 value; 16-bit aligned if p points to an i16 or half value; 32-bit aligned if p points to an i32 or float value; 64-bit aligned if p points to an i64 or double value.
offset must be size_t. p must be a pointer(global, local, private, constant, generic) to floating-point, integer. Result Type must be vector(2,3,4,8,16) of floating-point or integer values. Result Type component count must be equal to n and its component type must be equal to the type pointed by p. n must be 2, 3, 4, 8 or 16.
8 | 12 | <id> | Result <id> | extended instructions set <id> | 171 | <id> | <id> | Literal
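The vloadn alignment rule depends only on the type that p points to. A minimal sketch of the check follows, assuming addresses are modelled as plain byte offsets; the table and function names are hypothetical.

```python
# Required alignment, in bits, of the computed address for each pointee type.
ALIGNMENT_BITS = {
    "i8": 8,
    "i16": 16, "half": 16,
    "i32": 32, "float": 32,
    "i64": 64, "double": 64,
}


def is_vloadn_address_aligned(byte_address, pointee_type):
    """True if byte_address meets the vloadn alignment for pointee_type."""
    align_bytes = ALIGNMENT_BITS[pointee_type] // 8
    return byte_address % align_bytes == 0
```

Note that the requirement is that of the component type, not of the whole vector: a byte address of 4 is acceptable for a pointer to float regardless of n.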
vload_halfn
The computed address must be 16-bit aligned.
offset must be size_t. p must be a pointer(global, local, private, constant, generic) to half. Result Type must be vector(2,3,4,8,16) of float values. Result Type component count must be equal to n. n must be 2, 3, 4, 8 or 16.
8 | 12 | <id> | Result <id> | extended instructions set <id> | 174 | <id> | <id> | Literal
vstore_half_r
The computed address must be 16-bit aligned.
data must be float or double. offset must be size_t. Result Type must be void. p must be a pointer(global, local, private, generic) to half.
9 | 12 | <id> | Result <id> | extended instructions set <id> | 176 | <id> | <id> | <id> | FP Rounding Mode
vstore_halfn_r
Let n be the component count of the vector data. The n components from the converted vector of half values are written to the address computed as (p + (offset * n)). The computed address must be 16-bit aligned.
offset must be size_t. Result Type must be void. p must be a pointer(global, local, private, generic) to half. data must be vector(2,3,4,8,16) of float or double values.
9 | 12 | <id> | Result <id> | extended instructions set <id> | 178 | <id> | <id> | <id> | FP Rounding Mode
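The address computation for vstore_halfn_r can be sketched as follows. Memory is modelled as a bytearray and p as a byte address into it, so the pointer arithmetic (p + (offset * n)) on half elements becomes a byte offset of offset * n * 2. The conversion relies on the half-precision format of Python's struct module as a stand-in for a single rounding behaviour; the FP Rounding Mode operand is not modelled, and all names are hypothetical.

```python
import struct

HALF_SIZE = 2  # bytes per half value


def vstore_halfn(memory, data, offset, p=0):
    """Convert `data` to half values and store them at p + offset * n.

    `p` is a byte address into `memory` (a bytearray); `offset` counts
    vectors of n half values. Returns the computed byte address.
    """
    n = len(data)
    if n not in (2, 3, 4, 8, 16):
        raise ValueError("data must have 2, 3, 4, 8 or 16 components")
    address = p + offset * n * HALF_SIZE
    if address % HALF_SIZE != 0:
        raise ValueError("computed address must be 16-bit aligned")
    # "<e" is little-endian IEEE 754 binary16; Python performs the conversion.
    struct.pack_into("<%de" % n, memory, address, *data)
    return address


memory = bytearray(32)
address = vstore_halfn(memory, [1.0, 2.0, 0.5, -4.0], offset=1)
```

With n equal to 4 and offset equal to 1, the four half values land immediately after the first vector-sized slot.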
vloada_halfn
For n equal to 2, 4, 8, and 16, the vector of n half values is read from the address computed as (p + (offset * n)). The computed address must be aligned to (sizeof(half) * n) bytes. For n equal to 3, the vector of n half values is read from the address computed as (p + (offset * 4)). The computed address must be aligned to (sizeof(half) * 4) bytes.
offset must be size_t. p must be a pointer(global, local, private, constant, generic) to half. Result Type must be vector(2,3,4,8,16) of float values. Result Type component count must be equal to n. n must be 2, 3, 4, 8 or 16.
8 | 12 | <id> | Result <id> | extended instructions set <id> | 179 | <id> | <id> | Literal
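The special case for n equal to 3 in vloada_halfn is easy to miss: the stride and the alignment are those of a 4-component vector, although only three values are read. A minimal sketch, again modelling memory as a byte buffer and p as a byte address, with hypothetical helper names:

```python
import struct

HALF_SIZE = 2  # sizeof(half) in bytes


def vloada_half_stride(n):
    """Element stride and alignment unit: n, except 4 when n is 3."""
    if n not in (2, 3, 4, 8, 16):
        raise ValueError("n must be 2, 3, 4, 8 or 16")
    return 4 if n == 3 else n


def vloada_halfn(memory, offset, n, p=0):
    """Read n half values from p + offset * stride and return them as floats.

    `p` is a byte address into `memory`; `offset` counts vectors.
    """
    stride = vloada_half_stride(n)
    address = p + offset * stride * HALF_SIZE
    if address % (HALF_SIZE * stride) != 0:
        raise ValueError("address must be aligned to sizeof(half) * %d bytes" % stride)
    return list(struct.unpack_from("<%de" % n, memory, address))


# Eight consecutive half values 0.0 .. 7.0 as sample memory contents.
memory = struct.pack("<8e", 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0)
```

The same stride and alignment rule applies to the aligned store described next, with the data flowing in the opposite direction.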
vstorea_halfn_r
Let n be the component count of the vector data. For n equal to 2, 4, 8, and 16, the converted vector of half values is written to the address computed as (p + (offset * n)). The computed address must be aligned to (sizeof(half) * n) bytes. For n equal to 3, the converted vector of half values is written to the address computed as (p + (offset * 4)). The computed address must be aligned to (sizeof(half) * 4) bytes.
offset must be size_t. Result Type must be void. p must be a pointer(global, local, private, generic) to half. data must be vector(2,3,4,8,16) of float or double values.
9 | 12 | <id> | Result <id> | extended instructions set <id> | 181 | <id> | <id> | <id> | FP Rounding Mode
2.7. Miscellaneous Vector instructions
This section describes additional vector instructions.
2.8. Misc instructions
This section describes additional miscellaneous instructions.
3. Appendix A: Changes and TBD
-
Forked the revision stream, changes section, TBD, etc. from the core specification, so that this specification has its own, with numbering starting at revision 1. This document now lives independently.
3.1. Changes from Version 0.99, Revision 1
-
Moved to the updated core image/texturing/sampling instructions instead of extended instructions. See also the related changes in the core specification.
-
14241 Implement OpenCL Extended Instructions for images/samplers with core OpImageSample instructions
-
Fixed internal bugs
-
13455 Merged the OpenCL 1.2, 2.0, and 2.1 extended-instruction set into a single OpenCL extended-instruction set.
-
Fixed public bugs
3.2. Changes from Version 0.99, Revision 2
-
14679 moved precision information to the OpenCL environment spec
-
14636 clarified trig functions to accept and return radians
3.3. Changes from Version 0.99, Revision 3
-
Fixed internal bugs:
-
14862 removed remaining image instructions as core versions are sufficient
-
14636 Fixed typos in several trig functions accepting radian inputs and/or producing radian results
-
Flattened opcode numbers
3.4. Changes from Version 1.0, Revision 1
-
Fixed internal bugs:
-
Issue 8 - order of parameters for prefetch was reversed; pointer operand should be first.
-
Issue 103 - typo: singp should be signp
-
Fixed public bugs
-
1469 - incorrect specification of pow and pown
3.5. Changes from Version 1.0, Revision 2
-
Fixed internal bugs:
-
Issue 261 - clarified that s_mad24 and u_mad24 only support 32-bit integers
-
Issue 262 - added scalars to the types supported by length
-
Issue 266 - fixed shuffle and shuffle2 description
-
Issue 267 - fixed description of ldexp operands
3.6. Changes from Version 1.0, Revision 3
-
Moved image and sampler encoding to the OpenCL environment specification
-
Editorial fixes and improvements
-
Fixed internal bugs:
-
Issue 271 - storage class inconsistency between vloadn/vstoren and vload_half/vstore_half
-
Issue 312 - bad wording for vstorea_halfn
3.7. Changes from Version 1.0, Revision 4
Added support for SPV_KHR_no_integer_wrap_decoration in the s_abs instruction.
3.8. Changes from Version 1.0, Revision 5
-
Fixed internal bugs:
-
Issue 497 - fixed description for s_upsample