© Copyright 2014-2019 The Khronos Group Inc. All Rights Reserved.
This specification is protected by copyright laws and contains material proprietary to the Khronos Group, Inc. It or any components may not be reproduced, republished, distributed, transmitted, displayed, broadcast, or otherwise exploited in any manner without the express prior written permission of Khronos Group. You may use this specification for implementing the functionality therein, without altering or removing any trademark, copyright or other notice from the specification, but the receipt or possession of this specification does not convey any rights to reproduce, disclose, or distribute its contents, or to manufacture, use, or sell anything that it may describe, in whole or in part.
Khronos Group grants express permission to any current Promoter, Contributor or Adopter member of Khronos to copy and redistribute UNMODIFIED versions of this specification in any fashion, provided that NO CHARGE is made for the specification and the latest available update of the specification for any version of the API is used whenever possible. Such distributed specification may be reformatted AS LONG AS the contents of the specification are not changed in any way. The specification may be incorporated into a product that is sold as long as such product includes significant independent work developed by the seller. A link to the current version of this specification on the Khronos Group website should be included whenever possible with specification distributions.
Khronos Group makes no, and expressly disclaims any, representations or warranties, express or implied, regarding this specification, including, without limitation, any implied warranties of merchantability or fitness for a particular purpose or noninfringement of any intellectual property. Khronos Group makes no, and expressly disclaims any, warranties, express or implied, regarding the correctness, accuracy, completeness, timeliness, and reliability of the specification. Under no circumstances will the Khronos Group, or any of its Promoters, Contributors or Members or their respective partners, officers, directors, employees, agents, or representatives be liable for any damages, whether direct, indirect, special or consequential damages for lost revenues, lost profits, or otherwise, arising from or in connection with these materials. Khronos, SYCL, SPIR, WebGL, EGL, COLLADA, StreamInput, OpenVX, OpenKCam, glTF, OpenKODE, OpenVG, OpenWF, OpenSL ES, OpenMAX, OpenMAX AL, OpenMAX IL and OpenMAX DL are trademarks and WebCL is a certification mark of the Khronos Group Inc. OpenCL is a trademark of Apple Inc. and OpenGL and OpenML are registered trademarks and the OpenGL ES and OpenGL SC logos are trademarks of Silicon Graphics International used under license by Khronos. All other product names, trademarks, and/or company names are used solely for identification and belong to their respective owners.
Contributors and Acknowledgments

Yaxun Liu, AMD

Brian Sumner, AMD

Marty Johnson, AMD

Mandana Baregheh, AMD

Andrew Richards, Codeplay

Ben Ashbaugh, Intel

Alexey Bader, Intel

Guy Benyei, Intel

Raun Krisch, Intel

Boaz Ouriel, Intel

Yuan Lin, NVIDIA

Lee Howes, Qualcomm

Chihong Zang, Qualcomm

Ben Gaster, Qualcomm

Jack Liu, Qualcomm

Ronan Keryell, Xilinx
1. Introduction
This is the specification of the OpenCL.std extended instruction set.
The library is imported into a SPIR-V module in the following manner:
<ext-inst-id> OpExtInstImport "OpenCL.std"
The library can only be imported when the Memory Model is set to OpenCL.
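The import line above is a single SPIR-V instruction in the binary. The sketch below packs it using the general SPIR-V encoding rules. The first word holds the word count in its high 16 bits and the opcode in its low 16 bits; the opcode is 11 for OpExtInstImport. A literal string is nul-terminated and packed four UTF-8 bytes per word, with the first byte in the lowest-order bits. The helper names and the Result <id> value 1 are hypothetical and serve only as illustration.

```python
import struct

OP_EXT_INST_IMPORT = 11  # SPIR-V opcode of OpExtInstImport


def pack_literal_string(text):
    """Pack a nul-terminated UTF-8 string into 32-bit words.

    The first byte of each group of four lands in the lowest-order 8 bits.
    """
    data = text.encode("utf-8") + b"\x00"
    data += b"\x00" * (-len(data) % 4)  # pad to a whole number of words
    return list(struct.unpack("<%dI" % (len(data) // 4), data))


def encode_ext_inst_import(result_id, name):
    """Return the words of: <result_id> OpExtInstImport "name"."""
    operands = [result_id] + pack_literal_string(name)
    word_count = 1 + len(operands)  # the leading word counts itself
    return [(word_count << 16) | OP_EXT_INST_IMPORT] + operands


words = encode_ext_inst_import(1, "OpenCL.std")
print([hex(w) for w in words])
```

"OpenCL.std" is 10 bytes long. Adding the nul terminator and padding brings it to 12 bytes, or three words, so the whole instruction is five words.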
2. Binary Form
This section contains the semantics and exact form of execution of OpenCL extended instructions using the OpExtInst instruction.
In this section we use the following naming conventions:

void denotes an OpTypeVoid.

half, float and double denote an OpTypeFloat with a width of 16, 32 and 64 bits respectively.

i8, i16, i32 and i64 denote an OpTypeInt with a width of 8, 16, 32 and 64 bits respectively.

bool denotes an OpTypeBool.

size_t denotes an i32 when the Addressing Model is Physical32 and i64 when the Addressing Model is Physical64.

vector(n) denotes an OpTypeVector where n indicates the component count.

vector(n_{1}, n_{2}, …, n_{i}) abbreviates vector(n_{1}), vector(n_{2}), … or vector(n_{i}).


integer denotes i8, i16, i32 or i64.

floating-point denotes half, float or double.

pointer(storage) denotes an OpTypePointer whose Storage Class is the one named by storage.

pointer(constant) denotes an OpTypePointer with UniformConstant Storage Class.

pointer(generic) denotes an OpTypePointer with Generic Storage Class.

pointer(global) denotes an OpTypePointer with CrossWorkgroup Storage Class.

pointer(local) denotes an OpTypePointer with Workgroup Storage Class.

pointer(private) denotes an OpTypePointer with Function Storage Class.

pointer(s_{1}, s_{2}, …, s_{i}) abbreviates pointer(s_{1}), pointer(s_{2}), … or pointer(s_{i}).


image denotes all types of image memory objects (see the image encoding section).

sampler denotes a SPIR-V sampler object (see the sampler encoding section).
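Several of the conventions above are simple lookups. Examples are the storage class behind each pointer(...) shorthand, the allowed vector component counts, and the width of size_t under each addressing model. The minimal sketch below expresses them as Python tables. It assumes storage classes and addressing models are represented by their names as strings, and the helper names are hypothetical rather than part of any API.

```python
# Storage classes behind the pointer(...) shorthand used in this specification.
POINTER_STORAGE_CLASS = {
    "constant": "UniformConstant",
    "generic": "Generic",
    "global": "CrossWorkgroup",
    "local": "Workgroup",
    "private": "Function",
}

# Component counts accepted wherever vector(2,3,4,8,16) appears.
VECTOR_COUNTS = (2, 3, 4, 8, 16)


def size_t_width(addressing_model):
    """Bit width of size_t: i32 under Physical32, i64 under Physical64."""
    widths = {"Physical32": 32, "Physical64": 64}
    if addressing_model not in widths:
        # Design choice of this sketch: only the two models named above are defined.
        raise ValueError("size_t is only defined for Physical32 and Physical64")
    return widths[addressing_model]


def expand_pointer(*storages):
    """Expand pointer(s1, ..., si) into the storage classes it abbreviates."""
    return [POINTER_STORAGE_CLASS[s] for s in storages]


print(size_t_width("Physical64"), expand_pointer("global", "generic"))
```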
2.1. Math extended instructions
This section describes the external math instructions, which are divided into two categories:

A list of instructions that have scalar or vector argument versions, and,

A list of instructions that only take scalar float arguments.
The vector versions of the math instructions operate componentwise. The description is per-component.
The math instructions are not affected by the prevailing rounding mode in the calling environment, and always return the same value as they would if called with the round to nearest even rounding mode.
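Componentwise evaluation means the scalar instruction is applied to each component independently. The minimal Python sketch below models vectors as tuples. It uses Python's math.pow purely as a stand-in scalar operation; it illustrates the componentwise structure only and says nothing about the accuracy requirements of the OpenCL instructions. The helper name is hypothetical.

```python
import math


def apply_componentwise(scalar_fn, *args):
    """Apply a scalar operation to scalars, or per component to vectors.

    Vectors are modelled as tuples; all vector operands must have the
    same component count, as the type rules in this section require.
    """
    if not isinstance(args[0], tuple):
        return scalar_fn(*args)  # scalar version of the instruction
    count = len(args[0])
    if any(not isinstance(a, tuple) or len(a) != count for a in args):
        raise ValueError("vector operands must share one component count")
    return tuple(scalar_fn(*components) for components in zip(*args))


print(apply_componentwise(math.pow, (2.0, 3.0, 4.0), (2.0, 2.0, 0.5)))
```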
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

Binary form: word count 6 | opcode 12 (OpExtInst) | Result Type <id> | Result <id> | extended instruction set <id> | instruction 22 | x <id>
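The binary form above follows the general SPIR-V instruction layout: the first word packs the word count into its high 16 bits and the opcode into its low 16 bits. The Python sketch below assembles such an OpExtInst instruction. The function name and all <id> values are hypothetical and chosen only for illustration.

```python
OP_EXT_INST = 12  # SPIR-V opcode of OpExtInst


def encode_ext_inst(result_type, result, ext_set, instruction, *operands):
    """Return the words of an OpExtInst instruction.

    The word count covers the leading word, the four fixed fields
    (Result Type, Result, set, instruction number) and every operand.
    """
    words = [result_type, result, ext_set, instruction, *operands]
    word_count = 1 + len(words)
    return [(word_count << 16) | OP_EXT_INST] + words


# Hypothetical ids: Result Type = 2, Result = 10, OpenCL.std import = 1, x = 7.
inst = encode_ext_inst(2, 10, 1, 22, 7)
print(len(inst), hex(inst[0]))
```

Each additional <id> or Literal operand adds one word. This is consistent with the word counts of 7, 8 and 9 listed for the two-operand math forms and for the vector load and store instructions.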
Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

Binary form: word count 7 | opcode 12 (OpExtInst) | Result Type <id> | Result <id> | extended instruction set <id> | instruction 48 | x <id> | y <id>
Result Type, x and y must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

Binary form: word count 7 | opcode 12 (OpExtInst) | Result Type <id> | Result <id> | extended instruction set <id> | instruction 68 | x <id> | y <id>
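The typing rules of the three binary forms above are the same except for which float widths they accept. "floating-point" allows 16, 32 and 64 bits, while "float" allows only 32. The minimal checker below assumes a hypothetical (width, count) pair in place of real OpTypeFloat/OpTypeVector ids, with count set to None for a scalar. The function name is hypothetical.

```python
ALLOWED_COUNTS = {2, 3, 4, 8, 16}


def check_float_operands(result_type, *operands, widths=(16, 32, 64)):
    """Check the 'same type, floating-point or vector of it' rule.

    A type is modelled as (width, count): count is None for a scalar
    OpTypeFloat, otherwise the OpTypeVector component count.
    widths=(16, 32, 64) models 'floating-point'; widths=(32,) models 'float'.
    """
    width, count = result_type
    if width not in widths:
        return False
    if count is not None and count not in ALLOWED_COUNTS:
        return False
    # Every operand must have exactly the Result Type.
    return all(op == result_type for op in operands)


print(check_float_operands((64, 4), (64, 4), (64, 4)))
print(check_float_operands((16, None), (16, None), (16, None), widths=(32,)))
```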
2.2. Integer instructions
This section describes the list of integer instructions that take scalar or vector arguments. The vector versions of the integer instructions operate componentwise. The description is per-component.
2.3. Common instructions
This section describes the list of common instructions that take scalar or vector arguments. The vector versions of the common instructions operate componentwise. The description is per-component. The common instructions are implemented using the round to nearest even rounding mode.
2.4. Geometric instructions
This section describes the list of geometric instructions. In this section x, y, z and w denote the first, second, third and fourth components, respectively, of vectors with three and four components. The geometric instructions are implemented using the round to nearest even rounding mode.
Note: The geometric instructions can be implemented using contractions such as mad or fma.
2.5. Relational instructions
This section describes the list of relational instructions that take scalar or vector arguments. The vector versions of the relational instructions operate componentwise. The description is per-component.
2.6. Vector Data Load and Store instructions
This section describes the list of instructions that allow reading and writing of vector types from a pointer to memory.
vloadn
The computed address must be 8-bit aligned if p points to an i8 value; 16-bit aligned if p points to an i16 or half value; 32-bit aligned if p points to an i32 or float value; 64-bit aligned if p points to an i64 or double value.
offset must be size_t. p must be a pointer(global, local, private, constant, generic) to floating-point or integer values. Result Type must be vector(2,3,4,8,16) of floating-point or integer values. Result Type component count must be equal to n and its component type must be equal to the type pointed to by p. n must be 2, 3, 4, 8 or 16.

Binary form: word count 8 | opcode 12 (OpExtInst) | Result Type <id> | Result <id> | extended instruction set <id> | instruction 171 | offset <id> | p <id> | Literal n
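The alignment vloadn requires is the size of one component, not the size of the whole vector. The small sketch below expresses that rule. It assumes plain byte addresses, and the helper name vloadn_address_ok is hypothetical.

```python
# Required alignment, in bits, of the address computed by vloadn,
# keyed by the type p points to (per the rule above).
VLOADN_ALIGNMENT_BITS = {
    "i8": 8,
    "i16": 16, "half": 16,
    "i32": 32, "float": 32,
    "i64": 64, "double": 64,
}


def vloadn_address_ok(byte_address, pointee):
    """Return True if byte_address satisfies vloadn's alignment rule."""
    align_bytes = VLOADN_ALIGNMENT_BITS[pointee] // 8
    return byte_address % align_bytes == 0


print(vloadn_address_ok(0x1004, "float"), vloadn_address_ok(0x1002, "i32"))
```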
vload_halfn
The computed address must be 16-bit aligned.
offset must be size_t. p must be a pointer(global, local, private, constant, generic) to half. Result Type must be vector(2,3,4,8,16) of float values. Result Type component count must be equal to n. n must be 2, 3, 4, 8 or 16.

Binary form: word count 8 | opcode 12 (OpExtInst) | Result Type <id> | Result <id> | extended instruction set <id> | instruction 174 | offset <id> | p <id> | Literal n
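vload_halfn widens half data to float. The sketch below makes several assumptions. Memory is modelled as a Python bytes object in little-endian order. The address is taken to be (p + (offset * n)) half elements, the same form the vstore_halfn_r description gives. The function name and the byte-address parameter p_byte are hypothetical. Python's struct 'e' format decodes IEEE 754 binary16, and every half value is exactly representable as a float, so the widening itself is exact.

```python
import struct

HALF_BYTES = 2  # sizeof(half)


def vload_halfn(memory, p_byte, offset, n):
    """Read n half values and return them widened to Python floats."""
    if n not in (2, 3, 4, 8, 16):
        raise ValueError("n must be 2, 3, 4, 8 or 16")
    address = p_byte + offset * n * HALF_BYTES  # assumed: offset counts whole vectors
    if address % 2 != 0:
        raise ValueError("address must be 16-bit aligned")
    return struct.unpack_from("<%de" % n, memory, address)


buf = struct.pack("<8e", 1.0, 2.0, 0.5, -3.0, 65504.0, 0.25, 1.5, -0.0)
print(vload_halfn(buf, 0, 1, 4))
```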
vstore_half_r
The computed address must be 16-bit aligned.
data must be float or double. offset must be size_t. Result Type must be void. p must be a pointer(global, local, private, generic) to half.

Binary form: word count 9 | opcode 12 (OpExtInst) | Result Type <id> | Result <id> | extended instruction set <id> | instruction 176 | data <id> | offset <id> | p <id> | FP Rounding Mode
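The FP Rounding Mode operand selects how data is narrowed to half. The hedged sketch below is not the specification's reference algorithm. It derives the directed modes from Python's round-to-nearest-even half packing: when that result lies on the wrong side of the input, it steps one representable value in the required direction. The strings "RTE", "RTZ", "RTP" and "RTN" stand for the SPIR-V FP Rounding Mode enumerants, and the function names are hypothetical. Finite inputs that round beyond the half range make struct raise OverflowError; the sketch does not clamp them.

```python
import struct


def _half_bits(value):
    """Round a Python float to binary16 (nearest even) and return its bits."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def _half_value(bits):
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def _step(bits, up):
    """Move a half bit pattern one representable value toward +inf (up) or -inf."""
    sign, mag = bits & 0x8000, bits & 0x7FFF
    if mag == 0:  # +0 or -0: the neighbour is the smallest subnormal
        return 0x0001 if up else 0x8001
    grows = (sign == 0) == up  # does the magnitude increase?
    return sign | (mag + 1 if grows else mag - 1)


def to_half_bits(x, mode):
    """Convert x to half bits under RTE, RTZ, RTP or RTN."""
    if mode not in ("RTE", "RTZ", "RTP", "RTN"):
        raise ValueError("unknown rounding mode")
    bits = _half_bits(x)  # nearest even, one of the two neighbours of x
    h = _half_value(bits)
    if mode == "RTE" or h == x:
        return bits
    if mode == "RTZ" and abs(h) > abs(x):
        return _step(bits, up=x < 0)  # step back toward zero
    if mode == "RTP" and h < x:
        return _step(bits, up=True)
    if mode == "RTN" and h > x:
        return _step(bits, up=False)
    return bits


tie = 1.0 + 2.0 ** -11  # halfway between 1.0 and the next half value
print(hex(to_half_bits(tie, "RTE")), hex(to_half_bits(tie, "RTP")))
```

Stepping by one bit pattern is valid because adjacent finite half encodings of the same sign are adjacent values, including across the subnormal/normal boundary.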
vstore_halfn_r
Let n be the component count of the vector data. The n components from the converted vector of half values are written to the address computed as (p + (offset * n)). The computed address must be 16-bit aligned.
offset must be size_t. Result Type must be void. p must be a pointer(global, local, private, generic) to half. data must be vector(2,3,4,8,16) of float or double values.

Binary form: word count 9 | opcode 12 (OpExtInst) | Result Type <id> | Result <id> | extended instruction set <id> | instruction 178 | data <id> | offset <id> | p <id> | FP Rounding Mode
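Under the address rule (p + (offset * n)), offset counts whole n-component vectors. The sketch below makes the following assumptions. Memory is modelled as a Python bytearray in little-endian order. Only the round-to-nearest-even mode is covered, which is how struct's 'e' format rounds. The function name and the byte-address parameter p_byte are hypothetical.

```python
import struct

HALF_BYTES = 2  # sizeof(half)


def vstore_halfn_rte(data, memory, p_byte, offset):
    """Write the components of data as halves at (p + (offset * n)).

    Only round-to-nearest-even is modelled (struct's 'e' format).
    """
    n = len(data)
    if n not in (2, 3, 4, 8, 16):
        raise ValueError("data must have 2, 3, 4, 8 or 16 components")
    address = p_byte + offset * n * HALF_BYTES  # offset counts whole vectors
    if address % 2 != 0:
        raise ValueError("address must be 16-bit aligned")
    struct.pack_into("<%de" % n, memory, address, *data)
    return address


mem = bytearray(16)
addr = vstore_halfn_rte((1.0, -2.0, 0.5), mem, 0, 1)
print(addr, struct.unpack_from("<3e", mem, addr))
```

For n equal to 3, the stride here is three half values. This is where the aligned variant vstorea_halfn_r differs.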
vloada_halfn
For n equal to 2, 4, 8, and 16, the vector of n half values is read from the address computed as (p + (offset * n)). The computed address must be aligned to (sizeof(half) * n) bytes. For n equal to 3, the vector of n half values is read from the address computed as (p + (offset * 4)). The computed address must be aligned to (sizeof(half) * 4) bytes.
offset must be size_t. p must be a pointer(global, local, private, constant, generic) to half. Result Type must be vector(2,3,4,8,16) of float values. Result Type component count must be equal to n. n must be 2, 3, 4, 8 or 16.

Binary form: word count 8 | opcode 12 (OpExtInst) | Result Type <id> | Result <id> | extended instruction set <id> | instruction 179 | offset <id> | p <id> | Literal n
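vloada_halfn treats n equal to 3 like n equal to 4, for both the address stride and the alignment. The sketch below computes the byte address read and the alignment required. It assumes byte addresses, and the helper name is hypothetical.

```python
HALF_BYTES = 2  # sizeof(half)


def vloada_halfn_address(p_byte, offset, n):
    """Return (byte address read by vloada_halfn, required alignment in bytes)."""
    if n not in (2, 3, 4, 8, 16):
        raise ValueError("n must be 2, 3, 4, 8 or 16")
    stride = 4 if n == 3 else n  # n == 3 is addressed and aligned like n == 4
    address = p_byte + offset * stride * HALF_BYTES
    alignment = stride * HALF_BYTES
    if address % alignment != 0:
        raise ValueError("address must be aligned to %d bytes" % alignment)
    return address, alignment


print(vloada_halfn_address(0x100, 2, 3))
print(vloada_halfn_address(0x100, 2, 8))
```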
vstorea_halfn_r
Let n be the component count of the vector data. For n equal to 2, 4, 8, and 16, the converted vector of half values is written to the address computed as (p + (offset * n)). The computed address must be aligned to (sizeof(half) * n) bytes. For n equal to 3, the converted vector of half values is written to the address computed as (p + (offset * 4)). The computed address must be aligned to (sizeof(half) * 4) bytes.
offset must be size_t. Result Type must be void. p must be a pointer(global, local, private, generic) to half. data must be vector(2,3,4,8,16) of float or double values.

Binary form: word count 9 | opcode 12 (OpExtInst) | Result Type <id> | Result <id> | extended instruction set <id> | instruction 181 | data <id> | offset <id> | p <id> | FP Rounding Mode
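The addressing of vstorea_halfn_r can be sketched in the same way as vstore_halfn_r, except that a 3-component vector uses a stride and alignment of four half values. The sketch makes these assumptions: a Python bytearray as memory, little-endian order, and round-to-nearest-even only, via struct's 'e' format. It writes only the n converted components. The function name and p_byte are hypothetical.

```python
import struct

HALF_BYTES = 2  # sizeof(half)


def vstorea_halfn_rte(data, memory, p_byte, offset):
    """Store data as halves using vstorea_halfn_r addressing (RTE only)."""
    n = len(data)
    if n not in (2, 3, 4, 8, 16):
        raise ValueError("data must have 2, 3, 4, 8 or 16 components")
    stride = 4 if n == 3 else n  # n == 3 is addressed and aligned like n == 4
    address = p_byte + offset * stride * HALF_BYTES
    if address % (stride * HALF_BYTES) != 0:
        raise ValueError("misaligned vstorea_halfn_r address")
    # Only the n converted components are written by this sketch.
    struct.pack_into("<%de" % n, memory, address, *data)
    return address


mem = bytearray(32)
print(vstorea_halfn_rte((1.0, 2.0, 3.0), mem, 0, 1))  # 8 bytes, not 6
```

With n equal to 3 and offset 1, this store begins 8 bytes past p. The unaligned vstore_halfn_r would begin 6 bytes past p.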
2.7. Miscellaneous Vector instructions
This section describes additional vector instructions.
2.8. Misc instructions
This section describes additional miscellaneous instructions.
3. Appendix A: Changes and TBD

Fork the revision stream, changes section, TBD, etc. from the core specification, so this specification has its own, starting numbering at revision 1. This document now lives independently.
3.1. Changes from Version 0.99, Revision 1

Move to use the updated image/texturing/sampling, instead of extended instructions. Also, see changes in core specification related to this.

14241 Implement OpenCL Extended Instructions for images/samplers with core OpImageSample instructions


Fixed internal bugs

13455 Merged the OpenCL 1.2, 2.0, and 2.1 extended instruction set into a single OpenCL extended instruction set.


Fixed public bugs
3.2. Changes from Version 0.99, Revision 2

14679 moved precision information to the OpenCL environment spec

14636 clarified trig functions to accept and return radians
3.3. Changes from Version 0.99, Revision 3

Fixed internal bugs:

14862 removed remaining image instructions as core versions are sufficient

14636 Fixed typos in several trig functions accepting radian inputs and/or producing radian results

Flattened opcode numbers

3.4. Changes from Version 1.0, Revision 1

Fixed internal bugs:

Issue 8: order of parameters for prefetch was reversed; pointer operand should be first.

Issue 103: typo, singp should be signp


Fixed public bugs

1469: incorrect specification of pow and pown

3.5. Changes from Version 1.0, Revision 2

Fixed internal bugs:

Issue 261: clarified that s_mad24 and u_mad24 only support 32-bit integers

Issue 262: added scalars to the types supported by length

Issue 266: fixed shuffle and shuffle2 description

Issue 267: fixed description of ldexp operands

3.6. Changes from Version 1.0, Revision 3

Moved image and sampler encoding to the OpenCL environment specification

Editorial fixes and improvements

Fixed internal bugs:

Issue 271: storage class inconsistency between vloadn/vstoren and vload_half/vstore_half

Issue 312: bad wording for vstorea_halfn

3.7. Changes from Version 1.0, Revision 4
Support SPV_KHR_no_integer_wrap_decoration in the s_abs instruction.
3.8. Changes from Version 1.0, Revision 5

Fixed internal bugs:

Issue 497: fixed description for s_upsample
