Copyright 2008-2017 The Khronos Group.

This specification is protected by copyright laws and contains material proprietary to the Khronos Group, Inc. Except as described by these terms, it or any components may not be reproduced, republished, distributed, transmitted, displayed, broadcast or otherwise exploited in any manner without the express prior written permission of Khronos Group.

Khronos Group grants a conditional copyright license to use and reproduce the unmodified specification for any purpose, without fee or royalty, EXCEPT no licenses to any patent, trademark or other intellectual property rights are granted under these terms. Parties desiring to implement the specification and make use of Khronos trademarks in relation to that implementation, and receive reciprocal patent license protection under the Khronos IP Policy must become Adopters and confirm the implementation as conformant under the process defined by Khronos for this specification; see https://www.khronos.org/adopters.

Khronos Group makes no, and expressly disclaims any, representations or warranties, express or implied, regarding this specification, including, without limitation: merchantability, fitness for a particular purpose, non-infringement of any intellectual property, correctness, accuracy, completeness, timeliness, and reliability. Under no circumstances will the Khronos Group, or any of its Promoters, Contributors or Members, or their respective partners, officers, directors, employees, agents or representatives be liable for any damages, whether direct, indirect, special or consequential damages for lost revenues, lost profits, or otherwise, arising from or in connection with these materials.

Vulkan is a registered trademark and Khronos, OpenXR, SPIR, SPIR-V, SYCL, WebGL, WebCL, OpenVX, OpenVG, EGL, COLLADA, glTF, NNEF, OpenKODE, OpenKCAM, StreamInput, OpenWF, OpenSL ES, OpenMAX, OpenMAX AL, OpenMAX IL, OpenMAX DL, OpenML and DevU are trademarks of the Khronos Group Inc. ASTC is a trademark of ARM Holdings PLC, OpenCL is a trademark of Apple Inc. and OpenGL and OpenML are registered trademarks and the OpenGL ES and OpenGL SC logos are trademarks of Silicon Graphics International used under license by Khronos. All other product names, trademarks, and/or company names are used solely for identification and belong to their respective owners.

Acknowledgements

The OpenCL C++ specification is the result of the contributions of many people, representing a cross section of the desktop, hand-held, and embedded computer industry. Following is a partial list of the contributors, including the company that they represented at the time of their contribution:

  • Eric Berdahl, Adobe

  • Aaftab Munshi, Apple

  • Brian Sumner, AMD

  • Andrew Richards, Codeplay

  • Maria Rovatsou, Codeplay

  • Adam Stański, Intel

  • Alexey Bader, Intel

  • Allen Hux, Intel

  • Bartosz Sochacki, Intel

  • Ben Ashbaugh, Intel

  • Kevin Stevens, Intel

  • Łukasz Dudziak, Intel

  • Łukasz Towarek, Intel

  • Marcin Walkowiak, Intel

  • Michael Kinsner, Intel

  • Raun Krisch, Intel

  • Tomasz Fiechowski, Intel

  • Kedar Patil, NVIDIA

  • Yuan Lin, NVIDIA

  • Alex Bourd, Qualcomm

  • Lee Howes, Qualcomm

  • Anton Gorenko, StreamComputing

  • Jakub Szuppe, StreamComputing

  • James Price, University of Bristol

  • Paul Preney, University of Windsor

  • Ronan Keryell, Xilinx

  • AJ Guillon, YetiWare Inc.

1. Generic Type Name Notation

The generic type names are used when some entity has multiple overloads which differ only by argument(s). They can map to one or more built-in data types. The tables below describe these mappings in details.

Assuming that gentype maps to built-in types: float, int and uint, when coming across definition:

gentype function(gentype x);

reader should understand that such function has in fact three overloads:

float function(float x);
int function(int x);
uint function(uint x);

Note that if a function signature has multiple usages of gentype they all should map to the same type. Following this rule such overloads are then invalid:

float function(int x);
uint function(float x);
// etc.

If a function is meant to have such overloads, respective gentypes in its signature should be postfixed with numbers to indicate they represent different types. Declaration like this:

cl::common_type_t<gentype1, gentype2> greater(gentype1 x, gentype2 y);

would match following overloads:

cl::common_type_t<float, float> greater(float x, float y);
cl::common_type_t<float, int> greater(float x, int y);
cl::common_type_t<float, uint> greater(float x, uint y);
cl::common_type_t<int, float> greater(int x, float y);

// etc.
Table 1. generic types

generic type

corresponding built-in types

typen

scalar and all vector types of type

Example:

floatn matches: float, float2, float3, float4, float8 and float16

floatn doesn’t match: half, int2

gentype

unspecified in global context, should be defined whenever used

sgentype

subset of scalar types from types matched by gentype

ugentype

subset of unsigned integer types from types matched by gentype

gentypeh

half, half2, half3, half4, half8 or half16

gentypef

float, float2, float3, float4, float8 or float16

gentyped

double, double2, double3, double4, double8 or double16

2. OpenCL C++ Programming Language

This section describes the OpenCL C++ programming language used to create kernels that are executed on OpenCL device(s). The OpenCL C++ programming language is based on the ISO/IEC JTC1 SC22 WG21 N 3690 language specification (a.k.a. C++14 specification) with specific restrictions (OpenCL C++ restrictions section). Please refer to this specification for a detailed description of the language grammar. This section describes restrictions to the C++14 specification supported in OpenCL C++.

2.1. Supported Built-in Data Types

The following data types are supported.

2.1.1. Built-in Scalar Data Types

Table 2. Device Built-in scalar data types
Type Description

bool

A data type which is either true or false. (See [ISO/IEC 14882:2014: lex.bool, §2.14.6; basic.fundamental, §3.9.1].)

char, signed char

A signed two’s complement 8-bit integer.

unsigned char

An unsigned 8-bit integer.

short

A signed two’s complement 16-bit integer.

unsigned short

An unsigned 16-bit integer.

int

A signed two’s complement 32-bit integer.

unsigned int

An unsigned 32-bit integer.

long

A signed two’s complement 64-bit integer.

unsigned long

An unsigned 64-bit integer.

float

A 32-bit floating-point. The float data type must conform to the IEEE 754 single precision storage format.

double [2]

A 64-bit floating-point. The double data type must conform to the IEEE 754 double precision storage format.

half

A 16-bit floating-point. The half data type must conform to the IEEE 754-2008 half precision storage format.

void

The void type comprises an empty set of values; it is an incomplete type that cannot be completed.

Most built-in scalar data types are also declared as appropriate types in the OpenCL API (and header files) that can be used by an application. The following table describes the built-in scalar data type in the OpenCL C++ programming language and the corresponding data type available to the application:

Table 3. Host Scalar Built-in Data Types
Type in OpenCL Language API type for application

bool

n/a, i.e., there is no corresponding cl_bool type.

char

cl_char

unsigned char, uchar

cl_uchar

short

cl_short

unsigned short, ushort

cl_ushort

int

cl_int

unsigned int, uint

cl_uint

long

cl_long

unsigned long, ulong

cl_ulong

float

cl_float

double

cl_double

half

cl_half

void

void

Built-in Half Data Type

The half data type must be IEEE 754-2008 compliant. half numbers have 1 sign bit, 5 exponent bits, and 10 mantissa bits. The interpretation of the sign, exponent and mantissa is analogous to IEEE 754 floating-point numbers.

The exponent bias is 15. The half data type must represent finite and normal numbers, denormalized numbers, infinities and NaN. Denormalized numbers for the half data type which may be generated when converting a float to a half using vstore_half and converting a half to a float using vload_half cannot be flushed to zero.

Conversions from float to half correctly round the mantissa to 11 bits of precision.

Conversions from half to float are lossless; all half numbers are exactly representable as float values.

The half data type can only be used to declare a pointer to a buffer that contains half values. All other operations are not allowed if the cl_khr_fp16 extension is not supported.

A few valid examples are given below:

#include <opencl_def>
#include <opencl_memory>
#include <opencl_vector_load_store>

float bar(half *a) {
  return cl::vload_half< 1 >(0, a);
}

kernel void foo(cl::global_ptr<half> pg) { //ok: a global pointer
                                           // passed from the host
    int offset = 1;

    half *ptr = pg.get() + offset; //ok: half pointer arithmetic
    float b = bar(ptr);

    if(b < *ptr) { //not allowed: it is only supported if cl_khr_fp16
                   // extension is enabled
      //...
    }
}

The half scalar data type is required to be supported as a data storage format. Vector data load and store functions (described in Vector Data Load and Store Functions section) must be supported.

cl_khr_fp16 extension

This extension adds support for half scalar and vector types as built-in types that can be used for arithmetic operations, conversions etc. An application that wants to use half and halfn types will need to specify -cl-fp16-enable compiler option (Double and half-precision floating-point options section).

The OpenCL compiler accepts an h and H suffix on floating point literals, indicating the literal is typed as a half

A few valid examples:

#include <opencl_def>
#include <opencl_memory>

half bar(half a) {
    half b = a;
    b += 10.0h; //ok: cl_khr_fp16 extension is enabled. All arithmetic
                // operations on half built-in type are available

    return b;
}

kernel void foo(cl::global_ptr<half> pg) {
    int offset = 1;

    half *ptr = pg.get() + offset;
    half b = bar(*ptr);

    if(b < *ptr) { //ok: cl_khr_fp16 extension is enabled.
                   // All comparision operations are available
      //...
    }
}
Hexadecimal floating point literals

Hexadecimal floating point literals are supported in OpenCL C++.

float f = 0x1.fffffep127f
double d = 0x1.fffffffffffffp1023;
half h = 0x1.ffcp15h;

2.1.2. Built-in Vector Data Types

Supported Vector Data Types

The bool, char, unsigned char, short, unsigned short, int, unsigned int, long, unsigned long, half, float and double vector data types are supported. The vector data type is defined with the type name i.e. bool, char, uchar, short, ushort, int, uint, long, ulong, half, float or double followed by a literal value n that defines the number of elements in the vector. Supported values of n are 2, 3, 4, 8, and 16 for all vector data types.

Table 4. Device Built-in Vector Data Types
Type Description

booln

A vector of n boolean values.

charn

A vector of n 8-bit signed two’s complement integer values.

ucharn

A vector of n 8-bit unsigned integer values.

shortn

vector of n 16-bit signed two’s complement integer values.

ushortn

A vector of n 16-bit unsigned integer values.

intn

A vector of n 32-bit signed two’s complement integer values.

uintn

A vector of n 32-bit unsigned integer values.

longn

A vector of n 64-bit signed two’s complement integer values.

ulongn

A vector of n 64-bit unsigned integer values.

halfn

A vector of n 16-bit floating-point values.

floatn

A vector of n 32-bit floating-point values.

doublen

A vector of n 64-bit floating-point values.

The built-in vector data types are also declared as appropriate types in the OpenCL API (and header files) that can be used by an application. The following table describes the built-in vector data type in the OpenCL C++ programming language and the corresponding data type available to the application:

Table 5. Host Built-in Vector Data Types
Type in OpenCL Language API type for application

booln

n/a, i.e., there is no corresponding cl_booln type.

charn

cl_charn

ucharn

cl_ucharn

shortn

cl_shortn

ushortn

cl_ushortn

intn

cl_intn

uintn

cl_uintn

longn

cl_longn

ulongn

cl_ulongn

halfn

cl_halfn

floatn

cl_floatn

doublen

cl_doublen

The halfn vector data type is required to be supported as a data storage format. Vector data load and store functions (described in Vector Data Load and Store Functions section) must be supported.

Support for the doublen vector data type is optional.

Vector Changes to C++14 standard
  1. Vector types are classified as fundamental ([ISO/IEC 14882:2014: basic.fundamental, ch. 3.9.1]) and literal types

    Note
    A vector type behave similarly to a trivially destructible class with all data members of literal type and all of its constructors defined as constexpr constructors
  2. Abbreviating vector type as Tn, T is called the component type of a vector. The numerical value n specifies number of components in a vector. Device built-in vector data types table specifies supported vector types.

    A vector type which component type is integral type is called integral vector type. A vector type which component is floating-point type is called floating-point vector type.

    float8 a; // component type: float, number of components: 8
    uint16 b; // component type: uint, number of components: 16
  3. An integral vector type can be used as type of value of non-type template-parameter. The change is introduced by following changes in C++ specification:

    • [ISO/IEC 14882:2014: temp.param, ch. 14.1 (4, 4.1)] Template parameters: A non-type template-parameter shall have one of the following (optionally cv-qualified) types:

      • integral, integral vector or enumeration type,

      • integral, integral vector or enumeration type,

      • [ … ]

    • [ISO/IEC 14882:2014: temp.param, ch. 14.1 (7)] Template parameters: A non-type template-parameter shall not be declared to have floating point, floating-point vector, class, or void type.

    • [ISO/IEC 14882:2014: temp.type, ch. 14.4 (1, 1.3)] Type equivalence: Two template-ids refer to the same class, function, or variable if

      • [ … ]

      • their corresponding non-type template arguments of integral, integral vector or enumeration type have identical values and

      • [ … ]

    • [ISO/IEC 14882:2014: temp.res, ch. 14.6 (8, 8.3, 8.3.1)] Name resolution: […] If the interpretation of such a construct in the hypothetical instantiation is different from the interpretation of the corresponding construct

      • integral, integral vector or enumeration type, in any actual instantiation of the template, the program is ill-formed; no diagnostic is required. This can happen in situations including the following:

      • [ … ]

      • constant expression evaluation (5.20) within the template instantiation uses

        • the value of a const object of integral, integral vector or unscoped enumeration type or

        • [ … ]

      • [ … ]

Vector Component Access
  1. The components of vector type can be accessed using swizzle expression. The syntax of a swizzle expression is similar to syntax used in class member access expression [ISO/IEC 14882:2014: expr.ref, ch. 5.2.5]: The swizzle expression is a postfix expression formed with a postfix expression followed by a dot . or an arrow -> and then followed by an vector-swizzle-selector. The postfix expression before the dot or arrow is evaluated. The result of that evaluation, together with the vector-swizzle-selector, determines the result of the entire postfix expression.

    float4 v1 = float4(1.0f, 2.0f, 3.0f, 4.0f);
    float4 *pv1 = &v1;
    
    float2 v2 = v1.xz; // v1.xz is a swizzle expression
    float3 v3 = pv1->s321; // pv1->s321 is a swizzle expression
                           // equivalent to (*pv1).s321
    (*pv1).rgb = float3(0.0f, 0.5f, 1.0f); // (*pv1).rgb is a swizzle expression
    pv1->lo.hi = 0.0f; // pv1->lo and pv1->lo.hi are swizzle
                       // expressions
  2. For the first option (dot) the first expression shall have vector type or be a swizzle expression which results in vector-swizzle of vector type. For the second option (arrow) the first expression shall have pointer to vector type. The expression E1->E2 is converted to the equivalent form (*(E1)).E2; the remainder of Vector Component Access will address only the first option (dot).

    Note
    (*(E1)) is lvalue. In either case, the vector-swizzle-selector shall name a vector component selection of a swizzle.
    uint8 v1 = uint8(10, 11, 12, 13, 14, 15, 16, 17);
    
    uint4 v2 = v1.s7301; // correct
    uint3 v3 = (&v1)->s246; // correct
    uint4 v4 = v1->s0123; // ill-formed: v1 is not a pointer to
                          //             vector type
    
    uint8 *pv1 = &v1;
    
    uint2 v5 = pv1->S13; // correct
    uint2 v6 = (*pv1).s0745.even; // correct
    uint4 v7 = pv1.odd; // ill-formed: pv1 is not vector or
                        // vector-swizzle
  3. Abbreviating postfix-expression.vector-swizzle-selector as E1.E2, E1 is called the vector expression. The type and value category of E1.E2 are determined as follows. In the remainder of Vector Component Access, cq represents either const or the absence of const and vq represents either volatile or the absence of volatile. cv represents an arbitrary set of cv-qualifiers, as defined in [ISO/IEC 14882:2014: basic.type.qualifier, ch. 3.9.3] .

  4. vector-swizzle-selector is subset of identifier with following syntax:

    vector-swizzle-selector:

    • vector-swizzle-xyzw-selector:

      • vector-swizzle-xyzw-selector-value

      • vector-swizzle-xyzw-selector vector-swizzle-xyzw-selector-value

    • vector-swizzle-rgba-selector:

      • vector-swizzle-rgba-selector-value

      • vector-swizzle-rgba-selector vector-swizzle-rgba-selector-value

    • vector-swizzle-special-selector:

      • hi

      • lo

      • even

      • odd

    • vector-swizzle-num-selector:

      • s vector-swizzle-num-selector-values

      • S vector-swizzle-num-selector-values

    vector-swizzle-num-selector-values:

    • vector-swizzle-num-selector-value

    • vector-swizzle-num-selector-values vector-swizzle-num-selector-value

    vector-swizzle-xyzw-selector-value: one of x y z w

    vector-swizzle-rgba-selector-value: one of r g b a

    vector-swizzle-num-selector-value: one of 0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F

    with following restrictions:

    • vector-swizzle-selector in a form of vector-swizzle-special-selector shall only be used with vector expression with at least 2 components.

    • vector-swizzle-selector shall not select components beyond those available in vector expression.

      Note
      Selector values and their corresponding components in swizzle table describes relation between selector value and components.
    • vector-swizzle-selector shall have swizzle size of 1, 2, 3, 4, 8 or 16.

      Note
      Result from the swizzle expression shall be either of scalar or of valid vector type.

    If vector-swizzle-selector does not meet requirements, the swizzle expression is ill-formed.

    int2 v2;
    int3 v3;
    int4 v4;
    int8 v8;
    int16 v16;
    
    v4.xyz = int3(1, 2, 3); // correct: xyz selector
    v4.baS01 = v8.lo; // ill-formed: baS01 is mix of rgba
                      // and numerical selectors
    v3.rx = int2(20, 7); // ill-formed: mix of rgba and
                         // xyzw selectors
    
    int v2c1 = v2.z; // correct: xyzw selector
    int v3c1 = v3.b; // correct: rgba selector
    int2 v4c1 = v4.ww; // correct: xyzw selector
    int3 v8c1 = v8.xyz; // ill-formed: xyzw and rgba selectors
                        // are not allowed on vector expressions
                        // with more than 4 components
    int2 v8c2 = v8.hi.xyz; // correct: xyzw selector on vector
                           // expression v8.hi (vector-swizzle
                           // of int4 type)
    
    int2 v3c2 = v3.odd; // correct: special selector
    int2 v3c2 = v3.x.even; // ill-formed: #1 vector expression
                           // is invalid (vector swizzle of
                           // scalar type)
                           // #2 special selector cannot be
                           // used with less than 2 components
    
    v3.x = 1; // correct: xyzw selector
    v3.w = 2; // ill-formed: there is no "w" component in int3
    v2.gb = v4.hi; // ill-formed: there is no "b" component in int2
    v8.S7890 = v4; // ill-formed: int8 allows numerical selector
                   // in range 0-7
    
    auto v16c1 = v16.s012; // correct: numerical selector
    auto v16c2 = v16.s467899; // ill-formed: swizzle expression
                              // has not allowed size
                              // (there is no int6 type)
    
    int16 vv1 = int16(v16.S98aabb01, v2, v2.gr, v3.xxxx); // correct
    int16 vv2 = int16(v16.S98aabb0123, v2.gr, v3.xxxx);
                               // ill-formed:
                               // although it sums up to 16
                               // components the
                               // S98aabb0123 selector has invalid
                               // swizzle size (there is no int10)
    
  5. vector-swizzle-selector, in a form of vector-swizzle-xyzw-selector, vector-swizzle-rgba-selector or vector-swizzle-num-selector can specify multiple values. Each value selects single component. Values in a selector can be repeated and specified in any order. A number of values in a selector including repeated values is called the swizzle size.

    Table 6. Selector values and their corresponding components in swizzle
    Selector Selector value Selected component Required number of components in vector expression

    vector-swizzle-xyzw-selector

    x

    1st

    2, 3 or 4

    vector-swizzle-xyzw-selector

    y

    2nd

    2, 3 or 4

    vector-swizzle-xyzw-selector

    z

    3rd

    3 or 4

    vector-swizzle-xyzw-selector

    w

    4th

    4

    vector-swizzle-rgba-selector

    r

    1st

    2, 3 or 4

    vector-swizzle-rgba-selector

    g

    2nd

    2, 3 or 4

    vector-swizzle-rgba-selector

    b

    3rd

    3 or 4

    vector-swizzle-rgba-selector

    a

    4th

    4

    vector-swizzle-num-selector

    0

    1st

    2, 3, 4, 8 or 16

    vector-swizzle-num-selector

    1

    2nd

    2, 3, 4, 8 or 16

    vector-swizzle-num-selector

    2

    3rd

    3, 4, 8 or 16

    vector-swizzle-num-selector

    3

    4th

    4, 8 or 16

    vector-swizzle-num-selector

    4

    5th

    8 or 16

    vector-swizzle-num-selector

    5

    6th

    8 or 16

    vector-swizzle-num-selector

    6

    7th

    8 or 16

    vector-swizzle-num-selector

    7

    8th

    8 or 16

    vector-swizzle-num-selector

    8

    9th

    16

    vector-swizzle-num-selector

    9

    10th

    16

    vector-swizzle-num-selector

    a or A

    11th

    16

    vector-swizzle-num-selector

    b or B

    12th

    16

    vector-swizzle-num-selector

    c or C

    13th

    16

    vector-swizzle-num-selector

    d or D

    14th

    16

    vector-swizzle-num-selector

    e or E

    15th

    16

    vector-swizzle-num-selector

    f or F

    16th

    16

  6. vector-swizzle-selector in a form of vector-swizzle-special-selector shall select:

    • if number of components in vector expression is 3, the same components as if number of components of the vector expression was 4 and the 4-th component was undefined.

      Note
      If 4-th component is read, the returned value is undefined; all writes to 4-th component shall be discarded.
    • otherwise, half of components of vector expression with

      • hi - highest numerical selector values in ascending order (higher half of the vector)

      • lo - lowest numerical selector values in ascending order (lower half of the vector)

      • even - even numerical selector values in ascending order

      • odd - odd numerical selector values in ascending order

      Note
      Special selector values table describes special selector values and their numerical equivalents.
      Table 7. Special selector values
      Number of components in vector expression Selector value Equivalent numerical selector Number of components in result vector swizzle (swizzle size)

      2

      hi

      s1

      1

      3

      hi

      s2? [3]

      2

      4

      hi

      s23

      2

      8

      hi

      s4567

      4

      16

      hi

      s89abcdef

      8

      2

      lo

      s0

      1

      3

      lo

      s01

      2

      4

      lo

      s01

      2

      8

      lo

      s0123

      4

      16

      lo

      s01234567

      8

      2

      even

      s0

      1

      3

      even

      s02

      2

      4

      even

      s02

      2

      8

      even

      s0246

      4

      16

      even

      s02468ace

      8

      2

      odd

      s1

      1

      3

      odd

      s1? [3]

      2

      4

      odd

      s13

      2

      8

      odd

      s1357

      4

      16

      odd

      s13579bdf

      8

      float8 v = float8(1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f);
      
      auto vv1 = v.hi;   // vv1 = float4(5, 6, 7, 8)
      auto vv2 = v.lo;  // vv2 = float4(1, 2, 3, 4)
      auto vv3 = v.even; // equivalent of v.s0246; vv3 = float4(1, 3, 5, 7)
      auto vv4 = v.odd;  // equivalent of v.s1357; vv4 = float4(2, 4, 6, 8)
      
      auto vv5 = v.odd.even; // vv5 = float2(2, 6)
      
      int3 sv = int3(10, 20, 30);
      
      // ? means undefined value
      auto svv1 = sv.hi;  // svv1 = int2(30, ?)
      auto svv2 = sv.odd; // svv2 = int2(20, ?)
      
      sv.hi = int2(-123, 456); // write to 4-th channel in sv is discarded;
                               // equivalent of sv.s2 = int2(-123, 456).s0
  7. The value of a swizzle expression E1.E2 is vector-swizzle. The expression designates group of components of the object designated by expression E1. Selector E2 specifies which components are designated, how many times and in which order.

    Assuming that in the type of a vector expression E1 is cv Tn where T denotes type of components and n their number in vector type, the resulting vector-swizzle shall have:

    • scalar type cv T if it is result of a swizzle expression with swizzle size of one or

    • vector type cv Tm if it is result of a swizzle expression with swizzle size of two or more.

      Note
      m is a swizzle size.

    If E1 is an lvalue, then E1.E2 is an lvalue; if E1 is an xvalue, then E1.E2 is an xvalue; otherwise, it is a prvalue.

    long2 v;
    const long2  pv = &v;
    
    auto vc1 = pv->x; // pv->x is lvalue vector-swizzle of
                      // scalar type: const long
    auto vc2 = pv->rg; // pv->rg is lvalue vector-swizzle of
                       // vector type: const long2
    
    auto  vc3 = uchar4(1).xxy; // uchar4(1).xxy is prvalue
                               // vector-swizzle
                               // of vector type: uchar3
    
    v.x = long2(1, 2); // ill-formed: cannot assign prvalue of long2
                       // to lvalue vector-swizzle of
                       // scalar type: long - types do not
                       // match
    
  8. A vector-swizzle with vector type T shall have the same number of components as number of components of T. Each component of the vector-swizzle refers to component from E1 designated by corresponding value specified in selector E2, assuming that E1.E2 is swizzle expression used to create the vector-swizzle.

    Note
    First component refers to component from E1 selected by first value in selector E2, second - by second value and so on.

    A vector-swizzle with scalar type T shall behave as value of T and refer to component from E1 designated by E2's value, assuming E1.E2 is swizzle expression used to create the vector-swizzle.

    Note
    It is similar to reference bounded to value of selected component from E1.
  9. A vector-swizzle shall have scalar or vector type. The address-of operator & shall not be applied to vector-swizzle, so there are no pointers to vector-swizzles. A non-const reference shall not be bound to vector-swizzle.

    Note
    If the initializer for a reference of type const T& is lvalue that refers to vector-swizzle, the reference is bound to a temporary initialized to hold the value of the vector-swizzle; the reference is not bound to the vector-swizzle directly.

    There is no declarator for vector-swizzle.

    Note
    Any variable, member or type declaration shall not involve vector-swizzle; vector-swizzle cannot be stored.

    An alignment-specifier shall not be applied to vector-swizzle.

    float4 v;
    
    auto pv1 = &v; // correct: pv1 points to v
    auto pv2 = &v.xy; // ill-formed: address-of operator & is not
                      // allowed on vector-swizzle
    
    const auto &rv1 = v.xx; // correct: refers to temporary value of
                            // float2 type initialized with
                            // value of vector-swizzle
    float2 &rv2 = v.xy; // ill-formed: binding to non-const reference
                        // is not allowed
  10. A result vector-swizzle from swizzle expression E1.E2 is modifiable if:

    • Vector expression E1 is modifiable lvalue and

    • Each component selected by vector-swizzle-selector E2 is selected at most once.

    Expression which modifies unmodifiable vector-swizzle is ill-formed.

    Changes applied to modifiable vector-swizzle are applied to components of E1 referred by the vector-swizzle or by its components.

    char4 v;
    const char4  cv;
    
    v.yx = char2(33, 45); // correct
    v.zzwx = cv; // ill-formed: v.zzwx is not modifiable
                 // (repeated components)
    cv.zxy = char3(1); // ill-formed: cv.zxy is not modifiable
                       // (cv is const)
  11. A prvalue for vector-swizzle of T type can be converted to a prvalue of T type.

    This conversion is called swizzle-to-vector conversion. swizzle-to-vector conversion shall be applied if necessary in all contexts where lvalue-to-rvalue conversions are allowed.

    Note
    swizzle-to-vector conversion shall be applied after lvalue-to-rvalue conversions and before any arithmetic conversions.
  12. A glvalue vector-swizzle of scalar or vector type T can be used in all expressions where glvalue of type T can be used except those which do not meet requirements and restrictions for vector-swizzle.

    Note
    For example the address-of operator & and binding to non-const reference are one of them.
  13. A swizzle expression E1.E2 where E2 selects all components of vector expression E1 in order of their numerical selector values is called identity swizzle.

    Note
    Components selected in E2 are not repeated.
  14. Additional changes to C++ specification:

    • [ISO/IEC 14882:2014: expr.static.cast, ch. 5.2.9 (3)] static_cast: If value is not a bit-field or a vector-swizzle, […]; if value is a vector-swizzle, the lvalue-to-rvalue conversion and swizzle-to-vector conversion are applied to the vector-swizzle and the resulting prvalue is used as the expression of the static_cast for the remainder of this section; otherwise, […]

    • [ISO/IEC 14882:2014: expr.unary.op, ch. 5.3.1 (5)] Unary operators: […] The operand of & shall not be a bit-field or a vector-swizzle.

    • [ISO/IEC 14882:2014: expr.pre.incr, ch. 5.3.2 (1)] Increment and decrement: The result is the updated operand; it is an lvalue, and it is a bit-field or a vector-swizzle if the operand is respectively a bit-field or a vector-swizzle.

    • [ISO/IEC 14882:2014: expr.sizeof, ch. 5.3.3 (2)] Sizeof: […] When applied to a vector-swizzle which has type T, the result is the same as result from sizeof(T).

    • [ISO/IEC 14882:2014: expr.cond, ch. 5.16 (2.1)] Conditional operator: - […] The conditional-expression is a bit-field or a vector-swizzle if that operand is respectively a bit-field or a vector-swizzle.

    • [ISO/IEC 14882:2014: expr.cond, ch. 5.16 (4)] Conditional operator: If the second and third operands are glvalues of the same value category and have the same type, the result is of that type and value category and it is a bit-field if the second or the third operand is a bit-field, or if both are bit-fields. The result is also a vector-swizzle if the second or the third operand is a vector-swizzle, or if both are vector-swizzles.

      Note
      An operand is converted to vector-swizzle if required by applying identity swizzle expression to it.
    • [ISO/IEC 14882:2014: expr.ass, ch. 5.18 (1)] Assignment and compound assignment operators: The result in all cases is a bit-field or a vector-swizzle if the left operand is respectively a bit-field or a vector-swizzle.

    • [ISO/IEC 14882:2014: expr.comma, ch. 5.19 (1)] Comma operator: The type and value of the result are the type and value of the right operand; the result is of the same value category as its right operand, and is a bit-field if its right operand is a glvalue and a bit-field, and is a vector-swizzle its right operand is a glvalue and a vector-swizzle.

    • [ISO/IEC 14882:2014: dcl.type.simple, ch. 7.1.6.2 (4, 4.1)] Simple type specifiers: For an expression e, the type denoted by decltype(e) is defined as follows:

      • if e is an unparenthesized id-expression or an unparenthesized class member access (5.2.5) or unparenthesized swizzle expression, decltype(e) is the type of the entity named by e. If there is no such entity, or if e names a set of overloaded functions, the program is ill-formed.

Vector Constructors

Vector constructors are defined to initialize a vector data type from a list of scalar or vectors. The forms of the constructors that are available is the set of possible argument lists for which all arguments have the same element type as the result vector, and the total number of elements is equal to the number of elements in the result vector. In addition, a form with a single scalar of the same type as the element type of the vector is available.

For example, the following forms are available for float4:

float4( float, float, float, float )
float4( float2, float, float )
float4( float, float2, float )
float4( float, float, float2 )
float4( float2, float2 )
float4( float3, float )
float4( float, float3 )
float4( float )

float4{ float, float, float, float }
float4{ float2, float, float }
float4{ float, float2, float }
float4{ float, float, float2 }
float4{ float2, float2 }
float4{ float3, float }
float4{ float, float3 }
float4{ float }

Operands are evaluated by standard rules for function evaluation, except that implicit scalar-to-vector conversion shall not occur. The order in which the operands are evaluated is undefined. The operands are assigned to their respective positions in the result vector as they appear in memory order. That is, the first element of the first operand is assigned to result.x, the second element of the first operand (or the first element of the second operand if the first operand was a scalar) is assigned to result.y, etc. In the case of the form that has a single scalar operand, the operand is replicated across all lanes of the vector.

Examples:

float4 f = float4(1.0f, 2.0f, 3.0f, 4.0f);

uint4  u = uint4(1); // u will be (1, 1, 1, 1).

float4 f = float4(float2(1.0f, 2.0f),
                  float2(3.0f, 4.0f));

float4 f = float4(1.0f, float2(2.0f, 3.0f), 4.0f);

float4 f = float4(1.0f, 2.0f); // error

int4 i = (int4)(1, 2, 3, 4); // warning, vector literals (from OpenCL C) are
                             // not part of OpenCL C++,
                             // this expression will be evaluated to (int4)4,
                             // and i will be (4, 4, 4, 4)
Vector Types and Usual Arithmetic Conversions

Many operators that expect operands of arithmetic type cause conversions and yield result types in a similar way. The purpose is to determine a common real type for the operands and result. For the specified operands, each operand is converted, without change of type domain, to a type whose corresponding real type is the common real type. For this purpose, all vector types shall be considered to have higher conversion ranks than scalars. Unless explicitly stated otherwise, the common real type is also the corresponding real type of the result, whose type domain is the type domain of the operands if they are the same, and complex otherwise. This pattern is called the usual arithmetic conversions. If the operands are of more than one vector type, then an error shall occur. Implicit conversions between vector types are not permitted, per Implicit Type Conversions section.

Otherwise, if there is only a single vector type, and all other operands are scalar types, the scalar types are converted to the type of the vector element, then widened into a new vector containing the same number of elements as the vector, by duplication of the scalar value across the width of the new vector.

2.1.3. Alignment of Types

A data item declared to be a data type in memory is always aligned to the size of the data type in bytes. For example, a float4 variable will be aligned to a 16-byte boundary, a char2 variable will be aligned to a 2-byte boundary.

For 3-component vector data types, the size of the data type is 4 * sizeof(component). This means that a 3-component vector data type will be aligned to a 4 * sizeof(component) boundary. The vload3 and vstore3 built-in functions can be used to read and write, respectively, 3-component vector data types from an array of packed scalar data type.

A built-in data type that is not a power of two bytes in size must be aligned to the next larger power of two. This rule applies to built-in types only, not structs or unions.

The OpenCL C++ compiler is responsible for aligning data items to the appropriate alignment as required by the data type. For arguments to a kernel function declared to be a pointer to a data type, the OpenCL compiler can assume that the pointee is always appropriately aligned as required by the data type. The behavior of an unaligned load or store is undefined, except for the vloadn, vload_halfn, vstoren, and vstore_halfn functions defined in Vector Data Load and Store Functions section. The vector load functions can read a vector from an address aligned to the element type of the vector. The vector store functions can write a vector to an address aligned to the element type of the vector.

2.2. Keywords

The following names are reserved for use as keywords in OpenCL C++ and shall not be used otherwise.

2.3. Implicit Type Conversions

Implicit conversions between scalar built-in types defined in Device built-in scalar data types table (except void) are supported. When an implicit conversion is done, it is not just a re-interpretation of the expression’s value, but a conversion of that value to an equivalent value in the new type. For example, the integer value 5 will be converted to the floating-point value 5.0.

Implicit conversions from a scalar type to a vector type are allowed. In this case, the scalar may be subject to the usual arithmetic conversion to the element type used by the vector. The scalar type is then widened to the vector. If conversion from a scalar type to the element type used by the vector result in truncation or precision loss, the program is ill-formed, with the exception that:

  • if scalar value is prvalue of literal type and the value is representable as the element type, the conversion should take place without error (warnings may be generated in this case).

Implicit conversions between built-in vector data types are disallowed. Explicit conversions described in Conversions Library section must be used instead.

Implicit conversions for pointer types follow the rules described in the C++14 specification.

2.4. Expressions

All expressions behave as described in [ISO/IEC 14882:2014: expr, ch. 5] with the the restrictions described in OpenCL C++ Restrictions section and the following changes:

  1. All built-in operators have their vector counterparts.

  2. All built-in vector operations, apart from conditional operator, are performed component-wise.

    Note
    Conditional operator logical-or-expression cannot be of vector type.
  3. Built in operators taking two vectors require that vectors have the same number of components, otherwise expression is ill-formed.

  4. Vector swizzle operations meet extra requirements and restrictions described in Vector Component Access section.

  5. Implicit and explicit casts between vector types are not legal. The conversion between vector types can be done only using convert_cast from Conversions Library section.

    Examples:

    int4   i;
    uint4  u = (uint4) i; // not allowed
    
    float4 f;
    int4   i = static_cast<int4>(f); // not allowed
    
    float4 f;
    int8   i = (int8) f; // not allowed
  6. Implicit and explicit casts from scalar to vector types are supported.

  7. All built-in arithmetic operators return result of the same built-in type (integer or floating-point) as the type of the operands, after operand type conversion. After conversion, the following cases are valid:

    1. The two operands are scalars. In this case, the operation is applied, resulting in a scalar.

    2. One operand is a scalar, and the other is a vector. In this case, the scalar may be subject to the usual arithmetic conversion to the element type used by the vector operand. The scalar type is then widened to a vector that has the same number of components as the vector operand. The operation is done component-wise resulting in the same size vector.

    3. The two operands are vectors of the same type. In this case, the operation is done component-wise resulting in the same size vector.

  8. The built-in relational and equality operators equal (==), not equal (!=), greater than (>), greater than or equal (>=), less than (<), and less than or equal (<=) operate on scalar and vector types. All relational and equality operators result in a boolean (scalar or vector) type. After operand type conversion, the following cases are valid:

    1. The two operands are scalars. In this case, the operation is applied, resulting in a boolean scalar.

    2. One operand is a scalar, and the other is a vector. In this case, the scalar may be subject to the usual arithmetic conversion to the element type used by the vector operand. The scalar type is then widened to a vector that has the same number of components as the vector operand. The operation is done component-wise resulting in the same size boolean vector.

    3. The two operands are vectors of the same type. In this case, the operation is done component-wise resulting in the same size boolean vector.

  9. The built-in bitwise operators and (&), or (|), exclusive or (^), not (~) operate on all scalar and vector built-in types except the built-in scalar and vector float types. For vector built-in types, the operators are applied component-wise. If one operand is a scalar and the other is a vector, the scalar may be subject to the usual arithmetic conversion to the element type used by the vector operand. The scalar type is then widened to a vector that has the same number of components as the vector operand. The operation is done component-wise resulting in the same size vector.

  10. The built-in logical operators and (&&), or (||) operate on all scalar and vector built-in types. For scalar built-in types the logical operator and (&&) will only evaluate the right hand operand if the left hand operand compares unequal to false. For scalar built-in types the logical operator or (||) will only evaluate the right hand operand if the left hand operand compares equal to false. For built-in vector types, both operands are evaluated and the operators are applied component-wise. If one operand is a scalar and the other is a vector, the scalar may be subject to the usual arithmetic conversion to the element type used by the vector operand. The scalar type is then widened to a vector that has the same number of components as the vector operand. The operation is done component-wise resulting in the same size vector.

    The result is a scalar or vector boolean.

  11. The built-in logical unary operator not (!) operates on all scalar and vector built-in types. For built-in vector types, the operators are applied component-wise.

    The result is a scalar or vector boolean.

  12. The built-in conditional operator (?: described in [ISO/IEC 14882:2014: expr, ch. 5.2] operates on three expressions (exp1 ? exp2 : exp3). This operator evaluates the first expression exp1, which must be a scalar boolean result. If the result is true it selects to evaluate the second expression, otherwise it selects to evaluate the third expression. The second and third expressions can be any type, as long their types match, or there is a conversion in Implicit Type Conversions section that can be applied to one of the expressions to make their types match, or one is a vector and the other is a scalar and the scalar may be subject to the usual arithmetic conversion to the element type used by the vector operand and widened to the same type as the vector type.

    This resulting matching type is the type of the entire expression.

  13. The built-in shift operators are supported for built-in vector types except the built-in scalar and vector float types. For built-in vector types, the operators are applied component-wise. For the right-shift (>>), left-shift (<<) operators, the rightmost operand must be a scalar if the first operand is a scalar, and the rightmost operand can be a vector or scalar if the first operand is a vector. The result of E1 << E2 is E1 left-shifted by log2(N) least significant bits in E2 viewed as an unsigned integer value, where N is the number of bits used to represent the data type of E1 after integer promotion, if E1 is a scalar, or the number of bits used to represent the type of E1 elements, if E1 is a vector. The vacated bits are filled with zeros. The result of E1 >> E2 is E1 right-shifted by log2(N) least significant bits in E2 viewed as an unsigned integer value, where N is the number of bits used to represent the data type of E1 after integer promotion, if E1 is a scalar, or the number of bits used to represent the type of E1 elements, if E1 is a vector.

    If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the vacated bits are filled with zeros.

    If E1 has a signed type and a negative value, the vacated bits are filled with ones.

2.5. Address Spaces

The OpenCL C++ kernel language doesn’t introduce any explicit named address spaces, but they are implemented as part of the standard library described in Address Spaces Library section. There are 4 types of memory supported by all OpenCL devices: global, local, private and constant. The developers should be aware of them and know their limitations.

2.5.1. Implicit Storage Classes

The OpenCL C++ compiler can deduce an address space based on the scope where an object is declared:

  • If a variable is declared in program scope, with static or extern specifier and the standard library storage class (Address Spaces Library section) is not used, the variable is allocated in the global memory of a device.

  • If a variable is declared in function scope, without static specifier and the standard library storage class (Address Spaces Library section) is not used, the variable is allocated in the private memory of a device.

2.5.2. Memory Pools

Global

The variables are allocated from the global memory pool if they meet the criteria described in Implicit Storage Classes section for the implicit global storage class or they are declared using explicit global storage class from the standard library (global class section).

The global memory objects can be:

  • Passed by pointer or reference to a kernel from the host. In such case the host manages their visibility, lifetime and a type of allocation.

  • Declared in the program source (static, extern and program scope global variables). In such case they are:

    • the coarse-grained SVM allocations that can be usable by multiple kernels on the same device safely

    • not shared across devices

    • not accessible from the host

    • their lifetime is the same as a program

The non-trivial constructors and destructors are supported with limitations described in Memory initialization section.

The constructors of objects in global memory are executed before the first kernel execution in the program. The destructors executed at program release time.

The additional restrictions may apply if the explicit global storage class is used. Please refer to Restrictions section for more details.

Local

The local variables can be only allocated in a program using the explicit local storage class from the standard library (local class section). This type of memory is allocated for each work-group executing the kernel and exist only for the lifetime of the work-group executing the kernel.

The non-trivial constructors and destructors are supported with limitations described in Memory initialization section.

The constructors of objects in local memory are executed by one work-item before the kernel body execution. The destructors are executed by one work-item after the kernel body execution.

Note
initialization of local variables can cause performance degradation.

The additional restrictions may apply if the explicit local storage class is used. Please refer to Restrictions section for more details.

Private

The variables are allocated from the private memory pool if they meet the criteria described in Implicit Storage Classes for the implicit private storage class or they were declared using explicit private storage class from the standard library (priv class section).

The non-trivial constructors and destructors are supported.

The additional restrictions may apply if the explicit priv storage class is used. Please refer to Restrictions section for more details.

Constant

The constant variables can be only allocated in a program using the explicit constant storage class from the standard library (constant class section). The variables declared using the constant<T> class refer to memory objects allocated from the global memory pool and which are accessed inside a kernel(s) as read-only variables. These read-only variables can be accessed by all (global) work-items of the kernel during its execution.

The constant objects must be constructible at compile time, they cannot have any user defined constructors, destructors, methods and operators. Otherwise behavior is undefined.

The additional restrictions may apply if the explicit constant storage class is used. Please refer to Restrictions section for more details.

2.5.3. Pointers and references

All C++ pointers and references point to an object in the unnamed/generic address space if the explicit address space pointer classes are not used. The explicit address space pointer classes are implemented as a part of the standard library and they are described in Explicit address space pointer classes section.

2.5.4. Memory initialization

Table 8. Supported memory initializers
Storage memory (address space) Scope type Initialization type

uninitialized (no constructor or trivial default constructor)

AND

trivial destructor

initialized by constant expression

AND

trivial destructor

custom initializer

OR

custom destructor

local

program

supported (not zero-pre-init)

not supported

not supported

kernel

supported

Variables are not zero-pre-initialized.

Optional zero-pre-initialization possible using switch: -cl-zero-init-local-mem-vars

supported

Variables are not zero-pre-initialized.

Materialize temporary expressions are not supported.

Optional zero-pre-initialization possible using switch: -cl-zero-init-local-mem-vars

supported

Variables are not zero-pre-initialized.

Materialize temporary expressions are not supported.

Optional zero-pre-initialization possible using switch: -cl-zero-init-local-mem-vars

local (non-kernel)

not supported

not supported

not supported

class (static data member)

supported

Variables are not zero-pre-initialized.

not supported

not supported

global

program

supported

Variables are zero-pre-initialized.

supported

Variables are zero or constexpr-pre-initialized.

supported

kernel / local

supported

Variables are zero-pre-initialized.

supported

Variables are zero or constexpr-pre-initialized.

not supported

class (static data member)

supported

Variables are zero-pre-initialized.

supported

Variables are zero or constexpr-pre-initialized.

not supported

constant

(any)

supported

Variables are zero-pre-initialized.

supported

Variables are zero or constexpr-pre-initialized.

not supported

private

(any)

supported

supported

supported

2.6. Kernel Functions

2.6.1. Function Qualifiers

The kernel (or __kernel) qualifier declares a function to be a kernel that can be executed by an application on an OpenCL device(s). The following rules apply to functions that are declared with this qualifier:

  • It can be executed on the device only.

  • It can be enqueued by the host or on the device.

The kernel and __kernel names are reserved for use as functions qualifiers and shall not be used otherwise.

2.6.2. Restrictions

Kernel Function Restrictions
  • A kernel functions are by implicitly declared as extern "C".

  • A kernel function cannot be overloaded.

  • A kernel function cannot be template functions.

  • A kernel function cannot be called by another kernel function.

  • A kernel function cannot have parameters specified with default values.

  • A kernel function must have the return type void.

  • A kernel function cannot be called main.

Kernel Parameter Restrictions

The OpenCL host compiler and the OpenCL C++ kernel language device compiler can have different requirements for i.e. type sizes, data packing and alignment, etc., therefore the kernel parameters must meet the following requirements:

  • Types passed by pointer or reference must be standard layout types.

  • Types passed by value must be POD types.

  • Types cannot be declared with the built-in bool scalar type, vector type or a class that contain bool scalar or vector type fields.

  • Types cannot be structures and classes with bit field members.

  • Marker types must be passed by value (Marker Types section).

  • global, constant, local storage classes can be passed only by reference or pointer. More details in Explicit address space storage classes section.

  • Pointers and references must point to one of the following address spaces: global, local or constant.

2.7. Preprocessor Directives and Macros

The preprocessing directives defined by the C++14 specification (section 16) are supported.

The #pragma directive is described as:

#pragma pp-tokensopt new-line

A #pragma directive where the preprocessing token OPENCL (used instead of STDC) does not immediately follow pragma in the directive (prior to any macro replacement) causes the implementation to behave in an implementation-defined manner. The behavior might cause translation to fail or cause the translator or the resulting program to behave in a non-conforming manner. Any such pragma that is not recognized by the implementation is ignored. If the preprocessing token OPENCL does immediately follow pragma in the directive (prior to any macro replacement), then no macro replacement is performed on the directive, and the directive shall have one of the following forms whose meanings are described elsewhere:

#pragma OPENCL FP_CONTRACT on-off-switch // on-off-switch: one of ON OFF DEFAULT

#pragma OPENCL EXTENSION extensionname : behavior
#pragma OPENCL EXTENSION all : behavior

The following predefined macro names are available.

__FILE__ The presumed name of the current source file (a character string literal).

__LINE__ The presumed line number (within the current source file) of the current source line (an integer constant).

__OPENCL_CPP_VERSION__ substitutes an integer reflecting the OpenCL C++ version specified when compiling the OpenCL C++ program. The version of OpenCL C++ described in this document will have __OPENCL_CPP_VERSION__ substitute the integer 100.

The macro names defined by the C++14 specification in section 16 but not currently supported by OpenCL are reserved for future use.

The predefined identifier __func__ is available.

2.8. Attribute Qualifiers

The [[ ]] attribute qualifier syntax allows additional attributes to be attached to types, variables, kernel functions, kernel parameters, or loops.

Some attributes change the semantics of the program and are required for program correctness. Other attributes are optional hints that may be ignored without affecting program correctness. Nevertheless, frontend compilers that compile to an intermediate representation are required to faithfully pass optional attribute hints with an intermediate representation to device compilers for further processing.

2.8.1. Optional Type Attributes

[[ ]] attribute syntax can be used to specify special attributes of enum, class and union types when you define such types. Two attributes are currently defined for types: aligned, and packed.

You may specify type attributes in an enum, class or union type declaration or definition, or for other types in a typedef declaration.

For an enum, class or union type, you may specify attributes either between the enum, class or union tag and the name of the type, or just past the closing curly brace of the definition. The former syntax is preferred.

cl::aligned (alignment)

This attribute specifies a minimum alignment (in bytes) for variables of the specified type. For example, the declarations:

struct S { short f[3]; } [[cl::aligned(8)]];
typedef int more_aligned_int [[cl::aligned(8)]];

force the compiler to insure (as far as it can) that each variable whose type is struct S or more_aligned_int will be allocated and aligned at least on a 8-byte boundary.

Note that the alignment of any given struct or union type is required by the C++ standard to be at least a perfect multiple of the lowest common multiple of the alignments of all of the members of the struct or union in question and must also be a power of two. This means that you can effectively adjust the alignment of a class or union type by attaching an aligned attribute to any one of the members of such a type, but the notation illustrated in the example above is a more obvious, intuitive, and readable way to request the compiler to adjust the alignment of an entire class or union type.

As in the preceding example, you can explicitly specify the alignment (in bytes) that you wish the compiler to use for a given class or union type. Alternatively, you can leave out the alignment factor and just ask the compiler to align a type to the maximum useful alignment for the target machine you are compiling for. For example, you could write:

struct S { short f[3]; } [[cl::aligned]];

Whenever you leave out the alignment factor in an aligned attribute specification, the compiler automatically sets the alignment for the type to the largest alignment which is ever used for any data type on the target machine you are compiling for. In the example above, the size of each short is 2 bytes, and therefore the size of the entire struct S type is 6 bytes. The smallest power of two which is greater than or equal to that is 8, so the compiler sets the alignment for the entire struct S type to 8 bytes.

Note that the effectiveness of aligned attributes may be limited by inherent limitations of the OpenCL device and compiler. For some devices, the OpenCL compiler may only be able to arrange for variables to be aligned up to a certain maximum alignment. If the OpenCL compiler is only able to align variables up to a maximum of 8 byte alignment, then specifying aligned(16) will still only provide you with 8 byte alignment. See your platform-specific documentation for further information.

The aligned attribute can only increase the alignment; but you can decrease it by specifying packed as well. See below.

cl::packed

This attribute, attached to class or union type definition, specifies that each member of the structure or union is placed to minimize the memory required. When attached to an enum definition, it indicates that the smallest integral type should be used.

Specifying this attribute for class and union types is equivalent to specifying the packed attribute on each of the structure or union members.

In the following example struct my_packed_struct’s members are packed closely together, but the internal layout of its s member is not packed. To do that, struct my_unpacked_struct would need to be packed, too.

struct my_unpacked_struct
{
  char c;
  int i;
};

struct [[cl::packed]] my_packed_struct
{
  char c;
  int  i;
  struct my_unpacked_struct s;
};

You may only specify this attribute on the definition of an enum, class or union, not on a typedef which does not also define the enumerated type, structure or union.

2.8.2. Optional Variable Attributes

[[ ]] syntax allows you to specify special attributes of variables or structure fields. The following attribute qualifiers are currently defined:

cl::aligned

This attribute specifies a minimum alignment for the variable or class field, measured in bytes. For example, the declaration:

int x [[cl::aligned(16)]] = 0;

causes the compiler to allocate the global variable x on a 16-byte boundary. The alignment value specified must be a power of two.

You can also specify the alignment of structure fields. For example, to create double-word aligned int pair, you could write:

struct foo { int x[2] [[cl::aligned(8)]]; };

This is an alternative to creating a union with a double member that forces the union to be double-word aligned.

As in the preceding examples, you can explicitly specify the alignment (in bytes) that you wish the compiler to use for a given variable or structure field. Alternatively, you can leave out the alignment factor and just ask the compiler to align a variable or field to the maximum useful alignment for the target machine you are compiling for. For example, you could write:

short array[3] [[cl::aligned]];

Whenever you leave out the alignment factor in an aligned attribute specification, the OpenCL compiler automatically sets the alignment for the declared variable or field to the largest alignment which is ever used for any data type on the target device you are compiling for.

When used on a class, or class member, the aligned attribute can only increase the alignment; in order to decrease it, the packed attribute must be specified as well. When used as part of a typedef, the aligned attribute can both increase and decrease alignment, and specifying the packed attribute will generate a warning.

Note that the effectiveness of aligned attributes may be limited by inherent limitations of the OpenCL device and compiler. For some devices, the OpenCL compiler may only be able to arrange for variables to be aligned up to a certain maximum alignment. If the OpenCL compiler is only able to align variables up to a maximum of 8 byte alignment, then specifying aligned(16) will still only provide you with 8 byte alignment. See your platform-specific documentation for further information.

cl::packed

The packed attribute specifies that a variable or class field should have the smallest possible alignment - one byte for a variable, unless you specify a larger value with the aligned attribute.

Here is a structure in which the field x is packed, so that it immediately follows a:

struct foo
{
  char a;
  int x[2] [[cl::packed]];
};

An attribute list placed at the beginning of a user-defined type applies to the variable of that type and not the type, while attributes following the type body apply to the type.

For example:

/* a has alignment of 128 */
[[cl::aligned(128)]] struct A { int i; } a;

/* b has alignment of 16 */
[[cl::aligned(16)]] struct B { double d; } [[cl::aligned(32)]] b;

struct A a1; /* a1 has alignment of 4 */

struct B b1; /* b1 has alignment of 32 */

2.8.3. Optional Kernel Function Attributes

The kernel qualifier can be used with the [[ ]] attribute syntax to declare additional information about the kernel function. The kernel function attributes must appear immediately before the kernel function to be affected.

The following attributes are supported:

cl::work_group_size_hint

The optional [[cl::work_group_size_hint(X, Y, Z)]] is a hint to the compiler and is intended to specify the work-group size that may be used i.e. value most likely to be specified by the local_work_size argument to clEnqueueNDRangeKernel. For example the [[cl::work_group_size_hint(1, 1, 1)]] is a hint to the compiler that the kernel will most likely be executed with a work-group size of 1.

The specialization constants (Specialization Constants section) can be used as arguments of cl::work_group_size_hint attribute.

cl::required_work_group_size

The optional [[cl::required_work_group_size(X, Y, Z)]] is the work-group size that must be used as the local_work_size argument to clEnqueueNDRangeKernel. This allows the compiler to optimize the generated code appropriately for this kernel.

If Z is one, the work_dim argument to clEnqueueNDRangeKernel can be 2 or 3. If Y and Z are one, the work_dim argument to clEnqueueNDRangeKernel can be 1, 2 or 3.

The specialization constants (Specialization Constants section) can be used as arguments of cl::required_work_group_size(X, Y, Z) attribute.

cl::required_num_sub_groups

The optional [[cl::required_num_sub_groups(X)]] is the number of sub-groups that must be generated by a kernel launch. To ensure that this number is created the queries mapping number of sub-groups to local size may be used. This allows the compiler to optimize the kernel based on the sub-group count and in addition allows the API to enforce correctness of kernel use to the user when concurrency of sub-groups is a requirement.

The specialization constants (Specialization Constants section) can be used as argument of cl::required_num_sub_groups attribute.

cl::vec_type_hint

The optional [[cl::vec_type_hint(<type>)]] is a hint to the compiler and is intended to be a representation of the computational width of the kernel, and should serve as the basis for calculating processor bandwidth utilization when the compiler is looking to autovectorize the code. In the [[cl::vec_type_hint(<type>)]] qualifier <type> is one of the built-in vector types listed in Device built-in vector data types table or the constituent scalar element types. If cl::vec_type_hint(<type>) is not specified, the kernel is assumed to have the [[cl::vec_type_hint(int)]] qualifier.

For example, where the developer specified a width of float4, the compiler should assume that the computation usually uses up to 4 lanes of a float vector, and would decide to merge work-items or possibly even separate one work-item into many threads to better match the hardware capabilities. A conforming implementation is not required to autovectorize code, but shall support the hint. A compiler may autovectorize, even if no hint is provided. If an implementation merges N work-items into one thread, it is responsible for correctly handling cases where the number of global or local work-items in any dimension modulo N is not zero.

Examples:

// autovectorize assuming float4 as the
// basic computation width
[[cl::vec_type_hint(float4)]] kernel
void foo(cl::global_ptr<float4> p) { ... }

// autovectorize assuming double as the
// basic computation width
[[cl::vec_type_hint(double)]] kernel
void foo(cl::global_ptr<float4> p) { ... }

// autovectorize assuming int (default)
// as the basic computation width
kernel void foo(cl::global_ptr<float4> p) { ... }

2.8.4. Optional Kernel Parameter Attributes

The kernel parameter can be used with the [[ ]] attribute syntax to declare additional information about an argument passed to the kernel. The kernel parameter attributes must appear immediately before or after the kernel parameter declaration to be affected.

The following attributes are supported:

cl::max_size

This attribute can be provided with a kernel argument of type constant_ptr<T>, constant<T>*, constant<T>&, local_ptr<T>, local<T>*, local<T>&. The value of the attribute specifies the maximum size in bytes of the corresponding memory object. This size cannot exceed the limits supported by the device:

  • CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE for the kernel arguments in constant memory

  • CL_DEVICE_LOCAL_MEM_SIZE for the kernel arguments in local memory

The specialization constants (Specialization Constants section) can be used as argument of cl::max_size attribute.

Examples:

#include <opencl_memory>

kernel void foo([[cl::max_size(65536)]] cl::constant_ptr<int> arg) {
  //...
}

2.8.5. Optional Loop Attributes

cl::unroll_hint

The [[cl::unroll_hint]] and [[cl::unroll_hint(n)]] attribute qualifiers can be used to specify that a loop (for, while and do loops) can be unrolled. This attribute qualifier can be used to specify full unrolling or partial unrolling by a specified amount. This is a compiler hint and the compiler may ignore this directive.

n is the loop unrolling factor and must be a positive integral compile time constant expression. An unroll factor of 1 disables unrolling. If n is not specified, the compiler determines the unrolling factor for the loop.

Note
The [[cl::unroll_hint(n)]] attribute qualifier must appear immediately before the loop to be affected.

Examples:

[[cl::unroll_hint(2)]]
while (*s != 0)
    *p++ = *s++;

This tells the compiler to unroll the above while loop by a factor of 2.

[[cl::unroll_hint]]
for (int i=0; i<2; i++) {
   //...
}

In the example above, the compiler will determine how much to unroll the loop.

[[cl::unroll_hint(1)]]
for (int i=0; i<32; i++) {
  //...
}

The above is an example where the loop should not be unrolled.

Below are some examples of invalid usage of [[cl::unroll_hint(n)]].

[[cl::unroll_hint(-1)]]
while (/* ... */) {
  //...
}

The above example is an invalid usage of the loop unroll factor as the loop unroll factor is negative.

[[cl::unroll_hint]]
if(/* ... */) {
  //...
}

The above example is invalid because the unroll_hint attribute qualifier is used on a non-loop construct.

kernel void my_kernel(/* ... */) {
  int x;
  [[cl::unroll_hint(x)]]
  for (int i=0; i<x; i++) {
    //...
  }
}

The above example is invalid because the loop unroll factor is not a compile-time constant expression.

cl::ivdep

The [[cl::ivdep]] (ignore vector dependencies) attribute qualifier is a hint to the compiler and may appear in loops to indicate that the compiler may assume there are no memory dependencies across loop iterations in order to autovectorize consecutive iterations of the loop. This attribute qualifier may appear in one of the following forms:

[[cl::ivdep]]
[[cl::ivdep(len)]]

If the parameter len is specified, it is used to specify the maximum number of consecutive iterations without loop-carried dependencies. len is a lower bound on the distance of any loop-carried dependence, and it applies to arbitrary alignment. For example, any 4 consecutive iterations can be vectorized with cl::ivdep(4). The len parameter must be a positive integer. The final decision whether to autovectorize the complete loop may be subject to other compiler heuristics as well as flags e.g., -cl-fast-relaxed-math to ignore non-associated operations.

Examples:

[[cl::ivdep]]
for (int i=0; i<N; i++) {
    C[i+offset] = A[i+offset] * B[i+offset];
}

In the example above, assuming that A and B are not restricted pointers, it is unknown if C aliases A or B. Placing the [[cl::ivdep]] attribute before the loop lets the compiler assume there are no memory dependencies across the loop iterations.

[[c::ivdep(8)]]
for (int i=0; i<N; i++) {
    A[i+K] = A[i] * B[i];
}

In the example above, buffer A is read from and written to in the loop iterations. In each iteration, the read and write to A are to different indices. In this case it is not safe to vectorize the loop to a vector length greater than K, so the len parameter is specified with a value that is known to be not greater than any value that K may take during the execution of loop. In this example we are guaranteed (by len) that K will always be greater than or equal to 8.

Below is an example of invalid usage of [[cl::ivdep]].

[[cl::ivdep(-1)]]
for (int i=0; i<N; i++) {
    C[i+offset] = A[i+offset] * B[i+offset];
}

The above example is an invalid usage of the attribute qualifier as len is negative.

2.8.6. Extending Attribute Qualifiers

The attribute syntax can be extended for standard language extensions and vendor specific extensions. Any extensions should follow the naming conventions outlined in the introduction to section 9 in the OpenCL 2.2 Extension Specification.

Attributes are intended as useful hints to the compiler. It is our intention that a particular implementation of OpenCL be free to ignore all attributes and the resulting executable binary will produce the same result. This does not preclude an implementation from making use of the additional information provided by attributes and performing optimizations or other transformations as it sees fit. In this case it is the programmer’s responsibility to guarantee that the information provided is in some sense correct.

2.9. Restrictions

The following C++14 features are not supported by OpenCL C++:

  • the dynamic_cast operator (ISO C++ Section 5.2.7)

  • type identification (ISO C++ Section 5.2.8)

  • recursive function calls (ISO C++ Section 5.2.2, item 9) unless they are a compile-time constant expression

  • non-placement new and delete operators (ISO C++ Sections 5.3.4 and 5.3.5)

  • goto statement (ISO C++ Section 6.6)

  • register and thread_local storage qualifiers (ISO C++ Section 7.1.1)

  • virtual function qualifier (ISO C++ Section 7.1.2)

  • function pointers (ISO C++ Sections 8.3.5 and 8.5.3) unless they are a compile-time constant expression

  • virtual functions and abstract classes (ISO C++ Sections 10.3 and 10.4)

  • exception handling (ISO C++ Section 15)

  • the C++ standard library (ISO C++ Sections 17 … 30)

  • asm declaration (ISO C++ Section 7.4)

  • no implicit lambda to function pointer conversion (ISO C++ Section 5.1.2, item 6)

  • variadic functions (ISO C99 Section 7.15, Variable arguments <stdarg.h>)

  • and, like C++, OpenCL C++ does not support variable length arrays (ISO C99, Section 6.7.5).

To avoid potential confusion with the above, please note the following features are supported in OpenCL C++:

  • All variadic templates (ISO C++ Section 14.5.3) including variadic function templates are supported.

Note
This page refers to ISO C99 instead of ISO C11 since the ISO C++14 document refers to ISO C99 in ISO C++ Section 1.2 and Annex C.

3. OpenCL C++ Standard Library

OpenCL C++ does not support the C++14 standard library, but instead implements its own standard library. No OpenCL types and functions are auto-included.

3.1. OpenCL Definitions

Header <opencl_def> defines OpenCL scalar, vector types and macros. cl_* types are guaranteed to have exactly the same size as their host counterparts defined in cl_platform.h file.

3.1.1. Header <opencl_def> Synopsis

#define __OPENCL_CPP_VERSION__ 100

typedef __SIZE_TYPE__     size_t;
typedef __PTRDIFF_TYPE__  ptrdiff_t;
typedef decltype(nullptr) nullptr_t;
#define NULL              nullptr

typedef __INT8_TYPE__     int8_t     [[cl::aligned(1)]];
typedef __UINT8_TYPE__    uint8_t    [[cl::aligned(1)]];
typedef __INT16_TYPE__    int16_t    [[cl::aligned(2)]];
typedef __UINT16_TYPE__   uint16_t   [[cl::aligned(2)]];
typedef __INT32_TYPE__    int32_t    [[cl::aligned(4)]];
typedef __UINT32_TYPE__   uint32_t   [[cl::aligned(4)]];
typedef __INT64_TYPE__    int64_t    [[cl::aligned(8)]];
typedef __UINT64_TYPE__   uint64_t   [[cl::aligned(8)]];

#if   __INTPTR_WIDTH__ == 32
typedef int32_t           intptr_t;
typedef uint32_t          uintptr_t;
#elif __INTPTR_WIDTH__ == 64
typedef int64_t           intptr_t;
typedef uint64_t          uintptr_t;
#endif

namespace cl
{
using ::intptr_t;
using ::uintptr_t;
using ::ptrdiff_t;
using ::nullptr_t;
using ::size_t;
}

typedef int8_t            cl_char;
typedef uint8_t           cl_uchar;
typedef int16_t           cl_short
typedef uint16_t          cl_ushort;
typedef int32_t           cl_int;
typedef uint32_t          cl_uint;
typedef int64_t           cl_long;
typedef uint64_t          cl_ulong;

#ifdef cl_khr_fp16
typedef half              cl_half   [[aligned(2)]];
#endif
typedef float             cl_float  [[aligned(4)]];
#ifdef cl_khr_fp64
typedef double            cl_double [[aligned(8)]];
#endif

typedef implementation-defined bool2;
typedef implementation-defined bool3;
typedef implementation-defined bool4;
typedef implementation-defined bool8;
typedef implementation-defined bool16;
typedef implementation-defined char2;
typedef implementation-defined char3;
typedef implementation-defined char4;
typedef implementation-defined char8;
typedef implementation-defined char16;
typedef implementation-defined uchar2;
typedef implementation-defined uchar3;
typedef implementation-defined uchar4;
typedef implementation-defined uchar8;
typedef implementation-defined uchar16;
typedef implementation-defined short2;
typedef implementation-defined short3;
typedef implementation-defined short4;
typedef implementation-defined short8;
typedef implementation-defined short16;
typedef implementation-defined ushort2;
typedef implementation-defined ushort3;
typedef implementation-defined ushort4;
typedef implementation-defined ushort8;
typedef implementation-defined ushort16;
typedef implementation-defined int2;
typedef implementation-defined int3;
typedef implementation-defined int4;
typedef implementation-defined int8;
typedef implementation-defined int16;
typedef implementation-defined uint2;
typedef implementation-defined uint3;
typedef implementation-defined uint4;
typedef implementation-defined uint8;
typedef implementation-defined uint16;
typedef implementation-defined long2;
typedef implementation-defined long3;
typedef implementation-defined long4;
typedef implementation-defined long8;
typedef implementation-defined long16;
typedef implementation-defined ulong2;
typedef implementation-defined ulong3;
typedef implementation-defined ulong4;
typedef implementation-defined ulong8;
typedef implementation-defined ulong16;
typedef implementation-defined float2;
typedef implementation-defined float3;
typedef implementation-defined float4;
typedef implementation-defined float8;
typedef implementation-defined float16;
#ifdef cl_khr_fp16
typedef implementation-defined half2;
typedef implementation-defined half3;
typedef implementation-defined half4;
typedef implementation-defined half8;
typedef implementation-defined half16;
#endif
#ifdef cl_khr_fp64
typedef implementation-defined double2;
typedef implementation-defined double3;
typedef implementation-defined double4;
typedef implementation-defined double8;
typedef implementation-defined double16;
#endif

typedef bool2    cl_bool2;
typedef bool3    cl_bool3;
typedef bool4    cl_bool4;
typedef bool8    cl_bool8;
typedef bool16   cl_bool16;
typedef char2    cl_char2;
typedef char3    cl_char3;
typedef char4    cl_char4;
typedef char8    cl_char8;
typedef char16   cl_char16;
typedef uchar2   cl_uchar2;
typedef uchar3   cl_uchar3;
typedef uchar4   cl_uchar4;
typedef uchar8   cl_uchar8;
typedef uchar16  cl_uchar16;
typedef short2   cl_short2;
typedef short3   cl_short3;
typedef short4   cl_short4;
typedef short8   cl_short8;
typedef short16  cl_short16;
typedef ushort2  cl_ushort2;
typedef ushort3  cl_ushort3;
typedef ushort4  cl_ushort4;
typedef ushort8  cl_ushort8;
typedef ushort16 cl_ushort16;
typedef int2     cl_int2;
typedef int3     cl_int3;
typedef int4     cl_int4;
typedef int8     cl_int8;
typedef int16    cl_int16;
typedef uint2    cl_uint2;
typedef uint3    cl_uint3;
typedef uint4    cl_uint4;
typedef uint8    cl_uint8;
typedef uint16   cl_uint16;
typedef long2    cl_long2;
typedef long3    cl_long3;
typedef long4    cl_long4;
typedef long8    cl_long8;
typedef long16   cl_long16;
typedef ulong2   cl_ulong2;
typedef ulong3   cl_ulong3;
typedef ulong4   cl_ulong4;
typedef ulong8   cl_ulong8;
typedef ulong16  cl_ulong16;
typedef float2   cl_float2;
typedef float3   cl_float3;
typedef float4   cl_float4;
typedef float8   cl_float8;
typedef float16  cl_float16;
#ifdef cl_khr_fp16
typedef half2    cl_half2;
typedef half3    cl_half3;
typedef half4    cl_half4;
typedef half8    cl_half8;
typedef half16   cl_half16;
#endif
#ifdef cl_khr_fp64
typedef double2  cl_double2;
typedef double3  cl_double3;
typedef double4  cl_double4;
typedef double8  cl_double8;
typedef double16 cl_double16;
#endif

3.2. Conversions Library

This section describes the explicit conversion cast functions. These functions provide a full set of type conversions between supported scalar and vector data types (see Built-in Scalar Data Types and Built-in Vector Data Types sections) except for the following types: size_t, ptrdiff_t, intptr_t, uintptr_t, and void.

The behavior of the conversion may be modified by one or two optional modifiers that specify saturation for out-of-range inputs and rounding behavior.

The convert_cast type conversion operator that specifies a rounding mode and saturation is also provided.

3.2.1. Header <opencl_convert> Synopsis

namespace cl
{
enum class rounding_mode { rte, rtz, rtp, rtn };
enum class saturate { off, on };

template <class T, class U>
T convert_cast(U const& arg);
template <class T>
T convert_cast(T const& arg);

template <class T, rounding_mode rmode, class U>
T convert_cast(U const& arg);
template <class T, rounding_mode rmode>
T convert_cast(T const& arg);

template <class T, saturate smode, class U>
T convert_cast(U const& arg);
template <class T, saturate smode>
T convert_cast(T const& arg);

template <class T, rounding_mode rmode, saturate smode, class U>
T convert_cast(U const& arg);
template <class T, rounding_mode rmode, saturate smode>
T convert_cast(T const& arg);

}

3.2.2. Data Types

Conversions are available for the following scalar types: bool, char, uchar, short, ushort, int, uint, long, ulong, half [4], float, double, and built-in vector types derived therefrom. The operand and result type must have the same number of elements. The operand and result type may be the same type in which case the conversion has no effect on the type or value of an expression.

Conversions between integer types follow the conversion rules specified in the C++14 specification except for out-of-range behavior and saturated conversions which are described in Out-of-Range Behavior and Saturated Conversions section below.

3.2.3. Rounding Modes

Conversions to and from floating-point type shall conform to IEEE-754 rounding rules. Conversions may have an optional rounding mode specified as described in the table belows.

Table 9. Rounding Modes

Rounding Mode

Description

rte

Round to nearest even

rtz

Round toward zero

rtp

Round toward positive infinity

rtn

Round toward negative infinity

If a rounding mode is not specified, conversions to integer type use the rtz (round toward zero) rounding mode and conversions to floating-point type [5] uses the rte rounding mode.

3.2.4. Out-of-Range Behavior and Saturated Conversions

When the conversion operand is either greater than the greatest representable destination value or less than the least representable destination value, it is said to be out-of-range. The result of out-of-range conversion is determined by the conversion rules specified by the C++14 specification in chapter 4.9. When converting from a floating-point type to integer type, the behavior is implementation-defined.

Conversions to integer type may opt to convert using the optional saturation mode. When in saturated mode, values that are outside the representable range shall clamp to the nearest representable value in the destination format. (NaN should be converted to 0).

Conversions to floating-point type shall conform to IEEE-754 rounding rules. The convert_cast operator with a saturate argument may not be used for conversions to floating-point formats.

3.2.5. Examples

Example 1

Examples of casting between two vector types with saturation.

#include <opencl_convert>
using namespace cl;

kernel void Foo() {
short4 s;
        // negative values clamped to 0
        ushort4 u = convert_cast<ushort4,saturate::on>(s);

// values > CHAR_MAX converted to CHAR_MAX
        // values < CHAR_MIN converted to CHAR_MIN
        char4 c = convert_cast<char4, saturate::on>(s);
}
Example 2

Examples of casting from float to integer vector type with saturation and rounding mode specified.

#include <opencl_convert>
using namespace cl;

kernel void Foo() {
        float4  f;

        // values implementation defined for
        // f > INT_MAX, f < INT_MIN or NaN
        int4    i1 = convert_cast<int4>(f);

        // values > INT_MAX clamp to INT_MAX, values < INT_MIN clamp
        // to INT_MIN. NaN should produce 0.
        // The rtz rounding mode is used to produce the integer
        // values.
        int4    i2 = convert_cast<int4,saturate::on>(f);

        // similar to convert_cast<int4>, except that floating-point
        // values are rounded to the nearest integer instead of
        // truncated
        int4    i3 = convert_cast<int4, rounding_mode::rte>(f);

        // similar to convert_cast<int4, saturate::on>, except that
        // floating-point values are rounded to the nearest integer
        // instead of truncated
        int4    i4 = convert_cast<int4, rounding_mode::rte,
        saturate::on>(f);
}
Example 3

Examples of casting from integer to float vector type.

#include <opencl_convert>
using namespace cl;

kernel void Foo() {
        int4    i;

        // convert ints to floats using the default rounding mode.
        float4  f1 = convert_cast<float4>(i);

        // convert ints to floats. integer values that cannot
        // be exactly represented as floats should round up to the
        // next representable float.
        float4  f2 = convert_cast<float4, rounding_mode::rtp>(i);
}

3.3. Reinterpreting Data Library

It is frequently necessary to reinterpret bits in a data type as another data type in OpenCL C++. This is typically required when direct access to the bits in a floating-point type is needed, for example to mask off the sign bit or make use of the result of a vector relational operator on floating-point data.

3.3.1. Header <opencl_reinterpret> Synopsis

namespace cl
{
template <class T, class U>
T as_type(U const& arg);

}

3.3.2. Reinterpreting Types

All data types described in Device built-in scalar data types and Device built-in vector data types tables (except bool and void) may be also reinterpreted as another data type of the same size using the as_type() [6] function for scalar and vector data types. When the operand and result type contain the same number of elements, the bits in the operand shall be returned directly without modification as the new type. The usual type promotion for function arguments shall not be performed.

For example, as_type<float>(0x3f800000) returns 1.0f, which is the value that the bit pattern 0x3f800000 has if viewed as an IEEE-754 single precision value.

When the operand and result type contain a different number of elements, the result shall be implementation-defined except if the operand is a 4-component vector and the result is a 3-component vector. In this case, the bits in the operand shall be returned directly without modification as the new type. That is, a conforming implementation shall explicitly define a behavior, but two conforming implementations need not have the same behavior when the number of elements in the result and operand types does not match. The implementation may define the result to contain all, some or none of the original bits in whatever order it chooses. It is an error to use the as_type<T> operator to reinterpret data to a type of a different number of bytes.

3.3.3. Examples

Example 1

Examples of reinterpreting data types using as_type<> function.

#include <opencl_reinterpret>
using namespace cl;

kernel void Foo() {
        float f = 1.0f;
        uint u = as_type<uint>(f);      // Legal. Contains:  0x3f800000

        float4 f = float4(1.0f, 2.0f, 3.0f, 4.0f);
        // Legal. Contains:
        // int4(0x3f800000, 0x40000000, 0x40400000, 0x40800000)
        int4 i = as_type<int4>(f);

        int i;
        // Legal. Result is implementation-defined.
        short2 j = as_type<short2>(i);

        int4 i;
        // Legal. Result is implementation-defined.
        short8 j = as_type<short8>(i);

        float4 f;
        // Error.  Result and operand have different sizes
        double4 g = as_type<double4>(f);

        float4 f;
        // Legal. g.xyz will have same values as f.xyz.  g.w is
        // undefined
        float3 g = as_type<float3>(f);
}

3.4. Address Spaces Library

Unlike OpenCL C, OpenCL C++ does not require the address space qualifiers to allocate storage from global, local and constant memory pool. The same functionality is provided using the storage and pointer classes. These new types are designed to avoid many programming issues and it is recommended to use them for the static and program scope variables even if it is not required.

3.4.1. Header <opencl_memory> Synopsis

namespace cl
{
enum class mem_fence
{
    local,
    global,
    image
};

inline mem_fence operator ~(mem_fence flags);
inline mem_fence operator &(mem_fence LHS, mem_fence RHS);
inline mem_fence operator |(mem_fence LHS, mem_fence RHS);
inline mem_fence operator ^(mem_fence LHS, mem_fence RHS);

// address space pointer classes
template<class T>
class global_ptr;

template<class T>
class local_ptr;

template<class T>
class private_ptr;

template<class T>
class constant_ptr;

template<class T>
using global = see 'global class' section;

template<class T>
using local = see 'local class' section;

template<class T>
using priv = see 'priv class' section;

template<class T>
using constant = see 'constant class' section;

// address space query functions
template<class T>
mem_fence get_mem_fence(T *ptr);

// address space cast functions
template<class T>
T dynamic_asptr_cast(T *ptr) noexcept;

template <class T, class U>
local_ptr<T> static_asptr_cast(local_ptr<U> const& ptr) noexcept;
template <class T, class U>
global_ptr<T> static_asptr_cast(global_ptr<U> const& ptr) noexcept;
template <class T, class U>
constant_ptr<T> static_asptr_cast(constant_ptr<U> const& ptr) noexcept;
template <class T, class U>
private_ptr<T> static_asptr_cast(private_ptr<U> const& ptr) noexcept;

template <class T, class U>
local_ptr<T> reinterpret_asptr_cast(local_ptr<U> const& ptr) noexcept;
template <class T, class U>
global_ptr<T> reinterpret_asptr_cast(global_ptr<U> const& ptr) noexcept;
template <class T, class U>
constant_ptr<T> reinterpret_asptr_cast(constant_ptr<U> const& ptr) noexcept;
template <class T, class U>
private_ptr<T> reinterpret_asptr_cast(private_ptr<U> const& ptr) noexcept;

template <class T>
T* addressof(T& t) noexcept;

}

3.4.2. Explicit address space storage classes

The explicit address space storage classes described in this section are designed to allocate memory in one of the named address spaces: global, local, constant or private.

global class

The variables declared using global<T> class refer to memory objects allocated from the global memory pool (Global Memory Pool section). The global storage class can only be used to declare variables at program, function and class scope. The variables at function and class scope must be declared with static specifier.

If T is a fundamental or an array type, the global class should meet the following requirements:

  • no user provide default constructor

  • default copy and move constructors

  • default copy and move assignment operators

  • address-of operators that return a generic T pointer (T*)

  • conversion operators to a generic T lvalue reference type (T&)

  • assignment const T& operator

  • ptr() methods that return a global_ptr<T> pointer class

If T is a class type, the global class should provide the following interface:

  • the same public interface as T type including constructors and assignment operators address-of operators that return a generic T pointer (T*)

  • conversion operators to a generic T lvalue reference type (T&)

  • ptr() methods that return a global_ptr<T> pointer class

local class

The variables declared using local<T> class refer to memory objects allocated from the local memory pool (Local Memory Pool section). The local storage class can only be used to declare variables at program, kernel and class scope. The variables at class scope must be declared with static specifier.

If T is a fundamental or an array type, the local class should meet the following requirements:

  • no user provide default constructor

  • default copy and move constructors

  • default copy and move assignment operators

  • address-of operators that return a generic T pointer (T*)

  • conversion operators to a generic T lvalue reference type (T&)

  • assignment const T& operator

  • ptr() methods that return a local_ptr<T> pointer class

If T is a class type, the local class should provide the following interface:

  • the same public interface as T type including constructors and assignment operators

  • address-of operators that return a generic T pointer (T*)

  • conversion operators to a generic T lvalue reference type (T&)

  • ptr() methods that return a local_ptr<T> pointer class

priv class

The variables declared using the priv<T> class refer to memory objects allocated from the private memory pool.

The priv storage class cannot be used to declare variables in the program scope, with static specifier or extern specifier.

If T is a fundamental or an array type, the priv class should meet the following requirements:

  • no user provide default constructor

  • default copy and move constructors

  • default copy and move assignment operators

  • address-of operators that return a generic T pointer (T*)

  • conversion operators to a generic T lvalue reference type (T&)

  • assignment const T& operator

  • ptr() methods that return a private_ptr<T> pointer class

If T is a class type, the priv class should provide the following interface:

  • the same public interface as T type including constructors and assignment operators

  • address-of operators that return a generic T pointer (T*)

  • conversion operators to a generic T lvalue reference type (T&)

  • ptr() methods that return a private_ptr<T> pointer class

constant class

The variables declared using the constant<T> class refer to memory objects allocated from the global memory pool and which are accessed inside a kernel(s) as read-only variables. The constant storage class can only be used to declare variables at program, kernel and class scope. The variables at class scope must be declared with static specifier.

The T type must meet the following requirements:

  • T must be constructible at compile time

  • T cannot have any user defined constructors, destructors, methods and operators

If T is a fundamental, array or class type, the constant class should meet the following requirements:

  • no user provide default constructor

  • default copy and move constructors

  • copy and move assignment operators deleted

  • address-of operators that return a constant_ptr<T> pointer class

  • ptr() methods that return a constant_ptr<T> pointer class

  • conversion operators to a constant T lvalue reference type (add_constant_t<T>&)

3.4.3. Explicit address space pointer classes

The explicit address space pointer classes are just like pointers: they can be converted to and from pointers with compatible address spaces, qualifiers and types. Assignment or casting between explicit pointer types of incompatible address spaces is illegal.

All named address spaces are incompatible with all other address spaces, but local, global and private pointers can be converted to standard C++ pointers.

global_ptr class
namespace cl
{
template <class T> class global_ptr
{
public:
    //types:
    typedef T element_type;
    typedef ptrdiff_t difference_type;
    typedef add_global_t<T>& reference;
    typedef const add_global_t<T>& const_reference;
    typedef add_global_t<T>* pointer;
    typedef const add_global_t<T>* const_pointer;

    //constructors:
    constexpr global_ptr() noexcept;
    explicit global_ptr(pointer p) noexcept;
    global_ptr(const global_ptr &r) noexcept;
    global_ptr(global_ptr &&r) noexcept;
    constexpr global_ptr(nullptr_t) noexcept;

    //assignment:
    global_ptr &operator=(const global_ptr &r) noexcept;
    global_ptr &operator=(global_ptr &&r) noexcept;
    global_ptr &operator=(pointer r) noexcept;
    global_ptr &operator=(nullptr_t) noexcept;

    //observers:
    add_lvalue_reference_t<add_global_t<T>> operator*() const noexcept;
    pointer operator->() const noexcept;
    pointer get() const noexcept;
    explicit operator bool() const noexcept;

    //modifiers:
    pointer release() noexcept;
    void reset(pointer p = pointer()) noexcept;
    void swap(global_ptr& r) noexcept;

    global_ptr &operator++() noexcept;
    global_ptr operator++(int) noexcept;
    global_ptr &operator--() noexcept;
    global_ptr operator--(int) noexcept;
    global_ptr &operator+=(difference_type r) noexcept;
    global_ptr &operator-=(difference_type r) noexcept;
    global_ptr operator+(difference_type r) noexcept;
    global_ptr operator-(difference_type r) noexcept;
};

template <class T> class global_ptr<T[]>
{
public:
    //types:
    typedef T element_type;
    typedef ptrdiff_t difference_type;
    typedef add_global_t<T>& reference;
    typedef const add_global_t<T>& const_reference;
    typedef add_global_t<T>* pointer;
    typedef const add_global_t<T>* const_pointer;

    //constructors:
    constexpr global_ptr() noexcept;
    explicit global_ptr(pointer p) noexcept;
    global_ptr(const global_ptr &r) noexcept;
    global_ptr(global_ptr &&r) noexcept;
    constexpr global_ptr(nullptr_t) noexcept;

    //assignment:
    global_ptr &operator=(const global_ptr &r) noexcept;
    global_ptr &operator=(global_ptr &&r) noexcept;
    global_ptr &operator=(pointer r) noexcept;
    global_ptr &operator=(nullptr_t) noexcept;

    //observers:
    reference operator[](size_t pos) const noexcept;
    pointer get() const noexcept;
    explicit operator bool() const noexcept;

    //modifiers:
    pointer release()noexcept;
    void reset(pointer p) noexcept;
    void reset(nullptr_t p = nullptr) noexcept;
    void swap(global_ptr& r) noexcept;

    global_ptr &operator++() noexcept;
    global_ptr operator++(int) noexcept;
    global_ptr &operator--() noexcept;
    global_ptr operator--(int) noexcept;
    global_ptr &operator+=(difference_type r) noexcept;
    global_ptr &operator-=(difference_type r) noexcept;
    global_ptr operator+(difference_type r) noexcept;
    global_ptr operator-(difference_type r) noexcept;
};

template<class T, class U>
bool operator==(const global_ptr<T> &a, const global_ptr<U> &b) noexcept;
template<class T, class U>
bool operator!=(const global_ptr<T> &a, const global_ptr<U> &b) noexcept;
template<class T, class U>
bool operator<(const global_ptr<T> &a, const global_ptr<U> &b) noexcept;
template<class T, class U>
bool operator>(const global_ptr<T> &a, const global_ptr<U> &b) noexcept;
template<class T, class U>
bool operator<=(const global_ptr<T> &a, const global_ptr<U> &b) noexcept;
template<class T, class U>
bool operator>=(const global_ptr<T> &a, const global_ptr<U> &b) noexcept;

template<class T>
bool operator==(const global_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator==(nullptr_t, const global_ptr<T> &x) noexcept;
template<class T>
bool operator!=(const global_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator!=(nullptr_t, const global_ptr<T> &x) noexcept;
template<class T>
bool operator<(const global_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator<(nullptr_t, const global_ptr<T> &x) noexcept;
template<class T>
bool operator>(const global_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator>(nullptr_t, const global_ptr<T> &x) noexcept;
template<class T>
bool operator<=(const global_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator<=(nullptr_t, const global_ptr<T> &x) noexcept;
template<class T>
bool operator>=(const global_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator>=(nullptr_t, const global_ptr<T> &x) noexcept;

template<class T>
void swap(global_ptr<T>& a, global_ptr<T>& b) noexcept;

}
local_ptr class
namespace cl
{
template <class T> class local_ptr
{
public:
    struct size_type
    {
        explicit constexpr size_type(size_t size);
        operator size_t();
    };

    //types:
    typedef T element_type;
    typedef ptrdiff_t difference_type;
    typedef add_local_t<T>& reference;
    typedef const add_local_t<T>& const_reference;
    typedef add_local_t<T>* pointer;
    typedef const add_local_t<T>* const_pointer;

    //constructors:
    constexpr local_ptr() noexcept;
    explicit local_ptr(pointer p) noexcept;
    local_ptr(const local_ptr &r) noexcept;
    local_ptr(local_ptr &&r) noexcept;
    constexpr local_ptr(nullptr_t) noexcept;

    //assignment:
    local_ptr &operator=(const local_ptr &r) noexcept;
    local_ptr &operator=(local_ptr &&r) noexcept;
    local_ptr &operator=(pointer r) noexcept;
    local_ptr &operator=(nullptr_t) noexcept;

    //observers:
    add_lvalue_reference_t<add_local_t<T>> operator*() const noexcept;
    pointer operator->() const noexcept;
    pointer get() const noexcept;
    explicit operator bool() const noexcept;

    //modifiers:
    pointer release() noexcept;
    void reset(pointer p = pointer()) noexcept;
    void swap(local_ptr& r) noexcept;

    local_ptr &operator++() noexcept;
    local_ptr operator++(int) noexcept;
    local_ptr &operator--() noexcept;
    local_ptr operator--(int) noexcept;
    local_ptr &operator+=(difference_type r) noexcept;
    local_ptr &operator-=(difference_type r) noexcept;
    local_ptr operator+(difference_type r) noexcept;
    local_ptr operator-(difference_type r) noexcept;
};

template <class T> class local_ptr<T[]>
{
public:
    //types:
    typedef T element_type;
    typedef ptrdiff_t difference_type;
    typedef add_local_t<T>& reference;
    typedef const add_local_t<T>& const_reference;
    typedef add_local_t<T>* pointer;
    typedef const add_local_t<T>* const_pointer;

    //constructors:
    constexpr local_ptr() noexcept;
    explicit local_ptr(pointer p) noexcept;
    local_ptr(const local_ptr &r) noexcept;
    local_ptr(local_ptr &&r) noexcept;
    constexpr local_ptr(nullptr_t) noexcept;

    //assignment:
    local_ptr &operator=(const local_ptr &r) noexcept;
    local_ptr &operator=(local_ptr &&r) noexcept;
    local_ptr &operator=(pointer r) noexcept;
    local_ptr &operator=(nullptr_t) noexcept;

    //observers:
    reference operator[](size_t pos) const noexcept;
    pointer get() const noexcept;
    explicit operator bool() const noexcept;

    //modifiers:
    pointer release()noexcept;
    void reset(pointer p) noexcept;
    void reset(nullptr_t p = nullptr) noexcept;
    void swap(local_ptr& r) noexcept;

    local_ptr &operator++() noexcept;
    local_ptr operator++(int) noexcept;
    local_ptr &operator--() noexcept;
    local_ptr operator--(int) noexcept;
    local_ptr &operator+=(difference_type r) noexcept;
    local_ptr &operator-=(difference_type r) noexcept;
    local_ptr operator+(difference_type r) noexcept;
    local_ptr operator-(difference_type r) noexcept;
};

template<class T, class U>
bool operator==(const local_ptr<T> &a, const local_ptr<U> &b) noexcept;
template<class T, class U>
bool operator!=(const local_ptr<T> &a, const local_ptr<U> &b) noexcept;
template<class T, class U>
bool operator<(const local_ptr<T> &a, const local_ptr<U> &b) noexcept;
template<class T, class U>
bool operator>(const local_ptr<T> &a, const local_ptr<U> &b) noexcept;
template<class T, class U>
bool operator<=(const local_ptr<T> &a, const local_ptr<U> &b) noexcept;
template<class T, class U>
bool operator>=(const local_ptr<T> &a, const local_ptr<U> &b) noexcept;

template<class T>
bool operator==(const local_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator==(nullptr_t, const local_ptr<T> &x) noexcept;
template<class T>
bool operator!=(const local_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator!=(nullptr_t, const local_ptr<T> &x) noexcept;
template<class T>
bool operator<(const local_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator<(nullptr_t, const local_ptr<T> &x) noexcept;
template<class T>
bool operator>(const local_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator>(nullptr_t, const local_ptr<T> &x) noexcept;
template<class T>
bool operator<=(const local_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator<=(nullptr_t, const local_ptr<T> &x) noexcept;
template<class T>
bool operator>=(const local_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator>=(nullptr_t, const local_ptr<T> &x) noexcept;

template<class T>
void swap(local_ptr<T>& a, local_ptr<T>& b) noexcept;

}
private_ptr class
namespace cl
{
template <class T> class private_ptr
{
public:
    //types:
    typedef T element_type;
    typedef ptrdiff_t difference_type;
    typedef add_private_t<T>& reference;
    typedef const add_private_t<T>& const_reference;
    typedef add_private_t<T>* pointer;
    typedef const add_private_t<T>* const_pointer;

    //constructors:
    constexpr private_ptr() noexcept;
    explicit private_ptr(pointer p) noexcept;
    private_ptr(const private_ptr &r) noexcept;
    private_ptr(private_ptr &&r) noexcept;
    constexpr private_ptr(nullptr_t) noexcept;

    //assignment:
    private_ptr &operator=(const private_ptr &r) noexcept;
    private_ptr &operator=(private_ptr &&r) noexcept;
    private_ptr &operator=(pointer r) noexcept;
    private_ptr &operator=(nullptr_t) noexcept;

    //observers:
    add_lvalue_reference_t<add_private_t<T>> operator*() const noexcept;
    pointer operator->() const noexcept;
    pointer get() const noexcept;
    explicit operator bool() const noexcept;

    //modifiers:
    pointer release() noexcept;
    void reset(pointer p = pointer()) noexcept;
    void swap(private_ptr& r) noexcept;

    private_ptr &operator++() noexcept;
    private_ptr operator++(int) noexcept;
    private_ptr &operator--() noexcept;
    private_ptr operator--(int) noexcept;
    private_ptr &operator+=(difference_type r) noexcept;
    private_ptr &operator-=(difference_type r) noexcept;
    private_ptr operator+(difference_type r) noexcept;
    private_ptr operator-(difference_type r) noexcept;
};

template <class T> class private_ptr<T[]> {
public:
    //types:
    typedef T element_type;
    typedef ptrdiff_t difference_type;
    typedef add_private_t<T>& reference;
    typedef const add_private_t<T>& const_reference;
    typedef add_private_t<T>* pointer;
    typedef const add_private_t<T>* const_pointer;

    //constructors:
    constexpr private_ptr() noexcept;
    explicit private_ptr(pointer p) noexcept;
    private_ptr(const private_ptr &r) noexcept;
    private_ptr(private_ptr &&r) noexcept;
    constexpr private_ptr(nullptr_t) noexcept;

    //assignment:
    private_ptr &operator=(const private_ptr &r) noexcept;
    private_ptr &operator=(private_ptr &&r) noexcept;
    private_ptr &operator=(pointer r) noexcept;
    private_ptr &operator=(nullptr_t) noexcept;

    //observers:
    reference operator[](size_t pos) const noexcept;
    pointer get() const noexcept;
    explicit operator bool() const noexcept;

    //modifiers:
    pointer release()noexcept;
    void reset(pointer p) noexcept;
    void reset(nullptr_t p = nullptr) noexcept;
    void swap(private_ptr& r) noexcept;

    private_ptr &operator++() noexcept;
    private_ptr operator++(int) noexcept;
    private_ptr &operator--() noexcept;
    private_ptr operator--(int) noexcept;
    private_ptr &operator+=(difference_type r) noexcept;
    private_ptr &operator-=(difference_type r) noexcept;
    private_ptr operator+(difference_type r) noexcept;
    private_ptr operator-(difference_type r) noexcept;
};

template<class T, class U>
bool operator==(const private_ptr<T> &a, const private_ptr<U> &b) noexcept;
template<class T, class U>
bool operator!=(const private_ptr<T> &a, const private_ptr<U> &b) noexcept;
template<class T, class U>
bool operator<(const private_ptr<T> &a, const private_ptr<U> &b) noexcept;
template<class T, class U>
bool operator>(const private_ptr<T> &a, const private_ptr<U> &b) noexcept;
template<class T, class U>
bool operator<=(const private_ptr<T> &a, const private_ptr<U> &b) noexcept;
template<class T, class U>
bool operator>=(const private_ptr<T> &a, const private_ptr<U> &b) noexcept;

template<class T>
bool operator==(const private_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator==(nullptr_t, const private_ptr<T> &x) noexcept;
template<class T>
bool operator!=(const private_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator!=(nullptr_t, const private_ptr<T> &x) noexcept;
template<class T>
bool operator<(const private_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator<(nullptr_t, const private_ptr<T> &x) noexcept;
template<class T>
bool operator>(const private_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator>(nullptr_t, const private_ptr<T> &x) noexcept;
template<class T>
bool operator<=(const private_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator<=(nullptr_t, const private_ptr<T> &x) noexcept;
template<class T>
bool operator>=(const private_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator>=(nullptr_t, const private_ptr<T> &x) noexcept;

template<class T>
void swap(private_ptr<T>& a, private_ptr<T>& b) noexcept;

}
constant_ptr class
namespace cl
{
template <class T> class constant_ptr
{
public:
    //types:
    typedef T element_type;
    typedef ptrdiff_t difference_type;
    typedef add_constant_t<T>& reference;
    typedef const add_constant_t<T>& const_reference;
    typedef add_constant_t<T>* pointer;
    typedef const add_constant_t<T>* const_pointer;

    //constructors:
    constexpr constant_ptr() noexcept;
    explicit constant_ptr(pointer p) noexcept;
    constant_ptr(const constant_ptr &r) noexcept;
    constant_ptr(constant_ptr &&r) noexcept;
    constexpr constant_ptr(nullptr_t) noexcept;

    //assignment:
    constant_ptr &operator=(const constant_ptr &r) noexcept;
    constant_ptr &operator=(constant_ptr &&r) noexcept;
    constant_ptr &operator=(pointer r) noexcept;
    constant_ptr &operator=(nullptr_t) noexcept;

    //observers:
    add_lvalue_reference_t<add_constant_t<T>> operator*() const noexcept;
    pointer operator->() const noexcept;
    pointer get() const noexcept;
    explicit operator bool() const noexcept;

    //modifiers:
    pointer release() noexcept;
    void reset(pointer p = pointer()) noexcept;
    void swap(constant_ptr& r) noexcept;

    constant_ptr &operator++() noexcept;
    constant_ptr operator++(int) noexcept;
    constant_ptr &operator--() noexcept;
    constant_ptr operator--(int) noexcept;
    constant_ptr &operator+=(difference_type r) noexcept;
    constant_ptr &operator-=(difference_type r) noexcept;
    constant_ptr operator+(difference_type r) noexcept;
    constant_ptr operator-(difference_type r) noexcept;
};

template <class T> class constant_ptr<T[]>
{
public:
    //types:
    typedef T element_type;
    typedef ptrdiff_t difference_type;
    typedef add_constant_t<T>& reference;
    typedef const add_constant_t<T>& const_reference;
    typedef add_constant_t<T>* pointer;
    typedef const add_constant_t<T>* const_pointer;

    //constructors:
    constexpr constant_ptr() noexcept;
    explicit constant_ptr(pointer p) noexcept;
    constant_ptr(const constant_ptr &r) noexcept;
    constant_ptr(constant_ptr &&r) noexcept;
    constexpr constant_ptr(nullptr_t) noexcept;

    //assignment:
    constant_ptr &operator=(const constant_ptr &r) noexcept;
    constant_ptr &operator=(constant_ptr &&r) noexcept;
    constant_ptr &operator=(pointer r) noexcept;
    constant_ptr &operator=(nullptr_t) noexcept;

    //observers:
    reference operator[](size_t pos) const noexcept;
    pointer get() const noexcept;
    explicit operator bool() const noexcept;

    //modifiers:
    pointer release()noexcept;
    void reset(pointer p) noexcept;
    void reset(nullptr_t p = nullptr) noexcept;
    void swap(constant_ptr& r) noexcept;

    constant_ptr &operator++() noexcept;
    constant_ptr operator++(int) noexcept;
    constant_ptr &operator--() noexcept;
    constant_ptr operator--(int) noexcept;
    constant_ptr &operator+=(difference_type r) noexcept;
    constant_ptr &operator-=(difference_type r) noexcept;
    constant_ptr operator+(difference_type r) noexcept;
    constant_ptr operator-(difference_type r) noexcept;
};

template<class T, class U>
bool operator==(const constant_ptr<T> &a, const constant_ptr<U> &b) noexcept;
template<class T, class U>
bool operator!=(const constant_ptr<T> &a, const constant_ptr<U> &b) noexcept;
template<class T, class U>
bool operator<(const constant_ptr<T> &a, const constant_ptr<U> &b) noexcept;
template<class T, class U>
bool operator>(const constant_ptr<T> &a, const constant_ptr<U> &b) noexcept;
template<class T, class U>
bool operator<=(const constant_ptr<T> &a, const constant_ptr<U> &b) noexcept;
template<class T, class U>
bool operator>=(const constant_ptr<T> &a, const constant_ptr<U> &b) noexcept;

template<class T>
bool operator==(const constant_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator==(nullptr_t, const constant_ptr<T> &x) noexcept;
template<class T>
bool operator!=(const constant_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator!=(nullptr_t, const constant_ptr<T> &x) noexcept;
template<class T>
bool operator<(const constant_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator<(nullptr_t, const constant_ptr<T> &x) noexcept;
template<class T>
bool operator>(const constant_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator>(nullptr_t, const constant_ptr<T> &x) noexcept;
template<class T>
bool operator<=(const constant_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator<=(nullptr_t, const constant_ptr<T> &x) noexcept;
template<class T>
bool operator>=(const constant_ptr<T> &x, nullptr_t) noexcept;
template<class T>
bool operator>=(nullptr_t, const constant_ptr<T> &x) noexcept;

template<class T>
void swap(constant_ptr<T>& a, constant_ptr<T>& b) noexcept;

}
Constructors
constexpr global_ptr() noexcept;
constexpr local_ptr() noexcept;
constexpr private_ptr() noexcept;
constexpr constant_ptr() noexcept;

Constructs an object which points to nothing.

explicit global_ptr(pointer p) noexcept;
explicit local_ptr(pointer p) noexcept;
explicit private_ptr(pointer p) noexcept;
explicit constant_ptr(pointer p) noexcept;

Constructs an object which points to p.

global_ptr(const global_ptr &) noexcept;
local_ptr(const local_ptr &) noexcept;
private_ptr(const private_ptr &) noexcept;
constant_ptr(const constant_ptr &) noexcept;

Copy constructor.

global_ptr(global_ptr &&r) noexcept;
local_ptr(local_ptr &&r) noexcept;
private_ptr(private_ptr &&r) noexcept;
constant_ptr(constant_ptr &&r) noexcept;

Move constructor.

constexpr global_ptr(nullptr_t) noexcept;
constexpr local_ptr(nullptr_t) noexcept;
constexpr private_ptr(nullptr_t) noexcept;
constexpr constant_ptr(nullptr_t) noexcept;

Constructs an object initialized with nullptr.

Assignment operators
global_ptr &operator=(const global_ptr &r) noexcept;
local_ptr &operator=(const local_ptr &r) noexcept;
private_ptr &operator=(const private_ptr &r) noexcept;
constant_ptr &operator=(const constant_ptr &r) noexcept;

Copy assignment operator

global_ptr &operator=(global_ptr &&r) noexcept;
local_ptr &operator=(local_ptr &&r) noexcept;
private_ptr &operator=(private_ptr &&r) noexcept;
constant_ptr &operator=(constant_ptr &&r) noexcept;

Move assignment operator

global_ptr &operator=(pointer r) noexcept;
local_ptr &operator=(pointer r) noexcept;
private_ptr &operator=(pointer r) noexcept;
constant_ptr &operator=(pointer r) noexcept;

Assigns r pointer to the stored pointer

global_ptr &operator=(nullptr_t) noexcept;
local_ptr &operator=(nullptr_t) noexcept;
private_ptr &operator=(nullptr_t) noexcept;
constant_ptr &operator=(nullptr_t) noexcept;

Assigns nullptr to the stored pointer

Observers
add_lvalue_reference_t<add_global_t<T>> operator*() const noexcept;
add_lvalue_reference_t<add_local_t<T>> operator*() const noexcept;
add_lvalue_reference_t<add_private_t<T>> operator*() const noexcept;
add_lvalue_reference_t<add_constant_t<T>> operator*() const noexcept;

Returns *get(). It is only defined in single object version of the explicit address space pointer class. The result of this operator is undefined if get() == nullptr.

pointer operator->() const noexcept;

Returns get(). It is only defined in single object version of the explicit address space pointer class. The result of this operator is undefined if get() == nullptr.

reference operator[](size_t pos) const noexcept;

Returns get()[pos]. The subscript operator is only defined in specialized global_ptr<T[]>, local_ptr<T[]>, private_ptr<T[]> and constant_ptr<T[]> version for array types. The result of this operator is undefined if pos >= the number of elements in the array to which the stored pointer points.

pointer get() const noexcept;

Returns the stored pointer.

explicit operator bool() const noexcept;

Returns get() != nullptr.

Modifiers
pointer release() noexcept;

Assigns nullptr to the stored pointer and returns the value get() had at the start of the call to release.

void reset(pointer p = pointer()) noexcept;

Assigned p to the stored pointer. It is only defined in single object version of the explicit address space pointer class

void reset(pointer p) noexcept;

Assigned p to the stored pointer. It is only defined in specialized global_ptr<T[]>, local_ptr<T[]>, private_ptr<T[]> and constant_ptr<T[]> version for array types.

void reset(nullptr_t p = nullptr) noexcept;

Equivalent to reset(pointer()). It is only defined in specialized global_ptr<T[]>, local_ptr<T[]>, private_ptr<T[]> and constant_ptr<T[]> version for array types.

void swap(global_ptr& r) noexcept;
void swap(local_ptr& r) noexcept;
void swap(private_ptr& r) noexcept;
void swap(constant_ptr& r) noexcept;

Invokes swap on the stored pointers.

global_ptr &operator++() noexcept;
local_ptr &operator++() noexcept;
private_ptr &operator++() noexcept;
constant_ptr &operator++() noexcept;

Prefix increment operator. Increments the stored pointer by one.

global_ptr operator++(int) noexcept;
local_ptr operator++(int) noexcept;
private_ptr operator++(int) noexcept;
constant_ptr operator++(int) noexcept;

Postfix increment operator. Increments the stored pointer by one.

global_ptr &operator--() noexcept;
local_ptr &operator--() noexcept;
private_ptr &operator--() noexcept;
constant_ptr &operator--() noexcept;

Prefix decrement operator. Decrements the stored pointer by one.

global_ptr operator--(int) noexcept;
local_ptr operator--(int) noexcept;
private_ptr operator--(int) noexcept;
constant_ptr operator--(int) noexcept;

Postfix decrement operator. Decrements the stored pointer by one.

global_ptr &operator+=(difference_type r) noexcept;
local_ptr &operator+=(difference_type r) noexcept;
private_ptr &operator+=(difference_type r) noexcept;
constant_ptr &operator+=(difference_type r) noexcept;

Adds r to the stored pointer and returns *this.

global_ptr &operator-=(difference_type r) noexcept;
local_ptr &operator-=(difference_type r) noexcept;
private_ptr &operator-=(difference_type r) noexcept;
constant_ptr &operator-=(difference_type r) noexcept;

Subtracts r to the stored pointer and returns *this.

global_ptr operator+(difference_type r) noexcept;
local_ptr operator+(difference_type r) noexcept;
private_ptr operator+(difference_type r) noexcept;
constant_ptr operator+(difference_type r) noexcept;

Adds r to the stored pointer and returns the value *this has at the start of operator+.

global_ptr operator-(difference_type r) noexcept;
local_ptr operator-(difference_type r) noexcept;
private_ptr operator-(difference_type r) noexcept;
constant_ptr operator-(difference_type r) noexcept;

Subtracts r to the stored pointer and returns the value *this has at the start of operator-.

Non-member functions
template<class T, class U>
bool operator==(const global_ptr<T> &a, const global_ptr<U> &b) noexcept;
template<class T, class U>
bool operator==(const local_ptr<T> &a, const local_ptr<U> &b) noexcept;
template<class T, class U>
bool operator==(const private_ptr<T> &a, const private_ptr<U> &b) noexcept;
template<class T, class U>
bool operator==(const constant_ptr<T> &a, const constant_ptr<U> &b) noexcept;

Comparison operator== for the explicit address space pointer classes.

template<class T>
bool operator==(nullptr_t, const global_ptr<T> &x) noexcept;
template<class T>
bool operator==(const global_ptr<T> &x, nullptr_t) noexcept;

template<class T>
bool operator==(nullptr_t, const local_ptr<T> &x) noexcept;
template<class T>
bool operator==(const local_ptr<T> &x, nullptr_t) noexcept;

template<class T>
bool operator==(nullptr_t, const private_ptr<T> &x) noexcept;
template<class T>
bool operator==(const private_ptr<T> &x, nullptr_t) noexcept;

template<class T>
bool operator==(nullptr_t, const constant_ptr<T> &x) noexcept;
template<class T>
bool operator==(const constant_ptr<T> &x, nullptr_t) noexcept;

Comparison operator== for the explicit address space pointer classes with a nullptr_t.

template<class T, class U>
bool operator!=(const global_ptr<T> &a, const global_ptr<U> &b) noexcept;
template<class T, class U>
bool operator!=(const local_ptr<T> &a, const local_ptr<U> &b) noexcept;
template<class T, class U>
bool operator!=(const private_ptr<T> &a, const private_ptr<U> &b) noexcept;
template<class T, class U>
bool operator!=(const constant_ptr<T> &a, const constant_ptr<U> &b) noexcept;

Comparison operator!= for the explicit address space pointer classes.

template<class T>
bool operator!=(nullptr_t, const global_ptr<T> &x) noexcept;
template<class T>
bool operator!=(const global_ptr<T> &x, nullptr_t) noexcept;

template<class T>
bool operator!=(nullptr_t, const local_ptr<T> &x) noexcept;
template<class T>
bool operator!=(const local_ptr<T> &x, nullptr_t) noexcept;

template<class T>
bool operator!=(nullptr_t, const private_ptr<T> &x) noexcept;
template<class T>
bool operator!=(const private_ptr<T> &x, nullptr_t) noexcept;

template<class T>
bool operator!=(nullptr_t, const constant_ptr<T> &x) noexcept;
template<class T>
bool operator!=(const constant_ptr<T> &x, nullptr_t) noexcept;

Comparison operator!= for the explicit address space pointer classes with a nullptr_t.

template<class T, class U>
bool operator<(const global_ptr<T> &a, const global_ptr<U> &b) noexcept;
template<class T, class U>
bool operator<(const local_ptr<T> &a, const local_ptr<U> &b) noexcept;
template<class T, class U>
bool operator<(const private_ptr<T> &a, const private_ptr<U> &b) noexcept;
template<class T, class U>
bool operator<(const constant_ptr<T> &a, const constant_ptr<U> &b) noexcept;

Comparison operator< for the explicit address space pointer classes.

template<class T>
bool operator<(nullptr_t, const global_ptr<T> &x) noexcept;
template<class T>
bool operator<(const global_ptr<T> &x, nullptr_t) noexcept;

template<class T>
bool operator<(nullptr_t, const local_ptr<T> &x) noexcept;
template<class T>
bool operator<(const local_ptr<T> &x, nullptr_t) noexcept;

template<class T>
bool operator<(nullptr_t, const private_ptr<T> &x) noexcept;
template<class T>
bool operator<(const private_ptr<T> &x, nullptr_t) noexcept;

template<class T>
bool operator<(nullptr_t, const constant_ptr<T> &x) noexcept;
template<class T>
bool operator<(const constant_ptr<T> &x, nullptr_t) noexcept;

Comparison operator< for the explicit address space pointer classes with a nullptr_t.

template<class T, class U>
bool operator>(const global_ptr<T> &a, const global_ptr<U> &b) noexcept;
template<class T, class U>
bool operator>(const local_ptr<T> &a, const local_ptr<U> &b) noexcept;
template<class T, class U>
bool operator>(const private_ptr<T> &a, const private_ptr<U> &b) noexcept;
template<class T, class U>
bool operator>(const constant_ptr<T> &a, const constant_ptr<U> &b) noexcept;

Comparison operator> for the explicit address space pointer classes.

template<class T>
bool operator>(nullptr_t, const global_ptr<T> &x) noexcept;
template<class T>
bool operator>(const global_ptr<T> &x, nullptr_t) noexcept;

template<class T>
bool operator>(nullptr_t, const local_ptr<T> &x) noexcept;
template<class T>
bool operator>(const local_ptr<T> &x, nullptr_t) noexcept;

template<class T>
bool operator>(nullptr_t, const private_ptr<T> &x) noexcept;
template<class T>
bool operator>(const private_ptr<T> &x, nullptr_t) noexcept;

template<class T>
bool operator>(nullptr_t, const constant_ptr<T> &x) noexcept;
template<class T>
bool operator>(const constant_ptr<T> &x, nullptr_t) noexcept;

Comparison operator> for the explicit address space pointer classes with a nullptr_t.

template<class T, class U>
bool operator<=(const global_ptr<T> &a, const global_ptr<U> &b) noexcept;
template<class T, class U>
bool operator<=(const local_ptr<T> &a, const local_ptr<U> &b) noexcept;
template<class T, class U>
bool operator<=(const private_ptr<T> &a, const private_ptr<U> &b) noexcept;
template<class T, class U>
bool operator<=(const constant_ptr<T> &a, const constant_ptr<U> &b) noexcept;

Comparison operator<= for the explicit address space pointer classes.

template<class T>
bool operator<=(nullptr_t, const global_ptr<T> &x) noexcept;
template<class T>
bool operator<=(const global_ptr<T> &x, nullptr_t) noexcept;

template<class T>
bool operator<=(nullptr_t, const local_ptr<T> &x) noexcept;
template<class T>
bool operator<=(const local_ptr<T> &x, nullptr_t) noexcept;

template<class T>
bool operator<=(nullptr_t, const private_ptr<T> &x) noexcept;
template<class T>
bool operator<=(const private_ptr<T> &x, nullptr_t) noexcept;

template<class T>
bool operator<=(nullptr_t, const constant_ptr<T> &x) noexcept;
template<class T>
bool operator<=(const constant_ptr<T> &x, nullptr_t) noexcept;

Comparison operator<= for the explicit address space pointer classes with a nullptr_t.

template<class T, class U>
bool operator>=(const global_ptr<T> &a, const global_ptr<U> &b) noexcept;
template<class T, class U>
bool operator>=(const local_ptr<T> &a, const local_ptr<U> &b) noexcept;
template<class T, class U>
bool operator>=(const private_ptr<T> &a, const private_ptr<U> &b) noexcept;
template<class T, class U>
bool operator>=(const constant_ptr<T> &a, const constant_ptr<U> &b) noexcept;

Comparison operator>= for the explicit address space pointer classes.

template<class T>
bool operator>=(nullptr_t, const global_ptr<T> &x) noexcept;
template<class T>
bool operator>=(const global_ptr<T> &x, nullptr_t) noexcept;

template<class T>
bool operator>=(nullptr_t, const local_ptr<T> &x) noexcept;
template<class T>
bool operator>=(const local_ptr<T> &x, nullptr_t) noexcept;

template<class T>
bool operator>=(nullptr_t, const private_ptr<T> &x) noexcept;
template<class T>
bool operator>=(const private_ptr<T> &x, nullptr_t) noexcept;

template<class T>
bool operator>=(nullptr_t, const constant_ptr<T> &x) noexcept;
template<class T>
bool operator>=(const constant_ptr<T> &x, nullptr_t) noexcept;

Comparison operator>= for the explicit address space pointer classes with a nullptr_t.

template<class T>
void swap(global_ptr<T>& a, global_ptr<T>& b) noexcept;
template<class T>
void swap(local_ptr<T>& a, local_ptr<T>& b) noexcept;
template<class T>
void swap(private_ptr<T>& a, private_ptr<T>& b) noexcept;
template<class T>
void swap(constant_ptr<T>& a, constant_ptr<T>& b) noexcept;

Calls a.swap(b)

3.4.4. Other functions

The OpenCL C++ address space library implements the address space query and cast functions. The cast function that allows to explicitly convert from a pointer in the generic address space to a pointer in the global, local and private address space.

get_mem_fence
template <class T>
mem_fence get_mem_fence (T *ptr);

Returns the mem_fence value for ptr. ptr must be the generic pointer and it cannot be the explicit address space pointer (global_ptr<>, local_ptr<>, private_ptr<> and constant_ptr<>) or pointer to address space storage class (global<>*, local<>*, priv<>* and constant<>*).

dynamic_asptr_cast
template<class T, class U>
T dynamic_asptr_cast(U *ptr);

Returns a pointer that points to a region in the address space pointer class specified in T if dynamic_asptr_cast can cast ptr to the specified address space. Otherwise it returns nullptr. Only global_ptr<U>, local_ptr<U> and private_ptr<U> are valid T template arguments. ptr must be the generic pointer and it cannot be the explicit address space pointer (global_ptr<>, local_ptr<>, private_ptr<> and constant_ptr<>) or pointer to address space storage class (global<>*, local<>*, priv<>* and constant<>*).

static_asptr_cast
template <class T, class U>
local_ptr<T> static_asptr_cast(local_ptr<U> const& ptr) noexcept;
template <class T, class U>
global_ptr<T> static_asptr_cast(global_ptr<U> const& ptr) noexcept;
template <class T, class U>
constant_ptr<T> static_asptr_cast(constant_ptr<U> const& ptr) noexcept;
template <class T, class U>
private_ptr<T> static_asptr_cast(private_ptr<U> const& ptr) noexcept;

The expression static_cast(r.get()) shall be well formed.

reinterpret_asptr_cast
template <class T, class U>
local_ptr<T> reinterpret_asptr_cast(local_ptr<U> const& ptr) noexcept;
template <class T, class U>
global_ptr<T> reinterpret_asptr_cast(global_ptr<U> const& ptr) noexcept;
template <class T, class U>
constant_ptr<T> reinterpret_asptr_cast(constant_ptr<U> const& ptr) noexcept;
template <class T, class U>
private_ptr<T> reinterpret_asptr_cast(private_ptr<U> const& ptr) noexcept;

The expression reinterpret_cast(r.get()) shall be well formed.

3.4.5. Restrictions

1. The objects allocated using global, local and constant storage classes can be passed to a function only by reference or pointer

#include <opencl_memory>
#include <opencl_array>
using namespace cl;

kernel void foo(global<array<int, 5>> val) {
    // Error: variable in the global
    //        address space passed by value
    //...
}

kernel void bar(global<array<int, 5>> &val) { // OK
    //...
}

kernel void foobar(global_ptr<int> val) { // OK
    //...
}

kernel void barfoo(global_ptr<int[]> val) { // OK
    //...
}

2. The global, local, priv and constant storage classes cannot be used as a return type of function

#include <opencl_memory>
#include <opencl_array>
using namespace cl;

global<array<int, 5>> programVar;

global<array<int, 5>> foo() { // error: variable in the global
                             // address space returned by value
    return programVar;
}

global<array<int, 5>> &bar() { // OK
    return programVar;
}

3. The global, local and constant storage classes cannot be used to declare class members unless static keyword is used

#include <opencl_memory>
#include <opencl_array>
using namespace cl;

struct Foo {
    global<int> a; // error: class members cannot be qualified
                   // with address space
    local<array<int, 5>> b; // error: class members cannot be
                           // qualified with address space

    static global<int> c; // OK
    static local<array<int, 5>> d; // OK
};

4. The global storage class cannot be used to declare variables at function scope unless static keyword is used

#include <opencl_memory>
using namespace cl;

kernel void foo() {
    global<int> b; // error
    static global<int> b; // OK
}

5. The local variables can be declared only at kernel function scope, program scope and with static keyword

#include <opencl_memory>
#include <opencl_array>
using namespace cl;

// An array of 5 ints allocated in
// local address space.
local<array<int, 5>> a = { 10 }; // OK: program scope local
                                 // variable

kernel void foo() {
    // A single int allocated in
    // local address space
    local<int> b{1}; // OK
    static local<int> d{1}; // OK

    if(get_local_id(0) == 0) {
        // example of variable in local address space
        // but not declared at __kernel function scope.
        local<int> c{2}; // not allowed
  }
}

6. The objects allocated using global storage class must be initialized with the constant expression arguments

#include <opencl_memory>
#include <opencl_work_item>
using namespace cl;

kernel void foo() {
    int a = get_local_id(0);
    static global<int> b{a}; // undefined behavior
    static global<int> c{0}; // OK
}

7. The constructors of objects allocated using constant storage class must be constant expression

#include <opencl_memory>
#include <opencl_work_item>
using namespace cl;

constant<int> b{0}; // OK

kernel void foo() {
    int a = get_local_id(0);
    static constant<int> b{a}; // undefined behavior
}

8. Constant variables must be initialized

#include <opencl_memory>
using namespace cl;

constant<int> a{0}; // OK
constant<int> b; // error: constant variable must be initialized

kernel void foo() {
    static constant<int> c{0}; // OK
    static constant<int> d; // error: constant variable must be initialized
}

9. The priv storage class cannot be used to declare variables in the program scope or with static specifier.

#include <opencl_memory>
using namespace cl;

priv<int> a{0}; // error: priv variable in program scope

kernel void foo() {
    static priv<int> c{0}; // error: priv variable with static specifier
    priv<int> d; // OK
}

10. T type used in constant storage class cannot have any user defined constructors, destructors, operators and methods

#include <opencl_memory>
using namespace cl;

struct bar {
    int get() { return 10; }
};

kernel void foo() {
    constant<bar> a;
    int b = a.get() // undefined behavior
}

11. T type used in global, local, priv and constant storage class cannot be sealed class

#include <opencl_memory>
using namespace cl;

struct bar final { };

kernel void foo() {
    local<bar> a; // error: bar is marked as final
}

12. Using work-group barriers or relying on a specific work-item to be executed in constructors and destructors of global and local objects can result in undefined behavior

#include <opencl_memory>
#include <opencl_synchronization>
using namespace cl;

struct Foo {
    Foo() {
        work_group_barrier(mem_fence::local); // not allowed
    }

    ~Foo() {
        if(get_local_id(0) != 5) { // not allowed
            while(1) {}
        }
    }
};

kernel void bar() {
    local<Foo> a;
}

13. All local (address-space) variable declarations in kernel-scope shall be declared before any explicit return statement. Declaring local variable after return statement may cause undefined behavior. Implementation is encouraged to generate at least a warning in such cases.

3.4.6. Examples

Example 1

Example of passing an explicit address space storage object to a kernel.

#include <opencl_memory>
using namespace cl;

kernel void foo(global<int> *arg) {
    //...
}
Example 2

Example of passing an explicit address space pointer object to a kernel.

#include <opencl_memory>
using namespace cl;

kernel void foo(global_ptr<int> arg) {
    //...
}
Example 3

Example of casting a generic pointer to an explicit address space pointer object. This is the runtime operation and the dynamic_asptr_cast can fail.

#include <opencl_memory>
using namespace cl;

kernel void foo(global_ptr<int> arg) {
    int *ptr = arg;
    auto globalPtr = dynamic_asptr_cast<global_ptr<int>>(ptr);
    if(globalPtr)
    {
        //...
    }
}
Example 4

Example of using an array with an explicit address space storage class.

#include <opencl_memory>
#include <opencl_array>
#include <opencl_work_item>
using namespace cl;

kernel void foo() {
    local<array<int, 2>> localArray;
    if(get_local_id(0) == 0) {
        for(auto it = localArray.begin(); it != localArray.end(); ++it)
            *it = 0;
    }
    work_group_barrier(mem_fence::local);
    localArray[0] += 1;
}
Example 5

Example of using a fundamental type with an explicit address space storage class.

#include <opencl_memory>
#include <opencl_work_item>
using namespace cl;

kernel void foo() {
    local<int> a;
    if(get_local_id(0) == 0)
        a = 1;

    work_group_barrier(mem_fence::local);
    if(get_local_id(0) == 1)
        a += 1;
}

3.5. Specialization Constants Library

The specialization constants are objects that will not have known constant values until after initial generation of a SPIR-V module. Such objects are called specialization constants. Application might provide values for the specialization constants that will be used when SPIR-V program is built.

3.5.1. Header <opencl_spec_constant> Synopsis

namespace cl
{
template<class T, unsigned int ID>
struct spec_constant
{
    spec_constant() = delete;
    spec_constant(const spec_constant &) = default;
    spec_constant(spec_constant&&) = default;

    constexpr spec_constant(const T& value);

    spec_constant& operator=(const spec_constant&) = delete;
    spec_constant& operator=(spec_constant&&) = delete;

    const T& get() const noexcept;

    operator const T&() const noexcept;
};

template<class T, unsigned int ID>
const T& get(const spec_constant<T, ID> &r) noexcept;

}

3.5.2. spec_constant class methods and get function

spec_constant::spec_constant
constexpr spec_constant(const T& value);

Constructor of spec_constant class. The value parameter is a default value of the specialization constant that will be used if a value is not set by the host API. It must be a literal value.

get
const T& get() const noexcept;

operator const T&() const noexcept;

template<class T, unsigned int ID>
const T& get(const spec_constant<T, ID> &r) noexcept;

Return a value of specialization constant. If an object is not specialized from the host, the default value will be returned.

3.5.3. Requirements

Specialization constant variables cannot be defined constexpr.

Data

Template parameter T in spec_constant class template denotes the data type of specialization constant. The type T must be integral or floating point type.

ID

Template parameter ID in spec_constant class template denotes an unique ID of the specialization constant that can be used to set a value from the host API. The value of ID must be unique within this compilation unit and across any other SPIR-V modules that it is linked with.

3.5.4. Examples

Example 1

Example of using the specialization constant in the kernel.

#include <opencl_spec_constant>
cl::spec_constant<int, 1> spec1{ 255 };
constexpr cl::spec_constant<int, 2> spec2{ 255 }; // error, constexpr specialization
                                                  // constant variables are not allowed

kernel void myKernel()
{
  if(cl::get(spec1) == 255)
  {
        // do something if a default value is used
  }
  else
  {
    // do something if the spec constant was specialized by the host
  }
}
Example 2

Example of specializing one of the dimensions in cl::required_work_group_size attribute.

#include <opencl_spec_constant>
cl::spec_constant<int, 1> spec1{ 512 };

[[cl::required_work_group_size(spec1, 1, 1)]]
kernel void myKernel()
{
   //...
}

3.6. Half Wrapper Library

The OpenCL C++ programming language implements a wrapper class for the built-in half data type (Built-in Half Data Type section). The class methods perform implicit vload_half and vstore_half operations from Vector Data Load and Store Functions section.

3.6.1. Header <opencl_half> Synopsis

namespace cl {
struct fp16
{
    fp16() = default;
    fp16(const fp16 &) = default;
    fp16(fp16 &&) = default;
    fp16 &operator=(const fp16 &) = default;
    fp16 &operator=(fp16 &&) = default;

    explicit operator bool() const noexcept;

#ifdef cl_khr_fp16
    fp16(half r) noexcept;
    fp16 &operator=(half r) noexcept;
    operator half() const noexcept;
#endif

    fp16(float r) noexcept;
    fp16 &operator=(float r) noexcept;
    operator float() const noexcept;

#ifdef cl_khr_fp64
    fp16(double r) noexcept;
    fp16 &operator=(double r) noexcept;
    operator double() const noexcept;
#endif

    fp16 &operator++() noexcept;
    fp16 operator++(int) noexcept;
    fp16 &operator--() noexcept;
    fp16 operator--(int) noexcept;
    fp16 &operator+=(const fp16 &r) noexcept;
    fp16 &operator-=(const fp16 &r) noexcept;
    fp16 &operator*=(const fp16 &r) noexcept;
    fp16 &operator/=(const fp16 &r) noexcept;
};

bool operator==(const fp16& lhs, const fp16& rhs) noexcept;
bool operator!=(const fp16& lhs, const fp16& rhs) noexcept;
bool operator< (const fp16& lhs, const fp16& rhs) noexcept;
bool operator> (const fp16& lhs, const fp16& rhs) noexcept;
bool operator<=(const fp16& lhs, const fp16& rhs) noexcept;
bool operator>=(const fp16& lhs, const fp16& rhs) noexcept;
fp16 operator+(const fp16& lhs, const fp16& rhs) noexcept;
fp16 operator-(const fp16& lhs, const fp16& rhs) noexcept;
fp16 operator*(const fp16& lhs, const fp16& rhs) noexcept;
fp16 operator/(const fp16& lhs, const fp16& rhs) noexcept;

}

3.6.2. Constructors

fp16(const half &r) noexcept;

Constructs an object with a half built-in type.

fp16(const float &r) noexcept;

Constructs an object with a float built-in type. If the cl_khr_fp16 extension is not supported, vstore_half built-in function is called with the default rounding mode.

fp16(const double &r) noexcept;

Constructs an object with a double built-in type. If the cl_khr_fp16 extension is not supported, vstore_half built-in function is called with the default rounding mode. The constructor is only present if the double precision support is enabled.

3.6.3. Assignment operators

fp16 &operator=(const half &r) noexcept;

Assigns r to the stored half type.

fp16 &operator=(const float &r) noexcept;

Assigns r to the stored half type. If the cl_khr_fp16 extension is not supported, vstore_half built-in function is called with the default rounding mode.

fp16 &operator=(const double &r) noexcept;

Assigns r to the stored half type. If the cl_khr_fp16 extension is not supported, vstore_half built-in function is called with the default rounding mode. The operator is only present if the double precision support is enabled.

3.6.4. Conversion operators

explicit operator bool() const noexcept;

Returns m != 0.0h. If the cl_khr_fp16 extension is not supported, vload_half built-in function is called.

operator half() const noexcept;

Conversion operator to the built-in half type.

operator float() const noexcept;

Conversion operator. If the cl_khr_fp16 extension is not supported, vload_half built-in function is called.

operator double() const noexcept;

Conversion operator. If the cl_khr_fp16 extension is not supported, vload_half built-in function is called. The operator is only present if the double precision support is enabled.

3.6.5. Arithmetic operations

fp16 &operator++() noexcept;

Pre-increment operator. If the cl_khr_fp16 extension is not supported, vload_half and vstore_half built-in functions are called.

fp16 operator++(int) noexcept;

Post-increment operator. If the cl_khr_fp16 extension is not supported, vload_half and vstore_half built-in functions are called.

fp16 &operator--() noexcept;

Pre-decrement operator. If the cl_khr_fp16 extension is not supported, vload_half and vstore_half built-in functions are called.

fp16 operator--(int) noexcept;

Pre-decrement operator. If the cl_khr_fp16 extension is not supported, vload_half and vstore_half built-in functions are called.

fp16 &operator+=(const fp16 &r) noexcept;

Addition operator. If the cl_khr_fp16 extension is not supported, vload_half and vstore_half built-in functions are called.

fp16 &operator-=(const fp16 &r) noexcept;

Subtract operator. If the cl_khr_fp16 extension is not supported, vload_half and vstore_half built-in functions are called.

fp16 &operator*=(const fp16 &r) noexcept;

Multiplication operator. If the cl_khr_fp16 extension is not supported, vload_half and vstore_half built-in functions are called.

fp16 &operator/=(const fp16 &r) noexcept;

Division operator. If the cl_khr_fp16 extension is not supported, vload_half and vstore_half built-in functions are called.

3.6.6. Non-member functions

bool operator==(const fp16& lhs, const fp16& rhs) noexcept;

Comparison operator ==. If the cl_khr_fp16 extension is not supported, vload_half built-in function is called.

bool operator!=(const fp16& lhs, const fp16& rhs) noexcept;

Comparison operator !=. If the cl_khr_fp16 extension is not supported, vload_half built-in function is called.

bool operator< (const fp16& lhs, const fp16& rhs) noexcept;

Comparison operator <. If the cl_khr_fp16 extension is not supported, vload_half built-in function is called.

bool operator> (const fp16& lhs, const fp16& rhs) noexcept;

Comparison operator >. If the cl_khr_fp16 extension is not supported, vload_half built-in function is called.

bool operator<=(const fp16& lhs, const fp16& rhs) noexcept;

Comparison operator ⇐. If the cl_khr_fp16 extension is not supported, vload_half built-in function is called.

bool operator>=(const fp16& lhs, const fp16& rhs) noexcept;

Comparison operator >=. If the cl_khr_fp16 extension is not supported, vload_half built-in function is called.

fp16 operator+(const fp16& lhs, const fp16& rhs) noexcept;

Addition operator. If the cl_khr_fp16 extension is not supported, vload_half and vstore_half built-in functions are called.

fp16 operator-(const fp16& lhs, const fp16& rhs) noexcept;

Subtract operator. If the cl_khr_fp16 extension is not supported, vload_half and vstore_half built-in functions are called.

fp16 operator*(const fp16& lhs, const fp16& rhs) noexcept;

Multiplication operator. If the cl_khr_fp16 extension is not supported, vload_half and vstore_half built-in functions are called.

fp16 operator/(const fp16& lhs, const fp16& rhs) noexcept;

Division operator. If the cl_khr_fp16 extension is not supported, vload_half and vstore_half built-in functions are called.

3.7. Vector Wrapper Library

The OpenCL C++ programming language implements a vector wrapper type that works efficiently on the OpenCL devices. The vector class supports methods that allow construction of a new vector from a swizzled set of component elements or from a built-in vector type. The vector class can be converted to a corresponding built-in vector type.

The Size parameter can be one of: 2, 3, 4, 8 or 16. Any other value should produce a compilation failure. The element type parameter T, must be one of the basic scalar types defined in Device built-in scalar data types table except void type.

3.7.1. Header <opencl_vec> Synopsis

namespace cl {
static constexpr size_t undef_channel = static_cast<size_t>(-1);
enum class channel : size_t { r = 0, g = 1, b = 2, a = 3, x = 0, y = 1, z = 2, w = 3, undef = undef_channel };

template<class T, size_t Size>
struct vec
{
    using element_type = T;
    using vector_type = make_vector_t<T, Size>;
    static constexpr size_t size = Size;

    vec( ) = default;
    vec(const vec &) = default;
    vec(vec &&) = default;

    vec(const vector_type &r) noexcept;
    vec(vector_type &&r) noexcept;

    template <class... Params>
    vec(Params... params) noexcept;

    vec& operator=(const vec &) = default;
    vec& operator=(vec &&) = default;

    vec& operator=(const vector_type &r) noexcept;
    vec& operator=(vector_type &&r) noexcept;

    operator vector_type() const noexcept;

    vec& operator++() noexcept;
    vec& operator++(int) noexcept;
    vec& operator--() noexcept;
    vec& operator--(int) noexcept;
    vec& operator+=(const vec &r) noexcept;
    vec& operator+=(const element_type &r) noexcept;
    vec& operator-=(const vec &r) noexcept;
    vec& operator-=(const element_type &r) noexcept;
    vec& operator*=(const vec &r) noexcept;
    vec& operator*=(const element_type &r) noexcept;
    vec& operator/=(const vec &r) noexcept;
    vec& operator/=(const element_type &r) noexcept;
    vec& operator%=(const vec &r) noexcept;
    vec& operator%=(const element_type &r) noexcept;

    template <size_t... Sizes>
    auto swizzle() noexcept;

    template <size_t... Sizes>
    auto swizzle() const noexcept;

#ifdef SIMPLE_SWIZZLES
    auto x() noexcept;
...
    auto xyzw() noexcept;
...
    auto zzzz() noexcept;
#endif
};

template <size_t... Swizzle, class Vec>
auto swizzle(Vec& v);
template <channel... Swizzle, class Vec>
auto swizzle(Vec& v);

template<class T, size_t Size>
make_vector_t<bool, Size> operator==(const vec<T, Size> &lhs,
                                     const vec<T, Size> &rhs) noexcept;
template<class T, size_t Size>
make_vector_t<bool, Size> operator!=(const vec<T, Size> &lhs,
                                     const vec<T, Size> &rhs) noexcept;
template<class T, size_t Size>
make_vector_t<bool, Size> operator<(const vec<T, Size> &lhs,
                                    const vec<T, Size> &rhs) noexcept;
template<class T, size_t Size>
make_vector_t<bool, Size> operator>(const vec<T, Size> &lhs,
                                    const vec<T, Size> &rhs) noexcept;
template<class T, size_t Size>
make_vector_t<bool, Size> operator<=(const vec<T, Size> &lhs,
                                     const vec<T, Size> &rhs) noexcept;
template<class T, size_t Size>
make_vector_t<bool, Size> operator>=(const vec<T, Size> &lhs,
                                     const vec<T, Size> &rhs) noexcept;
template<class T, size_t Size>
vec<T, Size> operator+(const vec<T, Size> &lhs,
                       const vec<T, Size> &rhs) noexcept;
template<class T, size_t Size>
vec<T, Size> operator+(const vec<T, Size> &lhs, const T &rhs) noexcept;
template<class T, size_t Size>
vec<T, Size> operator+(const T &lhs, const vec<T, Size> &rhs) noexcept;
template<class T, size_t Size>
vec<T, Size> operator-(const vec<T, Size> &lhs,
                       const vec<T, Size> &rhs) noexcept;
template<class T, size_t Size>
vec<T, Size> operator-(const vec<T, Size> &lhs, const T &rhs) noexcept;
template<class T, size_t Size>
vec<T, Size> operator-(const T &lhs, const vec<T, Size> &rhs) noexcept;
template<class T, size_t Size>
vec<T, Size> operator*(const vec<T, Size> &lhs,
                       const vec<T, Size> &rhs) noexcept;
template<class T, size_t Size>
vec<T, Size> operator*(const vec<T, Size> &lhs, const T &rhs) noexcept;
template<class T, size_t Size>
vec<T, Size> operator*(const T &lhs, const vec<T, Size> &rhs) noexcept;
template<class T, size_t Size>
vec<T, Size> operator/(const vec<T, Size> &lhs,
                       const vec<T, Size> &rhs) noexcept;
template<class T, size_t Size>
vec<T, Size> operator/(const vec<T, Size> &lhs, const T &rhs) noexcept;
template<class T, size_t Size>
vec<T, Size> operator/(const T &lhs, const vec<T, Size> &rhs) noexcept;

}

3.7.2. Constructors

vec(const vector_type &r) noexcept;

Copy constructor. Constructs an object with the corresponding built-in vector type.

vec(vector_type &&r) noexcept;

Move constructor. Constructs an object with the corresponding built-in vector type.

template <class... Params>
vec(Params... params) noexcept;

Constructs a vector object from a swizzled set of component elements.

3.7.3. Assignment operators

vec& operator=(const vector_type &r) noexcept;

Copy assignment operator. The operator assigns a corresponding built-in vector type.

vec& operator=(vector_type &&r) noexcept;

Move assignment operator. The operator assigns a corresponding built-in vector type.

3.7.4. Conversion operators

operator vector_type() const noexcept;

Conversion operator. The operator converts from the vector wrapper class to a corresponding built-in vector type.

3.7.5. Arithmetic operations

vec& operator++() noexcept;

Pre-increment operator.

vec& operator++(int) noexcept;

Post-increment operator.

vec& operator--() noexcept;

Pre-decrement operator.

vec& operator--(int) noexcept;

Post-decrement operator.

vec& operator+=(const vec &r) noexcept;
vec& operator+=(const element_type &r) noexcept;

Add each element of r to the respective element of the current vector in-place.

vec& operator-=(const vec &r) noexcept;
vec& operator-=(const element_type &r) noexcept;

Subtract each element of r from the respective element of the current vector in-place.

vec& operator*=(const vec &r) noexcept;
vec& operator*=(const element_type &r) noexcept;

Multiply each element of r by the respective element of the current vector in-place.

vec& operator/=(const vec &r) noexcept;
vec& operator/=(const element_type &r) noexcept;

Divide each element of the current vector in-place by the respective element of r.

vec& operator%=(const vec &r) noexcept;
vec& operator%=(const element_type &r) noexcept;

Remainder of each element of the current vector in-place by the respective element of r.

3.7.6. Swizzle methods

All swizzle methods return a temporary object representing a swizzled set of the original vector’s member elements. The swizzled vector may be used as a source (rvalue) and destination (lvalue). In order to enable the r-value and lvalue swizzling to work, this returns an intermediate swizzled-vector class, which can be implicitly converted to a vector (rvalue evaluation) or assigned to.

template <size_t... Sizes>
auto swizzle() noexcept;

template <size_t... Sizes>
auto swizzle() const noexcept;

Returns a vector swizzle. The number of template parameters specified in Sizes must be from 1 to Size. Sizes parameters must be channel values: channel::r, channel::b, . Swizzle letters may be repeated or re-ordered.

auto x() noexcept;
...
auto xyzw() noexcept;
...
auto zzzz() noexcept;

Returns a swizzle. These swizzle methods are only generated if the user defined the SIMPLE_SWIZZLES macro before including opencl_vec header.

3.7.7. Non-member functions

template <size_t... Swizzle, class Vec>
auto swizzle(Vec& v);

template <channel... Swizzle, class Vec>
auto swizzle(Vec& v);

Returns a vector swizzle. The number of template parameters specified in Sizes must be from 1 to Size. Sizes parameters must be channel values: channel::r, channel::b, . Swizzle letters may be repeated or re-ordered.

template<class T, size_t Size>
make_vector_t<bool, Size> operator==(const vec<T, Size> &lhs,
                                     const vec<T, Size> &rhs) noexcept;

Return true if all elements of rhs compare equal to the respective element of lhs.

template<class T, size_t Size>
make_vector_t<bool, Size> operator!=(const vec<T, Size> &lhs,
                                     const vec<T, Size> &rhs) noexcept;

Return true if any one element of rhs does not compare equal to the respective element of lhs.

template<class T, size_t Size>
make_vector_t<bool, Size> operator<(const vec<T, Size> &lhs,
                                    const vec<T, Size> &rhs) noexcept;

Return true if all elements of lhs are less than the respective element of rhs.

template<class T, size_t Size>
make_vector_t<bool, Size> operator>(const vec<T, Size> &lhs,
                                    const vec<T, Size> &rhs) noexcept;

Return true if all elements of lhs are greater than the respective element of rhs.

template<class T, size_t Size>
make_vector_t<bool, Size> operator<=(const vec<T, Size> &lhs,
                                     const vec<T, Size> &rhs) noexcept;

Return true if all elements of lhs are less than or equal to the respective element of rhs.

template<class T, size_t Size>
make_vector_t<bool, Size> operator>=(const vec<T, Size> &lhs,
                                     const vec<T, Size> &rhs) noexcept;

Return true if all elements of lhs are greater than or equal to the respective element of rhs.

template<class T, size_t Size>
vec<T, Size> operator+(const vec<T, Size> &lhs,
                       const vec<T, Size> &rhs) noexcept;

template<class T, size_t Size>
vec<T, Size> operator+(const vec<T, Size> &lhs, const T &rhs) noexcept;

template<class T, size_t Size>
vec<T, Size> operator+(const T &lhs, const vec<T, Size> &rhs) noexcept;

Add each element of rhs to the respective element of lhs.

template<class T, size_t Size>
vec<T, Size> operator-(const vec<T, Size> &lhs,
                       const vec<T, Size> &rhs) noexcept;
template<class T, size_t Size>
vec<T, Size> operator-(const vec<T, Size> &lhs, const T &rhs) noexcept;
template<class T, size_t Size>
vec<T, Size> operator-(const T &lhs, const vec<T, Size> &rhs) noexcept;

Subtract each element of rhs from the respective element of lhs.

template<class T, size_t Size>
vec<T, Size> operator*(const vec<T, Size> &lhs,
                       const vec<T, Size> &rhs) noexcept;
template<class T, size_t Size>
vec<T, Size> operator*(const vec<T, Size> &lhs, const T &rhs) noexcept;
template<class T, size_t Size>
vec<T, Size> operator*(const T &lhs, const vec<T, Size> &rhs) noexcept;

Multiply each element of rhs by the respective element of lhs.

template<class T, size_t Size>
vec<T, Size> operator/(const vec<T, Size> &lhs,
                       const vec<T, Size> &rhs) noexcept;
template<class T, size_t Size>
vec<T, Size> operator/(const vec<T, Size> &lhs, const T &rhs) noexcept;
template<class T, size_t Size>
vec<T, Size> operator/(const T &lhs, const vec<T, Size> &rhs) noexcept;

Divide each element of lhs by the respective element of the rhs.

3.8. Range Library

OpenCL C++ implements small library that contains useful utilities to manipulate iterator ranges.

3.8.1. Header <opencl_range> Synopsis

namespace cl
{
template <class It>
struct range_type
{
    constexpr range_type(It& it) noexcept;
    constexpr range_type(It& it, difference_type end) noexcept;
    constexpr range_type(It& it, difference_type begin,
                         difference_type end) noexcept;

    constexpr auto begin( ) noexcept;
    constexpr auto end( ) noexcept;
};

template <class It>
constexpr auto range(It& it) noexcept;

template <class It>
constexpr auto range(It& it, difference_type end) noexcept;

template <class It>
constexpr auto range(It& it, difference_type begin,
                     difference_type end) noexcept;

// difference_type is It::difference_type if present ptrdiff_t otherwise.

}

3.8.2. Range type

Range type represents a given range over iterable type. Depending on constructor used:

constexpr range_type(It& it) noexcept;

Represents range from begin(it) to end(it).

constexpr range_type(It& it, difference_type end) noexcept;

Represents range from begin(it) to begin(it)+end.

constexpr range_type(It& it, difference_type begin,
                     difference_type end) noexcept;

Represents range from begin(it)+begin to begin(it)+end.

3.8.3. Range function

range function is present in three overloads matching range_type constructors. It is a factory function building range_type.

Note
This function main purpose is enabling the use of range based for loops on built-in vectors.

3.9. Vector Utilities Library

OpenCL C++ implements vector utilities library that contains multiple helper classes to help working with built-in vectors.

3.9.1. Header <opencl_vector_utility> Synopsis

namespace cl
{
template <size_t Channel, class Vec>
constexpr remove_attrs_t<vector_element_t<Vec>> get(Vec & vector) noexcept;

template <size_t Channel, class Vec>
constexpr void set(Vec & vector,
                   remove_attrs_t<vector_element_t<Vec>> value) noexcept;

template <class Vec>
struct channel_ref
{
    using type = remove_attrs_t<vector_element_t<Vec>>;

    constexpr operator type( ) noexcept;
    constexpr channel_ref& operator=(type value) noexcept;
    constexpr channel_ref& operator +=(type value) noexcept;
    constexpr friend type operator +(channel_ref lhs, type rhs) noexcept;
    constexpr channel_ref& operator -=(type value) noexcept;
    constexpr friend type operator -(channel_ref lhs, type rhs) noexcept;
    constexpr channel_ref& operator *=(type value) noexcept;
    constexpr friend type operator *(channel_ref lhs, type rhs) noexcept;
    constexpr channel_ref& operator /=(type value) noexcept;
    constexpr friend type operator /(channel_ref lhs, type rhs) noexcept;
    constexpr channel_ref& operator %=(type value) noexcept;
    constexpr friend type operator %(channel_ref lhs, type rhs) noexcept;
    constexpr channel_ref& operator ^=(type value) noexcept;
    constexpr friend type operator ^(channel_ref lhs, type rhs) noexcept;
    constexpr channel_ref& operator &=(type value) noexcept;
    constexpr friend type operator &(channel_ref lhs, type rhs) noexcept;
    constexpr channel_ref& operator |=(type value) noexcept;
    constexpr friend type operator |(channel_ref lhs, type rhs) noexcept;
    constexpr channel_ref& operator++( ) noexcept;
    constexpr channel_ref operator++(int) noexcept;
    constexpr channel_ref& operator--( ) noexcept;
    constexpr channel_ref operator--(int) noexcept;
};

template <>
struct channel_ref<floating_point_vector>
{
    using type = remove_attrs_t<vector_element_t<Vec>>;

    constexpr operator type( ) noexcept;
    constexpr channel_ref& operator=(type value) noexcept;
    constexpr channel_ref& operator +=(type value) noexcept;
    constexpr friend type operator +(channel_ref lhs, type rhs) noexcept;
    constexpr channel_ref& operator -=(type value) noexcept;
    constexpr friend type operator -(channel_ref lhs, type rhs) noexcept;
    constexpr channel_ref& operator *=(type value) noexcept;
    constexpr friend type operator *(channel_ref lhs, type rhs) noexcept;
    constexpr channel_ref& operator /=(type value) noexcept;
    constexpr friend type operator /(channel_ref lhs, type rhs) noexcept;
    constexpr channel_ref& operator++( ) noexcept;
    constexpr channel_ref& operator++(int) noexcept;
    constexpr channel_ref& operator--( ) noexcept;
    constexpr channel_ref& operator--(int) noexcept;
};

template <>
struct channel_ref<boolean_vector>
{
    using type = remove_attrs_t<vector_element_t<Vec>>;

    constexpr operator type( ) noexcept;
    constexpr channel_ref& operator=(type value) noexcept;
    constexpr channel_ref& operator +=(type value) noexcept;
    constexpr friend type operator +(channel_ref lhs, type rhs) noexcept;
    constexpr channel_ref& operator -=(type value) noexcept;
    constexpr friend type operator -(channel_ref lhs, type rhs) noexcept;
    constexpr channel_ref& operator *=(type value) noexcept;
    constexpr friend type operator *(channel_ref lhs, type rhs) noexcept;
    constexpr channel_ref& operator /=(type value) noexcept;
    constexpr friend type operator /(channel_ref lhs, type rhs) noexcept;
    constexpr channel_ref& operator %=(type value) noexcept;
    constexpr friend type operator %(channel_ref lhs, type rhs) noexcept;
    constexpr channel_ref& operator ^=(type value) noexcept;
    constexpr friend type operator ^(channel_ref lhs, type rhs) noexcept;
    constexpr channel_ref& operator &=(type value) noexcept;
    constexpr friend type operator &(channel_ref lhs, type rhs) noexcept;
    constexpr channel_ref& operator |=(type value) noexcept;
    constexpr friend type operator |(channel_ref lhs, type rhs) noexcept;
    constexpr channel_ref& operator++( ) noexcept;
    constexpr channel_ref& operator++(int) noexcept;
};

template <class Vec>
struct channel_ptr
{
    constexpr channel_ptr( ) noexcept;
    constexpr channel_ptr(const channel_ref<Vec>& ref) noexcept;
    constexpr channel_ptr(const channel_ptr&) noexcept = default;
    constexpr channel_ptr(channel_ptr&&) noexcept = default;
    constexpr channel_ptr& operator=(const channel_ptr&) noexcept = default;
    constexpr channel_ptr& operator=(channel_ptr&&) noexcept = default;

    using type = remove_attrs_t<vector_element_t<Vec>>;

    constexpr channel_ref<Vec>& operator*( ) noexcept;
};

template <class Vec>
struct vector_iterator : iterator<random_access_iterator_tag,
                          remove_attrs_t<vector_element_t<remove_attrs_t<Vec>>>,
                          ptrdiff_t,
                          channel_ptr<remove_attrs_t<Vec>>,
                          channel_ref<remove_attrs_t<Vec>>>
{
    using type = remove_attrs_t<Vec>;

    constexpr vector_iterator(type & vector, size_t offset) noexcept;
    constexpr vector_iterator( ) noexcept = default;
    constexpr vector_iterator(const vector_iterator&) noexcept = default;
    constexpr vector_iterator(vector_iterator&&) noexcept = default;
    constexpr vector_iterator& operator=(
                                     const vector_iterator&) noexcept = default;
    constexpr vector_iterator& operator=(vector_iterator&&) noexcept = default;

    constexpr vector_iterator& operator+=(difference_type value) noexcept;
    constexpr friend vector_iterator operator+(const vector_iterator& lhs,
                                               difference_type rhs) noexcept;
    constexpr friend vector_iterator operator+(difference_type lhs,
                                           const vector_iterator& rhs) noexcept;
    constexpr vector_iterator& operator-=(difference_type value) noexcept;
    constexpr friend vector_iterator operator-(const vector_iterator& lhs,
                                                  difference_type rhs) noexcept;

    constexpr vector_iterator operator++(int) noexcept;
    constexpr vector_iterator& operator++( ) noexcept;
    constexpr vector_iterator operator--(int) noexcept;
    constexpr vector_iterator& operator--( ) noexcept;

    friend constexpr bool operator ==(const vector_iterator& lhs,
                                      const vector_iterator& rhs) noexcept;
    friend constexpr bool operator !=(const vector_iterator& lhs,
                                      const vector_iterator& rhs) noexcept;
    friend constexpr bool operator <(const vector_iterator& lhs,
                                     const vector_iterator& rhs) noexcept;
    friend constexpr bool operator <=(const vector_iterator& lhs,
                                      const vector_iterator& rhs) noexcept;
    friend constexpr bool operator >(const vector_iterator& lhs,
                                     const vector_iterator& rhs) noexcept;
    friend constexpr bool operator >=(const vector_iterator& lhs,
                                      const vector_iterator& rhs) noexcept;

    constexpr reference operator[ ](difference_type value) noexcept;
    constexpr reference operator*( ) noexcept;

    constexpr pointer operator->( ) noexcept;
};

template <class Vec, class = enable_if_t<is_vector_type<Vec>::value, void>>
constexpr channel_ref<Vec> index(Vec& vector, size_t channel) noexcept;

template <class Vec, class = enable_if_t<is_vector_type<Vec>::value, void>>
constexpr vector_iterator<Vec> begin(Vec & vector) noexcept;

template <class Vec, class = enable_if_t<is_vector_type<Vec>::value, void>>
constexpr vector_iterator<Vec> end(Vec & vector) noexcept;

}

3.9.2. Vector iterator

Vector iterator is a random access iterator that allows runtime iteration over vector channels. Meets all the requirements for random access iterator. Iterating outside of vector bounds is an undefined behavior.

The library also exposes non member begin and end functions for vectors.

Note
Due to the usage of argument-dependent lookup in range based for loops this functions are not available, and the new range adapter has to be used

There is also an index function present in the library that allows runtime numerical indexing of channels. It returns a channel reference to a given channel number. Indexing out of vector bounds results in undefined behavior.

The following example will present simple template function computing sum of channels of a given vector:

template<class V>
auto sum(const V& v) {
    vector_element_t<V> temp = 0;
    for(auto e : range(v)) {
        temp += e;
    }
    return temp;
}

3.9.3. Channel reference and channel pointer

channel_ref and channel_ptr classes provide lightweight reference and pointer wrappers for vector channels. This is required due to the fact that vector channels can be moved across memory during execution so direct physical addressing is impossible. Reference wrapper provides a set of binary operators (depending on vector channel type).

The following example will present a simple usage of channel reference to set first channel of given vector to 0:

template <class V>
void fun(V& v) {
    channel_ref<V> r = *begin(v);
    r = 0;
}

3.9.4. Get and set functions

Get and set functions allow compile time numerical indexing of channels to substitute for normal swizzling. Indexing out of vector range generates a compile error. Get function returns a copy of channel value.

The following example will present how get and set functions can be used to duplicate the value of the first channel of given vector:

template <class V>
void fun(V& v) {
    auto c = get< 0 >(v);
    set< 0 >(v, 2*c);
}

3.9.5. Examples

Example 1

Example of using built-in vector iterators.

#include <opencl_vector_utility>
#include <opencl_range>
using namespace cl;

kernel void foo() {
    int8 v_i8;
    auto iter_begin = begin(v_i8);
    auto iter_end = end(v_i8);

    iter_begin = iter_end;

    int a = 0;
    a = *iter_begin;
    a = iter_begin[0];

    iter_begin++;
    iter_begin+=1;
    iter_begin = iter_begin + 1;

    iter_begin--;
    iter_begin-=1;
    iter_begin = iter_begin - 1;
}
Example 2

Example of iterating though built-in vector channels and using range library.

#include <opencl_vector_utility>
#include <opencl_range>

kernel void foo() {
    int16 a;

    for (auto it = cl::begin(a); it != cl::end(a); it++) {
        int b = *it;
        *it = 2;
    }

    for (auto c : cl::range(a,3,6)) {
        int b = c;
        c = 2;
    }
}

3.10. Marker Types

Some types in OpenCL C++ are considered marker types. These types are special in the manner that their usages need to be tracked by the compiler. This results in the following set of restrictions that marker types have to follow:

  • Marker types have the default constructor deleted.

  • Marker types have all default copy and move assignment operators deleted.

  • Marker types have address-of operator deleted.

  • Marker types cannot be used in divergent control flow. It can result in undefined behavior.

  • Size of marker types is undefined.

3.10.1. Header <opencl_marker> Synopsis

namespace cl
{
struct marker_type;

template<class T>
struct is_marker_type;

}

3.10.2. marker_type class

All special OpenCL C++ types must use the marker_type class as a base class.

3.10.3. is_marker_type type trait

is_marker_type type trait provides compile-time check if the base of a class is marker_type.

namespace cl
{
template<class T>
struct is_marker_type : integral_constant <bool, is_base_of<marker_type, T>::value> { };

}

3.10.4. Examples

Example 1

The examples of invalid use of marker types.

#include <opencl_image>
#include <opencl_work_item>
using namespace cl;

float4 bar(image1d<float4> img) {
    return img.read({get_global_id(0), get_global_id(1)});
}

kernel void foo(image1d<float4> img1, image1d<float4> img2) {
    image1d<float4> img3; //error: marker type cannot be declared
                          //       in the kernel
    img1 = img2; //error: marker type cannot be assigned
    image1d<float4> *imgPtr = &img1; //error: taking address of
                                     //       marker type

    size_t s = sizeof(img1); //undefined behavior: size of marker
                             //               type is not defined

    float4 val = bar(get_global_id(0) ? img1: img2);
                          //undefined behavior: divergent control flow
}
Example 2

The examples of how to use is_marker_type trait.

#include <opencl_image>
using namespace cl;

kernel void foo(image1d<float4> img) {
  static_assert(is_marker_type<decltype(img)>(), "");
}

3.11. Images and Samplers Library

This section describes the image and sampler types and functions that can be used to read from and/or write to an image. image1d, image1d_buffer, image1d_array, image2d, image2d_array, image3d, image2d_depth, image2d_array_depth, image2d_ms, image2d_array_ms, image2d_depth_ms, image2d_array_depth_ms [13] and sampler follow the rules for marker types (Marker Types section). The image and sampler types can only be used if the device support images i.e. CL_DEVICE_IMAGE_SUPPORT as described in table 4.3 in OpenCL 2.2 specification is CL_TRUE.

3.11.1. Image and Sampler Host Types

The below table describes the OpenCL image and sampler data types and the corresponding data type available to the application:

Table 10. Host image and sampler types
Type in OpenCL C++ API type for application

cl::image1d,

cl::image1d_buffer,

cl::image1d_array,

cl::image2d,

cl::image2d_array,

cl::image3d,

cl::image2d_depth,

cl::image2d_array_depth,

cl::image2d_ms,

cl::image2d_array_ms,

cl::image2d_depth_ms,

cl::image2d_array_depth_ms

cl_image

cl::sampler

cl_sampler

3.11.2. Header <opencl_image> Synopsis

namespace cl
{
enum class image_access;
enum class image_channel_type;
enum class image_channel_order;
enum class addressing_mode;
enum class normalized_coordinates;
enum class filtering_mode;

struct sampler;

template <addressing_mode A, normalized_coordinates C, filtering_mode F>
constexpr sampler make_sampler();

template <class T, image_access A, image_dim Dim, bool Depth, bool Array,
          bool MS>
struct image;

template <class T, image_access A = image_access::read>
using image1d = image<T, A, image_dim::image_1d, false, false, false>;

template <class T, image_access A = image_access::read>
using image1d_buffer = image<T, A, image_dim::image_buffer, false, false,
                             false>;

template <class T, image_access A = image_access::read>
using image1d_array = image<T, A, image_dim::image_1d, false, true, false>;

template <class T, image_access A = image_access::read>
using image2d = image<T, A, image_dim::image_2d, false, false, false>;

template <class T, image_access A = image_access::read>
using image2d_depth = image<T, A, image_dim::image_2d, true, false, false>;

template <class T, image_access A = image_access::read>
using image2d_array = image<T, A, image_dim::image_2d, false, true, false>;

template <class T, image_access A = image_access::read>
using image3d = image<T, A, image_dim::image_3d, false, false, false>;

template <class T, image_access A = image_access::read>
using image2d_array_depth = image<T, A, image_dim:: image_2d, true, true,
                                  false>;

#if defined(cl_khr_gl_msaa_sharing) && defined(cl_khr_gl_depth_images)
template <class T, image_access A = image_access::read>
using image2d_ms = image<T, A, image_dim::image_2d, false, false, true>;

template <class T, image_access A = image_access::read>
using image2d_array_ms = image<T, A, image_dim::image_2d, false, true, true>;

template <class T, image_access A = image_access::read>
using image2d_depth_ms = image<T, A, image_dim::image_2d, true, false, true>;

template <class T, image_access A = image_access::read>
using image2d_array_depth_ms = image<T, A, image_dim::image_2d, true, true,
                                     true>;
#endif

}

Where T is the type of value returned when reading or sampling from given image or the type of color used to write to image.

3.11.3. image class

Every image type has the following set of publicly available members and typedefs:

template <class T, image_access A, image_dim Dim, bool Depth, bool Array,
          bool MS>
struct image: marker_type
{
    static constexpr image_dim dimension = Dim;
    static constexpr size_t dimension_num = image_dim_num<Dim>::value;
    static constexpr size_t image_size = dimension_num + (Array? 1: 0);
    static constexpr image_access access = A;
    static constexpr bool is_array = Array;
    static constexpr bool is_depth = Depth;
#if defined(cl_khr_gl_msaa_sharing) && defined(cl_khr_gl_depth_images)
    static constexpr bool is_ms = MS;
#else
    static constexpr bool is_ms = false;
#endif
    typedef element_type T;
    typedef integer_coord make_vector_t<int, image_size>;
    typedef float_coord make_vector_t<float, image_size>;

#ifdef cl_khr_mipmap_image
    typedef gradient_coord make_vector_t<float, dimension_num>;
#endif

    struct pixel;

    image() = delete;
    image(const image&) = default;
    image(image&&) = default;

    image& operator=(const image&) = delete;
    image& operator=(image&&) = delete;
    image* operator&() = delete;
};

3.11.4. Image element types

We can classify images into four categories: depth images which have the Depth template parameter set to true, multi-sample depth images which have the Depth and MS template parameters set to true, multi-sample which have the MS template parameter set to true, and the normal images which have the Depth and MS template parameters set to false.

  • For non-multisample depth images the only valid element types are: float and half [4]

  • For normal images the only valid element types are: float4, half4 [4], int4 and uint4

  • For multi-sample 2D and multi-sample 2D array images the only valid element types are: float4, int4 and uint4

  • For multi-sample 2D depth and multi-sample 2D array depth images the only valid element type is: float

Image type with invalid pixel type is ill formed.

3.11.5. Image dimension

namespace cl
{
enum class image_dim
{
    image_1d,
    image_2d,
    image_3d,
    image_buffer
};

template <image_dim Dim>
struct image_dim_num;

}

Image types present different set of methods depending on their dimensionality and arrayness.

  • Images of dimension 1 (image_dim::image_1d and image_dim::buffer) have method:

    int width() const noexcept;
  • Images of dimension 2 (image_dim::image_2d) have all methods of 1 dimensional images and

    int height() const noexcept;
  • Images of dimension 3 (image_dim::image_3d) have all methods of 2 dimensional images and

    int depth() const noexcept;
  • Arrayed images have additional method

    int array_size() const noexcept;

If cl_khr_mipmap_image or cl_khr_mipmap_image_writes extension is enabled then the following methods are also present:

  • Images of dimension 1 (image_dim::image_1d and image_dim::buffer) have method:

    int width(int lod) const noexcept;
  • Images of dimension 2 (image_dim::image_2d) have all methods of 1 dimensional images and

    int height(int lod) const noexcept;
  • Images of dimension 3 (image_dim::image_3d) have all methods of 2 dimensional images and

    int depth(int lod) const noexcept;
  • Arrayed images have additional method

    int array_size(int lod) const noexcept;

If cl_khr_gl_msaa_sharing and cl_khr_gl_depth_images extensions are enabled then the following methods are also present:

  • Images of dimension 2D (image_dim::image_2d) have method:

    int num_samples() const noexcept;

The following table describes the image_dim_num trait that return a number of dimensions based on image_dim parameter.

Table 11. Image_dim_num trait
Template Value

template <image_dim Dim> struct image_dim_num;

If Dim is image_dim::image_1d or image_dim::image_buffer, image dimension is 1.

If Dim is image_dim::image_2d, image dimension is 2.

If Dim is image_dim::image_3d, image dimension is 3.

3.11.6. Image access

namespace cl
{
enum class image_access
{
    sample,
    read,
    write,
    read_write
};

}

The non-multisample image template class specializations present different set of methods based on their access parameter.

  • Images specified with image_access::read provide additional methods:

    element_type image::read(integer_coord coord) const noexcept;
    
    pixel image::operator[](integer_coord coord) const noexcept;
    
    element_type image::pixel::operator element_type() const noexcept;
  • Images specified with image_access::write provide additional method:

    void image::write(integer_coord coord, element_type color) noexcept;
    
    image::pixel image::operator[](integer_coord coord) noexcept;
    
    image::pixel & image::pixel::operator=(element_type color) noexcept;
  • Images specified with image_access::read_write provide additional methods:

    element_type image::read(integer_coord coord) const noexcept;
    
    void image::write(integer_coord coord, element_type color) noexcept;
    
    image::pixel image::operator[](integer_coord coord) noexcept;
    
    element_type image::pixel::operator element_type() const noexcept;
    
    image::pixel & image::pixel::operator=(element_type color) noexcept;
  • Images specified with image_access::sample provide additional methods:

    element_type image::read(integer_coord coord) const noexcept;
    
    element_type image::sample(const sampler &s,
                               integer_coord coord) const noexcept;
    
    element_type image::sample(const sampler &s, float_coord coord) const noexcept;
    
    image::pixel image::operator[](integer_coord coord) const noexcept;
    
    element_type image::pixel::operator element_type() const noexcept;

If cl_khr_mipmap_image extension is enabled the following methods are added to the non-multisample image types:

  • Images specified with image_access::sample provide additional methods:

    element_type image::sample(const sampler &s, float_coord coord,
                               float lod) const noexcept;
    
    element_type image::sample(const sampler &s, float_coord coord,
                               gradient_coord gradient_x,
                               gradient_coord gradient_y) const noexcept;

If cl_khr_mipmap_image_writes extension is enabled the following methods are added to the non-multisample image types:

  • Images specified with image_access::write provide additional method:

    void image::write(integer_coord coord, element_type color, int lod) noexcept;

If cl_khr_gl_msaa_sharing and cl_khr_gl_depth_images extensions are enabled and the multisample image type is used, the following method is available:

  • The multisample images specified with image_access::read provide method:

    element_type image::read(integer_coord coord, int sample) noexcept;

3.11.7. Common image methods

Each image type implements a set of common methods:

image_channel_type image::data_type() const noexcept;
image_channel_order image::order() const noexcept;

If cl_khr_mipmap_image or cl_khr_mipmap_image_writes extension is enabled then the following method is also present in the non-multisample image types:

int image::miplevels() const noexcept;

where image_channel_type and image_channel_order are defined as follows:

namespace cl
{
enum class image_channel_type
{
    snorm_int8,
    snorm_int16,
    unorm_int8,
    unorm_int16,
    unorm_int24,
    unorm_short_565,
    unorm_short_555,
    unorm_int_101010,
    unorm_int_101010_2,
    signed_int8,
    signed_int16,
    signed_int32,
    unsigned_int8,
    unsigned_int16,
    unsigned_int32,
    fp16,
    fp32
};

enum class image_channel_order
{
    a,
    r,
    rx,
    rg,
    rgx,
    ra,
    rgb,
    rgbx,
    rgba,
    argb,
    bgra,
    intensity,
    luminance,
    abgr,
    depth,
    depth_stencil,
    srgb,
    srgbx,
    srgba,
    sbgra
};

}

3.11.8. Other image methods

image::sample
element_type image::sample(const sampler &s, float_coord coord) const noexcept;

Reads a color value from the non-multisample image using sampler and floating point coordinates.

element_type image::sample(const sampler &s, integer_coord coord) const noexcept;

Reads a color value from non-multisample image using sampler and integer coordinates.

A sampler must use filter mode set to filtering_mode::nearest, normalized coordinates and addressing mode set to addressing_mode::clamp_to_edge, addressing_mode::clamp, addressing_mode::none, otherwise the values returned are undefined.

element_type image::sample(const sampler &s, float_coord coord, float lod) const noexcept;

Reads a color value from non-multisample image using sampler and floating point coordinates in the mip-level specified by lod.

Method is present for non-multisample images if cl_khr_mipmap_image extension is enabled.

element_type image::sample(const sampler &s, float_coord coord,
                           gradient_coord gradient_x,
                           gradient_coord gradient_y) const noexcept;

Use the gradients to compute the lod and coordinate coord to do an element lookup in the mip-level specified by the computed lod.

Method is present if cl_khr_mipmap_image extension is enabled.

Based on the parameters with which image was created on host side the function will return different ranges of values

  • returns floating-point values in the range [0.0 … 1.0] for image objects created with image_channel_type set to one of the pre-defined packed formats or image_channel_type::unorm_int8 or image_channel_type::unorm_int16.

  • returns floating-point values in the range [-1.0 … 1.0] for image objects created with image_channel_type::snorm_int8 or image_channel_type::snorm_int16.

  • returns floating-point values for image objects created with image_channel_type::float16 or image_channel_type::float32.

Values returned by image::sample where T is a floating-point type for image objects with image_channel_type values not specified in the description above are undefined.

The image::sample functions that take an image object where T is a signed integer type can only be used with image objects created with:

  • image_channel_type::sint8,

  • image_channel_type::sint16 and

  • image_channel_type::sint32.

If the image_channel_type is not one of the above values, the values returned by image::sample are undefined.

The image::sample functions that take an image object where T is an unsigned integer type can only be used with image objects created with:

  • image_channel_type::uint8,

  • image_channel_type::uint16 and

  • image_channel_type::uint32.

If the image_channel_type is not one of the above values, the values returned by image::sample are undefined.

image::read
element_type image::read(integer_coord coord) const noexcept;

Reads a color value from non-multisample image without sampler and integral coordinates. If cl_khr_mipmap_image extension is present may perform reads also from mipmap layer 0.

Based on the parameters with which image was created on host side the function will return different ranges of values

Read function behaves exactly as the corresponding image sample function with sampler that has filter mode set to filtering_mode::nearest, normalized coordinates set to normalized_coordinates::unnormalized and addressing mode to addressing_mode::none. The coordinates must be between 0 and image size in that dimension non inclusive.

  • returns floating-point values in the range [0.0 … 1.0] for image objects created with image_channel_type set to one of the pre-defined packed formats or image_channel_type::unorm_int8 or image_channel_type::unorm_int16.

  • returns floating-point values in the range [-1.0 … 1.0] for image objects created with image_channel_type::snorm_int8 or image_channel_type::snorm_int16.

  • returns floating-point values for image objects created with image_channel_type::float16 or image_channel_type::float32.

Values returned by image::read where T is a floating-point type for image objects with image_channel_type values not specified in the description above are undefined.

The image::read functions that take an image object where T is a signed integer type can only be used with image objects created with:

  • image_channel_type::sint8,

  • image_channel_type::sint16 and

  • image_channel_type::sint32.

If the image_channel_type is not one of the above values, the values returned by image::read are undefined.

The image::read functions that take an image object where T is an unsigned integer type can only be used with image objects created with image_channel_type set to one of the following values:

  • image_channel_type::uint8,

  • image_channel_type::uint16 and

  • image_channel_type::uint32.

If the image_channel_type is not one of the above values, the values returned by image::read are undefined.

element_type image::read(integer_coord coord, int sample) noexcept;

Use the coordinate and sample to do an element lookup in the image object. Method is only available in the MSAA image types and if cl_khr_gl_msaa_sharing and cl_khr_gl_depth_images extension are supported.

When a multisample image is accessed in a kernel, the access takes one vector of integers describing which pixel to fetch and an integer corresponding to the sample numbers describing which sample within the pixel to fetch. sample identifies the sample position in the multi-sample image.

For best performance, we recommend that sample be a literal value so it is known at compile time and the OpenCL compiler can perform appropriate optimizations for multisample reads on the device.

No standard sampling instructions are allowed on the multisample image. Accessing a coordinate outside the image and/or a sample that is outside the number of samples associated with each pixel in the image is undefined.

image::write
void image::write(integer_coord coord, element_type color) noexcept;

Writes a color value to location specified by coordinates from non-multisample image. If cl_khr_mipmap_image_writes extension is present may perform writes also to the mipmap layer 0. The coordinates must be between 0 and image size in that dimension non inclusive.

Based on the parameters with which image was created on host side the function will perform appropriate data format conversions before writing a color value.

void image::write(integer_coord coord, element_type color, int lod) noexcept;

Writes a color value to location specified by coordinates and lod from mipmap image. The coordinates must be between 0 and image size in that dimension non inclusive.

Method is present if cl_khr_mipmap_image extension is enabled.

Based on the parameters with which image was created on host side the function will perform appropriate data format conversions before writing a color value.

The image::write functions that take an image object where T is a floating-point type can only be used with image objects created with image_channel_type set to one of the pre-defined packed formats or set to:

  • image_channel_type::snorm_int8

  • image_channel_type::unorm_int8

  • image_channel_type::snorm_int16

  • image_channel_type::unorm_int16

  • image_channel_type::float16

  • image_channel_type::float32

The image::write functions that take an image object where T is a signed integer type can only be used with image objects created with:

  • image_channel_type::sint8

  • image_channel_type::sint16

  • image_channel_type::sint32

The image::write functions that take an image object where T is an unsigned integer type can only be used with image objects created with:

  • image_channel_type::uint8

  • image_channel_type::uint16

  • image_channel_type::uint32

The behavior of image::write for image objects created with image_channel_type values not specified in the description above is undefined.

image::operator[]
pixel operator[](integer_coord coord) noexcept;

pixel operator[](integer_coord coord) const noexcept;

Creates a pixel which can be used to read or/and write operation(s). It depends on image_access specified in the image.

Note
The pixel stores a reference to image and coordinates. This operation can consume more private memory than image::read and image::write methods. It can also negatively impact performance.
image::pixel::operator element_type
element_type pixel::operator element_type() const noexcept;

Reads a color value from non-multisample image without sampler and integral coordinates specified in pixel. If cl_khr_mipmap_image extension is present may perform reads also from mipmap layer 0.

This function is similar to image::read method. Please refer to description of this method for more details.

image::pixel::operator=
pixel & pixel::operator=(element_type color) noexcept;

Writes a color value to location specified by coordinates in pixel from non-multisample image. If cl_khr_mipmap_image_writes extension is present may perform writes also to the mipmap layer 0. The coordinates specified in pixel must be between 0 and image size in that dimension non inclusive.

Based on the parameters with which image was created on host side the function will perform appropriate data format conversions before writing a color value.

This function is similar to image::write method. Please refer to description of this method for more details.

image::width
int width() const noexcept;

Returns width of the image.

int width(int lod) const noexcept;

Returns width of the mip-level specified by lod.

Method is present in the non-multisample image types if cl_khr_mipmap_image extension is enabled.

image::height
int height() const noexcept;

Returns height of the image.

int height(int lod) const noexcept;

Returns height of the mip-level specified by lod.

Method is present in the non-multisample image types if cl_khr_mipmap_image extension is enabled.

image::depth
int depth() const noexcept;

Returns depth of the image.

int depth(int lod) const noexcept;

Returns depth of the mip-level specified by lod.

Method is present in the non-multisample image types if cl_khr_mipmap_image extension is enabled.

image::array_size
int array_size() const noexcept;

Returns size of the image array.

int array_size(int lod) const noexcept;

Returns size of the image array specified by lod.

Method is present in the non-multisample image types if cl_khr_mipmap_image extension is enabled.

image::size
integer_coord size() const noexcept;

Returns appropriately sized vector, or scalar for 1 dimensional images, containing all image dimensions followed by array size.

image::data_type
image_channel_type image::data_type() const noexcept;

Returns format of the image as specified upon its creation on host side.

image::order
image_channel_order image::order() const noexcept;

Returns channel order of the image as specified upon its creation on host side.

image::miplevels
int miplevels() const noexcept;

Returns number of mipmaps of image. Method is present if cl_khr_mipmap_image or cl_khr_mipmap_image_writes extension is enabled.

image::num_samples
int num_samples() const noexcept;

3.11.9. Sampler

namespace cl
{
struct sampler: marker_type
{
    sampler() = delete;
    sampler(const sampler&) = default;
    sampler(sampler&&) = default;

    sampler& operator=(const sampler&) = delete;
    sampler& operator=(sampler&&) = delete;
    sampler* operator&() = delete;

};

template <addressing_mode A, normalized_coordinates C, filtering_mode F>
constexpr sampler make_sampler();

}

There are only two ways of acquiring a sampler inside of a kernel. One is to pass it as a kernel parameter from host using clSetKernelArg, the other one is to create one using make_sampler function in the kernel code. make_sampler function has three template parameters specifying behavior of sampler. Once acquired sampler can only be passed by reference as all other marker types. The sampler objects at non-program scope must be declared with static specifier.

The maximum number of samplers that can be declared in a kernel can be queried using the CL_DEVICE_MAX_SAMPLERS token in clGetDeviceInfo.

3.11.10. Sampler Modes

namespace cl
{
enum class addressing_mode
{
    mirrored_repeat,
    repeat,
    clamp_to_edge,
    clamp,
    none
};

enum class normalized_coordinates
{
    normalized,
    unnormalized
};

enum class filtering_mode
{
    nearest,
    linear
};

}

The following tables describe the inline sampler parameters and their behavior.

Table 12. Addressing modes
Addressing mode Description

mirrored_repeat

Out of range coordinates will be flipped at every integer junction. This addressing mode can only be used with normalized coordinates. If normalized coordinates are not used, this addressing mode may generate image coordinates that are undefined.

repeat

Out of range image coordinates are wrapped to the valid range. This addressing mode can only be used with normalized coordinates. If normalized coordinates are not used, this addressing mode may generate image coordinates that are undefined.

clamp_to_edge

Out of range image coordinates are clamped to the extent.

clamp

Out of range image coordinates will return a border color.

none

For this addressing mode the programmer guarantees that the image coordinates used to sample elements of the image refer to a location inside the image; otherwise the results are undefined.

For 1D and 2D image arrays, the addressing mode applies only to the x and (x, y) coordinates. The addressing mode for the coordinate which specifies the array index is always clamp_to_edge.

Table 13. Normalized coordinates
Normalized Coordinate Values Description

normalized

Specifies whether the x, y and z coordinates are passed in as normalized values.

unnormalized

Specifies whether the x, y and z coordinates are passed in as unnormalized values.

Sampling from an image with samplers that differ in specification of coordinates normalization result in undefined behavior.

Table 14. Coordinate filtering modes
Filtering mode Description

nearest

Chooses a color of nearest pixel.

linear

Performs a linear sampling of adjacent pixels.

Refer to section 4.2 in the OpenCL API specification for a description of these filter modes.

3.11.11. Determining the border color or value

If <addressing mode> in sampler is clamp, then out-of-range image coordinates return the border color. The border color selected depends on the image channel order and can be one of the following values:

  • If the image channel order is image_channel_order::a, image_channel_order::intensity, image_channel_order::rx, image_channel_order::ra, image_channel_order::rgx, image_channel_order::rgbx, image_channel_order::srgbx, image_channel_order::argb, image_channel_order::bgra, image_channel_order::abgr, image_channel_order::rgba, image_channel_order::srgba or image_channel_order::sbgra, the border color is (0.0f, 0.0f, 0.0f, 0.0f).

  • If the image channel order is image_channel_order::r, image_channel_order::rg, image_channel_order::rgb, or image_channel_order::luminance, the border color is (0.0f, 0.0f, 0.0f, 1.0f).

  • If the image channel order is image_channel_order::depth, the border value is 0.0f.

3.11.12. sRGB Images

The built-in image read functions perform sRGB to linear RGB conversions if the image is an sRGB image. Writes to sRGB images from a kernel is an optional extension. The cl_khr_srgb_image_writes extension will be reported in the CL_DEVICE_EXTENSIONS string if a device supports writing to sRGB images using image::write. clGetSupportedImageFormats will return the supported sRGB images if CL_MEM_READ_WRITE or CL_MEM_WRITE_ONLY is specified in flags argument and the device supports writing to an sRGB image. If the cl_khr_srgb_image_writes extension is supported and has been enabled, the built-in image write functions perform the linear to sRGB conversion.

Only the R, G and B components are converted from linear to sRGB and vice-versa. The alpha component is returned as is.

3.11.13. Reading and writing to the same image in a kernel

To read and write to the same image in a kernel, the image must be declared with the image_access::read_write. Only sampler-less reads and write functions can be called on an image declared with the image_access::read_write access qualifier. Calling the image::sample functions on an image declared with the image_access::read_write will result in a compilation error.

The atomic_fence function from Atomic Fences section can be used to make sure that writes are visible to later reads by the same work-item. Without use of the atomic_fence function, write-read coherence on image objects is not guaranteed: if a work-item reads from an image to which it has previously written without an intervening atomic_fence, it is not guaranteed that those previous writes are visible to the work-item. Only a scope of memory_order_acq_rel is valid for atomic_fence when passed the mem_fence::image flag. If multiple work-items are writing to and reading from multiple locations in an image, the work_group_barrier from Synchronization Functions section should be used.

Consider the following example:

#include <opencl_work_item>
#include <opencl_atomic>
#include <opencl_image>
using namespace cl;

kernel void foo(image2d<float4, image_access::read_write> img, ... ) {
    int2 coord;
    coord.x = (int)get_global_id(0);
    coord.y = (int)get_global_id(1);

    float4 clr = img.read(coord);
    //...
    img.write(coord, clr);

    // required to ensure that following read from image at
    // location coord returns the latest color value.
    atomic_fence(mem_fence::image,
     memory_order_acq_rel,
     memory_scope_work_item);

    float4 clr_new = img.read(coord);
    //...
}

3.11.14. Mapping image channels to color values returned by image::sample, image::read and color values passed to image::write to image channels

The following table describes the mapping of the number of channels of an image element to the appropriate components in the float4, int4 or uint4 vector data type for the color values returned by image::sample, image::read or supplied to image::write. The unmapped components will be set to 0.0 for red, green and blue channels and will be set to 1.0 for the alpha channel.

Table 15. Image channel mappings
Image Channel Order float4, int4 or uint4 components of channel data

r, rx

(r, 0.0, 0.0, 1.0)

a

(0.0, 0.0, 0.0, a)

rg, rgx

(r, g, 0.0, 1.0)

ra

(r, 0.0, 0.0, a)

rgb, rgbx, srgb, srgbx

(r, g, b, 1.0)

rgba, bgra, argb, abgr, srgba, sbgra

(r, g, b, a)

intensity

(I, I, I, I)

luminance

(L, L, L, 1.0)

For image_channel_order::depth images, a scalar value is returned by image::sample, image::read or supplied to image::write.

Note
A kernel that uses a sampler with the clamp addressing mode with multiple images may result in additional samplers being used internally by an implementation. If the same sampler is used with multiple images called via image::sample, then it is possible that an implementation may need to allocate an additional sampler to handle the different border color values that may be needed depending on the image formats being used. The implementation allocated samplers will count against the maximum sampler values supported by the device and given by CL_DEVICE_MAX_SAMPLERS. Enqueuing a kernel that requires more samplers than the implementation can support will result in a CL_OUT_OF_RESOURCES error being returned.

3.11.15. Restrictions

  • The image and sampler types cannot be used with variables declared inside a class or union field, a pointer type, an array, global variables declared at program scope or the return type of a function.

  • The image and sampler types cannot be used with the global, local, priv and constant address space storage classes (Explicit address space storage classes section).

  • The values returned by applying the sizeof operator to the image and sampler types are implementation-defined.

3.11.16. Examples

Example 1

The example how to use an image object with sampler-less reads.

#include <opencl_image>
#include <opencl_work_item>
using namespace cl;

kernel void foo(image2d<float4, image_access::read> img) {
    int2 coord;
    coord.x = get_global_id(0);
    coord.y = get_global_id(1);

    float4 val = img.read(coord);
}
Example 2

The example how to use an image object with image_access::read_write access and atomic_fence function.

#include <opencl_image>
#include <opencl_atomic>
#include <opencl_work_item>
using namespace cl;

kernel void foo(image2d<float4, image_access::read_write> img) {
    int2 coord;
    coord.x = get_global_id(0);
    coord.y = get_global_id(1);

    float4 val1{0.5f};
    img[coord] = val1;

    atomic_fence(mem_fence::image, memory_order_acq_rel,
                 memory_scope_work_item);

    float4 val2 = img[coord];
}
Example 3

The example how to use an image object with sampler passed by a kernel argument.

#include <opencl_image>
#include <opencl_work_item>
using namespace cl;

kernel void foo(image2d<float4, image_access::sample> img, sampler s) {
    int2 coord;
    coord.x = get_global_id(0);
    coord.y = get_global_id(1);

    float4 val = img.sample(s, coord);
}
Example 4

The example how to use an image object with sampler declared at program scope.

#include <opencl_image>
#include <opencl_work_item>
using namespace cl;

sampler s = make_sampler<addressing_mode::clamp,
                         normalized_coordinates::unnormalized,
                         filtering_mode::nearest>();

kernel void foo(image2d<float4, image_access::sample> img) {
    int2 coord;
    coord.x = get_global_id(0);
    coord.y = get_global_id(1);

    float4 val = img.sample(s, coord);
}
Example 5

The example how to use an image object with sampler declared at non-program scope.

#include <opencl_image>
#include <opencl_work_item>
using namespace cl;

kernel void foo(image2d<float4, image_access::sample> img) {
    int2 coord;
    coord.x = get_global_id(0);
    coord.y = get_global_id(1);

    static sampler s = make_sampler<addressing_mode::clamp,
                                    normalized_coordinates::unnormalized,
                                    filtering_mode::nearest>();

    float4 val = img.sample(s, coord);
}

3.12. Pipes Library

Header <opencl_pipe> defines pipe and pipe_storage template classes. pipe and pipe_storage can be used as a communication channel between kernels. pipe, reservation and pipe_storage template classes follow all the rules for marker types as specified in Marker Types section.

3.12.1. Pipe Host Type

The below describes the OpenCL pipe data type and the corresponding data type available to the application:

Table 16. Host pipe type
Type in OpenCL C++ API type for application

cl::pipe

cl_pipe

3.12.2. Header <opencl_pipe> Synopsis

namespace cl
{
enum class pipe_access { read, write };

template <class T, pipe_access Access = pipe_access::read>
struct pipe;

template <class T, size_t N>
struct pipe_storage;

template<pipe_access Access = pipe_access::read, class T, size_t N>
pipe<T, Access> make_pipe(const pipe_storage<T, N>& ps);

}

3.12.3. pipe class specializations

pipe class has two distinct specializations depending on pipe_access parameter defined as follows:

namespace cl
{
template <class T, pipe_access Access = pipe_access::read>
struct pipe: marker_type
{
    typedef T element_type;
    static constexpr pipe_access access = Access;

    template<memory_scope S>
    struct reservation: marker_type
    {
        reservation() = delete;
        reservation(const reservation&) = default;
        reservation(reservation&&) = default;

        reservation& operator=(const reservation&) = delete;
        reservation& operator=(reservation&&) = delete;
        reservation* operator&() = delete;

        bool is_valid() const noexcept;
        bool read(uint index, T& ref) const noexcept;
        void commit() noexcept;

        explicit operator bool() const noexcept;
    };

    pipe() = delete;
    pipe(const pipe&) = default;
    pipe(pipe&&) = default;

    pipe& operator=(const pipe&) = delete;
    pipe& operator=(pipe&&) = delete;
    pipe* operator&() = delete;

    bool read(T& ref) const noexcept;
    reservation<memory_scope_work_item> reserve(
                                               uint num_packets) const noexcept;
    reservation<memory_scope_work_group> work_group_reserve(
                                               uint num_packets) const noexcept;
    reservation<memory_scope_sub_group> sub_group_reserve(
                                               uint num_packets) const noexcept;

    uint num_packets() const noexcept;
    uint max_packets() const noexcept;
};

template <class T>
struct pipe<T, pipe_access::write>: marker_type
{
    typedef T element_type;
    static constexpr pipe_access access = pipe_access::write;

    template<memory_scope S>
    struct reservation: marker_type
    {
        reservation() = delete;
        reservation(const reservation &) = default;
        reservation(reservation &&) = default;

        reservation& operator=(const reservation &) noexcept = delete;
        reservation& operator=(reservation &&) noexcept = delete;
        reservation* operator&() = delete;

        bool is_valid() const noexcept;
        bool write(uint index, const T& ref) noexcept;
        void commit() noexcept;

        explicit operator bool() const noexcept;
    };

    pipe() = delete;
    pipe(const pipe&) = default;
    pipe(pipe&&) = default;

    pipe& operator=(const pipe&) = delete;
    pipe& operator=(pipe&&) = delete;
    pipe* operator&() = delete;

    bool write(const T& ref) noexcept;
    reservation<memory_scope_work_item> reserve(uint num_packets) noexcept;
    reservation<memory_scope_work_group> work_group_reserve(
                                                     uint num_packets) noexcept;
    reservation<memory_scope_sub_group> sub_group_reserve(
                                                     uint num_packets) noexcept;

    uint num_packets() const noexcept;
    uint max_packets() const noexcept;
};

}

3.12.4. pipe class methods

pipe::read
bool read(T& ref) const noexcept;

Read packet from pipe into ref.

Returns true if read is successful and false if the pipe is empty.

pipe::write
bool write(const T& ref) noexcept;

Write packet specified by ref to pipe. Returns true if write is successful and false if the pipe is full.

pipe::reserve
reservation reserve(uint num_packets) const noexcept;

reservation reserve(uint num_packets) noexcept;

Reserve num_packets entries for reading/writing from/to pipe. Returns a valid reservation if the reservation is successful.

The reserved pipe entries are referred to by indices that go from 0 … num_packets - 1.

pipe::work_group_reserve
reservation work_group_reserve(uint num_packets) const noexcept;

reservation work_group_reserve(uint num_packets) noexcept;

Reserve num_packets entries for reading/writing from/to pipe. Returns a valid reservation if the reservation is successful.

The reserved pipe entries are referred to by indices that go from 0 … num_packets - 1.

pipe::sub_group_reserve
reservation sub_group_reserve(uint num_packets) const noexcept;

reservation sub_group_reserve(uint num_packets) noexcept;

Reserve num_packets entries for reading/writing from/to pipe. Returns a valid reservation if the reservation is successful.

The reserved pipe entries are referred to by indices that go from 0 … num_packets - 1.

pipe::num_packets
uint num_packets() const noexcept;

Returns the current number of packets that have been written to the pipe, but have not yet been read from the pipe. The number of available entries in a pipe is a dynamic value. The value returned should be considered immediately stale.

pipe::max_packets
uint max_packets() const noexcept;

Returns the maximum number of packets specified when pipe was created.

pipe::reservation::read
bool pipe::reservation::read(uint index, T& ref) const noexcept;

Read packet from the reserved area of the pipe referred to by index into ref.

The reserved pipe entries are referred to by indices that go from 0 … num_packets - 1.

Returns true if read is successful and false otherwise.

pipe::reservation::write
bool pipe::reservation::write(uint index, const T& ref) noexcept;

Write packet specified by ref to the reserved area of the pipe referred to by index.

The reserved pipe entries are referred to by indices that go from 0 … num_packets - 1.

Returns true if write is successful and false otherwise.

pipe::reservation::commit
void pipe::reservation::commit() const noexcept;

void pipe::reservation::commit() noexcept;

Indicates that all reads/writes to num_packets associated with reservation are completed.

pipe::reservation::is_valid
bool pipe::reservation::is_valid();

Return true if reservation is a valid reservation and false otherwise.

pipe::reservation::operator bool
explicit pipe::reservation::operator bool() const noexcept;

Return true if reservation is a valid reservation and false otherwise.

3.12.5. pipe_storage class

The lifetime of pipe_storage objects is the same as a program where they were declared. The variables of such type are not shared across devices.

N in the pipe_storage template class specifies the maximum number of packets which can be held by an object.

namespace cl
{
template<class T, size_t N>
struct pipe_storage: marker_type
{
    pipe_storage();
    pipe_storage(const pipe_storage&) = default;
    pipe_storage(pipe_storage&&) = default;

    pipe_storage& operator=(const pipe_storage&) = delete;
    pipe_storage& operator=(pipe_storage&&) = delete;
    pipe_storage* operator&() = delete;

    template<pipe_access Access = pipe_access::read>
    pipe<T, Access> get() const noexcept
};

template<pipe_access Access = pipe_access::read, class T, size_t N>
pipe<T, Access> make_pipe(const pipe_storage<T, N>& ps);

}

3.12.6. pipe_storage class methods and make_pipe function

pipe_storage::get
template<pipe_access Access = pipe_access::read>
pipe<T, Access> get() noexcept;

Constructs a read only or write only pipe from pipe_storage object. One kernel can have only one pipe accessor associated with one pipe_storage object.

make_pipe
template<pipe_access Access = pipe_access::read, class T, size_t N>
pipe<T, Access> make_pipe(const pipe_storage<T, N>& ps);

Constructs a read only or write only pipe from pipe_storage object. One kernel can have only one pipe accessor associated with one pipe_storage object.

3.12.7. Operations ordering using reservations

The reservation::read and reservation::write pipe functions can be used to read from or write to a packet index. These functions can be used to read from or write to a packet index one or multiple times. If a packet index that is reserved for writing is not written to using the reservation::write method, the contents of that packet in the pipe are undefined. reservation::commit remove the entries reserved for reading from the pipe. reservation::commit ensures that the entries reserved for writing are all added in-order as one contiguous set of packets to the pipe.

There can only be CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS (refer to Table 4.3) reservations active (i.e. reservations that have been reserved but not committed) per work-item or work-group for a pipe in a kernel executing on a device.

Work-item based reservations made by a work-item are ordered in the pipe as they are ordered in the program. Reservations made by different work-items that belong to the same work-group can be ordered using the work-group barrier function. The order of work-item based reservations that belong to different work-groups is implementation defined.

Work-group based reservations made by a work-group are ordered in the pipe as they are ordered in the program. The order of work-group based reservations by different work-groups is implementation defined.

3.12.8. Requirements

Data

Template parameter T in pipe and pipe_storage class template denotes the data type stored in pipe. The type T must be a POD type i.e. satisfy is_pod<T>::value == true.

Work-group operations

All work-group specific functions must be encountered by all work items in a work-group executing the kernel with the same argument values, otherwise the behavior is undefined.

Sub-group operations

All sub-group specific functions must be encountered by all work items in a sub-group executing the kernel with the same argument values, otherwise the behavior is undefined.

3.12.9. Restrictions

pipe
  • The pipe type cannot be used with variables declared inside a class or union field, a pointer type, an array, global variables declared at program scope or the return type of a function.

  • A kernel cannot read from and write to the same pipe object.

  • The pipe type cannot be used with the global, local, priv and constant address space storage classes (Explicit address space storage classes section).

  • The value returned by applying the sizeof operator to the pipe type is implementation-defined.

reservation
  • The reservation type cannot be used with variables declared inside a class or union field, a pointer type, an array, global variables declared at program scope or the return type of a function.

  • The reservation type cannot be used with the global, local, priv and constant address space storage classes (Explicit address space storage classes section).

  • The value returned by applying the sizeof operator to the reservation type is implementation-defined.

The following behavior is undefined:

  • A kernel calls reservation::read or reservation::write with a valid reservation but with an index that is not a value from 0 … num_packets - 1 specified to the corresponding call to pipe::reserve, pipe::work_group_reserve or pipe::sub_group_reserve.

  • A kernel calls reservation::read or reservation::write with a reservation that has already been committed (i.e. a reservation::commit with this reservation has already been called).

  • The contents of the reserved data packets in the pipe are undefined if the kernel does not call reservation::write for all entries that were reserved by the corresponding call to pipe::reserve, pipe::work_group_reserve or pipe::sub_group_reserve.

  • Calls to reservation::read and reservation::commit or reservation::write and reservation::commit for a given reservation must be called by the same kernel that made the reservation using pipe::reserve, pipe::work_group_reserve or pipe::sub_group_reserve. The reservation cannot be passed to another kernel including child kernels.

pipe_storage
  • Variables of type pipe_storage can only be declared at program scope or with the static specifier.

  • The pipe_storage type cannot be used as a class or union field, a pointer type, an array or the return type of a function.

  • The pipe_storage type cannot be used with the global, local, priv and constant address space storage classes (Explicit address space storage classes section).

  • The value returned by applying the sizeof operator to the pipe_storage type is implementation-defined.

  • Variables of type pipe created from pipe_storage can only be declared inside a kernel function at kernel scope.

The following behavior is undefined:

  • A kernel cannot contain more than one pipe accessor made from one pipe_storage object. Otherwise behavior is undefined.

3.12.10. Examples

Example 1

Example of reading from a pipe object.

#include <opencl_pipe>
using namespace cl;

kernel void reader(pipe<int> p) {
    int val;
    if(p.read(val)) {
        //...
    }
}
Example 2

Example of writing to a pipe object.

#include <opencl_pipe>
using namespace cl;

kernel void writer(pipe<int, pipe_access::write> p) {
    //...
    int val;
    if(p.write(val)) {
        //...
    }
}
Example 3

Example of reading from a pipe object using reservations.

#include <opencl_pipe>
using namespace cl;

kernel void reader(pipe<int, pipe_access::read> p) {
    int val;
    auto rid = p.reserve(1);
    if(rid.read(0, val)) {
        //...
    }
    rid.commit();
}
Example 4

Example of using a pipe_storage object and how to create the pipe objects/accessors from it.

#include <opencl_pipe>

cl::pipe_storage <int, 100> myProgramPipe0;

kernel void producer() {
    cl::pipe<int, cl::pipe_access::write> p =
   myProgramPipe0.get<cl::pipe_access::write>();
    //...
    p.write(...);
}

kernel void consumer() {
    cl::pipe<int, cl::pipe_access::read> p =
  myProgramPipe0.get<cl::pipe_access::read>();
    if(p.read(...)) {
        //...
    }
}
Example 5

Example of using more than one pipe_storage object.

#include <opencl_pipe>
using namespace cl;

pipe_storage<int2, 20> myProgramPipe2;
pipe_storage<float, 40> myProgramPipe3;

kernel void input() {
    auto p = make_pipe<pipe_access::write>(myProgramPipe2);
    //...
    p.write(...);
}

kernel void processor() {
    auto p_in = make_pipe<pipe_access::read>(myProgramPipe2);
    auto p_out = make_pipe<pipe_access::write>(myProgramPipe3);
    ...
    if(p_in.read(...)) {
        //...
    }
    p_out.write(...);
}

kernel void output() {
    auto p = make_pipe<pipe_access::read>(myProgramPipe3);
    if(p.read(...)) {
        //...
    }
}

3.13. Device Enqueue Library

OpenCL C++ device enqueue functionality allows a kernel to enqueue the same device, without host interaction. A kernel may enqueue code represented by lambda syntax, and control execution order with event dependencies including user events and markers. device_queue follows all the rules for marker types as specified in Marker Types section.

3.13.1. Queue Host Type

The below table describes the OpenCL queue data type and the corresponding data type available to the application:

Table 17. Host queue type
Type in OpenCL C++ API type for application

cl::device_queue

cl_queue

3.13.2. Header <opencl_device_queue> Synopsis

namespace cl
{
enum class enqueue_status;
enum class enqueue_policy;
enum class event_status;
enum class event_profiling_info;

struct event
{
    event();
    event(const event&) = default;
    event(event&) = default;

    event& operator=(const event&) = default;
    event& operator=(event&&) = default;

    bool is_valid() const noexcept;
    void retain() noexcept;
    void release() noexcept;

    explicit operator bool() const noexcept;

    void set_status(event_status status) noexcept;
    void profiling_info(event_profiling_info name,
                        global_ptr<long> value) noexcept;
};

event make_user_event();

struct ndrange
{
    explicit ndrange(size_t global_work_size) noexcept;
    ndrange(size_t global_work_size,
            size_t local_work_size) noexcept;
    ndrange(size_t global_work_offset,
            size_t global_work_size,
            size_t local_work_size) noexcept;

    template <size_t N>
    ndrange(const size_t (&global_work_size)[N]) noexcept;
    template <size_t N>
    ndrange(const size_t (&global_work_size)[N],
            const size_t (&local_work_size)[N]) noexcept;
    template <size_t N>
    ndrange(const size_t (&global_work_offset)[N],
            const size_t (&global_work_size)[N],
            const size_t (&local_work_size)[N]) noexcept;
};

struct device_queue: marker_type
{
    device_queue() noexcept = delete;
    device_queue(const device_queue&) = default;
    device_queue(device_queue&&) = default;

    device_queue& operator=(const device_queue&) = delete;
    device_queue& operator=(device_queue&&) = delete;
    device_queue* operator&() = delete;

    template <class Fun, class... Args>
    enqueue_status enqueue_kernel(enqueue_policy flag,
                                  const ndrange &ndrange,
                                  Fun fun,
                                  Args... args) noexcept;

    template <class Fun, class... Args>
    enqueue_status enqueue_kernel(enqueue_policy flag,
                                  uint num_events_in_wait_list,
                                  const event *event_wait_list,
                                  event *event_ret,
                                  const ndrange &ndrange,
                                  Fun fun,
                                  Args... args) noexcept;

    enqueue_status enqueue_marker(uint num_events_in_wait_list,
                                  const event *event_wait_list,
                                  event *event_ret) noexcept;
};

device_queue get_default_device_queue();

template <class Fun, class... Args>
uint get_kernel_work_group_size(Fun fun, Args... args);
template <class Fun, class... Args>
uint get_kernel_preferred_work_group_size_multiple(Fun fun,
                                                   Args... args);
template <class Fun, class... Args>
uint get_kernel_sub_group_count_for_ndrange(const ndrange &ndrange,
                                            Fun fun,
                                            Args... args);
template <class Fun, class... Args>
uint get_kernel_max_sub_group_size_for_ndrange(const ndrange &ndrange,
                                               Fun fun,
                                               Args... args);
template <class Fun, class... Args>
uint get_kernel_local_size_for_sub_group_count(uint num_sub_groups,
                                               Fun fun,
                                               Args... args);
template <class Fun, class... Args>
uint get_kernel_max_num_sub_groups(Fun fun, Args... args);

}

3.13.3. device_queue class methods

device_queue object represents work queue of the device. Device queue meets all requirements of the marker types as in Marker Types section.

device_queue::enqueue_kernel
template <class Fun, class... Args>
enqueue_status enqueue_kernel(enqueue_policy policy,
                              const ndrange &ndrange,
                              Fun fun,
                              Args... args) noexcept;

This method allows to enqueue functor or lambda fun on the device with specified policy over the specified ndrange.

args are the arguments that will be passed to fun when kernel will be enqueued with the exception for local_ptr parameters. For local pointers user must supply the size of local memory that will be allocated using local_ptr<T>::size_type{num elements}. Please see examples how to use enqueue_kernel are in Examples section.

template <class Fun, class... Args>
enqueue_status enqueue_kernel(enqueue_policy policy,
                              uint num_events_in_wait_list,
                              const event *event_wait_list,
                              event *event_ret,
                              const ndrange &ndrange,
                              Fun fun,
                              Args... args) noexcept;

This method enqueues functor or lambda fun in the same way as the overload above with the exception for the passed event list. If an event is returned, enqueue_kernel performs an implicit retain on the returned event.

device_queue::enqueue_marker
enqueue_status enqueue_marker(uint num_events_in_wait_list,
                              const event *event_wait_list,
                              event *event_ret) noexcept;

This method enqueues a marker to device queue. The marker command waits for a list of events specified by event_wait_list to complete before the marker completes. event_ret must not be nullptr as otherwise this is a no-op.

If an event is returned, enqueue_marker performs an implicit retain on the returned event.

3.13.4. event class methods

event::is_valid
bool is_valid() const noexcept;

Returns true if event object is a valid event. Otherwise returns false.

event::operator bool
explicit operator bool() const noexcept;

Returns true if event object is a valid event. Otherwise returns false.

event::retain
void retain() noexcept;

Increments the event reference count. Event must be an event returned by enqueue_kernel or enqueue_marker or a user event.

event::release
void release() noexcept;

Decrements the event reference count. The event object is deleted once the event reference count is zero, the specific command identified by this event has completed (or terminated) and there are no commands in any device command queue that require a wait for this event to complete. Event must be an event returned by enqueue_kernel, enqueue_marker or a user event.

event::set_status
void set_status(event_status status) noexcept;

Sets the execution status of a user event. Event must be a user event. status can be either event_status::complete or event_status::error value indicating an error.

event::profiling_info
void profiling_info(event_profiling_info name,
                    global_ptr<long> value) noexcept;

Captures the profiling information for functions that are enqueued as commands. The specific function being referred to is: enqueue_kernel. These enqueued commands are identified by unique event objects. The profiling information will be available in value once the command identified by event has completed. Event must be an event returned by enqueue_kernel.

name identifies which profiling information is to be queried and can be:

  • event_profiling_info::exec_time

    value is a pointer to two 64-bit values.

    The first 64-bit value describes the elapsed time CL_PROFILING_COMMAND_END - CL_PROFILING_COMMAND_START for the command identified by event in nanoseconds.

    The second 64-bit value describes the elapsed time CL_PROFILING_COMMAND_COMPLETE - CL_PROFILING_COMMAND_START for the command identified by event in nanoseconds.

Note
profiling_info when called multiple times for the same event is undefined.

3.13.5. Other operations

get_default_device_queue
device_queue get_default_device_queue();

Returns the default device queue. If a default device queue has not been created, device_queue::is_valid() will return false.

make_user_event
event make_user_event();

Creates a user event. Returns the user event. The execution status of the user event created is set to event_status::submitted.

get_kernel_work_group_size
template <class Fun, class... Args>
uint get_kernel_work_group_size(Fun fun, Args... args);

This provides a mechanism to query the maximum work-group size that can be used to execute a functor on the current device.

fun specifies the functor representing the kernel code that would be enqueued.

args are the arguments that will be passed to fun when kernel will be enqueued with the exception for local_ptr parameters. For local pointers user must supply the size of local memory that will be allocated.

get_kernel_preferred_work_group_size_multiple
template <class Fun, class... Args>
uint get_kernel_preferred_work_group_size_multiple(Fun fun,
                                                   Args... args);

Returns the preferred multiple of work-group size for launch. This is a performance hint. Specifying a work-group size that is not a multiple of the value returned by this query as the value of the local work size argument to enqueue will not fail to enqueue the functor for execution unless the work-group size specified is larger than the device maximum.

fun specifies the functor representing the kernel code that would be enqueued.

args are the arguments that will be passed to fun when kernel will be enqueued with the exception for local_ptr parameters. For local pointers user must supply the size of local memory that will be allocated.

get_kernel_sub_group_count_for_ndrange
template <class Fun, class... Args>
uint get_kernel_sub_group_count_for_ndrange