
Thread: Questions on OpenCL Built-in functions?

  1. #1
    Junior Member | Join Date: Aug 2012 | Posts: 18

    Questions on OpenCL Built-in functions?

    Hi,
    I am trying to use OpenCL Built-in "rotate" function in one of my kernel as given below, but i am getting following errors while compiling :

    clBuildProgram Error for -11 Error Number
    error: call to 'rotate' is ambiguous
    W[0] = rotate(A, 5);
    ^~~~~
    <built-in>:2784:22: note: candidate function
    int __OVERLOADABLE__ rotate(int, int);
    ^
    <built-in>:2785:23: note: candidate function
    uint __OVERLOADABLE__ rotate(uint, uint);
    ^
    <built-in>:2780:23: note: candidate function
    char __OVERLOADABLE__ rotate(char, char);
    ^
    <built-in>:2781:24: note: candidate function
    uchar __OVERLOADABLE__ rotate(uchar, uchar);
    ^
    <built-in>:2782:24: note: candidate function
    short __OVERLOADABLE__ rotate(short, short);
    ^
    <built-in>:2783:25: note: candidate function
    ushort __OVERLOADABLE__ rotate(ushort, ushort);

    (similar candidate notes follow for the remaining data types)

    The kernel I used for the demo is:
    __kernel void demoKernel(__global unsigned int *d_io_2d, long max_size)
    {
        long i = get_global_id(0);
        // skip work-items beyond the end of the buffer
        if (i >= max_size)
            return;

        unsigned int A;
        A = d_io_2d[i];

        unsigned int W[1];
        W[0] = rotate(A, 5);   // this line triggers the "ambiguous" error
    }

    My questions are:
    1. How do I use OpenCL built-in functions? (In the example above, what am I missing?)
    2. What is the purpose of the built-in functions?
    3. Do they improve performance?

    Thanks!

  2. #2
    Senior Member | Join Date: Oct 2012 | Posts: 114
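    1. Both arguments of rotate() must have the same type. A is a uint but the literal 5 is an int, so no candidate overload is a better match than the others and the call is ambiguous. Try rotate(A, (uint)5), or equivalently rotate(A, 5u).
    2. Some of them can map to dedicated hardware circuitry (e.g. rsqrt, mad). They also make up for the fact that OpenCL C has no standard library.
    3. At the very least they don't degrade performance. You can expect them to be implemented optimally, i.e. you won't be able to do better.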
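    Once the call compiles, a host-side reference is handy for checking what the kernel wrote back. The sketch below is plain C, not OpenCL C, and assumes a 32-bit uint; the name rotl32_ref is hypothetical. It follows the OpenCL description of rotate(): bits shifted off the left end come back in on the right, with the count taken modulo the bit width. It deliberately rotates one bit at a time so it is easy to trust.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side reference for OpenCL rotate() on a 32-bit uint.
   Slow on purpose: one single-bit rotation per step. */
uint32_t rotl32_ref(uint32_t v, uint32_t i)
{
    i %= 32u;                       /* count taken modulo the bit width */
    while (i--) {
        v = (v << 1) | (v >> 31);   /* move the top bit to the bottom */
    }
    return v;
}
```

    For example, rotl32_ref(0x80000001u, 5u) gives 0x00000030u, which is what rotate(A, (uint)5) should store for that input.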

  3. #3
    Junior Member | Join Date: Aug 2012 | Posts: 18
    In reply to utnapishtim:
    Thank you!
    1. After trying rotate(A, (uint)5), my kernel compiled and I got the correct result.
    3. My implementation of the rotate function is:
    // rotate x left by n bits; note the argument order (n, x) is the reverse of rotate(v, i)
    uint rotate1(int n, uint x)
    {
        return (x << n) | (x >> (32 - n));
    }
    When I benchmarked my application with the built-in "rotate" function and with the user-defined "rotate1" function above, the built-in "rotate" was slightly slower than "rotate1".
    So how can we expect the built-ins to be optimally implemented?

  4. #4
    Senior Member | Join Date: Oct 2012 | Posts: 114
    Please note that your implementation of rotate1() breaks down when n>32, whereas the built-in function rotate() is guaranteed to work for any n.
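    In plain C, shifting a 32-bit value by 32 or more bits is undefined, so rotate1 is only safe there for 1 <= n <= 31. A minimal sketch of a variant that accepts any n, assuming 32-bit values and keeping rotate1's (n, x) argument order; rotate1_any_n is a hypothetical name, and this is host C rather than OpenCL C:

```c
#include <stdint.h>

/* Hypothetical variant of rotate1() that is well defined for any n.
   The count is viewed as unsigned and reduced modulo 32 before shifting. */
uint32_t rotate1_any_n(int n, uint32_t x)
{
    unsigned s = (unsigned)n % 32u;  /* 0..31; e.g. n = 37 behaves like 5 */
    if (s == 0)
        return x;                    /* avoid the undefined x >> 32 */
    return (x << s) | (x >> (32u - s));
}
```

    The extra reduction and branch are exactly the kind of work a built-in is free to handle in whatever way suits the hardware.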

    I have checked the assembly code produced by three calls in a row to rotate() with NVIDIA OpenCL.
    Using built-in rotate(), 17 instructions are generated (and my NVIDIA hardware has no rotate instruction).
    Using your function rotate1(), 19 instructions are generated.

    So using built-in rotate() is safer and faster.

    Furthermore, if this code were compiled for a device that does have a rotate instruction, you would have to rely on the compiler to recognize that "(x << n) | (x >> (32-n))" is really a rotation and turn it into that instruction.
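    If you do write the rotation by hand, a common branch-free form masks both shift counts so every n is well defined. A sketch in plain host C, assuming 32-bit values; rotl32_idiom is a hypothetical name. Compilers such as GCC and Clang commonly recognize this shape and emit a rotate instruction where the target has one, but that is an optimization you would be relying on, not a guarantee, which is the point made above about preferring the built-in in OpenCL C.

```c
#include <stdint.h>

/* Branch-free rotate-left. Masking with 31 keeps both shift counts in
   0..31, so no shift is undefined; for n == 0 the result is (x | x) == x. */
static inline uint32_t rotl32_idiom(uint32_t x, unsigned n)
{
    return (x << (n & 31u)) | (x >> ((0u - n) & 31u));
}
```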
