
Thread: Calling CPU kernels with lower overhead!

  1. #1
    Junior Member
    Join Date
    May 2011
    Posts
    24

    Calling CPU kernels with lower overhead!

    Hello!

    Please provide the capability to call OpenCL kernels on the CPU via a simple function pointer, bypassing the threading engine.

    Purpose:
    - suitable for very short kernels which require low overhead
    - useful for compilers that do not yet use the latest CPU instructions (including but not limited to MS VC++). In this case OpenCL could deliver faster functions, as fast as possible on any platform, regardless of the capabilities of the compiler or scripting language used.

    Thanks!
    Atmapuri

  2. #2
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Calling CPU kernels with lower overhead!

    > Please provide the capability to call OpenCL kernels on the CPU via a simple function pointer
    Wouldn't clEnqueueNativeKernel() be what you are looking for?
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  3. #3
    Junior Member
    Join Date
    May 2011
    Posts
    24

    Re: Calling CPU kernels with lower overhead!

    Which compilers and scripting languages other than C++ come with a built-in C++ compiler, so that programmers could use clEnqueueNativeKernel(..)?

  4. #4
    Junior Member
    Join Date
    May 2011
    Posts
    24

    Re: Calling CPU kernels with lower overhead!

    If you call a function:

    double aFun(double a, double b)
    {
        return a + b;
    }

    Such short functions have a huge call overhead when called through OpenCL, because the call has to go through the threading library of the CPU device. Calling aFun directly from a C++ for loop is 1000x faster than calling it through the OpenCL API.
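    A common way to live with a fixed per-call cost (a general technique, not something proposed in this thread) is to amortize it: make one call, or one enqueued command, cover many elements instead of one. A minimal sketch in plain C, where `aFun_batch` is a hypothetical name introduced only for illustration:

```c
#include <stddef.h>

/* aFun as posted above. */
double aFun(double a, double b)
{
    return a + b;
}

/* Hypothetical batched wrapper (illustrative name): one call processes
   n element pairs, so a fixed dispatch cost D is paid once per batch and
   each element's share of it shrinks to roughly D / n. */
void aFun_batch(const double *a, const double *b, double *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = aFun(a[i], b[i]);
}
```

    This only helps when the work actually arrives in batches; for the genuinely tiny, isolated calls this thread is about, the per-call cost remains.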

  5. #5
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Calling CPU kernels with lower overhead!

    > Which compilers and scripting languages other than C++ come with a built-in C++ compiler, so that programmers could use clEnqueueNativeKernel(..)?
    clEnqueueNativeKernel() does not require such a thing. You simply pass a function pointer to it, much like you pass a function pointer in regular C.
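    A sketch of the function shape clEnqueueNativeKernel() expects. Per the OpenCL specification, `user_func` has type `void (CL_CALLBACK *)(void *)`, and the runtime copies `cb_args` bytes from `args` and passes a pointer to that copy. The struct, `add_native` and `call_with_copied_args` are hypothetical names; the last one only emulates the argument-copy step (no queue, no threads), so the sketch builds without the OpenCL headers.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical argument block for a native kernel. Because the runtime
   works on a copy of these bytes, the result is written through a pointer
   to caller-owned storage rather than into a field of the struct. */
typedef struct {
    double a;
    double b;
    double *out;   /* must stay valid until the command has completed */
} add_args;

/* aFun rewritten to the native-kernel signature void (*)(void *).
   CL_CALLBACK is omitted so this compiles without <CL/cl.h>. */
void add_native(void *args)
{
    const add_args *p = (const add_args *)args;
    *p->out = p->a + p->b;
}

/* Emulates only the argument copy done by clEnqueueNativeKernel:
   copy cb_args bytes, then call user_func on the copy.
   With a real queue on a device reporting CL_EXEC_NATIVE_KERNEL, the call
   would look roughly like
       clEnqueueNativeKernel(queue, add_native, &args, sizeof args,
                             0, NULL, NULL, 0, NULL, &event);
   followed by a wait on the event before reading *out. */
int call_with_copied_args(void (*user_func)(void *), const void *args,
                          size_t cb_args)
{
    void *copy = malloc(cb_args);
    if (copy == NULL)
        return -1;
    memcpy(copy, args, cb_args);
    user_func(copy);
    free(copy);
    return 0;
}
```

    Writing the result into a field of the struct itself would only modify the runtime's private copy, which is an easy mistake to make with this call.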

  6. #6
    Junior Member
    Join Date
    May 2011
    Posts
    24

    Re: Calling CPU kernels with lower overhead!

    Ok, but who compiles the function whose pointer you are passing, with support for Intel AVX and SSE 4.2?

  7. #7
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Calling CPU kernels with lower overhead!

    > Ok, but who compiles the function whose pointer you are passing, with support for Intel AVX and SSE 4.2?
    The same compiler you are using for the rest of your application. Again, this is no different from using a function pointer in C99 (which is what you asked for).

  8. #8
    Junior Member
    Join Date
    May 2011
    Posts
    24

    Re: Calling CPU kernels with lower overhead!

    The compiler I am using does not support Intel AVX and SSE 4.2 and produces much slower code than the OpenCL compiler does. You are assuming I have a good compiler for the code that calls OpenCL. Take, for example, any .NET compiler or JavaScript: they produce substandard code compared to, let's say, Intel C++.

    > Again, this is no different from using a function pointer in C99 (which is what you asked for).

    From a syntax point of view, and when using a C++ compiler, maybe. But from a performance point of view, definitely not. I read the help for clEnqueNative... and the call has an enormous performance overhead. If nothing else, it adds the function to the queue. Handling the queue alone already takes 1000x more time than a direct function call in C++ (the queue overhead is mostly due to thread synchronization). Is it possible to call an OpenCL kernel without it being added to the queue?

  9. #9
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Calling CPU kernels with lower overhead!

    > If nothing else, it adds the function to the queue. Handling the queue alone already takes 1000x more time than a direct function call in C++.
    Even executing a simple function pointer requires the OpenCL runtime to guarantee the same synchronization and memory coherency constraints as when you are running any other kernel. The runtime can't simply take the pointer and call it right away.

    > Is it possible to call an OpenCL kernel without it being added to the queue?
    No, it's not possible. It's hard to understand the value of executing a piece of code without synchronizing it with the rest of the computations going on in the runtime.
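    To make the synchronization cost discussed above concrete, here is a toy model of what submitting even a trivial function through a queue serviced by a worker thread involves: taking a lock, waking the worker, and waiting for its completion signal. This is an assumption-laden illustration using POSIX threads, not a description of how any actual OpenCL runtime is built; all names are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy queue: one slot, one worker thread, one submitter.
   It models only the thread hand-off, not a real OpenCL runtime. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;      /* shared by submitter and worker */
    void (*fn)(void *);       /* pending job, NULL when the slot is empty */
    void *arg;
    bool done;                /* set by the worker when the job finished */
    bool quit;
    pthread_t worker;
} toy_queue;

static void *worker_main(void *p)
{
    toy_queue *q = p;
    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->fn == NULL && !q->quit)
            pthread_cond_wait(&q->cond, &q->lock);
        if (q->fn == NULL)                 /* woken only to shut down */
            break;
        void (*fn)(void *) = q->fn;
        void *arg = q->arg;
        pthread_mutex_unlock(&q->lock);
        fn(arg);                           /* run the job outside the lock */
        pthread_mutex_lock(&q->lock);
        q->fn = NULL;
        q->done = true;
        pthread_cond_broadcast(&q->cond);  /* wake the waiting submitter */
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

int toy_queue_init(toy_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->cond, NULL);
    q->fn = NULL;
    q->arg = NULL;
    q->done = false;
    q->quit = false;
    return pthread_create(&q->worker, NULL, worker_main, q);
}

/* Submit one job and block until it has run: at least two cross-thread
   wake-ups per call, however little work fn does. */
void toy_queue_run(toy_queue *q, void (*fn)(void *), void *arg)
{
    pthread_mutex_lock(&q->lock);
    q->fn = fn;
    q->arg = arg;
    q->done = false;
    pthread_cond_broadcast(&q->cond);      /* wake the worker */
    while (!q->done)
        pthread_cond_wait(&q->cond, &q->lock);
    pthread_mutex_unlock(&q->lock);
}

void toy_queue_destroy(toy_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->quit = true;
    pthread_cond_broadcast(&q->cond);
    pthread_mutex_unlock(&q->lock);
    pthread_join(q->worker, NULL);
    pthread_cond_destroy(&q->cond);
    pthread_mutex_destroy(&q->lock);
}

/* Hypothetical job: the aFun computation with its arguments in a struct. */
typedef struct { double a, b, result; } add_job_args;

static void add_job(void *p)
{
    add_job_args *x = p;
    x->result = x->a + x->b;
}
```

    A real runtime layers event objects, dependency tracking and memory-coherency guarantees on top of this basic hand-off, which is why it cannot simply call the pointer right away.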

  10. #10
    Junior Member
    Join Date
    May 2011
    Posts
    24

    Re: Calling CPU kernels with lower overhead!

    > The runtime can't simply take the pointer and call it right away.

    I am very well aware of that. Hence the suggestion.

    > It's hard to understand the value of executing a piece of code without synchronizing it with the rest of the computations going on in the runtime.

    That's what I tried to explain, although obviously not very successfully. (You do agree that there are algorithms the current OpenCL API is not suitable for accelerating?) The primary value is the quality, or speed, of the compiled code. As mentioned before, many compilers from which you can call OpenCL generate substandard code. In a world where CPUs will soon have registers wide enough to hold 8 double-precision values (and perform an add or mul on all of them concurrently in one cycle), while some compilers operate only on the first item in these registers, this can make a big difference.

    In the world of Intel CPUs, you can easily see a performance ratio of 50x depending on the compiler you use. This is not a marginal gain (!)
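    To illustrate "registers wide enough to store 8 double precision values": a sketch using the GCC/Clang vector extension (a compiler extension, not ISO C, so it assumes one of those compilers) in which a single `+` operates on 8 doubles at once, next to the plain scalar loop that a weak compiler would execute one element at a time. On hardware without 512-bit registers the compiler splits each vector operation into narrower instructions, so the code stays portable but the speed-up varies. Function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* GCC/Clang extension: a vector of 8 doubles (64 bytes), the width of one
   AVX-512 register. Narrower targets get the operation split up. */
typedef double v8d __attribute__((vector_size(8 * sizeof(double))));

/* Scalar version: one element per iteration unless the compiler
   chooses to auto-vectorize it. */
void add_scalar(const double *a, const double *b, double *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Explicit-SIMD version: 8 elements per vector add, scalar tail loop. */
void add_v8d(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        v8d va, vb, vr;
        memcpy(&va, a + i, sizeof va);  /* memcpy avoids alignment assumptions */
        memcpy(&vb, b + i, sizeof vb);
        vr = va + vb;                   /* one vector add over 8 lanes */
        memcpy(out + i, &vr, sizeof vr);
    }
    for (; i < n; ++i)                  /* leftover elements */
        out[i] = a[i] + b[i];
}
```

    Whether `add_scalar` gets vectorized automatically depends on the compiler and its flags (for example GCC with `-O3 -mavx`), which is the poster's point: not every compiler used to call OpenCL will do it.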

    Here is a list of reasons why it makes sense to call OpenCL kernels outside the threaded context:

    1.) Users can use the (OpenCL) compiler to generate code many times faster than what the compilers they use as their primary development tool can deliver.
    2.) The resulting application will be cross-platform (portable performance).
    3.) The use of a threading library (inside the OpenCL API) implies big jobs. But we all know that not all jobs can be threaded. Some are simply too small in that context, yet they could still benefit greatly from a compiler capable of vectorization.
    4.) You get free access to a high-performance compiler.

    For these reasons I would like OpenCL to allow unthreaded calls of its kernels, and/or calls threaded from the caller's side, where the entire OpenCL API can optionally be bypassed except for its platform->device->compiler->getkernel->function_call chain.

    Such an API would enable cross-platform applications to accelerate more of their code at lower development cost.
