Search:

Type: Posts; User: Atmapuri

  1. Replies
    12
    Views
    5,484

    Re: Calling CPU kernels with lower overhead!

    In my book, OpenCL is the most innovative design in computer science of the last 10 years. I think that although it was designed to address GPUs, it will ultimately make the biggest difference for...
  2. Replies
    12
    Views
    5,484

    Re: Calling CPU kernels with lower overhead!

    >The runtime can't simply take the pointer and call it right away.

    I am very well aware of that. Hence the suggestion.

    >It's hard to understand the value of executing a piece of code without...
  3. Replies
    12
    Views
    5,484

    Re: Calling CPU kernels with lower overhead!

    The compiler I am using does not support Intel AVX or SSE 4.2 and produces much slower code than the OpenCL compiler does. You are assuming I have a good compiler which is used to...
  4. Replies
    12
    Views
    5,484

    Re: Calling CPU kernels with lower overhead!

    OK, but who compiles the function whose pointer you are passing, with support for Intel AVX and SSE 4.2?
  5. Replies
    12
    Views
    5,484

    Re: Calling CPU kernels with lower overhead!

    If you call a function:

    double aFun(double a, double b)
    {
        return a + b;
    }

    such short functions have a huge call overhead when using OpenCL because they need to go through the threading...
  6. Replies
    12
    Views
    5,484

    Re: Calling CPU kernels with lower overhead!

    Which compilers and scripting languages other than C++ come with a built-in C++ compiler, so that programmers could use clEnqueueNativeKernel(..)?
  7. Replies
    12
    Views
    5,484

    Calling CPU kernels with lower overhead!

    Hello!

    Please provide the capability to call OpenCL kernels on the CPU via a simple function pointer, bypassing the threading engine.

    Purpose:
    - suitable for very short kernels which require low...
  8. Replies
    11
    Views
    6,113

    Re: Sharing host memory with clSetKernelArg!

    1.) Using only create/free buffer:

    platform[0]=AMD Accelerated Parallel Processing
    device[0]=Juniper
    end-start time 15.666632 usec

    device[1]=Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz
    end-start...
  9. Replies
    11
    Views
    6,113

    Re: Sharing host memory with clSetKernelArg!

    I appreciate the detailed and well-put answer. Here are some timings that I measured for the CPU device:

    cl_mem buffer = clCreateBuffer(context, CL_MEM_USE_HOST_PTR, 1024*1024*2, array, &error)...
  10. Replies
    11
    Views
    6,113

    Re: Sharing host memory with clSetKernelArg!

    I can't post the complete code, as it is scattered across lots of other code. I call clCreateBuffer with:

    CL_MEM_READ_WRITE

    clEnqueueMapBuffer has CL_TRUE for blocking and CL_MAP_READ when reading ...
  11. Replies
    11
    Views
    6,113

    Re: Sharing host memory with clSetKernelArg!

    I measured an overhead between 50 and 2000 us, and you say it is not there?
    Going through clCreateBuffer or clEnqueueRead/Write, even for CPU devices, adds overhead considerably (1000x) above the optimum...
  12. Replies
    11
    Views
    6,113

    Sharing host memory with clSetKernelArg!

    Hi!

    For CPU and AMD Fusion devices, which share the same (host) memory, there is no point in relying on clCreateBuffer to copy data to the device and back. In such cases it would make sense...
  13. Replies
    1
    Views
    996

    Re: Initializing array on the GPU!

    Hi!

    Found the problem: the offset parameter of clEnqueueReadBuffer was not in bytes. <g>

    Thanks!
    Atmapuri
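
    The fix described above comes down to converting an element index into a byte offset before passing it to clEnqueueReadBuffer. A minimal sketch, assuming a buffer of floats; the helper name float_byte_offset is hypothetical, and the call in the comment uses placeholder variable names (queue, buffer, count, host_ptr) that are not from the original post:

    ```c
    #include <stddef.h>

    /* Hypothetical helper (not part of OpenCL): converts an element index
       into the byte offset expected by clEnqueueReadBuffer, assuming the
       buffer holds floats. */
    static size_t float_byte_offset(size_t element_index)
    {
        return element_index * sizeof(float);
    }

    /* Intended use (placeholder names, shown as a comment so the sketch
       compiles without the OpenCL headers):

       clEnqueueReadBuffer(queue, buffer, CL_TRUE,
                           float_byte_offset(DstIdx),  // offset in bytes
                           count * sizeof(float),      // size in bytes
                           host_ptr, 0, NULL, NULL);
    */
    ```

    Both the offset and the size arguments are byte counts, so the same scaling applies to each.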
  14. Replies
    1
    Views
    996

    Initializing array on the GPU!

    Hi!

    I am running the following kernel:

    __kernel void InitArray(const float Val, __global float *Dst, const int DstIdx)
    {
        int gid = get_global_id(0);
        Dst[DstIdx + gid] = Val;
    }
  15. Replies
    6
    Views
    2,190

    Re: Local work size!

    Compared to the naive implementation, kernel1 is 20 percent slower and kernel2 is 50 percent slower, each with the settings that give its fastest run.

    >Surely if you reduce the...
  16. Replies
    6
    Views
    2,190

    Re: Local work size!

    __kernel void ippsAddd_Idx(__global const float *Src1, const int Src1Idx,
    __global const float *Src2, const int Src2Idx,
    __global float *Dst,...
  17. Replies
    6
    Views
    2,190

    Re: Local work size!

    Size is equal to the length of the vectors or arrays. BlockLen describes how the long vector is broken down into short vectors (into many pieces, each of BlockLen size). The idea with BlockLen is (or...
  18. Replies
    6
    Views
    2,190

    Local work size!

    Hi!

    I have been playing with various settings to local_work_size and looking at this kernel:

    for (unsigned int i = get_global_id(0); i < Size; i += get_global_size(0))
    ...
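
    The loop header above is a strided loop: each work-item starts at its own global id and advances by the total number of work-items. A host-side sketch in plain C, assuming the loop body touches element i once; the function name emulate_strided_loop and the visits array are hypothetical, and the outer loop stands in for the work-items that would run in parallel on a device:

    ```c
    #include <stddef.h>

    /* Host-side emulation of the strided kernel loop.
       gid stands in for get_global_id(0) and global_size for
       get_global_size(0). visits[i] is incremented each time element i
       is touched, and must have room for `size` entries. */
    static void emulate_strided_loop(unsigned int size,
                                     unsigned int global_size,
                                     unsigned int *visits)
    {
        for (unsigned int gid = 0; gid < global_size; ++gid) {
            for (unsigned int i = gid; i < size; i += global_size) {
                visits[i] += 1;
            }
        }
    }
    ```

    With this pattern every element is covered exactly once, whether or not Size is a multiple of the global size.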
  19. Replies
    2
    Views
    895

    Re: Question in local synchronization!

    Thanks. The kernel is from ViennaCL library.
  20. Replies
    0
    Views
    1,827

    Some suggestions!

    Hi!

    1.) clEnqueueNDRangeKernel requires the global_size to be divisible by local_size, but internally at least AMD can set a local size by which global_size is not divisible, if the user does not...
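
    A common workaround for the divisibility requirement mentioned in this post (not something the post itself proposes) is to round the global size up to the next multiple of the local size and guard the kernel body with a bounds check. A minimal sketch; the helper name round_up_global_size is hypothetical, and local_size is assumed to be non-zero:

    ```c
    #include <stddef.h>

    /* Hypothetical helper: rounds work_items up to the next multiple of
       local_size, so the result is a valid global size for a given
       local size. Assumes local_size > 0. */
    static size_t round_up_global_size(size_t work_items, size_t local_size)
    {
        size_t remainder = work_items % local_size;
        return remainder == 0 ? work_items
                              : work_items + (local_size - remainder);
    }
    ```

    The padding work-items then have to be skipped inside the kernel, for example by passing the real element count as an argument and returning early when get_global_id(0) is not below it.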
  21. Replies
    2
    Views
    895

    Question in local synchronization!

    Hi!

    I am looking at this kernel I found (and scratching my head):


    __kernel void sqrt_sum(
    __global float * vec1,
    __global float * result)
    {
    for (unsigned int...
  22. Replies
    1
    Views
    765

    Abs function is special?

    Hi!

    I noticed that Abs is not defined for float and other negative range types. Can somebody elaborate on the reasoning behind this? There are many other math functions which use "if" or...
  23. Replies
    1
    Views
    1,031

    Returning a single value from kernel!

    Hi!

    In order to return a single value (float or int) from the kernel, is it necessary to use buffers (clCreateBuffer, clEnqueueReadBuffer), or is there some other way?

    Thanks!
    Atmapuri
  24. Replies
    1
    Views
    1,384

    Vector offset!

    Hi!

    How can I specify that indexing of float4 should start at index 1 instead of 0? Example:

    Offset = 1;
    for (i = 0; i < Len; i++) v[i+Offset] = 1;

    How can the same be achieved if the v is...