Type: Posts; User: yoavhacohen

  1. Replies
    9
    Views
    5,275

    Re: Example for Random Number Generator?

    Not quite: a counter-based RNG avoids the memory overhead:
    http://www.openclblog.com/2011/11/gpus- ... ation.html
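A counter-based generator computes each value from a counter rather than from stored state, which is where the memory saving comes from. The following is a minimal sketch in plain C, not the generator from the linked post: `mix32`, `cbrng_u32`, `cbrng_uniform`, and `cbrng_fill` are hypothetical names, and the mixing step reuses the MurmurHash3 finalizer constants as an assumption. Its statistical quality has not been evaluated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mixing function (MurmurHash3-style 32-bit finalizer).
   Assumption for illustration; not taken from the linked post. */
static uint32_t mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x85ebca6bu;
    x ^= x >> 13;
    x *= 0xc2b2ae35u;
    x ^= x >> 16;
    return x;
}

/* Counter-based draw: the result depends only on (seed, work_item, counter),
   so no per-work-item generator state needs to be kept in memory. */
static uint32_t cbrng_u32(uint32_t seed, uint32_t work_item, uint32_t counter)
{
    uint32_t h = mix32(seed ^ 0x9e3779b9u);
    h = mix32(h ^ work_item);
    h = mix32(h ^ counter);
    return h;
}

/* Map the top 24 bits of a draw to a float in [0, 1). */
static float cbrng_uniform(uint32_t seed, uint32_t work_item, uint32_t counter)
{
    return (float)(cbrng_u32(seed, work_item, counter) >> 8) * (1.0f / 16777216.0f);
}

/* Fill out[0..n-1] with the first n draws of one work-item's stream. */
static void cbrng_fill(float *out, size_t n, uint32_t seed, uint32_t work_item)
{
    assert(out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = cbrng_uniform(seed, work_item, (uint32_t)i);
}
```

In a kernel, `work_item` would typically be the value of get_global_id(0) and `counter` a loop index, so any draw can be recomputed on demand.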
  2. Replies
    6
    Views
    2,263

    Re: Huge kernel overhead on Mac

    That makes sense, thanks.
    I think the specification should let users prevent such lazy operations, for example through flags on the kernel and buffer constructors.
    It might be the case that the...
  3. Replies
    14
    Views
    3,102

    Re: minimal efficient workgroup size

    Thanks a lot for the detailed reply!

    The branching is done by testing some shared value, so all threads in the wavefront should terminate together. And yes, the branching is done in the outer...
  4. Replies
    6
    Views
    2,263

    Re: Huge kernel overhead on Mac

    I've just noticed that the huge overhead happens only the first time I run the kernel.

    I do build it in advance and cache the program, but are there additional operations that Apple's...
  5. Replies
    6
    Views
    2,263

    Re: Huge kernel overhead on Mac

    Sorry, I forgot to mention that I do call queue.finish() before calling get_time() the second time on the host side. I'm aware that clEnqueueNDRangeKernel() is a non-blocking operation.
    However the...
  6. Replies
    2
    Views
    1,041

    Re: pow precision

    It turns out that the problem was in a previous function, where one of the compilers makes some optimization for float constants and the other doesn't.
  7. Replies
    6
    Views
    2,263

    Huge kernel overhead on Mac

    Hello,
    Sometimes I get huge kernel overhead.
    I measure the time in two ways:


    double start_time_total = get_time();

    cl::Event event;
    ...
  8. Replies
    1
    Views
    959

    Should I cache get_global_id(0)?

    Hi,

    Should I cache get_global_id(0) in a private variable like this:


    size_t idx = get_global_id(0);

    or should I call get_global_id(0) several times?

    I want to reduce the number of...
  9. Replies
    14
    Views
    3,102

    Re: minimal efficient workgroup size

    Is there a Mac version of the AMD profiler?
  10. Replies
    2
    Views
    1,003

    Re: OpenCL c++ binding in iMac

    As far as I know, the C++ wrapper is not included in Apple's OpenCL framework.
    You should download and include it manually.
  11. Replies
    14
    Views
    3,102

    Re: minimal efficient workgroup size

    Can't the GPU run another workgroup in parallel on the same compute unit to hide latency?

    Answer to my question:
    The GPU can run other workgroups in parallel to hide latency, but only if...
  12. Replies
    2
    Views
    1,041

    pow precision

    Hello,

    I noticed that pow(t, 3.3333334e-1f) in OpenCL and std::pow(t, 3.3333334e-1f) do not always yield the same result, even if I use the CPU as the OpenCL device.

    I understand the OpenCL is...
  13. Replies
    14
    Views
    3,102

    Re: minimal efficient workgroup size

    Thanks for the detailed replies!



    Can't the GPU run another workgroup in parallel on the same compute unit to hide latency?
  14. How to get the warp/wavefront size in runtime?

    Is it possible to get the warp/wavefront size at runtime?

    (I need it inside the kernel, so a predefined macro would be great, but if it's only possible to find this value on the host side then I...
  15. Re: workaround for no pointers to image2d_t restriction

    Thanks! (I work on Mac OS, so there is no OpenCL 1.2 implementation yet.)
    How can I ensure that the compiler does this optimization? Can I look at the assembly?
  16. Replies
    14
    Views
    3,102

    Re: minimal efficient workgroup size

    Right, but does the GPU run more than WARPSIZE work-items from the same workgroup at a time?
  17. workaround for no pointers to image2d_t restriction

    Hello,

    I need to read a value from one of two images, like this:


    for (i = 1; i < N; i++)
    {
        p = foo(i);
        if (cond)
            v = read_imagef(a, sampler, p);
  18. Image3D with different size for each slice

    When using mipmaps (e.g. for graphics or image processing applications), subsampling the coarse scales can save memory.

    Thus, it might be helpful to support image3D type with...
  19. Replies
    1
    Views
    2,618

    Re: Built-in random function

    +1
  20. Replies
    11
    Views
    5,457

    Re: Predefined Macros: device type

    I also think that this is a useful feature to add.
    It just means that these macros would be added automatically, instead of passing them manually.
  21. Replies
    4
    Views
    3,866

    Re: Mipmap with Trilinear interpolation

    I don't think it's related to precision issues, as trilinear interpolations are already defined for image3d.
    Basically, every vendor that supports image2d can support mipmaps (just add a small array...
  22. Replies
    14
    Views
    3,102

    minimal efficient workgroup size

    Hello,

    I'm working on Mac OS 10.7, with AMD Radeon 6750M.
    I wrote an OpenCL kernel, declared with the following attributes:


    kernel
    __attribute__((vec_type_hint(float4)))...
  23. Replies
    5
    Views
    3,355

    Re: specify work group sizes

    Agree with duanmu.
  24. Replies
    9
    Views
    5,275

    Re: Example for Random Number Generator?

    Thanks!
    Here is the rand() function that I want to use (I took it from OpenCV, and I use it in my CPU implementation):

    Option 1:


    ///
    /// Generate a random float4 vector of values between 0...
  25. Replies
    9
    Views
    5,275

    Re: Example for Random Number Generator?

    It seems like a simple PRNG can be adequate for my needs, since I need to generate several random numbers per work-item, and their statistical properties are not critical.

    However, most of the...
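For the case described in this post (several random numbers per work-item, with no strict statistical requirements), one commonly used lightweight option is a xorshift generator with one 32-bit state word per work-item. The sketch below is plain C under that assumption and is not the code from the thread: `xs32_state`, `xs32_seed`, `xs32_next`, and `xs32_uniform` are hypothetical names, and the seeding scheme is an illustrative choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 32-bit word of private state per work-item (hypothetical type). */
typedef struct { uint32_t s; } xs32_state;

/* Derive a per-work-item seed; xorshift requires a nonzero state,
   so zero is replaced by an arbitrary nonzero constant. */
static xs32_state xs32_seed(uint32_t base_seed, uint32_t work_item)
{
    xs32_state st;
    st.s = base_seed ^ (work_item * 2654435761u);
    if (st.s == 0u)
        st.s = 0x6d2b79f5u;
    return st;
}

/* Marsaglia's xorshift32 step with shifts 13, 17, 5. */
static uint32_t xs32_next(xs32_state *st)
{
    assert(st != NULL && st->s != 0u);
    uint32_t x = st->s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    st->s = x;
    return x;
}

/* Map the top 24 bits of the next draw to a float in [0, 1). */
static float xs32_uniform(xs32_state *st)
{
    return (float)(xs32_next(st) >> 8) * (1.0f / 16777216.0f);
}
```

In a kernel, the state would live in a private variable seeded once from get_global_id(0) and then advanced for each number the work-item needs.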
Results 1 to 25 of 37