Type: Posts; User: utnapishtim

  1. Yes, it is possible. It is often the case when...

    Yes, it is possible.
    It is often the case when the image contains an intermediate result produced by kernel A and to be consumed by kernel B.
  2. That's what the attribute "endian" is made for:...

    That's what the attribute "endian" is made for: use __attribute__ ((endian(host))) or __attribute__ ((endian(device))) to tell OpenCL which kind of endianness a buffer uses. Default is device...
  3. OpenCL requires that sine has a minimum accuracy...

    OpenCL requires that sine has a minimum accuracy of 4 ulp.
    For example, in double precision, if the expected result is 0.5, one ulp is 2^-53 ≈ 1.11e-16, so the maximum admissible error is 4 ulp ≈ 4.44e-16.

    So the...
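The ulp arithmetic above can be checked with a small sketch in plain C. The helper names `ulp_of` and `max_abs_error` are hypothetical and not part of OpenCL; the code assumes IEEE 754 double precision and a positive, normal input.

```c
#include <assert.h>
#include <float.h>

/* One ulp at x: the spacing between adjacent doubles in the binade
 * that contains x. Assumes x is a positive, normal double. */
double ulp_of(double x)
{
    assert(x >= DBL_MIN && x <= DBL_MAX);
    double p = 1.0;                 /* becomes the largest power of two <= x */
    while (p > x)        p /= 2.0;
    while (p <= x / 2.0) p *= 2.0;
    return p * DBL_EPSILON;         /* DBL_EPSILON is the ulp at 1.0 (2^-52) */
}

/* Largest absolute error allowed by a tolerance given in ulps. */
double max_abs_error(double expected, int ulps)
{
    return ulps * ulp_of(expected);
}
```

For an expected result of 0.5 and a 4 ulp tolerance, `max_abs_error(0.5, 4)` gives 2^-51, roughly 4.44e-16.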
  4. Check the max size of a constant buffer with...

    Check the max size of a constant buffer with CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE.
    It is generally 64KB on a GPU, so your buffer is probably too big to fit into a constant buffer.
  5. You can find a good introduction to reduction...

    You can find a good introduction to reduction with OpenCL here:

    http://developer.amd.com/resources/documentation-articles/articles-whitepapers/opencl-optimization-case-study-simple-reductions/
  6. Intel CPU and GPU share physical memory so...

    Intel CPU and GPU share physical memory so mapping a buffer is very efficient if the following conditions are fulfilled:

    - The buffer is created with CL_MEM_ALLOC_HOST_PTR, or with...
  7. To put it simply, a kernel should never access a...

    To put it simply, a kernel should never access a host memory buffer.

    In that case, the OpenCL implementation will either:

    - make a copy of the buffer between host and device before the kernel...
  8. Honestly, if your buffer is to be accessed by a...

    Honestly, if your buffer is to be accessed by a GPU kernel, you shouldn't use a host buffer and expect that all transfers will be magically optimized.
    If you need a buffer for your GPU kernel, then...
  9. Calling clEnqueueReadBuffer() with the pointer...

    Calling clEnqueueReadBuffer() with the pointer used to create the host-allocated buffer won't make any redundant copy but will only synchronize memory between GPU and CPU if needed (whence the...
  10. In your scenario, you can use...

    In your scenario, you can use clEnqueueReadBuffer() with blocking_read=true and ptr set to the host memory pointer.
    This will synchronize the (host) buffer with the GPU cache. You can then release...
  11. You are using unnormalized integer coordinates...

    You are using unnormalized integer coordinates with read_imagef(), so your sampler should be

    const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE |
    ...
  12. I've checked on NVIDIA GPU, AMD GPU and Intel CPU...

    I've checked on NVIDIA GPU, AMD GPU and Intel CPU and your kernel is fine.

    How do you get the result from the device buffer on the host side?
  13. Try to cast plain to unsigned int instead of...

    Try to cast plain to unsigned int instead of unsigned char, such as:

    W[t] = ((unsigned int) plain[t * 4]) << 24;

    and so on...
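A minimal sketch of the widening cast in plain C. The post only shows the first byte; packing the remaining three bytes in big-endian order is an assumption made here for illustration, and `load_word_be` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Builds the t-th 32-bit word from four consecutive bytes of plain.
 * Each byte is widened to an unsigned 32-bit value BEFORE shifting, so
 * the shift by 24 cannot overflow a narrower or signed type. */
uint32_t load_word_be(const unsigned char *plain, size_t t)
{
    assert(plain != NULL);
    return ((uint32_t)plain[t * 4]     << 24) |
           ((uint32_t)plain[t * 4 + 1] << 16) |
           ((uint32_t)plain[t * 4 + 2] <<  8) |
            (uint32_t)plain[t * 4 + 3];
}
```

Bytes with the top bit set, such as 0x80, are the interesting case because that bit ends up in bit 31 of the result.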
  14. The host buffer is not necessarily up-to-date...

    The host buffer is not necessarily up-to-date when your kernel ends because its content can be cached in device memory.

    You have to use clEnqueueMapBuffer / clEnqueueUnmapBuffer to ensure that the...
  15. Check whether the extension is present in the...

    Check whether the extension is present in the string returned by clGetDeviceInfo() with CL_DEVICE_EXTENSIONS.
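As a rough sketch of that check: the extension string is a space-separated list of names, so a whole-token match is safer than a bare substring search. `has_extension` is a hypothetical helper, and `ext_list` is assumed to be the string already retrieved with clGetDeviceInfo() and CL_DEVICE_EXTENSIONS (the OpenCL call itself is not shown).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if name appears as a complete, space-delimited token in
 * ext_list, 0 otherwise. */
int has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0)
        return 0;

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token   = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token)
            return 1;
        ++p;  /* keep searching after a partial match */
    }
    return 0;
}
```

The token test avoids false positives when one extension name is a prefix of another.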
  16. Are you sure that your device has support for the...

    Are you sure that your device has support for the cl_khr_3d_image_writes extension?

    Also use clGetProgramBuildInfo() with CL_PROGRAM_BUILD_LOG to get more info about the reason why the build...
  17. Your kernels could be optimized, but the most...

    Your kernels could be optimized, but the most important parameter when using a GPU is the local work size.

    NVIDIA GPUs for instance are optimized for a local work size of 128, so you should try...
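One practical consequence of fixing the local work size, sketched in plain C: in OpenCL 1.x the global work size passed to clEnqueueNDRangeKernel() must be a multiple of the local work size, so the problem size is usually rounded up. `round_up_global_size` is a hypothetical helper name, and 128 is just the value quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds n up to the next multiple of local (the local work size). */
size_t round_up_global_size(size_t n, size_t local)
{
    assert(local > 0);
    return (n + local - 1) / local * local;
}
```

With 1000 elements and a local size of 128 this yields 1024 work-items, so the kernel would need a bounds check (for example comparing get_global_id(0) against the element count) to skip the 24 padding work-items.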
  18. CL_MEM_READ_WRITE flag will create a buffer in...

    The CL_MEM_READ_WRITE flag will create a buffer in device memory. CL_MEM_HOST_NO_ACCESS is just an optional hint.
  19. Just use clCreateBuffer() with CL_MEM_READ_WRITE...

    Just use clCreateBuffer() with the CL_MEM_READ_WRITE flag. You can also add the hint flag CL_MEM_HOST_NO_ACCESS if your device has support for OpenCL 1.2.
  20. Note that buffers use the endianness of the...

    Note that buffers use the endianness of the device, so a buffer should be read or written taking this into account.

    You can change this behavior with __attribute__((endian(host))) to declare that...
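When host and device endianness differ and the buffer is kept in device byte order, the host side has to swap bytes itself. The following is only a sketch in plain C with hypothetical helper names; it assumes 32-bit elements and does not call any OpenCL function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Reverses the byte order of one 32-bit word. */
uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Converts count words between host and device byte order in place.
 * Call it only when the two byte orders actually differ. */
void swap_words_in_place(uint32_t *data, size_t count)
{
    assert(data != NULL || count == 0);
    for (size_t i = 0; i < count; ++i)
        data[i] = bswap32(data[i]);
}
```

The device side of the comparison would come from clGetDeviceInfo() with CL_DEVICE_ENDIAN_LITTLE, which reports whether the device is little-endian.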
  21. Sticky: Illegal cast in Appendix B - Portability

    The example at the bottom of page 363 in appendix B uses illegal casts:



    float4 v = vload4( 0, x );
    uint4 y = (uint4) v; // legal, portable
    ushort8 z = (ushort8) v; // legal, not portable

    ...
  22. You have to install an OpenCL driver for a...

    You have to install an OpenCL driver for a supported device. Since you have an Intel CPU but no GPU, you should install the Intel OpenCL driver instead.
  23. One of the jobs of the OpenCL runtime is to...

    One of the jobs of the OpenCL runtime is to marshal data between host and device transparently. Alignment and packing are defined in the OpenCL specification and are compatible with standard C usage...
  24. "Recent GPU" probably means less than 10 years old...

    "Recent GPU" probably means less than 10 years old here...

    Gather means that the GPU can do random-access loads, while scatter means that the GPU can do random-access stores.

    It dates from the...
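The two access patterns can be illustrated with a sequential sketch in plain C. The function names are hypothetical, and each loop iteration stands in for what one work-item would do on a GPU.

```c
#include <assert.h>
#include <stddef.h>

/* Gather: random-access loads. Element i reads from src[idx[i]]. */
void gather(float *dst, const float *src, const size_t *idx, size_t n)
{
    assert(dst != NULL && src != NULL && idx != NULL);
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[idx[i]];
}

/* Scatter: random-access stores. Element i writes to dst[idx[i]]. */
void scatter(float *dst, const float *src, const size_t *idx, size_t n)
{
    assert(dst != NULL && src != NULL && idx != NULL);
    for (size_t i = 0; i < n; ++i)
        dst[idx[i]] = src[i];
}
```

In the gather case the output position is fixed and the input position is computed; in the scatter case it is the other way round, which is why duplicate indices make scatter results order-dependent.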
  25. You should read the section "3.1.1 Platform Mixed...

    You should read the section "3.1.1 Platform Mixed Version Support" in the OpenCL Specification.

    1. There are three kinds of version:

    - Platform version: this gives the version of the OpenCL API...
Results 1 to 25 of 116