Type: Posts; User: utnapishtim

  1. Intel CPU and GPU share physical memory so...

    Intel CPU and GPU share physical memory so mapping a buffer is very efficient if the following conditions are fulfilled:

    - The buffer is created with CL_MEM_ALLOC_HOST_PTR, or with...
  2. To put it simply, a kernel should never access a...

    To put it simply, a kernel should never access a host memory buffer.

    In that case, the OpenCL implementation will either:

    - make a copy of the buffer between host and device before the kernel...
  3. Honestly, if your buffer is to be accessed by a...

    Honestly, if your buffer is to be accessed by a GPU kernel, you shouldn't use a host buffer and expect that all transfers will be magically optimized.
    If you need a buffer for your GPU kernel, then...
  4. Calling clEnqueueReadBuffer() with the pointer...

    Calling clEnqueueReadBuffer() with the pointer used to create the host-allocated buffer won't make any redundant copy but will only synchronize memory between GPU and CPU if needed (whence the...
  5. In your scenario, you can use...

    In your scenario, you can use clEnqueueReadBuffer() with blocking_read=true and ptr set to the host memory pointer.
    This will synchronize the (host) buffer with the GPU cache. You can then release...
  6. Replies
    2
    Views
    558

    You are using unnormalized integer coordinates...

    You are using unnormalized integer coordinates with read_imagef(), so your sampler should be

    const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE |
    ...
  7. I've checked on NVIDIA GPU, AMD GPU and Intel CPU...

    I've checked on NVIDIA GPU, AMD GPU and Intel CPU and your kernel is fine.

    How do you get the result from the device buffer on the host side?
  8. Try to cast plain to unsigned int instead of...

    Try to cast plain to unsigned int instead of unsigned char, such as:

    W[t] = ((unsigned int) plain[t * 4]) << 24;

    and so on...
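A minimal host-side C sketch of the same idea, assuming the three remaining bytes are packed the same way as the first one. The helper name `pack_be32` is hypothetical and not from the original thread. The point is that each byte is widened to an unsigned 32-bit type before the shift, so the shift is never carried out on a narrower or signed value:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack the 4 bytes starting at plain[t * 4] into one
 * big-endian 32-bit word. Each byte is cast to uint32_t BEFORE shifting,
 * so the shift operates on an unsigned 32-bit value. */
static uint32_t pack_be32(const unsigned char *plain, size_t t)
{
    return ((uint32_t)plain[t * 4]     << 24) |
           ((uint32_t)plain[t * 4 + 1] << 16) |
           ((uint32_t)plain[t * 4 + 2] <<  8) |
           ((uint32_t)plain[t * 4 + 3]);
}
```

With the bytes `0x80, 0x01, 0x02, 0x03`, `pack_be32(bytes, 0)` gives `0x80010203`.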
  9. Replies
    6
    Views
    793

    The host buffer is not necessarily up-to-date...

    The host buffer is not necessarily up-to-date when your kernel ends because its content can be cached in device memory.

    You have to use clEnqueueMapBuffer / clEnqueueUnmapBuffer to ensure that the...
  10. Replies
    3
    Views
    889

    Check whether the extension is present in the...

    Check whether the extension is present in the string returned by clGetDeviceInfo() with CL_DEVICE_EXTENSIONS.
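A possible way to do that check, sketched in plain C. It assumes the space-separated extension string has already been fetched with clGetDeviceInfo() and CL_DEVICE_EXTENSIONS; the helper name `has_extension` is hypothetical. It matches whole names only, so an extension whose name merely starts with the one you are looking for does not count:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if `name` appears as a complete,
 * space-delimited token in `ext_list` (the CL_DEVICE_EXTENSIONS string). */
static bool has_extension(const char *ext_list, const char *name)
{
    size_t len = strlen(name);
    if (len == 0)
        return false;

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the beginning or right after a space... */
        bool starts = (p == ext_list) || (p[-1] == ' ');
        /* ...and end at a space or at the end of the string. */
        bool ends = (p[len] == ' ') || (p[len] == '\0');
        if (starts && ends)
            return true;
        p += len; /* keep searching after this partial match */
    }
    return false;
}
```

For instance, `has_extension(extensions, "cl_khr_3d_image_writes")` would be the test to run before building a kernel that writes to 3D images.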
  11. Replies
    3
    Views
    889

    Are you sure that your device has support for the...

    Are you sure that your device has support for the cl_khr_3d_image_writes extension?

    Also use clGetProgramBuildInfo() with CL_PROGRAM_BUILD_LOG to get more info about the reason why the build...
  12. Your kernels could be optimized, but the most...

    Your kernels could be optimized, but the most important parameter when using a GPU is the local work size.

    NVIDIA GPUs for instance are optimized for a local work size of 128, so you should try...
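When you pick an explicit local work size, OpenCL 1.x requires the global work size to be a multiple of it. A common approach, sketched here with a hypothetical helper name, is to round the global size up on the host and have the kernel ignore work-items whose global id is past the real element count:

```c
#include <stddef.h>

/* Hypothetical helper: round the number of work-items up to the next
 * multiple of the chosen local work size (e.g. 128). */
static size_t round_up_global_size(size_t work_items, size_t local_size)
{
    size_t remainder = work_items % local_size;
    return remainder == 0 ? work_items
                          : work_items + (local_size - remainder);
}
```

For 1000 elements and a local size of 128 this yields a global size of 1024, so the kernel needs a bounds check on its global id.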
  13. Replies
    7
    Views
    899

    CL_MEM_READ_WRITE flag will create a buffer in...

    CL_MEM_READ_WRITE flag will create a buffer in device memory. CL_MEM_HOST_NO_ACCESS is just an optional hint.
  14. Replies
    7
    Views
    899

    Just use clCreateBuffer() with CL_MEM_READ_WRITE...

    Just use clCreateBuffer() with CL_MEM_READ_WRITE flag. You can also add the hint flag CL_MEM_HOST_NO_ACCESS if your device has support for OpenCL 1.2.
  15. Replies
    5
    Views
    636

    Note that buffers use the endianness of the...

    Note that buffers use the endianness of the device, so a buffer should be read or written taking this into account.

    You can change this behavior with __attribute__((endian(host))) to declare that...
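If the host and the device turn out to have different endianness (the device side can be queried with clGetDeviceInfo() and CL_DEVICE_ENDIAN_LITTLE), one option is to swap bytes on the host before writing or after reading. A rough sketch for 32-bit values; both helper names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the host stores the least significant
 * byte first. */
static bool host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Hypothetical helper: reverse the byte order of a 32-bit value. */
static uint32_t swap_bytes32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) |
           (v << 24);
}
```

The swap would only be applied when `host_is_little_endian()` disagrees with what the device reports.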
  16. Replies
    32
    Views
    15,428

    Sticky: Illegal cast in Appendix B - Portability

    The example at the bottom of page 363 in appendix B uses illegal casts:

    float4 v = vload4( 0, x );
    uint4 y = (uint4) v; // legal, portable
    ushort8 z = (ushort8) v; // legal, not portable

    ...
  17. Replies
    3
    Views
    765

    You have to install an OpenCL driver for a...

    You have to install an OpenCL driver for a supported device. Since you have an Intel CPU but no GPU, you should install the Intel OpenCL driver instead.
  18. Replies
    5
    Views
    636

    One of the jobs of the OpenCL runtime is to...

    One of the jobs of the OpenCL runtime is to marshal data between host and device transparently. Alignment and packing are defined in the OpenCL specification and are compatible with standard C usage...
  19. Replies
    7
    Views
    899

    "Recent GPU" probably means less than 10 years old...

    "Recent GPU" probably means less than 10 years old here...

    Gather means that the GPU can do random-access loads, while scatter means that the GPU can do random-access stores.

    It dates from the...
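To make the two terms concrete, here is a plain C sketch of both access patterns. The function names and the index array are illustrative only: gather reads from arbitrary positions and writes sequentially, scatter reads sequentially and writes to arbitrary positions.

```c
#include <stddef.h>

/* Gather: random-access loads, sequential stores. */
static void gather(float *dst, const float *src, const size_t *idx, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[idx[i]];
}

/* Scatter: sequential loads, random-access stores.
 * Assumes the indices in idx are all distinct. */
static void scatter(float *dst, const float *src, const size_t *idx, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[idx[i]] = src[i];
}
```

In a kernel, each loop iteration would correspond to one work-item.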
  20. You should read the section "3.1.1 Platform Mixed...

    You should read the section "3.1.1 Platform Mixed Version Support" in the OpenCL Specification.

    1. There are three kinds of version:

    - Platform version: this gives the version of the OpenCL API...
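The platform and device version strings returned by clGetPlatformInfo() with CL_PLATFORM_VERSION and clGetDeviceInfo() with CL_DEVICE_VERSION start with "OpenCL", a space, and then major.minor. A small sketch for pulling the numbers out of such a string; the helper name is hypothetical and the string is assumed to have been queried already:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: extract major and minor from a string of the form
 * "OpenCL <major>.<minor> <vendor-specific info>".
 * Returns false if the string does not have that shape. */
static bool parse_cl_version(const char *s, int *major, int *minor)
{
    return sscanf(s, "OpenCL %d.%d", major, minor) == 2;
}
```

Comparing the parsed platform and device versions is one way to see the mixed-version situation that section of the specification describes.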
  21. Your kernel and the global and local ranges look...

    Your kernel and the global and local ranges look fine.

    What do you mean by "not working"? Does OpenCL return an error, or are the results numerically different from expected?
  22. Replies
    5
    Views
    636

    If you have only one instance of the struct to...

    If you have only one instance of the struct to send to the kernel, you can pass it by copy:

    __kernel void ker(..., struct Params test)
  23. If the buffer is allocated in device memory,...

    If the buffer is allocated in device memory, there is no such thing as a pointer to host memory.

    Even when the buffer is allocated in host memory, if it resides in pinned memory, a call to...
  24. The constructor of Buffer allocates memory....

    The constructor of Buffer allocates memory. enqueueMapBuffer doesn't.

    enqueueMapBuffer maps the buffer object into host memory. Calling it several times will not allocate several buffers.
    ...
  25. Replies
    10
    Views
    1,181

    In Java, a char is 2 bytes long, so you should...

    In Java, a char is 2 bytes long, so you should replace cl_char by cl_ushort in your Java code, and char by ushort in your kernel.

    Furthermore, the barrier is unnecessary in your kernel.
    ...
Results 1 to 25 of 111