Type: Posts; User: Dithermaster

  1. I've seen 1024,1,1 only for the Apple CPU device,...

    I've seen 1024,1,1 only for the Apple CPU device, so I agree with your guess that it was that device. Switch to the GPU device for better dimensions.
  2. Because the runtime may choose to run some...
    Replies: 1, Views: 213

    Because the runtime may choose to run some workgroups to completion before starting others (when the number of workgroups far exceeds the hardware capabilities) there are therefore no global...
  3. On Windows, OpenCL.dll _is_ the ICD, but you...

    On Windows, OpenCL.dll _is_ the ICD, but you still don't want to ship it. It varies by version, for one thing (what if you ship a version 1.2 one, but the vendor driver updated the system ICD...
  4. You do NOT want to ship this DLL with your...

    You do NOT want to ship this DLL with your project. The one installed on the system is the one you want to use. What problem are you trying to solve?
  5. clFlush can certainly block the CPU; it won't...
    Replies: 2, Views: 505

    clFlush can certainly block the CPU; it won't return until the command queue has completely been flushed to the hardware, and if the hardware queue is full, the CPU will block.

    Except for CL/GL...
  6. I answered on SO (before I saw this).

    I answered on SO (before I saw this).
  7. That kernel looks like it was code-generated, not...

    That kernel looks like it was code-generated, not hand coded. In any case, one source of slowdown is that each work item reads 16 doubles from global memory. While they can be broadcast within each...
  8. The API is designed to be async -- all of the...

    The API is designed to be async -- all of the clEnqueue calls are designed to return quickly. The OpenCL driver uses a separate thread to push work to the GPU. So once you've queued up work to the...
  9. If you have an OpenCL driver for CPU installed...

    If you have an OpenCL driver for CPU installed then CL_DEVICE_TYPE_CPU devices appear, so yes, it is a useful flag to have.

    You might, for example, try for a GPU device, and only if one is not...
  10. clBuildProgram is _required_ regardless of...
    Replies: 1, Views: 624

    clBuildProgram is _required_ regardless of whether you created the program using clCreateProgramWithSource or clCreateProgramWithBinary. It will be faster with binary sources.
  11. OpenCL C is based on C99, so if it is ill-defined...
    Replies: 3, Views: 1,845

    OpenCL C is based on C99, so if it is ill-defined in C99, it's ill-defined in OpenCL C.
  12. No such limitation. You can do multiple reads and...
    Replies: 3, Views: 1,845

    No such limitation. You can do multiple reads and writes to global memory from within a kernel. You should go back and ask your past self what they meant in the comment.
  13. My cursory understanding is that it's up to the...

    My cursory understanding is that it's up to the vendor's driver and how it's implemented. From what I'm reading above, AMD's driver supports it. I think NVIDIA Tesla cards run in the non-graphics mode...
  14. > 256 is the work group size and 700 is the...

    > 256 is the work group size and 700 is the global size so it is evenly divisible.
    Um, no it's not. 256 goes into 768 but not 700.
    The common solution is to "round up" the global size to be an...
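
A minimal sketch of the "round up" arithmetic, in plain host-side C. The helper name `round_up_global_size` is made up for illustration and is not part of the OpenCL API. With the numbers quoted above, 700 rounded up to a multiple of 256 gives 768.

```c
#include <assert.h>
#include <stddef.h>

/* Round global_size up to the next multiple of local_size.
   Hypothetical helper; not an OpenCL API function. */
size_t round_up_global_size(size_t global_size, size_t local_size)
{
    assert(local_size > 0);
    size_t remainder = global_size % local_size;
    if (remainder == 0) {
        return global_size;            /* already evenly divisible */
    }
    return global_size + (local_size - remainder);
}
```

When the global size is padded like this, the kernel usually needs the real item count passed in as an argument so the extra work items can return early instead of touching memory past the end.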
  15. You have old knowledge. Intel and AMD are both...

    You have old knowledge. Intel and AMD are both shipping OpenCL 2.0 drivers.

    Intel: https://software.intel.com/en-us/articles/opencl-drivers (2014 r2 is OpenCL 2.0)

    AMD:...
  16. Does the device report slightly less local memory...

    Does the device report slightly less local memory for CL_DEVICE_LOCAL_MEM_SIZE when you're running the r340.xx driver? It had better!

    I did notice a while back that some older NVIDIA OpenCL 1.0...
  17. OpenCL 2.0 adds support for images with the...

    OpenCL 2.0 adds support for images with the read_write qualifier. It is not possible in OpenCL 1.x, you'll need to use two different images. Note: It might just be faster that way anyway.
  18. OpenCL 1.x supports 2D and 3D images and OpenCL...

    OpenCL 1.x supports 2D and 3D images and OpenCL 1.2 adds 1D images, and clEnqueueNDRangeKernel supports 1D, 2D, and 3D workgroups. Of course all of these ultimately map to linear memory, so it's just...
  19. Do you have any constants with a decimal point...

    Do you have any constants with a decimal point and no "f" on the end? Those are doubles, and anything they do math with will get promoted to a double.
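
Since OpenCL C is based on C99, the promotion can be illustrated in plain C. This is only a sketch, and the two function names are invented for the example: the first multiplies a float by an unsuffixed constant, the second by a constant with the "f" suffix, and each returns the size of the resulting type.

```c
#include <assert.h>
#include <stddef.h>

/* 0.5 has no suffix, so it is a double; x is promoted and the
   product has type double. */
size_t size_with_double_constant(float x)
{
    return sizeof(x * 0.5);
}

/* 0.5f is a float constant, so the product stays a float. */
size_t size_with_float_constant(float x)
{
    return sizeof(x * 0.5f);
}
```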
  20. FPGA would be more applicable to a vertical...
    Replies: 3, Views: 1,410

    FPGA would be more applicable to a vertical market solution (where FPGAs have typically been used). OpenCL is now an alternative programming environment that may be more productive than learning other FPGA...
  21. Sounds like a driver bug.

    Sounds like a driver bug.
  22. > I assume that inside a kernel I should bundle a...
    Replies: 2, Views: 694

    > I assume that inside a kernel I should bundle a set of such answers into a reasonably-sized scalar type, say ushort.
    Yes, you should do that. Bit set/clear across work items would not be...
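
A rough sketch of that bundling in plain C, assuming each work item produces 16 yes/no answers and writes them out as a single 16-bit value. The function names and the count of 16 are assumptions for illustration, and `uint16_t` stands in for OpenCL's ushort.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 16 yes/no answers (nonzero = yes) into one 16-bit word,
   with answer i stored in bit i. Hypothetical helper. */
uint16_t pack_answers(const int answers[16])
{
    uint16_t packed = 0;
    for (int bit = 0; bit < 16; ++bit) {
        if (answers[bit]) {
            packed |= (uint16_t)(1u << bit);
        }
    }
    return packed;
}

/* Read a single answer back out of the packed word. */
int unpack_answer(uint16_t packed, int bit)
{
    assert(bit >= 0 && bit < 16);
    return (packed >> bit) & 1u;
}
```

Building the whole word in a private variable and storing it once means no two work items ever modify bits of the same word.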
  23. OpenCL and clEnqueueNDRangeKernel are all about...

    OpenCL and clEnqueueNDRangeKernel are all about parallel execution.

    The global work size is "how many work items do I want to compute?"

    Then, inside your parallel kernel execution,...
  24. Short answer: It will improve things. Long...

    Short answer: It will improve things.

    Long answer: Of course for big data bandwidth-limited problems where compute has been overlapped with transfer but the transfer is very large it will be a...
  25. I agree. This is a parallel reduction problem....

    I agree. This is a parallel reduction problem. OpenCL 2 even has operators for reduction, but you can write it in OpenCL 1.x yourself. I'd write the values to global memory and then run the reduction...
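
For reference, a host-side C sketch of the pairwise (tree) reduction pattern that a hand-written OpenCL 1.x reduction commonly follows. It is a serial simulation, not kernel code: the function name is invented, the input length is assumed to be a power of two, and the inner loop stands in for the work items that would run in parallel between barriers.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n values in place using a tree reduction.
   Assumes n is a power of two. Each pass of the outer loop halves
   the number of active elements, as one barrier-separated step
   would in a kernel. Hypothetical helper. */
float tree_reduce_sum(float *values, size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    for (size_t stride = n / 2; stride > 0; stride /= 2) {
        for (size_t i = 0; i < stride; ++i) {
            values[i] += values[i + stride];
        }
    }
    return values[0];
}
```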
Results 1 to 25 of 193