Type: Posts; User: Dithermaster

  1. Replies: 8, Views: 288

    That's simply not possible since the runtime doesn't know what you're storing in the buffers. It could be any size data types in any combination. There is no way for it to know which bytes to swap...
  2. With that amount of overlapped reads (work items re-reading the same memory other work items just read) this is a good candidate for workgroup shared local memory. Make those global memory reads just...
  3. Replies: 8, Views: 288

    Likely true, but again, until such a mis-matched implementation exists how would you test that you handled it correctly? Seems like a lot of extra work for an unlikely scenario.
  4. Replies: 8, Views: 288

    For kernel arguments the implementation takes care of endianness. For data buffers (clBuffer) it is your responsibility. However, I haven't heard yet of an implementation where the device doesn't...
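
To make the "your responsibility" part concrete, here is a minimal host-side sketch, assuming the buffer holds nothing but 32-bit values (the runtime cannot know that, which is why it cannot do this for you). The helper names `host_is_little_endian`, `swap_u32` and `swap_u32_array` are made up for illustration and are not part of OpenCL. The device side of the comparison would come from querying `CL_DEVICE_ENDIAN_LITTLE` with `clGetDeviceInfo`, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the host stores the least significant byte first. */
int host_is_little_endian(void) {
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Reverses the byte order of one 32-bit value. */
uint32_t swap_u32(uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

/* Swaps every element in place. Only valid when the whole buffer
   really is an array of 32-bit values (an assumption of this sketch). */
void swap_u32_array(uint32_t *data, size_t count) {
    assert(data != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        data[i] = swap_u32(data[i]);
    }
}
```

You would only call `swap_u32_array` before upload (and again after download) when host and device endianness actually differ. A buffer of mixed structs would need a swap routine written per field layout.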
  5. Replies: 5, Views: 354

    They are correct. You can't ship this binary and expect it to work on all their hardware.

    What you want is SPIR, but nobody is shipping support yet since it is new.
  6. For what input value? Since doubles have 15 to 17 decimal digits of precision, your result has the expected accuracy for a value between (say) 0.1 and 1.0. A CPU might use 80-bit intermediate...
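
As a rough illustration of what "15 to 17 digits" means when comparing a device result against a CPU result, here is a small sketch. The 15 and 17 correspond to `DBL_DIG` and `DBL_DECIMAL_DIG` in `<float.h>` for IEEE doubles. The helper `nearly_equal` and its `max_ulps` tolerance are assumptions made up for this example, not anything from OpenCL.

```c
#include <assert.h>
#include <float.h>

/* Absolute value without needing the math library. */
static double abs_d(double x) {
    return x < 0.0 ? -x : x;
}

/* Returns 1 when a and b differ by no more than max_ulps units of
   DBL_EPSILON relative to the larger magnitude. Hypothetical helper:
   a tolerance of a few ulps is a judgment call, not a standard. */
int nearly_equal(double a, double b, double max_ulps) {
    assert(max_ulps >= 0.0);
    double scale = abs_d(a) > abs_d(b) ? abs_d(a) : abs_d(b);
    return abs_d(a - b) <= max_ulps * DBL_EPSILON * scale;
}
```

Comparing with a relative tolerance like this, rather than with `==`, is the usual way to accept differences in the last digit or two between implementations.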
  7. Replies: 5, Views: 354

    In the AMD sample code, it is declared thusly: "cl_context_properties cprops[5];". The type "cl_context_properties" is defined in the cl.h header.
  8. Replies: 1, Views: 393

    On modern hardware, the case of all work items reading the same location (known as a "broadcast") does not cause a conflict, so it is not serialized and is fast.
  9. If you're using a modern C++ compiler with the standard library ("std"), you should use std::string instead of cl::string. I think they only put cl::string and cl::vector in there for older compilers.
  10. Replies: 2, Views: 360

    To save folks time, I see this is already being discussed elsewhere:

    http://stackoverflow.com/questions/25315102/opencl-compiler-error-c4996
  11. This is not valid in OpenCL 1.x; you can't have a buffer accessed from both CPU and device. You must use clEnqueueMapBuffer to get CPU access and clEnqueueUnmapMemObject to give access back to the...
  12. Try putting clFinish calls after every clEnqueue call to narrow down the specific call that is causing a problem.
  13. There is an OpenCL 1.2 extension to create images from buffers (cl_khr_image2d_from_buffer). However, images created this way perform slightly differently compared to regular images due to the linear...
  14. On any recent Mac OS X (10.6+) the runtime will always exist.

    On Windows you can use /DELAYLOAD and an alternate `clGetPlatformIDs` implementation that returns 0 for when the OS can't find...
  15. It's the second.

    __local is shared local memory, so it's a single piece of memory that every work item in a work group can access. It's essentially a programmer-managed cache, and frequently used...
  16. No, swap steps 1 and 2 so it becomes:

    1. run the kernel
    2. map the clbuffer
    3. use the pointer returned from mapping to go through the buffer and print each item
    4. unmap the buffer

    Mapping...
  17. You don't use the contents after clEnqueueUnmapMemObject.

    Instead of clEnqueueReadBuffer, do a clEnqueueMapBuffer (with blocking), use the pointer returned to access the buffer, then...
  18. For many applications, yes. You can certainly try to write a function that calculates an optimal work group size, but it will be a challenge. Alternatively, you can benchmark all sizes on the user's...
  19. I'd love to be proven wrong, but in my opinion and based on my experience, it's a black art.

    It varies by hardware vendor, and I've even seen where non-multiples of...
  20. Replies: 5, Views: 653

    As I said, look it up in cl.h:

    #define CL_INVALID_VALUE -30

    Then look in the OpenCL specification for the API that is returning that error to see what it means.

    On...
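
If you get tired of opening cl.h every time, a small lookup like the one below can sit in your host code. This is only a sketch: `cl_error_name` and `cl_error_is` are hypothetical helpers, not OpenCL functions, and the table covers just a handful of the codes defined in cl.h. Anything it does not know about still has to be looked up in the header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Maps a few OpenCL status codes to their names from cl.h.
   The numeric values are written out so the sketch compiles
   without the OpenCL headers. */
const char *cl_error_name(int code) {
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -1:  return "CL_DEVICE_NOT_FOUND";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -30: return "CL_INVALID_VALUE";
    case -31: return "CL_INVALID_DEVICE_TYPE";
    case -32: return "CL_INVALID_PLATFORM";
    case -33: return "CL_INVALID_DEVICE";
    default:  return "unknown (look it up in cl.h)";
    }
}

/* Returns nonzero when `code` maps to the given symbolic name. */
int cl_error_is(int code, const char *name) {
    assert(name != NULL);
    return strcmp(cl_error_name(code), name) == 0;
}
```

Once you have the name, the specification entry for the API that returned it lists the exact conditions under which that error is raised.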
  21. Replies: 5, Views: 653

    It's always helpful to look at the error code you get back from OpenCL APIs. For example, what code do you get back from clGetDeviceIDs? Look it up in cl.h to get a clue as to what is happening.
  22. Replies: 4, Views: 644

    OpenCL 1.x doesn't have a continuous data streaming mode so you'll need to chop up your data into blocks and upload them one by one, process them, and download results. On modern hardware you'll be...
  23. Replies: 6, Views: 821

    > If you specify a work group size larger than your hardware or kernel supports, the clEnqueueNDRange call should fail and return an error code.
    That would be nice but I don't think you can reliably...
  24. Replies: 7, Views: 801

    Just an observation: The GlobalWorkSize is not an integer multiple of the WorkGroupSize (216 is not evenly divisible by 16). In OpenCL 1.x, if you specify the work group size then the global size...
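
One common way to deal with the 216-versus-16 mismatch is to round the global size up to the next multiple of the work group size on the host. A minimal sketch follows; `round_up_global_size` is a hypothetical helper name, not an OpenCL call.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds global_size up to the next multiple of local_size.
   The kernel must then skip the padding work items itself, e.g.
   with "if (get_global_id(0) >= n) return;" where n is the real
   element count passed in as a kernel argument. */
size_t round_up_global_size(size_t global_size, size_t local_size) {
    assert(local_size > 0);
    size_t remainder = global_size % local_size;
    if (remainder == 0) {
        return global_size;
    }
    return global_size + (local_size - remainder);
}
```

For the sizes mentioned above, `round_up_global_size(216, 16)` gives 224, so 8 extra work items are launched and exit early in the kernel. The alternative is to pick a work group size that divides 216 evenly.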
  25. Replies: 2, Views: 446

    Use a "buffer" in OpenCL global device memory.
Results 1 to 25 of 167