Search:

Type: Posts; User: Bilog

Page 1 of 2 1 2


  1. Your platform might have limitations on the local...

    Your platform might have limitations on the local work size in the second and third dimensions. You can check this by retrieving the CL_DEVICE_MAX_WORK_ITEM_SIZES property, which returns a list of...
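A minimal C sketch of the check described above, assuming the per-dimension limits have already been read into an array with `clGetDeviceInfo` and `CL_DEVICE_MAX_WORK_ITEM_SIZES`. The helper name `local_size_fits` is hypothetical, and the OpenCL query itself is left out so the snippet compiles without the OpenCL headers.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every dimension of the requested local work size is
 * non-zero and within the per-dimension limit, 0 otherwise.
 * max_sizes is assumed to hold the values the device reported for
 * CL_DEVICE_MAX_WORK_ITEM_SIZES (one entry per dimension). */
static int local_size_fits(const size_t *max_sizes,
                           const size_t *local,
                           unsigned dims)
{
    assert(dims >= 1 && dims <= 3);
    for (unsigned d = 0; d < dims; ++d) {
        if (local[d] == 0 || local[d] > max_sizes[d])
            return 0;
    }
    return 1;
}
```

With hypothetical limits of `{1024, 1024, 64}`, a local size of `{16, 16, 4}` passes and `{16, 16, 128}` fails on the third dimension. The product of the local sizes is separately bounded by `CL_DEVICE_MAX_WORK_GROUP_SIZE`, which this sketch does not check.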
  2. Replies
    18
    Views
    4,935

    Sticky: Another point that needs clarification (aside...

    Another point that needs clarification (aside from the meaning of CL_DEVICE_VENDOR_ID) is the behavior of sub-devices in terms of (pre-)existing contexts. I've opened a specific discussion about this...
  3. Ambiguity in the specification about sub-devices and contexts

    Hello all,

    what is the correct behavior in the cases of sub-devices created _after_ context creation?

    Let's say that I create a context C that only includes a single device, devA. I then...
  4. Replies
    18
    Views
    4,935

    Sticky: An additional point, concerning the available...

    An additional point, concerning the available device information:

    * as I mentioned, it would be better to have a device info entry about the supported OpenCL C++ version; while currently there is...
  5. Replies
    18
    Views
    4,935

    Sticky: Absolutely agreed. An important case where the...

    Absolutely agreed. An important case where the high-level feature exposed in OpenCL C (or C++) would be better replaced by lower-level functions is that of work-group and subgroup scans and...
  6. Replies
    18
    Views
    4,935

    Sticky: A few things I've noticed on the first read of...

    A few things I've noticed on the first read of the OpenCL C++ 1.0 draft:

    * a minor missing point is that there is no device property retrievable by `clGetDeviceInfo` about the supported OpenCL C++...
  7. According to the specification, the requirement...

    According to the specification, the requirement is that the kernel signature (number and type of arguments) should be the same for all devices for which the program was built. If you build different...
  8. The preferred wg size multiple is what the OpenCL...

    The preferred wg size multiple is what the OpenCL platform thinks the local workgroup size should be a multiple of to achieve optimal performance. On NVIDIA GPUs, this is always returned as the warp...
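As a rough illustration of how that value tends to be used, here is a small C sketch that rounds a work size up to the next multiple. It assumes the multiple was obtained beforehand from `clGetKernelWorkGroupInfo` with `CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE`; the helper name `round_up_to_multiple` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds n up to the nearest multiple of `multiple`.
 * `multiple` stands in for the value reported for
 * CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE (assumed, not queried here). */
static size_t round_up_to_multiple(size_t n, size_t multiple)
{
    assert(multiple > 0);
    return ((n + multiple - 1) / multiple) * multiple;
}
```

For example, with a hypothetical multiple of 32, a problem size of 1000 is padded to 1024. If the global size is padded this way, the kernel needs a bounds check so the extra work-items do nothing.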
  9. Replies
    32
    Views
    22,163

    work_group_prefixsum_{inclusive,exclusive}_{add,mi...

    work_group_prefixsum_{inclusive,exclusive}_{add,min,max} functions are not named correctly, since they are not necessarily additions. Is it too late to change them to...
  10. Replies
    3
    Views
    1,595

    -52 is CL_INVALID_KERNEL_ARGS, and indeed you are...

    -52 is CL_INVALID_KERNEL_ARGS, and indeed you are passing 4 args to a kernel that needs 5 of them.
  11. Replies
    8
    Views
    2,526

    You should probably report your problem to AMD...

    You should probably report your problem to AMD (they have a forum dedicated to OpenCL questions and issues over at their devgurus.amd.com site)
  12. Replies
    1
    Views
    1,129

    Nothing. Since OpenCL has separate sources for...

    Nothing. Since OpenCL has separate sources for the host and device parts, there is no need to qualify device functions.
  13. Replies
    1
    Views
    2,268

    In OpenCL all functions are automatically inlined.

    In OpenCL all functions are automatically inlined.
  14. Replies
    4
    Views
    2,008

    Re: get_global_id is undefined

    get_global_id() is a built-in of OpenCL C, so it is only defined inside of kernels. Are you trying to use it in host code? Please post a minimal buildable example showing the problem.
  15. Replies
    3
    Views
    1,765

    Re: warp size vs # of SPs per SM

    On Fermi, each warp is physically executed as two half-warps; the 2.1 devices can effectively run 3 half-warps at once. (The thing is actually more complex, due to the device's ability to issue more...
  16. Replies
    3
    Views
    2,097

    Re: running on GPU but not on CPU

    Are you using the Intel OpenCL SDK on an AMD CPU? In my experience, this combination doesn't work, while the reverse (AMD APP with Intel CPU) works.
  17. Replies
    2
    Views
    1,828

    Re: Copying c++ classes for use in open CL

    The OpenCL C programming language is based on C99, and therefore has no support for C++ features and types. In particular, this means you cannot pass C++ objects to OpenCL.

    For your specific...
  18. Replies
    4
    Views
    6,783

    Re: OpenCL struct alignment on host and device

    The problem is, how do you guarantee that the host and device compiler will introduce the additional padding in the same place? Note that the spec says that padding may be added. Compilers will then...
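One common way to sidestep the question (not necessarily what the rest of the post recommends) is to spell the padding out by hand and verify the layout at compile time. The C sketch below assumes a hypothetical `params_t` struct, and assumes the device-side definition mirrors it field by field with `uchar`, `uint` and `float`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side struct. The padding is an explicit member, so the
 * layout does not depend on where a particular compiler chooses to pad. */
typedef struct {
    uint8_t  flag;
    uint8_t  pad[3];   /* explicit padding up to the next 4-byte boundary */
    uint32_t count;
    float    value;
} params_t;

/* Compile-time checks of the layout the device code is assumed to expect. */
static_assert(offsetof(params_t, count) == 4, "count must start at byte 4");
static_assert(offsetof(params_t, value) == 8, "value must start at byte 8");
static_assert(sizeof(params_t) == 12, "no hidden trailing padding expected");
```

If the host compiler lays the struct out differently, the build fails instead of the kernel silently reading garbage.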
  19. Replies
    3
    Views
    5,712

    Re: OpenCL profiling tools for Linux

    The AMD APP includes a command-line profiler that works in Linux as well. It produces CSV files that you can then open and analyze by hand.
  20. Replies
    1
    Views
    3,052

    Re: Well defined ways of detecting product ID

    All the major OpenCL platforms expose the relevant information in the form of macros that you can test for in the kernel. The quality and amount of documentation of these macros vary. For example, AMD...
  21. Replies
    6
    Views
    6,851

    Re: using printf in the .cl file

    I doubt they'll ever think about doing that, unless a sizeable number of users complain about the lack of support and possibly threaten to switch to the competition.
  22. Replies
    1
    Views
    1,380

    Re: What is vstoren for?

    vload and vstore are used to load/store data in non-standard alignments, so they are in fact typically slower than standard reads/writes. Typical usage is to read packed data from 3-component vectors...
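To make the packed 3-component case concrete, here is a plain C emulation of the addressing that `vload3(offset, p)` performs: it reads three consecutive scalars starting at `p + offset * 3`. The `vec3` type and the `load3` helper are hypothetical stand-ins for host-side illustration, not OpenCL built-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a 3-component vector value. */
typedef struct { float x, y, z; } vec3;

/* Emulates the addressing of vload3(offset, p): element `offset` of a
 * tightly packed xyz array starts at p + offset * 3, with no padding
 * between consecutive elements. */
static vec3 load3(size_t offset, const float *p)
{
    assert(p != NULL);
    vec3 v = { p[3 * offset], p[3 * offset + 1], p[3 * offset + 2] };
    return v;
}
```

In OpenCL C a `float3` has the size and alignment of a `float4`, so indexing a tightly packed xyz array through a `float3` pointer would step by four floats instead of three. That mismatch is why the vload/vstore route is used for such data.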
  23. Replies
    6
    Views
    6,851

    Re: using printf in the .cl file

    No, NVIDIA does not expose a printf extension in OpenCL, even though its Fermi (and higher) cards actually support it. Complain to NVIDIA about it.
  24. Re: Regarding creating a very big array within an OpenCL Ker

    You can create an OpenCL buffer which is large enough to hold the data from all arrays from all workgroups (I assume you are launching the kernel with 4 workgroups and a single work-item per workgroup,...
  25. Re: Making kernels aware of a large amount of buffers

    OpenCL buffers are quite different from CUDA malloc()ed global memory, in that there is no 1:1 mapping between an OpenCL buffer and a specific global memory area on the device (theoretically, the...
Results 1 to 25 of 42