Type: Posts; User: Bilog

  1. According to the specification, the requirement...

    According to the specification, the requirement is that the kernel signature (number and type of arguments) should be the same for all devices for which the program was built. If you build different...
  2. The preferred wg size multiple is what the OpenCL...

    The preferred wg size multiple is what the OpenCL platform thinks the local workgroup size should be a multiple of to achieve optimal performance. On NVIDIA GPUs, this is always returned as the warp...
  3. Replies
    32
    Views
    15,433

    Sticky: work_group_prefixsum_{inclusive,exclusive}_{add,mi...

    work_group_prefixsum_{inclusive,exclusive}_{add,min,max} functions are not named correctly, since they are not necessarily additions. Is it too late to change them to...
  4. Replies
    3
    Views
    1,246

    -52 is CL_INVALID_KERNEL_ARGS, and indeed you are passing 4 args to a kernel that needs 5 of them.
  5. Replies
    8
    Views
    2,099

    You should probably report your problem to AMD (they have a forum dedicated to OpenCL questions and issues over at their devgurus.amd.com site).
  6. Replies
    1
    Views
    901

    Nothing. Since OpenCL has separate sources for the host and device parts, there is no need to qualify device functions.
  7. Replies
    1
    Views
    956

    In OpenCL all functions are automatically inlined.
  8. Replies
    4
    Views
    1,664

    Re: get_global_id is undefined

    get_global_id() is a built-in of OpenCL C, so it is only defined in device code (kernels and the functions they call). Are you trying to use it in host code? Please post a minimal buildable example showing the problem.
  9. Replies
    3
    Views
    1,435

    Re: warp size vs # of SPs per SM

    On Fermi, each warp is physically executed as two half-warps; the 2.1 devices can effectively run 3 half-warps at once. (The thing is actually more complex, due to the device's ability to issue more...
  10. Replies
    3
    Views
    1,779

    Re: running on GPU but not on CPU

    Are you using the Intel OpenCL SDK on an AMD CPU? In my experience, this combination doesn't work, while the reverse (AMD APP with Intel CPU) works.
  11. Replies
    2
    Views
    1,572

    Re: Copying c++ classes for use in open CL

    The OpenCL C programming language is based on C99, and therefore has no support for C++ features and types. In particular, this means you cannot pass C++ objects to OpenCL.

    For your specific...
  12. Replies
    4
    Views
    5,790

    Re: OpenCL struct alignment on host and device

    The problem is, how do you guarantee that the host and device compilers will introduce the additional padding in the same place? Note that the spec says that padding may be added. Compilers will then...
  13. Replies
    3
    Views
    4,621

    Re: OpenCL profiling tools for Linux

    The AMD APP includes a command-line profiler that works in Linux as well. It produces CSV files that you can then open and analyze by hand.
  14. Replies
    1
    Views
    2,729

    Re: Well defined ways of detecting product ID

    All the major OpenCL platforms expose the relevant information in the form of macros that you can test for in the kernel. Quality and amount of documentation of these macros varies. For example, AMD...
  15. Replies
    6
    Views
    5,273

    Re: using printf in the .cl file

    I doubt they'll ever think about doing that, unless a sizeable number of users complain about the lack of support and possibly threaten to switch to the competition.
  16. Replies
    1
    Views
    1,166

    Re: What is vstoren for?

    vload and vstore are used to load/store data in non-standard alignments, so they are in fact typically slower than standard reads/writes. Typical usage is to read packed data from 3-component vectors...
  17. Replies
    6
    Views
    5,273

    Re: using printf in the .cl file

    No, NVIDIA does not expose a printf extension in OpenCL, even though its Fermi (and higher) cards actually support it. Complain to NVIDIA about it.
  18. Re: Regarding creating a very big array within an OpenCL Ker

    You can create an OpenCL buffer which is large enough to hold the data from all arrays from all workgroups (I assume you are launching the kernel with 4 workgroups and a single work-item per workgroup,...
  19. Re: Making kernels aware of a large amount of buffers

    OpenCL buffers are quite different from CUDA malloc()ed global memory, in that there is no 1:1 mapping between an OpenCL buffer and a specific global memory area on the device (theoretically, the...
  20. Re: Difference between Cuda Core (Nvidia) and Stream Core(AT

    Yes. OpenCL compute units map to physical multiprocessors available on GPU devices, and the 560 has 7 multiprocessors. Since it's an NVIDIA GPU with CUDA capability 2.1, it has 48 processing elements...
  21. Replies
    7
    Views
    2,960

    Re: how do I invoke the OpenCL compiler

    That's just stuff that NVIDIA puts in its SDK examples, for code that is shared by all of them (things like parsing the command line and selecting a GPU). It has nothing to do with OpenCL.
  22. Replies
    5
    Views
    1,963

    Re: Check whether cl_mem object is valid

    Even though initialization to NULL is a way to work around this, the implementation should still not segfault on encountering an invalid cl_mem object. I suggest you report it to the manufacturer,...
  23. Replies
    5
    Views
    1,963

    Re: Check whether cl_mem object is valid

    Querying properties on invalid mem objects should return safely with the CL_INVALID_MEM_OBJECT error. Your program crashing indicates a possible bug in the OpenCL implementation you are using.
  24. Re: Optimisation tips for fetch intensive kernel on ATI

    What kind of workgroup size and topology (2D shape) are you using? When reading images, it's typically much faster to have a square-ish shape (like 16x4) than a linear shape (64x1).
  25. Replies
    2
    Views
    1,327

    Re: Question about protecting ocl code

    Try checking whether __OPENCL_VERSION__ is defined.
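
The __OPENCL_VERSION__ check suggested in the last result can be sketched as follows. This is a minimal illustration, assuming a single source file that is read both by a host C compiler and by an OpenCL C compiler. The names param_u32, sim_params, compiled_for_device and make_params are hypothetical and do not come from the thread. OpenCL C compilers predefine __OPENCL_VERSION__, while a host compiler normally does not, so the guard selects the right branch on each side.

```c
/* Shared between host C and OpenCL C. The guard keeps host-only
 * headers and helpers away from the device compiler. */
#ifdef __OPENCL_VERSION__
/* Device build: OpenCL C provides uint as a built-in 32-bit type. */
typedef uint param_u32;
#define COMPILED_FOR_DEVICE 1
#else
/* Host build: pull in standard headers the device compiler lacks. */
#include <assert.h>
#include <stdint.h>
typedef uint32_t param_u32;   /* assumption: stands in for cl_uint */
#define COMPILED_FOR_DEVICE 0
#endif

/* Hypothetical parameter block visible to both sides. Three 4-byte
 * members keep the layout simple; this does not by itself guarantee
 * identical padding for more complex structs. */
typedef struct {
    param_u32 width;
    param_u32 height;
    float     scale;
} sim_params;

#ifndef __OPENCL_VERSION__
/* Host-only helpers: never seen by the OpenCL C compiler. */
static int compiled_for_device(void)
{
    return COMPILED_FOR_DEVICE;
}

static sim_params make_params(param_u32 width, param_u32 height, float scale)
{
    assert(width > 0 && height > 0);   /* host-side sanity check */
    sim_params p = { width, height, scale };
    return p;
}
#endif
```

On the host, make_params(4, 2, 0.5f) fills the struct that would then be handed to the kernel; the device compiler sees only the type definitions.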
Results 1 to 25 of 36