Search:

Type: Posts; User: kunze

Page 1 of 2

  1. Replies: 5, Views: 360

    I don't know about the AMD issue, but as far as...

    I don't know about the AMD issue, but as far as the Intel GPU goes, I have two thoughts:

    - On some platforms, OpenCL on the integrated GPU won't work if a discrete GPU is hosting the display. You...
  2. Replies: 5, Views: 354

    A few more tips, in addition to cartographer's...

    A few more tips, in addition to cartographer's good advice:

    - I see you are using long and double datatypes in your arithmetic. These calculations will be much faster if you can get away with...
  3. I assume the performance figures you gave are not...

    I assume the performance figures you gave are not per-run, correct? They must be aggregated for a number of iterations, yes? Assuming that's true, is it necessary to create new buffers every time...
  4. Replies: 1, Views: 421

    Piyush, A couple of comments: - It looks...

    Piyush,

    A couple of comments:

    - It looks like you are missing an ampersand in your clEnqueueNDRangeKernel call. The last parameter to that function is a pointer to a cl_event, not the cl_event...
  5. Another source of this kind of behavior could be...

    Another source of this kind of behavior could be dynamic frequency adjustment in the GPU. As it heats up, it could be reducing the GPU frequency. As it cools, it could raise the frequency again. ...
  6. One thing I notice is this: ...

    One thing I notice is this:

    ret=clSetKernelArg(kernel[i], 14, sizeof(float), &buffer_freq[i]);

    This should pass sizeof(cl_mem), not sizeof(float). Are the error codes being checked on this and...
  7. I assume from your code that each of those tasks...

    I assume from your code that each of those tasks is writing the same buffer, and that you're trying to get different results from each clEnqueueTask. If that's the case, then you need to interleave...
  8. Replies: 10, Views: 1,046

    Also, I want to point out that on some systems,...

    Also, I want to point out that on some systems, especially those with integrated GPUs, CL_MEM_USE_HOST_PTR can be the faster alternative. In many cases, the implementation can avoid ever copying...
  9. Can you show us your clSetKernelArg call? Does...

    Can you show us your clSetKernelArg call? Does it look like this:

    cl_mem buf = clCreateBuffer....
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);

    Or something different?
  10. Replies: 6, Views: 771

    I have seen some OpenCL implementations return...

    I have seen some OpenCL implementations return CL_INVALID_COMMAND_QUEUE after something catastrophic happens inside a kernel. Like perhaps an out-of-bounds access. You might try and comment out...
  11. The initial data transfer could be part of buffer...

    The initial data transfer could be part of buffer creation, depending on how you created your buffer. If you created it using the CL_MEM_USE_HOST_PTR flag, then on some architectures, you may not...
  12. Another thing to consider when using the...

    Another thing to consider when using the CL_MEM_USE_HOST_PTR flag is that some hardware may require the pointer and/or the size of the buffer to meet certain alignment requirements....
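The alignment point above can be checked on the host before creating the buffer. This is only a sketch: the helper names are hypothetical, and the alignment and size granule (4096 and 64 bytes in the usage note) are placeholder assumptions. The real values are vendor-specific and should come from the implementation's documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if ptr is a multiple of `alignment` bytes
 * and size is a multiple of `granule` bytes, 0 otherwise. */
static int host_ptr_is_suitable(const void *ptr, size_t size,
                                size_t alignment, size_t granule)
{
    assert(alignment > 0 && granule > 0);
    return ((uintptr_t)ptr % alignment == 0) && (size % granule == 0);
}

/* Hypothetical helper: round size up to the next multiple of `granule`. */
static size_t round_up_size(size_t size, size_t granule)
{
    assert(granule > 0);
    size_t remainder = size % granule;
    return remainder == 0 ? size : size + (granule - remainder);
}
```

For example, with an assumed 4096-byte alignment and 64-byte granule, a 100-byte allocation would be padded to round_up_size(100, 64) bytes and allocated with an aligned allocator before being passed to clCreateBuffer with CL_MEM_USE_HOST_PTR.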
  13. Yes, I don't see a way to do this that is...

    Yes, I don't see a way to do this that is guaranteed by the spec. cl_platform.h defines CL_API_SUFFIX__VERSION_1_2 in the 1.2 version of the standard headers. That might be a more readable way to...
  14. This test is probably best done at run-time...

    This test is probably best done at run-time instead of at compile time. If you tried to do the test at compile time, someone could take your binary to another machine and get different results. Or...
  15. Replies: 2, Views: 2,200

    I don't know exactly the issue you are running...

    I don't know exactly the issue you are running into, but I have some questions that might lead us there:

    - Do you know that OpenCL is officially supported on your device? Do you expect this to...
  16. David, There's a couple of problems with this...

    David,

    There's a couple of problems with this code that I see:

    - If temp is a local memory region, you probably never want to index it with get_global_id(0) or i in your code. Every work-group...
  17. Ramanarayan, It's hard to tell exactly what is...

    Ramanarayan,

    It's hard to tell exactly what is going on here, but I have two things to consider:

    - First, it could be that your IDE (Visual Studio, Eclipse, etc.) doesn't recognize OpenCL C...
  18. In most cases, your intuition is correct: ...

    In most cases, your intuition is correct: Calling the two kernels as functions from within a unified kernel is usually preferable. There are a few things to consider when doing this:

    - Sizes of...
  19. There's no single correct answer to your...

    There's no single correct answer to your question. But I can give you some things to consider:

    - OpenCL kernels are typically most effective when you launch hundreds of work items or more,...
  20. Replies: 1, Views: 868

    Steve, The Device class derives from...

    Steve,

    The Device class derives from detail::Wrapper, which has a destructor that calls the appropriate release function.
  21. This is problematic in OpenCL C as well as regular...

    This is problematic in OpenCL C as well as regular C. The data contained in result_reg[] should be considered invalid when the function returns. So there's no legitimate purpose for returning a...
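One common way around the problem described above, in plain C and in OpenCL C alike, is to have the caller own the storage and pass it in. The following is a minimal sketch; compute_result and its contents are hypothetical stand-ins, not the original poster's function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: fill a caller-owned buffer instead of returning
 * the address of a local array, which is invalid once the function returns. */
static void compute_result(float *out, size_t n)
{
    assert(out != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[i] = (float)i * 2.0f;  /* placeholder computation */
    }
}
```

The caller declares the array (for example float result_reg[4];) and passes it in, so the data stays valid after the call returns.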
  22. Replies: 1, Views: 953

    stephahn, Why do you need to subdivide the...

    stephahn,

    Why do you need to subdivide the actual data? Can you get away with just passing the entire image to each kernel, even though each kernel launch runs on a part of the image? Then, if...
  23. That error typically occurs when the global work...

    That error typically occurs when the global work size is not evenly divisible by the local work size. Your global work-size looks like it is 20, which is pretty small. But if you are querying...
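A usual workaround for the divisibility requirement mentioned above is to round the global work size up to the next multiple of the local work size on the host. This is a sketch only: round_up_global is a hypothetical helper name, and the local size of 16 in the usage note is an assumed value, not one taken from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round global_size up to the next multiple of
 * local_size so the two divide evenly. */
static size_t round_up_global(size_t global_size, size_t local_size)
{
    assert(local_size > 0);
    size_t remainder = global_size % local_size;
    return remainder == 0 ? global_size
                          : global_size + (local_size - remainder);
}
```

With a global size of 20 and an assumed local size of 16, round_up_global(20, 16) gives 32. The kernel then needs a guard such as if (get_global_id(0) >= n) return; so the padding work-items do not touch memory past the real data.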
  24. Replies: 1, Views: 950

    Re: Private variables compiler optimization?

    Each OpenCL implementor implements their own compiler, including optimizations. So if you are observing missed opportunities for optimizations on a particular implementation, you should contact the...
  25. Replies: 1, Views: 1,353

    Re: what is CL_INVALID_KERNEL ?

    Did you check for an error return from clCreateKernel? Perhaps the kernel was not successfully created.
Results 1 to 25 of 42