
Type: Posts; User: kunze

Page 1 of 3 1 2 3


  1. There's something I notice in your source code...

    There's something I notice in your source code that may produce poor performance. The idea is this: GPUs want to have address patterns that are "coalesced", which means that neighboring work items...
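As a rough illustration of why neighboring work-items should touch neighboring addresses, here is a minimal host-side C sketch (not OpenCL code). It assumes a simplified model in which memory is fetched in fixed-size segments; the function name, the 64-byte segment size, and the 16 work-items used below are all illustrative assumptions, not figures from the post or from any particular GPU.

```c
#include <stddef.h>

/* Hypothetical model, not an OpenCL API: count how many distinct memory
 * segments of `segment_bytes` are touched when `work_items` neighboring
 * work-items each read one `elem_bytes` element, with work-item i reading
 * element i * stride. Assumes elements are aligned and never straddle a
 * segment boundary. */
size_t segments_touched(size_t work_items, size_t stride,
                        size_t elem_bytes, size_t segment_bytes)
{
    size_t count = 0;
    size_t last_segment = (size_t)-1;   /* sentinel: no segment seen yet */

    for (size_t i = 0; i < work_items; ++i) {
        size_t byte_addr = i * stride * elem_bytes;
        size_t segment = byte_addr / segment_bytes;

        /* Addresses never decrease as i grows, so comparing against the
         * previous segment index is enough to detect a new segment. */
        if (segment != last_segment) {
            ++count;
            last_segment = segment;
        }
    }
    return count;
}
```

Under this model, 16 work-items reading consecutive 4-byte floats (stride 1) touch a single 64-byte segment, while the same 16 work-items reading with a stride of 16 elements touch 16 separate segments.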
  2. This will be very implementation dependent, so it...

    This will be very implementation dependent, so it could be anything. But I know that some architectures will reduce the maximum allowed work group size based on the number of registers used in the...
  3. Markx, A few comments here: 1: The "mapped"...
    Replies: 1, Views: 225

    Markx,

    A few comments here:

    1: The "mapped" parameter to your first clCreateBuffer is probably being ignored. The host_ptr parameter is only used if the flags contain CL_MEM_USE_HOST_PTR or...
  4. The basic idea is that you want to make...
    Replies: 3, Views: 541

    The basic idea is that you want to make neighboring work items access neighboring data, if it's possible. Here's a good StackOverflow link that explains this:
    ...
  5. Usually, the decision to use private or global is...
    Replies: 3, Views: 541

    Usually, the decision to use private or global is driven by how the memory is being used. Data written into private memory is only visible within a single work-item, and can only be written by the...
  6. I think you're on the right path here. Once you...

    I think you're on the right path here. Once you allocate the space for all of the nodes, using clSVMAlloc, then you need to manage the nodes in that block. You could do that a number of ways. If...
  7. Matt, Lots of good questions here. * On...
    Replies: 3, Views: 538

    Matt,

    Lots of good questions here.

    * On the hardware I have, 2D image arrays perform as well as or better than 3D images. If you needed linear interpolation or clamping, then you would be...
  8. Sometimes, I see CL_OUT_OF_RESOURCES returned from...

    Sometimes, I see CL_OUT_OF_RESOURCES returned from blocking calls when a previously enqueued kernel has done an out-of-bounds access. So perhaps your kernel code has a bug? I know it's a strange...
  9. While it is tempting to use OpenCL 1.x zero-copy...
    Replies: 2, Views: 307

    While it is tempting to use OpenCL 1.x zero-copy buffers (via the CL_MEM_USE_HOST_PTR or CL_MEM_ALLOC_HOST_PTR flags) to share data between the host and the kernel while the kernel is running, OpenCL...
  10. I think you should look at the error codes coming...
    Replies: 2, Views: 1,109

    I think you should look at the error codes coming out of your OpenCL library calls and make sure none of them are failing. If they are failing, perhaps the particular error code will give you some...
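One way to make that checking routine is a small helper that names the failing call and the error code. The sketch below is a minimal assumption-laden example: `cl_error_name` and `cl_failed` are hypothetical helpers, not part of the OpenCL API, and the numeric values are copied from a handful of the error codes in CL/cl.h only so the example compiles without the OpenCL headers.

```c
#include <stdio.h>

/* Values mirror a few of the error codes defined in CL/cl.h; they are
 * repeated here only so the sketch builds without the OpenCL headers.
 * In real host code, use the CL_* constants themselves. */
const char *cl_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -30: return "CL_INVALID_VALUE";
    case -51: return "CL_INVALID_ARG_SIZE";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    default:  return "unlisted OpenCL error";
    }
}

/* Hypothetical helper: print a message for a failing call and return
 * nonzero on failure, so callers can write
 *     if (cl_failed(err, "clSetKernelArg")) return 1;            */
int cl_failed(int err, const char *call)
{
    if (err == 0)
        return 0;
    fprintf(stderr, "%s failed: %s (%d)\n", call, cl_error_name(err), err);
    return 1;
}
```

In host code, the return value of each OpenCL call (or the `errcode_ret` output of the `clCreate*` functions) would be passed straight to `cl_failed`, so the first failing call is reported by name.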
  11. This comes from the C++ header using deprecated...
    Replies: 1, Views: 2,724

    This comes from the C++ header using deprecated APIs from previous versions of the OpenCL API. If you #define CL_USE_DEPRECATED_OPENCL_2_0_APIS before you include cl.hpp, the compiler should be...
  12. I noticed something in your code. When the code...
    Replies: 1, Views: 1,538

    I noticed something in your code. When the code writes the MAX_ROW_IDX_BUFF and MAX_ROW_VAL_BUFF buffers, it uses total_workgroups*sizeof(PRECISION) as the amount of data to write. The data comes...
  13. If you call clEnqueueMapBuffer (with...

    If you call clEnqueueMapBuffer (with blocking==TRUE), then immediately call clEnqueueUnmapBuffer and clReleaseMemObject, that should leave you with valid data in system memory. Does this sequence...
  14. Again, the answer here would be architecture...

    Again, the answer here would be architecture dependent. But for the architecture I use, one memory access with four lanes trying to access the same bank is no worse than four memory accesses with no...
  15. CL_DEVICE_MAX_WORK_ITEM_SIZES refers to the...
    Replies: 1, Views: 546

    CL_DEVICE_MAX_WORK_ITEM_SIZES refers to the number of work-items in work-groups, not the number of work-items of the complete NDRange. The NDRange size limit is the max value of size_t on the device.
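The distinction can be sketched as a host-side check that looks only at the local (work-group) size. `local_size_ok` is a hypothetical helper, not an OpenCL call; `max_item_sizes` stands in for the three values a device reports for CL_DEVICE_MAX_WORK_ITEM_SIZES and `max_group_size` for CL_DEVICE_MAX_WORK_GROUP_SIZE. The limits in the usage note are made-up example numbers.

```c
#include <stddef.h>

/* Hypothetical helper. Both limits constrain the local (work-group) size
 * only; the global NDRange size is deliberately not checked against them. */
int local_size_ok(const size_t local[3], const size_t max_item_sizes[3],
                  size_t max_group_size)
{
    size_t total = 1;

    for (int d = 0; d < 3; ++d) {
        /* Per-dimension limit from CL_DEVICE_MAX_WORK_ITEM_SIZES. */
        if (local[d] == 0 || local[d] > max_item_sizes[d])
            return 0;
        /* Total work-items per group must not exceed the group limit;
         * dividing instead of multiplying avoids overflow. */
        if (local[d] > max_group_size / total)
            return 0;
        total *= local[d];
    }
    return 1;
}
```

With example limits of {1024, 1024, 64} and a maximum work-group size of 256, a local size of {16, 16, 1} passes, while {32, 32, 1} fails on the total and {1, 1, 128} fails on the third dimension. A global size in the millions is unaffected by either limit.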
  16. A compiler could theoretically tell that case 1...

    A compiler could theoretically tell that case 1 and case 2 are essentially the same. I have seen compilers do this in similar cases, but I can't speak for all compilers. As such, I typically prefer...
  17. The only trick will be dealing with deprecated...

    The only trick will be dealing with deprecated APIs. You can see those by looking at the end of the OpenCL 2.0 header file:

    http://www.khronos.org/registry/cl/api/2.0/cl.h

    You'll see that...
  18. OpenCV implements something like this and has...

    OpenCV implements something like this and has similar portability requirements. It may not do exactly what you want, but perhaps it's a start. Check this out:
    ...
  19. I don't know about the AMD issue, but as far as...
    Replies: 5, Views: 1,234

    I don't know about the AMD issue, but as far as the Intel GPU goes, I have two thoughts:

    - On some platforms, OpenCL on the integrated GPU won't work if a discrete GPU is hosting the display. You...
  20. A few more tips, in addition to cartographer's...
    Replies: 5, Views: 1,058

    A few more tips, in addition to cartographer's good advice:

    - I see you are using long and double datatypes in your arithmetic. These calculations will be much faster if you can get away with...
  21. I assume the performance figures you gave are not...

    I assume the performance figures you gave are not per-run, correct? They must be aggregated for a number of iterations, yes? Assuming that's true, is it necessary to create new buffers every time...
  22. Piyush, A couple of comments: - It looks...
    Replies: 1, Views: 830

    Piyush,

    A couple of comments:

    - It looks like you are missing an ampersand in your clEnqueueNDRangeKernel call. The last parameter to that function is a pointer to a cl_event, not the cl_event...
  23. Another source of this kind of behavior could be...

    Another source of this kind of behavior could be dynamic frequency adjustment in the GPU. As it heats up, it could be reducing the GPU frequency. As it cools, it could raise the frequency again. ...
  24. One thing I notice is this: ...

    One thing I notice is this:

    ret=clSetKernelArg(kernel[i], 14, sizeof(float), &buffer_freq[i]);

    This should pass sizeof(cl_mem), not sizeof(float). Are the error codes being checked on this and...
  25. I assume from your code that each of those tasks...

    I assume from your code that each of those tasks is writing the same buffer, and that you're trying to get different results from each clEnqueueTask. If that's the case, then you need to interleave...
Results 1 to 25 of 60