Search:

Type: Posts; User: homemade-jam

  1. Seem to need to be device accessible...

    Seem to need to be device accessible (http://www.techpowerup.com/forums/showthread.php?t=156426).

    Have you tried NVIDIA?
  2. This might help you with some info as to how...

    This might help you with some info as to how threadsafe the OpenCL APIs are: http://www.khronos.org/message_boards/showthread.php/6788-Multiple-host-threads-with-single-command-queue-and-device .
  3. It will also depend on your host CPU. Some CPUs...

    It will also depend on your host CPU. Some CPUs (used?) to be able to do data transfer without going through the CPU. Pinned memory might let you bypass the GPU but there will have to be something...
  4. Not sure exactly why you're doing the mmap...

    Not sure exactly why you're doing the mmap yourself. Pinning the memory won't necessarily gain the performance you require. To get it working, just let the runtime allocate the memory for you - AMD...
  5. Just pass the pointer in as an arg

    Just pass the pointer in as an arg
  6. Re: clinfo on Ubuntu 11.04 only recognizing 1 of 3 ati 6950

    I've had problems in the past due to the fact that the AMD drivers require an X session to be running on the chosen device. It essentially meant we couldn't use the GPU in headless mode. I imagine...
  7. Re: CL_DEVICE_TYPE_CPU (Replies: 10, Views: 3,939)

    Each type of device requires the relevant OpenCL SDK to be installed. For example, Intel CPUs require the Intel OpenCL SDK: http://software.intel.com/en-us/articles/vcsource-tools-opencl-sdk/....
  8. Re: Low level behaviour of clEnqueueWriteBuffer() - spec vs

    What spec and driver version do you have?
  9. Re: Regarding creating a very big array within an OpenCL Ker

    Are you using clCreateBuffer()? As far as I know, you can't use 2D arrays: you have to flatten them into a 1D array, since you can't have pointers to pointers in the kernels.
    ...
  10. Re: Intel CPU performs better with AMDAPP than Intel OpenCL

    Perhaps you could post your kernel to see how it compares across other implementations? I've seen plenty of disparity between the two SDKs but usually in the other direction...for example latency for...
  11. Re: A few technical questions about work-items and wavefront

    I think ideally, say you have x compute units (i.e. streaming multiprocessors), then yes, you should have N/x; however, it isn't that simple. For example, they should form warps or wavefronts as mentioned...
  12. Re: Summing up all elements of a buffer (Replies: 5, Views: 1,641)

    If they're global ints, you can use atomics, right?

    http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/atomic_add.html
  13. Re: How to synchronize iterations? (Replies: 20, Views: 3,965)

    Just speculation, but how are you organising your work?

    If you have different kernels, are they being given their own workgroups? You want to reduce thread divergence as much as possible and make...
  14. Re: How to synchronize iterations? (Replies: 20, Views: 3,965)

    Not especially. Just minimise the amount of data transfer you do at each iteration and do as few kernel calls as possible :wink:
  15. Re: How to synchronize iterations? (Replies: 20, Views: 3,965)

    You should read the NVIDIA OpenCL programming guide and the OpenCL best practices from here http://developer.nvidia.com/nvidia-gpu-computing-documentation. There are many ways you can organise your...
  16. Re: How to synchronize iterations? (Replies: 20, Views: 3,965)

    Yes, it stays on the device and is persistent between kernel calls. You can do a clEnqueueReadBuffer when you want to read it back to the host.
  17. Re: OpenCL On Multiple Platforms (Replies: 2, Views: 1,716)

    Brilliant, thanks notzed. Thought it must have been something like that, but it didn't make sense to have multiple libOpenCL on my system (e.g. one in the NVIDIA dir).

    Cheers.
  18. OpenCL On Multiple Platforms (Replies: 2, Views: 1,716)

    I'm slightly confused as to how OpenCL works under the hood.
    When I do `locate libOpenCL.so` I get multiple files; presumably one for each SDK I have installed (NVIDIA, Intel and AMD).

    When I link...
  19. Mapping Data: remap to change? (Replies: 5, Views: 1,513)

    If I have a buffer that I then map to an array followed by an unmap which I then use in a kernel...if I want to change the values in the buffer do I have to remap it and copy them all back in? Do I...
  20. CL_INVALID_KERNEL_ARGS from clEnqueueReadBuffer

    I recently converted my code to be dependent on event queues rather than doing a clFlush() every time. I therefore modified the relevant arguments. The following occurs with my NVIDIA SDK on my GTX...
  21. Re: Why can kernels take __local pointer arguments?

    Great thanks. I was clarifying since the docs I have read are very clear on what it can do but not on what it can't do, so just wanted to clarify.

    Thanks.
  22. Re: Why can kernels take __local pointer arguments?

    OK, thanks. One further question. Is local memory therefore intended for memory used locally within the workitem, or can it be accessed across the workgroup? i.e. You pass in the pointer as an arg...
  23. Re: Why can kernels take __local pointer arguments?

    So, just to confirm: with local memory, you allocate with NULL on the host side and then set the kernel arg.
    This just allocates local memory, right? You can't fill it or edit it from the host side?...
  24. Re: Memory Allocation and time collection (Replies: 5, Views: 1,166)

    OK, so I'm slightly confused now. Private memory is only accessible within a workitem; i.e. you can't reach it from other workitems? Therefore, how does OpenCL know that it can be put into private...
  25. Re: Memory Allocation and time collection (Replies: 5, Views: 1,166)

    I am interested in a similar thing also. I achieve significant speedup with the use of constant memory for a start. I am now looking at fissioning my task into data chunks of the size of the shared...
Results 1 to 25 of 30