Type: Posts; User: OferRosenberg

  1. Replies: 1; Views: 461

    Try protecting the call to enqueueWriteImage with...

    Try protecting the call to enqueueWriteImage with a mutex (OpenMP has a lock mechanism - see here: http://stackoverflow.com/questions/2396430/how-to-use-lock-in-openmp ) so that only one thread will...
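A minimal sketch of the locking idea in this reply, assuming an OpenMP parallel loop and using a named `critical` section rather than an explicit lock object. `fake_enqueue_write_image` and `upload_tiles` are hypothetical stand-ins, not part of any OpenCL API; the real call would go where the stub is invoked. Build with OpenMP enabled (for example `-fopenmp`) for the loop to actually run in parallel.

```cpp
// Hypothetical stand-in for the real enqueueWriteImage call. It performs an
// unsynchronized read-modify-write, so it is not safe to call concurrently.
static long g_calls = 0;

void fake_enqueue_write_image(int /*tile*/) {
    g_calls = g_calls + 1;
}

// Calls the stub once per tile from a parallel loop and returns how many
// calls were recorded.
long upload_tiles(int tile_count) {
    g_calls = 0;
    #pragma omp parallel for
    for (int i = 0; i < tile_count; ++i) {
        // Per-thread preparation work can stay outside the critical section.
        #pragma omp critical(image_upload)
        {
            // Only one thread at a time executes this block.
            fake_enqueue_write_image(i);
        }
    }
    return g_calls;
}
```

Keeping the critical section limited to the enqueue call itself leaves the rest of the loop body free to run concurrently.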
  2. Looks like a weird (and wrong) compiler...

    Looks like a weird (and wrong) compiler optimization. The difference between the two kernels is that in the second kernel, the compiler has no visibility on the value, hence it can't perform this...
  3. Replies: 1; Views: 473

    Support for Windows XP was dropped in SDK 2.6...

    Support for Windows XP was dropped in SDK 2.6 and in Catalyst 12.2, so you need to make sure that you have the right versions of both the SDK and the driver.
    AMD did a complete redesign of their...
  4. Replies: 4; Views: 663

    On PCIe 3.0, using pinned memory, it is possible...

    On PCIe 3.0, using pinned memory, it is possible to reach a transfer rate of about 12 GB/s to the GPU (NVIDIA). Regular (non-pinned) memory reduces it to around 5 GB/s.
    Can you provide more details...
  5. You can't use a single context, as a context is...

    You can't use a single context, as a context is created only on devices which belong to the same platform - and you have two platforms. So you'll have to create two contexts, one on AMD platform...
  6. NVIDIA people really like to redefine...

    NVIDIA people really like to redefine terminology and create confusion / marketing buzz ...
    Here's my explanation, hope it helps.
    GTX260 is based on GT200. AnandTech also has a good article on it...
  7. Replies: 2; Views: 552

    On GPUs, the thread allocation is not...

    On GPUs, the thread allocation is not deterministic, and very much depends on runtime scheduling. Even the first workgroup location is unknown - if the GPU scheduler is advanced (such as NV & AMD),...
  8. It is very unlikely that HD6670 will be able to...

    It is very unlikely that HD6670 will be able to support OpenCL 2.0. The HD6670 is based on the Northern Islands architecture, and as such is missing a lot of the HW capabilities to support the minimal...
  9. Replies: 2; Views: 463

    This sounds like a very interesting library /...

    This sounds like a very interesting library / pattern to implement.
    As Dithermaster said, you need to use a buffer, especially if you're planning it to be large. There are a few explanations and...
  10. Currently, there is no OpenCL 2.0 implementation...

    Currently, there is no OpenCL 2.0 implementation available from any vendor - the highest available is OpenCL 1.2 (Intel, AMD, Apple, Imagination, ...)
    Some vendors might have internal versions which are under...
  11. Replies: 1; Views: 558

    Hi Zvika, GeForce 9400 GT is compute...

    Hi Zvika,

    GeForce 9400 GT is compute capability 1.0 (see here: https://developer.nvidia.com/cuda-gpus)

    Look at the CUDA Programming Guide, Appendix G.3, for an explanation of Compute Capability 1.x...
  12. Hi Sajjadul, As far as I understand, the...

    Hi Sajjadul,

    As far as I understand, the comment refers to the difference in memory allocation concepts between OpenCL and CUDA.

    In CUDA, the cudaMalloc API call returns a pointer. This...
  13. Replies: 3; Views: 1,285

    You didn't mention which implementation you're...

    You didn't mention which implementation you're using (AMD, Intel or NVIDIA).
    Try using CL_MEM_USE_HOST_PTR with a buffer allocated by the application - and have this buffer pinned/locked before the map...
  14. Hi, A few things to check: 1. Check the...

    Hi,

    A few things to check:
    1. Check the alignment of the host-allocated buffer. Appendix C.3 provides the alignment rules.
    2. Note that when using CL_MEM_USE_HOST_PTR, implementations may cache...
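The alignment check in item 1 can be done on the host before the buffer is handed to the runtime. The sketch below is illustrative only: `is_aligned` and `alloc_aligned` are hypothetical helper names, and the 16-byte figure used in the usage note (the size of a float4) is an assumption, not a value quoted from the spec appendix. It relies on `std::aligned_alloc`, which needs C++17 and is not available on every toolchain.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Returns true when ptr is a multiple of `alignment` bytes.
bool is_aligned(const void* ptr, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Allocates at least `bytes` bytes aligned to `alignment` (a power of two).
// The size is rounded up to a multiple of the alignment, as
// std::aligned_alloc requires. Release the result with std::free.
void* alloc_aligned(std::size_t bytes, std::size_t alignment) {
    std::size_t rounded = (bytes + alignment - 1) / alignment * alignment;
    return std::aligned_alloc(alignment, rounded);
}
```

For example, `alloc_aligned(n * 16, 16)` would give a host buffer suitable for `n` float4 elements under the 16-byte assumption, and `is_aligned(ptr, 16)` can be asserted just before the buffer is created.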
  15. Extending clint's answer a little: The type of...

    Extending clint's answer a little:

    The type of image created is a single color channel per location - only R (if you wish to work with RGBA, you need to modify the format). As such:
    1. The buffer that...
  16. In most examples of N-body that I'm familiar...

    In most examples of N-body that I'm familiar with, the use of vector data types is somewhat reversed compared to your code - each particle is a float4 (or float3), and the kernel code has a "for"...
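A host-side C++ sketch of the layout described in this reply, where each particle is one four-component vector. The reply is truncated, so several details here are assumptions: that the "for" loop runs over all other particles, that the fourth component stores the mass, that the gravitational constant is 1, and the names `Float4` and `accelerations`. On a GPU the outer loop would correspond to one work-item per particle.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One particle per vector: x, y, z are the position, w is the mass (assumed).
struct Float4 { float x, y, z, w; };

// Computes the gravitational acceleration on every particle (G = 1).
// `softening` avoids a singularity when two particles are very close.
std::vector<Float4> accelerations(const std::vector<Float4>& p, float softening) {
    std::vector<Float4> acc(p.size(), Float4{0.0f, 0.0f, 0.0f, 0.0f});
    for (std::size_t i = 0; i < p.size(); ++i) {      // one work-item each on a GPU
        for (std::size_t j = 0; j < p.size(); ++j) {  // inner loop over all particles
            if (j == i) continue;                     // no self-interaction
            float dx = p[j].x - p[i].x;
            float dy = p[j].y - p[i].y;
            float dz = p[j].z - p[i].z;
            float r2 = dx * dx + dy * dy + dz * dz + softening * softening;
            float inv_r = 1.0f / std::sqrt(r2);
            float s = p[j].w * inv_r * inv_r * inv_r;  // m_j / r^3
            acc[i].x += dx * s;
            acc[i].y += dy * s;
            acc[i].z += dz * s;
        }
    }
    return acc;
}
```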
  17. Replies: 1; Views: 1,192

    Maybe it fails because you have two platforms...

    Maybe it fails because you have two platforms installed (Intel and AMD). Your code takes the first platform returned by clGetPlatformIDs, and tries to get a GPU device. If the first platform on the...
  18. The difference is that you don't need to enable...

    The difference is that you don't need to enable the extension via the compiler directive.

    According to the spec, Section 9.1, if a developer wants to use an optional extension in his program, he...
  19. I did a presentation on that three years ago at SIGGRAPH...

    I did a presentation on that three years ago at SIGGRAPH 2010.
    Google for "Ofer Rosenberg SIGGRAPH" (or Bing, or Yahoo; choose your favorite...)
  20. Radeon HD6750 is a VLIW5 architecture. Look at...

    Radeon HD6750 is a VLIW5 architecture. Look at Wikipedia or search the web for it (I tried to add a link to AnandTech, but the forum system blocked me...)

    Basically, a workitem is executed on one SC...
Results 1 to 20 of 20