Search:

Type: Posts; User: andrew.brownsword

Page 1 of 3

  1. Re: Memory Problem when trying to speed up Kernel

    Have you checked all your error and other return values? How big is the buffer you're asking for? Is it possible you've previously leaked memory? Does this happen with another vendor's driver or...
  2. Re: Memory Problem when trying to speed up Kernel

    And I assume the first parameter should be pSrcPos as well?

    It's a bit challenging to figure out what your problem is when you aren't posting the code you're actually running...
  3. Re: Memory Problem when trying to speed up Kernel

    The write is the only place I see where you multiply by 3.
  4. Re: async_work_group_strided_copy (Replies: 4, Views: 1,444)

    Sorry David, but I have to quibble. The async_work_group_strided_copy is not especially useful for an AoS <-> SoA transformation. If it were to be useful for the latter it would take this:


    ...
  5. Re: Need a way to calculate theoretical FLOPS of a device

    Doing this in a meaningful way is very complex and highly subject to the exact nature of your algorithm(s). The best way to evaluate this turns out to be for an application to simply try its...
  6. Re: setKernelArg as size in bytes and NULL for __global poin

    Because global memory is a shared resource that persists beyond the scope of a kernel. It is better to let the application allocate and reuse its allocated buffers as it knows best. Local memory...
  7. Re: "clDevicePointer" function needed! (Replies: 5, Views: 4,496)

    Why not use indices instead of pointers? This way they are independent of buffer location, devices, address spaces, etc. The same index could be applied to multiple buffers. Index size can be...
  8. Re: OpenCL for Real-Time environments (Replies: 3, Views: 1,862)

    I am also interested in problems in this domain. What do you see as needing to change in the spec to enable RT in CL? If no spec changes are required, what do you see as needing to change in...
  9. Re: How do I check OpenCL is OK? Mac/Windows

    On Snow Leopard, OpenCL is always present and you can just start calling it using the default cl_platform. Under Windows your app should either have a hard dependency on the ICD dll (if you can't run...
  10. Re: Image object support on MAC OS with ATI Radeon HD 5750 ?

    Unfortunately Apple is generally not very forthcoming with information about their roadmaps, so it's hard to know when things will be fixed and updated. You are correct about the lack of image...
  11. Re: Images without Image support? (Replies: 5, Views: 1,819)

    You can certainly do this, and it will work although performance will not be as good as if you were using the GPU's texture sampler units. You need to compute each pixel's linear address from the...
  12. Re: Traversing a Tree using the root pointer

    You really don't want to use pointers. Not only are they potentially different sizes between devices (and host), but they are also potentially different sizes between address spaces (global, local,...
  13. Re: calculation of a float value (Replies: 15, Views: 3,179)

    [quote]Currently you have no option. You have to create two contexts in that case. If you had an AMD GPU and an AMD CPU then you could have both in one context.[/quote]

    FWIW, I believe the AMD OpenCL...
  14. Re: Matrix Multiplication (Replies: 18, Views: 5,742)

    You could also try using the CPU device to see how that performs.
  15. Re: OpenCL Image Rotate/Scale/Translate, Affine Transform, .

    The hardware math acceleration comes in the form of SIMD vector operations which are exposed as the vector types in OpenCL C (e.g. float4) and many built-in math functions and operators on those. ...
  16. Re: Init Buffer Problem (Replies: 3, Views: 1,591)

    How is globalWorkSizeInit declared and initialized?

    The 1D case should work fine and is a little simpler.
  17. Re: Undefined reference errors with image2d functions

    Given those error messages I'm inclined to think that the problem is in your host program, not in your kernel. They look more like linker errors than compiler errors. Perhaps you are trying to use...
  18. Re: clGetKernelWorkGroupInfo does not return correct local m

    Sounds like a bug in the implementation, I would report it to your vendor.

    Before checking the spec, I didn't realize that CL_KERNEL_LOCAL_MEM_SIZE was supposed to include the dynamically set arg...
  19. Re: Multi-GPU System, multiple contexts or command queues?

    I too would expect that a single context with multiple devices would be preferable. In addition to being able to synchronize between them, they could also then share buffers. My wild-assed guess...
  20. Re: Communication between OpenCL and CUDA

    I don't know exactly what your requirements are, but I would suggest an all OpenCL application that compiles from source (and perhaps caches compiled binaries), or if you can't ship source then...
  21. Re: Communication between OpenCL and CUDA

    The ICD is supposed to enable applications to use all vendor implementations. Your application links against the ICD DLL and uses the clGetPlatformIDs to find all the installed implementations, and...
  22. Re: Globally visible buffers or direct memory access?

    I'm not sure exactly what you are doing here, but are you speculating or have you tried it and measured the performance? The performance cost of adding kernel arguments is going to be insignificant...
  23. Re: const variable / memory latency (Replies: 2, Views: 1,131)

    On some devices, for some sizes of data using __constant int* in_array might be an improvement.
  24. Re: OpenCL Intel x86_64 code generation

    You should try and look at the generated assembly code for each version you have -- it is usually quite obvious when SSE is being used (look for the XMM registers). OpenMP in simple cases, depending...
  25. Re: How to use C++ templates in kernel?

    It will depend on how templates are being used. You may be able to replace some simple uses with preprocessor #defines. Long ago there used to be a tool called CFront that was the C++ compiler and...
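
For readers following the advice in replies 7 and 12 above (use indices instead of pointers), here is a minimal sketch of what that could look like for a binary tree. It is not taken from the original threads: the `Node` layout, the `NO_CHILD` sentinel, the `MAX_DEPTH` bound and the `tree_sum` function are all hypothetical names chosen for illustration. The idea is that every node lives in one flat array and refers to its children by array index, so the same data means the same thing wherever the buffer is located. The traversal uses an explicit stack rather than recursion, on the assumption that similar code might later be moved into a kernel.

```c
#include <assert.h>
#include <stdint.h>

#define NO_CHILD  (-1)  /* hypothetical sentinel standing in for a NULL pointer */
#define MAX_DEPTH 64    /* assumed upper bound on the explicit traversal stack */

/* Hypothetical node layout: children are indices into the same array,
 * so the struct has the same size and meaning on every device. */
typedef struct {
    int32_t left;   /* index of left child, or NO_CHILD */
    int32_t right;  /* index of right child, or NO_CHILD */
    int32_t value;  /* payload */
} Node;

/* Sum the values of all nodes reachable from `root`, without recursion. */
int64_t tree_sum(const Node *nodes, int32_t node_count, int32_t root)
{
    int32_t stack[MAX_DEPTH];
    int top = 0;
    int64_t sum = 0;

    if (root != NO_CHILD) {
        stack[top++] = root;
    }
    while (top > 0) {
        int32_t i = stack[--top];
        assert(i >= 0 && i < node_count);   /* an index is easy to range-check */
        sum += nodes[i].value;
        if (nodes[i].right != NO_CHILD) {
            assert(top < MAX_DEPTH);
            stack[top++] = nodes[i].right;
        }
        if (nodes[i].left != NO_CHILD) {
            assert(top < MAX_DEPTH);
            stack[top++] = nodes[i].left;
        }
    }
    return sum;
}
```

Because the links are plain 32-bit integers, the array can be copied into a buffer as-is, and the same index could be applied to other parallel arrays holding additional per-node data.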
Results 1 to 25 of 62