Type: Posts; User: dweo

  1. Replies: 6, Views: 3,640

    Re: Printf within a Kernel

    I use printf on my CPU when debugging my kernels. I also find assert(...) to be a useful function.

    Here are some useful macros that you can use in your kernel...


    #define DEBUG

    #ifdef DEBUG...
  2. Thread: Reduce, Map

    by dweo
    Replies: 0, Views: 1,697

    Reduce, Map

    I have not seen this mentioned elsewhere.

    Having a function or mechanism with similar semantics to async_work_group_copy(...) but for reduce (very useful) and map (somewhat useful) would be quite...
  3. Re: Running kernel on GPU causing CPU to leak memory

    Thanks to everyone who has looked into this.

    I reduced the kernel to take only one __constant argument, and the memory leak persists. This thread suggests that there is a bug with the drivers:
    ...
  4. Running kernel on GPU causing CPU to leak memory

    Hi. I have a kernel that works fine when run on the CPU.

    However, when I run the kernel on a GPU, my system profiler indicates that about 1 MB/s of system memory (RAM) is leaking. This is still the...
  5. Replies: 7, Views: 4,150

    Re: Atomic operations in OpenCL 1.0

    Thanks for the reply! Here are my available extensions:

    cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_APPLE_gl_sharing...
  6. Replies: 7, Views: 4,150

    Re: Atomic operations in OpenCL 1.0

    I have the same problem with atomics, but on a GT 120 running OS X 10.6 Snow Leopard. Any fixes?
  7. Re: copy from global memory to local memory..problem

    Try changing
    event_t events = async_work_group_copy(&clientPixelData, sourcepos, cWidth,0);

    to
    event_t events = async_work_group_copy(clientPixelData, sourcepos, cWidth,0);
    or
    event_t...
  8. Replies: 3, Views: 2,684

    Re: Code 10x faster on CPU device than GPU device

    To clarify: without the code, it takes less than a second. With the

    ca2[ar] = ...

    part it takes more like 10 seconds (on GPU only). It is just a local memory assignment, so the time increase is...
  9. Replies: 3, Views: 2,684

    Code 10x faster on CPU device than GPU device

    Hi all,

    I'm using a 1D cellular automaton (CA) to generate random unsigned ints in OpenCL. The CA is an array of uchar, and we evolve it in iterations such that ca[i] depends on...
  10. Replies: 3, Views: 2,108

    Re: atom_add with float

    I see that now. I assumed T was any native type (float/int/long/char..). Thanks for clearing this up!
  11. Replies: 3, Views: 2,108

    atom_add with float

    According to the OpenCL spec, there should be a function:
    T atom_add (Q T *p, T val)

    The following line works properly
    __local int z=1;
    int w = 5;
    atom_add(&z, w);

    But switching...
Results 1 to 11 of 11