Search:

Type: Posts; User: openclnewb

  1. Replies
    4
    Views
    1,920

    Re: using local memory

    Yes, I see how I goofed ...

    I was setting the kernel argument to ((16*1024)/256) because there are 256 threads per work group, and I thought the argument should be the amount of local memory _per...
  2. Replies
    4
    Views
    1,920

    using local memory

    I've got a simple test kernel that writes into local memory, and then copies the data to an output buffer in global memory:



    __kernel void foo( __global float *debug_data, __local float...
  3. Replies
    2
    Views
    1,967

    Re: using the prefetch command

    OK, good to know. It does seem that the async copy does the copy in a coalesced manner -- that's what the profiler says -- but since I have a wait right after it, I've been using it as if it was a...
  4. Replies
    2
    Views
    1,967

    using the prefetch command

    Greetings,

    I've been trying to use prefetch to improve my performance, but haven't seen any impact one way or another. I wonder if I'm using the command the correct way. I haven't been able to...
  5. Replies
    3
    Views
    2,823

    Re: global memory coalescing question

    Hmmm, if I change the code to:



    float test;
    for ( int i = 0 ; i < 1024; i++ )
    {
        barrier( CLK_GLOBAL_MEM_FENCE);
        float f = *(input_data + get_local_id(0)); // indexing off tid instead of...
  6. Replies
    3
    Views
    2,823

    Re: global memory coalescing question

    Whooops: I forgot to add, I'm running this code on a compute capability 1.1 board.
  7. Replies
    3
    Views
    2,823

    global memory coalescing question

    Hi,

    If I run this test kernel, where input_data and output_data are pointers to global floats:



    float test;
    for ( int i = 0 ; i < 1024; i++ )
    {
        barrier( CLK_GLOBAL_MEM_FENCE);
  8. register to global memory performance mystery

    Hi,

    In my code, I have a private array:



    __private float foo[10][2];


    To make sure it stays in the registers and out of high-latency local memory, I use array offsets that are computed...
  9. Replies
    2
    Views
    1,745

    Re: images and memory access optimization

    DBS2,

    I appreciate the instructive reply. Your explanation makes perfect sense.
  10. Replies
    2
    Views
    1,745

    images and memory access optimization

    Hi,

    I have a newbie conceptual question about memory optimization when working with image data types.

    I've read through a number of tutorials about the importance of coalescing global memory...
Results 1 to 10 of 14