Type: Posts; User: cartographer

  1. Replies
    5
    Views
    558

    Yes, sumbb would require making a separate kernel...

    Yes, sumbb would require making a separate kernel and storing the results. It may or may not be worth it to do so. On the other hand, sumaa can be moved outside the loop, since it does not depend on any...
  2. Replies
    5
    Views
    558

    Whoops, double post but I also just noticed, your...

    Whoops, double post but I also just noticed, your inner loop for calculating your sums:


    for (i = 0; i < rSize; i++)
    {
        for (j = 0; j < rSize; j++)
        {
            // sums
        }
    }
  3. Replies
    5
    Views
    558

    Looking at your code, two things jump out right...

    Looking at your code, two things jump out right away:


    sumaa and suma do not need to be in the loop at all; they only need to be calculated once per range block/work item (rCount times total),...
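To illustrate the hoisting idea, here is a minimal sketch in plain C rather than your kernel. It assumes suma and sumaa are the sum and the sum of squares over the range block, and that the loop runs over candidate domain blocks. The struct, the function names, and the squared-difference error measure are all hypothetical and only there to make the example complete.

```c
#include <stddef.h>

/* Assumed meaning of the poster's variables: sums over one range block. */
typedef struct {
    double suma;   /* sum of a[i]        */
    double sumaa;  /* sum of a[i] * a[i] */
} RangeSums;

/* Computed once per range block, outside the loop over domain blocks. */
RangeSums range_sums(const double *a, size_t n)
{
    RangeSums s = {0.0, 0.0};
    for (size_t i = 0; i < n; i++) {
        s.suma += a[i];
        s.sumaa += a[i] * a[i];
    }
    return s;
}

/* Hypothetical search: returns the index of the domain block with the
 * smallest squared difference, expanded as sumaa - 2*sumab + sumbb. */
size_t best_match(const double *a, const double *domains,
                  size_t dCount, size_t n)
{
    const RangeSums rs = range_sums(a, n);  /* hoisted: does not depend on d */
    size_t best = 0;
    double best_err = 0.0;

    for (size_t d = 0; d < dCount; d++) {
        const double *b = domains + d * n;
        double sumab = 0.0, sumbb = 0.0;    /* these do depend on d */
        for (size_t i = 0; i < n; i++) {
            sumab += a[i] * b[i];
            sumbb += b[i] * b[i];
        }
        double err = rs.sumaa - 2.0 * sumab + sumbb;
        if (d == 0 || err < best_err) {
            best_err = err;
            best = d;
        }
    }
    return best;
}

/* Small fixed data set: the second domain block is the closest match. */
size_t demo_best_match(void)
{
    const double a[4] = {1, 2, 3, 4};
    const double domains[12] = {0, 0, 0, 0,  1, 2, 3, 5,  9, 9, 9, 9};
    return best_match(a, domains, 3, 4);
}
```

suma is not used by this simplified error measure; it is kept only to show that it would be hoisted the same way in a fuller formula.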
  4. Use code blocks next time to preserve indentation...

    Use code blocks next time to preserve indentation: "(code=c) (/code)", with [ ] instead of ( ). Did you mean:


    *Curr_domain = *dMobj;

    instead of


    Curr_domain = dMobj;
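A small sketch of the difference between the two statements, in case it helps. The Domain struct and its fields are hypothetical stand-ins, since I don't know what your actual type looks like; only the names Curr_domain and dMobj come from your code.

```c
/* Hypothetical stand-in for the poster's type; the real fields are unknown. */
typedef struct {
    int x;
    int y;
} Domain;

/* Value copy: returns 1 if a later change to the source leaves the copy untouched. */
int copy_is_independent(void)
{
    Domain source = {1, 2};
    Domain storage = {0, 0};
    Domain *dMobj = &source;
    Domain *Curr_domain = &storage;

    *Curr_domain = *dMobj;  /* copies the struct contents into storage */
    dMobj->x = 99;          /* modify the source afterwards */
    return Curr_domain->x == 1;
}

/* Pointer copy: returns 1 if both names now refer to the same object. */
int alias_follows_source(void)
{
    Domain source = {1, 2};
    Domain *dMobj = &source;
    Domain *Curr_domain;

    Curr_domain = dMobj;    /* copies only the address, no data is duplicated */
    dMobj->x = 99;          /* the change is visible through Curr_domain too */
    return Curr_domain->x == 99;
}
```

Both functions return 1. Which behaviour you want depends on whether Curr_domain is meant to hold its own copy or just point at the existing object.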
  5. Depends on the GPU, for AMD yes, NVIDIA I'm not...

    Depends on the GPU, for AMD yes, NVIDIA I'm not sure. See: http://devgurus.amd.com/thread/160838 for running AMD cards (simple version is just run with sudo and it will see the cards correctly, you...
  6. Replies
    7
    Views
    822

    I'm tempted to say that 50% isn't terrible, all...

    I'm tempted to say that 50% isn't terrible, all things considered, but theoretically your code could be twice as fast (if you were aiming to be compute-bound, I guess?). I tried running a test...
  7. Replies
    7
    Views
    822

    I haven't had a chance to try it out yet, but you...

    I haven't had a chance to try it out yet, but you might try transposing gA and gB so that the reads from global memory are coalesced. That is to say, if NUM_SAMPLES was 8 and you had 8 threads, have...
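Roughly what I mean by transposing, as an untested sketch in plain C that only shows the index arithmetic. It assumes the current layout gives each thread NUM_SAMPLES consecutive elements, and NUM_THREADS and all function names here are made up for the illustration.

```c
#include <stddef.h>

#define NUM_SAMPLES 8  /* value from the example above */
#define NUM_THREADS 8  /* assumed thread count for the illustration */

/* Assumed current layout: thread tid owns NUM_SAMPLES consecutive elements. */
size_t index_original(size_t tid, size_t s)
{
    return tid * NUM_SAMPLES + s;
}

/* Transposed layout: sample s of every thread is stored contiguously. */
size_t index_transposed(size_t tid, size_t s)
{
    return s * NUM_THREADS + tid;
}

/* Distance between the elements that threads 0 and 1 read at the same step s. */
size_t stride_original(size_t s)
{
    return index_original(1, s) - index_original(0, s);
}

size_t stride_transposed(size_t s)
{
    return index_transposed(1, s) - index_transposed(0, s);
}

/* Host-side reordering of a buffer from the original to the transposed layout. */
void transpose_samples(const float *src, float *dst)
{
    for (size_t tid = 0; tid < NUM_THREADS; tid++) {
        for (size_t s = 0; s < NUM_SAMPLES; s++) {
            dst[index_transposed(tid, s)] = src[index_original(tid, s)];
        }
    }
}

/* Self-check: every element lands where the transposed index says it should. */
int transpose_ok(void)
{
    float src[NUM_THREADS * NUM_SAMPLES];
    float dst[NUM_THREADS * NUM_SAMPLES];

    for (size_t i = 0; i < NUM_THREADS * NUM_SAMPLES; i++) {
        src[i] = (float)i;
    }
    transpose_samples(src, dst);
    for (size_t tid = 0; tid < NUM_THREADS; tid++) {
        for (size_t s = 0; s < NUM_SAMPLES; s++) {
            if (dst[index_transposed(tid, s)] != src[index_original(tid, s)]) {
                return 0;
            }
        }
    }
    return 1;
}
```

With the original indexing, neighbouring threads read addresses NUM_SAMPLES elements apart at each step; with the transposed indexing they read adjacent elements, which is the access pattern that can be coalesced.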
  8. Replies
    7
    Views
    822

    To me it looks like this is essentially normal...

    To me it looks like this is essentially normal matrix multiplication, no? I hate to not actually give you any explicit help, but googling (or searching these forums) for "OpenCL local memory matrix...
  9. Replies
    1
    Views
    422

    I'm not sure what your question is exactly, do...

    I'm not sure what your question is exactly. Do you have a concrete example?
  10. Replies
    2
    Views
    547

    To me it sounds like the problem is this: Julien...

    To me it sounds like the problem is this: Julien has a set of computations that need up to 1 MB of memory on the GPU for each work unit, and due to this size they are forced to use global memory. So...