Thread: how to implement serial calculation in kernel code?

  1. #1
    Junior Member
    Join Date
    Feb 2010
    Posts
    12

    how to implement serial calculation in kernel code?

    The code below is part of my kernel. The rest of the kernel is fully parallel and runs independently on each work-item (no data synchronization needed), but this part is serial: the i-th output depends on the values updated in step i-1. So I thought I could let one work-item do it while the other work-items do nothing during this step. Here is what I wrote, using work-item 0 for the computation:

    //tid is the work-item's local id; tB and m are both pointers to local memory
    //Basically I need to derive array m from array tB; one element of m is produced on each iteration of the outer loop. The values of m are correct when I run the kernel on the CPU, but wrong on the GPU. Is it because the synchronization goes wrong on the GPU? Or do you have suggestions to make it work on the GPU? Thank you so much!

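    barrier(CLK_LOCAL_MEM_FENCE); // all work-items reach this point before the serial part starts
    if (tid == 0)
    {
        // only work-item 0 runs the serial part
        for (i = 0; i < 34; i++)
        {
            m[i] = tB[i];
            for (j = i + 1; j < 34; j++)
            {
                tB[j] = mod_subtract(tB[j], tB[i], baseB[j]);
                tB[j] = mod_mul(tB[j], Bm[33*i + j - 1 - i*(i+1)/2], baseB[j]);
            }
        }
    }
    barrier(CLK_LOCAL_MEM_FENCE); // the other work-items wait here for work-item 0
    //then I read the values of m back to the host code and check them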
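
When the CPU and GPU results disagree, a host-side reference of the same loop gives something to compare each m[i] against. The following is a minimal sketch in plain C, not the poster's actual code: the element type (uint32_t), the bodies of mod_subtract and mod_mul (taken to mean (a - b) mod p and (a * b) mod p), and the helper names packed_index and serial_step are all assumptions made for illustration. The table index is the one from the post, written for a general length n.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed semantics: (a - b) mod p. Both inputs are reduced first, because
   tB[i] is a residue for a different modulus than baseB[j]. */
uint32_t mod_subtract(uint32_t a, uint32_t b, uint32_t p)
{
    assert(p != 0);
    a %= p;
    b %= p;
    return a >= b ? a - b : a + (p - b);
}

/* Assumed semantics: (a * b) mod p, using a 64-bit intermediate product. */
uint32_t mod_mul(uint32_t a, uint32_t b, uint32_t p)
{
    assert(p != 0);
    return (uint32_t)(((uint64_t)a * b) % p);
}

/* Position of the pair (i, j), with i < j < n, in the packed triangular
   table. For n = 34 this is the post's 33*i + j - 1 - i*(i+1)/2. */
size_t packed_index(size_t n, size_t i, size_t j)
{
    assert(i < j && j < n);
    return (n - 1) * i + j - 1 - i * (i + 1) / 2;
}

/* Same loop structure as the kernel fragment, run by a single thread.
   tB is updated in place; m receives one element per outer iteration. */
void serial_step(size_t n, uint32_t *tB, uint32_t *m,
                 const uint32_t *baseB, const uint32_t *Bm)
{
    for (size_t i = 0; i < n; i++) {
        m[i] = tB[i];
        for (size_t j = i + 1; j < n; j++) {
            tB[j] = mod_subtract(tB[j], tB[i], baseB[j]);
            tB[j] = mod_mul(tB[j], Bm[packed_index(n, i, j)], baseB[j]);
        }
    }
}
```

With n = 34 the packed table has 561 entries (indices 0 to 560), which is a quick way to confirm that Bm was allocated and filled with the expected size. As a small hand-checkable case, assuming Bm holds the modular inverse of baseB[i] modulo baseB[j]: n = 3, baseB = {3, 5, 7}, tB = {1, 2, 3} and Bm = {2, 5, 3} give m = {1, 2, 3}.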

  2. #2
    Junior Member
    Join Date
    Dec 2009
    Posts
    22

    Re: how to implement serial calculation in kernel code?

    Then why not just use the CPU to do the work?

  3. #3
    Junior Member
    Join Date
    Feb 2010
    Posts
    12

    Re: how to implement serial calculation in kernel code?

    Quote Originally Posted by deNorma
    Then why not just use the CPU to do the work?
    Hi, thanks for your reply. I have two such pieces of code in my kernel. If I did them on the CPU, I would need to break the kernel into three kernels and pass the data back and forth between the CPU and GPU five times, which doesn't sound efficient. Do you know whether it is valid to make one thread do this work, or is there another way to write this part of the code? What is strange is that the code as written gives the correct result on the CPU but a wrong one on the GPU, and I don't understand why.

  4. #4
    Junior Member
    Join Date
    Dec 2009
    Posts
    22

    Re: how to implement serial calculation in kernel code?

    When you do this serial calculation, does every thread have to wait for the serial result before it can proceed?

    Of course you can use one thread for the calculation. And if the result is different, it just means your GPU code is not correct.

  5. #5
    Junior Member
    Join Date
    Feb 2010
    Posts
    12

    Re: how to implement serial calculation in kernel code?

    Quote Originally Posted by deNorma
    When you do this serial calculation, does every thread have to wait for the serial result before it can proceed?

    Of course you can use one thread for the calculation. And if the result is different, it just means your GPU code is not correct.
    Yes, the following steps in each thread need to wait for the serial result. Is putting barrier(CLK_GLOBAL_MEM_FENCE); before and after this piece of code enough to synchronize all the other threads with this thread?

  6. #6
    Junior Member
    Join Date
    Dec 2009
    Posts
    22

    Re: how to implement serial calculation in kernel code?

    That's synchronization within a work-group/block.

    But if you need to do it across multiple work-groups, then that is not enough. For synchronization among blocks I would return control to the CPU, i.e. wait for the calculation kernel to finish for all work-groups.

  7. #7
    Junior Member
    Join Date
    Feb 2010
    Posts
    12

    Re: how to implement serial calculation in kernel code?

    Quote Originally Posted by deNorma
    That's synchronization within a work-group/block.

    But if you need to do it across multiple work-groups, then that is not enough. For synchronization among blocks I would return control to the CPU, i.e. wait for the calculation kernel to finish for all work-groups.
    I think I have set the work-items to be in the same work-group...

  8. #8
    Senior Member
    Join Date
    Jul 2009
    Location
    Northern Europe
    Posts
    311

    Re: how to implement serial calculation in kernel code?

    If you're only using one work-group you will get only a tiny fraction (1/4 to 1/48th) of the total GPU performance.

    If you need to do this sort of synchronization across all work-items you have to wait for the kernel to finish. If the cost of doing the data transfer to the CPU is too high to do it that way, then you have two options:
    1) wait for the first kernel to finish and then run a second kernel which just does the serial part using a global size of 1
    or
    2) figure out another algorithm.

    Option 2 is almost certainly faster, but may be difficult or impossible.

  9. #9

    Re: how to implement serial calculation in kernel code?

    You also have to watch out that your workload is not too big and that a thread doesn't "hang" for too long. In my experience, if a kernel hangs too long, two things happen:

    The OS stops drawing.

    It may blue-screen.

    I was running a for loop in which each subsequent cell ran longer than the last, and it glitched out before the execution was done.
