Hey,

I was using an Nvidia GTX 560 and was able to determine which SM runs each workgroup with the following inline PTX:

Code :
	uint smid;
	asm("mov.u32 %0, %%smid;" : "=r"(smid));

The problem is that I have now moved to an AMD Radeon HD 7970 and am trying to get my code working on it. I am implementing a global barrier using atomic operations on global memory. The code worked well on the Nvidia card, but it freezes on the AMD card, so my guess is that two or more workgroups are being scheduled on the same compute unit (CU), which leads to deadlock. I tried it with only 2 workgroups but still had no luck, so I need to make sure that each workgroup is scheduled on a different CU in order to rule out that suspicion.

FYI, the code for the barrier is as follows:

Code :
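inline void barrierGlobalRamp(__global volatile int *synch, char *direction)
{
	// Order this work-item's earlier memory accesses, then wait for the whole workgroup.
	mem_fence(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE);
	barrier(0);
	// Only work-item 0 of each workgroup takes part in the global handshake.
	if (get_local_id(0) == 0)
	{
		char goOutFlag = 0;
		switch (*direction)
		{
			case BARRIER_INCREASE:
				// Ramp up: announce arrival, then spin until every workgroup has arrived.
				atomic_inc(synch);
				while (!goOutFlag)
				{
					if (*synch >= WORKGROUP_COUNT)
						goOutFlag = 1;
				}
				*direction = BARRIER_DECREASE;
				break;
			case BARRIER_DECREASE:
				// Ramp down: announce arrival, then spin until the counter is back at zero.
				atomic_dec(synch);
				while (!goOutFlag)
				{
					if (*synch <= 0)
						goOutFlag = 1;
				}
				*direction = BARRIER_INCREASE;
				break;
		}
	}
	// Hold the remaining work-items until work-item 0 has left the global barrier.
	barrier(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE);
	return;
}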
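
In case it helps to see the intended behaviour in isolation, here is a minimal host-side sketch of the same ramp logic in plain C, with one pthread standing in for work-item 0 of each workgroup and a C11 atomic standing in for the global counter. This is only a model under assumptions: the WORKGROUP_COUNT value of 4, the enum values, and the helper names leader and run_phase are made up for the sketch and are not part of my kernel. It also joins the threads between the two phases, so it does not exercise back-to-back calls, where a fast workgroup could start decrementing before a slow one has seen the full count.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define WORKGROUP_COUNT  4   /* hypothetical value for this sketch */
#define BARRIER_INCREASE 0   /* hypothetical encoding */
#define BARRIER_DECREASE 1   /* hypothetical encoding */

static atomic_int synch;     /* stands in for the __global counter, starts at 0 */

/* One thread plays work-item 0 of one workgroup for a single barrier call. */
static void *leader(void *arg)
{
    char *direction = arg;

    if (*direction == BARRIER_INCREASE) {
        atomic_fetch_add(&synch, 1);
        while (atomic_load(&synch) < WORKGROUP_COUNT)
            sched_yield();   /* spin until every leader has arrived */
        *direction = BARRIER_DECREASE;
    } else {
        atomic_fetch_sub(&synch, 1);
        while (atomic_load(&synch) > 0)
            sched_yield();   /* spin until the counter is back at zero */
        *direction = BARRIER_INCREASE;
    }
    return NULL;
}

/* Runs one barrier call on all leaders concurrently and returns the
   counter value once every leader has left the barrier. */
static int run_phase(char direction[WORKGROUP_COUNT])
{
    pthread_t threads[WORKGROUP_COUNT];

    for (int i = 0; i < WORKGROUP_COUNT; i++)
        pthread_create(&threads[i], NULL, leader, &direction[i]);
    for (int i = 0; i < WORKGROUP_COUNT; i++)
        pthread_join(threads[i], NULL);
    return atomic_load(&synch);
}
```

Starting from a zero-initialized direction array, the first call to run_phase should return WORKGROUP_COUNT and the second should return 0. The model only terminates because the OS keeps all the threads making progress, which is exactly the guarantee I am not sure I get for workgroups on the GPU.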

Any insight is much appreciated.