
Question in local synchronization!



Atmapuri
05-18-2011, 02:22 AM
Hi!

I am looking at this kernel I found (and scratching my head):


__kernel void sqrt_sum(
    __global float * vec1,
    __global float * result)
{
    for (unsigned int stride = get_global_size(0)/2; stride > 0; stride /= 2)
    {
        if (get_global_id(0) < stride)
            vec1[get_global_id(0)] += vec1[get_global_id(0)+stride];
        barrier(CLK_GLOBAL_MEM_FENCE);
    }

    if (get_global_id(0) == 0)
        *result = sqrt(vec1[0]);
}

I understand the for-loop. The problem is the synchronization. The barrier is defined to work only within the same work-group: "All the work-items of a work-group must execute the barrier before any are allowed to continue execution beyond the barrier."

This implies that, for this kernel to work, the work-group size must be equal to get_global_size(0), so that only one work-group is launched (running on the same compute unit)?
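To make the concern concrete, here is a minimal sketch in plain C (not OpenCL, and not ViennaCL code) that emulates the loop serially. It assumes one schedule that the barrier definition permits: each work-group runs to completion before the next one starts, since barrier() only orders work-items inside a group. The name emulate_reduction, and the choice to return the value of vec1[0] that work-item 0 would pass to sqrt, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: serial emulation of the kernel's reduction loop.
 * Assumed schedule: work-groups execute one after another, each running
 * the whole loop before the next group starts.
 * Returns vec[0] as work-item 0 sees it after its loop; the real kernel
 * would store sqrt() of this value in *result. */
float emulate_reduction(float *vec, size_t global, size_t local)
{
    assert(local > 0 && global % local == 0);

    float seen_by_item0 = 0.0f;
    for (size_t group = 0; group < global / local; ++group) {
        size_t first = group * local;
        for (size_t stride = global / 2; stride > 0; stride /= 2) {
            /* One loop step for every work-item of this group. */
            for (size_t id = first; id < first + local; ++id) {
                if (id < stride)
                    vec[id] += vec[id + stride];
            }
            /* barrier(): only the items of this group are known to
             * have finished the step at this point. */
        }
        if (group == 0)
            seen_by_item0 = vec[0]; /* work-item 0 lives in group 0 */
    }
    return seen_by_item0;
}
```

With the eight elements 1..8, a single group of 8 yields 36 (the full sum), while groups of 2 yield 21 under this schedule, because work-items 0 and 1 read vec[2] and vec[3] before the second group has added vec[6] and vec[7] to them.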

The maximum work-group size is limited by CL_KERNEL_WORK_GROUP_SIZE, typically to 512. This means that the product x * y * z of the local dimensions of a single work-group cannot exceed 512, but it is possible to have x = 512, y = 1, z = 1.

This kernel thus works only for get_global_size(0) of at most 512 (for AMD), on the condition that a single work-group of the same size as get_global_size(0) is specified when clEnqueueNDRangeKernel is called?

get_global_size(0) == get_local_size(0) //??
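On the host side, that condition could be checked before the launch with something like the sketch below. It is plain C with no dependency on the OpenCL headers; is_single_work_group is a hypothetical name, and max_wg_size stands for the value one would query through clGetKernelWorkGroupInfo with CL_KERNEL_WORK_GROUP_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: true when the launch uses exactly one work-group
 * whose size does not exceed the limit reported for the kernel.
 * max_wg_size is assumed to come from CL_KERNEL_WORK_GROUP_SIZE. */
bool is_single_work_group(size_t global, size_t local, size_t max_wg_size)
{
    assert(max_wg_size > 0);
    return global > 0 && local == global && local <= max_wg_size;
}
```

For example, global = local = 256 with a limit of 512 passes, while global = 1024 fails whatever local size is chosen, because a single group of 1024 exceeds the limit.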

Thanks!
Atmapuri

david.garcia
05-18-2011, 04:57 AM
I think your analysis is correct. Where did you get this kernel?

Atmapuri
05-19-2011, 03:53 AM
Thanks. The kernel is from the ViennaCL library.