With OpenCL 1.1 it is possible to define an offset for your NDRange when launching a kernel. However, according to the spec (see 3.2), this offset affects only the global ID, not the workgroup ID. In other words, your workgroup IDs will always start at 0, no matter what the offset is.
It was always my intuition that the following is true: get_global_id() / get_local_size() = get_group_id()
Therefore, when I first saw the offset parameter I thought that it also affects the workgroup IDs accordingly (assuming that the offset is a multiple of the workgroup size). But this is not the case.
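To make the mismatch concrete, here is a small host-side model in plain C of how I read the relation in section 3.2 for one dimension: global ID = offset + group ID * local size + local ID. The struct and function names (`work_item_ids`, `ids_for`) are made up for illustration and are not part of OpenCL; this is only a sketch of the arithmetic, not how any implementation actually computes it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one work-item's IDs in a single dimension. */
typedef struct {
    size_t global_id; /* what get_global_id(0) would return */
    size_t group_id;  /* what get_group_id(0) would return  */
    size_t local_id;  /* what get_local_id(0) would return  */
} work_item_ids;

/* index:      position of the work-item inside the NDRange (0 .. global size - 1)
 * offset:     the global work offset passed at launch
 * local_size: the workgroup size */
static work_item_ids ids_for(size_t index, size_t offset, size_t local_size)
{
    work_item_ids ids;

    assert(local_size > 0);

    /* Group and local IDs are derived from the position only; the offset is ignored. */
    ids.group_id = index / local_size;
    ids.local_id = index % local_size;

    /* Only the global ID is shifted by the offset. */
    ids.global_id = offset + ids.group_id * local_size + ids.local_id;
    return ids;
}
```

With an offset of 256 and a local size of 64, the work-item at position 130 gets global ID 386 but group ID 2, so my intuition only holds after subtracting the offset: (386 - 256) / 64 = 2, whereas 386 / 64 = 6.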
As explained by Micah Villmov from AMD here, this may be because it is easier/faster to implement in hardware.
But I'm wondering how the offset can be used given that the workgroup IDs ignore it. Imagine you want to partition a task between two GPUs, e.g. half of the work on GPU1 and the rest on GPU2. I would assume that setting the offset for GPU2 to half the problem size would do the trick. However, if a work-item uses get_group_id() to identify the part of the work it has been assigned, this wouldn't work, because both GPUs would see workgroup IDs starting at 0.
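One workaround I can think of is sketched below in plain C, under the assumption that the offset is a multiple of the workgroup size: derive an offset-aware group index from the global ID instead of calling get_group_id(). In a kernel this would correspond to get_global_id(0) / get_local_size(0). The names `device_range`, `range_for_device` and `logical_group_id` are hypothetical helpers, not OpenCL functions, and the even split by whole workgroups is just one possible partitioning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device slice of a 1D NDRange. */
typedef struct {
    size_t offset; /* value for global_work_offset[0] */
    size_t size;   /* value for global_work_size[0]   */
} device_range;

/* Split `total` work-items across `num_devices` in whole workgroups,
 * so that every offset is a multiple of `local_size`. */
static device_range range_for_device(size_t total, size_t local_size,
                                     size_t device, size_t num_devices)
{
    device_range r;
    size_t groups, first, last;

    assert(local_size > 0 && num_devices > 0 && device < num_devices);
    assert(total % local_size == 0);

    groups = total / local_size;
    first = groups * device / num_devices;       /* first group of this device */
    last = groups * (device + 1) / num_devices;  /* one past its last group    */

    r.offset = first * local_size;
    r.size = (last - first) * local_size;
    return r;
}

/* Offset-aware group index; only meaningful when the offset is a
 * multiple of local_size. */
static size_t logical_group_id(size_t global_id, size_t local_size)
{
    assert(local_size > 0);
    return global_id / local_size;
}
```

For 1024 work-items, a workgroup size of 64 and two devices, the second device would get offset 512 and size 512, and its first work-item (global ID 512) would map to logical group 8 even though get_group_id(0) reports 0 there. The offset and size would be passed as global_work_offset and global_work_size to clEnqueueNDRangeKernel on each device's queue.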
Are there any other (non-hardware-related) reasons for this behaviour?