In a prior bug report, someone requested a statement about the execution order of work-groups (the order of starting, not necessarily finishing). The discussion seemed to end without agreement on stating any requirements for this ordering.
However, given the existence of NUMA architectures and caches, I think there should at least be a statement on the order in which work-groups are assigned to compute units. Are they assigned randomly, or in a round-robin fashion, possibly with some chunk size of work-groups (e.g., assigning work-groups in contiguous pairs)? Without any such statement I don't see how we can control load balancing; a poorly implemented ICD could potentially starve all compute units except one.
Please take a stance and specify how work-groups must be mapped to compute units, especially in the case where the number of work-groups is less than or equal to the number of compute units. Please also provide a means to query the chunk size, if such a concept exists in the mapping scheme.
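
To make the kind of mapping I am asking about concrete, here is a minimal sketch in plain C of a chunked round-robin assignment. It is purely illustrative: `cu_for_group`, `print_mapping`, and the `chunk` parameter are hypothetical names of my own, not part of any OpenCL API, and I am not claiming that any implementation actually schedules this way.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical mapping (an assumption, not an OpenCL API): linear work-group
 * IDs are handed to compute units in contiguous chunks of `chunk` groups,
 * cycling round-robin over `num_cus` compute units. */
static size_t cu_for_group(size_t group_id, size_t num_cus, size_t chunk)
{
    return (group_id / chunk) % num_cus;
}

/* Print which compute unit each of the first `num_groups` work-groups
 * would land on under the hypothetical mapping above. */
static void print_mapping(size_t num_groups, size_t num_cus, size_t chunk)
{
    for (size_t g = 0; g < num_groups; ++g)
        printf("work-group %zu -> compute unit %zu\n",
               g, cu_for_group(g, num_cus, chunk));
}
```

Under this assumed scheme, calling `print_mapping(4, 4, 2)` places work-groups 0 and 1 on compute unit 0 and work-groups 2 and 3 on compute unit 1, leaving compute units 2 and 3 idle. With a chunk size of 1 the same four work-groups would each get their own compute unit, which is why the chunk size matters when there are no more work-groups than compute units.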