No performance improvement after moving to a newer GPU with a larger max_work_item_size
When I moved from an old GPU with a max work-item size of 256 to a new GPU with a max work-item size of 512, I saw no performance improvement.
I also changed the local work-group size from 8 to 16, since the new GPU allows a local work-group size of 16, but the performance is still the same as on the old GPU.
I would like to know why there is no performance improvement even though the local work-group size is larger on the new GPU.