I have a bit of a beginner's question about how to set the number of work items correctly for my OpenCL program.
My program does some calculations on a grid, and the parallel calculation runs column-wise: all the calculations in one column must finish before I start the next. So I want a single work group in which every work item processes a single row, with a barrier after each column is finished. I cannot use more than one work group, because each cell depends on all values in the previously calculated columns of the grid.
I was then planning to use the global ID (get_global_id(0)) as the row number for each work item, but if my program has a very large number of rows (e.g. 250k or 2.5M), I'm not sure I can do this.
Can I use the global ID in this way, and if so, how would I call clEnqueueNDRangeKernel?
Does my program structure sound sensible?