I currently have a GPU program with the following workflow:
1. Upload data
2. Process uploaded data into something more useful
3. Do more GPU-intensive stuff
This is all done within a single command queue on a single GPU. It occurred to me that, in theory, nothing prevents me from doing steps 1 and 3 in parallel: the compute-intensive work in step 3 only needs the processed data from step 2, so I could already be loading the next slice of data from disk and uploading it while step 3 finishes.
So much for theory. How would I best go about this in practice?
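To make the overlap I have in mind concrete, here is a language-agnostic sketch of the double-buffering pattern I'm after. All function names (`load_slice`, `upload_and_preprocess`, `heavy_compute`) are hypothetical stand-ins; Python threads play the role that a separate transfer queue or copy stream would play in a real GPU API, just to show the intended scheduling:

```python
import threading
import queue

def load_slice(i):
    # Hypothetical stand-in for reading one slice of data from disk.
    return [i] * 4

def upload_and_preprocess(raw):
    # Stand-in for steps 1-2: upload the data and process it
    # into the form that step 3 consumes.
    return [x * 2 for x in raw]

def heavy_compute(processed):
    # Stand-in for step 3, the GPU-intensive part.
    return sum(processed)

def pipelined(num_slices):
    results = []
    # Bounded queue: the loader stays at most one slice ahead of the
    # consumer, which is exactly the double-buffering behaviour wanted.
    ready = queue.Queue(maxsize=1)

    def producer():
        # Runs concurrently with heavy_compute on the main thread,
        # loading and uploading the *next* slice while the current
        # one is still being crunched.
        for i in range(num_slices):
            ready.put(upload_and_preprocess(load_slice(i)))
        ready.put(None)  # sentinel: no more slices

    threading.Thread(target=producer, daemon=True).start()

    while (processed := ready.get()) is not None:
        results.append(heavy_compute(processed))
    return results

print(pipelined(3))  # → [0, 8, 16]
```

In an actual GPU API I assume this would map to something like a second command queue (or copy stream) dedicated to transfers, with some synchronization primitive guarding the hand-off between step 2's output and step 3 — but that mapping is exactly what I'm unsure about.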