Hello,

I have a multi-threaded OpenCL code in which each thread uses GPUs.
It loops over:
- get data
- spawn: each thread (up to 16) processes the data, running 2 subtasks on GPUs or Xeon Phi depending on...