I've been working with OpenCL for the past year, and I'm about to start my first multi-GPU design. I'll have some data that will be split among 9 GPUs. I'm trying to decide whether the easiest way to do this is to have 1 context with 9 command queues (1 for each GPU) or 1 context for each GPU. All of the input data for each GPU will be unique to that GPU, with the exception of a few constants. If anyone has any thoughts, I would love to hear them.
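
To make the split concrete, here is a minimal sketch of one way the per-GPU partitioning could look. It assumes the data is a flat 1-D array divided as evenly as possible. The `gpu_slice` type and `slice_for_device` helper are hypothetical names used only for illustration, and the sketch is independent of whether one context or nine is used.

```c
#include <stddef.h>

/* Hypothetical helper type: the slice of a 1-D buffer owned by one GPU. */
typedef struct {
    size_t offset; /* index of the first element in this slice */
    size_t count;  /* number of elements in this slice */
} gpu_slice;

/* Split `total` elements across `num_devices` as evenly as possible.
 * Assumption: a simple contiguous 1-D partition. The first
 * (total % num_devices) devices each receive one extra element. */
static gpu_slice slice_for_device(size_t total, size_t num_devices,
                                  size_t device)
{
    size_t base = total / num_devices;
    size_t extra = total % num_devices;
    gpu_slice s;

    s.count = base + (device < extra ? 1 : 0);
    /* Devices before this one hold `base` elements each, plus one extra
     * element for each of them that falls below `extra`. */
    s.offset = device * base + (device < extra ? device : extra);
    return s;
}
```

With either layout, each slice would presumably be uploaded to its own device buffer, while the few shared constants would be sent to every GPU.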