Devices and command queues



absence
01-17-2010, 03:34 AM
Some examples I've seen simply use the first device returned by clGetContextInfo with CL_CONTEXT_DEVICES. This is obviously fine for single-GPU systems, but what happens on multi-GPU systems? Will all but one GPU sit idle, or does OpenCL spread the load to all devices even if there aren't any command queues created for them? What is the right way to make sure a program scales well from single to multiple GPUs (devices)?

jjs
01-17-2010, 01:56 PM
You've got it right, I believe. Only devices for which you've created command queues and assigned some work will be used; the runtime doesn't do any sort of automatic load balancing.

As to the "right way" to ensure scalability, I can only offer the truly sucky "It depends on your problem." Keep in mind that you'll have to either copy all the data for your problem to each card, or split the data up on the CPU based on the number of cards you're working with. Combining results is your job as well.
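To make the "split the data up on the CPU" option concrete, here is a minimal sketch in plain C of dividing a problem into one contiguous slice per device. The names `device_slice` and `split_work` are hypothetical, not part of OpenCL, and the even split assumes the devices are roughly equally fast, which may not hold on a mixed system.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: the part of the input handled by one device. */
typedef struct {
    size_t offset; /* index of the first element in the slice */
    size_t count;  /* number of elements in the slice */
} device_slice;

/*
 * Split `total` work items into `num_devices` contiguous slices whose
 * sizes differ by at most one element. One entry per device is written
 * to `out`, which must have room for `num_devices` entries.
 */
void split_work(size_t total, size_t num_devices, device_slice *out)
{
    assert(num_devices > 0 && out != NULL);

    size_t base = total / num_devices;   /* minimum slice size */
    size_t extra = total % num_devices;  /* first `extra` slices get one more */
    size_t offset = 0;

    for (size_t i = 0; i < num_devices; ++i) {
        size_t count = base + (i < extra ? 1 : 0);
        out[i].offset = offset;
        out[i].count = count;
        offset += count;
    }
}
```

In a real program, `num_devices` would come from the device list of the context, and each slice would be written to a buffer through that device's own command queue (for example with `clEnqueueWriteBuffer`), then read back and merged on the host once the kernels finish.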

absence
01-17-2010, 02:33 PM
Thanks, that cleared things up a bit. :)

dschwen
01-17-2010, 09:44 PM
And how do multi-GPU cards, like some high-end GeForce 2xx models, come into play? Do the GPUs share memory (I suppose they should)? Do they appear as separate devices, or as one compute device?

jjs
01-17-2010, 11:24 PM
I think that these cards (like the GTX 295) just show up as two devices, each with its own separate memory. So from the perspective of the OpenCL runtime, it's no different than just chucking two physically separate cards in the machine.