OpenCL on distributed systems



LucasCampos
06-01-2011, 02:05 AM
Can OpenCL be used on distributed systems without MPI? For instance, if I have a micro Beowulf cluster of two computers, each with an OpenCL-capable video card, can a single program use both?

If it's possible, could I have an example of how to do it? If not, would it work with MPI?

Thanks in advance

sean.settle
06-01-2011, 08:25 AM
The scope of OpenCL is limited to a single node. You would need to use MPI in addition to OpenCL to accomplish what you want. I'm not quite sure how to go about load balancing heterogeneous systems (CPUs + GPUs), so I'm sorry I can't give you more detail about that. Generally, when executing MPI programs, one specifies a list of hostnames with the number of CPU cores for each node, but how would one also specify the GPUs, GPU cores, frequencies, etc.?
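As a possible starting point for the load-balancing question, here is a minimal sketch (in Python, for brevity) that splits a 1D index range across MPI ranks in proportion to a relative weight per rank, for example a throughput figure measured with a short benchmark on each node. The name `split_work` and the weights are hypothetical, not part of MPI or OpenCL. In a real program each rank would take its own chunk and enqueue it on its local OpenCL device.

```python
def split_work(total_items, weights):
    """Split range(total_items) into contiguous (start, count) chunks,
    one per rank, with sizes proportional to the given weights."""
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive numbers")

    total_weight = sum(weights)
    chunks = []
    start = 0
    cumulative = 0
    for i, w in enumerate(weights):
        cumulative += w
        if i == len(weights) - 1:
            # The last rank always ends exactly at total_items.
            end = total_items
        else:
            # Boundaries come from the cumulative weight, so rounding
            # error does not accumulate from chunk to chunk.
            end = round(total_items * cumulative / total_weight)
        chunks.append((start, end - start))
        start = end
    return chunks


# Hypothetical example: rank 0 drives a GPU assumed to be three times
# as fast as the one driven by rank 1.
for rank, (first, count) in enumerate(split_work(1000, [3, 1])):
    print(f"rank {rank}: items {first}..{first + count - 1} ({count} items)")
```

Static weights like these only help when the cost per item is roughly uniform. Irregular workloads would likely need some form of dynamic scheduling instead.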

If you find some good info, please remember to share :D

LucasCampos
06-01-2011, 01:50 PM
I had not thought of running on several GPUs, as there would be a HUGE communication delay. But maybe, with the right parameters and a suitable problem, it would work quite well. I'll look into it some more, and if I find anything good, I promise to post it here.

Lucas

david.garcia
06-01-2011, 03:49 PM
I don't understand these two sentences. They seem to say the opposite.


For instance, if I have a micro Beowulf cluster of two computers, each with an OpenCL-capable video card, can a single program use both?


I had not thought of running on several GPUs, as there would be a HUGE communication delay.

As Sean said, you can use MPI to communicate between nodes.

sean.settle
06-02-2011, 12:38 AM
I talked with some people who are familiar with distributed computing with GPUs, and they said that for best performance there should be one GPU for each CPU core in use. That way each MPI slot on a node is dedicated to one GPU.
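To illustrate the one-GPU-per-MPI-slot idea, here is a minimal sketch of how each rank on a node could pick its device. The function name `assign_devices` is hypothetical. The "local rank" is assumed to be the rank's index among the processes on the same node, and the GPU index is assumed to be a position in the device list that `clGetDeviceIDs` returns on that node.

```python
def assign_devices(ranks_per_node, gpus_per_node):
    """Return a {local_rank: gpu_index} mapping for one node.

    With as many ranks as GPUs, every rank gets a dedicated device.
    With more ranks than GPUs, devices are shared round-robin.
    """
    if ranks_per_node < 0:
        raise ValueError("ranks_per_node must not be negative")
    if gpus_per_node < 1:
        raise ValueError("the node needs at least one GPU")
    return {rank: rank % gpus_per_node for rank in range(ranks_per_node)}


def is_dedicated(mapping):
    """True if no GPU is shared between two local ranks."""
    return len(set(mapping.values())) == len(mapping)


# Hypothetical node with two MPI slots and two GPUs.
mapping = assign_devices(ranks_per_node=2, gpus_per_node=2)
print(mapping, "dedicated" if is_dedicated(mapping) else "shared")
```

Launching more slots than GPUs still works with this mapping, but the ranks that share a device would then compete for it.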



For instance, if I have a micro beowulf cluster of two computers...


If you're concerned about communication bandwidth and latency, and you're not going to use all the CPU cores intensively, then multiple GPUs in a single node will outperform multiple GPUs on different nodes. This assumes that your motherboard has PCIe x16 for each GPU on that node.
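A rough back-of-the-envelope model shows why. The sketch below estimates transfer time as latency plus size divided by bandwidth. The two bandwidth figures are assumptions (nominal peak rates for PCIe 2.0 x16 and gigabit Ethernet), latency is left at zero, and real sustained throughput will be lower on both links, so treat the result as an order-of-magnitude comparison only.

```python
def transfer_time(num_bytes, bandwidth_bytes_per_s, latency_s=0.0):
    """Estimated seconds to move num_bytes over a link: latency + size / bandwidth."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_s + num_bytes / bandwidth_bytes_per_s


# Assumed nominal peak rates, in bytes per second (not measurements).
PCIE2_X16 = 8e9            # PCIe 2.0 x16, one direction
GIGABIT_ETHERNET = 125e6   # 1 Gbit/s

buffer_bytes = 64 * 1024 * 1024  # a 64 MiB buffer
local = transfer_time(buffer_bytes, PCIE2_X16)
remote = transfer_time(buffer_bytes, GIGABIT_ETHERNET)
print(f"host <-> GPU over PCIe:     {local * 1e3:.1f} ms")
print(f"node <-> node over network: {remote * 1e3:.1f} ms")
print(f"ratio: {remote / local:.0f}x")
```

Data that has to reach a GPU on another node pays both costs, since it crosses the network and then the PCIe bus on the receiving side.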