|Summary:||Suggestion for the next OpenCL release: waiting for any event|
|Product:||OpenCL||Reporter:||Maxim Milakov <maxim.milakov>|
|Component:||Specification||Assignee:||Aaftab Munshi <amunshi>|
|Status:||NEW ---||QA Contact:||OpenCL Working Group <opencl>|
|Priority:||P3||CC:||bwatt, daaugusto, wade.colson|
Description Maxim Milakov 2011-07-06 07:50:59 PDT
OpenCL 1.1 defines the clWaitForEvents function, which synchronously waits for all events to fire (commands to complete). It is highly desirable to have a function (for example, clWaitForAnyEvent) which synchronously waits for any one event from the list. This function would be especially useful with out-of-order command execution.
Comment 1 Brian Watt 2011-07-07 09:46:52 PDT
Thank you for your request. The OpenCL working group is interested in understanding your request in more detail. Can you provide more information as to what scenario (AKA use case) you are trying to accomplish with this enhancement? Why do you need to wait for only a subset of the events passed to a clWaitForEvents-like function to complete? After returning from this function, what does your algorithm need to do with the completed and non-completed events? In other words, any further information you can provide will be useful in our evaluation of your request. Thanks again.
Comment 2 Maxim Milakov 2011-07-07 11:20:42 PDT
Brian, thank you for the quick answer. Sorry that I didn't specify the use case at the very beginning. Here is a simplified one.

Consider a system with 1 device in the platform. The OpenCL program runs in a cycle:
1) read chunk of data from disk
2) enqueue copy from host to device
3) enqueue ndrange
4) finish command
5) go to step 1

Yes, it is very simplified (it doesn't even return any results to the host). It could be improved by overlapping kernel execution with preparing data for the next kernel execution (read from disk, copy data from host to device). But let's keep it simple.

What if we have 2 devices within the same platform (and context)? Let's try to construct our new cycle:
1) read 1st chunk of data from disk
2) enqueue copy from host to the 1st device for the 1st chunk through the 1st command queue (associated with the 1st device)
3) enqueue ndrange in the 1st command queue
4) flush the 1st command queue
5) read 2nd chunk of data from disk
6) enqueue copy from host to the 2nd device for the 2nd chunk through the 2nd command queue (associated with the 2nd device)
7) enqueue ndrange in the 2nd command queue
8) flush the 2nd command queue
9) finish command... for which command queue? We might wait for the 1st command queue to finish while the 2nd has already finished and the 2nd device is idle.

If we had "wait for any" functionality, we would be able to organize the following cycle:
1) read chunk of data from disk
2) "wait for any" command to finish (actually we will wait for any of the 2 events to fire)
3) "wait for any" returns the index of the event that fired, so we know which device has just become idle (the first 2 runs are deterministic here, we will just pick the 1st and the 2nd device)
4) enqueue copy from host to device
5) enqueue ndrange and keep the event of this action. This event is one of the two events we will be waiting on in step 2.
6) go to step 1

And no device is idle, even if one of them is slower than the other or the kernel execution time is not constant.

Why did I mention out-of-order command execution (and devices supporting simultaneous kernel execution)? Because each such device is like several devices in the same context, from a synchronization perspective. The more people are able to run several kernels simultaneously, the greater the need to know when one of several device activities has finished.
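The effect of the cycle described above can be illustrated with a small host-side model. This is only a sketch and does not call OpenCL: the function names, the fixed per-chunk costs, and the assumption that disk reads and copies take no time are all hypothetical simplifications. It compares handing each chunk to whichever device becomes idle first (the "wait for any" cycle) with strictly alternating between the two devices.

```c
#include <assert.h>

#define NUM_DEVICES 2

/* Hypothetical model: cost[d] is the time device d needs per chunk.
 * Each chunk goes to the device that becomes idle first (ties go to the
 * lower index). Returns the time at which the last chunk finishes and
 * fills counts[d] with the number of chunks device d processed. */
int schedule_wait_for_any(const int cost[NUM_DEVICES], int chunks,
                          int counts[NUM_DEVICES])
{
    int busy_until[NUM_DEVICES] = {0};
    int makespan = 0;

    assert(chunks >= 0);
    for (int d = 0; d < NUM_DEVICES; ++d)
        counts[d] = 0;

    for (int c = 0; c < chunks; ++c) {
        int pick = 0;
        for (int d = 1; d < NUM_DEVICES; ++d)
            if (busy_until[d] < busy_until[pick])
                pick = d;
        busy_until[pick] += cost[pick];
        counts[pick]++;
    }
    for (int d = 0; d < NUM_DEVICES; ++d)
        if (busy_until[d] > makespan)
            makespan = busy_until[d];
    return makespan;
}

/* Same model, but chunks strictly alternate between the devices,
 * regardless of which one is idle. */
int schedule_fixed_order(const int cost[NUM_DEVICES], int chunks)
{
    int busy_until[NUM_DEVICES] = {0};
    int makespan = 0;

    assert(chunks >= 0);
    for (int c = 0; c < chunks; ++c)
        busy_until[c % NUM_DEVICES] += cost[c % NUM_DEVICES];
    for (int d = 0; d < NUM_DEVICES; ++d)
        if (busy_until[d] > makespan)
            makespan = busy_until[d];
    return makespan;
}
```

With costs of 1 and 3 time units per chunk and 8 chunks, this model gives the faster device 6 chunks and finishes at time 6, while strict alternation finishes at time 12.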
Comment 3 Aaftab Munshi 2011-07-14 21:20:50 PDT
Maxim, I believe you should be able to implement the scenario you describe in comment #2 with event callbacks. Here is what you can do (assuming an in-order command queue):
1) read 1st chunk of data from disk
2) enqueue copy from host to the 1st device for the 1st chunk through the 1st command queue (associated with the 1st device)
3) a) enqueue ndrange in the 1st command queue - let event 1 be the event associated with this command
3) b) set event callback for event 1
4) flush the 1st command queue
5) read 2nd chunk of data from disk
6) enqueue copy from host to the 2nd device for the 2nd chunk through the 2nd command queue (associated with the 2nd device)
7) a) enqueue ndrange in the 2nd command queue - let event 2 be the event associated with this command
7) b) set event callback for event 2
8) flush the 2nd command queue
9) finish command... for which command queue? We might wait for the 1st command queue to finish while the 2nd has already finished and the 2nd device is idle.

The appropriate callback for event 1 or 2 will be invoked based on which command finishes first, and then you can decide on which device / queue you enqueue the next work.
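One way to turn the callback approach into a blocking "wait for any" on the host is to have each callback set a flag and signal a condition variable. The sketch below is an assumption-laden illustration, not OpenCL code: `any_waiter`, `slot`, `on_event_complete` and `any_waiter_wait` are hypothetical names, and the callback only mimics the shape of a function registered with clSetEventCallback (event, status, user data), with the event passed as `void *` so the example compiles without OpenCL headers.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_SLOTS 8

/* Shared state: one "fired" flag per tracked event. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             fired[MAX_SLOTS];
} any_waiter;

/* Passed as user data so the callback knows which event it stands for. */
typedef struct {
    any_waiter *waiter;
    int         index;
} slot;

void any_waiter_init(any_waiter *w)
{
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cond, NULL);
    for (int i = 0; i < MAX_SLOTS; ++i)
        w->fired[i] = 0;
}

/* Stand-in for an event callback: it makes no blocking API calls, it only
 * records which slot completed and wakes the waiting host thread. */
void on_event_complete(void *event, int status, void *user_data)
{
    slot *s = user_data;
    (void)event;
    (void)status;
    assert(s->index >= 0 && s->index < MAX_SLOTS);
    pthread_mutex_lock(&s->waiter->lock);
    s->waiter->fired[s->index] = 1;
    pthread_cond_signal(&s->waiter->cond);
    pthread_mutex_unlock(&s->waiter->lock);
}

/* Blocks until at least one slot has fired, then returns the lowest fired
 * index and clears its flag so the slot can be reused. */
int any_waiter_wait(any_waiter *w)
{
    int found = -1;
    pthread_mutex_lock(&w->lock);
    while (found < 0) {
        for (int i = 0; i < MAX_SLOTS; ++i) {
            if (w->fired[i]) {
                found = i;
                w->fired[i] = 0;
                break;
            }
        }
        if (found < 0)
            pthread_cond_wait(&w->cond, &w->lock);
    }
    pthread_mutex_unlock(&w->lock);
    return found;
}
```

In real use, the host thread would call `any_waiter_wait` in place of the "finish command" step, and the returned index would tell it which queue to feed next.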
Comment 4 Maxim Milakov 2011-07-14 22:24:29 PDT
Aaftab, callbacks are a powerful feature. If they were sufficient, we would not need the clFinish and clWaitForEvents functions; they would at least be marked as deprecated. But that is not the case. My guess is that the reason is user convenience. Besides, blocking calls are not permitted in callbacks, so if I use Map/Unmap functionality (instead of Read+Write) I will need yet another callback, which will be called when the buffer(s) is mapped. All synchronization libraries I have worked with included not only "waitForAll" but "waitForAny" too.
Comment 5 Aaftab Munshi 2011-07-15 11:43:28 PDT
I agree it would be useful to do a wait for any event for user convenience. Are you just looking for a clWaitForEvents variant that allows you to wait on the host side either for all specified events or for any event to complete? I also assume that for the latter case you want to know which event(s) from the list of events actually completed.
Comment 6 Maxim Milakov 2011-07-15 11:55:49 PDT
Aaftab, I am not sure I got you right... We already have the clWaitForEvents function, which returns only when all events (passed as params) have fired. It would be useful to have another function which returns when one of the events has fired (and doesn't wait for the others). And yes, you are absolutely correct, this function should indicate which event fired. Maybe the function should accept a (uint *) parameter and set the value to the index of the event in the list passed as parameter?
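The interface suggested here (an event list plus a `uint *` that receives the index of the fired event) could look roughly like the sketch below. It is not part of OpenCL: `wait_for_any`, `fake_event` and `fake_event_status` are hypothetical, and the fake status query stands in for polling an event's execution status (in OpenCL this would be clGetEventInfo with CL_EVENT_COMMAND_EXECUTION_STATUS, where CL_COMPLETE is 0). A busy-polling loop like this burns CPU, so it only illustrates the calling convention.

```c
#include <assert.h>
#include <stddef.h>

#define STATUS_RUNNING  1   /* assumption: any positive value means "not done" */
#define STATUS_COMPLETE 0   /* mirrors CL_COMPLETE, which is 0 */

/* Hypothetical stand-in for an event: completes after a number of polls. */
typedef struct {
    int polls_left;
} fake_event;

/* Stand-in for a status query on one event. */
int fake_event_status(fake_event *e)
{
    assert(e != NULL);
    if (e->polls_left > 0) {
        e->polls_left--;
        return STATUS_RUNNING;
    }
    return STATUS_COMPLETE;
}

/* Returns 0 and stores the index of the first event found complete in
 * *fired_index; returns -1 on invalid arguments. Does not wait for the
 * remaining events. */
int wait_for_any(unsigned num_events, fake_event *events,
                 unsigned *fired_index)
{
    if (num_events == 0 || events == NULL || fired_index == NULL)
        return -1;
    for (;;) {
        for (unsigned i = 0; i < num_events; ++i) {
            if (fake_event_status(&events[i]) == STATUS_COMPLETE) {
                *fired_index = i;
                return 0;
            }
        }
    }
}
```

A caller would pass the events of the outstanding ndrange commands and use the returned index to pick the queue that has just become idle.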
Comment 7 Douglas 2012-10-05 15:27:20 PDT
Any news on this? I agree with Milakov that a "wait for any" function would be very convenient and would simplify the code by avoiding the complexity of event callbacks.
Comment 8 Wade Colson 2014-04-15 16:01:24 PDT