I've been looking over the OpenCL specification, and I have a question about the rationale for one limitation/requirement I've come across.
In the v1.1 spec, section 5.9, under clSetUserEventStatus (page 143) there is the following note (this text is present in v1.1 to v2.0):
"Enqueued commands that specify user events in the event_wait_list argument of clEnqueue*** commands must ensure that the status of these user events being waited on are set using clSetUserEventStatus before any OpenCL APIs that release OpenCL objects except for event objects are called; otherwise the behavior is undefined."
I was wondering why this requirement is in place. It's not immediately obvious to me why releasing a memory object would necessarily affect a user event whose status has not yet been set (clSetUserEventStatus accepts only CL_COMPLETE or a negative error code), excluding the obvious problem of race conditions.
In practice, this restriction has broad effects. For example, in certain situations, I'd like to be able to control a command queue such that it may block on a host callback in the same manner as a device kernel, and this requirement prevents doing so in a straightforward manner (since it limits the set of valid memory operations from the time the event is created until the callback is completed). While there are other means to this end, they're a good bit more complicated, so I was curious to find out the rationale here.
Right before I submitted this, I found the thread "user event limitations" (attempting to link to it failed), which asked a similar question in 2011. The answer there indicated that the only risk comes from releasing memory objects that may be used by the command queue in which the user event exists (i.e., the aforementioned race), and that the spec text should be clarified.
Before I add to the feedback thread asking for that clarification, I figured I'd check here to make sure that understanding is still valid.