I have started studying OpenCL and have a number of questions:
1. Is it possible to process multiple cl_command_queue objects in parallel on one device (in order to load the device as fully as possible)?
The need for this arose for the following reason: there is a simple operation that runs very quickly and uses at most 30-40 work-items (threads), and each iteration is based on data obtained from the previous iteration. Iterations: 3.5 units.
2. Another question: clEnqueueNDRangeKernel places a kernel in the queue for execution, but as I understand it, the queue is executed sequentially, without loading all of the card's compute units (even if the queue contains several kernels, they run one after another, each loading only 30-40 threads). Is it possible to organize parallel execution of these kernels?

Best Regards,