I have to execute a kernel many times, where the only change between calls is that two buffers are exchanged and one is updated before the call. For example, the first call would be
Code :
myContext.program.myKernel(
    myContext.queue,
    (NThreads,),
    None,
    bufferA,
    bufferB,
    bufferState)

and the next

Code :
myContext.program.myKernel(
    myContext.queue,
    (NThreads,),
    None,
    bufferB,
    bufferA,
    bufferState)

However, bufferState actually contains only a single integer, which is increased by 1 on each call. Can I somehow avoid creating a new clBuffer each time just to pass this integer? All threads have to stay synchronized, so incrementing this integer inside the kernel on each call is not possible because of asynchronous access.
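
To make the call pattern concrete, here is a minimal sketch of the loop I have in mind. It is plain Python with no OpenCL in it: `run_steps`, `fake_launch` and `calls` are hypothetical names used only for illustration, `launch` stands in for the real `myContext.program.myKernel(...)` call, and starting the counter at 0 is an assumption.

```python
def run_steps(launch, buffer_a, buffer_b, n_calls):
    """Call `launch` n_calls times, exchanging the two buffers after each call."""
    src, dst = buffer_a, buffer_b
    for step in range(n_calls):
        # `step` plays the role of the single integer stored in bufferState
        # (assumed here to start at 0 and grow by 1 per call).
        launch(src, dst, step)
        # Exchange the two buffers for the next call.
        src, dst = dst, src
    return src, dst


# Hypothetical stand-in for the kernel launch: it only records its arguments.
calls = []


def fake_launch(src, dst, step):
    calls.append((src, dst, step))


run_steps(fake_launch, "bufferA", "bufferB", 3)
```

In the real code, `launch` would wrap the kernel call shown above, and the question is how best to hand `step` to the kernel on each iteration.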

I'm also wondering whether I can pass plain scalar values instead of arrays. Up to now I create a numpy array, then a clBuffer from it, and after passing that buffer to the kernel I extract the scalar from the array inside the kernel. This seems like too much effort. Maybe this is specific to pyopencl, I don't know.