I would like to create a kernel that looks like this:
__kernel void Traverse(__global const TreeRoot *root)
{
    // perform some operations...
}
Now, I (obviously) need to copy the whole tree to the device. But how
can I get back a pointer to device memory when creating buffers,
so that I can assemble the tree on the device? I am sure this can be
done in CUDA, but so far I have the feeling that it's not possible
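
For reference, one common way to sidestep device pointers altogether is to flatten the tree into a single array on the host and replace child pointers with array indices. The sketch below is only an illustration under assumptions: the real TreeRoot layout is not shown here, so HostNode, FlatNode, count_nodes and flatten are all hypothetical names, and a binary tree is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side node; the actual TreeRoot layout is not shown. */
typedef struct HostNode {
    int value;
    struct HostNode *left;
    struct HostNode *right;
} HostNode;

/* Hypothetical flat node: children are indices into one array,
 * and -1 means "no child". No pointers, so the array stays valid
 * wherever it is copied. */
typedef struct {
    int value;
    int left;
    int right;
} FlatNode;

/* Counts the nodes so the caller can size the flat array. */
static size_t count_nodes(const HostNode *n)
{
    return n ? 1 + count_nodes(n->left) + count_nodes(n->right) : 0;
}

/* Writes the subtree rooted at n into out[] in pre-order.
 * Returns the index of n in out[], or -1 for an empty subtree.
 * *next is the next free slot; cap is the capacity of out[]. */
static int flatten(const HostNode *n, FlatNode *out, int *next, int cap)
{
    if (!n)
        return -1;
    assert(*next < cap);
    int self = (*next)++;
    out[self].value = n->value;
    out[self].left = flatten(n->left, out, next, cap);
    out[self].right = flatten(n->right, out, next, cap);
    return self;
}
```

The resulting FlatNode array could then be uploaded in one go (for example with clCreateBuffer and CL_MEM_COPY_HOST_PTR), and the kernel would take a `__global const FlatNode *` and follow indices instead of pointers. Whether this fits depends on how the tree is modified afterwards.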