Hello everyone,

Now I'm doing an experiment on moving computations from CPU to GPU. For example, matrix multiplication: A x B can be easily done by C code running on CPU, but how can I dispatch...