I have a simple question regarding the best way to solve my problem.
Let's say I have a 2D array A on the device, and that after some computations done by a few kernels this array contains mostly zeros, with only a few non-zero values.
What I need is to find each index idx in the array (and the corresponding value) such that A[idx] != 0.
So far, I transfer the whole array from device to host memory and scan it on the host with a basic for loop.
The problem is that with this method the memory transfer is expensive and the serial loop is not very efficient. Since my array is sparse, I was thinking that maybe I could do the search on the GPU and then transfer only a small amount of information back (the non-zero indices and their values)?
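To make the idea concrete, here is a rough sketch of what I have in mind; it is not tested at scale. It assumes A holds floats and is stored contiguously in row-major order (no pitch), so a flat index maps back to row = idx / width and col = idx % width. The names compact_nonzeros, find_nonzeros and capacity are just placeholders I made up. Each thread checks one element and, if it is non-zero, reserves an output slot with atomicAdd, so only the counter and the few hits have to be copied back. Error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <utility>
#include <vector>

// One thread per element: a non-zero element reserves a slot in the output
// arrays through an atomic counter and stores its flat index and value.
// The counter keeps counting past capacity, but only the first `capacity`
// reserved slots are written.
__global__ void compact_nonzeros(const float* A, int n,
                                 int* out_idx, float* out_val,
                                 unsigned int* count, unsigned int capacity)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && A[i] != 0.0f) {
        unsigned int slot = atomicAdd(count, 1u);
        if (slot < capacity) {
            out_idx[slot] = i;
            out_val[slot] = A[i];
        }
    }
}

// Hypothetical host helper: d_A is a device pointer to n floats.
// Returns at most `capacity` (flat index, value) pairs, in no particular order.
std::vector<std::pair<int, float>> find_nonzeros(const float* d_A, int n,
                                                 unsigned int capacity)
{
    std::vector<std::pair<int, float>> result;
    if (n <= 0 || capacity == 0) return result;

    int* d_idx = nullptr;
    float* d_val = nullptr;
    unsigned int* d_count = nullptr;
    cudaMalloc(&d_idx, capacity * sizeof(int));
    cudaMalloc(&d_val, capacity * sizeof(float));
    cudaMalloc(&d_count, sizeof(unsigned int));
    cudaMemset(d_count, 0, sizeof(unsigned int));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    compact_nonzeros<<<blocks, threads>>>(d_A, n, d_idx, d_val, d_count, capacity);

    // This blocking copy also waits for the kernel to finish.
    unsigned int count = 0;
    cudaMemcpy(&count, d_count, sizeof(count), cudaMemcpyDeviceToHost);
    const unsigned int found = std::min(count, capacity);

    // Copy back only the entries that were actually written.
    std::vector<int> idx(found);
    std::vector<float> val(found);
    cudaMemcpy(idx.data(), d_idx, found * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(val.data(), d_val, found * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_idx);
    cudaFree(d_val);
    cudaFree(d_count);

    for (unsigned int k = 0; k < found; ++k)
        result.emplace_back(idx[k], val[k]);
    return result;
}
```

With this, the host would call something like find_nonzeros(d_A, width * height, 1024) and receive only the hits. The order of the results is not deterministic because of the atomics, and capacity has to be an upper bound on the number of non-zero entries I expect, otherwise the extra ones are dropped. I suppose a library stream-compaction routine such as thrust::copy_if would be another way to do the same thing.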
What do you think about it?
Thanks and happy new year