
Fast array initialization



nachovall
03-16-2011, 04:51 AM
Hi,

I want to allocate an array in GPU memory with every element initialized to a given value (for instance 0).

I tried creating a host array filled with zeros and then creating the buffer with the CL_MEM_COPY_HOST_PTR flag.

I also tried running a kernel with one thread per position of the array, and inside the kernel just assigning the given value.

The array is huge, by the way.

I wonder if there is a more efficient way of doing this.

Thank you.

sean.settle
03-16-2011, 10:56 AM
Someone can correct me if I'm wrong, but you could allocate the cl_mem buffer on the device using a NULL host pointer, then run a kernel to zero all the array elements before running your desired kernel. That way you don't have to allocate and zero the array on the host and then do a host-to-device memory transfer, which altogether could take a lot of time.

nachovall
03-16-2011, 10:59 AM
you could allocate the cl_mem buffer on the device using a NULL host pointer, then run a kernel to zero all the array elements before running your desired kernel.

That's what I tried to explain (badly, apparently) when I said:



I also tried running a kernel with one thread per position of the array, and inside the kernel just assigning the given value.


Thanks anyway.

sean.settle
03-16-2011, 11:30 AM
Ah, I'm sorry I misunderstood.

I guess another option (other than clEnqueueWriteBuffer) is to use clEnqueueMapBuffer to write to the device memory through a host pointer. There are some examples of how to do this in the AMD and NVIDIA SDKs under memory optimizations/bandwidth (PCIe). If you do the write asynchronously, you can overlap some other computation with the write to the buffer.

nachovall
03-16-2011, 12:15 PM
Ah, I'm sorry I misunderstood.

I guess another option (other than clEnqueueWriteBuffer) is to use clEnqueueMapBuffer to write to the device memory through a host pointer.

Nice one!! I'll try it.

Thank you very much.