Originally Posted by CNugteren
> I understand all that, I do have some experience programming CUDA and OpenCL for GPUs. The only thing I'm trying to do now is to omit the memory copies on the CPU.

Ok. I just find that puzzling. There are probably easier ways to parallelise code if you only have a CPU to work on. Just as an intellectual exercise?
> This is what I understand, please correct me if I'm wrong:
> * If I have a memory object which is created using 'CL_MEM_USE_HOST_PTR', it is meant to be accessed by the accelerator only (read/write).
> * After I've created such an object, I should not access the host version of it, as it contains undefined data.

Well, it depends on who is writing to it and on the read/write flags, but in general, where both sides are writing, then yes.

> * If I map the memory object, it is accessible by the host from that point on (either for read or write, specified as a flag to the API call), but the accelerator should not access it anymore, as it contains undefined data from the accelerator's perspective.

Again, it depends on who is writing it. If you're only mapping it for read, then both sides can still read it.

> * If I unmap the memory object, it goes back to the state it was in previously (accessible by the accelerator, not by the host).

Well, the heap memory won't be unmapped from the process: you will still have access to it. It's just that if you subsequently invoke a kernel, and have written to it in the meantime, there's no guarantee the kernel will see any of those writes.

If you're only reading a result or never use it in a kernel again, the data will stay around and be valid after you unmap it.
> Therefore, I print inside this map/unmap region (and at various other places), but it does not seem to work. I've made a link to the full version of the code here: http://dl.dropbox.com/u/26669157/opencl-cpu.tar.gz (I'm not asking you to go through the code, but maybe somebody is interested anyway - the printf is in line 247 of the example6_host.c file).
>
>     void* pointer_to_B = clEnqueueMapBuffer(bones_queue,device_B,CL_TRUE,CL_MAP_READ,(N * 1)*sizeof(int),0,0,NULL,NULL,&bones_errors);
>     error_check(bones_errors);

This looks wrong: you're passing the size as the offset, and mapping 0 bytes. That is, pointer_to_B will end up N*sizeof(int) bytes past the start of B (&B[N] if B is an int array), not at B itself.

(Another example of why actual source is much better than fragments/discussions.)
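To make the argument-order problem concrete, here is a small plain-C sketch that compiles without OpenCL. The OpenCL spec says that for a CL_MEM_USE_HOST_PTR buffer the pointer returned by clEnqueueMapBuffer is derived from the host_ptr given to clCreateBuffer; modelling that as exactly host_ptr + offset is an assumption made here for illustration. The helper mapped_address is hypothetical and only shows where the mapped pointer lands; the corrected real call (offset 0, size N*sizeof(int)) is given in a comment.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model (assumption, not the OpenCL API): where the result of
 * clEnqueueMapBuffer points for a CL_MEM_USE_HOST_PTR buffer, taken to be
 * host_ptr + offset bytes. The real argument order is:
 *
 *   clEnqueueMapBuffer(queue, buffer, blocking_map, map_flags,
 *                      offset, size, num_events_in_wait_list,
 *                      event_wait_list, event, errcode_ret);
 *
 * so the quoted call should pass 0 as the offset and N * sizeof(int) as
 * the size:
 *
 *   void *pointer_to_B = clEnqueueMapBuffer(bones_queue, device_B, CL_TRUE,
 *       CL_MAP_READ, 0, N * sizeof(int), 0, NULL, NULL, &bones_errors);
 */
void *mapped_address(void *host_ptr, size_t offset, size_t size)
{
    assert(host_ptr != NULL);
    (void)size; /* size limits how much is mapped, not where it starts */
    return (char *)host_ptr + offset;
}
```

With int B[N], mapped_address(B, N * sizeof(int), 0), which mirrors the quoted argument order, gives &B[N] (one past the end of the array), while mapped_address(B, 0, N * sizeof(int)) gives B itself.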
> A small question to end with: why do I want to 'unmap' the object? After my OpenCL kernel has run I will do a lot of computations on the resulting data outside of OpenCL, no kernels anymore. Ideally I just want to 'map' the object directly after the kernel has ended to give it back to the CPU and never 'unmap' it. That doesn't seem to be possible, as the 'map' requires you to specify either read or write.
>
> Thanks again for the help!

Because it's part of the API? You've effectively allocated a resource, so it's just a resource-management issue.

But anyway, unmapping will work if you are either only reading it, or never using that same buffer again in a kernel. If you need to do some processing and subsequently invoke another kernel on it, you will either need to keep the map around during the whole host-side update, or alternatively release the buffer and create a new one when you need it again.