Thread: How to synchronize iterations?

  1. #1
    Junior Member
    Join Date: Mar 2012
    Posts: 29

    How to synchronize iterations?

    The host provides a 3D vector field, i.e. a 4D float array:

    field[Nx][Ny][Nz][3]

    The first three dimensions represent a lattice, and the fourth dimension (of length 3) stores the vector components x, y, z at each lattice point. Before the structure is passed to the kernel, it is flattened to a 1D array of length 3*Nx*Ny*Nz. Inside the kernel, each lattice point (i.e. each vector) has to be iterated for, say, 10 steps. BUT: each iteration step depends on the values at the adjacent lattice points (6 per lattice point). Without this dependency I could simply let each work-item do all 10 steps for its lattice point, since the points would be independent. With it, every lattice point has to reach the current step before any lattice point can proceed to the next one.

    Is there a way to cope with this? I'm not very experienced with OpenCL.
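    For reference, a minimal CPU-side sketch of the layout described above. It assumes the flattening is done in C (row-major) order, so the three components of the vector at lattice point (i, j, k) are contiguous, and it assumes periodic boundaries when collecting the 6 neighbours. Both choices, and the helper names flat_index, neighbours and flatten, are illustrative assumptions, not part of the original code.

```python
def flat_index(i, j, k, c, ny, nz):
    """Offset of field[i][j][k][c] in the flattened array (row-major / C order)."""
    return ((i * ny + j) * nz + k) * 3 + c


def neighbours(i, j, k, nx, ny, nz):
    """The 6 face-adjacent lattice points of (i, j, k).

    Assumption: periodic boundaries. A real solver might clamp or apply
    a different boundary condition instead.
    """
    return [
        ((i - 1) % nx, j, k), ((i + 1) % nx, j, k),
        (i, (j - 1) % ny, k), (i, (j + 1) % ny, k),
        (i, j, (k - 1) % nz), (i, j, (k + 1) % nz),
    ]


def flatten(field):
    """Flatten a nested field[Nx][Ny][Nz][3] into a list of length 3*Nx*Ny*Nz."""
    return [v for plane in field for row in plane for vec in row for v in vec]
```

    For example, with Ny = 3 and Nz = 4, flat_index(1, 2, 3, 0, 3, 4) is 69. In a kernel launched over an Nx x Ny x Nz global range, the same arithmetic could map get_global_id(0..2) to the offset of "its" vector.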

  2. #2
    Senior Member
    Join Date: Aug 2011
    Posts: 271

    Re: How to synchronize iterations?

    The only way to synchronise global memory writes across multiple work groups is to launch another kernel, i.e. do one step per kernel launch. The best way to think about it is that, within a single launch, global memory is either read-only, write-only, or read-write only for an address range that a single work group owns.

    There are several reasons for this. Not all work groups are necessarily running at the same time, because they may not all fit on the device at once. Also, the hardware can run faster when it doesn't have to keep memory coherent across 20+ compute units.
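    To make the one-step-per-launch idea concrete, here is a small pure-Python emulation (no OpenCL involved). The update rule new[i] = old[i-1] + old[i+1] on a periodic 1D lattice is a made-up stand-in for the real neighbour-dependent update, and the function names are hypothetical. step plays the role of one kernel launch: it only reads src and only writes dst, matching the read-only/write-only rule above. The host loop swaps the two buffers between launches, which in OpenCL would mean passing the two device buffers to the kernel in swapped order.

```python
def step(src, dst):
    """One emulated kernel launch: work-item i reads only src, writes only dst[i]."""
    n = len(src)
    for i in range(n):  # on a device these iterations run in parallel, in any order
        dst[i] = src[(i - 1) % n] + src[(i + 1) % n]


def run(field, n_steps):
    """Host loop: one launch per step, swapping the two buffers in between."""
    a, b = list(field), [0] * len(field)
    for _ in range(n_steps):
        step(a, b)   # the end of a launch acts as a global synchronisation point
        a, b = b, a  # next launch reads what this one wrote
    return a


def run_in_place(field, n_steps):
    """Broken variant: updating a single buffer in place lets later points read
    neighbours that were already updated in the same step. (This sequential
    emulation is deterministic; on a device the result would also depend on
    scheduling.)"""
    a = list(field)
    n = len(a)
    for _ in range(n_steps):
        for i in range(n):
            a[i] = a[(i - 1) % n] + a[(i + 1) % n]
    return a
```

    With the start value [1, 0, 0, 0], run(..., 1) gives [0, 1, 0, 1], while run_in_place(..., 1) collapses to all zeros, because point 0 overwrites the value that point 1 still needs.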

    Atomics are not a solution here: they would be too slow, and they are not designed for this.

    If the dependency were only local, some synchronisation could instead happen inside the kernel, using local memory and barriers. For example, depending on what you do with the adjacent values, you could over-calculate overlapping regions so that the synchronisation can be done in-kernel. But I doubt this is the case.
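    One possible reading of "over-calculate overlapping regions", sketched in pure Python on a 1D periodic lattice. The stencil new[i] = old[i-1] + old[i+1] and all function names are made up for illustration. Each emulated work group loads its tile plus a halo as wide as the number of steps it takes without global synchronisation, then recomputes the halo cells that neighbouring tiles also compute. The stitched result matches the step-by-step global computation, at the cost of redundant work on the halos.

```python
def global_steps(u, s):
    """Reference: s full steps of new[i] = u[i-1] + u[i+1] on a periodic 1D lattice."""
    n = len(u)
    for _ in range(s):
        u = [u[(i - 1) % n] + u[(i + 1) % n] for i in range(n)]
    return u


def tile_steps(u, start, width, s):
    """Emulate one work group: copy its tile plus an s-cell halo on each side into
    (emulated) local memory, then take s steps without touching other tiles."""
    n = len(u)
    local = [u[(start + d) % n] for d in range(-s, width + s)]
    for _ in range(s):
        # The outermost cells lack a neighbour, so the valid region shrinks by one
        # cell per side per step; those halo cells were only over-calculation.
        # In a real kernel, work-items would synchronise with
        # barrier(CLK_LOCAL_MEM_FENCE) between steps.
        local = [local[d - 1] + local[d + 1] for d in range(1, len(local) - 1)]
    return local  # exactly `width` valid cells remain after s steps


def tiled_steps(u, width, s):
    """Run every tile independently (no cross-group sync) and stitch the results."""
    assert len(u) % width == 0, "sketch assumes the lattice splits into whole tiles"
    out = []
    for start in range(0, len(u), width):
        out.extend(tile_steps(u, start, width, s))
    return out
```

    The halo width grows with the number of fused steps, so this only pays off for a few steps per launch; the host loop still provides the global synchronisation between batches.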

  3. #3
    Junior Member
    Join Date: Mar 2012
    Posts: 29

    Re: How to synchronize iterations?

    Quote Originally Posted by notzed
    The only way to synchronise global memory writes across multiple work groups is to run another kernel. i.e. do one step at a time. The best way to think about it is that global memory is either read-only, write-only, or only read-write to the same address range (per work group).

    I also thought about this possibility. Does the data have to be moved between host and device on every kernel call, or can it stay in device memory until the last time step/call is done and then be fetched to the host only once? (I'm not that familiar with the OpenCL memory model.)

  4. #4
    Junior Member
    Join Date: Dec 2011
    Posts: 25

    Re: How to synchronize iterations?

    Yes, it stays on the device and persists between kernel calls. Call clEnqueueReadBuffer when you want to copy it back to the host.

  5. #5
    Senior Member
    Join Date: Aug 2011
    Posts: 271

    Re: How to synchronize iterations?

    Quote Originally Posted by MaximS
    (I'm not that familiar with the OpenCL memory model.)

    You should probably read the relevant parts of the spec; section 3.3 covers the memory model. Chapter 3 as a whole is a fairly light read and a good introduction to the architecture, so you should at least read that.

  6. #6
    Junior Member
    Join Date: Mar 2012
    Posts: 29

    Re: How to synchronize iterations?

    Thanks a lot for the hints!

  7. #7
    Junior Member
    Join Date: Mar 2012
    Posts: 29

    Re: How to synchronize iterations?

    OK, I've read section 3. Now I have a question about synchronization. Currently I'm using the AMD APP SDK on an Intel Core 2, but in the near future I will switch to an NVIDIA GTX 560. The device info query says:

    MAX_WORK_ITEM_SIZES: [1024, 1024, 1024]

    So, if my vector field matrix doesn't exceed these dimensions, I can synchronize the work-items inside the kernel, right? Would that be more efficient than synchronizing by relaunching the kernel?

  8. #8
    Junior Member
    Join Date: Dec 2011
    Posts: 25

    Re: How to synchronize iterations?

    You should read the NVIDIA OpenCL Programming Guide and the OpenCL Best Practices Guide from here: http://developer.nvidia.com/nvidia-g...-documentation. There are many ways you can organise your workgroups and workitems. For the GPU in particular, yes, you can put them all in one workgroup (as long as the total fits within the device's maximum workgroup size), but you won't get very good performance, because that workgroup will only run on a single SM (you can't synchronise across SMs).
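    Note that MAX_WORK_ITEM_SIZES only bounds each dimension of a workgroup separately; the total number of work-items in one workgroup is bounded by CL_DEVICE_MAX_WORK_GROUP_SIZE (and a particular kernel may be limited further by CL_KERNEL_WORK_GROUP_SIZE). A minimal sketch of that check; the function name is hypothetical and the limit of 1024 used below is only a placeholder, so query the real values from your device.

```python
from math import prod  # Python 3.8+


def fits_one_workgroup(local_size, max_work_item_sizes, max_work_group_size):
    """True if local_size is a legal OpenCL workgroup shape for the device.

    Each dimension must not exceed CL_DEVICE_MAX_WORK_ITEM_SIZES[d], and the
    product of all dimensions must not exceed CL_DEVICE_MAX_WORK_GROUP_SIZE.
    """
    per_dim_ok = all(
        size <= limit for size, limit in zip(local_size, max_work_item_sizes)
    )
    return per_dim_ok and prod(local_size) <= max_work_group_size
```

    With MAX_WORK_ITEM_SIZES = [1024, 1024, 1024] and an assumed maximum workgroup size of 1024, a 16 x 16 x 4 block fits, but a 32 x 32 x 32 lattice (32768 work-items) does not, even though every dimension is well below 1024.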

    You want to size your workgroups in multiples of 32 workitems (a warp), and then launch however many workgroups you need, choosing the multiple that gives the best performance. As I said, the programming guides explain this very well. But yes, if you want to synchronise across all workitems inside a kernel, you need one big workgroup.
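    A small sketch of the sizing arithmetic, assuming the lattice is launched as a flat 1D range of Nx*Ny*Nz work-items (one per vector). In OpenCL 1.x the global size must be a multiple of the local size, so the global size is padded up; the function names are hypothetical.

```python
def round_up(n, multiple):
    """Smallest multiple of `multiple` that is >= n."""
    return ((n + multiple - 1) // multiple) * multiple


def launch_geometry(n_points, local_size=32):
    """Pad a 1D global size to a multiple of local_size (e.g. one 32-wide warp).

    Returns (global_size, number_of_workgroups). The padding work-items
    (index >= n_points) must be guarded in the kernel so they do not read
    or write out of bounds.
    """
    global_size = round_up(n_points, local_size)
    return global_size, global_size // local_size
```

    For example, 1000 lattice points with a local size of 32 give a global size of 1024 in 32 workgroups; trying local sizes of 64, 128 or 256 and timing each is one way to find the best multiple.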

  9. #9
    Junior Member
    Join Date: Mar 2012
    Posts: 29

    Re: How to synchronize iterations?

    How much overhead is there if I move the iteration loop out of the kernel and put the kernel calls in a host-side iteration loop? I tried this, and performance dropped so much that it makes no sense.

    The loop in the Python code now looks like this:

    Code :
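            # Retrieve the kernel once, outside the loop: in PyOpenCL,
            # program.<name> may construct a new Kernel object on every access.
            kernel = self.theCLTool.program.solve_LLG_heun
            for nt in range(int((time - self.time) / dt)):
                # Enqueue one time step; the buffers stay in device memory
                # between launches (global size = lattice dimensions, local
                # size left to the implementation via None).
                kernel(
                    self.theCLTool.queue,
                    self.thePhysicalObject.dimensions,
                    None,
                    self.theCLDataBuffer,
                    self.theCLParameterBuffer)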

    Before, I only had to call the kernel once. Now the total calculation time has increased by a factor of 10000 or more! Is this to be expected, or is there something I'm missing?
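    Separately from the performance question, the step count in this loop is computed with int((time - self.time) / dt), which truncates. Floating-point division can land just below a whole number, which silently drops the last step. A minimal illustration, assuming the interval is meant to be a whole number of time steps (the function names are made up for this example):

```python
def n_steps_truncated(t0, t1, dt):
    """What the loop does now: int() truncates toward zero."""
    return int((t1 - t0) / dt)


def n_steps_rounded(t0, t1, dt):
    """Round to the nearest whole number of steps instead (returns an int)."""
    return round((t1 - t0) / dt)


print((0.3 - 0.0) / 0.1)                 # 2.9999999999999996
print(n_steps_truncated(0.0, 0.3, 0.1))  # 2
print(n_steps_rounded(0.0, 0.3, 0.1))    # 3
```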

  10. #10
    Junior Member
    Join Date
    Mar 2012
    Posts
    29

    Re: How to synchronize iterations?

    Message deleted.
