Thread: how to do task parallel and data parallel on the same device

  1. #1
    Junior Member
    Join Date
    Feb 2010
    Posts
    12

    how to do task parallel and data parallel on the same device

    I have an algorithm that can be implemented with 34 work items executing the same kernel (launched with clEnqueueNDRangeKernel), i.e. the SIMD (data-parallel) model in OpenCL. With only 34 work items in flight, the GPU is badly underutilized.

    To measure the maximum throughput of the GPU, I want to push as many instances of this algorithm as possible to the device so that all of its compute elements are busy, i.e. I want task parallelism at the same time. Can anyone tell me how to do that? My understanding is that an OpenCL command queue behaves like a single-server queue: two clEnqueueNDRangeKernel commands can't execute at the same time on the GPU even when resources are available. How can I make the device execute multiple algorithm instances concurrently while keeping the data parallelism inside each instance?

  2. #2

    Re: how to do task parallel and data parallel on the same device

    The next-generation Fermi cards will be able to execute multiple kernels at the same time. Current cards, however, can only execute one kernel at a time, and there is no way around this. Why not place each set of 34 work items into its own work group, and then launch many work groups with a single kernel invocation?

    Remember, you need thousands of work items running to fully utilize a GPU.

    -Brian
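
The suggestion above (one work group per algorithm instance, many groups in one launch) can be sketched as follows. This is a minimal CPU-side emulation of the index mapping in plain C, not real device code: `kernel_body`, `launch_all_instances`, and the doubling arithmetic are hypothetical placeholders, since the thread does not show the actual algorithm. The OpenCL calls the loops stand in for are named in the comments.

```c
#include <assert.h>
#include <stddef.h>

#define ITEMS_PER_INSTANCE 34  /* work items per algorithm instance (from the thread) */

/* Stand-in for the kernel body. In OpenCL C, `instance` would come from
 * get_group_id(0) and `lane` from get_local_id(0). The arithmetic is a
 * placeholder, not the poster's algorithm. */
static void kernel_body(size_t instance, size_t lane,
                        const float *in, float *out)
{
    /* Same value get_global_id(0) would return for this work item. */
    size_t idx = instance * ITEMS_PER_INSTANCE + lane;
    out[idx] = 2.0f * in[idx];
}

/* CPU emulation of ONE kernel launch covering every instance. On a device
 * the two loops would be replaced by:
 *   size_t local  = ITEMS_PER_INSTANCE;
 *   size_t global = n_instances * ITEMS_PER_INSTANCE;
 *   clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
 *                          &global, &local, 0, NULL, NULL);
 * Returns the global work size that launch would use. */
static size_t launch_all_instances(size_t n_instances,
                                   const float *in, float *out)
{
    assert(n_instances > 0);
    for (size_t g = 0; g < n_instances; ++g)                /* work groups */
        for (size_t l = 0; l < ITEMS_PER_INSTANCE; ++l)     /* work items  */
            kernel_body(g, l, in, out);
    return n_instances * ITEMS_PER_INSTANCE;
}
```

With this layout, each instance's input and output occupy a contiguous slice of 34 elements, and raising `n_instances` into the hundreds or thousands is what supplies the large work-item count mentioned above.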

  3. #3
    Junior Member
    Join Date
    Feb 2010
    Posts
    12

    Re: how to do task parallel and data parallel on the same device

    Quote Originally Posted by coleb
    The next-generation Fermi cards will be able to execute multiple kernels at the same time. Current cards, however, can only execute one kernel at a time, and there is no way around this. Why not place each set of 34 work items into its own work group, and then launch many work groups with a single kernel invocation?

    Remember, you need thousands of work items running to fully utilize a GPU.

    -Brian
    Thank you very much for the information. I will see whether I can use work groups to achieve this with my algorithm.
