Thread: CL_DEVICE_MAX_WORK_GROUP_SIZE

  1. #1
    Junior Member
    Join Date
    Mar 2011
    Posts
    28

    CL_DEVICE_MAX_WORK_GROUP_SIZE

    Hello

    I'm getting great support here on the forum, but a moment ago I ran into a confusing situation, so I started another thread.

    Theoretically I should be able to run my kernel with 512 work items per work-group. But when I specify local_work_size=512 for clEnqueueNDRangeKernel, no work is done, and nearly 10,000 runs of my kernel finish surprisingly fast. When I decrease local_work_size to 256, the kernel runs normally and performs the computations.

    This is part of device query for OpenCL:
    Code :
     ---------------------------------
     Device Quadro NVS 140M
     ---------------------------------
      CL_DEVICE_NAME:                       Quadro NVS 140M
      CL_DEVICE_VENDOR:                     NVIDIA Corporation
      CL_DRIVER_VERSION:                    260.99
      CL_DEVICE_VERSION:                    OpenCL 1.0 CUDA
      CL_DEVICE_TYPE:                       CL_DEVICE_TYPE_GPU
      CL_DEVICE_MAX_COMPUTE_UNITS:          2
      CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:   3
      CL_DEVICE_MAX_WORK_ITEM_SIZES:        512 / 512 / 64
      CL_DEVICE_MAX_WORK_GROUP_SIZE:        512
      CL_DEVICE_MAX_CLOCK_FREQUENCY:        800 MHz
      CL_DEVICE_ADDRESS_BITS:               32
      CL_DEVICE_MAX_MEM_ALLOC_SIZE:         128 MByte
      CL_DEVICE_GLOBAL_MEM_SIZE:            113 MByte
      CL_DEVICE_ERROR_CORRECTION_SUPPORT:   no
      CL_DEVICE_LOCAL_MEM_TYPE:             local
      CL_DEVICE_LOCAL_MEM_SIZE:             16 KByte
      CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:   64 KByte
      CL_DEVICE_QUEUE_PROPERTIES:           CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
      CL_DEVICE_QUEUE_PROPERTIES:           CL_QUEUE_PROFILING_ENABLE
      CL_DEVICE_IMAGE_SUPPORT:              1
      CL_DEVICE_MAX_READ_IMAGE_ARGS:        128
      CL_DEVICE_MAX_WRITE_IMAGE_ARGS:       8
      CL_DEVICE_SINGLE_FP_CONFIG:           INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma

    Is it possible that the driver/OpenCL is simply giving me wrong information about the maximum number of work items allowed in a work-group?
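
    A kernel that appears to finish almost instantly without producing results often means that the enqueue call itself failed and its return code was never inspected. The sketch below, in C, shows one way to decode that return code. The numeric values are the ones defined in the Khronos cl.h header and are repeated only so the sketch compiles without an OpenCL SDK; the names describe_enqueue_error and enqueue_succeeded are hypothetical, and which code a particular driver reports for an oversized work-group is an assumption (the spec lists both CL_INVALID_WORK_GROUP_SIZE and CL_OUT_OF_RESOURCES as possibilities).

```c
#include <stdio.h>

/* Error codes relevant to an oversized local_work_size. The values match the
 * Khronos cl.h header; they are repeated here under SKETCH_ names so that the
 * sketch builds without an OpenCL SDK. With the real header, use the CL_
 * constants directly instead. */
enum {
    SKETCH_CL_SUCCESS                 = 0,
    SKETCH_CL_OUT_OF_RESOURCES        = -5,
    SKETCH_CL_INVALID_WORK_GROUP_SIZE = -54,
    SKETCH_CL_INVALID_WORK_ITEM_SIZE  = -55
};

/* Hypothetical helper: turn the value returned by clEnqueueNDRangeKernel()
 * into a readable name. Only the codes listed above are recognised. */
static const char *describe_enqueue_error(int err)
{
    switch (err) {
    case SKETCH_CL_SUCCESS:                 return "CL_SUCCESS";
    case SKETCH_CL_OUT_OF_RESOURCES:        return "CL_OUT_OF_RESOURCES";
    case SKETCH_CL_INVALID_WORK_GROUP_SIZE: return "CL_INVALID_WORK_GROUP_SIZE";
    case SKETCH_CL_INVALID_WORK_ITEM_SIZE:  return "CL_INVALID_WORK_ITEM_SIZE";
    default:                                return "other OpenCL error";
    }
}

/* Hypothetical helper: returns 1 if the enqueue succeeded, otherwise prints
 * the decoded error to stderr and returns 0. */
static int enqueue_succeeded(int err)
{
    if (err != SKETCH_CL_SUCCESS) {
        fprintf(stderr, "clEnqueueNDRangeKernel failed: %s (%d)\n",
                describe_enqueue_error(err), err);
        return 0;
    }
    return 1;
}
```

    In real host code the argument would be the cl_int returned by clEnqueueNDRangeKernel(), checked on every launch rather than only the first one.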

  2. #2
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: CL_DEVICE_MAX_WORK_GROUP_SIZE

    Theoretically I should be able to run my kernel with 512 work items in workgroup.
    I believe you are confusing the maximum work-group size supported by the device with the maximum work-group size that can be used to run a particular kernel. A complex kernel needs more resources per work-item (registers, for example), so its maximum work-group size can be smaller than the maximum the device supports for a very simple kernel.

    That's why in OpenCL there are two different queries. One of them is clGetDeviceInfo(..., CL_DEVICE_MAX_WORK_GROUP_SIZE, ...) -- this is the maximum for the device. The other one is clGetKernelWorkGroupInfo(..., CL_KERNEL_WORK_GROUP_SIZE, ...) -- this one is the maximum value you can pass to clEnqueueNDRangeKernel() for this kernel.
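
    The two queries can be sketched in C as follows. This is only an illustration: the helper names local_size_allowed and query_work_group_limits are hypothetical, and the part that calls into OpenCL is wrapped in a made-up SKETCH_USE_OPENCL guard so that the plain check still compiles on a machine without an SDK. The header path <CL/cl.h> is the usual one on Linux and Windows; other platforms may differ.

```c
#include <stddef.h>

/* Hypothetical helper: a requested local_work_size is usable only if it is
 * non-zero and does not exceed the per-kernel limit reported for
 * CL_KERNEL_WORK_GROUP_SIZE. */
static int local_size_allowed(size_t requested, size_t kernel_max)
{
    return requested > 0 && requested <= kernel_max;
}

#ifdef SKETCH_USE_OPENCL  /* define only when an OpenCL SDK is available */
#include <CL/cl.h>

/* Fetch the device-wide and the per-kernel maximum work-group sizes.
 * Both values are returned by OpenCL as size_t.
 * Returns CL_SUCCESS, or the error code of the first query that failed. */
static cl_int query_work_group_limits(cl_kernel kernel, cl_device_id device,
                                      size_t *device_max, size_t *kernel_max)
{
    cl_int err = clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                                 sizeof(*device_max), device_max, NULL);
    if (err != CL_SUCCESS)
        return err;
    return clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                                    sizeof(*kernel_max), kernel_max, NULL);
}
#endif
```

    The per-kernel query has to be made after the program is built and the kernel object is created, since the answer depends on the compiled kernel.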
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  3. #3
    Senior Member
    Join Date
    Mar 2011
    Location
    Seoul
    Posts
    118

    Re: CL_DEVICE_MAX_WORK_GROUP_SIZE

    Quote Originally Posted by david.garcia
    In OpenCL there are two different queries. One of them is clGetDeviceInfo(..., CL_DEVICE_MAX_WORK_GROUP_SIZE, ...) -- this is the maximum for the device. The other one is clGetKernelWorkGroupInfo(..., CL_KERNEL_WORK_GROUP_SIZE, ...) -- this one is the maximum value you can pass to clEnqueueNDRangeKernel() for this kernel.
    Is CL_KERNEL_WORK_GROUP_SIZE guaranteed to be less than or equal to CL_DEVICE_MAX_WORK_GROUP_SIZE, or would one have to explicitly take the minimum of the two?

  4. #4
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: CL_DEVICE_MAX_WORK_GROUP_SIZE

    Is CL_KERNEL_WORK_GROUP_SIZE guaranteed to be less than or equal to CL_DEVICE_MAX_WORK_GROUP_SIZE, or would one have to explicitly take the minimum of the two?
    Technically speaking, I don't think the spec guarantees that CL_KERNEL_WORK_GROUP_SIZE must be less than or equal to CL_DEVICE_MAX_WORK_GROUP_SIZE.

    However, the spec describes CL_KERNEL_WORK_GROUP_SIZE as the "maximum work-group size that can be used to execute a kernel on a specific device given by <device>", and that's all the application cares about.

    CL_DEVICE_MAX_WORK_GROUP_SIZE is pretty much irrelevant.
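
    For anyone who would rather not rely on the relationship between the two values, taking the minimum explicitly costs nothing. A minimal sketch in C, with a hypothetical helper name (clamp_local_size) and the two limits assumed to have been queried already:

```c
#include <stddef.h>

/* Hypothetical helper: clamp a requested local_work_size to the smaller of
 * the device-wide limit (CL_DEVICE_MAX_WORK_GROUP_SIZE) and the per-kernel
 * limit (CL_KERNEL_WORK_GROUP_SIZE). */
static size_t clamp_local_size(size_t requested, size_t device_max,
                               size_t kernel_max)
{
    size_t limit = kernel_max < device_max ? kernel_max : device_max;
    return requested < limit ? requested : limit;
}
```

    If the kernel limit never exceeds the device limit on a given implementation, the extra comparison is simply redundant.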

  5. #5
    Junior Member
    Join Date
    Mar 2011
    Posts
    28

    Re: CL_DEVICE_MAX_WORK_GROUP_SIZE

    Following the documentation at http://www.khronos.org/registry/cl/s...GroupInfo.html, I wrote a query, and after executing it:
    Code :
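    size_t result = 0;  /* CL_KERNEL_WORK_GROUP_SIZE is returned as a size_t, not a uint */
    size_t size_ret = 0;
    /* OpenCL is presumably the cl_kernel handle; a NULL device is only valid if the kernel is associated with a single device */
    cl_int err = clGetKernelWorkGroupInfo(OpenCL, NULL, CL_KERNEL_WORK_GROUP_SIZE, sizeof(result), &result, &size_ret);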

    result is 320.

    Running my kernel with global_work_size=640 and local_work_size=320 gives correct results. The computations are still not faster than on the CPU, but that is a subject for a separate thread...

    Thanks
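
    One more constraint is worth keeping in mind when picking the local size: in OpenCL 1.x, global_work_size must be evenly divisible by local_work_size, so the kernel limit cannot always be used as is. The sketch below shows two common ways of handling that in C. Both helper names are hypothetical, and the second approach assumes the kernel itself checks its global ID against the real item count.

```c
#include <stddef.h>

/* Hypothetical helper: largest local size that is <= limit and divides
 * global_size evenly. Returns 0 if either argument is 0. */
static size_t largest_dividing_local_size(size_t global_size, size_t limit)
{
    size_t candidate;

    if (global_size == 0 || limit == 0)
        return 0;
    candidate = limit < global_size ? limit : global_size;
    while (global_size % candidate != 0)
        --candidate;  /* stops at 1 at the latest, since 1 divides everything */
    return candidate;
}

/* Hypothetical helper: alternative approach that pads the global size up to
 * the next multiple of local_size. The kernel must then skip the padding
 * work-items. Returns 0 if local_size is 0. */
static size_t round_up_global_size(size_t work_items, size_t local_size)
{
    if (local_size == 0)
        return 0;
    return ((work_items + local_size - 1) / local_size) * local_size;
}
```

    The first helper keeps the global size exact but can fall back to very small groups when the item count has few divisors; the second keeps the group size fixed at the cost of a few idle work-items.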
