Thread: Higher register usage after migration from CUDA

  1. #1

    Higher register usage after migration from CUDA

    Hi,

    I've just migrated my program from CUDA to OpenCL. It took a fair bit of work to change all the host code: device initialization, memory allocation, kernel execution, etc.

    For the device code (kernels), however, the changes were very small:
    - replacing __syncthreads() with barrier(CLK_LOCAL_MEM_FENCE)
    - changing sqrtf(x) to sqrt((float)x)
    - allocating constant memory dynamically instead of statically as in CUDA (with a __constant kernel argument and the appropriate call to clSetKernelArg() on the host side)

    Those were the only changes made.

    Unfortunately, the OpenCL version consumes more registers than the native CUDA version.
    Even passing -cl-nv-maxrregcount to clBuildProgram, with the limit set to the register count achieved by the native CUDA compilation, didn't help much. The limit is applied (judging by the BUILD_LOG from clGetProgramBuildInfo), but there are spills to private memory ("local" in CUDA nomenclature) and overall kernel performance is lower.
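
For reference, a minimal sketch of how such an option string might be assembled before it is handed to clBuildProgram. The helper name nv_build_options is hypothetical, the "=N" spelling of the register limit is an assumption about what the NVIDIA compiler accepts, and -cl-nv-verbose (also NVIDIA-specific) is included on the assumption that it is what makes register and spill statistics appear in the build log.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes an option string for clBuildProgram into buf.
 * Assumes the NVIDIA-specific "-cl-nv-maxrregcount=N" spelling.
 * Returns 0 on success, -1 if the string does not fit in buf. */
int nv_build_options(char *buf, size_t size, int max_regs)
{
    int n = snprintf(buf, size,
                     "-cl-fast-relaxed-math -cl-nv-verbose "
                     "-cl-nv-maxrregcount=%d",
                     max_regs);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* The resulting string would then be passed as the options argument, e.g.
 *   clBuildProgram(program, 1, &device, buf, NULL, NULL);
 * where program and device are objects created elsewhere in the host code. */
```

The register limit would be the count reported by the native CUDA build of the same kernel; a limit that is too low forces the spills described above.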

    I've been experimenting with the -cl-nv-opt-level build option but achieved nothing.
    With both CUDA and OpenCL I'm using the fast math option; I also checked without it, and it made no difference. Every time, the OpenCL register usage is higher or there are spills. The difference can be as high as 7 registers, and it ruins performance (fewer blocks/work-groups can be processed simultaneously on a multiprocessor).
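
To illustrate why a handful of extra registers can matter this much, here is a rough sketch of the register-limited residency calculation. It assumes a compute capability 1.0/1.1 device (8192 registers, at most 768 resident threads and 8 resident blocks per multiprocessor) and ignores register allocation granularity and local (shared) memory limits. The function name and the register counts in the usage note are illustrative, not taken from the post.

```c
/* Per-multiprocessor limits, assuming compute capability 1.0/1.1. */
enum {
    REGS_PER_SM        = 8192,
    MAX_THREADS_PER_SM = 768,
    MAX_GROUPS_PER_SM  = 8
};

static int min_int(int a, int b) { return a < b ? a : b; }

/* Upper bound on work-groups (blocks) resident on one multiprocessor.
 * Simplified: ignores allocation granularity and shared memory usage. */
int resident_groups(int regs_per_thread, int threads_per_group)
{
    if (regs_per_thread <= 0 || threads_per_group <= 0)
        return 0;
    int by_regs    = REGS_PER_SM / (regs_per_thread * threads_per_group);
    int by_threads = MAX_THREADS_PER_SM / threads_per_group;
    return min_int(min_int(by_regs, by_threads), MAX_GROUPS_PER_SM);
}
```

With hypothetical numbers, resident_groups(10, 256) gives 3 while resident_groups(17, 256) gives 1, so under these assumptions 7 extra registers per thread would cut the number of resident work-groups of 256 threads from three to one.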

    What is the cause of this behaviour? Is it simply a poor OpenCL compiler?

  2. #2

    Re: Higher register usage after migration from CUDA

    EDIT: I forgot to mention that CUDA compiles my kernels for arch 11, while OpenCL only targets arch 10. Since I don't know how to force OpenCL to compile for a particular architecture (is it possible?), I compiled with CUDA for arch 10 instead: no change, the OpenCL register usage is still higher.
