
Thread: register usage : float3 vs float4

  1. #1
    Junior Member
    Join Date
    Mar 2013
    Posts
    8

    register usage : float3 vs float4

    Hi everybody,

    I'm currently working with OpenCL and I'm running into problems with high per-thread register usage in my main kernel.

    The main kernel uses quite a lot of float4 values, but most of the time they could actually be float3. I know that cl_float3 is a typedef of cl_float4 on the host, and I also know that float3 on the device side occupies 16 bytes.

    Am I right in thinking that the extra, unused float is a waste of a register?

    If so, is there a trick to work around this problem?


    Sorry for my bad English.

    Roger

    Ty

  2. #2
    Senior Member
    Join Date
    Oct 2012
    Posts
    166

    Re: register usage : float3 vs float4

    You are correct: float3 is stored as a float4 to satisfy the alignment. There are ways to use packed float3 data in kernels; have a look at the vloadn functions (vload3 for three components). But that is much slower than using cl_float3 (which is a cl_float4). As for the register question, I'm not really sure. The best approach is to use the 4th component for data you will need somewhere else in your computation (the index of the vector in host memory, etc.).
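
    For anyone reading later, here is a rough host-side C sketch of the layout trade-off described above. The names padded_vec3, load3_packed, packed_bytes and padded_bytes are made up for this illustration and are not part of OpenCL. The struct only models the 16-byte footprint of cl_float3/cl_float4 (assuming 4-byte floats), and load3_packed only mimics the element addressing that vload3(i, p) uses (p[3*i] through p[3*i+2]). It says nothing about speed or register usage on a real device.

```c
#include <stddef.h>

/* Hypothetical stand-in for a padded 3-vector: three floats plus one
   spare slot, 16 bytes in total (assuming 4-byte floats). */
typedef struct {
    float x, y, z;
    float w;   /* spare slot: can carry data the kernel needs anyway */
} padded_vec3;

/* Packed storage keeps only x, y, z (12 bytes per element).
   Mirrors the addressing of vload3(i, p): elements 3*i .. 3*i+2. */
static padded_vec3 load3_packed(size_t i, const float *p)
{
    padded_vec3 v = { p[3 * i], p[3 * i + 1], p[3 * i + 2], 0.0f };
    return v;
}

/* Buffer size in bytes for n vectors stored packed (3 floats each). */
static size_t packed_bytes(size_t n) { return n * 3 * sizeof(float); }

/* Buffer size in bytes for n vectors stored padded (4 floats each). */
static size_t padded_bytes(size_t n) { return n * sizeof(padded_vec3); }
```

    The padded layout costs one extra float per vector in memory, which is why putting something useful in the w slot is attractive.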

  3. #3
    Junior Member
    Join Date
    Mar 2013
    Posts
    8

    Re: register usage : float3 vs float4

    OK, it seems the NVIDIA compiler is able to save registers when I use float3 instead of float4.
    And I can keep float4 in global memory to preserve the alignment, converting with as_float3.

    Roger
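
    A minimal sketch of what this could look like, assuming a kernel that simply scales each vector. The kernel name scale3, its arguments, and the helper scale3_src_len are invented for illustration. The kernel is held as a C string, the form a host program would pass to clCreateProgramWithSource. Whether it actually saves registers depends on the compiler, so check the build output on your own device.

```c
#include <stddef.h>
#include <string.h>

/* OpenCL C source for a hypothetical kernel "scale3".
   Global buffers stay float4 (16-byte elements); the arithmetic runs on
   a float3 so the compiler is free to drop the unused lane. */
static const char *const scale3_src =
    "__kernel void scale3(__global const float4 *in,\n"
    "                     __global float4 *out,\n"
    "                     const float s)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    float4 raw = in[i];          /* aligned 16-byte load */\n"
    "    float3 v = as_float3(raw);   /* reinterpret, keeps x, y, z */\n"
    "    v *= s;                      /* work on three lanes only */\n"
    "    out[i] = (float4)(v, raw.w); /* repack, pass w through */\n"
    "}\n";

/* Length in bytes of the source, excluding the terminating NUL. */
static size_t scale3_src_len(void)
{
    return strlen(scale3_src);
}
```

    Writing raw.xyz instead of as_float3(raw) selects the same three components.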
