
Thread: Optimization

  1. #1
    Junior Member
    Join Date
    Nov 2010
    Posts
    9

    Optimization

    Hi,

    I am just playing around with Apple's OpenCL FFT code and added the following optimization:

    Instead of calculating "(dir*2.0f*M_PI*j/64)" over and over again, I cache the result in a variable and reuse it for each twiddle factor.

    But the code now runs SLOWER than before!

    What might be the reason?

    original code:
    __kernel void fft1(__global float2 *in, __global float2 *out, int dir, int S)
    {
    ...
    ang = dir*2.0f*M_PI*j/64*1;
    w = (float2)(native_cos(ang), native_sin(ang));
    a[1] = complexMul(a[1], w);
    ang = dir*2.0f*M_PI*j/64*2;
    w = (float2)(native_cos(ang), native_sin(ang));
    a[2] = complexMul(a[2], w);
    ang = dir*2.0f*M_PI*j/64*3;
    w = (float2)(native_cos(ang), native_sin(ang));
    a[3] = complexMul(a[3], w);
    ang = dir*2.0f*M_PI*j/64*4;
    w = (float2)(native_cos(ang), native_sin(ang));
    a[4] = complexMul(a[4], w);
    ang = dir*2.0f*M_PI*j/64*5;
    w = (float2)(native_cos(ang), native_sin(ang));
    a[5] = complexMul(a[5], w);
    ang = dir*2.0f*M_PI*j/64*6;
    w = (float2)(native_cos(ang), native_sin(ang));
    a[6] = complexMul(a[6], w);
    ang = dir*2.0f*M_PI*j/64*7;
    w = (float2)(native_cos(ang), native_sin(ang));
    ...
    }

    my optimization:
    __kernel void fft1(__global float2 *in, __global float2 *out, int dir, int S)
    {
    ...
    float cached_multiplicator;
    cached_multiplicator = dir*2.0f*M_PI*j/64;

    ang = cached_multiplicator;
    w = (float2)(native_cos(ang), native_sin(ang));
    a[1] = complexMul(a[1], w);
    ang = cached_multiplicator*2;
    w = (float2)(native_cos(ang), native_sin(ang));
    a[2] = complexMul(a[2], w);
    ang = cached_multiplicator*3;
    w = (float2)(native_cos(ang), native_sin(ang));
    a[3] = complexMul(a[3], w);
    ang = cached_multiplicator*4;
    w = (float2)(native_cos(ang), native_sin(ang));
    a[4] = complexMul(a[4], w);
    ang = cached_multiplicator*5;
    w = (float2)(native_cos(ang), native_sin(ang));
    a[5] = complexMul(a[5], w);
    ang = cached_multiplicator*6;
    w = (float2)(native_cos(ang), native_sin(ang));
    a[6] = complexMul(a[6], w);
    ang = cached_multiplicator*7;
    w = (float2)(native_cos(ang), native_sin(ang));
    ...
    }

  2. #2
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Optimization

    I don't know why the second code would be slower.

    However, just because an expression is repeated multiple times in the first version doesn't mean the hardware will evaluate it over and over. An optimizing compiler will typically remove the repeated computation through common subexpression elimination or related techniques, so the manual caching may not save any work.
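
    To illustrate the point about common subexpression elimination, here is a minimal sketch in plain C rather than OpenCL C. The function names and the PI_F constant are hypothetical stand-ins, not part of Apple's FFT code. Because the expression is evaluated left to right, every line of the repeated form starts with the same prefix, which is exactly what the cached form stores, so a compiler is free to compute it once in both cases. Whether a particular OpenCL compiler actually does so is an assumption you would have to check in its generated code.

```c
/* Hypothetical float pi constant, used here instead of M_PI so that
   the whole expression stays in single precision. */
#define PI_F 3.14159265358979323846f

/* Repeated form, as in the original kernel: the full expression is
   written out for every twiddle angle. Each line parses as
   ((((dir * 2.0f) * PI_F) * j) / 64) * k, so the part before "* k"
   is the same subexpression every time. */
void angles_repeated(int dir, int j, float ang[7])
{
    ang[0] = dir * 2.0f * PI_F * j / 64 * 1;
    ang[1] = dir * 2.0f * PI_F * j / 64 * 2;
    ang[2] = dir * 2.0f * PI_F * j / 64 * 3;
    ang[3] = dir * 2.0f * PI_F * j / 64 * 4;
    ang[4] = dir * 2.0f * PI_F * j / 64 * 5;
    ang[5] = dir * 2.0f * PI_F * j / 64 * 6;
    ang[6] = dir * 2.0f * PI_F * j / 64 * 7;
}

/* Cached form, as in the "optimized" kernel: the shared prefix is
   computed once by hand and then scaled by k. */
void angles_cached(int dir, int j, float ang[7])
{
    float base = dir * 2.0f * PI_F * j / 64;
    for (int k = 1; k <= 7; ++k) {
        ang[k - 1] = base * k;
    }
}
```

    Both functions describe the same arithmetic, so after optimization they can end up as very similar machine code. That would explain why the hand-written cache brings no speedup, though it does not by itself explain a slowdown.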
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer.
