Thread: Best way to implement this?

  1. #1

    Best way to implement this?

    Up front: I do not expect anyone to do my work for me; I would just like some thoughts.

    I have a kernel that needs to scan every item in an array of data. (pseudocode)

    Code :
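    kernel void myKernel(
    global const float* arrayValues,
    global const float* arrayMult,
    global float* output,          // written below, so it cannot be const
    const int numValues)           // added parameter: OpenCL C pointers carry no length
    {
       int index = get_global_id(0);
       float value = 0.0f;         // float, so the products are not truncated to int
       for(int i = 0; i < numValues; i++)
       {
           int x = algorithm;      // placeholder from the original pseudocode
           value += arrayMult[x] * arrayValues[i];
       }
       output[index] = value;
    }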

    So I have a lot of accesses to global memory, and the input array (arrayValues) is too large to fit into my local memory. What would be the best way to approach this?

  2. #2

    Re: Best way to implement this?

    To clarify (maybe it will help): I am running a grid of weights over a larger grid of values. Each value is recalculated as the sum of (the values around it) * (the corresponding weights), but this makes so many reads from global memory that it runs slower on the GPU than on the CPU. Here is the code that I am using to call the kernel:

    Code :
    // sizeIn is equal to the size of an array of floats
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &sizeIn, NULL, 0, NULL, NULL);

    And here is the code from my kernel that is hurting performance:

    Code :
    __kernel void simple(
    	global const float* input,
    	global float* output,
    	constant float* weightsIn,
    	private int halfVal,
    	private int numData,
    	private int numWeights)
     
    ...
     
    for(int yIn = boundsYLeft; yIn <= boundsYRight; yIn++)
    {
        for(int xIn = boundsXLeft; xIn <= boundsXRight; xIn++)
        {
            weight += weightsIn[(xIn-boundsXLeft)+(yIn-boundsYLeft)*numWeights] * input[xIn+yIn*numData];
        }
    }
    output[index] = weight;

    My problem is the for loop here: it walks the smaller grid of weights and, for every output value, reads each neighbouring input value from global memory. Is there a way to make this more efficient, such as using global_work_size in a for loop, or somehow limiting the reads from global memory? Any ideas would be helpful.
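
    One common answer to this kind of question is tiling: each work-group copies its block of the input plus a border (halo) into local memory once, and every work-item then reads its neighbours from that copy. Below is a minimal sketch of the idea in plain C, so it can be run and checked on the host. It is not the poster's kernel. TILE, K, the clamped border handling, and both function names are assumptions made for illustration. On a GPU the tile array would be a __local buffer, the staging loop would be shared among the work-items (or done with async_work_group_copy), and a barrier(CLK_LOCAL_MEM_FENCE) would sit between staging and compute.

    Code :
```c
#define TILE 4                    /* hypothetical work-group edge length */
#define K 3                       /* hypothetical weight grid size (K x K) */
#define HALO (K / 2)              /* border needed on each side of a tile */
#define LDIM (TILE + 2 * HALO)    /* edge length of the staged block */

/* Clamp coordinates to the grid; this border policy is an assumption. */
static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Baseline: every output re-reads its K*K neighbours from the big array. */
static void convolve_naive(const float *in, float *out, const float *w, int n)
{
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++) {
            float sum = 0.0f;
            for (int ky = 0; ky < K; ky++)
                for (int kx = 0; kx < K; kx++) {
                    int sx = clampi(x + kx - HALO, 0, n - 1);
                    int sy = clampi(y + ky - HALO, 0, n - 1);
                    sum += w[kx + ky * K] * in[sx + sy * n];
                }
            out[x + y * n] = sum;
        }
}

/* Tiled: stage each tile plus halo once, then compute from the staged copy.
   tile[] stands in for __local memory. Returns the number of reads made
   from the big input array, to compare against n*n*K*K for the baseline. */
static long convolve_tiled(const float *in, float *out, const float *w, int n)
{
    long reads = 0;
    float tile[LDIM * LDIM];

    for (int ty = 0; ty < n; ty += TILE)
        for (int tx = 0; tx < n; tx += TILE) {
            /* Stage: one read per staged element. */
            for (int ly = 0; ly < LDIM; ly++)
                for (int lx = 0; lx < LDIM; lx++) {
                    int sx = clampi(tx + lx - HALO, 0, n - 1);
                    int sy = clampi(ty + ly - HALO, 0, n - 1);
                    tile[lx + ly * LDIM] = in[sx + sy * n];
                    reads++;
                }
            /* Compute: all neighbour reads now hit the staged copy. */
            for (int ly = 0; ly < TILE && ty + ly < n; ly++)
                for (int lx = 0; lx < TILE && tx + lx < n; lx++) {
                    float sum = 0.0f;
                    for (int ky = 0; ky < K; ky++)
                        for (int kx = 0; kx < K; kx++)
                            sum += w[kx + ky * K] * tile[(lx + kx) + (ly + ky) * LDIM];
                    out[(tx + lx) + (ty + ly) * n] = sum;
                }
        }
    return reads;
}
```

    For an 8x8 grid with these sizes, the tiled version makes 4 * 36 = 144 reads from the large array, against 8 * 8 * 9 = 576 for the baseline. The saving grows with the size of the weight grid, since the halo is shared by every work-item in the tile.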

  3. #3

    Re: Best way to implement this?

    Maybe prefetch or async_work_group_copy?
