
Thread: Efficient use of memory in GPU

  1. #1

    Efficient use of memory in GPU

    Hey guys,

    I have been looking for answers for about a week now and can't find anything useful, so here goes.

    I have a kernel that takes a global float* as an input parameter and another as an output. Because of the massive number of global memory accesses, the CPU runs the algorithm faster than the GPU, and I need it the other way around. I tried passing in a local float* to stage data from global into local memory, but the code errors out and it outputs the exact same numbers it did the last time I ran my program.

    I tried this:

    Code :
    __kernel void simple(
    	global const float* input1, //input
    	global float* input2, //output
    	constant float* input3, //another input
    	local float* tempArg, //temp array
    	private int numData,
    	private int numData2)
    {
    	int index = get_global_id(0);
    	...
    	//for testing purposes
    	tempArg[index] = index;
    	write_mem_fence(CLK_GLOBAL_MEM_FENCE);
    	...

    	input2[index] = tempArg[index]; // this is where it breaks, giving me incorrect values
    	//input2[index] = index; //works, if I don't have the local arg in the kernel parameters
    }

    Is it because I am running out of memory, or is something else wrong? I am trying to make it faster, but it just keeps giving me garbage values.

  2. #2

    Re: Efficient use of memory in GPU

    EDIT:

    Declaring the temp buffer inside the kernel works, rather than passing it in, but then I cannot pass the size as a variable; it has to be hardcoded. Is there a workaround for this?

  3. #3
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Efficient use of memory in GPU

    If the amount of local memory you need changes dynamically, you will have to pass the local pointer as an argument to the kernel and use clSetKernelArg() to indicate the amount of local memory you need.
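    For local memory arguments, clSetKernelArg() takes the size in bytes and a NULL value pointer. A minimal host-side sketch (the argument index 3 and the variable names here are assumptions matching the kernel signature posted above; adjust to your actual code):

    Code :
    // workGroupSize is whatever you pass as the local size to clEnqueueNDRangeKernel
    size_t localBytes = workGroupSize * sizeof(float);
    // arg 3 = tempArg; NULL arg_value tells OpenCL to allocate local memory
    cl_int err = clSetKernelArg(kernel, 3, localBytes, NULL);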

    As for the kernel not working correctly, keep in mind that the amount of local memory available in the hardware is going to be limited to a few KB, and therefore doing something like "tempArg[get_global_id(0)] = foo;" will not work because get_global_id() will typically return large values.
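    In other words, a local array should be indexed with get_local_id(), which only ranges over the work-group, while global buffers are indexed with get_global_id(). A sketch of the staging pattern, using the parameter names from the kernel above (the barrier replaces the write_mem_fence, since all work-items in the group must reach it before any of them read the staged data):

    Code :
    int gid = get_global_id(0); // unique across the whole NDRange
    int lid = get_local_id(0);  // 0 .. work-group-size - 1

    tempArg[lid] = input1[gid];       // stage global data into local memory
    barrier(CLK_LOCAL_MEM_FENCE);     // sync the work-group before reading

    input2[gid] = tempArg[lid];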

    You can query the amount of local memory available in your hardware with clGetDeviceInfo(..., CL_DEVICE_LOCAL_MEM_SIZE, ...). You can also query the amount of local memory currently used by your kernel with clGetKernelWorkGroupInfo(..., CL_KERNEL_LOCAL_MEM_SIZE, ...). The latter must be less than or equal to the former.
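    Those two queries look roughly like this (device and kernel are assumed to be your cl_device_id and cl_kernel handles):

    Code :
    cl_ulong deviceLocal = 0, kernelLocal = 0;
    clGetDeviceInfo(device, CL_DEVICE_LOCAL_MEM_SIZE,
                    sizeof(deviceLocal), &deviceLocal, NULL);
    clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_LOCAL_MEM_SIZE,
                             sizeof(kernelLocal), &kernelLocal, NULL);
    // kernelLocal must be <= deviceLocal for the kernel to launch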

    Also, it's a good idea to always check whether OpenCL API calls return an error code. It's likely that clEnqueueNDRangeKernel() was returning an error code when you tried running that kernel.
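    A basic check of that return value would have surfaced the problem immediately, e.g. (error codes such as CL_OUT_OF_RESOURCES or CL_INVALID_WORK_GROUP_SIZE are typical when local memory is over-allocated):

    Code :
    cl_int err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                                        &globalSize, &localSize, 0, NULL, NULL);
    if (err != CL_SUCCESS) {
        fprintf(stderr, "clEnqueueNDRangeKernel failed: %d\n", err);
    }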
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

