
Thread: Memory allocation inside kernel

  1. #1
    Junior Member
    Join Date
    Mar 2011
    Posts
    28

    Memory allocation inside kernel

    Hello

    Is it possible to allocate memory inside kernel using malloc()? (so later I can use free()).
    I wrote a kernel, but OpenCLEditor (http://www.cmsoft.com.br/download/OpenCLTemplate.zip) displays some errors when I try to use malloc() / free(): http://imageshack.us/photo/my-images/802/opencl.png/

  2. #2
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Memory allocation inside kernel

    Is it possible to allocate memory inside kernel using malloc()? (so later I can use free()).
    malloc/free are not supported in OpenCL C.
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  3. #3
    Junior Member
    Join Date
    Mar 2011
    Posts
    28

    Re: Memory allocation inside kernel

    So to give each work-item access to its own independent array (let's say of size 15), I have to create an array of arrays (for example) of size 'computeSizeBlock' before calling the kernel and pass a pointer to it?

    So if I want to do 10 computations (run a work-item 10 times) and my 'computeSizeBlock' is 2, I have to create a two-dimensional array ([2][15]) and pass it to the kernel, which will be executed 5 times (computeSizeBlock = 2, passes = 5, so computations = 10).

    Or maybe there is a better way to allocate an independent array for every work-item so it can do its computations?

  4. #4
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Memory allocation inside kernel

    So to give each work-item access to its own independent array (let's say of size 15), I have to create an array of arrays (for example) of size 'computeSizeBlock' before calling the kernel and pass a pointer to it?
    You can't pass pointers to pointers as kernel arguments either. If you want each work-item to have its own little space, you can either pass a pointer to global memory and internally assign some portion of it to each work-item, or you can do the same with local memory, or you can declare a private array inside the kernel like this:
    Code :
    __kernel void foo()
    {
        // The following variable is in private memory by default
        float myArray[15];
    }
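    The first option (one global buffer, with a fixed slice per work-item) can be sketched as follows. This is a plain C model of the index arithmetic so that it compiles without an OpenCL SDK; it is not kernel code. The names slice_for_item, fill_slice and ITEM_SLOTS are hypothetical. In a real kernel, base would be a __global pointer argument and item_id would come from get_global_id(0).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-work-item element count (15 in the thread's example). */
#define ITEM_SLOTS 15

/* Return a pointer to the slice of `base` owned by work-item `item_id`.
   In an OpenCL kernel, `base` would be a __global pointer argument and
   `item_id` would be get_global_id(0). */
float *slice_for_item(float *base, size_t item_id)
{
    assert(base != NULL);
    return base + item_id * ITEM_SLOTS;
}

/* Stand-in for the kernel body: each item writes only inside its slice. */
void fill_slice(float *base, size_t item_id)
{
    float *mine = slice_for_item(base, item_id);
    for (size_t i = 0; i < ITEM_SLOTS; ++i)
        mine[i] = (float)item_id;
}
```

    The caller has to allocate at least ITEM_SLOTS elements per work-item for the whole launch, otherwise the last slices run past the end of the buffer.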

  5. #5
    Junior Member
    Join Date
    Mar 2011
    Posts
    28

    Re: Memory allocation inside kernel

    Allocating an array of size computeSizeBlock*sizeOfOneArray in global memory and passing a pointer to it seems to be a good solution. I will try that and compute the offset for each work-item inside the kernel.

  6. #6
    Junior Member
    Join Date
    Mar 2011
    Posts
    28

    Re: Memory allocation inside kernel

    Can you point me to some article describing how stack memory is maintained in OpenCL kernels?
    OK, let's say some of the arrays I need in the kernel are small (uchar[4]). Is it safe to use private memory even if computeSizeBlock is a big number?

  7. #7
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Memory allocation inside kernel

    Can you point me to some article describing how stack memory is maintained in OpenCL kernels?
    It will depend a lot on the implementation. In fact, it's possible that some implementations don't even have a stack.

    OK, let's say some of the arrays I need in the kernel are small (uchar[4]). Is it safe to use private memory even if computeSizeBlock is a big number?
    What is computeSizeBlock? Is it the work-group size? I'm not sure what you mean by "safe" either. It should certainly work in any reasonable implementation.

    If you use a lot of private memory you will face two problems. The first one is that performance will generally decrease as you increase the amount of private memory required to run a kernel. The second problem is that at some point you may hit an implementation limit on how much private memory you can use and either compilation will fail or clEnqueueNDRangeKernel() will fail.

  8. #8
    Junior Member
    Join Date
    Mar 2011
    Posts
    28

    Re: Memory allocation inside kernel

    Thanks for the reply. So if I run into problems I will just try different combinations of work-group size and number of passes (the smaller the work-group size, the more passes I need to perform in order to do all the needed computations).

    And...
    What is computeSizeBlock? Is it the work-group size?
    Yes, I was thinking about work-group size.
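    The trade-off between work-group size and number of passes described above is a ceiling division. A minimal sketch in plain C, assuming each pass processes a fixed number of items; the name passes_needed is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Number of kernel launches needed to cover `total_items` computations
   when each launch processes `items_per_pass` of them (rounds up). */
size_t passes_needed(size_t total_items, size_t items_per_pass)
{
    assert(items_per_pass > 0);
    return (total_items + items_per_pass - 1) / items_per_pass;
}
```

    With the numbers from earlier in the thread, passes_needed(10, 2) gives 5. When the total is not a multiple of the per-pass count, the last pass is only partly used, so the kernel needs a bounds check.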

  9. #9
    Junior Member
    Join Date
    Mar 2011
    Posts
    28

    Re: Memory allocation inside kernel

    I have some problems with my kernel, or maybe the way I'm allocating and using memory is wrong.
    I'm trying to allocate one "big" array of workSize*sizeOfArray bytes and then, in the kernel, compute an offset for every work-item so that each work-item has its own independent storage space.

    But I'm experiencing some problems and I would like you to explain what I'm doing wrong.

    OK, so I need 3 bigger arrays for every work-item. I'm allocating the space with clCreateBuffer(). Let's say I want to run clEnqueueNDRangeKernel() with the global_work_size parameter set to 512.

    Every work-item needs one of the arrays to be 256 bytes long. So in advance I need to allocate an array of 256 * workSize = 256 * 512 = 131,072 bytes. In the kernel, each work-item does its computations using only its own part of this array. To compute the offset I simply use: get_local_id(0)*256.

    I use these commands:
    Code :
    int workSize = 512;
    int N = 256;
    cl_mem SBuffer = clCreateBuffer(GPUContext, CL_MEM_READ_WRITE, sizeof(uchar)*workSize*N, NULL, &errcode);
    assert(errcode==CL_SUCCESS);
    clSetKernelArg(OpenCLVectorAdd, 6, sizeof(cl_mem), (void*)&SBuffer);

    After executing the kernel I read the array back like this:
    Code :
    uchar *s = new uchar[N*SIZE];
    clEnqueueReadBuffer(GPUCommandQueue, SBuffer, CL_TRUE, 0, SIZE*N*sizeof(uchar), s, 0, NULL, NULL);

    I expected the whole array to be filled with values, but it seems only 65536 bytes were used. So now it's clear why my computations were wrong: the space I thought would be used by only one work-item was probably used by several work-items.
    So 65536 bytes were used, which is half of what should have been used.

    Is my implementation correct? I mean, can I assume that computing the offset from get_local_id(0) will work?

    Or maybe I'm just using my GPU wrong (it's an Nvidia Quadro NVS 140)?
    These are the values reported by the Cloo framework:

    Code :
    LocalMemorySize = 16384
    MaxComputeUnits = 2
    MaxConstantArguments = 9
    MaxConstantBufferSize = 65536
    MaxMemoryAllocationSize = 134217728
    MaxSamplers = 16
    MaxWorkGroupSize = 512
    MaxWorkItemDimensions = 3
    MaxWorkItemSizes = 512 / 512 / 64

    How should I understand these values?

  10. #10
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Memory allocation inside kernel

    I don't see the necessary information to understand what is happening.

    What are the arguments you pass to clEnqueueNDRangeKernel? When you call clCreateBuffer() you allocate "workSize*N" bytes, but when you read back the data, you pass "SIZE*N" to clEnqueueReadBuffer. Is SIZE the same as workSize?

    Something that seems wrong with the code is that you create the buffer with a size of "workSize*N", which implicitly assumes that there is only one work-group running. You need to allocate "workSize*N*numWorkGroups" instead, where workSize is the work-group size and numWorkGroups is the number of work-groups that you launch when you call clEnqueueNDRangeKernel.
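    The sizing rule above, and the difference between local and global IDs, can be sketched with plain C arithmetic. This is a model, not OpenCL code. The function names are hypothetical, and the local work size of 256 (two work-groups) is an assumption: the thread never states it, but it would be consistent with only 65536 of the 131072 bytes being touched.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes the scratch buffer needs so that every work-item in the whole
   NDRange gets its own slice. */
size_t scratch_bytes(size_t local_size, size_t num_groups, size_t bytes_per_item)
{
    return local_size * num_groups * bytes_per_item;
}

/* Offset derived from the local ID: repeats in every work-group. */
size_t offset_from_local_id(size_t local_id, size_t bytes_per_item)
{
    return local_id * bytes_per_item;
}

/* Offset derived from the global ID: unique across the NDRange.
   With no global offset, get_global_id(0) equals
   get_group_id(0) * get_local_size(0) + get_local_id(0). */
size_t offset_from_global_id(size_t group_id, size_t local_size,
                             size_t local_id, size_t bytes_per_item)
{
    assert(local_id < local_size);
    size_t global_id = group_id * local_size + local_id;
    return global_id * bytes_per_item;
}
```

    Under that assumption, scratch_bytes(256, 2, 256) is 131072, but offsets built from the local ID never go past 65536, so work-item 0 of the second group would write into the same slice as work-item 0 of the first group. Offsets built from the global ID place the second group at byte 65536 and above.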
