
Thread: Problems in performing Saxpy with mapping/unmapping

  1. #1
    Junior Member
    Join Date
    Aug 2011
    Posts
    25

    Problems in performing Saxpy with mapping/unmapping

    Hi,
    I'm trying to test an APU with different buffer creation/allocation strategies. The algorithm is the classic Saxpy (y = ax + y, where x and y are vectors).
    I run into a problem in the test where the buffers are created with the USE_HOST_PTR flag and mapping/unmapping is performed. This is probably because I still don't fully understand the data transfer mechanism between host and device behind the mapping.

    To perform the test, I follow these steps:
    1) Allocate the vectors X and Y on the host (and Z, to test the computation on the host without overwriting Y, which is needed later for the GPU);
    2) Create a buffer for X, using USE_HOST_PTR and passing the pointer;
    3) Create a buffer for Y, using USE_HOST_PTR and passing the pointer;
    4) Execute the kernel;
    5) Call clEnqueueMapBuffer and wait for it to complete, so as to get consistent values for Y.

    When I run it, I get a memory violation exception. The innermost function on the call stack is HeapAlloc(_crtheap, 0, size ? size : 1), as reported by the Visual Studio debugger.
    The point where the exception is raised changes from run to run, but it is always in the part of the code where I fetch the result of the computation, i.e. where the host tries to read Y after clEnqueueMapBuffer.

    I'm posting the relevant part of the code, hoping you can help me find the mistake.

    Global declarations:
    Code :
    cl_float * pX = NULL;
    cl_float * pY = NULL;
    cl_float * pZ = NULL;
    cl_float a = 2.f;

    Host initialization and computation:
    Code :
    void initHost(unsigned int length)
    {
    	size_t sizeInBytes = length * sizeof(cl_float);
    	pX = (cl_float *) malloc(sizeInBytes);
    	if (pX == NULL)
    		throw(string("Error: Failed to allocate input memory on host\n"));
    	pY = (cl_float *) malloc(sizeInBytes);
    	if (pY == NULL)
    		throw(string("Error: Failed to allocate input memory on host\n"));
    	pZ = (cl_float *) malloc(sizeInBytes);
    	if (pZ == NULL)
    		throw(string("Error: Failed to allocate input memory on host\n"));
    	for(int i = 0; i < length; i++)
    	{
    		pX[i] = cl_float(i);
    		pY[i] = cl_float(length-1-i);
    	}
    }
     
    void vectorAddHost(
    	const float* pfData1, 
    	const float* pfData2, 
    	float* pfResult, 
    	int iNumElements)
    {
        int i;
        for (i = 0; i < iNumElements; i++) 
        {
            pfResult[i] = a * pfData1[i] + pfData2[i]; 
        }
    }

    Code to initialize and run OpenCL computation and to compare results:
    Code :
    //128 is the local work size
    currNumElements = 128 * 1024;
     
    /////////////////////////////////////////////////////////////////
    // Allocate and initialize memory on the host
    /////////////////////////////////////////////////////////////////
    initHost(currNumElements);
     
    /////////////////////////////////////////////////////////////////
    // Test host
    /////////////////////////////////////////////////////////////////
    LARGE_INTEGER frequency;
    LARGE_INTEGER cpu_start = startTimer(&frequency);
    vectorAddHost(pX, pY, pZ, currNumElements);
    double cpu_time = getTimer(frequency, cpu_start);
    cout << "CPU TIME (CPU timer) = " << cpu_time << " ms" << endl;
    cpu_data << currNumElements << " " << cpu_time << endl;
     
    /////////////////////////////////////////////////////////////////
    // Start timer
    /////////////////////////////////////////////////////////////////
    LARGE_INTEGER gpu_start = startTimer(&frequency);
     
    /////////////////////////////////////////////////////////////////
    // Create OpenCL memory buffers
    /////////////////////////////////////////////////////////////////
    bufX = cl::Buffer(
    	context, 
    	CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
    	sizeof(cl_float) * currNumElements,
    	pX);
    bufY = cl::Buffer(
    	context,
    	CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
    	sizeof(cl_float) * currNumElements,
    	pY);
     
    /////////////////////////////////////////////////////////////////
    // Set the arguments that will be used for kernel execution
    /////////////////////////////////////////////////////////////////
    kernel.setArg(
    	0,
    	bufX);
    kernel.setArg(
    	1, 
    	bufY);
    kernel.setArg(
    	2, 
    	a);
     
    /////////////////////////////////////////////////////////////////
    // Enqueue the kernel to the queue
    // with appropriate global and local work sizes
    /////////////////////////////////////////////////////////////////
    queue.enqueueNDRangeKernel(
    	kernel, 
    	cl::NDRange(),
    	cl::NDRange(currNumElements), 
    	cl::NDRange(localSize));	
     
    /////////////////////////////////////////////////////////////////
    // Map buffers (get capability?)
    /////////////////////////////////////////////////////////////////				
    cl_int err;
    cl_float* pEnd = (cl_float*)queue.enqueueMapBuffer(bufY, CL_TRUE, CL_MAP_READ, 0,   
                           currNumElements * sizeof(cl_float), NULL, NULL, &err);
     
    //err == 0
    printf("%d\n", err);
     
    /////////////////////////////////////////////////////////////////
    // Test gpu
    /////////////////////////////////////////////////////////////////
    queue.finish();
    double gpu_time = getTimer(frequency, gpu_start);
    cout << "GPU TIME (CPU timer) = " << gpu_time << " ms" << std::endl;
    gpu_data << currNumElements << " " << gpu_time << endl;
     
    if(verify(pEnd, pZ, currNumElements))
    	cout << "Verification SUCCESS" << endl;
    else
    	cout << "Verification FAIL" << endl;
     
    /////////////////////////////////////////////////////////////////
    // Release host resources
    /////////////////////////////////////////////////////////////////
    cleanupHost();

    Thank you very much!
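
The `verify` helper called in the code above is not shown in the thread. A minimal sketch of what such a check could look like, assuming an element-wise comparison with a relative tolerance (the tolerance value and the `saxpyHost` name are assumptions, not the poster's code). A tolerance is used rather than exact equality because a device may round `a * x + y` slightly differently from the host.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Host reference: z[i] = a * x[i] + y[i]. y is left untouched so that it can
// still be used as the input of the device run.
inline void saxpyHost(float a, const float* x, const float* y, float* z,
                      std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        z[i] = a * x[i] + y[i];
    }
}

// Hypothetical stand-in for the thread's verify(): returns true when every
// element of result is within relTol (relative to max(1, |expected|)) of the
// expected value. The default tolerance is an assumption.
inline bool verify(const float* result, const float* expected, std::size_t n,
                   float relTol = 1e-5f) {
    for (std::size_t i = 0; i < n; ++i) {
        const float scale = std::max(1.0f, std::fabs(expected[i]));
        const float diff = std::fabs(result[i] - expected[i]);
        // Written as !(diff <= tol) so that a NaN result fails the check.
        if (!(diff <= relTol * scale)) {
            return false;
        }
    }
    return true;
}
```

With these two helpers, the host side of the test would compute `pZ` with `saxpyHost(a, pX, pY, pZ, n)` and later call `verify(pEnd, pZ, n)` on the mapped pointer.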

  2. #2
    Junior Member
    Join Date
    Aug 2011
    Posts
    25

    Re: Problems in performing Saxpy with mapping/unmapping

    Hi, I found out that the exception is thrown by clEnqueueMapBuffer, not at the first iteration but at the second.
    The code I posted leaves out the outer loop that increments currNumElements, which I use to see how performance changes as the vectors grow.
    I think the exception is somehow due to the fact that I do not "release" or "unmap" the mapped memory. Can you help me?

  3. #3

    Re: Problems in performing Saxpy with mapping/unmapping

    Yes, for every clEnqueueMapBuffer you need to do a clEnqueueUnmapBuffer. It's just good 'balanced' programming to do this.

  4. #4
    Junior Member
    Join Date
    Aug 2011
    Posts
    25

    Re: Problems in performing Saxpy with mapping/unmapping

    You mean clEnqueueUnmapMemObject, right? I can't find any UnmapBuffer...
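
One way to keep map and unmap calls balanced is to tie the unmap to scope exit. The block below is only a sketch: `ScopeExit` and `onScopeExit` are hypothetical helper names, and the guard is deliberately generic so it compiles without the OpenCL headers. The intended use with `cl::CommandQueue` is shown in the comment.

```cpp
#include <cassert>
#include <utility>

// Runs a callable when the guard goes out of scope, including on early
// return or when an exception propagates.
template <typename F>
class ScopeExit {
public:
    explicit ScopeExit(F f) : f_(std::move(f)), active_(true) {}
    ScopeExit(ScopeExit&& other)
        : f_(std::move(other.f_)), active_(other.active_) {
        other.active_ = false;  // the moved-from guard must not fire
    }
    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;
    ~ScopeExit() {
        if (active_) f_();
    }

private:
    F f_;
    bool active_;
};

template <typename F>
ScopeExit<F> onScopeExit(F f) {
    return ScopeExit<F>(std::move(f));
}

// Intended use with the OpenCL C++ bindings (not compiled here):
//   void* p = queue.enqueueMapBuffer(bufY, CL_TRUE, CL_MAP_READ, 0, bytes);
//   auto unmap = onScopeExit([&] { queue.enqueueUnmapMemObject(bufY, p); });
//   ... read the results through p; the unmap is enqueued at scope exit ...
```

Note that the pointer handed to the unmap call is the one returned by the map call, and that the unmap is only enqueued here, so the queue still has to be finished before the buffer is released.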

  5. #5
    Junior Member
    Join Date
    Aug 2011
    Posts
    25

    Re: Problems in performing Saxpy with mapping/unmapping

    I tried adding an enqueueUnmapMemObject call, but I still get a heap allocation exception. Now it is raised right here:
    Code :
    double gpu_time = getTimer(frequency, gpu_start);
    cout << "GPU TIME (CPU timer) = " << gpu_time << " ms" << std::endl;
    gpu_data << currNumElements << " " << gpu_time << endl;
    That is, when I print the GPU execution time to the console. Anyway, I don't think the point where the exception is raised says much about the mistake in the code.

    I uploaded the full source code here: http://www.gabrielecocco.it/SaxpyAllocAndCopyPtr.cpp

    Any suggestion?

  6. #6

    Re: Problems in performing Saxpy with mapping/unmapping

    Everything looks OK in general, and you should only unmap if you map:

    Code :
          if(BUFFER_MODE == CL_MEM_ALLOC_HOST_PTR)
          {
              queue.enqueueUnmapMemObject(bufY, pY);
          }

    You might want to check whether your OpenCL implementation supports the data size you are requesting, by querying the maximum allocation size:

    Code :
        // CL_DEVICE_MAX_MEM_ALLOC_SIZE is reported as a cl_ulong, not a size_t
        cl_ulong deviceMaxAlloc = devices[0].getInfo<CL_DEVICE_MAX_MEM_ALLOC_SIZE>();
        cout << "CL_DEVICE_MAX_MEM_ALLOC_SIZE: " << deviceMaxAlloc << endl;

    I'm assuming your kernel looks something like this:

    Code :
        kernel void saxpy(global float *x, global float *y, const float a) {
            int i = get_global_id(0); 
            y[i] = a * x[i] + y[i];
        }
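
The allocation-size check suggested in this post can be turned into a small guard that runs before each buffer is created. This is a sketch under assumptions: `fitsInSingleAllocation` is a hypothetical helper, and `maxAllocBytes` is expected to be the value queried with `CL_DEVICE_MAX_MEM_ALLOC_SIZE`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Returns true when count elements of elemSize bytes fit into one allocation
// of at most maxAllocBytes. The multiplication is guarded against overflow,
// and a zero-byte request is rejected because clCreateBuffer does not accept
// a size of 0.
inline bool fitsInSingleAllocation(std::uint64_t count,
                                   std::uint64_t elemSize,
                                   std::uint64_t maxAllocBytes) {
    if (count == 0 || elemSize == 0) {
        return false;
    }
    if (count > std::numeric_limits<std::uint64_t>::max() / elemSize) {
        return false;  // count * elemSize would overflow
    }
    return count * elemSize <= maxAllocBytes;
}
```

For the largest size mentioned in this thread, 8388608 floats, the request is 32 MiB per buffer, so the check passes on any device that reports at least that much.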

  7. #7

    Re: Problems in performing Saxpy with mapping/unmapping

    Oh, one other minor point... you don't need to issue a finish when the last command you enqueued was blocking (here the blocking map): on an in-order queue it only returns once all earlier commands have completed

    Code :
    //    queue.finish();

  8. #8

    Re: Problems in performing Saxpy with mapping/unmapping

    I ran your code (with some non-OpenCL adjustments for my implementation and OS) and got

    Code :
    Testing GPU vs CPU with 128 elements
    Verification SUCCESS
    Testing GPU vs CPU with 256 elements
    Verification SUCCESS
    Testing GPU vs CPU with 512 elements
    Verification SUCCESS
    Testing GPU vs CPU with 1024 elements
    Verification SUCCESS
    Testing GPU vs CPU with 2048 elements
    Verification SUCCESS
    Testing GPU vs CPU with 4096 elements
    Verification SUCCESS
    Testing GPU vs CPU with 8192 elements
    Verification SUCCESS
    Testing GPU vs CPU with 16384 elements
    Verification SUCCESS
    Testing GPU vs CPU with 32768 elements
    Verification SUCCESS
    Testing GPU vs CPU with 65536 elements
    Verification SUCCESS
    Testing GPU vs CPU with 131072 elements
    Verification SUCCESS
    Testing GPU vs CPU with 262144 elements
    Verification SUCCESS
    Testing GPU vs CPU with 524288 elements
    Verification SUCCESS
    Testing GPU vs CPU with 1048576 elements
    Verification SUCCESS
    Testing GPU vs CPU with 2097152 elements
    Verification SUCCESS
    Testing GPU vs CPU with 4194304 elements
    Verification SUCCESS
    Testing GPU vs CPU with 8388608 elements
    Verification SUCCESS

  9. #9
    Junior Member
    Join Date
    Aug 2011
    Posts
    25

    Re: Problems in performing Saxpy with mapping/unmapping

    Incredible, on my MacBook it doesn't work. I put the unmap after the call to the function "verify", but at the second iteration I still get a heap allocation error.
    Are you sure you are executing the mapping branch rather than the branch where the buffers are allocated using ALLOC_PTR | USE PTR?
    For that, the macro BUFFER_MODE must be set to CL_MEM_ALLOC_HOST_PTR:
    #define BUFFER_MODE CL_MEM_ALLOC_HOST_PTR

    My kernel:
    Code :
    __kernel void saxpy(
    	const __global float * x,
    	__global float * y,
    	const float a) 
    {
    	uint gid = get_global_id(0);
    	y[gid] = a * x[gid] + y[gid];
    }

    Screenshot of the exception: http://www.gabrielecocco.it/mix/screenshot.jpg

    Your adjustments may do the job...

  10. #10
    Junior Member
    Join Date
    Aug 2011
    Posts
    25

    Re: Problems in performing Saxpy with mapping/unmapping

    OK, I think I've found the problem. The exception is caused by this piece of code (inside the cleanupHost function):

    Code :
    	if(pX)
    	{
    		free(pX);
    		pX = NULL;
    	}
    This seems right to me, since at the beginning I create a buffer with USE_HOST_PTR, passing pX, and at the end I want to free the memory pointed to by pX. In between I don't perform any Map/Unmap. Do I have to tell OpenCL that I'm going to free a pointer that was passed to the buffer creation function?
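
With USE_HOST_PTR the host memory serves as the buffer's storage, so it has to stay valid until the buffer object has been destroyed. One way to enforce that order is to let a single object own both. The sketch below is an assumption-laden illustration, not the thread's code: `HostBackedBuffer` is a hypothetical name, and `Buffer` is a template parameter so the block compiles without OpenCL headers (with the C++ bindings it would be `cl::Buffer`).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <typename Buffer>
class HostBackedBuffer {
public:
    // makeBuffer(ptr, bytes) must return a Buffer that uses ptr as storage,
    // for example (assumed usage with the C++ bindings):
    //   [&](void* p, std::size_t n) {
    //       return cl::Buffer(context,
    //                         CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, n, p);
    //   }
    template <typename MakeBuffer>
    HostBackedBuffer(std::size_t count, MakeBuffer makeBuffer)
        : host_(count),
          buffer_(makeBuffer(static_cast<void*>(host_.data()),
                             count * sizeof(float))) {}

    // Copying would duplicate the host storage while the copied buffer still
    // referred to the original storage, so copies are disabled.
    HostBackedBuffer(const HostBackedBuffer&) = delete;
    HostBackedBuffer& operator=(const HostBackedBuffer&) = delete;

    float* data() { return host_.data(); }
    std::size_t size() const { return host_.size(); }
    Buffer& buffer() { return buffer_; }

private:
    // Members are destroyed in reverse declaration order: buffer_ is released
    // first, and only afterwards is the host storage freed.
    std::vector<float> host_;
    Buffer buffer_;
};
```

In the thread's code the equivalent manual order would be to let `bufX` and `bufY` go away (for example by assigning an empty `cl::Buffer()`) after the queue has finished, and only then call `free` on `pX` and `pY`.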
