Results 1 to 8 of 8

Thread: mapping buffer host on GPU device

  1. #1
    Junior Member
    Join Date
    Oct 2010
    Posts
    11

    mapping buffer host on GPU device

    hi, I'm trying to perform calculations on a sequence of images. The images are written on a single buffer at intervals of 10 seconds which I apply the Sobel filter.
    Instead of changing the buffer on the host, you can edit the buffer of the GPU? In doing so, I would always avoid creating host buffer and then write them on the device.

  2. #2
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: mapping buffer host on GPU device

    First, create your buffer using clCreateImage2D() and pass CL_MEM_ALLOC_HOST_PTR as the "flags" argument.

    Now you can use clEnqueueMapBuffer()/clEnqueueUnmapBuffer() get a pointer to the image data. With it you can overwrite the contents of the image as easily as if it was a buffer in the host. That way you can avoid a memory copy because you are writing directly into device memory.
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  3. #3
    Junior Member
    Join Date
    Oct 2010
    Posts
    11

    Re: mapping buffer host on GPU device

    thank you david...
    i created the buffer using clCreateBuffer. if i want use clEnqueueMapBuffer() / clEnqueueUnmapBuffer() , i need again to create two buffer of clCreateBuffer type for host input e output?
    This is my code
    Code :
    cl_mem bufferIn = clCreateBuffer (context, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, imageW * imageH * 4, NULL, &ciErr);
    cl_mem bufferOut = clCreateBuffer (context, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, imageW * imageH * 4, NULL, &ciErr);
    unsigned char * dataIn;
    dataIn = (BYTE *) clEnqueueMapBuffer(clComQueue, bufferIn, CL_TRUE, CL_MAP_WRITE, 0,  imageW * imageH * 4, 0, NULL, NULL, &ciErr)

    After, i have this function where i pass the pointer of image buffer
    Code :
     
    BYTE * imageIn;
    unsigned int * imageOut;
    unsigned int imageW;
    unsigned int imageH;
     
    imageIn = (BYTE *) display1.bitmap.GetBits();
    display1.bitmap.ReleaseBits();
    imageW = display1.bitmap.myBmpWidth;
    imageH = display1.bitmap.myBmpHeight;
    szOut = imageW * imageH * sizeof (unsigned int);
    imageOut = (unsigned int * )malloc(szOut);
     
    ExecuteSobel ((BYTE*)imageIn,(BYTE*)imageOut);
     
    void ExecuteSobel ( BYTE* pImgIn, BYTE* pImgOut)
    {
    ciErr = clEnqueueWriteBuffer(clComQueue, bufferIn, CL_TRUE,  imageW * imageH * 4, imageIn, 0, NULL, NULL);
    ciErr = clEnqueueNDRangeKernel(clComQueue, clKernel, 2, NULL, szGlobalWorkSize, szLocalWorkSize, 0, NULL, NULL);
    ciErr = clEnqueueReadBuffer(clComQueue, bufferOut, CL_TRUE,  imageW * imageH * 4, pImgOut, 0, NULL, NULL);
     
    clEnqueueUnmapMemObject(clComQueue, bufferIn, imageIn, 0, NULL, NULL);
    }

    i know that in some part i'm wrong.
    Please help me!!

  4. #4
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: mapping buffer host on GPU device

    The reason for using clEnqueueMapBuffer() is that you no longer need to use clEnqueueReadBuffer/clEnqueueWriteBuffer any more, but it requires that whoever is writing the image data into the buffer needs to let you pass a pointer where you want the data to be written. In your example, the input data comes from display1.bitmap.GetBits(), which allocates its own memory and gives you a pointer to it.

    At least you can benefit when you read back the output data this way:

    Code :
    [...]
    bufferOut = clCreateBuffer(context, CL_MEM_ALLOC_HOST_PTR, imageW * imageH * sizeof (cl_int), NULL, &errCode);
     
    ciErr = clEnqueueNDRangeKernel(clComQueue, clKernel, 2, NULL, szGlobalWorkSize, szLocalWorkSize, 0, NULL, NULL);
     
    imageOut = clEnqueueMapBuffer(queue, bufferOut, CL_TRUE, CL_MAP_READ, 0, szOut, 0, NULL, NULL, &errCode);
     
    // here you can read the data from imageOut
     
    errCode = clEnqueueUnmapMemObject(queue, bufferOut, imageOut, 0, NULL, NULL);
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  5. #5
    Junior Member
    Join Date
    Oct 2010
    Posts
    11

    Re: mapping buffer host on GPU device

    But in this way i must change also the kernel arguments?
    Before using clEnqueueMapBuffer i have used two buffer, one to pass the image to the kernel function, and one to take the image in output.

    Code :
    ciErr  = clSetkernelArg(clKernel, 0 , sizeof(cl_mem) , (void*)&bufferIn);
    ciErr |= clSetkernelArg(clKernel, 1 , sizeof(cl_mem) , (void*)&bufferOut);
    ciErr |= clSetkernelArg(clKernel, 2 , sizeof(cl_uint) ,  (void*)&imageW);
    ciErr |= clSetkernelArg(clKernel, 3 , sizeof(cl_uint) ,  (void*)&imageH);

  6. #6
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: mapping buffer host on GPU device

    The kernel arguments do not change. The same buffers are used: one for input and one for output. The only difference is that now instead of calling clEnqueueReadBuffer to read the output buffer, now you will use a mapbuffer/unmapbuffer operation, which can save you a data transfer.
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  7. #7
    Junior Member
    Join Date
    Oct 2010
    Posts
    11

    Re: mapping buffer host on GPU device

    i tryed to use clenqueueMapBuffer/clenqueueUnmapMemObj and the test was good. i have compared the result and if i use clenqueueMapBuffer/clenqueueUnmapMemObj the time execution is reduced to about 50% (create input and output buffer,map the input,execute,map the output,release map output).

    Now i would like know if is possibile improve more the time execution. this is my code.

    Code :
     
    ...
     
    if (memMode == PINNED)
    	{
    		// memMode == PINNED
    		bufferIn  = clCreateBuffer(context, CL_MEM_READ_WRITE, szBufferIn , NULL, &ciErr);
    		bufferOut = clCreateBuffer(context, CL_MEM_READ_WRITE, szBufferOut, NULL, &ciErr);
    	}
    	else 
    	{	
    		// Standard host allocation (malloc)
    		// memMode == PAGEABLE	
    		bufferIn  = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, szBufferIn , NULL, &ciErr);
    		bufferOut = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, szBufferOut, NULL, &ciErr);
    	}
     
    	if (accMode == MAPPED)
    	{
    		// Get mapped pointers for writing to pinned input and output host image pointers 
    		// accMode = MAPPED
    		cDataIn  = (BYTE *)clEnqueueMapBuffer(clCommandQueue, bufferIn , CL_TRUE, CL_MAP_WRITE | CL_MAP_READ, 0, szBufferIn , 0, NULL, NULL, &ciErr);
    		//cDataOut = (BYTE *)clEnqueueMapBuffer(clCommandQueue, bufferOut, CL_TRUE, CL_MAP_WRITE | CL_MAP_READ, 0, szBufferOut, 0, NULL, NULL, &ciErr);
    	}
    	else 
    	{
    		// DIRECT: API access to device buffer
    		// accMode = DIRECT
    	}
     
    ...
     
    Execute(BYTE* pImgIn, BYTE* pImageOut)
    {
     
    	if (!pImageOut) 
    	{
    		char log[10240] = "ERRORE BUFFER";
    		MessageBox(NULL,log,"pImageOut NULL",MB_OK);
    		return;
    	}
     
    	if (accMode == MAPPED)
    	{
    		//cDataIn  = (BYTE *)clEnqueueMapBuffer(clCommandQueue, bufferIn , CL_TRUE, CL_MAP_WRITE | CL_MAP_READ, 0, szBufferIn , 0, NULL, NULL, &ciErr);
    		dt.Start();
    		memcpy(cDataIn, pImgIn, szBufferIn);
    		writeBuffer = dt.Check();
    		//ciErr = clEnqueueUnmapMemObject(clCommandQueue, bufferIn , cDataIn,NULL,NULL,NULL );
     
    		// Launch kernel
    		dt.Start();
    		ciErr = clEnqueueNDRangeKernel(clCommandQueue, clKernel, 2, NULL, szGlobalWorkSize, szLocalWorkSize, 0, NULL, &e_runCode);
    		clWaitForEvents(1, &e_runCode);
    		runCode = dt.Check();
     
    		cDataOut = (BYTE *)clEnqueueMapBuffer(clCommandQueue, bufferOut, CL_TRUE, CL_MAP_WRITE | CL_MAP_READ, 0, szBufferOut, 0, NULL, NULL, &ciErr);
    		dt.Start();
    		memcpy(pImageOut, cDataOut, szBufferOut);
    		readBuffer = dt.Check();
    		ciErr = clEnqueueUnmapMemObject(clCommandQueue, bufferOut, cDataOut, NULL,NULL,NULL );
    	}
    	else 
    	{
    		// Copy input data from host to device
    		dt.Start();
    		ciErr = clEnqueueWriteBuffer(clCommandQueue, bufferIn, CL_TRUE, 0, szBufferIn, pImgIn, 0, NULL, &e_writeBuffer);
    		clWaitForEvents(1, &e_writeBuffer);
    		writeBuffer = dt.Check();
     
    		// Launch kernel
    		dt.Start();
    		ciErr = clEnqueueNDRangeKernel(clCommandQueue, clKernel, 2, NULL, szGlobalWorkSize, szLocalWorkSize, 0, NULL, &e_runCode);
    		clWaitForEvents(1, &e_runCode);
    		runCode = dt.Check();
     
    		// Copy results back to host, block until complete
    		dt.Start();
    		ciErr = clEnqueueReadBuffer(clCommandQueue, bufferOut, CL_TRUE, 0, szBufferOut, pImageOut, 0, NULL, &e_readBuffer);
    		clWaitForEvents(1, &e_readBuffer);
    		readBuffer = dt.Check();
    	}
    }

    This code functions but if i create the map of the output buffer in the same moment when i create the map for the input buffer the kernel code doesn't function. Why?

  8. #8
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: mapping buffer host on GPU device

    If I understood correctly your code, then the reason you can't map the output buffer before the execution of the kernel is because map/unmap operations are synchronization points. When you map/unmap a pointer, the OpenCL driver may implicitly perform a memory copy from the host to the device (or vice-versa) through PCIe.

    This is why if you map the output buffer before the GPU has written into it you will get invalid data out of it. For the same reason you should unmap the input buffer before you enqueue the NDRange. That will ensure that the input data is copied to the GPU before the range is executed.

    Also I have another suggestion: be careful with the read/write flags you pass to clEnqueueMapBuffer(). It can save you a lot of performance.

    Try this for example:

    Code :
    cDataIn  = (BYTE *)clEnqueueMapBuffer(clCommandQueue, bufferIn , CL_TRUE, CL_MAP_WRITE, 0, szBufferIn , 0, NULL, NULL, &ciErr);
    cDataOut = (BYTE *)clEnqueueMapBuffer(clCommandQueue, bufferOut, CL_TRUE, CL_MAP_READ, 0, szBufferOut, 0, NULL, NULL, &ciErr);

    By marking the input buffer as write-only you avoid making a copy from the device to the host when the buffer is mapped. By marking the input buffer as read-only you avoid a copy from the host to the device when the buffer is unmapped.

    I hope this all makes sense to you. Feel free to ask any other questions and I will try to answer.
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

Similar Threads

  1. Replies: 3
    Last Post: 05-12-2010, 03:09 PM
  2. mapping buffers vs using host pointers
    By acolubri in forum OpenCL
    Replies: 5
    Last Post: 08-08-2009, 03:28 PM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •