Thread: Image2D objects in OpenCL and OpenCL kernel performance

  1. #1
    Junior Member
    Join Date
    Dec 2009
    Posts
    2

    Image2D objects in OpenCL and OpenCL kernel performance

    There are two ways to create and populate an image object (texture) in OpenCL: a) setting the CL_MEM_COPY_HOST_PTR flag in clCreateImage2D(), or b) creating the image without host data and filling it with clEnqueueWriteImage().
    texImage = clCreateImage2D(GPUContext, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, &imageFormat, imageWidth, imageHeight, 0, inputData, &err);

    or

    texImage = clCreateImage2D(GPUContext, CL_MEM_READ_ONLY, &imageFormat, imageWidth, imageHeight, 0, NULL, &err);
    size_t size3D[3] = {imageWidth, imageHeight, 1};
    size_t size3DOrig[3] = {0, 0, 0};
    err = clEnqueueWriteImage(commandQueue, texImage, CL_TRUE, size3DOrig, size3D, 0, 0, inputData, 0, NULL, NULL);
    With the second alternative, the time to create and populate the texture is similar to CUDA's, while the first is about 6 times slower. The second alternative also gives at least 3 times faster access to the texture data inside the OpenCL kernel than the first. Any idea why? Is there a difference in locality?

    Also, in general, OpenCL kernel performance seems to be 2-3 times slower than that of the equivalent CUDA kernel. Is this due to some overhead?

  2. #2
    Junior Member
    Join Date
    Dec 2009
    Posts
    2

    Re: Image2D objects in OpenCL and OpenCL kernel performance

    Btw, I am using NVIDIA's OpenCL implementation.

  3. #3
    Senior Member
    Join Date
    Jul 2009
    Location
    Northern Europe
    Posts
    311

    Re: Image2D objects in OpenCL and OpenCL kernel performance

    Those two approaches to initializing a memory object should give identical performance when the kernel accesses the image. (They do on Mac OS X.) You should be able to use either, as long as you make sure you don't accidentally set CL_MEM_USE_HOST_PTR, which can force the runtime to do extra copying over the PCIe bus or to use slower mapped system memory. I would suggest filing a performance bug against NVIDIA in this regard.

    As to why kernels are a lot slower, it's most likely because NVIDIA's OpenCL is a lot newer than CUDA and hence less optimized. My understanding is that the compiler backend is similar, so my guess is that it comes down to performance issues in their runtime. I would again suggest filing a performance bug against the developer.
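
    For a bug report, it may help to state the upload cost of each path as elapsed time and effective bandwidth rather than as a ratio. The following is only a sketch in plain C: it assumes you already have start and end timestamps in nanoseconds for the transfer, for example from clGetEventProfilingInfo() with CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END on a queue created with CL_QUEUE_PROFILING_ENABLE. The helper names elapsed_ms and upload_gb_per_s are made up for illustration and are not part of OpenCL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an OpenCL function): elapsed time in
 * milliseconds between two timestamps given in nanoseconds.
 * OpenCL profiling counters are 64-bit unsigned nanosecond values,
 * so uint64_t is used here to keep the sketch free of CL headers. */
double elapsed_ms(uint64_t start_ns, uint64_t end_ns)
{
    return (double)(end_ns - start_ns) / 1.0e6;
}

/* Hypothetical helper: effective upload bandwidth in GB/s
 * (1 GB = 1e9 bytes) for a width x height image with the given
 * number of bytes per pixel. Returns 0.0 for a zero-length interval. */
double upload_gb_per_s(size_t width, size_t height, size_t bytes_per_pixel,
                       uint64_t start_ns, uint64_t end_ns)
{
    double bytes = (double)width * (double)height * (double)bytes_per_pixel;
    double seconds = (double)(end_ns - start_ns) / 1.0e9;

    if (seconds <= 0.0) {
        return 0.0;
    }
    return bytes / seconds / 1.0e9;
}
```

    Calling both helpers once for the CL_MEM_COPY_HOST_PTR path and once for the clEnqueueWriteImage() path, with the same image size, gives two numbers that can be compared directly. Note that with CL_MEM_COPY_HOST_PTR there is no event attached to clCreateImage2D(), so that path would have to be timed with a host-side timer instead.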
