
Thread: Manually optimizing OpenCL/CUDA intermediate code!

    LLVM: Manually optimizing OpenCL/CUDA intermediate code!

    Hi,

    I am interested in optimizing OpenCL code. In this regard I went through an OpenCL optimization guide, which says you should consider the following things while optimizing your code:
    1. Device utilization and occupancy: launch as many work-groups (blocks) as possible to get good occupancy and to hide memory latency.
    2. Maximize memory bandwidth: minimize host-device data transfers and overlap data transfers with device computation.
    3. Shared memory: use shared (local) memory when you need to access data more than once, either within the same thread or from different threads within a block (see the kernel sketch below).

    There may be a few more things to consider while optimizing.
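
    To make point 3 concrete, here is a minimal sketch of what I understand by using shared (local) memory: a small 1D three-point average where neighbouring work-items reuse each other's loads. The kernel name blur3 and the tile layout are just my own example, not from the guide:

        // 1D three-point average: each work-group stages its tile of the input
        // (plus a one-element halo on each side) in __local memory, so every
        // global element is read once and then reused by up to three work-items.
        __kernel void blur3(__global const float *in,
                            __global float *out,
                            __local  float *tile,   // local_size + 2 halo elements
                            const int n)
        {
            const int gid = get_global_id(0);
            const int lid = get_local_id(0);
            const int lsz = get_local_size(0);

            // Centre element of this work-item's slot in the tile.
            tile[lid + 1] = (gid < n) ? in[gid] : 0.0f;

            // First and last work-items in the group also load the halo elements.
            if (lid == 0)
                tile[0] = (gid > 0) ? in[gid - 1] : 0.0f;
            if (lid == lsz - 1)
                tile[lsz + 1] = (gid + 1 < n) ? in[gid + 1] : 0.0f;

            barrier(CLK_LOCAL_MEM_FENCE);

            // Each output value reuses data loaded by neighbouring work-items.
            if (gid < n)
                out[gid] = (tile[lid] + tile[lid + 1] + tile[lid + 2]) / 3.0f;
        }

    On the host side the tile argument would just be set with clSetKernelArg(kernel, 2, (local_size + 2) * sizeof(float), NULL), so the buffer lives entirely in on-chip memory.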

    My questions are:
    1. What other possibilities are there to optimize OpenCL/CUDA code?
    2. Is there any way to manually optimize the IR code generated by an OpenCL/CUDA compiler? If yes, what is the procedure for doing this?
    3. One more thing I want to know about CUDA terminology: why do we have the concepts of warps/blocks/grids?
    4. OpenCL guarantees that its programs are portable, but it does not guarantee optimal performance across different vendors' devices. If I want to get optimal performance across different vendors' devices, how should I approach this?
    5. Can I modify the LLVM IR code generated by the OpenCL compiler to optimize my code? (A sketch of the kind of thing I mean is below.)
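
    To make questions 2 and 5 more concrete: the only route I have found so far is to dump whatever intermediate form the vendor's compiler exposes as the program "binary" (on NVIDIA's OpenCL this is PTX text; other stacks may hand back LLVM bitcode or a native ISA blob), edit it offline, and load it back with clCreateProgramWithBinary. The helper below is just my own sketch of the dumping step; it assumes the program was already built for a single device and leaves out all error checking:

        #include <stdio.h>
        #include <stdlib.h>
        #include <CL/cl.h>

        /* Dump the compiled program's "binary" (PTX, LLVM bitcode, or a native
         * ISA blob, depending on the vendor) to a file so it can be inspected
         * or hand-edited. Assumes the program was built for exactly one device. */
        void dump_program_binary(cl_program program, const char *path)
        {
            size_t binary_size = 0;
            clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES,
                             sizeof(binary_size), &binary_size, NULL);

            unsigned char *binary = malloc(binary_size);
            unsigned char *binaries[1] = { binary };
            clGetProgramInfo(program, CL_PROGRAM_BINARIES,
                             sizeof(binaries), binaries, NULL);

            FILE *f = fopen(path, "wb");
            fwrite(binary, 1, binary_size, f);
            fclose(f);
            free(binary);
        }

    Whether the runtime then accepts a hand-edited PTX/IR file through clCreateProgramWithBinary seems to be entirely vendor-specific, which is really what I am asking about.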

    Thanks !!
    Last edited by Gopal_HC; 09-04-2013 at 12:16 AM.
