Thread: Moving from float to float4: What should be changed in the host code ?

  1. #1
    Junior Member
    Join Date
    Oct 2013
    Posts
    22

    Moving from float to float4: What should be changed in the host code ?

    Hello,

    The input to my kernel is four 2D matrices, each containing 256x32 floats.
    The output has the same size.

    So on the host I call:

    Code :
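    cl_uint dim = 2;                              /* work_dim is a cl_uint */
    size_t global_offset[] = {0, 0};
    size_t global_size[] = {4, 256 * 32};         /* 4 matrices x 8192 floats each */

    /* local_work_size = NULL lets the implementation pick the work-group size */
    err = clEnqueueNDRangeKernel(queue, kernel, dim, global_offset,
                                 global_size, NULL, 0, NULL, &prof_event);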

    I decided that each element of the output would be one work item.
    I am not sure that is wise.

    The kernel function is:

    Code :
    __kernel void id_check(__global float *in,
    						__global float *out,
    						int n_in_matrices,
    						int n_out_matrices)


    To make it run faster, I changed it to:

    Code :
    __kernel void id_check(__global float4 *in,
    						__global float4 *out,
    						int n_in_matrices,
    						int n_out_matrices)

    Of course, I also changed the kernel body so that each work item processes 4 elements at a time.

    In both cases I get the same results and the same processing time.
    This does not make sense!
    I expected the second version to run about 4 times faster.
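
A possible explanation for the identical timings, sketched as plain C arithmetic. If global_size stays at {4, 256 * 32}, the float4 kernel still launches the same number of work items, and each one now handles four floats. The launch would then touch four times more floats than the buffers hold (most likely reading and writing out of bounds) instead of finishing the same work sooner. This assumes one vector per work item, as the post describes; `floats_touched` is a hypothetical helper for illustration, not an OpenCL call.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar floats touched by a 2D launch in which every work item handles
 * one vector of `width` floats (1 for float, 4 for float4).
 * Assumption: exactly one vector per work item. */
size_t floats_touched(const size_t global_size[2], size_t width)
{
    assert(width > 0);
    return global_size[0] * global_size[1] * width;
}
```

With global_size {4, 256 * 32}, this gives 32768 for width 1 and 131072 for width 4, while the input holds only 4 * 256 * 32 = 32768 floats.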

    What should I change in the host code?

    Thanks,
    Zvika

  2. #2
    Junior Member
    Join Date
    Oct 2013
    Posts
    22
    Hello,

    I found the problem.
    Each work item now handles four floats, so global_size has to shrink by a factor of 4:
    size_t global_size[] = {4 , 256 * 32 / 4};
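
A minimal sketch of how that size could be computed on the host rather than hard-coded, assuming the element count is an exact multiple of the vector width (true here, since 256 * 32 is divisible by 4). `vectorized_global_size` is a hypothetical helper name, not part of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Work items needed along one dimension when each work item handles
 * `width` consecutive floats (4 for float4).
 * Assumption: n_floats is a multiple of width, so no tail handling is needed. */
size_t vectorized_global_size(size_t n_floats, size_t width)
{
    assert(width > 0);
    assert(n_floats % width == 0);
    return n_floats / width;
}
```

The host code could then fill in the second dimension as `vectorized_global_size(256 * 32, 4)`, which is 2048, and keep 4 in the first dimension for the four matrices.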

    Thanks,
    Zvika
