Thread: Matrix Multiplication

  1. #1

    Matrix Multiplication

    Hi,

    I want to test a matrix multiplication because I think it's a good way to compare GPU performance (OpenCL) with CPU performance (OpenMP, in my case).

    So, first, I began with a simple vector addition:
    Code :
    __kernel void IntAdd(__global const int* a, __global const int* b, __global int* c)
    {
        int iGID = get_global_id(0);
        c[iGID] = a[iGID] + b[iGID];
    }

    From this I can deduce that:
    c[0] = a[0] + b[0]
    c[1] = a[1] + b[1]
    ....

    Now, I'm trying the following kernel code :
    Code :
    #define N 32  // square matrix

    __kernel void MatMult(__global const int* a, __global const int* b, __global int* c)
    {
        int row = get_global_id(0);
        int col = get_global_id(1);

        int Cres = 0;
        for (int i = 0; i < N; i++)
        {
            Cres += a[row*N + i] * b[i*N + col];
        }

        c[row*N + col] = Cres;
    }

    I see that row takes the values {0, 1, ..., 15} and col is always 0. If I need col = 1, I have to write col+1.
    This is the result on my terminal:

    Code :
    ##  A MATRIX  ##
        0    1    2    3
        4    5    6    7
        8    9   10   11
       12   13   14   15
     
    ##  B MATRIX  ##
        0    1    2    3
        4    5    6    7
        8    9   10   11
       12   13   14   15
     
    ##  C MATRIX  ##
        0    1    2    3
        4    5    6    7
        8    9   10   11
       12   13   14   15

    I can post the full code (a single .cpp file) if necessary. It is compiled with "g++ DemoMatMult.cpp -o MatMult -lOpenCL".

    My questions are:
    - Does get_global_id work the way I described?
    - And, of course, why doesn't the code work?

    Thank you in advance.

  2. #2
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Matrix Multiplication

    The values returned by get_global_id() are determined by the arguments passed to clEnqueueNDRangeKernel().

    In your case you want something like this:
    Code :
    size_t work_size[2] = {N, N};
     
    errcode = clEnqueueNDRangeKernel(queue, kernel, 2 /*two-dimensional ndrange */,
            NULL, &work_size[0], NULL, 0, NULL, NULL);

    With that, get_global_id(0) will return values from 0 to N-1 and get_global_id(1) will do the same.
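
To see how the NDRange arguments drive the IDs without needing an OpenCL device, here is a minimal plain-C sketch that emulates a two-dimensional launch with nested loops. The names mat_mult_item and run_2d_range are made up for this illustration and are not part of the OpenCL API; the loop variables row and col stand in for get_global_id(0) and get_global_id(1). For comparison, if the kernel is enqueued with a one-dimensional range, get_global_id(1) returns 0 for every work-item, which matches the "col is always 0" symptom.

```c
#include <assert.h>

/* Plain-C stand-in for the MatMult kernel body.
 * row and col play the role of get_global_id(0) and get_global_id(1). */
static void mat_mult_item(const int *a, const int *b, int *c,
                          int n, int row, int col)
{
    int cres = 0;
    for (int i = 0; i < n; i++)
        cres += a[row * n + i] * b[i * n + col];
    c[row * n + col] = cres;
}

/* Emulates a launch with work_dim = 2 and global size {n, n}:
 * every (row, col) pair in [0, n) x [0, n) is visited exactly once.
 * A real device runs the work-items in no particular order. */
static void run_2d_range(const int *a, const int *b, int *c, int n)
{
    assert(n > 0);
    for (int col = 0; col < n; col++)
        for (int row = 0; row < n; row++)
            mat_mult_item(a, b, c, n, row, col);
}
```

With the 4x4 matrix from the first post (values 0 to 15) used as both a and b, run_2d_range(a, a, c, 4) should give c[0] = 56 and c[15] = 506, rather than a copy of A.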
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  3. #3

    Re: Matrix Multiplication

    Thank you, Mr Garcia.

    I did as you said, but I think my "main" problem was something else.

    Let me explain: the matrix multiplication works with:

    Code :
    const char* program_source[] =
     
    {
        "__kernel void MatMult(__global const int* a, __global const int* b, __global int* c)",
     
        "{",
     
            "int row = get_global_id(0);",
            "int col = get_global_id(1);",
            "int Cres = 3;",
            "for(int i = 0;i< 4; i++)",
     
            "{Cres += a[row*4 + i ] * b[i*4 + col];}",
     
            "c[row*4+col]= Cres;",
     
     
        "}",
     
    };

    but it doesn't work with a named constant (#define N 4):

    Code :
    ...
    #define N 4
    ...
    const char* program_source[] =
     
    {
        "__kernel void MatMult(__global const int* a, __global const int* b, __global int* c)",
     
        "{",
     
            "int row = get_global_id(0);",
            "int col = get_global_id(1);",
            "int Cres = 3;",
            "for(int i = 0;i< N; i++)",
     
            "{Cres += a[row*N + i ] * b[i*N + col];}",
     
            "c[row*N+col]= Cres;",
     
     
     
        "}",
     
    };

    I think the quoted strings don't take the value of N into account?!
    I'm not using a .cl file for now. Should I use one in this kind of situation?
    Thanks.

  4. #4

    Re: Matrix Multiplication

    You need to #define N inside the kernel source.

    Code :
    const char* program_source =
        "#define N 4\n"
        "__kernel void MatMult(__global const int* a, __global const int* b, __global int* c)\n"
        ...
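
The reason is that the host compiler's preprocessor does not expand macros inside string literals, so the N in the quoted kernel text reaches the OpenCL compiler unchanged and undefined. If you would rather keep a single #define N in the host file, one option is to paste its value into the source with the usual two-level stringification trick. This is only a sketch: STR, XSTR, SOURCE_LINES and kernel_source_length are illustrative names, and the kernel body is copied from the first post. Another route, not shown here, is to pass "-D N=4" in the options string of clBuildProgram.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N 4

/* Two-level stringification: XSTR(N) first expands N, then quotes it. */
#define STR(x)  #x
#define XSTR(x) STR(x)

/* Kernel source as an array of strings. The first entry carries the
 * host-side value of N into the kernel text as "#define N 4\n". */
static const char *program_source[] = {
    "#define N " XSTR(N) "\n",
    "__kernel void MatMult(__global const int* a, __global const int* b, __global int* c)\n",
    "{\n",
    "    int row = get_global_id(0);\n",
    "    int col = get_global_id(1);\n",
    "    int Cres = 0;\n",
    "    for (int i = 0; i < N; i++)\n",
    "        Cres += a[row*N + i] * b[i*N + col];\n",
    "    c[row*N + col] = Cres;\n",
    "}\n",
};

enum { SOURCE_LINES = sizeof program_source / sizeof program_source[0] };

/* Illustrative helper: total number of characters in the kernel source. */
static size_t kernel_source_length(void)
{
    size_t total = 0;
    for (size_t i = 0; i < SOURCE_LINES; i++) {
        assert(program_source[i] != NULL);
        total += strlen(program_source[i]);
    }
    return total;
}
```

The array can then be handed to clCreateProgramWithSource with SOURCE_LINES as the count and NULL for the lengths, since every entry is null-terminated.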

  5. #5

    Re: Matrix Multiplication

    So simple. I had not thought of that.
    Thank you once again.

    (This) Problem solved.
    Best Regards!

  6. #6

    Re: Matrix Multiplication

    I think I can continue here with another question. If that's inappropriate, please correct me.

    I compared my "optimized" matrix multiplication with OpenMP against my "simple" matrix multiplication with OpenCL.


    #define N 2048 // for both -> N*N matrices
    OpenMP :
    Code :
    ...
      #pragma omp parallel for private(i,j,k), shared(a,b,c), schedule(dynamic)
      for(i=0; i<N; i++)
      {
        for(k=0; k<N; k++)
        {
          for(j=0; j<N; j++)
          {
            c[i][j] += a[i][k] * b[k][j];
          }
        }
      }
    ...
    compiled with gcc -O3

    OpenCL :
    Code :
    ...
        const size_t global_work_size[2] = {N, N};
        const size_t local_work_size[2] = {16, 16};

        result = clEnqueueNDRangeKernel(command_queue,
                                        kernel,
                                        2,
                                        NULL,
                                        &global_work_size[0],
                                        &local_work_size[0],
                                        0, NULL, NULL);

    OpenMP time = 1.125 s.
    OpenCL time = 5.532 s.

    I think the computation is not very large and transferring the data to the GPU takes time, so the GPU is not worth using in this case.
    I don't know how to use shared memory yet. Maybe I should continue by learning how to use it.

  7. #7

    Re: Matrix Multiplication

    Sorry, I meant local memory, as it is called in OpenCL.
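
Since local memory came up: the usual reason to use it for matrix multiplication is tiling, where each work-group copies a small block of A and B into fast memory once and reuses it for many multiply-adds. The sketch below shows only that access pattern in plain C so it can be run anywhere; it is not OpenCL code and makes no claim about the timings above. TILE, matmul_tiled and the tile arrays ta and tb are names invented for this example. In a real kernel the tiles would be __local arrays, the TILE x TILE inner loops would be the work-items of one work-group, and barrier(CLK_LOCAL_MEM_FENCE) calls would separate the load and compute phases.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TILE 2  /* stands in for the 16x16 work-group size */

/* c = a * b for n x n row-major int matrices, computed tile by tile.
 * n must be a multiple of TILE in this simplified sketch. */
static void matmul_tiled(const int *a, const int *b, int *c, int n)
{
    assert(n > 0 && n % TILE == 0);
    memset(c, 0, (size_t)n * (size_t)n * sizeof *c);

    for (int r0 = 0; r0 < n; r0 += TILE)
        for (int c0 = 0; c0 < n; c0 += TILE)
            for (int k0 = 0; k0 < n; k0 += TILE) {
                int ta[TILE][TILE], tb[TILE][TILE];

                /* Load phase: stage one tile of A and one tile of B. */
                for (int i = 0; i < TILE; i++)
                    for (int j = 0; j < TILE; j++) {
                        ta[i][j] = a[(r0 + i) * n + (k0 + j)];
                        tb[i][j] = b[(k0 + i) * n + (c0 + j)];
                    }

                /* Compute phase: each (i, j) acts like one work-item and
                 * reuses the staged tiles TILE times. */
                for (int i = 0; i < TILE; i++)
                    for (int j = 0; j < TILE; j++) {
                        int acc = 0;
                        for (int k = 0; k < TILE; k++)
                            acc += ta[i][k] * tb[k][j];
                        c[(r0 + i) * n + (c0 + j)] += acc;
                    }
            }
}
```

The result is the same as the straightforward triple loop; only the order of the memory accesses changes, which is the part that matters on a GPU.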

  8. #8
    Member
    Join Date
    Oct 2010
    Location
    Vancouver, Canada
    Posts
    66

    Re: Matrix Multiplication

    Quote Originally Posted by wrx
    I think the computation is not very large and transferring the data to the GPU takes time, so the GPU is not worth using in this case.
    You could also try using the CPU device to see how that performs.

  9. #9

    Re: Matrix Multiplication

    The OpenMP code uses the CPU only:

    $ time ./mult_matrix
    c[2047][2047] = 1449828352
    real 0m1.108s
    user 0m8.550s
    sys 0m0.030s

  10. #10
    Junior Member
    Join Date
    Feb 2011
    Posts
    5

    Re: Matrix Multiplication

    First of all, the code as written uses the wrong syntax.

    In "int iGID = get_global_id(0);", no parameters should be passed to get_global_id(). An error is thrown because the function now takes the 0 as a parameter, and so the entire logic changes.

    Moreover, "c[row*N + col]= Cres;" is not the proper way to write it either, as the variable should always be on the LHS and the constants or the working formula on the RHS.
