Thread: Starter: matrix mul

  1. #1
    Junior Member
    Join Date
    Apr 2010
    Posts
    6

    Starter: matrix mul

    This is not a real problem. It is a simple starter question about the OpenCL memory model.
    (I know the basics of running a simple kernel with OpenCL.)

    I want to multiply 2 really big matrices.
    30000x30000 x 30000x30000

    ONE THREAD CPU:
    They don't fit in physical RAM, but the multiplication can be transparent to C++ thanks to a big swap file. Speed is very low, of course.

    OPENCL:
    What approach is used? Is it transparent, or must I slice the matrices?
    I think it goes like this:
    - Move the first row vector of matrix A to the GPU.
    - Move N column vectors of matrix B to the GPU.
    - Compute N elements of the first row of matrix C.
    - Move the next N column vectors of matrix B to the GPU.
    - Compute the next N elements of the first row of matrix C.
    - .......
    - Move the next row vector of matrix A to the GPU.
    - .......
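
    The steps above can be sketched on the host in plain C++. This is only an illustration under stated assumptions: the function name `multiply_by_column_chunks` and the parameter `chunk` (the N from the list) are hypothetical, matrices are square and row-major, and the "move to GPU" steps are stood in for by copies into small staging vectors rather than real OpenCL buffer transfers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major square matrix of side n stored in a flat vector.
using Matrix = std::vector<float>;

// Hypothetical host-side emulation of the row/column-chunk scheme:
// one row of A is staged at a time, and B is visited in chunks of at
// most `chunk` columns. On a real device each staging copy would be a
// buffer upload; here it is just a copy into a small vector.
Matrix multiply_by_column_chunks(const Matrix& a, const Matrix& b,
                                 std::size_t n, std::size_t chunk) {
    assert(chunk > 0 && a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0f);
    std::vector<float> row(n);   // stands in for the uploaded row of A
    std::vector<float> cols;     // stands in for the uploaded columns of B
    for (std::size_t i = 0; i < n; ++i) {
        std::copy(a.data() + i * n, a.data() + (i + 1) * n, row.begin());
        for (std::size_t j0 = 0; j0 < n; j0 += chunk) {
            const std::size_t width = std::min(chunk, n - j0);
            // Pack the chunk so that each column of B is contiguous.
            cols.assign(width * n, 0.0f);
            for (std::size_t j = 0; j < width; ++j)
                for (std::size_t k = 0; k < n; ++k)
                    cols[j * n + k] = b[k * n + (j0 + j)];
            // One dot product per column gives `width` elements of row i of C.
            for (std::size_t j = 0; j < width; ++j) {
                float sum = 0.0f;
                for (std::size_t k = 0; k < n; ++k)
                    sum += row[k] * cols[j * n + k];
                c[i * n + j0 + j] = sum;
            }
        }
    }
    return c;
}
```

    Note that this scheme re-sends every column of B once per row of A, so the amount of data transferred grows quickly with the matrix size; that cost is one reason larger two-dimensional blocks are usually preferred.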

    Are all of these steps needed, or am I missing something that is handled transparently?
    The problem with the above approach: you don't know which device it will be executed on, so you don't know whether the device has enough available memory.

    I am in a mess!

  2. #2
    Member
    Join Date
    Mar 2010
    Location
    Raleigh, NC
    Posts
    55

    Re: Starter: matrix mul

    That's (un)fortunately the good and bad part about OpenCL. There is no built-in matrix multiplication in the OpenCL standard. Additionally, the standard places no fixed restriction on the amount of memory that you can use; the limits depend on the device. It is up to the software and hardware developers to control memory, performance, etc.

    In your hypothetical situation, you'd have to develop your own algorithm for the matrix multiplication. You'd want to slice the matrices, load them into local memory, and perform the multiplication. How you partition them depends on the device(s) you have in your system.
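
    As a rough illustration of the slicing idea, here is a minimal CPU-only sketch of a blocked (tiled) multiply in C++. It is not taken from any SDK: the name `multiply_tiled` and the `tile` parameter are hypothetical, the matrices are assumed square and row-major, and the three outer loops merely stand in for the step where a real OpenCL program would load one sub-block of A and one of B onto the device before multiplying them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major square matrix of side n stored in a flat vector.
using Matrix = std::vector<float>;

// Blocked multiply: C is built up from products of tile x tile
// sub-blocks of A and B. Only one sub-block of each matrix is touched
// inside the inner loops, which is the property that lets a device
// with limited memory work on one slice at a time.
Matrix multiply_tiled(const Matrix& a, const Matrix& b,
                      std::size_t n, std::size_t tile) {
    assert(tile > 0 && a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0f);
    for (std::size_t i0 = 0; i0 < n; i0 += tile) {
        const std::size_t i_end = std::min(i0 + tile, n);
        for (std::size_t j0 = 0; j0 < n; j0 += tile) {
            const std::size_t j_end = std::min(j0 + tile, n);
            for (std::size_t k0 = 0; k0 < n; k0 += tile) {
                const std::size_t k_end = std::min(k0 + tile, n);
                // Accumulate A[i0.., k0..] * B[k0.., j0..] into C[i0.., j0..].
                for (std::size_t i = i0; i < i_end; ++i)
                    for (std::size_t k = k0; k < k_end; ++k) {
                        const float a_ik = a[i * n + k];
                        for (std::size_t j = j0; j < j_end; ++j)
                            c[i * n + j] += a_ik * b[k * n + j];
                    }
            }
        }
    }
    return c;
}
```

    The tile size is the knob that would be tuned per device; edge tiles are simply clipped with `std::min`, so `n` does not have to be a multiple of `tile`.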

    I'd check out NVIDIA's and AMD's SDKs with OpenCL. I believe both of them have some pretty involved matrix multiply examples. Matrix multiply is kind of the "hello world" of OpenCL writers.

  3. #3
    Junior Member
    Join Date
    Apr 2010
    Posts
    6

    Re: Starter: matrix mul

    Hmmmm...
    The question is simpler.

    For matrix multiplication, I have 2 big matrices that do not fit in physical GPU memory (because the matrices are 10GB in size and GPU memory is 1GB).

    Is this handled by the vendor's implementation of OpenCL, or must I handle it in my code?

    Thanks pal!

    PS: They don't fit in physical CPU RAM either, but the swap file helps there.

  4. #4
    Junior Member
    Join Date
    Jan 2012
    Posts
    4

    Re: Starter: matrix mul

    I have a small doubt here. If a swap file is used for 10 GB of data, won't you lose the performance that you gain from the GPU? That is, won't saving and retrieving data to and from the swap file cost you a lot?

  5. #5
    Junior Member
    Join Date
    Jan 2012
    Posts
    2

    Re: Starter: matrix mul

    Hi,

    Try to decompose the big matrix into smaller sub-blocks. A good presentation of this technique is given in the CUDA C Programming Guide.
    The NVIDIA SDK also has a matrix multiplication example in OpenCL.

    I.

  6. #6
    Junior Member
    Join Date
    Apr 2010
    Posts
    6

    Re: Starter: matrix mul

    Yes, I saw this approach.
    So an OpenCL implementation can handle arrays of any size.
    There is no GPU hardware limit on global array size.
    A 'global' GPU memory block can also reside in system RAM, in a hard disk swap file, or wherever the OpenCL implementation believes it is efficient to store data that doesn't fit in GPU memory.
    Am I correct?

  7. #7
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Starter: matrix mul

    "A 'global' GPU memory block can also reside in system RAM, in a hard disk swap file, or wherever the OpenCL implementation believes it is efficient to store data that doesn't fit in GPU memory."
    That is rather unlikely if your device is a GPU. What is going to happen is that when you attempt to allocate memory for a huge matrix, the allocation will fail with an error such as CL_OUT_OF_RESOURCES.
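
    Since an oversized allocation fails rather than spilling to disk, the host code has to pick a slice size that fits. Below is a small C++ sketch of that arithmetic. The function names are hypothetical, and the budget is passed in as a plain number; in a real program it would come from the limits the device reports through `clGetDeviceInfo` (`CL_DEVICE_GLOBAL_MEM_SIZE`, and `CL_DEVICE_MAX_MEM_ALLOC_SIZE` for a single buffer, which may be smaller). The sketch assumes three square tiles are resident at once: one each from A, B and C.

```cpp
#include <cassert>
#include <cstdint>

// Largest square tile side T such that three T x T tiles (one each
// from A, B and C) of `elem_size`-byte elements fit in `budget_bytes`.
std::uint64_t max_tile_side(std::uint64_t budget_bytes, std::uint64_t elem_size) {
    assert(elem_size > 0);
    const std::uint64_t cells = budget_bytes / (3 * elem_size);  // cells per tile
    // Integer square root by binary search, avoiding floating-point rounding.
    std::uint64_t lo = 0, hi = 1;
    while (hi < (std::uint64_t{1} << 32) && hi * hi <= cells) hi *= 2;
    // Invariant from here on: lo * lo <= cells < hi * hi.
    while (hi - lo > 1) {
        const std::uint64_t mid = lo + (hi - lo) / 2;
        if (mid * mid <= cells) lo = mid; else hi = mid;
    }
    return lo;
}

// Number of tiles needed along one side of an n x n matrix.
std::uint64_t tiles_per_side(std::uint64_t n, std::uint64_t tile) {
    assert(tile > 0);
    return (n + tile - 1) / tile;  // ceiling division
}
```

    With the figures from this thread, a 1 GiB budget and 4-byte floats give `max_tile_side(1ull << 30, 4) == 9459`, so a 30000 x 30000 matrix would be cut into 4 tiles per side. A real device needs some headroom for other allocations, so the usable budget is likely to be lower than the total it reports.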
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer.
