Thread: Using non-square rectangular blocking for a matrix multiplication kernel

  1. #1
    Newbie · Join Date: Oct 2013 · Posts: 2

    I have been working with a kernel that does matrix multiplication.

    The kernel is very much like the common matrix multiplication examples (I can't post a URL to it yet).

    It uses 16 × 16 block sizes. I have read that one could use rectangular block sizes, but that always seems to be "an exercise left to the reader".

    When I try them, I routinely get -5 errors (CL_OUT_OF_RESOURCES), so I know I am going somewhere I shouldn't.

    I assume I don't quite understand how I am accessing LOCAL (shared) memory. I am also not sure whether the block dimensions apply only to the output matrix or also to one or both of the input matrices.
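    On the question of which matrix the block belongs to: in the usual tiled scheme the block shape is fixed by the output tile of C (TM × TN), and each input tile borrows one dimension from it plus a chunk size TK along the shared K dimension, so the A tile is TM × TK and the B tile is TK × TN. None of the three needs to be square. The sketch below is a CPU-side C emulation rather than an OpenCL kernel; TM, TN, TK and the function names are hypothetical, and As/Bs stand in for the __local arrays a work-group would fill cooperatively.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tile sizes: each output tile of C is TM rows x TN columns,
   and the shared K dimension is consumed in chunks of TK. */
#define TM 8
#define TN 16
#define TK 4

/* C (M x N) = A (M x K) * B (K x N), all row-major. */
static void matmul_tiled(const float *A, const float *B, float *C,
                         int M, int N, int K)
{
    /* No edge handling in this sketch, so sizes must be tile multiples. */
    assert(M % TM == 0 && N % TN == 0 && K % TK == 0);
    for (int i0 = 0; i0 < M; i0 += TM)          /* output tile row offset */
        for (int j0 = 0; j0 < N; j0 += TN) {    /* output tile column offset */
            float acc[TM][TN] = {{0}};
            for (int k0 = 0; k0 < K; k0 += TK) {
                /* Stand-ins for __local buffers: A tile is TM x TK,
                   B tile is TK x TN. */
                float As[TM][TK], Bs[TK][TN];
                for (int i = 0; i < TM; i++)
                    for (int k = 0; k < TK; k++)
                        As[i][k] = A[(i0 + i) * K + (k0 + k)];
                for (int k = 0; k < TK; k++)
                    for (int j = 0; j < TN; j++)
                        Bs[k][j] = B[(k0 + k) * N + (j0 + j)];
                for (int i = 0; i < TM; i++)
                    for (int j = 0; j < TN; j++)
                        for (int k = 0; k < TK; k++)
                            acc[i][j] += As[i][k] * Bs[k][j];
            }
            for (int i = 0; i < TM; i++)
                for (int j = 0; j < TN; j++)
                    C[(i0 + i) * N + (j0 + j)] = acc[i][j];
        }
}

/* Returns 1 if the tiled result equals a naive triple loop.
   Small integer inputs keep the float arithmetic exact. */
static int tiled_matches_naive(int M, int N, int K)
{
    float *A = malloc(sizeof(float) * M * K);
    float *B = malloc(sizeof(float) * K * N);
    float *C = malloc(sizeof(float) * M * N);
    int ok = 1;
    for (int i = 0; i < M * K; i++) A[i] = (float)(i % 7 - 3);
    for (int i = 0; i < K * N; i++) B[i] = (float)(i % 5 - 2);
    matmul_tiled(A, B, C, M, N, K);
    for (int i = 0; i < M && ok; i++)
        for (int j = 0; j < N && ok; j++) {
            float ref = 0.0f;
            for (int k = 0; k < K; k++) ref += A[i * K + k] * B[k * N + j];
            if (ref != C[i * N + j]) ok = 0;
        }
    free(A);
    free(B);
    free(C);
    return ok;
}
```

    Changing TM, TN and TK independently only changes how big each local tile is; the indexing stays the same as long as the matrix sizes are multiples of the tiles.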

    Can someone point me to a reference that might help me, or an example of a matrix multiplication that does in fact use rectangular blocking?

    Thanks.

  2. #2
    Newbie · Join Date: Oct 2013 · Posts: 2
    OK, I figured it out. For what I did, one block dimension had to divide the other evenly, and, at least in the first case, the width had to be greater than or equal to the height.
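    One work-group layout that produces exactly this rule (an assumption; the actual kernel isn't shown in the thread): a work-group of LH × LW work-items computes an LH × LW tile of C, and K is consumed in chunks of LW. The local A tile is then LH × LW, one element per work-item, but the local B tile is LW × LW, so each work-item must load LW / LH of its elements. With a fixed integer count that only covers the tile when LH divides LW, which also forces LW ≥ LH. Below is a hedged CPU emulation in C; LH, LW and the function names are hypothetical, and the comments mark where OpenCL barriers would sit.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical work-group shape: LH rows x LW columns of work-items,
   each computing one element of an LH x LW output tile. */
#define LH 4
#define LW 16

/* Each work-item loads LW / LH rows of the LW x LW B tile; with integer
   division this covers the whole tile only when LH divides LW. */
_Static_assert(LW % LH == 0, "LW must be a multiple of LH");

static void matmul_workgroup_emulated(const float *A, const float *B,
                                      float *C, int M, int N, int K)
{
    assert(M % LH == 0 && N % LW == 0 && K % LW == 0);
    float As[LH][LW], Bs[LW][LW];               /* stand-ins for __local arrays */
    for (int g0 = 0; g0 < M; g0 += LH)          /* work-group row offset in C */
        for (int g1 = 0; g1 < N; g1 += LW) {    /* work-group column offset in C */
            float acc[LH][LW] = {{0}};          /* one accumulator per work-item */
            for (int k0 = 0; k0 < K; k0 += LW) {
                /* Load phase: work-item (ly, lx) fetches one A element and
                   LW / LH elements of column lx of the B tile. */
                for (int ly = 0; ly < LH; ly++)
                    for (int lx = 0; lx < LW; lx++) {
                        As[ly][lx] = A[(g0 + ly) * K + (k0 + lx)];
                        for (int t = 0; t < LW / LH; t++) {
                            int r = ly + t * LH;
                            Bs[r][lx] = B[(k0 + r) * N + (g1 + lx)];
                        }
                    }
                /* In the real kernel: barrier(CLK_LOCAL_MEM_FENCE); */
                /* Compute phase: each work-item accumulates its dot product. */
                for (int ly = 0; ly < LH; ly++)
                    for (int lx = 0; lx < LW; lx++)
                        for (int k = 0; k < LW; k++)
                            acc[ly][lx] += As[ly][k] * Bs[k][lx];
                /* In the real kernel: another barrier before the next load. */
            }
            for (int ly = 0; ly < LH; ly++)
                for (int lx = 0; lx < LW; lx++)
                    C[(g0 + ly) * N + (g1 + lx)] = acc[ly][lx];
        }
}

/* Returns 1 if the emulated work-group result equals a naive triple loop. */
static int workgroup_matches_naive(int M, int N, int K)
{
    float *A = malloc(sizeof(float) * M * K);
    float *B = malloc(sizeof(float) * K * N);
    float *C = malloc(sizeof(float) * M * N);
    int ok = 1;
    for (int i = 0; i < M * K; i++) A[i] = (float)(i % 7 - 3);
    for (int i = 0; i < K * N; i++) B[i] = (float)(i % 5 - 2);
    matmul_workgroup_emulated(A, B, C, M, N, K);
    for (int i = 0; i < M && ok; i++)
        for (int j = 0; j < N && ok; j++) {
            float ref = 0.0f;
            for (int k = 0; k < K; k++) ref += A[i * K + k] * B[k * N + j];
            if (ref != C[i * N + j]) ok = 0;
        }
    free(A);
    free(B);
    free(C);
    return ok;
}
```

    If K were instead consumed in chunks of LH, the roles flip: the A tile becomes LH × LH and the requirement becomes that LW divides LH, so which dimension has to be the larger one depends on the layout chosen.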

  3. #3
    Newbie · Join Date: Nov 2013 · Posts: 1
    See Volkov's paper on matrix multiplication in CUDA.