
Thread: Is this caused by memory contention?

  1. #1
    Junior Member
    Join Date
    Mar 2010
    Posts
    14

    Is this caused by memory contention?

    I suspect that I am running into memory contention in the setup below. Do you agree? If so, can I do anything about it?

    I send two large arrays to the GPU (as read-only buffers), and each kernel instance (work-item) computes an output value by performing a large number of lookups in a sub-area of each input array. I have run the program on an 8-core CPU and on a 240-core GPU, but the CPU is still marginally faster than the GPU. However, if I still provide the two large arrays as input but replace the array-lookup code with a purely local computation (no lookups in the arrays), the GPU is much faster than the CPU, as it should be.
    So doesn't this look like a memory contention problem, given that the only difference (as far as I can see) is the numerous array lookups? If so, can I deal with the contention in some way?

    The arrays are transferred like this:
    bs1_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=numpy.array(bs1).astype(numpy.int32))
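    For readers reproducing the transfer above: a minimal sketch of the host-side conversion, assuming the kernel reads the data as OpenCL C `int` (32 bits). The helper names `to_int32_hostbuf` and `make_readonly_buffer` and the sample values are hypothetical; `pyopencl` is imported lazily so the conversion can be tried without an OpenCL runtime.

```python
import numpy


def to_int32_hostbuf(values):
    """Return a C-contiguous int32 array; OpenCL C `int` is 32 bits wide."""
    return numpy.ascontiguousarray(values, dtype=numpy.int32)


def make_readonly_buffer(ctx, host_array):
    """Copy host_array into a read-only device buffer, as in the post above."""
    import pyopencl as cl  # lazy import: only needed when a device is used

    mf = cl.mem_flags
    return cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=host_array)


# Placeholder values for illustration, not the poster's data.
bs1 = [3, 1, 4, 1, 5, 9, 2, 6]
bs1_host = to_int32_hostbuf(bs1)
```

    `bs1_host.nbytes` then gives the size of the buffer that will be created on the device, which matters for deciding where the table can live.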

  2. #2
    Junior Member
    Join Date
    Mar 2010
    Location
    Germany
    Posts
    21

    Re: Is this caused by memory contention?

    Where is the lookup table located? It should be in __local or __constant memory: __local if the work-items write to it, otherwise __constant (more space, and it can be supplied by the caller).
    Lookup tables are bandwidth-limited.
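    The rule of thumb above can be sketched as a small helper. This is only an illustration: the function names are hypothetical, and the fallback to `__global` when a table does not fit is an assumption added here, not something stated in the reply. The limits differ per device, so they are queried through PyOpenCL's device attributes `max_constant_buffer_size` and `local_mem_size` rather than hard-coded.

```python
def suggest_address_space(table_bytes, written_by_kernel,
                          constant_limit, local_limit):
    """Pick an address space for a lookup table of table_bytes bytes.

    Tables the work-items write to go to __local, read-only tables go to
    __constant. Falling back to __global when the table is too large is an
    assumption made for this sketch.
    """
    if written_by_kernel:
        return "__local" if table_bytes <= local_limit else "__global"
    return "__constant" if table_bytes <= constant_limit else "__global"


def device_limits():
    """Return (constant_limit, local_limit) in bytes for the first device."""
    import pyopencl as cl  # lazy import: only needed when a device is queried

    dev = cl.create_some_context().devices[0]
    return dev.max_constant_buffer_size, dev.local_mem_size
```

    With "large arrays" as in the first post, it is worth checking the result before changing the kernel, since a table that exceeds the device limit cannot simply be declared __constant.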

  3. #3
    Junior Member
    Join Date
    Mar 2010
    Posts
    14

    Re: Is this caused by memory contention?

    OK, very interesting! How do I specify that the arrays should be located in "constant" memory? Or in "local" or "private" memory?

  4. #4
    Junior Member
    Join Date
    Mar 2010
    Location
    Germany
    Posts
    21

    Re: Is this caused by memory contention?

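    If you declare the kernel like this
    Code :
    __kernel void func(__constant int* cInts, __local int* lInts)
    the parameters are placed in constant and local memory respectively. For the __local parameter the host passes no data, only the number of bytes to reserve in local memory (in the C API, clSetKernelArg with that size and a NULL value pointer).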
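    Since the original poster uses PyOpenCL, here is a rough sketch of how those two kinds of arguments might be passed from Python. The kernel body, the extra `out` parameter, and the helper names `reference` and `run_on_device` are assumptions made for illustration only. A normal `cl.Buffer` is passed for the `__constant` parameter, and `cl.LocalMemory(nbytes)` reserves the `__local` space.

```python
import numpy

# OpenCL C source. Each work-item stages one element from __constant into
# __local memory, then reads its own and its neighbour's staged element.
KERNEL_SRC = """
__kernel void func(__constant int* cInts, __local int* lInts,
                   __global int* out)
{
    int gid = get_global_id(0);
    int lid = get_local_id(0);
    lInts[lid] = cInts[gid];           /* stage one element per work-item */
    barrier(CLK_LOCAL_MEM_FENCE);      /* wait for the whole work-group */
    int next = (lid + 1) % get_local_size(0);
    out[gid] = lInts[lid] + lInts[next];
}
"""


def reference(table, group_size):
    """NumPy version of the kernel, for checking the device result."""
    groups = table.reshape(-1, group_size)
    return (groups + numpy.roll(groups, -1, axis=1)).reshape(-1)


def run_on_device(table, group_size):
    """Run the kernel; len(table) must be a multiple of group_size."""
    import pyopencl as cl  # lazy import: only needed when a device is used

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags
    c_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=table)
    out = numpy.empty_like(table)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)
    prg = cl.Program(ctx, KERNEL_SRC).build()
    # The __local argument carries no data, only a size in bytes.
    local_bytes = group_size * table.itemsize
    prg.func(queue, table.shape, (group_size,),
             c_buf, cl.LocalMemory(local_bytes), out_buf)
    cl.enqueue_copy(queue, out, out_buf)
    return out
```

    On a machine with an OpenCL device, `run_on_device(table, 4)` should match `reference(table, 4)` for an int32 `table` whose length is a multiple of 4.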
