
Thread: Cross-device bandwidth for discrete GPU (HD 5870)

  1. #1
    Junior Member
    Join Date
    Aug 2011
    Posts
    25

    Cross-device bandwidth for discrete GPU (HD 5870)

    Hi,
    I'm testing a system equipped with a Fusion A8-3850 APU and a discrete HD 5870 GPU. I plan to measure the memory read bandwidth in the following cases:

    1) The discrete GPU (HD 5870) reads from a buffer allocated in the host memory (CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_ONLY)
    2) The integrated GPU (6550D) reads from a buffer allocated in the host memory (CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_ONLY)

    Reads are performed linearly (each thread reads a fixed-size memory range starting from its own global index).
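    To make the access pattern concrete, here is a minimal host-side emulation in plain C++ (no OpenCL involved). It is only a sketch: the names `AccessStats` and `linear_read_stats` are hypothetical, and it assumes that work-item `gid` reads consecutive elements `gid, gid+1, ...`, wrapping at the end of the buffer. The real kernel may index differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Total element reads issued by a launch, and how many distinct
// elements those reads actually touch.
struct AccessStats {
    std::size_t total_reads;
    std::size_t unique_elements;
};

// Assumption: work-item `gid` reads elements gid .. gid + reads_per_item - 1,
// wrapping around at the end of the buffer.
AccessStats linear_read_stats(std::size_t buffer_elements,
                              std::size_t work_items,
                              std::size_t reads_per_item) {
    assert(buffer_elements > 0);
    std::vector<bool> touched(buffer_elements, false);
    AccessStats stats{0, 0};
    for (std::size_t gid = 0; gid < work_items; ++gid) {
        for (std::size_t i = 0; i < reads_per_item; ++i) {
            const std::size_t index = (gid + i) % buffer_elements;
            if (!touched[index]) {
                touched[index] = true;
                ++stats.unique_elements;
            }
            ++stats.total_reads;
        }
    }
    return stats;
}
```

    Under that assumption, `linear_read_stats(4096, 1024, 32)` issues 32768 reads but touches only 1055 distinct elements, because neighbouring work-items re-read mostly the same data. If the real kernel behaves this way, caches could serve a large share of the reads.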

    I assumed that the result of the first test (discrete GPU) could never exceed the PCI Express bandwidth (approx. 8 GB/s), but I'm measuring around 40 GB/s.
    I measured the bandwidth with both the GlobalMemoryTest sample shipped with the AMD SDK and a program I wrote myself. The results are very similar.
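    The roughly 8 GB/s ceiling can be reproduced with a little arithmetic. The sketch below assumes a PCIe 2.0 x16 link (the slot configuration is not stated here) and uses hypothetical function names. It gives the theoretical one-direction figure before protocol overhead, so real transfers would be somewhat slower.

```cpp
#include <cassert>

// PCIe 2.0 signalling: 5 GT/s per lane with 8b/10b encoding,
// i.e. 8 payload bits for every 10 bits on the wire.
constexpr double kGen2GigaTransfersPerSec = 5.0;
constexpr double kEncodingEfficiency = 8.0 / 10.0;

// Theoretical one-direction bandwidth in GB/s (10^9 bytes per second).
constexpr double pcie_gen2_gb_per_s(int lanes) {
    return lanes * kGen2GigaTransfersPerSec * kEncodingEfficiency / 8.0;
}

// How many times a measured figure exceeds the link ceiling.
double ratio_to_link(double measured_gb_per_s, int lanes) {
    assert(lanes > 0);
    return measured_gb_per_s / pcie_gen2_gb_per_s(lanes);
}
```

    With 16 lanes this yields 8 GB/s, so a 40 GB/s measurement is about five times what such a link could deliver.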

    Can you explain whether, and why, a discrete GPU can achieve a cross-domain (GPU->CPU) read bandwidth higher than the PCI Express bandwidth?

    Thank you very much!

  2. #2
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Cross-device bandwidth for discrete GPU (HD 5870)

    Can you explain whether, and why, a discrete GPU can achieve a cross-domain (GPU->CPU) read bandwidth higher than the PCI Express bandwidth?
    CL_MEM_ALLOC_HOST_PTR doesn't guarantee that the memory is allocated in any particular place. All it guarantees is that calls to clEnqueueMapBuffer() and clEnqueueMapImage() will not return CL_MAP_FAILURE.
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer.

  3. #3
    Junior Member
    Join Date
    Aug 2011
    Posts
    25

    Re: Cross-device bandwidth for discrete GPU (HD 5870)

    Quote Originally Posted by david.garcia
    CL_MEM_ALLOC_HOST_PTR doesn't guarantee that the memory is allocated in any particular place. All it guarantees is that calls to clEnqueueMapBuffer() and clEnqueueMapImage() will not return CL_MAP_FAILURE.
    Oh
    In the last hour I set up a succinct test for the problem I encountered.

    Here is the link to the source code:
    Host code: http://www.gabrielecocco.it/fusion/SimpleMemoryTest.cpp
    Kernel: http://www.gabrielecocco.it/fusion/memory_test.cl

    And here is the output of the test (150 GB/s for the 5870, 42 GB/s for the 6550D, 14 GB/s for the CPU):
    C:\Users\gabriele\Desktop\CpuGpuTesting\Release>SimpleMemoryTest.exe
    - Tested devices listed below
    Cypress[GPU]
    BeaverCreek[GPU]
    AMD A8-3800 APU with Radeon(tm) HD Graphics[CPU]

    - Creating opencl environment for each tested device...
    Getting platform id... DONE!
    Searching device (Cypress)... DONE!
    Creating context... DONE!
    Creating command queue... DONE!
    Loading kernel file... DONE!
    Creating program with source... DONE!
    Building program... DONE!
    Creating kernel read_linear DONE!

    Getting platform id... DONE!
    Searching device (BeaverCreek)... DONE!
    Creating context... DONE!
    Creating command queue... DONE!
    Loading kernel file... DONE!
    Creating program with source... DONE!
    Building program... DONE!
    Creating kernel read_linear DONE!

    Getting platform id... DONE!
    Searching device (AMD A8-3800 APU with Radeon(tm) HD Graphics)...DONE!
    Creating context... DONE!
    Creating command queue... DONE!
    Loading kernel file... DONE!
    Creating program with source... DONE!
    Building program... DONE!
    Creating kernel read_linear DONE!

    - Testing Cypress [GPU] (16777216 bytes buffer, 32 reads per thread)
    Estimated bandwidth: 151460.05 MB/s (success = 1)

    - Testing BeaverCreek [GPU] (16777216 bytes buffer, 32 reads per thread)
    Estimated bandwidth: 42080.92 MB/s (success = 1)

    - Testing AMD A8-3800 APU with Radeon(tm) HD Graphics [CPU] (16777216 bytes buffer, 32 reads per thread)
    Estimated bandwidth: 14809.57 MB/s (success = 1)

    - Test ended. Press a key to exit...

    -----------------

    So, should I conclude that the buffer is placed on the GPU even if I specify the flags CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_ONLY?
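    One way to sanity-check the numbers is to compare them with theoretical ceilings. The sketch below is not part of SimpleMemoryTest.cpp and its function names are hypothetical. It assumes the log's "MB/s" means 10^6 bytes per second. The HD 5870 figures used with it (256-bit GDDR5 bus at 4.8 Gbit/s per pin) come from the card's published specifications, not from this test.

```cpp
#include <cassert>

// Assumption: "MB/s" in the log means 10^6 bytes per second.
constexpr double mb_to_gb(double mb_per_s) { return mb_per_s / 1000.0; }

// Peak bandwidth of a memory interface in GB/s:
// bus width (bits) x per-pin data rate (Gbit/s) / 8 bits per byte.
constexpr double peak_memory_gb_per_s(int bus_width_bits,
                                      double gbit_per_s_per_pin) {
    return bus_width_bits * gbit_per_s_per_pin / 8.0;
}

// Fraction of a theoretical ceiling that a measurement reaches.
double fraction_of(double measured_gb_per_s, double ceiling_gb_per_s) {
    assert(ceiling_gb_per_s > 0.0);
    return measured_gb_per_s / ceiling_gb_per_s;
}
```

    For the Cypress result, `peak_memory_gb_per_s(256, 4.8)` gives 153.6 GB/s, and `fraction_of(mb_to_gb(151460.05), 153.6)` is about 0.986. The same measurement is roughly 19 times an 8 GB/s PCI Express ceiling. That is consistent with the buffer being read from the card's local memory (or from cache) rather than over the bus, although it does not prove where the runtime placed it.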

    Thank you for your help!!!
