Search:

Type: Posts; User: steveStevens


  1. The bug report was submitted btw, will update...

    The bug report was submitted btw, will update when I get reply from eng.
  2. With the C++ wrapper it's not necessary. There...

    With the C++ wrapper it's not necessary. There are calls to the destructors in the C++ template functions in cl.hpp.

    Thanks for running that, but I would still like to see AMD multi-GPU (2+ PCIe...
  3. Also, could someone with two or more AMD GPUs...

    Also, could someone with two or more AMD GPUs please try my code out and confirm that there is concurrency? A profiler output would be nice, either text or graphical.
  4. Yeah, it's still messed up. Just to confirm I've...

    Yeah, it's still messed up. Just to confirm I've installed a new driver today:

    $ ldd program
    linux-vdso.so.1 => (0x00007ffd147f7000)
    libOpenCL.so.1 =>...
  5. Yeah I think this confirms that the CUDA runtime...

    Yeah I think this confirms that the CUDA runtime is running as it's supposed to be:

    http://imgur.com/a/TPoL1

    So basically...my libOpenCL.so is fucked. I'm going to try 352.30 for linux x64....
  6. Yeah I could rewrite in CUDA. That won't take too...

    Yeah I could rewrite in CUDA. That won't take too long (a day). Or just try a multi-GPU example, sure. It's frustrating because the spec (https://www.khronos.org/registry/cl/specs/opencl-1.1.pdf)...
  7. Nope, exactly the same thing. The edits I made to...

    Nope, exactly the same thing. The edits I made to simple_events:

    this->ocl_device_queues.push_back(
    cl::CommandQueue(this->ocl_context, this->ocl_devices[k],...
  8. Thanks for the reply. As far as I understand I'm...

    Thanks for the reply. As far as I understand I'm already doing what you suggested, no? Check out 115-117 and 148-171 in main.cc of branch simple_events.

    std::vector<cl::Buffer> ones;
    ...
  9. NVIDIA Multi Device Command Queue Concurrency Issue

    I'm struggling to understand why the execution of OpenCL enqueue* function calls is seemingly sequential in a multi-GPU environment with two independent CommandQueues. I have two GTX 780s. I'm using...
  10. Tesla C2050 - OpenCL - Kernel Concurrency Issue

    Hi All,

    This problem has some complex background so I will attempt to abstract as much as possible. I'm posting here as well as on the OpenCL forums because my problems are occurring with use of...
  11. Releasing Memory, Kernels, Devices etc (Replies: 1, Views: 1,234)

    I thought I read somewhere, (though, for the life of me, I can't find the source), that, using the C++ API you don't have to release devices/kernels/memory like w/ the C API as the destructors for...
  12. I guess some pictures would help. Here is exactly... (Replies: 2, Views: 1,552)

    I guess some pictures would help. Here is exactly what is happening in both programs.

    1) generate Gaussian vector
    2) zero pad Gaussian vector to next highest power of 2 length
    3) forward...
  13. As a follow up, I'm also wondering if maybe the... (Replies: 2, Views: 1,552)

    As a follow up, I'm also wondering if maybe the problem is that the number of intra-kernel operations I've got going on is too many for just using global memory and registers, (is that even a thing? a...
  14. Enqueue/Finish Scheme for FFT (Replies: 2, Views: 1,552)

    This is likely the first part of 2 posts related to some trouble I have involving an FFT signal cross correlation module I'm creating, (makes use of circular convolution theorem, etc etc). I'd like...
  15. Re: enequeueNDRangeKernel - parallel execution on OpenCL dev

    I can't seem to find this in the spec, but does cl::CommandQueue::finish() also perform the functionality of flush()?

    I'm guessing I can do away with the event vector altogether and just have 3,...
  16. Re: enequeueNDRangeKernel - parallel execution on OpenCL dev

    I just did some more investigating, and came up with the following:

    std::vector<cl::CommandQueue> deviceQueues;
    std::vector<cl::Event> eventVector;

    // Global Range:
    cl::NDRange...
  17. Re: enequeueNDRangeKernel - parallel execution on OpenCL dev

    So I've been trying to allocate this stuff dynamically like so:

    std::vector<cl::CommandQueue> deviceQueues;

    cl::NDRange globalRange(d2 / nDevices);

    cl::NDRange localRange(LOOP_UNROLL);
    ...
  18. Re: enequeueNDRangeKernel - parallel execution on OpenCL dev

    Gorgeous response, thank you.
  19. enequeueNDRangeKernel - parallel execution on OpenCL device?

    Say I have n OpenCL devices, and that the data, of size d2, has been partitioned into sections such that it complements the compute topology, memory buffers have been allocated, etc.

    Given something...
Results 1 to 19 of 21