Search:

Type: Posts; User: bmerry

Page 1 of 2

  1. Replies
    2
    Views
    1,428

    If you're doing this as a way to optimise your code, you should use the Compute Profiler, which can give you a lot of detail about your kernels and the GPU performance counters. NVIDIA seem to have...
  2. Replies
    5
    Views
    1,337

    I suspect that you misunderstand what OpenCL and GPUs are capable of. Code (such as OpenSSL) written for a CPU cannot just run on a GPU - it has to be rewritten.
  3. Yes, I believe that is what CL_MEM_USE_HOST_PTR...

    Yes, I believe that is what CL_MEM_USE_HOST_PTR is intended for. Another alternative is to use CL_MEM_ALLOC_HOST_PTR and then map the buffer afterwards - although that might not guarantee suitable...
  4. If it helps, I don't think there is anything...

    If it helps, I don't think there is anything wrong with your program. I ran it on a GeForce GTX 480 and it runs in about a second. So it seems more likely to be some sort of system problem. Since...
  5. Replies
    32
    Views
    16,220

    Sticky: Note that you can use malloc, VirtualAlloc,...

    Note that you can use malloc, VirtualAlloc, whatever, IF the platform supports fine-grained system SVM. I don't know the details but as far as I know it isn't always possible to support fine-grained...
  6. Replies
    1
    Views
    998

    Re: Regarding the work group size

    It's entirely up to the OpenCL driver. If you're using an NVIDIA GPU then the timeline profiler can tell you the actual work-group size (provided your driver and CUDA installation are sufficiently...
  7. Replies
    2
    Views
    1,000

    Re: write_imagef and image1d

    I don't know why it isn't working, but when you get build failures you should definitely look at the build log to see what the compiler is telling you. There are also command-line tools for this...
  8. Replies
    2
    Views
    1,015

    Re: Infinite loop invalidating command queue

    As far as I know there is no standardized TDR in X11, but individual vendors might still implement it. For example, this is from the README in NVIDIA drivers:

    Option "Interactive" "boolean"

    ...
  9. Replies
    1
    Views
    1,245

    Re: Cache flushing after 'clEnqueueNDRangeKernel'

    That is entirely a function of the specific hardware and driver you're using. Firstly, it has no effect on correctness - caches are an implementation detail that is invisible apart from their effect...
  10. Fermi: overlapping kernels from an in-order queue

    Hi

    I've been trying to profile some code by querying CL_PROFILING_COMMAND_START / CL_PROFILING_COMMAND_END on every event and adding up the times. However, I'm getting back overlapping intervals...
  11. Replies
    1
    Views
    1,014

    Re: partial reductions

    I suspect you're using the term "reduction" to mean something different to what it normally means in parallel programming. What you're doing looks more like partitioning.

    Are there other data...
  12. Replies
    3
    Views
    1,478

    Re: warp size vs # of SPs per SM

    I'm guessing you have a 2nd-gen Fermi (cc 2.1). The scheduling on those is a little weird and I don't entirely have my head around it myself, but if you read the CUDA C Programming Guide appendix on...
  13. Thread: texture memory

    by bmerry
    Replies
    1
    Views
    1,066

    Re: texture memory

    Take a look at the clEnqueueWriteImage function.
  14. Replies
    1
    Views
    1,236

    Re: explicit copy from host to device

    If you're passing CL_MEM_COPY_HOST_PTR when creating the buffer, the implementation will take care of doing the copy (and it will do it synchronously). In this situation you can't pass a NULL...
  15. Replies
    2
    Views
    1,091

    Re: set image format

    Unextended OpenCL doesn't have a mechanism to share device memory between a buffer and an image. You need to refer to the cl_khr_image2d_from_buffer extension in the OpenCL 1.2 extensions...
  16. Replies
    3
    Views
    1,823

    Re: running on GPU but not on CPU

    There isn't enough information there to be able to help. At the very least you'll need a stack trace plus information about your OpenCL driver, some code, your compilation options and so on... but...
  17. Re: OpenCL performances on NVIDIA GTX 260 and ATI Radeon HD

    Are you sure that's your code? I don't see how that can compile given that pLocal is never defined. I also don't see how it can be computing a dot product, given that it outputs an array rather than...
  18. Replies
    2
    Views
    9,705

    Re: Multiple access to global memory

    Optimization is very specific to the hardware you're targeting, and also to the problem. Without much more detail you're only going to get vague answers. Some of the things that are generally a good...
  19. Determining the real amount of local memory available

    Hi

    I'm trying to maintain some code that automatically selects some compilation parameters by querying the device for maximum work group size, maximum local memory size and so on. It's not a true...
  20. Replies
    1
    Views
    1,656

    Re: Linking Kernel with Static Library

    If I understand your question correctly, this .a file is ordinary CPU code that you would like to call from inside a kernel? I doubt there is any portable way to do that, precisely because the...
  21. Replies
    1
    Views
    1,347

    Re: Weird behavior of my kernel

    The code is pretty much unreadable in that form. If you take out the string quoting, indent it, and explain how it works there is more chance that somebody might take the time to understand it. If...
  22. Replies
    1
    Views
    1,274

    Re: The most bizarre error

    The NVIDIA drivers cache kernels they've seen before, and I've definitely come across bugs in the caching (mostly with #include). It might be that adding/removing the space affects whether...
  23. Re: Access to Intermediate Language under Xcode / 10.7 ???

    Under Linux on NVIDIA (and possibly Windows, never tried) you get back PTX, so if you can get access to such a machine it might help. Of course, the fact that the OSX implementation is doing...
  24. Replies
    2
    Views
    1,790

    Re: string sorting in opencl

    By the way, Google for "suffix array prefix doubling" to see the algorithm I'm referring to. I saw that in a previous post you'd used the brute-force algorithm, which is probably going to be much...
  25. Replies
    2
    Views
    1,790

    Re: string sorting in opencl

    Radix sort operates on integers rather than strings, so you'll need to construct an integer key for each string. If you implement the Burrows-Wheeler Transform using the suffix array approach then...
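
The last two results mention "suffix array prefix doubling" and building an integer key for each string so that a radix sort can be applied. The snippets are truncated, so the following is only a minimal CPU-side sketch of that general idea, not the code discussed in the thread. The function name `suffix_array` is hypothetical, and Python's built-in comparison sort stands in here for the radix sort that a GPU implementation would presumably use on the integer key pairs.

```python
def suffix_array(text):
    """Return the suffix array of `text` using prefix doubling.

    Illustrative sketch only: each round sorts suffixes by a pair of
    integer ranks, which is the kind of fixed-size integer key a radix
    sort could consume. Here the built-in sort is used instead.
    """
    n = len(text)
    if n == 0:
        return []

    # Round 0: rank each suffix by its first character.
    rank = [ord(c) for c in text]
    sa = list(range(n))
    k = 1

    while True:
        # Key for suffix i: (rank of its first k chars, rank of the next
        # k chars). A suffix that runs off the end gets -1, so shorter
        # suffixes sort first.
        def key(i):
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)

        # Re-rank: suffixes with equal keys share a rank.
        new_rank = [0] * n
        for j in range(1, n):
            bump = 1 if key(sa[j]) != key(sa[j - 1]) else 0
            new_rank[sa[j]] = new_rank[sa[j - 1]] + bump
        rank = new_rank

        # All ranks distinct means the order is final.
        if rank[sa[-1]] == n - 1:
            return sa
        k *= 2
```

For example, `suffix_array("banana")` returns `[5, 3, 1, 0, 4, 2]`. Because the compared prefix length doubles every round, only about log2(n) sorting passes are needed, whereas a brute-force approach compares whole suffixes directly.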
Results 1 to 25 of 33