Type: Posts; User: chippies

  1. You could use a parallel reduction with the min...

    You could use a parallel reduction with the min operator (instead of sum). Your reduction would simply keep track of the minimum value and the index of that value. If you only care about the...
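The min-with-index idea can be sketched in plain C. This is only a sequential stand-in for a work-group reduction, not the poster's code: the name `argmin_reduce`, the `GROUP_SIZE` constant and the `float` element type are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_SIZE 8  /* hypothetical work-group size; must be a power of two */

/* Returns the index of a smallest value in data[0..GROUP_SIZE-1].
 * The outer loop mirrors the steps of a tree reduction; the inner loop
 * stands in for the work items that would run concurrently on a device. */
size_t argmin_reduce(const float *data)
{
    float  val[GROUP_SIZE];   /* plays the role of local memory: values  */
    size_t idx[GROUP_SIZE];   /* plays the role of local memory: indices */

    assert(data != NULL);

    for (size_t i = 0; i < GROUP_SIZE; ++i) {
        val[i] = data[i];
        idx[i] = i;
    }

    /* Halve the number of active "work items" at every step. */
    for (size_t stride = GROUP_SIZE / 2; stride > 0; stride /= 2) {
        for (size_t i = 0; i < stride; ++i) {
            /* min operator instead of sum; the index travels with the value. */
            if (val[i + stride] < val[i]) {
                val[i] = val[i + stride];
                idx[i] = idx[i + stride];
            }
        }
        /* A real kernel would need a barrier here before the next step. */
    }
    return idx[0];
}
```

Calling `argmin_reduce` on an array of `GROUP_SIZE` floats gives the position of the minimum; the minimum itself is then `data[result]`. If the array contains ties, this sketch makes no promise about which of the tied indices is returned.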
  2. Are you sure you posted the right kernel? That...

    Are you sure you posted the right kernel? That one doesn't compile because kernels must have a return type of void (yours is missing a return type) and the identifier 'kernel' is a keyword...
  3. The Intel OpenCL C compiler tries to vectorise...

    The Intel OpenCL C compiler tries to vectorise your kernel, which is why the preferred vector size is always 1. If I remember one of their webinars correctly, the compiler will try to group work items...
  4. One warning though, this may sometimes result in...

    One warning though, this may sometimes result in the compiler automatically removing sections of your kernel because their results are no longer used. A very simple example is that if you comment...
  5. Re: copying a variable from host memory to device memory

    Since you want to copy the result back to host memory, you will have to store the value in a cl_mem object. You will have to allocate an array of one element and an OpenCL buffer with enough bytes...
  6. Re: 2D Arrays in local functions

    There is no new keyword in OpenCL. So the syntax will always be int oneDArray[100] for a 1D array, int twoDArray[5][5] for a 2D array, etc. Hence you would want to use uint z[5][5][5] I think. You...
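The fixed-size declarations mentioned above look like this in plain C, which shares the array syntax with OpenCL C. The function name `fill_and_sum` is hypothetical, and `unsigned int` is written out because plain C does not guarantee the `uint` spelling that OpenCL C provides.

```c
#include <assert.h>

/* Declares 1D, 2D and 3D arrays with compile-time sizes, fills them
 * with ones and returns the total number of elements touched. */
unsigned int fill_and_sum(void)
{
    int oneDArray[100];        /* 1D array */
    int twoDArray[5][5];       /* 2D array */
    unsigned int z[5][5][5];   /* 3D array; `uint z[5][5][5]` in OpenCL C */
    unsigned int sum = 0;

    /* The sizes are part of the type, so no allocation call is involved. */
    assert(sizeof z / sizeof z[0][0][0] == 125);

    for (int i = 0; i < 100; ++i)
        oneDArray[i] = 1;

    for (int i = 0; i < 5; ++i)
        for (int j = 0; j < 5; ++j) {
            twoDArray[i][j] = 1;
            for (int k = 0; k < 5; ++k)
                z[i][j][k] = 1;
        }

    for (int i = 0; i < 100; ++i)
        sum += (unsigned int)oneDArray[i];
    for (int i = 0; i < 5; ++i)
        for (int j = 0; j < 5; ++j) {
            sum += (unsigned int)twoDArray[i][j];
            for (int k = 0; k < 5; ++k)
                sum += z[i][j][k];
        }

    return sum;  /* 100 + 25 + 125 elements, each holding 1 */
}
```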
  7. Re: enequeueNDRangeKernel - parallel execution on OpenCL dev

    You are right that you can enqueue the kernels in a loop over all the devices. This will result in the devices processing the data concurrently.

    You find out when a kernel has finished by using...
  8. Re: OpenCL not finding devices as dll

    I don't have experience with your problem, but perhaps it would be easier if you used the existing .NET wrappers:
    The Open Toolkit library:...
  9. Re: System freeze on kernel execution

    It is absolutely normal for this to cause your system to freeze. Your original code would access some part of GPU memory and write to it. This could overwrite some piece of data that is used by a...
  10. Re: System freeze on kernel execution

    Your problem lies here:

    character = (unsigned char *)odata[0];

    This code takes the integer stored at location 0 in odata and casts it to a pointer to an unsigned char. I think you wanted...
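To make the difference concrete, here is a small plain C sketch. The element type `unsigned int` for odata is an assumption, both function names are hypothetical, and the second function shows only one possible intent, since the reply above is cut off before it says what was actually meant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What the quoted line does: the VALUE stored in odata[0] is reinterpreted
 * as an address. The resulting pointer is returned as a number and never
 * dereferenced, because it does not point at valid data. */
uintptr_t address_from_value(const unsigned int *odata)
{
    assert(odata != NULL);
    unsigned char *character = (unsigned char *)(uintptr_t)odata[0];
    return (uintptr_t)character;
}

/* One possible intent (assumption): take the ADDRESS of element 0 and
 * view its bytes as unsigned char. This pointer is safe to dereference. */
unsigned char first_byte_of_element(const unsigned int *odata)
{
    assert(odata != NULL);
    const unsigned char *character = (const unsigned char *)&odata[0];
    return character[0];
}
```

With the first form, a write through `character` lands at whatever address the integer happens to encode, which fits the kind of stray memory write described in this thread.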
  11. Re: subbuffes + 1.0

    That sounds like a bug in the OpenCL implementation. Perhaps filing a bug report on the Khronos bug tracker will help, but since it is Nvidia, there is little hope for that.
  12. Re: how to run on cpu graphics card

    It will run on the CPU. If you want the GPU that is built into the CPU then you must specify CL_DEVICE_TYPE_GPU.
  13. Re: regarding some problems

    Why vectorisation is fast: let's say your CPU can process vectors of 4 float values in a single instruction. That means 4 operations get done at once. If you don't vectorise your code then the CPU...
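A rough way to picture this in plain C is to compare an element-by-element loop with one that handles four floats per iteration. The function names are hypothetical, and plain C does not itself issue vector instructions: the grouped loop only shows the shape that a vectorising compiler, or a float4 operation in OpenCL C, may map onto a single 4-wide instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar version: one addition per loop iteration. */
void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Four-wide version: each iteration covers a group of 4 floats, the unit
 * that a 4-wide vector instruction would process at once. */
void add_four_wide(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        out[i + 0] = a[i + 0] + b[i + 0];
        out[i + 1] = a[i + 1] + b[i + 1];
        out[i + 2] = a[i + 2] + b[i + 2];
        out[i + 3] = a[i + 3] + b[i + 3];
    }
    /* Leftover elements when n is not a multiple of 4. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Both functions produce the same results; the second simply needs about a quarter of the loop iterations, which is where the speed-up comes from once each group becomes one instruction.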
  14. Re: Can i call the same kernel function multiple times in a

    You can call a kernel function an unlimited number of times within any sort of loop structure.
  15. Re: vectorization

    You might find it worthwhile to look up the vload* and vstore* functions. In your case, replace * with 16.
  16. Re: OPENCL distributed computing.

    You can try VCL, but that is specific to the Mosix Linux distribution.

    Alternatively, you can try using MPI to write distributed apps that run over multiple PCs...
  17. Re: Scheduling a work load that relies on sequential values

    If you can schedule your work over multiple threads when not using OpenCL then I don't see why you can't use multiple work items. That is the way to use all of the cores on your CPU.
  18. Re: OpenCL Kernel thread execution time

    There is no direct way of measuring the time taken by each thread.

    Estimating the time per thread might be possible for really simple kernels but requires assumptions about how the hardware works....
  19. Re: OpenCL slow compiling on AMD card

    Nvidia has spent many more years on the various parts of their compiler architecture than I think AMD has, so I am not too surprised that AMD's compiler is slower.
  20. Re: Device lost possible?

    I don't see any errors that are specific to your scenario, which makes me think that you could get any random error and that this will vary by vendor. CL_INVALID_DEVICE and CL_INVALID_PLATFORM might...
  21. Re: can we use structure in opencl

    Just a note for getting the alignment right, I think you should be using the cl_* types, i.e. cl_char, cl_float, etc. for the fields in your structure. Looking at cl_platform.h shows me that...
  22. Re: clEnqueueReadBuffer gives the error CL_OUT_OF_RESOURCES

    Hi bajil, I have had the same inexplicable error on a perfectly valid clEnqueueReadBuffer call before on my GeForce GTX 560 Ti. I have always found that it was as a result of one of my kernels...
  23. Re: openCL and VC++ 2010 “front end compiler failed build”

    Without seeing your kernel source code, I can only guess, but clBuildProgram should not be giving you any error about not finding stdio.h because it should not be looking for it. Please don't put...
  24. Re: row_pitch mishandled in ATI Radeon HD 7970

    You should post this with a complete minimal example reproducing the bug on the AMD OpenCL forums.
  25. Re: Performance of clEnqueueReadBuffer on different HW syste

    You don't mention the configuration of the system that is fast. The first thing that comes to mind is to ask what other software is running in the background on the Supermicro system? The other...
Results 1 to 25 of 90