Search:

Type: Posts; User: Peccable

Page 1 of 2 1 2

  1. Replies: 1, Views: 207

    You can use the same buffer as argument in both...

    You can use the same buffer as argument in both kernels. There is no need to transfer the data to the host unless you need to do host side computations on it. Just make sure you wait for the first...
  2. Replies: 1, Views: 339

    Mistake in the 1.2 reference pages

    Not sure if this is the proper place to report this, but the reference page for the integer clamp function says:



    Whereas it should say
  3. Replies: 0, Views: 330

    Bernstein polynomials

    I've done a bit of searching and it looks like C implementations of the Bernstein basis polynomials that evaluate arbitrary derivatives and do not use recursion are very hard to obtain.

    This seems...
  4. Replies: 7, Views: 695

    So I transposed the global memory buffers and...

    So I transposed the global memory buffers and this gave some improvement. Now the VALUBusy is typically 70% - 80% (SALUBusy is 10%). And overall the kernel performs about 60 Gflops.

    Fetch size is...
  5. Replies: 7, Views: 695

    Thanks a lot for your input. I've just tried the...

    Thanks a lot for your input. I've just tried the CodeXL profiler from AMD. Here is what it has to say about the kernel



    Method assembleMatrix__k3_Pitcairn1
    ExecutionOrder 543 ...
  6. Replies: 7, Views: 695

    Thanks, yes it's a sum of outer products,...

    Thanks, yes it's a sum of outer products, essentially equivalent to multiplication of two rectangular matrices. I've found a few examples, maybe my google-fu is no good, but they all seem to suggest...
  7. Replies: 2, Views: 414

    How do you obtain the error text about context...

    How do you obtain the error text about context creation? I don't see you checking the value of "ret" anywhere.
  8. Replies: 7, Views: 695

    A little optimization help anyone?

    Here is the kernel code:



    __kernel void assembleMatrix(const int R, const int r0, const int c0, __global const REAL_TYPE *glo_A, __global const REAL_TYPE *glo_B, __global REAL_TYPE *glo_M)
    {
    ...
  9. Replies: 2, Views: 762

    NVIDIA only provides support for OpenCL up to...

    NVIDIA only provides support for OpenCL up to version 1.1 (It comes together with the CUDA toolkit, download from their homepage). It is possible to use higher version features through the use of...
  10. You can call non-kernel functions in a kernel...

    You can call non-kernel functions in a kernel function.

    For example


    double doStuff(double a, double b, int n)
    {
    double ret = 1.0;

    for(int i = 0; i < n; i++)
  11. Replies: 3, Views: 637

    Thanks and congratulations! I hope this will...

    Thanks and congratulations!

    I hope this will help to increase the activity here on this forum.
  12. Looks like you may have swapped row and col...

    Looks like you may have swapped row and col indices here:

    sum+=A[j*numAColumns+k]*B[k*numBColumns+i];
    C[j*numCColumns+i]=sum;


    Assuming r is the row index and c is the column index it should...
  13. Replies: 1, Views: 553

    "professional" gpu's and pricing

    I've been looking at high-end OpenCL 1.2 compatible GPUs and it seems the difference in price between professional-targeted and private-targeted GPUs is quite high compared to the potential benefit....
  14. Replies: 1, Views: 861

    Re: Caching of source files

    If anyone has the same problem, I sort of solved it by deleting everything in NVIDIA\ComputeCache every time I change source files that are #included.
  15. Replies: 1, Views: 861

    Caching of source files

    When writing code with multiple source files, for example

    "fox.cl"


    struct Tango {
    float4 donut;
    float4 snow;
    };
  16. Re: Emulating vector insert/delete in kernel -is this safe?

    Ok, thanks for the reply. And I suppose that the non-parallelizable part might as well be done on the CPU between the two kernel executions.
  17. Replies: 1, Views: 887

    Re: clEnqueueCopyBufferToImage

    How do you get the context when using glut? In the reference for clCreateFromGLTexture3D it says under context:

    A valid OpenCL context created from an OpenGL 3D context.

    As far as I've gathered...
  18. Emulating vector insert/delete in kernel -is this safe?

    Say I have an array with elements of some type which I pass to the kernel as a constant source buffer. And I also have a destination buffer.

    Based on some condition on the value of the elements...
  19. Replies: 3, Views: 1,882

    Re: function clGetPlatformIDs returning error

    I can't say what the problem is, but there does not seem to be anything wrong with your code at least. It compiles and runs as expected on my system (code::blocks/MinGW/Quadro FX3800).

    Without more...
  20. Replies: 10, Views: 2,408

    Re: passing array of typedef'd structs to kernel

    You might pass the error code to a switch that converts it into a string and prints it, something like this for example:



    bool checkError(cl_int errMsg, const char *at)
    {
    ...
  21. Replies: 5, Views: 1,809

    Re: Small matrix operations

    Actually it was not too hard to figure out a way


    // One dimensional intersection of the open interval <0,1>
    bool sect1d(const float a, const float b, const float c, const float d)
    {
    bool...
  22. Replies: 5, Views: 1,809

    Re: Small matrix operations

    Interesting, thanks for the feedback. It seems quite difficult to avoid branching at times. For example, I'm now writing a function to test for intersecting tetrahedrons. Getting rid of conditionals...
  23. Replies: 5, Views: 1,809

    Small matrix operations

    After googling a bit it appears as though open source implementations of small matrix operations for OpenCL are not easy to come by.

    I frequently need such functionality so I have started with 3...
  24. Replies: 5, Views: 1,789

    Re: Global workgroup size and performance

    Could also be done like this (at the risk of having one superfluous multiple of local_ws):

    local_ws*(( N*M)/local_ws + 1)
    However, shorter code isn't always better or clearer, I'd say.
  25. Replies: 5, Views: 1,789

    Re: Global workgroup size and performance

    Thanks, you are right. Reducing local work-size to 128 more than halved the time used for computations.
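
The "Global workgroup size and performance" results above quote the expression local_ws*((N*M)/local_ws + 1) for padding the global work size to a multiple of the local work size, and note that it can add one superfluous multiple. As a minimal sketch of that trade-off, the two helpers below compare the quoted form with the usual exact round-up. The function names round_up and round_up_short are hypothetical and do not come from the thread; n stands in for N*M.

```c
#include <stddef.h>

/* Exact round-up: smallest multiple of local_ws that is >= n.
   Assumes local_ws > 0 and that n + local_ws - 1 does not overflow. */
static size_t round_up(size_t n, size_t local_ws)
{
    return ((n + local_ws - 1) / local_ws) * local_ws;
}

/* The shorter form quoted in the thread. It gives the same answer as
   round_up() unless n is already a multiple of local_ws, in which case
   it adds one extra, unneeded multiple of local_ws. */
static size_t round_up_short(size_t n, size_t local_ws)
{
    return local_ws * (n / local_ws + 1);
}
```

With either form the padded range can exceed n, so the kernel would typically need a bounds check such as `if (get_global_id(0) >= n) return;` before touching memory.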
Results 1 to 25 of 26