Type: Posts; User: PaulS

Page 1 of 3

  1. Re: CLH_ERROR_NO_BINARY_FOR_GPU when not using sampler!?

    You're right, that error does seem to be a general "something went wrong!" error, and the reasons for it are numerous. Sometimes it's worth compiling for the CPU as that can pick up errors in your...
  2. Re: Predicate variable must be in register state space

    ptxas sounds like an assembler for PTX code, which is NVIDIA's low-level representation for their GPUs. On Mac OS there's a compiler which compiles from OpenCL to PTX, and...
  3. Re: clBuildProgram returns CL_INVALID_BINARY for certain code

    Might be useful to know which platform you're on.

    According to the spec CL_INVALID_BINARY can only be returned "if (the) program is created with clCreateWithProgramBinary", so if you're supplying...
  4. Re: system goes into some indefinite loop - adding two vectors

    The size in the C code is 10, but the loop in the kernel runs 100 times. The result is that you overflow your memory objects.

    With that change, and the addition of a print loop:

    0 + 6 = 6
    5 + 7 = 12
    0 + 2 =...
  5. Re: Convolution with Wrapped Edges (Replies: 3; Views: 2,862)

    How about using a sampler variable that has CLK_ADDRESS_REPEAT set?
  6. Re: why use barrier? (Thread: why use barrier?; by PaulS; Replies: 2; Views: 3,555)

    Each work item calculates a value in the array sharedPos (note sharedPos is local and indexed by tidx which is the local id) which is then used in the core of the loop. The first barrier is to ensure...
  7. Re: cl_command_queue , cl_context as member variable of a class?

    No, no rules like that. I'm doing it all the time without problems.

    The retain count you give is 0xFEEEFEEE in hex, which looks to me like a pattern somebody might use to blank out a released...
  8. Re: Constant Memory latency (Replies: 3; Views: 4,629)

    Thanks for the link, but that sounds like it could be what's causing my problem. Each work-item is (intentionally) accessing these tables randomly, so I'll be cache missing. Sounds like it's worth an...
  9. Constant Memory latency (Replies: 3; Views: 4,629)

    So we know that on GPUs (Nvidia specifically) global memory access is a *lot* slower than local storage. Does anybody know how the other memory spaces, in particular constant memory, compare?

    I have...
  10. Re: Convolution Example/Tutorial from AMD (Replies: 3; Views: 3,509)

    The problem with that is that it all depends on how your data is stored in memory. Assuming your colour components are interleaved (as is normal), then reading 16 pixels of red into a single vector...
  11. Re: Call to fract causing crash (Replies: 2; Views: 2,425)

    Ok, just to close this, I found my problem.

    Once I'd gotten the integer part of my vector with fract I cast it to an int4



    float4 fp,ip;
    fp = fract(p, &ip);
    int4 idx = (int4) ip;
  12. Re: Not a call to fract causing crash (Replies: 2; Views: 2,425)

    Ok, forget that. It's not the call to fract, but something else. I think the optimiser must be playing tricks with me as I was putting printf's in before/after that call. As I could only see the ones...
  13. Re: error to string? (Thread: error to string?; by PaulS; Replies: 4; Views: 7,526)

    Feel free to use this as the basis of something, it's Objective C, but could be normal C with a little tweaking:


    + (NSString *) descriptionOfError:(cl_int) err {
    switch (err) {
    ...
  14. Call to fract causing crash (Replies: 2; Views: 2,425)

    This may well be a bug in the implementation I'm using, but I wanted to check if this is sound code.

    In my kernel I have the following lines:


    float func (float4 p) {
    float4 ip, fp;
    ...
  15. Non-power of 2 vector types (Replies: 2; Views: 3,964)

    Currently vectors can only be declared with a power-of-2 length, e.g. float2, float4, char8, int16.

    With that said, it seems to be possible (i.e. I can't find anything in the spec to...
  16. Re: Not having functions return values via pointer arguments

    I know what you mean, but I hope you're not right.

    We already have vector type, and different memory spaces, so there's no getting away from the fact that the OpenCL language* isn't C. It would be...
  17. Re: Time of clReleaseMemObject : strange behaviour (Replies: 14; Views: 6,713)

    Apologies if I've missed something, but isn't this just that the memory object can't be released until the asynchronous computation is complete. As soon as you add the kernel to the mix the work...
  18. Not having functions return values via pointer arguments

    There are various math functions which currently return a second value via a pointer given in the argument list. For example:

    gentype fract (gentype x, gentype *iptr)
    gentype sincos (gentype x,...
  19. Re: Profiling Code (Thread: Profiling Code; by PaulS; Replies: 15; Views: 6,704)

    Pulling this up from the depths, as I've been working on other things the last couple of weeks, but wanted to reply to these points.


    On SnowLeopard at the moment, declaring those parameters as...
  20. Re: Profiling Code (Thread: Profiling Code; by PaulS; Replies: 15; Views: 6,704)

    That's certainly a subtle distinction, but worth being aware of.

    The MADs seem to reduce the code a bit, and register usage (I've discovered that reading back the binary gives you an intermediate...
  21. Re: Profiling Code (Thread: Profiling Code; by PaulS; Replies: 15; Views: 6,704)

    I thought that was what I was doing with the wait_group_events call. Would a memory barrier do something that doesn't?



    I'll give it a go and report back when I have some time.
  22. Re: Profiling Code (Thread: Profiling Code; by PaulS; Replies: 15; Views: 6,704)

    I've now taken that improvement over to a version that shares an OpenGL VBO as the starPosition memory buffer object. This eliminates the read back and re-submit of all the position data to display...
  23. Re: Profiling Code (Thread: Profiling Code; by PaulS; Replies: 15; Views: 6,704)

    That's a big improvement. Up to about 15.5 Million now, so we're nearly 4x from where we started. The kernel now looks like this:

    kernel void particle(constant int numberOfGalaxies,

    ...
  24. Re: Profiling Code (Thread: Profiling Code; by PaulS; Replies: 15; Views: 6,704)

    It brings the GPU to approximate parity with the CPU (6M each). Worthwhile, but it's not the order of magnitude I'm looking for.



    I just gave it a go, and it cost me a little (about 300k...
  25. Re: Command Queue going invalid (Replies: 8; Views: 7,306)

    This seems to be it:

    kernel void foobar(constant float * in,
    global float * out) {

    int gid = get_global_id(0);

    for (int i = 0; i <...
Results 1 to 25 of 73
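
The "error to string?" reply in the results above says its Objective-C helper "could be normal C with a little tweaking", but the snippet is cut off after the `switch`. As a rough idea of what a plain C version might look like, here is a minimal sketch. It is not the code from that post: the names `cl_error_name` and `report_cl_error` are hypothetical, only a handful of error codes are covered, and the `typedef` and `#define` lines are stand-ins (using the values from the Khronos `cl.h` header) so that the sketch compiles without an OpenCL SDK installed.

```c
#include <stdio.h>

/* Stand-ins so this sketch compiles without the OpenCL headers.
   In real code, delete these and include the platform's cl.h instead.
   The numeric values are the ones defined in the Khronos cl.h header. */
typedef int cl_int;
#define CL_SUCCESS                  0
#define CL_DEVICE_NOT_FOUND       (-1)
#define CL_OUT_OF_RESOURCES       (-5)
#define CL_OUT_OF_HOST_MEMORY     (-6)
#define CL_BUILD_PROGRAM_FAILURE  (-11)
#define CL_INVALID_VALUE          (-30)
#define CL_INVALID_COMMAND_QUEUE  (-36)
#define CL_INVALID_BINARY         (-42)

/* Hypothetical helper: map an OpenCL error code to its symbolic name.
   Only a few codes are listed; extend the switch as needed. */
const char *cl_error_name(cl_int err) {
    switch (err) {
    case CL_SUCCESS:               return "CL_SUCCESS";
    case CL_DEVICE_NOT_FOUND:      return "CL_DEVICE_NOT_FOUND";
    case CL_OUT_OF_RESOURCES:      return "CL_OUT_OF_RESOURCES";
    case CL_OUT_OF_HOST_MEMORY:    return "CL_OUT_OF_HOST_MEMORY";
    case CL_BUILD_PROGRAM_FAILURE: return "CL_BUILD_PROGRAM_FAILURE";
    case CL_INVALID_VALUE:         return "CL_INVALID_VALUE";
    case CL_INVALID_COMMAND_QUEUE: return "CL_INVALID_COMMAND_QUEUE";
    case CL_INVALID_BINARY:        return "CL_INVALID_BINARY";
    default:                       return "unknown OpenCL error";
    }
}

/* Hypothetical convenience wrapper: print a message only when a call failed. */
void report_cl_error(const char *what, cl_int err) {
    if (err != CL_SUCCESS) {
        fprintf(stderr, "%s failed: %s (%d)\n", what, cl_error_name(err), err);
    }
}
```

A call site would look something like `report_cl_error("clBuildProgram", err);` straight after the API call, which at least turns a bare negative number into a name you can look up in the spec.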