Search:

Type: Posts; User: sean.settle

Page 1 of 6 1 2 3 4


  1. There is no need to test the precedence as it is...

    There is no need to test the precedence as it is a requirement of the spec (clarified in the most recent 2.1 spec). There is no requirement that even with an empty kernel the two must be the same,...
  2. CL_KERNEL_WORK_GROUP_SIZE takes precedence over...

    CL_KERNEL_WORK_GROUP_SIZE takes precedence over CL_DEVICE_MAX_WORK_GROUP_SIZE because device info has no knowledge of the kernel resource footprints or kernel attributes that may specify the max or...
  3. A couple of pages from the end of the Altera SDK...

    A couple of pages from the end of the Altera SDK for OpenCL Programming Guide (https://www.altera.com/en_US/pdfs/literature/hb/opencl-sdk/aocl_programming_guide.pdf) you can find the current...
  4. No, there is not, but you could create your own...

    No, there is not, but you could create your own iterating over the nextafter built-in function.

    gentype nextafter (gentype x, gentype y)

    Computes the next representable single-precision...
  5. Correction, the %h in the printf above should...

    Correction, the %h in the printf above should really be %x. Sorry for that lapse.
  6. To print the raw bytes of a float variable, you...

    To print the raw bytes of a float variable, you could do something like this:

    float x = 1.f;
    unsigned char *y = (unsigned char *) &x;
    printf("%h %h %h %h\n", y[0], y[1], y[2], y[3]);
    ...
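    A hedged sketch of the same idea with the %x conversion that the follow-up correction asks for. The helper name float_to_hex is hypothetical, the code assumes a 4-byte float, and the bytes come out in memory order, so the digits differ between little-endian and big-endian machines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the raw bytes of x, in memory order,
 * as 8 lowercase hex digits plus a terminating NUL. */
static void float_to_hex(float x, char out[9])
{
    unsigned char bytes[sizeof x];

    assert(sizeof x == 4);          /* assumption: 32-bit float */
    memcpy(bytes, &x, sizeof x);    /* copy the object representation */
    snprintf(out, 9, "%02x%02x%02x%02x",
             bytes[0], bytes[1], bytes[2], bytes[3]);
}
```

    Calling float_to_hex(x, buf) and then printf("%s\n", buf) on both the host and the device side gives two strings that can be compared byte for byte.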
  7. Replies
    5
    Views
    225

    Some CPUs and GPUs can run concurrent kernels, so...

    Some CPUs and GPUs can run concurrent kernels, so pipes may be as efficient as global memory and need less code, but not nearly as efficient as they can be when implemented on FPGAs.
  8. To be certain, have you printed out the bytes for...

    To be certain, have you printed out the bytes for those floats as stored in memory on the host and device to narrow down the differences? Are these single or double floats (as written they are...
  9. You have two methods depending on what you really...

    You have two methods depending on what you really want to achieve.

    A) You can always enqueue a larger NDRange so you have many work-items executing that one kernel concurrently. You can even...
  10. Yes, it is possible because the architecture of...

    Yes, it is possible because the architecture of Altera FPGAs is such that when you compile your kernels the compiler automatically creates one or more custom compute units for each kernel. You can...
  11. Each kernel in your binary file created with the...

    Each kernel in your binary file created with the offline compiler (e.g., aoc foo.cl -o foo.aocx) may run concurrently. Simply create a separate command queue for each such kernel you want to execute...
  12. Replies
    20
    Views
    6,501

    Sticky: It would be very helpful if both platforms and...

    It would be very helpful if both platforms and devices had some concise info that could be queried to correlate built binaries back to the target system for which the binary was built (either online...
  13. Replies
    20
    Views
    6,501

    Sticky: I would like to propose a way to specify the...

    I would like to propose a way to specify the required or maximum NDRange at kernel compile time. The need for this capability was highlighted when tasks were deprecated since there is now no way the...
  14. Replies
    2
    Views
    3,444

    Re: Reverse (.rev) Vector Components

    Reversing built-in vector types using shuffles and swizzles is explicit, but not readily portable across different vector lengths because function overloading is not included in the OpenCL C kernel...
  15. Re: OpenCL 1.2 Specification Update Feedback Thread

    I see, thanks again!

    Oh, and I didn't catch the extension part because the link is still OpenCL 1.2 Extensions Specification (revision 15, released November 15, 2011), although it has actually...
  16. Replies
    3
    Views
    3,394

    Re: opencl sobel filter and theory

    Take a look at the Sobel Filter for some accessible background info, and at Image Convolution for more details about optimizing the kernel in OpenCL.
  17. Re: OpenCL 1.2 Specification Update Feedback Thread

    Thanks Affie,

    So my initial intuition was correct. However, I still don't see the use for the buffer/array image types. I can't tell how they're different than the regular 1D, 2D, and 3D types....
  18. Re: OpenCL 1.2 Specification Update Feedback Thread

    Hi, I don't know exactly if this pertains to the original or updated 1.2 specification. It may be just me, but I'm a bit confused about the definitions of the various image objects in section 5.3.1...
  19. Replies
    5
    Views
    4,118

    Re: unknown type name kernel in opencl

    Change "_kernel" to "__kernel" or "kernel", and similarly for "_global".
  20. Replies
    1
    Views
    1,791

    Re: pointer type conversion

    It is my understanding that OpenCL does not allow indexing into built-in vector types using pointers. If you wish to accomplish this you should use *.hi, *.lo, etc. I believe the reason for this is...
  21. Replies
    5
    Views
    4,118

    Re: unknown type name kernel in opencl

    It seems that you have one too few (or many) underscores in your attributes, e.g., __kernel or kernel. Please check that you use either two or no underscores first.
  22. Re: Regular question about dual gpu on single board

    Given that the 5970 and 6990 were dual GPU graphics cards seen by OpenCL as two GPU devices each with one half the total advertised memory, I would say that will also be the case for the 7990. You...
  23. Do host allocated buffers need to be manually deallocated?

    If a buffer is created with CL_MEM_ALLOC_HOST_PTR, does that buffer have to be manually deallocated, or is that automatically done inside of clReleaseMemObject when its counter reaches zero? ...
  24. Multiple devices and CL_DEVICE_MEM_BASE_ADDR_ALIGN

    Are there any assumptions I can make about CL_DEVICE_MEM_BASE_ADDR_ALIGN? Without any additional information, if I have to program for multiple devices within a context then I have to ensure things...
  25. Replies
    0
    Views
    1,920

    get_group_offset

    I have a use case for a get_group_offset function. Suppose you wish to program some reduction algorithm using multiple devices. One option is to use implicit buffer transfers using a subbuffer for...
Results 1 to 25 of 133