Page 3 of 4
Results 21 to 30 of 33

Thread: Official OpenCL 2.0 Feedback thread

  1. #21
    Newbie
    Join Date
    Sep 2013
    Posts
    1
    Naming conflict

    Hi,

    I'm working on a full port of the API to Fortran, and the image structure cl_image_format conflicts with the CL_IMAGE_FORMAT enum value used for querying image info.
    The problem is that Fortran is not case sensitive and, just like in C, enum values are available in the global scope.
    This is not specific to OpenCL 2.0; it affects previous releases as well.

    The way I worked around this was to rename the enum values of cl_image_info to the form CL_IMAGE_INFO_FORMAT (the same principle applies to the others as well). Other than this, everything worked well.
    An important note: there is no good way to change scoping in Fortran that would solve the problem for users. One way or another they will have to use both constructs, and a compiler error will be raised.

    One suggestion is to add _INFO to all enums used for querying the different objects, for consistency.
    I do know that this can break support for previous versions, but it's a chance to do something good.
    As for the Fortran library, up to this point this is the only place where full conformance to the C OpenCL API is broken.

    Regards,
    Moti.

  2. #22
    Quote Originally Posted by pelotoescogorciao View Post
    I don't like the SVM mechanism you proposed in the spec... because:

    1. The existence of clSVMAlloc() sounds like a bad idea. The whole point of shared virtual memory(SVM) should be to REUSE an existing void* pointer externally-allocated using C/C++ malloc(), VirtualAlloc(), etc...
    Some libraries have their own memory manager and they are closed-source. How would I call clSVMAlloc() then?
    Note that you can use malloc, VirtualAlloc, or whatever else IF the platform supports fine-grained system SVM. I don't know the details, but as far as I know it isn't always possible to support fine-grained system SVM (depending on what sort of cache coherency the system has), which is why there are multiple levels of SVM support in the spec. It would certainly be interesting to get some idea of how widely supported system SVM will be (I assume it is trivial on CPU devices, but APUs, integrated GPUs and discrete GPUs get progressively more interesting).
    Quote Originally Posted by pelotoescogorciao View Post
    2. Passing CL_MEM_READ_WRITE/CL_MEM_READ_ONLY, etc. at creation time in clSVMAlloc() may not be a good idea: I would like to use the resource as read-only for some kernels and write-only for others...
    The read/write specification should be done at launch/execution time, not at creation time!
    Surely then you just allocate it CL_MEM_READ_WRITE - that is no different from how buffers are allocated.
    Quote Originally Posted by pelotoescogorciao View Post
    3. clSetKernelArgSVMPointer() is a bit confusing... why must I use a different function to pass a kernel's arguments?
    Why not use clSetKernelArg() as you do with other params?
    The problem is that if you pass the pointer through clSetKernelArg, the driver has no way of knowing whether you've passed a cl_mem (buffer) or an SVM pointer - that's a limitation of C, which has no type-based function overloading.
    Quote Originally Posted by pelotoescogorciao View Post
    To summarize: I don't like the mechanism you specified. I propose this instead:

    1. Let the user allocate bytes using their own traditional C/C++ malloc/free calls (perhaps with a 4K alignment requirement).
    As soon as you impose a 4K alignment requirement, you're back to the same issue you raised at the start: it won't play nice with memory allocated by other libraries that aren't aware of OpenCL. If you have to do something OpenCL-specific to allocate memory, it might as well be clSVMAlloc, which will know what the actual hardware-specific requirements are.
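On the clSetKernelArg point in this reply, here is a minimal plain-C sketch of why a size-plus-pointer interface cannot tell the two cases apart. Everything in it is hypothetical and for illustration only: `fake_mem_handle` stands in for an opaque handle type such as cl_mem, `set_arg` stands in for a clSetKernelArg-style entry point, and no OpenCL runtime is involved. It assumes a mainstream platform where all object pointers share one size and representation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an opaque buffer handle such as cl_mem. */
struct fake_mem { int dummy; };
typedef struct fake_mem *fake_mem_handle;

/* What a size + void* style setter gets to see: raw bytes and a length. */
typedef struct {
    unsigned char bytes[sizeof(void *)];
    size_t size;
} stored_arg;

/* Hypothetical setter: copies the argument bytes, with no type information. */
static stored_arg set_arg(size_t size, const void *value)
{
    stored_arg a;
    assert(size <= sizeof a.bytes);
    memset(a.bytes, 0, sizeof a.bytes);
    memcpy(a.bytes, value, size);
    a.size = size;
    return a;
}

/* Returns 1 when a handle and a raw pointer with the same address
   arrive at the setter as identical byte sequences. */
static int args_indistinguishable(void)
{
    struct fake_mem storage = { 0 };
    void *svm_ptr = &storage;        /* pretend this is an SVM pointer   */
    fake_mem_handle buf = &storage;  /* pretend this is a buffer handle  */

    stored_arg a = set_arg(sizeof svm_ptr, &svm_ptr);
    stored_arg b = set_arg(sizeof buf, &buf);
    return a.size == b.size && memcmp(a.bytes, b.bytes, a.size) == 0;
}
```

Under those assumptions `args_indistinguishable()` returns 1: the setter receives the same bytes either way, so the intent has to be carried by something else, such as a separate function name.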

  3. #23
    Junior Member
    Join Date
    Dec 2008
    Location
    Toronto, Ontario, Canada
    Posts
    19
    Hi everyone, I decided to provide detailed feedback in the form of blog articles. You can read them here: https://blog.ajguillon.com/

    The first relevant blog is: https://blog.ajguillon.com/2013/09/1...zation-issues/

    Thanks.

  4. #24
    Newbie
    Join Date
    Sep 2013
    Posts
    1
    Hi all,

    I'd be glad to see support for something similar to GL_ARB_draw_indirect from the OpenGL world, where you can specify a buffer in which the parameters for clEnqueueNDRangeKernel are stored.

    This is very useful if the parameters to clEnqueueNDRangeKernel are known on the GPU side and you do not want to do a round trip to the host.

    For now, the only workaround I can come up with is to always add some kind of check and offset, like this:

    __kernel void my_kernel(__global int *myParamsCalculatedInPrevKernelRun, __global int *pout)
    {
        // Parameters written by the previous kernel run
        int count = myParamsCalculatedInPrevKernelRun[0];
        int offset = myParamsCalculatedInPrevKernelRun[1];

        // Shift the work-item index and skip work-items beyond the requested count
        int gi = get_global_id(0) + offset;
        if (gi >= count + offset) return;

        // kernel code
        *pout = ...
    }
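To make the cost of this workaround concrete, here is a small plain-C sketch that replays the guard from the kernel above on the CPU, with no OpenCL runtime involved. The `simulate_guarded_launch` helper and the `launch_summary` type are hypothetical names introduced for illustration; `launched` plays the role of a global work size that the host has to choose without knowing `count`.

```c
#include <assert.h>
#include <stddef.h>

/* Result of simulating one over-provisioned launch on the CPU. */
typedef struct {
    size_t active;   /* work-items that got past the guard */
    int first_index; /* smallest shifted index that did work, -1 if none */
    int last_index;  /* largest shifted index that did work, -1 if none */
} launch_summary;

/* Hypothetical helper: applies the same shift and early-out as the kernel
   for every global id in [0, launched). */
static launch_summary simulate_guarded_launch(size_t launched, int count, int offset)
{
    launch_summary s = { 0, -1, -1 };
    assert(count >= 0);
    for (size_t gid = 0; gid < launched; ++gid) {
        int gi = (int)gid + offset;          /* same shift as the kernel */
        if (gi >= count + offset) continue;  /* same early-out as the kernel */
        if (s.active == 0) s.first_index = gi;
        s.last_index = gi;
        ++s.active;
    }
    return s;
}
```

With `launched = 1024`, `count = 100` and `offset = 5`, only 100 of the 1024 simulated work-items do any work, covering indices 5 to 104. If the host guesses too low (say `launched = 64`), only 64 items run and the rest of the range is silently skipped, which is the kind of guesswork an indirect dispatch would avoid.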


    P.S. I know that this is partly covered by launching a kernel from a kernel, but that is not the same.

    Thanks

  5. #25
    Newbie
    Join Date
    Jul 2013
    Posts
    5

    Hi All,

    I noticed that the CL_DEVICE_MEM_BASE_ADDR_ALIGN device property isn't explained in table 4.3. All it says is:
    The minimum value is the size (in bits) of the largest OpenCL built-in data type supported by the device (long16 in FULL profile, long16 or int16 in EMBEDDED profile) for devices that are not of type CL_DEVICE_TYPE_CUSTOM.
    That only describes a constraint on the property, not what it actually is. It appears the 1.0 spec actually had an explanation:

    Describes the alignment in bits of the base address of any allocated memory object.
    This explanation doesn't seem great, though. I started a thread (the forum won't let me post the link to it; the title is "Looking for a better explanation of CL_DEVICE_MEM_BASE_ADDR_ALIGN") and, based on the discussion, I'd like to propose the following description for the CL_DEVICE_MEM_BASE_ADDR_ALIGN property:

    The bit alignment required for a sub-buffer object's origin ( see clCreateSubBuffer() ) to be used with this device.
    The minimum value is the size (in bits) of the largest OpenCL built-in data type supported by the device (long16 in FULL profile, long16 or int16 in EMBEDDED profile) for devices that are not of type CL_DEVICE_TYPE_CUSTOM.
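Since the property is reported in bits while a sub-buffer origin is given in bytes, the proposed wording implies a unit conversion on the host. A minimal plain-C sketch of that check follows. The helper names are hypothetical, and `align_bits` is assumed to be the value a device query for CL_DEVICE_MEM_BASE_ADDR_ALIGN would return; no OpenCL calls are made here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when a sub-buffer origin (in bytes) satisfies
   an alignment reported in bits. */
static bool origin_is_aligned(size_t origin_bytes, unsigned align_bits)
{
    assert(align_bits != 0 && align_bits % 8 == 0);
    size_t align_bytes = align_bits / 8;  /* bits -> bytes */
    return origin_bytes % align_bytes == 0;
}

/* Hypothetical helper: smallest aligned origin that is >= origin_bytes. */
static size_t round_up_origin(size_t origin_bytes, unsigned align_bits)
{
    assert(align_bits != 0 && align_bits % 8 == 0);
    size_t align_bytes = align_bits / 8;
    return (origin_bytes + align_bytes - 1) / align_bytes * align_bytes;
}
```

For example, a reported value of 1024 bits means 128 bytes, so an origin of 256 passes the check while an origin of 64 does not and would round up to 128. A misaligned origin is the case clCreateSubBuffer reports as CL_MISALIGNED_SUB_BUFFER_OFFSET.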

  6. #26
    Junior Member
    Join Date
    Oct 2009
    Posts
    10
    Please add texture compression support, if not as a mandatory feature then at least as a KHR extension, say cl_khr_compressed_textures, along with a query for which compressed formats are supported for read-only images.
    Note that for projects such as https://github.com/a2flo/oclraster, which emulate the raster pipeline with OpenCL kernels, one major feature missing for parity with the graphics APIs is the use of compressed textures.
    Another project, http://cudaopencl.blogspot.fr/2013/0...raytracer.html, redirects OpenGL calls to a custom raytracer that almost surely uses GPGPU, so it is basically another example.

  7. #27
    Junior Member
    Join Date
    Dec 2008
    Location
    Toronto, Ontario, Canada
    Posts
    19
    I just want to let you know that I have posted another article regarding SPIR, and how it might fit with the standard overall. I have a few more articles to write on OpenCL 2.0 before I am finished giving feedback (issues that I have with the memory and execution models), but I believe that the "big picture" discussions are now complete. The main theme of my articles has really been a separation of hardware and software concerns, which I feel is best expressed by this latest article.

    Please provide any feedback you may have on my writing, since this is the first set of detailed technical articles I have written, and I want to ensure that I am getting my message across.

    Here is a link to the most recent article: https://blog.ajguillon.com/2013/10/0...ck-and-vision/

  8. #28
    Junior Member
    Join Date
    Oct 2009
    Posts
    10
    I also forgot to mention some other improvements:
    OpenGL now has sparse textures (through the optional ARB extension GL_ARB_sparse_texture), and so does DX11.2 (where they are called tiled resources and are also optional).
    Since both D3D 11.2 and OGL 4.4 support sparse textures in their compute shaders, it seems OpenCL 2.0 should catch up and add an optional cl_khr_sparse_image extension.
    OpenGL now also has cross-vendor bindless texture support via the optional ARB_bindless_texture extension (AMD expects to implement it as well), and it can be used from compute shaders.
    So it seems OpenCL should similarly publish an optional cl_khr_bindless_image extension.

  9. #29
    Newbie
    Join Date
    Dec 2013
    Posts
    1
    Hi all,
    It would be nice to add a blocking flag to the clEnqueueUnmap* set of commands: clEnqueueMap* can easily be called in either a blocking or a non-blocking way, while for unmapping one has to deal with events to make sure that the unmap has completed.

  10. #30
    Senior Member
    Join Date
    Oct 2012
    Posts
    108

    Illegal cast in Appendix B - Portability

    The example at the bottom of page 363 in appendix B uses illegal casts:

    Code :
    float4 v = vload4( 0, x );
    uint4 y = (uint4) v; // legal, portable
    ushort8 z = (ushort8) v; // legal, not portable

    These casts are explicitly described as not allowed in section 6.2.2 "Explicit casts" ("Explicit casts between vector types are not legal.")

    The example in Appendix B should use reinterpreting casting, such as:

    Code :
    float4 v = vload4( 0, x );
    uint4 y = as_uint4(v); // legal, portable
    ushort8 z = as_ushort8(v); // legal, not portable

    The same (wrong) example has been present since the OpenCL 1.0 specification.
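The difference between a value conversion and a reinterpretation can be shown on a single scalar in plain C. This is only a sketch of the distinction, not OpenCL C: `convert_value` and `reinterpret_bits` are hypothetical names, the second one is a scalar analogue of what as_uint() does, and the expected bit pattern assumes the usual IEEE-754 single-precision float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value conversion: what an ordinary scalar cast does. */
static uint32_t convert_value(float f)
{
    return (uint32_t)f;
}

/* Bit reinterpretation: keeps the 32 bits and changes only the type,
   as a scalar analogue of as_uint(). */
static uint32_t reinterpret_bits(float f)
{
    uint32_t u;
    assert(sizeof u == sizeof f);
    memcpy(&u, &f, sizeof u);  /* well-defined way to reuse the bits in C */
    return u;
}
```

For 1.0f the conversion yields 1, while the reinterpretation yields 0x3F800000. The Appendix B example is about the second behaviour, which is why as_uint4() and as_ushort8() fit it. Reinterpreting as a type with a different element count, as in the ushort8 line, exposes the element order in memory, which is presumably why that line is marked as not portable.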
