
Thread: OpenCL C++ Bindings

  1. #71
    Junior Member | Join Date: Apr 2010 | Location: Perth, WA | Posts: 27

    Re: OpenCL C++ Bindings

    Quote Originally Posted by monarodan
    Hi All,

    Notice the "fixme" in the code. Passing NULL here contradicts the function's documentation.

    To answer the question that is asked, yes, we do want to allow wait event lists. As written, there is an unexpected time bomb waiting to go off for those who set up their queues to allow out of order execution. Can this please be rectified (in all 16 places) to pass "events" through to enqueueNDRangeKernel() rather than NULL in a release in the near future?

    Cheers,

    Dan
    I thought I'd fix this in my local header by simply passing "events" in place of NULL to enqueueNDRangeKernel(). However, there was an unexpected complication. When invoking a functor with one argument as follows:

    Code :
    VECTOR_CLASS<Event> events;
    cl::KernelFunctor func = ...;
    cl::Event e = func( a1, &events );

    The KernelFunctor::operator() that is matched by my compiler (VS2008) is the one with two kernel args rather than the expected one kernel arg. If the default argument for the "events" parameter is removed, then the correct template function is used. Perhaps this is what the "fixme" in the code refers to?

    My resolution has been to remove the default value, but this may not satisfy many users of the API, as they will have to pass NULL when invoking their functors when there are no events to wait on. I would like to know how others think this should be fixed.
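The overload mismatch described above can be reproduced without OpenCL. This is a minimal model, assuming a two-overload stand-in for KernelFunctor::operator() (the struct names and the int return values that identify which overload ran are hypothetical; the real overloads return cl::Event). Because `&events` is a non-const `std::vector<Event>*`, the one-argument overload needs a pointer-to-const conversion, while the two-argument template deduces `A2 = std::vector<Event>*` and binds with no conversion at all, so overload resolution prefers it. Removing the defaults makes the two-argument overload require three arguments, which takes it out of the running.

```cpp
#include <cassert>
#include <vector>

// Stand-in for cl::Event so the demo compiles without OpenCL headers.
struct Event {};

// Hypothetical model of the KernelFunctor overload set: each overload ends in
// a defaulted wait-list pointer. The int result only identifies the overload.
struct FunctorWithDefaults {
    template <typename A1>
    int operator()(const A1&, const std::vector<Event>* = nullptr) { return 1; }

    template <typename A1, typename A2>
    int operator()(const A1&, const A2&, const std::vector<Event>* = nullptr) { return 2; }
};

// Same overload set with the defaults removed.
struct FunctorWithoutDefaults {
    template <typename A1>
    int operator()(const A1&, const std::vector<Event>*) { return 1; }

    template <typename A1, typename A2>
    int operator()(const A1&, const A2&, const std::vector<Event>*) { return 2; }
};

// Both helpers make the call from the post: func(a1, &events).
inline int callWithDefaults() {
    std::vector<Event> events;
    int a1 = 0;
    FunctorWithDefaults func;
    return func(a1, &events);  // picks the two-kernel-argument overload (A2 = std::vector<Event>*)
}

inline int callWithoutDefaults() {
    std::vector<Event> events;
    int a1 = 0;
    FunctorWithoutDefaults func;
    return func(a1, &events);  // only the one-kernel-argument overload is viable
}
```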

    One possibility is to specify the events to wait on as part of binding the KernelFunctor:

    Code :
    VECTOR_CLASS<Event> events;
    cl::KernelFunctor func = kernel.bind( queue, cl::NDRange(...),  cl::NDRange(...), &events );
    cl::Event e = func( a1 );
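A minimal sketch of the bind-time alternative, assuming hypothetical `BoundFunctor` and `EnqueueRecord` types in place of cl::KernelFunctor and a real enqueue. The wait list is captured by the constructor, so operator() takes only kernel arguments and there is no defaulted pointer parameter for a template overload to swallow.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Event { int id; };

// Hypothetical record of what would be passed to clEnqueueNDRangeKernel.
struct EnqueueRecord {
    std::size_t global = 0;
    std::size_t local = 0;
    const std::vector<Event>* waitList = nullptr;
    int arg0 = 0;
};

// Hypothetical functor that captures the wait list at bind time, so
// operator() takes only kernel arguments.
class BoundFunctor {
public:
    BoundFunctor(std::size_t global, std::size_t local,
                 const std::vector<Event>* waitList = nullptr)
        : global_(global), local_(local), waitList_(waitList) {}

    // Stands in for clSetKernelArg followed by clEnqueueNDRangeKernel.
    EnqueueRecord operator()(int a1) const {
        EnqueueRecord r;
        r.global = global_;
        r.local = local_;
        r.waitList = waitList_;  // forwarded rather than hard-coded to NULL
        r.arg0 = a1;
        return r;
    }

private:
    std::size_t global_;
    std::size_t local_;
    const std::vector<Event>* waitList_;
};
```

Since only a pointer is stored, the events vector must outlive the functor, and its contents are read when the functor is invoked rather than when it is bound.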

    Thoughts?

    Cheers,

    Dan
    Daniel Paull
    Real Engineers Think Bottom Up.

  2. #72

    Re: OpenCL C++ Bindings

    There has been some talk about merging the KernelFunctor convenience methods into the Kernel class, because the way the API is structured now is misleading. For example, it appears that the same kernel could be bound to multiple queues, like the following:

    Code :
    KernelFunctor func1 = kernel.bind(queue1);
    func1(arg1);
     
    KernelFunctor func2 = kernel.bind(queue2);
    func2(arg2);

    This is quite evil when mixed with a multi-threaded host, an asynchronous queue, or non-blocking calls. The solution, I think, is to move the operator() methods from cl::KernelFunctor into cl::Kernel. It appears to me that a cl_kernel object should only be used for a single kernel invocation at a time (though I can't find anything in the standard stating explicitly to what extent, and when, cl_kernel objects can be reused).
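The hazard comes from kernel arguments living on the cl_kernel itself, so two functors bound to one kernel share argument slots even when they target different queues. The sketch below models that with hypothetical `KernelState`, `Queue` and `Functor` types (not the bindings' classes); `setArg` plays the role of clSetKernelArg and `enqueue` the role of clEnqueueNDRangeKernel, which captures the current argument values.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical model of a cl_kernel: argument values are stored on the
// kernel object itself, as clSetKernelArg does.
struct KernelState {
    int arg0 = 0;
};

// Hypothetical queue that records the argument value each launch captured.
struct Queue {
    std::vector<int> launchedWith;
};

// Hypothetical functor: binding to a queue does not copy the kernel, so
// functors bound to different queues still share one set of arguments.
class Functor {
public:
    Functor(std::shared_ptr<KernelState> kernel, Queue& queue)
        : kernel_(std::move(kernel)), queue_(&queue) {}

    void setArg(int value) { kernel_->arg0 = value; }                  // like clSetKernelArg
    void enqueue() { queue_->launchedWith.push_back(kernel_->arg0); }  // args captured here

private:
    std::shared_ptr<KernelState> kernel_;
    Queue* queue_;
};
```

Interleaving the calls by hand, as in `f1.setArg(1); f2.setArg(2); f1.enqueue();`, makes queue 1 launch with argument 2, which is exactly what a second host thread can do between one functor's clSetKernelArg and its enqueue.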

    So back to your question about event wait lists. With the above change I think it would make sense to add another method to the cl::Kernel object for setting an event wait list. At the end of the day it will look like this:

    Code :
    kernel.bind(queue, cl::NDRange(...),  cl::NDRange(...));
    kernel.waitForEvents(events);
    kernel(a1);

    Though if we were to go this route we could even separate out the NDRanges into their own method as well:
    Code :
    kernel.setCommandQueue(queue);
    kernel.setNDRange(cl::NDRange(...),  cl::NDRange(...));
    kernel.waitForEvents(events);
    kernel(a1);

    Though this is a part of the API I've never really had a solid feeling about how it should be designed. PyOpenCL does it differently, requiring everything to be passed to the functor. I started a thread over there to discuss the design (http://host304.hostmonster.com/pipermai ... 00288.html), and they seem to have the same "not quite sold" mentality. Personally, I can never remember long argument lists and I think they're prone to bugs, so I prefer the many methods approach.

    Any other preferences from people?

  3. #73
    Junior Member | Join Date: Apr 2010 | Location: Perth, WA | Posts: 27

    Re: OpenCL C++ Bindings

    Quote Originally Posted by coleb
    There has been some talk about merging the KernelFunctor convenience methods into the Kernel class.
    I second that!

    So back to your question about event wait lists. With the above change I think it would make sense to add another method to the cl::Kernel object for setting an event wait list. At the end of the day it will look like this:

    Code :
    kernel.bind(queue, cl::NDRange(...),  cl::NDRange(...));
    kernel.waitForEvents(events);
    kernel(a1);

    Though if we were to go this route we could even separate out the NDRanges into their own method as well:
    Code :
    kernel.setCommandQueue(queue);
    kernel.setNDRange(cl::NDRange(...),  cl::NDRange(...));
    kernel.waitForEvents(events);
    kernel(a1);

    Though this is a part of the API I've never really had a solid feeling about how it should be designed. <snip>. Personally, I can never remember long argument lists and I think they're prone to bugs, so I prefer the many methods approach.
    I guess there are a couple of design principles that I follow that might help us converge on an API.

    1) Identify attributes that are immutable for the lifetime of the class, pass these into the constructor and use them to initialise const members. In the case of the Kernel, it may be reasonable to think of the binding to a queue as immutable and set only when the Kernel is constructed. Note: if I understand the comments on the PyOpenCL thread you linked to correctly, the use of immutable members overcomes the multi-threading concerns of using a "bound kernel".

    2) Avoid having a many-function API (as you suggest) where it becomes an error to call one method if another has not yet been called. For example, it would be an error to invoke the kernel if you had not yet set the global NDRange (note that the local NDRange could default to a null range, so setting it could be stripped out into another method).

    3) Global functions should be considered as a means to add a layer on top of a class to, say, simplify its use, without compromising the API of the class itself.

    I feel that the following code is not desirable because if you forget to call either bind() or setCommandQueue(), then operator() will have to indicate an error somehow (and that error is not a cl_int)!

    Code :
    kernel.setCommandQueue(queue);
    kernel.bind(cl::NDRange(...),  cl::NDRange(...));
    kernel.waitForEvents(events);
    kernel(a1);

    I'd like to understand the overheads of creating kernels better. I have assumed (and have only anecdotal evidence) that creating/destroying kernels is lightweight, so I do not try to reuse kernels in my code. Given that, I would tend to use overloaded constructors to specify the queue (required arg), global range (required arg), local range (optional arg) and events to wait on (optional arg). I would still use operator() to invoke the kernel as per the current KernelFunctor API. That's only four constructors to be written, assuming that default arguments aren't used.
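The constructor-based proposal can be sketched as follows, assuming hypothetical `Queue`, `Event` and `BoundKernel` types and an all-zero `Range` standing in for a null local range. The queue, ranges and wait list are const members fixed at construction (principle 1), and the four constructors cover the required/optional combinations without default arguments.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Queue { int id; };
struct Event {};
using Range = std::array<std::size_t, 3>;

// Hypothetical bound kernel following principle 1: the queue, ranges and
// wait list are const members fixed by the constructor.
class BoundKernel {
public:
    BoundKernel(Queue q, Range global)
        : queue_(q), global_(global), local_{}, waitList_(nullptr) {}
    BoundKernel(Queue q, Range global, Range local)
        : queue_(q), global_(global), local_(local), waitList_(nullptr) {}
    BoundKernel(Queue q, Range global, const std::vector<Event>* waitList)
        : queue_(q), global_(global), local_{}, waitList_(waitList) {}
    BoundKernel(Queue q, Range global, Range local, const std::vector<Event>* waitList)
        : queue_(q), global_(global), local_(local), waitList_(waitList) {}

    const Queue& queue() const { return queue_; }
    const Range& global() const { return global_; }
    // An all-zero local range stands in for a null range (runtime chooses).
    bool hasLocalRange() const { return local_[0] != 0; }
    const std::vector<Event>* waitList() const { return waitList_; }

private:
    const Queue queue_;
    const Range global_;
    const Range local_;
    const std::vector<Event>* const waitList_;
};
```

One consequence of const members is that such an object can be copied but not assigned to, which is what makes it safe to hand around and also what makes reusing one kernel with a different range awkward.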

    Cheers,

    Dan

  4. #74

    Re: OpenCL C++ Bindings

    Quote Originally Posted by monarodan
    I guess there are a couple of design principles that I follow that might help us converge on an API.

    1) Identify attributes that are immutable for the lifetime of the class, pass these into the constructor and use them to initialise const members. In the case of the Kernel, it may be reasonable to think of the binding to a queue as immutable and set only when the Kernel is constructed. Note: if I understand the comments on the PyOpenCL thread you linked to correctly, the use of immutable members overcomes the multi-threading concerns of using a "bound kernel".
    Using the same kernel from multiple threads will always be flat out evil. There's not much that can be done about it since clSetKernelArg and the subsequent clEnqueue* commands will always race. So I just make sure to always use separate kernels for separate threads. The key is that we don't want to give the impression from the C++ API that it is thread-safe (I think getting rid of the KernelFunctor accomplishes this).
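A minimal sketch of the one-kernel-per-thread rule, using `thread_local` to build a kernel lazily the first time each host thread asks for one and then reusing it for every later launch from that thread. `Kernel` and `makeKernel` are hypothetical stand-ins for cl::Kernel and its construction from a program; the counter only exists to show that each thread builds exactly one kernel.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for cl::Kernel with one mutable argument slot.
struct Kernel {
    int arg0 = 0;
    void setArg(int v) { arg0 = v; }
};

// Counts kernel constructions so we can see one per thread.
inline std::atomic<int>& kernelsCreated() {
    static std::atomic<int> n{0};
    return n;
}

// Hypothetical factory standing in for cl::Kernel(program, "name").
inline Kernel makeKernel() {
    ++kernelsCreated();
    return Kernel{};
}

// One cached kernel per host thread, created on first use in that thread.
inline Kernel& kernelForThisThread() {
    thread_local Kernel k = makeKernel();
    return k;
}

// Each worker launches repeatedly; seen[t] is written only by thread t.
inline std::vector<int> runWorkers(int nThreads, int launchesPerThread) {
    std::vector<int> seen(nThreads, -1);
    std::vector<std::thread> workers;
    for (int t = 0; t < nThreads; ++t) {
        workers.emplace_back([t, launchesPerThread, &seen] {
            for (int i = 0; i < launchesPerThread; ++i) {
                Kernel& k = kernelForThisThread();  // never shared with another thread
                k.setArg(t * 1000 + i);
                seen[t] = k.arg0;                   // what an enqueue would capture
            }
        });
    }
    for (auto& w : workers) w.join();
    return seen;
}
```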

    Quote Originally Posted by monarodan
    2) Avoid having a many-function API (as you suggest) where it becomes an error to call one method if another has not yet been called. For example, it would be an error to invoke the kernel if you had not yet set the global NDRange (note that the local NDRange could default to a null range, so setting it could be stripped out into another method).
    I agree with this principle, though when the rubber hits the road it can be difficult to adhere to 100% of the time. The STL, for example, does a fantastic job of following this principle, yet it sometimes slips: vector::front is undefined on an empty vector.

    I just feel like with cl::Kernel we've been backed into a rather nasty corner and I'm keeping my eyes open for any way out.

    Quote Originally Posted by monarodan
    3) Global functions should be considered as a means to add a layer on top of a class to, say, simplify its use, without compromising the API of the class itself.

    I feel that the following code is not desirable because if you forget to call either bind() or setCommandQueue(), then operator() will have to indicate an error somehow (and that error is not a cl_int)!

    Code :
    kernel.setCommandQueue(queue);
    kernel.bind(cl::NDRange(...),  cl::NDRange(...));
    kernel.waitForEvents(events);
    kernel(a1);

    I'd like to understand the overheads of creating kernels better. I have assumed (and have only anecdotal evidence) that creating/destroying kernels is lightweight, so I do not try to reuse kernels in my code. Given that, I would tend to use overloaded constructors to specify the queue (required arg), global range (required arg), local range (optional arg) and events to wait on (optional arg). I would still use operator() to invoke the kernel as per the current KernelFunctor API. That's only four constructors to be written, assuming that default arguments aren't used.

    Cheers,
    Dan
    So I've seen mixed evidence for how lightweight kernels are. Creating a kernel in the OSX implementation is quite fast (~1 microsecond). The NVidia Linux implementation is a lot worse, taking ~1 millisecond. I've already submitted a performance bug report to NVidia about this, though I don't have a sense of how fundamental a problem this is for them.
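Figures like these vary by implementation, so it is worth measuring on your own platform. A small timing harness, assuming the callable passed in creates (and destroys) one kernel; `averageMicros` is a hypothetical helper, and any callable can be passed so the sketch compiles without OpenCL.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Average wall-clock cost of one call to `create`, in microseconds.
// In real use `create` would construct a cl::Kernel (clCreateKernel).
inline double averageMicros(const std::function<void()>& create, int iterations) {
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < iterations; ++i) create();
    const std::chrono::duration<double, std::micro> elapsed = Clock::now() - start;
    return elapsed.count() / iterations;
}
```

For example, `averageMicros([&] { cl::Kernel k(program, "myKernel"); }, 1000)` would time construction plus release of a kernel; `program` and `"myKernel"` are placeholders.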

    So in my application I cache a cl::Kernel for every host thread and then launch it several times. Also note that each invocation has a separate global range for me, so the range is not constant across the lifetime of the kernel. Theoretically, the command queue could change as well: imagine a system that load balances across multiple command queues by dequeueing the next kernel to invoke from a protected FIFO queue. The global range is optional as well, since kernels can be queued with clEnqueueTask. So in reality the command queue, global range, local range, events, and arguments are all mutable attributes of a kernel.

    Luckily, there is an error condition for an invalid command queue being passed to clEnqueueNDRangeKernel: CL_INVALID_COMMAND_QUEUE. Hopefully, passing NULL will trigger this error. Also, the default global and local range can be (1, 1, 1), which is equivalent to a clEnqueueTask. So maybe setters aren't so bad...
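The setter approach with safe defaults might look like the sketch below. `SetterKernel`, `Queue` and `Event` are hypothetical; the wrapper checks for a missing queue itself and returns the value of CL_INVALID_COMMAND_QUEUE (-36 in cl.h) rather than relying on the driver to reject NULL, and unset ranges default to (1, 1, 1), a single work-item, as with clEnqueueTask.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Values from CL/cl.h, redefined so the sketch compiles without OpenCL.
constexpr int kSuccess = 0;                // CL_SUCCESS
constexpr int kInvalidCommandQueue = -36;  // CL_INVALID_COMMAND_QUEUE

struct Queue { int id; };
struct Event {};
using Range = std::array<std::size_t, 3>;

// Hypothetical setter-style kernel in which every setting has a default.
class SetterKernel {
public:
    void setCommandQueue(const Queue* queue) { queue_ = queue; }
    void setNDRange(Range global, Range local) { global_ = global; local_ = local; }
    void waitForEvents(const std::vector<Event>* events) { waitList_ = events; }

    // Stands in for clSetKernelArg + clEnqueueNDRangeKernel; a missing
    // queue is reported as an error code rather than failing another way.
    int operator()(int /*a1*/) const {
        if (queue_ == nullptr) return kInvalidCommandQueue;
        return kSuccess;
    }

    std::size_t globalWorkItems() const { return global_[0] * global_[1] * global_[2]; }
    std::size_t localWorkItems() const { return local_[0] * local_[1] * local_[2]; }
    const std::vector<Event>* waitList() const { return waitList_; }

private:
    const Queue* queue_ = nullptr;
    Range global_{{1, 1, 1}};  // default (1, 1, 1): one work-item, like clEnqueueTask
    Range local_{{1, 1, 1}};
    const std::vector<Event>* waitList_ = nullptr;
};
```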

    I'm not sure, just putting out more talking points.

    -Brian

  5. #75
    Junior Member | Join Date: Apr 2010 | Location: Perth, WA | Posts: 27

    Re: OpenCL C++ Bindings

    Quote Originally Posted by coleb
    Quote Originally Posted by monarodan
    2) Avoid having a many-function API (as you suggest) where it becomes an error to call one method if another has not yet been called. For example, it would be an error to invoke the kernel if you had not yet set the global NDRange (note that the local NDRange could default to a null range, so setting it could be stripped out into another method).
    I agree with this principle, though when the rubber hits the road it can be difficult to adhere to 100% of the time. The STL, for example, does a fantastic job of following this principle, yet it sometimes slips: vector::front is undefined on an empty vector.
    Quite true, and ain't it annoying!

    Quote Originally Posted by coleb
    So I've seen mixed evidence for how lightweight kernels are. Creating a kernel in the OSX implementation is quite fast (~1 microsecond). The NVidia Linux implementation is a lot worse, taking ~1 millisecond. I've already submitted a performance bug report to NVidia about this, though I don't have a sense of how fundamental a problem this is for them.

    So in my application I cache a cl::Kernel for every host thread and then launch it several times. Also note, each invocation then has a separate global range for me, so this is not constant across the lifetime of the kernel. Theoretically, the command queue could change as well: imagine a system that load balances across multiple command queues by dequeueing the next kernel to invoke from a protected FIFO queue.
    If we are going to try to achieve reuse of kernels, then things change considerably. I think that cl::Kernel should remain as simple and flexible as possible, even if that makes it a pain to use. We can then write a set of utilities that make it simple to reuse kernels, say a "kernel pool" or a similar concept. I like the way OpenGL works in this respect, with the separation of the GL and GLU libraries. For example, glFrustum() is a pain to use (but very flexible), whereas gluPerspective() is simple to use and satisfies almost all uses of glFrustum().
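A kernel pool along those lines might look like this sketch: callers borrow an object for one setArg/enqueue sequence and hand it back, so objects are reused but never used by two callers at once. `Pool` is a hypothetical name; `T` would be cl::Kernel with a factory that builds one from a program, and the mutex makes acquire/release safe from multiple host threads.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical pool: borrowed objects are owned exclusively by the caller
// until they are released back into the free list.
template <typename T>
class Pool {
public:
    explicit Pool(std::function<T()> factory) : factory_(std::move(factory)) {}

    // Reuse an idle object if there is one, otherwise build a new one.
    std::unique_ptr<T> acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (free_.empty()) return std::unique_ptr<T>(new T(factory_()));
        std::unique_ptr<T> item = std::move(free_.back());
        free_.pop_back();
        return item;
    }

    void release(std::unique_ptr<T> item) {
        assert(item != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(std::move(item));
    }

    std::size_t idle() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_.size();
    }

private:
    std::function<T()> factory_;
    mutable std::mutex mutex_;
    std::vector<std::unique_ptr<T>> free_;
};
```

With the real bindings, `Pool<cl::Kernel> pool([&] { return cl::Kernel(program, "myKernel"); });` is how it might be instantiated; `program` and `"myKernel"` are placeholders.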

    As part of my work I will be writing something along the lines of a kernel pool and a buffer pool. I'll keep in mind the cl::Kernel design problem when doing this and post here if I get great ideas.

    Cheers,

    Dan

  6. #76
    Junior Member
    Join Date
    May 2010
    Posts
    4

    Re: OpenCL C++ Bindings

    Hi,

    I was beginning to write my own C++ wrapper when I saw that there was already one. But I was a little surprised by how you handle resource management with reference counting... Why not simply use std::shared_ptr or something similar? I won't explain the advantages of this design, which you can find everywhere on the web (non-intrusive, optional, etc.). Also, for the Image class: I wrote one, and the first thing I did was add methods to access the image information:

    http://gitorious.org/motion_estimation/ ... cl/Image.h

    Is there a reason why you didn't do this?

  7. #77

    Re: OpenCL C++ Bindings

    Quote Originally Posted by Tanek
    Hi,

    I was beginning to write my own C++ wrapper when I saw that there was already one. But I was a little surprised by how you handle resource management with reference counting... Why not simply use std::shared_ptr or something similar? I won't explain the advantages of this design, which you can find everywhere on the web (non-intrusive, optional, etc.).
    All the objects handle the reference counting automatically for you, the same as shared_ptr, so I'm not quite sure what the question is. Is it "why not just use shared_ptr?" I believe the answer is that a goal of the bindings was that they could work in the absence of the STL; this can really help with portability. For example, shared_ptr isn't that ubiquitous yet: it is only available in GCC >4.1. And Boost is a very heavy hammer to wield on an interface that is as portable as a single header file.
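The intrusive approach keeps only the raw handle and piggybacks on the driver's own reference count. The sketch below shows the idea with a mock handle and hypothetical `retainHandle`/`releaseHandle` functions in place of clRetainMemObject/clReleaseMemObject; it is modelled on, not copied from, cl::detail::Wrapper.

```cpp
#include <cassert>

// Mock of an OpenCL object whose reference count lives "in the driver".
struct MockHandle { int refCount = 1; };
inline void retainHandle(MockHandle* h) { ++h->refCount; }
inline void releaseHandle(MockHandle* h) { if (--h->refCount == 0) delete h; }

// Hypothetical intrusive wrapper: stores only the raw handle, so no separate
// control block is allocated and no STL support is needed.
class Handle {
public:
    explicit Handle(MockHandle* h = nullptr) : h_(h) {}  // adopts the initial reference
    Handle(const Handle& o) : h_(o.h_) { if (h_) retainHandle(h_); }
    Handle& operator=(const Handle& o) {
        if (o.h_) retainHandle(o.h_);  // retain first so self-assignment is safe
        if (h_) releaseHandle(h_);
        h_ = o.h_;
        return *this;
    }
    ~Handle() { if (h_) releaseHandle(h_); }
    MockHandle* get() const { return h_; }

private:
    MockHandle* h_;
};
```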

    Quote Originally Posted by Tanek
    Also for the Image class, I wrote one and the first things I did was add methods to access the image information:

    http://gitorious.org/motion_estimation/ ... cl/Image.h

    Is there a reason why you didn't do this?
    All this information is accessible on the underlying cl_mem object using the clGetImageInfo function with the cl_image_info constants. The equivalent in OpenCL C++ is to use the getImageInfo method:
    Code :
    cl::Image2D image(...);
     
    cl_image_format format = image.getImageInfo<CL_IMAGE_FORMAT>();
    size_t width = image.getImageInfo<CL_IMAGE_WIDTH>();
    size_t height = image.getImageInfo<CL_IMAGE_HEIGHT>();

    All OpenCL C++ objects also have a getInfo method used to query the object for specific properties. For example, to get the context the image is associated with:
    Code :
    cl::Image2D image(...);
     
    cl::Context context = image.getInfo<CL_MEM_CONTEXT>();
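The template-argument style of `getImageInfo<CL_IMAGE_WIDTH>()` works because each query constant is mapped, at compile time, to its result type. A minimal sketch of that table technique, assuming hypothetical `ImageInfo` constants, an `ImageFormat` struct shaped like cl_image_format, and a mock `Image` that returns stored values where the real bindings would call clGetImageInfo:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical query constants; the real ones are CL_IMAGE_FORMAT,
// CL_IMAGE_WIDTH and CL_IMAGE_HEIGHT from cl.h.
enum ImageInfo { kImageFormat, kImageWidth, kImageHeight };

// Shaped like cl_image_format (channel order + channel data type).
struct ImageFormat { unsigned order; unsigned dataType; };

// The table: one line per query maps the constant to its result type.
template <ImageInfo Name> struct ImageInfoTraits;
template <> struct ImageInfoTraits<kImageFormat> { typedef ImageFormat type; };
template <> struct ImageInfoTraits<kImageWidth>  { typedef std::size_t type; };
template <> struct ImageInfoTraits<kImageHeight> { typedef std::size_t type; };

// Mock image; real bindings would call clGetImageInfo inside getImageInfo.
class Image {
public:
    Image(std::size_t width, std::size_t height, ImageFormat format)
        : width_(width), height_(height), format_(format) {}

    template <ImageInfo Name>
    typename ImageInfoTraits<Name>::type getImageInfo() const;

private:
    std::size_t width_;
    std::size_t height_;
    ImageFormat format_;
};

template <> inline ImageFormat Image::getImageInfo<kImageFormat>() const { return format_; }
template <> inline std::size_t Image::getImageInfo<kImageWidth>() const { return width_; }
template <> inline std::size_t Image::getImageInfo<kImageHeight>() const { return height_; }
```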

    Note, there's a known bug with reference counting when the getInfo method returns an OpenCL C++ object. The fix is in the works and will hopefully be out shortly.

  8. #78
    Junior Member
    Join Date
    May 2010
    Posts
    4

    Re: OpenCL C++ Bindings

    Quote Originally Posted by coleb
    All the objects handle the reference counting automatically for you, the same as shared_ptr, so I'm not quite sure what the question is. Is it "why not just use shared_ptr?" I believe the answer is that a goal of the bindings was that they could work in the absence of the STL; this can really help with portability. For example, shared_ptr isn't that ubiquitous yet: it is only available in GCC >4.1. And Boost is a very heavy hammer to wield on an interface that is as portable as a single header file.
    You could also write your own very simplified shared_ptr and give users the option of using the implementation they want (like you did for vector and string). My question is more about the design (using standard, non-intrusive reference counting) than the implementation.

    All this information is accessible on the underlying cl_mem object using the clGetImageInfo function with the cl_image_info constants. The equivalent in OpenCL C++ is to use the getImageInfo method:
    Code :
    cl::Image2D image(...);
     
    cl_image_format format = image.getImageInfo<CL_IMAGE_FORMAT>();
    size_t width = image.getImageInfo<CL_IMAGE_WIDTH>();
    size_t height = image.getImageInfo<CL_IMAGE_HEIGHT>();
    OK, but this is probably a lot slower than having the information stored directly in the structure (since you need to call the driver), and not very obvious to use (you should probably add one of these lines to the examples). An image.get_width() is more natural for me. But I can understand if you disagree, since it begins to be subjective.

    But the reference counting... I would prefer a design that is standard and can use a standard implementation: very soon everyone will know shared_ptr as well as they know std::vector, and their compilers will provide it.

  9. #79
    Senior Member | Join Date: May 2010 | Location: Toronto, Canada | Posts: 845

    Re: OpenCL C++ Bindings

    Quote Originally Posted by Tanek
    Everyone will know very soon shared_ptr like they know std::vector and will have their compiler providing it.
    Embedded/mobile programmers have to work in platforms where such libraries are not available and probably will not be available for years to come.
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  10. #80

    Re: OpenCL C++ Bindings

    Quote Originally Posted by Tanek
    You could also write your own very simplified shared_ptr and give users the option of using the implementation they want (like you did for vector and string). My question is more about the design (using standard, non-intrusive reference counting) than the implementation.
    That's essentially what detail::Wrapper is: an implementation of shared_ptr.

    Also note the design of OpenCL C++ layer does not preclude you from using shared_ptr. The following should work:
    Code :
    std::shared_ptr<cl::Image2D> imagePtr;

    It may even be preferable from a performance point of view, since std::shared_ptr is implemented using atomics while cl::detail::Wrapper is implemented using cl(Retain/Release)*. If the above doesn't work, let us know and we can make sure the OpenCL C++ layer can accommodate it.
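A quick way to see the layering, using a hypothetical `Image2D` stand-in that counts live instances: copies of the std::shared_ptr share a single wrapper object and only bump the shared_ptr's own counter, so the wrapper's copy constructor (which in the real bindings would call clRetainMemObject) never runs.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for cl::Image2D that counts live wrapper objects;
// in the real bindings copying the wrapper would call clRetainMemObject.
struct Image2D {
    static int live;
    Image2D() { ++live; }
    Image2D(const Image2D&) { ++live; }
    ~Image2D() { --live; }
};
int Image2D::live = 0;
```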

    Quote Originally Posted by Tanek
    (you should probably add one of these lines in the example).
    I agree it's a little obscure; more examples are definitely in order. AMD's implementation ships with OpenCL C++ examples; Apple's and NVidia's don't. If Apple and NVidia shipped examples (and maybe they will in the future), it would surely help solidify C++'s importance. Maybe Khronos should even ship a set of examples that work across all implementations. It would be a fantastic teaching tool, since all the examples that ship with implementations depend on that implementation's example setup.

    Quote Originally Posted by Tanek
    Ok but this is probably a lot slower than having stored the information directly into the structure (since you need to call the driver) and not very obvious to use
    An image.get_width() is more natural for me. But I can understand if you disagree with that since it begins to be subjective.
    Premature optimization is the root of all evil. I have a production multi-threaded database server application that makes heavy use of the getInfo methods and have never seen a performance issue. If there is one the vendor should be notified.

    There are good reasons to keep the objects as lightweight as possible, i.e., sizeof(cl::Context) == sizeof(cl_context). When passing the objects to an argument handler they are very easy to translate into what OpenCL C needs. Also, the interface is very easy to update when new properties are added to the various OpenCL C objects (it's a single table within the header file).
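The lightweight-object point can be checked at compile time. In this sketch, `MockContextImpl`, `mock_context` and `Context` are hypothetical stand-ins for _cl_context, cl_context and cl::Context; a wrapper whose only data member is the handle adds no storage, so a static_assert can confirm the size equality.

```cpp
#include <cassert>

// Opaque handle shaped like the cl.h typedefs (cl_context is a pointer to
// an incomplete struct).
struct MockContextImpl;
typedef MockContextImpl* mock_context;

// Hypothetical wrapper that, like cl::Context, stores nothing but the handle.
class Context {
public:
    explicit Context(mock_context handle = nullptr) : object_(handle) {}
    mock_context get() const { return object_; }  // raw handle for C API calls

private:
    mock_context object_;
};

// The wrapper is exactly handle-sized, so it can be stored and passed
// wherever the raw handle could.
static_assert(sizeof(Context) == sizeof(mock_context), "wrapper must add no storage");
```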

    Thanks for the feedback, I hope you find the bindings useful enough to suit your needs.
