
View Full Version : OpenCL C++ Bindings



oddhack
03-12-2009, 02:46 AM
Some members of the OpenCL Working Group have developed an experimental set of C++ bindings to OpenCL 1.0. The bindings have been posted to the OpenCL Registry (http://www.khronos.org/registry/cl/). Feedback on the bindings should ideally be reported through Bugzilla, as described in the Registry, but can also be posted in this thread.

e_r
04-02-2009, 09:24 AM
I'm a Mac OS user (Intel Core 2 Duo + NVIDIA GeForce 9400M), and I understand Snow Leopard will include OpenCL. I browsed the OpenCL overview brochure, but perhaps an experienced person could give me an idea of whether OpenCL will have any benefit for me. I do mainly numeric computation with C++: a lot of vectorized operations (STL) and cmath.

Is OpenCL relevant for my line of work?
What efficiency gains can I expect (x2, x10)?
What would be the best way to learn OpenCL?

pirxthepilot
06-11-2009, 08:37 PM
The file cl.hpp that is on the official site contains bindings for revision 33; do you have an updated one for newer revisions (namely 48)?

oddhack
06-12-2009, 02:01 PM
The file cl.hpp that is on the official site contains bindings for revision 33; do you have an updated one for newer revisions (namely 48)?

I've asked about updates - but do remember that these are not official Khronos products.

bgaster
06-12-2009, 02:09 PM
I have an updated version for rev 43 and I'm planning to update to rev 48 over the next few days. As soon as it is done I will ask for it to be posted alongside the updated headers.

initram
06-13-2009, 03:47 AM
I'm a Mac OS user (Intel Core 2 Duo + NVIDIA GeForce 9400M), and I understand Snow Leopard will include OpenCL. I browsed the OpenCL overview brochure, but perhaps an experienced person could give me an idea of whether OpenCL will have any benefit for me. I do mainly numeric computation with C++: a lot of vectorized operations (STL) and cmath.

Is OpenCL relevant for my line of work?
What efficiency gains can I expect (x2, x10)?
What would be the best way to learn OpenCL?

1) Yes.
2) Probably more than x10 if the problem is well suited. This is usually the case if you have a lot of vector operations and coalesced memory access, but a concrete answer to this question cannot be given without knowing the problem.
3) Read the OpenCL reference guide and the NVIDIA CUDA documentation.

...) Please open a new thread next time because this is off topic!

bgaster
06-22-2009, 08:18 PM
Jon,

I have posted an updated C++ bindings header for 1.0 rev 43 and would be grateful if you could post it to the website.

Regards,

Ben

cuder
06-23-2009, 11:53 AM
I have posted an updated C++ bindings header for 1.0 rev 43
Where can we find it? I want to use the C++ bindings, but the current version of cl.hpp seems to be incompatible with cl.h.

bgaster
06-23-2009, 07:48 PM
Sorry for the delay, but I forgot to build the corresponding documentation. This is done now, and the updated bindings and documentation should hopefully be up by tomorrow.

oddhack
06-23-2009, 08:34 PM
Where can we find it?

Benedict's updated C++ bindings are linked from the OpenCL Registry (http://www.khronos.org/registry/cl/) now.

cuder
06-24-2009, 07:42 AM
Thanks! But one note: ::clEnqueueAcquireExternalObjects can't be found anywhere.

bgaster
07-07-2009, 03:37 PM
Sorry for the delay but I have updated the C++ bindings:

http://www.khronos.org/registry/cl/

This includes removing the call to ::clEnqueueAcquireExternalObjects and also adds two new types, cl::vector and cl::string, as alternatives to the STL versions. See the documentation for details.

Ben

bgaster
07-29-2009, 03:07 PM
A new version of the C++ bindings has been uploaded that fixes a bug introduced with the cl::vector class and its use with NDRange. It can be found here:

http://www.khronos.org/registry/cl/

workingdog
08-17-2009, 02:41 AM
I've been trying to compile the C++ bindings files with no success. Has anyone done this successfully on an Intel iMac running OS X 10.5.8? I get the following errors:
cl.h:452: error: expected initializer before 'AVAILABLE_MAC_OS_X_VERSION_10_6_AND_LATER'
.....
....

any ideas?

tom
08-21-2009, 04:46 AM
When handling multiple devices it would be handy to be able to maintain a vector of kernel functors, where each functor is associated with a command queue:

std::vector<cl::KernelFunctor> kernelFunctors;
for (int d = 0; d < commandQueues.size(); d++)
{
    kernelFunctors.push_back(kernel.bind(commandQueues[d], globalNDRange, localNDRange));
}
Any ideas for how to implement something like this, given that this won't work without being able to copy KernelFunctors?

tom
08-24-2009, 01:48 AM
Regarding my post above, I opened http://www.khronos.org/bugzilla/show_bug.cgi?id=200.

bgaster
08-24-2009, 09:20 PM
I have updated the C++ bindings to provide a copy constructor for KernelFunctor, and the following program now compiles without error:


#include <CL/cl.hpp>

int main(void)
{
    std::vector<cl::CommandQueue> commandQueues;
    cl::Kernel kernel;

    cl::NDRange globalNDRange;
    cl::NDRange localNDRange;

    std::vector<cl::KernelFunctor> kernelFunctors;
    for (int d = 0; d < commandQueues.size(); d++) {
        kernelFunctors.push_back(
            kernel.bind(
                commandQueues[d],
                globalNDRange,
                localNDRange));
    }
}


Please let me know if you have any problems using it and thanks for the feedback.

bgaster
08-24-2009, 09:27 PM
> I've been trying to compile the C++ bindings files with no success. Has anyone done this successfully on an Intel iMac running OS X 10.5.8? I get the following errors:
> cl.h:452: error: expected initializer before 'AVAILABLE_MAC_OS_X_VERSION_10_6_AND_LATER'
> ....
> ....

> any ideas?

I think the issue is that OpenCL will not be supported by Apple until the release of Snow Leopard (version 10.6 of OS X). You can find more details about Snow Leopard on Apple's website, which notes that it will be publicly available August 28. Once you have this upgrade there should be no problem using the C++ bindings.

tom
08-25-2009, 08:11 AM
I have updated the C++ bindings to provide a copy constructor for KernelFunctor, and the following program now compiles without error:
<snip>
Please let me know if you have any problems using it and thanks for the feedback.

Awesome, thanks!

GeoffHilton
09-02-2009, 07:33 AM
bgaster: Could you please put the bindings under version control (SourceForge, Google Code, GitHub, etc) ?

oddhack
09-02-2009, 08:38 PM
bgaster: Could you please put the bindings under version control (SourceForge, Google Code, GitHub, etc) ?

I've asked internally if we can open up the entire Khronos Registry area to public RO Subversion access, like we already have with the OpenGL Registry. It will take a few weeks to get people inside Khronos engaged on the discussion and come to a conclusion. I'm hopeful of a yes answer towards the end of September.

duanmu
09-22-2009, 10:41 AM
Great job! However, Program::getInfo() does not work (e.g., CL_PROGRAM_BINARY_SIZES) if the context has multiple devices.

coleb
10-01-2009, 05:06 PM
bgaster: Could you please put the bindings under version control (SourceForge, Google Code, GitHub, etc) ?

I've asked internally if we can open up the entire Khronos Registry area to public RO Subversion access, like we already have with the OpenGL Registry. It will take a few weeks to get people inside Khronos engaged on the discussion and come to a conclusion. I'm hopeful of a yes answer towards the end of September.

I'm looking forward to this. I would like to start a project providing SWIG bindings around the C++ bindings. Then you could theoretically use OpenCL from any language SWIG supports: Python, Java, C#, etc. We have an extensive amount of SWIG experience we can contribute.

mollitz
10-15-2009, 10:48 AM
Hi,

I can't find any methods for creating a buffer from an OpenGL renderbuffer or texture. Don't they exist, or did I just not find them?
If they don't exist, when will they be added?

danbartlett
10-15-2009, 10:58 AM
Hi,

I can't find any methods for creating a buffer from an OpenGL renderbuffer or texture. Don't they exist, or did I just not find them?
If they don't exist, when will they be added?

They can be found at:
http://www.khronos.org/registry/cl/
in particular:
http://www.khronos.org/registry/cl/api/1.0/cl_gl.h
http://www.khronos.org/registry/cl/extensions/khr/cl_khr_gl_sharing.txt

But you may still need to wait a while longer for NVidia/ATI to implement these fully.

mollitz
10-15-2009, 11:36 AM
There I found them; I meant I didn't find them in the C++ bindings.
Will they be added in the near future?
How can I look up what the NVIDIA/ATI drivers currently support (e.g. where can I see which OpenCL extensions they support)?

dbs2
10-15-2009, 11:57 PM
You can query a device to see what extensions it supports. If cl/gl sharing is not listed, then it won't support those extensions. I know those are supported on the Mac, but I'm not sure about AMD/Nvidia.
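
For anyone looking for a concrete starting point, a minimal sketch of such a query via the C++ bindings might look like this (assuming the default STL containers, i.e. __NO_STD_VECTOR is not defined):

#include <iostream>
#include <string>
#include <vector>
#include <CL/cl.hpp>

int main(void)
{
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);
    if (platforms.empty()) return 1;

    std::vector<cl::Device> devices;
    platforms[0].getDevices(CL_DEVICE_TYPE_ALL, &devices);

    for (size_t i = 0; i < devices.size(); ++i) {
        // CL_DEVICE_EXTENSIONS returns a space-separated list of extension names.
        std::string ext = devices[i].getInfo<CL_DEVICE_EXTENSIONS>();
        std::cout << "Device " << i << ": " << ext << std::endl;
        if (ext.find("cl_khr_gl_sharing") != std::string::npos)
            std::cout << "  CL/GL sharing looks available" << std::endl;
    }
    return 0;
}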

eyebex
10-27-2009, 09:09 AM
I've asked internally if we can open up the entire Khronos Registry area to public RO Subversion access, like we already have with the OpenGL Registry. It will take a few weeks to get people inside Khronos engaged on the discussion and come to a conclusion. I'm hopeful of a yes answer towards the end of September.
Any update on this, being end of October now?

eyebex
10-27-2009, 09:17 AM
First of all, thanks for the C++ bindings. Being new to the bindings, I was looking at Platform::get() and assumed it supported passing NULL in order to query just the number of available platforms. But when passing NULL, it crashes, because the pointer is not checked. It's probably not very useful to query just the number of platforms after all, but in that case I'd recommend making the argument a reference instead of a pointer to emphasize that the argument is mandatory and NULL is not a valid value. In general, I highly recommend following "Use references when you can, and pointers when you have to."; see [1].

[1] "When should I use references, and when should I use pointers?"
http://www.parashift.com/c++-faq-lite/references.html#faq-8.6

eyebex
10-27-2009, 10:28 AM
[...] I was looking at Platform::get() and assumed it supported passing NULL in order to query just the number of available platforms.
BTW, I'm aware now that my assumption was complete nonsense, as get() does not return the number of available platforms, but an int to indicate the success of the call. But due to the lack of docs for the static get() method I did not realize this until looking at the source code =:-)

bgaster
11-13-2009, 03:56 PM
It is a good point about references over pointers. The reason pointers are used in certain places is that they map directly to where pointers are used in the underlying C API. In general, I agree that references should be used, but I do not particularly like the fact that C++ allows something to be side-effected without indicating it at the call site. I understand the argument about the NULL pointer and will think more about this.

bgaster
11-13-2009, 03:59 PM
We have C++ bindings for the GL extensions that we will be releasing in the near future.

eyebex
11-13-2009, 04:12 PM
I assume you mean CL extensions?

mollitz
11-13-2009, 04:24 PM
The CL extension for OpenGL context sharing ;)

Can you announce it here when you release them?

thanks

bgaster
11-13-2009, 04:35 PM
Yes I did mean CL :-)

I will be sure to comment here when they go up.

anton
11-26-2009, 04:28 AM
Hi Ben, I'm really enjoying reading the C++ bindings header - great job!

I'm also trying to learn from it how to use the C API properly. One thing I notice is that in cl::Platform::get() you call clGetPlatformIDs twice: first to get the number of available platforms, and then to read them. The first call has the following parameters: (num_entries = 0, *platforms = NULL, *num_platforms = &n). According to Section 4.1 of the standard:

"The number of OpenCL platforms returned is the mininum of the value specified by num_entries or the number of OpenCL platforms available.
But if the number of returned platforms is the minimum, then it should be 0, not 1 as returned both by AMD's and NVIDIA's implementations. So where's the bug -- in the standard or in the code? :)
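
For reference, a minimal sketch of the two-call idiom being discussed, using the C API directly (error checking omitted):

#include <stdlib.h>
#include <CL/cl.h>

int main(void)
{
    cl_uint num_platforms = 0;

    /* First call: num_entries = 0, platforms = NULL; only ask how many platforms exist. */
    clGetPlatformIDs(0, NULL, &num_platforms);

    /* Second call: fetch exactly that many platform IDs. */
    cl_platform_id* platforms =
        (cl_platform_id*)malloc(num_platforms * sizeof(cl_platform_id));
    clGetPlatformIDs(num_platforms, platforms, NULL);

    free(platforms);
    return 0;
}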

eyebex
11-26-2009, 04:43 AM
I guess the bug is with you :-) The "number of OpenCL platforms returned" refers to the number of platforms written to the "platforms" pointer, not to the integer returned in "num_platforms".

anton
11-26-2009, 04:52 AM
eyebex, you are right. But still there is a bug - a typo in the word "mininum" :).

P.S. I don't like it when an API call can serve two separate functions: in this case, to get the number of available platforms and to get the platforms proper. But I see how it reduces the total number of API functions, which is a good thing.

kirill
12-16-2009, 10:14 PM
I have recently started using OpenCL via the C++ bindings, which are of great help. I have been trying to pass int2 arguments to the kernel and found that the most obvious way to do it does not work with the C++ bindings.


const char* OpenCLSource =
    "__kernel void test1(__global uint* out, int2 a){"
    "  unsigned int i = get_global_id(0);"
    "  out[i] = a[i%2];}";

int main(void) {
    cl_int2 a = {0,1};
    const int sz = 128;
    cl::Buffer _out(context, CL_MEM_WRITE_ONLY, sz*sizeof(unsigned int));
    // Setup code omitted for brevity..............
    cl::KernelFunctor func = kernel.bind(queue, cl::NDRange(sz), cl::NDRange());
    func(_out, a).wait(); // Generates incorrect calls to the C API:
    // Currently:  clSetKernelArg(object_, 1, sizeof(int*),   &a);
    // Should be:  clSetKernelArg(object_, 1, sizeof(int[2]), a );
}

The workaround is to use

kernel.setArg(1, sizeof(cl_int2), a);
But the first method is cleaner and is the default way to do it, unless you know better. I have implemented a fix which I will post in the next message.

kirill
12-16-2009, 10:40 PM
As described in my previous message, fixed size vector arguments are not handled correctly. To support correct handling of arguments of type float2, int4 , etc... I propose the following change


--- cl_orig.hpp Thu Dec 17 16:21:25 2009
+++ cl.hpp Thu Dec 17 15:40:17 2009
@@ -2990,10 +2991,39 @@
template <typename T>
struct KernelArgumentHandler
{
- static ::size_t size(const T& value) { return sizeof(T); }
- static T* ptr(T& value) { return &value; }
+ static ::size_t size(const T& value) { return sizeof(T); }
+ static void const* ptr(T const& value) { return &value; }
};

+template <typename T>
+struct KernelArgumentHandler< T[2] >
+{
+ static ::size_t size(const T value[2]) { return sizeof(T[2]); }
+ static void const* ptr(const T value[2]) { return value; }
+};
+
+template <typename T>
+struct KernelArgumentHandler< T[4] >
+{
+ static ::size_t size(const T value[4]) { return sizeof(T[4]); }
+ static void const* ptr(const T value[4]) { return value; }
+};
+
+template <typename T>
+struct KernelArgumentHandler< T[8] >
+{
+ static ::size_t size(const T value[8]) { return sizeof(T[8]); }
+ static void const* ptr(const T value[8]) { return value; }
+};
+
+template <typename T>
+struct KernelArgumentHandler< T[16] >
+{
+ static ::size_t size(const T value[16]) { return sizeof(T[16]);}
+ static void const* ptr(const T value[16]) { return value; }
+};
+
+
template <>
struct KernelArgumentHandler<LocalSpaceArg>
{
@@ -3225,7 +3255,7 @@
* generated.
*/
template <typename T>
- cl_int setArg(cl_uint index, T value)
+ cl_int setArg(cl_uint index, T const & value)
{
return detail::errHandler(
::clSetKernelArg(


Basically I define template specialisations for data types T[2], T[4], T[8] and T[16] for any basic type T. Also, with my compiler (Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 13.10.3052 for 80x86) I had to change
Kernel::setArg(index, T value) // to
Kernel::setArg(index, T const & value) for this scheme to work, but I think this might be the more proper way of doing it anyway.

bgaster
12-16-2009, 11:27 PM
I have not tested this yet, getting late in Austin, but would something like this work instead:

template <typename T, int N>
struct KernelArgumentHandler<T[N]>
{
static ::size_t size(const T value[N]) { return sizeof(T[N]); }
static T* ptr(T value[N]) { return &value; }
};

Note that the reference syntax has been dropped here (the array parameters decay to pointers), but this should not be an issue.

You can now do:

cl_int2 v = {0,1};
setArg(n, v);

kirill
12-17-2009, 12:13 AM
You are right this is a much shorter way to approach it. Here is a new diff:

--- cl_orig.hpp Thu Dec 17 16:21:25 2009
+++ cl.hpp Thu Dec 17 17:59:12 2009
@@ -2990,10 +2991,17 @@
template <typename T>
struct KernelArgumentHandler
{
- static ::size_t size(const T& value) { return sizeof(T); }
- static T* ptr(T& value) { return &value; }
+ static ::size_t size(const T& value) { return sizeof(T); }
+ static void const* ptr(T const& value) { return &value; }
};

+template <typename T, int N>
+struct KernelArgumentHandler< T[N] >
+{
+ static ::size_t size(const T value[N]) { return sizeof(T[N]); }
+ static void const* ptr(const T value[N]) { return value; }
+};
+
template <>
struct KernelArgumentHandler<LocalSpaceArg>
{
@@ -3225,7 +3233,7 @@
* generated.
*/
template <typename T>
- cl_int setArg(cl_uint index, T value)
+ cl_int setArg(cl_uint index, T const & value)
{
return detail::errHandler(
::clSetKernelArg(


I still think KernelArgumentHandler::ptr should take a const reference and return a const void*. And setArg should take a const reference also, just like KernelFunctor::operator().

I am sure it is just a typo, but just in case: ptr() in your example should return value and not its address.

bgaster
12-17-2009, 12:31 AM
Thinking about this some more I'm not sure this makes sense.

I agree that it will work with

template <typename T, int N>
struct KernelArgumentHandler<T[N]>
{
static ::size_t size(const T value[N]) { return sizeof(T[N]); }
static T* ptr(T value[N]) { return value; }
};

But my thoughts are that this is not a bug in the C++ bindings but in the particular implementation of cl_int2. It is not valid to pass an array as an argument to an OpenCL kernel; it must be a base type or a struct/union. This leads me to wonder how cl_int2 is defined; can you post its definition?

I think you are correct about the const; I'll check tomorrow.

coleb
12-17-2009, 09:51 AM
This seems related to an annoyance I've always had with OpenCL vector types in CPU code. The following code won't even compile:



#include <vector>

#include "cl.hpp"

int main(int, char *[])
{
    std::vector<cl_int2> vector;

    cl_int2 vtype = {1, 2};
    vector.push_back(vtype);

    return 0;
}


I get the following error message with gcc 4.2.1 on Snow Leopard:



coleb@shark~/oecl$ make CLVectorTypesAnnoyance
/usr/bin/c++ -g -W -Wall -pedantic -O3 -Wno-long-long -ffast-math -arch x86_64 -msse3 -I./openeye/toolkits/include -I. -I./include_private -c CLVectorTypesAnnoyance.cpp
In file included from CLVectorTypesAnnoyance.cpp:3:
cl.hpp:1165: warning: unused parameter ‘errStr’
/usr/include/c++/4.2.1/ext/new_allocator.h: In member function ‘void __gnu_cxx::new_allocator<_Tp>::construct(_Tp*, const _Tp&) [with _Tp = int32_t [2]]’:
/usr/include/c++/4.2.1/bits/stl_vector.h:604: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = int32_t [2], _Alloc = std::allocator<int32_t [2]>]’
CLVectorTypesAnnoyance.cpp:10: instantiated from here
/usr/include/c++/4.2.1/ext/new_allocator.h:107: error: ISO C++ forbids initialization in array new
/usr/include/c++/4.2.1/bits/vector.tcc: In member function ‘void std::vector<_Tp, _Alloc>::_M_insert_aux(__gnu_cxx::__normal_iterator<typename std::_Vector_base<_Tp, _Alloc>::_Tp_alloc_type::pointer, std::vector<_Tp, _Alloc> >, const _Tp&) [with _Tp = int32_t [2], _Alloc = std::allocator<int32_t [2]>]’:
/usr/include/c++/4.2.1/bits/stl_vector.h:608: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = int32_t [2], _Alloc = std::allocator<int32_t [2]>]’
CLVectorTypesAnnoyance.cpp:10: instantiated from here
/usr/include/c++/4.2.1/bits/vector.tcc:252: error: array must be initialized with a brace-enclosed initializer
/usr/include/c++/4.2.1/bits/stl_vector.h:608: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = int32_t [2], _Alloc = std::allocator<int32_t [2]>]’
CLVectorTypesAnnoyance.cpp:10: instantiated from here
/usr/include/c++/4.2.1/bits/vector.tcc:256: error: invalid array assignment
/usr/include/c++/4.2.1/bits/stl_algobase.h: In static member function ‘static _BI2 std::__copy_backward<_BoolType, std::random_access_iterator_tag>::__copy_b(_BI1, _BI1, _BI2) [with _BI1 = int32_t (*)[2], _BI2 = int32_t (*)[2], bool _BoolType = false]’:
/usr/include/c++/4.2.1/bits/stl_algobase.h:465: instantiated from ‘_BI2 std::__copy_backward_aux(_BI1, _BI1, _BI2) [with _BI1 = int32_t (*)[2], _BI2 = int32_t (*)[2]]’
/usr/include/c++/4.2.1/bits/stl_algobase.h:474: instantiated from ‘static _BI2 std::__copy_backward_normal<<anonymous>, <anonymous> >::__copy_b_n(_BI1, _BI1, _BI2) [with _BI1 = int32_t (*)[2], _BI2 = int32_t (*)[2], bool <anonymous> = false, bool <anonymous> = false]’
/usr/include/c++/4.2.1/bits/stl_algobase.h:540: instantiated from ‘_BI2 std::copy_backward(_BI1, _BI1, _BI2) [with _BI1 = int32_t (*)[2], _BI2 = int32_t (*)[2]]’
/usr/include/c++/4.2.1/bits/vector.tcc:253: instantiated from ‘void std::vector<_Tp, _Alloc>::_M_insert_aux(__gnu_cxx::__normal_iterator<typename std::_Vector_base<_Tp, _Alloc>::_Tp_alloc_type::pointer, std::vector<_Tp, _Alloc> >, const _Tp&) [with _Tp = int32_t [2], _Alloc = std::allocator<int32_t [2]>]’
/usr/include/c++/4.2.1/bits/stl_vector.h:608: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = int32_t [2], _Alloc = std::allocator<int32_t [2]>]’
CLVectorTypesAnnoyance.cpp:10: instantiated from here
/usr/include/c++/4.2.1/bits/stl_algobase.h:433: error: invalid array assignment
/usr/include/c++/4.2.1/bits/stl_construct.h: In function ‘void std::_Construct(_T1*, const _T2&) [with _T1 = int32_t [2], _T2 = int32_t [2]]’:
/usr/include/c++/4.2.1/bits/stl_uninitialized.h:87: instantiated from ‘_ForwardIterator std::__uninitialized_copy_aux(_InputIterator, _InputIterator, _ForwardIterator, std::__false_type) [with _InputIterator = int32_t (*)[2], _ForwardIterator = int32_t (*)[2]]’
/usr/include/c++/4.2.1/bits/stl_uninitialized.h:114: instantiated from ‘_ForwardIterator std::uninitialized_copy(_InputIterator, _InputIterator, _ForwardIterator) [with _InputIterator = int32_t (*)[2], _ForwardIterator = int32_t (*)[2]]’
/usr/include/c++/4.2.1/bits/stl_uninitialized.h:254: instantiated from ‘_ForwardIterator std::__uninitialized_copy_a(_InputIterator, _InputIterator, _ForwardIterator, std::allocator<_Tp>) [with _InputIterator = int32_t (*)[2], _ForwardIterator = int32_t (*)[2], _Tp = int32_t [2]]’
/usr/include/c++/4.2.1/bits/vector.tcc:275: instantiated from ‘void std::vector<_Tp, _Alloc>::_M_insert_aux(__gnu_cxx::__normal_iterator<typename std::_Vector_base<_Tp, _Alloc>::_Tp_alloc_type::pointer, std::vector<_Tp, _Alloc> >, const _Tp&) [with _Tp = int32_t [2], _Alloc = std::allocator<int32_t [2]>]’
/usr/include/c++/4.2.1/bits/stl_vector.h:608: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = int32_t [2], _Alloc = std::allocator<int32_t [2]>]’
CLVectorTypesAnnoyance.cpp:10: instantiated from here
/usr/include/c++/4.2.1/bits/stl_construct.h:81: error: ISO C++ forbids initialization in array new
make: *** [CLVectorTypesAnnoyance.o] Error 1


Basically, the vector types are interpreted as arrays, which can only be used in certain very restricted ways. If I had my way I would have defined them as fields in structs:


typedef struct _cl_int2 {
cl_int x;
cl_int y;
} cl_int2;


Unfortunately, this would break backwards compatibility. It could be done in C++ without breaking source level compatibility since we could define operator[] on the struct.
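
A hypothetical sketch of the operator[] idea (illustrative only, not the actual cl_platform.h definition):

typedef struct _cl_int2 {
    cl_int x;
    cl_int y;

#ifdef __cplusplus
    // Keep array-style element access working for existing C++ code.
    cl_int& operator[](int i) { return i == 0 ? x : y; }
    const cl_int& operator[](int i) const { return i == 0 ? x : y; }
#endif
} cl_int2;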

bgaster
12-17-2009, 10:27 AM
This is actually being worked on at Khronos, and there is a new cl_platform.h that resolves both issues discussed here: vector types will no longer be defined as simple arrays, which fixes the issue with copy constructors and also the problem here with array types in setArg.

So I propose we hold off on this until it appears publicly, which will be very soon; I can tell you that both of these examples work without change with the new cl_platform.h. Of course, when this will be released as part of a particular vendor's implementation is up to them, and you will need to take that up with the vendor.

kirill
12-17-2009, 03:31 PM
It is true that "typedef T TN[N];" types are a bit of a Frankentype: not really a pointer, not really a value, and they can be annoying. But they are also very handy in some circumstances. I am not sure what changes are planned for cl_platform.h, but I am certain they won't come without a compromise.

As far as vendors are concerned, they only see a "const void*" and a "size_t". All they should care about is that the pointer is of appropriate alignment and that the size matches that of the kernel argument. I guess I will have to use my own type in the meantime. This will do the trick for cl_float2, for example:
namespace cl { typedef struct _float2 { cl_float2 v; } float2; }
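
With that wrapper in place, something along these lines should then work with setArg as-is (a sketch assuming cl_float2 is still the array typedef discussed above and that kernel is an existing cl::Kernel):

cl::float2 arg;
arg.v[0] = 1.0f;  // cl_float2 is an array of two cl_float in these headers
arg.v[1] = 2.0f;
kernel.setArg(0, arg);  // sizeof(cl::float2) and &arg are now what clSetKernelArg expects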

kirill
01-06-2010, 05:45 PM
Hello again. I am writing some convenience methods that wrap my data structures to be used with OpenCL via the C++ bindings, and I am experiencing some const-correctness issues. I believe the methods

cl::CommandQueue::enqueueWriteBuffer
cl::CommandQueue::enqueueWriteImage

should take
void const * ptr, not void * ptr. The underlying C functions they call have const qualifiers for that parameter.

Sorry if it has been raised already.

coleb
01-07-2010, 10:09 AM
You are correct, I noticed that the other day as well.

Thanks,
Brian

mollitz
01-13-2010, 07:27 AM
Is there any news regarding the GL context sharing extension?

lebox
01-27-2010, 03:05 PM
Hi.

I have just downloaded rev 45 of the C++ bindings.
I'm compiling on Mac OS X 10.6.2 with gcc 4.2 from Xcode.

I was just trying to find my way around OpenCL, so I started with some basic programs, but when I tried


#include <utility>
#define __CL_ENABLE_EXCEPTIONS
#define __NO_STD_VECTOR // Use cl::vector and cl::string and
#define __NO_STD_STRING // not STL versions, more on this later
#include "cl.hpp"

int main(int argc, char** argv) {
const char * helloStr = "__kernel void "
"hello(void) "
"{ "
" "
"} ";


cl_int err = CL_SUCCESS;
try {
cl::Context context(CL_DEVICE_TYPE_CPU, 0, NULL, NULL, &err);

cl::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();

cl::Program::Sources source(1, std::make_pair(helloStr,strlen(helloStr)));
cl::Program program_ = cl::Program(context, source);
program_.build(devices);

cl::Kernel kernel(program_, "hello", &err);

cl::CommandQueue queue(context, devices[0], 0, &err);
cl::KernelFunctor func = kernel.bind(queue, cl::NDRange(4, 4),cl::NDRange(2, 2));

func().wait();
}
catch (cl::Error& err) {
std::cerr
<< "ERROR: "
<< err.what()
<< "("
<< err.err()
<< ")"
<< std::endl;
}

return EXIT_SUCCESS;
}


First I had to fix:

typedef VECTOR_CLASS< std::pair<const void*, ::size_t> > Binaries;
typedef VECTOR_CLASS< std::pair<const char*, ::size_t> > Sources;

to

typedef VECTOR_CLASS< ::std::pair<const void*, ::size_t> > Binaries;
typedef VECTOR_CLASS< ::std::pair<const char*, ::size_t> > Sources;

but now I get:
Build xCodeOpenGl of project xCodeOpenGl with configuration Debug

CompileC build/xCodeOpenGl.build/Debug/xCodeOpenGl.build/Objects-normal/x86_64/main.o main.cpp normal x86_64 c++ com.apple.compilers.gcc.4_2
cd /Users/eske/Documents/xCodeOpenGl
setenv LANG en_US.US-ASCII
/Developer/usr/bin/gcc-4.2 -x c++ -arch x86_64 -fmessage-length=0 -pipe -Wno-trigraphs -fpascal-strings -fasm-blocks -O0 -Wreturn-type -Wunused-variable -D_GLIBCXX_DEBUG=1 -D_GLIBCXX_DEBUG_PEDANTIC=1 -isysroot /Developer/SDKs/MacOSX10.6.sdk -mfix-and-continue -fvisibility-inlines-hidden -mmacosx-version-min=10.6 -gdwarf-2 -iquote /Users/eske/Documents/xCodeOpenGl/build/xCodeOpenGl.build/Debug/xCodeOpenGl.build/xCodeOpenGl-generated-files.hmap -I/Users/eske/Documents/xCodeOpenGl/build/xCodeOpenGl.build/Debug/xCodeOpenGl.build/xCodeOpenGl-own-target-headers.hmap -I/Users/eske/Documents/xCodeOpenGl/build/xCodeOpenGl.build/Debug/xCodeOpenGl.build/xCodeOpenGl-all-target-headers.hmap -iquote /Users/eske/Documents/xCodeOpenGl/build/xCodeOpenGl.build/Debug/xCodeOpenGl.build/xCodeOpenGl-project-headers.hmap -F/Users/eske/Documents/xCodeOpenGl/build/Debug -I/Users/eske/Documents/xCodeOpenGl/build/Debug/include -I/Users/eske/Documents/xCodeOpenGl/build/xCodeOpenGl.build/Debug/xCodeOpenGl.build/DerivedSources/x86_64 -I/Users/eske/Documents/xCodeOpenGl/build/xCodeOpenGl.build/Debug/xCodeOpenGl.build/DerivedSources -c /Users/eske/Documents/xCodeOpenGl/main.cpp -o /Users/eske/Documents/xCodeOpenGl/build/xCodeOpenGl.build/Debug/xCodeOpenGl.build/Objects-normal/x86_64/main.o

In file included from /Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/debug/formatter.h:37,
from /Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/debug/debug.h:141,
from /Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/bits/stl_algobase.h:78,
from /Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/vector:66,
from /Users/eske/Documents/xCodeOpenGl/main.cpp:7:
/Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/typeinfo:144: error: expected class-name before '{' token
/Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/typeinfo:158: error: expected class-name before '{' token
In file included from /Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/ext/new_allocator.h:37,
from /Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/i686-apple-darwin10/x86_64/bits/c++allocator.h:39,
from /Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/bits/allocator.h:53,
from /Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/vector:67,
from /Users/eske/Documents/xCodeOpenGl/main.cpp:7:
/Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/new:59: error: expected class-name before '{' token
In file included from /Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/debug/safe_base.h:38,
from /Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/debug/safe_sequence.h:41,
from /Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/debug/vector:40,
from /Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/vector:78,
from /Users/eske/Documents/xCodeOpenGl/main.cpp:7:
/Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/ext/concurrence.h:71: error: expected class-name before '{' token
/Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/ext/concurrence.h:79: error: expected class-name before '{' token
In file included from /Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/ios:48,
from /Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/ostream:45,
from /Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/iostream:45,
from /Users/eske/Documents/xCodeOpenGl/main.cpp:9:
/Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/bits/ios_base.h:208: error: expected class-name before '{' token
In file included from /Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/iostream:45,
from /Users/eske/Documents/xCodeOpenGl/main.cpp:9:
/Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/ostream: In destructor 'std::basic_ostream<_CharT, _Traits>::sentry::~sentry()':
/Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/ostream:415: error: there are no arguments to 'uncaught_exception' that depend on a template parameter, so a declaration of 'uncaught_exception' must be available
/Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/ostream:415: error: (if you use '-fpermissive', G++ will accept your code, but allowing the use of an undeclared name is deprecated)
/Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/ostream: In destructor 'std::basic_ostream<_CharT, _Traits>::sentry::~sentry() [with _CharT = char, _Traits = std::char_traits<char>]':
/Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/bits/ostream_insert.h:84: instantiated from 'std::basic_ostream<_CharT, _Traits>& std::__ostream_insert(std::basic_ostream<_CharT, _Traits>&, const _CharT*, std::streamsize) [with _CharT = char, _Traits = std::char_traits<char>]'
/Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/ostream:517: instantiated from 'std::basic_ostream<char, _Traits>& std::operator<<(std::basic_ostream<char, _Traits>&, const char*) [with _Traits = std::char_traits<char>]'
/Users/eske/Documents/xCodeOpenGl/main.cpp:34: instantiated from here
/Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/ostream:415: error: 'uncaught_exception' was not declared in this scope

Is this a known bug? I tried to search Bugzilla but found nothing.

thanks,
eske

RickA
01-29-2010, 03:31 PM
I'm having issues compiling the C++ OpenCL bindings.

I'm running on a MacBook Pro 2009 under WinXP with nVidia driver version 190.89.

Compiling the following code

#include <cstdio>
#include <cstdlib>
#include "c:/dev/3rdparty/opencl/cl.hpp"

int main(void)
{
    cl_int err = CL_SUCCESS;
    cl::Context context(CL_DEVICE_TYPE_GPU, 0, NULL, NULL, &err);
    cl::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>(); // this line gives the error
    return 0;
}

Gives me this compile error under Visual Studio 2005:

1>------ Build started: Project: opencltest, Configuration: Debug Win32 ------
1>Compiling...
1>main.cpp
1>c:\dev\test\opencl\opencltest\opencltest\main.cpp(9) : error C2440: 'initializing' : cannot convert from 'cl::detail::param_traits<cl::detail::cl_context_info,4225>::param_type' to 'cl::vector<T>'
1> with
1> [
1> T=cl::Device
1> ]
1> No constructor could take the source type, or constructor overload resolution was ambiguous
1>Build log was saved at "file://c:\dev\test\opencl\opencltest\opencltest\Debug\BuildLog.htm"
1>opencltest - 1 error(s), 0 warning(s)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

louiswu
02-04-2010, 11:33 AM
Hi there,

I'm trying to work out how to use the sampler from the C++ bindings; however, there is no constructor given for the class, where I was expecting something similar to the current OpenCL C API, which is like:

clCreateSampler(cxGPUContext, true, CL_ADDRESS_REPEAT, CL_FILTER_LINEAR, &ciErrNum);

Am I missing something in how to properly construct the cl::Sampler object? Thanks in advance.

mollitz
02-05-2010, 04:58 AM
Hey RickA,

I'm not sure, but I think you have to define

#define __NO_STD_VECTOR
before including cl.hpp to use cl::vector. But I can't see why the compiler doesn't complain about cl::vector being missing.
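
For illustration, a minimal sketch of that opt-in (assuming the cl.hpp that provides cl::vector):

#define __NO_STD_VECTOR   // make the bindings use cl::vector instead of std::vector
#include <CL/cl.hpp>

int main(void)
{
    cl_int err = CL_SUCCESS;
    cl::Context context(CL_DEVICE_TYPE_GPU, 0, NULL, NULL, &err);
    cl::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();
    return 0;
}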

henkkas
03-03-2010, 04:26 AM
clCreateSampler(cxGPUContext, true, CL_ADDRESS_REPEAT, CL_FILTER_LINEAR, &ciErrNum);

Am I missing something in how to properly construct the cl::Sampler object? Thanks in advance.

Yesterday I started to learn OpenCL and stumbled on these C++ bindings, which seem to be the easy way of doing OpenCL programming.

It seems to me that the Sampler class is incomplete; there should be a constructor with cl_addressing_mode and cl_filter_mode parameters, or maybe even a Context member function:


Sampler Context::getSampler(cl_addressing_mode a, cl_filter_mode m);


It is possible to use the Sampler by initializing the Wrapper with a C API call like this


cl::Sampler linear;
linear() = clCreateSampler(context(), CL_TRUE, CL_ADDRESS_REPEAT, CL_FILTER_LINEAR, &err);

henkkas
03-03-2010, 05:01 AM
Hi,

Should it be possible to convert from cl::NDRange to cl::size_t<3> ?

I tried
cl::size_t<3> size = cl::NDRange(64,64,64);

Compiling this code with MSVC2008 produced this error
error C2440: 'initializing' : cannot convert from 'cl::NDRange' to 'cl::size_t<N>'
with
[
N=3
]
No constructor could take the source type, or constructor overload resolution was ambiguous

I am using std::vectors so __NO_STD_VECTOR is NOT defined.

It would just have been so convenient to use
enqueueMapImage(image, CL_TRUE, CL_MAP_WRITE, NDRange(0,0,0), NDRange(64,64,64), 0, 0);

Wouldn't it even be appropriate for the enqueueMapImage function (and others) to use NDRange for the offset and size parameters?

coleb
03-03-2010, 04:07 PM
Hi,
Should it be possible to convert from cl::NDRange to cl::size_t<3> ?


You can convert from cl::NDRange to size_t*. However, I would disagree that there should be an implicit conversion operator to go to cl::size_t<3>. That would break some of the type safety afforded by using a cl::size_t<3>. For example, if we were to allow the conversion and the 3rd dimension was accidentally left out, like the following:



enqueueMapImage(image, CL_TRUE, CL_MAP_WRITE, NDRange(0,0), NDRange(64,64,64), 0, 0);


The compiler would allow this, and due to how memory allocators usually allocate memory, the program would probably run fine. Worse yet, since memory is usually zero, the program would only crash occasionally, when the 3rd spot in memory happened not to be zero.

A better option may be specializing the cl::size_t<3> class constructor to allow the syntax you want


template <int N>
struct size_t : public cl::vector< ::size_t, N> { };

template <>
struct size_t<3> : public cl::vector< ::size_t, 3>
{
    size_t(::size_t x, ::size_t y, ::size_t z)
    {
        push_back(x);
        push_back(y);
        push_back(z);
    }
};


Then the same ease of use can be achieved.



enqueueMapImage(image, CL_TRUE, CL_MAP_WRITE, size_t<3>(0,0,0), size_t<3>(64,64,64), 0, 0);


And the compiler would quickly throw an error if this occurred:



enqueueMapImage(image, CL_TRUE, CL_MAP_WRITE, size_t<3>(0,0), size_t<3>(64,64,64), 0, 0);

bgaster
04-19-2010, 12:59 PM
I have uploaded a new version of the cl.hpp which addresses many of the issues discussed here. Please try them out and let me know how they go.

grabner
04-27-2010, 09:37 AM
cl::CommandQueue::getInfo<CL_QUEUE_CONTEXT>() doesn't increase the reference count of the cl::Context object returned by this method. This results in a program crash when the destructor is called for the unreferenced instance. See http://www.khronos.org/bugzilla/show_bug.cgi?id=296 for details and a piece of code that demonstrates this problem.
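
For context, the kind of code the bug report describes looks roughly like this (a sketch, assuming a valid cl::CommandQueue named queue):

{
    cl::Context ctx = queue.getInfo<CL_QUEUE_CONTEXT>();
    // ctx now wraps the queue's cl_context, but no extra clRetainContext() was made.
    // When ctx goes out of scope its destructor calls clReleaseContext(), so the
    // context ends up released one time too many.
}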

Since I learned that this is a known (and non-trivial) issue, I would like to discuss how this could be addressed. Are there any proposals for a fix?

Thanks & kind regards,
Markus

coleb
04-27-2010, 10:32 AM
cl::CommandQueue::getInfo<CL_QUEUE_CONTEXT>() doesn't increase the reference count of the cl::Context object returned by this method. This results in a program crash when the destructor is called for the unreferenced instance. See http://www.khronos.org/bugzilla/show_bug.cgi?id=296 for details and a piece of code that demonstrates this problem.

Since I learned that this is a known (and non-trivial) issue, I would like to discuss how this could be addressed. Are there any proposals for a fix?

Thanks & kind regards,
Markus

I've done the following in my local copy:


namespace detail {
// Specialized for cl_context
template <typename Func>
struct GetInfoHelper<Func, cl::Context>
{
    static cl_int
    get(Func f, cl_uint name, cl::Context* param)
    {
        cl_int err = f(name, sizeof(cl::Context), param, NULL);
        if (err != CL_SUCCESS) {
            return err;
        }

        return ReferenceHandler<cl::Context::cl_type>::retain(param->object_);
    }
};
}


The same needs to be done for all the types derived from cl::detail::Wrapper. Unfortunately, C++ isn't clever enough to specialize on a generic base class to affect all derived classes (like Java generics do). So the above code needs to be duplicated for every OpenCL type. Not too bad, since it can be done pseudo-manually with preprocessor macros.

The other tricky part is that it needs cl::detail::Wrapper<T>::object_ to be a public member. This is a hack. In reality what is needed is a getter for the cl_* type inside the object. Something like the following:



T cl::detail::Wrapper<T>::getCLObject() { return object_; }


I'm planning on submitting the above patches once my company is accepted as a Khronos member and I'm granted SVN access. So what do you think of the fix? Can you think of improvements?

matrem
04-28-2010, 02:43 AM
In my OpenCL C++ layer there is a generic retain/release system.

I have these base classes:

namespace cl
{
namespace base
{
/** Object which contains a cl type instance. */
template<typename CLType_>
class CLObject
{
public:
typedef CLType_ CLType;

static CLType const & GetObjectCLInstance(CLObject<CLType_> const & object);
CLObject & operator = (CLObject const & copy);

// bool operator == (CLObject<CLType_> const & comp) const;
bool CLEquals(CLObject<CLType_> const & comp) const;

protected:
typedef CLObject<CLType_> CLObjectBase;

CLObject();
CLObject(CLObject const & copy);
explicit CLObject(CLType const & copy);

template<typename OtherCLType>
static OtherCLType const & GetCLInstance(CLObject<OtherCLType> const & object);

CLType clInstance_;
};

/** Object which can be retained and released from OpenCL. */
template<typename ChildType, typename CLType>
class CLRefObject : public CLObject<CLType>
{
public:
CLRefObject();
~CLRefObject();
CLRefObject(CLRefObject const & copy);
explicit CLRefObject(CLType const & copy);
CLRefObject & operator = (CLRefObject const & copy);

protected:
typedef CLRefObject<ChildType, CLType> CLRefObjectBase;
};

/** Object which can be instantiated from a cl type instance. */
template<typename ChildType, typename CLType>
class InstanciableFromCLObject
{
public:
typedef ChildType APIType;

static APIType CreateObjectFromCLInstance(CLType const & clInstance);

protected:
typedef InstanciableFromCLObject<ChildType, CLType> InstanciableFromCLObjectBase;

InstanciableFromCLObject();
InstanciableFromCLObject(InstanciableFromCLObject const &);
};

/** Compound object from CLObject and InstanciableFromCLObject. */
template<typename ChildType, typename CLType>
class CLObjectInstanciableFromCLObject : public CLObject<CLType>, public base::InstanciableFromCLObject<ChildType, CLType>
{
protected:
typedef CLObjectInstanciableFromCLObject<ChildType, CLType> CLObjectInstanciableFromCLObjectBase;

CLObjectInstanciableFromCLObject();
CLObjectInstanciableFromCLObject(CLObjectInstanciableFromCLObject const & copy);
explicit CLObjectInstanciableFromCLObject(CLType const & copy);
};

/** Compound object from CLRefObject and InstanciableFromCLObject. */
template<typename ChildType, typename CLType>
class CLRefObjectInstanciableFromCLObject : public base::CLRefObject<ChildType, CLType>, public base::InstanciableFromCLObject<ChildType, CLType>
{
protected:
typedef CLRefObjectInstanciableFromCLObject<ChildType, CLType> CLRefObjectInstanciableFromCLObjectBase;

CLRefObjectInstanciableFromCLObject();
CLRefObjectInstanciableFromCLObject(CLRefObjectInstanciableFromCLObject const & copy);
explicit CLRefObjectInstanciableFromCLObject(CLType const & copy);
};

} // namespace base

} //namespace cl

Where CLRefObject is responsible for retain/release system :

//CLRefObject
template<typename ChildType, typename CLType>
CLRefObject<ChildType, CLType>::~CLRefObject()
{
if(this->clInstance_ != NULL)
{
detail::error::manage(
ChildType::ReleaseCLObject(this->clInstance_),
"clRelease*"
);
}
}

template<typename ChildType, typename CLType>
CLRefObject<ChildType, CLType>::CLRefObject(CLRefObject const & copy)
: CLObject<CLType>(copy)
{
detail::error::manage(
ChildType::RetainCLObject(this->clInstance_),
"clRetain*"
);
}

template<typename ChildType, typename CLType>
CLRefObject<ChildType, CLType>::CLRefObject(CLType const & copy)
: CLObject<CLType>(copy)
{
detail::error::manage(
ChildType::RetainCLObject(this->clInstance_),
"clRetain*"
);
}

template<typename ChildType, typename CLType>
CLRefObject<ChildType, CLType> & CLRefObject<ChildType, CLType>::operator = (CLRefObject const & copy)
{
detail::error::manage(
ChildType::ReleaseCLObject(this->clInstance_),
"clRelease*"
);

CLObject<CLType>::operator = (copy);

detail::error::manage(
ChildType::RetainCLObject(this->clInstance_),
"clRetain*"
);

return *this;
}

Each CLRefObject child has to define RetainCLObject and ReleaseCLObject functions (example with Context):

cl_int Context::RetainCLObject(cl_context const & clInstance)
{
return clRetainContext(clInstance);
}

cl_int Context::ReleaseCLObject(cl_context const & clInstance)
{
return clReleaseContext(clInstance);
}

And InstanciableFromCLObject provides an API to create a "C++ OpenCL object" from a "cl_object":

//InstanciableFromCLObject
template<typename ChildType, typename CLType>
ChildType InstanciableFromCLObject<ChildType, CLType>::CreateObjectFromCLInstance(CLType const & clInstance)
{
return APIType(clInstance);
}

And, for example, the Context class inherits from CLRefObjectInstanciableFromCLObject (which has the API of CLRefObject and InstanciableFromCLObject):

class OPENCL_CPP_WINDLL Context : public base::CLRefObjectInstanciableFromCLObject<Context, cl_context>
{

And so when we call GetContext on a CommandQueue:

Context CommandQueue::GetContext() const
{
cl_context clContext;
GetInfo<CONTEXT>(clContext);
return Context::CreateObjectFromCLInstance( clContext );
}

It automatically creates a Context from a cl_context, with CreateObjectFromCLInstance, which calls the CLRefObject constructor and so retains the object.

grabner
04-29-2010, 05:03 AM
Thanks for your proposals, I slightly modified them and came up with an approach that works for me. Basically it does the following:
*) define a template function "fixReferenceCount()" which does nothing
*) overload this function for all relevant types (e.g., cl::Context), and call their retain() method
*) call "fixReferenceCount()" for data returned by the native OpenCL functions (in GetInfoHelper::get())

It might be incomplete and not fitting into cl.hpp style, but you get the idea. I made Wrapper<T>::retain() public to simplify things, but it can probably be done without this.

What do you think about this method?

Thanks & kind regards,
Markus


Here is the patch against cl.hpp as of 2010-04-23 10:16:50 -0500:


--- R:/ATI Stream/include/CL/cl-1.0.hpp Thu Apr 29 11:46:57 2010
+++ R:/ATI Stream/include/CL/cl.hpp Thu Apr 29 13:44:44 2010
@@ -824,7 +824,9 @@
static cl_int
get(Functor f, cl_uint name, T* param)
{
- return f(name, sizeof(T), param, NULL);
+ cl_int ret = f(name, sizeof(T), param, NULL);
+ fixReferenceCount(param);
+ return ret;
}
};

@@ -1205,8 +1207,6 @@

cl_type& operator ()() { return object_; }

-protected:
-
cl_int retain() const
{
return ReferenceHandler<cl_type>::retain(object_);
@@ -7336,6 +7336,27 @@

return event;
}
+
+template <typename T>
+static inline void fixReferenceCount(T *param)
+{
+};
+
+#define FIX_REFERENCE_COUNT(T) \
+static inline void fixReferenceCount(T *param) \
+{ \
+ param->retain(); \
+}
+
+FIX_REFERENCE_COUNT(Context);
+FIX_REFERENCE_COUNT(CommandQueue);
+FIX_REFERENCE_COUNT(Memory);
+FIX_REFERENCE_COUNT(Sampler);
+FIX_REFERENCE_COUNT(Program);
+FIX_REFERENCE_COUNT(Kernel);
+FIX_REFERENCE_COUNT(Event);
+
+#undef FIX_REFERENCE_COUNT

#undef __ERR_STR
#if !defined(__CL_USER_OVERRIDE_ERROR_STRINGS)

monarodan
05-03-2010, 12:52 AM
Hi All,

Just wondering why cl::Buffer is not a template class. This would allow better type safety and avoid the need for multiplying by sizeof(xxx) everywhere. For example:



template<class T>
class Buffer : public Memory
{
public:

Buffer(
const Context& context,
cl_mem_flags flags,
::size_t numElements,
T* host_ptr = NULL,
cl_int* err = NULL)
{
cl_int error;
object_ = ::clCreateBuffer(context(), flags, numElements*sizeof(T), (void*)host_ptr, &error);

detail::errHandler(error, __CREATE_BUFFER_ERR);
if (err != NULL) {
*err = error;
}
}

//! Default constructor; buffer is not valid at this point.
Buffer() { }
};


Methods such as cl::CommandQueue::enqueueReadBuffer() would become template methods also and deal with a cl::Buffer<T>.

This seems more in line with the way containers work in the STL and provides a level of compile-time type safety that one might expect from a C++ wrapper.

Thoughts?

Cheers,

Dan

matrem
05-03-2010, 01:48 AM
Using a template<T> class Buffer would force developers to always use T with that buffer.
But OpenCL offers more flexibility: it permits using the same buffer multiple times with several types.

I think a C++ OpenCL layer should offer the same level of freedom as the original C API.

monarodan
05-03-2010, 01:52 AM
Hi Again,

When I got around to looking at cl::CommandQueue::enqueueReadBuffer() closely, I was surprised to find that it takes a pointer to a vector of events, with an empty event list being indicated by passing a null pointer. This seems quite anti-C++.

Original implementation:



cl_int enqueueReadBuffer(
const Buffer& buffer,
cl_bool blocking,
::size_t offset,
::size_t size,
void* ptr,
const VECTOR_CLASS<Event>* events = NULL,
Event* event = NULL) const
{
return detail::errHandler(
::clEnqueueReadBuffer(
object_, buffer(), blocking, offset, size,
ptr,
(events != NULL) ? (cl_uint) events->size() : 0,
(events != NULL) ? (cl_event*) &events->front() : NULL,
(cl_event*) event),
__ENQUEUE_READ_BUFFER_ERR);
}


Note that as written, the behavior when an empty vector is passed is undefined, as the return value of "events->front()" is undefined. This is not noted in the function's documentation, nor is it asserted (or otherwise checked) that the size of the vector is greater than zero.

What is the reasoning for not implementing the function along the lines of the following:



cl_int enqueueReadBuffer(
const Buffer& buffer,
cl_bool blocking,
::size_t offset,
::size_t size,
void* ptr,
const VECTOR_CLASS<Event>& events = VECTOR_CLASS<Event>(),
Event* event = NULL) const
{
return detail::errHandler(
::clEnqueueReadBuffer(
object_, buffer(), blocking, offset, size,
ptr,
events.size(),
events.size() ? (cl_event*) &events.front() : NULL,
(cl_event*) event),
__ENQUEUE_READ_BUFFER_ERR);
}


Cheers,

Dan

monarodan
05-03-2010, 02:05 AM
Using a template<T> class Buffer would force developers to always use T with that buffer.
But OpenCL offers more flexibility: it permits using the same buffer multiple times with several types.

I think a C++ OpenCL layer should offer the same level of freedom as the original C API.

It should be noted that a cl::Buffer is just a facade to a cl_mem object and you must be aware of this to use it (for example, the cl::Buffer copy constructor does not create a new cl_buffer - instead, it aliases it and bumps the reference count). Personally, I see cl::Buffer acting more like a smart pointer than an actual buffer, in which case its name should really be something like cl::BufferPtr. Now it seems clear that the idea of binding a cl::BufferPtr to an existing cl_buffer makes sense, so you do not lose the opportunity of reusing a buffer with a different data type at all.

C++ programmers are used to using reinterpret casts on pointers, so a reinterpret cast from cl::BufferPtr<T> to cl::BufferPtr<U> should be a valid concept (so you do not lose any freedom relative to the original C API).

Cheers,

Dan

monarodan
05-03-2010, 02:10 AM
Just a quick correction - I've used the term cl_buffer where I should have used cl_mem in my previous post.

Dan




pauldoo
05-03-2010, 02:30 AM
I am finding the C++ bindings way easier to use than the C ones. They are great. :)

One suggestion from me is that "__CL_ENABLE_EXCEPTIONS" should be the default, and there should exist a '#define' only to disable exceptions. This to me would feel more like C++.

matrem
05-03-2010, 08:29 AM
... its name should really be something like cl::BufferPtr ..
You're right, another class that would sit above cl::Buffer (why not cl::BufferPtr) could implement your "one data type" concept.
But perhaps that's not for the C++ OpenCL layer, but for another one (a bit more high level).

However, I think the cl::Buffer class could have some of its functions templated, such as its constructor, enqueue read buffer, etc.


... are used to using reinterpret cast...
I think C++ developers like avoiding reinterpret_cast when they can.


"__CL_ENABLE_EXCEPTIONS" should be the default+1

monarodan
05-03-2010, 06:05 PM
... its name should really be something like cl::BufferPtr ..
You're right, another class that would sit above cl::Buffer (why not cl::BufferPtr) could implement your "one data type" concept.
But perhaps that's not for the C++ OpenCL layer, but for another one (a bit more high level).

However, I think the cl::Buffer class could have some of its functions templated, such as its constructor, enqueue read buffer, etc.


Template member functions really only get you part way there - their use would remove casting host pointers to void* and you would no longer need to multiply by sizeof(x). However, it's still not very type safe.

I have implemented my own template class that wraps the cl::Buffer for the moment, but I think that it should be a concept baked into the standard C++ bindings.
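
For what it's worth, a hypothetical sketch of such a wrapper (the TypedBuffer name and interface here are my own, not part of the bindings):

template <typename T>
class TypedBuffer
{
public:
    TypedBuffer(const cl::Context& context, cl_mem_flags flags,
                ::size_t numElements, T* host_ptr = NULL)
        : buffer_(context, flags, numElements * sizeof(T), host_ptr),
          numElements_(numElements) {}

    const cl::Buffer& raw() const { return buffer_; }
    ::size_t size() const { return numElements_; }
    ::size_t bytes() const { return numElements_ * sizeof(T); }

    // Typed read: no sizeof() arithmetic or void* casts at the call site.
    cl_int read(const cl::CommandQueue& queue, T* dst, cl_bool blocking = CL_TRUE) const
    {
        return queue.enqueueReadBuffer(buffer_, blocking, 0, bytes(), dst);
    }

private:
    cl::Buffer buffer_;
    ::size_t   numElements_;
};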



... are used to using reinterpret cast...
I think C++ developers like avoiding reinterpret_cast when they can.


They also like avoiding cast to void* when they can. They also like type checking performed by the compiler.

monarodan
05-03-2010, 11:33 PM
Hi All,

Something that all users of the C++ bindings should be aware of is the following. When using the template magic to bind kernel args, you will call a function declared like the following:


/*! \brief Enqueue a command to execute a kernel on a device.
*
* \param a1 is used argument 0 for the kernel call.
*
* \param events specifies the list of events that need to complete before
* this particular command can be executed. If \a events is NULL, its
* default value, then this particular command does not wait on any event
* to complete. The events specified in \a events act as
* synchronization points.
* \return An event that identifies this particular kernel
* execution instance.
*
* \note In the case that exceptions are enabled and error value
* other than CL_SUCCESS is generated, then cl::Error exception is
* generated, otherwise the returned error is stored in the Kernel
* object and can get accessed using \a get_error.
*/
template<typename A1>
inline Event operator()(
const A1& a1,
const VECTOR_CLASS<Event>* events = NULL);


Whose implementation is as follows:



template<typename A1>
Event KernelFunctor::operator()(
const A1& a1,
const VECTOR_CLASS<Event>* events)
{
Event event;

kernel_.setArg(0,a1);

err_ = queue_.enqueueNDRangeKernel(
kernel_,
offset_,
global_,
local_,
NULL, // bgaster_fixme - do we want to allow wait event lists?
&event);

return event;
}


Notice the "fixme" in the code. Passing NULL here contradicts the functions documentation.

To answer the question that is asked, yes, we do want to allow wait event lists. As written, there is an unexpected time bomb waiting to go off for those who set up their queues to allow out of order execution. Can this please be rectified (in all 16 places) to pass "events" through to enqueueNDRangeKernel() rather than NULL in a release in the near future?

Cheers,

Dan

monarodan
05-04-2010, 06:44 PM
Hi All,

Notice the "fixme" in the code. Passing NULL here contradicts the functions documentation.

To answer the question that is asked, yes, we do want to allow wait event lists. As written, there is an unexpected time bomb waiting to go off for those who set up their queues to allow out of order execution. Can this please be rectified (in all 16 places) to pass "events" through to enqueueNDRangeKernel() rather than NULL in a release in the near future?

Cheers,

Dan

I thought I'd fix this in my local header by simply passing "events" in place of NULL to enqueueNDRangeKernel(). However, there was an unexpected complication. When invoking a functor with one argument as follows:



VECTOR_CLASS<Event> events;
cl::KernelFunctor func = ...;
cl::Event e = func( a1, &events );


The KernelFunctor::operator() that is matched by my compiler (VS2008) is the one with two kernel args rather than the expected one kernel arg. If the default argument for the "events" parameter is removed, then the correct template function is used. Perhaps this is what the "fixme" in the code refers to?
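To illustrate where the ambiguity comes from, these are the two candidate declarations (simplified); with a call such as func( a1, &events ), template argument deduction can bind A2 to const VECTOR_CLASS<Event>*, so the two-kernel-argument overload matches the call just as well as the one intended:

// one kernel argument, plus an optional wait list
template<typename A1>
Event operator()(
    const A1& a1,
    const VECTOR_CLASS<Event>* events = NULL);

// two kernel arguments, plus an optional wait list
template<typename A1, typename A2>
Event operator()(
    const A1& a1,
    const A2& a2,
    const VECTOR_CLASS<Event>* events = NULL);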

My resolution has been to remove the default value, but this may not satisfy many users of the API, as they will have to pass NULL when invoking their functors when there are no events to wait on. I would like to know how others think this should be fixed.

One possibility is to specify the events to wait on as part of the binding of the KernelFunctor:



VECTOR_CLASS<Event> events;
cl::KernelFunctor func = kernel.bind( queue, cl::NDRange(...), cl::NDRange(...), &events );
cl::Event e = func( a1 );


Thoughts?

Cheers,

Dan

coleb
05-05-2010, 11:06 AM
There has been some talk about merging the KernelFunctor convenience methods into the Kernel class. This is because the way the API is structured now is misleading. For example, it appears as if the same kernel could be bound to multiple queues, like the following:



KernelFunctor func1 = kernel.bind(queue1);
func1(arg1);

KernelFunctor func2 = kernel.bind(queue2);
func2(arg2);


This is quite evil when mixed with a multi-threaded host, an asynchronous queue, or non-blocking calls. The solution, I think, is to move the operator() methods from cl::KernelFunctor into cl::Kernel. It appears to me that a cl_kernel object should only be used for a single kernel invocation at a time (though I can't find anything in the standard explicitly stating to what extent cl_kernel objects can be reused, and when).

So back to your question about event wait lists. With the above change I think it would make sense to add another method to the cl::Kernel object for setting an event wait list. At the end of the day it will look like this:



kernel.bind(queue, cl::NDRange(...), cl::NDRange(...));
kernel.waitForEvents(events);
kernel(a1);


Though if we were to go this route we could even separate out the NDRanges into their own method as well:


kernel.setCommandQueue(queue);
kernel.setNDRange(cl::NDRange(...), cl::NDRange(...));
kernel.waitForEvents(events);
kernel(a1);


Though this is a part of the API I've never really had a solid feeling about how it should be designed. PyOpenCL does it differently, requiring everything to be passed to the functor. I started a thread over there to discuss the design (http://host304.hostmonster.com/pipermail/pyopencl_tiker.net/2010-March/000288.html), and they seem to have the same "not quite sold" mentality. Personally, I can never remember long argument lists and I think they're prone to bugs, so I prefer the many methods approach.

Any other preferences from people?

monarodan
05-05-2010, 07:02 PM
There has been some talk about merging the KernelFunctor convenience methods into the Kernel class.

I second that!



So back to your question about event wait lists. With the above change I think it would make sense to add another method to the cl::Kernel object for setting an event wait list. At the end of the day it will look like this:



kernel.bind(queue, cl::NDRange(...), cl::NDRange(...));
kernel.waitForEvents(events);
kernel(a1);


Though if we were to go this route we could even separate out the NDRanges into their own method as well:


kernel.setCommandQueue(queue);
kernel.setNDRange(cl::NDRange(...), cl::NDRange(...));
kernel.waitForEvents(events);
kernel(a1);


Though this is a part of the API I've never really had a solid feeling about how it should be designed. <snip> . Personally, I can never remember long argument lists and I think they're prone to bugs, so I prefer the many methods approach.


I guess there are a couple of design principles that I follow that might help us converge on an API.

1) Identify attributes that are immutable for the lifetime of the class, pass these into the constructor, and use them to initialise const members. In the case of the Kernel, it may be reasonable to think of the binding to a queue as immutable and set only when the Kernel is constructed. Note - if I understand the comments on the PyOpenCL thread you linked to correctly, the use of immutable members overcomes the multi-threading concerns of using a "bound kernel".

2) Avoid having a many-function API (as you suggest) where it becomes an error to call one method if another has not yet been called. For example, it would be an error to invoke the kernel if you had not yet set the global NDRange (note that the local NDRange could default to a null range, so setting it could be stripped out into another method).

3) Global functions should be considered as a means to add a layer on top of a class to, say, simplify its use, without compromising the API of the class itself.

I feel that the following code is not desirable because if you forget to call either bind() or setCommandQueue(), then operator() will have to indicate an error somehow (and that error is not a cl_int)!



kernel.setCommandQueue(queue);
kernel.bind(cl::NDRange(...), cl::NDRange(...));
kernel.waitForEvents(events);
kernel(a1);


I'd like to understand the overheads of creating kernels better. I have assumed (and have only anecdotal evidence) that creating/destroying kernels is lightweight, so I do not try to reuse kernels in my code. Given that, I would tend to use overloaded constructors to specify the queue (required arg), global range (required arg), local range (optional arg) and events to wait on (optional arg). I would still use operator() to invoke the kernel as per the current KernelFunctor API. That's only four constructors to be written, assuming that default arguments aren't used.
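Something along these lines is what I have in mind - a very rough sketch only (the class name BoundKernel is made up, the members are not const here because the kernel is mutated by setArg, and error handling is omitted). All launch state is fixed at construction, so operator() can never be called "too early":

class BoundKernel
{
public:
    BoundKernel(const cl::Kernel& kernel,
                const cl::CommandQueue& queue,
                const cl::NDRange& global,
                const cl::NDRange& local = cl::NullRange,
                const VECTOR_CLASS<cl::Event>* events = NULL)
        : kernel_(kernel), queue_(queue),
          global_(global), local_(local), events_(events) {}

    // Invoke the kernel with a single argument and return the launch event.
    template <typename A1>
    cl::Event operator()(const A1& a1)
    {
        cl::Event event;
        kernel_.setArg(0, a1);
        queue_.enqueueNDRangeKernel(
            kernel_, cl::NullRange, global_, local_, events_, &event);
        return event;
    }

private:
    cl::Kernel       kernel_;
    cl::CommandQueue queue_;
    cl::NDRange      global_;
    cl::NDRange      local_;
    const VECTOR_CLASS<cl::Event>* events_;
};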

Cheers,

Dan

coleb
05-06-2010, 11:40 AM
I guess there are a couple of design principles that I follow that might help us converge on an API.

1) Identify attributes that are immutable for the lifetime of the class, pass these into the constructor, and use them to initialise const members. In the case of the Kernel, it may be reasonable to think of the binding to a queue as immutable and set only when the Kernel is constructed. Note - if I understand the comments on the PyOpenCL thread you linked to correctly, the use of immutable members overcomes the multi-threading concerns of using a "bound kernel".


Using the same kernel from multiple threads will always be flat out evil. There's not much that can be done about it since clSetKernelArg and the subsequent clEnqueue* commands will always race. So I just make sure to always use separate kernels for separate threads. The key is that we don't want to give the impression from the C++ API that it is thread-safe (I think getting rid of the KernelFunctor accomplishes this).
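For what it's worth, the pattern I use looks roughly like the sketch below (the kernel name "my_kernel" and its argument are made up): the cl::Program is shared, but each host thread constructs its own cl::Kernel, so setArg and the enqueue never race on the same cl_kernel.

// Run by each host thread with its own queue; thread creation not shown.
void workerThread(const cl::Program& program,
                  cl::CommandQueue queue,
                  const cl::Buffer& input)
{
    cl_int err = CL_SUCCESS;
    cl::Kernel kernel(program, "my_kernel", &err);  // thread-local kernel

    kernel.setArg(0, input);
    queue.enqueueNDRangeKernel(
        kernel, cl::NullRange, cl::NDRange(1024), cl::NullRange);
}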



2) Avoid having a many-function API (as you suggest) where it becomes an error to call one method if another has not yet been called. For example, it would be an error to invoke the kernel if you had not yet set the global NDRange (note that the local NDRange could default to a null range, so setting it could be stripped out into another method).


I agree with this principle. Though when the rubber hits the road this can be difficult to adhere to 100% of the time. For example, the STL does a fantastic job of following this principle. Though sometimes it slips, for example, vector::front is undefined on an empty vector.

I just feel like with cl::Kernel we've been backed into a rather nasty corner and I'm keeping my eyes open for any way out.



3) Global functions should be considered as a means to add a layer on top of a class to, say, simplify its use, without compromising the API of the class itself.

I feel that the following code is not desirable because if you forget to call either bind() or setCommandQueue(), then operator() will have to indicate an error somehow (and that error is not a cl_int)!



kernel.setCommandQueue(queue);
kernel.bind(cl::NDRange(...), cl::NDRange(...));
kernel.waitForEvents(events);
kernel(a1);


I'd like to understand the overheads of creating kernels better. I have assumed (and have only anecdotal evidence) that creating/destroying kernels is lightweight, so I do not try to reuse kernels in my code. Given that, I would tend to use overloaded constructors to specify the queue (required arg), global range (required arg), local range (optional arg) and events to wait on (optional arg). I would still use operator() to invoke the kernel as per the current KernelFunctor API. That's only four constructors to be written, assuming that default arguments aren't used.

Cheers,
Dan


So I've seen mixed evidence for how lightweight kernels are. Creating a kernel in the OSX implementation is quite fast (~1 microsecond). The NVidia Linux implementation is a lot worse, taking ~1 millisecond. I've already submitted a performance bug report to NVidia about this, though I don't have a sense of how fundamental a problem this is for them.

So in my application I cache a cl::Kernel for every host thread and then launch it several times. Also note, each invocation then has a separate global range for me, so this is not constant across the lifetime of the kernel. Theoretically, the command queue could change as well: imagine a system that load balances across multiple command queues by dequeueing the next kernel to invoke from a protected FIFO queue. Also note, the global range is optional as well, since kernels can be queued with clEnqueueTask. So in reality the command queue, global range, local range, events, and arguments are all mutable attributes of a kernel.

Luckily, there is an error condition for an invalid command queue being passed to clEnqueueNDRangeKernel: CL_INVALID_COMMAND_QUEUE. Hopefully, passing NULL will trigger this error. Also, the default global and local ranges can be (1, 1, 1), which is equivalent to clEnqueueTask. So maybe setters aren't so bad...

I'm not sure, just putting out more talking points. :-)

-Brian

monarodan
05-06-2010, 05:28 PM
2) Avoid having a many-function API (as you suggest) where it becomes an error to call one method if another has not yet been called. For example, it would be an error to invoke the kernel if you had not yet set the global NDRange (note that the local NDRange could default to a null range, so setting it could be stripped out into another method).


I agree with this principle. Though when the rubber hits the road this can be difficult to adhere to 100% of the time. For example, the STL does a fantastic job of following this principle. Though sometimes it slips, for example, vector::front is undefined on an empty vector.


Quite true, and ain't it annoying!



So I've seen mixed evidence for how lightweight kernels are. Creating a kernel in the OSX implementation is quite fast (~1 microsecond). The NVidia Linux implementation is a lot worse, taking ~1 millisecond. I've already submitted a performance bug report to NVidia about this, though I don't have a sense of how fundamental a problem this is for them.

So in my application I cache a cl::Kernel for every host thread and then launch it several times. Also note, each invocation then has a separate global range for me, so this is not constant across the lifetime of the kernel. Theoretically, the command queue could change as well: imagine a system that load balances across multiple command queues by dequeueing the next kernel to invoke from a protected FIFO queue.


If we are going to try to achieve reuse of kernels, then things change considerably. I think that cl::Kernel should remain as simple and flexible as possible, even if that makes it a pain to use. We can then write a set of utilities that make it simple to reuse kernels, say a "kernel pool" or similar concept. I like the way OpenGL works in this respect with the separation of the GL and GLU libraries. For example, glFrustum() is a pain to use (but very flexible), whereas gluPerspective() is simple to use and satisfies almost all uses of the glFrustum() function.

As part of my work I will be writing something along the lines of a kernel pool and a buffer pool. I'll keep in mind the cl::Kernel design problem when doing this and post here if I get great ideas.

Cheers,

Dan

Tanek
05-25-2010, 07:41 PM
Hi,

I was beginning to write my own C++ wrapper when I saw that in fact there was already one. But I was a little surprised at how you handle resource management with reference counting... Why not simply use std::shared_ptr or something similar? I will not explain the advantages of this design, which you can find everywhere on the web (non-intrusive, optional, etc.). Also, for the Image class, I wrote one and the first thing I did was add methods to access the image information:

http://gitorious.org/motion_estimation/motion_estimation/blobs/master/src/cl/Image.h

Is there a reason why you didn't do this?

coleb
05-26-2010, 09:41 AM
Hi,

I was beginning to write my own C++ wrapper when I saw that in fact there was already one. But I was a little surprised at how you handle resource management with reference counting... Why not simply use std::shared_ptr or something similar? I will not explain the advantages of this design, which you can find everywhere on the web (non-intrusive, optional, etc.).


All the objects handle the reference counting automatically for you, the same as shared_ptr. So I'm not quite sure what the question is. Is it "why not just use shared_ptr?" I believe the answer to that is that a goal of the bindings was that they could work in the absence of the STL, which can really help with portability. For example, shared_ptr isn't that ubiquitous yet; it is only available in GCC > 4.1. And Boost is a very heavy hammer to wield for an interface that is as portable as a single header file.
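To make the "work in the absence of the STL" point concrete, here is a small sketch of opting out of the std:: types (the defines must appear before cl.hpp is included; the helper function is just illustrative):

// Defining these before including cl.hpp makes VECTOR_CLASS and STRING_CLASS
// resolve to cl::vector and cl::string rather than the std:: types.
#define __NO_STD_VECTOR
#define __NO_STD_STRING
#include <CL/cl.hpp>

// Count the available platforms without touching the STL.
int numPlatforms()
{
    cl::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);
    return (int)platforms.size();
}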



Also for the Image class, I wrote one and the first things I did was add methods to access the image information:

http://gitorious.org/motion_estimation/motion_estimation/blobs/master/src/cl/Image.h

Is there a reason why you didn't do this?

All this information is accessible on the underlying cl_mem object using the clGetImageInfo function with the cl_image_info constants. The equivalent in OpenCL C++ is to use the getImageInfo method:


cl::Image2D image(...);

cl_image_format format = image.getImageInfo<CL_IMAGE_FORMAT>();
size_t width = image.getImageInfo<CL_IMAGE_WIDTH>();
size_t height = image.getImageInfo<CL_IMAGE_HEIGHT>();


All OpenCL C++ objects also have a getInfo method used to query the object for specific properties. For example, to get the context the image is associated with:


cl::Image2D image(...);

cl::Context context = image.getInfo<CL_MEM_CONTEXT>();


Note, there's a known bug with reference counting when the getInfo method returns an OpenCL C++ object. The fix is in the works and will hopefully be out shortly.

Tanek
05-26-2010, 10:28 AM
All the objects handle the reference counting automatically for you, the same as shared_ptr. So I'm not quite sure what the question is. Is it "why not just use shared_ptr?" I believe the answer to that is that a goal of the bindings was that they could work in the absence of the STL, which can really help with portability. For example, shared_ptr isn't that ubiquitous yet; it is only available in GCC > 4.1. And Boost is a very heavy hammer to wield for an interface that is as portable as a single header file.


You can also write your own very simplified shared_ptr and give the user an option to use the implementation they want (as you did for vector and string). I question the design (using standard, non-intrusive reference counting) more than the implementation.



All this information is accessible on the underlying cl_mem object using the clGetImageInfo function with the cl_image_info constants. The equivalent in OpenCL C++ is to use the getImageInfo method:


cl::Image2D image(...);

cl_image_format format = image.getImageInfo<CL_IMAGE_FORMAT>();
size_t width = image.getImageInfo<CL_IMAGE_WIDTH>();
size_t height = image.getImageInfo<CL_IMAGE_HEIGHT>();



OK, but this is probably a lot slower than storing the information directly in the structure (since you need to call the driver), and it is not very obvious to use (you should probably add one of these lines to the example). An image.get_width() is more natural for me. But I can understand if you disagree with that, since it starts to become subjective.

But the reference counting... I would prefer a design that is standard and can use a standard implementation: everyone will soon know shared_ptr the way they know std::vector, and their compiler will provide it.

david.garcia
05-26-2010, 12:18 PM
Everyone will soon know shared_ptr the way they know std::vector, and their compiler will provide it.

Embedded/mobile programmers have to work on platforms where such libraries are not available and probably will not be available for years to come.

coleb
05-26-2010, 12:52 PM
You can also write your own very simplified shared_ptr and give the user an option to use the implementation they want (as you did for vector and string). I question the design (using standard, non-intrusive reference counting) more than the implementation.


That's essentially what detail::Wrapper is, an implementation of shared_ptr.

Also note that the design of the OpenCL C++ layer does not preclude you from using shared_ptr. The following should work:


std::shared_ptr<cl::Image2D> imagePtr;


It may even be preferable from a performance point of view since std::shared_ptr is implemented using atomics and cl::detail::Wrapper is implemented using cl(Retain/Release)*. If the above doesn't work let us know and we can make sure the OpenCL C++ layer can accommodate it.



(you should probably add one of these lines to the example).


I agree it's a little obscure; more examples are definitely in order. AMD's implementation ships with OpenCL C++ examples; Apple's and NVidia's don't. If Apple and NVidia shipped examples (and maybe they will in the future) it would sure help solidify C++'s importance. Maybe Khronos should even ship a set of examples that work across all implementations. It would be a fantastic teaching tool, since all the examples that ship with implementations have dependencies on that implementation's example setup.



OK, but this is probably a lot slower than storing the information directly in the structure (since you need to call the driver), and it is not very obvious to use.
An image.get_width() is more natural for me. But I can understand if you disagree with that, since it starts to become subjective.


Premature optimization is the root of all evil. I have a production multi-threaded database server application that makes heavy use of the getInfo methods and have never seen a performance issue. If there is one the vendor should be notified.

There are good reasons to keep the objects as lightweight as possible, i.e., sizeof(cl::Context) == sizeof(cl_context). When passing the objects to an argument handler they are very easy to translate into what OpenCL C needs. Also, the interface is very easy to update when new properties are added to the various OpenCL C objects (it's a single table within the header file).

Thanks for the feedback, I hope you find the bindings useful enough to suit your needs.

Tanek
05-26-2010, 09:55 PM
You can also write your own very simplified shared_ptr and give the user an option to use the implementation they want (as you did for vector and string). I question the design (using standard, non-intrusive reference counting) more than the implementation.


That's essentially what detail::Wrapper is, an implementation of shared_ptr.

Also note that the design of the OpenCL C++ layer does not preclude you from using shared_ptr. The following should work:


std::shared_ptr<cl::Image2D> imagePtr;


It may even be preferable from a performance point of view since std::shared_ptr is implemented using atomics and cl::detail::Wrapper is implemented using cl(Retain/Release)*. If the above doesn't work let us know and we can make sure the OpenCL C++ layer can accommodate it.


If cl::Wrapper were an implementation of shared_ptr, it would not be the parent of all the OpenCL classes, or is there something I don't get? Also, the constructor and destructor of Image2D use the retain/release functions (indirectly, by calling the constructor/destructor of Wrapper), so I don't see how shared_ptr could be faster than cl::Wrapper, since the functions of cl::Wrapper will be called by shared_ptr anyway.



Premature optimization is the root of all evil. I have a production multi-threaded database server application that makes heavy use of the getInfo methods and have never seen a performance issue. If there is one the vendor should be notified.

There are good reasons to keep the objects as lightweight as possible, i.e., sizeof(cl::Context) == sizeof(cl_context). When passing the objects to an argument handler they are very easy to translate into what OpenCL C needs. Also, the interface is very easy to update when new properties are added to the various OpenCL C objects (it's a single table within the header file).

You convinced me on this one ;).

Thanks for the feedback, I hope you find the bindings useful enough to suit your needs.

Thanks for your time answering my questions!

coleb
05-27-2010, 09:30 AM
If cl::Wrapper were an implementation of shared_ptr, it would not be the parent of all the OpenCL classes, or is there something I don't get?


Very likely there's something I'm not getting about shared_ptr as well; we're not allowed to use it here yet, since we have to support compilers as old as GCC 3.2, so I don't have that much experience with it. It seems super cool though, especially in a multi-threaded environment.

The Boost library states the following about best practice for shared_ptr:


A simple guideline that nearly eliminates the possibility of memory leaks is: always use a named smart pointer variable to hold the result of new. Every occurence of the new keyword in the code should have the form:
shared_ptr<T> p(new Y);
It is, of course, acceptable to use another smart pointer in place of shared_ptr above


To me, the only difference between detail::Wrapper and shared_ptr is that detail::Wrapper uses the reference counting inherent in the OpenCL C API, versus doing its own reference counting.
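A stripped-down sketch of that idea for a single handle type may make the difference clearer (the class name MemWrapper is just illustrative; the real detail::Wrapper is templated over the handle type). The count lives inside the OpenCL runtime, so copying calls clRetainMemObject and destruction calls clReleaseMemObject, instead of bumping a count owned by the wrapper:

class MemWrapper
{
public:
    explicit MemWrapper(cl_mem obj = NULL) : object_(obj) {}

    // Copying bumps the runtime's reference count on the underlying cl_mem.
    MemWrapper(const MemWrapper& rhs) : object_(rhs.object_)
    {
        if (object_ != NULL) clRetainMemObject(object_);
    }

    MemWrapper& operator=(const MemWrapper& rhs)
    {
        if (this != &rhs) {
            if (rhs.object_ != NULL) clRetainMemObject(rhs.object_);
            if (object_ != NULL) clReleaseMemObject(object_);
            object_ = rhs.object_;
        }
        return *this;
    }

    // Destruction drops the runtime's reference count.
    ~MemWrapper()
    {
        if (object_ != NULL) clReleaseMemObject(object_);
    }

    cl_mem operator()() const { return object_; }

private:
    cl_mem object_;
};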



Also, the constructor and destructor of Image2D use the retain/release functions (indirectly, by calling the constructor/destructor of Wrapper), so I don't see how shared_ptr could be faster than cl::Wrapper, since the functions of cl::Wrapper will be called by shared_ptr anyway.


Writing a function like the following:


shared_ptr<cl::Context> GetContext()
{
    shared_ptr<cl::Context> p(new cl::Context(...));
    return p;
}


Should mean that there is only one cl::Context retain call (actually, it would be implicit in the clCreateContext). The rest of the copying and reference counting in the return of the variable 'p' is done in the context of the shared_ptr object using cl::Context as a pointer (not having to call the cl::Context ctor or dtor at all).

Whereas in the following:


cl::Context GetContext()
{
    cl::Context context(...);
    return context;
}


There can be several calls (depending on how smart your compiler is at optimizing return by value) to cl::Context retain/release. Theoretically, shared_ptr and the OpenCL retain/release calls should be equivalently performant, though it is possible that an implementation is using heavyweight mutexes instead of the cheaper atomics that shared_ptr uses. I'm curious whether anyone has seen this, because the above pass-by-value annoyance also shows up in the OpenCL C++ wrappers when cl::Kernel::setArg is called on a cl::Memory object. Since cl::Kernel::setArg is templatized to pass that argument by value, there are a few extra retain/release calls incurred that would not be seen in the straight C API. There are ways to optimize this away; I'm just curious whether anyone has seen this performance hit.

Tanek
05-27-2010, 10:36 AM
If cl::Wrapper were an implementation of shared_ptr, it would not be the parent of all the OpenCL classes, or is there something I don't get?


Very likely there's something I'm not getting about shared_ptr as well; we're not allowed to use it here yet, since we have to support compilers as old as GCC 3.2, so I don't have that much experience with it. It seems super cool though, especially in a multi-threaded environment.


shared_ptr is available in the TR1 STL since GCC 3.1:

http://anteru.net/2008/09/01/260/

So if your requirement is compatibility with 3.1, then you can definitely use shared_ptr.

coleb
05-27-2010, 10:41 AM
shared_ptr is available in the TR1 STL since GCC 3.1:

http://anteru.net/2008/09/01/260/

So if your requirement is compatibility with 3.1, then you can definitely use shared_ptr.

Whoa, cool. Now I need to dig through the other compilers (msvc7, xlc, solaris CC, etc). We live in a messy world. :-)

grimm
06-08-2010, 05:38 PM
I'm having some difficulties with the C++ bindings. It's the same issue I was having with the Ruby bindings: whenever I tried to create a new context I would get back a -32, or CL_INVALID_PLATFORM, error. I'm using an NVIDIA GeForce 9400 GPU with the latest drivers for it on a Fedora 10 box. The Ruby bindings author had to fix his code to work with the new CUDA 3.0 version of the NVIDIA drivers. Do the C++ bindings also need this fix? Context creation works with older drivers, just not the newer ones. I have tried both the 195.36.15 and 195.36.24 drivers and they both have this same issue. I have also tried both the rev 48 and the previous version of the C++ bindings with no luck.

Grimm

matrem
06-09-2010, 01:42 AM
Are you trying to create a context just from a device type, without passing any platform?

grimm
06-09-2010, 09:54 AM
I'm following the C++ OpenCL bindings 1.0 examples and using the following:

cl::Context context(CL_DEVICE_TYPE_GPU, 0, NULL, NULL, &err);

As far as I know, creating the platform first and passing it in is not necessary, and I don't have to with the Ruby bindings. I do know that the Ruby bindings work great now that they have been fixed to support the new driver. I'm getting the same error with the C++ bindings, so I suspect that it's the same problem.

Grimm

matrem
06-10-2010, 12:09 AM
The problem is, I think, that the ICD loader can't work out which driver it has to query without an ICD object...

In the extension specification (http://www.khronos.org/registry/cl/extensions/khr/cl_khr_icd.txt) we can read:


...
At every OpenCL function call, the ICD Loader infers the Vendor ICD function to call from the arguments to the function.
...
Functions which do not have an argument from which the vendor implementation may be inferred are ignored, with the exception of clGetExtensionFunctionAddress
...

grimm
06-10-2010, 04:57 PM
Thanks Matrem, but I don't know OpenCL well enough to diagnose a problem at that level. :wink: I did go ahead and put in a bug report about this and another issue I'm having with the bindings. I also confirmed that this is an issue with the bindings, and not just with my code, by running the hello test program that is in the binding header file. Here is the code I ran to test it, if you are interested.


#define __CL_ENABLE_EXCEPTIONS
//#define __NO_STD_VECTOR
#define __NO_STD_STRING

#if defined(__APPLE__) || defined(__MACOSX)
#include <OpenCL/cl.hpp>
#else
#include <CL/cl.hpp>
#endif
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <vector>

const char * helloStr = "__kernel void "
                        "hello(void) "
                        "{ "
                        " "
                        "} ";

int
main(void)
{
    cl_int err = CL_SUCCESS;
    try {
        cl::Context context(CL_DEVICE_TYPE_CPU, 0, NULL, NULL, &err);

        std::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();

        cl::Program::Sources source(1,
            std::make_pair(helloStr, strlen(helloStr)));
        cl::Program program_ = cl::Program(context, source);
        program_.build(devices);

        cl::Kernel kernel(program_, "hello", &err);

        cl::CommandQueue queue(context, devices[0], 0, &err);
        cl::KernelFunctor func = kernel.bind(
            queue,
            cl::NDRange(4, 4),
            cl::NDRange(2, 2));

        func().wait();
    }
    catch (cl::Error err) {
        std::cerr
            << "ERROR: "
            << err.what()
            << "("
            << err.err()
            << ")"
            << std::endl;
    }

    return EXIT_SUCCESS;
}


Grimm

grimm
06-10-2010, 05:57 PM
Looks like you were correct, Matrem. :D I just got a response to my bug report; it sounds like they need to update their example code.


This program is not correct, in the case that the ICD extension is supported.
In this case a context must be created with a valid platform, so something
like:

std::vector<cl::Platform> platforms;
err = cl::Platform::get(&platforms);
// check err

// simply pick the first platform
cl_context_properties properties[] = {
    CL_CONTEXT_PLATFORM, (cl_context_properties)platforms[0](),
    0, 0
};
context = cl::Context(
    CL_DEVICE_TYPE_GPU, properties, NULL, NULL, &err);

Closing this as invalid.

bgaster
07-01-2010, 01:32 PM
I am pleased to announce that there is an updated version of the C++ bindings for OpenCL 1.1 that fixes a number of issues, including:

- reference counting
- native kernels

I've also added support for the extension cl_ext_device_fission, which can be enabled by adding:

#define CL_DEVICE_FISSION

before including cl.hpp.

I plan to use this approach for all optional extensions, assuming people are OK with it?

cl_platform.h now has definitions for the cl_typeN host types that allow the use of std::vector<cl_typeN> and so on.
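For example (just a sketch; the buffer and queue are assumed to have been created elsewhere), this allows host-side containers of the OpenCL vector types to be copied straight into a buffer:

#include <vector>
#include <CL/cl.hpp>

// Blocking upload of a std::vector<cl_float4> into an existing cl::Buffer.
void upload(cl::CommandQueue& queue, cl::Buffer& buffer,
            const std::vector<cl_float4>& points)
{
    queue.enqueueWriteBuffer(
        buffer, CL_TRUE, 0,
        points.size() * sizeof(cl_float4),
        &points[0]);
}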

There are still a number of open issues that we are working on, but with reference counting fixed and the cl_platform.h additions, this should account for most of the difficulties.

Stib
09-12-2010, 07:24 AM
Can someone tell me what happened to the


cl::CommandQueue

cl_int setProperty (cl_command_queue_properties properties, cl_bool enable, cl_command_queue_properties *old_properties=NULL) const
Enable or disable the properties of a command-queue.

I was using it in OCL 1.0, and now (in OCL 1.1) it is nowhere to be found...

Or, if it is obsolete now, how can I change the properties of an already existing command queue?

bwatt
09-12-2010, 03:22 PM
It was deprecated in OpenCL 1.1. You now have to release the old one and create a new one.
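A rough sketch of the replacement pattern with the C++ bindings (the context, device, and queue variables are assumed to exist already): let the old cl::CommandQueue be released by assigning over it or letting it go out of scope, and construct a new queue with the properties you want.

// Replace an existing queue with one that has out-of-order execution enabled.
cl_int err = CL_SUCCESS;
queue = cl::CommandQueue(
    context, device,
    CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE,  // desired properties
    &err);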

Stib
09-12-2010, 04:51 PM
I see. I'm not happy, but thanks for the answer!

eyebex
01-07-2011, 06:19 AM
FYI, I was in need of a patched version of cl.hpp which addressed some issues, especially this one (http://www.khronos.org/bugzilla/show_bug.cgi?id=254), and I needed it right away, in a public place where others could download it.

So I decided to create an unofficial mirror of the CL headers from Khronos' SVN repository, and as I prefer Git over SVN, I did it here (http://repo.or.cz/w/opencl-headers.git). Based on that read-only mirror, I created a fork (http://repo.or.cz/w/opencl-headers/bugzilla-patches.git) which contains some patches from Bugzilla for the impatient ones like me.

Hopefully some of you find this central location for an unofficially patched version convenient, and would like to contribute.

eyebex
04-18-2011, 11:56 PM
The file at

http://www.khronos.org/registry/cl/api/1.1/cl.hpp

*has* bindings for rev. 48 (OpenCL 1.0). Rev. 33 refers to OpenCL 1.1.

SteveYoungs
06-10-2011, 09:03 AM
Is there any update on the issue of KernelFunctor ignoring waitEvents, mentioned in May 2010 in this post:

http://www.khronos.org/message_boards/viewtopic.php?p=7215#p7215

I just hit the same problem in my own code before finding the above post. It would be good to get this fixed. I can supply a fix for review if required.

Steve

rothpc
06-27-2011, 08:15 AM
The C++ bindings header (the version from March 3 2010, I guess - the combination of version numbers and dates in the file makes it hard to tell) requires OpenGL headers. Is this necessary? By my reading of the OpenCL 1.1 specification, OpenGL interoperability is an optional extension (see Sections 9.7-9.9). We have started to use the C++ bindings but are finding some users can't build our software because they do not have OpenGL installed. Our software does not use OpenGL.

By my reading of the license in cl.hpp, I can distribute a modified version of the header that either removes or #ifdefs the OpenGL interoperability declarations as long as I keep the license header, but it would be preferable if the "official" version distributed at the Khronos repository either #ifdef'd the OpenGL declarations or moved them to separate headers as is done with the C bindings.

My apologies if this has been discussed before - I didn't see it in this topic.

xls
06-28-2011, 10:48 AM
Is it possible to add constructors for memory objects (Buffers, Images) that take a regular OpenCL cl_mem object? I can see that reference counting could be an issue, but right now there is no way to treat cl_mem objects created with IHV extensions the same way as those created with the C++ bindings.

Spulenpfeifen
09-20-2011, 10:57 AM
Hi,

does anyone else experience problems with

cl::size_t<3> ret = kernel.getWorkGroupInfo<CL_KERNEL_COMPILE_WORK_GROUP_SIZE>(device);

returning bogus values?

I think there is a bug where the copy constructor of cl::vector<::size_t, N> fails to copy the data on return from getWorkGroupInfo as the vector's empty_ flag is erroneously never set to false.

Can someone confirm?