Official SYCL 1.2 Provisional feedback thread
March 19, 2014 – San Francisco, Game Developers Conference – The Khronos™ Group today announced the release of SYCL™ 1.2 as a provisional specification to enable community feedback. SYCL is a royalty-free, cross-platform abstraction layer that enables the development of applications and frameworks that build on the underlying concepts, portability and efficiency of OpenCL™, while adding the ease-of-use and flexibility of C++. For example, SYCL can provide single source development where C++ template functions can contain both host and device code to construct complex algorithms that use OpenCL acceleration - and then enable re-use of those templates throughout the source code of an application to operate on different types of data.
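The single-source template idea described above can be sketched roughly as follows. This is illustrative only: the syntax is paraphrased from the SYCL 1.2 provisional specification, no conformant implementation exists yet, and names such as command_group and parallel_for may change before the final release.

```cpp
#include <CL/sycl.hpp>  // provisional SYCL header (assumed name)

// One C++ template containing both host-side setup and device code;
// the same template can be reused for float, int, user-defined PODs, ...
template <typename T>
void scale(cl::sycl::queue &q, cl::sycl::buffer<T, 1> &buf, T factor) {
  cl::sycl::command_group(q, [&]() {   // host code: enqueue the work
    auto a = buf.template get_access<cl::sycl::access::read_write>();
    cl::sycl::parallel_for(buf.get_count(),
        [=](cl::sycl::id<1> i) { a[i] *= factor; });  // device code: the kernel
  });
}
```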
The SYCL 1.2 provisional specification supports OpenCL 1.2 and has been released to enable the growing community of OpenCL developers to provide feedback before the specification is finalized. The specification and links to feedback forums are available at: www.khronos.org/opencl/sycl.
While SYCL is one possible solution for high-level parallel programming that leverages C++ programming techniques, the OpenCL group encourages innovation in diverse programming models for heterogeneous systems, including building on top of the SPIR™ low-level intermediate representation, using the open source CLU libraries for prototyping, or through custom techniques.
“Developers have been requesting C++ for OpenCL to help them build large applications quickly and efficiently and there are lots of useful C++ libraries that want to port to OpenCL,” said Andrew Richards, CEO at Codeplay and chair of the SYCL working group. “SYCL makes this possible and we are looking forward to the community feedback to help drive the final release and future roadmap. We are especially keen to work with C++ library developers who want to accelerate their libraries using the performance of OpenCL devices.”
SYCL 1.2 Features
SYCL 1.2 will enable industry innovation in OpenCL-based programming frameworks:
- API specifications for creating C++ template libraries and compilers using the C++11 standard;
- Easy to use, production grade API that can be built on-top of OpenCL and SPIR;
- Compatible with standard CPU C++ compilers across multiple platforms, as well as enabling new SYCL-based device compilers to target OpenCL devices;
- Asynchronous, low-level access to OpenCL features for high performance and low-latency – while retaining ease of use;
- Khronos open royalty-free standard - to guarantee ongoing support and reciprocal IP coverage;
- OpenGL® integration to enable sharing of images and textures with SYCL as well as OpenCL;
- Development in parallel with OpenCL – future releases are expected to support upcoming OpenCL 2.0 implementations and track future OpenCL releases.
An Overview of SYCL 1.2
OpenCL DevU at GDC 2014
Going through the specs slowly. Very high level feedback is that we need more examples. I already mentioned this to Andrew on twitter.
Also, compilation workflow is really unclear.
1. Let us say I have my favourite C++11 compiler installed: GCC, VS2013, whatever. Let's say I do NOT have any other compilers installed, nor any OpenCL drivers, and just want to compile SYCL code to (parallel) native code using my native compiler. I guess this will require you to release some header files so that classes such as cl::sycl::buffer are understood by the C++ compiler, allowing it to generate CPU code. This will be useful for development at least, and for porting code to platforms where OpenCL drivers are not available (e.g. WinRT).
Will this be supported? If so, how do things work? Should we expect a royalty-free solution for this?
2. Single-source SYCL compilers are easy to understand. These will take all your source files, including both regular C++ and SYCL, and generate a single binary containing both host and device code. What about the multi-compiler solutions mentioned? Are those solutions likely to look like, say, nvcc? I.e., compiling the device code itself, inserting any required glue code for the host, and then passing all the original host code as well as the generated host code to an available C++ compiler such as gcc, VS, etc.?
Good questions, thank you.
Any SYCL implementation is required to support execution of any code on the host CPU using just the host compiler, as well as execution of device code on one or more OpenCL devices. A host-only implementation would not be conformant, but you could use a conformant implementation of SYCL to run code only on the host.
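For instance, forcing host execution could look like the sketch below. This is an assumption about the eventual API: the selector name host_selector is hypothetical and may differ in the provisional headers.

```cpp
// Hypothetical sketch: pick the host device explicitly, so the same
// command groups run through the host compiler's code path rather
// than an OpenCL driver.
cl::sycl::host_selector selectHost;     // assumed selector name
cl::sycl::queue hostQueue(selectHost);
// ... submit the same command groups to hostQueue as to a device queue.
```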
SYCL is a royalty-free standard. Whether a specific implementation has licensing terms requiring payment or royalties is up to individual implementers.
How SYCL is compiled is not actually defined in the spec. This was a deliberate decision to allow implementers freedom. However, an implementation could operate like this:
You compile your source file with a SYCL device compiler and it produces a header file containing the compiled kernel and implementation-specific glue code to invoke the kernel on an OpenCL device. E.g. mysyclcompiler mysourcefile.cpp -omysyclheader.h
Then you could compile the same source file with your host compiler and tell it where the compiled kernel header is. E.g. gcc -c -DSYCLHEADER="mysyclheader.h" mysourcefile.cpp
The SYCL header files and runtime sort out the rest.
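Putting the two hypothetical commands above into one build script might look like this (mysyclcompiler is the illustrative name from the example, not a real tool):

```shell
# Device pass: produce the compiled kernels plus glue code in a generated header.
mysyclcompiler mysourcefile.cpp -omysyclheader.h

# Host pass: an ordinary C++11 compiler consumes the same source file,
# pulling in the generated header via a macro.
gcc -std=c++11 -c -DSYCLHEADER=\"mysyclheader.h\" mysourcefile.cpp
```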
Alternative approaches would still be valid.
Thank you for your feedback.
More examples would definitely help in describing the features of the SYCL specification and this is something that we are currently looking into. We will shortly be posting a series of blogs on the Codeplay website aimed at describing the SYCL programming model and the available workflow solutions, as well as providing more practical examples.
A couple of typographical issues:
p.14: "For a kernel to access local memory on a device, the user can either create a dynamically-sized local accessor object to the kernel as a parameter." -- typically "either" is followed by an "or", and also "to the kernel as a parameter" seems like it is missing something ahead of it.
p.75: "the device." is hanging there by itself with blank space above it. It seems like something is missing prior to it.
Thanks, we will have a look at these.
I am happy to see something like SYCL develop. The question of how we best program accelerators is still unanswered, and I doubt there is a universal answer at all. The more things we try, the closer we will get to a satisfying solution, so I applaud your efforts. I read the provisional specification and I have a few questions and comments:
* SYCL is not something that can be implemented as a standard C++ library, but is rather a compiler extension or an additional compiler, not unlike C++ AMP. Is that correct?
* command_group: this concept seems to try to fuse memory transfer and compute together; with command_groups are things like pipelines and double-buffering still possible? How would one go about implementing overlapping copy and compute using the command_group concept?
* The accessor seems interesting; its actual usefulness can best be assessed once we can implement code using SYCL: when can we expect a working prototype? I really dislike the name "accessor" though. C++ AMP calls this an array_view, which is a lot nicer.
* I dislike the name of the queue concept; it is too generic and usually means something completely different. I know there is the namespace but still. I obsessed about the exact name for the thing that is a stream or a command_queue and came up with the concept of a 'feed' in my GPU library Aura.
Do I take it correctly from the spec that structs or classes can now be passed to kernels? And what exactly is the difference between capturing variables and passing them as lambda parameters?
Yes, SYCL allows you to pass any struct that is POD and doesn't contain pointers.
Variables that are captured by the lambda are kernel arguments; these can be accessors, samplers, and POD data types which don't contain pointers. The lambda parameters are specific types, constructed within the kernel, that are used to give host/device-compatible access to the current work-item's id information. For example, the parallel_for API takes an item object as the lambda's parameter.
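The capture-versus-parameter split can be illustrated in plain C++11 with no SYCL headers at all. Nothing here is SYCL API; a host for-loop stands in for parallel_for, and run_kernel is just an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The lambda's *captures* (data, scale) play the role of kernel arguments;
// its *parameter* (i) plays the role of the per-work-item id (SYCL's item).
std::vector<int> run_kernel(std::size_t n, int scale) {
  std::vector<int> out(n, 0);
  int *data = out.data();               // captured: a "kernel argument"
  auto kernel = [=](std::size_t i) {    // parameter: the work-item index
    data[i] = static_cast<int>(i) * scale;
  };
  for (std::size_t i = 0; i < n; ++i)   // host loop stands in for parallel_for
    kernel(i);
  return out;
}
```

The same split carries over to SYCL: anything captured must be a valid kernel argument, while the parameter is supplied by the runtime for each work item.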
Is there any way to get early access to the reference implementation done by Codeplay? Or at least get an estimate, as to when it will be available? (Although I would be glad to give it a spin)