In the SIGGRAPH BOF presentation it is stated that "Writes to 3D images is now a core feature", and the press release above lists "Improved image support including sRGB images and 3D image writes". However, the reference pages for OpenCL 2.0 still state that writes to 3D images are an optional feature: "Writes to a 3D image memory object are only supported when the cl_khr_3d_image_writes extension is implemented".
I find this confusing. Will writes to 3D images be available as a core feature and not just through an extension?
I truly hope that writes to 3D images will be mandatory, as they give huge performance gains in the medical image processing field, where we mostly work on 3D images. Currently, only AMD and Intel support this feature on their GPUs.
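Until this is sorted out, code that targets both cases has to test for the extension at run time. A minimal sketch of that test, assuming the extension list has already been fetched with clGetDeviceInfo() and CL_DEVICE_EXTENSIONS (a space-separated string); the helper name has_extension is made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` appears as a whole token in the space-separated
 * list `ext_list` (the format of a CL_DEVICE_EXTENSIONS string).
 * has_extension is a hypothetical helper, not part of the OpenCL API. */
bool has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);
    size_t name_len = strlen(name);
    if (name_len == 0)
        return false;

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* Accept only whole tokens, so "cl_khr_foo" does not match
         * inside "cl_khr_foo_bar". */
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        bool ends_token = (p[name_len] == ' ') || (p[name_len] == '\0');
        if (starts_token && ends_token)
            return true;
        p++; /* partial match: keep searching after this position */
    }
    return false;
}
```

A kernel path that writes to 3D images would then be selected only when has_extension(extensions, "cl_khr_3d_image_writes") is true, with a buffer-based fallback otherwise.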
I'd like a name for the architecture and sub-architecture via clGetDeviceInfo. Optionally also Series, Type and Version, to split up the now useless CL_DEVICE_NAME. For CUDA you have compute capability, but on OpenCL you need a lookup table. For AMD you need a similar lookup table, as you only get a code name, which is sort of a sub-architecture.
This information is very useful for deciding which optimisation(s) can be used. For this we currently only have information about local memory, the global memory cache, etc.
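To show what the workaround looks like today, here is a rough sketch of such a lookup table, assuming CL_DEVICE_NAME returns an AMD code name. The three entries and the function name lookup_architecture are illustrative only; a real table would need many more rows and constant maintenance, which is exactly the problem:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct arch_entry {
    const char *code_name;    /* string as reported in CL_DEVICE_NAME */
    const char *architecture; /* coarse architecture family */
};

/* Illustrative entries only, not a complete or authoritative list. */
static const struct arch_entry arch_table[] = {
    { "Cypress", "VLIW5" },
    { "Cayman",  "VLIW4" },
    { "Tahiti",  "GCN"   },
};

/* Hypothetical helper: maps a device code name to an architecture family,
 * or returns NULL when the device is not in the table. */
const char *lookup_architecture(const char *device_name)
{
    assert(device_name != NULL);
    for (size_t i = 0; i < sizeof arch_table / sizeof arch_table[0]; ++i) {
        if (strcmp(device_name, arch_table[i].code_name) == 0)
            return arch_table[i].architecture;
    }
    return NULL; /* unknown device: caller falls back to a generic path */
}
```

Every new chip needs a new row, and an unknown device silently drops to the generic path, which is why a proper query would be better.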
Writes to 3D images are a core feature in OpenCL 2.0. This looks like a bug in the reference pages.
The 24-bit depth / stencil format is supported as an optional extension. It is described in the OpenCL 2.0 extension specification (refer to section 9.11).
I would really like a way to uniquely identify devices, particularly one that works across platforms. What I am working on is essentially a system to distribute different tasks (unrelated processes working on different problems) between all devices in a system. However, the same device can be available on multiple platforms, and I would like to avoid scheduling tasks through 2 different platforms that are using the same physical device. For example, a common case that is easy to detect even without such a feature is having both the AMD and Intel platforms installed: both will expose the same physical CPU device. That particular case is easy to detect, but more difficult cases arise with multiple GPUs. What I want is the PCIe ID, or some device hash based on it, so that a PCIe device can be uniquely identified within the system.
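To make the request concrete, this is roughly how the scheduler would use such an ID if it existed. It is only a sketch: struct pci_location and both functions are hypothetical, and core OpenCL currently offers no query that would fill the struct in:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device key; core OpenCL has no query that provides it. */
struct pci_location {
    unsigned domain, bus, device, function;
};

bool same_physical_device(const struct pci_location *a,
                          const struct pci_location *b)
{
    assert(a != NULL && b != NULL);
    return a->domain == b->domain && a->bus == b->bus &&
           a->device == b->device && a->function == b->function;
}

/* Compacts `locs` in place so that each physical device appears once,
 * keeping the first occurrence. Returns the number of unique devices. */
size_t unique_devices(struct pci_location *locs, size_t count)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; ++i) {
        bool seen = false;
        for (size_t j = 0; j < kept; ++j) {
            if (same_physical_device(&locs[i], &locs[j])) {
                seen = true;
                break;
            }
        }
        if (!seen)
            locs[kept++] = locs[i];
    }
    return kept;
}
```

The scheduler would collect one key per device from every platform, run unique_devices over the combined list, and hand out tasks only to the entries that remain.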
Just a quick note that I have read the OpenCL 2.0 specification and have made extensive notes to provide feedback. I am currently writing it up, and it will take me a week or so to compile all of my notes into something easily read. I haven't seen much activity on this topic, and I wanted to reassure committee members that feedback is coming, so don't pass the spec yet!
There's a discrepancy in the OpenCL C Specification document (version 11):
- Paragraph 6.5.1 (global) states that: "Variables defined at program scope (...) can also be declared in the global address space."
- Paragraph 6.5.3 (constant) states that: "Variables in the program scope must be declared in the __constant address space."
awesome, thank you!
I hope it gets at least AMD/NV support
I don't like the SVM mechanism you proposed in the spec, because:
1. The existence of clSVMAlloc() sounds like a bad idea. The whole point of shared virtual memory (SVM) should be to REUSE an existing void* pointer allocated externally with C/C++ malloc(), VirtualAlloc(), etc.
Some libraries have their own memory manager and are closed-source. How would I call clSVMAlloc() then?
Also, clSVMAlloc() requires a CL context, which is usually tied to a specific cl_platform_id and set of cl_device_id values. Now, imagine I allocate a C/C++ struct and want to share it with ALL the CL platforms, contexts and devices...
That would be impossible, because I would have to create a separate void* pointer for each context with clSVMAlloc()... What if I create a context per device in order to launch many kernels from different threads? The resource could not be shared!
2. Passing CL_MEM_READ_WRITE/CL_MEM_READ_ONLY, etc. at creation time in clSVMAlloc() may not be a good idea: I would like to use the resource as read-only for some kernels and write-only for others...
The read/write specification should be given at launch/execution time, not at creation time!
3. clSetKernelArgSVMPointer() is a bit confusing... Why must I use a different function to pass kernel arguments?
Why not use clSetKernelArg() as you do with other params?
To sum up: I don't like the mechanism you specified. I propose this instead:
1. Let the user allocate memory with their own traditional C/C++ malloc/free calls (perhaps with a 4K-alignment requirement).
2. Create the __svmRead__, __svmWrite__ and __svmReadWrite__ attributes and use them in the kernel's args:
void myKernel ( const __svmRead__ float3 *input1,  /* "input1" will be marked CL_MEM_READ_ONLY due to "const" and svmRead */
                __svmWrite__ float *output1,       /* "output1" will be marked CL_MEM_WRITE_ONLY */
                __svmReadWrite__ float *output2 )  /* "output2" will be marked CL_MEM_READ_WRITE */
Also, it would be a good idea to add some "cached" ones like __svmReadCached__ and __svmReadNonCached__, so the user has much better control over the use of hardware caches.
That's much simpler and more flexible... and you won't need the clSetKernelArgSVMPointer(), clSVMAlloc() and clSVMFree() functions at all.
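For the host side of point 1 in the proposal, a rough sketch of what "plain allocation with a 4K-aligned requisite" could look like with C11 aligned_alloc(). The 4096-byte figure and the names SVM_ALIGNMENT, round_up_to_alignment and alloc_page_aligned are assumptions for illustration; nothing here makes the pointer visible to an OpenCL device under the current spec:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed alignment, taken from the "4K-aligned" suggestion above. */
#define SVM_ALIGNMENT 4096u

/* Rounds `size` up to a multiple of SVM_ALIGNMENT, since C11 aligned_alloc
 * expects the size to be a multiple of the alignment. */
size_t round_up_to_alignment(size_t size)
{
    return (size + SVM_ALIGNMENT - 1) / SVM_ALIGNMENT * SVM_ALIGNMENT;
}

/* Hypothetical helper: ordinary host allocation that satisfies the
 * proposed alignment rule. Release the result with free(). */
void *alloc_page_aligned(size_t size)
{
    assert(size > 0);
    return aligned_alloc(SVM_ALIGNMENT, round_up_to_alignment(size));
}
```

The returned pointer is freed with the normal free(), so an existing memory manager would only have to honour the alignment rule rather than call into OpenCL.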
Last edited by pelotoescogorciao; 09-09-2013 at 01:30 PM.
1. Also, the 2.0 spec is lacking a very important feature we've been demanding for ages: a flag to disable the GPU driver's watchdog.
Currently, if a kernel takes more than 3-5 seconds it's aborted and the graphics driver is reset ><
You simply cannot predict how long some tasks are going to take, because they are heavily branched or because your app is designed to run on very heterogeneous hardware with very different speeds (like a GeForce 8500GTS vs a Titan).
If I have a very time-consuming task that I cannot split effectively, I should have a way to tell the driver that I don't want it stopped by the watchdog, omg!
So, please, add a CL_LONG_EXECUTION_TIME_KERNEL flag to clEnqueueNDRangeKernel() and clEnqueueTask().
2. Also, it would be good to add a flag indicating whether the CL device is a SoC/APU using shared memory, so we can better plan which buffer flags to use.
So, please, add a CL_DEVICE_INTEGRATED flag to clGetDeviceInfo() as CUDA does.
3. Finally, please add compression formats to images, so we can use texture compression.