Khronos public bugtracker – Bug 875
Validator Design and Implementation
Last modified: 2013-11-14 10:33:27 PST
Bug to keep track of WebCL Validator Design and Implementation.
After reviewing the response from Vincit to the RFQ, and the proposed implementation, a clarification concerning built-ins is needed
The easiest question to answer would be,
What build-in functions should not be supported?
Here are some general notes about the current design and implementation. Feel free to ask for more details and clarifications.
1. The validator is implemented as a Clang tool because:
- access to internal Clang interfaces is required
- it's useful to have a separate executable binary for validation
2. The validator can't be implemented without changes to Clang because not all useful aspects of Clang can be extended or reused from the tool. The required modifications simply add some missing OpenCL related features (e.g. image access qualifiers) or apply minor changes into existing functionality (e.g. printing of AST).
During development we need a Clang repository with a branch for our modifications. We expect the changes to be upstreamable so that ultimately the validator can be built against standard Clang repository.
3. Clang provides good interfaces for traversing ASTs and therefore we don't expect major problems in finding nodes that need to be manipulated.
4. Doing the source to source transformation can be done in a couple of ways. One way is to use rewriters to manipulate source code as text. A better way is to manipulate AST nodes before printing the tree as source code.
Unfortunately Clang doesn't provide any stable interfaces for manipulating ASTs. This is an area that requires more investigation. There are internal interfaces, such as TreeTransform, for manipulating ASTs, but those have been developed for a certain specific purpose and not for general use. However, TreeTransform seems promising and it can probably be used directly for certain transformations. Some manipulations may need to be done manually.
5. To sum up, the validator tool performs the following phases:
- preprocessing callbacks (e.g. check includes and extension pragmas)
- AST visitors to check WebCL restrictions on valid OpenCL AST
- AST visitors to check nodes that need to be changed (e.g. pointer accesses)
- AST manipulation (e.g. pointer access checks)
- AST visitor to print the tree as source code
(In reply to comment #2)
> 4. Doing the source to source transformation can be done in a couple of
> ways. One way is to use rewriters to manipulate source code as text. A
> better way is to manipulate AST nodes before printing the tree as source
> Unfortunately Clang doesn't provide any stable interfaces for manipulating
> ASTs. This is an area that requires more investigation. There are internal
> interfaces, such as TreeTransform, for manipulating ASTs, but those have
> been developed for a certain specific purpose and not for general use.
> However, TreeTransform seems promising and it can probably be used directly
> for certain transformations. Some manipulations may need to be done manually.
I'm not aware of the modifications after clang 3.2 but the AST was designed to be immutable,althought replace one node by another is not an issue, nevertheless when you try more modifications, the problem becomes a little harder: a good case is dangling pointers-> I don't know how you will handle that but if I suppose that you are planning to use something like SSA, heavy modifications will be made in the AST and I think that it will be hard to keep its integrity.
So, what is the cost of using your own intermediate representation constructed from clang AST? (it is an open question, but I fear the cost is too high for that solution)
Do we have contacts with clang devs to see which approach should be used or the hooks which will allow us to do the job?
Created attachment 142 [details]
Could you please provide some more details on why modifying the AST is favored over rewriters? Specific examples would help,
There was an excellent reference made on the Clang mailing list concerning an opensource project in Germany that performs Clang-based AST manipulation and rewriting of the AST to source.
The group seems very knowledgeable and could be a good source to explore, especially considering the source-code is open-source.
(In reply to comment #5)
> Could you please provide some more details on why modifying the AST is
> favored over rewriters? Specific examples would help,
We hadn't yet evaluated rewriting and AST manipulation properly when I wrote comment 2. We had just wrote some simple AST manipulation cases, which were easy to do, but later realized that we don't necessarily want to do any complex manipulations.
We are currently writing example test cases to see how input sources should be transformed and will choose the rewriting method if we don't see any problems with that approach. So we have pretty much changed our mind regarding rewriting and AST manipulation.
(In reply to comment #6)
> There was an excellent reference made on the Clang mailing list concerning
> an opensource project in Germany that performs Clang-based AST manipulation
> and rewriting of the AST to source.
> The group seems very knowledgeable and could be a good source to explore,
> especially considering the source-code is open-source.
We've already checked out that code as well and it was the main reason for being so optimistic about AST manipulations in the first place. However, we'll chose the rewriting method unless analysis of test cases shows that we will face problems with rewriting.
Thanks a lot for your input Steven.
Here is updated weekly worst case schedule with single developer:
week 21 13-05-20: Trivial read-write Clang pass which traverses through AST tree and writes it back to source code.
week 22 2013-05-27: --
week 23 2013-06-03: Tool which reads in WebCL code and checks that restrictions mentioned in (25.4.2013 WebCL draft chapter 7).
week 24 2013-06-10: Tool also understand special WebCL types. Unit tests case to verify that each restriction is checked. Tool works with WebCL input, which does not have any built-in calls.
week 25 2013-06-17: ---
week 26 2013-06-24: ---
week 27 2013-07-01: ---
week 28 2013-07-08: Tool with improved capability to collect all required information from input source to be able to add instrumentation in next phase.
week 29 2013-07-15: ---
week 30 2013-07-22: ---
week 31 2013-07-29: ---
week 32 2013-08-05: ---
week 33 2013-08-12: ---
week 34 2013-08-19: ---
week 35 2013-08-26: ---
week 36 2013-09-02: Tool with improved ability to output instrumented OpenCL compatible source code.
week 37 2013-09-09: ---
week 38 2013-09-16: ---
week 39 2013-09-23: ---
week 40 2013-09-30: Final delivery
We are internally trying to have the final delivery already during August to be sure to meet the deadline. Currently with features we are in somewhere between weeks 26 and 29. We have finished prototype, which reads OpenCL code in and does some preprocessor validating and finds out some simple cases where checks are needed (e.g. indexing). Basically we have all the pieces together for final implementation.
This week we have been taking a head start TDD:ish way to Milestone 3 by writing non trivial test case (Radix sort algorithm from a OpenCL text book) instrumentation by hand. We verify that it does all needed memory checks and runs correctly. After that we start implementing the functionality to generate the same kind of checks automatically. This will be the main automated test case for verifying correct functionality.
Small discussion on the Clang forum concerning mutable AST,
The WebCL Validator project is being developed in open source, and may be accessed from:
To facilitate broad adoption of the Validator by browser vendors, it is recommended that the following benchmarks be collected, to get a sense of any performance penalties. This will also help in performance tuning the validator implementation:
1. Overall cost of instrumentation (relative to un-instrumented code)
2. Cost of global memory initialization (done by validator)
3. Cost of private and local memory initialization (done by validator)
4. Overhead (runtime) resulting from execution of code added/instrumented, to prevent out of bounds memory accesses
5. Overhead of static analysis
With respect to 1, the overall cost of instrumentation is dependent on the amount of private memory required, the amount of memory reads versus computation, and static analysis would too be dependent on a specific kernel. Therefore, this is a little tough to define,
In respect to 2, Fabien pointed out it is extremely trivial to do the global memory initialization during the allocation phase in the browser. Having the validator instrument host-side code could be problematic, and because of the trivialness, if implemented by the underlining WebCL implementation in the browser, it seems to make sense to leave global-memory initialization out of the validator work.
One could do some micro-benchmarks looking at various sizes of private, shared, and global memories and provide a rather linear graph, with respect to size, that would show performance degradation. What is also difficult is static analysis, cause where does one draw the line? I think the simplest, kernel agnostic, analysis should be there but not much more then that. However, a more careful review is required to see the level of analysis that can be performed in the limited amount of time the validator has to run. Given that, what does one believe is a reasonable amount of time to run? Remember, there is often more than one kernel needing instrumentation. Then again, the time is dependent on the underlining hardware too.
> In respect to 2, Fabien pointed out it is extremely trivial to do the
> global memory initialization during the allocation phase in the browser.
> Having the validator instrument host-side code could be problematic, and
> because of the trivialness, if implemented by the underlining WebCL
> implementation in the browser, it seems to make sense to leave global-memory
> initialization out of the validator work.
Should it be another bullet in a section of the WD (like a so dreamed section "Suggestion to implementers" - ok the name is bad, but you got the idea).. Otherwise, it would be be unclear to implementers that it should be done in the browser side.
(In reply to comment #11)
> To facilitate broad adoption of the Validator by browser vendors, it is
> recommended that the following benchmarks be collected, to get a sense of
> any performance penalties. This will also help in performance tuning the
> validator implementation:
1. Overall cost of instrumentation (relative to
> un-instrumented code)
2. Cost of global memory initialization (done by
3. Cost of private and local memory initialization (done by
4. Overhead (runtime) resulting from execution of code
> added/instrumented, to prevent out of bounds memory accesses
5. Overhead of
> static analysis
I wanted to clarify, that even though we currently have the memory initializtion
OpenCL 1.2 extension (which provides local and private memory initialization), we have requested Vinsit to add initialization of local and private memory, in addition to global memory initialization. We want to ensure that even if a vendor does not have the OpenCL 1.2 memory initialization extension supported, the validator will be able to help prevent memory leakage, through memory initialization.
(In reply to comment #13)
> Should it be another bullet in a section of the WD (like a so dreamed
> section "Suggestion to implementers" - ok the name is bad, but you got the
> idea).. Otherwise, it would be be unclear to implementers that it should be
> done in the browser side.
The WG would have to reach an agreement on this first, ie where should the global-memory init take place. Currently, it is on the table, another proposed idea is to require the validator to do the global-memory init. The latter may have been the plan all along, but given the trivialness to do in the implementation side it seems like a better choice now.
(In reply to comment #15)
> (In reply to comment #13)
> Should it be another bullet in a section of the
> WD (like a so dreamed
> section "Suggestion to implementers" - ok the name
> is bad, but you got the
> idea).. Otherwise, it would be be unclear to
> implementers that it should be
> done in the browser side.
The WG would
> have to reach an agreement on this first, ie where should the global-memory
> init take place. Currently, it is on the table, another proposed idea is to
> require the validator to do the global-memory init. The latter may have been
> the plan all along, but given the trivialness to do in the implementation
> side it seems like a better choice now.
[Tasneem] My recommendation would be to include global memory initialization as part of the work done by the validator, instead of recommending that the developer initialize the global memory. This can be visited after the kernel analysis and instrumentation work (as noted in the current agreement with Vincit) is completed.
As a background to members who may not have been part of the earlier security related discussions, for memory initialization, the WebCL working group looked at all the options available to WebCL. We proposed as extensions to the OpenCL WG, features which the working group felt had to be addressed by OpenCL, either because addressing it in WebCL would have been inefficient or infeasible. Since local and private memory initialization was not feasible through WebCL, we proposed it as an extension to OpenCL WG, with the agreement from WebCL members, that the responsiblity of global memory initialization would be that of WebCL (this may be enforced by the validator, or the browser).
We can stage the memory initialization enforcement as follows.
1. The validator queries device for support of the OpenCL memory initialization extension, if it is not supported by the underlying OpenCL, then the validator initializes local and private memory.
2. Once Vincit completes the work currently noted in the contract, then we can address the additional work of global memory initialization by the validator, which would require updates to the host side WebCL code.
We updated the design documentation regarding preprocessing and command line options:
Validator needs to know device extensions in order to process correct #ifdef branches in the input source. The properties can be asked from a local OpenCL driver if the device name is passed as a command line option. Dependency to OpenCL driver is avoided if the extensions are passed directly as command line options.
We should have 3 weeks remaining for finishing the validator tool. This should be enough for delivering a tool that matches original requirements. However, I'm afraid we won't be able to implement any extra tasks within that timeframe.
How detailed schedule would you like to see for the rest of the project? Shall we list all of the remaining tasks and a time estimate for each of them? We'll construct the schedule tomorrow.
Generally the hardest thing to estimate is time used for debugging Clang. At the moment the biggest problems with Clang are:
- debugging command line option parsing
- debugging recursive rewriter usage
We updated the issue list at github with tasks that need to be done for WebCL validator delivery. The list includes also some optional tasks.
A rough estimate for the remaining tasks is 25 days. This doesn't include host side modifications and optional tasks.
In section 3.2.4 of the WebCL Kernel Validator Phase 2b document:
"3.2.4 Make certain run-time checks more friendly for autovectorization
Currently all checks are done by checking that address is inside certain min, max range. This is pretty hard to understand for autovectorization.
If we can trace base address to one limits, which is the case for all private memory and usual case with local memory accesses we can also change address clamping to use % operand instead. This should make job of autovectorizers easier if user does all memory accessing with index operations"
Instead of using the % operator, it might be more efficient to use: index = (index UnsignedLessThan size) ? index : 0
The following Validator APIs were identified during the WebCL-F2F at Samsung earlier this month,
If there are any further comments, or suggestions, please detail them here.
Typedef void* ProgT
// Valid pnames would be IS_VALID, VALIDATION_LOG, KERNEL_COUNT
ValidateProgram (const char * source, const char *const * activeWebCLExtList, const char * buildOptions);
bool isProgramValid (ProgT prog);
size_t getProgramLog (ProgT prog, size_t maxLen, char* buffer);
size_t getProgramValidatedSource (ProgT prog, size_t maxLen, char *buffer);
size_t getProgramValidatedSPIR (ProgT prog, size_t maSize, void *buffer);
size_t getKernelCount (ProgT prog);
size_t getKernelName(ProgT prog, size_t kernelID, size_t maxLen, char* buffer);
size_t getKernelArgCount(ProgT prog, size_t kernelID);
size_t getKernelArgInfo(ProgT prog, size_t kernelID, size_t argID, (enum) pname pnameVal, size_t retValSize, void* retVal);
void destroyProgram(ProgT prog);
There has been a lot of discussion, cross bug talk, and it was agreed that this bug would be closed, given verboseness of thread, etc.
For Validator Library API talk, please see the following issue logged against the WebCL-Validator khronos git repo.
Please open a new bug if you would like to carry on discussing a specific point from this thread and include a link to this bug.