This seems sensible, although it's probably already disallowed by the spec. The WebGL spec mandates strict adherence to the GLSL ES 1.0 spec, which disallows those characters, so this appears to be simply a bug in shader validation.
On Dec 10, 2010, at 5:44 PM, James Robinson wrote:
A number of WebGL APIs accept string parameters as DOMString: getExtension(), bindAttribLocation(), getAttribLocation(), getUniformLocation(), and shaderSource(). Several of these functions have corresponding getters, either directly (such as shaderSource() / getShaderSource()) or indirectly (a uniform name is specified by shaderSource() and later queried by getUniformLocation()). Unfortunately, WebGL does not specify any encoding or character set requirements for these functions, which leads to some inconsistencies in current implementations and to some more serious hypothetical concerns.
The OpenGL ES Shading Language 1.00 specification, revision 17, defines the source character set for shaders as a subset of ASCII and defines the source string as a sequence of characters from this set (http://www.khronos.org/registry/gles/specs/2.0/GLSL_ES_Specification_1.0.17.pdf, sections 3.1 and 3.2). Current WebGL implementations do not appear to enforce this character set strictly. In Minefield 4.0b8pre and Chrome 9.0.597.10 on Linux, I can successfully compile a shader whose comments contain ASCII characters outside the allowed set, extended ASCII characters, non-ASCII Unicode characters, or unmatched surrogates. The shader source also round-trips inconsistently: in Minefield, a source string containing characters outside ASCII does not survive a shaderSource()/getShaderSource() round trip losslessly, although ASCII characters outside the set allowed by OpenGL ES do round-trip. Additionally, at least some of this validation appears to be performed by the underlying GL implementation rather than by the WebGL bindings layer, so the behavior may vary with the exact driver.
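For illustration, a minimal TypeScript sketch of a check against this character set, assuming my reading of section 3.1 is right (letters, digits, underscore, the listed punctuation, the number sign, and six whitespace characters). `isGlslEsSourceChar` and `firstInvalidIndex` are hypothetical names, not part of any WebGL implementation. The check works on UTF-16 code units, which is what a DOMString holds, so every kind of comment from the experiment above is flagged:

```typescript
// Hypothetical helper: the GLSL ES 1.00 source character set as read from
// section 3.1 of revision 17 (verify against the spec before relying on it).
const GLSL_ES_SYMBOLS = "_.+-/*%<>[](){}^|&~=!:;,?#";
// Space, horizontal tab, line feed, vertical tab, form feed, carriage return.
const GLSL_ES_WHITESPACE = [0x20, 0x09, 0x0a, 0x0b, 0x0c, 0x0d];

// Tests a single UTF-16 code unit, which is what a DOMString is made of.
function isGlslEsSourceChar(code: number): boolean {
  if (code >= 0x61 && code <= 0x7a) return true; // a-z
  if (code >= 0x41 && code <= 0x5a) return true; // A-Z
  if (code >= 0x30 && code <= 0x39) return true; // 0-9
  if (GLSL_ES_WHITESPACE.indexOf(code) !== -1) return true;
  return GLSL_ES_SYMBOLS.indexOf(String.fromCharCode(code)) !== -1;
}

// Index of the first code unit outside the set, or -1 if the string is valid.
// Lone surrogates are caught because no surrogate code unit is in the set.
function firstInvalidIndex(source: string): number {
  for (let i = 0; i < source.length; i++) {
    if (!isGlslEsSourceChar(source.charCodeAt(i))) return i;
  }
  return -1;
}

const body = "void main() { gl_FragColor = vec4(1.0); }\n";
console.log(firstInvalidIndex(body));                       // -1
console.log(firstInvalidIndex("// cost: $5\n" + body));     // 9 ('$' is ASCII but not allowed)
console.log(firstInvalidIndex("// caf\u00e9\n" + body));    // 6 (extended ASCII)
console.log(firstInvalidIndex("// \u65e5\u672c\n" + body)); // 3 (non-ASCII Unicode)
console.log(firstInvalidIndex("// \ud800\n" + body));       // 3 (unmatched surrogate)
```

Checking code units rather than code points is a deliberate simplification: since every allowed character is ASCII, any surrogate, paired or not, is rejected without decoding the string first.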
I propose that we adopt the character set defined by OpenGL ES and generate an INVALID_VALUE error whenever a DOMString argument contains any character outside that set. This ensures consistent behavior and makes extra sure that we don't pass data to lower-level systems that they may not expect. Proper Unicode handling is tricky and can easily lead to subtle bugs. For example, if code confuses a string's character count with its size in bytes, it can introduce memory access errors that testing with only ASCII characters will not expose. Some Unicode operations are lossy and map multiple inputs to the same output, which could confuse validation logic. Some drivers may simply not expect to receive non-ASCII data at all.
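A rough sketch of the proposed check and of the two hazards described above, under stated assumptions: `checkStringArg` is a hypothetical bindings-layer helper (no such function exists in WebGL), the regular expression encodes the same GLSL ES 1.00 set as I read section 3.1, and 0x0501 is the standard GL value of INVALID_VALUE. It relies on `TextEncoder` and `String.prototype.normalize`, both available in current browsers and Node:

```typescript
// Hypothetical bindings-layer check; not part of any existing WebGL API.
const NO_ERROR = 0;
const INVALID_VALUE = 0x0501; // GL_INVALID_VALUE

// GLSL ES 1.00 source character set (section 3.1, as read here). No "u" flag,
// so the class is matched per UTF-16 code unit and lone surrogates fail.
const GLSL_ES_SOURCE = /^[A-Za-z0-9_.+\-\/*%<>\[\](){}\^|&~=!:;,?# \t\v\f\r\n]*$/;

// Would run before any DOMString argument is forwarded to the driver.
function checkStringArg(value: string): number {
  return GLSL_ES_SOURCE.test(value) ? NO_ERROR : INVALID_VALUE;
}

// Hazard 1: once non-ASCII appears, character count and byte count diverge,
// so a test suite using only ASCII names cannot tell them apart.
const uniformName = "u_caf\u00e9";
const utf16Length = uniformName.length;                          // 6 code units
const utf8Length = new TextEncoder().encode(uniformName).length; // 7 bytes

// Hazard 2: normalization is lossy; two distinct inputs map to one output,
// so a validator and a later consumer may disagree about what was checked.
const precomposed = "\u00e9"; // one code point
const decomposed = "e\u0301"; // 'e' followed by a combining acute accent
const sameAfterNfc = precomposed.normalize("NFC") === decomposed.normalize("NFC");

console.log(checkStringArg("u_color"), checkStringArg(uniformName)); // 0 1281
console.log(utf16Length, utf8Length, precomposed === decomposed, sameAfterNfc); // 6 7 false true
```

Rejecting at the bindings layer means the driver never sees anything but ASCII, so neither hazard can reach it regardless of how a given driver handles Unicode.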