[Public WebGL] Invalid string parameters in WebGL APIs accepting DOMStrings


A number of WebGL APIs accept string parameters as DOMString: getExtension(), bindAttribLocation(), getAttribLocation(), getUniformLocation(), and shaderSource().  Several of these functions have corresponding getters, either directly (such as shaderSource() / getShaderSource()) or indirectly (a uniform name is specified by shaderSource() and then queried later by getUniformLocation()).  Unfortunately, WebGL does not specify any encoding or charset requirements for these functions, which leads to some inconsistencies in current implementations and raises some more serious hypothetical concerns.

The OpenGL ES Shading Language 1.00 specification, revision 17, defines the source character set for shaders as a subset of ASCII and defines the source string as a sequence of characters from this set (http://www.khronos.org/registry/gles/specs/2.0/GLSL_ES_Specification_1.0.17.pdf, sections 3.1 and 3.2).  Current WebGL implementations do not appear to strictly enforce this character set.  In Minefield 4.0b8pre and Chrome 9.0.597.10 on Linux I can successfully compile a shader whose comments contain ASCII characters outside the allowed set, extended ASCII characters, Unicode characters outside ASCII, and unmatched surrogates.  The shader source round-trips inconsistently, though - in Minefield the source string does not round-trip losslessly through shaderSource()/getShaderSource() if it contains characters outside of ASCII, although ASCII characters that are not in the set allowed by OpenGL ES do round-trip.  Additionally, it appears that at least some of this validation is being performed by the underlying GL implementation, not by the WebGL bindings layer, so the behavior might vary depending on the exact driver implementation.

I propose that we adopt OpenGL ES's allowed character set definition and generate an INVALID_VALUE error if a specified DOMString contains any characters outside the set.  This would ensure consistent behavior and make extra sure that we don't pass data to lower-level systems that they may not expect.  Proper Unicode handling is tricky business and can often lead to subtle bugs.  For example, if code confuses the character count of a string with the number of bytes in the string, memory access errors can be introduced that will not be exposed by testing that uses only ASCII characters.  Some operations on Unicode strings are lossy and map multiple inputs to the same output, which could confuse validation logic.  Some drivers may simply not expect to receive non-ASCII data at all.
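
To make the proposal concrete, here is a minimal sketch of the kind of check a bindings layer could run before passing a string down to GL.  The helper name isValidGLSLESString is hypothetical and not part of any WebGL API, and the character class is my transcription of the set listed in section 3.1 of the GLSL ES spec, so it should be checked against the spec text before anyone relies on it.

```javascript
// Hypothetical helper, not part of WebGL.  The character class below is
// transcribed from GLSL ES 1.00 section 3.1 (an assumption to verify):
// letters, digits, underscore, the listed punctuation, '#', and the six
// whitespace characters (space, \t, \v, \f, \r, \n).
const GLSL_ES_ALLOWED = /^[A-Za-z0-9_.+\-\/*%<>\[\](){}^|&~=!:;,?# \t\v\f\r\n]*$/;

// Returns true only if every 16-bit code unit of s is in the allowed set.
// Anything outside ASCII, including a lone surrogate, fails the test.
function isValidGLSLESString(s) {
  return GLSL_ES_ALLOWED.test(s);
}
```

Under the proposal, an entry point such as shaderSource() would generate INVALID_VALUE whenever a check like this fails, rather than forwarding the string to the driver.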

For reference, DOMString is defined by WebIDL to map closely to the JavaScript string type: http://www.w3.org/TR/WebIDL/#idl-DOMString.  A DOMString is composed of a sequence of unsigned 16 bit numbers that typically represent a UTF-16 encoded string.  However, DOMString does not require that the contents be a valid UTF-16 string, and it's quite possible to construct strings in JavaScript that are not valid UTF-16.  For example, the snippet "a𐅃c".substring(0,2) will produce a string containing an unmatched surrogate.  This is significant in that it makes some safe-seeming transformations (like converting a DOMString to UTF-8) non-invertible.
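
The substring example and the non-invertible conversion can be sketched as follows.  This assumes an environment that provides TextEncoder and TextDecoder, which are used here only as one convenient UTF-8 converter; the helper names hasLoneSurrogate and utf8RoundTrip are made up for illustration.

```javascript
// "a𐅃c" written with an escape so the file encoding does not matter.
// U+10143 is stored as the surrogate pair 0xD800 0xDD43.
const original = "a\u{10143}c";

// Cutting after two code units keeps 'a' and only the high surrogate.
const truncated = original.substring(0, 2);

// Returns true if s contains a surrogate code unit without its partner.
function hasLoneSurrogate(s) {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      const next = s.charCodeAt(i + 1); // NaN at end of string
      if (next >= 0xDC00 && next <= 0xDFFF) { i++; continue; }
      return true; // high surrogate with no low surrogate after it
    }
    if (unit >= 0xDC00 && unit <= 0xDFFF) return true; // stray low surrogate
  }
  return false;
}

// Encode to UTF-8 and decode back.  TextEncoder replaces each lone
// surrogate with U+FFFD, so the original code unit cannot be recovered.
function utf8RoundTrip(s) {
  return new TextDecoder().decode(new TextEncoder().encode(s));
}
```

With this converter, utf8RoundTrip(truncated) no longer equals truncated, and two different inputs such as "a\uD800" and "a\uD801" come back as the same string, which is the many-to-one behavior that could confuse validation logic.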

- James