
Re: [Public WebGL] WEBGL_texture_from_depth_video extension proposal



unsigned short 5-6-5 refers to a format where the size of a texel is short (16-bits). It encodes RGB values, where there are 5 bits for red, 6 bits for green and 5 bits for blue.

Taking an unsigned short 16-bit value and uploading it to an unsigned short 5-6-5 texture is what's commonly referred to as "packing". You pack some larger piece of data into multiple reduced-precision channels (other forms of packing include RGBE, packing a rendered depth into 2 bytes, packing a normal into 2 bytes and so forth).
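
As a minimal sketch of what that packing amounts to (Python is used purely for illustration, and the helper names unpack565 and pack565 are hypothetical, not part of any WebGL API), a 16-bit value maps onto the three channels like this:

```python
def unpack565(value):
    """Split a 16-bit integer into (red, green, blue) channels of 5, 6 and 5 bits."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in 16 bits")
    red = (value >> 11) & 0x1F    # top 5 bits
    green = (value >> 5) & 0x3F   # middle 6 bits
    blue = value & 0x1F           # bottom 5 bits
    return red, green, blue


def pack565(red, green, blue):
    """Reassemble the three channels into one 16-bit integer."""
    return ((red & 0x1F) << 11) | ((green & 0x3F) << 5) | (blue & 0x1F)


depth = 57016                     # 0b1101111010111000
channels = unpack565(depth)
print(channels)                   # (27, 53, 24)
print(pack565(*channels))         # 57016
```

The round trip is lossless as long as each channel is left untouched; trouble only starts once something mixes the channels of two different texels.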

Packed values are not well behaved for interpolation operations. Interpolation in OpenGL happens at these stages:
In each of these cases, the GPU takes a value it assumes to be a single atomic piece of numerical data and mixes it with another such piece of data. To demonstrate, suppose you have some arbitrary depth value of 1101111010111000 (57016). Let's average this with another one, 0110001001001001 (25161). The average is 1010000010000000 (41088, rounded down). In 5-6-5 the two values are instead chopped into channels and averaged per channel (written in binary, so /10 is a division by two, rounded down): (11011+01100)/10 = 10011, (110101+010010)/10 = 100011, (11000+01001)/10 = 10000. If you reassemble that into a short you get 1001110001110000 (40048). You will notice that 40048 is not the average of 57016 and 25161.
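
The arithmetic above can be checked with a short sketch. It assumes integer channels that are averaged and rounded down, mirroring the numbers in the text; real hardware filters in normalised values with its own rounding, so treat this as an illustration rather than a model of any particular GPU. The function names are hypothetical.

```python
def unpack565(value):
    """Split a 16-bit integer into 5-, 6- and 5-bit channels."""
    return (value >> 11) & 0x1F, (value >> 5) & 0x3F, value & 0x1F


def pack565(red, green, blue):
    """Reassemble three channels into one 16-bit integer."""
    return (red << 11) | (green << 5) | blue


def average_per_channel(a, b):
    """Average two packed values channel by channel, rounding each channel down."""
    mixed = [(x + y) // 2 for x, y in zip(unpack565(a), unpack565(b))]
    return pack565(*mixed)


a, b = 57016, 25161
true_average = (a + b) // 2                   # what you actually wanted
channel_average = average_per_channel(a, b)   # what per-channel mixing gives you

print(true_average)      # 41088
print(channel_average)   # 40048
```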

And that is why you cannot use the aforementioned operations with packed values. Some of these operations do not matter much for the uploaded depth data. You will not use mipmapping, because you cannot render to mipmaps in WebGL 1.0 and gl.generateMipmap() may go through the CPU, which makes it infeasible for video data. You will not blend these values unmodified, because you'd likely read them out before blending. You wouldn't anti-alias the raw values, and the same applies to alpha-to-coverage.

However, there is one operation that will be used frequently, which is linear magnification interpolation. For instance, a common use case for depth data is some kind of artsy experiment where you offset a mesh by the depth value and also color it in some way by the depth value. Both of these would have to use nearest sampling, which can be an acceptable choice for the mesh if the mesh exactly matches the video resolution. However, as a full-HD video contains 1920x1080 pixels, the resulting mesh would be over 4 million triangles, which might be a tad on the expensive side. If you use a far smaller mesh, you'll run into aliasing problems, so it'd be desirable to average, say, 4 pixels in the depth texture to get one depth for a vertex. A cheap way to do that is to create a 960x540 mesh (about a million triangles) and sample at the point midway between pixels, letting linear filtering do the averaging. Of course that doesn't work on 5-6-5, so you'd have to sample the 4 surrounding pixels yourself to get an average. Likewise, the fragment shader would probably require linear interpolation for magnification in most use cases.
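
A rough sketch of that manual workaround, again in Python for illustration only: in a shader this would be four nearest-filtered texture fetches, each decoded before averaging. The names decode_depth and average_2x2 and the tiny 2x2 texture are made up for the example.

```python
def decode_depth(red, green, blue):
    """Reassemble one 5-6-5 texel into the 16-bit depth value it carries."""
    return (red << 11) | (green << 5) | blue


def average_2x2(texels, x, y):
    """Average the decoded depths of the 2x2 block whose top-left texel is (x, y).

    texels is a list of rows, each row a list of (red, green, blue) tuples,
    standing in for four nearest-filtered fetches from the depth texture.
    """
    block = [texels[y + dy][x + dx] for dy in (0, 1) for dx in (0, 1)]
    # Decode first, then average: the mix happens on whole depth values.
    return sum(decode_depth(*texel) for texel in block) / 4.0


texels = [
    [(27, 53, 24), (12, 18, 9)],   # 57016, 25161
    [(12, 18, 9), (27, 53, 24)],   # 25161, 57016
]
print(average_2x2(texels, 0, 0))   # 41088.5
```

The point of the ordering is that the averaging operates on the reassembled depth rather than on the individual channels, at the cost of four fetches per vertex instead of one.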

As a side note, even if you sample at the centroid of a gl.LINEAR texture, you will get garbage for data that cannot be interpolated, because interpolation might still be applied, and due to floating point rounding error and other precision artifacts you are rarely sampling exactly the spot where you get no interference from nearby values.

For these reasons, what's likely going to happen with these depth values in practical use, is this:
It'd be a rare use case indeed that somebody would want to work directly with the data as-is.

On Tue, Nov 11, 2014 at 6:49 AM, Ben Adams <thundercat@illyriad.co.uk> wrote:
i.e. would this mean nearest sampling would be required where the mapping between screen pixels and depth camera is not 1:1

(though I don't know if there are sampling types on a depth buffer :)


On 11 November 2014 05:34, Ben Adams <thundercat@illyriad.co.uk> wrote:
Would 5-6-5 cause interpolation issues? Is it an RGB or float texture?


On 10 November 2014 20:33, Florian Bösch <pyalot@gmail.com> wrote:
On Mon, Nov 10, 2014 at 8:43 PM, Kenneth Russell <kbr@google.com> wrote:
It'll only be efficient to upload depth videos to WebGL textures using
the internal format which avoids converting the depth values during
the upload process. That's why UNSIGNED_SHORT_5_6_5 was chosen as the
single supported format for uploading this content to WebGL 1.0. It's
not desirable for either the browser implementer or the web developer
to support uploading depth videos to lots of random texture formats if
they won't be efficient. The Media Capture group should comment on
what formats depth cameras tend to output, and are likely to output in
the future.

I think it's demonstrable that conversion between formats is reasonably efficient if it can be done on-GPU, which is something that's just about getting done for <video> now.

The reason I'm not in favor of fixing this to ushort 5-6-5 is that quite often an app developer would want to use something else. For instance, because you cannot interpolate a 5-6-5 texture that's been bastardized to hold a single depth value, you'd then proceed to write your own framebuffer pass to decode it to, say, byte, int, float or what have you. Likewise, 5-6-5 smacks of an internal format that's liable to change with whoever's putting out the next depth capture device, and so, by that point at the latest, you'll be converting something like a floating point depth TO 5-6-5, which would be more than a little ironic.