Unsigned short 5-6-5 refers to a format where each texel is one unsigned short (16 bits). It encodes an RGB value with 5 bits for red, 6 bits for green and 5 bits for blue.
Taking an unsigned short 16-bit value and uploading it to an unsigned short 5-6-5 texture is what's commonly referred to as "packing". You pack some larger piece of data into multiple reduced-precision channels (other forms of packing include RGBE, packing a rendered depth into 2 bytes, packing a normal into 2 bytes, and so forth).
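As a sketch of what the upload side could look like (the function name and the assumption that depths arrive as floats in [0,1] are mine): quantize each depth to a 16-bit unsigned short, whose bits then land in the 5-6-5 channels when the array is handed to gl.texImage2D with type gl.UNSIGNED_SHORT_5_6_5.

```javascript
// Sketch: quantize float depths in [0,1] to 16-bit unsigned shorts.
// The resulting Uint16Array could then be uploaded via gl.texImage2D
// with format gl.RGB and type gl.UNSIGNED_SHORT_5_6_5.
function packDepths(floats) {
  const out = new Uint16Array(floats.length);
  for (let i = 0; i < floats.length; i++) {
    // scale to the full 16-bit range and clamp
    out[i] = Math.min(65535, Math.round(floats[i] * 65535));
  }
  return out;
}

packDepths([0, 0.5, 1]); // Uint16Array [0, 32768, 65535]
```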
Packed values are not well behaved for interpolation operations. Interpolation in OpenGL happens at these stages:
- magnified texture lookups due to interpolation
- minified texture lookups due to mipmapping and anisotropy
- blending (when outputting packed values)
- anti-aliasing (when outputting packed values)
- alpha to coverage
In each of these cases, the GPU takes a value it assumes to be a single atomic piece of numerical data and mixes it with another such piece. To demonstrate, suppose you have some arbitrary depth value of 1101111010111000 (57016) and average it with another, 0110001001001001 (25161). The true average is 1010000010000000 (41088). In 5-6-5, however, the averaging happens per channel (the /10 below is binary, i.e. division by 2): (11011+01100)/10 = 10011, (110101+010010)/10 = 100011 and (11000+01001)/10 = 10000. Reassembled into a short, that is 1001110001110000 (40048). You will notice that 40048 is not the average of 57016 and 25161.
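The arithmetic above can be checked with a short sketch (the helper names are mine): split two 16-bit values into their 5-6-5 fields, average each channel with truncation the way a fixed-point interpolator would, and reassemble.

```javascript
// Split a 16-bit value into its 5-6-5 fields: [red, green, blue].
function split565(v) {
  return [(v >> 11) & 0x1f, (v >> 5) & 0x3f, v & 0x1f];
}

// Reassemble 5-6-5 fields into a 16-bit value.
function join565(r, g, b) {
  return (r << 11) | (g << 5) | b;
}

// Per-channel average with truncation, as channel-wise interpolation does it.
function avg565(a, b) {
  const [ar, ag, ab] = split565(a);
  const [br, bg, bb] = split565(b);
  return join565((ar + br) >> 1, (ag + bg) >> 1, (ab + bb) >> 1);
}

const a = 0b1101111010111000; // 57016
const b = 0b0110001001001001; // 25161
console.log(avg565(a, b)); // 40048, the per-channel result
console.log((a + b) >> 1); // 41088, the true 16-bit average
```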
And that is why you cannot use the aforementioned operations with packed values. Some of these operations do not matter much for the uploaded depth data. You will not use mipmapping because you cannot render to mipmaps in WebGL 1.0, and gl.generateMipmap() may go through the CPU, which makes it infeasible for video data. You will not blend these values unmodified because you'd likely read them out before blending. You wouldn't anti-alias the raw values, and the same applies to alpha to coverage.
However, there is one operation that will frequently be used: linear magnification interpolation. A common use case for depth data is some kind of artistic experiment where you offset a mesh by the depth value and also color it by the depth value. Both of these would have to use nearest filtering, which can be an acceptable choice for the mesh if the mesh exactly matches the video resolution. However, a full-HD video is 1920x1080 pixels, so the resulting mesh would be over 4 million triangles, which might be a tad on the expensive side. If you use a far smaller mesh, you'll run into aliasing problems, so it'd be desirable to average, say, 4 pixels in the depth texture to get one depth per vertex. A cheap way to do that is to create a 960x540 mesh (about a million triangles) and sample at the center between pixels. Of course that doesn't work with 5-6-5, and so you'd have to sample the 4 surrounding pixels and average them yourself. Likewise, the fragment shader would probably require linear interpolation for magnification in most use cases.
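A sketch of that 4-tap approach, simulated on the CPU (the helper name and sample values are assumptions, and in practice this would happen in a shader): decode each of the four surrounding packed texels back to its full 16-bit depth first, and only then average.

```javascript
// Reconstruct the 16-bit depth from a packed 5-6-5 texel. Since the split
// just reuses the same bits, this recovers the value exactly.
function unpack565(p) {
  return ((p >> 11) & 0x1f) * 2048 + ((p >> 5) & 0x3f) * 32 + (p & 0x1f);
}

// Four texels surrounding one vertex of the smaller mesh (assumed values).
const texels = [57016, 25161, 57016, 25161];

// Decode first, then average: this is well behaved, unlike hardware
// filtering of the packed representation.
const depth = texels.reduce((s, p) => s + unpack565(p), 0) / texels.length;
console.log(depth); // 41088.5, the true average
```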
As a side note: even if you sample at the texel center of a gl.LINEAR texture, you will get garbage for data that cannot be interpolated, because interpolation may still be applied, and due to floating-point rounding error and other precision artifacts you are rarely sampling exactly the spot where nearby values have no influence.
For these reasons, what's likely going to happen with these depth values in practical use, is this:
- upload the depth to 5-6-5
- decode the depth to some interpolatable format
- use the depth data
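One way the decode step could look, shown in JavaScript rather than shader code for readability (the helper names are mine): a gl.NEAREST lookup of a 5-6-5 texture hands the shader normalized channel floats, from which the original 16-bit value can be reconstructed.

```javascript
// What the sampler hands the shader for one packed texel:
// each channel normalized to [0,1] over its own bit range.
function encodeToChannels(packed) {
  return {
    r: ((packed >> 11) & 0x1f) / 31,
    g: ((packed >> 5) & 0x3f) / 63,
    b: (packed & 0x1f) / 31,
  };
}

// The decode step: rescale each channel back to its integer field and
// shift it into place. In GLSL this would use floor(x * scale + 0.5).
function decodeDepth({ r, g, b }) {
  return Math.round(r * 31) * 2048 + Math.round(g * 63) * 32 + Math.round(b * 31);
}

decodeDepth(encodeToChannels(57016)); // 57016, round-trips exactly
```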
It'd be a rare use case indeed where somebody would want to work with the packed data directly as-is.