
Re: [Public WebGL] WEBGL_dynamic_texture redux

On Tue, Nov 20, 2012 at 1:19 AM, Mark Callow <callow.mark@artspark.co.jp> wrote:
> Sorry for my long delay replying. Thanks for the feedback. Answers in-line.

No worries. A thoughtful exchange is preferable to a hurried response,
IMHO, as long as the participants remain diligent.

> On 2012/11/05 10:16, David Sheets wrote:
>>   //
>>   // connectVideo
>>   //
>>   // Connect video from the passed HTMLVideoElement to the texture
>>   // currently bound to TEXTURE_EXTERNAL_OES on the active texture
>>   // unit.
>>   //
>>   // First a wdtStream object is created with its consumer set to
>>   // the texture. Once the video is loaded, it is set as the
>>   // producer. This could potentially fail, depending on the
>>   // video format.
>>   //
>>   // interface wdtStream {
>>   //   enum state {
>>   //     // Consumer connected; waiting for producer to connect
>>   //     wdtStreamConnecting,
>>   //     // Producer & consumer connected. No frames yet.
>>   //     wdtStreamEmpty,
>>   //     wdtStreamNewFrameAvailable, // when does this state occur?
>>   //     wdtStreamOldFrameAvailable, // when does this state occur?
> NewFrameAvailable occurs when the producer puts a new frame into the stream
> transitioning from either Empty or OldFrameAvailable. The state becomes
> OldFrameAvailable after acquireImage, when the previous state was
> NewFrameAvailable.
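Thanks, that clears it up. To make sure I have the transitions right,
here is how I would model them in a toy (the state names are the
proposal's; the class and method names are mine, not the spec's):

```javascript
// Toy model of the wdtStream state transitions as Mark describes them.
// Hypothetical restatement only, not part of the proposal.
class StreamStateModel {
  constructor() { this.state = 'wdtStreamConnecting'; }
  connectProducer() { this.state = 'wdtStreamEmpty'; }
  insertFrame() {
    // Producer inserts a frame: Empty or OldFrameAvailable -> NewFrameAvailable.
    if (this.state !== 'wdtStreamConnecting' &&
        this.state !== 'wdtStreamDisconnected') {
      this.state = 'wdtStreamNewFrameAvailable';
    }
  }
  acquireImage() {
    // Consumer acquires: NewFrameAvailable -> OldFrameAvailable.
    if (this.state === 'wdtStreamNewFrameAvailable') {
      this.state = 'wdtStreamOldFrameAvailable';
      return true;
    }
    // No new frame; the real call would wait up to acquireTimeout.
    // Whether re-acquiring the old frame should succeed is still open.
    return false;
  }
}
```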
>>   //     wdtStreamDisconnected
>>   //   };
>>   //   // Time taken from acquireImage to posting drawing buffer; default
>> 0? // units? microseconds?
> Yes.
>>   //   readonly int consumerLatency;
>>   //   // Frame # (aka Media Stream Count) of most recently inserted frame
>>   //   // Value is 1 at first frame.
>>   //   readonly int producerFrame;
>>   //   // MSC of most recently acquired frame.
>>   //   readonly int consumerFrame;
>>   //   // timeout for acquireImage; default 0
>>   //   int acquireTimeout; // units? microseconds?
> Yes.
>>   //   // readonly int freq; ? do videos get one per channel? only max
>> frequency of all media streams?
> What would you use this for? Without knowing the use case, the obvious
> answer is the frequency (framerate) of the producer video stream. However
> several modern video formats do not have a fixed framerate.

I would use this for relating frame counts to
latencies/timeouts/clocks. Specifically, without knowing the inherent
maximum frequency of the producer, the consumer must initially guess
its frame budget and then back off, causing A/V glitches. If the
maximum frequency of the stream is known from the beginning, the
consumer can use prior knowledge about the environment to adjust its
rendering. Perhaps I'd like to render at some fractional frequency? Or
at a multiple, without trying to over-acquire? Or I've been barely
keeping up compositing a 30fps video and now the user has loaded a
60fps source.

If the video format has a variable framerate and I infer the framerate
from the beginning which has many static frames (titles, black), I
will overestimate my frame budget.

Are you planning on exposing the variable framerate of decoders to
stream consumers in the msc? Is it bad to insert "duplicate" frames
and increment the msc when the source framerate drops below max? I
think consistency is valuable here and simplifies the author's work.
If I test my generic video app with constant framerate videos but not
variable framerate videos, I might be disappointed with performance on
sophisticated codecs in future. Can this consistency abstraction leak
in some way I am missing?
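To make the budget argument concrete, here is roughly what a consumer
could do up front if the stream advertised its maximum frequency. The
helper is mine, and the semantics of the proposed freq field are an
assumption:

```javascript
// Given a producer's maximum frame rate and the display refresh rate,
// decide up front how many display refreshes to spend per video frame,
// instead of guessing and backing off after glitches.
function refreshesPerVideoFrame(maxProducerHz, displayHz) {
  if (!maxProducerHz) return 1;  // frequency unknown: degrade to guessing
  return Math.max(1, Math.floor(displayHz / maxProducerHz));
}
```

A consumer compositing a 30fps source on a 60Hz display would budget
two refreshes per frame from the first frame, with no back-off period.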

>>   //   void setConsumerLatency(int); // linear with consumerLatency? >0?
>> clamped? how does it exert backpressure on source?
> I don't understand the first question. Yes >0.

Is setConsumerLatency purely a (ranged) setter? Does it linearly scale
its argument? If I pass setConsumerLatency a value in the valid range,
is that the value later read by consumerLatency?
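To be explicit, this is the setter behavior I am asking about: a plain
clamped store with exact readback, no scaling. The class, the maximum,
and the units are all my assumptions, not the proposal's:

```javascript
// Hypothetical: setConsumerLatency clamps to [0, max] and
// consumerLatency reads back exactly the stored (possibly clamped)
// value -- no scaling applied.
class LatencyHolder {
  constructor(maxLatencyUs) {
    this.maxLatencyUs = maxLatencyUs;  // set by source or app at creation
    this._latencyUs = 0;
  }
  setConsumerLatency(us) {
    this._latencyUs = Math.min(Math.max(0, us), this.maxLatencyUs);
  }
  get consumerLatency() { return this._latencyUs; }
}
```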

> I think it will be clamped to
> some maximum that is either set by the source or specified by the app, in
> both cases when the stream is created.

When the source is bound? See below for the discussion of
source-stream correspondence. When would the app need to set an upper
bound in the stream object rather than in some wrapping function?

> The purpose of this is to fine tune the sync with audio. If the latency
> decreases, the source would have to repeat a frame or frames until the frame
> at the front of the queue is the one corresponding to audio time <at> where
> <at> = t + consumerLatency. If the latency increases, the source will have
> to skip a frame or frames in order to build up the latency.

I think there are two uses for consumerLatency which should be
separated conceptually. Use 1 is resynchronizing the audio and video
without changing rendering. Use 2 is resynchronizing the audio and
video due to a rendering change.

Use 1:

We notice that the video latency has decreased from large k to 0 so
the user is seeing video from t+k but hearing audio from t. Setting
the latency to 0 will repeat (or pause) frames as you have said. Will
pause vs repeat be specified?

We notice that the video latency has increased from 0 to large k so
the user is seeing video from t-k but hearing audio from t. Setting
the latency to k will skip frames, reading video from t+k instead of
t-k.

Use 2:

A known latency change (e.g. direct rendering => filtering, or
filtering => direct rendering) need not cause any audio or video
glitches, but accomplishing this requires multiple time-offset streams
or a delay/replay facility. This is a use for composed streams (see
end).

Suppose latency decreases from large k to 0. The audio stream is
playing audio samples from t while the video decoder is producing
frames from t+k so when they display, they will be in sync at t.
setConsumerLatency is called with 0. The rendering system takes k to
drain and synchronized frames are output (after delay k).

Suppose latency increases from 0 to large k. The audio stream is
playing audio samples from t while the video decoder is producing
frames from t. setConsumerLatency is called with k. The k latency
rendering system immediately begins receiving frames from t+k which
begin appearing at t+k. Meanwhile, our 0 latency rendering is still
receiving frames from t until t+k.

How can this use case be supported? Will it be supported? Is it
possible/feasible? It seems to require the ability to generate two
streams for the increased latency case. I believe composed streams
could support this use case.

Performance video art and other dynamic video compositing apps require
this capability. Is there some other way to accomplish it?

>>   // };
>>   //
>>   //
>>   function connectVideo(ctx, video) // Whose function is this? Your
>> internal implementation or an abstraction of the proposed interface?
> The application's.

Would it be possible to spec a (end-user implementable) convenience
function with similar behavior? A user onload handler would be called
after the appropriate stream setup.
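For example (a sketch only; wdtStream and connect are the proposal's
OPTION 1 names, everything else is mine):

```javascript
// Sketch of a spec-suggested (or user-written) convenience helper:
// wire a video element to the currently bound external texture and
// invoke the caller's handler once the stream is connected.
function connectVideoStream(video, onReady, onError) {
  video.onload = function () {
    try {
      video.wdtStream.connect();       // proposal's OPTION 1 API
      if (!video.autoplay) video.play();
      onReady(video.wdtStream);
    } catch (e) {
      onError(e);
    }
  };
}
```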

>>   {
>>     g.loadingFiles.push(video);
>>     g.videoReady = false;
>>     // What is g? gl? seems related to the stream but videoReady is racy
> g is an object that contains the application's global variables to avoid
> polluting the global namespace. Why is it racy? It's only set in the onload
> function and never cleared.

If I call connectVideo(ctx,video) and connectVideo(ctx,video2),
videoReady is raced. You address this below, though.

>>     //-----------------------------
>>     // Options for connecting to video
>>     //-----------------------------
>>     // OPTION 1: method on WDT extension augments video element
>>     // with a wdtStream object.
>>     ctx.dte.createStream(video); // What property of video makes this
>> possible? Is that property part of this specification?
> There is no exposed property of video that makes creating the stream
> possible. But this option causes a wdtStream property on the video to be
> defined. If you have a suggestion about (a) what property would be useful
> and (b) how I can spec it in a WebGL extension and not the HTMLVideoElement
> specification, I am happy to listen.

I think HTMLVideoElement's membership in WebGLDynamicTextureSource
would be helpful.

Perhaps something like this (valid?) WebIDL:

interface WebGLDynamicTextureSource { };
HTMLVideoElement implements WebGLDynamicTextureSource;

>>     assert(video.wdtStream.state == wdtStreamConnecting);
>>     //-----------------------------
>>     // OPTION 2: method returns a stream object.
>>     g.vstream = ctx.dte.createStream(ctx /* video? empty? */); //
>> Q.vstream = ctx.dte.createStream(video); assert(Q.vstream == g.vstream); ?
> Empty. The video has not been loaded yet.

This would make the stream equality assertion fail and allow multiple
(mutually unsynchronized?) streams for the same source. The
application would then be responsible for setting the latencies to
coincide somehow. This is rather unfortunate for simple libraries. See
below for a hybrid (but incomplete) solution.

>>     assert(g.vstream.state == wdtStreamConnecting);
>>     //-----------------------------
>>     video.onload = function() {
>>         g.loadingFiles.splice(g.loadingFiles.indexOf(video), 1);
>>         try {
>>           // OPTION 1: video object augmented with stream
>>           video.wdtStream.connect(); // If the stream is _part_ of video
>> this hardly seems useful to consumers.
> Yes I suppose the video element knows when it is loaded and could
> automatically connect itself up to the stream.

If every stream has a single set of sources known at creation time,
this makes sense.

>>           assert(video.wdtStream.state == wdtStreamEmpty);
>>           //-----------------------------
>>           // OPTION 2: separate stream object
>>           g.vstream.connectProducer(video); // What property of video
>> makes this possible? Is that property part of this specification?
> See above.

interface WebGLDynamicTextureSource { };
HTMLVideoElement implements WebGLDynamicTextureSource;

>>           assert(g.vstream.state == wdtStreamEmpty);
>>           //------------------------------
>>           if (!video.autoplay) { // is this inverted? NOT autoplay ->
>> play?
>>             video.play(); // Play video
>>           }
> If autoplay is set, the video should start playing without any help from the
> application. It is the application's choice here to start playing the video
> once loaded.


>>           g.videoReady = true; // do you mean if g.loadingFiles is length
>> 0?
> Since there is only one video in this example, it doesn't matter but the
> flag should be per video. The application is using it to see whether to call
> {acquire,release}Image or not.

This answers the race condition issue above.

>>         } catch (e) {
>>           window.alert("Video texture setup failed: " + e.name);
>>         }
>>       };
>>   }
>>   function drawFrame(gl)
>>   {
>>     var lastFrame;
>>     var syncValues;
>>     var latency;
>>     var graphicsMSCBase;
>>     // Make sure the canvas /* buffer? */ is sized correctly.
> Yes the buffer.
>>     reshape(gl);
>>     // Clear the canvas
>>     gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);
>>     // Matrix set-up deleted ...
>>     // To avoid duplicating everything below for each option, use a
>>     // temporary variable. This will not be necessary in the final
>>     // code.
>>     // OPTION 1: augmented video object
>>     var vstream = g.video.wdtStream;
>>     // OPTION 2: separate stream object
>>     var vstream = g.vstream;
>>     // In the following
>>     //   UST is a monotonically increasing counter never adjusted by NTP
>> etc.
>>     //   The unit is nanoseconds but the frequency of update will vary
>> from
>>     //   system to system. The average frequency at which the counter is
>>     //   updated should be 5x the highest MSC frequency supported. For
>>     //   example if highest MSC is 48kHz (audio) the update frequency
>>     //   should be 240kHz. Most OSes have this kind of counter available.
>>     //
>>     //   MSC is the media stream count. It is incremented once/sample; for
>>     //   video that means once/frame, for audio once/sample. For graphics,
>>     //   it is incremented once/screen refresh. // good! on most machines,
>> when I time a rendercycle, I get 60-75 Hz?
>>     //
>>     //   CPC is the canvas presentation count. It is incremented once
>>     //   each time the canvas is presented. // this is totally detached
>> from time, yes?
>>     //
> Yes.
>>     if (graphicsMSCBase == undefined) {
>>         graphicsMSCBase = gl.dte.getSyncValues().msc;
>>     }
>>     if (lastFrame.msc && vstream.producerFrame > lastFrame.msc + 1) {
>>       // Missed a frame! Simplify rendering?
>>     }
>>     if (!latency.frameCount) {
>>       // Initialize
>>       latency.frameCount = 0;
>>       latency.accumValue = 0;
>>     }
>>     if (lastFrame.ust) {
>>       syncValues = gl.dte.getSyncValues();
>>       // interface syncValues {
>>       //     // UST of last present
>>       //     readonly attribute long long ust;
>>       //     // Screen refresh count (aka MSC) at last present
>>       //     // Initialized to 0 on browser start
>>       //     readonly attribute long msc;

On page load? Browser start leaks information that may not be
otherwise discoverable. Thoughts?

>>       //     // Canvas presentation count at last present
>>       //     // Initialized to 0 at canvas creation.
>>       //     readonly attribute long cpc;
>>       // };
>>       // XXX What happens to cpc when switch to another tab?
>>       if (syncValues.msc - graphicsMSCBase != syncValues.cpc) { // this
>> assumes the media rates are locked to the rendering rates
> No. Read the comment below. This relates only to whether the 3D rendering is
> keeping up with the screen refresh.
>>         // We are not keeping up with screen refresh!
>>         // Or are we? If cpc increment stops when canvas hidden,
>>         // will need some way to know canvas was hidden so app
>>         // won't just assume its not keeping up and therefore
>>         // adjust its rendering.

My previous comment ("this assumes...") was too imprecise. I believe
we are talking about the same issue. The CPC is not related to time
*at all* (see above) whereas the screen MSC is incremented at a fixed
frequency. This condition compares cycles to counter. Why does screen
refresh matter here if the canvas is being presented slower? It seems
to me like we should be using our graphicsMSCBase to track the CPC
delta instead of the MSC delta. Have I missed something?

>>         graphicsMSCBase = syncValues.msc; // reset base.

When this reset occurs once, it will occur on every subsequent
invocation: (msc - graphicsMSCBase) resumes increasing at the same
rate as before, but the CPC keeps its accumulated value and is now
always greater, never equal.
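A toy counter simulation of what I mean, with a canvas that presents
on every other refresh (pure bookkeeping, no GL; all of it
hypothetical):

```javascript
// Simulate a canvas that presents once per two screen refreshes.
// After the base is first reset, (msc - base) restarts while cpc keeps
// its accumulated value, so the inequality fires on (almost) every
// later frame. Returns the number of resets over `frames` draws.
function simulateMscBaseResets(frames) {
  let msc = 0, cpc = 0, base = 0, resets = 0;
  for (let i = 0; i < frames; i++) {
    msc += 2;  // two refreshes elapsed...
    cpc += 1;  // ...but only one canvas present
    if (msc - base !== cpc) { base = msc; resets += 1; }
  }
  return resets;
}
```

Over ten draws the check fails nine times, even though the canvas is
presenting at a perfectly steady half rate.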

>>       }
>>       latency.accumValue += syncValues.ust - lastFrame.ust;

How do we know the last present corresponds to the last draw command?
What if the GPU pipeline is several frames deep? Rotating 900 degrees
!= rotating 180 degrees. Each dynamic texture sink somehow needs a
per-stream presentation msc to fix this, I think. Is there some
invariant in all rendering systems that prevents this?

>>       latency.frameCount++;
>>       if (latency.frameCount == 30) { // is this 30 the fps of the encoded
>> video? can it be retrieved from the stream source somehow?
> No. It is just the number of frames I picked over which to average the
> latencies. I'm not sure there is any advantage to picking a number based on
> the fps of the source.

When we are drawing (at a potentially uneven rate) faster than frames
are produced (so acquireImage times out, itself at a potentially
uneven rate absent the max-frequency behavior above), our accumulator
grows erroneously. Should the latency only be updated when the next
frame is successfully acquired?

I would lock the latency estimation window to a ust duration unless
the source frequency is fixed to gain a consistent real-world
frequency (and reliable observables) and prevent jitters or slow
resync. Is there a reason to not use an exponential moving average?
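For comparison, the kind of estimator I have in mind (the smoothing
factor is arbitrary; the function is mine, not the proposal's):

```javascript
// Exponential moving average of acquire-to-present latency. Updated on
// every successfully acquired frame; no fixed 30-frame window, so a
// framerate change shows up within a few samples instead of waiting
// for a window to refill.
function makeLatencyEstimator(alpha) {
  let ema = null;
  return {
    sample(latencyUs) {
      ema = ema === null ? latencyUs : alpha * latencyUs + (1 - alpha) * ema;
      return ema;
    },
    get value() { return ema; }
  };
}
```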

We won't really know if this matters until a video decoder gets
plugged into a rendering context inside of a browser compositor in a
non-trivial way. Maybe it won't matter. Is there an issue exposing the
maximum framerate of a texture source? It could be unknown? Is there
an issue clamping it to the frequency of the display? Very high
frequency videos? Seems unlikely...

>>         vstream.setConsumerLatency(latency.accumValue / 30);
>>         latency.frameCount = 0;
>>         latency.accumValue = 0;
>>       }
>>     }
>>     if (g.videoReady) {
>>       if (g.video.wdtStream.acquireImage()) {
>>         // Record UST of frame acquisition.
>>         // No such system function in JS so it is added to extension.
>>         lastFrame.ust = gl.dte.ustnow();
>>         lastFrame.msc = vstream.consumerFrame;
>>       }
>>       // OPTION 2:
>>       vstream.acquireImage();
>>       lastFrame = g.stream.consumerFrame; // lastFrame.msc = ...
> Yes.
>>     }
>>     // Draw the cube
>>     gl.drawElements(gl.TRIANGLES, g.box.numIndices, gl.UNSIGNED_BYTE, 0);
>>     if (g.videoReady)
>>       vstream.releaseImage();
>>     // Show the framerate
>>     framerate.snapshot();
>>     currentAngle += incAngle;
>>     if (currentAngle > 360)
>>         currentAngle -= 360;
>>   }
>> How many streams may exist for a given media source? If multiple, do they
>> communicate amongst themselves and buffer frames for sharing? If yes, this
>> suggests that streams have a source separate from its sinks. This source
>> must have a property that tracks the maximum consumer latency.
> Keep it simple. 1 only I think.

Excellent! I agree, but this requires a hybrid of the approaches
above. Specifically, the same stream must be created from the source,
which implies the constructor of the stream requires the source
object. I don't think it matters whether the stream lives on the DOM
element or as a separate object so long as it is a singleton per
source.

How should applications with multiple consumers with varying latency
be handled? My use case is live preview of Apple Photobooth filters in
separate DOM elements. Does each source get a single stream *per
context*? How do I synchronize across contexts? If I can compose
streams, I can build a sorted dynamic texture source chain with
increasing latency to feed the canvases.
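By "compose" I mean something as simple as chaining delay stages, one
tap per consumer (toy bookkeeping, nothing WebGL-specific; all names
are mine):

```javascript
// A delay stage: frames pushed in come out `delay` pushes later.
// Chaining stages with increasing delays gives a sorted chain of
// taps, one per consumer, all fed by a single source.
function makeDelayStage(delay) {
  const queue = [];
  return {
    push(frame) {
      queue.push(frame);
      return queue.length > delay ? queue.shift() : null;
    }
  };
}
```

A zero-delay stage passes frames straight through; a stage of delay k
feeds a consumer whose rendering pipeline is k frames shorter, so both
canvases present the same source frame at the same time.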

>> What type of object may be used to generate a stream source?
> Do you mean what kind of object can be a stream source? HTMLVideoElement,
> etc.

Yes but seeing as we are avoiding changing the concrete interfaces of
sources, the stream object will exist separate from and in one-to-one
correspondence with the source. Can canvas elements be sources? Can
DOM elements be privileged sources? Is it possible to specify that
objects implementing a given empty IDL interface are valid producers?

>> Some media sources (cameras, live streams) cannot seek into the future. How
>> does an application with multiple sinks attached to these sources
>> synchronize those outputs? Setting all consumer latencies equally?
> In those cases I think the consumerLatency will be clamped to zero and you
> will have to put up with poor audio synchronization. The only other option
> is to give the application control over the audio stream so it can specify a
> delay when the stream starts. It would not be able to adjust that delay
> without introducing audio glitches so it could only make a best guess.

I was under the impression that consumerLatency informed audio
buffering in the video decoder case.

Is there a problem with buffering live sources to a t+k delayed
real-time? This is a case where knowing the maximum framerate of the
stream in advance would help avoid audio glitches from running over
budget.

>> Can streams be concatenated? Is the result a stream? I don't think this
>> should be part of the API but I think it should be possible to build on top
>> of the API.
> Do you mean have one stream be the source for another stream? What is the
> use case?

I mean have one stream represent stream A followed by stream B. The
use case is using stream as a common abstraction between a consumer
library and a producer web app. I guess rebinding a new dynamic
texture is OK but I am concerned about sourcing from HTMLVideoElement.
Is the stream connected to the current video decoder of the
HTMLVideoElement source? What happens when I change the
HTMLVideoElement source? If it's the same stream, this would be
concatenation of video streams but would seem to break invariants in
the source representation (frame counts, frequencies, CORS-ness).
Should this functionality be part of the stream instead? Should the
stream need to know about all of its sources at creation time?
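If the answer is "same stream," I would expect at least the producer
frame count to reset on a source change, something like the following
(all names hypothetical; this is exactly the invariant in question):

```javascript
// Hypothetical: what must reset when a stream's underlying source is
// swapped (concatenation), if frame counts are to stay meaningful.
class SourceTrackingStream {
  constructor() { this.producerFrame = 0; this.source = null; }
  insertFrame() { this.producerFrame += 1; }
  setSource(src) {
    this.source = src;
    this.producerFrame = 0;  // MSC restarts; value is 1 at the new
                             // source's first frame, per the proposal
  }
}
```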

Sourcing one stream from another I would call 'composition' and I
believe it has applications for multiple consumers at varying
latencies (see Photobooth example above).

>> Is it possible to construct a stream object from a WebGL renderbuffer? Can a
>> developer do this or must the browser implementor be involved? What is the
>> minimal set of interfaces that is required to give developers this kind of
>> flexibility?
> I don't see the point of introducing this added complexity. I can't think of
> anything you could do with this that you can't accomplish with an FBO unless
> streams start to be used by other parts of the web platform. In terms of the
> underlying EGLStream, I don't think there is currently any extension
> supporting using a renderbuffer as a producer, only an EGLSurface. If there
> is hardware support for this, the browser would have to provide a function
> to connect the stream to the renderbuffer as a producer.

Stream objects provide an interface for synchronized images from
outside a GL context to a dynamic texture inside a GL context. Here
are my use cases, please let me know if there are better ways to
accomplish them:

1. I use unprivileged resources and readPixels for picking. If I have
multiple contexts in the main thread, producing streams from FBOs lets
me share rendering between them with low latency while sequestering
unprivileged data in the display context. This lets me render most of
the scene once (assuming MRT), composite unprivileged resources for
display, and still be able to read-back most of the rendering for
screen capture or a single pixel for picking.

2. I have a WebGL context in the main thread and a WebGL context in a
worker (to keep the main loop smooth despite texture loads/shader
compiles). I don't need to share general GL objects, just an
expensive-to-compute-or-update dynamic texture.

3. I want to get a video screen capture (GL context only) of what my
user is seeing (including timing glitches) as part of a bug report. No
video encoder interface has yet been specified/implemented AFAIK but
OpenMAX appears to offer this capability. Will this be a separate API
for "streams-going-to-encoders"? Why?

Thanks again, Mark. I really appreciate your taking the time to
discuss these matters with me. I apologize for my ignorance of some of
these systems. I'd also like to express my deep appreciation for your
recent excellent work on floating point textures.

Warm regards,

David Sheets

"Freedom's just another word for nothing left to lose
And nothing ain't worth nothing but it's free" ~ Kris Kristofferson
and Fred Foster

"Perfection is achieved, not when there is nothing more to add, but
when there is nothing left to cut away." ~ Antoine de Saint-Exupéry
