
Re: [Public WebGL] WEBGL_dynamic_texture redux



Hi David,

Finally I'm getting back to the rest of your questions and comments. It's tough to switch contexts after the time that has elapsed. I hope you can still remember the questions I am responding to.

All,

I will produce a new draft of the extension as soon as I can. I'll give myself a deadline of 2 weeks. It needs to include the new interfaces & methods shown in the current version of the sample (in the extensions/proposals section of the Git repository), after I've modified them to work with frames that provide their presentation time, if I can figure out a way.

On 2012/11/22 11:15, David Sheets wrote:
Are you planning on exposing the variable framerate of decoders to stream consumers in the msc? Is it bad to insert "duplicate" frames and increment the msc when the source framerate drops below max? I think consistency is valuable here and simplifies the author's work. If I test my generic video app with constant framerate videos but not variable framerate videos, I might be disappointed with performance on sophisticated codecs in future. Can this consistency abstraction leak in some way I am missing?
msc is intended to tell the application how many samples have come through the decoder so it can tell if it has missed any. So it is bad to artificially increment it.

I suspect issues are more likely to arise if initial testing took place with a variable-frame-rate video and then a fixed-frame-rate video is used. It is likely possible to fake a fixed frame rate but I prefer to explore using the presentation time stamp of the video before going that route.
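Roughly, the kind of consumer-side check I have in mind is below (a sketch only; acquireImage() and frame.msc are placeholder names based on this discussion, not settled API):

    // Sketch only: detect missed samples by watching for gaps in msc.
    var lastMSC = -1;
    function onFrame(stream) {
      var frame = stream.acquireImage();      // latch the most recent frame (placeholder name)
      if (!frame) return;                     // nothing new this cycle
      if (lastMSC >= 0 && frame.msc > lastMSC + 1) {
        // the decoder produced frames this consumer never saw
        console.warn("missed " + (frame.msc - lastMSC - 1) + " frame(s)");
      }
      lastMSC = frame.msc;
    }
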
When the source is bound? See below for the discussion of
source-stream correspondence. When would the app need to set an upper
bound in the stream object rather than in some wrapping function?
The sample (the one mentioned above) shows this. I think the max latency will have to be set at the time the stream is bound to the producer. I think producers may have trouble increasing the read-ahead on the fly. By setting it when the producer is established, the producer can decode enough frames to cover the maximum possible latency before it starts to deliver any and before it starts to play the audio. It can continue to decode the video that amount of time ahead regardless of which frames it is actually putting into the stream.
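In outline it would look something like the following; every name here is a placeholder, the only point being that the maximum latency is supplied at bind time so the producer knows how far to read ahead:

    // Sketch only: supply the maximum consumer latency when the producer is bound,
    // so the decoder can buffer that far ahead before delivering frames or audio.
    var stream = gl.dte.createStream(texture);      // placeholder factory
    stream.setMaxConsumerLatencyUsec(3 * 16667);    // placeholder; roughly 3 refreshes at 60Hz
    stream.connectSource(videoElement);             // producer established; read-ahead starts here
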
Use 1:

We notice that the video latency has decreased from large k to 0 so
the user is seeing video from t+k but hearing audio from t. Setting
the latency to 0 will repeat (or pause) frames as you have said. Will
pause vs repeat be specified?
I want to specify a smooth transition rather than a jump. It's more difficult to specify when variable frame rates must be taken into consideration. I'll have to think about the language.
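Roughly, I am thinking of a ramp over a short ust window rather than a step, something like this sketch:

    // Sketch only: ramp the consumer latency from its current value to the new
    // target over rampUST microseconds instead of changing it in one jump.
    function rampedLatency(startLatency, targetLatency, elapsedUST, rampUST) {
      var t = Math.min(elapsedUST / rampUST, 1.0);  // 0..1 progress through the ramp
      return startLatency + (targetLatency - startLatency) * t;
    }
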
Use 2:

...

How can this use case be supported? Will it be supported? Is it
possible/feasible? It seems to require the ability to generate two
streams for the increased latency case. I believe composed streams
could support this use case.

Performance video art and other dynamic video compositing apps require
this capability. Is there some other way to accomplish it?
I hadn't thought about this at all.

    if (lastFrame.ust) {
      syncValues = gl.dte.getSyncValues();
      // interface syncValues {
      //     // UST of last present
      //     readonly attribute long long ust;
      //     // Screen refresh count (aka MSC) at last present
      //     // Initialized to 0 on browser start
      //     readonly attribute long msc;
On page load? Browser start leaks information that may not be
otherwise discoverable. Thoughts?
I have been looking at the High Resolution Time spec. It starts a timer at page load. It might be appropriate to start the screen MSC then too. What information would be discoverable, other than how long the browser has been running? Is that information a security concern?
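For example, a ust-style value derived from High Resolution Time would be relative to page load, so it would expose nothing beyond what performance.now() already exposes (sketch only):

    // Sketch only: a ust value in microseconds measured from page load,
    // built on High Resolution Time rather than browser start.
    function ustNow() {
      return Math.round(performance.now() * 1000); // performance.now() is ms since page load
    }
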

      
      //     // Canvas presentation count at last present
      //     // Initialized to 0 at canvas creation.
      //     readonly attribute long cpc;
      // };
      // XXX What happens to cpc when switch to another tab?
      if (syncValues.msc - graphicsMSCBase != syncValues.cpc) { // this assumes the media rates are locked to the rendering rates
No. Read the comment below. This relates only to whether the 3D rendering is keeping up with the screen refresh.

        // We are not keeping up with screen refresh!
        // Or are we? If cpc increment stops when canvas hidden,
        // will need some way to know canvas was hidden so app
        // won't just assume it's not keeping up and therefore
        // adjust its rendering.
My previous comment ("this assumes...") was too imprecise. I believe
we are talking about the same issue. The CPC is not related to time
*at all* (see above) whereas the screen MSC is incremented at a fixed
frequency. This condition compares cycles to counter. Why does screen
refresh matter here if the canvas is being presented slower? It seems
to me like we should be using our graphicsMSCBase to track the CPC
delta instead of the MSC delta. Have I missed something?
A positive cpc (canvas presentation count) delta would tell the application that it missed a browser composition cycle, but it gives no information about whether browser composition is keeping up with screen refresh or video frame rate. It is when the video hits the screen that matters for audio sync. Tracking the screen MSC lets the app account for delays both in itself and in the browser. It should probably track the cpc delta as well to improve decisions about what to do to reduce any delays.
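Something along these lines is what I mean, with both deltas tracked (sketch only, using the getSyncValues interface from the sample):

    // Sketch only: mscDelta is how many screen refreshes passed since the last
    // check, cpcDelta how many times this canvas was actually presented.
    var prevSync = gl.dte.getSyncValues();
    function checkSync() {
      var sync = gl.dte.getSyncValues();
      var mscDelta = sync.msc - prevSync.msc;
      var cpcDelta = sync.cpc - prevSync.cpc;
      if (cpcDelta < mscDelta) {
        // the canvas was presented fewer times than the screen refreshed;
        // consider reducing rendering work or adjusting latency
      }
      prevSync = sync;
    }
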


        graphicsMSCBase = syncValues.msc; // reset base.
When this occurs once, it will occur every subsequent invocation
(increasing at same rate as normal CPC but now CPC always greater,
never equal).
Yup. I've changed it to
graphicsMSCBase = syncValues.msc - syncValues.cpc;
but have not submitted it to GitHub yet.
      }
      latency.accumValue += syncValues.ust - lastFrame.ust;
How do we know the last present corresponds to the last draw command?
What if the GPU pipeline is several frames deep?
We don't. Why does it matter? What is important for audio sync is when the pixels of the frame hit the display. Ah!! When you say draw command, do you mean the last requestAnimationFrame->drawFrame() cycle rather than a gl.drawXXX command?

That is not a GPU pipeline issue. It's a browser compositor issue. The DT spec will say that the getSyncValues() function returns the time at which the page composition that includes the canvas image from the previous draw cycle hit the screen. It would be a lot easier to specify if we had an explicit present() function.

Rotating 900 degrees != rotating 180 degrees.
Not sure what this has to do with what we are discussing.
Each dynamic texture sink somehow needs a
per-stream presentation msc to fix this,
Why? If the sinks are all in one canvas, that canvas's pixels are going to hit the display at the same time. I think the same will be true if the page has multiple canvases. If not, the application should track sync separately for each canvas; cpc should be per canvas.
      latency.frameCount++;
      if (latency.frameCount == 30) { // is this 30 the fps of the encoded video? can it be retrieved from the stream source somehow?
No. It is just the number of frames over which I chose to average the latencies. I'm not sure there is any advantage to picking a number based on the fps of the source.
When we are drawing faster (at a potentially uneven rate) than the
frames are produced (acquireImage times out) (at a potentially uneven
rate without the max frequency behavior above), our accumulator
increases a lot erroneously. Should the latency only be updated when
the next frame is successfully acquired?

I would lock the latency estimation window to a ust duration unless
the source frequency is fixed to gain a consistent real-world
frequency (and reliable observables) and prevent jitters or slow
resync. Is there a reason to not use an exponential moving average?
I think you are correct that my approach to latency calculation is too simplistic. Locking it to a ust duration seems reasonable.
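An exponential moving average with a time constant expressed in ust would give both properties, e.g. this sketch:

    // Sketch only: time-based exponential moving average of the latency.
    // Updated only when a new frame was actually acquired; windowUST is the
    // averaging window expressed as a ust duration rather than a frame count.
    var avgLatencyUST = 0;
    function updateLatency(sampleUST, dtUST, windowUST) {
      var alpha = 1 - Math.exp(-dtUST / windowUST); // smoothing factor from elapsed time
      avgLatencyUST += alpha * (sampleUST - avgLatencyUST);
      return avgLatencyUST;
    }
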
We won't really know if this matters until a video decoder gets
plugged into a rendering context inside of a browser compositor in a
non-trivial way. Maybe it won't matter. Is there an issue exposing the
maximum framerate of a texture source?
I don't think so.
 It could be unknown?
Unfortunately it is unknown because HTMLVideoElement does not include a frame rate attribute.

I need to complete the exercise of modifying the example to use frame presentation times since that seems to be the web way.
How many streams may exist for a given media source? 

Keep it simple. 1 only I think.
Excellent! I agree but this requires a hybrid of the approaches above.
Specifically, the same stream must be created from the source which
implies the constructor of the stream requires the source object. I
don't think it matters whether the stream lives on the DOM element or
as a separate object so long as it is singleton for a source.
Yes we need to do something to ensure that only one stream can be created from a source.
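One way I could imagine enforcing that is below (sketch only; createStream is a placeholder for whatever the factory ends up being called):

    // Sketch only: hand back the same stream for the same source every time.
    var streamsBySource = new WeakMap();
    function streamForSource(gl, source) {
      var stream = streamsBySource.get(source);
      if (!stream) {
        stream = gl.dte.createStream(source);   // placeholder factory name
        streamsBySource.set(source, stream);
      }
      return stream;
    }
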
How should applications with multiple consumers with varying latency
be handled? My use case is live preview of Apple Photobooth filters in
separate DOM elements. Does each source get a single stream *per
context*? How do I synchronize across contexts? If I can compose
streams, I can build a sorted dynamic texture source chain with
increasing latency to feed the canvases.
I haven't thought about this use case. I need to study the underlying EGL & GL extensions to see if they support an EGLStream being sent to multiple contexts.
What type of object may be used to generate a stream source?

Do you mean what kind of object can be a stream source?  HTMLVideoElement,
etc.
Yes but seeing as we are avoiding changing the concrete interfaces of
sources, the stream object will exist separate from and in one-to-one
correspondence with the source. Can canvas elements be sources? Can
DOM elements be privileged sources? Is it possible to specify that
objects implementing a given empty IDL interface are valid producers?
I'm not sure it is necessary to support these other objects as producers, except Canvas. The purpose of the extension is to allow data to be moved from video decoder to GPU or internally within the GPU without CPU intervention. I'd prefer to limit our ambitions for the first version.
Is there a problem with buffering live sources to t+k delayed
real-time? This is a case where knowing the maximum framerate of the
stream in advance would be helpful to avoid audio glitches due to
running overbudget.
I expect that could be done.
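Conceptually the consumer (or the implementation) would just hold frames back until they are k old, something like the following sketch (frame.ust is an assumption, not settled API):

    // Sketch only: delay a live source by delayUST microseconds by queueing frames
    // and only releasing those whose capture time is at least delayUST old.
    var queued = [];
    function pushLiveFrame(frame) {
      queued.push(frame);
    }
    function nextDelayedFrame(nowUST, delayUST) {
      if (queued.length > 0 && queued[0].ust <= nowUST - delayUST) {
        return queued.shift();
      }
      return null;  // nothing old enough yet; keep showing the previous frame
    }
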
I mean have one stream represent stream A followed by stream B. The
use case is using stream as a common abstraction between a consumer
library and a producer web app. I guess rebinding a new dynamic
texture is OK but I am concerned about sourcing from HTMLVideoElement.
Is the stream connected to the current video decoder of the
HTMLVideoElement source? What happens when I change the
HTMLVideoElement source? If it's the same stream, this would be
concatenation of video streams but would seem to break invariants in
the source representation (frame counts, frequencies, CORS-ness).
Should this functionality be part of the stream instead? Should the
stream need to know about all of its sources at creation time?
This is another topic I will have to give thought to. It's probably easiest at first to say you can't change the HTMLVideoElement source once bound to a stream.
Sourcing one stream from another I would call 'composition' and I
believe it has applications for multiple consumers at varying
latencies (see Photobooth example above).
I would like to ensure we don't put anything in the spec that would block this, but I don't think we should put effort into supporting it in the first version.

Stream objects provide an interface for synchronized images from
outside a GL context to a dynamic texture inside a GL context. Here
are my use cases, please let me know if there are better ways to
accomplish them:

1. I use unprivileged resources and readPixels for picking. If I have
multiple contexts in the main thread, producing streams from FBOs lets
me share rendering between them with low latency while sequestering
unprivileged data in the display context. This lets me render most of
the scene once (assuming MRT), composite unprivileged resources for
display, and still be able to read-back most of the rendering for
screen capture or a single pixel for picking.
I think allowing texture sharing between contexts is the way forward for this case.
2. I have a WebGL context in the main thread and a WebGL context in a
worker (to keep the main loop smooth despite texture loads/shader
compiles). I don't need to share general GL objects, just an
expensive-to-compute-or-update dynamic texture.
Again I think texture sharing is the answer for this case.
3. I want to get a video screen capture (GL context only) of what my
user is seeing (including timing glitches) as part of a bug report. No
video encoder interface has yet been specified/implemented AFAIK but
OpenMAX appears to offer this capability. Will this be a separate API
for "streams-going-to-encoders"? Why?
I agree that this use case should use the same stream object as the interface between the canvas and the video encoder. This is another reason why Canvas should be a producer.

Thanks for the feedback and interesting use cases.

Regards

    -Mark

--

NOTE: This electronic mail message may contain confidential and privileged information from HI Corporation. If you are not the intended recipient, any disclosure, photocopying, distribution or use of the contents of the received information is prohibited. If you have received this e-mail in error, please notify the sender immediately and permanently delete this message and all related copies.