
Re: [Public WebGL] WEBGL_dynamic_texture extension proposal

On Mon, Jul 23, 2012 at 1:22 PM, Chris Marrin <cmarrin@apple.com> wrote:

On Jul 20, 2012, at 3:29 AM, Mark Callow <callow_mark@hicorp.co.jp> wrote:

 On 13/07/2012 19:42, Mark Callow wrote:

Before I go ahead and change the draft, I want to know: are people comfortable having an extension that mirrors a non-Khronos OpenGL ES extension? As I said it
Since I heard no objections, I've just committed a highly revised draft in which the TEXTURE_EXTERNAL parts mirror GL_NV_EGL_stream_consumer_external. You can find it at
http://www.khronos.org/registry/webgl/extensions/proposals/WEBGL_dynamic_texture/
The commands for connecting sources and acquiring and releasing image frames now follow the semantics of EGLStream. I've also added a dynamicTextureSetConsumerLatencyUsec(HTMLVideoElement) method that I think is needed to help with audio synchronization.
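For concreteness, a render loop against the new draft might look roughly like this (the connect and release call names below are placeholders, and the numeric latency argument is my shorthand; see the draft for the actual signatures):

    var dtx = gl.getExtension("WEBGL_dynamic_texture");
    var tex = gl.createTexture();
    gl.bindTexture(dtx.TEXTURE_EXTERNAL, tex);      // placeholder enum name
    dtx.dynamicTextureConnectSource(video);         // placeholder connect call
    // Tell the implementation how far ahead of display we run (~33 ms here).
    dtx.dynamicTextureSetConsumerLatencyUsec(video, 33000);

    function draw() {
        dtx.dynamicTextureAcquireImage();   // latch the current frame
        // ... issue draw calls sampling the external texture ...
        dtx.dynamicTextureReleaseImage();   // hand the frame back to the producer
        window.requestAnimationFrame(draw);
    }
    window.requestAnimationFrame(draw);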

I've been thinking about Chris's suggestion to request a frame with a particular time-stamp. Why would WebGL applications need something like that while regular Web apps manage without it? The only difference I can see is the almost certain increased latency of making the frame visible, hence the new method.
"regular Web Apps" typically don't play with video. They create a video element and then play, pause, change the play head, play at different rates, etc. The video rate control, playback and rendering are all handled by the same native driver. And in fact on OSX and iOS that native driver does use timestamps to control when frames appear.

WebGL needs to fetch a frame from the video provider and then render it. Since there is a disconnect between the two, there has to be some way to control which of possibly several available frames should be used. Your dynamicTextureSetConsumerLatencyUsec() does this somewhat. But that method makes my stomach hurt. It tries to do exactly what I'm talking about, but in a very indirect and (to me) confusing way. That call specifies the difference between when the acquire call is made and when the frame will hit the display. What's the difference between that and asking for a frame for a given time because that's when you determined it will hit the display?
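To make that concrete, these two would express the same intent (microsecond timestamps; the timestamp parameter in the second form is the hypothetical variant I'm suggesting, not something in the draft):

    // Current draft: describe the display delay indirectly.
    ext.dynamicTextureSetConsumerLatencyUsec(video, 33000);
    ext.dynamicTextureAcquireImage();

    // Suggested: say directly when the frame should hit the display.
    ext.dynamicTextureAcquireImage(displayTimeUsec);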

And by doing it in an indirect way like you've specified, you're making it harder to do the other thing I mentioned before. In the future, if I want to work on two frames at a time, I will have to ask for frames at two separate timestamps. If the timestamp is merely another parameter to dynamicTextureAcquireImage() I can do that. If I were to try to do it with dynamicTextureSetConsumerLatencyUsec() I would have to fool the system into giving me the frame I want by asking for a frame at 33 ms more latency than the previous frame.

It just seems like adding another API call is not nearly as good as just adding a timestamp param to dynamicTextureAcquireImage(). It could even be an optional parameter.
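As a sketch (release calls omitted, and again the timestamp parameter is the hypothetical addition), working two frames ahead at 30 fps would then just be:

    var t = nextDisplayTimeUsec;               // when the next frame hits the display
    ext.dynamicTextureAcquireImage(t);         // frame for the next refresh
    // ... render pass using frame 1 ...
    ext.dynamicTextureAcquireImage(t + 33333); // ~33 ms later: the frame after that
    // ... render pass using frame 2 ...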

I would like to understand this better. I'm not a video expert, but having the API take a timestamp of the frame I want sounds like I could ask for any frame at random and the system might have to seek, on the fly, through a multi-gigabyte file. Is this timestamp a hint? Does it have to be within some range of where video is being decoded? Does the decoder have to keep N frames around? Does it need to buffer up N frames? Is there some other timestamp for audio as well? Sorry if these are noob questions for video. I'm just trying to understand how the API you're suggesting would be implemented.

In a playback-only system I would expect the video playback code to decode N frames in advance, buffer up some audio, and, however it does it, make sure the correct frame of video gets displayed at the same time the corresponding audio is played. But I would not expect the client side to tell the video "give me the frame for time N", since in order for the video decoder to be efficient it can't handle random requests. If the client starts asking for N * 2 or N * 3 the video decoder is forced to skip frames. If it asks for N-1 or N-20 it has to go in reverse. And none of these has anything to do with audio. Does it roll the audio back if I ask for a time before the current time, or play the audio at double speed if I pass in time * 2 as the requested time?

If it does have to seek, then it seems like acquire needs to be async. It might come back nearly immediately if the frame is available, or it might take some arbitrary number of seconds if the time requested is distant from the current time. If there are some limits, then what are they?
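For example, I'd imagine something callback-shaped (entirely hypothetical, just to show what an async acquire implies for callers):

    // Hypothetical async form: the callback may fire almost immediately
    // if the frame is already decoded, or seconds later after a seek.
    ext.dynamicTextureAcquireImage(requestedTimeUsec, function (ok) {
        if (ok) {
            // ... render with the latched frame ...
            ext.dynamicTextureReleaseImage();
        }
    });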

Regards

    -Mark

-- 

NOTE: This electronic mail message may contain confidential and privileged information from HI Corporation. If you are not the intended recipient, any disclosure, photocopying, distribution or use of the contents of the received information is prohibited. If you have received this e-mail in error, please notify the sender immediately and permanently delete this message and all related copies.


-----
~Chris Marrin
cmarrin@apple.com