
Re: [Public WebGL] WEBGL_dynamic_texture extension proposal

And below is Acorn's reply to my reply.

Based on feedback from you all I am going to recast WEBGL_dynamic_texture as wrapping GL_NV_EGL_stream_consumer_external which is basically GL_OES_EGL_image_external without the bits we don't want anyway. For the 3 new functions, I will borrow some language from EGL_KHR_stream_consumer_gltexture which has 3 similar functions and I'll be looking at this and EGL_KHR_stream for dealing with synchronization. I've also sent out a feeler to the author of this frame-accurate HTML5 video sample for his input.




Inline comments

On Thu, Jul 12, 2012 at 07:19:24PM -0700, Mark Callow wrote:
Hi Acorn,

Thanks for the feedback. You sent it to 3dweb. Is there a reason? Can I forward it and this reply to public-webgl?
Sure!  I wasn't aware of the public one - feel free to send it wherever it makes sense.  (Including the inline comments below if you like)

Comments interleaved below.

On 13/07/2012 05:38, Acorn Pooley wrote:

Feedback on the WEBGL_dynamic_texture extension:


An example is the use of a texture that is acquired. With EGLImage semantics
(and current WEBGL_dynamic_texture semantics) the texture can be modified
asynchronously at any time while it is being used (this results in
implementation dependent behavior which will vary from one implementation to
another). With EGLStream semantics the acquired image can never be modified
while it is acquired.

The WEBGL_dynamic_texture spec. explicitly says that the "acquired" image will not change until release is called, unless I'm badly misunderstanding my own language. In other words I believe I'm already overlaying EGLStream semantics.
[ACORN] Good point.  The issue I meant to raise was the idea that the producer cannot modify the image while it is acquired which your language seemed to imply.  If the image is double (or triple or more) buffered then there is no reason to stop the producer from creating new frames (or, for an image or canvas, modifying a copy of the image in a back buffer) while the texture has a "front buffer" copy acquired.  The difference between EGLImage and EGLStream is that an EGLStream represents a collection of color buffers -- at any particular point in time the consumer may be consuming one, and the producer may be modifying one (but never the same one that the consumer is consuming).  With EGLStream all the synchronization is handled by the driver and from the app point of view it "just works".  The EGLImage, on the other hand, is a single color buffer that is shared among several entities some of which may be writing and others of which may be reading it -- all concurrently.  The app is responsible for all synchronization.  I think for WebGL the EGLStream semantics make a lot more sense - they are simpler to use and less subject to variance from one implementation to another.  The EGLStream semantics can certainly be implemented by a browser using EGLImages (or even glTex[Sub]Image).  But I think it makes more sense to define and describe the API in terms of the simpler to use EGLStream semantics.  The current language "feels" like it is much more EGLImage centric (e.g. the idea that the producer is not allowed to modify the image while the consumer is consuming sounds like it is describing a single shared color buffer and not a collection of color buffers).
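
To make the distinction concrete, here is a toy model of the "collection of color buffers" idea. It is only a sketch: the class and method names (FrameStream, produce, acquire, release) are hypothetical and do not correspond to any EGL or WebGL entry point, and a real driver does this with GPU buffers and fences, not Python lists.

```python
class FrameStream:
    """Toy model of a multi-buffered stream (hypothetical, not a real API)."""

    def __init__(self, num_buffers=3):
        if num_buffers < 2:
            raise ValueError("need at least two buffers")
        self.buffers = [None] * num_buffers
        self.newest = None    # index of the most recently completed frame
        self.acquired = None  # index currently held by the consumer, if any

    def produce(self, frame):
        """Write a new frame into a buffer the consumer is not holding."""
        for i in range(len(self.buffers)):
            # Prefer a buffer that is neither acquired nor the newest frame.
            if i != self.acquired and i != self.newest:
                break
        else:
            # Only two buffers and the consumer holds one: overwrite the
            # newest (unacquired) frame, i.e. drop it.
            i = self.newest
        self.buffers[i] = frame
        self.newest = i
        return i

    def acquire(self):
        """Latch the newest completed frame for the consumer."""
        if self.newest is None:
            return None
        self.acquired = self.newest
        return self.buffers[self.acquired]

    def release(self):
        """Give the latched buffer back to the stream."""
        self.acquired = None


stream = FrameStream(num_buffers=3)
stream.produce("frame 0")
held = stream.acquire()
stream.produce("frame 1")  # the producer keeps running...
stream.produce("frame 2")  # ...but never touches the acquired buffer
print(held, stream.buffers[stream.acquired])
```

The point of the sketch is that the producer is never blocked and the consumer never sees its latched frame change, without the app doing any synchronization of its own.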

Another example is the use of a dynamic texture after
dynamicTextureReleaseImage has been called. EGLStream semantics require that a
TEXTURE_EXTERNAL texture that has no image acquired be treated as incomplete --
sampling it always returns black (0,0,0,1) pixels. The semantics of EGLImage
which are in the current extension wording are basically "undefined results"
which will make life difficult for developers and make content appear
differently on different implementations.

Gregg raised this issue. I'd be happy to change so that black pixels are returned after release but I am not sure how that can be implemented on a GLES implementation that supports only EGLImage. Is there a way? I suppose the WebGL implementation could just unbind the texture from TEXTURE_EXTERNAL.
[ACORN] Yes, I would expect an implementation based on EGLImages or glTexImage2D would just bind a black texture to the texture target whenever no image is acquired.
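
A minimal sketch of that fallback, assuming a hypothetical DynamicTexture wrapper inside the browser (none of these names come from the extension draft): sampling goes to the acquired image if there is one, and to opaque black otherwise.

```python
BLACK = (0, 0, 0, 1)  # opaque black, as EGLStream requires for an incomplete texture


class DynamicTexture:
    """Hypothetical stand-in for a TEXTURE_EXTERNAL texture object."""

    def __init__(self):
        self._image = None  # rows of RGBA tuples while acquired, else None

    def acquire_image(self, image):
        self._image = image

    def release_image(self):
        # After release the texture behaves as if a black texture were bound.
        self._image = None

    def sample(self, x, y):
        if self._image is None:
            return BLACK
        return self._image[y][x]


tex = DynamicTexture()
print(tex.sample(0, 0))            # nothing acquired yet
tex.acquire_image([[(1, 0, 0, 1)]])
print(tex.sample(0, 0))            # the acquired image
tex.release_image()
print(tex.sample(0, 0))            # released again
```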

Another area that much thought has gone into is how TEXTURE_EXTERNAL should
work when the source is a video which is playing at a different rate than the
WebGL rendering loop. EGLStream addresses this by stating that "Acquire" gets
the image which should be displayed next. ... I think it is a good idea to require it to
work this way. Otherwise there is too much flexibility and different WebGL
implementations may act differently.

I'll be happy to specify this semantic, which is what I suspect browsers use when playing <video> elements. But is this enough to really sync to the audio? How are the different delays for rendering audio and video managed? The delay when rendering via WebGL is likely to be different than when the browser renders the video directly.

Chris Marrin suggested passing a timestamp of the desired frame to acquireFrame but it is not clear to me that that is sufficient to solve the problem. Please see his messages in the thread in public-webgl.

How are the delays managed with EGLStream?
[ACORN] Audio delays are one of the parts of EGLStream that was worked on the hardest.  I encourage you to read the EGL_KHR_stream spec, which describes it in detail (search for the parts that describe CONSUMER_LATENCY). But here is a summary:
The consumer knows better than anyone how long it is going to hold onto the image after acquiring it and before it is viewed by the user.  This is the "Consumer Latency".  It is a combination of
 - any processing the app does to the image
 - how long the app takes to render the scene (after acquiring and processing the image)
 - how long the app takes to call eglSwapBuffers (or the WebGL equivalent) after rendering the scene
 - how long the SwapBuffers call takes to get the image to the screen where the user sees it.  Depending on the system this can include:
    - GPU rendering the scene
    - compositing the scene (browser compositing)
    - GPU rendering the browser composition
    - window system compositing
    - GPU rendering the window system composition
    - waiting for vblank
    - etc
It is difficult to predict what this time will be exactly, but it does not need to be exact.  A rough guess at the average latency is probably good enough.

In EGLStream the consumer is responsible for estimating this latency and telling the stream.  The stream then tells the producer.  The producer is then responsible for producing each image at the time it should be viewed by the user MINUS the consumer latency.

If this is being implemented in the browser using EGLImages (or glTexSubImage) then the browser needs to take this consumer latency and use it to decide which image gets acquired when the Acquire call is made.  Example: imagine a 60 fps video.  The consumer latency is 32 msec.  The audio starts playing at time 0.  The WebGL app calls Acquire at time 16.  Since the consumer latency is 32 msec, this means that the image that is being acquired will not be displayed until time 48 (16+32).  Therefore this call to Acquire should "latch" frame 3 of the video, because that is the frame that should be visible at time 48 in order to be synced to the audio.
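
The arithmetic in that example can be written down as a small function. This is only a sketch: frame_to_latch is a made-up name, and it assumes frames are numbered from 1 with frame 1 on screen at audio time 0, which is one way to read "frame 3" above but is not spelled out in the example.

```python
import math


def frame_to_latch(acquire_time_ms, consumer_latency_ms, fps):
    """Return the 1-based video frame that Acquire should latch.

    Assumption: frame 1 is the frame visible at time 0, and each frame
    stays visible for 1000 / fps milliseconds.
    """
    # The acquired image only reaches the user after the consumer latency.
    display_time_ms = acquire_time_ms + consumer_latency_ms
    # Whole frames elapsed by then, plus one for 1-based numbering.
    return math.floor(display_time_ms * fps / 1000) + 1


# The example above: 60 fps, Acquire at 16 ms, 32 ms consumer latency.
print(frame_to_latch(16, 32, 60))
```

Since the latency is only a rough estimate anyway, an implementation could just as well round to the nearest frame instead of taking the floor; the sketch picks one convention for illustration.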

The other problem is how to calculate the consumer latency in the first place.  This is difficult for an app that needs to work on many platforms.  Hopefully the browser will know something about what platform it is running on and therefore be able to estimate the consumer latency.  For example on an Android system with a compositing browser and a certain GPU you know that there are 2 compositing cycles (the browser and SurfaceFlinger), and approximately how long those cycles will take.  So the consumer latency can be estimated.  Even on a completely unknown system you can probably make a decent guess (and the browser could expose a control panel option to adjust the latency in case it guesses wrong - the same value is likely to work for most websites).


(There is also the EGL_KHR_stream_fifo extension which allows the app to opt in
to different semantics (never drop frames) but that is probably less useful --
you can decide if you need that option or not.)

I wasn't aware of this extension. Thanks.


My reaction to most of this is covered above.


- In section Dynamic textures: It says
If a texture object to which a dynamic source is bound is bound to a
texture target other than TEXTURE_EXTERNAL the dynamic source will be
ignored. Data will be sampled from the texture object's regular data
I think it is a mistake to allow the same texture to be bound at one time
to a TEXTURE_EXTERNAL target and at another time to a different target. In
GLES2 this is an error (INVALID_OPERATION). Once a texture is bound to a
TEXTURE_EXTERNAL target it may never be bound to any other target. Once a
texture is bound to a target other than TEXTURE_EXTERNAL it may never
be bound to the TEXTURE_EXTERNAL target. (Maybe it is not possible to have
the same semantics in WebGL, but if it is possible I think this is

So you are really making two different types of texture depending on which target you bind the name to, to create the object. Why do you think it is a mistake to allow different bindings at different times?

If we go with this semantic, I wonder if we should introduce a new texture type in WebGL to make this semantic clear. The name of the current WebGL method is createTexture() although it is specified to only gen the texture name.
The different texture targets really are completely different types of textures.   You would not create a cubemap texture and then try to use it as a 3D texture, right?  That would not make sense.  It also makes no sense to try to use a TEXTURE_EXTERNAL texture as a 2D texture or vice versa.  You have a good point that it might be worth making this distinction at createTexture() time.  In the GLES2 API the texture does not get its type until it is first bound to a texture target, and after that happens its type is fixed until the texture is destroyed.  I don't know what the right answer for WebGL is.
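
For what it's worth, the GLES2 rule is easy to model. The sketch below uses hypothetical names (Texture, bind_texture, InvalidOperation) and is not how any real GL or WebGL implementation is written; it only shows the "type is fixed at first bind" behaviour.

```python
class InvalidOperation(Exception):
    """Stands in for the GL INVALID_OPERATION error."""


class Texture:
    """Hypothetical texture object: has no type until it is first bound."""

    def __init__(self):
        self.target = None


def bind_texture(target, texture):
    if texture.target is None:
        # First bind: the texture takes on the type of this target for good.
        texture.target = target
    elif texture.target != target:
        # Any later bind to a different target is rejected.
        raise InvalidOperation(
            f"texture is {texture.target}, cannot bind to {target}"
        )


tex = Texture()
bind_texture("TEXTURE_EXTERNAL", tex)  # fixes the type
bind_texture("TEXTURE_EXTERNAL", tex)  # rebinding to the same target is fine
try:
    bind_texture("TEXTURE_2D", tex)
except InvalidOperation as err:
    print("INVALID_OPERATION:", err)
```

Fixing the type at createTexture() time instead would just mean passing the target to the constructor and dropping the "first bind" branch.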