
Re: [Public WebGL] For review: ANGLE_timer_query extension



A few questions:

1) What is the intended use of timer queries? Mainly for profiling
applications, or mainly to change run-time behavior of applications?

2) Regardless of whether the API uses polling or callbacks, issuing
thousands of timer queries isn't a workable way to drive an app's
run-time behavior. It will always be possible to issue more queries
than can be processed in a single frame (again, either by polling or
via callbacks), so the app will inevitably have to throttle its
issuing of timer queries (a throttling sketch follows this list).

3) Apparently, there are difficulties supporting timer queries on
tiled rendering architectures. More information should be coming from
the OpenGL ES working group soon. The differences in behavior at least
require some API changes, and at worst, would prevent timer queries
from being exposed to WebGL on these architectures. Does this fact
change your opinion about how timer queries are intended to be used?
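
To make (2) concrete, here is a minimal sketch of that kind of
throttling. It assumes an existing WebGL context 'gl' and uses
illustrative entry-point names (createQueryEXT, beginQueryEXT,
endQueryEXT, getQueryObjectEXT); the exact names in the draft may
differ:

  // Keep a bounded pool of in-flight timer queries so the app never
  // issues more than it can resolve. Names are illustrative only.
  var ext = gl.getExtension('EXT_disjoint_timer_query');
  var MAX_IN_FLIGHT = 8;   // arbitrary budget
  var inFlight = [];       // queries issued but not yet resolved

  function timedDraw(drawFn) {
    if (ext && inFlight.length < MAX_IN_FLIGHT) {
      var query = ext.createQueryEXT();
      ext.beginQueryEXT(ext.TIME_ELAPSED_EXT, query);
      drawFn();
      ext.endQueryEXT(ext.TIME_ELAPSED_EXT);
      inFlight.push(query);
    } else {
      drawFn();            // over budget: draw untimed rather than queue more
    }
  }

  function pollTimers(onResultNs) {
    // Resolve oldest-first; stop at the first query that isn't ready yet.
    while (inFlight.length &&
           ext.getQueryObjectEXT(inFlight[0], ext.QUERY_RESULT_AVAILABLE_EXT)) {
      var q = inFlight.shift();
      onResultNs(ext.getQueryObjectEXT(q, ext.QUERY_RESULT_EXT));
      ext.deleteQueryEXT(q);
    }
  }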

If timer queries were intended only for use in debugging or profiling,
that might make it easier to specify them, and to implement consistent
behavior across platforms. They could then be specified similarly to
WEBGL_debug_renderer_info. The specification could state that using
timer queries may impose a performance penalty on applications and
that they should not be used in production. A statement like this
might make it more feasible to support them in WebGL on mobile
platforms.

I think the most reliable way to adjust level of detail dynamically
would be to make adjustments based on the application's overall frame
rate over time, rather than measuring the GPU-side execution time of
individual draw calls. What do you think?
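
For what it's worth, a sketch of that approach (with a hypothetical
renderScene hook and arbitrarily chosen thresholds): the frame
interval reported by requestAnimationFrame is enough to drive the
adjustment, with no GPU queries involved.

  var lastTime = 0;
  var smoothedMs = 16.7;   // running average of the frame interval
  var quality = 1.0;       // hypothetical level-of-detail knob

  function frame(now) {
    if (lastTime) {
      smoothedMs = smoothedMs * 0.95 + (now - lastTime) * 0.05;
      if (smoothedMs > 20 && quality > 0.25) {
        quality -= 0.05;   // consistently missing ~50 fps: drop detail
      } else if (smoothedMs < 14 && quality < 1.0) {
        quality += 0.01;   // headroom: raise detail slowly
      }
    }
    lastTime = now;
    renderScene(quality);  // assumed application hook
    requestAnimationFrame(frame);
  }
  requestAnimationFrame(frame);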

-Ken



On Wed, Apr 3, 2013 at 1:41 PM, Ben Vanik <benvanik@google.com> wrote:
> Yeah - that's what I was saying above. Both of these are acceptable and
> still allow the API to be used efficiently.
>
>
> On Wed, Apr 3, 2013 at 1:39 PM, Gregg Tavares <gman@google.com> wrote:
>>
>> Another way around this issue is to leave the API as-is, but:
>>
>> (1) require that checking whether the result is available must return
>> true before reading the result is guaranteed to return the correct value.
>>
>> This would prevent apps from assuming the result is available without
>> actually checking that it is.
>>
>> (2) require that the implementation cache the result and availability
>> status, and only update them outside of JavaScript events.
>>
>> This would make spin-looping pointless because the value would never
>> change within a single event: any spin loop would loop forever.
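
(A minimal sketch of polling under those two rules, with illustrative
entry-point names as before: because the cached availability flag only
changes between JavaScript events, the check belongs in a per-frame
callback, and a busy-wait inside a single event would never see it
flip.)

  function checkPendingQuery(ext, query, onResultNs) {
    // Rule (1): only read the result after availability returned true.
    if (ext.getQueryObjectEXT(query, ext.QUERY_RESULT_AVAILABLE_EXT)) {
      onResultNs(ext.getQueryObjectEXT(query, ext.QUERY_RESULT_EXT));
      return true;   // caller may delete or reuse the query now
    }
    return false;    // rule (2): won't change this event; retry next frame
  }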
>>
>>
>>
>>
>>
>>
>> On Wed, Apr 3, 2013 at 1:24 PM, Florian Bösch <pyalot@gmail.com> wrote:
>>>
>>> As far as I understand it, one query object corresponds to a start and
>>> end time: two integer values in nanoseconds.
>>>
>>> That means that you cannot use the same query object in the same place
>>> over and over, or you might overwrite the previous measurement. Until you
>>> actually have a measurement by polling that query object, you cannot use
>>> it again, correct? So you're going to allocate thousands of query objects
>>> each frame?
>>>
>>>
>>> On Wed, Apr 3, 2013 at 10:16 PM, Ben Vanik <benvanik@google.com> wrote:
>>>>
>>>> You don't use one per frame - you use many. That's what the simple
>>>> examples don't show.
>>>>
>>>> A typical frame in a complex scene has many nested drawing batches -
>>>> like for each pass for each depth mode for each shader for each texture for
>>>> each buffer etc. You put timers around those things - sometimes 100+. Since
>>>> you'll want to support high latency you'll want a couple sets of these
>>>> timers. For the application we're building there may be 1000+ timers in
>>>> flight at any given time.
>>>>
>>>> Just as you pipeline readback from framebuffers/etc so that you aren't
>>>> blocking the GPU, you schedule your timer readback the same way -- on frame
>>>> N you are checking to see if the timers from frame N-1 or N-2 are available
>>>> yet. And using clever querying you can quickly check all of them -- for
>>>> example if the result of the last timer from frame N-1 is available then
>>>> you know all the timers from frame N-2 are available too -- no need to check
>>>> them.
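
(A rough sketch of the pipelined readback described above, again with
illustrative entry-point names: keep a ring of per-frame query sets
and, because results complete in submission order, check only the
last query of the oldest set before reading the whole set.)

  var FRAMES_IN_FLIGHT = 3;  // arbitrary latency budget
  var frameSets = [];        // each entry: { labels: [...], queries: [...] }

  function readBackOldestFrame(ext, onTimings) {
    if (frameSets.length < FRAMES_IN_FLIGHT) return;  // nothing old enough
    var set = frameSets[0];
    var last = set.queries[set.queries.length - 1];
    if (!ext.getQueryObjectEXT(last, ext.QUERY_RESULT_AVAILABLE_EXT)) return;
    frameSets.shift();
    var timings = {};
    for (var i = 0; i < set.queries.length; ++i) {
      // In-order completion: no per-query availability check needed here.
      timings[set.labels[i]] =
          ext.getQueryObjectEXT(set.queries[i], ext.QUERY_RESULT_EXT);
      ext.deleteQueryEXT(set.queries[i]);
    }
    onTimings(timings);      // nanosecond values keyed by label
  }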
>>>>
>>>> When it comes to getting the values out, what you actually want varies.
>>>> For performance testing you may query all timers every frame. For runtime
>>>> testing deployed to real user machines, most frames you may only query
>>>> the outermost timer - if it says the frame took <10ms to draw (or some
>>>> other threshold) you can just ignore the rest. But if it did take >Nms you
>>>> can start searching down the timer tree to find what took the time. A simple
>>>> binary search can then tell you exactly what kind of operation was slow for
>>>> that user and allow you to report that back to a diagnostics service, change
>>>> rendering quality, or even switch rendering engines to ensure the user has
>>>> the best experience.
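
(A sketch of that coarse-to-fine search over a hypothetical timer
tree; the node shape, the getElapsedMs helper, and the thresholds are
assumptions for illustration, not part of the proposed extension.)

  var FRAME_BUDGET_MS = 10;  // arbitrary threshold

  // root is a hypothetical node of the form { label, children }, with
  // resolved timings reachable through the assumed getElapsedMs(node).
  function findSlowSpans(root, getElapsedMs, report) {
    if (getElapsedMs(root) < FRAME_BUDGET_MS) return;  // fast frame: done
    var stack = [root];
    while (stack.length) {
      var node = stack.pop();
      var ms = getElapsedMs(node);
      if (ms < FRAME_BUDGET_MS / 2) continue;          // cheap subtree: prune
      report(node.label, ms);                          // e.g. diagnostics
      Array.prototype.push.apply(stack, node.children || []);
    }
  }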
>>>>
>>>> This kind of complex scenario is an example of one that we would like to
>>>> ship but would be unable to if the overhead imposed impacted performance
>>>> significantly. When building applications that try to schedule every
>>>> fractional millisecond of the main JavaScript loop, any additional wasted
>>>> time that's not providing value is unacceptable.
>>>>
>>>>
>>>>
>>>> On Wed, Apr 3, 2013 at 1:04 PM, Florian Bösch <pyalot@gmail.com> wrote:
>>>>>
>>>>> I don't have any personal issue with the API style either way; if you
>>>>> say callbacks are too slow, fine, let's not do callbacks. I think that
>>>>> either API style has its pitfalls for beginners.
>>>>>
>>>>>
>>>>> On Wed, Apr 3, 2013 at 9:49 PM, Ben Vanik <benvanik@google.com> wrote:
>>>>>>
>>>>>> The way I see it, the query API would work like this in a browser such
>>>>>> as Chrome where rendering occurs in another thread (though it can be done
>>>>>> similarly for other implementations):
>>>>>>
>>>>>> - user js runs init:
>>>>>>   - createQuery()
>>>>>>     - added to command buffer to send over to the gpu process
>>>>>>     - stashed in a 'query map' on the renderer side
>>>>>> - user js runs frame:
>>>>>>   - beginQuery()
>>>>>>   - drawElements()
>>>>>>   - endQuery()
>>>>>>     - commands added to buffer, sent to gpu process
>>>>>>   - queryCounter()
>>>>>>     - returns the value of the renderer-side query object immediately (no blocking)
>>>>>> - gpu process:
>>>>>>    - run command buffer, see active timers, schedule them for processing
>>>>>>    - for each scheduled counter: query, if available then queue for sending back to renderer in a batch
>>>>>> - renderer message from gpu:
>>>>>>    - for each updated query:
>>>>>>      - find in query map, set value
>>>>>> - user js runs frame:
>>>>>>    - queryCounter()
>>>>>>      - returns the new value that was just set
>>>>>
>>>>>
>>>>> I don't understand how you could get accurate timings with just one
>>>>> query object for every frame. By the time you get to poll the value,
>>>>> multiple frames might have elapsed, but you only have one state.
>>>>> queryCounter doesn't capture the actual render time since it returns
>>>>> immediately without blocking. So wouldn't you have to allocate a new
>>>>> query object each frame? Isn't that also going to cause jerky animation?
>>>>
>>>>
>>>
>>
>
