
Re: [Public WebGL] For review: ANGLE_timer_query extension



Another way around this issue is to leave the API as is, but:

(1) require that a check for result availability must return true before reading the result is guaranteed to return the correct value.

This would prevent code from assuming the result is available without actually checking that it is.

(2) require that implementations cache the result and availability status, updating them only between JavaScript events.

This would make spin-looping impossible: because the cached status would never change within a single event, any spin loop would loop forever.
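A minimal sketch of proposals (1) and (2), modelled in TypeScript without a real WebGL context. The class `CachedQuery` and its methods `deliverFromGpu`, `onEventBoundary`, `resultAvailable` and `result` are hypothetical names for illustration, not part of ANGLE_timer_query. The browser is assumed to call `onEventBoundary` between JavaScript events.

```typescript
// Hypothetical model of proposals (1) and (2); these names are not part of
// ANGLE_timer_query or any real WebGL API.
class CachedQuery {
  private pending: number | null = null; // delivered by the GPU, not yet visible to JS
  private visible: number | null = null; // what JS may observe during the current event
  private checkedAvailable = false;      // has resultAvailable() returned true?

  // Simulated GPU side: a result has arrived, but JS must not see it yet.
  deliverFromGpu(value: number): void {
    this.pending = value;
  }

  // Simulated browser: called between JavaScript events, never during one.
  onEventBoundary(): void {
    if (this.pending !== null) {
      this.visible = this.pending;
      this.pending = null;
    }
  }

  // Proposal (2): the answer is stable for the whole event, so spinning is pointless.
  resultAvailable(): boolean {
    const available = this.visible !== null;
    if (available) this.checkedAvailable = true;
    return available;
  }

  // Proposal (1): reading without a prior successful check is an error.
  result(): number {
    if (!this.checkedAvailable || this.visible === null) {
      throw new Error("result read before availability was confirmed");
    }
    return this.visible;
  }
}
```

Because the visible snapshot only changes in `onEventBoundary`, a `while (!q.resultAvailable()) {}` loop inside one event would never terminate, which is the deterrent described in (2).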






On Wed, Apr 3, 2013 at 1:24 PM, Florian Bösch <pyalot@gmail.com> wrote:
As far as I understand it, one query object corresponds to a start and an end time: two integer values in nanoseconds.

That means that you cannot use the same query object at the same place over and over, because you might overwrite the previous measurement. Until you actually have a measurement by polling that query object, you cannot use it again, correct? So you're going to allocate thousands of query objects each frame?
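One way to address the reuse concern without allocating per frame is a pool that hands a query object out again only after its result has been read. This is a sketch assuming a generic query handle `Q` and a caller-supplied factory (in a WebGL app the factory would wrap the extension's query-creation call); `QueryPool`, `acquire`, `release` and `totalCreated` are hypothetical names.

```typescript
// Hypothetical pool: a query object is handed out again only after the
// caller has read its result and released it, so no pending measurement
// can be overwritten by reuse.
class QueryPool<Q> {
  private readonly free: Q[] = [];
  private created = 0;

  // `create` would wrap the real query-creation call in a WebGL app.
  constructor(private readonly create: () => Q) {}

  acquire(): Q {
    const recycled = this.free.pop();
    if (recycled !== undefined) return recycled;
    this.created++;
    return this.create();
  }

  // Call only once the query's result has been read back.
  release(query: Q): void {
    this.free.push(query);
  }

  get totalCreated(): number {
    return this.created;
  }
}
```

With 100 timers per frame and three frames' worth of queries outstanding (the frame being recorded plus frames N-1 and N-2), the pool stops growing at 300 objects instead of allocating new ones every frame.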


On Wed, Apr 3, 2013 at 10:16 PM, Ben Vanik <benvanik@google.com> wrote:
You don't use one per frame - you use many. That's what the simple examples don't show.

A typical frame in a complex scene has many nested drawing batches - like for each pass, for each depth mode, for each shader, for each texture, for each buffer, etc. You put timers around those things - sometimes 100+. Since you'll want to support high latency, you'll want a couple of sets of these timers. For the application we're building there may be 1000+ timers in flight at any given time.
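Nested timers like these are easiest to express with timestamp queries (the extension's `queryCounter`-style counters), since elapsed-time queries on the same target generally cannot be nested in GL. The sketch below, with hypothetical names `FrameTimers` and `TimerNode`, only records the scope tree and the index of the timestamp query issued at each scope's entry and exit; issuing and reading the actual queries is assumed to happen elsewhere.

```typescript
// Hypothetical per-frame timer tree. Each scope costs two timestamp queries,
// identified here only by their index within the frame.
interface TimerNode {
  name: string;
  beginQuery: number; // index of the timestamp issued when the scope opens
  endQuery: number;   // index of the timestamp issued when the scope closes
  children: TimerNode[];
}

class FrameTimers {
  private readonly stack: TimerNode[] = [];
  private nextQuery = 0;
  readonly roots: TimerNode[] = [];

  begin(name: string): void {
    const node: TimerNode = { name, beginQuery: this.nextQuery++, endQuery: -1, children: [] };
    const parent = this.stack[this.stack.length - 1];
    // Attach to the enclosing scope, or to the top level if there is none.
    (parent ? parent.children : this.roots).push(node);
    this.stack.push(node);
  }

  end(): void {
    const node = this.stack.pop();
    if (!node) throw new Error("end() without matching begin()");
    node.endQuery = this.nextQuery++;
  }

  // Number of timestamp queries this frame needs.
  get queryCount(): number {
    return this.nextQuery;
  }
}
```

Since every scope costs two queries, a frame with a few hundred scopes and two or three frames in flight quickly reaches the 1000+ outstanding queries mentioned above.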

Just as you pipeline readback from framebuffers etc. so that you aren't blocking the GPU, you schedule your timer readback the same way -- on frame N you check whether the timers from frame N-1 or N-2 are available yet. With clever querying you can check all of them quickly -- for example, if the result of the last timer from frame N-1 is available, then you know all the timers from frame N-2 are available too, with no need to check them.
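The readback shortcut can be sketched as follows, assuming the GPU completes queries in submission order (the premise that lets one availability check stand in for many). `PendingFrame` and `collectCompleted` are hypothetical; `isAvailable` stands in for whatever availability query the extension exposes.

```typescript
interface PendingFrame<Q> {
  frame: number;
  queries: Q[]; // submission order; assumed non-empty (at least the outer frame timer)
}

// Assumes queries complete in submission order, so if the last query of
// some frame is available, every query of that frame and of all older
// frames is available too.
function collectCompleted<Q>(
  pending: PendingFrame<Q>[], // oldest frame first; completed frames are removed
  isAvailable: (query: Q) => boolean,
): PendingFrame<Q>[] {
  // Check the newest frame first: one successful check settles everything older.
  for (let i = pending.length - 1; i >= 0; i--) {
    const queries = pending[i].queries;
    const last = queries[queries.length - 1];
    if (last !== undefined && isAvailable(last)) {
      return pending.splice(0, i + 1);
    }
  }
  return [];
}
```

Under that ordering assumption, a frame with a few pending frames typically needs only one or two availability checks, however many timers are in flight.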

When it comes to getting the values out, what you actually want varies. For performance testing you may query all timers every frame. For runtime testing deployed to real user machines, most frames you may only query the outermost timer - if it says the frame took <10ms to draw (or some other threshold), you can just ignore the rest. But if it did take >Nms, you can start searching down the timer tree to find what took the time. A simple binary search can then tell you exactly what kind of operation was slow for that user and allow you to report that back to a diagnostics service, change rendering quality, or even switch rendering engines to ensure the user has the best experience.
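The threshold-then-drill-down strategy might look like this. Rather than a literal binary search, the sketch descends into the most expensive child at each level, which is one reasonable reading of the description. `LazyTimer`, `readMs` and `findHotPath` are hypothetical, and `readMs` is assumed to read an already-available result, so frames under budget only touch the outermost timer.

```typescript
interface LazyTimer {
  name: string;
  readMs: () => number; // hypothetical: reads an already-available result on demand
  children: LazyTimer[];
}

// Returns the chain of scope names from the root down to the most expensive
// leaf, or [] when the frame stayed within budget.
function findHotPath(root: LazyTimer, budgetMs: number): string[] {
  // Common case: only the outermost timer is ever read.
  if (root.readMs() <= budgetMs) return [];
  const path: string[] = [root.name];
  let node = root;
  while (node.children.length > 0) {
    // Descend into the child that took the longest.
    let worst = node.children[0];
    let worstMs = worst.readMs();
    for (let i = 1; i < node.children.length; i++) {
      const ms = node.children[i].readMs();
      if (ms > worstMs) {
        worst = node.children[i];
        worstMs = ms;
      }
    }
    path.push(worst.name);
    node = worst;
  }
  return path;
}
```

The returned path (for example pass, then shader, then draw) is the kind of summary that could be reported to a diagnostics service or used to pick a lower rendering quality.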

This kind of complex scenario is an example of one that we would like to ship but would be unable to if the overhead imposed impacted performance significantly. When building applications that try to schedule every fractional millisecond of the main JavaScript loop, any additional wasted time that's not providing value is unacceptable.



On Wed, Apr 3, 2013 at 1:04 PM, Florian Bösch <pyalot@gmail.com> wrote:
I don't have any personal issue with the API style either way; if you say callbacks are too slow, fine, let's not do callbacks. I think either API style has its pitfalls for beginners.


On Wed, Apr 3, 2013 at 9:49 PM, Ben Vanik <benvanik@google.com> wrote:
The way I see it, the query API would work like this in a browser such as Chrome where rendering occurs in another thread (though it can be done similarly for other implementations):

- user js runs init:
  - createQuery()
    - added to command buffer to send over to the gpu process
    - stashed in a 'query map' on the renderer side
- user js runs frame:
  - beginQuery()
  - drawElements()
  - endQuery()
    - commands added to buffer, sent to gpu process
  - queryCounter()
    - returns the value of the renderer-side query object immediately - no blocking
- gpu process:
  - run command buffer, see active timers, schedule them for processing
  - for each scheduled counter: query; if available, queue for sending back to renderer in a batch
- renderer message from gpu:
  - for each updated query:
    - find in query map, set value
- user js runs frame:
  - queryCounter()
    - returns the new value that was just set
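The renderer-side part of this flow can be modelled as a map that JavaScript reads without blocking and that GPU-process messages update in batches. This is a simplified sketch of the idea, not Chrome's actual implementation; `RendererQueryMap`, `applyGpuBatch` and the `null` "no result yet" convention are assumptions.

```typescript
type QueryId = number;

// Hypothetical renderer-side mirror of query state.
class RendererQueryMap {
  private readonly values = new Map<QueryId, number | null>();
  private nextId = 1;

  // createQuery(): register the query locally; the real command would also
  // be queued for the GPU process.
  createQuery(): QueryId {
    const id = this.nextId++;
    this.values.set(id, null); // no result yet
    return id;
  }

  // Non-blocking read from JS: returns whatever the last GPU message set.
  read(id: QueryId): number | null {
    const value = this.values.get(id);
    if (value === undefined) throw new Error(`unknown query ${id}`);
    return value;
  }

  // Handler for a batched "results ready" message from the GPU process.
  applyGpuBatch(batch: ReadonlyArray<[QueryId, number]>): void {
    for (const [id, value] of batch) {
      if (this.values.has(id)) this.values.set(id, value); // ignore unknown ids
    }
  }
}
```

Between two GPU messages, repeated reads from JavaScript keep returning the same value, so a single reused query object can only ever report its most recently delivered result.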

I don't understand how you could get accurate timings with just one query object reused every frame. By the time you get to poll the value, multiple frames might have elapsed, but you only have one state. queryCounter doesn't capture the actual render time, since it returns immediately without blocking. So wouldn't you have to allocate a new query object each frame? Isn't that also going to cause jerky animation?