
Re: [Public WebGL] WebGL perf regression tests moved to GitHub; design still in flux

On Mon, Oct 1, 2012 at 5:05 PM, Florian Bösch <pyalot@gmail.com> wrote:
On Tue, Oct 2, 2012 at 12:22 AM, Gregg Tavares (社用) <gman@google.com> wrote:
Actually I don't know what each person's goals are.

My goal was to provide a harness to be able to find out how much stuff you can draw at 60fps using different techniques. Tests that stall the pipeline with calls to gl.finish will not do that. This is especially true in Chrome with its multi-process architecture, where WebGL is just generating commands that end up getting executed in parallel in another process. Calling gl.finish stalls both processes and removes all the parallelism.
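Roughly the kind of loop I have in mind (just a sketch; drawOneThing() is a stand-in for whatever technique is under test, and the adjustment step is arbitrary):

    var itemsPerFrame = 100;                 // starting workload, arbitrary
    var lastTime = performance.now();

    function frame(now) {
      var frameMs = now - lastTime;
      lastTime = now;

      for (var i = 0; i < itemsPerFrame; ++i) {
        drawOneThing(gl, i);                 // the technique being measured
      }

      // Stay as close to the 60fps budget (~16.7ms) as possible:
      // draw more next frame if there's headroom, back off if we blew it.
      if (frameMs < 16) {
        itemsPerFrame += 10;
      } else if (frameMs > 17) {
        itemsPerFrame -= 10;
      }

      requestAnimationFrame(frame);
    }
    requestAnimationFrame(frame);

The number reported is then simply the largest itemsPerFrame that still holds 60fps, with no finish anywhere in the loop.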
Well, unless you call a stalling function (such as finish, texImage2D, bufferData, readPixels, compileShader, linkProgram, uniform**[v]) it doesn't much matter that it's off-process. The driver itself won't stall on other calls (at least in theory). And data delivered in those stalling calls will stall the other process anyway, since it has to finish reading the bits before it can let the sending process continue, lest that process delete or modify the stuff in flight (mapped buffers would actually be really nice, but then we're getting into fence territory, and dragons live there).

Anyway, if you emit no stalling calls whatsoever the rendering queue would just fill up and you'd be none the wiser at 60fps, so the browser has to finish it eventually; at the very latest that will be when the browser's GL context performs a buffer swap (assuming an accelerated compositor). So what you can do without gl.finish() is pretty much the same as what you can do with gl.finish(), but if you don't call gl.finish() you might be free (to some degree or other) to do other stuff while the GPU and the GPU process churn through the rendering.

So gl.finish() is actually a fairly good way to measure how long stuff takes. Of course it's not a terribly performant way to do things, because while you wait to see how long stuff takes you could be doing other stuff, but you get the drift.
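I.e. something along these lines (just a sketch; drawStuff() stands in for whatever GL calls you want to time):

    var start = performance.now();
    drawStuff(gl);                       // queue up the GL commands to measure
    gl.finish();                         // block until the GPU process/driver has actually executed them
    var elapsedMs = performance.now() - start;

    // Without the gl.finish() the same measurement mostly captures command
    // submission time, not the time the GPU spends doing the work.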

I'm not sure I follow what you're trying to explain. If I have 1 core I get this:

1:[S][LONG][S][LONG][S][LONG][S][LONG][S][LONG][S][LONG]

On 2 cores I get this:

1:[S][S][S][S][S][S]
2:[LONG][LONG][LONG][LONG][LONG][LONG]

In the example above I'm doing 6 short [S] and 6 long [LONG] operations, where the size of each box represents the time the operation takes to execute.

With 2 processes I can execute 4 more [S] operations, 10 in total, in the same amount of time it took 1 core to get through its 6 short and 6 long operations.

1:[S][S][S][S][S][S][S][S][S][S]
2:[LONG][LONG][LONG][LONG][LONG][LONG][LONG][LONG][LONG]

That assumes I'm not GPU bound, but if I am GPU bound then the only thing that matters is the GPU.
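For the GPU-bound case, if the EXT_disjoint_timer_query extension happens to be available (it's optional, so this is only a sketch under that assumption, with drawStuff() again a placeholder), you can ask the GPU how long the work itself took without stalling either process:

    var ext = gl.getExtension('EXT_disjoint_timer_query');
    var query = ext.createQueryEXT();

    ext.beginQueryEXT(ext.TIME_ELAPSED_EXT, query);
    drawStuff(gl);                              // the work being measured
    ext.endQueryEXT(ext.TIME_ELAPSED_EXT);

    // Poll a few frames later; the result comes back as GPU time in nanoseconds.
    function pollQuery() {
      var available = ext.getQueryObjectEXT(query, ext.QUERY_RESULT_AVAILABLE_EXT);
      var disjoint = gl.getParameter(ext.GPU_DISJOINT_EXT);
      if (available && !disjoint) {
        var gpuNs = ext.getQueryObjectEXT(query, ext.QUERY_RESULT_EXT);
        console.log('GPU time: ' + (gpuNs / 1e6) + ' ms');
      } else if (!available) {
        setTimeout(pollQuery, 16);              // not ready yet, try again later
      }
      // if disjoint is true the timing is unreliable and should be discarded
    }
    pollQuery();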