
Re: [Public WebGL] context loss, slowdowns and aw-snap dialogs

I'm not optimistic that the browser can magically solve these issues. The browser could arbitrarily impose VRAM consumption limits, but to date we've avoided doing so because it would categorically prevent high-end content from running, even on hardware that can handle it. Querying available VRAM is an ill-posed problem for two reasons: 1) GPU vendors haven't wanted to standardize these definitions and queries, and 2) other running applications can affect the current application's behavior, at least on certain OSs that have less-than-ideal VRAM paging implementations.

As I suggested before, I think well-written applications should use a heuristic like 100 bytes of VRAM per pixel, which should scale well from mobile to desktop GPUs. Applications allocating gigabytes of VRAM are pushing the boundaries, and will probably have to query the user about whether to attempt these sorts of allocations at all.
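To make the heuristic concrete, here is a minimal sketch (the function name and structure are my own, not any standard API) of deriving a VRAM budget from output size at ~100 bytes per pixel:

```javascript
// Hypothetical helper: budget VRAM at roughly 100 bytes per output
// pixel, as suggested above. Not a real API -- just the arithmetic.
function vramBudgetBytes(width, height, bytesPerPixel = 100) {
  return width * height * bytesPerPixel;
}

// A 1920x1080 canvas gets roughly a 207 MB budget under this rule;
// a 750x1334 phone screen roughly 100 MB.
const desktopBudget = vramBudgetBytes(1920, 1080);
const phoneBudget = vramBudgetBytes(750, 1334);
```

Anything far beyond such a budget is the "gigabytes of VRAM" territory where asking the user first seems warranted.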

Concrete suggestions and code snippets which could improve browsers' behavior are welcome, as are suggestions on the reporting mechanisms browsers should request from graphics drivers during low-memory and out-of-memory situations.
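In that spirit, here is an entirely hypothetical sketch of what such a reporting mechanism might look like from the content side. The event name and its field are invented for illustration; no browser implements anything like this today:

```javascript
// HYPOTHETICAL: 'webglmemorypressure' and e.bytesToFree are invented
// for illustration only; no UA exposes such an event.
function installPressureHandler(canvas, shedResources) {
  canvas.addEventListener('webglmemorypressure', (e) => {
    // The app gets a chance to free VRAM before the hard OOM path.
    shedResources(e.bytesToFree);
  });
}
```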

On Mon, May 18, 2015 at 3:40 AM, Florian Bösch <pyalot@gmail.com> wrote:
So I started poking about context loss some more (http://codeflow.org/issues/vram/gpu-process-crash.html), and then I encountered this:

[Inline image: Ubuntu crash dialog -- "Sorry, the program "chrome" closed unexpectedly. Your computer does not have enough free memory to automatically analyze the problem and send a report to the developers."]

This was provoked by allocating 64 GB of VRAM in a single loop at window.onload (in 4K texture chunks). The tab this originated from was then dead. A second run did not exhibit this behavior, but instead managed to be somehow worse.
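The actual test lives at the codeflow.org URL above; this is just a sketch of the kind of loop described, with my own names. Each 4096×4096 RGBA8 texture costs 4096 × 4096 × 4 = 64 MiB, so 1024 iterations request 64 GiB:

```javascript
// One 4096x4096 RGBA8 texture: 4096 * 4096 * 4 bytes = 64 MiB.
const BYTES_PER_TEXTURE = 4096 * 4096 * 4;

// Sketch of the allocation loop (not the real test code): request
// `count` 4K textures and track the bytes asked for.
function allocateTextures(gl, count) {
  const textures = [];
  let bytes = 0;
  for (let i = 0; i < count; i++) {
    const tex = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_2D, tex);
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, 4096, 4096, 0,
                  gl.RGBA, gl.UNSIGNED_BYTE, null);
    textures.push(tex);
    bytes += BYTES_PER_TEXTURE;
  }
  return { textures, bytes };
}
```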

The tab was still alive, but no context restore was ever received and the "Aw, Snap" page was displayed. After pressing that three times, no WebGL was forthcoming anymore. Page reloads would not help either; the whole UA had to be restarted.

Inexplicably, the same test run in Firefox just allocated all those textures and never signaled a context loss, and WebGL even still seemed to work afterwards. I don't really know what that means; maybe somebody can explain it to me (Jeff?)

Getting somewhat miffed at this point, I decided to spread things out across animation frames so I could at least see what was done last. It turns out that FF just keeps allocating textures forever. In Chrome, at least some information was now visible (http://codeflow.org/issues/vram/index.html):

0000: start
0001: context lost
0003: vram use: 15936.0mb
0004: run interrupted

So Chrome allocated close to 16 GB of VRAM (I have a 3 GB GPU) before context loss happened. A context loss event was generated, but no out-of-memory error was ever received. No context restore happened.

So I went and tested this on my android Nexus 4 now:

This yielded:

0000: start
0001: gl.OUT_OF_MEMORY
0002: vram use: 2496.0mb
0003: run interrupted

So no context loss at all; instead an out-of-memory error was generated by getError. So I thought, well, what happens if we don't stop there? It turns out nothing: the context is never lost, it just keeps generating out-of-memory errors.

Now the same on iOS:

0000: start
0001: gl.OUT_OF_MEMORY
0002: vram use: 1024.0mb
0003: run interrupted

Ok, so how about we let it run along beyond that? It keeps generating glErrors for a while, and then: "A problem occurred with this webpage so it was reloaded". Again, no context (and no chance to recover from it).

On OS X, Chrome behaves similarly to Firefox: it just keeps allocating forever. I still don't know what that means (can anybody elaborate?)

So on to a conclusion:
  1. If you overstep VRAM boundaries you will get either a context loss or a gl.OUT_OF_MEMORY error (but not both)
  2. It may crash the GPU process/browser/tab (but not reliably)
  3. If you get a context loss, there is no context restore (so handling that event is completely useless for this case)
  4. If you get a gl.OUT_OF_MEMORY, it's possible you might recover from this, but only if you recover quickly enough, before some UA reloads your page (and it might reload anyway).
To put this under a suitable heading: COMPLETELY BONKERS.
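Given those four behaviors, about the best a defensive application can do today is something like this sketch (the helper parameters onLost/onRestored stand in for application code; only the event names and gl.OUT_OF_MEMORY come from the WebGL spec):

```javascript
// Listen for both failure modes, since which one you get is
// platform-dependent. preventDefault() requests a restore, though as
// observed above the restore event may simply never arrive.
function installContextGuards(canvas, onLost, onRestored) {
  canvas.addEventListener('webglcontextlost', (e) => {
    e.preventDefault();
    onLost();
  });
  canvas.addEventListener('webglcontextrestored', onRestored);
}

// On mobile the only signal is gl.OUT_OF_MEMORY from getError(), with
// no context-loss event at all, so poll after every large allocation.
function allocationSucceeded(gl) {
  return gl.getError() !== gl.OUT_OF_MEMORY;
}
```

That this is the state of the art is exactly the problem: it detects failure, but offers no way to prevent it or reliably recover from it.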

I have conversations with people from time to time about how many resources they can use, and how to handle running out of them. There is no answer to this question. It's just spray & pray, and don't even attempt to recover.

Do I really have to emphasize that this is in no way, shape or form a way to write reliable WebGL content? This needs a solution that is:
  • Behaves the same in every UA
  • Lets the application avoid the issue, and destroy the context itself, before it happens
  • If the context is destroyed, allows graceful recovery from it
  • Reports why the context was lost, so allocation hysteresis can be avoided
  • Does not disable WebGL entirely for the entire UA
  • Does not randomly crash the UA/GPU process/tab