
Re: [Public WebGL] Typed Arrays in W3C Specifications | Fwd: Updates to File API



On 5/13/10 12:51 PM, Vladimir Vukicevic wrote:
So in thinking about this more, here are a few comments/problems. A number of these are really comments about the File API itself; let me know if you want me to forward this elsewhere (or feel free to do so).

I don't think Blob makes sense as a base class for a File -- a Blob isn't a File, especially once we can talk about slicing blobs and whatnot.

The idea behind Blob is that it represents binary data too big to be read synchronously, and thus only asynchronous operations were suitable. Blob's goal was to be used asynchronously on the platform. I agree that decoupling File and Blob *may* make sense, but I'll note that:


* Slice operations
* Obtaining a URL
* Obtaining a type

were all desired use cases both for binary data -- Blobs -- *and* for Files. You can find justification for this here [1][2], but in a nutshell the use cases are:

* Pretty much anywhere you have a blob of data, you might want to hand it off to the browser, even if it wasn't a user supplied file.
** Viewing a single chapter of a book in a frame
** Slicing one episode out of a media format (DVD) and handing it to the video element; player controls start and end at episode boundaries
** Pack a number of small files to speed download (with compression), then parse them apart.
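
To make the last use case concrete, here is a minimal sketch of packing several small payloads into one buffer and parsing them apart again. The container layout (a 4-byte big-endian length before each entry) and the names pack and unpack are assumptions made up for illustration; nothing in the File API defines them. It also works on in-memory typed arrays rather than on Blobs, so it says nothing about the asynchronous access discussed in this thread.

```typescript
// Hypothetical container layout (an assumption, not from any spec):
// each entry is a 4-byte big-endian length followed by that many bytes.
function pack(parts: Uint8Array[]): Uint8Array {
  const total = parts.reduce((n, p) => n + 4 + p.length, 0);
  const out = new Uint8Array(total);
  const view = new DataView(out.buffer);
  let offset = 0;
  for (const p of parts) {
    view.setUint32(offset, p.length); // DataView writes big-endian by default
    out.set(p, offset + 4);
    offset += 4 + p.length;
  }
  return out;
}

// Assumes well-formed input produced by pack(); no validation is attempted.
function unpack(packed: Uint8Array): Uint8Array[] {
  const view = new DataView(packed.buffer, packed.byteOffset, packed.byteLength);
  const parts: Uint8Array[] = [];
  let offset = 0;
  while (offset < packed.byteLength) {
    const len = view.getUint32(offset);
    // subarray returns a view onto the same memory, not a copy
    parts.push(packed.subarray(offset + 4, offset + 4 + len));
    offset += 4 + len;
  }
  return parts;
}
```

Each entry returned by unpack is the in-memory analogue of a slice: a window onto the original bytes that could then be handed to whatever consumes it.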


In order for the URLs on these blobs to be useful, they'd have to have mediaType.

See also FileWriter and BlobBuilder [3].
But continuing that thought, then it doesn't make sense for a Blob to have a type or url -- what does it mean to have a "type" for a fragment of data read from the network? Or a URL? I think as Chris was getting at on the call this morning, we're really just talking about a bare ArrayBuffer when we talk about such a chunk.

The WebGL use case may not be served by Blobs, actually, but by "bare ArrayBuffers." There's no need for Blobs to intermediate here, but hopefully I've represented that "types" might be useful. Google wants Content-Dispositions on Blobs as well, but I disagree with this (perhaps because the Chrome download manager may be distinct from other browsers, and "triggering" Blob download with a Content-Disposition might be a valid use case here).
  But, a Blob can be useful when allowing access using different data types.

In my thinking, a blob would look like this:

interface Blob {
readonly long offset; // offset from the start of the stream, -1 if not available
readonly unsigned long size;

Fine so far, but:
// if the implementation has Typed Arrays:
readonly ArrayBuffer buffer;

Stipulation: Blobs *must* be accessed asynchronously! Do you disagree that Blobs should only behave asynchronously? If you want *synchronous* Blobs, along with *asynchronous* Files, then we should separate the two and not have File inherit from Blob.
// for all implementations
readonly DOMString binaryString;
readonly DOMString text;
readonly DOMString dataURL;
DOMString getAsTextWithEncoding(in DOMString encoding);

Asynchronous!
// the above could also be "ArrayBuffer asArrayBuffer();" "DOMString asBinaryString();" etc. so
// that we don't need the encoding bit as a separate function

Blob slice(unsigned long startIndex, unsigned long endIndex);
};

Now -- /if/ we were to make Typed Arrays a requirement for File API (which I don't think we can), then we could consider adding ways to convert from an ArrayBuffer to a binary string, data URL, etc. and not need any of the above, though even then having the offset would be handy when you have a lot of reads in flight.
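
As a rough illustration of the conversions mentioned above, the following sketch turns an ArrayBuffer into a binary string and into a base64 data URL. The function names toBinaryString, toBase64 and toDataURL are hypothetical helpers, not part of Typed Arrays or the File API, and the base64 encoder is written out by hand only to keep the sketch self-contained.

```typescript
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// One character per byte, each with a code point in the range 0..255.
function toBinaryString(buffer: ArrayBuffer): string {
  const bytes = new Uint8Array(buffer);
  let s = "";
  for (let i = 0; i < bytes.length; i++) s += String.fromCharCode(bytes[i]);
  return s;
}

// Plain base64 with "=" padding, three input bytes per four output characters.
function toBase64(buffer: ArrayBuffer): string {
  const bytes = new Uint8Array(buffer);
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const has1 = i + 1 < bytes.length;
    const has2 = i + 2 < bytes.length;
    const b0 = bytes[i];
    const b1 = has1 ? bytes[i + 1] : 0;
    const b2 = has2 ? bytes[i + 2] : 0;
    out += B64.charAt(b0 >> 2);
    out += B64.charAt(((b0 & 3) << 4) | (b1 >> 4));
    out += has1 ? B64.charAt(((b1 & 15) << 2) | (b2 >> 6)) : "=";
    out += has2 ? B64.charAt(b2 & 63) : "=";
  }
  return out;
}

// mediaType is supplied by the caller; a bare buffer carries no type of its own.
function toDataURL(buffer: ArrayBuffer, mediaType: string): string {
  return "data:" + mediaType + ";base64," + toBase64(buffer);
}
```

Note that toDataURL has to be told the media type, which is the same "does a bare chunk of data have a type?" question raised earlier in this message.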

A File would look like:

interface File {
readonly DOMString type;
readonly DOMString uri;
readonly DOMString name;
readonly unsigned long long length;
Yes, this is what File *does* look like, but it follows an inheritance model.

};

with no attachment to a Blob; just an object that represents a File, obtained from an <input> or other element.

FileReader (which, I'll be honest, doesn't really make much sense to me -- why do we need a separate object to read from Files, as opposed to reading using the File directly? But, ok, that's not relevant here, and I guess it does isolate reading.)

FileReader exists to keep File objects separate from the act of reading them. This was an API choice, born out of LOTS of discussion (see for example [4] and follow the thread -- the model changed substantially). In an early draft, File objects exposed asynchronous, *callback*-based read methods directly *on* the File object. This changed to FileReader + Events after enormous discussion on the public-webapps WG listserv.
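
For readers unfamiliar with the "reader object plus events" shape, here is a toy sketch of it. SketchReader, its onload handler, and the tasks queue are all hypothetical: the queue merely stands in for the browser event loop so the ordering is visible, and none of this is the actual FileReader interface.

```typescript
// Toy task queue standing in for the browser event loop (an assumption
// for illustration only).
const tasks: Array<() => void> = [];

function runTasks(): void {
  while (tasks.length > 0) {
    const task = tasks.shift();
    if (task) task();
  }
}

// Hypothetical reader: the read is requested on the reader, not on the
// object that represents the file, and completion is signalled by an event.
class SketchReader {
  result: Uint8Array | null = null;
  onload: (() => void) | null = null;

  read(source: Uint8Array): void {
    // Completion is deferred, so the caller cannot see the result synchronously.
    tasks.push(() => {
      this.result = source.slice();
      if (this.onload) this.onload();
    });
  }
}
```

The point of the shape is that the result stays null until the load handler has fired; the object describing the file never needs read methods of its own.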
interface FileReader {
void read(in File file, [optional] in unsigned long long startOffset, [optional] in unsigned long long length);

readonly attribute Blob result;
};

How does the read behave? Is there an event model associated with FileReader on the main thread?
interface FileReaderSync {
Blob read(in File file, [optional] in unsigned long long startOffset, [optional] in unsigned long long length);
};

Note that the above explicitly takes File elements as input -- it's a FileReader after all -- and has Blobs as the result from the read operation.

That seems like a much cleaner separation to me -- you have File objects that have associated name/uri/type/etc. You use a FileReader to read a Blob from a file; then you can ask that Blob to give you the data that was read in one of a number of representations. No need to decide up front how you want to read a file -- with the current API it would be hard to do something like charset detection... you wouldn't be able to read something as a binary string first, try to guess a charset, and then ask for it again with an encoding without doing another FileReader, even though you already have the data.
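
The "read once, then ask for several representations" idea can be sketched as follows. This is only an illustration under loud assumptions: the function names are made up, the "detection" is nothing more than a byte-order-mark check, and only Latin-1 and BOM-prefixed UTF-16 are handled. It is not how any proposed Blob interface is specified.

```typescript
type Encoding = "latin1" | "utf-16le" | "utf-16be";

// One character per byte; this is also the Latin-1 interpretation.
function asBinaryString(bytes: Uint8Array): string {
  let s = "";
  for (let i = 0; i < bytes.length; i++) s += String.fromCharCode(bytes[i]);
  return s;
}

// Deliberately naive guess: look only at a leading byte-order mark.
function guessEncoding(binary: string): Encoding {
  const b0 = binary.charCodeAt(0);
  const b1 = binary.charCodeAt(1);
  if (b0 === 0xff && b1 === 0xfe) return "utf-16le";
  if (b0 === 0xfe && b1 === 0xff) return "utf-16be";
  return "latin1";
}

// Re-interprets the same bytes; no second read is needed.
function asText(bytes: Uint8Array, encoding: Encoding): string {
  if (encoding === "latin1") return asBinaryString(bytes);
  let s = "";
  // Start at 2 to skip the BOM that guessEncoding() relied on.
  for (let i = 2; i + 1 < bytes.length; i += 2) {
    const lo = encoding === "utf-16le" ? bytes[i] : bytes[i + 1];
    const hi = encoding === "utf-16le" ? bytes[i + 1] : bytes[i];
    s += String.fromCharCode((hi << 8) | lo);
  }
  return s;
}
```

A caller would take the bytes from one completed read, call guessEncoding(asBinaryString(bytes)), and then call asText(bytes, guess) on the same bytes.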

What you say above is true. But actually, what's the use case for charset detection if you *don't* want the File read as text? In the existing model, you can do charset detection if you read as text.
Given interfaces like the above, putting Blob to work for XHR and even structured storage seems very straightforward -- for XHR, you'd have a responseBlob property in the result. There'd be some overlap, for example responseBlob.text is likely to be the same as responseText (I don't know the details, so don't know if they're specified differently), but that's not an issue.

(responseBlob is still a proposal being hashed out on XHR). Under the current design, it *MUST* be accessed asynchronously. We can revisit that if need be, and I'm amenable to changing the inheritance model.


-- A*
[1] http://lists.w3.org/Archives/Public/public-webapps/2010AprJun/0659.html
[2] http://www.mail-archive.com/public-webapps@w3.org/msg06137.html
[3] http://dev.w3.org/2009/dap/file-system/file-writer.html
[4] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0576.html