
Re: [Public WebGL] Typed Arrays in W3C Specifications | Fwd: Updates to File API



On Thu, May 13, 2010 at 5:09 PM, Vladimir Vukicevic
<vladimir@mozilla.com> wrote:
>
> Just to follow up -- my proposal assumes that Blobs and Files are decoupled,
> and that Blobs just represent a chunk of data that has already been
> read/received, and not something from which you would read or receive.  The
> operations on a Blob in my email are all synchronous because there's no IO
> -- the data is already there, it's already been read.  You're just choosing
> the format to receive it in.
>
> FileReader behaves identically to how it's specified now -- the only
> difference is that you don't choose how to read the data ahead of time, you
> always read a Blob object and then you can perform operations on that blob
> to access the data.  You'll note that FileReader in the draft has a
> DOMString result, because DOMString is the only type that you could read,
> even if you generated that DOMString in different ways.  To shoehorn
> arraybuffers in there, you'd have to add "ArrayBuffer resultBuffer;" as
> well, and make "result" invalid when "readAsArrayBuffer" is called.
>
> Part of the confusion here is a naming and usage issue; my fault there, as
> to me a Blob is an arbitrary chunk of unstructured data... thinking of a
> file as a blob was just not connecting in my head.  So, change all instances
> of Blob to something like DataChunk in my proposal.  Then you have:
>
>
> interface DataChunk {
>    // offset from the start of the stream, -1 if not available
>    readonly long offset;
>    readonly unsigned long length;
>
>    // if the implementation has Typed Arrays:
>    readonly ArrayBuffer buffer;
>
>    // for all implementations
>    readonly DOMString binaryString;
>    readonly DOMString text;
>    readonly DOMString dataURL;
>    DOMString getAsTextWithEncoding(in DOMString encoding);
>
>    // the above could also be "ArrayBuffer asArrayBuffer();",
>    // "DOMString asBinaryString();", etc. so that we don't need the
>    // encoding bit as a separate function
>
>    // slice to access sub-chunks of this data, when access as text or
>    // dataURL is desired.
>    DataChunk slice(unsigned long startIndex, unsigned long endIndex);
> };
>
> With the renaming, a File can inherit from a Blob interface if that's really
> desired (still doesn't make sense to me, especially if you want to reuse
> Blob for other things -- you can't represent a network stream as a Blob,
> because it's not a finite thing, for example.  But it doesn't affect this
> proposal).
>
> interface File {
>    readonly DOMString type;
>    readonly DOMString uri;
>    readonly DOMString name;
>    readonly unsigned long long length;
> };
>
> The FileReader is greatly simplified -- note that originally I omitted all
> the event stuff for clarity, but that probably muddled things more -- that
> model is still identical.  With only a single read() call, you can also get
> rid of the awkward bits around only the last "read..." method actually
> taking effect.
>
> interface FileReader {
>   void read(in File file,
>             [optional] in unsigned long long startOffset,
>             [optional] in unsigned long long length);
>
>   readonly attribute DataChunk result;

In this API style, would it be better to make the DataChunk a property
of the onprogress event rather than the FileReader? That would make
the FileReader more stateless.

-Ken
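
To make the question concrete, here is a minimal TypeScript sketch of the event-carried variant. It is only an illustration under stated assumptions: SketchDataChunk, ChunkProgressEvent, EventChunkReader, and the chunkSize parameter are hypothetical names that appear in no draft, and the "read" runs synchronously over an in-memory array to keep the example short.

```typescript
// Hypothetical names throughout; none of this comes from the File API draft.

// Stand-in for the proposed DataChunk, reduced to raw bytes.
interface SketchDataChunk {
  readonly offset: number; // offset from the start of the source
  readonly length: number;
  readonly bytes: Uint8Array;
}

// Progress event that carries the chunk itself, so the reader keeps no result.
interface ChunkProgressEvent {
  readonly loaded: number; // total bytes delivered so far
  readonly total: number;
  readonly chunk: SketchDataChunk;
}

class EventChunkReader {
  onprogress: ((e: ChunkProgressEvent) => void) | null = null;
  onload: (() => void) | null = null;

  // Simulated read: delivers chunkSize bytes per progress event.
  // A real reader would be asynchronous; this loop is synchronous for brevity.
  read(source: Uint8Array, chunkSize: number): void {
    if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
    for (let offset = 0; offset < source.length; offset += chunkSize) {
      const bytes = source.slice(offset, offset + chunkSize);
      const chunk: SketchDataChunk = { offset, length: bytes.length, bytes };
      this.onprogress?.({ loaded: offset + bytes.length, total: source.length, chunk });
    }
    this.onload?.();
  }
}

// Usage: the handler receives each chunk directly from the event.
const sourceBytes = new Uint8Array([1, 2, 3, 4, 5]);
const chunkReader = new EventChunkReader();
const seenChunks: SketchDataChunk[] = [];
let loadCount = 0;
chunkReader.onprogress = (e) => { seenChunks.push(e.chunk); };
chunkReader.onload = () => { loadCount += 1; };
chunkReader.read(sourceBytes, 2);
```

The reader object holds no data between events, which is the "more stateless" property in question; the cost is that a handler that wants the whole file has to accumulate the chunks itself.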

>   //
>   // all stuff below is identical to what's already in the
>   // proposed spec and behaves the same
>   //
>
>   void abort();
>
>   // states
>   const unsigned short EMPTY = 0;
>   const unsigned short LOADING = 1;
>   const unsigned short DONE = 2;
>
>   readonly attribute unsigned short readyState;
>
>   readonly attribute FileError error;
>
>   // event handler attributes
>   attribute Function onloadstart;
>   attribute Function onprogress;
>   attribute Function onload;
>   attribute Function onabort;
>   attribute Function onerror;
>   attribute Function onloadend;
>
> };
>
> interface FileReaderSync {
>     DataChunk read(in File file,
>                    [optional] in unsigned long long startOffset,
>                    [optional] in unsigned long long length);
> };
>
>
> Does that make more sense?  There's no weirdness about how to stick an
> ArrayBuffer on Blob -- now that I understand the original Blob usage more, I
> don't think that makes sense given that reads have to be async.  Also, the
> XHR case becomes a matter of adding a "responseDataChunk" property, and
> WebSockets can easily also add a DataChunk to the message received events.
>
> In the future, if a new data representation were to be added, adding it to
> DataChunk immediately makes it available in all APIs that use DataChunk -- I
> don't think we'll need one for raw data, since ArrayBuffer is as low level
> as it gets... but you can imagine a crazy world where, for example, you can
> read some kind of raw bytecode that a VM inside the browser can interpret...
> then adding a "executableByteCodeFunction" property to DataChunk immediately
> makes it possible to read and execute that byte code from a file, network
> socket, or XHR without any changes to those APIs.  (Yes, I realize it's a
> crazy example, please don't focus on the example itself too much, but more
> the idea of adding a new data type :-)
>
>     - Vlad
>
> ----- "Arun Ranganathan" <arun@mozilla.com> wrote:
>
> On 5/13/10 12:51 PM, Vladimir Vukicevic wrote:
>> So in thinking about this more, here are a few comments/problems. A number
>> of these are really comments about the File API itself; let me know if you
>> want me to forward this elsewhere (or feel free to do so).
>>
>> I don't think Blob makes sense as a base class for a File -- a Blob isn't
>> a File, especially once we can talk about slicing blobs and whatnot.
>
> The idea behind Blob is that it represents binary data too big to be
> read synchronously, and thus only asynchronous operations were
> suitable.  Blob's goal was to be used asynchronously on the platform.  I
> agree that decoupling File and Blob *may* make sense, but I'll note that:
>
> * Slice operations
> * Obtaining a URL
> * Obtaining a type
>
> were all desired use cases both for binary data -- Blobs -- *and*
> for Files.  You can find justification for this here [1][2], but in a
> nutshell the use cases are:
>
> * Pretty much anywhere you have a blob of data, you might want to hand
> it off to the browser, even if it wasn't a user supplied file.
> ** Viewing a single chapter of a book in a frame
> ** Slicing one episode out of a media format (DVD) and handing it to the
> video element; player controls start and end at episode boundaries
> ** Pack a number of small files to speed download (with compression),
> then parse them apart.
>
> In order for the URLs on these blobs to be useful, they'd have to have a
> mediaType.
>
> See also FileWriter and BlobBuilder [3].
>> But continuing that thought, then it doesn't make sense for a Blob to have
>> a type or url -- what does it mean to have a "type" for a fragment of data
>> read from the network? Or a URL? I think as Chris was getting at on the call
>> this morning, we're really just talking about a bare ArrayBuffer when we
>> talk about such a chunk.
>
> The WebGL use case may not be served by Blobs, actually, but by "bare
> ArrayBuffers."  There's no need for Blobs to intermediate here, but
> hopefully I've represented that "types" might be useful.  Google wants
> Content-Dispositions on Blobs as well, but I disagree with this (perhaps
> because the Chrome download manager may be distinct from other browsers,
> and "triggering" Blob download with a Content-Disposition might be a
> valid use case here).
>>   But, a Blob can be useful when allowing access using different data
>> types.
>>
>> In my thinking, a blob would look like this:
>>
>> interface Blob {
>> // offset from the start of the stream, -1 if not available
>> readonly long offset;
>> readonly unsigned long size;
>>
>
> Fine so far, but:
>> // if the implementation has Typed Arrays:
>> readonly ArrayBuffer buffer;
>>
>
> Stipulation: Blobs *must* be accessed asynchronously!  Do you disagree
> that Blobs should only behave asynchronously?  If you want *synchronous*
> Blobs, along with *asynchronous* Files, then we should separate the two
> and not have File inherit from Blob.
>> // for all implementations
>> readonly DOMString binaryString;
>> readonly DOMString text;
>> readonly DOMString dataURL;
>> DOMString getAsTextWithEncoding(in DOMString encoding);
>>
>
> Asynchronous!
>> // the above could also be "ArrayBuffer asArrayBuffer();",
>> // "DOMString asBinaryString();", etc. so that we don't need the
>> // encoding bit as a separate function
>>
>> Blob slice(unsigned long startIndex, unsigned long endIndex);
>> };
>>
>> Now -- /if/ we were to make Typed Arrays a requirement for File API (which
>> I don't think we can), then we could consider adding ways to convert from an
>> ArrayBuffer to a binary string, data URL, etc. and not need any of the
>> above, though even then having the offset would be handy when you have a lot
>> of reads in flight.
>>
>> A File would look like:
>>
>> interface File {
>> readonly DOMString type;
>> readonly DOMString uri;
>> readonly DOMString name;
>> readonly unsigned long long length;
>>
> Yes, this is what File *does* look like, but it follows an inheritance
> model.
>
>> };
>>
>> with no attachment to a Blob; just an object that represents a File,
>> obtained from an <input> or other element.
>>
>> FileReader (which, I'll be honest, doesn't really make much sense to me --
>> why do we need a separate object to read from Files, as opposed to reading
>> using the File directly? But, ok, that's not relevant here, and I guess it
>> does isolate reading.)
>>
>
> FileReader exists to keep File objects separate from the act of reading a
> file.  This was an API choice, born out of LOTS of discussion (see
> for example [4] and follow the thread -- the model changed
> substantially).  In an early draft, File objects fired asynchronous
> *callback* based read methods which existed *on* the File object.  This
> changed to FileReader + Events after enormous discussion on the
> public-webapps WG listserv.
>> interface FileReader {
>> void read(in File file, [optional] in unsigned long long startOffset,
>> [optional] in unsigned long long length);
>>
>> readonly attribute Blob result;
>> };
>>
>
> How does the read behave?  Is there an event model associated with
> FileReader on the main thread?
>> interface FileReaderSync {
>> Blob read(in File file, [optional] in unsigned long long startOffset,
>> [optional] in unsigned long long length);
>> };
>>
>> Note that the above explicitly takes File elements as input -- it's a
>> FileReader after all -- and has Blobs as the result from the read operation.
>>
>> That seems like a much cleaner separation to me -- you have File objects
>> that have associated name/uri/type/etc. You use a FileReader to read a Blob
>> from a file; then you can ask that Blob to give you the data that was read
>> in one of a number of representations. No need to decide up front how you
>> want to read a file -- with the current API it would be hard to do something
>> like charset detection... you wouldn't be able to read something as a binary
>> string first, try to guess a charset, and then ask for it again with an
>> encoding without doing another FileReader, even though you already have the
>> data.
>>
>
> What you say above is true.  But actually, what's the use case for
> charset detection if you *don't* want the File read as text?  In the
> existing model,  you can do charset detection if you read as text.
>> Given interfaces like the above, putting Blob to work for XHR and even
>> structured storage seems very straightforward -- for XHR, you'd have a
>> responseBlob property in the result. There'd be some overlap, for example
>> responseBlob.text is likely to be the same as responseText (I don't know the
>> details, so don't know if they're specified differently), but that's not an
>> issue.
>>
>
> (responseBlob is still a proposal being hashed out on XHR).  Under the
> current design, it *MUST* be accessed asynchronously.  We can revisit
> that if need be, and I'm amenable to changing the inheritance model.
>
> -- A*
> [1] http://lists.w3.org/Archives/Public/public-webapps/2010AprJun/0659.html
> [2] http://www.mail-archive.com/public-webapps@w3.org/msg06137.html
> [3] http://dev.w3.org/2009/dap/file-system/file-writer.html
> [4] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0576.html
>
>
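
The charset-detection point in the quoted proposal (look at the already-read data as a binary string, guess a charset, then ask for the same data again as text without a second read) can be sketched in TypeScript as follows. This is a rough illustration, not the proposed interface: SketchChunk, SketchEncoding, getAsText, and guessEncoding are hypothetical names, only two encodings are handled, and the "detection" is just a check for a UTF-16LE byte-order mark.

```typescript
// Hypothetical sketch; these names are illustrative and not part of any spec.
type SketchEncoding = "latin1" | "utf-16le";

class SketchChunk {
  constructor(private readonly bytes: Uint8Array, readonly offset: number = 0) {}

  get length(): number {
    return this.bytes.length;
  }

  // One character per byte, in the spirit of the draft's "binary string".
  get binaryString(): string {
    let out = "";
    for (let i = 0; i < this.bytes.length; i++) {
      out += String.fromCharCode(this.bytes[i]);
    }
    return out;
  }

  // Synchronous: the bytes are already in memory, so no further IO happens.
  getAsText(encoding: SketchEncoding): string {
    if (encoding === "latin1") return this.binaryString;
    // utf-16le: combine byte pairs into 16-bit code units; a trailing odd byte is ignored.
    let out = "";
    for (let i = 0; i + 1 < this.bytes.length; i += 2) {
      out += String.fromCharCode(this.bytes[i] | (this.bytes[i + 1] << 8));
    }
    return out;
  }

  // Sub-chunk over [start, end); an unknown offset (-1) stays unknown.
  slice(start: number, end: number): SketchChunk {
    const offset = this.offset < 0 ? -1 : this.offset + start;
    return new SketchChunk(this.bytes.slice(start, end), offset);
  }
}

// Guess the encoding from a UTF-16LE byte-order mark (FF FE) at the start.
function guessEncoding(chunk: SketchChunk): SketchEncoding {
  return chunk.binaryString.slice(0, 2) === "\u00ff\u00fe" ? "utf-16le" : "latin1";
}

// "Hi" in UTF-16LE, preceded by a byte-order mark.
const sample = new SketchChunk(new Uint8Array([0xff, 0xfe, 0x48, 0x00, 0x69, 0x00]));
const guessed = guessEncoding(sample);
const decoded = sample.slice(2, sample.length).getAsText(guessed);
```

Both views are computed from the same in-memory bytes, which is why the proposal can treat them as synchronous; whether that is acceptable for data "too big to be read synchronously" is the open disagreement in the thread.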
