My current solution uses XHR2 (XMLHttpRequest Level 2) with the response delivered as an ArrayBuffer. The ArrayBuffer is a binary sequence that carries multipart content: video, audio, graphics, text, and so on, each part with its own content type. Everything arrives in one response.
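Modern browsers provide DataView and Blob for handling the different components. For text parts there is StringView (a helper library published on MDN, not a built-in) or the native TextDecoder. See http://rolfrost.de/video.html for more details.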
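To make the idea concrete, here is a minimal sketch of splitting such an ArrayBuffer into its parts. The framing is an assumption made only for illustration and is not the format used on the linked page: each part is a big-endian uint32 type length, the content type as UTF-8, a big-endian uint32 body length, then the body bytes. The function names (parseParts, partToBlob, partToText, encodeParts, fetchParts) are hypothetical, and TextDecoder stands in for StringView so the sketch has no library dependency.

```javascript
// ASSUMED framing (hypothetical, for illustration only):
//   uint32 typeLength | type (UTF-8) | uint32 bodyLength | body bytes
// Lengths are big-endian, which is the DataView default.

// Split one ArrayBuffer into [{ type, body }], where body is an ArrayBuffer.
function parseParts(buffer) {
  const view = new DataView(buffer);
  const decoder = new TextDecoder("utf-8");
  const parts = [];
  let offset = 0;
  while (offset < buffer.byteLength) {
    const typeLength = view.getUint32(offset);
    offset += 4;
    const type = decoder.decode(new Uint8Array(buffer, offset, typeLength));
    offset += typeLength;
    const bodyLength = view.getUint32(offset);
    offset += 4;
    if (offset + bodyLength > buffer.byteLength) {
      throw new RangeError("truncated part body");
    }
    // slice() copies the bytes, so each part is independent of the response.
    parts.push({ type, body: buffer.slice(offset, offset + bodyLength) });
    offset += bodyLength;
  }
  return parts;
}

// Binary components (video, audio, images): wrap the bytes in a typed Blob.
function partToBlob(part) {
  return new Blob([part.body], { type: part.type });
}

// Text components: decode the bytes as UTF-8.
function partToText(part) {
  return new TextDecoder("utf-8").decode(part.body);
}

// Helper for trying the parser without a server: builds a buffer in the
// assumed framing from [{ type, body }], where body is a Uint8Array.
function encodeParts(parts) {
  const encoder = new TextEncoder();
  const encoded = parts.map((p) => ({
    type: encoder.encode(p.type),
    body: new Uint8Array(p.body),
  }));
  const total = encoded.reduce(
    (n, p) => n + 8 + p.type.length + p.body.length,
    0
  );
  const out = new Uint8Array(total);
  const view = new DataView(out.buffer);
  let offset = 0;
  for (const p of encoded) {
    view.setUint32(offset, p.type.length);
    offset += 4;
    out.set(p.type, offset);
    offset += p.type.length;
    view.setUint32(offset, p.body.length);
    offset += 4;
    out.set(p.body, offset);
    offset += p.body.length;
  }
  return out.buffer;
}

// Browser only: request the whole response as an ArrayBuffer via XHR2.
function fetchParts(url, onParts) {
  const xhr = new XMLHttpRequest();
  xhr.open("GET", url);
  xhr.responseType = "arraybuffer";
  xhr.onload = () => onParts(parseParts(xhr.response));
  xhr.send();
}
```

In a browser, a video part could then be played with something like `videoElement.src = URL.createObjectURL(partToBlob(part))`, while text parts go through partToText. Whatever the real framing is, the division of labour should stay the same: DataView reads the lengths, Blob carries the binary components, and a decoder handles the text.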