CNProj Final Report
From the early to late 2000s, video playback on the web mostly relied on the Flash plugin.
This was because, at the time, there was no other means to stream video in a browser. Users had
the choice between either installing third-party plugins like Flash or Silverlight, or not being able
to play any video at all.
To fill that hole, the WHATWG began to work on a new version of the HTML standard
including, among other things, native video and audio playback.
This standard became what is now known as HTML5. HTML5 thus brought, among other things,
the <video> tag to the web. This new tag allows you to link to a video directly from the HTML,
much like an <img> tag does for an image.
For example:
<html>
  <head>
    <meta charset="UTF-8">
    <title>My Video</title>
  </head>
  <body>
    <video src="some_video.mp4" width="1280" height="720"></video>
  </body>
</html>
This HTML will allow the page to stream some_video.mp4 directly on any browser that supports
the corresponding codecs (and HTML5, of course).
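Whether a given browser can actually play a given format can be probed from JavaScript with the canPlayType method of media elements. A small sketch (the MIME/codec string below is just an example):
// returns "", "maybe" or "probably" depending on browser support
const probe = document.createElement("video");
console.log(probe.canPlayType('video/mp4; codecs="avc1.64001e, mp4a.40.2"'));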
The “Media Source Extensions” (more often shortened to just “MSE”) is a specification from the
W3C that most browsers implement today. It was created to allow more complex media use cases,
such as adaptive streaming, directly with HTML and JavaScript.
Those “extensions” add the MediaSource object to JavaScript. As its name suggests, this will be
the source of the video, or put more simply, this is the object representing our video’s data.
As described before, we still use the HTML5 video tag, and its src attribute. Only this time, instead
of adding a link to the video, we're adding a link to the MediaSource object.
To allow this kind of use case, the W3C defined the URL.createObjectURL static method. This
API allows creating a URL which will actually refer not to a resource available online, but
directly to a JavaScript object created on the client.
// creating the MediaSource, just with the "new" keyword, and the URL for it
const myMediaSource = new MediaSource();
const url = URL.createObjectURL(myMediaSource);
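This URL can then be set as the src of the video tag. A minimal sketch, assuming the page contains a single <video> element like in the HTML example above:
// attach the MediaSource to the video tag through its object URL
const videoTag = document.querySelector("video");
videoTag.src = url;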
The video is not actually “pushed” directly into the MediaSource for playback; SourceBuffers are
used for that.
A MediaSource contains one or multiple instances of those, each being associated with a type of
content: audio or video.
SourceBuffers are all linked to a single MediaSource and each will be used to add our video’s
data to the HTML5 video tag directly in JavaScript.
As an example, a frequent use case is to have two source buffers on our MediaSource: one for the
video data, and the other for the audio.
Separating video and audio also allows managing them separately on the server side, which brings
several advantages as well. This is how it works in practice.
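A minimal sketch of how this could look, assuming fragmented MP4 content, with hypothetical codec strings and hypothetical file names (audio.mp4 and video.mp4):
let audioSourceBuffer;
let videoSourceBuffer;

// SourceBuffers can only be added once the MediaSource is "open",
// which happens after it has been attached to the video tag
myMediaSource.addEventListener("sourceopen", () => {
  // 1 - create one SourceBuffer per type of content
  // (the MIME/codec strings must match how the media was encoded)
  audioSourceBuffer = myMediaSource
    .addSourceBuffer('audio/mp4; codecs="mp4a.40.2"');
  videoSourceBuffer = myMediaSource
    .addSourceBuffer('video/mp4; codecs="avc1.64001e"');

  // 2 - download each content and push it into its SourceBuffer
  fetch("http://my-server/audio.mp4")
    .then((response) => response.arrayBuffer())
    .then((arrayBuffer) => audioSourceBuffer.appendBuffer(arrayBuffer));

  fetch("http://my-server/video.mp4")
    .then((response) => response.arrayBuffer())
    .then((arrayBuffer) => videoSourceBuffer.appendBuffer(arrayBuffer));
});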
Media Segments
In the previous code snippets, we assumed that the audio and video content came as two whole
files, pushed completely at once.
What actually happens in advanced video players is that video and audio data are split into
multiple “segments”. These segments can come in various sizes, but they often represent between
2 and 10 seconds of content.
All those video/audio segments then form the complete video/audio content. Those “chunks” of
data add a whole new level of flexibility: instead of pushing the whole content at once, we can
simply push multiple segments progressively.
The audio or video files might not truly be segmented on the server side; the Range HTTP header
might instead be used by the client to obtain those files in a segmented way.
This means that we thankfully do not have to wait for the whole audio or video content to be
downloaded to begin playback. We often just need the first segment of each.
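As a sketch, assuming hypothetical segment URLs and the audioSourceBuffer / videoSourceBuffer created earlier, starting playback could then simply mean pushing the first segment of each type:
// push a single segment, identified by its URL, into a SourceBuffer
function pushSegment(url, sourceBuffer) {
  return fetch(url)
    .then((response) => response.arrayBuffer())
    .then((arrayBuffer) => sourceBuffer.appendBuffer(arrayBuffer));
}

// only the first segment of each type is needed to begin playback
pushSegment("http://my-server/audio/segment0.mp4", audioSourceBuffer);
pushSegment("http://my-server/video/segment0.mp4", videoSourceBuffer);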
Adaptive Streaming
Many video players have an “auto quality” feature, where the quality is automatically chosen
depending on the user’s network and processing capabilities. This behavior, called adaptive
streaming, is a central concern of any web player, and it is also enabled thanks to the concept of
media segments.
On the server-side, the segments are actually encoded in multiple qualities. For example, our
server could have the following files stored:
./audio/
├── ./128kbps/
|   ├── segment0.mp4
|   ├── segment1.mp4
|   └── segment2.mp4
└── ./320kbps/
    ├── segment0.mp4
    ├── segment1.mp4
    └── segment2.mp4

./video/
├── ./240p/
|   ├── segment0.mp4
|   ├── segment1.mp4
|   └── segment2.mp4
└── ./720p/
    ├── segment0.mp4
    ├── segment1.mp4
    └── segment2.mp4
A web player will then automatically choose the right segments to download as the network or
CPU conditions change.
This is entirely done in JavaScript. For audio segments, it could, for example, look like the
following:
/**
 * Push audio segment in the source buffer based on its number
 * and quality
 * @param {number} nb
 * @param {string} wantedQuality
 * @returns {Promise}
 */
function pushAudioSegment(nb, wantedQuality) {
  // The url begins to be a little more complex here:
  const url = "http://my-server/audio/" +
    wantedQuality + "/segment" + nb + ".mp4";
  return fetch(url)
    .then((response) => response.arrayBuffer())
    .then((arrayBuffer) => {
      audioSourceBuffer.appendBuffer(arrayBuffer);
    });
}
/**
* Translate an estimated bandwidth to the right audio
* quality as defined on server-side.
* @param {number} bandwidth
* @returns {string}
*/
function fromBandwidthToQuality(bandwidth) {
return bandwidth > 320e3 ? "320kbps" : "128kbps";
}
// first estimate the bandwidth. Most often, this is based on
// the time it took to download the last segments
const bandwidth = estimateBandwidth();
const quality = fromBandwidthToQuality(bandwidth);
pushAudioSegment(0, quality)
.then(() => pushAudioSegment(1, quality))
.then(() => pushAudioSegment(2, quality));
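The estimateBandwidth function is left undefined above. As a rough, hypothetical sketch of the idea hinted at in its comment (real players usually smooth such measurements over several downloads), the estimate could be derived like this:
/**
 * Hypothetical helper: derive a bandwidth estimate, in bits per
 * second, from the size of the last downloaded segment and the
 * time its download took.
 * @param {number} byteLength - size of the last segment, in bytes
 * @param {number} durationMs - download time, in milliseconds
 * @returns {number}
 */
function estimateBandwidthFromLastSegment(byteLength, durationMs) {
  return (byteLength * 8) / (durationMs / 1000);
}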