WebVTT and captioning on the web

  • Author: Silvia Pfeiffer
  • Date: 4 Mar 2013

The HTML5 <video> and <track> elements have brought native captions to the web. Interoperable captions between browsers are enabled through the new file format WebVTT (Web Video Text Tracks).

Since the introduction of the <video> element into HTML around 2007, accessibility advocates have asked for a means to include captioning natively in the web platform. The specifications were developed around 2010 and combine the <track> element with the WebVTT file format and a TextTrack JavaScript API.

Browsers have since implemented these features one by one and, by now, all but Firefox (which is not far behind) have released at least the minimum support necessary to display captions on web video. This includes the recently released Internet Explorer 10 (IE10) for Windows 7 as well as current versions of Opera, Google Chrome and Safari.

Let's look at how we can make use of them.

Authoring WebVTT

WebVTT is a line-based text format that provides a sequence of timed text cues for a media element. A timed text cue is what we would call a "caption": it consists of the caption text plus a start and an end time that synchronise the text's display on top of the video.

Here is an example WebVTT file:

WEBVTT

00:00:03.040 --> 00:00:06.920
So, I just wanted to introduce you to W3C,

00:00:06.920 --> 00:00:09.680
and to do so, I have some exciting information:

00:00:10.000 --> 00:00:13.800
W3C has been acquired by Twitter.

00:00:13.800 --> 00:00:15.302
[AUDIENCE GIGGLES]

Note: The file starts with an identifier string "WEBVTT" in all caps and separated from the cues by an empty line. The cues themselves begin with the start time, which is relative to the start time of the video. It is separated from the end time by a --> string. On the next line (or lines), the cue text is provided. Cues are separated from each other by an empty line.

This is the simplest form a WebVTT file can take.
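
The structure described in the note above is regular enough to be read with a few lines of script. The following is a minimal sketch in JavaScript, not a conforming WebVTT parser: it assumes a simple file like the example, ignores cue settings and other extension features, and the function names parseTimestamp and parseVtt are invented for illustration.

```javascript
// Convert a timestamp such as "00:00:03.040" (or "00:03.040") to seconds.
// The hours part is optional; this sketch accepts any number of hour digits.
function parseTimestamp(text) {
  const match = /^(?:(\d+):)?([0-5]\d):([0-5]\d)\.(\d{3})$/.exec(text.trim());
  if (!match) {
    throw new Error('Invalid timestamp: ' + text);
  }
  const hours = match[1] ? Number(match[1]) : 0;
  // Sum in whole milliseconds first to avoid floating-point drift.
  const millis =
    hours * 3600000 + Number(match[2]) * 60000 + Number(match[3]) * 1000 + Number(match[4]);
  return millis / 1000;
}

// Split a simple WebVTT file into { start, end, text } cue objects.
function parseVtt(source) {
  const lines = source.replace(/\r\n?/g, '\n').split('\n');
  if (!/^WEBVTT(?:[ \t]|$)/.test(lines[0])) {
    throw new Error('File does not start with the WEBVTT identifier');
  }
  const cues = [];
  let i = 1;
  while (i < lines.length) {
    // Skip everything that is not a timing line (blank lines, cue identifiers).
    if (!lines[i].includes('-->')) {
      i++;
      continue;
    }
    const [startText, rest] = lines[i].split('-->');
    // Anything after the end time would be cue settings; this sketch drops them.
    const endText = rest.trim().split(/\s+/)[0];
    i++;
    // The cue text runs until the next empty line.
    const textLines = [];
    while (i < lines.length && lines[i] !== '') {
      textLines.push(lines[i]);
      i++;
    }
    cues.push({
      start: parseTimestamp(startText),
      end: parseTimestamp(endText),
      text: textLines.join('\n'),
    });
  }
  return cues;
}
```

Passing the contents of a .vtt file to parseVtt yields one object per cue, with times in seconds relative to the start of the video. In a browser with native <track> support none of this is needed, since the browser does the parsing itself.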

There are several extension features that enable you to provide richer formatting and placement of text as well as internationalisation. If you are interested in these, you should read up on the full syntax of WebVTT. For those in the know, WebVTT supports all the features of CEA-608 and CEA-708 captions, the US standards for analog (the famous line 21 captions) and digital TV captioning respectively.

Using WebVTT in the <video> element

Once we have authored our WebVTT file, we save it with a .vtt extension on a web server. Then we can include it in a webpage through the <track> element.

Here is an example:

<video src="http://www.accessiq.org/DougSchepers-W3C.webm" controls>
  <track src="http://www.accessiq.org/DougSchepers-W3C_cc_en.vtt" kind="captions" srclang="en-US" default>
</video>

NB. The inclusion of "http://www.accessiq.org/" in the code can be ignored.

This synchronises the WebVTT file with the timeline of the video, in this case with the WebM file. The @default attribute on the <track> element signifies that the particular track is activated by default for all users.

Here's what this looks like in Opera 12.14:

Screenshot of video with captions in Opera 12.14

We've played the video for three seconds and the active cue is the one saying "So, I just wanted to introduce you to W3C," which is rendered on top of the video.
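
How a cue becomes "active" can be illustrated with a small lookup. This is a sketch of the idea only, not the browser's actual implementation; it assumes cues are plain objects with start and end times in seconds, and the name activeCuesAt is made up for this example.

```javascript
// Return the cues whose time range contains the given playback time (in seconds).
// A cue counts as active from its start time (inclusive) to its end time (exclusive).
function activeCuesAt(cues, time) {
  return cues.filter((cue) => cue.start <= time && time < cue.end);
}

// The first two cues of the example file, with times converted to seconds.
const exampleCues = [
  { start: 3.04, end: 6.92, text: 'So, I just wanted to introduce you to W3C,' },
  { start: 6.92, end: 9.68, text: 'and to do so, I have some exciting information:' },
];

// Shortly after the three-second mark only the first cue is active.
const activeNow = activeCuesAt(exampleCues, 3.5);
```

With native support, the same information is available from script through the activeCues property of a TextTrack object.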

Note that Opera just displays the captions, but doesn't provide a mechanism to turn captions on or off in its default controls. If we hadn't provided the @default attribute, it would have been difficult to discover the availability of the captions.

This is why Chrome and Safari provide a "CC" button.

Here is what the same example looks like in Chrome:

Screenshot of video with captions in Chrome, with "CC" button on player controls

You can see that Chrome exposes a button which allows the user to turn captions on or off under their own control. In Opera, a developer would need to implement this functionality using the TextTrack API.
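
A hand-made caption toggle of the kind mentioned above could look roughly like the following. This is a sketch under assumptions: it relies on the string mode values ("showing", "hidden", "disabled") of the current HTML specification, which early implementations may not all share, and setCaptionsVisible is a hypothetical helper name, not part of the TextTrack API.

```javascript
// Show or hide captions on a video element via its textTracks list.
// Returns the track that is now showing, or null if captions are off.
function setCaptionsVisible(video, visible) {
  const tracks = video.textTracks;
  let shown = null;
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    // Leave chapter, description and metadata tracks alone.
    if (track.kind !== 'captions' && track.kind !== 'subtitles') {
      continue;
    }
    if (visible && shown === null) {
      // Display the first caption or subtitle track.
      track.mode = 'showing';
      shown = track;
    } else {
      // 'hidden' keeps the cues loaded but does not render them.
      track.mode = 'hidden';
    }
  }
  return shown;
}
```

A custom "CC" button would call this function from its click handler, passing the <video> element and the desired state.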

Make sure to load this page from a web server and not from a local file, because Chrome and Safari do not recognise the WebVTT file otherwise.

Multiple alternative caption tracks

It is also possible to add more than one WebVTT file to a video. In particular, consider providing captions (or subtitles) for users in different countries. The user can then choose which of the tracks to activate.

Here is the example above with an additional German caption track:

<video controls>
  <source src="http://www.accessiq.org/DougSchepers-W3C.webm">
  <source src="http://www.accessiq.org/DougSchepers-W3C.mp4">
  <track src="http://www.accessiq.org/DougSchepers-W3C_cc_en.vtt" kind="captions" srclang="en-US">
  <track src="http://www.accessiq.org/DougSchepers-W3C_cc_de.vtt" kind="captions" srclang="de-DE" default>
</video>

NB. The inclusion of "http://www.accessiq.org/" in the code can be ignored.
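
Where a browser's default controls offer no way to pick a track, the choice can be made from script. The following sketch assumes the string mode values of the current HTML specification; selectTrackByLanguage is a hypothetical helper, and the language codes are the srclang values from the markup above.

```javascript
// Activate the caption or subtitle track with the given language code
// and switch the other caption and subtitle tracks off.
// Returns the activated track, or null if no track matches.
function selectTrackByLanguage(video, language) {
  const wanted = language.toLowerCase();
  const tracks = video.textTracks;
  let selected = null;
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    if (track.kind !== 'captions' && track.kind !== 'subtitles') {
      continue;
    }
    if (selected === null && track.language.toLowerCase() === wanted) {
      track.mode = 'showing';
      selected = track;
    } else {
      track.mode = 'disabled';
    }
  }
  return selected;
}
```

Calling selectTrackByLanguage(video, 'en-US') on the example above would switch from the German default track to the English one.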

Safari Nightly and Internet Explorer 10 expose the available tracks as a menu on top of the video for user selection. IE10 also requires that the server serves the files with the text/vtt MIME type.

Here is the example in Safari Nightly:

Screenshot of video with menu options for multiple captions and subtitles in Safari Nightly

You can see that a 'speech bubble' button opens a menu to show which tracks are available and which one is currently active. In our tests with IE10 on a Windows server, we were not able to get it to load the videos, so we can't provide pictures of this.
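
The MIME type requirement mentioned above is a matter of server configuration. As an illustration only, here is a sketch of a tiny static file server for local testing, written for Node.js; the helper names contentTypeFor and createMediaServer are invented for this example. It does not handle HTTP range requests, so seeking in videos may be limited, and it is not meant for production use.

```javascript
const http = require('http');
const fs = require('fs');
const path = require('path');

// File extensions used in this article, mapped to their MIME types.
const MIME_TYPES = {
  '.vtt': 'text/vtt',
  '.webm': 'video/webm',
  '.mp4': 'video/mp4',
  '.html': 'text/html',
};

// Pick the Content-Type header value for a file path.
function contentTypeFor(filePath) {
  const extension = path.extname(filePath).toLowerCase();
  return MIME_TYPES[extension] || 'application/octet-stream';
}

// Create (but do not start) a server that serves files below rootDir.
function createMediaServer(rootDir) {
  const root = path.resolve(rootDir);
  return http.createServer((req, res) => {
    let urlPath;
    try {
      urlPath = decodeURIComponent(req.url.split('?')[0]);
    } catch (err) {
      res.writeHead(400, { 'Content-Type': 'text/plain' });
      res.end('Bad request');
      return;
    }
    const filePath = path.join(root, urlPath);
    // Refuse paths that escape the root directory.
    if (filePath !== root && !filePath.startsWith(root + path.sep)) {
      res.writeHead(403, { 'Content-Type': 'text/plain' });
      res.end('Forbidden');
      return;
    }
    fs.readFile(filePath, (err, data) => {
      if (err) {
        res.writeHead(404, { 'Content-Type': 'text/plain' });
        res.end('Not found');
        return;
      }
      res.writeHead(200, { 'Content-Type': contentTypeFor(filePath) });
      res.end(data);
    });
  });
}

// Usage (port number is an arbitrary choice):
// createMediaServer('.').listen(8000);
```

On an existing web server the same effect is normally achieved by adding a mapping from the .vtt extension to text/vtt in the server's MIME type configuration.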

How to make use of <track> in practice

While we're all waiting for the browsers to catch up on implementing <track> support and the caption button and menu in their controls, we can already make use of this new markup with the help of JavaScript. You can either write your own parsing code for <track> and WebVTT, or you can make use of one of the existing JavaScript projects available.

Here are some video players that support <track>, WebVTT and a menu of captions in their controls:

Note: Kaltura has a handy chart on HTML5 Video Player Comparison.

For developing your own video controls, here are some libraries that provide a uniform TextTrack API across browsers no matter what their implementation status:

Make sure that your polyfill of choice supports the newer WebVTT file format and not the older WebSRT file format. Alternatively, update the library of choice with the features you are after and feed that back into the open source library — after all, that's what open source is all about.
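
Before loading such a library, a page might first check whether the browser already exposes the TextTrack API. The check below is one possible sketch, not the detection used by any particular library; supportsTextTracks is a hypothetical name, and the document object is passed in as a parameter so the function can be tried outside a browser.

```javascript
// Report whether video elements created from this document expose
// the textTracks list and the addTextTrack() method.
function supportsTextTracks(doc) {
  const video = doc.createElement('video');
  return 'textTracks' in video && typeof video.addTextTrack === 'function';
}
```

In a page this would be called as supportsTextTracks(document), loading the polyfill only when the result is false.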

Final remarks

It's time to get acquainted with WebVTT, because it is the browsers' format of choice for captions in the future.

Here are some links for further reading:

As a final note, let's whet your appetite for video accessibility using WebVTT even more: as soon as browsers have captions under control, they are planning to implement better support for blind users of web videos, too.

This includes:

  • making the default video controls screen reader accessible (some browsers have already achieved this)
  • exposing audio descriptions authored in WebVTT files through speech synthesis
  • enabling semantic navigation through video by using chapters authored in WebVTT

The brave new world of accessible web video has only just begun.