Why make videos accessible?
Making videos accessible usually means providing a text equivalent of the spoken words through captions, transcripts, or descriptions of the audio. Captions are primarily intended for individuals who cannot hear the audio, but they also benefit non-native speakers, users who have the audio muted, and viewers watching a video with poor-quality audio.
For educational purposes, if a lecture is recorded and put on the web, captions or transcripts allow everyone to read what is being said or discussed in class. This can improve comprehension, material review, and information processing for students with different learning styles.
- By January 1, 2014, new internet websites and web content on those sites must conform with WCAG 2.0 Level A.
- As of January 1, 2021, all internet websites and web content must conform with WCAG 2.0 Level AA, other than success criterion 1.2.4 Captions (Live) and success criterion 1.2.5 Audio Descriptions (Pre-recorded).
WCAG 2.0 Success Criterion 1.2.1 - "An alternative for time-based media is provided that presents equivalent information for prerecorded audio-only content."
WCAG 2.0 Success Criterion 1.2.2 - "Captions are provided for all prerecorded audio content in synchronized media, except when the media is a media alternative for text and is clearly labeled as such."
Please refer to "Video Captioning and Audio Transcripts" of the Website Accessibility section for further information.
Closed vs. Open Captions
Most people are familiar with closed captioning. Closed captions provide a text equivalent of the audio and can be turned on or off. Most TVs include the technology to turn closed captions on or off for viewers who are deaf or hard of hearing. Closed captions are also useful in noisy environments like restaurants and bars.
Open captions look the same and display the same information as closed captions, except that they cannot be turned off. They are a permanent part of the video and are always displayed, much like subtitles in a foreign-language film.
Transcripts allow anyone who cannot access content from web audio or video to read a text version instead. Transcripts do not have to be verbatim accounts of the spoken word in a video; they can also contain additional descriptions, explanations, or comments that may be beneficial. Transcripts allow deaf-blind users to get content through refreshable Braille and other devices. For most web video, both captions and a text transcript should be provided.
You can make your own transcripts using speech recognition software such as Dragon NaturallySpeaking or the speech-to-text features built into Windows or Mac OS X Lion. The results should be reviewed for errors, especially when the audio is low quality or contains unusual words.
Described video describes non-verbal actions in a program. It is extremely useful on the web when the visual content of a video provides important information not available through the audio alone.
Captioning a video on the web involves embedding a file that contains time-synchronized text along with the video. On the web, the primary multimedia technologies are Microsoft's Windows Media Player, Apple's QuickTime, RealNetworks' RealPlayer, and Adobe (formerly Macromedia) Flash. Each media player handles captions differently. Below are some common technologies and terms that apply to captioning within the various media players.
Common web accessibility guidelines indicate that captions should be synchronized (appearing at approximately the same time as the audio), equivalent (matching the content of the spoken word), and accessible (readily available to those who need them).
Captioning file types
Timed Text or DFXP
DFXP stands for "Distribution Format Exchange Profile" and is a World Wide Web Consortium (W3C) draft standard. It is an XML markup language specifically designed for marking up timed text, or captions. It is the most common format used by Flash video players that support closed captions.
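To give a sense of what timed-text markup looks like, the following Python sketch builds a very small caption document. It is only an illustration: it assumes the `http://www.w3.org/ns/ttml` namespace under which the W3C later published the format, and the function name `build_timed_text` and the sample captions are hypothetical. A particular Flash player may expect a different namespace or extra styling elements.

```python
import xml.etree.ElementTree as ET

# Assumption: the TTML namespace; older DFXP drafts used a different URI.
TTML_NS = "http://www.w3.org/ns/ttml"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_timed_text(captions, language="en"):
    """Return a minimal timed-text document as a string.

    captions is a list of (begin, end, text) tuples, with times written
    as clock values such as "00:00:01.00".
    """
    ET.register_namespace("", TTML_NS)  # serialize without a prefix
    tt = ET.Element(f"{{{TTML_NS}}}tt")
    tt.set(f"{{{XML_NS}}}lang", language)
    body = ET.SubElement(tt, f"{{{TTML_NS}}}body")
    div = ET.SubElement(body, f"{{{TTML_NS}}}div")
    for begin, end, text in captions:
        # Each caption is one <p> element with begin and end times.
        p = ET.SubElement(div, f"{{{TTML_NS}}}p", {"begin": begin, "end": end})
        p.text = text
    return ET.tostring(tt, encoding="unicode")


if __name__ == "__main__":
    print(build_timed_text([
        ("00:00:00.00", "00:00:02.50", "Welcome to the lecture."),
        ("00:00:02.50", "00:00:06.00", "Today we discuss captions."),
    ]))
```

Each caption becomes one `<p>` element whose `begin` and `end` attributes carry the timing, which is the core idea behind the format.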
SMIL (Synchronized Multimedia Integration Language)
A standards-based language used by QuickTime and RealPlayer to control the layout and presentation of visual and audible items. SMIL is used to control the display, positioning, and timing of captions and audio/video multimedia. The captions themselves are stored in a Text Track file if you’re using QuickTime or a RealText file if you’re using RealPlayer.
SAMI (Synchronized Accessible Media Interchange)
This is Microsoft’s technique for adding captions to Windows Media Player. A SAMI file contains the text to be displayed within the captions and information that synchronizes individual caption displays to the multimedia presentation.
SubRip (.srt) and SubViewer (.sub)
These are very simple text formats that differ slightly from each other. Both are officially supported by YouTube, as described on the YouTube help page "Getting Started: Adding/Editing Captions". Although it is not specifically documented, YouTube also supports captions uploaded in DFXP and SAMI formats.
To provide a closed caption file for use with a video player, you can either make your own using software or outsource the work to a company for a fee.
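As an illustration of how simple the SubRip format is, here is a minimal Python sketch that turns a list of captions into .srt text. Each entry consists of a sequence number, a time range written as `HH:MM:SS,mmm --> HH:MM:SS,mmm`, the caption text, and a blank line. The function names and sample captions are hypothetical and not part of any captioning tool.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(captions):
    """Build SubRip text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for number, (start, end, text) in enumerate(captions, start=1):
        # Sequence number, time range, then the caption text itself.
        blocks.append(
            f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Joining with a newline leaves one blank line between entries.
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = [
        (0.0, 2.5, "Welcome to the lecture."),
        (2.5, 6.0, "Today we discuss captions."),
    ]
    print(to_srt(sample))
```

The resulting text can be saved with an .srt extension. It is worth checking it against the uploading service's current requirements before relying on it.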
Making your own transcript
Frequently, people or departments develop scripts to be used in their videos. These scripts already provide the text for a caption or transcript. If you need to create a transcript from scratch, you may wish to investigate voice recognition software. One of the most widely used products is Dragon NaturallySpeaking, and the Windows and Mac OS X operating systems also have built-in voice recognition. Unfortunately, speech recognition does not offer 100% accuracy, and some editing or training of the product may be required.
Speech recognition works best when the system has been trained on one speaker's voice. It does not work well with two or more people having a conversation, as the software cannot distinguish between multiple voices. Voice recognition software may also have difficulty accurately transcribing specialized vocabulary such as medical or engineering terms. Another drawback is that these systems can be easily stymied when a speaker has an accent.
When these conditions exist, the resulting transcript may be poor or unusable. Correcting the files may even be more time-consuming than creating a transcription from scratch.
- Camtasia (cost involved)
- Captionate (cost involved)
- CC for Flash (free)
- Hi-Caption (cost involved, PC)
- MAGpie (free, PC and Mac)
- MovieCaptioner (cost involved, Mac and Windows)
Captioning Services (cost involved)
Please note: for the purposes of this tutorial we will show how to caption videos using MAGpie 2.0, which is free and Mac/PC compatible. Other software should be similar to use.
Once MAGpie is downloaded and installed, open the software and select “File” then “New Project…”, or press Control + N.
Now set the styles you want for your captions and speakers. The default styles work for most captions, although it is preferred that captions be centred (#2 on screenshot). At the bottom of the window, you may enter the width and height at which you want your video to display. Caption width should usually be set to the same width as your video. The caption height should be 80 pixels (#3 on screenshot), which should be the default. Select “OK” when you’re finished.
This opens the “Create New Project Track” window automatically. Here you can create captions or audio descriptions for your media. Since we’re creating closed captions, select “Captions” then select “OK”.
MAGpie’s Main Interface
The main interface contains text areas for entering start and end times, the speaker, and the caption text. There are also controls for the playback of the media you’re captioning, which opens automatically in a separate window.
There are two modes available for navigating the main MAGpie captioning window: Editing Mode and Navigation Mode. Editing Mode allows you to enter caption information into the window. Navigation Mode allows you to move from caption to caption or between the different options for each caption. To enter Navigation Mode, press Shift + Enter. You can then use the arrow keys to navigate through the table cells. To enter Editing Mode, press Enter or simply begin typing in one of the table cells.
Media Playback Controls
The media clip can be controlled from the menu, but it may be simpler to use the following keyboard commands:
|Insert Start Time from Player|F9|
|Insert End Time from Player|F10|
Make sure your media file is at the beginning by looking at the timecode area to the right of the playback buttons. It should read 0:00:00.00. These numbers represent the time: the first digit represents hours, the next two minutes, the next two seconds, and the final two (after the decimal point) hundredths of a second.
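The timecode layout described above can be illustrated with a short Python sketch that converts between a timecode string and a number of seconds. The function names are hypothetical, and the sketch assumes timecodes always follow the H:MM:SS.hh pattern shown in this tutorial.

```python
import re

# Hours, two-digit minutes, two-digit seconds, two-digit hundredths.
TIMECODE = re.compile(r"^(\d+):(\d{2}):(\d{2})\.(\d{2})$")


def timecode_to_seconds(timecode):
    """Convert a timecode such as "0:01:30.50" to seconds (90.5)."""
    match = TIMECODE.match(timecode.strip())
    if match is None:
        raise ValueError(f"not a valid timecode: {timecode!r}")
    hours, minutes, seconds, hundredths = (int(g) for g in match.groups())
    if minutes > 59 or seconds > 59:
        raise ValueError(f"minutes and seconds must be below 60: {timecode!r}")
    return hours * 3600 + minutes * 60 + seconds + hundredths / 100


def seconds_to_timecode(total_seconds):
    """Convert seconds back to the H:MM:SS.hh layout."""
    total_hundredths = round(total_seconds * 100)
    hours, rest = divmod(total_hundredths, 360_000)
    minutes, rest = divmod(rest, 6_000)
    seconds, hundredths = divmod(rest, 100)
    return f"{hours}:{minutes:02d}:{seconds:02d}.{hundredths:02d}"
```

For example, `timecode_to_seconds("0:01:30.50")` gives 90.5, and `seconds_to_timecode(90.5)` gives back "0:01:30.50".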
Press F6 to listen to the first sentence or portion of a sentence and then press F6 again to pause. Select the Caption area for Row 1 and type what was spoken. You typically want to limit each row to one or two short sentences. There is a limit as to how much text can be displayed for each caption. If a complete sentence does not fit, you can break it into two captions. Once you have entered the text for the first caption, press the Enter key twice in quick succession. This will insert a new row for a second caption.
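If you already have the text of a long sentence, breaking it into caption-sized pieces can be sketched in a few lines of Python. The limit of 64 characters used here is purely an assumption for illustration, since this tutorial does not state the exact limit, and the function name `split_caption` is hypothetical.

```python
import textwrap
from typing import List


def split_caption(text: str, max_chars: int = 64) -> List[str]:
    """Split text into pieces of at most max_chars characters.

    Breaks happen at spaces, so ordinary words are never cut in half.
    max_chars is an assumed limit, not a documented MAGpie value.
    """
    return textwrap.wrap(text, width=max_chars)


if __name__ == "__main__":
    sentence = (
        "Captions are primarily intended for individuals who cannot hear "
        "the audio, but they also benefit many other viewers."
    )
    for row, piece in enumerate(split_caption(sentence), start=1):
        print(f"Row {row}: {piece}")
```

Each piece would then go into its own caption row. Breaking at natural pauses in the speech usually reads better than breaking purely by length, so treat the output as a starting point.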
Continue until all the audio in the media is transcribed. This work can be tedious and time-consuming, depending on the length of the media you’re using. Do a quick double check to make sure all the captions are of an appropriate length. MAGpie has a built-in spell check under “Edit” then “Check Spelling”.
MAGpie provides a text importer that allows the user to transcribe text in a word-processing program rather than in the MAGpie interface. Common word-processing programs include MS WordPad, MS Word, and Corel WordPerfect.
This is a handy option if, for example, a professor wants to upload a video of a lecture. The professor could use speech-to-text software (e.g. Dragon NaturallySpeaking, or the speech-to-text built into Windows or Mac OS X Lion) during the lecture to capture what they said in MS Word. If the professor repeats students' questions and other dialogue from interactions, a transcript of the lecture can be produced quite easily.
- Edit the transcriptions for errors and spelling mistakes.
- Insert a carriage return between each block of text or sentence.
- Make sure to insert one extra carriage return after the last line of text in your transcribed file.
- Save your information as a Text file (with or without line breaks).
- Place the cursor in the first cell of the Caption column in the MAGpie interface and then select “Captions” from the menu bar, then “Insert Captions from file…”.
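The file preparation steps above could also be scripted. The following Python sketch is only a rough illustration: it assumes that "a carriage return between blocks" means a blank line between blocks, which you should confirm against how MAGpie imports your file, so the separator is left as a parameter. The function name `build_import_text` is hypothetical.

```python
def build_import_text(captions, separator="\n\n"):
    """Join caption blocks into text ready to save as a plain text file.

    Assumption: blocks are separated by a blank line (separator="\n\n").
    Pass separator="\n" if a single line break is what the importer expects.
    """
    # Tidy stray whitespace and drop empty blocks.
    cleaned = [" ".join(block.split()) for block in captions if block.strip()]
    # End the last line, then add the one extra carriage return.
    return separator.join(cleaned) + "\n\n"


if __name__ == "__main__":
    blocks = ["Welcome to the lecture.", "Today we discuss captions."]
    with open("transcript.txt", "w", encoding="utf-8") as handle:
        handle.write(build_import_text(blocks))
```

The script does not replace proofreading: the transcription still needs to be checked for errors and spelling mistakes before it is imported.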
Formatting and Editing Captions
Once you have imported the transcribed audio portion of the multimedia clip into the MAGpie interface, it may be necessary to format or edit the appearance of the captions in the media player. Select a cell that contains a caption. The cell will be highlighted by a green border. Under the Style command on the menu bar, the text color, font alignment, and font appearance can be changed. It is generally recommended to keep the appearance of the captions consistent for the duration of the multimedia clip.
To synchronize the video to the captioned dialogue:
- Position the multimedia clip you wish to caption at the beginning of the presentation. The timecode should read 0:00:00.00.
- Press the F6 key to start the multimedia clip.
- When you hear the initial words of the first caption, press the F9 key. This will “capture” the timecode of the multimedia clip and insert it under the Start Time column heading.
- Continue listening to the media and pressing F9 for each caption.
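The steps above capture only a start time for each caption. Some caption formats also need an end time, and one simple way to fill it in is to let each caption end when the next one begins. The Python sketch below illustrates that idea under stated assumptions: times are given in seconds, the total length of the clip is known, and the function name `add_end_times` is hypothetical. It does not describe what MAGpie does internally.

```python
def add_end_times(start_times, media_duration):
    """Pair each start time with an end time.

    Each caption ends when the next one starts; the last caption ends
    at media_duration. All times are in seconds.
    """
    starts = list(start_times)
    # Captured start times should never go backwards.
    if any(later < earlier for earlier, later in zip(starts, starts[1:])):
        raise ValueError("start times must be in ascending order")
    if starts and media_duration < starts[-1]:
        raise ValueError("media_duration is earlier than the last start time")
    ends = starts[1:] + [media_duration]
    return list(zip(starts, ends))
```

For example, `add_end_times([0.0, 2.5, 6.0], 10.0)` pairs the three captions with the end times 2.5, 6.0, and 10.0. If a long pause follows a caption, setting its end time by hand gives a better result than leaving the text on screen.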
Please Note: It is very important to save your work often by selecting “File” then “Save As” from the menu.
Exporting Caption Files
Now you’re ready to export your caption information in a format that can be used in QuickTime, RealPlayer, Windows Media Player, or Flash. From the menu, select “Export” then the format that matches the media you are captioning. Review “Captioning file types” for details on each format.
Even though your media may be captioned, you should still provide a transcript. Select “Export” then “Plain text”.
If you are embedding your video into a website, you may do so using an inline frame, or iframe (YouTube, for example, uses iframes). In general, iframes are accessible to screen readers, but giving each iframe a title attribute is recommended.
Example HTML code:
<iframe allowfullscreen="" frameborder="0" height="225" src="https://www.youtube.com/embed/32RV8_wHg0A" title="Melissa Vassallo on disability - Queen's Accessibility Hub" width="400"></iframe>
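To check a page for iframes that are missing a title, a short script can help. The following Python sketch uses the standard library's `html.parser` module. The class and function names are hypothetical, and it only checks that a non-empty title is present, not that the title is meaningful.

```python
from html.parser import HTMLParser


class IframeTitleChecker(HTMLParser):
    """Collect the src of every iframe that lacks a non-empty title."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        attributes = dict(attrs)
        # Attribute values can be None when written without a value.
        title = attributes.get("title") or ""
        if not title.strip():
            self.missing.append(attributes.get("src") or "(no src)")


def iframes_missing_title(html):
    """Return the src values of iframes in html that have no title."""
    checker = IframeTitleChecker()
    checker.feed(html)
    checker.close()
    return checker.missing
```

Calling `iframes_missing_title` on the example above returns an empty list, because that iframe has a title. An iframe without one would be reported by its `src` value.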