Media on the Web

Native audio and video, and what the platform gives you

Native media

Before HTML5, playing video or audio on a web page required a browser plug-in: most commonly Adobe Flash, sometimes QuickTime or Windows Media Player. Plug-ins were separate programs loaded by the browser; they had their own security vulnerabilities and their own update cycles, and they worked poorly or not at all on mobile devices. Apple famously refused to support Flash on the iPhone, which helped accelerate its decline.

HTML5 (2008–2014) added <audio> and <video> as first-class HTML elements. The browser itself became the media player. Adobe announced the end of Flash in 2017 and stopped supporting it at the end of 2020. Today, native media is the only approach that works consistently across desktop, mobile, and assistive technologies.

One interface, two elements

<audio> and <video> are represented by HTMLAudioElement and HTMLVideoElement, both of which inherit from HTMLMediaElement, a DOM interface defined by the HTML Living Standard. Because they share this base interface, they expose the same core playback API from JavaScript:

  • play() / pause() — start or stop playback
  • currentTime — read or seek to a position in seconds
  • duration — total length in seconds (read-only)
  • volume — a number 0.0–1.0
  • muted — boolean; muting does not change volume
  • playbackRate — 1.0 is normal speed; 0.5 is half speed; 2.0 is double
  • paused, ended, readyState — state inspection

The only meaningful difference between the two elements is that <video> also renders a visual frame (and optionally a poster image), while <audio> has no visual rendering of its own beyond the controls bar when controls is present.
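Because the shared members listed above live on HTMLMediaElement, one set of helpers can serve both elements. The following is a minimal sketch, not a complete player; the function names (togglePlayback, seekBy, setVolumePercent) are hypothetical and chosen only for illustration.

```javascript
// Works with any HTMLMediaElement, so the same helpers serve <audio> and <video>.

// Toggle between playing and paused. play() returns a Promise in modern
// browsers; it is returned here so callers can handle a rejection.
function togglePlayback(media) {
  if (media.paused || media.ended) {
    return media.play();
  }
  media.pause();
  return Promise.resolve();
}

// Seek relative to the current position, clamped to [0, duration].
// duration is NaN until metadata has loaded, so in that case only the
// lower bound is enforced.
function seekBy(media, deltaSeconds) {
  const target = media.currentTime + deltaSeconds;
  const upper = Number.isFinite(media.duration) ? media.duration : Infinity;
  media.currentTime = Math.min(Math.max(target, 0), upper);
  return media.currentTime;
}

// Set volume from a 0-100 slider value; volume itself must stay within 0.0-1.0.
function setVolumePercent(media, percent) {
  media.volume = Math.min(Math.max(percent, 0), 100) / 100;
  media.muted = false; // muted is independent of volume, so clear it explicitly
  return media.volume;
}
```

In a page, media would be whatever document.querySelector("video") or document.querySelector("audio") returns; nothing in the helpers depends on which of the two it is.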

What the platform provides

Native media elements give you more than just a play button. The browser exposes a complete toolkit — most of which you can adopt incrementally:

Built-in playback UI

Adding the controls attribute gives the element a full playback interface (play/pause, scrubber, volume and, for video, fullscreen) built and maintained by the browser vendor. You get this for free with a single attribute. The styling and exact layout vary between browsers and operating systems, which is why you will sometimes see sites build custom controls using JavaScript and the HTMLMediaElement API instead. You will explore custom controls in Tutorial 06: Controlling with JavaScript.

Multiple formats via <source>

No single codec is supported by every browser. By nesting one or more <source> elements inside <audio> or <video>, you list candidate files in order of preference. The browser tries each in sequence and uses the first format it can decode. This is how you serve modern, efficient codecs to browsers that support them while still providing a fallback for older ones. Codec choices are covered in depth in Tutorial 04: Formats and Codecs.
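The "first format it can decode" rule can be approximated in script with HTMLMediaElement.canPlayType(), which returns "", "maybe", or "probably" for a given MIME type. The sketch below is only an illustration of the ordering logic: the browser's real selection also moves on to the next <source> when a file fails to load, and pickFirstPlayable and the file names are hypothetical.

```javascript
// candidates: array of { src, type } in order of preference, like <source> elements.
// canPlayType: a function with the semantics of HTMLMediaElement.canPlayType(),
// returning "" when the type is definitely unsupported.
function pickFirstPlayable(candidates, canPlayType) {
  for (const candidate of candidates) {
    if (canPlayType(candidate.type) !== "") {
      return candidate; // first entry the browser does not rule out
    }
  }
  return null; // nothing playable: the fallback content would be all that is left
}

// Browser usage (file names are placeholders):
// const video = document.createElement("video");
// const choice = pickFirstPlayable(
//   [
//     { src: "clip.webm", type: 'video/webm; codecs="vp9"' },
//     { src: "clip.mp4", type: "video/mp4" },
//   ],
//   (type) => video.canPlayType(type)
// );
```

Order matters: an entry listed first wins whenever the browser can play it, so the preferred format belongs at the top of the list.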

Captions and subtitles via <track>

The <track> element attaches a WebVTT caption file to a media element. The browser renders the cue text on top of the video at the right time, provides a caption menu in the native controls, and exposes the cue data to JavaScript. Captions are not optional for accessibility: WCAG 2.1 requires them for pre-recorded synchronised media (Success Criterion 1.2.2, Level A). See Tutorial 05: Captions and Tracks.
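To make the WebVTT format concrete, here is a small sketch that turns a list of cues into the text of a .vtt file: a WEBVTT header line, then blank-line-separated cues, each with a "start --> end" timing line followed by the cue text. The cue shape ({ start, end, text }) and the function names are assumptions made for this example.

```javascript
// Format a time in seconds as a WebVTT timestamp: hh:mm:ss.ttt
function formatVttTime(seconds) {
  const totalMs = Math.round(seconds * 1000); // work in whole milliseconds
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n, width) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Build the text of a WebVTT file from an array of { start, end, text } cues.
function toWebVtt(cues) {
  const blocks = cues.map(
    (cue) => `${formatVttTime(cue.start)} --> ${formatVttTime(cue.end)}\n${cue.text}`
  );
  return ["WEBVTT", ...blocks].join("\n\n") + "\n";
}

// Example: one cue shown from 1 s to 4 s.
const exampleVtt = toWebVtt([{ start: 1, end: 4, text: "Hello, and welcome." }]);
// exampleVtt is:
// WEBVTT
//
// 00:00:01.000 --> 00:00:04.000
// Hello, and welcome.
```

A file with this content, saved with a .vtt extension, is what the src attribute of a <track> element points at.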

Full scripting via HTMLMediaElement

Every aspect of playback — state, timing, buffering, events — is scriptable through HTMLMediaElement. You can build media players that sync to external data, skip chapters, react to user gestures, or drive animations precisely to the playhead position. This is covered in Tutorial 06: Controlling with JavaScript.
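As one illustration of syncing to external data, the sketch below maps the playhead position to a chapter and reports chapter changes by listening for the timeupdate event. The chapter data shape ({ title, start }) and the function names are hypothetical; real chapter data could equally come from a <track> of kind "chapters".

```javascript
// chapters: array of { title, start }, sorted by start time in seconds.
// Returns the chapter containing currentTime, or null before the first one.
function chapterAt(chapters, currentTime) {
  let current = null;
  for (const chapter of chapters) {
    if (chapter.start <= currentTime) current = chapter;
    else break; // sorted input: no later chapter can match
  }
  return current;
}

// Start time of the next chapter after currentTime, or null if there is none.
function nextChapterStart(chapters, currentTime) {
  const next = chapters.find((chapter) => chapter.start > currentTime);
  return next ? next.start : null;
}

// Call onChange whenever the playhead moves into a different chapter.
// Returns a function that removes the listener again.
function watchChapters(media, chapters, onChange) {
  let last = null;
  const handler = () => {
    const current = chapterAt(chapters, media.currentTime);
    if (current !== last) {
      last = current;
      onChange(current);
    }
  };
  media.addEventListener("timeupdate", handler);
  return () => media.removeEventListener("timeupdate", handler);
}
```

A "skip chapter" button could then set media.currentTime to the value returned by nextChapterStart when it is not null. Note that timeupdate fires only a few times per second, so animation that must track the playhead frame by frame usually reads currentTime from a requestAnimationFrame loop instead.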

Audio processing via the Web Audio API

The Web Audio API is a separate, powerful interface that lets you route audio from a media element through a graph of processing nodes — equalizers, compressors, visualizers, spatial audio, and more. It treats audio as a signal-processing pipeline, not just a source you can play and pause. See Tutorial 08: Web Audio API.
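A minimal sketch of such a graph, assuming a single low-shelf filter as a simple bass boost: the media element becomes a source node, passes through a BiquadFilterNode, and ends at the context's destination (the speakers). The function name and the 200 Hz / 6 dB values are arbitrary choices for illustration.

```javascript
// audioContext: an AudioContext; media: an <audio> or <video> element.
// Builds the graph: media element -> low-shelf filter -> speakers.
function routeThroughBassBoost(audioContext, media, gainDb = 6) {
  const source = audioContext.createMediaElementSource(media);
  const filter = audioContext.createBiquadFilter();
  filter.type = "lowshelf";     // boost or cut everything below the shelf frequency
  filter.frequency.value = 200; // shelf frequency in Hz (arbitrary for this sketch)
  filter.gain.value = gainDb;   // boost in decibels
  source.connect(filter);
  filter.connect(audioContext.destination);
  return { source, filter };
}

// Browser usage:
// const context = new AudioContext();
// const graph = routeThroughBassBoost(context, document.querySelector("audio"));
// Browsers may keep a new AudioContext suspended until a user gesture,
// so call context.resume() from a click handler before expecting sound.
```

Once an element has been passed to createMediaElementSource, its audio is heard only through the graph, so forgetting the final connection to destination results in silence.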

A first look: video in the browser

Before diving into individual features, here is the simplest useful <video> element: two source formats, a poster image shown before playback begins, and the browser's native controls. The browser picks the first <source> it can decode; the last line of text is the fallback shown only in browsers that do not understand <video> at all (essentially none in 2024).

<video controls poster="../assets/poster.jpg" width="480" height="320">
  <source src="../assets/sample-video.mp4" type="video/mp4">
  <source src="../assets/sample-video.webm" type="video/webm">
  Your browser does not support the video element.
</video>

Notice there is no autoplay attribute. Browsers block autoplay with sound by default (and have done so since around 2018). Using controls and letting the user initiate playback is both the correct approach for accessibility and the one that will actually work reliably across browsers.
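When playback is started from script rather than by the user, the autoplay block shows up as a rejected Promise: in modern browsers play() returns a Promise, and a blocked attempt rejects with an error named NotAllowedError. The sketch below, with the hypothetical helper name tryPlay, reports the outcome so the page can fall back to showing a play button.

```javascript
// Attempt to start playback and report what happened.
// Resolves to "playing" on success, or "blocked" when the browser's
// autoplay policy refused the attempt. Other errors are rethrown.
async function tryPlay(media) {
  try {
    await media.play();
    return "playing";
  } catch (error) {
    if (error && error.name === "NotAllowedError") {
      return "blocked"; // wait for a user gesture, e.g. a click on a play button
    }
    throw error; // e.g. an unsupported or missing source
  }
}

// Browser usage:
// tryPlay(document.querySelector("video")).then((outcome) => {
//   if (outcome === "blocked") { /* show a visible play button */ }
// });
```

Calling tryPlay from inside a click or key handler counts as user-initiated playback, which is the case the controls attribute already handles for you.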

Self-host vs embed

When you add video to a web project you have two fundamentally different options: host the file yourself on your own server, or embed a player from a third-party service like YouTube or Vimeo. Both approaches use HTML elements, but the trade-offs are significant enough that choosing between them is a deliberate architectural decision.

Self-hosting

When you self-host, you upload the video file(s) to your own server (or a CDN you control) and use <video> with <source> elements to reference them directly. The benefits are considerable:

  • Full control — you choose the codec, bitrate, poster image, captions, and controls
  • Privacy — no third-party JavaScript runs on your page; no viewer data is shared with an external service
  • No ads, no recommendations — the browser plays your file; there is no platform UI injected around it
  • Offline capability — a Service Worker can cache self-hosted files for offline playback

The costs are real too:

  • Bandwidth — video files are large; every view transfers data from your server, and that transfer costs money at scale
  • No adaptive streaming out of the box — a static MP4 is a single rendition at one quality level; platforms like YouTube use HLS or DASH to switch between quality levels automatically as network conditions change
  • Encoding — you must encode and store multiple formats (MP4/H.264, WebM/VP9 or AV1) yourself
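The bandwidth cost above can be estimated with simple arithmetic: bitrate times duration gives the size of one full view, and multiplying by the number of views gives the total transfer. The sketch below assumes every view plays the whole file and nothing is served from cache, so it is an upper-bound estimate; the figures in the example are made up.

```javascript
// Estimated data transferred, in decimal gigabytes (1 GB = 1000 MB).
// bitrateMbps: average bitrate in megabits per second
// durationSeconds: length of the video
// views: number of complete plays
function estimateTransferGB(bitrateMbps, durationSeconds, views) {
  const megabitsPerView = bitrateMbps * durationSeconds;
  const megabytesPerView = megabitsPerView / 8; // 8 bits per byte
  return (megabytesPerView * views) / 1000;
}

// A 2-minute clip at 5 Mbit/s is 75 MB per view;
// 10,000 full views transfer 750 GB.
const exampleTransfer = estimateTransferGB(5, 120, 10000); // 750
```

Multiplying the result by a hosting provider's per-gigabyte egress price turns this into a rough monthly cost.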

Embedding from a third-party service

YouTube, Vimeo, and similar platforms provide an <iframe> embed code. You paste it into your HTML and the platform serves everything: the file, the player, the adaptive streaming, the CDN. The trade-offs are the inverse:

  • Free hosting and CDN — the platform absorbs bandwidth costs
  • Adaptive streaming — the player automatically switches quality based on the viewer's connection
  • Third-party dependency — if the service goes down or removes your video, it disappears from your page
  • Privacy cost — the platform's JavaScript runs in the iframe and can track viewers, even on your site
  • Reduced control — the player lives inside a cross-origin iframe, so you cannot script it through HTMLMediaElement; each platform has its own JavaScript API

For a production site with high traffic, a hybrid is common: use a platform for publicly shared marketing video (where discovery and streaming quality matter most), and self-host short clips or instructional content where privacy and control are the priority. Embedding external video using <iframe> is covered in Tutorial 07: Embedding External Video.