Native media
Before HTML5, playing video or audio on a web page required a browser plug-in: most commonly Adobe Flash, sometimes QuickTime or Windows Media Player. Plug-ins were separate programs that the browser loaded into the page; they had their own security vulnerabilities and their own update cycles, and they were poorly supported on mobile devices. Apple famously refused to support Flash on the iPhone, which helped accelerate its decline.
HTML5 (2008–2014) added <audio> and <video> as
first-class HTML elements. The browser itself became the media player. Adobe ended
support for Flash Player at the end of 2020. Today, native media is the only approach that
works consistently across desktop, mobile, and assistive technologies.
One interface, two elements
<audio> and <video> are represented in the DOM by
HTMLAudioElement and HTMLVideoElement, both of which inherit from
HTMLMediaElement, a DOM interface defined by the HTML Living Standard. Because they
share that base interface, they expose the same core JavaScript API:
- play() / pause() — start or stop playback
- currentTime — read or seek to a position in seconds
- duration — total length in seconds (read-only)
- volume — a number from 0.0 to 1.0
- muted — boolean; muting does not change volume
- playbackRate — 1.0 is normal speed; 0.5 is half speed; 2.0 is double
- paused, ended, readyState — state inspection
The main difference between the two elements is that <video>
also renders a visual frame (and optionally a poster image) and adds a few
video-specific properties such as videoWidth and videoHeight, while
<audio> has no visual rendering of its own beyond the controls bar
when controls is present.
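Because both elements share HTMLMediaElement, the same helper functions work on either. The sketch below shows play/pause toggling, volume clamping and muting. The function names are hypothetical, and it assumes a current browser in which play() returns a Promise that rejects when playback is blocked.

```javascript
// Works with <audio> or <video>: both inherit play(), pause(), paused,
// volume and muted from HTMLMediaElement.

// Toggles playback and resolves to true if the element is now playing.
async function togglePlayback(media) {
  if (media.paused) {
    try {
      // play() returns a Promise that rejects (for example with a
      // NotAllowedError) when the browser blocks playback.
      await media.play();
    } catch (err) {
      console.warn(`Playback did not start: ${err.name}`);
      return false;
    }
  } else {
    media.pause();
  }
  return !media.paused;
}

// volume must stay within 0.0 to 1.0 (out-of-range values throw), so clamp.
function nudgeVolume(media, delta) {
  media.volume = Math.min(1, Math.max(0, media.volume + delta));
  return media.volume;
}

// Muting leaves the stored volume untouched, so unmuting restores it.
function toggleMute(media) {
  media.muted = !media.muted;
  return media.muted;
}
```

In a page you would call togglePlayback() from a click handler on a button, so that playback is always user-initiated.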
What the platform provides
Native media elements give you more than just a play button. The browser exposes a complete toolkit — most of which you can adopt incrementally:
Built-in playback UI
Adding the controls attribute gives the element a full playback interface
(play/pause, scrubber, volume, fullscreen) built and maintained by the browser
vendor, at the cost of a single attribute. The styling and exact layout
vary between browsers and operating systems, which is why you will sometimes see
sites build custom controls using JavaScript and the HTMLMediaElement API
instead. You will explore custom controls in
Tutorial 06: Controlling with JavaScript.
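As a small taste of what custom controls involve, here is a sketch of the time label many players show next to the scrubber. The helper names are hypothetical; the code relies only on currentTime and duration, and on the fact that duration is NaN before metadata has loaded (and Infinity for a stream with no known end).

```javascript
// Formats seconds as m:ss, or h:mm:ss for an hour or more.
// Non-finite values (NaN before metadata loads, Infinity for live
// streams) fall back to a placeholder.
function formatTime(seconds) {
  if (!Number.isFinite(seconds) || seconds < 0) return "--:--";
  const total = Math.floor(seconds);
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const ss = String(total % 60).padStart(2, "0");
  if (h > 0) return `${h}:${String(m).padStart(2, "0")}:${ss}`;
  return `${m}:${ss}`;
}

// Builds the "current / total" label a custom control bar might display.
function progressLabel(media) {
  return `${formatTime(media.currentTime)} / ${formatTime(media.duration)}`;
}

// In a page (label is a hypothetical element):
// video.addEventListener("timeupdate", () => {
//   label.textContent = progressLabel(video);
// });
```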
Multiple formats via <source>
Codec support varies between browsers. By nesting one or more
<source> elements inside <audio> or
<video>, you list candidate files in order of preference. The
browser tries each in sequence and uses the first format it can decode. This is how
you serve modern, efficient codecs to browsers that support them while still providing
a fallback for older ones. Codec choices are covered in depth in
Tutorial 04: Formats and Codecs.
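The selection step can be modelled in a few lines. This is a simplified sketch, not the browser's actual resource selection algorithm: it uses canPlayType(), which answers "", "maybe" or "probably" for a MIME type, skips candidates answered with "", and ignores the fact that a real browser also moves on if loading the chosen file fails. The file names are hypothetical.

```javascript
// Hypothetical candidates, listed in order of preference.
const sources = [
  { src: "clip.webm", type: 'video/webm; codecs="vp9"' },
  { src: "clip.mp4", type: "video/mp4" },
];

// Simplified model of <source> selection: skip any candidate whose type
// canPlayType() answers with "" (cannot play); otherwise take the first
// one. A <source> with no type attribute is tried rather than skipped.
// Real browsers also fall through to the next candidate if loading the
// chosen file fails, which this sketch does not model.
function pickSource(candidates, canPlayType) {
  for (const candidate of candidates) {
    if (!candidate.type || canPlayType(candidate.type) !== "") {
      return candidate;
    }
  }
  return null; // no candidate is playable
}

// In a page: pickSource(sources, (type) => video.canPlayType(type))
```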
Captions and subtitles via <track>
The <track> element attaches a WebVTT caption file to a media
element. The browser renders the cue text on top of the video at the right time,
provides a caption menu in the native controls, and exposes the cue data to JavaScript.
Captions are not optional for accessibility: WCAG 2.1 requires them for
prerecorded synchronized media (Success Criterion 1.2.2, Level A). See
Tutorial 05: Captions and Tracks.
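To make the WebVTT format concrete, the sketch below parses a tiny caption file and works out which cues apply at a given time, roughly what the browser does for you once a <track> is attached. It is a deliberate simplification (no cue settings, styling or regions), and the function names are hypothetical. In a real page you would read the parsed cues from the element's textTracks, for example a track's activeCues list, rather than parsing the file yourself.

```javascript
// Converts a WebVTT timestamp ("mm:ss.ttt" or "hh:mm:ss.ttt") to seconds.
function parseVttTimestamp(text) {
  const match = /^(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})$/.exec(text.trim());
  if (!match) throw new Error(`Not a WebVTT timestamp: ${text}`);
  const [, h = "0", m, s, ms] = match;
  return Number(h) * 3600 + Number(m) * 60 + Number(s) + Number(ms) / 1000;
}

// Minimal cue parser: blocks without a "-->" timing line (the WEBVTT
// header, NOTE and STYLE blocks) are skipped, and cue settings such as
// "align:start" after the end timestamp are ignored.
function parseCues(vttText) {
  const cues = [];
  const blocks = vttText.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n").filter((line) => line.trim() !== "");
    const timing = lines.findIndex((line) => line.includes("-->"));
    if (timing === -1) continue;
    const [startText, rest] = lines[timing].split("-->");
    const endText = rest.trim().split(/\s+/)[0];
    cues.push({
      start: parseVttTimestamp(startText),
      end: parseVttTimestamp(endText),
      // Any line before the timing line is an optional cue identifier.
      text: lines.slice(timing + 1).join("\n"),
    });
  }
  return cues;
}

// Texts of the cues active at time t, treating a cue as active
// while start <= t < end.
function activeCueTexts(cues, t) {
  return cues.filter((c) => c.start <= t && t < c.end).map((c) => c.text);
}
```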
Full scripting via HTMLMediaElement
Every aspect of playback — state, timing, buffering, events — is scriptable through
HTMLMediaElement. You can build media players that sync to external data,
skip chapters, react to user gestures, or drive animations precisely to the playhead
position. This is covered in
Tutorial 06: Controlling with JavaScript.
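As an example of scripting against the playhead, here is a sketch of chapter navigation: find which chapter currentTime falls in, then seek by assigning a new value to currentTime. The chapter list and function names are hypothetical.

```javascript
// Hypothetical chapter list: start times in seconds, sorted ascending.
const chapters = [
  { title: "Intro", start: 0 },
  { title: "Setup", start: 42 },
  { title: "Demo", start: 130 },
];

// Returns the index of the chapter that contains time t.
function chapterIndexAt(list, t) {
  let index = 0;
  for (let i = 0; i < list.length; i++) {
    if (list[i].start <= t) index = i;
  }
  return index;
}

// Seeks to the start of the next chapter, if any, by assigning currentTime.
function skipToNextChapter(media, list) {
  const next = chapterIndexAt(list, media.currentTime) + 1;
  if (next < list.length) {
    media.currentTime = list[next].start;
  }
  return media.currentTime;
}
```

Calling chapterIndexAt() from a timeupdate listener would let you highlight the current chapter title as playback progresses.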
Audio processing via the Web Audio API
The Web Audio API is a separate, powerful interface that lets you route audio from a media element through a graph of processing nodes — equalizers, compressors, visualizers, spatial audio, and more. It treats audio as a signal-processing pipeline, not just a source you can play and pause. See Tutorial 08: Web Audio API.
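A minimal sketch of such a graph, assuming an existing AudioContext and a media element: createMediaElementSource() wraps the element, a GainNode adjusts loudness, and the chain ends at the context's destination. The helper names and the decibel-based interface are illustrative assumptions. A GainNode works with a linear multiplier, so decibels are converted with gain = 10^(dB/20).

```javascript
// Converts decibels to the linear multiplier that GainNode.gain expects.
function dbToGain(db) {
  return Math.pow(10, db / 20);
}

// Builds the graph: media element -> MediaElementAudioSourceNode
// -> GainNode -> audioContext.destination (the speakers).
// Returns the GainNode so the caller can adjust gain.value later.
function routeThroughGain(audioContext, mediaElement, db) {
  const source = audioContext.createMediaElementSource(mediaElement);
  const gain = audioContext.createGain();
  gain.gain.value = dbToGain(db);
  // connect() returns its destination node, so calls can be chained.
  source.connect(gain).connect(audioContext.destination);
  return gain;
}
```

In a page you would typically create the AudioContext (or call its resume() method) inside a user gesture such as a click, because browsers may start contexts suspended under the same autoplay rules that apply to media elements. Once routed this way, the element's audio is heard only through the graph.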
A first look: video in the browser
Before diving into individual features, here is the simplest useful <video>
element: two source formats, a poster image shown before playback begins, and the
browser's native controls. The browser picks the first <source>
it can decode; the last line of text is the fallback shown only in browsers that do
not understand <video> at all (essentially none in 2024).
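A minimal sketch of the element described above, built as a string so its structure is easy to inspect. The file names (intro.webm, intro.mp4, intro-poster.jpg) and the helper names are hypothetical; the order of the sources is the order of preference.

```javascript
// Escapes characters that would break out of an HTML attribute or text.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Builds a <video> with native controls, a poster image, one <source>
// per format and fallback text. Deliberately no autoplay attribute.
function videoMarkup({ poster, sources, fallback }) {
  const sourceTags = sources
    .map((s) => `  <source src="${escapeHtml(s.src)}" type="${escapeHtml(s.type)}">`)
    .join("\n");
  return [
    `<video controls poster="${escapeHtml(poster)}">`,
    sourceTags,
    `  <p>${escapeHtml(fallback)}</p>`,
    `</video>`,
  ].join("\n");
}

const html = videoMarkup({
  poster: "intro-poster.jpg",
  sources: [
    { src: "intro.webm", type: "video/webm" },
    { src: "intro.mp4", type: "video/mp4" },
  ],
  fallback: "Your browser does not support the video element.",
});
console.log(html);
```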
Notice there is no autoplay attribute. Browsers block autoplay with
sound by default (and have done so since around 2018). Using controls
and letting the user initiate playback is both the correct approach for accessibility
and the one that will actually work reliably across browsers.
Self-host vs embed
When you add video to a web project you have two fundamentally different options: host the file yourself on your own server, or embed a player from a third-party service like YouTube or Vimeo. Both approaches use HTML elements, but the trade-offs are significant enough that choosing between them is a deliberate architectural decision.
Self-hosting
When you self-host, you upload the video file(s) to your own server (or a CDN you
control) and use <video> with <source> elements
to reference them directly. The benefits are considerable:
- Full control — you choose the codec, bitrate, poster image, captions, and controls
- Privacy — no third-party JavaScript runs on your page; no viewer data is shared with an external service
- No ads, no recommendations — the browser plays your file; there is no platform UI injected around it
- Offline capability — a Service Worker can cache self-hosted files for offline playback
The costs are real too:
- Bandwidth — video files are large; every view transfers data from your server, and that transfer costs money at scale
- No adaptive streaming out of the box — a static MP4 is a single fixed-bitrate file; platforms like YouTube use HLS or DASH to serve different quality levels to different network conditions automatically
- Encoding — you must encode and store multiple formats (MP4/H.264, WebM/VP9 or AV1) yourself
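To see what adaptive streaming adds, consider the decision an HLS or DASH player makes for each segment: which quality level to fetch next. The sketch below is a simplified, throughput-only heuristic with a hypothetical rendition ladder and safety factor; real players also weigh buffer levels and other signals, and this is not how any particular platform implements it.

```javascript
// Hypothetical rendition ladder: bitrates in kilobits per second.
const renditions = [
  { label: "240p", kbps: 400 },
  { label: "480p", kbps: 1200 },
  { label: "720p", kbps: 3000 },
  { label: "1080p", kbps: 6000 },
];

// Picks the highest rendition whose bitrate fits within a safety margin
// of the measured throughput; falls back to the lowest rendition.
function chooseRendition(ladder, measuredKbps, safety = 0.8) {
  const budget = measuredKbps * safety;
  const sorted = [...ladder].sort((a, b) => a.kbps - b.kbps);
  let choice = sorted[0];
  for (const r of sorted) {
    if (r.kbps <= budget) choice = r;
  }
  return choice;
}
```

A static MP4 is the equivalent of a ladder with a single rung: every viewer gets the same bitrate regardless of their connection.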
Embedding from a third-party service
YouTube, Vimeo, and similar platforms provide an <iframe> embed
code. You paste it into your HTML and the platform serves everything: the file,
the player, the adaptive streaming, the CDN. The trade-offs are the inverse:
- Free hosting and CDN — the platform absorbs bandwidth costs
- Adaptive streaming — the player automatically switches quality based on the viewer's connection
- Third-party dependency — if the service goes down or removes your video, it disappears from your page
- Privacy cost — the platform's JavaScript runs in the iframe and can track viewers, even on your site
- Reduced control — you cannot easily script the player via
HTMLMediaElement; each platform has its own JavaScript API
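For comparison with self-hosted markup, here is a sketch that builds a YouTube embed. It assumes YouTube's privacy-enhanced youtube-nocookie.com domain and its start parameter; the video ID is a placeholder, and privacy-enhanced mode reduces, but does not remove, the tracking described above. The title attribute gives the iframe an accessible name.

```javascript
// Escapes a value for use inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
}

// Builds a privacy-enhanced YouTube embed. videoId is a placeholder.
function youtubeEmbed(videoId, title, startSeconds = 0) {
  const url = new URL(
    `https://www.youtube-nocookie.com/embed/${encodeURIComponent(videoId)}`
  );
  // "start" begins playback at the given offset in whole seconds.
  if (startSeconds > 0) {
    url.searchParams.set("start", String(Math.floor(startSeconds)));
  }
  // loading="lazy" defers loading the third-party frame until it nears the viewport.
  return (
    `<iframe src="${escapeAttr(url.href)}" title="${escapeAttr(title)}" ` +
    `loading="lazy" allowfullscreen></iframe>`
  );
}
```

Note that you control only the iframe's attributes and URL parameters; everything inside the frame belongs to the platform.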
For a production site with high traffic, a hybrid is common: use a platform for
publicly shared marketing video (where discovery and streaming quality matter most),
and self-host short clips or instructional content where privacy and control are
the priority. Embedding external video using <iframe> is covered in
Tutorial 07: Embedding External Video.