10: URI Architecture

URI, URL, URN: The Terminology

Before diving into architecture, let's clarify the terms—they're often confused.

URI: Uniform Resource Identifier

The umbrella term. A URI is any string that identifies a resource. It encompasses both URLs and URNs.

// All of these are URIs:
https://example.com/page           // Also a URL
urn:isbn:0451450523                // Also a URN
mailto:user@example.com            // URL (locates a mailbox)
tel:+1-555-555-5555                // URL (locates a phone)

URL: Uniform Resource Locator

A URI that tells you how to access a resource—it provides a location and retrieval mechanism.

// URLs specify location AND access method:
https://example.com/page     // Use HTTPS to get from this server
ftp://files.example.com/doc  // Use FTP to get from this server
file:///home/user/doc.txt    // Access local filesystem

URN: Uniform Resource Name

A URI that names a resource persistently—independent of location or access method.

// URNs name resources without locating them:
urn:isbn:0451450523         // This book, wherever it is
urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6  // This unique thing
urn:ietf:rfc:3986           // This RFC document
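
As a rough illustration of the syntax above, a URN splits into a namespace identifier (NID) and a namespace-specific string (NSS). The sketch below is a minimal parser that assumes the simple `urn:<NID>:<NSS>` shape. The names `Urn` and `parse_urn` are hypothetical, and the code does not implement the full URN grammar.

```python
from typing import NamedTuple


class Urn(NamedTuple):
    nid: str  # namespace identifier, e.g. "isbn"
    nss: str  # namespace-specific string, e.g. "0451450523"


def parse_urn(text: str) -> Urn:
    """Split 'urn:<NID>:<NSS>' into its two parts (simplified sketch)."""
    scheme, _, rest = text.partition(":")
    if scheme.lower() != "urn":
        raise ValueError(f"not a URN: {text!r}")
    nid, sep, nss = rest.partition(":")
    if not sep or not nid or not nss:
        raise ValueError(f"malformed URN: {text!r}")
    # The scheme and NID are case-insensitive; the NSS is kept as written.
    return Urn(nid.lower(), nss)


print(parse_urn("urn:isbn:0451450523"))
print(parse_urn("urn:ietf:rfc:3986"))
```

Note that only the first two colons are structural: in `urn:ietf:rfc:3986` the NSS is `rfc:3986`, and its meaning is up to the `ietf` namespace.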

The Relationship

┌─────────────────────────────────────────────┐
│                    URI                      │
│         (Uniform Resource Identifier)       │
│                                             │
│   ┌─────────────┐       ┌─────────────┐     │
│   │    URL      │       │    URN      │     │
│   │  (Locator)  │       │   (Name)    │     │
│   │             │       │             │     │
│   │ http://...  │       │ urn:...     │     │
│   │ ftp://...   │       │             │     │
│   │ mailto:...  │       │             │     │
│   └─────────────┘       └─────────────┘     │
│                                             │
└─────────────────────────────────────────────┘

URIs include both URLs (locators) and URNs (names)
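
One rough way to see this relationship in code is to classify a URI by its scheme. The sketch below uses Python's `urllib.parse.urlsplit`. The helper name `classify_uri` is hypothetical, and the rule "scheme `urn` means URN, any other scheme is treated as a URL" is a simplifying assumption that mirrors the diagram, not a formal definition.

```python
from urllib.parse import urlsplit


def classify_uri(uri: str) -> str:
    """Return 'URN', 'URL', or 'not a URI' using a scheme-based rule of thumb."""
    scheme = urlsplit(uri).scheme  # urlsplit lowercases the scheme
    if not scheme:
        return "not a URI"  # no scheme, so not an absolute URI
    if scheme == "urn":
        return "URN"
    return "URL"


for example in (
    "https://example.com/page",
    "urn:isbn:0451450523",
    "mailto:user@example.com",
    "tel:+1-555-555-5555",
):
    print(f"{example:32} {classify_uri(example)}")
```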

The Original Vision

In the 1990s, the architects of the web envisioned a three-part system for identifying resources:

1. URLs: Where Things Are

URLs would tell you how to get a resource right now. They're practical, concrete, and immediately useful.

https://publisher.com/books/978-0-123-45678-9.pdf

Problem: URLs are fragile. When the publisher reorganizes their website, the URL breaks. The resource still exists—you just can't find it.

2. URNs: What Things Are

URNs would provide persistent, location-independent names. A URN would identify a resource forever, regardless of where it's stored.

urn:isbn:978-0-123-45678-9

This ISBN URN identifies the same book whether it's at publisher.com, amazon.com, or your local library. The name never changes even as locations change.

3. URCs: What Things Are Like

URCs (Uniform Resource Characteristics or Citations) would describe resources—metadata like author, date, format, subject, and relationships.

// Hypothetical URC (never standardized):
urc:isbn:978-0-123-45678-9
  author: "Jane Smith"
  title: "Introduction to Web Architecture"
  published: 2024
  format: PDF
  language: en
  subjects: [web, architecture, URIs]

The Complete Vision

These three pieces would work together:

URN provides a permanent name
URC provides metadata about the resource
Resolution service maps URN → current URL(s)

// The dream workflow:
1. User requests: urn:isbn:978-0-123-45678-9
2. Resolver returns:
   - URC metadata (what is this?)
   - Current URLs (where can I get it?)
     - https://publisher.com/books/intro-web.pdf
     - https://library.edu/ebooks/isbn-978-0-123-45678-9
     - https://archive.org/details/intro-web-architecture
3. User chooses preferred source

In principle, this would have solved link rot at its root: URLs could change freely, URNs would persist, and the resolution service would maintain the mapping between them.
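
The workflow above can be sketched as a toy, in-memory resolver. This is only an illustration of the idea: no such universal service exists, the names `Resolution`, `REGISTRY`, and `resolve` are hypothetical, and the data is copied from the hypothetical examples in this section.

```python
from dataclasses import dataclass, field


@dataclass
class Resolution:
    metadata: dict[str, str]                       # URC-style description
    urls: list[str] = field(default_factory=list)  # current locations


# Hypothetical registry, filled with the example values from the text above.
REGISTRY: dict[str, Resolution] = {
    "urn:isbn:978-0-123-45678-9": Resolution(
        metadata={
            "author": "Jane Smith",
            "title": "Introduction to Web Architecture",
        },
        urls=[
            "https://publisher.com/books/intro-web.pdf",
            "https://library.edu/ebooks/isbn-978-0-123-45678-9",
            "https://archive.org/details/intro-web-architecture",
        ],
    ),
}


def resolve(urn: str) -> Resolution:
    """Look up a URN; the name stays fixed while the URL list may change."""
    try:
        return REGISTRY[urn]
    except KeyError:
        raise LookupError(f"no resolver entry for {urn}") from None


result = resolve("urn:isbn:978-0-123-45678-9")
print(result.metadata["title"])
for url in result.urls:
    print(" ", url)
```

The design point is the indirection: when a publisher reorganizes its site, only the `urls` list in the registry changes, and every citation of the URN keeps working.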

What Actually Happened

The vision was elegant. The reality was different.

URLs Won

URLs became the de facto identifier system for the web. They're:

Self-resolving: Put a URL in a browser, get the resource
Simple: No resolver infrastructure needed
Immediate: No indirection or lookup delays
Widely supported: Every browser, every tool understands them

URNs: Limited Adoption

URNs exist but never achieved mainstream web use:

// URN namespaces that exist:
urn:isbn:...       // Books (International Standard Book Number)
urn:issn:...       // Serials (periodicals)
urn:ietf:rfc:...   // IETF RFCs
urn:uuid:...       // Universally Unique Identifiers
urn:oid:...        // Object Identifiers (ITU-T/ISO)

The problem: you can't put a URN in a browser address bar and get anything. There's no universal resolution infrastructure.

URCs: Never Materialized

URCs never progressed beyond proposal stage. Instead, metadata evolved separately:

HTML meta tags
Dublin Core
Schema.org
Open Graph
JSON-LD

These are embedded in or alongside resources—not in a separate URC system.

Why the Vision Failed

Infrastructure cost: Universal resolution requires global infrastructure nobody wanted to build or pay for
Governance challenges: Who controls the resolver? Who assigns URNs?
URLs were good enough: Despite link rot, URLs work well enough for most uses
Chicken-and-egg: Without widespread use, no reason to build infrastructure; without infrastructure, no reason to use URNs
Web's growth outpaced standards: The web grew faster than standards bodies could coordinate

Modern Persistent Identifier Systems

While the universal URN vision didn't materialize, domain-specific persistent identifier systems have succeeded.

DOI: Digital Object Identifier

DOIs are the de facto standard for scholarly publishing.

// DOI format:
doi:10.1000/xyz123

// Resolvable via HTTPS:
https://doi.org/10.1000/xyz123

// Resolves to current location of the document

DOI success factors:

Domain-specific (academic publishing)
Central governance (International DOI Foundation, with registration agencies such as Crossref and DataCite)
Publishers pay to register and maintain
HTTP resolver makes them URL-compatible

Handle System

The underlying technology for DOIs, usable for any identifier:

// Handle format:
hdl:11234/56789

// Also HTTP-resolvable:
https://hdl.handle.net/11234/56789

ORCID: Researcher Identifiers

Persistent identifiers for researchers (people, not documents):

// ORCID format:
https://orcid.org/0000-0002-1825-0097

// Links a researcher to all their work across publishers

ARK: Archival Resource Key

Used by libraries and archives:

// ARK format:
ark:/12345/xyz789

// Resolvable via a resolver such as N2T (Name-to-Thing):
https://n2t.net/ark:/12345/xyz789

PURL: Persistent URL

A redirection service that provides URL persistence:

// PURL redirects to current location:
https://purl.org/dc/terms/title
// Redirects to Dublin Core vocabulary

Common Pattern

Note what these systems largely have in common:

HTTP resolver as a bridge (work in browsers)
Domain-specific governance (not universal)
Institutional commitment to maintenance
Registration or membership-fee model for sustainability (in most cases)
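
The "HTTP resolver as a bridge" pattern amounts to prepending a resolver prefix to the identifier. Here is a minimal sketch, assuming the resolver hosts shown in the examples above; the names `RESOLVERS` and `pid_to_url` are hypothetical, and the code does no percent-encoding or validation of the identifier itself.

```python
# Resolver prefixes taken from the examples in this section.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
    "ark": "https://n2t.net/ark:",
}


def pid_to_url(pid: str) -> str:
    """Turn 'doi:...', 'hdl:...', or 'ark:...' into an HTTPS resolver URL."""
    scheme, sep, rest = pid.partition(":")
    prefix = RESOLVERS.get(scheme.lower())
    if not sep or prefix is None or not rest:
        raise ValueError(f"unsupported identifier: {pid!r}")
    return prefix + rest


for pid in ("doi:10.1000/xyz123", "hdl:11234/56789", "ark:/12345/xyz789"):
    print(pid, "->", pid_to_url(pid))
```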

The Semantic Web Approach

The Semantic Web offered another take on resource identification and metadata.

Linked Data Principles

Use URIs as names for things
Use HTTP URIs so people can look up names
When someone looks up a URI, provide useful information (using RDF)
Include links to other URIs so people can discover more things

RDF: Resource Description Framework

RDF provides the metadata layer that URCs envisioned:

# RDF triples in Turtle syntax: subject, predicate, object
<https://example.com/book/123>
  <http://purl.org/dc/terms/title>
  "Introduction to Web Architecture" .

<https://example.com/book/123>
  <http://purl.org/dc/terms/creator>
  <https://example.com/person/jane-smith> .
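
To make the triple structure concrete, here is a small sketch that models the two statements above as plain Python tuples and prints them one per line in N-Triples form. It deliberately avoids any RDF library; the names `Triple` and `to_ntriples` are hypothetical, and only IRIs and plain string literals are handled.

```python
from typing import NamedTuple

DCTERMS = "http://purl.org/dc/terms/"
BOOK = "https://example.com/book/123"


class Triple(NamedTuple):
    subject: str             # IRI
    predicate: str           # IRI
    obj: str                 # IRI, or a plain string literal
    obj_is_iri: bool = False


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as an N-Triples line (plain string literals only)."""
    if t.obj_is_iri:
        obj = f"<{t.obj}>"
    else:
        escaped = (
            t.obj.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        obj = f'"{escaped}"'
    return f"<{t.subject}> <{t.predicate}> {obj} ."


triples = [
    Triple(BOOK, DCTERMS + "title", "Introduction to Web Architecture"),
    Triple(BOOK, DCTERMS + "creator", "https://example.com/person/jane-smith", True),
]
for t in triples:
    print(to_ntriples(t))
```

The second triple shows the linking idea: its object is another URI, so a client can follow it to learn more about the author.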

Schema.org

A more practical, HTML-embedded approach to metadata:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Book",
  "name": "Introduction to Web Architecture",
  "author": {
    "@type": "Person",
    "name": "Jane Smith"
  },
  "isbn": "978-0-123-45678-9",
  "url": "https://example.com/books/intro-web"
}
</script>

This embeds URC-like metadata directly in HTML—no separate system needed.
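
If the metadata lives in a database rather than in hand-written HTML, the same block can be generated. The following is a sketch using only the standard `json` module; the function name `book_jsonld_script` and its parameters are hypothetical, and the property set simply mirrors the example above.

```python
import json


def book_jsonld_script(name: str, author: str, isbn: str, url: str) -> str:
    """Render Schema.org Book metadata as an embeddable JSON-LD <script> tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": name,
        "author": {"@type": "Person", "name": author},
        "isbn": isbn,
        "url": url,
    }
    # Escape "<" so a value containing "</script>" cannot end the tag early.
    payload = json.dumps(data, indent=2).replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


print(
    book_jsonld_script(
        name="Introduction to Web Architecture",
        author="Jane Smith",
        isbn="978-0-123-45678-9",
        url="https://example.com/books/intro-web",
    )
)
```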

Lessons for Modern Development

What can we learn from URI architecture history?

1. Simplicity Wins

URLs succeeded despite being "worse" than URNs because they're simpler. No resolver needed—just paste and go.

// Simple wins over elegant:
https://example.com/doc.html     // Works immediately
urn:example:doc-001              // Requires infrastructure

2. Backwards Compatibility Matters

Modern persistent identifiers (PIDs) such as DOIs work because they bridge to URLs. Pure URNs without HTTP resolvers are dead ends.

// DOI success formula:
doi:10.1234/xyz    →    https://doi.org/10.1234/xyz
                         ↓
                   (resolves in any browser)

3. Domain-Specific Solutions Beat Universal Ones

DOIs work for academic publishing. ORCIDs work for researchers. Neither tried to solve identification universally.

4. Institutional Commitment Required

Persistent identifiers require someone to maintain the mapping. Without governance and funding, persistence is impossible.

5. Embed Metadata, Don't Separate It

URCs as a separate system failed. Schema.org embedding succeeded. Keep data and metadata together when possible.

Practical Implications

For Web Developers

Use URLs as identifiers: They're the universal system that actually works
Design permanent URLs: Apply the "Cool URIs don't change" principles (keep file extensions, software mechanisms, and other details likely to change out of the URL)
Embed metadata: Use Schema.org, Open Graph, meta tags
Implement redirects: When URLs must change, maintain redirects forever
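
Maintaining redirects over many years tends to produce chains (an old URL points to a newer one that has itself moved). A minimal sketch of resolving such a chain follows; the table `REDIRECTS`, its paths, and the function `final_location` are all hypothetical. A server would typically answer the old path with a permanent redirect to the final location.

```python
# Hypothetical redirect table: old path -> newer path.
REDIRECTS = {
    "/docs/intro.php": "/docs/intro",
    "/docs/intro": "/guides/introduction",
}


def final_location(path: str, redirects: dict[str, str]) -> str:
    """Follow old-to-new mappings until reaching a path with no redirect."""
    seen = {path}
    while path in redirects:
        path = redirects[path]
        if path in seen:
            raise ValueError(f"redirect loop at {path}")
        seen.add(path)
    return path


print(final_location("/docs/intro.php", REDIRECTS))
print(final_location("/about", REDIRECTS))
```

Resolving the chain up front lets the server send clients straight to the current location in one hop, and the loop check catches a table that has been edited into a cycle.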

For Content Publishers

Consider DOIs: For academic or reference content that needs citation permanence
Use canonical URLs: Declare the authoritative URL for each resource
Plan for institutional change: What happens to your URLs if the organization restructures?

For API Designers

Resource identifiers in URLs: /api/v2/users/123
Stable identifiers: IDs shouldn't change even if storage changes
Version your APIs: /api/v2/ allows evolution without breaking clients

For Data Architects

Separate identity from location: Internal IDs shouldn't be URLs
Plan identifier schemes: How will you generate unique, stable IDs?
Consider federated identity: Will resources be referenced across systems?
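
A small sketch of separating identity from location: the record carries a stable internal ID, and URLs are derived from it only at the edges. The `Document` class, the `public_url` helper, and the `/documents/` path layout are hypothetical choices for illustration.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    title: str
    # Stable internal identity: generated once, never derived from a URL.
    id: uuid.UUID = field(default_factory=uuid.uuid4)


def public_url(doc: Document, base: str) -> str:
    """Derive a location from the ID; the base can change without touching the ID."""
    return f"{base.rstrip('/')}/documents/{doc.id}"


doc = Document("Introduction to Web Architecture")
print(public_url(doc, "https://example.com"))
print(public_url(doc, "https://cdn.example.net/v2"))  # moved host, same doc.id
```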

The Future of Resource Identification

What might evolve?

Decentralized Identifiers (DIDs)

A W3C standard for decentralized, self-sovereign identifiers:

did:example:123456789abcdefghi

// Properties:
// - No central registration authority
// - Cryptographically verifiable
// - Can represent any entity (person, org, thing)
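
Syntactically, the identifier above has three colon-separated parts: the `did` scheme, a method name, and a method-specific ID. The sketch below splits them with a deliberately simplified pattern; the names `Did` and `parse_did` are hypothetical, and the official grammar is stricter about which characters the ID may contain.

```python
import re
from typing import NamedTuple

# Simplified pattern: "did:" + lowercase method name + ":" + method-specific ID.
_DID_RE = re.compile(r"^did:([a-z0-9]+):(\S+)$")


class Did(NamedTuple):
    method: str
    method_specific_id: str


def parse_did(text: str) -> Did:
    """Split a DID into its method and method-specific ID (simplified sketch)."""
    match = _DID_RE.match(text)
    if match is None:
        raise ValueError(f"not a DID: {text!r}")
    return Did(match.group(1), match.group(2))


print(parse_did("did:example:123456789abcdefghi"))
```

The method name is what replaces a central registry: each method defines its own rules for creating and resolving its identifiers.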

Content-Addressable Storage

Systems like IPFS identify content by hash, not location:

// IPFS content identifier:
ipfs://QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco

// Same content = same identifier, regardless of where it's stored
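
The core idea can be shown with an ordinary hash function. This is a sketch of content addressing in general, not of IPFS's actual identifier encoding; the `content_id` helper and its `sha256-` prefix are hypothetical.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive an identifier from the bytes themselves (hex SHA-256 digest)."""
    return "sha256-" + hashlib.sha256(data).hexdigest()


original = b"Introduction to Web Architecture"
copy_elsewhere = bytes(original)        # same bytes, different "location"
revised = original + b", 2nd edition"   # any edit yields a new identifier

print(content_id(original) == content_id(copy_elsewhere))  # True
print(content_id(original) == content_id(revised))         # False
```

The second comparison is the flip side of permanence: a revised document gets an unrelated identifier, so tracking "the latest version" needs some additional naming layer.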

Web3 and Blockchain-Based Identity

Blockchain-based systems offer another approach to persistent, decentralized identification, though with significant tradeoffs in complexity and, on proof-of-work chains, energy use.

The Ongoing Tension

The fundamental tension remains:

Locations are practical but impermanent
Names are permanent but require resolution
Content hashes are permanent but don't handle versioning (every edit produces a new identifier)

No perfect solution exists. Each system makes tradeoffs.