Server Architecture Patterns

How servers handle thousands of simultaneous connections

The Concurrency Challenge

A web server must handle many clients simultaneously. While one client waits for a database query, another is uploading a file, and a third is downloading an image. How a server manages these concurrent connections defines its architecture—and determines its performance characteristics under load.

There are three fundamental approaches, each with distinct trade-offs:

  1. Process-per-request — Fork a new process for each connection
  2. Thread-per-request — Spawn a thread for each connection
  3. Event-driven — One thread handles many connections via async I/O

Process-Per-Request

The simplest model: when a request arrives, the server forks a new process to handle it. Each process has its own memory space and runs independently.

┌─────────────────────────────────────────────────────────────┐
│                     Master Process                          │
│                    (listens on port 80)                     │
└──────────────────────────┬──────────────────────────────────┘
                           │ fork()
         ┌─────────────────┼─────────────────┐
         │                 │                 │
         ▼                 ▼                 ▼
   ┌───────────┐     ┌───────────┐     ┌───────────┐
   │  Worker   │     │  Worker   │     │  Worker   │
   │ Process 1 │     │ Process 2 │     │ Process 3 │
   │           │     │           │     │           │
   │ Request A │     │ Request B │     │ Request C │
   └───────────┘     └───────────┘     └───────────┘
Apache prefork MPM: each request gets its own process

Apache's prefork MPM uses this model. It pre-forks a pool of worker processes, and each incoming connection is handed to an available worker.

# Apache prefork configuration
<IfModule mpm_prefork_module>
    StartServers              5     # Initial workers
    MinSpareServers           5     # Minimum idle workers
    MaxSpareServers          10     # Maximum idle workers
    MaxRequestWorkers       150     # Max concurrent requests
    MaxConnectionsPerChild    0     # Connections before worker restarts (0 = never)
</IfModule>
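
To make the model concrete, here is a minimal sketch of a forking server in Python. It is not how Apache is implemented, just the same pattern in miniature: the port and canned response are placeholders, and it is Unix-only because it relies on os.fork().

import os
import signal
import socket

signal.signal(signal.SIGCHLD, signal.SIG_IGN)   # auto-reap exited children

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("0.0.0.0", 8080))                  # placeholder port
server.listen(128)

while True:
    conn, addr = server.accept()
    pid = os.fork()
    if pid == 0:                                # child: handle one request
        server.close()                          # child doesn't need the listener
        request = conn.recv(4096)               # read the (assumed small) request
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
        conn.close()
        os._exit(0)                             # exit without parent's cleanup
    else:                                       # parent: keep accepting
        conn.close()                            # parent doesn't need this socket

Note the symmetry: each side closes the socket it doesn't need, so the connection's lifetime is owned entirely by the child process.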

Advantages

  • Complete isolation—one crash doesn't affect others
  • Works with non-thread-safe code (old PHP)
  • Simple mental model
  • Easy debugging (one process = one request)

Disadvantages

  • High memory usage (~30MB per process)
  • Process creation overhead
  • Hard limit on concurrent connections
  • Context switching between processes is expensive

Thread-Per-Request

An improvement on process-per-request: threads share memory within a single process, reducing overhead while still providing parallelism.

┌─────────────────────────────────────────────────────────────┐
│                      Server Process                         │
│                                                             │
│   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐   │
│   │ Thread  │   │ Thread  │   │ Thread  │   │ Thread  │   │
│   │    1    │   │    2    │   │    3    │   │    4    │   │
│   │         │   │         │   │         │   │         │   │
│   │ Req A   │   │ Req B   │   │ Req C   │   │ (idle)  │   │
│   └─────────┘   └─────────┘   └─────────┘   └─────────┘   │
│                                                             │
│                   Shared Memory Space                       │
└─────────────────────────────────────────────────────────────┘
Apache worker/event MPM: threads within processes

Apache's worker MPM combines processes and threads: multiple processes each run multiple threads. This balances isolation with efficiency.

# Apache worker configuration
<IfModule mpm_worker_module>
    StartServers          2     # Initial processes
    MinSpareThreads      25     # Minimum idle threads (across all processes)
    MaxSpareThreads      75     # Maximum idle threads
    ThreadsPerChild      25     # Threads per process
    MaxRequestWorkers   150     # Max concurrent requests
</IfModule>
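
The same server, sketched in Python with one thread per connection instead of one process. The port and response are again placeholders; a production server would bound the thread count with a pool (in Python, concurrent.futures.ThreadPoolExecutor) rather than spawning without limit.

import socket
import threading

def handle(conn):
    # Each request runs in its own thread; memory (caches, pools) is shared.
    request = conn.recv(4096)
    conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
    conn.close()

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("0.0.0.0", 8080))    # placeholder port
server.listen(128)

while True:
    conn, addr = server.accept()
    # daemon=True so stray handler threads don't block interpreter shutdown
    threading.Thread(target=handle, args=(conn,), daemon=True).start()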

Advantages

  • Lower memory than process-per-request
  • Faster context switching
  • Shared caches and connection pools
  • Better resource utilization

Disadvantages

  • Thread safety issues (race conditions, deadlocks)
  • One crash can kill all threads in the process
  • Still limited by thread count
  • Stack memory per thread (~1MB default)

Event-Driven (Async I/O)

The modern approach: instead of dedicating a thread to each connection, a single thread manages thousands of connections using non-blocking I/O and an event loop.

┌─────────────────────────────────────────────────────────────┐
│                     Event Loop Thread                       │
│                                                             │
│   ┌─────────────────────────────────────────────────────┐   │
│   │                    epoll / kqueue                    │   │
│   │                                                      │   │
│   │   Ready: [conn_42, conn_187, conn_5, conn_891]       │   │
│   └─────────────────────────────────────────────────────┘   │
│                            │                                │
│              ┌─────────────┴─────────────┐                  │
│              ▼                           ▼                  │
│   ┌─────────────────────┐     ┌─────────────────────┐      │
│   │   Handle conn_42    │     │   Handle conn_187   │      │
│   │   (read request)    │     │   (send response)   │      │
│   └─────────────────────┘     └─────────────────────┘      │
│                                                             │
│   10,000+ connections managed by one thread                 │
└─────────────────────────────────────────────────────────────┘
Nginx/Node.js: event-driven, non-blocking I/O

Nginx and Node.js use this model. The server never blocks waiting for a single connection—it continuously processes whichever connections are ready.

# Nginx worker configuration
worker_processes auto;            # One worker process per CPU core

events {
    worker_connections 10000;     # Connections per worker
}

# With 8 cores: 8 × 10,000 = 80,000 concurrent connections
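
To see the event loop itself, here is a minimal sketch using Python's selectors module, which wraps epoll or kqueue where available. The port and response are placeholders; real servers layer protocol parsing and write buffering on top of this loop.

import selectors
import socket

sel = selectors.DefaultSelector()      # epoll on Linux, kqueue on BSD/macOS

def accept(server):
    conn, _addr = server.accept()
    conn.setblocking(False)            # never let a socket block the loop
    sel.register(conn, selectors.EVENT_READ, handle)

def handle(conn):
    data = conn.recv(4096)             # socket is readable, so this won't block
    if data:
        # A response this small fits the send buffer in one call; a real
        # server would buffer writes and wait for EVENT_WRITE readiness.
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
    sel.unregister(conn)
    conn.close()

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("0.0.0.0", 8080))         # placeholder port
server.listen(1024)
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

while True:                            # the event loop: one thread, many sockets
    for key, _events in sel.select():
        key.data(key.fileobj)          # dispatch to the registered callback

The key invariant is that no callback may block: everything the loop dispatches must return quickly, or every other connection waits.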

Advantages

  • Massive concurrency (tens of thousands)
  • Very low memory per connection
  • No context switching overhead
  • Excellent for I/O-bound workloads

Disadvantages

  • CPU-bound work blocks all connections
  • Complex programming model (callbacks, promises)
  • Debugging is harder (no stack per request)
  • Must avoid blocking operations

The blocking trap: In an event-driven server, a single blocking operation (synchronous file read, CPU-intensive computation) stalls all connections. Node.js developers learn this the hard way when they accidentally use fs.readFileSync() in a request handler.
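
The trap is easy to reproduce. This sketch uses Python's asyncio (the same single-threaded loop model as Node.js) rather than Node.js itself: the blocking time.sleep() serializes the handlers, while the awaitable asyncio.sleep() lets them overlap.

import asyncio
import time

async def bad_handler():
    time.sleep(1)             # blocking call: stalls the entire event loop

async def good_handler():
    await asyncio.sleep(1)    # yields to the loop: other work proceeds

async def main():
    t0 = time.monotonic()
    await asyncio.gather(good_handler(), good_handler(), good_handler())
    print(f"non-blocking: {time.monotonic() - t0:.1f}s")   # ~1s: overlapped

    t0 = time.monotonic()
    await asyncio.gather(bad_handler(), bad_handler(), bad_handler())
    print(f"blocking:     {time.monotonic() - t0:.1f}s")   # ~3s: serialized

asyncio.run(main())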

The C10K Problem

In 1999, Dan Kegel posed the "C10K problem": how can a single server handle 10,000 concurrent connections? At the time, this seemed impossibly ambitious. The answer was the event-driven architecture.

Architecture          Memory for 10K connections   Practical limit
Process-per-request   ~300 GB (30MB × 10K)         ~500 connections
Thread-per-request    ~10 GB (1MB × 10K)           ~2,000 connections
Event-driven          ~100 MB (~10KB × 10K)        100,000+ connections
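
The memory column is just per-connection cost times 10,000; a quick check of the table's figures:

# Per-connection memory (MB, from the table) × 10,000 connections
per_connection_mb = {
    "Process-per-request": 30,      # ~30 MB per process
    "Thread-per-request":   1,      # ~1 MB stack per thread
    "Event-driven":         0.01,   # ~10 KB of state per connection
}
for model, mb in per_connection_mb.items():
    print(f"{model:>20}: {mb * 10_000 / 1_000:g} GB")   # 300, 10, 0.1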

Today we talk about C100K and C1M—handling hundreds of thousands or millions of concurrent connections. Event-driven architecture makes this possible, but the choice of architecture depends on your workload, not just connection count.

Choosing the Right Architecture

There's no universally "best" architecture. The right choice depends on your workload characteristics:

Workload                       Best Architecture             Why
High concurrency, I/O-bound    Event-driven (Nginx)          Minimal overhead per connection
CPU-intensive processing       Thread/process pool           Utilize multiple cores without blocking
Legacy non-thread-safe code    Process-per-request           Process isolation prevents race conditions
Mixed workload                 Hybrid (Nginx + app server)   Right tool for each job

The Hybrid Approach

Most production systems combine architectures. A common pattern:

                        Internet
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│              Nginx (Event-driven)                           │
│                                                             │
│   • TLS termination        • Static file serving           │
│   • Compression            • Rate limiting                  │
│   • Load balancing         • Connection pooling             │
└──────────────────────────┬──────────────────────────────────┘
                           │
         ┌─────────────────┼─────────────────┐
         │                 │                 │
         ▼                 ▼                 ▼
   ┌───────────┐     ┌───────────┐     ┌───────────┐
   │  Node.js  │     │  Node.js  │     │  Node.js  │
   │  Process  │     │  Process  │     │  Process  │
   │           │     │           │     │           │
   │ App Logic │     │ App Logic │     │ App Logic │
   └───────────┘     └───────────┘     └───────────┘
Nginx handles connections, Node.js handles application logic

Nginx excels at connection handling, TLS, and static files. The application server (Node.js, Python, Ruby) handles business logic. Each component does what it's best at.
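
A sketch of the glue: an Nginx configuration that terminates TLS, serves static files directly, and proxies everything else to the Node.js pool. All domain names, ports, and paths here are hypothetical.

# Hypothetical reverse-proxy configuration
upstream app_servers {
    server 127.0.0.1:3000;               # Node.js process 1
    server 127.0.0.1:3001;               # Node.js process 2
    server 127.0.0.1:3002;               # Node.js process 3
}

server {
    listen 443 ssl;
    server_name example.com;             # placeholder domain
    ssl_certificate     /etc/nginx/cert.pem;   # placeholder paths
    ssl_certificate_key /etc/nginx/key.pem;

    location /static/ {
        root /var/www;                   # Nginx serves static files itself
    }

    location / {
        proxy_pass http://app_servers;   # everything else goes to the app pool
    }
}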

Why not just Node.js alone? You can run Node.js directly, but you lose Nginx's battle-tested defaults for rate limiting, request size limits, timeout handling, and security hardening. More on this in Tutorial 13: The Node.js Model.

What's Next

Understanding architecture helps you make informed decisions about server selection and configuration. The next tutorial covers the practical side: configuring servers to serve your sites using virtual hosts, document roots, and MIME types.