03: Data Collection Methods - Analytics Tutorials

Three Approaches

There are three fundamental approaches to collecting analytics data, each capturing data at a different point in the client-server communication:

Server logs: The web server automatically records every request, including IP address, URL, status code, User-Agent, and timestamp. In principle, anything carried in the HTTP request can be logged, including extra data the browser supplies through Client Hints. A major advantage of logging is that it requires no changes to client code. Although this is the oldest method, it is also the most reliable, and complete analytics solutions still make use of logs.
Network capture (packet sniffing): Intercepting traffic between client and server to inspect the full request and response. It requires network access but no server or client changes. Since HTTPS became widespread, it is largely obsolete for content analysis because payloads are encrypted. The exception is inside a datacenter, where you could install a certificate on the device to decrypt the traffic.
Client-side scripts: JavaScript running in the browser, or native code in a mobile application, that captures events (clicks, scrolls, errors, timing) and sends them to a collector. This is the most flexible method and collects the richest data, but it requires code deployment and, on the web, depends on JavaScript being enabled.
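
To make the server-log approach concrete, here is a minimal Python sketch that pulls the fields listed above (IP address, URL, status code, User-Agent, timestamp) out of a single log line. It assumes the server writes the common Apache/Nginx "combined" log format; the sample line, the `COMBINED` pattern name, and the `parse_line` helper are illustrative, not part of any particular tool.

```python
import re
from datetime import datetime

# Regex for the "combined" access-log format (assumed here):
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # Servers write "-" when no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


# Made-up example line using a documentation-reserved IP address.
SAMPLE = (
    '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] '
    '"GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0"'
)

record = parse_line(SAMPLE)
print(record["ip"], record["path"], record["status"])
```

Everything in the record comes from the request itself, which is why this method needs no client changes and also why it cannot see clicks or scrolls.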

  ┌──────────┐                                           ┌──────────┐
  │  Client  │─────────────── Network ──────────────────▶│  Server  │
  │ (Browser)│                                           │          │
  └──────────┘                                           └──────────┘
       │                        │                             │
  Client-side              Network capture              Server logs
  scripts capture:         captures:                    capture:
  • DOM events             • Request/response           • IP address
  • Scroll depth             headers                    • URL path
  • Click coordinates      • Payload content            • Status code
  • JS errors                (HTTP only, not            • User-Agent
  • Performance timing       HTTPS content)             • Timestamp
  • Viewport size          • Connection metadata        • Response size
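
Client-side scripts need somewhere to send what they capture. The following is a rough Python sketch of such a collector, assuming the page posts each event as a small JSON object (for example with `navigator.sendBeacon` or `fetch`). The `event` field name, the in-memory `EVENTS` list, and the port are assumptions made for illustration; a real collector would add authentication, rate limiting, and durable storage.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

EVENTS = []  # in-memory store, for illustration only


def accept_beacon(body):
    """Validate one JSON beacon body; return (http_status, record_or_None)."""
    try:
        record = json.loads(body.decode("utf-8"))
    except ValueError:  # covers both bad UTF-8 and malformed JSON
        return 400, None
    # Hypothetical schema: every beacon must be an object with an "event" key.
    if not isinstance(record, dict) or "event" not in record:
        return 400, None
    EVENTS.append(record)
    return 204, record  # 204: accepted, nothing to send back


class Collector(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, _ = accept_beacon(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()


def serve(host="127.0.0.1", port=8000):
    """Run the collector until interrupted."""
    HTTPServer((host, port), Collector).serve_forever()
```

Calling `serve()` starts the listener. Keeping the validation in `accept_beacon` separate from the HTTP handler makes it easy to check the logic without opening a socket.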

Comparison of Methods

Aspect	Server Logs	Network Capture	Client-Side Scripts
What it captures	HTTP requests received by server	All traffic on the wire	Any browser event or state
Requires code changes?	No — built into web servers	No — passive observation	Yes — must add JS to pages
Captures client events?	No — only sees requests	No — only sees network traffic	Yes — clicks, scrolls, errors
Works with HTTPS?	Yes — runs on the server	Metadata only — content encrypted	Yes — runs in the browser
Performance impact	Minimal — logging is routine	Variable — depends on volume	Variable — adds JS payload
Privacy concerns	Moderate — IP, paths	High — deep packet inspection	Very High — can capture anything
Example tools	Apache/Nginx logs, GoAccess, AWStats	Wireshark, tcpdump	Google Analytics, custom beacons