Two Opposing Philosophies
There are two opposing philosophies for what data to collect:
- Broad collection ("collect everything") — Capture every event, every click, every timing metric. Store it all. Figure out what matters later. The advantage is discovering things you did not know to look for.
- Targeted collection ("collect specific things") — Define specific questions first. Instrument only what answers those questions. The advantage is lower cost, lower risk, and clearer compliance.
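The difference between the two philosophies can be sketched as a filter applied at the point of collection. This is a minimal illustration, not a real analytics API: the event shape, the `ALLOWED_FIELDS` schema, and the function names `collect_broad` and `collect_targeted` are all assumptions made for this example.

```python
from typing import Any, Optional

# Hypothetical schema for targeted collection: each event type maps to the
# only fields we have decided to keep. Anything else is dropped.
ALLOWED_FIELDS: dict[str, set[str]] = {
    "page_view": {"path", "load_time_ms"},
    "signup_click": {"button_id"},
}


def collect_broad(event: dict[str, Any]) -> dict[str, Any]:
    """Broad collection: store the event exactly as it arrived."""
    return dict(event)


def collect_targeted(event: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Targeted collection: keep only allowlisted event types and fields."""
    allowed = ALLOWED_FIELDS.get(event.get("type", ""))
    if allowed is None:
        return None  # this event type answers no defined question, so skip it
    kept = {key: value for key, value in event.items() if key in allowed}
    kept["type"] = event["type"]
    return kept


# Illustrative event; the email field stands in for PII that rides along
# without anyone having planned to collect it.
raw = {
    "type": "page_view",
    "path": "/pricing",
    "load_time_ms": 412,
    "email": "user@example.com",
}

broad = collect_broad(raw)        # still contains the email address
targeted = collect_targeted(raw)  # contains only type, path, load_time_ms
```

With the broad collector, the unplanned `email` field ends up in storage. With the targeted collector it never gets stored, but answering a new question means editing the allowlist first.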
Comparing the Approaches
| Aspect | Broad Collection | Targeted Collection |
|---|---|---|
| Data volume | Very high | Low to moderate |
| Storage cost | High and growing | Predictable and manageable |
| Discovering unknowns | Strong: the data is already there | Weak: you must add new instrumentation first |
| Privacy risk | High: you may collect PII without realizing it | Lower: you know what you are collecting |
| GDPR alignment | Poor: hard to reconcile with data minimization | Good: collection is purpose-limited |
| Setup and querying | Fast to set up, harder to query later | Slower to set up, easier to query later |
In practice, most teams start broad and narrow their collection over time as they learn what matters. The key constraint on how broad you can start is privacy law.
Data Minimization and Data Smog
GDPR's data minimization principle requires that you collect only the personal data you need for a specific purpose. "We might need it someday" is not a purpose. Before collecting any user data, define why you need it, how long you will keep it, and who will have access to it. This applies whether you build first-party analytics or use a third-party service.
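One way to make the three questions above (why, how long, who) concrete is to record the answers in code and refuse to collect anything that has no entry. The sketch below is an assumption-laden illustration, not a compliance tool: `CollectionPurpose`, `REGISTRY`, the helper functions, the 90-day retention period, and the role names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CollectionPurpose:
    """Hypothetical record of why a field is collected, for how long, and for whom."""
    field_name: str
    purpose: str                    # why the data is needed
    retention: timedelta            # how long it is kept
    accessible_to: frozenset[str]   # which roles may read it


# Illustrative registry with a single documented field.
REGISTRY = {
    entry.field_name: entry
    for entry in [
        CollectionPurpose(
            field_name="load_time_ms",
            purpose="Detect slow pages",
            retention=timedelta(days=90),
            accessible_to=frozenset({"performance-team"}),
        ),
    ]
}


def may_collect(field_name: str) -> bool:
    """A field may be collected only if a purpose has been documented for it."""
    return field_name in REGISTRY


def is_expired(field_name: str, collected_at: datetime, now: datetime) -> bool:
    """True once a stored value has outlived its documented retention period."""
    return now - collected_at > REGISTRY[field_name].retention


def may_access(field_name: str, role: str) -> bool:
    """True if the role is listed for this field; unknown fields are never readable."""
    entry = REGISTRY.get(field_name)
    return entry is not None and role in entry.accessible_to


# Usage: a value collected 120 days before `now` is past the 90-day retention.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
collected_at = now - timedelta(days=120)
expired = is_expired("load_time_ms", collected_at, now)
```

Under these assumptions, a field such as `email` that nobody has documented fails `may_collect`, which turns "we might need it someday" into an explicit decision someone has to make and write down.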
Information hoarding is a serious privacy risk, so collect as little as possible until you know what you need. The more data you hold, the more a breach can expose, and a leak can seriously damage your business. Too much data can also create a kind of data smog that obscures rather than illuminates.