Collection Philosophy: Broad vs. Targeted

Collect everything or collect with purpose — balancing discovery against privacy and cost

Two Opposing Philosophies

There are two opposing philosophies for what data to collect:

  • Broad collection ("collect everything") — Capture every event, every click, every timing metric. Store it all. Figure out what matters later. The advantage is discovering things you did not know to look for.
  • Targeted collection ("collect specific things") — Define specific questions first. Instrument only what answers those questions. The advantage is lower cost, lower risk, and clearer compliance.
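One way to picture targeted collection is an allowlist that sits in front of the analytics pipeline. The sketch below is illustrative only: the event names, the property names, and the `filter_event` helper are hypothetical and not part of any particular analytics library. It assumes each allowlisted event maps to a question the team has already defined.

```python
from __future__ import annotations

from typing import Any, Optional

# Hypothetical allowlist: event name -> the only properties kept for that event.
# Each entry should trace back to a question defined before instrumenting.
ALLOWED_EVENTS: dict[str, set[str]] = {
    "checkout_started": {"cart_size", "page_load_ms"},
    "search_performed": {"result_count"},
}


def filter_event(name: str, properties: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the event trimmed to its allowlisted properties, or None to drop it."""
    allowed = ALLOWED_EVENTS.get(name)
    if allowed is None:
        # Event is not tied to a defined question: do not collect it.
        return None
    kept = {key: value for key, value in properties.items() if key in allowed}
    return {"name": name, "properties": kept}
```

Under broad collection the filter would not exist and every event would pass through unchanged. With the allowlist, an unexpected property such as an email address is dropped before it is stored, at the price of having to add a new entry whenever a new question comes up.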

Comparing the Approaches

Aspect               | Broad Collection                            | Targeted Collection
---------------------+---------------------------------------------+--------------------------------------
Data volume          | Very high                                   | Low to moderate
Storage cost         | High and growing                            | Predictable and manageable
Discovering unknowns | Strong: data is already there               | Weak: must add new instrumentation
Privacy risk         | High: you may collect PII without realizing | Low: you know exactly what you have
GDPR alignment       | Poor: violates data minimization            | Good: purpose-limited collection
Setup time           | Fast initially, hard to query later         | Slower initially, easy to query later

In practice, most teams start broad and narrow their collection over time as they learn what matters. The key constraint on how broad you can go is privacy law.

Data Minimization and Data Smog

GDPR's data minimization principle (Article 5(1)(c)) requires that personal data be adequate, relevant, and limited to what is necessary for the purposes for which it is processed. "We might need it someday" is not a purpose. Before collecting any user data, define why you need it, how long you will keep it, and who will access it. This applies whether you build first-party analytics or use a third-party service.
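The three questions (why, how long, who) can be written down as a small policy registry that the collection code consults. This is a minimal sketch under stated assumptions: the `CollectionPolicy` class, the field names, the retention periods, and the role names are all hypothetical examples, not values required by GDPR or by any tool.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CollectionPolicy:
    purpose: str                    # why the field is collected
    retention_days: int             # how long it is kept
    allowed_roles: frozenset[str]   # who may access it


# Hypothetical registry: a field with no entry has no stated purpose.
POLICIES: dict[str, CollectionPolicy] = {
    "page_load_ms": CollectionPolicy(
        purpose="Detect page performance regressions",
        retention_days=90,
        allowed_roles=frozenset({"engineering"}),
    ),
    "search_query": CollectionPolicy(
        purpose="Improve search relevance",
        retention_days=30,
        allowed_roles=frozenset({"engineering", "product"}),
    ),
}


def may_collect(field: str) -> bool:
    """A field may be collected only if a purpose has been registered for it."""
    return field in POLICIES


def is_expired(field: str, collected_at: datetime, now: datetime) -> bool:
    """True once the value has been held longer than its retention period."""
    return now - collected_at > timedelta(days=POLICIES[field].retention_days)


def may_access(field: str, role: str) -> bool:
    """True if the given role is listed for the field."""
    policy = POLICIES.get(field)
    return policy is not None and role in policy.allowed_roles
```

A field such as `email` that has no entry fails `may_collect`, which is the code-level version of "we might need it someday is not a purpose". Timestamps are expected to be timezone-aware, for example `datetime.now(timezone.utc)`.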

Hoarding information is a serious privacy risk, so collect as little as possible until you know what you need. Every extra field you store is one more thing a breach can expose, and a leak of user data can be very damaging to your business. Too much data also creates a kind of data smog: the sheer volume obscures the signal instead of illuminating it.