Builder's Corner: You're paying to store the risk you were trying to prevent

Ask an engineer why the system collects everything and stores it indefinitely, and you'll usually get the same answer: compliance requires it. Auditors ask for it. We might need it someday. If something goes wrong, we need to prove what happened.

Not a bad answer. But there's a version nobody says out loud: we're storing data to protect ourselves from consequences we'll only face if that data gets out.

The compliance loop

Here's how it usually works. Users provide identity data to access a service. The service stores it — verified email, payment information, identification documents, behavioural history — to satisfy regulatory requirements, fraud detection, and audit trails. The dataset grows. More users, more interactions, more data retained longer than necessary because deciding what to delete costs more effort than just keeping it.

This feels like risk management. It's risk accumulation.

Every row in a user table is a potential liability. Every stored document is a breach surface. Every piece of identification held "just in case" carries regulatory consequences proportional to its sensitivity if it ever leaks. The compliance regime was designed to reduce risk. The data it required you to collect became the risk it was meant to prevent.

Not a paradox — a design pattern that made sense at a particular moment in the history of digital identity, and that nobody has seriously revisited since.

What you actually need to know

Most verification use cases don't need identity data. They need a fact.

Is this user over 18? Has this person already claimed this reward? Is this a unique human, or a duplicate account? These are yes/no questions. They don't require storing the documents that answer them.
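To make the contrast concrete, here is a minimal sketch of answering one of those yes/no questions without retaining the evidence. The names (`is_over_18`, `VerificationResult`) are illustrative, not a real API: the only thing worth persisting is the answer, not the date of birth it was computed from.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class VerificationResult:
    check: str       # which question was asked
    passed: bool     # the yes/no answer
    checked_on: date # when it was answered

def is_over_18(date_of_birth: date, today: date) -> VerificationResult:
    """Answer the question; the date of birth never needs to be stored."""
    age = today.year - date_of_birth.year - (
        (today.month, today.day) < (date_of_birth.month, date_of_birth.day)
    )
    return VerificationResult(check="over_18", passed=age >= 18, checked_on=today)

result = is_over_18(date(2004, 6, 1), date(2025, 1, 15))
print(result.passed)  # True — and that boolean is all the database sees
```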

The data-collection model came from a real constraint: to verify a fact about a user, you had to collect the evidence yourself, and it sat in your database long after the check was done because there was nowhere else for it to go.

That constraint was about infrastructure, not the problem. The problem was always just the question.

The breach cost is not the breach

The immediate cost of a data breach is obvious: incident response, forensics, customer notification, regulatory fines. Big numbers, mostly recoverable.

The less visible cost is proportional to what you were holding. A service that kept a verified credential rather than the underlying documents has a much smaller blast radius than one that held full identity dossiers on every user — not because the attack was less sophisticated, but because there was less to take.

Security conversations focus on prevention. The question that gets less attention is: if prevention fails, what exactly did we expose? Minimising what you hold doesn't prevent breaches. It changes what a breach actually means.

The data model question

Engineers who own this layer feel the tension directly. Compliance asks for retention. Legal wants a paper trail. Product wants longitudinal data for analysis. The path of least resistance is to store more for longer.

The tension doesn't resolve within the current model. But the model isn't fixed.

A system built around verified facts rather than stored documents works differently. You verify the claim at the point you need it — is this user over 18, is this a unique human, has this person already been paid — and store a record that the check passed. You don't keep the evidence the check was based on.
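Sketched as a data model, assuming a simple key-value store (the store, field names, and hashing scheme here are all hypothetical): the attestation records which check passed, for whom, and when. The document the check was based on has no column to live in.

```python
import hashlib
from datetime import datetime, timezone

attestations: dict[str, dict] = {}  # stand-in for a real attestation table

def record_check(user_id: str, check: str, passed: bool) -> None:
    """Store the outcome of a check, keyed by a hash of user and check type."""
    key = hashlib.sha256(f"{user_id}:{check}".encode()).hexdigest()
    attestations[key] = {
        "check": check,
        "passed": passed,
        "at": datetime.now(timezone.utc).isoformat(),
        # Deliberately absent: the document or data the check was based on.
    }

def has_passed(user_id: str, check: str) -> bool:
    key = hashlib.sha256(f"{user_id}:{check}".encode()).hexdigest()
    return attestations.get(key, {}).get("passed", False)

record_check("user-42", "over_18", True)
print(has_passed("user-42", "over_18"))       # True
print(has_passed("user-42", "unique_human"))  # False — never checked
```

A breach of this table leaks which checks a user passed, which is a far smaller blast radius than leaking the identification documents themselves.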

This is the principle behind zero-knowledge proofs, selective disclosure credentials, and a handful of privacy-preserving identity approaches that have been technically mature for years without making it into mainstream application design. Partly that's inertia. Partly the tooling to implement them easily didn't exist until recently. Partly it's that most teams inherited a data model they never had a reason to question.
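The shape of selective disclosure can be shown with a heavily simplified sketch using salted hash commitments. Production systems (BBS+ signatures, SD-JWT, and similar) are considerably more involved; this only illustrates the idea that a holder can reveal one attribute while the rest stay hidden behind commitments an issuer could sign.

```python
import hashlib
import hmac
import secrets

def commit(attrs: dict[str, str]) -> tuple[dict, dict]:
    """Issuer side: commit to each attribute with a random salt.
    The commitments (not the attributes) are what gets signed and shared."""
    salts = {k: secrets.token_hex(16) for k in attrs}
    commitments = {
        k: hashlib.sha256(f"{salts[k]}:{v}".encode()).hexdigest()
        for k, v in attrs.items()
    }
    return commitments, salts  # holder keeps the salts

def reveal(k, attrs, salts):
    """Holder side: disclose one attribute plus its salt, nothing else."""
    return k, attrs[k], salts[k]

def verify(commitments, k, value, salt) -> bool:
    """Verifier side: check the disclosed value against the commitment."""
    expected = hashlib.sha256(f"{salt}:{value}".encode()).hexdigest()
    return hmac.compare_digest(commitments[k], expected)

attrs = {"over_18": "true", "name": "Alice", "passport_no": "X123"}
commitments, salts = commit(attrs)
# Disclose only the age flag; name and passport number never leave the holder.
print(verify(commitments, *reveal("over_18", attrs, salts)))  # True
```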

The tooling exists now. Most teams just haven't looked.

The "just in case" inventory

Almost every user-facing system has data collected during onboarding that was never strictly required. It stayed because keeping it costs less than reviewing it. Nobody has looked at it since the original implementation decision.

This isn't negligence — it's the logical outcome of cheap storage and ambiguous deletion obligations. But it's liability with no offsetting value.

The question most data models can't answer cleanly: what would we actually lose by deleting everything we don't actively use? Usually, not much. The risk reduction from deleting it is real.
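One way to make that question answerable is a retention audit pass: for every stored field, when was it last read? The field names and last-read log below are hypothetical; the point is that "what don't we actively use" becomes a query once the model tracks access at all.

```python
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)

# Illustrative last-read timestamps per stored field; None = never read.
last_read = {
    "email": now - timedelta(days=2),
    "payment_token": now - timedelta(days=30),
    "id_document_scan": now - timedelta(days=700),
    "onboarding_survey": None,  # collected at onboarding, never read since
}

def deletion_candidates(last_read: dict, max_idle_days: int = 365) -> list[str]:
    """Fields unread past the idle window are liability with no offsetting value."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_idle_days)
    return [
        field for field, ts in last_read.items()
        if ts is None or ts < cutoff
    ]

print(deletion_candidates(last_read))  # ['id_document_scan', 'onboarding_survey']
```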

Verify facts. Don't hoard data. Smaller attack surface, lower compliance overhead, and a cleaner answer the next time someone asks what you're doing with everything you collected.

For Developers

Ready to integrate Humanity Protocol into your applications? Our developer tools make it easy to add human verification with just a few lines of code.

For Enterprises

Looking to implement secure, privacy-preserving identity verification for your organization? Our enterprise solutions can help you eliminate fraud and build customer trust.