Wizard

Synthetic vs. Anonymized Real Data — Decision Tree

Published May 10, 2026

The synthetic-vs-anonymized decision often gets framed as 'how realistic does the data need to be?' That's the wrong question, because both can be made realistic. The right questions are about legal-risk tolerance, edge-case coverage, and how closely the test environment will be examined. This decision tree walks through them.

What you walk away with

~2 min · 3 questions · 3 possible outcomes
  • A specific 'use synthetic' / 'use anonymized' / 'either, with caveats' recommendation.
  • A one-paragraph rationale citing the binding constraint.
  • A linked artifact (the comparison, the relevant assessment, or a 'talk to us' if your situation is unusual).

Question 1 of ~3

Does the test / dev / demo environment ever process production PII or NPI?

Even an occasional flow counts. If anyone has had to file a finding about this, the answer is yes.

FAQ

Why doesn't this go deeper into anonymization techniques?

Because anonymization rigor is downstream of the legal-risk decision. If the firm accepts the residual re-identification risk, the technique discussion is engineering. If it doesn't, no technique is sufficient.

What about a hybrid?

Use real data for distributional grounding and synthetic data to top up edge cases. The wizard surfaces this on the anonymized branch, where 'talk to us' opens that conversation.
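
As a rough illustration of the hybrid idea, the sketch below starts from rows that are assumed to be already anonymized and adds a synthetic row only for edge cases the real rows never exercise. The function name `top_up_edge_cases`, the field names, and the sample edge cases are all hypothetical and are not part of the wizard. It is a minimal sketch under those assumptions, not a recommended implementation.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
# Each edge case pairs a coverage check with a factory for a synthetic row.
EdgeCase = Tuple[Callable[[Row], bool], Callable[[], Row]]


def top_up_edge_cases(
    real_rows: List[Row], edge_cases: Dict[str, EdgeCase]
) -> Tuple[List[Row], List[str]]:
    """Return the hybrid dataset plus the names of the edge cases that were added."""
    # Tag provenance so tests can tell anonymized rows from synthetic ones.
    hybrid = [dict(row, source="anonymized") for row in real_rows]
    added = []
    for name, (covers, make_row) in edge_cases.items():
        # Only synthesize a row when no real row already exercises this case.
        if not any(covers(row) for row in real_rows):
            hybrid.append(dict(make_row(), source="synthetic"))
            added.append(name)
    return hybrid, added


# Hypothetical, already-anonymized rows and edge cases, for illustration only.
real_rows = [
    {"balance": 120.0, "country": "US"},
    {"balance": 0.0, "country": "DE"},
]
edge_cases = {
    "zero_balance": (lambda r: r["balance"] == 0, lambda: {"balance": 0.0, "country": "US"}),
    "negative_balance": (lambda r: r["balance"] < 0, lambda: {"balance": -50.0, "country": "US"}),
}

hybrid, added = top_up_edge_cases(real_rows, edge_cases)
print(added)        # ['negative_balance']
print(len(hybrid))  # 3
```

Keeping a `source` tag on every row is a design choice in this sketch: it lets a test suite report which results depend on synthetic rows and which rest on the anonymized distribution.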