Synthetic data procurement — a vendor evaluation workbook for fintech buyers

A four-stage workbook — define the use case, build the longlist, ask the eleven questions that separate operational vendors from well-marketed ones, recognize the seven red flags. The procurements that go badly skip stage one.

WealthSchema Staff · Procurement workbook · Jul 11, 2026 · 3 min read

If you're the procurement, security, or engineering lead evaluating synthetic data vendors for a fintech, you're operating in a market that has matured rapidly over the past three years, and one where the differences between credible vendors are not always obvious from marketing materials. We evaluated the field ourselves in 2023 before deciding to build WealthSchema, and the workbook below is what we wish someone had handed us. It's vendor-agnostic where it can be: we have an obvious commercial interest in some of the answers, but we've tried to write it as a buyer would use it, not as a seller would write it.

The workbook is in four stages: define the use case, build a longlist, run the eleven evaluation questions, and apply the seven-red-flag filter. Skipping stage one is the modal failure mode — buyers who engage five vendors with different specialties before they've decided which problem they're solving end up with three weeks of demos and no signal.

Stage 1 — Define the use case before you contact vendors

The single highest-leverage thing a buyer can do is define the use case clearly before engaging vendors. Most procurement processes that go badly do so because the buyer engaged five vendors with different specialties before they'd decided which problem they were solving.

Six questions to answer before any vendor outreach

  • What's the primary use case? (Non-prod environment provisioning, ML training, compliance testing, sales-eng demo, analytics enablement, load testing.)
  • What's the source-data context? (An existing production database whose distributions you need to preserve vs. generating from scratch.)
  • What's the regulatory frame? (GDPR, CCPA, HIPAA, GLBA, FCRA, ECOA, SR 11-7 model risk.)
  • What's the volume expectation? (Hundreds, thousands, millions of records.)
  • What's the refresh cadence? (One-time, annual, continuous synthesis.)
  • Who's the internal stakeholder cohort? (Engineering, ML, compliance, security, legal, procurement, finance.)

A use case defined in one paragraph that any internal stakeholder can read and understand is the prerequisite for an effective procurement. Without it, the procurement process tends to expand into evaluating tools that solve problems the buyer doesn't have.
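
One way to keep the six answers honest is to record them in a small structure and refuse outreach until every field is filled. The sketch below is illustrative only: the class name `UseCaseBrief`, its field names, and the sample values are assumptions chosen to mirror the six questions, not part of any vendor's tooling.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple


@dataclass(frozen=True)
class UseCaseBrief:
    """Hypothetical record of the six pre-outreach answers."""
    primary_use_case: str
    source_data_context: str
    regulatory_frame: Tuple[str, ...]
    volume_expectation: str
    refresh_cadence: str
    stakeholders: Tuple[str, ...]


def missing_answers(brief: UseCaseBrief) -> List[str]:
    """Return the names of any questions left unanswered (empty values)."""
    return [f.name for f in fields(brief) if not getattr(brief, f.name)]


def ready_for_outreach(brief: UseCaseBrief) -> bool:
    """True only when all six questions have an answer."""
    return not missing_answers(brief)


# Sample values are made up for illustration.
brief = UseCaseBrief(
    primary_use_case="compliance testing",
    source_data_context="generate from scratch",
    regulatory_frame=("GLBA", "ECOA"),
    volume_expectation="thousands of records",
    refresh_cadence="",  # not decided yet
    stakeholders=("engineering", "compliance"),
)

print(missing_answers(brief))     # ['refresh_cadence']
print(ready_for_outreach(brief))  # False
```

A check like this does not replace the one-paragraph write-up; it only flags which of the six questions is still open before the first vendor call.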

Stage 2 — Build the vendor longlist

We've published head-to-head comparisons for most major synthetic-data vendors. The major categories of vendors in the synthetic-data-for-fintech market today:

| Category | Representative vendors | Best when |
|---|---|---|
| Schema-preserving synthesis (enterprise) | Tonic.ai, MOSTLY AI, Synthesized, Hazy, Howso, K2view | You have production data and want a privacy-safe copy. |
| ML-first privacy-preserving generation | Gretel, MOSTLY AI (some configs), Howso | You need formal privacy guarantees on derivative datasets. |
| Open-source synthesis libraries | SDV (Synthetic Data Vault), Faker, academic libraries | You have engineering capacity and want full control. |
| Data masking + synthesis | Delphix, Privitar (Informatica), K2view | Masking and provisioning are primary; synthesis is bundled. |
| Archetype-driven domain-specific | WealthSchema, vertical-specialist shops | Fintech-specific compliance, planning-engine, ML-validation use cases where coverage-by-design matters more than fidelity-to-source-distribution. |

A typical longlist might include 4–6 vendors across 2–3 of these categories. A well-bounded use case usually points to a single category; a use case spanning multiple categories often deserves separate procurement processes per category.

Stage 3 — The questions that matter

Once you have a longlist, here's the question battery we'd recommend, organized by the audience that should ask it.

Engineering questions

  • Show me the schema documentation. (Per-field type, range, derivation, validation.)
  • Show me a sample dataset I can inspect before I commit. (Free download without sales engagement is a strong confidence signal.)
  • What's the data delivery format? (JSON, CSV, Parquet, DB export, API.)
  • What's the integration surface? (Files, API, on-demand from a config interface.)
  • How do I version the data? Can I pin? Can I diff between versions?
  • What happens when the vendor changes the data? Do my tests break? What's the migration path?
  • What's the performance footprint? Throughput at scale? Per-record cost?
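
The versioning questions above (pin, diff, what breaks on change) can be prototyped on the buyer side before any vendor answers them. The following is a minimal sketch under stated assumptions: the data arrives as a file whose bytes can be hashed, and the schema can be expressed as a field-name-to-type mapping. The function names and the sample schemas are hypothetical.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Content digest used as the pin for one data release."""
    return hashlib.sha256(data).hexdigest()


def verify_pin(data: bytes, pinned_digest: str) -> bool:
    """True if the delivered bytes match the release we pinned."""
    return sha256_of(data) == pinned_digest


def diff_schemas(old: dict, new: dict) -> dict:
    """Compare two {field: type} mappings and list what changed."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "retyped": sorted(k for k in shared if old[k] != new[k]),
    }


# Made-up schemas for two releases of the same dataset.
schema_v1 = {"account_id": "string", "balance": "float", "opened": "date"}
schema_v2 = {"account_id": "string", "balance": "decimal", "risk_tier": "string"}
changes = diff_schemas(schema_v1, schema_v2)
print(changes)
# {'added': ['risk_tier'], 'removed': ['opened'], 'retyped': ['balance']}

# Pin a release, then detect that a later delivery differs.
release = b"account_id,balance\nA1,100.0\n"
pin = sha256_of(release)
print(verify_pin(release, pin))                # True
print(verify_pin(release + b"A2,5.0\n", pin))  # False
```

Removed or retyped fields are the changes most likely to break existing tests, so a diff like this is a reasonable thing to ask a vendor to publish with each release.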

ML-team questions

  • How was the data calibrated? (Source distributions, time period, statistical validation.)
  • What's the data's intended distribution? (Production-realistic, balanced for training, demographically-targeted.)
  • How do I tell if the data is fit for my model? What statistical tests does the vendor run? Can I reproduce them?
  • What's the privacy guarantee? (Differential-privacy parameters for schema-preserving; relationship to source for archetype-driven.)
  • Can I generate counterfactuals? (Records that vary in one dimension while holding others constant — critical for fair-lending and explainability.)
  • What's the temporal structure? (Single-point, longitudinal, generative time-series.)

Compliance and legal questions

  • Is the data fully synthetic or derived from real data? (Different regulatory implications under HIPAA and GLBA.)
  • What contractual representations does the vendor make? (Non-derivation, freedom from re-identification, indemnification.)
  • How is the data licensed? (Per-seat, per-record, per-use; restrictions on derivative work, redistribution, training of resold models.)
  • What's the vendor's security posture? (SOC 2 Type II, ISO 27001, pen test cadence, breach history.)
  • What's the data residency? (Where it lives, where it transits.)
  • What's the vendor's regulatory experience? (Used in successful exams, model validations, audit defenses?)
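
For the ML-team question about reproducing the vendor's statistical tests, one common check is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical distributions of a synthetic column and a reference column. The sketch below is a plain-Python illustration, not any vendor's validation suite; the sample values are made up, and a real evaluation would also need a significance threshold and more than one column.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs (0.0 to 1.0)."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # Both CDFs are step functions, so the maximum gap occurs at a data point.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


reference = [1, 2, 3, 4]           # stand-in for a trusted reference column
print(ks_statistic(reference, [1, 2, 3, 4]))  # 0.0, identical distributions
print(ks_statistic(reference, [3, 4, 5, 6]))  # 0.5, partially overlapping
print(ks_statistic(reference, [6, 7, 8, 9]))  # 1.0, no overlap at all
```

If the vendor reports a statistic like this, asking for the exact samples and code behind it is a direct way to test whether the result can be reproduced.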

Security questions

  • How is the data delivered? (Direct download, S3 transfer, API.)
  • What access controls protect the corpus? (Especially for API delivery.)
  • What happens to the data after engagement ends? (Termination, destruction, ongoing access.)
  • Has the vendor been breached? (A clean answer is better than no answer; 'never breached, no IR plan' is worse than 'handled incidents, here's our process.')
  • What sub-processors does the vendor use, and how does the vendor manage them?

Procurement and commercial questions

  • What's the pricing model? (Per-record, per-corpus, per-API-call, subscription, perpetual.)
  • What's included in base price? (Methodology docs, integration support, custom configs, refreshes, support.)
  • What's the minimum commitment? (Annual minimums vs. single corpora outright.)
  • What's the renewal mechanic? (Auto-renew, renegotiation, opt-in. Watch multi-year escalators.)
  • What's the cancellation provision? (Termination for convenience, retention obligations, ongoing access.)
  • What references can I speak with? (Specifically in your domain, at firms of your scale.)
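
The warning about multi-year escalators is easier to act on with the arithmetic written out. This sketch assumes a fee that compounds by a fixed percentage each year; the base fee and the 7% rate are hypothetical figures, not market data.

```python
def contract_cost(base_annual_fee, escalator, years):
    """Yearly fees and total when the fee rises by `escalator` each year."""
    fees = [base_annual_fee * (1 + escalator) ** year for year in range(years)]
    return fees, sum(fees)


# Hypothetical terms: 100,000 per year, 7% annual escalator, 3-year term.
fees, total = contract_cost(100_000, 0.07, 3)
flat_total = 100_000 * 3

print([round(f, 2) for f in fees])   # [100000.0, 107000.0, 114490.0]
print(round(total, 2))               # 321490.0
print(round(total - flat_total, 2))  # 21490.0 paid because of the escalator
```

Running the same function on each vendor's proposed term puts the escalator in the same units as the base price, which makes it easier to negotiate.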

Stage 4 — The structural red flags

Across vendor evaluations we've reviewed, several red flags reliably distinguish less-credible vendors:

  • Can't show data without a sales call. Confident vendors publish samples. Less-confident ones gate the data behind a process designed to extract a commitment before you've seen it.
  • 'AI' without specifics. 'AI-powered' is sometimes a real description and sometimes a marketing veneer over rule-based or template-based generation. Push for specifics.
  • Can't articulate schema-preserving vs. archetype-driven. If the vendor doesn't know which category they're in, they probably haven't thought rigorously about the use cases they're suited for.
  • Case studies are all enterprise logos with no specifics. 'Three of the top five banks' with no detail about the actual engagement is, in our experience, often a small-engagement-overstated relationship.
  • Can't quantify privacy guarantees. For schema-preserving synthesis, formal parameters (DP-epsilon, k-anonymity bounds) should be quantifiable. 'We protect privacy' without numbers is marketing.
  • Uncapped buyer liability. Standard market terms include caps; uncapped liability is buyer-disadvantageous and other vendors don't impose it.
  • Requires the vendor's preferred toolchain. Legitimate vendors deliver in standard formats. Toolchain-required vendors create lock-in worth pricing in.
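
On the red flag about quantifying privacy guarantees: a k-anonymity bound is one of the few claims a buyer can spot-check on a delivered sample. The sketch below computes the smallest group size over a chosen set of quasi-identifier columns. The column names and records are made up, and a check like this covers only the sample in hand; it says nothing about a differential-privacy epsilon, which has to come from the vendor's own analysis.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Smallest group size over the quasi-identifier combinations present."""
    if not records:
        raise ValueError("no records to check")
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values())


# Made-up sample; zip3 and age_band are the assumed quasi-identifiers.
sample = [
    {"zip3": "941", "age_band": "30-39", "balance": 1200},
    {"zip3": "941", "age_band": "30-39", "balance": 560},
    {"zip3": "100", "age_band": "40-49", "balance": 9800},
    {"zip3": "100", "age_band": "40-49", "balance": 310},
    {"zip3": "100", "age_band": "40-49", "balance": 45},
]

print(k_anonymity(sample, ["zip3", "age_band"]))  # 2
print(k_anonymity(sample, ["zip3", "age_band", "balance"]))  # 1
```

The second call shows why the choice of quasi-identifiers matters: adding a near-unique column drops k to 1, so the vendor's stated bound is only meaningful alongside the columns it was computed over.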

The structural signs of a good vendor

The inverse pattern — what good vendors look like:

  • Sample data available without sales engagement.
  • Methodology documentation that goes beyond marketing — specific calibration sources, statistical validation, limitations.
  • Reference customers in your domain who can be spoken with.
  • Clear positioning vs. competitors — knows where their tool fits and where it doesn't.
  • Standard data delivery formats (JSON, CSV, Parquet, with sensible naming and structure).
  • Reasonable contractual terms — standard liability caps, clear data licensing, clear cancellation.
  • Stable team with publicly identifiable technical leadership.
  • Transparent pricing — at least a signal, even if final pricing is enterprise-negotiated.

The decision framework

  1. Disqualify on hard requirements
    If your use case requires HIPAA-compliant data and a vendor can't provide it, they're out regardless of how good other answers are.
  2. Score on weighted criteria
    Use case fit, technical fit, compliance fit, commercial fit, vendor stability. Categories should weigh differently for different procurements.
  3. Validate the top two with a paid pilot
    Pilot the data against your actual use case, not a vendor-provided demo scenario.
  4. Reference-check the finalists
    Specifically ask: integration friction, vendor responsiveness, data quality issues, would they buy again.
  5. Negotiate before signing
    Initial terms are positions. Liability caps, pricing, renewal terms, and termination provisions are routinely negotiable.
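
The first two steps of the framework (disqualify on hard requirements, then score on weighted criteria) can be sketched as a small ranking function. Everything specific here is an assumption for illustration: the vendor names, the capability labels, the 1-to-5 scores, and the weights, which should be set per procurement as the text notes.

```python
def rank_vendors(vendors, weights, hard_requirements):
    """Drop vendors missing any hard requirement, then rank by weighted score."""
    total_weight = sum(weights.values())
    ranked = []
    for name, vendor in vendors.items():
        # Step 1: a missing hard requirement disqualifies, whatever the scores.
        if not hard_requirements <= vendor["capabilities"]:
            continue
        # Step 2: weighted average over the five criteria.
        weighted = sum(weights[c] * vendor["scores"][c] for c in weights)
        ranked.append((name, round(weighted / total_weight, 2)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical weights (summing to 100) and 1-to-5 scores.
weights = {"use_case": 30, "technical": 25, "compliance": 20,
           "commercial": 15, "stability": 10}
vendors = {
    "Vendor A": {"capabilities": {"hipaa", "parquet"},
                 "scores": {"use_case": 4, "technical": 3, "compliance": 5,
                            "commercial": 3, "stability": 4}},
    "Vendor B": {"capabilities": {"parquet"},  # no HIPAA support
                 "scores": {"use_case": 5, "technical": 5, "compliance": 5,
                            "commercial": 5, "stability": 5}},
    "Vendor C": {"capabilities": {"hipaa"},
                 "scores": {"use_case": 3, "technical": 4, "compliance": 4,
                            "commercial": 4, "stability": 3}},
}

shortlist = rank_vendors(vendors, weights, hard_requirements={"hipaa"})
print(shortlist)  # [('Vendor A', 3.8), ('Vendor C', 3.6)]
```

Vendor B scores highest on every criterion and still drops out, which is the point of running the hard-requirement gate before any scoring.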

A note on the WealthSchema-versus-others question

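Since we're WealthSchema and have an obvious interest in some of these answers, here is the honest framing of where we fit: we sit in the archetype-driven, domain-specific category from Stage 2, which suits fintech-specific compliance, planning-engine, and ML-validation use cases where coverage-by-design matters more than fidelity to a source distribution.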

Our vendor comparison library has head-to-head pieces for each of the major alternatives. We've tried to write them as buyers, not as sellers — if any read otherwise, please tell us.

Closing

Three things distinguish a procurement that ends in a defensible artifact from one that ends in a re-run eighteen months later. The use case is named before any vendor is contacted — domain, schema, longitudinal depth, validation gates the data has to clear, the production model it backs. The eleven questions and seven red flags are run against every shortlisted vendor with the answers written down, not absorbed verbally during a demo. The contract is gated on a paid pilot against a use-case-specific spec, not on the demo data.

A team that does those three things doesn't end up in the vendor-selected-by-marketing-language outcome. A team that skips any of them does, and the cost is paid in the migration that follows. If WealthSchema belongs in the evaluation set, the free sample on GitHub lets you inspect the schema before any sales engagement; if there's a question we missed, tell us.

Key takeaways

  • Define the use case in a single paragraph before contacting any vendor — most failed procurements engage five vendors before defining the problem.
  • Five vendor categories, mostly non-overlapping: schema-preserving (enterprise), ML-first privacy-preserving, open-source, masking+synthesis, archetype-driven domain-specific.
  • Sample-data-without-sales-engagement is the single most reliable confidence signal for synthetic-data vendor credibility.
  • Liability caps, pricing tiers, renewal mechanics, and termination provisions are all routinely negotiable — initial terms are negotiating positions.
  • Pilot the top two finalists against your actual use case, not the vendor's demo scenario; reference-check on integration friction and 'would you buy again.'
