Does the auditor need to see the actual synthetic data?

Usually no. The auditor is testing controls, not data. They'll sample the manifests, the vendor attestation, the change tickets, and the environment matrix. They generally won't open the corpus itself.

What if synthetic data isn't currently in our SOC 2 boundary?

It may already be implicitly in scope without your realizing it. If a non-prod environment uses synthetic data, the auditor's confidentiality control sampling is likely to land there. It is better to surface synthetic data explicitly with this package than to surprise the auditor mid-engagement.

SOC 2 Synthetic-Data Evidence Package Template

SOC 2 Evidence Package — Synthetic Data In Scope

Firm: [FIRM_NAME] · Fiscal year: [FISCAL_YEAR] · Audit window: [AUDIT_WINDOW_START] → [AUDIT_WINDOW_END]

Information security owner: [INFORMATION_SECURITY_OWNER]

1. Scope of synthetic data within the SOC 2 boundary

[FIRM_NAME] uses a third-party synthetic-wealth-data corpus from [SYNTHETIC_DATA_VENDOR], current version [CURRENT_CORPUS_VERSION], in the following systems: [LIST_SYSTEMS]. The corpus contains no personally identifiable information of any actual individual (per vendor attestation, filed separately). Production customer data does not flow into the systems where the synthetic corpus operates.

What goes here (guidance: delete before sending)

Replace [LIST_SYSTEMS] with the actual systems / environments. Be specific: 'staging environment / dev environment / sandbox / training-data pipeline / fraud-rule-tuning sandbox.' The boundary the auditor walks is exactly this list.

2. CC2.1 — Information communication

Control: [FIRM_NAME] maintains documented policies that address the use of synthetic data, communicate them to relevant personnel on hire and annually, and require attestation of understanding.

Evidence: the firm's information security policy (sec-policy-[FISCAL_YEAR].pdf) includes a synthetic-data section. Annual training (training-[FISCAL_YEAR].pdf) includes a synthetic-data module with documented completion. Acknowledgment records are stored in the LMS and exportable on request.

3. CC6.1 — Logical access

Control: Access to the synthetic corpus is least-privilege, role-based, and reviewed quarterly. Standing access is the exception, not the default.

Evidence: quarterly access matrices (access-[FISCAL_YEAR]-Q1.csv through access-[FISCAL_YEAR]-Q4.csv) list every individual with corpus access, their role assignment, and the last review date. Quarterly access-review meeting minutes are filed under access-reviews-[FISCAL_YEAR]/.

Why is access to synthetic data a control? (guidance: delete before sending)

It isn't a regulatory control, because the corpus contains no nonpublic personal information (NPI). It is still a SOC 2 control because the corpus is intellectual property of the vendor and a controlled asset of the firm. The license terms typically restrict redistribution; access controls are how the firm enforces them.

4. CC6.7 — Transmission

Control: When the corpus is transmitted (between the vendor's distribution and the firm's systems, or between firm systems), it's encrypted in transit. Distribution is via authenticated, signed package downloads.

Evidence: vendor delivery is a signed package downloaded via S3 presigned URLs over TLS 1.2+. Internal distribution is via the firm's standard encrypted artifact registry. Sample manifest with hash verification: corpus-manifest-[CURRENT_CORPUS_VERSION].json.
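
Hash verification sketch (guidance: delete before sending)

The hash verification mentioned above can be demonstrated to the auditor with a few lines of code. The following is a minimal sketch, not the vendor's tooling: it assumes the manifest is JSON with a field named "sha256" holding the hex digest of the corpus archive, which is a hypothetical field name to be replaced with whatever the vendor's manifest schema actually uses. It checks integrity only; signature verification is a separate step.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so a large corpus archive need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_corpus(manifest_path: Path, corpus_path: Path) -> bool:
    """Compare the corpus file's SHA-256 with the value in the manifest.

    Assumption: the manifest has a top-level "sha256" key (hypothetical).
    """
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    expected = manifest["sha256"].strip().lower()
    return sha256_of(corpus_path) == expected
```

Keeping the script's output alongside the manifest gives the auditor a dated record that the check was actually run.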

5. CC8.1 — Change management

Control: Corpus changes are reviewed, tested, approved, and traceable. The current corpus version is identified in every release record.

Evidence: corpus-change-tickets-[FISCAL_YEAR].csv lists every corpus version change in the audit window with: version, ticket id, reviewer, approval date, deployment date, affected systems. Sample release record: release-2026-Q2-1 includes the corpus pin [CURRENT_CORPUS_VERSION].

Sampling note (guidance: delete before sending)

The auditor will sample 5-10 corpus changes from this list and verify the ticket trail end-to-end. Make sure each ticket has reviewer + approver + test evidence + deployment record. Missing one of these is the most common change-management finding.
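
The completeness check described in this note can be run ahead of the audit. Below is a minimal sketch that flags tickets with a blank reviewer, approver, test evidence, or deployment record. The column names (ticket_id, reviewer, approver, test_evidence, deployment_date) and the sample rows are made up for illustration; map them to the actual headers in corpus-change-tickets-[FISCAL_YEAR].csv.

```python
import csv
import io

# Hypothetical column names; adjust to the real CSV headers.
REQUIRED_FIELDS = ("reviewer", "approver", "test_evidence", "deployment_date")


def incomplete_tickets(csv_text):
    """Map each ticket id to the required fields that are blank or absent."""
    gaps = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        missing = [f for f in REQUIRED_FIELDS if not (row.get(f) or "").strip()]
        if missing:
            gaps[row["ticket_id"]] = missing
    return gaps


# Illustrative data only, not real tickets.
sample = """ticket_id,version,reviewer,approver,test_evidence,deployment_date
CHG-101,v1.1,a.lee,b.kim,test-run-101.html,2025-02-03
CHG-102,v1.2,a.lee,,test-run-102.html,2025-05-12
CHG-103,v1.3,c.ruiz,b.kim,,
"""

print(incomplete_tickets(sample))
# {'CHG-102': ['approver'], 'CHG-103': ['test_evidence', 'deployment_date']}
```

An empty result means every ticket in the file carries all four elements the auditor will look for.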

6. Confidentiality TSC — synthetic data as the structural answer

Control: Customer data is protected via classification, encryption-at-rest, encryption-in-transit, and segregation. Test, demo, and development environments do not process production customer data. Synthetic data is used in those environments to support the segregation control structurally.

Evidence: environment matrix (env-data-flows-[FISCAL_YEAR].csv) shows every non-production environment, the data classes present, and the synthetic-corpus version in use. Production customer data does not appear in any test / demo / dev environment row.

Why this is structurally cleaner than anonymization (guidance: delete before sending)

Anonymization as a confidentiality control depends on the rigor of the anonymization technique and is subject to re-identification risk. Synthetic data contains no real PII by construction, so the confidentiality control is structural rather than process-dependent. Auditors generally find this argument easier to accept.
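
The structural claim is only as good as the environment matrix behind it, so it is worth checking the matrix mechanically before submitting it. The sketch below lists any non-production environment whose row includes production customer data. The column names (environment, tier, data_classes), the semicolon separator, and the "production-customer" label are assumptions for illustration; adapt them to the layout of env-data-flows-[FISCAL_YEAR].csv.

```python
import csv
import io

FORBIDDEN_CLASS = "production-customer"  # hypothetical data-class label


def violations(csv_text):
    """Return non-production environments that list production customer data."""
    bad = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        classes = {c.strip().lower() for c in row["data_classes"].split(";")}
        is_prod = row["tier"].strip().lower() == "production"
        if not is_prod and FORBIDDEN_CLASS in classes:
            bad.append(row["environment"])
    return bad


# Illustrative rows only.
matrix = """environment,tier,data_classes,corpus_version
prod-core,production,production-customer,
staging,non-production,synthetic,v1.3
demo,non-production,synthetic; production-customer,v1.3
"""

print(violations(matrix))  # ['demo']
```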

7. Sample artifacts (for the auditor to sample)

corpus-manifest-[CURRENT_CORPUS_VERSION].json — version, signature, integrity hash, change-log link
vendor-attestation-[FISCAL_YEAR].pdf — written attestation of no real PII in the corpus
license-[SYNTHETIC_DATA_VENDOR]-[FISCAL_YEAR].pdf — current license agreement
access-[FISCAL_YEAR]-Q4.csv — most recent quarterly access review
corpus-change-tickets-[FISCAL_YEAR].csv — change-management evidence
env-data-flows-[FISCAL_YEAR].csv — environment / data-class matrix
training-[FISCAL_YEAR].pdf — synthetic-data section of annual security training

Pre-audit checklist (guidance: delete before sending)

Walk this list 30 days before the audit window opens. Each artifact should be dated, identifiable, accessible to the auditor, and consistent with the others. Inconsistencies between artifacts (e.g. a corpus version named differently in the tickets and in the manifest) are the most common source of findings.
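
The version-naming inconsistency mentioned above is easy to screen for once the current version string has been pulled from each artifact. The following is a rough sketch under stated assumptions: the extraction step is left out, the artifact names and version strings in the usage example are invented, and the normalization rule (ignore case, surrounding whitespace, and a leading "v") is one plausible choice rather than a standard.

```python
def normalize(version):
    """Strip cosmetic differences: case, surrounding space, a leading 'v'."""
    v = version.strip().lower()
    return v[1:] if v.startswith("v") else v


def check_versions(reference, observed):
    """Classify each artifact's version string against the manifest's.

    Returns {artifact: "naming" | "different"} for every artifact whose
    string does not match the reference exactly. "naming" means the same
    release spelled differently; "different" means another release.
    """
    findings = {}
    for artifact, version in observed.items():
        if version == reference:
            continue
        same_release = normalize(version) == normalize(reference)
        findings[artifact] = "naming" if same_release else "different"
    return findings


# Illustrative values only.
print(check_versions("v2.3.0", {
    "release-record": "v2.3.0",
    "change-ticket": "2.3.0",
    "env-matrix": "v2.2.0",
}))
# {'change-ticket': 'naming', 'env-matrix': 'different'}
```

A "naming" result is the cosmetic mismatch described in the checklist; a "different" result points to an artifact that is genuinely out of date.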