SOC 2 Synthetic-Data Evidence Package Template

Published May 10, 2026

When synthetic data appears in a SOC 2 Type II observation window, the auditor asks the same structural questions: how is it classified, how is it changed, what controls protect it, and what is the evidence trail? This template ships a drop-in evidence package (control narratives plus sample artifacts) that answers each question in the form auditors actually accept.

What you walk away with

  • Control narratives for CC2.1 (information communication), CC6.1 (logical access), CC6.7 (transmission), CC8.1 (change management), and the Confidentiality TSC.
  • Sample artifacts the auditor will sample from — corpus-version manifest, change-management ticket exemplar, access-review extract.
  • A documented argument for why production data does not flow into test environments, supported by the corpus inventory.

SOC 2 Evidence Package — Synthetic Data In Scope

Firm: [FIRM_NAME] · Fiscal year: [FISCAL_YEAR] · Audit window: [AUDIT_WINDOW_START] → [AUDIT_WINDOW_END]

Information security owner: [INFORMATION_SECURITY_OWNER]

1. Scope of synthetic data within the SOC 2 boundary

[FIRM_NAME] uses a third-party synthetic-wealth-data corpus from [SYNTHETIC_DATA_VENDOR], current version [CURRENT_CORPUS_VERSION], in the following systems: [LIST_SYSTEMS]. The corpus contains no personally identifiable information of any actual individual (per vendor attestation, filed separately). Production customer data does not flow into the systems where the synthetic corpus operates.

What goes here (guidance, delete before sending)

Replace [LIST_SYSTEMS] with the actual systems / environments. Be specific: 'staging environment / dev environment / sandbox / training-data pipeline / fraud-rule-tuning sandbox.' The boundary the auditor walks is exactly this list.

2. CC2.1 — Information communication

Control: [FIRM_NAME] maintains documented policies that address the use of synthetic data, communicates them to relevant personnel on hire and annually, and requires personnel to attest that they understand them.

Evidence: the firm's information security policy (sec-policy-[FISCAL_YEAR].pdf) includes a synthetic-data section. Annual training (training-[FISCAL_YEAR].pdf) includes a synthetic-data module with documented completion. Acknowledgment records are stored in the LMS and exportable on request.
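
Before the audit, the firm can confirm that the acknowledgment records actually cover everyone who has corpus access. The sketch below is a minimal illustration, assuming both the LMS export and the quarterly access matrix described under CC6.1 are available as CSV text. The column names (user_id, module, status) and the "synthetic-data" module label are hypothetical; map them to your own exports.

```python
import csv
import io


def missing_acknowledgments(access_csv: str, ack_csv: str, module: str = "synthetic-data") -> set[str]:
    """Return user IDs that have corpus access but no completed acknowledgment for `module`.

    Column names are hypothetical: the access matrix needs 'user_id'; the LMS
    export needs 'user_id', 'module', and 'status' (completed rows say 'completed').
    """
    with_access = {row["user_id"].strip() for row in csv.DictReader(io.StringIO(access_csv))}
    acknowledged = {
        row["user_id"].strip()
        for row in csv.DictReader(io.StringIO(ack_csv))
        if row["module"].strip() == module and row["status"].strip().lower() == "completed"
    }
    # Set difference: people with access who never completed the module.
    return with_access - acknowledged


# Usage: read each export as text, e.g. open(path, newline="", encoding="utf-8").read()
```

Any user ID returned has corpus access but no completed synthetic-data acknowledgment, which is exactly the gap an auditor would raise under this control.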

3. CC6.1 — Logical access

Control: Access to the synthetic corpus is least-privilege, role-based, and reviewed quarterly. Standing access is the exception, not the default.

Evidence: quarterly access matrices (access-[FISCAL_YEAR]-Q1.csv through access-[FISCAL_YEAR]-Q4.csv) list every individual with corpus access, their role assignment, and their last review date. Quarterly access-review meeting minutes are filed under access-reviews-[FISCAL_YEAR]/.
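
One way to dry-run the quarterly review is to flag anyone whose last review date falls outside the review interval. This is a minimal sketch, assuming the access matrix uses hypothetical columns user_id, role, and last_review_date (ISO 8601 dates). The 92-day interval is an assumption approximating one calendar quarter, not a SOC 2 requirement.

```python
import csv
import io
from datetime import date, timedelta

# Assumption: roughly one calendar quarter; set this from your own access policy.
REVIEW_INTERVAL = timedelta(days=92)


def stale_reviews(access_csv: str, as_of: date, interval: timedelta = REVIEW_INTERVAL) -> list[str]:
    """Return user IDs whose last_review_date is more than `interval` before `as_of`.

    Hypothetical columns: user_id, role, last_review_date (YYYY-MM-DD).
    Rows with a blank review date are treated as stale.
    """
    stale = []
    for row in csv.DictReader(io.StringIO(access_csv)):
        raw = (row.get("last_review_date") or "").strip()
        if not raw or as_of - date.fromisoformat(raw) > interval:
            stale.append(row["user_id"].strip())
    return stale
```

An empty list for each quarter-end date is the state the Q1 through Q4 matrices should demonstrate.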

Why is access to synthetic data a control? (guidance, delete before sending)

It isn't a privacy-regulation control (the corpus contains no nonpublic personal information, or NPI), but it is a SOC 2 control because the corpus is the vendor's intellectual property and a controlled asset of the firm. License terms typically restrict redistribution; access controls are how the firm enforces them.

4. CC6.7 — Transmission

Control: When the corpus is transmitted (between the vendor's distribution and the firm's systems, or between firm systems), it's encrypted in transit. Distribution is via authenticated, signed package downloads.

Evidence: vendor delivery is via S3 presigned URLs over TLS 1.2 or later. Internal distribution is via the firm's standard encrypted artifact registry. Sample manifest with hash verification: corpus-manifest-[CURRENT_CORPUS_VERSION].json.
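
The manifest's integrity hashes are what make the transmission evidence testable: anyone can recompute them after download and compare. A minimal sketch follows, assuming a hypothetical manifest layout with a "files" list of {path, sha256} entries. It checks integrity hashes only; verifying the manifest's signature depends on the vendor's signing scheme and is not shown.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large corpus files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path: Path, corpus_dir: Path) -> list[str]:
    """Return manifest paths that are missing or whose SHA-256 does not match.

    Hypothetical manifest layout:
      {"version": "...", "files": [{"path": "accounts.csv", "sha256": "<hex>"}, ...]}
    """
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    failures = []
    for entry in manifest["files"]:
        target = corpus_dir / entry["path"]
        if not target.is_file() or sha256_of(target) != entry["sha256"].lower():
            failures.append(entry["path"])
    return failures
```

Running this on receipt and again after internal distribution gives a dated, repeatable record that the corpus arrived unaltered.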

5. CC8.1 — Change management

Control: Corpus changes are reviewed, tested, approved, and traceable. The current corpus version is identified in every release record.

Evidence: corpus-change-tickets-[FISCAL_YEAR].csv lists every corpus version change in the audit window with: version, ticket ID, reviewer, approver, test-evidence link, approval date, deployment date, and affected systems. Sample release record: release-2026-Q2-1 includes the corpus pin [CURRENT_CORPUS_VERSION].

Sampling note (guidance, delete before sending)

The auditor will sample 5-10 corpus changes from this list and verify the ticket trail end-to-end. Make sure each ticket has reviewer + approver + test evidence + deployment record. Missing one of these is the most common change-management finding.
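
Running a mock sample yourself before fieldwork surfaces incomplete tickets while there is still time to fix the process. The sketch below draws a seeded random sample and checks each ticket for the four elements listed above, plus approval-before-deployment ordering. The column names (ticket_id, reviewer, approver, test_evidence, approval_date, deployment_date) are hypothetical; adapt them to your ticket export.

```python
import csv
import io
import random
from datetime import date

# Hypothetical export columns: ticket_id, version, reviewer, approver, test_evidence,
# approval_date, deployment_date, affected_systems (dates as YYYY-MM-DD).
REQUIRED_FIELDS = ("reviewer", "approver", "test_evidence", "deployment_date")


def ticket_gaps(row: dict) -> list[str]:
    """List what a single ticket is missing, plus deployment-before-approval errors."""
    gaps = [f for f in REQUIRED_FIELDS if not (row.get(f) or "").strip()]
    approval = (row.get("approval_date") or "").strip()
    deployment = (row.get("deployment_date") or "").strip()
    if not approval:
        gaps.append("approval_date")
    elif deployment and date.fromisoformat(deployment) < date.fromisoformat(approval):
        gaps.append("deployed_before_approval")
    return gaps


def dry_run_sample(tickets_csv: str, k: int = 10, seed: int = 0) -> dict[str, list[str]]:
    """Draw a reproducible sample of up to k tickets and return the incomplete ones."""
    rows = list(csv.DictReader(io.StringIO(tickets_csv)))
    sample = random.Random(seed).sample(rows, min(k, len(rows)))
    findings = {}
    for row in sample:
        gaps = ticket_gaps(row)
        if gaps:
            findings[row["ticket_id"]] = gaps
    return findings
```

A fixed seed makes the dry run reproducible, so the same sample can be re-checked after fixes. The auditor draws their own sample; this only approximates it, so every ticket should pass, not just the sampled ones.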

6. Confidentiality TSC — synthetic data as the structural answer

Control: Customer data is protected via classification, encryption-at-rest, encryption-in-transit, and segregation. Test, demo, and development environments do not process production customer data. Synthetic data is used in those environments to support the segregation control structurally.

Evidence: environment matrix (env-data-flows-[FISCAL_YEAR].csv) shows every non-production environment, the data classes present, and the synthetic-corpus version in use. Production customer data does not appear in any test / demo / dev environment row.
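
The segregation claim reduces to a mechanical check on the environment matrix: no non-production row may list production customer data. A minimal sketch, assuming hypothetical columns environment, tier, data_classes (semicolon-separated), and corpus_version, and a hypothetical "production-customer" class label:

```python
import csv
import io

# Hypothetical label used in the data_classes column for production customer data.
PRODUCTION_CUSTOMER_CLASS = "production-customer"


def segregation_violations(env_csv: str) -> list[str]:
    """Return non-production environments whose data classes include production customer data.

    Hypothetical columns: environment, tier ('production' or anything else),
    data_classes (semicolon-separated), corpus_version.
    """
    violations = []
    for row in csv.DictReader(io.StringIO(env_csv)):
        if row["tier"].strip().lower() == "production":
            continue  # production rows are expected to hold customer data
        classes = {c.strip().lower() for c in row["data_classes"].split(";") if c.strip()}
        if PRODUCTION_CUSTOMER_CLASS in classes:
            violations.append(row["environment"].strip())
    return violations
```

An empty result is the evidence the Confidentiality narrative claims; a non-empty one is a finding to remediate before the window closes.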

Why this is structurally cleaner than anonymization (guidance, delete before sending)

Anonymization as a confidentiality control depends on the rigor of the anonymization technique and remains subject to re-identification risk. Synthetic data contains no real PII by construction, so the confidentiality control is structural rather than process-dependent. Auditors generally find this the stronger argument.

7. Sample artifacts (for the auditor to sample)

  1. corpus-manifest-[CURRENT_CORPUS_VERSION].json — version, signature, integrity hash, change-log link
  2. vendor-attestation-[FISCAL_YEAR].pdf — written attestation of no real PII in the corpus
  3. license-[SYNTHETIC_DATA_VENDOR]-[FISCAL_YEAR].pdf — current license agreement
  4. access-[FISCAL_YEAR]-Q4.csv — most recent quarterly access review
  5. corpus-change-tickets-[FISCAL_YEAR].csv — change-management evidence
  6. env-data-flows-[FISCAL_YEAR].csv — environment / data-class matrix
  7. training-[FISCAL_YEAR].pdf — synthetic-data section of annual security training

Pre-audit checklist (guidance, delete before sending)

Walk this list 30 days before audit fieldwork begins. Each artifact should be dated, identifiable, accessible to the auditor, and consistent with the others. Inconsistencies between artifacts (e.g. a corpus version named differently in the tickets and the manifest) are the most common source of findings.
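
Version-naming drift (for example "v2026.2" in the manifest but "2026.2" in a ticket) is easy to detect mechanically. The sketch below groups every version string found in each artifact by a normalized form and reports any version spelled more than one way. The normalization rule (ignore case, surrounding whitespace, and one leading "v") is an assumption, and extracting the version strings from each artifact is left to your own export formats.

```python
from collections import defaultdict


def normalize_version(version: str) -> str:
    """Assumed normalization: ignore case, surrounding whitespace, and one leading 'v'."""
    v = version.strip().lower()
    return v[1:] if v.startswith("v") else v


def spelling_conflicts(sources: dict[str, list[str]]) -> dict[str, dict[str, set[str]]]:
    """Report versions written more than one way across artifacts.

    `sources` maps an artifact name to the version strings found in it. The result
    maps each conflicting normalized version to {spelling: artifacts using it}.
    """
    seen: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for artifact, versions in sources.items():
        for raw in versions:
            seen[normalize_version(raw)][raw.strip()].add(artifact)
    return {norm: dict(spellings) for norm, spellings in seen.items() if len(spellings) > 1}
```

Pick one canonical spelling per version and correct the artifacts that deviate, rather than explaining the difference to the auditor.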

What to do with this

Adopt this as the firm's standard SOC 2 evidence package for the synthetic-data domain. Pre-populate the referenced artifacts, file the package in the InfoSec / compliance team's audit-readiness folder, and refresh it for each audit window.

Calibrated against the AICPA Trust Services Criteria (2017 with 2022 points-of-focus revisions) and observed evidence patterns from fintech Type II audits 2022-2025. Synthetic-data-specific elements are calibrated against published guidance from auditing firms on synthetic-data-in-scope engagements.

FAQ

Does the auditor need to see the actual synthetic data?

Usually no. The auditor is testing controls, not data. They'll sample the manifests, the vendor attestation, the change tickets, and the environment matrix. They generally won't open the corpus itself.

What if synthetic data isn't currently in our SOC 2 boundary?

It might be implicitly in scope and you don't realize it. If a non-prod environment uses synthetic data, the auditor's confidentiality control sampling will land there. Better to surface synthetic data explicitly with this package than to surprise the auditor mid-engagement.