Individual recently entering the banking system, building credit from scratch, using prepaid cards and check cashing.
U-01 models the household entering the banking system for the first time, or having moved from cash-and-prepaid into a checking account within the past 24 months. It is the canonical test corpus for CDFI onboarding, ITIN-friendly KYC, and the FDIC-defined unbanked-to-banked transition.
U-01 exists because the unbanked-to-banked transition is a high-friction surface that mainstream financial software is structurally unprepared for. The 2021 FDIC National Survey of Unbanked and Underbanked Households put the U.S. unbanked rate at 4.5%, roughly 5.9 million households, disproportionately concentrated in Black, Hispanic, immigrant, and lower-income communities. Recently banked households carry no traditional credit history, transact primarily in cash or by prepaid card, and may identify with an ITIN rather than an SSN. KYC and CIP workflows under the BSA and USA PATRIOT Act can accept an ITIN as the identifier, consistent with FinCEN guidance, but most platforms accept only an SSN, a structural exclusion that fair-lending regulators have flagged. The corpus surfaces the household exactly when these flows must function correctly: first checking account, first secured card, first ACH origination.
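The SSN-only gate described above is often just a format check. A minimal sketch of an intake check that accepts either identifier follows; the function names are hypothetical, the SSN and ITIN number-range rules reflect published SSA and IRS formats as commonly documented, and a format check is not identity verification.

```python
import re

# Nine digits, with optional hyphens in the usual 3-2-4 positions.
_TIN = re.compile(r"^(\d{3})-?(\d{2})-?(\d{4})$")

# Fourth-and-fifth-digit ranges published for ITINs (assumption: still current).
_ITIN_GROUPS = (
    set(range(50, 66)) | set(range(70, 89)) | set(range(90, 93)) | set(range(94, 100))
)


def classify_tin(raw: str) -> str:
    """Return 'ssn', 'itin', or 'invalid' based on format alone."""
    match = _TIN.match(raw.strip())
    if not match:
        return "invalid"
    area, group, serial = match.groups()
    if area[0] == "9":
        # Numbers starting with 9 are never SSNs; they are ITINs only
        # when the middle group falls in an ITIN range.
        return "itin" if int(group) in _ITIN_GROUPS else "invalid"
    if area in ("000", "666") or group == "00" or serial == "0000":
        return "invalid"
    return "ssn"


def accepts_identifier(raw: str) -> bool:
    """Hypothetical intake gate that does not exclude ITIN holders."""
    return classify_tin(raw) in {"ssn", "itin"}
```

A platform that rejects every number beginning with 9 is applying the SSN branch alone, which is the exclusion U-01 is meant to expose.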
Cash-flow shape is mass-market: median gross income of $57,026, p25–p75 of $51k–$64k. Net-worth median of $77,782 is depressed by a 10-of-22 cluster in the -$500 to $70k band, reflecting the cash-economy household that has accumulated neither investments nor traditional debt. Liquid assets of $28k median are notably low; investable assets at $50k median reflect that any savings present are typically held outside conventional brokerage. Every household carries credit-card balances (typically secured or starter cards in the credit-build phase), 11 of 22 carry auto loans (often title-loan or buy-here-pay-here), and 9 of 22 hold mortgages, disproportionately CDFI-originated, FHA, or non-QM portfolio products.
What distinguishes U-01 from F-04 (first-generation wealth builder) or U-02 (low-income working family) is the recency of banking access. F-04 has a banking relationship and is building from scratch with conventional rails; U-02 is in the system but persistently low-income; U-01 is in the act of crossing the rail itself. That structural difference matters for CDFI product testing — Bank On certified accounts, second-chance checking, and community-development credit unions all underwrite to a transition-state customer profile that U-01 specifically models. The age range (18–40, median 32) skews younger, and 41% carry dependents, surfacing the family-banking dimension where one adult opens an account that the rest of the household will eventually use.
Aggregated across the 22 U-01 households in the shipped v3 corpus. These numbers describe the corpus; they are not population claims.
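Corpus-level figures of this kind could be recomputed from the household records along the following lines. This is a sketch only: the field names `gross_income` and `has_auto_loan` are hypothetical stand-ins (the shipped schema may differ), and the sample records are toy values, not corpus data.

```python
from statistics import median, quantiles


def summarize(households):
    """Aggregate a list of household dicts into corpus-level statistics."""
    incomes = [h["gross_income"] for h in households]
    # Inclusive quartiles treat the data as the whole population (the corpus),
    # not a sample drawn from something larger.
    p25, _, p75 = quantiles(incomes, n=4, method="inclusive")
    with_auto = sum(1 for h in households if h["has_auto_loan"])
    return {
        "n": len(households),
        "median_income": median(incomes),
        "income_p25": p25,
        "income_p75": p75,
        "auto_loan_share": with_auto / len(households),
    }


# Toy records for illustration only; these are not corpus households.
sample = [
    {"gross_income": 50_000, "has_auto_loan": True},
    {"gross_income": 55_000, "has_auto_loan": False},
    {"gross_income": 60_000, "has_auto_loan": True},
    {"gross_income": 65_000, "has_auto_loan": False},
]
summary = summarize(sample)
```

With only 22 households, a single record can move a quartile noticeably, which is one more reason to read the figures as corpus descriptions.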
Liam is the stress tail of the U-01 corpus: $57k income against a $443 liquid position and the corpus's outsized liability stack — a high-cost subprime debt load against a young, recently-banked single in a tier-1 HCOL metro. This is the household that breaks an underwriting model relying on average behavior; for a CDFI origination workflow he is exactly the customer the product is designed to serve, and any fair-lending model needs to score him on cash flow rather than the absent credit file.
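To put that liquidity gap in scale, the arithmetic below expresses the liquid position as days of gross income. It uses the rounded $57k figure quoted above, and the variable names are illustrative only.

```python
# Figures quoted for the Liam household; income is the rounded $57k.
gross_income = 57_000
liquid_assets = 443

daily_gross = gross_income / 365           # gross income per calendar day
days_covered = liquid_assets / daily_gross  # days of income held as liquid assets

print(f"{days_covered:.1f} days of gross income")
```

The result is just under three days of gross income, which is why a cash-flow view and a balance-sheet view of this household diverge so sharply.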
Every U-01 household ships with — at minimum — these JSON fields populated. The full schema is documented in the data set you purchase.
Three buyer profiles drive U-01 demand. CDFIs and community banks use the corpus to test Bank On certified account onboarding, second-chance checking eligibility, and CRA-eligible small-dollar lending underwriting where the borrower has no traditional credit file. Neobanks and ITIN-friendly fintechs (including immigrant- and community-focused neobanks plus remittance-affiliated banks) test CIP and identity-verification flows against ITIN-as-identifier and against documentary-evidence KYC for households without a five-year address history. Fair-lending compliance teams at banks of every tier use it to validate that automated decisioning does not produce disparate-impact outputs against thin-file or ITIN applicants, including for HMDA-reportable mortgage applications where the same applicant would otherwise be excluded.
U-01 deliberately excludes households that have been continuously banked for more than a few years; once the relationship is established and a credit file is maturing, the profile moves to F-04 (first-generation wealth builder) or U-02 (low-income working family). Cannabis-industry workers locked out of banking for industry rather than identity reasons belong in N-04, where the §280E and state-compliance overlays apply. Recent immigrants are a strong overlap segment, but U-03 is the cleaner test for visa-status and remittance-driven scenarios; U-01 covers the broader population including U.S.-born unbanked. UHNW unbanked (intentional cash holders, off-grid wealthy) are not modeled — the corpus is mass-market income-band by construction. Finally, the 'underbanked' state (has a checking account but still uses payday or check-cashing) is partially represented but better tested via U-02 with an overlay.
The income and demographic shape was anchored during v3 synthesis to the FDIC 2021 National Survey of Unbanked and Underbanked Households, with state distribution informed by the same survey's CBSA-level unbanked rates (CA, TX, MI concentration matches the survey's higher-rate metros). The thin-file credit posture and CDFI-product alignment were informed by NCUA call report data on community-development credit-union member demographics. Per CLAUDE.md §9, the v3 corpus is frozen and not regenerable; calibration descriptions reflect synthesis intent rather than auditable distribution-fit statistics.
U-02 (low-income working family) is in the banking system but persistently low-income with EITC and SNAP eligibility. Reach for U-02 when the testing question is benefits-enrollment workflows; reach for U-01 when it is first-account onboarding.
U-03 (recent immigrant — working) is the visa-status-driven overlap. Use U-03 when the cleaner test is remittance flows, FBAR, and ITIN-from-immigration; use U-01 for the broader thin-file population including U.S.-born unbanked.
F-04 (first-generation wealth builder) has an established banking relationship and is building from scratch on conventional rails. U-01 is the transition state that precedes F-04.
S-02 (post-bankruptcy) shares the thin-file friction but had prior mainstream access and lost it. Different fair-lending narrative, different product fits — second-chance checking and rebuilder cards in S-02 versus first-time-banking products in U-01.
U-01 — Unbanked / Recently Banked represents the household that is just entering the banking system: no traditional credit history, prepaid-card and check-cashing usage prior to the transition, and possible ITIN-based identity. The corpus models the inflection point that CDFIs, ITIN-friendly fintechs, and Bank On certified accounts are designed to serve.
No. A subset of the corpus uses ITIN rather than SSN, but U-01 is broader than visa-status-driven unbanked. The cleaner ITIN-and-immigration overlap is U-03; U-01 covers U.S.-born unbanked and recently-banked households as well.
The corpus models households within roughly the first 24 months of having a primary checking account. The transition is captured in the thin-file credit posture, the limited ACH history, and the prepaid-card-residual behavior — the corpus does not encode a specific months-since-first-account field.
Conventional underwriting models reject thin-file applicants by default. Regulators (CFPB, OCC, Fed) have flagged this as a disparate-impact concern under ECOA. U-01 is the corpus you point an underwriting model at to see whether it produces lawful adverse-action notices and meaningful alternative-data-aware decisions.
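One way to run that check is to score every household with the model under test and compare approval rates across groups. The sketch below assumes each decision is a dict with a hypothetical `approved` flag and a hypothetical grouping field such as `file_type`. The ratio it computes is a common screening heuristic, not a legal test under ECOA, and the sample decisions are toy values.

```python
from collections import defaultdict


def approval_rates(decisions, group_key):
    """Return the approval rate for each value of group_key."""
    counts = defaultdict(lambda: [0, 0])  # group -> [approved, total]
    for decision in decisions:
        bucket = counts[decision[group_key]]
        bucket[0] += 1 if decision["approved"] else 0
        bucket[1] += 1
    return {group: approved / total for group, (approved, total) in counts.items()}


def adverse_impact_ratio(rates, focal, reference):
    """Approval rate of the focal group divided by the reference group's."""
    if rates[reference] == 0:
        raise ValueError("reference group has no approvals; ratio is undefined")
    return rates[focal] / rates[reference]


# Toy decisions for illustration only; not output from any real model.
decisions = [
    {"file_type": "thin", "approved": True},
    {"file_type": "thin", "approved": False},
    {"file_type": "thin", "approved": False},
    {"file_type": "thin", "approved": False},
    {"file_type": "established", "approved": True},
    {"file_type": "established", "approved": True},
    {"file_type": "established", "approved": False},
    {"file_type": "established", "approved": False},
]
rates = approval_rates(decisions, "file_type")
ratio = adverse_impact_ratio(rates, focal="thin", reference="established")
```

A low ratio does not establish a violation by itself; it flags where the adverse-action reasons and any alternative-data inputs deserve a closer look.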
U-01 is tagged for six bundles — B14, B19, B23, B26, B29, and B30 — covering behavioral finance, student-debt patterns, fair-lending compliance, demographic overlays, CDFI/underserved coverage, and transitional-household coverage.
No. The shipped v3 corpus is frozen and not regenerable from current code (CLAUDE.md §9). Sampler improvements land in a future v4 release with per-archetype golden fixtures in CI to prevent silent drift.
Download households matching this archetype as part of a Wealth Data Set.