Quick download: For most users, start with cpp_clean_v1.csv (29 MB). For everything in one archive, download CPP_Data_Release.zip (694 MB).

Tier 1: Analysis-Ready Core

The starting point for most analyses. Key demographics, IQ scores, breastfeeding, birth outcomes, and SES variables, cleaned and labeled.

| File | Description | Rows | Cols | Size |
|---|---|---|---|---|
| cpp_clean_v1.rds | R binary format with factor labels | 59,391 | 185 | 5.6 MB |
| cpp_clean_v1_codebook.csv | Variable-level codebook for cpp_clean_v1 (type, missingness, descriptives, descriptions) | 185 | 12 | 28 KB |

Tier 2: Full CPPVAR Extraction

Every documented variable from the CPPVAR summary file, with complete codebook documentation.

| File | Description | Rows | Cols | Size |
|---|---|---|---|---|
| cppvar_all_columns.csv | Every documented column from CPPVAR.ASC | 59,391 | 1,236 | 164 MB |
| CPP_Codebook.csv | Publication-quality codebook (1,239 entries) | 1,239 | | 278 KB |
| cppvar_codebook.csv | Raw auto-parsed codebook (retained for reproducibility) | 1,140 | | 208 KB |

Tier 3: Complete CPPMASTER Card Data

All 309 punch card types from the 6.1-million-record master file, parsed into named-column CSVs with per-card codebooks. Individual card files and the comprehensive merged file are available in the full release archive.

| File | Description | Size |
|---|---|---|
| CPPMASTER_Data_Dictionary.csv | Complete field-level data dictionary (16,295 fields) | 2.5 MB |
| CPPMASTER_Codebook.csv | Consolidated field-level codebook (6,280+ fields) | 588 KB |
| vol3b_crossref.csv | CPPVAR-to-CPPMASTER cross-reference (906 mappings) | 70 KB |
| SAS_Variable_Catalog.csv | Authoritative field definitions from 24 JHU SAS programs | 1.1 MB |

Tier 3d: Integrated Domain Files

All Stata .dta datasets and standalone CSVs from the NARA/NBER Johns Hopkins collection, merged into 11 domain-specific analysis files (RDS format), each keyed by the standard 9-digit case identifier with multi-card records coalesced to one row per child/pregnancy.

| File | Description | Rows | Cols | Size |
|---|---|---|---|---|
| nichd_psychology.rds | 8-month Bayley, 3-year speech/language/hearing, 4-year Stanford-Binet, 7-year WISC, WRAT, Bender-Gestalt, audiology | 44,445 | 894 | 16.2 MB |
| nichd_pediatric.rds | Neonatal exam, 4-month/1-year pediatric exams, growth, 7-year neurology and conditions | 50,099 | 1,651 | 23.6 MB |
| nichd_mother_path.rds | Placenta pathology, maternal obstetric data | 29,388 | 540 | 4.9 MB |
| nichd_summary7.rds | Cumulative conditions through age 7 | 36,757 | 216 | 1.3 MB |
| nichd_serology_dta.rds | 19 serology Stata files: ABO/Rh typing, clinical infection, antibody titers | 62,407 | 395 | 2.7 MB |
| nichd_adm44.rds | Death certificates and non-liveborn outcomes | 4,001 | 19 | 56 KB |
| standalone_health.rds | Congenital malformations, 7-year abnormalities, toxemia, ruptured membranes | 111,608 | 201 | 3.2 MB |
| standalone_ses.rds | Socioeconomic indices at registration and age 7 | 80,768 | 108 | 1.3 MB |
| standalone_serology.rds | Serology card CSVs | 89,509 | 307 | 2.8 MB |
| standalone_drugs.rds | Generic and brand drug name files | 49,214 | 43 | 607 KB |
| standalone_other.rds | Visit/schedule, speech/language/hearing, W17 relationship files | 218,079 | 246 | 3.0 MB |
| master_codebook_integrated.csv | Documentation for all 4,814 variables across 11 domains | 4,814 | | 246 KB |
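The domain files above are described as coalescing multi-card records to one row per child/pregnancy. As a rough illustration of what such a coalescing step can look like, here is a minimal Python sketch. The rule used (the first non-missing value per column wins) and the column names other than case_id are assumptions made for illustration; the release's actual pipeline may resolve conflicts differently.

```python
def coalesce_records(rows, key="case_id"):
    """Collapse rows that share `key` into one row per key.

    Assumed rule (not confirmed by the release): for each column, keep the
    first value that is neither None nor an empty string.
    """
    merged = {}
    for row in rows:
        target = merged.setdefault(row[key], {key: row[key]})
        for col, value in row.items():
            if col == key:
                continue
            if target.get(col) in (None, "") and value not in (None, ""):
                target[col] = value          # fill a gap with a real value
            else:
                target.setdefault(col, value)  # remember the column exists
    return list(merged.values())


# Hypothetical multi-card input: two partial records for the same child.
cards = [
    {"case_id": "000000001", "exam_a": "12", "exam_b": ""},
    {"case_id": "000000001", "exam_a": "", "exam_b": "7"},
    {"case_id": "000000002", "exam_a": "9", "exam_b": None},
]
one_row_per_child = coalesce_records(cards)
```

Keeping case_id as a string preserves any leading zeros in the 9-digit identifier.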

Tier 4: Unified Wide Dataset

Every child crossed with every variable from every card, merged into a single wide table.

| File | Description | Rows | Cols | Size |
|---|---|---|---|---|
| cpp_unified_wide.rds | All card + standalone data merged per child (R binary) | 64,834 | 4,862 | 133 MB |
| cpp_unified_wide.csv | Same, CSV format (in full release archive only) | 64,834 | 4,862 | 674 MB |
| cpp_unified_manifest.csv | Variable manifest with missingness and summary statistics | 4,862 | | 288 KB |
| cpp_unified_supplementary.csv | 2,534 sparse variables (<1% coverage), mergeable by case_id (in full release archive only) | 64,834 | 2,534 | |
| cpp_unified_supplementary_codebook.csv | Codebook for supplementary sparse variables (also browseable on Codebook page) | 2,534 | | |
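Because the supplementary sparse variables are mergeable by case_id, they can be attached to the main table with a left join. The sketch below uses only the Python standard library and tiny in-memory stand-ins for the two files; the column names iq_7yr and rare_flag are hypothetical, and only case_id comes from the listing above.

```python
import csv
import io


def left_join_by_case_id(main_rows, extra_rows, key="case_id"):
    """Attach columns from extra_rows to main_rows, matching on `key`.

    Rows in main_rows without a match get an empty string in the new columns.
    """
    extra_by_id = {row[key]: row for row in extra_rows}
    extra_cols = [c for c in (extra_rows[0] if extra_rows else {}) if c != key]
    joined = []
    for row in main_rows:
        match = extra_by_id.get(row[key], {})
        combined = dict(row)
        for col in extra_cols:
            combined[col] = match.get(col, "")
        joined.append(combined)
    return joined


# In-memory stand-ins; real use would open the CSV files instead.
main_csv = "case_id,iq_7yr\n000000001,102\n000000002,95\n"
extra_csv = "case_id,rare_flag\n000000002,1\n"

main_rows = list(csv.DictReader(io.StringIO(main_csv)))
extra_rows = list(csv.DictReader(io.StringIO(extra_csv)))
merged = left_join_by_case_id(main_rows, extra_rows)
```

For the full-size files, a data-frame library is likely a better fit than row-by-row dictionaries; in pandas the equivalent operation is a left merge on case_id, with the identifier read as a string so leading zeros are not lost.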

Tier 5: Family Structure

27,721 pairwise kinship links spanning seven levels of genetic relatedness, with twin zygosity classification. Includes 14,208 within-family sibling and twin pairs plus 13,513 cross-family extended relative pairs (first cousins, second cousins, half-cousins, nieces/nephews).

| File | Description | Rows | Size |
|---|---|---|---|
| cpp_kinship_links.csv | Within-family sibling and twin pairs with R coefficients | 14,208 | 1.8 MB |
| cpp_extended_kinship_links.csv | Cross-family extended relative pairs from W17C–M work files (9,060 first-cousin, 1,702 second-cousin, 1,266 first-cousin-once-removed, 720 half-first-cousin, 435 half-niece/nephew, 330 niece/nephew) | 13,513 | 0.6 MB |
| cpp_twin_zygosity.csv | Twin pair zygosity classification (5-tier evidence schema) | 640 | 115 KB |
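For readers unfamiliar with R coefficients, the sketch below computes the standard expected coefficient of relatedness for the relationship types listed above. It assumes no inbreeding, so each shared ancestor contributes (1/2) raised to the number of parent-child steps on the path linking the two relatives through that ancestor. Whether the kinship files store exactly these values is an assumption; check the codebooks before relying on it.

```python
def relatedness(common_ancestors, steps):
    """Expected relatedness, assuming no inbreeding.

    common_ancestors: number of most recent shared ancestors (2 for "full"
                      relationships, 1 for "half" relationships).
    steps:            parent-child links on the path between the two
                      relatives through one shared ancestor.
    """
    return common_ancestors * 0.5 ** steps


# Expected values for the pair types named in the listing.
# Identical twins are a special case (R = 1.0) not covered by the formula;
# fraternal twins are treated like full siblings.
EXPECTED_R = {
    "full sibling": relatedness(2, 2),
    "niece/nephew": relatedness(2, 3),
    "half-niece/nephew": relatedness(1, 3),
    "first cousin": relatedness(2, 4),
    "first cousin once removed": relatedness(2, 5),
    "half-first cousin": relatedness(1, 4),
    "second cousin": relatedness(2, 6),
}
```

Under these assumptions full siblings come out at 0.5, first cousins at 0.125, and second cousins at 0.03125.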

Tier 6: Derived Variables

Pre-computed files for common analytical tasks.

Cognitive & latent variable scores (three complementary files, mergeable by case_id):

| File | Description |
|---|---|
| cpp_cognitive_scores.csv | Raw scores and z-scores for 12 cognitive measures (SB, WISC, WRAT, Bender, etc.) plus a default g factor |
| cpp_g_factors.csv | 9 g factor scores (PCA, PAF, CFA across three variable sets, plus IRT-based Stanford-Binet theta) |
| cpp_item_scores.csv | PCA and IRT factor scores from 5 item-level batteries (8-month Bayley, 7-year examiner ratings, neurological exams) |

Physical and socioeconomic:

| File | Description |
|---|---|
| cpp_growth_trajectories.csv | 755,739 longitudinal physical measurements for 55,443 children |
| cpp_birthweight_zscores.csv | Sex- and gestational-age-specific z-scores for 54,223 children |
| cpp_ses_detailed.csv | Merged registration and 7-year SES data |
| cpp_sb_items_reconstructed.csv | Stanford-Binet reconstructed item responses for 37,820 children |

Within-family analysis tools:

| File | Description |
|---|---|
| cpp_disability_discordance.csv | 10 disability criteria and composite flags for 53,640 children |
| cpp_discordant_pairs.csv | 11,539 sibling/twin pairs flagged for discordance |
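To make "sex- and gestational-age-specific z-scores" concrete, the sketch below standardizes a measurement within strata: each value is compared with the mean and standard deviation of its own sex-by-gestational-age group. It assumes the reference distribution is the sample itself and uses hypothetical field names (sex, ga_weeks, bw_g); the released z-scores may have been computed against a different reference.

```python
from collections import defaultdict
from statistics import mean, stdev


def stratified_zscores(records, value_key, strata_keys):
    """Return one z-score per record, standardized within its stratum.

    Strata with fewer than two records, or with zero spread, yield None
    because a standard deviation cannot be used there.
    """
    groups = defaultdict(list)
    for rec in records:
        stratum = tuple(rec[k] for k in strata_keys)
        groups[stratum].append(rec[value_key])

    stats = {s: (mean(v), stdev(v)) for s, v in groups.items() if len(v) >= 2}

    zscores = []
    for rec in records:
        stratum = tuple(rec[k] for k in strata_keys)
        if stratum in stats and stats[stratum][1] > 0:
            m, sd = stats[stratum]
            zscores.append((rec[value_key] - m) / sd)
        else:
            zscores.append(None)
    return zscores


# Hypothetical records: three boys and one girl born at 40 weeks.
births = [
    {"sex": "M", "ga_weeks": 40, "bw_g": 3000},
    {"sex": "M", "ga_weeks": 40, "bw_g": 3200},
    {"sex": "M", "ga_weeks": 40, "bw_g": 3400},
    {"sex": "F", "ga_weeks": 40, "bw_g": 3100},
]
z = stratified_zscores(births, "bw_g", ["sex", "ga_weeks"])
```

The lone girl receives None rather than a z-score, since a single observation gives no estimate of spread for her stratum.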

Survey Weights

Inverse-probability attrition weights and Census population weights enabling nationally representative estimation.

| File | Description |
|---|---|
| cpp_weights.csv | Survey weights for all 59,391 children |
| cpp_weights_codebook.csv | Documentation for weight variables |
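As a minimal illustration of how weights of this kind enter an estimate, the sketch below computes a weighted mean, skipping children whose outcome or weight is missing. The example values are made up, and which weight column to use for a given analysis is documented in cpp_weights_codebook.csv, not here.

```python
def weighted_mean(values, weights):
    """Weighted mean of `values`, ignoring pairs where either entry is None."""
    numerator = 0.0
    denominator = 0.0
    for value, weight in zip(values, weights):
        if value is None or weight is None:
            continue
        numerator += weight * value
        denominator += weight
    if denominator == 0:
        raise ValueError("no usable observations")
    return numerator / denominator


# Made-up scores and weights; the third child has no score and is skipped.
scores = [100, 110, None]
weights = [1.0, 3.0, 2.0]
estimate = weighted_mean(scores, weights)
```

A child with weight 3.0 counts three times as much as a child with weight 1.0, which is how under-represented or frequently lost-to-follow-up groups are scaled back up. Standard errors for weighted estimates need separate treatment and are not covered by this sketch.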

Full Release Archive

Contains everything above plus individual parsed CPPMASTER card files, standalone datasets, comprehensive merged file, multirow companion files, and all documentation.

| File | Description | Size |
|---|---|---|
| CPP_Data_Release.zip | Everything in one archive | 694 MB |

Documentation

| File | Description |
|---|---|
| README.md | Getting started guide, file selection, common pitfalls |
| RELEASE_NOTES.md | Dataset overview, tier inventory, data quality notes |
| METHODOLOGY.md | Technical pipeline, OCR process, validation methodology |