All files are hosted as GitHub Release assets. CSV files can be opened in R, Python, Stata, or any spreadsheet application. RDS files are in R's native binary format.
Quick download: For most users, start with cpp_clean_v1.csv (29 MB). For everything in one archive, download CPP_Data_Release.zip (694 MB).
The starting point for most analyses: key demographics, IQ scores, breastfeeding, birth outcomes, and SES variables, cleaned and labeled.
| File | Description | Rows | Cols | Size | |
|---|---|---|---|---|---|
| cpp_clean_v1.csv | Analysis-ready dataset (185 key variables, cleaned and labeled) | 59,391 | 185 | 29 MB | Download |
| cpp_clean_v1.rds | R binary format with factor labels | 59,391 | 185 | 5.6 MB | Download |
| cpp_clean_v1_codebook.csv | Variable-level codebook for cpp_clean_v1 (type, missingness, descriptives, descriptions) | 185 | 12 | 28 KB | Download |
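
After downloading, it can be worth confirming that a CSV arrived intact by comparing its dimensions with the Rows and Cols figures published in the table. The following is a minimal sketch using only the Python standard library; the local file path and the UTF-8 encoding are assumptions, and `csv_shape` is a hypothetical helper name, not part of the release.

```python
import csv

def csv_shape(path, encoding="utf-8"):
    """Return (data_rows, columns) for a CSV file with one header row."""
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        header = next(reader)            # first row holds the column names
        n_rows = sum(1 for _ in reader)  # every remaining row is a data record
    return n_rows, len(header)

# Published dimensions for cpp_clean_v1.csv, taken from the table.
EXPECTED_CLEAN_V1 = (59391, 185)

# Example (assumes the file sits in the working directory):
# print(csv_shape("cpp_clean_v1.csv") == EXPECTED_CLEAN_V1)
```

Because `csv.reader` parses quoted fields, a line break inside a quoted cell is not counted as an extra row.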
Every documented variable from the CPPVAR summary file, with complete codebook documentation.
| File | Description | Rows | Cols | Size | |
|---|---|---|---|---|---|
| cppvar_all_columns.csv | Every documented column from CPPVAR.ASC | 59,391 | 1,236 | 164 MB | Download |
| CPP_Codebook.csv | Publication-quality codebook (1,239 entries) | 1,239 | — | 278 KB | Download |
| cppvar_codebook.csv | Raw auto-parsed codebook (retained for reproducibility) | 1,140 | — | 208 KB | Download |
All 309 punch card types from the 6.1-million-record master file, parsed into named-column CSVs with per-card codebooks. Individual card files and the comprehensive merged file are available in the full release archive.
| File | Description | Size | |
|---|---|---|---|
| CPPMASTER_Data_Dictionary.csv | Complete field-level data dictionary (16,295 fields) | 2.5 MB | Download |
| CPPMASTER_Codebook.csv | Consolidated field-level codebook (6,280+ fields) | 588 KB | Download |
| vol3b_crossref.csv | CPPVAR-to-CPPMASTER cross-reference (906 mappings) | 70 KB | Download |
| SAS_Variable_Catalog.csv | Authoritative field definitions from 24 JHU SAS programs | 1.1 MB | Download |
All Stata .dta datasets and standalone CSVs from the NARA/NBER Johns Hopkins collection, merged into 11 domain-specific analysis files (RDS format), each keyed by the standard 9-digit case identifier with multi-card records coalesced to one row per child/pregnancy.
| File | Description | Rows | Cols | Size | |
|---|---|---|---|---|---|
| nichd_psychology.rds | 8-month Bayley, 3-year speech/language/hearing, 4-year Stanford-Binet, 7-year WISC, WRAT, Bender-Gestalt, audiology | 44,445 | 894 | 16.2 MB | Download |
| nichd_pediatric.rds | Neonatal exam, 4-month/1-year pediatric exams, growth, 7-year neurology and conditions | 50,099 | 1,651 | 23.6 MB | Download |
| nichd_mother_path.rds | Placenta pathology, maternal obstetric data | 29,388 | 540 | 4.9 MB | Download |
| nichd_summary7.rds | Cumulative conditions through age 7 | 36,757 | 216 | 1.3 MB | Download |
| nichd_serology_dta.rds | 19 serology Stata files: ABO/Rh typing, clinical infection, antibody titers | 62,407 | 395 | 2.7 MB | Download |
| nichd_adm44.rds | Death certificates and non-liveborn outcomes | 4,001 | 19 | 56 KB | Download |
| standalone_health.rds | Congenital malformations, 7-year abnormalities, toxemia, ruptured membranes | 111,608 | 201 | 3.2 MB | Download |
| standalone_ses.rds | Socioeconomic indices at registration and age 7 | 80,768 | 108 | 1.3 MB | Download |
| standalone_serology.rds | Serology card CSVs | 89,509 | 307 | 2.8 MB | Download |
| standalone_drugs.rds | Generic and brand drug name files | 49,214 | 43 | 607 KB | Download |
| standalone_other.rds | Visit/schedule, speech/language/hearing, W17 relationship files | 218,079 | 246 | 3.0 MB | Download |
| master_codebook_integrated.csv | Documentation for all 4,814 variables across 11 domains | 4,814 | — | 246 KB | Download |
Every child crossed with every variable from every card, merged into a single wide table.
| File | Description | Rows | Cols | Size | |
|---|---|---|---|---|---|
| cpp_unified_wide.rds | All card + standalone data merged per child (R binary) | 64,834 | 4,862 | 133 MB | Download |
| cpp_unified_wide.csv | Same, CSV format (in full release archive) | 64,834 | 4,862 | 674 MB | In Archive |
| cpp_unified_manifest.csv | Variable manifest with missingness and summary statistics | 4,862 | — | 288 KB | Download |
| cpp_unified_supplementary.csv | 2,534 sparse variables (<1% coverage), mergeable by case_id | 64,834 | 2,534 | — | In Archive |
| cpp_unified_supplementary_codebook.csv | Codebook for supplementary sparse variables (also browseable on Codebook page) | 2,534 | — | — | Download |
27,721 pairwise kinship links spanning seven levels of genetic relatedness, with twin zygosity classification. Includes 14,208 within-family sibling and twin pairs plus 13,513 cross-family extended relative pairs (first cousins, second cousins, half-cousins, nieces/nephews).
| File | Description | Rows | Size | |
|---|---|---|---|---|
| cpp_kinship_links.csv | Within-family sibling and twin pairs with R coefficients | 14,208 | 1.8 MB | Download |
| cpp_extended_kinship_links.csv | Cross-family extended relative pairs from W17C–M work files (9,060 first-cousin, 1,702 second-cousin, 1,266 first-cousin-once-removed, 720 half-first-cousin, 435 half-niece/nephew, 330 niece/nephew) | 13,513 | 0.6 MB | Download |
| cpp_twin_zygosity.csv | Twin pair zygosity classification (5-tier evidence schema) | 640 | 115 KB | Download |
Pre-computed files for common analytical tasks.
| File | Description | |
|---|---|---|
| Cognitive & latent variable scores (three complementary files, mergeable by case_id): | | |
| cpp_cognitive_scores.csv | Raw scores and z-scores for 12 cognitive measures (SB, WISC, WRAT, Bender, etc.) plus a default g factor | Download |
| cpp_g_factors.csv | 9 g factor scores (PCA, PAF, CFA across three variable sets, plus IRT-based Stanford-Binet theta) | Download |
| cpp_item_scores.csv | PCA and IRT factor scores from 5 item-level batteries (8-month Bayley, 7-year examiner ratings, neurological exams) | Download |
| Physical and socioeconomic: | | |
| cpp_growth_trajectories.csv | 755,739 longitudinal physical measurements for 55,443 children | Download |
| cpp_birthweight_zscores.csv | Sex- and gestational-age-specific z-scores for 54,223 children | Download |
| cpp_ses_detailed.csv | Merged registration and 7-year SES data | Download |
| cpp_sb_items_reconstructed.csv | Stanford-Binet reconstructed item responses for 37,820 children | Download |
| Within-family analysis tools: | | |
| cpp_disability_discordance.csv | 10 disability criteria and composite flags for 53,640 children | Download |
| cpp_discordant_pairs.csv | 11,539 sibling/twin pairs flagged for discordance | Download |
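
The three cognitive score files are described as mergeable by case_id. The following is one possible way to combine them, sketched with the Python standard library only. It assumes each file has exactly one row per case_id and is UTF-8 encoded; `read_keyed` and `merge_on_key` are hypothetical helper names introduced here for illustration.

```python
import csv

def read_keyed(path, key="case_id"):
    """Load a CSV into a dict mapping the key column to the full row dict."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row[key]: row for row in csv.DictReader(f)}

def merge_on_key(paths, key="case_id"):
    """Left-join every later file onto the first file by the key column.

    A column name repeated in a later file overwrites the earlier value,
    so rename clashing columns first if both versions are needed.
    """
    merged = read_keyed(paths[0], key)
    for path in paths[1:]:
        other = read_keyed(path, key)
        for case, row in merged.items():
            # Cases absent from the later file simply gain no new columns.
            row.update(other.get(case, {}))
    return merged

# Example (assumes the three files sit in the working directory):
# scores = merge_on_key([
#     "cpp_cognitive_scores.csv",
#     "cpp_g_factors.csv",
#     "cpp_item_scores.csv",
# ])
```

All values stay as strings, as read from the CSV; convert them to numbers before analysis.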
Inverse-probability attrition weights and Census population weights enabling nationally representative estimation.
| File | Description | |
|---|---|---|
| cpp_weights.csv | Survey weights for all 59,391 children | Download |
| cpp_weights_codebook.csv | Documentation for weight variables | Download |
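
As a rough illustration of how a survey weight enters an estimate, the sketch below computes a weighted mean, sum(w * x) / sum(w), over records where both the value and the weight are present. It is a generic formula, not the release's own estimation procedure. The actual weight column names are documented in cpp_weights_codebook.csv and are not assumed here; `weighted_mean` is a hypothetical helper.

```python
def weighted_mean(values, weights):
    """Weighted mean over pairs where both value and weight are present."""
    num = 0.0
    den = 0.0
    for v, w in zip(values, weights):
        if v is None or w is None:
            continue  # skip records with a missing value or missing weight
        num += w * v
        den += w
    if den == 0:
        raise ValueError("no usable observations (total weight is zero)")
    return num / den

# Example with made-up numbers (not CPP data):
# weighted_mean([100.0, 90.0, None], [1.5, 0.5, 2.0])
```

Point estimates like this do not yield correct standard errors on their own; variance estimation under weighting needs a dedicated survey method.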
Contains everything above plus individual parsed CPPMASTER card files, standalone datasets, comprehensive merged file, multirow companion files, and all documentation.
| File | Description | Size | |
|---|---|---|---|
| CPP_Data_Release.zip | Everything in one download | 694 MB | Download |
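
Some files, such as cpp_unified_wide.csv, are only available inside the archive. The sketch below pulls a single file out of the zip without unpacking all 694 MB. The archive's internal folder layout is not documented on this page, so the code matches on the base file name rather than a full path; `find_members` and `extract_one` are hypothetical helper names.

```python
import zipfile
from pathlib import PurePosixPath

def find_members(zip_path, filename):
    """List archive members whose base name equals `filename`."""
    with zipfile.ZipFile(zip_path) as zf:
        return [n for n in zf.namelist() if PurePosixPath(n).name == filename]

def extract_one(zip_path, filename, dest_dir="."):
    """Extract the single member named `filename`; return the extracted path."""
    matches = find_members(zip_path, filename)
    if len(matches) != 1:
        raise LookupError(
            f"expected one match for {filename!r}, found {len(matches)}"
        )
    with zipfile.ZipFile(zip_path) as zf:
        return zf.extract(matches[0], path=dest_dir)

# Example (assumes the archive sits in the working directory):
# path = extract_one("CPP_Data_Release.zip", "cpp_unified_wide.csv")
```

If the same file name appears in more than one folder, `extract_one` raises instead of guessing; `find_members` can then be used to inspect the candidates.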