datascience
Chinese Longitudinal Healthy Longevity Survey Mockup
Explore the Chinese Longitudinal Healthy Longevity Survey datasets with example analyses and logistic regression modeling in Python.
Shipped January 2026
This repository hosts a mockup project built around the Chinese Longitudinal Healthy Longevity Survey (CLHLS) Biomarkers Datasets from 2009, 2012, and 2014 (ICPSR 37226). It provides data files, documentation, and example analyses primarily implemented in Jupyter Notebooks to facilitate exploration of health and longevity predictors in the Chinese elderly population.
Features
- Access to longitudinal biomarker datasets related to Chinese elderly health and longevity.
- Sample logistic regression modeling of health predictors using Python.
- Included user guide and related literature for context and reference.
Tech Stack
- Primary language: Jupyter Notebook (Python)
- Data analysis libraries: pandas, numpy, matplotlib, seaborn, scikit-learn
- Additional tools: Google BigQuery client, contextily for mapping
Getting Started
Prerequisites
- Python 3.7 or higher
- Jupyter Notebook
- Required Python packages (install via pip):
pip install numpy pandas matplotlib seaborn scikit-learn google-cloud-bigquery contextily
Running the Project
- Clone the repository:
git clone https://github.com/justin-napolitano/chinese_longitudinal_mockup.git cd chinese_longitudinal_mockup - Launch Jupyter Notebook:
jupyter notebook - Open and run notebooks or scripts in the
datafolder or others as needed.
Project Structure
chinese_longitudinal_mockup/
├── 37226-descriptioncitation.html # Description and citation info
├── 37226-manifest.txt # Dataset manifest
├── 37226-related_literature.txt # Bibliography of related works
├── 37226-User_guide.pdf # User guide for dataset
├── data/ # Folder containing data and scripts
│ └── logistic_regression.md # Logistic regression model design and example
├── series-487-related_literature.txt # Additional literature
└── TermsOfUse.html # Terms of use for data
Future Work / Roadmap
- Expand example analyses beyond logistic regression.
- Integrate more comprehensive data visualizations.
- Automate data ingestion and preprocessing pipelines.
- Develop additional notebooks demonstrating advanced statistical modeling.
- Enhance documentation and user guides.
Need more context?
Want help adapting this playbook?
Send me the constraints and I'll annotate the relevant docs, share risks I see, and outline the first sprint so the work keeps moving.