Chinese Longitudinal Healthy Longevity Survey Mockup | Justin Napolitano

This repository hosts a mockup project built around the Chinese Longitudinal Healthy Longevity Survey (CLHLS) Biomarkers Datasets from 2009, 2012, and 2014 (ICPSR 37226). It provides data files, documentation, and example analyses primarily implemented in Jupyter Notebooks to facilitate exploration of health and longevity predictors in the Chinese elderly population.

Features

Access to longitudinal biomarker datasets related to Chinese elderly health and longevity.
Sample logistic regression modeling of health predictors using Python.
Included user guide and related literature for context and reference.
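
As a rough illustration of the logistic regression feature above, here is a minimal sketch using scikit-learn. It runs on synthetic data only: the predictors (`age`, `biomarker`), the outcome, and the effect sizes are hypothetical stand-ins, not CLHLS variables or results, and this is not necessarily the model design described in the repository.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data. These are NOT CLHLS values; names are hypothetical.
rng = np.random.default_rng(seed=0)
n = 500
age = rng.uniform(65, 105, size=n)          # hypothetical predictor: age in years
biomarker = rng.normal(5.0, 1.0, size=n)    # hypothetical predictor: a lab measurement
X = np.column_stack([age, biomarker])

# Simulate a binary outcome whose log-odds depend on both predictors.
log_odds = -0.08 * (age - 85) + 0.9 * (biomarker - 5.0)
prob = 1.0 / (1.0 + np.exp(-log_odds))
y = rng.binomial(1, prob)

# Hold out a quarter of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Standardize the predictors, then fit the logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
coefficients = model.named_steps["logisticregression"].coef_[0]
print(f"Held-out accuracy: {accuracy:.2f}")
print(f"Standardized coefficients (age, biomarker): {coefficients}")
```

With real survey data, the synthetic arrays would be replaced by columns selected from the dataset, and missing values would need handling before fitting.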

Tech Stack

Primary language: Python (in Jupyter Notebooks)
Data analysis libraries: pandas, numpy, matplotlib, seaborn, scikit-learn
Additional tools: Google BigQuery client, contextily for mapping

Getting Started

Prerequisites

Python 3.7 or higher
Jupyter Notebook

Required Python packages (install via pip):

pip install numpy pandas matplotlib seaborn scikit-learn google-cloud-bigquery contextily
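
To confirm that the prerequisites are in place, a small check like the following may help. It is a sketch that uses only the standard library; the helper name `missing_modules` is hypothetical, and the module list assumes the usual import names for the packages in the pip command (for example, scikit-learn is imported as `sklearn`).

```python
import sys
from importlib.util import find_spec

# Import names assumed for the packages listed in the pip command.
REQUIRED_MODULES = [
    "numpy",
    "pandas",
    "matplotlib",
    "seaborn",
    "sklearn",                # installed as scikit-learn
    "google.cloud.bigquery",  # installed as google-cloud-bigquery
    "contextily",
]


def missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            found = find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "google") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


python_ok = sys.version_info >= (3, 7)
missing = missing_modules(REQUIRED_MODULES)

print("Python 3.7 or higher:", "yes" if python_ok else "no")
print("Missing modules:", ", ".join(missing) if missing else "none")
```

Any module reported as missing can be installed with the pip command above.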

Running the Project

Clone the repository:

git clone https://github.com/justin-napolitano/chinese_longitudinal_mockup.git
cd chinese_longitudinal_mockup

Launch Jupyter Notebook:
```
jupyter notebook
```
Open the notebooks or scripts in the data folder, or elsewhere in the repository, and run them as needed.

Project Structure

chinese_longitudinal_mockup/
├── 37226-descriptioncitation.html      # Description and citation info
├── 37226-manifest.txt                  # Dataset manifest
├── 37226-related_literature.txt        # Bibliography of related works
├── 37226-User_guide.pdf                # User guide for dataset
├── data/                               # Folder containing data and scripts
│   └── logistic_regression.md          # Logistic regression model design and example
├── series-487-related_literature.txt   # Additional literature
└── TermsOfUse.html                     # Terms of use for data

Future Work / Roadmap

Expand example analyses beyond logistic regression.
Integrate more comprehensive data visualizations.
Automate data ingestion and preprocessing pipelines.
Develop additional notebooks demonstrating advanced statistical modeling.
Enhance documentation and user guides.