Data Analysis Projects with Google Cloud and Python
Explore a collection of data analysis projects using Python, Jupyter Notebooks, and Google Cloud services like BigQuery and Bigtable.
Shipped January 2026
A collection of data analysis projects and technical interview solutions by Justin Napolitano, primarily using Jupyter Notebooks and Python scripts interfacing with Google Cloud services such as BigQuery and Bigtable.
Features
- SQL queries and Python scripts for analyzing NYC taxi trip data using BigQuery.
- Weather data collection and streaming solutions using APIs and Google Cloud Bigtable.
- Jupyter Book structured technical interview answers and cost of living models.
- Automated build pipeline for Jupyter Book documentation.
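The BigQuery-based taxi analysis can be sketched in Python. This is a hypothetical example, not the repo's actual `test.py`: the table name below is BigQuery's real public NYC taxi dataset, but the exact queries and column choices in the project may differ.

```python
# Hypothetical sketch of a BigQuery query against the public NYC taxi data.
# The table name is the real public dataset; the query itself is illustrative.
TAXI_TABLE = "bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2022"

def avg_fare_by_passengers_sql(table: str = TAXI_TABLE) -> str:
    """Build a SQL query computing trip counts and average fare per passenger count."""
    return f"""
        SELECT passenger_count,
               COUNT(*) AS trips,
               ROUND(AVG(fare_amount), 2) AS avg_fare
        FROM `{table}`
        WHERE passenger_count BETWEEN 1 AND 6
        GROUP BY passenger_count
        ORDER BY passenger_count
    """

# Executing it requires google-cloud-bigquery and valid credentials:
#   from google.cloud import bigquery
#   rows = bigquery.Client().query(avg_fare_by_passengers_sql()).result()
```

Keeping the SQL in a small builder function like this makes the query text easy to unit-test without touching BigQuery itself.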
Tech Stack
- Python 3
- Jupyter Notebook
- Google Cloud BigQuery and Bigtable
- Java (sample Bigtable external query)
- Jupyter Book for documentation
Getting Started
Prerequisites
- Python 3 installed
- Google Cloud SDK configured with appropriate credentials
- pip for Python package management
Installation
- Clone the repository:
git clone https://github.com/justin-napolitano/pmc-submission.git
cd pmc-submission
- Install dependencies:
pip install -r requirements.txt
- Set Google Cloud credentials environment variable:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/creds.json"
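A quick way to catch a misconfigured `GOOGLE_APPLICATION_CREDENTIALS` before any script fails deep inside a BigQuery or Bigtable call is to validate the key file up front. This helper is an assumption, not part of the repo; it uses only the standard library and the documented service-account key format.

```python
import json
import os
from pathlib import Path

def check_gcp_credentials() -> str:
    """Validate the GOOGLE_APPLICATION_CREDENTIALS key file and
    return the service-account email it contains."""
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path or not Path(path).is_file():
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is unset or not a file")
    info = json.loads(Path(path).read_text())
    if info.get("type") != "service_account":
        raise RuntimeError("Key file is not a service-account key")
    return info["client_email"]
```

Calling this at the top of `main.py` would turn a cryptic auth failure into an immediate, readable error.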
Running
- To run the main Python script:
python main.py
- To run test queries against BigQuery, explore test.py.
- To build the Jupyter Book documentation:
python python_build.py
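The build wrapper likely shells out to the `jupyter-book` CLI. Here is a minimal sketch of what a `python_build.py` might look like; the actual script in the repo may do more (cleaning, copying output, etc.).

```python
import subprocess
from pathlib import Path

def build_jupyter_book(book_dir: str = "jupyter-book") -> list:
    """Compose the jupyter-book build command for the given source directory."""
    return ["jupyter-book", "build", str(Path(book_dir))]

if __name__ == "__main__":
    # Runs `jupyter-book build jupyter-book` and fails loudly on error.
    subprocess.run(build_jupyter_book(), check=True)
```

Separating command construction from execution keeps the wrapper testable without a Jupyter Book install.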
Project Structure
pmc-submission/
├── ch4-emissions/          # Folder likely related to methane emissions analysis
├── jupyter-book/           # Jupyter Book source and build files
│   ├── _config.yml         # Jupyter Book configuration
│   ├── _toc.yml            # Table of contents
│   ├── notebooks/          # Markdown and notebooks for interview and analysis
│   └── python_build.py     # Build automation for Jupyter Book
├── login.py                # Google Cloud Bigtable login helper
├── main.py                 # Main entry point
├── propensity_scoring/     # Folder likely containing propensity scoring analysis
├── python_build.py         # Build automation for main project
├── query.py                # Java-like imports, possibly incomplete BigQuery client code
├── query_gooogle.java      # Java sample for Bigtable external query
├── query_gooogle.json      # Duplicate of the Java file, likely misplaced
├── test.py                 # Python scripts with BigQuery SQL queries
└── documentation.ipynb     # Possibly a project documentation notebook
Future Work / Roadmap
- Complete and clean up Java and Python BigQuery client code.
- Consolidate or remove duplicate/misplaced files like query_gooogle.json.
- Expand automated testing and CI/CD for data pipelines.
- Enhance documentation with more detailed usage examples.
- Integrate data pipeline automation for weather and taxi data ingestion.
- Improve error handling and logging in scripts.
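For the error-handling and logging item, one common pattern is a small retry decorator around flaky cloud calls. This is a suggested sketch, not code from the repo; it uses only the standard library.

```python
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pmc")

def with_retries(attempts: int = 3, delay: float = 0.1):
    """Retry a flaky call (e.g. a BigQuery or Bigtable request), logging each failure."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    log.warning("%s failed (attempt %d/%d): %s",
                                fn.__name__, attempt, attempts, exc)
                    if attempt == attempts:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator
```

Wrapping the Bigtable ingestion functions in something like `@with_retries()` would make transient network failures visible in the logs instead of silently killing a pipeline run.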
Assumptions: The project is a personal portfolio of data analysis and technical interview work using Google Cloud services. Some files appear incomplete or duplicated, suggesting ongoing development.
Need more context?
Want help adapting this playbook?
Send me the constraints and I'll annotate the relevant docs, share risks I see, and outline the first sprint so the work keeps moving.