
Docker Compose Setup for Ingesting Hacker News into Postgres

Set up a Docker Compose environment to ingest Hacker News data into Postgres and apply SQL transformations for data processing.

Shipped January 2026

This is a minimal, production-leaning starter: docker compose up ingests Hacker News into a raw (Bronze) Postgres table, a SQL upsert then populates a canonical tech staging (Silver) table, and a final transform builds a basic velocity (Gold) mart.

What's included

  • apps/tech: Python ingestion job (Hacker News) — one niche worker
  • libs/: shared persistence + HTTP client utilities
  • sql/: DDL + transforms (Bronze→Silver→Gold); see the DDL sketch after this list
  • runner/: simple SQL runner to apply transforms in order
  • docker-compose: Postgres + Adminer + Tech worker + SQL runner
  • .env.example: env vars for Postgres connection
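
For orientation, here is a sketch of what the Bronze (raw) table could look like. The schema, table, and column names below are assumptions for illustration, not the repo's actual DDL:

CREATE SCHEMA IF NOT EXISTS bronze;

-- Raw landing table: one row per fetched Hacker News payload.
-- The UNIQUE hash makes re-ingesting the same payload a no-op.
CREATE TABLE IF NOT EXISTS bronze.raw_hn_items (
    id           BIGSERIAL PRIMARY KEY,
    source       TEXT        NOT NULL DEFAULT 'hackernews',
    payload      JSONB       NOT NULL,          -- item as returned by the API
    payload_hash TEXT        NOT NULL UNIQUE,   -- e.g. sha256 of the payload
    ingested_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);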

Quick start

cp .env.example .env
docker compose up --build

Then open Adminer at http://localhost:8080 (system: PostgreSQL, server: db, user: postgres, pass: postgres, db: trends)
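
Once the worker has run, a quick sanity check from Adminer's SQL command box could be (table name taken from the assumed DDL sketch above):

SELECT source, count(*) AS items, max(ingested_at) AS last_ingested
FROM bronze.raw_hn_items
GROUP BY source;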

Apply transforms

The runner container automatically applies the SQL files in /sql, in order, on startup. To re-run the transforms manually:

docker compose run --rm runner python /app/runner.py
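
For reference, the Silver and Gold transforms the runner applies might look roughly like the sketch below. The file names, schemas, tables, and columns are assumptions (following the naming used under Next steps), not the repo's actual SQL:

-- e.g. sql/silver/stg_trend_items_tech.sql (file name assumed)
-- Canonicalize raw payloads and upsert by (source, url); assumes the
-- staging table exists with a UNIQUE (source, url) constraint.
-- DISTINCT ON keeps only the latest fetch per (source, url), so the
-- upsert never sees the same key twice in one statement.
INSERT INTO silver.stg_trend_items_tech (source, url, title, score, posted_at, fetched_at)
SELECT DISTINCT ON (r.source, r.payload->>'url')
    r.source,
    r.payload->>'url'                          AS url,
    r.payload->>'title'                        AS title,
    (r.payload->>'score')::INT                 AS score,
    to_timestamp((r.payload->>'time')::BIGINT) AS posted_at,
    r.ingested_at                              AS fetched_at
FROM bronze.raw_hn_items r
WHERE r.payload->>'url' IS NOT NULL   -- skip items without a URL (e.g. Ask HN)
ORDER BY r.source, r.payload->>'url', r.ingested_at DESC
ON CONFLICT (source, url) DO UPDATE
SET title      = EXCLUDED.title,
    score      = EXCLUDED.score,
    fetched_at = EXCLUDED.fetched_at;

-- e.g. sql/gold/mart_item_velocity.sql (file name assumed)
-- A crude velocity proxy: points gained per hour since posting.
CREATE OR REPLACE VIEW gold.mart_item_velocity AS
SELECT
    source, url, title, score,
    score / GREATEST(EXTRACT(EPOCH FROM (now() - posted_at)) / 3600.0, 1.0) AS score_per_hour
FROM silver.stg_trend_items_tech;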

Next steps

  • Add more sources (Reddit, GitHub Trending) to apps/tech/main.py.
  • Create apps/finance and sql/silver/stg_trend_items_finance.sql following the same pattern (see the Silver sketch above).
  • Decide between dbt and the custom runner (dbt Core is recommended once the models grow).

Note: Ingestion is idempotent: each raw payload is deduplicated by a unique payload hash, and canonical rows are upserted by id and by (source, url). Tune the polling cadence in apps/tech/config.yaml.
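
Concretely, the raw-layer write could look like the following sketch; the table comes from the assumed DDL above, and the %(...)s placeholders stand for values the Python job binds at runtime:

-- Re-ingesting an already-seen payload is a no-op thanks to the hash.
INSERT INTO bronze.raw_hn_items (source, payload, payload_hash)
VALUES ('hackernews', %(payload)s::jsonb, %(payload_hash)s)
ON CONFLICT (payload_hash) DO NOTHING;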

Need more context, or want help adapting this playbook?

Send me the constraints and I'll annotate the relevant docs, share risks I see, and outline the first sprint so the work keeps moving.