CCallahan308/README.md

Christian Callahan

Data/ML engineer focused on reproducible, leakage-safe machine learning and honest evaluation.

I build models that survive scrutiny: leakage-safe labeling, real baselines, cross-validation, probability calibration, and a clear line between synthetic and real data. Every metric is committed and reproducible from a clean clone. I work end-to-end — from SQL/warehouse modeling through tuned models to Streamlit/FastAPI interfaces.

I optimize for correct methodology: proper train/test discipline, baselines, calibration, and clearly-stated limitations.


Featured Projects

Pit Wall Intelligence — F1 race-strategy analytics: FastF1 data in a DuckDB + dbt warehouse; tyre-degradation and undercut-success models served via FastAPI and a 6-page Streamlit dashboard. Calibrated LightGBM undercut classifier (AUC 0.66 ± 0.05, 5-fold GroupKFold on 62/21 train/test race split — GREEN-flag stops only). Monte Carlo race simulator. Pit-cost calculator across 33 circuits with bootstrap CIs, SC/VSC regime separation.
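The point of grouped cross-validation in a setup like this is that stops from one race never land on both sides of a split. The sketch below is a dependency-free stand-in for scikit-learn's `GroupKFold`, not the project's code; the function name `group_k_fold` and the race identifiers are hypothetical.

```python
from collections import defaultdict


def group_k_fold(groups, n_splits=5):
    """Yield (train_idx, val_idx) pairs where no group spans both sides.

    Simplified stand-in for scikit-learn's GroupKFold: groups are assigned
    greedily, largest first, to whichever fold currently holds the fewest rows.
    """
    rows_by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        rows_by_group[group].append(idx)
    if len(rows_by_group) < n_splits:
        raise ValueError("need at least as many groups as folds")

    folds = [[] for _ in range(n_splits)]
    # Largest groups first keeps fold sizes roughly balanced.
    ordered = sorted(rows_by_group, key=lambda g: (-len(rows_by_group[g]), str(g)))
    for group in ordered:
        smallest = min(range(n_splits), key=lambda i: len(folds[i]))
        folds[smallest].extend(rows_by_group[group])

    for i in range(n_splits):
        val_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, val_idx


# Hypothetical race identifiers, one per pit-stop row.
race_ids = [
    "bahrain", "bahrain", "monza", "monza", "monza",
    "spa", "suzuka", "suzuka", "imola", "imola",
]
splits = list(group_k_fold(race_ids, n_splits=5))
```

Grouping by visitor instead of race gives the same guarantee for visitor-disjoint splits.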

SignalForge — Churn modeling on IBM Telco with statistical rigor: Optuna tuning, leakage-free CV, bootstrap 95% CIs, paired t-tests, calibration. The three models land within ~0.003 AUC with overlapping confidence intervals — model choice is a calibration/interpretability call, not an accuracy race.
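A minimal sketch of how two models can be compared on the same folds, assuming per-fold AUCs are already available. It computes a paired t statistic and a percentile bootstrap interval on the mean difference. The function names and the AUC values are illustrative assumptions, not SignalForge code or results.

```python
import math
import random
import statistics


def paired_t_statistic(scores_a, scores_b):
    """t statistic for the mean of the paired differences (a - b)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.fmean(diffs) / standard_error


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference (a - b)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    # Resample the fold-level differences with replacement and record each mean.
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=len(diffs)))
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(diffs), lower, upper


# Made-up per-fold AUCs for two models evaluated on identical folds.
auc_model_a = [0.846, 0.839, 0.851, 0.842, 0.848]
auc_model_b = [0.842, 0.842, 0.849, 0.847, 0.845]

t_stat = paired_t_statistic(auc_model_a, auc_model_b)
mean_diff, ci_low, ci_high = paired_bootstrap_ci(auc_model_a, auc_model_b)
```

When the interval straddles zero, as it does for these illustrative numbers, the data do not separate the models on AUC, which is the situation the description above reports.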

SaaS Churn Simulator — Leakage-safe churn + retention-ROI pipeline on RetailRocket (2.76M events). Time-windowed labeling, visitor-disjoint splits, Optuna-tuned LightGBM, isotonic calibration. 5-fold CV ROC-AUC 0.88 ± 0.06. Reports honestly that the ~99% base rate caps business lift. Live demo →
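Time-windowed labeling can be sketched as follows: features only see events before a cutoff, and the churn label only looks at the window after it. This is an illustration under assumptions, not the pipeline's implementation; `build_labels`, the 30-day window, and the sample events are hypothetical.

```python
from datetime import datetime, timedelta


def build_labels(events, cutoff, label_window_days=30):
    """Split events at `cutoff`: features see the past, labels see the future.

    events: iterable of (visitor_id, timestamp) pairs.
    Returns {visitor_id: {"n_events_before": int, "churned": bool}} for every
    visitor seen before the cutoff. A visitor counts as churned when they have
    no event in [cutoff, cutoff + label_window_days).
    """
    window_end = cutoff + timedelta(days=label_window_days)
    n_before = {}
    active_after = set()
    for visitor_id, ts in events:
        if ts < cutoff:
            n_before[visitor_id] = n_before.get(visitor_id, 0) + 1
        elif ts < window_end:
            active_after.add(visitor_id)
        # Events at or after window_end are ignored for both features and labels.
    return {
        visitor: {"n_events_before": count, "churned": visitor not in active_after}
        for visitor, count in n_before.items()
    }


# Hypothetical events: (visitor_id, timestamp).
events = [
    ("a", datetime(2024, 1, 5)), ("a", datetime(2024, 2, 10)),
    ("b", datetime(2024, 1, 20)),
    ("c", datetime(2024, 2, 3)),
    ("d", datetime(2024, 1, 15)), ("d", datetime(2024, 4, 1)),
]
labels = build_labels(events, cutoff=datetime(2024, 2, 1))
```

Because no feature is computed from events at or after the cutoff, information from the label window cannot leak into training.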

Ecommerce Retention & Growth — 30-day churn prediction and LTV segmentation on KKBox data; calibrated XGBoost (ROC-AUC ~0.79), ROI simulator. Ships a synthetic generator so it runs without the large download.
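Several of the projects above report calibrated probabilities. One common way to check calibration, sketched here as an assumption rather than as any project's actual code, is to bin predictions by probability and compare the mean prediction with the observed rate in each bin. The function names are hypothetical.

```python
def reliability_bins(y_true, y_prob, n_bins=5):
    """Group predictions into equal-width probability bins.

    Returns (mean_predicted, observed_rate, count) for each non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))

    table = []
    for members in bins:
        if members:
            count = len(members)
            mean_pred = sum(p for _, p in members) / count
            observed = sum(y for y, _ in members) / count
            table.append((mean_pred, observed, count))
    return table


def expected_calibration_error(y_true, y_prob, n_bins=5):
    """Count-weighted average gap between predicted and observed rates."""
    total = len(y_true)
    return sum(
        count / total * abs(mean_pred - observed)
        for mean_pred, observed, count in reliability_bins(y_true, y_prob, n_bins)
    )


# One churner in four: predicting 0.25 is calibrated, predicting 0.9 is not.
outcomes = [0, 0, 0, 1]
ece_calibrated = expected_calibration_error(outcomes, [0.25] * 4)
ece_overconfident = expected_calibration_error(outcomes, [0.9] * 4)
```

A model can rank well (high ROC-AUC) and still be poorly calibrated, which matters when predicted probabilities feed an ROI calculation.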

Ticket Intel — Support-ticket routing and summarization on Banking77 using TF-IDF + Naive Bayes by design: fast, interpretable, with a documented rationale for not using an LLM. Live demo →
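To show why TF-IDF + Naive Bayes stays fast and inspectable, here is a dependency-free sketch of the technique. It is not the Ticket Intel implementation: in practice scikit-learn's `TfidfVectorizer` and `MultinomialNB` would be the usual choice. The class name, whitespace tokenizer, training texts, and labels are assumptions, and the TF-IDF weights are left unnormalized.

```python
import math
from collections import Counter, defaultdict


class TfidfNaiveBayes:
    """Multinomial Naive Bayes over TF-IDF weights, with add-one smoothing."""

    def fit(self, texts, labels):
        docs = [text.lower().split() for text in texts]
        self.n_docs = len(docs)
        doc_freq = Counter(token for doc in docs for token in set(doc))
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
        self.idf = {
            token: math.log((1 + self.n_docs) / (1 + df)) + 1
            for token, df in doc_freq.items()
        }
        self.class_counts = Counter(labels)
        self.weights = defaultdict(Counter)  # label -> token -> summed TF-IDF
        for doc, label in zip(docs, labels):
            for token, weight in self._tfidf(doc).items():
                self.weights[label][token] += weight
        return self

    def _tfidf(self, tokens):
        # Tokens never seen in training are dropped.
        counts = Counter(t for t in tokens if t in self.idf)
        return {t: tf * self.idf[t] for t, tf in counts.items()}

    def predict(self, text):
        features = self._tfidf(text.lower().split())
        vocab_size = len(self.idf)
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            total = sum(self.weights[label].values())
            score = math.log(n_label / self.n_docs)  # log prior
            for token, weight in features.items():
                likelihood = (self.weights[label][token] + 1) / (total + vocab_size)
                score += weight * math.log(likelihood)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical tickets and intent labels.
train_texts = [
    "my card was lost yesterday",
    "i lost my card abroad",
    "card stolen please block it",
    "transfer has not arrived yet",
    "my transfer is still pending",
    "where is my bank transfer",
]
train_labels = ["lost_or_stolen_card"] * 3 + ["pending_transfer"] * 3

model = TfidfNaiveBayes().fit(train_texts, train_labels)
card_intent = model.predict("i think my card was stolen")
transfer_intent = model.predict("transfer still not arrived")
```

Every prediction is a sum of per-token log-likelihood terms, so the tokens that drove a routing decision can be read directly from `model.weights`.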


Also

  • MeasureMap — Self-hosted KPI governance registry: define, approve, version, and audit metrics with role-based access, LDAP/AD integration, CSV import with validation, full audit trail. Built to run air-gapped in a hospital network. Next.js · TypeScript · PostgreSQL · Prisma · Docker
  • Healthcare SQL Analytics — Production EHR analytics SQL patterns from 6 years of clinical and operational BI on Meditech Paragon: wRVU physician productivity, SDOH screening compliance, 340B drug utilization extract, sepsis missed-identification rate. Synthetic identifiers throughout.
  • AutoModeler — Type a ticker, get a fully-linked 3-statement Excel model. FMP API · FastAPI · Python.

Stack

Python SQL TypeScript scikit-learn XGBoost LightGBM Optuna pandas DuckDB dbt FastAPI Streamlit Next.js PostgreSQL Prisma pytest GitHub Actions Docker Tableau


Background

  • BI Analyst — 4 years of clinical and operational analytics at a community hospital system on Meditech Paragon EHR: physician productivity reporting, clinical quality (SDOH, sepsis, readmissions), 340B compliance, Tableau dashboards, EMR data migration
  • MBA + MS Data Science at Eastern University (expected 2027)
  • Previous: Manufacturing, law enforcement — learned to find signal in noisy data and explain it to people who need a decision, not a model card

Portfolio · LinkedIn · Email

Pinned

  1. pit-wall-intelligence Public

    Race strategy & tyre degradation analytics for Formula 1 — built on FastF1, DuckDB, dbt, scikit-learn, LightGBM

    Python

  2. MeasureMap Public

    Self-hosted KPI governance registry for healthcare and regulated orgs — define, approve, version, and audit every metric in one place. Next.js 15 · PostgreSQL · LDAP/AD. Apache 2.0.

    TypeScript 1

  3. signalforge Public

    Production churn prediction with statistical rigor — Optuna-tuned models, bootstrap 95% CIs, paired t-tests, calibration analysis on IBM Telco data. Live Streamlit dashboard.

    Python

  4. ecommerce-retention-growth Public

    Subscription churn prediction + retention ROI on the KKBox dataset (400M+ event logs, 484K user holdout). XGBoost, SHAP, K-Means LTV segmentation.

    Python 2

  5. saas-churn-simulator Public

    Leakage-safe, reproducible churn-prediction + retention-ROI pipeline on RetailRocket (Optuna-tuned, calibrated)

    Python 2

  6. ticket-intel Public

    NLP support ticket routing and summarization — TF-IDF + Naive Bayes for speed (12ms p99, 90% F1), with an extension path for transformer-based classification. FastAPI + Streamlit.

    Python 2