Project 01 · Machine Learning · Credit Risk
Credit Default Risk Model: rank the risk, then set the policy.
An end-to-end machine-learning credit model that predicts default probability and turns it into a lending decision, exposing the trade-off a policy threshold sets.
Premise: a lender approves on a policy, not a probability. The hard part of a credit model isn't the AUC; it's choosing the threshold and knowing what that choice costs.
Trains a gradient-boosting model and compares it with a logistic-regression baseline on held-out data (about 0.80 AUC). A threshold slider then turns predicted risk into an approve-or-decline decision and reports the approval rate, the default rate among approved applicants, and the share of bad applicants caught. Calibration and permutation importance show whether the model can be trusted. Built in Python and scikit-learn on the Statlog German Credit data via OpenML. The point: the model ranks risk; the threshold is the decision.
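To make the threshold trade-off concrete, here is a minimal sketch of how the three policy numbers could be computed from predicted default probabilities. The function name `policy_summary`, the rule "approve when predicted risk is below the threshold", and the toy inputs are assumptions for illustration, not the project's actual code.

```python
import numpy as np

def policy_summary(y_default, p_default, threshold):
    """Summarise a lending policy at one threshold (hypothetical helper).

    y_default: 1 if the applicant actually defaulted, else 0.
    p_default: the model's predicted default probability.
    Assumed rule: approve when p_default is strictly below the threshold.
    """
    y = np.asarray(y_default, dtype=bool)
    p = np.asarray(p_default, dtype=float)
    approved = p < threshold
    n_approved = int(approved.sum())
    n_bad = int(y.sum())
    return {
        # Share of all applicants who get approved.
        "approval_rate": float(approved.mean()),
        # Share of approved applicants who went on to default.
        "default_rate_approved": float(y[approved].mean()) if n_approved else float("nan"),
        # Share of all defaulters that the policy declined.
        "bad_caught": float((y & ~approved).sum() / n_bad) if n_bad else float("nan"),
    }

# Toy data, made up for illustration only.
outcomes = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.9]
summary = policy_summary(outcomes, scores, threshold=0.5)
```

Sweeping `threshold` over a grid and recording these three numbers at each step gives the curve a slider would walk along: a lower threshold catches more bad applicants but approves fewer people.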
Relevant to
Capital One
Voleon
Banks & fintech
Gradient boosting · Calibration · Threshold policy
Python · scikit-learn
Project 02 · Data Quality · Anomaly Detection
Market Data Health Monitor: catch the bad feed before the model does.
Ingests a markets feed, runs data-correctness checks, and flags abnormal behavior with rolling z-score control limits, then reports what a researcher should review.
Premise: trading and research run on clean data, and the most expensive failures start as a quiet bad feed: a gap, a stale value, or a price that should never have printed.
Runs calendar, duplicate, validity, staleness, and freshness checks, then rolling z-score anomaly detection on returns and volume, and produces a per-feed verdict on whether to hold downstream models. Built in Python and pandas; runs on a synthetic feed with issues injected on purpose or on a CSV upload of real data.
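A rolling z-score control limit can be sketched in a few lines of pandas. This is an illustrative version under stated assumptions: the function name `rolling_zscore_flags`, the 20-observation window, and the 3-sigma limit are placeholders, and the baseline is shifted by one step so a bad print does not inflate its own mean and standard deviation.

```python
import numpy as np
import pandas as pd

def rolling_zscore_flags(series, window=20, limit=3.0):
    """Flag points far from a trailing baseline (hypothetical helper)."""
    s = pd.Series(series, dtype=float)
    # Baseline uses only the previous `window` observations.
    baseline = s.shift(1).rolling(window, min_periods=window)
    mean = baseline.mean()
    std = baseline.std()
    # A flat baseline has zero spread; treat its z-score as undefined.
    z = (s - mean) / std.replace(0.0, np.nan)
    # NaN z-scores (warm-up period, flat baseline) compare as not flagged.
    return pd.DataFrame({"value": s, "z": z, "flag": z.abs() > limit})

# Synthetic returns with one injected bad print, for illustration only.
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, 200)
returns[150] = 0.20
report = rolling_zscore_flags(returns)
suspects = report[report["flag"]]
```

The same function can be applied to volume. Note that a flagged outlier still enters the baseline for the following `window` observations, which widens the limits for a while; a production monitor might exclude flagged points from the baseline.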
Relevant to
Voleon
Quant funds
Data platforms
Time series · Anomaly detection · Data health
Python · pandas
Project 03 · Quantitative Finance · Factor Research
Has the value premium decayed? A retrospective on HML.
A research note on the Fama-French value factor: is the premium still significant, is it stable, and what does a backward-looking estimate honestly imply for the future?
Premise: treating an in-sample factor mean as a forecast is a classic error. This note is as much about that gap as about value.
Estimates the full-sample premium with Newey-West standard errors, tests stability across subperiods and a 10-year rolling window, and measures the value-winter drawdown, using monthly Fama-French factors from the Ken French Data Library. It closes on an honest forward read rather than a point forecast. Built in Python and statsmodels.
Relevant to
Voleon
Asset managers
Factor research · Newey-West · Regime analysis
Python · statsmodels
Project 04 · Applied AI · Financial NLP
Cadence: structured intelligence from earnings calls.
A local-first AI tool that turns an unstructured earnings-call transcript into a structured KPI, sentiment, and risk dashboard in about 60 seconds, fully on-device.
Premise: earnings calls are dense and slow to read at portfolio scale, and cloud LLMs raise privacy and cost questions for market-sensitive text.
Runs a local LLM (Ollama, Qwen2.5) with JSON-schema-constrained decoding to extract themes, risks, and evidence, rendered in a Streamlit dashboard. It is an end-to-end Python data-and-AI pipeline where the transcript never leaves the machine, with zero marginal cost and full control of the prompt and schema.
Relevant to
Voleon
Bloomberg
AlphaSense
Local LLM · NLP · Data pipeline
Python · Streamlit · Ollama