QUANTITATIVE & DATA SCIENCE PORTFOLIO

Data science for markets and risk.

Selected Work
4 projects · scroll

Project 01 · Machine Learning · Credit Risk

Credit Default Risk Model: rank the risk, then set the policy.

An end-to-end machine-learning credit model that predicts default probability and turns it into a lending decision, exposing the trade-off a policy threshold sets.

Premise: a lender approves on a policy, not a probability. The hard part of a credit model isn't the AUC, it's what threshold to set and what it costs.

Trains a gradient-boosting model against a logistic baseline on held-out data (about 0.80 AUC), then a threshold slider turns predicted risk into approve or decline, with the approval rate, the default rate among approved, and the share of bad applicants caught. Calibration and permutation importance show whether to trust it. Built in Python and scikit-learn on the Statlog German Credit data via OpenML. The point: the model ranks risk, the threshold is the decision.

Relevant to Capital One Voleon Banks & fintech Gradient boosting · Calibration · Threshold policy Python · scikit-learn

Project 02 · Data Quality · Anomaly Detection

Market Data Health Monitor: catch the bad feed before the model does.

Ingests a markets feed, runs data-correctness checks, and flags abnormal behavior with rolling z-score control limits, then reports what a researcher should review.

Premise: trading and research run on clean data, and the most expensive failures start as a quiet bad feed, a gap, a stale value, or a price that should never have printed.

Runs calendar, duplicate, validity, staleness, and freshness checks, then rolling z-score anomaly detection on returns and volume, and produces a per-feed verdict on whether to hold downstream models. Built in Python and pandas, on a synthetic feed with issues injected on purpose, or a CSV upload of real data.

Relevant to Voleon Quant funds Data platforms Time series · Anomaly detection · Data health Python · pandas

Project 03 · Quantitative Finance · Factor Research

Has the value premium decayed? A retrospective on HML.

A research note on the Fama-French value factor: is the premium still significant, is it stable, and what does a backward-looking estimate honestly imply for the future?

Premise: treating an in-sample factor mean as a forecast is a classic error. This note is as much about that gap as about value.

Estimates the full-sample premium with Newey-West standard errors, tests stability across subperiods and a 10-year rolling window, and measures the value-winter drawdown, using monthly Fama-French factors from the Ken French Data Library. It closes on an honest forward read rather than a point forecast. Built in Python and statsmodels.

Relevant to Voleon Asset managers Factor research · Newey-West · Regime analysis Python · statsmodels

Project 04 · Applied AI · Financial NLP

Cadence: structured intelligence from earnings calls.

A local-first AI tool that turns an unstructured earnings-call transcript into a structured KPI, sentiment, and risk dashboard in about 60 seconds, fully on-device.

Premise: earnings calls are dense and slow to read at portfolio scale, and cloud LLMs raise privacy and cost questions for market-sensitive text.

Runs a local LLM (Ollama, Qwen2.5) with JSON-schema-constrained decoding to extract themes, risks, and evidence, rendered in a Streamlit dashboard. It is an end-to-end Python data-and-AI pipeline where the transcript never leaves the machine, zero marginal cost and full control of prompt and schema.

Relevant to Voleon Bloomberg AlphaSense Local LLM · NLP · Data pipeline Python · Streamlit · Ollama
Contact

Open to quantitative, data science, and ML conversations.