Resume

Statistician | research software engineer | data scientist

Modified

April 6, 2024

Statistician with 6 years of experience analyzing network data, performing causal inference and building tools for data scientists. Track record of strategic thinking, clear communication and successful remote work.

Work Experience

University of Wisconsin-Madison

PhD Student, Department of Statistics

August 2018 - Present

Developed statistical methods based on principal components analysis to cluster networks with missing data, to perform regression on networks, and to construct, interpret, and regularize network embeddings.
Developed causal inference methods to estimate mediation and spillover effects in social networks, and to determine when product changes have harmful side effects on behaviors that are difficult to measure. Used causal machine learning to improve precision of estimates while reducing computational requirements by a factor of 5000.
Implemented research methods in user-friendly software. Released nine open source R packages to CRAN (notable: fastRG, vsp, distributions3, gdim, aPPR, fastadi).
Resolved computational bottlenecks in matrix completion algorithms by designing and implementing sparse matrix methods in R and C++. Scaled methods by three orders of magnitude to handle networks with millions of nodes.
Designed an approach to find localized clusters of Twitter users via Personalized PageRank. Managed unreliable Twitter API behavior by caching data in a Neo4j database running in Docker.
Collaborated with rOpenSci to design software development standards for statistical software. Reviewed scientific software for rOpenSci, the R Journal, and the Journal of Open Source Software.

Facebook

Research Intern, Core Data Science

Summer 2020 & Summer 2021

Prototyped a pipeline to automatically suggest relationships between hashtags, for a team using manual labeling. Prototype embedded a hashtag co-occurrence network and was implemented with Python, PyTorch and SQL.
Conducted experiments on hyperbolic embeddings for knowledge graphs and determined that hyperbolic methods were not viable. Advised against additional R&D investment, potentially saving $200k+ in compute costs.
Designed a metric, based on calibration of machine learning models, to help product teams understand reliability of prevalence estimates. Metric reported daily on multiple dashboards. Implemented with scikit-learn, NumPy, and pandas.

RStudio

Intern, tidymodels team

Summer 2018

Refactored thousands of lines of R code and developed a new test suite for the broom package (600k+ downloads/month, part of the tidyverse), improving behavioral consistency and reducing maintenance burden.
Shipped a major new release of the package (broom 0.5.0). Resolved 80+ open issues and coordinated 40+ pull requests from open source contributors.

Education

University of Wisconsin-Madison

Ph.D. Statistics

Expected spring 2024

Rice University

B.A. Statistics

2018

Skills

Network analysis, embeddings, clustering, causal machine learning, interference, mediation
Data analysis, visualization, modeling, regression, generalized linear models, hypothesis testing
Proficient in R, Python, tidyverse, Bash/Unix, git; familiar with SQL, C++, Docker, AWS, Julia, Stan