code
n.b. For the last several years, I have written code purely to further my academic research. This code is not intended to be broadly usable. I only commit to maintaining and explaining my research code to the extent that it assists others in their own academic research. I have also written code designed for broad consumption; see #rstats below for some details about my open source work.
My research code, as well as miscellaneous personal projects in various stages of completion, lives on my GitHub. I primarily use R, and most of my work as a developer is on methods packages. I am also a proficient Python user and have passing exposure to SQL, Julia, and C++.
research software
vsp performs semi-parametric estimation of latent factors in random dot product graphs by computing varimax rotations of the spectral embeddings of graphs. The resulting factors are sparse and interpretable. The theoretical work on this was done by Rohe and Zeng (2022+); I then ended up using varimax rotation extensively in my own data analysis and wrapped some of the infrastructure I developed into this package. I am committed to maintaining this package and will respond quickly to feature requests or questions about how you might use it in your own research.
fastRG samples large, sparse random dot product graphs very efficiently and is especially useful when running simulation studies for spectral network estimators. I am committed to maintaining this package and will respond quickly to feature requests or questions about how you might use it in your own research. The fastRG sampling algorithm is described in Rohe et al. (2018).
fastadi is a proof-of-concept implementation of AdaptiveImpute, a self-tuning matrix completion method with adaptive thresholding that is closely related to softImpute (Cho, Kim, and Rohe 2019, 2018). I extended AdaptiveImpute to the computationally challenging case where the entire upper triangle is observed, as part of my work with Karl Rohe on citation networks. This is research code rather than code intended for broad consumption. I make no commitments to maintaining or improving this code unless something about it is blocking an ongoing research project.
aPPR approximates Personalized PageRanks in large graphs, including graphs that can only be queried via an API. aPPR additionally performs degree correction and regularization, allowing users to recover blocks from stochastic blockmodels (see Chen, Zhang, and Rohe 2020). aPPR was originally designed to be used together with the neocache backend to sample large portions of the Twitter following graph with high Personalized PageRanks around seed nodes (joint work with Nathan Kolbow). However, I am no longer maintaining neocache and cannot commit any development time to keeping up with the Twitter API shenanigans.
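To make the idea of approximating Personalized PageRank in a graph that can only be queried through an API more concrete, here is a minimal Python sketch of the classic local "push" approximation. Probability mass is pushed outward from a seed node, and a node's neighbors are requested only once that node has received some mass. This is not aPPR's implementation or API. The `neighbors` callback stands in for a hypothetical API request, and the degree correction p_v / (d_v + tau) in `degree_corrected` is an assumed form of a regularized correction, not necessarily the one aPPR uses.

```python
from collections import deque


def approx_ppr(neighbors, seed, alpha=0.15, eps=1e-6):
    """Approximate personalized PageRank around `seed` via local push updates.

    `neighbors(u)` returns the neighbors of node `u` in an undirected graph.
    It stands in for a (hypothetical) API request, so each node is queried at
    most once, and only after it has received residual mass. `alpha` is the
    restart probability; smaller `eps` gives a more accurate approximation.
    """
    cache = {}

    def nbrs(u):
        if u not in cache:
            cache[u] = list(neighbors(u))  # one "API request" per node
        return cache[u]

    p = {}           # PPR estimates, built up one push at a time
    r = {seed: 1.0}  # residual mass that has not been pushed yet
    queue, queued = deque([seed]), {seed}
    while queue:
        u = queue.popleft()
        queued.discard(u)
        deg = len(nbrs(u))
        # Skip isolated nodes and nodes whose residual is too small to push.
        if deg == 0 or r[u] < eps * deg:
            continue
        ru, r[u] = r[u], 0.0
        p[u] = p.get(u, 0.0) + alpha * ru  # keep a fraction alpha of the mass at u
        share = (1 - alpha) * ru / deg     # spread the rest evenly over u's neighbors
        for v in nbrs(u):
            r[v] = r.get(v, 0.0) + share
            if v not in queued:
                queue.append(v)
                queued.add(v)
    degrees = {u: len(ns) for u, ns in cache.items()}
    return p, degrees


def degree_corrected(p, degrees, tau=0.0):
    # Hypothetical regularized degree correction: p_v / (d_v + tau).
    return {v: pv / (degrees[v] + tau) for v, pv in p.items()}


# Usage: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
p, degrees = approx_ppr(lambda u: graph[u], seed=0)
scores = degree_corrected(p, degrees)
```

In this toy graph, every node in the seed's triangle scores higher than every node in the other triangle. The well-connected bridge node 2 outranks node 1 on raw PageRank but not after degree correction, which is the kind of degree effect the correction is meant to remove. Querying neighbors lazily is what makes push-style methods attractive behind an API: the work depends on how many nodes accumulate non-negligible residual mass, not on the size of the whole graph.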
design of statistical software
I am interested in the design of statistical software and have contributed to ROpenSci’s statistical software reviewing guidelines, as well as early versions of the tidymodels implementation principles. I have some long form explorations of modeling software design on my blog:
I review for the Journal of Open Source Software and the R Journal.
#rstats
I have been involved in a number of open source projects in the tidyverse and tidymodels orbits. I previously maintained the broom package and am responsible for the 0.5.0 release and a portion of the 0.7.0 release. For these contributions I was generously given authorship on the tidyverse paper. I intermittently participate in the Stan and ROpenSci communities.
I also wrote the distributions3 package, which provides an S3 interface to distribution functions, with an emphasis on good documentation and beginner-friendly design. The vignettes in particular are designed to walk students in intro stat courses through a litany of classic hypothesis tests. I do not actively maintain distributions3, but there is a small community of invested contributors.