Short thoughts, sometimes explaining larger projects.


A Twitter bot that has read it all


Notes from PyMC3 journal club


Slides from a talk at Phillips Academy Andover on 12 January 2018.



Hipster Media

A Twitter bot inspired by the wonderful @NYT_first_said, this project uses the Media Cloud project to find the earliest mention of a word in major English-language newspapers, then uses that to issue a smugly superior tweet.


imcmc (im-sea-em-sea) is a small library for turning 2D images into probability distributions and then sampling from them to create images and GIFs.
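To make the idea concrete, here is a minimal sketch of treating an image as a probability distribution over pixel locations. It is not imcmc's API: the function name and the tiny example image are hypothetical, and it draws independent samples directly rather than running a Markov chain.

```python
import random


def sample_pixels(image, n_samples, seed=0):
    """Treat a 2D grid of non-negative intensities as an unnormalized
    probability mass function and draw (row, col) samples from it."""
    coords = [(r, c) for r, row in enumerate(image) for c, _ in enumerate(row)]
    weights = [value for row in image for value in row]
    if sum(weights) <= 0:
        raise ValueError("image must contain at least one positive intensity")
    rng = random.Random(seed)  # seeded so the draw is reproducible
    # random.choices normalizes the weights internally
    return rng.choices(coords, weights=weights, k=n_samples)


# Hypothetical 3x3 "image": only the centre column is bright
image = [
    [0, 1, 0],
    [0, 3, 0],
    [0, 1, 0],
]
samples = sample_pixels(image, 1000)
```

Brighter pixels are drawn more often, and pixels with zero intensity are never drawn, so a scatter plot of the samples traces out the picture.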

Date Guesser

A library to extract a publication date from a web page, along with a measure of the accuracy. Built with the support and help of the Center for Civic Media at the MIT Media Lab.

Feed Seeker

A library for finding Atom, RSS, RDF, and XML feeds from web pages. Produced at the Media Cloud project. An incremental improvement over feedfinder2, which was itself based on feedfinder, written by Mark Pilgrim, and maintained by Aaron Swartz until his untimely death.
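One common way to find feeds is to look for `<link rel="alternate">` tags that a page advertises in its HTML. The sketch below shows only that step, using the standard library. The class and function names are hypothetical, and this is not Feed Seeker's actual API or its full search strategy.

```python
from html.parser import HTMLParser

# MIME types commonly used to advertise feeds
FEED_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "application/rdf+xml",
}


class FeedLinkParser(HTMLParser):
    """Collect href values from <link rel="alternate"> tags with a feed type."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        if "alternate" in rel and mime in FEED_TYPES and attr.get("href"):
            self.feeds.append(attr["href"])


def find_feeds(html):
    """Return feed URLs advertised in the given HTML, in document order."""
    parser = FeedLinkParser()
    parser.feed(html)
    return parser.feeds
```

The returned URLs may be relative, so a caller would typically resolve them against the page URL with `urllib.parse.urljoin`.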


A web app for generating samples from a sketched probability distribution function.


A command-line utility to run, profile, and save Jupyter notebooks. Available on GitHub and PyPI.


A command-line utility to create kernels in Jupyter from conda and virtual environments. Available on GitHub and PyPI.


I am a contributor to PyMC3, a “Python package for Bayesian statistical modeling and Probabilistic Machine Learning which focuses on advanced Markov chain Monte Carlo and variational fitting algorithms.”


Both a precious pup and a task runner for Python. Available on GitHub and PyPI.

Compare Cross Country Courses

An offshoot of another project, this allows you to compare times between most collegiate cross country courses.

Minimal Machine Learning Visualization Example II

A demonstration of using Flask, React, and d3js to visualize machine learning models. This is a port of a previous project from Angular to React.

Cross Country Predictions

Using hundreds of thousands of historical cross country running results to make predictions about future meets. The page is updated more than once a week during the season.

In a Word

A small project that checks Futility Closet once a day for new words and keeps a searchable, downloadable list of the curious words it finds there. Click the random button a few times, or make an Anki study deck.

Predicting March Madness

For the second year in a row, I took part in Kaggle’s contest to predict March Madness winners. The code for the actual model is not very expository (get in touch if you are interested), but I also built a friendlier page to query predictions interactively at the link.

Bayesian Updating

An interactive page for demonstrating and experimenting with the beta distribution. Built with AngularJS and d3js.
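As a minimal sketch of the kind of update such a page demonstrates: with a Beta prior on a success probability and binomial data, the posterior is again a Beta distribution whose parameters are the prior parameters plus the observed counts. The function names and the example counts are hypothetical and are not taken from the page itself.

```python
def update_beta(alpha, beta, successes, failures):
    """Conjugate update: Beta(alpha, beta) prior plus binomial data
    gives a Beta(alpha + successes, beta + failures) posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Start from a uniform prior, Beta(1, 1), then observe 7 successes and 3 failures
alpha, beta = update_beta(1, 1, successes=7, failures=3)
posterior_mean = beta_mean(alpha, beta)
```

Observing more data makes the counts dominate the prior, which is why the posterior narrows as observations accumulate.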

TidyTex

A command-line utility for automatically compiling $\LaTeX$ and eliminating auxiliary files. Try it out with pip install tidytex.

A Bayesian Approach to L1 and L2 Regularization

An essay on building linear regression models. It is converted from notes for a talk I gave at Rice University in September 2014. Contains lots of pictures, lots of interactivity, and a modest amount of math. As a bonus, everything is typeset with KaTeX.

Live $\LaTeX$ Previews

An AngularJS directive providing live rendering of MathJax. There is a more in-depth info page with examples at the link.

Parametric Graphs

A small script built on top of d3js allowing you to define simple animated parametric equations.

Linear Regression Demo

A demonstration of linear regression, overfitting, normalization, and regularization. Allows you to choose data from a distribution, and interactively fit polynomials to the data using least squares, ridge, or Lasso regression.
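For a feel of what happens behind such a demo, here is a small pure-Python sketch of fitting a polynomial by least squares, with an optional ridge penalty, by solving the normal equations. The function names are hypothetical and this is not the demo's code. For simplicity the penalty is applied to every coefficient, including the intercept, and Lasso is omitted because it has no closed-form solution.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination
    with partial pivoting. Assumes a is square and nonsingular."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_polynomial(xs, ys, degree, ridge=0.0):
    """Return coefficients (lowest power first) that minimize
    squared error plus ridge * (sum of squared coefficients)."""
    k = degree + 1
    features = [[x ** p for p in range(k)] for x in xs]
    # Normal equations: (F^T F + ridge * I) w = F^T y
    gram = [[sum(f[i] * f[j] for f in features) for j in range(k)] for i in range(k)]
    for i in range(k):
        gram[i][i] += ridge
    rhs = [sum(f[i] * y for f, y in zip(features, ys)) for i in range(k)]
    return solve(gram, rhs)


# Points on the line y = 1 + 2x
xs, ys = [0, 1, 2, 3], [1, 3, 5, 7]
plain = fit_polynomial(xs, ys, degree=1)
shrunk = fit_polynomial(xs, ys, degree=1, ridge=10.0)
```

With `ridge=0.0` the fit recovers the line; increasing `ridge` pulls the coefficients toward zero, which is the regularization effect the demo lets you explore interactively.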

Minimal Machine Learning Visualization Example

A demonstration of using Flask, AngularJS, and d3js to visualize machine learning models. This is meant to be a minimal example of how to put together such a demo, showing how to make the tech stack play nice.

Recent Talks


Notes and reflections from getting involved in open source, up to becoming a maintainer of the PyMC3 software project.

A presentation to new graduate students at the MIT Media Lab, covering what machine learning is, how it can go right, how it can go wrong, and how to use it in their own projects.

PyMC3 is a Python library that allows you to specify a statistical model in a natural way, and then reason about it in the presence of data. This talk will compare the approaches from PyMC3 and the popular scikit-learn library in fitting regression models, and in applying regularization.

An introduction to the algorithms behind Bayesian inference.

Introductory talk covering building a linear regression estimator from scratch, and decision boundaries for various classifiers. Prerequisites are a vague memory of linear algebra and calculus.


  • >>> "{user}@{provider}.com".format(user="colcarroll", provider="gmail")