Blog

Short thoughts, sometimes explaining larger projects.

Highlights of PyMC3 v3.8

ODEs, approximate Bayesian inference, and ArviZ: A tour of the new features.

Sun, Aug 18, 2019 5 min read

Very parallel MCMC sampling

Four chains isn’t cool. You know what’s cool? A million chains.

Tue, Jul 23, 2019 1 min read

A tour of probabilistic programming APIs

I added a working tour of 9 probabilistic programming languages in Python. Code to get it all to run is here (though you are on your own installing all the correct frameworks), and issues, corrections, and suggestions are all welcome!

Projects

minimc

A test library providing reference implementations of MCMC algorithms and ideas. Much of the library is based on Michael Betancourt’s wonderful A Conceptual Introduction to Hamiltonian Monte Carlo. The current highlight is a ~15-line Hamiltonian Monte Carlo implementation (which relies on an 8-line integrator), commented and documented with the aim of being instructive to read.
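If you want the flavor without opening the repository, here is a minimal sketch of the same two pieces in plain Python: a leapfrog integrator and a Metropolis correction. It is not minimc's code. The names `leapfrog` and `hmc`, the unit mass matrix, and the default step size and number of steps are all assumptions for illustration.

```python
import math
import random


def leapfrog(q, p, grad_potential, step_size, n_steps):
    """Integrate Hamilton's equations for H(q, p) = U(q) + p.p / 2.

    grad_potential(q) returns the gradient of U(q) = -log density(q).
    """
    q = list(q)
    # half step for momentum
    p = [p_i - 0.5 * step_size * g for p_i, g in zip(p, grad_potential(q))]
    for step in range(n_steps):
        # full step for position
        q = [q_i + step_size * p_i for q_i, p_i in zip(q, p)]
        # full step for momentum, except a half step at the very end
        scale = 0.5 if step == n_steps - 1 else 1.0
        p = [p_i - scale * step_size * g for p_i, g in zip(p, grad_potential(q))]
    return q, p


def hmc(potential, grad_potential, initial, n_samples,
        step_size=0.1, n_steps=20, seed=0):
    """Hamiltonian Monte Carlo with a unit mass matrix (illustrative sketch)."""
    rng = random.Random(seed)
    q = list(initial)
    samples = []
    for _ in range(n_samples):
        p0 = [rng.gauss(0.0, 1.0) for _ in q]  # resample momentum each iteration
        q_new, p_new = leapfrog(q, p0, grad_potential, step_size, n_steps)
        start_energy = potential(q) + 0.5 * sum(x * x for x in p0)
        end_energy = potential(q_new) + 0.5 * sum(x * x for x in p_new)
        # Metropolis correction for the integrator's energy error
        if rng.random() < math.exp(min(0.0, start_energy - end_energy)):
            q = q_new
        samples.append(list(q))
    return samples
```

For a 2D standard normal, pass `potential = lambda q: 0.5 * sum(x * x for x in q)` and `grad_potential = lambda q: list(q)`; then `hmc(potential, grad_potential, [3.0, -3.0], 2000)` returns 2000 draws. The leapfrog integrator is used because it is reversible and volume preserving, which is what keeps the simple accept/reject step valid.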

Ridge Map

A library for making ridge plots of… ridges. Choose a location, get an elevation map, and tinker with it to make something beautiful. Heavily inspired by Zach Cole’s beautiful art, Jake VanderPlas’ examples, and Joy Division’s 1979 album “Unknown Pleasures”. Uses matplotlib, SRTM.py, numpy, and scikit-image (for lake detection).

jshmc

A prototype for interactively visualizing Hamiltonian Monte Carlo sampling in JavaScript.

Strava Calendar

Plot the paths of all your runs from a year, using Strava data.

Simulation Based Calibration

A PyMC3 implementation of the algorithms from Validating Bayesian Inference Algorithms with Simulation-Based Calibration (Talts, Betancourt, Simpson, Vehtari, Gelman).
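The core loop of simulation-based calibration is short: draw parameters from the prior, simulate data, draw from the posterior, and record the rank of the prior draw among the posterior draws. If inference is correct, those ranks are uniform on 0 to L. The sketch below is not the PyMC3 implementation; it assumes a conjugate normal model so that exact posterior draws can stand in for the sampler under test (in practice, that line is where an MCMC run goes).

```python
import math
import random


def sbc_ranks(n_sims=1000, n_posterior=9, n_obs=5, seed=0):
    """Rank statistics for theta ~ N(0, 1), y_i ~ N(theta, 1) (illustrative model)."""
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_sims):
        theta = rng.gauss(0.0, 1.0)                         # 1. draw from the prior
        y = [rng.gauss(theta, 1.0) for _ in range(n_obs)]   # 2. simulate data
        # 3. exact conjugate posterior: precision 1 + n, mean sum(y) / (1 + n)
        post_var = 1.0 / (1.0 + n_obs)
        post_mean = post_var * sum(y)
        draws = [rng.gauss(post_mean, math.sqrt(post_var)) for _ in range(n_posterior)]
        # 4. rank of the prior draw among the posterior draws, in 0..n_posterior
        ranks.append(sum(d < theta for d in draws))
    return ranks
```

A histogram of `sbc_ranks()` with one bin per rank should look flat; a U shape or a slope is the diagnostic signal the paper describes for over-confident or biased inference.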

ArviZ

I helped create ArviZ, a Python package for exploratory analysis of Bayesian models that is compatible with PyStan, PyMC3, emcee, Pyro, and TensorFlow Probability. It includes functions for posterior analysis, model checking, model comparison, and diagnostics. See the paper, the docs, and the code on GitHub.

Hipster Media

A Twitter bot inspired by the wonderful @NYT_first_said, this project uses the Media Cloud project to find the earliest mention of a word in major English-language newspapers. It uses this to issue a smugly superior tweet.

imcmc

imcmc (im-sea-em-sea) is a small library for turning 2d images into probability distributions and then sampling from them to create images and gifs.
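The image-as-distribution idea fits in a few lines. imcmc samples its densities with MCMC; the sketch below swaps in direct weighted sampling of pixel coordinates, which is enough to show the idea. Treating darker pixels as higher density and the plain nested-list grayscale image are assumptions here, not imcmc's API.

```python
import random


def image_to_weights(pixels):
    """Map a grayscale image (rows of 0-255 values, 0 = black) to unnormalized
    probabilities keyed by (row, col); darker pixels get more mass."""
    return {(r, c): 255 - v for r, row in enumerate(pixels) for c, v in enumerate(row)}


def sample_pixels(pixels, n, seed=0):
    """Draw n pixel coordinates with probability proportional to darkness."""
    rng = random.Random(seed)
    weights = image_to_weights(pixels)
    coords = list(weights)
    return rng.choices(coords, weights=[weights[k] for k in coords], k=n)
```

Scattering the sampled coordinates (for example, plotting each one as a point) gradually redraws the image, which is the effect the gifs rely on.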

Date Guesser

A library to extract a publication date from a web page, along with a measure of its accuracy. Built with the support and help of the Center for Civic Media at the MIT Media Lab.
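One common signal for a publication date is a date embedded in the URL path. The sketch below is a single hypothetical heuristic, not Date Guesser's API, and it returns a coarse accuracy label alongside the date; the labels `'day'` and `'month'` are made up for illustration.

```python
import re
from datetime import date

# Hypothetical heuristic: many news sites put /YYYY/MM/DD/ or /YYYY/MM/ in the path.
_DAY = re.compile(r"/(\d{4})/(\d{1,2})/(\d{1,2})/")
_MONTH = re.compile(r"/(\d{4})/(\d{1,2})/")


def guess_date_from_url(url):
    """Return (date, accuracy), where accuracy is 'day', 'month', or None."""
    m = _DAY.search(url)
    if m:
        y, mo, d = map(int, m.groups())
        try:
            return date(y, mo, d), "day"
        except ValueError:  # e.g. /2019/13/40/ is not a real date
            pass
    m = _MONTH.search(url)
    if m:
        y, mo = map(int, m.groups())
        if 1 <= mo <= 12:
            return date(y, mo, 1), "month"
    return None, None
```

A fuller approach would combine several signals (page metadata, visible text) and prefer the most precise one that is trustworthy.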

Feed Seeker

A library for finding Atom, RSS, RDF, and XML feeds from web pages, produced at the Media Cloud project. An incremental improvement over feedfinder2, which was itself based on feedfinder, written by Mark Pilgrim and maintained by Aaron Swartz until his untimely death.
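The first place to look for a feed is the autodiscovery convention: `<link rel="alternate">` tags with a feed MIME type. Here is a stdlib-only sketch of that one step. It is not Feed Seeker's implementation, and the set of accepted MIME types is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed set of MIME types that mark a <link> as a feed.
FEED_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "application/rdf+xml",
    "application/xml",
    "text/xml",
}


class FeedLinkParser(HTMLParser):
    """Collect absolute URLs from <link rel="alternate" type="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        kind = (attrs.get("type") or "").lower()
        href = attrs.get("href")
        if "alternate" in rel and kind in FEED_TYPES and href:
            # resolve relative hrefs against the page URL
            self.feeds.append(urljoin(self.base_url, href))


def find_feed_links(html, base_url):
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.feeds
```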

Skample!

A web app for generating samples from a sketched probability distribution function.
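Sampling from a sketched density amounts to inverse transform sampling. A minimal sketch, assuming the drawing has already been discretized into nonnegative heights over equal-width bins (how Skample itself represents the sketch is not described here): accumulate the heights into a CDF, draw a uniform number, and invert.

```python
import bisect
import itertools
import random


def sample_sketch(heights, lo, hi, n, seed=0):
    """Draw n samples from the piecewise-constant density whose unnormalized
    heights over equal-width bins on [lo, hi] are given by `heights`."""
    width = (hi - lo) / len(heights)
    cdf = list(itertools.accumulate(heights))  # unnormalized cumulative mass
    total = cdf[-1]
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        u = rng.random() * total
        k = bisect.bisect_right(cdf, u)  # first bin whose cumulative mass exceeds u
        prev = cdf[k - 1] if k > 0 else 0.0
        frac = (u - prev) / heights[k]   # invert the linear CDF within the bin
        samples.append(lo + (k + frac) * width)
    return samples
```

Because `bisect_right` only lands on bins where the cumulative mass strictly increases, empty bins are never selected and the division is safe.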

Carpo

A command line utility to run, profile, and save Jupyter notebooks. Available on GitHub and PyPI.

Callisto

A command line utility to create kernels in Jupyter from conda and virtual environments. Available on GitHub and PyPI.
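Under the hood, a Jupyter kernel is a directory containing a `kernel.json` that tells Jupyter which command to launch. Here is a hedged sketch of writing one for a given environment's interpreter; `write_kernel_spec` is a hypothetical helper rather than Callisto's interface, and the environment must have `ipykernel` installed for the kernel to start.

```python
import json
from pathlib import Path


def write_kernel_spec(python_executable, name, display_name, kernels_dir):
    """Write <kernels_dir>/<name>/kernel.json launching ipykernel from the given
    interpreter (hypothetical helper for illustration)."""
    spec = {
        # Jupyter substitutes {connection_file} when it starts the kernel
        "argv": [python_executable, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }
    kernel_dir = Path(kernels_dir) / name
    kernel_dir.mkdir(parents=True, exist_ok=True)
    spec_path = kernel_dir / "kernel.json"
    spec_path.write_text(json.dumps(spec, indent=2))
    return spec_path
```

On Linux, per-user kernel specs live in `~/.local/share/jupyter/kernels`, so passing that as `kernels_dir` should make the new kernel appear in Jupyter's kernel list.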

PyMC3

I am a contributor to PyMC3, a “Python package for Bayesian statistical modeling and Probabilistic Machine Learning which focuses on advanced Markov chain Monte Carlo and variational fitting algorithms.”

Pete

Both a precious pup and a task runner for Python. Available on GitHub and PyPI.

Compare Cross Country Courses

An offshoot of another project, this allows you to compare times between most collegiate cross country courses.

Minimal Machine Learning Visualization Example II

A demonstration of using Flask, React, and d3js to visualize machine learning models. This is a port of a previous project from Angular to React.

Cross Country Predictions

Using hundreds of thousands of historical cross country running results to make predictions about future meets. The page is updated more than once a week during the season.

In a Word

A small project which once a day checks for new words on Futility Closet and keeps a searchable, downloadable list of the curious words it finds there. Click the random button a few times, or make an Anki study deck.

Predicting March Madness

For the second year in a row, I took part in Kaggle’s contest to predict March Madness winners. The code for the actual model is not very expository (get in touch if you are interested), but I also built a friendlier page to query predictions interactively at the link.

Bayesian Updating

An interactive page for demonstrating and experimenting with the beta distribution. Built with AngularJS and d3js.
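The update the page demonstrates has a closed form: with a $\text{Beta}(\alpha, \beta)$ prior and $k$ successes in $n$ trials, the posterior is $\text{Beta}(\alpha + k, \beta + n - k)$. A small Python sketch of that rule (the function names are illustrative, not taken from the page's code):

```python
import math


def update_beta(alpha, beta, successes, failures):
    """Conjugate update: Beta(alpha, beta) prior + binomial data -> Beta posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    return alpha / (alpha + beta)


def beta_pdf(x, alpha, beta):
    """Beta density at x in (0, 1), computed in log space for stability."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm + (alpha - 1) * math.log(x) + (beta - 1) * math.log1p(-x))
```

Starting from a uniform $\text{Beta}(1, 1)$ prior and observing 7 successes and 3 failures gives $\text{Beta}(8, 4)$, with posterior mean $8/12 \approx 0.67$. Updating one observation at a time lands on the same posterior as updating with all the data at once.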

TidyTex

A command line utility for automatically compiling $\LaTeX$ and eliminating auxiliary files. Try it out with pip install tidytex.
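A rough sketch of the compile-then-clean idea, assuming `pdflatex` is on the PATH. This is not TidyTex's code; the function names and the list of auxiliary extensions are assumptions.

```python
import subprocess
from pathlib import Path

# Assumed set of auxiliary extensions; TidyTex's own list may differ.
AUX_EXTENSIONS = {".aux", ".log", ".out", ".toc", ".fls", ".fdb_latexmk", ".synctex.gz"}


def clean_aux_files(tex_path):
    """Delete auxiliary files sharing the .tex file's stem; return removed names."""
    tex_path = Path(tex_path)
    removed = []
    for ext in sorted(AUX_EXTENSIONS):
        aux = tex_path.with_name(tex_path.stem + ext)
        if aux.exists():
            aux.unlink()
            removed.append(aux.name)
    return removed


def compile_and_clean(tex_path):
    """Run pdflatex once in the file's directory, then remove auxiliary files."""
    tex_path = Path(tex_path)
    result = subprocess.run(
        ["pdflatex", "-interaction=nonstopmode", tex_path.name],
        cwd=tex_path.parent,
        capture_output=True,
        text=True,
    )
    clean_aux_files(tex_path)
    return result.returncode
```

Documents with cross-references or bibliographies usually need more than one pass, so a real tool would rerun the compiler before cleaning up.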

A Bayesian Approach to L1 and L2 Regularization

An essay on building linear regression models, adapted from notes for a talk I gave at Rice University in September 2014. Contains lots of pictures, lots of interactivity, and a modest amount of math. As a bonus, everything is typeset with KaTeX.
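The standard correspondence behind the title, written out as a sketch (the notation here is assumed, not taken from the essay): with a Gaussian likelihood, the MAP estimate under a Gaussian prior on the weights is ridge (L2) regression, and under a Laplace prior it is the lasso (L1). Each second line multiplies the objective by $2\sigma^2$, which does not change the minimizer.

```latex
% Likelihood
y \mid X, w \sim \mathcal{N}(Xw, \sigma^2 I)

% Gaussian prior w_j ~ N(0, tau^2): ridge regression
\hat{w}_{\text{MAP}}
  = \arg\min_w \left[ \frac{1}{2\sigma^2}\|y - Xw\|_2^2 + \frac{1}{2\tau^2}\|w\|_2^2 \right]
  = \arg\min_w \left[ \|y - Xw\|_2^2 + \lambda \|w\|_2^2 \right],
  \qquad \lambda = \frac{\sigma^2}{\tau^2}

% Laplace prior p(w_j) = (1 / 2b) exp(-|w_j| / b): lasso
\hat{w}_{\text{MAP}}
  = \arg\min_w \left[ \frac{1}{2\sigma^2}\|y - Xw\|_2^2 + \frac{1}{b}\|w\|_1 \right]
  = \arg\min_w \left[ \|y - Xw\|_2^2 + \lambda \|w\|_1 \right],
  \qquad \lambda = \frac{2\sigma^2}{b}
```

A tighter prior (smaller $\tau$ or $b$) means a larger penalty $\lambda$, which is the Bayesian reading of "more regularization".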

Recent Talks

Introduction to Bayesian Model Evaluation, Visualization, and Comparison Using ArviZ

Tue, Jul 9, 2019

SciPy 2019

Slides

Pragmatic Probabilistic Programming: Parameter Adaptation in PyMC3

Wed, Jun 19, 2019

Probabilistic & Differentiable Programming Summit

Slides

Tidy and beautiful: Visualizing Bayesian models with xarray and ArviZ

Thu, Dec 13, 2018

Bayesian Mixer London

Slides

Tidy and beautiful: Visualizing Bayesian models with xarray and ArviZ

Wed, Oct 17, 2018

PyData New York City 2018

Slides

ArviZ: a unified library for Bayesian model criticism and visualization in Python

Fri, Oct 5, 2018

PROBPROG 2018: The International Conference on Probabilistic Programming

PDF

Two Years of Open Source

Fri, Jan 12, 2018

Lecture for CSC 630 at Phillips Academy Andover, “The Open Source Movement”

Slides

A working knowledge of machine learning in 45 minutes

Wed, Dec 27, 2017

Lecture for MAS 500 at the MIT Media Lab, “Hands on Foundations in Media Technology”

Slides

Two views on regression with PyMC3 and scikit-learn

Wed, Nov 29, 2017

PyData New York City 2017

Slides

Hamiltonian Monte Carlo in PyMC3

Thu, Jun 15, 2017

Boston Bayesians Meetup

PDF Slides

Build You A Machine Learning

Sat, Sep 17, 2016

Kensho Machine Learning Seminar

Colin Carroll

Machine learning engineer

About Me

I am a machine learning researcher and software engineer in Cambridge, MA. My past work has involved modelling risk in the airline industry, collecting and organizing all the news, and building NLP-powered search infrastructure for finance.

I also spend a fair amount of time contributing to open source, particularly the popular PyMC3 and ArviZ libraries. In my academic life, I studied geometric measure theory with Dr. Robert Hardt at Rice University.

In my spare time I run, walk in the woods with Pete the pup, and launch balloons into [near] space.

Interests

Bayesian Machine Learning
Data Visualization
Natural Language Processing

Education

PhD in Mathematics, 2012
Rice University

MA in Mathematics and Economics, 2007
Williams College

Contact

>>> "{user}@{provider}.com".format(user="colcarroll", provider="gmail")
