Blog

Short thoughts, sometimes explaining larger projects.

Highlights of PyMC3 v3.8

ODEs, approximate Bayesian inference, and ArviZ: A tour of the new features.

Very parallel MCMC sampling

Four chains isn’t cool. You know what’s cool? A million chains.

A tour of probabilistic programming APIs

I added a working tour of 9 probabilistic programming languages in Python. Code to get it all to run is here (though you are on your own installing all the correct frameworks), and issues/corrections/suggestions are all happily appreciated!

Projects

minimc

This is a test library to provide reference implementations of MCMC algorithms and ideas. Much of the library is based on Michael Betancourt’s wonderful A Conceptual Introduction to Hamiltonian Monte Carlo. The highlight right now is the ~15-line Hamiltonian Monte Carlo implementation (which relies on an 8-line integrator). The code is commented and documented, with the aim of being instructive to read.
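To give a flavor of what such a short implementation can look like, here is a rough sketch of Hamiltonian Monte Carlo with a leapfrog integrator. It is not the code from minimc: the names `leapfrog` and `hmc`, their arguments, and the restriction to a one-dimensional target with unit mass are all assumptions made to keep the example standard-library only.

```python
import math
import random


def leapfrog(q, p, grad_neg_logp, step_size, n_steps):
    """Leapfrog integrator for Hamiltonian dynamics (1D, unit mass)."""
    p = p - 0.5 * step_size * grad_neg_logp(q)  # initial half step for momentum
    for _ in range(n_steps - 1):
        q = q + step_size * p                   # full step for position
        p = p - step_size * grad_neg_logp(q)    # full step for momentum
    q = q + step_size * p                       # last full step for position
    p = p - 0.5 * step_size * grad_neg_logp(q)  # final half step for momentum
    return q, -p                                # flip momentum for reversibility


def hmc(n_samples, neg_logp, grad_neg_logp, q0,
        step_size=0.1, n_steps=20, seed=0):
    """Draw n_samples from the density proportional to exp(-neg_logp(q))."""
    rng = random.Random(seed)
    samples = []
    q = q0
    for _ in range(n_samples):
        p0 = rng.gauss(0.0, 1.0)                # resample momentum
        q_new, p_new = leapfrog(q, p0, grad_neg_logp, step_size, n_steps)
        h_old = neg_logp(q) + 0.5 * p0 ** 2     # energy at the start
        h_new = neg_logp(q_new) + 0.5 * p_new ** 2  # energy at the proposal
        # Metropolis correction for the integrator's discretization error
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        samples.append(q)
    return samples


# Example target: a standard normal, where neg_logp(q) = q^2 / 2
draws = hmc(1000, lambda q: 0.5 * q * q, lambda q: q, q0=0.0)
```

The step size and number of leapfrog steps are fixed here. Tuning them is the hard part in practice, and this sketch makes no attempt at it.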

Ridge Map

A library for making ridge plots of… ridges. Choose a location, get an elevation map, and tinker with it to make something beautiful. Heavily inspired by Zach Cole’s beautiful art, Jake Vanderplas’ examples, and Joy Division’s 1979 album “Unknown Pleasures”. Uses matplotlib, SRTM.py, numpy, and scikit-image (for lake detection).

jshmc

A prototype for interactively visualizing Hamiltonian Monte Carlo sampling in JavaScript.

Strava Calendar

Plot the paths of all your runs from a year, using Strava.

Simulation Based Calibration

A PyMC3 implementation of the algorithms from: Validating Bayesian Inference Algorithms with Simulation-Based Calibration (Talts, Betancourt, Simpson, Vehtari, Gelman).
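The idea behind simulation-based calibration can be sketched in a few lines. This is not the PyMC3 implementation: it assumes a toy conjugate model (a standard normal prior and one normal observation with unit variance) so that exact posterior draws can stand in for an MCMC sampler. The function name `sbc_ranks` and its arguments are hypothetical.

```python
import random


def sbc_ranks(n_sims=1000, n_post=99, seed=0):
    """Rank statistics for a toy model: theta ~ N(0, 1), y | theta ~ N(theta, 1)."""
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_sims):
        theta = rng.gauss(0.0, 1.0)     # 1. draw a parameter from the prior
        y = rng.gauss(theta, 1.0)       # 2. simulate data given that parameter
        # 3. draw from the posterior; for this model it is exactly N(y / 2, 1 / 2)
        post_mean, post_sd = y / 2.0, 0.5 ** 0.5
        draws = [rng.gauss(post_mean, post_sd) for _ in range(n_post)]
        # 4. rank of the prior draw among the posterior draws
        ranks.append(sum(d < theta for d in draws))
    return ranks


ranks = sbc_ranks()
```

If the posterior sampler is correct, the ranks are uniform on 0 through `n_post`, so a histogram of `ranks` should look flat. Swapping in a broken sampler (say, the wrong posterior variance) shows up as a U-shaped or humped histogram.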

ArviZ

I helped create ArviZ, a Python package for exploratory analysis of Bayesian models that is compatible with PyStan, PyMC3, emcee, Pyro, and TensorFlow Probability. Includes functions for posterior analysis, model checking, comparison and diagnostics. Paper, docs, and on GitHub.

Hipster Media

A Twitter bot inspired by the wonderful @NYT_first_said, this project uses the Media Cloud project to find the earliest mention of a word in major English-language newspapers. It uses this to issue a smugly superior tweet.

imcmc

imcmc (im-sea-em-sea) is a small library for turning 2d images into probability distributions and then sampling from them to create images and gifs.

Date Guesser

A library to extract a publication date from a web page, along with a measure of the accuracy. Built with the support and help of the Center for Civic Media at the MIT Media Lab.

Feed Seeker

A library for finding Atom, RSS, RDF, and XML feeds from web pages. Produced at the Media Cloud project. An incremental improvement over feedfinder2, which was itself based on feedfinder, written by Mark Pilgrim, and maintained by Aaron Swartz until his untimely death.

Skample!

A web app for generating samples from a sketched probability distribution function.

Carpo

A command line utility to run, profile, and save Jupyter notebooks. Available on GitHub and PyPI.

Callisto

A command line utility to create kernels in Jupyter from conda and virtual environments. Available on GitHub and PyPI.

PyMC3

I am a contributor to PyMC3, a “Python package for Bayesian statistical modeling and Probabilistic Machine Learning which focuses on advanced Markov chain Monte Carlo and variational fitting algorithms.”

Pete

Both a precious pup and a task runner for Python. Available on GitHub and PyPI.

Compare Cross Country Courses

An offshoot of another project, this allows you to compare times between most collegiate cross country courses.

Minimal Machine Learning Visualization Example II

A demonstration of using Flask, React, and d3js to visualize machine learning models. This is a port of a previous project from Angular to React.

Cross Country Predictions

Using hundreds of thousands of historical cross country running results to make predictions about future meets. The page is updated more than weekly during the season.

In a Word

A small project which once a day checks for new words on Futility Closet and keeps a searchable, downloadable list of the curious words it finds there. Click the random button a few times, or make an Anki study deck.

For the second year in a row, I took part in Kaggle’s contest to predict March Madness winners. The code for the actual model is not very expository (get in touch if you are interested), but I also built a friendlier page to query predictions interactively at the link.

Bayesian Updating

An interactive page for demonstrating and experimenting with the beta distribution. Built with AngularJS and d3js.
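The update that such a page animates is short enough to write out. The sketch below is in Python rather than the page's own AngularJS code, and the helper names `update_beta` and `beta_mean` are made up for illustration. It relies on the beta distribution being conjugate to the Bernoulli likelihood: observed successes add to the first parameter and failures add to the second.

```python
def update_beta(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing Bernoulli outcomes."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Start from a flat Beta(1, 1) prior, then observe 7 successes and 3 failures
alpha, beta = update_beta(1, 1, successes=7, failures=3)
posterior_mean = beta_mean(alpha, beta)
```

Feeding the posterior back in as the next prior gives the same answer as processing all the data at once, which is what makes the beta distribution convenient for this kind of demo.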

TidyTex

A command line utility for automatically compiling $\LaTeX$ and cleaning up auxiliary files. Try it out with pip install tidytex.

A Bayesian Approach to L1 and L2 Regularization

An essay on building linear regression models. It is converted from notes for a talk I gave at Rice University in September 2014. Contains lots of pictures, lots of interactivity, and a modest amount of math. As a bonus, everything is typeset with KaTeX.

Live $\LaTeX$ Previews

An AngularJS directive providing live rendering of MathJax. There is a more in-depth info page with examples at the link.

Parametric Graphs

A small script built on top of d3js allowing you to define simple animated parametric equations.

Linear Regression Demo

A demonstration of linear regression, overfitting, normalization, and regularization. Allows you to choose data from a distribution, and interactively fit polynomials to the data using least squares, ridge, or Lasso regression.

Minimal Machine Learning Visualization Example

A demonstration of using Flask, AngularJS, and d3js to visualize machine learning models. This is meant to be a minimal example of how to put together such a demo, showing how to make the tech stack play nice.

Recent Talks

Pragmatic Probabilistic Programming: Parameter Adaptation in PyMC3
Wed, Jun 19, 2019
Tidy and beautiful: Visualizing Bayesian models with xarray and ArviZ
Wed, Oct 17, 2018
ArviZ: a unified library for Bayesian model criticism and visualization in Python
Fri, Oct 5, 2018
Two Years of Open Source
Fri, Jan 12, 2018
A working knowledge of machine learning in 45 minutes
Wed, Dec 27, 2017
Two views on regression with PyMC3 and scikit-learn
Wed, Nov 29, 2017
Hamiltonian Monte Carlo in PyMC3
Thu, Jun 15, 2017
Build You A Machine Learning
Sat, Sep 17, 2016

Contact

• >>> "{user}@{provider}.com".format(user="colcarroll", provider="gmail")