A friend posed the following question:

In MA there were 0.52 fatalities/100 million miles driven. Since 2009 Google has driven ~1.5 million miles without a crash – does this convince you that self-driving cars are safe?

Being low on coffee and feeling difficult, I objected to the premise for two reasons:

- We’re comparing a rate (0.52 fatalities/100M miles) to a number (0 fatalities)
- Google probably drives in California, not Massachusetts

In searching for the California fatality rate, I found a beautiful data set that included fatalities and total miles driven on a state-by-state level. This let me elevate my statistical snark to a nice case study: I could essentially copy and paste the example for hierarchical partial pooling from the Stan documentation that was later ported to PyMC3.

To describe our approach intuitively, it is easier to talk about two other
approaches we *do not* take:

**No pooling**: Each state, plus Google, is an independent experiment. Then either robots never crash cars, or we put some prior on each experiment, and we fit 52 different models, each using 1/52nd of the data.

**Complete pooling**: We could aggregate all the miles driven by man and machine, and then just ask how surprising it is to drive 1.5 million miles with no crash. All driving is the same, and a mile in Wyoming is treated like a mile in NYC.
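As a rough illustration of the complete-pooling question, here is a minimal sketch that treats fatalities as a Poisson process. It uses the Massachusetts rate from the question as a stand-in for the pooled rate, which is an assumption made purely for illustration; the function name is hypothetical.

```python
import math


def prob_zero_fatalities(rate_per_100m_miles, miles):
    """Poisson probability of zero fatalities over `miles` miles driven."""
    expected_fatalities = rate_per_100m_miles * miles / 100e6
    return math.exp(-expected_fatalities)


# Assumption: the MA rate (0.52 per 100 million miles) stands in for the pooled rate.
p_zero = prob_zero_fatalities(0.52, 1.5e6)
print(f"P(no fatalities in 1.5M miles) = {p_zero:.4f}")
```

Under that assumption, a fatality-free 1.5 million miles is the expected outcome (probability around 0.99), so on its own it is not surprising at all.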

The approach we *do* take is called **partial pooling**: we acknowledge that each state is a
different experiment, while also positing that there is a distribution of fatality rates shared by
all 52 experiments, parameterized by the mean (`pooled_rate`). The parameter `κ` acts like a
slider between no pooling and complete pooling: when `κ` is 0, we ignore `pooled_rate`, and there
is no pooling. As `κ` goes to infinity, we ignore the state level data, and will just report
the `pooled_rate`. The model will estimate the most likely value for `κ` itself. These two
parameters then feed into our estimates for experiment-level fatality rates (`state_rate`,
`google_rate`).
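To make the slider intuition concrete, here is a small sketch of how `κ` controls the spread of the shared distribution. It assumes the Beta parameterization used in the model code (`alpha = pooled_rate * κ`, `beta = (1 - pooled_rate) * κ`); the value of `pooled_rate` is made up for illustration, and the helper name is hypothetical.

```python
import math


def beta_mean_and_sd(pooled_rate, kappa):
    """Mean and standard deviation of Beta(pooled_rate*kappa, (1-pooled_rate)*kappa)."""
    alpha = pooled_rate * kappa
    beta = (1.0 - pooled_rate) * kappa
    total = alpha + beta  # equals kappa
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1.0))
    return mean, math.sqrt(variance)


pooled_rate = 0.01  # illustrative value, not taken from the data
for kappa in (1.0, 10.0, 100.0, 10000.0):
    mean, sd = beta_mean_and_sd(pooled_rate, kappa)
    print(f"kappa={kappa:>8.0f}  mean={mean:.4f}  sd={sd:.5f}")
```

The mean stays at `pooled_rate` for every `κ`, while the standard deviation shrinks as `κ` grows, so individual rates are pulled more tightly toward the pooled value.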

The case studies cited above have more details about the modelling choices.

## Model Definition

The model is implemented in `PyMC3`, which also makes it easy to read, so I
include the code here.

```
import pymc3 as pm
import theano.tensor as tt


def car_model(miles, fatalities, google_miles=1.5, google_fatalities=0):
    with pm.Model() as model:
        # Mean fatality rate shared by all experiments
        pooled_rate = pm.Uniform('pooled_rate', lower=0.0, upper=1.0)
        # Concentration: how strongly each rate is pulled toward pooled_rate
        κ_log = pm.Exponential('κ_log', lam=1.5)
        κ = pm.Deterministic('κ', tt.exp(κ_log))

        # One fatality rate per state, drawn from the shared Beta distribution
        state_rate = pm.Beta('state_rate',
                             alpha=pooled_rate*κ,
                             beta=(1.0-pooled_rate)*κ,
                             shape=len(fatalities))
        observed_fatalities = pm.Poisson('y', mu=state_rate*miles, observed=fatalities)

        # Google's rate is drawn from the same shared distribution
        google_rate = pm.Beta('google_rate',
                              alpha=pooled_rate*κ,
                              beta=(1.0-pooled_rate)*κ)
        observed_google_fatalities = pm.Poisson('y_new',
                                                mu=google_miles*google_rate,
                                                observed=google_fatalities)
    return model
```

The code and data are available here, along with an interactive notebook. As a technical aside, this is a model that would have been very difficult to sample from just five years ago, before open source implementations of Hamiltonian Monte Carlo samplers became widely available.

## Answering the Question

The original question was whether I trust Google’s self-driving cars, given the fatality rate in MA. I don’t, yet.

Notice the wide spread of uncertainty we have for the self-driving fatality rate, which makes sense considering how much more data we have for individual states. There is a non-negligible chance that self-driving cars are safer than Massachusetts drivers, but I would still view that outcome as surprising.

Also, for interest’s sake, here is a readable chart of estimates for all fifty states and DC. Makes me feel good to live in the Bay State!

## Followup

I should have had a third objection to the question: asking for a
reference for the numbers we were talking about. Actually looking those
up, it turns out that as of November 28, 2017, The Verge reports that
*Waymo* (which is not Google, but is sort of Google) has driven 4 million miles.

How does that change our conclusions? Almost not at all.

These two distributions look the same. But this makes sense: the average state in the dataset has 50 billion miles, so moving from 1.5 million to 4 million miles should not change the inference very much.
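One way to see why the extra miles barely matter is to compare how well two candidate fatality rates explain a fatality-free record. The sketch below assumes a Poisson likelihood, as in the model, and compares the Massachusetts rate against a made-up rate twice as high; the function names are illustrative only.

```python
import math


def zero_fatality_likelihood(rate_per_100m_miles, miles):
    """Poisson likelihood of observing zero fatalities over `miles` miles."""
    return math.exp(-rate_per_100m_miles * miles / 100e6)


def likelihood_ratio(safe_rate, risky_rate, miles):
    """How much more a fatality-free record favors safe_rate over risky_rate."""
    return (zero_fatality_likelihood(safe_rate, miles)
            / zero_fatality_likelihood(risky_rate, miles))


ma_rate = 0.52              # fatalities per 100 million miles, from the question
doubled_rate = 2 * ma_rate  # hypothetical comparison rate
for miles in (1.5e6, 4e6):
    ratio = likelihood_ratio(ma_rate, doubled_rate, miles)
    print(f"{miles / 1e6:.1f}M miles: likelihood ratio = {ratio:.4f}")
```

In both cases the ratio is within a few percent of 1, so a fatality-free record of this length hardly distinguishes the two rates.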

*Working code for this post may be found here.* Thanks to Predrag
Gruevski for thoughtful comments on a draft, and Timothy Sweetser for putting up with initial
brainstorming.