So how did my 430 model do?
On both sides the 30-day version was significantly worse than the 4-day.
That makes sense; there was a lot of polling in the final days, and a
good number of voters wait until the end of a campaign to make up their
minds.
On the Democratic side, the race looked like a coin flip, with the
odds very, very slightly in Clinton’s favor. As it turns out, the race is
being called (by Sanders, at least) a virtual
tie, and in fact several delegates were awarded on coin
flips¹. The model overestimated Martin O’Malley;
it seems like none of his people bothered to show up.
On the Republican side, the 430 model far overestimated Trump’s
chances, and underestimated both Cruz and Rubio. It’s a great example of
garbage in, garbage out. The polls overestimated how many of
Trump’s voters would turn out (and how few of Cruz’s would), so the
model did the same. With Rubio, I think, what happened was a little
different: he surged very late and very fast, and there just weren’t
many polls at the right time to catch it.
So, should we change the model at all? Maybe we should assume all
Trump results are inflated (which I feel is probably true) and deflate
them by some amount. That feels pretty arbitrary, though, and we’d
probably overcompensate anyway. Or we could try to see which pollsters
are the best and weight accordingly. That also seems prone to
over-fitting, though, unless you want to go the full 538. So I don’t
think I’m going to add too much sophistication along those lines.
I do think I may fool around both with the time-weighting
function, which was more or less pulled out of a hat, and with the
window. It may be that, instead of looking at a time-window (so, all
polls in the last 30 days), it would be more helpful to look at a
number-of-polls window, still weighted by time-delay. That way, as we
get closer to elections and more polling happens, we zero in
automatically. I’ll probably put out a new version for New Hampshire and
see what that looks like.
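If you want to experiment with that idea before I do, here’s a rough sketch of what the change might look like. The function and column names are mine, not the notebook’s; it assumes a DataFrame with one row per candidate per poll and a date column, like the one the model below builds.

```python
import pandas as pd

def most_recent_polls(frame, n=10):
    """Keep rows from the n most recent distinct poll dates, instead
    of a fixed 30-day window; the time-decay weighting still applies
    afterward. (Sketch only: a real version might key on poll IDs,
    since several polls can share a date.)"""
    keep_dates = frame["date"].drop_duplicates().sort_values().tail(n)
    return frame[frame["date"].isin(keep_dates)]
```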
And you should, too! The notebook is available here,
so feel free to make your own version and let me know on Twitter if you come up with
anything interesting!
¹ The Clinton campaign won 6 of 6 tosses, the odds of
which are only 1 in 64. But surely if the Clinton camp were sharp enough
to pre-supply unfair coins they would have won outright, right? Right?
It’s caucus day in Iowa, so what better time to rip off Nate Silver?
Silver has a well-respected election
forecasting model based on polls, polling firms’ historical house
effects and accuracy, and a few non-poll factors like endorsements.
Probably took a lot of hard work.
But it’s actually pretty easy to build a simple poll-based model
yourself using Python. I’ve thrown one together, which I’m calling the
430 model. Why 430? Because of the 80/20 rule: you can get 80 percent of
the way there with 20 percent of the work, and 430 is about 80 percent
of 538.
Everything below is available in a Jupyter notebook here.
Let’s start with a little setup:
```python
import collections
import datetime

import numpy as np
import pandas as pd
import requests

API_ENDPOINT = "http://elections.huffingtonpost.com/pollster/api/polls"

np.random.seed(2016)
```
So, first, let’s get our polling data, which we can do using the Pollster
API:
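The actual functions live in the notebook; here’s a rough sketch of what they do, relying on the imports and `API_ENDPOINT` from the setup above. Fair warning: the topic slug and JSON field names are reconstructed from memory of the (long-retired) Pollster v1 API, so treat them as assumptions, and the function names are mine.

```python
def get_polls(state, party, pages=5):
    """Pull raw poll JSON from Pollster, one page at a time."""
    topic = "2016-president-{}-primary".format(party)  # assumed slug format
    polls = []
    for page in range(1, pages + 1):
        resp = requests.get(API_ENDPOINT,
                            params={"state": state, "topic": topic, "page": page})
        batch = resp.json()
        if not batch:  # no more pages
            break
        polls.extend(batch)
    return polls

def polls_to_frame(polls):
    """Flatten poll JSON into one row per candidate per poll."""
    rows = []
    for poll in polls:
        end_date = datetime.datetime.strptime(poll["end_date"], "%Y-%m-%d").date()
        for question in poll["questions"]:
            for subpop in question["subpopulations"]:
                for response in subpop["responses"]:
                    rows.append({"date": end_date,
                                 "population": subpop["name"],
                                 "observations": subpop["observations"],
                                 "choice": response["choice"],
                                 "value": response["value"]})
    return pd.DataFrame(rows)
```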
Those two functions will allow us to specify a state, a party (either
gop or dem, per Pollster), and how far back in time we
want to go. What we get back is a Pandas DataFrame where each row has
the result for one candidate for one poll, plus some metadata about the
poll. All we really need for this model is the date, result, and number
of observations, but I’ve also included the population screen in case
you want to, say, restrict to only likely voters.
Now we want to combine those poll results for a point-in-time
estimate of the mean, plus a standard deviation of the estimate. But not
all polls are equally good; we’ll want to be able to weight them
somehow.
We’re going to be lazy, and just weight based on recency. For each
poll, we’ll set a weight of one over the square of the age of the poll
plus one (the plus one is so that we don’t divide by zero). Then we can
create a super-poll, in which we pool all the folks who said they’d vote
for each candidate in any poll, multiplied by the weight of that poll.
This allows us to calculate both the weighted estimate of the mean and
the standard deviation of the estimate:
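Again, the real code is in the notebook; a minimal version consistent with the description above might look like this. I’m reading “the square of the age plus one” as age² + 1, and treating the pooled, weighted respondent count as the sample size for the standard error; the notebook may differ on both.

```python
def superpoll(frame, target_date, window=30):
    """Pool weighted respondents into a single super-poll and return
    each candidate's weighted mean estimate and its standard error."""
    age = (pd.Timestamp(target_date) - pd.to_datetime(frame["date"])).dt.days
    recent = frame[(age >= 0) & (age <= window)].copy()
    weight = 1.0 / (age[recent.index] ** 2 + 1)  # older polls count less
    recent["votes"] = weight * recent["observations"] * recent["value"] / 100.0
    recent["sample"] = weight * recent["observations"]
    pooled_n = recent["sample"].sum()
    estimate = recent.groupby("choice")["votes"].sum() / pooled_n
    sd = np.sqrt(estimate * (1 - estimate) / pooled_n)  # SE of a proportion
    return pd.DataFrame({"estimate": estimate, "sd": sd})
```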
This function allows us to specify a target date, in case we want a
snapshot from earlier in the campaign, and also a window that gives us a
maximum age of polls. That’s so Scott Walker doesn’t show up in our
results even though he’s already dropped out of the race.
Now we can run simulations! All we have to do is draw from the
normal distribution for each candidate and see who gets the highest
percent of the vote:
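Something like this sketch, which keeps to the setup above and assumes the estimate/sd frame returned by the `superpoll` sketch (the notebook’s actual code may be organized differently):

```python
def simulate(superpoll_frame, trials=10000):
    """Count how often each candidate finishes first across random
    draws from each candidate's normal distribution."""
    wins = collections.Counter()
    candidates = superpoll_frame.index
    means = superpoll_frame["estimate"].values
    sds = superpoll_frame["sd"].values
    for _ in range(trials):
        draws = np.random.normal(means, sds)  # one draw per candidate
        wins[candidates[draws.argmax()]] += 1
    return {name: count / trials for name, count in wins.items()}
```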
So, now for the fun part: who wins? Here are the super-poll results on
the Republican side:
| Candidate | Estimate | Standard Deviation |
|---|---|---|
| Donald Trump | 28.2% | 2.0% |
| Ted Cruz | 23.6% | 1.8% |
| Marco Rubio | 17.4% | 1.6% |
| Ben Carson | 7.6% | 1.2% |
| Rand Paul | 4.7% | 0.9% |
| Jeb Bush | 4.1% | 0.9% |
| Mike Huckabee | 3.3% | 0.8% |
| John Kasich | 2.8% | 0.7% |
| Carly Fiorina | 2.4% | 0.7% |
| Chris Christie | 2.1% | 0.6% |
| Rick Santorum | 1.3% | 0.5% |
| Jim Gilmore | 0.1% | 0.2% |
In the simulation, Donald Trump won 96 percent of the time, while Ted
Cruz won 4 percent.
Here are our super-poll results for the Dems:
| Candidate | Estimate | Standard Deviation |
|---|---|---|
| Hillary Clinton | 47.4% | 2.4% |
| Bernie Sanders | 46.0% | 2.4% |
| Martin O’Malley | 3.6% | 0.9% |
Seems pretty close, but bad news, Bernie fans! In the simulation,
Hillary won 66 percent of the time, while Sanders only won 34
percent.
Now, that’s all with a 30-day window. What if we keep it to just the
most recent polls?
Here’s what we get with a 4-day window on the GOP side:
| Candidate | Estimate | Standard Deviation |
|---|---|---|
| Donald Trump | 27.5% | 2.1% |
| Ted Cruz | 23.1% | 2.0% |
| Marco Rubio | 18.1% | 1.9% |
| Ben Carson | 7.5% | 1.3% |
| Rand Paul | 5.1% | 1.1% |
| Jeb Bush | 4.1% | 0.9% |
| Mike Huckabee | 3.5% | 0.9% |
| John Kasich | 2.8% | 0.8% |
| Carly Fiorina | 2.5% | 0.7% |
| Chris Christie | 2.0% | 0.7% |
| Rick Santorum | 1.3% | 0.5% |
Trump still wins 93.6 percent of simulations, but Cruz is up to 6.4
percent, and Rubio makes it onto the board, although his share rounds
to 0.0 percent.
On the Democratic side, things get really interesting:
| Candidate | Estimate | Standard Deviation |
|---|---|---|
| Hillary Clinton | 47.0% | 2.7% |
| Bernie Sanders | 46.9% | 2.7% |
| Martin O’Malley | 3.2% | 1.0% |
In the simulation, O’Malley stuns! Just kidding, Clinton wins 51
percent of the time, and Sanders wins 49 percent. Boy is that going to
be fun.
Obviously we shouldn’t bet the house on these predictions. My
weighting model may be wrong (read: is wrong) or the polls themselves
may be wrong (read: are completely unreliable in recent elections). But
this shows you how simple it really is to get something like this off
the ground.
If you decide to play around with this model (again, you can download
the Jupyter notebook here),
be sure to let me know on Twitter. It would be a lot
of fun to see what people come up with.
The Powerball lottery is in the news because the jackpot is up to
about $1.5 billion, which sources tell me is a lot of money. The classic
stats argument is that you should buy a ticket only if the expected
value is greater than the cost of the ticket, which is $2.
Expected value is the sum of the value of each possible outcome times
the odds of that outcome. In this case, it’s the sum of the value
times the odds for each of the 9 ways to win.
| Outcome | Odds |
|---|---|
| $1,500,000,000 | 1 in 292,201,338.00 |
| $1,000,000 | 1 in 11,688,053.52 |
| $50,000 | 1 in 913,129.18 |
| $100 | 1 in 36,525.17 |
| $100 | 1 in 14,494.11 |
| $7 | 1 in 579.76 |
| $7 | 1 in 701.33 |
| $4 | 1 in 91.98 |
| $4 | 1 in 38.32 |
Now there’s also an optional add-on called PowerPlay, which
will multiply your winnings by a randomly-selected multiplier (except
the jackpot, which remains the same, and the $1 million prize, which
doubles to $2 million). The odds for the multiplier are:
| Multiplier | Odds |
|---|---|
| 5 | 1 in 21.00 |
| 4 | 1 in 14.00 |
| 3 | 1 in 3.23 |
| 2 | 1 in 1.75 |
If you do the math¹, that gives us an expected value of
about $5.11 for a standard ticket and $5.57 for a PowerPlay ticket.
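If you’d rather let Python do the math, here’s a quick check using the two tables above. One caveat: the $5.11 and $5.57 figures work out if the jackpot is taken as roughly $1.4 billion; plug in the full $1.5 billion and the standard-ticket figure comes out closer to $5.45.

```python
# (prize, odds as 1-in-N) for the nine ways to win, from the table above;
# the jackpot here is ~$1.4B, which reproduces the post's figures
PRIZES = [(1400000000, 292201338.00), (1000000, 11688053.52),
          (50000, 913129.18), (100, 36525.17), (100, 14494.11),
          (7, 579.76), (7, 701.33), (4, 91.98), (4, 38.32)]

# PowerPlay multipliers and their 1-in-N odds
MULTIPLIERS = [(5, 21.00), (4, 14.00), (3, 3.23), (2, 1.75)]

ev = sum(prize / odds for prize, odds in PRIZES)
print(round(ev, 2))  # -> 5.11

# PowerPlay: jackpot unchanged, the $1M prize doubles, and every
# other prize gets scaled by the expected multiplier (~2.6)
expected_mult = sum(m / odds for m, odds in MULTIPLIERS)
jackpot, million = PRIZES[0], PRIZES[1]
ev_powerplay = (jackpot[0] / jackpot[1]
                + 2 * million[0] / million[1]
                + expected_mult * sum(p / o for p, o in PRIZES[2:]))
print(round(ev_powerplay, 2))  # -> 5.57
```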
Well, good grief! That means we should all go buy a bunch of tickets,
right? That’s statistics! We even know not to buy PowerPlay, because the
return per dollar isn’t as good!²
Well, no, not really. The problem here is that expected value
is a misleading term. It does not actually tell you what value to expect
for a single ticket.
The expected value here is really just the mean winnings for all
possible tickets. But means aren’t helpful here because the distribution
is so wildly skewed. Almost all of the expected value (about $4.79)
comes from the jackpot, the single exact winning combination.
A better way to get reasonable expectations is with a simulation. I
wrote a quick one in Python, which you can find the code for in this Jupyter
notebook, and I used it to analyze what we can really expect when
playing Powerball.
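The notebook has the real code; a stripped-down sketch of the idea is below. It uses the drawing mechanics behind the odds table above (5 white balls from 69 plus a Powerball from 26); the jackpot amount is just illustrative, and the names are mine.

```python
import numpy as np

rng = np.random.RandomState(2016)

# winnings keyed by (white-ball matches, matched the Powerball?)
PRIZES = {(5, True): 1400000000, (5, False): 1000000,
          (4, True): 50000, (4, False): 100,
          (3, True): 100, (3, False): 7,
          (2, True): 7, (1, True): 4, (0, True): 4}

def random_pick():
    """Five distinct white balls (1-69) plus a Powerball (1-26)."""
    whites = set(rng.choice(69, size=5, replace=False) + 1)
    return whites, rng.randint(1, 27)

def winnings(ticket, drawing):
    """Look up the prize for a ticket against a drawing."""
    (t_whites, t_pb), (d_whites, d_pb) = ticket, drawing
    return PRIZES.get((len(t_whites & d_whites), t_pb == d_pb), 0)

# 100,000 players each buy one random ticket for one random drawing
results = [winnings(random_pick(), random_pick()) for _ in range(100000)]
```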
First I ran a simulation of 100 thousand people buying a single
ticket, with no PowerPlay. How did they turn out?
Well, 95,907 people won exactly jack. 3,763 doubled their money with
a four-dollar win, while 324 won the princely sum of seven dollars. A
full 6 folks won $100. Woohoo! So if you buy a ticket, what’s reasonable
to expect?
Big fat goose egg, that’s what. If you’re really lucky? Four
bucks.
If you add in the PowerPlay, things aren’t much better:
One guy won $400 in 100,000 trials. 96,078 won nothing. Sure, the
prizes are bigger, but in the vast majority of cases, you’re just
throwing away $3 instead of $2.
But wait a minute. What if I buy more tickets to improve my odds?
That should get me closer to the expected value, right? Well, sort of.
Remember, the expected value is dominated by that one winning combo.
I ran the simulation again assuming someone bought 1, 2, 5, 10, 15,
20, 25, 30, 40, and 50 tickets, with 100,000 trials at each ticket
level. Here are the results without PowerPlay:
The dotted line means you break even. The solid blue line is the
median outcome, while the shaded area shows the 5th to the 95th
percentile. The winnings you can reasonably expect certainly go up as
you buy more tickets, but they go up slowly, much more slowly than the
cost of the tickets. Even though you’re getting more wins, you’re
generally getting low-value results and spending a lot on worthless
tickets to make it happen. Even the 95th percentile earns way below the
break-even level once you get past two tickets. Here’s what it looks
like with the PowerPlay:
Basically the same story here. The upside 95th-percentile result is
a bit better, but the more realistic median result isn’t. And “a bit
better” in this case means you would still lose money, just not as
much.
The best median outcome comes from buying one ticket without the
PowerPlay option: you just lose $2 and no more. The best upside comes
with the PowerPlay and 2 tickets, where the 95th percentile spent $6 to
make $8, the only profit anywhere in the 5th-to-95th-percentile range
in the entire simulation.
So what’s the takeaway? First, expected value is overrated for this
kind of situation, because you can end up paying too much attention to
rare, extreme possibilities. Running a Monte Carlo
simulation like the one above gives you more realistic guidance.
Expected value may be appropriate for large organizations that do
so many analyses that the one-in-a-million shot actually shows up every
now and then, but it’s less suited to one-shot decisions.
But second, and more directly, don’t play Powerball. The lottery is
still a tax on people who are bad at math. And people who tell you that
the statistics say otherwise have not thought all the way through their
statistics.
¹ Actually, I’m being a bit lazy here. Alex Tabarrok has
an excellent post
explaining why the expected value is a good bit lower than the simple
analysis suggests, but that’s beside the point I’m making.