And I’m back! I have prettier graphs! Well, at least this one is pretty.
The WCHA standings are ranked in three tiers: Mankato-Tech-BG, NMU-BSU-FSU, and UAH-LSSU-UAA. This is true in the actual standings as well as the predicted ones that you’ll see below.
Here’s a legend on how to read the above graphic: the team in a bold color is on the road (with the best approximation of their colors that I can get in Excel), while the team at home is in white. So if you look at Bemidji-Northern:
- Bemidji wins an average of about 1.09 games per series across the model’s 10,000 runs.
- Northern wins an average of about 0.88 games.
- The series averages 0.04 ties.
These averages come from the percentages below them: HM0% (the chance that the home team is swept), HM1% (the chance that it loses one game and ties the other), and so on. I’ll explain a bit more about the math shortly.
As you can see in the Bemidji-Northern series, a split is the most common outcome (40.69% of all runs), with a Bemidji sweep second (32.94%) and a Northern sweep third (22.63%). The remaining 3.74% of runs include at least one tie.
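To show how per-series averages like the ones above fall out of the HM percentages, here’s a small Python sketch. It rests on a few assumptions of mine: HMk% is read as the chance the home team earns k of the weekend’s 4 points (2 per win, 1 per tie), HM2% is counted entirely as a split (a double tie also yields 2 points but is rare enough to ignore), and the input percentages are made up, not the model’s Bemidji-Northern output.

```python
def expected_results(hm):
    """Average (home wins, away wins, ties) per series.

    hm[k] = probability that the home team earns k of 4 points.
    Assumption: 2 home points always means a split; double ties are ignored.
    """
    # (home wins, away wins, ties) produced by each home point total
    outcomes = {
        0: (0, 2, 0),  # home team swept
        1: (0, 1, 1),  # home loses one, ties one
        2: (1, 1, 0),  # split
        3: (1, 0, 1),  # home wins one, ties one
        4: (2, 0, 0),  # home sweep
    }
    home = sum(p * outcomes[k][0] for k, p in enumerate(hm))
    away = sum(p * outcomes[k][1] for k, p in enumerate(hm))
    ties = sum(p * outcomes[k][2] for k, p in enumerate(hm))
    return home, away, ties

# Hypothetical HM0%..HM4% values, for illustration only
hm = [0.20, 0.02, 0.40, 0.03, 0.35]
home, away, ties = expected_results(hm)
print(f"home wins {home:.2f}, away wins {away:.2f}, ties {ties:.2f}")
```

Since every outcome accounts for exactly two games, the three averages always add up to 2, which is a handy sanity check on the graphic (1.09 + 0.88 + 0.04 is 2.01 after rounding).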
The math — expected values
The model I’m presenting here uses KRACH only as a way to generate an expected value for each series. KRACH is a rigorous mathematical answer to the improper application of the transitive rule, which from a UAH perspective might sound like this:
“Hey, Omaha is 16-7-3, eighth-best in the country by winning percentage, with the second-toughest strength of schedule. But you know what? UAH tied them one night and lost by only a goal another night. We’re not that bad!” Except, you know, Tech RUTS’d UAH, and UAH’s record (7-18-3) is almost the inverse of the Red Cows’.
KRACH compares everyone to everyone with matrix mathematics that accounts for the fact that, well, not everyone plays everyone. So when teams cross over into non-conference play, it matters. The three teams with 20-win seasons so far are Minnesota State, Michigan Tech, and Robert Morris, but the relative non-conference records of their two conferences (the WCHA and Atlantic Hockey) make a difference, as do the schedules that each of those three schools played:
- Mankato: @ Omaha 2x, H/H with Duluth, v Princeton, North Star College Cup (Minnesota, Bemidji State)
- Tech: v Michigan 2x, @ Duluth, GLI (Michigan, Ferris State), @ Wisconsin (the split there hurts them)
- RoMo: Lake Superior, Three Rivers Classic (Penn State, Colgate), H/H with Bowling Green
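KRACH is built on the Bradley-Terry model, so a rough feel for how it “compares everyone to everyone” can come from a plain Bradley-Terry fit. The Python sketch below is my simplification, not the official KRACH computation: it skips the fictitious tie that real KRACH adds so unbeaten or winless teams get finite ratings, rescales to an arbitrary average of 100, and runs on three hypothetical teams with made-up results.

```python
from collections import defaultdict

def krach(games, iterations=1000):
    """Bradley-Terry ratings (the model behind KRACH), simplified.

    games: (team_a, team_b, result) with result 1.0 if team_a won,
    0.0 if team_b won, 0.5 for a tie (a tie counts as half a win).
    """
    wins = defaultdict(float)
    played = defaultdict(int)  # played[(a, b)]: games between a and b
    teams = set()
    for a, b, result in games:
        teams.update((a, b))
        wins[a] += result
        wins[b] += 1.0 - result
        played[(a, b)] += 1
        played[(b, a)] += 1

    ratings = {t: 100.0 for t in teams}
    for _ in range(iterations):
        # Standard iterative Bradley-Terry update; at its fixed point each
        # team's expected wins (sum of n_ij * K_i / (K_i + K_j)) equal its actual wins.
        new = {}
        for i in teams:
            denom = sum(played[(i, j)] / (ratings[i] + ratings[j])
                        for j in teams if j != i)
            new[i] = wins[i] / denom
        # Only ratios matter, so rescale to an arbitrary average of 100.
        scale = 100.0 * len(new) / sum(new.values())
        ratings = {t: r * scale for t, r in new.items()}
    return ratings

# Hypothetical results: A goes 2-1 vs. B and vs. C; B goes 2-1 vs. C.
games = [
    ("A", "B", 1.0), ("A", "B", 1.0), ("A", "B", 0.0),
    ("B", "C", 1.0), ("B", "C", 1.0), ("B", "C", 0.0),
    ("A", "C", 1.0), ("A", "C", 1.0), ("A", "C", 0.0),
]
ratings = krach(games)
print({t: round(r, 1) for t, r in sorted(ratings.items())})
```

The non-conference point falls out naturally: a team’s rating depends on whom it played (the `played` counts), not just its record, so beating a highly rated opponent moves the fit more than beating a weak one.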
KRACH gives a numerical rating that can also be used to look backward. If Robert Morris (103.5 in KRACH before games on 2015-02-13) played Air Force (23.82), you’d expect the Colonials to win 81.3% of the time [103.5/(103.5+23.82)]. This lines up fairly well with reality, as the Colonials are 2-1-1 against the Falcons this season (.750), with all four contests going to overtime.
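The head-to-head formula in the bracket is simple enough to check directly. Here’s a minimal Python version, using the two KRACH values quoted above; the function name is just mine for illustration.

```python
def krach_win_prob(k_a, k_b):
    """Chance that team A beats team B, given KRACH ratings k_a and k_b."""
    return k_a / (k_a + k_b)

# KRACH values before games on 2015-02-13: Robert Morris vs. Air Force
p = krach_win_prob(103.5, 23.82)
print(f"Robert Morris over Air Force: {p:.1%}")
```

Because only the ratio of ratings matters, the same function works no matter how the KRACH scale is normalized.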
A greater disparity can be seen in the MTU-UAA matchup, where the Huskies get the sweep 100% of the time.
Mind you, I had confused the M*U with the UA* matchups this weekend (I blame the cold medicine). But Mankato is a virtual lock to sweep UAF, too (93% of the time).
The math — distributions
Once you have this expected value from KRACH, you can treat the results as normally distributed, i.e., on a bell curve. In other words: if you know how likely Team A is to defeat Team B, you can set that as the expected value of the distribution and then run simulations on that distribution.
Now wait, you’re saying, how can these results be normally distributed? Didn’t you say that Tech was going to crush Anchorage? Yes, I did. Tech’s KRACH at this moment is 411.8; UAA’s is 47.44. By that, Tech should win 90% of the time. Sorta.
See, the model says, “Okay, Tech’s going to ‘win’ 90% of the time, i.e., they’re going to get 3.6 of the 4 available points per weekend.” And that expected value is what’s used. Why? I’ve never gotten the sense that college hockey games are independent events: what happens on Friday night drives what happens on Saturday night (injuries, benchings, etc.). This may be a failing of the model; I haven’t tested it extensively, but it worked reasonably well last year. But it’s the model that I’ve chosen.
So if Tech is supposed to pick up an average of 3.6 points per weekend, rounding that up means that, on average, they sweep. In fact, if you use breakpoints to determine what counts as a sweep (for the model, it’s 2.95/4, which gets pretty close to the historic average of ties produced in WCHA games), Tech sweeps every time.
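Here’s a rough Python sketch of the expected-points step and of mapping a points value onto a weekend result with breakpoints. Reading the 2.95 figure as the cutoff for a sweep is my interpretation; the other three cutoffs are hypothetical placeholders (mirrored around 2.0 and kept narrow so ties stay rare), not the model’s actual settings.

```python
from bisect import bisect_right

# Cutoffs on a 0-4 points scale. 2.95 comes from the post (read here as the
# sweep cutoff); 1.05, 1.15 and 2.85 are hypothetical placeholders.
BREAKPOINTS = (1.05, 1.15, 2.85, 2.95)
RESULTS = ("swept", "lose + tie", "split", "win + tie", "sweep")

def expected_points(k_team, k_opp):
    """Expected points over a two-game weekend (2 per win, so 4 available)."""
    return 4.0 * k_team / (k_team + k_opp)

def classify(points):
    """Map a points value to a weekend result using the cutoffs above."""
    return RESULTS[bisect_right(BREAKPOINTS, points)]

tech = expected_points(411.8, 47.44)  # Michigan Tech vs. Anchorage, KRACH from the post
print(f"{tech:.2f} points -> {classify(tech)}")
```

`bisect_right` returns how many cutoffs a value meets or exceeds, which doubles as the home team’s point total, so the same index picks the label.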
The math — standard deviation
We’ve all heard the term “outlier”. We probably know one. Shoot, pretty much every NHL player is an outlier in some form or fashion, a man so uniquely skilled at hockey that people pay him vast sums to play it. But even in the NHL, some guys are simply better than others.
In statistics, this concept is variance. Closely tied to variance is the standard deviation, which describes how wide the distribution is. In your standard academic exercise, 20 kids take an exam where the average is 68: 18 kids make a 70, one makes a 100, and one makes a 0 because they tried to cheat off of the kid acing the exam.
You can run this if you want, but the standard deviation is about 17 points, which is to say that the kid acing the exam was nearly two standard deviations away, while the kid who cheated was four. In statistics, the former is expected variance: there’s nothing unusual about a kid who aces an exam when the vast preponderance of the class barely passed it. However, the latter is significant and should prompt a question as to why (even though we know why in this case).
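If you do want to run the exam example, here it is in Python, taking the eighteen typical scores to be 70 (the value that gives a class average of 68). It uses the population standard deviation, treating the class as the whole population; the sample version (`statistics.stdev`) comes out near 17.4, so either way it’s “about 17”.

```python
import statistics

# 18 typical scores, one perfect score, one zero for cheating
scores = [70] * 18 + [100, 0]
mean = statistics.mean(scores)
sd = statistics.pstdev(scores)  # population SD: the class is the whole population

print(f"mean {mean}, SD {sd:.1f}")
for label, score in (("ace", 100), ("cheater", 0)):
    # z-score: how many standard deviations the score sits from the mean
    print(f"{label}: {(score - mean) / sd:+.2f} SDs")
```

The ace lands a little under two standard deviations above the mean, while the cheater sits about four below it.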
In the Tech-UAA case (thanks for staying with me), the mean is shifted so close to the 4-point limit that centering a bell curve there also means we have to consider the width of that curve. I tied that width, the standard deviation, to how far the mean sits from the edges. As such, the closer the home team’s expected value gets to 0.0 or 4.0, the more likely a sweep becomes.
You can see this with UAH @ LSSU, which is nearly even in terms of KRACH (48.19 UAH, 43.97 LSSU). This is why a split is most likely: the mean is very close to 2.0, and the variation is pretty wide, so the outcomes tend to spread evenly between the center and the poles. This is largely a pick-’em series, but the numbers say that UAH is slightly better on paper.
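Here’s a rough Monte Carlo sketch in Python of one series under this approach: draw the home team’s weekend points from a normal curve centered on the KRACH expected value, then bucket each draw with breakpoints. The exact width rule isn’t spelled out, so `spread` is a hypothetical stand-in (one third of the distance to the nearer edge, i.e., the edge sits three standard deviations out), and only the 2.95 cutoff comes from the post; the rest are placeholders.

```python
import random
from bisect import bisect_right
from collections import Counter

# Cutoffs on the 0-4 home-points scale. Reading 2.95 as the sweep cutoff is
# an interpretation; 1.05, 1.15 and 2.85 are hypothetical placeholders.
BREAKPOINTS = (1.05, 1.15, 2.85, 2.95)

def spread(mean, edge_fraction=1/3):
    """Placeholder SD rule: the curve narrows as the mean nears 0 or 4."""
    return min(mean, 4.0 - mean) * edge_fraction

def simulate_series(k_home, k_away, runs=10_000, seed=None):
    """Share of runs in which the home team earns 0, 1, 2, 3 or 4 points."""
    rng = random.Random(seed)
    mean = 4.0 * k_home / (k_home + k_away)  # expected home points from KRACH
    sd = spread(mean)
    tally = Counter()
    for _ in range(runs):
        draw = rng.gauss(mean, sd)
        tally[bisect_right(BREAKPOINTS, draw)] += 1  # 0 = home swept ... 4 = home sweep
    return {pts: tally[pts] / runs for pts in range(5)}

# UAH @ LSSU, KRACH values from the post (LSSU is the home team)
print(simulate_series(43.97, 48.19, seed=1))
```

With these placeholder settings a Tech-strength favorite sweeps essentially every run, while a near-even series clusters on the split. A larger `edge_fraction` pushes more near-even results out toward the sweeps, at the cost of letting heavy favorites drop the occasional game.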
But again, that’s why they play the games.
If you simulate all 22 remaining series 10,000 times (and I have), you get the results below.
It’s a shame that Alaska is ineligible, because three teams fighting for one spot would be far more desperate than three fighting for two.
My intent for the next couple of weeks is to come up with a table that shows how many times a given order, say the one above, comes up. If I were doing this with a database instead of an Excel spreadsheet, this would be simpler, but I ran out of time to do anything else.
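A table like that doesn’t strictly need a database; a counter keyed on the finishing order does the job. Here’s a rough Python sketch under several assumptions of mine: three hypothetical teams with made-up points and ratings, the same normal-plus-breakpoints draw for each weekend (placeholder cutoffs and spread rule, not the model’s actual settings), and ties in the standings broken alphabetically rather than by real WCHA tiebreakers.

```python
import random
from bisect import bisect_right
from collections import Counter

# Placeholder cutoffs on the 0-4 home-points scale (hypothetical values)
BREAKPOINTS = (1.05, 1.15, 2.85, 2.95)

def weekend_home_points(rng, k_home, k_away):
    """One simulated weekend: home points drawn from a normal curve."""
    mean = 4.0 * k_home / (k_home + k_away)
    sd = min(mean, 4.0 - mean) / 3  # placeholder spread rule
    return bisect_right(BREAKPOINTS, rng.gauss(mean, sd))

def order_counts(points, ratings, schedule, runs=10_000, seed=None):
    """How often each finishing order shows up across simulated seasons."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(runs):
        pts = dict(points)
        for home, away in schedule:
            hp = weekend_home_points(rng, ratings[home], ratings[away])
            pts[home] += hp
            pts[away] += 4 - hp  # every weekend hands out 4 points in total
        # Placeholder tiebreak: alphabetical, not the WCHA's actual rules
        order = tuple(sorted(pts, key=lambda t: (-pts[t], t)))
        counts[order] += 1
    return counts

# Hypothetical standings, ratings and remaining schedule
points = {"A": 30, "B": 20, "C": 19}
ratings = {"A": 100.0, "B": 100.0, "C": 100.0}
schedule = [("B", "C"), ("C", "A")]
counts = order_counts(points, ratings, schedule, seed=1)
for order, n in counts.most_common(3):
    print(" > ".join(order), n)
```

Swapping in the real standings, KRACH ratings, and the 22 remaining series would give the “how many times does this order come up” table directly from `counts`.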