My summer project is to learn Python so I can put the prediction information out on a Web app. The programming I want to do also lends itself to a scripting language rather than to Excel as a brute-force Monte Carlo simulator. I’m going to simulate all 140 games and their results, and that would be one huge, unwieldy spreadsheet that would be a bear to create. Using Python and an SQL database is a saner approach.
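To make that concrete, here’s roughly the kind of table I have in mind for storing simulated results — a sketch only, using `sqlite3` from the standard library, with a table name, column names, and the one inserted row all invented for illustration:

```python
import sqlite3

# In-memory database for the sketch; the real thing would live on disk.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sim_games (
        sim_id     INTEGER,   -- which Monte Carlo run this game belongs to
        game_date  TEXT,
        home       TEXT,
        away       TEXT,
        home_goals INTEGER,
        away_goals INTEGER
    )""")

# One hypothetical simulated result.
conn.execute("INSERT INTO sim_games VALUES (?, ?, ?, ?, ?, ?)",
             (1, "2010-10-15", "UAH", "Bemidji State", 3, 2))
row = conn.execute("SELECT COUNT(*) FROM sim_games").fetchone()
```

Queries like “how often did UAH sweep at home?” then become one `SELECT` instead of a spreadsheet safari.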
I’ve been thinking about what makes sense for a predictive model. Here are my thoughts — feel free to chime in.
- Even though KRACH is reactive rather than proactive, I think that it’s a reasonable predictor of future performance — worth 60% or so of the model’s weight. KRACH gives you a strong indicator of team quality.
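For anyone who hasn’t seen it spelled out: KRACH is a Bradley-Terry rating, and you can compute it with a simple fixed-point iteration — each team’s rating is its wins divided by a strength-of-schedule denominator. A minimal sketch with made-up teams; ties (half a win each) are left out for brevity:

```python
from collections import defaultdict

def krach(games, iterations=1000):
    """games: list of (winner, loser) tuples.
    Returns a dict of team -> Bradley-Terry rating."""
    wins = defaultdict(float)     # V_i: total wins for team i
    played = defaultdict(float)   # n_ij: games played between i and j
    teams = set()
    for w, l in games:
        wins[w] += 1.0
        played[(w, l)] += 1.0
        played[(l, w)] += 1.0
        teams.update((w, l))
    ratings = {t: 1.0 for t in teams}
    for _ in range(iterations):
        new = {}
        for i in teams:
            denom = sum(played[(i, j)] / (ratings[i] + ratings[j])
                        for j in teams if j != i)
            # floor keeps a winless team from zeroing out denominators
            new[i] = max(wins[i] / denom, 1e-9)
        ratings = new
    return ratings
```

The ratings are only defined up to a scale factor, so what matters is the ratios between teams, not the absolute numbers.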
- I think that KRACH suffers because it uses win-loss data instead of goals for/goals allowed data. With the WCHA having a logjam in the middle of the rankings, I think that GF/GA would’ve been a good discriminator. Let’s consider two schools: Alaska and Michigan Tech. Late in the season, the Nanooks went into Houghton and blitzed the Huskies. KRACH saw those simply as two road wins; a goal-differential ranking system would see 7-3, 7-2 as an indicator that the Nanooks were demonstrably better rather than just clearly better, and perhaps their home sweep of Ferris State the next weekend wouldn’t have been as big a surprise.
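One hedged way to feed goals into the same paired-comparison machinery: treat each goal as a “win” over the opponent, so 7-3 and 7-2 contribute far more evidence than two bare road wins. A sketch, with the box-score tuple format invented for illustration:

```python
def goal_comparisons(box_scores):
    """box_scores: list of (team_a, team_b, goals_a, goals_b).
    Returns (winner, loser)-style tuples, one per goal, suitable
    as input to the same Bradley-Terry solver used for win-loss."""
    pairs = []
    for a, b, ga, gb in box_scores:
        pairs.extend([(a, b)] * ga)   # each goal for A counts as a "win" over B
        pairs.extend([(b, a)] * gb)
    return pairs
```

This is one option among several — margin-of-victory caps or diminishing returns on blowout goals would be reasonable refinements to keep a 10-1 laugher from distorting things.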
- There’s something else to consider: recency. Let’s consider Alaska and Lake Superior. The Nanooks were 14-12-2 for the season, but they started off 3-7-0. Did they face a tough WCHA schedule in that stretch? They played Northern (home split), Lake (away split), Ferris (swept away), Tech (swept at home), and Anchorage (away split). They caught each of those teams when those opponents were playing better than they did over the rest of the season (save Ferris, who was hot all year). Given how the Nanooks finished the regular season (5-1-0), you can argue that recency matters. (And yes, Anchorage, you beat them three-of-four to end their season.)
- A recency view also makes sense when you look at Lake Superior. They started 4-1-0 and were fairly highly ranked, but they finished out of the playoffs, getting swept by the top two teams in the league over the last two weekends. You can also look at January, when the Lakers went 2-6-0. The wheels eventually fell off in the Sault, and recency matters. There’s a goal-differential argument here as well: yes, the Lakers beat the Chargers on the road, but both games were one-goal margins.
- One more argument for recency and goal-differential: UAH clearly improved as the season went on, as the blowouts were fewer and farther between. The Bemidji road win seemed impossible at Christmas, but it was merely improbable by February.
I think that I’ll end up with a not-so-secret sauce for the model: won-loss KRACH, a paired-comparisons goal-differential system, and recency (probably the last 10 games).
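As a sketch of how the sauce might come together: the weights below are just the 60%-ish starting guess from above plus placeholders for the other two components, and the weighted geometric mean is one defensible choice among several for combining multiplicative ratings (an arithmetic mean would also be arguable):

```python
def blended_strength(krach_rating, gd_rating, last10_rating,
                     w_krach=0.6, w_gd=0.25, w_recent=0.15):
    """Combine three component ratings into one strength number.
    Weights should sum to 1; the values here are placeholders."""
    return (krach_rating ** w_krach *
            gd_rating ** w_gd *
            last10_rating ** w_recent)
```

The nice property: if all three components agree, the blend reproduces them exactly, and disagreement between components is where the weights actually bite.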
If you’re thinking this through, though, you’re anticipating my next step: each predictive run will predict each game, with the results driving subsequent predictions. You’ll play a night’s games, re-calculate the model, then play the next slate of games. It’s an iterative approach, and it’s possible that one team will go off on a tear in any individual simulation, just as would happen in real life. Running a Monte Carlo simulation will even that out, but you’ll still be able to see how many times a team streaked or slumped, because we’ll have each individual simulation.
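The night-by-night loop might look something like this. The Bradley-Terry win probability R_home/(R_home + R_away) is the standard one (home ice ignored here), and the re-fit after each night is stubbed out as a small rating nudge, since the real version would re-run the whole model on the augmented results:

```python
import random

def simulate_season(schedule, ratings, n_sims=1000, seed=0):
    """schedule: ordered list of nights, each a list of (home, away) pairs.
    ratings: dict of team -> strength.
    Returns one wins dict per simulation, so streaks stay inspectable."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        r = dict(ratings)                 # fresh ratings copy per simulation
        wins = {t: 0 for t in r}
        for night in schedule:
            for home, away in night:
                p_home = r[home] / (r[home] + r[away])
                winner = home if rng.random() < p_home else away
                wins[winner] += 1
                # Placeholder for re-fitting the model after each night;
                # here the winner just gets a small nudge.
                r[winner] *= 1.02
        results.append(wins)
    return results
```

Because every simulation’s results are kept (and, in the real version, written to the database), counting how often a team streaked or slumped is just a query over the stored runs.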
I’ll also be able to adjust the weights of the model as the season goes by, since I’ll be able to compare the model to reality. The recency weights in the model should temper my urge to tweak the other two parameters’ weights very far.