I know. I know. You’ve already heard me complain about this. I hate it when vendors sell digested student data back to a campus and call it “predictive analytics.” But they’re not stopping, so neither am I. Predictive analytics is a name we should reserve for actually predicting something.
Let’s talk about a real prediction. I can predict the collective motion of the Earth, Moon, and Sun with enough accuracy to tell you where to watch the total eclipse on August 21st. Doing this requires two things: information about the solar system today (the positions, velocities, and masses of the relevant bodies), and an accurate understanding of the rules that govern planetary motion. Newton’s law of gravity is imperfect but quite adequate for this purpose. Understanding the dynamics – the rules that govern change – is what allows us to make a real, scientific prediction.
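If you want a feel for how this kind of prediction works, here’s a toy sketch – mine, not anything ephemeris-grade – of the idea: take today’s state (position and velocity), apply Newton’s law of gravity step by step, and the future falls out. The gravitational constants are real; the single-body circular orbit and the integrator settings are deliberate simplifications for illustration.

```python
import math

# G * M_sun in m^3/s^2, and one astronomical unit in metres -- real constants
GM_SUN = 1.32712440018e20
AU = 1.495978707e11

def simulate_orbit(x, y, vx, vy, dt, steps):
    """Leapfrog-integrate a test body around a Sun fixed at the origin."""
    # initial half-step kick to offset velocity from position (leapfrog scheme)
    r = math.hypot(x, y)
    ax, ay = -GM_SUN * x / r**3, -GM_SUN * y / r**3
    vx += 0.5 * dt * ax
    vy += 0.5 * dt * ay
    for _ in range(steps):
        x += dt * vx            # drift: advance position
        y += dt * vy
        r = math.hypot(x, y)
        ax, ay = -GM_SUN * x / r**3, -GM_SUN * y / r**3
        vx += dt * ax           # kick: advance velocity
        vy += dt * ay
    # undo the extra half kick so position and velocity line up again
    vx -= 0.5 * dt * ax
    vy -= 0.5 * dt * ay
    return x, y, vx, vy

# Start a body on a circular orbit at 1 AU
v_circ = math.sqrt(GM_SUN / AU)
period = 2 * math.pi * AU / v_circ     # one "year" for this orbit (~3.16e7 s)
dt = 600.0                             # 10-minute timesteps
steps = round(period / dt)
x, y, vx, vy = simulate_orbit(AU, 0.0, 0.0, v_circ, dt, steps)

# A real prediction is falsifiable: after one period the body must be back
error = math.hypot(x - AU, y) / AU
print(f"relative position error after one orbit: {error:.2e}")
```

The point of the check at the end is the point of the whole post: state plus dynamics gives you a prediction you can test, and this one passes.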
I’m really confident that this eclipse will happen when and where we say it will. How confident? I will bet you literally anything in my control that this will happen – my car, my house, my life savings, even my cat. Really. And I’m prepared to settle up on August 22nd.
Are there any vendors out there prepared to make a bet like this on their ‘predictive analytics’? Ok then.
So what are these vendors doing? First, they start with the institution’s data. Actually, this is the hardest part – wrangling the data so it’s easy to access. Then they feed this data into quite straightforward pattern-recognition algorithms and sell what they find back to the institution. Seriously – this is not rocket science. At best, they’re helping an institution learn from experience. But learning from experience doesn’t sound very magical, does it? So instead, they talk about ‘predictive analytics’ and focus on labeling current students – calling some ‘at risk’ – as if something practically magical were going on.
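To be concrete about how unmagical this is, here’s a toy sketch of the kind of lookup involved – invented labels, invented records, not any vendor’s actual product: the ‘risk score’ for a student is just the failure rate among past students who carried matching labels.

```python
# Hypothetical past records: (demographic bucket, prior-GPA band, course, passed?)
# Every label and record here is invented for illustration.
past = [
    ("A", "low",  "CHEM101", False),
    ("A", "low",  "CHEM101", False),
    ("A", "low",  "CHEM101", True),
    ("B", "high", "CHEM101", True),
    ("B", "high", "CHEM101", True),
]

def risk_score(records, demo, gpa_band, course):
    """Fraction of matching past students who failed -- a lookup, not a forecast."""
    matches = [passed for d, g, c, passed in records
               if (d, g, c) == (demo, gpa_band, course)]
    if not matches:
        return None  # no past experience to report
    return 1 - sum(matches) / len(matches)

score = risk_score(past, "A", "low", "CHEM101")
print(f"'at risk' score: {score:.2f}")  # 2 of 3 similarly-labeled students failed
```

Real products dress this up with fancier statistics, but the substance is the same: a summary of what happened to similarly labeled students in the past.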
What does an ‘at risk’ label for a student mean? It’s simple. When past students with similar demographic and prior-performance labels entered an environment with similar labels, some received low or failing grades. That’s all so-called predictive analytics tells us. It might be relevant for predicting the future, but no one really knows, and no one proposes testing these algorithms against what actually happens next.
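For what it’s worth, such a test is perfectly possible. Here’s a toy calibration check with invented numbers: bucket students by their predicted risk, wait a term, and compare the predicted failure rate to the rate that actually occurred.

```python
# Hypothetical data: predicted failure risk per student, and whether each
# student actually failed in a *later* term. All numbers are invented.
predictions = [0.9, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1]
actually_failed = [True, True, True, True, False,
                   False, False, False, False, True]

def observed_rate(preds, failed, lo, hi):
    """Observed failure rate among students whose predicted risk is in [lo, hi)."""
    bucket = [f for p, f in zip(preds, failed) if lo <= p < hi]
    return sum(bucket) / len(bucket)

high_risk_rate = observed_rate(predictions, actually_failed, 0.8, 1.0)
low_risk_rate = observed_rate(predictions, actually_failed, 0.0, 0.2)
print(f"students flagged ~90%: {high_risk_rate:.0%} actually failed")
print(f"students flagged ~10%: {low_risk_rate:.0%} actually failed")
```

If vendors published checks like this on future cohorts, term after term, we could start taking the word ‘predictive’ seriously. They don’t.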
Analytics companies running algorithms on data from the past have no well-founded guidance about what’s possible in the future. A student carrying the same demographic and prior-performance labels might be completely different from those who’ve taken the class before in some essential but invisible way. The course labeled Chemistry 101 might be taught in a completely different way this term. The vendors doing the predictions know neither the current student nor the current situation beyond the labels they have. And here’s the thing – they’ve got no understanding of the dynamics of education, no way to know what will happen if something changes.
What’s the real magic of the language of ‘predictive analytics’ for a vendor? I think the answer is clear. With a prediction like this, you can literally never be wrong. If the algorithm predicts a student has a 90% chance of failing and they don’t, no problem. Even if it predicts a group has a 90% chance of failing and all of them pass, no problem. Why? Because there are a million perfectly valid excuses for why such a ‘prediction’ missed. These models don’t predict the future – they only describe the past.
To truly predict the future, you have to understand how a system works and know in what ways it can – and can’t – change. I can bet my life on this summer’s eclipse. That’s a prediction. They can tell you some students like this failed in the past. That’s a report about the past. While experience like that might be relevant for what happens in the future, it’s just a story about the past. Not a real, reliable, testable prediction.