Colleges in Dire Financial Straits

Started by Hibush, May 17, 2019, 05:35:11 PM


Puget

One more thing, since bad data analysis with bad consequences for self-serving purposes just makes me so mad that I'm thinking about this instead of my actual work. Warning, this is pretty stats-y.

As far as I can tell, there is no empirical basis whatsoever for this model. Something is only a model if it predicts something. In a real model, the variables are chosen based on prior evidence that they are predictive, and their weights are determined empirically from that prior data (if you're familiar with regression, think of these as the beta weights, though it's more complicated than that: good models also try to avoid over-fitting prior data, which amounts to fitting noise).

According to the blog post and data sheet, they didn't do any of this. They literally just did some arithmetic on (presumably z-scored) variables, like so:
Value = (Credential * Experience * Education) / Tuition
Vulnerability = some combination of Endowment per Student and % International Students

Why are the "value" indicators multiplied rather than added? Why are they given equal weight? Each of those variables is in turn an unweighted composite of several other variables-- again, what is the basis for equal weights? Why use tuition (again, this is sticker price, not actual cost) as the denominator? Why is it a denominator and not just subtracted? Same questions for "vulnerability". I'm willing to bet several million quintaloons or whatever the fake fora currency was called that he has no good answers to those questions.
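To make concrete just how little machinery there is, here is roughly the entire "model" in a few lines (a sketch based on the blog post's description-- the column names and toy numbers are invented, and the z-scoring is my reading of the post):

```python
# A sketch of the blog post's entire "model", assuming z-scored inputs.
# Column names and numbers are hypothetical. Note that nothing is fitted:
# every weight is implicitly 1 and the functional form is simply asserted.
import pandas as pd

def zscore(s: pd.Series) -> pd.Series:
    return (s - s.mean()) / s.std()

df = pd.DataFrame({              # toy numbers, not real institutions
    "credential": [55, 70, 40],
    "experience": [0.6, 0.8, 0.5],
    "education":  [0.3, 0.5, 0.2],
    "tuition":    [30_000, 55_000, 12_000],
})
z = df.apply(zscore)

# "Value": three indicators multiplied (why multiplied?) over tuition (why a
# denominator?). Dividing by a z-score can even flip signs or blow up near 0.
df["value"] = (z["credential"] * z["experience"] * z["education"]) / z["tuition"]
print(df)
```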

In contrast, a real model would look something like:
Risk of failure (and you have to define what that means) = b1*variable1 + b2*variable2 + b3*variable3 + ... + bn*variablen + error
Where the b's are empirically determined weights (negative or positive) estimated from past data. Some of the variables in this data set may be highly predictive, others may not be-- you need to find out based on real data. You probably also need interactions among variables (e.g., the other variables are likely going to have very different effects for public vs. private institutions). For example, this is how good election forecast models like 538's work.
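Here's a minimal sketch of what "empirically determined weights" means in practice-- purely synthetic data and hypothetical predictor names, just to show that the b's (and their standard errors) come out of a fit rather than being assumed:

```python
# Minimal sketch with synthetic data: weights are estimated from past
# outcomes, not asserted. Predictor names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "endow_per_student": rng.lognormal(mean=10, sigma=1, size=n),
    "enroll_trend": rng.normal(0, 0.05, n),   # e.g., 5-year enrollment change
    "public": rng.integers(0, 2, n),          # public vs. private
})
# Synthetic "closed" outcome so the example runs end to end.
eta = 2.0 - 0.5 * np.log(df["endow_per_student"]) - 10 * df["enroll_trend"] - df["public"]
df["closed"] = rng.binomial(1, 1 / (1 + np.exp(-eta)))

# Logistic regression; "enroll_trend * public" expands to both main effects
# plus their interaction (public vs. private behaving differently).
fit = smf.logit("closed ~ np.log(endow_per_student) + enroll_trend * public", data=df).fit()
print(fit.summary())  # estimated b's with standard errors, not assumed weights
```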

Now, one could argue that there simply isn't enough relevant past data, since there are so many unknowns with COVID. But the answer to that is either don't make a model, or make one based on the best available past data while incorporating a high degree of uncertainty (that is, you'll end up with wide confidence intervals around your estimates).

If you were really quibbling, you could defend him by saying he never actually claims it's a model to forecast future events. However, the very names of the quadrants and their descriptions use explicitly forecast-y language (e.g., "perish" certainly implies he's predicting these institutions will go under). This is not a model, it's not science, and it's downright irresponsible. But then he is in marketing, so I guess that's all par for the course. I just wish the media hadn't run with it with so little critical thinking.

(Tearing this apart and trying to do it better would make a GREAT project for a graduate statistics course though).
"Never get separated from your lunch. Never get separated from your friends. Never climb up anything you can't climb down."
–Best Colorado Peak Hikes

Hibush

Quote from: Puget on July 29, 2020, 08:50:54 AM
Because I'm a big stats nerd, I took another look at the comments on Galloway's blog post [. . . ] I can only conclude that he either is super incompetent or just doesn't care because the point was to get media attention and reach his pre-conceived conclusions (done and done).

Getting media attention is the goal of marketers, at least the intermediate goal, much like publishing a paper in a high-profile journal is a goal of researchers.

A marketing guy like Galloway should be going for media attention. As we all recognize, veracity is not one of the requirements, and indeed may be a strong impediment to media attention.

Therefore, I think your second point is likely the driving factor. It does not preclude the first also being true, at least in the sense of his being incompetent at informative data analysis while being hypercompetent at marketing.

secundem_artem

Quote from: spork on July 29, 2020, 10:23:35 AM
Quote from: Puget on July 29, 2020, 08:50:54 AM
Because I'm a big stats nerd, I took another look at the comments on Galloway's blog post (https://www.profgalloway.com/uss-university)-- people have done a pretty good job of peer review there, pointing out numerous analysis problems including:

1. Restriction of range: he only includes nationally ranked institutions, which are actually the least likely to "perish".

2. He then uses median splits to assign this restricted range of institutions to categories, so definitionally 25% are going to end up in each category. That is, there is no objective cut off for "perish", "struggle" etc. -- it's just top or bottom half of his restricted range.

3. Some of the input variables are garbage: e.g., he includes search traffic as a "reputation" variable, which (a) doesn't distinguish between good and bad reasons for searching, (b) weights heavily toward big institutions (he doesn't weight by institution size), and (c) weights heavily toward places with big sports teams. Other data are out of date or downright wrong (mis-entered, or in some cases the wrong institution with a similar name).

4. Uses sticker price rather than actual cost of attendance for ROI calculations (this just makes no sense at all).

5. Public tuition rates take the simple average of in-state and out-of-state tuition, rather than being weighted by % in-state and out-of-state students (again, this makes no sense).

6. Uses endowment rather than more complete metrics of financial health, punishing institutions that rely less on endowments.

7. Is almost exclusively focused on undergrad metrics. This does not accurately characterize research universities.

8. The idea that large public universities will be allowed to "perish" is just out of touch with reality. A lot of the other labels also just don't pass face validity-- top-ranked SLACs with huge endowments aren't going anywhere either. In both cases their metrics are also likely being seriously distorted by all of the above. If the results of your model aren't face-valid, that's a strong indicator you need to check your data and code.

Really, given the extent and obviousness of these problems (I do stats, but not on these types of data, and the problems were glaringly obvious to me, and evidently to lots of others who commented), I can only conclude that he either is super incompetent or just doesn't care because the point was to get media attention and reach his pre-conceived conclusions (done and done).

Absolutely love this.

I pretty much figure that Galloway is trying to take Clay Christensen's place as higher ed's most revered oracle. I did not have much faith in what Christensen usually said, and I have even less in Galloway.

If Ambrose Bierce were still alive and working on The Devil's Dictionary, his definition for Blogging and/or Tweeting would be something like "screaming nonsense into the ether without benefit of an editor".
Funeral by funeral, the academy advances

spork

Quote from: Puget on July 29, 2020, 11:15:38 AM

[. . . ]

As far as I can tell, there is no empirical basis whatsoever for this model. Something is only a model if it predicts something. In a real model, the variables are chosen based on prior evidence that they are predictive

[. . . ]


Curious what your take is on this (relatively non-statsy) model from a year ago.

Quote

(Tearing this apart and trying to do it better would make a GREAT project for a graduate statistics course though).

I can see it as a project for an undergraduate course on risk and forecasting. If such courses exist at the undergraduate level. They should.
It's terrible writing, used to obfuscate the fact that the authors actually have nothing to say.

mamselle

Quote

If Ambrose Bierce were still alive and working on The Devil's Dictionary, his definition for Blogging and/or Tweeting would be something like "screaming nonsense into the ether without benefit of an editor".

This is going over to the bumper sticker thread....

M.
Forsake the foolish, and live; and go in the way of understanding.

Reprove not a scorner, lest they hate thee: rebuke the wise, and they will love thee.

Give instruction to the wise, and they will be yet wiser: teach the just, and they will increase in learning.

Puget

Quote from: spork on July 29, 2020, 01:07:23 PM
Quote from: Puget on July 29, 2020, 11:15:38 AM

[. . . ]

As far as I can tell, there is no empirical basis whatsoever for this model. Something is only a model if it predicts something. In a real model, the variables are chosen based on prior evidence that they are predictive

[. . . ]


Curious what your take is on this (relatively non-statsy) model from a year ago.

Quote

(Tearing this apart and trying to do it better would make a GREAT project for a graduate statistics course though).

I can see it as a project for an undergraduate course on risk and forecasting. If such courses exist at the undergraduate level. They should.

Took a really quick look, clicking through to the original report (which seems to be an internal white paper, not a peer-reviewed publication). It's better in that they had actual data comparing institutions that closed to similar institutions that didn't. However, it is pretty much purely descriptive-- there is no actual model identifying how much each risk factor contributes in a way that would let you predict new closings out of sample. Most of the conclusions seem pretty obvious (tiny institutions with no endowment are more likely to close-- gee golly, you don't say!).

Someone with the time and skills could almost certainly build such a model. It would take time to put together an appropriate data set (i.e., NOT what Galloway is using), but I wouldn't think it should be fundamentally different from modeling in other domains-- e.g., presidential elections are also complex and not that frequent, but people model those, albeit with mixed success. The reasons models fail there would also apply here and would need to be avoided: reliance on too few variables, over-confidence (not building in enough error variance), over-fitting past data (which contributes to the over-confidence), and not recognizing the importance of interactions between predictors. In fact, college closings may be easier to model, because most of the predictors should be measured with less error than election polling (e.g., enrollment should be a pretty error-free measurement). Great dissertation project.
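To sketch what "predict out-of-sample" would look like-- synthetic data and hypothetical predictors again, but it shows the shape of the exercise: regularize to guard against over-fitting, then score held-out institutions the model has never seen:

```python
# Sketch of out-of-sample validation on synthetic data. The point is the
# workflow (regularize, hold out, check calibration), not the numbers.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, brier_score_loss

rng = np.random.default_rng(1)
n, p = 800, 6
X = rng.normal(size=(n, p))                # hypothetical institutional predictors
true_b = np.array([1.2, -0.8, 0.0, 0.0, 0.5, 0.0])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ true_b - 2.5))))  # synthetic closures

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Cross-validation picks the penalty strength, guarding against the
# over-fitting (and resulting over-confidence) described above.
clf = LogisticRegressionCV(Cs=10, cv=5, scoring="neg_log_loss", max_iter=5000)
clf.fit(X_tr, y_tr)

prob = clf.predict_proba(X_te)[:, 1]       # predicted closure risk, unseen data
print("held-out AUC:", roc_auc_score(y_te, prob))
print("held-out Brier score:", brier_score_loss(y_te, prob))
```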

Maybe someone HAS done all this and I just haven't heard about it, but I think we would have heard about it here?
"Never get separated from your lunch. Never get separated from your friends. Never climb up anything you can't climb down."
–Best Colorado Peak Hikes

spork

Well, there is Edmit, but its plan to publicly release its findings was quashed:

https://www.insidehighered.com/news/2019/11/19/private-colleges-convinced-company-scuttle-release-list-projected-college-closures.

Another article about Edmit, regarding Covid-19 effects:

https://www.bostonglobe.com/2020/05/08/metro/amid-pandemic-growing-list-colleges-financial-peril/.

As far as I know, the specifics of Edmit's model (variable weights, etc.) haven't been made public, but all data used in the model seems to be publicly available.
It's terrible writing, used to obfuscate the fact that the authors actually have nothing to say.

polly_mer

Edmit is the closest thing to the kind of real model Puget describes.

https://www.educationdive.com/news/how-many-colleges-and-universities-have-closed-since-2016/539379/ is collecting data.

There's another widely circulated, not-quite-really-a-model white paper from within the past five years that I can't pull up right now.

One thing that's common in my current work is deciding what to do when you can't build a predictive model because you can't get the data, and yet decisions have to be made. Looking for the major factors and then trying to set danger points, along with a solid margin for error, tends to be typical. However, Galloway's assertions fail on that type of analysis as well, since he ignored the institutions that are truly at risk.
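In code, that kind of screen is nothing fancier than this (thresholds and variable names invented purely for illustration; the real ones would come from domain knowledge):

```python
# Toy danger-point screen: no fitted model, just conservative thresholds on a
# few major factors plus an explicit margin for error. All values hypothetical.
import pandas as pd

DANGER = {
    "endow_per_student": 25_000,   # flag if below this
    "enroll_change_5yr": -0.10,    # flag if enrollment fell more than 10%
    "tuition_dependence": 0.85,    # flag if over 85% of revenue is tuition
}
MARGIN = 0.20  # widen each threshold by 20% so borderline cases get flagged too

def at_risk(row: pd.Series) -> bool:
    flags = [
        row["endow_per_student"] < DANGER["endow_per_student"] * (1 + MARGIN),
        row["enroll_change_5yr"] < DANGER["enroll_change_5yr"] * (1 - MARGIN),
        row["tuition_dependence"] > DANGER["tuition_dependence"] * (1 - MARGIN),
    ]
    return sum(flags) >= 2  # require multiple major factors, not one bad number

df = pd.DataFrame({                # toy institutions
    "endow_per_student": [12_000, 400_000, 28_000],
    "enroll_change_5yr": [-0.15, 0.02, -0.05],
    "tuition_dependence": [0.92, 0.40, 0.80],
})
df["at_risk"] = df.apply(at_risk, axis=1)
print(df)
```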
Quote from: hmaria1609 on June 27, 2019, 07:07:43 PM
Do whatever you want--I'm just the background dancer in your show!

Diogenes

Quote from: Puget on July 29, 2020, 11:15:38 AM

If you were really quibbling, you could defend him by saying he never actually claims it's a model to forecast future events.

But even when he ultimately does say that, it should be noted that he is using predictive language in his claims: certain colleges WILL perish.

The best part is how he talks in one video about how he gets paid waayyy too much to be a professor. If his stats are that bad, I concur with his analysis on at least one claim.

Puget

Quote from: Diogenes on July 30, 2020, 04:06:12 PM
Quote from: Puget on July 29, 2020, 11:15:38 AM

If you were really quibbling, you could defend him by saying he never actually claims it's a model to forecast future events.

But even when he ultimately does say that, it should be noted that he is using predictive language in his claims: certain colleges WILL perish.

The best part is how he talks in one video about how he gets paid waayyy too much to be a professor. If his stats are that bad, I concur with his analysis on at least one claim.

Right, unless I'm misunderstanding your point here, that's what I said:
Quote

However, the very names of the quadrants and their descriptions use explicitly forecast-y language (e.g., "perish" certainly implies he's predicting these institutions will go under).


With the latter I agree-- the disparity between business school and A&S salaries is quite something, and I've seen no evidence it's warranted, certainly not in his case.

Quote from: spork on July 29, 2020, 03:56:53 PM

As far as I know, the specifics of Edmit's model (variable weights, etc.) haven't been made public, but all data used in the model seems to be publicly available.

I wouldn't necessarily expect them to release the code-- they are, after all, in this to sell it. I would at least expect to see a detailed description of the methodology, including the variables and the statistical approach. Again, I think 538 is a good example of people doing it right within the constraints of not giving away their work entirely-- they don't share the code, but they provide very detailed documentation of their methodology (much more than the average reader wants or can understand) and are transparent with the data (much of it downloadable). Of course, an academic effort could and should release both the code and the data.
"Never get separated from your lunch. Never get separated from your friends. Never climb up anything you can't climb down."
–Best Colorado Peak Hikes

pgher

On a morning show today, there was a university president from a university on Galloway's "perish" list. He said it was garbage in, garbage out. Apparently his university has grown a lot with online students, who weren't counted at all by Galloway, and then it got dinged for being a predominantly residential campus.

Wahoo Redux

Come, fill the Cup, and in the fire of Spring
Your Winter-garment of Repentance fling:
The Bird of Time has but a little way
To flutter--and the Bird is on the Wing.

Puget

Quote from: pgher on July 31, 2020, 05:28:21 PM
On a morning show today, there was a university president from a university on Galloway's "perish" list. He said it was garbage in, garbage out. Apparently his university has grown a lot with online students, who weren't counted at all by Galloway, and then it got dinged for being a predominantly residential campus.

I think we need to add "garbage in the middle"-- even if the input data were not garbage the results would be garbage because there's no actual model in the middle, just some unsupported arithmetic.
"Never get separated from your lunch. Never get separated from your friends. Never climb up anything you can't climb down."
–Best Colorado Peak Hikes

pgher

Quote from: Wahoo Redux on August 01, 2020, 08:44:48 PM
Forbes on Akron.

What a cluster****.

A very dear friend of mine lives in Ohio. Her son will be a freshman at Akron this fall. They visited and he fell in love with it, and I have to believe they've read the news. So I'm reluctant to say anything to them about this. My hope is that they have four good years left.