Psychometric Flops (March Mathness Follow-up)

Apr 18, 2025

by Assessments Department

The psychometric team is accustomed to placing high in the NCAA women's basketball tournament, but, as many psychometricians will tell you, we don't know what we don't know. The SoCal collapse hurt the psychometric brackets, and UConn's resurgence after a shaky early season also threw off our estimates, tilting the advantage away from our estimation methods and toward luck and intuition.

Here's a recap of the models used and their places:

· Rasch (#8 ARRT; #77698 ESPN): this is the scoring model we use in ARRT examinations. This model did well on all sides of the bracket until the SoCal collapse.

· 2PL (#11 ARRT; #92416 ESPN): this is an extension of the scoring model we use at ARRT. The model considers the possibility that high scorers and low scorers behave differently when presented with an item, or, in our case, a team. The model did well by calling chalk (i.e., picking the higher-ranked team) in most instances, so it didn't grab many upsets.

· Logistic regression (#15 ARRT; #128108 ESPN): this is part of the underlying mathematics of both Rasch and 2PL, but it is a much older development; you may recall doing these on your calculator in high school or college statistics classes. This model did well in the East and poorly in the West. We didn't have high expectations for this model because of its simplicity and its tendency toward bias in the model-building stage.
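For readers curious how the three models above relate, here is a minimal sketch of the textbook Rasch and 2PL response functions, both built on the logistic function. It is an illustration only, not the code behind our brackets: the function names, the example values, and the idea of reading `theta` as one team's strength and `b` as the opposing team's difficulty are all assumptions made for this example.

```python
import math


def logistic(x):
    """Standard logistic function, the shared core of all three models."""
    return 1.0 / (1.0 + math.exp(-x))


def rasch_prob(theta, b):
    """Rasch model: success probability depends only on ability minus difficulty."""
    return logistic(theta - b)


def two_pl_prob(theta, b, a):
    """2PL model: adds a discrimination parameter `a` that scales the gap.

    With a == 1 this reduces to the Rasch model.
    """
    return logistic(a * (theta - b))


if __name__ == "__main__":
    # Hypothetical values: a team one logit stronger than its opponent.
    theta, b = 1.0, 0.0
    print(round(rasch_prob(theta, b), 3))
    print(round(two_pl_prob(theta, b, a=2.0), 3))
```

With a discrimination above 1, the 2PL curve is steeper, so the same strength gap yields a more lopsided prediction. That is one plausible reason a 2PL-style bracket would lean toward chalk.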

All three psychometric models finished better than chance but not better than many participants. It seems that these models are still better left to testing than to calling perfect brackets. Maybe we psychometricians will wise up and make more complicated models, but, in the end, you don't know what you don't know.