AI: A Review in Assessment

May 24, 2024
by Exam Requirements and Psychometrics

While artificial intelligence (AI) has dominated much of the science and technology news, it has also captured the attention of psychometricians as a tool for everything from item writing to the measurement model. Ultimately, though, there are a few unique problems with AI as it stands.

For item writing, the paradigm for automatic item generation has yet to change to fully leverage the strengths of large language models (LLMs; one implementation of AI). Furthermore, item generation is a frequent source of AI hallucinations. Oh yeah, AI hallucinates: a hallucination happens when the program states, and even insists on, something factually incorrect, such as a person having three legs. Item generation has also only been successful on highly targeted examinations, such as grade school English Language Arts examinations in public education.

On the measurement side, AI has yet to outperform the current model, the Rasch model, that we use to calculate scores on our examinations at ARRT. Beyond performance, we can explain what the Rasch model does, why it works, and how; no one has achieved that kind of interpretability with AI. AI models can contain millions to billions of parameters, while the Rasch model for the Radiography examination would have only 200 parameters.
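Part of why the Rasch model is so explainable is that it boils down to one simple equation: the probability of a correct response depends only on the gap between a person's ability and an item's difficulty. A minimal sketch of that idea (the function and parameter names are illustrative, not ARRT's scoring implementation):

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the Rasch model:
    P = 1 / (1 + exp(-(ability - difficulty)))."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# When a person's ability exactly matches an item's difficulty,
# the model predicts a 50% chance of a correct answer.
print(rasch_probability(0.0, 0.0))  # 0.5

# A more able person has a better-than-even chance on the same item.
print(rasch_probability(1.0, 0.0) > 0.5)  # True
```

With one difficulty parameter per item, an exam like Radiography needs only a couple hundred numbers to describe, and each one has a direct, inspectable meaning.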

How about a little bit of optimism? These studies were all performed on older AI models (including LLMs), which is very exciting. Models are progressing so quickly that every piece of AI research in assessment almost certainly refers to something less sophisticated than the state of the art at the time of publication. So we can take reasonable comfort in knowing that, unless LLMs and AI in general are dead ends (they almost certainly aren't), we are getting closer to useful breakthroughs on a near-weekly basis.