Item Calibration: Keeping ARRT Exams Accurate

Jul 26, 2024
by Exam Requirements and Psychometrics

Medical imaging is constantly evolving as a profession, and our exams must adapt to keep pace. Item calibration is the process that adjusts our exams in response to those changes, improving how well we measure a candidate's knowledge. In this procedure, items receive updated difficulty values to reflect how the difficulty of a question within a discipline shifts over time. Calibration also places items from different exam forms in the same discipline onto a common scale, making candidate scores comparable across that discipline's forms over time.

To understand item calibration, consider the Rasch model, the psychometric model used for our exams. The Rasch model places a candidate's ability and an item's difficulty on the same scale, allowing comparisons between candidates of different ability levels who answer an item of a given difficulty. An intuitive tradeoff exists between ability and difficulty: a candidate whose ability exceeds an item's difficulty will likely answer that item correctly.
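To make that tradeoff concrete, here is a minimal sketch of the Rasch probability of a correct response in Python. The function name and the example numbers are our own illustration for this post, not part of any actual exam's scoring code.

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the Rasch model.

    Ability (theta) and difficulty (b) sit on the same logit scale,
    so only their difference matters: P = exp(theta - b) / (1 + exp(theta - b)).
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# A candidate whose ability exceeds the item's difficulty is more
# likely than not to answer correctly, and vice versa.
print(rasch_probability(ability=1.0, difficulty=0.0))  # ~0.73
print(rasch_probability(ability=0.0, difficulty=1.0))  # ~0.27
```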

Now imagine an item written in 1979 that covers film-screen radiography, and suppose the item is easy. If a knowledgeable candidate answered the item in 1979, the candidate would likely answer correctly, and the Rasch model would make an accurate prediction about the candidate's ability. If a knowledgeable candidate answered the same item in 2024, however, the candidate would likely answer incorrectly, and the model would make a poor prediction. Why? Because film-screen radiography is rarely practiced today, today's candidates receive far less exposure to the concept. The item's difficulty has therefore drifted and needs adjustment for accurate scoring.
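A quick numeric sketch shows what that drift looks like. Every number below is hypothetical, chosen only to illustrate the mismatch between a stored difficulty and current performance.

```python
import math

# Hypothetical values for illustration only.
theta = 1.0    # a knowledgeable candidate's ability, in logits
b_1979 = -1.5  # the film-screen item's stored difficulty: an easy item

# The stored calibration predicts this candidate almost always succeeds...
predicted = 1.0 / (1.0 + math.exp(-(theta - b_1979)))  # ~0.92

# ...but suppose only about 30% of comparable candidates now answer it
# correctly. The stored difficulty no longer fits the data: the item
# has drifted and its difficulty needs adjustment.
observed = 0.30
print(f"predicted {predicted:.2f} vs observed {observed:.2f}")
```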

To make that adjustment, psychometricians select anchor items that are tied to previous data and hold the scale constant over time. They then apply the Rasch model to new response data, evaluating how well the model fits and examining the difference between the old and new difficulty estimates for each item. The film-screen item, for example, would show a large difference because students are rarely taught film-screen procedures anymore. Items with large differences are flagged, excluded from the calibration that builds the exam's scale, and sometimes sent to subject matter experts for content review. With the flagged items removed, the Rasch model is re-applied to re-establish the scale, improving accuracy. Some flagged items, such as the film-screen item, then receive a new difficulty value. The method keeps items up to date and connects new candidates' scores to previous scores.
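The flagging step can be sketched as a simple comparison of old and new difficulty estimates. The item names, numbers, and the 0.5-logit cutoff below are all assumptions made up for this example; in practice, new difficulties come from fitting the Rasch model to fresh response data, and flagging criteria are set psychometrically.

```python
# Hypothetical old (anchored) and newly estimated difficulties, in logits.
old_difficulty = {"item_A": -0.2, "item_B": 0.8, "film_screen": -1.5}
new_difficulty = {"item_A": -0.1, "item_B": 0.7, "film_screen": 0.9}

FLAG_THRESHOLD = 0.5  # maximum tolerated shift in logits (an assumption)

# Flag any item whose difficulty has shifted more than the threshold.
flagged = {
    item for item in old_difficulty
    if abs(new_difficulty[item] - old_difficulty[item]) > FLAG_THRESHOLD
}
print(flagged)  # {'film_screen'}

# Stable items keep their role as anchors; flagged items are removed
# before the model is re-run, and may later receive updated difficulties
# after subject matter experts review their content.
anchors = {item: b for item, b in old_difficulty.items() if item not in flagged}
```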

Item calibration is a crucial step in keeping our exams accurate. The procedure ensures that each exam is fair: candidates with high ability earn scores that reflect that ability. Through this process, we can have confidence in a candidate's score and in the decisions based on that score.