Mind Your Ps and Rs

Jul 5, 2024

by Exam Requirements and Psychometrics

The psychometrics group has written extensively about p-values and R statistics. Most of the time, statistical decisions about newly tested pilot items are straightforward: the item either has a high R value and a mid-range p-value, or it does not. Today, I want to discuss what happens when these two measures of item quality provide conflicting information. We will cover what different patterns of statistics mean and solutions for dealing with each situation.

Recall that good p-values fall within an ideal range in the middle of the distribution, and good R statistics are anything above .2. A statistic is inadequate when it is too low, too high (we only flag p-values that are too high, but R statistics can run problematically high in certain cases), negative (this applies only to R values; p-values cannot be negative), or zero.
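For readers who like to see the arithmetic, here is a minimal sketch of how the two statistics could be computed from scored responses. It assumes the usual classical test theory definitions, which this post does not spell out: the p-value is the proportion of candidates who answered the item correctly, and the R statistic is the correlation between the 0/1 item score and the total exam score (a point-biserial). The function names and the sample data are made up for illustration and are not our production scoring code.

```python
from statistics import mean, pstdev

def item_p_value(item_scores):
    """Proportion of candidates who answered the item correctly.

    item_scores is a list of 0/1 values, one per candidate (assumed format).
    """
    return mean(item_scores)

def item_r(item_scores, total_scores):
    """Correlation between 0/1 item scores and total exam scores.

    With a 0/1 item score, the Pearson correlation is the point-biserial.
    """
    sd_item = pstdev(item_scores)
    sd_total = pstdev(total_scores)
    if sd_item == 0 or sd_total == 0:
        # Assumption: when everyone scores the same, the correlation is
        # undefined, so this sketch reports it as zero.
        return 0.0
    mean_item = mean(item_scores)
    mean_total = mean(total_scores)
    covariance = mean(
        (x - mean_item) * (y - mean_total)
        for x, y in zip(item_scores, total_scores)
    )
    return covariance / (sd_item * sd_total)

# Made-up data: six candidates, the three highest scorers got the item right.
item = [1, 1, 1, 0, 0, 0]
totals = [90, 85, 80, 60, 55, 50]
print(item_p_value(item))    # half of the candidates answered correctly
print(item_r(item, totals))  # strongly positive: high scorers got it right
```

Flipping the item scores in the example so that only the low scorers answer correctly would produce a strongly negative R with the same p-value, which is the kind of conflict discussed in the rest of this post.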

What do we learn when one statistic is good and another is bad?

· Pattern: The R is good, and the p-value is too low.

o Meaning: The item is predictive of performance on the exam, but it is still difficult even for the highest-performing candidates.

o Solution: We can delete the best distractor and replace it with an easier distractor.

o Rationale: The item is predicting performance on the test, but the candidates may be guessing. We have seen that removing the best distractor can raise the p-value above the guessing threshold.

· Pattern: The R is good, and the p-value is too high.

o Meaning: The item is predictive of performance on the examination but is so easy that even people who fail the exam do well on the item.

o Solution: Delete the item, or replace the distractors that examinees are not picking with ones that more examinees will choose.

o Rationale: The item is still predictive of the total test, but almost everyone is getting the item correct. This is an unusual pattern because of the way the R statistic is conceived. These types of items are usually more easily replaced than repaired.

· Pattern: The R is negative or zero, and the p-value is good.

o Meaning: There may be a miskey (the wrong response is identified as correct), which I have yet to see at ARRT. Alternatively, otherwise high-performing candidates may be finding something about the distractors more attractive than the correct answer.

o Solution: Delete the item or work with the subject matter experts to determine what is confusing high-performing candidates.

o Rationale: We assume that the bulk of the high-performing candidates are unlikely to all be wrong on a particular item. Instead, we infer that something is creating a gap between how the expert item writer and subject matter experts understand the item and how very-soon-to-be R.T.s understand the same item.
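The three patterns above can be sketched as a simple lookup. The R cutoff of .2 comes from this post; the p-value bounds (0.30 and 0.90) are placeholders I made up for the example, since the ideal range is not stated here, and the function name is hypothetical. Treat this as a rough illustration of the decision logic, not as the rule set we actually apply.

```python
R_MIN = 0.2    # from the post: good R statistics are anything above .2
P_LOW = 0.30   # hypothetical lower bound of the ideal p-value range
P_HIGH = 0.90  # hypothetical upper bound of the ideal p-value range

def review_item(p_value, r_value, p_low=P_LOW, p_high=P_HIGH, r_min=R_MIN):
    """Map an item's statistics to one of the patterns described above."""
    r_good = r_value > r_min
    p_good = p_low <= p_value <= p_high

    if r_good and p_good:
        return "no conflict: both statistics are acceptable"
    if r_good and p_value < p_low:
        return "good R, low p: replace the best distractor with an easier one"
    if r_good and p_value > p_high:
        return "good R, high p: delete the item or replace unchosen distractors"
    if r_value <= 0 and p_good:
        return ("negative or zero R, good p: check for a miskey, then delete "
                "the item or review it with subject matter experts")
    # Anything else (for example, a low positive R) is outside this post.
    return "pattern not covered here"

print(review_item(p_value=0.22, r_value=0.35))
print(review_item(p_value=0.97, r_value=0.25))
print(review_item(p_value=0.65, r_value=-0.10))
```

Keeping the cutoffs as named parameters makes it easy to swap in whatever ideal p-value range a given exam program uses.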

Of course, we have not even gotten into the weeds quite yet. Even stranger patterns can occur in the statistics and cause problems for an item. Look forward to learning more about strange statistics and some studies we are running to address them in a future blog post!