# Using the Right Tool for the Job: An Analysis of Item Selection Statistics for Criterion-referenced Tests

In test development, researchers often rely on item analysis to decide which items to retain on or add to an exam form. The conventional item analysis statistic is the point-biserial correlation, which was developed to select items that maximize the reliability indices (such as Cronbach's alpha) of norm-referenced tests. When an exam is built around norm-referenced scores, the point-biserial correlation works well as an item selection tool. However, it is also used in testing contexts where it may be less useful, specifically criterion-referenced tests. In criterion-referenced testing, the focus is typically not on examinee scores but on examinee classification, and reliability is assessed with decision consistency indices rather than the indices used for norm-referenced tests. As such, the point-biserial correlation may have less utility than other options for selecting items that maximize decision consistency. Researchers have developed several criterion-referenced item analysis statistics, namely the agreement statistic, the B-index, and the phi coefficient, whose utility for selecting items for criterion-referenced tests has yet to be fully evaluated. The purpose of this research was to evaluate each of these criterion-referenced item selection tools, along with the point-biserial correlation, to determine which one optimized decision consistency. Results suggested that the point-biserial correlation had equivalent or better utility for maximizing reliability in criterion-referenced tests when compared with the criterion-referenced item analysis statistics (the B-index, the phi coefficient, and the agreement statistic).
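
The abstract names the four item statistics without giving formulas, so the following is only a minimal sketch of how they are commonly defined in the criterion-referenced measurement literature, not the procedure used in this research. It assumes dichotomously scored (0/1) items and a mastery classification obtained by comparing each examinee's total score with a cut score. The function name `item_statistics` and the parameter `cut_score` are hypothetical, and the point-biserial shown is the uncorrected form in which the item is included in the total score.

```python
import numpy as np

def item_statistics(responses, cut_score):
    """Compute four item statistics for a 0/1 response matrix.

    responses: array of shape (n_examinees, n_items), entries 0 or 1.
    cut_score: total score at or above which an examinee is classified
               as a master (assumed classification rule).
    """
    responses = np.asarray(responses, dtype=float)
    total = responses.sum(axis=1)                 # total test score per examinee
    master = (total >= cut_score).astype(float)   # 1 = master, 0 = non-master

    results = []
    for j in range(responses.shape[1]):
        item = responses[:, j]
        results.append({
            # Point-biserial: Pearson correlation of item score with total score.
            "point_biserial": np.corrcoef(item, total)[0, 1],
            # Phi: Pearson correlation of item score with mastery classification.
            "phi": np.corrcoef(item, master)[0, 1],
            # B-index: item difficulty among masters minus among non-masters.
            "b_index": item[master == 1].mean() - item[master == 0].mean(),
            # Agreement: proportion of examinees whose item outcome matches
            # their classification (correct and master, or incorrect and non-master).
            "agreement": float(np.mean(item == master)),
        })
    return results

# Small illustrative data set: 6 examinees, 3 items (made up for this sketch).
example = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
for j, stats in enumerate(item_statistics(example, cut_score=2)):
    print(j, {name: round(float(value), 3) for name, value in stats.items()})
```

The correlations are undefined (NaN) for an item that every examinee answers the same way, and the B-index is undefined when the cut score leaves either group empty, so a practical implementation would need to handle those cases explicitly.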

