Analysis of the Ruff 2 & 7 Test of Attention with the Rasch Poisson Counts Model

INTRODUCTION
The Ruff 2 & 7 Selective Attention Test [1], known also as the 2 & 7 test, is a paper and pencil cancellation test devised to measure different components of attentional processes. The test is composed of 20 blocks of letters and numbers in which numbers 2 and 7 are targets. Each block contains three lines of digits or a combination of digits and letters. Each line contains 50 characters where participants have to cross out the 10 target characters embedded within each line and ignore the distractors. Fifteen seconds are allotted for each block.
The target characters, i.e., 2's and 7's, are presented in two conditions. Ten blocks contain only numbers and 10 blocks contain a combination of capital Roman letters and Arabic numbers. The two conditions are presented alternately. The idea behind these two distractor conditions is to measure automatic detection and controlled search [2]. Identifying targets among distractors of a different category, i.e., digits among letters, is an instance of automatic detection or parallel processing, while finding targets among distractors of a similar type, i.e., numbers within numbers, is an instance of controlled search or serial processing [3].
Initial analyses of the test by [1] demonstrated high retest reliabilities for different age groups and for the two distractor conditions (between .84 and .97). Participants performed significantly better on the automatic detection condition than on the controlled search condition, which was consistent with the theory [2]. Findings also showed that age and education significantly affected scores. Mean scores systematically decreased with increasing age but improved with increasing number of years at school, while sex was found to be unrelated to performance.
In a follow-up study, Ruff et al. [3] administered the test to patients with brain injuries. Shallice [4] stated that serial and parallel processing are controlled by different areas of the brain. Serial processing is controlled by the frontal lobes and parallel processing is mostly controlled by posterior parts of the brain. Based on these theories, Ruff et al. formulated hypotheses concerning the performance of patients with various cerebral lesions on the two distractor conditions. The findings of their study were consistent with the neuropsychology literature, hence contributing to the validity argument for the test.
Convergent and discriminant validity of the 2 & 7 test have also been established. The test is correlated with the Digit Symbol Test of the Wechsler Adult Intelligence Scale-Revised [5] in normal participants (r = .35 to .40; Baser & Ruff, 1987). In a sample of normal and clinical cases, the test correlated at r = .62 with the Map Search and at r = .69 with the Telephone Search from the Test of Everyday Attention (TEA) [6,7]. The 2 & 7 test also correlates with other TEA subtests which measure attentional switching, divided attention, and sustained attention (r = .30 to -.57) [8]. Bate, Mathias, and Crawford [6] performed a factor analysis on a combination of attention tests. The 2 & 7 test loaded on a visual selective attention factor which also included the Symbol Digit Modalities Test [10], the Stroop test [11], and the Map Search and Telephone Search subtests from the TEA. Small correlations have been reported with word fluency (r = .17 to .22), and no correlation with other executive functioning tests such as the Wisconsin Card Sorting Test [12] or with visual and auditory memory tests [8].
The validity and reliability of the 2 & 7 test have been established using classical test theory methods. However, the fit of the test to Item Response Theory models [13] has not yet been examined. To contribute to the validity literature of the 2 & 7 test, this study aims to examine the fit of the test to the Rasch model. If items in a test measure a single latent ability, "then the Rasch model is the necessary and sufficient conceptualization. If they do not, then the set of items contains a mixture of variables and there is no simple, efficient, or unique way to know their utility for measuring anything" [14]. Fit to the Rasch model indicates that the latent ability is quantitative, and that items and the latent ability can be measured on an interval scale with a common unit of measurement [15].

Participants and Instrument
The Ruff 2 & 7 test of attention was administered according to the standard procedures to 138 nonclinical Iranian university students (68% female). Ages ranged from 19 to 52 years (M = 24.26, SD = 5.64). As mentioned earlier, the test consists of 20 blocks of characters in which respondents should cross out 2s and 7s within 15 seconds. This time limit is allotted for each block separately. This structure makes the test well suited to RPCM analysis.

RESULTS
The Rasch Poisson Counts Model (RPCM) [16] was selected as the measurement model to analyze the Ruff 2 & 7 test. The RPCM is a unidimensional member of the family of Rasch models which is used for speeded tests where counts of correct replies or errors within each task are modeled instead of replies to individual items [17,18]. Such testing conditions arise in speeded neuropsychological or psychomotor tests where respondents have to tick off an unlimited number of items within a fixed time period.
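For orientation, the RPCM can be written in its usual multiplicative form. The notation below is an assumption introduced here for illustration and does not appear in the original text: X_{vi} is the count of person v on block i, \theta_v is the person's ability, and \varepsilon_i is the easiness of block i.

```latex
% Count of person v on block i is Poisson with a multiplicative rate
X_{vi} \sim \mathrm{Poisson}(\lambda_{vi}), \qquad
\lambda_{vi} = \theta_v \, \varepsilon_i , \qquad
P(X_{vi} = x) = \frac{e^{-\lambda_{vi}} \, \lambda_{vi}^{x}}{x!}

% Equivalent log-linear form
\log \lambda_{vi} = \log \theta_v + \log \varepsilon_i

% The person total r_v is again Poisson, and given r_v the block counts
% are multinomial with shares that do not depend on \theta_v
r_v = \sum_{i=1}^{k} X_{vi} \sim \mathrm{Poisson}\Big(\theta_v \sum_{i=1}^{k} \varepsilon_i\Big), \qquad
(X_{v1}, \dots, X_{vk}) \mid r_v \sim
\mathrm{Multinomial}\Big(r_v ;\ \frac{\varepsilon_1}{\sum_j \varepsilon_j}, \dots, \frac{\varepsilon_k}{\sum_j \varepsilon_j}\Big)
```

The last line is the separation property typical of Rasch models: conditional on the total count, the distribution over blocks is free of the person parameter.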
The fit of data to a latent trait model, such as the Rasch model, is evidence that the covariation among the test items is caused by an underlying latent factor which could be the intended construct, and is therefore considered validity evidence [19,20].
The RPCM can be applied to the raw counts of successes or the raw counts of errors in each item. Here each block is considered an item. We fitted the RPCM to seven different scores derived from the test [8]:
1. Errors of commission (C; non-target characters cancelled)
2. Errors of omission (O; target characters respondents failed to cancel)
3. Total errors (TE; C+O)
4. Total number of characters cancelled (TN)
5. Total number of characters correctly cancelled (CTN)
6. Total number of characters correctly cancelled minus errors of commission (CTN-C)
7. Total number of characters correctly cancelled minus total errors (CTN-TE)
The 'lme4' package [9] in R [21] was used to estimate the model. We estimated seven RPCMs separately, one for each scoring technique. Andersen's [22] likelihood ratio test was employed to evaluate the overall fit of the data to the model by partitioning the sample at the raw score median. The likelihood ratio test is based on the invariance of the item parameters. If the Rasch model holds, the item parameters should be invariant across subsamples of the data [23].
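The logic of a median-split likelihood ratio test can be sketched in a few lines. This is an illustrative Python sketch, not the lme4-based estimation used in the study: it assumes the conditional form of the RPCM, in which each person's block counts are multinomial given the person's total, so that item shares can be estimated from column sums. The function names and the toy data are hypothetical.

```python
import math
import statistics


def item_column_sums(counts):
    """Sum the counts of each item (block) over all persons."""
    n_items = len(counts[0])
    return [sum(row[i] for row in counts) for i in range(n_items)]


def conditional_loglik(counts):
    """Maximised kernel of the conditional (multinomial) log-likelihood.

    The ML estimate of an item's share is its column sum divided by the
    grand total. Multinomial coefficients are left out because they
    cancel in the likelihood ratio.
    """
    col = item_column_sums(counts)
    total = sum(col)
    if total == 0:
        raise ValueError("all counts are zero; item shares are undefined")
    return sum(c * math.log(c / total) for c in col if c > 0)


def andersen_lr(counts):
    """Median-split LR statistic and its degrees of freedom.

    counts: list of persons, each a list of non-negative counts per item.
    """
    totals = [sum(row) for row in counts]
    median = statistics.median(totals)
    low = [row for row, t in zip(counts, totals) if t <= median]
    high = [row for row, t in zip(counts, totals) if t > median]
    if not low or not high:
        raise ValueError("median split produced an empty group")
    lr = 2.0 * (conditional_loglik(low) + conditional_loglik(high)
                - conditional_loglik(counts))
    df = len(counts[0]) - 1  # (items - 1) * (groups - 1) with two groups
    return lr, df


if __name__ == "__main__":
    # Hypothetical toy data: 4 persons by 2 items.
    toy = [[4, 1], [5, 1], [1, 5], [1, 6]]
    print(andersen_lr(toy))
```

The statistic is referred to a chi-square distribution with the returned degrees of freedom; a large value suggests that the item shares differ between low and high scorers, which contradicts parameter invariance.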
As Table 1 shows, only the total number of characters cancelled, the total number of characters correctly cancelled, and the total number of characters correctly cancelled minus errors of commission fit the RPCM; the other scoring techniques are not Rasch scalable. The RPCM assumes that the mean and variance are equal. The φ coefficient is the ratio of model-implied variance to predicted mean [24]. If φ is equal to one, the assumption is met. If it is smaller than one, underdispersion occurs, and if it is above one, overdispersion occurs [18]. When overdispersion occurs, reliability is overestimated, while in the case of underdispersion, reliability is underestimated. Table 1 shows that none of the scoring techniques meets this requirement of the RPCM. Nevertheless, CTN-C is the closest to the ideal value among the three fitting models.
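The idea of comparing variance with the mean can be illustrated with a simple Pearson-type dispersion index. The sketch below is a stand-in and not necessarily the φ estimator of [24]: it assumes a fixed-effects multiplicative Poisson fit in which the expected count of a person on a block is the person total times the block total divided by the grand total. Function names and data are hypothetical.

```python
def fitted_means(counts):
    """Expected counts under a multiplicative (person x item) Poisson fit."""
    row_totals = [sum(row) for row in counts]
    n_items = len(counts[0])
    col_totals = [sum(row[i] for row in counts) for i in range(n_items)]
    grand = sum(row_totals)
    if grand == 0:
        raise ValueError("all counts are zero; fitted means are undefined")
    return [[rt * ct / grand for ct in col_totals] for rt in row_totals]


def pearson_dispersion(counts):
    """Pearson chi-square divided by its degrees of freedom.

    Values near 1 are consistent with equidispersion, values above 1
    point to overdispersion, and values below 1 to underdispersion.
    """
    df = (len(counts) - 1) * (len(counts[0]) - 1)
    if df <= 0:
        raise ValueError("need at least two persons and two items")
    mu = fitted_means(counts)
    chi2 = sum((x - m) ** 2 / m
               for x_row, m_row in zip(counts, mu)
               for x, m in zip(x_row, m_row)
               if m > 0)
    return chi2 / df


if __name__ == "__main__":
    # Hypothetical toy data: 2 persons by 2 items.
    print(pearson_dispersion([[3, 1], [1, 3]]))
```

With real data the index would be computed on the persons-by-blocks count matrix of one scoring technique at a time.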
Table 2 presents the item parameters, their standard errors, and their fit statistics for the three scoring techniques which fit the RPCM. A chi-square type item fit statistic based on binning observed and predicted values showed that none of the items misfit the RPCM under these three scoring procedures (df = 5, α = 0.01).

DISCUSSION
The Ruff 2 & 7 test of attention is a short and easy-to-administer measure of both selective and sustained attention. It is based on a well-grounded theoretical framework and is relatively well researched. The test has sound psychometric properties and its validity has been demonstrated against other criterion measures by providing convergent and discriminant evidence. The aim of the present study was to contribute to the validity literature of the test by examining its fit to a unidimensional Rasch model.
The RPCM was chosen as the psychometric model to fit to the 2 & 7 test. Findings showed that three out of seven scoring techniques fit the Rasch model. The scoring techniques which fitted the model were total number of characters cancelled, total number of characters correctly cancelled, and total number of characters correctly cancelled minus errors of commission. The other four scoring procedures, i.e., errors of commission, errors of omission, total errors, and the number of characters correctly cancelled minus total errors, did not fit the RPCM. Therefore, the Ruff 2 & 7 test is psychometrically unidimensional when these three scores are computed. None of the 20 blocks, which are treated as items in the RPCM analysis, misfitted. These findings are in line with another study on the Rasch scalability of the d2 test of attention, where researchers demonstrated that the same scoring techniques which showed good fit to the RPCM in this study also had the best fit in the d2 test [25]. In another recent study, Steinborn et al. [26] examined the reliability of the d2 test and concluded that only the total number of characters correctly cancelled and the total number of characters correctly cancelled minus errors of commission are highly reliable. These scoring techniques were among the best fitting techniques in this study for the Ruff 2 & 7 test.
To date, the 2 & 7 test has mostly been used in English-speaking countries or in countries where Latin script and Western-style Hindu-Arabic numerals are used. This study examined the performance of a group of respondents who use a different writing system in their native language and are, therefore, not well accustomed to seeing and scanning Latin letters and Western numerals. The findings suggest that the test functions well in this population of examinees too.
The research has some limitations as well. Since we had a small sample size, we did not investigate measurement invariance and differential item functioning across sex and age groups. Latent trait models in general and IRT models in particular can be used as powerful psychometric models to evaluate neuropsychological tests. The results of the RPCM analysis demonstrated that the 2 & 7 test is an internally valid and accurate measure of attention in a non-Western culture. If substantive theory implies that a construct can be represented as a line (i.e., relations between positions on the trait are linearly ordered), and if the items all systematically depend on this construct without sharing additional variance due to, say, direct causal relations or other sources, then one expects that a unidimensional IRT model should fit the data. The fit of a unidimensional IRT model supports the homogeneity of the latent variable. If the latent trait is not homogeneous, adding raw scores to compute a total score is not justified because one then adds elements of a heterogeneous trait. Fit of data to the IRT model shows that the total raw score is a valid estimator of ability and can be used as an indication of examinees' latent ability [27].
Future research should examine the utility of the test in clinical samples in non-Latin script cultures.

CONCLUSION
In the classical definition of speed tests, the items are very easy and almost equally difficult. Items in speeded tests have p values above .95 under unspeeded conditions and would be correctly answered by almost everybody [28]. Under such conditions, the application of conventional item response theory models is difficult. The RPCM is a model designed for such tests, where the total number of items correctly answered or the total number of errors within some time limit on separate parts of the test is modeled instead of responses to individual items. These models are less complex than other IRT models and can be applied with small sample sizes. Analysis, validation, and psychometric evaluation of speeded neuropsychological and psycho-educational tests can benefit from the RPCM.