Formal Psychological Assessment for Autism Spectrum Disorder Diagnosis: A New Methodology to Build an Adaptive Testing System

RESEARCH ARTICLE
Maria Chiara Pino, Andrea Spoto, Melania Mariano, Umberto Granziol, Sara Peretti, Francesco Masedu, Marco Valenti, Monica Mazza and Giulio Vidotto
Department of Applied Clinical Sciences and Biotechnology, University of L’Aquila, L’Aquila, Italy
Department of General Psychology, University of Padua, L’Aquila, Italy
Reference Regional Centre for Autism, Abruzzo Region Health System, L’Aquila, Italy


INTRODUCTION
The prevalence of Autism Spectrum Disorder (ASD) has been steadily increasing. Identifying individuals with ASD matters both because of the societal costs associated with the disorder and because early diagnosis improves the benefits of rehabilitation. In fact, the increase in ASD diagnoses over the last few years has become a much-debated topic [1]. The quality of the clinical evaluation plays an important role in diagnosis as well as in future treatment [2-4]. ASD is characterized by a pattern of multiple symptoms and is typically identified around 2 years of age (or later in the case of regressive forms of ASD). The symptoms of ASD include deficits in social communication and social interaction as well as the presence of repetitive and stereotypical behaviours [5].
The clinical diagnosis of ASD is reached by combining information from direct observation of the child with information provided by the parents. A correct evaluation therefore includes meeting with both the child and the family to thoroughly review the developmental history as well as the current symptom profile. The evaluation includes the administration of the gold standard among diagnostic tools: the Autism Diagnostic Observation Schedule (ADOS-2) [6]. Individuals with autism undergo various assessments throughout their lives to monitor their symptomatology following rehabilitation treatments. For this reason, it would be useful to have a test that allows faster evaluation of the symptomatology while also exploring the critical areas in greater depth. Reducing time while preserving accuracy is an important and ambitious objective [7]. In this paper we propose the application of the Formal Psychological Assessment (FPA) [8,9] to the ADOS-2 in order to reduce the time required for the test and to allow an adaptive assessment of an individual with ASD. The introduction of the FPA in psychological assessment will allow a deeper understanding of the evaluation process for an adaptive assessment procedure [7] and a clear connection between the items administered and the clinical elements collected.
In addition, we aim to validate the clinical structures obtained for both the T module and module 1 of the tool. This process involves estimating the parameters and testing the fit of the model to the available data. It allows for the calibration of an adaptive version of the ADOS-2 able to provide detailed information about the symptoms displayed by each patient according to the FPA methodology.

The Formal Psychological Assessment
When assessing an individual, a clinician is more interested in the symptoms the individual presents than in the score he/she obtains on an instrument. The FPA attempts to provide the clinician with exhaustive information about a patient's specific set of clinical issues, so that the situation can be identified in a way that is both quantitatively and qualitatively rigorous. What allows the FPA to carry out this task are its mathematical and theoretical foundations. In fact, the FPA is a conjoint application to the clinical framework of two theories from mathematical psychology: Knowledge Space Theory [10,11] and Formal Concept Analysis [12]. The simple intuition behind the FPA is that each item of an instrument can be described in terms of the clinical-diagnostic issues it explores [8,13]. In this way, it is possible to identify which clinical issues are investigated by any item of an assessment tool for a specific disorder. In the FPA, each item is named a clinical object, while each clinical-diagnostic issue (e.g., a DSM diagnostic criterion) is named a clinical attribute. Starting from these basic definitions it is possible to build a matrix where each row is an object and each column is an attribute. Each cell ij of such a matrix, named the clinical context, contains a 1 whenever item i investigates attribute j, and a 0 otherwise. The clinical context thus represents the object-attribute assignment. Notice that, once the set of attributes is specified, any item can be analysed with respect to the presence/absence of each attribute; this means that the same matrix could even contain items from different tools. Furthermore, this representation allows the clinician to discriminate between response patterns having the same numerical score. This aspect is particularly relevant because a patient could reach a given score by satisfying different sets of symptoms: each item is therefore important not for the score it provides, but for the attributes it investigates. This information is completely lost in traditional tools, and recovering it is one of the most relevant improvements provided by the FPA, which differentiates patients not by their score but by the clinical symptoms they present.
Starting from the clinical context, a clinical structure can be built through some formal steps [13]. This structure uses the information included in the context to derive the prerequisite relations among the items of the context. This step is crucial in the construction of an adaptive algorithm for psychological assessment [7]. In fact, the structure defines the admissible response patterns (also named clinical states) that are determined by the item-attribute assignment of the context. For instance, if an item $i_1$ investigates attributes $\{a_1, a_2\}$ and a second item $i_2$ investigates attributes $\{a_1, a_2, a_3\}$, any pattern including item $i_2$ but not $i_1$ will be excluded from the structure given the item-attribute assignment.
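The two-item example can be checked with a minimal Python sketch. It assumes a conjunctive reading of the context (an item belongs to a state only when all the attributes it investigates are present in the patient), which agrees with the example but is an assumption about the formal steps in [13]. The function name `delineate_states` and the dictionary layout are hypothetical.

```python
from itertools import combinations


def delineate_states(context):
    """Enumerate the clinical states implied by a clinical context.

    context maps each item to the set of attributes it investigates.
    Assumption: an item is in a state only if ALL its attributes are present.
    """
    attributes = sorted(set().union(*context.values()))
    states = set()
    # Try every possible set of attributes a patient could present.
    for size in range(len(attributes) + 1):
        for subset in combinations(attributes, size):
            present = set(subset)
            state = frozenset(
                item for item, attrs in context.items() if attrs <= present
            )
            states.add(state)
    return states


# Toy context taken from the example in the text.
context = {
    "i1": frozenset({"a1", "a2"}),
    "i2": frozenset({"a1", "a2", "a3"}),
}
states = delineate_states(context)
print(sorted(sorted(state) for state in states))
```

Under this assumption the pattern containing only `i2` never appears among the states, because every attribute set that satisfies `i2` also satisfies `i1`. The enumeration visits every subset of attributes, so it is only practical for small contexts.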
So far, the deterministic skeleton of the FPA has been described. In the deterministic case no error is assumed during the assessment phase; in other words, the answer to a specific item is assumed not to be affected by any kind of error, and therefore the collected information is treated as certain. Such an assumption does not adequately describe reality, for two main reasons. On the one hand, two kinds of error could occur in the assessment phase; for instance, in the case of the ADOS-2, the clinician could report a certain behaviour even if it did not occur, or could miss a behaviour that did occur. These two kinds of errors are respectively a "false positive" and a "false negative" [8,11]. On the other hand, the clinical states (i.e., the different clinical conditions in which a patient could be) could have different prevalence rates in the population. For these reasons, a probabilistic model is needed. In the FPA, a model that accounts for both error rates and the probability of the states is the Basic Local Independence Model (BLIM) [10]. Within the BLIM, all the responses to the items are locally independent given the state of the individual; moreover, a probability value is associated with each state. Formally, the clinical structure becomes a probabilistic clinical structure $(Q, \mathcal{K}, \pi)$, where $(Q, \mathcal{K})$ is the clinical structure and $\pi$ is a probability distribution on the states of $\mathcal{K}$. Given $(Q, \mathcal{K}, \pi)$, each response pattern $R \subseteq Q$ has a probability value:

$$P(R) = \sum_{K \in \mathcal{K}} P(R \mid K)\, \pi_K$$

This probability is defined by means of a response function that assigns each response pattern its conditional probability given a state $K$, for all states $K \in \mathcal{K}$ [13,14]. The conditional probability $P(R \mid K)$ is determined by the two error parameters $\beta$ and $\eta$, respectively the false negative and false positive error rates of each item, as expressed by the following equation:

$$P(R \mid K) = \prod_{q \in K \setminus R} \beta_q \prod_{q \in K \cap R} (1 - \beta_q) \prod_{q \in R \setminus K} \eta_q \prod_{q \in Q \setminus (R \cup K)} (1 - \eta_q)$$

The model parameters are the two error rates $\eta_q$ and $\beta_q$ for each item, and a probability value $\pi_K$ for each state. In this way, it is possible to estimate the probability of observing the real clinical state of a patient.
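As an illustration of how the BLIM assigns probabilities to response patterns, the following Python sketch computes the conditional probability of a pattern given a state and the resulting marginal probability. The three states, the state probabilities and the error rates are invented for illustration only and are not estimates from this study; the function names are hypothetical.

```python
from itertools import combinations


def conditional_probability(pattern, state, items, beta, eta):
    """Probability of a response pattern given a clinical state (BLIM)."""
    prob = 1.0
    for q in items:
        if q in state:
            # Item belongs to the state: observed, or missed (false negative).
            prob *= (1 - beta[q]) if q in pattern else beta[q]
        else:
            # Item not in the state: false positive, or correctly absent.
            prob *= eta[q] if q in pattern else (1 - eta[q])
    return prob


def pattern_probability(pattern, state_probs, items, beta, eta):
    """Marginal probability of a pattern: sum over states, weighted by pi."""
    return sum(
        weight * conditional_probability(pattern, state, items, beta, eta)
        for state, weight in state_probs.items()
    )


# Illustrative values only (assumptions, not estimates from the paper).
items = ["i1", "i2"]
state_probs = {
    frozenset(): 0.5,
    frozenset({"i1"}): 0.3,
    frozenset({"i1", "i2"}): 0.2,
}
beta = {"i1": 0.1, "i2": 0.1}    # false negative rates
eta = {"i1": 0.05, "i2": 0.05}   # false positive rates

all_patterns = [
    frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
]
total = sum(
    pattern_probability(R, state_probs, items, beta, eta) for R in all_patterns
)
print(total)
```

Summing the marginal probabilities over all possible patterns gives 1 up to floating-point error, which is a quick sanity check on the parameterization.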

Context Construction and Validation
The first step in applying the FPA is the construction of the clinical context that depicts the item/attribute assignment. This operation first requires the specification of the attributes that must be included in the context. Since clinical attributes can be symptoms, diagnostic criteria or theoretical considerations related to the disorder at hand, the attributes were selected by referring to the DSM-5 (APA, 2013) and other information from the literature [1,6]. Table 1 lists the main attributes considered when assessing ASD symptoms. The second operation conducted to build the context was the selection of the items. In the current application, all items of the ADOS-2 were included in the context. To have a clearer picture of the instrument, the five main parts of the tool were analysed separately. Therefore, the items of the ADOS-2 were subdivided into five different contexts (one for each part). Finally, six experts (four psychologists and two child psychiatrists) in the field of ASD and in the administration of the ADOS-2 independently performed the item/attribute assignment. Disagreements among raters were resolved through direct discussion of the specific critical issues.
Whenever a clinical context is built, five important and informative configurations may occur due to the item-attribute assignment. First, some empty rows may occur (i.e., rows of the matrix that contain only 0s), meaning that the specific item does not investigate any of the selected attributes. Second, some empty columns may occur, indicating that none of the items of the tool investigates that specific attribute. This is crucial because it indicates that the assessment tool at hand does not investigate some of the selected (and thus necessary) attributes. Third, equal columns may occur (i.e., columns having 1s in the same positions), indicating that it is impossible to distinguish between the presence/absence of either of the equivalent attributes in the assessment. Fourth, equivalent rows may occur, which indicate equivalent items (i.e., items that convey, from a clinical perspective, the same information). These items are redundant and thus, in terms of efficiency, could be collapsed. Fifth, some rows may be included in other rows (i.e., the 1s in one row are all present in another row that also has some other 1s). This last case is important because the included items represent so-called prerequisites of the including ones. In other words, there is one item that investigates some attributes and another item that investigates all the attributes of the first item, plus some other attributes. Therefore, from a clinical perspective, an affirmative answer to the latter item implies an affirmative answer to the former as well.
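The five configurations can be detected mechanically from the context. The following Python sketch does so for a small invented context; the function name `analyse_context`, the item and attribute labels, and the returned dictionary keys are all hypothetical and are not taken from the ADOS-2 contexts.

```python
def analyse_context(context, attributes):
    """Report the five informative configurations of a clinical context.

    context maps each item to the set of attributes it investigates;
    attributes lists every selected attribute (including unused ones).
    """
    items = list(context)
    # Column view: for each attribute, the items that investigate it.
    columns = {a: frozenset(i for i in items if a in context[i]) for a in attributes}

    return {
        "empty_rows": [i for i in items if not context[i]],
        "empty_columns": [a for a in attributes if not columns[a]],
        "equal_columns": [
            (a, b)
            for k, a in enumerate(attributes)
            for b in attributes[k + 1:]
            if columns[a] and columns[a] == columns[b]
        ],
        "equivalent_rows": [
            (i, j)
            for k, i in enumerate(items)
            for j in items[k + 1:]
            if context[i] and context[i] == context[j]
        ],
        # (i, j) means item i is a prerequisite of item j:
        # the attributes of i are a proper subset of those of j.
        "prerequisites": [
            (i, j)
            for i in items
            for j in items
            if context[i] and context[i] < context[j]
        ],
    }


# Invented toy context for illustration.
context = {
    "i1": {"a1", "a2"},
    "i2": {"a1", "a2", "a3"},
    "i3": {"a1", "a2"},
    "i4": set(),
}
attributes = ["a1", "a2", "a3", "a4"]
report = analyse_context(context, attributes)
print(report)
```

In this toy context `i4` is an empty row, `a4` an empty column, `a1` and `a2` are equal columns, `i1` and `i3` are equivalent rows, and both are prerequisites of `i2`.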
The last step of the application of the FPA is the statistical testing of the robustness of the model in describing the collected data. To this aim, the BLIM parameters were estimated starting from a dataset of patients evaluated by means of the ADOS-2. Parameter estimation was carried out iteratively with the Expectation-Maximization (EM) algorithm [9,13,15]. Through this procedure, we were able to obtain an estimate of the two error parameters for each item (false negative and false positive) and the probability of each clinical state in the structure. The procedure computes a fit statistic based on Pearson's Chi-square; due to the sparseness of this kind of data matrix, the asymptotic distribution of this statistic is not reliable. Therefore, the obtained value was tested by means of a parametric bootstrap (with 5000 replications). The estimates of the error rates for each item can also be viewed as fit indices. It has been observed [9] that, in general, such values must be as low as possible. More precisely, the following condition must hold true for every item: $\eta_q < 1 - \beta_q$. In other words, a positive answer to an item is more likely to occur if the patient has its attributes than as the result of a false positive on the item. In the next section, the results obtained from the construction and analysis of the contexts are displayed.
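The acceptability condition and the bootstrap p-value can be illustrated with a short Python sketch. The error rates and the statistics below are invented. In a real parametric bootstrap the simulated statistics would come from refitting the model to data sets generated from the estimated parameters, a step that is not implemented here. The plain proportion used for the p-value is one common convention and is an assumption about the authors' procedure.

```python
def acceptable_items(beta, eta):
    """Check the condition eta_q < 1 - beta_q for every item."""
    return {q: eta[q] < 1 - beta[q] for q in beta}


def bootstrap_p_value(observed_stat, simulated_stats):
    """Share of simulated fit statistics at least as large as the observed one."""
    exceed = sum(1 for s in simulated_stats if s >= observed_stat)
    return exceed / len(simulated_stats)


# Invented error rates: item "i2" violates the condition (0.5 is not < 0.4).
beta = {"i1": 0.1, "i2": 0.6}
eta = {"i1": 0.05, "i2": 0.5}
flags = acceptable_items(beta, eta)

# Invented statistics standing in for the bootstrap replications.
p_value = bootstrap_p_value(2.5, [1.0, 2.0, 3.0, 4.0])
print(flags, p_value)
```

A large bootstrap p-value indicates that the observed statistic is typical of data generated by the fitted model, which is the sense in which the fit is regarded as reliable.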
The BLIM was defined and tested on the collected data.

T Module
For this module, 18 attributes were identified as relevant for the assessment of ASD, namely attributes A2, A7, A3, A5, A8, A12, A13, A16, A4, A6, A11, A15, A1, A9, A10, A14, A17 and A18 in Table 1. For the T module, there are no empty rows: each item evaluates at least one selected attribute, so there are no useless items with respect to the selected attributes. Moreover, no empty columns were found, so all the selected attributes are investigated by at least one item of this module. This shows that the T module assesses the main abilities needed to make an ASD diagnosis in the age range of 12-30 months. In this module, equal columns are not present, but there are equivalent rows. In fact, two items of the T module (free play and free play-ball) evaluate exactly the same set of attributes. These items are repetitive and thus, in terms of efficiency, one of them could be deleted to reduce the duration of the test and the stress on the child (Table 2).
For module 1, we found two equivalent rows (symbolic and functional imitation, and birthday party). These items evaluate the same attributes and can be considered repetitive.

Clinical Structures
The clinical structure is the formal and graphical representation of the relations among the items of the context; furthermore, it can display the relations between sets of items and sets of attributes. It is usually represented by a graph in which every node is a collection of items (i.e., the clinical state) together with the corresponding set of attributes (i.e., the set of criteria satisfied by a patient in that clinical state). The structure is built according to both the relations among items and attributes depicted in the clinical context and the prerequisite relations depicted by the same context. Some of the crucial configurations regarding the rows and columns of the context have precise counterparts in the structure. For instance, an empty row (i.e., an item that does not investigate any of the selected attributes) will result in the absence of that item from the structure. The same happens with empty columns. When two rows are equivalent, the corresponding items will systematically be included in the same states. Finally, whenever the attributes investigated by an item i are a subset of those investigated by an item j, the prerequisite relation translates into the structure as follows: there will not be any state including j that does not also include i.

T Module
There are 14 items in the T module. The obtained structure has 1156 states. Recall that, if no relations among items were present, the total number of theoretical response patterns would be $2^{14} = 16384$ (i.e., the cardinality of the power set of the domain). The obtained structure contains only 7% of the potential patterns, indicating how an adaptive version of the instrument could be much more efficient in evaluating patients. The analysis demonstrates the existence of prerequisite relations between several items of the T module. In particular, item 1 is a prerequisite of items 7 and 8; item 2 is a prerequisite of item 13; items 3 and 6 are prerequisites of item 7; and item 12 is a prerequisite of item 13 (Table 2).
This implies that if the child does not pass an item that is a prerequisite of other items (for example item 1, which is a prerequisite of items 7 and 8), it is not necessary to administer those items.
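The skipping logic can be sketched in Python using the prerequisite pairs reported for the T module. The function name `items_to_skip` is hypothetical, and the propagation along chains of prerequisites is an assumption about how an adaptive procedure would use these relations.

```python
# Prerequisite pairs reported for the T module: (prerequisite, dependent item).
T_MODULE_PREREQUISITES = [(1, 7), (1, 8), (2, 13), (3, 7), (6, 7), (12, 13)]


def items_to_skip(failed_items, prerequisites):
    """Items that need not be administered given the items not passed.

    Assumption: an item inferred as not passed also blocks its own dependents.
    """
    skip = set()
    frontier = set(failed_items)
    while frontier:
        current = frontier.pop()
        for prerequisite, dependent in prerequisites:
            if prerequisite == current and dependent not in skip:
                skip.add(dependent)
                frontier.add(dependent)
    # Items already administered (and failed) are not reported as skippable.
    return skip - set(failed_items)


print(items_to_skip({1}, T_MODULE_PREREQUISITES))
print(items_to_skip({12}, T_MODULE_PREREQUISITES))
```

With these pairs, not passing item 1 makes items 7 and 8 unnecessary, and not passing item 12 makes item 13 unnecessary.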

Module 1
There are 9 items in module 1. The obtained structure has 224 states, 43% of the potential patterns. The analysis demonstrates the existence of a prerequisite relation between two items of module 1. In particular, item 1 is a prerequisite of item 5 (Table 3). This implies that if the child does not pass item 1, it is not necessary to administer item 5.

Module 2
There are 14 items in module 2. The obtained structure has 1384 states, 8% of the potential patterns. The analysis demonstrates the existence of prerequisite relations between several items of module 2. In particular, item 1 is a prerequisite of item 2; in addition, item 4 is a prerequisite of items 9 and 10; and item 9 is a prerequisite of item 10 (Table 4). This analysis demonstrates that it is not necessary to administer item 2 (free play) if the child does not pass item 1 (bubbles-play), because item 1 is a prerequisite of item 2. Similarly, item 10 no longer needs to be administered if the child does not pass items 4 and 9.

Module 3
There are 13 items in module 3. The obtained structure has 1136 states, which is 13% of the potential patterns. The analysis demonstrates the existence of prerequisite relations between several items of module 3. In particular, item 1 is a prerequisite of items 2, 3 and 4. In addition, items 4, 5 and 6 are prerequisites of item 3; and item 8 is a prerequisite of items 3 and 9 (Table 5). This suggests that the order of appearance of items in the module should be reviewed.

Module 4
There are 13 items in module 4. The obtained structure has 592 states, 7% of the potential patterns, indicating how an adaptive process could be much more efficient in evaluating patients. The analysis demonstrates the existence of prerequisite relations between several items of module 4. In particular, items 1, 2, 3 and 4 are prerequisites of item 5; item 3 is a prerequisite of item 12; item 7 is a prerequisite of items 8 and 10; finally, item 10 is a prerequisite of item 8 (Table 6). This result means that the clinician can select the items to administer based on the items the child passed previously. In this way, it is possible to reduce the time it takes to administer the test.
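The share of admissible patterns reported for each module can be recomputed from the number of items and the number of states given in the text. The short Python sketch below does this; the dictionary and function names are hypothetical.

```python
# (number of items, number of states) as reported in the text for each module.
MODULES = {
    "T module": (14, 1156),
    "module 1": (9, 224),
    "module 2": (14, 1384),
    "module 3": (13, 1136),
    "module 4": (13, 592),
}


def admissible_share(n_items, n_states):
    """Fraction of the 2**n_items possible patterns that are clinical states."""
    return n_states / 2 ** n_items


shares = {name: admissible_share(n, k) for name, (n, k) in MODULES.items()}
for name, share in shares.items():
    print(f"{name}: {share:.2%}")
```

The percentages quoted in the text correspond to these shares truncated to whole numbers; module 1, for example, has 224 of 512 patterns, that is 43.75%.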

Clinical Structures Validation: Application to T Module and Module 1
The clinical structures of both the T module and module 1 were tested using the BLIM. The model parameters were estimated by means of the EM procedure [9,15]. To test the reliability of the observed Chi-square, a parametric bootstrap with 5000 replications was conducted for each of the two modules [13]. Concerning the T module, the results showed that the model adequately fit the data ($\chi^2$(260,952) = 757.04; p > .05; number of states = 1,156). The reliability of this result was confirmed by the parametric bootstrap (bootstrap p > .05). Moreover, the estimates of the error rates for the T module, reported in Table 7, satisfy the condition of acceptability (i.e., $\beta_q + \eta_q < 1$) for every item. On the other hand, for module 1, the model fit the data quite well ($\chi^2$(65,280) = 17,585; p > .05; number of states = 224) even though the reliability of the observed results is not supported by the bootstrap (bootstrap p > .05). Nonetheless, as can be seen in Table 8, even for this module the error rate estimates are acceptable for every item.
In general, it can be concluded that the two models accurately explain the observed data. The estimates can then be implemented into an algorithm constituting an adaptive tool for the assessment of the diagnostic issues for ASD. Such a tool is derived from the ADOS-2 by removing its redundancies and focusing on the specific issues useful for diagnosis.

GENERAL DISCUSSION
The aim of the present study was to analyse the single modules of the ADOS-2 using the FPA methodology. Diagnosis is a fundamental process that determines which disorder or condition explains the symptoms of a person. Clinical assessment can be described as an intelligent procedure that the clinician carries out with the aim of collecting information about a patient in order to formulate the diagnosis and propose therapeutic treatment. The investigation proceeds through a sequence of hypothesis formulations and validations [7]. This process is more difficult in neurodevelopmental disorders, where it is not possible to ask patients to describe their symptoms (such as in the T module or module 1) or to measure the general capacity of the patient's social and relational abilities (for example in modules 2, 3 and 4). In addition, the information provided by caregivers through diagnostic interviews is not always related to the hypothesis of the clinician. For this reason, it is important to maximize and optimize the collected information, reduce errors, focus on the critical areas of an individual (e.g., deficits in social interaction), and reduce the number of items and the time of evaluation, while trying to preserve accuracy. We hypothesized that the reduction of the number of items considered redundant by the FPA would correlate with an improvement in the quality and quantity of information collected [7]. Notice that such a reduction has no effect on the amount of information collected and, in general, does not necessarily mean the removal of the item from the tool. In fact, the adaptive assessment procedure could store the redundant items in a buffer and use them if needed. For example, the answers may not be coherent and, therefore, it would be necessary to investigate the same attributes again. In this case, having the redundant items available in the buffer would allow the clinician to investigate the same attributes twice without repeating the same item. This is a strong advantage of such an approach. This methodology could be applied to all clinical tests used in the autism field (such as the Autism Diagnostic Interview-Revised, Vineland), to reduce the weight of repetitive items or items without diagnostic significance, based on evidence from an FPA analysis and the validation of its clinical structure. Our goal is to improve the diagnostic process for ASD by making it adaptive. This adaptive system can perform logically correct inferences based on all the information collected during the testing process [7]. The system is adaptive in the sense that the question posed by the system at a given moment depends on the previously collected answers of the patient or the behaviours observed. This adaptivity lets the system dynamically choose the best sequence of items or stimulus situations in order to maximize the informational content of each answer [7,16]. In this way, the system can often avoid posing the entire set of questions by inferring the answers to logically connected questions, thus saving time. A previous study found that the application of an adaptive version of a questionnaire allowed people to reach the end of the assessment without using 25-50% of the items [7]. Once an adaptive algorithm that takes into account the ADOS-2 assessment modality is defined, similar time savings are expected.
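One way to choose the most informative next item is sketched below in Python. It uses the half-split rule from knowledge space theory, which selects the item whose probability of a positive response is closest to 0.5 given the current state probabilities. Whether the system described in [7,16] uses this exact rule is an assumption; the states, probabilities and error rates are invented, and the function names are hypothetical.

```python
def positive_probability(item, state_probs, beta, eta):
    """Probability of observing a positive response to an item."""
    total = 0.0
    for state, prob in state_probs.items():
        # True positive if the item is in the state, false positive otherwise.
        total += prob * ((1 - beta[item]) if item in state else eta[item])
    return total


def next_item(candidates, state_probs, beta, eta):
    """Half-split rule: pick the item whose outcome is most uncertain."""
    return min(
        candidates,
        key=lambda q: abs(positive_probability(q, state_probs, beta, eta) - 0.5),
    )


# Invented values for illustration.
state_probs = {
    frozenset(): 0.2,
    frozenset({"i1"}): 0.3,
    frozenset({"i1", "i2"}): 0.5,
}
beta = {"i1": 0.1, "i2": 0.1}
eta = {"i1": 0.05, "i2": 0.05}

chosen = next_item(["i1", "i2"], state_probs, beta, eta)
print(chosen)
```

Here item `i2` is selected because its probability of a positive response (0.475) is closer to 0.5 than that of `i1` (0.73), so its answer splits the plausible states more evenly.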

CONCLUSION
The aim of this study was to present a practical application of the FPA to the ADOS-2 and show its potential for the psychological assessment of individuals with ASD. An important limit of the ADOS-2, as of all tests based on observation, is that the rating of diagnostic characteristics may vary among clinicians, and the results for the same individual can be different if the evaluation is subsequently repeated. Unfortunately, the FPA method cannot enhance the reliability of the ADOS-2; its main aim is rather to create an adaptive testing system for ASD diagnosis. We suppose that the reduction of the number of items considered repetitive by the FPA could be correlated with an improvement in the quality and quantity of information collected. Testing this hypothesis represents a future goal of our research on autism. Meanwhile, some work is being carried out to provide new methodological tools for an accurate assessment of inter-rater agreement for tasks like those involved in the administration of the ADOS-2 [17]. In addition, another limitation of this study is that the items of the ADOS-2 were assigned to the attributes by six experts (four psychologists and two child psychiatrists) in the field of ASD and in the administration of the ADOS-2; these assignments are therefore subjective.
Nevertheless, we believe our study is important and addresses a critical issue in the literature on autism. This study represents the first step toward creating an adaptive testing system for ASD, based on an existing clinical test, that would improve the diagnosis of ASD and assist clinicians in formulating their diagnosis. For example, if a clinician assigns the score 2 to an item that evaluates "used gestures to indicate requests", the system, using the updating rule, increases the likelihood of the states containing the attributes investigated by that item; on the contrary, if the clinician assigns the score 0 to this item, the system decreases the likelihood of those states [7,16].
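A minimal Python sketch of such an update is given below. It assumes a Bayesian updating rule based on the BLIM error rates and a binary coding of the clinician's score (behaviour reported or not reported). Both are assumptions, since the exact updating rule of [7,16] and the mapping from ADOS-2 scores to binary responses are not specified here. The states and numerical values are invented, and the function name is hypothetical.

```python
def update_state_probabilities(state_probs, item, observed, beta, eta):
    """Bayesian update of the state probabilities after one observation.

    observed is True when the behaviour is reported, False otherwise.
    Assumes error rates strictly between 0 and 1, so the total is non-zero.
    """
    posterior = {}
    for state, prior in state_probs.items():
        if item in state:
            likelihood = (1 - beta[item]) if observed else beta[item]
        else:
            likelihood = eta[item] if observed else (1 - eta[item])
        posterior[state] = prior * likelihood
    total = sum(posterior.values())
    return {state: value / total for state, value in posterior.items()}


# Invented two-state example.
with_item = frozenset({"i1"})
without_item = frozenset()
prior = {with_item: 0.5, without_item: 0.5}
beta = {"i1": 0.1}
eta = {"i1": 0.05}

after_positive = update_state_probabilities(prior, "i1", True, beta, eta)
after_negative = update_state_probabilities(prior, "i1", False, beta, eta)
print(after_positive[with_item], after_negative[with_item])
```

After a positive observation the state containing the item becomes more likely than before, and after a negative observation it becomes less likely, which mirrors the behaviour described in the example.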
In line with Donadello and colleagues [7], our ambitious objective is to develop a software product that can assist clinicians with the assessment of ASD.

FUNDING
The author(s) received no financial support for the research, authorship, and/or publication of this article.

ETHICS APPROVAL AND CONSENT TO PARTICIPATE
Not applicable.

HUMAN AND ANIMAL RIGHTS
No animals/humans were used for studies that are the basis of this review.

CONSENT FOR PUBLICATION
Not applicable.

DECLARATION OF CONFLICTING INTERESTS
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Table 3. Module 1, containing the 9 items and the 16 attributes. The items named functional and symbolic imitation and birthday party have been combined into a single item because they are equivalent. Attribute A9 is not included in the table because it is not examined by any item of this module.
[Table body not recoverable from the source. Column headers: A13, A2, A5, A7, A12, A3, A11, A16, A4, A6, A8, A15, A1, A10, A14, A17, TOTAL. First row label: 1. Bubbles-play.]

Table 5. Module 3, containing the 13 items and the 21 attributes. The items named Social difficulties and annoyance and friends, relationship and marriage have been combined into a single item because they are equivalent. Attributes A1, A2, A5, A9, A10, A11, A12, A15 and A17 are not included in the table because they are not examined by any item of this module.
[Table body not recoverable from the source. Column headers recovered: A22, A3, A8, A27, A29, A6, A21, A23, A30, A4, A13, A14*, A25*, A26*. Asterisks (*, **, ***) mark groups of equal columns.]

Table 1. List of attributes of the clinical context for the ASD diagnosis.