Summary
Objectives: Understanding how comorbid neurodevelopmental disorders (NDDs) progress across
different critical time periods may improve our understanding of the underlying
pathophysiology of NDDs. The objective of our study was to identify frequent temporal
sequences of developmental diagnoses in noisy patient data.
Methods: We used a data set of 2810 patients that documents the NDD diagnoses given to them
by an NDD expert at a child development center during multiple visits at different ages.
Extensive preprocessing steps were developed so that the data set could be processed by
SPADE, an efficient sequence mining algorithm.
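SPADE (Sequential PAttern Discovery using Equivalence classes) works on a vertical representation of the data. Each diagnosis is mapped to an id-list of (patient, visit) pairs, and the support of a longer sequence is obtained by temporally joining id-lists rather than rescanning the whole data set. The following minimal sketch shows only that core join for 2-item sequences. It runs on hypothetical toy data, and the diagnosis labels, the `patients` dictionary and all helper names are illustrative assumptions rather than the study's preprocessing or code. The full algorithm also performs same-visit (itemset) joins and recursively extends patterns within prefix-based equivalence classes.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy data (NOT study data): patient id -> age-ordered visits,
# each visit being the set of diagnoses recorded at that visit.
patients = {
    1: [{"DLD"}, {"ADHD"}, {"ADHD", "LD"}],
    2: [{"DLD"}, {"ASD"}],
    3: [{"DLD", "ASD"}, {"ADHD"}],
    4: [{"ASD"}, {"ADHD"}],
}

def vertical_idlists(db):
    """Vertical format: diagnosis -> list of (sid, eid) = (patient, visit index)."""
    idlists = defaultdict(list)
    for sid, visits in db.items():
        for eid, visit in enumerate(visits):
            for item in visit:
                idlists[item].append((sid, eid))
    return idlists

def temporal_join(a, b):
    """Id-list of the sequence <a, b>: occurrences of b at a strictly later
    visit than the earliest occurrence of a for the same patient."""
    first_a = {}
    for sid, eid in a:
        first_a[sid] = min(eid, first_a.get(sid, eid))
    return [(sid, eid) for sid, eid in b if sid in first_a and eid > first_a[sid]]

def support(idlist):
    """Support = number of distinct patients in the id-list."""
    return len({sid for sid, _ in idlist})

def frequent_2_sequences(db, min_sup):
    """Frequent single diagnoses first, then all ordered pairs of them via the join."""
    idl = vertical_idlists(db)
    frequent = {item: l for item, l in idl.items() if support(l) >= min_sup}
    result = {}
    for x, y in product(frequent, repeat=2):
        sup = support(temporal_join(frequent[x], frequent[y]))
        if sup >= min_sup:
            result[(x, y)] = sup
    return result
```

On this toy data, `frequent_2_sequences(patients, 2)` returns `{("DLD", "ADHD"): 2, ("ASD", "ADHD"): 2}`. For real analyses, an existing implementation such as `cspade` in the R package arulesSequences would normally be preferred over a hand-rolled version.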
Results: The discovered sequences were validated by cross-validation over 10 iterations; all
correlation coefficients for the support, confidence and lift measures were above 0.75,
and their proportions were similar. The Kolmogorov-Smirnov test found no significant
differences between the distributions of sequences.
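The measures named in the Results can be made concrete with a small sketch. For a sequential rule X => Y (diagnosis X at one visit, Y at a later visit), one common definition is: support = fraction of patients whose history contains <X, Y>; confidence = support(<X, Y>) / support(<X>); lift = confidence / support(<Y>). The two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between two empirical CDFs. The data, labels and helper names below are hypothetical, and the study's exact definitions may differ. `scipy.stats.ks_2samp` computes the same statistic together with a p-value.

```python
from bisect import bisect_right

# Hypothetical toy data (NOT study data): patient -> age-ordered visits.
patients = {
    1: [{"DLD"}, {"ADHD"}, {"ADHD", "LD"}],
    2: [{"DLD"}, {"ASD"}],
    3: [{"DLD", "ASD"}, {"ADHD"}],
    4: [{"ASD"}, {"ADHD"}],
}

def contains(visits, seq):
    """True if the items of seq occur in order, each at a later visit than the
    previous one (greedy earliest matching, one item per visit)."""
    pos = 0
    for visit in visits:
        if pos < len(seq) and seq[pos] in visit:
            pos += 1
    return pos == len(seq)

def rule_measures(db, x, y):
    """Support, confidence and lift of the sequential rule x => y."""
    n = len(db)
    sup_xy = sum(contains(v, [x, y]) for v in db.values()) / n
    sup_x = sum(contains(v, [x]) for v in db.values()) / n
    sup_y = sum(contains(v, [y]) for v in db.values()) / n
    confidence = sup_xy / sup_x
    lift = confidence / sup_y
    return sup_xy, confidence, lift

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_t |F_a(t) - F_b(t)|.
    The ECDFs are step functions, so checking the pooled sample points suffices."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sample, t):
        return bisect_right(sample, t) / len(sample)

    return max(abs(ecdf(a, t) - ecdf(b, t)) for t in a + b)
```

For example, `rule_measures(patients, "DLD", "ADHD")` gives a support of 0.5, a confidence of about 0.667 and a lift of about 0.889 on the toy data. In a cross-validation setting, such measures could be computed per fold and compared across folds, for instance with `statistics.correlation` (Python 3.10+) or `ks_statistic`.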
Conclusions: We have demonstrated the feasibility of using the SPADE algorithm to discover
valid temporal sequences of comorbid disorders in children with NDDs. Identifying such
sequences would be beneficial from both clinical and research perspectives. Moreover,
these sequences could serve as features for developing a full-fledged temporal predictive
model.
Keywords
Sequence mining - SPADE - neurodevelopmental disorders - comorbidity