Keywords: dental record, data quality, electronic health record
Background and Significance
The increasing availability of electronic health record (EHR) data is enabling significant
insights into the health profiles and treatment outcomes of diverse patient cohorts
in real-world clinical settings.[1] Harnessing EHR data for research can increase efficiency,[2][3]
lower costs,[2] include the study of all patients, and facilitate longitudinal studies that are
not possible with traditional methods.[4] In the last decade, electronic dental record (EDR) data
have been increasingly used for clinical research and quality improvement purposes in academic
settings and larger health care systems.[5][6][7][8][9][10] Studies in Europe and Canada have also
utilized longitudinal data from solo and group practices to assess treatment outcomes, such as
longevity of composite versus amalgam restorations.[11][12][13][14] These authors highlighted the
need for more studies from practice-based contexts
to study more-diverse patient cohorts and restorative procedures not performed in
well-controlled randomized controlled trials. To the best of our knowledge, no study
has determined the feasibility of utilizing EDR data in the United States to assess
treatment outcomes in routine community clinical settings where most people receive
their dental care.
When reusing EHR and EDR data for research, completeness and accuracy of data can
be important limitations, because these data are recorded for patient care, not research
purposes. It is well-recognized that data gathered for a specific purpose may not
be beneficial for another purpose.[15] Therefore, it is important to assess the appropriateness of
reusing the data for research and to understand their limitations. Moreover, with the application
of machine learning algorithms to EHR data in health care, it is important to examine the
underlying data for potential misclassification and missing data that may influence
the model. Studies in medicine have established several data quality measures to assess
EHR data.[16][17][18] Weiskopf et al. identified completeness, correctness, and currency as
fundamental measures of EHR data quality. According to them, completeness refers to “whether or
not a truth about a patient was present in the EHR.” Correctness is “closeness of
agreement between a data value and the true value,”[19] and currency indicates whether the data
are “representative of a patient state at a desired time of interest.” They also emphasized the
need to define correctness and completeness of EHR-derived datasets based on the context of the
study.[16]
A few studies have evaluated the completeness and correctness of EDR data in academic
institutions to identify gaps and improve dental students' training for patient care
documentation.[20][21][22] These studies indicate the need to establish a systematic process to evaluate the
data quality of EDR-derived datasets and to promote fidelity and reproducibility of
secondary data analysis.
Objectives
The primary objective of this study was to determine the feasibility of extracting
and utilizing EDR data from U.S. solo and small-group dental practices in the National
Dental Practice-Based Research Network[23] for clinical research. The second objective was to evaluate the completeness and
correctness of the data required to perform survival analyses of posterior composite
restorations (PCR) and root canal treatments (RCT) performed on permanent teeth in
network practices. To inform researchers and clinicians interested in leveraging EDR
data for research and quality improvement purposes, we also report in detail the process
followed to generate the dataset.
Methods
Practice Recruitment
We recruited network practices, which consisted of small-group and solo general dental
practices that met the following inclusion criteria: had used the Dentrix (Henry Schein
One, American Fork, Utah, United States) or EagleSoft (Patterson Dental, St Paul,
Minnesota, United States) EDR for at least 5 years; maintained electronic clinical information
on at least existing conditions and treatment performed; had placed at least one PCR on
a permanent tooth in at least 100 patients or performed at least one RCT on a permanent
tooth in at least 50 patients; had follow-up electronic data available for at least
2 years; and had performed either or both of these procedures between January
1, 2000 and October 31, 2015.
As part of enrolling in the network, practices completed an enrollment questionnaire regarding characteristics of their practice. The network's Regional Coordinators
identified eligible practices by reviewing data from the network's enrollment questionnaire
and contacted via email only those who used one of the two specified EDRs for at least
5 years. Upon receiving a response indicating interest, the coordinators confirmed
that interested eligible practices met the remaining study inclusion criteria. The
practices were provided a brochure that described the study and the process required to share de-identified data with researchers. If a practice had additional questions
or concerns that the regional coordinators could not address, the principal investigator
(T.P.T.) responded via email or telephone. Once practices agreed to participate, they
signed the network-specific consents and necessary data sharing agreements with Indiana
University (IU).
Data Transfer
Once consented, practices contacted their respective EDR vendors (Dentrix or EagleSoft)
to share de-identified data based on the study criteria with the research team ([Fig. 1]). After
obtaining appropriate permissions from the practice owner, the vendors securely
accessed and extracted the practice's data and generated de-identified databases containing
the study data. The databases included de-identified treatment records of patients
who had a PCR or RCT performed on permanent teeth through October 31, 2015. [Supplementary Appendix Table S1]
(available in the online version) lists the American Dental Association (ADA) Code
on Dental Procedures and Nomenclature (CDT) codes for PCR and RCT used to identify
the study records.[24] We included only procedures whose recorded code ended in the four digits
of one of these codes. The PCR codes include only resin or resin-based restorations placed on
posterior teeth ([Supplementary Appendix Table S1], available in the online version). Codes 2385
to 2388 are retired codes that were used for PCR, whereas codes 2391 to 2394 are current codes
used for PCR as well as for restorations that used glass ionomer or other resin-based restorative
materials. The vendors de-identified the data, which included offsetting the dates and redacting
all identifiers according to the Health Insurance Portability and Accountability Act (HIPAA)
Privacy Rule, and transferred the study data to researchers through an encrypted online portal
maintained by IU.
Fig. 1 Flow diagram demonstrating the steps involved in (A) extracting the electronic dental record data from 99 practices and transferring
them to the research team and (B) generating the final study dataset for analysis.
Data Query and Creation of the Study Dataset
To aggregate data from Dentrix and EagleSoft databases, we first determined the tables
and fields that contained the variables necessary to perform our research. These variables
included demographic information, such as patients' gender, age (date of birth [DOB]),
and the presence of dental insurance anytime during patients' dental care; provider
information; and dental charting information regarding existing services, conditions,
and completed CDT procedure codes at the tooth and tooth surface level. We wrote custom
software scripts and Structured Query Language (SQL) queries to extract relevant study variables
from the de-identified databases and stored the extracted data from each practice
as a text file ([Fig. 1B]). The individual data files were then loaded into a central repository
(relational database) for use by the study team.
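The loading step can be sketched as follows. This is a minimal illustration, not the study's actual scripts: it assumes tab-delimited text files with a header row, and the table name, column names, and the functions `create_repository` and `load_practice_file` are all hypothetical. It uses SQLite from the Python standard library only because it needs no setup; the text does not say which relational database the study used.

```python
import csv
import io
import sqlite3

# Hypothetical column layout for each per-practice extract (not from the study).
COLUMNS = ["patient_id", "tooth", "surface", "cdt_code", "proc_date"]


def create_repository(conn):
    """Create the central table that holds rows from every practice."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS procedures ("
        "practice_id TEXT, patient_id TEXT, tooth TEXT, "
        "surface TEXT, cdt_code TEXT, proc_date TEXT)"
    )


def load_practice_file(conn, practice_id, handle):
    """Append one practice's tab-delimited extract and return the row count."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = [
        (practice_id, *(row.get(col, "") for col in COLUMNS))
        for row in reader
    ]
    conn.executemany("INSERT INTO procedures VALUES (?, ?, ?, ?, ?, ?)", rows)
    return len(rows)


# Example with an in-memory database and an in-memory "file".
conn = sqlite3.connect(":memory:")
create_repository(conn)
sample = io.StringIO(
    "patient_id\ttooth\tsurface\tcdt_code\tproc_date\n"
    "P1\t30\tMO\tD2392\t2010-01-05\n"
    "P2\t7\t\tD3310\t2011-03-09\n"
)
loaded = load_practice_file(conn, "practice_A", sample)
```

Tagging every row with a practice identifier keeps the source of each record traceable after the files are pooled, which matters when variation across practices is examined later.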
Data Quality Assessment
Guided by the work of Weiskopf et al,[16][17] we defined two dimensions of data quality that we
could assess: completeness and correctness of the EDR data variables needed to perform survival
analysis of two treatments, RCT and PCR, performed on permanent teeth.
For data completeness, we assessed the percentage of missing data for the variables
of interest: patient identification number, DOB, gender, tooth number, date of
procedure, and, for PCR procedures, tooth surface. Next, we removed all PCR and RCT
procedures that were recorded as “existing services” because they had been performed at
another practice.
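The completeness measure described above reduces to a per-variable missing-data percentage. The sketch below is one way to compute it; the field names and the helper functions `is_missing` and `percent_missing` are hypothetical, and treating empty or whitespace-only strings as missing is an assumption rather than a rule stated in the study.

```python
# Hypothetical field names for the variables of interest.
REQUIRED_FIELDS = ["patient_id", "dob", "gender", "tooth", "proc_date"]


def is_missing(value):
    """Treat None and blank strings as missing (an assumption)."""
    return value is None or str(value).strip() == ""


def percent_missing(records, fields):
    """Return {field: percentage of records in which the field is missing}."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: 100.0 * sum(is_missing(r.get(field)) for r in records) / total
        for field in fields
    }


# Example: one of four records lacks a tooth number.
records = [
    {"patient_id": "P1", "dob": "1980-01-01", "gender": "F", "tooth": "30", "proc_date": "2010-01-05"},
    {"patient_id": "P2", "dob": "1975-06-30", "gender": "M", "tooth": "", "proc_date": "2011-03-09"},
    {"patient_id": "P3", "dob": "1990-11-12", "gender": "F", "tooth": "19", "proc_date": "2012-07-21"},
    {"patient_id": "P4", "dob": "1968-02-02", "gender": "M", "tooth": "3", "proc_date": "2009-09-14"},
]
missing = percent_missing(records, REQUIRED_FIELDS)
```

Completeness for a variable is then 100 minus its missing percentage.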
Although we could not directly compare the data with the actual patients, we defined
the data as reasonably correct if their distribution matched that of the overall
population in previous studies.[25][26] We evaluated DOB, tooth number, tooth surface, and
procedure codes for RCT and PCR. We first created frequency distributions of each variable
and found that each was comparable to the population. In addition, we removed data that were
not reasonable, using the following criteria. A patient's DOB was assessed as incorrect
if the patient's calculated age was less than 0 years or greater than 100 years. A
tooth number was considered incorrect if it was outside the range 1 to 32 or if it was
recorded as a tooth range (e.g., 2–8). Observations that represented
tooth surfaces other than facial (F), buccal (B), mesial (M), distal (D), lingual
(L), and occlusal (O) were considered incorrect. For procedure codes, we determined
the correctness of PCR and RCT codes by checking them against the respective tooth type.
For example, a PCR procedure code was defined as incorrect
if the procedure was performed on an anterior tooth. An RCT code was defined as incorrect
if it was entered for the same tooth within 90 days of an earlier RCT code, because we
considered such an entry a follow-up to the first RCT on that tooth.
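The correctness criteria above can be written as small rule functions. The sketch below is illustrative only: all function names are hypothetical, the anterior tooth numbers (6 to 11 and 22 to 27) follow the Universal numbering system, and measuring the 90-day window from the most recently kept RCT entry is an assumption, since the text does not specify how chains of repeated entries were handled.

```python
from datetime import date

# Universal numbering: anterior teeth are 6-11 (maxillary) and 22-27 (mandibular).
ANTERIOR_TEETH = set(range(6, 12)) | set(range(22, 28))
VALID_SURFACES = set("FBMDLO")


def parse_tooth(raw):
    """Return a permanent tooth number (1-32), or None if the entry is invalid.

    Letters (primary teeth), ranges such as "2-8", and numbers above 32
    (supernumerary teeth) all come back as None.
    """
    text = str(raw).strip()
    if not (text.isascii() and text.isdigit()):
        return None
    number = int(text)
    return number if 1 <= number <= 32 else None


def dob_is_plausible(dob, procedure_date):
    """A DOB is plausible when the implied age is between 0 and 100 years."""
    age_years = (procedure_date - dob).days / 365.25
    return 0 <= age_years <= 100


def surfaces_are_valid(surfaces):
    """Accept only non-empty strings made of the letters F, B, M, D, L, O."""
    letters = str(surfaces).strip().upper()
    return bool(letters) and set(letters) <= VALID_SURFACES


def pcr_is_valid(tooth_raw, surfaces):
    """A PCR record needs a posterior permanent tooth and valid surfaces."""
    tooth = parse_tooth(tooth_raw)
    return (
        tooth is not None
        and tooth not in ANTERIOR_TEETH
        and surfaces_are_valid(surfaces)
    )


def drop_rct_followups(rct_rows, window_days=90):
    """Keep the first RCT per (patient, tooth); drop repeats within the window.

    rct_rows is an iterable of (patient_id, tooth, procedure_date) tuples.
    """
    kept = []
    last_kept = {}
    for patient_id, tooth, proc_date in sorted(rct_rows, key=lambda r: r[2]):
        key = (patient_id, tooth)
        previous = last_kept.get(key)
        if previous is not None and (proc_date - previous).days <= window_days:
            continue  # treated as a follow-up visit for the same RCT
        last_kept[key] = proc_date
        kept.append((patient_id, tooth, proc_date))
    return kept


# Example: a repeat entry 31 days after the first RCT on tooth 30 is dropped.
example = drop_rct_followups([
    ("P1", 30, date(2010, 1, 1)),
    ("P1", 30, date(2010, 2, 1)),
])
```

Keeping each rule in its own function makes it easy to count how many observations each criterion removes, which is the kind of breakdown a flow diagram of exclusions needs.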
Data Analysis
We summarized the practice and patient characteristics using descriptive analysis.
We performed a descriptive analysis of patients' age and gender based on their having
received PCR, RCT, or both PCR and RCT treatments. For this analysis, we included
only teeth 1 to 32 according to the Universal/National numbering system used in the
United States.[27] Patient age was calculated as the difference between the date of a patient's
first PCR and/or RCT procedure and the patient's DOB. We calculated the observation time as the
number of years between a patient's first and last visit in a dental office. Patient characteristics
were analyzed at the patient level, while PCR and RCT information was analyzed at
the tooth level (tooth type, arch location [maxilla, mandible]); PCR procedures were also
analyzed at the tooth surface level.
Results
Practices Recruited
We recruited 99 network practices that used the Dentrix or EagleSoft EDR and that shared
de-identified data of patients who received RCT and/or PCR on permanent teeth through
October 31, 2015. Fifty-seven practices used Dentrix; the remaining 42 practices used
EagleSoft. [Table 1] displays the number of practices that shared their data across the six
network regions and the number of RCT and PCR procedures present in each region. Given the
geographic distance between practices, we determined it was highly unlikely that practices shared
patients.
Table 1
Number of study practices and number of teeth with posterior composite restoration (PCR)
and root canal treatment (RCT) procedures across the six network regions

Network region          PCR by number of surfaces                                   RCT by tooth type
(number of practices)   One-surface    Two-surface    Three or more  Total          Anterior       Bicuspid       Molar          Total
Western (16)            54,406         50,343         23,780         128,529        1,460          2,908          4,415          8,783
Midwest (13)            43,403         37,315         15,390         96,108         1,938          3,371          4,701          10,010
Southwest (23)          81,792         77,126         23,925         182,843        2,585          4,885          7,575          15,045
South Central (16)      88,510         70,631         28,585         187,726        3,188          5,102          7,344          15,634
South Atlantic (16)     51,926         46,216         17,493         115,635        3,089          4,689          5,371          13,149
Northeast (15)          51,900         43,639         22,737         118,276        1,880          2,752          4,005          8,637
Total (99)              371,937        325,270        131,910        829,117        14,140         23,707         33,411         71,258
Practice Characteristics
The distribution of practice types, as reported in the enrollment questionnaire, is shown in [Table 2].
Only two providers reported being Hispanic, and most providers were male (n = 81). The mean age
of the providers was 49.4 years, with a mean of 22.4 years of clinical experience, based on the
number of years since graduation from dental school. A total of 96 of the 99 practices provided
information on the enrollment questionnaire regarding the age distribution ([Table 2]) and
insurance coverage of their patients.
Table 2
Characteristics of participating practices

Practice type                                   Number reported[a] (n = 119)
General practitioner                            98
Pediatric dentist                               4
Endodontist                                     3
Oral/Maxillofacial surgeon                      1
Orthodontist                                    5
Periodontist                                    4
Prosthodontist                                  4

Race/ethnicity of practice provider             Number (n = 98)
White                                           83
African American                                2
American Indian or Alaska Native                1
Asian                                           8
Other                                           4

Age distribution of patients in 96 practices    Percentage of patients
1–18 y                                          18.8
19–44 y                                         29.5
45–64 y                                         33.3
65 y or older                                   18.4

Patients' dental visit characteristics          Percentage of patients
Regular                                         67.9
Irregular                                       15.1
Emergency only                                  10
Only one visit                                  7

[a] Numbers do not add to 99 because some practices reported multiple specialties in
addition to general dentistry.
Table 3
Number (%) of patients by observation time between the first and last visit (N = 217,887)

Time in years       Number of patients (%)
No follow-up        32,922 (15.1)
Up to 5 y           91,289 (41.9)
>5 and ≤10 y        48,195 (22.1)
>10 and ≤15 y       31,078 (14.3)
>15 and ≤20 y       11,989 (5.5)
>20 y               2,414 (1.1)
The reported race/ethnicity of patients, based on enrollment questionnaire data, was
mostly white (71.9%), followed by African American (12.3%), Asian (7.5%), American
Indian (1.7%), and Hawaiian/Pacific Islander (0.9%). A total of 11.94% of patients
were reported as being Hispanic. Gender percentages of patients were not reported
in the enrollment questionnaire. The practices reported that 65% of their patients had private
insurance, 9% had public insurance, 24% had no insurance, and 2% received a reduced fee. Finally,
these practices reported the visit characteristics of their patients ([Table 2]).
Data Quality and the Final Dataset
[Fig. 2] demonstrates the steps involved in assessing the data quality and generating the
final dataset to perform descriptive and survival analysis. As shown in the figure,
the final dataset consisted of 217,887 patients and 11,289,594 procedures. These patients'
data included 0 to 37 years of observation time. As [Table 3] shows, approximately 15% of the
patients did not return after their first PCR or RCT. Forty-two percent of the patients had up
to 5 years of observation time, and approximately 1% of the patients had more than 20 years of
observation time after their first PCR or RCT.
Fig. 2 Generating the final dataset. CDT, Code on Dental Procedures and Nomenclature;
DOB, date of birth; PCR, posterior composite restorations; RCT, root canal treatment.
Data Completeness
Gender was documented for all patients; race/ethnicity was missing in the EDR data
from both systems. The completeness of variables needed to perform survival analyses
of PCR was 99.73%, with 2,480 observations missing tooth number, tooth surfaces,
or date of treatment. The completeness of variables needed to perform survival analyses
of RCT was 99.61%, with 284 observations missing tooth number or date of treatment.
Of the 217,887 patients, 159,028 (73%) had documented insurance information
at least once during their dental care history; the remaining patients (27%) lacked
documentation of any insurance information. Separately, 57,685 (approximately 27%) of the
217,887 patients were recorded as not having any insurance coverage during their dental care.
Data Correctness
Missing Teeth, Primary Teeth, and Supernumerary Teeth for PCR
We considered the tooth number field incorrect for PCR procedures if the tooth number was
missing, indicated a primary tooth, indicated a supernumerary tooth (a number larger than 32),
or indicated an anterior tooth. [Fig. 2] and [Supplementary Appendix Table S2] (available in the
online version) list the number of observations with missing tooth numbers, primary teeth, and
supernumerary teeth. The primary teeth and supernumerary teeth could have been present in the
treatment history of patients younger than 18 years who had a PCR and/or RCT on a permanent
tooth. [Fig. 2] shows the number of observations in which a PCR code was incorrectly applied to
anterior tooth restorations and the number of observations with missing tooth surface
information. The final dataset included 829,117 PCR observations in 203,212 patients, as shown
in [Fig. 2].
Missing Teeth, Primary Teeth, Supernumerary Teeth, and Tooth Ranges for RCT
We found instances of missing tooth numbers, primary tooth numbers (B, E, F, K, O, and T: one
each), tooth ranges, which typically indicate multitooth procedures, and supernumerary teeth
([Fig. 2] and [Supplementary Appendix Table S3], available in the online version). We removed
these 324 observations. As [Fig. 2] shows, 1,344 RCT observations were entered for the same tooth
in the same patient within 90 days; we considered these the continuation of the same procedure
on the same tooth. After removing these observations, the final dataset included 71,258 RCT
observations in 46,700 patients.
We also found 949 observations in which the current version of the CDT codes for RCT was
coded incorrectly by tooth type. [Supplementary Appendix Table S4] (available in the online
version) lists the number of RCT observations coded for the wrong tooth type. This error may
have occurred because of changes in the CDT codes for RCT. The present RCT codes are 3310 for
anterior teeth, 3320 for bicuspids, and 3330 for molars. Previously, RCT CDT codes were based on
the number of canals, with 3310 for a tooth with one canal, 3320 for a tooth with two
canals, and 3330 for a tooth with three or more canals. We fixed this error by recoding
the RCT procedure for each tooth type according to the tooth number listed in the
dataset, which is 100% reliable ([Supplementary Appendix Table S4], available in the online
version). For example, tooth number 7, which is the upper right lateral incisor, is matched to
code 3310 for an anterior tooth.
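The recoding rule can be sketched as a lookup from tooth number to tooth type to RCT code. The tooth groupings follow the Universal numbering system, the three codes are those named in the text, and the function names `tooth_type`, `arch`, and `recode_rct` are hypothetical.

```python
# Universal numbering system groupings for permanent teeth 1-32.
ANTERIOR = set(range(6, 12)) | set(range(22, 28))
BICUSPID = {4, 5, 12, 13, 20, 21, 28, 29}

# Current tooth-type-based RCT codes, as described in the text.
RCT_CODE_BY_TYPE = {"anterior": "3310", "bicuspid": "3320", "molar": "3330"}


def tooth_type(tooth_number):
    """Classify a permanent tooth as anterior, bicuspid, or molar."""
    if not 1 <= tooth_number <= 32:
        raise ValueError(f"not a permanent tooth number: {tooth_number}")
    if tooth_number in ANTERIOR:
        return "anterior"
    if tooth_number in BICUSPID:
        return "bicuspid"
    return "molar"


def arch(tooth_number):
    """Teeth 1-16 are maxillary; teeth 17-32 are mandibular."""
    if not 1 <= tooth_number <= 32:
        raise ValueError(f"not a permanent tooth number: {tooth_number}")
    return "maxilla" if tooth_number <= 16 else "mandible"


def recode_rct(tooth_number, recorded_code):
    """Return (code implied by the tooth number, whether it differs from the record)."""
    expected = RCT_CODE_BY_TYPE[tooth_type(tooth_number)]
    return expected, expected != recorded_code


# Example from the text: tooth 7 (upper right lateral incisor) maps to 3310.
corrected, was_changed = recode_rct(7, "3320")
```

Returning the mismatch flag alongside the corrected code makes it possible to count how many records were recoded, rather than silently overwriting them.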
In our examination of the CDT codes, we discovered that individual practices could
enter the CDT codes as free text, which created variation in how CDT codes were recorded.
These differences included the use of letters either before or after the code
([Supplementary Appendix Table S5], available in the online version). In some cases, these
altered codes still indicated the same procedure, while in others, the letters indicated a
separate but related procedure. We determined that using all CDT codes that ended in the proper
four-digit numerical code would be equivalent to using the four-digit code alone because some EDR
systems coded CDT codes using a “D” or “0” at the beginning. Because the data spanned many
years, we also had to account for multiple versions of CDT codes.
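The suffix-matching rule can be illustrated with a short normalization function. This is a sketch under stated assumptions: the code sets contain only the codes named in the text (the supplementary table may list others), the function name `study_code` is hypothetical, and entries with trailing letters are excluded because such letters sometimes indicated a separate procedure.

```python
import re

# Codes named in the text; the full study list is in the supplementary table.
PCR_CODES = {"2385", "2386", "2387", "2388", "2391", "2392", "2393", "2394"}
RCT_CODES = {"3310", "3320", "3330"}
STUDY_CODES = PCR_CODES | RCT_CODES

# Four digits at the very end of the entry, whatever precedes them.
_TRAILING_FOUR_DIGITS = re.compile(r"(\d{4})$")


def study_code(raw):
    """Return the four-digit study code that a free-text entry ends in, or None."""
    match = _TRAILING_FOUR_DIGITS.search(str(raw).strip())
    if match is None:
        return None
    code = match.group(1)
    return code if code in STUDY_CODES else None


# Examples: vendor prefixes are tolerated, trailing letters are not.
normalized = [study_code(entry) for entry in ["D2391", "02392", "3330", "2391A", "D1110"]]
```

A rule this permissive also accepts any other prefix that happens to precede the four digits, so in practice the distinct raw values it matches would need to be reviewed by hand.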
Final Dataset
The final dataset consisted of 217,887 unique patients with 900,375 observations for
PCR and RCT procedures. These 217,887 patients had a total of 11,289,594 observations
after including all dental visits. [Fig. 2] and [Supplementary Appendix Table S6] (available in
the online version) show the number of patients by age and gender who received PCR, RCT, and
both PCR and RCT. Nineteen percent of patients were younger than 18 years, 46% were aged 18 to
44 years, 27% were aged 45 to 64 years, and 8% were 65 years or older.
PCR (N = 829,117) were performed almost equally on the maxillary and mandibular teeth
([Supplementary Appendix Table S7], available in the online version). Sixty-six percent of the
PCR were performed on molars, 52% of which were mandibular molars. Among the PCR performed on
bicuspids, 55% were performed on the maxillary teeth. For the RCT (N = 71,258), 55% were
performed on the maxillary teeth; 47% were performed on molars, followed by bicuspids and
anterior teeth. Of the molar RCT, 57% were performed on mandibular molars; of the bicuspid RCT,
59% were performed on maxillary bicuspids; and of the anterior RCT, 76% were performed on
maxillary anterior teeth ([Supplementary Appendix Table S7], available in the online version).
Discussion
This study demonstrated the feasibility of utilizing EDR data integrated from multiple
distinct solo and small-group network practices for longitudinal studies to assess
treatment outcomes. We established a process to extract de-identified data from practices
that used two different EDR systems and assessed the data completeness and correctness
needed to perform survival analysis of RCT and PCR. Major findings of this investigation
include the near 100% completeness and high percentage of correct data; characterization
of the incorrect data that may occur for these data types; the presence of long observation
times for patients who received a PCR or RCT; the unavailability of race/ethnicity data;
and the ability to study insured and uninsured populations of patients who sought
treatment in these practices. Through this study, we intended to highlight the importance
of understanding the data before analysis to identify the biases that could occur
due to the health care recording process.
As [Fig. 2] demonstrates, less than 1% of the patients had a missing DOB, and less than 0.1%
had an incorrect DOB. Approximately 8% of PCR observations had incorrect tooth numbers
([Supplementary Appendix Table S2], available in the online version), which were mostly primary
or supernumerary tooth numbers. They were considered incorrect because we intended to limit
survival analysis to permanent teeth. The rates of missing tooth numbers, incorrect/missing
tooth surface information, and incorrect entry of PCR/RCT codes ([Fig. 2]) were low, at less
than 1%. We also observed little variation in these data across
practices. These results demonstrate that EDR data from solo and small-group practices
in the United States can be utilized to study patient populations in real-world settings
and to assess the longevity of dental procedures at the practice level. Moreover,
we can analyze differences in outcomes across different geographic regions.
A major concern regarding the use of EDR data for research is the loss of patients
following the initial patient visit to the dental office. Our results indicated that
85% of patient records had some observation time after the first PCR or RCT: 42% had up to
5 years, 22% had more than 5 and up to 10 years, 14% had more than 10 and up to 15 years,
5.5% had more than 15 and up to 20 years, and approximately 1% had more than 20 years
([Table 3]). Having access to information over such long periods enables researchers to perform
longitudinal studies such as survival analysis at much lower cost than prospective
clinical studies. Only 15% of patient records did not have observations following
the initial date of performing a PCR or RCT. This finding is comparable to reports
that private practices experience approximately 17% patient attrition annually[28] due to
patients changing dental practices. Our results indicate the potential for researchers and
practitioners to use EDR data to study long-term outcomes of various
treatments, which has not been possible previously.
In contrast to claims data, this dataset provided access to both insured and noninsured
populations, permitting us to study treatment outcomes in patients without dental
insurance coverage. In this study, we retrieved dental insurance information for 73% of the
patients; only 27% of the patient records lacked insurance documentation. Also,
27% of the patients did not have insurance coverage (coincidentally the same percentage
as those who did not have any information about coverage), which is consistent with
existing reports of dental insurance coverage among U.S. adults. We believe insurance
information was not available specific to the date of treatment because it was
stored as a patient characteristic in the EDR systems. Therefore, obtaining de-identified
data without patient identifiers may have prevented us from obtaining treatment-specific
insurance coverage. Previous studies utilizing EDR data have worked with limited datasets
that included actual dates of treatment and birth.[8][10] Further work is needed to obtain
treatment-specific insurance coverage because finances
play a major role in the patient's decision to receive dental treatment.
As with any study, we encountered some limitations. First, race and ethnicity were
not available in these EDR data. The vendors confirmed that their EDR systems were
not designed to capture race and ethnicity. In contrast, this information has been reported in
EDR data-based studies from academic and large health care systems. While race and ethnicity
may not be essential information for patient care purposes, future studies will likely
need to examine differences among demographic groups to evaluate health
disparities and health outcomes for subpopulations.[29] Second, we did not assess the decayed,
missing, and filled teeth or caries risk status because these data may be recorded as structured
data or in clinical notes, requiring more data processing. Future work should consider
retrieving patient and practitioner characteristics that can be derived from EDR data, such as
oral hygiene and the frequency and history of performing certain procedures. Also, a better
understanding of documentation practices is warranted because the potential for variation in
documenting diagnoses and findings is high across practices. In this study, we focused on the
retrieval and completeness of data essential to perform survival analyses of treatments.
Finally, network members are not recruited randomly, so factors associated with network
participation (e.g., an interest in clinical research) may make network dentists unrepresentative
of dentists at large. While we cannot assert that network dentists are entirely representative,
we can state that they have much in common with dentists at large, while also offering
substantial diversity in these characteristics.[30][31] This assertion is warranted because
(1) substantial percentages of network general dentists are represented in the various response
categories of the characteristics in the enrollment questionnaire; (2) findings from several
network studies document that network general dentists report patterns of diagnosis and
treatment that are similar to patterns determined from non-network general
dentists;[32][33][34][35] and (3) network dentists are similar to non-network dentists on
characteristics reported in the 2010 ADA Survey of Dental Practice.[36]
Conclusion
Despite these drawbacks, this study demonstrated the feasibility of leveraging EDR
data to establish a learning health system for practitioners to gain insights about
their patients' treatment outcomes. We also established a process for solo and
small-group practices to share their data for research and to learn about treatment outcomes.
Thus, this work has laid the groundwork to establish a clinical data warehouse of
solo and small-group practice data similar to the Big Mouth data[37] and virtual data warehouse
repositories[38] that include dental data from academic and health care system settings,
respectively, in the United States.
Clinical Relevance Statement
This study established a patient cohort using electronic dental record (EDR) data
from multiple community practices who use different EDR systems to assess the longevity
of two commonly performed dental treatments. We also describe methods to assess the
completeness and correctness of the data. The results from this study laid the groundwork
to establish a learning health system that enables practitioners to learn about their
patients' outcomes by utilizing data from their own practice.
Multiple Choice Questions
What are the limitations of reusing electronic health record (EHR) and electronic
dental record (EDR) data for research?
a. Difficulty accessing data.
b. Lack of structured data.
c. Questionable completeness and accuracy of data.
d. Absence of data quality measures.
Correct Answer: The correct answer is option c, questionable completeness and accuracy of data. The
data are recorded for patient care and not for research. Therefore, data recorded
for one purpose may not be fit for use for another purpose. Thus, reuse of EHR and EDR
data has certain limitations.
What is the main advantage of using electronic dental record data from community practices
for clinical research?
a. Practitioners learn about different treatment outcomes.
b. Identify possible reasons for treatment failures.
c. Enable study of diverse patient cohorts undergoing different treatments.
d. Learn patient's adherence to treatment.
Correct Answer: The correct answer is option c, enable study of diverse patient cohorts undergoing
different treatments. Studies using electronic dental data from community practices
offer an opportunity to include diverse patient cohorts and treatments provided or
performed in real-world settings.