Establishing a Data-Sharing Environment for a 21st-Century Academic Health Center

Abstract
Objective: The main purpose of this study was to establish a seamless clinical data sharing system in a new medical school in partnership with community health systems.
Methods: We developed a Data Request Management System (DRMS) and a data request process to streamline access to and management of data for quality improvement, population health, and research. We used a four-pronged methodology in implementing our clinical data sharing system: data governance, data extraction, external relationships, and internal engagement.
Results: Through established relationships, data use agreements, data request processes, and the DRMS, the Data Core team of honest data brokers processed more than 50 data requests from all departments during its first year of operation. The DRMS application and the supporting governance and relationships provided a platform for improving the process and accuracy of data sharing by facilitating trust, transparency, standardization, and service provisioning.
Conclusion: Developing a seamless data ecosystem that forms the basis of a learning health system between an academic health center and community health systems requires a combination of people (the Data Core team), processes (common data request process policies and procedures), and technology (an effective online DRMS). Future work is needed to measure the impact of the clinical data sharing system on the efficiency and accuracy of data sharing.

seek to integrate healthcare data across organizations for point of care delivery, many barriers still exist limiting integration of data across organizations for public and population health management and health services research. 19 The unique circumstances of launching the Dell Medical School without an initial clinical practice for the school led to a novel development of a Data Core team centralized at the medical school and a unified data request system (DRMS) with distributed access to data from multiple healthcare organizations with disparate data governance, systems, and data structures.

Objectives
As a 21st-century medical school created with strong support from the community, the Dell Medical School has a mission to improve the health of its community's population. 20 The School has set the audacious goal of "rethinking health." This includes developing new models of patient-centered delivery, value-based care, innovative medical training curriculum, and community-based population health strategies. 21 However, each of these changes has to be driven by good data, not only to plan interventions and strategies but also to iteratively fine-tune them and measure their impact. The objective of the School's data strategy is to assess the existing data ecosystem, understand structures and schema of community health data systems, and help develop a seamless health data sharing environment.
The Dell Medical School interfaces with multiple healthcare organizations with whom it shares patients and providers. These care partners include a large network of hospitals and specialty clinics linked to a national nonprofit health system, federally qualified health centers, the local mental health authority, the county healthcare district, emergency medical services, and others. All have implemented and maintain their own electronic health systems along with their own clinical data warehouses. Even within organizations, there may exist multiple electronic health record (EHR) implementations (inpatient vs. outpatient), databases for legacy systems used prior to more recent EHR implementations, multiple interfaced clinical applications, and multiple custom datasets for quality reporting. In short, the electronic data ecosystem is quite diverse with multiple EHR systems (Cerner, Meditech, Next-Gen, Epic, eClinicalWorks, AthenaHealth, etc.) and highly fragmented clinical data sources resulting in significant challenges in data aggregation and linkages across the healthcare providers.

Methods
Whether through the lessons learned from Beacon Community Programs 22,23 funded by the HITECH 2009 federal strategy 24 or through other community-based quality improvement efforts, 25,26 the importance of having a data sharing ecosystem has been repeatedly established as a prerequisite for population health improvement. 27-29 Based on lessons learned from others' experiences in setting up data access and management services, 17 including PCORNet 14,29 and Clinical and Translational Science Awards (CTSAs), 12,13 the Dell Medical School developed a Data Core, a centralized team of data brokers to access, curate, integrate, and manage data for healthcare delivery, quality improvement, clinical health services, and population health research.
A single centralized Data Core has several advantages over placing smaller teams for data management in multiple departments or units of the School. Depending on the question being investigated, there are many sources of data that may be used independently or may require integration across common reference points such as patients or locations. These data sources include multiple disparate healthcare organizations that independently generate and store clinical and administrative data pertaining to the delivery of health care; claims data processed and stored by federal, state, local government, or private payers; social determinants of health including environmental data collected by monitoring agencies or services provided by local social services; demographic and community data from surveys and census; and data generated by patients and caregivers themselves. Focusing energy and efforts in a centralized Data Core allows a team of data managers to develop expertise about these various data sources and how they can be linked, build relationships with data teams in partner organizations, bring efficiency to the process by following a common set of policies and procedures, cross-train each other on skills and knowledge, and back up each other.
This data strategy is based on a phased approach, the first phase focusing on accessing relevant data wherever they are stored. This requires understanding these data sources, establishing mutually beneficial relationships with them, streamlining data access processes, and building extract, transform, and load scripts in Structured Query Language (SQL) to provide data to requestors. The second phase aggregates linked data centrally so less effort is needed to connect data from different sources. The second phase also includes extending beyond healthcare data to access data on social determinants of health such as air quality, housing, and education. 30 The third phase of this strategy uses insights from these aggregated data to develop informatics clinical decision support tools, providing timely information when and where it is needed. 31
We began by developing a Data Request Management System (DRMS) based on prior studies of using CTSA data repositories for data requests by researchers. We also examined existing data request forms and processes in our partner healthcare institutions to develop a more inclusive and comprehensive data request form that would support both clinical and research requests. The resulting DRMS request form (►Figs. 1 and 2) includes the purpose of the request, healthcare sites that are the sources of clinical data to be extracted, sources of non-clinical data to be included, expected outcomes to guide the data managers to define specific fields, institutional review board (IRB) status for research, methods to address data privacy, security, and data sharing agreements, and the contact at the partner institution. The results gathered through this form serve as the basis for a conversation between the Data Manager and the requestor, where the former can gain insight into what the requestor would like to achieve and the latter can gain insight into the structure and availability of the data. Currently, we make data files available to data requestors for their use; we are in the process of setting up a secure environment where the researcher can bring analytic tools to bear on the data without delivering data outside the environment.
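To make the contents of the request form concrete, the following is a minimal sketch of how one request might be held as a JSON-serializable record with a few completeness checks before a data manager picks it up. The class name, field names, and validation rules are illustrative assumptions; the article lists the kinds of information collected but does not publish the DRMS schema.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class DataRequest:
    """Hypothetical shape of one request; every field name is an assumption."""
    purpose: str                       # e.g., "research" or "quality improvement"
    clinical_sites: List[str]          # partner sites whose clinical data are requested
    nonclinical_sources: List[str]     # e.g., air quality or census data
    expected_outcomes: str             # guides the data manager in choosing fields
    irb_status: Optional[str]          # required here only for research requests
    privacy_and_security_plan: str
    partner_contact: str

    def problems(self) -> List[str]:
        """Return human-readable issues to raise with the requestor."""
        issues = []
        if not self.clinical_sites and not self.nonclinical_sources:
            issues.append("no data source selected")
        if self.purpose == "research" and not self.irb_status:
            issues.append("research requests need an IRB status")
        if not self.expected_outcomes.strip():
            issues.append("expected outcomes are empty")
        return issues


request = DataRequest(
    purpose="research",
    clinical_sites=["community hospital network"],
    nonclinical_sources=[],
    expected_outcomes="Count of adults with two or more emergency visits in one year",
    irb_status="approved",
    privacy_and_security_plan="Aggregate counts only; no identifiers leave the partner",
    partner_contact="partner data steward",
)

# Serialize the request as a JSON document, ready to store or send.
document = json.dumps(asdict(request), indent=2)
```

Calling `request.problems()` on an incomplete form returns the list of gaps, which mirrors the conversation between data manager and requestor that the form is meant to start.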
We designed the platform interface for ease of use, standardization of request format, and transparency of process. For the technically inclined, we developed the DRMS as a web-based platform built on Representational State Transfer (REST) application programming interfaces (APIs) that transfer data formatted as JavaScript Object Notation (JSON). Data are stored as JSON documents in a NoSQL database (MongoDB) residing on HIPAA-compliant Amazon Web Services (AWS) instances. The Data Core uses the DRMS in the data request processes as a bridging technology. The data request process of the DRMS was designed to enhance transparency of data management to both requestors of data and our partner healthcare institutions. For example, we guide communication between both sides to avoid the kind of mutual frustration that previously arose when an exchange ran along these lines: "We want all relevant data for our study" and "What do you mean by 'all': from where, for which patients, for what time frame, etc.?" Data managers provide a centralized source of expertise in defining clinical cohorts, understanding the pitfalls of the data (e.g., diagnoses of congestive heart failure are often wrong), and customizing cohorts based on existing variations of definitions to optimize the precision or recall of a query by using diagnoses, laboratory results, and notes. 32 The system is continually improved based on user feedback, as we strive to improve upon this communication flow.
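The trade-off between precision and recall in a cohort definition can be illustrated with a small worked example. This is a sketch on made-up patient identifiers, not data or code from the Data Core: it assumes a chart-reviewed reference set and compares a broad definition (any matching diagnosis code) with a narrow one (diagnosis code plus a supporting laboratory value).

```python
from typing import Set, Tuple


def precision_recall(flagged: Set[str], confirmed: Set[str]) -> Tuple[float, float]:
    """Precision: share of flagged patients who truly have the condition.
    Recall: share of true cases that the definition managed to flag."""
    true_positives = len(flagged & confirmed)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(confirmed) if confirmed else 0.0
    return precision, recall


# Hypothetical reference standard, e.g., from manual chart review.
confirmed_cases = {"p1", "p2", "p3", "p4"}

# Broad definition: finds more true cases but also pulls in false positives.
broad_cohort = {"p1", "p2", "p3", "p5", "p6"}

# Narrow definition: every flagged patient is a true case, but some are missed.
narrow_cohort = {"p1", "p2"}

broad_scores = precision_recall(broad_cohort, confirmed_cases)    # (0.6, 0.75)
narrow_scores = precision_recall(narrow_cohort, confirmed_cases)  # (1.0, 0.5)
```

A feasibility count might favor the broad definition, whereas a study that cannot tolerate misclassified patients might favor the narrow one.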
There were four key methods that helped achieve success in the initial phase: data governance, data extraction, external relationships, and internal engagement.

Data Governance
When accessing clinical data from health systems, most academic institutions and research organizations face difficulties. Governance for sharing sensitive data is an important barrier. 33 We solved that issue by developing shared governance that is facilitated by an online DRMS. Each use of data from health systems is initiated by a specific request that explains its purpose, details of the request, definition of the scope of work, IRB requirements, and data handling and security. The DRMS allows for requestors' acknowledging and agreeing to privacy and security policies related to specific requests or partner organizations, if required. A designated person or process within each partner healthcare organization whose data are being requested has full access to each request specific to their organization and an opportunity to modify or deny the request. A reviewer in the Data Core can also modify or deny the request.

Data Extraction
Data managers from the Dell Medical School's Data Core are given contingent worker status at the partner organizations and provided credentials with direct access to the health care and administrative databases to fulfill the requests. Once a request is approved, the Data Core data managers extract the data. There are clear advantages to giving data managers from the Data Core team direct access to partner healthcare databases. Such access saves the health organization's data analysts from being diverted from their regular institutional duties and workload. Historically, barriers to researchers at Dell Medical School receiving timely data from partner institutions included the unavoidable situation where the partner institution's data teams did not have the bandwidth to process requests or the requests did not align with existing priorities of the team. Direct access not only solves this issue but also allows the Data Core data managers to better explore and understand the partner health organizations' databases within the security and compliance environment of the partner organization. At the same time, data managers are well-versed in the institutional rules of the university surrounding the use of sensitive data for research. The increased understanding helps the data managers to become effective interpreters and educators for clinicians and researchers at the School, and their active participation in the extraction process reduces the demand on data managers at our partners who are focused on serving their primary constituency. This demand on partner resources is also reduced when data requests are streamlined through one point of access, where the centralized policies and procedures of the Data Core can establish that the requestor has met all governance requirements more easily than the partner can.

External Relationships
Health data moves at the speed of trust. However, trust is built between organizations and among people and not just by policies and rules governing data systems. The Data Core's data managers and leaders invested significant effort in developing mutually beneficial relationships through frequent person-to-person interactions with leaders and data teams from partner organizations. Those with experience in health system databases can appreciate that despite having enterprise data warehouses, linking internal clinical databases requires clinical and technical expertise along with intimate knowledge of the health organizations' data architecture, provenance, and flow. Regular communication strengthens relationships by demonstrating how data generate value for both the health systems and the School while also providing avenues for sharing knowledge gained about strengths and weaknesses of partner health systems' data. The close relationship between academic health centers and health organizations around data, therefore, provides many opportunities to identify errors, improve quality of data, and explore creative problem-solving for topics of mutual interest.

Internal Engagement
Besides working with data and analysts among partner organizations, engaging researchers, academics, and clinicians internally in the Medical School has been important. Open communication between data managers, clinicians, and researchers requesting data improves the quality of initial data requests, which reduces frustration, and even improves the number and quality of research hypotheses generated. For example, before the DRMS was instituted, Medical School researchers waited more than 6 months for data from a partner organization, ultimately receiving data that did not fulfill their research needs. The requestors knew neither the database structure nor local issues concerning the quality and completeness of the data. Conversely, the partner institution's data analysts had limited communications with the requestors and, with that limited information, did not fully understand their plans or data needs. Such miscommunications are minimized and prevented with an intermediate team of data managers in the Data Core who understand data sources and their limitations along with needs of requestors and hence can act as effective bridges between the requestors and partner organizations. We engage with internal researchers and clinicians through biweekly Data Users Group meetings where data managers present the results of recent data requests, review examples of current requests, and provide researchers and clinicians opportunities to air important issues and share new insights for effective use of these complex data.

Results

Data Governance
Key milestones that help establish data governance with our health partners include signing institutional data use agreements and business associates agreements while building strong personal relationships with key stakeholders. Governance is built on a foundation of trust that identifies a single point of contact for each partner organization. We started with the Medical School's affiliated healthcare organization, which is a community health network and a part of a national hospital system, and iteratively developed a streamlined process for transparency and approval of each data request (►Fig. 3), while assisting our partner organization's data stewards develop their own internal processes for vetting and approving data requests.
Significantly, data stewards at the partner organization have two opportunities to modify or deny each request. First, during the initial data discovery phase, each data plan undergoes scrutiny before approval by the Data Core's clinical director and the assigned data manager, and the request is then vetted by the partner organization, with opportunities for revising it. Once the request is approved by both the Data Core and the partner organization, the Data Core's data manager extracts, cleans, and processes the data to meet the needs of the data request. Second, before researchers are given access to the data, data stewards at the partner organization, who have insight into all of the steps of the DRMS workflow, must approve the release of the data.
The process described in ►Fig. 3 represents the responsibilities of the Data Core, including both touchpoints for review by the partner organization. We have intentionally left these steps simple so additional partner organizations can expect the same straightforward process and control of data released and can specify their own procedures for the steps requiring their approvals. All healthcare organizations have concerns about data privacy and security as well as potential exposures of protected health information. The processes described above provide transparency for any organization working with the Data Core, as an honest data broker, to have details of all the projects and requests that use their data. This provides a mechanism of auditing what data Medical School faculty are accessing and for what purpose.
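The approval sequence described above can be sketched as a small state machine in which each step may only be taken by the party responsible for it. The stage names, actor labels, and function are assumptions made for illustration, and the sketch omits request modification; it is not the DRMS implementation.

```python
from enum import Enum


class Stage(Enum):
    SUBMITTED = "submitted"
    CORE_APPROVED = "approved by Data Core"
    PARTNER_APPROVED = "approved by partner"
    EXTRACTED = "data extracted"
    RELEASED = "released to requestor"
    DENIED = "denied"


# For each open stage: who acts next, and where an approval leads.
NEXT_STEP = {
    Stage.SUBMITTED: ("data_core", Stage.CORE_APPROVED),
    Stage.CORE_APPROVED: ("partner", Stage.PARTNER_APPROVED),   # first partner touchpoint
    Stage.PARTNER_APPROVED: ("data_core", Stage.EXTRACTED),
    Stage.EXTRACTED: ("partner", Stage.RELEASED),               # second partner touchpoint
}


def advance(stage: Stage, actor: str, approve: bool = True) -> Stage:
    """Move a request one step forward, or deny it, if the actor is allowed to."""
    if stage not in NEXT_STEP:
        raise ValueError(f"request is already closed: {stage.value}")
    required_actor, next_stage = NEXT_STEP[stage]
    if actor != required_actor:
        raise PermissionError(f"{actor} cannot act on a request that is {stage.value}")
    return next_stage if approve else Stage.DENIED


# Walk one request through the full path.
stage = Stage.SUBMITTED
for actor in ("data_core", "partner", "data_core", "partner"):
    stage = advance(stage, actor)
```

Keeping the table of steps separate from the function is one way to let an additional partner organization substitute its own approval procedure without changing the overall flow.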

Data Extraction
We launched the DRMS in August 2017 in a beta-testing phase, and the tool is undergoing continuous evaluation and revision based on user experience and feedback. The Dell Medical School's Data Core has processed more than 50 data requests through the DRMS from all departments during its first year of operation (►Fig. 4). Data requests included aggregate cohort definitions to assess feasibility of planned research and encounter-level data for clinical quality improvement interventions. Data sought were both structured (e.g., diagnoses, laboratory values, and medications) and unstructured (e.g., visit notes). When accessing the DRMS forms after authentication and authorization, requestors are presented with a dashboard of requests they submitted earlier. Data managers also have dashboards of all the requests to track progress, provide transparency to requestors and data partners of data sharing activities, and provide resources to requestors. The DRMS tool is in the process of being adopted by other partner organizations, including safety net clinics and other data owners in the community. As a centralized service with access to distributed organizations, the Data Core does not join large datasets in a central data warehouse. It may link data from multiple organizations on common patients, where patient matching is done probabilistically and then manually using multiple identifiers. At our stage of development, we are only providing datasets for what is requested; hence, we are able to address heterogeneity of data or metric definitions on a case-by-case basis with the requestors and the partner clinical organizations (e.g.,

External Relationships
At times, the divergence of organizational goals and business models between an academic health center and its affiliated healthcare organizations creates a chasm between these interdependent partners. 34 However, the process of working together on data requests using the DRMS has been particularly successful with our Medical School's affiliated community health network. Biweekly meetings to discuss data requests and to iteratively streamline processes for approval and governance have resulted in a working relationship where data managers in both organizations freely share information, lessons learned, and skills. Examples of such collaboration include sharing code and definitions of specific diagnoses that require multiple patient data elements. The data teams have also been trained together in security, privacy, and IRB policies so that procedures for data management are understood and followed.

Internal Engagement
The Data Core has been able to reduce the burden on our clinical partner organizations and improve efficiency of engaging with researchers by being a shared resource to manage part of the workload of data extraction and evaluation of governance steps while maintaining common policies and procedures for managing data requests across the school. Simultaneously, the Data Core acts as a multifaceted bridge between research teams at Dell Medical School and its partner organizations, translating broad requests for research information into nuanced data extraction steps and content. The Data Core has been able to focus resources and energy on specific questions initiated by researchers, graduate medical education, and school administrators. Even during the first year of the implementation of the Data Core, members of Dell Medical School have experienced notable benefits from the bridges we have created.
An example of collaboration with our clinical partner institution was a project originating from a resident physician along with the Dell Medical School value-based health team 35 for grant-funded graduate medical education quality improvement projects to identify and reduce unnecessary laboratory tests ordered in the emergency department for patients with possible acute coronary syndrome. The Data Core identified an order set being invoked by physicians that included unnecessary tests resulting in avoidable costs. The resident physician presented these data to the health system's leaders who modified the order set to remove the unnecessary routine tests.
By being honest brokers of disparate organizations' clinical data, our Data Core has linked patient data between organizations caring for common patients. In collaboration with the Dell Medical School's Women's Health Department, the Data Core linked outpatient data from the primary site of prenatal care with the hospital where they delivered their babies. By matching data for patients seen at both sites, communication was enhanced between sites of prenatal, delivery, and postnatal care.
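Linking records for the same patient across two organizations typically relies on scoring agreement across several identifiers and sending borderline pairs to a person for review. The sketch below illustrates that general idea on fictitious records; the fields, weights, and thresholds are assumptions chosen for the example and are not the Data Core's matching rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    last_name: str
    first_name: str
    birth_date: str   # ISO format, e.g., "1990-04-12"
    zip_code: str


# Illustrative agreement weights: more discriminating fields count for more.
WEIGHTS = {"last_name": 4.0, "first_name": 3.0, "birth_date": 5.0, "zip_code": 1.0}
MATCH_THRESHOLD = 11.0    # at or above: treat as the same patient
REVIEW_THRESHOLD = 7.0    # between thresholds: send to manual review


def agreement_score(a: PatientRecord, b: PatientRecord) -> float:
    """Sum the weights of fields that agree after simple normalization."""
    score = 0.0
    for field_name, weight in WEIGHTS.items():
        left = getattr(a, field_name).strip().lower()
        right = getattr(b, field_name).strip().lower()
        if left and left == right:
            score += weight
    return score


def classify(a: PatientRecord, b: PatientRecord) -> str:
    score = agreement_score(a, b)
    if score >= MATCH_THRESHOLD:
        return "match"
    if score >= REVIEW_THRESHOLD:
        return "manual review"
    return "non-match"


# Fictitious records from a prenatal clinic and a delivery hospital.
clinic = PatientRecord("Garcia", "Maria", "1990-04-12", "78701")
hospital_same = PatientRecord("GARCIA", "Maria", "1990-04-12", "78702")
hospital_unsure = PatientRecord("Garcia-Lopez", "Maria", "1990-04-12", "78701")
hospital_other = PatientRecord("Nguyen", "Anh", "1985-11-02", "78745")
```

Here `classify(clinic, hospital_same)` gives "match", `classify(clinic, hospital_unsure)` gives "manual review", and `classify(clinic, hospital_other)` gives "non-match"; the middle band is where a data manager would compare additional identifiers by hand.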
The Data Core has supported multiple research projects originating from investigators at the medical school. We have established with the University's IRB our role as honest data brokers and ensured that the datasets provided match what has been approved through the IRB: aggregate data for feasibility studies preparatory to research; deidentified datasets to which the Data Core has applied the safe harbor method and removed all identifiers; limited datasets created by removing all identifiers except for dates; or identified data where all elements are described in the IRB protocol and either consent is obtained from the research subjects or there is a waiver of consent. The Data Core works with the partner organizations' research committees to ensure that the data we provide from their organization match what is described in the IRB protocol and the site agreement. Our partner clinical organizations had previous relationships with our university and accepted the university's healthcare research IRB, but as part of local IRB procedures researchers must obtain prior approval from each clinical site where the research is being conducted. This reciprocity of IRB approval helped in building the relationships of trust across organizations for data sharing.
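The difference between the dataset types listed above can be shown with a toy function that drops fields according to the approved level. This is a simplified sketch under stated assumptions: the field names are hypothetical, and the short identifier list stands in for the much longer set of identifier types that the safe harbor method covers, so it should not be read as a compliance tool.

```python
from typing import Dict

# Hypothetical field groupings for one extracted record.
DIRECT_IDENTIFIERS = {"name", "mrn", "street_address", "phone"}
DATE_FIELDS = {"birth_date", "admit_date"}


def prepare_record(record: Dict[str, str], level: str) -> Dict[str, str]:
    """Return a copy of the record with fields removed for the approved level."""
    if level == "identified":
        dropped = set()                      # all elements approved in the protocol
    elif level == "limited":
        dropped = DIRECT_IDENTIFIERS         # identifiers removed, dates kept
    elif level == "deidentified":
        dropped = DIRECT_IDENTIFIERS | DATE_FIELDS
    else:
        raise ValueError(f"unknown dataset level: {level}")
    return {key: value for key, value in record.items() if key not in dropped}


# Fictitious record used only to show the effect of each level.
record = {
    "name": "Jane Example",
    "mrn": "000123",
    "street_address": "1 Sample Street",
    "phone": "555-0100",
    "birth_date": "1990-04-12",
    "admit_date": "2018-02-03",
    "diagnosis_code": "I50.9",
    "lab_value": "412",
}

limited = prepare_record(record, "limited")
deidentified = prepare_record(record, "deidentified")
```

Tying the `level` argument to what the IRB protocol approved is one way to keep the delivered dataset aligned with the approval.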

Discussion
We described how, through data governance, data extraction, and internal and external engagement and relationships, the Data Core at the Dell Medical School built an efficient and effective data sharing arrangement with its partnering health systems. This required a phased approach focused on mutually beneficial processes and outcomes rather than just focusing on the academic institutions' needs. This was achieved by aligning the academic institution and healthcare organization on their common goal of improving the efficiency and effectiveness of patient care while maintaining data security and patient privacy. 36 With the increasing emphasis on value-based care 37 and patient-centered approaches for a more holistic goal of wellness and health, data sharing and collaborative research projects will require systems like DRMS and the processes around them. The primary goal of the $28 billion provided by the HITECH Act to make EHR systems ubiquitous was to have patients' data follow them wherever they receive care and support patient-centered research. We designed our DRMS and Data Core to do exactly that.
Academic health centers can play a key role in improving population health through collaboration with healthcare organizations and other community partners. This requires an efficient and effective data sharing environment. 28 Transparency and insight into how the data have been handled by all participants creates an environment of trust and serves as the basis for further collaboration between teams. This positive feedback loop, maintained by our DRMS, has facilitated both our collaboration with partner organizations and our engagement with researchers, teachers, and clinicians. The DRMS supports data moving securely across organizational boundaries into the hands of researchers and academics while also supporting the healthcare organizations' quality improvement initiatives. Notably, increasing use of clinical data for a wide variety of purposes has increased the quality of the data by identifying and rectifying problems with their generation, storage, and use.
This article describes a data ecosystem supporting population health and research that can accommodate nonclinical sources of data from external organizations such as air quality data routinely provided by the U.S. Environmental Protection Agency (EPA). 38 The traditional era of clinical research taking years before influencing practice and of clinical decisions being based on a patient chart alone is being overturned by the world of real-time information. 39 Millions of online transactions on services such as Amazon or Uber are all driven by information sharing in real time. While healthcare may be a far more complex business than any of these examples, consumer expectations are being driven by daily interactions with these services. Academic health centers have to prepare future physicians and clinical researchers to better leverage newer technologies and real-time data to deliver safer, higher quality care and create knowledge at a faster pace. It is therefore essential that academic health centers develop data-sharing and delivery systems such as the DRMS to support access and systematically and efficiently link and use data from a variety of clinical and nonclinical sources. This journey toward more seamless information sharing in healthcare will require trust to overcome the legal, regulatory, and cultural barriers to realizing a world where the right data on the right patient is delivered to the right provider in the right format at the right time: the "five rights of health information." 40 Achieving this requires a DRMS similar to the one described in this article.
Our experience of building a Data Core in a new academic health center may be somewhat unique among all academic health centers. However, many well-established academic health centers are developing central data cores and could benefit from our experience. Moreover, our Data Core's data managers and administrators had to work with many legacy health information systems and processes in a fragmented data ecosystem. Thus, we believe that the lessons we learned can be applied not only to new academic health centers but also to established ones that seek to develop core data acquisition and management support for clinical and population-based research and health care across a similarly fragmented data ecosystem in their communities. A key tenet of the role of the Data Core as honest data brokers was that we did not ask the partner organizations to change their data security policies or practices. Instead, we developed a process for working within their governance and compliance environment to access their data.

Conclusion
The Dell Medical School's Data Core created an efficient and effective data sharing environment, including people (the Data Core team), processes (common data request process policies and procedures), and technology (an effective web-based DRMS and a comprehensive data platform to manage clinical and nonclinical data). Achieving this required intrainstitutional collaborations focused on four areas: (1) managing data governance; (2) developing standardized means for extracting, aggregating, and delivering data; (3) providing insight and transparency to external partners; and (4) creating a focused school-wide resource that engages academic researchers and educators. The Data Core and DRMS have been in use for over a year and, while under continual iterative improvement, the DRMS has already streamlined data acquisition for both our partner health systems and researchers. As we gain experience and build trust, we intend to expand our data network to include additional local community clinics and hospitals. Future work is needed to measure the impact of the clinical data sharing system on the efficiency and accuracy of data sharing while also comparing the performance of this system to other similar systems.