CC BY-NC-ND 4.0 · Yearb Med Inform 2021; 30(01): 176-184
DOI: 10.1055/s-0041-1726503
Section 6: Knowledge Representation and Management

The Evolution of Clinical Knowledge During COVID-19: Towards a Global Learning Health System

Karin Verspoor
1  School of Computing Technologies, RMIT University, Melbourne VIC 3000 Australia
2  Centre for Digital Transformation of Health, The University of Melbourne, Melbourne VIC 3010 Australia
3  School of Computing and Information Systems, The University of Melbourne, Melbourne VIC 3010 Australia


Objectives: We examine the knowledge ecosystem of COVID-19, focusing on clinical knowledge and the role of health informatics as enabling technology. We argue for commitment to the model of a global learning health system to facilitate rapid knowledge translation supporting health care decision making in the face of emerging diseases.

Methods and Results: We frame the evolution of knowledge in the COVID-19 crisis in terms of learning theory, and present a view of what has occurred during the pandemic to rapidly derive and share knowledge as an (underdeveloped) instance of a global learning health system. We identify the key role of information technologies for electronic data capture and data sharing, computational modelling, evidence synthesis, and knowledge dissemination. We further highlight gaps in the system and barriers to full realisation of an efficient and effective global learning health system.

Conclusions: The need for a global knowledge ecosystem supporting rapid learning from clinical practice has become more apparent than ever during the COVID-19 pandemic. Continued effort to realise the vision of a global learning health system, including establishing effective approaches to data governance and ethics to support the system, is imperative to enable continuous improvement in our clinical care.


1 Introduction

The emergence of the global COVID-19 pandemic dominated much of the health informatics and medical research landscape during 2020. It is therefore appropriate that this end-of-year review of recent developments in medical knowledge management focuses on the pandemic.

The pandemic has highlighted the clear need for informatics to support the management and synthesis of health information at global scale, and at the pace demanded by a rapidly spreading infection. However, it has also exposed severe limitations in our ability to share, integrate, and analyse data at this scale.

To address those limitations, we propose that the model of a “global learning health system” (gLHS) can be deployed. The concept extends the learning health system (LHS) [[1] [2]] to a global scale, here with a singular focus on one viral disease. Indeed, the international effort to quickly gather and share knowledge for the clinical diagnosis, management, and treatment of COVID-19 can be seen as an exemplar of a gLHS, albeit one not yet fully realised or effective.

The key elements of an LHS, including the core information cycles that characterise it, were observed throughout the interactions of the global scientific community. Information flowed from practice (what was being done on the ground to manage COVID-19 patients) to data (what was captured about those patients and their clinical characteristics or responses to interventions) to knowledge (about disease characteristics and trends, and about which care approaches worked and which did not, based on analysis and modelling of the data), and then back to practice through rapid implementation. Furthermore, as an LHS requires, information technology infrastructure played a critical role in enabling these information flows.

In the wake of the previous SARS and Ebola virus epidemics, it was already argued that unified frameworks supporting clinical and biological data integration were critical to support evidence generation in a pandemic [[3]] and that information technology was needed for knowledge management [[4]]. Broader adoption of electronic health records has facilitated evidence generation, including in observational studies and in traditional randomised controlled trials, through their use to identify eligible patients, support data collection, and monitor outcomes [[5]]. Electronic sharing of patient-level data from trials facilitates re-analysis of outcomes and fosters reproducibility and trust in findings [[6]]. But our public health and clinical responses during COVID-19 demanded far more sophisticated strategies for rapid information synthesis and knowledge management than were available, strategies that we argue an effective gLHS would facilitate.

The rapid spread of COVID-19 internationally and its immediate impact on the global economy have led to much more widespread appreciation of the need to coordinate pandemic research, including substantially increased scientific globalism [[7]] – international research collaboration – and sharing of patient-level clinical data [[6]]. However, despite a few shining examples of rapid deployment of multi-site clinical trials [[8] [9] [10] [11]], clinical research for COVID-19 has been highly fragmented [[12]], with a particular dearth of meaningful evidence in the area of non-drug interventions with public policy impacts [[13]].

This survey will show that although there is still work to be done, the pandemic has illustrated that many of the elements required for a global learning health system are in place, critically including the human motivation to achieve it.

The framework of the gLHS that we present not only offers a useful way of characterising how knowledge evolved during the pandemic, under the strong impetus to support knowledge-informed clinical care and public health responses to COVID-19, but also provides an architecture for a system that we can invest in to ensure robust knowledge evolution for ongoing global health needs.


2 A Global Learning COVID-19 System

In this section, we introduce the framework for ‘learning’ that we adopt in our analysis of the evolution of knowledge during the first year of the COVID-19 crisis (Section 2.1). We then present a model of the collective activity toward learning about COVID-19, viewed as a global-scale learning health system (Section 2.2), illustrating the elements of the gLHS that emerged, as observed in the literature.

2.1 Framework for Conceptualising Learning in a Crisis

To frame the evolution of knowledge in the COVID-19 crisis, we follow a recent proposal by Tovstiga and Tovstiga (2020) to adopt a classical four-quadrant ‘conscious-competence’ conceptual framework from learning theory [[14]]. The model is presented in [Figure 1], illustrating the stages of a learning trajectory, from unconscious ignorance of a lack of knowledge to deeply embedded knowledge.

  1. Quadrant 1: Zone of uncertainty, including lack of clarity about a topic;

  2. Quadrant 2: Zone of learning, where the value of knowledge on the topic is recognised and sought. Questions play a crucial role in this zone;

  3. Quadrant 3: Zone of actionable knowledge, where learning is consolidated and integrated with existing knowledge;

  4. Quadrant 4: Zone of embedded understanding, enabling intuitive action. Knowledge in this zone is often not fully recognised.

Fig. 1 The ‘conscious-competence’ matrix of learning, based on a model originally attributed to Broadwell (1969) [[83]]; adapted from [[14]].

While typically applied in the context of an individual learner, the Tovstigas argue that the framework effectively reflects the general knowledge evolution process in the context of the COVID-19 crisis. They further suggest that it is useful for structuring and understanding the learning trajectory with respect to the crisis, demonstrating the various phases through an analysis of information about COVID-19 communicated through news reports.

Their analysis identifies data analytics and scientific knowledge sharing as key drivers of the learning trajectory in COVID-19, citing efforts such as the World Health Organization's creation of a global COVID-19 clinical information platform based on a standard case report form[1], which requests specific, detailed clinical and demographic parameters for COVID-19-positive patients. This underscores the important role of data standards in supporting the learning needed to manage the pandemic. Data analytics, knowledge sharing, and data standards are also key elements of the gLHS, which highlights the relevance of the model.


2.2 Modelling the COVID-19 Knowledge Ecosystem Through the Learning Health System

Building on this framework for conceptualising learning, we propose that the core model of an LHS [[1] [2]] can be applied to characterise the rapid evolution of knowledge that has occurred during the COVID-19 pandemic. In this model, analysis of data captured in clinical practice during patient care drives the learning of new knowledge, which can then be implemented to improve clinical practice, leading to continuous improvement. A number of critical knowledge management and information technology elements can be identified as key enablers of this learning process, supporting the significant human efforts that catalysed the learning cycle and provided the appropriate socio-technical conditions for it.

Our proposed model for the COVID-19 gLHS is shown in [Figure 2]. The impetus for the learning process in COVID-19 arose from a knowledge gap: the gap between the purposeful, knowledge-grounded action (Q3) that exists for routine clinical care and the uncertainty surrounding the diagnosis and management of the novel virus in affected patients (Q1). This gap triggered a broad effort to gather data to fill it, primarily taking advantage of data collected through electronic health record systems. Data collection and integration, facilitated through electronic data sharing, then enabled learning (Q2), actioned through observational analysis, clinical trials, predictive modelling, and other research leveraging data, with the objective of turning data into information. Publications summarising these studies served as the key vehicle for sharing research results, but challenges in finding and interpreting papers, particularly amid a flurry of research activity, left residual uncertainty about the knowledge that already existed (Q4). Translation of new knowledge into practice (Q3) required evidence synthesis approaches such as systematic reviews, critically involving searching (information retrieval), screening, appraisal, and meta-analysis of research publications. Key conclusions were rapidly shared via actively maintained, living guidelines [[15]] and via platforms [[16]] and tools [[17]] that make clinical decision support knowledge artifacts available.

Fig. 2 Abstraction of the structure of the global learning system in place for COVID-19. Knowledge management activities are overlaid onto the Learning Health System model (core LHS cycle figure adapted from [[84]]), and related to the learning framework of ‘conscious-competence’ presented in [Figure 1].

2.2.1 Electronic Data Capture and Data Sharing

Electronic health records (EHRs) are a key resource in the learning health system, as they provide the data used to drive learning from practice. For COVID-19, EHRs were analysed to characterise early cases of the infection in Wuhan, China [[18] [19]], to provide important information on the efficacy of symptom-based screening [[20]], and to collect data on patients prospectively after enrolment in a trial [[21]]. In the UK, the OpenSAFELY platform[2] was used to identify factors associated with COVID-19 deaths through analysis of the primary care records of over 17 million patients [[22]], facilitated by the use of a single EHR system (TPP SystmOne) in general practice surgeries covering approximately 40% of the population of England. Similarly, a study of risk factors associated with death due to COVID-19 [[23]] was made possible by the use of a single integrated EHR system across many sites, and the Quick COVID-19 Severity Index was developed with data from a single health system with nine Emergency Departments [[24]]. A highly cited study demonstrating the lack of effectiveness of hydroxychloroquine treatment was conducted using data extracted directly from the New York-Presbyterian / Columbia University Irving Medical Center EHR [[25]].

Leveraging distributed EHRs across national and international boundaries through collaborative consortia and clinical data networks [[26]], several large-scale studies were undertaken to characterise COVID-19 patients in relation to similar disease groups [[27]], to understand the trajectory of the disease [[28]], and to study interactions between the disease and patients' medications [[29]]. Such studies were enabled through the adoption of common data models to harmonise data, most prominently the Observational Health Data Sciences and Informatics (OHDSI) Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) [[30]]; partners in the OHDSI network contribute statistical results based on federated querying of local data sources represented with the CDM.
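Federated querying of this kind can be sketched in a few lines. The example below is a minimal illustration, not OHDSI's actual tooling: it assumes each site holds a table shaped like the OMOP CDM `condition_occurrence` table (only the `person_id` and `condition_concept_id` columns are modelled), simulates each site with an in-memory SQLite database, and uses hypothetical placeholder concept IDs rather than real OMOP vocabulary entries. The same SQL runs unchanged at every site, and only aggregate counts travel to the coordinating centre.

```python
import sqlite3

# Hypothetical placeholder concept IDs; a real study would look these up in the OMOP vocabularies.
COVID_CONCEPT_ID = 1000001
OTHER_CONCEPT_ID = 1000002

# The same query is distributed to every site and executed against its local CDM instance.
CASE_QUERY = """
SELECT COUNT(DISTINCT person_id)
FROM condition_occurrence
WHERE condition_concept_id = ?
"""
# Simplified denominator: persons with any condition record (a real study would use the person table).
DENOMINATOR_QUERY = "SELECT COUNT(DISTINCT person_id) FROM condition_occurrence"


def make_site(rows):
    """Create an in-memory stand-in for one site's condition_occurrence table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE condition_occurrence "
        "(person_id INTEGER, condition_concept_id INTEGER)"
    )
    conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?)", rows)
    return conn


def run_locally(conn, concept_id):
    """Runs inside the site; returns aggregate counts only, never patient rows."""
    (cases,) = conn.execute(CASE_QUERY, (concept_id,)).fetchone()
    (persons,) = conn.execute(DENOMINATOR_QUERY).fetchone()
    return {"cases": cases, "persons": persons}


def pool(site_results):
    """Coordinating centre combines the per-site aggregates."""
    cases = sum(r["cases"] for r in site_results)
    persons = sum(r["persons"] for r in site_results)
    return {"cases": cases, "persons": persons, "proportion": cases / persons}


site_a = make_site([(1, COVID_CONCEPT_ID), (2, OTHER_CONCEPT_ID),
                    (3, COVID_CONCEPT_ID), (3, OTHER_CONCEPT_ID)])
site_b = make_site([(10, OTHER_CONCEPT_ID), (11, COVID_CONCEPT_ID),
                    (12, OTHER_CONCEPT_ID), (13, OTHER_CONCEPT_ID)])
pooled = pool([run_locally(site, COVID_CONCEPT_ID) for site in (site_a, site_b)])
print(pooled["cases"], pooled["persons"])  # 3 7
```

Because the query is written against the shared data model rather than each site's native EHR schema, one analysis can be distributed without per-site rewriting, which is the practical payoff of a common data model.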

The Columbia Open Health Data for COVID-19 Research (COHD-COVID) data set [[31]] takes a different approach, also based on the OMOP CDM. Its creators make publicly available prevalence data for conditions, drugs, procedures, and their co-occurrences, calculated from their institutional EHR, thereby sharing aggregate counts rather than patient-level data. Coupled with slight perturbation of the counts and suppression of rare concepts, this eliminates privacy concerns while still enabling comparative data analysis.
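The aggregate-release idea can be illustrated with a small sketch. It is not the COHD-COVID pipeline: the suppression threshold (`min_count`), the uniform integer noise (`noise`), and the function name `shareable_counts` are all assumptions chosen for illustration. The sketch counts each concept, and each pair of co-occurring concepts, once per patient, drops counts below the threshold, and perturbs the remainder before release.

```python
import random
from collections import Counter
from itertools import combinations


def shareable_counts(patients, min_count=10, noise=2, seed=0):
    """Return (single-concept counts, concept-pair counts) prepared for release.

    patients: iterable of collections of concept IDs, one per patient.
    min_count, noise: illustrative parameters, not COHD's published settings.
    """
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    singles, pairs = Counter(), Counter()
    for concepts in patients:
        unique = sorted(set(concepts))  # count each concept once per patient
        singles.update(unique)
        pairs.update(combinations(unique, 2))  # co-occurrences as sorted tuples

    def release(counts):
        released = {}
        for key in sorted(counts):  # fixed order keeps the random draws deterministic
            n = counts[key]
            if n < min_count:
                continue  # suppress rare concepts that could single out patients
            perturbed = n + rng.randint(-noise, noise)
            released[key] = max(perturbed, min_count)  # never publish below the threshold
        return released

    return release(singles), release(pairs)


patients = [{"A", "B"}] * 15 + [{"C"}] * 3
singles, pairs = shareable_counts(patients)
print(sorted(singles), sorted(pairs))  # ['A', 'B'] [('A', 'B')]
```

The rare concept `C` (three patients) never leaves the institution, while the common concepts and their co-occurrence are released with a small random offset; choosing the threshold and noise level for a real release is a disclosure-control decision this sketch does not attempt.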

In the absence of common data models and data sharing platforms for EHR data, research across multiple sites required manual extraction of defined clinical data elements from EHRs, which were then submitted to a central study team. This approach was used in the UK for a pediatric study related to COVID-19 [[32]] and for a study of factors associated with coronavirus death in the US [[33]]. In China, targeted national data reporting to a central research team supported observational studies [[34] [35]]. Data harmonisation and cleaning in these cases relied on expert review of data submissions. For example, to support the development of a machine learning-based predictive model for COVID-19-associated mortality based on EHR data from five New York hospitals, a multidisciplinary team of clinicians performed expert mappings and data harmonisation [[36]].

Other approaches to collecting electronic clinical data were also utilised, including deployment of clinical natural language processing to rapidly identify and characterise patients relevant to COVID clinical questions [[37]]. Registries were established to collect disease-specific data items utilising case report forms (CRFs). The most sophisticated of these used electronic CRFs mapped to common data standards and submitted to a central database, such as the VIRUS-COVID-19 registry[3] [[38] [39]] based on REDCap [[40]], which coordinated data entered in over twenty sites. Electronic surveys to gather data directly from patients[4] were introduced to allow for collection of data from non-hospital settings [[41]].
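Mapping local CRF fields to common data standards is, at its core, a lookup plus unit normalisation. The sketch below assumes two hypothetical sites whose forms record the same observations under different field names and units; the field names, the `FIELD_MAP` table, and the `harmonise` function are illustrative, while the targets are the LOINC codes for body temperature (8310-5) and heart rate (8867-4) with UCUM unit codes. Unmapped fields raise an error rather than being guessed, mirroring the expert review of submissions described earlier.

```python
# Hypothetical site-specific CRF field names mapped to (LOINC code, UCUM unit).
FIELD_MAP = {
    "site_a": {"temp_c": ("8310-5", "Cel"), "hr": ("8867-4", "/min")},
    "site_b": {"TempF": ("8310-5", "[degF]"), "pulse": ("8867-4", "/min")},
}


def to_standard_unit(value, unit):
    """Convert to the unit used in the shared data set (Celsius for temperature)."""
    if unit == "[degF]":
        return round((value - 32) * 5 / 9, 1), "Cel"
    return value, unit


def harmonise(site, record):
    """Translate one site's CRF record into coded, unit-normalised observations."""
    observations = []
    for field, value in record.items():
        if field not in FIELD_MAP[site]:
            # Do not guess: unmapped fields go back for expert review.
            raise KeyError(f"unmapped field {field!r} from {site}")
        code, unit = FIELD_MAP[site][field]
        value, unit = to_standard_unit(value, unit)
        observations.append({"loinc": code, "value": value, "unit": unit})
    return sorted(observations, key=lambda obs: obs["loinc"])


a = harmonise("site_a", {"temp_c": 38.0, "hr": 92})
b = harmonise("site_b", {"TempF": 100.4, "pulse": 92})
print(a == b)  # True
```

Once both records carry the same codes and units, they can be pooled in a central database without per-site interpretation of what a field like `TempF` means.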

Despite these successful demonstrations of the use of EHR and registry data to study COVID-19, a number of challenges remain, summarised effectively by Madhavan et al. [[26]]. EHRs are not primarily designed to support coordinated research and public health response, and their use in this context placed substantial strain on informatics and data science teams at hospitals. Even basic conversion of spreadsheet-based research data collection systems to formal database structures can prove challenging where local differences exist in how fields are interpreted or used [[4]]. What is required is infrastructure for individual-level storage and exchange of data for research purposes, together with commitment to common data models, terminologies, and data interchange standards [[42]]. Standard clinical vocabularies such as ICD, LOINC, and SNOMED CT must be rapidly extended to include relevant new terms [[43] [44]]. Governance and ethical issues must also be addressed. The National COVID Cohort Collaborative[5] (N3C) in the US aims to address many of these points, utilising the OMOP CDM to bring together data from disparate sources, and aiming to facilitate record-level analysis of COVID-19 patients (and matched controls) in a secure environment [[45]]. Tremendous progress has been made in the rush to address the pandemic, but there is still much work to be done before truly international-scale data can be efficiently and effectively brought together.


2.2.2 Data Analytics and Modelling

With the availability of large-scale and complex data about COVID-19 came the need to analyse and model it. Advanced computational methods including machine learning, natural language processing and other artificial intelligence (AI) methods can play key roles [[46]], and indeed significantly contributed to detecting the COVID-19 outbreak, diagnosing the disease, and predicting outcomes [[47]]. Models are critical to inform decision making [[48]], supporting prediction and simulation of outcomes under varying conditions or patient characteristics.

Several of the EHR-based studies cited above utilise machine learning over clinical variables [[21] [36]], while more traditional statistical or epidemiological modelling is typically employed for observational studies. Imaging analysis models have also been adapted to COVID-19 from models for related diseases such as pneumonia [[49]], facilitated by public sharing of COVID-19 images with the AI community[6] [7] [8] [[50]].

The difficulty of building sufficiently large data sets has meant that many COVID-19 models are at high risk of bias and have had poor external validation [[51] [52] [53]]. Additionally, observational EHR-based studies inherently lack controlled cohort selection, which may lead to unreliable results due to confounding [[5]] and to a risk of case contamination due to ambiguous cohort definitions [[54]].
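A small worked example shows how confounding can distort a crude EHR-based comparison. The counts below are entirely hypothetical: sicker patients are assumed to be more likely to receive a treatment that, within each severity stratum, has no effect on mortality. The crude risk ratio suggests harm, whereas the Mantel-Haenszel risk ratio, which pools the stratum-specific comparisons, recovers the null effect.

```python
# Hypothetical counts per severity stratum:
# (deaths_treated, n_treated, deaths_untreated, n_untreated)
STRATA = {
    "mild": (1, 20, 4, 80),
    "severe": (24, 80, 6, 20),
}


def crude_risk_ratio(strata):
    """Risk ratio after collapsing all strata together (ignores severity)."""
    a = sum(s[0] for s in strata.values())
    n1 = sum(s[1] for s in strata.values())
    c = sum(s[2] for s in strata.values())
    n0 = sum(s[3] for s in strata.values())
    return (a / n1) / (c / n0)


def mantel_haenszel_risk_ratio(strata):
    """RR_MH = sum(a_i * n0_i / N_i) / sum(c_i * n1_i / N_i) over strata i."""
    numerator = denominator = 0.0
    for a, n1, c, n0 in strata.values():
        total = n1 + n0
        numerator += a * n0 / total
        denominator += c * n1 / total
    return numerator / denominator


print(round(crude_risk_ratio(STRATA), 2))            # 2.5
print(round(mantel_haenszel_risk_ratio(STRATA), 2))  # 1.0
```

Stratification of this kind only adjusts for confounders that were measured and recorded; when severity is missing or inconsistently coded in the EHR, the crude distortion cannot be removed this way.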


2.2.3 COVID-19 Information Retrieval and Synthesis

The amount of COVID-19 research output has been remarkable; based on the LitCovid index of this research at the US National Library of Medicine[9] [[55]], over 75,000 COVID-19-related publications were added to the PubMed literature repository between January and late November 2020, at a steady pace of approximately 2000 articles per week (see [Figure 3]). A review of the literature focused on clinical presentation and management of COVID-19 as of June 15, 2020, prioritised for general medicine readers, identified over 100 relevant articles [[56]]. At the time of writing, over 7000 COVID-19 clinical trials were registered in the World Health Organization's International Clinical Trials Registry Platform[10], which plays a key role in identifying research gaps [[57]], and nearly 3000 systematic reviews related to COVID-19 were catalogued in the Living OVerview of Evidence (L·OVE) platform[11] [[58]].

Fig. 3 Weekly publications in 2020 related to COVID-19, as indexed in the LitCovid collection of PubMed [[55]].

This scientific knowledge is also broadly accessible; over 75% of this research is available in open access publications, an unprecedented proportion, more than double the rate for publications generally during 2015-2019 and for other topics in 2019-2020 [[7]].

However, the accumulation of this body of evidence about one disease in such a short time is also overwhelming. A key challenge in COVID-19 knowledge management lay in navigating this massive quantity of research evidence, together with molecular information about the virus, to support diagnosis, treatment, and public policy. The sheer volume of the research, published as natural language text that must be read and interpreted, requires significant effort to translate into knowledge. Studies must be synthesised and evaluated, and broader conclusions drawn by comparing multiple studies that examine the same question.

Therefore, many systems based on information retrieval or text mining were created in response to this challenge, including our COVID-SEE Scientific Evidence Explorer system [[59]]; more are reviewed in [[60] [61] [62]]. An important resource in these efforts was the COVID-19 Open Research Dataset (CORD-19) which compiled a significant collection of literature for both COVID-19 and related coronaviruses into a single, downloadable resource [[63]].
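Most literature exploration tools build on some form of ranked retrieval. As a generic illustration only (it is not how COVID-SEE or any other system cited here works), the sketch below ranks a few hypothetical article titles against a query using TF-IDF weighting with a simple smoothed inverse document frequency, `log(N / df) + 1`; the document identifiers and titles are invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Return per-document term frequencies and smoothed IDF weights."""
    term_freqs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for tf in term_freqs.values():
        doc_freq.update(tf.keys())  # each term counted once per document
    n_docs = len(docs)
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    return term_freqs, idf


def search(query, term_freqs, idf, top_k=3):
    """Score documents by summed length-normalised TF x IDF of the query terms."""
    query_terms = tokenize(query)
    scores = {}
    for doc_id, tf in term_freqs.items():
        length = sum(tf.values())
        score = sum(tf[t] / length * idf.get(t, 0.0) for t in query_terms)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_k]


docs = {
    "d1": "Hydroxychloroquine and mortality in hospitalised COVID-19 patients",
    "d2": "Spike protein structure of SARS-CoV-2",
    "d3": "Mortality risk factors in primary care records",
}
term_freqs, idf = build_index(docs)
print(search("hydroxychloroquine mortality", term_freqs, idf))  # ['d1', 'd3']
```

The rarer term ("hydroxychloroquine") receives a higher weight than the more common one ("mortality"), so the document matching both ranks first; production systems add stemming, field weighting, and semantic or entity-based matching on top of this basic idea.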

Leveraging such tools, community-based approaches to collect, curate, and model knowledge rapidly emerged for COVID-19. Groups began working together to review the literature and build living evidence guidelines that were updated as new information was made available [[15]]. Utilising systematic review automation technologies, complete reviews could be undertaken in a matter of weeks [[64]].

However, in the rapidly changing information space of COVID-19, the rush to explore and share research outcomes also resulted in poor study designs, poor research reporting, and lack of coordination and redundancy in research activities [[13]]. Coupled with the data biases noted above, this creates new problems – wasted effort, increased review and quality appraisal work, and uncertainty about key diagnostic, prognostic, and treatment decisions. The gLHS, effectively implemented, could provide the coordination and feedback mechanisms needed to address these problems.


2.2.4 Knowledge Dissemination

Knowledge has been recognised as strategically important for managing pandemics [[48]], and it plays a central role in our learning-based model. Knowledge acquired through learning must be shared in order to have impact. While publications play a key role in disseminating knowledge, on their own they are insufficient, and often too ambiguous, to guide practice. Social media have been used effectively for knowledge dissemination during COVID-19 [[65]], but they primarily transfer knowledge between individuals.

Knowledge management implemented through information technology can improve information sharing and coordination [[4]]. Several key elements for knowledge management in pandemics have been identified [[66]]:

  • Shared knowledge spaces utilising consistent vocabulary.

  • Formal representations of knowledge.

  • Enabling reusable knowledge.

  • Empowering human collaboration through knowledge sharing.

All of these elements were adopted to some degree during the COVID-19 pandemic through the scientific globalism that emerged: in consortia like the N3C [[45]], in working groups like those organised through the Research Data Alliance [[42]], and in informal data and clinical networks.

Knowledge has been disseminated through numerous mechanisms, including online platforms such as that of the Australian National COVID-19 Clinical Taskforce[12], registry sites [[39]], and clinical decision support tools such as MAGICapp from the Magic Evidence Ecosystem Foundation (MAGIC) [[17]].

The push for formal representation of knowledge, including computational and executable models that can be integrated into health information systems to enable application of knowledge to practice [[67] [68]], has gained momentum during COVID-19. The COVID-19 Disease Map project [[69] [70]] captured and made available molecular interaction information for the SARS-CoV-2 virus, based on manual curation, supported by weekly videoconferences. Knowledge graphs are also being used to support representation and integration of the variety of biomedical data related to COVID-19 [[71]], including via text mining [[72]]. Through the adoption of standardised ontology identifiers, comparable data in different resources can be linked together for analysis, and combined in different ways for different tasks.
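The benefit of shared identifiers can be shown with a toy graph merge. In the sketch below, one hypothetical resource already uses UniProt and NCBI Taxonomy identifiers, while a second, text-mined resource uses surface names that a small synonym table (an assumption standing in for a real ontology mapping service) normalises to the same identifiers. The edges are illustrative, not records from the COVID-19 Disease Map or any cited knowledge graph, although the identifiers are the real UniProt entries for the SARS-CoV-2 spike protein (P0DTC2) and human ACE2 (Q9BYF1) and the NCBI Taxonomy entry for SARS-CoV-2 (2697049).

```python
from collections import defaultdict

# Hypothetical synonym table mapping surface names to standard identifiers.
SYNONYMS = {
    "ACE2": "UniProt:Q9BYF1",
    "spike protein": "UniProt:P0DTC2",
    "SARS-CoV-2": "NCBITaxon:2697049",
}

# Resource 1: curated interactions already using standard identifiers (illustrative).
curated_edges = [("UniProt:P0DTC2", "binds", "UniProt:Q9BYF1")]

# Resource 2: text-mined co-mentions using surface names (illustrative).
text_mined_edges = [("SARS-CoV-2", "co_mentioned_with", "ACE2")]


def normalise(entity):
    """Map a name to its standard identifier when one is known."""
    return SYNONYMS.get(entity, entity)


def build_graph(*edge_lists):
    """Merge edge lists into one adjacency map keyed by standard identifiers."""
    graph = defaultdict(set)
    for edges in edge_lists:
        for subject, predicate, obj in edges:
            s, o = normalise(subject), normalise(obj)
            graph[s].add(("out", predicate, o))
            graph[o].add(("in", predicate, s))
    return graph


graph = build_graph(curated_edges, text_mined_edges)
# Both resources now contribute facts about the same ACE2 node.
for direction, predicate, other in sorted(graph["UniProt:Q9BYF1"]):
    print(direction, predicate, other)
```

Without normalisation, "ACE2" and "UniProt:Q9BYF1" would be two unrelated nodes and a query over the merged graph would miss half of the available evidence.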

Much of this work has been based on automated or semi-automated analysis of literature. Direct generation of computable evidence from structured clinical trial registries has also been proposed [[73] [74]], which would shortcut the need for literature-based synthesis.


3 Discussion

Friedman and colleagues have stated [[75]]:

“A national-scale LHS will have to be understood and designed as such a cyber-social ecosystem: a large-scale, decentralized, human-intensive, cyber-catalyzed and cyber-supported information processing system. The system as a whole—not just the digital infrastructure, but also networks of people and institutions—will have to be understood not just as users of a technological infrastructure, but also as parts of the information system itself.”

Extending this to a global-scale LHS demands an even broader view of the relevant ecosystem, one that is larger, even more decentralised, and spans a more diverse set of legal jurisdictions. People and institutions clearly form a critical part of the information system: they make the required data sharing possible, including by tackling legal barriers and leading ethical discussions around data sharing, and they support effective communication of knowledge.

It has been observed that we entered the pandemic without a functioning LHS [[76]]. The authors ask [[76]]:

“We have the motivation. We have the vision. We have the technology. We have a roadmap. What are the barriers?”


“The issue is culture. We need to treat medical data as a public good.”

They further point to the ethics framework of Faden et al. [[77]] that identifies the dual obligations of health professionals to learn and implement, and patients to participate in the learning system by contributing their data.

Initiatives such as the US N3C are making important strides towards realising a gLHS. We do appear to have the motivation, the vision, and the technology. A recent review of the use of digital technologies during COVID-19 highlights how far we have come in leveraging technology for the pandemic response [[78]]. What is required to achieve an ongoing gLHS is a commitment to the vision, coupled with rigorous data governance and legal and regulatory frameworks that safeguard patient privacy while supporting the learning knowledge ecosystem.


4 Conclusions

The model we have proposed is strongly aligned with the Agency for Healthcare Research and Quality evidence-based Care Transformation Support (ACTS) Knowledge Ecosystem initiative known as the ‘ACTS COVID-19 Evidence to Guidance to Action Collaborative’[13], which aims to continually enhance patient care throughout the pandemic as the evidence base evolves. This Collaborative emphasises development of digital infrastructure to support the Knowledge Ecosystem, a cycle of Action-Data-Evidence-Guidance that mirrors the LHS cycle. It has further been active in developing groups such as COKA, the COVID-19 Knowledge Accelerator Initiative[14] [[74]], a COVID-19-focused response to the call by Dunn and Bourgeois [[73]] for computable knowledge synthesis and representation through the use of standards such as EBMonFHIR[15] [[79]] and CPGonFHIR[16], or rule formalisms for computational clinical guideline specification [[80]].

The vision pursued in these initiatives is still under active development, and has required a vast community of clinicians, researchers, informaticians, developers, industry and government representatives, and others to come together with the common objective of addressing the technical, policy or legal, and cultural hurdles to more effective management of the COVID-19 pandemic. It has been argued that most public health organisations currently sorely lack the infrastructure to realise this vision effectively or efficiently [[81]]. There are still many unanswered questions about how to overcome bias and determine causality through real-world data [[24] [82]]. As we have shown, many of the learning and knowledge sharing activities in the context of the pandemic have been limited to very human-intensive approaches.

However, the core gLHS framework is in place, technology has been harnessed in many ways to share data and knowledge at a pace that arguably outstripped the spread of the virus, the requirements for information technology systems to support data and knowledge exchange are increasingly being clarified, and the initial steps toward achieving the vision have been made. This is entirely thanks to a tremendous response by the scientific community with a shared objective of improving outcomes for patients. Successful examples of large-scale, truly international data sharing and research collaborations now exist [[28]]. Both the need for and the value of continued work towards a healthcare system enabled through data and information technologies – a system that can be achieved through the gLHS – are now obvious.

Continued efforts towards achieving a robust gLHS are important, not only to allow us to respond to this pandemic and to prepare for the next, but also to support continuous improvement in how we care for human health. We now know that we can do this.


No conflict of interest has been declared by the author(s).


KV is supported by numerous grants from the Australian Research Council and the National Health and Medical Research Council of Australia. This review primarily relates to her work with the Industrial Transformation Training Centre in Cognitive Computing for Medical Technologies, grant IC170100030 from the Australian Research Council, and the Centre of Research Excellence in Digital Health, grant 1134919 from the Australian National Health and Medical Research Council. The author would also like to thank Dr Brian Hur and the anonymous reviewers for feedback on previous versions of the manuscript.

Correspondence to

Karin Verspoor
School of Computing Technologies, College of STEM, RMIT University
124 La Trobe Street, Melbourne VIC 3000

Publication History

Publication Date:
03 September 2021 (online)

© 2021. IMIA and Thieme. This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany
