Keywords
Quantitative imaging - precision medicine - Big Data - information dissemination/methods - diagnostic imaging/trends
Introduction
Precision medicine [[1]] requires the measurement, quantification, and cataloging of medical characteristics
to identify the most effective medical intervention. The National Academy of Sciences
defines precision medicine as “the tailoring of medical treatment to the individual characteristics of each patient.
It does not literally mean the creation of drugs or medical devices that are unique
to a patient, but rather the ability to classify individuals into subpopulations that
differ in their susceptibility to a particular disease, in the biology and/or prognosis
of those diseases they may develop, or in their response to a specific treatment” [[2]]. In other words, through precision medicine we can classify patients into cohorts
that share characteristics such as diagnosis, prognosis, response to a certain therapy, etc. This requires ready access to networks of data that can be queried using many
different types of search criteria across many different types of data. And to create
such classifiers, large quantities of diverse data must be accessed, analyzed, and
reduced to actionable knowledge for patients and encounters.
Imaging, which includes radiology, radiation oncology, and pathology, complements
clinical and molecular data and offers crucial insights that help stratify patients
into cohorts and guide care using the principles of precision medicine [[3]–[7]]. In addition to diagnosis and treatment planning, imaging also has the potential
to provide deep and novel insights by evaluating a patient’s response to therapy during
treatment, as well as predicting outcome at an earlier time point [[6]–[8]]. Treatment response and early outcome prediction thus create opportunities for
adaptive medicine. For example, in breast cancer patients with ER+, PR+, and HER2– invasive ductal carcinoma, MRI-based features (texture and morphological) could predict the likelihood of recurrence and the magnitude of chemotherapy benefit [[9]]. Clustering morphological signatures extracted from digitized whole-slide pathology
images of glioblastoma patients helped identify significant prognostic sub-classifications,
in which clusters are correlated with transcriptional, genetic, and epigenetic events
[[10]]. Imaging can also be used for surveillance in certain low-grade cancers and help
avoid unnecessary biopsies. In such patients, imaging features can also be used to identify sub-populations that are likely to advance to higher grades and would thus be candidates for specific treatments [[11], [12]]. These four examples, from breast cancer, glioblastoma, low-grade glioma, and prostate
cancer, are illustrative of the role imaging can play in precision medicine for cancer.
The role of imaging in precision medicine was recently reviewed in greater detail by the Association of University Radiologists Radiology Research Alliance [[13]]. This article, in many ways, complements that review by surveying the challenges, and assessing the needs, of imaging informatics for the advancement of precision medicine in cancer.
Today, imaging is predominantly digital; however, image interpretation and its use
in diagnosis and treatment assessment have remained largely qualitative. This has
been changing steadily through initiatives such as the Quantitative Imaging Biomarker
Alliance (QIBA) [[14]–[16]], as well as research programs such as the Quantitative Imaging Network (QIN) [[16], [17]]. This article surveys the landscape of quantitative imaging, its role in advancing
precision medicine, some of the informatics priorities, and challenges, and presents
some recommendations. It should be noted that while this article focuses on cancer imaging, the underlying needs and challenges are by no means unique to cancer. Domains
such as neurology or cardiology have similar characteristics and requirements. For
example, in neuroimaging, imaging data is closely linked with observational data or
connectome data. These data types are not typically seen in oncology. However, both
oncology and non-oncological conditions share the pattern where the integration of
imaging and associated/ derived non-imaging data with clinical and genomic data can
be used to classify patient populations based on diagnosis and response to treatment
[[6], [7]]. For the sake of readability, this article does not make a distinction between imaging and cancer imaging. The challenges, approaches, and recommendations surveyed
here are applicable across the broad landscape of quantitative imaging and its application
to precision medicine.
It is also worth emphasizing that while much of this work has in recent years been geared predominantly towards radiology, there has been a steady increase in the research and development of similar techniques in pathology. Cancer diagnosis is primarily
based on pathology; outcome prediction and treatment recommendations are highly dependent
on pathologist observations. While digital pathology imaging has lagged behind radiology
imaging due to the continuing use of glass slides in clinical diagnostic pathology,
the advent of high quality, high throughput, digital scanners has led to the widespread
adoption of digital pathology in cancer research studies. DICOM (Digital Imaging and
Communications in Medicine), the de facto standard for medical imaging, now includes specifications for digital pathology [[18], [19]]. Digital pathology data management, visualization, and analysis tools have been
developed by both research groups and private companies. Work is rapidly progressing
on the development of standards for pathology data management, annotation, and markup.
These advances, as well as this increased adoption, have led digital pathology to share with radiology many of the same imaging-based informatics design patterns for disease classification, patient stratification, response assessment, and outcome prediction.
The inclusion of information obtained from digital pathology is crucial to the success
of efforts to improve precision of quantitative imaging-based predictions. Diagnostic
and treatment guidelines call for quantitative measurements that are challenging for
human observers (e.g., tumor infiltrating lymphocytes, mitoses and immunohistochemistry (IHC) staining).
There is also an increasing tendency to mandate detailed assessments of tumor heterogeneity across tumor types. A highly pertinent example is the WHO guidelines for non-small cell lung cancer (NSCLC) adenocarcinoma, which specify that for each patient, pathologists break down sub-type composition in 5% increments. Digital pathology machine-learning methods
also promise to reduce inter-observer variability arising from the sole reliance on
human-generated pathology classifications.
Quantitative Imaging and Informatics Methodologies for Precision Medicine
Quantitative imaging is the process of extracting measurable (numerical) information
from images to determine the amount, extent, or severity of disease, where imaging
devices behave as standard measurement instruments providing reliable and reproducible
numerical results. It has benefitted from advances in image acquisition that have
led to improvements in the quality and resolution of images and in the diversity of imaging modalities.
Quantitative imaging, through advances in high performance computing and machine learning,
has enabled the process of radiomics (extraction and mining of quantitative imaging
features) [[18]–[20]] and radiogenomics (integrating radiomic features with clinical and molecular data).
It is enabling optimized treatments, surveillance, and better prediction of response
to treatment, and it offers great promise for precision medicine. Numerous groups
have developed methodologies to extract rich collections of imaging features, linked
them with clinical outcome and molecular characterizations, and studied their relevance
in clinical research [[10], [18]–[87]].
Informatics is the practice of information processing and the engineering of information
systems, focusing on the collection, classification, storage, retrieval, and dissemination
of recorded knowledge. Quantitative imaging, therefore, benefits from the methodologies,
tools, and capabilities that are offered by informatics to help convert the information
contained in images into actionable knowledge. The combined information that is gathered
can be used to enhance individual and population health outcomes, simplify patient
care, and improve the quality of clinical workflow. One notable effort in this area
is the National Cancer Institute (NCI) Quantitative Imaging Network (QIN). QIN has
encouraged the development and validation of quantitative imaging methods for the
measurement of tumor response to therapies in clinical trials and routine care.
By incorporating the science of informatics into medical imaging, we can create a powerful driver for precision medicine activities. Open and regular communication
among scientists, clinicians, informatics specialists, and regulatory experts is needed.
We discuss some of the highest priority informatics activities that will help bring
quantitative imaging into the realm of clinical decision support. These priority activities
were identified through a series of workshops, the first of which was convened in
October 2015, to discuss the joint roles of quantitative imaging and informatics within
the context of present and future precision medicine needs. The primary needs and
challenges, as well as some representative efforts in these areas, are presented here.
N1 Curated Data Repositories
Access to well-curated image repositories with support for semantically integrated
datasets and the ability to integrate information across type and scale are critical
Archiving vast amounts of existing data is a major informatics effort, not only because
of the rapid growth in the volume of data to be stored, but also because of the challenges
in accessing, retrieving, analyzing, and displaying the results. Imaging datasets
require hundreds of terabytes to petabytes of storage. Imaging features, produced
by a pipeline, can depend on a variety of parameters, leading to an explosion of post-processing
feature data. Meaningful comparison and subsequent downstream use of the imaging features
necessitate a standardized representation. Additionally, clinical information about
each patient should be linked to the image data. The linked clinical data must be
available when searching for sub-populations. It is therefore imperative that the
research community stop reinventing the wheel in the context of imaging biomarker development and come up with common ways to share tools and data to help improve
interoperability. There are three types of data that researchers and practitioners
of quantitative imaging share, namely:
-
Clinical Data: Clinical data includes demographic information, diagnosis, exposure, family history,
treatment, and outcomes data. Clinical data must be harmonized against a common vocabulary.
This is an active area of research. One possible direction is the use of DICOM to
represent clinical data [[88]]. This is an attractive proposition given the near-universal acceptance of DICOM,
especially in the clinical domain. However, the DICOM specification merely provides
a data representation format. Work is needed to create an ontology that can be used
to encode the data. An example of such an ontology that helps encode clinical data
in DICOM has been developed for Head and Neck cancers [[88]]. Another option that is being explored is the use of the clinical data model used
in The Cancer Genome Atlas (TCGA) [[89]]. The TCGA clinical data model includes site-specific terms, with mappings to the
NCI Thesaurus, and would provide the ability to create image cohorts that span different
imaging studies. The TCGA clinical data model has also been adopted by The Genomic
Data Commons (GDC) [[90], [91]].
-
Images and Image Metadata: This includes the raw pixels as well as metadata that describe the image, such as patient-level information, acquisition data, etc. Image metadata is frequently stored in DICOM format and follows the DICOM information hierarchy. Other formats, such as NIfTI (Neuroimaging Informatics Technology Initiative), are also widely used [[92]]. DICOM is not widely used in digital pathology, since most digital pathology scanner vendors prefer their own formats. There are, however, open source libraries, such as OpenSlide [[93]], that allow researchers to interact with these images using a shared library and application programming interfaces (APIs); a minimal reading sketch for these formats appears after this list.
-
Image Annotations and Features: These include human and machine generated annotations and features. QIN agreed to
adopt DICOM as a standard for images and segmentation maps. The term “features” refers
to the quantitative characteristics extracted from images; these are represented in
various open formats. Since imaging features are at the core of quantitative imaging,
a detailed description about their representation and storage is presented separately
(see N3 & N4).
Data Curation: The data used in the development of imaging-based methodologies for precision medicine
must be well curated to reduce any uncertainty in its history or content. The use
of standards such as DICOM is therefore essential. In addition to standards, the data
management system, as well as the processes of data curation are very important. In
recent years, several imaging repositories have come online [[92], [94]–[99]], with The Cancer Imaging Archive (TCIA) being an exemplar of a well curated, diverse,
imaging repository. Since its inception in 2011, TCIA has evolved into NCI’s primary
resource for curating, managing, and distributing images. A significant component
of TCIA operations and tools involves the curation and de-identification processes.
With the adoption of data standards, as well as the deployment of easy-to-use tools and shared best practices, the process of data preparation and submission is greatly simplified. This reduces the burden on data submitters and speeds the submission and dissemination of data, which in turn encourages data sharing, since this burden is frequently cited as one of the most common roadblocks to sharing.
De-identification: A key component of data curation is having well-documented processes and tools that
facilitate the de-identification of data and the removal of any patient identifiers.
It is often assumed that de-identification involves the scrubbing of protected health
information (PHI) from DICOM headers. However, in practice, it has been observed that
scanners frequently encode identifiable information in private DICOM tags. There are
additional challenges with de-identification when dealing with time-series data. In
such situations, the chosen heuristic for date de-identification must be cognizant
of time elapsed between successive studies. This information allows users to run queries
such as: “find all lung screening studies where 3 or more studies were performed, and each study
was within 6 months of the prior study.” In addition to maintaining the elapsed time, users may want the ability to integrate
these imaging studies with other non-imaging data. Therefore, enough metadata must
be preserved to ensure compliance with appropriate rules and regulations, while ensuring
that researchers can unambiguously locate imaging and associated non-imaging data
[[100]].
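The following sketch illustrates the date-shifting heuristic described above, using the pydicom library as an assumed implementation vehicle. The tags handled here are an illustrative subset only, not a complete PHI profile; production de-identification should follow the DICOM confidentiality profiles and site policy.

```python
# A minimal de-identification sketch, assuming pydicom: strip a few direct
# identifiers and private tags, and shift all dates by a fixed per-patient
# offset so that the elapsed time between studies is preserved.
import datetime
import pydicom

def deidentify(path, out_path, patient_alias, date_offset_days):
    ds = pydicom.dcmread(path)

    # Remove direct identifiers (illustrative subset only).
    ds.PatientName = patient_alias
    ds.PatientID = patient_alias
    for tag_name in ("PatientBirthDate", "OtherPatientIDs", "InstitutionName"):
        if tag_name in ds:
            delattr(ds, tag_name)

    # Private tags frequently hide identifying information added by scanners.
    ds.remove_private_tags()

    # Shift dates by a constant per-patient offset: absolute dates are hidden,
    # but the interval between successive studies is unchanged.
    for tag_name in ("StudyDate", "SeriesDate", "AcquisitionDate"):
        if tag_name in ds and ds.data_element(tag_name).value:
            original = datetime.datetime.strptime(ds.data_element(tag_name).value, "%Y%m%d")
            shifted = original + datetime.timedelta(days=date_offset_days)
            ds.data_element(tag_name).value = shifted.strftime("%Y%m%d")

    ds.save_as(out_path)
```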
N2 Data Exploration, Access, and Integration
Exploring and accessing images along with associated data is critical to research
in the era of Precision Medicine
Data Exploration and Access: How data is managed is critical to the versatility and ease of use of the data. While methods of storage are important for creating useful and minable image
troves, efficient content-based methods of data retrieval may be even more essential
to making these data accessible and usable. Search engines capable of returning image
data along with all appropriate metadata are necessary assets in the era of large
electronic datasets. It is necessary for data associated with the images (see N1) to be accessible and integrated with the image data. A centralized meta-repository that manages all data types is neither practical nor what we are advocating. Rather, what is needed is a mechanism that allows one to use some of the data types to create a cohort and then access images and relevant data directly from the individual repositories that manage them.
Data Integration: An example of this would be an integration of TCIA with the Genomic Data Commons
(GDC). It would give researchers the ability to create a cohort using genomic, clinical,
and imaging attributes, and then access the images and genomic data for the identified
cases. One popular and decentralized approach to achieving such integration is through
the adoption of REST APIs. Our goal should be the adoption of APIs and, when appropriate, convergence on shared API specifications. The underlying API implementations are best left to the repositories; they have little bearing on whether an integrated exploration and retrieval of data can be achieved.
The development and adoption of an API economy has the added advantage of encouraging
developers to directly and programmatically integrate with the various data repositories.
Doing so allows, for example, an image analysis algorithm to directly retrieve images
from TCIA without requiring that a user first download a dataset, then upload it to
a local cluster, and finally launch the algorithm on this dataset. Integration via
APIs allows one to run large, cloud-based, pipelines that can exploit the cost and
scale benefits of clouds (See N6). Similarly, research workstations like 3DSlicer
[[101]] can directly integrate with, and utilize the search and retrieval capabilities
of image repositories, giving their users an optimized experience.
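As an illustration of this API-level integration, the sketch below uses Python's requests library to build a small cohort from the GDC and then retrieve matching imaging series from TCIA. The endpoint URLs, parameters, and response fields reflect the publicly documented GDC and TCIA (NBIA) REST APIs as best understood here; treat them as assumptions to verify against current API documentation before use.

```python
# A rough sketch of API-driven cohort creation and image retrieval.
# Endpoints and fields are assumptions based on public GDC/TCIA documentation.
import json
import requests

# 1. Query the Genomic Data Commons for TCGA-GBM cases (clinical/genomic side).
gdc_filters = {
    "op": "=",
    "content": {"field": "cases.project.project_id", "value": ["TCGA-GBM"]},
}
cases = requests.get(
    "https://api.gdc.cancer.gov/cases",
    params={"filters": json.dumps(gdc_filters), "size": "5", "fields": "submitter_id"},
).json()
patient_ids = [hit["submitter_id"] for hit in cases["data"]["hits"]]

# 2. For each patient, list available imaging series in TCIA and fetch one.
tcia = "https://services.cancerimagingarchive.net/services/v4/TCIA/query"
for pid in patient_ids:
    series = requests.get(
        f"{tcia}/getSeries", params={"PatientID": pid, "format": "json"}
    ).json()
    if series:
        uid = series[0]["SeriesInstanceUID"]
        # getImage returns a zip archive of the DICOM instances in the series.
        archive = requests.get(f"{tcia}/getImage", params={"SeriesInstanceUID": uid})
        with open(f"{pid}_{uid}.zip", "wb") as f:
            f.write(archive.content)
```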
N3 Algorithm Validation and Reproducibility
In addition to the reduction of hardware errors in data collection, quantitative imaging
deals with the development and optimization of robust algorithms capable of extracting
useful information from the collected images. The potential of quantitative imaging
can only be realized if the algorithms are reproducible and validated. These algorithms
are individually designed to serve specific functions in a chain of analysis, that
begins with the collected images, and ends with the extracted quantitative information.
These functions include segmenting suspicious regions in the image and then processing the information within those regions for content that correlates with disease.
Quantitative methods differ in how much information is extracted and used, and in
how the information is assembled for dissemination. In addition, as there are many
different imaging systems in use, the information extracted must be available in a
standardized form that can be read and interpreted across multiple devices. Using
this information, researchers can then generate new diagnostic and prognostic techniques.
One notable mechanism for advancing these goals is the Grand Challenge, which has proven to be a successful means of promoting innovation in algorithm development [[102]].
Feature Generation: Imaging features cover the gamut of tumor segmentations, observations, and features
captured by humans, as well as features that are computed by algorithms. They include
qualitative, quantitative, and mixed features. The underlying objective is to capture
a set of features that can act as numeric surrogates for an image and can then be
used to explore correlations with clinical or genomic data. They could be used to
train classifiers that can guide diagnosis, prognosis, or response to therapy. For
example, Aerts et al. extract ∼400 features from CT and MRI images of lung cancer and head and neck cancer patients, and identify feature signatures that are strong predictors of outcome [[18]]. Features here included morphological features, tumor intensity, texture, and other
higher-level features. In digital pathology, a similar process is followed, leading
to the coinage of the term pathomics. An illustrative example of pathomics is the
work done by Cooper et al., who processed glioblastoma images and extracted 74 different features [[103]], including morphological characterizations of nuclei, nuclear intensity, texture,
and gradient statistics. These features were extracted from 200M nuclei and revealed
three prognostically significant clusters with associations to genetic mutations and
outcomes [[10]]. A similar study was done by Huang et al. on breast cancer images [[104]].
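A minimal sketch of the image-plus-mask-to-feature-vector pattern underlying such studies is shown below, using only numpy and scipy and computing a handful of first-order features. It is illustrative only; the cited studies extract hundreds of intensity, shape, and texture features, and open source packages such as pyradiomics provide standardized implementations of much richer feature sets.

```python
# A minimal sketch of first-order feature extraction from a segmented region.
# Real radiomics/pathomics pipelines compute far larger, standardized feature sets.
import numpy as np
from scipy import stats

def first_order_features(image, mask, voxel_volume_mm3=1.0):
    """image: 3D intensity array; mask: boolean array marking the tumor region."""
    roi = image[mask > 0].astype(float)

    # Histogram-based entropy of the ROI intensities.
    counts, _ = np.histogram(roi, bins=64)
    p = counts / counts.sum()
    p = p[p > 0]

    return {
        "volume_mm3": float(mask.sum()) * voxel_volume_mm3,  # simple shape feature
        "mean_intensity": float(roi.mean()),
        "std_intensity": float(roi.std()),
        "skewness": float(stats.skew(roi)),
        "kurtosis": float(stats.kurtosis(roi)),
        "entropy": float(-(p * np.log2(p)).sum()),
    }
```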
Deep Learning and Medical Imaging: In recent years, there has been a strong interest in the application of neural networks
and deep learning for quantitative imaging. These methodologies have been around for
a long time, and as far back as the early 90s, during the early days of digital imaging,
they were used in a variety of applications, such as the detection of lung nodules
[[105], [106]], and classification of regions of interest (ROIs) from mammograms as benign or
malignant [[107]]. However, it was not until very recently that deep learning gained popularity and
has emerged as one of the most promising tools for image classification. This popularity
is driven not only by advances in algorithms, but also, in large part, by advances in high performance computing (HPC), including graphics processing units (GPUs), and by the fact that the cost of these HPC systems has come down significantly. Since these
algorithms are easily parallelizable, they can take advantage of the inherent parallelization
of GPUs. A comprehensive survey, as well as a series of articles, covering the use
of deep learning and medical imaging can be found in an IEEE special issue on medical
imaging and deep learning edited by Greenspan et al. [[108]]. There is, however, a set of challenges that has slowed down the success of deep learning, the biggest one being access to large quantities of annotated data sets
(see N1). Data needs to be well annotated (N4), and researchers should have the ability
to integrate the extracted ‘hidden’ features with clinical and/or genomic data (N2).
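For readers unfamiliar with the mechanics, the following is a deliberately small PyTorch sketch of an ROI (patch) classifier of the kind described above, such as benign-versus-malignant classification. The architecture, patch size, and random stand-in data are assumptions for illustration; real models are far larger and depend on well-annotated training data (N1, N4).

```python
# A small, illustrative CNN patch classifier with one training step.
import torch
import torch.nn as nn

class PatchClassifier(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)  # for 64x64 inputs

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(start_dim=1))

model = PatchClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

patches = torch.randn(8, 1, 64, 64)   # stand-in for annotated grayscale ROIs
labels = torch.randint(0, 2, (8,))    # stand-in for benign/malignant labels
loss = loss_fn(model(patches), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```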
Algorithm Validation: The performance characteristics of the algorithms must eventually be tested and
validated under a variety of clinically relevant conditions before they can be useful
in clinical workflow. This often requires the use of large datasets of clinical images
as an environment in which to test the performance and robustness of the algorithms.
Ideally, metadata such as annotations, clinical information from patient history,
and patient outcomes will be a part of the information included in algorithm validation.
The need for informatics in this process is critical and is integral to the process
not only through the function of the final quantitative algorithm as it performs in
clinical workflow, but also during the degree of required testing and validation needed
to ensure algorithm performance before it reaches the clinical setting.
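As a simple example of the kind of quantitative validation involved, the sketch below computes the Dice similarity coefficient between an algorithm's segmentation and a reference annotation, one of the metrics commonly reported when testing segmentation algorithms against curated datasets.

```python
# Dice similarity coefficient between a predicted and a reference mask.
import numpy as np

def dice_coefficient(prediction, reference):
    """prediction, reference: arrays of the same shape, treated as boolean masks."""
    prediction = prediction.astype(bool)
    reference = reference.astype(bool)
    intersection = np.logical_and(prediction, reference).sum()
    denominator = prediction.sum() + reference.sum()
    if denominator == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * intersection / denominator
```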
Role of Grand Challenges for Algorithm Validation and Reproducibility: Grand challenges have proven to be very successful in helping with the development
and validation of novel and innovative algorithms such as brain tumor segmentation
[[109]]. They are a good way to crowdsource the annotation of data. This results in an
annotated data set, which is critical to the advancement of quantitative imaging through deep learning. Grand challenges explicitly encourage open science and open source, best-of-breed algorithms. They do so by presenting informaticians with specific problems,
constraints, and incentives for innovation [[110]]. They also address reproducibility and integration by providing access to clinical
and -omics data that are not always readily available and can help move a tool to
readiness for the commercialization pipeline. Grand challenges are conducted in specially-designed
environments that import specific data sets from selected archives for training and testing of the algorithms, and provide comparisons of results and performance. However,
they need to be organized in a manner that simulates clinical workflows and other
real-world constraints (e.g., computational pathology challenges must use whole slide images (WSIs) at 40x magnification
or higher depending on the task, or test datasets that are noisy and need cleaning).
They also need to be organized in a manner where the submissions are run on the test
data by organizers, thereby fostering a culture of tool reproducibility (more details
in N5).
N4 Representing and Managing Quantitative Image Features
Effective curation of, and integration with, imaging features is critical to the realization
of the potential of quantitative imaging. Non-proprietary annotation and markup will
allow for cross-hardware compatibility, interoperability, and sharing of data from
many sources and between institutions
Feature Representation: Integrating radiology, pathology, clinical, and -omics data requires that image
annotations be stored in a standardized and interoperable manner. One example of image
annotations is the segmentation of the image regions corresponding to the tissues
or objects of interest. Such annotations can be displayed during image viewing, can
be used to extract quantitative measures from the image (e.g., tumor volume, vessel permeability), and can capture aspects of key regions within
images that are meaningful to the radiologist and oncologist. For example, image annotations
can record the location and measurements of target lesions or point out non-target
lesions. Frequently, the annotations, created on commercial image viewing workstations,
are collected and stored in either proprietary formats or as DICOM presentation state objects, which are essentially graphical overlays. This enables rendering the information visually, but does not support search of, and access to, the annotations, nor any computation
on them. One is therefore forced to rely on vendor-specific implementations and software.
Even if one were to use vendor-specific software, these software tools are often closed,
and do not adopt standards for annotation, thus hindering interoperability. Consequently,
all annotations currently must be created and maintained within siloed commercial
applications, and there is no interoperability of image annotations across platforms
and applications. To realize the potential value of integrative radiology-pathology-omics,
it is vital that image annotations be stored in standardized interoperable formats
such as the Annotation and Image Markup (AIM) standard or DICOM; a harmonization effort
is underway to unify these two standards.
The goal of the AIM project [[111]] is to provide a standardized, interoperable mechanism for modeling, capturing,
and serializing image annotation and markup data that would be adopted widely within
the medical imaging community. Both human- and machine-readable artifacts are possible.
The variability in methods of storing annotations with the image data is a concern
that can be addressed by developing standard DICOM objects to store this information.
DICOM Working Group 8 is working to harmonize and unify the AIM and DICOM standards
and create a DICOM Structured Reporting object to store AIM image annotations. When
adopted by commercial platforms, this will provide a standardized interoperable format
for image annotations. Adopting this as the standard format to store image annotations
will streamline software development and enable the work to focus on providing rich
annotation features and functionality and on amassing a large collection of minable
image data. Designing the tools to be compatible with other standards will enable
a high degree of interoperability and the incorporation of the annotation standard
into commercial, clinical, information systems.
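As a small illustration of why standardized annotation objects matter, the sketch below uses pydicom to read a (hypothetical) DICOM Segmentation object and list its segments and the image series it references; the attribute names follow the DICOM Segmentation IOD. A harmonized AIM/DICOM-SR measurement object would be consumed in a similarly vendor-neutral way.

```python
# Reading a standards-based annotation: segments in a DICOM SEG object and the
# source image series it annotates. The file name is a placeholder.
import pydicom

seg = pydicom.dcmread("tumor_segmentation.dcm")   # hypothetical DICOM SEG file

for segment in seg.SegmentSequence:
    print("Segment", segment.SegmentNumber, ":", segment.SegmentLabel)

# The SEG object points back to the image series it annotates, which is what
# makes the annotation linkable and searchable across platforms.
for ref_series in seg.ReferencedSeriesSequence:
    print("Annotates series:", ref_series.SeriesInstanceUID)
```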
Data Visualization: Image viewing platforms that support AIM/DICOM-SR will permit consuming annotations
from a variety of sources and linking them to other types of image data as well as
non-image data. Moreover, large collections of image data will become “minable” to
enable discovery from historical collections of Radiology/ Pathology image data. Such
activities will be particularly important in cooperative groups, who routinely collect
and store large amounts of such image data and annotations during clinical trials.
N5 Scaling Quantitative Imaging via Container and Cloud Deployments
Novel container technologies will allow for portability and interoperability, critical
to sharing algorithms in a distributed research environment. Increasing adoption of
cloud environments will allow researchers to compute and process at significantly
larger scales
Advances in systems software such as containers provide the ability to encapsulate
algorithms, and their implementations, thus enhancing reuse and portability [[89], [112]]. Containers, popularized by Docker, make it possible for researchers to share their
algorithms and pipelines in a robust and self-contained fashion. These systems integrate
nicely with modern distributed version control systems, thereby greatly simplifying
the deployment of data processing codes. Additionally, in instances where investigators
are unable to share source code, containers give them the ability to create images
that are equivalent to platform-agnostic, binary executables of their data processing
codes.
In recent years, cloud computing has become much more popular within the research
community. This increased interest has been spurred, in part by the launch of the
NCI supported, genomics cloud pilots [[113]]. These cloud pilots are now serving as exemplars that allow researchers to perform
genomic studies on the cloud, without having to first download large quantities of
genomic data and then upload them to institutional clusters for processing and analysis.
The adoption of containers eases this migration by greatly simplifying the complexity of deploying diverse code-bases on a single cloud [[91], [114]].
The imaging community should consider these technologies as a means of sharing methods
and tools. Some key issues in this area remain to be resolved. The cost of processing on the cloud is still high, though this is being addressed through the recent launch of the NCI Commons Credit Pilot [[115]].
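The sketch below shows what programmatic use of such containers can look like, using the Docker SDK for Python. The container image name, command, and mounted paths are hypothetical; in practice a workflow engine or cloud platform would orchestrate many such steps.

```python
# Running a (hypothetical) containerized analysis step via the Docker SDK for Python.
import docker

client = docker.from_env()

# Execute a containerized segmentation tool against a mounted data directory,
# capturing its console output; the container is removed when it exits.
logs = client.containers.run(
    image="example/tumor-segmentation:1.0",          # hypothetical image
    command=["--input", "/data/ct.nii.gz", "--output", "/data/mask.nii.gz"],
    volumes={"/local/study_data": {"bind": "/data", "mode": "rw"}},
    remove=True,
)
print(logs.decode())
```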
N6 Open Standards and Open Source Architecture
These enable flexible and more rapid technology developments, which are reproducible
and are more likely to see an accelerated adoption in the marketplace
Open source refers to software that is accompanied by its source code and is made
available through a license which allows users to change and re-distribute the software
under the conditions stipulated by the license. Different flavors of open source licenses
exist [[112]]. Examples include the GNU General Public License (GPL), a copyleft license that requires derivative works to be distributed under the same terms, and the MIT or FreeBSD licenses, which do not limit modification and reuse of the source code by anyone and for any purpose, including commercialization. There are
numerous examples of software tools developed within NCI-supported programs that are
being made available as open source. One example is ePAD [[113], [115]], a quantitative imaging informatics platform that provides web-based access to
AIM-compliant metadata and semantic image annotation on any platform and any image
workstation. Another open source solution is LIBRA [[95], [116]], a software package developed at the University of Pennsylvania that is a fully-automated
breast density estimation solution based on a published algorithm that works on both
raw and vendor post-processed digital mammography images. The DICOM Toolkit (DCMTK)
is another example of openly available software [[117]]. DCMTK is a collection of libraries and applications that implement large components
of the DICOM standard, including software for examining, constructing, and converting
DICOM image files, handling offline media, and sending and receiving images over a
network connection.
For the developer of quantitative imaging algorithms, whether for data collection
or image analysis, the use of open source software as modules or components in the
total algorithm package can be a shortcut to success. Open source development has
seen significant growth and transformation with the release of git [[118]] (a distributed version control system) and github.com (a publicly accessible, centralized, hosted git service). A commitment on the part of the developer to use a modular, open-source architecture encourages reuse, thereby introducing efficiencies in algorithm development. A significant development is the widespread
use of containerization platforms such as Docker [[119], [120]] and related projects, which are enabling more broad dissemination of methods through
facile packaging and execution of algorithms. In other words, the inherent flexibility
in open source programming permits the programmer to focus on building custom interfaces,
to create new capabilities, and to customize the performance of the overall algorithm.
It also allows for parallel development on independent components. Importantly, open
source development is critical for community building and a continuity of the development
that might be more tolerant to the interruptions in funding or fluctuations in the
personnel at individual academic labs.
Innovation is important to science, but we also need to balance that innovation with
pragmatism, developing what researchers need today and what can facilitate progress
in steps. We can learn from the success of communities such as DICOM and The Biomedical
Research Integrated Domain Group (BRIDG) [[121]] to make sure what we develop resonates with research communities. If we do not
take this approach, reproducibility of research results and outputs, which is critical
to scientific research, will never be a reality. Additionally, there is an urgency
to demand and reward the sharing of both imaging data and data analysis results to
enable secondary analysis, support reproducibility of findings, and to allow aggregation
of standardized datasets. These datasets can include radiological images, digital
pathology, immunohistochemistry, and data from other modalities that can be standardized
and integrated for analysis. Efforts such as the Informatics Technology for Cancer
Research (ITCR) Program [[101]], which is funding the development of open source tools and algorithms, have been
very successful in generating interest and engagement with the imaging community.
Over a dozen tools that support visualization, storage, and analysis have been developed
from ITCR funding.
Bringing Quantitative Imaging into the Clinical Workflow
For quantitative imaging to become a part of precision medicine, it is critical that
images and features connect with other diagnostic approaches. Genomics, for example,
is receiving a great deal of scientific focus for its ability to chart the progression
of disease and to unlock the molecular basis for cancer. Radiomics and radiogenomics
are creating a culture change in imaging and in the use of informatics to predict
patient outcome. Showing the benefit in combining genomic information with subtle
imaging results to gain greater insight into cancer progression is important to speed
adoption of imaging methods and incorporate them into the clinical workflow.
Educating clinicians on the benefits of imaging methods in clinical practice is key
to their adoption. For example, morphologic diagnosis is required in many cases, as
genomic analysis alone is sometimes inadequate. Genomic analysis will not distinguish carcinoma from benign growth, and mutation analyses alone cannot provide a specific diagnosis.
For example, in the case of Leiomyoma (benign disease) vs. Leiomyosarcoma (cancer),
the genetic mutation is the same, but human cognition and the use of microscopes are
required to accurately diagnose cancer versus benign growth, although novel deep learning
techniques may be helpful in aiding differential diagnosis in the future. Spatial phenotypic heterogeneity is not captured by genomic data, nor are the interactions between the various cell types in the tumor microenvironment (TME): if two TMEs have the same cell composition but different interactions, genomics cannot tell them apart. Hence, the study of images, and of their spatial data, is crucial.
A Case Study - The Open Health Imaging Foundation at Dana-Farber/Harvard Cancer Center: One of the fundamental drivers for an integrated cancer imaging informatics infrastructure
is the ability to easily view and share images across sites and modalities, and provide
a standards-based platform and plug-in architecture for developers. There are many
proprietary commercial web-viewers on the market, which are not easily customized
or open to collaborative development, and those systems that are open are not typically
of a professional grade that would allow translation and collaboration between academics
and industry.
The Tumor Imaging Metrics Core (TIMC) at Harvard created the Open Health Imaging Foundation
(OHIF; http://ohif.org). OHIF supports open-source, web-based, imaging technologies, and is building a vendor-neutral,
open source, extensible, zero-footprint web-viewer and supporting server for display
and analysis of DICOM images. The platform is designed with a plug-in architecture
to allow the group to integrate this web-viewer with oncology applications across
the cancer research community.
One use case of this zero-footprint web-viewer is the replacement of the group's existing thick-client image workstation with an open source one within the Precision Imaging Metrics [[116]] clinical trials management system. That system was developed by the Dana-Farber/Harvard Cancer Center (DF/HCC) TIMC and is presently in use across six NCI-designated Cancer Centers. To make the system broadly available to the oncology research community,
the team is developing an interface to the TIMC’s Precision Imaging Metrics web-based
application, and implementing an annotation and overlay standards-compatible interface.
The group has been actively working with investigators from several other NCI-funded
projects to integrate their viewer with other oncology research platforms. The viewer
will meet all the basic requirements for radiology tumor measurements specific to
the needs of oncology clinical trials, yet also be flexible enough to be configured
for user preferences and extended via plug-ins to support varied research workflows
as a shared research resource. To achieve these design goals, the viewer and all its
functionality will be delivered to client machines exclusively through the web browser, requiring nothing to be installed on client computers or mobile devices; this greatly simplifies software deployments, reduces their cost and support requirements, and increases accessibility. The proposed viewer will enable researchers, imaging
software developers, clinicians, and patients to access oncology clinical trials images
in a freely available and openly extensible environment. This will facilitate remote
image viewing and collaborative image consultations among a wide-range of imaging
professionals.
On-Going and Future Initiatives
There are numerous ongoing initiatives that help advance the integration of imaging
into the clinical and research workflow. We summarize a few of these that cover a
cross-section of research and clinical use cases, ranging from those that enable imaging-based
epidemiologic studies, to others that advance the quality and reproducibility of imaging
algorithms, to a few that enable the management and processing of imaging data at
large scales. This is by no means a comprehensive list, rather a sampling of informatics
projects with a shared theme; one that includes a focus on quantitative imaging, informatics
methodologies, and a specific facet of precision medicine.
Development of a Cancer Imaging Commons: The Blue Ribbon Panel working as part of the Vice President’s Cancer Initiative (the
Moonshot) made several recommendations [[122]], including the creation of a National Cancer Data Ecosystem. This ecosystem will
comprise several commons, like the Genomic Data Commons [[123]], and include an Imaging Commons. The concept of a commons includes the data, compute,
and analytical tools residing in one place, presumably in the cloud, for easy access
and computation by researchers. The task of getting TCIA data into the Cancer Genomics Cloud (CGC) Pilots (see below) is a first step towards the development of a Cancer Imaging Data Commons.
Virtual Tissue Repository (VTR): The NCI Surveillance, Epidemiology, and End Results (SEER) program is working with
participating registries to create the VTR, which will allow researchers to select
cases and request that the tumor registries gather tissue, slides, and images, and
generate Pathology imaging features and/or additional information. A pilot VTR is
using caMicroscope [[124]] for online viewing of digital pathology images and will employ ITCR tools to carry
out analyses on Pathology imaging features along with an integrative query system.
Prototype Data Harmonization and Integration Project: Using an NCI Early Detection Research Network (EDRN) breast cancer study containing
clinical trial data, in-vivo images, pathology images, and biomarker images, the aim
is to build the informatics connections between imaging and clinical data based on
ISO standards.
Imaging and Cloud Computing: The Cancer Genomics Cloud (CGC) Pilots, funded by NCI, were launched in 2016.
The three platforms provide access to genomic data in combination with clinical data
from The Cancer Genome Atlas. In 2017, these cloud pilots expanded their scope by
incorporating proteomic data and imaging data to allow for cross-domain analysis.
Such collocation of data simplifies data access (N1) and facilitates an integrated exploration of data (N2). The CGC Pilots all rely on containerized applications (N5 & N6). Thus,
they meet many of the informatics needs that are outlined here. Recent NCI initiatives,
encouraging the use of these resources in research activities, could provide a real-world
assessment of the informatics needs that were identified in this paper, and help develop
a road map for the advancement of quantitative imaging in clinical and research settings.
Conclusion
This article provides a survey of the role and priorities for imaging informatics
to help advance quantitative imaging in the era of precision medicine. It came about
from a series of workshops, and dialogues between NCI staff and the academic and industrial
scientists involved with imaging informatics. The community continues its work through various initiatives and ongoing dialogues on the subject, working to translate informatics developments into clinical utility as rapidly as possible. In addition to
the six needs and challenges outlined above, there are some other broad recommendations,
listed below:
-
Ensure buy-in from clinicians and make sure the tools developed will work in the clinical workflow. Educate clinicians on the value of imaging and its potential contribution to diagnosis, guiding treatment
plans, and scientific research.
-
Incentivize and reward sharing of both the imaging data and the data analysis results to enable secondary analysis,
support reproducibility of findings, and to allow aggregation of standardized datasets.
-
Create solutions that ensure data quality and veracity, for ease of retrieval and clinical utility.
-
Work towards a flexible, extensible, integrated framework but not a single, monolithic platform; encourage APIs, agile data management systems,
and use of standards and semantics for interoperability.
-
Work within organizations to educate tech transfer and legal departments on the importance of industry partners and to set reasonable expectations for such
partnerships.