Background
Silicon Valley: meeting place of the tech giants
Silicon Valley, located in the southern San Francisco Bay Area in California on the
West Coast of the USA, is known around the world as the global center for high-tech
and innovation. From the middle of the last century, the area has evolved into a hotspot
for groundbreaking technological inventions, starting with the research into and development
of silicon-based transistors, from which the Valley took its name [1]. Originally, what is now Silicon Valley was a predominantly agricultural region,
known as the Valley of Heartʼs Delight due to its popular orchards. The decisive shift
from agriculture to becoming an epicenter of technology began under the aegis of Frederick
Terman, the dean of the School of Engineering of Stanford University. Terman encouraged
the teaching staff and graduates of Stanford University to set up their own companies,
and in 1951 he launched the Stanford Industrial Park, later renamed the Stanford Research
Park, with
part of the site leased by the university to emerging technology companies [1]. The companies include spin-offs from university departments such as Hewlett-Packard,
Google, and Varian Medical Systems. In the 1980s and 1990s, research and industry
in Silicon Valley began to diversify away from the manufacture of semiconductors towards
the development of computers and the rise of software- and internet-based companies
[1]. Today, the region is home to world-leading technology companies pushing for the
increased integration of artificial intelligence (AI) into everyday life, a development
which is also beginning to affect the healthcare sector. But the scandal surrounding
the start-up company Theranos, formerly sited in Stanford Research Park, shows just
how close together the potential and the risks can lie when developing modern healthcare
technologies. The company, which was valued at 9 billion US
dollars in 2015, collapsed rapidly in 2018 after investigations by journalists uncovered
deceptive promises about revolutionary diagnostic technologies [2]. This is just one example which highlights the importance of structured and scientifically
substantiated validation processes for emerging healthcare technologies.
ACOG Annual Clinical and Scientific Meeting 2024 in San Francisco
The annual conference of our American sister society (the American College of Obstetricians
and Gynecologists, ACOG), known as the ACOG Annual Clinical and Scientific Meeting 2024
(ACSM), was held in the immediate vicinity of Silicon Valley, in San Francisco, from
17 – 19 May 2024. For the first time, the ACSM included two sessions on the topic
of artificial intelligence (AI). The sessions “Cutting-Edge AI Applications” and “Generative
AI” presented and discussed recent artificial intelligence developments in gynecology
and obstetrics, with representatives from leading technology companies from Silicon
Valley sitting in on the presentations.
To make the contents of these talks available to the readers of “Geburtshilfe und
Frauenheilkunde,” this review provides an overview of current AI terms against the
backdrop of the digitalization status of the German healthcare system, describes three
areas presented at the ACSM 2024 where AI is already being used in gynecology, and
looks at the current development status in the context of existing impediments to
implementation.
Status of Digitalization in Germany and the Commission Digital Medicine of the German
Society for Gynecology and Obstetrics (DGGG)
In recent decades, the German healthcare system has not excelled at promoting digitalization.
But at the start of 2023, the Federal Ministry of Health in Germany issued its “Digitalization
Strategy for the Healthcare and Nursing Sector” which provides a regulatory framework
and aims to reduce the existing digitalization gap [3]. The very same year, two laws were also passed in Germany, the “Digital Act” (Digital-Gesetz,
DigiG) and the “Health Data Use Act” (Gesundheitsdatennutzungsgesetz, GDNG), which
provided a basis for the extensive digital reform plans of the Federal Ministry of
Health (Bundesministerium für Gesundheit, BMG) [4], [5]. With the establishment of the Commission Digital Medicine in February 2022, the
DGGG has taken a further step towards digitalization with the aim of dealing more
actively with the impact of the relevant regulations and the rapid implementation
of
modern healthcare technologies in both gynecology and obstetrics. Among other things,
this also includes making current developments and discussions about AI, which is
now an omnipresent topic in the media, more accessible to the members of its own specialist
society.
Artificial Intelligence and its Promise to Health
The Artificial Intelligence (AI) Index Report 2024, published by the Human Centered
AI Initiative of Stanford University, collects, analyzes, and visualizes data about AI
in all areas of life over almost 500 pages [6]. The most important insights were summarized in a Top 10 list and a number of visionary
conclusions were formulated. They include, for example, the statement that AI already
outperforms humans in certain tasks, such as image classification and text comprehension,
while the technology still lags behind the capabilities of humans in more complex
challenges such as mathematics and visual thinking. The report points out that industry
dominates the leading research into AI, and the majority of these companies are located
in Silicon Valley. In 2023, industry-led research and development teams developed
a total of 51 “remarkable AI models” compared to just 15 from the academic community
[6]. Developing leading models such as GPT-4 from OpenAI, at an estimated cost of
78 million dollars, and Gemini Ultra by Google, at an estimated cost of some 191
million dollars in computing power, requires not just huge financial resources
but also highly sought-after human capital in the form of highly trained
developers [6]. This is one reason why algorithm development is increasingly industry-led or is
done in tandem by science and industry.
Now in its 7th edition, the current 2024 report has, for the first time, included a new
chapter on “Science and Medicine” which analyzes the increasing use of AI in medical
research and clinical practice [6]. Current figures from May 13, 2024 show that the US Food and Drug Administration
(FDA) approved a total of 882 AI-based medical products up until 2023 [7]. Figures showing the changes over time reveal that development has accelerated,
with the original figure of just three FDA-approved AI medical products in 2013 rising
to a total of 171 in 2023. More recently, 191 new applications were added in the first
five months of the current year alone [6].
A review of the distribution of FDA-approved AI-based medical products according to
different medical specialties shows that radiology stands out as the AI frontrunner
with 671 currently approved applications, followed by cardiovascular medicine with
89 [7]. To date, just one AI-based medical product which was assigned to the field of gynecology
and obstetrics has successfully passed the approval process of the FDA. The KIDScore
D3 tool supports specialists in reproductive medicine by providing prognoses on the
probability of embryos developing, based on their statistical viability [8]. However, more and more AI-based gynecological applications are being submitted
to the FDA for approval. The process is fundamentally different from the traditional
FDA process used to approve medications and is changing in parallel to the rapid technological
developments.
Artificial Intelligence in Gynecology
The AI sessions of the ACOG Annual Clinical and Scientific Meeting 2024 (ACSM) began
with a classification of some of the terminology currently widely used by the press
([Fig. 1]). While artificial intelligence (AI) includes the concept that machines/algorithms
carry out tasks which would normally require human intelligence, Machine Learning
(ML) refers to algorithms which learn from patterns and relationships between data,
without having been explicitly programmed. ML is divided into three main types: Supervised
Learning, which means that the algorithm learns from previously prepared and marked
training data; Unsupervised Learning, which means that the algorithm learns from unmarked
and unstructured training data and independently discovers relationships within data;
and Reinforcement Learning, which includes models that have learned to make decisions
through interactions with the environment and associated rewards. Deep Learning is a
subdomain of machine learning and uses deep neural networks modelled on the human
brain, consisting of many sequentially connected layers, which make it possible to
detect and model complex patterns and connections in large and often unstructured
volumes of data through hierarchical learning and automatic feature extraction. Generative
AI, which is currently a very hot topic in the media, includes deep learning models
which are also capable of generating texts and audio and video files as well as synthetic
data.
Fig. 1 Overview of AI terminology.
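The difference between Supervised and Unsupervised Learning described above can be illustrated with a minimal Python sketch. It is not taken from the ACSM sessions: the numbers are synthetic, and the function names (fit_centroids, predict, two_means) are hypothetical. The supervised part learns one centroid per class from labeled values, while the unsupervised part receives the same values without labels and has to find the two groups on its own.

```python
# Toy illustration only: the numbers are synthetic, not clinical data.
from statistics import mean


def fit_centroids(values, labels):
    """Supervised learning: use the given labels to compute one centroid per class."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: mean(members) for label, members in groups.items()}


def predict(centroids, value):
    """Assign a new value to the class whose centroid is nearest."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def two_means(values, iterations=10):
    """Unsupervised learning: split unlabeled values into two clusters (1-D k-means, k=2).

    Assumes the input contains at least two distinct values.
    """
    low, high = min(values), max(values)
    for _ in range(iterations):
        cluster_low = [v for v in values if abs(v - low) <= abs(v - high)]
        cluster_high = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(cluster_low), mean(cluster_high)
    return sorted(cluster_low), sorted(cluster_high)


values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
labels = ["A", "A", "A", "B", "B", "B"]  # only the supervised model sees these

centroids = fit_centroids(values, labels)
prediction = predict(centroids, 4.9)   # class of a previously unseen value
clusters = two_means(values)           # groups found without any labels
```

In this toy case both approaches arrive at the same grouping. The supervised model needed annotated examples to get there, which is the kind of prepared and marked training data that is often scarce in clinical settings.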
After this introduction, various areas of AI application in gynecology and obstetrics
are presented below.
AI in fetal heart rate monitoring
The first area covers the use of AI in fetal heart rate (FHR) monitoring. Although
standards for evaluating CTGs have been established, for example the FIGO criteria,
evaluations are often still inconsistent and subjective. High intra- and inter-rater
variability and uncertainties about the identification of critical conditions can
result in risks for mother and infant not being recognized and treated in time [9]. Initially, intelligent algorithms were used to support FHR interpretation, but none of them has
been able to provide specific prognoses which would lead to the early detection
of fetal risks. The core message from the INFANT trial, which was first published
in 2017 and included 46 000 patients from 24 centers, was that computer-aided evaluation
of CTGs in women who had continuous electronic FHR monitoring during labor
did not lead to an improvement in the clinical outcomes of mothers or babies [10].
This must be set against the harsh criticisms levelled against the study design of
the INFANT trial, which include inconsistent randomization and blinding and inadequate
collection of crucial clinical data which should serve as important clinical criteria
for decision-making in clinical practice.
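The inter-rater variability mentioned above is commonly quantified with Cohen's kappa, which corrects the observed agreement between two raters for the agreement expected by chance. The following Python sketch shows the calculation. The eight ratings are invented for illustration and are not data from any of the cited studies; the category names merely follow the FIGO classes.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must classify the same non-empty set of traces")
    n = len(rater_a)
    # Proportion of traces on which both raters chose the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical CTG classifications by two clinicians (invented values).
rater_a = ["normal", "normal", "suspicious", "pathological",
           "normal", "suspicious", "normal", "pathological"]
rater_b = ["normal", "suspicious", "suspicious", "pathological",
           "normal", "normal", "normal", "suspicious"]

kappa = cohens_kappa(rater_a, rater_b)
```

For these invented ratings the two clinicians agree on five of eight traces (observed agreement 0.625), chance agreement is 0.375, and kappa is therefore 0.4, noticeably lower than the raw agreement rate suggests.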
OʼSullivan et al. summarized the lessons from the INFANT trial in a review article
and formulated the challenges of developing robust AI models for FHR monitoring intrapartum
[11]. AI models need orientation to be able to differentiate between pathological and
physiological findings, but as this differentiation is usually not “black or white,”
this still leaves a range of “non-reassuring” findings which make it difficult for
ML algorithms to detect patterns in the data. Moreover, large datasets are rare, but
they form the basis for developing every AI algorithm. The training data does not
include clinical data about follow-up neonatal care or information about the long-term
outcomes of neonates. This means that the current training data limits the development
of models to such core elements as umbilical cord pH, base excess, lactate, and Apgar
scores [12]. The authors called for cases to be defined more
precisely, for data to be segmented, and for additional clinical variables and
data modalities to be included, which would make it possible, in the longer term, to
develop tools with explanatory power to support decision-making.
The study by Chiou et al. presented here shows that a combination of AI and FHR monitoring
can be successful [9]. The study investigated automated CTG interpretations as a possible solution for
improving early detection rates of fetal hypoxia during labor and reducing
unnecessary surgical interventions, thereby improving the overall care of mothers
and neonates. Their study used deep learning to reduce the level of subjectivity associated
with visual CTG interpretations. Their results showed the feasibility of using deep
learning to predict fetal hypoxia based on CTG traces. But it also needs to be pointed
out that to improve the robustness of the results, future investigations need to include
greater amounts of data and more diverse data from maternity centers from across the
world. Such data would have to include different clinical contexts, demographic characteristics
and results [13].
The presentation of this area of application ended with the statement that AI has
already arrived in FHR monitoring. Future improvements to the models will be based
on far more extensive training data which will lead, in the near term, to clinical
validation and medical product approval of the respective algorithms, although this
will probably happen first in the US-American market. But such models will not replace
medical staff because our American colleagues were all agreed on one point: the decision
to carry out clinical interventions must still rest with the medical team of midwife
and doctor.
AI in ultrasound
The presentation of the second area of application still focused on obstetrics. Although
arriving at an objective reading of FHR remains a challenge, obstetrical ultrasound
findings often offer more tangible standards, meaning that there are more areas suitable
for developing robust AI models. AI-sensitive tasks in ultrasound include classification
(What is this object?), segmentation (Where is the outline of the head?), navigation
(How do I get the best image?), quality assessment (Is this image usable?), diagnosis
(What is the diagnosis for this image?) and the writing of reports (Please write an
ultrasound report). A review by Chen et al. summarized how a combination of AI and
ultrasound can support clinicians to diagnose different conditions and diseases [14]. The authors showed how the combination can increase efficacy, reduce the number
of misdiagnoses, improve the quality of medical services, and ultimately benefit patients.
In obstetrical ultrasound, AI is used to recognize structures, e.g., abdominal organs
and facial structures, and to calculate ventricular volumes and ventricular wall thickness
in fetal echocardiography. Other applications include automatic measurement of nuchal
translucency and volume of fetal structures such as the head, bladder or stomach;
classification and diagnosis of the risk of preterm birth using cervical ultrasound;
assessment of fetal lung maturity; the detection of congenital heart defects; and
the quantification of fetal weight and gestational age. In their systematic review
of the literature, Jost et al. showed that when medical specialties were compared,
AI in gynecological ultrasound was used predominantly in obstetrics, even
though there are many other areas where AI could be usefully applied, for example
to identify adnexa and breast tumors and to assess the endometrium and pelvic floor
[15].
The key message of the session is that AI-guided ultrasound is already a reality,
usually operating below our level of awareness as it is seamlessly integrated into
the ultrasound unit. According to our colleagues, the benefits of AI-supported ultrasound
lie above all in the increased efficiency, for example, faster image capture and evaluation
and automatic measurements. Other benefits include improved results due to more consistent
evaluations, greater precision, and fewer measurement errors due to standardization
as well as support for medical staff by providing training, assisting with knowledge,
and reducing processing. But the hurdles to implementation were also mentioned, and they
are the same problems already addressed in the context of AI and FHR monitoring:
data is crucial. The development of AI is restricted by the lack of high-quality data
and by inherent bias. There are also concerns about data protection and data security.
The performance of algorithms
remains a challenge as the complex models have to function in real time and their
explanatory power and transparency are often insufficient for clinical users. Finally,
an appeal was made to the listeners that there would have to be more investment in
technological acceptance. Clinical validation is still patchy, there are regulatory
obstacles, and the issue of liability needs to be resolved before AI can be extensively
adopted in clinical practice [16].
AI in robotic surgery
When presenting the third area of application, our colleagues introduced yet another
concept: Surgical Data Science (SDS). SDS is an interdisciplinary field which uses
the methods of data science and computer science to improve surgical procedures and
outcomes. It consists of the collection, analysis, and interpretation of data from
different sources and modalities, including imaging, patient files, intraoperative
sensors, and other medical devices [17]. SDS aims to improve surgical planning and preparation by creating more precise
and more individualized surgical plans. During surgery, SDS supports the surgeon by
offering real-time analysis, which provides important information and can recognize
potential problems early on. Data analysis is used to optimize postoperative monitoring
and follow-up and recognize complications at an early stage. SDS also contributes
to improving surgical training by using virtual reality and simulations based on
real data. SDS also supports clinical research by analyzing large volumes of data
to obtain new insights into surgical practices and patient outcomes. This means that
SDS is closely linked to robotic surgery, which permits the structured collection
of objective and granular parameters of surgical performance, e.g., a precise recording
of exerted pressure or the length of coagulation phases. As more and more of this
data is collected and combined with data processing by AI, SDS can contribute to expanding
and automating coaching, feedback, assessment, and decision-making aids in surgery
[18].
A familiar pattern emerges here: if a large, diverse, multimodal dataset is available
which humans cannot process unaided because of the many different variables involved,
artificial intelligence is able to uncover valuable insights in the data which
were previously hidden. Once such insights have been uncovered, they can be used
to gain knowledge or, based on the learned relationships within the data, AI can be
used to provide support to treating medical staff carrying out surgical procedures.
It became clear during the presentation of the third area of application that AI has
also entered the surgical field, even if currently it is only present in areas which
permit the collection of structured data, i.e., mainly robotic surgery [17]. Given that surgical robots are not yet part of the standard equipment of operating
rooms and that, where they are present, the robot may currently be in use by colleagues
from urology carrying out
robot-assisted prostatectomy procedures, it remains to be seen when SDS and AI will
become standard companions in gynecological and obstetrical procedures.
Generative AI
The second session was entirely devoted to a single subset of AI: generative AI. Since
the official market launch of the large language model (LLM) ChatGPT by OpenAI, a
company which is headquartered in San Francisco, generative AI has experienced an
explosion of media interest. Such so-called Foundation Models are trained on vast amounts of
data through Supervised Learning, Reinforcement Learning, and
human feedback to process texts, language, and other data and carry
out many different tasks. Potential tasks include the extraction of information, image
descriptions, object recognition, sentiment analysis, and finding answers
to many different questions. Previously, AI required the development of a specific
model for a specific task. Today, however, an appropriate Foundation Model can undertake
numerous tasks at the same time. LLMs make use of mathematical models, for example
to calculate the next most probable word when formulating
texts. Kiela et al. showed the speed at which these AI systems have outperformed humans
in recent years in language and image recognition and that the breakthrough occurred
in the last 3 – 4 years [19]. Nori et al. expanded these findings to include medicine and showed how specialized
medical LLMs are rapidly increasing their capacities and capabilities [20]. The session summarized the current status of generative AI under the title “The
Good, the Bad and the Ugly” without discussing concrete applications in gynecology
and obstetrics ([Fig. 2]).
Fig. 2 Overview of The Good, the Bad and the Ugly of Generative AI (Fig. is based on data
from: summary of the session “Generative AI” of the 2024 ACSM).
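The statement above that LLMs calculate the next most probable word can be made concrete with a short Python sketch. It assumes that a model has already assigned a raw score to each candidate word. The three candidate words and their scores are hypothetical, and real LLMs compute such scores with neural networks over vocabularies of many thousands of tokens. The softmax function, which is typically used for this step, converts the scores into probabilities that sum to 1.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores.values())
    # Subtracting the highest score keeps the exponentials numerically stable.
    exps = {word: math.exp(score - highest) for word, score in scores.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}


# Hypothetical scores for the word following "The fetal heart rate is ...".
scores = {"normal": 2.0, "reassuring": 1.0, "purple": -3.0}

probabilities = softmax(scores)
next_word = max(probabilities, key=probabilities.get)  # the most probable candidate
```

Because the model always ranks the available candidates and returns one of them, it produces a fluent continuation even when none of the candidates is factually correct, which is one way to picture the hallucinations discussed in this session.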
Generative AI already offers benefits in many relevant areas of medicine which range
from clinical care to simplifying administration, supporting carers and patients to
medical teaching and research and public health. On the other hand, “The Bad” shows
that many of the issues of generative AI are still unsolved, including the problem
of ensuring proper data protection, the lack of transparency of decisions for users,
and the ethical and legislative basis. It is therefore still unclear how issues such
as intellectual property, liability, and copyright laws should be dealt with in the
context of generative AI and who will have access to specific models. Questions relating
to the effects of the clinical adoption of models also remained largely unanswered.
In which areas can we expect to see a decline in human capabilities with avoidance
of previously unavoidable learning processes in medical training? Will the implementation
of AI lead to “de-humanization” in certain areas?
“The Ugly” shows how dangerous the weaknesses of generative AI can be. Examples include
so-called “hallucinations,” when generative AI models simply make up information so
that they can provide an answer “by hook or crook,” and “omissions,” when the model
does not provide a complete answer and withholds important information. A study by
Alkaissi et al. investigated the ability of ChatGPT to describe the pathogenesis of
different diseases and was able to show that at times, the chatbot hallucinated non-existing
pathogenic relationships or invented sources which did not exist [21]. LLMs have also been shown to have a tendency to manifest prejudices and discrimination
although as researchers, we cannot be held entirely blameless for this. This is because
models are trained on existing databases which can include significant ethnic and
gender-specific biases. Zack et al. were able to show that GPT-4 did not model the
demographic range of diseases
correctly and consistently produced clinical vignettes which stereotyped demographic
presentations [22]. The authors emphasized that it was urgently necessary to carry out a detailed and
transparent evaluation of the biases in the instruments of generative AI with regard
to their intended use cases before they are integrated into clinical care. The
quality and safety of publicly available models is currently not good enough to ensure
the safe handling of sensitive patient data. The models still struggle with performance
problems in terms of precision and with a lack of knowledge-based reasoning
which can lead to misinformation. In some cases, this can feed into so-called “deep
fakes” which are often difficult to recognize by the human eye due to the high visual
and textual capability of the models. Containment strategies such as those proposed
by the Trustworthy & Responsible AI Network (TRAIN) [23], the Coalition for Health AI [24] and the AI Act of the European Union [25] are important steps on the way to defining basic rules and boundaries for AI.
The session ended with an important note for the auditorium: generative AI is becoming
an increasingly powerful tool which is used by patients, medical staff, and family
members. For us as specialists for gynecology and obstetrics, this means that we should
not miss the historical opportunity to look at these applications in detail, evaluate
them critically, and actively contribute to integrating them in our specialist field.
Conclusion and Outlook: Artificial Intelligence and Gynecology and Obstetrics, quo
vaditis?
In summary, the impressions we obtained in Silicon Valley and at the ACSM can be condensed
into three main points which explain why AI will find its way into gynecology and
obstetrics and how it is already changing clinical practice or will change it in the
near future:
The data
“Data is the new oil” is a metaphor coined by the mathematician Clive Humby in 2006
which is now more relevant than ever. “Big Data” is also present in gynecology. In
gynecological oncology, for example, the growing flood of data obtained through advances
in precision oncology, better genomic profiling and targeted therapies has resulted
in significant improvements in diagnosis and treatment, which has led to significant
breakthroughs [26]. These advances are accompanied by an abundance of multimodal therapeutic and diagnostic
data alongside increasingly complex research results which, little by little, are
exceeding the limits of human cognitive processing. AI is already helping to expand
human medical intelligence to successfully process these huge volumes of data and
text information [27], [28]. And yet: although some areas have an abundance of data, other areas such as fetal
heart rate monitoring show that while AI has potential, there is still not enough
high-quality training data with the corresponding level of diversity that would be
necessary to develop robust models.
The visual aspect
The rapid development of FDA-approved medical AI products in radiology shows that AI has
already proven to have a special talent for processing visual elements [6]. And such visual data is also available in gynecology: from mammographies and histopathological
data to colposcopies and laparoscopies to specialized prenatal ultrasound, AI is learning
to process different visual components used in gynecological diagnostics. Although
performance can still vary significantly depending on the specific application, these
areas are creating the first bridges over which the technology will enter into our
specialist field.
The patient
As the discussion focuses on AI in obstetrical and gynecological diagnostics and treatment,
it is important not to lose sight of the patients. The mean age of primiparae in Germany
is 31.7 years, and gynecology treats the full age range of patients, including a not
insignificant number of younger, digitally-savvy patients [29]. This will contribute to emerging technologies rapidly becoming more relevant for
treatment in clinical reality.
The overview of the two AI sessions at the 2024 ACSM shows that possible applications
of artificial intelligence (AI) are very diverse and so extensive that they cannot
be comprehensively summarized in a short review. With AI finding increasing applications
in clinical care, including in gynecology, scientific monitoring will be needed. Whether
it takes the form of original scientific works or the synthesis of evidence using
structured literature reviews and meta-analyses, when technological developments happen
so fast, this can only be done through interdisciplinary global networks. To that
end, the Commission Digital Medicine of the DGGG has positioned itself nationally
in Germany and will be expanding its activities in future to work towards making current
issues accessible for its own specialist association and to use the potential of AI
to improve gynecological and obstetrical care.