Keywords
computed tomography - endoscopic sinus surgery - imaging - simulation training - FESS
Introduction
The novel coronavirus disease 2019 (COVID-19) pandemic has thrust patients and health
care workers into a vulnerable state. Faced with this global crisis, the medical field
had to embrace new technologies to advance education and patient care.[1] [2] Stay-at-home orders, widely implemented during the early phase of the pandemic, made in-person examinations far less feasible. The subsequent rise in video and telehealth visits during this period underscores the need for alternative, safer, no-contact methods of examining patients to avoid delays in diagnosis and treatment.
Virtual reality (VR) offers tremendous potential in the medical field, especially
for inherently visual-spatial exercises like diagnostic and surgical endoscopy.[3] [4] [5] Sinus anatomy is intricate and variable, with close proximity to critical neurovascular structures.[6] [7] [8] Preoperative planning and innovative intraoperative image-guidance systems presently rely on two-dimensional (2D) computed tomography (CT) planes that may not offer the most intuitive visualization of anatomy.[9] [10] [11] [12]
In otolaryngology, VR has demonstrated efficacy as a teaching tool for students, residents, and surgeons to hone procedural skills.[13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] While such innovations showcase VR's potential, current simulators rely on laborious manual image segmentation (the identification of different components in an image) and are thus not clinically scalable.[20] Machine learning methods offer the potential to automate high-quality image segmentation, addressing a significant hurdle of clinical VR.[27] [28] [29] [30]
Existing machine learning methods like convolutional neural networks (CNNs) have shown
promise as a segmentation tool across a variety of modalities but require a large
volume of high-quality annotated data, as observed in previous studies centered around
sinus segmentation.[31] [32] [33] True clinical applicability demands a more data-efficient alternative.[34]
Subspace approximation with augmented kernels (Saak) is a novel transformation that
offers a fully reversible and data-efficient means of feature extraction.[35] [36] Equipping the Saak transform with a classifier produces an automatic image segmentation
algorithm capable of operating with minimal training data. We previously developed
this method and studied its ability to segment intricate light sheet fluorescence
microscopy images, finding that Saak transform–based machine learning consistently
outperformed a CNN, particularly when fewer training images were available.[36]
In this study, we leverage data-efficient machine learning to create a VR tool for
patient-specific surgical planning.[5] Our Saak transform–based method automatically segments soft tissue and bone from
sinus CT scans to allow operators to explore a patient's unique anatomy in the VR
domain.
Materials and Methods
Preparation of Training Data
All data collection for this study received IRB approval. We obtained Digital Imaging
and Communications in Medicine (DICOM) files for three patients' sinus CT scans, with identifying information removed for confidentiality. The patients were selected from a pool of preoperative candidates for functional endoscopic sinus surgery. The soft tissue and bone in all 548 axial images (100 to 200 per CT scan) were annotated in Amira to establish the ground truth for training and validation.
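For context on this preprocessing step, the sketch below shows one way to read a de-identified DICOM series into an ordered stack of axial slices. The study's own extraction was performed in MATLAB and Amira, so pydicom, the folder name, and the Hounsfield rescaling shown here are illustrative assumptions rather than the published workflow.

```python
# Minimal sketch: load a de-identified sinus CT series into a 3-D NumPy array.
# Assumes one DICOM file per axial slice in a single directory (illustrative only;
# the study's actual preprocessing used MATLAB and Amira).
from pathlib import Path

import numpy as np
import pydicom


def load_ct_volume(series_dir: str) -> np.ndarray:
    """Return axial slices stacked as a (num_slices, rows, cols) array in Hounsfield units."""
    slices = [pydicom.dcmread(p) for p in Path(series_dir).glob("*.dcm")]
    # Sort by position along the table axis so slices stack in anatomical order.
    slices.sort(key=lambda ds: float(ds.ImagePositionPatient[2]))
    volume = np.stack([ds.pixel_array.astype(np.float32) for ds in slices])
    # Convert stored pixel values to Hounsfield units using the DICOM rescale tags.
    slope = float(getattr(slices[0], "RescaleSlope", 1.0))
    intercept = float(getattr(slices[0], "RescaleIntercept", 0.0))
    return volume * slope + intercept


if __name__ == "__main__":
    ct = load_ct_volume("patient01_deidentified")  # hypothetical folder name
    print(ct.shape)  # e.g., (152, 512, 512) for a scan with 100-200 axial slices
```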
Segmentation Algorithm
We used MATLAB to extract and trim each axial slice from the raw CT scan DICOM files
to 350 × 350 windows encompassing the airspace structures. We then used randomly selected
images and their manually segmented labels to train our Saak-based machine learning
algorithm, consisting of a multistage Saak transform based on principal component analysis (PCA) and a random forest classifier. The model segmented all desired anatomic structures at once for each axial slice. We then ran the trained model to segment each CT scan, ensuring that the training data did not overlap with the testing data.
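The multistage Saak transform itself is detailed in the cited work;[35] [36] as a simplified stand-in, the sketch below illustrates the overall pattern of the pipeline described above: extract a small patch around each pixel, project the patches onto PCA components to obtain features, and let a random forest assign each pixel to background, soft tissue, or bone. The patch size, number of components, class labels, and scikit-learn estimators are assumptions for illustration, not the published implementation.

```python
# Simplified stand-in for the Saak-based pipeline: PCA features + random forest,
# classifying every pixel of a 350 x 350 axial window as background, soft tissue, or bone.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

PATCH = 9  # assumed odd patch size around each pixel


def pixel_patches(image: np.ndarray) -> np.ndarray:
    """Return one flattened PATCH x PATCH neighborhood per pixel."""
    pad = PATCH // 2
    padded = np.pad(image, pad, mode="reflect")
    windows = sliding_window_view(padded, (PATCH, PATCH))
    return windows.reshape(-1, PATCH * PATCH)


def train(images, labels, n_components=32, n_trees=100):
    """Fit PCA features and a random forest from a handful of labeled slices."""
    X = np.vstack([pixel_patches(img) for img in images])
    y = np.concatenate([lab.ravel() for lab in labels])  # 0=background, 1=soft tissue, 2=bone
    pca = PCA(n_components=n_components).fit(X)
    clf = RandomForestClassifier(n_estimators=n_trees, n_jobs=-1).fit(pca.transform(X), y)
    return pca, clf


def segment(image, pca, clf):
    """Segment all structures of one axial slice at once."""
    feats = pca.transform(pixel_patches(image))
    return clf.predict(feats).reshape(image.shape)
```

A genuine Saak transform applies several successive stages of PCA-derived kernels with sign augmentation; the single PCA stage above only gestures at that structure.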
Validation of Segmentation
The Saak-based method was validated using randomly selected, nonoverlapping axial
images from the CT datasets. We calculated the intersection over union (IOU) and Dice
similarity coefficient (DSC) of the segmentation results of our algorithm trained
with three, six, and nine training images.
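For reference, with P the predicted mask and G the ground-truth mask for one class, DSC = 2|P ∩ G| / (|P| + |G|) and IOU = |P ∩ G| / |P ∪ G|. A minimal sketch of the per-class computation is below; the class labels and the convention for empty masks are assumptions.

```python
import numpy as np


def dsc_iou(pred: np.ndarray, truth: np.ndarray, cls: int):
    """Dice similarity coefficient and intersection over union for one class label."""
    p, g = (pred == cls), (truth == cls)
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    dsc = 2 * inter / (p.sum() + g.sum()) if (p.sum() + g.sum()) else 1.0
    iou = inter / union if union else 1.0
    return dsc, iou


# Example: average metrics for soft tissue (assumed label 1) over validation slices.
# scores = [dsc_iou(p, t, cls=1) for p, t in zip(preds, truths)]
# mean_dsc = np.mean([s[0] for s in scores]); mean_iou = np.mean([s[1] for s in scores])
```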
Virtual Reality Demo
After obtaining the segmented data, we reconstructed the three-dimensional (3D) object in Amira 6.1 and generated, compressed, and exported a surface model to Autodesk Maya for scaling and smoothing. We developed our VR demo in Unity using the models exported from Maya; all Maya and Unity steps were completed under educational licenses.
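The reconstruction in this study was done in Amira; purely as an illustration of the same step, the sketch below stacks segmented slices into a binary volume and extracts a surface mesh with marching cubes. The scikit-image and trimesh calls, voxel spacing, and file names are assumptions, not the published pipeline.

```python
# Illustrative alternative to the Amira reconstruction step: stack segmented slices
# into a binary volume and extract a surface mesh with marching cubes.
import numpy as np
from skimage import measure
import trimesh


def masks_to_mesh(masks, spacing=(1.0, 0.5, 0.5)):
    """masks: list of 2-D binary soft-tissue (or bone) masks in axial order;
    spacing: assumed (slice, row, col) voxel size in mm."""
    volume = np.stack(masks).astype(np.uint8)
    verts, faces, _, _ = measure.marching_cubes(volume, level=0.5, spacing=spacing)
    return trimesh.Trimesh(vertices=verts, faces=faces)


# mesh = masks_to_mesh(soft_tissue_masks)
# mesh.export("patient01_soft_tissue.obj")  # import into Maya/Unity for smoothing and scaling
```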
We used an Acer Windows Mixed Reality Headset (Acer) as the VR viewer and Unity 5.5 (Unity Technologies) as the development engine. In a new Unity project, we imported the patient's 3D reconstructed head and mounted a probe with a virtual endoscopic camera. We enabled user control of the probe through either Acer Windows Mixed Reality Controllers or keyboard inputs and computed its coordinate position for mapping to 2D slices of the CT scan. Finally, we designed a Unity canvas containing windows for the probe's location and the endoscopic camera's display. Our VR pipeline is outlined in [Fig. 1].
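The probe-to-slice mapping in the demo is implemented in Unity scripts; the arithmetic behind it is simply an affine conversion from world (scanner) coordinates to voxel indices. The sketch below illustrates that conversion, with the volume origin and voxel spacing treated as known inputs (e.g., from the DICOM headers); the function name and axis conventions are assumptions.

```python
import numpy as np


def probe_to_slice(probe_xyz, origin_xyz, spacing_xyz):
    """Map a probe position (x, y, z) in scanner millimeters to CT voxel indices.

    origin_xyz is the scanner position of voxel (0, 0, 0) and spacing_xyz the voxel
    size in mm along x, y, z; both are assumed known from the DICOM headers."""
    col, row, slc = np.round(
        (np.asarray(probe_xyz) - np.asarray(origin_xyz)) / np.asarray(spacing_xyz)
    ).astype(int)
    return slc, row, col  # axial slice index plus in-plane row/column


# Example: a probe 42 mm along the slice axis from the origin with 1 mm slice spacing
# maps to axial slice 42, which the Unity canvas displays next to the camera view.
# print(probe_to_slice((10.0, 20.0, 42.0), origin_xyz=(0, 0, 0), spacing_xyz=(0.5, 0.5, 1.0)))
```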
Fig. 1 Segmentation and virtual reality (VR) pipeline. (A) The axial slices of a raw computed tomography (CT) scan are passed to (B) the Saak transform–based machine learning algorithm, which has been trained with
manually labeled images. The algorithm produces segmented slices of (C) soft tissue and (E) bone, which are stacked and processed to generate (D, F) three-dimensional (3D) meshes that can be ported to (G) the prebuilt Unity VR user interface for interactive anatomical exploration.
Results
Automatic Segmentation
We compared the Saak transform segmentation results of soft tissue ([Fig. 2], blue) and bone ([Fig. 2], red) with the ground truth ([Fig. 2], white). With three, six, and nine training images, the DSC of soft tissue was 0.94 ± 0.05, 0.96 ± 0.04, and 0.98 ± 0.01, respectively, while the IOU was 0.89 ± 0.09, 0.92 ± 0.07, and 0.97 ± 0.02, respectively. In comparison, bone segmentation yielded DSCs of 0.30 ± 0.06, 0.66 ± 0.10, and 0.60 ± 0.07 and IOUs of 0.44 ± 0.08, 0.49 ± 0.11, and 0.44 ± 0.08 across the same numbers of training images.
Fig. 2 Qualitative and quantitative comparison of segmentation results. (A, B) The Saak transform–based method's soft tissue (blue) and bone (red) segmentation results overlaid with the ground truth (white) for two axial slices of a computed tomography (CT) scan. (C, D) The Dice similarity coefficient (DSC) and intersection over union (IOU) results
of our automatic segmentation method for soft tissue and bone computed using 24 randomly
selected validation image sets.
Virtual Reality Model
The automatic segmentation results enabled us to explore each patient's sinus anatomy in our Unity VR model, which offers several features to augment the user's experience. In the digital reconstruction,
we examined the nares and nasal cavity to view the orifice of the maxillary sinus,
ethmoid air cells, frontal sinus, nasopharynx, and any obstructions along the pathway
with real-time mapping of our location ([Fig. 3A, B]) to the corresponding 2D slices of the CT scan. At any point during the exploration, we
could toggle between soft-tissue and bone views to assess the degree of mucosal obstruction
([Fig. 3C–F]). Our investigative trajectory was also traced during the virtual endoscopy, and
tissue and bone boundaries were overlaid on the 3D path to assess the user's proximity
to sensitive structures such as the lamina papyracea.
Fig. 3 Functionality of the virtual reality (VR) interface. (A) The three-paned Unity user interface displaying the mapped location of the probe,
the probe's camera feed, and the three-dimensional (3D) model in which the probe is
deployed (MS, maxillary sinus; NP, nasopharynx). (B–F) A view of the frontal sinus showing user ability to toggle between (C, D) tissue and (E, F) bone views and control the brightness of the probe's light. (G–I) A 3D tracer enables the user to view the probe's path. Bone and tissue can be toggled
on and off to observe proximity of the path to other structures like the orbits.
We performed virtual endoscopy on two patients with significant sinus disease and
captured parallel views. In addition to the frontal sinus, we visualized the alar
cartilage ([Fig. 4A, E]), nasal cavity ([Fig. 4B, F]), nasopharynx ([Fig. 4C, G]), maxillary sinuses, and ethmoid air cells ([Fig. 4D, H]). All structures were identifiable both through the navigation system and through
the endoscopic camera's view. These perspectives allowed us to compare the obstructions
and varying landmark locations between these patients ([Fig. 4]).
Fig. 4 Labeled anatomical features. (A–D) Selected views of a patient's anatomy. (E–H) Parallel views in the second patient highlighting the anatomical differences. (B) The first patient has a narrower nasal cavity due to obstruction compared with (F) the second patient. (C) While the natural orifice of the maxillary sinus is normally located above the inferior
turbinate, (H) the second patient has a passageway below the turbinate from previous surgery.
Discussion
Both anatomic abnormalities and low conceptual expertise of the surgeon are cited
as risk factors for increased complication rates in endoscopic sinus surgeries.[37] As VR enables a more intuitive, 3D visualization of anatomic features compared with
traditional 2D CT scan views, our model has the potential to address both of these
issues. Our framework offers the novel ability to automatically process and view a
patient's unique anatomy in the VR domain. While existing VR models serve as teaching
tools, the scalable patient-specific nature of our model broadens its application
to preoperative planning, virtual endoscopy, and education. The mapping feature, inspired
by the intraoperative image-guidance systems in practice today, further enhances the
identification and understanding of landmarks.
The primary advantage of the Saak transform over other machine learning methods is
its data efficiency. We were able to generate viable 3D sinus reconstructions using
as few as three training images, meaning that users can tailor the algorithm to their
specific segmentation needs with a minimal amount of manually labeled data. We assessed
the accuracy of our Saak-based segmentation method with qualitative and quantitative
measures. In addition to strong DSC and IOU values, the soft-tissue segmentation results
had minimal visible noise, avoiding unnecessary surface vertices in the final model
that would otherwise affect the performance of the VR demo. Bone segmentation was
less accurate, likely due to the limitations of generating manually labeled training
data around difficult-to-visualize structures like the ethmoid air cell walls. However,
while highly precise soft-tissue segmentation is essential for an effective VR model,
less accurate bone segmentation still enables clear visualization of contours needed
to identify the probe's relative position. Training sets of six and nine images were more effective in producing a bony anatomy model, but all training set sizes provided a similarly effective soft-tissue VR experience.
This study provides a foundation for future innovations in the VR domain. Adding interactive
functionality such as the ability to cut or debride tissue would build on this foundation
and allow trial runs of a surgery.
Both the machine learning and VR aspects of this study present limitations. Our Saak-based
method functions well with consistent scan settings but is not designed to simultaneously
handle images with multiple different windows and contrast profiles. However, the
data-efficient nature of the Saak transform allows any user to tailor the performance
of the algorithm for their scan standards using only minimal training data, preserving
its clinical applicability.
Second, our VR model, like most others, is built using surface meshes rather than
space-occupying voxels due to computational constraints. This makes deformation or
manipulation of the object more challenging, limiting the realism of functional endoscopic
sinus surgery simulators. Nevertheless, the fundamental framework of applying automatic
segmentation to the VR domain is broadly applicable and will remain relevant even
as voxel-based VR technology improves.
Conclusion
This study found that Saak transform–based machine learning automatically generates
accurate, patient-specific VR models. Beyond preoperative planning, automatic segmentation
and visualization of scans in VR may pave the way for virtual endoscopy and other
remote alternatives to diagnostic examinations, addressing major challenges presented
by the COVID-19 era. Future research into the automatic segmentation of additional
anatomic structures and the interactive mechanics of VR will reinforce the clinical
applicability of this technology.