2025 Proffered Presentations
S335: ENHANCING SURGICAL VIDEO PHASE RECOGNITION WITH ADVANCED AI MODELS FOR ENDOSCOPIC PITUITARY TUMOR SURGERY
Jack Cook1; Jonathan Chainey, MD, MSc2; Ruth Lau, MD3; Margaux Masson-Forsythe1; Ayesha Syeda1; Kaan Duman1; Daniel Donoho, MD4; Dhiraj A Pangal5; Juan Vivanco Suarez6; 1Surgical Data Science Collective, Washington, D.C., USA; 2Division of Neurosurgery, Department of Surgery, University of Toronto, Toronto, Ontario, Canada; 3Division of Neurosurgery, Department of Surgery, University of Toronto, Toronto, Ontario, Canada; Division of Neurosurgery, Joan XXIII University Hospital, Tarragona, Spain; 4Surgical Data Science Collective, Washington, D.C., USA; Children's National Hospital, Washington, D.C., USA; 5Department of Neurosurgery, Stanford University, Stanford, CA, USA; 6University of Iowa, Iowa City, IA, USA
Introduction: Operative videos are widely used for surgical education and demonstration. Advances in computer vision have transformed surgical video analytics and the assessment of surgical performance. However, surgical videos, particularly of skull base procedures, are lengthy, poorly delineated, and require significant manual effort to prepare for downstream analysis.
Detecting and identifying surgical phases allows surgeons to jump directly to the portions of a case most relevant for education and demonstration, and enables targeted analytical insights.
We introduce an artificial intelligence model designed to segment pituitary tumor surgery into four distinct and essential phases: nasal, sphenoid, sellar, and closure. This was achieved by collaborating with a global team of surgeons to create an extensive dataset of labeled phase videos.
Method: Our dataset comprises 127 video clips across 38 case videos from 3 contributing centers. We split it into 80% for training and 20% for validation and testing.
We developed two deep-learning model pipelines to segment the phases of pituitary tumor surgery. The first pipeline employs a state-of-the-art video transformer model to predict surgical phases directly from video input. The second pipeline generates frame-by-frame embeddings, which are then processed by an MSTCN++ (Multi-Stage Temporal Convolutional Network) model to predict phases. A post-processing stage based on an accumulator is applied to improve the accuracy and temporal consistency of the predictions by suppressing erratic, short-lived phase switches. Our approach is validated on a comprehensive dataset of labeled phase videos provided by a global team of surgeons, supplemented with remapped and adapted data from the PitVis dataset (PitVis dataset [Data set]. Synapse. https://www.synapse.org/Synapse:syn51232283/wiki/621581).
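The abstract does not specify the exact form of the accumulator, so the following is a minimal sketch of one plausible realization: a sliding-window majority vote over per-frame phase predictions, which suppresses single-frame erratic flips. The window size and phase labels are illustrative assumptions.

```python
from collections import deque, Counter

# Illustrative phase labels from the paper's four-phase scheme.
PHASES = ["nasal", "sphenoid", "sellar", "closure"]

def smooth_predictions(frame_preds, window=30):
    """Sliding-window majority-vote accumulator over per-frame phase
    predictions. Hypothetical reconstruction: the abstract only states
    that an accumulator mitigates erratic predictions."""
    smoothed = []
    buf = deque(maxlen=window)  # holds the most recent `window` labels
    for pred in frame_preds:
        buf.append(pred)
        # Emit the most common label in the recent window, so an
        # isolated mislabeled frame cannot flip the output phase.
        smoothed.append(Counter(buf).most_common(1)[0][0])
    return smoothed
```

A larger window gives smoother, more stable phase boundaries at the cost of a short lag when a genuine phase transition occurs.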
Results: The two deep learning model pipelines were evaluated using accuracy, precision, and F1 score as the primary metrics, providing a comprehensive assessment of how accurately and precisely the models segment the surgical phases. The embeddings pipeline achieved an accuracy of 77.7% on the test set, whereas the video transformer achieved 72%. In addition to quantitative metrics, segmentation timelines were generated for visual performance analysis [Fig. 1], in which the smoothing effect of the accumulator post-processing is also visible. These timelines illustrate the effectiveness of the phase predictions and highlight discrepancies and areas for improvement in the segmentation process.
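For reference, the reported metrics can be computed from per-frame labels as below. This is a from-scratch sketch of the standard definitions (frame accuracy plus macro-averaged precision and F1); the abstract does not state its exact averaging scheme, so macro averaging is an assumption.

```python
def frame_metrics(y_true, y_pred, labels):
    """Frame-level accuracy plus macro-averaged precision and F1.
    Standard metric definitions; averaging scheme assumed, not taken
    from the abstract."""
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, f1s = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        f1s.append(f1)
    return acc, sum(precisions) / len(labels), sum(f1s) / len(labels)
```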
Figure 1: Visual timeline predictions of phases.
Conclusion: Our study demonstrates the effectiveness of two deep-learning model pipelines in segmenting pituitary tumor surgery into four distinct phases. By leveraging the video transformer model and a combination of frame-by-frame embeddings with the MSTCN++ model, we achieved high accuracy, precision, and F1 scores. The accumulator-based post-processing stage further refined these predictions, yielding coherent and reliable phase segmentations. The inclusion of remapped and adapted data from the PitVis dataset, together with the visual segmentation timelines, supported robust performance analysis and provided valuable insights. This approach not only enhances surgical training and performance assessment but can also be adapted to other types of surgery, contributing to the advancement of surgical analytics and education.