FedEFM: Federated Endovascular Foundation Model with Unseen Data

1University of Liverpool, UK
2AIOZ, Singapore
3Automation & Control Institute, TU Wien, Austria
4Xi'an Jiaotong-Liverpool University, China
5National Tsing Hua University, Taiwan
*These authors contributed equally to this work.

Abstract

In endovascular surgery, the precise identification of catheters and guidewires in X-ray images is essential for reducing intervention risks. However, accurately segmenting catheter and guidewire structures is challenging due to the limited availability of labeled data. Foundation models offer a promising solution by enabling the collection of similar-domain data to train models whose weights can be fine-tuned for downstream tasks. Nonetheless, large-scale data collection for training is constrained by the necessity of maintaining patient privacy. This paper proposes a new method to train a foundation model in a decentralized federated learning setting for endovascular intervention. To ensure the feasibility of the training, we tackle the unseen data issue using differentiable Earth Mover's Distance within a knowledge distillation framework. Once trained, our foundation model's weights provide valuable initialization for downstream tasks, thereby enhancing task-specific performance. Intensive experiments show that our approach achieves new state-of-the-art results, contributing to advancements in endovascular intervention and robotic-assisted endovascular surgery, while addressing the critical issue of data sharing in the medical domain.

Robotic Setup.

To collect large-scale X-ray images, we employ a robotic platform and a full-size silicone phantom. A surgeon uses a master joystick device to control a follower robot and cannulate three arteries: the left subclavian (LSA), left common carotid (LCCA), and right common carotid (RCCA). The provided video gives an overview of our robotic setup and the data collection process. During each catheterization procedure, the surgeon activates the X-ray fluoroscopy using a pedal in the operating room. The experiments are conducted using the Epsilon X-ray Generator. We develop a real-time image grabber that transmits the video feed of the surgical scene to a workstation equipped with an 8-Core ARM v8.2 64-bit CPU. Overall, we collect and label 4,700 new X-ray images to create our EIPhantom dataset.

Simulation Data.

In addition to the X-ray images collected with our real robot, we collect simulated X-ray images from the CathSim simulator to form the EISimulation dataset. We manually label the data from both the robot and the CathSim simulator for use in downstream tasks. Note that the datasets used to train the foundation model are not used in the downstream endovascular understanding tasks.
The table below summarises the endovascular intervention datasets used in this paper. All datasets cover different endovascular procedures, with X-ray images as the main modality. The data are collected from diverse sources, including human and animal studies, human phantoms, and simulated environments.

Table: X-ray datasets used in our experiments.

Phase                          Dataset                                     #Frames
Federated Foundation Training  CathAction [Huang et al., 2024]             500,000
                               VESSEL12 [Rudyanto et al., 2014]             12,892
                               Drive [Staal et al., 2004]                    8,028
                               SenNet [Walsh et al., 2021]                   7,436
                               Medical Decathlon [Antonelli et al., 2022]      442
Downstream Fine-tuning         EISimulation (ours)                           1,683
                               EIPhantom (ours)                              4,710
                               RANZCR [Hansen et al., 2021]                 33,664
                               CathAnimal [Kongtongvattana et al., 2023]    25,000

Unseen Data Issue

We aim to train a federated foundation model for endovascular intervention on all available types of X-ray data. In practice, each silo (hospital) holds certain data sources that may not be available at other hospitals, so the data corpora differ from one hospital to the next. The figure below illustrates this problem. A model trained locally in one silo therefore never observes some of the data types held elsewhere. We call this the unseen data issue, and it must be addressed to make federated training feasible.
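
To make the issue concrete, the short Python sketch below lists, for each silo, the data sources that exist somewhere in the federation but not locally. The hospital names and the assignment of datasets to hospitals are hypothetical and chosen only for illustration; they do not describe the actual data split used in our experiments.

```python
# Hypothetical assignment of data sources to silos (hospitals).
# Names and split are illustrative assumptions, not the paper's setup.
silo_sources = {
    "hospital_A": {"CathAction", "VESSEL12"},
    "hospital_B": {"CathAction", "Drive"},
    "hospital_C": {"SenNet", "Medical Decathlon"},
}


def unseen_sources(silo_sources):
    """Map each silo to the sources held elsewhere but not locally."""
    all_sources = set().union(*silo_sources.values())
    return {silo: all_sources - local for silo, local in silo_sources.items()}


unseen = unseen_sources(silo_sources)
for silo in sorted(unseen):
    print(silo, "has never seen:", sorted(unseen[silo]))
```

In this toy split every silo misses at least two sources, which is the situation the federated training procedure has to cope with.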

Algorithm: Federated Knowledge Distillation with Earth Mover’s Distance

We propose the algorithm below for training a foundation model in a decentralized federated learning setting while addressing the unseen data issue.
Input: Initial weight θᵢ(0) for each silo i; Maximum training round K.

for k = 0 to K - 1 do
    // The loop below runs in parallel
    for each silo i do
        𝒩(i) ← List of neighbour nodes of silo i
        
        ξᵢ(k) ← Sampling data from local silo i

        for each silo j ∈ 𝒩(i) do
            ξⱼ(k) ← Sampling data from the j-th neighbour of silo i
            
            θᵢ→ⱼ ← Train overseas expert model at j-th silo using Equation (intersilo)
            
            // Collect overseas expert weights from j-th neighbour back to i-th silo
            θ̂ᵢ→ⱼ ← θᵢ→ⱼ
            
            EMD(θᵢ, θ̂ᵢ→ⱼ) ← Compute Earth Mover's Distance using Equation (EMD)
        end for
        
        θᵢ(k+1) ← Compute 𝓛ⁱ_MD with Equation (distil_loss) and train i-th local model using Equation (cross_learn)
    end for
end for
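
The control flow of the algorithm can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not our implementation. Each model is a short list of positive numbers. `train_expert` and the blending step are hypothetical stand-ins for Equations (intersilo), (distil_loss) and (cross_learn), which are not reproduced here. `emd_1d` is the closed-form Earth Mover's Distance for two histograms on the same unit-spaced bins, used in place of the differentiable EMD of Equation (EMD). The `1 / (1 + EMD)` weighting is likewise an assumption made only for this sketch.

```python
from itertools import accumulate


def emd_1d(p, q):
    """EMD between two equal-mass histograms on the same unit-spaced bins."""
    cdf_p, cdf_q = accumulate(p), accumulate(q)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def normalise(v):
    """Scale a list of positive numbers so that it sums to 1."""
    total = sum(v)
    return [x / total for x in v]


def train_expert(theta_i, batch_j, lr=0.5):
    """Hypothetical stand-in for Equation (intersilo):
    move a copy of silo i's weights toward data sampled at silo j."""
    return [w + lr * (x - w) for w, x in zip(theta_i, batch_j)]


def federated_round(thetas, batches, neighbours, lr=0.5):
    """One synchronous round: every silo is updated from the old weights."""
    updated = {}
    for i, theta_i in thetas.items():
        targets, weights = [batches[i]], [1.0]  # local data, full weight
        for j in neighbours[i]:
            expert = train_expert(theta_i, batches[j], lr)  # trained at silo j
            dist = emd_1d(normalise(theta_i), normalise(expert))
            targets.append(expert)
            # Assumed weighting: experts far from the local model count less.
            weights.append(1.0 / (1.0 + dist))
        total = sum(weights)
        blended = [
            sum(w * t[k] for w, t in zip(weights, targets)) / total
            for k in range(len(theta_i))
        ]
        # Hypothetical stand-in for Equations (distil_loss) and (cross_learn).
        updated[i] = [w + lr * (b - w) for w, b in zip(theta_i, blended)]
    return updated


# Three fully connected silos, each with a differently shaped local batch.
neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
batches = {0: [4.0, 1.0, 1.0], 1: [1.0, 4.0, 1.0], 2: [1.0, 1.0, 4.0]}
thetas = {i: [2.0, 2.0, 2.0] for i in neighbours}
for _ in range(3):  # K = 3 training rounds
    thetas = federated_round(thetas, batches, neighbours)
print({i: [round(w, 3) for w in t] for i, t in thetas.items()})
```

The sketch keeps the three nested loops of the algorithm (rounds, silos, neighbours) and shows where the distance between the local model and each returned expert enters the local update.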
