EchoTracker: Advancing Myocardial Point Tracking in Echocardiography

1Norwegian University of Science and Technology, Norway
2Clinic of Cardiology, St. Olavs Hospital, Norway
3SINTEF Digital, Norway
Point tracking in echocardiography: an illustration of tracking queried points (highlighted in red) from the first frame throughout one heart cycle.

Abstract

Tissue tracking in echocardiography is challenging due to the complex cardiac motion and the inherent nature of ultrasound acquisitions. Although optical flow methods are considered state-of-the-art (SOTA), they struggle with long-range tracking, noise, occlusions, and drift throughout the cardiac cycle. Recently, novel learning-based point tracking techniques have been introduced to tackle some of these issues. In this paper, we build upon these techniques and introduce EchoTracker, a two-fold coarse-to-fine model that facilitates the tracking of queried points on a tissue surface across ultrasound image sequences. The architecture contains a preliminary coarse initialization of the trajectories, followed by reinforcement iterations based on fine-grained appearance changes. It is efficient, lightweight, and runs on mid-range GPUs. Experiments demonstrate that the model outperforms SOTA methods, with an average position accuracy of 67% and a median trajectory error of 2.86 pixels. Furthermore, we show a relative improvement of 25% when using our model to calculate the global longitudinal strain (GLS) in a clinical test-retest dataset compared to other methods. This implies that learning-based point tracking can potentially improve performance and yield a higher diagnostic and prognostic value for clinical measurements than current techniques.

Architecture

EchoTracker architecture.

EchoTracker consists of two stages, as shown in the figure: "Initialization" and "Iterative reinforcement". The approach follows a two-fold coarse-to-fine strategy inspired by TAPIR. In the first stage, trajectories are initialized from coarse-resolution feature maps using a coarse network. In the second stage, the trajectories are iteratively refined by a fine network using fine-grained feature maps. This technique not only speeds up computation but also prevents the loss of important information due to downsampling. Although the networks in both stages estimate trajectories independently, they exploit the point locations from the first frame to maintain spatial correlation and estimate coherent trajectories. Additionally, frame flow, i.e., the difference between consecutive frames, is passed directly to the model to make it aware of global appearance changes. The model can run on ultrasound sequences of any length and with any number of query points, depending on available memory.
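The two-stage flow above can be sketched as follows. This is an illustrative outline, not the actual EchoTracker implementation: `coarse_net`, `fine_net`, the tensor shapes, and the fixed iteration count are assumptions standing in for the real modules.

```python
import numpy as np

def frame_flow(frames):
    """Per-pixel difference between consecutive frames: (T, H, W) -> (T-1, H, W)."""
    return frames[1:] - frames[:-1]

def track(frames, query_points, coarse_net, fine_net, iters=4):
    """Two-fold coarse-to-fine tracking sketch.

    frames:       (T, H, W) ultrasound image sequence
    query_points: (N, 2) (x, y) locations in the first frame
    coarse_net:   stand-in for stage 1 (trajectory initialization
                  from coarse feature maps)
    fine_net:     stand-in for stage 2 (iterative refinement with
                  fine-grained feature maps)
    """
    flow = frame_flow(frames)
    # Stage 1: initialize full trajectories at coarse resolution.
    trajectories = coarse_net(frames, flow, query_points)      # (T, N, 2)
    # Stage 2: iteratively refine, anchored to the first-frame
    # point locations to keep trajectories spatially coherent.
    for _ in range(iters):
        trajectories = fine_net(frames, flow, trajectories, query_points)
    return trajectories
```

The stand-in networks can be any callables with matching signatures, which also makes the pipeline easy to unit-test with dummy modules.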

EchoTracker Performance

Technical Results

| Method | < δ1 | < δ2 | < δ4 | < δ8 | < δ16 | < δavg | MTE ↓ | AIT (s) ↓ |
|---|---|---|---|---|---|---|---|---|
| TAPIR | 14 | 34 | 67 | 92 | 99 | 61 | 3.64 | 0.62 |
| PIPs++ | 15 | 36 | 70 | 94 | 100 | 63 | 3.28 | 0.42 |
| CoTracker | 19 | 42 | 74 | 95 | 100 | 66 | 3.02 | 1.34 |
| EchoTracker (ours) | 19 | 43 | 76 | 96 | 100 | 67 | 2.86 | 0.24 |

< δx: percentage of points tracked within x pixels of ground truth; MTE: median trajectory error (pixels); AIT: average inference time (seconds).

Performance on a test-retest dataset compared to state-of-the-art methods.

EchoTracker accurately estimates trajectories given query points on the myocardial wall of the ventricle.

Clinical Results

| Method | Reference μ | Reference σ ↓ | Reference MAD ↓ | Test-retest μ | Test-retest σ ↓ | Test-retest MAD ↓ |
|---|---|---|---|---|---|---|
| c-PWC-Net-60A | 1.85 | 2.73 | N/A | N/A | N/A | N/A |
| us2ai | 0.68 | 2.52 | 2.0 | N/A | N/A | N/A |
| EchoPWCNet | -1.4 | 1.9 | 1.8 | 0.0 | 1.9 | 1.6 |
| PIPs++ | -1.21 | 1.95 | 1.76 | 0.11 | 1.62 | 1.28 |
| CoTracker | -0.82 | 2.40 | 1.98 | -0.11 | 2.47 | 1.96 |
| EchoTracker (ours) | -0.13 | 1.78 | 1.36 | -0.13 | 1.55 | 1.21 |

Clinical results for GLS calculations compared to reference measurements and in a test-retest scenario.
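GLS is conventionally defined as the relative shortening of the myocardial contour with respect to its end-diastolic length. The sketch below computes it from tracked contour points; it illustrates the standard length-based definition under our own assumptions (ordered points along the wall, frame 0 at end-diastole, peak strain taken as the most negative value) and is not the exact clinical pipeline used in the paper.

```python
import numpy as np

def contour_length(points):
    """Polyline length of ordered myocardial contour points, shape (N, 2)."""
    diffs = np.diff(points, axis=0)
    return float(np.sum(np.linalg.norm(diffs, axis=1)))

def gls_percent(trajectories):
    """Peak global longitudinal strain (%) from tracked contour points.

    trajectories: (T, N, 2) ordered points along the wall over T frames,
    with frame 0 at end-diastole. Strain per frame is
    (L_t - L_0) / L_0 * 100; GLS is the most negative value
    (peak systolic shortening).
    """
    l0 = contour_length(trajectories[0])
    strains = [(contour_length(p) - l0) / l0 * 100.0 for p in trajectories]
    return min(strains)
```

A contour that shortens from length 2.0 to 1.8 over the cycle, for instance, yields a GLS of -10%.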

Updated Model & Follow-up Work

The 🤗 Hugging Face Demo uses an updated EchoTracker checkpoint from our follow-up work:

Taming Modern Point Tracking for Speckle Tracking Echocardiography via Impartial Motion  —  ICCV 2025 Workshop  ·  arXiv

This version achieves best performance when query points are selected from the frame at approximately 72% of the video's time dimension, corresponding to diastasis — the quiescent slow-filling phase between the E-wave and A-wave in a full ED-to-ED cardiac cycle.
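Selecting that query frame can be done with a one-line helper. The 72% fraction is the heuristic stated above; the rounding and frame-indexing conventions here are our own assumptions:

```python
def diastasis_frame(num_frames: int, fraction: float = 0.72) -> int:
    """Index of the frame at ~72% through an ED-to-ED cycle (diastasis)."""
    if num_frames < 1:
        raise ValueError("sequence must contain at least one frame")
    return min(round(fraction * (num_frames - 1)), num_frames - 1)

print(diastasis_frame(100))  # 71
```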

If you use this updated model, please cite the follow-up paper (see Citation).

Related Links

CoTracker is a transformer-based, general-purpose point tracker from Meta AI and the Visual Geometry Group, University of Oxford.

PIPs++ is the updated version of PIPs, from Stanford University and CMU.

c-PWC-Net-60A is from the paper: Motion Estimation by Deep Learning in 2D Echocardiography: Synthetic Dataset and Validation.

us2ai is from the paper: External validation of a deep learning algorithm for automated echocardiographic strain measurements.

EchoPWCNet refers to the papers Deep Learning for Improved Precision and Reproducibility of Left Ventricular Strain in Echocardiography: A Test-Retest Study and Artificial Intelligence for Automatic Measurement of Left Ventricular Strain in Echocardiography.

Citation / BibTeX

If you use this code or the EchoTracker model (MICCAI 2024), please cite:

@InProceedings{azad2024echo,
  author    = {Azad, Md Abulkalam and Chernyshov, Artem and Nyberg, John
               and Tveten, Ingrid and Lovstakken, Lasse and Dalen, H{\aa}vard
               and Grenne, Bj{\o}rnar and {\O}stvik, Andreas},
  title     = {EchoTracker: Advancing Myocardial Point Tracking in Echocardiography},
  booktitle = {Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
  year      = {2024},
  volume    = {IV},
  publisher = {Springer Nature Switzerland},
  pages     = {645--655},
}

If you use the updated model weights available in the 🤗 Hugging Face Demo, please additionally cite:

@InProceedings{Azad_2025_ICCV,
  author    = {Azad, Md Abulkalam and Nyberg, John and Dalen, H{\aa}vard
               and Grenne, Bj{\o}rnar and Lovstakken, Lasse and {\O}stvik, Andreas},
  title     = {Taming Modern Point Tracking for Speckle Tracking Echocardiography via Impartial Motion},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
  month     = {October},
  year      = {2025},
  pages     = {1115--1124},
}