EchoTracker: Advancing Myocardial Point Tracking in Echocardiography

1Norwegian University of Science and Technology, Norway
2Clinic of Cardiology, St. Olavs Hospital, Norway
3SINTEF Digital, Norway
Point tracking in echocardiography: an illustration of tracking queried points (highlighted in red) from the first frame throughout one heart cycle.

Abstract

Tissue tracking in echocardiography is challenging due to the complex cardiac motion and the inherent nature of ultrasound acquisitions. Although optical flow methods are considered state-of-the-art (SOTA), they struggle with long-range tracking, noise, occlusions, and drift throughout the cardiac cycle. Recently, novel learning-based point tracking techniques have been introduced to tackle some of these issues. In this paper, we build upon these techniques and introduce EchoTracker, a two-fold coarse-to-fine model that facilitates the tracking of queried points on a tissue surface across ultrasound image sequences. The architecture contains a preliminary coarse initialization of the trajectories, followed by reinforcement iterations based on fine-grained appearance changes. It is efficient, lightweight, and runs on mid-range GPUs. Experiments demonstrate that the model outperforms SOTA methods, with an average position accuracy of 67% and a median trajectory error of 2.86 pixels. Furthermore, we show a relative improvement of 25% when using our model to calculate the global longitudinal strain (GLS) in a clinical test-retest dataset compared to other methods. This implies that learning-based point tracking can potentially improve performance and yield a higher diagnostic and prognostic value for clinical measurements than current techniques.

Architecture

EchoTracker architecture.

EchoTracker consists of two stages, as shown in the figure: "Initialization" and "Iterative reinforcement". The approach follows a two-fold coarse-to-fine strategy inspired by TAPIR. In the first stage, trajectories are initialized from coarse-resolution feature maps using a coarse network. In the second stage, the trajectories are iteratively refined by a fine network using fine-grained feature maps. This technique not only speeds up computation but also prevents the loss of important information due to downsampling. Although the networks in both stages estimate trajectories independently, they exploit the point locations from the first frame to maintain spatial correlation and estimate coherent trajectories. Additionally, frame flow, i.e., the difference between consecutive frames, is passed directly to the model to make it aware of global appearance changes. The model can run on ultrasound sequences of any length and with any number of query points, depending on available memory.
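The two-stage flow above can be sketched as follows. This is an illustrative outline, not the actual EchoTracker implementation: `coarse_net`, `fine_net`, the tensor shapes, and the fixed iteration count are assumptions standing in for the real modules.

```python
import numpy as np

def frame_flow(frames):
    """Per-pixel difference between consecutive frames: (T, H, W) -> (T-1, H, W)."""
    return frames[1:] - frames[:-1]

def track(frames, query_points, coarse_net, fine_net, iters=4):
    """Two-fold coarse-to-fine tracking sketch.

    frames:       (T, H, W) ultrasound image sequence
    query_points: (N, 2) (x, y) locations in the first frame
    coarse_net:   stand-in for stage 1 (trajectory initialization
                  from coarse feature maps)
    fine_net:     stand-in for stage 2 (iterative refinement with
                  fine-grained feature maps)
    """
    flow = frame_flow(frames)
    # Stage 1: initialize full trajectories at coarse resolution.
    trajectories = coarse_net(frames, flow, query_points)      # (T, N, 2)
    # Stage 2: iteratively refine, anchored to the first-frame
    # point locations to keep trajectories spatially coherent.
    for _ in range(iters):
        trajectories = fine_net(frames, flow, trajectories, query_points)
    return trajectories
```

The stand-in networks can be any callables with matching signatures, which also makes the pipeline easy to unit-test with dummy modules.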

EchoTracker Performance

Technical Results

| Method | < δ1 | < δ2 | < δ4 | < δ8 | < δ16 | < δavg | MTE ↓ | AIT (s) ↓ |
|---|---|---|---|---|---|---|---|---|
| TAPIR | 14 | 34 | 67 | 92 | 99 | 61 | 3.64 | 0.62 |
| PIPs++ | 15 | 36 | 70 | 94 | 100 | 63 | 3.28 | 0.42 |
| CoTracker | 19 | 42 | 74 | 95 | 100 | 66 | 3.02 | 1.34 |
| EchoTracker (ours) | 19 | 43 | 76 | 96 | 100 | 67 | 2.86 | 0.24 |

< δx: percentage of points tracked within x pixels of ground truth; MTE: median trajectory error (pixels); AIT: average inference time (seconds).

Performance on a test-retest dataset compared to state-of-the-art methods.

EchoTracker accurately estimates trajectories given query points on the myocardial wall of the ventricle.

Clinical Results

| Method | Reference μ | Reference σ ↓ | Reference MAD ↓ | Test-retest μ | Test-retest σ ↓ | Test-retest MAD ↓ |
|---|---|---|---|---|---|---|
| c-PWC-Net-60A | 1.85 | 2.73 | N/A | N/A | N/A | N/A |
| us2ai | 0.68 | 2.52 | 2.0 | N/A | N/A | N/A |
| EchoPWCNet | -1.4 | 1.9 | 1.8 | 0.0 | 1.9 | 1.6 |
| PIPs++ | -1.21 | 1.95 | 1.76 | 0.11 | 1.62 | 1.28 |
| CoTracker | -0.82 | 2.40 | 1.98 | -0.11 | 2.47 | 1.96 |
| EchoTracker (ours) | -0.13 | 1.78 | 1.36 | -0.13 | 1.55 | 1.21 |

Clinical results for GLS calculations compared to reference measurements and in a test-retest scenario.
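GLS is conventionally defined as the relative shortening of the myocardial contour with respect to its end-diastolic length. The sketch below computes it from tracked contour points; it illustrates the standard length-based definition under our own assumptions (ordered points along the wall, frame 0 at end-diastole, peak strain taken as the most negative value) and is not the exact clinical pipeline used in the paper.

```python
import numpy as np

def contour_length(points):
    """Polyline length of ordered myocardial contour points, shape (N, 2)."""
    diffs = np.diff(points, axis=0)
    return float(np.sum(np.linalg.norm(diffs, axis=1)))

def gls_percent(trajectories):
    """Peak global longitudinal strain (%) from tracked contour points.

    trajectories: (T, N, 2) ordered points along the wall over T frames,
    with frame 0 at end-diastole. Strain per frame is
    (L_t - L_0) / L_0 * 100; GLS is the most negative value
    (peak systolic shortening).
    """
    l0 = contour_length(trajectories[0])
    strains = [(contour_length(p) - l0) / l0 * 100.0 for p in trajectories]
    return min(strains)
```

A contour that shortens from length 2.0 to 1.8 over the cycle, for instance, yields a GLS of -10%.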

Updated Model & Follow-up Work

The 🤗 Hugging Face Demo uses an updated EchoTracker checkpoint from our follow-up work:

Taming Modern Point Tracking for Speckle Tracking Echocardiography via Impartial Motion  —  ICCV 2025 Workshop  ·  arXiv

This version achieves best performance when query points are selected from the frame at approximately 72% of the video's time dimension, corresponding to diastasis — the quiescent slow-filling phase between the E-wave and A-wave in a full ED-to-ED cardiac cycle.
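Selecting that query frame can be done with a one-line helper. The 72% fraction is the heuristic stated above; the rounding and frame-indexing conventions here are our own assumptions:

```python
def diastasis_frame(num_frames: int, fraction: float = 0.72) -> int:
    """Index of the frame at ~72% through an ED-to-ED cycle (diastasis)."""
    if num_frames < 1:
        raise ValueError("sequence must contain at least one frame")
    return min(round(fraction * (num_frames - 1)), num_frames - 1)

print(diastasis_frame(100))  # 71
```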

If you use this updated model, please cite the follow-up paper (see Citation).

Related Links

CoTracker is a transformer-based, general-purpose point tracker from Meta AI and the Visual Geometry Group, University of Oxford.

PIPs++ is the updated version of PIPs, from Stanford University and CMU.

c-PWC-Net-60A is from the paper: Motion Estimation by Deep Learning in 2D Echocardiography: Synthetic Dataset and Validation.

us2ai is from the paper: External validation of a deep learning algorithm for automated echocardiographic strain measurements.

EchoPWCNet refers to the papers Deep Learning for Improved Precision and Reproducibility of Left Ventricular Strain in Echocardiography: A Test-Retest Study and Artificial Intelligence for Automatic Measurement of Left Ventricular Strain in Echocardiography.

Citation / BibTeX

If you use this code or the EchoTracker model (MICCAI 2024), please cite:

@InProceedings{azad2024echo,
  author    = {Azad, Md Abulkalam and Chernyshov, Artem and Nyberg, John
               and Tveten, Ingrid and Lovstakken, Lasse and Dalen, H{\aa}vard
               and Grenne, Bj{\o}rnar and {\O}stvik, Andreas},
  title     = {EchoTracker: Advancing Myocardial Point Tracking in Echocardiography},
  booktitle = {Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
  year      = {2024},
  volume    = {IV},
  publisher = {Springer Nature Switzerland},
  pages     = {645--655},
}

If you use the updated model weights available in the 🤗 Hugging Face Demo, please additionally cite:

@InProceedings{Azad_2025_ICCV,
  author    = {Azad, Md Abulkalam and Nyberg, John and Dalen, H{\aa}vard
               and Grenne, Bj{\o}rnar and Lovstakken, Lasse and {\O}stvik, Andreas},
  title     = {Taming Modern Point Tracking for Speckle Tracking Echocardiography via Impartial Motion},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
  month     = {October},
  year      = {2025},
  pages     = {1115--1124},
}