MViST – Multi-label Video Classification for Underwater Ship Inspection

MViST (Multi-label Vision Spatiotemporal Transformer) is the implementation accompanying my master’s thesis at SINTEF Digital / NTNU, which focuses on multi-label video classification for underwater ship hull inspection.

Overview

Automated inspection of ship hulls with underwater ROVs (Remotely Operated Vehicles) generates large volumes of video data. MViST addresses the challenge of automatically classifying multiple conditions, such as corrosion, biofouling, or damage, that appear simultaneously in a video clip. The model adapts multi-attention transformer and Vision Transformer (ViT) architectures to spatiotemporal video understanding in challenging underwater conditions.
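To make the multi-label setting concrete, here is a minimal sketch of how per-clip scores can be turned into several simultaneous labels. It is not the MViST code: the label list, the function names, and the 0.5 threshold are assumptions chosen for illustration. The key idea is that each label gets its own independent sigmoid, rather than one softmax shared across all labels, so any number of conditions can be active for the same clip.

```python
import math

# Hypothetical label set, borrowed from the examples named in the overview.
LABELS = ["corrosion", "biofouling", "damage"]


def sigmoid(z: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_labels(logits, threshold=0.5):
    """Return every label whose independent probability reaches the threshold.

    Unlike single-label classification, zero, one, or several labels
    may be returned for the same clip.
    """
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    return [name for name, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, in a numerically stable form.

    Each label is treated as its own yes/no problem; targets are 0 or 1.
    """
    total = 0.0
    for z, t in zip(logits, targets):
        # Equivalent to -(t*log(p) + (1-t)*log(1-p)) with p = sigmoid(z).
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


if __name__ == "__main__":
    clip_logits = [2.0, -1.0, 0.3]      # made-up scores for one clip
    print(predict_labels(clip_logits))  # labels predicted as present
    print(bce_with_logits(clip_logits, [1, 0, 1]))
```

In a PyTorch training loop the same loss is typically provided by `torch.nn.BCEWithLogitsLoss`; the pure-Python version above only serves to show what is being computed.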

Key Features

Multi-label classification (multiple simultaneous labels per video)
Vision Transformer and spatiotemporal attention mechanisms
Designed for challenging underwater visual conditions (turbidity, low contrast, motion blur)
Results extended beyond those reported in the OCEANS 2023 conference paper
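As a rough illustration of why video makes ViT-style attention expensive, the sketch below counts the patch tokens produced by one clip and the number of token pairs that joint space-time self-attention would compare. The clip shape used in the example (8 frames of 224×224 pixels with 16×16 patches) is a common ViT setting picked as an assumption; it is not the MViST configuration, and the function names are hypothetical.

```python
def clip_token_count(num_frames: int, height: int, width: int, patch_size: int) -> int:
    """Number of patch tokens when every frame is cut into non-overlapping patches."""
    if height % patch_size or width % patch_size:
        raise ValueError("frame size must be divisible by patch size")
    patches_per_frame = (height // patch_size) * (width // patch_size)
    return num_frames * patches_per_frame


def joint_attention_pairs(num_tokens: int) -> int:
    """Token pairs scored by full self-attention over all tokens of the clip."""
    return num_tokens * num_tokens


if __name__ == "__main__":
    tokens = clip_token_count(num_frames=8, height=224, width=224, patch_size=16)
    print(tokens)                         # 8 * 14 * 14 = 1568
    print(joint_attention_pairs(tokens))  # 1568 ** 2 = 2458624
```

Because the pair count grows with the square of the token count, doubling the number of frames quadruples the attention cost, which is one reason spatiotemporal transformer designs often restructure attention across space and time.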

Technologies

Python, PyTorch
Vision Transformers (ViT)
Video understanding / spatiotemporal modeling
Multi-label learning

Publications

Conference paper: OCEANS 2023 – Limerick
Master’s Thesis: NTNU Open – 2023

Md Abulkalam Azad