MViST – Multi-label Video Classification for Underwater Ship Inspection

MViST (Multi-label Vision Spatiotemporal Transformer) is the implementation accompanying my master’s thesis at SINTEF Digital / NTNU on multi-label video classification for underwater ship hull inspection.

Overview

Automated inspection of ship hulls with underwater ROVs (Remotely Operated Vehicles) generates large volumes of video data. MViST addresses the challenge of automatically classifying every condition visible in a video clip, such as corrosion, biofouling, or damage, where several conditions can be present at the same time. The model builds on multi-attention transformer and Vision Transformer (ViT) architectures, adapted for spatiotemporal video understanding in challenging underwater conditions.

Key Features

  • Multi-label classification (multiple simultaneous labels per video)
  • Vision Transformer and spatiotemporal attention mechanisms
  • Designed for challenging underwater visual conditions (turbidity, low contrast, motion blur)
  • Extended results beyond those reported in the OCEANS 2023 conference paper
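
To make the multi-label setting listed above concrete, here is a minimal sketch in plain Python. It is an illustration only, not code from this repository. The label names, the `predict_labels` and `binary_cross_entropy` helpers, and the 0.5 threshold are all assumptions. The point is that each label gets its own independent sigmoid score, so a clip can receive zero, one, or several labels, unlike softmax-based single-label classification.

```python
import math

# Hypothetical label set for illustration; the actual classes may differ.
LABELS = ["corrosion", "biofouling", "damage"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_labels(logits, threshold=0.5):
    """Return every label whose independent probability reaches the threshold."""
    probs = [sigmoid(z) for z in logits]
    return [name for name, p in zip(LABELS, probs) if p >= threshold]


def binary_cross_entropy(logits, targets):
    """Mean per-label binary cross-entropy, a common multi-label training loss."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total += -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
    return total / len(logits)


# One raw score (logit) per label for a single clip; the values are made up.
clip_logits = [2.0, 0.3, -1.5]
print(predict_labels(clip_logits))  # ['corrosion', 'biofouling']
```

In PyTorch, the same loss is usually computed with `torch.nn.BCEWithLogitsLoss`, which fuses the sigmoid and the cross-entropy for numerical stability. Whether MViST uses exactly this loss and threshold is not stated here.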

Technologies

  • Python, PyTorch
  • Vision Transformers (ViT)
  • Video understanding / spatiotemporal modeling
  • Multi-label learning

Publications