Interest in multiple object tracking (MOT) has grown in recent years in both civil and military contexts, where it enhances situational awareness and supports better decision-making. State-of-the-art methods typically integrate motion and appearance features to preserve the trajectory of each object over time, updating it with new detection information when available. Visual features are fundamental for handling temporary occlusions and complex trajectories, i.e., non-linear motion associated with high object speeds or low frame rates. Currently, these features are extracted by powerful deep learning-based models trained on the re-identification (ReID) task. However, research focuses mostly on scenarios involving pedestrians or vehicles, which limits the adaptability and transferability of such methods to other use cases. In this paper, we investigate the added value of a variety of appearance features for comparing vessel appearance. We also include recent advances in foundation models, which show out-of-the-box applicability to unseen circumstances. Finally, we discuss how robust visual features could improve multiple object tracking performance in the specialized domain of maritime surveillance.