A multi-task segmentation and classification network for remote ship hull inspection


Diagram of ship hull inspection using robots, including an Autonomous Underwater Vehicle (AUV), a Remotely Operated Vehicle (ROV), an Unmanned Aerial Vehicle (UAV) and a climbing robot.

Abstract
In contrast to manual close-up ship hull inspection, Remote Inspection Technology (RIT) can improve inspection performance while reducing costs. Nevertheless, the effectiveness of RIT still depends on the experience and expertise of the inspectors. Consequently, research has turned to automating the processing of RIT data. However, hull inspection poses several challenges, including suboptimal hull imaging conditions, an imbalanced distribution of defect categories, and large variations in defect size. To address these challenges, we introduce a multi-task hull inspection network (MTHI-Net) that applies multi-task learning to improve hull inspection accuracy. The network addresses two problems associated with RIT: image-level defect classification and pixel-level defect segmentation. Specifically, MTHI-Net combines spatial and channel self-attention mechanisms, a residual refinement module, and a feature fusion module to strengthen the feature representation used for defect classification and to reduce mis-segmentation. In addition, we develop a lightweight variant, MTHI-Net-Lite. Experimental results show that both networks outperform the baselines on hull inspection in real underwater and in-dock RIT scenarios.
The multi-task segmentation and classification hull inspection network
This section introduces the proposed MTHI-Net, a multi-task segmentation and classification network designed for ship hull inspection. MTHI-Net is built on a stacked encoder–decoder network. Two self-attention modules, the DAM and the RAM, enhance feature capture, and a residual refinement module (RRM) refines the segmentation outputs. Motivated by the parameter-sharing approach of multi-task learning, a feature fusion module (FFM) fuses feature maps for defect classification, and the segmentation mask narrows the field of view (FOV) of the classifier to the defect regions. MTHI-Net is trained end-to-end by combining the single-task losses of the segmentation network and the classifier, so that defect classification and segmentation are performed simultaneously. In addition, a lightweight variant, MTHI-Net-Lite, is introduced.
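The end-to-end objective combines a segmentation loss and a classification loss. This section does not state which losses or weights are used, so the following Python sketch assumes pixel-wise binary cross-entropy on a binary defect mask, categorical cross-entropy on the image-level class logits, and hypothetical weights lambda_seg and lambda_cls. All function names are illustrative, not taken from the paper.

```python
import math
from typing import Sequence


def binary_cross_entropy(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-7) -> float:
    """Mean pixel-wise BCE between predicted defect probabilities and a flattened binary mask."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Categorical cross-entropy for one image, computed from raw class logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multi_task_loss(seg_probs, seg_mask, cls_logits, cls_label,
                    lambda_seg: float = 1.0, lambda_cls: float = 1.0) -> float:
    """Weighted sum of the segmentation and classification losses (weights are hypothetical)."""
    l_seg = binary_cross_entropy(seg_probs, seg_mask)
    l_cls = cross_entropy(cls_logits, cls_label)
    return lambda_seg * l_seg + lambda_cls * l_cls
```

For example, multi_task_loss([0.5, 0.5], [1, 0], [0.0, 0.0, 0.0], 1) evaluates to ln 2 + ln 3 (about 1.79), the sum of the two single-task losses under unit weights. Because both terms are backpropagated through the shared encoder, gradients from each task shape the common features.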

Flowchart of remote ship hull inspection using the proposed multi-task classification and segmentation network, MTHI-Net. For each hull image collected by the remote inspection robot, MTHI-Net, running on a cloud server, predicts the general defect type (classification) and a segmentation mask for defect localisation and quantification. The inspector then draws conclusions about the hull condition.

MTHI-Net comprises three parts: (i) a stacked encoder–decoder network with self-attention modules for multi-resolution feature extraction and defect segmentation; (ii) a residual refinement module (RRM) that refines the segmentation output; and (iii) a feature fusion network followed by a classifier for defect classification. The network takes RGB images of ship hulls as input and performs defect segmentation and classification simultaneously.
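A residual refinement module is commonly built around residual learning: instead of predicting the final map directly, a refinement branch predicts a correction that is added to the coarse segmentation. The Python sketch below illustrates only that structure and is not the published RRM. The learned convolutional branch is replaced by a hypothetical toy_residual_branch, a fixed 3x3 mean filter that nudges each pixel towards its neighbourhood average; the function names and the alpha parameter are assumptions.

```python
from typing import Callable, List

Grid = List[List[float]]  # a single-channel 2D map of segmentation scores


def mean_filter_3x3(x: Grid) -> Grid:
    """3x3 mean filter; border pixels are averaged over their in-bounds neighbours only."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [x[a][b]
                    for a in range(max(0, i - 1), min(h, i + 2))
                    for b in range(max(0, j - 1), min(w, j + 2))]
            out[i][j] = sum(vals) / len(vals)
    return out


def toy_residual_branch(coarse: Grid, alpha: float = 0.5) -> Grid:
    """Hypothetical stand-in for a learned branch: predicts alpha * (local mean - pixel)."""
    smooth = mean_filter_3x3(coarse)
    return [[alpha * (s - c) for s, c in zip(srow, crow)]
            for srow, crow in zip(smooth, coarse)]


def residual_refine(coarse: Grid, residual_branch: Callable[[Grid], Grid]) -> Grid:
    """Refined map = coarse map + predicted residual, element by element."""
    residual = residual_branch(coarse)
    return [[c + r for c, r in zip(crow, rrow)] for crow, rrow in zip(coarse, residual)]
```

On a 3x3 coarse map whose only non-zero value is an isolated 1.0 at the centre, the refined centre drops to 5/9 (about 0.56) while the corners rise to 0.125, showing how an additive residual can damp an isolated response. If the branch predicts zeros, the coarse map passes through unchanged, which is the usual motivation for the residual form.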

Network architecture of the proposed multi-task segmentation and classification network, MTHI-Net. Numbers indicate feature-map shapes or numbers of neurons.

Experimental results
Benefiting from multi-task learning, MTHI-Net and MTHI-Net-Lite can use defect categories as auxiliary information when performing defect segmentation. This exchange of features between tasks enhances the capacity of the network to extract meaningful and separable features, improving hull defect segmentation compared with single-task methods. The spatial and inter-channel attention mechanisms further guide the network towards defective areas with similar characteristics. As a result, MTHI-Net extracts defect features more effectively and produces better segmentation results.
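The spatial and inter-channel attention mechanisms are not specified in detail in this section. As an assumed illustration of the channel case, the following Python sketch implements a generic channel self-attention: each channel's flattened feature map is compared with every other channel by a dot product, the affinities are softmax-normalised, the channels are re-mixed with these weights, and the result is added back to the input scaled by gamma (learned in a trained network, fixed here). The function names and the gamma default are hypothetical, and the actual attention modules in MTHI-Net may differ.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = channels, columns = flattened spatial positions


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of affinities."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def channel_self_attention(feats: Matrix, gamma: float = 1.0) -> Matrix:
    """Reweight each channel by its softmax-normalised affinity to every channel."""
    c = len(feats)
    n = len(feats[0])
    # Channel affinity: dot product between the flattened maps of channels i and j.
    energy = [[sum(a * b for a, b in zip(feats[i], feats[j])) for j in range(c)]
              for i in range(c)]
    attn = [softmax(row) for row in energy]  # each row sums to 1
    out = []
    for i in range(c):
        # Mix all channels with the attention weights of channel i ...
        mixed = [sum(attn[i][j] * feats[j][k] for j in range(c)) for k in range(n)]
        # ... then add the original channel back, scaled by gamma (residual connection).
        out.append([gamma * m + x for m, x in zip(mixed, feats[i])])
    return out
```

A spatial (position) variant follows the same pattern on the transposed matrix, computing affinities between spatial positions instead of between channels. With gamma = 0 the module returns its input unchanged, so the residual scale controls how strongly attention alters the features.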

Defect segmentation results on the MaVeCoDD-HiRes data set. Columns show the results of (a) U-Net, (b) X-Net, (c) Trans U-Net, (d) Swin U-Net, (e) MTHI-Net and (f) MTHI-Net-Lite, with (g) the ground truth. The defect type is shown on the left of each row.

Defect segmentation results on the MaVeCoDD-LoRes data set. Columns show the results of (a) U-Net, (b) X-Net, (c) Trans U-Net, (d) Swin U-Net, (e) MTHI-Net and (f) MTHI-Net-Lite, with (g) the ground truth. The defect type is shown on the left of each row.

Unlike existing methods that directly stack convolutional layers, the proposed model uses spatial and inter-channel attention modules to extract features from the different stacked encoders of the segmentation branch and then filters these features with the predicted segmentation mask. This mechanism lets the multi-task networks prioritise more relevant and informative features, improving defect classification even in feature-limited underwater environments.
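The mask-based filtering described above can be sketched as masked global average pooling followed by concatenation across encoder levels. This is an assumed realisation of the fusion step, not the published FFM: each channel is averaged only over pixels inside the segmentation mask, and the per-level descriptors are concatenated into one vector for the classifier. For simplicity, the masks are assumed to be already resized to each level's resolution; the helper names are hypothetical.

```python
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # [channel][row][col]
Mask = List[List[float]]             # [row][col], values in [0, 1]


def masked_average_pool(feat: FeatureMap, mask: Mask, eps: float = 1e-6) -> List[float]:
    """Average each channel over the masked (defect) region only."""
    area = sum(sum(row) for row in mask)
    pooled = []
    for ch in feat:
        # Background pixels get weight 0, so they do not dilute the descriptor.
        weighted = sum(v * m for frow, mrow in zip(ch, mask) for v, m in zip(frow, mrow))
        pooled.append(weighted / (area + eps))  # eps avoids division by zero for empty masks
    return pooled


def fuse_levels(feats: Sequence[FeatureMap], masks: Sequence[Mask]) -> List[float]:
    """Concatenate masked-pooled descriptors from several encoder levels into one vector."""
    fused: List[float] = []
    for feat, mask in zip(feats, masks):
        fused.extend(masked_average_pool(feat, mask))
    return fused
```

Pooling only over the masked region matches the stated intent of narrowing the classifier's field of view to defect regions. An empty mask yields zeros rather than a division error because of the small eps term; a learned fusion (for example, a weighted or convolutional combination) could replace plain concatenation.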

Defect segmentation results on the LIACi data set. Columns show the results of (a) U-Net, (b) X-Net, (c) Trans U-Net, (d) Swin U-Net, (e) MTHI-Net and (f) MTHI-Net-Lite, with (g) the ground truth. The defect type is shown on the left of each row.

Conclusion
Automated image processing methods have the potential to reduce reliance on manual inspection and improve detection efficiency and accuracy. In this study, we proposed a novel multi-task ship hull inspection network, MTHI-Net, for simultaneous hull defect segmentation and classification. MTHI-Net is built on a stacked encoder–decoder network with self-attention mechanisms that enhance feature extraction. To address mis-segmentation, an RRM was introduced. For defect classification, the FFM fuses features from different layers of the stacked network. The network is trained end-to-end with a multi-task loss function. A lightweight version, MTHI-Net-Lite, is suited to deployment on edge devices. Extensive experiments on the reconstructed MaVeCoDD and LIACi ship hull inspection datasets demonstrate the benefits of MTHI-Net, which outperformed the other approaches in both tasks, while MTHI-Net-Lite delivered competitive inspection performance.