INSTRUMENTATION, Vol. 11, No. 4, December 2024

Copyright: © 2024 by the authors. This article is licensed under a Creative Commons Attribution 4.0 International (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Introduction
The oil-surface thermometer primarily consists of a thermal bulb, capillary tube, elastic element, and internal structure. When the ambient temperature changes, the temperature-sensitive medium inside the thermal bulb expands or contracts accordingly. The capillary tube transmits this deformation to the elastic element, driving the internal structure and ultimately moving the pointer. Owing to these mechanical characteristics, oil-surface thermometers are widely used in substations and similar environments and are typically read manually, which results in low efficiency and poor reading accuracy. Moreover, their working environment involves high temperatures and high pressures, making manual reading unsuitable. Therefore, achieving automatic identification of oil-surface thermometer readings is crucial.
The process of recognizing oil-surface thermometers consists of two primary steps: detecting the dial and recognizing the reading. Common methods for dial detection include the Hough transform and feature matching. However, the Hough transform method[1] exhibits low accuracy when dealing with dial images that have complex backgrounds, while feature matching[2] requires pre-determined features. In recent years, numerous scholars have started applying deep learning networks to the recognition of pointer-type instruments. Zhang and Salomon et al. proposed dial recognition methods based on the YOLOv7 and YOLOv3 convolutional neural networks[3,4], respectively. However, it is difficult to accurately recognize pointers and scales using rectangular bounding boxes.
The commonly used methods for reading oil-surface thermometers are the proportional method and the angle method. Both techniques necessitate accurately determining the positions of the pointers and scales. Edge detection, Hough transform, and template matching are often employed to locate pointers and scales. However, the preprocessing required by these methods is complex, and their robustness is poor. Moreover, they are not suitable for oil-surface thermometers with uneven scales.
Semantic segmentation and feature fusion techniques have been widely applied in fields such as medicine and autonomous driving, and many researchers have begun applying them to the recognition of pointers and scales. In [5] and [6], scholars used feature fusion methods for recognition. However, general feature fusion approaches may not effectively capture global information and may exhibit deficiencies in detail preservation and fusion efficiency. Hou et al. used a U-Net-based semantic segmentation approach to extract pointers and major scales[7], but limitations remain in terms of accuracy and efficiency. Compared with U-Net, the attention U2-Net introduces a multi-scale feature fusion strategy and an attention mechanism, improving segmentation accuracy and efficiency.
Method and Application
Fig.1 illustrates the methodology and procedural steps of the proposed approach for recognizing pointer meter readings. The method leverages YOLOv5 and the attention U2-Net for gauge reading recognition; the implementation details are outlined in the figure below.

Fig.1. The proposed method's principle for recognizing pointer meter readings
In this method, the YOLOv5 network is used for meter localization and classification. When dealing with multiple types of meters in a scene, the YOLOv5 network locates and classifies different oil-surface thermometers. Subsequently, the corresponding Attention U2-Net model weights for each meter are applied to perform information segmentation. Finally, the enhanced proportional method is used to process the information and calculate the recognition results.
YOLOv5 dial detection
Fig.2 provides an overview of the YOLOv5 network architecture, which comprises four main components: input, backbone (CSPDarknet53), neck (FPN and PAN), and head.

Fig.2. The model architecture of YOLOv5.
The input image first undergoes initial processing in the input layer before being passed to the backbone for feature extraction. The backbone produces feature maps of different dimensions, which are then integrated by the feature fusion network (neck) to produce three distinct feature maps: P3, P4, and P5 (with sizes 80×80, 40×40, and 20×20, respectively). These maps are designed to detect objects of different sizes within the image. Each feature map is then passed to the prediction head, where predefined anchor boxes are used for bounding box regression and individual pixels are evaluated for confidence. This process yields a multi-dimensional array (BBoxes) containing the object class, confidence score, bounding-box coordinates, bounding-box size, and additional information. Irrelevant entries in the array are filtered out using specified thresholds (conf and obj), and non-maximum suppression (NMS) is employed to produce the final detection results. The YOLOv5 model was trained for 300 epochs with a batch size of 4 and an initial learning rate of 0.01.
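The confidence filtering and NMS post-processing step described above can be sketched as follows. This is a minimal NumPy implementation for illustration only; the threshold values mirror common YOLOv5 defaults and are assumptions here, not values taken from this paper.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.45, conf_thresh=0.25):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes.

    Low-confidence boxes are dropped first; then the highest-scoring
    box repeatedly suppresses any remaining box that overlaps it by
    more than iou_thresh.
    """
    keep_mask = scores >= conf_thresh              # confidence filtering
    boxes, scores = boxes[keep_mask], scores[keep_mask]
    order = scores.argsort()[::-1]                 # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the best box against the remaining candidates
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter)
        order = order[1:][iou <= iou_thresh]       # suppress heavy overlaps
    return boxes[keep], scores[keep]
```

In practice the same step is available as `torchvision.ops.nms` or `cv2.dnn.NMSBoxes`; the sketch above only makes the thresholding logic explicit.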
Attention-U2-Net dial element segmentation
As shown in Fig.3, the Attention U2-Net is an encoder-decoder-based network. Unlike the traditional U2-Net[8], the Attention U2-Net is primarily composed of a feature extraction structure, attention modules, and edge detection branches arranged in a U-shaped architecture built with stacked RSUs. The RSUs blend feature maps of different scales and receptive fields through a U-shaped structure, enabling the network to capture more global information at various scales. Each RSU corresponds to a specific depth. During the down-sampling process, for each feature map size, a convolution with a dilation rate of 1 and a kernel size of 3 × 3 is used to expand the receptive field, thereby enhancing the network's ability to extract contextual and neighborhood feature information.

Fig.3. The model architecture of attention U2-Net
At RSU4F, the cost of attention selection becomes too high, so attention selection is no longer performed starting from RSU4F.
From the successive down-sampling process, multi-scale features are extracted from the feature maps outputted by residual units at different depths. These features are then decoded into high-resolution feature maps through a series of operations including progressive up-sampling, concatenation, attention mechanisms, and weighting. This approach helps mitigate the loss of details that typically occurs when directly up-sampling smaller-scale feature maps.
The feature maps F1, F2, F3, F4, and F5 are gathered from the outputs of De_1, De_2, De_3, Dn_4, and Dn_5, respectively, and then processed through a 3 × 3 convolutional layer. These feature maps are then scaled to the size of the input image using bilinear interpolation, resulting in F_1, F_2, F_3, F_4, and F_5. Subsequently, these five feature maps are concatenated.
In the final stage, the processed feature maps are passed through a 1 × 1 convolutional layer and activated using the Softmax function to generate the final predicted segmentation results.
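The decoder-side fusion described above (rescaling the side outputs F1–F5 to the input size with bilinear interpolation, then concatenating them along the channel axis) can be sketched in NumPy. The function names are illustrative, not from the paper, and the 3 × 3 and 1 × 1 convolutions are omitted for brevity.

```python
import numpy as np

def bilinear_upsample(fmap, out_h, out_w):
    """Resize a (C, H, W) feature map to (C, out_h, out_w) with
    bilinear interpolation."""
    c, h, w = fmap.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]   # vertical interpolation weights
    wx = (xs - x0)[None, None, :]   # horizontal interpolation weights
    top = fmap[:, y0][:, :, x0] * (1 - wx) + fmap[:, y0][:, :, x1] * wx
    bot = fmap[:, y1][:, :, x0] * (1 - wx) + fmap[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bot * wy

def fuse_side_outputs(side_maps, out_hw):
    """Scale each side output to the input size and concatenate along
    the channel axis, as in the decoder fusion described above."""
    scaled = [bilinear_upsample(f, *out_hw) for f in side_maps]
    return np.concatenate(scaled, axis=0)
```

In a PyTorch implementation the same effect is obtained with `F.interpolate(..., mode="bilinear")` followed by `torch.cat`; the sketch only makes the scaling and concatenation explicit.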
Tilt correction
Due to the on-site shooting conditions, the meter image may appear tilted, which can reduce the accuracy of the recognized readings. Therefore, preprocessing and correction of the meter are crucial. In this method, correction is primarily achieved through the Hough transform and mapping transformation. Fig.4 outlines the principle of this method. First, the elliptical contour is identified through edge detection and Hough transform, and the coordinates of the four endpoints (A, B, C, D) corresponding to the major and minor axes of the ellipse are calculated. Then, by defining four target points (A', B', C', D') of a perfect circle, a mapping transformation matrix is obtained based on the coordinates of the ellipse and the target points. This mapping transformation matrix is then applied to correct the elliptical contour in the image, aligning it to the ideal circular contour.

Fig.4. Mapping transformation method
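The ellipse-to-circle correction can be sketched as solving for the 3 × 3 perspective (mapping) matrix from the four endpoint correspondences A→A', B→B', C→C', D→D'. The code below is a pure-NumPy stand-in for OpenCV's `getPerspectiveTransform`, with illustrative names; in practice the resulting matrix would be applied to the whole image with `cv2.warpPerspective`.

```python
import numpy as np

def mapping_matrix(src_pts, dst_pts):
    """Solve the 3x3 perspective transform mapping the four ellipse
    endpoints onto the four target circle points.

    Each correspondence (x, y) -> (u, v) contributes two rows of an
    8x8 linear system in the homography entries h11..h32 (h33 = 1).
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, pt):
    """Apply the homography to a single (x, y) point."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w
```

For example, mapping the endpoints of an axis-aligned ellipse with semi-axes 2 and 1 onto the unit circle recovers a simple horizontal scaling.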
Reading recognition
Once the image is corrected, the pointer and scales within the dial can be semantically segmented to determine their coordinates. As shown in Figure 5, dial elements can be effectively identified using Attention U2-Net.

Fig.5. Splitting of the dial elements
The traditional reading method calculates the reading from the proportion of the pointer's position over the entire measuring range. However, most oil-surface thermometers have an unevenly distributed scale, which introduces significant errors into traditional methods and makes them unsuitable for accurate readings. In this identification method, an enhanced proportional method is employed to determine the readings. It identifies the two scales adjacent to the pointer and calculates the reading from the pointer's proportional position between these adjacent scales. The reading formula is as follows:
$$A_i = \frac{i + \dfrac{L - L_i}{L_{i+1} - L_i}}{n - 1} \times \mathrm{Range}$$
where $A_i$ is the final reading, $i$ is the index of the scale immediately preceding the pointer, $n$ is the number of recognized scales, $L$ is the position of the pointer, and $\mathrm{Range}$ is the range of the meter.
Figure 7 shows an example of the instrument information after linearization, where L represents the pointer position, and Li and Li+1 denote the scales immediately preceding and following the pointer. Combining this with the above formula yields the reading indicated by the pointer.
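A direct implementation of the enhanced proportional formula is shown below. It assumes the scale indices run from 0 at the minimum of the range; the function and argument names are illustrative, not from the paper.

```python
def meter_reading(i, L, L_i, L_next, n, full_range):
    """Enhanced proportional reading.

    i          : index of the scale immediately preceding the pointer
    L          : pointer position along the linearized arc
    L_i, L_next: positions of the two scales adjacent to the pointer
    n          : number of recognized scales
    full_range : range of the meter
    """
    return (i + (L - L_i) / (L_next - L_i)) / (n - 1) * full_range
```

For instance, with 7 scales over a 0–120 range and a pointer halfway between the third and fourth scales (i = 2), the reading is (2 + 0.5) / 6 × 120 = 50. Because only the local interval [L_i, L_next] is interpolated, an uneven spacing of the scales does not distort the result, which is the point of the enhancement.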

Fig.6. The principle of confirming the pointer scale value

Fig.7. The diagram of the enhanced algorithm.
For any fixed value of the pointer scale information, each pointer scale can be approximated as a trapezoid after linearization, as shown in Figure 6.
Figure 6 shows the method for confirming the scale and pointer positions. In the figure, a and b represent the midpoints of the bases, while a' and b' represent the midpoints of the median lines. Due to the offset of the center, the different situations shown in the figure may occur. To avoid the errors introduced by conventional methods, the vertex position is confirmed using the median-line method.
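Under the assumption that each linearized tick is represented by four corner points of its trapezoid, the median-line midpoint can be computed as below. This is a sketch only; the corner ordering and function name are assumptions, not taken from the paper.

```python
def median_line_midpoint(corners):
    """Locate a tick by the median-line method: for a trapezoid with
    corners ordered (top-left, top-right, bottom-right, bottom-left),
    the median line joins the midpoints of the two legs, and its
    midpoint is taken as the tick position."""
    tl, tr, br, bl = corners
    left_mid  = ((tl[0] + bl[0]) / 2, (tl[1] + bl[1]) / 2)
    right_mid = ((tr[0] + br[0]) / 2, (tr[1] + br[1]) / 2)
    return ((left_mid[0] + right_mid[0]) / 2,
            (left_mid[1] + right_mid[1]) / 2)
```

Unlike taking a single base vertex, the median-line midpoint is insensitive to the center offset illustrated in Figure 6, since the skew of either base cancels out in the averaging.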
Experimental results and analysis
To verify the feasibility and adaptability of this method, several experiments were conducted. The algorithms were run under Windows 10 with PyTorch 1.9.0 and OpenCV. The hardware used includes an AMD FX(tm)-8300 eight-core CPU, an NVIDIA GeForce GTX 1050 Ti GPU, and 8 GB of RAM. The dataset consists of 350 oil-surface thermometer images of different resolutions. These images were taken under various conditions, including an ideal laboratory environment, strong light, weak light, through a glass cover, and at tilted shooting angles.
Dial detection
In this study, a dataset of 350 images containing various types of oil-surface thermometers was used to train the dial detection module, with an additional 30 images reserved for evaluation. The effectiveness of the method was assessed using mAP (mean average precision), P (precision), and R (recall) derived from the training outcomes. As illustrated in Table 1, YOLOv5 demonstrates superior performance in the detection and classification of the meter types considered. To test the method's robustness, it was further evaluated under diverse environmental conditions, including images taken under exposure, ideal laboratory conditions, and images obstructed by glass covers. Partial results of the dial detection process are presented in Fig. 8.
Table 1. Comparison of dial detection methods.

| Method | Epoch | mAP (%) | P (%) | R (%) |
|---|---|---|---|---|
| YOLOv4s | 300 | 93.5 | 97.80 | 95.6 |
| Mask R-CNN | 300 | 92.3 | 96.9 | 94.7 |
| YOLOv5-6.2s | 300 | 95.1 | 98.4 | 96.9 |
| YOLOv8s | 300 | 94.2 | 97.6 | 95.8 |

Fig.8. Dial detection result.
In this recognition method, YOLOv5 was chosen because it is already highly mature in the field of object detection and classification. As shown in Table 1, YOLOv5 has demonstrated excellent precision and recall for the localization and classification of oil-surface thermometers, making it well-suited to meet the needs of this solution.
Semantic segmentation and oblique correction
Dial images containing different types of oil-surface thermometers were obtained using the YOLOv5-based dial detection method, with 400 images allocated for training and 50 for evaluation. This study validates the efficacy of the attention U2-Net by comparing it with the traditional U2-Net. From Table 2, it can be observed that, compared to the traditional U2-Net, the attention U2-Net reduced the Mean Absolute Error (MAE) by 0.006 while increasing MaxFβ by 0.014.
Table 2. Segmentation results of U2-Net and attention U2-Net.

| Method | Epoch | MAE | MaxFβ |
|---|---|---|---|
| U2-Net (base) | 400 | 0.044 | 0.873 |
| attention U2-Net | 400 | 0.038 | 0.887 |
To evaluate the effectiveness of the proposed dial correction method, tilted meters were corrected in this experiment, as shown in Fig.9. Images (a) and (c) depict two different types of tilted oil-surface thermometers, while (b) and (d) show the corresponding corrected images. In (a) and (c), the recognized results were 24.21℃ and 25.32℃, respectively, while in (b) and (d), they were 24.26℃ and 25.13℃, respectively. The comparison between the image transformations and recognition results demonstrates that the proposed mapping transformation correction method has a significant corrective effect.

Fig.9. Tilt correction results.
Reading recognition
To test the feasibility and accuracy of the proposed reading method, readings from various meters were evaluated in different environments. Fig. 10 illustrates the reading recognition process under conditions such as exposure, glass cover, laboratory environment, and various types of oil-surface thermometers to validate the method's reliability. Detailed recognition results are provided in Table 3.

Fig.10. The process of recognizing meter readings under complex conditions
Table 3. Reading recognition results under different conditions.

| Group | No. | Reference value (℃) | Reading (℃) | Fiducial error (%) |
|---|---|---|---|---|
| Meter 1 (laboratory) | 1 | 22.5 | 22.68 | 0.11 |
| | 2 | 40.45 | 40.50 | 0.03 |
| | 3 | 72.06 | 71.86 | -0.13 |
| | 4 | 76.2 | 76.18 | -0.01 |
| Meter 1 (exposure) | 1 | 59.13 | 58.94 | 0.12 |
| | 2 | 77.58 | 77.67 | 0.06 |
| | 3 | 49.13 | 49.01 | 0.08 |
| | 4 | 97.91 | 97.95 | 0.02 |
| Meter 1 (glass cover) | 1 | 17.93 | 18.08 | 0.09 |
| | 2 | 45.49 | 45.31 | -0.11 |
| | 3 | 61.15 | 61.42 | 0.17 |
| | 4 | 39.2 | 39.21 | 0.01 |
| Meter 2 (laboratory) | 1 | 46.45 | 46.28 | -0.11 |
| | 2 | 21.65 | 21.61 | -0.03 |
| | 3 | 50.32 | 50.35 | 0.02 |
| | 4 | 34.09 | 34.14 | 0.03 |
The reference values were obtained from the oil-surface thermometer images with the aid of a vernier caliper. The accuracy of manual readings depends on the number of graduations on the instrument: increasing the number of graduations between the smallest scales improves the accuracy of manual reading. In this study, readings with two decimal places were used as the reference values. An example of a manual reading is shown in Fig.11.

Fig.11. Example of manual instrument readings
The maximum absolute errors observed under exposure, glass cover, and laboratory environment conditions were 0.25°C, 0.3°C, and 0.2°C, respectively, while the maximum fiducial errors were 0.16%, 0.19%, and 0.13%, respectively. Compared to the meter's resolution of 2°C, these errors are considered acceptable.
In Fig.10, the black gauge is Meter 1, and the white gauge is Meter 2.
To verify the accuracy of this recognition method, multiple (more than 6) readings of the same pointer position on the instrument were conducted, and the standard deviation of these readings was calculated as an accuracy indicator of the recognition method. The results are shown in Table 4.
Table 4. Repeatability of the recognition results.

| Group | No. | Reading (℃) | Std (℃) |
|---|---|---|---|
| Meter 1 (30℃) | 1 | 29.62 | 0.02266 |
| | 2 | 29.62 | |
| | 3 | 29.66 | |
| | 4 | 29.66 | |
| | 5 | 29.62 | |
| | 6 | 29.62 | |
| Meter 1 (60℃) | 1 | 59.79 | 0.01926 |
| | 2 | 59.80 | |
| | 3 | 59.77 | |
| | 4 | 59.79 | |
| | 5 | 59.77 | |
| | 6 | 59.74 | |
| Meter 1 (90℃) | 1 | 89.14 | 0.0123 |
| | 2 | 89.16 | |
| | 3 | 89.17 | |
| | 4 | 89.14 | |
| | 5 | 89.14 | |
| | 6 | 89.17 | |
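The repeatability indicator can be computed directly with the Python standard library. Whether the paper uses the sample or the population estimator is not stated, so the use of `stdev` (the sample estimator) below is an assumption.

```python
from statistics import stdev

def repeatability(readings):
    """Sample standard deviation of repeated readings taken at one
    pointer position, used as the repeatability indicator (at least
    six readings per position in the experiment above)."""
    return stdev(readings)
```

Identical readings give a repeatability of exactly 0; any spread between repeated recognitions increases the indicator.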
Discussion
The oil-surface thermometer recognition method proposed in this paper is developed based on neural network technology. It captures images of the oil-surface thermometer using a camera and analyzes and recognizes them with enhanced algorithms. This solution employs various deep learning and image processing techniques to accurately locate the dial and pointer position of the oil-surface thermometer, thereby reading the temperature value. Additionally, deep learning algorithms can improve the accuracy and robustness of recognition based on a large amount of training data.
In this study, since the instrument dataset was sourced only from laboratory and artificially controlled environments, significant errors were encountered when testing the robustness of the recognition method. To address this issue, we added images of instruments under extreme conditions to the dataset, which led to a significant improvement in recognition performance. The error issues encountered in the above experiments were primarily attributed to the dataset's imperfections. Therefore, in future training for different instruments, it is crucial to focus more on collecting datasets from various environments to further enhance the robustness of the recognition method.
This method can accurately recognize the readings of the oil-surface thermometer in complex environments, avoiding errors that may arise from manual readings. Through the automatic recognition system, real-time monitoring of the oil-surface thermometer can be achieved, enabling timely detection and handling of anomalies, thereby ensuring the normal operation of the equipment. It reduces the need for manual intervention, lowers the risk for personnel operating in high-temperature, high-pressure environments, and enhances the safety of the detection process. This solution can also be used for the automatic calibration of oil-surface thermometers, maintaining high accuracy and reliability through regular automatic detection and adjustment.
Future improvements in algorithm optimization, system integration, and functional expansion can further enhance the performance of this recognition method. A deep analysis of the oil-surface thermometer recognition scheme reveals the tremendous potential of this technology in industrial applications. It not only improves the accuracy and safety of equipment monitoring but also provides strong support for achieving more intelligent industrial management.
Conclusion
In this research, a recognition method for oil-surface thermometers based on a multi-task cascaded network is proposed to improve the accuracy and robustness of current methods for identifying oil-surface thermometer readings. Firstly, the YOLOv5 network is used to determine the dial and type of the target thermometer, followed by a mapping transformation to correct the thermometer image. Then, the proposed attention U2-Net is used to segment the pointer and scales of the thermometer. Finally, the precise reading is calculated using the proposed enhanced reading method, based on the proportional relationship between the pointer and the adjacent scales.
Multiple experiments were conducted to verify the accuracy and robustness of this recognition method. The results show that the mean average precision (mAP) for dial detection and classification based on the YOLOv5 network is 95.1%, and the maximum fiducial error of the recognition results in different environments is no more than 0.19%.
In summary, the proposed method achieves satisfactory results in oil-surface thermometer reading recognition, demonstrating high accuracy and robustness, thus enabling reliable reading recognition in various environmental conditions.