
Medicine

Objectification of Tongue Diagnosis in Traditional Medicine, Data Analysis, and Study Application

Published: April 14, 2023 doi: 10.3791/65140
* These authors contributed equally

Summary

The present study employed U-Net and other deep learning algorithms to segment a tongue image and compared the segmentation results to investigate the objectification of tongue diagnosis.

Abstract

Tongue diagnosis is an essential technique of traditional Chinese medicine (TCM) diagnosis, and the need for objectifying tongue images through image processing technology is growing. The present study provides an overview of the progress made in tongue objectification over the past decade and compares segmentation models. Various deep learning models are constructed to verify and compare algorithms using real tongue image sets. The strengths and weaknesses of each model are analyzed. The findings indicate that the U-Net algorithm outperforms the other models in terms of precision, recall, and mean intersection over union (MIoU). However, despite the significant progress in tongue image acquisition and processing, a uniform standard for objectifying tongue diagnosis has yet to be established. To facilitate the widespread application of tongue images captured using mobile devices in tongue diagnosis objectification, further research could address the challenges posed by tongue images captured in complex environments.

Introduction

Tongue observation is a widely utilized technique in traditional Chinese medicine (TCM) and other traditional ethnic medicines. The color and shape of the tongue can reflect the physical condition and various disease properties, severities, and prognoses. For instance, in traditional Hmong medicine, the color of the tongue is used to identify body temperature; e.g., a red or purple tongue indicates pathological factors related to heat. In Tibetan medicine, a condition is judged by observing the tongue of a patient, paying attention to the color, shape, and moisture of the mucus. For instance, the tongues of patients with Heyi disease become red and rough or black and dry1; patients with Xieri disease2 have yellow and dry tongues; meanwhile, patients with Badakan disease3 have a white, humid, and soft tongue4. These observations reveal the close relationship between tongue features and physiology and pathology. Overall, the state of the tongue plays a vital role in diagnosis, disease identification, and evaluation of the treatment effect.

Simultaneously, owing to diverse living conditions and dietary practices among different ethnic groups, variations in tongue images are evident. The Lab color model is based on an international standard for color determination formulated by the Commission Internationale de l'Eclairage (CIE) in 1931; it was revised in 1976, when the Lab color space was defined and named. The Lab color model is composed of three elements: L corresponds to brightness, while a and b are two color channels. The a channel ranges from dark green (low values) through gray (medium values) to bright pink (high values); the b channel ranges from bright blue (low values) through gray (medium values) to yellow (high values). By comparing the Lab values of the tongue color of five ethnic groups, Yang et al.5 found that the characteristics of the tongue images of the Hmong, Hui, Zhuang, Han, and Mongolian groups were significantly distinct from each other. For example, the Mongolians have dark tongues with a yellow tongue coating, while the Hmong have light tongues with a white tongue coating, suggesting that tongue features can be used as a diagnostic indicator for assessing the health status of a population. Moreover, tongue images can function as an evaluation index for evidence-based medicine in clinical research of ethnic medicine. He et al.6 employed tongue images as a foundation for TCM diagnosis and systematically evaluated the safety and efficacy of Chou-Ling-Dan pellets (CLD granules, used in TCM to treat inflammatory and febrile diseases, including seasonal influenza) combined with Chinese and Western medicine. The results established the scientific validity of tongue images as an evaluation index for clinical studies. Nevertheless, traditional medical practitioners generally rely on subjective observation of tongue characteristics to assess patients' physiological and pathological conditions, so more precise indicators are required.
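For reference, Lab values of the kind compared above can be derived from an ordinary RGB photograph. The sketch below implements the standard sRGB-to-CIELAB conversion (assuming a D65 white point; the function name and usage are illustrative, not the instrument's actual pipeline):

```python
def srgb_to_lab(r, g, b):
    """Convert an 8-bit sRGB color to CIELAB (D65 white point)."""
    def linearize(c):
        # Undo the sRGB gamma encoding
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    # Linear RGB -> XYZ (sRGB matrix, D65)
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    # Normalize by the D65 reference white
    x, y, z = x / 0.95047, y / 1.0, z / 1.08883

    def f(t):
        # CIELAB nonlinearity with the linear segment near zero
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29

    fx, fy, fz = f(x), f(y), f(z)
    L = 116 * fy - 16
    a = 500 * (fx - fy)
    b_star = 200 * (fy - fz)
    return L, a, b_star
```

Under this convention, a pure white pixel maps to roughly L = 100, a = 0, b = 0, and redder tongues push the a channel toward positive values.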

The emergence of the internet and artificial intelligence technology has paved the way for digitizing and objectifying tongue diagnosis. This process involves using mathematical models to provide a qualitative and objective description of tongue images7, reflecting the content of the tongue image. The process includes several steps: image acquisition, optical compensation, color correction, and geometric transformation. The pre-processed images are then fed into an algorithmic model for image positioning and segmentation, feature extraction, pattern recognition, etc. The output of this process is a highly efficient and precise diagnosis of tongue image data, thereby achieving the goal of objectification, quantification, and informatization of tongue diagnosis8. Based on tongue diagnosis knowledge and deep learning technology, this study automatically separated the tongue body and tongue coating from tongue images using a computer algorithm, in order to extract quantitative tongue features for doctors, improve the reliability and consistency of diagnosis, and provide methods for subsequent research on tongue diagnosis objectification9.


Protocol

This study was approved by the National Natural Science Foundation of China project, Constructing Dynamic Change Rules of TCM Facial Image Based on Association Analysis (ethics approval number: 2021KL-027). The ethics committee approved the clinical study to be carried out in accordance with the approved documents, which include the clinical research protocol (2021.04.12, V2.0), informed consent (2021.04.12, V2.0), subject recruitment materials (2021.04.12, V2.0), study cases and/or case reports, subject diary cards and other questionnaires (2021.04.12, V2.0), a list of participants in the clinical trial, research project approval, etc. Informed consent was obtained from the patients participating in the study. The main experimental approach of this study is to use real tongue images to validate and compare the segmentation effects of the models. Figure 1 presents the components of tongue diagnosis objectification.

1. Image acquisition

  1. Use the self-developed hand-held lingual face diagnostic instrument to collect lingual face images of patients.
  2. Fill in the patient's name, gender, age, and disease on the computer page. Images included here are from patients who came to the clinic and agreed to be photographed after being informed of the purpose and content of the study. Confirm that the patient is sitting upright, place the whole face in the image acquisition instrument, and instruct the patient to extend their tongue out of their mouth to the maximum extent.
  3. Hold the image acquisition device connected to a computer and verify through the images on the computer screen that the patient is in the correct position and that the tongue and face are fully exposed.
  4. Press the Shoot button on the computer screen three times to take three pictures.
    NOTE: The image acquisition instrument is currently only at the patent application stage and is not for commercial use, so it is not for sale.
  5. Manually select and filter the collected tongue and face images. Filter and exclude images that have incomplete tongue and face exposure, as well as images that are too dark due to insufficient light. Figure 2 shows the image acquisition page of the software.
  6. In the experimental design, collect three images from each patient at a time as alternatives and select a relatively standard, fully exposed, well-illuminated, and clear image as the sample for subsequent algorithm training and testing.
  7. Collect data after the shooting, export the data for manual screening, and delete the non-standard images visible to the naked eye. Use the following filtering and exclusion criteria: incomplete tongue and face exposure, and images that are too dark as a result of insufficient light. An example of an under-lit, an incomplete, and a standard image is shown in Figure 3.
    NOTE: Insufficient light is generally caused by the failure of the patient to place the face entirely into the instrument. Complete exposure is usually obtained only when the patient is positioned and photographed correctly.

2. Tongue segmentation

  1. Perform tongue image segmentation using an online annotation tool, as described below.
    1. Install Labelme, click on the Open button in the upper left corner of the label interface, select the folder where the image is located, and open the photos.
    2. Click on Create Polygon to start tracing points, trace the tongue and lingual surface contours, name them according to the selected areas (e.g., tongue and lingual surface), and save them.
    3. When all the marks are complete, click Save to save the image to the data folder. See Figure 4 for a detailed flow chart.
      NOTE: As the images may have pixel differences, the images cannot be directly used for algorithm training and testing.
  2. Unify the images to the same size by white edge-filling them into a square, with the long side of each image as the target edge length. The image size captured by the device is 1080 x 1920 pixels, and the size of the filled image is 1920 x 1920 pixels. See Figure 5.
  3. Apply image enhancement if needed. No enhancement was applied in this study, as the images used were taken in a fixed scene and were less affected by the environment, lighting, and other factors.
  4. Because three images were collected for each patient during the shooting process to account for uncontrollable factors, such as subject blinking and lens blocking, manually screen the images from each patient to retain one image per patient.
  5. For the purpose of training the model, collect data from 200 people, or 600 images. After the screening, retain about 200 usable images.
  6. According to the image number, randomly divide all the tongue images, placing 70% of them into the training set and 30% into the test set in a spreadsheet.
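The edge-filling of step 2.2 and the 70/30 split of step 2.6 can be sketched with NumPy (hypothetical helper functions, not the study's own code):

```python
import numpy as np

def pad_to_square(img, fill=255):
    """White edge-fill an H x W (x C) image into a square whose edge
    length is the longer side, as in step 2.2."""
    h, w = img.shape[:2]
    side = max(h, w)
    pad_h, pad_w = side - h, side - w
    # Split the padding evenly; any odd pixel goes to the bottom/right
    pads = ((pad_h // 2, pad_h - pad_h // 2),
            (pad_w // 2, pad_w - pad_w // 2)) + ((0, 0),) * (img.ndim - 2)
    return np.pad(img, pads, mode="constant", constant_values=fill)

def split_train_test(image_ids, train_ratio=0.7, seed=0):
    """Randomly assign 70% of the image numbers to the training set and
    30% to the test set, as in step 2.6."""
    ids = np.array(image_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(ids)
    n_train = int(len(ids) * train_ratio)
    return ids[:n_train].tolist(), ids[n_train:].tolist()
```

With a 1080 x 1920 input, `pad_to_square` produces a 1920 x 1920 image whose added borders are white, matching the dimensions given above.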

3. Tongue classification

  1. Go to the official websites and download and install Anaconda, Python, and Labelme. Activate the environment and complete the installation and adjustment of the overall environment. See Figure 6 for a flow chart describing the installation and setup of the software.
  2. Build the deep learning algorithm model in the installed environment, tune the parameters, and complete the model training using the training set. Perform model selection and tuning as described in the following steps.
    1. Model selection: Choose the appropriate model based on the purpose of the research. After reviewing research on tongue image processing in the last 5 years, four algorithms, U-Net, Seg-Net, DeeplabV3, and PSPNet, were selected for validation in this study (see Supplementary Coding File 1, Supplementary Coding File 2, Supplementary Coding File 3, and Supplementary Coding File 4 for model codes).
    2. Data set construction: After completing the model selection, construct the required data set in conjunction with the research content, mainly using Labelme annotation and the uniform image size methods, as described above.
  3. Perform model training as described below. Figure 7 shows details of the algorithm training operation.
    1. Input the data into the neural network for forward propagation, with each neuron first computing a weighted accumulation of its inputs and then passing it through an activation function to produce that neuron's output value.
    2. Input the result into the error function and compare it with the expected value to obtain the error, which measures the degree of misrecognition. The smaller the loss function is, the better the model will be.
    3. Reduce the error by back propagation to determine the gradient vector. Adjust the weights along the gradient vector so that the error tends toward zero or shrinks.
    4. Repeat this training process until the set is completed or the error value no longer declines, at which point the model training is complete. See Figure 8 for a flow chart of the algorithm model in training and testing.
  4. Test the four models using the same test data for segmentation and judge the model performance according to the segmentation effect. The four metrics of precision, recall, mean pixel accuracy (MPA), and MIoU provide a more comprehensive model performance evaluation.
  5. After the results of the four models are generated, compare their values horizontally; the higher the value is, the higher the segmentation accuracy and the better the model's performance. See Figure 9, Figure 10, and Figure 11.
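The training cycle described in steps 3.3.1-3.3.4 can be illustrated on a toy problem. The following is a minimal NumPy sketch of forward propagation, an error function, back propagation, and repeated weight updates; it is a stand-in illustration on synthetic data, not one of the four segmentation networks:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))                          # toy inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float)[:, None]    # toy binary labels

# One hidden layer of 8 neurons, sigmoid output
W1, b1 = rng.normal(size=(2, 8)) * 0.5, np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)) * 0.5, np.zeros(1)
lr = 0.5

for epoch in range(2000):
    # Step 3.3.1: forward propagation (weighted sums, then activations)
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    # Step 3.3.2: error function (binary cross-entropy); smaller is better
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    # Step 3.3.3: back propagation to get the gradient vector,
    # then adjust the weights along it
    dz2 = (p - y) / len(X)
    dW2, db2 = h.T @ dz2, dz2.sum(0)
    dh = (dz2 @ W2.T) * (1 - h ** 2)
    dW1, db1 = X.T @ dh, dh.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
# Step 3.3.4: after enough iterations the error shrinks toward zero
# and training is considered complete
```

After the loop, the loss has shrunk toward zero, mirroring the stopping condition in step 3.3.4.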


Representative Results

For the comparison results, see Figure 12, Figure 13, and Table 1; in the environment constructed for this study, the same samples were used to train and test the algorithm models. MIoU indicator: U-Net > Seg-Net > PSPNet > DeeplabV3; MPA indicator: U-Net > Seg-Net > PSPNet > DeeplabV3; precision indicator: U-Net > Seg-Net > DeeplabV3 > PSPNet; recall: U-Net > Seg-Net > PSPNet > DeeplabV3. The larger the index value is, the higher the segmentation accuracy and the better the model performance. The index results show that the U-Net algorithm is superior to the other algorithms in MIoU, MPA, precision, and recall, and its segmentation accuracy is also higher than that of the other algorithms. Therefore, the U-Net algorithm has the best performance among the four algorithms. PSPNet is better than DeeplabV3 in MIoU, MPA, and recall, while the DeeplabV3 model scores lower than the Seg-Net model on all indexes. Therefore, it can be concluded that the DeeplabV3 algorithm has the least desirable comprehensive performance among the four algorithms in this research environment.

Evaluation indicators
In this study, the performance of the algorithm models was validated mainly by precision, recall, MPA, and MIoU. The performance metrics of a model are directly related to the confusion matrix, which consists of the model classification results and reflects the number of samples that the model classified correctly and incorrectly. In the matrix, the predicted values correspond to the test set results, and the actual values correspond to the ground truth. Both categories are divided into true and false, represented by T and F respectively, resulting in four combinations: TP, FP, FN, and TN. MPA is the mean value of the proportion of correctly classified pixels in each category, and MIoU is the mean intersection over union, the most common metric for semantic segmentation; it calculates the mean ratio of the intersection to the union of the true and predicted values10. The formulas for these metrics are:

Precision = TP / (TP + FP), recall = TP / (TP + FN), MPA = (1/N) × ΣCPA (where CPA = TP / (TP + FP) for each category and N is the total number of categories), and MIoU = (1/N) × ΣIoU (where IoU = TP / (TP + FP + FN)).

These four metrics provide a more comprehensive evaluation of the segmentation effect of tongue images.
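Computed from label maps, these definitions can be written directly from the confusion matrix; the sketch below follows the convention above, in which CPA shares the precision formula (the function name is hypothetical):

```python
import numpy as np

def segmentation_metrics(y_true, y_pred, n_classes):
    """Compute macro precision, recall, MPA, and MIoU from integer
    label maps, using the confusion-matrix definitions above."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    # Row = actual class, column = predicted class
    np.add.at(cm, (y_true.ravel(), y_pred.ravel()), 1)
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp   # predicted as class i but actually not
    fn = cm.sum(axis=1) - tp   # actually class i but predicted otherwise
    cpa = tp / np.maximum(tp + fp, 1)        # per-class CPA (precision form)
    rec = tp / np.maximum(tp + fn, 1)        # per-class recall
    iou = tp / np.maximum(tp + fp + fn, 1)   # per-class IoU
    return {"precision": cpa.mean(), "recall": rec.mean(),
            "MPA": cpa.mean(), "MIoU": iou.mean()}
```

For example, comparing a 2 x 2 ground-truth map with a prediction that mislabels one pixel yields per-class IoUs of 1/2 and 2/3, i.e., an MIoU of 7/12.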

This study selected four deep learning algorithm models, U-Net, Seg-Net, DeeplabV3, and PSPNet, to train and test using real lingual image data. U-Net11 has a U-shaped architecture, consisting of an encoder on the left and a decoder on the right, and has the advantage of training more accurate classification results with fewer data and extracting image features comprehensively. DeepLabV3, built on the Res-Net network to solve the multi-scale target segmentation problem, adopts an atrous (hollow) convolution structure, designs a module to capture multi-scale context, removes the conditional random field (CRF), and upgrades the atrous spatial pyramid pooling (ASPP) module, significantly improving model performance. Semantic segmentation aims to obtain the category label for each pixel of the segmented object. Seg-Net is a convolutional neural network (CNN) architecture with a symmetric structure for semantic segmentation, including an encoder and a decoder. Its advantage is that the decoder up-samples its lower-resolution feature maps using indices stored by the encoder, eliminating the time spent learning the up-sampling. The PSPNet model is mainly applied to scene parsing; it adds context information to semantic segmentation, which can avoid partial errors, solve the problem of lacking an appropriate strategy to use global scene classification information, and improve the reliability of the final predicted results.

Figure 1: Components of tongue diagnosis objectification. Tongue diagnosis components, including image shooting elements, tongue segmentation, and tongue classification.

Figure 2: Image acquisition page. Tongue image acquisition interface and questionnaire content.

Figure 3: Image filtering and rejection criteria. A green tick mark represents inclusion criteria and a red cross represents exclusion criteria.

Figure 4: Schematic diagram of Labelme marking process. Labelme software is used to annotate the whole process of the image, from opening the folder to saving the file.

Figure 5: Picture pre-processing diagram. The size of the shot image is 1080 x 1920 pixels, and the size of the filled image is 1920 x 1920 pixels.

Figure 6: Flow chart of environment configuration. The algorithm can run only after the environment is configured.

Figure 7: Algorithm training run detail diagram. Detailed steps and execution methods in the algorithm operation.

Figure 8: Flow chart of algorithm model in training and testing. The important steps of the algorithm, including data processing, algorithm training, and algorithm testing.

Figure 9: Seg-Net algorithm structure. Seg-Net algorithm logical structure and code running process.

Figure 10: U-Net algorithm structure. U-Net algorithm logical structure and code running process.

Figure 11: Flow of tongue image segmentation studies. The red area in the image is the result of tongue segmentation, and the green area is the result of tongue coating segmentation.

Figure 12: Comparison chart of four algorithm metrics. MIoU, MPA, precision, and recall are all evaluation indexes of algorithm performance. The larger the value, the better the algorithm performance and the higher the segmentation accuracy.

Figure 13: Comparison of the results of the four algorithms for tongue segmentation. The red area in the image is the result of tongue segmentation, and the green area is the result of tongue coating segmentation.

Figure 14: U-Net algorithm structure diagram. The blue/white boxes indicate the feature map, while the number above the feature map represents the number of channels.

Model       MIoU      MPA       Precision   Recall
U-Net       84.00%    89.38%    91.90%      89.38%
DeeplabV3   59.68%    61.33%    84.21%      61.33%
PSPNet      67.80%    72.56%    82.71%      72.56%
SegNet      80.09%    87.14%    88.53%      87.14%

Table 1: Comparison of four algorithm segmentation result metrics. The metrics were MIoU, MPA, precision, and recall.

Supplementary Coding File 1: U-Net_training. U-Net model training code.

Supplementary Coding File 2: Seg-Net_training. Seg-Net model training code.

Supplementary Coding File 3: DeeplabV3_training. DeeplabV3 model training code.

Supplementary Coding File 4: PSPNet_training. PSPNet model training code.


Discussion

Based on the comparison results presented above, it is evident that the characteristics of the four algorithms under consideration vary, and their distinct advantages and disadvantages are described below. The U-Net structure, based on the modification and expansion of a fully convolutional network, can obtain contextual information and precise positioning through a contracting path and a symmetrical expanding path. By classifying each pixel point, this algorithm achieves a higher segmentation accuracy and segments the image with the trained model more quickly. On the other hand, the Seg-Net algorithm, comprising a symmetrical structure of an encoder and a decoder, has the advantage of adapting rapidly to new problems and performing well in tasks such as speech, semantics, vision, and gaming. However, the algorithm requires a large amount of data, making it demanding in terms of hardware configuration, and thus it is only applicable to some tasks. As a more general framework, the DeeplabV3 algorithm has the advantage of improving the ASPP module for most networks, laying its branches out in cascade or in parallel to improve overall performance. However, the final feature map needs to be obtained with up-sampling at rates of 8 and 16, which is relatively rough and could be improved later. Furthermore, the most significant feature of the PSPNet model is that it aggregates contextual information from different regions through the PSP module, thereby improving access to global information and delivering good results on multiple data sets. The results indicate that the U-Net model has the highest segmentation accuracy and the best segmentation effect in this research environment.

The U-Net architecture demonstrates its superiority in medical image segmentation12. Initially designed for 2D cell image segmentation, the U-Net algorithm has been further developed by replacing its 2D modules with 3D modules. This modification has strengthened its ability to process 3D images such as magnetic resonance imaging (MRI), computed tomography (CT), and three-dimensional (3D) ultrasound images. By segmenting medical images into organs, tissues, and lesions, valuable clinical data can be obtained. The improved U-Net algorithm represents an effective tool for subsequent examination and treatment. In medical diagnostics, the classification of images is a crucial part of many diagnostic processes. Traditional medicine relies on observing all visible signs, including the tongue, skin, and expression. The emergence and advancement of medical image segmentation technology hold significant importance for medical diagnosis. In TCM, analyzing face and tongue images requires using various deep learning algorithms for feature extraction and classification. Image segmentation algorithms are also widely used in Western medicine, providing a foundation for clinical diagnosis and pathology13.

This study's research process comprises critical steps, including data pre-processing, algorithm training and testing, and algorithm performance comparison. Initially, the raw data undergo processing, labeling, and division into training and test sets to facilitate the subsequent algorithm construction. The processed data are then fed into the neural network, and the loss function is set to determine the gradient vector through back propagation. Subsequently, the parameters are adjusted until the completion of the training process. Algorithm performance is evaluated by testing the image segmentation effect using multiple indexes, such as MIoU, MPA, precision, and recall, to assess it comprehensively. During the actual algorithm training process, over-fitting can occur, where the model learns the training data too thoroughly, including the characteristics of the noise; this leads to incorrect classification of previously unseen data during later tests and a poor generalization ability. If over-fitting occurs, one can increase the training data or re-clean the data. In this study, the gradient descent iterative method is adopted. Over-fitting can also be prevented by cutting off the iteration in advance (early stopping).
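Cutting off the iteration in advance, i.e., early stopping, can be sketched in a few lines (the validation losses here are hypothetical placeholder values):

```python
# Stop training once the validation error has not improved for
# `patience` consecutive epochs; the loss values are illustrative only.
val_losses = [0.9, 0.6, 0.45, 0.40, 0.41, 0.42, 0.40, 0.43, 0.44, 0.45]

best = float("inf")
patience, wait = 3, 0
stopped_at = len(val_losses)
for epoch, loss in enumerate(val_losses):
    if loss < best - 1e-6:   # the error is still declining
        best, wait = loss, 0
    else:                    # no improvement this epoch
        wait += 1
        if wait >= patience:
            stopped_at = epoch
            break
```

With these placeholder losses, the loop halts at epoch 6, before the model starts fitting noise in the later epochs.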

This study has an apparent limitation: the images were collected using fixed instruments, and the experimental instruments cannot currently be used for commercial purposes. Consequently, the tongue images in this study are from a single scene and do not entirely reflect the clinical background and the complex and variable light conditions. Therefore, further research is necessary to study image processing techniques under complex environments and poor illumination conditions. Objectification studies of tongue diagnosis contain rich content, so accurate tongue body segmentation is essential. Consequently, comparing and verifying the algorithms with the most suitable segmentation effect is significant for subsequent studies. Combining tongue segmentation with classification can theoretically achieve automatic tongue image judgment and assist in diagnosis; scholars have explored and studied this subject. In healthcare, using the internet of things and wireless communication technologies to process biomedical images and assist diagnosis can enhance a system's efficiency. Mansour et al.14 designed an automated tongue color image analysis model (ASDL-TCI) based on collaborative deep learning and the internet of things. It includes data acquisition, pre-processing, feature extraction, classification, and parameter optimization. This model's precision, recall rate, and accuracy are 0.984, 0.973, and 0.983, respectively, which are superior to those of other methods.

Image acquisition and pre-processing
During the image acquisition process, the intensity and variety of light sources can directly impact image quality, which in turn influences image segmentation and classification outcomes. Therefore, it is essential to set the light source to mimic the effect of natural light sources as closely as possible. Additionally, methods such as utilizing standard light sources or employing multiple light sources and shooting in a fixed scene can prevent the negative impact of light, background, and other factors, thereby enhancing the accuracy of algorithmic segmentation. The lighting parameters of the instruments used to collect tongue images are not identical to standard illumination, which affects the color rendering of the tongue images. Thus, the most common pre-processing method used is color correction. Cai et al.15 found that, to address the discrepancy between a tongue image's color data and the corresponding tongue's color chroma, normalizing the tongue image's color space conversion and color correction is necessary. The color performance of the display device also deviates from the real tongue body, necessitating testing and adjustment. Moreover, the picture size varies due to the different acquisition instruments used during the image collection process16. To enhance training efficiency and save storage space, deep learning networks place limitations on the input picture size. Therefore, the picture size must be standardized during the picture pre-processing stage. Typically, this is accomplished by uniformly reshaping the input picture size for model training, with commonly used reshaping methods being interpolation, clipping, padding, tiling, and mirroring.

Tongue image segmentation
Tongue image segmentation methods can be categorized into two types: traditional methods and deep learning methods17. Traditional tongue image segmentation methods include algorithms such as the Snake algorithm and the Otsu algorithm. As an active contour model, the Snake algorithm18 first sets a profile curve and then adjusts the initial profile to evolve into the true profile curve; the acquisition of the initial contour and the evolution of the contour are the primary research foci for the Snake algorithm. The Otsu algorithm, on the other hand, is a classical threshold segmentation algorithm that employs one or more thresholds, calculating the gray value over the original image and comparing the grayscale value of each pixel to the threshold value; based on the comparison result, each pixel is assigned to the tongue or the background. Before the advent of deep learning methods, these two algorithms were commonly used in tongue image processing and tongue diagnosis objectification.
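The Otsu thresholding just described can be sketched in a few lines; the minimal single-threshold version below chooses the gray level that maximizes the between-class variance of foreground and background:

```python
import numpy as np

def otsu_threshold(gray):
    """Single-threshold Otsu: return the gray level that maximizes the
    between-class variance of an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    sum_all = np.dot(np.arange(256), hist)  # sum of all gray values
    best_t, best_var = 0, -1.0
    w0 = 0.0   # pixel count of the class at or below the threshold
    sum0 = 0.0 # gray-value sum of that class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0 or w0 == total:
            continue  # one class is empty; variance undefined
        w1 = total - w0
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t
```

On a strongly bimodal image (e.g., a bright tongue against a dark background), the returned threshold falls between the two intensity peaks.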

Since the advent of deep learning theory, numerous scholars have researched the integration of tongue diagnosis objectification and deep learning. Zheng et al.19 devised a tongue detection method based on image segmentation by amalgamating various algorithms and exploring tongue detection in an open environment, ultimately achieving favorable tongue segmentation results. Yuan et al.20 proposed a tongue segmentation method based on a single-pixel loss function of region association, wherein the improved loss function accounted for the correlation between region pixels. Employing supervised learning with pixel label semantics, the model training efficiency was enhanced, exemplified by the MIoU index reaching 96.32%. The tongue image exhibits specific morphological characteristics such as tooth marks, cracks, and punctures, closely linked to disease onset; thus, tongue observation can aid in diagnosing the progress of a disease. Wang et al.21 proposed a deep learning tongue crack segmentation approach for small sample data sets that yielded improved accuracy and stability. This method involved segmenting the tongue body first, followed by the tongue cracks, and improved the U-Net algorithm by incorporating focal loss as the loss function.

Tongue image classification
Classifying tongue images mainly involves identifying characteristics such as tongue color, spines, cracks, and coating color. Wang et al.22 employed the Snake algorithm to segment the tongue body and utilized techniques such as mutual information image registration, LoG edge detection, parallel lines, and other methods to identify punctures. This approach effectively solved the issue of automatic puncture identification and counting while facilitating early detection and prevention. To address the limitations associated with training tongue image algorithms, such as a large data volume, long training time, and high equipment requirements, Yang et al.23 proposed a fully connected neural network based on transfer learning. This method utilizes the well-trained Inception_v3 to extract features and combines them with the fully connected neural network (FCN), achieving an accuracy rate of over 90%. This approach resolved the issue of deep learning with small samples and multiple classifications. Song et al.24 employed a cascade classifier to locate tongue images and used GoogLeNet and Res-Net for transfer learning and training, applying deep learning to automatically classify three tongue image features: tooth marks, cracks, and tongue coating thickness. The average accuracy of the classification results exceeded 94%. However, the tongue image classification algorithm is highly susceptible to interference from other, unrelated parts of the face, which directly impacts classification accuracy25.

Zhai et al.26 developed a multi-stage algorithm for classifying tongue images using attention mechanisms. This method enhances the accuracy of locating the tongue region by fusing features extracted from perceptual fields of different sizes during the tongue localization phase. Furthermore, an attention mechanism module suppresses interference from impurities on the tongue, improving classification accuracy. Deep learning algorithms may also provide novel approaches to the problem of classifying tongue features across different diseases27. In addition, Shi et al.28 investigated a classification method for non-small cell lung cancer based on the C5.0 decision tree algorithm. They identified seven attribute classification rules relevant to distinguishing Qi deficiency syndrome from Yin deficiency syndrome, and the model's accuracy was 80.37%. Li et al.29 developed a diagnostic model for diabetes using the random forest algorithm, further analyzing texture and color features from tongue images to enhance the model's performance.
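A random-forest diagnostic model of the kind Li et al. describe takes numeric color and texture descriptors computed from tongue images as input. The following scikit-learn sketch shows the pipeline on purely synthetic data; the feature set (mean L*a*b* values plus two texture statistics), the class separation, and the "diabetic vs. control" labels are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for tongue image descriptors: mean L*a*b* color
# values plus two texture statistics (contrast, energy) per image.
n = 300
control  = rng.normal(loc=[60, 25, 10, 0.4, 0.6], scale=0.8, size=(n, 5))
diabetic = rng.normal(loc=[55, 35, 18, 0.7, 0.3], scale=0.8, size=(n, 5))
X = np.vstack([control, diabetic])
y = np.array([0] * n + [1] * n)          # 0 = control, 1 = diabetic

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)
test_acc = clf.score(X_te, y_te)
print(f"held-out accuracy: {test_acc:.2f}")

# Feature importances indicate which color/texture channels drive the model
importances = clf.feature_importances_
```

On real data, the descriptors would be computed from segmented tongue regions, and the reported importances can help relate the model back to clinically interpretable features.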

Conclusion
In contrast to the contemporary diagnostic approaches of Western medicine, the diagnostic methods of TCM are minimally invasive and entail minimal harm. The four diagnostic methods of observation, listening or smelling, inquiry, and palpation have their foundations in diverse aspects of TCM. Nevertheless, owing to the heavy reliance of TCM diagnosis and treatment on practitioners' expertise and personal treatment concepts, it may lack objectivity and standardization. As a result, the objectification of TCM diagnosis has emerged as a direction for further research that could promote the advancement of TCM.

The objectification of tongue diagnosis has the potential to process images and large amounts of data with high efficiency, which could significantly aid doctors. However, it is essential to note that tongue diagnosis is not only a traditional method but has also been clinically validated. Chen et al.30 collected clinical data on the tongue images of 382 COVID-19 patients and statistically analyzed the tongue image features and Lab color model parameters for all imaging groups. The study's findings revealed a correlation between tongue image features and the Western medicine disease typing. Additionally, the changes in tongue images align with the overall pathogenesis of the disease, and some tongue image parameters could potentially assist in predicting pathogenic changes of COVID-19 in TCM31.

In objectifying traditional medical tongue diagnosis, numerous researchers have utilized segmentation and classification methods. Deep learning and convolutional neural networks are essential for classifying tongue image characteristics. The accuracy of the tongue image segmentation algorithm is crucial, as it determines whether the tongue can be precisely separated from the face and thereby impacts the accuracy of the subsequent feature classification. Consequently, enhancing the accuracy of current algorithm models is a key research focus in this field.

This study employed the same test set data to compare the performance of the U-Net, Seg-Net, DeeplabV3, and PSPNet algorithms, thereby ensuring consistency in the quality of the data used. Under the experimental environment employed in this study, the U-Net algorithm significantly outperformed the other three algorithms in segmentation accuracy. MIoU is the standard evaluation metric for semantic segmentation algorithms32 and the most crucial index used to evaluate algorithm performance. The MIoU value of the U-Net algorithm was 3.91% higher than that of the Seg-Net algorithm, 23.32% higher than that of DeeplabV3, and 16.2% higher than that of PSPNet. This provides evidence that the U-Net algorithm performs better than the other algorithms.
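MIoU is computed by taking, for each class, the ratio of the intersection to the union of the predicted and ground-truth pixel sets, then averaging over classes. A minimal sketch, using a toy two-class (background/tongue) label map rather than real segmentation output:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes, from two label maps."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:                    # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy example: background (0) vs. tongue (1); one tongue pixel is missed
target = np.array([[0, 0, 1, 1],
                   [0, 1, 1, 1]])
pred   = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
miou = mean_iou(pred, target, num_classes=2)
print(miou)  # IoU is 3/4 for background and 4/5 for tongue, so MIoU = 0.775
```

Averaging over classes rather than pixels prevents the large background region from masking poor performance on the (smaller) tongue region.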

However, some problems remain in the segmentation and classification of tongue images using deep learning algorithms. Owing to patient privacy, medical image data sets are small compared to other semantic segmentation data sets, which restricts the advantages of deep learning on big data; models with large parameter counts are then prone to overfitting, so the network structure needs to be adjusted by selecting appropriate improvement strategies. At present, research on the objectification of tongue diagnosis has not yet produced a uniform collection standard; the acquisition environment and light source type lack proper standardization, and researchers usually set up their own collection environments and build non-public databases. At the same time, although current algorithmic models can achieve good accuracy, the data used are carefully screened and pre-processed, which is difficult to replicate in actual diagnosis and treatment environments, thereby limiting their clinical application. Accordingly, further objectification of tongue diagnosis will need to handle complex environments and tongue images captured by different devices33. Another trend is dynamic information processing, specifically video image processing, which provides more detailed information on the tongue and more comprehensively reflects the advantages of tongue diagnosis; thus, deep learning algorithms capable of processing dynamic details need to be developed. Overall, the objectification of medical tongue diagnosis combined with deep learning algorithms holds promise for reducing subjectivity in TCM diagnosis.


Disclosures

The authors have no conflict of interest to declare.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (grant no. 82004504 and grant no. 82174236), the National Key Research and Development Program of the Ministry of Science and Technology of China (grant no. 2018YFC1707606), and the Chinese Medicine Administration of Sichuan Province (grant no. 2021MS199).

Materials

Name                  Comments
CPU                   Intel(R) Core(TM) i7-9700K
GPU                   NVIDIA GeForce RTX 3070 Ti (8192 MB)
Operating system      Microsoft Windows 10 Professional Edition (64-bit)
Programming language  Python
RAM                   16 GB


References

  1. Jiu, G., et al. Effects of herbal therapy on intestinal microbiota and serum metabolomics in different rat models of Mongolian medicine. Evidence-Based Complementary and Alternative Medicine. 2022, 7255780 (2022).
  2. Xi, J., Xin, Y., Teregle. Study on the correlation between the animal model of Mongolian medicine "Xieri disease" and serum ALT and AST. Electronic Journal of Cardiovascular Diseases in Combination of Traditional Chinese and Western Medicine. 4 (33), 134-135 (2016).
  3. Yin, L., et al. Study on the mechanism of serum differential protein changes in bronchial asthma based on proteomics. Chinese Journal of Traditional Chinese Medicine. 47 (22), 6227-6234 (2022).
  4. Wang, X. H., Bao, L. Band Seed. The origin and development of tongue diagnosis in Mongolian medicine. Chinese Ethnic Folk Medicine. (1), 64-65 (2008).
  5. Yang, S., et al. A comparative study on the feature parameters of tongue diagnosis images in five nationalities. Chinese Journal of Traditional Chinese Medicine. 36 (11), 6428-6430 (2021).
  6. He, J. Y., et al. Efficacy and safety of Chou-Ling-Dan granules in the treatment of seasonal influenza via combining Western and traditional Chinese medicine, protocol for a multicentre, randomised controlled clinical trial. BMJ Open. 9 (4), e024800 (2019).
  7. Wang, D. J., et al. Scientific knowledge mapping and visualization analysis in the field of Chinese medicine tongue feature objectification research. World Science and Technology - Modernization of Chinese Medicine. 23 (9), 3032-3040 (2021).
  8. Yuan, S. M., Qian, P., Li, F. F. Research progress of color correction methods for tongue and face diagnosis in traditional Chinese Medicine. Chinese Journal of Traditional Chinese Medicine. 34 (9), 4183-4185 (2019).
  9. Kanawong, R., et al. Tongue image analysis and its mobile app development for health diagnosis. Advances in Experimental Medicine and Biology. 1005, 99-121 (2017).
  10. Yu, Y., et al. Semantic segmentation evaluation index and evaluation method. Computer Engineering and Application. , (2023).
  11. Sehyung, L., Negishi, M., Urakubo, H., Kasai, H., Ishii, S. Mu-net: Multi-scale U-net for two-photon microscopy image denoising and restoration. Neural Networks. 125, 92-103 (2020).
  12. Huang, X. M., et al. A review on the application of U-Net and its variants in medical image segmentation. Chinese Journal of Biomedical Engineering. 41 (5), 567-576 (2022).
  13. Lu, J. H., Xu, Y. F., Wang, Y. Q., Hao, Y. M. Research overview of tongue objectification in traditional Chinese medicine based on computer image technology. World Science and Technology - Modernization of Traditional Chinese Medicine. 24 (11), 4568-4573 (2022).
  14. Mansour, R. F., Althobaiti, M. M., Ashour, A. A. Internet of things and synergic deep learning based biomedical tongue color image analysis for disease diagnosis and classification. IEEE Access. 9, 94769-94779 (2021).
  15. Cai, Y. H., Hu, S. B., Guan, J., Zhang, X. F. Analysis of the development and application of tongue diagnosis objectification techniques in Chinese medicine. World Science and Technology - Modernization of Chinese Medicine. 23 (7), 2447-2453 (2021).
  16. Ghosh, S., Das, N., Nasipuri, M. Reshaping inputs for convolutional neural network: some common and uncommon methods. Pattern Recognition. 93, 79-94 (2019).
  17. Shang, Z. M., et al. Research progress of digital acquisition and characterization of tongue diagnosis information. Chinese Journal of Traditional Chinese Medicine. 36 (10), 6010-6013 (2021).
  18. Ning, J., Zhang, D., Wu, C., Yue, F. Automatic tongue image segmentation based on gradient vector flow and region merging. Neural Computing and Applications. 21, 1819-1826 (2012).
  19. Zheng, F., Huang, X. Y., Wang, B. L., Wang, Y. H. A method for tongue detection based on image segmentation. Journal of Xiamen University. 55 (6), 895-900 (2016).
  20. Li, Y. T., Luo, Y. S., Zhu, Z. M. Deep learning-based tongue feature analysis. Computer Science. 47 (11), 148-158 (2020).
  21. Wang, Y. D., Sun, C. H., Cui, J. L., Wu, X. R., Qin, Y. X. Research on deep learning-based tongue fissure segmentation algorithm. World Science and Technology - Modernization of Chinese Medicine. 23 (9), 3065-3073 (2021).
  22. Wang, X. M., Wang, R. Y., Guo, D., Lu, S. Z., Zhou, P. Research on the identification method of tongue punctures based on auxiliary light source. Journal of Sensing Technology. 29 (10), 1553-1559 (2016).
  23. Yang, J. D., Zhang, P. A fully connected neural network based on migration learning for tongue image classification. Journal of the Second Military Medical University. 39 (8), 897-902 (2018).
  24. Song, C., Wang, B., Xu, J. T. Research on tongue feature classification method based on deep migration learning. Computer Engineering and Science. 43 (8), 1488-1496 (2021).
  25. Ding, H. J., He, J. C. Study on modern techniques and methods of tongue diagnosis. Shi Zhen Chinese Medicine. 21 (5), 1230-1232 (2010).
  26. Zhai, P. B., et al. A multi-stage tongue image classification algorithm incorporating attention mechanism. Computer Engineering and Design. 42 (6), 1606-1613 (2021).
  27. Hou, Y. S. A new clustering analysis algorithm based on deep learning. Journal of Xinxiang University. 35 (12), 4 (2018).
  28. Shi, Y. L., et al. A decision tree algorithm for classification of non-small cell lung cancer evidence based on tongue and pulse data. World Science and Technology - Modernization of Chinese Medicine. 24 (7), 2766-2775 (2022).
  29. Li, J., Hu, X. J., Zhou, C. L., Xu, J. T. Study on the feature analysis and diagnosis model of diabetic tongue based on random forest algorithm. Chinese Journal of Traditional Chinese Medicine. 37 (3), 1639-1643 (2022).
  30. Chen, C. H., et al. The characteristics of the combination of the four diagnostic methods of traditional Chinese medicine from the perspective of the differences between Chinese and Western medical diagnosis methods. Journal of Guangzhou University of Traditional Chinese Medicine. 28 (3), 332-334 (2011).
  31. Chen, R., et al. Correlation analysis of tongue image and western medicine typing in 382 patients with novel coronavirus pneumonia based on Lab colour model and imaging histology. Chinese Journal of Traditional Chinese Medicine. 36 (12), 7010-7014 (2021).
  32. Ju, J. W., Jung, H., Lee, Y. J., Mun, S. W., Lee, J. H. Semantic segmentation dataset for AI-based quantification of clean mucosa in capsule endoscopy. Medicina. 58 (3), 397 (2022).
  33. Wu, X., et al. A review of research on deep learning in tongue image classification. Computer Science and Exploration. , 1-23 (2022).


Cite this Article


Feng, L., Xiao, W., Wen, C., Deng, Q., Guo, J., Song, H. Objectification of Tongue Diagnosis in Traditional Medicine, Data Analysis, and Study Application. J. Vis. Exp. (194), e65140, doi:10.3791/65140 (2023).
