Professor Sang-hyun Park’s Research Team at DGIST Develops an Artificial Intelligence Model to Accurately Detect Cells with Limited Labels

- Professor Park’s research team at DGIST develops an AI for pathological image analysis that accurately segments overlapping cell nuclei using only rough labels
- Expected to make disease diagnosis and patient prognosis prediction more efficient by analyzing the microenvironment of pathological images... published at MICCAI, a top international conference in medical image computing

□ A research team led by Professor Sang-hyun Park of the Department of Robotics and Mechanical Engineering, Daegu Gyeongbuk Institute of Science and Technology (DGIST; President Young Kuk), announced on October 17 (Tuesday) that it has developed a new artificial intelligence technology that accurately analyzes cell nuclei in pathological images. Because the technology can distinguish overlapping cell nuclei using only rough labels, it is expected to contribute significantly to the analysis of the microenvironment in pathological images.


□ To diagnose cancer and predict a patient’s prognosis, the cell nuclei in pathological images must be counted and their shapes identified. Training high-performance deep learning models for this task, however, requires accurately annotated cell nucleus data. Because cell nuclei vary in morphology and a single image can contain hundreds of thousands of them, annotating the data nucleus by nucleus demands substantial time and money.


□ To address this problem, many recent studies have proposed deep learning models based on weakly supervised learning, in which a single point is marked at the center of each cell nucleus and used as training data. Although this simplifies the traditional way of building a dataset, it has two limitations: because no information about nucleus boundaries is available, such models cannot accurately segment touching or overlapping nuclei, and they learn well only when each point is placed exactly at the center of its nucleus.
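A common way to turn point annotations into a dense training target is to render each point as a Gaussian peak. The sketch below is a generic illustration, not the team’s code: it assumes points are given as (row, col) pixel coordinates, and the function name `points_to_heatmap` and the width `sigma` are hypothetical. The resulting target says where nuclei are but carries no information about where one nucleus ends and the next begins, which is the limitation described above.

```python
import numpy as np

def points_to_heatmap(points, shape, sigma=2.0):
    """Render point annotations (row, col) as a Gaussian center heatmap.

    Hypothetical helper for illustration. Each pixel takes the maximum
    response over all points, so nearby nuclei keep separate peaks instead
    of summing into one blob.
    """
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    heatmap = np.zeros(shape, dtype=float)
    for r, c in points:
        g = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2 * sigma ** 2))
        heatmap = np.maximum(heatmap, g)
    return heatmap

# Two point labels on a small 16 x 20 image.
hm = points_to_heatmap([(5, 5), (5, 12)], shape=(16, 20), sigma=2.0)
print(hm[5, 5], hm.shape)  # 1.0 (16, 20)
```

Taking the per-pixel maximum rather than the sum keeps every peak at exactly 1.0, which is one reasonable design choice; summing would instead inflate the response between close nuclei.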


□ To distinguish cell nuclei more accurately, Professor Park’s research team added an offset module[1] and a center point prediction module to the deep learning model. The team also introduced the expectation-maximization (EM) algorithm[2] so that segmentation works as long as a point lies anywhere inside a nucleus, together with a process that moves uncertain point labels toward the nucleus center. With these additions, the model can segment cell nuclei accurately and identify the boundaries between them.
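The team’s exact EM formulation is not described here, so the following is only a simplified illustration of the idea of moving uncertain point labels toward nucleus centers. It uses a hard-assignment (k-means-style) variant of EM and assumes a binary foreground mask is already available, for example from the model’s own prediction; `refine_points` and its arguments are hypothetical.

```python
import numpy as np

def refine_points(mask, points, n_iter=10):
    """Hard-EM (k-means-style) refinement of point labels (illustrative only).

    mask:   2-D boolean array of (assumed, e.g. predicted) foreground pixels.
    points: (K, 2) initial, possibly off-center (row, col) point labels.
    """
    pix = np.argwhere(mask).astype(float)          # (N, 2) foreground coordinates
    pts = np.asarray(points, dtype=float).copy()
    for _ in range(n_iter):
        # E-step: assign every foreground pixel to its nearest point label.
        d2 = ((pix[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)  # (N, K)
        assign = d2.argmin(axis=1)
        # M-step: move each point to the centroid of the pixels assigned to it.
        new_pts = pts.copy()
        for k in range(len(pts)):
            members = pix[assign == k]
            if len(members) > 0:
                new_pts[k] = members.mean(axis=0)
        if np.allclose(new_pts, pts):
            break
        pts = new_pts
    return pts

mask = np.zeros((20, 30), dtype=bool)
mask[2:9, 2:9] = True      # nucleus A, center (5, 5)
mask[2:9, 14:21] = True    # nucleus B, center (5, 17)
refined = refine_points(mask, [(3, 3), (8, 19)])  # off-center initial labels
print(refined)  # approximately [[5, 5], [5, 17]]
```

A soft EM would replace the nearest-point assignment with probabilistic responsibilities; the hard version is used here only because it is short and easy to follow.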


□ The newly developed deep learning model shows high accuracy. It achieved a Dice similarity coefficient (DSC), a standard measure of segmentation accuracy in medical image analysis, of 75‒78%, and an Aggregated Jaccard Index (AJI), which measures how well adjacent cell nuclei are separated from each other, of 55‒62%. Notably, when the point labels were not placed exactly at the nucleus centers, its AJI was 11‒14% higher than that of conventional techniques.
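To show why the two metrics can disagree, the sketch below computes a pixel-level DSC on binary masks and the AJI on instance label maps, following the commonly used AJI definition (each ground-truth nucleus is matched to the predicted nucleus with the highest IoU, and unmatched predictions are added to the union). The function names are illustrative, and the paper may use a different variant of either metric.

```python
import numpy as np

def dice(pred, gt):
    """Pixel-level Dice similarity coefficient between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0

def aji(pred_inst, gt_inst):
    """Aggregated Jaccard Index for instance label maps (0 = background)."""
    gt_ids = [i for i in np.unique(gt_inst) if i != 0]
    pred_ids = [j for j in np.unique(pred_inst) if j != 0]
    used = set()
    inter_sum, union_sum = 0, 0
    for i in gt_ids:
        g = gt_inst == i
        best_j, best_iou, best_inter, best_union = None, -1.0, 0, g.sum()
        # Match this ground-truth nucleus to the prediction with the highest IoU.
        for j in pred_ids:
            s = pred_inst == j
            inter = np.logical_and(g, s).sum()
            union = np.logical_or(g, s).sum()
            if inter / union > best_iou:
                best_j, best_iou, best_inter, best_union = j, inter / union, inter, union
        inter_sum += best_inter
        union_sum += best_union
        if best_j is not None:
            used.add(best_j)
    # Predicted nuclei never matched to any ground-truth nucleus are penalized.
    for j in pred_ids:
        if j not in used:
            union_sum += (pred_inst == j).sum()
    return inter_sum / union_sum if union_sum else 1.0

gt = np.zeros((8, 12), dtype=int)
gt[2:6, 2:6] = 1
gt[2:6, 6:10] = 2          # two touching nuclei
merged = np.zeros_like(gt)
merged[2:6, 2:10] = 1      # prediction fuses them into a single object
print(dice(merged > 0, gt > 0))  # 1.0
print(aji(merged, gt))           # 0.5
```

In this toy case the foreground is perfect, so the DSC is 1.0, but the two touching nuclei were merged, so the AJI drops to 0.5. This is why AJI is the metric used above to judge how well adjacent nuclei are separated.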


□ Professor Sang-hyun Park of the Department of Robotics and Mechanical Engineering said, “This research allows us to accurately analyze cell nuclei while greatly reducing the time and cost required to build a dataset. By analyzing pathological images, this technology is expected to significantly contribute to diagnosing diseases in patients and predicting their prognosis.”


□ This research was conducted as part of the National Police Agency’s Intelligent Big Data Integrated Platform Development Project for Healthcare Services Personalized for Police Officers and the National Research Foundation of Korea’s New Research Support Project. The results were published in October 2023 at Medical Image Computing and Computer Assisted Intervention (MICCAI), a top international conference in medical image computing.


[1] Offset module: A module that produces a two-dimensional map of the distances between the point labels and the boundaries of cell nuclei along the x- and y-axes, which helps locate the boundaries between cells.
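The footnote does not give the exact definition of the offset map, so the sketch below uses a widely used variant from center-plus-offset instance segmentation, in which every foreground pixel predicts a (dy, dx) offset to the center of its own nucleus. Each pixel then "votes" for a location and is assigned to the nearest predicted center, which separates nuclei that touch in the foreground mask. `group_by_offsets` and the array layout are assumptions; in practice the offsets and centers would come from the network rather than being constructed by hand as in the usage example.

```python
import numpy as np

def group_by_offsets(mask, offsets, centers):
    """Assign foreground pixels to nuclei using per-pixel center offsets.

    mask:    (H, W) boolean foreground map.
    offsets: (2, H, W) array; offsets[:, y, x] is the (dy, dx) from pixel
             (y, x) to the center of its nucleus (assumed convention).
    centers: (K, 2) nucleus centers as (row, col).
    Returns an (H, W) label map: 0 for background, 1..K for nuclei.
    """
    centers = np.asarray(centers, dtype=float)
    ys, xs = np.nonzero(mask)
    # Each pixel votes for the location its offset points to.
    votes = np.stack([ys + offsets[0, ys, xs], xs + offsets[1, ys, xs]], axis=1)
    # Assign each vote to the nearest center.
    d2 = ((votes[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = np.zeros(mask.shape, dtype=int)
    labels[ys, xs] = d2.argmin(axis=1) + 1
    return labels

mask = np.zeros((8, 12), dtype=bool)
mask[2:6, 2:10] = True                       # two touching nuclei form one blob
centers = np.array([(3.5, 3.5), (3.5, 7.5)])
yy, xx = np.mgrid[0:8, 0:12].astype(float)
own = np.where(xx < 6, 0, 1)                 # ideal assignment, built by hand
offsets = np.stack([centers[own, 0] - yy, centers[own, 1] - xx])
labels = group_by_offsets(mask, offsets, centers)
print(labels[3, 5], labels[3, 6])  # 1 2
```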

[2] Expectation-maximization algorithm: An iterative optimization algorithm to estimate the parameters of a probability model that contains missing data or latent variables.
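As a generic, textbook illustration of the definition above (not the team’s model), the sketch below fits a one-dimensional mixture of two Gaussians, where the component that generated each sample is the latent variable. The E-step computes how responsible each component is for each sample, and the M-step re-estimates the mixture weights, means, and standard deviations from those responsibilities. `em_gmm_1d` and the synthetic data are hypothetical.

```python
import numpy as np

def em_gmm_1d(x, mu, sigma, pi, n_iter=50):
    """EM for a 1-D Gaussian mixture; each sample's component is latent."""
    x = np.asarray(x, dtype=float)
    mu, sigma, pi = (np.asarray(a, dtype=float).copy() for a in (mu, sigma, pi))
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample, shape (N, K).
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and standard deviations.
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return mu, sigma, pi

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(5.0, 1.0, 500)])
mu, sigma, pi = em_gmm_1d(x, mu=[1.0, 4.0], sigma=[1.0, 1.0], pi=[0.5, 0.5])
print(np.round(mu, 1))  # means close to 0 and 5
```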