Synthesising Lesion Images with Pixel-level Labels Without Manual Annotations for Medical Image Analysis Model Learning

Project: Research project

Project Details


Background: A lesion is an area of damage or abnormal tissue, such as pathological cells. Lesions provide comprehensive information about the health status of patients, and this information has been extensively used in clinical settings for disease diagnosis, follow-up and treatment planning. To streamline and automate these processes, over the past decade, many deep learning models have been proposed to locate, identify and characterise lesions in medical images. The results are encouraging when substantial amounts of pixel-level labelled medical images are available for model learning. Because the collection and annotation of medical images by medical experts are costly and time-consuming, the automatic generation of annotated lesion images from unlabelled data and the corresponding medical reports, without manually labelled data, has been a desirable goal in the medical domain. This achievement would also benefit the development of lesion-based algorithms and their deployment in practical applications, especially in the context of new or rare diseases. As such, through the proposed project, we will investigate and address the research problems associated with the above-mentioned unmet clinical needs.

State-of-the-art (SOTA) Methods and Their Limitations:
To address the above-mentioned problems, a straightforward solution would be to employ SOTA data augmentation methods, such as diffusion-based and conditional GAN-based methods, to generate annotated lesion images. However, training diffusion-based or GAN-based models itself requires relatively large amounts of annotated data. This problem can be partially solved using domain adaptation (or transfer learning) approaches and pre-trained lesion segmentation models; however, the accuracy and generalisation ability of these algorithms are not as good as those of algorithms fully trained on real annotated data. Another popular approach is to detect lesion region(s) using a pre-trained segmentation algorithm and then refine the regions: a generated annotated image is considered good if its lesion region(s) can be accurately detected. However, it is challenging to accurately segment lesions, particularly small or complex ones. Moreover, annotated lesion images are still required to train the segmentation model in the first place.

Novelty of the Proposed Project:
To achieve the above-mentioned goals, in the proposed project, we will establish a new annotation-free synthesis framework to generate annotated lesion images for medical image analysis model learning. We propose the following three novel ideas for the framework:
- An annotation-free sub-framework and a new strategy combining pseudo-lesion-region segmentation with test-time training to learn a global–local lesion style-based synthesiser
- A new lesion attribute-aware editing sub-framework that aligns attributed images and selects representative synthesised annotated lesion images
- A new clinical knowledge-driven strategy to inject realistic lesions into healthy images, yielding realistic synthesised annotated lesion images
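To illustrate why lesion injection yields pixel-level labels "for free", the sketch below composites a lesion patch into a healthy image: the blending footprint is recorded as the segmentation mask at the same time the image is synthesised. This is a minimal toy example of the general idea, not the project's actual method; the function name, the alpha-blending scheme and the rectangular footprint are all illustrative assumptions.

```python
import numpy as np

def synthesise_annotated_lesion(healthy, lesion_patch, top_left, alpha=0.8):
    """Blend a lesion patch into a healthy image.

    Illustrative sketch only: the region the patch is blended into doubles
    as the pixel-level label, so no manual annotation step is needed.
    Returns (synthesised image, binary mask).
    """
    image = healthy.astype(np.float32).copy()
    mask = np.zeros(healthy.shape, dtype=np.uint8)
    r, c = top_left
    h, w = lesion_patch.shape
    # alpha-blend the patch into the target region
    region = image[r:r + h, c:c + w]
    image[r:r + h, c:c + w] = (1 - alpha) * region + alpha * lesion_patch
    # the blended footprint is the pixel-level annotation
    mask[r:r + h, c:c + w] = 1
    return image.astype(healthy.dtype), mask

# toy example: a 64x64 greyscale "healthy" image and a small bright "lesion"
healthy = np.zeros((64, 64), dtype=np.uint8)
lesion = np.full((8, 8), 200, dtype=np.uint8)
img, mask = synthesise_annotated_lesion(healthy, lesion, top_left=(10, 10))
```

In the actual framework, realism would come from the clinical knowledge-driven placement and the learned style-based synthesiser rather than simple alpha blending, but the principle is the same: because the synthesiser controls where the lesion is placed, the mask is produced as a by-product of generation.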

Research Impact:
To the best of our knowledge, this will be the first investigation of annotated lesion image synthesiser learning without manually labelled data. The results of the proposed project will enable the automatic generation of annotated lesion image datasets for medical image analysis model learning. The availability of larger, more balanced and more diverse annotated lesion image datasets will enhance the robustness and generalisation of the learned models. Moreover, the synthesised lesion images will not expose any individual patient's health information (the abnormal regions in their medical images), and the synthesised datasets will not contain any original patient images. The results of the proposed project will also promote the idea of shareable synthetic medical data.
Status: Not started
Effective start/end date: 1/01/25 – 31/12/27
