A Large-Scale Multi-Centre Study on Domain Generalisation in Artificial Intelligence-Based Mass Detection in Mammography: A Review

Debopriya Ghosh1 , Ekta Ghosh2 , Debdutta3

1Department of Physiology, University College of Medical Sciences, New Delhi, India

2Department of Paediatrics, All India Institute of Medical Sciences, New Delhi, India

3Department of Computer Science, IIT BHU, India

Corresponding Author Email: d3bopr1ya@gmail.com

DOI: https://doi.org/10.5281/zenodo.7008773

Abstract

In 2020, breast cancer surpassed lung cancer for the first time as the most prevalent cancer globally. It accounts for nearly 30% of all malignancies in women, and its prevalence continues to rise according to current trends. X-ray mammography, the gold-standard imaging method for early detection in screening programmes, can reveal abnormalities in breast structures that are indicative of breast cancer. These abnormalities may take the form of masses, calcifications, architectural distortions, or asymmetries in the breast. However, breast cancer screening carries a high rate of false positives, which can lead to unnecessary biopsies, as well as a high rate of false negatives, or missed malignancies. A recent, substantial study that used the interpretations of 101 radiologists to evaluate the effectiveness of an artificial intelligence (AI) system concluded that, in a retrospective setting, the stand-alone AI had cancer detection accuracy comparable to that of a typical radiologist. Stand-alone AI solutions for breast cancer screening have been studied in several clinical contexts and demonstrated cancer prediction rates comparable to, or better than, a human expert double-reader strategy. Contrary to these conclusions, related research that evaluated AI algorithms across institutions has reported more variable performance.

Keywords

Breast cancer, Domain Generalization, Domain Synthesis, Image Standardization, Learning Techniques, Mammography, Transformer-Based Detection


Introduction

DOMAIN GENERALIZATION IN MEDICAL IMAGING

Samala et al. [1] investigated the classification error of a deep convolutional neural network that distinguishes benign from malignant tumours in mammograms. By varying the percentage of corrupted labels in the training data, they aimed to balance the network's capacity for learning and memorization. They concluded that training with noisy data, i.e., around 10% of corrupted labels, might reduce generalization error and enhance transfer learning performance. The inconsistent performance of deep learning models in mammography classification was described by Wang et al. in 2020. Six deep learning architectures were evaluated using a total of four datasets from various patient groups. The results demonstrated that, independent of the model architecture, training strategy, or data labelling approach, the excellent performance attained on the training dataset did not carry over to unseen external datasets. Recently, [2] used a contrastive learning approach to study DG in lesion detection across different vendors and to extract domain-invariant features. The technique demonstrated excellent generalizability when tested on two unseen vendors after training on mammograms from three different vendors. However, no statistical significance tests were performed to confirm the improvement; mean average precision (mAP) was the only assessment metric used for the comparison with state-of-the-art generalization approaches. In the context of cross-institutional domain shift in other medical imaging fields, such as chest X-rays, prior research has found variable generalization performance of deep learning models [3]. The generalization abilities of chest X-ray prediction models were examined by Cohen et al. [4] using training and testing datasets from various institutions.
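
To make the label-noise experiment concrete, the sketch below flips a chosen fraction of binary training labels, in the spirit of the corrupted-label setup described above; the function name and the 0/1 encoding are illustrative assumptions, not the cited authors' code.

```python
# A minimal sketch of a label-corruption experiment: a chosen fraction of
# binary labels (0 = benign, 1 = malignant, an assumed encoding) is flipped
# before training, so the effect of noisy labels on generalization error
# can be measured.
import random

def corrupt_labels(labels, fraction=0.10, seed=42):
    """Return a copy of binary labels with `fraction` of them flipped."""
    rng = random.Random(seed)
    corrupted = list(labels)
    n_flip = int(len(corrupted) * fraction)
    for idx in rng.sample(range(len(corrupted)), n_flip):
        corrupted[idx] = 1 - corrupted[idx]  # flip benign <-> malignant
    return corrupted

# Example: flip 10% of 1,000 training labels, as in the study design.
train_labels = [random.randint(0, 1) for _ in range(1000)]
noisy_labels = corrupt_labels(train_labels, fraction=0.10)
```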

It was found that the shift in the labels had a far greater impact on the generalization error than the domain shift in the images. That paper also discusses the distinction between the covariate shift present in the data of each domain and the domain shift induced by the varied image acquisition procedures. The effectiveness of eight domain generalization strategies was recently benchmarked by Zhang et al. [5] using chest X-ray images and multi-site clinical time-series datasets. On the chest X-ray imaging data, none of the DG approaches managed to significantly improve OOD performance. In contrast to our work, they did not apply intensity scale standardization or any of the other single-source domain generalization (SSDG) strategies that we used in this investigation. Additionally, a single classification architecture, DenseNet-121, was trained on a very small sample of images. The largest study to date on the effects of domain shift in deep learning models trained on MR images [27] assessed the reliability of a deep learning model on clinical OOD data for magnetic resonance imaging (MRI).
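
The intensity scale standardization mentioned above can take several forms; a minimal sketch of one common variant, percentile-based clipping and rescaling, is given below. The percentile bounds and the [0, 1] output range are assumptions rather than the exact configuration of any cited study.

```python
# A minimal sketch of percentile-based intensity scale standardization:
# clip extreme grey levels and rescale, so images from different scanners
# share a comparable intensity range before training.
import numpy as np

def standardize_intensity(image, p_low=1.0, p_high=99.0):
    """Clip an image to its [p_low, p_high] percentiles, rescale to [0, 1]."""
    lo, hi = np.percentile(image, [p_low, p_high])
    image = np.clip(image.astype(np.float32), lo, hi)
    return (image - lo) / max(hi - lo, 1e-8)

# Example: a synthetic 12-bit array with mammogram-like dimensions.
mammo = np.random.randint(0, 4096, size=(3328, 4084), dtype=np.uint16)
standardized = standardize_intensity(mammo)
```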

The results showed that performance on OOD data was improved by training with more heterogeneous data from a greater variety of scanners and protocols. In contrast, the focus of our work is on how to strengthen models when data from other institutions, or domains, is not readily available. Also in MRI, the causality-inspired data augmentation methodology for single-source domain generalization in medical image segmentation proposed in [6] was compared to various SSDG approaches, outperforming them all. BigAug, a heavy stacked-augmentation approach, achieves performance similar to the two state-of-the-art algorithms in four distinct, previously unseen domains. Finally, Thagaard et al. [7] and Stacke et al. [8] have explored the effect of domain shift on deep learning in digital pathology and histopathology.
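
As a rough illustration of a BigAug-style stacked augmentation pipeline for single-source DG, the sketch below chains several randomized appearance and spatial transforms in torchvision; the specific transforms and parameter ranges are assumptions, not the published BigAug recipe.

```python
# A minimal sketch of a stacked augmentation pipeline: each training image
# passes through randomized image-quality, appearance, and spatial
# transforms to simulate unseen acquisition conditions.
import torchvision.transforms as T

bigaug_like = T.Compose([
    T.RandomApply([T.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0))], p=0.5),
    T.ColorJitter(brightness=0.3, contrast=0.3),   # appearance shifts
    T.RandomAffine(degrees=10, scale=(0.9, 1.1)),  # spatial shifts
    T.RandomHorizontalFlip(p=0.5),
    T.ToTensor(),
])

# Usage (hypothetical): augmented_tensor = bigaug_like(pil_image)
```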

Deep learning has been widely addressed in the literature for the identification and classification of masses in mammograms [9]. A wide variety of deep learning models have been developed to help radiologists in mammography screening. The majority of approaches in the literature only report their performance in the training domain; transfer learning is applied later to adapt the model to other domains. Instead, we would like to investigate how well models trained in a single-source setting perform both with and without DG methods, and to determine the degree to which such models can generalise to unseen domains without the aid of transfer learning. Additionally, we compare the best single-source DG model with transfer learning across five different domains.
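
The single-source evaluation protocol described above can be summarised in a few lines. In this sketch, `train_detector` and `evaluate_map` are hypothetical placeholders standing in for a full detection pipeline, and the domain names mirror the datasets used in this review.

```python
# A minimal sketch of single-source generalization evaluation: train once
# on one source domain and report detection mAP on every unseen domain,
# with no transfer learning step in between.
DOMAINS = ["Hologic", "Siemens", "GE", "Philips", "INbreast", "BCDR"]

def single_source_generalization(source, datasets, train_detector, evaluate_map):
    """Train on `datasets[source]` only; return mAP per unseen domain."""
    model = train_detector(datasets[source])   # single-source training
    return {
        domain: evaluate_map(model, datasets[domain])
        for domain in DOMAINS
        if domain != source                    # skip the training domain
    }
```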

Furthermore, current proposals in the literature use only well-known Convolutional Neural Network (CNN) architectures, such as Faster R-CNN [10, 11], whereas the more recent Transformer-based detection models [12] are still not extensively studied. We also incorporate these cutting-edge Transformer-based detection models in our study and compare their generalizability to that of conventional CNN detection techniques.
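
For orientation, the sketch below instantiates one representative model from each detector family compared here: a CNN-based Faster R-CNN from torchvision and a Transformer-based DETR loaded via torch.hub. The two-class setup (background plus mass) is an assumption, and neither model is the exact configuration evaluated in the cited studies.

```python
# A minimal sketch contrasting the two detector families: a CNN-based
# Faster R-CNN and a Transformer-based DETR.
import torch
import torchvision

# CNN-based detector (Faster R-CNN, ResNet-50 FPN backbone); the `weights`
# keyword follows the torchvision >= 0.13 signature.
cnn_detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=None, num_classes=2  # background + mass (assumed setup)
)

# Transformer-based detector (DETR, ResNet-50 backbone); fetching the model
# definition from GitHub requires network access.
detr_detector = torch.hub.load(
    "facebookresearch/detr", "detr_resnet50", pretrained=False
)
```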

ROBUSTNESS OF TRANSFORMER-BASED ARCHITECTURES

The OOD robustness of Transformer designs has been analysed in recent articles [13], as Transformers have become more widely used in computer vision tasks [14], largely following the release of Vision Transformers (ViT). The majority of these studies find that the self-attention mechanisms in particular, and the absence of the strong inductive biases introduced by convolutions, make Transformers outperform CNNs in terms of OOD robustness, due to the inherent properties of Transformers. As an illustration, Zhang et al. (2021a) found that a Transformer-based model, DeiT [15], performed better than a single variant of the well-known Big Transfer (BiT) CNN-based model. They demonstrated this using the most widely used ImageNet distribution-shift datasets [16]. However, a more extensive analysis that considered the most relevant BiT and ViT variants concluded that Transformers are not more robust than CNN models, but rather better calibrated.

Transformers' improved robustness has mostly been attributed to architectural components of their design, such as the self-attention mechanism and the lack of strong inductive biases. However, one study demonstrated that the influence of pre-training is more important than the presence of self-attention, by showing that a comparably pre-trained CNN can outperform Transformers. In conclusion, little is known about why self-attention mechanisms create better representations in specific settings and how different pre-training methods dramatically influence the downstream task. In this study, we compare the robustness of CNN-based and Transformer-based object detection architectures that were pre-trained on large datasets and specifically fine-tuned for mass detection on a medium-sized digital mammography dataset (2,864 mammograms included in the training). Transfer learning has been used in mammography breast cancer diagnosis to adapt models to new domains, such as new scanners and imaging techniques [17]. The two main constraints on transfer learning in medical imaging are data accessibility and catastrophic forgetting. Catastrophic forgetting is a property of artificial neural networks trained sequentially on numerous tasks, whereby the model abruptly forgets previously learned knowledge upon learning new information.
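
A minimal sketch of how catastrophic forgetting might be quantified in this setting follows: the source domain is evaluated before and after fine-tuning on the target domain, and the drop in mAP is reported. `fine_tune` and `evaluate_map` are hypothetical placeholders, not an API from any cited work.

```python
# A minimal sketch of measuring catastrophic forgetting under transfer
# learning: fine-tune on the target domain, then check how much source-
# domain performance was lost.
def forgetting_after_transfer(model, fine_tune, evaluate_map,
                              source_test, target_train, target_test):
    source_before = evaluate_map(model, source_test)  # pre-transfer baseline
    fine_tune(model, target_train)                    # transfer learning step
    return {
        "target_mAP": evaluate_map(model, target_test),
        "source_mAP_drop": source_before - evaluate_map(model, source_test),
    }
```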

DATASETS FOR FULL-FIELD DIGITAL MAMMOGRAPHY

There are several open-access X-ray mammography archives listed in the literature [18]. The three FFDM datasets employed in this study to assess the robustness and generalisation of the selected approaches on various domains are the OPTIMAM dataset, a subset of the OMI-DB [19], INbreast [20], and the Breast Cancer Digital Repository (BCDR) [21].

A subset of OPTIMAM containing 3,500 malignant and 500 benign patients was used in this investigation. Each case in the dataset may contain several studies from the same subject, so the training, validation, and test sets were carefully split by case rather than by study. The two most common views of each breast, the medio-lateral oblique (MLO) and cranio-caudal (CC) views, were used as independent inputs. The mammography image matrix is either 3328 x 4084 or 2560 x 3328 pixels, depending on the manufacturer and the compression plate used in the acquisition. OPTIMAM meets the requirements for multi-centre and multi-scanner research because it includes screenings from three separate centres and different scanner manufacturers. Only cases from the various scanner manufacturers that had annotated masses were selected. The four distinct domains generated from OPTIMAM by dividing the cases by scanner vendor were Hologic Inc., Siemens, GE, and Philips.

The INbreast mammograms were collected using a Siemens MammoNovation FFDM system at a single Portuguese institution. Images are supplied in DICOM format with a matrix of 3328 x 4084 or 2560 x 3328 pixels, depending on the compression plate used during acquisition. This searchable database contains 115 cases in total, including masses, calcifications, asymmetries, and distortions. Only 50 of the 115 cases, comprising 116 annotations altogether, contain masses. The majority of INbreast lesions lack biopsy confirmation; hence the malignancy of a tumour is determined using the BI-RADS assessment categories [22]. Masses with BI-RADS 2 and 3 are typically classified as benign, while those with BI-RADS 4, 5, and 6 are classified as malignant.

The Breast Cancer Digital Repository (BCDR) [23], a dataset freely released in 2012, is still easily accessible today upon request. The data package contains both digital (BCDR-DM) and film (BCDR-FM) mammograms. In the BCDR-DM dataset, 90 patients in total have biopsy-proven, mass-labelled lesions. Every image was provided by the Centro Hospitalar São João (FMUP-HSJ) of the University of Porto's Faculty of Medicine and was acquired using a Siemens MammoNovation FFDM scanner. The image matrix is either 3328 x 4084 or 2560 x 3328 pixels, depending on the compression plate used in the acquisition, and images are only available in 8-bit-depth TIFF format. BCDR is used as a single domain for the purposes of this investigation.
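
Two data-handling details described above, splitting by patient so that studies of the same subject never straddle the train/test boundary, and mapping BI-RADS categories to benign or malignant labels, could be implemented roughly as in the sketch below; the column names and toy data are illustrative assumptions.

```python
# A minimal sketch of patient-level splitting and BI-RADS label mapping.
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

def birads_to_label(birads):
    """BI-RADS 2-3 -> benign (0); BI-RADS 4-6 -> malignant (1)."""
    return 0 if birads in (2, 3) else 1

# Toy metadata table; real OPTIMAM/INbreast fields will differ.
df = pd.DataFrame({
    "patient_id": ["p1", "p1", "p2", "p3", "p3", "p4"],
    "study_id":   ["s1", "s2", "s3", "s4", "s5", "s6"],
    "birads":     [2, 3, 5, 4, 4, 6],
})
df["label"] = df["birads"].map(birads_to_label)

# Group by patient: all studies of a patient stay on the same side.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(df, groups=df["patient_id"]))
train_df, test_df = df.iloc[train_idx], df.iloc[test_idx]
```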

DOMAIN SHIFT IN MAMMOGRAPHY

It is well recognized that one of the main causes of domain shift is the use of different scanner manufacturers and image acquisition techniques. The most noticeable differences between domains are the variations in intensity values and in the contrast between the adipose areas of the breast and the fibroglandular tissues. On top of this acquisition shift, the different data distributions among datasets introduce additional covariate shift for medical imaging datasets. Due to the lack of available data and privacy restrictions, covariate shift is difficult to avoid. Masses or nodules may develop anywhere in the breast, with various sizes and shapes that may appear either benign or malignant. Additional factors, such as breast density, can make detection more challenging. In breasts with high density, the thick tissue (parenchyma) is more likely to occlude masses and other breast tumours, or even to mimic them. Because of this, the overall sensitivity of mammography for detecting breast cancer in dense breasts is reduced by more than 20% [24,25], even though women with high breast density have a four- to six-fold higher breast cancer risk compared with women with low breast density [26].
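
One simple way to inspect the acquisition shift described above is to compare average grey-level histograms across scanner vendors; the sketch below does this on synthetic stand-in images, so the numbers are illustrative only.

```python
# A minimal sketch of inspecting acquisition-related domain shift by
# comparing normalized intensity histograms of two "vendors".
import numpy as np

def intensity_histogram(images, bins=64):
    """Average normalized histogram over a list of 2-D image arrays."""
    hists = [np.histogram(img, bins=bins, range=(0.0, 1.0), density=True)[0]
             for img in images]
    return np.mean(hists, axis=0)

# Synthetic stand-ins: one vendor brighter and lower-contrast than the other.
vendor_a = [np.clip(np.random.normal(0.4, 0.10, (256, 256)), 0, 1)
            for _ in range(5)]
vendor_b = [np.clip(np.random.normal(0.6, 0.20, (256, 256)), 0, 1)
            for _ in range(5)]
shift = np.abs(intensity_histogram(vendor_a)
               - intensity_histogram(vendor_b)).sum()
```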

CONCLUSION

We studied a number of mammography mass detection methods across six different domains. Our experimental results showed that Transformer-based detection models were more robust to domain shift. Additionally, we emphasised the value of using SSDG strategies to lessen domain shift and enhance performance in unseen clinical settings. The proposed training pipeline reduced the evident domain shift in four of the five domains not seen during training. The findings showed that in one domain, the domain shift brought about by the acquisition pipeline was outweighed by the dataset shift brought about by a larger proportion of small masses. In addition, we found that transfer learning improved performance in one domain but degraded performance in others. Transfer learning is an effective method for reducing dataset shift, but as the findings demonstrate, it is not always effective and must be used with caution to prevent catastrophic forgetting. We further believe that continual learning for AI in breast cancer detection should be the focus of future research. Both in federated and distributed settings, continual learning holds a great deal of promise for helping CADe systems avoid problems such as catastrophic forgetting, dataset shift over time, and demographic biases.

Conflict of Interest: The authors declare that there is no conflict of interest.

Funding Information: No funding was received for this article, as it is a review article.

References

  1. Li, Z., Cui, Z., Wang, S., Qi, Y., Ouyang, X., Chen, Q., Yang, Y., Xue, Z., Shen, D., Cheng, J.Z., 2021. Domain Generalization for Mammography Detection via Multi-style and Multi-view Contrastive Learning, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer. pp. 98–108.
  2. Zech, J.R., Badgeley, M.A., Liu, M., Costa, A.B., Titano, J.J., Oermann, E.K., 2018. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Medicine 15, e1002683.
  3. Cohen, J.P., Hashir, M., Brooks, R., Bertrand, H., 2020. On the limits of cross-domain generalization in automated X-ray prediction, in: Medical Imaging with Deep Learning, PMLR. pp. 136–155.
  4. Zhang, C., Zhang, M., Zhang, S., Jin, D., Zhou, Q., Cai, Z., Zhao, H., Yi, S., Liu, X., Liu, Z., 2021a. Delving Deep into the Generalization of Vision Transformers under Distribution Shifts. arXiv preprint arXiv:2106.07617.
  5. Zhang, H., Dullerud, N., Seyyed-Kalantari, L., Morris, Q., Joshi, S., Ghassemi, M., 2021b. An empirical framework for domain generalization in clinical settings, in: Proceedings of the Conference on Health, Inference, and Learning, pp. 279–290.
  6. Zhang, H., Wang, Y., Dayoub, F., Sünderhauf, N., 2020a. VarifocalNet: An IoU-aware Dense Object Detector. arXiv preprint arXiv:2008.13367.
  7. Zhang, L., Wang, X., Yang, D., Sanford, T., Harmon, S., Turkbey, B., Wood, B.J., Roth, H., Myronenko, A., Xu, D., et al., 2020b. Generalizing deep learning for medical image segmentation to unseen domains via deep stacked transformation. IEEE Transactions on Medical Imaging 39, 2531–2540.
  8. Mårtensson, G., Ferreira, D., Granberg, T., Cavallin, L., Oppedal, K., Padovani, A., Rektorova, I., Bonanni, L., Pardini, M., Kramberger, M.G., et al., 2020. The reliability of a deep learning model in clinical out-of-distribution MRI data: a multicohort study. Medical Image Analysis 66, 101714.
  9. Ouyang, C., Chen, C., Li, S., Li, Z., Qin, C., Bai, W., Rueckert, D., 2021. Causality-inspired Single-source Domain Generalization for Medical Image Segmentation. arXiv preprint arXiv:2111.12525.
  10. Thagaard, J., Hauberg, S., van der Vegt, B., Ebstrup, T., Hansen, J.D., Dahl, A.B., 2020. Can you trust predictive uncertainty under real dataset shifts in digital pathology?, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer. pp. 824–833.
  11. Stacke, K., Eilertsen, G., Unger, J., Lundström, C., 2019. A closer look at domain shift for deep learning in histopathology. arXiv preprint arXiv:1909.11575.
  12. Stacke, K., Eilertsen, G., Unger, J., Lundström, C., 2020. Measuring domain shift for deep learning in histopathology. IEEE Journal of Biomedical and Health Informatics 25, 325–336.
  13. Abdelrahman, L., Al Ghamdi, M., Collado-Mesa, F., Abdel-Mottaleb, M., 2021. Convolutional neural networks for breast cancer detection in mammography: A survey. Computers in Biology and Medicine, 104248.
  14. Redmon, J., Divvala, S., Girshick, R., Farhadi, A., 2016. You only look once: Unified, real-time object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788.
  15. Ren, S., He, K., Girshick, R., Sun, J., 2016. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 1137–1149.
  16. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S., 2020. End-to-End Object Detection with Transformers, in: ECCV.
  17. Fort, S., Ren, J., Lakshminarayanan, B., 2021. Exploring the Limits of Out-of-Distribution Detection. arXiv preprint arXiv:2106.03004.
  18. Bai, Y., Mei, J., Yuille, A.L., Xie, C., 2021. Are Transformers more robust than CNNs? Advances in Neural Information Processing Systems 34.
  19. Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jégou, H., 2021. Training data-efficient image transformers & distillation through attention, in: International Conference on Machine Learning, PMLR. pp. 10347–10357.
  20. Ribli, D., Horváth, A., Unger, Z., Pollner, P., Csabai, I., 2018. Detecting and classifying lesions in mammograms with deep learning. Scientific Reports 8, 1–7.
  21. Kolesnikov, A., Beyer, L., Zhai, X., Puigcerver, J., Yung, J., Gelly, S., Houlsby, N., 2020. Big Transfer (BiT): General visual representation learning, in: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, Springer. pp. 491–507.
  22. Diaz, O., Kushibar, K., Osuala, R., Linardos, A., Garrucho, L., Igual, L., Radeva, P., Prior, F., Gkontra, P., Lekadir, K., 2021. Data preparation for artificial intelligence in medical imaging: a comprehensive guide to open-access platforms and tools. Physica Medica 83, 25–37.
  23. Halling-Brown, M.D., Warren, L.M., Ward, D., Lewis, E., Mackenzie, A., Wallis, M.G., Wilkinson, L.S., Given-Wilson, R.M., McAvinchey, R., Young, K.C., 2021. OPTIMAM Mammography Image Database: A Large-Scale Resource of Mammography Images and Clinical Data. Radiology: Artificial Intelligence 3.
  24. Moreira, I.C., Amaral, I., Domingues, I., Cardoso, A., Cardoso, M.J., Cardoso, J.S., 2012. INbreast: toward a full-field digital mammographic database. Academic Radiology 19, 236–248.
  25. Moura, D.C., López, M.A.G., Cunha, P., de Posada, N.G., Pollan, R.R., Ramos, I., Loureiro, J.P., Moreira, I.C., de Araújo, B.M.F., Fernandes, T.C., 2013. Benchmarking datasets for breast cancer computer-aided diagnosis (CADx), in: Iberoamerican Congress on Pattern Recognition, Springer. pp. 326–333; Nemenyi, P.B., 1963. Distribution-free multiple comparisons. Princeton University.
  26. Orel, S.G., Kay, N., Reynolds, C., Sullivan, D.C., 1999. BI-RADS categorization as a predictor of malignancy. Radiology 211, 845–850.
  27. Bien, N., Rajpurkar, P., Ball, R.L., Irvin, J., Park, A., Jones, E., Lungren, M.P., 2018. Deep-learning-assisted diagnosis for knee magnetic resonance imaging: development and retrospective validation of MRNet. PLoS Medicine 15(11), e1002699.