1. Introduction

1.1. Deep Learning and the Benefits of Synthetic Data

The use of deep learning has increased extensively in the last decade, thanks in part to advances in computing technology (e.g., data storage, graphics processing units) and the digitization of data. In medical imaging, deep learning algorithms have shown promising potential for clinical use due to their capability of extracting and learning meaningful patterns from imaging data and their high performance on clinically-relevant tasks. These include image-based disease diagnosis1,2 and detection,3 as well as medical image reconstruction,4,5 segmentation,6 and image-based treatment planning.7–9 However, deep learning models need vast amounts of well-annotated data to reliably learn to perform clinical tasks, whereas, at the same time, the availability of public medical imaging datasets remains limited due to legal, ethical, and technical patient data sharing constraints.9,10 In the common scenario of limited imaging data, synthetic images, such as the ones illustrated in Fig. 1, are a useful tool to improve the learning of the artificial intelligence (AI) algorithm, e.g., by enlarging its training dataset.7,11,12 Furthermore, synthetic data can be used to minimize problems associated with domain shift, data scarcity, class imbalance, and data privacy.7 For instance, a dataset can be balanced by populating the less frequent classes with synthetic data during training (class imbalance). Further, as a domain-adaptation technique, a dataset can be translated from one domain to another, e.g., from MRI to CT13 (domain shift). Regarding data privacy, synthetic data can be shared instead of real patient data to improve privacy preservation.7,14,15

1.2. The Need for Reusable Synthetic Data Generators

Commonly, generative models are used to produce synthetic imaging data, with generative adversarial networks (GANs)16 being popular models of choice. However, the adversarial training scheme required by GANs and related networks is known to pose challenges in regard to (i) achieving training stability, (ii) avoiding mode collapse, and (iii) reaching convergence.17–19 Hence, the training process of GANs, and of generative models at large, is nontrivial and requires a considerable time investment for each training iteration as well as specific hardware and a fair amount of knowledge and skills in the area of AI and generative modeling. Given these constraints, researchers and engineers often refrain from generating and integrating synthetic data into their AI training pipelines and experiments. This issue is further exacerbated by the prevailing need of training a new generative model for each new data distribution, which, in practice, often means that a new generative model has to be trained for each new application, use-case, and dataset.

1.3. Community-Driven Model Sharing and Reuse

We argue that a feasible solution to this problem is the community-wide sharing and reuse of pretrained generative models. Once successfully trained, such a model can be of value to multiple researchers and engineers with similar needs. For example, researchers can reuse the same model if they work on the same problem, conduct similar experiments, or evaluate their methods on the same dataset. We note that such reuse should ideally be preceded by an inspection of the generative model's limitations and by a verification that the model's output quality is suitable for the task at hand.
The quality of a model's output data and annotations can commonly be measured via (a) expert assessment, (b) computation of image quality metrics, or (c) downstream task evaluation. In sum, the problem of synthetic data generation calls for a community-driven solution, where a generative model trained by one member of the community can be reused by other members of the community. Motivated by the absence of such a community-driven solution for synthetic medical data generation, we designed and developed medigan to bridge the gap between the need for synthetic data and complex generative model creation and training processes.

2. Background and Related Work

2.1. Generative Models

While discriminative models are able to distinguish between data instances of different kinds (label samples), generative models are able to generate new data instances (draw samples). In contrast to modeling decision boundaries in a data space, generative models model how data is distributed within that space. Deep generative models20 are composed of multihidden layer neural networks to explicitly or implicitly estimate a probability density function (PDF) from a set of real data samples. After approximating the PDF from observed data points (i.e., learning the real data distribution), these models can then sample unobserved new data points from that distribution. In computer vision and medical imaging, synthetic images are generated by sampling such unobserved points from high-dimensional imaging data distributions. Popular deep generative models to create synthetic images in these fields include variational autoencoders,21 normalizing flows,22–24 diffusion models,25–27 and GANs.16 From these, the versatile GAN framework has seen the most widespread adoption in medical imaging to date.7 We, hence, center our attention on GANs in the remainder of this work but emphasize that contributions of other types of generative models are equally welcome in the medigan library.

2.2. Generative Adversarial Networks

The training of GANs comprises two neural networks, the generator network (G) and the discriminator network (D), as illustrated by Fig. 2 for the example of mammography region-of-interest patch generation. G and D compete against each other in a two-player zero-sum game defined by the value function shown in Eq. (1),

min_G max_D V(D, G) = E_{x∼p_data}[log D(x)] + E_{z∼p_z}[log(1 − D(G(z)))].   (1)

Subsequent studies extended the adversarial learning scheme by proposing innovations to the loss function and the network architectures, and by extending GAN applications through the introduction of conditions into the image generation process.

2.2.1. GAN loss functions

Goodfellow et al.16 define the discriminator as a binary classifier classifying whether a sample is either real or generated. The discriminator is, hence, trained via binary cross-entropy with the objective of minimizing the adversarial loss function shown in Eq. (2), which the generator, on the other hand, tries to maximize,

L_adv(D, G) = −E_{x∼p_data}[log D(x)] − E_{z∼p_z}[log(1 − D(G(z)))].   (2)

In Wasserstein GAN (WGAN),28 the adversarial loss function is replaced with a loss function based on the Wasserstein-1 distance between real and fake sample distributions estimated by D (alias "critic"). Gulrajani et al.29 resolve the need to enforce a 1-Lipschitz constraint in WGAN via a gradient penalty (WGAN-GP) instead of WGAN weight clipping. Equation (3) depicts the WGAN-GP discriminator loss with penalty coefficient λ and distribution p_x̂ based on sampled pairs from (a) the real data distribution p_r and (b) the generated data distribution p_g,

L_D = E_{x̃∼p_g}[D(x̃)] − E_{x∼p_r}[D(x)] + λ E_{x̂∼p_x̂}[(‖∇_x̂ D(x̂)‖_2 − 1)^2].   (3)

In addition to changes to the adversarial loss, further studies integrate additional loss terms into the GAN framework.
For instance, FastGAN30 uses an additional reconstruction loss in the discriminator, which, for improved regularisation, is trained as a self-supervised feature encoder.

2.2.2. GAN network architectures and conditions

A plethora of different GAN network architectures has been proposed,7,31 starting with the deep convolutional GAN (DCGAN)32 neural network architecture for both D and G. Later approaches, e.g., include a ResNet-based architecture as backbone29 and progressively grow the generator and discriminator networks during training to enable high-resolution image synthesis (PGGAN).33 Another line of research has been focusing on conditioning the output of GANs based on discrete or continuous labels. For example, in cGAN this is achieved by feeding a label to both D and G,34 whereas in the auxiliary classifier GAN (AC-GAN), the discriminator additionally predicts the label that is provided to the generator.35 Other models condition the generation process on input images,36–40 unlocking image-to-image translation and domain-adaptation GAN applications. A key difference in image-to-image translation methodology is the presence (paired translation) or absence (unpaired translation) of corresponding image pairs in the target and source domain. Using an L1 reconstruction loss between target and source domain alongside the adversarial loss from Eq. (2), pix2pix36 defines a common baseline model for paired image-to-image translation. For unpaired translation, cycleGAN37 is a popular approach, which also consists of an L1 reconstruction (cycle-consistency) loss between a source (target) image and a source (target) image translated to target (source) and back to source (target) via two consecutive generators. A further methodological innovation includes SinGAN,41 which, based on only a single training image, learns to generate multiple synthetic images. This is accomplished via a multi-scale coarse-to-fine pipeline of generators, where a sample is passed sequentially through all generators, each of which also receives a random noise vector as input.

2.3. Generative Model Evaluation

One approach to evaluating generative models is human expert assessment of their generated synthetic data. In medical imaging, such observer studies often enlist board-certified clinical experts such as radiologists or pathologists to examine the quality and/or realism of the synthetic medical images.42,43 However, this approach is manual, laborious, and costly, and, hence, research attention has been devoted to automating generative model evaluation,44,45 including (i) metrics computed on extracted image features, such as the inception score (IS)17 and the Fréchet inception distance (FID),46 (ii) metrics that require specific reference images (e.g., full-reference image quality metrics), and (iii) downstream task performance evaluation. The FID is based on a pretrained Inception47 model (e.g., v1,48 v347) to extract features from synthetic and real datasets, which are then fitted to multivariate Gaussians N(μ_r, Σ_r) (real) and N(μ_s, Σ_s) (synthetic) with means μ_r and μ_s and covariance matrices Σ_r and Σ_s. Next, the two Gaussians are compared via the Wasserstein-2 (Fréchet) distance (FD), as follows:

FID = ‖μ_r − μ_s‖_2^2 + Tr(Σ_r + Σ_s − 2(Σ_r Σ_s)^{1/2}).
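As a minimal illustration of this computation (not medigan's own implementation, which is linked in Sec. 4.2.1), the FD between Gaussian fits of two precomputed feature arrays can be sketched in a few lines of NumPy/SciPy. The function name frechet_distance and the assumption that Inception features have already been extracted are ours:

```python
import numpy as np
from scipy import linalg


def frechet_distance(feats_real, feats_syn):
    """Fréchet distance between Gaussians fitted to two (N x D) feature arrays."""
    mu_r, mu_s = feats_real.mean(axis=0), feats_syn.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_s = np.cov(feats_syn, rowvar=False)
    covmean = linalg.sqrtm(sigma_r @ sigma_s)  # matrix square root of the covariance product
    if np.iscomplexobj(covmean):               # discard tiny imaginary parts from numerical noise
        covmean = covmean.real
    diff = mu_r - mu_s
    return float(diff @ diff + np.trace(sigma_r + sigma_s - 2.0 * covmean))
```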
For the analysis of generative models in the present study, we discard (ii) due to its limitation of requiring specific reference images. We further deprioritize the IS from (i) due to its limited applicability to medical imagery, which stems from it missing a comparison between real and synthetic data distributions combined with its strong bias toward natural images via its ImageNet52-pretrained Inception classifier as backbone feature extractor. Therefore, we focus on FID from (i) and downstream task performance (iii) as potential evaluation measures for medical image synthesis models in the remainder of this work.

2.4. Image Synthesis Tools and Libraries

Related libraries, such as pygan,53 torchGAN,54 vegans,55 imaginaire,56 TF-GAN,57 PyTorch-GAN,58 keras-GAN,59 mimicry,60 and studioGAN,31 have focused on facilitating the implementation, training, and comparative evaluation of GANs in computer vision (CV). Despite a strong focus on language models, the HuggingFace transformers library and model hub61 also contain a few pretrained computer vision GAN models. The GAN Lab62 provides an interactive visual experimentation tool to examine the training process and its data flows in GANs. Specific to AI in medical imaging, Diaz et al.63 provided a comprehensive survey of tools, libraries, and platforms for privacy preservation, data curation, medical image storage, annotation, and repositories. Compared to CV, fewer GAN and AI libraries and tools exist in medical imaging. Furthermore, CV libraries are not always suited to address the unique challenges of medical imaging data.63–65 For instance, pretrained generative models from computer vision cannot be readily adapted to produce medical imaging-specific outputs. The TorchIO library64 addresses the gap between CV and medical image data processing requirements, providing functions for efficient loading, augmentation, preprocessing, and patch-based sampling of medical imagery. The medical open network for AI (MONAI)66 is a PyTorch-based67 framework that facilitates the development of diagnostic AI models with tutorials for classification, segmentation, and AI model deployment. Further efforts in this realm include NiftyNet,68 the deep learning tool kit (DLTK),69 MedicalZooPytorch,70 and nnDetection.71 The recent RadImageNet initiative72 shares baseline image classification models pretrained on a dataset designed as the radiology medical imaging equivalent to ImageNet.52 To the best of our knowledge, no open-access software, tool, or library exists that targets the reuse and sharing of pretrained generative models in medical imaging. To this end, we expect the contribution of our medigan library to be instrumental in enabling the dissemination of generative models and the increased adoption of synthetic data into AI training pipelines. As an open-access plug-and-play solution for the generation of multipurpose synthetic data, medigan aims to benefit patients and clinicians by enhancing the performance and robustness of AI-based clinical decision support systems.

3. Method: The medigan Library

We contribute medigan as an open-source, open-access, MIT-licensed Python 3 library distributed via the Python package index (PyPI) for synthetic medical dataset generation, e.g., via pretrained generative models. The metadata of medigan is summarized in Table 1. medigan accelerates research in medical imaging by flexibly providing (a) synthetic data augmentation and (b) preprocessing functionality, both readily integrable in machine learning training pipelines.
It also allows contributors to add their generative models via a well-defined process and provides simple functions for end-users to search for, rank, and visualize models. The overview of medigan in Fig. 3 depicts the core functions, demonstrating how end-users can (a) contribute a generative model, (b) find a suitable generative model inside the library, and (c) generate synthetic data with that model.

Table 1 Overview of medigan library information.
3.1. User Requirements and Design Decisions

End-user requirement gathering is recommended for the development of trustworthy AI solutions in medical imaging.75 Therefore, we organized requirement gathering sessions with potential end-users, model contributors, and stakeholders from the EuCanImage Consortium, a large European H2020 project76 building a cancer imaging platform for enhanced AI in oncology. Upon exploring the needs and preferences of medical imaging researchers and AI developers, respective requirements for the design of medigan were formulated to ensure usability and usefulness. For instance, the users articulated a clear preference for a user interface in the format of an importable package as opposed to a graphical user interface (GUI), web application, database system, or API. Table 2 summarizes key requirements and the corresponding design decisions.

Table 2 Overview of the key requirements gathered together with potential end-users, alongside the respective design decisions taken toward fulfilling these requirements with medigan.
3.2. Software Design and Architecture

medigan is built with a focus on simplicity and usability. The integration of pretrained models is designed as an internal Python package import and simultaneously offers (a) high flexibility toward and (b) low code dependency on these generative models. The latter allows the reuse of the same orchestration functions in medigan for all model packages. Using object-oriented programming, the same model_executor class is used to implement, instantiate, and run all different types of generative model packages. To keep the library maintainable and lightweight, and to avoid limiting interdependencies between library code and generative model code, medigan's models are hosted outside the library (on Zenodo) as independent Python modules. To avoid long initialization times upon library import, lazy loading is applied. A model is only loaded, and its model_executor instance only initialized, if a user specifically requests synthetic data generation for that model. To achieve high cohesion,79 i.e., keeping the library and its functions specific, manageable, and understandable, the library is structured into several modular components. These include the loosely-coupled model_executor, model_selector, and model_contributor modules. The generators module is inspired by the facade design pattern80 and acts as a single point of access to all of medigan's functionalities. As the single interface layer between users and library, it reduces interaction complexity and provides users with a clear set of readily extendable library functions. Also, the generators module increases internal code reusability and allows for the combination of functions from other modules. For instance, a single function call can run the generation of samples by the best-ranked model (e.g., according to FID) of all models found in a keyword search.

3.3. Model Metadata

The FID score and all other model information, such as dependencies, modality, type, Zenodo link, associated publications, and generate function parameters, are stored in a single comprehensive model metadata JSON file. Alongside its searchability, readability, and flexibility, the choice of JSON as file format is motivated by its extendability to a nonrelational database. As a single source of model information, the global.json file consists of an array of model IDs, where under each model ID the respective model metadata is stored. Toward ensuring model traceability as recommended by the FUTURE-AI consensus guidelines,75 each model (on Zenodo) and its global.json metadata (on GitHub) are version-controlled, with the latter being structured into a set of nested metadata objects per model, as illustrated below.
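For illustration only, a single model entry in such a metadata file could resemble the following Python dictionary. All keys and values below are assumptions made for the sake of the example rather than medigan's verbatim global.json schema:

```python
# Purely illustrative sketch of one model entry; keys and values are assumptions,
# not medigan's exact global.json schema.
example_model_entry = {
    "00001_EXAMPLE_MODEL_ID": {
        "description": {
            "title": "DCGAN mammography region-of-interest patch generator",
            "modality": "mammography",
            "model_type": "DCGAN",
            "training_dataset": "BCDR",
            "publication": "https://doi.org/...",  # placeholder
        },
        "execution": {
            "zenodo_link": "https://zenodo.org/record/...",  # placeholder
            "dependencies": ["torch", "numpy"],
            "generate_method": {
                "name": "generate",
                "args": {"num_samples": 10, "output_path": "output/", "save_images": True},
            },
        },
        "selection": {"performance": {"FID": 99.9}},  # placeholder value
    }
}
```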
This global.json metadata file is retrieved, provided, and handled by the config_manager module once a user imports the generators module. This facilitates rapid access to a model's metadata given its model_id and allows one to add new models or model versions to medigan via pull request without requiring a new release of the library.

3.4. Model Search and Ranking

The number of models in medigan is expected to grow over time. This will foreseeably lead to users of medigan having a large number of models to choose from. Users likely will be uncertain which model best fits their needs depending on their data, modality, use-case, and research problem at hand and would have to go through each model's metadata to find the most suitable model in medigan. Hence, to facilitate model selection, the model_selector module implements model search and ranking functionalities. This search workflow is shown in Fig. 4 and triggered by running Code Snippet 1. The model_selector module contains a search method that takes a search operator (i.e., OR, AND, or XOR) and a list of keyword search values as parameters and recursively searches through the models' metadata. The latter is provided by the config_manager module. The model_selector populates a modelMatchCandidates object with matchedEntry instances, each of which represents a potential model match to the search query. The modelMatchCandidates class evaluates which of its associated model matches should be flagged as a true match given the search values and the search operator. The method rank_models_by_performance compares either all or specified models in medigan by a performance indicator such as the FID. This indicator commonly is a metric that correlates with diversity, fidelity, or condition adherence to estimate the quality of generative models and/or the data they generate.7 The model_selector looks up the value for the specified performance indicator in the model metadata and returns a descendingly or ascendingly ranked list of models to the user.

Code Snippet 1: Searching for a model in medigan.
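A minimal sketch of such a search call is shown below. The facade method name find_matching_models_by_values and its parameter names are assumptions based on the description above, not a verbatim reproduction of Code Snippet 1:

```python
from medigan import Generators  # assumed import path of medigan's facade class

generators = Generators()

# Keyword search over all model metadata; method and parameter names are assumptions
# based on the model_selector behaviour described above (values list + operator).
matching_models = generators.find_matching_models_by_values(
    values=["dcgan", "mammography"],
    target_values_operator="AND",
)
print(matching_models)
```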
3.5. Synthetic Data Generation

Synthetic data generation is medigan's core functionality toward overcoming the scarcity of (a) training data and (b) reusable generative models in medical imaging. Posing a low entry barrier for nonexpert users, medigan's generate method is both simple and scalable. While a user can run it with only one line of code, it flexibly supports any type of generative model and synthetic data generation process, as illustrated in Table 3 and Fig. 1.

Table 3 Models currently available in medigan. Also, computed FID scores for each model in medigan are shown. The number of real samples used for FID calculation is indicated by #imgs. The lower bound FIDrr is computed between a pair of randomly sampled sets of real data (real-real), whereas the model FIDrs is computed between two randomly sampled sets of real and synthetic data (real-syn). The results for model 7 (Flair, T1, T1c, T2) and 21 (T1, T2) are averaged across modalities.
3.5.1. Generate workflow

An example of the usage of the generate method is shown in Code Snippet 2, which triggers the model execution workflow illustrated in Fig. 5. Further parameters of the generate method allow users to specify the number of samples to be generated (num_samples), whether samples are returned as a list or stored on disk (save_images), where they are stored (output_path), and whether model dependencies are automatically installed (install_dependencies). Optional model-specific inputs can be provided via the **kwargs parameter. These include, for example, (i) a nondefault path to the model weights, (ii) a path to an input image folder for image-to-image translation models, (iii) a conditional input for class-conditional generative models, or (iv) the input_latent_vector as commonly used as model input in GANs. Running the generate method triggers the generators module to initialize a model_executor instance for the user-specified generative model. The model is identified via its model_id as the unique key in the global.json model metadata database, which is parsed and managed by the config_manager module. Using the latter, the model_executor checks if the required Python package dependencies are installed, retrieves the Zenodo URL, and downloads, unzips, and imports the model package. It further retrieves the name of the internal data generation function inside the model's __init__.py script. As a final step before calling this function, its parameters and their default values are retrieved from the metadata and combined with user-provided arguments. These user-provided arguments customize the generation process, which enables the handling of multiple image generation scenarios. For instance, the aforementioned provision of the input image folder allows users to point to their own images to transform them using medigan models that are, e.g., pretrained for cross-modality translation. In the case of large dataset generation, the number of samples indicated by num_samples is chunked into smaller-sized batches and iteratively generated to avoid overloading the random-access memory available on the user's machine.

Code Snippet 2: Executing a medigan model for synthetic data generation.
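A minimal sketch of such a call is given below, using a placeholder model_id; the parameters num_samples, output_path, save_images, and install_dependencies are the ones named above, whereas the Generators import path is an assumption:

```python
from medigan import Generators  # assumed import path of medigan's facade class

generators = Generators()

# One-line synthetic data generation; the model_id is a placeholder, not a real library entry.
generators.generate(
    model_id="00001_EXAMPLE_MODEL_ID",  # hypothetical ID; choose a real one from Table 3
    num_samples=50,                      # number of synthetic samples to generate
    output_path="synthetic_samples/",    # where samples are stored if save_images=True
    save_images=True,                    # set to False to receive the samples as a list instead
    install_dependencies=True,           # auto-install the model's Python dependencies
)
```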
3.5.2. Generate workflow extensions

Apart from storing or returning samples, a callable of the model's internal generate function can be returned to the user by setting is_gen_function_returned. This function, with prepared but adjustable default arguments, enables integration of the generate method into other workflows within medigan (e.g., model visualization) or outside of medigan (e.g., a user's AI model training). As a further alternative, a torch67 dataset or dataloader can be returned for any model in medigan by running get_as_torch_dataset or get_as_torch_dataloader, respectively. This further increases the versatility with which users can introduce medigan's data synthesis capabilities into their AI model training and data preprocessing pipelines. Instead of a user manually selecting a model via model_id, a model can also be automatically selected based on the recommendation from the model search and/or ranking methods. For instance, as triggered by Code Snippet 3, the models found in a search for mammography are ranked in ascending order based on FID, with the highest-ranking model being selected and executed to generate the synthetic dataset.

Code Snippet 3: Sequential searching, ranking, and data generation with the highest-ranked model.
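The following sketch illustrates this combined workflow and the torch dataloader alternative. The method name find_models_rank_and_generate and the exact parameter names are assumptions based on the description above, whereas get_as_torch_dataloader is named in the text:

```python
from medigan import Generators  # assumed import path of medigan's facade class

generators = Generators()

# Search, rank by FID, and generate with the best-ranked model in one call;
# method and parameter names are assumptions based on the workflow described above.
generators.find_models_rank_and_generate(
    values=["mammography"],
    metric="FID",
    order="asc",        # ascending FID, i.e., best model first
    num_samples=100,
)

# Alternatively, wrap a model's synthetic output as a torch dataloader for training loops.
dataloader = generators.get_as_torch_dataloader(
    model_id="00001_EXAMPLE_MODEL_ID",  # hypothetical placeholder ID
    num_samples=64,
)
```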
3.6. Model Visualization

To allow users to explore the generative models in medigan, a novel model visualization module has been integrated into the library. It allows users to examine how changing inputs, like the latent variable and/or the class-conditional label (e.g., malignant/benign), can affect the generation process. Also, the correlation between multiple model outputs, such as the image and corresponding segmentation mask, can be observed and explored. Figure 6 illustrates an example showing an image-mask sample pair from medigan's polyp-generating FastGAN model.51 This depiction of the graphical user interface (GUI) of the model visualization tool can be recreated by running Code Snippet 4. Internally, the model_visualizer module retrieves a model's internal generate method as a callable from the model_executor and adjusts the input parameters based on user interaction input from the GUI. This interaction further provides insight into a model's performance and capabilities. On one hand, it allows one to assess the fidelity of the generated samples. On the other hand, it also shows the model's captured sample diversity, i.e., the observed output variation over all possible input latent vectors. We leave the automation of this manual visual analysis of output variation to future work. For instance, such future work can use the model_visualizer to measure the variance of a reconstruction/perceptual error computed between pairs of images sampled from fixed-distance pairs of latent space vectors. The slider controls on the left of the interface allow one to change the latent variable, which for this specific model affects, for instance, polyp size, position, and background. As the size of the latent vector commonly is relatively large, groups of (e.g., 10) latent variables are combined into one indexed slider, resulting in a manageable number of adjustable latent input controls. The seed button on the right allows one to initialize a new set of latent variables, which results in a new generated image. The reset button allows one to revert the user's modifications back to the previous random values.

Code Snippet 4: Visualization of a model in medigan.
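A sketch of launching the visualization is given below. The facade method name visualize is an assumption based on the model_visualizer module described above, and the model_id is a placeholder:

```python
from medigan import Generators  # assumed import path of medigan's facade class

generators = Generators()

# Open the interactive GUI (sliders, seed, and reset controls) for one model;
# `visualize` is an assumed method name, and the model_id is a placeholder.
generators.visualize(model_id="00010_EXAMPLE_FASTGAN_POLYP")
```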
3.7. Model Contribution

A core idea of medigan is to provide a platform where researchers can share and access trained models via a standardized interface. We provide in-depth instructions on how to contribute a model to medigan, complemented by implementations automating parts of the model contribution process for users. In general, a pretrained model in medigan consists of a Python __init__.py and, in case the generation process is based on a machine learning model, a respective checkpoint or weights file. The former needs to contain a synthetic data storage method and a data generation method with a set of standardized parameters described in Sec. 3.5.1. Ideally, a model package further contains a license file, a metadata.json and/or a requirements.txt file, and a test.sh script to quickly verify the model's functionalities. To facilitate the creation of these files, medigan's GitHub repository provides model contributors with reusable templates for each of these files. Keeping the effort of pretrained model inclusion to a minimum, the generators module contains a contribute function that initializes a ModelContributor class instance dedicated to automating the remainder of the model contribution process. This includes automated (i) validation of the user-provided model_id; (ii) validation of the path to the model's __init__.py; (iii) testing of the importlib import of the model as a package; (iv) creation of the model's metadata dictionary; (v) addition of the model metadata to medigan's global.json metadata; (vi) end-to-end testing of the model with sample generation via generators.test_model(); (vii) upload of the zipped model package to Zenodo via API; and (viii) creation of a GitHub issue, which contains the Zenodo link and model metadata, in the medigan repository. Being assigned to this GitHub issue, the medigan development team is notified about the new model, which can then be added via pull request. Code Snippet 5 shows how a user can run the contribute method illustrated in Fig. 7.

Code Snippet 5: Contribution of a model to medigan.
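A sketch of such a contribution call follows. Beyond the model_id, the path to __init__.py, and the Zenodo/GitHub automation named above, the keyword argument names are assumptions:

```python
from medigan import Generators  # assumed import path of medigan's facade class

generators = Generators()

# Contribute a locally prepared model package; keyword argument names are assumptions.
generators.contribute(
    model_id="00099_EXAMPLE_NEW_MODEL",               # hypothetical new model ID
    init_py_path="path/to/model_package/__init__.py",
    zenodo_access_token="YOUR_ZENODO_TOKEN",          # for the automated Zenodo upload
    github_access_token="YOUR_GITHUB_TOKEN",          # for opening the contribution issue
)
```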
3.8. Model Testing Pipeline

Each new model contribution is systematically tested before becoming part of medigan. For instance, on each submitted pull request to medigan's GitHub repository, a CI pipeline automatically builds, formats, lints, and tests medigan's codebase. This includes the automatic verification of each model's package, dependencies, compatibility with the interface, and the correct functioning of its generation workflow. This allows one to ensure that all models and their metadata in the global.json file are available and working in a reproducible and standardized manner.

4. Applications

4.1. Community-Wide Data Access: Sharing the Essence of Restricted Datasets

medigan facilitates sharing and reusing trained generative models with the medical research community. On one hand, this reduces the need for researchers to retrain their own similar generative models, which can reduce the extensive carbon footprint94 of deep learning in medical imaging. On the other hand, this provides a platform for researchers and data owners to share their dataset distribution without sharing the real data points of the dataset. Put differently, sharing generative models trained on (and instead of) patient datasets not only is beneficial as a data curation step,14 but also minimizes the need to share images and personal data directly attributable to a patient. In particular, the latter can be quantifiably achieved when the generative model is trained using a differential privacy guarantee7,95 before being added to medigan. By reducing the barriers posed by data sharing restrictions and necessary patient privacy protection regulation, medigan unlocks a new paradigm of medical data sharing via generative models. This places medigan at the center of efforts toward solving the well-known issue of data scarcity7,9 in medical imaging. Apart from that, medigan's generative model contributors benefit from increased exposure, dissemination, and impact of their work, as their generative models become readily usable by other researchers. As Table 3 illustrates, to date, medigan consists of 21 pretrained deep generative models contributed to the community. Among others, these include two conditional DCGAN models, six domain translation CycleGAN models, and one mask-to-image pix2pix model. The training data comes from 10 different medical imaging datasets. Several of the models were trained on breast cancer datasets including INbreast,81 OPTIMAM,82 BCDR,83 CBIS-DDSM,86 and CSAW.88 The models allow one to generate samples at different pixel resolutions, ranging from region-of-interest patches to full-size images.

4.2. Investigating Synthetic Data Evaluation Methods

A further application of medigan is testing the properties of medical synthetic data. For instance, evaluation metrics for generative models can be readily tested in medigan's multiorgan, multimodality, and multimodel synthetic data setting. Compared to generative modeling, synthetic data evaluation is a less explored research area.7 In particular, in medical imaging, existing evaluation frameworks, such as the FID46 or the IS,17 are often limited in their applicability, as mentioned in Sec. 2.3. The models in medigan allow one to compare existing and new synthetic data evaluation metrics and to validate them in the field of medical imaging. Multimodel synthetic data evaluation makes it possible to measure the correlation and statistical significance between synthetic data evaluation metrics and downstream task performance metrics.
This enables the assessment of the clinical usefulness of generative models on one hand and of synthetic data evaluation metrics on the other hand. In that sense, the metric itself can be evaluated, including its variations when measured under different settings, datasets, or preprocessing techniques.

4.2.1. FID of medigan Models

We compute the FID to assess the models in medigan and report the results in Table 3. We further note that the FID can be computed not only between a synthetic and a real dataset (FIDrs) but also between two sets of samples of the real dataset (FIDrr). As the FIDrr describes the distance between two randomly sampled sets of the real data distribution, it can be used as an estimate of the real data variation and as an optimal lower bound for the FIDrs, as shown in Table 3. Given the above, it follows that a high FIDrr likely also results in a higher FIDrs, which highlights the importance of accounting for the FIDrr when discussing the FIDrs. To do so, we propose the reporting of a FID ratio FIDrr/FIDrs to describe the FIDrs in terms of the FIDrr. Assuming FIDrr ≤ FIDrs, the ratio is bounded between 0 and 1 and simplifies the comparison of FIDs computed using different models and datasets. A ratio close to 1 indicates that much of the FIDrs can be explained by the general variation in the real dataset. The code used to compute the FID scores is available at https://github.com/RichardObi/medigan/blob/main/tests/fid.py. The models in Table 3 yielding the highest ImageNet-based FID ratio are the ones with ID 10 (0.677, endoscopy, FastGAN), ID 13 (0.650, mammography, CycleGAN), 14 (0.564, mammography, CycleGAN), 20 (0.543, chest x-ray, PGGAN), and 1 (0.497, mammography, DCGAN). This indicates that the FID ratio does not depend on the modality, nor on the pixel resolution of the synthetic images. Further, neither image-to-image translation (e.g., CycleGAN) nor noise-to-image models (e.g., PGGAN, DCGAN, FastGAN) seem to have a particular advantage for achieving higher FID ratios. The plot in Fig. 8 provides further insight into the comparison between the lower bound FIDrr and the model FIDrs. The red trend line shows a positive correlation between the FIDrr and the FIDrs, which corroborates our previous assumption that a higher model FIDrs is to be expected given a higher lower bound FIDrr. Hence, for increased transparency, we motivate further studies to routinely report the lower bound FIDrr and the FID ratio apart from the model FIDrs. The three-channel RGB endoscopic images represented by orange dots have an FIDrr comparable with their grayscale radiologic counterparts. However, both chest x-ray datasets, ChestX-ray1490 and Node21,89 represented by green dots, show a slightly lower FIDrr than other modalities. The model FIDrs shows a high variation across models without readily observable dependence on modality, generative model, or image size.

4.2.2. Analysing potential sources of bias in FID

The popular FID metric is computed based on the features of an Inception classifier (e.g., v1,48 v347) trained on ImageNet,52 a database of natural images inherently different from the domain of medical images. This potentially limits the applicability of the FID to medical imaging data.
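A minimal sketch of the proposed ratio is shown below, reusing the frechet_distance helper sketched in Sec. 2.3 and assuming that feature arrays feats_real and feats_syn have already been extracted with the same backbone; the random split of the real features into two halves is our illustration of the two randomly sampled real sets described above:

```python
import numpy as np

# feats_real, feats_syn: (N x D) feature arrays from the same backbone extractor (assumed given).
rng = np.random.default_rng(seed=0)
idx = rng.permutation(len(feats_real))
half = len(idx) // 2

# Lower bound: FID between two random halves of the real data (real-real).
fid_rr = frechet_distance(feats_real[idx[:half]], feats_real[idx[half:]])

# Model score: FID between real and synthetic features (real-syn).
fid_rs = frechet_distance(feats_real, feats_syn)

# Proposed FID ratio; values close to 1 mean FIDrs is largely explained by real-data variation.
fid_ratio = fid_rr / fid_rs
print(f"FIDrr={fid_rr:.2f}, FIDrs={fid_rs:.2f}, ratio={fid_ratio:.3f}")
```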
Furthermore, the FID has been observed to vary based on the input image resizing methods and the ImageNet backbone feature extraction model types.31 Based on this, we further hypothesize a susceptibility of the FID to variation due to (a) different backbone feature extractor weights and random seed initializations, (b) different medical and nonmedical backbone model pretraining datasets, (c) different image normalization procedures for real and synthetic datasets, (d) nuances between different frameworks and libraries used for FID calculation, and (e) the dataset sizes used to compute the FID. Such variations can obstruct a reliable comparison of synthetic images generated by different generative models. Illustrating the potential of medigan to analyze such variations, we report on and experiment with the FID. In particular, we subject the FID to variations in (i) the pretraining dataset of its backbone feature extractor and (ii) the image normalization applied across a set of medigan models. We experiment with the Inception v3 model trained on the recent RadImageNet dataset72 released as a radiology-specific alternative to the ImageNet database.52 The RadImageNet-pretrained Inception v3 model weights we used are available at https://github.com/BMEII-AI/RadImageNet. We further compute the FIDrr and FIDrs with and without normalization to analyze the respective impact on the results. In Table 4, the FID results are summarized, allowing for cross-analysis between variations due to image normalization and/or due to the pretraining dataset of the FID feature extraction model. We observe generally lower FID values (1.15 to 7.32) for RadImageNet compared to ImageNet as FID model pretraining dataset (52.17 to 225.85). To increase FID comparability, we compute, as before, the FID ratio FIDrr/FIDrs. The RadImageNet-based model results in notably lower FID ratio values for both normalized and non-normalized images. Notably, exceptions to this are the models with ID 5 (mammography, DCGAN) and 6 (mammography, WGAN-GP), achieving respective RadImageNet-based FID ratios of 0.593 and 0.550. In general, the RadImageNet-based model seems more robust at detecting whether two sets of data originate from the same distribution, resulting in low FIDrr values. Overall, for most models, the FIDrs is explained only to a limited extent by the variation in the real dataset (FIDrr), and this holds for both the ImageNet- and the RadImageNet-based FIDs. The scatter plot in Fig. 9 further compares the RadImageNet-based FID with the ImageNet-based FID for the models from Table 4. Noticeably, the difference between non-normalized and normalized images is surprisingly high for several models for both ImageNet and RadImageNet FIDs (e.g., models with IDs 6 and 8), while negligible for others (e.g., models with IDs 1, 10, 13-16, and 19-21). Another observation is the relatively modest correlation between RadImageNet and ImageNet FID indicated by the slope of the red regression line. Counterexamples for this correlation include model 2 (normalized), which has a low ImageNet-based FID (80.51) compared to a high RadImageNet-based FID (6.19), and model 6 (normalized), which, in contrast, has a high ImageNet-based FID (221.30) and a low RadImageNet-based FID (1.80). With a low ImageNet-based FID (63.99), but a surprisingly high RadImageNet-based FID (7.32), model 10 (both normalized and non-normalized) is a further counterexample.
The example of model 10 is of particular interest, as it indicates limited applicability of the radiology-specific RadImageNet-based FID for out-of-domain data, such as three-channel endoscopic images.

Table 4 Normalized (left) and non-normalized (right) FID scores. This table measures the normalization impact on FID scores based on a promising set of medigan's deep generative models. Synthetic samples were randomly drawn for each model, matching the number of available real samples. The lower bound FIDrr is computed between a pair of randomly sampled sets of real data (real-real), whereas the model FIDrs is computed between two randomly sampled sets of real and synthetic data (real-syn). The results for model 7 (Flair, T1, T1c, T2) and 21 (T1, T2) are averaged across modalities.
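To make the backbone comparison of Sec. 4.2.2 concrete, the following hedged sketch builds an Inception v3 feature extractor and optionally loads alternative weights from a local checkpoint. The checkpoint path and loading details are assumptions, as the exact file format depends on the weights distributed at https://github.com/BMEII-AI/RadImageNet:

```python
import torch
from torchvision.models import inception_v3, Inception_V3_Weights


def build_feature_extractor(alt_weights_path=None):
    """Inception v3 feature extractor; swap backbone weights to probe FID bias."""
    model = inception_v3(weights=Inception_V3_Weights.IMAGENET1K_V1)
    if alt_weights_path is not None:
        # Hypothetical local checkpoint, e.g., converted RadImageNet weights;
        # loading details depend on how the released weights are packaged.
        state_dict = torch.load(alt_weights_path, map_location="cpu")
        model.load_state_dict(state_dict, strict=False)
    model.fc = torch.nn.Identity()  # expose 2048-d pooled features instead of class logits
    model.eval()
    return model
```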
Given the demonstrated high impact of the backbone model training set and image normalization on the FID, it is to be recommended that studies specify the exact model used for FID calculation as well as any applied data preprocessing and normalization steps. Further, where possible, reporting the RadImageNet-based FID allows for reporting a radiology domain-specific FID. The latter is seemingly less susceptible to variation in the real datasets than the ImageNet-based FID, while also being capable of capturing other, potentially complementary, patterns in the data.

4.3. Improving Clinical Medical Image Analysis

A high-impact clinical application of synthetic data is the improvement of clinical downstream task performance on tasks such as classification, detection, segmentation, or treatment response estimation. This can be achieved by using image synthesis for data augmentation, domain adaptation, and data curation (e.g., artifact removal, noise reduction, super-resolution)7,63 to enhance the performance of clinical decision support systems such as computer-aided diagnosis (CADx) and detection (CADe) software. In Table 5, the capability of improving clinical downstream task performance is demonstrated for various medigan models and modalities. Downstream task models trained on a combination of real and synthetic imaging data achieve promising results, surpassing the results achieved from training only on real data. The results are taken from the respective publications11,14,50,84 and indicate that image synthesis can further improve the promising performance demonstrated by deep learning-based CADx and CADe systems, e.g., in mammography96 and brain MRI.85 For downstream task evaluation, we generally note the importance of avoiding data leakage between training, validation, and test sets by training the generative model either only on the dataset partition used to train the respective downstream task model (e.g., IDs 2, 3, 7, 14, and 15) or on an entirely different dataset (e.g., IDs 5 and 6).

Table 5 Examples of the impact of synthetic data generated by medigan models on downstream task performance. Based on real test data, we compare the performance metrics of a model trained only on real data with a model trained on real data augmented with synthetic data. The metrics are taken from the respective publications describing the models.
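As a sketch of one way to realize such real-plus-synthetic training (assuming a user-defined real_train_dataset and a placeholder model_id; get_as_torch_dataset is named in Sec. 3.5.2, while its exact parameters are assumptions):

```python
from torch.utils.data import ConcatDataset, DataLoader
from medigan import Generators  # assumed import path of medigan's facade class

generators = Generators()

# real_train_dataset: the user's own torch Dataset of real training images (assumed to exist).
synthetic_dataset = generators.get_as_torch_dataset(
    model_id="00001_EXAMPLE_MODEL_ID",  # hypothetical placeholder ID
    num_samples=500,
)

# Train the downstream model on real data augmented with synthetic samples.
augmented_dataset = ConcatDataset([real_train_dataset, synthetic_dataset])
train_loader = DataLoader(augmented_dataset, batch_size=32, shuffle=True)
```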
The approaches displayed in Table 6 represent the application scenario in which synthetic data is used instead of real data to train downstream task models. Despite an observable performance decrease when training on synthetic data only, the results51,91,92 demonstrate the usefulness of synthetic data if no or only limited real training data is available or shareable. For example, if labels or annotations in a target domain are scarce but present in a source domain, a generative model can translate annotated data from the source domain to the target domain to enable supervised training of downstream task models.92,93

Table 6 Examples of the impact of synthetic data generated by medigan models on downstream task performance. Based on real test data, we compare the performance metrics of a model trained only on real data with a model trained only on synthetic data. The metrics are taken from the respective publications describing the models. n.a. refers to the case where only synthetic data can be used, as no annotated real training data is available.
5. Discussion and Future Work

In this work, we introduced medigan, an open-source Python library, which allows one to share pretrained generative models for synthetic medical image generation. The package is easily integrable into other packages and tools, including commercial ones. Synthetic data can enhance the performance, capabilities, and robustness of data-hungry deep learning models as well as mitigate common issues such as domain shift, data scarcity, class imbalance, and data privacy restrictions. Training one's own generative network can be complex and expensive, since it demands a considerable investment of time, effort, and dedicated hardware, produces carbon emissions, and requires knowledge and applied skills in generative AI. An alternative and complementary solution is the distribution of pretrained generative models to allow their reuse by AI researchers and engineers worldwide. medigan can help to reduce the time needed to run synthetic data experiments and can readily be added as a component, e.g., as a dataloader as discussed in Sec. 3.5.2, in AI training pipelines. As such, the generated data can be used to improve supervised learning models as described in Sec. 4.3 via training or fine-tuning, but can also serve as a plug-and-play data source for self/semisupervised learning, e.g., to pretrain clinical downstream task models. Furthermore, studies that use additional synthetic training data for training deep learning models often do not report all the specifics about their underlying generative model.7,75 Within medigan, each generative model is documented, openly accessible, and reusable. This increases the reproducibility of studies that use synthetic data and makes it more transparent where the data or parts thereof originated from. This can help to achieve the traceability objectives outlined in the FUTURE-AI consensus guiding principles toward AI trustworthiness in medical imaging.75 The 21 generative models currently in medigan are listed in Table 3 and were developed and validated by AI researchers and/or specialized medical doctors. Furthermore, each model contains traceable75 and version-controlled metadata in medigan's global.json file, as outlined in Sec. 3.3. The searchable (see Sec. 3.4) metadata allows one to choose a suitable model for a user's task at hand and includes, among others, the dataset used during the training process, the training date, publication, modality, input arguments, model types, and comparable evaluation metrics. To assess model suitability, users are recommended to first (i) ensure the compatibility between their planned downstream task (e.g., mammogram region-of-interest classification) and a candidate medigan model (e.g., mammogram region-of-interest generator). Second, (ii) a user's real (test) data and the model's synthetic data should be compatible, corresponding, for instance, in domain, organ, or disease manifestation. If the extent of the domain shift between real and synthetic data remains unclear after this qualitative analysis, (iii) a quantitative assessment (e.g., via FID) is recommended. Finally, (iv) it is to be assessed whether a downstream task improvement is plausible. This depends, among others, on the tested scenario and the task at hand, but also on the amount, domain, task specificity, and quality of the available real data, and on the generative model's capabilities as indicated by its reported evaluation metrics from previous studies.
If a positive impact of synthetic data on downstream task performance is plausible, users are recommended to proceed toward empirical verification. The exploration and multimodel evaluation of the properties of generative models and synthetic data is a further application of medigan. medigan's visualization tool (see Sec. 3.6) intuitively allows the user to explore and adjust the input latent vector of generative models to visually evaluate, e.g., their inherent diversity and condition adherence7 (i.e., how well a given mask or label fits the generated image). The evaluation of synthetic data by human experts, such as radiologists, is a costly and time-consuming task, which motivates the usage of automated metric-based evaluation such as the FID. Our multimodel analysis reveals sources of bias in FID reporting. We show the susceptibility of the FID to vary substantially based on changes in input image normalization or in the choice of the pretraining dataset of the FID feature extractor. This finding highlights the need to report the specific models, preprocessing, and implementations used to compute the FID alongside the FID ratio proposed in Sec. 4.2.1 to account for the variation immanent in the real dataset. With medigan model experiments demonstrably leading to insights in synthetic data evaluation, future research can use medigan as a tool to accelerate, test, analyze, and compare new synthetic data and generative model evaluation and exploration techniques.

5.1. Legal Frameworks for Sharing of Synthetic and Real Patient Data

Many countries have enacted regulations that govern the use and sharing of data related to individuals. The two most recognized legal frameworks are the Health Insurance Portability and Accountability Act (HIPAA)97 from the United States (U.S.) and the General Data Protection Regulation (GDPR)98 from the European Union (E.U.). These regulations govern the use and disclosure of individuals' protected health information (PHI) and ensure that individuals' data is protected while allowing its use for providing quality patient care.99–102 Conceptually, synthetic data is not real data about any particular individual and, in contrast to real data, can be generated at high volumes and potentially shared without restriction. In this sense, under both GDPR and HIPAA regulation, the rules govern the use of real data for the generation and evaluation of synthetic datasets, as well as the sharing of the original dataset. However, once fully synthetic data is generated, this new dataset falls outside the scope of the current regulations based on the argument that there is no direct correlation between the original subjects and the synthetic subjects. A common interpretation is that as long as the real data remains in a secure environment during the generation of synthetic data, there is little to no risk to the original subjects.103 As a consequence, the use of synthetic data can help prevent researchers from inadvertently using and possibly exposing patients' identifiable data. Synthetic data can also lessen the controls imposed by Institutional Review Boards (IRBs) and based on international regulations by ensuring data is never mapped to real individuals.104 There are multiple methods of generating synthetic data, some of which involve building models from real data that can create a dataset statistically similar to the real data.
How similar the synthetic data is to real-world data often defines its "utility," which will vary depending on the synthesis methods used and the needs of the study at hand. If the utility of the synthetic data is high enough, then evaluation results are expected to be similar to those obtained with real data.103 Since generative models are built from real data, a common concern is patient reidentification and the leaking of patient-specific features by generative models.7,15 Despite the arguably permissive aforementioned regulations, deidentification63 of the training data prior to generative model training is to be recommended. This can minimize the possibility of generative models leaking sensitive patient data during inference and after sharing. A further recommended and mathematically-proven tool for privacy preservation is differential privacy (DP).95 DP can be included in the training of deep generative models, among other setups, by adding DP noise to the gradients.

5.2. Expansion of Available Models

In the future, further generative models across medical imaging disciplines, modalities, and organs can be integrated into medigan. The capabilities of additional models can range from privatising or translating the user's data from one domain to another, to balancing or debiasing imbalanced datasets, to reconstructing, denoising, or removing artifacts in medical images, to resizing images, e.g., using image super-resolution techniques. Despite medigan's current focus on models based on GANs,16 the inclusion of different additional types of generative models is desirable and will enable insightful comparisons. In particular, this is to be further emphasized considering the recent successes of diffusion models,25–27 variational autoencoders,21 and normalizing flows22–24 in the computer vision and medical imaging105–107 domains. Before integrating and testing a new model via the pipeline described in Sec. 3.8, we assess whether a model is to become a candidate for inclusion into medigan. This threefold assessment is based on the SynTRUST framework7 and reviews whether (1) the model is well-documented (e.g., in a respective publication), (2) the model or its synthetic data is applicable to a task of clinical relevance, and (3) the model has been methodically validated.

5.3. Synthetic DICOM Generation

Since the dominant data format used for medical imaging is Digital Imaging and Communications in Medicine (DICOM), we plan to enhance medigan by integrating the generation of DICOM-compliant files. DICOM consists of two main components, the pixel data and the DICOM header. The latter can be described as an embedded dataset rich with information related to the pixel data, such as the image sequence, patient, physicians, institutions, treatments, observations, and equipment.63 Future work will explore combining our GAN-generated images with synthetic DICOM headers. The latter will be created from the same training images on which the medigan models are trained, to create synthetic DICOM data with high statistical similarity to real-world data. In this regard, a key research focus will be the creation of an appropriate and DICOM-compliant description of the image acquisition protocol for a synthetic image. The design and development of an open-source software package for generating DICOM files based on synthesized DICOM headers associated to (synthetic) images will extend prior work108 that demonstrated the generation of synthetic headers for the purpose of evaluating deidentification methods.
6. Conclusion

We presented the open-source medigan package, which helps researchers in medical imaging to rapidly create synthetic datasets for a multitude of purposes such as AI model training and benchmarking, data augmentation, domain adaptation, and intercentre data sharing. medigan provides simple functions and interfaces for users, allowing one to automate generative model search, ranking, synthetic data generation, and model contribution. Through the reuse and dissemination of existing generative models in the medical imaging community, medigan allows researchers to speed up their experiments with synthetic data in a reproducible and transparent manner. We discuss three key applications of medigan, which include (i) sharing of restricted datasets, (ii) improving clinical downstream task performance, and (iii) analyzing the properties of generative models, synthetic data, and associated evaluation metrics. Ultimately, the aim of medigan is to contribute to benefiting patients and clinicians, e.g., by increasing the performance and robustness of AI models in clinical decision support systems.

Disclosures

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Acknowledgments

We would like to thank all model contributors, such as Alyafi et al.,11 Szafranowska et al.,14 Thambawita et al.,51 Kim et al.,84 Segal et al.,91 Joshi et al.,92 and Garrucho et al.50 This project has received funding from the European Union's Horizon 2020 research and innovation program under grant agreements No. 952103 and No. 101057699. Eloy García and Kaisar Kushibar hold the Juan de la Cierva fellowship from the Ministry of Science and Innovation of Spain with reference numbers FJC2019-040039-I and FJC2021-047659-I, respectively.

Data, Materials, and Code Availability

medigan is a free Python (v3.6+) package published under the MIT license and distributed via the Python Package Index (https://pypi.org/project/medigan/). The package is open-source and invites the community to contribute on GitHub (https://github.com/RichardObi/medigan). A detailed documentation of medigan is available (https://medigan.readthedocs.io/en/latest/) that contains installation instructions, the API reference, a general description, code examples, a testing guide, a model contribution user guide, and documentation of the generative models available in medigan.

References
“Image-based cardiac diagnosis with machine learning: a review,”
Front. Cardiovasc. Med., 7 1 https://doi.org/10.3389/fcvm.2020.00001
(2020).
Google Scholar
R. Aggarwal et al., “Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis,” NPJ Digital Med., 4(1), 1–23 (2021). https://doi.org/10.1038/s41746-021-00438-z
X. Liu et al., “A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis,” Lancet Digital Health, 1(6), e271–e297 (2019). https://doi.org/10.1016/S2589-7500(19)30123-2
J. Schlemper et al., “A deep cascade of convolutional neural networks for MR image reconstruction,” Lect. Notes Comput. Sci., 10265, 647–658 (2017). https://doi.org/10.1007/978-3-319-59050-9_51
E. Ahishakiye et al., “A survey on deep learning in medical image reconstruction,” Intell. Med., 1(3), 118–127 (2021). https://doi.org/10.1016/j.imed.2021.03.003
N. Tajbakhsh et al., “Embracing imperfect datasets: a review of deep learning solutions for medical image segmentation,” Med. Image Anal., 63, 101693 (2020). https://doi.org/10.1016/j.media.2020.101693
R. Osuala et al., “Data synthesis and adversarial networks: a review and meta-analysis in cancer imaging,” Med. Image Anal., 84, 102704 (2023). https://doi.org/10.1016/j.media.2022.102704
C. Jin et al., “Predicting treatment response from longitudinal images using multi-task deep learning,” Nat. Commun., 12, 1851 (2021). https://doi.org/10.1038/s41467-021-22188-y
W. L. Bi et al., “Artificial intelligence in cancer imaging: clinical challenges and applications,” CA: Cancer J. Clin., 69(2), 127–157 (2019). https://doi.org/10.3322/caac.21552
F. Prior et al., “Open access image repositories: high-quality data to enable machine learning research,” Clin. Radiol., 75(1), 7–12 (2020). https://doi.org/10.1016/j.crad.2019.04.002
B. Alyafi, O. Diaz and R. Marti, “DCGANs for realistic breast mass augmentation in x-ray mammography,” Proc. SPIE, 11314, 1131420 (2020). https://doi.org/10.1117/12.2543506
X. Yi, E. Walia and P. Babyn, “Generative adversarial network in medical imaging: a review,” Med. Image Anal., 58, 101552 (2019). https://doi.org/10.1016/j.media.2019.101552
J. M. Wolterink et al., “Deep MR to CT synthesis using unpaired data,” Lect. Notes Comput. Sci., 10557, 14–23 (2017). https://doi.org/10.1007/978-3-319-68127-6_2
Z. Szafranowska et al., “Sharing generative models instead of private data: a simulation study on mammography patch classification,” Proc. SPIE, 12286, 122860Q (2022). https://doi.org/10.1117/12.2625781
T. Stadler, B. Oprisanu and C. Troncoso, “Synthetic data–anonymisation groundhog day,” in 31st USENIX Secur. Symp. (USENIX Security 22), 1451–1468 (2022).
I. Goodfellow et al., “Generative adversarial nets,” in Adv. Neural Inf. Process. Syst., 2672–2680 (2014).
T. Salimans et al., “Improved techniques for training GANs,” in Adv. Neural Inf. Process. Syst. 29, 2234–2242 (2016).
L. Mescheder, A. Geiger and S. Nowozin, “Which training methods for GANs do actually converge?,” in Int. Conf. Mach. Learn., 3481–3490 (2018).
S. Arora, A. Risteski and Y. Zhang, “Do GANs learn the distribution? Some theory and empirics,” in Int. Conf. Learn. Represent. (2018).
L. Ruthotto and E. Haber, “An introduction to deep generative modeling,” GAMM-Mitteilungen, 44(2), e202100008 (2021). https://doi.org/10.1002/gamm.202100008
D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” (2013).
D. Rezende and S. Mohamed, “Variational inference with normalizing flows,” in Int. Conf. Mach. Learn., 1530–1538 (2015).
L. Dinh, D. Krueger and Y. Bengio, “NICE: non-linear independent components estimation,” (2014).
L. Dinh, J. Sohl-Dickstein and S. Bengio, “Density estimation using real NVP,” (2016).
J. Sohl-Dickstein et al., “Deep unsupervised learning using nonequilibrium thermodynamics,” in Int. Conf. Mach. Learn., 2256–2265 (2015).
Y. Song and S. Ermon, “Generative modeling by estimating gradients of the data distribution,” in Adv. Neural Inf. Process. Syst. 32 (2019).
J. Ho, A. Jain and P. Abbeel, “Denoising diffusion probabilistic models,” in Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020).
M. Arjovsky, S. Chintala and L. Bottou, “Wasserstein generative adversarial networks,” in Int. Conf. Mach. Learn., 214–223 (2017).
I. Gulrajani et al., “Improved training of Wasserstein GANs,” (2017).
B. Liu et al., “Towards faster and stabilized GAN training for high-fidelity few-shot image synthesis,” in Int. Conf. Learn. Represent. (2020).
M. Kang, J. Shin and J. Park, “StudioGAN: a taxonomy and benchmark of GANs for image synthesis,” (2022).
A. Radford, L. Metz and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” (2015).
T. Karras et al., “Progressive growing of GANs for improved quality, stability, and variation,” (2017).
M. Mirza and S. Osindero, “Conditional generative adversarial nets,” (2014).
A. Odena, C. Olah and J. Shlens, “Conditional image synthesis with auxiliary classifier GANs,” in Int. Conf. Mach. Learn., 2642–2651 (2017).
P. Isola et al., “Image-to-image translation with conditional adversarial networks,” in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., 1125–1134 (2017). https://doi.org/10.1109/CVPR.2017.632
J.-Y. Zhu et al., “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proc. IEEE Int. Conf. Comput. Vision, 2223–2232 (2017). https://doi.org/10.1109/ICCV.2017.244
Y. Choi et al., “StarGAN: unified generative adversarial networks for multi-domain image-to-image translation,” in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., 8789–8797 (2018). https://doi.org/10.1109/CVPR.2018.00916
T. Park et al., “Semantic image synthesis with spatially-adaptive normalization,” in Proc. IEEE/CVF Conf. Comput. Vision and Pattern Recognit., 2337–2346 (2019).
V. Sushko et al., “OASIS: only adversarial supervision for semantic image synthesis,” Int. J. Comput. Vision, 130(12), 2903–2923 (2022). https://doi.org/10.1007/s11263-022-01673-x
T. R. Shaham, T. Dekel and T. Michaeli, “SinGAN: learning a generative model from a single natural image,” in Proc. IEEE/CVF Int. Conf. Comput. Vision, 4570–4580 (2019). https://doi.org/10.1109/ICCV.2019.00467
D. Korkinof et al., “Perceived realism of high resolution generative adversarial network derived synthetic mammograms,” Radiol.: Artif. Intell., 3, e190181 (2020). https://doi.org/10.1148/ryai.2020190181
B. Alyafi et al., “Quality analysis of DCGAN-generated mammography lesions,” Proc. SPIE, 11513, 115130B (2020). https://doi.org/10.1117/12.2560473
A. Borji, “Pros and cons of GAN evaluation measures,” Comput. Vision Image Understanding, 179, 41–65 (2019). https://doi.org/10.1016/j.cviu.2018.10.009
A. Borji, “Pros and cons of GAN evaluation measures: new developments,” Comput. Vision Image Understanding, 215, 103329 (2022). https://doi.org/10.1016/j.cviu.2021.103329
M. Heusel et al., “GANs trained by a two time-scale update rule converge to a local Nash equilibrium,” Adv. Neural Inf. Process. Syst., 30 (2017).
C. Szegedy et al., “Rethinking the inception architecture for computer vision,” in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., 2818–2826 (2016). https://doi.org/10.1109/CVPR.2016.308
C. Szegedy et al., “Going deeper with convolutions,” in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., 1–9 (2015). https://doi.org/10.1109/CVPR.2015.7298594
Z. Wang et al., “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., 13(4), 600–612 (2004). https://doi.org/10.1109/TIP.2003.819861
L. Garrucho et al., “High-resolution synthesis of high-density breast mammograms: application to improved fairness in deep learning based mass detection,” Front. Oncol., 12, 1044496 (2022). https://doi.org/10.3389/fonc.2022.1044496
V. Thambawita et al., “SinGAN-Seg: synthetic training data generation for medical image segmentation,” PLoS One, 17(5), e0267976 (2022). https://doi.org/10.1371/journal.pone.0267976
J. Deng et al., “ImageNet: a large-scale hierarchical image database,” in IEEE Conf. Comput. Vision and Pattern Recognit., 248–255 (2009). https://doi.org/10.1109/CVPR.2009.5206848
accel brain, “Generative adversarial networks library: Pygan,” (2021). https://github.com/accel-brain/accel-brain-code/tree/master/Generative-Adversarial-Networks/
A. Pal and A. Das, “TorchGAN: a flexible framework for GAN training and evaluation,” J. Open Source Software, 6(66), 2606 (2021). https://doi.org/10.21105/joss.02606
J. Shor, “TensorFlow-GAN (TF-GAN): tooling for GANs in TensorFlow,” (2022). https://github.com/tensorflow/gan
E. Linder-Norén, “PyTorch-GAN: PyTorch implementations of generative adversarial networks,” (2021). https://github.com/eriklindernoren/PyTorch-GAN
E. Linder-Norén, “Keras-GAN: Keras implementations of generative adversarial networks,” (2022). https://github.com/eriklindernoren/Keras-GAN
K. S. Lee and C. Town, “Mimicry: towards the reproducibility of GAN research,” (2020).
T. Wolf et al., “Transformers: state-of-the-art natural language processing,” in Proc. 2020 Conf. Empirical Methods in Nat. Language Process.: Syst. Demonstrations, 38–45 (2020).
M. Kahng et al., “GAN Lab: understanding complex deep generative models using interactive visual experimentation,” IEEE Trans. Vision Comput. Graphics, 25(1), 310–320 (2018). https://doi.org/10.1109/TVCG.2018.2864500
O. Diaz et al., “Data preparation for artificial intelligence in medical imaging: a comprehensive guide to open-access platforms and tools,” Phys. Med., 83, 25–37 (2021). https://doi.org/10.1016/j.ejmp.2021.02.007
F. Pérez-García, R. Sparks and S. Ourselin, “TorchIO: a Python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning,” Comput. Methods Prog. Biomed., 208, 106236 (2021). https://doi.org/10.1016/j.cmpb.2021.106236
C. M. Moore et al., “CleanX: a Python library for data cleaning of large sets of radiology images,” J. Open Source Software, 7(76), 3632 (2022). https://doi.org/10.21105/joss.03632
M. J. Cardoso et al., “MONAI: an open-source framework for deep learning in healthcare,” (2022).
A. Paszke et al., “PyTorch: an imperative style, high-performance deep learning library,” in Adv. Neural Inf. Process. Syst. 32, 8024–8035 (2019).
E. Gibson et al., “NiftyNet: a deep-learning platform for medical imaging,” Comput. Methods Prog. Biomed., 158, 113–122 (2018). https://doi.org/10.1016/j.cmpb.2018.01.025
N. Pawlowski et al., “DLTK: state of the art reference implementations for deep learning on medical images,” (2017).
A. Nikolaos, “Deep learning in medical image analysis: a comparative analysis of multi-modal brain-MRI segmentation with 3D deep neural networks,” University of Patras (2019).
M. Baumgartner et al., “nnDetection: a self-configuring method for medical object detection,” Lect. Notes Comput. Sci., 12905, 530–539 (2021). https://doi.org/10.1007/978-3-030-87240-3_51
X. Mei et al., “RadImageNet: an open radiologic deep learning research dataset for effective transfer learning,” Radiol.: Artif. Intell., 4, e210315 (2022). https://doi.org/10.1148/ryai.210315
The Python Package Index, “medigan 1.0.0,” (2022). https://pypi.org/project/medigan/
R. Osuala, G. Skorupko and N. Lazrak, “medigan getting started,” (2022). https://medigan.readthedocs.io/en/latest
K. Lekadir et al., “FUTURE-AI: guiding principles and consensus recommendations for trustworthy artificial intelligence in medical imaging,” (2021).
EuCanImage Consortium, “EuCanImage: towards a European cancer imaging platform for enhanced artificial intelligence in oncology,” (2020). https://eucanimage.eu/
M. Abadi et al., “TensorFlow: large-scale machine learning on heterogeneous systems,” (2015). tensorflow.org
C. Larman, Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design and the Unified Process, Prentice Hall PTR (2001).
E. Gamma et al., Design Patterns: Elements of Reusable Object-Oriented Software, Pearson Deutschland GmbH (1995).
I. C. Moreira et al., “INbreast: toward a full-field digital mammographic database,” Acad. Radiol., 19(2), 236–248 (2012). https://doi.org/10.1016/j.acra.2011.09.014
M. D. Halling-Brown et al., “OPTIMAM mammography image database: a large-scale resource of mammography images and clinical data,” Radiol.: Artif. Intell., 3, e200103 (2020). https://doi.org/10.1148/ryai.2020200103
M. G. Lopez et al., “BCDR: a breast cancer digital repository,” in 15th Int. Conf. Exp. Mech. (2012).
S. Kim, B. Kim and H. Park, “Synthesis of brain tumor multicontrast MR images for improved data augmentation,” Med. Phys., 48(5), 2185–2198 (2021). https://doi.org/10.1002/mp.14701
B. H. Menze et al., “The multimodal brain tumor image segmentation benchmark (BRATS),” IEEE Trans. Med. Imaging, 34(10), 1993–2024 (2014). https://doi.org/10.1109/TMI.2014.2377694
R. S. Lee et al., “A curated mammography data set for use in computer-aided detection and diagnosis research,” Sci. Data, 4(1), 170177 (2017). https://doi.org/10.1038/sdata.2017.177
H. Borgli et al., “HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy,” Sci. Data, 7(1), 283 (2020). https://doi.org/10.1038/s41597-020-00622-y
K. Dembrower, P. Lindholm and F. Strand, “A multi-million mammography image dataset and population-based screening cohort for the training and evaluation of deep neural networks – the cohort of screen-aged women (CSAW),” J. Digital Imaging, 33(2), 408–413 (2020). https://doi.org/10.1007/s10278-019-00278-0
E. Sogancioglu, K. Murphy and B. van Ginneken, “NODE21 (v-5) [data set],” Zenodo (2021). https://doi.org/10.5281/zenodo.5548363
X. Wang et al., “ChestX-ray8: hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases,” in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., 2097–2106 (2017). https://doi.org/10.1109/CVPR.2017.369
B. Segal et al., “Evaluating the clinical realism of synthetic chest x-rays generated using progressively growing GANs,” SN Comput. Sci., 2(4), 1–17 (2021). https://doi.org/10.1007/s42979-021-00720-7
S. Joshi et al., “nn-UNet training on CycleGAN-translated images for cross-modal domain adaptation in biomedical imaging,” Lect. Notes Comput. Sci., 12963, 540–551 (2022).
R. Dorent et al., “CrossMoDA 2021 challenge: benchmark of cross-modality domain adaptation techniques for vestibular schwannoma and cochlea segmentation,” Med. Image Anal., 83, 102628 (2022). https://doi.org/10.1016/j.media.2022.102628
R. Selvan et al., “Carbon footprint of selecting and training deep learning models for medical image analysis,” in Med. Image Comput. and Comput. Assist. Intervention – MICCAI 2022: 25th Int. Conf., 506–516 (2022).
C. Dwork et al., “The algorithmic foundations of differential privacy,” Found. Trends Theor. Comput. Sci., 9(3–4), 211–407 (2014). https://doi.org/10.1561/0400000042
L. Abdelrahman et al., “Convolutional neural networks for breast cancer detection in mammography: a survey,” Comput. Biol. Med., 131, 104248 (2021). https://doi.org/10.1016/j.compbiomed.2021.104248
Centers for Medicare & Medicaid Services, “The Health Insurance Portability and Accountability Act of 1996 (HIPAA),” (1996). http://www.cms.hhs.gov/hipaa/
European Parliament and Council of European Union, “Council regulation (EU) no 2016/679,” (2018). https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32016R0679/
Committee on Health Research and the Privacy of Health Information: The HIPAA Privacy Rule, Board on Health Sciences Policy, Board on Health Care Services, et al., Beyond the HIPAA Privacy Rule: Enhancing Privacy, Improving Health Through Research, National Academies Press, Washington, D.C. (2009).
U.S. Dept. of Health and Human Services, “Summary of the HIPAA privacy rule: HIPAA compliance assistance,” (2003). http://purl.fdlp.gov/GPO/gpo9756
S. M. Shah and R. A. Khan, “Secondary use of electronic health record: opportunities and challenges,” IEEE Access, 8, 136947–136965 (2020). https://doi.org/10.1109/ACCESS.2020.3011099
C. F. Mondschein and C. Monda, “The EU’s general data protection regulation (GDPR) in a research context,” in Fundamentals of Clinical Data Science, 55–71, Springer International Publishing, Cham (2019).
K. El Emam, L. Mosquera and R. Hoptroff, Practical Synthetic Data Generation: Balancing Privacy and the Broad Availability of Data, 1st ed., O’Reilly Media, Inc., Sebastopol, California (2020).
F. K. Dankar and M. Ibrahim, “Fake it till you make it: guidelines for effective synthetic data generation,” Appl. Sci., 11, 2158 (2021). https://doi.org/10.3390/app11052158
W. H. L. Pinaya et al., “Brain imaging generation with latent diffusion models,” in Deep Generative Models: Second MICCAI Workshop, DGM4MICCAI 2022, Held in Conjunction with MICCAI 2022, Singapore, 117–126 (2022).
W. H. L. Pinaya et al., “Unsupervised brain imaging 3D anomaly detection and segmentation with transformers,” Med. Image Anal., 79, 102475 (2022). https://doi.org/10.1016/j.media.2022.102475
N. Pawlowski, D. Coelho de Castro and B. Glocker, “Deep structural causal models for tractable counterfactual inference,” in Adv. Neural Inf. Process. Syst. 33, 857–869 (2020).
M. Rutherford et al., “A DICOM dataset for evaluation of medical image de-identification,” Sci. Data, 8, 183 (2021). https://doi.org/10.1038/s41597-021-00967-y