Synthetic data are frequently used to supplement a small set of real images and create a dataset with diverse features, but this may not improve the equivariance of a computer vision model. Our work answers the following questions: First, what metrics are useful for measuring a domain gap between real and synthetic data distributions? Second, is there an effective method for bridging an observed domain gap? We explore these questions by presenting a pathological case where the inclusion of synthetic data did not improve model performance, and then presenting measurements of the difference between the real and synthetic distributions in the image space, latent space, and model prediction space. We find that applying pixel-level augmentation to the dataset effectively reduces the observed domain gap and improves the model's F1 score to 0.95, compared to 0.43 for un-augmented data. We also observe that an increase in the average cross entropy of the latent-space feature vectors is positively correlated with increased model equivariance and the closing of the domain gap. We explain these results using a framework of model regularization effects.
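As an illustration of the kind of pixel-level augmentation discussed above, the following is a minimal sketch using torchvision. The specific transforms (color jitter, blur, additive noise) and their parameters are assumptions for demonstration only, not the augmentation policy used in this work.

```python
# Illustrative pixel-level augmentation pipeline (assumed transforms and
# strengths, not the paper's exact policy). Only photometric properties are
# perturbed; geometry is left untouched.
import torch
from torchvision import transforms

pixel_level_augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.05),
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),
    transforms.ToTensor(),
    # Additive Gaussian pixel noise with a hypothetical strength of 0.02,
    # clamped back to the valid [0, 1] intensity range.
    transforms.Lambda(lambda x: torch.clamp(x + 0.02 * torch.randn_like(x), 0.0, 1.0)),
])
```

A typical use would apply the pipeline to each synthetic training image before it is batched, e.g. `pixel_level_augment(Image.open("synthetic_example.png").convert("RGB"))`, where the file path is hypothetical. The intent of such photometric perturbations is to make the synthetic appearance distribution overlap more with the real one, which is the mechanism the abstract credits for closing the domain gap.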