Guangyu Shi, Luming Li, Mengke Song
Journal of Electronic Imaging, Vol. 33, Issue 05, 053059, (October 2024) https://doi.org/10.1117/1.JEI.33.5.053059
TOPICS: Graphic design, Visualization, Color, Design, Image quality, Lithium, Education and training, Semantics, Image processing, Atmospheric modeling
The rapid development of computer vision and deep learning has significantly advanced image aesthetic assessment. However, traditional methods, which rely primarily on low-level visual features such as color and texture, often struggle with the complexity of graphic design images. These images combine diverse design elements, including color, typography, and layout, with varied styles such as minimalism, retro, and modernism, which poses substantial challenges for conventional assessment techniques. To overcome these limitations, we propose a multimodal learning approach that integrates image content with textual descriptions to analyze the aesthetic qualities of graphic design images. The core innovation of our method is the use of two distinct kinds of textual description: holistic descriptions, which capture the main theme of the design, and detailed descriptions, which focus on specific aspects such as composition, color, detail, and atmosphere. This dual approach allows a more nuanced and complete assessment of aesthetic value. To merge these descriptions with visual content, we introduce a feature similarity blending mechanism that aligns and integrates features from both modalities, enhancing the representation of aesthetic attributes. In addition, we employ a score bagging technique that aggregates scores from multiple fused features, making the assessments more robust and reliable. Our method is implemented within a multi-task learning framework, enabling simultaneous prediction across multiple rating dimensions. Experimental results demonstrate that, compared with the state-of-the-art TAHF method, our approach improves Spearman’s rank correlation coefficient by 1.7%, 3.4%, and 2.6% on the HDDI, BAID, and TAD66K datasets, respectively, along with consistent gains in Pearson’s linear correlation coefficient and accuracy. Moreover, our method achieves these improvements with fewer parameters and lower computational complexity, highlighting its efficiency and effectiveness in graphic design image aesthetic assessment.
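The two correlation metrics reported above can be illustrated with a short sketch. It is not the authors' evaluation code: the function names and the toy scores are hypothetical, and only the standard definitions are assumed (Pearson's coefficient on raw scores, Spearman's coefficient as Pearson's coefficient on average ranks).

```python
import math


def pearson(x, y):
    """Pearson's linear correlation coefficient between two score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: predicted aesthetic scores vs. human ratings.
predicted = [2.0, 4.0, 6.0, 8.0, 20.0]
human = [1.0, 2.0, 3.0, 4.0, 5.0]

srcc = spearman(predicted, human)
plcc = pearson(predicted, human)
print(f"SRCC = {srcc:.3f}, PLCC = {plcc:.3f}")
```

In this toy case the predictions preserve the ordering of the human ratings exactly, so the rank correlation is 1 while the linear correlation is lower because of the outlying last score. This is why the two metrics are usually reported together: one measures ranking agreement, the other linear agreement.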