Images can convey rich semantic information and arouse strong emotions in viewers. With the growing use of online images and videos to express opinions, evaluating emotions from visual content has attracted considerable attention. Image emotion recognition aims to automatically classify the emotions conveyed by images. Existing image sentiment classification studies based on hand-crafted features or deep models mainly focus on either low-level visual features or high-level semantic representations, without considering all factors. In this paper, we use visualization to study how deep representations work in emotion recognition. Our analysis shows that deep models rely mainly on deep semantic information while ignoring shallow visual details, which are essential for evoking emotions. To form a more discriminative representation for emotion recognition, we propose a multi-level representation model with side branches that learns and integrates representations from different depths of the backbone for sentiment analysis. Unlike a standard hierarchical CNN, our model provides a description ranging from deep semantic representations to shallow visual representations. Additionally, several feature fusion approaches are analyzed and discussed to optimize the deep model. Extensive experiments on several image emotion recognition datasets show that our model outperforms various existing methods.
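As a rough illustration of this kind of multi-level design, the sketch below attaches side branches to several depths of a backbone and fuses them by concatenation before classification. The ResNet-50 backbone, branch widths, and concatenation-based fusion are assumptions for illustration only, not the paper's exact architecture.

```python
# Minimal sketch of a multi-level representation model with side branches.
# Assumptions (not from the paper): a torchvision ResNet-50 backbone,
# global-average-pooled side branches, and concatenation-based fusion.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class MultiLevelEmotionNet(nn.Module):
    def __init__(self, num_classes=8):
        super().__init__()
        backbone = resnet50(weights=None)
        # Stem and the four residual stages of the backbone.
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.stages = nn.ModuleList([backbone.layer1, backbone.layer2,
                                     backbone.layer3, backbone.layer4])
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Side branches project each stage (shallow to deep) to a common width.
        widths = [256, 512, 1024, 2048]
        self.branches = nn.ModuleList([nn.Linear(w, 256) for w in widths])
        self.classifier = nn.Linear(256 * len(widths), num_classes)

    def forward(self, x):
        x = self.stem(x)
        feats = []
        for stage, branch in zip(self.stages, self.branches):
            x = stage(x)
            # Keep both shallow visual details and deep semantics.
            feats.append(branch(self.pool(x).flatten(1)))
        return self.classifier(torch.cat(feats, dim=1))

logits = MultiLevelEmotionNet()(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 8])
```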
3D medical image segmentation is a fundamental task in computer-aided diagnosis, and gliomas exhibit a high degree of heterogeneity and irregular shapes in multimodal MRI images. Accurate and reliable segmentation of brain tumors therefore remains a challenging problem in medical image analysis. U-Net has become the de facto standard for many medical image segmentation tasks and has achieved great success. However, due to the inherent locality of convolution operations, U-Net is limited in explicitly modeling long-range dependencies. Although Transformers address this problem, they incur extreme computational and memory complexity when processing high-resolution 3D feature maps. In this paper, we propose the Trans-coder, which is embedded at the end of the U-Net encoder to improve segmentation performance while reducing computation. The Trans-coder takes the U-Net feature map as an input sequence and extracts its relative position information, capturing more detailed image information that is then fed into the decoder to obtain good segmentation performance. At the same time, a variational autoencoder is used for regularization to prevent over-fitting. Our method achieves superior performance to several competing methods on the Multimodal Brain Tumor Segmentation Challenge (BraTS) dataset.
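The sketch below illustrates the general idea of a Transformer bottleneck attached to the end of a 3D U-Net encoder. The channel counts, layer depth, and learnable position embeddings (standing in for the paper's relative position information) are illustrative assumptions, and the variational-autoencoder branch is omitted.

```python
# Minimal sketch of a Transformer bottleneck at the end of a 3D U-Net encoder.
# Assumptions (not from the paper): channel counts, depth, and learnable
# absolute position embeddings in place of relative position information;
# the VAE regularization branch is not shown.
import torch
import torch.nn as nn

class TransCoderBottleneck(nn.Module):
    def __init__(self, channels=256, grid=(8, 8, 8), heads=8, layers=4):
        super().__init__()
        n_tokens = grid[0] * grid[1] * grid[2]
        self.pos = nn.Parameter(torch.zeros(1, n_tokens, channels))
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, x):                          # x: (B, C, D, H, W) from the encoder
        b, c, d, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)      # (B, D*H*W, C) token sequence
        tokens = self.encoder(tokens + self.pos)   # global self-attention
        return tokens.transpose(1, 2).reshape(b, c, d, h, w)  # back to a 3D map

feat = torch.randn(1, 256, 8, 8, 8)                # low-resolution encoder output
print(TransCoderBottleneck()(feat).shape)          # torch.Size([1, 256, 8, 8, 8])
```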
KEYWORDS: Convolution, RGB color model, Bone, Data modeling, Video, Performance modeling, Detection and tracking algorithms, Visualization, Video surveillance, Neural networks
Skeleton-based human action recognition has attracted great interest in recent years. The strong robustness of skeleton data to scene and camera interference allows recognition algorithms to focus on robust action features. Recent works have demonstrated the effectiveness of modeling the skeleton as a graph and learning spatio-temporal patterns with Graph Convolutional Networks (GCNs). Although GCNs excel at learning neighborhood features, they are less effective at capturing long-range dependencies between joints. In particular, temporal skeleton sequences contain a large number of joints, which makes learning high-level temporal cues unduly slow. In this paper, we propose a temporal feature enhancer based on Temporal Kernel Attention (TKA). Guided by TKA, we design a performance-oriented network, TKA-GCN, and a lightweight network, Mini-TKA-GCN, for skeleton-based action recognition. Finally, on the NTU RGB+D 60 and Kinetics-Skeleton 400 datasets, the proposed TKA-GCN and Mini-TKA-GCN outperform most state-of-the-art methods.
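Since the exact TKA formulation is not given here, the sketch below shows one plausible temporal attention gate in that spirit: skeleton features are re-weighted along the time axis using multi-scale 1D convolutions over a joint-pooled signal. The kernel sizes, pooling, and sigmoid gating are illustrative assumptions, not the authors' design.

```python
# Hedged sketch of a temporal attention gate in the spirit of Temporal Kernel
# Attention (TKA). Kernel sizes, pooling, and gating are illustrative choices.
import torch
import torch.nn as nn

class TemporalKernelAttention(nn.Module):
    """Re-weights skeleton features along the temporal axis using
    multi-scale 1D convolutions over a joint-pooled signal."""
    def __init__(self, channels, kernel_sizes=(3, 5, 9)):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(channels, channels, k, padding=k // 2)
            for k in kernel_sizes
        ])
        self.gate = nn.Sigmoid()

    def forward(self, x):                      # x: (N, C, T, V) = batch, channel, frame, joint
        s = x.mean(dim=-1)                     # pool over joints -> (N, C, T)
        a = sum(conv(s) for conv in self.convs) / len(self.convs)
        a = self.gate(a).unsqueeze(-1)         # temporal attention map (N, C, T, 1)
        return x * a                           # emphasize informative frames

x = torch.randn(4, 64, 300, 25)                # e.g. NTU RGB+D: 300 frames, 25 joints
print(TemporalKernelAttention(64)(x).shape)    # torch.Size([4, 64, 300, 25])
```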