People detection is an important task in video surveillance. Because people in a crowd share similar appearance characteristics and frequently occlude one another, detecting crowded people in occluded classroom surveillance scenes is challenging. In this paper, a new detection framework based on the relation model method is proposed to detect crowded people in occluded classroom surveillance scenes. Our method first predicts a box set of related objects and then uses the positive boxes to refine the noisy ones. Specifically, a new box set selector is designed to select positive boxes that are likely to yield accurate predictions, and the remaining occluded boxes are then refined through the relation model module. To demonstrate the effectiveness of the proposed method, we build a new classroom video surveillance dataset, ICDU, and conduct extensive experiments on it as well as on the public CrowdHuman dataset. The results show that our method performs well on both ICDU and CrowdHuman.
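As a rough illustration (not the paper's actual relation model), the refine-by-positives idea can be sketched as follows; the score threshold, IoU gate, and coordinate-averaging rule are all hypothetical simplifications:

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); returns intersection-over-union.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def refine(boxes, scores, pos_thresh=0.7, iou_thresh=0.5):
    """Split boxes into positive (high-score) and noisy; nudge each
    noisy box toward the positive box it overlaps most (here a plain
    coordinate average stands in for the learned relation module)."""
    positives = [b for b, s in zip(boxes, scores) if s >= pos_thresh]
    refined = []
    for b, s in zip(boxes, scores):
        if s >= pos_thresh:
            refined.append(b)
            continue
        best = max(positives, key=lambda p: iou(b, p), default=None)
        if best is not None and iou(b, best) >= iou_thresh:
            refined.append(tuple((x + y) / 2 for x, y in zip(b, best)))
        else:
            refined.append(b)
    return refined
```

In the real framework the refinement is learned rather than an average, but the control flow (select positives, then correct the occluded remainder against them) is the same.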
Owing to recent advances in deep learning, person re-identification is used increasingly often in the automatic processing and analysis of surveillance video, particularly for public safety and smart transportation. Much of the related research concentrates on outdoor scenes, with little attention paid to indoor ones, where problems such as inadequate illumination and reflection make person re-identification very difficult. This paper investigates an indoor person re-identification algorithm to improve recognition accuracy in indoor settings. To address the pronounced light-and-shade differences in person images captured by indoor cameras, an IBN layer combining instance normalization (IN) and batch normalization (BN) is added to the ResNet50 backbone; it suppresses individual appearance variation while retaining the feature differences between identities. To enhance the expressiveness of individual features, a channel-wise attention module is added to the residual network: the importance of each feature channel is learned automatically, amplifying the important channels and suppressing the unnecessary ones. Finally, to address the difficulty of distinguishing similar people under interference factors such as occlusion and reflection in indoor scenes, triplet loss is introduced during training, which helps the model learn finer person details. The three validation datasets used in this study are Market1501, OUC365, and DukeMTMC-reID.
The OUC365 dataset has a distinctly indoor style and high image definition, the Market1501 dataset is noticeably noisier, and the DukeMTMC-reID dataset shows a significant imbalance in the number of photos per identity. The proposed method is tested on all of these datasets and achieves good results.
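Two of the ingredients above, the IBN normalization and the triplet loss, can be sketched schematically in NumPy (no learnable affine parameters or batching machinery; the half-and-half channel split follows the IBN-a design, and the margin value is an illustrative choice):

```python
import numpy as np

def ibn(x, eps=1e-5):
    """IBN-a style block: instance-normalize the first half of the
    channels (suppressing per-image appearance/illumination shifts),
    batch-normalize the second half (keeping identity-discriminative
    statistics). x has shape (N, C, H, W)."""
    half = x.shape[1] // 2
    a, b = x[:, :half], x[:, half:]
    a = (a - a.mean(axis=(2, 3), keepdims=True)) / \
        np.sqrt(a.var(axis=(2, 3), keepdims=True) + eps)
    b = (b - b.mean(axis=(0, 2, 3), keepdims=True)) / \
        np.sqrt(b.var(axis=(0, 2, 3), keepdims=True) + eps)
    return np.concatenate([a, b], axis=1)

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge-style triplet loss on feature vectors: pull the anchor
    toward the positive and push it away from the negative."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)
```

In practice both pieces sit inside the trained network (the IBN layer in the ResNet50 backbone, the triplet loss on the final embeddings); the sketch only shows the arithmetic each one performs.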
Panoptic segmentation is an important technique for UAV platforms in road-condition monitoring and urban planning, as it provides more comprehensive information than semantic segmentation alone. In this paper, a panoptic segmentation framework is designed for the UAV application scenario. Because UAV imagery covers large scenes containing small targets, segmentation results often miss foreground objects and produce low-quality masks. To solve these problems, this paper introduces deformable convolution into the feature extraction network to strengthen feature extraction, and adds a MaskIoU module to the instance segmentation branch to improve the overall quality of foreground target masks. We collect a series of UAV data and organize it into the UAV_OUC panoptic segmentation dataset; experimental results on this benchmark validate the effectiveness of the proposed method.
Person re-identification (ReID) is an important task in video surveillance with many practical applications. Traditional methods and existing deep learning models cannot cope with the real-world challenges of environmental complexity and scene dynamics, especially in fixed scenes. Moreover, most existing datasets are outdoor and stylistically uniform, which makes them ill-suited to indoor person re-identification. Focusing on these problems, this paper improves a Stride Convolutional Neural Network (S-CNN) to process indoor images based on multi-feature fusion. A deep model is established that learns identity, stride, and other information to handle more challenging indoor images. A metric learning method (Joint Bayesian) is then applied on top of the deep model, and finally the entire classifier is retrained with supervised learning. Experiments are conducted on our OUC365 dataset, which was captured over 365 days and covers the styles of all four seasons. Compared with other state-of-the-art methods, the proposed method yields the best results.
Multiple people tracking is a significant sub-problem of object tracking that has been in high demand in recent years. In large-view scenes, the main difficulties are that the objects are small and may be occluded or undergo sudden appearance changes, so most existing methods suffer from high ID switches (an evaluation metric for multiple people tracking) in such scenes. We propose a multiple people tracking method that focuses on reducing ID switches in large-view scenes. Our method associates data using intersection over union (IOU) information, which is insensitive to appearance changes, together with a Euclidean distance-based appearance similarity, which helps resolve occlusions. To make this similarity metric work better, we train a convolutional neural network (CNN) with a soft-margin loss function; this makes the features extracted by the CNN better suited to our similarity metric, so the method can effectively reduce ID switches. Because IOU-based data association has low computational complexity and the CNN is a lightweight network, our method runs in real time. We also propose a multiple people tracking dataset of large-view scenes for research, designed according to the standards of the MOT Challenge benchmark, and select the YOLOv3 detector, which performs relatively well on small objects, as the public detector. Finally, our method is compared with several multiple people tracking methods on this dataset, and the experimental results show that it performs better in large-view scenes.
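A minimal sketch of the two association cues follows. The soft-margin form shown, log(1 + exp(d_ap - d_an)), is one common margin-free variant of the triplet loss, and the fusion weight and the 1/(1+d) mapping from Euclidean distance to appearance similarity are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def soft_margin_loss(d_ap, d_an):
    """Soft-margin triplet-style loss, log(1 + exp(d_ap - d_an)):
    smoothly penalizes an anchor-positive distance that is not smaller
    than the anchor-negative distance, with no hard margin to tune."""
    return np.log1p(np.exp(d_ap - d_an))

def association_affinity(iou_score, app_dist, lam=0.6):
    """Fuse IOU (robust to appearance change) with appearance
    similarity (helpful under occlusion); app_dist is a Euclidean
    distance between CNN features, mapped to (0, 1] via 1 / (1 + d)."""
    return lam * iou_score + (1.0 - lam) / (1.0 + app_dist)
```

In a tracker, `association_affinity` would fill a cost matrix between existing tracks and new detections before a matching step (e.g. Hungarian assignment).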
KEYWORDS: Sensors, Data modeling, Performance modeling, Information science, Video surveillance, Target detection, Cameras, Detector development
Pedestrian detection is a canonical sub-problem of object detection that has seen high demand in recent years. Although recent deep learning object detectors such as Fast/Faster R-CNN show excellent performance for general object detection, they have had limited success with small-size pedestrian detection in large-view scenes. We find that the insufficient resolution of feature maps leads to unsatisfactory accuracy on small instances. In this paper, we investigate issues involving Fast R-CNN for pedestrian detection. Driven by these observations, we propose a very simple but effective baseline for pedestrian detection based on Fast R-CNN: we employ the DPM detector to generate accurate proposals and train a Fast R-CNN-style network in which skip connections concatenate features from different layers to counteract the coarseness of the feature maps. With this design, accuracy improves for small-size pedestrian detection in real large-view scenes.
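The skip-connection idea, concatenating an upsampled coarse (deep) feature map with a fine (shallow) one so small instances keep spatial detail, can be sketched as follows; the shapes and the nearest-neighbour upsampling are illustrative stand-ins for what the trained network computes:

```python
import numpy as np

def skip_concat(fine, coarse):
    """Nearest-neighbour upsample the coarse map to the fine map's
    spatial size, then concatenate along the channel axis.
    Both inputs have layout (N, C, H, W)."""
    fh = fine.shape[2] // coarse.shape[2]
    fw = fine.shape[3] // coarse.shape[3]
    up = coarse.repeat(fh, axis=2).repeat(fw, axis=3)
    return np.concatenate([fine, up], axis=1)
```

The concatenated map carries both the fine layer's resolution and the coarse layer's semantics, which is what makes small pedestrians easier to localize.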
With the development of earth observation programs, many multitemporal synthetic aperture radar (SAR) images over the same geographical area have become available, and there is strong demand for automatic change detection techniques that exploit them. Most existing techniques directly analyze the difference image (DI) and are therefore easily affected by speckle noise. We propose an SAR image change detection method based on frequency-domain analysis and random multigraphs. The method follows a coarse-to-fine procedure: in the coarse changed-region localization stage, frequency-domain analysis selects distinctive, salient regions from the DI, so nonsalient regions are neglected and noisy unchanged regions incurred by speckle are suppressed. In the fine changed-region classification stage, random multigraphs are employed as the classification model; by selecting a subset of neighborhood features to create graphs, the method can efficiently exploit the nonlinear relations between multitemporal SAR images. Experimental results on two real SAR datasets and one simulated dataset demonstrate the effectiveness of the proposed method.
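As an illustrative stand-in for the coarse stage, a spectral-residual-style saliency map can be computed on the difference image: the log-amplitude spectrum is smoothed, the residual is recombined with the original phase, and the inverse transform highlights salient (candidate changed) regions. The 3x3 smoothing window and log1p amplitude are arbitrary choices here, not the paper's exact frequency-domain analysis:

```python
import numpy as np

def spectral_saliency(di):
    """Saliency map in [0, 1] for a 2-D difference image `di`;
    high values mark distinctive regions worth fine classification."""
    f = np.fft.fft2(di)
    log_amp = np.log1p(np.abs(f))
    phase = np.angle(f)
    # Smooth the log-amplitude spectrum with a 3x3 mean filter.
    h, w = log_amp.shape
    pad = np.pad(log_amp, 1, mode='edge')
    smooth = sum(pad[i:i + h, j:j + w]
                 for i in range(3) for j in range(3)) / 9.0
    residual = log_amp - smooth  # what stands out in the spectrum
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return sal / sal.max()
```

Thresholding such a map would discard nonsalient regions before the random-multigraph classifier sees them, which is the noise-suppression role the coarse stage plays.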