Image classification is an essential component of modern computer vision, in which dictionary learning–based classification has garnered significant attention for its robustness. Most dictionary learning algorithms can be improved through data augmentation and regularization techniques. Data augmentation research often focuses on enhancing the features of samples within a specific class, while the shared information among images of different categories is overlooked; high inter-class shared information makes it difficult to differentiate among categories. To address this concern, this paper proposes an innovative data augmentation approach that reduces excessive shared information among class samples by randomly replacing pixel values, thereby improving classification performance. Building on this, we design a joint dictionary learning algorithm that embeds label and local consistency. The basic steps of the proposed algorithm are as follows: (1) generate specific auxiliary samples as training samples, (2) initialize the dictionary and representation coefficients, (3) introduce label and local constraints and update the dictionary, and (4) generate a classifier and classify the test samples. Extensive experiments demonstrate the effectiveness of the proposed approach.
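The core augmentation idea, replacing a random fraction of pixel values to break up inter-class shared information, can be sketched as follows. This is a hypothetical illustration: the replacement rate and the uniform noise distribution are assumptions, not the paper's exact settings.

```python
import numpy as np

def augment_by_pixel_replacement(x, rate=0.1, rng=None):
    """Return a copy of image `x` with a random fraction of pixels replaced.

    Sketch of the augmentation described above: perturbing `rate` of the
    pixels with values drawn uniformly from the image's own range reduces
    the information shared between samples of different classes.
    """
    rng = np.random.default_rng(rng)
    x_aug = x.astype(np.float64).copy()
    mask = rng.random(x.shape) < rate            # pixels selected for replacement
    x_aug[mask] = rng.uniform(x.min(), x.max(), size=int(mask.sum()))
    return x_aug
```

Applied to each training image before step (1), this yields the auxiliary samples used to learn the dictionary.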
We propose an innovative stereo matching algorithm using three constraints and collaborative optimization between pixels. First, to initialize a better disparity plane, random points are selected under three restrictions: matching energy stability, disparity stability, and left–right consistency. This replaces the commonly used random sample consensus (RANSAC) method, whose inherent randomness affects the final optimization. Second, because the grid-based propagation method cannot effectively propagate labels between adjacent similar pixels and easily falls into local minima, we propose a cooperative competition mechanism between adjacent similar pixels and minimize the matching energy through collaborative optimization. Third, we iteratively optimize the disparity plane parameters and minimize the matching energy function, thereby improving stereo matching accuracy. Finally, we test our method on the Middlebury V3 test set, where it outperforms other state-of-the-art algorithms: it ranks first on the matching error <1 pixel (all) metric, second on the <0.5 pixel (all) metric, and third on the <2 pixels (all) metric, while also reducing the number of iterations and accelerating processing (recorded on May 3, 2021).
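Of the three initialization constraints, left–right consistency is the most standard and can be sketched directly. The following is a minimal illustration of that one check only; the threshold `tau` is an assumed value, and the matching-energy and disparity-stability constraints are omitted.

```python
import numpy as np

def left_right_consistency(disp_left, disp_right, tau=1.0):
    """Mark pixels whose left and right disparities agree within `tau`.

    For each left-image pixel (y, x) with disparity d, look up the
    right-image disparity at (y, x - d); the pixel passes the check if
    the two disparities differ by at most `tau`.
    """
    h, w = disp_left.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)
    ys = np.arange(h)[:, None].repeat(w, axis=1)
    # Corresponding column of each left pixel in the right disparity map.
    xr = np.clip(np.rint(xs - disp_left).astype(int), 0, w - 1)
    return np.abs(disp_left - disp_right[ys, xr]) <= tau
```

Only points passing all three checks would be kept as seeds for the disparity plane initialization.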
We present a stereo matching approach, referred to as HLocalExp-CM, that exploits hierarchical local contextual information and a confidence map based on a new grid structure. The proposed approach preserves fine depth edges and extracts accurate disparities in weakly textured, textureless, and repeated-texture regions. It adopts a two-stage optimization strategy. In the first stage, a multiresolution cost aggregation is minimized to reduce the search space of each pixel's disparity plane. The second stage iteratively optimizes the confidence map and a global energy function to progressively improve the disparity accuracy of each pixel. The confidence map is estimated by classifying pixels into distinctive and ambiguous ones according to the decreasing rate of the multiresolution cost aggregation; spatial propagation and plane refinement then update the disparity of each pixel, successfully eliminating the ambiguity of nondistinctive pixels. The global energy function, based on a pairwise Markov random field, uses cross-scale cost aggregation to exploit the context information of objects in different scenarios on local grid regions, unlike deep learning techniques, which extract context information with convolutional layers. The proposed approach is evaluated on the Middlebury benchmark V3, where it ranks first on the "bad 2.0 all" metric, a widely used criterion for the evaluation of stereo images, and eighth on the "bad 2.0 nonocc" metric (recorded on July 24, 2021).
Autonomous cars establish driving strategies by detecting the road. Most previous methods detect the road with image semantic segmentation, which assigns pixel-wise class labels and predicts segmentation masks. We propose U-net, a novel segmentation network that learns deep convolution and deconvolution features. The architecture consists of an encoder and a decoder network. The encoder network is trained jointly with a corresponding decoder network followed by a pixel-wise classification layer; the architecture of the encoder network is topologically identical to the convolutional layers. The novelty of U-net lies in the manner in which the decoder deconvolves its lower-resolution input feature maps. Specifically, the decoder network conjoins the encoder convolution features and decoder deconvolution features using the "concat" function, which achieves a good mapping between classes and filters at the expansion side of the network. The network is trained end-to-end and yields precise pixel-wise predictions at the original input resolution.
The Single Shot MultiBox Detector (SSD) is one of the fastest algorithms in object detection; it uses a fully convolutional neural network to detect objects at all scales in an image. The Deconvolutional Single Shot Detector (DSSD) introduces more context information by adding a deconvolution module to SSD, improving mean average precision (mAP) on PASCAL VOC2007 from SSD's 77.5% to 78.6%. Although DSSD obtains 1.1% higher mAP than SSD, its frame rate drops from 46 to 11.8 frames per second (FPS). In this paper, we propose a single-stage end-to-end detection model called ESSD to overcome this dilemma. Our solution is to extend better context information to the shallow layers of the best single-stage detectors (e.g., SSD). Experimental results show that our model reaches 79.4% mAP, higher than DSSD and SSD by 0.8 and 1.9 points, respectively. For 300×300 input, our testing speed is 25 FPS on a single Nvidia Titan X GPU, exceeding the original execution speed of DSSD.
Fully convolutional networks (FCNs) have shown outstanding performance in image semantic segmentation, which is the key task in license plate detection (LPD). An FCN architecture for LPD is presented. First, a multiscale hierarchical network structure combines the multiscale and multilevel features produced by the FCN. Then, an enhanced loss structure containing three loss layers is defined to emphasize the license plates in images. Finally, the FCN generates prediction maps that directly show the location of license plates. Experiments show that our approach is more accurate than many state-of-the-art methods.
The performance and robustness of fatigue detection degrade considerably when the driver wears glasses. To address this issue, this paper proposes a practical driver fatigue detection method based on the face alignment at 3000 FPS algorithm. First, the driver's eye regions are localized by exploiting six landmarks surrounding each eye. Second, HOG features of the extracted eye regions are computed and fed into an SVM classifier to recognize the eye state. Finally, the PERCLOS value is calculated to determine whether the driver is drowsy, and an alarm is generated if the eyes remain closed for a specified period of time. Accuracy and real-time evaluations on test videos with different drivers demonstrate that the proposed algorithm is robust and achieves better accuracy for driver fatigue detection than some previous methods.
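The final drowsiness decision, PERCLOS over the per-frame eye states produced by the SVM, is simple to sketch. The threshold below is an illustrative value, not the paper's calibrated one.

```python
def perclos(eye_states, window=None):
    """PERCLOS: percentage of frames in which the eyes are closed.

    `eye_states` is a sequence of per-frame booleans from the eye-state
    classifier (True = closed). If `window` is given, only the most
    recent `window` frames are considered.
    """
    frames = eye_states[-window:] if window else eye_states
    return sum(frames) / len(frames)

def is_drowsy(eye_states, threshold=0.4):
    """Flag drowsiness when the closed-eye fraction exceeds `threshold`."""
    return perclos(eye_states) >= threshold
```

A separate timer on consecutive closed frames would trigger the alarm for prolonged eye closure.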
A vehicle detection and tracking system is one of the indispensable tools for reducing traffic accidents. Because the nearest vehicle is the most likely to cause harm, this paper focuses on the nearest vehicle in the region of interest (ROI). For such a system, high accuracy, real-time operation, and intelligence are the basic requirements. We build a system that combines the KCF tracking algorithm with Haar–AdaBoost detection. The KCF algorithm reduces computation time and increases speed through cyclic shifts and diagonalization, satisfying the real-time requirement. Haar features share the same advantages of simple operation and high detection speed. Combining the two algorithms yields an obvious improvement in running rate compared with previous works. The detection result of the Haar–AdaBoost classifier provides the initial value for the KCF algorithm, which removes the KCF algorithm's flaw of requiring manual car marking in the initial phase and makes the system more scientific and intelligent. Haar detection and KCF tracking with histogram of oriented gradients (HOG) features ensure the accuracy of the system. We evaluate the framework on a self-collected dataset. The experimental results demonstrate that the proposed method is robust and real-time, and that it effectively adapts to illumination variation; even at night it meets the detection and tracking requirements, an improvement over previous work.
Segmentation of moving objects from video sequences is the fundamental step in intelligent surveillance applications. Numerous methods have been proposed to obtain object segmentation. In this paper, we present an effective approach based on the mixture of Gaussians. The approach makes use of a feedback strategy with multiple levels: the pixel level, the region level, and the frame level. Pixel-level feedback helps to provide each pixel with an adaptive learning rate. The maintenance strategy of the background model is adjusted by region-level feedback based on tracking. Frame-level feedback is used to detect the global change in scenes. These different levels of feedback strategies ensure our approach’s effectiveness and robustness. This is demonstrated through experimental results on the Change Detection 2014 benchmark dataset.
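The pixel-level feedback idea, giving each pixel its own learning rate, can be illustrated with a deliberately simplified single-Gaussian background model. This is a sketch under stated assumptions: the full method uses a mixture of Gaussians plus region- and frame-level feedback, and the feedback increments below are invented for illustration.

```python
import numpy as np

class AdaptiveBackground:
    """Single-Gaussian background model with per-pixel learning rates."""

    def __init__(self, first_frame, alpha0=0.05, k=2.5):
        self.mean = first_frame.astype(np.float64)
        self.var = np.full(first_frame.shape, 15.0 ** 2)
        self.alpha = np.full(first_frame.shape, alpha0)  # per-pixel rate
        self.k = k

    def apply(self, frame):
        frame = frame.astype(np.float64)
        d2 = (frame - self.mean) ** 2
        fg = d2 > (self.k ** 2) * self.var           # foreground mask
        # Pixel-level feedback: adapt faster where detections flicker,
        # slower where the background is stable.
        self.alpha = np.clip(self.alpha + np.where(fg, 0.01, -0.001),
                             0.001, 0.2)
        a = np.where(fg, 0.0, self.alpha)            # update background only
        self.mean += a * (frame - self.mean)
        self.var += a * (d2 - self.var)
        return fg
```

Region-level feedback (tracking-based model maintenance) and frame-level feedback (global scene-change detection) would adjust `alpha` and the model reset logic on top of this loop.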
Hand tracking is becoming increasingly popular in human–computer interaction (HCI), and many studies in this area have made good progress. However, robust long-term hand tracking remains difficult. Online learning has great potential for tracking because of its strong adaptive learning ability. To address this problem, we combine an online learning technique, online boosting, with an offline-trained detector to track the hand. The contributions of this paper are: (1) a learning method with an offline model that counters the drift of online learning; (2) a hand-tracking framework built on this learning method. Experiments show that, compared with three other methods, the proposed tracker is more robust in challenging cases.
A panorama parking assistant system (PPAS) for the automotive aftermarket together with a practical improved particle swarm optimization method (IPSO) are proposed in this paper. In the PPAS system, four fisheye cameras are installed in the vehicle with different views, and four channels of video frames captured by the cameras are processed as a 360-deg top-view image around the vehicle. Besides the embedded design of PPAS, the key problem for image distortion correction and mosaicking is the efficiency of parameter optimization in the process of camera calibration. In order to address this problem, an IPSO method is proposed. Compared with other parameter optimization methods, the proposed method allows a certain range of dynamic change for the intrinsic and extrinsic parameters, and can exploit only one reference image to complete all of the optimization; therefore, the efficiency of the whole camera calibration is increased. The PPAS is commercially available, and the IPSO method is a highly practical way to increase the efficiency of the installation and the calibration of PPAS in automobile 4S shops.
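The IPSO method builds on the standard particle swarm optimization loop, which is worth sketching for context. The code below is a generic PSO minimizer, not the paper's IPSO: the improvements described above (bounded dynamic ranges for camera intrinsics and extrinsics, calibration from a single reference image) are not reproduced, and all hyperparameters are conventional defaults.

```python
import numpy as np

def pso(f, dim, n=30, iters=100, lo=-5.0, hi=5.0,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over [lo, hi]^dim with a basic particle swarm."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n, dim))            # particle positions
    v = np.zeros((n, dim))                       # velocities
    pbest = x.copy()                             # per-particle best positions
    pval = np.apply_along_axis(f, 1, x)
    g = pbest[pval.argmin()]                     # global best position
    for _ in range(iters):
        r1, r2 = rng.random((2, n, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        val = np.apply_along_axis(f, 1, x)
        better = val < pval
        pbest[better], pval[better] = x[better], val[better]
        g = pbest[pval.argmin()]
    return g, pval.min()
```

In the calibration setting, `f` would be the reprojection or mosaicking error of a candidate camera-parameter vector against the single reference image.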
Automotive active safety (AAS) is the main branch of intelligent automobile research, and pedestrian detection is the key problem of AAS because it is related to the casualties in most vehicle accidents. For on-board pedestrian detection algorithms, the main problem is balancing efficiency and accuracy so that the on-board system is usable in real scenes. We therefore propose an on-board pedestrian detection and warning system whose algorithm considers the features of side pedestrians.
The system includes two modules: pedestrian detection and warning. Haar features and a cascade of stage classifiers trained by AdaBoost are applied first, and then HOG features with an SVM classifier refine the false positives. To make these time-consuming algorithms usable in real time, a divide-window method together with an operator context scanning (OCS) method is applied to increase efficiency. By merging the vehicle's velocity information with the distance of the detected pedestrian, the system can judge whether a pedestrian in front is in potential danger. On a new dataset captured in urban environments with side pedestrians on zebra crossings, the embedded system and its algorithm achieve practical on-board results for side pedestrian detection.
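The warning decision, combining the detected pedestrian's distance with the vehicle's velocity, can be sketched as a time-to-collision check. This is an illustrative assumption about the warning logic; the actual system's criterion and threshold are not specified in the abstract.

```python
def pedestrian_warning(distance_m, speed_mps, ttc_threshold=2.0):
    """Warn when the time to reach the detected pedestrian is short.

    `distance_m` is the estimated distance to the pedestrian ahead and
    `speed_mps` the vehicle speed; a warning fires when the time to
    collision falls below `ttc_threshold` seconds (an assumed value).
    """
    if speed_mps <= 0:
        return False                 # stationary or reversing: no warning
    return distance_m / speed_mps < ttc_threshold
```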